Why Technical Accessibility Is the Foundation of GEO
You can write the most authoritative, well-structured content in your industry — but if AI crawlers cannot reach it, none of it matters. Technical accessibility is the unglamorous prerequisite that makes every other GEO signal work.
This category covers five interconnected areas: robots.txt permissions, llms.txt, sitemap.xml, server-side rendering, and page speed. A failing score in any one of them can cap your overall GEO score regardless of how well you score elsewhere.
robots.txt: The Most Common GEO Mistake
robots.txt is a text file at yourdomain.com/robots.txt that tells crawlers which pages they can and cannot access. The file was designed for search engines in the 1990s — and a huge number of sites have configurations that accidentally block every AI crawler on the web.
The wildcard trap
The most common mistake looks like this:
```
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
```

This configuration tells every bot except Googlebot: do not access anything. GPTBot, ClaudeBot, PerplexityBot, Google-Extended — all blocked. If your robots.txt looks like this, you are invisible to AI search.
Getting the configuration right
A correctly configured robots.txt will explicitly allow each AI crawler by its user-agent name — not rely on a wildcard default — so access is clear regardless of what other rules are in the file. The key bots to address are: GPTBot (OpenAI/ChatGPT), ClaudeBot (Anthropic/Claude), PerplexityBot (Perplexity AI), Google-Extended (Gemini/AI Overviews), and Bingbot (Microsoft Copilot).
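As a sketch, a robots.txt that explicitly allows each of these crawlers might look like the following. The `Sitemap` line and the trailing wildcard rule are placeholders — adapt them to your own domain and any paths you genuinely need to restrict:

```
# Explicitly allow each major AI crawler by name
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

# Default rule for all other bots
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```

Named rules take precedence over the wildcard for matching bots, so each AI crawler's access stays explicit even if the default rule changes later.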
Run your free GEO scan to find out exactly which bots your robots.txt is allowing or blocking — and get the specific fix your site needs.
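You can also do a quick self-check with Python's standard-library robots.txt parser. This sketch feeds it the "wildcard trap" configuration from above and asks which AI crawlers would be allowed through — `example.com` is a placeholder URL:

```python
from urllib.robotparser import RobotFileParser

# The "wildcard trap" configuration shown earlier
robots_txt = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Bingbot"]

def check_bot_access(robots_content: str, bots: list[str],
                     url: str = "https://example.com/") -> dict[str, bool]:
    """Return {bot_name: allowed} for each user-agent against the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_content.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}

for bot, allowed in check_bot_access(robots_txt, AI_BOTS).items():
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

Run against the trap configuration, every AI bot comes back blocked, while the same check for `Googlebot` passes — exactly the failure mode described above.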
llms.txt: The AI-Native Discovery File
llms.txt is a relatively new open standard — inspired by robots.txt — designed specifically for AI language models. Place it at yourdomain.com/llms.txt to give AI systems a structured summary of your site.
A well-formed llms.txt file includes:
- Company name and description: Who you are and what you do
- Key page inventory: Your most important URLs with brief descriptions
- Primary use cases: What questions your site answers
- Contact and verification: How AI systems can verify your identity
- Preferred citation format: How you want to be referenced in AI responses
Anthropic, Cloudflare, and dozens of other platforms have adopted llms.txt. Sites without it are missing a direct channel to communicate with AI crawlers. Creating a basic llms.txt takes under 30 minutes and has an outsized impact on AI discoverability.
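A minimal llms.txt covering those elements might look like this — the company name, URLs, and descriptions below are placeholders to adapt:

```markdown
# Example Co

> Example Co helps [audience] do [one-sentence description of what you do].

## Key pages

- [Product overview](https://yourdomain.com/product): What the product does and who it is for
- [Pricing](https://yourdomain.com/pricing): Plans and pricing details
- [Documentation](https://yourdomain.com/docs): Technical guides and reference material

## Contact

- [About](https://yourdomain.com/about): Company details AI systems can use for verification

## Citation

Please cite us as "Example Co (yourdomain.com)".
```

The format is plain markdown: an H1 with the site name, a blockquote summary, then sections of annotated links.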
sitemap.xml: Helping AI Crawlers Find Every Page
A sitemap.xml file at yourdomain.com/sitemap.xml lists every important page on your site with metadata about when it was last updated. Search crawlers — including AI crawlers — use sitemaps to discover pages they might otherwise miss.
Key sitemap best practices for GEO:
- Include priority values: Set `<priority>1.0</priority>` on your most important pages to guide crawl depth
- Keep lastmod accurate: The `<lastmod>` date should reflect genuine content updates, not just technical changes
- Submit to Google Search Console: Google-Extended piggybacks on Googlebot's crawl data
- Use IndexNow: Submit the sitemap to Bing for near-real-time Copilot/Perplexity notification
A missing or outdated sitemap means crawlers have to discover your content entirely through internal links — and may miss pages entirely.
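For reference, a minimal sitemap.xml applying those practices looks like this — the URLs and dates are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2025-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://yourdomain.com/pricing</loc>
    <lastmod>2025-01-10</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
```

Sites with more than 50,000 URLs should split the sitemap into multiple files linked from a sitemap index, per the sitemaps.org protocol.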
Server-Side Rendering: Can AI Actually Read Your Pages?
Many modern websites render content using JavaScript frameworks (React, Vue, Angular). When a user visits, the browser runs JavaScript to build the page. The problem: most AI crawlers do not execute JavaScript. They receive the raw HTML response — and if your content only exists after JavaScript runs, they see nothing.
If your site was built with a modern SPA framework and you have not explicitly configured server-side rendering (SSR) or static site generation (SSG), there is a real chance AI crawlers are receiving an empty HTML shell. This is one of the most impactful — and most overlooked — technical GEO issues.
The GEO scan checks whether your key pages deliver their content in the initial HTML response, without requiring JavaScript execution. Sites that fail this check often see dramatic improvements in AI visibility after switching to SSR or SSG.
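A rough version of this check can be sketched in Python: strip the scripts and styles from a page's raw HTML response and see whether any meaningful visible text remains. The 200-character threshold and the sample markup are illustrative assumptions, not part of the actual scan:

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects text content, ignoring anything inside <script> or <style> tags."""
    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def looks_like_empty_shell(html: str, min_chars: int = 200) -> bool:
    """Heuristic: flag pages whose initial HTML carries almost no visible text."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    return len(" ".join(extractor.chunks)) < min_chars

# A typical SPA shell: no server-rendered content, just a mount point and a script
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(looks_like_empty_shell(spa_shell))
```

The SPA shell above fails the check, because a crawler that does not execute JavaScript would see no text at all.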
Page Speed: Crawl Budget and Content Extraction
Slow pages are crawled less frequently and less completely. AI crawlers — like all web bots — have time limits per request. A page that takes 6 seconds to load may be abandoned entirely.
The key metrics to target:
| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| Time to First Byte (TTFB) | <200ms | 200–500ms | >500ms |
| Largest Contentful Paint (LCP) | <2.5s | 2.5–4s | >4s |
| First Input Delay (FID) | <100ms | 100–300ms | >300ms |
Page speed improvements that benefit both GEO and traditional SEO: image optimisation, removing unused JavaScript, enabling browser caching, and using a CDN.
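The thresholds in the table above can be expressed as a small classifier — useful when scripting your own checks against measured values. The metric names and units here are assumptions for illustration:

```python
# Thresholds from the table above: (good-below, poor-above)
# TTFB and FID in milliseconds, LCP in seconds.
THRESHOLDS = {
    "ttfb_ms": (200, 500),
    "lcp_s": (2.5, 4.0),
    "fid_ms": (100, 300),
}

def rate_metric(name: str, value: float) -> str:
    """Classify a measured value as 'good', 'needs improvement', or 'poor'."""
    good, poor = THRESHOLDS[name]
    if value < good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"

print(rate_metric("lcp_s", 2.1))    # good
print(rate_metric("ttfb_ms", 350))  # needs improvement
print(rate_metric("fid_ms", 450))   # poor
```

Values on a boundary (e.g. a TTFB of exactly 200 ms) fall into the "needs improvement" band, matching the table's ranges.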
What the GEO Scan Checks
Your free GEO scan tests all of these technical signals automatically — robots.txt bot access, llms.txt presence, sitemap validity, SSR detection, page speed, and HTTPS configuration — and shows you exactly which are passing and which need attention, with specific guidance for your site.
Technical accessibility is the highest-leverage starting point in GEO. Most fixes are one-time changes that unlock every other optimisation you make. Run your scan to see where you stand.
Check your GEO score for free
See how your website scores across all 8 GEO categories. Takes 60 seconds.
Get your free GEO score →