GEO Scoring · 8 min read

Technical Accessibility for GEO: robots.txt, llms.txt, Sitemaps & Page Speed

The technical layer of GEO determines whether AI crawlers can reach your content at all. robots.txt misconfigurations, missing llms.txt, and slow page loads are silently blocking millions of sites from AI visibility.

By Kyle Fairburn, Founder & AI Specialist at NexRank

Why Technical Accessibility Is the Foundation of GEO

You can write the most authoritative, well-structured content in your industry — but if AI crawlers cannot reach it, none of it matters. Technical accessibility is the unglamorous prerequisite that makes every other GEO signal work.

This category covers five interconnected areas: robots.txt permissions, llms.txt, sitemap.xml, server-side rendering, and page speed. A failure in any one of them can cap your overall GEO score regardless of how strong the other signals are.

robots.txt: The Most Common GEO Mistake

robots.txt is a text file at yourdomain.com/robots.txt that tells crawlers which pages they can and cannot access. The file was designed for search engines in the 1990s — and a huge number of sites have configurations that accidentally block every AI crawler on the web.

The wildcard trap

The most common mistake looks like this:

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

This configuration tells every bot except Googlebot: do not access anything. GPTBot, ClaudeBot, PerplexityBot, Google-Extended — all blocked. If your robots.txt looks like this, you are invisible to AI search.

Getting the configuration right

A correctly configured robots.txt will explicitly allow each AI crawler by its user-agent name — not rely on a wildcard default — so access is clear regardless of what other rules are in the file. The key bots to address are: GPTBot (OpenAI/ChatGPT), ClaudeBot (Anthropic/Claude), PerplexityBot (Perplexity AI), Google-Extended (Gemini/AI Overviews), and Bingbot (Microsoft Copilot).
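A configuration following that principle might look like this (adapt the list to the bots you want to admit, and keep your existing rules for other agents):

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Because each bot is named explicitly, a restrictive wildcard block elsewhere in the file no longer shuts these crawlers out: the most specific matching User-agent group wins.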

Run your free GEO scan to find out exactly which bots your robots.txt is allowing or blocking — and get the specific fix your site needs.

llms.txt: The AI-Native Discovery File

llms.txt is a relatively new open standard — inspired by robots.txt — designed specifically for AI language models. Place it at yourdomain.com/llms.txt to give AI systems a structured summary of your site.

A well-formed llms.txt file includes:

  • Company name and description: Who you are and what you do
  • Key page inventory: Your most important URLs with brief descriptions
  • Primary use cases: What questions your site answers
  • Contact and verification: How AI systems can verify your identity
  • Preferred citation format: How you want to be referenced in AI responses

Anthropic, Cloudflare, and dozens of other platforms have adopted llms.txt. Sites without it are missing a direct channel to communicate with AI crawlers. Creating a basic llms.txt takes under 30 minutes and has an outsized impact on AI discoverability.
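A minimal llms.txt covering those elements might look like this (company name and URLs are placeholders; the format follows the proposed llms.txt convention of a Markdown title, short summary, and link sections):

# Example Co
> Example Co provides widget analytics for e-commerce teams.

## Key pages
- [Pricing](https://yourdomain.com/pricing): Plans and pricing details
- [Docs](https://yourdomain.com/docs): Product and API documentation

## Contact
- [About](https://yourdomain.com/about): Company background and verification details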

sitemap.xml: Helping AI Crawlers Find Every Page

A sitemap.xml file at yourdomain.com/sitemap.xml lists every important page on your site with metadata about when it was last updated. Search crawlers — including AI crawlers — use sitemaps to discover pages they might otherwise miss.

Key sitemap best practices for GEO:

  1. Include priority values: Set <priority>1.0</priority> on your most important pages to signal which pages matter most
  2. Keep lastmod accurate: The <lastmod> date should reflect genuine content updates, not just technical changes
  3. Submit to Google Search Console: Google-Extended piggybacks on Googlebot's crawl data
  4. Use IndexNow: ping Bing with updated URLs (IndexNow notifies per URL, complementing the sitemap) for near-real-time visibility in Copilot and engines that draw on Bing's index
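A sitemap entry carrying these signals might look like this (URL and date are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/pricing</loc>
    <lastmod>2025-01-15</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>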

A missing or outdated sitemap means crawlers must discover your content entirely through internal links — and may miss pages altogether.

Server-Side Rendering: Can AI Actually Read Your Pages?

Many modern websites render content using JavaScript frameworks (React, Vue, Angular). When a user visits, the browser runs JavaScript to build the page. The problem: most AI crawlers do not execute JavaScript. They receive the raw HTML response — and if your content only exists after JavaScript runs, they see nothing.

If your site was built with a modern SPA framework and you have not explicitly configured server-side rendering (SSR) or static site generation (SSG), there is a real chance AI crawlers are receiving an empty HTML shell. This is one of the most impactful — and most overlooked — technical GEO issues.
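One way to approximate what a non-JS crawler sees is to strip <script> and <style> content from the raw HTML response and check whether your key text survives. A minimal sketch of the idea, using only the standard library (the sample HTML strings are illustrative, not from any real site):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies --
    roughly what a crawler that does not execute JavaScript extracts."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)

def visible_text(html: str) -> str:
    """Return the whitespace-normalised visible text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(" ".join(parser.chunks).split())

# A client-rendered shell: the content only exists inside JavaScript.
spa_shell = '<html><body><div id="root"></div><script>render("Hello")</script></body></html>'
# A server-rendered page: the same content is present in the raw HTML.
ssr_page = '<html><body><div id="root"><h1>Hello</h1></div></body></html>'

print("Hello" in visible_text(spa_shell))  # False -- invisible to AI crawlers
print("Hello" in visible_text(ssr_page))   # True  -- readable without JS
```

Fetch your own page with a plain HTTP client (no browser) and run its HTML through a check like this: if your headline is missing from the visible text, AI crawlers are seeing an empty shell.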

The GEO scan checks whether your key pages deliver their content in the initial HTML response, without requiring JavaScript execution. Sites that fail this check often see dramatic improvements in AI visibility after switching to SSR or SSG.

Page Speed: Crawl Budget and Content Extraction

Slow pages are crawled less frequently and less completely. AI crawlers — like all web bots — have time limits per request. A page that takes 6 seconds to load may be abandoned entirely.

The key metrics to target:

Metric                          | Good    | Needs Improvement | Poor
Time to First Byte (TTFB)       | <200ms  | 200–500ms         | >500ms
Largest Contentful Paint (LCP)  | <2.5s   | 2.5–4s            | >4s
First Input Delay (FID)         | <100ms  | 100–300ms         | >300ms
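These thresholds are straightforward to encode if you want to classify measurements in an automated check; a small sketch (metric keys and units are illustrative, values taken from the table above):

```python
# Thresholds from the table: (good_below, poor_above) per metric.
THRESHOLDS = {
    "ttfb_ms": (200, 500),   # Time to First Byte, milliseconds
    "lcp_s":   (2.5, 4.0),   # Largest Contentful Paint, seconds
    "fid_ms":  (100, 300),   # First Input Delay, milliseconds
}

def rate(metric: str, value: float) -> str:
    """Classify a measurement as Good / Needs Improvement / Poor."""
    good, poor = THRESHOLDS[metric]
    if value < good:
        return "Good"
    if value <= poor:
        return "Needs Improvement"
    return "Poor"

print(rate("ttfb_ms", 150))  # Good
print(rate("lcp_s", 3.0))    # Needs Improvement
print(rate("fid_ms", 450))   # Poor
```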

Page speed improvements that benefit both GEO and traditional SEO: image optimisation, removing unused JavaScript, enabling browser caching, and using a CDN.

What the GEO Scan Checks

Your free GEO scan tests all of these technical signals automatically — robots.txt bot access, llms.txt presence, sitemap validity, SSR detection, page speed, and HTTPS configuration — and shows you exactly which are passing and which need attention, with specific guidance for your site.

Technical accessibility is the highest-leverage starting point in GEO. Most fixes are one-time changes that unlock every other optimisation you make. Run your scan to see where you stand.

Check your GEO score for free

See how your website scores across all 8 GEO categories. Takes 60 seconds.

Get your free GEO score →