Robots.txt
Robots.txt is a text file placed at the root of a website (at the address domain.com/robots.txt) that tells crawlers, such as Googlebot, which parts of the site they are allowed or not allowed to access. It relies on the Robots Exclusion Protocol and uses simple directives: User-agent specifies the targeted crawler, Disallow forbids the crawling of a path, and Allow permits it. Robots.txt controls crawling, meaning access to content, but does not guarantee de-indexing: a blocked URL can still appear in search results if it receives links. It is an essential tool for managing crawl budget, preventing crawlers from wasting their resources on pages with no SEO value, such as admin pages, shopping carts, or faceted filters. A misconfigured robots.txt can unintentionally block strategic pages and seriously harm a site's visibility in search.
Robots.txt is one of the first files a crawler checks when it visits a site. Although tiny, it has a direct impact on how your content is discovered and explored by search engines and, increasingly, by AI crawlers.
How it works
The file follows the Robots Exclusion Protocol. It groups directives into blocks, each targeting one or more crawlers via the User-agent directive. The Disallow and Allow directives then define forbidden or permitted paths. Here is a simple example:
User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /
Sitemap: https://domain.com/sitemap.xml
The Sitemap line points to the XML sitemap, helping crawlers discover all the important URLs. Well-behaved crawlers, such as Googlebot, read this file before exploring the site.
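As a rough illustration of how a compliant crawler might read the example above, the following Python sketch feeds the same rules to the standard library's urllib.robotparser. The crawler name "ExampleBot" and the tested URLs are assumptions made for illustration. This parser applies rules in file order (first match wins), whereas Google documents a most-specific-match rule, so the two can disagree on files where Allow and Disallow paths overlap.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /
Sitemap: https://domain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical crawler name; it falls under the "*" group.
admin_allowed = parser.can_fetch("ExampleBot", "https://domain.com/admin/settings")
blog_allowed = parser.can_fetch("ExampleBot", "https://domain.com/blog/post")
sitemaps = parser.site_maps()  # available in Python 3.8+

print(admin_allowed, blog_allowed, sitemaps)
```

In practice a crawler would load the live file with set_url() and read() rather than parse an inline string.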
Why it matters
The main benefit of robots.txt is controlling your crawl budget. On a large site, preventing crawlers from wasting resources on low-value pages (filters, URL parameters, private areas) lets them focus on strategic content. Conversely, a single overly broad rule, for example a stray Disallow: / under User-agent: *, blocks crawling of the entire site and can gradually make it disappear from search results.
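One way to guard against that kind of mistake is to test a draft file against a list of pages that must stay crawlable before publishing it. The sketch below is a minimal version of such a check, again using urllib.robotparser; the helper name blocked_urls and the list of strategic URLs are hypothetical, and the parser does not interpret wildcard patterns such as * or $ inside paths.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the subset of urls that user_agent may not crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical list of pages that must stay crawlable.
STRATEGIC_URLS = [
    "https://domain.com/",
    "https://domain.com/products/",
    "https://domain.com/blog/",
]

# A draft with an overly broad rule: "Disallow: /" blocks the whole site.
DRAFT = """\
User-agent: *
Disallow: /
"""

problems = blocked_urls(DRAFT, STRATEGIC_URLS)
if problems:
    print("Strategic URLs blocked by this draft:", problems)
```

Running a check like this in a deployment pipeline is one possible design; it catches broad blocking rules but does not replace testing with each search engine's own tools.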
Robots.txt and generative AI
In 2026, robots.txt plays a new role: it lets you allow or block the crawlers and user-agent tokens associated with language models (GPTBot, ClaudeBot, PerplexityBot, Google-Extended). Blocking these agents signals that your content should not be collected for AI use such as model training, but compliance is voluntary, and blocking can also reduce your chances of being cited in AI assistant answers. This strategic trade-off is now an integral part of a modern visibility approach, to be coordinated with your llms.txt file and your overall strategy.
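To make the trade-off concrete, here is a small Python sketch that generates a robots.txt blocking the AI agents named above while leaving other crawlers unrestricted, then checks the result with urllib.robotparser. The helper build_robots_txt is hypothetical, and blocking all four agents is only one possible policy, not a recommendation.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def build_robots_txt(blocked_agents):
    """Build a robots.txt that blocks the given agents and allows everyone else."""
    lines = [f"User-agent: {agent}" for agent in blocked_agents]
    lines += ["Disallow: /", "", "User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(AI_AGENTS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

gptbot_allowed = parser.can_fetch("GPTBot", "https://domain.com/blog/")
googlebot_allowed = parser.can_fetch("Googlebot", "https://domain.com/blog/")
print(robots_txt)
print(gptbot_allowed, googlebot_allowed)
```

Listing several User-agent lines above a single Disallow rule keeps the policy in one group, so adding or removing an agent is a one-line change.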
Frequently asked questions
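Does robots.txt prevent a page from being indexed?
No. Robots.txt blocks crawling, not indexing. A blocked URL can still appear in search results if other pages link to it. To prevent indexing, use the meta robots noindex tag on a page that remains accessible to crawlers, since a crawler that cannot fetch the page never sees the tag.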
Where should the robots.txt file be placed?
The file must be placed at the root of the domain, accessible at domain.com/robots.txt. Placed anywhere else, it will be ignored by crawlers. Each subdomain requires its own robots.txt file.
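The placement rule can be sketched in a few lines of Python: given any page URL, the governing robots.txt sits at the root of the same scheme and host. The function name robots_txt_url and the shop.domain.com subdomain are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    # Path is replaced by /robots.txt; query string and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


main_site = robots_txt_url("https://domain.com/blog/post?page=2")
subdomain = robots_txt_url("https://shop.domain.com/cart/")
print(main_site)
print(subdomain)
```

Because the host is part of the result, the two URLs above resolve to two different files, which is why each subdomain needs its own robots.txt.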