Robots.txt for AI & LLMs: Deep Dive

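TL;DR
• Robots.txt began as a crawl-management tool but now doubles as a policy signal in the AI era: keep search indexers in, keep AI training scrapers out.
• Crawling ≠ indexing. Robots.txt tells compliant crawlers which URLs they may fetch; it does not control whether a URL appears in search results, and a disallowed URL can still be indexed from external links. Use a noindex meta tag or the X-Robots-Tag header for de-indexing, and leave that page crawlable so the crawler can actually see the directive.
• Use dual control: explicitly allow Googlebot/Bingbot; disallow GPTBot (and similar). […]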
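One way to sanity-check the dual-control policy is to parse a candidate robots.txt with Python's standard-library urllib.robotparser and ask which user agents may fetch a given URL. This is a minimal sketch that assumes a site naming only Googlebot, Bingbot and GPTBot. The robots.txt content, the example.com URL, and the helper names build_parser and check_access are hypothetical. CCBot (Common Crawl's crawler) is included only to show what happens to an agent the file does not name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt implementing "dual control":
# search indexers are explicitly allowed, one AI training crawler is blocked.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return parser


def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user agent, whether robots.txt permits fetching url."""
    parser = build_parser(robots_txt)
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://example.com/articles/some-post"  # placeholder URL
    agents = ["Googlebot", "Bingbot", "GPTBot", "CCBot"]
    for agent, allowed in check_access(ROBOTS_TXT, agents, url).items():
        print(f"{agent:10} {'allowed' if allowed else 'blocked'}")
```

With this file, Googlebot and Bingbot come back allowed and GPTBot comes back blocked. CCBot is also allowed, because it matches no group and the file has no "User-agent: *" fallback, so each training crawler has to be named explicitly (or caught by a wildcard group). Note that urllib.robotparser applies the first matching rule in file order rather than the most-specific-match precedence described in RFC 9309, so treat it as a quick check, not a reproduction of how any search engine parses the file.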