
- ai-catalog.json, llms.txt, and AGENTS.md are not competing standards but a complementary stack for AI discoverability, covering the registry, retrieval, and environment layers respectively.
- AGENTS.md has the highest leverage, demonstrating a 100% "selection flip" in experiments by instructing coding agents which tool to use before their search begins.
- llms.txt is crucial for ensuring agents can accurately read your documentation, while ai-catalog.json makes your tools discoverable to AI agent registries.

Three standards. Three different layers. One ecosystem.
In the past year, three file formats have emerged that companies are being told to publish for "AI discoverability": AGENTS.md, llms.txt, and now the ARD spec's ai-catalog.json. If you've been wondering whether these compete, overlap, or which one you actually need — this is the answer.
Short answer: they serve completely different layers of the agentic discovery stack. You likely need all three. But understanding which layer each serves tells you where to start and why. This is the map.
Before we go deep, here's the full comparison so scanners can orient themselves:
| | ai-catalog.json (ARD) | llms.txt | AGENTS.md |
|---|---|---|---|
| Layer | Registry/catalog | Retrieval/docs | Environment |
| What it does | Makes your tools discoverable to registry crawlers and federated agents | Tells agents where your docs are + includes directives | Instructs agents how to use your product + bans deprecated patterns |
| Who reads it | ARD registries, agent clients querying registries at runtime | Agents fetching your site directly during research | Coding agents reading your repo at session start |
| Where it lives | /.well-known/ai-catalog.json | /llms.txt on your domain | AGENTS.md in your repo root |
| When it fires | Agent queries a registry for capabilities | Agent browses to your site or fetches your docs | Agent starts a coding session in a project |
| Discovery mechanism | Registry crawlers + Agentmap in robots.txt + HTML link tag | Agents know to look for it; indexers like Context7 parse it | Agents (Claude Code, Codex, Cursor) read it at session start |
| Published by | Google/Microsoft/HuggingFace + AI Catalog Working Group | Answer.AI / Jeremy Howard | OpenAI and other coding-agent vendors (open standard) |
| Maturity | v0.9 draft (June 2026) | Informal standard (2024) | Widely adopted (2025) |
| Strongest effect | Registry discoverability | Docs quality + retrieval ranking | Selection flip (100% in our experiments) |
| Covers deprecated APIs? | No | Yes (directive section) | Yes (ALWAYS/NEVER directives) |
The rest of this article explains why each row looks the way it does, and how to prioritize your implementation.
Stage: Find → Research
ai-catalog.json is the newest and most formal of the three standards. Announced on June 17 by Google, Microsoft, and Hugging Face as part of the Agentic Resource Discovery (ARD) specification, it answers a very specific question: "When an AI agent queries a registry looking for a tool, will yours appear?"
You publish a JSON manifest at /.well-known/ai-catalog.json on your domain. This file declares your capabilities — MCP servers, A2A agents, APIs, skills — in a structured format that ARD registries can crawl and index. When an agent sends a natural-language query to a registry ("find me a payment processing tool"), the registry returns your tool if it matches.
The discovery chain has three links: registry crawlers find your ai-catalog.json, index your capabilities, and surface you to agents at runtime.
The most important field in the spec is representativeQueries: 2–5 task-oriented phrases that describe how an agent would naturally search for your tool. Think of it as keyword research for the agentic web. If your representative queries don't match how agents phrase their requests, you won't surface even in a registry you're indexed in.
ai-catalog.json won't help you survive the web search fetch layer: our research shows agents only open ~6% of domains surfaced in web search. It doesn't instruct agents on how to use your API, and it does nothing to prevent agents from using deprecated API patterns pulled from stale training data.
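As a rough sketch of the 2–5 phrase guideline for representativeQueries, the check below validates that one field. The manifest shape is an assumption made for illustration: only representativeQueries comes from the text above, while the name and description keys and the helper check_representative_queries are hypothetical and not taken from the ARD spec.

```python
import json


def check_representative_queries(queries):
    """Return a list of problems with a representativeQueries list (empty if none)."""
    problems = []
    # The article describes 2-5 task-oriented phrases.
    if not 2 <= len(queries) <= 5:
        problems.append(f"expected 2-5 queries, got {len(queries)}")
    for query in queries:
        if not query.strip():
            problems.append("empty query")
    return problems


# Hypothetical manifest: every key except representativeQueries is invented here.
manifest = {
    "name": "example-payments",
    "description": "Payment processing API",
    "representativeQueries": [
        "find me a payment processing tool",
        "charge a customer's card",
        "issue a refund for an order",
    ],
}

problems = check_representative_queries(manifest["representativeQueries"])
print(json.dumps(manifest, indent=2))
print(problems)
```

Running a check like this in CI would catch a catalog that ships with a single query or a blank one; whether the phrases match how agents actually search still has to be judged by hand.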
Stage: Research
llms.txt is a Markdown index file hosted at /llms.txt on your domain. Proposed by Jeremy Howard of Answer.AI in September 2024, it exists to solve a structural problem: LLM context windows are finite, and your full documentation site is not something an agent can efficiently parse by scraping HTML through complex CSS layouts. As one developer put it in a Reddit thread on agent-readiness: "serving markdown mirrors alongside your HTML dramatically reduces the inference cost for agents and prevents the hallucinations that occur when they scrape complex CSS layouts."
llms.txt is the standardized answer to that problem.
When an agent browses to your site during a research task, it looks for /llms.txt to understand what documentation is available. The format is plain Markdown — a short description of your project, followed by a curated list of doc URLs with one-line descriptions. You can also optionally publish /llms-full.txt with the complete content pre-fetched.
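The format described above can be sketched with a small generator. This is a minimal illustration, assuming the common llms.txt layout of a title, a short blockquote summary, and a "Docs" section of described links; the project name, summary, and doc URLs are placeholders, and render_llms_txt is a hypothetical helper.

```python
def render_llms_txt(title, summary, docs):
    """Render a minimal llms.txt: title, short summary, then a described link list."""
    lines = [f"# {title}", "", f"> {summary}", "", "## Docs", ""]
    for name, url, description in docs:
        lines.append(f"- [{name}]({url}): {description}")
    return "\n".join(lines) + "\n"


# Placeholder entries; replace with your own curated doc pages.
docs = [
    ("Quickstart", "https://yourdomain.com/docs/quickstart.md", "Install the SDK and make a first call"),
    ("API reference", "https://yourdomain.com/docs/api.md", "Every endpoint, with parameters"),
]

text = render_llms_txt("Example SDK", "Hypothetical SDK used for illustration.", docs)
print(text)
```

Generating the file from the same source as your docs navigation, rather than editing it by hand, is one way to keep the index from drifting out of date.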
Documentation indexers like Context7 — which sees 1.14M weekly npm installs, making it far more active than alternatives like Smithery's 6.8K "uses" count — parse these files to build retrieval indexes that agents query in real time.
1. Index function: A curated, described table of contents for your docs. Instead of an agent crawling your entire site and guessing which pages matter, it gets a direct map. This is the difference between an agent finding your API reference on the first try versus hallucinating an endpoint from a blog post.
2. Directive function: An optional section for explicit ALWAYS/NEVER instructions. This is where you enforce behavioral guardrails — "always check npm for the latest version, never hardcode version numbers from training data." It's also where you kill deprecated API patterns that agents keep pulling from their stale training priors.
Here's the data point that should make you take llms.txt maintenance seriously: staleness is the strongest correlate of retrieval quality in our research, with a Spearman correlation of ρ = −0.54. The freshest 5 entries in Context7 average a benchmark score of 83.6; the stalest 5 average 72.3. That's an 11.3-point gap, and it tracks recency more closely than any other factor we measured.
If you publish llms.txt and never update it, it doesn't just stop helping; it actively drags your retrieval ranking. This matters especially given that agent training data is approximately 5 months stale as of June 2026. Your docs are the freshness bridge.
llms.txt doesn't make your tool discoverable to agent registries. And it doesn't inject your preferences directly into the agent's working environment at the start of a coding session; that's the next layer's job.
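For readers who want to see what a Spearman correlation measures, here is a small from-scratch sketch. The five data points are made up purely to show the calculation and are not the dataset behind the figures above; only the two averages in the last line (83.6 and 72.3) come from the text.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up numbers for illustration only, not the article's data.
days_since_update = [3, 10, 45, 120, 300]
benchmark_score = [88, 90, 80, 75, 70]

rho = spearman_rho(days_since_update, benchmark_score)
gap = round(83.6 - 72.3, 1)  # difference between the two averages reported above
print(rho, gap)
```

A negative rho means that as one ranking goes up (days since the last update), the other tends to go down (benchmark score). It describes a monotonic association, not a guarantee for any single entry.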
Stage: Shortlist → Act
AGENTS.md is a Markdown file placed at the root of a code repository. It's an open standard that grew out of OpenAI's Codex and other coding-agent vendors rather than a single company, and it was widely adopted across the major coding agent ecosystem in 2025. Variations include CLAUDE.md for Claude Code projects and rule files in .cursor/rules/ for Cursor. Coding agents such as Claude Code, GitHub Copilot, Cursor, and Codex read it, or their own variant of it, at the start of every session, before doing anything else.
Think of it as the product brief every agent reads before touching your code.
When a developer opens a project containing an AGENTS.md, the agent loads it into context at session start. Your product preferences, API version directives, deprecated pattern blocklists, and "use X not Y" instructions are all present before the agent writes a single line of code. This happens before any web search, before any registry query, before any documentation retrieval.
This is the data point that reframes how seriously you should take AGENTS.md. In our experiment, a single one-paragraph AGENTS.md mandating Stack Auth over NextAuth produced a 100% selection flip: NextAuth was chosen 3/3 times without the file and 0/3 times with it. Every agent chose Stack Auth.
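The "selection flip" figure is simple arithmetic, sketched below with the outcomes reported above (three trials per condition). The helper selection_rate and the variable names are illustrative, not part of any published harness.

```python
def selection_rate(choices, tool):
    """Fraction of trials in which the agent picked `tool`."""
    return sum(1 for choice in choices if choice == tool) / len(choices)


# Outcomes as reported in the article: three trials per condition.
baseline = ["NextAuth", "NextAuth", "NextAuth"]  # no AGENTS.md present
with_agents_md = ["Stack Auth", "Stack Auth", "Stack Auth"]  # AGENTS.md mandates Stack Auth

before = selection_rate(baseline, "NextAuth")
after = selection_rate(with_agents_md, "NextAuth")
flip = before - after

print(f"NextAuth: {before:.0%} -> {after:.0%}, flip of {flip:.0%}")
```

The same function works on longer lists, which is useful if you rerun the comparison with more trials or more agents.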
The more telling detail: the agents didn't just comply. They rationalized the choice, generating reasons why Stack Auth was technically superior in this context. The environment layer doesn't just instruct — it reframes cognition.
Bun has operationalized this insight at the scaffolder level. Running bun init automatically writes CLAUDE.md and a Cursor rules file into every new project it creates. The Cursor rules filename isn't subtle: use-bun-instead-of-node-vite-npm-pnpm.mdc.
Every developer who scaffolds a project with Bun gets an agent environment that permanently biases their coding assistant toward Bun. That's not a marketing campaign — that's infrastructure-level product placement.
As detailed in Atlan's guide to AGENTS.md, effective files contain concrete, imperative instructions — not prose. The key sections:
- NEVER use the v1 completions endpoint
- ALWAYS use the streaming SDK, not raw HTTP

If coding agents are ever in the decision loop for your product (choosing between your SDK and a competitor's, writing integration code, scaffolding new projects), AGENTS.md is your highest-leverage starting point. The environment layer fires before everything.
AGENTS.md only fires when a developer already has your code in their project. It provides zero lift for initial discovery, the phase before a user has chosen your product at all. It also doesn't index your public documentation for retrieval.
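One way to keep an AGENTS.md concrete and imperative is to lint it. The sketch below assumes a house convention, not anything required by the AGENTS.md format itself: every bullet line should open with ALWAYS or NEVER. The function lint_directives and the sample file are hypothetical.

```python
def lint_directives(text):
    """Return bullet lines that do not open with ALWAYS or NEVER."""
    flagged = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- "):
            continue  # only bullet lines are treated as directives
        rule = stripped[2:]
        if not rule.startswith(("ALWAYS ", "NEVER ")):
            flagged.append(rule)
    return flagged


# Sample file: the first two directives are the examples from the article,
# the third is a deliberately vague line added to show what gets flagged.
sample = """\
# Agent instructions
- NEVER use the v1 completions endpoint
- ALWAYS use the streaming SDK, not raw HTTP
- We generally prefer the newer client where possible
"""

flagged = lint_directives(sample)
print(flagged)
```

The vague third line is the kind of prose an agent can easily ignore; rewriting it as an ALWAYS or NEVER rule makes the expectation explicit.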
This is the insight that makes prioritization clear.
The agentic discovery pipeline runs in a specific order, and understanding that order changes which file you build first:
- Environment (AGENTS.md): Loaded at session start. Fires first, before everything.
- Retrieval (llms.txt): Agent fetches /llms.txt or queries an index like Context7 to navigate your docs.
- Registry (ai-catalog.json): Agent queries an ARD registry for a specific capability type.

The counterintuitive part: AGENTS.md fires before every other step. It's not in the discovery pipeline; it precedes the discovery pipeline.
That's why the 100% selection flip is possible. When the environment layer is set, the agent enters every downstream step already oriented toward your product. Web search results, retrieval quality, and registry listings all operate inside a frame you've already established.
| Your situation | Start here |
|---|---|
| Developer tool with an SDK or CLI | AGENTS.md — highest leverage, immediate effect on agent decisions |
| MCP server or API targeting enterprise | ai-catalog.json — especially as Google's Agent Registry scales |
| Breaking API changes in last 18 months | llms.txt with directives — bridge the stale training data window |
| All of the above | All three — they don't overlap, they compound |
For a SaaS developer tool, a full agentic discovery implementation covers all three layers plus the two ARD discovery mechanisms:
1. Environment layer
   - AGENTS.md at the root of your public SDK repo or starter template
2. Retrieval layer
   - llms.txt at https://yourdomain.com/llms.txt
   - ALWAYS/NEVER behavioral rules
3. Registry layer
   - ai-catalog.json at https://yourdomain.com/.well-known/ai-catalog.json
   - representativeQueries with 2–5 task-oriented phrases that match how agents search for your capability
4. ARD discovery signal: robots.txt
   - Agentmap directive pointing to your catalog so ARD crawlers can find it without guessing
5. ARD discovery signal: HTML
   - <link rel="ai-catalog" href="/.well-known/ai-catalog.json"> added to the <head> of your site

Five files. Five different jobs. No redundancy. They form a complete stack that covers the full agent discovery lifecycle: being found in registries, being read in retrieval, and being preferred in the environment.
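The two ARD discovery signals can be checked with a short script. This is a sketch under stated assumptions: the link tag is the one quoted in the checklist, but the robots.txt syntax is assumed to mirror the familiar Sitemap line ("Agentmap: <url>"), which should be confirmed against the ARD spec. CatalogLinkFinder and find_agentmap are hypothetical helpers, and the inputs are inline strings rather than live fetches.

```python
from html.parser import HTMLParser


class CatalogLinkFinder(HTMLParser):
    """Collect href values of <link rel="ai-catalog"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "ai-catalog" and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_agentmap(robots_txt):
    """Return Agentmap values, ASSUMING Sitemap-style 'Key: value' lines."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "agentmap":
            found.append(value.strip())
    return found


# Inline samples; the Agentmap line format is an assumption, not spec text.
robots = (
    "User-agent: *\n"
    "Allow: /\n"
    "Agentmap: https://yourdomain.com/.well-known/ai-catalog.json\n"
)
html = '<html><head><link rel="ai-catalog" href="/.well-known/ai-catalog.json"></head></html>'

finder = CatalogLinkFinder()
finder.feed(html)
agentmaps = find_agentmap(robots)

print(finder.hrefs)
print(agentmaps)
```

Both checks read local strings, so they can run in a build step against the generated robots.txt and homepage before deployment.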
The three standards aren't competing — they're a stack. Each one covers a layer the others can't:
- ai-catalog.json. Gets you into registries when agents search for capabilities.
- llms.txt. Makes your docs navigable and your behavioral rules explicit when agents read your site.
- AGENTS.md. Flips agent selection in your favor before any of the above even runs.

The confusion comes from treating these as alternatives. They're not. A company serious about agentic go-to-market needs all three, implemented correctly, kept fresh.
This is also just the foundation. There's an 8-play gap beyond these three files: getting surfaced in agent web search, writing code snippets agents actually use, building scaffolder injection (the Bun strategy), creating agent-first onboarding flows, and running eval harnesses to measure how often agents choose your product unprompted. The full framework maps all 11 plays across four stages — find, research, shortlist, and act. See our agentic discovery playbook.
The agentic web is not a future concern. Agents are already querying registries, fetching docs, and writing integration code. Findability is consistently weak for most websites today — agents can't use what they can't discover. These three files are where you start fixing that.
Implementing this full stack is the foundation for a complete agentic go-to-market strategy. To see how Synscribe can build and manage your agentic discovery presence, explore our GEO platform.
ai-catalog.json, llms.txt, and AGENTS.md operate on three different layers of the agent discovery stack. ai-catalog.json is for the registry layer (getting found), llms.txt is for the retrieval layer (getting read correctly), and AGENTS.md is for the environment layer (getting installed and used properly). They are complementary, not competing.
Yes, for complete agent discoverability, you likely need all three. They serve distinct, non-overlapping functions in the agent workflow. ai-catalog.json handles initial discovery in registries, llms.txt ensures accurate documentation retrieval, and AGENTS.md guides implementation within a coding environment.
Your priority depends on your product. For developer tools with an SDK, start with AGENTS.md for the highest leverage. For enterprise APIs needing runtime discovery, prioritize ai-catalog.json. If you have breaking API changes, start with llms.txt to prevent agents from using deprecated code.
AGENTS.md is loaded into the agent's context at the very start of a coding session. This occurs before any web search, registry query, or doc retrieval. By providing direct instructions and preferences at this initial stage, it effectively reframes the agent's cognition and makes your tool the default choice.
Each file has a specific, standardized location. Place ai-catalog.json at /.well-known/ai-catalog.json on your domain, llms.txt at the root of your domain (/llms.txt), and AGENTS.md in the root directory of your code repository (e.g., on GitHub).
Stale documentation is a primary cause of poor agent performance. Research shows a negative correlation (ρ = −0.54) between documentation staleness and retrieval quality: the staler the docs, the lower the quality. An outdated llms.txt not only stops helping but can actively harm your product's discoverability and lead agents to use deprecated patterns.