ARD vs. llms.txt vs. AGENTS.md: Which Agentic Discovery Standard Do You Actually Need?

Summary

  • ai-catalog.json, llms.txt, and AGENTS.md are not competing standards but a complementary stack for AI discoverability, covering the registry, retrieval, and environment layers respectively.
  • The AGENTS.md file has the highest leverage, demonstrating a 100% "selection flip" in experiments by instructing coding agents which tool to use before their search begins.
  • llms.txt is crucial for ensuring agents can accurately read your documentation, while ai-catalog.json makes your tools discoverable to AI agent registries.
  • Implementing this full stack is the foundation for a complete agentic go-to-market strategy — see Synscribe's Agentic Discovery Playbook for the full 11-play framework.

Three standards. Three different layers. One ecosystem.

In the past year, three file formats have emerged that companies are being told to publish for "AI discoverability": AGENTS.md, llms.txt, and now the ARD spec's ai-catalog.json. If you've been wondering whether these compete, overlap, or which one you actually need — this is the answer.

Short answer: they serve completely different layers of the agentic discovery stack. You likely need all three. But understanding which layer each serves tells you where to start and why. This is the map.

The Quick View: All Three at a Glance

Before we go deep, here's the full comparison at a glance:

| | ai-catalog.json (ARD) | llms.txt | AGENTS.md |
|---|---|---|---|
| Layer | Registry/catalog | Retrieval/docs | Environment |
| What it does | Makes your tools discoverable to registry crawlers and federated agents | Tells agents where your docs are + includes directives | Instructs agents how to use your product + bans deprecated patterns |
| Who reads it | ARD registries, agent clients querying registries at runtime | Agents fetching your site directly during research | Coding agents reading your repo at session start |
| Where it lives | /.well-known/ai-catalog.json | /llms.txt on your domain | AGENTS.md in your repo root |
| When it fires | Agent queries a registry for capabilities | Agent browses to your site or fetches your docs | Agent starts a coding session in a project |
| Discovery mechanism | Registry crawlers + Agentmap in robots.txt + HTML link tag | Agents know to look for it; indexers like Context7 parse it | Agents (Claude Code, Codex, Cursor) read it at session start |
| Published by | Google/Microsoft/Hugging Face + AI Catalog Working Group | Answer.AI / Jeremy Howard | OpenAI and ecosystem partners (open standard) |
| Maturity | v0.9 draft (June 2026) | Informal standard (2024) | Widely adopted (2025) |
| Strongest effect | Registry discoverability | Docs quality + retrieval ranking | Selection flip (100% in our experiments) |
| Covers deprecated APIs? | No | Yes (directive section) | Yes (ALWAYS/NEVER directives) |

The rest of this article explains why each row looks the way it does, and how to prioritize your implementation.

What is ai-catalog.json (ARD)? — The Registry Layer

Stage: Find → Research

ai-catalog.json is the newest and most formal of the three standards. Announced on June 17, 2026 by Google, Microsoft, and Hugging Face as part of the Agentic Resource Discovery (ARD) specification, it answers a very specific question: "When an AI agent queries a registry looking for a tool, will yours appear?"

How It Works

You publish a JSON manifest at /.well-known/ai-catalog.json on your domain. This file declares your capabilities — MCP servers, A2A agents, APIs, skills — in a structured format that ARD registries can crawl and index. When an agent sends a natural-language query to a registry ("find me a payment processing tool"), the registry returns your tool if it matches.

The discovery chain has three links: registry crawlers find your ai-catalog.json, index your capabilities, and surface you to agents at runtime.
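To make the manifest concrete, here is a minimal Python sketch that assembles a catalog entry and writes it to the well-known path. Only the file location and the representativeQueries field come from this article; every other key (name, description, capabilities, type, endpoint) and the ExamplePay values are hypothetical placeholders, so check the ARD specification for the real data model before publishing.

```python
import json
from pathlib import Path

# Sketch only: apart from `representativeQueries`, every key here is a
# hypothetical placeholder, not the ARD spec's actual schema.
def build_catalog(name: str, description: str, endpoint: str,
                  queries: list[str]) -> dict:
    """Assemble an illustrative ai-catalog.json payload for one capability."""
    return {
        "name": name,                              # placeholder key
        "description": description,                # placeholder key
        "capabilities": [                          # placeholder key
            {
                "type": "mcp-server",              # placeholder key and value
                "endpoint": endpoint,              # placeholder key
                "representativeQueries": queries,  # field named in the article
            }
        ],
    }


def write_catalog(site_root: Path, catalog: dict) -> Path:
    """Write the manifest to <site_root>/.well-known/ai-catalog.json."""
    target = site_root / ".well-known" / "ai-catalog.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(catalog, indent=2), encoding="utf-8")
    return target


catalog = build_catalog(
    name="ExamplePay",                             # hypothetical product
    description="Card payments and refunds API",
    endpoint="https://example.com/mcp",            # hypothetical endpoint
    queries=["process a card payment", "issue a refund to a customer"],
)
```

Serving the written file from your web root is what makes it reachable at /.well-known/ai-catalog.json for registry crawlers.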

The Critical Field: representativeQueries

The most important field in the spec is representativeQueries — 2–5 task-oriented phrases that describe how an agent would naturally search for your tool. Think of it as keyword research for the agentic web. If your representative queries don't match how agents phrase their requests, you won't surface even in a registry you're indexed in.
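A quick way to sanity-check your phrasing is to compare representative queries against requests an agent might plausibly send. The sketch below assumes a simple word-overlap (Jaccard) score purely for illustration; real ARD registries may rank with embeddings or other methods, and the example phrases are hypothetical. It also enforces the 2–5 query range described above.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def validate_queries(queries: list[str]) -> list[str]:
    """Flag a representativeQueries list outside 2-5 phrases or with one-word entries."""
    problems = []
    if not 2 <= len(queries) <= 5:
        problems.append(f"expected 2-5 queries, got {len(queries)}")
    for q in queries:
        if len(tokens(q)) < 2:
            problems.append(f"too short to be task-oriented: {q!r}")
    return problems


def best_match(agent_request: str, queries: list[str]) -> float:
    """Highest Jaccard word overlap between a request and any representative query.

    Toy scoring for illustration; real registries may match very differently.
    """
    req = tokens(agent_request)
    best = 0.0
    for q in queries:
        tq = tokens(q)
        union = req | tq
        if union:
            best = max(best, len(req & tq) / len(union))
    return best
```

Task-oriented phrases ("process a card payment") share vocabulary with how agents ask for help; brand-style phrases ("modern money infrastructure") usually share none.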

When To Prioritize ai-catalog.json

  • You have an MCP server, A2A agent, or developer API that needs runtime discoverability
  • You're selling into enterprise environments where agents query managed registries rather than the open web
  • You want to be positioned early as Google's Gemini Enterprise Agent Platform and its integrated Agent Registry come online

What ai-catalog.json Doesn't Do

It won't help you survive the web search and fetch layer, where our research shows agents open only ~6% of the domains surfaced in search results. It doesn't instruct agents on how to use your API. And it does nothing to prevent agents from using deprecated API patterns pulled from stale training data.

What is llms.txt? — The Retrieval Layer

Stage: Research

llms.txt is a Markdown index file hosted at /llms.txt on your domain. Proposed by Jeremy Howard of Answer.AI in September 2024, it exists to solve a structural problem: LLM context windows are finite, and your full documentation site is not something an agent can efficiently parse by scraping HTML through complex CSS layouts. As one developer put it in a Reddit thread on agent-readiness: "serving markdown mirrors alongside your HTML dramatically reduces the inference cost for agents and prevents the hallucinations that occur when they scrape complex CSS layouts."

llms.txt is the standardized answer to that problem.

How It Works

When an agent browses to your site during a research task, it looks for /llms.txt to understand what documentation is available. The format is plain Markdown: a short description of your project, followed by a curated list of doc URLs with one-line descriptions. You can optionally also publish /llms-full.txt, which inlines the complete documentation content in a single file.

Documentation indexers like Context7 parse these files to build retrieval indexes that agents query in real time. Context7 sees 1.14M weekly npm installs, far more activity than alternatives such as Smithery, which shows a 6.8K "uses" count (a different metric, but a wide gap).
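The format is simple enough to generate from a docs inventory. This sketch follows the structure of the llms.txt proposal (an H1 project name, a blockquote summary, then H2 sections of described links); the project name, URLs, and section names are placeholders.

```python
def render_llms_txt(project: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render llms.txt: H1 title, blockquote summary, then H2 sections of described links.

    `sections` maps a heading to (title, url, one-line description) tuples.
    """
    lines = [f"# {project}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical inventory for an example product.
docs = {
    "Docs": [
        ("Quickstart", "https://example.com/docs/quickstart.md", "Zero to first charge"),
        ("API reference", "https://example.com/docs/api.md", "Every endpoint and parameter"),
    ],
}
print(render_llms_txt("ExamplePay", "Payments API for developers.", docs))
```

Keeping the one-line descriptions specific ("Every endpoint and parameter") is what lets an agent pick the right page on the first fetch instead of guessing.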

Two Functions in One File

1. Index function: A curated, described table of contents for your docs. Instead of an agent crawling your entire site and guessing which pages matter, it gets a direct map. This is the difference between an agent finding your API reference on the first try versus hallucinating an endpoint from a blog post.

2. Directive function: An optional section for explicit ALWAYS/NEVER instructions. This is where you enforce behavioral guardrails — "always check npm for the latest version, never hardcode version numbers from training data." It's also where you kill deprecated API patterns that agents keep pulling from their stale training priors.
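The llms.txt proposal itself does not define a directive section, so the heading name and the ALWAYS/NEVER prefixes below are an assumed convention. This sketch renders such a block for appending to the end of the file, using the example rules from the text.

```python
def render_directives(always: list[str], never: list[str],
                      heading: str = "Directives") -> str:
    """Render an ALWAYS/NEVER block to append to llms.txt.

    The heading name and rule prefixes are an assumed convention,
    not part of the llms.txt proposal.
    """
    lines = [f"## {heading}", ""]
    lines += [f"- ALWAYS {rule}" for rule in always]
    lines += [f"- NEVER {rule}" for rule in never]
    return "\n".join(lines) + "\n"


print(render_directives(
    always=["check npm for the latest version"],
    never=["hardcode version numbers from training data"],
))
```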

The Freshness Problem

Here's the data point that should make you take llms.txt maintenance seriously: staleness is the strongest correlate of retrieval quality in our research, with a Spearman correlation of ρ = −0.54. The freshest 5 entries in Context7 average a benchmark score of 83.6; the stalest 5 average 72.3. That's an 11.3-point gap between the freshest and stalest entries.

If you publish llms.txt and never update it, it doesn't just stop helping; it actively drags down your retrieval ranking. This matters especially because agent training data is roughly 5 months stale as of June 2026. Your docs are the freshness bridge.
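For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation computed on ranks rather than raw values, so it captures any monotonic relationship. A minimal pure-Python version, with tied values given their average rank, might look like this; the inputs in the usage line are toy values, not the study's data.

```python
def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy data: days since last update vs. a benchmark score.
print(spearman([5, 40, 90, 200], [85.0, 80.0, 76.0, 70.0]))
```

Feeding it each entry's staleness as x and its benchmark score as y yields a negative ρ when staler entries tend to score lower; −1 would mean a perfectly monotonic decline.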

When To Prioritize llms.txt

  • Any developer tool, API, or platform where agents need to read your docs as part of their workflow
  • If you've had breaking API changes in the last 18 months — use the directive section to explicitly kill deprecated patterns before agents use them in production code

What llms.txt Doesn't Do

It doesn't make your tool discoverable to agent registries. And it doesn't inject your preferences directly into the agent's working environment at the start of a coding session — that's the next layer's job.

What is AGENTS.md? — The Environment Layer

Stage: Shortlist → Act

AGENTS.md is a Markdown file placed at the root of a code repository. It's an open standard that originated with OpenAI and ecosystem partners, and it was widely adopted across the major coding agent ecosystem in 2025. Variations include CLAUDE.md for Claude Code projects and rule files in .cursor/rules/ for Cursor. Major coding agents, including Claude Code, GitHub Copilot, Cursor, and Codex, read this file or their own variant of it at the start of every session, before doing anything else.

Think of it as the product brief every agent reads before touching your code.

How It Works

When a developer opens a project containing an AGENTS.md, the agent loads it into context at session start. Your product preferences, API version directives, deprecated pattern blocklists, and "use X not Y" instructions are all present before the agent writes a single line of code. This happens before any web search, before any registry query, before any documentation retrieval.

The 100% Selection Flip (Experiment E1)

This is the data point that reframes how seriously you should take AGENTS.md. In our experiment, a single one-paragraph AGENTS.md mandating Stack Auth over NextAuth produced a 100% selection flip: the agent chose NextAuth in 3 of 3 baseline runs and in 0 of 3 runs with the file in place. Every agent chose Stack Auth.

The more telling detail: the agents didn't just comply. They rationalized the choice, generating reasons why Stack Auth was technically superior in this context. The environment layer doesn't just instruct — it reframes cognition.
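To make the metric explicit, here is one way to compute it, assuming "selection flip" means the percentage-point change in how often agents pick the mandated product between the baseline and the AGENTS.md condition. The run outcomes encoded below are the ones reported above (3 runs per condition); the function names are illustrative.

```python
from collections import Counter

def selection_share(choices: list[str], target: str) -> float:
    """Percentage of runs in which the agent picked `target`."""
    return 100.0 * Counter(choices)[target] / len(choices)


def selection_flip(baseline: list[str], treatment: list[str], target: str) -> float:
    """Change in the target's share, in percentage points, from baseline to treatment."""
    return selection_share(treatment, target) - selection_share(baseline, target)


# Outcomes reported for Experiment E1: three runs per condition.
baseline = ["NextAuth"] * 3     # no AGENTS.md
treatment = ["Stack Auth"] * 3  # one-paragraph AGENTS.md mandating Stack Auth

print(selection_flip(baseline, treatment, "Stack Auth"))
```

The same two functions scale to a larger eval harness: log which product the agent picks in each run, then compare shares across conditions.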

The Bun Strategy (A Real-World Example)

Bun has operationalized this insight at the scaffolder level. Running bun init automatically writes CLAUDE.md and a Cursor rules file into every new project it creates. The Cursor rules filename isn't subtle: use-bun-instead-of-node-vite-npm-pnpm.mdc.
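A scaffolder can apply the same idea to your own product. The sketch below is not Bun's implementation: the tool name yourtool, the rival othertool, and the rule wording are hypothetical. It writes AGENTS.md and CLAUDE.md plus a Cursor project rule whose YAML front matter sets alwaysApply: true, so Cursor always includes the rule in the model's context.

```python
from pathlib import Path

def inject_agent_files(project_dir: Path, tool: str = "yourtool",
                       rival: str = "othertool") -> list[Path]:
    """Write agent-environment files into a freshly scaffolded project (hypothetical names)."""
    rule = f"ALWAYS use {tool} instead of {rival} for installs, scripts, and tests."
    written = []

    # Plain Markdown instructions that coding agents load at session start.
    for name in ("AGENTS.md", "CLAUDE.md"):
        path = project_dir / name
        path.write_text(f"# Project conventions\n\n- {rule}\n", encoding="utf-8")
        written.append(path)

    # Cursor project rule: YAML front matter, then the rule body.
    rules_dir = project_dir / ".cursor" / "rules"
    rules_dir.mkdir(parents=True, exist_ok=True)
    mdc = rules_dir / f"use-{tool}-instead-of-{rival}.mdc"
    mdc.write_text(
        "---\n"
        f"description: Prefer {tool} over {rival} in this project\n"
        "globs:\n"
        "alwaysApply: true\n"
        "---\n\n"
        f"{rule}\n",
        encoding="utf-8",
    )
    written.append(mdc)
    return written
```

Calling this from your own init command, right after the project skeleton is created, is what turns a one-time scaffold into a persistent agent preference.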

Every developer who scaffolds a project with Bun gets an agent environment that permanently biases their coding assistant toward Bun. That's not a marketing campaign — that's infrastructure-level product placement.

What Goes in AGENTS.md

As detailed in Atlan's guide to AGENTS.md, effective files contain concrete, imperative instructions — not prose. The key sections:

  • Deprecated API blocklists (NEVER use the v1 completions endpoint)
  • Preferred patterns (ALWAYS use the streaming SDK, not raw HTTP)
  • Task-specific instructions (setup steps, testing conventions)
  • Product quickstart — the fastest path from zero to working code
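Because AGENTS.md is free-form Markdown with no required sections, the section headings in this sketch are an assumption that mirrors the four bullets above. It renders the blocklist and preferred patterns as NEVER/ALWAYS imperatives; ExamplePay and the sample rules are placeholders.

```python
def render_agents_md(product: str, never: list[str], always: list[str],
                     tasks: list[str], quickstart: list[str]) -> str:
    """Render an AGENTS.md with four imperative sections (headings are assumed)."""
    out = [f"# Working with {product}", ""]
    sections = [
        ("Deprecated APIs", [f"NEVER {rule}" for rule in never]),
        ("Preferred patterns", [f"ALWAYS {rule}" for rule in always]),
        ("Tasks", tasks),
        ("Quickstart", quickstart),
    ]
    for title, items in sections:
        out += [f"## {title}", ""]
        out += [f"- {item}" for item in items]
        out.append("")
    return "\n".join(out)


print(render_agents_md(
    "ExamplePay",
    never=["use the v1 completions endpoint"],
    always=["use the streaming SDK, not raw HTTP"],
    tasks=["Run the test suite before committing"],
    quickstart=["Install the SDK, set the API key, make one test charge"],
))
```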

When To Prioritize AGENTS.md

If coding agents are ever in the decision loop for your product — choosing between your SDK and a competitor's, writing integration code, scaffolding new projects — AGENTS.md is your highest-leverage starting point. The environment layer fires before everything.

What AGENTS.md Doesn't Do

It only fires when a developer already has your code in their project. It provides zero lift for initial discovery — the phase before a user has chosen your product at all. It also doesn't index your public documentation for retrieval.

The Sequence: When Each Layer Actually Fires

This is the insight that makes prioritization clear.

The agentic discovery pipeline runs in a specific order, and understanding that order changes which file you build first:

  1. Environment layer (AGENTS.md) — Loaded at session start. Fires first, before everything.
  2. Training prior — The agent's baseline knowledge, roughly 5 months stale as of June 2026.
  3. Web search + fetch — Agent runs queries, but only opens ~6% of surfaced domains and discards ~48% of verified claims it encounters (Synscribe research).
  4. Retrieval layer (llms.txt) — Agent fetches /llms.txt or queries an index like Context7 to navigate your docs.
  5. Registry layer (ai-catalog.json) — Agent queries an ARD registry for a specific capability type.

The counterintuitive part: AGENTS.md fires before steps 2–5. It's not in the discovery pipeline — it precedes the discovery pipeline.

That's why the 100% selection flip is possible. When the environment layer is set, the agent enters every downstream step already oriented toward your product. Web search results, retrieval quality, and registry listings all operate inside a frame you've already established.

Priority Recommendation by Situation

| Your situation | Start here |
|---|---|
| Developer tool with an SDK or CLI | AGENTS.md: highest leverage, immediate effect on agent decisions |
| MCP server or API targeting enterprise | ai-catalog.json, especially as Google's Agent Registry scales |
| Breaking API changes in last 18 months | llms.txt with directives, to bridge the stale training data window |
| All of the above | All three: they don't overlap, they compound |

The Full Stack: What a Complete Implementation Looks Like

For a SaaS developer tool, a full agentic discovery implementation covers all three layers plus the two ARD discovery mechanisms:

1. Environment layer

  • AGENTS.md at the root of your public SDK repo or starter template
  • Include deprecated API blocklists, preferred patterns, and quickstart instructions in imperative form

2. Retrieval layer

  • llms.txt at https://yourdomain.com/llms.txt
  • Curated list of your key documentation pages with one-line descriptions
  • Directive section with ALWAYS/NEVER behavioral rules
  • Update it on every major release: staleness correlates with retrieval quality at ρ = −0.54
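One way to keep that promise is a release-time check. This sketch flags llms.txt when it was last edited before the latest major release or has gone longer than a maximum age; the 90-day default is a hypothetical threshold, not a figure from our research, and where the dates come from (git history, a CMS, file metadata) is up to your pipeline.

```python
from datetime import date

def staleness_reasons(last_updated: date, last_major_release: date, today: date,
                      max_age_days: int = 90) -> list[str]:
    """Return reasons llms.txt looks stale; an empty list means it passes."""
    reasons = []
    if last_updated < last_major_release:
        reasons.append("llms.txt predates the latest major release")
    age = (today - last_updated).days
    if age > max_age_days:  # hypothetical threshold
        reasons.append(f"llms.txt is {age} days old (limit {max_age_days})")
    return reasons


print(staleness_reasons(date(2026, 1, 1), date(2026, 5, 1), date(2026, 6, 18)))
```

A CI job can fail the release whenever the returned list is non-empty, which makes the update part of shipping rather than a separate chore.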

3. Registry layer

  • ai-catalog.json at https://yourdomain.com/.well-known/ai-catalog.json
  • Fill representativeQueries with 2–5 task-oriented phrases that match how agents search for your capability
  • Reference the ARD specification for the full data model

4. ARD discovery signal — robots.txt

  • Add an Agentmap directive pointing to your catalog so ARD crawlers can find it without guessing
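The exact Agentmap syntax should come from the ARD specification; this sketch assumes it mirrors the Sitemap directive, with a single "Agentmap: <absolute URL>" line, and appends that line to existing robots.txt content only if it is missing.

```python
def add_agentmap(robots_txt: str, catalog_url: str) -> str:
    """Append an Agentmap line to robots.txt content unless already present.

    Assumes a Sitemap-style `Agentmap: <absolute URL>` form; confirm
    the exact syntax against the ARD specification.
    """
    line = f"Agentmap: {catalog_url}"
    existing = [ln.strip() for ln in robots_txt.splitlines()]
    if line in existing:
        return robots_txt  # idempotent: safe to run on every deploy
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + line + "\n"


robots = "User-agent: *\nAllow: /\n"
print(add_agentmap(robots, "https://yourdomain.com/.well-known/ai-catalog.json"))
```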

5. ARD discovery signal — HTML

  • Add <link rel="ai-catalog" href="/.well-known/ai-catalog.json"> to the <head> of your site
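To confirm the tag actually ships, you can parse a rendered page with Python's standard-library html.parser and collect any link elements whose rel includes ai-catalog. This is a minimal checker; it does not verify that the tag sits inside <head>.

```python
from html.parser import HTMLParser
from typing import Optional

class CatalogLinkFinder(HTMLParser):
    """Collect href values of <link rel="ai-catalog"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()  # rel may hold several tokens
        href = attr.get("href")
        if "ai-catalog" in rels and href:
            self.hrefs.append(href)


def find_catalog_links(html: str) -> list[str]:
    finder = CatalogLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs
```

Run it against the HTML your server actually returns (not your template source) so that build steps which strip or rewrite head tags are caught.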

Three files and two discovery signals. Five different jobs. No redundancy. They form a complete stack that covers the full agent discovery lifecycle: being found in registries, being read in retrieval, and being preferred in the agent's environment.

Build Your Full Agentic Discovery Stack

The three standards aren't competing — they're a stack. Each one covers a layer the others can't:

  • ai-catalog.json. Gets you into registries when agents search for capabilities.
  • llms.txt. Makes your docs navigable and your behavioral rules explicit when agents read your site.
  • AGENTS.md. Flips agent selection in your favor before any of the above even runs.

The confusion comes from treating these as alternatives. They're not. A company serious about agentic go-to-market needs all three, implemented correctly, kept fresh.

This is also just the foundation. Beyond these three files sit eight more plays, including getting surfaced in agent web search, writing code snippets agents actually use, building scaffolder injection (the Bun strategy), creating agent-first onboarding flows, and running eval harnesses to measure how often agents choose your product unprompted. The full framework maps all 11 plays across four stages: find, research, shortlist, and act. See our agentic discovery playbook.

The agentic web is not a future concern. Agents are already querying registries, fetching docs, and writing integration code. Findability is consistently weak for most websites today — agents can't use what they can't discover. These three files are where you start fixing that.

Implementing this full stack is the foundation for a complete agentic go-to-market strategy. To see how Synscribe can build and manage your agentic discovery presence, explore our GEO platform.

Frequently Asked Questions

What is the main difference between ai-catalog.json, llms.txt, and AGENTS.md?

They operate on three different layers of the agent discovery stack. ai-catalog.json is for the registry layer (getting found), llms.txt is for the retrieval layer (getting read correctly), and AGENTS.md is for the environment layer (getting installed and used properly). They are complementary, not competing.

Do I really need all three files for my product?

Yes, for complete agent discoverability, you likely need all three. They serve distinct, non-overlapping functions in the agent workflow. ai-catalog.json handles initial discovery in registries, llms.txt ensures accurate documentation retrieval, and AGENTS.md guides implementation within a coding environment.

Which of the three agent discovery files should I prioritize creating first?

Your priority depends on your product. For developer tools with an SDK, start with AGENTS.md for the highest leverage. For enterprise APIs needing runtime discovery, prioritize ai-catalog.json. If you have breaking API changes, start with llms.txt to prevent agents from using deprecated code.

How does AGENTS.md work to change an AI agent's choice of tool?

AGENTS.md is loaded into the agent's context at the very start of a coding session. This occurs before any web search, registry query, or doc retrieval. By providing direct instructions and preferences at this initial stage, it effectively reframes the agent's cognition and makes your tool the default choice.

Where are ai-catalog.json, llms.txt, and AGENTS.md supposed to be located?

Each file has a specific, standardized location. Place ai-catalog.json at /.well-known/ai-catalog.json on your domain, llms.txt at the root of your domain (/llms.txt), and AGENTS.md in the root directory of your code repository (e.g., on GitHub).

Why is it so important to keep llms.txt updated?

Stale documentation is a major factor in poor agent performance. Research shows a negative correlation (ρ = −0.54) between documentation staleness and retrieval quality: the staler an entry, the lower its retrieval score tends to be. An outdated llms.txt not only stops helping but can actively harm your product's discoverability and lead agents to use deprecated patterns.

Published on June 18, 2026
