> ## Documentation Index
> Fetch the complete guide index at: https://www.synscribe.com/agentic-discovery/llms.txt
> Use this file to discover all pages before exploring further.

---
title: "Serve Docs as Markdown: .md Mirrors & Content Negotiation"
description: "How to serve docs as markdown for AI agents: .md mirrors, content negotiation, frontmatter, agent banners, and a 9-probe CI suite."
slug: /agentic-discovery/markdown-docs-for-ai-agents
series: The Agentic Discovery Playbook — Play 6 of 11 · GET READ
last_verified: 2026-06-11
---

# Serve Your Docs as Markdown: .md Mirrors, Content Negotiation, and Agent Banners

> **In short:** Every documentation URL should serve clean markdown to agents — append `.md` to any canonical URL and return `text/markdown`, and serve markdown on the canonical URL itself to non-browser clients (content negotiation). Stripe, Supabase, Next.js, and Prisma all converged on this; a 9-probe CI suite keeps it from silently breaking.

## Do this now

- [ ] Make `.md` work on every canonical docs URL: HTTP 200, `Content-Type: text/markdown` (or `text/plain`), non-empty body.
- [ ] Mirror every HTML redirect into `.md` space — if `/docs/old` → `/docs/new`, then `/docs/old.md` → `/docs/new.md` (Supabase does; most hand-rolled setups forget).
- [ ] Add content negotiation: non-browser user agents get markdown at the canonical URL itself (the Next.js gold standard), with `Vary: Accept, User-Agent` set.
- [ ] Put YAML frontmatter on every `.md` page: `url`, `docs_index`, `version`, `lastUpdated`, `prerequisites`.
- [ ] Add the agent discovery banner pointing to llms.txt at the top of every `.md` page.
- [ ] Ship a machine-readable `changelog.md` at the site root.
- [ ] Add "Copy as Markdown" / "Ask ChatGPT" / "Ask Claude" buttons to HTML pages (the Supabase pattern).
- [ ] Wire the 9-probe CI suite into every docs deploy; any failure blocks release.

*Scope: [Play 5](/agentic-discovery/llms-txt) built the index file; this page makes every page that index links to agent-readable. What goes inside the code blocks on those pages is [Play 7](/agentic-discovery/code-snippets-for-ai-agents).*

## Why serve markdown when agents can parse HTML?

Because a meaningful share of agent traffic never runs an HTML parser, and the rest prefer not to. As of 2026-06-11, Context7 fetch traffic includes Python HTTPX at 1.8% and Go HTTP clients at 0.8% — roughly 2.6% of queries are raw scripts fetching docs directly. The major agents (Claude Code 43.4%, Opencode 15.3%, Codex 14.0%, Cursor 6.5%) handle HTML but strongly favor markdown: it's denser per token, with no nav chrome or cookie-banner noise.

Two more reasons from the data. First, clean markdown is what indexes parse well — Convex's docs-site Context7 entry benchmarks 91.6 versus 79.9 for its repo entry, because sites are pre-filtered for user-facing content. Second, mirrors generated by your docs build inherit freshness automatically, and freshness is the strongest benchmark correlate we measured (Spearman −0.54 across n=17 entries; freshest-5 average 83.6 vs. stalest-5 at 72.3). A separately maintained export rots; a build-generated mirror can't.

The mirror is also the delivery vehicle for fixing stale agent output. In our pilot trials (single model, n=2 per arm), control agents emitted obsolete Tailwind v3 config in 2/2 runs; with current doc text in context, 0/2 did. A fetchable `.md` page with a `lastUpdated` stamp is how that current text reaches the agent.

**Who needs this play:** every docs site. On a Mintlify-class platform it's a settings check (half a day); hand-rolled, it's 2–5 days plus a day for the CI suite.

## What is the .md suffix convention?

Append `.md` to any canonical docs URL and get the same page as raw markdown. The leaders converged on it independently — convergent evolution across competitors is about the strongest "this is table stakes" signal there is. Verified 2026-06-11:

| Product | What we observed |
|---|---|
| **Stripe** | `docs.stripe.com/mcp.md` returns `text/plain` full markdown; all 472 llms.txt links use `.md` |
| **Supabase** | `/docs/guides/auth.md` → `text/markdown`; redirects preserved in `.md` space |
| **Next.js** | `installation.md` serves `text/markdown` with YAML frontmatter and footer pointers to `/docs/sitemap.md`; its llms.txt states "URL Format: Documentation URLs support the .md extension" |
| **Prisma** | `.md` URLs redirect to the current versioned path (`/docs/orm/v6/...`) — old links agents memorized keep resolving |
| **Clerk** | `quickstarts/nextjs.md` → 200 `text/markdown`, written as agent instructions |
| **Mintlify fleet** (Polar, Bun, Crossmint, x402, DodoPayments) | `.md` mirrors + agent banner as platform defaults |

The redirect detail matters more than it looks: docs reorgs are common, agents and indexes memorize deep links, and a `.md` mirror without mirrored redirects 404s its entire history after the first reorg. Prisma's versioned redirects and Supabase's preserved `.md` redirects are the patterns to copy.
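One way to keep the two redirect spaces from drifting is to derive the `.md` map from the HTML map at build time instead of maintaining both by hand. The following is a minimal sketch, assuming redirects are stored as a plain path-to-path object; `mirrorRedirects`, `stripTrailingSlash`, and the `RedirectMap` shape are hypothetical names for illustration, not taken from any of the sites above.

```typescript
// Hypothetical shape: source path -> destination path, both without a suffix.
type RedirectMap = Record<string, string>;

// Normalize "/docs/old/" to "/docs/old" so the suffix lands in the right place.
function stripTrailingSlash(path: string): string {
  return path.length > 1 ? path.replace(/\/$/, "") : path;
}

// Derive the `.md` redirect map from the HTML one.
function mirrorRedirects(htmlRedirects: RedirectMap): RedirectMap {
  const mirrored: RedirectMap = {};
  for (const from of Object.keys(htmlRedirects)) {
    const to = htmlRedirects[from];
    mirrored[stripTrailingSlash(from) + ".md"] = stripTrailingSlash(to) + ".md";
  }
  return mirrored;
}

// { "/docs/old": "/docs/new" } becomes { "/docs/old.md": "/docs/new.md" }
const mdRedirects = mirrorRedirects({ "/docs/old": "/docs/new" });
```

Because the `.md` map is computed, a docs reorg that adds an HTML redirect adds its markdown twin in the same build.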

## What is content negotiation, and why is Next.js the gold standard?

Content negotiation means the *canonical* URL — no suffix — serves markdown when the requester isn't a browser. Next.js is the verified gold standard here: during our audit it served markdown to our non-browser fetcher on canonical URLs, no `.md` suffix needed. The `.md` convention requires the agent to guess; negotiation catches the agents that never guessed.

Implementation lives in middleware: if the `Accept` header lacks `text/html`, or the user agent matches a known agent/CLI list (curl, HTTPX, Go-http-client, claude, etc.), serve the markdown rendering at the HTML URL. Two rules:

1. Always send `Vary: Accept, User-Agent` — without it your CDN caches the markdown variant and serves it to browsers (or vice versa).
2. Keep the `.md` suffix working anyway, for explicit requests and for llms.txt links.
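The negotiation decision described above can be sketched as a pure function that middleware calls before choosing a renderer. This is an illustration under assumptions: `wantsMarkdown`, `negotiatedHeaders`, and the `AGENT_UA` pattern are hypothetical names, and the user-agent list is only a starting point to extend from your own access logs.

```typescript
// Hypothetical agent/CLI signatures, matched case-insensitively.
const AGENT_UA = /curl|wget|python-httpx|go-http-client|claude/i;

// Markdown is served when the Accept header lacks text/html,
// or when the user agent looks like a known agent or CLI.
function wantsMarkdown(accept: string | null, userAgent: string | null): boolean {
  const lacksHtml = (accept || "").toLowerCase().indexOf("text/html") === -1;
  return lacksHtml || AGENT_UA.test(userAgent || "");
}

// Headers for either variant. Vary tells the CDN to cache the two
// representations separately instead of serving one to everybody.
function negotiatedHeaders(markdown: boolean): Record<string, string> {
  return {
    "Content-Type": markdown
      ? "text/markdown; charset=utf-8"
      : "text/html; charset=utf-8",
    "Vary": "Accept, User-Agent",
  };
}
```

Keeping the decision separate from the response makes it easy to unit-test against real header samples, which is also what the negotiation probe in CI exercises end to end.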

A sketch of the hand-rolled `.md` route (Next.js):

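```ts
// Hypothetical helpers from your docs build; the module path is an assumption.
import { redirectMap, getDoc, frontmatter, banner, renderToMarkdown } from "@/lib/docs";

// params is a Promise on Next.js 15+ and a plain object before; await covers both.
type Ctx = { params: Promise<{ slug: string[] }> | { slug: string[] } };

export async function GET(req: Request, { params }: Ctx) {
  const slug = (await params).slug.join("/");
  // Route only .md requests here (e.g. via a rewrite); a handler has no next() to fall through to.
  if (!slug.endsWith(".md")) return new Response("Not found", { status: 404 });
  const path = slug.slice(0, -3);
  const redirect = redirectMap[path];               // one map, two route handlers
  // Response.redirect needs an absolute URL, so resolve against the request URL.
  if (redirect) return Response.redirect(new URL(`/docs/${redirect}.md`, req.url), 308);
  const doc = await getDoc(path);
  if (!doc) return new Response("Not found", { status: 404 });
  return new Response(frontmatter(doc) + banner() + renderToMarkdown(doc), {
    headers: { "Content-Type": "text/markdown; charset=utf-8" },
  });
}
```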

The critical function is `renderToMarkdown`: it must execute or strip MDX so no `import X from '@mdx/...'` survives into agent context (the verified Drizzle flaw — their 94KB export leaks Astro build imports).
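To make the "strip" half of that requirement concrete, here is a deliberately small sketch of a line-based MDX stripper. It is not how any of the audited sites do it; `stripMdx` is a hypothetical helper, and it only handles single-line imports and simple component tags. Multi-line JSX, `export` statements, and inline expressions would need a real MDX parser.

```typescript
// Strip MDX-only syntax from a page so only plain markdown remains.
function stripMdx(source: string): string {
  const out: string[] = [];
  let inFence = false;
  for (const line of source.split("\n")) {
    // A line opening or closing a fenced code block toggles the state.
    const isFence = /^\s*(`{3}|~{3})/.test(line);
    if (isFence) inFence = !inFence;
    if (isFence || inFence) {
      out.push(line); // code samples pass through untouched
      continue;
    }
    // Drop build-time imports that sit outside code samples.
    if (/^import\s.+\sfrom\s/.test(line)) continue;
    // Remove capitalized component tags but keep the text they wrap.
    out.push(line.replace(/<\/?[A-Z][A-Za-z0-9.]*(\s[^<>]*)?\/?>/g, ""));
  }
  return out.join("\n");
}
```

The fence tracking matters: without it, a legitimate `import` inside a code sample would be deleted along with the build imports.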

## What frontmatter do agents need?

The Next.js/Vercel union schema, on every `.md` page:

```yaml
---
url: https://paykit.dev/docs/webhooks        # canonical HTML URL
docs_index: https://paykit.dev/docs/llms.txt # discovery pointer
version: 3.4.1
lastUpdated: 2026-06-11
type: how-to            # Vercel: tutorial | how-to | reference
prerequisites:
  - /docs/installation
related:
  - /docs/api/errors
---
```

`url` lets an agent cite the human page; `docs_index` makes every page a discovery entry point; `version` + `lastUpdated` let the agent judge staleness instead of guessing; `prerequisites` and `related` give it the next fetch without a search round-trip.
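Generating this block from build metadata, instead of writing it by hand, is what keeps `version` and `lastUpdated` honest. A minimal sketch follows, assuming a hypothetical `DocMeta` record produced by the docs build; `buildFrontmatter` and `yamlList` are illustrative names, and no YAML quoting is attempted because the values are plain URLs, paths, versions, and dates.

```typescript
// Hypothetical page record produced by the docs build.
interface DocMeta {
  url: string;                 // canonical HTML URL
  docsIndex: string;           // llms.txt location
  version: string;
  lastUpdated: string;         // ISO date, e.g. "2026-06-11"
  type?: "tutorial" | "how-to" | "reference";
  prerequisites?: string[];
  related?: string[];
}

// Render an optional list field; an empty or missing list emits nothing.
function yamlList(key: string, items?: string[]): string[] {
  if (!items || items.length === 0) return [];
  return [`${key}:`].concat(items.map((item) => `  - ${item}`));
}

// Emit the frontmatter block, ending with a newline after the closing fence.
function buildFrontmatter(meta: DocMeta): string {
  const lines = [
    "---",
    `url: ${meta.url}`,
    `docs_index: ${meta.docsIndex}`,
    `version: ${meta.version}`,
    `lastUpdated: ${meta.lastUpdated}`,
  ];
  if (meta.type) lines.push(`type: ${meta.type}`);
  return lines
    .concat(yamlList("prerequisites", meta.prerequisites))
    .concat(yamlList("related", meta.related), ["---", ""])
    .join("\n");
}
```

If any value could contain a colon followed by a space or other YAML-significant characters, swap the string templates for a proper YAML serializer.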

## What is the agent banner, and what else goes on the page?

The banner is a one-line discovery pointer at the top of every `.md` page. Mintlify generates it across its fleet — observed verbatim on Polar, Bun, Crossmint, x402, and DodoPayments as of 2026-06-11:

> "## Documentation Index — Fetch the complete documentation index at .../llms.txt — Use this file to discover all available pages before exploring further."

That's the whole job: whatever page an agent lands on first, it learns where the map is. Two companions:

- **Footer pointer to a markdown sitemap** (`/docs/sitemap.md`, the Next.js pattern) — the bottom-of-page equivalent.
- **`changelog.md` at the site root** — machine-readable release notes, newest first. Prisma's llms.txt makes it a standing order: "First, fetch https://www.prisma.io/changelog.md to check for recent or relevant breaking changes." This file is what makes your version stamps actionable.
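Both pointers can be attached in one place at render time so no page ships without them. The sketch below is illustrative: the banner wording follows the Mintlify text quoted above, while the blockquote layout, the footer sentence, and the function names (`agentBanner`, `sitemapFooter`, `wrapWithPointers`) are assumptions.

```typescript
// Top-of-page pointer to the documentation index.
function agentBanner(indexUrl: string): string {
  return [
    "> ## Documentation Index",
    `> Fetch the complete documentation index at: ${indexUrl}`,
    "> Use this file to discover all available pages before exploring further.",
  ].join("\n");
}

// Bottom-of-page pointer to the markdown sitemap (hypothetical wording).
function sitemapFooter(sitemapUrl: string): string {
  return `Full page list: ${sitemapUrl}`;
}

// Wrap a rendered markdown body so both pointers travel with every page.
function wrapWithPointers(body: string, indexUrl: string, sitemapUrl: string): string {
  const parts = [agentBanner(indexUrl), body.trim(), "---", sitemapFooter(sitemapUrl)];
  return parts.join("\n\n") + "\n";
}
```

Frontmatter, if present, would be prepended to this output so the page still begins with `---`.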

On your highest-traffic quickstarts, go one step further and write *for* the agent: Clerk's `quickstarts/nextjs.md` carries "Rules — ALWAYS/NEVER" lists, a "## Verify Before Responding" checklist, and Keyless Mode notes ("Do NOT tell users to sign up, create accounts, get API keys"). Explicit rules an LLM can quote beat narrative it must interpret.

## Should you add "Copy as Markdown" and Ask-AI buttons?

Yes — it's the human-side reinforcement of the same surface. Supabase's HTML pages carry "Copy as Markdown," "Ask ChatGPT," and "Ask Claude" buttons; the deep links are pre-filled with "Read from {url} so I can ask questions about its contents." Every click routes a human-initiated agent session straight to your markdown — and trains users that your docs are agent-readable.

Implementation is one static `<a href>` per assistant with the URL template filled at render time. Ship it last: buttons compound on working mirrors but fix nothing alone.
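Filling the template at render time amounts to URL-encoding the prompt into a query string. A minimal sketch, with loud assumptions: `askLink` is a hypothetical helper, the `q` parameter name is an assumed default, and `https://assistant.example/new` is a placeholder. Substitute whatever pre-filled-prompt URL each assistant actually documents.

```typescript
// Build the pre-filled deep link for one assistant.
// promptBase: the assistant's "new chat" URL (placeholder in the example below).
// pageUrl:    the page to read, ideally its .md mirror.
// param:      query parameter carrying the prompt (assumed to be "q").
function askLink(promptBase: string, pageUrl: string, param: string = "q"): string {
  const prompt = `Read from ${pageUrl} so I can ask questions about its contents`;
  const separator = promptBase.indexOf("?") === -1 ? "?" : "&";
  return `${promptBase}${separator}${param}=${encodeURIComponent(prompt)}`;
}

const exampleHref = askLink(
  "https://assistant.example/new",
  "https://paykit.dev/docs/webhooks.md"
);
```

Pointing `pageUrl` at the `.md` mirror is what routes the human-initiated session to the markdown surface.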

## Platform default or hand-rolled?

**If you're on a Mintlify-class platform:** `.md` mirrors, the agent banner, and llms.txt are platform defaults, verified across Polar, Bun, Crossmint, x402, and DodoPayments. Verify they're enabled, run the probe suite, and you're done in half a day. If your docs are currently hand-rolled, migrating to such a platform is also the cheapest first move: of the eight properties that make docs agent-grade, five arrive by platform default.

**If you're hand-rolling (Next.js/Astro/Docusaurus):** you own three pieces — the `.md` route, the MDX-to-markdown renderer, and the redirect map. Budget 2–5 days, plus a day for the CI suite. Rollout order: top-20 traffic pages first (they cover most agent fetches), then full sitemap coverage, then redirects, then content negotiation, then buttons and rule blocks.

The failure modes are all verified in the wild, so treat them as the test plan: Drizzle and Tailwind both return *empty* `.md` bodies (a 200 with no content is worse than a 404 — the agent concludes the page is blank); Better Auth's mirrors work only under a nonstandard `/llms.txt/docs/...` prefix, where no agent guesses; and a Cloudflare JS challenge on `.md` routes silently zeroes agent traffic while human metrics look fine.

## How do you keep mirrors from silently breaking?

Automate a CI job (`docs-mirror-probe`) on every deploy; any failure blocks release.

1. **Suffix probe:** every sitemap URL + `.md` → 200, `text/(markdown|plain)`, body > 200 bytes, body doesn't start with `<!DOCTYPE` or `<html`.
2. **MDX leak probe:** body matches neither `^import .+ from` nor `<[A-Z][A-Za-z]+[ />]`.
3. **Frontmatter probe:** body begins `---` and contains `url:`, `docs_index:`, `version:`, `lastUpdated:`; `lastUpdated` within 90 days on top-20 pages.
4. **Redirect probe:** every redirect-map entry's `.md` form returns 3xx with a `Location` ending in `.md`.
5. **Negotiation probe:** a `python-httpx` user agent gets markdown at the canonical URL; a browser UA gets HTML; response includes `Vary`.
6. **Banner probe:** every `.md` body contains the literal string `llms.txt`.
7. **Changelog probe:** `/changelog.md` returns 200 markdown and its newest entry matches the current release.
8. **Through-CDN probe:** repeat probes 1 and 5 against the production edge, not origin.
9. **Empty-body canary:** assert non-empty bodies on 10 randomly sampled pages per deploy — the explicit regression test for the Drizzle/Tailwind failure.
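The per-page probes (1, 2, 3, and 6) can be sketched as pure checks over an already-fetched response, which keeps them testable without a network. This is an illustration, not a reference suite: the `Fetched` shape and all function names are hypothetical, fetching is left to the CI job, and the 90-day freshness check from probe 3 is omitted. Note that the MDX leak patterns, applied as listed, also flag imports inside fenced code samples, so a real suite may want to skip fenced regions.

```typescript
// Hypothetical record of one fetched `.md` URL.
interface Fetched {
  status: number;
  contentType: string;
  body: string;
}

// Probe 1: status, content type, size, and "not HTML in disguise".
function suffixProbe(r: Fetched): string[] {
  const errors: string[] = [];
  if (r.status !== 200) errors.push(`status ${r.status}`);
  if (!/^text\/(markdown|plain)\b/i.test(r.contentType)) errors.push(`content-type ${r.contentType}`);
  // String length approximates byte length; close enough for an emptiness check.
  if (r.body.length <= 200) errors.push("body is 200 bytes or less");
  if (/^\s*(<!DOCTYPE|<html)/i.test(r.body)) errors.push("body is HTML");
  return errors;
}

// Probe 2: the two leak patterns from the list above.
function mdxLeakProbe(body: string): string[] {
  const errors: string[] = [];
  if (/^import .+ from/m.test(body)) errors.push("import statement leaked");
  if (/<[A-Z][A-Za-z]+[ />]/.test(body)) errors.push("component tag leaked");
  return errors;
}

// Probe 3 (keys only): frontmatter opens the page and has the required fields.
function frontmatterProbe(body: string): string[] {
  const errors: string[] = [];
  if (!/^---\n/.test(body)) errors.push("no frontmatter");
  for (const key of ["url", "docs_index", "version", "lastUpdated"]) {
    if (!new RegExp("^" + key + ":", "m").test(body)) errors.push(`missing ${key}`);
  }
  return errors;
}

// Probe 6: the page points at llms.txt somewhere.
function bannerProbe(body: string): string[] {
  return body.indexOf("llms.txt") === -1 ? ["no llms.txt pointer"] : [];
}

// Run every per-page probe; an empty array means the page passes.
function probePage(r: Fetched): string[] {
  return suffixProbe(r).concat(mdxLeakProbe(r.body), frontmatterProbe(r.body), bannerProbe(r.body));
}
```

A CI job would call `probePage` for every sitemap URL plus `.md` and fail the deploy if any call returns a non-empty list.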

## The receipts

*The research layer. All observations made 2026-06-11; updates logged in the [Data Room](/agentic-discovery/data).*

**Probe results, working vs. broken:**

| Product | `.md` behavior observed | Verdict |
|---|---|---|
| Stripe | `mcp.md` → `text/plain` full markdown; 472/472 llms.txt links are `.md` | Working |
| Supabase | `guides/auth.md` → `text/markdown`; redirects preserved; Copy/Ask-AI buttons live | Working |
| Next.js | Frontmatter mirrors **and** markdown served on canonical URLs to our non-browser fetcher | Gold standard |
| Clerk | `quickstarts/nextjs.md` → 200 `text/markdown`, agent-directive content | Working |
| Prisma | `.md` redirects to versioned path `/docs/orm/v6/...` | Working |
| Drizzle | `.md` returns an empty body | Broken |
| Better Auth | `.md` works only under nonstandard `/llms.txt/docs/...` prefix; canonical-path `.md` empty | Undiscoverable |
| Tailwind CSS | `.md` empty, consistent with no-llms.txt posture (community requests open since June 2025, discussions #18256/#14677) | Absent |

**Why sites beat repos as the indexed corpus:** Convex docs-site entry benchmark 91.6 vs. 79.9 for the repo entry — an 11.7-point gap attributable to README/contributor noise. Your `.md` mirror makes the site the clean indexable corpus.

**Freshness data (n=17 audited Context7 entries, 2026-06-11):** benchmark vs. log(hours since update) Spearman −0.54, the strongest correlate found; freshest-5 average 83.6, stalest-5 average 72.3. Build-generated mirrors inherit freshness; exports rot.

**Experiment E3 (pilot-grade: single model, n=2 per arm):** on a Tailwind v4 setup task, control agents emitted obsolete v3 config 2/2 — one prescribed `npx tailwindcss init -p`, a command that no longer exists. With current doc text in context: 0/2. The mirror is how that text gets fetched.

**Verbatim, from the field:**

> "Read from {url} so I can ask questions about its contents" — **supabase.com docs**, the pre-filled prompt behind its "Ask ChatGPT" / "Ask Claude" buttons

> "Do NOT tell users to sign up, create accounts, get API keys" — **clerk.com quickstart `.md` files**, documentation written as a prompt, not a page

## FAQ

**How do I serve my docs as markdown to AI agents?**
Two mechanisms, ship both: a `.md` suffix on every canonical docs URL returning `text/markdown`, and content negotiation that serves markdown at the canonical URL to non-browser user agents. Mintlify-class platforms do the first by default; Next.js demonstrates the second.

**What is content negotiation for documentation?**
Content negotiation serves different formats at the same URL based on who's asking: browsers get HTML, agents and scripts get markdown. Detect via the `Accept` header or user-agent string, and always set `Vary: Accept, User-Agent` so CDNs cache both variants correctly. Next.js verifiably does this on its canonical docs URLs.

**Do AI agents read HTML pages?**
Most can, but markdown is denser per token and free of nav/cookie noise — and roughly 2.6% of Context7 fetch traffic comes from raw HTTP clients (Python HTTPX, Go) that never render anything, as of 2026-06-11. Serving markdown removes the parsing tax for every reader.

**What frontmatter should .md docs pages have?**
At minimum: `url` (canonical HTML page), `docs_index` (pointer to llms.txt), `version`, and `lastUpdated`. Next.js adds `prerequisites`; Vercel adds `type` and `related`. The version and date fields are what let an agent judge staleness instead of guessing.

**Why does my .md URL return an empty page?**
Usually the route exists but the renderer produces nothing — verified at Drizzle and Tailwind, both returning empty bodies with status 200. An empty 200 is worse than a 404 because the agent concludes the page has no content. Probe for body length, not just status codes.

**What is the banner at the top of .md docs pages?**
It's a discovery pointer telling agents where the full documentation index lives. The Mintlify-generated version reads: "Fetch the complete documentation index at .../llms.txt — Use this file to discover all available pages before exploring further." Whatever page an agent lands on, it learns where the map is.

---

*Last verified 2026-06-11. We re-test the claims on this page quarterly — changes are logged in the [Data Room](/agentic-discovery/data).*

**Part of [The Complete Playbook to Agentic Discovery](/agentic-discovery).**

← Previous: [llms.txt: The Complete Guide](/agentic-discovery/llms-txt) · Next: [Code Snippets AI Agents Can Use](/agentic-discovery/code-snippets-for-ai-agents) →

