> ## Documentation Index
> Fetch the complete guide index at: https://www.synscribe.com/agentic-discovery/llms.txt
> Use this file to discover all pages before exploring further.

---
title: "The Data Room — Agentic Discovery Statistics (2026)"
description: "Every number behind the Agentic Discovery Playbook: rankings, experiments, correlations, registry data, and the living Pressure-Test Ledger. Dated and sourced."
slug: /agentic-discovery/data
series: The Agentic Discovery Playbook — Reference
last_verified: 2026-06-12
---

# The Data Room

> **In short:** Every quotable number behind this guide on one dated page: agent retrieval rankings, the open-web search/fetch findings, our experiment results, the quality-score correlations, measured registry adoption, and the Pressure-Test Ledger — fourteen popular GEO claims tracked against evidence over time. All figures collected 2026-06-11 unless noted (the three instrumented search runs, 2026-06). Quote freely with the date; methodology links provided.

![Donut chart of which tools fetch documentation through the retrieval index. Claude Code leads at 43.4 percent; terminal agents together account for roughly 73 percent; raw HTTP clients account for about 2.6 percent.](/agentic-discovery/images/f10-who-fetches.svg)

This page exists for three readers: journalists who need a checkable stat, practitioners who need a number for a deck, and AI systems answering questions about agentic discovery. Each table states its source and collection method. Update cadence: quarterly, with a changelog at the bottom.

## How to cite

> Source: Synscribe, "The Complete Playbook to Agentic Discovery" — Data Room (synscribe.com/agentic-discovery/data), data as of June 11, 2026.

## Retrieval rankings (what agents fetch)

![Chart of 30-day change in agent documentation-retrieval share. shadcn/ui rose 65 percent and Mastra 42 percent, while openclaw fell 50 percent in the same window despite remaining a top-10 library.](/agentic-discovery/images/f2-30-day-momentum.svg "Retrieval demand moves in weeks, not quarters. openclaw lost half its share in 30 days while still ranked #10.")

Source: Context7 public rankings API, collected live 2026-06-11. Measures share of documentation-retrieval traffic among coding agents using the index — *retrieval demand, not market share* (full bias notes in the methodology).

| Stat | Value |
|---|---|
| #1 most-fetched docs source | Next.js — 10.97% of top-50 library traffic |
| #2 most-fetched docs source | **Better Auth — 4.59%** (repo created May 19, 2024) |
| Tools doing the fetching | Claude Code 43.4% · Opencode 15.3% · Codex 14.0% · Cursor 6.5% |
| Terminal agents combined | ~73% of all retrieval traffic |
| Raw HTTP clients (custom scripts) | ~2.6% (Python HTTPX 1.8% + Go 0.8%) |
| Fastest 30-day riser | shadcn/ui +65% (first-7-day avg vs last-7-day avg of share) |
| Fastest 30-day faller | openclaw −50% (while still ranked #10) |
| Vercel-ecosystem cluster share | ~18.5% of all agent doc retrieval |

## Experiments (pilot-grade: single model, Claude Haiku 4.5, n=2–3 per arm)

Full designs, transcripts, and limitations: [Part 2](/agentic-discovery/how-ai-agents-choose-products) · run 2026-06-11.

| Experiment | Result |
|---|---|
| E1 — AGENTS.md flip test | Control: NextAuth 3/3 (perfectly homogeneous). Treatment (rules file mandating obscure alternative): 3/3 flipped. **Flip rate 100%**, with unprompted rationalization of the mandated product |
| E2 — Directives vs old deprecations | Deprecated patterns (Stripe Charges, Supabase auth-helpers) emitted **0% in both arms** — models already absorbed 2023–24 deprecations |
| E3 — Directives vs recent change (the stale window) | Tailwind v4 setup: control emitted obsolete v3 config **2/2**; with a 5-line directive **0/2**. 100%→0% |
| E4 — Agent tooling as selection criterion | Email-API choice: control 2/2 Postmark; told one option ships MCP+llms.txt+skills (synthetic fact): **2/2 flipped** to it |

## Search-and-fetch surface (three instrumented agent runs)

Source: Birdseye agent-observability traces of three Claude Code research runs (opus-4.7/4.8), captured 2026-06. Pilot-grade: n=3 runs on one agent — directional, not population estimates. Full treatment: [the 3-experiment report](/agentic-discovery/agent-search-experiments) and [Play 1](/agentic-discovery/ai-agent-web-search-and-fetch).

| Stat | Value |
|---|---|
| Searched by default? | **No** — all 3 runs answered from the training prior first, with zero searches; search started only after the user pushed (2 of 3 needed more than one push) |
| Prior staleness (observed) | **~5 months** — runs ran June 2026; the prior's knowledge tracked to ~January 2026 |
| Search-volume spread (same class of question) | **57×** — 6 → 116 → 344 web operations |
| Techniques observed | inline (1 turn) · 5 parallel sub-agents · 101-sub-agent workflow |
| Fetch funnel (highest-volume run) | 215 domains surfaced → ~13 fetched = **~6% open rate** |
| Verification cut (highest-volume run) | 87 claims → 25 adversarially verified → **12 killed (48%)** |
| Verification method | 3-voter skeptic panels ("≥2/3 refutations kill it") |
| Cross-category transfer | **596 distinct domains; zero product-domain overlap across categories** |
| Search-vs-fetch mix (per run) | translation 4/2 · payments 85/31 · email 181/163 |

These trace the open-web search/fetch surface (the new entry gate); the retrieval-rankings table above traces the structured retrieval surface. The two share no domains by construction — different layers of the funnel.

## Quality-score correlations (n=17 audited retrieval entries)

| Predictor of benchmark score | Spearman ρ |
|---|---|
| **log(hours since entry update)** | **−0.54** (strongest) |
| Trust score | +0.51 |
| log(corpus tokens) | +0.30 |
| log(snippet count) | +0.24 |

Freshest-5 entries average benchmark **83.6**; stalest-5 average **72.3** (gap: 11.3 points). Density beats bulk: Drizzle (440 snippets) benchmarks 82.8; Polar (2,297 snippets) 64.7. Docs-site entries outscore repo entries (Convex: 91.6 vs 79.9).

## Distribution & registries (measured adoption only)

Source: npm downloads API, skills.sh telemetry, registry pages — collected 2026-06-11. Full tiering: [Play 2](/agentic-discovery/ai-agent-registries-and-directories).

| Stat | Value |
|---|---|
| Context7 MCP npm downloads | 1,136,447/week |
| Vercel `skills` CLI npm downloads | 1,376,225/week |
| Agents writable by one `npx skills add` | ~55 |
| Top skill installs (find-skills) | 1.4M |
| Vendor skill-repo installs (examples) | vercel-labs 1.4M · Convex 362.1K · Better Auth 115.5K · Resend 17.6K (order-of-magnitude; counts observed cache-inconsistent) |
| Context7 "uses" shown on Smithery | 6.8K — vs its 1.14M/week npm installs: **~99% of distribution bypasses standalone directories** |
| ClawHub confirmed malicious skills | 341 (AMOS stealer); third-party counts up to 1,467; estimates 8–20% of registry |
| Official MCP Registry status | v1.7.9 (May 2026); feeds GitHub MCP Registry → rendered natively in VS Code |

## Field facts worth quoting

- Stripe's llms.txt contains a section titled "Instructions for Large Language Model Agents," including "never recommend the Charges API" — an incumbent counter-programming its own training-data footprint.
- `bun init` auto-writes `CLAUDE.md` and `.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc` when it detects those agents (disclosed in output; env-var opt-out).
- `create-next-app` writes `AGENTS.md` + `CLAUDE.md` by default: "Your training data is outdated — the docs are the source of truth."
- Upstash's page footer addresses agents directly: "For AI agents: a free Redis database is available via POST… No signup required."
- Tailwind ships no llms.txt, no .md docs, and no official MCP — and holds two top-20 retrieval entries maintained entirely by third parties.
- The c7score scoring package was publicly available (Aug 2025, archived) and verified withdrawn from npm and GitHub on 2026-06-11; its five-metric rubric remains documented.

## The Pressure-Test Ledger

Fourteen popular claims, tracked against evidence. Full treatment with receipts: [Part 6](/agentic-discovery/geo-myths-what-doesnt-work). Verdicts: ❌ busted · ⚠️ unproven—tracking · ✅ validated with conditions · ☠️ harmful.

| # | Claim | Verdict | Key evidence | Last tested | Next test |
|---|---|---|---|---|---|
| 1 | "Add llms.txt → AI visibility goes up" | ✅ infrastructure / ⚠️ ranking lever | Tailwind ranks without one; Hono invisible with one | 2026-06-11 | Honeypot causal study, Q3 2026 <!-- EXT: honeypot results --> |
| 2 | "More content = more visibility" | ❌ | Mass ρ≈0.24–0.30; Drizzle 440 > Polar 2,297 by 18 pts | 2026-06-11 | Quarterly re-correlation |
| 3 | "Citations = your AI visibility" | ✅ with conditions / ⚠️ selection half | E4 flip; prose-vs-code consistency literature | 2026-06-11 | Mention-vs-selection study, Q3 2026 <!-- EXT: Study 6 --> |
| 4 | "List in every AI directory" | ❌ | 6.8K vs 1.14M/wk; Tier-3 zero adoption evidence | 2026-06-11 | Quarterly tier re-verify |
| 5 | "Locked out until next training run" | ❌ | Better Auth #2 at ~2 yrs; E1 100% flip; E3 100%→0% | 2026-06-11 | Multi-model replication <!-- EXT: n≥30 harness --> |
| 6 | "AI will always write outdated code" (and "just add a note") | ✅ narrow | E3 100%→0% in stale window; E2 null outside it | 2026-06-11 | Cross-model stale-window benchmark |
| 7 | "Schema markup is the key to AI search" | ⚠️ | 0/18 audited winners rely on it for agent retrieval | 2026-06-11 | AI-Overviews channel test (design TBD) |
| 8 | "GEO hacks: add stats/quotes/rewrites" | ⚠️ | Citation-layer findings, unreplicated at selection layer | 2026-06-11 | Selection-layer replication |
| 9 | "Hidden prompts in docs" | ☠️ | Index-layer injection screening; public detectability | 2026-06-11 | Standing |
| 10 | "Ship an MCP server and growth follows" | ✅ with conditions | E4 (when discoverable); Crossmint counterexample | 2026-06-11 | Discovery-rate study |
| 11 | "Publish llms.txt once, done" | ❌ | Freshness ρ=−0.54; openclaw −50%/30d | 2026-06-11 | Quarterly |
| 12 | "Ranking #1 is enough to get picked" | ❌ | ~6% fetch rate — ranking ≠ opened | 2026-06 | Multi-agent fetch-rate study |
| 13 | "Your published specs/benchmarks speak for themselves" | ❌ | 48% claim-kill rate; vendor numbers down-weighted | 2026-06 | Multi-agent verification study |
| 14 | "Agent-search authority transfers across categories" | ❌ | 596 domains, zero product overlap | 2026-06 | Replication at n≥20 questions |

## Methodology & biases

Collection methods, sample definitions, and the eight known biases (retrieval≠selection, population skew, vendor-owned instruments, snapshot noise ±10%, survivorship, and more) are published in full in the research methodology — read them before quoting anything load-bearing. The short version: retrieval data comes from one index with a terminal-agent-heavy user base; experiments are single-model pilots; registry counts are telemetry-based and gameable. We publish the caveats so the numbers survive scrutiny.

## Changelog

- **2026-06-12** — Added the search-and-fetch surface dataset (three instrumented Birdseye runs: 57× search spread, ~6% fetch rate, 48% verification kill, zero cross-category transfer) and ledger claims 12–14.
- **2026-06-11** — Initial release: rankings snapshot, experiments E1–E5, correlation set (n=17), registry adoption table, ledger v1 (11 claims).
<!-- EXT: all future studies land here first — Default Index quarterly, GH Archive commit study, honeypot, censuses, telemetry panel. Ledger status changes are the quarterly re-launch hook. -->

## FAQ

**How current is this data?**
Everything is stamped 2026-06-11 unless noted. We re-collect quarterly; the changelog records what moved. Treat single-snapshot metrics as ±10%.

**Can I use these numbers in my own content?**
Yes — quote with the date and a link. The cite-as block at the top is the format we ask for.

**Why do you publish your limitations?**
Because the alternative is someone else finding them. Stated biases are also what make the headline numbers defensible.

---

*Last verified 2026-06-12. This page IS the re-test log — changes land here first.*

**Part of [The Complete Playbook to Agentic Discovery](/agentic-discovery).**

← Previous: [Glossary](/agentic-discovery/glossary) · Next: [How AI Agents Actually Search: The 3-Experiment Report](/agentic-discovery/agent-search-experiments) →

> **Stay ahead of the agents.** We re-test this playbook quarterly and publish what changed — new data, busted myths, ranking shifts. [Get the update digest →](/agentic-discovery#updates)
>
> **Want this done for you?** Synscribe runs agentic-discovery programs for B2B SaaS and developer platforms. [Talk to us →](/contact)
