> ## Documentation Index
> Fetch the complete guide index at: https://www.synscribe.com/agentic-discovery/llms.txt
> Use this file to discover all pages before exploring further.

---
title: "How AI Agents Choose Products — We Tested It"
description: "How AI agents pick products: training prior, web search/fetch, retrieval, environment. Agents open ~6% of results and kill ~48% of claims they check."
slug: /agentic-discovery/how-ai-agents-choose-products
series: The Agentic Discovery Playbook — Part 2 of 6
last_verified: 2026-06-12
---

# How AI Agents Actually Choose Products

> **In short:** AI agents pick products through four inputs: the training prior (what the model memorized), the **web search-and-fetch surface** (the queries an agent runs and pages it opens itself), the retrieval layer (llms.txt, Context7, MCP), and the environment layer (AGENTS.md rules files). In our tests, agents opened ~6% of the pages they surfaced, killed ~48% of the claims they verified, and one rules-file paragraph flipped selection 0/3 → 3/3.

![The information-sourcing stack: how an agent finds candidates across three layers. L1 is the training prior; when the prior is stale, uncertain, post-cutoff, or high-stakes, the agent drops to L2 — the new web search-and-fetch surface — which is the gate to L3, the retrieval and docs layer. L2 is where the prior gets overturned, or nowhere.](/agentic-discovery/images/diagram-sourcing-stack.png "The four-channel model as a stack: the search-and-fetch surface (L2) is the gate to your docs (L3).")

## Do this now

- [ ] Run an agent's *open-web* search: give an agent a "help me choose a [your category]" task with web search on, and watch which queries it writes, which pages it opens, and whether it names you ([Play 1](/agentic-discovery/ai-agent-web-search-and-fetch)).
- [ ] Test your stale window: ask an agent to implement your most recently changed API with no docs in context, and count deprecated patterns in the output.
- [ ] Search [context7.com](https://context7.com) for your category's task phrase ("accept payments," "send transactional email") and note who owns it — and whether you appear at all.
- [ ] Run the flip test: put an AGENTS.md mandating your product in a sample repo, give an agent a category task ("add auth"), and compare against a run without it.
- [ ] Write one ALWAYS/NEVER directive for your latest breaking change (~40 tokens) and add it to llms.txt.
- [ ] Check your own Context7 entry: docs site or repo? What benchmark score? How recent is the "Updated" field?
- [ ] Stop waiting for the next training run — run the [Agent-Readiness Audit](/agentic-discovery/agent-readiness-audit) this week.

## How does an agent find candidates in the first place?

Before an agent fetches a single doc from a structured index, it *can* search the open web. 

It writes its own queries, opening a few results, and checking what it reads. 

This is the layer between the training prior and the retrieval index, and it's where a product the model has never heard of first gets into the running.

We instrumented three real "what should I build with?" runs and recorded every query, fetch, and verdict ([the full 3-experiment report](/agentic-discovery/agent-search-experiments)).

> **The catch we didn't expect: agents don't search by default.** In all three runs, the first answer came entirely from the training prior — confident, product-named, zero searches. The agent only ran a live search after the user *pushed*, and the prior it defended was ~5 months stale (the runs ran in June; the model's knowledge tracked to ~January). In two of three runs, asking "*how do you know these are the best?*" produced a **more confident defense of the same stale answer**, not a search — it took an explicit "do a full research on this right now" to trigger one. The current answer is reachable, but your buyer has to push past the first response to get it.

[Play 1](/agentic-discovery/ai-agent-web-search-and-fetch) is the full playbook; once searching does start, four more findings change how you should think about selection:

**There is no single "agentic search."** The same class of question produced a **57× spread** in search volume — 6 web operations in an inline run, 116 across five parallel sub-agents, 344 in a 101-sub-agent Scope→Search→Fetch→Verify→Synthesize workflow. Search depth scales with the *stakes* of the decision, not the breadth of the topic.

**Ranking isn't being read.** In the highest-volume run, agents surfaced 215 distinct domains but fetched only ~13 — a **~6% open rate.** The agent reads your snippet and decides whether to open; most results are never opened at all. Title, meta description, domain authority, and apparent freshness decide the fetch, not your on-page content.

**Verification kills ~half of confident claims.** That same run extracted 87 claims, escalated 25 to a three-voter adversarial panel (*"Be SKEPTICAL… ≥2/3 refutations kill it"*), and killed **12 — a 48% kill rate.** Self-published vendor numbers died; claims corroborated by a primary source survived. A self-scored "99% accuracy" isn't an asset here — it's a liability.

**And visibility doesn't transfer.** Across all three runs the agents touched **596 distinct domains, and not one *product* domain appeared in two categories.** There's no carryover authority — you earn the consideration set category by category. <!-- EXT: multi-agent search/fetch instrumentation (n≥20 questions × ≥3 agents) — slot for future data -->

The through-line: search-and-fetch is the gate to everything downstream. 

The Context7 rankings below are the *retrieval* layer — but for any product the model doesn't already trust, the agent reaches retrieval only by searching first. 

[Get the full surface playbook in Play 1 →](/agentic-discovery/ai-agent-web-search-and-fetch)

## Which products do agents actually fetch? (the live rankings)

Context7 — the largest public index of mid-task doc retrieval — publishes which libraries agents fetch. The top 15 as of 2026-06-11 (share of all Context7 library traffic): <!-- EXT: Default Index quarterly — this page's data spine — slot for future data -->

![Horizontal bar chart of the 15 most-fetched documentation sources by AI coding agents as of June 11 2026. Next.js leads at 10.97 percent. Better Auth, a library whose repository was created in May 2024, is second at 4.59 percent, ahead of React at 2.57 percent.](/agentic-discovery/images/f1-most-fetched-sources.svg "What AI coding agents actually fetch: share of documentation-retrieval traffic, top 15 of 50. Source: Context7.")

| # | Library | Share | Note |
|---|---|---|---|
| 1 | /vercel/next.js | 10.97% | Vercel's cluster of entries totals ~18.5% |
| 2 | **/better-auth/better-auth** | **4.59%** | **Repo created 2024-05-19 — a 2-year-old product ahead of React** |
| 3 | /websites/vercel | 3.35% | |
| 4 | /vercel/ai | 3.12% | |
| 5 | /anthropics/claude-code | 2.63% | Agents reading docs about agents |
| 6 | /reactjs/react.dev | 2.57% | |
| 7 | /supabase/supabase | 2.48% | |
| 8 | /shadcn-ui/ui | 2.22% | +65% over 30 days, fastest riser |
| 9 | /tailwindlabs/tailwindcss.com | 1.98% | Zero first-party agent surface — see below |
| 10 | /openclaw/openclaw | 1.84% | −50% in 30 days while still ranked #10 |
| 11 | /n8n-io/n8n-docs | 1.71% | |
| 12 | /expo/expo | 1.54% | |
| 13 | /langchain-ai/langgraph | 1.49% | |
| 14 | /mongodb/docs | 1.33% | |
| 15 | /tanstack/query | 1.32% | |

Two readings. 

First, **Better Auth at #2 is the existence proof**: a product too young to be in any model's training data out-fetched everything except Next.js, on the strength of its retrieval surface (rich llms.txt, markdown docs, CLI-installed docs MCP, agent skills). 

Second, **this layer moves in weeks, not quarters**: shadcn/ui gained 65% in 30 days; openclaw lost 50% in the same window. Retrieval demand is a flow metric with no floor — defaults here must be maintained, not won once.

## Why do the rankings look weird? The 2×2 that explains the anomalies

![Two-by-two map of products by training-data prior and agent retrieval demand. Next.js sits high on both. Better Auth has high retrieval demand despite a weak training prior. Tailwind has a strong prior with no first-party agent surface. Hono and Crossmint have agent surfaces but low retrieval demand. The playbook moves products from the weak-prior half toward high retrieval demand.](/agentic-discovery/images/f6-anomaly-map.svg "The 2×2 that explains the anomalies — and where the playbook moves you.")

**The Tailwind paradox (bottom right):** no llms.txt, no markdown mirrors, no MCP — users literally file GitHub discussions begging for them — yet two top-20 entries: third parties index tailwindcss.com and the training prior carries the rest. Incumbent privilege is real. It is also not available to you if you're new.

**The invisible corner (bottom left) is the warning:** Hono pioneered three-tier llms.txt bundles, but its Context7 repo entry indexes almost nothing (3,267 tokens, 48 snippets, benchmark 63.3) and it's absent from the top 50. Crossmint runs a complete agent surface — and has no findable Context7 entry at all. Surface without distribution is invisible. <!-- EXT: open-weight logprob study — slot for future data -->

**And one anomaly lives off the map entirely:** Bun's `bun init` detects Claude or Cursor and auto-writes `CLAUDE.md` plus `.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc` (opt-out via env var). The filename is the strategy — every scaffolded project instructs agents to prefer Bun forever after, bypassing retrieval rankings entirely. That's the environment layer — the rules-file mechanism the next two sections put to the test.

The playbook's job, in one sentence: move you from the left half to the top — without needing the right half's decade of training data.

## What happens when you put a rules file in front of an agent?

**Experiment E1 (run 2026-06-11):** six identical agent sessions got the task *"Add user authentication (email/password + Google OAuth) to this fresh Next.js 15 app — pick the best solution."* Three control sessions saw a default project. Three treatment sessions saw the same project plus an `AGENTS.md` mandating Stack Auth — a product the model never picks organically — and prohibiting NextAuth, Auth0, Clerk, and Better Auth.

```
Control   (no rules file):   NextAuth ███████████ 3/3   Stack Auth ░ 0/3
Treatment (AGENTS.md rule):  NextAuth ░ 0/3             Stack Auth ███████████ 3/3
```

A 100% flip from one paragraph of markdown. These are pilot trials — single model (Claude Haiku 4.5), n=3 per arm, tools and web access disabled — directional, not population estimates. Three secondary observations matter as much as the headline:

1. **The control arm was perfectly homogeneous.** NextAuth 3/3, no Clerk, no Better Auth, no deliberation. The training-data default in this category is total — that's what challengers are up against.
2. **Compliance was instant and unquestioning.** One trial's rationale began: *"Stack Auth is explicitly mandated in this project's AGENTS.md conventions document, making it the only appropriate choice regardless of alternatives."*
3. **The agent invented virtues for the mandated product.** One treatment trial praised Stack Auth because it "provides comprehensive documentation specifically for LLM implementation patterns" — a claim from nowhere. Agents don't just obey rules files; they *rationalize* them.

This is why `bun init` writing rules files into every scaffolded project is the sharpest growth tactic we verified anywhere — and why [Play 9](/agentic-discovery/scaffolder-rules-claude-md) covers both the mechanics and the ethics of doing it with disclosure and an opt-out.

## Can a directive in your docs fix outdated AI code?

Yes — but only in a specific time window, and our two deprecation experiments mark its edges.

**E2 — old deprecations (the null result).** Tasks designed to elicit famous deprecated patterns: Supabase SSR auth (the deprecated `@supabase/auth-helpers-nextjs` vs current `@supabase/ssr`, n=2+2) and Stripe one-time charges (legacy Charges API vs PaymentIntents, n=1+1). Deprecated-pattern emission was **0% in both arms** — every control trial already chose the current API unprompted. For changes that are old and loudly documented, the model has internalized the fix; directives are redundant.

**E3 — recent changes (where the lever lives).** Same A/B design on a recent breaking change models reliably flub: Tailwind v4's CSS-first configuration. Task: "Set up Tailwind CSS v4 in a Vite + React project; list the files you create." n=2+2.

```
Outdated v3 config emitted (tailwind.config.js / postcss.config.js / autoprefixer):
Control:    ███████████ 2/2   (one trial even prescribed `npx tailwindcss init -p`,
                               a command that no longer works in v4)
Treatment:  ░ 0/2             (both produced the correct CSS-first setup:
                               @tailwindcss/vite plugin + `@import "tailwindcss";`)
```

Together, E2+E3 define **the stale window** — the months between your API change and the next training cycle absorbing it:

![Timeline showing the stale window: the months between an API breaking change shipping and the next model training run absorbing it. During this window agents emit the old pattern by default; in our pilot test a five-line directive cut wrong output from 100 percent to 0 percent.](/agentic-discovery/images/f3-stale-window@2x.png)

The window is permanent — you keep shipping changes, so one is always open. That's why Stripe writes "never recommend the Charges API" into llms.txt even though E2 shows frontier models already comply: the instruction is insurance across all models and future changes, at ~40 tokens. [Play 8](/agentic-discovery/stop-ai-using-deprecated-apis) turns this into a writing system. <!-- EXT: multi-model E1–E4 replication (n≥30) — slot for future data -->

## What doesn't work?

- **Waiting for the next training run.** Better Auth reached #2 in ~24 months; E1 and E3 show context overriding memorized weights today. "Locked out until the next training cycle" is busted — see [Part 6](/agentic-discovery/geo-myths-what-doesnt-work).
- **Chasing citations as the whole strategy.** Citation trackers measure mentions in AI *answers*; selection in code is a different layer. In E4, selection flipped on agent-operability facts no citation tracker sees. <!-- EXT: GH Archive agent-commit study — slot for future data -->
- **Publishing more content.** Corpus mass barely predicts retrieval-quality scores (ρ≈0.24–0.30, n=17). Drizzle's 440 snippets benchmark at 82.8; Polar's 2,297 snippets at 64.7. Density and self-containment win, not volume.
- **Assuming a #1 ranking means you're read.** In our instrumented run, agents opened only ~6% of the domains their searches surfaced — and killed ~48% of the self-reported claims they checked. Ranking gets you surfaced; being worth opening and corroborable gets you read. See [Play 1](/agentic-discovery/ai-agent-web-search-and-fetch).

## The receipts

*Research layer — the remaining experiments, the correlation table, and the honest limitations. The full research report, "Default by Design," is linked from the [Data Room](/agentic-discovery/data).*

**E4 — does agent tooling change *selection*?** Task: "Choose a transactional email API (Postmark / Mailgun / SendGrid) for a SaaS that AI agents will help maintain." Control (n=2): names only → Postmark 2/2. Treatment (n=2): one added *synthetic* fact — that Mailgun ships an official MCP server, llms.txt, and agent skills (deliberately counterfactual; a controlled stimulus, not a claim about Mailgun) → **Mailgun 2/2**, both rationales citing the tooling as decisive ("native agent tooling eliminates friction... that the other candidates lack"). Agent-operability has become a selection criterion in its own right — *when the agent knows about it*. Discovery surfaces decide whether it knows.

**E5 — what correlates with retrieval-quality (benchmark) scores?** Across the 17 audited Context7 entries (Pearson r / Spearman ρ):

| Predictor | r | ρ | Reading |
|---|---|---|---|
| log(hours since update) | **−0.46** | **−0.54** | Strongest signal: staleness ↔ lower benchmark |
| Trust score | +0.50 | +0.51 | Partly mechanical (shared quality inputs) |
| log(tokens) | +0.39 | +0.30 | Corpus mass buys little |
| log(snippets) | +0.35 | +0.24 | — |
| Tokens per snippet | +0.24 | +0.20 | Self-containedness, not raw density, is what scores |

The freshest five entries average benchmark 83.6; the stalest five average 72.3 — an 11.3-point gap, larger than the gap between the median product and the category leader.

**Limitations, stated plainly:** E1–E4 ran on a single model family (Claude Haiku 4.5) with n=2–3 per arm in fresh-context sessions — pilot-grade directional results, not population estimates. E4's stimulus was synthetic and surfaced explicitly; organic discovery rates are unmeasured. E5 is correlational on a non-random sample of 17. And Context7 measures retrieval, not selection, with a population skewed toward terminal-agent early adopters. None of this changes the directions; all of it bounds how hard you can quote the numbers.

## FAQ

**How does ChatGPT recommend products?**
In chat, recommendations come from the model's training prior plus whatever it retrieves at answer time. In agent settings — coding assistants, terminal agents — a third input dominates: environment files like AGENTS.md and .cursor/rules, which our pilot trials show override both (3/3 flip). What a model praises in prose and what it uses in code also diverge.

**Why does AI always suggest the same product?**
Because training-data defaults in many categories are total, not just strong. In our control trials, agents picked NextAuth for an auth task 3/3 with no deliberation and no alternatives considered. Breaking that homogeneity requires reaching the agent through retrieval or environment channels, not better brand content.

**Can a new product become an AI agent default without being in the training data?**
Yes — Better Auth is the live proof. Its repo was created in May 2024, after most current models' training saturation for library conventions, and as of 2026-06-11 it is the #2 most-fetched documentation source among AI coding agents at 4.59% of Context7 traffic.

**Do AI agents really obey AGENTS.md and CLAUDE.md files?**
In our pilot trials, yes — completely. A one-paragraph AGENTS.md flipped product choice 3/3 versus 0/3 in controls, compliance was instant, and one agent even invented praise for the mandated product. Scale caveat: single model, n=3 per arm.

**Does shipping an MCP server make agents choose my product?**
Only if agents can discover it. In E4, knowledge of agent tooling flipped selection 2/2 — but Crossmint runs a full MCP-plus-docs surface with no findable presence in the dominant retrieval index, making it invisible at decision time. MCP without discovery distribution is a feature, not a channel; see [Play 3](/agentic-discovery/mcp-server-distribution).

**How do AI agents search the web when choosing a product?**
They write their own queries — long, dated, spec-loaded — and depending on stakes run anywhere from a handful of searches to a 100-plus-sub-agent pipeline (a 57× range in our three instrumented runs). They open only a fraction of what they surface (~6% in one run) and, in the most rigorous runs, try to refute the claims they read against primary sources, killing ~48% of them. [Play 1](/agentic-discovery/ai-agent-web-search-and-fetch) is the full breakdown.

---

*Last verified 2026-06-12. We re-test the claims on this page quarterly — changes are logged in the [Data Room](/agentic-discovery/data).*

**Part of [The Complete Playbook to Agentic Discovery](/agentic-discovery).**

← Previous: [What Is Agentic Discovery (and Why It's Eating SEO)](/agentic-discovery/what-is-agentic-discovery) · Next: [The Agent-Readiness Audit](/agentic-discovery/agent-readiness-audit) →

> **Stay ahead of the agents.** We re-test this playbook quarterly and publish what changed — new data, busted myths, ranking shifts. [Get the update digest →](/agentic-discovery#updates)
>
> **Want this done for you?** Synscribe runs agentic-discovery programs for B2B SaaS and developer platforms. [Talk to us →](/contact)
