---
title: "Birdseye: See How AI Coding Agents Actually Think & Search (Free Mac App)"
description: "A free Mac app that replays a Claude Code or Codex run as debuggable layers: every query the agent wrote, every page it read vs skipped, and every claim it kept or killed."
slug: /agentic-discovery/resources/birdseye
series: The Agentic Discovery Playbook · Resource
last_verified: 2026-06-12
---

# Birdseye: See How Coding Agents Actually Think & Search

**What this is:** Birdseye is a free Mac app that opens up a Claude Code run and shows you, layer by layer, how the agent made its decisions. You see the queries it wrote for itself, the pages it opened versus the ones it only glanced at, the claims it kept and the claims it killed, and the answer it finally handed the user. It's the instrument behind the numbers in [the 3-experiment report](/agentic-discovery/agent-search-experiments) and the search-and-fetch surface in [Play 1](/agentic-discovery/ai-agent-web-search-and-fetch), now yours to run on your own product.

> 📥 **Download Birdseye (v0.1.0):** [.dmg](/agentic-discovery/resources/Birdseye_0.1.0_universal.dmg) · or the [.zip](/agentic-discovery/resources/Birdseye-universal.zip).

![The Birdseye Mac app replaying a Claude Code research run titled "Build multi-inbox email automation app for sales." A left sidebar shows a tree of the main thread and its sub-agents; a header shows web-operation counts; the main panel is a timeline of the agent's web searches, the pages it fetched, and the findings it extracted from each.](/agentic-discovery/images/birdseye-timeline.png "Birdseye replays a run as a navigable timeline: main thread, sub-agents, individual searches, fetches, and the findings pulled from each page.")

## Why you can't see this from your own analytics

When an AI agent evaluates your product, your own instruments are blind.

Your server logs a bare `WebFetch` hit. No query, no competitors it weighed you against, no verdict on whether your claim survived.

A citation tracker counts mentions in *answers* and misses the agent's working layer entirely.

The whole evaluation happens in machine-readable text neither you nor your marketing team ever sees: which queries it wrote, which of the dozens of results it bothered to open, which of your claims it threw out.

Birdseye reads the agent's own session and reconstructs that hidden layer.

You point it at a Claude Code run, and it replays the decision: the main thread, every sub-agent it spawned, every search each one ran, every page each one read, and the reasoning that turned all of it into a recommendation.

## The four layers it reconstructs

A real research run isn't one search. It's a tree. The main thread delegates to sub-agents, each sub-agent fans out into its own searches, and each search returns pages the agent may or may not open. Birdseye breaks that tree into four debuggable layers, and each one maps to a different reason you did, or didn't, get chosen:

![The four findings layers, from page up to user. Per-page (per fetch): the claims extracted from a single opened URL, which must survive the fetch cut and be verifiable. Per-search (per query): a query's ranked results, where you need to rank for the agent's self-authored query. Per-agent (per sub-agent): a confidence-graded brief on one slice, where you win a sub-category. Synthesis (per conversation): the final answer to the user, where you need to be in the verdict.](/agentic-discovery/images/diagram-findings-layers.png)

- **Per-page (per fetch):** the claims the agent extracted from a single opened URL. *Your failure mode if you're missing here:* surfaced but not opened (a snippet or authority problem), or opened but your claim got killed in verification.
- **Per-search (per query):** one self-authored query, its ranked results, and the agent's takeaway. *Failure mode:* you never surfaced (a ranking or recency problem).
- **Per-agent (per sub-agent):** one sub-agent's confidence-graded brief on its slice of the problem. *Failure mode:* you lost a sub-category to a better-covered competitor.
- **Synthesis (per conversation):** the final recommendation handed to the user, and the claims behind it. *Failure mode:* out-positioned, or your headline number was thrown out before it reached the verdict.
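
The failure modes above can be read as a checklist, walked from the bottom of the tree up. Here is a minimal sketch of that checklist in Python. The `Outcome` fields and the `diagnose` function are hypothetical names invented for illustration; they are not part of Birdseye or any export it produces.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """How far one product got in one run (hypothetical fields, filled in by hand)."""
    surfaced: bool         # appeared in at least one search's results
    opened: bool           # at least one of its pages was fetched
    claim_survived: bool   # an extracted claim passed verification
    won_subcategory: bool  # a sub-agent's brief favored it
    in_verdict: bool       # named in the final recommendation


def diagnose(o: Outcome) -> str:
    """Return the first layer at which the product dropped out."""
    if not o.surfaced:
        return "per-search: never surfaced (ranking or recency)"
    if not o.opened:
        return "per-page: surfaced but not opened (snippet or authority)"
    if not o.claim_survived:
        return "per-page: opened, but the claim was killed in verification"
    if not o.won_subcategory:
        return "per-agent: lost the sub-category to a better-covered competitor"
    if not o.in_verdict:
        return "synthesis: out-positioned, or the headline number was thrown out"
    return "in the verdict"


print(diagnose(Outcome(True, False, False, False, False)))
```

The order of the checks matters: a product that never surfaced cannot be opened, so the earliest failed gate is the one worth fixing first.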

## What you can do with it

**See how the agent decides at every level.** Walk the run top-down (main thread → sub-agent / workflow → individual search → individual fetch) and watch where a product enters the consideration set, where it gets cut, and where the final verdict locks in. This is how we found that agents *don't search by default* and that one question class produced a 57× spread in search volume.

**Extract every fan-out query.** Pull the full list of queries the agent wrote for itself across every sub-agent. These are long, dated, spec-loaded strings, not keywords. That list is your real keyword research for the agent channel: rank for the queries agents actually author, and you're in the running.
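
If you want the same list outside the app, a rough sketch of the idea looks like this. It assumes a session log stored as JSON Lines, where each record carries a `message.content` list and search calls appear as `tool_use` blocks named `WebSearch` with an `input.query` field. That record shape is an assumption made for illustration, not a documented schema or Birdseye's own format, and the sample queries are invented.

```python
import json

# Invented sample records in the ASSUMED shape described above.
SAMPLE_LINES = [
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "text", "text": "Searching."},
        {"type": "tool_use", "name": "WebSearch",
         "input": {"query": "best multi-inbox email automation tools 2026 pricing"}},
    ]}}),
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "tool_use", "name": "WebFetch",
         "input": {"url": "https://example.com/pricing"}},
    ]}}),
    json.dumps({"type": "user", "message": {"content": "plain string content"}}),
]


def extract_queries(lines):
    """Collect every search query found in tool_use blocks, in order."""
    queries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        content = (record.get("message") or {}).get("content")
        if not isinstance(content, list):
            continue  # skip records whose content is a plain string
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use" and block.get("name") == "WebSearch":
                query = (block.get("input") or {}).get("query")
                if query:
                    queries.append(query)
    return queries


for q in extract_queries(SAMPLE_LINES):
    print(q)
```

Fetches are skipped on purpose here: the goal is the list of queries the agent authored, not the pages it opened.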

**Compare what was *read* vs what was merely *returned*.** Search surfaces far more than the agent opens. In one run, 215 domains surfaced and only ~13 were fetched (a ~6% open rate). Birdseye tags every page **Read** or **Shown**, so you can see exactly where being *worth opening* (your title, meta description, freshness, and apparent authority) gated you out before the agent ever saw your content.

![Birdseye's Sources view. Every page the agent encountered is listed and tagged: "Read" means the agent actually fetched the page; "Shown" means a search surfaced it but the agent never opened it. Each entry shows a title, URL, and the snippet or extracted content.](/agentic-discovery/images/birdseye-sources.png "Read vs Shown: what the agent actually opened, versus what a search only surfaced. The gap is the fetch cut.")
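
The open rate quoted above is simple arithmetic: pages fetched divided by domains surfaced. A small sketch, using the approximate counts from that one run (the function name is ours, chosen for illustration):

```python
def open_rate(fetched: int, surfaced: int) -> float:
    """Share of surfaced domains that the agent actually opened."""
    if surfaced <= 0:
        raise ValueError("surfaced must be positive")
    return fetched / surfaced


# Approximate counts from the run described above: ~13 fetched of 215 surfaced.
rate = open_rate(13, 215)
print(f"{rate:.1%}")  # prints 6.0%
```

Tracking this number per run gives you a baseline for how hard the fetch cut is in your category.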

**Measure SEO/GEO changes before and after.** Run an agent's "help me choose a [your category]" task, instrument it in Birdseye, ship your change (a new llms.txt, a rewritten landing page, a claim you can back up), then re-run the same task. Compare the two: are you surfaced now? Opened? Do your claims survive the verification pass? It turns agentic-discovery work from guesswork into a measured before/after. The [weekly tracker in Part 5](/agentic-discovery/measure-ai-visibility) covers the cadence.
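
One way to keep the before/after comparison honest is to record the same few facts for each run and diff them. The sketch below assumes you note these values by hand after reviewing each run; `RunSnapshot` and `diff_runs` are hypothetical names, not a Birdseye feature or export.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunSnapshot:
    """What happened to your product in one run (hypothetical, hand-recorded)."""
    surfaced: bool        # did any search return you?
    opened: bool          # did the agent fetch one of your pages?
    claims_survived: int  # how many of your claims passed verification
    in_verdict: bool      # were you named in the final answer?


def diff_runs(before: RunSnapshot, after: RunSnapshot) -> dict:
    """Return only the fields that changed, as (before, after) pairs."""
    changes = {}
    for f in fields(RunSnapshot):
        old, new = getattr(before, f.name), getattr(after, f.name)
        if old != new:
            changes[f.name] = (old, new)
    return changes


before = RunSnapshot(surfaced=True, opened=False, claims_survived=0, in_verdict=False)
after = RunSnapshot(surfaced=True, opened=True, claims_survived=2, in_verdict=False)
print(diff_runs(before, after))
```

A single pair of runs is one data point; agents vary between runs, so treat a change as a signal to re-test rather than as proof.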

## Install

1. Open the `.dmg` and drag **Birdseye** into your Applications folder (or unzip the `.zip` and move the app there).
2. On first launch, because this is an early build from an independent developer, macOS may block it. Right-click the app → **Open**, then confirm. You can also allow it under **System Settings → Privacy & Security**.
3. Point it at a Claude Code session to replay the run.

*Birdseye is read-only. It reads a session and visualizes it; it never changes your project or sends your run anywhere.* It's the same tool referenced throughout this guide, where it was listed as "public release on the roadmap"; this is that release. One scale caveat applies everywhere: the published findings come from **three runs on a single agent and one model family**. They are directional, not population estimates. Run it on your own product to see your numbers.

---

*Last verified 2026-06-12. Part of [The Complete Playbook to Agentic Discovery](/agentic-discovery).*