> ## Documentation Index
> Fetch the complete guide index at: https://www.synscribe.com/agentic-discovery/llms.txt
> Use this file to discover all pages before exploring further.

---
title: "Deprecation-Eval Prompt Pack: 20 YAML Templates"
description: 20 ready-to-adapt YAML eval prompts that elicit deprecated API patterns, plus scoring pseudocode and the CI gate that blocks releases at ≥5% emission.
slug: /agentic-discovery/resources/deprecation-eval-prompt-pack
series: The Agentic Discovery Playbook · Resource
last_verified: 2026-06-11
---

# Deprecation-Eval Prompt Pack: 20 Templates That Catch Stale Output

**What this is:** a ready-to-adapt YAML eval pack of 20 prompt templates that reliably elicit deprecated API patterns from coding agents, plus the scoring loop and CI gate from [Play 8](/agentic-discovery/stop-ai-using-deprecated-apis), with three fully worked examples from our Tailwind v4 pilot.

## How to use it

- Fill the placeholders from your stale-surface inventory ([Play 8](/agentic-discovery/stop-ai-using-deprecated-apis)): every breaking change since ~2024, with old pattern, new pattern, and date shipped. Keep at least one prompt per live directive.
- Run with your directive surface injected as context (the llms.txt excerpt or AGENTS.md block under test): ≥2 current frontier models, N≥3 trials per prompt per model, tools disabled. Grep the markers, then compute the emission rate.
- Wire the CI gate: PASS below 5% deprecated emission, FAIL at ≥5%, which blocks the release. Run the no-context control arm monthly and retire directives whose old pattern has been absorbed.

## The eval pack (`deprecation-eval-pack.yaml`)

```yaml
# ==========================================================================
# {PRODUCT} deprecation-eval pack, 20 prompt templates
# Adapted from Play 8 of the Agentic Discovery Playbook (synscribe.com).
#
# Placeholder legend (replace everything in braces before running):
#   {PRODUCT}       product name            {FRAMEWORK}  host framework ("Vite + React")
#   {OLD_API}       deprecated API/pattern  {NEW_API}    current replacement
#   {OLD_PKG}       deprecated package      {NEW_PKG}    current package
#   {OLD_CMD}       removed CLI command     {NEW_CMD}    current CLI command
#   {OLD_CONFIG}    legacy config file      {NEW_SETUP}  current setup marker
#   {VOLD} / {VNEW} previous / current major version
#   {TASK}          a concrete task your product performs
#   {ERROR}         the error users hit running {OLD_API}/{OLD_CMD} on {VNEW}
#   {PASTE_*}       real legacy artifacts you paste into the prompt
#
# Sourcing rule (Play 8): take prompts from your stale-surface inventory,
# tasks that HISTORICALLY elicited the old pattern. One prompt per directive,
# minimum.
#
# Field mapping to Play 11's harness schema:
#   deprecated_markers == matchers_fail ; expected_markers == matchers_pass
# ==========================================================================
pack:
  name: "{PRODUCT}-deprecation-eval"
  version: "0.1.0"
  models: ["<frontier-model-1>", "<frontier-model-2>"]  # >=2 current frontier models (Play 8)
  trials_per_model: 3                                   # N>=3 per prompt per model (Play 8)
  tools: disabled
  context: directive_surface  # the llms.txt / AGENTS.md excerpt under test
                              # (simulates an agent that fetched it); set to
                              # none for the monthly control arm
  scope_default: full         # per-prompt override: code_blocks, score only
                              # fenced code, for prompts where the old name
                              # legitimately appears in prose
  gate:
    fail_at_emission: 0.05    # PASS: <5% of trials emit a deprecated marker
                              # FAIL: >=5% -> block the release (Play 8)
  # Semantics: deprecated_markers gate pass/fail. expected_markers do NOT,
  # they flag off-topic completions: a trial that hits neither set needs
  # human review, not a silent pass.

prompts:
  # ── A. Setup / scaffold (5) ──────────────────────────────────────────────
  # Highest stale-window hit rate in our pilot: E3's control arm failed 2/2
  # on a setup-shaped prompt (pilot-grade: single model, Claude Haiku 4.5,
  # tools disabled, n=2 per arm, run 2026-06-11).
  - id: setup-fresh-project
    prompt: "Set up {PRODUCT} in a fresh {FRAMEWORK} project. Show every file and every command."
    deprecated_markers: ["{OLD_CONFIG}", "{OLD_CMD}", "{OLD_PKG}"]
    expected_markers: ["{NEW_PKG}", "{NEW_SETUP}"]
    notes: "The canonical stale-window probe. Keep wording identical across runs so emission rates stay comparable."

  - id: setup-existing-app
    prompt: "What's the quickest way to add {PRODUCT} to an existing {FRAMEWORK} app? Give exact commands and file changes."
    deprecated_markers: ["{OLD_CONFIG}", "{OLD_CMD}", "{OLD_PKG}"]
    expected_markers: ["{NEW_PKG}", "{NEW_SETUP}"]
    notes: "Existing-app phrasing elicits incremental config edits, a different failure path than greenfield."

  - id: setup-config-from-scratch
    prompt: "Write the complete production configuration for {PRODUCT} from scratch, and explain each file."
    deprecated_markers: ["{OLD_CONFIG}", "{OLD_API}"]
    expected_markers: ["{NEW_SETUP}"]
    notes: "Asking for 'complete' config invites the model to enumerate legacy files like {OLD_CONFIG}."

  - id: setup-scaffold-starter
    prompt: "Scaffold a minimal starter app that uses {PRODUCT} for {TASK}. Include every config file."
    deprecated_markers: ["{OLD_CONFIG}", "{OLD_CMD}", "{OLD_PKG}"]
    expected_markers: ["{NEW_PKG}", "{NEW_SETUP}"]
    notes: "Scaffold phrasing reaches for memorized project templates, often the stalest layer of training data."

  - id: setup-readme-instructions
    prompt: "Write the 'Getting started with {PRODUCT}' section of a README: install, configure, first run."
    deprecated_markers: ["{OLD_CMD}", "{OLD_PKG}", "{OLD_CONFIG}"]
    expected_markers: ["{NEW_CMD}", "{NEW_PKG}"]
    notes: "Docs-writing prompts surface what the model believes is canonical; READMEs are where stale commands hide."

  # ── B. Implement feature X (5) ───────────────────────────────────────────
  # Feature prompts catch old code idioms ({OLD_API}) that setup prompts miss.
  - id: feature-core-task
    prompt: "Add {TASK} to this app using {PRODUCT}. Include all imports and config changes."
    deprecated_markers: ["{OLD_API}", "{OLD_PKG}"]
    expected_markers: ["{NEW_API}"]
    notes: "The bread-and-butter probe for API-level (not config-level) deprecations."

  - id: feature-sdk-function
    prompt: "Write a function that does {TASK} with {PRODUCT}'s SDK. Production-ready, with error handling."
    deprecated_markers: ["{OLD_API}", "{OLD_PKG}"]
    expected_markers: ["{NEW_API}", "{NEW_PKG}"]
    notes: "'Production-ready' pushes the model toward fuller, and often older, boilerplate."

  - id: feature-end-to-end
    prompt: "Implement {TASK} end-to-end with {PRODUCT}: client, server, and any config."
    deprecated_markers: ["{OLD_API}", "{OLD_CONFIG}", "{OLD_PKG}"]
    expected_markers: ["{NEW_API}", "{NEW_SETUP}"]
    notes: "End-to-end scope exercises several API surfaces in one completion; one stale import fails the trial."

  - id: feature-idiomatic
    prompt: "Show me the idiomatic way to do {TASK} with {PRODUCT} today."
    deprecated_markers: ["{OLD_API}"]
    expected_markers: ["{NEW_API}"]
    notes: "'Today' tests whether the model self-corrects for recency. Without context in the window, most don't."

  - id: feature-with-test
    prompt: "Implement {TASK} with {PRODUCT} and write a test proving it works."
    deprecated_markers: ["{OLD_API}", "{OLD_PKG}"]
    expected_markers: ["{NEW_API}"]
    notes: "Test code is frequently written from older training examples even when the implementation is current."

  # ── C. Migrate / upgrade (5) ─────────────────────────────────────────────
  # Old names legitimately appear in prose here ("delete {OLD_CONFIG}"),
  # that is correct advice, not a failure. Scope scoring to code blocks:
  # fail = the old pattern written as the migration TARGET.
  - id: migrate-major-upgrade
    prompt: "Upgrade this project from {PRODUCT} v{VOLD} to v{VNEW}. List every change and rewrite the config."
    scope: code_blocks
    deprecated_markers: ["{OLD_CONFIG}", "{OLD_API}", "{OLD_CMD}"]
    expected_markers: ["{NEW_SETUP}", "{NEW_API}"]
    notes: "Fail = v{VOLD} config emitted as the end state of the 'upgrade'."

  - id: migrate-old-config
    prompt: "Here is my {PRODUCT} v{VOLD} config: {PASTE_OLD_CONFIG}. Migrate it to v{VNEW}."
    scope: code_blocks
    deprecated_markers: ["{OLD_CONFIG}", "{OLD_API}"]
    expected_markers: ["{NEW_SETUP}"]
    notes: "Seed the prompt with real legacy config. Fail = output that round-trips it instead of converting it."

  - id: migrate-module-rewrite
    prompt: "This module uses {OLD_API}: {PASTE_SNIPPET}. Rewrite it for {PRODUCT} v{VNEW}."
    scope: code_blocks
    deprecated_markers: ["{OLD_API}", "{OLD_PKG}"]
    expected_markers: ["{NEW_API}"]
    notes: "The pasted snippet contains {OLD_API} by design, code_blocks scope applies to the model's OUTPUT only."

  - id: migrate-breaking-changes
    prompt: "What breaking changes matter when upgrading {PRODUCT} to v{VNEW}, and what replaces {OLD_API}? Show before/after code."
    scope: code_blocks
    deprecated_markers: ["{OLD_API}", "{OLD_CONFIG}"]
    expected_markers: ["{NEW_API}"]
    notes: "The 'before' block legitimately contains {OLD_API}. Segment before/after in your scorer, or drop this prompt rather than let it false-positive."

  - id: migrate-team-checklist
    prompt: "Write a team migration checklist with commands for moving from {PRODUCT} v{VOLD} to v{VNEW}."
    scope: code_blocks
    deprecated_markers: ["{OLD_CMD}", "{OLD_CONFIG}"]
    expected_markers: ["{NEW_CMD}"]
    notes: "Checklists elicit commands. Fail = `{OLD_CMD}` prescribed as a migration step."

  # ── D. Debug / error-driven (5) ──────────────────────────────────────────
  # The failure mode here is repairing toward the past, downgrade, pin,
  # recreate {OLD_CONFIG}, instead of forward to {NEW_API}.
  - id: debug-error-string
    prompt: "I'm getting this error: `{ERROR}`. Fix my {PRODUCT} setup."
    deprecated_markers: ["{OLD_CONFIG}", "{OLD_PKG}", "{OLD_API}"]
    expected_markers: ["{NEW_API}", "{NEW_SETUP}"]
    notes: "Use the real error v{VNEW} throws when fed {OLD_API}. Pair with an error-message docs page (Play 7)."

  - id: debug-removed-command
    prompt: "`{OLD_CMD}` fails with 'command not found'. What's wrong and how do I fix it?"
    scope: code_blocks
    deprecated_markers: ["{OLD_CONFIG}", "{OLD_PKG}"]
    expected_markers: ["{NEW_CMD}", "{NEW_SETUP}"]
    notes: "{OLD_CMD} sits in the prompt itself, so it's excluded from markers and scoring is code-blocks only. Fail = the model 'repairs' the removed command or its legacy config."

  - id: debug-broken-build
    prompt: "My build broke after updating {PRODUCT} to v{VNEW}. Here's the log: {PASTE_LOG}. Fix it."
    deprecated_markers: ["{OLD_CONFIG}", "{OLD_API}", "{OLD_PKG}@{VOLD}"]
    expected_markers: ["{NEW_SETUP}", "{NEW_API}"]
    notes: "The highest-stakes shape: the user is already on v{VNEW}. Fail = advice to downgrade or restore legacy files."

  - id: debug-old-snippet
    prompt: "Why doesn't this work anymore? {PASTE_SNIPPET_USING_OLD_API}"
    scope: code_blocks
    deprecated_markers: ["{OLD_API}", "{OLD_PKG}"]
    expected_markers: ["{NEW_API}"]
    notes: "Fail = patching the old snippet so it limps along; pass = rewriting onto {NEW_API}."

  - id: debug-regression
    prompt: "This {PRODUCT} code worked last year and now throws `{ERROR}`. Repair it."
    deprecated_markers: ["{OLD_PKG}@{VOLD}", "{OLD_CONFIG}", "{OLD_API}"]
    expected_markers: ["{NEW_API}"]
    notes: "String markers can't reliably catch prose like 'just downgrade', flag version pins ({OLD_PKG}@{VOLD}) and spot-check transcripts for downgrade advice."
```

## Three worked examples: the Tailwind v4 case (E3)

Filled-in versions so you can see the shape, using the case [Play 8](/agentic-discovery/stop-ai-using-deprecated-apis) measured. Pilot-grade context: single model family (Claude Haiku 4.5), tools disabled, n=2 per arm, run 2026-06-11. Control agents emitted the obsolete v3 config 2/2 (one prescribed `npx tailwindcss init -p`, a command that no longer exists in v4); a 5-line directive in context flipped that to 0/2. This is a directional signal, not a point estimate.

```yaml
  # ── Worked example 1: setup/scaffold, the E3 task verbatim ─────────────
  - id: tailwind-v4-vite-setup
    prompt: "Set up Tailwind CSS in a Vite + React project."
    context: |
      Tailwind CSS v4 is CSS-first. NEVER create tailwind.config.js or
      postcss.config.js, and never run `npx tailwindcss init -p` (removed).
      ALWAYS use the @tailwindcss/vite plugin and `@import "tailwindcss";`
      in your CSS entry file.
    deprecated_markers:
      - "tailwind.config.js"
      - "postcss.config.js"
      - "npx tailwindcss init"
      - "autoprefixer"
    expected_markers:
      - "@tailwindcss/vite"
      - '@import "tailwindcss";'
    notes: "E3's exact task and directive. Pilot result (single model, n=2/arm): control 2/2 deprecated, directive arm 0/2."

  # ── Worked example 2: migrate/upgrade ────────────────────────────────────
  - id: tailwind-v3-to-v4-upgrade
    prompt: "Upgrade this Vite + React project from Tailwind CSS v3 to v4. Rewrite the config and list every file to delete."
    scope: code_blocks
    deprecated_markers:
      - "tailwind.config.js"
      - "postcss.config.js"
      - "npx tailwindcss init"
      - "autoprefixer"
    expected_markers:
      - "@tailwindcss/vite"
      - '@import "tailwindcss";'
    notes: "v3 file names will rightly appear in prose ('delete tailwind.config.js'). Score code blocks only: fail = v3 config written as the migration target."

  # ── Worked example 3: debug/error-driven ─────────────────────────────────
  - id: tailwind-init-command-fails
    prompt: "`npx tailwindcss init -p` errors out in my new Vite + React project. What's wrong and how do I fix it?"
    scope: code_blocks
    deprecated_markers:
      - "tailwind.config.js"
      - "postcss.config.js"
      - "autoprefixer"
    expected_markers:
      - "@tailwindcss/vite"
      - '@import "tailwindcss";'
    notes: "The command was removed in v4, yet one E3 control trial prescribed it (pilot, n=2/arm). Fail = the model 'fixes' the command or recreates v3 config files; pass = it explains v4 is CSS-first and sets up @tailwindcss/vite. If your scorer can segment JS config bodies, also flag v3-style config objects."
```

## Scoring script (pseudocode)

Deterministic by design: substring/regex markers first, AST matchers when regex over-fires ([Play 8](/agentic-discovery/stop-ai-using-deprecated-apis) prescribes regex/AST; reserve LLM juries for doc-QA, per [Play 11](/agentic-discovery/ai-evals-and-leaderboards)).

```python
# score_pack.py, pseudocode
def run_pack(pack):
    trials = []
    for p in pack.prompts:
        for model in pack.models:
            for i in range(pack.trials_per_model):
                out  = call_model(model, p.prompt,
                                  context=p.get("context", pack.context),
                                  tools="disabled")
                text = extract_code_blocks(out) if p.get("scope") == "code_blocks" else out
                deprecated = any(m.lower() in text.lower() for m in p.deprecated_markers)
                on_topic   = any(m.lower() in text.lower() for m in p.expected_markers)
                trials.append({"id": p.id, "model": model, "trial": i,
                               "deprecated": deprecated,    # the gate
                               "needs_review": not deprecated and not on_topic,
                               "raw": out})                 # keep every transcript for audit

    emission   = sum(t["deprecated"] for t in trials) / len(trials)   # overall rate
    per_prompt = rate_by(trials, key="id")     # which directive is failing
    per_model  = rate_by(trials, key="model")  # cross-model spread
    return emission, per_prompt, per_model, trials

# Gate logic (CI calls this):
#   emission <  pack.gate.fail_at_emission  -> exit 0 (PASS)
#   emission >= pack.gate.fail_at_emission  -> exit 1 (FAIL: block release;
#       tighten the failing directive, more imperative, replacement named
#       in the same sentence, moved higher in the file, and re-run)
```

## CI gate (config sketch)

Per [Play 8](/agentic-discovery/stop-ai-using-deprecated-apis): re-run on every breaking release and every major model release. PASS requires deprecated emission below 5%. Run a no-directive control arm monthly to detect absorption.

```yaml
# .github/workflows/deprecation-eval.yml, sketch, adapt to your CI
name: deprecation-eval
on:
  push:
    tags: ["v*"]              # every release tag, this gate can block it
  schedule:
    - cron: "0 6 1 * *"       # monthly: control arm (absorption check)
  workflow_dispatch: {}       # manual trigger on major model releases

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run pack with directive surface in context
        run: python score_pack.py --pack deprecation-eval-pack.yaml --out results.json
      - name: "Gate: PASS <5% emission, FAIL >=5% (Play 8)"
        run: |
          python - <<'PY'
          import json, sys
          r = json.load(open("results.json"))
          sys.exit(1 if r["emission"] >= 0.05 else 0)
          PY
      - name: Monthly control arm (no directive surface)
        if: github.event_name == 'schedule'
        run: |
          python score_pack.py --pack deprecation-eval-pack.yaml --context none --out control.json
          # When CONTROL emission for a directive's target hits 0%, the change
          # is absorbed by current models -> flag that directive for retirement
          # (E2's lesson, automated). Stale directives carry full mandate
          # authority while being wrong, worse than none.

      # Separate assertion, on every breaking release: the release's directives
      # are live on all four surfaces (llms.txt, .md banners, AGENTS.md block,
      # tool descriptions) within 48 hours of the release tag.
```

## What this pack won't catch

> - **Already-absorbed deprecations.** In our E2 pilot (Supabase `auth-helpers` → `@supabase/ssr`, n=2+2; Stripe Charges → PaymentIntents/Checkout, n=1+1; single model, run 2026-06-11), deprecated emission was 0% in *both* arms. The models had absorbed those 2023–24 changes. Prompts targeting absorbed changes always pass and tell you nothing; the monthly control arm exists to find them and retire their directives.
> - **Cross-model variance.** Every number above is pilot-grade, from one model family (Claude Haiku 4.5) at n=2–3 per arm. Another model can fail prompts this one passes, and vice versa. Run ≥2 current frontier models and treat single-model results as direction, not truth.
> - **Paraphrase evasion.** String markers only catch what they name: output reproducing the old pattern under renamed files or restructured code slips through. Graduate to AST matchers where it matters, and spot-read a sample of raw transcripts every run.

---

*This resource accompanies [Play 8: Your AI Writes Outdated Code, and How to Fix It](/agentic-discovery/stop-ai-using-deprecated-apis). Part of [The Complete Playbook to Agentic Discovery](/agentic-discovery). Last verified 2026-06-11.*