Why your AI coding agent rediscovers your codebase every prompt (and how to fix it)
Every Claude Code, Cursor, or Codex session re-greps the same files, re-reads the same handlers, re-builds the same blurry mental model — every prompt. Here's why the rediscovery tax happens, and what a context engine actually does about it.
The rediscovery tax
Every time you ask Claude Code, Cursor, or Codex to fix a bug, the agent does the same five things:
- Greps your repo for keywords related to the task.
- Walks the directory tree to find what looks relevant.
- Reads four to seven files trying to find the actual handler.
- Re-reads files it already read because it forgot what was in them.
- Finally takes a guess.
On a typical Next.js project, this rediscovery loop can burn 8,000–15,000 tokens per coding task before the agent makes its first edit. Your numbers will vary by codebase size and the agent's tool discipline — but the pattern is consistent: most tokens get spent rebuilding the model, not editing code.
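To make that arithmetic concrete, here's a rough back-of-envelope using the common ~4 characters/token heuristic. The grep output size and file sizes below are hypothetical, not measurements:

```typescript
// Rough back-of-envelope: tokens burned by one rediscovery loop.
// Uses the common ~4 chars/token heuristic; real tokenizers vary.
function approxTokens(chars: number): number {
  return Math.ceil(chars / 4);
}

// Hypothetical loop: noisy grep output plus file reads before the first edit.
const grepOutputChars = 6_000;
const fileReadChars = [9_000, 12_000, 7_500, 14_000, 9_000];

const total =
  approxTokens(grepOutputChars) +
  fileReadChars.reduce((sum, chars) => sum + approxTokens(chars), 0);

console.log(total); // 14375 -- inside the 8k-15k band
```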
That's the rediscovery tax. Every prompt, your agent rebuilds a blurry mental model of your codebase from scratch, because there's no persistent layer telling it what's actually true. Then it works from that blur.
If you're paying per token (Claude API, OpenAI API, Cursor's pro tier when you exceed the cap), this hits your wallet directly. But the bigger cost is signal loss. The agent's first read of a file is rushed. By the time it edits, it's running on vibes.
This article walks through why this happens, and the fix that's emerged in the last six months: context engines that sit between your agent and your codebase.
Why agents grep blind in the first place
Coding agents are stateless across turns. Every prompt starts a fresh context window. The agent's options for "what's in this codebase":
1. Trust the user's prompt — the user mentioned three files; assume those are the only relevant ones. (Wrong half the time.)
2. Trust the agent's training data — most LLMs were trained on a snapshot of public code from a year ago. Your private repo is not in there.
3. Look it up live — grep, file walk, read.
Option 3 is what every agent does by default. It's the only honest choice. But it's slow, expensive, and lossy.
The standard approach to "give the agent context" used to be:
- Paste big chunks of code into the prompt
- Maintain a `CLAUDE.md`/`.cursorrules` with project-wide rules
- Use `@file` mentions to point at specific files
These help, but they're manual. The user has to know what's relevant before the agent does. If you knew that, you wouldn't need an agent.
What you actually want: the agent looks something up, gets a deterministic answer, and moves on. Like calling a function instead of running a search engine.
Enter MCP
Anthropic released the Model Context Protocol (MCP) in November 2024. It's a JSON-RPC protocol, most commonly run over stdio, that lets AI agents call external tools. The agent spawns an MCP server process; the server exposes tools with typed schemas; the agent calls them when it needs to.
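Concretely, the wire format is plain JSON-RPC 2.0 framed as newline-delimited JSON on the server's stdin/stdout. A sketch of what one tool call looks like on the wire; the method and field shapes follow the MCP spec's `tools/call`, but the tool name and arguments here are hypothetical:

```typescript
// One MCP tool call on the wire: plain JSON-RPC 2.0 over stdio.
// Method/field shapes follow the MCP spec; the tool name and
// arguments ("context_packet", task) are hypothetical examples.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "context_packet",
    arguments: { task: "fix broken auth callback" },
  },
};

// The server replies with typed content blocks keyed to the same id.
const response = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    content: [{ type: "text", text: "target: src/auth/oauth.ts:42 ..." }],
  },
};

// Framing: one JSON object per line on the server's stdin/stdout.
const wire = JSON.stringify(request) + "\n";
console.log(wire.includes('"tools/call"')); // true
```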
The protocol itself is unsexy. The interesting part is the architecture it unlocks: one server, every agent.
Before MCP, every agent had its own custom tooling. Cursor's `@codebase` was Cursor-only. Claude Code's bash tools were Claude-only. Each tool you wanted, you reinvented per agent.
After MCP: build the tool once, every compliant agent uses it. As of now, MCP-compatible clients include Claude Code, Codex CLI, Cursor, Cline, Continue.dev, and a growing list of others.
So the question becomes: what tool would actually fix the rediscovery tax?
What "context engine" means
A context engine is a process that:
- Indexes your codebase — files, symbols, routes, imports, schema. Stored locally, queryable.
- Tracks state — what was indexed when, what changed, what's stale.
- Returns ranked, typed context on demand — the agent calls a single tool, gets back the right starting context, and proceeds.
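As a sketch, here's what "ranked, typed context" could look like as a data structure, with a toy deterministic lookup behind it. The field names and keyword matching are illustrative, not agentmako's actual schema:

```typescript
// A hypothetical shape for the typed context a context engine returns.
// Field names are illustrative, not agentmako's actual schema.
interface ContextPacket {
  target: { file: string; line: number }; // best entry point for the task
  routes: string[];                       // HTTP routes touched
  dbTables: string[];                     // tables read or written
  priorFindings: string[];                // persisted notes from earlier sessions
  readNext: string[];                     // ranked follow-up reads
}

// Toy prebuilt index: lookups are deterministic map hits, not searches.
const index: Array<[string, ContextPacket]> = [
  ["auth callback", {
    target: { file: "src/auth/oauth.ts", line: 42 },
    routes: ["GET /auth/callback"],
    dbTables: ["users", "sessions"],
    priorFindings: ["2 prior reviews on this path"],
    readNext: ["middleware/session.ts:88"],
  }],
];

function contextPacket(task: string): ContextPacket | undefined {
  // Real engines rank candidates; keyword containment stands in here.
  for (const [key, packet] of index) {
    if (task.includes(key)) return packet;
  }
  return undefined;
}

console.log(contextPacket("fix broken auth callback")?.target.file);
// -> "src/auth/oauth.ts"
```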
The opposite of grep. Grep gives you a list of matches. A context engine gives you a structured answer: "the auth callback is at `src/auth/oauth.ts:42`, it touches the `sessions` and `users` tables, here are 2 prior reviews of this code, read `middleware/session.ts:88` next."
That's a different game.
The architectural design space:
| Approach | What it stores | Lookup style | Where the data lives |
|---|---|---|---|
| Vector search / RAG | embeddings | semantic similarity | usually hosted |
| Built-in editor index (Cursor) | per-tool snapshot | hybrid | hosted/local |
| Context engine (e.g. agentmako) | typed graph | deterministic | local |
Vector search wins for unstructured text (docs, support tickets, prose). It loses for code because code is highly structured. A graph that knows "this file imports from that file" beats vector similarity on accuracy and token cost, every time.
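A toy example of the difference: "who imports this file" is a deterministic graph query over edges extracted at index time, with no embeddings involved. The repo below is hypothetical:

```typescript
// Why a typed graph wins for code: import edges are facts, so the
// answer is exact and repeatable. Toy repo; paths are hypothetical.
const imports: Record<string, string[]> = {
  "src/routes/index.ts": ["src/auth/oauth.ts", "src/middleware/session.ts"],
  "src/auth/oauth.ts": ["src/middleware/session.ts"],
  "src/middleware/session.ts": [],
};

// Reverse the edges: which files import the given file?
// A real engine would precompute this once at index time.
function importersOf(file: string): string[] {
  return Object.entries(imports)
    .filter(([, deps]) => deps.includes(file))
    .map(([importer]) => importer)
    .sort();
}

console.log(importersOf("src/middleware/session.ts"));
// -> ["src/auth/oauth.ts", "src/routes/index.ts"]
```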
What changed in the last six months
The MCP ecosystem went from "Anthropic's experimental protocol" to "the way coding agents talk to tools" between November 2024 and now.
Three accelerants:
- Cursor added MCP support in early 2025. That broke the dam. Once Cursor accepted MCP, Cline followed, then Continue, then Codex.
- Claude Code shipped its plugin system with MCP as the transport. Plugins distribute MCP servers + skills (instructions on when to call them).
- The "context engine" concept got named. Devs started building dedicated MCP servers focused on codebase intelligence — not generic filesystem access, but typed views of code structure.
I've been building one of these (agentmako) for the past several months. It's open source under Apache-2.0, runs locally, and indexes a TS/JS/TSX repo in ~10–30 seconds for a 50k-file project.
The point of this post isn't to pitch agentmako specifically. It's to argue that if you're using AI coding agents seriously, you should run a context engine. Whether that's agentmako, a different MCP server, or something you build — stop letting your agent grep blind.
What the fix looks like in practice
Here's the same task, two ways.
Without a context engine:
```
> grep -rn "callback" .
./node_modules/express/... (847 hits)
./node_modules/passport/... (412 hits)
./src/auth/oauth.ts:14 (maybe?)
./src/legacy/old-auth.ts:88 (deleted? unclear)
./tests/auth.spec.ts:201 (test only)

> read ./src/auth/oauth.ts
> read ./src/auth/oauth-config.ts
> read ./src/middleware/session.ts
> read ./src/routes/index.ts

// 4 file reads, ~12k tokens, still not sure which path is current.
```

With a context engine call:

```
> context_packet "fix broken auth callback"
→ target: src/auth/oauth.ts:42
  routes: GET /auth/callback
  touches: sessions, jwt
  db: users, sessions
  findings: 2 prior reviews on this path
  read next: middleware/session.ts:88

// 1 tool call. Typed. Deterministic. ~600 tokens.
```

Same task, ~20x token reduction. And the second answer is deterministic: the same prompt produces the same packet, every time. That matters because it makes agent behavior debuggable.
How to install one in five minutes
If you want to try this with agentmako specifically, here's the setup. (Substitute another MCP server if you prefer — the pattern is the same.)
1. Install:
```bash
npm install -g agentmako
```

2. Attach a project:

```bash
cd /path/to/your/project
agentmako connect . --no-db
agentmako doctor
```

`agentmako doctor` should show all green. The first `connect` indexes your repo (10–30 seconds for a typical project, longer for monorepos).
3. Wire MCP in your agent.
For Claude Code:
```bash
claude mcp add mako-ai agentmako mcp
```

For Cursor (`.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "mako-ai": {
      "command": "agentmako",
      "args": ["mcp"]
    }
  }
}
```

For Codex CLI (`~/.codex/config.toml`):

```toml
[mcp_servers.mako-ai]
command = "agentmako"
args = ["mcp"]
```

For Cline: Settings → MCP Servers → paste the same JSON as Cursor.
4. Tell the agent to actually use it.
This is the step most people miss. Drop the agentmako CLAUDE.md template into your project root:
```bash
curl -O https://agentmako.drhalto.com/CLAUDE.md
```

That file tells the agent: "before grepping, call `context_packet`. For DB questions, use `db_table_schema` instead of guessing." Without it, the agent will default to grep no matter how good your tools are.
What you'll notice
In my own usage, switching from "agent greps blind" to "agent calls `context_packet` first":
- Token consumption per task drops 60–90%. Most of my Claude Code sessions used to burn through API quota in an hour. Now they last all day.
- The agent stops hallucinating routes, table names, function signatures. Because it actually looks them up instead of pattern-matching.
- Cross-session memory. When the agent finds a bug, the finding persists. Next session, the agent remembers.
The skeptical version: I'm one engineer with one tool I built. Maybe my repos are weird. But I've heard the same thing from every dev I've gotten to try it.
Try it on your own
If you want to see the rediscovery tax in your own workflow:
- Pick a real coding task you'd give your agent (a bug to fix, a refactor, etc.).
- Run it normally. Note how many file reads happen and how many tokens get spent before any code gets edited.
- Install agentmako (or any MCP context engine). Run the same task with `context_packet` as the first call.
- Compare.
If the second version isn't dramatically faster and more accurate, the tool wasn't worth it for your stack. Move on. But I bet it will be.
agentmako specifically is at agentmako.drhalto.com — Apache-2.0, no hosted service, npm install. The full FAQ is at /docs/faq.html if you want technical depth before installing.
The broader point stands either way: AI coding agents in 2026 should not be grepping blind. The infrastructure to fix it now exists. Use it.
Want this for your codebase?
agentmako is local-first, Apache-2.0, and works with every MCP-compatible coding agent.