Why your AI coding agent rediscovers your codebase every prompt (and how to fix it)
Every Claude Code, Cursor, or Codex session re-greps the same files, re-reads the same handlers, re-builds the same blurry mental model — every prompt. Here's why the rediscovery tax happens, and what a context engine actually does about it.
The rediscovery tax
Every time you ask Claude Code, Cursor, or Codex to fix a bug, the agent does the same five things:
- Greps your repo for keywords related to the task.
- Walks the directory tree to find what looks relevant.
- Reads four to seven files trying to find the actual handler.
- Re-reads files it already read because it forgot what was in them.
- Finally takes a guess.
On a typical Next.js project, this rediscovery loop can burn 8,000–15,000 tokens per coding task before the agent makes its first edit. Your numbers will vary by codebase size and the agent's tool discipline — but the pattern is consistent: most tokens get spent rebuilding the model, not editing code.
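To make that arithmetic concrete, here's a rough back-of-envelope using the common ~4 characters/token heuristic. The grep output size and file sizes below are hypothetical, not measurements:

```typescript
// Rough back-of-envelope: tokens burned by one rediscovery loop.
// Uses the common ~4 chars/token heuristic; real tokenizers vary.
function approxTokens(chars: number): number {
  return Math.ceil(chars / 4);
}

// Hypothetical loop: noisy grep output plus file reads before the first edit.
const grepOutputChars = 6_000;
const fileReadChars = [9_000, 12_000, 7_500, 14_000, 9_000];

const total =
  approxTokens(grepOutputChars) +
  fileReadChars.reduce((sum, chars) => sum + approxTokens(chars), 0);

console.log(total); // 14375 -- inside the 8k-15k band
```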
That's the rediscovery tax. Every prompt, your agent rebuilds a blurry mental model of your codebase from scratch, because there's no persistent layer telling it what's actually true. Then it works from that blur.
If you're paying per token (Claude API, OpenAI API, Cursor's pro tier when you exceed the cap), this hits your wallet directly. But the bigger cost is signal loss. The agent's first read of a file is rushed. By the time it edits, it's running on vibes.
This article walks through why this happens, and the fix that's emerged in the last six months: context engines that sit between your agent and your codebase.
Why agents grep blind in the first place
Coding agents are stateless across turns. Every prompt starts a fresh context window. The agent's options for "what's in this codebase":
1. Trust the user's prompt — the user mentioned three files; assume those are the only relevant ones. (Wrong half the time.)
2. Trust the agent's training data — most LLMs were trained on a snapshot of public code from a year ago. Your private repo is not in there.
3. Look it up live — grep, file walk, read.
Option 3 is what every agent does by default. It's the only honest choice. But it's slow, expensive, and lossy.
The standard approach to "give the agent context" used to be:
- Paste big chunks of code into the prompt
- Maintain a `CLAUDE.md`/`.cursorrules` with project-wide rules
- Use `@file` mentions to point at specific files
These help, but they're manual. The user has to know what's relevant before the agent does. If you knew that, you wouldn't need an agent.
What you actually want: the agent looks something up, gets a deterministic answer, and moves on. Like calling a function instead of running a search engine.
Enter MCP
Anthropic released the Model Context Protocol (MCP) in November 2024. It's a JSON-RPC protocol, most commonly run over stdio, that lets AI agents call external tools. The agent spawns an MCP server process; the server exposes tools with typed schemas; the agent calls them when it needs to.
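Concretely, the wire format is plain JSON-RPC 2.0 framed as newline-delimited JSON on the server's stdin/stdout. A sketch of what one tool call looks like on the wire; the method and field shapes follow the MCP spec's `tools/call`, but the tool name and arguments here are hypothetical:

```typescript
// One MCP tool call on the wire: plain JSON-RPC 2.0 over stdio.
// Method/field shapes follow the MCP spec; the tool name and
// arguments ("context_packet", task) are hypothetical examples.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "context_packet",
    arguments: { task: "fix broken auth callback" },
  },
};

// The server replies with typed content blocks keyed to the same id.
const response = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    content: [{ type: "text", text: "target: src/auth/oauth.ts:42 ..." }],
  },
};

// Framing: one JSON object per line on the server's stdin/stdout.
const wire = JSON.stringify(request) + "\n";
console.log(wire.includes('"tools/call"')); // true
```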
The protocol itself is unsexy. The interesting part is the architecture it unlocks: one server, every agent.
Before MCP, every agent had its own custom tooling. Cursor's `@codebase` was Cursor-only. Claude Code's bash tools were Claude-only. Each tool you wanted, you reinvented per agent.
After MCP: build the tool once, every compliant agent uses it. As of now, MCP-compatible clients include Claude Code, Codex CLI, Cursor, Cline, Continue.dev, and a growing list of others.
So the question becomes: what tool would actually fix the rediscovery tax?
What "context engine" means
A context engine is a process that:
- Indexes your codebase — files, symbols, routes, imports, schema. Stored locally, queryable.
- Tracks state — what was indexed when, what changed, what's stale.
- Returns ranked, typed context on demand — the agent calls a single tool, gets back the right starting context, and proceeds.
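As a sketch, here's what "ranked, typed context" could look like as a data structure, with a toy deterministic lookup behind it. The field names and keyword matching are illustrative, not agentmako's actual schema:

```typescript
// A hypothetical shape for the typed context a context engine returns.
// Field names are illustrative, not agentmako's actual schema.
interface ContextPacket {
  target: { file: string; line: number }; // best entry point for the task
  routes: string[];                       // HTTP routes touched
  dbTables: string[];                     // tables read or written
  priorFindings: string[];                // persisted notes from earlier sessions
  readNext: string[];                     // ranked follow-up reads
}

// Toy prebuilt index: lookups are deterministic map hits, not searches.
const index: Array<[string, ContextPacket]> = [
  ["auth callback", {
    target: { file: "src/auth/oauth.ts", line: 42 },
    routes: ["GET /auth/callback"],
    dbTables: ["users", "sessions"],
    priorFindings: ["2 prior reviews on this path"],
    readNext: ["middleware/session.ts:88"],
  }],
];

function contextPacket(task: string): ContextPacket | undefined {
  // Real engines rank candidates; keyword containment stands in here.
  for (const [key, packet] of index) {
    if (task.includes(key)) return packet;
  }
  return undefined;
}

console.log(contextPacket("fix broken auth callback")?.target.file);
// -> "src/auth/oauth.ts"
```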
The opposite of grep. Grep gives you a list of matches. A context engine gives you a structured answer: "the auth callback is at `src/auth/oauth.ts:42`, it touches the `sessions` and `users` tables, here are 2 prior reviews of this code, read `middleware/session.ts:88` next."
That's a different game.
The architectural design space:
| Approach | What it stores | Lookup style | Where the data lives |
|---|---|---|---|
| Vector search / RAG | embeddings | semantic similarity | usually hosted |
| Built-in editor index (Cursor) | per-tool snapshot | hybrid | hosted/local |
| Context engine (e.g. agentmako) | typed graph | deterministic | local |
Vector search wins for unstructured text (docs, support tickets, prose). It loses for code because code is highly structured. A graph that knows "this file imports from that file" beats vector similarity on accuracy and token cost, every time.
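A toy example of the difference: "who imports this file" is a deterministic graph query over edges extracted at index time, with no embeddings involved. The repo below is hypothetical:

```typescript
// Why a typed graph wins for code: import edges are facts, so the
// answer is exact and repeatable. Toy repo; paths are hypothetical.
const imports: Record<string, string[]> = {
  "src/routes/index.ts": ["src/auth/oauth.ts", "src/middleware/session.ts"],
  "src/auth/oauth.ts": ["src/middleware/session.ts"],
  "src/middleware/session.ts": [],
};

// Reverse the edges: which files import the given file?
// A real engine would precompute this once at index time.
function importersOf(file: string): string[] {
  return Object.entries(imports)
    .filter(([, deps]) => deps.includes(file))
    .map(([importer]) => importer)
    .sort();
}

console.log(importersOf("src/middleware/session.ts"));
// -> ["src/auth/oauth.ts", "src/routes/index.ts"]
```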
What changed in the last six months
The MCP ecosystem went from "Anthropic's experimental protocol" to "the way coding agents talk to tools" between November 2024 and now.
Three accelerants:
- Cursor added MCP support in early 2025. That broke the dam. Once Cursor accepted MCP, Cline followed, then Continue, then Codex.
- Claude Code shipped its plugin system with MCP as the transport. Plugins distribute MCP servers + skills (instructions on when to call them).
- The "context engine" concept got named. Devs started building dedicated MCP servers focused on codebase intelligence — not generic filesystem access, but typed views of code structure.
I've been building one of these (agentmako) for the past several months. It's open source under Apache-2.0, runs locally, and indexes a TS/JS/TSX repo in ~10–30 seconds for a 50k-file project.
The point of this post isn't to pitch agentmako specifically. It's to argue that if you're using AI coding agents seriously, you should run a context engine. Whether that's agentmako, a different MCP server, or something you build — stop letting your agent grep blind.
What the fix looks like in practice
Here's the same task, two ways.
Without a context engine:
```
> grep -rn "callback" .
./node_modules/express/... (847 hits)
./node_modules/passport/... (412 hits)
./src/auth/oauth.ts:14 (maybe?)
./src/legacy/old-auth.ts:88 (deleted? unclear)
./tests/auth.spec.ts:201 (test only)

> read ./src/auth/oauth.ts
> read ./src/auth/oauth-config.ts
> read ./src/middleware/session.ts
> read ./src/routes/index.ts

// 4 file reads, ~12k tokens, still not sure which path is current.
```

With a context engine call:

```
> context_packet "fix broken auth callback"
→ target: src/auth/oauth.ts:42
  routes: GET /auth/callback
  touches: sessions, jwt
  db: users, sessions
  findings: 2 prior reviews on this path
  read next: middleware/session.ts:88

// 1 tool call. Typed. Deterministic. ~600 tokens.
```

Same task, ~20x token reduction. And the second answer is deterministic: the same prompt produces the same packet, every time. That matters because it makes agent behavior debuggable.
How to install one in five minutes
If you want to try this with agentmako specifically, here's the setup. (Substitute another MCP server if you prefer — the pattern is the same.)
1. Install:
```bash
npm install -g agentmako
```

2. Attach a project:

```bash
cd /path/to/your/project
agentmako connect . --no-db
agentmako doctor
```

`agentmako doctor` should show all green. The first `connect` indexes your repo (10–30 seconds for a typical project, longer for monorepos).
3. Wire MCP in your agent.
For Claude Code:
```bash
claude mcp add mako-ai agentmako mcp
```

For Cursor (`.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "mako-ai": {
      "command": "agentmako",
      "args": ["mcp"]
    }
  }
}
```

For Codex CLI (`~/.codex/config.toml`):

```toml
[mcp_servers.mako-ai]
command = "agentmako"
args = ["mcp"]
```

For Cline: Settings → MCP Servers → paste the same JSON as Cursor.
4. Tell the agent to actually use it.
This is the step most people miss. Drop the agentmako CLAUDE.md template into your project root:
```bash
curl -O https://agentmako.drhalto.com/CLAUDE.md
```

That file tells the agent: "before grepping, call `context_packet`. For DB questions, use `db_table_schema` instead of guessing." Without it, the agent will default to grep no matter how good your tools are.
What you'll notice
In my own usage, switching from "agent greps blind" to "agent calls `context_packet` first":
- Token consumption per task drops 60–90%. Most of my Claude Code sessions used to burn through API quota in an hour. Now they last all day.
- The agent stops hallucinating routes, table names, function signatures. Because it actually looks them up instead of pattern-matching.
- Cross-session memory. When the agent finds a bug, the finding persists. Next session, the agent remembers.
The skeptical version: I'm one engineer with one tool I built. Maybe my repos are weird. But I've heard the same thing from every dev I've gotten to try it.
Try it on your own
If you want to see the rediscovery tax in your own workflow:
- Pick a real coding task you'd give your agent (a bug to fix, a refactor, etc.).
- Run it normally. Note how many file reads happen and how many tokens get spent before any code gets edited.
- Install agentmako (or any MCP context engine). Run the same task with `context_packet` as the first call.
- Compare.
If the second version isn't dramatically faster and more accurate, the tool wasn't worth it for your stack. Move on. But I bet it will be.
agentmako specifically is at agentmako.drhalto.com — Apache-2.0, no hosted service, npm install. The full FAQ is at /docs/faq.html if you want technical depth before installing.
The broader point stands either way: AI coding agents in 2026 should not be grepping blind. The infrastructure to fix it now exists. Use it.
Want this for your codebase?
agentmako is local-first, Apache-2.0, and works with every MCP-compatible coding agent.