One Prompt, 50% of Token Usage Is Gone — Here's Why and What I Did About It

I've been using Claude Code heavily on a few client projects, and at some point I noticed something that didn't sit right with me.

I'd type a prompt, something pretty straightforward — "update the auth middleware to handle expired tokens" — and before Claude even wrote a single line of code, my context meter had jumped from 10% to almost 50%.

That didn't make sense to me. So I ran /context to see what was actually going on.

Turns out, I had no idea what I was paying for.

What's Actually Happening Inside Claude Code

Here's something Claude Code doesn't advertise loudly enough: it doesn't index your codebase.

There's no background process that maps your files, no vector search running quietly, no pre-built understanding of your project structure. Every time you give Claude a task, it starts from scratch — reading files, running searches, exploring directories — just to figure out where the relevant code even lives.

And all of that exploration? It all goes into your main context window.

So by the time Claude actually starts writing the fix you asked for, your context is already half full of file contents, grep results, and directory listings that you'll never need to see again.

This is the context window breakdown I was looking at:

claude-sonnet-4-6
76k/200k tokens (38%)

System prompt:    2.7k  (1.3%)
System tools:    16.8k  (8.4%)
Memory files:     7.4k  (3.7%)
Messages:        49.1k  (24.6%)  ← this is where it all goes
Free space:     124k   (62%)

That Messages line is your entire conversation history — including every file Claude read, every search result it processed, every command output. It compounds fast.
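To make the arithmetic behind that breakdown explicit, here is a small Python sketch that recomputes the percentages. Only the token counts come from the /context output above; the variable and function names are made up for illustration.

```python
# Token counts (in thousands) copied from the /context breakdown above.
WINDOW_K = 200.0
usage_k = {
    "System prompt": 2.7,
    "System tools": 16.8,
    "Memory files": 7.4,
    "Messages": 49.1,
}


def percent_of_window(tokens_k: float, window_k: float = WINDOW_K) -> float:
    """Return tokens_k as a percentage of the full context window."""
    return 100.0 * tokens_k / window_k


used_k = sum(usage_k.values())   # everything except free space
free_k = WINDOW_K - used_k       # what is left for new work
messages_share = 100.0 * usage_k["Messages"] / used_k  # share of tokens in use

for name, tokens_k in usage_k.items():
    print(f"{name:<14} {tokens_k:>5.1f}k  {percent_of_window(tokens_k):5.2f}%")
print(f"Used {used_k:.1f}k, free {free_k:.1f}k")
print(f"Messages are about {messages_share:.0f}% of the tokens in use")
```

In this snapshot, Messages make up roughly 65% of everything in use, so that is the line where trimming pays off most.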

Why This Gets Worse As Your Project Grows

When your codebase is small, this isn't a big deal. Claude reads a couple of files, figures out the pattern, writes the code.

But once your project crosses a certain size — multiple modules, nested directories, a mix of frontend and backend — the exploration cost scales up. Claude has to read more files to understand context. It runs more searches to find where things are. It checks more configuration files to understand the conventions.

And because all of that stays in your main context window, you hit compaction faster. Claude starts "forgetting" decisions from earlier in the session. The quality of its output drifts. You end up re-explaining things you already covered.

I've had sessions where I barely made it through two features before I had to /clear and start over.
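A rough way to see why sessions get so short is to divide the space left after fixed overhead by the cost of one task. The sketch below does that under simplifying assumptions: the window and overhead figures come from the breakdown above, while the per-task costs are hypothetical and the function name is illustrative.

```python
WINDOW = 200_000          # context window size from the breakdown above
FIXED_OVERHEAD = 26_900   # system prompt + system tools + memory files


def tasks_that_fit(per_task_tokens: int,
                   window: int = WINDOW,
                   overhead: int = FIXED_OVERHEAD) -> int:
    """How many tasks of a given cost fit before the window is full."""
    if per_task_tokens <= 0:
        raise ValueError("per_task_tokens must be positive")
    return (window - overhead) // per_task_tokens


# Hypothetical per-task costs, not measurements.
for cost in (10_000, 20_000, 40_000, 80_000):
    print(f"{cost:>6} tokens per task -> {tasks_that_fit(cost)} tasks per session")
```

Under these assumptions, a task that costs about 80k tokens (40% of the window) leaves room for only two tasks, which lines up with the two-feature sessions described above. Real sessions may compact before the window is completely full, so treat the counts as upper bounds.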

The Fix: Subagents

The solution that actually worked for me was using subagents — and more importantly, setting them up so they trigger automatically.

Here's the concept: instead of Claude doing all the exploration inside your main conversation, you delegate that work to a separate Claude instance. The subagent runs in its own isolated context window, reads everything it needs, then sends back a clean summary to your main session.

Your main conversation never sees the noise. It only gets the result.

The difference in practice is significant. A task that previously burned 40% of my context on exploration now costs maybe 5–8% — because the exploration happened elsewhere.
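The accounting behind that difference can be sketched as a toy model. This is not how Claude Code tracks tokens internally; the Task fields and the numbers are hypothetical, picked only to echo the rough percentages in the text.

```python
from dataclasses import dataclass

WINDOW = 200_000  # context window size from the /context breakdown


@dataclass(frozen=True)
class Task:
    exploration_tokens: int     # file reads, grep output, directory listings
    summary_tokens: int         # size of the summary a subagent hands back
    implementation_tokens: int  # the edit itself plus related discussion


def main_context_cost(task: Task, delegate: bool) -> int:
    """Tokens a task adds to the MAIN conversation.

    With delegation, only the subagent's summary enters the main context;
    the raw exploration output stays in the subagent's own window.
    """
    explored = task.summary_tokens if delegate else task.exploration_tokens
    return explored + task.implementation_tokens


# Hypothetical numbers, not measurements.
task = Task(exploration_tokens=80_000, summary_tokens=2_000,
            implementation_tokens=10_000)

for delegate in (False, True):
    cost = main_context_cost(task, delegate)
    label = "with subagent" if delegate else "without subagent"
    print(f"{label:<17} {cost:>6} tokens ({100 * cost / WINDOW:.0f}% of window)")
```

Note that the exploration tokens are still consumed in this model. They are spent inside the subagent's window rather than the main one.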

Creating Your Explorer Subagent

Create a file at .claude/agents/explorer.md in your project:

---
name: explorer
description: Use this agent for ANY task that requires reading or searching files across the codebase — understanding existing patterns, finding where something is implemented, auditing code, or investigating how a feature works. Always prefer this over reading files in the main conversation.
model: sonnet
tools: Read, Grep, Glob
---

You are a codebase exploration specialist. Your job is to read files,
find patterns, and return a concise summary back to the main agent.

Never modify files. Explore efficiently — read only what's relevant.
Return a structured summary with:
- What you found
- Where it is (file paths)
- Key patterns or conventions to follow

The description field is critical. That's what Claude uses to decide when to delegate. Write it in plain terms — describe the kinds of tasks that should trigger it, not just the agent's capabilities.

Once this file exists, Claude can route codebase exploration tasks to this agent on its own, based on that description, without you asking each time.
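Since so much depends on the frontmatter, a quick sanity check before relying on the agent can help. The sketch below is a deliberately minimal reader, not a YAML parser and not an official validator. Treating name, description, and tools as the expected fields is an assumption based on the example above, and the sample description is abbreviated.

```python
# Assumption: the fields the explorer example uses are the ones worth checking.
EXPECTED_FIELDS = ("name", "description", "tools")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse simple 'key: value' lines between the two '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent file must start with a '---' line")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter has no closing '---' line")


def missing_fields(text: str) -> list[str]:
    """Return expected fields that are absent or empty."""
    fields = parse_frontmatter(text)
    return [name for name in EXPECTED_FIELDS if not fields.get(name)]


sample = """---
name: explorer
description: Use this agent for ANY task that requires reading files.
model: sonnet
tools: Read, Grep, Glob
---

You are a codebase exploration specialist.
"""

print(missing_fields(sample))  # an empty list means every expected field is set
```

To check a real file, pass the contents of .claude/agents/explorer.md to missing_fields instead of the sample string.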

Adding Routing Rules to CLAUDE.md

The subagent file handles the automatic delegation, but I also added explicit rules to my CLAUDE.md to reinforce the behavior:

## Context Management Rules

- NEVER read more than 2 files in the main conversation to answer a question
- Use the explorer subagent for any codebase investigation or research
- Keep the main conversation for implementation only
- When in doubt, delegate the research first, then implement

CLAUDE.md loads at the start of every session, so these rules are always active. Think of it as a standing instruction for Claude.
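Because CLAUDE.md is loaded every session, it is worth adding the rules exactly once. Here is a minimal sketch of an idempotent append; the function names are illustrative, and it assumes CLAUDE.md sits in the directory the script runs from.

```python
from pathlib import Path

HEADING = "## Context Management Rules"
RULES = f"""{HEADING}

- NEVER read more than 2 files in the main conversation to answer a question
- Use the explorer subagent for any codebase investigation or research
- Keep the main conversation for implementation only
- When in doubt, delegate the research first, then implement
"""


def ensure_rules(existing: str, rules: str = RULES, heading: str = HEADING) -> str:
    """Return the existing text with the rules appended, unless already present."""
    if heading in existing:
        return existing  # already there; avoid adding a duplicate section
    if not existing:
        separator = ""
    elif existing.endswith("\n"):
        separator = "\n"    # one blank line between old content and the rules
    else:
        separator = "\n\n"
    return existing + separator + rules


def update_claude_md(path: Path = Path("CLAUDE.md")) -> None:
    """Apply ensure_rules to a file on disk, creating the file if needed."""
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(ensure_rules(current), encoding="utf-8")
```

Calling update_claude_md() twice leaves the file unchanged the second time, since the heading is already present.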

The Two-Layer Setup in Practice

Here's what the workflow looks like now when I give Claude a task:

Before (without subagents):

- I type: "Update auth middleware to handle expired tokens"
- Claude reads middleware.py, auth.py, settings.py, utils.py, token-related tests...
- All of that lands in my main context
- Claude writes the fix
- Context meter: 45–55%

After (with subagents):

- I type: "Update auth middleware to handle expired tokens"
- Claude delegates to the explorer subagent
- Explorer reads all the relevant files in its own context
- Explorer returns: "Token expiry is handled in /backend/auth/middleware.py line 47, using the TokenValidator class. Existing pattern uses raise AuthenticationFailed. Tests are in /tests/test_auth.py."
- Claude uses that summary to write the fix
- Context meter: 12–18%

Same result. A fraction of the cost.

Setting Up a Code Reviewer Subagent Too

While I was at it, I created a second subagent for code review tasks, since those also involve a lot of file reading:

---
name: code-reviewer
description: Use this agent to review code changes, audit existing implementations, check for security issues, or analyze code quality across multiple files. Use before committing significant changes.
model: sonnet
tools: Read, Grep, Glob
---

You are a code review specialist. Analyze code for:
- Correctness and logic errors
- Security vulnerabilities
- Performance concerns
- Consistency with existing patterns

Return a structured review with issues ranked by severity.
Never modify files.

Now when I ask Claude to "review the changes before I push," it automatically delegates to this agent instead of dumping a full diff analysis into my main context.
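Both agents are told never to modify files, and both list only Read, Grep, and Glob on their tools line. A small check can confirm that a tools value stays within that set. This is a sketch, and the allowlist is simply the set used in this article rather than an official definition of read-only tools.

```python
# The tool set both agents in this article declare.
READ_ONLY_TOOLS = frozenset({"Read", "Grep", "Glob"})


def tools_outside_allowlist(tools_field: str,
                            allowed: frozenset[str] = READ_ONLY_TOOLS) -> list[str]:
    """Return tools named in a comma-separated 'tools:' value that are not allowed."""
    declared = [tool.strip() for tool in tools_field.split(",") if tool.strip()]
    return sorted(set(declared) - allowed)


print(tools_outside_allowlist("Read, Grep, Glob"))  # nothing outside the allowlist
print(tools_outside_allowlist("Read, Grep, Edit"))  # flags the extra tool
```

Restricting the tools line backs up the "Never modify files" instruction in the prompt, so the agent is not relying on the prompt alone.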

What About Tasks That Should Stay in the Main Conversation?

Not everything should be delegated. Here's how I think about it:

Use subagents for:

- Any task where Claude needs to "figure out" where something lives
- Codebase-wide searches
- Reviewing or auditing multiple files
- Anything described as "investigate", "find", "check all", "audit"

Keep in the main conversation:

- Quick targeted changes when you already know the file
- Iterative back-and-forth on something you're actively building
- Writing tests for code you just discussed in the same session

The rule I follow: if Claude needs to explore to do the task, it goes to a subagent. If I'm handing it a specific file and telling it exactly what to do, it stays in main.
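That rule of thumb can be written down as a toy keyword check. It only illustrates the heuristic; it is not how Claude decides when to delegate. The function name and the sample prompts, including validate() and validate_token(), are hypothetical.

```python
import re

# Trigger words taken from the list above; matching is case-insensitive.
EXPLORATION_PATTERN = re.compile(r"\b(investigate|find|check all|audit)\b",
                                 re.IGNORECASE)


def suggested_route(prompt: str) -> str:
    """Return 'subagent' for exploration-style prompts, otherwise 'main'."""
    return "subagent" if EXPLORATION_PATTERN.search(prompt) else "main"


for prompt in ("Audit the auth module for token handling",
               "In middleware.py, rename validate() to validate_token()"):
    print(f"{suggested_route(prompt):<8} <- {prompt}")
```

A keyword list will misjudge plenty of prompts, so treat it as a way to think about the split rather than something to automate.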

One More Thing: Run /context Regularly

Before any of this clicked for me, I didn't actually know how my tokens were being spent. Running /context was what made it concrete.

Make it a habit to check it at the start of a session and after a few prompts. You'll quickly develop a sense for what's expensive and what isn't — and you'll catch context bloat before it becomes a problem.

Summary

If your prompts are eating 40–50% of your context window before Claude even starts coding, this is almost certainly why:

- Claude reads files to understand your codebase on every task
- All of that exploration accumulates in your main context window
- The larger your project, the worse this gets

The fix is a one-time setup:

- Create .claude/agents/explorer.md with a clear description of when to trigger
- Add routing rules to your CLAUDE.md
- That's it. Claude delegates automatically from that point

You don't have to ask it every time. You just set the rules once and it handles the rest.

My sessions now run longer, the output quality stays consistent deeper into a session, and I'm not constantly hitting compaction mid-task. Worth the 10 minutes to set up.

Thoughts & lessons from building AI systems.