Chapter 5: Context & Token Mastery
What's In Your Context Window?
Every time you start a Claude Code session, a bunch of things are loaded into your context window before you even type your first prompt. Understanding this "baseline overhead" is key to managing tokens effectively.
The full picture
Here's everything that can live in your context window, and roughly how many tokens each piece costs:
| What | Token cost | When it loads |
|---|---|---|
| System prompt | ~1-2K tokens | Always (Claude Code's base instructions, cached) |
| CLAUDE.md files | 200-2K+ tokens | Every session start |
| Auto memory (MEMORY.md) | 0-2K tokens | Session start (first 200 lines / 25KB) |
| Skill/command descriptions | 50-200 tokens each | Session start (full content loads on invocation) |
| MCP tool names | 100-500 tokens | Session start (full schemas deferred until use) |
| Git status | 100-500 tokens | Session start (current branch, uncommitted changes) |
| Your prompts | Varies | As you type them |
| Claude's responses | Varies | As Claude responds |
| File reads | Depends on file size | When Claude reads a file |
| Command output | Depends on verbosity | When Claude runs a command |
Total baseline overhead: roughly 1.5-5.5K tokens in a typical setup, before you type anything.
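The table above can be turned into a back-of-envelope estimator. This is a sketch using the table's rough ranges, not measured values; the function name and structure are invented for illustration:

```python
# Rough estimator for session baseline overhead, using the token
# ranges from the table above. All numbers are approximations.

BASELINE_TOKENS = {
    "system_prompt": (1_000, 2_000),
    "claude_md": (200, 2_000),
    "auto_memory": (0, 2_000),
    "skill_description_each": (50, 200),  # per skill/command description
    "mcp_tool_names": (100, 500),
    "git_status": (100, 500),
}

def baseline_range(num_skills: int = 0) -> tuple[int, int]:
    """Return a (min, max) baseline token estimate before the first prompt."""
    lo = hi = 0
    for key, (a, b) in BASELINE_TOKENS.items():
        if key == "skill_description_each":
            lo += a * num_skills   # scales with how many skills you have
            hi += b * num_skills
        else:
            lo += a
            hi += b
    return lo, hi

print(baseline_range())  # with no skills: (1400, 7000)
```

Note that the worst case (everything maxed out) lands above the typical total quoted above; in practice most sessions don't hit every maximum at once.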
That might sound like a lot, but in a 1M-token context window, it's less than 1%. The real context consumers are file reads, long conversations, and verbose command outputs.
This is why starting a fresh session with /clear is so powerful -- it resets everything except the baseline overhead. Your CLAUDE.md gives Claude Code all the project context it needs to get right back to work.
Where the tokens actually go
In a typical session, here's how context usage breaks down:
- Baseline (system prompt, CLAUDE.md, git status): ~2-4K tokens
- Your first prompt: ~50-200 tokens
- Claude reads 2-3 files to understand the task: ~1-3K tokens
- Claude's response with a plan and code: ~500-2K tokens
- Follow-up prompts and responses: accumulates over the session
- Command outputs (build errors, test results): can spike to 1-5K tokens each
By message 10-15 in a conversation, you might be at 15-30K tokens. By message 30+, you could be at 50-100K. This is why focused conversations matter.
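That accumulation can be sketched with a toy model. The per-message constants below are illustrative assumptions picked to land in the same ballpark as the ranges above, not measurements:

```python
# Toy model of context growth over a session.
# All per-message costs are illustrative assumptions.

BASELINE = 3_000          # system prompt + CLAUDE.md + git status
FILE_READS_EARLY = 2_000  # 2-3 file reads near the start of the task
PROMPT = 100              # a typical user message
RESPONSE = 1_000          # a typical reply with a plan and code
COMMAND_SPIKE = 3_000     # occasional build/test output

def context_after(messages: int) -> int:
    """Estimate total context tokens after a given number of exchanges."""
    total = BASELINE + FILE_READS_EARLY
    for i in range(1, messages + 1):
        total += PROMPT + RESPONSE
        if i % 5 == 0:    # assume a verbose command every ~5 messages
            total += COMMAND_SPIKE
    return total

for n in (10, 15, 30):
    print(n, context_after(n))  # 10 → 22000, 15 → 30500, 30 → 56000
```

The exact numbers don't matter; the shape does. Baseline is a one-time cost, while prompts, responses, and command spikes compound linearly, which is why long conversations dominate usage.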
The smart parts: caching and deferral
Claude Code doesn't load everything at full cost. Two mechanisms keep the baseline efficient:
Prompt caching -- The system prompt and CLAUDE.md are cached across requests. Cached tokens cost roughly 10% of standard pricing, so that 2K-token CLAUDE.md isn't costing you full price on every turn.
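The savings are easy to quantify with a quick back-of-envelope calculation. The rates below are placeholders, not real API prices; only the ~10% cache-read ratio comes from the text above:

```python
# Back-of-envelope caching savings. STANDARD_RATE is a placeholder
# price per input token, not a real API rate; cached reads are
# assumed to cost ~10% of standard, per the text above.

STANDARD_RATE = 3.00 / 1_000_000       # placeholder: $ per input token
CACHE_READ_RATE = STANDARD_RATE * 0.10

def claude_md_cost(tokens: int, turns: int) -> tuple[float, float]:
    """Cost of resending CLAUDE.md every turn: (uncached, cached)."""
    uncached = tokens * turns * STANDARD_RATE
    # First turn pays full price to populate the cache; later turns read it.
    cached = tokens * STANDARD_RATE + tokens * (turns - 1) * CACHE_READ_RATE
    return uncached, cached

uncached, cached = claude_md_cost(tokens=2_000, turns=30)
print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}")
```

Over a 30-turn session, a 2K-token CLAUDE.md costs roughly an eighth of what it would without caching, under these assumptions.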
Deferred loading -- Skill descriptions and MCP tool schemas are loaded as short summaries at session start. The full content only loads when you actually invoke them. This means having 20 MCP tools configured doesn't blow up your context -- only the ones you use add significant tokens.
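The deferral pattern looks roughly like this. All class and field names here are hypothetical, invented to illustrate the idea; word counts stand in for real token counts:

```python
# Sketch of deferred loading: short summaries enter context at session
# start; a tool's full schema is only paid for on first invocation.
# Names are hypothetical; word count is a crude stand-in for tokens.

from dataclasses import dataclass, field

@dataclass
class Tool:
    name: str
    summary: str       # short description, loaded at session start
    full_schema: str   # loaded only when the tool is actually used

@dataclass
class Context:
    tokens: int = 0
    loaded: set = field(default_factory=set)

    def register(self, tool: Tool) -> None:
        """Session start: pay only for the summary."""
        self.tokens += len(tool.summary.split())

    def invoke(self, tool: Tool) -> None:
        """First use: pay for the full schema, exactly once."""
        if tool.name not in self.loaded:
            self.tokens += len(tool.full_schema.split())
            self.loaded.add(tool.name)
```

With 20 tools registered, you pay 20 small summary costs up front, but only the tools you invoke ever add their full schemas.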
What this means for you
The practical takeaway: your context window is mostly consumed by the conversation itself -- the back-and-forth of prompts, responses, file reads, and command outputs. The baseline overhead is small and well-optimized. Focus your token management on keeping conversations short and targeted, not on trimming your system prompt.