
Chapter 5: Context & Token Mastery

What's In Your Context Window?

Concept · 5 min

Every time you start a Claude Code session, several components are loaded into your context window before you even type your first prompt. Understanding this "baseline overhead" is key to managing tokens effectively.

The full picture

Here's everything that can live in your context window, and roughly how many tokens each piece costs:

| What | Token cost | When it loads |
| --- | --- | --- |
| System prompt | ~1-2K tokens | Always (Claude Code's base instructions, cached) |
| CLAUDE.md files | 200-2K+ tokens | Every session start |
| Auto memory (MEMORY.md) | 0-2K tokens | Session start (first 200 lines / 25KB) |
| Skill/command descriptions | 50-200 tokens each | Session start (full content loads on invocation) |
| MCP tool names | 100-500 tokens | Session start (full schemas deferred until use) |
| Git status | 100-500 tokens | Session start (current branch, uncommitted changes) |
| Your prompts | Varies | As you type them |
| Claude's responses | Varies | As Claude responds |
| File reads | Depends on file size | When Claude reads a file |
| Command output | Depends on verbosity | When Claude runs a command |

Total baseline overhead: 1.5-5.5K tokens before you type anything.

That might sound like a lot, but in a 1M-token context window, it's less than 1%. The real context consumers are file reads, long conversations, and verbose command outputs.
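To see how the total comes together, here is a rough sketch. The per-item ranges are the illustrative estimates from the table above, not measured values; the lumped "skill summaries" entry assumes three skills at 50-200 tokens each.

```python
# Illustrative baseline estimates (low, high) in tokens, taken from the
# table above; real numbers vary by project.
baseline = {
    "system prompt":   (1000, 2000),
    "CLAUDE.md":       (200, 2000),
    "auto memory":     (0, 2000),
    "skill summaries": (150, 600),   # assumes ~3 skills at 50-200 each
    "MCP tool names":  (100, 500),
    "git status":      (100, 500),
}

low = sum(lo for lo, _ in baseline.values())
high = sum(hi for _, hi in baseline.values())
print(f"Baseline: {low:,}-{high:,} tokens "
      f"({high / 1_000_000:.2%} of a 1M window at worst)")
```

Even the pessimistic end of this sketch stays under 1% of a 1M-token window, which is the point: the baseline is not where your tokens go.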

💡 Info

This is why starting a fresh session with /clear is so powerful -- it resets everything except the baseline overhead. Your CLAUDE.md gives Claude Code all the project context it needs to get right back to work.

Where the tokens actually go

In a typical session, here's how context usage breaks down:

  1. Baseline (system prompt, CLAUDE.md, git status): ~2-4K tokens
  2. Your first prompt: ~50-200 tokens
  3. Claude reads 2-3 files to understand the task: ~1-3K tokens
  4. Claude's response with a plan and code: ~500-2K tokens
  5. Follow-up prompts and responses: accumulates over the session
  6. Command outputs (build errors, test results): can spike to 1-5K tokens each

By message 10-15 in a conversation, you might be at 15-30K tokens. By message 30+, you could be at 50-100K. This is why focused conversations matter.
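The accumulation can be sketched with a toy simulation. Every figure below is an illustrative assumption (rough midpoints of the ranges above), not a measurement, including the guess that a file read happens every third turn:

```python
# Illustrative per-turn costs in tokens (assumed midpoints, not measured).
BASELINE = 3000   # system prompt + CLAUDE.md + git status
PROMPT = 150      # a typical user prompt
RESPONSE = 1250   # Claude's response with some code
FILE_READ = 2000  # an occasional file read

total = BASELINE
for turn in range(1, 31):
    total += PROMPT + RESPONSE
    if turn % 3 == 0:          # assume a file read every third turn
        total += FILE_READ
    if turn in (10, 30):
        print(f"After {turn} messages: ~{total / 1000:.0f}K tokens")
```

Under these assumptions the session lands around 23K tokens by message 10 and 65K by message 30, squarely in the ranges above; heavier file reads or verbose command outputs push it higher faster.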

The smart parts: caching and deferral

Claude Code doesn't load everything at full cost. Two mechanisms keep the baseline efficient:

Prompt caching -- The system prompt and CLAUDE.md are cached across requests. Cached tokens cost roughly 10% of standard pricing, so that 2K-token CLAUDE.md isn't costing you full price on every turn.
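A back-of-the-envelope sketch of what caching saves, assuming a hypothetical $3-per-million input-token price (substitute your model's real rate) and the roughly 10% cached rate mentioned above:

```python
# Hypothetical price per input token; substitute your model's real rate.
PRICE = 3.00 / 1_000_000   # e.g. $3 per million input tokens

cached_tokens = 4000       # system prompt + CLAUDE.md, cached after turn 1
turns = 20                 # requests in the session

# Without caching, the full 4K tokens are billed at full price every turn.
full_cost = cached_tokens * turns * PRICE
# With caching: full price once, then ~10% of the rate on cache hits.
cached_cost = (cached_tokens * PRICE
               + cached_tokens * (turns - 1) * PRICE * 0.10)
print(f"Uncached: ${full_cost:.4f}  With caching: ${cached_cost:.4f}")
```

The repeated baseline ends up costing a small fraction of its nominal price, which is why a reasonably sized CLAUDE.md is cheap to keep around.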

Deferred loading -- Skill descriptions and MCP tool schemas are loaded as short summaries at session start. The full content only loads when you actually invoke them. This means having 20 MCP tools configured doesn't blow up your context -- only the ones you use add significant tokens.

What this means for you

The practical takeaway: your context window is mostly consumed by the conversation itself -- the back-and-forth of prompts, responses, file reads, and command outputs. The baseline overhead is small and well-optimized. Focus your token management on keeping conversations short and targeted, not on trimming your system prompt.