Chapter 5: Context & Token Mastery
Token Management Strategies
You understand tokens and context windows. Now let's put that knowledge to work. Here are the most effective strategies for managing token usage, ranked by impact.
1. Start fresh between tasks
This is the single most impactful habit you can build. When you finish a task -- feature complete, bug fixed, refactoring done -- run /clear before starting the next one.
Claude Code confirms the reset:
Conversation history cleared. Context reset.
Do not carry a sorting conversation into a search feature conversation. The leftover context from the previous task adds noise, costs tokens, and can cause Claude to make connections that are not relevant.
Your CLAUDE.md provides the continuity you need. Every fresh session starts with the same project context, conventions, and patterns.
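To see why leftover context "costs tokens", here is a rough sketch of how input tokens add up over a conversation. It assumes every turn resends the whole conversation so far as input, and it ignores prompt caching and automatic compaction, which change the real numbers. The token figures are hypothetical placeholders, not measurements.

```python
def cumulative_input_tokens(turns: int, base_tokens: int, tokens_per_turn: int) -> int:
    """Total input tokens sent over a conversation of `turns` turns.

    Turn i is modeled as sending the base context plus everything
    accumulated so far: base_tokens + i * tokens_per_turn.
    """
    return sum(base_tokens + i * tokens_per_turn for i in range(1, turns + 1))


BASE_TOKENS = 2_000      # hypothetical: CLAUDE.md plus system context
TOKENS_PER_TURN = 1_000  # hypothetical: average growth per exchange

# One long conversation versus the same 30 turns split by /clear.
one_long_session = cumulative_input_tokens(30, BASE_TOKENS, TOKENS_PER_TURN)
three_short_sessions = 3 * cumulative_input_tokens(10, BASE_TOKENS, TOKENS_PER_TURN)

print(f"one 30-turn session:    {one_long_session:,} input tokens")
print(f"three 10-turn sessions: {three_short_sessions:,} input tokens")
```

Under these assumptions the long session sends 525,000 input tokens and the three short ones send 225,000 in total, because the history term grows with the square of the number of turns.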
2. Use /compact proactively
Do not wait for context to degrade. Run /compact after each major milestone. When you add a focus phrase, the confirmation reflects it:
Conversation compacted. Summary focused on search feature changes retained.
The optional focus phrase, typed after /compact, tells Claude what to prioritize in the summary. This is useful when a conversation has covered multiple topics but you only need to continue one thread.
Good times to compact:
- Feature is built but you want to iterate on it
- Bug is fixed and you are cleaning up
- Design iteration is done, moving to implementation
- You notice Claude referencing things from 20 messages ago
3. Choose the right model
Sonnet is roughly 3x cheaper than Opus per token. If you use Sonnet by default and only switch to Opus for complex reasoning, you save significantly over a work session.
Use /model to switch. No context loss, just different pricing going forward. See the previous section for a detailed decision framework.
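As a back-of-the-envelope check on the savings, the sketch below uses only the rough "3x cheaper" ratio from this section and expresses cost relative to an all-Opus session. The share of tokens handled by each model is a hypothetical input, and real pricing differs between input and output tokens, so treat the results as illustrative.

```python
OPUS_RELATIVE_PRICE = 1.0
SONNET_RELATIVE_PRICE = 1.0 / 3.0  # "roughly 3x cheaper", as stated in this section


def relative_session_cost(sonnet_share: float) -> float:
    """Cost of a mixed session as a fraction of an all-Opus session.

    sonnet_share is the fraction of tokens handled by Sonnet (0.0 to 1.0).
    """
    if not 0.0 <= sonnet_share <= 1.0:
        raise ValueError("sonnet_share must be between 0 and 1")
    opus_share = 1.0 - sonnet_share
    return sonnet_share * SONNET_RELATIVE_PRICE + opus_share * OPUS_RELATIVE_PRICE


for share in (0.0, 0.5, 0.8, 1.0):
    cost = relative_session_cost(share)
    print(f"{share:.0%} of tokens on Sonnet -> {cost:.0%} of the all-Opus cost")
```

With 80% of tokens on Sonnet, the session comes out at a bit under half the all-Opus cost under this simplified model.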
4. Delegate to subagents
When Claude Code launches an agent to research something -- reading multiple files, searching the codebase, investigating an error -- only the summary returns to your main context. The full research stays in the subagent's context and is discarded when the subagent finishes.
This can save 30-50% of context for investigation-heavy tasks. You get the answer without all the intermediate steps filling up your conversation.
You do not need to do anything special to trigger subagents. Claude Code decides automatically when to delegate. But knowing this pattern helps you understand why "research this codebase" does not blow up your context the way you might expect.
5. Disable unused MCP servers
Each MCP server adds tool definitions to your context. If you have a database server, a Slack server, and a GitHub server connected but you are only working on code, the unused servers are consuming tokens for no benefit.
Run /mcp to see what is connected:
Connected MCP servers: github, slack, postgres
Disconnect servers you are not using in your current workflow.
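To get a feel for that overhead, the sketch below estimates how many tokens a set of tool definitions might occupy. Everything in it is an assumption for illustration: the tool definitions are invented, and the 4-characters-per-token figure is a crude rule of thumb, not a real tokenizer. Use /context for actual numbers.

```python
import json

CHARS_PER_TOKEN = 4  # crude rule of thumb, not a real tokenizer

# Hypothetical tool definitions; real servers expose their own schemas.
SERVERS = {
    "github": [
        {"name": "create_issue", "description": "Create an issue in a repository.",
         "parameters": {"repo": "string", "title": "string", "body": "string"}},
        {"name": "list_pull_requests", "description": "List open pull requests.",
         "parameters": {"repo": "string"}},
    ],
    "slack": [
        {"name": "post_message", "description": "Post a message to a channel.",
         "parameters": {"channel": "string", "text": "string"}},
    ],
    "postgres": [
        {"name": "run_query", "description": "Run a read-only SQL query.",
         "parameters": {"sql": "string"}},
    ],
}


def estimate_tokens(tools: list) -> int:
    """Approximate token count of the serialized tool definitions."""
    return len(json.dumps(tools)) // CHARS_PER_TOKEN


def unused_overhead(servers: dict, in_use: set) -> dict:
    """Estimated tokens spent on servers the current task does not need."""
    return {name: estimate_tokens(tools)
            for name, tools in servers.items() if name not in in_use}


overhead = unused_overhead(SERVERS, in_use={"github"})
for name, tokens in overhead.items():
    print(f"{name}: ~{tokens} tokens of tool definitions")
print(f"total unused: ~{sum(overhead.values())} tokens")
```

Real servers often expose many more tools with longer schemas than these toy entries, so the actual overhead can be considerably larger.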
6. Control extended thinking
Claude Code's thinking effort affects token usage. Lower effort means less internal reasoning and fewer tokens consumed.
- /effort low -- For simple tasks like renaming, formatting, or lookups
- Default effort -- Fine for most coding work
- /effort high -- For complex reasoning, debugging, or architectural planning
After you run /effort low, Claude Code confirms the change:
Effort set to low. Claude will use less thinking for simpler responses.
Match the effort to the task. You do not need deep reasoning for "add a CSS class."
7. Filter verbose output with hooks
When a command produces 1,000 lines of output, Claude sees all of it. Test suites, build logs, and lint reports can dump thousands of tokens into your context.
Hooks can filter command output before it reaches context -- trimming test output to just failures, or build logs to just errors. This is an advanced technique covered in the Claude Code documentation.
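The filtering step itself can be quite small. The sketch below is not the Claude Code hook API; it is a standalone filter that a hook or wrapper script could call, and the failure patterns and sample output are hypothetical. It keeps only lines that look like failures and caps how many are passed along.

```python
import re

# Hypothetical patterns; adjust them to whatever your test runner prints.
FAILURE_PATTERN = re.compile(r"\b(FAIL|FAILED|ERROR)\b")


def summarize(text: str, max_lines: int = 50) -> str:
    """Reduce command output to a one-line header plus the failure lines."""
    lines = text.splitlines()
    failures = [line for line in lines if FAILURE_PATTERN.search(line)]
    shown = failures[:max_lines]
    out = [f"{len(lines)} lines of output, {len(failures)} matched failure patterns"]
    out.extend(shown)
    if len(failures) > len(shown):
        out.append(f"... {len(failures) - len(shown)} more omitted")
    return "\n".join(out)


# Invented sample output standing in for a real test run.
sample = "\n".join([
    "PASS tests/sort.test.js",
    "FAIL tests/search.test.js",
    "  ERROR expected 3 results, received 0",
    "PASS tests/filter.test.js",
])
print(summarize(sample))
```

The cap matters as much as the pattern: a run with hundreds of failures would otherwise still flood the context.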
The #1 mistake intermediate users make is running one long conversation for an entire work session. Short, focused conversations with /clear between tasks produce better results AND cost less.
Try it yourself
Put these strategies into practice right now:
- Open Claude Code in your todo-app
- Run /cost -- note the starting point
- Ask Claude to explain 3 different files
- Run /cost again -- notice the increase
- Run /compact
- Run /cost one more time -- the session total is cumulative, so it will not drop
- Run /context to see the reduced active context
This exercise gives you a concrete feel for how quickly tokens accumulate with file reads, and how effectively /compact reclaims context space. Once you have seen the numbers, you will naturally start managing your sessions more intentionally.