Claude Code Cache Prompt Management Playbook

The no-dev explanation

Imagine Claude Code has to reread a giant binder every time you ask a question: system instructions, tool list, project rules, files it already read, chat history, and your new request. Prompt caching is like saying, "The first part of this binder is identical to last time, so reuse it."

Good: Keep the beginning of the session stable. Pick your model, connect MCP servers, choose output style, and load project rules at the start.

Careful: New messages are fine because they are added at the end. Cache usually breaks when something changes near the beginning, like model, tools, or system prompt.

Bad: Switching models, changing MCP tool availability, or compacting mid-task can cause slower or more expensive turns while the cache rebuilds.

One sentence rule

Cache loves boring consistency. Set up the session once, then keep working in the same lane until a natural break.

Start stableWork in one task laneCompact at breaksUse subagents for side questsClear between unrelated tasks

The request stack Claude Code sends

Claude Code sends context every turn. The cache works by exact prefix matching, so earlier layers are more sensitive than later layers.

System prompt

Most sensitive

Project context

Loaded at start

Conversation

Appends normally

Plain English: changing "rules of the game" is expensive. Adding another normal message is expected.

Before you start Claude Code

Do this	Why it helps cache/context
Open the correct repo folder first.	Starting in the right place avoids rebuilding around a different project context.
Pick the model once.	Each model has its own cache. Switching models forces a fresh read.
Connect MCP servers before serious work.	Tool definitions affect the system prompt. Changing tool availability mid-session can invalidate cache.
Check `CLAUDE.md` and auto memory.	Project rules and memories load at session start. Stable, concise memory reduces repeated explanations.
Choose output style before the session matters.	Output style is part of the system prompt and loads at session start.

While working

Habit	Use it when
`/usage` or a status line	You want to know whether cache reads are high and creation tokens are low.
`/recap`	You want a session summary without replacing history.
`/compact Focus on...`	You reached a natural break and want a smaller conversation going forward.
`/clear`	You are switching to an unrelated task and stale context would confuse Claude.
Subagents	You need research, logs, or file digging that would pollute your main conversation.

The simple decision tree

Same task? Keep going. Do not switch model or MCP tools.

Side investigation? Use a subagent so the main context stays clean.

Long conversation but still same project? Use /compact at a clean checkpoint and tell Claude what to preserve.

Completely different task? Use /rename, then /clear, then start clean.

Copy-paste prompt library

Use these directly in Claude Code. They are written for someone who does not want to think about cache mechanics every time.

Stable session start prompt

Before changing files, orient yourself to this repo. 1. Read the top-level README, package/config files, and CLAUDE.md if present. 2. Summarize the project in beginner-friendly language. 3. Identify the main folders and how the app runs. 4. Tell me what assumptions you are making. 5. Do not switch models, change MCP tools, or compact unless I ask.

Add durable project rules to CLAUDE.md

Add a concise section to CLAUDE.md called "Context management rules". Include: - Start by reading this file before making major changes. - Prefer targeted file reads over scanning huge folders. - Use subagents for broad research or log digging. - Ask before changing MCP configuration, model, or output style. - When compacting, preserve architecture decisions, commands that worked, unresolved blockers, and file paths changed. Keep it short and specific.

Safe compaction prompt

/compact Preserve only the information needed to continue this implementation: - Current goal and acceptance criteria - Files changed and why - Commands that worked or failed - Important design decisions - Known bugs, blockers, and next steps - Any user preferences or constraints I explicitly gave you Discard dead-end exploration, repeated logs, and irrelevant discussion.

Cache performance troubleshooting

Help me diagnose whether this Claude Code session is wasting context or breaking cache. Check for: - Recent model switches - MCP server connect/disconnect events - broad deny rules like Bash or WebFetch - unnecessary compaction - repeated large file reads - stale or oversized CLAUDE.md content - unrelated tasks mixed into this session Give me a simple "keep / compact / clear / restart" recommendation.

Subagent prompt for a side quest

Use a subagent for this research so the main conversation stays clean. Task: Find the files and code paths related to [FEATURE]. Return only: - 5-10 most relevant files - short explanation of how they connect - risks or confusing parts - recommended next implementation step Do not dump long file contents into the main conversation.

MCP stability checklist prompt

Before we continue, check whether my MCP setup is stable for this task. Tell me: - Which MCP tools you can currently see - Which ones are actually needed - Whether any tool/server changes would likely disrupt cache - Whether I should restart now or leave everything alone Do not modify MCP settings unless I explicitly approve.

Cache warmth simulator

Select actions you plan to take during a Claude Code session. The app estimates whether your next turn stays warm or likely rebuilds cache.

Continue same task and ask another targeted question. Switch models with /model mid-session. Connect/disconnect an MCP server while working. Run /compact in the middle of a task. Use /recap for a summary. Use a subagent for broad research. Edit CLAUDE.md and expect it to apply immediately. Edit source files normally and let Claude reread only what it needs.

Result

100

Warm

Your session is stable. Keep going in the same lane.

Constraints and edge cases

Situation	What actually happens	Beginner takeaway
Editing `CLAUDE.md` mid-session	Does not usually invalidate cache, but the new content also does not apply until `/clear`, `/compact`, or restart.	Edit it before the session matters, or restart after major changes.
Changing output style	It is part of the system prompt and is read at session start; a change applies after `/clear` or a new session.	Pick your style up front.
Switching models	Each model has its own cache.	Do not bounce between models unless the cold turn is worth it.
MCP server changes	Tool definitions affect the system prompt layer.	Connect needed servers first; avoid unstable tool lists mid-task.
File edits	Editing a file does not rewrite old conversation history. Claude rereads changed files when needed.	Normal coding is fine. The issue is changing session structure, not editing code.
`/compact`	Replaces history with a summary, so the old conversation prefix changes.	Compact at natural breaks and tell Claude what to preserve.
`/recap`	Generates a displayed summary without replacing the conversation history.	Use it when you just need a status summary.
Subagents	They run in their own context/cache and return summaries to the parent session.	Great for research that would clutter the main thread.
Subscription vs API key	Claude subscription sessions can use one-hour TTL automatically; API/third-party defaults are usually five minutes unless configured.	On API usage, long breaks can cool the cache faster.
Privacy/local transcripts	Claude Code can store local session transcripts in plaintext under `~/.claude/projects/` by default for resumption.	Be careful with sensitive projects and retention settings.

Your project cache notes

This scratchpad saves in this browser. If hosted in an environment with window.vibes.save/load, it uses that; otherwise it falls back to localStorage.

Personal checklist

□ I picked the correct repo folder.

□ I chose the model I plan to use for this session.

□ MCP servers are connected and stable.

□ CLAUDE.md is short, specific, and current.

□ I know when I will compact: after the feature phase, not randomly.

□ I will use subagents for side research.

References and where this app's guidance came from

This app paraphrases these official Anthropic / Claude sources and turns them into beginner workflows. It does not call the Claude API or any external service.

[1] Anthropic API Docs -- Prompt caching
Used for cache prefixes, cache hit vs creation behavior, TTL behavior, and repetitive long-prompt guidance.
platform.claude.com/docs/en/build-with-claude/prompt-caching

[2] Claude Code Docs -- Prompt caching
Used for Claude Code-specific automatic caching, exact prefix matching, invalidators, cache scope, TTL by authentication method, and subagent cache behavior.
code.claude.com/docs/en/prompt-caching

[3] Claude Code Docs -- Memory
Used for CLAUDE.md, auto memory, /memory, and memory loading behavior.
code.claude.com/docs/en/memory

[4] Claude Code Docs -- Manage costs effectively
Used for context-size cost guidance, /usage, /clear, /rename, /resume, and custom compaction instructions.
code.claude.com/docs/en/costs

[5] Claude Code Docs -- Output styles
Used for output-style behavior and system-prompt impact.
code.claude.com/docs/en/output-styles

[6] Claude Code Docs -- Common workflows
Used for beginner prompt patterns: overview, finding relevant code, planning before editing, resuming, worktrees, and delegating research.
code.claude.com/docs/en/common-workflows

[7] Claude Code Docs -- Subagents
Used for the recommendation to isolate broad research/log digging into subagents to preserve main context.
code.claude.com/docs/en/sub-agents

[8] Claude Code Docs -- Data usage
Used for the note that local Claude Code transcripts can be stored in plaintext under ~/.claude/projects/ by default for session resumption.
code.claude.com/docs/en/data-usage

Keep Claude Code fast, cheap, and focused.