Prompt caching
Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. The first matching request writes cache tokens (cacheWrite), and later matching requests can read them back (cacheRead).
Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn even when most input did not change.
This page covers all cache-related knobs that affect prompt reuse and token cost.
For provider-specific cache behavior and current pricing, check the provider docs:
https://docs.anthropic.com/docs/build-with-claude/prompt-caching
Primary knobs
cacheRetention (model and per-agent)
Set cache retention on model params:
agents.defaults.models["provider/model"].paramsagents.list[].params(matching agent id; overrides by key)
Legacy cacheControlTtl
Legacy values are still accepted and mapped:
5m->short1h->long
cacheRetention for new config.
contextPruning.mode: "cache-ttl"
Prunes old tool-result context after cache TTL windows so post-idle requests do not re-cache oversized history.
Heartbeat keep-warm
Heartbeat can keep cache windows warm and reduce repeated cache writes after idle gaps.agents.list[].heartbeat.
Provider behavior
Anthropic (direct API)
cacheRetentionis supported.- With Anthropic API-key auth profiles, Fased seeds
cacheRetention: "short"for Anthropic model refs when unset.
OpenRouter Anthropic models
Foropenrouter/anthropic/* model refs, Fased injects Anthropic cache_control on system/developer prompt blocks to improve prompt-cache reuse.
Other providers
If the provider does not support this cache mode,cacheRetention has no effect.
Tuning patterns
Mixed traffic (recommended default)
Keep a long-lived baseline on your main agent, disable caching on bursty notifier agents:Cost-first baseline
- Set baseline
cacheRetention: "short". - Enable
contextPruning.mode: "cache-ttl". - Keep heartbeat below your TTL only for agents that benefit from warm caches.
Cache diagnostics
Fased exposes dedicated cache-trace diagnostics for embedded agent runs.diagnostics.cacheTrace config
filePath:$FASED_STATE_DIR/logs/cache-trace.jsonlincludeMessages:trueincludePrompt:trueincludeSystem:true
Env toggles (one-off debugging)
FASED_CACHE_TRACE=1enables cache tracing.FASED_CACHE_TRACE_FILE=/path/to/cache-trace.jsonloverrides output path.FASED_CACHE_TRACE_MESSAGES=0|1toggles full message payload capture.FASED_CACHE_TRACE_PROMPT=0|1toggles prompt text capture.FASED_CACHE_TRACE_SYSTEM=0|1toggles system prompt capture.
What to inspect
- Cache trace events are JSONL and include staged snapshots like
session:loaded,prompt:before,stream:context, andsession:after. - Per-turn cache token impact is visible in the Control UI Usage page via
cacheReadandcacheWrite. - Chat
/usage fullcan still show a per-response footer for the current session, but the Usage page is the better history view for provider/model/accounting review. - Cache trace files are operator diagnostics; use Logs or Advanced > Debug when you need raw trace inspection.
Quick troubleshooting
- High
cacheWriteon most turns: check for volatile system-prompt inputs and verify model/provider supports your cache settings. - No effect from
cacheRetention: confirm model key matchesagents.defaults.models["provider/model"]. - Configured providers without compatible cache support may ignore Fased cache settings or show no cache-token effect. Bedrock-compatible Anthropic model configs can pass explicit
cacheRetentionwhen configured through Advanced Config or a custom provider block.