Prompt caching

Prompt caching lets the model provider reuse unchanged prompt prefixes across turns. Those prefixes are usually system/developer instructions and other stable context. The first matching request writes cache tokens (cacheWrite). Later matching requests can read them back (cacheRead). Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. This page covers all cache-related knobs that affect prompt reuse and token cost. For provider-specific cache behavior and current pricing, check the provider docs:

Anthropic prompt caching

Primary knobs

`cacheRetention` (model and per-agent)

Set cache retention on model params:

agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "short" # none | short | long

Per-agent override:

agents:
  list:
    - id: "alerts"
      params:
        cacheRetention: "none"

Config merge order:

agents.defaults.models["provider/model"].params
agents.list[].params (matching agent id; overrides by key)

Legacy `cacheControlTtl`

Legacy values are still accepted and mapped:

5m -> short
1h -> long

Prefer cacheRetention for new config.

`contextPruning.mode: "cache-ttl"`

Prunes old tool-result context after cache TTL windows so post-idle requests do not re-cache oversized history.

agents:
  defaults:
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"

See Session Pruning for full behavior.

Heartbeat keep-warm

Heartbeat can keep cache windows warm and reduce repeated cache writes after idle gaps.

agents:
  defaults:
    heartbeat:
      every: "55m"

Per-agent heartbeat is supported at agents.list[].heartbeat.

Provider behavior

Anthropic (direct API)

cacheRetention is supported.
With Anthropic API-key auth profiles, Fased seeds cacheRetention: "short" for Anthropic model refs when unset.

OpenRouter Anthropic models

For openrouter/anthropic/* model refs, Fased injects Anthropic cache_control on system/developer prompt blocks to improve prompt-cache reuse.

Other providers

If the provider does not support this cache mode, cacheRetention has no effect.

Tuning patterns

Mixed traffic (recommended default)

Keep a long-lived baseline on your main agent, disable caching on bursty notifier agents:

agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-6"
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "long"
  list:
    - id: "research"
      default: true
      heartbeat:
        every: "55m"
    - id: "alerts"
      params:
        cacheRetention: "none"

Cost-first baseline

Set baseline cacheRetention: "short".
Enable contextPruning.mode: "cache-ttl".
Keep heartbeat below your TTL only for agents that benefit from warm caches.

Cache diagnostics

Fased exposes dedicated cache-trace diagnostics for embedded agent runs.

`diagnostics.cacheTrace` config

diagnostics:
  cacheTrace:
    enabled: true
    filePath: "~/.fased/logs/cache-trace.jsonl" # optional
    includeMessages: false # default true
    includePrompt: false # default true
    includeSystem: false # default true

Defaults:

filePath: $FASED_STATE_DIR/logs/cache-trace.jsonl
includeMessages: true
includePrompt: true
includeSystem: true

Env toggles (one-off debugging)

FASED_CACHE_TRACE=1 enables cache tracing.
FASED_CACHE_TRACE_FILE=/path/to/cache-trace.jsonl overrides output path.
FASED_CACHE_TRACE_MESSAGES=0|1 toggles full message payload capture.
FASED_CACHE_TRACE_PROMPT=0|1 toggles prompt text capture.
FASED_CACHE_TRACE_SYSTEM=0|1 toggles system prompt capture.

What to inspect

Cache trace events are JSONL and include staged snapshots like session:loaded, prompt:before, stream:context, and session:after.
Per-turn cache token impact is visible in the Control UI Usage page via cacheRead and cacheWrite.
Chat /usage full can still show a per-response footer for the current session. The Usage page is the better history view for provider, model, and accounting review.
Cache trace files are operator diagnostics; use Logs or Advanced > Debug when you need raw trace inspection.

Quick troubleshooting

High cacheWrite on most turns: check for volatile system-prompt inputs and verify model/provider supports your cache settings.
No effect from cacheRetention: confirm model key matches agents.defaults.models["provider/model"].
Providers without compatible cache support may ignore Fased cache settings or show no cache-token effect.
Bedrock-compatible Anthropic model configs can pass explicit cacheRetention when configured through Advanced Config or a custom provider block.

Related docs:

CLI overview

Setup and lifecycle

Runtime and ops

Agents and tasks

Channels and devices

Models and plugins

Network and economy

RPC and API

Workspace templates

Prompt, memory, and cost

Setup internals

Formatting and behavior

Design

Project and release

Prompt Caching

Prompt caching

Primary knobs

`cacheRetention` (model and per-agent)

Legacy `cacheControlTtl`

`contextPruning.mode: "cache-ttl"`

Heartbeat keep-warm

Provider behavior

Anthropic (direct API)

OpenRouter Anthropic models

Other providers

Tuning patterns

Mixed traffic (recommended default)

Cost-first baseline

Cache diagnostics

`diagnostics.cacheTrace` config

Env toggles (one-off debugging)

What to inspect

Quick troubleshooting

​Prompt caching

​Primary knobs

​cacheRetention (model and per-agent)

​Legacy cacheControlTtl

​contextPruning.mode: "cache-ttl"

​Heartbeat keep-warm

​Provider behavior

​Anthropic (direct API)

​OpenRouter Anthropic models

​Other providers

​Tuning patterns

​Mixed traffic (recommended default)

​Cost-first baseline

​Cache diagnostics

​diagnostics.cacheTrace config

​Env toggles (one-off debugging)

​What to inspect

​Quick troubleshooting

Prompt caching

Primary knobs

`cacheRetention` (model and per-agent)

Legacy `cacheControlTtl`

`contextPruning.mode: "cache-ttl"`

Heartbeat keep-warm

Provider behavior

Anthropic (direct API)

OpenRouter Anthropic models

Other providers

Tuning patterns

Mixed traffic (recommended default)

Cost-first baseline

Cache diagnostics

`diagnostics.cacheTrace` config

Env toggles (one-off debugging)

What to inspect

Quick troubleshooting