Token use & costs

Fased tracks tokens, not characters. Tokens are model-specific, but most OpenAI-style models average ~4 characters per token for English text.
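The ~4 characters per token rule can be turned into a quick estimator for rough budgeting. This is a minimal Python sketch. `estimate_tokens` is a hypothetical helper, not part of Fased, and real counts always come from the model's own tokenizer.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for English text (~4 chars/token rule of thumb).

    Hypothetical helper for budgeting only; actual counts depend on the
    model's tokenizer and can differ noticeably for code or non-English text.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)

print(estimate_tokens("x" * 20_000))  # 5000
```

By this estimate, a bootstrap file at the default 20000-character cap costs on the order of 5,000 tokens on every run.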

How the system prompt is built

Fased assembles its own system prompt on every run. It includes:

Tool list + short descriptions
Skills list (metadata only; full skill instructions are loaded on demand with the read tool)
Self-update instructions
Workspace + bootstrap files: AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md (for a new workspace), plus the canonical MEMORY.md and the compatibility memory.md when present.
Large bootstrap files are truncated to agents.defaults.bootstrapMaxChars characters each (default: 20000), and the total bootstrap injection is capped at agents.defaults.bootstrapTotalMaxChars characters (default: 150000).
memory/*.md files are not auto-injected; they are loaded on demand through the memory tools.
Time (UTC + user timezone)
Reply tags + heartbeat behavior
Runtime metadata (host/OS/model/thinking)
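The interaction between the per-file cap and the total cap can be sketched in Python. This minimal sketch assumes files are injected in the order given and that truncation keeps the beginning of each file. Neither assumption is confirmed here, and `apply_bootstrap_caps` is a hypothetical helper.

```python
PER_FILE_MAX = 20_000   # agents.defaults.bootstrapMaxChars default
TOTAL_MAX = 150_000     # agents.defaults.bootstrapTotalMaxChars default

def apply_bootstrap_caps(files, per_file_max=PER_FILE_MAX, total_max=TOTAL_MAX):
    """Return {filename: injected_text} after applying both caps.

    Assumptions (not confirmed by Fased): files are injected in the given
    order, and truncation keeps the head of each file.
    """
    injected = {}
    remaining = total_max
    for name, text in files:
        if remaining <= 0:
            break  # in this sketch, files past the total budget are skipped
        chunk = text[:per_file_max][:remaining]  # per-file cap, then total cap
        injected[name] = chunk
        remaining -= len(chunk)
    return injected

files = [("AGENTS.md", "a" * 30_000), ("SOUL.md", "b" * 5_000)]
result = apply_bootstrap_caps(files)
print({name: len(text) for name, text in result.items()})
# {'AGENTS.md': 20000, 'SOUL.md': 5000}
```

With the defaults, the total cap only matters once the combined bootstrap text exceeds 150000 characters, for example with eight or more files at the per-file limit.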

See the full breakdown in System Prompt.

What counts in the context window

Everything the model receives counts toward the context limit:

System prompt (all sections listed above)
Conversation history (user + assistant messages)
Tool calls and tool results
Attachments/transcripts (images, audio, files)
Compaction summaries and pruning artifacts
Provider wrappers or safety headers (not visible, but still counted)
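Every item above draws on the same budget, so adding them up before a long run can show how much headroom is left. This is a minimal Python sketch. The field names are hypothetical, the 200,000-token limit is only an example (the real limit depends on the model), and the provider overhead term cannot be measured directly.

```python
from dataclasses import dataclass, fields

@dataclass
class ContextUsage:
    """Hypothetical per-request token breakdown; field names are illustrative."""
    system_prompt: int = 0
    history: int = 0
    tool_io: int = 0
    attachments: int = 0
    compaction: int = 0
    provider_overhead: int = 0  # wrappers/safety headers: counted but not visible

    def total(self) -> int:
        return sum(getattr(self, f.name) for f in fields(self))

def headroom(usage: ContextUsage, context_limit: int) -> int:
    """Tokens left before the context limit (negative means over budget)."""
    return context_limit - usage.total()

u = ContextUsage(system_prompt=12_000, history=90_000, tool_io=60_000, attachments=30_000)
print(u.total(), headroom(u, 200_000))  # 192000 8000
```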

For images, Fased downscales transcript/tool image payloads before provider calls. Use agents.defaults.imageMaxDimensionPx (default: 1200) to tune this:

Lower values usually reduce vision-token usage and payload size.
Higher values preserve more visual detail for OCR/UI-heavy screenshots.
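To see why lowering the limit saves tokens, the following Python sketch assumes Fased scales the longest side down to imageMaxDimensionPx while preserving aspect ratio. It also uses Anthropic's published rough estimate of about (width × height) / 750 tokens per image. Both helpers are hypothetical, and other providers price images differently.

```python
def downscale_dims(width, height, max_dim=1200):
    """Scale so the longest side is at most max_dim, keeping the aspect ratio.

    Assumption: this mirrors how imageMaxDimensionPx is applied; images that
    already fit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

def approx_anthropic_image_tokens(width, height):
    """Anthropic's documented rough approximation: width * height / 750."""
    return round(width * height / 750)

w, h = downscale_dims(1500, 1000)  # (1200, 800)
print(approx_anthropic_image_tokens(1500, 1000), approx_anthropic_image_tokens(w, h))  # 2000 1280
```

Because token cost scales with pixel area, halving the maximum dimension cuts the estimate to roughly a quarter.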

For a practical breakdown of each injected file, the tools, the skills, and the overall system prompt size, use /context list or /context detail. See Context.

How to see token usage

Use the Usage page in the Control UI for the local usage history. It reports model calls from:

chat sessions
channel deliveries
tasks and cron runs
CLI/system runs when usage records exist
session-store fallback counters only when no better run/transcript record exists

The Usage page groups by provider, model, agent, channel, task, session, and source. It shows input, output, cache read, cache write, and total tokens, plus cost when pricing exists. If pricing is missing, it shows tokens only.

Chat commands are session-level controls:

/status shows the current session model, context estimate, recent response token data, and session key.
/usage off|tokens|full appends an optional per-response footer to the current session.
/usage cost shows a local cost summary from stored usage records.
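The following minimal Python sketch shows grouping usage records and the tokens-only fallback. The record shape and key names are assumptions for illustration, not Fased's storage format. Treating total tokens as input + output + cache read + cache write is also an assumption.

```python
from collections import defaultdict

TOKEN_COLUMNS = ("input", "output", "cache_read", "cache_write")

def summarize(records, key=("provider", "model")):
    """Group hypothetical usage records by `key` and sum their token columns.

    A group's cost becomes None (tokens only) as soon as one of its records
    has no price, since a partial dollar total would be misleading.
    """
    groups = defaultdict(lambda: {**{c: 0 for c in TOKEN_COLUMNS}, "cost": 0.0})
    for rec in records:
        g = groups[tuple(rec[k] for k in key)]
        for col in TOKEN_COLUMNS:
            g[col] += rec.get(col, 0)
        if g["cost"] is not None:
            g["cost"] = None if rec.get("cost") is None else g["cost"] + rec["cost"]
    for g in groups.values():
        g["total"] = sum(g[c] for c in TOKEN_COLUMNS)
    return dict(groups)

records = [
    {"provider": "anthropic", "model": "claude-opus-4-6", "input": 1200, "output": 300, "cost": 0.04},
    {"provider": "anthropic", "model": "claude-opus-4-6", "input": 800, "output": 200, "cost": 0.02},
    {"provider": "local", "model": "example-model", "input": 500, "output": 100, "cost": None},
]
summary = summarize(records)
```

Passing a different `key`, such as `("provider", "model", "session")`, gives a finer-grained view of the same records.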

CLI/provider status surfaces are separate:

fased status --usage and model/provider status commands can show provider quota windows.
Provider quota windows are account/provider snapshots, not local per-message billing totals.

Cost estimation (when shown)

Costs are estimated from your model pricing config:

models.providers.<provider>.models[].cost

These are USD per 1M tokens for input, output, cacheRead, and cacheWrite. If pricing is missing, Fased shows tokens only. Usage authenticated with OAuth tokens never shows a dollar cost.
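Since rates are USD per 1M tokens, the estimate is a weighted sum of the four token counts divided by 1,000,000. The following is a minimal Python sketch. The rates are illustrative placeholders rather than any provider's current pricing, and treating a missing rate field as zero is an assumption.

```python
def estimate_cost_usd(usage, cost, oauth=False):
    """Estimate USD from token counts and a per-1M-token price table.

    `cost` uses the same keys as models.providers.<provider>.models[].cost.
    Returns None (tokens only) when pricing is missing or auth is OAuth.
    Assumption: a missing rate field counts as 0.
    """
    if oauth or not cost:
        return None
    total = 0.0
    for field in ("input", "output", "cacheRead", "cacheWrite"):
        total += usage.get(field, 0) * cost.get(field, 0.0) / 1_000_000
    return total

# Illustrative rates only; use your provider's current pricing.
price = {"input": 15.0, "output": 75.0, "cacheRead": 1.5, "cacheWrite": 18.75}
usage = {"input": 10_000, "output": 2_000, "cacheRead": 100_000}
print(round(estimate_cost_usd(usage, price), 4))  # 0.45
```

Here each component contributes 0.15 USD. The 100,000 cache-read tokens cost the same as 10,000 fresh input tokens, which is why warm caches matter.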

Cache TTL and pruning impact

Provider prompt caching applies only within the cache TTL window. Fased can optionally run cache-ttl pruning: once the cache TTL has expired, it prunes the session and resets the cache window, so subsequent requests can re-use the freshly cached context instead of re-caching the full history. This keeps cache write costs lower when a session goes idle past the TTL. Configure it from Agent > Models where available, or from Gateway configuration for advanced fields. See Session pruning for the behavior details.

Heartbeat can keep the cache warm across idle gaps. If your model's cache TTL is 1h, setting the heartbeat interval just under that (e.g., 55m) can avoid re-caching the full prompt and reduce cache write costs.

In multi-agent setups, you can keep one shared model config and tune cache behavior per agent with agents.list[].params.cacheRetention. For a full knob-by-knob guide, see Prompt Caching.

Under Anthropic API pricing, cache reads are significantly cheaper than base input tokens, while cache writes are billed at a higher multiplier than base input. See Anthropic's prompt caching pricing for the latest rates and TTL multipliers:

Anthropic prompt caching
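To see why a heartbeat just under the TTL pays off, compare an idle gap with and without it. This Python sketch assumes a 2x multiplier for 1h cache writes and a 0.1x multiplier for cache reads relative to base input. It also assumes every request refreshes the cache TTL. All rates are illustrative, so verify them against Anthropic's pricing page. Heartbeat reply tokens are ignored.

```python
# Illustrative numbers only; check Anthropic's pricing page for real rates.
BASE_INPUT_PER_M = 15.0  # USD per 1M base input tokens (example value)
WRITE_1H_MULT = 2.0      # assumed multiplier for 1h cache writes
READ_MULT = 0.1          # assumed multiplier for cache reads

def prefix_cost(prefix_tokens, cache_hit):
    """USD to send the cached prompt prefix once, as a cache read or a fresh write."""
    mult = READ_MULT if cache_hit else WRITE_1H_MULT
    return prefix_tokens * BASE_INPUT_PER_M * mult / 1_000_000

def cost_after_idle(prefix_tokens, idle_min, heartbeat_min=None, ttl_min=60):
    """Prefix cost of heartbeats during an idle gap plus the next real request.

    Assumes the cache was written just before the gap and that every request
    (hit or write) refreshes the TTL. Heartbeat output tokens are ignored.
    """
    cost, last_request = 0.0, 0
    if heartbeat_min:
        t = heartbeat_min
        while t < idle_min:
            cost += prefix_cost(prefix_tokens, cache_hit=(t - last_request) < ttl_min)
            last_request = t
            t += heartbeat_min
    cost += prefix_cost(prefix_tokens, cache_hit=(idle_min - last_request) < ttl_min)
    return cost

print(round(cost_after_idle(100_000, 90), 2))                    # 3.0 (cache expired, full re-write)
print(round(cost_after_idle(100_000, 90, heartbeat_min=55), 2))  # 0.3 (two cache reads)
```

The heartbeat is only worth it when the prefix is large and idle gaps regularly exceed the TTL. Otherwise its own reads and replies add cost without avoiding any writes.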

Example: keep 1h cache warm with heartbeat

agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-6"
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "long"
    heartbeat:
      every: "55m"

Example: mixed traffic with per-agent cache strategy

agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-6"
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "long" # default baseline for most agents
  list:
    - id: "research"
      default: true
      heartbeat:
        every: "55m" # keep long cache warm for deep sessions
    - id: "alerts"
      params:
        cacheRetention: "none" # avoid cache writes for bursty notifications

agents.list[].params merges on top of the selected model’s params, so you can override only cacheRetention and inherit other model defaults unchanged.
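The merge rule can be pictured as a key-by-key override. This Python sketch assumes a shallow merge, where agent keys replace model keys and everything else is inherited. Whether Fased merges nested values more deeply is not stated here, and `effective_params` is a hypothetical helper.

```python
from typing import Optional

def effective_params(model_params: dict, agent_params: Optional[dict]) -> dict:
    """Agent params override the selected model's params key by key (shallow merge assumed)."""
    return {**model_params, **(agent_params or {})}

model = {"cacheRetention": "long", "context1m": True}
alerts = effective_params(model, {"cacheRetention": "none"})
research = effective_params(model, None)
print(alerts)    # {'cacheRetention': 'none', 'context1m': True}
print(research)  # {'cacheRetention': 'long', 'context1m': True}
```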

Example: enable Anthropic 1M context beta header

Anthropic’s 1M context window is currently beta-gated. Fased can inject the required anthropic-beta value when you enable context1m on supported Opus or Sonnet models.

agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          context1m: true

This maps to Anthropic’s context-1m-2025-08-07 beta header. If you authenticate Anthropic with OAuth/subscription tokens (sk-ant-oat-*), Fased skips the context-1m-* beta header because Anthropic currently rejects that combination with HTTP 401.
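The header rule described above can be sketched in Python. `anthropic_beta_headers` is a hypothetical helper, not Fased's code, and it omits the check for supported Opus or Sonnet models. It matches the "sk-ant-oat" prefix loosely so both the documented sk-ant-oat-* form and versioned variants are treated as OAuth tokens.

```python
from typing import Dict, Optional

CONTEXT_1M_BETA = "context-1m-2025-08-07"

def anthropic_beta_headers(context1m: bool, api_key: str,
                           existing: Optional[str] = None) -> Dict[str, str]:
    """Build the anthropic-beta header (comma-separated beta values).

    Adds the 1M-context beta only when context1m is on and the credential is
    not an OAuth/subscription token, since Anthropic rejects that combo with 401.
    """
    betas = [b.strip() for b in (existing or "").split(",") if b.strip()]
    is_oauth = api_key.startswith("sk-ant-oat")
    if context1m and not is_oauth and CONTEXT_1M_BETA not in betas:
        betas.append(CONTEXT_1M_BETA)
    return {"anthropic-beta": ",".join(betas)} if betas else {}

print(anthropic_beta_headers(True, "sk-ant-api03-example"))
# {'anthropic-beta': 'context-1m-2025-08-07'}
print(anthropic_beta_headers(True, "sk-ant-oat01-example"))
# {}
```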

Tips for reducing token pressure

Use /compact to summarize long sessions.
Trim large tool outputs in your workflows.
Lower agents.defaults.imageMaxDimensionPx for screenshot-heavy sessions.
Keep skill descriptions short (skill list is injected into the prompt).
Prefer smaller models for verbose, exploratory work.

See Skills for the exact skill list overhead formula.