Token use & costs
Fased tracks tokens, not characters. Tokens are model-specific, but most OpenAI-style models average ~4 characters per token for English text.
How the system prompt is built
Fased assembles its own system prompt on every run. It includes:
- Tool list + short descriptions
- Skills list (only metadata; instructions are loaded on demand with read)
- Self-update instructions
- Workspace + bootstrap files (AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md when new, plus canonical MEMORY.md and compatibility memory.md when present). Large files are truncated by agents.defaults.bootstrapMaxChars (default: 20000), and total bootstrap injection is capped by agents.defaults.bootstrapTotalMaxChars (default: 150000). memory/*.md files are on-demand via memory tools and are not auto-injected.
- Time (UTC + user timezone)
- Reply tags + heartbeat behavior
- Runtime metadata (host/OS/model/thinking)
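The two bootstrap caps and the ~4 characters per token rule of thumb can be combined into a rough size estimate. The sketch below is illustrative only: the function names are hypothetical, and the truncation strategy (keep the head of each file, then fill the total budget in order) is an assumption, not a description of how Fased truncates internally.

```python
# Hypothetical sketch. Only the two defaults come from the documented config
# keys; the truncation strategy and helper names are assumptions.
BOOTSTRAP_MAX_CHARS = 20_000         # agents.defaults.bootstrapMaxChars default
BOOTSTRAP_TOTAL_MAX_CHARS = 150_000  # agents.defaults.bootstrapTotalMaxChars default
CHARS_PER_TOKEN = 4                  # rough average for English text


def apply_bootstrap_caps(files, per_file=BOOTSTRAP_MAX_CHARS,
                         total=BOOTSTRAP_TOTAL_MAX_CHARS):
    """Return {name: injected_text} after the per-file and total caps."""
    injected = {}
    remaining = total
    for name, text in files.items():
        if remaining <= 0:
            break  # total budget exhausted; later files are skipped
        kept = text[:min(per_file, remaining)]
        injected[name] = kept
        remaining -= len(kept)
    return injected


def estimate_tokens(char_count, chars_per_token=CHARS_PER_TOKEN):
    """Very rough estimate; real tokenizers are model-specific."""
    return -(-char_count // chars_per_token)  # ceiling division


files = {"AGENTS.md": "a" * 30_000, "SOUL.md": "b" * 5_000}
injected = apply_bootstrap_caps(files)
total_chars = sum(len(text) for text in injected.values())
print(total_chars, estimate_tokens(total_chars))  # 25000 6250
```

Treat the result as a planning number only. Use the token counts Fased reports for anything billing-related.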
What counts in the context window
Everything the model receives counts toward the context limit:
- System prompt (all sections listed above)
- Conversation history (user + assistant messages)
- Tool calls and tool results
- Attachments/transcripts (images, audio, files)
- Compaction summaries and pruning artifacts
- Provider wrappers or safety headers (not visible, but still counted)
Use agents.defaults.imageMaxDimensionPx (default: 1200) to tune image size:
- Lower values usually reduce vision-token usage and payload size.
- Higher values preserve more visual detail for OCR/UI-heavy screenshots.
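To get a feel for the trade-off, the sketch below computes the dimensions an image would have after being bounded by a maximum dimension. It assumes the setting limits the longer side and preserves aspect ratio; the function name is hypothetical and the actual resizing rules in Fased may differ.

```python
def fit_within(width, height, max_dimension_px=1200):
    """Scale (width, height) so the longer side is at most max_dimension_px.

    Assumption: aspect ratio is preserved and images are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_dimension_px:
        return width, height
    scale = max_dimension_px / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 2400x1600 screenshot at the default limit keeps a quarter of its pixels.
new_w, new_h = fit_within(2400, 1600)
print(new_w, new_h, (new_w * new_h) / (2400 * 1600))  # 1200 800 0.25
```

Halving the longer side cuts the pixel count to roughly a quarter, which is why lower values tend to reduce payload size noticeably.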
For a breakdown of what currently occupies the context window, use /context list or /context detail. See Context.
How to see token usage
Use the Usage page in the Control UI for the local usage history. It reports model calls from:
- chat sessions
- channel deliveries
- tasks and cron runs
- CLI/system runs when usage records exist
- session-store fallback counters only when no better run/transcript record exists
Commands for checking usage:
- /status shows the current session model, context estimate, recent response token data, and session key.
- /usage off|tokens|full appends an optional per-response footer to the current session.
- /usage cost shows a local cost summary from stored usage records.
- fased status --usage and model/provider status commands can show provider quota windows.
- Provider quota windows are account/provider snapshots, not local per-message billing totals.
Cost estimation (when shown)
Costs are estimated from your model pricing config, which has four fields: input, output, cacheRead, and cacheWrite. If pricing is missing, Fased shows tokens only. Sessions authenticated with OAuth tokens never show a dollar cost.
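The estimate amounts to multiplying each token count by its matching price. The sketch below assumes prices are expressed in USD per one million tokens and that usage records carry the same four field names; both are assumptions for illustration, and the prices shown are placeholders rather than real rates.

```python
PER_MILLION = 1_000_000  # assumption: prices are USD per 1M tokens
FIELDS = ("input", "output", "cacheRead", "cacheWrite")


def estimate_cost(usage, pricing):
    """Return an estimated USD cost, or None when no pricing is configured."""
    if not pricing:
        return None  # mirrors the documented "tokens only" fallback
    total = sum(usage.get(field, 0) * pricing.get(field, 0.0) for field in FIELDS)
    return total / PER_MILLION


# Placeholder prices and token counts, for illustration only.
pricing = {"input": 3.0, "output": 15.0, "cacheRead": 0.3, "cacheWrite": 3.75}
usage = {"input": 1_000, "output": 500, "cacheRead": 20_000, "cacheWrite": 0}

cost = estimate_cost(usage, pricing)
print(f"${cost:.4f}")  # $0.0165
print(estimate_cost(usage, None))  # None
```

Returning None instead of 0.0 keeps "no pricing configured" distinguishable from "this call was free".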
Cache TTL and pruning impact
Provider prompt caching only applies within the cache TTL window. Fased can optionally run cache-ttl pruning: it prunes the session once the cache TTL has expired, then resets the cache window so subsequent requests can re-use the freshly cached context instead of re-caching the full history. This keeps cache write costs lower when a session goes idle past the TTL.
Configure it from Agent > Models where available, or from Gateway configuration for advanced fields, and see the behavior details in Session pruning.
Heartbeat can keep the cache warm across idle gaps. If your model cache TTL is 1h, setting the heartbeat interval just under that (e.g., 55m) can avoid re-caching the full prompt, reducing cache write costs.
In multi-agent setups, you can keep one shared model config and tune cache behavior
per agent with agents.list[].params.cacheRetention.
For a full knob-by-knob guide, see Prompt Caching.
For Anthropic API pricing, cache reads are significantly cheaper than input
tokens, while cache writes are billed at a higher multiplier. See Anthropic’s
prompt caching pricing for the latest rates and TTL multipliers:
https://docs.anthropic.com/docs/build-with-claude/prompt-caching
Example: keep 1h cache warm with heartbeat
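A minimal sketch of picking the interval, assuming each heartbeat run refreshes the provider cache and that a few minutes of margin is enough to absorb scheduling jitter. The helper names and the 5-minute margin are hypothetical; set the resulting value in your heartbeat configuration.

```python
def heartbeat_interval_minutes(cache_ttl_minutes, safety_margin_minutes=5):
    """Pick a heartbeat interval that fires just before the cache TTL expires."""
    if not 0 < safety_margin_minutes < cache_ttl_minutes:
        raise ValueError("margin must be positive and smaller than the TTL")
    return cache_ttl_minutes - safety_margin_minutes


def format_interval(minutes):
    """Render minutes in the short form used in this page, e.g. 55m."""
    return f"{minutes}m"


interval = heartbeat_interval_minutes(60)  # 1h cache TTL
print(format_interval(interval))  # 55m
```

A larger margin is safer but adds heartbeat runs, and each run still pays for cache reads and output tokens.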
Example: mixed traffic with per-agent cache strategy
agents.list[].params merges on top of the selected model’s params, so you can
override only cacheRetention and inherit other model defaults unchanged.
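The merge can be pictured as a dictionary overlay. This sketch assumes a shallow merge in which agent-level keys win; the function name, the extra temperature key, and the cacheRetention values shown are placeholders for illustration, not documented values.

```python
def effective_params(model_params, agent_params=None):
    """Overlay agent-level params on the selected model's params (shallow merge)."""
    return {**model_params, **(agent_params or {})}


# Shared model defaults (placeholder values).
model_params = {"cacheRetention": "long", "temperature": 0.2}

# One agent overrides only cacheRetention; another overrides nothing.
bursty_agent = effective_params(model_params, {"cacheRetention": "none"})
steady_agent = effective_params(model_params)

print(bursty_agent)
print(steady_agent)
```

Because the overlay builds a new dictionary, the shared model config itself is never mutated by a per-agent override.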
Example: enable Anthropic 1M context beta header
Anthropic’s 1M context window is currently beta-gated. Fased can inject the required anthropic-beta value when you enable context1m on supported Opus or Sonnet models. This maps to the context-1m-2025-08-07 beta header.
If you authenticate Anthropic with OAuth/subscription tokens (sk-ant-oat-*),
Fased skips the context-1m-* beta header because Anthropic currently
rejects that combination with HTTP 401.
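The decision described above can be sketched as a small function. The header name, the beta value, and the sk-ant-oat- prefix come from this page; the function name, the substring-based model check, and the example model and key strings are assumptions made for illustration.

```python
CONTEXT_1M_BETA = "context-1m-2025-08-07"
OAUTH_TOKEN_PREFIX = "sk-ant-oat-"


def anthropic_beta_headers(model, context1m, credential):
    """Return the extra headers to send, or an empty dict when none apply."""
    # Assumption: model support is detected by name. The real check may be stricter,
    # since only some Opus and Sonnet models are supported.
    supported = any(family in model.lower() for family in ("opus", "sonnet"))
    if not (context1m and supported):
        return {}
    if credential.startswith(OAUTH_TOKEN_PREFIX):
        return {}  # skipped: this combination is rejected with HTTP 401
    return {"anthropic-beta": CONTEXT_1M_BETA}


# Placeholder model names and credentials.
print(anthropic_beta_headers("example-sonnet-model", True, "sk-ant-api-placeholder"))
print(anthropic_beta_headers("example-sonnet-model", True, "sk-ant-oat-placeholder"))
```

The first call returns the beta header; the second returns an empty dict because the credential is an OAuth/subscription token.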
Tips for reducing token pressure
- Use /compact to summarize long sessions.
- Trim large tool outputs in your workflows.
- Lower agents.defaults.imageMaxDimensionPx for screenshot-heavy sessions.
- Keep skill descriptions short (skill list is injected into the prompt).
- Prefer smaller models for verbose, exploratory work.