Testing

Fased has three Vitest suites (unit/integration, e2e, live) and a small set of Docker runners. This doc is a “how we test” guide:
  • What each suite covers (and what it deliberately does not cover)
  • Which commands to run for common workflows (local, pre-push, debugging)
  • How live tests discover credentials and select models/providers
  • How to add regressions for real-world model/provider issues

Quick start

Most days:
  • Full gate (expected before push): pnpm build && pnpm check && pnpm test
  • Docs-only gate: pnpm check:docs
When you touch tests or want extra confidence:
  • Coverage gate: pnpm test:coverage
  • E2E suite: pnpm test:e2e
When debugging real providers/models (requires real creds):
  • Live suite (models + gateway tool/image probes): pnpm test:live
Tip: when you only need one failing case, prefer narrowing live tests via the allowlist env vars described below.
When a browser/runtime issue is not a test failure yet, collect operator evidence first:
  • Logs for the runtime error stream.
  • Usage for token/provider/model accounting.
  • Advanced > Debug for raw status snapshots and plugin/runtime diagnostics.
  • Advanced > Nodes for paired device/runtime failures.
Then add the smallest regression that covers the broken path.

Test suites (what runs where)

Think of the suites as “increasing realism” (and increasing flakiness/cost):

Unit / integration (default)

  • Command: pnpm test
  • Config: scripts/test-parallel.mjs (runs vitest.unit.config.ts, vitest.extensions.config.ts, vitest.gateway.config.ts)
  • Files: src/**/*.test.ts, extensions/**/*.test.ts
  • Scope:
    • Pure unit tests
    • In-process integration tests (gateway auth, routing, tooling, parsing, config)
    • Deterministic regressions for known bugs
  • Expectations:
    • Runs in CI
    • No real keys required
    • Should be fast and stable
  • Pool note:
    • Fased uses Vitest vmForks on Node 22/23 for faster unit shards.
    • On Node 24+, Fased automatically falls back to regular forks to avoid Node VM linking errors (ERR_VM_MODULE_LINK_FAILURE / module is already linked).
    • Override manually with FASED_TEST_VM_FORKS=0 (force forks) or FASED_TEST_VM_FORKS=1 (force vmForks).
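The pool rules above can be read as a small decision function. This is a minimal sketch, not Fased's actual implementation: resolveTestPool is a hypothetical name, and the behavior for Node versions below 22 is an assumption (the text only covers 22/23 and 24+).

```typescript
// Hypothetical helper mirroring the documented pool rules; not Fased's real code.
type TestPool = "vmForks" | "forks";

function resolveTestPool(
  nodeMajor: number,
  env: Record<string, string | undefined> = {},
): TestPool {
  const override = env.FASED_TEST_VM_FORKS;
  if (override === "1") return "vmForks"; // explicit opt-in wins
  if (override === "0") return "forks"; // explicit opt-out wins
  // Node 24+ falls back to forks to avoid ERR_VM_MODULE_LINK_FAILURE.
  // Assumption: anything below 24 keeps vmForks.
  return nodeMajor >= 24 ? "forks" : "vmForks";
}

console.log(resolveTestPool(22)); // "vmForks"
console.log(resolveTestPool(24)); // "forks"
console.log(resolveTestPool(24, { FASED_TEST_VM_FORKS: "1" })); // "vmForks"
```

If a local run hits module-linking errors on a newer Node, forcing FASED_TEST_VM_FORKS=0 is the quickest way to rule the pool out.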

E2E (gateway smoke)

  • Command: pnpm test:e2e
  • Config: vitest.e2e.config.ts
  • Files: src/**/*.e2e.test.ts
  • Runtime defaults:
    • Uses Vitest vmForks for faster file startup.
    • Uses adaptive workers (CI: 2-4, local: 4-8).
    • Runs in silent mode by default to reduce console I/O overhead.
  • Useful overrides:
    • FASED_E2E_WORKERS=<n> to force worker count (capped at 16).
    • FASED_E2E_VERBOSE=1 to re-enable verbose console output.
  • Scope:
    • Multi-instance gateway end-to-end behavior
    • WebSocket/HTTP surfaces, node pairing, and heavier networking
  • Expectations:
    • Runs in CI (when enabled in the pipeline)
    • No real keys required
    • More moving parts than unit tests (can be slower)
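The worker defaults and the FASED_E2E_WORKERS override described above could be resolved roughly like this. It is a sketch under assumptions: resolveE2eWorkers is a hypothetical name, and scaling by half the CPU count is a guess at what "adaptive" means; only the ranges (CI: 2-4, local: 4-8) and the cap of 16 come from this page.

```typescript
// Hypothetical sketch of the E2E worker-count rules; the scaling heuristic is assumed.
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

function resolveE2eWorkers(opts: {
  cpuCount: number;
  isCi: boolean;
  env?: Record<string, string | undefined>;
}): number {
  const raw = opts.env ? opts.env.FASED_E2E_WORKERS : undefined;
  const forced = raw === undefined ? NaN : Number.parseInt(raw, 10);
  // A valid positive override wins, but is capped at 16.
  if (Number.isInteger(forced) && forced > 0) return Math.min(forced, 16);
  // Assumption: use half the cores, clamped to the documented ranges.
  const half = Math.floor(opts.cpuCount / 2);
  return opts.isCi ? clamp(half, 2, 4) : clamp(half, 4, 8);
}

console.log(resolveE2eWorkers({ cpuCount: 16, isCi: true })); // 4
console.log(resolveE2eWorkers({ cpuCount: 4, isCi: false })); // 4
console.log(resolveE2eWorkers({ cpuCount: 8, isCi: false, env: { FASED_E2E_WORKERS: "32" } })); // 16
```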

Live (real providers + real models)

  • Command: pnpm test:live
  • Config: vitest.live.config.ts
  • Files: src/**/*.live.test.ts
  • Default: enabled by pnpm test:live (sets FASED_LIVE_TEST=1)
  • Scope:
    • “Does this provider/model actually work today with real creds?”
    • Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
  • Expectations:
    • Not CI-stable by design (real networks, real provider policies, quotas, outages)
    • Costs money / uses rate limits
    • Prefer running narrowed subsets instead of “everything”
    • Live runs will source ~/.profile to pick up missing API keys
  • API key rotation (provider-specific): set *_API_KEYS to a comma- or semicolon-separated list, or set numbered variables *_API_KEY_1, *_API_KEY_2, and so on (for example OPENAI_API_KEYS, ANTHROPIC_API_KEYS, GEMINI_API_KEYS). For a per-run live override, use FASED_LIVE_*_KEY. Tests retry on rate-limit responses.
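To make the two key formats concrete, here is a minimal sketch of how a key pool might be collected from the environment. collectApiKeys is a hypothetical helper, and trimming and de-duplicating are assumptions; the page only states the variable shapes.

```typescript
// Hypothetical helper: gathers keys from <PROVIDER>_API_KEYS and <PROVIDER>_API_KEY_<n>.
function collectApiKeys(
  provider: string,
  env: Record<string, string | undefined>,
): string[] {
  const keys: string[] = [];
  const list = env[provider + "_API_KEYS"];
  if (list !== undefined) {
    // Accept both comma and semicolon separators.
    for (const part of list.split(/[;,]/)) keys.push(part);
  }
  // Numbered variables: stop at the first missing index.
  for (let i = 1; ; i++) {
    const value = env[provider + "_API_KEY_" + i];
    if (value === undefined) break;
    keys.push(value);
  }
  // Assumption: ignore blanks and duplicates so each key is tried once.
  const cleaned = keys.map((k) => k.trim()).filter((k) => k.length > 0);
  return cleaned.filter((k, index) => cleaned.indexOf(k) === index);
}

const pool = collectApiKeys("OPENAI", {
  OPENAI_API_KEYS: "sk-a, sk-b;sk-c",
  OPENAI_API_KEY_1: "sk-c",
  OPENAI_API_KEY_2: "sk-d",
});
console.log(pool.join(" ")); // sk-a sk-b sk-c sk-d
```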

Which suite should I run?

Use this decision guide:
  • Editing logic/tests: run pnpm test (and pnpm test:coverage if you changed a lot)
  • Touching gateway networking / WS protocol / pairing: add pnpm test:e2e
  • Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed pnpm test:live
  • Debugging UI-only state mismatch: reproduce in the focused page first, then add or update the matching UI/controller test.

Wallet, mining, and Fased Network regressions

If you touch the operator stack, do not stop at generic unit tests. The minimum useful split is:
  • wallet and SAT UI rendering
  • wallet and SAT UI controllers
  • signer, passkey, mining, and Fased Network backend paths

Fast UI confidence

Run the main browser-surface regressions:
pnpm exec vitest run \
  ui/src/ui/views/wallet.test.ts \
  ui/src/ui/views/mining.test.ts \
  ui/src/ui/views/federation.test.ts

Controller and interaction confidence

Run the focused interaction tests:
pnpm exec vitest run \
  ui/src/ui/controllers/wallet.test.ts \
  ui/src/ui/controllers/mining.test.ts \
  ui/src/ui/wallet-passkey.test.ts \
  ui/src/ui/mining-commit.test.ts

Runtime and gateway confidence

Run the backend regressions that cover signer health, SAT HTTP flows, passkey approval auth, and Fased Network attach logic:
pnpm exec vitest run \
  src/commands/wallet.signer-doctor.test.ts \
  src/gateway/server.wallet-approval-auth-http.test.ts \
  src/gateway/server.sat-mining-http.test.ts \
  src/federation/auto-connect.test.ts
Practical rule:
  • changing Wallet docs or labels usually needs the Wallet and passkey UI tests
  • changing Mining math, capital, commit, or readiness needs Mining UI plus SAT HTTP tests
  • changing Fased Network bond or route logic needs Fased Network UI plus src/federation/auto-connect.test.ts

Live: Android node capability sweep

  • Test: src/gateway/android-node.capabilities.live.test.ts
  • Script: pnpm android:test:integration
  • Goal: invoke every command currently advertised by a connected Android node and assert command contract behavior.
  • Scope:
    • Preconditioned/manual setup (the suite does not install/run/pair the app).
    • Command-by-command gateway node.invoke validation for the selected Android node.
  • Required pre-setup:
    • Android app already connected + paired to the gateway.
    • App kept in foreground.
    • Permissions/capture consent granted for capabilities you expect to pass.
  • Optional target overrides:
    • FASED_ANDROID_NODE_ID or FASED_ANDROID_NODE_NAME.
    • FASED_ANDROID_GATEWAY_URL / FASED_ANDROID_GATEWAY_TOKEN / FASED_ANDROID_GATEWAY_PASSWORD.
  • Full Android setup details: Android App

Live: model smoke (profile keys)

Live tests are split into two layers so we can isolate failures:
  • “Direct model” tells us the provider/model can answer at all with the given key.
  • “Gateway smoke” tells us the full gateway+agent pipeline works for that model (sessions, history, tools, sandbox policy, etc.).

Layer 1: Direct model completion (no gateway)

  • Test: src/agents/models.profiles.live.test.ts
  • Goal:
    • Enumerate discovered models
    • Use getApiKeyForModel to select models you have creds for
    • Run a small completion per model (and targeted regressions where needed)
  • How to enable:
    • pnpm test:live (or FASED_LIVE_TEST=1 if invoking Vitest directly)
    • Set FASED_LIVE_MODELS=modern (or all, an alias for modern) to actually run this suite; otherwise it skips so that pnpm test:live stays focused on gateway smoke.
  • How to select models:
    • FASED_LIVE_MODELS=modern runs the current modern allowlist from code.
    • FASED_LIVE_MODELS=all is an alias for the modern allowlist.
    • FASED_LIVE_MODELS="<provider/model>,<provider/model>" runs only the listed models (comma allowlist).
  • How to select providers:
    • FASED_LIVE_PROVIDERS="google,google-gemini-cli" (comma allowlist)
  • Where keys come from:
    • By default: profile store and env fallbacks
    • Set FASED_LIVE_REQUIRE_PROFILE_KEYS=1 to enforce profile store only
  • Why this exists:
    • Separates “provider API is broken / key is invalid” from “gateway agent pipeline is broken”
    • Contains small, isolated regressions (example: OpenAI Responses/Codex Responses reasoning replay + tool-call flows)
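The model and provider selection rules for this layer can be sketched as a pair of parsers plus a predicate. The names (parseLiveModels, parseProviderFilter, shouldRunModel) are hypothetical, and the example model ids are placeholders rather than a real allowlist.

```typescript
// Hypothetical sketch of FASED_LIVE_MODELS / FASED_LIVE_PROVIDERS handling.
type ModelFilter =
  | { kind: "skip" } // variable unset: the direct-model suite skips
  | { kind: "modern" } // "modern" or its alias "all"
  | { kind: "allowlist"; ids: string[] };

function splitCsv(raw: string): string[] {
  return raw.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

function parseLiveModels(raw: string | undefined): ModelFilter {
  const value = (raw ?? "").trim();
  if (value === "") return { kind: "skip" };
  if (value === "modern" || value === "all") return { kind: "modern" };
  return { kind: "allowlist", ids: splitCsv(value) };
}

function parseProviderFilter(raw: string | undefined): string[] | null {
  const ids = splitCsv(raw ?? "");
  return ids.length === 0 ? null : ids; // null means "no provider restriction"
}

function shouldRunModel(
  modelId: string, // "<provider>/<model>"
  filter: ModelFilter,
  modernAllowlist: string[],
  providers: string[] | null,
): boolean {
  if (filter.kind === "skip") return false;
  const provider = modelId.split("/")[0];
  if (providers !== null && providers.indexOf(provider) === -1) return false;
  const allowed = filter.kind === "modern" ? modernAllowlist : filter.ids;
  return allowed.indexOf(modelId) !== -1;
}

// Placeholder ids for illustration only.
const modern = ["google/model-a", "openai/model-b"];
console.log(shouldRunModel("google/model-a", parseLiveModels("all"), modern, null)); // true
console.log(shouldRunModel("openai/model-b", parseLiveModels("modern"), modern, ["google"])); // false
console.log(shouldRunModel("google/model-a", parseLiveModels(undefined), modern, null)); // false
```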

Layer 2: Gateway + dev agent smoke (what “@fased” actually does)

  • Test: src/gateway/gateway-models.profiles.live.test.ts
  • Goal:
    • Spin up an in-process gateway
    • Create/patch an agent:dev:* session (model override per run)
    • Iterate models-with-keys and assert:
      • “meaningful” response (no tools)
      • a real tool invocation works (read probe)
      • optional extra tool probes (exec+read probe)
      • OpenAI regression paths (tool-call-only → follow-up) keep working
  • Probe details (so you can explain failures quickly):
    • read probe: the test writes a nonce file in the workspace and asks the agent to read it and echo the nonce back.
    • exec+read probe: the test asks the agent to exec-write a nonce into a temp file, then read it back.
    • image probe: the test attaches a generated PNG (cat + randomized code) and expects the model to return cat <CODE>.
    • Implementation reference: src/gateway/gateway-models.profiles.live.test.ts and src/gateway/live-image-probe.ts.
  • How to enable:
    • pnpm test:live (or FASED_LIVE_TEST=1 if invoking Vitest directly)
  • How to select models:
    • Default: current modern allowlist from code.
    • FASED_LIVE_GATEWAY_MODELS=all is an alias for the modern allowlist
    • Or set FASED_LIVE_GATEWAY_MODELS="provider/model" (or comma list) to narrow
  • How to select providers (avoid “OpenRouter everything”):
    • FASED_LIVE_GATEWAY_PROVIDERS="google,google-gemini-cli,openai,anthropic,zai,minimax" (comma allowlist)
  • Tool + image probes are always on in this live test:
    • read probe + exec+read probe (tool stress)
    • image probe runs when the model advertises image input support
    • Image probe flow (high level):
      • Test generates a tiny PNG with “CAT” + random code (src/gateway/live-image-probe.ts)
      • Sends it via agent attachments: [{ mimeType: "image/png", content: "<base64>" }]
      • Gateway parses attachments into images[] (src/gateway/server-methods/agent.ts + src/gateway/chat-attachments.ts)
      • Embedded agent forwards a multimodal user message to the model
      • Assertion: reply contains cat + the code (OCR tolerance: minor mistakes allowed)
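The final assertion of the image probe ("cat" plus the code, with OCR tolerance) could look something like the sketch below. This is not the code in src/gateway/live-image-probe.ts: replyMatchesImageProbe is a hypothetical name, and using edit distance with a one-character budget is an assumed way to model "minor mistakes allowed".

```typescript
// Levenshtein distance with a single rolling row.
function editDistance(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row[j] = j;
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Hypothetical assertion helper: the reply must mention "cat" and contain a token
// within maxMistakes edits of the expected code (assumed tolerance model).
function replyMatchesImageProbe(reply: string, code: string, maxMistakes = 1): boolean {
  const upper = reply.toUpperCase();
  if (upper.indexOf("CAT") === -1) return false;
  const tokens = upper.split(/[^A-Z0-9]+/).filter((t) => t.length > 0);
  const expected = code.toUpperCase();
  return tokens.some((t) => editDistance(t, expected) <= maxMistakes);
}

console.log(replyMatchesImageProbe("cat X7K2Q9", "X7K2Q9")); // true
console.log(replyMatchesImageProbe("I see a cat and the code X7K2O9.", "X7K2Q9")); // true (one misread character)
console.log(replyMatchesImageProbe("dog X7K2Q9", "X7K2Q9")); // false
```

A tolerance that is too loose lets hallucinated codes pass, so when a model fails this probe it is usually more useful to inspect the raw reply than to widen the budget.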
Tip: to see what you can test on your machine (and the exact provider/model ids), run:
fased models list
fased models list --json

Live: Anthropic setup-token smoke

  • Test: src/agents/anthropic.setup-token.live.test.ts
  • Goal: verify Claude Code CLI setup-token (or a pasted setup-token profile) can complete an Anthropic prompt.
  • Enable:
    • pnpm test:live (or FASED_LIVE_TEST=1 if invoking Vitest directly)
    • FASED_LIVE_SETUP_TOKEN=1
  • Token sources (pick one):
    • Profile: FASED_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test
    • Raw token: FASED_LIVE_SETUP_TOKEN_VALUE=<setup-token>
  • Model override (optional):
    • FASED_LIVE_SETUP_TOKEN_MODEL=<provider/model>
Setup example:
fased models auth paste-token --provider anthropic --profile-id anthropic:setup-token-test
FASED_LIVE_SETUP_TOKEN=1 FASED_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test pnpm test:live src/agents/anthropic.setup-token.live.test.ts

Live: CLI backend smoke (Claude Code CLI or other local CLIs)

  • Test: src/gateway/gateway-cli-backend.live.test.ts
  • Goal: validate the Gateway + agent pipeline using a local CLI backend, without touching your default config.
  • Enable:
    • pnpm test:live (or FASED_LIVE_TEST=1 if invoking Vitest directly)
    • FASED_LIVE_CLI_BACKEND=1
  • Defaults:
    • Model: the CLI backend default from code.
    • Command: claude
    • Args: ["-p","--output-format","json","--dangerously-skip-permissions"]
  • Overrides (optional):
    • FASED_LIVE_CLI_BACKEND_MODEL="<provider/model>"
    • FASED_LIVE_CLI_BACKEND_COMMAND="/full/path/to/claude"
    • FASED_LIVE_CLI_BACKEND_ARGS='["-p","--output-format","json","--permission-mode","bypassPermissions"]'
    • FASED_LIVE_CLI_BACKEND_CLEAR_ENV='["ANTHROPIC_API_KEY","ANTHROPIC_API_KEY_OLD"]'
    • FASED_LIVE_CLI_BACKEND_IMAGE_PROBE=1 to send a real image attachment (paths are injected into the prompt).
    • FASED_LIVE_CLI_BACKEND_IMAGE_ARG="--image" to pass image file paths as CLI args instead of prompt injection.
    • FASED_LIVE_CLI_BACKEND_IMAGE_MODE="repeat" (or "list") to control how image args are passed when IMAGE_ARG is set.
    • FASED_LIVE_CLI_BACKEND_RESUME_PROBE=1 to send a second turn and validate resume flow.
    • FASED_LIVE_CLI_BACKEND_DISABLE_MCP_CONFIG=0 to keep Claude Code CLI MCP config enabled (the default disables MCP config using a temporary empty file).
Example:
FASED_LIVE_CLI_BACKEND=1 \
  FASED_LIVE_CLI_BACKEND_MODEL="<provider/model>" \
  pnpm test:live src/gateway/gateway-cli-backend.live.test.ts
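The CLI backend overrides are JSON-encoded arrays plus a few flags, which is easy to get wrong when quoting in a shell. The sketch below shows one way the argument vector might be assembled. buildCliArgs is hypothetical, and the meaning of the image modes is an assumption: "repeat" is taken to emit the flag once per path, "list" to emit it once with comma-joined paths, with "repeat" as the default.

```typescript
// Parse a JSON array of strings from an env var, falling back to defaults.
function parseJsonStringArray(raw: string | undefined, fallback: string[]): string[] {
  if (raw === undefined || raw.trim() === "") return fallback.slice();
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || !parsed.every((v) => typeof v === "string")) {
    throw new Error("expected a JSON array of strings");
  }
  return parsed as string[];
}

// Default args as documented above.
const DEFAULT_CLI_ARGS = ["-p", "--output-format", "json", "--dangerously-skip-permissions"];

// Hypothetical assembly of the final argv; image-mode semantics are assumed.
function buildCliArgs(
  env: Record<string, string | undefined>,
  imagePaths: string[],
): string[] {
  const args = parseJsonStringArray(env.FASED_LIVE_CLI_BACKEND_ARGS, DEFAULT_CLI_ARGS);
  const imageArg = env.FASED_LIVE_CLI_BACKEND_IMAGE_ARG;
  if (imageArg !== undefined && imagePaths.length > 0) {
    const mode = env.FASED_LIVE_CLI_BACKEND_IMAGE_MODE ?? "repeat";
    if (mode === "list") {
      args.push(imageArg, imagePaths.join(","));
    } else {
      for (const path of imagePaths) args.push(imageArg, path);
    }
  }
  return args;
}

console.log(buildCliArgs({}, []).join(" "));
// -p --output-format json --dangerously-skip-permissions
console.log(
  buildCliArgs(
    { FASED_LIVE_CLI_BACKEND_ARGS: '["-p"]', FASED_LIVE_CLI_BACKEND_IMAGE_ARG: "--image" },
    ["/tmp/a.png", "/tmp/b.png"],
  ).join(" "),
);
// -p --image /tmp/a.png --image /tmp/b.png
```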

For any live suite, narrow, explicit allowlists are fastest and least flaky:
  • Single model, direct (no gateway):
    • FASED_LIVE_MODELS="<provider/model>" pnpm test:live src/agents/models.profiles.live.test.ts
  • Single model, gateway smoke:
    • FASED_LIVE_GATEWAY_MODELS="<provider/model>" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
  • Tool calling across several providers you actively support:
    • FASED_LIVE_GATEWAY_MODELS="<provider/model>,<provider/model>" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
  • Google focus (Gemini API key + Gemini CLI):
    • Gemini API key route: FASED_LIVE_GATEWAY_MODELS="google/<model>" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
    • Gemini CLI route: FASED_LIVE_GATEWAY_MODELS="google-gemini-cli/<model>" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
Notes:
  • google/... uses the Gemini API (API key).
  • google-gemini-cli/... uses the local Gemini CLI on your machine (separate auth + tooling quirks).
  • Gemini API vs Gemini CLI:
    • API: Fased calls Google’s hosted Gemini API over HTTP (API key / profile auth); this is what most users mean by “Gemini”.
    • CLI: Fased shells out to a local gemini binary; it has its own auth and can behave differently (streaming/tool support/version skew).

Live: model matrix

There is no fixed public model list in this doc. Live tests should follow what the local provider registry and credentials support on the machine running the suite. Use:
fased models list
fased models scan
Pick a small matrix that covers:
  • at least one direct hosted model path;
  • at least one Gateway + Agent smoke path;
  • one model with tool calling;
  • one image-capable model if the change touches attachments or vision;
  • any custom/OpenAI-compatible endpoint you depend on.
Keep exact provider/model allowlists in local runbooks or CI variables, not in this help page.

Credentials (never commit)

Live tests discover credentials the same way the CLI does. Practical implications:
  • If the CLI works, live tests should find the same keys.
  • If a live test says “no creds”, debug the same way you’d debug fased models list / model selection.
  • Profile store: ~/.fased/credentials/ (preferred; what “profile keys” means in the tests)
  • Config: ~/.fased/fased.json (or FASED_CONFIG_PATH)
If you want to rely on env keys (e.g. exported in your ~/.profile), run source ~/.profile before running local tests, or use the Docker runners below (they can mount ~/.profile into the container).

Deepgram live (audio transcription)

  • Test: src/media-understanding/providers/deepgram/audio.live.test.ts
  • Enable: DEEPGRAM_API_KEY=... DEEPGRAM_LIVE_TEST=1 pnpm test:live src/media-understanding/providers/deepgram/audio.live.test.ts

BytePlus coding plan live

  • Test: src/agents/byteplus.live.test.ts
  • Enable: BYTEPLUS_API_KEY=... BYTEPLUS_LIVE_TEST=1 pnpm test:live src/agents/byteplus.live.test.ts
  • Optional model override: BYTEPLUS_CODING_MODEL=<model-id>

Docker runners (optional “works in Linux” checks)

These run pnpm test:live inside the repo Docker image, mounting your local config dir and workspace (and sourcing ~/.profile if mounted):
  • Direct models: pnpm test:docker:live-models (script: scripts/test-live-models-docker.sh)
  • Gateway + dev agent: pnpm test:docker:live-gateway (script: scripts/test-live-gateway-models-docker.sh)
  • Onboarding wizard (TTY, full scaffolding): pnpm test:docker:onboard (script: scripts/e2e/onboard-docker.sh)
  • Gateway networking (two containers, WS auth + health): pnpm test:docker:gateway-network (script: scripts/e2e/gateway-network-docker.sh)
  • Plugins (custom extension load + registry smoke): pnpm test:docker:plugins (script: scripts/e2e/plugins-docker.sh)
Manual ACP plain-language thread smoke (not CI):
  • bun scripts/dev/discord-acp-plain-language-smoke.ts --channel <discord-channel-id> ...
  • Keep this script for regression/debug workflows. It may be needed again for ACP thread routing validation, so do not delete it.
Useful env vars:
  • FASED_CONFIG_DIR=... (default: ~/.fased) mounted to /home/node/.fased
  • FASED_WORKSPACE_DIR=... (default: ~/.fased/workspace) mounted to /home/node/.fased/workspace
  • FASED_PROFILE_FILE=... (default: ~/.profile) mounted to /home/node/.profile and sourced before running tests
  • FASED_LIVE_GATEWAY_MODELS=... / FASED_LIVE_MODELS=... to narrow the run
  • FASED_LIVE_REQUIRE_PROFILE_KEYS=1 to ensure creds come from the profile store (not env)
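The mount defaults above can be summarized as a mapping from env vars to container paths. The real runners are shell scripts; this TypeScript sketch only illustrates the defaults. resolveDockerMounts is a hypothetical helper, and rendering the mounts as docker -v arguments is an assumption about how the scripts wire them up.

```typescript
// Hypothetical helper: resolves host paths from env (with documented defaults)
// and pairs them with the documented container paths.
function resolveDockerMounts(
  env: Record<string, string | undefined>,
  home: string,
): string[] {
  const configDir = env.FASED_CONFIG_DIR ?? home + "/.fased";
  const workspaceDir = env.FASED_WORKSPACE_DIR ?? home + "/.fased/workspace";
  const profileFile = env.FASED_PROFILE_FILE ?? home + "/.profile";
  return [
    "-v", configDir + ":/home/node/.fased",
    "-v", workspaceDir + ":/home/node/.fased/workspace",
    "-v", profileFile + ":/home/node/.profile",
  ];
}

console.log(resolveDockerMounts({}, "/home/me").join(" "));
// -v /home/me/.fased:/home/node/.fased -v /home/me/.fased/workspace:/home/node/.fased/workspace -v /home/me/.profile:/home/node/.profile
```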

Docs sanity

Run docs checks after doc edits:
pnpm check:docs
pnpm docs:list

Offline regression (CI-safe)

These are “real pipeline” regressions without real providers:
  • Gateway tool calling (mock OpenAI, real gateway + agent loop): src/gateway/gateway.test.ts (case: “runs a mock OpenAI tool call end-to-end via gateway agent loop”)
  • Gateway wizard (WS wizard.start/wizard.next, writes config + auth enforced): src/gateway/gateway.test.ts (case: “runs wizard over ws and writes auth token config”)

Agent reliability evals (skills)

We already have a few CI-safe tests that behave like “agent reliability evals”:
  • Mock tool-calling through the real gateway + agent loop (src/gateway/gateway.test.ts).
  • End-to-end wizard flows that validate session wiring and config effects (src/gateway/gateway.test.ts).
What’s still missing for skills (see Skills):
  • Decisioning: when skills are listed in the prompt, does the agent pick the right skill (or avoid irrelevant ones)?
  • Compliance: does the agent read SKILL.md before use and follow required steps/args?
  • Workflow contracts: multi-turn scenarios that assert tool order, session history carryover, and sandbox boundaries.
Future evals should stay deterministic first:
  • A scenario runner using mock providers to assert tool calls + order, skill file reads, and session wiring.
  • A small suite of skill-focused scenarios (use vs avoid, gating, prompt injection).
  • Optional live evals (opt-in, env-gated) only after the CI-safe suite is in place.
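As a rough illustration of the deterministic scenario runner idea, the sketch below scripts a mock provider's tool calls and records the order in which tools ran. Every name here (ProviderStep, runScenario, the read and exec tools, the SKILL.md path) is hypothetical; no such runner exists in the repo yet according to this page.

```typescript
// A scripted "provider": each step is either a tool call or the final reply.
type ProviderStep =
  | { kind: "tool"; name: string; args: Record<string, string> }
  | { kind: "final"; text: string };

type ToolTable = Record<string, (args: Record<string, string>) => string>;

interface ScenarioResult {
  toolOrder: string[];
  toolOutputs: string[];
  reply: string;
}

// Replays the script against in-memory tools and records what happened.
function runScenario(script: ProviderStep[], tools: ToolTable): ScenarioResult {
  const result: ScenarioResult = { toolOrder: [], toolOutputs: [], reply: "" };
  for (const step of script) {
    if (step.kind === "final") {
      result.reply = step.text;
      break;
    }
    const tool = tools[step.name];
    if (tool === undefined) throw new Error("unknown tool: " + step.name);
    result.toolOrder.push(step.name);
    result.toolOutputs.push(tool(step.args));
  }
  return result;
}

// Compliance-style scenario: the skill file must be read before anything is executed.
const skillFiles: Record<string, string> = { "skills/deploy/SKILL.md": "1. run checks" };
const scenario = runScenario(
  [
    { kind: "tool", name: "read", args: { path: "skills/deploy/SKILL.md" } },
    { kind: "tool", name: "exec", args: { cmd: "pnpm check" } },
    { kind: "final", text: "done" },
  ],
  {
    read: (args) => skillFiles[args.path] ?? "",
    exec: (args) => "ran " + args.cmd,
  },
);
console.log(scenario.toolOrder.join(" -> ")); // read -> exec
```

Because nothing here touches a network or a real model, a scenario like this can run in the default suite and fail deterministically when tool order or skill reads regress.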

Adding regressions (guidance)

When you fix a provider/model issue discovered in live:
  • Add a CI-safe regression if possible (mock/stub provider, or capture the exact request-shape transformation)
  • If it’s inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars
  • Prefer targeting the smallest layer that catches the bug:
    • provider request conversion/replay bug → direct models test
    • gateway session/history/tool pipeline bug → gateway live smoke or CI-safe gateway mock test
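A CI-safe regression for a request-shape bug can be very small. The sketch below pins down one invented case in the spirit of the "tool-call-only → follow-up" path mentioned earlier: an assistant turn with tool calls but no text must survive history replay. Turn and buildReplay are hypothetical stand-ins, not the project's real types, and a real regression would live in a Vitest file rather than a plain function.

```typescript
interface Turn {
  role: "user" | "assistant" | "tool";
  text: string;
  toolCalls?: { id: string; name: string }[];
}

// Hypothetical replay builder: drops empty turns, but keeps assistant turns
// that carry tool calls even when their text is empty.
function buildReplay(history: Turn[]): Turn[] {
  return history.filter(
    (turn) => turn.text.trim() !== "" || (turn.toolCalls?.length ?? 0) > 0,
  );
}

// The regression: no network, no keys, just the exact shape that once broke.
function assertToolCallOnlyTurnSurvives(): void {
  const history: Turn[] = [
    { role: "user", text: "read nonce.txt" },
    { role: "assistant", text: "", toolCalls: [{ id: "call_1", name: "read" }] },
    { role: "tool", text: "nonce-123" },
    { role: "assistant", text: "" }, // truly empty: safe to drop
  ];
  const roles = buildReplay(history).map((turn) => turn.role).join(",");
  if (roles !== "user,assistant,tool") {
    throw new Error("unexpected replay: " + roles);
  }
}

assertToolCallOnlyTurnSurvives();
```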