Local models

Fased ships the provider clients and configuration surfaces. It does not ship a local model server or model weights. Install and run Ollama, LM Studio, vLLM, LiteLLM, SGLang, or another compatible server separately, then connect it from Agent > Models.

Local models can work well when they have enough context and tool-calling quality. Small or heavily quantized models may truncate context, miss policy cues, or make weaker tool decisions. Use the strongest model your hardware can run reliably, keep hosted fallbacks configured when you need higher reliability, and review local-model exposure with the same care as any private endpoint. See Security.

Advanced local stack: LM Studio

Load a strong model in LM Studio, enable the local server, and connect Fased through the first-class LM Studio provider in Agent > Models.

{
  agents: {
    defaults: {
      model: { primary: "lmstudio/qwen/qwen3.5-9b" },
      models: {
        "anthropic/claude-opus-4-6": { alias: "Opus" },
        "lmstudio/qwen/qwen3.5-9b": { alias: "LM Studio" },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      lmstudio: {
        baseUrl: "http://127.0.0.1:1234/v1",
        apiKey: "lmstudio-local",
        api: "openai-completions",
        request: { allowPrivateNetwork: true },
        models: [
          {
            id: "qwen/qwen3.5-9b",
            name: "qwen/qwen3.5-9b",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 128000,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}

Setup checklist

Install LM Studio: https://lmstudio.ai
In LM Studio, download the largest model you can run reliably, start the server, and confirm http://127.0.0.1:1234/api/v1/models lists it.
Keep the model loaded; cold-load adds startup latency.
Adjust contextWindow and maxTokens to match the context length and output limit the model is actually loaded with in LM Studio.
If LM Studio auth is disabled, leave the token blank in Agent > Models.

Keep hosted models configured even when running local. Use models.mode: "merge" so fallbacks stay available.

Hybrid config: hosted primary, local fallback

{
  agents: {
    defaults: {
      model: {
        primary: "anthropic/claude-sonnet-4-5",
        fallbacks: ["lmstudio/qwen/qwen3.5-9b", "anthropic/claude-opus-4-6"],
      },
      models: {
        "anthropic/claude-sonnet-4-5": { alias: "Sonnet" },
        "lmstudio/qwen/qwen3.5-9b": { alias: "LM Studio" },
        "anthropic/claude-opus-4-6": { alias: "Opus" },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      lmstudio: {
        baseUrl: "http://127.0.0.1:1234/v1",
        apiKey: "lmstudio-local",
        api: "openai-completions",
        request: { allowPrivateNetwork: true },
        models: [
          {
            id: "qwen/qwen3.5-9b",
            name: "qwen/qwen3.5-9b",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 128000,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}

Local-first with hosted safety net

Swap the primary and fallback order. Keep the same providers block and models.mode: "merge" so you can fall back to Sonnet or Opus when the local box is down.
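
As a minimal sketch of the swapped order, reusing the model IDs from the hybrid example, only the agents.defaults.model block needs to change. The aliases and the models.providers.lmstudio block are assumed to stay exactly as shown in the hybrid config.

```json5
{
  agents: {
    defaults: {
      model: {
        // Local model answers first.
        primary: "lmstudio/qwen/qwen3.5-9b",
        // Hosted models take over when the local server is down.
        fallbacks: ["anthropic/claude-sonnet-4-5", "anthropic/claude-opus-4-6"],
      },
    },
  },
}
```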

Regional hosting / data routing

MiniMax, Kimi, and GLM models are also available as hosted variants on OpenRouter with region-pinned endpoints (for example, US-hosted). Pick the regional variant to keep traffic in your chosen jurisdiction, and keep models.mode: "merge" so Anthropic and OpenAI fallbacks stay available.
Local-only remains the strongest privacy path. Hosted regional routing is the middle ground when you need provider features but still want control over where data flows.

Registry-Supported Local Routes

Normal setup should use Agent > Models in the Control UI. Add or select one of these provider surfaces there:

Ollama for a native local/cloud/hybrid Ollama server.
LM Studio for the local server on localhost:1234.
vLLM for a vLLM server.
LiteLLM for a LiteLLM gateway.
Custom Provider for SGLang or any other private OpenAI-compatible endpoint.

Ollama

For the normal UI path:

Provider: Ollama
Base URL: http://127.0.0.1:11434
API key: optional for local-only
Model: llama3.3

Ollama uses its native API, so the base URL has no /v1 suffix. Do not point it at the OpenAI-compatible /v1 path unless you deliberately choose Custom Provider for compatibility testing.
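
To confirm the server is reachable on its native API before adding it in Agent > Models, a quick check along these lines may help. It is a sketch rather than a Fased command: the function names are hypothetical, it relies on Ollama's /api/tags endpoint returning a models array with a name field per model, and it uses plain grep and sed so nothing else needs to be installed.

```shell
# Extract model names from an Ollama /api/tags response read on stdin.
# (Hypothetical helper; swap in jq if you have it installed.)
ollama_model_names() {
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/^"name"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Print the models the native Ollama API reports.
# The base URL deliberately has no /v1 suffix.
# Usage: ollama_models [BASE_URL]
ollama_models() {
  base_url=${1:-http://127.0.0.1:11434}
  curl -sS --max-time 5 "$base_url/api/tags" | ollama_model_names
}
```

Running ollama_models with no arguments queries the base URL from the UI settings above. Names are typically reported with a tag, so llama3.3 may appear as llama3.3:latest.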

Other OpenAI-compatible Local Proxies

vLLM, LiteLLM, SGLang, OAI-proxy, or custom gateways work if they expose an OpenAI-style /v1 endpoint. Replace the lmstudio provider block from the earlier examples with a block for your own endpoint and model ID:

{
  models: {
    mode: "merge",
    providers: {
      local: {
        baseUrl: "http://127.0.0.1:8000/v1",
        apiKey: "sk-local",
        api: "openai-completions",
        request: { allowPrivateNetwork: true },
        models: [
          {
            id: "my-local-model",
            name: "Local Model",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 120000,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}

Keep models.mode: "merge" so hosted models stay available as fallbacks. Private, loopback, and LAN model provider URLs require request.allowPrivateNetwork: true. Public hosted providers do not need this flag.

Troubleshooting

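Can the gateway reach the proxy? Run curl http://127.0.0.1:1234/v1/models from the gateway host and check that the model is listed.
LM Studio model unloaded? Reload it; a cold start is a common cause of requests that appear to hang.
Context errors? Lower contextWindow in the provider config or raise the context limit on your server.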
Safety: local models skip provider-side filters; keep agents narrow and compaction on to limit prompt injection blast radius.
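
The first two checks above can be combined into one small script. This is a sketch under stated assumptions: check_model and list_model_ids are hypothetical helper names, not Fased commands, and the parsing assumes the usual OpenAI-style /v1/models response shape with a data array of objects that each carry an id.

```shell
# Extract every "id" value from an OpenAI-style /v1/models response on stdin.
# Plain grep/sed keeps this dependency-free; jq works too if installed.
list_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/^"id"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Succeed only if the server answers and lists the given model ID.
# Usage: check_model BASE_URL MODEL_ID
check_model() {
  base_url=$1
  model_id=$2
  # --max-time turns a hung or cold-loading server into a fast failure.
  response=$(curl -sS --max-time 5 "$base_url/models") || {
    echo "unreachable: $base_url" >&2
    return 1
  }
  if printf '%s\n' "$response" | list_model_ids | grep -Fxq "$model_id"; then
    echo "ok: $model_id is listed"
  else
    echo "missing: $model_id is not listed at $base_url" >&2
    return 2
  fi
}
```

For the LM Studio example, the call would be check_model http://127.0.0.1:1234/v1 qwen/qwen3.5-9b. A return code of 1 points at reachability, and 2 points at an unloaded model or a mismatched model ID.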

The local chat-model path is separate from in-process local memory embeddings. See Core And Optional Components and Memory for the optional node-llama-cpp boundary.
