Hugging Face (Inference)

Hugging Face Inference Providers offer OpenAI-compatible chat completions through a single router API. You get access to many models, including DeepSeek, Llama, and more, with one token. Fased uses the OpenAI-compatible endpoint for chat completions only. For text-to-image, embeddings, or speech, use the HF inference clients directly. Hugging Face is a current Fased model provider route. The route id is huggingface; model availability comes from the HF router plus Fased’s built-in catalog fallback.

Provider: huggingface
Auth: HUGGINGFACE_HUB_TOKEN or HF_TOKEN (fine-grained token with Make calls to Inference Providers)
API: OpenAI-compatible (https://router.huggingface.co/v1)
Billing: single HF token; pricing follows the active provider route.

Quick start

Create a fine-grained token at Hugging Face → Settings → Tokens with the Make calls to Inference Providers permission.
Run onboarding and choose Hugging Face in the provider dropdown, then enter your API key when prompted:

fased onboard --auth-choice huggingface-api-key

In onboarding, pick the Hugging Face model you want. The list is loaded from the Inference API when you have a valid token; otherwise a built-in list is shown.
In the browser UI, open Agents, select the Agent, then use Agent > Models to assign Hugging Face as the Agent’s primary, fallback, or task model.
You can also set the Agent default model in config for automation:

{
  agents: {
    defaults: {
      model: { primary: "huggingface/openai/gpt-oss-120b" },
    },
  },
}

Non-interactive example

fased onboard --non-interactive \
  --mode local \
  --auth-choice huggingface-api-key \
  --huggingface-api-key "$HF_TOKEN"

This stores the key and can set huggingface/openai/gpt-oss-120b as the Agent default model.

Environment note

If the Gateway runs as a daemon (launchd/systemd), make sure HUGGINGFACE_HUB_TOKEN or HF_TOKEN is available to that process. Use ~/.fased/.env or env.shellEnv. Fased discovers models by calling the Inference endpoint directly:

GET https://router.huggingface.co/v1/models

For the full list, send Authorization: Bearer $HUGGINGFACE_HUB_TOKEN or Authorization: Bearer $HF_TOKEN. Some endpoints return a subset without auth. The response is OpenAI-style:

{ "object": "list", "data": [{ "id": "Qwen/Qwen3-8B", "owned_by": "Qwen" }] }

When you configure a Hugging Face API key through onboarding, HUGGINGFACE_HUB_TOKEN, or HF_TOKEN, Fased uses this request to discover available chat-completion models. During interactive onboarding, after you enter your token, the Default Hugging Face model dropdown is populated from that list. If the request fails, Fased uses the built-in catalog. At runtime, for example during Gateway startup, Fased calls GET https://router.huggingface.co/v1/models again when a key is present. The result is merged with the built-in catalog for metadata such as context window and cost. If the request fails or no key is set, only the built-in catalog is used.

Model names and editable options

Name from API: The model display name is hydrated from GET /v1/models when the API returns name, title, or display_name. Otherwise it is derived from the model id, for example openai/gpt-oss-120b → “GPT OSS 120B”.
Override display name: You can set a custom label per model in config so it appears the way you want in the CLI and UI:

{
  agents: {
    defaults: {
      models: {
        "huggingface/openai/gpt-oss-120b": { alias: "GPT-OSS 120B" },
        "huggingface/openai/gpt-oss-120b:cheapest": { alias: "GPT-OSS 120B (cheap)" },
      },
    },
  },
}

Provider / policy selection: Append a suffix to the model id to choose how the router picks the backend:
- :fastest — highest throughput (router picks; provider choice is locked — no interactive backend picker).
- :cheapest — lowest cost per output token (router picks; provider choice is locked).
- :provider — force a specific backend (e.g. :sambanova, :together).
When you select :cheapest or :fastest, the provider is locked: the router decides by cost or speed and no optional “prefer specific backend” step is shown. You can add these as separate entries in models.providers.huggingface.models or set model.primary with the suffix. You can also set your default order in Inference Provider settings. No suffix means Fased uses that order.
Config merge: Existing entries in models.providers.huggingface.models, for example in models.json, are kept when config is merged. Custom name, alias, and model options stay in place.

Model IDs and configuration examples

Model refs use the form huggingface/<org>/<model> with Hub-style IDs. The first-run list is curated from the official Chat Completion recommendations. When you have a valid token, the runtime can still discover more with GET https://router.huggingface.co/v1/models. Example IDs from Fased’s built-in Hugging Face catalog:

Model	Ref (prefix with `huggingface/`)
GPT-OSS 120B	`openai/gpt-oss-120b`
DeepSeek V4 Pro	`deepseek-ai/DeepSeek-V4-Pro`
Kimi K2.6	`moonshotai/Kimi-K2.6`
MiniMax M2.7	`MiniMaxAI/MiniMax-M2.7`
GLM 5.1	`zai-org/GLM-5.1`
Qwen3.6 35B A3B	`Qwen/Qwen3.6-35B-A3B`
Qwen3.5 397B A17B	`Qwen/Qwen3.5-397B-A17B`
Qwen3 Coder Next	`Qwen/Qwen3-Coder-Next`
Qwen3 Coder 480B	`Qwen/Qwen3-Coder-480B-A35B-Instruct`
Gemma 4 31B IT	`google/gemma-4-31B-it`

You can append :fastest, :cheapest, or :provider such as :together or :sambanova to the model id. Set your default order in Inference Provider settings. See Inference Providers and GET https://router.huggingface.co/v1/models for the full list.

Complete configuration examples

Primary GPT-OSS 120B with Qwen Coder fallback:

{
  agents: {
    defaults: {
      model: {
        primary: "huggingface/openai/gpt-oss-120b",
        fallbacks: ["huggingface/Qwen/Qwen3-Coder-480B-A35B-Instruct"],
      },
      models: {
        "huggingface/openai/gpt-oss-120b": { alias: "GPT-OSS 120B" },
        "huggingface/Qwen/Qwen3-Coder-480B-A35B-Instruct": { alias: "Qwen3 Coder" },
      },
    },
  },
}

Qwen as default, with :cheapest and :fastest variants:

{
  agents: {
    defaults: {
      model: { primary: "huggingface/Qwen/Qwen3.6-35B-A3B" },
      models: {
        "huggingface/Qwen/Qwen3.6-35B-A3B": { alias: "Qwen3.6 35B" },
        "huggingface/Qwen/Qwen3.6-35B-A3B:cheapest": { alias: "Qwen3.6 35B (cheap)" },
        "huggingface/Qwen/Qwen3.6-35B-A3B:fastest": { alias: "Qwen3.6 35B (fast)" },
      },
    },
  },
}

GPT-OSS + GLM + DeepSeek with aliases:

{
  agents: {
    defaults: {
      model: {
        primary: "huggingface/openai/gpt-oss-120b",
        fallbacks: ["huggingface/zai-org/GLM-5.1", "huggingface/deepseek-ai/DeepSeek-V4-Pro"],
      },
      models: {
        "huggingface/openai/gpt-oss-120b": { alias: "GPT-OSS 120B" },
        "huggingface/zai-org/GLM-5.1": { alias: "GLM 5.1" },
        "huggingface/deepseek-ai/DeepSeek-V4-Pro": { alias: "DeepSeek V4 Pro" },
      },
    },
  },
}

Force a specific backend with :provider:

{
  agents: {
    defaults: {
      model: { primary: "huggingface/deepseek-ai/DeepSeek-R1:together" },
      models: {
        "huggingface/deepseek-ai/DeepSeek-R1:together": { alias: "DeepSeek R1 (Together)" },
      },
    },
  },
}

Multiple Qwen and DeepSeek models with policy suffixes:

{
  agents: {
    defaults: {
      model: { primary: "huggingface/Qwen/Qwen3.6-35B-A3B:cheapest" },
      models: {
        "huggingface/Qwen/Qwen3.6-35B-A3B": { alias: "Qwen3.6 35B" },
        "huggingface/Qwen/Qwen3.6-35B-A3B:cheapest": { alias: "Qwen3.6 35B (cheap)" },
        "huggingface/deepseek-ai/DeepSeek-R1:fastest": { alias: "DeepSeek R1 (fast)" },
        "huggingface/openai/gpt-oss-120b": { alias: "GPT-OSS 120B" },
      },
    },
  },
}

Overview

Setup and routing

Common providers

Local and private

Additional providers

Hugging Face (Inference)

Hugging Face (Inference)

Quick start

Non-interactive example

Environment note

Model names and editable options

Model IDs and configuration examples

Complete configuration examples

​Hugging Face (Inference)

​Quick start

​Non-interactive example

​Environment note

​Model discovery and onboarding dropdown

​Model names and editable options

​Model IDs and configuration examples

​Complete configuration examples

Hugging Face (Inference)

Quick start

Non-interactive example

Environment note

Model discovery and onboarding dropdown

Model names and editable options

Model IDs and configuration examples

Complete configuration examples