LLM provider configuration

What this page is

This page describes how AuroraSOC supports hosted LLM providers (DeepSeek, Gemini, OpenAI, Anthropic) in addition to local Ollama/vLLM. It covers the configuration surface, where API keys are stored, the runtime admin API, and how a provider switch propagates.

Why it exists this way

Architecture §3.5 commits to a provider abstraction where agents can run against local or hosted inference keyed on sensitivity. Some hosts cannot run a local model at all, and trusted deployments may prefer a managed endpoint. ADR 022 records the decision: native BeeAI providers as first-class backends, runtime configuration via an admin API, and Vault-first key storage with an encrypted-database fallback.

How it works

Backends

ServingBackend (in granite/__init__.py) now includes deepseek, gemini, and anthropic alongside ollama, vllm, and openai. Each resolves to a BeeAI model reference carrying the provider's native prefix, so that ChatModel.from_name loads the right LiteLLM adapter:

deepseek:deepseek-v4-flash
gemini:gemini-2.5-pro
anthropic:claude-sonnet-4-6
openai:gpt-4o (also used for any OpenAI-compatible endpoint via a base URL)

The vLLM auto-probe (ADR 005) is unchanged and stays vLLM-only; the new providers are selected explicitly.
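The prefixed references listed above can be sketched as a small helper. This is a minimal illustration, not the code in the granite layer: the function name model_reference and the HOSTED_PREFIXES set are hypothetical, and it assumes the prefix is simply the backend name for the four hosted providers shown.

```python
# Hypothetical sketch: build "<prefix>:<model>" references for hosted providers.
# HOSTED_PREFIXES and model_reference are illustrative names, not AuroraSOC APIs.
HOSTED_PREFIXES = {"deepseek", "gemini", "anthropic", "openai"}


def model_reference(backend: str, model: str) -> str:
    """Return a BeeAI-style model reference such as 'gemini:gemini-2.5-pro'."""
    if backend not in HOSTED_PREFIXES:
        raise ValueError(f"Unsupported hosted backend: {backend}")
    if not model:
        # Mirrors the "No model configured" failure described under troubleshooting.
        raise ValueError(f"No model configured for {backend}")
    return f"{backend}:{model}"


print(model_reference("deepseek", "deepseek-v4-flash"))
print(model_reference("anthropic", "claude-sonnet-4-6"))
```

The resulting string is what would be handed to ChatModel.from_name; that call is omitted here so the sketch runs without BeeAI installed.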

Settings (env)

All keys use the LLM_ prefix and live in config/settings/llm.py:

LLM_BACKEND=deepseek                 # ollama | vllm | openai | deepseek | gemini | anthropic | auto

LLM_DEEPSEEK_API_KEY=...
LLM_DEEPSEEK_MODEL=deepseek-v4-flash
LLM_DEEPSEEK_ORCHESTRATOR_MODEL=deepseek-v4-pro
LLM_DEEPSEEK_BASE_URL=https://api.deepseek.com

LLM_GEMINI_API_KEY=...
LLM_GEMINI_MODEL=gemini-2.5-pro      # Gemini takes no base_url

LLM_ANTHROPIC_API_KEY=...
LLM_ANTHROPIC_MODEL=claude-sonnet-4-6

# OpenAI / OpenAI-compatible endpoints use the existing compatible vars:
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.openai.com/v1
LLM_OPENAI_COMPATIBLE_MODEL=gpt-4o
LLM_OPENAI_COMPATIBLE_API_KEY=...

Env values are the boot defaults. Keys entered at runtime override them in the running process and persist to the secret store.
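The precedence described above (runtime keys over env boot defaults) might look like the following. It is a sketch under assumptions: effective_api_key and ENV_KEY_NAMES are hypothetical names, and the runtime keys are modelled as a plain dict rather than the real settings object.

```python
import os
from typing import Mapping, Optional

# Env variable names taken from the settings listing above.
ENV_KEY_NAMES = {
    "deepseek": "LLM_DEEPSEEK_API_KEY",
    "gemini": "LLM_GEMINI_API_KEY",
    "anthropic": "LLM_ANTHROPIC_API_KEY",
    "openai": "LLM_OPENAI_COMPATIBLE_API_KEY",
}


def effective_api_key(
    provider: str,
    runtime_keys: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer a key entered at runtime; otherwise fall back to the env default."""
    env = os.environ if env is None else env
    if runtime_keys.get(provider):
        return runtime_keys[provider]
    return env.get(ENV_KEY_NAMES[provider]) or None


boot_env = {"LLM_DEEPSEEK_API_KEY": "env-key"}
print(effective_api_key("deepseek", {}, boot_env))
print(effective_api_key("deepseek", {"deepseek": "runtime-key"}, boot_env))
```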

Key storage

core/secret_store.py writes provider keys Vault-first (KV-v2 at aurorasoc/llm-providers/<provider>) and falls back to a Fernet-encrypted row in the llm_provider_credentials table (Alembic 028) when Vault is not configured. The Fernet key is derived from AURORA_SECRET_KEY if set, otherwise from the mandatory JWT_SECRET_KEY. The API never returns raw keys; it returns only key_present, key_last4, and the storage source.

On boot, apply_stored_keys_to_settings() loads stored keys into the live settings so the granite layer sees them.
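To make the seed fallback and the redacted API view concrete, here is a stdlib-only sketch. The SHA-256 derivation is an assumption (the real secret store may use a different KDF), and derive_fernet_key, key_metadata, and the "source" field name are hypothetical. What is certain is the output format: Fernet accepts a 32-byte key encoded as URL-safe base64.

```python
import base64
import hashlib
from typing import Mapping, Optional


def derive_fernet_key(env: Mapping[str, str]) -> bytes:
    """Derive a Fernet-format key from AURORA_SECRET_KEY, else JWT_SECRET_KEY."""
    seed = env.get("AURORA_SECRET_KEY") or env["JWT_SECRET_KEY"]
    digest = hashlib.sha256(seed.encode("utf-8")).digest()  # 32 raw bytes (assumed KDF)
    return base64.urlsafe_b64encode(digest)  # 44-byte URL-safe base64 string


def key_metadata(api_key: Optional[str], source: str) -> dict:
    """Redacted view of a stored key: presence, last four characters, storage source."""
    return {
        "key_present": bool(api_key),
        "key_last4": api_key[-4:] if api_key else None,
        "source": source,  # hypothetical field name, e.g. "vault" or "database"
    }


key = derive_fernet_key({"JWT_SECRET_KEY": "change-me"})
print(len(key))
print(key_metadata("sk-example-1234", "database"))
```

Because the derivation is deterministic, the same seed always yields the same key, which is why a changed seed leaves existing database rows undecryptable.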

Runtime API

The llm_providers router (api/routers/llm_providers.py) exposes admin endpoints (all mutations require users:manage):

GET    /api/v1/system/llm/providers              # inventory + active backend + key presence
PUT    /api/v1/system/llm/config                 # { provider, model?, orchestrator_model?, base_url? }
PUT    /api/v1/system/llm/providers/{p}/key      # { api_key }
DELETE /api/v1/system/llm/providers/{p}/key
POST   /api/v1/system/llm/providers/{p}/test     # one-token reachability probe
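A typical sequence against these endpoints is: store a key, switch the active provider, then probe it. The sketch below only builds the requests so it runs offline. The host name, the bearer-token header, and the admin_request helper are assumptions for illustration; the paths and body fields come from the listing above.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://aurorasoc.example.internal"  # hypothetical host


def admin_request(method: str, path: str, token: str, body: Optional[dict] = None):
    """Build (but do not send) an admin API request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


store_key = admin_request(
    "PUT", "/api/v1/system/llm/providers/deepseek/key", "ADMIN_TOKEN",
    {"api_key": "sk-example"},
)
switch = admin_request(
    "PUT", "/api/v1/system/llm/config", "ADMIN_TOKEN",
    {"provider": "deepseek", "model": "deepseek-v4-flash"},
)
probe = admin_request("POST", "/api/v1/system/llm/providers/deepseek/test", "ADMIN_TOKEN")

# urllib.request.urlopen(store_key) would send the request; it is left out here.
for req in (store_key, switch, probe):
    print(req.get_method(), req.full_url)
```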

Every mutation invalidates the shared ChatModel pool (reset_chat_model_pool) and the backend-resolution cache (reset_resolution_cache), and is written to the hash-chained user audit log. In-process inference (chat, investigation workflows) adopts the change immediately. Agent A2A server processes hold their own state, so they adopt the change on their next factory build or restart. Fleet-wide hot propagation is planned for a later wave.

Choosing a provider

Backend	API key required	Base URL	Data leaves host	Best for
ollama	no	local	no	dev workstations, air-gapped sites
vllm	no	local	no	GPU production (default)
deepseek	yes	optional	yes	low-cost hosted reasoning
gemini	yes	none	yes	long-context, multimodal
openai	yes	optional	yes	broad model choice / compatible endpoints
anthropic	yes	optional	yes	hardest investigations, tool use
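The table can also be read as data. The sketch below filters backends by whether data may leave the host; BackendInfo and allowed_backends are hypothetical names, and the real sensitivity routing in AuroraSOC may weigh more than this one column.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackendInfo:
    name: str
    needs_key: bool
    data_leaves_host: bool


# Values copied from the table above.
BACKENDS = [
    BackendInfo("ollama", False, False),
    BackendInfo("vllm", False, False),
    BackendInfo("deepseek", True, True),
    BackendInfo("gemini", True, True),
    BackendInfo("openai", True, True),
    BackendInfo("anthropic", True, True),
]


def allowed_backends(sensitive: bool) -> List[str]:
    """For sensitive workloads, keep only backends whose data stays on the host."""
    return [b.name for b in BACKENDS if not (sensitive and b.data_leaves_host)]


print(allowed_backends(sensitive=True))
print(allowed_backends(sensitive=False))
```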

What goes wrong and how do you fix it

No model configured for <provider>. Set the specialist model in Settings or the LLM_<PROVIDER>_MODEL env var before testing.
Key stored but agents still keyless after restart. Confirm that AURORA_SECRET_KEY (or JWT_SECRET_KEY, if the former is unset) is stable across restarts; a changed seed cannot decrypt the database fallback. Vault-stored keys are unaffected.
Gemini rejects a base URL. The Gemini adapter takes no base_url; the runtime never sends one. Leave it blank.
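The Gemini rule above amounts to dropping base_url before the adapter is built. A minimal sketch, assuming a hypothetical provider_kwargs helper and keyword names that may not match the real adapter arguments:

```python
from typing import Dict, Optional


def provider_kwargs(provider: str, api_key: str, base_url: Optional[str] = None) -> Dict[str, str]:
    """Build adapter keyword arguments, never passing base_url for Gemini."""
    kwargs = {"api_key": api_key}
    if base_url and provider != "gemini":
        kwargs["base_url"] = base_url
    return kwargs


print(provider_kwargs("deepseek", "sk-example", "https://api.deepseek.com"))
print(provider_kwargs("gemini", "sk-example", "https://ignored.example"))
```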

Related

Inference backend resolution: the auto resolver and cache.
LLM evaluation harness: comparing models before promotion.
Operator runbook: the dashboard workflow.
