LLM provider configuration
What this page is
How AuroraSOC supports hosted LLM providers (DeepSeek, Gemini, OpenAI, Anthropic) in addition to local Ollama/vLLM: the configuration surface, where API keys are stored, the runtime admin API, and how a switch propagates.
Why it exists this way
Architecture §3.5 commits to a provider abstraction where agents can run against local or hosted inference keyed on sensitivity. Some hosts cannot run a local model at all, and trusted deployments may prefer a managed endpoint. ADR 022 records the decision: native BeeAI providers as first-class backends, runtime configuration via an admin API, and Vault-first key storage with an encrypted-database fallback.
How it works
Backends
ServingBackend (in granite/init.py) now includes deepseek, gemini, and anthropic next to ollama, vllm, and openai. Each resolves a BeeAI model reference with the provider's native prefix so ChatModel.from_name loads the right LiteLLM adapter:
- deepseek:deepseek-v4-flash
- gemini:gemini-2.5-pro
- anthropic:claude-sonnet-4-6
- openai:gpt-4o (also used for any OpenAI-compatible endpoint via a base URL)
The vLLM auto-probe (ADR 005) is unchanged and stays vLLM-only; the new providers are selected explicitly.
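As a rough illustration of the prefix convention described above, the sketch below builds a provider:model string from a backend and a model name. The enum here is a stand-in that only mirrors the backend names listed on this page, and model_reference is a hypothetical helper, not the project's actual resolver.

```python
from enum import Enum


class ServingBackend(str, Enum):
    """Stand-in enum; mirrors the backend names on this page only."""
    OLLAMA = "ollama"
    VLLM = "vllm"
    OPENAI = "openai"
    DEEPSEEK = "deepseek"
    GEMINI = "gemini"
    ANTHROPIC = "anthropic"


def model_reference(backend: ServingBackend, model: str) -> str:
    """Hypothetical helper: join the provider prefix and the model name."""
    if not model:
        # Same condition the troubleshooting section describes.
        raise ValueError(f"No model configured for {backend.value}")
    return f"{backend.value}:{model}"


print(model_reference(ServingBackend.DEEPSEEK, "deepseek-v4-flash"))
print(model_reference(ServingBackend.ANTHROPIC, "claude-sonnet-4-6"))
```

A string of this shape is what the page says is handed to ChatModel.from_name; how the real code assembles it may differ.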
Settings (env)
All keys use the LLM_ prefix and live in config/settings/llm.py:
LLM_BACKEND=deepseek # ollama | vllm | openai | deepseek | gemini | anthropic | auto
LLM_DEEPSEEK_API_KEY=...
LLM_DEEPSEEK_MODEL=deepseek-v4-flash
LLM_DEEPSEEK_ORCHESTRATOR_MODEL=deepseek-v4-pro
LLM_DEEPSEEK_BASE_URL=https://api.deepseek.com
LLM_GEMINI_API_KEY=...
LLM_GEMINI_MODEL=gemini-2.5-pro # Gemini takes no base_url
LLM_ANTHROPIC_API_KEY=...
LLM_ANTHROPIC_MODEL=claude-sonnet-4-6
# OpenAI / OpenAI-compatible endpoints use the existing compatible vars:
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.openai.com/v1
LLM_OPENAI_COMPATIBLE_MODEL=gpt-4o
LLM_OPENAI_COMPATIBLE_API_KEY=...
Env values are the boot defaults. Keys entered at runtime override them in the running process and persist to the secret store.
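The precedence rule above (env as boot default, runtime key as override) can be sketched with the standard library alone. DeepSeekSettings, load_deepseek, and effective_key are hypothetical names for illustration; the real settings module in config/settings/llm.py may be structured quite differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class DeepSeekSettings:
    """Hypothetical container for the LLM_DEEPSEEK_* variables."""
    api_key: Optional[str]
    model: Optional[str]
    base_url: Optional[str]


def load_deepseek(env: Mapping[str, str] = os.environ) -> DeepSeekSettings:
    """Read boot defaults from the environment (or any mapping)."""
    return DeepSeekSettings(
        api_key=env.get("LLM_DEEPSEEK_API_KEY"),
        model=env.get("LLM_DEEPSEEK_MODEL"),
        base_url=env.get("LLM_DEEPSEEK_BASE_URL"),
    )


def effective_key(settings: DeepSeekSettings, runtime_key: Optional[str]) -> Optional[str]:
    """A key entered at runtime wins over the env default."""
    return runtime_key if runtime_key else settings.api_key


boot = load_deepseek({"LLM_DEEPSEEK_API_KEY": "env-key", "LLM_DEEPSEEK_MODEL": "deepseek-v4-flash"})
print(effective_key(boot, None))           # falls back to the env value
print(effective_key(boot, "runtime-key"))  # runtime value overrides
```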
Key storage
core/secret_store.py
writes provider keys Vault-first (KV-v2 at aurorasoc/llm-providers/<provider>)
and falls back to a Fernet-encrypted row in the llm_provider_credentials
table (Alembic 028) when Vault is not configured. The Fernet key is derived from
AURORA_SECRET_KEY if set, otherwise from the mandatory JWT_SECRET_KEY. Raw
keys are never returned by the API, only key_present, key_last4, and the
storage source.
On boot, apply_stored_keys_to_settings() loads stored keys into the live
settings so the granite layer sees them.
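To make the "never return the raw key" rule concrete, here is a minimal sketch of a masked status object and the Vault-first choice. The field names key_present and key_last4 come from this page; KeyStatus, key_status, choose_store, and the "vault" / "database" labels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyStatus:
    """What an API response could expose instead of the raw key."""
    key_present: bool
    key_last4: Optional[str]
    source: Optional[str]  # assumed labels: "vault" or "database"


def choose_store(vault_configured: bool) -> str:
    """Vault first; encrypted database row only when Vault is not configured."""
    return "vault" if vault_configured else "database"


def key_status(raw_key: Optional[str], source: Optional[str]) -> KeyStatus:
    """Build the masked view; the raw key never leaves this function."""
    if not raw_key:
        return KeyStatus(key_present=False, key_last4=None, source=None)
    # Note: for very short keys the last four characters reveal a large
    # share of the secret; a real implementation may want to guard that.
    return KeyStatus(key_present=True, key_last4=raw_key[-4:], source=source)


print(key_status("sk-abcdef1234", choose_store(vault_configured=False)))
```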
Runtime API
The llm_providers router
(api/routers/llm_providers.py)
exposes admin endpoints (all mutations require users:manage):
GET /api/v1/system/llm/providers # inventory + active backend + key presence
PUT /api/v1/system/llm/config # { provider, model?, orchestrator_model?, base_url? }
PUT /api/v1/system/llm/providers/{p}/key # { api_key }
DELETE /api/v1/system/llm/providers/{p}/key
POST /api/v1/system/llm/providers/{p}/test # one-token reachability probe
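A client-side sketch of the store-key, switch, and probe sequence might look like the following. It only builds the requests and does not send them. The host name, the bearer-token header, and the use of deepseek as the {p} path value are assumptions; the paths and body fields are taken from the listing above.

```python
import json
import urllib.request
from typing import Optional

BASE = "https://aurorasoc.example"  # hypothetical host


def build_request(method: str, path: str, token: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an admin API request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # auth scheme is an assumption
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


token = "ADMIN_TOKEN"  # placeholder for a credential with users:manage
set_key = build_request("PUT", "/api/v1/system/llm/providers/deepseek/key", token, {"api_key": "sk-example"})
switch = build_request("PUT", "/api/v1/system/llm/config", token, {"provider": "deepseek", "model": "deepseek-v4-flash"})
probe = build_request("POST", "/api/v1/system/llm/providers/deepseek/test", token)

for req in (set_key, switch, probe):
    print(req.get_method(), req.full_url)
# To actually send one: urllib.request.urlopen(req)
```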
Every mutation invalidates the shared ChatModel pool (reset_chat_model_pool)
and the backend-resolution cache (reset_resolution_cache) and is written to the
hash-chained user audit log. In-process inference (chat, investigation
workflows) adopts the change immediately; agent A2A server processes adopt it on
their next factory build or restart. Fleet-wide hot propagation is deferred to a later wave.
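The invalidation behaviour described above can be pictured with a toy pool. This is only a sketch: the function name reset_chat_model_pool is borrowed from this page, but the body, the get_chat_model helper, and the use of a plain dict are assumptions, and object() stands in for a real ChatModel.

```python
from typing import Dict

_pool: Dict[str, object] = {}  # per-process cache keyed by model reference


def get_chat_model(ref: str) -> object:
    """Return the pooled stand-in model for ref, building it on first use."""
    if ref not in _pool:
        _pool[ref] = object()  # stand-in for constructing a real model
    return _pool[ref]


def reset_chat_model_pool() -> None:
    """Drop every pooled model so the next lookup rebuilds from current settings."""
    _pool.clear()


first = get_chat_model("deepseek:deepseek-v4-flash")
again = get_chat_model("deepseek:deepseek-v4-flash")
reset_chat_model_pool()  # what a config mutation would trigger
rebuilt = get_chat_model("deepseek:deepseek-v4-flash")
print(first is again, first is rebuilt)
```

A pool like this lives in one process's memory, which is consistent with the page's note that separate agent processes only pick up a change on their next factory build or restart.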
Choosing a provider
| Backend | Key | Base URL | Data leaves host | Best for |
|---|---|---|---|---|
| ollama | no | local | no | dev workstations, air-gapped sites |
| vllm | no | local | no | GPU production (default) |
| deepseek | yes | optional | yes | low-cost hosted reasoning |
| gemini | yes | none | yes | long-context, multimodal |
| openai | yes | optional | yes | broad model choice / compatible endpoints |
| anthropic | yes | optional | yes | hardest investigations, tool use |
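The table can also be read as data. The sketch below encodes three of its columns and filters backends by whether data may leave the host, in the spirit of the sensitivity-keyed choice mentioned earlier. BackendProfile and candidates are hypothetical names, and the filter is an illustration, not the project's selection logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackendProfile:
    name: str
    needs_key: bool
    data_leaves_host: bool


# Values copied from the table above.
PROFILES = [
    BackendProfile("ollama", needs_key=False, data_leaves_host=False),
    BackendProfile("vllm", needs_key=False, data_leaves_host=False),
    BackendProfile("deepseek", needs_key=True, data_leaves_host=True),
    BackendProfile("gemini", needs_key=True, data_leaves_host=True),
    BackendProfile("openai", needs_key=True, data_leaves_host=True),
    BackendProfile("anthropic", needs_key=True, data_leaves_host=True),
]


def candidates(allow_egress: bool) -> List[str]:
    """List backends that are acceptable under the given egress policy."""
    return [p.name for p in PROFILES if allow_egress or not p.data_leaves_host]


print(candidates(allow_egress=False))
print(candidates(allow_egress=True))
```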
What goes wrong and how do you fix it
- No model configured for <provider>. Set the specialist model in Settings or the LLM_<PROVIDER>_MODEL env var before testing.
- Key stored but agents still keyless after restart. Confirm AURORA_SECRET_KEY (or JWT_SECRET_KEY) is stable across restarts; a changed seed cannot decrypt the database fallback. Vault-stored keys are unaffected.
- Gemini rejects a base URL. The Gemini adapter takes no base_url; the runtime never sends one. Leave it blank.
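The seed-stability issue in the second item is easier to see with a small sketch. The seed precedence (AURORA_SECRET_KEY, else JWT_SECRET_KEY) comes from this page; the SHA-256 derivation is purely an assumption chosen because it yields a key in the format Fernet expects (32 bytes, URL-safe base64), and the actual derivation in core/secret_store.py may differ.

```python
import base64
import hashlib
from typing import Mapping


def pick_seed(env: Mapping[str, str]) -> str:
    """AURORA_SECRET_KEY if set, otherwise the mandatory JWT_SECRET_KEY."""
    seed = env.get("AURORA_SECRET_KEY") or env.get("JWT_SECRET_KEY")
    if not seed:
        raise RuntimeError("JWT_SECRET_KEY is mandatory")
    return seed


def derive_fernet_key(seed: str) -> bytes:
    """Assumed derivation: SHA-256 digest, URL-safe base64 encoded."""
    digest = hashlib.sha256(seed.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest)


before = derive_fernet_key(pick_seed({"JWT_SECRET_KEY": "jwt-secret"}))
# Adding AURORA_SECRET_KEY later changes which seed is picked.
after = derive_fernet_key(pick_seed({"JWT_SECRET_KEY": "jwt-secret", "AURORA_SECRET_KEY": "new-secret"}))
print(before == after)  # different seeds give different keys
```

Under this assumption, introducing AURORA_SECRET_KEY after keys were already stored would change the derived key just as rotating JWT_SECRET_KEY would, so rows encrypted earlier would need to be re-entered.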
Related
- Inference backend resolution, the auto resolver and cache.
- LLM evaluation harness, comparing models before promotion.
- Operator runbook, the dashboard workflow.