Environment Variables Reference — LLM & Inference
This document is the single source of truth for all LLM-related environment variables in AuroraSOC. Every variable that controls inference behavior, backend selection, or model configuration is listed with exact name, default value, accepted values, and failure impact.
Reference Table
| Variable | Default | Accepted Values | Description | Consequence if Wrong |
|---|---|---|---|---|
| `LLM_BACKEND` | `vllm` | `vllm`, `ollama`, `openai` | Selects which inference backend all agents and API chat calls use. | Agents target the wrong backend and fail to connect, or return runtime inference errors. |
| `VLLM_BASE_URL` | `http://vllm:8000/v1` | Valid reachable vLLM OpenAI-compatible base URL | Base URL used when `LLM_BACKEND=vllm`. | Connection refused/timeouts, or requests sent to a non-vLLM endpoint. |
| `VLLM_MODEL` | `granite-soc-specialist` | Served vLLM model name string | Specialist model name sent in vLLM chat payloads. | vLLM returns model-not-found (404) or wrong model behavior. |
| `VLLM_ORCHESTRATOR_MODEL` | `granite-soc-specialist` | Served vLLM model name string | Orchestrator model identifier for coordination workflows. | Orchestrator calls fail or use the wrong reasoning profile. |
| `VLLM_TENSOR_PARALLEL` | `1` | Positive integer up to the available GPU count | Number of GPUs used for tensor-parallel serving. | Startup failures or CUDA device errors if the value exceeds available devices. |
| `HF_TOKEN` | (empty) | Hugging Face access token string | Token used for gated/private model access during serving pulls. | vLLM may fail startup or fail the model download with 401/403 errors. |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Valid reachable Ollama base URL | Base URL used when `LLM_BACKEND=ollama`. | API/agents cannot reach Ollama or call the wrong service path. |
| `OLLAMA_MODEL` | `granite4:8b` | Installed Ollama model tag | Specialist model tag for Ollama mode. | Ollama returns "model not found" and chat requests fail. |
| `OLLAMA_ORCHESTRATOR_MODEL` | `granite4:dense` | Installed Ollama model tag | Orchestrator model tag for Ollama mode. | Orchestrator requests fail or degrade to an incorrect model. |
| `OPENAI_COMPATIBLE_BASE_URL` | (empty) | Valid reachable OpenAI-compatible base URL | Base URL used when `LLM_BACKEND=openai`. | Connection refused or requests sent to the wrong endpoint. |
| `OPENAI_COMPATIBLE_MODEL` | (empty) | Model name accepted by the endpoint | Specialist model name for OpenAI-compatible mode. | Endpoint returns model-not-found or wrong model behavior. |
| `OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL` | (empty) | Model name accepted by the endpoint | Orchestrator model for OpenAI-compatible mode. Falls back to `OPENAI_COMPATIBLE_MODEL` if empty. | Orchestrator may use an incorrect model. |
| `OPENAI_COMPATIBLE_API_KEY` | (empty) | API key / bearer token string | Authentication token sent as `Authorization: Bearer <key>`. | 401/403 from the provider if a key is required. |
Setting LLM_BACKEND to vllm requires a reachable vLLM service. Setting it to ollama requires a running Ollama service with configured models already pulled. Setting it to openai requires a reachable OpenAI-compatible endpoint and (usually) an API key. If this variable is wrong, AuroraSOC routes requests to the wrong backend family and startup/runtime inference calls fail with connectivity or model-resolution errors.
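As a concrete reference, a minimal `.env` fragment for Ollama mode might look like the following. The values are the defaults from the table above; substitute the model tags you have actually pulled.

```shell
# Ollama mode: requires a running Ollama service with both models already pulled
LLM_BACKEND=ollama
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=granite4:8b
OLLAMA_ORCHESTRATOR_MODEL=granite4:dense
```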
When LLM_BACKEND=openai, AuroraSOC passes model names through as-is — Granite-specific normalization and per-agent fine-tuned model routing do not apply. The OPENAI_COMPATIBLE_* env var prefix is used (rather than OPENAI_*) to avoid collision with the openai Python SDK’s own OPENAI_API_KEY environment variable.
When LLM_BACKEND=vllm, AuroraSOC routes agent and workflow inference using VLLM_MODEL and VLLM_ORCHESTRATOR_MODEL directly. GRANITE_USE_FINETUNED and GRANITE_USE_PER_AGENT_MODELS influence Ollama routing behavior, not vLLM model ID selection.
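For vLLM mode, a minimal `.env` fragment using the table defaults looks like this. The two model variables must both name models the vLLM server actually serves:

```shell
# vLLM mode: both model names must match the server's served model names
LLM_BACKEND=vllm
VLLM_BASE_URL=http://vllm:8000/v1
VLLM_MODEL=granite-soc-specialist
VLLM_ORCHESTRATOR_MODEL=granite-soc-specialist
VLLM_TENSOR_PARALLEL=1
```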
Common Misconfiguration Patterns
- `VLLM_MODEL` does not match `--served-model-name` in `docker-compose.yml`. Cause: model name mismatch between the runtime config and the vLLM service declaration. Fix: align the names exactly; for example, `granite-soc-specialist` in both locations.
- `LLM_BACKEND` changed in `.env` but containers were not restarted. Cause: containers keep their old environment until recreated. Fix: run `docker compose up -d` to apply environment changes.
- `HF_TOKEN` is missing for a gated model. Cause: the backend attempts an authenticated pull without credentials. Fix: set `HF_TOKEN` to a valid token with access to the required repository.
- `VLLM_TENSOR_PARALLEL` is higher than the available GPU count. Cause: the tensor-parallel configuration exceeds the physical device inventory. Fix: set `VLLM_TENSOR_PARALLEL` to a value less than or equal to the number of detected GPUs.
- `OLLAMA_BASE_URL` and `VLLM_BASE_URL` are swapped. Cause: the backend URL values point to the opposite engine type. Fix: restore the canonical pairing (vLLM URL for vLLM, Ollama URL for Ollama).
- Expecting `GRANITE_USE_FINETUNED=true` to change vLLM model names. Cause: in vLLM mode, runtime model names come from `VLLM_MODEL` and `VLLM_ORCHESTRATOR_MODEL`. Fix: set the desired served model IDs in `VLLM_MODEL` / `VLLM_ORCHESTRATOR_MODEL` and restart services.
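The first pattern above is worth illustrating. The fragment below is a sketch of how the alignment might look in `docker-compose.yml`; the service layout, image tag, and `--model` value are assumptions for illustration, but `--served-model-name` must equal `VLLM_MODEL` exactly:

```yaml
# Illustrative fragment: keep --served-model-name in sync with VLLM_MODEL in .env
services:
  vllm:
    image: vllm/vllm-openai:latest        # image tag is an assumption
    command: >
      --model ibm-granite/example-model   # placeholder; use your actual model
      --served-model-name granite-soc-specialist
      --tensor-parallel-size 1
```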
How to Apply Changes
Environment variables are read at container startup. Changing .env alone has no effect on already-running containers. Use this procedure:
- Edit `.env`.
- Run `docker compose up -d`.
Docker will detect environment changes and recreate only affected services.
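A sketch of the full apply-and-verify sequence is below. The service name `api` is an assumption; substitute the name of any service that consumes these variables.

```shell
# Recreate services whose environment changed
docker compose up -d

# Verify the running container sees the new value ("api" is an assumed service name)
docker compose exec api printenv LLM_BACKEND
```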