Environment Variables Reference — LLM & Inference
This document is the single source of truth for all LLM-related environment variables in AuroraSOC. Every variable that controls inference behavior, backend selection, or model configuration is listed with exact name, default value, accepted values, and failure impact.
Reference Table
| Variable | Default | Accepted Values | Description | Consequence if Wrong |
|---|---|---|---|---|
| `LLM_BACKEND` | `vllm` | `vllm`, `ollama`, `openai` | Selects which inference backend all agents and API chat calls use. | Agents target the wrong backend and fail to connect, or return runtime inference errors. |
| `VLLM_BASE_URL` | `http://vllm:8000/v1` | Valid reachable vLLM OpenAI-compatible base URL | Base URL used when `LLM_BACKEND=vllm`. | Connection refused/timeouts, or requests sent to a non-vLLM endpoint. |
| `VLLM_MODEL` | `granite-soc-specialist` | Served vLLM model name string | Specialist model name sent in vLLM chat payloads. | vLLM returns model-not-found (404) or wrong model behavior. |
| `VLLM_ORCHESTRATOR_MODEL` | `granite-soc-specialist` | Served vLLM model name string | Orchestrator model identifier for coordination workflows. | Orchestrator calls fail or use the wrong reasoning profile. |
| `VLLM_TENSOR_PARALLEL` | `1` | Positive integer up to the available GPU count | Number of GPUs used for tensor-parallel serving. | Startup failures or CUDA device errors if the value exceeds available devices. |
| `HF_TOKEN` | (empty) | Hugging Face access token string | Token used for gated/private model access during serving pulls. | vLLM may fail startup or fail the model download with 401/403 errors. |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Valid reachable Ollama base URL | Base URL used when `LLM_BACKEND=ollama`. | API/agents cannot reach Ollama or call the wrong service path. |
| `OLLAMA_MODEL` | `granite4:8b` | Installed Ollama model tag | Specialist model tag for Ollama mode. | Ollama returns "model not found" and chat requests fail. |
| `OLLAMA_ORCHESTRATOR_MODEL` | `granite4:dense` | Installed Ollama model tag | Orchestrator model tag for Ollama mode. | Orchestrator requests fail or degrade to an incorrect model. |
| `OPENAI_COMPATIBLE_BASE_URL` | (empty) | Valid reachable OpenAI-compatible base URL | Base URL used when `LLM_BACKEND=openai`. | Connection refused or requests sent to the wrong endpoint. |
| `OPENAI_COMPATIBLE_MODEL` | (empty) | Model name accepted by the endpoint | Specialist model name for OpenAI-compatible mode. | Endpoint returns model-not-found or wrong model behavior. |
| `OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL` | (empty) | Model name accepted by the endpoint | Orchestrator model for OpenAI-compatible mode. Falls back to `OPENAI_COMPATIBLE_MODEL` if empty. | Orchestrator may use an incorrect model. |
| `OPENAI_COMPATIBLE_API_KEY` | (empty) | API key / bearer token string | Authentication token sent as `Authorization: Bearer <key>`. | 401/403 from the provider if a key is required. |
Setting LLM_BACKEND to vllm requires a reachable vLLM service. Setting it to ollama requires a running Ollama service with configured models already pulled. Setting it to openai requires a reachable OpenAI-compatible endpoint and (usually) an API key. If this variable is wrong, AuroraSOC routes requests to the wrong backend family and startup/runtime inference calls fail with connectivity or model-resolution errors.
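As a concrete reference, a minimal `.env` fragment for Ollama mode might look like the following. The values are the defaults from the table above; substitute the model tags you have actually pulled.

```shell
# Ollama mode: requires a running Ollama service with both models already pulled
LLM_BACKEND=ollama
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=granite4:8b
OLLAMA_ORCHESTRATOR_MODEL=granite4:dense
```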
When LLM_BACKEND=openai, AuroraSOC passes model names through as-is — Granite-specific normalization and per-agent fine-tuned model routing do not apply. The OPENAI_COMPATIBLE_* env var prefix is used (rather than OPENAI_*) to avoid collision with the openai Python SDK’s own OPENAI_API_KEY environment variable.
When LLM_BACKEND=vllm, AuroraSOC routes agent and workflow inference using VLLM_MODEL and VLLM_ORCHESTRATOR_MODEL directly. GRANITE_USE_FINETUNED and GRANITE_USE_PER_AGENT_MODELS influence Ollama routing behavior, not vLLM model ID selection.
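For vLLM mode, a minimal `.env` fragment using the table defaults looks like this. The two model variables must both name models the vLLM server actually serves:

```shell
# vLLM mode: both model names must match the server's served model names
LLM_BACKEND=vllm
VLLM_BASE_URL=http://vllm:8000/v1
VLLM_MODEL=granite-soc-specialist
VLLM_ORCHESTRATOR_MODEL=granite-soc-specialist
VLLM_TENSOR_PARALLEL=1
```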
Common Misconfiguration Patterns
- `VLLM_MODEL` does not match `--served-model-name` in `docker-compose.yml`. Cause: model name mismatch between the runtime config and the vLLM service declaration. Fix: align the names exactly; for example, `granite-soc-specialist` in both locations.
- `LLM_BACKEND` changed in `.env` but containers were not restarted. Cause: containers keep their old environment until recreated. Fix: run `docker compose up -d` to apply environment changes.
- `HF_TOKEN` is missing for a gated model. Cause: the backend attempts an authenticated pull without credentials. Fix: set `HF_TOKEN` to a valid token with access to the required repository.
- `VLLM_TENSOR_PARALLEL` is higher than the available GPU count. Cause: the tensor-parallel configuration exceeds the physical device inventory. Fix: set `VLLM_TENSOR_PARALLEL` to a value less than or equal to the number of detected GPUs.
- `OLLAMA_BASE_URL` and `VLLM_BASE_URL` are swapped. Cause: the backend URL values point to the opposite engine type. Fix: restore the canonical pairing (vLLM URL for vLLM, Ollama URL for Ollama).
- Expecting `GRANITE_USE_FINETUNED=true` to change vLLM model names. Cause: in vLLM mode, runtime model names come from `VLLM_MODEL` and `VLLM_ORCHESTRATOR_MODEL`. Fix: set the desired served model IDs in `VLLM_MODEL` / `VLLM_ORCHESTRATOR_MODEL` and restart services.
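The first pattern above is worth illustrating. The fragment below is a sketch of how the alignment might look in `docker-compose.yml`; the service layout, image tag, and `--model` value are assumptions for illustration, but `--served-model-name` must equal `VLLM_MODEL` exactly:

```yaml
# Illustrative fragment: keep --served-model-name in sync with VLLM_MODEL in .env
services:
  vllm:
    image: vllm/vllm-openai:latest        # image tag is an assumption
    command: >
      --model ibm-granite/example-model   # placeholder; use your actual model
      --served-model-name granite-soc-specialist
      --tensor-parallel-size 1
```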
How to Apply Changes
Environment variables are read at container startup. Changing .env alone has no effect on already-running containers. Use this procedure:
- Edit `.env`.
- Run `docker compose up -d`.
Docker will detect environment changes and recreate only affected services.
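A sketch of the full apply-and-verify sequence is below. The service name `api` is an assumption; substitute the name of any service that consumes these variables.

```shell
# Recreate services whose environment changed
docker compose up -d

# Verify the running container sees the new value ("api" is an assumed service name)
docker compose exec api printenv LLM_BACKEND
```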