Environment Variables Reference — LLM & Inference

This document is the single source of truth for all LLM-related environment variables in AuroraSOC. Every variable that controls inference behavior, backend selection, or model configuration is listed with exact name, default value, accepted values, and failure impact.

Reference Table

| Variable | Default | Accepted Values | Description | Consequence if Wrong |
|---|---|---|---|---|
| `LLM_BACKEND` | `vllm` | `vllm`, `ollama`, `openai` | Selects which inference backend all agents and API chat calls use. | Agents target the wrong backend and fail to connect or return runtime inference errors. |
| `VLLM_BASE_URL` | `http://vllm:8000/v1` | Valid reachable vLLM OpenAI-compatible base URL | Base URL used when `LLM_BACKEND=vllm`. | Connection refused/timeouts or requests sent to a non-vLLM endpoint. |
| `VLLM_MODEL` | `granite-soc-specialist` | Served vLLM model name string | Specialist model name sent in vLLM chat payloads. | vLLM returns model-not-found (404) or wrong model behavior. |
| `VLLM_ORCHESTRATOR_MODEL` | `granite-soc-specialist` | Served vLLM model name string | Orchestrator model identifier for coordination workflows. | Orchestrator calls fail or use the wrong reasoning profile. |
| `VLLM_TENSOR_PARALLEL` | `1` | Positive integer up to available GPU count | Number of GPUs used for tensor-parallel serving. | Startup failures or CUDA device errors if value exceeds available devices. |
| `HF_TOKEN` | (empty) | Hugging Face access token string | Token used for gated/private model access during serving pulls. | vLLM may fail startup or fail model download with 401/403 errors. |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Valid reachable Ollama base URL | Base URL used when `LLM_BACKEND=ollama`. | API/agents cannot reach Ollama or call the wrong service path. |
| `OLLAMA_MODEL` | `granite4:8b` | Installed Ollama model tag | Specialist model tag for Ollama mode. | Ollama returns "model not found" and chat requests fail. |
| `OLLAMA_ORCHESTRATOR_MODEL` | `granite4:dense` | Installed Ollama model tag | Orchestrator model tag for Ollama mode. | Orchestrator requests fail or degrade to an incorrect model. |
| `OPENAI_COMPATIBLE_BASE_URL` | (empty) | Valid reachable OpenAI-compatible base URL | Base URL used when `LLM_BACKEND=openai`. | Connection refused or requests sent to the wrong endpoint. |
| `OPENAI_COMPATIBLE_MODEL` | (empty) | Model name accepted by the endpoint | Specialist model name for OpenAI-compatible mode. | Endpoint returns model-not-found or wrong model behavior. |
| `OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL` | (empty) | Model name accepted by the endpoint | Orchestrator model for OpenAI-compatible mode. Falls back to `OPENAI_COMPATIBLE_MODEL` if empty. | Orchestrator may use an incorrect model. |
| `OPENAI_COMPATIBLE_API_KEY` | (empty) | API key / bearer token string | Authentication token sent as `Authorization: Bearer <key>`. | 401/403 from provider if required. |

Setting LLM_BACKEND to vllm requires a reachable vLLM service. Setting it to ollama requires a running Ollama service with configured models already pulled. Setting it to openai requires a reachable OpenAI-compatible endpoint and (usually) an API key. If this variable is wrong, AuroraSOC routes requests to the wrong backend family and startup/runtime inference calls fail with connectivity or model-resolution errors.
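For illustration, the three backend modes might be selected in `.env` like this. The values shown are example placeholders drawn from the defaults table above, not requirements:

```shell
# vLLM mode (default): requires a reachable vLLM service
LLM_BACKEND=vllm
VLLM_BASE_URL=http://vllm:8000/v1
VLLM_MODEL=granite-soc-specialist

# Ollama mode: model tags must already be pulled
# LLM_BACKEND=ollama
# OLLAMA_BASE_URL=http://ollama:11434
# OLLAMA_MODEL=granite4:8b

# OpenAI-compatible mode: base URL required, API key usually required
# LLM_BACKEND=openai
# OPENAI_COMPATIBLE_BASE_URL=https://api.example.com/v1
# OPENAI_COMPATIBLE_MODEL=my-model
# OPENAI_COMPATIBLE_API_KEY=example-key
```

Only one backend block should be active at a time; the commented blocks show the minimum variables each mode needs.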

When LLM_BACKEND=openai, AuroraSOC passes model names through as-is — Granite-specific normalization and per-agent fine-tuned model routing do not apply. The OPENAI_COMPATIBLE_* env var prefix is used (rather than OPENAI_*) to avoid collision with the openai Python SDK’s own OPENAI_API_KEY environment variable.

When LLM_BACKEND=vllm, AuroraSOC routes agent and workflow inference using VLLM_MODEL and VLLM_ORCHESTRATOR_MODEL directly. GRANITE_USE_FINETUNED and GRANITE_USE_PER_AGENT_MODELS influence Ollama routing behavior, not vLLM model ID selection.
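The resolution rules above can be sketched as a small Python function. This is a hypothetical illustration of the documented behavior, not actual AuroraSOC source code; the function name and return shape are invented for clarity:

```python
def resolve_models(env):
    """Return (specialist_model, orchestrator_model) for a given env dict.

    Mirrors the rules documented above: vLLM mode reads VLLM_* directly,
    Ollama mode reads OLLAMA_*, and in OpenAI-compatible mode the
    orchestrator model falls back to the specialist model when unset.
    """
    backend = env.get("LLM_BACKEND", "vllm")
    if backend == "vllm":
        # vLLM mode: names come straight from VLLM_* variables;
        # GRANITE_USE_FINETUNED does not alter them.
        return (env.get("VLLM_MODEL", "granite-soc-specialist"),
                env.get("VLLM_ORCHESTRATOR_MODEL", "granite-soc-specialist"))
    if backend == "ollama":
        return (env.get("OLLAMA_MODEL", "granite4:8b"),
                env.get("OLLAMA_ORCHESTRATOR_MODEL", "granite4:dense"))
    if backend == "openai":
        specialist = env.get("OPENAI_COMPATIBLE_MODEL", "")
        # Documented fallback: orchestrator defaults to the specialist model.
        orchestrator = env.get("OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL") or specialist
        return (specialist, orchestrator)
    raise ValueError(f"Unsupported LLM_BACKEND: {backend!r}")

print(resolve_models({"LLM_BACKEND": "openai",
                      "OPENAI_COMPATIBLE_MODEL": "my-model"}))
# → ('my-model', 'my-model')
```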

Common Misconfiguration Patterns

  1. VLLM_MODEL does not match --served-model-name in docker-compose.yml. Cause: model name mismatch between runtime config and the vLLM service declaration. Fix: align the names exactly; for example, use granite-soc-specialist in both locations.

  2. LLM_BACKEND changed in .env but containers were not restarted. Cause: containers keep old environment until recreated. Fix: run docker compose up -d to apply environment changes.

  3. HF_TOKEN is missing for a gated model. Cause: backend attempts authenticated pull without credentials. Fix: set HF_TOKEN to a valid token with required repository access.

  4. VLLM_TENSOR_PARALLEL is higher than available GPU count. Cause: tensor parallel config exceeds physical device inventory. Fix: set VLLM_TENSOR_PARALLEL to a value less than or equal to detected GPUs.

  5. OLLAMA_BASE_URL and VLLM_BASE_URL are swapped. Cause: backend URL values point to the opposite engine type. Fix: restore canonical pairing (vllm URL for vLLM, Ollama URL for Ollama).

  6. Expecting GRANITE_USE_FINETUNED=true to change vLLM model names. Cause: in vLLM mode, runtime model names come from VLLM_MODEL and VLLM_ORCHESTRATOR_MODEL. Fix: set the desired vLLM served model IDs in VLLM_MODEL / VLLM_ORCHESTRATOR_MODEL and restart services.
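Several of these patterns can be caught before startup with a simple pre-flight check. The sketch below is a hypothetical validator (not part of AuroraSOC) covering patterns 4 and the allowed backend values; `gpu_count` is a parameter you would supply from your own device inventory:

```python
ALLOWED_BACKENDS = {"vllm", "ollama", "openai"}

def validate_llm_env(env, gpu_count=1):
    """Return a list of human-readable problems found in the env dict."""
    errors = []
    backend = env.get("LLM_BACKEND", "vllm")
    if backend not in ALLOWED_BACKENDS:
        errors.append(f"LLM_BACKEND={backend!r} must be one of {sorted(ALLOWED_BACKENDS)}")
    tp = env.get("VLLM_TENSOR_PARALLEL", "1")
    if not tp.isdigit() or int(tp) < 1:
        errors.append(f"VLLM_TENSOR_PARALLEL={tp!r} must be a positive integer")
    elif int(tp) > gpu_count:
        errors.append(f"VLLM_TENSOR_PARALLEL={tp} exceeds available GPUs ({gpu_count})")
    if backend == "openai" and not env.get("OPENAI_COMPATIBLE_BASE_URL"):
        errors.append("OPENAI_COMPATIBLE_BASE_URL is required when LLM_BACKEND=openai")
    return errors

# Example: tensor parallelism set higher than the detected GPU count (pattern 4)
for problem in validate_llm_env({"VLLM_TENSOR_PARALLEL": "4"}, gpu_count=1):
    print(problem)
```

A check like this cannot detect patterns 1, 2, or 5, which require comparing against the running services; it only catches locally verifiable mistakes.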

How to Apply Changes

Environment variables are read at container startup. Changing .env alone has no effect on already-running containers. Use this procedure:

  1. Edit .env.
  2. Run `docker compose up -d`.

Docker will detect environment changes and recreate only affected services.
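To confirm the new values actually reached a running container, one option is to inspect its environment. The service name `api` here is an assumption; substitute the relevant service from your docker-compose.yml:

```shell
# Recreate services whose configuration changed
docker compose up -d

# Verify the running container sees the new value
# (replace "api" with the relevant service name)
docker compose exec api env | grep LLM_BACKEND
```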