LLM Provider Configuration

AuroraSOC supports multiple LLM backends - from fully local (air-gapped, no data leaves your network) to cloud-hosted providers. This guide covers every supported option with step-by-step setup instructions.

The shipped default backend is deepseek with the deepseek-v4-flash model (see .env.example). For a fully local, air-gapped deployment, switch LLM_BACKEND to ollama and run granite4:8b.

Quick Reference

| Provider | Type | Privacy | Performance | Cost |
| --- | --- | --- | --- | --- |
| Ollama | Local (CPU/GPU) | Maximum - no data leaves host | Good (depends on hardware) | Free |
| vLLM | Local (GPU required) | Maximum - no data leaves host | Excellent (continuous batching) | Free (GPU hardware cost) |
| Jan AI | Local (desktop app) | Maximum - no data leaves host | Good | Free |
| LM Studio | Local (desktop app) | Maximum - no data leaves host | Good | Free |
| DeepSeek | Cloud | Data sent to DeepSeek servers | Excellent | ~$0.14/M input tokens |
| Google Gemini | Cloud | Data sent to Google servers | Excellent | Free tier available |
| OpenAI | Cloud | Data sent to OpenAI servers | Excellent | ~$2.50/M input tokens |
| Anthropic Claude | Cloud | Data sent to Anthropic servers | Excellent | ~$3/M input tokens |
| Groq | Cloud (fast inference) | Data sent to Groq servers | Very fast | Free tier available |
| Together AI | Cloud | Data sent to Together servers | Good | Pay-per-token |

Method 1: Environment Variables

LLM configuration is set through environment variables in your .env file. This is the recommended approach for production deployments.

Step 1: Set the Backend

Edit your .env file and set LLM_BACKEND to one of: ollama, vllm, openai, deepseek, gemini, anthropic, or auto.

# Choose ONE backend:
LLM_BACKEND=deepseek
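
A typo in LLM_BACKEND is easy to miss, so it can help to check the value before starting the stack. The function below is a minimal sketch, not part of AuroraSOC: check_llm_backend is a hypothetical helper name, and it assumes the value is written unquoted with no trailing comment on the same line.

```shell
# Hypothetical helper (not shipped with AuroraSOC).
# Prints the LLM_BACKEND assignment found in an env file, or fails if the
# value is missing or is not one of the backends listed in Step 1.
check_llm_backend() {
  local env_file="${1:-.env}" value
  # Use the last uncommented LLM_BACKEND= line in the file.
  value=$(grep -E '^LLM_BACKEND=' "$env_file" | tail -n 1 | cut -d= -f2-)
  case "$value" in
    ollama|vllm|openai|deepseek|gemini|anthropic|auto)
      echo "LLM_BACKEND=$value"
      ;;
    *)
      echo "unsupported or missing LLM_BACKEND: '$value'" >&2
      return 1
      ;;
  esac
}
```

Run it as check_llm_backend .env. A non-zero exit status means the value should be corrected before continuing to Step 2.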

Step 2: Configure the Provider

Add the provider-specific settings below.


Local Providers (Air-Gapped / On-Premises)

Ollama

Ollama is the easiest way to run LLMs locally. A GPU is recommended but not required.

Setup:

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull the recommended model
ollama pull granite4:8b

# 3. Verify it's running
ollama list

.env configuration:

LLM_BACKEND=ollama
LLM_OLLAMA_BASE_URL=http://localhost:11434
LLM_DEFAULT_MODEL=ollama:granite4:8b
LLM_ORCHESTRATOR_MODEL=ollama:granite4:8b
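
Before pointing AuroraSOC at Ollama, it may be worth confirming that the HTTP API (not just the CLI) is reachable and that the model has been pulled. The sketch below queries Ollama's /api/tags endpoint, which lists local models. ollama_has_model is a hypothetical helper name, and the grep match is a rough check rather than a full JSON parse.

```shell
# Hypothetical helper (not shipped with AuroraSOC).
# Succeeds if the Ollama server at base_url lists the given model tag.
ollama_has_model() {
  local model="$1" base_url="${2:-http://localhost:11434}"
  # /api/tags returns JSON such as {"models":[{"name":"granite4:8b",...}]}
  curl -fsS --max-time 5 "$base_url/api/tags" \
    | grep -q "\"name\":[[:space:]]*\"$model\""
}
```

For example, ollama_has_model granite4:8b exits with status 0 when the model is available. Note that the model tag passed here omits the ollama: prefix used in LLM_DEFAULT_MODEL.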

Recommended models for Ollama:

| Model | Size | Use Case |
| --- | --- | --- |
| granite4:8b | 4.9 GB | Default local model - good balance of speed and quality |
| granite4:3b | 2.0 GB | Lightweight - for resource-constrained environments |
| llama3.3:70b | 40 GB | High quality - requires 48+ GB RAM |
| qwen2.5:32b | 19 GB | Strong reasoning - requires 32+ GB RAM |
| deepseek-r1:8b | 4.9 GB | Reasoning-focused - good for investigations |

Docker deployment (recommended for production):

# In docker-compose, Ollama runs as a sibling container:
LLM_OLLAMA_BASE_URL=http://ollama:11434

vLLM

vLLM provides high-throughput LLM serving with continuous batching. It requires an NVIDIA GPU.

Setup:

# Option A: Use the AuroraSOC compose stack (automatic)
just stack-up # Auto-detects GPU and starts vLLM

# Option B: Run vLLM separately
pip install vllm
vllm serve ibm-granite/granite-3.2-8b-instruct \
  --served-model-name granite-soc-specialist \
  --port 8000 \
  --max-model-len 8192

.env configuration:

LLM_BACKEND=vllm
LLM_VLLM_BASE_URL=http://localhost:8000/v1
LLM_VLLM_MODEL=granite-soc-specialist
LLM_VLLM_ORCHESTRATOR_MODEL=granite-soc-specialist

GPU requirements:

| Model Size | Minimum GPU VRAM | Recommended |
| --- | --- | --- |
| 2B params | 6 GB | RTX 3060 |
| 8B params | 16 GB | RTX 4090 / A100 40GB |
| 70B params | 80 GB | 2x A100 80GB |

Jan AI (Desktop Application)

Jan is a desktop application that runs LLMs locally with a user-friendly interface and an OpenAI-compatible API.

Setup:

  1. Download Jan from jan.ai
  2. Install and launch
  3. Download a model (e.g., Llama 3.3 8B or Granite 3.2)
  4. Start the local API server (Settings → Advanced → Local API Server → Enable)

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=http://localhost:1337/v1
LLM_OPENAI_COMPATIBLE_MODEL=llama3.3-8b
LLM_OPENAI_COMPATIBLE_API_KEY=jan

LM Studio (Desktop Application)

LM Studio provides a polished desktop experience for running LLMs locally with an OpenAI-compatible server.

Setup:

  1. Download LM Studio from lmstudio.ai
  2. Install and launch
  3. Search and download a model (e.g., granite-3.2-8b-instruct)
  4. Go to the "Local Server" tab and click "Start Server"

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
LLM_OPENAI_COMPATIBLE_MODEL=granite-3.2-8b-instruct
LLM_OPENAI_COMPATIBLE_API_KEY=lm-studio

Cloud Providers (Hosted)

Data Privacy

Cloud providers process your prompts on external servers. For regulated environments (PCI-DSS, HIPAA, GDPR), use local providers or ensure your cloud provider agreement covers your compliance requirements.

DeepSeek

DeepSeek offers high-quality models at very competitive pricing.

Get an API key:

  1. Go to platform.deepseek.com
  2. Sign up or log in
  3. Navigate to API Keys in the dashboard
  4. Click Create API Key
  5. Copy the key (it starts with sk-)

.env configuration:

LLM_BACKEND=deepseek
LLM_DEEPSEEK_API_KEY=sk-your-key-here
LLM_DEEPSEEK_MODEL=deepseek-v4-flash
LLM_DEEPSEEK_ORCHESTRATOR_MODEL=deepseek-v4-pro
LLM_DEEPSEEK_BASE_URL=https://api.deepseek.com

Run the full fleet on DeepSeek:

With the key set above, one command brings up the whole system (API, all 11 agents, every MCP server, and the dashboard) on DeepSeek:

just stack-up-fleet

It waits for the API, applies migrations, seeds the default admin, and prints the dashboard URL, mode, model, and login.

Available models:

| Model | Best For | Context |
| --- | --- | --- |
| deepseek-v4-flash | Default - fast, cost-effective general agent tasks and triage | 64K |
| deepseek-v4-pro | Premium quality - complex investigations, multi-step reasoning | 64K |

Google Gemini

Google Gemini provides powerful multimodal models with a generous free tier.

Get an API key:

  1. Go to aistudio.google.com
  2. Sign in with your Google account
  3. Click Get API key in the top navigation
  4. Click Create API key and select a Google Cloud project
  5. Copy the generated key

.env configuration:

LLM_BACKEND=gemini
LLM_GEMINI_API_KEY=your-api-key-here
LLM_GEMINI_MODEL=gemini-2.5-flash
LLM_GEMINI_ORCHESTRATOR_MODEL=gemini-2.5-pro

Available models:

| Model | Best For | Context | Cost |
| --- | --- | --- | --- |
| gemini-2.5-flash | Fast triage, high volume | 1M tokens | Free tier: 15 RPM |
| gemini-2.5-pro | Deep investigations | 1M tokens | Free tier: 2 RPM |
| gemini-2.0-flash | Legacy compatibility | 1M tokens | Free tier available |

OpenAI

Get an API key:

  1. Go to platform.openai.com
  2. Navigate to API Keys
  3. Click Create new secret key
  4. Copy the key (starts with sk-)

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.openai.com/v1
LLM_OPENAI_COMPATIBLE_MODEL=gpt-4o
LLM_OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL=gpt-4o
LLM_OPENAI_COMPATIBLE_API_KEY=sk-your-key-here

Anthropic Claude

Get an API key:

  1. Go to console.anthropic.com
  2. Navigate to API Keys
  3. Click Create Key
  4. Copy the key (starts with sk-ant-)

.env configuration:

LLM_BACKEND=anthropic
LLM_ANTHROPIC_API_KEY=sk-ant-your-key-here
LLM_ANTHROPIC_MODEL=claude-sonnet-4-6
LLM_ANTHROPIC_ORCHESTRATOR_MODEL=claude-sonnet-4-6

Groq (Ultra-Fast Inference)

Groq provides extremely fast inference for open-weight models using custom LPU hardware.

Get an API key:

  1. Go to console.groq.com
  2. Navigate to API Keys
  3. Create a new key

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
LLM_OPENAI_COMPATIBLE_MODEL=llama-3.3-70b-versatile
LLM_OPENAI_COMPATIBLE_API_KEY=gsk_your-key-here

Together AI

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.together.xyz/v1
LLM_OPENAI_COMPATIBLE_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
LLM_OPENAI_COMPATIBLE_API_KEY=your-together-key
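
Jan, LM Studio, OpenAI, Groq, and Together AI all share the LLM_OPENAI_COMPATIBLE_* variables, so one check can cover them. The sketch below calls the OpenAI-style model listing at <base URL>/models and looks for the configured model ID. check_openai_compatible is a hypothetical helper name, and the grep match is a rough check rather than a full JSON parse.

```shell
# Hypothetical helper (not shipped with AuroraSOC).
# Reads the LLM_OPENAI_COMPATIBLE_* variables from the environment and
# succeeds if the endpoint's model list contains the configured model ID.
check_openai_compatible() {
  local base_url="${LLM_OPENAI_COMPATIBLE_BASE_URL:-}"
  local model="${LLM_OPENAI_COMPATIBLE_MODEL:-}"
  local api_key="${LLM_OPENAI_COMPATIBLE_API_KEY:-}"
  if [ -z "$base_url" ] || [ -z "$model" ]; then
    echo "LLM_OPENAI_COMPATIBLE_BASE_URL and LLM_OPENAI_COMPATIBLE_MODEL must be set" >&2
    return 2
  fi
  # The response looks like {"object":"list","data":[{"id":"<model>",...}]}
  curl -fsS --max-time 10 -H "Authorization: Bearer $api_key" "$base_url/models" \
    | grep -q "\"id\":[[:space:]]*\"$model\""
}
```

One way to use it, assuming the .env values are shell-safe, is to export them with set -a; . ./.env; set +a and then run check_openai_compatible. Exit status 0 means the key was accepted and the model ID exists, 2 means a variable is missing, and any other status points to a wrong URL, key, or model name.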

Method 2: Operator Console UI

For runtime changes without restarting the backend:

  1. Log in to the operator console at http://localhost:3000
  2. Navigate to Settings → LLM Providers
  3. Select a provider from the dropdown
  4. Enter the API key and model name
  5. Click Test Connection to verify
  6. Click Set as Active to switch

Method 3: Runtime API

Switch providers programmatically via the REST API:

# Set DeepSeek as active
curl -X PUT http://localhost:8000/api/v1/llm-providers/active \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"provider": "deepseek", "model": "deepseek-v4-flash"}'

# Test connection
curl -X POST http://localhost:8000/api/v1/llm-providers/test \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"provider": "gemini"}'

# Check active provider
curl http://localhost:8000/api/v1/llm-providers/active \
-H "Authorization: Bearer $TOKEN"
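
The calls above can be combined so that a provider only becomes active after its connection test passes. The following is a sketch under stated assumptions: switch_llm_provider and AURORA_API_URL are hypothetical names, and it assumes the test endpoint answers with a non-2xx status when the provider is unreachable, which this guide does not confirm. If the endpoint reports failures in the response body instead, the body would need to be inspected.

```shell
# Hypothetical wrapper around the test and active endpoints (not shipped
# with AuroraSOC). Expects $TOKEN to hold a valid bearer token.
switch_llm_provider() {
  local provider="$1" model="$2"
  local api="${AURORA_API_URL:-http://localhost:8000/api/v1}"

  # Test first. curl -f turns an HTTP error status into a non-zero exit code.
  if ! curl -fsS -X POST "$api/llm-providers/test" \
      -H "Authorization: Bearer ${TOKEN:-}" \
      -H "Content-Type: application/json" \
      -d "{\"provider\": \"$provider\"}" >/dev/null; then
    echo "connection test failed for $provider; active provider unchanged" >&2
    return 1
  fi

  # Only reached when the test call succeeded.
  curl -fsS -X PUT "$api/llm-providers/active" \
    -H "Authorization: Bearer ${TOKEN:-}" \
    -H "Content-Type: application/json" \
    -d "{\"provider\": \"$provider\", \"model\": \"$model\"}"
}
```

For example: switch_llm_provider deepseek deepseek-v4-flash.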

Auto-Resolution Mode

Set LLM_BACKEND=auto to let AuroraSOC automatically detect the best available backend:

  1. Probes the vLLM endpoint (LLM_VLLM_BASE_URL) with a 2-second timeout
  2. If vLLM responds with at least one model → uses vLLM
  3. Otherwise → falls back to Ollama

This is useful for deployments that may or may not have GPU hardware available.
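
The resolution order above can be approximated in shell to predict which backend auto mode is likely to pick on a given host. This is an illustrative sketch, not AuroraSOC's implementation: resolve_llm_backend is a hypothetical name, and probing the /models path under LLM_VLLM_BASE_URL is an assumption, since this guide does not say which path is probed.

```shell
# Hypothetical re-creation of the auto-resolution order (not AuroraSOC code).
# Prints "vllm" if the vLLM endpoint lists at least one model, else "ollama".
resolve_llm_backend() {
  local vllm_url="${LLM_VLLM_BASE_URL:-http://localhost:8000/v1}" body
  # Step 1: probe vLLM with a 2-second timeout.
  if body=$(curl -fsS --max-time 2 "$vllm_url/models" 2>/dev/null) \
      && printf '%s' "$body" | grep -q '"id"'; then
    echo vllm    # Step 2: the response contains at least one model entry.
  else
    echo ollama  # Step 3: no response or no models, fall back to Ollama.
  fi
}
```

Running resolve_llm_backend on the host that runs the API prints the backend this sketch would select, which can help explain why a GPU host still falls back to Ollama, for example when vLLM is up but has no model loaded.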


Troubleshooting

| Problem | Solution |
| --- | --- |
| "LLM unreachable" in the dashboard | Check that the provider is running and the URL/key are correct |
| Agents produce empty responses | The model may be too small; try a larger model or a different provider |
| High latency on local models | Ensure GPU is being used (check nvidia-smi); reduce max_tokens |
| "Rate limit exceeded" on cloud providers | Reduce concurrent agents or upgrade your API plan |
| Air-gap mode blocks cloud calls | This is intentional; set LLM_BACKEND=ollama or vllm |

Security Best Practices

  1. Never commit API keys to version control. Use .env (gitignored) or Vault.
  2. Rotate keys regularly - use the operator console to update keys without downtime.
  3. Use local inference for regulated workloads (HIPAA, PCI-DSS, classified).
  4. Enable egress allowlist (AURORA_ENFORCE_EGRESS_ALLOWLIST=1) to control which cloud endpoints agents can reach.
  5. Monitor token usage via the Grafana aurora_llm_tokens_total metric.
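
The first practice can be verified with standard git commands. The sketch below checks that .env is both untracked and matched by an ignore rule, because an ignore rule alone does not protect a file that was committed earlier. env_file_is_safe is a hypothetical helper name, not part of AuroraSOC.

```shell
# Hypothetical helper (not shipped with AuroraSOC).
# Succeeds only if .env is untracked and covered by an ignore rule.
env_file_is_safe() {
  local repo="${1:-.}"
  # A file that is already tracked is not protected by .gitignore.
  if git -C "$repo" ls-files --error-unmatch .env >/dev/null 2>&1; then
    echo ".env is tracked by git; untrack it with: git rm --cached .env" >&2
    return 1
  fi
  # check-ignore exits 0 only when an ignore rule matches the path.
  if ! git -C "$repo" check-ignore -q .env; then
    echo ".env is not covered by an ignore rule" >&2
    return 1
  fi
}
```

Run env_file_is_safe from the repository root, or pass the repository path as the first argument. If the file was ever committed, the key should also be rotated, since it remains in the history.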