LLM Provider Configuration
AuroraSOC supports multiple LLM backends - from fully local (air-gapped, no data leaves your network) to cloud-hosted providers. This guide covers every supported option with step-by-step setup instructions.
The shipped default backend is deepseek with the deepseek-v4-flash model (see .env.example). For a fully local, air-gapped deployment, switch LLM_BACKEND to ollama and run granite4:8b.
Quick Reference
| Provider | Type | Privacy | Performance | Cost |
|---|---|---|---|---|
| Ollama | Local (CPU/GPU) | Maximum - no data leaves host | Good (depends on hardware) | Free |
| vLLM | Local (GPU required) | Maximum - no data leaves host | Excellent (continuous batching) | Free (GPU hardware cost) |
| Jan AI | Local (desktop app) | Maximum - no data leaves host | Good | Free |
| LM Studio | Local (desktop app) | Maximum - no data leaves host | Good | Free |
| DeepSeek | Cloud | Data sent to DeepSeek servers | Excellent | ~$0.14/M input tokens |
| Google Gemini | Cloud | Data sent to Google servers | Excellent | Free tier available |
| OpenAI | Cloud | Data sent to OpenAI servers | Excellent | ~$2.50/M input tokens |
| Anthropic Claude | Cloud | Data sent to Anthropic servers | Excellent | ~$3/M input tokens |
| Groq | Cloud (fast inference) | Data sent to Groq servers | Very fast | Free tier available |
| Together AI | Cloud | Data sent to Together servers | Good | Pay-per-token |
Method 1: Environment Variables (Recommended for Production)
All LLM configuration is done through environment variables in your .env file. This is the recommended approach for production deployments.
Step 1: Set the Backend
Edit your .env file and set LLM_BACKEND to one of: ollama, vllm, openai, deepseek, gemini, anthropic, or auto.
# Choose ONE backend:
LLM_BACKEND=deepseek
Step 2: Configure the Provider
Add the provider-specific settings below.
Local Providers (Air-Gapped / On-Premises)
Ollama (Recommended for Development)
Ollama is the easiest way to run LLMs locally. No GPU required (but recommended).
Setup:
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull the recommended model
ollama pull granite4:8b
# 3. Verify it's running
ollama list
.env configuration:
LLM_BACKEND=ollama
LLM_OLLAMA_BASE_URL=http://localhost:11434
LLM_DEFAULT_MODEL=ollama:granite4:8b
LLM_ORCHESTRATOR_MODEL=ollama:granite4:8b
Recommended models for Ollama:
| Model | Size | Use Case |
|---|---|---|
granite4:8b | 4.9 GB | Default local model - good balance of speed and quality |
granite4:3b | 2.0 GB | Lightweight - for resource-constrained environments |
llama3.3:70b | 40 GB | High quality - requires 48+ GB RAM |
qwen2.5:32b | 19 GB | Strong reasoning - requires 32+ GB RAM |
deepseek-r1:8b | 4.9 GB | Reasoning-focused - good for investigations |
Docker deployment (recommended for production):
# In docker-compose, Ollama runs as a sibling container:
LLM_OLLAMA_BASE_URL=http://ollama:11434
vLLM (Recommended for Production with GPU)
vLLM provides high-throughput LLM serving with continuous batching. Requires an NVIDIA GPU.
Setup:
# Option A: Use the AuroraSOC compose stack (automatic)
just stack-up # Auto-detects GPU and starts vLLM
# Option B: Run vLLM separately
pip install vllm
vllm serve ibm-granite/granite-3.2-8b-instruct \
--served-model-name granite-soc-specialist \
--port 8000 \
--max-model-len 8192
.env configuration:
LLM_BACKEND=vllm
LLM_VLLM_BASE_URL=http://localhost:8000/v1
LLM_VLLM_MODEL=granite-soc-specialist
LLM_VLLM_ORCHESTRATOR_MODEL=granite-soc-specialist
GPU requirements:
| Model Size | Minimum GPU VRAM | Recommended |
|---|---|---|
| 2B params | 6 GB | RTX 3060 |
| 8B params | 16 GB | RTX 4090 / A100 40GB |
| 70B params | 80 GB | 2x A100 80GB |
Jan AI (Desktop Application)
Jan is a desktop application that runs LLMs locally with a user-friendly interface and an OpenAI-compatible API.
Setup:
- Download Jan from jan.ai
- Install and launch
- Download a model (e.g., Llama 3.3 8B or Granite 3.2)
- Start the local API server (Settings → Advanced → Local API Server → Enable)
.env configuration:
LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=http://localhost:1337/v1
LLM_OPENAI_COMPATIBLE_MODEL=llama3.3-8b
LLM_OPENAI_COMPATIBLE_API_KEY=jan
LM Studio (Desktop Application)
LM Studio provides a polished desktop experience for running LLMs locally with an OpenAI-compatible server.
Setup:
- Download LM Studio from lmstudio.ai
- Install and launch
- Search and download a model (e.g.,
granite-3.2-8b-instruct) - Go to the "Local Server" tab and click "Start Server"
.env configuration:
LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
LLM_OPENAI_COMPATIBLE_MODEL=granite-3.2-8b-instruct
LLM_OPENAI_COMPATIBLE_API_KEY=lm-studio
Cloud Providers (Hosted)
Cloud providers process your prompts on external servers. For regulated environments (PCI-DSS, HIPAA, GDPR), use local providers or ensure your cloud provider agreement covers your compliance requirements.
DeepSeek
DeepSeek offers high-quality models at very competitive pricing.
Get an API key:
- Go to platform.deepseek.com
- Sign up or log in
- Navigate to API Keys in the dashboard
- Click Create API Key
- Copy the key (it starts with
sk-)
.env configuration:
LLM_BACKEND=deepseek
LLM_DEEPSEEK_API_KEY=sk-your-key-here
LLM_DEEPSEEK_MODEL=deepseek-v4-flash
LLM_DEEPSEEK_ORCHESTRATOR_MODEL=deepseek-v4-pro
LLM_DEEPSEEK_BASE_URL=https://api.deepseek.com
Run the full fleet on DeepSeek:
With the key set above, one command brings up the whole system (API, all 11 agents, every MCP server, and the dashboard) on DeepSeek:
just stack-up-fleet
It waits for the API, applies migrations, seeds the default admin, and prints the dashboard URL, mode, model, and login.
Available models:
| Model | Best For | Context |
|---|---|---|
deepseek-v4-flash | Default - fast, cost-effective general agent tasks and triage | 64K |
deepseek-v4-pro | Premium quality - complex investigations, multi-step reasoning | 64K |
Google Gemini
Google Gemini provides powerful multimodal models with a generous free tier.
Get an API key:
- Go to aistudio.google.com
- Sign in with your Google account
- Click Get API key in the top navigation
- Click Create API key and select a Google Cloud project
- Copy the generated key
.env configuration:
LLM_BACKEND=gemini
LLM_GEMINI_API_KEY=your-api-key-here
LLM_GEMINI_MODEL=gemini-2.5-flash
LLM_GEMINI_ORCHESTRATOR_MODEL=gemini-2.5-pro
Available models:
| Model | Best For | Context | Cost |
|---|---|---|---|
gemini-2.5-flash | Fast triage, high volume | 1M tokens | Free tier: 15 RPM |
gemini-2.5-pro | Deep investigations | 1M tokens | Free tier: 2 RPM |
gemini-2.0-flash | Legacy compatibility | 1M tokens | Free tier available |
OpenAI
Get an API key:
- Go to platform.openai.com
- Navigate to API Keys
- Click Create new secret key
- Copy the key (starts with
sk-)
.env configuration:
LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.openai.com/v1
LLM_OPENAI_COMPATIBLE_MODEL=gpt-4o
LLM_OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL=gpt-4o
LLM_OPENAI_COMPATIBLE_API_KEY=sk-your-key-here
Anthropic Claude
Get an API key:
- Go to console.anthropic.com
- Navigate to API Keys
- Click Create Key
- Copy the key (starts with
sk-ant-)
.env configuration:
LLM_BACKEND=anthropic
LLM_ANTHROPIC_API_KEY=sk-ant-your-key-here
LLM_ANTHROPIC_MODEL=claude-sonnet-4-6
LLM_ANTHROPIC_ORCHESTRATOR_MODEL=claude-sonnet-4-6
Groq (Ultra-Fast Inference)
Groq provides extremely fast inference for open-weight models using custom LPU hardware.
Get an API key:
- Go to console.groq.com
- Navigate to API Keys
- Create a new key
.env configuration:
LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
LLM_OPENAI_COMPATIBLE_MODEL=llama-3.3-70b-versatile
LLM_OPENAI_COMPATIBLE_API_KEY=gsk_your-key-here
Together AI
.env configuration:
LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.together.xyz/v1
LLM_OPENAI_COMPATIBLE_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
LLM_OPENAI_COMPATIBLE_API_KEY=your-together-key
Method 2: Operator Console UI
For runtime changes without restarting the backend:
- Log in to the operator console at
http://localhost:3000 - Navigate to Settings → LLM Providers
- Select a provider from the dropdown
- Enter the API key and model name
- Click Test Connection to verify
- Click Set as Active to switch
Method 3: Runtime API
Switch providers programmatically via the REST API:
# Set DeepSeek as active
curl -X PUT http://localhost:8000/api/v1/llm-providers/active \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"provider": "deepseek", "model": "deepseek-v4-flash"}'
# Test connection
curl -X POST http://localhost:8000/api/v1/llm-providers/test \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"provider": "gemini"}'
# Check active provider
curl http://localhost:8000/api/v1/llm-providers/active \
-H "Authorization: Bearer $TOKEN"
Auto-Resolution Mode
Set LLM_BACKEND=auto to let AuroraSOC automatically detect the best available backend:
- Probes the vLLM endpoint (
LLM_VLLM_BASE_URL) with a 2-second timeout - If vLLM responds with at least one model → uses vLLM
- Otherwise → falls back to Ollama
This is useful for deployments that may or may not have GPU hardware available.
Troubleshooting
| Problem | Solution |
|---|---|
| "LLM unreachable" in the dashboard | Check that the provider is running and the URL/key are correct |
| Agents produce empty responses | The model may be too small; try a larger model or a different provider |
| High latency on local models | Ensure GPU is being used (check nvidia-smi); reduce max_tokens |
| "Rate limit exceeded" on cloud providers | Reduce concurrent agents or upgrade your API plan |
| Air-gap mode blocks cloud calls | This is intentional; set LLM_BACKEND=ollama or vllm |
Security Best Practices
- Never commit API keys to version control. Use
.env(gitignored) or Vault. - Rotate keys regularly - use the operator console to update keys without downtime.
- Use local inference for regulated workloads (HIPAA, PCI-DSS, classified).
- Enable egress allowlist (
AURORA_ENFORCE_EGRESS_ALLOWLIST=1) to control which cloud endpoints agents can reach. - Monitor token usage via the Grafana
aurora_llm_tokens_totalmetric.