LLM Provider Configuration

AuroraSOC supports multiple LLM backends - from fully local (air-gapped, no data leaves your network) to cloud-hosted providers. This guide covers every supported option with step-by-step setup instructions.

The shipped default backend is deepseek with the deepseek-v4-flash model (see .env.example). For a fully local, air-gapped deployment, switch LLM_BACKEND to ollama and run granite4:8b.

Quick Reference

Provider	Type	Privacy	Performance	Cost
Ollama	Local (CPU/GPU)	Maximum - no data leaves host	Good (depends on hardware)	Free
vLLM	Local (GPU required)	Maximum - no data leaves host	Excellent (continuous batching)	Free (GPU hardware cost)
Jan AI	Local (desktop app)	Maximum - no data leaves host	Good	Free
LM Studio	Local (desktop app)	Maximum - no data leaves host	Good	Free
DeepSeek	Cloud	Data sent to DeepSeek servers	Excellent	~$0.14/M input tokens
Google Gemini	Cloud	Data sent to Google servers	Excellent	Free tier available
OpenAI	Cloud	Data sent to OpenAI servers	Excellent	~$2.50/M input tokens
Anthropic Claude	Cloud	Data sent to Anthropic servers	Excellent	~$3/M input tokens
Groq	Cloud (fast inference)	Data sent to Groq servers	Very fast	Free tier available
Together AI	Cloud	Data sent to Together servers	Good	Pay-per-token

Method 1: Environment Variables (Recommended for Production)

All LLM configuration is done through environment variables in your .env file. This is the recommended approach for production deployments.

Step 1: Set the Backend

Edit your .env file and set LLM_BACKEND to one of: ollama, vllm, openai, deepseek, gemini, anthropic, or auto.

# Choose ONE backend:
LLM_BACKEND=deepseek

Step 2: Configure the Provider

Add the provider-specific settings below.

Local Providers (Air-Gapped / On-Premises)

Ollama (Recommended for Development)

Ollama is the easiest way to run LLMs locally. No GPU required (but recommended).

Setup:

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull the recommended model
ollama pull granite4:8b

# 3. Verify it's running
ollama list

.env configuration:

LLM_BACKEND=ollama
LLM_OLLAMA_BASE_URL=http://localhost:11434
LLM_DEFAULT_MODEL=ollama:granite4:8b
LLM_ORCHESTRATOR_MODEL=ollama:granite4:8b

Recommended models for Ollama:

Model	Size	Use Case
`granite4:8b`	4.9 GB	Default local model - good balance of speed and quality
`granite4:3b`	2.0 GB	Lightweight - for resource-constrained environments
`llama3.3:70b`	40 GB	High quality - requires 48+ GB RAM
`qwen2.5:32b`	19 GB	Strong reasoning - requires 32+ GB RAM
`deepseek-r1:8b`	4.9 GB	Reasoning-focused - good for investigations

Docker deployment (recommended for production):

# In docker-compose, Ollama runs as a sibling container:
LLM_OLLAMA_BASE_URL=http://ollama:11434

vLLM (Recommended for Production with GPU)

vLLM provides high-throughput LLM serving with continuous batching. Requires an NVIDIA GPU.

Setup:

# Option A: Use the AuroraSOC compose stack (automatic)
just stack-up    # Auto-detects GPU and starts vLLM

# Option B: Run vLLM separately
pip install vllm
vllm serve ibm-granite/granite-3.2-8b-instruct \
  --served-model-name granite-soc-specialist \
  --port 8000 \
  --max-model-len 8192

.env configuration:

LLM_BACKEND=vllm
LLM_VLLM_BASE_URL=http://localhost:8000/v1
LLM_VLLM_MODEL=granite-soc-specialist
LLM_VLLM_ORCHESTRATOR_MODEL=granite-soc-specialist

GPU requirements:

Model Size	Minimum GPU VRAM	Recommended
2B params	6 GB	RTX 3060
8B params	16 GB	RTX 4090 / A100 40GB
70B params	80 GB	2x A100 80GB

Jan AI (Desktop Application)

Jan is a desktop application that runs LLMs locally with a user-friendly interface and an OpenAI-compatible API.

Setup:

Download Jan from jan.ai
Install and launch
Download a model (e.g., Llama 3.3 8B or Granite 3.2)
Start the local API server (Settings → Advanced → Local API Server → Enable)

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=http://localhost:1337/v1
LLM_OPENAI_COMPATIBLE_MODEL=llama3.3-8b
LLM_OPENAI_COMPATIBLE_API_KEY=jan

LM Studio (Desktop Application)

LM Studio provides a polished desktop experience for running LLMs locally with an OpenAI-compatible server.

Setup:

Download LM Studio from lmstudio.ai
Install and launch
Search and download a model (e.g., granite-3.2-8b-instruct)
Go to the "Local Server" tab and click "Start Server"

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
LLM_OPENAI_COMPATIBLE_MODEL=granite-3.2-8b-instruct
LLM_OPENAI_COMPATIBLE_API_KEY=lm-studio

Cloud Providers (Hosted)

Data Privacy

Cloud providers process your prompts on external servers. For regulated environments (PCI-DSS, HIPAA, GDPR), use local providers or ensure your cloud provider agreement covers your compliance requirements.

DeepSeek

DeepSeek offers high-quality models at very competitive pricing.

Get an API key:

Go to platform.deepseek.com
Sign up or log in
Navigate to API Keys in the dashboard
Click Create API Key
Copy the key (it starts with sk-)

.env configuration:

LLM_BACKEND=deepseek
LLM_DEEPSEEK_API_KEY=sk-your-key-here
LLM_DEEPSEEK_MODEL=deepseek-v4-flash
LLM_DEEPSEEK_ORCHESTRATOR_MODEL=deepseek-v4-pro
LLM_DEEPSEEK_BASE_URL=https://api.deepseek.com

Run the full fleet on DeepSeek:

With the key set above, one command brings up the whole system (API, all 11 agents, every MCP server, and the dashboard) on DeepSeek:

just stack-up-fleet

It waits for the API, applies migrations, seeds the default admin, and prints the dashboard URL, mode, model, and login.

Available models:

Model	Best For	Context
`deepseek-v4-flash`	Default - fast, cost-effective general agent tasks and triage	64K
`deepseek-v4-pro`	Premium quality - complex investigations, multi-step reasoning	64K

Google Gemini

Google Gemini provides powerful multimodal models with a generous free tier.

Get an API key:

Go to aistudio.google.com
Sign in with your Google account
Click Get API key in the top navigation
Click Create API key and select a Google Cloud project
Copy the generated key

.env configuration:

LLM_BACKEND=gemini
LLM_GEMINI_API_KEY=your-api-key-here
LLM_GEMINI_MODEL=gemini-2.5-flash
LLM_GEMINI_ORCHESTRATOR_MODEL=gemini-2.5-pro

Available models:

Model	Best For	Context	Cost
`gemini-2.5-flash`	Fast triage, high volume	1M tokens	Free tier: 15 RPM
`gemini-2.5-pro`	Deep investigations	1M tokens	Free tier: 2 RPM
`gemini-2.0-flash`	Legacy compatibility	1M tokens	Free tier available

OpenAI

Get an API key:

Go to platform.openai.com
Navigate to API Keys
Click Create new secret key
Copy the key (starts with sk-)

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.openai.com/v1
LLM_OPENAI_COMPATIBLE_MODEL=gpt-4o
LLM_OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL=gpt-4o
LLM_OPENAI_COMPATIBLE_API_KEY=sk-your-key-here

Anthropic Claude

Get an API key:

Go to console.anthropic.com
Navigate to API Keys
Click Create Key
Copy the key (starts with sk-ant-)

.env configuration:

LLM_BACKEND=anthropic
LLM_ANTHROPIC_API_KEY=sk-ant-your-key-here
LLM_ANTHROPIC_MODEL=claude-sonnet-4-6
LLM_ANTHROPIC_ORCHESTRATOR_MODEL=claude-sonnet-4-6

Groq (Ultra-Fast Inference)

Groq provides extremely fast inference for open-weight models using custom LPU hardware.

Get an API key:

Go to console.groq.com
Navigate to API Keys
Create a new key

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
LLM_OPENAI_COMPATIBLE_MODEL=llama-3.3-70b-versatile
LLM_OPENAI_COMPATIBLE_API_KEY=gsk_your-key-here

Together AI

.env configuration:

LLM_BACKEND=openai
LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.together.xyz/v1
LLM_OPENAI_COMPATIBLE_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
LLM_OPENAI_COMPATIBLE_API_KEY=your-together-key

Method 2: Operator Console UI

For runtime changes without restarting the backend:

Log in to the operator console at http://localhost:3000
Navigate to Settings → LLM Providers
Select a provider from the dropdown
Enter the API key and model name
Click Test Connection to verify
Click Set as Active to switch

Method 3: Runtime API

Switch providers programmatically via the REST API:

# Set DeepSeek as active
curl -X PUT http://localhost:8000/api/v1/llm-providers/active \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"provider": "deepseek", "model": "deepseek-v4-flash"}'

# Test connection
curl -X POST http://localhost:8000/api/v1/llm-providers/test \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"provider": "gemini"}'

# Check active provider
curl http://localhost:8000/api/v1/llm-providers/active \
  -H "Authorization: Bearer $TOKEN"

Auto-Resolution Mode

Set LLM_BACKEND=auto to let AuroraSOC automatically detect the best available backend:

Probes the vLLM endpoint (LLM_VLLM_BASE_URL) with a 2-second timeout
If vLLM responds with at least one model → uses vLLM
Otherwise → falls back to Ollama

This is useful for deployments that may or may not have GPU hardware available.

Troubleshooting

Problem	Solution
"LLM unreachable" in the dashboard	Check that the provider is running and the URL/key are correct
Agents produce empty responses	The model may be too small; try a larger model or a different provider
High latency on local models	Ensure GPU is being used (check `nvidia-smi`); reduce `max_tokens`
"Rate limit exceeded" on cloud providers	Reduce concurrent agents or upgrade your API plan
Air-gap mode blocks cloud calls	This is intentional; set `LLM_BACKEND=ollama` or `vllm`

Security Best Practices

Never commit API keys to version control. Use .env (gitignored) or Vault.
Rotate keys regularly - use the operator console to update keys without downtime.
Use local inference for regulated workloads (HIPAA, PCI-DSS, classified).
Enable egress allowlist (AURORA_ENFORCE_EGRESS_ALLOWLIST=1) to control which cloud endpoints agents can reach.
Monitor token usage via the Grafana aurora_llm_tokens_total metric.

Quick Reference​

Method 1: Environment Variables (Recommended for Production)​

Step 1: Set the Backend​

Step 2: Configure the Provider​

Local Providers (Air-Gapped / On-Premises)​

Ollama (Recommended for Development)​

vLLM (Recommended for Production with GPU)​

Jan AI (Desktop Application)​

LM Studio (Desktop Application)​

Cloud Providers (Hosted)​

DeepSeek​

Google Gemini​

OpenAI​

Anthropic Claude​

Groq (Ultra-Fast Inference)​

Together AI​

Method 2: Operator Console UI​

Method 3: Runtime API​

Auto-Resolution Mode​

Troubleshooting​

Security Best Practices​

Quick Reference

Method 1: Environment Variables (Recommended for Production)

Step 1: Set the Backend

Step 2: Configure the Provider

Local Providers (Air-Gapped / On-Premises)

Ollama (Recommended for Development)

vLLM (Recommended for Production with GPU)

Jan AI (Desktop Application)

LM Studio (Desktop Application)

Cloud Providers (Hosted)

DeepSeek

Google Gemini

OpenAI

Anthropic Claude

Groq (Ultra-Fast Inference)

Together AI

Method 2: Operator Console UI

Method 3: Runtime API

Auto-Resolution Mode

Troubleshooting

Security Best Practices