Model Swap Guide
AuroraSOC is designed for plug-and-play model swapping — you can switch between base, fine-tuned, per-agent, or even non-Granite models by changing environment variables. No code changes required.
Quick Reference
# Use base Granite 4 (default — no fine-tuning)
make disable-finetuned
# Use a single fine-tuned model for all agents
make enable-finetuned
# Use per-agent fine-tuned specialists
export GRANITE_USE_FINETUNED=true
export GRANITE_USE_PER_AGENT_MODELS=true
# Force a specific model for all agents in Ollama mode (testing/debugging)
export LLM_BACKEND=ollama
export OLLAMA_MODEL=llama3.2:3b
export OLLAMA_ORCHESTRATOR_MODEL=llama3.2:3b
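The two GRANITE_* toggles combine into the three scenarios below. A minimal sketch of that decision (the function is illustrative, not part of the AuroraSOC codebase):

```python
def model_mode(use_finetuned: bool, per_agent: bool) -> str:
    """Map the two GRANITE_* toggles to the scenario they select.

    Illustrative only: the real resolution lives inside AuroraSOC.
    """
    if not use_finetuned:
        return "base"              # Scenario 1: base Granite for everyone
    if per_agent:
        return "per-agent"         # Scenario 3: one specialist per agent
    return "single-finetuned"      # Scenario 2: one shared fine-tuned model

print(model_mode(False, False))  # base
print(model_mode(True, False))   # single-finetuned
print(model_mode(True, True))    # per-agent
```

Note that GRANITE_USE_PER_AGENT_MODELS only matters when GRANITE_USE_FINETUNED is also true.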
Scenario 1: Base Model (Default)
When: First setup, before any fine-tuning, or for development/testing.
# .env
LLM_BACKEND=ollama
OLLAMA_MODEL=granite4:8b
OLLAMA_ORCHESTRATOR_MODEL=granite4:dense
GRANITE_USE_FINETUNED=false
GRANITE_USE_PER_AGENT_MODELS=false
What happens: All 16 agents use granite4:8b from Ollama, with the orchestrator using granite4:dense as configured above. These models have general language capability but no AuroraSOC-specific security training.
Scenario 2: Single Fine-Tuned Model
When: You've trained a generic SOC model (via make train) and want all agents to use it.
# .env
GRANITE_USE_FINETUNED=true
GRANITE_USE_PER_AGENT_MODELS=false
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest
Or use the Makefile shortcut:
make enable-finetuned
What happens: All agents share the same fine-tuned model. This model has been trained on SOC data across all domains.
Scenario 3: Per-Agent Specialists
When: You've trained individual specialist models (via python training/scripts/train_all_agents.py) and want each agent to use its domain-specific model.
# .env
GRANITE_USE_FINETUNED=true
GRANITE_USE_PER_AGENT_MODELS=true
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest
What happens: Each agent resolves to its specialist model. Agents without a trained specialist fall back to the generic fine-tuned model, then to the base model.
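The fallback chain above (specialist, then generic fine-tuned, then base) can be sketched as a simple lookup; the function and the compliance_auditor agent name are hypothetical, for illustration only:

```python
def resolve_with_fallback(agent, specialists, generic, base):
    """Specialist -> generic fine-tuned -> base model; first hit wins."""
    if agent in specialists:
        return specialists[agent]
    if generic:
        return generic
    return base

specialists = {"threat_hunter": "granite-soc-threat-hunter:latest"}

# Agent with a trained specialist resolves to it:
print(resolve_with_fallback("threat_hunter", specialists,
                            "granite-soc:latest", "granite4:8b"))
# Agent without one falls back to the generic fine-tuned model:
print(resolve_with_fallback("compliance_auditor", specialists,
                            "granite-soc:latest", "granite4:8b"))
```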
Scenario 4: Override for Testing
When: You want to test a completely different model (e.g., Llama, Mistral) across all agents.
# .env
LLM_BACKEND=ollama
OLLAMA_MODEL=llama3.2:3b
OLLAMA_ORCHESTRATOR_MODEL=llama3.2:3b
What happens: All agents use the explicitly configured backend model IDs. In Ollama mode, this bypasses per-agent specialist resolution and uses the specified tags directly.
Important: Backend model IDs (OLLAMA_MODEL / OLLAMA_ORCHESTRATOR_MODEL or VLLM_MODEL / VLLM_ORCHESTRATOR_MODEL) take precedence in their respective backend paths.
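A sketch of that precedence, assuming the resolver simply reads the environment (variable names match this guide; the function itself is illustrative):

```python
def backend_model_ids(env):
    """Return (agent_model, orchestrator_model) for the active backend.

    The orchestrator ID falls back to the shared model when unset
    (assumed behavior, shown here for illustration).
    """
    if env.get("LLM_BACKEND") == "vllm":
        model = env["VLLM_MODEL"]
        orch = env.get("VLLM_ORCHESTRATOR_MODEL", model)
    else:  # ollama
        model = env["OLLAMA_MODEL"]
        orch = env.get("OLLAMA_ORCHESTRATOR_MODEL", model)
    return model, orch

env = {"LLM_BACKEND": "ollama",
       "OLLAMA_MODEL": "llama3.2:3b",
       "OLLAMA_ORCHESTRATOR_MODEL": "llama3.2:3b"}
print(backend_model_ids(env))  # ('llama3.2:3b', 'llama3.2:3b')
```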
Scenario 5: vLLM for Production
When: Deploying to production where you need high throughput and multiple concurrent requests.
# .env
LLM_BACKEND=vllm
VLLM_BASE_URL=http://vllm-server:8000/v1
VLLM_MODEL=granite-soc-specialist
VLLM_ORCHESTRATOR_MODEL=granite-soc-specialist
What happens: ChatModel.from_name() uses the OpenAI-compatible API (pointing at vLLM). The model IDs come directly from VLLM_MODEL and VLLM_ORCHESTRATOR_MODEL.
Scenario 6: OpenAI-Compatible Providers
When: You want to use a cloud-hosted model (Together AI, Groq, Fireworks AI, OpenAI) or any local server that exposes an OpenAI-compatible /v1/chat/completions endpoint (llama.cpp, LM Studio).
# .env
LLM_BACKEND=openai
OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
OPENAI_COMPATIBLE_MODEL=llama-3.3-70b-versatile
OPENAI_COMPATIBLE_API_KEY=${GROQ_API_KEY}
What happens: ChatModel.from_name() uses the generic OpenAI adapter with your custom base URL and optional API key. No GPU or local model required.
You can optionally set a different model for the orchestrator:
OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL=llama-3.1-8b-instant
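Any server in this scenario just needs to accept the standard chat-completions request body. A minimal payload, using the Groq model from the example above (the system prompt is illustrative):

```python
import json

# The request shape every OpenAI-compatible endpoint expects at
# POST {OPENAI_COMPATIBLE_BASE_URL}/chat/completions
payload = {
    "model": "llama-3.3-70b-versatile",
    "messages": [
        {"role": "system", "content": "You are a SOC analyst."},
        {"role": "user",
         "content": "Classify this alert: ET TROJAN Cobalt Strike Beacon"},
    ],
    "temperature": 0.2,
}
print(json.dumps(payload, indent=2))
```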
Switching Between Scenarios
From Base → Fine-Tuned
# 1. Train the model
make train-data
make train
# 2. Import to Ollama
make train-serve-ollama
# 3. Enable fine-tuned
make enable-finetuned
# 4. Restart services
docker compose restart
From Fine-Tuned → Per-Agent
# 1. Train all specialists
python training/scripts/train_all_agents.py
# 2. Import all to Ollama
python training/scripts/serve_model.py ollama-all --output-dir training/output
# 3. Enable per-agent
echo "GRANITE_USE_PER_AGENT_MODELS=true" >> .env
# 4. Restart
docker compose restart
From Per-Agent → Back to Base
make disable-finetuned
docker compose restart
A/B Testing Models
You can run two instances of AuroraSOC with different model configurations:
# Instance A: Base model (port 8001)
GRANITE_USE_FINETUNED=false PORT=8001 docker compose up
# Instance B: Fine-tuned (port 8002)
GRANITE_USE_FINETUNED=true PORT=8002 docker compose up
Send the same alerts to both instances and compare:
- Response quality
- Latency
- MITRE mapping accuracy
- False positive rates
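A minimal scoring sketch for such a comparison, assuming you collect (verdict, latency_ms) pairs from both instances for the same alerts (the function and sample data are illustrative):

```python
def compare_runs(base, finetuned):
    """Agreement rate and mean latency for two lists of (verdict, latency_ms)."""
    agree = sum(a[0] == b[0] for a, b in zip(base, finetuned))

    def mean_latency(runs):
        return sum(latency for _, latency in runs) / len(runs)

    return {
        "agreement": agree / len(base),
        "base_latency_ms": mean_latency(base),
        "finetuned_latency_ms": mean_latency(finetuned),
    }

base = [("malicious", 420), ("benign", 380), ("malicious", 450)]
ft   = [("malicious", 510), ("malicious", 470), ("malicious", 530)]
print(compare_runs(base, ft))
```

Ground-truth labels are still needed to turn agreement into accuracy and false positive rates.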
Using Non-Granite Models
AuroraSOC's architecture supports any model accessible via vLLM (default), Ollama (fallback), or another OpenAI-compatible API:
Via Ollama
# Pull any Ollama model
ollama pull llama3.2:3b
ollama pull mistral:7b
ollama pull qwen2.5:7b
# Use it in AuroraSOC
export LLM_BACKEND=ollama
export OLLAMA_MODEL=llama3.2:3b
export OLLAMA_ORCHESTRATOR_MODEL=llama3.2:3b
Via OpenAI-Compatible API
# Point to any OpenAI-compatible endpoint
export LLM_BACKEND=openai
export OPENAI_COMPATIBLE_BASE_URL=https://api.example.com/v1
export OPENAI_COMPATIBLE_MODEL=my-custom-model
export OPENAI_COMPATIBLE_API_KEY=sk-...
Via Cloud Providers
# OpenAI
export LLM_BACKEND=openai
export OPENAI_COMPATIBLE_BASE_URL=https://api.openai.com/v1
export OPENAI_COMPATIBLE_MODEL=gpt-4o-mini
export OPENAI_COMPATIBLE_API_KEY=$OPENAI_API_KEY
# Groq
export LLM_BACKEND=openai
export OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
export OPENAI_COMPATIBLE_MODEL=llama-3.3-70b-versatile
export OPENAI_COMPATIBLE_API_KEY=$GROQ_API_KEY
# Together AI
export LLM_BACKEND=openai
export OPENAI_COMPATIBLE_BASE_URL=https://api.together.xyz/v1
export OPENAI_COMPATIBLE_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
export OPENAI_COMPATIBLE_API_KEY=$TOGETHER_API_KEY
Cloud API providers require appropriate authentication. Set OPENAI_COMPATIBLE_API_KEY to your provider's API key. The fine-tuned per-agent specialist models are local only — they can't be used with cloud providers unless you upload them.
Verifying the Active Model
Check What Each Agent Resolves To
You can verify model resolution programmatically:
from aurorasoc.granite import get_default_granite_config

config = get_default_granite_config()

# Check resolution for each agent
agents = ["security_analyst", "threat_hunter", "malware_analyst",
          "incident_responder", "orchestrator"]
for agent in agents:
    model = config.resolve_model(agent)
    print(f"{agent:25s} → {model}")
Expected output with per-agent models enabled:
security_analyst → granite-soc-security-analyst:latest
threat_hunter → granite-soc-threat-hunter:latest
malware_analyst → granite-soc-malware-analyst:latest
incident_responder → granite-soc-incident-responder:latest
orchestrator → granite-soc-orchestrator:latest
Check Backend Models
# vLLM default: list served models
curl http://localhost:8000/v1/models
# Ollama fallback: list available local tags
ollama list
# Verify a specific model works
ollama run granite-soc:latest "Classify this alert: ET TROJAN Cobalt Strike Beacon"
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Agent uses base model despite GRANITE_USE_FINETUNED=true | Fine-tuned model not in Ollama | Run make train-serve-ollama to import |
| All agents use the same model despite GRANITE_USE_PER_AGENT_MODELS=true | Agent model not in AGENT_MODEL_MAP or not imported | Check vLLM /v1/models (default) or ollama list (fallback), then import missing models |
| ChatModel.from_name() fails | Ollama not running or wrong host | Verify OLLAMA_BASE_URL, run ollama serve |
| Model responds poorly | Using GGUF with excessive quantization | Re-export with q8_0 instead of q4_k_m |
| Override not taking effect | Env var not propagated | Restart the service, check docker compose config |
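For the last row, a quick way to spot variables that never reached the running process is to diff your .env against the live environment. A sketch (parse your real .env and pass os.environ in practice; the inline sample is illustrative):

```python
def unpropagated(dotenv_text, environ):
    """Return vars whose .env value differs from the live environment."""
    stale = []
    for line in dotenv_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if environ.get(key) != value:
            stale.append(key)
    return stale

dotenv = "GRANITE_USE_FINETUNED=true\nGRANITE_USE_PER_AGENT_MODELS=true\n"
live = {"GRANITE_USE_FINETUNED": "true"}  # second var never propagated
print(unpropagated(dotenv, live))  # ['GRANITE_USE_PER_AGENT_MODELS']
```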
Next Steps
- Serving Backends — Ollama vs vLLM in detail
- Local Deployment — complete local setup walkthrough