Model Swap Guide

AuroraSOC is designed for plug-and-play model swapping — you can switch between base, fine-tuned, per-agent, or even non-Granite models by changing environment variables. No code changes required.

Quick Reference

# Use base Granite 4 (default — no fine-tuning)
make disable-finetuned

# Use a single fine-tuned model for all agents
make enable-finetuned

# Use per-agent fine-tuned specialists
export GRANITE_USE_FINETUNED=true
export GRANITE_USE_PER_AGENT_MODELS=true

# Force a specific model for all agents in Ollama mode (testing/debugging)
export LLM_BACKEND=ollama
export OLLAMA_MODEL=llama3.2:3b
export OLLAMA_ORCHESTRATOR_MODEL=llama3.2:3b

Scenario 1: Base Model (Default)

When: First setup, before any fine-tuning, or for development/testing.

# .env
LLM_BACKEND=ollama
OLLAMA_MODEL=granite4:8b
OLLAMA_ORCHESTRATOR_MODEL=granite4:dense
GRANITE_USE_FINETUNED=false
GRANITE_USE_PER_AGENT_MODELS=false

What happens: All 16 agents use granite4:8b from Ollama. The model has general language capability but no AuroraSOC-specific security training.

Scenario 2: Single Fine-Tuned Model

When: You've trained a generic SOC model (via make train) and want all agents to use it.

# .env
GRANITE_USE_FINETUNED=true
GRANITE_USE_PER_AGENT_MODELS=false
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest

Or use the Makefile shortcut:

make enable-finetuned

What happens: All agents share the same fine-tuned model. This model has been trained on SOC data across all domains.

Scenario 3: Per-Agent Specialists

When: You've trained individual specialist models (via python training/scripts/train_all_agents.py) and want each agent to use its domain-specific model.

# .env
GRANITE_USE_FINETUNED=true
GRANITE_USE_PER_AGENT_MODELS=true
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest

What happens: Each agent resolves to its specialist model. Agents without a trained specialist fall back to the generic fine-tuned model, then to the base model.
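For intuition, the fallback chain can be pictured as a small resolver. The sketch below is illustrative only (the real logic lives in aurorasoc.granite), and the function signature and default tags are assumptions:

# Illustrative sketch of the fallback chain: specialist -> generic fine-tuned -> base.
import os

def resolve_model(agent: str, specialists: dict[str, str]) -> str:
    if os.getenv("GRANITE_USE_PER_AGENT_MODELS") == "true" and agent in specialists:
        return specialists[agent]  # e.g. granite-soc-threat-hunter:latest
    if os.getenv("GRANITE_USE_FINETUNED") == "true":
        return os.getenv("GRANITE_FINETUNED_MODEL_TAG", "granite-soc:latest")
    return "granite4:8b"  # base model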

Scenario 4: Override for Testing

When: You want to test a completely different model (e.g., Llama, Mistral) across all agents.

# .env
LLM_BACKEND=ollama
OLLAMA_MODEL=llama3.2:3b
OLLAMA_ORCHESTRATOR_MODEL=llama3.2:3b

What happens: All agents use the explicitly configured backend model IDs. In Ollama mode, this bypasses per-agent specialist resolution and uses the specified tags directly.

Important: Backend model IDs (OLLAMA_MODEL / OLLAMA_ORCHESTRATOR_MODEL or VLLM_MODEL / VLLM_ORCHESTRATOR_MODEL) take precedence in their respective backend paths.
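The same rule as a hypothetical sketch (the helper and its signature are illustrative, not the actual implementation):

# Hypothetical: in Ollama mode, an explicit OLLAMA_MODEL wins over
# per-agent specialist resolution.
import os

def effective_model(agent: str, resolve_specialist) -> str:
    if os.getenv("LLM_BACKEND") == "ollama" and os.getenv("OLLAMA_MODEL"):
        return os.environ["OLLAMA_MODEL"]
    return resolve_specialist(agent)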

Scenario 5: vLLM for Production

When: Deploying to production where you need high throughput and multiple concurrent requests.

# .env
LLM_BACKEND=vllm
VLLM_BASE_URL=http://vllm-server:8000/v1
VLLM_MODEL=granite-soc-specialist
VLLM_ORCHESTRATOR_MODEL=granite-soc-specialist

What happens: ChatModel.from_name() uses the OpenAI-compatible API (pointing at vLLM). The model IDs come directly from VLLM_MODEL and VLLM_ORCHESTRATOR_MODEL.
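To confirm the server is reachable, here is a minimal smoke test using the standard openai Python client, assuming the base URL and model ID from the .env above (vLLM ignores the API key, but the client requires a non-empty value):

# Smoke test against the vLLM server's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://vllm-server:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="granite-soc-specialist",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=8,
)
print(resp.choices[0].message.content)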

Scenario 6: OpenAI-Compatible Providers

When: You want to use a cloud-hosted model (Together AI, Groq, Fireworks AI, OpenAI) or any local server that exposes an OpenAI-compatible /v1/chat/completions endpoint (llama.cpp, LM Studio).

# .env
LLM_BACKEND=openai
OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
OPENAI_COMPATIBLE_MODEL=llama-3.3-70b-versatile
OPENAI_COMPATIBLE_API_KEY=${GROQ_API_KEY}

What happens: ChatModel.from_name() uses the generic OpenAI adapter with your custom base URL and optional API key. No GPU or local model required.

You can optionally set a different model for the orchestrator:

OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL=llama-3.1-8b-instant
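A sketch of how that override might resolve (illustrative; the actual resolution happens inside AuroraSOC):

# Illustrative: the orchestrator falls back to the shared model when no
# dedicated orchestrator model is configured.
import os

model = os.environ["OPENAI_COMPATIBLE_MODEL"]
orchestrator_model = os.getenv("OPENAI_COMPATIBLE_ORCHESTRATOR_MODEL", model)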

Switching Between Scenarios

From Base → Fine-Tuned

# 1. Train the model
make train-data
make train

# 2. Import to Ollama
make train-serve-ollama

# 3. Enable fine-tuned
make enable-finetuned

# 4. Restart services
docker compose restart

From Fine-Tuned → Per-Agent

# 1. Train all specialists
python training/scripts/train_all_agents.py

# 2. Import all to Ollama
python training/scripts/serve_model.py ollama-all --output-dir training/output

# 3. Enable per-agent
echo "GRANITE_USE_PER_AGENT_MODELS=true" >> .env

# 4. Restart
docker compose restart

From Per-Agent → Back to Base

make disable-finetuned
docker compose restart

A/B Testing Models

You can run two instances of AuroraSOC with different model configurations:

# Instance A: Base model (port 8001)
GRANITE_USE_FINETUNED=false PORT=8001 docker compose up

# Instance B: Fine-tuned (port 8002)
GRANITE_USE_FINETUNED=true PORT=8002 docker compose up

Send the same alerts to both instances and compare:

  • Response quality
  • Latency
  • MITRE mapping accuracy
  • False positive rates
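A minimal comparison harness might look like the following; the /api/alerts path and payload shape are hypothetical, so substitute your actual ingestion endpoint:

# Hypothetical A/B harness: send the same alert to both instances and
# compare status and latency. Endpoint and payload are illustrative.
import time
import requests

ALERT = {"signature": "ET TROJAN Cobalt Strike Beacon", "severity": "high"}

for name, port in [("base", 8001), ("fine-tuned", 8002)]:
    start = time.perf_counter()
    resp = requests.post(f"http://localhost:{port}/api/alerts", json=ALERT, timeout=120)
    print(f"{name:11s} status={resp.status_code} latency={time.perf_counter() - start:.2f}s")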

Using Non-Granite Models

AuroraSOC's architecture supports any model accessible via vLLM (default), Ollama (fallback), or another OpenAI-compatible API:

Via Ollama

# Pull any Ollama model
ollama pull llama3.2:3b
ollama pull mistral:7b
ollama pull qwen2.5:7b

# Use it in AuroraSOC
export LLM_BACKEND=ollama
export OLLAMA_MODEL=llama3.2:3b
export OLLAMA_ORCHESTRATOR_MODEL=llama3.2:3b

Via OpenAI-Compatible API

# Point to any OpenAI-compatible endpoint
export LLM_BACKEND=openai
export OPENAI_COMPATIBLE_BASE_URL=https://api.example.com/v1
export OPENAI_COMPATIBLE_MODEL=my-custom-model
export OPENAI_COMPATIBLE_API_KEY=sk-...

Via Cloud Providers

# OpenAI
export LLM_BACKEND=openai
export OPENAI_COMPATIBLE_BASE_URL=https://api.openai.com/v1
export OPENAI_COMPATIBLE_MODEL=gpt-4o-mini
export OPENAI_COMPATIBLE_API_KEY=$OPENAI_API_KEY

# Groq
export LLM_BACKEND=openai
export OPENAI_COMPATIBLE_BASE_URL=https://api.groq.com/openai/v1
export OPENAI_COMPATIBLE_MODEL=llama-3.3-70b-versatile
export OPENAI_COMPATIBLE_API_KEY=$GROQ_API_KEY

# Together AI
export LLM_BACKEND=openai
export OPENAI_COMPATIBLE_BASE_URL=https://api.together.xyz/v1
export OPENAI_COMPATIBLE_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
export OPENAI_COMPATIBLE_API_KEY=$TOGETHER_API_KEY
Caution: Cloud API providers require appropriate authentication. Set OPENAI_COMPATIBLE_API_KEY to your provider's API key. The fine-tuned per-agent specialist models are local only; they can't be used with cloud providers unless you upload them.

Verifying the Active Model

Check What Each Agent Resolves To

You can verify model resolution programmatically:

from aurorasoc.granite import get_default_granite_config

config = get_default_granite_config()

# Check resolution for each agent
agents = ["security_analyst", "threat_hunter", "malware_analyst",
"incident_responder", "orchestrator"]

for agent in agents:
model = config.resolve_model(agent)
print(f"{agent:25s}{model}")

Expected output with per-agent models enabled:

security_analyst → granite-soc-security-analyst:latest
threat_hunter → granite-soc-threat-hunter:latest
malware_analyst → granite-soc-malware-analyst:latest
incident_responder → granite-soc-incident-responder:latest
orchestrator → granite-soc-orchestrator:latest

Check Backend Models

# vLLM default: list served models
curl http://localhost:8000/v1/models

# Ollama fallback: list available local tags
ollama list

# Verify a specific model works
ollama run granite-soc:latest "Classify this alert: ET TROJAN Cobalt Strike Beacon"
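You can also verify tags programmatically through Ollama's /api/tags endpoint (default port 11434); the expected set below is only an example, so adjust it to your deployment:

# Programmatic equivalent of `ollama list`: flag expected tags that are missing.
import requests

expected = {"granite-soc:latest", "granite-soc-orchestrator:latest"}  # example tags
resp = requests.get("http://localhost:11434/api/tags", timeout=10)
available = {m["name"] for m in resp.json()["models"]}
print("missing:", expected - available or "none")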

Troubleshooting

Problem | Cause | Solution
Agent uses base model despite USE_FINETUNED=true | Fine-tuned model not in Ollama | Run make train-serve-ollama to import it
All agents use the same model despite PER_AGENT=true | Agent model not in AGENT_MODEL_MAP or not imported | Check vLLM /v1/models (default) or ollama list (fallback), then import the missing models
ChatModel.from_name() fails | Ollama not running or wrong host | Verify OLLAMA_BASE_URL, run ollama serve
Model responds poorly | GGUF export with excessive quantization | Re-export with q8_0 instead of q4_k_m
Override not taking effect | Env var not propagated | Restart the service, check docker compose config

Next Steps