Local Deployment Guide
This page walks through deploying AuroraSOC with Granite LLMs on a single machine — the fastest way to go from fine-tuned model to running agents. It covers the automated setup script, manual steps, Docker Compose integration, and verification.
Prerequisites
Before you begin:
| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Ubuntu 22.04 / macOS 14+ | Ubuntu 24.04 |
| RAM | 16 GB | 32 GB |
| GPU VRAM | 4 GB (GGUF q4) | 8+ GB (GGUF q8) |
| Disk | 20 GB free | 50 GB free |
| Ollama | v0.4+ | Latest |
| Docker | 24.0+ | 27.0+ |
| Python | 3.10+ | 3.12 |
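The setup script below performs these checks for you, but a quick standalone sanity check can be sketched like this (a rough illustration, not the project's own tooling; the version floor is taken from the Minimum column):

```python
import shutil
import sys

def check_prereqs():
    """Return {requirement: bool} for a subset of the table above."""
    results = {"python>=3.10": sys.version_info >= (3, 10)}
    # Tool presence only; version checks are left to setup_local.sh
    for tool in ("docker", "ollama"):
        results[tool] = shutil.which(tool) is not None
    return results

if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{'ok' if ok else 'MISSING'}: {name}")
```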
Automated Setup – setup_local.sh
The fastest path. This script installs everything and verifies the setup:
chmod +x scripts/setup_local.sh
./scripts/setup_local.sh
What the Script Does
- Checks system dependencies — verifies Python ≥ 3.10, Docker, Docker Compose, NVIDIA drivers (if GPU present)
- Installs Ollama — downloads and installs Ollama if not present
- Pulls the base model — ollama pull granite3.2:2b
- Creates Python virtualenv — installs AuroraSOC with training extras (pip install -e ".[training]")
- Copies .env.example → .env — sets sane defaults (Ollama backend, localhost URLs)
- Runs database migrations — alembic upgrade head
- Verifies Ollama inference — sends a test prompt and checks for a valid response
- Prints status summary — shows all service URLs and next steps
When to Use the Script
- First time setup on a new machine
- After cloning the repository on a fresh environment
- When onboarding a new developer who needs everything working quickly
When NOT to Use the Script
- You already have a working environment (just update .env manually)
- You're deploying to production (use Docker Compose instead)
- You need vLLM (the script sets up Ollama only)
Manual Setup (Step-by-Step)
If you prefer control over each step:
Step 1: Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve & # Start in background
Step 2: Pull or Import a Model
Option A — Use the base model (no training):
ollama pull granite3.2:2b
Option B — Import your fine-tuned GGUF:
# Generate Modelfile + create Ollama model
python training/scripts/serve_model.py ollama \
--gguf training/output/generic/unsloth.Q8_0.gguf \
--name granite-soc:latest
Option C — Import per-agent models:
# Create all agent-specific models at once
python training/scripts/serve_model.py ollama-all \
--output-dir training/output
This creates separate Ollama models for each trained agent:
- granite-soc-security-analyst:latest
- granite-soc-threat-hunter:latest
- granite-soc-incident-responder:latest
- (etc.)
Step 3: Configure Environment
cp .env.example .env
Edit .env:
# Backend selection
GRANITE_SERVING_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
# Model selection (base or fine-tuned)
GRANITE_BASE_MODEL=granite3.2:2b # Base model name in Ollama
GRANITE_FINETUNED_MODEL=granite-soc:latest # Fine-tuned generic model
# Optional: use per-agent fine-tuned models
GRANITE_USE_FINETUNED=true
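The three settings above interact roughly as follows. This is a simplified sketch for orientation only; the real resolution lives in the Granite module and also considers per-agent specialist models, covered later on this page:

```python
def select_model(env):
    """Simplified view of how the .env settings combine (ignores per-agent maps)."""
    use_finetuned = env.get("GRANITE_USE_FINETUNED", "false").lower() == "true"
    if use_finetuned:
        return env.get("GRANITE_FINETUNED_MODEL", "granite-soc:latest")
    return env.get("GRANITE_BASE_MODEL", "granite3.2:2b")

# With GRANITE_USE_FINETUNED=true and no explicit model, the fine-tuned
# default wins; with it unset, the base model is used.
print(select_model({"GRANITE_USE_FINETUNED": "true"}))  # granite-soc:latest
print(select_model({}))                                 # granite3.2:2b
```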
Step 4: Verify
# Confirm models are loaded
ollama list
# Test inference directly
ollama run granite-soc:latest "What is a C2 beacon?"
# Test via the Granite module
python -c "
from aurorasoc.granite import create_granite_chat_model, get_default_granite_config
config = get_default_granite_config()
print(f'Backend: {config.serving_backend}')
print(f'Model: {config.resolve_model(\"security_analyst\")}')
"
Step 5: Start AuroraSOC
# Start the API server
uvicorn aurorasoc.api.main:app --host 0.0.0.0 --port 8000 --reload
# Or use the Makefile
make run
Docker Compose Deployment
For a containerised deployment with all services:
The x-granite-env YAML Anchor
The docker-compose.yml uses a YAML anchor to avoid repeating Granite environment variables across services:
x-granite-env: &granite-env
GRANITE_SERVING_BACKEND: ${GRANITE_SERVING_BACKEND:-ollama}
GRANITE_BASE_MODEL: ${GRANITE_BASE_MODEL:-granite3.2:2b}
GRANITE_FINETUNED_MODEL: ${GRANITE_FINETUNED_MODEL:-granite-soc:latest}
GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
OLLAMA_HOST: ${OLLAMA_HOST:-http://ollama:11434}
VLLM_API_BASE: ${VLLM_API_BASE:-http://vllm:8000}
Why an anchor? Multiple services (API, workers, health-check) need the same Granite settings. The anchor ensures they stay in sync: change it once, and every service that references *granite-env picks up the change.
Each service merges these variables:
services:
api:
environment:
<<: *granite-env
# ... other service-specific vars
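Merge-key semantics also allow per-service overrides: any key listed after <<: *granite-env wins over the merged anchor value. For example, a dedicated worker (a hypothetical service, shown for illustration) could pin a different model while inheriting everything else:

```yaml
services:
  threat-hunter-worker:
    environment:
      <<: *granite-env
      # Explicit keys override values merged in from the anchor
      GRANITE_FINETUNED_MODEL: granite-soc-threat-hunter:latest
```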
Starting Docker Compose
# Start all services (Ollama, API, workers, monitoring)
docker compose up -d
# Check service health
docker compose ps
# View Ollama logs
docker compose logs ollama
# View API logs
docker compose logs api
Docker Compose Service Architecture
Importing Fine-Tuned Models in Docker
After training, import your GGUF into the Docker Ollama instance:
# Copy GGUF into the Ollama container
docker compose cp training/output/generic/unsloth.Q8_0.gguf ollama:/tmp/
# Exec into the container and create the model
docker compose exec ollama ollama create granite-soc:latest \
-f /tmp/Modelfile
# Or use the serve script which handles this automatically
docker compose exec api python training/scripts/serve_model.py ollama \
--gguf /models/generic/unsloth.Q8_0.gguf \
--name granite-soc:latest
Enabling / Disabling Fine-Tuned Models
Use base model only (no fine-tuning)
# .env
GRANITE_USE_FINETUNED=false
GRANITE_BASE_MODEL=granite3.2:2b
All agents will use the same base model. Good for initial development.
Use a single fine-tuned generic model
# .env
GRANITE_USE_FINETUNED=true
GRANITE_FINETUNED_MODEL=granite-soc:latest
All agents share one fine-tuned model. Good after generic training.
Use per-agent fine-tuned models
# .env
GRANITE_USE_FINETUNED=true
GRANITE_FINETUNED_MODEL=granite-soc:latest # fallback for agents without a specialist model
The AGENT_MODEL_MAP in aurorasoc/granite/__init__.py maps each agent to its specialist:
AGENT_MODEL_MAP = {
"security_analyst": "granite-soc-security-analyst:latest",
"threat_hunter": "granite-soc-threat-hunter:latest",
"incident_responder": "granite-soc-incident-responder:latest",
# ...
}
The 4-tier resolution automatically selects the right model:
Override → Per-agent fine-tuned → Generic fine-tuned → Base
See Granite Module for the full resolution logic.
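The resolution order can be sketched as follows. This mirrors the 4-tier chain described above but is not the module's actual code; see the Granite Module page for the real implementation:

```python
# Illustrative subset of AGENT_MODEL_MAP from aurorasoc/granite/__init__.py
AGENT_MODEL_MAP = {
    "security_analyst": "granite-soc-security-analyst:latest",
    "threat_hunter": "granite-soc-threat-hunter:latest",
}

def resolve_model(agent, override=None, use_finetuned=True,
                  generic="granite-soc:latest", base="granite3.2:2b"):
    """Sketch of the 4-tier order: override -> per-agent -> generic -> base."""
    if override:                      # Tier 1: explicit override always wins
        return override
    if use_finetuned:
        # Tier 2: per-agent specialist, Tier 3: generic fine-tuned fallback
        return AGENT_MODEL_MAP.get(agent, generic)
    return base                       # Tier 4: base model
```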
Verification Checklist
Run through this checklist after deployment:
# 1. Ollama is running and responsive
curl -s http://localhost:11434/api/tags | jq '.models[].name'
# 2. Expected models are loaded
ollama list | grep granite
# 3. Inference works
curl -s http://localhost:11434/api/chat -d '{
"model": "granite-soc:latest",
"messages": [{"role": "user", "content": "What is lateral movement?"}],
"stream": false
}' | jq '.message.content'
# 4. API server starts without errors
curl -s http://localhost:8000/health | jq
# 5. Granite module resolves models correctly
python -c "
from aurorasoc.granite import get_default_granite_config
cfg = get_default_granite_config()
for agent in ['security_analyst', 'threat_hunter', 'incident_responder']:
print(f'{agent}: {cfg.resolve_model(agent)}')
"
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| connection refused :11434 | Ollama not running | ollama serve or systemctl start ollama |
| model not found | Model not imported | ollama list, then ollama create or ollama pull |
| out of memory | GGUF too large for GPU | Use a smaller quant (Q4_K_M) or set OLLAMA_GPU_LAYERS=0 for CPU |
| Models load slowly | Cold start | Set OLLAMA_KEEP_ALIVE=30m to keep the model warm |
| Wrong output format | Missing chat template | Re-import with serve_model.py, which generates a correct Modelfile |
| CUDA error | Driver mismatch | Check nvidia-smi and Ollama CUDA version compatibility |
| API returns base-model output | GRANITE_USE_FINETUNED=false | Set it to true in .env and restart |
Production Considerations
For production deployments beyond a single machine:
- Use vLLM — switch backend for throughput. See Serving Backends.
- Separate GPU node — run the LLM server on a dedicated GPU machine and point OLLAMA_HOST or VLLM_API_BASE to its IP.
- Model versioning — tag models with dates (granite-soc:2025-01-15) to enable rollback.
- Health monitoring — integrate check_ollama_models() / check_vllm_models() into your monitoring stack.
- GPU metrics — export nvidia-smi metrics to Prometheus via dcgm-exporter.
Next Steps
- Serving Backends — Ollama vs vLLM deep dive
- Model Swap & Override — switch models without redeploying
- Training: Overview — go back and train a model