Local Deployment Guide

This page walks through deploying AuroraSOC with Granite LLMs on a single machine — the fastest way to go from fine-tuned model to running agents. It covers the automated setup script, manual steps, Docker Compose integration, and verification.

Prerequisites

Before you begin:

| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Ubuntu 22.04 / macOS 14+ | Ubuntu 24.04 |
| RAM | 16 GB | 32 GB |
| GPU VRAM | 4 GB (GGUF q4) | 8+ GB (GGUF q8) |
| Disk | 20 GB free | 50 GB free |
| Ollama | v0.4+ | Latest |
| Docker | 24.0+ | 27.0+ |
| Python | 3.10+ | 3.12 |

Automated Setup – setup_local.sh

The fastest path. This script installs everything and verifies the setup:

chmod +x scripts/setup_local.sh
./scripts/setup_local.sh

What the Script Does

  1. Checks system dependencies — verifies Python ≥ 3.10, Docker, Docker Compose, NVIDIA drivers (if GPU present)
  2. Installs Ollama — downloads and installs Ollama if not present
  3. Pulls the base model — ollama pull granite3.2:2b
  4. Creates Python virtualenv — installs AuroraSOC with training extras (pip install -e ".[training]")
  5. Copies .env.example to .env — sets sane defaults (Ollama backend, localhost URLs)
  6. Runs database migrations — alembic upgrade head
  7. Verifies Ollama inference — sends a test prompt and checks for a valid response
  8. Prints status summary — shows all service URLs and next steps

When to Use the Script

  • First time setup on a new machine
  • After cloning the repository on a fresh environment
  • When onboarding a new developer who needs everything working quickly

When NOT to Use the Script

  • You already have a working environment (just update .env manually)
  • You're deploying to production (use Docker Compose instead)
  • You need vLLM (the script sets up Ollama only)

Manual Setup (Step-by-Step)

If you prefer control over each step:

Step 1: Install Ollama

curl -fsSL https://ollama.ai/install.sh | sh
ollama serve & # Start in background

Step 2: Pull or Import a Model

Option A — Use the base model (no training):

ollama pull granite3.2:2b

Option B — Import your fine-tuned GGUF:

# Generate Modelfile + create Ollama model
python training/scripts/serve_model.py ollama \
  --gguf training/output/generic/unsloth.Q8_0.gguf \
  --name granite-soc:latest

Option C — Import per-agent models:

# Create all agent-specific models at once
python training/scripts/serve_model.py ollama-all \
  --output-dir training/output

This creates separate Ollama models for each trained agent:

  • granite-soc-security-analyst:latest
  • granite-soc-threat-hunter:latest
  • granite-soc-incident-responder:latest
  • (etc.)

Step 3: Configure Environment

cp .env.example .env

Edit .env:

# Backend selection
GRANITE_SERVING_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434

# Model selection (base or fine-tuned)
GRANITE_BASE_MODEL=granite3.2:2b # Base model name in Ollama
GRANITE_FINETUNED_MODEL=granite-soc:latest # Fine-tuned generic model

# Optional: use per-agent fine-tuned models
GRANITE_USE_FINETUNED=true
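
As a mental model, the backend reads these variables roughly like this (a hypothetical sketch — GraniteConfig and load_config here are illustrative names, not the actual aurorasoc API, which exposes get_default_granite_config()):

```python
import os
from dataclasses import dataclass


@dataclass
class GraniteConfig:
    serving_backend: str
    base_model: str
    finetuned_model: str
    use_finetuned: bool


def load_config(env=os.environ) -> GraniteConfig:
    # Each variable falls back to the same defaults .env.example ships with.
    return GraniteConfig(
        serving_backend=env.get("GRANITE_SERVING_BACKEND", "ollama"),
        base_model=env.get("GRANITE_BASE_MODEL", "granite3.2:2b"),
        finetuned_model=env.get("GRANITE_FINETUNED_MODEL", "granite-soc:latest"),
        # Env vars are strings, so the boolean flag needs explicit parsing.
        use_finetuned=env.get("GRANITE_USE_FINETUNED", "false").lower() == "true",
    )


cfg = load_config({"GRANITE_USE_FINETUNED": "true"})
print(cfg.use_finetuned)  # True
```

Note the string-to-boolean parsing: a common pitfall is setting GRANITE_USE_FINETUNED=True (capitalised) and being surprised when a naive comparison treats it as false.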

Step 4: Verify

# Confirm models are loaded
ollama list

# Test inference directly
ollama run granite-soc:latest "What is a C2 beacon?"

# Test via the Granite module
python -c "
from aurorasoc.granite import create_granite_chat_model, get_default_granite_config
config = get_default_granite_config()
print(f'Backend: {config.serving_backend}')
print(f'Model: {config.resolve_model(\"security_analyst\")}')
"

Step 5: Start AuroraSOC

# Start the API server
uvicorn aurorasoc.api.main:app --host 0.0.0.0 --port 8000 --reload

# Or use the Makefile
make run

Docker Compose Deployment

For a containerised deployment with all services:

The x-granite-env YAML Anchor

The docker-compose.yml uses a YAML anchor to avoid repeating Granite environment variables across services:

x-granite-env: &granite-env
  GRANITE_SERVING_BACKEND: ${GRANITE_SERVING_BACKEND:-ollama}
  GRANITE_BASE_MODEL: ${GRANITE_BASE_MODEL:-granite3.2:2b}
  GRANITE_FINETUNED_MODEL: ${GRANITE_FINETUNED_MODEL:-granite-soc:latest}
  GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
  OLLAMA_HOST: ${OLLAMA_HOST:-http://ollama:11434}
  VLLM_API_BASE: ${VLLM_API_BASE:-http://vllm:8000}

Why an anchor? Multiple services (API, workers, health-check) need the same Granite settings. The anchor ensures they stay in sync: change it once, and every service that references *granite-env picks up the change.

Each service merges these variables:

services:
  api:
    environment:
      <<: *granite-env
      # ... other service-specific vars

Starting Docker Compose

# Start all services (Ollama, API, workers, monitoring)
docker compose up -d

# Check service health
docker compose ps

# View Ollama logs
docker compose logs ollama

# View API logs
docker compose logs api

Docker Compose Service Architecture

Importing Fine-Tuned Models in Docker

After training, import your GGUF into the Docker Ollama instance:

# Copy GGUF into the Ollama container
docker compose cp training/output/generic/unsloth.Q8_0.gguf ollama:/tmp/

# Exec into the container and create the model
docker compose exec ollama ollama create granite-soc:latest \
  -f /tmp/Modelfile

# Or use the serve script, which handles this automatically
docker compose exec api python training/scripts/serve_model.py ollama \
  --gguf /models/generic/unsloth.Q8_0.gguf \
  --name granite-soc:latest

Enabling / Disabling Fine-Tuned Models

Use base model only (no fine-tuning)

# .env
GRANITE_USE_FINETUNED=false
GRANITE_BASE_MODEL=granite3.2:2b

All agents will use the same base model. Good for initial development.

Use a single fine-tuned generic model

# .env
GRANITE_USE_FINETUNED=true
GRANITE_FINETUNED_MODEL=granite-soc:latest

All agents share one fine-tuned model. Good after generic training.

Use per-agent fine-tuned models

# .env
GRANITE_USE_FINETUNED=true
GRANITE_FINETUNED_MODEL=granite-soc:latest # fallback for agents without a specialist model

The AGENT_MODEL_MAP in aurorasoc/granite/__init__.py maps each agent to its specialist:

AGENT_MODEL_MAP = {
    "security_analyst": "granite-soc-security-analyst:latest",
    "threat_hunter": "granite-soc-threat-hunter:latest",
    "incident_responder": "granite-soc-incident-responder:latest",
    # ...
}

The 4-tier resolution automatically selects the right model:

Override → Per-agent fine-tuned → Generic fine-tuned → Base

See Granite Module for the full resolution logic.
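
The resolution order can be sketched in Python (an illustrative model of the logic described above; the real implementation lives in aurorasoc/granite/__init__.py and may differ in detail):

```python
# Mirrors the AGENT_MODEL_MAP shown above (abbreviated).
AGENT_MODEL_MAP = {
    "security_analyst": "granite-soc-security-analyst:latest",
    "threat_hunter": "granite-soc-threat-hunter:latest",
}


def resolve_model(agent, *, override=None, use_finetuned=True,
                  generic="granite-soc:latest", base="granite3.2:2b"):
    # Tier 1: an explicit override always wins.
    if override:
        return override
    if use_finetuned:
        # Tier 2: agent-specific fine-tuned model, if one exists.
        if agent in AGENT_MODEL_MAP:
            return AGENT_MODEL_MAP[agent]
        # Tier 3: fall back to the generic fine-tuned model.
        return generic
    # Tier 4: fine-tuning disabled, use the plain base model.
    return base


print(resolve_model("security_analyst"))                  # granite-soc-security-analyst:latest
print(resolve_model("compliance"))                        # granite-soc:latest (no specialist)
print(resolve_model("compliance", use_finetuned=False))   # granite3.2:2b
```

This is why GRANITE_USE_FINETUNED=false sends every agent to the base model regardless of which specialist models are imported.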

Verification Checklist

Run through this checklist after deployment:

# 1. Ollama is running and responsive
curl -s http://localhost:11434/api/tags | jq '.models[].name'

# 2. Expected models are loaded
ollama list | grep granite

# 3. Inference works
curl -s http://localhost:11434/api/chat -d '{
  "model": "granite-soc:latest",
  "messages": [{"role": "user", "content": "What is lateral movement?"}],
  "stream": false
}' | jq '.message.content'

# 4. API server starts without errors
curl -s http://localhost:8000/health | jq

# 5. Granite module resolves models correctly
python -c "
from aurorasoc.granite import get_default_granite_config
cfg = get_default_granite_config()
for agent in ['security_analyst', 'threat_hunter', 'incident_responder']:
    print(f'{agent}: {cfg.resolve_model(agent)}')
"

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| connection refused :11434 | Ollama not running | ollama serve or systemctl start ollama |
| model not found | Model not imported | Run ollama list, then ollama create or ollama pull |
| out of memory | GGUF too large for GPU | Use a smaller quant (Q4_K_M) or set OLLAMA_GPU_LAYERS=0 for CPU |
| Models load slowly | Cold start | Set OLLAMA_KEEP_ALIVE=30m to keep the model warm |
| Wrong output format | Missing chat template | Re-import with serve_model.py, which generates a correct Modelfile |
| CUDA error | Driver mismatch | Check nvidia-smi and Ollama CUDA version compatibility |
| API returns base-model output | GRANITE_USE_FINETUNED=false | Set to true in .env and restart |
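
For reference, a Modelfile has this general shape (illustrative only — serve_model.py generates the real one, including the Granite chat TEMPLATE directive, which is omitted here because it is model-specific):

```
# Minimal Modelfile sketch (the generated file also includes TEMPLATE)
FROM /tmp/unsloth.Q8_0.gguf
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
```

A missing or wrong TEMPLATE is the usual cause of the "wrong output format" symptom above: the model receives raw text instead of its expected chat markup.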

Production Considerations

For production deployments beyond a single machine:

  • Use vLLM — switch backend for throughput. See Serving Backends.
  • Separate GPU node — run the LLM server on a dedicated GPU machine, point OLLAMA_HOST or VLLM_API_BASE to its IP.
  • Model versioning — tag models with dates (granite-soc:2025-01-15) to enable rollback.
  • Health monitoring — integrate check_ollama_models() / check_vllm_models() into your monitoring stack.
  • GPU metrics — export nvidia-smi metrics to Prometheus via dcgm-exporter.
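
The model-versioning point can be done with ollama cp, which duplicates a model under a new tag. A sketch (granite-soc:latest follows the naming used throughout this guide):

```shell
# Derive a dated tag, e.g. granite-soc:2025-01-15.
tag="granite-soc:$(date +%F)"
echo "Tagging as $tag"

# Duplicate the current model under the dated tag so :latest can be
# re-pointed later without losing this version.
if command -v ollama >/dev/null 2>&1; then
  ollama cp granite-soc:latest "$tag"
else
  echo "ollama not found; run this on the serving host"
fi
```

To roll back, ollama cp the dated tag back over :latest and restart the API service.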

Next Steps