Deployment Modes
AuroraSOC supports three production-shaped deployment topologies. All three keep inference self-hosted and never call out to managed LLM providers.
| Mode | LLM location | Application services | When to use |
|---|---|---|---|
| Host stack | Host (Ollama on `:11434`) | Host processes via `make stack-up` | Laptops, single-node demos, fastest dev loop |
| Distributed | Host (Ollama on `:11434`) | Containers via Docker Compose | Multi-tenant lab nodes, GPU-rich hosts where Ollama owns the GPU |
| Remote LLM | Remote Ollama endpoint | Containers via Docker Compose | Detached inference appliance reachable over a private network |
The Network Command Center HITL surface, agent fleet, and case workflow are identical across all three modes — only the LLM transport changes.
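In practice the only knob that differs between the modes is where the LLM endpoint lives. The snippet below is purely illustrative and reuses the addresses quoted later on this page:

```bash
# Illustrative only: the single setting that changes across the three modes.
OLLAMA_BASE_URL=http://localhost:11434              # host stack (host processes talk to host Ollama)
OLLAMA_BASE_URL=http://host.docker.internal:11434   # distributed (containers talk to host Ollama)
OLLAMA_BASE_URL=http://10.0.0.42:11434              # remote LLM (example appliance address)
```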
1. Host stack (default)
Use `make stack-up` for laptops or single-node demos; see Single-Command Host Stack. This mode does not require Docker for the AuroraSOC application services themselves; only Postgres, Redis, and NATS run as containers from `docker-compose.dev.yml`.
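A minimal end-to-end sequence for this mode might look like the sketch below. The explicit compose call is shown only to make clear which pieces run as containers; `make stack-up` may already start them for you.

```bash
# Host stack mode, sketched: infra in containers, application on the host.
# Assumes docker-compose.dev.yml defines only Postgres, Redis and NATS, as
# described above; make stack-up may bring these up on its own.
docker compose -f docker-compose.dev.yml up -d   # backing services (containers)
make stack-up                                    # AuroraSOC services (host processes)
```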
2. Distributed (host Ollama + containerised stack)
This mode runs every AuroraSOC service inside Docker (API, dashboard, all fourteen agents, MCP servers) but keeps the LLM on the host so the GPU is exclusively owned by Ollama and never re-bound by container restarts.
The override file `docker-compose.host-ollama.yml` rewrites every service's
`OLLAMA_BASE_URL` to `http://host.docker.internal:11434` and disables the
in-stack `vllm` and `ollama` profiles.
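To confirm the override is being picked up, you can render the merged configuration with the standard Compose CLI. Using `docker-compose.yml` as the base file is an assumption here; substitute whatever `-f` files the Makefile actually passes.

```bash
# Render the merged config and check the rewritten endpoint.
# docker-compose.yml as the base file is an assumption; adjust to match
# the files used by make compose-host-ollama.
docker compose \
  -f docker-compose.yml \
  -f docker-compose.host-ollama.yml \
  config | grep -n 'OLLAMA_BASE_URL'
# expected value: http://host.docker.internal:11434
```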
Bring it up
```bash
# 1. Make sure the host Ollama daemon is running and has the model
ollama serve &
ollama pull granite3.2:8b

# 2. Validate connectivity before launching the stack
make smoke-distributed-dry-run   # prints the planned checks
make smoke-distributed           # actually probes Ollama + warms the model

# 3. Launch the full compose stack against host Ollama
make compose-host-ollama
```
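Once the containers are up, a quick probe of the API health endpoint confirms the stack is serving. This assumes the API is published on the host at `API_HOST_PORT` (default 8000, as noted in the smoke-check section below).

```bash
# Optional sanity check after compose-host-ollama finishes.
curl -fsS "http://localhost:${API_HOST_PORT:-8000}/health"
```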
Tear it down
```bash
make compose-host-ollama-down
```
What smoke-distributed checks
`scripts/smoke_distributed_stack.sh` runs five checks and exits non-zero on the
first failure:
- A compose runtime (`docker` or `podman`) is on `PATH`.
- `docker-compose.host-ollama.yml` exists and pins `host.docker.internal`.
- The host Ollama endpoint (`OLLAMA_BASE_URL`, default `http://localhost:11434`) responds on `/api/tags`.
- The required model tag (`OLLAMA_MODEL`, default `granite4:8b`) is present.
- A `num_predict=1` warmup call against `/api/generate` succeeds; this also serves as the cold-load timer for the first prompt.
If the API stack is already up, the script also probes `/health` on
`API_HOST_PORT` (default 8000); when it isn't, the check is skipped.
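When one of these checks fails, it can help to run the probes by hand. The commands below are an informal sketch of what the script does, not a copy of it; the endpoint, API paths, and model default mirror the values listed above.

```bash
# Manual versions of the Ollama-facing checks (illustrative, not the script itself).
OLLAMA_BASE_URL="${OLLAMA_BASE_URL:-http://localhost:11434}"
OLLAMA_MODEL="${OLLAMA_MODEL:-granite4:8b}"

# Endpoint reachable, and the required model tag present?
curl -fsS "$OLLAMA_BASE_URL/api/tags" | grep -F "$OLLAMA_MODEL"

# One-token warmup; the elapsed time is roughly the model's cold-load cost.
time curl -fsS "$OLLAMA_BASE_URL/api/generate" \
  -d "{\"model\": \"$OLLAMA_MODEL\", \"prompt\": \"ping\", \"stream\": false, \"options\": {\"num_predict\": 1}}"
```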
Choosing the model
The mandate for MVP-1 is `granite3.2:8b` with `GRANITE_SINGLE_MODEL_MODE=true`.
Override per invocation:
```bash
OLLAMA_MODEL=granite3.2:8b \
OLLAMA_ORCHESTRATOR_MODEL=granite3.2:8b \
make compose-host-ollama
```
The same env vars are honoured by `make smoke-distributed`.
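For a longer session it can be more convenient to export the overrides once rather than prefix every invocation; a small sketch:

```bash
# Keep the model override for the whole shell session instead of per command.
export OLLAMA_MODEL=granite3.2:8b
export OLLAMA_ORCHESTRATOR_MODEL=granite3.2:8b

make smoke-distributed       # validates against the exported model
make compose-host-ollama     # launches the stack with the same override
```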
3. Remote LLM
When the LLM lives on a different host (an inference appliance, a colleague's
GPU box, or a dedicated lab node), point both `OLLAMA_BASE_URL` and
`OLLAMA_DOCKER_BASE_URL` at the remote endpoint:
```bash
# Validate first
bash scripts/smoke_distributed_stack.sh --remote-llm-url http://10.0.0.42:11434

# Then bring up the stack
OLLAMA_DOCKER_BASE_URL=http://10.0.0.42:11434 \
OLLAMA_BASE_URL=http://10.0.0.42:11434 \
make compose-host-ollama
```
The override still applies (`extra_hosts: host.docker.internal:host-gateway`),
but containers use the env-supplied URL in preference to it.
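To rule out container-side DNS or routing problems, you can probe the remote endpoint from inside one of the application containers. The service name `api` and the presence of `curl` in the image are assumptions; substitute whatever your compose file actually defines.

```bash
# Hypothetical in-container probe: confirms containers can reach the remote Ollama.
# "api" as the service name and curl inside the image are assumptions.
docker compose exec api sh -c 'curl -fsS "$OLLAMA_BASE_URL/api/tags" | head -c 200; echo'
```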
**Network requirement.** Remote LLM mode assumes a private/trusted link between the AuroraSOC host and the inference node. Ollama does not authenticate clients, so expose it only on a private interface or behind a reverse proxy that does.
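On the inference node itself, one way to honour that constraint is to bind the daemon to the private interface rather than all interfaces; `10.0.0.42` is just the example address used above.

```bash
# On the inference node: listen on the private interface only.
# 10.0.0.42 is the example address from the commands above.
OLLAMA_HOST=10.0.0.42:11434 ollama serve
```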
Troubleshooting
| Symptom | Resolution |
|---|---|
| Ollama not reachable at `http://localhost:11434` | Start the daemon: `ollama serve`. On Linux, if containers or remote hosts need to reach it, confirm it isn't bound only to 127.0.0.1. |
| `model not found: granite4:8b` | Pull it: `ollama pull granite4:8b` (or override `OLLAMA_MODEL`). |
| Containers can resolve the host but get connection refused | The host Ollama is bound to 127.0.0.1. Re-run with `OLLAMA_HOST=0.0.0.0 ollama serve`. |
| First prompt is slow | Expected: the warmup call performed by `make smoke-distributed` keeps the model resident. |
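Several of the rows above come down to which interface the daemon is listening on; on Linux you can check directly (the `ss` utility is assumed to be available):

```bash
# Which address is Ollama bound to?
ss -ltn | grep 11434
# 127.0.0.1:11434 -> host-only; containers and remote hosts cannot reach it
# 0.0.0.0:11434   -> reachable from containers and the private network
```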