Deployment Modes

AuroraSOC supports three production-shaped deployment topologies. All three keep inference local and never call out to managed LLM providers.

| Mode | LLM location | Application services | When to use |
| --- | --- | --- | --- |
| Host stack | Host (Ollama on :11434) | Host processes via make stack-up | Laptops, single-node demos, fastest dev loop |
| Distributed | Host (Ollama on :11434) | Containers via Docker Compose | Multi-tenant lab nodes, GPU-rich hosts where Ollama owns the GPU |
| Remote LLM | Remote Ollama endpoint | Containers via Docker Compose | Detached inference appliance reachable over a private network |

The Network Command Center HITL surface, agent fleet, and case workflow are identical across all three modes — only the LLM transport changes.

1. Host stack (default)

Use make stack-up for laptops or single-node demos. See Single-Command Host Stack. This mode does not require Docker for the AuroraSOC application services themselves; only Postgres, Redis, and NATS run as containers from docker-compose.dev.yml.
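A minimal sketch of that dev loop, assuming make stack-up does not itself start the infrastructure containers (check your Makefile; the explicit compose command here is illustrative only):

# Infrastructure containers only: Postgres, Redis, NATS
docker compose -f docker-compose.dev.yml up -d

# AuroraSOC API, dashboard, and agents as host processes
make stack-up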

2. Distributed (host Ollama + containerised stack)

This mode runs every AuroraSOC service inside Docker (API, dashboard, all fourteen agents, MCP servers) but keeps the LLM on the host so the GPU is exclusively owned by Ollama and never re-bound by container restarts.

The override file docker-compose.host-ollama.yml rewrites every service's OLLAMA_BASE_URL to http://host.docker.internal:11434 and disables the in-stack vllm and ollama profiles.
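To see exactly what the override resolves to before launching, you can render the merged configuration (a sketch; the base compose file is assumed to be named docker-compose.yml):

# Render the merged compose config and confirm the rewritten LLM URL
docker compose -f docker-compose.yml -f docker-compose.host-ollama.yml config | grep OLLAMA_BASE_URL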

Bring it up

# 1. Make sure the host Ollama daemon is running and has the model
ollama serve &
ollama pull granite3.2:8b

# 2. Validate connectivity before launching the stack
make smoke-distributed-dry-run # prints the planned checks
make smoke-distributed # actually probes Ollama + warms the model

# 3. Launch the full compose stack against host Ollama
make compose-host-ollama
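Once the stack is up, a quick way to catch silent misconfiguration is to confirm a container can actually reach the host daemon. A sketch only: the base file name docker-compose.yml, the service name api, and curl being available inside the image are all assumptions here.

# From inside a running service container, probe the host Ollama endpoint
docker compose -f docker-compose.yml -f docker-compose.host-ollama.yml exec api \
  curl -fsS http://host.docker.internal:11434/api/tags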

Tear it down

make compose-host-ollama-down

What smoke-distributed checks

scripts/smoke_distributed_stack.sh runs five checks and exits non-zero on the first failure:

  1. A compose runtime (docker or podman) is on PATH.
  2. docker-compose.host-ollama.yml exists and pins host.docker.internal.
  3. The host Ollama endpoint (OLLAMA_BASE_URL, default http://localhost:11434) responds on /api/tags.
  4. The required model tag (OLLAMA_MODEL, default granite4:8b) is present.
  5. A num_predict=1 warmup call against /api/generate succeeds — this is also the cold-load timer for the first prompt.

If the API stack is already up, the script also probes /health on API_HOST_PORT (default 8000); if it isn't, that check is skipped.
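For manual debugging, checks 3-5 correspond roughly to the following probes against the public Ollama HTTP API (a sketch, not the script's exact commands; substitute your model tag for the granite4:8b default):

# Check 3: endpoint responds and lists installed models
curl -fsS "${OLLAMA_BASE_URL:-http://localhost:11434}/api/tags"

# Check 4: required model tag is present
curl -fsS "${OLLAMA_BASE_URL:-http://localhost:11434}/api/tags" | grep "${OLLAMA_MODEL:-granite4:8b}"

# Check 5: num_predict=1 warmup (also the cold-load timer for the first prompt)
curl -fsS "${OLLAMA_BASE_URL:-http://localhost:11434}/api/generate" \
  -d '{"model": "granite4:8b", "prompt": "ping", "stream": false, "options": {"num_predict": 1}}'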

Choosing the model

The mandate for MVP-1 is granite3.2:8b with GRANITE_SINGLE_MODEL_MODE=true. Override per invocation:

OLLAMA_MODEL=granite3.2:8b \
OLLAMA_ORCHESTRATOR_MODEL=granite3.2:8b \
make compose-host-ollama

The same env vars are honoured by make smoke-distributed.
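For example, exporting the overrides once keeps the smoke check and the launch consistent within the same shell:

# Pin the model for both validation and launch
export OLLAMA_MODEL=granite3.2:8b
export OLLAMA_ORCHESTRATOR_MODEL=granite3.2:8b
make smoke-distributed
make compose-host-ollama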

3. Remote LLM

When the LLM lives on a different host (an inference appliance, a colleague's GPU box, or a dedicated lab node), point both OLLAMA_BASE_URL and OLLAMA_DOCKER_BASE_URL at the remote endpoint:

# Validate first
bash scripts/smoke_distributed_stack.sh --remote-llm-url http://10.0.0.42:11434

# Then bring up the stack
OLLAMA_DOCKER_BASE_URL=http://10.0.0.42:11434 \
OLLAMA_BASE_URL=http://10.0.0.42:11434 \
make compose-host-ollama

The override still applies (extra_hosts: host.docker.internal:host-gateway), but containers use the env-supplied URL in preference to it.

Network requirement. Remote LLM mode assumes a private/trusted link between the AuroraSOC host and the inference node. Ollama does not authenticate clients, so expose it only on a private interface or behind a reverse proxy that does.
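On the inference node, one way to honour that constraint is to bind the daemon to the private interface rather than all interfaces (a sketch; 10.0.0.42 is the example address used above):

# Listen only on the private network interface
OLLAMA_HOST=10.0.0.42:11434 ollama serve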

Troubleshooting

| Symptom | Resolution |
| --- | --- |
| Ollama not reachable at http://localhost:11434 | Start the daemon: ollama serve. On Linux, confirm it isn't bound only to 127.0.0.1 when remote containers need to reach it. |
| model not found: granite4:8b | Pull it: ollama pull granite4:8b (or override OLLAMA_MODEL). |
| Containers can resolve the host but get connection refused | The host Ollama is bound to 127.0.0.1. Re-run with OLLAMA_HOST=0.0.0.0 ollama serve. |
| First prompt is slow | Expected; the warmup call performed by make smoke-distributed keeps the model resident. |
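If Ollama runs as a systemd service on Linux, the 0.0.0.0 bind can be made persistent with a unit override instead of re-running the daemon by hand (a sketch of the common approach; the unit name ollama is assumed):

# Add an environment override to the Ollama service unit
sudo systemctl edit ollama
#   In the editor, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama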