Deployment Modes

AuroraSOC supports three production-shaped deployment topologies. All three keep inference local and never call out to managed LLM providers.

| Mode | LLM location | Application services | When to use |
| --- | --- | --- | --- |
| Host stack | Host (Ollama on :11434) | Host processes via make stack-up | Laptops, single-node demos, fastest dev loop |
| Distributed | Host (Ollama on :11434) | Containers via Docker Compose | Multi-tenant lab nodes, GPU-rich hosts where Ollama owns the GPU |
| Remote LLM | Remote Ollama endpoint | Containers via Docker Compose | Detached inference appliance reachable over a private network |

The Network Command Center HITL surface, agent fleet, and case workflow are identical across all three modes — only the LLM transport changes.

1. Host stack (default)

Use make stack-up for laptops or single-node demos. See Single-Command Host Stack. This mode does not require Docker for the AuroraSOC application services themselves; only Postgres, Redis, and NATS run as containers from docker-compose.dev.yml.
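A minimal sketch of that dev loop, assuming make stack-up does not itself start the infrastructure containers (check your Makefile; the explicit compose command here is illustrative only):

# Infrastructure containers only: Postgres, Redis, NATS
docker compose -f docker-compose.dev.yml up -d

# AuroraSOC API, dashboard, and agents as host processes
make stack-up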

2. Distributed (host Ollama + containerised stack)

This mode runs every AuroraSOC service inside Docker (API, dashboard, all fourteen agents, MCP servers) but keeps the LLM on the host so the GPU is exclusively owned by Ollama and never re-bound by container restarts.

The override file docker-compose.host-ollama.yml rewrites every service's OLLAMA_BASE_URL to http://host.docker.internal:11434 and disables the in-stack vllm and ollama profiles.
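To see exactly what the override resolves to before launching, you can render the merged configuration (a sketch; the base compose file is assumed to be named docker-compose.yml):

# Render the merged compose config and confirm the rewritten LLM URL
docker compose -f docker-compose.yml -f docker-compose.host-ollama.yml config | grep OLLAMA_BASE_URL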

Bring it up

# 1. Make sure the host Ollama daemon is running and has the model
ollama serve &
ollama pull granite3.2:8b

# 2. Validate connectivity before launching the stack
make smoke-distributed-dry-run # prints the planned checks
make smoke-distributed # actually probes Ollama + warms the model

# 3. Launch the full compose stack against host Ollama
make compose-host-ollama
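Once the stack is up, a quick way to catch silent misconfiguration is to confirm a container can actually reach the host daemon. A sketch only: the base file name docker-compose.yml, the service name api, and curl being available inside the image are all assumptions here.

# From inside a running service container, probe the host Ollama endpoint
docker compose -f docker-compose.yml -f docker-compose.host-ollama.yml exec api \
  curl -fsS http://host.docker.internal:11434/api/tags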

Tear it down

make compose-host-ollama-down

What smoke-distributed checks

scripts/smoke_distributed_stack.sh runs five checks and exits non-zero on the first failure:

  1. A compose runtime (docker or podman) is on PATH.
  2. docker-compose.host-ollama.yml exists and pins host.docker.internal.
  3. The host Ollama endpoint (OLLAMA_BASE_URL, default http://localhost:11434) responds on /api/tags.
  4. The required model tag (OLLAMA_MODEL, default granite4:8b) is present.
  5. A num_predict=1 warmup call against /api/generate succeeds — this is also the cold-load timer for the first prompt.

If the API stack is already up, the script also probes /health on API_HOST_PORT (default 8000); if it isn't, that check is skipped.
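For manual debugging, checks 3-5 correspond roughly to the following probes against the public Ollama HTTP API (a sketch, not the script's exact commands; substitute your model tag for the granite4:8b default):

# Check 3: endpoint responds and lists installed models
curl -fsS "${OLLAMA_BASE_URL:-http://localhost:11434}/api/tags"

# Check 4: required model tag is present
curl -fsS "${OLLAMA_BASE_URL:-http://localhost:11434}/api/tags" | grep "${OLLAMA_MODEL:-granite4:8b}"

# Check 5: num_predict=1 warmup (also the cold-load timer for the first prompt)
curl -fsS "${OLLAMA_BASE_URL:-http://localhost:11434}/api/generate" \
  -d '{"model": "granite4:8b", "prompt": "ping", "stream": false, "options": {"num_predict": 1}}'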

Choosing the model

The mandate for MVP-1 is granite3.2:8b with GRANITE_SINGLE_MODEL_MODE=true. Override per invocation:

OLLAMA_MODEL=granite3.2:8b \
OLLAMA_ORCHESTRATOR_MODEL=granite3.2:8b \
make compose-host-ollama

The same env vars are honoured by make smoke-distributed.
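For example, exporting the overrides once keeps the smoke check and the launch consistent within the same shell:

# Pin the model for both validation and launch
export OLLAMA_MODEL=granite3.2:8b
export OLLAMA_ORCHESTRATOR_MODEL=granite3.2:8b
make smoke-distributed
make compose-host-ollama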

3. Remote LLM

When the LLM lives on a different host (an inference appliance, a colleague's GPU box, or a dedicated lab node), point both OLLAMA_BASE_URL and OLLAMA_DOCKER_BASE_URL at the remote endpoint:

# Validate first
bash scripts/smoke_distributed_stack.sh --remote-llm-url http://10.0.0.42:11434

# Then bring up the stack
OLLAMA_DOCKER_BASE_URL=http://10.0.0.42:11434 \
OLLAMA_BASE_URL=http://10.0.0.42:11434 \
make compose-host-ollama

The override still applies (extra_hosts: host.docker.internal:host-gateway), but containers use the env-supplied URL in preference to it.

Network requirement. Remote LLM mode assumes a private/trusted link between the AuroraSOC host and the inference node. Ollama does not authenticate clients, so expose it only on a private interface or behind a reverse proxy that does.
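On the inference node, one way to honour that constraint is to bind the daemon to the private interface rather than all interfaces (a sketch; 10.0.0.42 is the example address used above):

# Listen only on the private network interface
OLLAMA_HOST=10.0.0.42:11434 ollama serve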

Troubleshooting

| Symptom | Resolution |
| --- | --- |
| Ollama not reachable at http://localhost:11434 | Start the daemon: ollama serve. On Linux, confirm it isn't bound only to 127.0.0.1 when remote containers need to reach it. |
| model not found: granite4:8b | Pull it: ollama pull granite4:8b (or override OLLAMA_MODEL). |
| Containers can resolve the host but get connection refused | The host Ollama is bound to 127.0.0.1. Re-run with OLLAMA_HOST=0.0.0.0 ollama serve. |
| First prompt is slow | Expected; the warmup call performed by make smoke-distributed keeps the model resident. |
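If Ollama runs as a systemd service on Linux, the 0.0.0.0 bind can be made persistent with a unit override instead of re-running the daemon by hand (a sketch of the common approach; the unit name ollama is assumed):

# Add an environment override to the Ollama service unit
sudo systemctl edit ollama
#   In the editor, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama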