
Deployment

What this page is

This page explains how to deploy AuroraSOC in development, single-node production, and multi-node production configurations. It covers Docker Compose (the primary method) and Helm (for Kubernetes clusters).

Why it exists this way

AuroraSOC runs on-premises by design. No cloud dependency, no SaaS fallback. The deployment surface must be documented precisely so an operator can bring the platform up without guessing at configuration or reading source code.

How to deploy

Step 1: Prerequisites

See System Requirements for hardware profiles and software prerequisites.

Step 2: Configure environment

Copy the example environment file and fill in real values:

cp .env.example .env

The following variables must be set before the stack starts. The settings validator rejects the placeholder defaults so you cannot accidentally ship them.

Variable                   How to generate           Purpose
JWT_SECRET_KEY             openssl rand -hex 32      HMAC-SHA256 token signing
API_SERVICE_KEY            openssl rand -hex 32      Internal service-to-service auth
PG_PASSWORD                openssl rand -base64 24   PostgreSQL password
REDIS_PASSWORD             openssl rand -base64 24   Redis password
OTEL_BASIC_AUTH_PASSWORD   openssl rand -base64 24   OpenTelemetry collector auth
GRAFANA_PASSWORD           Choose a strong password  Grafana admin password
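
The generation commands in the table can be scripted so that no value is typed by hand. The following is a minimal sketch, not part of AuroraSOC: set_env_var and generate_secrets are hypothetical helper names, and it assumes openssl is on the PATH and that the env file uses plain KEY=VALUE lines.

```shell
# Sketch only: set_env_var and generate_secrets are hypothetical helpers.

# Replace KEY=... in an env file, or append it if the key is absent.
set_env_var() {
    env_file=$1
    key=$2
    value=$3
    touch "$env_file"
    # Copy every line except an existing assignment for this key.
    # grep exits non-zero when it selects no lines, which is fine here.
    grep -v "^${key}=" "$env_file" > "${env_file}.tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "${env_file}.tmp"
    mv "${env_file}.tmp" "$env_file"
    # Keep the secrets readable by the owner only.
    chmod 600 "$env_file"
}

# Write the five machine-generated secrets from the table.
# GRAFANA_PASSWORD is left for the operator to choose.
generate_secrets() {
    target=$1
    set_env_var "$target" JWT_SECRET_KEY "$(openssl rand -hex 32)"
    set_env_var "$target" API_SERVICE_KEY "$(openssl rand -hex 32)"
    set_env_var "$target" PG_PASSWORD "$(openssl rand -base64 24)"
    set_env_var "$target" REDIS_PASSWORD "$(openssl rand -base64 24)"
    set_env_var "$target" OTEL_BASIC_AUTH_PASSWORD "$(openssl rand -base64 24)"
}
```

Calling generate_secrets .env after the cp step fills in the generated values. Running it a second time rotates every secret, including PG_PASSWORD, which would then no longer match an existing PostgreSQL data volume.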

For LLM inference, set LLM_BACKEND to one of:

  • ollama (the default value; CPU-friendly, no GPU required)
  • vllm (GPU required; the default in the production compose file)
  • openai (OpenAI-compatible HTTP API, local or remote)
  • deepseek, gemini, anthropic (hosted providers)

See the LLM Providers runbook for configuration details per backend.
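
A typo in LLM_BACKEND is easy to catch before the stack starts. The sketch below checks a value against the six backends listed above; validate_llm_backend is a hypothetical helper name, not an AuroraSOC command, and the platform's own settings validation may differ.

```shell
# Sketch only: validate_llm_backend is a hypothetical helper.
# Returns 0 when the argument is one of the documented backends.
validate_llm_backend() {
    case "$1" in
        ollama|vllm|openai|deepseek|gemini|anthropic)
            return 0
            ;;
        *)
            echo "unsupported LLM_BACKEND: '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, validate_llm_backend "$LLM_BACKEND" can guard a deployment script so that it stops before any container is started.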

Step 3a: Docker Compose

The compose files live under infra/compose/. The base stack file brings up PostgreSQL, Redis, NATS, the API, the MCP servers, the dashboard, Suricata, the observability stack, and Vault. The A2A agent fleet is opt-in via the agents profile.

From a clone with .env filled in (including LLM_DEEPSEEK_API_KEY), one command brings up the whole system: the API, all 11 agents, every MCP server, and the dashboard, with inference on DeepSeek V4 Flash. It waits for the API, applies migrations, seeds the default admin, and prints a summary.

just stack-up-fleet

This wraps the docker-compose.llm-api.yml and docker-compose.deepseek.yml overlays with the agents profile. After a code change, recreate the affected service with just stack-rebuild api; check health with just stack-status; stop with just stack-down (volumes preserved).

Development stack

The lightweight development compose file starts only the data-plane services (PostgreSQL, Redis, NATS, Ollama) without the full agent mesh:

docker compose -f infra/compose/docker-compose.dev.yml up -d

Then run the backend and dashboard locally as described in the developer setup guide.

Production stack (single node)

docker compose -f infra/compose/docker-compose.yml up -d

The compose files under infra/compose/ cover the following scenarios. Apart from the base and dev files, they are overlays for specific setups:

File                             Purpose
docker-compose.yml               Full production stack (vLLM default)
docker-compose.dev.yml           Lightweight dev stack (data-plane only)
docker-compose.gpu.yml           GPU passthrough for vLLM on consumer GPUs
docker-compose.host-ollama.yml   Use a host-installed Ollama instead of containerised
docker-compose.llm-api.yml       OpenAI-compatible inference; routes the fleet (used by just stack-up-fleet)
docker-compose.deepseek.yml      DeepSeek fleet: api SYSTEM_MODE + MCP health probe (just stack-up-fleet)
docker-compose.site-b.yml        Multi-site federation (second site)
docker-compose.spire.yml         SPIFFE/SPIRE identity overlay
docker-compose.training.yml      Granite fine-tuning pipeline

Overlay files are composed with the base:

docker compose \
  -f infra/compose/docker-compose.yml \
  -f infra/compose/docker-compose.gpu.yml \
  up -d
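
When several overlays are combined, the -f list gets long. The following sketch builds it from short overlay names; compose_args and COMPOSE_DIR are hypothetical names introduced here, and whether a given pair of overlays is meant to be combined is an assumption to verify against the files themselves.

```shell
# Sketch only: compose_args is a hypothetical helper.
COMPOSE_DIR=infra/compose

# Print the -f flags for the base file followed by each named overlay,
# in the order given (later files override earlier ones).
compose_args() {
    printf '%s %s/docker-compose.yml' -f "$COMPOSE_DIR"
    for overlay in "$@"; do
        printf ' %s %s/docker-compose.%s.yml' -f "$COMPOSE_DIR" "$overlay"
    done
    printf '\n'
}
```

A call such as docker compose $(compose_args gpu) up -d then expands to the command shown above. The unquoted substitution relies on the paths containing no spaces.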

Container networks

The stack uses four isolated networks. See System Requirements for the network diagram.

  • frontend: Dashboard and API (user-facing).
  • agent-mesh: Orchestrator, specialist agents, Ollama, OTel.
  • data-plane: PostgreSQL, Redis, NATS, Prometheus.
  • iot-plane: Mosquitto MQTT, Rust core (CPS devices).

Traffic flows through the API gateway; the dashboard cannot directly communicate with agents, and IoT devices cannot reach the database.

Step 3b: Kubernetes (Helm)

A Helm chart is provided at infra/helm/aurorasoc/. It requires Kubernetes 1.28 or later.

# Generate the PostgreSQL password once so that the API config and the
# PostgreSQL subchart receive the same value.
PG_PASSWORD="$(openssl rand -base64 24)"

helm install aurorasoc infra/helm/aurorasoc/ \
  --namespace aurorasoc \
  --create-namespace \
  --set config.JWT_SECRET_KEY="$(openssl rand -hex 32)" \
  --set config.API_SERVICE_KEY="$(openssl rand -hex 32)" \
  --set config.PG_PASSWORD="$PG_PASSWORD" \
  --set redis.auth.password="$(openssl rand -base64 24)" \
  --set postgresql.auth.password="$PG_PASSWORD"

The chart includes:

  • Deployments for the API, dashboard, orchestrator, and specialist agents.
  • MCP server deployments (Splunk, Elastic, CrowdStrike, and more).
  • Bitnami subcharts for PostgreSQL and Redis.
  • Embedded NATS with JetStream.
  • Ollama with configurable CPU or GPU mode.
  • Suricata with host networking for packet capture.
  • Ingress with TLS (cert-manager).
  • NetworkPolicy (default deny), PodDisruptionBudget, HPA.
  • ServiceMonitor for Prometheus scraping.

Edit values.yaml for your environment. The tier field (small, mid, large) scales replica counts and resource limits.

Step 4: Verify

After the stack is running:

  1. Open the operator console at http://<host>:3000.
  2. Sign in with the admin credentials (for local auth, they are logged to backend stdout on first run), or sign in through your OIDC/SAML provider.
  3. Confirm the Security Overview dashboard shows healthy platform status (MCP servers and agent fleet indicators).
  4. Check the API health endpoint: curl http://<host>:8000/health.
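
The API may need some time after the containers start before /health responds, so a single curl can fail on a healthy deployment. The sketch below retries the check; wait_for_health is a hypothetical helper name, and the default attempt count and delay are arbitrary assumptions.

```shell
# Sketch only: wait_for_health is a hypothetical helper.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        i=$((i + 1))
        # -f makes curl fail on HTTP error statuses; --max-time bounds each try.
        if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
            echo "healthy after $i attempt(s): $url"
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "gave up after $attempts attempt(s): $url" >&2
    return 1
}
```

For example, wait_for_health "http://<host>:8000/health" returns a non-zero status if the endpoint never answers successfully, which makes it usable as a gate in a deployment script.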

Step 5: Pull the LLM model

If using Ollama:

docker compose exec ollama ollama pull granite4:8b

If using vLLM, the model downloads automatically on first start from Hugging Face (set HF_TOKEN in .env for gated models).
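
For Ollama, the pull can be skipped when the model is already in the container. The sketch below assumes the compose service is named ollama, as in the command above, and that the first column of ollama list holds the model name; ensure_ollama_model is a hypothetical helper name.

```shell
# Sketch only: ensure_ollama_model is a hypothetical helper.
# Pull the model only if `ollama list` does not already show it.
ensure_ollama_model() {
    model=$1
    # -T disables TTY allocation so the output can be piped.
    # Skip the header row, take the name column, and match it exactly.
    if docker compose exec -T ollama ollama list \
        | awk 'NR > 1 { print $1 }' \
        | grep -qxF "$model"; then
        echo "already present: $model"
    else
        docker compose exec -T ollama ollama pull "$model"
    fi
}
```

Calling ensure_ollama_model granite4:8b is then safe to repeat on every deployment.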

What goes wrong and how do you fix it

  • Compose fails with "JWT_SECRET_KEY must be set". The .env file is missing or the placeholder values were not replaced. If .env does not exist yet, run cp .env.example .env; then set the required variables.
  • PostgreSQL healthcheck fails. A data volume from a previous run has a different password. Either set PG_PASSWORD in .env to match the existing volume, or delete the volume (docker volume rm aurorasoc_pg-data), which destroys all data stored in it.
  • vLLM container restarts with OOM. The GPU does not have enough VRAM for the model at the configured VLLM_MAX_MODEL_LEN. Reduce VLLM_MAX_MODEL_LEN in .env or switch to Ollama (LLM_BACKEND=ollama).
  • Agents show "no LLM backend available". The LLM service has not finished starting. Ollama model pulls can take several minutes on first run. Check docker compose logs ollama or docker compose logs vllm.
  • Helm install fails with "chart requires Kubernetes 1.28+". The cluster version is too old. Upgrade the cluster or use Docker Compose instead.
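
The first failure in the list can be caught before running Compose. The sketch below reports which of the required variables are absent or empty in an env file; check_required_env and REQUIRED_VARS are hypothetical names, and the check does not detect placeholder values, since those are known only to the settings validator.

```shell
# Sketch only: check_required_env is a hypothetical helper.
REQUIRED_VARS="JWT_SECRET_KEY API_SERVICE_KEY PG_PASSWORD REDIS_PASSWORD OTEL_BASIC_AUTH_PASSWORD GRAFANA_PASSWORD"

# Return 0 when every required variable has a non-empty value in the file.
check_required_env() {
    env_file=$1
    missing=0
    if [ ! -f "$env_file" ]; then
        echo "missing file: $env_file" >&2
        return 1
    fi
    for var in $REQUIRED_VARS; do
        # A variable counts as set when a line reads VAR=<at least one character>.
        if ! grep -q "^${var}=." "$env_file"; then
            echo "not set: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_required_env .env prints one line per unset variable and returns a non-zero status if any are found.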