Agent Fleet Runbook
Use this page after AI Agent Fleet Deployment when you need the operational details behind the AuroraSOC agent mesh.
MVP-1 Readiness Verdict
- All 14 agents (1 orchestrator + 13 specialists) are real BeeAI RequirementAgent instances with live LLM round-trips. There are no placeholder or mock agents in the fleet.
- The fleet has been smoke-tested end-to-end via make agents-smoke against host-run Ollama on a single shared model tag.
- The orchestrator at port 9000 plus specialists on 9001–9010, 9012, 9015, and 9016 are the canonical local topology.
- Either granite4:8b (the codebase's canonical default) or granite3.2:8b (the verified single-model override) is supported; both run on the same shared ChatModel pool when GRANITE_SINGLE_MODEL_MODE=true.
When To Use This Page
- You are debugging agent startup order, discovery, or MCP connectivity.
- You need to decide between the host-run Ollama path and the containerized vLLM path.
- You want the current runtime caveats in one place before changing Compose or environment settings.
MVP-1 Single-Model Lock
For MVP-1 the entire 14-agent mesh runs on a single Ollama model tag —
granite3.2:8b — for both specialists and the orchestrator. The lock is enforced
by three settings, all set automatically by make agents-local and make stack-up (a sample export block follows the table):
| Setting | Value | Effect |
|---|---|---|
| GRANITE_SINGLE_MODEL_MODE | true | Forces every agent to resolve to the same model tag |
| GRANITE_USE_SHARED_MODEL_POOL | true | Reuses one BeeAI ChatModel client across agents on the same tag |
| OLLAMA_MODEL / OLLAMA_ORCHESTRATOR_MODEL | same value | The shared tag (default granite3.2:8b for MVP-1) |
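For reference, a minimal host-run export block mirroring the lock above (values taken from the table; adjust the tag if your install uses granite4:8b):

```bash
# Single-model lock for the full mesh, matching the table above.
# These are the same values make agents-local and make stack-up set for you.
export GRANITE_SINGLE_MODEL_MODE=true
export GRANITE_USE_SHARED_MODEL_POOL=true
export OLLAMA_MODEL=granite3.2:8b
export OLLAMA_ORCHESTRATOR_MODEL=granite3.2:8b
```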
If you need to experiment with per-agent fine-tunes, flip
GRANITE_SINGLE_MODEL_MODE=false and set GRANITE_USE_FINETUNED=true. Doing so
exits the MVP-1 supported envelope — expect higher VRAM use and longer cold loads.
The fastest verification path is make llm-doctor followed by
make agents-smoke; the latter sends a deterministic prompt to every live
agent and prints a PASS/FAIL summary.
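In practice that looks like:

```bash
# Quick verification pass, as described above.
make llm-doctor      # confirms the LLM backend and resolved model tag are reachable
make agents-smoke    # sends a deterministic prompt to every live agent and prints PASS/FAIL
```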
Current Runnable Topology
AuroraSOC currently exposes 14 runnable agents:
- 1 orchestrator
- 13 specialist agents
The source of truth for the specialist list is agent-factory. The orchestrator handoff list is derived from that same live specialist catalog at runtime, so startup probes and A2A delegation stay aligned with the 13-specialist fleet.
Supported Runtime Shapes
| Shape | Backend | Startup surface | Best for | Current caveat |
|---|---|---|---|---|
| Host-run local mesh | Ollama | docker-compose.dev.yml plus host-run MCP servers, API, dashboard, specialists, and orchestrator | Local debugging and iterative development | Requires MCP_CLIENT_HOST=localhost and A2A_CLIENT_HOST=localhost |
| Containerized fleet | vLLM | docker-compose.yml + docker-compose.gpu.yml + --profile agents | Full-fleet validation and GPU-backed inference | Requires NVIDIA GPU support and a Hugging Face token |
The container stack remains vLLM-first. The Ollama path is cleanest when you run the app and agent mesh on the host and keep Compose for shared dependencies.
For a local Ollama mesh, make agents-local MODEL=<installed-ollama-tag> starts all 13 specialist A2A servers and the orchestrator on 127.0.0.1 with A2A_CLIENT_HOST=127.0.0.1, MCP_CLIENT_HOST=127.0.0.1, GRANITE_SINGLE_MODEL_MODE=true, and the same OLLAMA_MODEL/OLLAMA_ORCHESTRATOR_MODEL value. The launcher waits for specialist /health endpoints before starting the orchestrator, so handoff discovery sees ready local agents. Use scripts/run_local_agents.py --agents NetworkAnalyzer,ThreatHunter --model <tag> when you only need a focused subset during development.
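For example, assuming granite3.2:8b is the locally installed tag (invoking the subset script via python is an assumption; the flags come from the text above):

```bash
# Full local mesh: 13 specialists plus the orchestrator on 127.0.0.1.
make agents-local MODEL=granite3.2:8b

# Focused subset during development (the two agent names are the ones from the text above).
python scripts/run_local_agents.py --agents NetworkAnalyzer,ThreatHunter --model granite3.2:8b
```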
Startup Order
Use this order regardless of backend:
1. Start the shared data plane: PostgreSQL, Redis, NATS, and Mosquitto.
2. Start the chosen LLM backend: Ollama or vLLM.
3. Start the MCP domain servers.
4. Start the specialist agents.
5. Start the orchestrator.
6. Start the API, dashboard, and any task workers.
If you reverse steps 3 through 5, the agent processes usually fail fast with MCP binding or A2A discovery errors.
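A host-run bring-up sketch in that order (the Compose file and Make targets are the ones named elsewhere in this runbook; the MCP and API steps are placeholders for whatever entry points your checkout uses):

```bash
# 1. Shared data plane (PostgreSQL, Redis, NATS, Mosquitto) via Compose.
docker compose -f docker-compose.dev.yml up -d
# 2. LLM backend: host-run Ollama for the local path (or the vLLM container instead).
ollama serve &
# 3. Start the MCP domain servers (one log per domain; see MCP Runtime Notes).
# 4-5. Specialists first, then the orchestrator -- make agents-local handles the ordering.
make agents-local MODEL=granite3.2:8b
# 6. Start the API, dashboard, and any task workers on the host.
```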
Environment Variables That Matter Most
| Variable | Use | Notes |
|---|---|---|
| LLM_BACKEND | Selects ollama, vllm, or openai | All agents use the same backend at runtime |
| OLLAMA_BASE_URL | Host-run or container Ollama endpoint | Host-run default is http://localhost:11434 |
| VLLM_BASE_URL | Internal vLLM endpoint | Default is http://vllm:8000/v1 inside Compose |
| ENABLED_AGENTS | all or comma-separated subset | Parsed in aurorasoc/config/settings.py |
| A2A_CLIENT_HOST | Overrides A2A service discovery for host-run agents | Set to localhost for the local-first path |
| MCP_CLIENT_HOST | Overrides MCP service discovery for host-run agents | Set to localhost for the local-first path |
| SYSTEM_MODE | dummy, dry_run, or real | Keep first bring-up in dummy or dry_run |
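A typical host-run, local-first environment assembled from this table (values beyond the documented defaults are illustrative):

```bash
export LLM_BACKEND=ollama
export OLLAMA_BASE_URL=http://localhost:11434   # host-run default from the table
export ENABLED_AGENTS=all
export A2A_CLIENT_HOST=localhost                 # host-run agents, not Compose DNS names
export MCP_CLIENT_HOST=localhost
export SYSTEM_MODE=dry_run                       # keep first bring-up out of real mode
```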
Compose Profiles
These profiles are defined in Docker Configuration:
| Profile | Purpose |
|---|---|
| agents-core | Baseline orchestrator and triage workflow |
| agents-extended | Additional analysis and investigation specialists |
| agents-specialized | Deep investigation and reporting specialists |
| agents | Full orchestrator plus the complete specialist set |
| rust-core | Optional Rust fast path |
Use ENABLED_AGENTS for fine-grained subsets inside a running profile. Use Compose profiles when you want to control which containers start at all.
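For example (assuming the Docker CLI; equivalent podman compose commands work the same way):

```bash
# Full fleet, GPU-backed vLLM path from the Supported Runtime Shapes table.
docker compose -f docker-compose.yml -f docker-compose.gpu.yml --profile agents up -d

# Smaller container footprint via a narrower profile; pair with ENABLED_AGENTS in .env
# when you want to trim the agent list inside those containers further.
docker compose --profile agents-core up -d
```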
MCP Runtime Notes
AuroraSOC now loads tools from the domain-isolated MCP servers through aurorasoc.tools.mcp_launcher.
Do not treat make mcp as the preferred startup path for new work. The monolithic registry server is retained for backward compatibility, but the current agent loader resolves domain URLs through settings.mcp.get_domain_url() and expects the domain-specific ports.
For host-run local development:
- Set MCP_CLIENT_HOST=localhost
- Start the domain servers explicitly
- Keep one log file per domain so startup failures are easy to isolate
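A shell sketch of those three steps (the per-domain start commands are placeholders, since the concrete entry points depend on your checkout; the log paths match the ones used in the health checks below):

```bash
export MCP_CLIENT_HOST=localhost
mkdir -p .logs/mcp
# <start the siem domain server>  > .logs/mcp/siem.log 2>&1 &
# <start the soar domain server>  > .logs/mcp/soar.log 2>&1 &
```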
If PostgreSQL is unavailable during a host-run smoke test, agent startup still proceeds without persisted MCP health state. The affected agent logs the health persistence gap once, skips the rest of the MCP health reads and writes for that startup, attempts live MCP discovery, and starts with zero tools if the bound MCP domains are also unavailable.
For the containerized stack:
- Let Compose service discovery resolve mcp-siem, mcp-soar, and the other domain names
- Keep the shared MCP_SERVICE_TOKEN and mTLS settings aligned if you enable secure transport
Health Checks
Host-Run Ollama Path
Use these checks first:
```bash
curl -s http://localhost:8000/health | python -m json.tool
curl -s http://localhost:8000/api/v1/agents/a2a-health | python -m json.tool
curl -s http://localhost:11434/api/tags | python -m json.tool
ss -ltn | grep -E ':9000|:9001|:9010|:9016'
tail -n 20 .logs/mcp/siem.log
tail -n 20 .logs/agents/SecurityAnalyst.log
```
The a2a-health endpoint is the same live probe used by the Network Command Center. It reports the orchestrator, all enabled specialists, the resolved single model, and each agent's A2A /health result without depending on PostgreSQL.
The dashboard Agent Fleet page now consumes that same live A2A health signal during host-run sessions. When the runtime backend is not controllable from the API layer, the page still shows real agent status and one live replica row per reachable agent, but it intentionally leaves deploy / scale / restart controls read-only instead of pretending they are wired.
Containerized vLLM Path
Use the container runtime you started the stack with:
```bash
podman compose ps
curl -s http://localhost:8000/health | python -m json.tool
curl -s http://localhost:8001/v1/models | python -m json.tool
```
Equivalent Docker commands work the same way.
Common Failure Modes
Agents cannot resolve MCP hosts
Cause: host-run agents are still looking for Compose service DNS names.
Fix: set MCP_CLIENT_HOST=localhost before starting the local-first path.
Orchestrator cannot resolve specialist hosts
Cause: host-run A2A discovery still points at Compose service names.
Fix: set A2A_CLIENT_HOST=localhost before the orchestrator starts.
The vLLM stack reports a port conflict
Cause: the host already has something bound to the API or vLLM ports.
Fix: adjust API_HOST_PORT or VLLM_HOST_PORT in .env, then recreate the containers.
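For example (port values here are arbitrary free ports, and the sed commands assume both keys already exist in .env):

```bash
# Move the API and vLLM host ports, then recreate the containers.
sed -i 's/^API_HOST_PORT=.*/API_HOST_PORT=18000/'  .env
sed -i 's/^VLLM_HOST_PORT=.*/VLLM_HOST_PORT=18001/' .env
docker compose -f docker-compose.yml -f docker-compose.gpu.yml --profile agents up -d --force-recreate
```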
The dashboard loads but investigations stall
Cause: the UI is up, but the LLM backend, MCP domain servers, or specialists are not healthy.
Fix: validate /health, the backend model endpoint, and the relevant agent logs before debugging the dashboard itself.