Agent Fleet Runbook
Use this page after AI Agent Fleet Deployment when you need the operational details behind the AuroraSOC agent mesh.
MVP-1 Readiness Verdict
- All 14 agents (1 orchestrator + 13 specialists) are real BeeAI RequirementAgent instances with live LLM round-trips. There are no placeholder or mock agents in the fleet.
- The fleet has been smoke-tested end-to-end via make agents-smoke against host-run Ollama on a single shared model tag.
- The orchestrator at port 9000 plus specialists on 9001–9010, 9012, 9015, and 9016 are the canonical local topology.
- Either granite4:8b (the codebase's canonical default) or granite3.2:8b (the verified single-model override) is supported; both run on the same shared ChatModel pool when GRANITE_SINGLE_MODEL_MODE=true.
When To Use This Page
- You are debugging agent startup order, discovery, or MCP connectivity.
- You need to decide between the host-run Ollama path and the containerized vLLM path.
- You want the current runtime caveats in one place before changing Compose or environment settings.
MVP-1 Single-Model Lock
For MVP-1 the entire 14-agent mesh runs on a single Ollama model tag —
granite3.2:8b — for both specialists and the orchestrator. The lock is enforced
by three settings, all set automatically by make agents-local and make stack-up (a sample export block follows the table):
| Setting | Value | Effect |
|---|---|---|
| GRANITE_SINGLE_MODEL_MODE | true | Forces every agent to resolve to the same model tag |
| GRANITE_USE_SHARED_MODEL_POOL | true | Reuses one BeeAI ChatModel client across agents on the same tag |
| OLLAMA_MODEL / OLLAMA_ORCHESTRATOR_MODEL | same value | The shared tag (default granite3.2:8b for MVP-1) |
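For reference, a minimal host-run export block mirroring the lock above (values taken from the table; adjust the tag if your install uses granite4:8b):

```bash
# Single-model lock for the full mesh, matching the table above.
# These are the same values make agents-local and make stack-up set for you.
export GRANITE_SINGLE_MODEL_MODE=true
export GRANITE_USE_SHARED_MODEL_POOL=true
export OLLAMA_MODEL=granite3.2:8b
export OLLAMA_ORCHESTRATOR_MODEL=granite3.2:8b
```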
If you need to experiment with per-agent fine-tunes, flip
GRANITE_SINGLE_MODEL_MODE=false and set GRANITE_USE_FINETUNED=true. Doing so
exits the MVP-1 supported envelope — expect higher VRAM use and longer cold loads.
The fastest verification path is make llm-doctor followed by
make agents-smoke; the latter sends a deterministic prompt to every live
agent and prints a PASS/FAIL summary.
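In practice that looks like:

```bash
# Quick verification pass, as described above.
make llm-doctor      # confirms the LLM backend and resolved model tag are reachable
make agents-smoke    # sends a deterministic prompt to every live agent and prints PASS/FAIL
```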
Current Runnable Topology
AuroraSOC currently exposes 14 runnable agents:
- 1 orchestrator
- 13 specialist agents
The source of truth for the specialist list is agent-factory. The orchestrator handoff list is derived from that same live specialist catalog at runtime, so startup probes and A2A delegation stay aligned with the 13-specialist fleet.
Supported Runtime Shapes
| Shape | Backend | Startup surface | Best for | Current caveat |
|---|---|---|---|---|
| Host-run local mesh | Ollama | docker-compose.dev.yml plus host-run MCP servers, API, dashboard, specialists, and orchestrator | Local debugging and iterative development | Requires MCP_CLIENT_HOST=localhost and A2A_CLIENT_HOST=localhost |
| Containerized fleet | vLLM | docker-compose.yml + docker-compose.gpu.yml + --profile agents | Full-fleet validation and GPU-backed inference | Requires NVIDIA GPU support and a Hugging Face token |
The container stack remains vLLM-first. The Ollama path is cleanest when you run the app and agent mesh on the host and keep Compose for shared dependencies.
For a local Ollama mesh, make agents-local MODEL=<installed-ollama-tag> starts all 13 specialist A2A servers and the orchestrator on 127.0.0.1 with A2A_CLIENT_HOST=127.0.0.1, MCP_CLIENT_HOST=127.0.0.1, GRANITE_SINGLE_MODEL_MODE=true, and the same OLLAMA_MODEL/OLLAMA_ORCHESTRATOR_MODEL value. The launcher waits for specialist /health endpoints before starting the orchestrator, so handoff discovery sees ready local agents. Use scripts/run_local_agents.py --agents NetworkAnalyzer,ThreatHunter --model <tag> when you only need a focused subset during development.
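For example, assuming granite3.2:8b is the locally installed tag (invoking the subset script via python is an assumption; the flags come from the text above):

```bash
# Full local mesh: 13 specialists plus the orchestrator on 127.0.0.1.
make agents-local MODEL=granite3.2:8b

# Focused subset during development (the two agent names are the ones from the text above).
python scripts/run_local_agents.py --agents NetworkAnalyzer,ThreatHunter --model granite3.2:8b
```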
Startup Order
Use this order regardless of backend:
1. Start the shared data plane: PostgreSQL, Redis, NATS, and Mosquitto.
2. Start the chosen LLM backend: Ollama or vLLM.
3. Start the MCP domain servers.
4. Start the specialist agents.
5. Start the orchestrator.
6. Start the API, dashboard, and any task workers.
If you reverse steps 3 through 5, the agent processes usually fail fast with MCP binding or A2A discovery errors.
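A host-run bring-up sketch in that order (the Compose file and Make targets are the ones named elsewhere in this runbook; the MCP and API steps are placeholders for whatever entry points your checkout uses):

```bash
# 1. Shared data plane (PostgreSQL, Redis, NATS, Mosquitto) via Compose.
docker compose -f docker-compose.dev.yml up -d
# 2. LLM backend: host-run Ollama for the local path (or the vLLM container instead).
ollama serve &
# 3. Start the MCP domain servers (one log per domain; see MCP Runtime Notes).
# 4-5. Specialists first, then the orchestrator -- make agents-local handles the ordering.
make agents-local MODEL=granite3.2:8b
# 6. Start the API, dashboard, and any task workers on the host.
```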
Environment Variables That Matter Most
| Variable | Use | Notes |
|---|---|---|
| LLM_BACKEND | Selects ollama, vllm, or openai | All agents use the same backend at runtime |
| OLLAMA_BASE_URL | Host-run or container Ollama endpoint | Host-run default is http://localhost:11434 |
| VLLM_BASE_URL | Internal vLLM endpoint | Default is http://vllm:8000/v1 inside Compose |
| ENABLED_AGENTS | all or comma-separated subset | Parsed in aurorasoc/config/settings.py |
| A2A_CLIENT_HOST | Overrides A2A service discovery for host-run agents | Set to localhost for the local-first path |
| MCP_CLIENT_HOST | Overrides MCP service discovery for host-run agents | Set to localhost for the local-first path |
| SYSTEM_MODE | dummy, dry_run, or real | Keep first bring-up in dummy or dry_run |
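A typical host-run, local-first environment assembled from this table (values beyond the documented defaults are illustrative):

```bash
export LLM_BACKEND=ollama
export OLLAMA_BASE_URL=http://localhost:11434   # host-run default from the table
export ENABLED_AGENTS=all
export A2A_CLIENT_HOST=localhost                 # host-run agents, not Compose DNS names
export MCP_CLIENT_HOST=localhost
export SYSTEM_MODE=dry_run                       # keep first bring-up out of real mode
```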
Compose Profiles
These profiles are defined in Docker Configuration:
| Profile | Purpose |
|---|---|
| agents-core | Baseline orchestrator and triage workflow |
| agents-extended | Additional analysis and investigation specialists |
| agents-specialized | Deep investigation and reporting specialists |
| agents | Full orchestrator plus the complete specialist set |
| rust-core | Optional Rust fast path |
Use ENABLED_AGENTS for fine-grained subsets inside a running profile. Use Compose profiles when you want to control which containers start at all.
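For example (assuming the Docker CLI; equivalent podman compose commands work the same way):

```bash
# Full fleet, GPU-backed vLLM path from the Supported Runtime Shapes table.
docker compose -f docker-compose.yml -f docker-compose.gpu.yml --profile agents up -d

# Smaller container footprint via a narrower profile; pair with ENABLED_AGENTS in .env
# when you want to trim the agent list inside those containers further.
docker compose --profile agents-core up -d
```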
MCP Runtime Notes
AuroraSOC now loads tools from the domain-isolated MCP servers through aurorasoc.tools.mcp_launcher.
Do not treat make mcp as the preferred startup path for new work. The monolithic registry server is retained for backward compatibility, but the current agent loader resolves domain URLs through settings.mcp.get_domain_url() and expects the domain-specific ports.
For host-run local development:
- Set MCP_CLIENT_HOST=localhost
- Start the domain servers explicitly
- Keep one log file per domain so startup failures are easy to isolate
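A shell sketch of those three steps (the per-domain start commands are placeholders, since the concrete entry points depend on your checkout; the log paths match the ones used in the health checks below):

```bash
export MCP_CLIENT_HOST=localhost
mkdir -p .logs/mcp
# <start the siem domain server>  > .logs/mcp/siem.log 2>&1 &
# <start the soar domain server>  > .logs/mcp/soar.log 2>&1 &
```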
If PostgreSQL is unavailable during a host-run smoke test, agent startup still proceeds without persisted MCP health state. The affected agent logs the health persistence gap once, skips the rest of the MCP health reads and writes for that startup, attempts live MCP discovery, and starts with zero tools if the bound MCP domains are also unavailable.
For the containerized stack:
- Let Compose service discovery resolve mcp-siem, mcp-soar, and the other domain names
- Keep the shared MCP_SERVICE_TOKEN and mTLS settings aligned if you enable secure transport
Health Checks
Host-Run Ollama Path
Use these checks first:
```bash
curl -s http://localhost:8000/health | python -m json.tool
curl -s http://localhost:8000/api/v1/agents/a2a-health | python -m json.tool
curl -s http://localhost:11434/api/tags | python -m json.tool
ss -ltn | grep -E ':9000|:9001|:9010|:9016'
tail -n 20 .logs/mcp/siem.log
tail -n 20 .logs/agents/SecurityAnalyst.log
```
The a2a-health endpoint is the same live probe used by the Network Command Center. It reports the orchestrator, all enabled specialists, the resolved single model, and each agent's A2A /health result without depending on PostgreSQL.
The dashboard Agent Fleet page now consumes that same live A2A health signal during host-run sessions. When the runtime backend is not controllable from the API layer, the page still shows real agent status and one live replica row per reachable agent, but it intentionally leaves deploy / scale / restart controls read-only instead of pretending they are wired.
Containerized vLLM Path
Use the container runtime you started the stack with:
```bash
podman compose ps
curl -s http://localhost:8000/health | python -m json.tool
curl -s http://localhost:8001/v1/models | python -m json.tool
```
Equivalent Docker commands work the same way.
Common Failure Modes
Agents cannot resolve MCP hosts
Cause: host-run agents are still looking for Compose service DNS names.
Fix: set MCP_CLIENT_HOST=localhost before starting the local-first path.
Orchestrator cannot resolve specialist hosts
Cause: host-run A2A discovery still points at Compose service names.
Fix: set A2A_CLIENT_HOST=localhost before the orchestrator starts.
The vLLM stack reports a port conflict
Cause: the host already has something bound to the API or vLLM ports.
Fix: adjust API_HOST_PORT or VLLM_HOST_PORT in .env, then recreate the containers.
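For example (port values here are arbitrary free ports, and the sed commands assume both keys already exist in .env):

```bash
# Move the API and vLLM host ports, then recreate the containers.
sed -i 's/^API_HOST_PORT=.*/API_HOST_PORT=18000/'  .env
sed -i 's/^VLLM_HOST_PORT=.*/VLLM_HOST_PORT=18001/' .env
docker compose -f docker-compose.yml -f docker-compose.gpu.yml --profile agents up -d --force-recreate
```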
The dashboard loads but investigations stall
Cause: the UI is up, but the LLM backend, MCP domain servers, or specialists are not healthy.
Fix: validate /health, the backend model endpoint, and the relevant agent logs before debugging the dashboard itself.