
Agent Fleet Runbook

Use this page after AI Agent Fleet Deployment when you need the operational details behind the AuroraSOC agent mesh.

MVP-1 Readiness Verdict

  • All 14 agents (1 orchestrator + 13 specialists) are real BeeAI RequirementAgent instances with live LLM round-trips. There are no placeholder or mock agents in the fleet.
  • The fleet has been smoke-tested end-to-end via make agents-smoke against host-run Ollama on a single shared model tag.
  • The orchestrator at port 9000 plus specialists on 9001–9010, 9012, 9015, and 9016 are the canonical local topology.
  • Either granite4:8b (the codebase's canonical default) or granite3.2:8b (the verified single-model override) is supported; either way, the fleet shares one ChatModel pool when GRANITE_SINGLE_MODEL_MODE=true.

When To Use This Page

  • You are debugging agent startup order, discovery, or MCP connectivity.
  • You need to decide between the host-run Ollama path and the containerized vLLM path.
  • You want the current runtime caveats in one place before changing Compose or environment settings.

MVP-1 Single-Model Lock

For MVP-1 the entire 14-agent mesh runs on a single Ollama model tag, granite3.2:8b, for both specialists and the orchestrator. The lock is enforced by three settings, all of which are set automatically by make agents-local and make stack-up:

| Setting | Value | Effect |
| --- | --- | --- |
| GRANITE_SINGLE_MODEL_MODE | true | Forces every agent to resolve to the same model tag |
| GRANITE_USE_SHARED_MODEL_POOL | true | Reuses one BeeAI ChatModel client across agents on the same tag |
| OLLAMA_MODEL / OLLAMA_ORCHESTRATOR_MODEL | same value | The shared tag (default granite3.2:8b for MVP-1) |
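As a quick illustration, the lock reduces to the environment values below; make agents-local and make stack-up export these for you, and the tag shown is the MVP-1 default rather than a requirement:

export GRANITE_SINGLE_MODEL_MODE=true          # every agent resolves to the same tag
export GRANITE_USE_SHARED_MODEL_POOL=true      # one BeeAI ChatModel client per shared tag
export OLLAMA_MODEL=granite3.2:8b              # shared tag for the specialists
export OLLAMA_ORCHESTRATOR_MODEL=granite3.2:8b # same tag for the orchestrator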

If you need to experiment with per-agent fine-tunes, flip GRANITE_SINGLE_MODEL_MODE=false and set GRANITE_USE_FINETUNED=true. Doing so exits the MVP-1 supported envelope — expect higher VRAM use and longer cold loads.

The fastest verification path is make llm-doctor followed by make agents-smoke; the latter sends a deterministic prompt to every live agent and prints a PASS/FAIL summary.
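In practice that verification pass is just:

make llm-doctor      # backend and model sanity check
make agents-smoke    # deterministic prompt to every live agent, PASS/FAIL summary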

Current Runnable Topology

AuroraSOC currently exposes 14 runnable agents:

  • 1 orchestrator
  • 13 specialist agents

The source of truth for the specialist list is agent-factory. The orchestrator handoff list is derived from that same live specialist catalog at runtime, so startup probes and A2A delegation stay aligned with the 13-specialist fleet.

Supported Runtime Shapes

| Shape | Backend | Startup surface | Best for | Current caveat |
| --- | --- | --- | --- | --- |
| Host-run local mesh | Ollama | docker-compose.dev.yml plus host-run MCP servers, API, dashboard, specialists, and orchestrator | Local debugging and iterative development | Requires MCP_CLIENT_HOST=localhost and A2A_CLIENT_HOST=localhost |
| Containerized fleet | vLLM | docker-compose.yml + docker-compose.gpu.yml + --profile agents | Full-fleet validation and GPU-backed inference | Requires NVIDIA GPU support and a Hugging Face token |

The container stack remains vLLM-first. The Ollama path is cleanest when you run the app and agent mesh on the host and keep Compose for shared dependencies.
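A hedged sketch of the two launch commands, assuming the stock Docker Compose CLI and the Compose files named in the table (your checkout may add flags or extra files):

# Host-run local mesh: Compose carries only the shared dependencies;
# MCP servers, API, dashboard, and agents run on the host
docker compose -f docker-compose.dev.yml up -d

# Containerized fleet: vLLM backend, GPU support, full agent profile
docker compose -f docker-compose.yml -f docker-compose.gpu.yml --profile agents up -d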

For a local Ollama mesh, make agents-local MODEL=<installed-ollama-tag> starts all 13 specialist A2A servers and the orchestrator on 127.0.0.1 with A2A_CLIENT_HOST=127.0.0.1, MCP_CLIENT_HOST=127.0.0.1, GRANITE_SINGLE_MODEL_MODE=true, and the same OLLAMA_MODEL/OLLAMA_ORCHESTRATOR_MODEL value. The launcher waits for specialist /health endpoints before starting the orchestrator, so handoff discovery sees ready local agents. Use scripts/run_local_agents.py --agents NetworkAnalyzer,ThreatHunter --model <tag> when you only need a focused subset during development.
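For example, with granite3.2:8b already pulled into the host Ollama:

# Full mesh: 13 specialists plus the orchestrator on 127.0.0.1
make agents-local MODEL=granite3.2:8b

# Focused subset while iterating on two specialists
scripts/run_local_agents.py --agents NetworkAnalyzer,ThreatHunter --model granite3.2:8b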

Startup Order

Use this order regardless of backend:

  1. Start the shared data plane: PostgreSQL, Redis, NATS, and Mosquitto.
  2. Start the chosen LLM backend: Ollama or vLLM.
  3. Start the MCP domain servers.
  4. Start the specialist agents.
  5. Start the orchestrator.
  6. Start the API, dashboard, and any task workers.

If you reverse steps 3 through 5, the agent processes usually fail fast with MCP binding or A2A discovery errors.
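One possible host-run walk-through of that order; the Compose service names and the Ollama step are assumptions, so substitute your project's actual targets where they differ:

# 1. Shared data plane (service names here are illustrative)
docker compose -f docker-compose.dev.yml up -d postgres redis nats mosquitto

# 2. LLM backend: host-run Ollama in this sketch
ollama serve &

# 3. MCP domain servers: start them explicitly (see MCP Runtime Notes below)

# 4-5. Specialists, then the orchestrator; make agents-local covers both and
#      waits on specialist /health before launching the orchestrator
make agents-local MODEL=granite3.2:8b

# 6. API, dashboard, and task workers last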

Environment Variables That Matter Most

| Variable | Use | Notes |
| --- | --- | --- |
| LLM_BACKEND | Selects ollama, vllm, or openai | All agents use the same backend at runtime |
| OLLAMA_BASE_URL | Host-run or container Ollama endpoint | Host-run default is http://localhost:11434 |
| VLLM_BASE_URL | Internal vLLM endpoint | Default is http://vllm:8000/v1 inside Compose |
| ENABLED_AGENTS | all or comma-separated subset | Parsed in aurorasoc/config/settings.py |
| A2A_CLIENT_HOST | Overrides A2A service discovery for host-run agents | Set to localhost for the local-first path |
| MCP_CLIENT_HOST | Overrides MCP service discovery for host-run agents | Set to localhost for the local-first path |
| SYSTEM_MODE | dummy, dry_run, or real | Keep first bring-up in dummy or dry_run |
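Pulled together, a minimal .env sketch for the host-run path, using the defaults listed above (adjust values to your environment):

LLM_BACKEND=ollama
OLLAMA_BASE_URL=http://localhost:11434
ENABLED_AGENTS=all
A2A_CLIENT_HOST=localhost
MCP_CLIENT_HOST=localhost
SYSTEM_MODE=dry_run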

Compose Profiles

These profiles are defined in Docker Configuration:

| Profile | Purpose |
| --- | --- |
| agents-core | Baseline orchestrator and triage workflow |
| agents-extended | Additional analysis and investigation specialists |
| agents-specialized | Deep investigation and reporting specialists |
| agents | Full orchestrator plus the complete specialist set |
| rust-core | Optional Rust fast path |

Use ENABLED_AGENTS for fine-grained subsets inside a running profile. Use Compose profiles when you want to control which containers start at all.
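For instance, combining the two controls might look like this; the profile flag comes straight from the table, while the agent names are only illustrative:

# Choose which containers start at all
docker compose --profile agents-core up -d

# Narrow which agents actually run inside that profile by setting, e.g. in .env:
# (agent names are illustrative; parsed by aurorasoc/config/settings.py)
ENABLED_AGENTS=NetworkAnalyzer,ThreatHunter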

MCP Runtime Notes

AuroraSOC now loads tools from the domain-isolated MCP servers through aurorasoc.tools.mcp_launcher.

Do not treat make mcp as the preferred startup path for new work. The monolithic registry server is retained for backward compatibility, but the current agent loader resolves domain URLs through settings.mcp.get_domain_url() and expects the domain-specific ports.

For host-run local development:

  • Set MCP_CLIENT_HOST=localhost
  • Start the domain servers explicitly
  • Keep one log file per domain so startup failures are easy to isolate
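A sketch of that pattern; the domain launch command itself is a placeholder here, and the point is the localhost override plus one log per domain:

export MCP_CLIENT_HOST=localhost
mkdir -p .logs/mcp
# launch each domain server with its own redirected log, for example:
#   <your siem domain launch command> > .logs/mcp/siem.log 2>&1 &
# then confirm the domain came up before starting agents:
tail -n 20 .logs/mcp/siem.log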

If PostgreSQL is unavailable during a host-run smoke test, agent startup still proceeds without persisted MCP health state. The affected agent logs the health persistence gap once, skips the rest of the MCP health reads and writes for that startup, attempts live MCP discovery, and starts with zero tools if the bound MCP domains are also unavailable.

For the containerized stack:

  • Let Compose service discovery resolve mcp-siem, mcp-soar, and the other domain names
  • Keep the shared MCP_SERVICE_TOKEN and mTLS settings aligned if you enable secure transport
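A quick Compose-side check against the domain service names mentioned above (standard Compose subcommands):

docker compose ps mcp-siem mcp-soar
docker compose logs --tail 20 mcp-siem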

Health Checks

Host-Run Ollama Path

Use these checks first:

curl -s http://localhost:8000/health | python -m json.tool
curl -s http://localhost:8000/api/v1/agents/a2a-health | python -m json.tool
curl -s http://localhost:11434/api/tags | python -m json.tool
ss -ltn | grep -E ':9000|:9001|:9010|:9016'
tail -n 20 .logs/mcp/siem.log
tail -n 20 .logs/agents/SecurityAnalyst.log

The a2a-health endpoint is the same live probe used by the Network Command Center. It reports the orchestrator, all enabled specialists, the resolved single model, and each agent's A2A /health result without depending on PostgreSQL.

The dashboard Agent Fleet page now consumes that same live A2A health signal during host-run sessions. When the runtime backend cannot be controlled from the API layer, the page still shows real agent status and one live replica row per reachable agent, but it intentionally leaves the deploy / scale / restart controls read-only rather than presenting controls that are not actually wired up.

Containerized vLLM Path

Use the container runtime you started the stack with:

podman compose ps
curl -s http://localhost:8000/health | python -m json.tool
curl -s http://localhost:8001/v1/models | python -m json.tool

Equivalent Docker commands work the same way.

Common Failure Modes

Agents cannot resolve MCP hosts

Cause: host-run agents are still looking for Compose service DNS names.

Fix: set MCP_CLIENT_HOST=localhost before starting the local-first path.

Orchestrator cannot resolve specialist hosts

Cause: host-run A2A discovery still points at Compose service names.

Fix: set A2A_CLIENT_HOST=localhost before the orchestrator starts.

The vLLM stack reports a port conflict

Cause: the host already has something bound to the API or vLLM ports.

Fix: adjust API_HOST_PORT or VLLM_HOST_PORT in .env, then recreate the containers.
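A hypothetical .env adjustment, with placeholder port values, before recreating the stack:

# pick any free host ports
API_HOST_PORT=8080
VLLM_HOST_PORT=8002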

The dashboard loads but investigations stall

Cause: the UI is up, but the LLM backend, MCP domain servers, or specialists are not healthy.

Fix: validate /health, the backend model endpoint, and the relevant agent logs before debugging the dashboard itself.