# Agent Fleet Live Smoke
`make agents-smoke` sends a deterministic prompt to every live A2A agent in the mesh (the orchestrator on port 9000 plus all 13 specialists) and prints a PASS/FAIL matrix. A green run proves the entire LLM chat path — settings → Granite config → shared ChatModel pool → BeeAI → LiteLLM → Ollama — works end-to-end against the locked single model.
## When To Use
- After `make agents-local` brings up the mesh, before driving the dashboard.
- After changing `OLLAMA_MODEL`, `aurorasoc/granite/registry.py`, or any agent system prompt.
- As the gate before commits that touch the agent runtime.
## Prerequisites
- Ollama is running on the host and the model tag is pulled (default: `granite3.2:8b`). Run `make llm-doctor` first.
- The full mesh is up: `make agents-local`. All 14 ports (`9000`, `9001`–`9010`, `9012`, `9015`, `9016`) must be listening.
- The `MCP_HEALTH_PROBE_ENABLED=false` flag is acceptable for this smoke — the harness only exercises the chat path, not MCP tools.
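The "all 14 ports listening" prerequisite can be pre-checked without the full harness. This is a minimal sketch, not part of the smoke script; the port list matches the one above, but the localhost host and 1 s timeout are assumptions:

```python
import socket

# Orchestrator (9000) plus the 13 specialist ports listed above.
MESH_PORTS = [9000, *range(9001, 9011), 9012, 9015, 9016]

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def mesh_gaps(host: str = "127.0.0.1") -> list[int]:
    """Return the mesh ports that are NOT listening (empty list == ready)."""
    return [p for p in MESH_PORTS if not port_open(host, p)]
```

A non-empty `mesh_gaps()` result before the smoke usually means `make agents-local` has not finished bringing the mesh up.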
## Running The Smoke
```sh
make agents-smoke
```
This is a thin wrapper around:
```sh
./.venv/bin/python scripts/smoke_agent_fleet.py
```
## Useful Flags
| Flag | Purpose |
|---|---|
| `--agents SecurityAnalyst,NetworkAnalyzer` | Probe a subset only. |
| `--no-orchestrator` | Skip port 9000. |
| `--prompt "..."` | Use a custom prompt instead of the deterministic ready check. |
| `--timeout 180` | Increase the per-agent timeout (default 90 s). |
| `--parallel` | Probe all agents concurrently (more contention on the shared model). |
| `--json` | Emit a machine-readable payload for CI gates. |
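The `--json` flag makes it easy to build a hard CI gate on top of the smoke. The sketch below is illustrative only — the payload shape (`{"results": [{"agent": ..., "status": ...}]}`) is an assumption for the example, not the script's documented schema:

```python
import sys

def gate(payload: dict) -> int:
    """CI exit code: 0 if every probed agent passed, 1 otherwise."""
    # Assumed payload shape: {"results": [{"agent": ..., "status": ...}, ...]}
    failures = [r["agent"] for r in payload.get("results", [])
                if r.get("status") != "PASS"]
    for name in failures:
        print(f"FAIL: {name}", file=sys.stderr)
    return 1 if failures else 0

# A CI job would capture the payload first, e.g.:
#   ./.venv/bin/python scripts/smoke_agent_fleet.py --json > smoke.json
demo = {"results": [{"agent": "Orchestrator", "status": "PASS"},
                    {"agent": "NetworkAnalyzer", "status": "FAIL"}]}
print(gate(demo))  # 1: one agent failed, so the gate rejects the build
```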
## Reading The Matrix
```text
AGENT            PORT  STATUS  TIME     RESPONSE / ERROR
--------------------------------------------------------------------------------
Orchestrator     9000  PASS    3851 ms  I am ready to assume the role o...
SecurityAnalyst  9001  PASS    2654 ms  I am ready to analyze security ...
...
NetworkAnalyzer  9016  PASS    2978 ms  I am ready to analyze network t...
--------------------------------------------------------------------------------
14/14 agents responded.
```
- PASS — the agent returned non-empty text within `--timeout` seconds.
- FAIL — connection error, timeout, or an empty response. The error string is printed inline.
- The script exits non-zero if any agent fails, so you can wire it into CI or a pre-commit gate.
## Common Failure Patterns
| Symptom | Likely cause | Fix |
|---|---|---|
| All agents FAIL with `ConnectError` | Mesh not running | `make agents-local`, then re-run. |
| One port FAILs, rest PASS | That specialist crashed during startup | Check `/tmp/agents-mesh.log` (or your launcher log) for that agent's traceback. |
| Multiple FAILs after a model swap | Wrong `OLLAMA_MODEL` or model not pulled | `make llm-doctor`, then `make ollama-pull-granite`. |
| Timeouts on the first probe only | Cold start of the shared model | Re-run; subsequent probes reuse the warm ChatModel. |
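For the cold-start row, re-running the whole smoke works, but a wrapper that retries a single failed probe once is a lighter alternative. A sketch under the assumption that a probe is any callable returning `"PASS"` or `"FAIL"` (not part of the shipped script):

```python
from typing import Callable

def probe_with_warmup(probe: Callable[[], str], retries: int = 1) -> str:
    """Retry a PASS/FAIL probe to absorb the shared model's cold start."""
    result = probe()
    for _ in range(retries):
        if result == "PASS":
            break
        result = probe()  # a second attempt reuses the now-warm ChatModel
    return result
```

Agents that fail on both attempts are genuine failures, not cold-start noise.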