# Agent Fleet Live Smoke
`make agents-smoke` sends a deterministic prompt to every live A2A agent in the mesh (the orchestrator on port 9000 plus all 13 specialists) and prints a PASS/FAIL matrix. A green run proves the entire LLM chat path — settings → Granite config → shared ChatModel pool → BeeAI → LiteLLM → Ollama — works end-to-end against the locked single model.
## When To Use
- After `make agents-local` brings up the mesh, before driving the dashboard.
- After changing `OLLAMA_MODEL`, `aurorasoc/granite/registry.py`, or any agent system prompt.
- As the gate before commits that touch the agent runtime.
## Prerequisites
- Ollama is running on the host and the model tag is pulled (default: `granite3.2:8b`). Run `make llm-doctor` first.
- The full mesh is up: `make agents-local`. All 14 ports (`9000`, `9001`–`9010`, `9012`, `9015`, `9016`) must be listening.
- The `MCP_HEALTH_PROBE_ENABLED=false` flag is acceptable for this smoke — the harness only exercises the chat path, not MCP tools.
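The "all 14 ports listening" prerequisite can be pre-checked without the full harness. This is a minimal sketch, not part of the smoke script; the port list matches the one above, but the localhost host and 1 s timeout are assumptions:

```python
import socket

# Orchestrator (9000) plus the 13 specialist ports listed above.
MESH_PORTS = [9000, *range(9001, 9011), 9012, 9015, 9016]

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def mesh_gaps(host: str = "127.0.0.1") -> list[int]:
    """Return the mesh ports that are NOT listening (empty list == ready)."""
    return [p for p in MESH_PORTS if not port_open(host, p)]
```

A non-empty `mesh_gaps()` result before the smoke usually means `make agents-local` has not finished bringing the mesh up.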
## Running The Smoke
```sh
make agents-smoke
```
This is a thin wrapper around:
```sh
./.venv/bin/python scripts/smoke_agent_fleet.py
```
## Useful Flags
| Flag | Purpose |
|---|---|
| `--agents SecurityAnalyst,NetworkAnalyzer` | Probe a subset only. |
| `--no-orchestrator` | Skip port 9000. |
| `--prompt "..."` | Use a custom prompt instead of the deterministic ready check. |
| `--timeout 180` | Increase the per-agent timeout (default 90 s). |
| `--parallel` | Probe all agents concurrently (more contention on the shared model). |
| `--json` | Emit a machine-readable payload for CI gates. |
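The `--json` flag makes it easy to build a hard CI gate on top of the smoke. The sketch below is illustrative only — the payload shape (`{"results": [{"agent": ..., "status": ...}]}`) is an assumption for the example, not the script's documented schema:

```python
import sys

def gate(payload: dict) -> int:
    """CI exit code: 0 if every probed agent passed, 1 otherwise."""
    # Assumed payload shape: {"results": [{"agent": ..., "status": ...}, ...]}
    failures = [r["agent"] for r in payload.get("results", [])
                if r.get("status") != "PASS"]
    for name in failures:
        print(f"FAIL: {name}", file=sys.stderr)
    return 1 if failures else 0

# A CI job would capture the payload first, e.g.:
#   ./.venv/bin/python scripts/smoke_agent_fleet.py --json > smoke.json
demo = {"results": [{"agent": "Orchestrator", "status": "PASS"},
                    {"agent": "NetworkAnalyzer", "status": "FAIL"}]}
print(gate(demo))  # 1: one agent failed, so the gate rejects the build
```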
## Reading The Matrix
```text
AGENT            PORT  STATUS  TIME     RESPONSE / ERROR
--------------------------------------------------------------------------------
Orchestrator     9000  PASS    3851 ms  I am ready to assume the role o...
SecurityAnalyst  9001  PASS    2654 ms  I am ready to analyze security ...
...
NetworkAnalyzer  9016  PASS    2978 ms  I am ready to analyze network t...
--------------------------------------------------------------------------------
14/14 agents responded.
```
- PASS — the agent returned non-empty text within `--timeout` seconds.
- FAIL — connection error, timeout, or an empty response. The error string is printed inline.
- The script exits non-zero if any agent fails, so you can wire it into CI or a pre-commit gate.
## Common Failure Patterns
| Symptom | Likely cause | Fix |
|---|---|---|
| All agents FAIL with `ConnectError` | Mesh not running | `make agents-local`, then re-run. |
| One port FAILs, rest PASS | That specialist crashed during startup | Check `/tmp/agents-mesh.log` (or your launcher log) for that agent's traceback. |
| Multiple FAILs after a model swap | Wrong `OLLAMA_MODEL` or model not pulled | `make llm-doctor`, then `make ollama-pull-granite`. |
| Timeouts on the first probe only | Cold start of the shared model | Re-run; subsequent probes reuse the warm ChatModel. |
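For the cold-start row, re-running the whole smoke works, but a wrapper that retries a single failed probe once is a lighter alternative. A sketch under the assumption that a probe is any callable returning `"PASS"` or `"FAIL"` (not part of the shipped script):

```python
from typing import Callable

def probe_with_warmup(probe: Callable[[], str], retries: int = 1) -> str:
    """Retry a PASS/FAIL probe to absorb the shared model's cold start."""
    result = probe()
    for _ in range(retries):
        if result == "PASS":
            break
        result = probe()  # a second attempt reuses the now-warm ChatModel
    return result
```

Agents that fail on both attempts are genuine failures, not cold-start noise.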