
Agent Fleet Live Smoke

make agents-smoke sends a deterministic prompt to every live A2A agent in the mesh (the orchestrator on port 9000 plus all 13 specialists) and prints a PASS/FAIL matrix. A green run proves the entire LLM chat path — settings → Granite config → shared ChatModel pool → BeeAI → LiteLLM → Ollama — works end-to-end against the locked single model.

When To Use

  • After make agents-local brings up the mesh, before driving the dashboard.
  • After changing OLLAMA_MODEL, aurorasoc/granite/registry.py, or any agent system prompt.
  • As the gate before commits that touch the agent runtime.

Prerequisites

  • Ollama is running on the host and the model tag is pulled (default: granite3.2:8b). Run make llm-doctor first.
  • The full mesh is up: make agents-local. All 14 ports (9000, 9001–9010, 9012, 9015, 9016) must be listening.
  • Running with MCP_HEALTH_PROBE_ENABLED=false is fine for this smoke — the harness only exercises the chat path, not MCP tools.
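Before invoking the smoke, you can verify the port prerequisite directly. The sketch below is a hypothetical helper (not part of the repo): it checks that every mesh port accepts a TCP connection; the port list mirrors the 14 ports named above.

```python
import socket

# 9000 (orchestrator) + 9001-9010 + 9012, 9015, 9016 = 14 ports total.
MESH_PORTS = [9000, *range(9001, 9011), 9012, 9015, 9016]

def port_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connect to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0

def missing_ports(host: str = "127.0.0.1") -> list[int]:
    """Ports from MESH_PORTS that are not currently listening."""
    return [p for p in MESH_PORTS if not port_listening(host, p)]
```

If missing_ports() is non-empty, fix the mesh (make agents-local) before running the smoke.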

Running The Smoke

make agents-smoke

This is a thin wrapper around:

./.venv/bin/python scripts/smoke_agent_fleet.py

Useful Flags

Flag                                        Purpose
------------------------------------------  ------------------------------------------------------------
--agents SecurityAnalyst,NetworkAnalyzer    Probe a subset only.
--no-orchestrator                           Skip port 9000.
--prompt "..."                              Use a custom prompt instead of the deterministic ready check.
--timeout 180                               Increase the per-agent timeout (default 90 s).
--parallel                                  Probe all agents concurrently (more contention on the shared model).
--json                                      Emit a machine-readable payload for CI gates.
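The --json payload can feed a CI gate. The exact schema is an assumption here — this sketch gates on a payload shaped like {"results": [{"agent": ..., "status": "PASS" | "FAIL"}]}; check the script's actual output before relying on field names.

```python
import json

def failed_agents(payload: str) -> list[str]:
    """Names of agents whose status is anything other than PASS."""
    data = json.loads(payload)
    return [r["agent"] for r in data["results"] if r["status"] != "PASS"]

# Illustrative payload, not captured from a real run.
sample = json.dumps({"results": [
    {"agent": "Orchestrator", "status": "PASS"},
    {"agent": "NetworkAnalyzer", "status": "FAIL"},
]})
```

A gate then becomes one line: fail the pipeline if failed_agents(...) is non-empty.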

Reading The Matrix

AGENT            PORT   STATUS   TIME      RESPONSE / ERROR
--------------------------------------------------------------------------------
Orchestrator     9000   PASS     3851 ms   I am ready to assume the role o...
SecurityAnalyst  9001   PASS     2654 ms   I am ready to analyze security ...
...
NetworkAnalyzer  9016   PASS     2978 ms   I am ready to analyze network t...
--------------------------------------------------------------------------------
14/14 agents responded.
  • PASS — the agent returned non-empty text within --timeout seconds.
  • FAIL — connection error, timeout, or an empty response. The error string is printed inline.
  • The script exits non-zero if any agent fails, so you can wire it into CI or a pre-commit gate.

Common Failure Patterns

Symptom                             Likely cause                              Fix
----------------------------------  ----------------------------------------  ------------------------------------------------------------
All agents FAIL with ConnectError   Mesh not running                          make agents-local, then re-run.
One port FAILs, rest PASS           That specialist crashed during startup    Check /tmp/agents-mesh.log (or your launcher log) for that agent's traceback.
Multiple FAILs after a model swap   Wrong OLLAMA_MODEL or model not pulled    make llm-doctor, then make ollama-pull-granite.
Timeouts on the first probe only    Cold start of the shared model            Re-run; subsequent probes reuse the warm ChatModel.
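For the cold-start case, a single retry usually suffices. This is a hypothetical wrapper, not something the script ships: it takes any probe callable and retries once on timeout so the first warm-up probe doesn't fail the whole run.

```python
def probe_with_warmup(probe, retries: int = 1):
    """Call probe(); on TimeoutError, retry up to `retries` more times."""
    last_err: TimeoutError | None = None
    for _ in range(retries + 1):
        try:
            return probe()
        except TimeoutError as err:
            last_err = err  # first probe may time out while the model loads
    raise last_err
```

Sequential probing after a warm first call hits the already-loaded ChatModel, which is why simply re-running the smoke also works.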