Agent fleet consolidation

What this page is

The shape of the consolidated specialist pool after ADR 004. Covers the agent registry, the orchestrator's dispatch path, and the per-agent contract that downstream tools and the operator console rely on.

Why it exists this way

Before consolidation, the agent fleet was a mix of hand-registered Python classes and shell-launched processes. They disagreed on lifecycle (some restarted on crash, some did not), on logging shape (structured or unstructured), and on policy gating (some checked the system mode, some did not). ADR 004 picked a single in-process BeeAI Agent subclass per specialist and moved orchestration into aurorasoc.workflows.investigation.

How it works

The registry lives at packages/backend/aurorasoc/agents/factory.py. AgentFactory is the only API the rest of the backend uses to construct an agent; it reads the registry, resolves the inference backend per ADR 005, wires the per-agent tool list, and returns a configured BeeAI Agent ready to serve task dispatches.
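The three steps the factory performs (read the registry, resolve the backend, wire the tools) can be sketched roughly as follows. This is a minimal illustration, not the code in factory.py: AgentSpec, ConfiguredAgent, the create method, the resolve_backend callable, and the tool names are all hypothetical stand-ins, and ConfiguredAgent takes the place of the BeeAI Agent the real factory returns.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical registry entry: one per specialist."""
    name: str
    tools: tuple[str, ...] = ()


@dataclass
class ConfiguredAgent:
    """Stand-in for the configured BeeAI Agent the real factory returns."""
    name: str
    backend: str
    tools: list[str] = field(default_factory=list)


class AgentFactory:
    """Sketch of a single construction path for every agent."""

    def __init__(
        self,
        registry: dict[str, AgentSpec],
        resolve_backend: Callable[[str], str],
    ) -> None:
        self._registry = registry
        self._resolve_backend = resolve_backend

    def create(self, name: str) -> ConfiguredAgent:
        spec = self._registry[name]            # 1. read the registry
        backend = self._resolve_backend(name)  # 2. resolve the inference backend
        # 3. wire the per-agent tool list and return the configured agent
        return ConfiguredAgent(name=spec.name, backend=backend, tools=list(spec.tools))


# Hypothetical registry contents; the tool names are illustrative only.
REGISTRY = {
    "orchestrator": AgentSpec("orchestrator"),
    "threat-hunter": AgentSpec("threat-hunter", tools=("sigma_search", "ioc_pivot")),
}

factory = AgentFactory(REGISTRY, resolve_backend=lambda name: "vllm")
agent = factory.create("threat-hunter")
```

Routing all construction through one method is what lets lifecycle, logging, and policy checks be applied uniformly, which is the inconsistency ADR 004 set out to remove.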

Nine specialists ship in this configuration:

  • orchestrator: dispatch router; does not call tools itself.
  • threat-hunter: Sigma and IOC pivots; the canary path in the LLM evaluation harness wires through this agent first.
  • malware-analyst: sandbox dispatch and YARA scanning.
  • incident-responder: playbook execution under the approval gate.
  • network-security: Suricata/Zeek triage and IDS signature evaluation.
  • cps-security: OT attestation and Modbus/MQTT triage.
  • threat-intel: IOC enrichment and feed reconciliation.
  • endpoint-security: EDR action dispatch and live triage.
  • forensic-analyst: evidence collection and timeline reconstruction.

Every agent's startup is checkpointed by aurorasoc.repositories.investigation_repository so a worker restart does not lose the in-flight investigation.
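The checkpoint-then-resume behavior can be pictured with a small sketch. Everything here is an assumption made for illustration: the repository class, its save_checkpoint and load_checkpoint methods, and the shape of the state dict are hypothetical, and the in-memory dict stands in for whatever durable store the real investigation_repository uses.

```python
class InMemoryInvestigationRepository:
    """Hypothetical stand-in for a durable checkpoint store."""

    def __init__(self):
        self._checkpoints = {}

    def save_checkpoint(self, investigation_id, agent_name, state):
        # Copy the state so later mutation by the caller does not alter the checkpoint.
        self._checkpoints[(investigation_id, agent_name)] = dict(state)

    def load_checkpoint(self, investigation_id, agent_name):
        return self._checkpoints.get((investigation_id, agent_name))


def start_agent(repo, investigation_id, agent_name):
    """Resume from a checkpoint if one exists, otherwise record a fresh one."""
    resumed = repo.load_checkpoint(investigation_id, agent_name)
    if resumed is not None:
        return dict(resumed)
    state = {"agent": agent_name, "step": 0}
    repo.save_checkpoint(investigation_id, agent_name, state)
    return state


repo = InMemoryInvestigationRepository()

# First start: nothing stored yet, so a fresh checkpoint is written.
state = start_agent(repo, "inv-1", "forensic-analyst")
state["step"] = 1
repo.save_checkpoint("inv-1", "forensic-analyst", state)

# Simulated worker restart: the same call picks up the stored progress.
restarted = start_agent(repo, "inv-1", "forensic-analyst")
```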

What goes wrong

  • A specialist's required tools are unavailable (the underlying MCP server is down). The factory returns the agent in a degraded state with the typed AgentInitError::MissingTools so the orchestrator can skip the agent for the current dispatch rather than block.
  • The inference backends disagree (vLLM is healthy, but the Ollama fallback returns an empty completion). Handling is owned by Inference backend resolution; the agents themselves are backend-agnostic and surface the typed error upstream.
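The missing-tools case can be sketched as follows, under stated assumptions: the page writes the error as AgentInitError::MissingTools, and this sketch models it as a Python enum member, which may not match the real type. AgentHandle, build_agent, dispatchable, and the tool names are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AgentInitError(Enum):
    """Assumed modelling of the typed init error as an enum."""
    MISSING_TOOLS = "missing_tools"


@dataclass
class AgentHandle:
    """Hypothetical wrapper: an agent plus the error it was initialised with, if any."""
    name: str
    init_error: Optional[AgentInitError] = None

    @property
    def degraded(self) -> bool:
        return self.init_error is not None


def build_agent(name, required_tools, available_tools):
    """Return the agent even when tools are missing, marked as degraded."""
    missing = set(required_tools) - set(available_tools)
    error = AgentInitError.MISSING_TOOLS if missing else None
    return AgentHandle(name=name, init_error=error)


def dispatchable(agents):
    """Agents the orchestrator may use for the current dispatch."""
    return [agent for agent in agents if not agent.degraded]


agents = [
    build_agent("malware-analyst", ["yara_scan"], available_tools=[]),
    build_agent("threat-intel", ["ioc_enrich"], available_tools=["ioc_enrich"]),
]
ready = [agent.name for agent in dispatchable(agents)]
```

Returning a degraded handle instead of raising keeps the decision with the orchestrator, which can skip one specialist for the current dispatch without blocking the others.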