Agent task execution

What this page is

This page describes the working AI agent task path in the migrated AuroraSOC stack. It is for engineers who need to start the BeeAI A2A agent mesh, queue work for a specialist, and confirm the task moved from an operator request to an agent result.

The current operational path is FastAPI, Redis Streams, BeeAI A2A, MCP tool servers, IBM Granite served by vLLM, and the Next.js operator console. Temporal-backed workflow durability remains part of the long-term architecture, but it is not required for this local/dev task execution path.

Why it exists this way

Manual and automated agent work both need the same durable handoff: the API validates the request, the fleet router selects a target, the worker consumes the stream, and the agent result is published for the API and console to observe. Keeping that path explicit makes startup drift, routing mistakes, and failed assignments visible instead of silently losing work.

How it works

The specialist agents are discovered from aurorasoc.agents.registry. Each agent owns an AgentDefinition with its name, A2A port, tags, prompt, and allowed MCP domains. The compose stack and the local launcher must use that registry as the source of truth for names and ports.
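As a rough illustration of what "source of truth for names and ports" implies, the sketch below mirrors the fields listed above in a hypothetical dataclass and rejects duplicate names or ports. The field names, the validate_registry helper, and the example ports are assumptions made for illustration; the real definitions live in aurorasoc.agents.registry and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical mirror of the fields this page lists; not the real class."""

    name: str
    a2a_port: int
    tags: tuple[str, ...] = ()
    prompt: str = ""
    allowed_mcp_domains: tuple[str, ...] = ()


def validate_registry(definitions: list[AgentDefinition]) -> dict[str, AgentDefinition]:
    """Index definitions by name, rejecting duplicate names or A2A ports."""
    by_name: dict[str, AgentDefinition] = {}
    port_owner: dict[int, str] = {}
    for definition in definitions:
        if definition.name in by_name:
            raise ValueError(f"duplicate agent name: {definition.name}")
        if definition.a2a_port in port_owner:
            owner = port_owner[definition.a2a_port]
            raise ValueError(
                f"port {definition.a2a_port} is claimed by {owner} and {definition.name}"
            )
        by_name[definition.name] = definition
        port_owner[definition.a2a_port] = definition.name
    return by_name


# Ports, tags, and domains below are placeholders, not the real assignments.
REGISTRY = validate_registry(
    [
        AgentDefinition("SecurityAnalyst", 9001, ("triage",), allowed_mcp_domains=("siem",)),
        AgentDefinition("ThreatHunter", 9002, ("hunting",), allowed_mcp_domains=("siem", "edr")),
    ]
)
```

A compose file or launcher that derives its service names and ports from one validated mapping like this cannot drift from the registry, which is the property the paragraph above asks for.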

The task envelope carries the fields the console and audit path need: task_id, assignment_id, assigned_agent_id, optional assigned_replica_id, optional preferred_site_id, priority, and correlation fields such as case_id, alert_id, and traceparent. Assignment priority uses the same 0..9 scale across the API schema, stream envelope, database check constraint, and console selector.
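The envelope fields and the shared priority scale can be sketched as a small dataclass. This is a minimal sketch, not the project's schema: the class name TaskEnvelope, the to_stream_fields helper, and the choice to serialize every value as a string are assumptions. Only the field names and the 0..9 range come from this page.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Optional

MIN_PRIORITY, MAX_PRIORITY = 0, 9  # the 0..9 scale described above


@dataclass
class TaskEnvelope:
    """Hypothetical envelope carrying the fields named on this page."""

    task_id: str
    assignment_id: str
    assigned_agent_id: str
    priority: int
    assigned_replica_id: Optional[str] = None
    preferred_site_id: Optional[str] = None
    case_id: Optional[str] = None
    alert_id: Optional[str] = None
    traceparent: Optional[str] = None

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.priority, bool) or not isinstance(self.priority, int):
            raise TypeError("priority must be an integer")
        if not MIN_PRIORITY <= self.priority <= MAX_PRIORITY:
            raise ValueError(
                f"priority must be in {MIN_PRIORITY}..{MAX_PRIORITY}, got {self.priority}"
            )

    def to_stream_fields(self) -> dict[str, str]:
        """Flatten to string fields, dropping unset optional values.

        Assumption: the stream entry is a flat field/value map of strings.
        """
        return {key: str(value) for key, value in asdict(self).items() if value is not None}


envelope = TaskEnvelope(
    task_id="task-1",
    assignment_id="assign-1",
    assigned_agent_id="SecurityAnalyst",
    priority=5,
    case_id="case-42",
)
fields = envelope.to_stream_fields()
```

Validating the range in one place and reusing the same bounds in the API schema, the database check constraint, and the console selector is one way to keep the layers from drifting apart.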

Running the GPU/vLLM stack

Use the GPU overlay when verifying real local inference:

docker compose \
  -f infra/compose/docker-compose.yml \
  -f infra/compose/docker-compose.gpu.yml \
  --project-directory . \
  --env-file .env \
  --profile agents-core \
  up -d

After the stack is up, run migrations and seed the default admin if the database is fresh:

just migrate
just seed-admin

The core services for a task smoke test are vllm, api, redis, agent-task-worker, agent-orchestrator, agent-security-analyst, and agent-threat-hunter. The specialist compose services launch with python -m aurorasoc.agents.server; the orchestrator keeps its dedicated python -m aurorasoc.agents.orchestrator.server entrypoint.

Verifying a task

From the operator console, open the Agents page, expand an active agent, and use Manual Assignment. Leave the replica unset for normal least-loaded routing, or pin a replica when testing a specific target.

The same flow can be checked over HTTP by posting to /api/v1/agents/{agent_id}/assign, then polling /api/v1/tasks/{task_id}/assignment. A healthy run moves the assignment from pending to dispatched and then to a terminal state such as completed or failed. The open queue is available through /api/v1/agents/{agent_id}/queue?status=pending&status=dispatched&status=in_progress.
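The polling half of that check might look like the sketch below, which uses only the standard library. Several details are assumptions rather than documented behavior: that the assignment endpoint returns JSON with a top-level "status" field, that a bearer token is the auth scheme, and that completed and failed are the only terminal states. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import time
from typing import Callable, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TERMINAL_STATES = {"completed", "failed"}  # assumption: there may be others
OPEN_STATUSES = ("pending", "dispatched", "in_progress")


def queue_url(base_url: str, agent_id: str) -> str:
    """Build the open-queue URL with one repeated status parameter per state."""
    query = urlencode([("status", status) for status in OPEN_STATUSES])
    return f"{base_url}/api/v1/agents/{agent_id}/queue?{query}"


def http_get_json(url: str, token: Optional[str] = None) -> dict:
    """GET a URL and decode the JSON body. Bearer auth is an assumption."""
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    with urlopen(Request(url, headers=headers), timeout=10) as response:
        return json.load(response)


def wait_for_terminal(
    base_url: str,
    task_id: str,
    fetch: Callable[[str], dict] = http_get_json,
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll the assignment until it is terminal; return the distinct states seen."""
    url = f"{base_url}/api/v1/tasks/{task_id}/assignment"
    seen: list[str] = []
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(url)["status"]  # assumption: top-level "status" field
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL_STATES:
            return seen
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout}s")
        sleep(interval)
```

Against a running stack, a call such as wait_for_terminal(base_url, task_id), with base_url set to wherever the API is exposed, would be expected to return a sequence like pending, dispatched, completed for a healthy run. The fetch and sleep parameters exist so the loop can be exercised without a live API.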

For a live A2A smoke test against already running agent ports, use:

python tools/scripts/legacy/smoke_agent_fleet.py \
  --agents SecurityAnalyst,ThreatHunter \
  --timeout 120

What goes wrong and how do you fix it

A specialist exits immediately with No module named aurorasoc.agents.generic_server. The compose service or launcher is stale. Specialist services must run aurorasoc.agents.server.
The local launcher imports the wrong scripts package. Run the launcher from this repository path, or use tools/scripts/legacy/run_local_agents.py; it injects packages/backend into PYTHONPATH for child processes.
Assignment priority is rejected after the API accepted it. The priority contract has drifted. Keep every layer on the 0..9 scale.
A task fails before it reaches Redis. The assignment should be marked failed, and tasks_in_flight must remain unchanged because no dispatch occurred.
The tools panel is empty or degraded. Check /api/v1/mcp/health and the MCP domain containers. Agents can start with degraded tools, but the domain health row should explain which server is down.
The worker is running but assignments never complete. Check agent-task-worker logs, aurora_agent_task_worker_retries_total, aurora_agent_task_worker_dead_letters_total, and the target A2A agent health URL.
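The rule for a task that fails before it reaches Redis can be made concrete with a short sketch: the in-flight counter is incremented only after the publish succeeds. The Assignment class, the dispatch function, and the publish callable are hypothetical stand-ins, not the code in dispatch_router.py or fleet_dispatch.py.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Assignment:
    """Hypothetical assignment record with just the fields this sketch needs."""

    assignment_id: str
    assigned_agent_id: str
    status: str = "pending"
    error: Optional[str] = None


def dispatch(
    assignment: Assignment,
    publish: Callable[[Assignment], None],
    tasks_in_flight: Counter,
) -> bool:
    """Publish the assignment; count it as in flight only if publishing worked."""
    try:
        publish(assignment)  # stand-in for the Redis Streams write
    except Exception as exc:
        # Nothing reached the stream, so the counter must stay untouched.
        assignment.status = "failed"
        assignment.error = str(exc)
        return False
    tasks_in_flight[assignment.assigned_agent_id] += 1
    assignment.status = "dispatched"
    return True
```

Incrementing before the publish and decrementing on error would also work, but it briefly reports load that never existed, which can skew least-loaded routing while the failure is being handled.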

Code pointers

Agent definitions and validation: packages/backend/aurorasoc/agents/registry.py
Specialist entrypoint: packages/backend/aurorasoc/agents/server.py
Manual assignment API helper: packages/backend/aurorasoc/api/fleet_dispatch.py
Routing and assignment hooks: packages/backend/aurorasoc/services/dispatch_router.py
Worker execution loop: packages/backend/aurorasoc/workers/agent_task_worker.py
MCP health and tool-call audit: packages/backend/aurorasoc/tools/mcp_health.py
Operator console Agents page: apps/operator-console/src/app/agents/page.tsx
