Architecture Overview
This section provides a deep-dive into AuroraSOC's architecture for developers contributing to or extending the platform. Understanding the architecture is essential before modifying any component.
If you are new to the codebase, read this page in order from top to bottom once, then use the linked deep-dive pages at the end.
The default Compose stack follows the Python API and event-processing path. Enable --profile rust-core only when you need the optional Rust fast path for high-throughput ingest and attestation workloads.
If you are looking for the operator-facing startup path, use AI Agent Fleet Deployment first and return here when you need implementation detail.
When to Use This Page
Use this page before you:
- Add new agents, tools, events, or storage integrations
- Change message routing, task execution, or API behavior
- Plan performance testing or production hardening work
Prerequisites
Recommended background before making architecture changes:
- Familiarity with async Python (asyncio, FastAPI, SQLAlchemy async)
- Basic understanding of Redis Streams consumer groups
- Familiarity with container networking (Compose and service DNS)
- Working knowledge of OpenTelemetry and Prometheus basics
High-Level Architecture
In default deployments, the active edge-ingest path is MQTT broker -> Python edge consumer -> Redis Streams (the MQTT_S --> PY_EDGE --> REDIS_S nodes). The Rust node is only present when --profile rust-core is enabled.
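As a concrete illustration of that path, here is a minimal sketch of an edge consumer that bridges MQTT into a Redis Stream. It assumes the aiomqtt and redis-py (redis.asyncio) client libraries; the topic filter, stream name, and field layout are hypothetical and do not necessarily match the consumers in aurorasoc/events/.

```python
import asyncio
import json

import aiomqtt                      # assumed MQTT client library
import redis.asyncio as aioredis    # redis-py asyncio client


async def edge_ingest(mqtt_host: str = "mqtt", redis_url: str = "redis://redis:6379") -> None:
    r = aioredis.from_url(redis_url, decode_responses=True)
    async with aiomqtt.Client(mqtt_host, port=1883) as client:
        # Hypothetical topic filter for device telemetry.
        await client.subscribe("aurora/sensors/#")
        async for message in client.messages:
            event = {
                "topic": str(message.topic),
                "payload": message.payload.decode("utf-8", errors="replace"),
            }
            # Append to a Redis Stream; downstream consumer groups fan this out.
            await r.xadd("aurora:events:raw", {"event": json.dumps(event)})


if __name__ == "__main__":
    asyncio.run(edge_ingest())
```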
Project Structure
AuroraSOC/
├── aurorasoc/ # Python backend (main application)
│ ├── agents/ # AI agents (specialists + orchestrator), agent factory, prompts
│ ├── api/ # FastAPI application and contract surface
│ ├── config/ # Pydantic settings (10 subsystem configs)
│ ├── core/ # Auth, DB, logging, rate limiting, tracing
│ ├── engine/ # SOAR playbook engine
│ ├── events/ # Redis Streams, NATS, MQTT consumers
│ ├── workers/ # Redis stream workers (agent task execution)
│ ├── memory/ # Three-tier agent memory system
│ ├── models/ # Pydantic domain models + enums
│ ├── services/ # Background scheduler
│ ├── tools/ # 50+ MCP tools across multiple modules
│ └── workflows/ # BeeAI AgentWorkflows
├── rust_core/ # Optional Rust fast path
│ └── src/ # Event normalizer, security middleware, publishers
├── dashboard/ # Next.js 15 frontend
│ └── src/ # React components, Zustand store, API client
├── firmware/ # Three firmware platforms
│ ├── esp32s3/ # Zephyr RTOS (C)
│ ├── nrf52840/ # Embassy-rs (Rust)
│ └── stm32/ # Ada SPARK (Ada)
├── infrastructure/ # Docker, monitoring, broker configs
├── tests/ # pytest test suite
├── alembic/ # Database migrations
└── docs/ # This Docusaurus documentation
Component Interactions
Request Flow: Alert Investigation
Runtime Data Flow (Task Worker Path)
This complements the API-centric flow above and focuses on worker correlation behavior.
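To make the worker path concrete, the sketch below shows one possible worker loop: read a task from a Redis Stream through a consumer group, dispatch it, publish a result carrying the task's correlation id, then acknowledge the entry. Stream, group, and field names are hypothetical; the real workers in aurorasoc/workers/ define their own schema.

```python
import asyncio
import json

import redis.asyncio as aioredis
from redis.exceptions import ResponseError

TASK_STREAM = "aurora:agent:tasks"      # hypothetical names
RESULT_STREAM = "aurora:agent:results"
GROUP = "agent-task-workers"


async def run_worker(consumer: str, redis_url: str = "redis://redis:6379") -> None:
    r = aioredis.from_url(redis_url, decode_responses=True)
    try:
        # Create the consumer group once; ignore "already exists" errors.
        await r.xgroup_create(TASK_STREAM, GROUP, id="0", mkstream=True)
    except ResponseError:
        pass

    while True:
        entries = await r.xreadgroup(GROUP, consumer, {TASK_STREAM: ">"}, count=10, block=5000)
        for _stream, messages in entries:
            for message_id, fields in messages:
                task = json.loads(fields["task"])
                # Placeholder for the real A2A dispatch call.
                result = {"status": "ok", "task_type": task.get("type")}
                # Echo the correlation id so the API side can match the result.
                await r.xadd(RESULT_STREAM, {
                    "correlation_id": task["correlation_id"],
                    "result": json.dumps(result),
                })
                await r.xack(TASK_STREAM, GROUP, message_id)


if __name__ == "__main__":
    asyncio.run(run_worker("worker-1"))
```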
Design Principles
1. Graceful Degradation
Fallback behavior is intentionally mode-aware; a minimal fallback sketch follows this list:
- PostgreSQL down in dummy mode -> selected read endpoints may serve in-memory showcase data
- PostgreSQL down in dry_run or real mode -> DB-backed reads fail clearly instead of silently substituting showcase data
- Redis down -> selected runtime protections fall back to in-memory behavior where explicitly implemented
- pgvector unavailable -> agent memory falls back to sliding-window-only behavior
- Agent offline -> circuit breakers and timeout controls prevent cascading failures
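The sketch below illustrates the "Redis down" item: wrap the preferred shared-state path in a try block and keep a per-process substitute. This is a minimal rate-limiter example; the real fallbacks live in aurorasoc/core/, and the key layout and window size here are hypothetical.

```python
import time
from collections import defaultdict

import redis.asyncio as aioredis
from redis.exceptions import RedisError

# Per-process fallback state, used only while Redis is unreachable.
_memory_hits: dict[str, list[float]] = defaultdict(list)


async def allow_request(r: aioredis.Redis, client_id: str, limit: int = 60, window_s: int = 60) -> bool:
    key = f"ratelimit:{client_id}"          # hypothetical key layout
    try:
        # Preferred path: shared counter in Redis with an expiry window.
        count = await r.incr(key)
        if count == 1:
            await r.expire(key, window_s)
        return count <= limit
    except RedisError:
        # Degraded path: per-process sliding window kept in memory.
        now = time.monotonic()
        hits = [t for t in _memory_hits[client_id] if now - t < window_s]
        hits.append(now)
        _memory_hits[client_id] = hits
        return len(hits) <= limit
```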
2. Configuration over Code
All behavior is configurable via environment variables with sensible defaults; a minimal settings sketch follows this list. No code changes are needed for:
- LLM provider switching
- Port assignments
- Connection strings
- Feature toggles
- Rate limits
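The settings sketch below shows how this pattern usually looks with pydantic-settings: every field carries a default and can be overridden from the environment alone. The field names echo variables mentioned on this page, but the defaults and class layout are illustrative rather than the actual classes in aurorasoc/config/.

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class WorkerSettings(BaseSettings):
    # Environment variables override these defaults; matching is case-insensitive.
    model_config = SettingsConfigDict(case_sensitive=False)

    redis_batch_size: int = 10                   # REDIS_BATCH_SIZE
    redis_block_ms: int = 5000                   # REDIS_BLOCK_MS
    agent_task_worker_metrics_port: int = 9102   # AGENT_TASK_WORKER_METRICS_PORT
    llm_provider: str = "openai"                 # hypothetical provider toggle


settings = WorkerSettings()   # reads the process environment once at startup
```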
3. Separation of Concerns
Each module has a single responsibility:
- agents/ — AI agent creation and configuration
- tools/ — External system integration
- events/ — Message transport
- memory/ — Knowledge persistence
- engine/ — Playbook execution
- core/ — Cross-cutting concerns (auth, logging, tracing)
4. Event Sourcing Lite
While not a full event-sourced system, AuroraSOC captures all state changes in Redis Streams, providing (a short replay sketch follows this list):
- Complete audit trail
- Event replay capability
- Decoupled producers and consumers
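The replay capability comes directly from the stream API: past entries can be re-read by id range without affecting live consumers. A minimal sketch, reusing the hypothetical stream name from the earlier examples:

```python
import asyncio

import redis.asyncio as aioredis


async def replay(redis_url: str = "redis://redis:6379") -> None:
    r = aioredis.from_url(redis_url, decode_responses=True)
    last_id = "-"                     # start of the stream
    while True:
        # Page through history oldest-to-newest; "(" makes the range exclusive.
        entries = await r.xrange("aurora:events:raw", min=last_id, max="+", count=100)
        if not entries:
            break
        for entry_id, fields in entries:
            print(entry_id, fields)   # in practice: rebuild a projection or audit view
        last_id = "(" + entries[-1][0]


if __name__ == "__main__":
    asyncio.run(replay())
```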
Failure-Mode and Recovery Matrix
| Failure Mode | Detection Signal | Immediate Behavior | Recovery Strategy |
|---|---|---|---|
| Agent A2A endpoint unavailable | Startup connectivity probe + dispatch exception | Warning-only degraded startup, circuit-breaker protection | Restore service, breaker resets via half-open success |
| PostgreSQL unavailable | DB health checks and query failures | API can return degraded responses for selected views | Restore DB, re-run failed writes if needed |
| Redis stream lag growth | Prometheus lag/retry/dead-letter signals | Investigation latency increases | Scale worker consumers, inspect slow handlers |
| Result correlation mismatch | aurora_agent_results_unmatched_total increasing | Pending futures may timeout | Validate correlation IDs and event payload schema |
| Metrics exporter port conflict | Worker startup warning | Worker continues without local metrics endpoint | Change AGENT_TASK_WORKER_METRICS_PORT and restart worker |
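The first row's recovery path ("breaker resets via half-open success") follows the standard circuit-breaker state machine. Below is a minimal sketch with illustrative thresholds; the real breaker sits in the A2A dispatch layer and may differ in detail.

```python
import time
from typing import Optional


class CircuitBreaker:
    """Open after repeated failures, probe again after a cool-down, reset on success."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                                          # closed: calls pass
        if time.monotonic() - self.opened_at >= self.reset_timeout_s:
            return True                                          # half-open: allow a probe
        return False                                             # open: fail fast

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None                                    # probe succeeded -> close

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()                    # trip (or re-trip) the breaker
```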
Performance Considerations
Latency-sensitive path
The most latency-sensitive path is the following; a correlation sketch appears after the list:
- API task publish
- Worker task consume and dispatch
- Result publish and correlation
- API response/websocket fanout
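On the API side, the result publish and correlation step is typically a pending-futures map keyed by correlation id: the publish parks an asyncio.Future, the result consumer resolves it, and a timeout bounds the wait. This is a simplified sketch with hypothetical names, not the exact implementation.

```python
import asyncio
import uuid

_pending: dict[str, asyncio.Future] = {}      # what a pending-futures gauge would count


async def publish_and_wait(publish_task, timeout_s: float = 30.0) -> dict:
    correlation_id = str(uuid.uuid4())
    future: asyncio.Future = asyncio.get_running_loop().create_future()
    _pending[correlation_id] = future
    try:
        await publish_task(correlation_id)    # e.g. XADD onto the task stream
        return await asyncio.wait_for(future, timeout_s)
    finally:
        _pending.pop(correlation_id, None)


def on_result(correlation_id: str, payload: dict) -> None:
    future = _pending.get(correlation_id)
    if future is None or future.done():
        return                                # unmatched result: worth counting in metrics
    future.set_result(payload)
```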
Instrumentation to watch:
- aurora_agent_task_worker_task_duration_ms
- aurora_agent_result_correlation_latency_ms
- aurora_agent_result_futures_pending
Throughput tuning levers
| Lever | Location | Effect |
|---|---|---|
| REDIS_BATCH_SIZE | worker and consumers | Higher throughput, potential burst latency |
| REDIS_BLOCK_MS | stream consumers | Lower polling overhead, affects responsiveness |
| Worker replica count | deployment/compose | Higher parallelism for task handling |
| A2A timeout values | dispatch layer | Prevents long hangs during partial outages |
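For orientation, the first two levers usually surface directly in the consumer read call. The wiring below is illustrative; the actual plumbing goes through the settings layer rather than raw os.getenv.

```python
import os

# Read the two stream-consumer levers from the environment (illustrative wiring).
BATCH_SIZE = int(os.getenv("REDIS_BATCH_SIZE", "10"))
BLOCK_MS = int(os.getenv("REDIS_BLOCK_MS", "5000"))

# Inside a worker loop (r is a redis.asyncio client):
#   entries = await r.xreadgroup(GROUP, consumer, {TASK_STREAM: ">"},
#                                count=BATCH_SIZE, block=BLOCK_MS)
```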
Related Pages
Technology Stack Decision Matrix
| Component | Technology | Why This Choice | Alternatives Considered |
|---|---|---|---|
| AI Framework | BeeAI | A2A + MCP native support | LangChain, CrewAI, AutoGen |
| API | FastAPI | Async, type-safe, OpenAPI | Django, Flask, Express |
| ORM | SQLAlchemy 2.0 | Async, mature, type-safe | Tortoise ORM, Prisma |
| Event Bus | Redis Streams | Low latency, consumer groups | Kafka, RabbitMQ |
| Federation | NATS JetStream | Lightweight, persistent | Kafka, Pulsar |
| IoT Transport | MQTT v5 | Industry standard, QoS | AMQP, CoAP |
| Vector DB | pgvector (PG ext.) | Single-DB simplicity, HNSW indexes | Qdrant, Pinecone, Weaviate, Milvus |
| Core Engine | Rust (tokio+axum, opt-in profile) | High-throughput normalization and attestation fast path | Go, C++ |
| Frontend | Next.js 15 | SSR, React, Turbopack | Nuxt, SvelteKit |
| Firmware | C/Rust/Ada | Platform-specific strengths | MicroPython, Arduino |
| Database | PostgreSQL 16 | JSONB, reliability, extensions | MySQL, MongoDB |
| Tracing | OpenTelemetry | Vendor-neutral, standard | Jaeger, Zipkin (OTLP exports to these) |
Port Map
| Port | Service | Protocol |
|---|---|---|
| 8000 | FastAPI | HTTP/WS |
| 8080 | Rust Core Engine (opt-in profile) | HTTP |
| 3000 | Next.js Dashboard | HTTP |
| 5432 | PostgreSQL + pgvector | TCP |
| 6379 | Redis | TCP |
| 4222 | NATS | TCP |
| 1883 | MQTT | TCP |
| 4317 | OTLP gRPC | gRPC |
| 9000-9016 | A2A Agents | HTTP |
| 9090 | Prometheus | HTTP |
| 3001 | Grafana | HTTP |