Event-Driven Pipeline
AuroraSOC processes security events through a multi-stage, event-driven pipeline using three complementary messaging systems. This architecture ensures events are processed reliably, at scale, and with proper ordering.
Pipeline Overview
Default deployments use the Python API and MQTT consumer bridge to normalize and validate events before publishing them into Redis Streams. Enable `--profile rust-core` only when you want the optional Rust fast path to handle high-throughput ingest or attestation workloads.
Why Three Messaging Systems?
Each messaging system serves a distinct purpose:
| System | Purpose | Pattern | Durability |
|---|---|---|---|
| Redis Streams | Internal event bus | Consumer groups | In-memory + AOF |
| NATS JetStream | Cross-site federation | Pub/Sub + persistence | Disk-backed |
| MQTT | IoT device communication | Pub/Sub + QoS levels | Broker-dependent |
Redis Streams — The Internal Bus
Redis Streams is the primary event bus within a single AuroraSOC deployment.
Why Redis Streams?
- Consumer groups — Multiple consumers share the workload; each message is delivered to one consumer in the group (see the sketch after this list)
- Message acknowledgment — Failed messages can be re-processed
- Stream trimming — Automatic cleanup of old messages (configurable max length)
- Sub-millisecond latency — Critical for real-time alerting
- Already deployed — Redis is used for caching, so no additional infrastructure
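A minimal consumer sketch, assuming the redis-py client (`pip install redis`). The stream and group names come from the production table in the next section; the consumer name and alert handler are illustrative placeholders:

```python
import redis

r = redis.Redis(decode_responses=True)
STREAM, GROUP, CONSUMER = "aurora:alerts", "api-ws-alerts", "ws-broadcaster-1"

def handle_alert(fields: dict) -> None:
    print("alert:", fields)  # placeholder for the real WebSocket broadcast

# Create the group once; mkstream=True also creates the stream if empty.
try:
    r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
except redis.exceptions.ResponseError:
    pass  # BUSYGROUP: the group already exists

while True:
    # '>' requests messages never delivered to this group before.
    for _stream, messages in r.xreadgroup(
        GROUP, CONSUMER, {STREAM: ">"}, count=10, block=5000
    ) or []:
        for msg_id, fields in messages:
            handle_alert(fields)
            # Ack so the entry leaves the pending list and is not re-delivered.
            r.xack(STREAM, GROUP, msg_id)
```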
Agent Task Worker Architecture
AuroraSOC runs a dedicated worker process at `aurorasoc/workers/agent_task_worker.py` to execute queued tasks and publish correlated results (sketched after the table below).
Consumer groups used in production:
| Stream | Consumer Group | Primary Consumer |
|---|---|---|
| `aurora:agent:tasks` | `aurora-agents` | AgentTaskWorker |
| `aurora:agent:results` | `api-agent-results` | FastAPI result correlator |
| `aurora:alerts` | `api-ws-alerts` | WebSocket alert broadcaster |
| `aurora:audit` | `api-ws-thoughts` | WebSocket reasoning broadcaster |
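A sketch of the worker loop under the same redis-py assumptions: it consumes from `aurora:agent:tasks`, runs the task, and publishes the outcome to `aurora:agent:results` so the FastAPI correlator can resolve the waiting future. `execute_task` and the payload fields are hypothetical stand-ins for the real agent dispatch:

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def execute_task(task: dict) -> dict:
    return {"status": "done"}  # hypothetical stand-in for agent execution

# Assumes the aurora-agents group was created as in the earlier sketch.
while True:
    for _stream, messages in r.xreadgroup(
        "aurora-agents", "worker-1", {"aurora:agent:tasks": ">"},
        count=1, block=5000,
    ) or []:
        for msg_id, task in messages:
            result = execute_task(task)
            # Echo the task_id so the API-side correlator can match the
            # result to the future awaiting it.
            r.xadd("aurora:agent:results", {
                "task_id": task.get("task_id", msg_id),
                "result": json.dumps(result),
            })
            r.xack("aurora:agent:tasks", "aurora-agents", msg_id)
```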
Stream Lag Metrics
Prometheus should track stream lag and throughput for the task/result path:
| Metric | Description | Alert Suggestion |
|---|---|---|
| `aurora_agent_tasks_pending_total` | Pending messages in the `aurora:agent:tasks` group | Warn if > 200 for 5 min |
| `aurora_agent_task_retry_total` | Count of tasks requeued by the worker | Warn on sustained growth |
| `aurora_agent_dead_letter_total` | Count of dead-lettered tasks | Page on any non-zero spike |
| `aurora_agent_result_correlation_latency_ms` | Publish→future-resolution latency | Warn if p95 > 5000 ms |
| `aurora_agent_results_unmatched_total` | Result events with no waiting future | Investigate if steadily increasing |
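A sketch of exporting the first metric in the table above, assuming `prometheus_client` and redis-py; the scrape port and poll interval are illustrative:

```python
import time
import redis
from prometheus_client import Gauge, start_http_server

r = redis.Redis(decode_responses=True)
PENDING = Gauge(
    "aurora_agent_tasks_pending_total",
    "Pending messages in the aurora:agent:tasks consumer group",
)

start_http_server(9102)  # hypothetical scrape port
while True:
    # XPENDING summary counts delivered-but-unacknowledged entries.
    summary = r.xpending("aurora:agent:tasks", "aurora-agents")
    PENDING.set(summary["pending"])
    time.sleep(15)
```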
NATS JetStream — Cross-Site Federation
For organizations running multiple AuroraSOC instances, NATS JetStream federates events between sites.
The AURORA JetStream stream carries the following subjects:
- `AURORA.alerts` — Alert federation
- `AURORA.ioc_sharing` — IOC dissemination
- `AURORA.agent_results` — Investigation sharing
- `AURORA.audit` — Cross-site audit trail
Why NATS JetStream?
- Persistent delivery — Messages survive broker restarts
- Exactly-once semantics — Via consumer acknowledgment + deduplication (see the publish sketch after this list)
- Geographic distribution — NATS clusters span data centers
- Lightweight — Single binary, minimal resource usage
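A publish-side sketch using the nats-py client; the payload fields and the alert ID used for deduplication are illustrative. Setting the `Nats-Msg-Id` header is what lets JetStream drop duplicates inside the stream's dedup window:

```python
import asyncio
import json
import nats

async def main() -> None:
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()
    # JetStream deduplicates on Nats-Msg-Id within the stream's dedup
    # window, so a retried publish of the same alert is not double-stored.
    await js.publish(
        "AURORA.alerts",
        json.dumps({"alert_id": "a-123", "severity": "high"}).encode(),
        headers={"Nats-Msg-Id": "a-123"},
    )
    await nc.close()

asyncio.run(main())
```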
MQTT — IoT Edge Communication
MQTT connects resource-constrained IoT devices to the pipeline.
Why MQTT?
- Minimal overhead — 2-byte header, ideal for constrained devices (nRF52840 with 256KB RAM)
- QoS levels — Guaranteed delivery even with intermittent connectivity
- Topic wildcards — `aurora/sensors/+/#` captures all device telemetry (see the subscriber sketch after this list)
- Industry standard — Every IoT platform speaks MQTT
- TLS support — Encrypted device-to-broker communication
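A subscriber sketch using paho-mqtt 2.x; the broker hostname is a placeholder. Note the multi-level `#` wildcard and QoS 1 for at-least-once delivery over intermittent links:

```python
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    print(msg.topic, msg.payload)  # placeholder for the ingest bridge

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.tls_set()  # TLS using the system CA bundle
client.on_message = on_message
client.connect("mqtt.example.internal", 8883)  # hypothetical broker
client.subscribe("aurora/sensors/+/#", qos=1)  # all device telemetry
client.loop_forever()
```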
Event Processing Guarantees
| Guarantee | Implementation |
|---|---|
| At-least-once delivery | Consumer group ack + re-delivery on timeout |
| Ordering | Per-stream ordering via Redis Streams |
| Deduplication | SHA-256 hash-based dedup in scheduler (5-min window; sketched below) |
| Backpressure | Consumer groups naturally provide backpressure |
| Dead letter | Tasks that fail 3 retries are moved to `aurora:agent:dead_letter` |
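A sketch of the SHA-256 dedup guarantee from the table above, assuming the scheduler keeps digests in Redis; the key prefix and serialization are illustrative:

```python
import hashlib
import json
import redis

r = redis.Redis(decode_responses=True)
DEDUP_TTL = 300  # the 5-minute window, in seconds

def is_duplicate(event: dict) -> bool:
    digest = hashlib.sha256(
        json.dumps(event, sort_keys=True).encode()
    ).hexdigest()
    # SET NX returns None when the key already exists, i.e. the same
    # event was seen within the last 5 minutes.
    return r.set(f"aurora:dedup:{digest}", "1", nx=True, ex=DEDUP_TTL) is None
```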
AuroraSOC follows the "smart endpoints, dumb pipes" philosophy. The messaging systems are simple transport — all business logic lives in the consuming services (API, agents, scheduler).
Event Flow Example: Alert Processing
Here's the complete journey of a security event through the default (Python) path:
1. A sensor or endpoint publishes an event over MQTT (TLS-encrypted, QoS-backed).
2. The MQTT consumer bridge normalizes and validates the event, then publishes it into Redis Streams.
3. The scheduler deduplicates the event (SHA-256, 5-minute window) and queues a task on `aurora:agent:tasks`.
4. The AgentTaskWorker (`aurora-agents` group) executes the task and publishes the outcome to `aurora:agent:results`.
5. The FastAPI result correlator resolves the waiting future; resulting alerts land on `aurora:alerts` and fan out to clients via the WebSocket alert broadcaster.
When `--profile rust-core` is enabled, the Rust service can perform the ingest normalization step before publishing into the same Redis streams shown above.