Project Overview — What Is AuroraSOC?
Welcome. This page explains what AuroraSOC is, what problem it solves, who it's built for, and how every piece of the system fits together. No prior knowledge is assumed — every term is defined the first time it appears.
The Problem: Security Operations Centers Are Overwhelmed
A Security Operations Center (SOC) is a team of people (called security analysts) who monitor an organization's computers, networks, and devices for signs of cyberattacks. Think of it as the "security control room" for a company's entire digital infrastructure.
Modern SOCs face crippling challenges:
| Problem | What It Means |
|---|---|
| Alert fatigue | Security tools generate thousands of alerts per day. Most are false alarms, but analysts must check each one. They burn out and start missing real threats. |
| Slow response | Investigating a single alert can take hours of manual work — searching logs, correlating data, checking threat databases. Attackers move faster than humans can respond. |
| IoT/OT blind spots | Organizations increasingly use IoT (Internet of Things — smart devices like sensors, cameras, industrial controllers) and OT (Operational Technology — factory machines, power grid controllers). Traditional SOCs have almost no visibility into these devices. |
| Knowledge silos | When an experienced analyst leaves, their expertise leaves with them. Investigative knowledge isn't systematically preserved. |
| Scaling costs | The only way to handle more alerts in a traditional SOC is to hire more analysts. That doesn't scale economically. |
The Solution: AuroraSOC
AuroraSOC is an open-source, AI-powered Security Operations Center. Instead of relying solely on human analysts, it deploys a team of 14 specialized AI agents — each an expert in a specific security domain — coordinated by a master Orchestrator agent.
When a security alert arrives, the Orchestrator reads it, decides which specialists are needed (for example: "This looks like a malware infection on an endpoint, so I need the Malware Analyst, Endpoint Security, and Incident Responder"), dispatches tasks to them in parallel, collects their findings, synthesizes them into a report, and recommends response actions — all in minutes instead of hours.
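The fan-out/fan-in flow above can be sketched in a few lines of Python. This is an illustrative stand-in, not AuroraSOC's actual code: the routing table, agent names, and the `consult` helper are all hypothetical, and a real dispatch would be an A2A network call rather than a local coroutine.

```python
import asyncio

# Hypothetical routing table: alert category -> specialists to involve.
ROUTING = {
    "malware": ["Malware Analyst", "Endpoint Security", "Incident Responder"],
    "phishing": ["Email Security", "Threat Intel"],
}

async def consult(agent: str, alert: dict) -> str:
    """Stand-in for an A2A round-trip to one specialist agent."""
    await asyncio.sleep(0)  # the real version awaits a network call
    return f"{agent}: analyzed {alert['id']}"

async def orchestrate(alert: dict) -> dict:
    specialists = ROUTING.get(alert["category"], ["Incident Responder"])
    # Dispatch to all needed specialists in parallel, then gather findings.
    findings = await asyncio.gather(*(consult(a, alert) for a in specialists))
    return {
        "alert": alert["id"],
        "findings": list(findings),
        "recommendation": "isolate endpoint (pending human approval)",
    }

report = asyncio.run(orchestrate({"id": "ALRT-1", "category": "malware"}))
```

Because the specialists run concurrently, total investigation time is bounded by the slowest specialist, not the sum of all of them.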
What "Agentic AI" Means
An AI agent is a program powered by a Large Language Model (LLM) — the same kind of technology behind ChatGPT — but enhanced with the ability to use tools and take actions. Unlike a chatbot that only generates text, an agent can:
- Search your SIEM logs (SIEM = Security Information and Event Management — the central log database)
- Isolate a compromised computer from the network
- Run a malware scanner on a suspicious file
- Create an incident case for human review
- Query external threat intelligence databases (databases that track known attacker IPs, malware signatures, etc.)
Each AuroraSOC agent is a specialized AI agent that knows its security domain deeply (because we fine-tune the underlying LLM on domain-specific data) and has access only to the tools relevant to its job. This scoping is a security measure: the Malware Analyst, for example, cannot accidentally isolate an endpoint.
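That per-agent tool scoping can be sketched as an allow-list check in front of a tool registry. The tool names and agent names below are illustrative, not AuroraSOC's real identifiers:

```python
# Hypothetical tool registry. In AuroraSOC these would live on MCP servers.
TOOLBOX = {
    "search_logs": lambda q: f"hits for {q!r}",
    "scan_file": lambda path: f"scanned {path}: clean",
    "isolate_endpoint": lambda host: f"{host} isolated",
}

# Each agent only sees the tools registered for its domain.
AGENT_TOOLS = {
    "Malware Analyst": {"search_logs", "scan_file"},
    "Endpoint Security": {"search_logs", "isolate_endpoint"},
}

def call_tool(agent: str, tool: str, *args):
    if tool not in AGENT_TOOLS.get(agent, set()):
        raise PermissionError(f"{agent} is not authorized to use {tool}")
    return TOOLBOX[tool](*args)

result = call_tool("Malware Analyst", "scan_file", "/tmp/sample.bin")

# A Malware Analyst cannot isolate an endpoint, even by mistake.
try:
    call_tool("Malware Analyst", "isolate_endpoint", "ws-042")
    blocked = False
except PermissionError:
    blocked = True
```

The same least-privilege idea applies whether tools are local functions (as here) or remote MCP servers.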
What "Decentralized" Means
Each agent runs as its own independent service (its own process, on its own port). They communicate with each other using the A2A (Agent-to-Agent) protocol. This means:
- Agents can be scaled independently — if you need more malware analysis capacity, just run more Malware Analyst replicas
- If one agent crashes, the others keep running (fault isolation)
- Agents can run on different machines in a network
Who Is AuroraSOC For?
| Audience | How They Use AuroraSOC |
|---|---|
| SOC teams | Augment human analysts — AI handles the routine triage, humans focus on complex incidents |
| Security researchers | Study multi-agent AI applied to cybersecurity, experiment with custom agents and tools |
| IoT/OT operators | Get visibility into physical-cyber threats through the CPS Security agent and hardware attestation |
| Organizations without a SOC | Get SOC-like capabilities without hiring a full security team |
| Students and contributors | Learn how production AI agent systems, security operations, and event-driven architectures work |
How It All Fits Together — High-Level Architecture
Here is how every major piece of AuroraSOC connects. Each component is explained in the table below.
What Each Piece Is
| Component | What It Is | Plain-English Explanation |
|---|---|---|
| Edge Devices | Physical hardware (microcontrollers) running custom firmware | These are tiny computers embedded in doors, sensors, industrial machines, etc. They report their health and security status. |
| MQTT Broker | A message bus for IoT devices | A lightweight "post office" that IoT devices use to send and receive messages. Uses mTLS (mutual TLS — both sides verify each other's identity with certificates) for security. |
| Rust Core Engine | A high-performance data processing service written in Rust | Takes raw security alerts from many different formats and normalizes them into a single consistent format. Also verifies that IoT device firmware hasn't been tampered with. Rust was chosen for raw processing speed. |
| Redis Streams | An internal event bus | A fast message queue that delivers events (alerts, tasks, results) between system components. Think of it as an internal postal system. |
| NATS JetStream | A distributed event bus | Like Redis Streams, but designed to share data across multiple sites/locations. If your organization has SOCs in New York and London, NATS syncs threat data between them. |
| AI Agents | Specialized AI programs | 14 specialist programs, each powered by an LLM fine-tuned for a specific security domain. They analyze data, make decisions, and take actions using tools. |
| Orchestrator | The master coordinator agent | Reads incoming alerts, decides which specialists to involve, dispatches tasks, collects results, and generates the final report. |
| MCP Tool Servers | Tool provider services | Each domain's tools run as separate services. Agents connect to them using the MCP (Model Context Protocol) standard to call tools like "SearchLogs" or "IsolateEndpoint". |
| PostgreSQL | The relational database | Stores structured data: alerts, investigation cases, device records, playbooks, audit logs. |
| Qdrant | A vector database | Stores embeddings (numerical representations of text) so agents can semantically search past investigations — "find cases similar to this alert". |
| Redis | A cache and rate limiter | Stores frequently accessed data (like threat intelligence lookups) in memory for fast retrieval. Also enforces API rate limits. |
| FastAPI Backend | The REST API server | The backend application that the dashboard and external tools talk to. Built with FastAPI (a Python web framework). Exposes REST endpoints and WebSocket connections. |
| Next.js Dashboard | The browser-based UI | A web application where human analysts see alerts, track cases, monitor agents, and approve or reject AI recommendations. |
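To make the Rust Core Engine's normalization step concrete, here is a Python sketch of the idea (the real engine is Rust, and the unified field names below are assumptions, not AuroraSOC's actual schema). Vendors disagree on field names and casing; normalization maps them all onto one shape:

```python
def normalize(raw: dict) -> dict:
    """Map vendor-specific alert fields onto one consistent shape.

    Illustrative only: real normalization handles many more formats,
    and the target schema here is hypothetical.
    """
    return {
        "source": raw.get("vendor", "unknown"),
        "severity": str(raw.get("severity") or raw.get("priority") or "info").lower(),
        "host": raw.get("hostname") or raw.get("src_host") or raw.get("device_id"),
        "message": raw.get("msg") or raw.get("description", ""),
    }

alert = normalize({
    "vendor": "acme-edr",          # hypothetical EDR product
    "priority": "HIGH",            # some vendors say "priority", not "severity"
    "src_host": "ws-042",
    "description": "suspicious process tree",
})
```

Downstream components (agents, the dashboard, the databases) then only ever see the one normalized shape.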
Key Design Decisions
These are the deliberate architectural choices that shape AuroraSOC. Understanding them helps you understand why the code is structured the way it is.
1. BeeAI Framework, Not LangChain/LangGraph
AuroraSOC uses the IBM BeeAI Agent Framework (documentation) — not LangChain or LangGraph, which are more commonly seen in LLM projects. Why?
- ACP (Agent Communication Protocol): BeeAI provides a standardized protocol for agents to communicate, enabling the decentralized multi-agent architecture
- MCP (Model Context Protocol): A standard for connecting agents to tools, ensuring each agent only accesses authorized tools
- RequirementAgent: A BeeAI agent class that enforces rules like "always use ThinkTool first" — giving us structured, auditable reasoning
- Production-grade: Signal handling, graceful shutdown, health checks built in
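The "always use ThinkTool first" rule can be illustrated with a plain-Python stand-in. To be clear, this is not the BeeAI RequirementAgent API; it only shows the shape of the guarantee: a rule checked before every tool call, plus an auditable trace.

```python
class RuleViolation(Exception):
    """Raised when an agent breaks a declared requirement."""

class AuditedAgent:
    """Conceptual stand-in for a rule-enforcing agent (not BeeAI's API)."""

    def __init__(self):
        self.trace = []  # auditable record of every tool call, in order

    def use_tool(self, name: str, payload: str) -> None:
        # Requirement: the very first tool call must be ThinkTool.
        if not self.trace and name != "ThinkTool":
            raise RuleViolation("first tool call must be ThinkTool")
        self.trace.append((name, payload))

agent = AuditedAgent()
agent.use_tool("ThinkTool", "plan: check hash reputation, then search logs")
agent.use_tool("SearchLogs", "hash=abc123")

# Skipping the ThinkTool step is rejected outright.
try:
    AuditedAgent().use_tool("SearchLogs", "hash=abc123")
    enforced = False
except RuleViolation:
    enforced = True
```

The trace is what makes the reasoning auditable: a human can replay exactly which tools an agent used and in what order.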
2. IBM Granite Models, Not GPT/Claude
The LLM powering each agent is IBM Granite 4 (documentation) — an open-source model from IBM. Why?
- Open source: You can run it locally without any API keys or cloud dependencies. Full data sovereignty.
- Fine-tunable: We fine-tune Granite specifically for security operations using domain-specific datasets (MITRE ATT&CK, Sigma rules, CVE data)
- Efficient: Can run on consumer GPUs with 4-bit quantization via Unsloth
- Per-agent specialization: Each agent can have its own fine-tuned model variant, making the Malware Analyst better at malware analysis than a generic model would be
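Some back-of-envelope arithmetic shows why 4-bit quantization matters for consumer GPUs. The 3B parameter count below is a hypothetical example, not a specific Granite model size:

```python
def weight_memory_gb(params: float, bits: int) -> float:
    """Memory needed just for model weights: params * bits / 8 bytes."""
    return params * bits / 8 / 1e9

params = 3e9                            # hypothetical 3B-parameter model
fp16_gb = weight_memory_gb(params, 16)  # 16-bit weights: 6.0 GB
int4_gb = weight_memory_gb(params, 4)   # 4-bit weights:  1.5 GB
```

A 4x reduction in weight memory is what moves a model from "needs a datacenter GPU" to "fits alongside activations on a consumer card" (actual headroom also depends on context length and KV-cache size).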
3. Redis Streams + NATS, Not Kafka
Many event-driven systems use Apache Kafka. AuroraSOC uses Redis Streams (for internal events) and NATS JetStream (for cross-site distribution). Why?
- Redis Streams: Already using Redis for cache, so no additional infrastructure. Sub-millisecond latency for co-located services. Consumer groups for load balancing.
- NATS JetStream: Lightweight, persistent (survives restarts), exactly-once delivery. Perfect for federating data across multiple SOC sites.
- Together: Redis handles the fast internal pipeline; NATS handles the durable, distributed pipeline. Two tools, each used where it excels.
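The consumer-group load balancing mentioned above means each event in a stream is handed to exactly one consumer within a group, so replicas share the work. Here is an in-memory stand-in for that delivery semantic; it is not the redis-py API, and real consumer groups hand events to whichever consumer reads next rather than strict round-robin:

```python
from collections import defaultdict
from itertools import cycle

class MiniStream:
    """Toy model of consumer-group delivery: one consumer per event.

    Uses round-robin for determinism; real Redis Streams (XREADGROUP)
    deliver to whichever group member asks next.
    """

    def __init__(self, consumers):
        self.rr = cycle(consumers)
        self.delivered = defaultdict(list)

    def publish(self, event: str) -> None:
        # Each event goes to exactly one consumer in the group.
        self.delivered[next(self.rr)].append(event)

stream = MiniStream(["analyst-worker-1", "analyst-worker-2"])
for i in range(4):
    stream.publish(f"alert-{i}")
```

Scaling a hot agent then amounts to adding another consumer to the group: throughput goes up and no event is processed twice.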
4. Human-in-the-Loop by Default
AuroraSOC never takes high-risk actions autonomously. If a playbook wants to isolate a production server or revoke a certificate, it creates a Human Approval request (with a 4-hour expiry) and waits. A human analyst must approve or deny before the action executes. The AI assists — humans decide.
5. Hardware-Rooted Trust for IoT/CPS
The CPS (Cyber-Physical Systems) Security agent doesn't just monitor network traffic. It verifies the firmware integrity of physical devices using hardware attestation — cryptographic certificates generated by the device hardware itself, verified using ECDSA P-256 signatures. If a device's firmware has been tampered with, AuroraSOC detects it at the hardware level.
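The signature check at the heart of that attestation can be sketched with the `cryptography` package (an assumption here, not necessarily what AuroraSOC uses). A real attestation flow also involves a device certificate chain and a fresh nonce to prevent replay; this sketch shows only the ECDSA P-256 verify step:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.exceptions import InvalidSignature

# The private key lives inside the device hardware; here we generate one
# locally just to demonstrate the round trip.
device_key = ec.generate_private_key(ec.SECP256R1())  # P-256 curve
firmware = b"firmware-image-v1.2.3"

# Device signs its firmware measurement.
signature = device_key.sign(firmware, ec.ECDSA(hashes.SHA256()))

# AuroraSOC verifies with the public key recorded at device enrollment.
public_key = device_key.public_key()
try:
    public_key.verify(signature, firmware, ec.ECDSA(hashes.SHA256()))
    intact = True
except InvalidSignature:
    intact = False

# The same signature does NOT verify against tampered firmware.
try:
    public_key.verify(signature, b"tampered-image", ec.ECDSA(hashes.SHA256()))
    tamper_detected = False
except InvalidSignature:
    tamper_detected = True
```

Because the key never leaves the device hardware, a compromised firmware image cannot forge a valid signature for itself.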
Summary
AuroraSOC is a multi-agent AI system that automates security operations. It uses 14 specialized AI agents powered by IBM Granite LLMs, coordinated by an orchestrator, connected to security tools via MCP, streaming events through Redis and NATS, with a Next.js dashboard for human operators. It uniquely integrates hardware security for IoT/CPS devices and enforces human approval for all high-risk actions.
Next: Technology Stack → — Deep dive into every technology used and why.