Project Overview — What Is AuroraSOC?
Welcome. This page explains what AuroraSOC is, what problem it solves, who it's built for, and how every piece of the system fits together. No prior knowledge is assumed — every term is defined the first time it appears.
The Problem: Security Operations Centers Are Overwhelmed
A Security Operations Center (SOC) is a team of people (called security analysts) who monitor an organization's computers, networks, and devices for signs of cyberattacks. Think of it as the "security control room" for a company's entire digital infrastructure.
Modern SOCs face crippling challenges:
| Problem | What It Means |
|---|---|
| Alert fatigue | Security tools generate thousands of alerts per day. Most are false alarms, but analysts must check each one. They burn out and start missing real threats. |
| Slow response | Investigating a single alert can take hours of manual work — searching logs, correlating data, checking threat databases. Attackers move faster than humans can respond. |
| IoT/OT blind spots | Organizations increasingly use IoT (Internet of Things — smart devices like sensors, cameras, industrial controllers) and OT (Operational Technology — factory machines, power grid controllers). Traditional SOCs have almost no visibility into these devices. |
| Knowledge silos | When an experienced analyst leaves, their expertise leaves with them. Investigative knowledge isn't systematically preserved. |
| Scaling costs | The only way to handle more alerts in a traditional SOC is to hire more analysts. That doesn't scale economically. |
The Solution: AuroraSOC
AuroraSOC is an open-source, AI-powered Security Operations Center. Instead of relying solely on human analysts, it deploys a team of 14 specialized AI agents — each an expert in a specific security domain — coordinated by a master Orchestrator agent.
When a security alert arrives, the Orchestrator reads it, decides which specialists are needed (for example: "This looks like a malware infection on an endpoint, so I need the Malware Analyst, Endpoint Security, and Incident Responder"), dispatches tasks to them in parallel, collects their findings, synthesizes them into a report, and recommends response actions — all in minutes instead of hours.
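The fan-out/fan-in flow above can be sketched in a few lines of Python. This is an illustrative stand-in, not AuroraSOC's actual code: the routing table, agent names, and the `consult` helper are all hypothetical, and a real dispatch would be an A2A network call rather than a local coroutine.

```python
import asyncio

# Hypothetical routing table: alert category -> specialists to involve.
ROUTING = {
    "malware": ["Malware Analyst", "Endpoint Security", "Incident Responder"],
    "phishing": ["Email Security", "Threat Intel"],
}

async def consult(agent: str, alert: dict) -> str:
    """Stand-in for an A2A round-trip to one specialist agent."""
    await asyncio.sleep(0)  # the real version awaits a network call
    return f"{agent}: analyzed {alert['id']}"

async def orchestrate(alert: dict) -> dict:
    specialists = ROUTING.get(alert["category"], ["Incident Responder"])
    # Dispatch to all needed specialists in parallel, then gather findings.
    findings = await asyncio.gather(*(consult(a, alert) for a in specialists))
    return {
        "alert": alert["id"],
        "findings": list(findings),
        "recommendation": "isolate endpoint (pending human approval)",
    }

report = asyncio.run(orchestrate({"id": "ALRT-1", "category": "malware"}))
```

Because the specialists run concurrently, total investigation time is bounded by the slowest specialist, not the sum of all of them.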
What "Agentic AI" Means
An AI agent is a program powered by a Large Language Model (LLM) — the same kind of technology behind ChatGPT — but enhanced with the ability to use tools and take actions. Unlike a chatbot that only generates text, an agent can:
- Search your SIEM logs (SIEM = Security Information and Event Management — the central log database)
- Isolate a compromised computer from the network
- Run a malware scanner on a suspicious file
- Create an incident case for human review
- Query external threat intelligence databases (databases that track known attacker IPs, malware signatures, etc.)
Each AuroraSOC agent is a specialized AI agent that knows its security domain deeply (because we fine-tune the underlying LLM on domain-specific data) and has access only to the tools relevant to its job. This scoping is a security measure: the Malware Analyst, for example, cannot accidentally isolate an endpoint.
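That per-agent tool scoping can be sketched as an allow-list check in front of a tool registry. The tool names and agent names below are illustrative, not AuroraSOC's real identifiers:

```python
# Hypothetical tool registry. In AuroraSOC these would live on MCP servers.
TOOLBOX = {
    "search_logs": lambda q: f"hits for {q!r}",
    "scan_file": lambda path: f"scanned {path}: clean",
    "isolate_endpoint": lambda host: f"{host} isolated",
}

# Each agent only sees the tools registered for its domain.
AGENT_TOOLS = {
    "Malware Analyst": {"search_logs", "scan_file"},
    "Endpoint Security": {"search_logs", "isolate_endpoint"},
}

def call_tool(agent: str, tool: str, *args):
    if tool not in AGENT_TOOLS.get(agent, set()):
        raise PermissionError(f"{agent} is not authorized to use {tool}")
    return TOOLBOX[tool](*args)

result = call_tool("Malware Analyst", "scan_file", "/tmp/sample.bin")

# A Malware Analyst cannot isolate an endpoint, even by mistake.
try:
    call_tool("Malware Analyst", "isolate_endpoint", "ws-042")
    blocked = False
except PermissionError:
    blocked = True
```

The same least-privilege idea applies whether tools are local functions (as here) or remote MCP servers.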
What "Decentralized" Means
Each agent runs as its own independent service (its own process, on its own port). They communicate with each other using the A2A (Agent-to-Agent) protocol. This means:
- Agents can be scaled independently — if you need more malware analysis capacity, just run more Malware Analyst replicas
- If one agent crashes, the others keep running (fault isolation)
- Agents can run on different machines in a network
Who Is AuroraSOC For?
| Audience | How They Use AuroraSOC |
|---|---|
| SOC teams | Augment human analysts — AI handles the routine triage, humans focus on complex incidents |
| Security researchers | Study multi-agent AI applied to cybersecurity, experiment with custom agents and tools |
| IoT/OT operators | Get visibility into physical-cyber threats through the CPS Security agent and hardware attestation |
| Organizations without a SOC | Get SOC-like capabilities without hiring a full security team |
| Students and contributors | Learn how production AI agent systems, security operations, and event-driven architectures work |
How It All Fits Together — High-Level Architecture
Here is how every major piece of AuroraSOC connects. Each component is explained in the table below.
What Each Piece Is
| Component | What It Is | Plain-English Explanation |
|---|---|---|
| Edge Devices | Physical hardware (microcontrollers) running custom firmware | These are tiny computers embedded in doors, sensors, industrial machines, etc. They report their health and security status. |
| MQTT Broker | A message bus for IoT devices | A lightweight "post office" that IoT devices use to send and receive messages. Uses mTLS (mutual TLS — both sides verify each other's identity with certificates) for security. |
| Rust Core Engine | A high-performance data processing service written in Rust | Takes raw security alerts from many different formats and normalizes them into a single consistent format. Also verifies that IoT device firmware hasn't been tampered with. Rust was chosen for raw processing speed. |
| Redis Streams | An internal event bus | A fast message queue that delivers events (alerts, tasks, results) between system components. Think of it as an internal postal system. |
| NATS JetStream | A distributed event bus | Like Redis Streams, but designed to share data across multiple sites/locations. If your organization has SOCs in New York and London, NATS syncs threat data between them. |
| AI Agents | Specialized AI programs | 14 specialist programs, each powered by an LLM fine-tuned for a specific security domain. They analyze data, make decisions, and take actions using tools. |
| Orchestrator | The master coordinator agent | Reads incoming alerts, decides which specialists to involve, dispatches tasks, collects results, and generates the final report. |
| MCP Tool Servers | Tool provider services | Each domain's tools run as separate services. Agents connect to them using the MCP (Model Context Protocol) standard to call tools like "SearchLogs" or "IsolateEndpoint". |
| PostgreSQL | The relational database | Stores structured data: alerts, investigation cases, device records, playbooks, audit logs. |
| Qdrant | A vector database | Stores embeddings (numerical representations of text) so agents can semantically search past investigations — "find cases similar to this alert". |
| Redis | A cache and rate limiter | Stores frequently accessed data (like threat intelligence lookups) in memory for fast retrieval. Also enforces API rate limits. |
| FastAPI Backend | The REST API server | The backend application that the dashboard and external tools talk to. Built with FastAPI (a Python web framework). Exposes REST endpoints and WebSocket connections. |
| Next.js Dashboard | The browser-based UI | A web application where human analysts see alerts, track cases, monitor agents, and approve or reject AI recommendations. |
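To make the Rust Core Engine's normalization step concrete, here is a Python sketch of the idea (the real engine is Rust, and the unified field names below are assumptions, not AuroraSOC's actual schema). Vendors disagree on field names and casing; normalization maps them all onto one shape:

```python
def normalize(raw: dict) -> dict:
    """Map vendor-specific alert fields onto one consistent shape.

    Illustrative only: real normalization handles many more formats,
    and the target schema here is hypothetical.
    """
    return {
        "source": raw.get("vendor", "unknown"),
        "severity": str(raw.get("severity") or raw.get("priority") or "info").lower(),
        "host": raw.get("hostname") or raw.get("src_host") or raw.get("device_id"),
        "message": raw.get("msg") or raw.get("description", ""),
    }

alert = normalize({
    "vendor": "acme-edr",          # hypothetical EDR product
    "priority": "HIGH",            # some vendors say "priority", not "severity"
    "src_host": "ws-042",
    "description": "suspicious process tree",
})
```

Downstream components (agents, the dashboard, the databases) then only ever see the one normalized shape.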
Key Design Decisions
These are the deliberate architectural choices that shape AuroraSOC. Understanding them helps you understand why the code is structured the way it is.
1. BeeAI Framework, Not LangChain/LangGraph
AuroraSOC uses the IBM BeeAI Agent Framework (documentation) — not LangChain or LangGraph, which are more commonly seen in LLM projects. Why?
- ACP (Agent Communication Protocol): BeeAI provides a standardized protocol for agents to communicate, enabling the decentralized multi-agent architecture
- MCP (Model Context Protocol): A standard for connecting agents to tools, ensuring each agent only accesses authorized tools
- RequirementAgent: A BeeAI agent class that enforces rules like "always use ThinkTool first" — giving us structured, auditable reasoning
- Production-grade: Signal handling, graceful shutdown, health checks built in
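The "always use ThinkTool first" rule can be illustrated with a plain-Python stand-in. To be clear, this is not the BeeAI RequirementAgent API; it only shows the shape of the guarantee: a rule checked before every tool call, plus an auditable trace.

```python
class RuleViolation(Exception):
    """Raised when an agent breaks a declared requirement."""

class AuditedAgent:
    """Conceptual stand-in for a rule-enforcing agent (not BeeAI's API)."""

    def __init__(self):
        self.trace = []  # auditable record of every tool call, in order

    def use_tool(self, name: str, payload: str) -> None:
        # Requirement: the very first tool call must be ThinkTool.
        if not self.trace and name != "ThinkTool":
            raise RuleViolation("first tool call must be ThinkTool")
        self.trace.append((name, payload))

agent = AuditedAgent()
agent.use_tool("ThinkTool", "plan: check hash reputation, then search logs")
agent.use_tool("SearchLogs", "hash=abc123")

# Skipping the ThinkTool step is rejected outright.
try:
    AuditedAgent().use_tool("SearchLogs", "hash=abc123")
    enforced = False
except RuleViolation:
    enforced = True
```

The trace is what makes the reasoning auditable: a human can replay exactly which tools an agent used and in what order.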
2. IBM Granite Models, Not GPT/Claude
The LLM powering each agent is IBM Granite 4 (documentation) — an open-source model from IBM. Why?
- Open source: You can run it locally without any API keys or cloud dependencies. Full data sovereignty.
- Fine-tunable: We fine-tune Granite specifically for security operations using domain-specific datasets (MITRE ATT&CK, Sigma rules, CVE data)
- Efficient: Can run on consumer GPUs with 4-bit quantization via Unsloth
- Per-agent specialization: Each agent can have its own fine-tuned model variant, making the Malware Analyst better at malware analysis than a generic model would be
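Some back-of-envelope arithmetic shows why 4-bit quantization matters for consumer GPUs. The 3B parameter count below is a hypothetical example, not a specific Granite model size:

```python
def weight_memory_gb(params: float, bits: int) -> float:
    """Memory needed just for model weights: params * bits / 8 bytes."""
    return params * bits / 8 / 1e9

params = 3e9                            # hypothetical 3B-parameter model
fp16_gb = weight_memory_gb(params, 16)  # 16-bit weights: 6.0 GB
int4_gb = weight_memory_gb(params, 4)   # 4-bit weights:  1.5 GB
```

A 4x reduction in weight memory is what moves a model from "needs a datacenter GPU" to "fits alongside activations on a consumer card" (actual headroom also depends on context length and KV-cache size).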
3. Redis Streams + NATS, Not Kafka
Many event-driven systems use Apache Kafka. AuroraSOC uses Redis Streams (for internal events) and NATS JetStream (for cross-site distribution). Why?
- Redis Streams: Already using Redis for cache, so no additional infrastructure. Sub-millisecond latency for co-located services. Consumer groups for load balancing.
- NATS JetStream: Lightweight, persistent (survives restarts), exactly-once delivery. Perfect for federating data across multiple SOC sites.
- Together: Redis handles the fast internal pipeline; NATS handles the durable, distributed pipeline. Two tools, each used where it excels.
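The consumer-group load balancing mentioned above means each event in a stream is handed to exactly one consumer within a group, so replicas share the work. Here is an in-memory stand-in for that delivery semantic; it is not the redis-py API, and real consumer groups hand events to whichever consumer reads next rather than strict round-robin:

```python
from collections import defaultdict
from itertools import cycle

class MiniStream:
    """Toy model of consumer-group delivery: one consumer per event.

    Uses round-robin for determinism; real Redis Streams (XREADGROUP)
    deliver to whichever group member asks next.
    """

    def __init__(self, consumers):
        self.rr = cycle(consumers)
        self.delivered = defaultdict(list)

    def publish(self, event: str) -> None:
        # Each event goes to exactly one consumer in the group.
        self.delivered[next(self.rr)].append(event)

stream = MiniStream(["analyst-worker-1", "analyst-worker-2"])
for i in range(4):
    stream.publish(f"alert-{i}")
```

Scaling a hot agent then amounts to adding another consumer to the group: throughput goes up and no event is processed twice.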
4. Human-in-the-Loop by Default
AuroraSOC never takes high-risk actions autonomously. If a playbook wants to isolate a production server or revoke a certificate, it creates a Human Approval request (with a 4-hour expiry) and waits. A human analyst must approve or deny before the action executes. The AI assists — humans decide.
5. Hardware-Rooted Trust for IoT/CPS
The CPS (Cyber-Physical Systems) Security agent doesn't just monitor network traffic. It verifies the firmware integrity of physical devices using hardware attestation — cryptographic certificates generated by the device hardware itself, verified using ECDSA P-256 signatures. If a device's firmware has been tampered with, AuroraSOC detects it at the hardware level.
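The signature check at the heart of that attestation can be sketched with the `cryptography` package (an assumption here, not necessarily what AuroraSOC uses). A real attestation flow also involves a device certificate chain and a fresh nonce to prevent replay; this sketch shows only the ECDSA P-256 verify step:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.exceptions import InvalidSignature

# The private key lives inside the device hardware; here we generate one
# locally just to demonstrate the round trip.
device_key = ec.generate_private_key(ec.SECP256R1())  # P-256 curve
firmware = b"firmware-image-v1.2.3"

# Device signs its firmware measurement.
signature = device_key.sign(firmware, ec.ECDSA(hashes.SHA256()))

# AuroraSOC verifies with the public key recorded at device enrollment.
public_key = device_key.public_key()
try:
    public_key.verify(signature, firmware, ec.ECDSA(hashes.SHA256()))
    intact = True
except InvalidSignature:
    intact = False

# The same signature does NOT verify against tampered firmware.
try:
    public_key.verify(signature, b"tampered-image", ec.ECDSA(hashes.SHA256()))
    tamper_detected = False
except InvalidSignature:
    tamper_detected = True
```

Because the key never leaves the device hardware, a compromised firmware image cannot forge a valid signature for itself.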
Summary
AuroraSOC is a multi-agent AI system that automates security operations. It uses 14 specialized AI agents powered by IBM Granite LLMs, coordinated by an orchestrator, connected to security tools via MCP, streaming events through Redis and NATS, with a Next.js dashboard for human operators. It uniquely integrates hardware security for IoT/CPS devices and enforces human approval for all high-risk actions.
Next: Technology Stack → — Deep dive into every technology used and why.