Project Overview — What Is AuroraSOC?

Welcome. This page explains what AuroraSOC is, what problem it solves, who it's built for, and how every piece of the system fits together. No prior knowledge is assumed — every term is defined the first time it appears.


The Problem: Security Operations Centers Are Overwhelmed

A Security Operations Center (SOC) is a team of people (called security analysts) who monitor an organization's computers, networks, and devices for signs of cyberattacks. Think of it as the "security control room" for a company's entire digital infrastructure.

Modern SOCs face crippling challenges:

  • Alert fatigue — Security tools generate thousands of alerts per day. Most are false alarms, but analysts must check each one. They burn out and start missing real threats.
  • Slow response — Investigating a single alert can take hours of manual work: searching logs, correlating data, checking threat databases. Attackers move faster than humans can respond.
  • IoT/OT blind spots — Organizations increasingly use IoT (Internet of Things — smart devices like sensors, cameras, industrial controllers) and OT (Operational Technology — factory machines, power grid controllers). Traditional SOCs have almost no visibility into these devices.
  • Knowledge silos — When an experienced analyst leaves, their expertise leaves with them. Investigative knowledge isn't systematically preserved.
  • Scaling costs — The only way to handle more alerts in a traditional SOC is to hire more analysts. That doesn't scale economically.

The Solution: AuroraSOC

AuroraSOC is an open-source, AI-powered Security Operations Center. Instead of relying solely on human analysts, it deploys a team of 14 specialized AI agents — each an expert in a specific security domain — coordinated by a master Orchestrator agent.

When a security alert arrives, the Orchestrator reads it, decides which specialists are needed (for example: "This looks like a malware infection on an endpoint, so I need the Malware Analyst, Endpoint Security, and Incident Responder"), dispatches tasks to them in parallel, collects their findings, synthesizes them into a report, and recommends response actions — all in minutes instead of hours.
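The fan-out step described above can be sketched in a few lines of Python. This is an illustrative model only — the coroutine bodies stand in for real A2A requests, and `dispatch`/`orchestrate` are hypothetical names, not AuroraSOC's actual API:

```python
import asyncio

async def dispatch(agent: str, alert: dict) -> dict:
    # Stand-in for an A2A request to the agent's own service.
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"agent": agent, "finding": f"{agent} analyzed alert {alert['id']}"}

async def orchestrate(alert: dict) -> list[dict]:
    # 1. Decide which specialists the alert needs (hard-coded here for the
    #    malware-on-endpoint scenario described above).
    specialists = ["Malware Analyst", "Endpoint Security", "Incident Responder"]
    # 2. Dispatch tasks to all of them in parallel and collect their findings.
    findings = await asyncio.gather(*(dispatch(a, alert) for a in specialists))
    return list(findings)

findings = asyncio.run(orchestrate({"id": "ALR-1024"}))
print([f["agent"] for f in findings])
```

The key point is the `asyncio.gather` call: the specialists work concurrently rather than one after another, which is what turns hours of sequential investigation into minutes.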

What "Agentic AI" Means

An AI agent is a program powered by a Large Language Model (LLM) — the same kind of technology behind ChatGPT — but enhanced with the ability to use tools and take actions. Unlike a chatbot that only generates text, an agent can:

  • Search your SIEM logs (SIEM = Security Information and Event Management — the central log database)
  • Isolate a compromised computer from the network
  • Run a malware scanner on a suspicious file
  • Create an incident case for human review
  • Query external threat intelligence databases (databases that track known attacker IPs, malware signatures, etc.)
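The tool-use pattern behind that list can be sketched as follows. The registry and tool names here are illustrative stand-ins, not AuroraSOC's real tool API — the point is that the LLM's output is parsed into a call against a fixed set of registered functions:

```python
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so the agent may call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_siem_logs(query: str) -> str:
    # Stand-in for a real SIEM query.
    return f"3 log entries matched '{query}'"

@tool
def isolate_endpoint(host: str) -> str:
    # Stand-in for an EDR isolation action.
    return f"{host} isolated from the network"

def run_tool(call: dict) -> str:
    """Execute a tool call the LLM proposed, e.g. {'name': ..., 'args': {...}}."""
    return TOOLS[call["name"]](**call["args"])

print(run_tool({"name": "search_siem_logs",
                "args": {"query": "failed login 10.0.0.5"}}))
```

Restricting which functions are in an agent's registry is also how per-agent tool scoping works: a Malware Analyst whose registry lacks `isolate_endpoint` simply cannot call it.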

Each AuroraSOC agent is a specialized AI agent that knows its security domain deeply (because we fine-tune the underlying LLM on domain-specific data) and, as a security measure, has access only to the tools relevant to its job — a Malware Analyst, for example, can't accidentally isolate an endpoint.

What "Decentralized" Means

Each agent runs as its own independent service (its own process, on its own port). They communicate with each other using the A2A (Agent-to-Agent) protocol. This means:

  • Agents can be scaled independently — if you need more malware analysis capacity, just run more Malware Analyst replicas
  • If one agent crashes, the others keep running (fault isolation)
  • Agents can run on different machines in a network
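The independent-scaling point can be made concrete with a tiny registry sketch. Agent names and addresses below are made up for illustration — the idea is that each agent maps to its own pool of replicas, and adding capacity means adding entries to one pool:

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class AgentPool:
    replicas: list[str]
    _rr: itertools.cycle = field(init=False)

    def __post_init__(self):
        # Round-robin iterator spreads work evenly across replicas.
        self._rr = itertools.cycle(self.replicas)

    def next_replica(self) -> str:
        return next(self._rr)

registry = {
    # Need more malware analysis capacity? Add replicas to this one pool;
    # no other agent is affected.
    "malware-analyst": AgentPool(["10.0.1.5:9101", "10.0.1.6:9101", "10.0.1.7:9101"]),
    "endpoint-security": AgentPool(["10.0.1.8:9102"]),
}

targets = [registry["malware-analyst"].next_replica() for _ in range(4)]
print(targets)  # the fourth pick wraps back to the first replica
```

Fault isolation falls out of the same structure: a crashed replica only shrinks its own pool, and every other agent keeps serving.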

Who Is AuroraSOC For?

  • SOC teams — Augment human analysts: AI handles the routine triage, humans focus on complex incidents
  • Security researchers — Study multi-agent AI applied to cybersecurity, experiment with custom agents and tools
  • IoT/OT operators — Get visibility into physical-cyber threats through the CPS Security agent and hardware attestation
  • Organizations without a SOC — Get SOC-like capabilities without hiring a full security team
  • Students and contributors — Learn how production AI agent systems, security operations, and event-driven architectures work

How It All Fits Together — High-Level Architecture

Here is how every major piece of AuroraSOC connects. Each term is explained below.

What Each Piece Is

  • Edge Devices — Physical hardware (microcontrollers) running custom firmware. These are tiny computers embedded in doors, sensors, industrial machines, etc. They report their health and security status.
  • MQTT Broker — A message bus for IoT devices. A lightweight "post office" that IoT devices use to send and receive messages. Uses mTLS (mutual TLS — both sides verify each other's identity with certificates) for security.
  • Rust Core Engine — A high-performance data processing service, written in Rust for speed. It takes raw security alerts in many different formats and normalizes them into a single consistent format, and also verifies that IoT device firmware hasn't been tampered with.
  • Redis Streams — An internal event bus. A fast message queue that delivers events (alerts, tasks, results) between system components. Think of it as an internal postal system.
  • NATS JetStream — A distributed event bus. Like Redis Streams, but designed to share data across multiple sites/locations. If your organization has SOCs in New York and London, NATS syncs threat data between them.
  • AI Agents — Specialized AI programs, each powered by an LLM fine-tuned for a specific security domain. They analyze data, make decisions, and take actions using tools.
  • Orchestrator — The master coordinator agent. Reads incoming alerts, decides which specialists to involve, dispatches tasks, collects results, and generates the final report.
  • MCP Tool Servers — Tool provider services. Each domain's tools run as separate services; agents connect to them using the MCP (Model Context Protocol) standard to call tools like "SearchLogs" or "IsolateEndpoint".
  • PostgreSQL — The relational database. Stores structured data: alerts, investigation cases, device records, playbooks, audit logs.
  • Qdrant — A vector database. Stores embeddings (numerical representations of text) so agents can semantically search past investigations — "find cases similar to this alert".
  • Redis — A cache and rate limiter. Stores frequently accessed data (like threat intelligence lookups) in memory for fast retrieval, and also enforces API rate limits.
  • FastAPI Backend — The REST API server. The backend application that the dashboard and external tools talk to, built with FastAPI (a Python web framework). Exposes REST endpoints and WebSocket connections.
  • Next.js Dashboard — The browser-based UI. A web application where human analysts see alerts, track cases, monitor agents, and approve or reject AI recommendations.
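The Rust Core Engine's normalization step is the easiest piece to see in miniature. The sketch below is in Python rather than Rust for brevity, and both the input shapes and the unified field names are illustrative assumptions, not AuroraSOC's real schema:

```python
def normalize(raw: dict, source: str) -> dict:
    """Map vendor-specific alert fields onto one consistent format."""
    if source == "suricata":  # network IDS alert (assumed shape)
        return {"source": source, "severity": raw["alert"]["severity"],
                "summary": raw["alert"]["signature"], "host": raw["src_ip"]}
    if source == "wazuh":     # host agent alert (assumed shape)
        return {"source": source, "severity": raw["rule"]["level"],
                "summary": raw["rule"]["description"], "host": raw["agent"]["name"]}
    raise ValueError(f"unknown alert source: {source}")

alert = normalize({"alert": {"severity": 2, "signature": "ET SCAN Nmap"},
                   "src_ip": "10.0.0.9"}, "suricata")
print(alert["summary"])  # → ET SCAN Nmap
```

Once every alert arrives in one shape, everything downstream — the Orchestrator, the agents, the dashboard — only has to understand a single format.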

Key Design Decisions

These are the deliberate architectural choices that shape AuroraSOC. Understanding them helps you understand why the code is structured the way it is.

1. BeeAI Framework, Not LangChain/LangGraph

AuroraSOC uses the IBM BeeAI Agent Framework (documentation) — not LangChain or LangGraph, which are more commonly seen in LLM projects. Why?

  • ACP (Agent Communication Protocol): BeeAI provides a standardized protocol for agents to communicate, enabling the decentralized multi-agent architecture
  • MCP (Model Context Protocol): A standard for connecting agents to tools, ensuring each agent only accesses authorized tools
  • RequirementAgent: A BeeAI agent class that enforces rules like "always use ThinkTool first" — giving us structured, auditable reasoning
  • Production-grade: Signal handling, graceful shutdown, health checks built in
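The "always use ThinkTool first" idea can be shown with a generic check. This is not BeeAI's actual RequirementAgent API — just the enforcement concept: inspect an agent's recorded tool calls and reject any trace that skipped the reasoning step:

```python
def satisfies_think_first(tool_calls: list[str]) -> bool:
    """A trace is valid only if the first tool invoked is ThinkTool."""
    return bool(tool_calls) and tool_calls[0] == "ThinkTool"

print(satisfies_think_first(["ThinkTool", "SearchLogs", "FinalAnswer"]))  # True
print(satisfies_think_first(["SearchLogs", "FinalAnswer"]))               # False
```

Rules like this make an agent's reasoning auditable: a reviewer can check the trace mechanically rather than trusting the model.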

2. IBM Granite Models, Not GPT/Claude

The LLM powering each agent is IBM Granite 4 (documentation) — an open-source model from IBM. Why?

  • Open source: You can run it locally without any API keys or cloud dependencies. Full data sovereignty.
  • Fine-tunable: We fine-tune Granite specifically for security operations using domain-specific datasets (MITRE ATT&CK, Sigma rules, CVE data)
  • Efficient: Can run on consumer GPUs with 4-bit quantization via Unsloth
  • Per-agent specialization: Each agent can have its own fine-tuned model variant, making the Malware Analyst better at malware analysis than a generic model would be

3. Redis Streams + NATS, Not Kafka

Many event-driven systems use Apache Kafka. AuroraSOC uses Redis Streams (for internal events) and NATS JetStream (for cross-site distribution). Why?

  • Redis Streams: Already using Redis for cache, so no additional infrastructure. Sub-millisecond latency for co-located services. Consumer groups for load balancing.
  • NATS JetStream: Lightweight, persistent (survives restarts), exactly-once delivery. Perfect for federating data across multiple SOC sites.
  • Together: Redis handles the fast internal pipeline; NATS handles the durable, distributed pipeline. Two tools, each used where it excels.
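The two-bus split can be summarized as a routing rule. Stream and subject names below are made up for the example — the point is only that an event's scope decides which bus carries it:

```python
def route(event: dict) -> tuple[str, str]:
    """Return (bus, channel) for an event based on its scope."""
    if event.get("scope") == "federated":
        # Durable, exactly-once delivery across SOC sites.
        return ("nats", f"soc.threatintel.{event['type']}")
    # Default: fast, sub-millisecond path between co-located services.
    return ("redis", f"stream:internal:{event['type']}")

print(route({"type": "alert", "scope": "internal"}))
print(route({"type": "ioc", "scope": "federated"}))
```

A local alert never leaves the fast internal pipeline; a shared indicator of compromise rides the durable distributed one.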

4. Human-in-the-Loop by Default

AuroraSOC never takes high-risk actions autonomously. If a playbook wants to isolate a production server or revoke a certificate, it creates a Human Approval request (with a 4-hour expiry) and waits. A human analyst must approve or deny before the action executes. The AI assists — humans decide.
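The approval gate reduces to two checks: a human approved the action, and the 4-hour window hasn't lapsed. The field names in this sketch are illustrative, not AuroraSOC's real schema:

```python
from datetime import datetime, timedelta, timezone

APPROVAL_TTL = timedelta(hours=4)

def new_approval_request(action: str, now: datetime) -> dict:
    return {"action": action, "status": "pending", "expires_at": now + APPROVAL_TTL}

def may_execute(request: dict, now: datetime) -> bool:
    """The action runs only if a human approved it before the expiry."""
    return request["status"] == "approved" and now < request["expires_at"]

t0 = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
req = new_approval_request("isolate prod-db-01", t0)
print(may_execute(req, t0 + timedelta(minutes=30)))  # False — still pending
req["status"] = "approved"
print(may_execute(req, t0 + timedelta(hours=5)))     # False — approval expired
print(may_execute(req, t0 + timedelta(hours=1)))     # True
```

An expired request fails closed: the playbook must ask again rather than act on stale consent.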

5. Hardware-Rooted Trust for IoT/CPS

The CPS (Cyber-Physical Systems) Security agent doesn't just monitor network traffic. It verifies the firmware integrity of physical devices using hardware attestation — cryptographic certificates generated by the device hardware itself, verified using ECDSA P-256 signatures. If a device's firmware has been tampered with, AuroraSOC detects it at the hardware level.
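The verification side of attestation can be sketched with the third-party `cryptography` package. This is a simplified stand-in: here one process plays both roles, whereas on real hardware the private key never leaves the device — only the signature and public key reach the SOC:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def attestation_ok(pubkey, firmware: bytes, signature: bytes) -> bool:
    """Verify an ECDSA P-256 signature over the firmware image."""
    try:
        pubkey.verify(signature, firmware, ec.ECDSA(hashes.SHA256()))
        return True
    except InvalidSignature:
        return False

# Simulate a device: a P-256 key signing its firmware image.
device_key = ec.generate_private_key(ec.SECP256R1())  # SECP256R1 == P-256
firmware = b"aurora-edge-fw-1.4.2"
sig = device_key.sign(firmware, ec.ECDSA(hashes.SHA256()))

print(attestation_ok(device_key.public_key(), firmware, sig))         # True
print(attestation_ok(device_key.public_key(), firmware + b"!", sig))  # False — tampered
```

Because the signature covers the firmware bytes themselves, any tampering changes the hash and the verification fails, regardless of what the device claims over the network.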


Summary

AuroraSOC is a multi-agent AI system that automates security operations. It uses 14 specialized AI agents powered by IBM Granite LLMs, coordinated by an orchestrator, connected to security tools via MCP, streaming events through Redis and NATS, with a Next.js dashboard for human operators. It uniquely integrates hardware security for IoT/CPS devices and enforces human approval for all high-risk actions.

Next: Technology Stack → — Deep dive into every technology used and why.