LLM Integration Overview
AuroraSOC uses a Large Language Model (LLM) as the reasoning core for security investigations. An LLM is a system trained on massive amounts of text so that it learns the statistical patterns of language and can generate useful responses to new questions. It is the same class of technology behind systems like ChatGPT, but here it is deployed for Security Operations Center (SOC) workflows and hosted inside your own infrastructure.
Why AuroraSOC Runs Its Own LLMs Locally
AuroraSOC is designed for environments where security data is sensitive, response speed matters, and cost must stay predictable.
Privacy and Compliance
SOC investigations include internal telemetry, endpoint artifacts, incident timelines, and potentially regulated data. Running inference locally means this data does not leave your infrastructure boundary. For teams operating under SOC 2 (Service Organization Control 2), ISO 27001 (International Organization for Standardization information security standard), or HIPAA (Health Insurance Portability and Accountability Act), sending raw incident payloads to a public cloud API can become a compliance liability. Local inference keeps control, auditability, and data residency in your hands.
Latency Under Multi-Agent Load
AuroraSOC uses one orchestrator plus many specialist agents. During an active intrusion, multiple agents analyze evidence in parallel. If every request depends on an external API round-trip, each call adds network and provider latency. That delay compounds across an investigation chain, so what looks like "only a few seconds per call" can quickly become minutes when many calls are chained together.
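To make the compounding concrete, here is a back-of-the-envelope illustration. The per-call overhead and chain depth below are assumptions chosen for the example, not measured benchmarks:

```python
# Hypothetical illustration: how per-call latency compounds across a
# chained multi-agent investigation. Numbers are assumptions, not benchmarks.
per_call_overhead_s = 2.0   # assumed network + provider latency per external API call
chained_calls = 45          # assumed number of chained calls in one investigation

total_overhead_s = per_call_overhead_s * chained_calls
print(f"{total_overhead_s:.0f} seconds (~{total_overhead_s / 60:.1f} minutes) of added latency")
# → 90 seconds (~1.5 minutes) of added latency
```

At these assumed numbers, "only a couple of seconds per call" already adds a minute and a half of pure waiting; local inference removes the external round-trip from every one of those calls.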
Cost for Continuous Operations
AuroraSOC is built for 24/7 operation, not occasional prompts. A realistic load can look like:
- 16 specialist agents
- ~1,000 tokens per call
- 100 calls per hour
That is about 1.6 million tokens per hour. At common hosted API pricing, this can add up to hundreds of dollars per day, before incident surges. Self-hosted inference shifts that profile from variable per-token spend to infrastructure cost you can plan and cap.
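The figures above can be checked with a quick calculation. The per-million-token price below is a placeholder assumption for illustration only, not a quote from any provider:

```python
# Back-of-the-envelope check of the load figures above.
agents = 16
tokens_per_call = 1_000
calls_per_hour = 100

tokens_per_hour = agents * tokens_per_call * calls_per_hour
print(tokens_per_hour)  # 1600000 tokens per hour

# Assumed placeholder price, purely for illustration:
usd_per_million_tokens = 10.0
daily_cost = tokens_per_hour * 24 / 1_000_000 * usd_per_million_tokens
print(f"${daily_cost:,.0f}/day")  # $384/day at the assumed rate
```

Even at a modest assumed rate, continuous 24/7 operation lands in the hundreds of dollars per day, which is why AuroraSOC treats inference as fixed infrastructure rather than metered spend.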
The Models: IBM Granite 4
AuroraSOC standardizes on IBM Granite 4 because it provides open weights, a commercial-friendly license, and strong structured reasoning for security analysis tasks, including the disciplined, format-aware outputs that agent pipelines depend on.
AuroraSOC uses two runtime model roles:
- granite-soc-specialist: the default vLLM model used by the specialist fleet.
- VLLM_ORCHESTRATOR_MODEL: the orchestrator model identifier (defaults to granite-soc-specialist, but can be set to granite-soc-orchestrator or another served model).
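A minimal environment sketch for overriding the orchestrator model. VLLM_ORCHESTRATOR_MODEL is the variable described above; VLLM_BASE_URL is a hypothetical endpoint variable shown for illustration only, so consult the Environment Variables Reference for the authoritative names:

```shell
# Point the orchestrator at a dedicated served model instead of the
# specialist default (VLLM_ORCHESTRATOR_MODEL is documented above).
export VLLM_ORCHESTRATOR_MODEL=granite-soc-orchestrator

# Hypothetical endpoint variable, for illustration only -- check the
# Environment Variables Reference for the real variable names.
export VLLM_BASE_URL=http://localhost:8000/v1
```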
Granite 4 is a general-purpose foundation model. Fine-tuning turns it into a SOC-specialized model. A practical analogy is hiring an exceptionally smart generalist and then giving them three months of intensive SOC apprenticeship: they already know how to reason, and fine-tuning teaches them your security language, triage habits, and investigation style.
The Architecture: How a Security Alert Becomes an AI Response
Imagine a ransomware alert arrives from your SIEM: endpoint encryption spikes, suspicious process chains, and lateral movement indicators appear at once. AuroraSOC first normalizes and attests the event stream in the Rust core so all downstream systems receive a consistent, trustworthy payload. That normalized alert is pushed into Redis Streams, where the agent worker pipeline picks it up and forwards it to the orchestrator. The orchestrator queries vLLM using VLLM_ORCHESTRATOR_MODEL (default: granite-soc-specialist), decomposes the incident into specialist tasks, and dispatches those tasks concurrently.
Network analysis, malware analysis, threat intelligence enrichment, and endpoint investigation then run in parallel. Each specialist queries vLLM with granite-soc-specialist and returns structured findings. The orchestrator merges those findings into a single investigation artifact. That artifact is then delivered to the Next.js dashboard over WebSocket so analysts see the investigation evolve in near real time instead of waiting for a serialized, single-threaded pipeline.
IoT / SIEM Alert
│
▼
Rust Core Engine (normalize + attest)
│
▼
Redis Streams → Agent Worker
│
▼
Orchestrator Agent
(queries LLM backend: vLLM / Ollama / OpenAI-compatible)
│
├──▶ Network Analyzer Agent ──▶ LLM Backend
├──▶ Malware Analyst Agent ──▶ LLM Backend
├──▶ Threat Intel Agent ──▶ LLM Backend (all concurrent)
└──▶ Endpoint Security Agent ──▶ LLM Backend
│
▼
Structured Investigation Report
│
▼
Next.js Dashboard (WebSocket)
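The concurrent fan-out shown in the diagram can be sketched with asyncio. The agent names match the diagram, but query_specialist is a stand-in stub; a real implementation would call the vLLM OpenAI-compatible endpoint with granite-soc-specialist instead of returning canned findings:

```python
import asyncio

SPECIALISTS = [
    "network_analyzer",
    "malware_analyst",
    "threat_intel",
    "endpoint_security",
]

async def query_specialist(agent: str, alert: dict) -> dict:
    """Stand-in for a real vLLM call; returns structured findings."""
    await asyncio.sleep(0)  # placeholder for inference latency
    return {"agent": agent, "alert_id": alert["id"], "findings": f"{agent} analysis"}

async def investigate(alert: dict) -> dict:
    # Dispatch every specialist concurrently, then merge the structured
    # findings into a single investigation artifact.
    results = await asyncio.gather(
        *(query_specialist(agent, alert) for agent in SPECIALISTS)
    )
    return {"alert_id": alert["id"], "findings": list(results)}

report = asyncio.run(investigate({"id": "alert-42"}))
print(len(report["findings"]))  # 4
```

Because asyncio.gather awaits all four tasks at once, total wall-clock time is bounded by the slowest specialist rather than the sum of all four, which is the core latency win over a serialized pipeline.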
Training Pipeline Summary
AuroraSOC trains and exports specialist and orchestrator variants through the Granite fine-tuning pipeline, then serves those exported artifacts through vLLM as the production default. The pipeline covers dataset preparation, LoRA fine-tuning, merged FP16 export for vLLM, and validation before deployment; see Training Pipeline — Complete Guide for Beginners for the full step-by-step workflow.
What to Read Next
- Inference Backends: vLLM and Ollama — Complete Guide: detailed backend architecture, switching workflow, OpenAI-compatible provider setup, and troubleshooting.
- Training Pipeline — Complete Guide for Beginners: end-to-end training from dataset prep to deployed model serving.
- Environment Variables Reference — LLM & Inference: single source of truth for backend and model configuration variables.