LLM Integration Architecture
This page explains how IBM Granite 4 models are wired into AuroraSOC's multi-agent system — from the factory that creates agents, through the Granite module that resolves models, to the serving backends that run inference.
High-Level Architecture
Component Responsibilities
| Component | File | Purpose |
|---|---|---|
| AuroraAgentFactory | aurorasoc/agents/factory.py | Creates all 17 BeeAI agents (orchestrator + 16 specialists) with correct LLMs, tools, and prompts |
| Granite Module | aurorasoc/granite/__init__.py | Model resolution, configuration, and ChatModel creation |
| Model Registry | aurorasoc/granite/registry.py | Health checks, model availability, warmup |
| Agent Prompts | aurorasoc/agents/prompts.py | System prompts for each agent persona |
| BeeAI Framework | beeai-framework | RequirementAgent, ChatModel, tools, middleware |
The Request Flow
When an alert arrives in AuroraSOC, here's the complete flow from event to LLM response:
Step-by-step:
- Event ingestion: An alert arrives via MQTT or NATS JetStream
- Normalization: The event pipeline normalizes the alert into a standard format
- Dispatch: The API layer calls the factory to handle the alert
- Agent selection: The factory determines which specialist agent should handle this alert type
- Model resolution: _llm_for(agent_name) calls the Granite module to resolve the correct model
- ChatModel creation: The Granite module returns a BeeAI ChatModel instance configured for the correct backend
- Agent creation: The factory creates a RequirementAgent with the model, tools, and system prompt
- Inference: The agent uses the ChatModel to query Ollama or vLLM
- Response: The agent returns structured analysis back to the API layer
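In code, the dispatch-to-response half of this flow looks roughly like the sketch below. The handler name, alert fields, and result handling are illustrative assumptions; only AuroraAgentFactory and create_agent() come from the factory described in the next section.

```python
# Illustrative sketch of steps 3-9; the handler name, alert fields, and
# result handling are assumptions, not AuroraSOC's actual API layer.
from aurorasoc.agents.factory import AuroraAgentFactory

factory = AuroraAgentFactory()  # falls back to env-driven Granite defaults

async def handle_alert(alert: dict) -> str:
    # Agent selection: map the normalized alert to a specialist (field name assumed).
    agent_name = alert.get("category", "threat_hunter")

    # Model resolution, ChatModel creation, and agent creation all happen in the factory.
    agent = await factory.create_agent(agent_name)

    # Inference: the RequirementAgent queries Ollama or vLLM through its ChatModel.
    result = await agent.run(alert["summary"])

    # Response: structured analysis flows back to the API layer.
    return str(result)
```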
The Factory: AuroraAgentFactory
The factory (aurorasoc/agents/factory.py) is the central point where agents get their LLMs. Specialist definitions are declared in AGENT_SPECS, then instantiated through a generic create_agent() path.
Factory Initialization
```python
class AuroraAgentFactory:
    def __init__(self, model_name: str | None = None,
                 granite_config: GraniteModelConfig | None = None):
        self.granite_config = granite_config or get_default_granite_config()
        if model_name:
            self.granite_config.model_name = model_name
```
The factory accepts an optional model_name (for manual override) and an optional GraniteModelConfig. If neither is provided, it falls back to get_default_granite_config() which reads from environment variables.
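For orientation, get_default_granite_config() behaves roughly like the sketch below. The field names and environment variable names here are assumptions; the real ones are documented in the Granite Module Deep Dive.

```python
# Hypothetical sketch: field names and env var names are assumptions,
# not the actual Granite module implementation.
import os
from dataclasses import dataclass

@dataclass
class GraniteModelConfig:
    model_name: str
    backend: str               # e.g. "ollama" or "vllm"
    base_url: str | None = None

def get_default_granite_config() -> GraniteModelConfig:
    # Used when the factory receives no explicit config or model name.
    return GraniteModelConfig(
        model_name=os.getenv("GRANITE_MODEL_NAME", "granite4:small-h"),  # placeholder default
        backend=os.getenv("GRANITE_BACKEND", "ollama"),
        base_url=os.getenv("GRANITE_BASE_URL"),
    )
```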
The _llm_for() Method
This is the critical bridge between agents and models:
```python
def _llm_for(self, agent_name: str) -> ChatModel:
    """Resolve the appropriate ChatModel for a given agent."""
    return create_granite_chat_model(
        config=self.granite_config,
        agent_name=agent_name,
    )
```
Every create_*() method calls _llm_for() to get its model:
```python
def create_threat_hunter(self) -> RequirementAgent:
    llm = self._llm_for("threat_hunter")
    return RequirementAgent(
        llm=llm,
        tools=[ThinkTool(), ...],
        # ... system prompt, middleware
    )
```
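Inside the Granite module, create_granite_chat_model() ultimately hands off to BeeAI's ChatModel.from_name(). A minimal sketch, assuming a per-agent override table on the config object (the actual resolution logic is covered in the Granite Module Deep Dive, and the import path may vary by framework version):

```python
# Hypothetical sketch of create_granite_chat_model; the real resolution logic
# lives in the Granite module.
from beeai_framework.backend import ChatModel  # import path may vary by framework version

def create_granite_chat_model(config, agent_name: str | None = None) -> ChatModel:
    # Per-agent override first, shared default second (the override attribute
    # is an assumption for illustration).
    overrides = getattr(config, "agent_overrides", None) or {}
    model_name = overrides.get(agent_name, config.model_name)

    # BeeAI's ChatModel.from_name() resolves the provider from the
    # "<provider>:<model>" prefix, e.g. "ollama:granite4:small-h".
    return ChatModel.from_name(f"{config.backend}:{model_name}")
```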
The Orchestrator
The orchestrator is a special agent that routes alerts to the correct specialist via HandoffTool:
```python
def create_orchestrator(self, agents: dict) -> RequirementAgent:
    llm = self._llm_for("orchestrator")
    handoff_tools = [
        HandoffTool(agent=agent, name=name)
        for name, agent in agents.items()
    ]
    return RequirementAgent(
        llm=llm,
        tools=[ThinkTool(), *handoff_tools],
        # ...
    )
```
The orchestrator gets its own LLM (potentially a fine-tuned orchestration model) and can delegate to any specialist agent.
Building the Full Team
```python
async def build_full_team(self):
    # create_orchestrator expects a name -> agent mapping for its HandoffTools
    specialists = {
        name: await self.create_agent(name) for name in AGENT_SPECS
    }
    orchestrator = self.create_orchestrator(specialists)
    return orchestrator, specialists
```
This builds all 16 specialist agents from AGENT_SPECS and wires them into the orchestrator.
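Putting it together, a caller might build the team and route one alert like this; the alert text is a placeholder and result handling is simplified:

```python
# Illustrative usage; the alert text is a placeholder and result handling is simplified.
import asyncio
from aurorasoc.agents.factory import AuroraAgentFactory

async def main() -> None:
    factory = AuroraAgentFactory()
    orchestrator, specialists = await factory.build_full_team()

    # The orchestrator delegates to the right specialist via its HandoffTools.
    result = await orchestrator.run(
        "Suspicious PowerShell spawned by winword.exe on host WS-042. Triage this alert."
    )
    print(result)

asyncio.run(main())
```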
Data Flow Diagram
Key Design Decisions
Why BeeAI Framework?
AuroraSOC uses IBM's BeeAI Agent Framework because it provides:
- RequirementAgent — agents whose tool use is governed by declarative requirements, keeping specialist behavior predictable (see the sketch after this list)
- ConditionalRequirement — rules that control when a tool may or must run (e.g., force ThinkTool to run before other tools during complex analysis)
- GlobalTrajectoryMiddleware — observability middleware that surfaces each agent's reasoning steps and tool calls
- HandoffTool — zero-code agent-to-agent delegation
- ChatModel.from_name() — provider-agnostic model resolution (supports Ollama, vLLM, OpenAI-compatible APIs)
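A minimal RequirementAgent sketch makes the first two points concrete. The import paths, keyword arguments, and model tag follow recent beeai-framework examples and are assumptions that may differ between framework versions:

```python
# Sketch only: import paths and keyword arguments follow recent
# beeai-framework examples and may differ between versions.
from beeai_framework.agents.requirement import RequirementAgent
from beeai_framework.agents.requirement.requirements.conditional import ConditionalRequirement
from beeai_framework.backend import ChatModel
from beeai_framework.tools.think import ThinkTool

agent = RequirementAgent(
    llm=ChatModel.from_name("ollama:granite4:small-h"),  # placeholder model tag
    tools=[ThinkTool()],
    # Declarative rule: reason with ThinkTool before calling anything else.
    requirements=[ConditionalRequirement(ThinkTool, force_at_step=1)],
)
```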
Why Per-Agent Models?
Not all security domains need the same model:
- Malware analysis needs deep code/binary understanding → specialist model
- Compliance needs regulatory knowledge → specialist model
- Report generation needs structured output → different prompt engineering
Per-agent models let each specialist agent use a model fine-tuned for its exact domain.
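As a hypothetical illustration, an override table might list only the domains that benefit from fine-tuning, with every other specialist falling back to the shared base model (the agent names and model tags below are illustrative, not shipped defaults):

```python
# Hypothetical override table; agent names and fine-tuned model tags are illustrative.
AGENT_MODEL_OVERRIDES = {
    "malware_analyst": "granite4-malware-ft",        # deep code/binary understanding
    "compliance_officer": "granite4-compliance-ft",  # regulatory knowledge
    "report_writer": "granite4-reports-ft",          # structured report output
}

def model_for(agent_name: str, default: str) -> str:
    # Specialists without an override share the base Granite 4 model.
    return AGENT_MODEL_OVERRIDES.get(agent_name, default)
```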
Why Granite 4 Hybrid?
IBM Granite 4 Hybrid uses a Mamba-Transformer architecture:
- Mamba layers provide O(n) complexity for long sequences (better for log analysis)
- Transformer layers provide strong attention for reasoning tasks
- Hybrid combines both strengths at lower VRAM than a pure Transformer
Next Steps
- Granite Module Deep Dive — full details on model resolution
- Model Swap Guide — switch between base and fine-tuned models
- Serving Backends — Ollama vs vLLM configuration