Training IBM Granite Models for AuroraSOC
AuroraSOC uses IBM Granite 4.0 language models as the reasoning backbone for all 16 AI agents. While the base Granite models provide strong general-purpose capabilities, fine-tuning them on Security Operations Center (SOC) data produces dramatically better results for security-specific tasks like alert triage, threat hunting, incident response, and malware analysis.
This guide covers every step of the training pipeline — from preparing datasets to deploying fine-tuned models into production.
Why Fine-Tune?
Base language models are trained on broad internet text. They know about cybersecurity, but they don't think like a SOC analyst. Fine-tuning addresses this gap:
| Capability | Base Granite 4 | Fine-Tuned Granite 4 |
|---|---|---|
| Alert triage accuracy | General heuristics | SOC-specific severity scoring |
| MITRE ATT&CK mapping | Knows the framework | Automatically maps techniques to alerts |
| Incident response plans | Generic advice | NIST SP 800-61 / SANS-structured playbooks |
| ICS/OT protocol analysis | Minimal knowledge | Modbus, DNP3, OPC-UA, IEC 62443 |
| Output format | Free-form text | Structured JSON for downstream automation |
| Response time | Comparable | Faster (smaller, quantized models) |
Key insight: Fine-tuning teaches the model your SOC's language — the specific alert formats, triage procedures, escalation criteria, and reporting templates that your team uses every day.
What Gets Trained
AuroraSOC's training pipeline supports two modes:
1. Generic SOC Model (Recommended Starting Point)
A single model trained on data from all SOC domains. Every agent uses the same fine-tuned weights. This is simpler to manage and works well when you have limited training data.
One model → All 16 agents
2. Per-Agent Specialist Models
Individual models fine-tuned for each agent's specific domain. The SecurityAnalyst agent gets a model trained primarily on alert analysis data, the ThreatHunter gets one focused on hunting hypotheses, etc. This produces better results but requires more compute and management overhead.
9 specialist models → 9 agent profiles, each serving one or more agent types:
```
security_analyst   → SecurityAnalyst, EndpointSecurity, WebSecurity, CloudSecurity
threat_hunter      → ThreatHunter, UEBAAnalyst
malware_analyst    → MalwareAnalyst
incident_responder → IncidentResponder
network_security   → NetworkSecurity
cps_security       → CPSSecurity
threat_intel       → ThreatIntel
forensic_analyst   → ForensicAnalyst
orchestrator       → Orchestrator
```
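The profile-to-agent mapping above can be expressed as a small lookup table. This is an illustrative sketch: the dictionary mirrors the mapping listed above, but the helper function and its name are hypothetical, not AuroraSOC's actual API.

```python
# Profile-to-agent mapping, copied from the table above.
# The helper below is illustrative only, not AuroraSOC's real routing code.
PROFILE_AGENTS = {
    "security_analyst": ["SecurityAnalyst", "EndpointSecurity",
                         "WebSecurity", "CloudSecurity"],
    "threat_hunter": ["ThreatHunter", "UEBAAnalyst"],
    "malware_analyst": ["MalwareAnalyst"],
    "incident_responder": ["IncidentResponder"],
    "network_security": ["NetworkSecurity"],
    "cps_security": ["CPSSecurity"],
    "threat_intel": ["ThreatIntel"],
    "forensic_analyst": ["ForensicAnalyst"],
    "orchestrator": ["Orchestrator"],
}

def profile_for_agent(agent: str) -> str:
    """Return the specialist profile that serves the given agent type."""
    for profile, agents in PROFILE_AGENTS.items():
        if agent in agents:
            return profile
    raise KeyError(f"no specialist profile for agent {agent!r}")

print(profile_for_agent("WebSecurity"))  # → security_analyst
```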
When to Train
Use this decision tree to determine your approach:
Decision Criteria
| Scenario | Recommendation | Why |
|---|---|---|
| First deployment, testing | Base Granite 4 via Ollama | Zero training needed, get running fast |
| Production with standard SOC data | Generic fine-tuned model | Best effort-to-quality ratio |
| Production with large domain-specific datasets | Per-agent specialists | Maximum accuracy per domain |
| Air-gapped environment | Local GPU training | No internet required after initial setup |
| No GPU access | Google Colab (free T4) | Free 16GB GPU, exports GGUF for local use |
| Team-wide model iteration | Docker training pipeline | Reproducible, versioned, CI/CD friendly |
Architecture Overview
The training pipeline consists of five stages:
| Stage | Script / Tool | Output |
|---|---|---|
| 1. Prepare Data | training/scripts/prepare_datasets.py | training/data/soc_train.jsonl |
| 2. Fine-Tune | training/scripts/finetune_granite.py or Colab notebook | LoRA adapters in training/checkpoints/ |
| 3. Evaluate | training/scripts/evaluate_model.py | Benchmark scores (JSON report) |
| 4. Export | Built into finetune script | GGUF (Ollama) or FP16 (vLLM) |
| 5. Deploy | training/scripts/serve_model.py or scripts/setup_local.sh | Running model in Ollama or vLLM |
IBM Granite 4.0 Model Variants
AuroraSOC supports four Granite 4.0 model variants. Choose based on your available hardware:
| Model | Architecture | Parameters | Min VRAM (4-bit) | Best For |
|---|---|---|---|---|
| unsloth/granite-4.0-micro | Dense Transformer | ~1B | 4 GB | Quick testing, minimal hardware |
| unsloth/granite-4.0-h-micro | Hybrid Mamba-Transformer | ~1B | 6 GB | Fast inference, edge deployment |
| unsloth/granite-4.0-h-tiny | Hybrid Mamba-Transformer | ~2B | 8 GB | Recommended default (T4/RTX 3060+) |
| unsloth/granite-4.0-h-small | Hybrid Mamba-Transformer | ~4B | 16 GB | Best quality (A100/L4/RTX 4090) |
Why Hybrid Mamba-Transformer? The "h" variants use IBM's hybrid architecture that combines Mamba state-space layers with standard Transformer attention. This gives faster inference (especially for long sequences) while maintaining strong reasoning ability. The Mamba layers handle pattern recognition across long security logs, while the Transformer layers handle complex multi-step reasoning.
Training Framework: Unsloth
All training uses Unsloth, which provides:
- 2× faster training compared to standard Hugging Face SFTTrainer
- 70% less VRAM via intelligent gradient checkpointing
- Native Granite 4 support, including LoRA targets for the hybrid Mamba-Transformer layers
- Built-in GGUF export for direct Ollama deployment
- Response masking via train_on_responses_only — the model only learns from assistant outputs, not user prompts
What is LoRA?
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique. Instead of updating all model parameters (billions of weights), LoRA injects small trainable matrices into specific layers. This means:
- Training uses much less VRAM (a 2B model fits on a 16GB T4)
- Training is much faster (minutes to hours, not days)
- The base model is not modified — you can swap LoRA adapters without re-downloading
- You can have multiple LoRA adapters for different agents, all sharing the same base
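The arithmetic behind LoRA can be shown with toy matrices. This NumPy sketch (an illustration of the technique, not Unsloth's implementation) computes the effective weight W + (alpha/r)·BA and the parameter savings; the dimensions are invented for the example.

```python
import numpy as np

# Toy LoRA illustration: instead of updating a full d_out x d_in weight
# matrix W, train two small matrices B (d_out x r) and A (r x d_in) and
# add their scaled product to the frozen base weight.
d_out, d_in, r = 512, 512, 8
alpha = 8  # scaling factor, typically set equal to r

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trainable low-rank factor
B = np.zeros((d_out, r))                # trainable, initialized to zero

# Effective weight used at inference: base plus scaled low-rank update.
# Because B starts at zero, W_eff equals W before any training step.
W_eff = W + (alpha / r) * (B @ A)

full_params = d_out * d_in              # what full fine-tuning would train
lora_params = d_out * r + r * d_in      # what LoRA actually trains
print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {lora_params / full_params:.1%}")
# → full: 262,144  lora: 8,192  ratio: 3.1%
```

The same ratio is why a 2B-parameter model fits on a 16 GB T4: only the small adapter matrices need optimizer state and gradients.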
AuroraSOC's LoRA configuration targets these layers (configured in training/configs/granite_soc_finetune.yaml):
```yaml
lora:
  r: 64                        # Rank — higher = more capacity, more VRAM
  lora_alpha: 64               # Scaling factor (typically = r)
  target_modules:              # Which layers to adapt
    - q_proj                   # Query projection (attention)
    - k_proj                   # Key projection (attention)
    - v_proj                   # Value projection (attention)
    - o_proj                   # Output projection (attention)
    - gate_proj                # Feed-forward gate
    - up_proj                  # Feed-forward up projection
    - down_proj                # Feed-forward down projection
    - shared_mlp.input_linear  # Granite 4 Hybrid Mamba layers
    - shared_mlp.output_linear # Granite 4 Hybrid Mamba layers
```
The shared_mlp.input_linear and shared_mlp.output_linear targets are specific to Granite 4 Hybrid models — they adapt the Mamba state-space layers in addition to the standard Transformer attention layers.
Chat Template
Granite 4 uses a specific chat template format. All training data must follow this format:
```
<|start_of_role|>system<|end_of_role|>
You are the AuroraSOC Security Analyst...<|end_of_text|>
<|start_of_role|>user<|end_of_role|>
Analyze this Suricata alert: ...<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
Based on the alert, I identify the following...<|end_of_text|>
```
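For illustration, the template above can be rendered programmatically. In practice the tokenizer's apply_chat_template() does this for you; this sketch assumes the whitespace layout shown above, which may differ in minor details from the official Granite template.

```python
# Minimal sketch of the Granite 4 chat format (illustrative only; use the
# tokenizer's apply_chat_template() in real training code).
def format_granite_chat(messages: list[dict]) -> str:
    parts = []
    for msg in messages:
        parts.append(
            f"<|start_of_role|>{msg['role']}<|end_of_role|>\n"
            f"{msg['content']}<|end_of_text|>"
        )
    return "\n".join(parts)

prompt = format_granite_chat([
    {"role": "system", "content": "You are the AuroraSOC Security Analyst."},
    {"role": "user", "content": "Analyze this Suricata alert: ..."},
])
print(prompt)
```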
The training pipeline uses train_on_responses_only from Unsloth to mask the loss on system and user tokens. This means the model only learns to generate assistant responses — it doesn't memorize the prompts. This is critical for security: the model learns how to respond, not how to parrot training data.
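The effect of train_on_responses_only can be sketched with plain label masking, using the standard Hugging Face convention that label -100 is ignored by the loss. The token IDs below are invented for illustration.

```python
# Toy illustration of response-only loss masking: labels for tokens that
# belong to the system/user prompt are set to -100 (ignored by the loss),
# so gradients flow only through the assistant's reply.
IGNORE_INDEX = -100

def mask_prompt_labels(input_ids: list[int], response_start: int) -> list[int]:
    """Copy input_ids as labels, masking everything before the response."""
    return [
        IGNORE_INDEX if i < response_start else tok
        for i, tok in enumerate(input_ids)
    ]

# Tokens 0-5 are the system + user prompt; tokens 6-9 are the response.
tokens = [101, 7, 8, 9, 10, 102, 55, 56, 57, 103]
labels = mask_prompt_labels(tokens, response_start=6)
print(labels)  # → [-100, -100, -100, -100, -100, -100, 55, 56, 57, 103]
```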
Quick Reference: Make Targets
All training operations have Makefile shortcuts:
```bash
# Data preparation
make train-data                              # Download and prepare all SOC datasets

# Training (local GPU)
make train                                   # Train generic SOC model
make train AGENT=security_analyst            # Train specific agent model
make train-all                               # Train all 9 agent profiles

# Training (Docker)
make train-docker                            # Train in Docker container
make train-agent-docker AGENT=threat_hunter  # Docker per-agent

# Evaluation
make train-eval                              # Evaluate local checkpoint
make train-eval-ollama                       # Evaluate Ollama model

# Export & Serve
make train-serve-ollama                      # Import GGUF into Ollama
make train-serve-vllm                        # Start vLLM server
make train-modelfile                         # Generate Ollama Modelfile
```
Next Steps
- Prerequisites — System requirements and setup
- Dataset Preparation — Building training data
- Local GPU Training — Train on your own hardware
- Docker Training — Reproducible containerized training
- Google Colab Training — Free cloud GPU training
- Per-Agent Specialists — Domain-specific models
- Evaluation & Export — Test and deploy models
- Configuration Reference — Full YAML config docs