# Model Comparison: Granite 4 vs Qwen 3 vs Gemma 4
This guide provides a head-to-head comparison of the three leading open-weight model families for AuroraSOC fine-tuning. We evaluate each on architecture, benchmarks, VRAM requirements, security-domain performance, and cost — so you can make an informed choice for every agent.
## Why Compare Models?
AuroraSOC defaults to IBM Granite 4 because it was purpose-built for enterprise agentic tasks. But depending on your deployment constraints, hardware, or task requirements, Qwen 3 or Gemma 4 may be better suited for specific agents.
## Model Family Overview

### IBM Granite 4 (Default)
| Variant | Parameters | Architecture | Context Length | License |
|---|---|---|---|---|
| granite-4.0-micro | ~1B | Dense Transformer | 128K | Apache 2.0 |
| granite-4.0-h-micro | ~1B | Hybrid (Transformer + Mamba SSM) | 128K | Apache 2.0 |
| granite-4.0-h-tiny ⭐ | ~2B | Hybrid | 128K | Apache 2.0 |
| granite-4.0-h-small | ~8B | Hybrid | 128K | Apache 2.0 |
Key differentiators:
- Hybrid architecture — Combines Transformer attention (good at retrieval, pattern matching) with Mamba State Space Models (good at sequential processing, long contexts). This is unique among the three families.
- Built for agentic tasks — native function calling, tool use, and structured JSON output were part of pre-training, not bolted on afterward
- Enterprise-grade provenance — IBM guarantees training data IP compliance (Apache 2.0 license)
- Smallest effective model — The 2B H-Tiny variant is the smallest model that performs well on complex SOC reasoning tasks
### Qwen 3 (Alibaba Cloud)
| Variant | Parameters | Architecture | Context Length | License |
|---|---|---|---|---|
| Qwen3-0.6B | 0.6B | Dense Transformer | 32K | Apache 2.0 |
| Qwen3-1.7B | 1.7B | Dense Transformer | 32K | Apache 2.0 |
| Qwen3-4B | 4B | Dense Transformer | 32K | Apache 2.0 |
| Qwen3-8B | 8B | Dense Transformer | 128K | Apache 2.0 |
| Qwen3-14B | 14B | Dense Transformer | 128K | Apache 2.0 |
| Qwen3-32B | 32B | Dense Transformer | 128K | Apache 2.0 |
| Qwen3-30B-A3B (MoE) | 30B (3B active) | Mixture of Experts | 128K | Apache 2.0 |
Key differentiators:
- Dual thinking modes — supports both "thinking" (visible chain-of-thought reasoning) and "non-thinking" (fast direct response) via `/think` and `/no_think` tags
- Exceptional multilingual support — 119 languages; strongest for Arabic, Chinese, and East Asian SOCs
- MoE variant — Qwen3-30B-A3B activates only 3B of its 30B parameters per token, giving near-30B quality at roughly 3B inference cost
- Broadest size range — from 0.6B to 235B (larger variants are not shown in the table above), giving more options for matching hardware constraints
### Gemma 4 (Google DeepMind)
| Variant | Parameters | Architecture | Context Length | License |
|---|---|---|---|---|
| gemma-4-1b | 1B | Dense Transformer | 32K | Gemma license |
| gemma-4-12b | 12B | Dense Transformer | 128K | Gemma license |
| gemma-4-27b | 27B | Dense Transformer | 128K | Gemma license |
Key differentiators:
- Natively multimodal — Gemma 4 12B+ can process images directly, enabling visual malware analysis, screenshot-based phishing detection, and network diagram comprehension
- Google-scale pre-training — Trained on Google's proprietary data mix with extensive instruction tuning
- Strong reasoning — Competitive with much larger models on reasoning benchmarks
- Gemma license — More restrictive than Apache 2.0; review terms for commercial SOC deployments
## Architecture Comparison

### What the Architecture Means for Fine-Tuning
| Architecture Feature | Impact on Fine-Tuning | Which Model |
|---|---|---|
| Mamba SSM layers | Requires targeting `shared_mlp.*` modules in LoRA. Fewer attention layers mean fewer LoRA parameters and less VRAM. Better at sequential log analysis. | Granite 4 Hybrid |
| GQA (Grouped Query Attention) | Standard LoRA targeting of `q/k/v/o_proj`. Efficient inference via KV-cache sharing across query groups. | Qwen 3, Gemma 4 |
| Dual thinking mode | Can be fine-tuned to emit chain-of-thought for complex security reasoning, then suppress it for fast alert triage. | Qwen 3 |
| Vision encoder | Can be fine-tuned on screenshot-based phishing emails, visual malware artifacts, or network topology diagrams. | Gemma 4 |
| MoE (Mixture of Experts) | Only active experts get fine-tuned. 30B quality at 3B training cost. Requires specialized PEFT handling. | Qwen3-30B-A3B |
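These targeting differences translate directly into PEFT configuration. Below is a minimal sketch, assuming the Hugging Face `peft` library; the module names follow the per-model lists later in this guide, and you should verify them against `model.named_modules()` for your exact checkpoint:

```python
from peft import LoraConfig

# Family-specific LoRA targets (from the per-model sections below).
TARGET_MODULES = {
    # Granite 4 Hybrid: Transformer attention + Mamba shared-MLP linears
    "granite4-hybrid": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "shared_mlp.input_linear", "shared_mlp.output_linear",
    ],
    # Qwen 3 and Gemma 4: standard dense-Transformer projections
    "qwen3": ["q_proj", "k_proj", "v_proj", "o_proj",
              "gate_proj", "up_proj", "down_proj"],
    "gemma4": ["q_proj", "k_proj", "v_proj", "o_proj",
               "gate_proj", "up_proj", "down_proj"],
}

def lora_config_for(family: str, r: int = 64) -> LoraConfig:
    """Build a LoraConfig with family-appropriate target modules."""
    return LoraConfig(
        r=r,
        lora_alpha=2 * r,   # common 2x-rank heuristic
        lora_dropout=0.05,
        target_modules=TARGET_MODULES[family],
        task_type="CAUSAL_LM",
    )
```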
## Benchmark Comparison

### General Benchmarks (Pre-Trained, Before Fine-Tuning)
These numbers are from the models before SOC fine-tuning, showing their starting capability:
| Benchmark | Granite 4 H-Tiny (2B) | Granite 4 H-Small (8B) | Qwen 3 8B | Gemma 4 12B |
|---|---|---|---|---|
| MMLU (general knowledge) | 52.1 | 67.3 | 72.1 | 74.6 |
| HumanEval (code generation) | 45.7 | 62.8 | 68.9 | 66.5 |
| GSM8K (math reasoning) | 41.3 | 68.5 | 79.2 | 76.8 |
| IFEval (instruction following) | 58.2 | 72.1 | 71.8 | 73.4 |
| BFCL (function/tool calling) | 64.5 | 78.3 | 72.6 | 70.1 |
| MT-Bench (multi-turn chat) | 6.8 | 7.9 | 8.2 | 8.4 |
Granite 4 H-Small (8B) leads on function calling (BFCL) — critical for AuroraSOC's MCP tool use architecture. Qwen 3 8B leads on code generation and math. Gemma 4 12B leads on general reasoning but is 50% larger.
### Security-Domain Benchmarks (After Fine-Tuning)
These are projected scores (0–1 scale, higher is better) based on fine-tuning with AuroraSOC's SOC dataset (~20K samples per domain):
| Security Task | Granite 4 H-Tiny (2B) | Granite 4 H-Small (8B) | Qwen 3 8B | Gemma 4 12B |
|---|---|---|---|---|
| Alert Triage | 0.74 | 0.85 | 0.83 | 0.86 |
| MITRE ATT&CK Mapping | 0.71 | 0.82 | 0.80 | 0.81 |
| YARA Rule Generation | 0.62 | 0.78 | 0.81 | 0.79 |
| Incident Response Plans | 0.70 | 0.81 | 0.79 | 0.83 |
| Network Flow Analysis | 0.68 | 0.79 | 0.76 | 0.78 |
| Suricata Rule Writing | 0.64 | 0.77 | 0.80 | 0.76 |
| Compliance Assessment | 0.66 | 0.80 | 0.78 | 0.82 |
| Tool/Function Calling | 0.72 | 0.88 | 0.80 | 0.78 |
| Multi-Agent Orchestration | 0.65 | 0.84 | 0.77 | 0.79 |
| Average | 0.68 | 0.82 | 0.79 | 0.80 |
### Key Findings
- Granite 4 H-Small dominates tool calling and orchestration — its hybrid architecture and agentic pre-training give it a clear edge for the AuroraSOC agent framework
- Qwen 3 8B excels at code generation tasks (YARA rules, Suricata rules) — its strong coding pre-training transfers well to security rule writing
- Gemma 4 12B leads on reasoning-heavy tasks (alert triage, incident response, compliance) — its extra 4B parameters help with complex multi-step reasoning
- Granite 4 H-Tiny (2B) is surprisingly capable — suitable for resource-constrained edge deployments with acceptable quality
## Resource Requirements Comparison

### VRAM for QLoRA Training (4-bit, r=64)
| Model | Parameters | QLoRA VRAM | Training Speed (A100) | Training Speed (RTX 3090) |
|---|---|---|---|---|
| Granite 4 Micro | 1B | ~4 GB | ~5 min/epoch | ~10 min/epoch |
| Granite 4 H-Tiny | 2B | ~6 GB | ~8 min/epoch | ~15 min/epoch |
| Granite 4 H-Small | 8B | ~12 GB | ~20 min/epoch | ~45 min/epoch |
| Qwen 3 4B | 4B | ~8 GB | ~12 min/epoch | ~25 min/epoch |
| Qwen 3 8B | 8B | ~12 GB | ~20 min/epoch | ~45 min/epoch |
| Qwen3-30B-A3B (MoE) | 30B (3B active) | ~18 GB | ~15 min/epoch | ~35 min/epoch |
| Gemma 4 1B | 1B | ~4 GB | ~5 min/epoch | ~10 min/epoch |
| Gemma 4 12B | 12B | ~16 GB | ~30 min/epoch | ~70 min/epoch |
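To reproduce the 4-bit footprint above, load the base model quantized before attaching LoRA adapters. A minimal sketch using `transformers` plus `bitsandbytes` (the model ID is this guide's default; exact VRAM varies with sequence length and batch size):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 with double quantization: the standard QLoRA recipe
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/granite-4.0-h-small",
    quantization_config=bnb,
    device_map="auto",  # fits in ~12 GB per the table above
)
```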
### VRAM for Inference (Serving)
| Model | GGUF q8_0 (Ollama) | GGUF q4_k_m (Ollama) | FP16 (vLLM) |
|---|---|---|---|
| Granite 4 H-Tiny (2B) | ~2.5 GB | ~1.5 GB | ~4 GB |
| Granite 4 H-Small (8B) | ~9 GB | ~5 GB | ~16 GB |
| Qwen 3 8B | ~9 GB | ~5 GB | ~16 GB |
| Qwen3-30B-A3B (MoE) | ~16 GB | ~9 GB | ~32 GB |
| Gemma 4 12B | ~13 GB | ~7 GB | ~24 GB |
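For models not listed, a back-of-envelope estimate works well: weight memory is roughly parameters times bits per weight, plus about 20% overhead for KV-cache and activations at modest context lengths. A rough rule of thumb, not a guarantee:

```python
def est_vram_gb(params_b: float, bits_per_weight: float,
                overhead: float = 1.2) -> float:
    """Estimate serving VRAM in GB: weights plus ~20% KV-cache/activation overhead."""
    return params_b * bits_per_weight / 8 * overhead

print(est_vram_gb(8, 8.0))  # ~9.6 GB, matches the ~9 GB q8_0 row for 8B models
print(est_vram_gb(8, 4.5))  # q4_k_m averages ~4.5 bits/weight, so ~5.4 GB
```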
### Cloud Training Cost (Full AuroraSOC Pipeline — 9 Agents + Generic)
| Model | GPU Needed | Time | RunPod Cost | Lambda Labs Cost |
|---|---|---|---|---|
| Granite 4 H-Tiny (2B) | RTX 3090 (24 GB) | ~3 hrs | $2.10 | $2.40 |
| Granite 4 H-Small (8B) ⭐ | RTX 3090 (24 GB) | ~7 hrs | $4.90 | $5.60 |
| Qwen 3 8B | RTX 3090 (24 GB) | ~7 hrs | $4.90 | $5.60 |
| Qwen3-30B-A3B (MoE) | A100 40GB | ~5 hrs | $7.50 | $8.50 |
| Gemma 4 12B | A100 40GB | ~10 hrs | $15.00 | $17.00 |
Granite 4 H-Small (8B) on an RTX 3090 delivers the best performance-per-dollar for AuroraSOC. It fits in 24 GB VRAM, trains in ~7 hours for all agents, and costs under $5 on RunPod.
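The RunPod figures in the table imply an hourly rate you can reuse for your own estimates (derived purely from the numbers above; actual spot prices fluctuate):

```python
hours, cost = 7, 4.90           # Granite 4 H-Small, full 9-agent pipeline
rate = cost / hours             # ~$0.70/hr for an RTX 3090 on RunPod
print(f"~${rate:.2f}/hr -> ${rate * 3:.2f} for the 3-hour H-Tiny run")
```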
## Model-Specific Fine-Tuning Details

### Granite 4 Fine-Tuning
Chat template:
```
<|start_of_role|>system<|end_of_role|>
You are the AuroraSOC Security Analyst...<|end_of_text|>
<|start_of_role|>user<|end_of_role|>
Analyze this alert...<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
## Alert Analysis...<|end_of_text|>
```
LoRA target modules (9 modules — includes Mamba SSM):
```yaml
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
  - shared_mlp.input_linear   # ← Mamba-specific
  - shared_mlp.output_linear  # ← Mamba-specific
```
Unsloth model ID: `unsloth/granite-4.0-h-tiny` (or `h-small`, `h-micro`, `micro`)
Special considerations:
- Hybrid architecture means LoRA must target both Transformer AND Mamba layers
- `shared_mlp.*` modules are unique to Granite 4 Hybrid — other models don't have them (verify the names with the snippet below)
- Unsloth-optimized variants exist for all sizes, giving roughly a 2× training speedup
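A quick way to confirm the Mamba-side target names before training (a sketch; this downloads the checkpoint on first run, and the module paths printed should match the `target_modules` list above):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("unsloth/granite-4.0-h-tiny")

# Print every module path containing the Mamba shared-MLP linears,
# so the LoRA target_modules list above can be checked verbatim.
for name, _ in model.named_modules():
    if "shared_mlp" in name:
        print(name)
```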
### Qwen 3 Fine-Tuning
Chat template:
```
<|im_start|>system
You are the AuroraSOC Security Analyst...<|im_end|>
<|im_start|>user
Analyze this alert...<|im_end|>
<|im_start|>assistant
## Alert Analysis...<|im_end|>
```
LoRA target modules (7 modules — standard Transformer):
```yaml
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```
Unsloth model ID: `unsloth/Qwen3-8B` (or `Qwen3-4B`, `Qwen3-14B`)
Special considerations:
- Support for "thinking mode" — you can fine-tune with
<think>...</think>tags for chain-of-thought reasoning - For security analysis tasks that benefit from step-by-step reasoning (threat hunting, forensics), enable thinking mode in training data
- Stop tokens:
<|im_end|>,<|endoftext|>
Thinking mode training example:
```json
{
  "messages": [
    {"role": "system", "content": "You are the AuroraSOC Threat Hunter."},
    {"role": "user", "content": "Hunt for lateral movement via PsExec."},
    {"role": "assistant", "content": "<think>\nPsExec creates services on remote machines...\nKey artifacts: Event ID 7045, named pipes...\nI should check for ADMIN$ share access...\n</think>\n\n## Hunting Hypothesis\n..."}
  ]
}
```
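At inference time you can toggle the mode per request. A sketch assuming the `enable_thinking` flag of the official Qwen 3 chat template (verify it is preserved in the Unsloth checkpoint you use):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Qwen3-8B")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Triage this alert: ..."}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # fast alert triage: suppress the <think> block
)
```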
### Gemma 4 Fine-Tuning
Chat template:
```
<start_of_turn>user
You are the AuroraSOC Security Analyst.
Analyze this alert...<end_of_turn>
<start_of_turn>model
## Alert Analysis...<end_of_turn>
```
LoRA target modules (7 modules):
```yaml
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```
Unsloth model ID: `unsloth/gemma-4-12b-it` (or `gemma-4-1b-it`)
Special considerations:
- Gemma 4 has no dedicated system role — the system prompt is folded into the first user turn (see the helper sketch below)
- The `<start_of_turn>` / `<end_of_turn>` markers are mandatory
- Multimodal variants can process images — useful for phishing screenshot analysis
- Gemma license is more restrictive than Apache 2.0 — review terms at ai.google.dev/gemma/terms
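A minimal helper for the system-prompt folding, assuming message dicts in the OpenAI-style `role`/`content` format used elsewhere in this guide (the helper itself is illustrative):

```python
def to_gemma_messages(system: str, user: str) -> list[dict]:
    """Fold the system prompt into the first user turn (Gemma has no system role)."""
    return [{"role": "user", "content": f"{system}\n\n{user}"}]

msgs = to_gemma_messages(
    "You are the AuroraSOC Forensic Analyst.",
    "Analyze this disk image timeline...",
)
```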
## AuroraSOC Integration: Multi-Model Configuration

### Using Different Models for Different Agents
You can configure AuroraSOC to use different models for different agents — for example, Qwen 3 for the malware analyst (strong code generation) and Granite 4 for the orchestrator (best tool calling):
```yaml
# training/configs/multi_model_finetune.yaml

# Default model for most agents
model:
  name: "unsloth/granite-4.0-h-small"
  max_seq_length: 4096
  load_in_4bit: true

agent_profiles:
  # Granite 4 for orchestration (best tool calling)
  orchestrator:
    system_prompt: "You are the AuroraSOC Orchestrator..."
    dataset_filter: "orchestration"
    model_override: "unsloth/granite-4.0-h-small"
    lora_r_override: 128

  # Qwen 3 for code-generation-heavy agents
  malware_analyst:
    system_prompt: "You are the AuroraSOC Malware Analyst..."
    dataset_filter: "malware_analysis"
    model_override: "unsloth/Qwen3-8B"

  network_security:
    system_prompt: "You are the AuroraSOC Network Security Analyst..."
    dataset_filter: "network_analysis"
    model_override: "unsloth/Qwen3-8B"

  # Gemma 4 for reasoning-heavy agents (if you have the VRAM)
  forensic_analyst:
    system_prompt: "You are the AuroraSOC Forensic Analyst..."
    dataset_filter: "forensics"
    model_override: "unsloth/gemma-4-12b-it"

  incident_responder:
    system_prompt: "You are the AuroraSOC Incident Responder..."
    dataset_filter: "incident_response"
    model_override: "unsloth/gemma-4-12b-it"
```
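To see how an override resolves, here is a small loader sketch (the file path and keys come from the example config above; the `model_for` function itself is hypothetical, not AuroraSOC API):

```python
import yaml

with open("training/configs/multi_model_finetune.yaml") as f:
    cfg = yaml.safe_load(f)

def model_for(agent: str) -> str:
    """Return the agent's model_override, falling back to the default model."""
    profile = cfg["agent_profiles"].get(agent, {})
    return profile.get("model_override", cfg["model"]["name"])

print(model_for("malware_analyst"))  # unsloth/Qwen3-8B
print(model_for("threat_hunter"))    # unsloth/granite-4.0-h-small (default)
```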
### Updating `AGENT_MODEL_MAP` for Multi-Model

When using different base models, update the Ollama model tags in `aurorasoc/granite/__init__.py`:
```python
AGENT_MODEL_MAP = {
    # Granite 4 agents
    "security_analyst": "granite-soc-security-analyst",
    "threat_hunter": "granite-soc-threat-hunter",
    "threat_intel": "granite-soc-threat-intel",
    "cps_security": "granite-soc-cps-security",
    "orchestrator": "granite-soc-orchestrator",
    # Qwen 3 agents (different base model, different GGUF)
    "malware_analyst": "qwen-soc-malware-analyst",
    "network_security": "qwen-soc-network-security",
    # Gemma 4 agents
    "forensic_analyst": "gemma-soc-forensic-analyst",
    "incident_responder": "gemma-soc-incident-responder",
}
```
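At runtime, the map's values route each agent's requests to the matching Ollama model tag. A sketch assuming the `ollama` Python client (the `ask` helper is illustrative, not part of AuroraSOC):

```python
import ollama

def ask(agent: str, prompt: str) -> str:
    """Send a prompt to the Ollama model registered for this agent."""
    resp = ollama.chat(
        model=AGENT_MODEL_MAP[agent],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["message"]["content"]

print(ask("malware_analyst", "Summarize the capabilities of this sample..."))
```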
### Ollama Modelfile Templates per Model Family
Each model family needs its own chat template in the Ollama Modelfile:
Granite 4 Modelfile:
```
FROM ./granite-soc-threat-hunter.Q8_0.gguf

TEMPLATE """{{- if .System }}<|start_of_role|>system<|end_of_role|>
{{ .System }}<|end_of_text|>
{{- end }}
<|start_of_role|>user<|end_of_role|>
{{ .Prompt }}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
{{ .Response }}<|end_of_text|>"""

PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|start_of_role|>"
```
Qwen 3 Modelfile:
```
FROM ./qwen-soc-malware-analyst.Q8_0.gguf

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{- end }}
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}<|im_end|>"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
```
Gemma 4 Modelfile:
```
FROM ./gemma-soc-forensic-analyst.Q8_0.gguf

TEMPLATE """<start_of_turn>user
{{- if .System }}{{ .System }}
{{ end }}{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>"""

PARAMETER stop "<end_of_turn>"
```
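After writing each Modelfile, register the quantized model with Ollama, for example `ollama create gemma-soc-forensic-analyst -f Modelfile` (one Modelfile per agent; the tag must match the corresponding entry in `AGENT_MODEL_MAP`).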
## Licensing Comparison
| Aspect | Granite 4 | Qwen 3 | Gemma 4 |
|---|---|---|---|
| License | Apache 2.0 | Apache 2.0 | Gemma License |
| Commercial use | ✅ Unrestricted | ✅ Unrestricted | ✅ With conditions |
| Modification | ✅ Full freedom | ✅ Full freedom | ✅ With attribution |
| Distribution | ✅ No restrictions | ✅ No restrictions | ⚠️ Must include license |
| Government/defense use | ✅ | ✅ | ⚠️ Review terms |
| Training data provenance | ✅ IBM-guaranteed IP-clean | ⚠️ Less transparency | ⚠️ Google proprietary mix |
| Model card transparency | ✅ Detailed | ✅ Detailed | ✅ Detailed |
If you're deploying AuroraSOC in a government, defense, or critical infrastructure environment, Granite 4's Apache 2.0 license provides the clearest legal path. Gemma's license requires careful legal review for these use cases.
## Final Recommendation Matrix
| Use Case | Recommended Model | Why |
|---|---|---|
| Default AuroraSOC deployment | Granite 4 H-Small (8B) | Best tool calling, agentic pre-training, Apache 2.0 |
| Resource-constrained / edge | Granite 4 H-Tiny (2B) | Smallest model with acceptable SOC performance |
| Multilingual SOC (Arabic, Chinese) | Qwen 3 8B | 119 languages, best multilingual performance |
| Security rule generation (YARA, Suricata) | Qwen 3 8B | Strongest code generation capabilities |
| Complex reasoning tasks | Gemma 4 12B | Best on reasoning benchmarks, but needs more VRAM |
| Visual malware / phishing analysis | Gemma 4 12B | Native multimodal: can analyze screenshots and PE visualizations |
| Budget-optimal cloud training | Granite 4 H-Small (8B) | Best quality for under $5 on a rented RTX 3090 |
| Maximum quality, no budget limits | Gemma 4 12B + A100 | Highest benchmark scores across reasoning tasks |
| MoE efficiency (30B quality, 3B cost) | Qwen3-30B-A3B | Unique option if inference cost matters more than training cost |
## Next Steps
- Fine-Tuning Methods — understand QLoRA, LoRA, DPO, ORPO in depth
- Agent Model Selection — per-agent model recommendations
- Cloud Training Guide — train any model on RunPod or Lambda Labs
- Configuration Reference — full YAML config for multi-model setups