
Model Comparison: Granite 4 vs Qwen 3 vs Gemma 4

This guide provides a head-to-head comparison of the three leading open-weight model families for AuroraSOC fine-tuning. We evaluate each on architecture, benchmarks, VRAM requirements, security-domain performance, and cost — so you can make an informed choice for every agent.

Why Compare Models?

AuroraSOC defaults to IBM Granite 4 because it was purpose-built for enterprise agentic tasks. But depending on your deployment constraints, hardware, or task requirements, Qwen 3 or Gemma 4 may be better suited for specific agents.


Model Family Overview

IBM Granite 4 (Default)

| Variant | Parameters | Architecture | Context Length | License |
| --- | --- | --- | --- | --- |
| granite-4.0-micro | ~1B | Dense Transformer | 128K | Apache 2.0 |
| granite-4.0-h-micro | ~1B | Hybrid (Transformer + Mamba SSM) | 128K | Apache 2.0 |
| granite-4.0-h-tiny | ~2B | Hybrid | 128K | Apache 2.0 |
| granite-4.0-h-small | ~8B | Hybrid | 128K | Apache 2.0 |

Key differentiators:

  • Hybrid architecture — Combines Transformer attention (good at retrieval, pattern matching) with Mamba State Space Models (good at sequential processing, long contexts). This is unique among the three families.
  • Built for agentic tasks — Native function calling, tool use, and structured JSON output during pre-training
  • Enterprise-grade provenance — IBM guarantees training data IP compliance (Apache 2.0 license)
  • Smallest effective model — The 2B H-Tiny variant is the smallest model that performs well on complex SOC reasoning tasks

Qwen 3 (Alibaba Cloud)

| Variant | Parameters | Architecture | Context Length | License |
| --- | --- | --- | --- | --- |
| Qwen3-0.6B | 0.6B | Dense Transformer | 32K | Apache 2.0 |
| Qwen3-1.7B | 1.7B | Dense Transformer | 32K | Apache 2.0 |
| Qwen3-4B | 4B | Dense Transformer | 32K | Apache 2.0 |
| Qwen3-8B | 8B | Dense Transformer | 128K | Apache 2.0 |
| Qwen3-14B | 14B | Dense Transformer | 128K | Apache 2.0 |
| Qwen3-32B | 32B | Dense Transformer | 128K | Apache 2.0 |
| Qwen3-30B-A3B (MoE) | 30B (3B active) | Mixture of Experts | 128K | Apache 2.0 |

Key differentiators:

  • Dual thinking modes: Supports both "thinking" (visible chain-of-thought reasoning) and "non-thinking" (fast direct response), switched with the /think and /no_think tags
  • Strong multilingual coverage: 119 languages, making it the strongest choice for SOCs operating in Arabic, in Chinese, or in other East Asian languages
  • MoE variant: Qwen3-30B-A3B activates only 3B of its 30B parameters per token, so per-token compute is close to that of a 3B model while quality approaches that of a 30B model; all 30B parameters still have to be held in memory
  • Broadest size range: From 0.6B to 235B, giving more options for matching hardware constraints

Google Gemma 4 (Google DeepMind)

| Variant | Parameters | Architecture | Context Length | License |
| --- | --- | --- | --- | --- |
| gemma-4-1b | 1B | Dense Transformer | 32K | Gemma license |
| gemma-4-12b | 12B | Dense Transformer | 128K | Gemma license |
| gemma-4-27b | 27B | Dense Transformer | 128K | Gemma license |

Key differentiators:

  • Natively multimodal — Gemma 4 12B+ can process images directly, enabling visual malware analysis, screenshot-based phishing detection, and network diagram comprehension
  • Google-scale pre-training — Trained on Google's proprietary data mix with extensive instruction tuning
  • Strong reasoning — Competitive with much larger models on reasoning benchmarks
  • Gemma license — More restrictive than Apache 2.0; review terms for commercial SOC deployments

Architecture Comparison

What the Architecture Means for Fine-Tuning

| Architecture Feature | Impact on Fine-Tuning | Which Model |
| --- | --- | --- |
| Mamba SSM layers | Requires targeting shared_mlp.* modules in LoRA. Fewer attention heads = less VRAM for LoRA. Better at sequential log analysis. | Granite 4 Hybrid |
| GQA (Grouped Query Attention) | Standard LoRA targeting of q/k/v/o_proj. Efficient inference with KV-cache sharing. | Qwen 3, Gemma 4 |
| Dual thinking mode | Can be fine-tuned to emit chain-of-thought for complex security reasoning, then suppress it for fast alert triage. | Qwen 3 |
| Vision encoder | Can be fine-tuned on screenshot-based phishing emails, visual malware artifacts, or network topology diagrams. | Gemma 4 |
| MoE (Mixture of Experts) | Only active experts get fine-tuned. 30B quality at 3B training cost. Requires specialized PEFT handling. | Qwen3-30B-A3B |

Benchmark Comparison

General Benchmarks (Pre-Trained, Before Fine-Tuning)

These numbers are from the models before SOC fine-tuning, showing their starting capability:

| Benchmark | Granite 4 H-Tiny (2B) | Granite 4 H-Small (8B) | Qwen 3 8B | Gemma 4 12B |
| --- | --- | --- | --- | --- |
| MMLU (general knowledge) | 52.1 | 67.3 | 72.1 | 74.6 |
| HumanEval (code generation) | 45.7 | 62.8 | 68.9 | 66.5 |
| GSM8K (math reasoning) | 41.3 | 68.5 | 79.2 | 76.8 |
| IFEval (instruction following) | 58.2 | 72.1 | 71.8 | 73.4 |
| BFCL (function/tool calling) | 64.5 | 78.3 | 72.6 | 70.1 |
| MT-Bench (multi-turn chat) | 6.8 | 7.9 | 8.2 | 8.4 |

Key Takeaway

Granite 4 H-Small (8B) leads on function calling (BFCL) — critical for AuroraSOC's MCP tool use architecture. Qwen 3 8B leads on code generation and math. Gemma 4 12B leads on general reasoning but is 50% larger.
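The takeaway can be checked directly against the benchmark table. The sketch below copies the table values into a dictionary and picks the top scorer per benchmark; the variable and function names are illustrative only.

```python
# Scores copied from the general benchmark table above.
BENCHMARKS = {
    "MMLU": {"Granite 4 H-Tiny (2B)": 52.1, "Granite 4 H-Small (8B)": 67.3,
             "Qwen 3 8B": 72.1, "Gemma 4 12B": 74.6},
    "HumanEval": {"Granite 4 H-Tiny (2B)": 45.7, "Granite 4 H-Small (8B)": 62.8,
                  "Qwen 3 8B": 68.9, "Gemma 4 12B": 66.5},
    "GSM8K": {"Granite 4 H-Tiny (2B)": 41.3, "Granite 4 H-Small (8B)": 68.5,
              "Qwen 3 8B": 79.2, "Gemma 4 12B": 76.8},
    "IFEval": {"Granite 4 H-Tiny (2B)": 58.2, "Granite 4 H-Small (8B)": 72.1,
               "Qwen 3 8B": 71.8, "Gemma 4 12B": 73.4},
    "BFCL": {"Granite 4 H-Tiny (2B)": 64.5, "Granite 4 H-Small (8B)": 78.3,
             "Qwen 3 8B": 72.6, "Gemma 4 12B": 70.1},
    "MT-Bench": {"Granite 4 H-Tiny (2B)": 6.8, "Granite 4 H-Small (8B)": 7.9,
                 "Qwen 3 8B": 8.2, "Gemma 4 12B": 8.4},
}


def leader(scores: dict) -> str:
    """Return the model with the highest score for one benchmark."""
    return max(scores, key=scores.get)


leaders = {name: leader(scores) for name, scores in BENCHMARKS.items()}
for name, model in leaders.items():
    print(f"{name:10s} -> {model}")
```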

Security-Domain Benchmarks (After Fine-Tuning)

These are projected scores based on fine-tuning with AuroraSOC's SOC dataset (~20K samples per domain):

| Security Task | Granite 4 H-Tiny (2B) | Granite 4 H-Small (8B) | Qwen 3 8B | Gemma 4 12B |
| --- | --- | --- | --- | --- |
| Alert Triage | 0.74 | 0.85 | 0.83 | 0.86 |
| MITRE ATT&CK Mapping | 0.71 | 0.82 | 0.80 | 0.81 |
| YARA Rule Generation | 0.62 | 0.78 | 0.81 | 0.79 |
| Incident Response Plans | 0.70 | 0.81 | 0.79 | 0.83 |
| Network Flow Analysis | 0.68 | 0.79 | 0.76 | 0.78 |
| Suricata Rule Writing | 0.64 | 0.77 | 0.80 | 0.76 |
| Compliance Assessment | 0.66 | 0.80 | 0.78 | 0.82 |
| Tool/Function Calling | 0.72 | 0.88 | 0.80 | 0.78 |
| Multi-Agent Orchestration | 0.65 | 0.84 | 0.77 | 0.79 |
| Average | 0.68 | 0.82 | 0.79 | 0.80 |

[Chart: projected security-domain scores for Granite 4 H-Small (8B), Qwen 3 8B, and Gemma 4 12B]

Key Findings

  1. Granite 4 H-Small dominates tool calling and orchestration — its hybrid architecture and agentic pre-training give it a clear edge for the AuroraSOC agent framework
  2. Qwen 3 8B excels at code generation tasks (YARA rules, Suricata rules) — its strong coding pre-training transfers well to security rule writing
  3. Gemma 4 12B leads on reasoning-heavy tasks (alert triage, incident response, compliance) — additional 4B parameters help with complex multi-step reasoning
  4. Granite 4 H-Tiny (2B) is surprisingly capable — suitable for resource-constrained edge deployments with acceptable quality
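As a cross-check on these findings, the sketch below recomputes the per-model average of the projected scores and counts how many tasks each model leads. The numbers are copied from the security-domain table; the names are illustrative.

```python
from collections import Counter
from statistics import mean

MODELS = ("Granite 4 H-Tiny (2B)", "Granite 4 H-Small (8B)",
          "Qwen 3 8B", "Gemma 4 12B")

# One row per security task, columns in the same order as MODELS.
SCORES = {
    "Alert Triage":              (0.74, 0.85, 0.83, 0.86),
    "MITRE ATT&CK Mapping":      (0.71, 0.82, 0.80, 0.81),
    "YARA Rule Generation":      (0.62, 0.78, 0.81, 0.79),
    "Incident Response Plans":   (0.70, 0.81, 0.79, 0.83),
    "Network Flow Analysis":     (0.68, 0.79, 0.76, 0.78),
    "Suricata Rule Writing":     (0.64, 0.77, 0.80, 0.76),
    "Compliance Assessment":     (0.66, 0.80, 0.78, 0.82),
    "Tool/Function Calling":     (0.72, 0.88, 0.80, 0.78),
    "Multi-Agent Orchestration": (0.65, 0.84, 0.77, 0.79),
}

# Average per model, rounded the same way as the table's Average row.
averages = {
    model: round(mean(row[i] for row in SCORES.values()), 2)
    for i, model in enumerate(MODELS)
}

# Number of tasks on which each model has the highest projected score.
wins = Counter(
    MODELS[max(range(len(MODELS)), key=lambda i: row[i])]
    for row in SCORES.values()
)

print(averages)
print(dict(wins))
```

Counting task wins alongside the average shows why mixing models per agent can beat picking a single overall winner.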

Resource Requirements Comparison

VRAM for QLoRA Training (4-bit, r=64)

| Model | Parameters | QLoRA VRAM | Training Speed (A100) | Training Speed (RTX 3090) |
| --- | --- | --- | --- | --- |
| Granite 4 Micro | 1B | ~4 GB | ~5 min/epoch | ~10 min/epoch |
| Granite 4 H-Tiny | 2B | ~6 GB | ~8 min/epoch | ~15 min/epoch |
| Granite 4 H-Small | 8B | ~12 GB | ~20 min/epoch | ~45 min/epoch |
| Qwen 3 4B | 4B | ~8 GB | ~12 min/epoch | ~25 min/epoch |
| Qwen 3 8B | 8B | ~12 GB | ~20 min/epoch | ~45 min/epoch |
| Qwen3-30B-A3B (MoE) | 30B (3B active) | ~18 GB | ~15 min/epoch | ~35 min/epoch |
| Gemma 4 1B | 1B | ~4 GB | ~5 min/epoch | ~10 min/epoch |
| Gemma 4 12B | 12B | ~16 GB | ~30 min/epoch | ~70 min/epoch |
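To turn the QLoRA figures into a quick hardware check, the sketch below lists which models should fit on a given GPU. The 2 GB headroom for activations and framework overhead is an assumption, not a number from this guide, and the function name is hypothetical.

```python
# Approximate QLoRA VRAM needs (GB), copied from the table above.
QLORA_VRAM_GB = {
    "Granite 4 Micro": 4,
    "Granite 4 H-Tiny": 6,
    "Granite 4 H-Small": 12,
    "Qwen 3 4B": 8,
    "Qwen 3 8B": 12,
    "Qwen3-30B-A3B (MoE)": 18,
    "Gemma 4 1B": 4,
    "Gemma 4 12B": 16,
}


def trainable_on(gpu_vram_gb: float, headroom_gb: float = 2.0) -> list:
    """Models whose QLoRA estimate plus headroom fits in the given VRAM."""
    return [
        model
        for model, needed in QLORA_VRAM_GB.items()
        if needed + headroom_gb <= gpu_vram_gb
    ]


print("12 GB GPU:", trainable_on(12))
print("16 GB GPU:", trainable_on(16))
```

Because the table values are approximate, a larger headroom is the safer choice when training with long sequences or bigger batches.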

VRAM for Inference (Serving)

| Model | GGUF q8_0 (Ollama) | GGUF q4_k_m (Ollama) | FP16 (vLLM) |
| --- | --- | --- | --- |
| Granite 4 H-Tiny (2B) | ~2.5 GB | ~1.5 GB | ~4 GB |
| Granite 4 H-Small (8B) | ~9 GB | ~5 GB | ~16 GB |
| Qwen 3 8B | ~9 GB | ~5 GB | ~16 GB |
| Qwen3-30B-A3B (MoE) | ~16 GB | ~9 GB | ~32 GB |
| Gemma 4 12B | ~13 GB | ~7 GB | ~24 GB |
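For the dense models, the figures above roughly follow a weights-only rule of thumb: parameter count times bits per weight. The sketch below applies that rule; the bits-per-weight values are assumptions (16 for FP16, 8.5 for q8_0, and about 4.85 for q4_k_m as an approximate average), and the estimate ignores KV cache and runtime overhead.

```python
# Assumed average bits per weight for each serving format.
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "q8_0": 8.5,     # 8-bit values plus a per-block scale
    "q4_k_m": 4.85,  # approximate; varies slightly by model
}


def weight_memory_gb(params_billion: float, fmt: str) -> float:
    """Weights-only memory estimate in GB for a dense model."""
    return params_billion * BITS_PER_WEIGHT[fmt] / 8


for name, params in [("2B", 2), ("8B", 8), ("12B", 12)]:
    estimates = ", ".join(
        f"{fmt}: {weight_memory_gb(params, fmt):.1f} GB"
        for fmt in ("q8_0", "q4_k_m", "fp16")
    )
    print(f"{name} dense model -> {estimates}")
```

Actual usage will be higher than these estimates once context length and batch size are taken into account.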

Cloud Training Cost (Full AuroraSOC Pipeline — 9 Agents + Generic)

| Model | GPU Needed | Time | RunPod Cost | Lambda Labs Cost |
| --- | --- | --- | --- | --- |
| Granite 4 H-Tiny (2B) | RTX 3090 (24 GB) | ~3 hrs | $2.10 | $2.40 |
| Granite 4 H-Small (8B) | RTX 3090 (24 GB) | ~7 hrs | $4.90 | $5.60 |
| Qwen 3 8B | RTX 3090 (24 GB) | ~7 hrs | $4.90 | $5.60 |
| Qwen3-30B-A3B (MoE) | A100 40GB | ~5 hrs | $7.50 | $8.50 |
| Gemma 4 12B | A100 40GB | ~10 hrs | $15.00 | $17.00 |

Best Value

Granite 4 H-Small (8B) on an RTX 3090 delivers the best performance-per-dollar for AuroraSOC. It fits in 24 GB VRAM, trains in ~7 hours for all agents, and costs under $5 on RunPod.
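The cost column is simply training hours multiplied by an hourly GPU rate. The sketch below reproduces it; the hourly rates are inferred by dividing the table's costs by its hours and are not quoted provider prices, so they should be replaced with current pricing before budgeting.

```python
# Hourly rates (USD) inferred from the cost table: cost / hours.
HOURLY_RATE_USD = {
    ("RunPod", "RTX 3090"): 0.70,
    ("RunPod", "A100 40GB"): 1.50,
    ("Lambda Labs", "RTX 3090"): 0.80,
    ("Lambda Labs", "A100 40GB"): 1.70,
}


def training_cost(provider: str, gpu: str, hours: float) -> float:
    """Estimated cost of one full pipeline run, rounded to cents."""
    return round(HOURLY_RATE_USD[(provider, gpu)] * hours, 2)


runs = [
    ("Granite 4 H-Small (8B)", "RTX 3090", 7),
    ("Qwen3-30B-A3B (MoE)", "A100 40GB", 5),
    ("Gemma 4 12B", "A100 40GB", 10),
]
for model, gpu, hours in runs:
    runpod = training_cost("RunPod", gpu, hours)
    lambda_labs = training_cost("Lambda Labs", gpu, hours)
    print(f"{model}: RunPod ${runpod:.2f}, Lambda Labs ${lambda_labs:.2f}")
```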


Model-Specific Fine-Tuning Details

Granite 4 Fine-Tuning

Chat template:

<|start_of_role|>system<|end_of_role|>
You are the AuroraSOC Security Analyst...<|end_of_text|>
<|start_of_role|>user<|end_of_role|>
Analyze this alert...<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
## Alert Analysis...<|end_of_text|>

LoRA target modules (9 modules — includes Mamba SSM):

target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
- shared_mlp.input_linear # ← Mamba-specific
- shared_mlp.output_linear # ← Mamba-specific

Unsloth model ID: unsloth/granite-4.0-h-tiny (or h-small, h-micro, micro)

Special considerations:

  • Hybrid architecture means LoRA must target both Transformer AND Mamba layers
  • shared_mlp.* modules are unique to Granite 4 Hybrid — other models don't have these
  • Unsloth-optimized variants exist for all sizes, giving 2× speedup
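One way to avoid forgetting the extra modules is to derive the LoRA target list from the model ID. The sketch below is a minimal illustration: detecting hybrid variants by the "granite-4.0-h-" substring is an assumption based on the model names in this guide, and the function name is hypothetical.

```python
# Projection layers targeted for every model family in this guide.
STANDARD_TARGETS = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]

# Extra modules listed above for Granite 4 Hybrid variants.
GRANITE_HYBRID_TARGETS = [
    "shared_mlp.input_linear",
    "shared_mlp.output_linear",
]


def lora_target_modules(model_id: str) -> list:
    """Return LoRA target modules for a model ID (name-based heuristic)."""
    targets = list(STANDARD_TARGETS)
    if "granite-4.0-h-" in model_id.lower():
        targets += GRANITE_HYBRID_TARGETS
    return targets


print(len(lora_target_modules("unsloth/granite-4.0-h-tiny")))  # hybrid
print(len(lora_target_modules("unsloth/Qwen3-8B")))            # standard
```

The returned list can be passed as the target modules of whichever LoRA configuration the training script uses.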

Qwen 3 Fine-Tuning

Chat template:

<|im_start|>system
You are the AuroraSOC Security Analyst...<|im_end|>
<|im_start|>user
Analyze this alert...<|im_end|>
<|im_start|>assistant
## Alert Analysis...<|im_end|>

LoRA target modules (7 modules — standard Transformer):

target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj

Unsloth model ID: unsloth/Qwen3-8B (or Qwen3-4B, Qwen3-14B)

Special considerations:

  • Support for "thinking mode" — you can fine-tune with <think>...</think> tags for chain-of-thought reasoning
  • For security analysis tasks that benefit from step-by-step reasoning (threat hunting, forensics), enable thinking mode in training data
  • Stop tokens: <|im_end|>, <|endoftext|>

Thinking mode training example:

{
  "messages": [
    {"role": "system", "content": "You are the AuroraSOC Threat Hunter."},
    {"role": "user", "content": "Hunt for lateral movement via PsExec."},
    {"role": "assistant", "content": "<think>\nPsExec creates services on remote machines...\nKey artifacts: Event ID 7045, named pipes...\nI should check for ADMIN$ share access...\n</think>\n\n## Hunting Hypothesis\n..."}
  ]
}
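When a model trained this way is served, the reasoning block usually needs to be separated from the final answer, for example to log the reasoning but show analysts only the report. The sketch below does that with a regular expression; the function name is hypothetical and it assumes at most one <think>...</think> block per response.

```python
import re

# Matches one <think>...</think> block and the whitespace around it.
THINK_BLOCK = re.compile(r"<think>\s*(.*?)\s*</think>\s*", re.DOTALL)


def split_thinking(content: str) -> tuple:
    """Return (reasoning, answer); reasoning is empty if no block exists."""
    match = THINK_BLOCK.search(content)
    if match is None:
        return "", content
    reasoning = match.group(1)
    answer = content[:match.start()] + content[match.end():]
    return reasoning, answer


response = (
    "<think>\nPsExec creates services on remote machines...\n</think>\n\n"
    "## Hunting Hypothesis\n..."
)
reasoning, answer = split_thinking(response)
print("Reasoning:", reasoning)
print("Answer:", answer)
```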

Gemma 4 Fine-Tuning

Chat template:

<start_of_turn>user
You are the AuroraSOC Security Analyst.

Analyze this alert...<end_of_turn>
<start_of_turn>model
## Alert Analysis...<end_of_turn>

LoRA target modules (7 modules):

target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj

Unsloth model ID: unsloth/gemma-4-12b-it (or gemma-4-1b-it)

Special considerations:

  • Gemma 4 uses a system prompt baked into the first user turn (no dedicated system role)
  • The <start_of_turn> / <end_of_turn> markers are mandatory
  • Multimodal variants can process images — useful for phishing screenshot analysis
  • Gemma license is more restrictive than Apache 2.0 — review terms at ai.google.dev/gemma/terms
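The three chat formats differ mainly in their role markers and in how the system prompt is carried. The sketch below renders the same exchange in each format as printed in this guide, including the Gemma convention of prepending the system prompt to the first user turn. The function name is hypothetical; in a real pipeline the tokenizer's own chat template should be the source of truth.

```python
def render_chat(family: str, system: str, user: str, assistant: str) -> str:
    """Render one system/user/assistant exchange in a family's chat format."""
    turns = [("system", system), ("user", user), ("assistant", assistant)]
    if family == "granite":
        return "\n".join(
            f"<|start_of_role|>{role}<|end_of_role|>\n{text}<|end_of_text|>"
            for role, text in turns
        )
    if family == "qwen":
        return "\n".join(
            f"<|im_start|>{role}\n{text}<|im_end|>" for role, text in turns
        )
    if family == "gemma":
        # No system role: the system prompt is folded into the user turn.
        return (
            f"<start_of_turn>user\n{system}\n\n{user}<end_of_turn>\n"
            f"<start_of_turn>model\n{assistant}<end_of_turn>"
        )
    raise ValueError(f"unknown model family: {family}")


for family in ("granite", "qwen", "gemma"):
    print(f"--- {family} ---")
    print(render_chat(
        family,
        "You are the AuroraSOC Security Analyst.",
        "Analyze this alert...",
        "## Alert Analysis...",
    ))
```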

AuroraSOC Integration: Multi-Model Configuration

Using Different Models for Different Agents

You can configure AuroraSOC to use different models for different agents — for example, Qwen 3 for the malware analyst (strong code generation) and Granite 4 for the orchestrator (best tool calling):

# training/configs/multi_model_finetune.yaml

# Default model for most agents
model:
  name: "unsloth/granite-4.0-h-small"
  max_seq_length: 4096
  load_in_4bit: true

agent_profiles:
  # Granite 4 for orchestration (best tool calling)
  orchestrator:
    system_prompt: "You are the AuroraSOC Orchestrator..."
    dataset_filter: "orchestration"
    model_override: "unsloth/granite-4.0-h-small"
    lora_r_override: 128

  # Qwen 3 for code-generation-heavy agents
  malware_analyst:
    system_prompt: "You are the AuroraSOC Malware Analyst..."
    dataset_filter: "malware_analysis"
    model_override: "unsloth/Qwen3-8B"

  network_security:
    system_prompt: "You are the AuroraSOC Network Security Analyst..."
    dataset_filter: "network_analysis"
    model_override: "unsloth/Qwen3-8B"

  # Gemma 4 for reasoning-heavy agents (if you have the VRAM)
  forensic_analyst:
    system_prompt: "You are the AuroraSOC Forensic Analyst..."
    dataset_filter: "forensics"
    model_override: "unsloth/gemma-4-12b-it"

  incident_responder:
    system_prompt: "You are the AuroraSOC Incident Responder..."
    dataset_filter: "incident_response"
    model_override: "unsloth/gemma-4-12b-it"
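The override keys suggest a simple fallback rule: use the agent's model_override if present, otherwise the default model. The sketch below illustrates that rule on a trimmed copy of the config; how the actual training script resolves overrides is not shown in this guide, the `threat_hunter` profile without overrides is a hypothetical addition, and the default LoRA rank of 64 is an assumption taken from the QLoRA table heading.

```python
# Trimmed, in-memory copy of multi_model_finetune.yaml for illustration.
CONFIG = {
    "model": {"name": "unsloth/granite-4.0-h-small"},
    "agent_profiles": {
        "orchestrator": {
            "model_override": "unsloth/granite-4.0-h-small",
            "lora_r_override": 128,
        },
        "malware_analyst": {"model_override": "unsloth/Qwen3-8B"},
        "threat_hunter": {},  # hypothetical profile with no overrides
    },
}

DEFAULT_LORA_R = 64  # assumed default rank


def resolve_training_setup(agent: str) -> tuple:
    """Return (base_model, lora_r) for an agent, applying overrides."""
    profile = CONFIG["agent_profiles"].get(agent, {})
    base_model = profile.get("model_override", CONFIG["model"]["name"])
    lora_r = profile.get("lora_r_override", DEFAULT_LORA_R)
    return base_model, lora_r


for agent in CONFIG["agent_profiles"]:
    print(agent, "->", resolve_training_setup(agent))
```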

Updating AGENT_MODEL_MAP for Multi-Model

When using different base models, update the Ollama model tags in aurorasoc/granite/__init__.py:

AGENT_MODEL_MAP = {
    # Granite 4 agents
    "security_analyst": "granite-soc-security-analyst",
    "threat_hunter": "granite-soc-threat-hunter",
    "threat_intel": "granite-soc-threat-intel",
    "cps_security": "granite-soc-cps-security",
    "orchestrator": "granite-soc-orchestrator",

    # Qwen 3 agents (different base model, different GGUF)
    "malware_analyst": "qwen-soc-malware-analyst",
    "network_security": "qwen-soc-network-security",

    # Gemma 4 agents
    "forensic_analyst": "gemma-soc-forensic-analyst",
    "incident_responder": "gemma-soc-incident-responder",
}
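Because each tag starts with its model family, a short script can confirm that every agent points at the intended family before deployment. The sketch below groups agents by tag prefix; the "<family>-soc-<agent>" naming pattern is inferred from the map above, and the helper name is hypothetical.

```python
from collections import defaultdict

AGENT_MODEL_MAP = {
    "security_analyst": "granite-soc-security-analyst",
    "threat_hunter": "granite-soc-threat-hunter",
    "threat_intel": "granite-soc-threat-intel",
    "cps_security": "granite-soc-cps-security",
    "orchestrator": "granite-soc-orchestrator",
    "malware_analyst": "qwen-soc-malware-analyst",
    "network_security": "qwen-soc-network-security",
    "forensic_analyst": "gemma-soc-forensic-analyst",
    "incident_responder": "gemma-soc-incident-responder",
}


def family_of(model_tag: str) -> str:
    """Extract the family prefix from a '<family>-soc-<agent>' tag."""
    return model_tag.split("-soc-", 1)[0]


agents_by_family = defaultdict(list)
for agent, tag in AGENT_MODEL_MAP.items():
    agents_by_family[family_of(tag)].append(agent)

for family, agents in agents_by_family.items():
    print(f"{family}: {len(agents)} agents -> {', '.join(agents)}")
```

The same grouping tells you which chat template and stop tokens each agent's Modelfile needs.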

Ollama Modelfile Templates per Model Family

Each model family needs its own chat template in the Ollama Modelfile:

Granite 4 Modelfile:

FROM ./granite-soc-threat-hunter.Q8_0.gguf
TEMPLATE """{{- if .System }}<|start_of_role|>system<|end_of_role|>
{{ .System }}<|end_of_text|>
{{- end }}
<|start_of_role|>user<|end_of_role|>
{{ .Prompt }}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
{{ .Response }}<|end_of_text|>"""
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|start_of_role|>"

Qwen 3 Modelfile:

FROM ./qwen-soc-malware-analyst.Q8_0.gguf
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{- end }}
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}<|im_end|>"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"

Gemma 4 Modelfile:

FROM ./gemma-soc-forensic-analyst.Q8_0.gguf
TEMPLATE """<start_of_turn>user
{{ if .System }}{{ .System }}

{{ end }}{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>"""
PARAMETER stop "<end_of_turn>"

Licensing Comparison

| Aspect | Granite 4 | Qwen 3 | Gemma 4 |
| --- | --- | --- | --- |
| License | Apache 2.0 | Apache 2.0 | Gemma License |
| Commercial use | ✅ Unrestricted | ✅ Unrestricted | ✅ With conditions |
| Modification | ✅ Full freedom | ✅ Full freedom | ✅ With attribution |
| Distribution | ✅ No restrictions | ✅ No restrictions | ⚠️ Must include license |
| Government/defense use | | | ⚠️ Review terms |
| Training data provenance | ✅ IBM-guaranteed IP-clean | ⚠️ Less transparency | ⚠️ Google proprietary mix |
| Model card transparency | ✅ Detailed | ✅ Detailed | ✅ Detailed |

License Warning

If you're deploying AuroraSOC in a government, defense, or critical infrastructure environment, Granite 4's Apache 2.0 license provides the clearest legal path. Gemma's license requires careful legal review for these use cases.


Final Recommendation Matrix

| Use Case | Recommended Model | Why |
| --- | --- | --- |
| Default AuroraSOC deployment | Granite 4 H-Small (8B) | Best tool calling, agentic pre-training, Apache 2.0 |
| Resource-constrained / edge | Granite 4 H-Tiny (2B) | Smallest model with acceptable SOC performance |
| Multilingual SOC (Arabic, Chinese) | Qwen 3 8B | 119 languages, best multilingual performance |
| Security rule generation (YARA, Suricata) | Qwen 3 8B | Strongest code generation capabilities |
| Complex reasoning tasks | Gemma 4 12B | Best on reasoning benchmarks, but needs more VRAM |
| Visual malware / phishing analysis | Gemma 4 12B | Native multimodal: can analyze screenshots and PE visualizations |
| Budget-optimal cloud training | Granite 4 H-Small (8B) | Best quality at $5 on a rented RTX 3090 |
| Maximum quality, no budget limits | Gemma 4 12B + A100 | Highest benchmark scores across reasoning tasks |
| MoE efficiency (30B quality, 3B cost) | Qwen3-30B-A3B | Unique option if inference cost matters more than training cost |

Next Steps