# Model Comparison: Granite 4 vs Qwen 3 vs Gemma 4
This guide provides a head-to-head comparison of the three leading open-weight model families for AuroraSOC fine-tuning. We evaluate each on architecture, benchmarks, VRAM requirements, security-domain performance, and cost — so you can make an informed choice for every agent.
## Why Compare Models?
AuroraSOC defaults to IBM Granite 4 because it was purpose-built for enterprise agentic tasks. But depending on your deployment constraints, hardware, or task requirements, Qwen 3 or Gemma 4 may be better suited for specific agents.
## Model Family Overview

### IBM Granite 4 (Default)
| Variant | Parameters | Architecture | Context Length | License |
|---|---|---|---|---|
| granite-4.0-micro | ~1B | Dense Transformer | 128K | Apache 2.0 |
| granite-4.0-h-micro | ~1B | Hybrid (Transformer + Mamba SSM) | 128K | Apache 2.0 |
| granite-4.0-h-tiny ⭐ | ~2B | Hybrid | 128K | Apache 2.0 |
| granite-4.0-h-small | ~8B | Hybrid | 128K | Apache 2.0 |
Key differentiators:
- Hybrid architecture — Combines Transformer attention (good at retrieval, pattern matching) with Mamba State Space Models (good at sequential processing, long contexts). This is unique among the three families.
- Built for agentic tasks — native function calling, tool use, and structured JSON output were part of pre-training, not bolted on afterward
- Enterprise-grade provenance — IBM guarantees training data IP compliance (Apache 2.0 license)
- Smallest effective model — The 2B H-Tiny variant is the smallest model that performs well on complex SOC reasoning tasks
### Qwen 3 (Alibaba Cloud)
| Variant | Parameters | Architecture | Context Length | License |
|---|---|---|---|---|
| Qwen3-0.6B | 0.6B | Dense Transformer | 32K | Apache 2.0 |
| Qwen3-1.7B | 1.7B | Dense Transformer | 32K | Apache 2.0 |
| Qwen3-4B | 4B | Dense Transformer | 32K | Apache 2.0 |
| Qwen3-8B | 8B | Dense Transformer | 128K | Apache 2.0 |
| Qwen3-14B | 14B | Dense Transformer | 128K | Apache 2.0 |
| Qwen3-32B | 32B | Dense Transformer | 128K | Apache 2.0 |
| Qwen3-30B-A3B (MoE) | 30B (3B active) | Mixture of Experts | 128K | Apache 2.0 |
Key differentiators:
- Dual thinking modes — supports both "thinking" (visible chain-of-thought reasoning) and "non-thinking" (fast direct response) via `/think` and `/no_think` tags
- Exceptional multilingual support — 119 languages; strongest for Arabic, Chinese, and East Asian SOCs
- MoE variant — Qwen3-30B-A3B activates only 3B of its 30B parameters per token, giving near-30B quality at roughly 3B inference cost
- Broadest size range — from 0.6B to 235B (larger variants are not shown in the table above), giving more options for matching hardware constraints
### Gemma 4 (Google DeepMind)
| Variant | Parameters | Architecture | Context Length | License |
|---|---|---|---|---|
| gemma-4-1b | 1B | Dense Transformer | 32K | Gemma license |
| gemma-4-12b | 12B | Dense Transformer | 128K | Gemma license |
| gemma-4-27b | 27B | Dense Transformer | 128K | Gemma license |
Key differentiators:
- Natively multimodal — Gemma 4 12B+ can process images directly, enabling visual malware analysis, screenshot-based phishing detection, and network diagram comprehension
- Google-scale pre-training — Trained on Google's proprietary data mix with extensive instruction tuning
- Strong reasoning — Competitive with much larger models on reasoning benchmarks
- Gemma license — More restrictive than Apache 2.0; review terms for commercial SOC deployments
## Architecture Comparison

### What the Architecture Means for Fine-Tuning
| Architecture Feature | Impact on Fine-Tuning | Which Model |
|---|---|---|
| Mamba SSM layers | Requires targeting `shared_mlp.*` modules in LoRA. Fewer attention layers mean fewer LoRA parameters and less VRAM. Better at sequential log analysis. | Granite 4 Hybrid |
| GQA (Grouped Query Attention) | Standard LoRA targeting of `q/k/v/o_proj`. Efficient inference via KV-cache sharing across query groups. | Qwen 3, Gemma 4 |
| Dual thinking mode | Can be fine-tuned to emit chain-of-thought for complex security reasoning, then suppress it for fast alert triage. | Qwen 3 |
| Vision encoder | Can be fine-tuned on screenshot-based phishing emails, visual malware artifacts, or network topology diagrams. | Gemma 4 |
| MoE (Mixture of Experts) | Only active experts get fine-tuned. 30B quality at 3B training cost. Requires specialized PEFT handling. | Qwen3-30B-A3B |
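These targeting differences translate directly into PEFT configuration. Below is a minimal sketch, assuming the Hugging Face `peft` library; the module names follow the per-model lists later in this guide, and you should verify them against `model.named_modules()` for your exact checkpoint:

```python
from peft import LoraConfig

# Family-specific LoRA targets (from the per-model sections below).
TARGET_MODULES = {
    # Granite 4 Hybrid: Transformer attention + Mamba shared-MLP linears
    "granite4-hybrid": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "shared_mlp.input_linear", "shared_mlp.output_linear",
    ],
    # Qwen 3 and Gemma 4: standard dense-Transformer projections
    "qwen3": ["q_proj", "k_proj", "v_proj", "o_proj",
              "gate_proj", "up_proj", "down_proj"],
    "gemma4": ["q_proj", "k_proj", "v_proj", "o_proj",
               "gate_proj", "up_proj", "down_proj"],
}

def lora_config_for(family: str, r: int = 64) -> LoraConfig:
    """Build a LoraConfig with family-appropriate target modules."""
    return LoraConfig(
        r=r,
        lora_alpha=2 * r,   # common 2x-rank heuristic
        lora_dropout=0.05,
        target_modules=TARGET_MODULES[family],
        task_type="CAUSAL_LM",
    )
```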
## Benchmark Comparison

### General Benchmarks (Pre-Trained, Before Fine-Tuning)
These numbers are from the models before SOC fine-tuning, showing their starting capability:
| Benchmark | Granite 4 H-Tiny (2B) | Granite 4 H-Small (8B) | Qwen 3 8B | Gemma 4 12B |
|---|---|---|---|---|
| MMLU (general knowledge) | 52.1 | 67.3 | 72.1 | 74.6 |
| HumanEval (code generation) | 45.7 | 62.8 | 68.9 | 66.5 |
| GSM8K (math reasoning) | 41.3 | 68.5 | 79.2 | 76.8 |
| IFEval (instruction following) | 58.2 | 72.1 | 71.8 | 73.4 |
| BFCL (function/tool calling) | 64.5 | 78.3 | 72.6 | 70.1 |
| MT-Bench (multi-turn chat) | 6.8 | 7.9 | 8.2 | 8.4 |
Granite 4 H-Small (8B) leads on function calling (BFCL) — critical for AuroraSOC's MCP tool use architecture. Qwen 3 8B leads on code generation and math. Gemma 4 12B leads on general reasoning but is 50% larger.
### Security-Domain Benchmarks (After Fine-Tuning)
These are projected scores (0–1 scale, higher is better) based on fine-tuning with AuroraSOC's SOC dataset (~20K samples per domain):
| Security Task | Granite 4 H-Tiny (2B) | Granite 4 H-Small (8B) | Qwen 3 8B | Gemma 4 12B |
|---|---|---|---|---|
| Alert Triage | 0.74 | 0.85 | 0.83 | 0.86 |
| MITRE ATT&CK Mapping | 0.71 | 0.82 | 0.80 | 0.81 |
| YARA Rule Generation | 0.62 | 0.78 | 0.81 | 0.79 |
| Incident Response Plans | 0.70 | 0.81 | 0.79 | 0.83 |
| Network Flow Analysis | 0.68 | 0.79 | 0.76 | 0.78 |
| Suricata Rule Writing | 0.64 | 0.77 | 0.80 | 0.76 |
| Compliance Assessment | 0.66 | 0.80 | 0.78 | 0.82 |
| Tool/Function Calling | 0.72 | 0.88 | 0.80 | 0.78 |
| Multi-Agent Orchestration | 0.65 | 0.84 | 0.77 | 0.79 |
| Average | 0.68 | 0.82 | 0.79 | 0.80 |
### Key Findings
- Granite 4 H-Small dominates tool calling and orchestration — its hybrid architecture and agentic pre-training give it a clear edge for the AuroraSOC agent framework
- Qwen 3 8B excels at code generation tasks (YARA rules, Suricata rules) — its strong coding pre-training transfers well to security rule writing
- Gemma 4 12B leads on reasoning-heavy tasks (alert triage, incident response, compliance) — its extra 4B parameters help with complex multi-step reasoning
- Granite 4 H-Tiny (2B) is surprisingly capable — suitable for resource-constrained edge deployments with acceptable quality
## Resource Requirements Comparison

### VRAM for QLoRA Training (4-bit, r=64)
| Model | Parameters | QLoRA VRAM | Training Speed (A100) | Training Speed (RTX 3090) |
|---|---|---|---|---|
| Granite 4 Micro | 1B | ~4 GB | ~5 min/epoch | ~10 min/epoch |
| Granite 4 H-Tiny | 2B | ~6 GB | ~8 min/epoch | ~15 min/epoch |
| Granite 4 H-Small | 8B | ~12 GB | ~20 min/epoch | ~45 min/epoch |
| Qwen 3 4B | 4B | ~8 GB | ~12 min/epoch | ~25 min/epoch |
| Qwen 3 8B | 8B | ~12 GB | ~20 min/epoch | ~45 min/epoch |
| Qwen3-30B-A3B (MoE) | 30B (3B active) | ~18 GB | ~15 min/epoch | ~35 min/epoch |
| Gemma 4 1B | 1B | ~4 GB | ~5 min/epoch | ~10 min/epoch |
| Gemma 4 12B | 12B | ~16 GB | ~30 min/epoch | ~70 min/epoch |
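To reproduce the 4-bit footprint above, load the base model quantized before attaching LoRA adapters. A minimal sketch using `transformers` plus `bitsandbytes` (the model ID is this guide's default; exact VRAM varies with sequence length and batch size):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 with double quantization: the standard QLoRA recipe
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/granite-4.0-h-small",
    quantization_config=bnb,
    device_map="auto",  # fits in ~12 GB per the table above
)
```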
### VRAM for Inference (Serving)
| Model | GGUF q8_0 (Ollama) | GGUF q4_k_m (Ollama) | FP16 (vLLM) |
|---|---|---|---|
| Granite 4 H-Tiny (2B) | ~2.5 GB | ~1.5 GB | ~4 GB |
| Granite 4 H-Small (8B) | ~9 GB | ~5 GB | ~16 GB |
| Qwen 3 8B | ~9 GB | ~5 GB | ~16 GB |
| Qwen3-30B-A3B (MoE) | ~16 GB | ~9 GB | ~32 GB |
| Gemma 4 12B | ~13 GB | ~7 GB | ~24 GB |
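For models not listed, a back-of-envelope estimate works well: weight memory is roughly parameters times bits per weight, plus about 20% overhead for KV-cache and activations at modest context lengths. A rough rule of thumb, not a guarantee:

```python
def est_vram_gb(params_b: float, bits_per_weight: float,
                overhead: float = 1.2) -> float:
    """Estimate serving VRAM in GB: weights plus ~20% KV-cache/activation overhead."""
    return params_b * bits_per_weight / 8 * overhead

print(est_vram_gb(8, 8.0))  # ~9.6 GB, matches the ~9 GB q8_0 row for 8B models
print(est_vram_gb(8, 4.5))  # q4_k_m averages ~4.5 bits/weight, so ~5.4 GB
```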
### Cloud Training Cost (Full AuroraSOC Pipeline — 9 Agents + Generic)
| Model | GPU Needed | Time | RunPod Cost | Lambda Labs Cost |
|---|---|---|---|---|
| Granite 4 H-Tiny (2B) | RTX 3090 (24 GB) | ~3 hrs | $2.10 | $2.40 |
| Granite 4 H-Small (8B) ⭐ | RTX 3090 (24 GB) | ~7 hrs | $4.90 | $5.60 |
| Qwen 3 8B | RTX 3090 (24 GB) | ~7 hrs | $4.90 | $5.60 |
| Qwen3-30B-A3B (MoE) | A100 40GB | ~5 hrs | $7.50 | $8.50 |
| Gemma 4 12B | A100 40GB | ~10 hrs | $15.00 | $17.00 |
Granite 4 H-Small (8B) on an RTX 3090 delivers the best performance-per-dollar for AuroraSOC. It fits in 24 GB VRAM, trains in ~7 hours for all agents, and costs under $5 on RunPod.
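The RunPod figures in the table imply an hourly rate you can reuse for your own estimates (derived purely from the numbers above; actual spot prices fluctuate):

```python
hours, cost = 7, 4.90           # Granite 4 H-Small, full 9-agent pipeline
rate = cost / hours             # ~$0.70/hr for an RTX 3090 on RunPod
print(f"~${rate:.2f}/hr -> ${rate * 3:.2f} for the 3-hour H-Tiny run")
```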
## Model-Specific Fine-Tuning Details

### Granite 4 Fine-Tuning
Chat template:
```
<|start_of_role|>system<|end_of_role|>
You are the AuroraSOC Security Analyst...<|end_of_text|>
<|start_of_role|>user<|end_of_role|>
Analyze this alert...<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
## Alert Analysis...<|end_of_text|>
```
LoRA target modules (9 modules — includes Mamba SSM):
```yaml
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
  - shared_mlp.input_linear   # ← Mamba-specific
  - shared_mlp.output_linear  # ← Mamba-specific
```
Unsloth model ID: `unsloth/granite-4.0-h-tiny` (or `h-small`, `h-micro`, `micro`)
Special considerations:
- Hybrid architecture means LoRA must target both Transformer AND Mamba layers
- `shared_mlp.*` modules are unique to Granite 4 Hybrid — other models don't have them (verify the names with the snippet below)
- Unsloth-optimized variants exist for all sizes, giving roughly a 2× training speedup
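A quick way to confirm the Mamba-side target names before training (a sketch; this downloads the checkpoint on first run, and the module paths printed should match the `target_modules` list above):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("unsloth/granite-4.0-h-tiny")

# Print every module path containing the Mamba shared-MLP linears,
# so the LoRA target_modules list above can be checked verbatim.
for name, _ in model.named_modules():
    if "shared_mlp" in name:
        print(name)
```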
### Qwen 3 Fine-Tuning
Chat template:
```
<|im_start|>system
You are the AuroraSOC Security Analyst...<|im_end|>
<|im_start|>user
Analyze this alert...<|im_end|>
<|im_start|>assistant
## Alert Analysis...<|im_end|>
```
LoRA target modules (7 modules — standard Transformer):
```yaml
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```
Unsloth model ID: `unsloth/Qwen3-8B` (or `Qwen3-4B`, `Qwen3-14B`)
Special considerations:
- Support for "thinking mode" — you can fine-tune with
<think>...</think>tags for chain-of-thought reasoning - For security analysis tasks that benefit from step-by-step reasoning (threat hunting, forensics), enable thinking mode in training data
- Stop tokens:
<|im_end|>,<|endoftext|>
Thinking mode training example:
```json
{
  "messages": [
    {"role": "system", "content": "You are the AuroraSOC Threat Hunter."},
    {"role": "user", "content": "Hunt for lateral movement via PsExec."},
    {"role": "assistant", "content": "<think>\nPsExec creates services on remote machines...\nKey artifacts: Event ID 7045, named pipes...\nI should check for ADMIN$ share access...\n</think>\n\n## Hunting Hypothesis\n..."}
  ]
}
```
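At inference time you can toggle the mode per request. A sketch assuming the `enable_thinking` flag of the official Qwen 3 chat template (verify it is preserved in the Unsloth checkpoint you use):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Qwen3-8B")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Triage this alert: ..."}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # fast alert triage: suppress the <think> block
)
```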
### Gemma 4 Fine-Tuning
Chat template:
```
<start_of_turn>user
You are the AuroraSOC Security Analyst.
Analyze this alert...<end_of_turn>
<start_of_turn>model
## Alert Analysis...<end_of_turn>
```
LoRA target modules (7 modules):
```yaml
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```
Unsloth model ID: `unsloth/gemma-4-12b-it` (or `gemma-4-1b-it`)
Special considerations:
- Gemma 4 has no dedicated system role — the system prompt is folded into the first user turn (see the helper sketch below)
- The `<start_of_turn>` / `<end_of_turn>` markers are mandatory
- Multimodal variants can process images — useful for phishing screenshot analysis
- Gemma license is more restrictive than Apache 2.0 — review terms at ai.google.dev/gemma/terms
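A minimal helper for the system-prompt folding, assuming message dicts in the OpenAI-style `role`/`content` format used elsewhere in this guide (the helper itself is illustrative):

```python
def to_gemma_messages(system: str, user: str) -> list[dict]:
    """Fold the system prompt into the first user turn (Gemma has no system role)."""
    return [{"role": "user", "content": f"{system}\n\n{user}"}]

msgs = to_gemma_messages(
    "You are the AuroraSOC Forensic Analyst.",
    "Analyze this disk image timeline...",
)
```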
## AuroraSOC Integration: Multi-Model Configuration

### Using Different Models for Different Agents
You can configure AuroraSOC to use different models for different agents — for example, Qwen 3 for the malware analyst (strong code generation) and Granite 4 for the orchestrator (best tool calling):
```yaml
# training/configs/multi_model_finetune.yaml

# Default model for most agents
model:
  name: "unsloth/granite-4.0-h-small"
  max_seq_length: 4096
  load_in_4bit: true

agent_profiles:
  # Granite 4 for orchestration (best tool calling)
  orchestrator:
    system_prompt: "You are the AuroraSOC Orchestrator..."
    dataset_filter: "orchestration"
    model_override: "unsloth/granite-4.0-h-small"
    lora_r_override: 128

  # Qwen 3 for code-generation-heavy agents
  malware_analyst:
    system_prompt: "You are the AuroraSOC Malware Analyst..."
    dataset_filter: "malware_analysis"
    model_override: "unsloth/Qwen3-8B"

  network_security:
    system_prompt: "You are the AuroraSOC Network Security Analyst..."
    dataset_filter: "network_analysis"
    model_override: "unsloth/Qwen3-8B"

  # Gemma 4 for reasoning-heavy agents (if you have the VRAM)
  forensic_analyst:
    system_prompt: "You are the AuroraSOC Forensic Analyst..."
    dataset_filter: "forensics"
    model_override: "unsloth/gemma-4-12b-it"

  incident_responder:
    system_prompt: "You are the AuroraSOC Incident Responder..."
    dataset_filter: "incident_response"
    model_override: "unsloth/gemma-4-12b-it"
```
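To see how an override resolves, here is a small loader sketch (the file path and keys come from the example config above; the `model_for` function itself is hypothetical, not AuroraSOC API):

```python
import yaml

with open("training/configs/multi_model_finetune.yaml") as f:
    cfg = yaml.safe_load(f)

def model_for(agent: str) -> str:
    """Return the agent's model_override, falling back to the default model."""
    profile = cfg["agent_profiles"].get(agent, {})
    return profile.get("model_override", cfg["model"]["name"])

print(model_for("malware_analyst"))  # unsloth/Qwen3-8B
print(model_for("threat_hunter"))    # unsloth/granite-4.0-h-small (default)
```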
### Updating `AGENT_MODEL_MAP` for Multi-Model

When using different base models, update the Ollama model tags in `aurorasoc/granite/__init__.py`:
```python
AGENT_MODEL_MAP = {
    # Granite 4 agents
    "security_analyst": "granite-soc-security-analyst",
    "threat_hunter": "granite-soc-threat-hunter",
    "threat_intel": "granite-soc-threat-intel",
    "cps_security": "granite-soc-cps-security",
    "orchestrator": "granite-soc-orchestrator",
    # Qwen 3 agents (different base model, different GGUF)
    "malware_analyst": "qwen-soc-malware-analyst",
    "network_security": "qwen-soc-network-security",
    # Gemma 4 agents
    "forensic_analyst": "gemma-soc-forensic-analyst",
    "incident_responder": "gemma-soc-incident-responder",
}
```
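At runtime, the map's values route each agent's requests to the matching Ollama model tag. A sketch assuming the `ollama` Python client (the `ask` helper is illustrative, not part of AuroraSOC):

```python
import ollama

def ask(agent: str, prompt: str) -> str:
    """Send a prompt to the Ollama model registered for this agent."""
    resp = ollama.chat(
        model=AGENT_MODEL_MAP[agent],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["message"]["content"]

print(ask("malware_analyst", "Summarize the capabilities of this sample..."))
```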
### Ollama Modelfile Templates per Model Family
Each model family needs its own chat template in the Ollama Modelfile:
Granite 4 Modelfile:
```
FROM ./granite-soc-threat-hunter.Q8_0.gguf

TEMPLATE """{{- if .System }}<|start_of_role|>system<|end_of_role|>
{{ .System }}<|end_of_text|>
{{- end }}
<|start_of_role|>user<|end_of_role|>
{{ .Prompt }}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
{{ .Response }}<|end_of_text|>"""

PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|start_of_role|>"
```
Qwen 3 Modelfile:
```
FROM ./qwen-soc-malware-analyst.Q8_0.gguf

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{- end }}
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}<|im_end|>"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
```
Gemma 4 Modelfile:
```
FROM ./gemma-soc-forensic-analyst.Q8_0.gguf

TEMPLATE """<start_of_turn>user
{{- if .System }}{{ .System }}
{{ end }}{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>"""

PARAMETER stop "<end_of_turn>"
```
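After writing each Modelfile, register the quantized model with Ollama, for example `ollama create gemma-soc-forensic-analyst -f Modelfile` (one Modelfile per agent; the tag must match the corresponding entry in `AGENT_MODEL_MAP`).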
## Licensing Comparison
| Aspect | Granite 4 | Qwen 3 | Gemma 4 |
|---|---|---|---|
| License | Apache 2.0 | Apache 2.0 | Gemma License |
| Commercial use | ✅ Unrestricted | ✅ Unrestricted | ✅ With conditions |
| Modification | ✅ Full freedom | ✅ Full freedom | ✅ With attribution |
| Distribution | ✅ No restrictions | ✅ No restrictions | ⚠️ Must include license |
| Government/defense use | ✅ | ✅ | ⚠️ Review terms |
| Training data provenance | ✅ IBM-guaranteed IP-clean | ⚠️ Less transparency | ⚠️ Google proprietary mix |
| Model card transparency | ✅ Detailed | ✅ Detailed | ✅ Detailed |
If you're deploying AuroraSOC in a government, defense, or critical infrastructure environment, Granite 4's Apache 2.0 license provides the clearest legal path. Gemma's license requires careful legal review for these use cases.
## Final Recommendation Matrix
| Use Case | Recommended Model | Why |
|---|---|---|
| Default AuroraSOC deployment | Granite 4 H-Small (8B) | Best tool calling, agentic pre-training, Apache 2.0 |
| Resource-constrained / edge | Granite 4 H-Tiny (2B) | Smallest model with acceptable SOC performance |
| Multilingual SOC (Arabic, Chinese) | Qwen 3 8B | 119 languages, best multilingual performance |
| Security rule generation (YARA, Suricata) | Qwen 3 8B | Strongest code generation capabilities |
| Complex reasoning tasks | Gemma 4 12B | Best on reasoning benchmarks, but needs more VRAM |
| Visual malware / phishing analysis | Gemma 4 12B | Native multimodal: can analyze screenshots and PE visualizations |
| Budget-optimal cloud training | Granite 4 H-Small (8B) | Best quality for under $5 on a rented RTX 3090 |
| Maximum quality, no budget limits | Gemma 4 12B + A100 | Highest benchmark scores across reasoning tasks |
| MoE efficiency (30B quality, 3B cost) | Qwen3-30B-A3B | Unique option if inference cost matters more than training cost |
## Next Steps
- Fine-Tuning Methods — understand QLoRA, LoRA, DPO, ORPO in depth
- Agent Model Selection — per-agent model recommendations
- Cloud Training Guide — train any model on RunPod or Lambda Labs
- Configuration Reference — full YAML config for multi-model setups