Agent-Specific Model Selection Guide
AuroraSOC has 16 specialized agents, each handling different security domains. This guide maps every agent to its optimal model, fine-tuning method, and configuration — backed by benchmark data and real resource requirements.
The Agent Landscape
Quick Reference: Best Model Per Agent
| Agent | Best Model | Why | Fine-Tuning Method | LoRA Rank | Training Time |
|---|---|---|---|---|---|
| Orchestrator | Granite 4 H-Small (8B) | Best tool calling + routing | QLoRA + SFT | 128 | ~45 min |
| Security Analyst | Granite 4 H-Small (8B) | Strong classification + tool use | QLoRA + SFT | 64 | ~25 min |
| Threat Hunter | Qwen 3 8B | Strong reasoning + query generation | QLoRA + SFT | 64 | ~25 min |
| Malware Analyst | Qwen 3 8B | Best code generation (YARA, decompilation) | QLoRA + SFT | 64 | ~30 min |
| Forensic Analyst | Gemma 4 12B | Complex multi-step reasoning | QLoRA + SFT | 64 | ~40 min |
| Threat Intel | Granite 4 H-Small (8B) | Structured output (STIX/TAXII) | QLoRA + SFT | 64 | ~25 min |
| Incident Responder | Gemma 4 12B | Long-form response planning | QLoRA + SFT | 64 | ~40 min |
| Vulnerability Manager | Granite 4 H-Small (8B) | CVSS scoring + prioritization | QLoRA + SFT | 64 | ~20 min |
| Compliance Analyst | Granite 4 H-Small (8B) | Framework mapping + structured output | QLoRA + SFT | 64 | ~25 min |
| Network Security | Qwen 3 8B | Suricata rule generation | QLoRA + SFT | 64 | ~25 min |
| Endpoint Security | Granite 4 H-Small (8B) | EDR alert triage + tool calling | QLoRA + SFT | 64 | ~25 min |
| Cloud Security | Granite 4 H-Small (8B) | API/tool calling for cloud services | QLoRA + SFT | 64 | ~25 min |
| CPS/OT Security | Granite 4 H-Small (8B) | Specialized protocol knowledge | QLoRA + SFT | 64 | ~30 min |
| Web Security | Qwen 3 8B | Code analysis (XSS, SQLi patterns) | QLoRA + SFT | 64 | ~25 min |
| UEBA Analyst | Granite 4 H-Small (8B) | Behavioral pattern analysis | QLoRA + SFT | 64 | ~20 min |
| Report Generator | Gemma 4 12B | Long-form coherent writing | QLoRA + SFT | 64 | ~30 min |
Detailed Per-Agent Analysis
1. Orchestrator
The orchestrator is the most critical agent — it receives every user request and routes it to the correct specialist. Poor orchestration = poor results regardless of specialist quality.
| Aspect | Details |
|---|---|
| Best model | Granite 4 H-Small (8B) |
| Why | Highest function calling score (BFCL: 78.3%). The orchestrator must parse intent, select tools, and dispatch to other agents — all via structured function calls. Granite 4's agentic pre-training makes it 8-10% better at this than alternatives. |
| LoRA rank | 128 (double the default) — orchestrator needs maximum capacity to understand all 15 agent domains |
| Training data | Orchestration routing examples, multi-turn delegation conversations, tool selection scenarios |
| Alternative | Qwen3-30B-A3B (MoE) — 30B quality at 3B inference cost, but requires more VRAM for training |
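To make the routing contract concrete, here is a minimal sketch of the kind of function-calling schema the orchestrator is fine-tuned against. The `route_to_agent` name and its fields are illustrative, not AuroraSOC's actual API; any OpenAI-style tool definition with an agent enum works the same way.

```python
# Hypothetical routing tool schema (illustrative, not AuroraSOC's actual API).
# Fine-tuning teaches the orchestrator to emit a structured call like this
# instead of free-form text, so routing decisions can be parsed deterministically.
ROUTE_TOOL = {
    "type": "function",
    "function": {
        "name": "route_to_agent",
        "description": "Dispatch a security request to a specialist agent.",
        "parameters": {
            "type": "object",
            "properties": {
                "agent": {
                    "type": "string",
                    "enum": [
                        "security_analyst", "threat_hunter", "malware_analyst",
                        "forensic_analyst", "threat_intel", "incident_responder",
                        # ...the remaining nine specialists
                    ],
                },
                "task": {
                    "type": "string",
                    "description": "The task, restated for the specialist.",
                },
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high", "critical"],
                },
            },
            "required": ["agent", "task"],
        },
    },
}
```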
Configuration:
orchestrator:
system_prompt: |
You are the AuroraSOC Orchestrator. You analyze security requests
and route them to the appropriate specialist agent...
dataset_filter: "orchestration"
model_override: "unsloth/granite-4.0-h-small"
lora_r_override: 128
output_dir: "training/output/orchestrator"
2. Security Analyst (Alert Triage)
| Aspect | Details |
|---|---|
| Best model | Granite 4 H-Small (8B) |
| Why | Alert triage requires rapid classification + tool calling to enrich alerts via SIEM/SOAR integrations. Granite 4's 0.85 alert triage score + 0.88 tool calling score make it ideal. |
| Key skills | Severity classification, IOC extraction, alert enrichment via MCP tools, false positive identification |
| Training data | Alert classification examples (Suricata, Snort, Sigma), IOC extraction, severity mapping to MITRE tactics |
Benchmark detail:
| Metric | Granite 4 (8B) | Qwen 3 (8B) | Gemma 4 (12B) |
|---|---|---|---|
| Alert severity classification | 94% accuracy | 91% | 92% |
| IOC extraction (precision) | 89% | 86% | 88% |
| False positive rate | 8% | 11% | 9% |
| Tool calling success rate | 96% | 88% | 85% |
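For context on what the SFT data looks like, below is an illustrative shape of a single triage training record in chat format. The alert text and JSON fields are hypothetical; the key property is that the assistant turn is machine-parseable JSON rather than prose.

```python
# Illustrative shape of one alert-triage SFT record (chat format). The alert
# and field names are hypothetical; what matters is that the target output
# is structured JSON the SOC pipeline can parse directly.
triage_example = {
    "messages": [
        {"role": "system",
         "content": "You are the AuroraSOC Security Analyst. Triage the alert and respond with JSON."},
        {"role": "user",
         "content": "Suricata alert: ET MALWARE Possible Cobalt Strike beacon, 10.0.4.17 -> 203.0.113.9:443"},
        {"role": "assistant",
         "content": ('{"severity": "high",'
                     ' "iocs": ["203.0.113.9"],'
                     ' "mitre_tactic": "TA0011",'  # Command and Control
                     ' "false_positive_likelihood": "low",'
                     ' "recommended_action": "isolate host and pivot to a hunt"}')},
    ]
}
```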
3. Threat Hunter
| Aspect | Details |
|---|---|
| Best model | Qwen 3 8B |
| Why | Threat hunting requires generating KQL/SPL queries, constructing hypotheses, and chain-of-thought reasoning. Qwen 3's "thinking mode" with <think> tags enables visible reasoning chains, and its strong code generation produces better detection queries. |
| Key skills | Hypothesis construction, KQL/SPL query generation, MITRE ATT&CK mapping, proactive detection logic |
| Training data | Hunting hypotheses, detection queries, ATT&CK technique narratives, adversary emulation plans |
Why thinking mode matters for hunting:
User: Hunt for lateral movement via PsExec in our Windows environment.
Qwen 3 (with thinking mode):
<think>
PsExec uses SMB to copy a service binary to ADMIN$ on the remote host...
Key detection points:
1. Event ID 7045 (new service installed) with random service names
2. Network connections to port 445 followed by service creation
3. Named pipe: \PIPE\svcctl for SCM communication
4. PsExec binaries leave artifacts in %SystemRoot%\PSEXESVC.exe
I should construct a multi-stage detection...
</think>
## Hunting Hypothesis
Adversaries using PsExec for lateral movement will generate...
[structured output with KQL queries]
The visible thinking chain helps SOC analysts understand and validate the hunting logic.
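If you surface these traces in a UI, you will want to separate the reasoning from the final answer. A minimal sketch, assuming the model emits at most one well-formed `<think>...</think>` block:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a thinking-mode completion."""
    match = THINK_RE.search(text)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = THINK_RE.sub("", text, count=1).strip()
    return reasoning, answer

sample = "<think>PsExec drops PSEXESVC.exe via ADMIN$...</think>\n## Hunting Hypothesis\n..."
reasoning, answer = split_thinking(sample)  # show `reasoning` in a collapsible panel
```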
4. Malware Analyst
| Aspect | Details |
|---|---|
| Best model | Qwen 3 8B |
| Why | Malware analysis is heavily code-centric: writing YARA rules, analyzing decompiled code, understanding shellcode. Qwen 3 scores 0.81 on YARA generation vs Granite 4's 0.78 — a meaningful difference when rule accuracy is critical. |
| Key skills | YARA rule writing, PE analysis, shellcode interpretation, behavioral analysis, sandbox report parsing |
| Training data | YARA rule examples, malware family descriptions, PE header analysis, behavioral IOC extraction |
YARA generation example:
// Qwen 3 8B output (more precise, fewer false positives):
rule Emotet_Loader_2024 {
meta:
description = "Detects Emotet loader stage"
author = "AuroraSOC"
severity = "critical"
mitre = "T1059.001"
strings:
$mz = { 4D 5A }
$api1 = "VirtualAllocEx" ascii
$api2 = "WriteProcessMemory" ascii
$enc = { 8B ?? ?? ?? 33 ?? 89 ?? ?? ?? C1 ?? 05 }
$c2_pattern = /https?:\/\/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{2,5}\//
condition:
$mz at 0 and 2 of ($api*) and $enc and $c2_pattern
}
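Because even a strong model occasionally emits rules that do not compile, it is worth gating generated rules through a compile check before deployment. A minimal sketch using the `yara-python` package (assumed installed):

```python
import yara  # pip install yara-python

def validate_rule(rule_source: str) -> bool:
    """Compile a model-generated rule; reject it if compilation fails."""
    try:
        yara.compile(source=rule_source)
        return True
    except yara.SyntaxError as exc:
        print(f"Generated rule rejected: {exc}")
        return False
```

A rejected rule can simply be fed back to the agent together with the compiler error for regeneration.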
5. Forensic Analyst
| Aspect | Details |
|---|---|
| Best model | Gemma 4 12B |
| Why | Digital forensics requires the most complex multi-step reasoning: analyzing memory dumps, correlating disk artifacts across timelines, maintaining chain-of-custody logic. Gemma 4's additional parameters (12B vs 8B) give it an edge on these reasoning-heavy tasks (0.83 vs 0.79 for IR planning). |
| Key skills | Memory forensics (Volatility), disk analysis, timeline reconstruction, evidence handling, chain-of-custody documentation |
| Training data | Forensic investigation walkthroughs, Volatility output analysis, timeline reconstruction examples, evidence collection procedures |
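The core of timeline reconstruction is merging timestamped artifacts from independent sources into one ordered view, which is exactly the correlation the model must reason about. A small illustration with a hypothetical record format:

```python
from heapq import merge
from datetime import datetime, timezone

# Hypothetical records: (timestamp, source, event). Each source list is
# already time-ordered, so heapq.merge yields one unified timeline.
mft_events = [(datetime(2024, 5, 2, 9, 14, tzinfo=timezone.utc), "MFT", "payload.exe created")]
evtx_events = [(datetime(2024, 5, 2, 9, 15, tzinfo=timezone.utc), "EVTX", "7045: service PSEXESVC installed")]
netflow_events = [(datetime(2024, 5, 2, 9, 13, tzinfo=timezone.utc), "NetFlow", "SMB to 10.0.4.20:445")]

for ts, source, event in merge(mft_events, evtx_events, netflow_events):
    print(f"{ts.isoformat()}  [{source:7}] {event}")
```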
6. Threat Intelligence Analyst
| Aspect | Details |
|---|---|
| Best model | Granite 4 H-Small (8B) |
| Why | Threat intel requires generating structured output (STIX 2.1 bundles, Diamond Model analyses) and making API calls to threat intel platforms. Granite 4's structured JSON output capabilities and tool calling make it ideal. |
| Key skills | APT attribution, STIX/TAXII generation, Diamond Model analysis, campaign tracking, IOC correlation |
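For reference, this is the kind of STIX 2.1 object the agent is trained to emit, shown here via the `stix2` Python package (assumed installed); the indicator values are illustrative:

```python
from stix2 import Indicator, Bundle  # pip install stix2

indicator = Indicator(
    name="Emotet C2 address",
    pattern="[ipv4-addr:value = '203.0.113.9']",
    pattern_type="stix",  # required in STIX 2.1
)
bundle = Bundle(objects=[indicator])
print(bundle.serialize(pretty=True))
```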
7. Incident Responder
| Aspect | Details |
|---|---|
| Best model | Gemma 4 12B |
| Why | Incident response demands generating comprehensive, multi-phase response plans following NIST 800-61. The plans must be coherent, sequenced correctly, and account for dependencies. Gemma 4's superior reasoning produces better-structured response plans. |
| Key skills | NIST 800-61 playbooks, containment strategies, eradication procedures, recovery planning, lessons learned documentation |
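A sketch of the plan skeleton the agent fills in, following the NIST 800-61 lifecycle; the individual steps are illustrative:

```python
# NIST 800-61 response-plan skeleton. Phase names follow the standard;
# the step contents are illustrative placeholders.
NIST_800_61_PLAN = {
    "preparation": ["confirm playbook and on-call roster"],
    "detection_and_analysis": ["scope affected hosts", "preserve volatile evidence"],
    "containment_eradication_recovery": [
        "isolate hosts at the EDR layer",
        "revoke compromised credentials",
        "restore from known-good backups",
    ],
    "post_incident": ["lessons-learned review", "update detections"],
}
```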
8-16. Remaining Agents (Summary)
| Agent | Best Model | Key Reasoning |
|---|---|---|
| Vulnerability Manager | Granite 4 H-Small | CVSS scoring is a structured task; Granite 4 excels at structured output with tool integration |
| Compliance Analyst | Granite 4 H-Small | Framework mapping (CIS → NIST → PCI) requires tool-calling to compliance databases |
| Network Security | Qwen 3 8B | Suricata rule writing is code generation; Qwen 3's coding strength transfers well |
| Endpoint Security | Granite 4 H-Small | EDR triage requires rapid tool-calling classification; Granite 4's agentic pre-training excels |
| Cloud Security | Granite 4 H-Small | Cloud API interaction requires structured function calling to AWS/Azure/GCP services |
| CPS/OT Security | Granite 4 H-Small | Specialized protocol knowledge (Modbus, DNP3) with structured analysis output |
| Web Security | Qwen 3 8B | Analyzing XSS, SQLi, and code patterns is inherently a code-analysis task |
| UEBA Analyst | Granite 4 H-Small | Behavioral baselines and anomaly detection via structured tool outputs |
| Report Generator | Gemma 4 12B | Long-form coherent document generation benefits from additional model capacity |
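Taken together, the assignments above reduce to a simple agent-to-base-model map that a serving layer can consult per request. The key and model names below are illustrative, not AuroraSOC's actual configuration:

```python
# Illustrative agent-to-base-model map implied by the tables above.
AGENT_BASE_MODEL = {
    "orchestrator": "granite-4-h-small",
    "security_analyst": "granite-4-h-small",
    "threat_hunter": "qwen3-8b",
    "malware_analyst": "qwen3-8b",
    "forensic_analyst": "gemma-4-12b",
    "threat_intel": "granite-4-h-small",
    "incident_responder": "gemma-4-12b",
    "vulnerability_manager": "granite-4-h-small",
    "compliance_analyst": "granite-4-h-small",
    "network_security": "qwen3-8b",
    "endpoint_security": "granite-4-h-small",
    "cloud_security": "granite-4-h-small",
    "cps_security": "granite-4-h-small",
    "web_security": "qwen3-8b",
    "ueba_analyst": "granite-4-h-small",
    "report_generator": "gemma-4-12b",
}
```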
Deployment Configurations
Option A: Single-Model (Simplest)
Use one model for all agents. Best for getting started or resource-constrained environments.
Pros: Simple deployment, one model to serve, consistent behavior.
Cons: Not optimal for code-generation or reasoning-heavy agents.
Configuration:
# .env
LLM_BACKEND=ollama
OLLAMA_MODEL=granite-soc:latest
GRANITE_USE_FINETUNED=true
GRANITE_USE_PER_AGENT_MODELS=true # Uses per-agent LoRA adapters
VRAM for serving: ~9 GB (single GGUF q8_0)
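To illustrate what GRANITE_USE_PER_AGENT_MODELS implies, here is a sketch of single-base, multi-adapter serving using Hugging Face `peft`; AuroraSOC's actual loading code may differ, and with the Ollama backend the equivalent is attaching each adapter through a Modelfile `ADAPTER` directive or merging it into a per-agent GGUF.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# One resident base model, many LoRA adapters: load once, swap per request.
base = AutoModelForCausalLM.from_pretrained("unsloth/granite-4.0-h-small")
model = PeftModel.from_pretrained(base, "training/output/orchestrator",
                                  adapter_name="orchestrator")
model.load_adapter("training/output/security_analyst", adapter_name="security_analyst")

model.set_adapter("security_analyst")  # base weights stay in VRAM; only the adapter switches
```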
Option B: Two-Model (Balanced)
Use Granite 4 for tool-calling agents and Qwen 3 for code-generation agents.
VRAM for serving: ~18 GB (two GGUF q8_0 models) — fits on an RTX 3090
Training cost: ~$7-8 on RunPod (RTX 3090, ~10 hours total)
Option C: Three-Model (Maximum Quality)
Use the best model for each task category: Granite 4 for tool-calling agents, Qwen 3 for code-generation agents, and Gemma 4 for reasoning- and writing-heavy agents.
VRAM for serving: ~31 GB (three GGUF q8_0 models) — requires A100 40GB or 2× RTX 3090
Training cost: ~$15-20 on RunPod (mixed RTX 3090 + A100 time)
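The serving VRAM figures are easy to sanity-check: q8_0 stores roughly 8.5 bits per weight (8-bit quants plus per-block scales), plus a small per-model allowance for KV cache and runtime overhead (the 0.5 GB figure below is an assumption):

```python
# Back-of-envelope check on the serving VRAM estimates above.
BYTES_PER_WEIGHT_Q8_0 = 8.5 / 8  # 8-bit values plus per-block fp16 scales
OVERHEAD_GB = 0.5                # per-model KV cache / runtime allowance (assumption)

def serving_gb(params_billions: float) -> float:
    return params_billions * BYTES_PER_WEIGHT_Q8_0 + OVERHEAD_GB

print(f"Option A: {serving_gb(8):.1f} GB")                           # ~9 GB
print(f"Option B: {serving_gb(8) + serving_gb(8):.1f} GB")           # ~18 GB
print(f"Option C: {sum(serving_gb(p) for p in (8, 8, 12)):.1f} GB")  # ~31 GB
```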
Training All Agents (Step-by-Step)
Option A: Single-Model Training
# 1. Prepare datasets
make train-data
# 2. Train generic model
make train
# 3. Train all 16 agent specialists
python training/scripts/train_all_agents.py
# 4. Evaluate
make train-eval
# 5. Import to Ollama
python training/scripts/serve_model.py ollama-all \
--output-dir training/output
# 6. Enable in AuroraSOC
make enable-finetuned
Option B/C: Multi-Model Training
# 1. Prepare datasets
make train-data
# 2. Train Granite 4 agents
python training/scripts/finetune_granite.py \
--config training/configs/granite_soc_finetune.yaml \
--agent orchestrator
# Repeat for: security_analyst, threat_intel, vulnerability_manager,
# compliance_analyst, endpoint_security, cloud_security,
# cps_security, ueba_analyst
# 3. Train Qwen 3 agents (modify config to use Qwen 3 base)
python training/scripts/finetune_granite.py \
--config training/configs/qwen_soc_finetune.yaml \
--agent malware_analyst
# Repeat for: threat_hunter, network_security, web_security
# 4. Train Gemma 4 agents (if using Option C)
python training/scripts/finetune_granite.py \
--config training/configs/gemma_soc_finetune.yaml \
--agent forensic_analyst
# Repeat for: incident_responder, report_generator
# 5. Import all to Ollama (each with correct chat template)
python training/scripts/serve_model.py ollama-all \
--output-dir training/output \
--multi-model # Auto-detects model family for Modelfile template
# 6. Enable per-agent models
export GRANITE_USE_FINETUNED=true
export GRANITE_USE_PER_AGENT_MODELS=true
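What the `--multi-model` flag implies is family-aware template selection when writing each Modelfile. A sketch of the idea; the function and template names are illustrative, not the script's actual internals:

```python
# Illustrative family detection for Modelfile chat templates.
def template_for(model_dir: str) -> str:
    name = model_dir.lower()
    if "granite" in name:
        return "granite"  # Granite chat template
    if "qwen" in name:
        return "chatml"   # Qwen 3 uses a ChatML-style template
    if "gemma" in name:
        return "gemma"    # Gemma turn-based template
    raise ValueError(f"unrecognized model family: {model_dir}")
```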
Performance Expectations by Configuration
| Configuration | Avg. Score | Tool Calling | Code Gen | Reasoning | Serving VRAM | Training Cost (RunPod) |
|---|---|---|---|---|---|---|
| Base Granite 4 (no fine-tuning) | 0.45 | 0.55 | 0.40 | 0.42 | 9 GB | $0 |
| Single-model fine-tuned (Granite 4) | 0.82 | 0.88 | 0.78 | 0.80 | 9 GB | $5 (one-time) |
| Two-model (Granite 4 + Qwen 3) | 0.84 | 0.88 | 0.82 | 0.80 | 18 GB | $8 (one-time) |
| Three-model (+ Gemma 4) | 0.86 | 0.88 | 0.82 | 0.84 | 31 GB | $18 (one-time) |
The jump from no fine-tuning (0.45) to single-model fine-tuning (0.82) is enormous — nearly 2× improvement. The gains from two-model (0.84) to three-model (0.86) are incremental.
Start with single-model Granite 4 fine-tuning. The 0.82 average score represents a massive uplift from the 0.45 base. Only move to multi-model if you need the extra 2-4% for specific agents and have the infrastructure to serve multiple models.
Next Steps
- Fine-Tuning Methods — understand how QLoRA, DPO, ORPO work
- Model Comparison — deep-dive into model architectures and benchmarks
- Cloud Training Guide — train on RunPod, Lambda Labs, or Colab
- Per-Agent Specialists — detailed per-agent training guide