
Agent-Specific Model Selection Guide

AuroraSOC has 16 specialized agents, each handling different security domains. This guide maps every agent to its optimal model, fine-tuning method, and configuration — backed by benchmark data and real resource requirements.

The Agent Landscape


Quick Reference: Best Model Per Agent

| Agent | Best Model | Why | Fine-Tuning Method | LoRA Rank | Training Time |
|---|---|---|---|---|---|
| Orchestrator | Granite 4 H-Small (8B) | Best tool calling + routing | QLoRA + SFT | 128 | ~45 min |
| Security Analyst | Granite 4 H-Small (8B) | Strong classification + tool use | QLoRA + SFT | 64 | ~25 min |
| Threat Hunter | Qwen 3 8B | Strong reasoning + query generation | QLoRA + SFT | 64 | ~25 min |
| Malware Analyst | Qwen 3 8B | Best code generation (YARA, decompilation) | QLoRA + SFT | 64 | ~30 min |
| Forensic Analyst | Gemma 4 12B | Complex multi-step reasoning | QLoRA + SFT | 64 | ~40 min |
| Threat Intel | Granite 4 H-Small (8B) | Structured output (STIX/TAXII) | QLoRA + SFT | 64 | ~25 min |
| Incident Responder | Gemma 4 12B | Long-form response planning | QLoRA + SFT | 64 | ~40 min |
| Vulnerability Manager | Granite 4 H-Small (8B) | CVSS scoring + prioritization | QLoRA + SFT | 64 | ~20 min |
| Compliance Analyst | Granite 4 H-Small (8B) | Framework mapping + structured output | QLoRA + SFT | 64 | ~25 min |
| Network Security | Qwen 3 8B | Suricata rule generation | QLoRA + SFT | 64 | ~25 min |
| Endpoint Security | Granite 4 H-Small (8B) | EDR alert triage + tool calling | QLoRA + SFT | 64 | ~25 min |
| Cloud Security | Granite 4 H-Small (8B) | API/tool calling for cloud services | QLoRA + SFT | 64 | ~25 min |
| CPS/OT Security | Granite 4 H-Small (8B) | Specialized protocol knowledge | QLoRA + SFT | 64 | ~30 min |
| Web Security | Qwen 3 8B | Code analysis (XSS, SQLi patterns) | QLoRA + SFT | 64 | ~25 min |
| UEBA Analyst | Granite 4 H-Small (8B) | Behavioral pattern analysis | QLoRA + SFT | 64 | ~20 min |
| Report Generator | Gemma 4 12B | Long-form coherent writing | QLoRA + SFT | 64 | ~30 min |
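Applied in code, the mapping above reduces to a small lookup table. A hypothetical sketch (the dictionary and helper names are illustrative, not actual AuroraSOC configuration keys):

```python
# Hypothetical registry mirroring the quick-reference table: each agent
# maps to its base model and LoRA rank.
AGENT_MODELS = {
    "orchestrator":     ("granite-4-h-small", 128),
    "security_analyst": ("granite-4-h-small", 64),
    "threat_hunter":    ("qwen-3-8b", 64),
    "malware_analyst":  ("qwen-3-8b", 64),
    "forensic_analyst": ("gemma-4-12b", 64),
    "report_generator": ("gemma-4-12b", 64),
    # ... remaining agents follow the same pattern
}

def model_for(agent: str) -> str:
    """Return the base model for an agent, defaulting to Granite 4."""
    return AGENT_MODELS.get(agent, ("granite-4-h-small", 64))[0]

print(model_for("threat_hunter"))   # qwen-3-8b
print(model_for("cloud_security"))  # falls back to granite-4-h-small
```

Keeping Granite 4 as the fallback matches the table: it is the choice for the majority of agents.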

Detailed Per-Agent Analysis

1. Orchestrator

The orchestrator is the most critical agent — it receives every user request and routes it to the correct specialist. Poor orchestration means poor results, regardless of specialist quality.

| Aspect | Details |
|---|---|
| Best model | Granite 4 H-Small (8B) |
| Why | Highest function calling score (BFCL: 78.3%). The orchestrator must parse intent, select tools, and dispatch to other agents — all via structured function calls. Granite 4's agentic pre-training makes it 8-10% better at this than alternatives. |
| LoRA rank | 128 (double the default) — the orchestrator needs maximum capacity to understand all 15 agent domains |
| Training data | Orchestration routing examples, multi-turn delegation conversations, tool selection scenarios |
| Alternative | Qwen3-30B-A3B (MoE) — 30B quality at 3B inference cost, but requires more VRAM for training |

Configuration:

```yaml
orchestrator:
  system_prompt: |
    You are the AuroraSOC Orchestrator. You analyze security requests
    and route them to the appropriate specialist agent...
  dataset_filter: "orchestration"
  model_override: "unsloth/granite-4.0-h-small"
  lora_r_override: 128
  output_dir: "training/output/orchestrator"
```

2. Security Analyst (Alert Triage)

| Aspect | Details |
|---|---|
| Best model | Granite 4 H-Small (8B) |
| Why | Alert triage requires rapid classification + tool calling to enrich alerts via SIEM/SOAR integrations. Granite 4's 0.85 alert triage score + 0.88 tool calling score make it ideal. |
| Key skills | Severity classification, IOC extraction, alert enrichment via MCP tools, false positive identification |
| Training data | Alert classification examples (Suricata, Snort, Sigma), IOC extraction, severity mapping to MITRE tactics |

Benchmark detail:

| Metric | Granite 4 (8B) | Qwen 3 (8B) | Gemma 4 (12B) |
|---|---|---|---|
| Alert severity classification (accuracy) | 94% | 91% | 92% |
| IOC extraction (precision) | 89% | 86% | 88% |
| False positive rate | 8% | 11% | 9% |
| Tool calling success rate | 96% | 88% | 85% |
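The IOC extraction skill measured above can be illustrated with a toy sketch (a pure-regex pass, not AuroraSOC's actual enrichment pipeline, which would also need to handle defanged indicators, octet validation, and deduplication):

```python
import re

# Toy IOC extraction: pull IPv4 addresses and SHA-256 hashes out of
# raw alert text.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")

def extract_iocs(alert_text: str) -> dict:
    return {
        "ips": IPV4.findall(alert_text),
        "sha256": SHA256.findall(alert_text),
    }

alert = "Beacon to 203.0.113.7, dropper sha256 " + "a" * 64
print(extract_iocs(alert))
```

The fine-tuned model's job is harder than this: it must decide which extracted indicators are actually relevant to the alert, which is where the precision numbers above come from.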

3. Threat Hunter

| Aspect | Details |
|---|---|
| Best model | Qwen 3 8B |
| Why | Threat hunting requires generating KQL/SPL queries, constructing hypotheses, and chain-of-thought reasoning. Qwen 3's "thinking mode" with `<think>` tags enables visible reasoning chains, and its strong code generation produces better detection queries. |
| Key skills | Hypothesis construction, KQL/SPL query generation, MITRE ATT&CK mapping, proactive detection logic |
| Training data | Hunting hypotheses, detection queries, ATT&CK technique narratives, adversary emulation plans |

Why thinking mode matters for hunting:

```text
User: Hunt for lateral movement via PsExec in our Windows environment.

Qwen 3 (with thinking mode):
<think>
PsExec uses SMB to copy a service binary to ADMIN$ on the remote host...
Key detection points:
1. Event ID 7045 (new service installed) with random service names
2. Network connections to port 445 followed by service creation
3. Named pipe: \PIPE\svcctl for SCM communication
4. PsExec binaries leave artifacts in %SystemRoot%\PSEXESVC.exe
I should construct a multi-stage detection...
</think>

## Hunting Hypothesis
Adversaries using PsExec for lateral movement will generate...
[structured output with KQL queries]
```

The visible thinking chain helps SOC analysts understand and validate the hunting logic.
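A serving layer usually needs to separate that reasoning chain from the final answer, e.g. to log the hypothesis but display only the structured output. A minimal sketch, assuming the `<think>...</think>` markup shown above (the helper name is hypothetical):

```python
import re

# Split a raw Qwen 3 completion into (reasoning, final_answer).
THINK = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_thinking(output: str) -> tuple[str, str]:
    m = THINK.search(output)
    reasoning = m.group(1).strip() if m else ""
    return reasoning, THINK.sub("", output).strip()

raw = "<think>PsExec copies a binary to ADMIN$...</think>\n## Hunting Hypothesis\n..."
reasoning, answer = split_thinking(raw)
print(answer.startswith("## Hunting Hypothesis"))  # True
```

Keeping the reasoning available (rather than discarding it) is what lets analysts validate the hunting logic.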


4. Malware Analyst

| Aspect | Details |
|---|---|
| Best model | Qwen 3 8B |
| Why | Malware analysis is heavily code-centric: writing YARA rules, analyzing decompiled code, understanding shellcode. Qwen 3 scores 0.81 on YARA generation vs Granite 4's 0.78 — a meaningful difference when rule accuracy is critical. |
| Key skills | YARA rule writing, PE analysis, shellcode interpretation, behavioral analysis, sandbox report parsing |
| Training data | YARA rule examples, malware family descriptions, PE header analysis, behavioral IOC extraction |

YARA generation comparison:

```yara
// Qwen 3 8B output (more precise, fewer false positives):
rule Emotet_Loader_2024 {
    meta:
        description = "Detects Emotet loader stage"
        author = "AuroraSOC"
        severity = "critical"
        mitre = "T1059.001"
    strings:
        $mz = { 4D 5A }
        $api1 = "VirtualAllocEx" ascii
        $api2 = "WriteProcessMemory" ascii
        $enc = { 8B ?? ?? ?? 33 ?? 89 ?? ?? ?? C1 ?? 05 }
        $c2_pattern = /https?:\/\/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{2,5}\//
    condition:
        $mz at 0 and 2 of ($api*) and $enc and $c2_pattern
}
```
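The `$c2_pattern` above can be sanity-checked outside YARA. A quick sketch transcribing it into Python's `re` (an approximation only: YARA's regex engine is not identical to Python's):

```python
import re

# The rule's $c2_pattern, transcribed from YARA into Python re syntax:
# it matches raw-IP C2 URLs with an explicit port.
c2 = re.compile(r"https?://\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{2,5}/")

assert c2.search("http://203.0.113.7:8080/gate.php")      # raw-IP C2: match
assert not c2.search("https://update.example.com/check")  # domain URL: no match
print("pattern behaves as expected")
```

Requiring a literal IP plus port is what keeps the rule's false-positive rate down: legitimate software rarely beacons to bare IP:port URLs.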

5. Forensic Analyst

| Aspect | Details |
|---|---|
| Best model | Gemma 4 12B |
| Why | Digital forensics requires the most complex multi-step reasoning: analyzing memory dumps, correlating disk artifacts across timelines, maintaining chain-of-custody logic. Gemma 4's additional parameters (12B vs 8B) give it an edge on these reasoning-heavy tasks (0.83 vs 0.79 for IR planning). |
| Key skills | Memory forensics (Volatility), disk analysis, timeline reconstruction, evidence handling, chain-of-custody documentation |
| Training data | Forensic investigation walkthroughs, Volatility output analysis, timeline reconstruction examples, evidence collection procedures |
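Timeline reconstruction, at its core, is a merge-and-sort over heterogeneous artifact sources. A toy sketch (the field names and events are illustrative, not real Volatility or EVTX output):

```python
from datetime import datetime

# Merge artifacts from multiple forensic sources into one
# time-ordered view.
artifacts = [
    {"ts": "2024-05-01T10:05:00", "source": "MFT",      "event": "dropper.exe created"},
    {"ts": "2024-05-01T10:02:00", "source": "EVTX",     "event": "logon type 3"},
    {"ts": "2024-05-01T10:07:00", "source": "Prefetch", "event": "dropper.exe executed"},
]

timeline = sorted(artifacts, key=lambda a: datetime.fromisoformat(a["ts"]))
for a in timeline:
    print(a["ts"], a["source"], a["event"])
```

The hard part the model handles is not the sort but the correlation: recognizing that the logon, file creation, and execution belong to one intrusion chain.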

6. Threat Intelligence Analyst

| Aspect | Details |
|---|---|
| Best model | Granite 4 H-Small (8B) |
| Why | Threat intel requires generating structured output (STIX 2.1 bundles, Diamond Model analyses) and making API calls to threat intel platforms. Granite 4's structured JSON output capabilities and tool calling make it ideal. |
| Key skills | APT attribution, STIX/TAXII generation, Diamond Model analysis, campaign tracking, IOC correlation |

7. Incident Responder

| Aspect | Details |
|---|---|
| Best model | Gemma 4 12B |
| Why | Incident response demands generating comprehensive, multi-phase response plans following NIST 800-61. The plans must be coherent, sequenced correctly, and account for dependencies. Gemma 4's superior reasoning produces better-structured response plans. |
| Key skills | NIST 800-61 playbooks, containment strategies, eradication procedures, recovery planning, lessons learned documentation |
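The sequencing requirement follows NIST SP 800-61's four-phase lifecycle, which a plan validator can encode directly. A minimal sketch (`is_well_ordered` is a hypothetical helper, not part of AuroraSOC):

```python
# NIST SP 800-61's four incident-response phases, in order.
NIST_PHASES = [
    "Preparation",
    "Detection and Analysis",
    "Containment, Eradication, and Recovery",
    "Post-Incident Activity",
]

def is_well_ordered(plan_phases: list[str]) -> bool:
    """Check that a plan's phases appear in NIST 800-61 order."""
    idx = [NIST_PHASES.index(p) for p in plan_phases if p in NIST_PHASES]
    return idx == sorted(idx)

print(is_well_ordered(["Preparation", "Containment, Eradication, and Recovery"]))  # True
```

A check like this can also serve as a cheap automatic metric when evaluating generated response plans.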

8-16. Remaining Agents (Summary)

| Agent | Best Model | Key Reasoning |
|---|---|---|
| Vulnerability Manager | Granite 4 H-Small | CVSS scoring is a structured task; Granite 4 excels at structured output with tool integration |
| Compliance Analyst | Granite 4 H-Small | Framework mapping (CIS → NIST → PCI) requires tool-calling to compliance databases |
| Network Security | Qwen 3 8B | Suricata rule writing is code generation; Qwen 3's coding strength transfers well |
| Endpoint Security | Granite 4 H-Small | EDR triage requires rapid tool-calling classification; Granite 4's agentic pre-training excels |
| Cloud Security | Granite 4 H-Small | Cloud API interaction requires structured function calling to AWS/Azure/GCP services |
| CPS/OT Security | Granite 4 H-Small | Specialized protocol knowledge (Modbus, DNP3) with structured analysis output |
| Web Security | Qwen 3 8B | Analyzing XSS, SQLi, and code patterns is inherently a code-analysis task |
| UEBA Analyst | Granite 4 H-Small | Behavioral baselines and anomaly detection via structured tool outputs |
| Report Generator | Gemma 4 12B | Long-form coherent document generation benefits from additional model capacity |
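As an example of why the Vulnerability Manager's task counts as "structured": CVSS v3.1 maps base scores to fixed qualitative severity bands, a deterministic lookup the model only has to reproduce reliably. A minimal sketch of those bands:

```python
# CVSS v3.1 qualitative severity bands, per the FIRST specification:
# 0.0 None, 0.1-3.9 Low, 4.0-6.9 Medium, 7.0-8.9 High, 9.0-10.0 Critical.
def cvss_severity(score: float) -> str:
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(cvss_severity(9.8))  # Critical
print(cvss_severity(5.3))  # Medium
```

The prioritization layer then combines this band with asset criticality and exploit availability, which is where tool integration comes in.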

Deployment Configurations

Option A: Single-Model (Simplest)

Use one model for all agents. Best for getting started or resource-constrained environments.

Pros: Simple deployment, one model to serve, consistent behavior.

Cons: Not optimal for code-generation or reasoning-heavy agents.

Configuration:

```bash
# .env
LLM_BACKEND=ollama
OLLAMA_MODEL=granite-soc:latest
GRANITE_USE_FINETUNED=true
GRANITE_USE_PER_AGENT_MODELS=true  # Uses per-agent LoRA adapters
```

VRAM for serving: ~9 GB (single GGUF q8_0)
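The ~9 GB figure is consistent with a back-of-envelope estimate: ggml's q8_0 format stores each block of 32 weights as 32 int8 values plus one fp16 scale (34 bytes per 32 weights), so an 8B model weighs about 8.5 GB before runtime buffers. A sketch of the arithmetic (the 0.5 GB overhead is an assumed allowance for KV cache and buffers, not a measured value):

```python
def q8_0_vram_gb(params_billion: float, overhead_gb: float = 0.5) -> float:
    """Rough serving-VRAM estimate for a GGUF q8_0 model."""
    bytes_per_weight = 34 / 32  # q8_0: 32 int8 weights + one fp16 scale per block
    return round(params_billion * bytes_per_weight + overhead_gb, 1)

print(q8_0_vram_gb(8))  # 9.0, matching the ~9 GB above
```

The same arithmetic recovers the multi-model figures later in this guide: two 8B models land near 18 GB, and adding a 12B model pushes the total to roughly 31 GB.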


Option B: Two-Model (Balanced)

Use Granite 4 for tool-calling agents and Qwen 3 for code-generation agents.

VRAM for serving: ~18 GB (two GGUF q8_0 models) — fits on an RTX 3090

Training cost: ~$7-8 on RunPod (RTX 3090, ~10 hours total)


Option C: Three-Model (Maximum Quality)

Use the best model for each task category: Granite 4 for tool-calling agents, Qwen 3 for code-generation agents, and Gemma 4 for reasoning- and writing-heavy agents.

VRAM for serving: ~31 GB (three GGUF q8_0 models) — requires A100 40GB or 2× RTX 3090

Training cost: ~$15-20 on RunPod (mixed RTX 3090 + A100 time)


Training All Agents (Step-by-Step)

Option A: Single-Model Training

```bash
# 1. Prepare datasets
make train-data

# 2. Train generic model
make train

# 3. Train all agent specialists
python training/scripts/train_all_agents.py

# 4. Evaluate
make train-eval

# 5. Import to Ollama
python training/scripts/serve_model.py ollama-all \
    --output-dir training/output

# 6. Enable in AuroraSOC
make enable-finetuned
```

Option B/C: Multi-Model Training

```bash
# 1. Prepare datasets
make train-data

# 2. Train Granite 4 agents
python training/scripts/finetune_granite.py \
    --config training/configs/granite_soc_finetune.yaml \
    --agent orchestrator
# Repeat for: security_analyst, threat_intel, vulnerability_manager,
#             compliance_analyst, endpoint_security, cloud_security,
#             cps_security, ueba_analyst

# 3. Train Qwen 3 agents (modify config to use the Qwen 3 base)
python training/scripts/finetune_granite.py \
    --config training/configs/qwen_soc_finetune.yaml \
    --agent malware_analyst
# Repeat for: threat_hunter, network_security, web_security

# 4. Train Gemma 4 agents (if using Option C)
python training/scripts/finetune_granite.py \
    --config training/configs/gemma_soc_finetune.yaml \
    --agent forensic_analyst
# Repeat for: incident_responder, report_generator

# 5. Import all to Ollama (each with the correct chat template)
python training/scripts/serve_model.py ollama-all \
    --output-dir training/output \
    --multi-model  # Auto-detects model family for the Modelfile template

# 6. Enable per-agent models
export GRANITE_USE_FINETUNED=true
export GRANITE_USE_PER_AGENT_MODELS=true
```
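The "Repeat for" lists above can be driven by a small mapping so no agent gets skipped. A hypothetical wrapper (the agent-to-family grouping mirrors the comments above; the config paths are the ones shown in this guide):

```python
# Agent-to-base-family grouping, taken from the training steps above.
GRANITE = {"orchestrator", "security_analyst", "threat_intel",
           "vulnerability_manager", "compliance_analyst", "endpoint_security",
           "cloud_security", "cps_security", "ueba_analyst"}
QWEN = {"threat_hunter", "malware_analyst", "network_security", "web_security"}
GEMMA = {"forensic_analyst", "incident_responder", "report_generator"}

def config_for(agent: str) -> str:
    """Return the fine-tuning config path for an agent's model family."""
    if agent in QWEN:
        return "training/configs/qwen_soc_finetune.yaml"
    if agent in GEMMA:
        return "training/configs/gemma_soc_finetune.yaml"
    return "training/configs/granite_soc_finetune.yaml"

# One finetune_granite.py invocation per agent, e.g. built as a command list:
for agent in sorted(GRANITE | QWEN | GEMMA):
    print("python training/scripts/finetune_granite.py",
          "--config", config_for(agent), "--agent", agent)
```

Piping the printed commands into a shell (or a job scheduler) reproduces the manual steps above for all 16 agents.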

Performance Expectations by Configuration

| Configuration | Avg. Score | Tool Calling | Code Gen | Reasoning | Serving VRAM | RunPod Cost (one-time) |
|---|---|---|---|---|---|---|
| Base Granite 4 (no fine-tuning) | 0.45 | 0.55 | 0.40 | 0.42 | 9 GB | $0 |
| Single-model fine-tuned (Granite 4) | 0.82 | 0.88 | 0.78 | 0.80 | 9 GB | $5 |
| Two-model (Granite 4 + Qwen 3) | 0.84 | 0.88 | 0.82 | 0.80 | 18 GB | $8 |
| Three-model (+ Gemma 4) | 0.86 | 0.88 | 0.82 | 0.84 | 31 GB | $18 |

The jump from no fine-tuning (0.45) to single-model fine-tuning (0.82) is enormous — nearly 2× improvement. The gains from two-model (0.84) to three-model (0.86) are incremental.

Recommendation

Start with single-model Granite 4 fine-tuning. The 0.82 average score represents a massive uplift from the 0.45 base. Only move to multi-model if you need the extra 2-4% for specific agents and have the infrastructure to serve multiple models.

Next Steps