Training Pipeline — Complete Guide for Beginners
This guide is written for someone who has never trained a machine learning model before. If you follow it from top to bottom, you will go from raw security data to a fine-tuned Granite model that AuroraSOC can serve.
Why AuroraSOC Fine-Tunes Its Own Models
A general model can know many security facts, but SOC work requires something stricter: operational reasoning under pressure. "Knowing about" security means recognizing terms like CVE, MITRE ATT&CK, or ransomware. "Thinking like" a SOC analyst means rapidly connecting those facts to affected assets, CVSS severity, exploitability, blast radius, containment steps, and remediation priority in one coherent response. AuroraSOC fine-tunes for that behavior.
Training from scratch is not realistic for most teams. Full pre-training requires massive infrastructure, typically thousands of GPUs and multi-million-dollar budgets. AuroraSOC instead uses LoRA fine-tuning on an 8B model, which can run on one strong consumer GPU in hours, not months.
Key Concepts (Plain Language First)
LoRA (Low-Rank Adaptation)
LoRA does not retrain all model parameters. It keeps original weights frozen and trains lightweight adapter layers that encode domain specialization. In practice, this is often around 0.1% of full model size.
In AuroraSOC config:
- `r=64` is the adapter rank (capacity).
- `alpha=64` is the scaling factor applied to adapter updates.
Higher rank means more representational power, but more VRAM and compute cost.
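To build intuition for why adapters are so small, here is a back-of-envelope sketch. The 4096×4096 projection dimensions below are illustrative, not Granite's actual layer shapes:

```python
# Back-of-envelope: LoRA adapter size vs. the frozen layer it wraps.
# Dimensions are illustrative (a 4096x4096 projection), not actual
# Granite layer shapes.

def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense linear layer (bias ignored)."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds two low-rank factors: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

full = full_params(4096, 4096)         # 16,777,216 frozen weights
adapter = lora_params(4096, 4096, 64)  # 524,288 trainable weights
print(f"adapter is {adapter / full:.2%} of this layer")  # roughly 3% per layer
```

Because only some projections typically receive adapters, and embeddings plus MLP blocks dominate total parameter count, the whole-model trainable fraction ends up far smaller than the per-layer ratio — consistent with the ~0.1% figure above.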
Unsloth
Unsloth is an open-source training stack that optimizes kernels and memory behavior for efficient fine-tuning. A practical rule of thumb is around 2x faster training and much lower VRAM use than a naive setup. That difference is why an RTX 3090 can be practical for Granite-domain LoRA where older workflows might demand much larger accelerators.
Response-only masking
During instruction tuning, each sample has prompt and answer text. Response-only masking trains on the assistant answer tokens, not the user prompt tokens. That focuses learning signal on what the model must generate in production.
Think of it as grading only the student's answer, not the exam question.
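At the label level, the idea can be sketched in a few lines. This assumes the PyTorch convention that positions labeled `-100` are ignored by the cross-entropy loss; the token IDs are made up for illustration:

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy skips positions with this label

def mask_prompt(prompt_ids, answer_ids):
    """Train only on answer tokens: prompt positions get IGNORE_INDEX."""
    input_ids = prompt_ids + answer_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return input_ids, labels

# Made-up token IDs for illustration
prompt = [101, 7592, 2003]   # e.g. "What is CVE-2024-...?"
answer = [3231, 2742, 102]   # e.g. "It is a vulnerability in ..."
ids, labels = mask_prompt(prompt, answer)
print(labels)  # [-100, -100, -100, 3231, 2742, 102]
```

The model still sees the full prompt as context; it simply receives no gradient for predicting prompt tokens.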
Hardware Requirements
Local hardware
| Component | Minimum | Recommended |
|---|---|---|
| GPU | RTX 3090 (24 GB VRAM) | A100 (80 GB) |
| System RAM | 32 GB | 64 GB |
| Storage | 100 GB SSD | 500 GB NVMe |
| CUDA | 12.1+ | 12.4+ |
Cloud options
- RunPod: RTX 3090 around $0.39/hour. Specialist training in roughly 3-4 hours is about $1.50 per run.
- Google Colab Pro: A100 runtime is preferred. Free T4 (16 GB) is borderline for 8B and often requires aggressive memory settings.
- Notebook path for guided runs:
training/notebooks/AuroraSOC_Granite4_Finetune.ipynb.
Step-by-Step Training Walkthrough
Each step includes what it does, command(s), and realistic expected output.
Step 0: Verify your GPU
What this does: Confirms your machine can see a CUDA-capable device. If this step fails, training will fail later.
Commands:
nvidia-smi
python3 -c "import torch; print(torch.cuda.get_device_name(0))"
Expected output:
- `nvidia-smi` shows your GPU and driver version.
- The Python one-liner prints a GPU name such as `NVIDIA GeForce RTX 3090`.
Step 1: Install training dependencies
What this does: Installs the libraries used by data preparation, fine-tuning, evaluation, and export.
Command:
pip install -e ".[training]"
Expected output:
- Dependencies install without hard errors.
- Key package families include Unsloth, Transformers, Datasets, PEFT, TRL, and bitsandbytes.
Step 2: Prepare datasets
What this does: Builds instruction-following training data from public security corpora and synthetic SOC scenarios. The prep script fetches/uses sources such as MITRE ATT&CK techniques, Sigma detection rules, and NVD vulnerability records, then converts them into chat-style instruction/response examples.
Commands:
python training/scripts/prepare_datasets.py --output-dir training/data
Expected output:
- Logs for each dataset source pipeline.
- New files under `training/data/`, including `soc_train.jsonl`, `soc_eval.jsonl`, and `domain/*.jsonl`.
- A summary line showing total sample counts.
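For orientation, here is what one chat-style JSONL record might look like. The field names and content are illustrative assumptions, not the prep script's exact schema:

```python
import json

# Illustrative record: the actual schema produced by prepare_datasets.py
# may differ (field names here are assumptions).
sample = {
    "messages": [
        {"role": "user",
         "content": "Summarize MITRE ATT&CK technique T1059 and list one detection idea."},
        {"role": "assistant",
         "content": "T1059 (Command and Scripting Interpreter) covers abuse of "
                    "shells and interpreters; detect unusual parent-child chains "
                    "spawning cmd.exe or powershell.exe."},
    ]
}

# One JSON object per line is what makes the file valid JSONL.
line = json.dumps(sample)
assert json.loads(line)["messages"][0]["role"] == "user"
print(line[:60] + "...")
```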
Step 3: Fine-tune the specialist model
What this does: Loads Granite 4 base weights, applies LoRA adapters, trains on SOC-formatted examples, saves checkpoints, and exports merged FP16 weights for serving.
Command:
python scripts/finetune_granite.py --model-type specialist
Expected output:
- Training progress logs with decreasing loss.
- Typical trend: loss often starts around 2.0 and falls below 1.0 given sufficient data and epochs.
- Wall time: often ~2-4 hours on RTX 3090 class hardware.
- Export path for specialist: `training/output/granite-soc-specialist/`.
Step 4: Fine-tune the orchestrator model
What this does: Runs a second fine-tuning pass for orchestrator behavior, so coordination and delegation logic has a dedicated model artifact.
Command:
python scripts/finetune_granite.py --model-type orchestrator
Expected output:
- Similar training logs and checkpoint saves.
- Export path for orchestrator: `training/output/granite-soc-orchestrator/`.
Step 5: Evaluate
What this does: Runs benchmark prompts across security domains and scores response quality using expected keyword coverage plus timing metrics.
Command:
python scripts/evaluate_model.py --model vllm:granite-soc-specialist --vllm-base-url http://localhost:8000/v1
Expected output:
- Per-benchmark PASS/FAIL lines.
- Summary with pass rate and average latency.
- Results file (for example `training/eval_results.json`) written to disk.
How to read scores:
- Higher pass rate means broader benchmark coverage.
- Keyword hit rate measures whether core domain concepts were present.
- Latency metrics help determine serving readiness under operational constraints.
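The keyword-coverage idea can be sketched as follows. The case-insensitive substring match is a simplification of whatever matching `evaluate_model.py` actually performs:

```python
def keyword_hit_rate(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected domain keywords present in the response
    (case-insensitive substring match -- a simplification of the
    real scorer's logic)."""
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 0.0

response = "Isolate the host, rotate credentials, and patch per the CVSS 9.8 advisory."
score = keyword_hit_rate(response, ["isolate", "CVSS", "rotate", "EDR"])
print(score)  # 0.75 -- three of four keywords present
```

A benchmark then passes or fails depending on whether this coverage clears a threshold.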
Step 6: Deploy
What this does: Builds both model variants and starts the serving backend for runtime use.
Commands:
make train-all
make vllm-up
make vllm-status
Expected output:
- `make train-all` runs the specialist then the orchestrator training targets.
- `make vllm-up` starts the vLLM container.
- `make vllm-status` returns health JSON or a reachable-status message.
Output Files and What They Mean
training/
├── output/
│ ├── granite-soc-specialist/ ← Merged FP16 weights loaded by vLLM
│ │ ├── config.json ← Model architecture config
│ │ ├── tokenizer.json ← Text-to-token mapping
│ │ ├── tokenizer_config.json
│ │ └── model-*.safetensors ← Actual weight tensors
│ └── granite-soc-orchestrator/ ← Same structure, orchestrator weights
├── checkpoints/ ← LoRA adapter checkpoints (resumable)
│ └── checkpoint-*/
└── data/ ← Prepared training datasets
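You can sanity-check that an export directory contains the files the tree above describes. This is a minimal sketch, assuming the `model-*.safetensors` shard naming shown above:

```python
from pathlib import Path

REQUIRED = ["config.json", "tokenizer.json", "tokenizer_config.json"]

def missing_export_files(model_dir: str) -> list[str]:
    """Return the required files absent from a merged-export directory.
    Also flags the case where no safetensors shard exists at all."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED if not (root / name).exists()]
    has_shards = list(root.glob("model-*.safetensors")) or (root / "model.safetensors").exists()
    if not has_shards:
        missing.append("model-*.safetensors")
    return missing

problems = missing_export_files("training/output/granite-soc-specialist")
print("export looks complete" if not problems else f"missing: {problems}")
```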
Troubleshooting (Symptom → Cause → Exact Fix)
torch.cuda.OutOfMemoryError during training
- Cause: effective batch/sequence footprint exceeds GPU VRAM.
- Fix: reduce batch size, increase gradient accumulation, lower max sequence length, and close other GPU workloads.
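The reason this fix preserves training behavior: the effective batch size is per-device batch × gradient accumulation steps, so halving one while doubling the other keeps each optimizer step identical while holding fewer samples in VRAM at once. The numbers below are illustrative, not AuroraSOC's shipped config:

```python
def effective_batch(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device * grad_accum * num_gpus

before = effective_batch(per_device=8, grad_accum=2)  # 16, with 8 samples in VRAM at once
after  = effective_batch(per_device=2, grad_accum=8)  # still 16, only 2 in VRAM at once
assert before == after
print(before, after)  # 16 16
```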
HuggingFace 401 Unauthorized
- Cause: missing/invalid authentication token when pulling gated assets.
- Fix: export `HF_TOKEN` with a valid scope, then retry the model/data pull.
ModuleNotFoundError: No module named 'unsloth'
- Cause: training dependencies not installed in the active environment.
- Fix: install training dependencies in the same Python environment used to run scripts.
Training loss is NaN from step 1
- Cause: unstable optimizer state, bad mixed-precision configuration, or malformed data sample.
- Fix: lower learning rate, verify bf16/fp16 compatibility, validate dataset JSONL formatting, and restart from clean checkpoint.
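A quick way to validate JSONL formatting is to scan for lines that fail to parse or lack the expected top-level key. The `messages` key is an assumption about the dataset schema, not a confirmed field name:

```python
import json

def find_bad_lines(jsonl_text: str, required_key: str = "messages") -> list[int]:
    """Return 1-based line numbers that are not valid JSON objects or
    lack the required key (key name is an assumption about the schema)."""
    bad = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
            continue
        if not isinstance(obj, dict) or required_key not in obj:
            bad.append(lineno)
    return bad

sample = '{"messages": []}\nnot json at all\n{"other": 1}'
print(find_bad_lines(sample))  # [2, 3]
```

Run this over `training/data/soc_train.jsonl` before restarting a run, so a single malformed sample cannot poison the loss again.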
Training completes but vLLM cannot load model
- Cause: incomplete export artifacts or mismatched model directory.
- Fix: ensure the merged FP16 export exists in `training/output/granite-soc-specialist/`, then point the vLLM `--model` flag at that directory.
Export validation FAILED in finetune_granite.py
- Cause: exported weights/tokenizer cannot be reloaded for a sanity forward pass.
- Fix: rerun the export, verify free disk space, confirm the required files exist (`config.json`, tokenizer files, `*.safetensors`), and retry validation.