# Google Colab Training

Google Colab provides a free T4 GPU (16 GB VRAM) — enough to fine-tune Granite 4 Hybrid models for AuroraSOC. This is the recommended approach if you don't have a local GPU.
## Why Colab?
| Benefit | Details |
|---|---|
| Free GPU | T4 (16 GB) on free tier, A100 (40/80 GB) on Pro |
| No hardware needed | Works from any browser |
| Pre-configured CUDA | No driver installation |
| Google Drive integration | Save models to Drive, download later |
| Shareable | Send the notebook link to teammates |
## The Notebook

The training notebook is located at:

```text
training/notebooks/AuroraSOC_Granite4_Finetune.ipynb
```

### Opening in Colab
1. Upload to Google Drive:
   - Upload `AuroraSOC_Granite4_Finetune.ipynb` to your Google Drive
   - Double-click to open in Colab
2. Or open directly:
   - Go to colab.research.google.com
   - Choose Upload → select the notebook file
3. Enable GPU runtime:
   - Click Runtime → Change runtime type
   - Select T4 GPU (free) or A100 (Pro)
   - Click Save
## Notebook Walkthrough
The notebook has 10 sections that mirror the local training pipeline:
### Section 0: Install Dependencies

```python
%%capture
!pip install unsloth torch transformers trl datasets pyyaml
!pip install --no-build-isolation mamba_ssm==2.2.5 causal_conv1d==1.5.2
```
Why: Installs Unsloth (2× faster training) and Mamba kernels (for Granite 4 Hybrid architecture). The Mamba kernel compilation takes ~10 minutes on first run — this is normal.
### Section 1: Configuration

```python
# Which agent to train? "all" = generic SOC model
AGENT_PROFILE = "all"

# Model variant
MODEL_NAME = "unsloth/granite-4.0-h-tiny"

# LoRA parameters
LORA_R = 64
LORA_ALPHA = 64

# Training parameters
BATCH_SIZE = 2
GRAD_ACCUM = 4
NUM_EPOCHS = 3
LEARNING_RATE = 2e-4

# Export format
GGUF_QUANT = "q8_0"
```
Key configuration choices:
| Parameter | Free T4 | Colab Pro (A100) | Why |
|---|---|---|---|
| `MODEL_NAME` | `granite-4.0-h-tiny` | `granite-4.0-h-small` | Larger model = better quality, more VRAM |
| `BATCH_SIZE` | 2 | 4-8 | A100 has more VRAM |
| `LORA_R` | 64 | 64-128 | Higher rank = more capacity |
| `GGUF_QUANT` | `q8_0` | `q8_0` | Highest quality that fits |
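The batch settings above combine into an effective batch size, which determines how many optimizer steps a run takes. A quick sanity check (pure arithmetic; the 5,000-sample dataset size is just an example):

```python
# Effective batch size = per-device batch × gradient accumulation steps.
BATCH_SIZE = 2   # per-device batch (free T4 default above)
GRAD_ACCUM = 4   # gradient accumulation steps
NUM_EPOCHS = 3

effective_batch = BATCH_SIZE * GRAD_ACCUM             # 8 samples per optimizer step
num_samples = 5000                                    # example dataset size
steps_per_epoch = -(-num_samples // effective_batch)  # ceiling division
total_steps = steps_per_epoch * NUM_EPOCHS

print(effective_batch, steps_per_epoch, total_steps)  # 8 625 1875
```

Raising `GRAD_ACCUM` instead of `BATCH_SIZE` gives the same effective batch without increasing peak VRAM, which is why the T4 config leans on accumulation.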
How to train per-agent specialists: change `AGENT_PROFILE` from `"all"` to a specific agent name:

- `"security_analyst"` — Alert analysis, IOC extraction, MITRE mapping
- `"threat_hunter"` — Hunting hypotheses, LOLBin detection
- `"incident_responder"` — NIST playbooks, containment plans
- `"malware_analyst"` — YARA rules, sandbox analysis
- `"network_security"` — Flow analysis, DNS tunneling
- `"cps_security"` — ICS/OT, Modbus, IEC 62443
- `"threat_intel"` — APT tracking, STIX/TAXII
- `"forensic_analyst"` — Memory/disk forensics, chain of custody
- `"orchestrator"` — Multi-agent routing and coordination
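Conceptually, a per-agent run just trains on the subset of records belonging to that profile. A minimal sketch of such filtering, assuming each JSONL record carries an `agent` key (the field name is hypothetical, not confirmed by the notebook):

```python
# Hypothetical sketch: filter training records by agent profile.
# The real notebook's field name may differ; "agent" is assumed here.
AGENT_PROFILE = "threat_hunter"

records = [
    {"agent": "security_analyst", "messages": ["..."]},
    {"agent": "threat_hunter", "messages": ["..."]},
    {"agent": "threat_hunter", "messages": ["..."]},
]

if AGENT_PROFILE == "all":
    selected = records  # generic SOC model: keep every record
else:
    selected = [r for r in records if r["agent"] == AGENT_PROFILE]

print(len(selected))  # 2
```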
### Section 2: Load Model + LoRA

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=4096,
    load_in_4bit=True,  # Quantized loading (saves VRAM)
)

model = FastLanguageModel.get_peft_model(
    model,
    r=LORA_R,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "shared_mlp.input_linear", "shared_mlp.output_linear",  # Granite 4 Hybrid
    ],
    lora_alpha=LORA_ALPHA,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)
```
Why these targets? The `target_modules` list includes both the standard Transformer projection layers (`q_proj`, `k_proj`, etc.) and the Granite 4 Hybrid-specific shared-MLP projections (`shared_mlp.input_linear`, `shared_mlp.output_linear`). Adapting both ensures the LoRA weights cover the model's full hybrid architecture rather than only its attention blocks.
Why `load_in_4bit=True`? 4-bit quantization (QLoRA) reduces the base model's memory footprint by ~4×, allowing the T4's 16 GB VRAM to hold the model, LoRA weights, optimizer states, and activations. Without quantization, even the tiny model wouldn't fit.
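The ~4× figure is simple arithmetic: weights stored in 4 bits instead of 16. A back-of-the-envelope estimate (the 7B parameter count is illustrative, not Granite 4's exact size, and quantization adds small scale-factor overhead not counted here):

```python
# Rough VRAM estimate for base-model weights at different precisions.
# 7e9 parameters is an illustrative figure, not Granite 4's exact size.
params = 7e9

fp16_gb = params * 2 / 1e9    # 2 bytes/param  -> ~14 GB, already near the T4's 16 GB
int4_gb = params * 0.5 / 1e9  # 0.5 bytes/param -> ~3.5 GB, leaving headroom for training state

print(f"{fp16_gb:.1f} GB fp16 vs {int4_gb:.1f} GB 4-bit")
```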
### Section 3: Upload & Prepare Dataset

The notebook expects your training data as a JSONL file uploaded to Colab:

```python
from google.colab import files

# Option A: Upload from local machine
uploaded = files.upload()  # Select your soc_train.jsonl

# Option B: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Then reference: /content/drive/MyDrive/path/to/soc_train.jsonl
```
Where does the data come from? Run `make train-data` locally first (no GPU needed), then upload the resulting `training/data/soc_train.jsonl` to Colab.
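Before uploading, it's worth sanity-checking the JSONL locally. A minimal validator sketch, assuming each line is a JSON object with a `messages` list (a common chat-format SFT layout; adjust the field name if your export differs):

```python
import io
import json

# Stand-in for open("training/data/soc_train.jsonl"); a real file works the same way.
sample = io.StringIO(
    '{"messages": [{"role": "user", "content": "hi"}]}\n'
    '{"messages": [{"role": "assistant", "content": "hello"}]}\n'
)

valid = 0
for lineno, line in enumerate(sample, 1):
    record = json.loads(line)  # raises ValueError on malformed JSON
    assert isinstance(record.get("messages"), list), f"line {lineno}: no messages list"
    valid += 1

print(f"{valid} valid records")  # 2 valid records
```

Catching a malformed line locally is much cheaper than discovering it mid-training in Colab.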
### Section 4: Configure SFTTrainer

```python
from trl import SFTConfig, SFTTrainer
from unsloth import is_bfloat16_supported
from unsloth.chat_templates import train_on_responses_only

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRAD_ACCUM,
        num_train_epochs=NUM_EPOCHS,
        learning_rate=LEARNING_RATE,
        bf16=is_bfloat16_supported(),        # T4 lacks bf16 support; this
        fp16=not is_bfloat16_supported(),    # falls back to fp16 on a T4
        optim="adamw_8bit",
        output_dir=OUTPUT_DIR,
    ),
)

# Only train on assistant responses
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_of_role|>user<|end_of_role|>",
    response_part="<|start_of_role|>assistant<|end_of_role|>",
)
```
Why `train_on_responses_only`? This is critical — it tells the trainer to compute loss only on the tokens after `<|start_of_role|>assistant<|end_of_role|>`. The model doesn't learn to generate system prompts or user queries — it only learns to generate analyst responses. This prevents the model from memorizing training prompts and produces much better generalization.
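Under the hood, response-only training works by masking prompt labels with -100, the ignore index for PyTorch cross-entropy. A toy illustration of the masking idea, not Unsloth's actual implementation:

```python
# Toy sketch of response-only loss masking (not Unsloth's real code).
# Tokens up to and including the assistant marker get label -100,
# which cross-entropy loss ignores.
IGNORE_INDEX = -100
ASSISTANT_MARKER = "<asst>"  # stands in for <|start_of_role|>assistant<|end_of_role|>

tokens = ["<sys>", "You", "are", "<user>", "Analyze", "<asst>", "Severity:", "High"]
labels = list(tokens)

boundary = tokens.index(ASSISTANT_MARKER) + 1  # first token the model should learn
for i in range(boundary):
    labels[i] = IGNORE_INDEX                   # no loss on prompt tokens

print(labels)  # [-100, -100, -100, -100, -100, -100, 'Severity:', 'High']
```

Only `Severity:` and `High` contribute to the loss, so gradient updates push the model toward producing responses, not reciting prompts.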
### Section 5: Train

```python
trainer.train()
```
On a free T4 with the default configuration:
- ~1000 samples: ~15 minutes
- ~5000 samples: ~1 hour
- ~10000 samples: ~2 hours
### Section 6: Test Inference

The notebook includes a quick inference test before exporting:

```python
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "You are the AuroraSOC Security Analyst..."},
    {"role": "user", "content": "Analyze: ET TROJAN Cobalt Strike Beacon..."},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(input_ids=inputs.to("cuda"), max_new_tokens=512, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
When to test: Always run inference before exporting. If the output looks like gibberish or ignores the system prompt, training may have gone wrong (check loss values — should be < 0.5).
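One way to check those loss values is the trainer's log history (`trainer.state.log_history` in transformers). A sketch over a stand-in history list (the numbers below are made up for illustration):

```python
# Sketch: pull the last recorded training loss from a log history.
# In the notebook this would be trainer.state.log_history (transformers API);
# the entries here are illustrative stand-ins.
log_history = [
    {"loss": 1.42, "step": 50},
    {"loss": 0.61, "step": 100},
    {"loss": 0.31, "step": 150},
]

losses = [entry["loss"] for entry in log_history if "loss" in entry]
final_loss = losses[-1]

print(f"final loss: {final_loss}")  # final loss: 0.31
assert final_loss < 0.5, "loss still high — inspect data/config before exporting"
```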
### Section 7: Export to GGUF

```python
model.save_pretrained_gguf(
    OUTPUT_DIR,
    tokenizer,
    quantization_method=GGUF_QUANT,  # "q8_0" or "q4_k_m"
)
```
This creates a GGUF file in the output directory. GGUF is the format that Ollama uses.
Quantization options:

| Method | Quality | File Size | Use Case |
|---|---|---|---|
| `q8_0` | High (best) | ~2-4 GB | Recommended default |
| `q4_k_m` | Good | ~1-2 GB | When disk space is limited |
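The size difference follows from bytes per weight: `q8_0` stores roughly one byte per parameter, `q4_k_m` roughly half that. A rough estimate (the parameter count and per-weight overhead figures are approximations, not exact GGUF sizes):

```python
# Rough GGUF file-size estimate by quantization method.
# Bytes/param values are approximate (including scale overhead);
# 3e9 parameters is an illustrative count, not Granite 4's exact size.
params = 3e9
bytes_per_param = {"q8_0": 1.06, "q4_k_m": 0.56}

sizes_gb = {method: params * b / 1e9 for method, b in bytes_per_param.items()}
print(sizes_gb)  # q8_0 roughly twice the size of q4_k_m
```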
### Section 8: Save to Google Drive

```python
from google.colab import drive
drive.mount('/content/drive')

import shutil
shutil.copytree(OUTPUT_DIR, f"/content/drive/MyDrive/AuroraSOC/{OUTPUT_DIR}")
```
Why Google Drive? Colab sessions are temporary — when the session ends, all files are deleted. Saving to Drive preserves your model permanently. From Drive, you can download the GGUF file to your local machine.
### Section 9: Download GGUF

```python
from google.colab import files
files.download(f"{OUTPUT_DIR}/unsloth.Q8_0.gguf")
```
This triggers a browser download of the GGUF file.
## After Training: Import to Local Ollama

Once you have the GGUF file on your local machine:

```bash
# Method 1: Using serve_model.py
python training/scripts/serve_model.py ollama \
    --gguf /path/to/unsloth.Q8_0.gguf \
    --name granite-soc:latest

# Method 2: Using setup_local.sh
./scripts/setup_local.sh --import /path/to/unsloth.Q8_0.gguf

# Method 3: Manual Modelfile
cat > Modelfile << 'EOF'
FROM /path/to/unsloth.Q8_0.gguf
TEMPLATE """{{- if .System }}<|start_of_role|>system<|end_of_role|>
{{ .System }}<|end_of_text|>
{{- end }}
<|start_of_role|>user<|end_of_role|>
{{ .Prompt }}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
{{ .Response }}<|end_of_text|>"""
PARAMETER temperature 0.1
PARAMETER top_p 0.95
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|start_of_role|>"
EOF
ollama create granite-soc:latest -f Modelfile
```
Then enable it in AuroraSOC:

```bash
make enable-finetuned
# This sets GRANITE_USE_FINETUNED=true in .env
```
## Tips for Colab Training

### Session Management
- Save frequently — Colab sessions can disconnect after 90 minutes of inactivity (free) or 24 hours (Pro)
- Mount Drive early — save checkpoints to Drive during training, not just at the end
- Use a browser extension to prevent idle disconnects (search "Colab keep alive")
### Maximizing Free Tier
- Train during off-peak hours — weekday nights/weekends have better GPU availability
- Use `granite-4.0-h-tiny` — fits comfortably in T4's 16 GB VRAM
- Keep `BATCH_SIZE=2` — larger batches risk OOM on T4
- Set `MAX_STEPS=200` for quick tests before committing to full training
### When to Upgrade to Pro

- You need `granite-4.0-h-small` (requires A100's 40+ GB VRAM)
- You're training many per-agent specialists (need long sessions)
- Free tier keeps disconnecting during training
## Next Steps
- Per-Agent Specialists — train multiple specialist models
- Evaluation & Export — benchmark your model
- LLM Integration: Model Swap — enable fine-tuned models in AuroraSOC