
Google Colab Training

Google Colab provides a free T4 GPU (16 GB VRAM) — enough to fine-tune Granite 4 Hybrid models for AuroraSOC. This is the recommended approach if you don't have a local GPU.

Why Colab?

| Benefit | Details |
|---|---|
| Free GPU | T4 (16 GB) on free tier, A100 (40/80 GB) on Pro |
| No hardware needed | Works from any browser |
| Pre-configured CUDA | No driver installation |
| Google Drive integration | Save models to Drive, download later |
| Shareable | Send the notebook link to teammates |

The Notebook

The training notebook is located at:

training/notebooks/AuroraSOC_Granite4_Finetune.ipynb

Opening in Colab

  1. Upload to Google Drive:

    • Upload AuroraSOC_Granite4_Finetune.ipynb to your Google Drive
    • Double-click to open in Colab
  2. Or open directly:

  3. Enable GPU runtime:

    • Click Runtime → Change runtime type
    • Select T4 GPU (free) or A100 (Pro)
    • Click Save

Notebook Walkthrough

The notebook has 10 sections that mirror the local training pipeline:

Section 0: Install Dependencies

```python
%%capture
!pip install unsloth torch transformers trl datasets pyyaml
!pip install --no-build-isolation mamba_ssm==2.2.5 causal_conv1d==1.5.2
```

Why: Installs Unsloth (2× faster training) and Mamba kernels (for Granite 4 Hybrid architecture). The Mamba kernel compilation takes ~10 minutes on first run — this is normal.

Section 1: Configuration

```python
# Which agent to train? "all" = generic SOC model
AGENT_PROFILE = "all"

# Model variant
MODEL_NAME = "unsloth/granite-4.0-h-tiny"

# LoRA parameters
LORA_R = 64
LORA_ALPHA = 64

# Training parameters
BATCH_SIZE = 2
GRAD_ACCUM = 4
NUM_EPOCHS = 3
LEARNING_RATE = 2e-4

# Export format
GGUF_QUANT = "q8_0"
```

Key configuration choices:

| Parameter | Free T4 | Colab Pro (A100) | Why |
|---|---|---|---|
| MODEL_NAME | granite-4.0-h-tiny | granite-4.0-h-small | Larger model = better quality, more VRAM |
| BATCH_SIZE | 2 | 4-8 | A100 has more VRAM |
| LORA_R | 64 | 64-128 | Higher rank = more capacity |
| GGUF_QUANT | q8_0 | q8_0 | Highest quality that fits |
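
The batch settings combine: with gradient accumulation, each optimizer step sees BATCH_SIZE × GRAD_ACCUM samples. A quick sketch of the arithmetic (the 5,000-sample dataset size is an assumption for illustration):

```python
# Effective batch size and optimizer steps implied by the defaults above.
BATCH_SIZE = 2
GRAD_ACCUM = 4
NUM_EPOCHS = 3
NUM_SAMPLES = 5000  # hypothetical dataset size

effective_batch = BATCH_SIZE * GRAD_ACCUM          # samples per optimizer step
steps_per_epoch = NUM_SAMPLES // effective_batch
total_steps = steps_per_epoch * NUM_EPOCHS

print(effective_batch, steps_per_epoch, total_steps)  # 8 625 1875
```

Raising GRAD_ACCUM is the VRAM-free way to grow the effective batch; raising BATCH_SIZE costs memory.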

How to train per-agent specialists: Change AGENT_PROFILE from "all" to a specific agent name:

  • "security_analyst" — Alert analysis, IOC extraction, MITRE mapping
  • "threat_hunter" — Hunting hypotheses, LOLBin detection
  • "incident_responder" — NIST playbooks, containment plans
  • "malware_analyst" — YARA rules, sandbox analysis
  • "network_security" — Flow analysis, DNS tunneling
  • "cps_security" — ICS/OT, Modbus, IEC 62443
  • "threat_intel" — APT tracking, STIX/TAXII
  • "forensic_analyst" — Memory/disk forensics, chain of custody
  • "orchestrator" — Multi-agent routing and coordination
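
If you keep all agents' samples in one JSONL, filtering to one profile can be sketched as follows. This assumes each record carries an `agent` field, which is a hypothetical schema, not something the notebook guarantees:

```python
import json

def filter_by_agent(jsonl_lines, profile):
    """Keep samples for one agent; "all" keeps everything.
    Assumes each record has a hypothetical "agent" field."""
    records = [json.loads(line) for line in jsonl_lines]
    if profile == "all":
        return records
    return [r for r in records if r.get("agent") == profile]

lines = [
    '{"agent": "security_analyst", "messages": []}',
    '{"agent": "threat_hunter", "messages": []}',
]
print(len(filter_by_agent(lines, "security_analyst")))  # 1
print(len(filter_by_agent(lines, "all")))               # 2
```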

Section 2: Load Model + LoRA

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=4096,
    load_in_4bit=True,  # Quantized loading (saves VRAM)
)

model = FastLanguageModel.get_peft_model(
    model,
    r=LORA_R,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "shared_mlp.input_linear", "shared_mlp.output_linear",  # Granite 4 Hybrid
    ],
    lora_alpha=LORA_ALPHA,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)
```

Why these targets? The target_modules list includes both the standard attention and MLP projections (q_proj, k_proj, etc.) and the Granite 4 Hybrid-specific shared-MLP linears (shared_mlp.input_linear, shared_mlp.output_linear). Targeting both ensures the LoRA adapters cover the hybrid architecture's layers, not just the conventional Transformer blocks.

Why load_in_4bit=True? 4-bit quantization (QLoRA) reduces the base model's memory footprint by ~4×, allowing the T4's 16 GB VRAM to handle the model + LoRA weights + optimizer states + activations. Without quantization, even the tiny model wouldn't fit.
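
The arithmetic behind that claim, assuming a 7B-parameter base model for illustration (the real parameter count may differ):

```python
# Back-of-envelope weight memory; illustrative numbers only.
params = 7e9                  # assumed parameter count
fp16_gb = params * 2 / 1e9    # 2 bytes/param at 16-bit precision
q4_gb = params * 0.5 / 1e9    # ~0.5 bytes/param at 4-bit (NF4)

print(fp16_gb, q4_gb)  # 14.0 3.5
```

At 16-bit the weights alone nearly fill a T4's 16 GB before optimizer states and activations are counted; at 4-bit they leave ample headroom for both.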

Section 3: Upload & Prepare Dataset

The notebook expects your training data as a JSONL file uploaded to Colab:

```python
from google.colab import files

# Option A: Upload from local machine
uploaded = files.upload()  # Select your soc_train.jsonl

# Option B: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Then reference: /content/drive/MyDrive/path/to/soc_train.jsonl
```

Where does the data come from? Run make train-data locally first (no GPU needed), then upload the resulting training/data/soc_train.jsonl to Colab.
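
Before uploading, it can save a wasted GPU session to sanity-check the JSONL locally. A minimal sketch, assuming each record uses a `messages` list in the usual chat format:

```python
import json

def bad_jsonl_lines(lines):
    """Return 1-based indices of lines that fail to parse
    or lack a 'messages' list (assumed chat schema)."""
    bad = []
    for i, line in enumerate(lines, 1):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            bad.append(i)
            continue
        if not isinstance(rec.get("messages"), list):
            bad.append(i)
    return bad

sample = [
    '{"messages": [{"role": "user", "content": "hi"}]}',
    'not json',
    '{"text": "missing messages"}',
]
print(bad_jsonl_lines(sample))  # [2, 3]
```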

Section 4: Configure SFTTrainer

```python
from trl import SFTConfig, SFTTrainer
from unsloth.chat_templates import train_on_responses_only

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRAD_ACCUM,
        num_train_epochs=NUM_EPOCHS,
        learning_rate=LEARNING_RATE,
        bf16=True,
        optim="adamw_8bit",
        output_dir=OUTPUT_DIR,
    ),
)

# Only train on assistant responses
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_of_role|>user<|end_of_role|>",
    response_part="<|start_of_role|>assistant<|end_of_role|>",
)
```

Why train_on_responses_only? This is critical — it tells the trainer to compute loss only on the tokens after <|start_of_role|>assistant<|end_of_role|>. The model doesn't learn to generate system prompts or user queries — it only learns to generate analyst responses. This prevents the model from memorizing training prompts and produces much better generalization.
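
Conceptually, the helper sets the label to -100 (the index cross-entropy ignores) for every token up to and including the assistant marker. A toy sketch with string "tokens" standing in for real token IDs:

```python
IGNORE_INDEX = -100  # label value that cross-entropy loss skips

def mask_non_response(tokens, response_marker):
    """Return labels that keep only the tokens after the assistant marker."""
    labels = [IGNORE_INDEX] * len(tokens)
    if response_marker in tokens:
        start = tokens.index(response_marker) + 1
        labels[start:] = tokens[start:]  # loss computed only on these
    return labels

toks = ["<user>", "Analyze", "this", "<assistant>", "Cobalt", "Strike", "C2"]
print(mask_non_response(toks, "<assistant>"))
# [-100, -100, -100, -100, 'Cobalt', 'Strike', 'C2']
```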

Section 5: Train

```python
trainer.train()
```

On a free T4 with the default configuration:

  • ~1000 samples: ~15 minutes
  • ~5000 samples: ~1 hour
  • ~10000 samples: ~2 hours
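
Those timings scale roughly linearly with sample count; a hedged extrapolation (the ~12 minutes per 1,000 samples rate is inferred from the figures above, not independently measured):

```python
def est_train_minutes(num_samples, minutes_per_1k=12):
    """Rough T4 wall-clock estimate under the default config (assumed rate)."""
    return num_samples / 1000 * minutes_per_1k

print(est_train_minutes(5000))   # 60.0
print(est_train_minutes(10000))  # 120.0
```

Useful for deciding whether a run fits inside a free-tier session before starting it.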

Section 6: Test Inference

The notebook includes a quick inference test before exporting:

```python
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "You are the AuroraSOC Security Analyst..."},
    {"role": "user", "content": "Analyze: ET TROJAN Cobalt Strike Beacon..."},
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(input_ids=inputs.to("cuda"), max_new_tokens=512, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

When to test: Always run inference before exporting. If the output looks like gibberish or ignores the system prompt, training may have gone wrong (check loss values — should be < 0.5).
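
To check the loss programmatically, `trainer.state.log_history` (a standard `transformers` Trainer attribute) is a list of dicts, some of which carry a `loss` entry. A sketch over that shape, with made-up values:

```python
def final_train_loss(log_history):
    """Last reported training loss from a Trainer-style log history."""
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    return losses[-1] if losses else None

# Shape mirrors trainer.state.log_history; the numbers are illustrative.
history = [
    {"loss": 1.82, "step": 10},
    {"loss": 0.41, "step": 20},
    {"train_runtime": 912.4},
]
print(final_train_loss(history))  # 0.41
```

In the notebook you would call `final_train_loss(trainer.state.log_history)` and confirm the result is below 0.5 before exporting.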

Section 7: Export to GGUF

```python
model.save_pretrained_gguf(
    OUTPUT_DIR,
    tokenizer,
    quantization_method=GGUF_QUANT,  # "q8_0" or "q4_k_m"
)
```

This creates a GGUF file in the output directory. GGUF is the format that Ollama uses.

Quantization options:

| Method | Quality | File Size | Use Case |
|---|---|---|---|
| q8_0 | High (best) | ~2-4 GB | Recommended default |
| q4_k_m | Good | ~1-2 GB | When disk space is limited |
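
The file sizes track bytes per parameter: q8_0 stores roughly one byte per weight and q4_k_m roughly half that, plus metadata overhead. A rough estimator (the bytes-per-parameter figures and the parameter count are approximations, not exact GGUF accounting):

```python
BYTES_PER_PARAM = {"q8_0": 1.07, "q4_k_m": 0.57}  # approximate, incl. overhead

def gguf_size_gb(num_params, method):
    """Rough on-disk size of a GGUF export, in GB."""
    return num_params * BYTES_PER_PARAM[method] / 1e9

params = 3e9  # hypothetical parameter count
print(round(gguf_size_gb(params, "q8_0"), 1))    # 3.2
print(round(gguf_size_gb(params, "q4_k_m"), 1))  # 1.7
```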

Section 8: Save to Google Drive

```python
from google.colab import drive
drive.mount('/content/drive')

import shutil
shutil.copytree(OUTPUT_DIR, f"/content/drive/MyDrive/AuroraSOC/{OUTPUT_DIR}")
```

Why Google Drive? Colab sessions are temporary — when the session ends, all files are deleted. Saving to Drive preserves your model permanently. From Drive, you can download the GGUF file to your local machine.
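
To act on the "save checkpoints during training, not just at the end" advice, a sketch that copies any new `checkpoint-*` directories (the default Trainer checkpoint naming) to a mounted Drive folder, skipping ones already copied:

```python
import os
import shutil

def sync_checkpoints(output_dir, drive_dir):
    """Copy checkpoint-* folders not yet in drive_dir; return names copied."""
    os.makedirs(drive_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(output_dir)):
        src = os.path.join(output_dir, name)
        dst = os.path.join(drive_dir, name)
        if name.startswith("checkpoint-") and os.path.isdir(src) and not os.path.exists(dst):
            shutil.copytree(src, dst)
            copied.append(name)
    return copied
```

Run it from a cell between training runs, e.g. `sync_checkpoints(OUTPUT_DIR, "/content/drive/MyDrive/AuroraSOC/checkpoints")`, so a disconnect costs at most one checkpoint interval.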

Section 9: Download GGUF

```python
from google.colab import files
files.download(f"{OUTPUT_DIR}/unsloth.Q8_0.gguf")
```

This triggers a browser download of the GGUF file.

After Training: Import to Local Ollama

Once you have the GGUF file on your local machine:

```shell
# Method 1: Using serve_model.py
python training/scripts/serve_model.py ollama \
  --gguf /path/to/unsloth.Q8_0.gguf \
  --name granite-soc:latest

# Method 2: Using setup_local.sh
./scripts/setup_local.sh --import /path/to/unsloth.Q8_0.gguf

# Method 3: Manual Modelfile
cat > Modelfile << 'EOF'
FROM /path/to/unsloth.Q8_0.gguf

TEMPLATE """{{- if .System }}<|start_of_role|>system<|end_of_role|>
{{ .System }}<|end_of_text|>
{{- end }}
<|start_of_role|>user<|end_of_role|>
{{ .Prompt }}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
{{ .Response }}<|end_of_text|>"""

PARAMETER temperature 0.1
PARAMETER top_p 0.95
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|start_of_role|>"
EOF

ollama create granite-soc:latest -f Modelfile
```

Then enable it in AuroraSOC:

```shell
make enable-finetuned
# This sets GRANITE_USE_FINETUNED=true in .env
```
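
What that target does, in sketch form (assuming a simple `KEY=value` `.env` layout; the real Makefile recipe may differ):

```python
def set_env_flag(env_text, key, value):
    """Set or append KEY=value in .env-style text (simplified sketch)."""
    out, found = [], False
    for line in env_text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")  # overwrite existing entry
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")      # append if the key was absent
    return "\n".join(out)

print(set_env_flag("OLLAMA_HOST=localhost:11434", "GRANITE_USE_FINETUNED", "true"))
# OLLAMA_HOST=localhost:11434
# GRANITE_USE_FINETUNED=true
```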

Tips for Colab Training

Session Management

  • Save frequently — Colab sessions can disconnect after 90 minutes of inactivity (free) or 24 hours (Pro)
  • Mount Drive early — save checkpoints to Drive during training, not just at the end
  • Use a browser extension to prevent idle disconnects (search "Colab keep alive")

Maximizing Free Tier

  • Train during off-peak hours — weekday nights/weekends have better GPU availability
  • Use granite-4.0-h-tiny — fits comfortably in T4's 16 GB VRAM
  • Keep BATCH_SIZE=2 — larger batches risk OOM on T4
  • Set MAX_STEPS=200 for quick tests before committing to full training

When to Upgrade to Pro

  • You need granite-4.0-h-small (requires A100's 40+ GB VRAM)
  • You're training many per-agent specialists (need long sessions)
  • Free tier keeps disconnecting during training

Next Steps