
Google Colab Training

Google Colab provides a free T4 GPU (16 GB VRAM) — enough to fine-tune Granite 4 Hybrid models for AuroraSOC. This is the recommended approach if you don't have a local GPU.

Why Colab?

| Benefit | Details |
|---|---|
| Free GPU | T4 (16 GB) on free tier, A100 (40/80 GB) on Pro |
| No hardware needed | Works from any browser |
| Pre-configured CUDA | No driver installation |
| Google Drive integration | Save models to Drive, download later |
| Shareable | Send the notebook link to teammates |

The Notebook

The training notebook is located at:

training/notebooks/AuroraSOC_Granite4_Finetune.ipynb

Opening in Colab

  1. Upload to Google Drive:

    • Upload AuroraSOC_Granite4_Finetune.ipynb to your Google Drive
    • Double-click to open in Colab
  2. Or open directly:

  3. Enable GPU runtime:

    • Click Runtime → Change runtime type
    • Select T4 GPU (free) or A100 (Pro)
    • Click Save

Notebook Walkthrough

The notebook has 10 sections that mirror the local training pipeline:

Section 0: Install Dependencies

```python
%%capture
!pip install unsloth torch transformers trl datasets pyyaml
!pip install --no-build-isolation mamba_ssm==2.2.5 causal_conv1d==1.5.2
```

Why: Installs Unsloth (2× faster training) and Mamba kernels (for Granite 4 Hybrid architecture). The Mamba kernel compilation takes ~10 minutes on first run — this is normal.

Section 1: Configuration

```python
# Which agent to train? "all" = generic SOC model
AGENT_PROFILE = "all"

# Model variant
MODEL_NAME = "unsloth/granite-4.0-h-tiny"

# LoRA parameters
LORA_R = 64
LORA_ALPHA = 64

# Training parameters
BATCH_SIZE = 2
GRAD_ACCUM = 4
NUM_EPOCHS = 3
LEARNING_RATE = 2e-4

# Export format
GGUF_QUANT = "q8_0"
```

Key configuration choices:

| Parameter | Free T4 | Colab Pro (A100) | Why |
|---|---|---|---|
| MODEL_NAME | granite-4.0-h-tiny | granite-4.0-h-small | Larger model = better quality, more VRAM |
| BATCH_SIZE | 2 | 4-8 | A100 has more VRAM |
| LORA_R | 64 | 64-128 | Higher rank = more capacity |
| GGUF_QUANT | q8_0 | q8_0 | Highest quality that fits |
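
The batch settings combine: with gradient accumulation, each optimizer step sees BATCH_SIZE × GRAD_ACCUM samples. A quick sketch of the arithmetic (the 5,000-sample dataset size is an assumption for illustration):

```python
# Effective batch size and optimizer steps implied by the defaults above.
BATCH_SIZE = 2
GRAD_ACCUM = 4
NUM_EPOCHS = 3
NUM_SAMPLES = 5000  # hypothetical dataset size

effective_batch = BATCH_SIZE * GRAD_ACCUM          # samples per optimizer step
steps_per_epoch = NUM_SAMPLES // effective_batch
total_steps = steps_per_epoch * NUM_EPOCHS

print(effective_batch, steps_per_epoch, total_steps)  # 8 625 1875
```

Raising GRAD_ACCUM is the VRAM-free way to grow the effective batch; raising BATCH_SIZE costs memory.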

How to train per-agent specialists: Change AGENT_PROFILE from "all" to a specific agent name:

  • "security_analyst" — Alert analysis, IOC extraction, MITRE mapping
  • "threat_hunter" — Hunting hypotheses, LOLBin detection
  • "incident_responder" — NIST playbooks, containment plans
  • "malware_analyst" — YARA rules, sandbox analysis
  • "network_security" — Flow analysis, DNS tunneling
  • "cps_security" — ICS/OT, Modbus, IEC 62443
  • "threat_intel" — APT tracking, STIX/TAXII
  • "forensic_analyst" — Memory/disk forensics, chain of custody
  • "orchestrator" — Multi-agent routing and coordination
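
If you keep all agents' samples in one JSONL, filtering to one profile can be sketched as follows. This assumes each record carries an `agent` field, which is a hypothetical schema, not something the notebook guarantees:

```python
import json

def filter_by_agent(jsonl_lines, profile):
    """Keep samples for one agent; "all" keeps everything.
    Assumes each record has a hypothetical "agent" field."""
    records = [json.loads(line) for line in jsonl_lines]
    if profile == "all":
        return records
    return [r for r in records if r.get("agent") == profile]

lines = [
    '{"agent": "security_analyst", "messages": []}',
    '{"agent": "threat_hunter", "messages": []}',
]
print(len(filter_by_agent(lines, "security_analyst")))  # 1
print(len(filter_by_agent(lines, "all")))               # 2
```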

Section 2: Load Model + LoRA

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=4096,
    load_in_4bit=True,  # Quantized loading (saves VRAM)
)

model = FastLanguageModel.get_peft_model(
    model,
    r=LORA_R,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "shared_mlp.input_linear", "shared_mlp.output_linear",  # Granite 4 Hybrid
    ],
    lora_alpha=LORA_ALPHA,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)
```

Why these targets? The target_modules list includes both the standard attention and MLP projections (q_proj, k_proj, etc.) and the Granite 4 Hybrid-specific shared-MLP linears (shared_mlp.input_linear, shared_mlp.output_linear). Targeting both ensures the LoRA adapters cover the hybrid architecture's layers, not just the conventional Transformer blocks.

Why load_in_4bit=True? 4-bit quantization (QLoRA) reduces the base model's memory footprint by ~4×, allowing the T4's 16 GB VRAM to handle the model + LoRA weights + optimizer states + activations. Without quantization, even the tiny model wouldn't fit.
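
The arithmetic behind that claim, assuming a 7B-parameter base model for illustration (the real parameter count may differ):

```python
# Back-of-envelope weight memory; illustrative numbers only.
params = 7e9                  # assumed parameter count
fp16_gb = params * 2 / 1e9    # 2 bytes/param at 16-bit precision
q4_gb = params * 0.5 / 1e9    # ~0.5 bytes/param at 4-bit (NF4)

print(fp16_gb, q4_gb)  # 14.0 3.5
```

At 16-bit the weights alone nearly fill a T4's 16 GB before optimizer states and activations are counted; at 4-bit they leave ample headroom for both.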

Section 3: Upload & Prepare Dataset

The notebook expects your training data as a JSONL file uploaded to Colab:

```python
from google.colab import files

# Option A: Upload from local machine
uploaded = files.upload()  # Select your soc_train.jsonl

# Option B: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Then reference: /content/drive/MyDrive/path/to/soc_train.jsonl
```

Where does the data come from? Run make train-data locally first (no GPU needed), then upload the resulting training/data/soc_train.jsonl to Colab.
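
Before uploading, it can save a wasted GPU session to sanity-check the JSONL locally. A minimal sketch, assuming each record uses a `messages` list in the usual chat format:

```python
import json

def bad_jsonl_lines(lines):
    """Return 1-based indices of lines that fail to parse
    or lack a 'messages' list (assumed chat schema)."""
    bad = []
    for i, line in enumerate(lines, 1):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            bad.append(i)
            continue
        if not isinstance(rec.get("messages"), list):
            bad.append(i)
    return bad

sample = [
    '{"messages": [{"role": "user", "content": "hi"}]}',
    'not json',
    '{"text": "missing messages"}',
]
print(bad_jsonl_lines(sample))  # [2, 3]
```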

Section 4: Configure SFTTrainer

```python
from trl import SFTConfig, SFTTrainer
from unsloth.chat_templates import train_on_responses_only

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRAD_ACCUM,
        num_train_epochs=NUM_EPOCHS,
        learning_rate=LEARNING_RATE,
        bf16=True,
        optim="adamw_8bit",
        output_dir=OUTPUT_DIR,
    ),
)

# Only train on assistant responses
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_of_role|>user<|end_of_role|>",
    response_part="<|start_of_role|>assistant<|end_of_role|>",
)
```

Why train_on_responses_only? This is critical — it tells the trainer to compute loss only on the tokens after <|start_of_role|>assistant<|end_of_role|>. The model doesn't learn to generate system prompts or user queries — it only learns to generate analyst responses. This prevents the model from memorizing training prompts and produces much better generalization.
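
Conceptually, the helper sets the label to -100 (the index cross-entropy ignores) for every token up to and including the assistant marker. A toy sketch with string "tokens" standing in for real token IDs:

```python
IGNORE_INDEX = -100  # label value that cross-entropy loss skips

def mask_non_response(tokens, response_marker):
    """Return labels that keep only the tokens after the assistant marker."""
    labels = [IGNORE_INDEX] * len(tokens)
    if response_marker in tokens:
        start = tokens.index(response_marker) + 1
        labels[start:] = tokens[start:]  # loss computed only on these
    return labels

toks = ["<user>", "Analyze", "this", "<assistant>", "Cobalt", "Strike", "C2"]
print(mask_non_response(toks, "<assistant>"))
# [-100, -100, -100, -100, 'Cobalt', 'Strike', 'C2']
```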

Section 5: Train

```python
trainer.train()
```

On a free T4 with the default configuration:

  • ~1000 samples: ~15 minutes
  • ~5000 samples: ~1 hour
  • ~10000 samples: ~2 hours
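
Those timings scale roughly linearly with sample count; a hedged extrapolation (the ~12 minutes per 1,000 samples rate is inferred from the figures above, not independently measured):

```python
def est_train_minutes(num_samples, minutes_per_1k=12):
    """Rough T4 wall-clock estimate under the default config (assumed rate)."""
    return num_samples / 1000 * minutes_per_1k

print(est_train_minutes(5000))   # 60.0
print(est_train_minutes(10000))  # 120.0
```

Useful for deciding whether a run fits inside a free-tier session before starting it.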

Section 6: Test Inference

The notebook includes a quick inference test before exporting:

```python
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "You are the AuroraSOC Security Analyst..."},
    {"role": "user", "content": "Analyze: ET TROJAN Cobalt Strike Beacon..."},
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(input_ids=inputs.to("cuda"), max_new_tokens=512, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

When to test: Always run inference before exporting. If the output looks like gibberish or ignores the system prompt, training may have gone wrong (check loss values — should be < 0.5).
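
To check the loss programmatically, `trainer.state.log_history` (a standard `transformers` Trainer attribute) is a list of dicts, some of which carry a `loss` entry. A sketch over that shape, with made-up values:

```python
def final_train_loss(log_history):
    """Last reported training loss from a Trainer-style log history."""
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    return losses[-1] if losses else None

# Shape mirrors trainer.state.log_history; the numbers are illustrative.
history = [
    {"loss": 1.82, "step": 10},
    {"loss": 0.41, "step": 20},
    {"train_runtime": 912.4},
]
print(final_train_loss(history))  # 0.41
```

In the notebook you would call `final_train_loss(trainer.state.log_history)` and confirm the result is below 0.5 before exporting.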

Section 7: Export to GGUF

```python
model.save_pretrained_gguf(
    OUTPUT_DIR,
    tokenizer,
    quantization_method=GGUF_QUANT,  # "q8_0" or "q4_k_m"
)
```

This creates a GGUF file in the output directory. GGUF is the format that Ollama uses.

Quantization options:

| Method | Quality | File Size | Use Case |
|---|---|---|---|
| q8_0 | High (best) | ~2-4 GB | Recommended default |
| q4_k_m | Good | ~1-2 GB | When disk space is limited |
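
The file sizes track bytes per parameter: q8_0 stores roughly one byte per weight and q4_k_m roughly half that, plus metadata overhead. A rough estimator (the bytes-per-parameter figures and the parameter count are approximations, not exact GGUF accounting):

```python
BYTES_PER_PARAM = {"q8_0": 1.07, "q4_k_m": 0.57}  # approximate, incl. overhead

def gguf_size_gb(num_params, method):
    """Rough on-disk size of a GGUF export, in GB."""
    return num_params * BYTES_PER_PARAM[method] / 1e9

params = 3e9  # hypothetical parameter count
print(round(gguf_size_gb(params, "q8_0"), 1))    # 3.2
print(round(gguf_size_gb(params, "q4_k_m"), 1))  # 1.7
```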

Section 8: Save to Google Drive

```python
from google.colab import drive
drive.mount('/content/drive')

import shutil
shutil.copytree(OUTPUT_DIR, f"/content/drive/MyDrive/AuroraSOC/{OUTPUT_DIR}")
```

Why Google Drive? Colab sessions are temporary — when the session ends, all files are deleted. Saving to Drive preserves your model permanently. From Drive, you can download the GGUF file to your local machine.
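
To act on the "save checkpoints during training, not just at the end" advice, a sketch that copies any new `checkpoint-*` directories (the default Trainer checkpoint naming) to a mounted Drive folder, skipping ones already copied:

```python
import os
import shutil

def sync_checkpoints(output_dir, drive_dir):
    """Copy checkpoint-* folders not yet in drive_dir; return names copied."""
    os.makedirs(drive_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(output_dir)):
        src = os.path.join(output_dir, name)
        dst = os.path.join(drive_dir, name)
        if name.startswith("checkpoint-") and os.path.isdir(src) and not os.path.exists(dst):
            shutil.copytree(src, dst)
            copied.append(name)
    return copied
```

Run it from a cell between training runs, e.g. `sync_checkpoints(OUTPUT_DIR, "/content/drive/MyDrive/AuroraSOC/checkpoints")`, so a disconnect costs at most one checkpoint interval.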

Section 9: Download GGUF

```python
from google.colab import files
files.download(f"{OUTPUT_DIR}/unsloth.Q8_0.gguf")
```

This triggers a browser download of the GGUF file.

After Training: Import to Local Ollama

Once you have the GGUF file on your local machine:

```shell
# Method 1: Using serve_model.py
python training/scripts/serve_model.py ollama \
  --gguf /path/to/unsloth.Q8_0.gguf \
  --name granite-soc:latest

# Method 2: Using setup_local.sh
./scripts/setup_local.sh --import /path/to/unsloth.Q8_0.gguf

# Method 3: Manual Modelfile
cat > Modelfile << 'EOF'
FROM /path/to/unsloth.Q8_0.gguf

TEMPLATE """{{- if .System }}<|start_of_role|>system<|end_of_role|>
{{ .System }}<|end_of_text|>
{{- end }}
<|start_of_role|>user<|end_of_role|>
{{ .Prompt }}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
{{ .Response }}<|end_of_text|>"""

PARAMETER temperature 0.1
PARAMETER top_p 0.95
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|start_of_role|>"
EOF

ollama create granite-soc:latest -f Modelfile
```

Then enable it in AuroraSOC:

```shell
make enable-finetuned
# This sets GRANITE_USE_FINETUNED=true in .env
```
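
What that target does, in sketch form (assuming a simple `KEY=value` `.env` layout; the real Makefile recipe may differ):

```python
def set_env_flag(env_text, key, value):
    """Set or append KEY=value in .env-style text (simplified sketch)."""
    out, found = [], False
    for line in env_text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")  # overwrite existing entry
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")      # append if the key was absent
    return "\n".join(out)

print(set_env_flag("OLLAMA_HOST=localhost:11434", "GRANITE_USE_FINETUNED", "true"))
# OLLAMA_HOST=localhost:11434
# GRANITE_USE_FINETUNED=true
```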

Tips for Colab Training

Session Management

  • Save frequently — Colab sessions can disconnect after 90 minutes of inactivity (free) or 24 hours (Pro)
  • Mount Drive early — save checkpoints to Drive during training, not just at the end
  • Use a browser extension to prevent idle disconnects (search "Colab keep alive")

Maximizing Free Tier

  • Train during off-peak hours — weekday nights/weekends have better GPU availability
  • Use granite-4.0-h-tiny — fits comfortably in T4's 16 GB VRAM
  • Keep BATCH_SIZE=2 — larger batches risk OOM on T4
  • Set MAX_STEPS=200 for quick tests before committing to full training

When to Upgrade to Pro

  • You need granite-4.0-h-small (requires A100's 40+ GB VRAM)
  • You're training many per-agent specialists (need long sessions)
  • Free tier keeps disconnecting during training

Next Steps