Cloud GPU Training Guide

Not everyone has a local GPU. This guide walks you through training AuroraSOC models on cloud GPU platforms — from zero to a fully fine-tuned model you can download and run locally.

Platform Comparison

| Platform | Best GPU | Cost/hour | Minimum Spend | Setup Time | Best For |
|---|---|---|---|---|---|
| RunPod | RTX 3090 (24 GB) | $0.69 | None | 5 min | Best value for 8B models |
| RunPod | A100 40 GB | $1.49 | None | 5 min | 12B+ models |
| Lambda Labs | A100 80 GB | $1.29 | None | 10 min | Large-scale training |
| vast.ai | RTX 3090 | ~$0.40 | None | 15 min | Cheapest option (variable) |
| Google Colab Pro | A100 40 GB | $9.99/mo (limited) | $9.99/mo | 2 min | Quick experiments |
| AWS SageMaker | A100 40 GB | ~$5.67 | Pay-per-use | 30 min | Enterprise / existing AWS |

RunPod offers the best balance of cost, ease of use, and reliability for AuroraSOC fine-tuning.

Step 1: Create Account & Add Funds

  1. Go to runpod.io and create an account
  2. Add funds: $10-15 is enough to fine-tune all 9 agents on Granite 4
  3. Navigate to GPU Cloud → Secure Cloud (recommended) or Community Cloud (cheaper)

Step 2: Choose a GPU Pod

For AuroraSOC fine-tuning, you need:

| Model Size | Minimum GPU | Recommended GPU | Cost/hr |
|---|---|---|---|
| Granite 4 H-Micro (3B) | RTX 3090 (24 GB) | RTX 3090 | $0.69 |
| Granite 4 H-Small (8B) | RTX 3090 (24 GB) | RTX 3090 | $0.69 |
| Qwen 3 8B | RTX 3090 (24 GB) | RTX 3090 | $0.69 |
| Gemma 4 12B | A100 40 GB | A100 40 GB | $1.49 |
| Qwen 3 14B | A100 40 GB | A100 40 GB | $1.49 |
| Qwen 3 32B / 30B-A3B | A100 80 GB | A100 80 GB | $2.49 |

Cost Saving

For most AuroraSOC agents, an RTX 3090 at $0.69/hr is sufficient. Only use A100 for 12B+ models.

Step 3: Select a Template

Choose the RunPod PyTorch 2.4+ template, which includes:

  • CUDA 12.4
  • Python 3.11
  • PyTorch 2.4
  • 50 GB disk (increase to 100 GB for training output)

Pod configuration:

  • Container disk: 20 GB
  • Volume disk: 100 GB (persistent — survives pod restarts)
  • Expose ports: 8888 (Jupyter), 22 (SSH)
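
If you prefer the command line to the web console, runpodctl can create an equivalent pod. This is a sketch only: the flag names below are from recent runpodctl releases and the image name is a placeholder, so verify both with runpodctl create pod --help and the template list in your console.

# Hypothetical CLI equivalent of the configuration above; flags may differ by version
runpodctl create pod \
  --name aurorasoc-train \
  --gpuType "NVIDIA GeForce RTX 3090" \
  --imageName "<runpod-pytorch-2.4-image>" \
  --containerDiskSize 20 \
  --volumeSize 100 \
  --volumePath /workspace \
  --ports "8888/http,22/tcp" \
  --secureCloud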

Step 4: Connect via SSH

# RunPod provides an SSH command — copy it from the pod dashboard:
ssh root@<pod-ip> -p <port> -i ~/.ssh/id_rsa

# First time: add your SSH key in RunPod Settings → SSH Keys

Or use the web terminal in the RunPod dashboard (no SSH setup needed).
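
If you expect to reconnect or copy files more than once, a host entry in ~/.ssh/config saves retyping the IP and port each time (the values are the placeholders from your pod dashboard):

# ~/.ssh/config entry; substitute the host, port, and key shown on your pod dashboard
Host aurorasoc-pod
    HostName <pod-ip>
    Port <port>
    User root
    IdentityFile ~/.ssh/id_rsa

# Then connect and copy with the alias:
#   ssh aurorasoc-pod
#   scp aurorasoc-pod:/workspace/AuroraSOC/training/output/gguf/*.gguf ./models/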

Step 5: Set Up the Training Environment

# Connect to your pod

# Clone AuroraSOC
git clone https://github.com/your-org/AuroraSOC.git
cd AuroraSOC

# Install dependencies
pip install -e ".[training]"

# Verify GPU
python -c "import torch; print(f'GPU: {torch.cuda.get_device_name(0)}'); print(f'VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB')"
# GPU: NVIDIA GeForce RTX 3090
# VRAM: 24.0 GB

# Verify Unsloth
python -c "from unsloth import FastLanguageModel; print('Unsloth ready!')"

Step 6: Run Training

# Prepare datasets
make train-data

# Train generic model (all domains)
make train

# Train per-agent specialists
python training/scripts/train_all_agents.py

# Or train a single agent:
python training/scripts/finetune_granite.py \
--config training/configs/granite_soc_finetune.yaml \
--agent malware_analyst
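
Per-agent training runs for hours, and a dropped SSH session kills a foreground process. Running the job inside tmux or under nohup (both standard Linux tools; install tmux with apt if the image lacks it) keeps training alive across disconnects:

# Option A: tmux session; detach with Ctrl-b d, reattach after reconnecting
tmux new -s train
python training/scripts/train_all_agents.py
tmux attach -t train

# Option B: nohup; run in the background and log to a file
nohup python training/scripts/train_all_agents.py > train.log 2>&1 &
tail -f train.log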

Expected training times (RTX 3090, Granite 4 H-Small):

| Step | Duration | Cost |
|---|---|---|
| Dataset preparation | 5 min | $0.06 |
| Generic model training | 25 min | $0.29 |
| Per-agent training (×9 agents) | 3-4 hours | $2.30-2.76 |
| Evaluation | 10 min | $0.12 |
| GGUF export | 15 min | $0.17 |
| Total | ~4.5 hours | ~$3.10 |

Step 7: Export & Download

# Export to GGUF
make train-export

# Download to local machine (from your LOCAL terminal):
scp -P <port> root@<pod-ip>:/workspace/AuroraSOC/training/output/gguf/*.gguf ./models/

# Or use RunPod's file browser in the web UI
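
Multi-gigabyte GGUF transfers occasionally get truncated. A quick checksum on both ends confirms the download is intact (sha256sum ships with standard Linux images):

# On the pod: record checksums with relative filenames
cd /workspace/AuroraSOC/training/output/gguf && sha256sum *.gguf > gguf.sha256

# Locally, after downloading the models and gguf.sha256 into ./models/:
cd ./models && sha256sum -c gguf.sha256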

Step 8: Stop the Pod

Warning

Always stop your pod when done! RunPod charges by the hour. An idle RTX 3090 pod costs $0.69/hr = $16.56/day.

# From RunPod dashboard: Click "Stop" on your pod
# Your /workspace volume persists — you can restart later without losing data
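
You can also stop the pod from the CLI, which is convenient at the end of a scripted run. The commands below assume runpodctl is installed and configured with your API key; confirm the exact subcommands with runpodctl --help.

# List pods to find the pod ID, then stop it
runpodctl get pod
runpodctl stop pod <pod-id>

# Stopped pods may still accrue storage charges for the volume; remove the pod
# once everything is downloaded
runpodctl remove pod <pod-id>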

Lambda Labs

Lambda Labs offers on-demand A100 instances with a clean Ubuntu environment. Good for larger models (12B+).

Quick Start

# 1. Create account at lambdalabs.com
# 2. Launch instance: 1x A100 (40 GB) at $1.29/hr
# 3. SSH in:
ssh ubuntu@<instance-ip>

# 4. Set up environment
sudo apt update && sudo apt install -y git python3-pip
git clone https://github.com/your-org/AuroraSOC.git
cd AuroraSOC
pip install -e ".[training]"

# 5. Run training (same commands as RunPod)
make train-data && make train && python training/scripts/train_all_agents.py

Cost comparison for full training:

| Configuration | Lambda A100 ($1.29/hr) | RunPod RTX 3090 ($0.69/hr) |
|---|---|---|
| Generic model only | $0.54 | $0.29 |
| All 9 agents | $6.45 | $3.10 |
| All agents + 3 models (Option C) | $25.80 | $14.50 |

Lambda is ~1.9× more expensive than RunPod for the same work, but the A100 is faster for 12B+ models.


vast.ai

vast.ai is a marketplace for GPU rentals — prices vary based on supply and demand. It can be the cheapest option but requires more setup.

Quick Start

# 1. Create account at vast.ai
# 2. Install vast CLI
pip install vastai
vastai set api-key <your-key>

# 3. Search for GPUs
vastai search offers 'gpu_ram >= 24 cuda_vers >= 12.0 reliability > 0.95' \
--order 'dph_total'

# 4. Create instance (pick cheapest RTX 3090)
vastai create instance <offer-id> \
--image pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel \
--disk 100

# 5. SSH in
vastai ssh-url <instance-id>
ssh -p <port> root@<host>

# 6. Same training commands
git clone https://github.com/your-org/AuroraSOC.git
cd AuroraSOC && pip install -e ".[training]"
make train-data && make train

Caution

vast.ai uses community-provided GPUs. Instances can be interrupted if the provider needs their GPU back. Always save checkpoints frequently and download results promptly.
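
One way to limit the damage from an interruption is to pull checkpoints to your local machine on a schedule while training runs. A minimal sketch, run from your LOCAL terminal (adjust the port, host, and remote path to your instance):

# Sync new or changed output files every 10 minutes; stop with Ctrl-C when training finishes
while true; do
  rsync -avz -e "ssh -p <port>" \
    "root@<host>:AuroraSOC/training/output/" ./training/output/
  sleep 600
done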


Google Colab Pro

Colab Pro is the easiest option for experiments and testing, but not ideal for full training runs.

See the dedicated Colab Training Guide for the full notebook walkthrough.

Quick summary:

  • $9.99/month for Colab Pro (A100 access, subject to availability)
  • No SSH needed — runs in browser
  • Limited to ~12 hours continuous runtime (Pro+: ~24 hours)
  • Storage via Google Drive (15 GB free, 100 GB with Google One)

Best for: Training a single agent, quick experiments, testing hyperparameters.

Not recommended for: Training all 9 agents (insufficient runtime) or multi-model configurations.


GPU Selection Guide

How Much VRAM Do You Need?

See the model-to-GPU table in Step 2: 3B-8B models fine-tune comfortably on a 24 GB card (RTX 3090), 12B-14B models need an A100 40 GB, and 32B-class models need an A100 80 GB.

GPU Speed Comparison

Real-world training times for Granite 4 H-Small (8B), single agent, 3 epochs:

| GPU | VRAM | Training Time | Cost (RunPod) |
|---|---|---|---|
| T4 | 16 GB | 52 min | $0.32 |
| RTX 3090 | 24 GB | 25 min | $0.29 |
| RTX 4090 | 24 GB | 18 min | $0.54 |
| A100 40 GB | 40 GB | 14 min | $0.35 |
| A100 80 GB | 80 GB | 12 min | $0.50 |
| H100 80 GB | 80 GB | 8 min | $0.53 |

Best Value

The RTX 3090 consistently offers the lowest total cost despite not being the fastest. At $0.69/hr and 25 minutes per agent, you spend only $0.29 per agent fine-tune.


Cost Calculator

Single-Model Configuration (Granite 4 only)

Base training: 25 min × $0.69/hr = $0.29
Per-agent (×9): 225 min × $0.69/hr = $2.59
Dataset prep: 5 min × $0.69/hr = $0.06
Export + eval: 25 min × $0.69/hr = $0.29
────────────────────────────────────────────────
Total: 280 min = 4.7 hrs → $3.23

Two-Model Configuration (Granite 4 + Qwen 3)

Granite 4 agents (×9): 225 min × $0.69/hr = $2.59
Qwen 3 agents (×4): 100 min × $0.69/hr = $1.15
Dataset prep: 5 min × $0.69/hr = $0.06
Export + eval: 40 min × $0.69/hr = $0.46
────────────────────────────────────────────────
Total: 370 min = 6.2 hrs → $4.26

Three-Model Configuration (Granite 4 + Qwen 3 + Gemma 4)

Granite 4 agents (×9): 225 min × $0.69/hr = $2.59
Qwen 3 agents (×4): 100 min × $0.69/hr = $1.15
Gemma 4 agents (×3): 120 min × $1.49/hr = $2.98
Dataset prep: 5 min × $0.69/hr = $0.06
Export + eval: 45 min × $1.49/hr = $1.12
────────────────────────────────────────────────
Total: 495 min = 8.25 hrs → $7.90
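
The same arithmetic is easy to script for other configurations. A small sketch using the assumptions above (25 minutes per 8B agent on an RTX 3090 at $0.69/hr); plug in your own measured times and rates:

# Rough cost estimate: GPU minutes x hourly rate
agents=9            # agents to fine-tune
min_per_agent=25    # measured minutes per agent fine-tune
overhead_min=55     # generic model training + dataset prep + export/eval
rate=0.69           # GPU $/hr

total_min=$(( agents * min_per_agent + overhead_min ))
# bash only does integer math, so use awk for the dollar figure
awk -v m="$total_min" -v r="$rate" \
  'BEGIN { printf "%d min (%.1f hrs) -> $%.2f\n", m, m/60, m/60*r }'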

Data Transfer Tips

Uploading Training Data

# Option 1: Clone from Git (fastest if data is in repo)
git clone --depth 1 https://github.com/your-org/AuroraSOC.git

# Option 2: rsync for large datasets
rsync -avz --progress training/data/ root@<pod-ip>:/workspace/AuroraSOC/training/data/

# Option 3: Hugging Face Hub (for public/private datasets)
pip install huggingface-hub
huggingface-cli download your-org/aurora-soc-dataset --local-dir training/data/

Downloading Results

# Download only GGUF files (smallest, ready for serving)
scp -P <port> root@<pod-ip>:/workspace/AuroraSOC/training/output/gguf/*.gguf ./models/

# Download LoRA adapters (for later merging)
scp -P <port> -r root@<pod-ip>:/workspace/AuroraSOC/training/output/*/adapter_model.safetensors ./adapters/

# Download everything
rsync -avz root@<pod-ip>:/workspace/AuroraSOC/training/output/ ./training/output/

Troubleshooting

Common Cloud Training Issues

| Issue | Cause | Solution |
|---|---|---|
| CUDA out of memory | Batch size too large or wrong GPU | Reduce per_device_train_batch_size to 1, enable gradient checkpointing |
| Connection reset | Pod terminated | Use RunPod persistent volumes; restart pod and resume from checkpoint |
| Permission denied on SSH | SSH key not configured | Add key in platform settings; use web terminal as fallback |
| Training slower than expected | Shared GPU (vast.ai) | Monitor with nvidia-smi; switch to dedicated instance |
| Disk full | Output files accumulating | Increase volume to 200 GB; delete intermediate checkpoints |
| ModuleNotFoundError: unsloth | Missing dependency | pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" |
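
When throughput looks wrong (the "Training slower than expected" row above), a minute of watching GPU utilization usually shows whether you are compute-bound, starved for data, or sharing the card:

# Refresh utilization and memory every 2 seconds while training runs
watch -n 2 nvidia-smi

# Or log a compact reading every 5 seconds to spot sustained dips
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 5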

Resuming from Checkpoints

If your training is interrupted (pod preempted, connection dropped):

# Find the latest checkpoint
ls -la training/output/checkpoint-*/

# Resume training from checkpoint
python training/scripts/finetune_granite.py \
--config training/configs/granite_soc_finetune.yaml \
--resume-from-checkpoint training/output/checkpoint-500/
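
With many checkpoints on disk, the most recent one can be selected automatically instead of reading the directory listing by hand:

# Resume from the checkpoint with the highest step number
latest=$(ls -d training/output/checkpoint-* | sort -V | tail -1)
python training/scripts/finetune_granite.py \
  --config training/configs/granite_soc_finetune.yaml \
  --resume-from-checkpoint "$latest"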

Next Steps