Cloud GPU Training Guide
Not everyone has a local GPU. This guide walks you through training AuroraSOC models on cloud GPU platforms — from zero to a fully fine-tuned model you can download and run locally.
Platform Comparison
| Platform | Best GPU | Cost/hour | Minimum Spend | Setup Time | Best For |
|---|---|---|---|---|---|
| RunPod ⭐ | RTX 3090 (24 GB) | $0.69 | None | 5 min | Best value for 8B models |
| RunPod | A100 40 GB | $1.49 | None | 5 min | 12B+ models |
| Lambda Labs | A100 80 GB | $1.29 | None | 10 min | Large-scale training |
| vast.ai | RTX 3090 | ~$0.40 | None | 15 min | Cheapest option (variable) |
| Google Colab Pro | A100 40 GB | $9.99/mo (limited) | $9.99/mo | 2 min | Quick experiments |
| AWS SageMaker | A100 40 GB | ~$5.67 | Pay-per-use | 30 min | Enterprise / existing AWS |
RunPod (Recommended)
RunPod is the best balance of cost, ease of use, and reliability for AuroraSOC fine-tuning.
Step 1: Create Account & Add Funds
- Go to runpod.io and create an account
- Add funds: $10-15 is enough to fine-tune all 9 agents on Granite 4
- Navigate to GPU Cloud → Secure Cloud (recommended) or Community Cloud (cheaper)
Step 2: Choose a GPU Pod
For AuroraSOC fine-tuning, you need:
| Model Size | Minimum GPU | Recommended GPU | Cost/hr |
|---|---|---|---|
| Granite 4 H-Micro (3B) | RTX 3090 (24 GB) | RTX 3090 | $0.69 |
| Granite 4 H-Small (8B) | RTX 3090 (24 GB) | RTX 3090 | $0.69 |
| Qwen 3 8B | RTX 3090 (24 GB) | RTX 3090 | $0.69 |
| Gemma 4 12B | A100 40 GB | A100 40 GB | $1.49 |
| Qwen 3 14B | A100 40 GB | A100 40 GB | $1.49 |
| Qwen 3 32B / 30B-A3B | A100 80 GB | A100 80 GB | $2.49 |
For most AuroraSOC agents, an RTX 3090 at $0.69/hr is sufficient. Only use A100 for 12B+ models.
Step 3: Select a Template
Choose the RunPod PyTorch 2.4+ template, which includes:
- CUDA 12.4
- Python 3.11
- PyTorch 2.4
- 50 GB disk (increase to 100 GB for training output)
Pod configuration:
- Container disk: 20 GB
- Volume disk: 100 GB (persistent — survives pod restarts)
- Expose ports: 8888 (Jupyter), 22 (SSH)
Step 4: Connect via SSH
# RunPod provides an SSH command — copy it from the pod dashboard:
ssh root@<pod-ip> -p <port> -i ~/.ssh/id_rsa
# First time: add your SSH key in RunPod Settings → SSH Keys
Or use the web terminal in the RunPod dashboard (no SSH setup needed).
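If you reconnect often, a Host entry in your local ~/.ssh/config saves retyping the port and key path. The alias runpod-aurora below is arbitrary; fill in the values from your pod dashboard.
# Add to ~/.ssh/config on your LOCAL machine (values come from the pod dashboard)
Host runpod-aurora
    HostName <pod-ip>
    Port <port>
    User root
    IdentityFile ~/.ssh/id_rsa
# Then connect with just:
ssh runpod-aurora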
Step 5: Set Up the Training Environment
# Connect to your pod
# Clone AuroraSOC
git clone https://github.com/your-org/AuroraSOC.git
cd AuroraSOC
# Install dependencies
pip install -e ".[training]"
# Verify GPU
python -c "import torch; print(f'GPU: {torch.cuda.get_device_name(0)}'); print(f'VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB')"
# GPU: NVIDIA GeForce RTX 3090
# VRAM: 24.0 GB
# Verify Unsloth
python -c "from unsloth import FastLanguageModel; print('Unsloth ready!')"
Step 6: Run Training
# Prepare datasets
make train-data
# Train generic model (all domains)
make train
# Train per-agent specialists
python training/scripts/train_all_agents.py
# Or train a single agent:
python training/scripts/finetune_granite.py \
--config training/configs/granite_soc_finetune.yaml \
--agent malware_analyst
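Multi-hour runs should not depend on your SSH session staying alive. A minimal sketch using tmux (install it via apt if your image lacks it; the session name aurora-train is arbitrary):
# Start a detachable session so a dropped connection doesn't kill training
# (tmux may need: apt-get update && apt-get install -y tmux)
tmux new -s aurora-train
make train-data && make train && python training/scripts/train_all_agents.py
# Detach with Ctrl+b then d; reattach later with:
tmux attach -t aurora-train
# nohup alternative, logging to train.log:
nohup python training/scripts/train_all_agents.py > train.log 2>&1 &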
Expected training times (RTX 3090, Granite 4 H-Small):
| Step | Duration | Cost |
|---|---|---|
| Dataset preparation | 5 min | $0.06 |
| Generic model training | 25 min | $0.29 |
| Per-agent training (×9 agents) | ~3.75 hours (25 min each) | $2.59 |
| Evaluation | 10 min | $0.12 |
| GGUF export | 15 min | $0.17 |
| Total | ~4.7 hours | ~$3.23 |
Step 7: Export & Download
# Export to GGUF
make train-export
# Download to local machine (from your LOCAL terminal):
scp -P <port> root@<pod-ip>:/workspace/AuroraSOC/training/output/gguf/*.gguf ./models/
# Or use RunPod's file browser in the web UI
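For multi-gigabyte GGUF files it is worth verifying the transfer. A small sketch using sha256sum, with paths matching the scp example above:
# On the pod: record checksums (run inside the gguf directory so paths stay relative)
cd /workspace/AuroraSOC/training/output/gguf && sha256sum *.gguf > SHA256SUMS
# On your LOCAL machine, after downloading the models:
scp -P <port> root@<pod-ip>:/workspace/AuroraSOC/training/output/gguf/SHA256SUMS ./models/
cd models && sha256sum -c SHA256SUMS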
Step 8: Stop the Pod
Always stop your pod when done! RunPod charges by the hour. An idle RTX 3090 pod costs $0.69/hr = $16.56/day.
# From RunPod dashboard: Click "Stop" on your pod
# Your /workspace volume persists — you can restart later without losing data
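If you kick off an unattended run, you can have the pod stop itself when training finishes. This sketch assumes runpodctl and the RUNPOD_POD_ID environment variable are available inside the pod, which is the case on standard RunPod images; verify on yours before relying on it.
# Inside the pod: stop (and stop billing for) the pod once training and export complete
python training/scripts/train_all_agents.py && make train-export && runpodctl stop pod $RUNPOD_POD_ID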
Lambda Labs
Lambda Labs offers on-demand A100 instances with a clean Ubuntu environment. Good for larger models (12B+).
Quick Start
# 1. Create account at lambdalabs.com
# 2. Launch instance: 1x A100 (40 GB) at $1.29/hr
# 3. SSH in:
ssh ubuntu@<instance-ip>
# 4. Set up environment
sudo apt update && sudo apt install -y git python3-pip
git clone https://github.com/your-org/AuroraSOC.git
cd AuroraSOC
pip install -e ".[training]"
# 5. Run training (same commands as RunPod)
make train-data && make train && python training/scripts/train_all_agents.py
Cost comparison for full training:
| Configuration | Lambda A100 ($1.29/hr) | RunPod RTX 3090 ($0.69/hr) |
|---|---|---|
| Generic model only | $0.54 | $0.29 |
| All 9 agents | $6.45 | $3.23 |
| All agents + 3 models (Option C) | $25.80 | $14.50 |
Lambda is ~1.9× more expensive than RunPod for the same work, but the A100 is faster for 12B+ models.
vast.ai
vast.ai is a marketplace for GPU rentals — prices vary based on supply and demand. It can be the cheapest option but requires more setup.
Quick Start
# 1. Create account at vast.ai
# 2. Install vast CLI
pip install vastai
vastai set api-key <your-key>
# 3. Search for GPUs
vastai search offers 'gpu_ram >= 24 cuda_vers >= 12.0 reliability > 0.95' \
--order 'dph_total'
# 4. Create instance (pick cheapest RTX 3090)
vastai create instance <offer-id> \
--image pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel \
--disk 100
# 5. SSH in
vastai ssh-url <instance-id>
ssh -p <port> root@<host>
# 6. Same training commands
git clone https://github.com/your-org/AuroraSOC.git
cd AuroraSOC && pip install -e ".[training]"
make train-data && make train
vast.ai uses community-provided GPUs. Instances can be interrupted if the provider needs their GPU back. Always save checkpoints frequently and download results promptly.
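One way to protect against preemption is a periodic pull from your local machine while training runs; this sketch assumes the repo lives under /workspace as in the RunPod steps, and the 10-minute interval is arbitrary.
# From your LOCAL machine: sync checkpoints every 10 minutes while the job runs
while true; do
  rsync -avz -e "ssh -p <port>" root@<host>:/workspace/AuroraSOC/training/output/ ./training/output/
  sleep 600
done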
Google Colab Pro
Colab Pro is the easiest option for experiments and testing, but not ideal for full training runs.
See the dedicated Colab Training Guide for the full notebook walkthrough.
Quick summary:
- $9.99/month for Colab Pro (A100 access)
- No SSH needed — runs in browser
- Limited to ~12 hours continuous runtime (Pro+: ~24 hours)
- Storage via Google Drive (15 GB free, 100 GB with Google One)
Best for: Training a single agent, quick experiments, testing hyperparameters.
Not recommended for: Training all 9 agents (insufficient runtime) or multi-model configurations.
GPU Selection Guide
How Much VRAM Do You Need?
As a rule of thumb from the Step 2 table: QLoRA fine-tuning of models up to 8B (Granite 4 H-Micro/H-Small, Qwen 3 8B) fits in 24 GB, 12B-14B models need 40 GB, and 30B+ models need 80 GB.
GPU Speed Comparison
Real-world training times for Granite 4 H-Small (8B), single agent, 3 epochs:
| GPU | VRAM | Training Time | Cost (RunPod) |
|---|---|---|---|
| T4 | 16 GB | 52 min | $0.32 |
| RTX 3090 | 24 GB | 25 min | $0.29 |
| RTX 4090 | 24 GB | 18 min | $0.54 |
| A100 40 GB | 40 GB | 14 min | $0.35 |
| A100 80 GB | 80 GB | 12 min | $0.50 |
| H100 80 GB | 80 GB | 8 min | $0.53 |
The RTX 3090 consistently offers the lowest total cost despite not being the fastest. At $0.69/hr and 25 minutes per agent, you spend only $0.29 per agent fine-tune.
Cost Calculator
Single-Model Configuration (Granite 4 only)
Base training: 25 min × $0.69/hr = $0.29
Per-agent (×9): 225 min × $0.69/hr = $2.59
Dataset prep: 5 min × $0.69/hr = $0.06
Export + eval: 25 min × $0.69/hr = $0.29
────────────────────────────────────────────────
Total: 280 min = 4.7 hrs → $3.23
Two-Model Configuration (Granite 4 + Qwen 3)
Granite 4 agents (×9): 225 min × $0.69/hr = $2.59
Qwen 3 agents (×4): 100 min × $0.69/hr = $1.15
Dataset prep: 5 min × $0.69/hr = $0.06
Export + eval: 40 min × $0.69/hr = $0.46
────────────────────────────────────────────────
Total: 370 min = 6.2 hrs → $4.26
Three-Model Configuration (Granite 4 + Qwen 3 + Gemma 4)
Granite 4 agents (×9): 225 min × $0.69/hr = $2.59
Qwen 3 agents (×4): 100 min × $0.69/hr = $1.15
Gemma 4 agents (×3): 120 min × $1.49/hr = $2.98
Dataset prep: 5 min × $0.69/hr = $0.06
Export + eval: 45 min × $1.49/hr = $1.12
────────────────────────────────────────────────
Total: 495 min = 8.25 hrs → $7.90
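The same arithmetic (minutes ÷ 60 × hourly rate) works for any configuration; here is a tiny shell helper for estimating your own mix (the cost function name is just illustrative).
# cost <minutes> <rate-per-hour>: prints the dollar cost of a run
cost() { awk -v m="$1" -v r="$2" 'BEGIN { printf "$%.2f\n", m / 60 * r }'; }
cost 225 0.69   # 9 Granite agents on an RTX 3090 -> $2.59
cost 120 1.49   # 3 Gemma agents on an A100 40 GB -> $2.98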
Data Transfer Tips
Uploading Training Data
# Option 1: Clone from Git (fastest if data is in repo)
git clone --depth 1 https://github.com/your-org/AuroraSOC.git
# Option 2: rsync for large datasets
rsync -avz --progress training/data/ root@<pod-ip>:/workspace/AuroraSOC/training/data/
# Option 3: Hugging Face Hub (for public/private datasets)
pip install huggingface-hub
huggingface-cli download your-org/aurora-soc-dataset --repo-type dataset --local-dir training/data/
Downloading Results
# Download only GGUF files (smallest, ready for serving)
scp -P <port> root@<pod-ip>:/workspace/AuroraSOC/training/output/gguf/*.gguf ./models/
# Download LoRA adapters (for later merging)
scp -P <port> -r root@<pod-ip>:/workspace/AuroraSOC/training/output/*/adapter_model.safetensors ./adapters/
# Download everything
rsync -avz root@<pod-ip>:/workspace/AuroraSOC/training/output/ ./training/output/
Troubleshooting
Common Cloud Training Issues
| Issue | Cause | Solution |
|---|---|---|
| CUDA out of memory | Batch size too large or wrong GPU | Reduce per_device_train_batch_size to 1, enable gradient checkpointing |
| Connection reset | Pod terminated | Use RunPod persistent volumes; restart pod and resume from checkpoint |
| Permission denied on SSH | SSH key not configured | Add key in platform settings; use web terminal as fallback |
| Training slower than expected | Shared GPU (vast.ai) | Monitor with nvidia-smi; switch to dedicated instance |
| Disk full | Output files accumulating | Increase volume to 200 GB; delete intermediate checkpoints |
| ModuleNotFoundError: unsloth | Missing dependency | pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" |
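When a run is slower than expected or you suspect memory pressure, watching the GPU from a second terminal usually pinpoints the issue:
# In a second SSH session: refresh GPU stats every 5 seconds
watch -n 5 nvidia-smi
# Or log utilisation and memory over time
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 5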
Resuming from Checkpoints
If your training is interrupted (pod preempted, connection dropped):
# Find the latest checkpoint
ls -la training/output/checkpoint-*/
# Resume training from checkpoint
python training/scripts/finetune_granite.py \
--config training/configs/granite_soc_finetune.yaml \
--resume-from-checkpoint training/output/checkpoint-500/
Next Steps
- Fine-Tuning Methods — understand QLoRA, LoRA, DPO, and ORPO
- Model Comparison — compare Granite 4, Qwen 3, and Gemma 4
- Agent Model Selection — which model for which agent
- Evaluation & Export — evaluate and export your trained models