Prerequisites & System Requirements

Before training Granite models for AuroraSOC, ensure your environment meets the hardware and software requirements below. The requirements vary depending on whether you train locally, in Docker, or on Google Colab.

Hardware Requirements

Local GPU Training

| Component | Minimum | Recommended | Notes |
| --- | --- | --- | --- |
| GPU | NVIDIA with 8 GB VRAM | NVIDIA with 16+ GB VRAM | Must be CUDA-capable (compute capability ≥ 7.0) |
| GPU Models | RTX 3060 (12 GB), T4 (16 GB) | RTX 4090 (24 GB), A100 (40/80 GB) | Consumer GPUs work fine for micro/tiny models |
| RAM | 16 GB | 32 GB | Dataset processing is memory-intensive |
| Disk | 30 GB free | 50+ GB free | Model checkpoints + datasets |
| CPU | 4 cores | 8+ cores | Data loading is CPU-bound |

VRAM Requirements by Model Variant

Approximate VRAM requirements per model variant; the training figures assume 4-bit or 8-bit quantization (QLoRA) with Unsloth optimizations:

| Model | Training VRAM (4-bit) | Training VRAM (8-bit) | Inference VRAM (GGUF Q8_0) |
| --- | --- | --- | --- |
| granite-4.0-micro | ~4 GB | ~6 GB | ~1.5 GB |
| granite-4.0-h-micro | ~6 GB | ~8 GB | ~2 GB |
| granite-4.0-h-tiny | ~8 GB | ~12 GB | ~3 GB |
| granite-4.0-h-small | ~14 GB | ~20 GB | ~5 GB |

**Free GPU Option**

If you don't have a local GPU, use Google Colab's free T4 (16 GB VRAM). See the Colab Training guide for details.
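As a quick sanity check, the VRAM table above can be encoded in a few lines of Python to see which variants fit a given GPU. The helper below is illustrative only (the function and dictionary names are not part of the training pipeline); the figures mirror the 4-bit training column.

```python
# Hypothetical helper: which Granite variants fit a given VRAM budget?
# Values mirror the approximate 4-bit training column of the table above.
TRAINING_VRAM_4BIT_GB = {
    "granite-4.0-micro": 4,
    "granite-4.0-h-micro": 6,
    "granite-4.0-h-tiny": 8,
    "granite-4.0-h-small": 14,
}

def variants_that_fit(vram_gb: float) -> list[str]:
    """Return model variants whose approximate 4-bit training VRAM fits."""
    return [m for m, need in TRAINING_VRAM_4BIT_GB.items() if need <= vram_gb]

print(variants_that_fit(16))  # a free Colab T4 can train all four variants
```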

Docker Training

Same GPU requirements as local training, plus:

  • Docker ≥ 24.0
  • NVIDIA Container Toolkit (nvidia-docker2 or nvidia-container-toolkit)
  • Docker Compose ≥ 2.20

Google Colab

No local hardware needed. Colab provides:

  • Free tier: T4 GPU (16 GB VRAM), 12 GB RAM, ~78 GB disk
  • Colab Pro: A100 (40 GB VRAM), 50+ GB RAM
  • Colab Pro+: A100 (80 GB VRAM), persistent sessions

Software Requirements

Local Training (Bare Metal)

# Required
Python >= 3.11
CUDA >= 12.1 (with matching cuDNN)
Git

# Install via pip/uv
pip install unsloth torch transformers trl datasets pyyaml httpx

The full Python dependencies are listed in pyproject.toml under the training group of [project.optional-dependencies]:

# Install all training dependencies
pip install -e ".[training]"
# Or using uv (faster)
uv pip install -e ".[training]"
# Or via Make
make train-install

Verifying CUDA

Before training, verify your CUDA setup:

# Check NVIDIA driver
nvidia-smi

# Check CUDA version
nvcc --version

# Check PyTorch sees the GPU
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
python -c "import torch; print(f'GPU: {torch.cuda.get_device_name(0)}')"
python -c "import torch; print(f'VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')"

Expected output:

CUDA available: True
GPU: NVIDIA GeForce RTX 4090
VRAM: 24.0 GB
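The three one-liners above can also be combined into a single check. This is a sketch (the cuda_report function is illustrative, not part of the pipeline) that degrades gracefully when PyTorch or a GPU is missing instead of raising:

```python
# Consolidated CUDA check (sketch). Reports a reason instead of crashing
# when PyTorch isn't installed or no GPU is visible.
def cuda_report() -> dict:
    try:
        import torch
    except ImportError:
        return {"cuda_available": False, "reason": "torch not installed"}
    if not torch.cuda.is_available():
        return {"cuda_available": False, "reason": "no CUDA device visible"}
    props = torch.cuda.get_device_properties(0)
    return {
        "cuda_available": True,
        "gpu": torch.cuda.get_device_name(0),
        "vram_gb": round(props.total_memory / 1e9, 1),
    }

print(cuda_report())
```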

Verifying Unsloth

python -c "from unsloth import FastLanguageModel; print('Unsloth OK')"

If this fails with a Mamba-related error, install the Mamba kernels:

pip install --no-build-isolation mamba_ssm==2.2.5 causal_conv1d==1.5.2

**Mamba Kernel Compilation**

The first time Mamba kernels are installed, they compile from source. This takes ~10 minutes and requires a working CUDA toolkit. On Colab, this happens automatically in the first notebook cell.

Docker Training

# Verify Docker + NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

If this fails, install the NVIDIA Container Toolkit:

# Ubuntu/Debian
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Ollama (For Deployment)

After training, models are served via Ollama. Install it if you haven't already:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version

# Pull base Granite model (for comparison/fallback)
ollama pull granite4:8b

Or use the automated setup script:

./scripts/setup_local.sh

Directory Structure

The training pipeline uses this directory layout:

training/
├── configs/
│   ├── granite_soc_finetune.yaml          # Main training config
│   └── Modelfile.granite-soc              # Ollama Modelfile template
├── data/
│   ├── raw/                               # Downloaded source data
│   │   ├── mitre_attack/
│   │   ├── mitre_car/
│   │   ├── sigma_rules/
│   │   └── atomic_red_team/
│   ├── domain/                            # Per-agent domain splits
│   │   ├── security_analysis.jsonl
│   │   ├── threat_hunting.jsonl
│   │   └── ...
│   ├── soc_train.jsonl                    # Combined training data
│   └── soc_eval.jsonl                     # Evaluation data
├── checkpoints/                           # Training outputs
│   ├── granite_soc_lora/                  # LoRA adapters
│   ├── granite_soc_merged_16bit/          # Merged FP16 (for vLLM)
│   └── granite_soc_gguf/                  # GGUF files (for Ollama)
├── notebooks/
│   └── AuroraSOC_Granite4_Finetune.ipynb  # Colab notebook
└── scripts/
    ├── prepare_datasets.py                # Data preparation
    ├── finetune_granite.py                # Training pipeline
    ├── evaluate_model.py                  # Model evaluation
    ├── serve_model.py                     # Model deployment
    └── train_all_agents.py                # Batch per-agent training
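A small preflight script can confirm this layout exists before kicking off a run. The snippet below is a hypothetical example (neither the script nor the EXPECTED list is part of the repository); the paths mirror the tree above, and the root defaults to training/.

```python
from pathlib import Path

# Hypothetical preflight check: confirm the expected training layout exists.
# Paths mirror the directory tree above; adjust root if your checkout differs.
EXPECTED = [
    "configs/granite_soc_finetune.yaml",
    "data/raw",
    "data/domain",
    "checkpoints",
    "scripts/prepare_datasets.py",
    "scripts/finetune_granite.py",
]

def missing_paths(root: str = "training") -> list[str]:
    """Return the expected paths that are absent under root."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]

if __name__ == "__main__":
    missing = missing_paths()
    print("Layout OK" if not missing else f"Missing: {missing}")
```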

Environment Variables

Training scripts use these environment variables (all optional with sensible defaults):

| Variable | Default | Description |
| --- | --- | --- |
| GRANITE_MODEL_NAME | unsloth/granite-4.0-h-tiny | Base model for training |
| GRANITE_LORA_R | 64 | LoRA rank |
| GRANITE_LORA_ALPHA | 64 | LoRA alpha scaling |
| GRANITE_BATCH_SIZE | 2 | Per-device batch size |
| GRANITE_GRAD_ACCUM | 4 | Gradient accumulation steps |
| GRANITE_EPOCHS | 3 | Number of training epochs |
| GRANITE_LR | 2e-4 | Learning rate |
| GRANITE_MAX_SEQ_LENGTH | 4096 | Maximum sequence length |
| HF_TOKEN | (none) | Hugging Face token (only needed for Hub push) |

These can also be set in the YAML config file (training/configs/granite_soc_finetune.yaml), which takes precedence.
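That precedence (YAML config over environment variable over built-in default) can be sketched in a few lines. The resolve helper and DEFAULTS dict below are illustrative, not the pipeline's actual code, and the YAML file is represented as an already-parsed dict:

```python
import os

# Sketch of the precedence described above: YAML > env var > default.
# Key names match the environment-variable table; values are illustrative.
DEFAULTS = {
    "GRANITE_LORA_R": "64",
    "GRANITE_EPOCHS": "3",
    "GRANITE_LR": "2e-4",
}

def resolve(key: str, yaml_cfg: dict) -> str:
    """Resolve a setting: YAML config wins, then the environment, then the default."""
    if key in yaml_cfg:
        return str(yaml_cfg[key])
    return os.environ.get(key, DEFAULTS[key])

os.environ["GRANITE_EPOCHS"] = "5"
print(resolve("GRANITE_EPOCHS", {}))                     # env wins: 5
print(resolve("GRANITE_EPOCHS", {"GRANITE_EPOCHS": 2}))  # YAML wins: 2
print(resolve("GRANITE_LR", {}))                         # default: 2e-4
```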

Next Steps

Once your environment is ready:

  1. Dataset Preparation — Download and process SOC training data
  2. Local GPU Training — Start training on your hardware