Configuration Reference
This page documents every configuration option for training, serving, and integrating Granite 4 models in AuroraSOC.
Training YAML Configuration
File: training/configs/granite_soc_finetune.yaml
model Section
Controls which base model is loaded and how.
```yaml
model:
  name: "unsloth/granite-4.0-h-tiny"
  max_seq_length: 4096
  load_in_4bit: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| name | string | "unsloth/granite-4.0-h-tiny" | HuggingFace model ID. Must be an Unsloth-optimized Granite 4 variant. |
| max_seq_length | int | 4096 | Maximum token sequence length. Longer sequences use more VRAM. Granite 4 supports up to 128K, but 4096 is optimal for training. |
| load_in_4bit | bool | true | Enable QLoRA 4-bit quantization. Reduces VRAM by ~4×. Disable only if you have ≥48 GB VRAM. |
Available models:
| Model ID | Parameters | VRAM (4-bit) | Quality |
|---|---|---|---|
| unsloth/granite-4.0-micro | ~1B | ~4 GB | Baseline |
| unsloth/granite-4.0-h-micro | ~1B | ~4 GB | Better (Hybrid) |
| unsloth/granite-4.0-h-tiny | ~2B | ~6 GB | Recommended |
| unsloth/granite-4.0-h-small | ~8B | ~12 GB | Best quality |
lora Section
Configures LoRA (Low-Rank Adaptation) parameters.
```yaml
lora:
  r: 64
  lora_alpha: 64
  lora_dropout: 0
  bias: "none"
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
    - "shared_mlp.input_linear"
    - "shared_mlp.output_linear"
  use_gradient_checkpointing: "unsloth"
```
| Field | Type | Default | Description |
|---|---|---|---|
| r | int | 64 | LoRA rank. Higher = more parameters = more capacity, but slower training. Typical values: 16, 32, 64, 128. |
| lora_alpha | int | 64 | LoRA scaling factor. Usually set equal to r. Higher values make LoRA updates stronger relative to the base model. |
| lora_dropout | float | 0 | Dropout probability for LoRA layers. 0 is recommended by Unsloth for maximum training speed. |
| bias | string | "none" | Whether to train bias parameters. "none" keeps LoRA lightweight. Other options: "all", "lora_only". |
| target_modules | list | 9 modules | Which model layers get LoRA adapters. The 9-module list covers the Transformer attention projections (q_proj, k_proj, v_proj, o_proj), the Transformer FFN projections (gate_proj, up_proj, down_proj), and the Mamba SSM layers (shared_mlp.*). |
| use_gradient_checkpointing | string | "unsloth" | Memory optimization. "unsloth" uses Unsloth's optimized implementation (2× less VRAM than PyTorch native). |
Understanding r (rank):
| Rank | Trainable Params | Training Speed | Model Quality |
|---|---|---|---|
| 16 | ~5M | Fastest | Good for simple tasks |
| 32 | ~10M | Fast | Good balance |
| 64 | ~20M | Moderate | Best for SOC tasks |
| 128 | ~40M | Slow | Diminishing returns |
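Because each adapted weight matrix of shape (d_out, d_in) gains two low-rank factors, A (r × d_in) and B (d_out × r), the trainable parameter count scales linearly with r and can be estimated directly. A minimal sketch (the layer shapes below are illustrative placeholders, not Granite 4's actual dimensions):

```python
def lora_param_count(r: int, module_shapes: list[tuple[int, int]]) -> int:
    """Estimate trainable LoRA parameters: each adapted (d_out, d_in)
    matrix adds r * (d_in + d_out) parameters from its A and B factors."""
    return sum(r * (d_in + d_out) for d_out, d_in in module_shapes)

# Hypothetical shapes for one decoder layer: four square attention
# projections plus up/gate/down FFN projections (illustrative only).
shapes = [(2048, 2048)] * 4 + [(5632, 2048)] * 2 + [(2048, 5632)]
print(lora_param_count(64, shapes))   # per-layer count at r=64
print(lora_param_count(128, shapes))  # doubling r doubles the count
```

Multiply the per-layer figure by the number of adapted layers to reach totals in the ~5M-40M range shown above.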
Target modules explained:
| Module | Architecture | Purpose |
|---|---|---|
| q_proj, k_proj, v_proj, o_proj | Transformer Attention | Query, Key, Value, and Output projections — core attention mechanism |
| gate_proj, up_proj, down_proj | Transformer FFN | Feed-forward network gate and projections |
| shared_mlp.input_linear | Mamba SSM | Granite 4 Hybrid's shared MLP input — covers state-space model layers |
| shared_mlp.output_linear | Mamba SSM | Granite 4 Hybrid's shared MLP output |
training Section
Controls the training loop hyperparameters.
```yaml
training:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 4
  num_train_epochs: 3
  max_steps: -1
  learning_rate: 0.0002
  lr_scheduler_type: "cosine"
  warmup_ratio: 0.1
  weight_decay: 0.01
  bf16: true
  fp16: false
  optim: "adamw_8bit"
  logging_steps: 10
  save_steps: 100
  seed: 42
  output_dir: "training/output"
```
| Field | Type | Default | Description |
|---|---|---|---|
| per_device_train_batch_size | int | 2 | Samples per GPU per step. Higher = faster training but more VRAM. T4: 2, A100: 4-8. |
| gradient_accumulation_steps | int | 4 | Accumulate gradients over N steps before updating. Effective batch size = batch_size × grad_accum = 8. |
| num_train_epochs | int | 3 | Number of passes through the full dataset. 3 is a good starting point; monitor eval loss for overfitting. |
| max_steps | int | -1 | Override epochs with a fixed step count. -1 = use num_train_epochs instead. Useful for quick tests (e.g., 200). |
| learning_rate | float | 2e-4 | Peak learning rate. LoRA fine-tuning typically uses 1e-4 to 5e-4. |
| lr_scheduler_type | string | "cosine" | Learning rate schedule. "cosine" decays smoothly; "linear" decays linearly. Cosine is preferred for LoRA. |
| warmup_ratio | float | 0.1 | Fraction of training steps spent warming up the learning rate from 0. Prevents instability in early training. |
| weight_decay | float | 0.01 | L2 regularization. Prevents overfitting. Standard value for LLM fine-tuning. |
| bf16 | bool | true | Use bfloat16 mixed precision. Reduces VRAM, maintains numerical range. Requires Ampere+ GPU. |
| fp16 | bool | false | Use float16 mixed precision. Use if bf16 is unsupported (pre-Ampere GPUs like T4). |
| optim | string | "adamw_8bit" | Optimizer. "adamw_8bit" (bitsandbytes) uses ~33% less VRAM than standard AdamW. |
| logging_steps | int | 10 | Log training loss every N steps. |
| save_steps | int | 100 | Save checkpoint every N steps. Enables --resume if training is interrupted. |
| seed | int | 42 | Random seed for reproducibility. |
| output_dir | string | "training/output" | Where to save checkpoints and exported models. |
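The derived quantities in this table (effective batch size, total steps, warmup steps) follow directly from the config values. A small sketch of the arithmetic, assuming a hypothetical dataset size:

```python
import math

def training_schedule(num_samples: int,
                      per_device_train_batch_size: int = 2,
                      gradient_accumulation_steps: int = 4,
                      num_train_epochs: int = 3,
                      warmup_ratio: float = 0.1) -> dict:
    """Derive the schedule quantities described in the table above."""
    # Gradients accumulate over N micro-batches before each optimizer step.
    effective_batch = per_device_train_batch_size * gradient_accumulation_steps
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * num_train_epochs
    # warmup_ratio is a fraction of total optimizer steps, not epochs.
    warmup_steps = int(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

# e.g. a hypothetical 8,000-sample dataset with the defaults above:
print(training_schedule(8000))
# {'effective_batch_size': 8, 'steps_per_epoch': 1000,
#  'total_steps': 3000, 'warmup_steps': 300}
```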
dataset Section
Controls dataset loading and preprocessing.
```yaml
dataset:
  train_file: "training/data/soc_train.jsonl"
  eval_file: "training/data/soc_eval.jsonl"
  train_on_completions: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| train_file | string | "training/data/soc_train.jsonl" | Path to training JSONL file. Generated by prepare_datasets.py. |
| eval_file | string | "training/data/soc_eval.jsonl" | Path to evaluation JSONL file. Used for eval loss during training (separate from benchmark evaluation). |
| train_on_completions | bool | true | Enable response masking. Only compute loss on assistant responses, not system/user tokens. Critical for quality. |
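Conceptually, response masking means the labels for every token outside an assistant response are replaced with the ignore index, so those positions contribute nothing to the loss. A minimal illustration of the idea (not the trainer's actual implementation; token IDs and roles are made up):

```python
IGNORE_INDEX = -100  # HuggingFace cross-entropy skips labels with this value

def mask_non_completions(input_ids: list[int], roles: list[str]) -> list[int]:
    """Copy input_ids into labels, replacing every token that is not part
    of an assistant response with IGNORE_INDEX."""
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(input_ids, roles)]

tokens = [101, 102, 103, 104, 105]
roles  = ["system", "user", "user", "assistant", "assistant"]
print(mask_non_completions(tokens, roles))  # [-100, -100, -100, 104, 105]
```

With masking off, the model also learns to predict the system and user text, which wastes capacity and can degrade instruction-following quality.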
agent_profiles Section
Defines per-agent specialist training configurations.
```yaml
agent_profiles:
  security_analyst:
    system_prompt: "You are the AuroraSOC Security Analyst..."
    dataset_filter: "alert_triage"
    model_override: null
    output_dir: "training/output/security_analyst"
  threat_hunter:
    system_prompt: "You are the AuroraSOC Threat Hunter..."
    dataset_filter: "threat_hunting"
    model_override: null
    output_dir: "training/output/threat_hunter"
```
| Field | Type | Default | Description |
|---|---|---|---|
| system_prompt | string | (varies) | System prompt injected into each training example. Teaches the model its persona. |
| dataset_filter | string | (varies) | Filter training data by domain. Only samples with matching domain field are used. |
| model_override | string | null | Use a different base model for this agent. null = use the global model.name. |
| output_dir | string | (varies) | Where to save this agent's trained model. Pattern: training/output/<agent_name>/. |
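The dataset_filter behavior amounts to selecting JSONL samples whose domain field matches the profile. A sketch of that selection step (the actual training scripts may implement it differently):

```python
import json

def filter_by_domain(jsonl_path: str, dataset_filter: str) -> list[dict]:
    """Keep only JSONL samples whose 'domain' field matches the agent
    profile's dataset_filter."""
    samples = []
    with open(jsonl_path) as f:
        for line in f:
            sample = json.loads(line)
            if sample.get("domain") == dataset_filter:
                samples.append(sample)
    return samples

# e.g. for the security_analyst profile:
# triage_samples = filter_by_domain("training/data/soc_train.jsonl", "alert_triage")
```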
export Section
Controls model export after training.
```yaml
export:
  save_lora: true
  save_merged_16bit: false
  save_gguf: true
  gguf_quantization_methods:
    - "q8_0"
  push_to_hub: false
  hub_model_name: ""
```
| Field | Type | Default | Description |
|---|---|---|---|
| save_lora | bool | true | Save LoRA adapter weights. Lightweight (~50-200 MB). Required for --resume and --export-only. |
| save_merged_16bit | bool | false | Merge LoRA into base model and save full FP16 weights. Requires full model VRAM. Needed for vLLM serving. |
| save_gguf | bool | true | Export to GGUF format. Required for Ollama deployment. |
| gguf_quantization_methods | list | ["q8_0"] | GGUF quantization methods. Options: "q8_0", "q4_k_m", "q5_k_m", "f16". |
| push_to_hub | bool | false | Push to HuggingFace Hub after training. Requires HF_TOKEN env var. |
| hub_model_name | string | "" | HuggingFace repo name (e.g., "yourname/granite-soc-finetuned"). |
Environment Variables
Runtime Configuration (.env)
These variables control how AuroraSOC resolves and uses Granite models at runtime:
```bash
# Model Selection
GRANITE_MODEL_NAME=granite3.2:2b                # Base model for Ollama
GRANITE_MODEL_OVERRIDE=                         # Force all agents to use this model

# Fine-tuned Model Settings
GRANITE_USE_FINETUNED=false                     # Enable fine-tuned model resolution
GRANITE_USE_PER_AGENT_MODELS=false              # Enable per-agent model resolution
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest  # Tag for the generic fine-tuned model

# Serving Backend
GRANITE_SERVING_BACKEND=ollama                  # "ollama" or "vllm"
OLLAMA_HOST=http://localhost:11434              # Ollama server URL
VLLM_API_BASE=http://localhost:8000             # vLLM server URL
VLLM_MODEL_PATH=training/output/merged_fp16     # Path to FP16 model for vLLM
```
| Variable | Values | Description |
|---|---|---|
| GRANITE_MODEL_NAME | Model name/tag | Base Granite model pulled from Ollama registry |
| GRANITE_MODEL_OVERRIDE | Model name or empty | Forces ALL agents to use this model, bypassing all resolution logic |
| GRANITE_USE_FINETUNED | true / false | Enable fine-tuned model as fallback when per-agent model isn't available |
| GRANITE_USE_PER_AGENT_MODELS | true / false | Enable per-agent model lookup via AGENT_MODEL_MAP |
| GRANITE_FINETUNED_MODEL_TAG | Ollama tag | Tag used for the generic (non-per-agent) fine-tuned model |
| GRANITE_SERVING_BACKEND | ollama / vllm | Which serving backend to use |
| OLLAMA_HOST | URL | Ollama server endpoint |
| VLLM_API_BASE | URL | vLLM OpenAI-compatible API endpoint |
| VLLM_MODEL_PATH | Path | Path to merged FP16 model directory for vLLM |
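Taken together, these variables define a resolution order: override first, then per-agent lookup, then the generic fine-tuned tag, then the base model. A sketch of that documented precedence (the runtime's actual implementation may differ in detail):

```python
import os

def resolve_model(agent_name: str, agent_model_map: dict[str, str]) -> str:
    """Resolve which model an agent should use, following the documented
    precedence: override > per-agent map > fine-tuned tag > base model."""
    override = os.environ.get("GRANITE_MODEL_OVERRIDE", "")
    if override:  # bypasses all other resolution logic
        return override
    if os.environ.get("GRANITE_USE_PER_AGENT_MODELS", "false") == "true" \
            and agent_name in agent_model_map:
        return agent_model_map[agent_name]
    if os.environ.get("GRANITE_USE_FINETUNED", "false") == "true":
        return os.environ.get("GRANITE_FINETUNED_MODEL_TAG", "granite-soc:latest")
    return os.environ.get("GRANITE_MODEL_NAME", "granite3.2:2b")
```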
Training Environment Variables
These variables are used during the training process:
| Variable | Default | Description |
|---|---|---|
| AGENT_NAME | (none) | Agent profile name for Docker-based per-agent training |
| HF_TOKEN | (none) | HuggingFace token for push_to_hub and gated model downloads |
| WANDB_API_KEY | (none) | Weights & Biases API key for experiment tracking |
| CUDA_VISIBLE_DEVICES | 0 | Which GPU(s) to use. 0,1 for multi-GPU. |
Makefile Targets Reference
Training Targets
| Target | Command | Description |
|---|---|---|
| make train-install | pip install -e .[training] | Install training dependencies |
| make train-data | python training/scripts/prepare_datasets.py | Download and prepare training datasets |
| make train | python training/scripts/finetune_granite.py | Train generic SOC model |
| make train-agent AGENT=X | finetune_granite.py --agent X | Train per-agent specialist |
| make train-all-agents | python training/scripts/train_all_agents.py | Train all agent profiles sequentially |
| make train-eval | python training/scripts/evaluate_model.py | Evaluate trained model |
| make train-serve-ollama | serve_model.py ollama | Import GGUF to Ollama |
| make train-serve-vllm | serve_model.py vllm | Start vLLM server |
Docker Training Targets
| Target | Command | Description |
|---|---|---|
| make train-docker-data | docker compose ... run prepare-data | Prepare data in Docker |
| make train-docker | docker compose ... run training | Train in Docker |
| make train-docker-agent AGENT=X | docker compose ... run training-agent | Per-agent training in Docker |
| make train-docker-eval | docker compose ... run eval | Evaluate in Docker |
Local Setup Targets
| Target | Command | Description |
|---|---|---|
| make setup-local | ./scripts/setup_local.sh | Full local environment setup |
| make enable-finetuned | Sets .env vars | Enable fine-tuned models in AuroraSOC |
| make disable-finetuned | Unsets .env vars | Revert to base Granite models |
| make ollama-pull-granite | ollama pull granite3.2:2b | Pull base model from Ollama |
Docker Compose Services
docker-compose.training.yml
| Service | Purpose | GPU Required | Volumes |
|---|---|---|---|
| prepare-data | Download + process dataset | No | ./training/data:/app/training/data |
| training | Generic model training | Yes | ./training:/app/training, HF cache |
| training-agent | Per-agent training | Yes | Same as training |
| eval | Model evaluation | Yes | ./training:/app/training |
| vllm | Production serving (FP16) | Yes | ./training/output:/models |
| ollama-import | GGUF → Ollama | Depends | ./training/output:/models, Ollama data |
docker-compose.yml (x-granite-env Anchor)
The main compose file uses a YAML anchor to inject Granite environment variables into all agent services:
```yaml
x-granite-env: &granite-env
  GRANITE_MODEL_NAME: ${GRANITE_MODEL_NAME:-granite3.2:2b}
  GRANITE_MODEL_OVERRIDE: ${GRANITE_MODEL_OVERRIDE:-}
  GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
  GRANITE_USE_PER_AGENT_MODELS: ${GRANITE_USE_PER_AGENT_MODELS:-false}
  GRANITE_FINETUNED_MODEL_TAG: ${GRANITE_FINETUNED_MODEL_TAG:-granite-soc:latest}
  GRANITE_SERVING_BACKEND: ${GRANITE_SERVING_BACKEND:-ollama}
  OLLAMA_HOST: ${OLLAMA_HOST:-http://ollama:11434}
```
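Services consume the anchor with a YAML merge key under environment. A hypothetical agent service (the service name and image are placeholders, not taken from the actual compose file):

```yaml
services:
  security-analyst:            # hypothetical service name
    image: aurorasoc/agent     # placeholder image
    environment:
      <<: *granite-env         # inject all Granite variables defined above
      AGENT_NAME: security_analyst
```

Because the defaults live in the anchor, changing a variable in .env propagates to every agent service without editing each one.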
Full Configuration Example
A complete .env file for a production deployment with per-agent fine-tuned models:
```bash
# --- Granite Model Configuration ---
GRANITE_MODEL_NAME=granite3.2:2b
GRANITE_USE_FINETUNED=true
GRANITE_USE_PER_AGENT_MODELS=true
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest
GRANITE_SERVING_BACKEND=ollama
OLLAMA_HOST=http://ollama:11434

# --- Application Configuration ---
SECRET_KEY=your-secret-key-here
DATABASE_URL=postgresql+asyncpg://aurora:aurora@postgres:5432/aurorasoc
NATS_URL=nats://nats:4222
MQTT_BROKER=mosquitto

# --- Monitoring ---
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```
Next Steps
- LLM Integration: Architecture — how models connect to the agent framework
- LLM Integration: Model Swap — switch between base and fine-tuned models