Configuration Reference

This page documents every configuration option for training, serving, and integrating Granite 4 models in AuroraSOC.

Training YAML Configuration

File: `training/configs/granite_soc_finetune.yaml`

model Section

Controls which base model is loaded and how.

```yaml
model:
  name: "unsloth/granite-4.0-h-tiny"
  max_seq_length: 4096
  load_in_4bit: true
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | string | `"unsloth/granite-4.0-h-tiny"` | HuggingFace model ID. Must be an Unsloth-optimized Granite 4 variant. |
| `max_seq_length` | int | `4096` | Maximum token sequence length. Longer sequences use more VRAM. Granite 4 supports up to 128K, but 4096 is optimal for training. |
| `load_in_4bit` | bool | `true` | Enable QLoRA 4-bit quantization. Reduces VRAM by ~4×. Disable only if you have ≥48 GB VRAM. |

Available models:

| Model ID | Parameters | VRAM (4-bit) | Quality |
| --- | --- | --- | --- |
| `unsloth/granite-4.0-micro` | ~1B | ~4 GB | Baseline |
| `unsloth/granite-4.0-h-micro` | ~1B | ~4 GB | Better (Hybrid) |
| `unsloth/granite-4.0-h-tiny` | ~2B | ~6 GB | Recommended |
| `unsloth/granite-4.0-h-small` | ~8B | ~12 GB | Best quality |

lora Section

Configures LoRA (Low-Rank Adaptation) parameters.

```yaml
lora:
  r: 64
  lora_alpha: 64
  lora_dropout: 0
  bias: "none"
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
    - "shared_mlp.input_linear"
    - "shared_mlp.output_linear"
  use_gradient_checkpointing: "unsloth"
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `r` | int | `64` | LoRA rank. Higher = more parameters = better capacity but slower training. Typical values: 16, 32, 64, 128. |
| `lora_alpha` | int | `64` | LoRA scaling factor. Usually set equal to `r`. Higher values make LoRA updates stronger relative to the base model. |
| `lora_dropout` | float | `0` | Dropout probability for LoRA layers. 0 is recommended by Unsloth for maximum training speed. |
| `bias` | string | `"none"` | Whether to train bias parameters. `"none"` keeps LoRA lightweight. Other options: `"all"`, `"lora_only"`. |
| `target_modules` | list | 9 modules | Which model layers get LoRA adapters. The 9-module list covers both Transformer attention and FFN (`q_proj` through `down_proj`) and Mamba SSM (`shared_mlp.*`). |
| `use_gradient_checkpointing` | string | `"unsloth"` | Memory optimization. `"unsloth"` uses Unsloth's optimized implementation (about 2× less VRAM than PyTorch native). |

Understanding r (rank):

| Rank | Trainable Params | Training Speed | Model Quality |
| --- | --- | --- | --- |
| 16 | ~5M | Fastest | Good for simple tasks |
| 32 | ~10M | Fast | Good balance |
| 64 | ~20M | Moderate | Best for SOC tasks |
| 128 | ~40M | Slow | Diminishing returns |
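The scaling in this table follows from how LoRA works: each adapted weight of shape `(d_in, d_out)` gains two low-rank factors of `r × (d_in + d_out)` parameters in total, so the trainable parameter count grows linearly in `r`. A minimal sketch, using hypothetical layer dimensions for illustration (not the actual Granite 4 sizes, which produce the table's figures):

```python
def lora_param_count(r: int, module_shapes: list[tuple[int, int]], num_layers: int) -> int:
    """Approximate LoRA trainable parameters.

    Each adapted weight of shape (d_in, d_out) gains two low-rank factors,
    A (d_in x r) and B (r x d_out), i.e. r * (d_in + d_out) extra trainable
    parameters per module per layer.
    """
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in module_shapes)
    return per_layer * num_layers

# Hypothetical shapes: 4 attention projections plus 3 FFN projections per layer.
shapes = [(2048, 2048)] * 4 + [(2048, 5632)] * 3

r32 = lora_param_count(32, shapes, num_layers=24)
r64 = lora_param_count(64, shapes, num_layers=24)
print(r64 // r32)  # 2: doubling the rank doubles the trainable parameters
```

This is why moving from `r: 64` to `r: 128` doubles trainable parameters but, per the table, yields diminishing quality returns.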

Target modules explained:

| Module | Architecture | Purpose |
| --- | --- | --- |
| `q_proj`, `k_proj`, `v_proj`, `o_proj` | Transformer attention | Query, Key, Value, and Output projections, the core attention mechanism |
| `gate_proj`, `up_proj`, `down_proj` | Transformer FFN | Feed-forward network gate and projections |
| `shared_mlp.input_linear` | Mamba SSM | Granite 4 Hybrid's shared MLP input; covers state-space model layers |
| `shared_mlp.output_linear` | Mamba SSM | Granite 4 Hybrid's shared MLP output |

training Section

Controls the training loop hyperparameters.

```yaml
training:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 4
  num_train_epochs: 3
  max_steps: -1
  learning_rate: 0.0002
  lr_scheduler_type: "cosine"
  warmup_ratio: 0.1
  weight_decay: 0.01
  bf16: true
  fp16: false
  optim: "adamw_8bit"
  logging_steps: 10
  save_steps: 100
  seed: 42
  output_dir: "training/output"
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `per_device_train_batch_size` | int | `2` | Samples per GPU per step. Higher = faster training but more VRAM. T4: 2; A100: 4-8. |
| `gradient_accumulation_steps` | int | `4` | Accumulate gradients over N steps before updating. Effective batch size = batch_size × grad_accum = 8. |
| `num_train_epochs` | int | `3` | Number of passes through the full dataset. 3 is a good starting point; monitor eval loss for overfitting. |
| `max_steps` | int | `-1` | Override epochs with a fixed step count. `-1` = use `num_train_epochs` instead. Useful for quick tests (e.g., 200). |
| `learning_rate` | float | `2e-4` | Peak learning rate. LoRA fine-tuning typically uses 1e-4 to 5e-4. |
| `lr_scheduler_type` | string | `"cosine"` | Learning rate schedule. `"cosine"` decays smoothly; `"linear"` decays linearly. Cosine is preferred for LoRA. |
| `warmup_ratio` | float | `0.1` | Fraction of training steps spent warming up the learning rate from 0. Prevents instability early in training. |
| `weight_decay` | float | `0.01` | L2 regularization. Prevents overfitting. Standard value for LLM fine-tuning. |
| `bf16` | bool | `true` | Use bfloat16 mixed precision. Reduces VRAM while maintaining numerical range. Requires an Ampere or newer GPU. |
| `fp16` | bool | `false` | Use float16 mixed precision. Use if bf16 is unsupported (pre-Ampere GPUs like the T4). |
| `optim` | string | `"adamw_8bit"` | Optimizer. `"adamw_8bit"` (bitsandbytes) uses ~33% less VRAM than standard AdamW. |
| `logging_steps` | int | `10` | Log training loss every N steps. |
| `save_steps` | int | `100` | Save a checkpoint every N steps. Enables `--resume` if training is interrupted. |
| `seed` | int | `42` | Random seed for reproducibility. |
| `output_dir` | string | `"training/output"` | Where to save checkpoints and exported models. |
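The batch-size and warmup fields interact: the optimizer sees one update per `per_device_train_batch_size × gradient_accumulation_steps` samples, and `warmup_ratio` is applied to the resulting total step count. A small sketch of that arithmetic (the 4,000-sample dataset size is hypothetical):

```python
import math

def training_steps(n_samples: int, batch_size: int, grad_accum: int,
                   epochs: int, warmup_ratio: float) -> tuple[int, int]:
    """Derive optimizer step counts from the training-section values."""
    effective_batch = batch_size * grad_accum            # 2 * 4 = 8 with the defaults
    steps_per_epoch = math.ceil(n_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = int(total_steps * warmup_ratio)       # LR ramps from 0 over these
    return total_steps, warmup_steps

total, warmup = training_steps(4000, batch_size=2, grad_accum=4,
                               epochs=3, warmup_ratio=0.1)
print(total, warmup)  # 1500 optimizer steps in total, the first 150 of them warmup
```

This is also the number `max_steps` overrides: setting `max_steps: 200` caps `total_steps` at 200 regardless of `num_train_epochs`.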

dataset Section

Controls dataset loading and preprocessing.

```yaml
dataset:
  train_file: "training/data/soc_train.jsonl"
  eval_file: "training/data/soc_eval.jsonl"
  train_on_completions: true
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `train_file` | string | `"training/data/soc_train.jsonl"` | Path to the training JSONL file. Generated by `prepare_datasets.py`. |
| `eval_file` | string | `"training/data/soc_eval.jsonl"` | Path to the evaluation JSONL file. Used for eval loss during training (separate from benchmark evaluation). |
| `train_on_completions` | bool | `true` | Enable response masking. Only compute loss on assistant responses, not system/user tokens. Critical for quality. |
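Conceptually, response masking replaces the label for every non-assistant token with the ignore index (`-100` in PyTorch cross-entropy), so system and user tokens contribute nothing to the loss. A minimal sketch with toy token IDs instead of a real tokenizer:

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def mask_labels(token_ids: list[int], assistant_mask: list[bool]) -> list[int]:
    """Keep labels only where the token belongs to an assistant response."""
    return [tid if is_assistant else IGNORE_INDEX
            for tid, is_assistant in zip(token_ids, assistant_mask)]

# Toy sequence: 3 prompt tokens (system/user) followed by 2 assistant tokens.
tokens = [101, 102, 103, 201, 202]
mask   = [False, False, False, True, True]
print(mask_labels(tokens, mask))  # [-100, -100, -100, 201, 202]
```

Without this masking, the model would spend capacity learning to reproduce its own prompts, which is why the option is marked critical for quality.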

agent_profiles Section

Defines per-agent specialist training configurations.

```yaml
agent_profiles:
  security_analyst:
    system_prompt: "You are the AuroraSOC Security Analyst..."
    dataset_filter: "alert_triage"
    model_override: null
    output_dir: "training/output/security_analyst"

  threat_hunter:
    system_prompt: "You are the AuroraSOC Threat Hunter..."
    dataset_filter: "threat_hunting"
    model_override: null
    output_dir: "training/output/threat_hunter"
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `system_prompt` | string | (varies) | System prompt injected into each training example. Teaches the model its persona. |
| `dataset_filter` | string | (varies) | Filter training data by domain. Only samples with a matching `domain` field are used. |
| `model_override` | string | `null` | Use a different base model for this agent. `null` = use the global `model.name`. |
| `output_dir` | string | (varies) | Where to save this agent's trained model. Pattern: `training/output/<agent_name>/`. |
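The `dataset_filter` behavior amounts to selecting JSONL records whose `domain` field equals the filter value. A sketch of that selection, operating on in-memory JSONL lines for illustration (the actual filtering lives in the training scripts):

```python
import json

def filter_by_domain(lines: list[str], domain: str) -> list[dict]:
    """Keep only JSONL records whose `domain` field matches the dataset_filter."""
    return [rec for rec in map(json.loads, lines) if rec.get("domain") == domain]

# Two example records; only the first matches the security_analyst profile.
rows = ['{"domain": "alert_triage", "text": "..."}',
        '{"domain": "threat_hunting", "text": "..."}']
print(len(filter_by_domain(rows, "alert_triage")))  # 1
```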

export Section

Controls model export after training.

```yaml
export:
  save_lora: true
  save_merged_16bit: false
  save_gguf: true
  gguf_quantization_methods:
    - "q8_0"
  push_to_hub: false
  hub_model_name: ""
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `save_lora` | bool | `true` | Save LoRA adapter weights. Lightweight (~50-200 MB). Required for `--resume` and `--export-only`. |
| `save_merged_16bit` | bool | `false` | Merge LoRA into the base model and save full FP16 weights. Requires full-model VRAM. Needed for vLLM serving. |
| `save_gguf` | bool | `true` | Export to GGUF format. Required for Ollama deployment. |
| `gguf_quantization_methods` | list | `["q8_0"]` | GGUF quantization methods. Options: `"q8_0"`, `"q4_k_m"`, `"q5_k_m"`, `"f16"`. |
| `push_to_hub` | bool | `false` | Push to the HuggingFace Hub after training. Requires the `HF_TOKEN` env var. |
| `hub_model_name` | string | `""` | HuggingFace repo name (e.g., `"yourname/granite-soc-finetuned"`). |

Environment Variables

Runtime Configuration (.env)

These variables control how AuroraSOC resolves and uses Granite models at runtime:

```bash
# Model Selection
GRANITE_MODEL_NAME=granite3.2:2b                # Base model for Ollama
GRANITE_MODEL_OVERRIDE=                         # Force all agents to use this model

# Fine-tuned Model Settings
GRANITE_USE_FINETUNED=false                     # Enable fine-tuned model resolution
GRANITE_USE_PER_AGENT_MODELS=false              # Enable per-agent model resolution
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest  # Tag for the generic fine-tuned model

# Serving Backend
GRANITE_SERVING_BACKEND=ollama                  # "ollama" or "vllm"
OLLAMA_HOST=http://localhost:11434              # Ollama server URL
VLLM_API_BASE=http://localhost:8000             # vLLM server URL
VLLM_MODEL_PATH=training/output/merged_fp16     # Path to FP16 model for vLLM
```

| Variable | Values | Description |
| --- | --- | --- |
| `GRANITE_MODEL_NAME` | Model name/tag | Base Granite model pulled from the Ollama registry |
| `GRANITE_MODEL_OVERRIDE` | Model name or empty | Forces ALL agents to use this model, bypassing all resolution logic |
| `GRANITE_USE_FINETUNED` | `true` / `false` | Enable the fine-tuned model as a fallback when a per-agent model isn't available |
| `GRANITE_USE_PER_AGENT_MODELS` | `true` / `false` | Enable per-agent model lookup via `AGENT_MODEL_MAP` |
| `GRANITE_FINETUNED_MODEL_TAG` | Ollama tag | Tag used for the generic (non-per-agent) fine-tuned model |
| `GRANITE_SERVING_BACKEND` | `ollama` / `vllm` | Which serving backend to use |
| `OLLAMA_HOST` | URL | Ollama server endpoint |
| `VLLM_API_BASE` | URL | vLLM OpenAI-compatible API endpoint |
| `VLLM_MODEL_PATH` | Path | Path to the merged FP16 model directory for vLLM |
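Taken together, these variables imply a resolution order: an explicit override wins, then the per-agent lookup, then the generic fine-tuned tag, then the base model. A sketch of that precedence (illustrative only, not the actual AuroraSOC implementation):

```python
def resolve_model(agent: str, env: dict, agent_model_map: dict) -> str:
    """Resolution order implied by the table above:
    1. GRANITE_MODEL_OVERRIDE, if set, wins unconditionally.
    2. Per-agent lookup, if GRANITE_USE_PER_AGENT_MODELS is enabled.
    3. The generic fine-tuned tag, if GRANITE_USE_FINETUNED is enabled.
    4. Otherwise, the base model.
    """
    if env.get("GRANITE_MODEL_OVERRIDE"):
        return env["GRANITE_MODEL_OVERRIDE"]
    if env.get("GRANITE_USE_PER_AGENT_MODELS") == "true" and agent in agent_model_map:
        return agent_model_map[agent]
    if env.get("GRANITE_USE_FINETUNED") == "true":
        return env.get("GRANITE_FINETUNED_MODEL_TAG", "granite-soc:latest")
    return env.get("GRANITE_MODEL_NAME", "granite3.2:2b")

env = {"GRANITE_USE_FINETUNED": "true", "GRANITE_MODEL_NAME": "granite3.2:2b"}
print(resolve_model("security_analyst", env, {}))  # granite-soc:latest
```

Note that step 2 falls through to step 3 when an agent has no entry in the map, matching the "fallback" behavior described for `GRANITE_USE_FINETUNED`.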

Training Environment Variables

These variables are used during the training process:

| Variable | Default | Description |
| --- | --- | --- |
| `AGENT_NAME` | (none) | Agent profile name for Docker-based per-agent training |
| `HF_TOKEN` | (none) | HuggingFace token for `push_to_hub` and gated model downloads |
| `WANDB_API_KEY` | (none) | Weights & Biases API key for experiment tracking |
| `CUDA_VISIBLE_DEVICES` | `0` | Which GPU(s) to use. `0,1` for multi-GPU. |

Makefile Targets Reference

Training Targets

| Target | Command | Description |
| --- | --- | --- |
| `make train-install` | `pip install -e .[training]` | Install training dependencies |
| `make train-data` | `python training/scripts/prepare_datasets.py` | Download and prepare training datasets |
| `make train` | `python training/scripts/finetune_granite.py` | Train the generic SOC model |
| `make train-agent AGENT=X` | `finetune_granite.py --agent X` | Train a per-agent specialist |
| `make train-all-agents` | `python training/scripts/train_all_agents.py` | Train all agent profiles sequentially |
| `make train-eval` | `python training/scripts/evaluate_model.py` | Evaluate the trained model |
| `make train-serve-ollama` | `serve_model.py ollama` | Import GGUF into Ollama |
| `make train-serve-vllm` | `serve_model.py vllm` | Start the vLLM server |

Docker Training Targets

| Target | Command | Description |
| --- | --- | --- |
| `make train-docker-data` | `docker compose ... run prepare-data` | Prepare data in Docker |
| `make train-docker` | `docker compose ... run training` | Train in Docker |
| `make train-docker-agent AGENT=X` | `docker compose ... run training-agent` | Per-agent training in Docker |
| `make train-docker-eval` | `docker compose ... run eval` | Evaluate in Docker |

Local Setup Targets

| Target | Command | Description |
| --- | --- | --- |
| `make setup-local` | `./scripts/setup_local.sh` | Full local environment setup |
| `make enable-finetuned` | Sets `.env` vars | Enable fine-tuned models in AuroraSOC |
| `make disable-finetuned` | Unsets `.env` vars | Revert to base Granite models |
| `make ollama-pull-granite` | `ollama pull granite3.2:2b` | Pull the base model from Ollama |

Docker Compose Services

docker-compose.training.yml

| Service | Purpose | GPU Required | Volumes |
| --- | --- | --- | --- |
| `prepare-data` | Download + process dataset | No | `./training/data:/app/training/data` |
| `training` | Generic model training | Yes | `./training:/app/training`, HF cache |
| `training-agent` | Per-agent training | Yes | Same as `training` |
| `eval` | Model evaluation | Yes | `./training:/app/training` |
| `vllm` | Production serving (FP16) | Yes | `./training/output:/models` |
| `ollama-import` | GGUF → Ollama | Depends | `./training/output:/models`, Ollama data |

docker-compose.yml (x-granite-env Anchor)

The main compose file uses a YAML anchor to inject Granite environment variables into all agent services:

```yaml
x-granite-env: &granite-env
  GRANITE_MODEL_NAME: ${GRANITE_MODEL_NAME:-granite3.2:2b}
  GRANITE_MODEL_OVERRIDE: ${GRANITE_MODEL_OVERRIDE:-}
  GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
  GRANITE_USE_PER_AGENT_MODELS: ${GRANITE_USE_PER_AGENT_MODELS:-false}
  GRANITE_FINETUNED_MODEL_TAG: ${GRANITE_FINETUNED_MODEL_TAG:-granite-soc:latest}
  GRANITE_SERVING_BACKEND: ${GRANITE_SERVING_BACKEND:-ollama}
  OLLAMA_HOST: ${OLLAMA_HOST:-http://ollama:11434}
```

Full Configuration Example

A complete .env file for a production deployment with per-agent fine-tuned models:

```bash
# --- Granite Model Configuration ---
GRANITE_MODEL_NAME=granite3.2:2b
GRANITE_USE_FINETUNED=true
GRANITE_USE_PER_AGENT_MODELS=true
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest
GRANITE_SERVING_BACKEND=ollama
OLLAMA_HOST=http://ollama:11434

# --- Application Configuration ---
SECRET_KEY=your-secret-key-here
DATABASE_URL=postgresql+asyncpg://aurora:aurora@postgres:5432/aurorasoc
NATS_URL=nats://nats:4222
MQTT_BROKER=mosquitto

# --- Monitoring ---
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```

Next Steps