Configuration Reference
This page documents every configuration option for training, serving, and integrating Granite 4 models in AuroraSOC.
Training YAML Configuration
File: training/configs/granite_soc_finetune.yaml
model Section
Controls which base model is loaded and how.
```yaml
model:
  name: "unsloth/granite-4.0-h-tiny"
  max_seq_length: 4096
  load_in_4bit: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| name | string | "unsloth/granite-4.0-h-tiny" | HuggingFace model ID. Must be an Unsloth-optimized Granite 4 variant. |
| max_seq_length | int | 4096 | Maximum token sequence length. Longer sequences use more VRAM. Granite 4 supports up to 128K, but 4096 is optimal for training. |
| load_in_4bit | bool | true | Enable QLoRA 4-bit quantization. Reduces VRAM by ~4×. Disable only if you have ≥48 GB VRAM. |
Available models:
| Model ID | Parameters | VRAM (4-bit) | Quality |
|---|---|---|---|
| unsloth/granite-4.0-micro | ~1B | ~4 GB | Baseline |
| unsloth/granite-4.0-h-micro | ~1B | ~4 GB | Better (Hybrid) |
| unsloth/granite-4.0-h-tiny | ~2B | ~6 GB | Recommended |
| unsloth/granite-4.0-h-small | ~8B | ~12 GB | Best quality |
lora Section
Configures LoRA (Low-Rank Adaptation) parameters.
```yaml
lora:
  r: 64
  lora_alpha: 64
  lora_dropout: 0
  bias: "none"
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
    - "shared_mlp.input_linear"
    - "shared_mlp.output_linear"
  use_gradient_checkpointing: "unsloth"
```
| Field | Type | Default | Description |
|---|---|---|---|
| r | int | 64 | LoRA rank. Higher = more parameters = more capacity, but slower training. Typical values: 16, 32, 64, 128. |
| lora_alpha | int | 64 | LoRA scaling factor. Usually set equal to r. Higher values make LoRA updates stronger relative to the base model. |
| lora_dropout | float | 0 | Dropout probability for LoRA layers. 0 is recommended by Unsloth for maximum training speed. |
| bias | string | "none" | Whether to train bias parameters. "none" keeps LoRA lightweight. Other options: "all", "lora_only". |
| target_modules | list | 9 modules | Which model layers get LoRA adapters. The 9-module list covers the Transformer attention projections (q_proj, k_proj, v_proj, o_proj), the Transformer FFN projections (gate_proj, up_proj, down_proj), and the Mamba SSM layers (shared_mlp.*). |
| use_gradient_checkpointing | string | "unsloth" | Memory optimization. "unsloth" uses Unsloth's optimized implementation (2× less VRAM than PyTorch native). |
Understanding r (rank):
| Rank | Trainable Params | Training Speed | Model Quality |
|---|---|---|---|
| 16 | ~5M | Fastest | Good for simple tasks |
| 32 | ~10M | Fast | Good balance |
| 64 | ~20M | Moderate | Best for SOC tasks |
| 128 | ~40M | Slow | Diminishing returns |
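Because each adapted weight matrix of shape (d_out, d_in) gains two low-rank factors, A (r × d_in) and B (d_out × r), the trainable parameter count scales linearly with r and can be estimated directly. A minimal sketch (the layer shapes below are illustrative placeholders, not Granite 4's actual dimensions):

```python
def lora_param_count(r: int, module_shapes: list[tuple[int, int]]) -> int:
    """Estimate trainable LoRA parameters: each adapted (d_out, d_in)
    matrix adds r * (d_in + d_out) parameters from its A and B factors."""
    return sum(r * (d_in + d_out) for d_out, d_in in module_shapes)

# Hypothetical shapes for one decoder layer: four square attention
# projections plus up/gate/down FFN projections (illustrative only).
shapes = [(2048, 2048)] * 4 + [(5632, 2048)] * 2 + [(2048, 5632)]
print(lora_param_count(64, shapes))   # per-layer count at r=64
print(lora_param_count(128, shapes))  # doubling r doubles the count
```

Multiply the per-layer figure by the number of adapted layers to reach totals in the ~5M-40M range shown above.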
Target modules explained:
| Module | Architecture | Purpose |
|---|---|---|
| q_proj, k_proj, v_proj, o_proj | Transformer Attention | Query, Key, Value, and Output projections — core attention mechanism |
| gate_proj, up_proj, down_proj | Transformer FFN | Feed-forward network gate and projections |
| shared_mlp.input_linear | Mamba SSM | Granite 4 Hybrid's shared MLP input — covers state-space model layers |
| shared_mlp.output_linear | Mamba SSM | Granite 4 Hybrid's shared MLP output |
training Section
Controls the training loop hyperparameters.
```yaml
training:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 4
  num_train_epochs: 3
  max_steps: -1
  learning_rate: 0.0002
  lr_scheduler_type: "cosine"
  warmup_ratio: 0.1
  weight_decay: 0.01
  bf16: true
  fp16: false
  optim: "adamw_8bit"
  logging_steps: 10
  save_steps: 100
  seed: 42
  output_dir: "training/output"
```
| Field | Type | Default | Description |
|---|---|---|---|
| per_device_train_batch_size | int | 2 | Samples per GPU per step. Higher = faster training but more VRAM. T4: 2, A100: 4-8. |
| gradient_accumulation_steps | int | 4 | Accumulate gradients over N steps before updating. Effective batch size = batch_size × grad_accum = 8. |
| num_train_epochs | int | 3 | Number of passes through the full dataset. 3 is a good starting point; monitor eval loss for overfitting. |
| max_steps | int | -1 | Override epochs with a fixed step count. -1 = use num_train_epochs instead. Useful for quick tests (e.g., 200). |
| learning_rate | float | 2e-4 | Peak learning rate. LoRA fine-tuning typically uses 1e-4 to 5e-4. |
| lr_scheduler_type | string | "cosine" | Learning rate schedule. "cosine" decays smoothly; "linear" decays linearly. Cosine is preferred for LoRA. |
| warmup_ratio | float | 0.1 | Fraction of training steps spent warming up the learning rate from 0. Prevents instability in early training. |
| weight_decay | float | 0.01 | L2 regularization. Prevents overfitting. Standard value for LLM fine-tuning. |
| bf16 | bool | true | Use bfloat16 mixed precision. Reduces VRAM, maintains numerical range. Requires Ampere+ GPU. |
| fp16 | bool | false | Use float16 mixed precision. Use if bf16 is unsupported (pre-Ampere GPUs like T4). |
| optim | string | "adamw_8bit" | Optimizer. "adamw_8bit" (bitsandbytes) uses ~33% less VRAM than standard AdamW. |
| logging_steps | int | 10 | Log training loss every N steps. |
| save_steps | int | 100 | Save checkpoint every N steps. Enables --resume if training is interrupted. |
| seed | int | 42 | Random seed for reproducibility. |
| output_dir | string | "training/output" | Where to save checkpoints and exported models. |
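The derived quantities in this table (effective batch size, total steps, warmup steps) follow directly from the config values. A small sketch of the arithmetic, assuming a hypothetical dataset size:

```python
import math

def training_schedule(num_samples: int,
                      per_device_train_batch_size: int = 2,
                      gradient_accumulation_steps: int = 4,
                      num_train_epochs: int = 3,
                      warmup_ratio: float = 0.1) -> dict:
    """Derive the schedule quantities described in the table above."""
    # Gradients accumulate over N micro-batches before each optimizer step.
    effective_batch = per_device_train_batch_size * gradient_accumulation_steps
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * num_train_epochs
    # warmup_ratio is a fraction of total optimizer steps, not epochs.
    warmup_steps = int(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

# e.g. a hypothetical 8,000-sample dataset with the defaults above:
print(training_schedule(8000))
# {'effective_batch_size': 8, 'steps_per_epoch': 1000,
#  'total_steps': 3000, 'warmup_steps': 300}
```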
dataset Section
Controls dataset loading and preprocessing.
```yaml
dataset:
  train_file: "training/data/soc_train.jsonl"
  eval_file: "training/data/soc_eval.jsonl"
  train_on_completions: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| train_file | string | "training/data/soc_train.jsonl" | Path to training JSONL file. Generated by prepare_datasets.py. |
| eval_file | string | "training/data/soc_eval.jsonl" | Path to evaluation JSONL file. Used for eval loss during training (separate from benchmark evaluation). |
| train_on_completions | bool | true | Enable response masking. Only compute loss on assistant responses, not system/user tokens. Critical for quality. |
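Conceptually, response masking means the labels for every token outside an assistant response are replaced with the ignore index, so those positions contribute nothing to the loss. A minimal illustration of the idea (not the trainer's actual implementation; token IDs and roles are made up):

```python
IGNORE_INDEX = -100  # HuggingFace cross-entropy skips labels with this value

def mask_non_completions(input_ids: list[int], roles: list[str]) -> list[int]:
    """Copy input_ids into labels, replacing every token that is not part
    of an assistant response with IGNORE_INDEX."""
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(input_ids, roles)]

tokens = [101, 102, 103, 104, 105]
roles  = ["system", "user", "user", "assistant", "assistant"]
print(mask_non_completions(tokens, roles))  # [-100, -100, -100, 104, 105]
```

With masking off, the model also learns to predict the system and user text, which wastes capacity and can degrade instruction-following quality.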
agent_profiles Section
Defines per-agent specialist training configurations.
```yaml
agent_profiles:
  security_analyst:
    system_prompt: "You are the AuroraSOC Security Analyst..."
    dataset_filter: "alert_triage"
    model_override: null
    output_dir: "training/output/security_analyst"
  threat_hunter:
    system_prompt: "You are the AuroraSOC Threat Hunter..."
    dataset_filter: "threat_hunting"
    model_override: null
    output_dir: "training/output/threat_hunter"
```
| Field | Type | Default | Description |
|---|---|---|---|
| system_prompt | string | (varies) | System prompt injected into each training example. Teaches the model its persona. |
| dataset_filter | string | (varies) | Filter training data by domain. Only samples with matching domain field are used. |
| model_override | string | null | Use a different base model for this agent. null = use the global model.name. |
| output_dir | string | (varies) | Where to save this agent's trained model. Pattern: training/output/<agent_name>/. |
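The dataset_filter behavior amounts to selecting JSONL samples whose domain field matches the profile. A sketch of that selection step (the actual training scripts may implement it differently):

```python
import json

def filter_by_domain(jsonl_path: str, dataset_filter: str) -> list[dict]:
    """Keep only JSONL samples whose 'domain' field matches the agent
    profile's dataset_filter."""
    samples = []
    with open(jsonl_path) as f:
        for line in f:
            sample = json.loads(line)
            if sample.get("domain") == dataset_filter:
                samples.append(sample)
    return samples

# e.g. for the security_analyst profile:
# triage_samples = filter_by_domain("training/data/soc_train.jsonl", "alert_triage")
```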
export Section
Controls model export after training.
```yaml
export:
  save_lora: true
  save_merged_16bit: false
  save_gguf: true
  gguf_quantization_methods:
    - "q8_0"
  push_to_hub: false
  hub_model_name: ""
```
| Field | Type | Default | Description |
|---|---|---|---|
| save_lora | bool | true | Save LoRA adapter weights. Lightweight (~50-200 MB). Required for --resume and --export-only. |
| save_merged_16bit | bool | false | Merge LoRA into base model and save full FP16 weights. Requires full model VRAM. Needed for vLLM serving. |
| save_gguf | bool | true | Export to GGUF format. Required for Ollama deployment. |
| gguf_quantization_methods | list | ["q8_0"] | GGUF quantization methods. Options: "q8_0", "q4_k_m", "q5_k_m", "f16". |
| push_to_hub | bool | false | Push to HuggingFace Hub after training. Requires HF_TOKEN env var. |
| hub_model_name | string | "" | HuggingFace repo name (e.g., "yourname/granite-soc-finetuned"). |
Environment Variables
Runtime Configuration (.env)
These variables control how AuroraSOC resolves and uses Granite models at runtime:
```bash
# Model Selection
GRANITE_MODEL_NAME=granite3.2:2b                # Base model for Ollama
GRANITE_MODEL_OVERRIDE=                         # Force all agents to use this model

# Fine-tuned Model Settings
GRANITE_USE_FINETUNED=false                     # Enable fine-tuned model resolution
GRANITE_USE_PER_AGENT_MODELS=false              # Enable per-agent model resolution
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest  # Tag for the generic fine-tuned model

# Serving Backend
GRANITE_SERVING_BACKEND=ollama                  # "ollama" or "vllm"
OLLAMA_HOST=http://localhost:11434              # Ollama server URL
VLLM_API_BASE=http://localhost:8000             # vLLM server URL
VLLM_MODEL_PATH=training/output/merged_fp16     # Path to FP16 model for vLLM
```
| Variable | Values | Description |
|---|---|---|
| GRANITE_MODEL_NAME | Model name/tag | Base Granite model pulled from Ollama registry |
| GRANITE_MODEL_OVERRIDE | Model name or empty | Forces ALL agents to use this model, bypassing all resolution logic |
| GRANITE_USE_FINETUNED | true / false | Enable fine-tuned model as fallback when per-agent model isn't available |
| GRANITE_USE_PER_AGENT_MODELS | true / false | Enable per-agent model lookup via AGENT_MODEL_MAP |
| GRANITE_FINETUNED_MODEL_TAG | Ollama tag | Tag used for the generic (non-per-agent) fine-tuned model |
| GRANITE_SERVING_BACKEND | ollama / vllm | Which serving backend to use |
| OLLAMA_HOST | URL | Ollama server endpoint |
| VLLM_API_BASE | URL | vLLM OpenAI-compatible API endpoint |
| VLLM_MODEL_PATH | Path | Path to merged FP16 model directory for vLLM |
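Taken together, these variables define a resolution order: override first, then per-agent lookup, then the generic fine-tuned tag, then the base model. A sketch of that documented precedence (the runtime's actual implementation may differ in detail):

```python
import os

def resolve_model(agent_name: str, agent_model_map: dict[str, str]) -> str:
    """Resolve which model an agent should use, following the documented
    precedence: override > per-agent map > fine-tuned tag > base model."""
    override = os.environ.get("GRANITE_MODEL_OVERRIDE", "")
    if override:  # bypasses all other resolution logic
        return override
    if os.environ.get("GRANITE_USE_PER_AGENT_MODELS", "false") == "true" \
            and agent_name in agent_model_map:
        return agent_model_map[agent_name]
    if os.environ.get("GRANITE_USE_FINETUNED", "false") == "true":
        return os.environ.get("GRANITE_FINETUNED_MODEL_TAG", "granite-soc:latest")
    return os.environ.get("GRANITE_MODEL_NAME", "granite3.2:2b")
```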
Training Environment Variables
These variables are used during the training process:
| Variable | Default | Description |
|---|---|---|
| AGENT_NAME | (none) | Agent profile name for Docker-based per-agent training |
| HF_TOKEN | (none) | HuggingFace token for push_to_hub and gated model downloads |
| WANDB_API_KEY | (none) | Weights & Biases API key for experiment tracking |
| CUDA_VISIBLE_DEVICES | 0 | Which GPU(s) to use. 0,1 for multi-GPU. |
Makefile Targets Reference
Training Targets
| Target | Command | Description |
|---|---|---|
| make train-install | pip install -e .[training] | Install training dependencies |
| make train-data | python training/scripts/prepare_datasets.py | Download and prepare training datasets |
| make train | python training/scripts/finetune_granite.py | Train generic SOC model |
| make train-agent AGENT=X | finetune_granite.py --agent X | Train per-agent specialist |
| make train-all-agents | python training/scripts/train_all_agents.py | Train all agent profiles sequentially |
| make train-eval | python training/scripts/evaluate_model.py | Evaluate trained model |
| make train-serve-ollama | serve_model.py ollama | Import GGUF to Ollama |
| make train-serve-vllm | serve_model.py vllm | Start vLLM server |
Docker Training Targets
| Target | Command | Description |
|---|---|---|
| make train-docker-data | docker compose ... run prepare-data | Prepare data in Docker |
| make train-docker | docker compose ... run training | Train in Docker |
| make train-docker-agent AGENT=X | docker compose ... run training-agent | Per-agent training in Docker |
| make train-docker-eval | docker compose ... run eval | Evaluate in Docker |
Local Setup Targets
| Target | Command | Description |
|---|---|---|
| make setup-local | ./scripts/setup_local.sh | Full local environment setup |
| make enable-finetuned | Sets .env vars | Enable fine-tuned models in AuroraSOC |
| make disable-finetuned | Unsets .env vars | Revert to base Granite models |
| make ollama-pull-granite | ollama pull granite3.2:2b | Pull base model from Ollama |
Docker Compose Services
docker-compose.training.yml
| Service | Purpose | GPU Required | Volumes |
|---|---|---|---|
| prepare-data | Download + process dataset | No | ./training/data:/app/training/data |
| training | Generic model training | Yes | ./training:/app/training, HF cache |
| training-agent | Per-agent training | Yes | Same as training |
| eval | Model evaluation | Yes | ./training:/app/training |
| vllm | Production serving (FP16) | Yes | ./training/output:/models |
| ollama-import | GGUF → Ollama | Depends | ./training/output:/models, Ollama data |
docker-compose.yml (x-granite-env Anchor)
The main compose file uses a YAML anchor to inject Granite environment variables into all agent services:
```yaml
x-granite-env: &granite-env
  GRANITE_MODEL_NAME: ${GRANITE_MODEL_NAME:-granite3.2:2b}
  GRANITE_MODEL_OVERRIDE: ${GRANITE_MODEL_OVERRIDE:-}
  GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
  GRANITE_USE_PER_AGENT_MODELS: ${GRANITE_USE_PER_AGENT_MODELS:-false}
  GRANITE_FINETUNED_MODEL_TAG: ${GRANITE_FINETUNED_MODEL_TAG:-granite-soc:latest}
  GRANITE_SERVING_BACKEND: ${GRANITE_SERVING_BACKEND:-ollama}
  OLLAMA_HOST: ${OLLAMA_HOST:-http://ollama:11434}
```
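Services consume the anchor with a YAML merge key under environment. A hypothetical agent service (the service name and image are placeholders, not taken from the actual compose file):

```yaml
services:
  security-analyst:            # hypothetical service name
    image: aurorasoc/agent     # placeholder image
    environment:
      <<: *granite-env         # inject all Granite variables defined above
      AGENT_NAME: security_analyst
```

Because the defaults live in the anchor, changing a variable in .env propagates to every agent service without editing each one.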
Full Configuration Example
A complete .env file for a production deployment with per-agent fine-tuned models:
```bash
# --- Granite Model Configuration ---
GRANITE_MODEL_NAME=granite3.2:2b
GRANITE_USE_FINETUNED=true
GRANITE_USE_PER_AGENT_MODELS=true
GRANITE_FINETUNED_MODEL_TAG=granite-soc:latest
GRANITE_SERVING_BACKEND=ollama
OLLAMA_HOST=http://ollama:11434

# --- Application Configuration ---
SECRET_KEY=your-secret-key-here
DATABASE_URL=postgresql+asyncpg://aurora:aurora@postgres:5432/aurorasoc
NATS_URL=nats://nats:4222
MQTT_BROKER=mosquitto

# --- Monitoring ---
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```
Next Steps
- LLM Integration: Architecture — how models connect to the agent framework
- LLM Integration: Model Swap — switch between base and fine-tuned models