
Adding or Swapping Base Models

AuroraSOC relies on the IBM Granite 4 hybrid architecture for its blend of speed and long-context deductive reasoning. However, the ecosystem is model-agnostic by design. Should your organization require Llama 3, Mistral, Qwen, or a proprietary internal model, the pipeline can be adapted to support it.

This document walks you through the comprehensive process of swapping out the base model across the entire lifecycle—from training parameters to production deployment.

Understanding the Impact

When you switch a base model, you are not just changing a string in a config file. You must adapt three distinct pipeline phases:

  1. The Unsloth Training Configuration: Different architectures use different target projection layers for LoRA injection.
  2. The Ollama Modelfile: Different foundation models use drastically different chat templates (e.g., Granite's <|start_of_role|> vs. Mistral's <s>[INST]).
  3. The Deployment Configuration: Propagating the change through the docker-compose.yml to the actual agent framework.

Phase 1: Reconfiguring the Training Pipeline

Your first stop is training/configs/granite_soc_finetune.yaml.

Update the model name to your Hugging Face path of choice (supported by Unsloth).

model:
  name: "unsloth/Meta-Llama-3-8B-Instruct"
  max_seq_length: 4096
  load_in_4bit: true

Updating LoRA Target Modules

LoRA adapter matrices are injected into specific attention and MLP projection layers, and those layer names differ between architectures. IBM Granite requires targets such as shared_mlp.input_linear, whereas Llama 3 uses the standard q_proj/k_proj/v_proj naming scheme.

If you switch to a Llama 3 architecture, you must update the target_modules list:

lora:
  r: 64
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  # Remove IBM-specific "shared_mlp..." targets
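The architecture-to-targets mapping above can be sketched as a small helper. This is purely illustrative: the `target_modules_for` function and the exact Granite layer list are hypothetical, not part of the AuroraSOC codebase, so adjust the lists to your model's actual layer names.

```python
# Hypothetical mapping of architecture family -> LoRA target modules.
# The Granite entry is illustrative; verify layer names against your checkpoint.
TARGET_MODULES = {
    "llama": ["q_proj", "k_proj", "v_proj", "o_proj",
              "gate_proj", "up_proj", "down_proj"],
    "granite": ["q_proj", "k_proj", "v_proj", "o_proj",
                "shared_mlp.input_linear"],
}

def target_modules_for(model_name: str) -> list[str]:
    """Pick a LoRA target list based on a substring of the model name."""
    name = model_name.lower()
    for family, modules in TARGET_MODULES.items():
        if family in name:
            return modules
    raise ValueError(f"No LoRA target mapping for {model_name!r}")

print(target_modules_for("unsloth/Meta-Llama-3-8B-Instruct"))
```

A lookup like this makes the swap a one-line change in the config rather than a hand-edited list per experiment.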

Run prepare_datasets.py and execute the fine-tuning pipeline as normal.

Phase 2: Updating the Serving Template

If you are using vLLM, its OpenAI-compatible server reads the chat template from the model's Hugging Face tokenizer_config.json and applies it automatically. For most industry-standard models, vLLM "just works" once pointed at the model.

If you are deploying to the Ollama fallback instead, you must manually replace the <|start_of_role|> template used by Granite.

Open training/scripts/serve_model.py and training/configs/Modelfile.granite-soc and adjust the generation templates.

For example, adapting the generate_modelfile function for Llama 3 format:

TEMPLATE """<|begin_of_text|>{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

# Add correct stop symbols
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|start_header_id|>"

Phase 3: Updating Deployment Configuration

Whether you trained a new model or simply decided to point AuroraSOC at a pre-existing cloud or network-accessible endpoint, you must inform the agent fleet by altering the subsystem environment block.

Open docker-compose.yml and locate the x-granite-env YAML anchor. Do not alter the factory.py application logic directly.

x-granite-env: &granite-env
  # E.g. to point to an external vLLM server running Llama 3
  LLM_BACKEND: "vllm"
  VLLM_BASE_URL: "http://10.0.100.5:8000/v1"
  VLLM_MODEL: "meta-llama/Meta-Llama-3-8B-Instruct"

Upon restart, the Agent Factory's _get_llm_config function will seamlessly route all 16 specialist agents to the new base URL and request the new model identifier.
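As a rough illustration of the kind of env-driven resolution `_get_llm_config` performs: the sketch below is hypothetical (the real function lives in the Agent Factory and may differ), but it shows why editing the environment block is sufficient to retarget every agent.

```python
import os

# Illustrative sketch only -- not the actual _get_llm_config implementation.
def get_llm_config(env=os.environ) -> dict:
    backend = env.get("LLM_BACKEND", "vllm")
    if backend == "vllm":
        return {
            "backend": "vllm",
            "base_url": env.get("VLLM_BASE_URL", "http://localhost:8000/v1"),
            "model": env.get("VLLM_MODEL", ""),
        }
    raise ValueError(f"Unknown LLM_BACKEND: {backend!r}")

cfg = get_llm_config({"LLM_BACKEND": "vllm",
                      "VLLM_BASE_URL": "http://10.0.100.5:8000/v1",
                      "VLLM_MODEL": "meta-llama/Meta-Llama-3-8B-Instruct"})
print(cfg["model"])
```

Because every agent resolves its backend through this one path, no per-agent code changes are needed when the model moves.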

Fastest path: OpenAI-compatible providers (no training needed)

If you just want to point AuroraSOC at a cloud-hosted or local OpenAI-compatible model — without any training or export — set the LLM backend and the three OPENAI_COMPATIBLE_* environment variables:

x-granite-env: &granite-env
  LLM_BACKEND: "openai"
  OPENAI_COMPATIBLE_BASE_URL: "https://api.together.xyz/v1"
  OPENAI_COMPATIBLE_MODEL: "meta-llama/Llama-3-70b-chat-hf"
  OPENAI_COMPATIBLE_API_KEY: "${TOGETHER_API_KEY}"

This works with Together AI, Groq, Fireworks, OpenAI, LM Studio, llama.cpp, or any service implementing /v1/chat/completions. No Modelfile or export step is required.
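A quick way to smoke-test such an endpoint from Python, using only the standard library. The base URL, model name, and API key are placeholders for your provider's values, and `build_chat_request` is a throwaway helper, not project code.

```python
import json
import urllib.request

# Build a POST request for any /v1/chat/completions endpoint.
def build_chat_request(base_url, model, content, api_key=""):
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode()
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=body, headers=headers)

req = build_chat_request("https://api.together.xyz/v1",
                         "meta-llama/Llama-3-70b-chat-hf", "ping")
# To actually send it: urllib.request.urlopen(req)
print(req.full_url)
```

A 200 response with a `choices` array confirms the endpoint is wired correctly before you restart the agent fleet.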

Re-verifying Pipeline Competency

Absolute Necessity: a new base architecture means its deductive capabilities are unverified until you re-evaluate.

You must run evaluate_model.py against your newly deployed backend before treating the switch as complete.

python training/scripts/evaluate_model.py \
--model vllm:meta-llama/Meta-Llama-3-8B-Instruct

If the pass rate dips below your established operational baseline, you must either increase your LoRA rank, enrich the fine-tuning dataset, or abandon the architecture. In a SOC, precision is non-negotiable.