Granite Module Deep Dive

The Granite module (aurorasoc/granite/) is the core abstraction that connects AuroraSOC agents to IBM Granite 4 models. It handles backend-aware model configuration, model resolution, ChatModel creation, and model health monitoring.

Module Structure

aurorasoc/granite/
├── __init__.py # GraniteModelConfig, resolve_model(), create_granite_chat_model()
└── registry.py # Health checks, model availability, warmup

GraniteModelConfig Dataclass

The central configuration object:

@dataclass
class GraniteModelConfig:
    backend: ServingBackend = ServingBackend.OLLAMA
    vllm_base_url: str = "http://vllm:8000/v1"
    ollama_base_url: str = "http://ollama:11434"
    use_finetuned: bool = True
    use_per_agent_models: bool = False
    finetuned_model_tag: str = "granite-soc:latest"
    vllm_specialist_model: str = "granite-soc-specialist"
    vllm_orchestrator_model: str = "granite-soc-specialist"
    ollama_specialist_model: str = "granite4:8b"
    ollama_orchestrator_model: str = "granite4:dense"
    model_overrides: dict[str, str] = field(default_factory=dict)
Field | Type | Default | Description
backend | ServingBackend | OLLAMA | Which backend to use (OLLAMA or VLLM)
vllm_base_url | str | "http://vllm:8000/v1" | vLLM OpenAI-compatible API URL
ollama_base_url | str | "http://ollama:11434" | Ollama server URL
use_finetuned | bool | True | Enable generic fine-tuned fallback in Ollama mode
use_per_agent_models | bool | False | Enable per-agent model resolution in Ollama mode
finetuned_model_tag | str | "granite-soc:latest" | Ollama tag for the generic fine-tuned model
vllm_specialist_model | str | "granite-soc-specialist" | Served vLLM model ID for specialist agents
vllm_orchestrator_model | str | "granite-soc-specialist" | Served vLLM model ID for orchestrator
ollama_specialist_model | str | "granite4:8b" | Ollama specialist base tag when fine-tuning is off
ollama_orchestrator_model | str | "granite4:dense" | Ollama orchestrator base tag when fine-tuning is off
model_overrides | dict[str, str] | {} | Explicit per-agent override map (highest priority)
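For illustration, a minimal configuration for a vLLM deployment that also pins one agent to a specific model (the override tag below is hypothetical) might look like this:

config = GraniteModelConfig(
    backend=ServingBackend.VLLM,
    vllm_base_url="http://vllm:8000/v1",
    vllm_specialist_model="granite-soc-specialist",
    # Hypothetical override tag; model_overrides always wins (Tier 1 below).
    model_overrides={"malware_analyst": "granite-soc-malware-analyst-v2"},
)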

ServingBackend Enum

class ServingBackend(str, Enum):
    OLLAMA = "ollama"
    VLLM = "vllm"

Model Resolution Priority

The resolve_model() method implements a backend-aware priority strategy:

def resolve_model(self, agent_name: str) -> str:
    # Tier 1: Explicit per-agent override
    if agent_name in self.model_overrides:
        return self._format_model(self.model_overrides[agent_name])

    # Tier 2: vLLM explicit served model IDs
    if self.backend == ServingBackend.VLLM:
        model = (
            self.vllm_orchestrator_model
            if agent_name == "Orchestrator"
            else self.vllm_specialist_model
        )
        return self._format_model(model)

    # Tier 3: Ollama per-agent specialists
    if self.use_per_agent_models and agent_name in AGENT_MODEL_MAP:
        return self._format_model(AGENT_MODEL_MAP[agent_name])

    # Tier 4: Ollama generic fine-tuned fallback
    if self.use_finetuned:
        return self._format_model(self.finetuned_model_tag)

    # Tier 5: Ollama base specialist/orchestrator tags
    if agent_name == "Orchestrator":
        return self._format_model(self.ollama_orchestrator_model)
    return self._format_model(self.ollama_specialist_model)

Resolution Priority Explained

Tier | When It Activates | Use Case
1. Explicit Override | model_overrides[agent_name] is set | Force one agent to a specific model for targeted debugging
2. vLLM Model IDs | backend == VLLM | Use served vLLM model IDs from runtime settings
3. Per-Agent Ollama | use_per_agent_models=true + agent in AGENT_MODEL_MAP | Production with per-agent Ollama specialists
4. Generic Ollama Fine-tuned | use_finetuned=true | Production with one generic fine-tuned Ollama model
5. Ollama Base Tags | No fine-tuning path selected | Development or fallback to base Granite tags
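As a sketch of how the tiers interact in Ollama mode (exact return strings depend on _format_model(), which is sketched later on this page):

cfg = GraniteModelConfig(
    backend=ServingBackend.OLLAMA,
    use_per_agent_models=True,
    model_overrides={"threat_hunter": "granite4:8b"},  # hypothetical override
)

cfg.resolve_model("threat_hunter")      # Tier 1: explicit override wins
cfg.resolve_model("malware_analyst")    # Tier 3: AGENT_MODEL_MAP["malware_analyst"]
cfg.resolve_model("some_other_agent")   # Tier 4: not in the map, falls back to finetuned_model_tag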

Fallback Behavior

If LLM_BACKEND=vllm, resolution uses VLLM_MODEL / VLLM_ORCHESTRATOR_MODEL after explicit overrides and does not use Ollama-specific tiers. In Ollama mode, missing per-agent tags fall back to finetuned_model_tag, then base Ollama tags.

AGENT_MODEL_MAP

Maps agent names to Ollama model tags:

AGENT_MODEL_MAP = {
    "security_analyst": "granite-soc-security-analyst",
    "threat_hunter": "granite-soc-threat-hunter",
    "malware_analyst": "granite-soc-malware-analyst",
    "incident_responder": "granite-soc-incident-responder",
    "network_security": "granite-soc-network-security",
    "cps_security": "granite-soc-cps-security",
    "threat_intel": "granite-soc-threat-intel",
    "ueba_analyst": "granite-soc-ueba-analyst",
    "forensic_analyst": "granite-soc-forensic-analyst",
    "endpoint_security": "granite-soc-endpoint-security",
    "web_security": "granite-soc-web-security",
    "cloud_security": "granite-soc-cloud-security",
    "compliance_analyst": "granite-soc-compliance-analyst",
    "vulnerability_manager": "granite-soc-vulnerability-manager",
    "report_generator": "granite-soc-report-generator",
    "orchestrator": "granite-soc-orchestrator",
}

These tags correspond to the Ollama models created by serve_model.py ollama-all. The :latest tag is appended automatically by resolve_model().
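Assuming the ollama: provider prefix and :latest suffix handling described under Backend-Specific Parameters below, per-agent resolution would look roughly like:

cfg = GraniteModelConfig(use_per_agent_models=True)
cfg.resolve_model("network_security")
# e.g. "ollama:granite-soc-network-security:latest" -- exact formatting is handled by _format_model()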

create_granite_chat_model()

Creates a BeeAI ChatModel instance based on the resolved model and backend:

def create_granite_chat_model(
    agent_name: str,
    config: GraniteModelConfig | None = None,
) -> ChatModel:
    # Fall back to the environment-derived default when no config is provided.
    config = config or get_default_granite_config()
    model_name = config.resolve_model(agent_name)
    provider_options = {}

    if config.backend == ServingBackend.OLLAMA:
        provider_options["base_url"] = config.ollama_base_url
    elif config.backend == ServingBackend.VLLM:
        provider_options["base_url"] = config.vllm_base_url

    return ChatModel.from_name(model_name, provider_options=provider_options)

Backend-Specific Parameters

Backend | ChatModel Provider | URL Format | Protocol
Ollama | ollama:<model> | http://host:11434 | Ollama API (native)
vLLM | openai:<model> | http://host:8000/v1 | OpenAI-compatible API

Why openai: for vLLM? vLLM exposes an OpenAI-compatible /v1/chat/completions endpoint. BeeAI's ChatModel.from_name("openai:...") makes standard OpenAI API calls, which vLLM handles transparently.
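_format_model() is not shown in the excerpts above; a minimal sketch of what it plausibly does, assuming it owns both the provider prefix and the :latest suffix mentioned earlier:

def _format_model(self, model: str) -> str:
    # Sketch only: pick the BeeAI provider prefix and normalize the Ollama tag.
    if self.backend == ServingBackend.VLLM:
        return f"openai:{model}"
    if ":" not in model:
        model = f"{model}:latest"
    return f"ollama:{model}"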

get_default_granite_config()

Creates a config from environment variables:

def get_default_granite_config() -> GraniteModelConfig:
    settings = get_settings()
    granite = settings.granite

    return GraniteModelConfig(
        backend=ServingBackend.VLLM if settings.LLM_BACKEND == "vllm" else ServingBackend.OLLAMA,
        ollama_base_url=settings.OLLAMA_BASE_URL,
        vllm_base_url=settings.VLLM_BASE_URL,
        use_finetuned=granite.use_finetuned,
        use_per_agent_models=granite.use_per_agent_models,
        finetuned_model_tag=granite.finetuned_model_tag,
        vllm_specialist_model=settings.VLLM_MODEL,
        vllm_orchestrator_model=settings.VLLM_ORCHESTRATOR_MODEL,
        ollama_specialist_model=resolve_ollama_model_tag(settings.OLLAMA_MODEL),
        ollama_orchestrator_model=resolve_ollama_model_tag(
            settings.OLLAMA_ORCHESTRATOR_MODEL,
            orchestrator=True,
        ),
    )

Note: Boolean environment variables are compared as lowercase strings to "true". Empty strings and unset variables default to False.
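A sketch of that comparison (the actual helper in settings.py may be named differently):

import os

def _env_flag(name: str, default: bool = False) -> bool:
    # Unset or empty variables fall back to the default; anything else must equal "true".
    raw = os.getenv(name, "").strip().lower()
    return raw == "true" if raw else default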

Model Registry (registry.py)

The registry module provides health checking and model management:

check_ollama_models()

Queries Ollama for available models:

async def check_ollama_models(ollama_host: str) -> list[str]:
    """Returns list of model tags available in Ollama."""

Used during startup to verify that required models are pulled and available.
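A minimal sketch of such a check against Ollama's /api/tags endpoint (the real implementation may differ in error handling and client reuse):

import httpx

async def check_ollama_models(ollama_host: str) -> list[str]:
    # Ollama lists locally available models at GET /api/tags.
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.get(f"{ollama_host}/api/tags")
        resp.raise_for_status()
        return [m["name"] for m in resp.json().get("models", [])]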

check_vllm_models()

Checks vLLM's /v1/models endpoint:

async def check_vllm_models(vllm_base: str) -> list[str]:
    """Returns list of models loaded in vLLM."""

pull_ollama_model()

Pulls a model from the Ollama registry:

async def pull_ollama_model(ollama_host: str, model: str) -> bool:
    """Pulls a model if not already available. Returns success."""

warmup_model()

Sends a small prompt to warm up the model in the serving backend:

async def warmup_model(config: GraniteModelConfig, agent_name: str | None = None) -> bool:
    """Sends a warmup prompt to ensure the model is loaded in memory."""

Why warmup? The first inference request to Ollama/vLLM is slow because the model weights must be loaded from disk into GPU memory. Warmup sends a minimal prompt during startup so the model is already hot when real alerts arrive.
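A startup routine could stitch these registry helpers together roughly as follows (Ollama mode only; the function name and error handling are illustrative):

async def warm_all_models(config: GraniteModelConfig) -> None:
    # Verify which models are present, pull anything missing, then send warmup prompts.
    available = await check_ollama_models(config.ollama_base_url)
    for agent, tag in AGENT_MODEL_MAP.items():
        if config.use_per_agent_models and f"{tag}:latest" not in available:
            await pull_ollama_model(config.ollama_base_url, tag)
        await warmup_model(config, agent_name=agent)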

Integration Points

With the Factory

# aurorasoc/agents/factory.py
class AuroraAgentFactory:
    def __init__(self, granite_config=None):
        self.granite_config = granite_config or get_default_granite_config()

    def _llm_for(self, agent_name: str) -> ChatModel:
        return create_granite_chat_model(agent_name, self.granite_config)

With Settings

# aurorasoc/config/settings.py
class Settings(BaseSettings):
    LLM_BACKEND: str = "vllm"
    VLLM_BASE_URL: str = "http://vllm:8000/v1"
    VLLM_MODEL: str = "granite-soc-specialist"
    VLLM_ORCHESTRATOR_MODEL: str = "granite-soc-specialist"
    OLLAMA_BASE_URL: str = "http://ollama:11434"
    OLLAMA_MODEL: str = "granite4:8b"
    OLLAMA_ORCHESTRATOR_MODEL: str = "granite4:dense"
    # plus granite.* toggles for Ollama finetuned/per-agent behavior

With Docker Compose

# docker-compose.yml
x-granite-env: &granite-env
  LLM_BACKEND: ${LLM_BACKEND:-vllm}
  VLLM_BASE_URL: ${VLLM_BASE_URL:-http://vllm:8000/v1}
  VLLM_MODEL: ${VLLM_MODEL:-granite-soc-specialist}
  VLLM_ORCHESTRATOR_MODEL: ${VLLM_ORCHESTRATOR_MODEL:-granite-soc-specialist}
  GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
  GRANITE_USE_PER_AGENT_MODELS: ${GRANITE_USE_PER_AGENT_MODELS:-false}
  OLLAMA_BASE_URL: ${OLLAMA_BASE_URL:-http://ollama:11434}
  # ... injected into all agent services

Adding a New Model

To add a new model variant:

  1. Add to AGENT_MODEL_MAP (if per-agent):

    AGENT_MODEL_MAP["my_new_agent"] = "granite-soc-my-new-agent"
  2. Train the model using finetune_granite.py --agent my_new_agent

  3. Import to Ollama using serve_model.py ollama

  4. Resolve automatically: resolve_model("my_new_agent") will find it

To use a completely different model family (not Granite), set backend model IDs directly:

  1. vLLM path: set VLLM_MODEL and VLLM_ORCHESTRATOR_MODEL
  2. Ollama path: set OLLAMA_MODEL and OLLAMA_ORCHESTRATOR_MODEL
  3. For one-off agent overrides in code/tests, provide model_overrides in GraniteModelConfig (see the sketch below)
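For example, a test might pin a single agent to a specific tag without touching any environment configuration (the tag below is hypothetical):

def test_threat_hunter_uses_pinned_model():
    cfg = GraniteModelConfig(
        backend=ServingBackend.OLLAMA,
        model_overrides={"threat_hunter": "granite4:8b-test"},  # hypothetical tag
    )
    # Tier 1 guarantees the override is used; _format_model() may add a prefix/suffix.
    assert "granite4:8b-test" in cfg.resolve_model("threat_hunter")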

Next Steps