# Granite Module Deep Dive
The Granite module (aurorasoc/granite/) is the core abstraction that connects AuroraSOC agents to IBM Granite 4 models. It handles backend-aware model configuration, model resolution, ChatModel creation, and model health monitoring.
## Module Structure

```
aurorasoc/granite/
├── __init__.py   # GraniteModelConfig, resolve_model(), create_granite_chat_model()
└── registry.py   # Health checks, model availability, warmup
```
## GraniteModelConfig Dataclass

The central configuration object:

```python
@dataclass
class GraniteModelConfig:
    backend: ServingBackend = ServingBackend.OLLAMA
    vllm_base_url: str = "http://vllm:8000/v1"
    ollama_base_url: str = "http://ollama:11434"
    use_finetuned: bool = True
    use_per_agent_models: bool = False
    finetuned_model_tag: str = "granite-soc:latest"
    vllm_specialist_model: str = "granite-soc-specialist"
    vllm_orchestrator_model: str = "granite-soc-specialist"
    ollama_specialist_model: str = "granite4:8b"
    ollama_orchestrator_model: str = "granite4:dense"
    model_overrides: dict[str, str] = field(default_factory=dict)
```
| Field | Type | Default | Description |
|---|---|---|---|
| backend | ServingBackend | OLLAMA | Which backend to use (OLLAMA or VLLM) |
| vllm_base_url | str | "http://vllm:8000/v1" | vLLM OpenAI-compatible API URL |
| ollama_base_url | str | "http://ollama:11434" | Ollama server URL |
| use_finetuned | bool | True | Enable generic fine-tuned fallback in Ollama mode |
| use_per_agent_models | bool | False | Enable per-agent model resolution in Ollama mode |
| finetuned_model_tag | str | "granite-soc:latest" | Ollama tag for the generic fine-tuned model |
| vllm_specialist_model | str | "granite-soc-specialist" | Served vLLM model ID for specialist agents |
| vllm_orchestrator_model | str | "granite-soc-specialist" | Served vLLM model ID for orchestrator |
| ollama_specialist_model | str | "granite4:8b" | Ollama specialist base tag when fine-tuning is off |
| ollama_orchestrator_model | str | "granite4:dense" | Ollama orchestrator base tag when fine-tuning is off |
| model_overrides | dict[str, str] | {} | Explicit per-agent override map (highest priority) |
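A minimal usage sketch, assuming the import path follows the module layout shown above (all other fields keep their defaults):

```python
from aurorasoc.granite import GraniteModelConfig

# Keep the defaults but point at a locally running Ollama server.
config = GraniteModelConfig(ollama_base_url="http://localhost:11434")

config.use_finetuned        # True (default)
config.finetuned_model_tag  # "granite-soc:latest" (default)
config.model_overrides      # {} (default)
```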
## ServingBackend Enum

```python
class ServingBackend(str, Enum):
    OLLAMA = "ollama"
    VLLM = "vllm"
```
## Model Resolution Priority

The resolve_model() method implements a backend-aware priority strategy:

```python
def resolve_model(self, agent_name: str) -> str:
    # Tier 1: Explicit per-agent override
    if agent_name in self.model_overrides:
        return self._format_model(self.model_overrides[agent_name])

    # Tier 2: vLLM explicit served model IDs
    if self.backend == ServingBackend.VLLM:
        model = (
            self.vllm_orchestrator_model
            if agent_name == "Orchestrator"
            else self.vllm_specialist_model
        )
        return self._format_model(model)

    # Tier 3: Ollama per-agent specialists
    if self.use_per_agent_models and agent_name in AGENT_MODEL_MAP:
        return self._format_model(AGENT_MODEL_MAP[agent_name])

    # Tier 4: Ollama generic fine-tuned fallback
    if self.use_finetuned:
        return self._format_model(self.finetuned_model_tag)

    # Tier 5: Ollama base specialist/orchestrator tags
    if agent_name == "Orchestrator":
        return self._format_model(self.ollama_orchestrator_model)
    return self._format_model(self.ollama_specialist_model)
```
### Resolution Priority Explained
| Tier | When It Activates | Use Case |
|---|---|---|
| 1. Explicit Override | model_overrides[agent_name] is set | Force one agent to a specific model for targeted debugging |
| 2. vLLM Model IDs | backend == VLLM | Use served vLLM model IDs from runtime settings |
| 3. Per-Agent Ollama | use_per_agent_models=true + agent in AGENT_MODEL_MAP | Production with per-agent Ollama specialists |
| 4. Generic Ollama Finetuned | use_finetuned=true | Production with one generic fine-tuned Ollama model |
| 5. Ollama Base Tags | No fine-tuning path selected | Development or fallback to base Granite tags |
### Fallback Behavior
If LLM_BACKEND=vllm, resolution uses VLLM_MODEL / VLLM_ORCHESTRATOR_MODEL after explicit overrides and does not use Ollama-specific tiers. In Ollama mode, missing per-agent tags fall back to finetuned_model_tag, then base Ollama tags.
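A quick sketch of how the tiers play out for a few configurations (the resolved strings in the comments assume the provider prefixes and automatic :latest suffix described later on this page; the override tag is hypothetical):

```python
# Tier 1: an explicit override wins even in vLLM mode.
cfg = GraniteModelConfig(
    backend=ServingBackend.VLLM,
    model_overrides={"threat_hunter": "granite-soc-threat-hunter-v2"},  # hypothetical tag
)
cfg.resolve_model("threat_hunter")    # -> the override, formatted for the backend

# Tier 2: every other agent in vLLM mode gets the served vLLM model ID.
cfg.resolve_model("malware_analyst")  # -> e.g. "openai:granite-soc-specialist"

# Tier 3: Ollama mode with per-agent specialists enabled.
GraniteModelConfig(use_per_agent_models=True).resolve_model("malware_analyst")
# -> e.g. "ollama:granite-soc-malware-analyst:latest"

# Tier 4: Ollama defaults (use_finetuned=True) fall back to the generic fine-tuned tag.
GraniteModelConfig().resolve_model("malware_analyst")
# -> e.g. "ollama:granite-soc:latest"

# Tier 5: with no fine-tuning path selected, base tags are used.
GraniteModelConfig(use_finetuned=False).resolve_model("Orchestrator")
# -> e.g. "ollama:granite4:dense"
```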
## AGENT_MODEL_MAP
Maps agent names to Ollama model tags:
```python
AGENT_MODEL_MAP = {
    "security_analyst": "granite-soc-security-analyst",
    "threat_hunter": "granite-soc-threat-hunter",
    "malware_analyst": "granite-soc-malware-analyst",
    "incident_responder": "granite-soc-incident-responder",
    "network_security": "granite-soc-network-security",
    "cps_security": "granite-soc-cps-security",
    "threat_intel": "granite-soc-threat-intel",
    "ueba_analyst": "granite-soc-ueba-analyst",
    "forensic_analyst": "granite-soc-forensic-analyst",
    "endpoint_security": "granite-soc-endpoint-security",
    "web_security": "granite-soc-web-security",
    "cloud_security": "granite-soc-cloud-security",
    "compliance_analyst": "granite-soc-compliance-analyst",
    "vulnerability_manager": "granite-soc-vulnerability-manager",
    "report_generator": "granite-soc-report-generator",
    "orchestrator": "granite-soc-orchestrator",
}
```
These tags correspond to the Ollama models created by serve_model.py ollama-all. The :latest tag is appended automatically by resolve_model().
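The _format_model() helper itself is not shown above. One plausible shape, consistent with the automatic :latest suffix and the provider prefixes listed under Backend-Specific Parameters below (a sketch only, not the actual implementation):

```python
def _format_model(self, model: str) -> str:
    # Sketch: the real helper lives in aurorasoc/granite/__init__.py.
    if self.backend == ServingBackend.VLLM:
        # vLLM is reached through its OpenAI-compatible API.
        return f"openai:{model}"
    # Ollama tags get an explicit ":latest" suffix if none is present.
    if ":" not in model:
        model = f"{model}:latest"
    return f"ollama:{model}"
```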
## create_granite_chat_model()

Creates a BeeAI ChatModel instance based on the resolved model and backend:

```python
def create_granite_chat_model(
    agent_name: str,
    config: GraniteModelConfig | None = None,
) -> ChatModel:
    # Fall back to the environment-derived config when none is supplied.
    if config is None:
        config = get_default_granite_config()

    model_name = config.resolve_model(agent_name)

    provider_options = {}
    if config.backend == ServingBackend.OLLAMA:
        provider_options["base_url"] = config.ollama_base_url
    elif config.backend == ServingBackend.VLLM:
        provider_options["base_url"] = config.vllm_base_url

    return ChatModel.from_name(model_name, provider_options=provider_options)
```
### Backend-Specific Parameters
| Backend | ChatModel Provider | URL Format | Protocol |
|---|---|---|---|
| Ollama | ollama:<model> | http://host:11434 | Ollama API (native) |
| vLLM | openai:<model> | http://host:8000/v1 | OpenAI-compatible API |
Why openai: for vLLM? vLLM exposes an OpenAI-compatible /v1/chat/completions endpoint. BeeAI's ChatModel.from_name("openai:...") makes standard OpenAI API calls, which vLLM handles transparently.
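Putting the two together, creating a model for the same agent under each backend might look like this (illustrative; the URLs and model IDs are the defaults from the tables above):

```python
# Ollama backend: native Ollama provider, generic fine-tuned tag by default.
ollama_cfg = GraniteModelConfig(backend=ServingBackend.OLLAMA)
llm = create_granite_chat_model("security_analyst", ollama_cfg)
# resolves to something like "ollama:granite-soc:latest" served at http://ollama:11434

# vLLM backend: OpenAI-compatible provider, served vLLM model ID.
vllm_cfg = GraniteModelConfig(backend=ServingBackend.VLLM)
llm = create_granite_chat_model("security_analyst", vllm_cfg)
# resolves to something like "openai:granite-soc-specialist" served at http://vllm:8000/v1
```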
## get_default_granite_config()

Creates a config from environment variables:

```python
def get_default_granite_config() -> GraniteModelConfig:
    settings = get_settings()
    granite = settings.granite
    return GraniteModelConfig(
        backend=ServingBackend.VLLM if settings.LLM_BACKEND == "vllm" else ServingBackend.OLLAMA,
        ollama_base_url=settings.OLLAMA_BASE_URL,
        vllm_base_url=settings.VLLM_BASE_URL,
        use_finetuned=granite.use_finetuned,
        use_per_agent_models=granite.use_per_agent_models,
        finetuned_model_tag=granite.finetuned_model_tag,
        vllm_specialist_model=settings.VLLM_MODEL,
        vllm_orchestrator_model=settings.VLLM_ORCHESTRATOR_MODEL,
        ollama_specialist_model=resolve_ollama_model_tag(settings.OLLAMA_MODEL),
        ollama_orchestrator_model=resolve_ollama_model_tag(
            settings.OLLAMA_ORCHESTRATOR_MODEL,
            orchestrator=True,
        ),
    )
```
Note: Boolean environment variables are lowercased and compared to the string "true"; empty strings and unset variables therefore default to False.
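In other words, the flag parsing behaves roughly like this (illustrative sketch; GRANITE_USE_FINETUNED is one of the variables injected by the Docker Compose block below):

```python
import os

def _env_flag(name: str) -> bool:
    # Unset or empty -> "", which is not "true", so the flag is False.
    return os.getenv(name, "").strip().lower() == "true"

_env_flag("GRANITE_USE_FINETUNED")  # True only for "true" / "True" / "TRUE"
```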
## Model Registry (registry.py)
The registry module provides health checking and model management:
### check_ollama_models()

Queries Ollama for available models:

```python
async def check_ollama_models(ollama_host: str) -> list[str]:
    """Returns list of model tags available in Ollama."""
```
Used during startup to verify that required models are pulled and available.
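A minimal implementation sketch, assuming the standard Ollama /api/tags endpoint and the httpx client (the real function may differ):

```python
import httpx

async def check_ollama_models(ollama_host: str) -> list[str]:
    """Returns list of model tags available in Ollama."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{ollama_host}/api/tags", timeout=10.0)
        resp.raise_for_status()
        # /api/tags returns {"models": [{"name": "granite-soc:latest", ...}, ...]}
        return [m["name"] for m in resp.json().get("models", [])]
```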
### check_vllm_models()

Checks vLLM's /v1/models endpoint:

```python
async def check_vllm_models(vllm_base: str) -> list[str]:
    """Returns list of models loaded in vLLM."""
```
### pull_ollama_model()

Pulls a model from the Ollama registry:

```python
async def pull_ollama_model(ollama_host: str, model: str) -> bool:
    """Pulls a model if not already available. Returns success."""
```
### warmup_model()

Sends a small prompt to warm up the model in the serving backend:

```python
async def warmup_model(config: GraniteModelConfig, agent_name: str | None = None) -> bool:
    """Sends a warmup prompt to ensure the model is loaded in memory."""
```
Why warmup? The first inference request to Ollama/vLLM is slow because the model weights must be loaded from disk into GPU memory. Warmup sends a minimal prompt during startup so the model is already hot when real alerts arrive.
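One way to do this at the HTTP level (a sketch, not necessarily how warmup_model() is implemented; it assumes the provider-prefixed names described under Backend-Specific Parameters):

```python
import httpx

async def warmup_model(config: GraniteModelConfig, agent_name: str | None = None) -> bool:
    """Sends a warmup prompt to ensure the model is loaded in memory."""
    # Strip the provider prefix ("ollama:" / "openai:") to get the raw model ID.
    model = config.resolve_model(agent_name or "security_analyst").split(":", 1)[1]
    async with httpx.AsyncClient(timeout=120.0) as client:
        if config.backend == ServingBackend.VLLM:
            resp = await client.post(
                f"{config.vllm_base_url}/chat/completions",
                json={"model": model, "messages": [{"role": "user", "content": "ping"}], "max_tokens": 1},
            )
        else:
            resp = await client.post(
                f"{config.ollama_base_url}/api/generate",
                json={"model": model, "prompt": "ping", "stream": False},
            )
        return resp.status_code == 200
```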
## Integration Points

### With the Factory

```python
# aurorasoc/agents/factory.py
class AuroraAgentFactory:
    def __init__(self, granite_config=None):
        self.granite_config = granite_config or get_default_granite_config()

    def _llm_for(self, agent_name: str) -> ChatModel:
        return create_granite_chat_model(agent_name, self.granite_config)
```
### With Settings

```python
# aurorasoc/config/settings.py
class Settings(BaseSettings):
    LLM_BACKEND: str = "vllm"
    VLLM_BASE_URL: str = "http://vllm:8000/v1"
    VLLM_MODEL: str = "granite-soc-specialist"
    VLLM_ORCHESTRATOR_MODEL: str = "granite-soc-specialist"
    OLLAMA_BASE_URL: str = "http://ollama:11434"
    OLLAMA_MODEL: str = "granite4:8b"
    OLLAMA_ORCHESTRATOR_MODEL: str = "granite4:dense"
    # plus granite.* toggles for Ollama finetuned/per-agent behavior
```
### With Docker Compose

```yaml
# docker-compose.yml
x-granite-env: &granite-env
  LLM_BACKEND: ${LLM_BACKEND:-vllm}
  VLLM_BASE_URL: ${VLLM_BASE_URL:-http://vllm:8000/v1}
  VLLM_MODEL: ${VLLM_MODEL:-granite-soc-specialist}
  VLLM_ORCHESTRATOR_MODEL: ${VLLM_ORCHESTRATOR_MODEL:-granite-soc-specialist}
  GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
  GRANITE_USE_PER_AGENT_MODELS: ${GRANITE_USE_PER_AGENT_MODELS:-false}
  OLLAMA_BASE_URL: ${OLLAMA_BASE_URL:-http://ollama:11434}
  # ... injected into all agent services
```
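Because the same &granite-env block is injected into every agent service, switching backends is an environment-only change. A rough sketch of how those variables flow into get_default_granite_config() (illustrative; in practice settings may be cached, so the environment must be set before the process starts):

```python
import os

# Values docker-compose would inject for Ollama mode with per-agent specialists.
os.environ["LLM_BACKEND"] = "ollama"
os.environ["GRANITE_USE_PER_AGENT_MODELS"] = "true"

config = get_default_granite_config()
# Expected result: Ollama backend with Tier 3 (per-agent) resolution enabled.
assert config.backend is ServingBackend.OLLAMA
assert config.use_per_agent_models is True
```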
## Adding a New Model

To add a new model variant:

- Add to `AGENT_MODEL_MAP` (if per-agent): `AGENT_MODEL_MAP["my_new_agent"] = "granite-soc-my-new-agent"`
- Train the model using `finetune_granite.py --agent my_new_agent`
- Import to Ollama using `serve_model.py ollama`
- Resolve automatically — `resolve_model("my_new_agent")` will find it

To use a completely different model family (not Granite), set backend model IDs directly:

- vLLM path: set `VLLM_MODEL` and `VLLM_ORCHESTRATOR_MODEL`
- Ollama path: set `OLLAMA_MODEL` and `OLLAMA_ORCHESTRATOR_MODEL`
- For one-off agent overrides in code/tests, provide `model_overrides` in `GraniteModelConfig` (see the sketch below)
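For example, a test that pins a single agent to a specific model could look like this (illustrative; the override tag is hypothetical):

```python
# Hypothetical test-only override: Tier 1 wins regardless of backend or flags.
test_config = GraniteModelConfig(
    model_overrides={"report_generator": "granite4:micro"},  # hypothetical tag
)

llm = create_granite_chat_model("report_generator", test_config)
```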
## Next Steps
- Model Swap Guide — quickly switch between model configurations
- Serving Backends — Ollama vs vLLM in detail