# Granite Module Deep Dive

The Granite module (`aurorasoc/granite/`) is the core abstraction that connects AuroraSOC agents to IBM Granite 4 models. It handles model configuration, 4-tier resolution, ChatModel creation, and model health monitoring.
## Module Structure

```text
aurorasoc/granite/
├── __init__.py   # GraniteModelConfig, resolve_model(), create_granite_chat_model()
└── registry.py   # Health checks, model availability, warmup
```
## GraniteModelConfig Dataclass

The central configuration object:

```python
@dataclass
class GraniteModelConfig:
    model_name: str = "granite3.2:2b"
    serving_backend: ServingBackend = ServingBackend.OLLAMA
    ollama_host: str = "http://localhost:11434"
    vllm_api_base: str = "http://localhost:8000"
    use_finetuned: bool = False
    use_per_agent_models: bool = False
    finetuned_model_tag: str = "granite-soc:latest"
    model_override: str | None = None
```
| Field | Type | Default | Description |
|---|---|---|---|
| `model_name` | `str` | `"granite3.2:2b"` | Base model name in the Ollama registry |
| `serving_backend` | `ServingBackend` | `OLLAMA` | Which backend to use (`OLLAMA` or `VLLM`) |
| `ollama_host` | `str` | `"http://localhost:11434"` | Ollama server URL |
| `vllm_api_base` | `str` | `"http://localhost:8000"` | vLLM OpenAI-compatible API URL |
| `use_finetuned` | `bool` | `False` | Enable generic fine-tuned model fallback |
| `use_per_agent_models` | `bool` | `False` | Enable per-agent model resolution |
| `finetuned_model_tag` | `str` | `"granite-soc:latest"` | Ollama tag for the generic fine-tuned model |
| `model_override` | `str \| None` | `None` | Force a specific model for all agents |
## ServingBackend Enum

```python
class ServingBackend(str, Enum):
    OLLAMA = "ollama"
    VLLM = "vllm"
```
## The 4-Tier Model Resolution

The `resolve_model()` method implements a priority-based resolution strategy:

```python
def resolve_model(self, agent_name: str | None = None) -> str:
    # Tier 1: Explicit override (highest priority)
    if self.model_override:
        return self.model_override
    # Tier 2: Per-agent fine-tuned model
    if self.use_per_agent_models and agent_name:
        agent_tag = AGENT_MODEL_MAP.get(agent_name)
        if agent_tag:
            return f"{agent_tag}:latest"
    # Tier 3: Generic fine-tuned model
    if self.use_finetuned:
        return self.finetuned_model_tag
    # Tier 4: Base model (lowest priority)
    return self.model_name
```
### Resolution Priority Explained

| Tier | When It Activates | Use Case |
|---|---|---|
| 1. Override | `GRANITE_MODEL_OVERRIDE` env var is set | Testing a specific model across all agents; A/B testing |
| 2. Per-Agent | `use_per_agent_models=true` + agent has entry in `AGENT_MODEL_MAP` | Production with per-agent specialists |
| 3. Generic Fine-tuned | `use_finetuned=true` | Production with a single fine-tuned model |
| 4. Base Model | No fine-tuning configured | Development, initial setup, or before fine-tuning |
### Fallback Behavior

If per-agent models are enabled but a specific agent has no trained model (no entry in `AGENT_MODEL_MAP`, or the model is not available in Ollama), resolution falls through to Tier 3 (generic fine-tuned) and then to Tier 4 (base model). This means you can train some specialists while leaving others on the generic model.
## AGENT_MODEL_MAP

Maps agent names to Ollama model tags:

```python
AGENT_MODEL_MAP = {
    "security_analyst": "granite-soc-security-analyst",
    "threat_hunter": "granite-soc-threat-hunter",
    "malware_analyst": "granite-soc-malware-analyst",
    "incident_responder": "granite-soc-incident-responder",
    "network_security": "granite-soc-network-security",
    "cps_security": "granite-soc-cps-security",
    "threat_intel": "granite-soc-threat-intel",
    "ueba_analyst": "granite-soc-ueba-analyst",
    "forensic_analyst": "granite-soc-forensic-analyst",
    "endpoint_security": "granite-soc-endpoint-security",
    "web_security": "granite-soc-web-security",
    "cloud_security": "granite-soc-cloud-security",
    "compliance_analyst": "granite-soc-compliance-analyst",
    "vulnerability_manager": "granite-soc-vulnerability-manager",
    "report_generator": "granite-soc-report-generator",
    "orchestrator": "granite-soc-orchestrator",
}
```
These tags correspond to the Ollama models created by `serve_model.py ollama-all`. The `:latest` tag is appended automatically by `resolve_model()`.
## create_granite_chat_model()

Creates a BeeAI `ChatModel` instance based on the resolved model and backend:

```python
def create_granite_chat_model(
    config: GraniteModelConfig,
    agent_name: str | None = None,
) -> ChatModel:
    model_id = config.resolve_model(agent_name)
    if config.serving_backend == ServingBackend.OLLAMA:
        return ChatModel.from_name(
            f"ollama:{model_id}",
            base_url=config.ollama_host,
        )
    elif config.serving_backend == ServingBackend.VLLM:
        return ChatModel.from_name(
            f"openai:{model_id}",
            base_url=f"{config.vllm_api_base}/v1",
        )
```
### Backend-Specific Parameters

| Backend | ChatModel Provider | URL Format | Protocol |
|---|---|---|---|
| Ollama | `ollama:<model>` | `http://host:11434` | Ollama API (native) |
| vLLM | `openai:<model>` | `http://host:8000/v1` | OpenAI-compatible API |
**Why `openai:` for vLLM?** vLLM exposes an OpenAI-compatible `/v1/chat/completions` endpoint. BeeAI's `ChatModel.from_name("openai:...")` makes standard OpenAI API calls, which vLLM handles transparently.
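As a rough illustration of what "OpenAI-compatible" means here (this is not BeeAI's internal code), the hypothetical helper below assembles the URL and JSON body that such a chat-completions call carries:

```python
import json

def build_vllm_chat_request(vllm_api_base: str, model_id: str, prompt: str) -> tuple[str, bytes]:
    """Assemble the endpoint URL and JSON body for an OpenAI-style chat call to vLLM."""
    url = f"{vllm_api_base}/v1/chat/completions"
    body = json.dumps({
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, body

url, body = build_vllm_chat_request("http://localhost:8000", "granite-soc:latest", "ping")
print(url)  # http://localhost:8000/v1/chat/completions
```

Any client that can emit this shape, whether BeeAI, the official OpenAI SDK, or plain `curl`, can talk to vLLM unchanged.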
## get_default_granite_config()

Creates a config from environment variables:

```python
def get_default_granite_config() -> GraniteModelConfig:
    return GraniteModelConfig(
        model_name=os.getenv("GRANITE_MODEL_NAME", "granite3.2:2b"),
        serving_backend=ServingBackend(
            os.getenv("GRANITE_SERVING_BACKEND", "ollama")
        ),
        ollama_host=os.getenv("OLLAMA_HOST", "http://localhost:11434"),
        vllm_api_base=os.getenv("VLLM_API_BASE", "http://localhost:8000"),
        use_finetuned=os.getenv("GRANITE_USE_FINETUNED", "").lower() == "true",
        use_per_agent_models=os.getenv(
            "GRANITE_USE_PER_AGENT_MODELS", ""
        ).lower() == "true",
        finetuned_model_tag=os.getenv(
            "GRANITE_FINETUNED_MODEL_TAG", "granite-soc:latest"
        ),
        model_override=os.getenv("GRANITE_MODEL_OVERRIDE") or None,
    )
```
**Note:** Boolean environment variables are compared as lowercase strings to `"true"`, so any other value (including unset or empty variables) yields `False`.
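This comparison is easy to get subtly wrong: `"True"` and `"TRUE"` both enable a flag thanks to `.lower()`, but `"1"` or `"yes"` do not. A quick check of the same expression:

```python
import os

def env_flag(name: str) -> bool:
    # The same expression used by get_default_granite_config()
    return os.getenv(name, "").lower() == "true"

os.environ["GRANITE_USE_FINETUNED"] = "TRUE"
print(env_flag("GRANITE_USE_FINETUNED"))  # True: case-insensitive match

os.environ["GRANITE_USE_FINETUNED"] = "1"
print(env_flag("GRANITE_USE_FINETUNED"))  # False: only "true" counts

os.environ.pop("GRANITE_USE_FINETUNED")
print(env_flag("GRANITE_USE_FINETUNED"))  # False: unset defaults to ""
```

So in `.env` files and Docker Compose, write the flags as literal `true`/`false`, not `1`/`0`.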
## Model Registry (registry.py)

The registry module provides health checking and model management:
### check_ollama_models()

Queries Ollama for available models:

```python
async def check_ollama_models(ollama_host: str) -> list[str]:
    """Returns list of model tags available in Ollama."""
```

Used during startup to verify that required models are pulled and available.
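A sketch of how such a check might look, assuming Ollama's `GET /api/tags` endpoint (which returns a JSON object with a `models` list); the response parsing is split into its own helper so it can be demonstrated without a live server:

```python
import json
import urllib.request

def parse_ollama_tags(payload: dict) -> list[str]:
    """Extract model tags from an Ollama /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def check_ollama_models_sync(ollama_host: str) -> list[str]:
    # Synchronous stand-in for the async version in registry.py.
    with urllib.request.urlopen(f"{ollama_host}/api/tags") as resp:
        return parse_ollama_tags(json.load(resp))

# Parsing exercised against a canned response:
sample = {"models": [{"name": "granite3.2:2b"}, {"name": "granite-soc:latest"}]}
print(parse_ollama_tags(sample))  # ['granite3.2:2b', 'granite-soc:latest']
```

A startup check can then diff this list against the tags `resolve_model()` would return for each agent and report anything missing.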
### check_vllm_models()

Checks vLLM's `/v1/models` endpoint:

```python
async def check_vllm_models(vllm_base: str) -> list[str]:
    """Returns list of models loaded in vLLM."""
```
### pull_ollama_model()

Pulls a model from the Ollama registry:

```python
async def pull_ollama_model(ollama_host: str, model: str) -> bool:
    """Pulls a model if not already available. Returns success."""
```
### warmup_model()

Sends a small prompt to warm up the model in the serving backend:

```python
async def warmup_model(config: GraniteModelConfig, agent_name: str | None = None) -> bool:
    """Sends a warmup prompt to ensure the model is loaded in memory."""
```

**Why warmup?** The first inference request to Ollama/vLLM is slow because the model weights must be loaded from disk into GPU memory. Warmup sends a minimal prompt during startup so the model is already hot when real alerts arrive.
## Integration Points

### With the Factory

```python
# aurorasoc/agents/factory.py
class AuroraAgentFactory:
    def __init__(self, granite_config=None):
        self.granite_config = granite_config or get_default_granite_config()

    def _llm_for(self, agent_name: str) -> ChatModel:
        return create_granite_chat_model(self.granite_config, agent_name)
```
### With Settings

```python
# aurorasoc/config/settings.py
class Settings(BaseSettings):
    granite_model_name: str = "granite3.2:2b"
    granite_serving_backend: str = "ollama"
    granite_use_finetuned: bool = False
    # ... maps to GraniteModelConfig fields
```
### With Docker Compose

```yaml
# docker-compose.yml
x-granite-env: &granite-env
  GRANITE_MODEL_NAME: ${GRANITE_MODEL_NAME:-granite3.2:2b}
  GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
  GRANITE_USE_PER_AGENT_MODELS: ${GRANITE_USE_PER_AGENT_MODELS:-false}
  # ... injected into all agent services
```
## Adding a New Model

To add a new model variant:

1. **Add to `AGENT_MODEL_MAP`** (if per-agent): `AGENT_MODEL_MAP["my_new_agent"] = "granite-soc-my-new-agent"`
2. **Train the model** using `finetune_granite.py --agent my_new_agent`
3. **Import to Ollama** using `serve_model.py ollama`
4. **Resolve automatically**: `resolve_model("my_new_agent")` will find it

To use a completely different model family (not Granite):

- Set `GRANITE_MODEL_OVERRIDE=<model_name>` to use the same model for all agents
- Or modify `create_granite_chat_model()` to handle additional backends
## Next Steps

- **Model Swap Guide**: quickly switch between model configurations
- **Serving Backends**: Ollama vs vLLM in detail