
Granite Module Deep Dive

The Granite module (aurorasoc/granite/) is the core abstraction that connects AuroraSOC agents to IBM Granite 4 models. It handles model configuration, 4-tier resolution, ChatModel creation, and model health monitoring.

Module Structure

aurorasoc/granite/
├── __init__.py # GraniteModelConfig, resolve_model(), create_granite_chat_model()
└── registry.py # Health checks, model availability, warmup

GraniteModelConfig Dataclass

The central configuration object:

@dataclass
class GraniteModelConfig:
    model_name: str = "granite3.2:2b"
    serving_backend: ServingBackend = ServingBackend.OLLAMA
    ollama_host: str = "http://localhost:11434"
    vllm_api_base: str = "http://localhost:8000"
    use_finetuned: bool = False
    use_per_agent_models: bool = False
    finetuned_model_tag: str = "granite-soc:latest"
    model_override: str | None = None
| Field | Type | Default | Description |
|---|---|---|---|
| model_name | str | "granite3.2:2b" | Base model name in the Ollama registry |
| serving_backend | ServingBackend | OLLAMA | Which backend to use (OLLAMA or VLLM) |
| ollama_host | str | "http://localhost:11434" | Ollama server URL |
| vllm_api_base | str | "http://localhost:8000" | vLLM OpenAI-compatible API URL |
| use_finetuned | bool | False | Enable generic fine-tuned model fallback |
| use_per_agent_models | bool | False | Enable per-agent model resolution |
| finetuned_model_tag | str | "granite-soc:latest" | Ollama tag for the generic fine-tuned model |
| model_override | str \| None | None | Force a specific model for all agents |

ServingBackend Enum

class ServingBackend(str, Enum):
    OLLAMA = "ollama"
    VLLM = "vllm"

The 4-Tier Model Resolution

The resolve_model() method implements a priority-based resolution strategy:

def resolve_model(self, agent_name: str | None = None) -> str:
    # Tier 1: Explicit override (highest priority)
    if self.model_override:
        return self.model_override

    # Tier 2: Per-agent fine-tuned model
    if self.use_per_agent_models and agent_name:
        agent_tag = AGENT_MODEL_MAP.get(agent_name)
        if agent_tag:
            return f"{agent_tag}:latest"

    # Tier 3: Generic fine-tuned model
    if self.use_finetuned:
        return self.finetuned_model_tag

    # Tier 4: Base model (lowest priority)
    return self.model_name

Resolution Priority Explained

| Tier | When It Activates | Use Case |
|---|---|---|
| 1. Override | GRANITE_MODEL_OVERRIDE env var is set | Testing a specific model across all agents; A/B testing |
| 2. Per-Agent | use_per_agent_models=true + agent has entry in AGENT_MODEL_MAP | Production with per-agent specialists |
| 3. Generic Fine-tuned | use_finetuned=true | Production with a single fine-tuned model |
| 4. Base Model | No fine-tuning configured | Development, initial setup, or before fine-tuning |

Fallback Behavior

If per-agent models are enabled but a specific agent doesn't have a trained model (not in AGENT_MODEL_MAP or model not available in Ollama), the resolution falls through to Tier 3 (generic finetuned) and then to Tier 4 (base model). This means you can train some specialists while leaving others on the generic model.

AGENT_MODEL_MAP

Maps agent names to Ollama model tags:

AGENT_MODEL_MAP = {
    "security_analyst": "granite-soc-security-analyst",
    "threat_hunter": "granite-soc-threat-hunter",
    "malware_analyst": "granite-soc-malware-analyst",
    "incident_responder": "granite-soc-incident-responder",
    "network_security": "granite-soc-network-security",
    "cps_security": "granite-soc-cps-security",
    "threat_intel": "granite-soc-threat-intel",
    "ueba_analyst": "granite-soc-ueba-analyst",
    "forensic_analyst": "granite-soc-forensic-analyst",
    "endpoint_security": "granite-soc-endpoint-security",
    "web_security": "granite-soc-web-security",
    "cloud_security": "granite-soc-cloud-security",
    "compliance_analyst": "granite-soc-compliance-analyst",
    "vulnerability_manager": "granite-soc-vulnerability-manager",
    "report_generator": "granite-soc-report-generator",
    "orchestrator": "granite-soc-orchestrator",
}

These tags correspond to the Ollama models created by serve_model.py ollama-all. The :latest tag is appended automatically by resolve_model().

create_granite_chat_model()

Creates a BeeAI ChatModel instance based on the resolved model and backend:

def create_granite_chat_model(
    config: GraniteModelConfig,
    agent_name: str | None = None,
) -> ChatModel:
    model_id = config.resolve_model(agent_name)

    if config.serving_backend == ServingBackend.OLLAMA:
        return ChatModel.from_name(
            f"ollama:{model_id}",
            base_url=config.ollama_host,
        )
    elif config.serving_backend == ServingBackend.VLLM:
        return ChatModel.from_name(
            f"openai:{model_id}",
            base_url=f"{config.vllm_api_base}/v1",
        )

Backend-Specific Parameters

| Backend | ChatModel Provider | URL Format | Protocol |
|---|---|---|---|
| Ollama | ollama:\<model\> | http://host:11434 | Ollama API (native) |
| vLLM | openai:\<model\> | http://host:8000/v1 | OpenAI-compatible API |

Why openai: for vLLM? vLLM exposes an OpenAI-compatible /v1/chat/completions endpoint. BeeAI's ChatModel.from_name("openai:...") makes standard OpenAI API calls, which vLLM handles transparently.
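For reference, the request that reaches vLLM is a plain OpenAI-style chat completion. A minimal sketch of assembling that request by hand (the endpoint path and field names follow the OpenAI chat API; build_vllm_request is a hypothetical helper, not part of the module):

```python
# Sketch: the URL and JSON body any OpenAI-compatible client POSTs to vLLM.
def build_vllm_request(vllm_api_base: str, model_id: str, prompt: str) -> tuple[str, dict]:
    url = f"{vllm_api_base}/v1/chat/completions"  # matches config.vllm_api_base + "/v1"
    payload = {
        "model": model_id,  # whatever resolve_model() returned
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

url, body = build_vllm_request("http://localhost:8000", "granite-soc:latest", "ping")
print(url)  # http://localhost:8000/v1/chat/completions
```

This is why swapping backends needs no agent-side changes: only the provider prefix and base URL differ.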

get_default_granite_config()

Creates a config from environment variables:

def get_default_granite_config() -> GraniteModelConfig:
    return GraniteModelConfig(
        model_name=os.getenv("GRANITE_MODEL_NAME", "granite3.2:2b"),
        serving_backend=ServingBackend(
            os.getenv("GRANITE_SERVING_BACKEND", "ollama")
        ),
        ollama_host=os.getenv("OLLAMA_HOST", "http://localhost:11434"),
        vllm_api_base=os.getenv("VLLM_API_BASE", "http://localhost:8000"),
        use_finetuned=os.getenv("GRANITE_USE_FINETUNED", "").lower() == "true",
        use_per_agent_models=os.getenv(
            "GRANITE_USE_PER_AGENT_MODELS", ""
        ).lower() == "true",
        finetuned_model_tag=os.getenv(
            "GRANITE_FINETUNED_MODEL_TAG", "granite-soc:latest"
        ),
        model_override=os.getenv("GRANITE_MODEL_OVERRIDE") or None,
    )

Note: Boolean environment variables are compared as lowercase strings to "true". Empty strings and unset variables default to False.
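The parsing rule can be seen in isolation with a small sketch (env_flag is a hypothetical helper mirroring the module's os.getenv(...).lower() == "true" convention):

```python
import os

def env_flag(name: str) -> bool:
    # Only the literal string "true" (case-insensitive) enables a flag;
    # unset or empty variables read as False.
    return os.getenv(name, "").lower() == "true"

os.environ["GRANITE_USE_FINETUNED"] = "True"
print(env_flag("GRANITE_USE_FINETUNED"))  # True
os.environ["GRANITE_USE_FINETUNED"] = "1"
print(env_flag("GRANITE_USE_FINETUNED"))  # False -- "1" is not "true"
```

A practical consequence: values like "1" or "yes" that other tools accept as truthy do not enable these flags.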

Model Registry (registry.py)

The registry module provides health checking and model management:

check_ollama_models()

Queries Ollama for available models:

async def check_ollama_models(ollama_host: str) -> list[str]:
    """Returns list of model tags available in Ollama."""

Used during startup to verify that required models are pulled and available.

check_vllm_models()

Checks vLLM's /v1/models endpoint:

async def check_vllm_models(vllm_base: str) -> list[str]:
    """Returns list of models loaded in vLLM."""

pull_ollama_model()

Pulls a model from the Ollama registry:

async def pull_ollama_model(ollama_host: str, model: str) -> bool:
    """Pulls a model if not already available. Returns success."""

warmup_model()

Sends a small prompt to warm up the model in the serving backend:

async def warmup_model(config: GraniteModelConfig, agent_name: str | None = None) -> bool:
    """Sends a warmup prompt to ensure the model is loaded in memory."""

Why warmup? The first inference request to Ollama/vLLM is slow because the model weights must be loaded from disk into GPU memory. Warmup sends a minimal prompt during startup so the model is already hot when real alerts arrive.
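A hedged sketch of how a startup routine might drive warmup across several agents concurrently. The inference round-trip is stubbed out here; warmup_all is a hypothetical wrapper, not part of registry.py:

```python
import asyncio

async def warmup_model(agent_name: str) -> bool:
    # Stub standing in for the real backend call: in the actual module this
    # sends a tiny prompt so the weights are paged into memory before traffic.
    await asyncio.sleep(0)  # placeholder for the inference round-trip
    return True

async def warmup_all(agents: list[str]) -> dict[str, bool]:
    # Warm agents concurrently: the cold-start cost is paid once at startup
    # instead of on the first real alert each agent handles.
    results = await asyncio.gather(*(warmup_model(a) for a in agents))
    return dict(zip(agents, results))

status = asyncio.run(warmup_all(["security_analyst", "threat_hunter"]))
print(status)  # {'security_analyst': True, 'threat_hunter': True}
```

Because warmup_model() returns a bool, a startup routine like this can log which agents failed to warm and fall back or retry accordingly.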

Integration Points

With the Factory

# aurorasoc/agents/factory.py
class AuroraAgentFactory:
    def __init__(self, granite_config=None):
        self.granite_config = granite_config or get_default_granite_config()

    def _llm_for(self, agent_name: str) -> ChatModel:
        return create_granite_chat_model(self.granite_config, agent_name)

With Settings

# aurorasoc/config/settings.py
class Settings(BaseSettings):
    granite_model_name: str = "granite3.2:2b"
    granite_serving_backend: str = "ollama"
    granite_use_finetuned: bool = False
    # ... maps to GraniteModelConfig fields

With Docker Compose

# docker-compose.yml
x-granite-env: &granite-env
  GRANITE_MODEL_NAME: ${GRANITE_MODEL_NAME:-granite3.2:2b}
  GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
  GRANITE_USE_PER_AGENT_MODELS: ${GRANITE_USE_PER_AGENT_MODELS:-false}
  # ... injected into all agent services

Adding a New Model

To add a new model variant:

  1. Add to AGENT_MODEL_MAP (if per-agent):

     AGENT_MODEL_MAP["my_new_agent"] = "granite-soc-my-new-agent"

  2. Train the model using finetune_granite.py --agent my_new_agent

  3. Import to Ollama using serve_model.py ollama

  4. Resolve automatically: resolve_model("my_new_agent") will find it

To use a completely different model family (not Granite):

  1. Set GRANITE_MODEL_OVERRIDE=<model_name> to use the same model for all agents
  2. Or modify create_granite_chat_model() to handle additional backends

Next Steps