# Granite Module Deep Dive
The Granite module (aurorasoc/granite/) is the core abstraction that connects AuroraSOC agents to IBM Granite 4 models. It handles backend-aware model configuration, model resolution, ChatModel creation, and model health monitoring.
## Module Structure

```
aurorasoc/granite/
├── __init__.py   # GraniteModelConfig, resolve_model(), create_granite_chat_model()
└── registry.py   # Health checks, model availability, warmup
```
## GraniteModelConfig Dataclass

The central configuration object:

```python
@dataclass
class GraniteModelConfig:
    backend: ServingBackend = ServingBackend.OLLAMA
    vllm_base_url: str = "http://vllm:8000/v1"
    ollama_base_url: str = "http://ollama:11434"
    use_finetuned: bool = True
    use_per_agent_models: bool = False
    finetuned_model_tag: str = "granite-soc:latest"
    vllm_specialist_model: str = "granite-soc-specialist"
    vllm_orchestrator_model: str = "granite-soc-specialist"
    ollama_specialist_model: str = "granite4:8b"
    ollama_orchestrator_model: str = "granite4:dense"
    model_overrides: dict[str, str] = field(default_factory=dict)
```
| Field | Type | Default | Description |
|---|---|---|---|
| backend | ServingBackend | OLLAMA | Which backend to use (OLLAMA or VLLM) |
| vllm_base_url | str | "http://vllm:8000/v1" | vLLM OpenAI-compatible API URL |
| ollama_base_url | str | "http://ollama:11434" | Ollama server URL |
| use_finetuned | bool | True | Enable generic fine-tuned fallback in Ollama mode |
| use_per_agent_models | bool | False | Enable per-agent model resolution in Ollama mode |
| finetuned_model_tag | str | "granite-soc:latest" | Ollama tag for the generic fine-tuned model |
| vllm_specialist_model | str | "granite-soc-specialist" | Served vLLM model ID for specialist agents |
| vllm_orchestrator_model | str | "granite-soc-specialist" | Served vLLM model ID for orchestrator |
| ollama_specialist_model | str | "granite4:8b" | Ollama specialist base tag when fine-tuning is off |
| ollama_orchestrator_model | str | "granite4:dense" | Ollama orchestrator base tag when fine-tuning is off |
| model_overrides | dict[str, str] | {} | Explicit per-agent override map (highest priority) |
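A minimal usage sketch, assuming the import path follows the module layout shown above (all other fields keep their defaults):

```python
from aurorasoc.granite import GraniteModelConfig

# Keep the defaults but point at a locally running Ollama server.
config = GraniteModelConfig(ollama_base_url="http://localhost:11434")

config.use_finetuned        # True (default)
config.finetuned_model_tag  # "granite-soc:latest" (default)
config.model_overrides      # {} (default)
```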
## ServingBackend Enum

```python
class ServingBackend(str, Enum):
    OLLAMA = "ollama"
    VLLM = "vllm"
```
## Model Resolution Priority

The resolve_model() method implements a backend-aware priority strategy:

```python
def resolve_model(self, agent_name: str) -> str:
    # Tier 1: Explicit per-agent override
    if agent_name in self.model_overrides:
        return self._format_model(self.model_overrides[agent_name])

    # Tier 2: vLLM explicit served model IDs
    if self.backend == ServingBackend.VLLM:
        model = (
            self.vllm_orchestrator_model
            if agent_name == "Orchestrator"
            else self.vllm_specialist_model
        )
        return self._format_model(model)

    # Tier 3: Ollama per-agent specialists
    if self.use_per_agent_models and agent_name in AGENT_MODEL_MAP:
        return self._format_model(AGENT_MODEL_MAP[agent_name])

    # Tier 4: Ollama generic fine-tuned fallback
    if self.use_finetuned:
        return self._format_model(self.finetuned_model_tag)

    # Tier 5: Ollama base specialist/orchestrator tags
    if agent_name == "Orchestrator":
        return self._format_model(self.ollama_orchestrator_model)
    return self._format_model(self.ollama_specialist_model)
```
### Resolution Priority Explained
| Tier | When It Activates | Use Case |
|---|---|---|
| 1. Explicit Override | model_overrides[agent_name] is set | Force one agent to a specific model for targeted debugging |
| 2. vLLM Model IDs | backend == VLLM | Use served vLLM model IDs from runtime settings |
| 3. Per-Agent Ollama | use_per_agent_models=true + agent in AGENT_MODEL_MAP | Production with per-agent Ollama specialists |
| 4. Generic Ollama Finetuned | use_finetuned=true | Production with one generic fine-tuned Ollama model |
| 5. Ollama Base Tags | No fine-tuning path selected | Development or fallback to base Granite tags |
### Fallback Behavior
If LLM_BACKEND=vllm, resolution uses VLLM_MODEL / VLLM_ORCHESTRATOR_MODEL after explicit overrides and does not use Ollama-specific tiers. In Ollama mode, missing per-agent tags fall back to finetuned_model_tag, then base Ollama tags.
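A quick sketch of how the tiers play out for a few configurations (the resolved strings in the comments assume the provider prefixes and automatic :latest suffix described later on this page; the override tag is hypothetical):

```python
# Tier 1: an explicit override wins even in vLLM mode.
cfg = GraniteModelConfig(
    backend=ServingBackend.VLLM,
    model_overrides={"threat_hunter": "granite-soc-threat-hunter-v2"},  # hypothetical tag
)
cfg.resolve_model("threat_hunter")    # -> the override, formatted for the backend

# Tier 2: every other agent in vLLM mode gets the served vLLM model ID.
cfg.resolve_model("malware_analyst")  # -> e.g. "openai:granite-soc-specialist"

# Tier 3: Ollama mode with per-agent specialists enabled.
GraniteModelConfig(use_per_agent_models=True).resolve_model("malware_analyst")
# -> e.g. "ollama:granite-soc-malware-analyst:latest"

# Tier 4: Ollama defaults (use_finetuned=True) fall back to the generic fine-tuned tag.
GraniteModelConfig().resolve_model("malware_analyst")
# -> e.g. "ollama:granite-soc:latest"

# Tier 5: with no fine-tuning path selected, base tags are used.
GraniteModelConfig(use_finetuned=False).resolve_model("Orchestrator")
# -> e.g. "ollama:granite4:dense"
```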
## AGENT_MODEL_MAP
Maps agent names to Ollama model tags:
```python
AGENT_MODEL_MAP = {
    "security_analyst": "granite-soc-security-analyst",
    "threat_hunter": "granite-soc-threat-hunter",
    "malware_analyst": "granite-soc-malware-analyst",
    "incident_responder": "granite-soc-incident-responder",
    "network_security": "granite-soc-network-security",
    "cps_security": "granite-soc-cps-security",
    "threat_intel": "granite-soc-threat-intel",
    "ueba_analyst": "granite-soc-ueba-analyst",
    "forensic_analyst": "granite-soc-forensic-analyst",
    "endpoint_security": "granite-soc-endpoint-security",
    "web_security": "granite-soc-web-security",
    "cloud_security": "granite-soc-cloud-security",
    "compliance_analyst": "granite-soc-compliance-analyst",
    "vulnerability_manager": "granite-soc-vulnerability-manager",
    "report_generator": "granite-soc-report-generator",
    "orchestrator": "granite-soc-orchestrator",
}
```
These tags correspond to the Ollama models created by serve_model.py ollama-all. The :latest tag is appended automatically by resolve_model().
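The _format_model() helper itself is not shown above. One plausible shape, consistent with the automatic :latest suffix and the provider prefixes listed under Backend-Specific Parameters below (a sketch only, not the actual implementation):

```python
def _format_model(self, model: str) -> str:
    # Sketch: the real helper lives in aurorasoc/granite/__init__.py.
    if self.backend == ServingBackend.VLLM:
        # vLLM is reached through its OpenAI-compatible API.
        return f"openai:{model}"
    # Ollama tags get an explicit ":latest" suffix if none is present.
    if ":" not in model:
        model = f"{model}:latest"
    return f"ollama:{model}"
```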
## create_granite_chat_model()

Creates a BeeAI ChatModel instance based on the resolved model and backend:

```python
def create_granite_chat_model(
    agent_name: str,
    config: GraniteModelConfig | None = None,
) -> ChatModel:
    # Fall back to the environment-derived config when none is supplied.
    if config is None:
        config = get_default_granite_config()

    model_name = config.resolve_model(agent_name)

    provider_options = {}
    if config.backend == ServingBackend.OLLAMA:
        provider_options["base_url"] = config.ollama_base_url
    elif config.backend == ServingBackend.VLLM:
        provider_options["base_url"] = config.vllm_base_url

    return ChatModel.from_name(model_name, provider_options=provider_options)
```
### Backend-Specific Parameters
| Backend | ChatModel Provider | URL Format | Protocol |
|---|---|---|---|
| Ollama | ollama:<model> | http://host:11434 | Ollama API (native) |
| vLLM | openai:<model> | http://host:8000/v1 | OpenAI-compatible API |
Why openai: for vLLM? vLLM exposes an OpenAI-compatible /v1/chat/completions endpoint. BeeAI's ChatModel.from_name("openai:...") makes standard OpenAI API calls, which vLLM handles transparently.
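Putting the two together, creating a model for the same agent under each backend might look like this (illustrative; the URLs and model IDs are the defaults from the tables above):

```python
# Ollama backend: native Ollama provider, generic fine-tuned tag by default.
ollama_cfg = GraniteModelConfig(backend=ServingBackend.OLLAMA)
llm = create_granite_chat_model("security_analyst", ollama_cfg)
# resolves to something like "ollama:granite-soc:latest" served at http://ollama:11434

# vLLM backend: OpenAI-compatible provider, served vLLM model ID.
vllm_cfg = GraniteModelConfig(backend=ServingBackend.VLLM)
llm = create_granite_chat_model("security_analyst", vllm_cfg)
# resolves to something like "openai:granite-soc-specialist" served at http://vllm:8000/v1
```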
## get_default_granite_config()

Creates a config from environment variables:

```python
def get_default_granite_config() -> GraniteModelConfig:
    settings = get_settings()
    granite = settings.granite
    return GraniteModelConfig(
        backend=ServingBackend.VLLM if settings.LLM_BACKEND == "vllm" else ServingBackend.OLLAMA,
        ollama_base_url=settings.OLLAMA_BASE_URL,
        vllm_base_url=settings.VLLM_BASE_URL,
        use_finetuned=granite.use_finetuned,
        use_per_agent_models=granite.use_per_agent_models,
        finetuned_model_tag=granite.finetuned_model_tag,
        vllm_specialist_model=settings.VLLM_MODEL,
        vllm_orchestrator_model=settings.VLLM_ORCHESTRATOR_MODEL,
        ollama_specialist_model=resolve_ollama_model_tag(settings.OLLAMA_MODEL),
        ollama_orchestrator_model=resolve_ollama_model_tag(
            settings.OLLAMA_ORCHESTRATOR_MODEL,
            orchestrator=True,
        ),
    )
```
Note: Boolean environment variables are lowercased and compared to the string "true"; empty strings and unset variables therefore default to False.
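In other words, the flag parsing behaves roughly like this (illustrative sketch; GRANITE_USE_FINETUNED is one of the variables injected by the Docker Compose block below):

```python
import os

def _env_flag(name: str) -> bool:
    # Unset or empty -> "", which is not "true", so the flag is False.
    return os.getenv(name, "").strip().lower() == "true"

_env_flag("GRANITE_USE_FINETUNED")  # True only for "true" / "True" / "TRUE"
```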
## Model Registry (registry.py)
The registry module provides health checking and model management:
### check_ollama_models()

Queries Ollama for available models:

```python
async def check_ollama_models(ollama_host: str) -> list[str]:
    """Returns list of model tags available in Ollama."""
```
Used during startup to verify that required models are pulled and available.
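A minimal implementation sketch, assuming the standard Ollama /api/tags endpoint and the httpx client (the real function may differ):

```python
import httpx

async def check_ollama_models(ollama_host: str) -> list[str]:
    """Returns list of model tags available in Ollama."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{ollama_host}/api/tags", timeout=10.0)
        resp.raise_for_status()
        # /api/tags returns {"models": [{"name": "granite-soc:latest", ...}, ...]}
        return [m["name"] for m in resp.json().get("models", [])]
```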
### check_vllm_models()

Checks vLLM's /v1/models endpoint:

```python
async def check_vllm_models(vllm_base: str) -> list[str]:
    """Returns list of models loaded in vLLM."""
```
### pull_ollama_model()

Pulls a model from the Ollama registry:

```python
async def pull_ollama_model(ollama_host: str, model: str) -> bool:
    """Pulls a model if not already available. Returns success."""
```
### warmup_model()

Sends a small prompt to warm up the model in the serving backend:

```python
async def warmup_model(config: GraniteModelConfig, agent_name: str | None = None) -> bool:
    """Sends a warmup prompt to ensure the model is loaded in memory."""
```
Why warmup? The first inference request to Ollama/vLLM is slow because the model weights must be loaded from disk into GPU memory. Warmup sends a minimal prompt during startup so the model is already hot when real alerts arrive.
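One way to do this at the HTTP level (a sketch, not necessarily how warmup_model() is implemented; it assumes the provider-prefixed names described under Backend-Specific Parameters):

```python
import httpx

async def warmup_model(config: GraniteModelConfig, agent_name: str | None = None) -> bool:
    """Sends a warmup prompt to ensure the model is loaded in memory."""
    # Strip the provider prefix ("ollama:" / "openai:") to get the raw model ID.
    model = config.resolve_model(agent_name or "security_analyst").split(":", 1)[1]
    async with httpx.AsyncClient(timeout=120.0) as client:
        if config.backend == ServingBackend.VLLM:
            resp = await client.post(
                f"{config.vllm_base_url}/chat/completions",
                json={"model": model, "messages": [{"role": "user", "content": "ping"}], "max_tokens": 1},
            )
        else:
            resp = await client.post(
                f"{config.ollama_base_url}/api/generate",
                json={"model": model, "prompt": "ping", "stream": False},
            )
        return resp.status_code == 200
```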
## Integration Points

### With the Factory

```python
# aurorasoc/agents/factory.py
class AuroraAgentFactory:
    def __init__(self, granite_config=None):
        self.granite_config = granite_config or get_default_granite_config()

    def _llm_for(self, agent_name: str) -> ChatModel:
        return create_granite_chat_model(agent_name, self.granite_config)
```
### With Settings

```python
# aurorasoc/config/settings.py
class Settings(BaseSettings):
    LLM_BACKEND: str = "vllm"
    VLLM_BASE_URL: str = "http://vllm:8000/v1"
    VLLM_MODEL: str = "granite-soc-specialist"
    VLLM_ORCHESTRATOR_MODEL: str = "granite-soc-specialist"
    OLLAMA_BASE_URL: str = "http://ollama:11434"
    OLLAMA_MODEL: str = "granite4:8b"
    OLLAMA_ORCHESTRATOR_MODEL: str = "granite4:dense"
    # plus granite.* toggles for Ollama finetuned/per-agent behavior
```
### With Docker Compose

```yaml
# docker-compose.yml
x-granite-env: &granite-env
  LLM_BACKEND: ${LLM_BACKEND:-vllm}
  VLLM_BASE_URL: ${VLLM_BASE_URL:-http://vllm:8000/v1}
  VLLM_MODEL: ${VLLM_MODEL:-granite-soc-specialist}
  VLLM_ORCHESTRATOR_MODEL: ${VLLM_ORCHESTRATOR_MODEL:-granite-soc-specialist}
  GRANITE_USE_FINETUNED: ${GRANITE_USE_FINETUNED:-false}
  GRANITE_USE_PER_AGENT_MODELS: ${GRANITE_USE_PER_AGENT_MODELS:-false}
  OLLAMA_BASE_URL: ${OLLAMA_BASE_URL:-http://ollama:11434}
  # ... injected into all agent services
```
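Because the same &granite-env block is injected into every agent service, switching backends is an environment-only change. A rough sketch of how those variables flow into get_default_granite_config() (illustrative; in practice settings may be cached, so the environment must be set before the process starts):

```python
import os

# Values docker-compose would inject for Ollama mode with per-agent specialists.
os.environ["LLM_BACKEND"] = "ollama"
os.environ["GRANITE_USE_PER_AGENT_MODELS"] = "true"

config = get_default_granite_config()
# Expected result: Ollama backend with Tier 3 (per-agent) resolution enabled.
assert config.backend is ServingBackend.OLLAMA
assert config.use_per_agent_models is True
```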
## Adding a New Model

To add a new model variant:

- Add to `AGENT_MODEL_MAP` (if per-agent): `AGENT_MODEL_MAP["my_new_agent"] = "granite-soc-my-new-agent"`
- Train the model using `finetune_granite.py --agent my_new_agent`
- Import to Ollama using `serve_model.py ollama`
- Resolve automatically — `resolve_model("my_new_agent")` will find it

To use a completely different model family (not Granite), set backend model IDs directly:

- vLLM path: set `VLLM_MODEL` and `VLLM_ORCHESTRATOR_MODEL`
- Ollama path: set `OLLAMA_MODEL` and `OLLAMA_ORCHESTRATOR_MODEL`
- For one-off agent overrides in code/tests, provide `model_overrides` in `GraniteModelConfig` (see the sketch below)
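For example, a test that pins a single agent to a specific model could look like this (illustrative; the override tag is hypothetical):

```python
# Hypothetical test-only override: Tier 1 wins regardless of backend or flags.
test_config = GraniteModelConfig(
    model_overrides={"report_generator": "granite4:micro"},  # hypothetical tag
)

llm = create_granite_chat_model("report_generator", test_config)
```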
## Next Steps
- Model Swap Guide — quickly switch between model configurations
- Serving Backends — Ollama vs vLLM in detail