LLM Doctor

make llm-doctor runs scripts/llm_doctor.py, a three-stage probe that validates the local LLM shared by the entire AuroraSOC agent fleet. It is the fastest way to confirm that a fresh clone on a new laptop can drive all 14 agents without surprises.

What it checks

| Stage | Probe | Failure mode |
| --- | --- | --- |
| 1 | GET ${OLLAMA_BASE_URL}/api/tags | Ollama daemon unreachable, or OLLAMA_MODEL not pulled. |
| 2 | POST ${OLLAMA_BASE_URL}/api/chat with keep_alive=30m and a tiny prompt | Model fails to load into VRAM, or first-token latency is unacceptable. |
| 3 | aurorasoc.granite.create_granite_chat_model("Orchestrator") driven by get_default_granite_config() | BeeAI ChatModel pool, LiteLLM adapter, or LLMSettings resolution is misconfigured. |

Stage 3 mirrors the exact code path every agent (orchestrator + 13 specialists) uses at runtime, so a green doctor implies the fleet itself will resolve the same model identifier.
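In sketch form, stages 1 and 2 reduce to two HTTP calls against the Ollama API. The snippet below is illustrative, not the script's actual code; it assumes httpx (the ConnectError wording in the failure list further down suggests the real script uses it too), and the function names are made up for this sketch:

```python
import os
import time

import httpx

BASE = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "granite3.2:8b")
KEEP_ALIVE = os.environ.get("OLLAMA_KEEP_ALIVE", "30m")


def stage_1() -> None:
    # Stage 1: the daemon answers and the expected tag is installed.
    tags = httpx.get(f"{BASE}/api/tags").json()
    names = {m["name"] for m in tags.get("models", [])}
    if MODEL not in names:
        raise SystemExit(f"{MODEL} not installed")


def stage_2() -> None:
    # Stage 2: force the model into VRAM and time the first response.
    start = time.perf_counter()
    resp = httpx.post(
        f"{BASE}/api/chat",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": "Reply with 'Ready.'"}],
            "stream": False,
            "keep_alive": KEEP_ALIVE,
        },
        timeout=120.0,
    )
    resp.raise_for_status()
    content = resp.json()["message"]["content"]
    print(f"warmup {time.perf_counter() - start:.2f}s, response: {content!r}")


if __name__ == "__main__":
    stage_1()
    stage_2()
```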

Defaults

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=granite3.2:8b
OLLAMA_KEEP_ALIVE=30m

These match the values shipped in .env.example and the docker-up-minimal Makefile target. Override any of them inline:

OLLAMA_MODEL=qwen2.5:7b-instruct make llm-doctor

Running

make llm-doctor

Expected healthy output:

AuroraSOC LLM doctor — model=granite3.2:8b base=http://localhost:11434

[1/3] GET http://localhost:11434/api/tags
PASS: granite3.2:8b present (65 ms)

[2/3] POST http://localhost:11434/api/chat (keep_alive=30m)
PASS: warmup 0.13s, response: 'Ready.'

[3/3] aurorasoc.granite.create_granite_chat_model('Orchestrator')
PASS: BeeAI 1.89s, response: 'Ready.'

All probes passed.

The script exits non-zero (2, 3, or 4) when a stage fails, so it can gate CI smoke jobs and release scripts.
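A release script can therefore gate on the doctor directly. A minimal sketch follows; the stage-to-exit-code mapping is whatever llm_doctor.py defines, so only "non-zero means fail" is assumed here:

```python
import subprocess
import sys

# Run the doctor and propagate its exit code; 2, 3, or 4 tells you
# which stage to look at in the logs.
result = subprocess.run(["make", "llm-doctor"])
if result.returncode != 0:
    print(f"LLM doctor failed (exit {result.returncode})", file=sys.stderr)
    sys.exit(result.returncode)
print("LLM healthy; proceeding with release")
```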

Common failures

  • unreachable: ConnectError — Ollama is not running. Start it with ollama serve (or make docker-up-minimal for the containerised path).
  • granite3.2:8b not installed — Pull the default model: make ollama-pull-granite. AuroraSOC never auto-pulls models.
  • empty response in stage 2 — Usually a VRAM/keep-alive issue. Confirm OLLAMA_MAX_LOADED_MODELS=1 and OLLAMA_NUM_PARALLEL=1 are exported, then retry.
  • model 'granite-soc:latest' not found in stage 3 — A stale GRANITE_USE_FINETUNED=true is set somewhere. Either unset it or run make llm-fallback-qwen to refresh .env; the sketch below reproduces this stage in isolation.
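To debug that last case, stage 3 can be run by hand in a REPL. This sketch assumes get_default_granite_config is importable from aurorasoc.granite alongside create_granite_chat_model (the table above names both but only gives the module path for the latter):

```python
from aurorasoc.granite import create_granite_chat_model, get_default_granite_config

# Inspect the resolved settings first: a stale GRANITE_USE_FINETUNED=true
# shows up here as a 'granite-soc:latest' model identifier.
config = get_default_granite_config()
print(config)

# If the config looks right, this should succeed just like the doctor does.
model = create_granite_chat_model("Orchestrator")
print(type(model).__name__)
```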

Falling back to qwen2.5:7b-instruct

The supported escape hatch is:

make llm-fallback-qwen # pulls qwen2.5:7b-instruct and rewrites .env
make llm-doctor # re-validate the new shared model

This stays consistent with the single-model contract: every agent still shares one tag — just a different one.
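For orientation, the .env rewrite amounts to swapping that single shared tag. A hypothetical sketch of the effect, not the Make target's actual implementation:

```python
import pathlib
import re

# Point the whole fleet at the fallback tag by editing the one
# OLLAMA_MODEL line in .env.
env_path = pathlib.Path(".env")
env_path.write_text(
    re.sub(
        r"^OLLAMA_MODEL=.*$",
        "OLLAMA_MODEL=qwen2.5:7b-instruct",
        env_path.read_text(),
        flags=re.MULTILINE,
    )
)
```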

Prometheus signal

The warmup path also records a histogram, aurora_llm_warmup_seconds, labelled by model and outcome (ok/error). Scrape it from the API process to track first-token latency over time.
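If you want to emit the same signal from your own warmup code, the metric's shape is simple with prometheus_client. A sketch mirroring the name and labels above (the production histogram lives in the API process, not here):

```python
import time

from prometheus_client import Histogram

warmup_seconds = Histogram(
    "aurora_llm_warmup_seconds",
    "LLM warmup / first-token latency in seconds",
    ["model", "outcome"],
)


def timed_warmup(model: str, warmup) -> None:
    # Record the duration under outcome="ok" or outcome="error",
    # matching the label scheme described above.
    start = time.perf_counter()
    outcome = "ok"
    try:
        warmup()
    except Exception:
        outcome = "error"
        raise
    finally:
        warmup_seconds.labels(model=model, outcome=outcome).observe(
            time.perf_counter() - start
        )
```

A standard query over the scraped series is histogram_quantile(0.95, rate(aurora_llm_warmup_seconds_bucket[5m])), which tracks p95 first-token latency over time.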