LLM Doctor
make llm-doctor runs scripts/llm_doctor.py, a three-stage probe that
validates the local LLM shared by the entire AuroraSOC agent fleet.
It is the fastest way to confirm that a fresh clone on a laptop can drive all
14 agents without surprises.
What it checks
| Stage | Probe | Failure mode |
|---|---|---|
| 1 | GET ${OLLAMA_BASE_URL}/api/tags | Ollama daemon unreachable, or OLLAMA_MODEL not pulled. |
| 2 | POST ${OLLAMA_BASE_URL}/api/chat with keep_alive=30m and a tiny prompt | Model fails to load into VRAM, or first-token latency is unacceptable. |
| 3 | aurorasoc.granite.create_granite_chat_model("Orchestrator") driven by get_default_granite_config() | BeeAI ChatModel pool, LiteLLM adapter, or LLMSettings resolution is misconfigured. |
Stage 3 mirrors the exact code path every agent (orchestrator + 13 specialists) uses at runtime, so a green doctor implies the fleet itself will resolve the same model identifier.
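The first two stages hit the Ollama REST endpoints named in the table. A minimal sketch of what those probes amount to is below, assuming a plain HTTP client such as `requests`; the variable names and assertions are illustrative, not the actual code in scripts/llm_doctor.py.

```python
import os
import requests

base = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
model = os.environ.get("OLLAMA_MODEL", "granite3.2:8b")
keep_alive = os.environ.get("OLLAMA_KEEP_ALIVE", "30m")

# Stage 1: is the daemon reachable, and is the configured tag pulled?
tags = requests.get(f"{base}/api/tags", timeout=5).json()
installed = {m["name"] for m in tags.get("models", [])}
assert model in installed, f"{model} not installed"

# Stage 2: force the model into VRAM and time a tiny warmup completion.
resp = requests.post(
    f"{base}/api/chat",
    json={
        "model": model,
        "messages": [{"role": "user", "content": "Reply with the single word: Ready."}],
        "stream": False,
        "keep_alive": keep_alive,
    },
    timeout=120,
).json()
print(resp["message"]["content"])
```

Stage 3 then repeats the same warmup through the BeeAI/LiteLLM stack instead of raw HTTP, which is why it catches configuration problems the first two stages cannot.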
Defaults
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=granite3.2:8b
OLLAMA_KEEP_ALIVE=30m
These match the values shipped in .env.example
and the docker-up-minimal Makefile target. Override any of them inline:
OLLAMA_MODEL=qwen2.5:7b-instruct make llm-doctor
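For reference, the resolution order is environment variable first, shipped default second. The sketch below shows one way an LLMSettings-style helper (the object stage 3 refers to) could centralise that; the field names are assumptions, not AuroraSOC's actual implementation.

```python
import os
from dataclasses import dataclass, field


def _env(name: str, default: str) -> str:
    # Environment wins; otherwise fall back to the shipped default.
    return os.environ.get(name, default)


@dataclass
class LLMSettings:
    base_url: str = field(default_factory=lambda: _env("OLLAMA_BASE_URL", "http://localhost:11434"))
    model: str = field(default_factory=lambda: _env("OLLAMA_MODEL", "granite3.2:8b"))
    keep_alive: str = field(default_factory=lambda: _env("OLLAMA_KEEP_ALIVE", "30m"))


# An inline override such as `OLLAMA_MODEL=qwen2.5:7b-instruct make llm-doctor`
# only affects that single invocation; .env itself is not rewritten.
print(LLMSettings())
```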
Running
make llm-doctor
Expected healthy output:
AuroraSOC LLM doctor — model=granite3.2:8b base=http://localhost:11434
[1/3] GET http://localhost:11434/api/tags
PASS: granite3.2:8b present (65 ms)
[2/3] POST http://localhost:11434/api/chat (keep_alive=30m)
PASS: warmup 0.13s, response: 'Ready.'
[3/3] aurorasoc.granite.create_granite_chat_model('Orchestrator')
PASS: BeeAI 1.89s, response: 'Ready.'
All probes passed.
The script exits non-zero (2, 3, or 4) if any stage fails, so it
can gate CI smoke jobs and release scripts.
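A sketch of gating a release step on the doctor from Python is below. The mapping of exit codes to stages is an assumption (the section above only lists 2, 3, and 4); adjust it to whatever scripts/llm_doctor.py actually returns.

```python
import subprocess
import sys

# Assumed mapping of exit code to failed stage; verify against the script.
STAGE_BY_EXIT_CODE = {2: "Ollama tags probe", 3: "chat warmup", 4: "BeeAI ChatModel path"}

result = subprocess.run(["make", "llm-doctor"])
if result.returncode != 0:
    stage = STAGE_BY_EXIT_CODE.get(result.returncode, "unknown stage")
    print(f"LLM doctor failed at: {stage} (exit {result.returncode})", file=sys.stderr)
    sys.exit(result.returncode)
```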
Common failures
- `unreachable: ConnectError` — Ollama is not running. Start it with `ollama serve` (or `make docker-up-minimal` for the containerised path).
- `granite3.2:8b not installed` — Pull the default model: `make ollama-pull-granite`. AuroraSOC never auto-pulls models.
- `empty response` in stage 2 — Usually a VRAM/keep-alive issue. Confirm `OLLAMA_MAX_LOADED_MODELS=1` and `OLLAMA_NUM_PARALLEL=1` are exported, then retry.
- `model 'granite-soc:latest' not found` in stage 3 — A stale `GRANITE_USE_FINETUNED=true` is set somewhere. Either unset it or run `make llm-fallback-qwen` to refresh `.env`.
Falling back to qwen2.5:7b-instruct
The supported escape hatch is:
make llm-fallback-qwen # pulls qwen2.5:7b-instruct and rewrites .env
make llm-doctor # re-validate the new shared model
This stays consistent with the single-model contract: every agent still shares one tag — just a different one.
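The actual Makefile recipe is not shown here, but the `.env` rewrite it performs could look roughly like the sketch below, assuming the target simply swaps the `OLLAMA_MODEL=` line in place; this is illustrative, not the real `llm-fallback-qwen` implementation.

```python
from pathlib import Path

env_path = Path(".env")
lines = env_path.read_text().splitlines()
# Replace only the OLLAMA_MODEL line; leave everything else untouched.
rewritten = [
    "OLLAMA_MODEL=qwen2.5:7b-instruct" if line.startswith("OLLAMA_MODEL=") else line
    for line in lines
]
env_path.write_text("\n".join(rewritten) + "\n")
```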
Prometheus signal
The warmup path also records a histogram, aurora_llm_warmup_seconds,
labelled by model and outcome (ok/error). Scrape it from the
API process to track first-token latency over time.
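A minimal sketch of that metric with prometheus_client is below, using the metric name and labels given above; the helper function and bucket defaults are illustrative, not AuroraSOC's actual instrumentation.

```python
import time
from prometheus_client import Histogram

LLM_WARMUP_SECONDS = Histogram(
    "aurora_llm_warmup_seconds",
    "Seconds for the shared model to complete the warmup request",
    labelnames=["model", "outcome"],
)


def timed_warmup(model: str, warmup_fn) -> None:
    # Record the warmup duration, labelled ok/error depending on the outcome.
    start = time.perf_counter()
    try:
        warmup_fn()
        LLM_WARMUP_SECONDS.labels(model=model, outcome="ok").observe(time.perf_counter() - start)
    except Exception:
        LLM_WARMUP_SECONDS.labels(model=model, outcome="error").observe(time.perf_counter() - start)
        raise
```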