LLM Doctor
make llm-doctor runs scripts/llm_doctor.py, a three-stage probe that
validates the local LLM shared by the entire AuroraSOC agent fleet.
It is the fastest way to confirm that a fresh clone on a laptop can drive all
14 agents without surprises.
What it checks
| Stage | Probe | Failure mode |
|---|---|---|
| 1 | GET ${OLLAMA_BASE_URL}/api/tags | Ollama daemon unreachable, or OLLAMA_MODEL not pulled. |
| 2 | POST ${OLLAMA_BASE_URL}/api/chat with keep_alive=30m and a tiny prompt | Model fails to load into VRAM, or first-token latency is unacceptable. |
| 3 | aurorasoc.granite.create_granite_chat_model("Orchestrator") driven by get_default_granite_config() | BeeAI ChatModel pool, LiteLLM adapter, or LLMSettings resolution is misconfigured. |
Stage 3 mirrors the exact code path every agent (orchestrator + 13 specialists) uses at runtime, so a green doctor implies the fleet itself will resolve the same model identifier.
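The first two stages hit the Ollama REST endpoints named in the table. A minimal sketch of what those probes amount to is below, assuming a plain HTTP client such as `requests`; the variable names and assertions are illustrative, not the actual code in scripts/llm_doctor.py.

```python
import os
import requests

base = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
model = os.environ.get("OLLAMA_MODEL", "granite3.2:8b")
keep_alive = os.environ.get("OLLAMA_KEEP_ALIVE", "30m")

# Stage 1: is the daemon reachable, and is the configured tag pulled?
tags = requests.get(f"{base}/api/tags", timeout=5).json()
installed = {m["name"] for m in tags.get("models", [])}
assert model in installed, f"{model} not installed"

# Stage 2: force the model into VRAM and time a tiny warmup completion.
resp = requests.post(
    f"{base}/api/chat",
    json={
        "model": model,
        "messages": [{"role": "user", "content": "Reply with the single word: Ready."}],
        "stream": False,
        "keep_alive": keep_alive,
    },
    timeout=120,
).json()
print(resp["message"]["content"])
```

Stage 3 then repeats the same warmup through the BeeAI/LiteLLM stack instead of raw HTTP, which is why it catches configuration problems the first two stages cannot.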
Defaults
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=granite3.2:8b
OLLAMA_KEEP_ALIVE=30m
These match the values shipped in .env.example
and the docker-up-minimal Makefile target. Override any of them inline:
OLLAMA_MODEL=qwen2.5:7b-instruct make llm-doctor
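For reference, the resolution order is environment variable first, shipped default second. The sketch below shows one way an LLMSettings-style helper (the object stage 3 refers to) could centralise that; the field names are assumptions, not AuroraSOC's actual implementation.

```python
import os
from dataclasses import dataclass, field


def _env(name: str, default: str) -> str:
    # Environment wins; otherwise fall back to the shipped default.
    return os.environ.get(name, default)


@dataclass
class LLMSettings:
    base_url: str = field(default_factory=lambda: _env("OLLAMA_BASE_URL", "http://localhost:11434"))
    model: str = field(default_factory=lambda: _env("OLLAMA_MODEL", "granite3.2:8b"))
    keep_alive: str = field(default_factory=lambda: _env("OLLAMA_KEEP_ALIVE", "30m"))


# An inline override such as `OLLAMA_MODEL=qwen2.5:7b-instruct make llm-doctor`
# only affects that single invocation; .env itself is not rewritten.
print(LLMSettings())
```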
Running
make llm-doctor
Expected healthy output:
AuroraSOC LLM doctor — model=granite3.2:8b base=http://localhost:11434
[1/3] GET http://localhost:11434/api/tags
PASS: granite3.2:8b present (65 ms)
[2/3] POST http://localhost:11434/api/chat (keep_alive=30m)
PASS: warmup 0.13s, response: 'Ready.'
[3/3] aurorasoc.granite.create_granite_chat_model('Orchestrator')
PASS: BeeAI 1.89s, response: 'Ready.'
All probes passed.
The script exits non-zero (2, 3, or 4) if any stage fails, so it
can gate CI smoke jobs and release scripts.
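A sketch of gating a release step on the doctor from Python is below. The mapping of exit codes to stages is an assumption (the section above only lists 2, 3, and 4); adjust it to whatever scripts/llm_doctor.py actually returns.

```python
import subprocess
import sys

# Assumed mapping of exit code to failed stage; verify against the script.
STAGE_BY_EXIT_CODE = {2: "Ollama tags probe", 3: "chat warmup", 4: "BeeAI ChatModel path"}

result = subprocess.run(["make", "llm-doctor"])
if result.returncode != 0:
    stage = STAGE_BY_EXIT_CODE.get(result.returncode, "unknown stage")
    print(f"LLM doctor failed at: {stage} (exit {result.returncode})", file=sys.stderr)
    sys.exit(result.returncode)
```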
Common failures
- `unreachable: ConnectError` — Ollama is not running. Start it with `ollama serve` (or `make docker-up-minimal` for the containerised path).
- `granite3.2:8b not installed` — Pull the default model: `make ollama-pull-granite`. AuroraSOC never auto-pulls models.
- `empty response` in stage 2 — Usually a VRAM/keep-alive issue. Confirm `OLLAMA_MAX_LOADED_MODELS=1` and `OLLAMA_NUM_PARALLEL=1` are exported, then retry.
- `model 'granite-soc:latest' not found` in stage 3 — A stale `GRANITE_USE_FINETUNED=true` is set somewhere. Either unset it or run `make llm-fallback-qwen` to refresh `.env`.
Falling back to qwen2.5:7b-instruct
The supported escape hatch is:
make llm-fallback-qwen # pulls qwen2.5:7b-instruct and rewrites .env
make llm-doctor # re-validate the new shared model
This stays consistent with the single-model contract: every agent still shares one tag — just a different one.
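The actual Makefile recipe is not shown here, but the `.env` rewrite it performs could look roughly like the sketch below, assuming the target simply swaps the `OLLAMA_MODEL=` line in place; this is illustrative, not the real `llm-fallback-qwen` implementation.

```python
from pathlib import Path

env_path = Path(".env")
lines = env_path.read_text().splitlines()
# Replace only the OLLAMA_MODEL line; leave everything else untouched.
rewritten = [
    "OLLAMA_MODEL=qwen2.5:7b-instruct" if line.startswith("OLLAMA_MODEL=") else line
    for line in lines
]
env_path.write_text("\n".join(rewritten) + "\n")
```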
Prometheus signal
The warmup path also records a histogram, aurora_llm_warmup_seconds,
labelled by model and outcome (ok/error). Scrape it from the
API process to track first-token latency over time.
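A minimal sketch of that metric with prometheus_client is below, using the metric name and labels given above; the helper function and bucket defaults are illustrative, not AuroraSOC's actual instrumentation.

```python
import time
from prometheus_client import Histogram

LLM_WARMUP_SECONDS = Histogram(
    "aurora_llm_warmup_seconds",
    "Seconds for the shared model to complete the warmup request",
    labelnames=["model", "outcome"],
)


def timed_warmup(model: str, warmup_fn) -> None:
    # Record the warmup duration, labelled ok/error depending on the outcome.
    start = time.perf_counter()
    try:
        warmup_fn()
        LLM_WARMUP_SECONDS.labels(model=model, outcome="ok").observe(time.perf_counter() - start)
    except Exception:
        LLM_WARMUP_SECONDS.labels(model=model, outcome="error").observe(time.perf_counter() - start)
        raise
```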