Observability Setup
AuroraSOC ships with full OpenTelemetry instrumentation, Prometheus metrics, and Grafana dashboards.
OpenTelemetry
All backend services emit distributed traces and metrics via the OpenTelemetry SDK.
Instrumented Libraries
| Library | What is Traced |
|---|---|
| FastAPI | Every HTTP request (method, path, status, latency) |
| SQLAlchemy | Database queries (table, operation, duration) |
| httpx | Outbound HTTP calls (to LLM, ticketing, OIDC providers) |
| Redis | Cache operations and stream reads/writes |
Configuration
Set OTEL_EXPORTER_ENDPOINT to enable tracing:
OTEL_EXPORTER_ENDPOINT=http://otel-collector:4317
The collector is included in both the dev and production compose stacks.
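As a rough illustration of the toggle described above, the sketch below reads `OTEL_EXPORTER_ENDPOINT` and treats an unset or empty value as "tracing disabled". The function name `tracing_endpoint` is hypothetical, and the fallback to port 4317 (the conventional OTLP/gRPC port) is an assumption. How AuroraSOC actually parses the variable is not shown here.

```python
import os
from urllib.parse import urlparse


def tracing_endpoint(env=os.environ):
    """Return (host, port) for the trace exporter, or None when tracing is off.

    Hypothetical helper for illustration only; not part of AuroraSOC.
    """
    raw = env.get("OTEL_EXPORTER_ENDPOINT", "").strip()
    if not raw:
        # Variable unset or empty: assume tracing stays disabled.
        return None
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"invalid OTEL_EXPORTER_ENDPOINT: {raw!r}")
    # Assumption: fall back to 4317 when the URL carries no explicit port.
    return parsed.hostname, parsed.port or 4317
```

Validating the value at startup surfaces a mistyped endpoint as an immediate error rather than as silently missing traces.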
Custom Attributes
Every span carries:
- aurora.site_id - the site this request belongs to
- aurora.agent_id - the agent that initiated the action (when applicable)
- service.name - always aurorasoc-api
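The attribute set above can be sketched as a plain dictionary builder. The helper name `span_attributes` is hypothetical and this is not the actual AuroraSOC code. It only shows the rule that `aurora.agent_id` is present when an agent initiated the action and omitted otherwise.

```python
def span_attributes(site_id, agent_id=None):
    """Build the attributes that every span is expected to carry.

    Hypothetical helper for illustration only.
    """
    attrs = {
        "service.name": "aurorasoc-api",  # constant for the API service
        "aurora.site_id": site_id,
    }
    if agent_id is not None:
        # Only set when an agent initiated the action.
        attrs["aurora.agent_id"] = agent_id
    return attrs
```

With the OpenTelemetry Python API, a dictionary like this could be applied to a span through `span.set_attributes(...)`.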
PgBouncer Connection Pooling
For MID and LARGE deployments, enable PgBouncer to prevent connection exhaustion:
docker compose -f docker-compose.yml -f docker-compose.pgbouncer.yml up -d
Configure the API to connect through PgBouncer by setting:
PG_HOST=pgbouncer
PG_PORT=6432
PgBouncer runs in transaction mode with a default pool of 20 connections.
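To show how these two settings change where the API connects, here is a minimal sketch that builds a connection URL from `PG_HOST` and `PG_PORT`. The function name `postgres_url`, the `PG_DB` variable, the database name `aurorasoc`, and the direct-connection defaults (`localhost`, `5432`) are all assumptions made for illustration. Credentials are left out on purpose.

```python
import os


def postgres_url(env=os.environ):
    """Build a PostgreSQL connection URL from PG_HOST and PG_PORT.

    Hypothetical helper for illustration only.
    """
    # Assumed defaults: a direct connection to PostgreSQL.
    host = env.get("PG_HOST", "localhost")
    port = int(env.get("PG_PORT", "5432"))
    # PG_DB and its default are made-up names for this sketch.
    database = env.get("PG_DB", "aurorasoc")
    return f"postgresql://{host}:{port}/{database}"
```

Because transaction mode can hand each transaction to a different server connection, session-level state such as `SET` values or session advisory locks should not be relied on across transactions.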
Grafana Dashboards
The Grafana setup includes 25 pre-configured alert rules covering:
- Agent health (restart rate, response latency)
- Pipeline health (Suricata drops, ingest lag)
- LLM health (inference latency, error rate)
- Resource health (memory, disk, CPU)
- API health (error rate, latency percentiles)
- Database health (connection pool, replication lag)