
Observability Setup

AuroraSOC ships with full OpenTelemetry instrumentation, Prometheus metrics, and Grafana dashboards.

OpenTelemetry

All backend services emit distributed traces and metrics via the OpenTelemetry SDK.

Instrumented Libraries

Library      What is Traced
FastAPI      Every HTTP request (method, path, status, latency)
SQLAlchemy   Database queries (table, operation, duration)
httpx        Outbound HTTP calls (to LLM, ticketing, and OIDC providers)
Redis        Cache operations and stream reads/writes

Configuration

Set OTEL_EXPORTER_ENDPOINT to enable tracing:

OTEL_EXPORTER_ENDPOINT=http://otel-collector:4317

The collector is included in both the dev and production compose stacks.
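
As an illustration of how a service might interpret this variable, here is a minimal sketch. It assumes that an unset or empty OTEL_EXPORTER_ENDPOINT means tracing stays off; the helper name otlp_endpoint is hypothetical and not part of AuroraSOC.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit


def otlp_endpoint(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the collector endpoint, or None when tracing is not enabled.

    Hypothetical helper: treating an unset variable as "tracing off" is an
    assumption of this sketch.
    """
    raw = environ.get("OTEL_EXPORTER_ENDPOINT", "").strip()
    if not raw:
        return None  # variable unset or empty: no exporter is configured
    parts = urlsplit(raw)
    # Reject values without an http(s) scheme or host, e.g. "otel-collector:4317".
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"invalid OTEL_EXPORTER_ENDPOINT: {raw!r}")
    return raw
```

Validating the value at startup surfaces a malformed endpoint immediately instead of silently dropping traces later.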

Custom Attributes

Every span carries:

  • aurora.site_id - the site this request belongs to
  • aurora.agent_id - the agent that initiated the action (when applicable)
  • service.name - always aurorasoc-api
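
The attribute list above could be applied to a span roughly as follows. This is a sketch, not AuroraSOC's actual code: the function names are hypothetical, and the only API assumed is the `set_attribute(key, value)` method that OpenTelemetry spans expose. Note that in OpenTelemetry `service.name` is conventionally set once on the tracer's resource rather than per span, so the sketch only handles the `aurora.*` keys.

```python
from typing import Dict, Optional


def aurora_attributes(site_id: str, agent_id: Optional[str] = None) -> Dict[str, str]:
    """Build the aurora.* attributes for one request (hypothetical helper)."""
    attrs = {"aurora.site_id": site_id}
    # aurora.agent_id is only present when an agent initiated the action.
    if agent_id is not None:
        attrs["aurora.agent_id"] = agent_id
    return attrs


def annotate_span(span, site_id: str, agent_id: Optional[str] = None) -> None:
    """Copy the attributes onto any object with a set_attribute(key, value) method."""
    for key, value in aurora_attributes(site_id, agent_id).items():
        span.set_attribute(key, value)
```

With the real SDK, `span` would typically be the object returned by `opentelemetry.trace.get_current_span()`.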

PgBouncer Connection Pooling

For MID and LARGE deployments, enable PgBouncer to prevent connection exhaustion:

docker compose -f docker-compose.yml -f docker-compose.pgbouncer.yml up -d

Configure the API to connect through PgBouncer by setting:

PG_HOST=pgbouncer
PG_PORT=6432

PgBouncer runs in transaction pooling mode, so a server connection is held only for the duration of each transaction. The default pool size is 20 connections.
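
To show how the two settings above end up in a connection string, here is a small sketch. Only PG_HOST and PG_PORT come from this page; PG_USER, PG_PASSWORD, PG_DATABASE, their default values, and the function name are assumptions made for illustration.

```python
import os
from typing import Mapping
from urllib.parse import quote


def postgres_dsn(environ: Mapping[str, str] = os.environ) -> str:
    """Assemble a PostgreSQL URL from environment variables (hypothetical helper)."""
    host = environ.get("PG_HOST", "localhost")
    port = int(environ.get("PG_PORT", "5432"))  # 6432 when going through PgBouncer
    # The names and defaults below are assumptions, not documented AuroraSOC settings.
    user = quote(environ.get("PG_USER", "aurora"), safe="")
    password = quote(environ.get("PG_PASSWORD", ""), safe="")
    database = environ.get("PG_DATABASE", "aurorasoc")
    auth = f"{user}:{password}" if password else user
    return f"postgresql://{auth}@{host}:{port}/{database}"
```

Switching between a direct connection and PgBouncer then changes only the host and port, leaving the rest of the URL untouched.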

Grafana Dashboards

The bundled Grafana setup includes 25 pre-configured alert rules, which cover:

  • Agent health (restart rate, response latency)
  • Pipeline health (Suricata drops, ingest lag)
  • LLM health (inference latency, error rate)
  • Resource health (memory, disk, CPU)
  • API health (error rate, latency percentiles)
  • Database health (connection pool, replication lag)