Federation mesh
Demo-grade but real inter-site federation (ADR 029): NATS leafnodes at the transport layer, health gossip and severity-gated alert sharing on top. The full federation controller (policy bundles, cross-site case handoff, federated search) remains deferred per the architecture document.
Transport
The primary site's NATS exposes a leafnode listener
(infra/nats/nats-server.conf, port 7422, TLS); peer sites run
their own NATS attached as leafnodes (infra/nats/nats-leaf.conf).
Subject interest propagates across the link, so aurora.> traffic
published at any site reaches every peer, while each side keeps an
independent JetStream (distinct domain: values prevent one site from
capturing the other's JetStream API requests). Plaintext dev configs
for the local two-site demo live at infra/nats/dev-hub.conf / dev-leaf.conf.
Code map
| Piece | Where |
|---|---|
| Settings (FEDERATION_*) | aurorasoc/config/settings/messaging.py (FederationSettings) |
| Core logic | aurorasoc/services/federation.py |
| Worker loops | aurorasoc/workers/federation_worker.py |
| NATS subjects + clients | aurorasoc/events/nats_jetstream.py |
| Lifecycle | API lifespan in aurorasoc/api/main.py (only when enabled) |
| Tests | tests/backend/test_federation.py |
Health gossip
run_gossip_loop publishes build_health_payload() to
aurora.federation.health.<site_id> every
heartbeat_interval_seconds (15s default), refreshes the local
SiteModel row, and runs mark_stale_links so links whose peers
went silent demote to degraded and then down
(thresholds: 45s / 120s by default).
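
The demotion rule can be illustrated with a small pure function. This is a minimal sketch, not the body of mark_stale_links: the function name link_status_for_age, the constant names, and the inclusive boundaries are assumptions made for illustration; only the 45s / 120s defaults and the healthy / degraded / down states come from the text above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constants mirroring the documented defaults (45s / 120s).
DEGRADED_AFTER = timedelta(seconds=45)
DOWN_AFTER = timedelta(seconds=120)


def link_status_for_age(last_heartbeat: datetime, now: datetime) -> str:
    """Map the age of the last peer heartbeat to a link status.

    Assumption: a link is demoted exactly at the threshold (>=), not after it.
    """
    age = now - last_heartbeat
    if age >= DOWN_AFTER:
        return "down"
    if age >= DEGRADED_AFTER:
        return "degraded"
    return "healthy"


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
statuses = [
    link_status_for_age(now - timedelta(seconds=s), now) for s in (10, 60, 300)
]
```

With the default 15s heartbeat, a peer has to miss three consecutive heartbeats before its link is marked degraded under these thresholds.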
run_health_listener consumes peer gossip and upserts the remote
SiteModel plus the undirected SiteLinkModel
(link-<a>-<b>, lexicographic). Link status derives from heartbeat
age; latency is approximated from gossip propagation age, which is
accurate on one host but sensitive to clock skew across real WANs
(echo-based measurement is the follow-up when the mesh leaves demo stage).
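
As an illustration of the two derived values, the sketch below builds an order-independent link ID and estimates latency from timestamps. The helper names undirected_link_id and approx_latency_ms are hypothetical, and clamping negative values to zero is an assumption about how clock skew might be handled, not something the source states.

```python
from datetime import datetime, timedelta, timezone


def undirected_link_id(site_a: str, site_b: str) -> str:
    """Build link-<a>-<b> with the two site IDs in lexicographic order,
    so both sites derive the same ID regardless of who observed whom."""
    first, second = sorted((site_a, site_b))
    return f"link-{first}-{second}"


def approx_latency_ms(sent_at: datetime, received_at: datetime) -> float:
    """Estimate latency as gossip propagation age in milliseconds.

    Assumption: negative ages (receiver clock behind sender) clamp to 0.
    """
    delta_ms = (received_at - sent_at).total_seconds() * 1000.0
    return max(0.0, delta_ms)


sent = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
same_link = undirected_link_id("site-b", "site-a") == undirected_link_id("site-a", "site-b")
latency = approx_latency_ms(sent, sent + timedelta(milliseconds=250))
skewed = approx_latency_ms(sent, sent - timedelta(milliseconds=40))
```

The skewed case shows why this estimate is only trustworthy when both clocks agree: a receiver whose clock runs behind would otherwise report negative latency.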
/api/v1/sites and /api/v1/system/topology serve this state with
no contract change; the operator console's SOC Site Topology map
renders it directly.
Alert federation
The alert create path calls should_federate(severity). At or
above federate_min_severity (default high), the alert is stamped
with its origin (build_federated_alert) and published best-effort
to aurora.alerts.federation.<severity>; the local write never
blocks on the mesh.
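
The severity gate might look roughly like the following. This is a sketch under stated assumptions: the severity scale (low, medium, high, critical), the helper names meets_federation_threshold and federation_subject, and the choice to never federate unknown severities are all illustrative; the real should_federate may differ.

```python
# Assumed severity scale, lowest to highest; not confirmed by the source.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def meets_federation_threshold(severity: str, min_severity: str = "high") -> bool:
    """Return True when severity is at or above the configured minimum.

    Assumption: unknown severity names are never federated.
    """
    rank = SEVERITY_RANK.get(severity.lower())
    if rank is None:
        return False
    return rank >= SEVERITY_RANK[min_severity.lower()]


def federation_subject(severity: str) -> str:
    """Build the NATS subject for a federated alert of this severity."""
    return f"aurora.alerts.federation.{severity.lower()}"


decisions = {s: meets_federation_threshold(s) for s in ("low", "medium", "high", "critical")}
subject = federation_subject("critical")
```

Encoding the severity in the subject lets a peer subscribe to a narrower slice (for example only critical) without any change on the publishing side.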
run_alert_listener persists peer alerts via ingest_remote_alert:
own-origin echoes are dropped, replays dedup on a hash of
(origin_site, origin_alert_id), IOCs normalize to the canonical
dict shape, and the row lands with source=federation:<origin> plus
origin metadata, which the alert queue renders as the purple origin
badge.
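
The echo-drop and replay-dedup steps can be sketched as below. This is not ingest_remote_alert itself: the SHA-256 hash, the separator byte, the in-memory seen set, and the helper names are assumptions chosen for illustration; the source only says that dedup uses a hash of (origin_site, origin_alert_id).

```python
import hashlib


def dedup_key(origin_site: str, origin_alert_id: str) -> str:
    """Hash the (origin_site, origin_alert_id) pair into a stable key.

    Assumption: SHA-256 with a unit-separator byte so that
    ("ab", "c") and ("a", "bc") cannot collide.
    """
    raw = f"{origin_site}\x1f{origin_alert_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def accept_remote_alert(alert: dict, local_site_id: str, seen: set) -> bool:
    """Return True if the alert should be persisted, False if dropped."""
    if alert["origin_site"] == local_site_id:
        return False  # our own alert echoed back over the mesh
    key = dedup_key(alert["origin_site"], alert["origin_alert_id"])
    if key in seen:
        return False  # replay of an alert we already ingested
    seen.add(key)
    return True


seen_keys: set = set()
peer_alert = {"origin_site": "site-b", "origin_alert_id": "42"}
results = [
    accept_remote_alert(peer_alert, "site-a", seen_keys),  # first delivery
    accept_remote_alert(peer_alert, "site-a", seen_keys),  # replay
    accept_remote_alert({"origin_site": "site-a", "origin_alert_id": "7"}, "site-a", seen_keys),  # echo
]
```

In a real deployment the seen set would need to be durable (for example a unique column on the alert row) so that dedup survives a restart.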
Running two sites locally
just stack-up # primary stack (set FEDERATION_ENABLED=true in .env)
just stack-up-site-b # second site: own Postgres/Redis + NATS leaf, API on :8002
just migrate-site-b
tools/scripts/demo/attack_simulator.py --api-base http://localhost:8002 --scenario c2-beacon
The simulator command above raises a critical alert at site B; within a
heartbeat it appears in the primary site's queue with the origin
badge, and the topology map shows the live healthy link.