Inline web defense (Envoy ext_authz)
What this page is
The inline enforcement point that sits in front of HTTP traffic and
blocks web attacks before they reach the application: an Envoy
ext_authz filter calling the AuroraSOC backend, the layered decision
engine behind it, and the runtime controls (fail mode, verdict cache,
rate limiting, reputation) that make it a production edge defense.
Why it exists this way
The Web Security Agent and the WAF MCP tools could analyze WAF and proxy logs after the fact, but nothing sat in front of HTTP traffic to block an attack before it reached the app. Detection without prevention is half of "protect any organization". ADR 032 adds the inline topology and proves it against a live target; ADR 039 hardens it for production.
How it works
Topology
An Envoy front proxy (infra/envoy/envoy.yaml, with a host-mode variant
envoy.host.yaml) runs the ext_authz HTTP filter with request-body
inspection against the backend before forwarding upstream. A 200
response from the backend allows the request; a 403 blocks it. The
demo target is OWASP Juice Shop on an isolated network
(infra/compose/docker-compose.webtarget.yml), reachable only through
the proxy.
Decision engine
services/web_defense.py is a layered engine:
- a fast deterministic OWASP signature layer (SQLi, XSS, path traversal, OS command injection, SSRF, LFI, NoSQL/LDAP) that blocks the unambiguous classes with zero LLM latency and works with no model reachable;
- an optional AI-escalation hook (services/web_defense_ai.py) that asks the Web Security Agent to adjudicate requests flagged suspicious-but-not-certain, converting its strict-JSON verdict to a block when severity_score >= 6.
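The two layers can be sketched roughly as follows. This is a minimal illustration, not the code in services/web_defense.py: the signature patterns, the SUSPICIOUS_HINTS heuristic, the Verdict type, and the decide function are all hypothetical names chosen for the example. Only the severity_score >= 6 threshold comes from the description above.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical, deliberately tiny signature set; the real layer covers more classes.
BLOCK_SIGNATURES = {
    "sqli": re.compile(r"(?i)\bunion\s+select\b|'\s*or\s+1\s*=\s*1"),
    "xss": re.compile(r"(?i)<script\b"),
    "path_traversal": re.compile(r"\.\./"),
}
# Hypothetical "suspicious but not certain" hints that merit escalation.
SUSPICIOUS_HINTS = re.compile(r"(?i)\bselect\b|\bonerror\s*=|%2e%2e")

AI_BLOCK_THRESHOLD = 6  # block when severity_score >= 6


@dataclass
class Verdict:
    allow: bool
    reason: str


# Assumed classifier shape: takes the request text, returns a dict such as
# {"severity_score": 7}. None means no model is wired in.
AiClassifier = Callable[[str], dict]


def decide(request_text: str, ai_classifier: Optional[AiClassifier] = None) -> Verdict:
    # Layer 1: deterministic signatures, no model required.
    for name, pattern in BLOCK_SIGNATURES.items():
        if pattern.search(request_text):
            return Verdict(False, f"signature:{name}")
    # Layer 2: escalate only ambiguous requests, and only if a classifier exists.
    if ai_classifier is not None and SUSPICIOUS_HINTS.search(request_text):
        score = ai_classifier(request_text).get("severity_score", 0)
        if score >= AI_BLOCK_THRESHOLD:
            return Verdict(False, f"ai:severity={score}")
    return Verdict(True, "clean")
```

Because the signature layer runs first and returns early, unambiguous attacks never reach the classifier, and with ai_classifier left as None the engine still produces a verdict for every request.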
Endpoint
api/routers/ext_authz.py exposes /ext-authz/check and the
/check/{appended:path} form that Envoy's HTTP ext_authz uses. The
endpoint performs no authentication, because it is meant to be
reachable only from the internal network, and it emits a best-effort
blocked-attack alert into the SOC pipeline.
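The endpoint contract can be sketched without any web framework. The function below is an assumption-laden illustration, not the router in api/routers/ext_authz.py: ext_authz_status, inspect, and emit_alert are hypothetical names, and the alert payload shape is invented for the example. It shows the 200/403 mapping and what "best-effort" means for the alert.

```python
import logging
from typing import Callable

log = logging.getLogger("ext_authz_sketch")


def ext_authz_status(
    inspect: Callable[[str, str, bytes], bool],
    emit_alert: Callable[[dict], None],
    method: str,
    path: str,
    body: bytes,
) -> int:
    """Return the HTTP status Envoy acts on: 200 allows, 403 blocks."""
    if inspect(method, path, body):
        return 200
    try:
        # Hypothetical alert payload; the real SOC pipeline format is not shown here.
        emit_alert({"method": method, "path": path, "action": "blocked"})
    except Exception:
        # Best effort: a failed alert must never change the verdict.
        log.exception("alert emission failed")
    return 403
```

Keeping the alert inside its own try block means an outage in the alerting path cannot turn a block into an allow or into an error response.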
Runtime controls (ADR 039)
services/web_defense_runtime.py wraps the pure engine and is wired
into the endpoint:
- Fail mode. WEB_DEFENSE_FAIL_MODE is open by default (favor availability of the protected app) or closed for high-security deployments. On an internal engine error the endpoint honors this mode rather than always allowing.
- Verdict cache. A TTL cache keyed by request shape so repeated identical requests skip re-evaluation and any AI escalation cost.
- Rate limiting. A per-client sliding-window limiter throttles abusive sources (HTTP 429) before inspection cost is incurred.
- Reputation. Blocked-attack counts per client over a window flag abusive clients for escalation.
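As a rough sketch of two of these controls, the code below reads the fail mode and implements a per-client sliding-window limiter. The names status_on_engine_error and SlidingWindowLimiter are hypothetical, as is the choice of 403 for the fail-closed response; the actual interface in services/web_defense_runtime.py may differ.

```python
import os
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Mapping


def status_on_engine_error(env: Mapping[str, str] = os.environ) -> int:
    """Status to return when the engine itself raises (assumed: 403 when closed)."""
    mode = env.get("WEB_DEFENSE_FAIL_MODE", "open").strip().lower()
    return 403 if mode == "closed" else 200


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within the last `window` seconds."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float]):
        self.limit = limit
        self.window = window
        self.clock = clock  # injected so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str) -> bool:
        now = self.clock()
        hits = self._hits[client]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer HTTP 429
        hits.append(now)
        return True
```

A reputation counter could plausibly reuse the same per-client deque, recording only blocked requests and flagging a client once the count in the window crosses a threshold.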
All controls are in-memory and time-injectable for deterministic tests;
a multi-instance deployment can back them with Redis through the same
interface. The AI classifier is enabled at startup by wiring the Web
Security Agent runner via set_ai_classifier.
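To illustrate what "in-memory and time-injectable" can look like, here is a minimal TTL verdict cache with an injected clock. VerdictCache and request_shape are hypothetical names, and hashing method, path, and body is only one possible definition of "request shape"; the source does not say which fields the real key covers.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def request_shape(method: str, path: str, body: bytes) -> str:
    """Hypothetical cache key: a digest over method, path, and body."""
    h = hashlib.sha256()
    for part in (method.encode(), path.encode(), body):
        h.update(len(part).to_bytes(8, "big"))  # length prefix avoids ambiguous joins
        h.update(part)
    return h.hexdigest()


class VerdictCache:
    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # tests pass a fake clock instead of sleeping
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def get(self, key: str) -> Optional[bool]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, allow = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: force re-evaluation
            return None
        return allow

    def put(self, key: str, allow: bool) -> None:
        self._entries[key] = (self.clock() + self.ttl, allow)
```

A deterministic test advances the fake clock past the TTL and asserts the entry is gone. A Redis-backed variant could keep the same get and put methods and delegate expiry to the store.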
Verifying the path
tests/security/web_attacks/ fires 17 representative OWASP payloads
and asserts every one is blocked, plus 7 benign requests that must pass
(no over-blocking), plus endpoint-contract and AI-escalation tests, all
with no network and no LLM, gating every CI run. The live path
(payload to Envoy to 403) is exercised by bringing up the webtarget
overlay.
What goes wrong
- A protected app goes down under an engine defect: the default
fail-open posture should prevent this. If requests are being blocked
instead, check whether WEB_DEFENSE_FAIL_MODE=closed was set.
- AI escalation is slow under load: identical request shapes are cached,
but the first of each shape pays the agent latency. Confirm the cache
TTL and that set_ai_classifier is wired only where the latency is
acceptable.