Inline web defense (Envoy ext_authz)

What this page is

The inline enforcement point that sits in front of HTTP traffic and blocks web attacks before they reach the application: an Envoy ext_authz filter calling the AuroraSOC backend, the layered decision engine behind it, and the runtime controls (fail mode, verdict cache, rate limiting, reputation) that make it a production edge defense.

Why it exists this way

The Web Security Agent and the WAF MCP tools could analyze WAF and proxy logs after the fact, but nothing sat in front of HTTP traffic to block an attack before it reached the app. Detection without prevention is half of "protect any organization". ADR 032 adds the inline topology and proves it against a live target; ADR 039 hardens it for production.

How it works

Topology

An Envoy front proxy (infra/envoy/envoy.yaml, with a host-mode variant envoy.host.yaml) runs the ext_authz HTTP filter, which sends each request, including its body, to the backend for inspection before forwarding it upstream. A 200 response from the backend allows the request; a 403 blocks it. The demo target is OWASP Juice Shop on an isolated network (infra/compose/docker-compose.webtarget.yml), reachable only through the proxy.

Decision engine

services/web_defense.py is a layered engine:

a fast deterministic OWASP signature layer (SQLi, XSS, path traversal, OS command injection, SSRF, LFI, NoSQL/LDAP) that blocks the unambiguous classes with zero LLM latency and works with no model reachable;
an optional AI-escalation hook (services/web_defense_ai.py) that asks the Web Security Agent to adjudicate requests flagged suspicious-but-not-certain, converting its strict-JSON verdict to a block when severity_score >= 6.
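
The two layers can be sketched as follows. This is a minimal illustration, not the code in services/web_defense.py: the names Verdict, inspect, SIGNATURES, and SUSPICIOUS are hypothetical, and the three regexes stand in for a much larger signature set. Only the severity_score >= 6 threshold and the strict-JSON verdict come from the description above.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative patterns only (assumption); the real signature set is larger.
SIGNATURES = {
    "sqli": re.compile(r"(?i)\bunion\b\s+\bselect\b|'\s*or\s+1\s*=\s*1"),
    "xss": re.compile(r"(?i)<script\b"),
    "path_traversal": re.compile(r"\.\./"),
}
# Hypothetical heuristic for "suspicious but not certain" requests.
SUSPICIOUS = re.compile(r"(?i)\b(select|eval|exec)\b")

BLOCK_SEVERITY = 6  # from the text: block when severity_score >= 6


@dataclass(frozen=True)
class Verdict:
    allow: bool
    reason: str


# The classifier takes the payload and returns strict-JSON text.
AIClassifier = Callable[[str], str]


def inspect(payload: str, ai: Optional[AIClassifier] = None) -> Verdict:
    # Layer 1: deterministic signatures, no model needed.
    for name, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return Verdict(False, f"signature:{name}")
    # Layer 2: optional AI escalation, only for suspicious payloads.
    if ai is not None and SUSPICIOUS.search(payload):
        score = json.loads(ai(payload)).get("severity_score", 0)
        if score >= BLOCK_SEVERITY:
            return Verdict(False, f"ai:severity={score}")
    return Verdict(True, "clean")
```

With no classifier wired, a payload that trips only the suspicious heuristic is allowed, which matches the claim that the signature layer works with no model reachable.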

Endpoint

api/routers/ext_authz.py exposes /ext-authz/check and the /check/{appended:path} form that Envoy's HTTP ext_authz uses. The endpoint requires no authentication, so it must be reachable only on the internal network. When it blocks a request, it emits a best-effort blocked-attack alert into the SOC pipeline.
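
A framework-agnostic sketch of the endpoint contract, assuming that the {appended:path} segment carries the client's original path and that a failed alert must never change the verdict. The helper names original_path and to_response and the x-blocked-reason header are hypothetical, not taken from api/routers/ext_authz.py.

```python
from typing import Callable, Dict, Tuple

CHECK_PREFIX = "/ext-authz/check"  # path from the text


def original_path(check_path: str) -> str:
    """Recover the client's path from the /check/{appended:path} form."""
    if check_path.startswith(CHECK_PREFIX):
        check_path = check_path[len(CHECK_PREFIX):]
    return check_path or "/"


def to_response(
    allow: bool,
    reason: str,
    emit_alert: Callable[[str], None],
) -> Tuple[int, Dict[str, str]]:
    """Map a verdict to the status code Envoy acts on: 200 allows, 403 blocks."""
    if allow:
        return 200, {}
    try:
        emit_alert(reason)  # best effort: the SOC pipeline may be unavailable
    except Exception:
        pass  # an alerting failure must not turn a block into an error
    return 403, {"x-blocked-reason": reason}  # header name is an assumption
```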

Runtime controls (ADR 039)

services/web_defense_runtime.py wraps the pure engine and is wired into the endpoint:

Fail mode. WEB_DEFENSE_FAIL_MODE defaults to open, which favors availability of the protected app; set it to closed for high-security deployments. On an internal engine error the endpoint honors this mode rather than always allowing.
Verdict cache. A TTL cache keyed by request shape so repeated identical requests skip re-evaluation and any AI escalation cost.
Rate limiting. A per-client sliding-window limiter throttles abusive sources (HTTP 429) before inspection cost is incurred.
Reputation. Blocked-attack counts per client over a window flag abusive clients for escalation.
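
Two of the controls above, sketched under assumptions: a per-client sliding-window limiter with an injectable clock, and a helper that resolves the fail mode on an engine error. The class and function names are hypothetical; only the WEB_DEFENSE_FAIL_MODE variable, its open/closed values, and the HTTP 429 response come from this page.

```python
import os
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window_s` seconds."""

    def __init__(
        self,
        limit: int,
        window_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str) -> bool:
        now = self.clock()
        hits = self._hits[client]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer HTTP 429
        hits.append(now)
        return True


def allow_on_engine_error(fail_mode: Optional[str] = None) -> bool:
    """Fail open by default; fail closed only when explicitly configured."""
    mode = fail_mode or os.environ.get("WEB_DEFENSE_FAIL_MODE", "open")
    return mode.strip().lower() != "closed"
```

A reputation counter could reuse the same window structure, counting blocked requests instead of all requests.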

All controls are in-memory and time-injectable for deterministic tests; a multi-instance deployment can back them with Redis through the same interface. The AI classifier is enabled at startup by wiring the Web Security Agent runner via set_ai_classifier.
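
A minimal sketch of an in-memory, time-injectable verdict cache. It assumes "request shape" means the method, path, and body, which this page does not spell out; shape_key and VerdictCache are hypothetical names. A Redis-backed variant could expose the same get and put methods.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def shape_key(method: str, path: str, body: bytes) -> str:
    """Hash the parts of a request that are assumed to define its shape."""
    h = hashlib.sha256()
    for part in (method.upper().encode(), path.encode(), body):
        # Length-prefix each part so different splits cannot collide.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()


class VerdictCache:
    """TTL cache mapping a request shape to an allow/block verdict."""

    def __init__(
        self,
        ttl_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_s = ttl_s
        self.clock = clock  # injectable for deterministic tests
        self._store: Dict[str, Tuple[float, bool]] = {}

    def get(self, key: str) -> Optional[bool]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, allow = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: force re-evaluation
            return None
        return allow

    def put(self, key: str, allow: bool) -> None:
        self._store[key] = (self.clock() + self.ttl_s, allow)
```

In a test, passing a clock such as lambda: now[0] and advancing now[0] past the TTL shows the entry expiring without any real waiting.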

Verifying the path

tests/security/web_attacks/ fires 17 representative OWASP payloads and asserts that every one is blocked, and sends 7 benign requests that must pass (no over-blocking). The suite also covers the endpoint contract and AI escalation. It needs no network and no LLM, and it gates every CI run. The live path (payload to Envoy to 403) is exercised by bringing up the webtarget overlay.

What goes wrong

A protected app goes down under an engine defect: the default fail-open posture should prevent this. If requests are being blocked anyway, check whether WEB_DEFENSE_FAIL_MODE=closed was set.
AI escalation is slow under load: identical request shapes are cached, but the first of each shape pays the agent latency. Confirm the cache TTL and that set_ai_classifier is wired only where the latency is acceptable.
