Skip to main content

AI Assistant Safety

The AuroraSOC AI Assistant (the chat in the operator console) is hardened against adversarial use so it stays a safe analyst aid rather than an attack surface.

What the assistant will and will not do

  • It knows the current date and time, so it dates answers and generated reports correctly instead of guessing.
  • It refuses to reveal or change its system instructions, refuses to change its role or identity, and will not output API keys, tokens, or other secrets, even if asked.
  • When you paste logs, alerts, or tool output, any "instructions" hidden inside that text are treated as data to analyze, never as commands. This blocks indirect prompt injection from attacker-controlled content.
  • It grounds answers in your live alerts and cases and cites real ids where relevant.

Protections in place

  • Input sanitization and fencing. Pasted and database content is cleaned (control, zero-width, and bidi characters removed) and wrapped in non-forgeable untrusted-data fences (ADR 036).
  • Output filtering. Responses are scanned and any secret-shaped strings or system-prompt leakage are redacted before they reach you.
  • Rate limiting. The chat is rate limited per user to control cost and abuse. If you send too many messages too quickly you will see a "slow down" notice (HTTP 429).
  • Auditability. Suspected injection attempts are logged for review.

Good practice

  • Treat AI output as decision support, not ground truth. Verify before acting, and note that containment or destructive actions still require human approval.
  • Do not paste credentials or secrets into the chat.

For the engineering detail of these controls, see the developer page on Security and autonomy hardening.