§ GOVERNANCE

Incident Response Workflow

From policy violation to closed incident: detection at enforcement time, routing into your SIEM, analyst triage, containment, and tamper-evident evidence — one runbook, end to end.

The principle: agents are production actors, so agent incidents should flow through the same SOC pipeline as every other security signal. Your SIEM detects and queues; Execlave is the system of record for what the agent did and the control plane that stops it doing it again.

The six stages

Detection and routing are automatic. Stages 3–6 are the analyst runbook.

§ 01

Detect — enforcement marks the trace

Every governed call runs through pre-execution enforcement. When a policy fires, the outcome is recorded on the trace as a first-class status — not buried in logs:

| Status | Meaning | Typical trigger |
| --- | --- | --- |
| policy_blocked | Action stopped before execution | Tool allowlist, access control, injection detection |
| flagged_for_review | Action ran, marked for human review | Policies in flag mode, quality thresholds |
| limit_exceeded | Budget or rate cap reached | Cost circuit breaker, external-call limits |
| error / timeout | Operational failure | Provider errors, latency |

Configure what blocks vs. flags per policy in Policies.
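
As a minimal sketch of how a downstream consumer might branch on these statuses, the snippet below separates governance outcomes from operational failures. Only the status strings come from the table above; the `TraceStatus` enum, the `GOVERNANCE_STATUSES` set, and the `is_governance_event` helper are hypothetical names chosen for illustration, not part of any Execlave SDK.

```python
from enum import Enum


class TraceStatus(str, Enum):
    """Status strings from the table above; the enum itself is hypothetical."""
    POLICY_BLOCKED = "policy_blocked"
    FLAGGED_FOR_REVIEW = "flagged_for_review"
    LIMIT_EXCEEDED = "limit_exceeded"
    ERROR = "error"
    TIMEOUT = "timeout"


# Statuses that record a governance decision rather than an operational failure.
# This grouping is an assumption based on the "Meaning" column.
GOVERNANCE_STATUSES = {
    TraceStatus.POLICY_BLOCKED,
    TraceStatus.FLAGGED_FOR_REVIEW,
    TraceStatus.LIMIT_EXCEEDED,
}


def is_governance_event(status: str) -> bool:
    """Return True if the status string is a governance outcome.

    Unknown strings return False instead of raising, so an unrecognised
    status does not break the consumer.
    """
    try:
        return TraceStatus(status) in GOVERNANCE_STATUSES
    except ValueError:
        return False
```

A consumer could use a check like this to decide which events feed SIEM detection rules and which go to ordinary operational monitoring.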

§ 02

Route — the violation reaches your SOC

Traces export asynchronously to your SIEM — the enforcement path is never delayed by delivery. Detection rules in the SIEM turn governance statuses into alerts/incidents in the queue your analysts already work:

  • Splunk — HEC events + scheduled saved searches with alert actions.
  • Microsoft Sentinel — Log Analytics custom table + scheduled analytics rules.
  • OTEL collector — route to any other backend (Datadog, Elastic, Chronicle) from one OTLP stream.

For paging and chat, policy violation alerts also fan out through alert channels (webhook, email) configured on the policy itself.
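
The asynchronous export described above can be illustrated with a generic pattern: the enforcement path drops an event onto a bounded in-memory queue and returns immediately, while a background worker handles delivery. This is a sketch of the pattern only, not Execlave's exporter; `AsyncExporter`, `enqueue`, `flush`, and the `send` callable are all hypothetical names, and the drop-on-full behaviour is an assumption.

```python
import queue
import threading
from typing import Callable


class AsyncExporter:
    """Hypothetical exporter: enqueue never blocks, a worker thread delivers."""

    def __init__(self, send: Callable[[dict], None], max_pending: int = 1000):
        self._send = send
        self._pending: queue.Queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, event: dict) -> bool:
        """Called on the enforcement path; returns immediately."""
        try:
            self._pending.put_nowait(event)
            return True
        except queue.Full:
            # Count the drop rather than stall enforcement (assumed policy).
            self.dropped += 1
            return False

    def _run(self) -> None:
        while True:
            event = self._pending.get()
            try:
                self._send(event)
            except Exception:
                pass  # A real exporter would retry or dead-letter here.
            finally:
                self._pending.task_done()

    def flush(self) -> None:
        """Block until every queued event has been handed to `send`."""
        self._pending.join()
```

In this sketch `send` would wrap whichever delivery mechanism is in use (for example an HTTP POST to a collector); a slow or failing `send` only affects the worker thread, never the caller of `enqueue`.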

§ 03

Triage — pivot from SIEM alert to Execlave trace

Every exported event carries trace_id and agent_id. The analyst pivots into the Execlave dashboard and answers the four triage questions in one place:

| Question | Where |
| --- | --- |
| What exactly did the agent try to do? | Trace detail: full span timeline, tool inputs, policy verdicts |
| Who / what is this agent? | Agent passport: owner, autonomy level, credentials, version history |
| Is this part of a pattern? | Drift signals + trace search filtered by agent, user, session |
| Was data touched? | Data-access lineage on the trace (sources, fields, classifications) |

Severity heuristic: blocked tool call with injection signals from an external user → treat as attempted compromise; repeated flagged_for_review on one workflow → likely policy tuning, not attack.
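
The severity heuristic above can be written down as a small function, which helps keep triage consistent between analysts. This is a sketch under stated assumptions: the `AlertContext` fields, the label strings, and the default `repeat_threshold` of 3 are all hypothetical, and anything the heuristic does not cover falls through to analyst judgement.

```python
from dataclasses import dataclass


@dataclass
class AlertContext:
    """Hypothetical fields an analyst might gather from the trace during triage."""
    status: str                    # e.g. "policy_blocked", "flagged_for_review"
    injection_signals: bool        # injection detection fired on the trace
    external_user: bool            # request originated from an external user
    repeats_on_same_workflow: int  # matching alerts seen on the same workflow


def triage_label(ctx: AlertContext, repeat_threshold: int = 3) -> str:
    """Apply the two rules of the heuristic; default to human judgement."""
    if ctx.status == "policy_blocked" and ctx.injection_signals and ctx.external_user:
        return "attempted_compromise"
    if (
        ctx.status == "flagged_for_review"
        and ctx.repeats_on_same_workflow >= repeat_threshold
    ):
        return "likely_policy_tuning"
    return "needs_analyst_judgement"
```

The explicit fallback label is deliberate: the heuristic names two clear cases, and everything else should stay with the analyst rather than be auto-classified.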

§ 04

Contain & remediate

Pick the smallest action that stops the behaviour:

  • Pause the agent — lifecycle transition in the registry; enforcement rejects further calls immediately.
  • Tighten the policy — switch flag → block, narrow the tool allowlist, or lower budget caps. Changes apply on the next enforcement call.
  • Require human approval — drop the agent to act_with_approval so risky actions queue for sign-off (approval workflows).
  • Rotate credentials — revoke the agent's API key if compromise is suspected; issue a new one after review.

§ 05

Evidence — export the tamper-evident record

Execlave's audit log is append-only and hash-chained — UPDATE/DELETE are blocked at the database level, so the record your analyst exports is the record that was written. For the incident report, attach:

  • The trace timeline (what happened, with policy verdicts inline).
  • The audit-log entries for the agent over the incident window.
  • The compliance report PDF if the incident feeds a regulatory obligation — serious incidents map to EU AI Act Article 26 duties (see the article-by-article mapping).
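
To show why a hash-chained log is tamper-evident, here is a minimal sketch of the general technique: each entry stores the hash of the previous entry, so editing or deleting any record breaks every link after it. The entry layout, the SHA-256 choice, the all-zero `GENESIS` value, and the function names are assumptions for illustration; Execlave's actual audit-log format is not described here.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS = "0" * 64  # Assumed prev_hash value for the first entry.


def entry_hash(prev_hash: str, payload: Dict) -> str:
    """Hash the previous link together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], payload: Dict) -> None:
    """Append a payload, linking it to the hash of the current last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append(
        {"prev_hash": prev, "payload": payload, "hash": entry_hash(prev, payload)}
    )


def first_broken_index(log: List[Dict]) -> Optional[int]:
    """Return the index of the first entry that fails verification, else None."""
    prev = GENESIS
    for i, entry in enumerate(log):
        expected = entry_hash(prev, entry["payload"])
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return i
        prev = entry["hash"]
    return None
```

A reviewer receiving an exported log in a format like this could re-run the verification independently instead of trusting the exporter.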

§ 06

Close the loop

Before closing the SIEM incident: confirm the policy change is live (re-run the triggering input in staging and verify it blocks), restore the agent's lifecycle state if it was paused, and record the root cause. If the detection rule was noisy, tune the SPL/KQL threshold — both integration pages ship the queries as editable starting points, not fixed product behaviour.
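
The close-out checks above can be captured as a short checklist function so that an incident is not closed with an item outstanding. This is only a sketch: `ClosureState`, its field names, and `closure_blockers` are hypothetical, and the values would be filled in by the analyst rather than read from any Execlave API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClosureState:
    """Hypothetical record of the close-out checks; field names are illustrative."""
    staging_rerun_blocked: bool  # triggering input re-run in staging was blocked
    agent_was_paused: bool       # agent was paused during containment
    lifecycle_restored: bool     # paused agent has been returned to its prior state
    root_cause: str              # free-text root cause for the incident record


def closure_blockers(state: ClosureState) -> List[str]:
    """Return the outstanding items; an empty list means the incident can close."""
    blockers = []
    if not state.staging_rerun_blocked:
        blockers.append("policy change not verified in staging")
    if state.agent_was_paused and not state.lifecycle_restored:
        blockers.append("agent lifecycle state not restored")
    if not state.root_cause.strip():
        blockers.append("root cause not recorded")
    return blockers
```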