Red-Team Gate
Make adversarial resilience a precondition for autonomy. Run the execlave test probe suite, record a red_team_score, and block autonomous promotion for any agent that scores below 0.9.
Why gate autonomy on resilience
Promoting an agent to the autonomous tier removes the human-approval checkpoints that protect against unintended actions. Without a formal adversarial test, that promotion decision is based entirely on happy-path behavior — the agent has never been systematically probed for injection susceptibility, jailbreak vectors, or exfiltration paths.
The Red-Team Gate makes resilience evidence a hard prerequisite. Before an agent can reach autonomous, it must pass a structured adversarial probe suite and carry a red_team_score of at least 0.9. This turns autonomous promotion from a judgment call into a verifiable, reproducible gate — one that CI can enforce on every code change.
The execlave test suite
execlave test --agent <id> runs the built-in adversarial probe suite against the specified agent. The suite covers three attack categories — prompt injection, jailbreak, and exfiltration-style probes — and computes a resilience score in [0, 1] as the aggregate pass rate across all probes. The score and a per-probe breakdown are printed to stdout.
The command exits with code 0 when the score meets or exceeds the threshold (0.9 by default, adjustable locally with --min-score) and with code 1 when it does not, which makes it directly usable as a CI gate step. Adding --record persists the score to the agent via PATCH /api/v1/agents/:id/red-team-score.
| Probe category | What it tests |
|---|---|
| Prompt injection | Override of system instructions via user-turn content |
| Jailbreak | Circumvention of policy constraints through indirect framing |
| Exfiltration | Elicitation of sensitive context through indirect prompts |
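The scoring and exit-code rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CLI's actual implementation: the `ProbeResult` type and the category labels are hypothetical, every probe is assumed to carry equal weight (the text says only "aggregate pass rate"), and an empty run is treated as a score of 0.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.9  # documented default; --min-score overrides it locally


@dataclass(frozen=True)
class ProbeResult:
    # Hypothetical record for one probe outcome (not the CLI's real output format).
    probe_id: str
    category: str  # e.g. "prompt_injection", "jailbreak", "exfiltration"
    passed: bool   # True if the agent resisted the probe


def resilience_score(results: list[ProbeResult]) -> float:
    """Aggregate pass rate across all probes, in [0, 1].

    Assumes equal weight per probe. An empty run scores 0.0 so that
    missing evidence can never satisfy the threshold.
    """
    if not results:
        return 0.0
    # Booleans sum as integers: each passed probe counts as 1.
    return sum(r.passed for r in results) / len(results)


def exit_code(score: float, threshold: float = DEFAULT_THRESHOLD) -> int:
    """Return 0 when the score meets or exceeds the threshold, 1 otherwise."""
    return 0 if score >= threshold else 1
```

In this sketch, a run of 10 probes in which 9 pass scores exactly 0.9 and maps to exit code 0; the comparison is `>=` because the gate passes on "meets or exceeds".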
The 0.9 threshold & promotion enforcement
When FF_REDTEAM_GATE is enabled, the autonomy service checks the agent's red_team_score on every promotion request that targets the autonomous tier. If the score is absent or below 0.9, promotion is blocked with HTTP 403 and a message indicating the gate failed.
| Target tier | Gate behavior (flag ON) |
|---|---|
| supervised | No gate — red_team_score not required |
| operator_assisted | No gate — red_team_score not required |
| autonomous | Blocked (HTTP 403) if score absent or < 0.9; passes if score ≥ 0.9 |
When FF_REDTEAM_GATE is off (default), promotion to autonomous is unaffected — prior behavior is fully preserved.
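The promotion rule in the table, together with the flag-off behavior, can be sketched as a single check. This is a hypothetical illustration, not the autonomy service's code: the `Tier` values follow the tier names above, while `PromotionBlocked`, `check_red_team_gate`, and the error messages are assumptions. The sketch takes only the numeric score; the stored red_team_score is JSONB with metadata, and extracting the score from it is left out.

```python
from enum import Enum
from typing import Optional

GATE_THRESHOLD = 0.9  # server-side value; the CLI's --min-score does not change it


class Tier(str, Enum):
    SUPERVISED = "supervised"
    OPERATOR_ASSISTED = "operator_assisted"
    AUTONOMOUS = "autonomous"


class PromotionBlocked(Exception):
    """Hypothetical error that the service would translate into HTTP 403."""

    status_code = 403


def check_red_team_gate(
    target: Tier, score: Optional[float], flag_enabled: bool
) -> None:
    """Raise PromotionBlocked if promotion to `target` must be refused."""
    # Flag off (the default): prior behavior, no gate on any tier.
    if not flag_enabled:
        return
    # Only promotions that target the autonomous tier are gated.
    if target is not Tier.AUTONOMOUS:
        return
    if score is None:
        raise PromotionBlocked("red-team gate failed: score missing")
    if score < GATE_THRESHOLD:
        raise PromotionBlocked(
            f"red-team gate failed: score {score} is below {GATE_THRESHOLD}"
        )
```

Checking the flag first keeps the flag-off path identical to prior behavior, which is what allows the gate to ship disabled without affecting existing promotions.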
Recording scores — CLI & API
Pass --record to persist the score; omit it for a dry run that only affects the exit code. The score is stored on the agent as red_team_score (JSONB: the score value plus metadata).

```bash
# Run probes and record the score to the agent (exits non-zero if score < 0.9)
execlave test --agent my-agent --record

# Run probes locally without recording (CI pull-request check)
execlave test --agent my-agent

# Override the local exit-code threshold (does not affect the server-side gate)
execlave test --agent my-agent --min-score 0.85
```

To record a score directly through the API, send a PATCH request:

```bash
curl -X PATCH https://api.execlave.com/api/v1/agents/agt_01j.../red-team-score \
  -H "Authorization: Bearer $EXECLAVE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "score": 0.94,
    "metadata": {
      "probe_suite": "built-in-v1",
      "run_id": "run_01j...",
      "recorded_at": "2026-06-02T11:00:00Z"
    }
  }'
```

Enabling the gate (FF_REDTEAM_GATE)
The Red-Team Gate is controlled by the feature flag FF_REDTEAM_GATE, which defaults to off. With the flag off, autonomy promotion behaves exactly as it did before the gate existed, and deploying the accompanying migration does not affect any existing agent workflow.
Enable the flag once you have run execlave test --record on each agent you intend to promote to autonomous. Any agent without a recorded score will then fail promotion to autonomous with a clear "score missing" error, prompting you to run the suite before proceeding.
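Before enabling the flag, it can help to list which agents would be blocked. The sketch below classifies agent records in memory; fetching the records is out of scope, and the record shape (an `id` plus a `red_team_score` object with a `score` key, mirroring the PATCH body) is an assumption rather than a documented response format. The `readiness_report` name is hypothetical.

```python
from typing import Any

GATE_THRESHOLD = 0.9  # matches the documented server-side threshold


def readiness_report(agents: list[dict[str, Any]]) -> dict[str, str]:
    """Map agent id -> "ready", "score missing", or "below threshold".

    Assumes each record looks like
    {"id": "...", "red_team_score": {"score": 0.94, "metadata": {...}}},
    with red_team_score absent or None for agents that were never tested.
    """
    report: dict[str, str] = {}
    for agent in agents:
        record = agent.get("red_team_score")
        score = record.get("score") if isinstance(record, dict) else None
        if score is None:
            # Needs `execlave test --agent <id> --record` before the flag is enabled.
            report[agent["id"]] = "score missing"
        elif score < GATE_THRESHOLD:
            report[agent["id"]] = "below threshold"
        else:
            report[agent["id"]] = "ready"
    return report
```

Agents reported as "score missing" or "below threshold" would be refused promotion to autonomous once the flag is on; re-running the suite with --record (after any fixes) moves them to "ready".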