
§ DOCUMENTATION

Red-Team Gate

Make adversarial resilience a precondition for autonomy. Run the execlave test probe suite, record a red_team_score, and block autonomous promotion for any agent that scores below 0.9.

§ 01

Why gate autonomy on resilience

Promoting an agent to the autonomous tier removes the human-approval checkpoints that protect against unintended actions. Without a formal adversarial test, that promotion decision is based entirely on happy-path behavior — the agent has never been systematically probed for injection susceptibility, jailbreak vectors, or exfiltration paths.

The Red-Team Gate makes resilience evidence a hard prerequisite. Before an agent can reach autonomous, it must pass a structured adversarial probe suite and carry a red_team_score of at least 0.9. This turns autonomous promotion from a judgment call into a verifiable, reproducible gate — one that CI can enforce on every code change.

§ 02

The execlave test suite

execlave test --agent <id> runs the built-in adversarial probe suite against the specified agent. The suite covers three attack categories — prompt injection, jailbreak, and exfiltration-style probes — and computes a resilience score in [0, 1] as the aggregate pass rate across all probes. The score and a per-probe breakdown are printed to stdout.

The command exits with code 0 when the score meets or exceeds the threshold, and 1 when it does not — making it directly usable as a CI gate step. Adding --record persists the score to the agent via PATCH /api/v1/agents/:id/red-team-score.
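
As a rough illustration of how the score and the exit code relate, the sketch below computes an unweighted pass rate over a list of probe results and maps it to an exit code. The `ProbeResult` type, the function names, and the sample probe counts are assumptions made for illustration. The suite's real internals and output format are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical record of one probe outcome (not the CLI's real output format)."""
    category: str  # "prompt_injection", "jailbreak", or "exfiltration"
    passed: bool   # True if the agent resisted the probe


def resilience_score(results: list[ProbeResult]) -> float:
    """Aggregate pass rate across all probes, a value in [0, 1]."""
    if not results:
        raise ValueError("cannot score an empty probe run")
    return sum(r.passed for r in results) / len(results)


def exit_code(score: float, threshold: float = 0.9) -> int:
    """0 when the score meets or exceeds the threshold, 1 otherwise."""
    return 0 if score >= threshold else 1


# Illustrative run: 19 of 20 probes passed.
results = (
    [ProbeResult("prompt_injection", True)] * 8
    + [ProbeResult("jailbreak", True)] * 6
    + [ProbeResult("exfiltration", True)] * 5
    + [ProbeResult("exfiltration", False)]
)
score = resilience_score(results)
print(f"score={score:.2f} exit={exit_code(score)}")
```

Note that a score of exactly 0.9 passes, because the comparison is "meets or exceeds".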

Probe category | What it tests
Prompt injection | Override of system instructions via user-turn content
Jailbreak | Circumvention of policy constraints through indirect framing
Exfiltration | Elicitation of sensitive context through indirect prompts

§ 03

The 0.9 threshold & promotion enforcement

When FF_REDTEAM_GATE is enabled, the autonomy service checks the agent's red_team_score on every promotion request that targets the autonomous tier. If the score is absent or below 0.9, promotion is blocked with HTTP 403 and a message indicating the gate failed.

Target tier | Gate behavior (flag ON)
supervised | No gate: red_team_score not required
operator_assisted | No gate: red_team_score not required
autonomous | Blocked (HTTP 403) if score absent or < 0.9; passes if score ≥ 0.9

When FF_REDTEAM_GATE is off (default), promotion to autonomous is unaffected — prior behavior is fully preserved.
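
The decision described above can be sketched as a small function. This is a minimal illustration, not the autonomy service's code: the function name, the 200 status for an allowed request, and the message strings are assumptions. Only the tier names, the 0.9 threshold, and the 403 status come from this page.

```python
from typing import Optional

GATED_TIER = "autonomous"
MIN_SCORE = 0.9


def check_promotion(
    target_tier: str,
    red_team_score: Optional[float],
    flag_enabled: bool,
) -> tuple[int, str]:
    """Return (http_status, message) for a promotion request.

    The 200 status and the message text are placeholders for illustration.
    """
    # Flag off, or a tier other than autonomous: the gate does not apply.
    if not flag_enabled or target_tier != GATED_TIER:
        return 200, "red-team gate not applied"
    if red_team_score is None:
        return 403, "red-team gate failed: score missing"
    if red_team_score < MIN_SCORE:
        return 403, f"red-team gate failed: score {red_team_score} is below {MIN_SCORE}"
    return 200, "red-team gate passed"


for tier, score in [("supervised", None), ("autonomous", None),
                    ("autonomous", 0.85), ("autonomous", 0.9)]:
    print(tier, score, check_promotion(tier, score, flag_enabled=True))
```

A result of 200 here only means this gate did not block the request. Any other promotion checks the service applies are outside the scope of the sketch.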

§ 04

Recording scores — CLI & API

Run the probe suite and record the result with a single CLI command. Use --record to persist the score; omit it for a dry run that prints the score and sets the exit code without persisting anything. The score is stored on the agent as red_team_score (JSONB: the score value plus metadata).

    # Run probes and record the score to the agent (exits non-zero if score < 0.9)
    execlave test --agent my-agent --record

    # Run probes locally without recording (CI pull-request check)
    execlave test --agent my-agent

    # Override the local exit-code threshold (does not affect server-side gate)
    execlave test --agent my-agent --min-score 0.85

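
If a pipeline drives the CLI from a script rather than a shell step, the same commands could be wrapped as in the sketch below. The helper names `build_command` and `run_gate` are hypothetical. The sketch uses only the flags shown above and relies solely on the documented exit code, since the CLI's stdout format is not specified here.

```python
import subprocess
from typing import Optional


def build_command(
    agent_id: str,
    record: bool = False,
    min_score: Optional[float] = None,
) -> list[str]:
    """Assemble the argv for `execlave test` using only the documented flags."""
    cmd = ["execlave", "test", "--agent", agent_id]
    if record:
        cmd.append("--record")
    if min_score is not None:
        cmd += ["--min-score", str(min_score)]
    return cmd


def run_gate(
    agent_id: str,
    record: bool = False,
    min_score: Optional[float] = None,
) -> bool:
    """Run the probe suite and return True if the CLI exited with code 0.

    Raises FileNotFoundError if the execlave binary is not on PATH.
    """
    completed = subprocess.run(build_command(agent_id, record, min_score))
    return completed.returncode == 0


# Show the commands without executing them.
print(build_command("my-agent", record=True))
print(build_command("my-agent", min_score=0.85))
```

In a pull-request check, a script would call `run_gate("my-agent")` and fail the job when it returns False.
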
To record a score directly — for example from an external red-team tool — use the PATCH endpoint:

    curl -X PATCH https://api.execlave.com/api/v1/agents/agt_01j.../red-team-score \
      -H "Authorization: Bearer $EXECLAVE_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "score": 0.94,
        "metadata": {
          "probe_suite": "built-in-v1",
          "run_id": "run_01j...",
          "recorded_at": "2026-06-02T11:00:00Z"
        }
      }'

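
An external red-team tool written in Python might build the same request with the standard library, as sketched below. The helper name `build_score_request`, the agent ID `agt_example`, and the placeholder API key are hypothetical. The client-side range check is an assumption based on the score being defined in [0, 1], and the response format is not described here, so the sketch stops before sending.

```python
import json
import urllib.request

API_BASE = "https://api.execlave.com/api/v1"


def build_score_request(
    agent_id: str,
    score: float,
    metadata: dict,
    api_key: str,
) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that records a score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    body = json.dumps({"score": score, "metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/agents/{agent_id}/red-team-score",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


# "agt_example" and the key are placeholders, not real identifiers.
request = build_score_request(
    "agt_example",
    0.94,
    {"probe_suite": "built-in-v1"},
    api_key="placeholder-key",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```
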
§ 05

Enabling the gate (FF_REDTEAM_GATE)

The Red-Team Gate is controlled by the feature flag FF_REDTEAM_GATE, which defaults to off. With the flag off, autonomy promotion is completely unchanged from prior behavior. No existing agent workflows are affected by deploying the migration.

Enable the flag once you have run execlave test --record on each agent you intend to promote to autonomous. Agents that have not yet been tested will fail promotion with a clear "score missing" error, prompting you to run the suite before proceeding.
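
Before turning the flag on, it can help to list which agents would be blocked. The sketch below assumes you have already collected each agent's recorded score into a mapping (how you fetch those scores is up to you; no listing endpoint is described here). The function name and the sample agent names are hypothetical.

```python
from typing import Mapping, Optional


def agents_blocked_by_gate(
    scores: Mapping[str, Optional[float]],
    threshold: float = 0.9,
) -> dict[str, str]:
    """Map each agent that would fail autonomous promotion to the reason."""
    blocked: dict[str, str] = {}
    for agent_id, score in scores.items():
        if score is None:
            blocked[agent_id] = "score missing"
        elif score < threshold:
            blocked[agent_id] = f"score {score} below {threshold}"
    return blocked


# Hypothetical inventory: None means the suite was never recorded for that agent.
inventory = {"billing-agent": 0.96, "triage-agent": 0.82, "search-agent": None}
for agent_id, reason in agents_blocked_by_gate(inventory).items():
    print(f"{agent_id}: {reason}")
```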

§ 06

Frequently asked questions

What probes does the execlave test suite run?
The built-in suite covers three adversarial probe categories: prompt injection attacks (attempts to override system instructions via user-turn content), jailbreak probes (attempts to circumvent policy constraints through indirect framing), and exfiltration-style probes (attempts to surface sensitive context through indirect elicitation). Each probe is scored independently; the overall resilience score is the aggregate pass rate across all probes, a value in [0, 1].
Can I run the test suite in CI without recording the score?
Yes. Omit the --record flag to run the full probe suite, print the score, and exit with a non-zero code if the score is below threshold — without persisting anything to the agent record. This is the recommended pattern for pull-request checks. Add --record only when you intend to advance the agent toward autonomous promotion.
What happens when FF_REDTEAM_GATE is off?
When the feature flag FF_REDTEAM_GATE is disabled (the default), the autonomy promotion path is unchanged from prior behavior — the red_team_score field is ignored entirely. Agents already carrying a recorded score retain it; the value simply has no effect on promotion gating until the flag is enabled.
Can I set a custom resilience threshold lower than 0.9?
The 0.9 threshold is the system default enforced by the autonomy service. If your deployment requires a different value, it can be adjusted via environment configuration. The CLI --min-score flag lets you override the exit-code threshold locally for development testing without changing the server-side promotion gate.
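
A short worked example of the difference between the two thresholds: with --min-score 0.85, a score of 0.87 produces a passing exit code locally, yet the same score would still be blocked by the server-side gate at its default of 0.9. The function names below are illustrative only.

```python
SERVER_THRESHOLD = 0.9  # default enforced by the autonomy service


def cli_exit_code(score: float, min_score: float = 0.9) -> int:
    """Local exit code: 0 if the score meets the CLI threshold, else 1."""
    return 0 if score >= min_score else 1


def server_gate_passes(score: float, threshold: float = SERVER_THRESHOLD) -> bool:
    """Server-side check applied on promotion to autonomous."""
    return score >= threshold


score = 0.87
print("local exit code with --min-score 0.85:", cli_exit_code(score, min_score=0.85))
print("server gate passes:", server_gate_passes(score))
```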