§ BENCHMARKS
Enforcement latency, measured honestly
Governance only works if it is fast enough to sit in the request path. This page documents how we measure Execlave's enforcement latency and throughput — and publishes the numbers once they are measured under this methodology.
The pattern-tier and server-side enforce figures below are measured by checked-in harnesses (see methodology). The semantic tier and scaled throughput are marked not yet published — we publish only figures measured under the methodology on this page, never estimates or marketing round-numbers.
§ RESULTS
Pattern tier — measured
In-process policy evaluation latency (parse-cached CEL), excluding network and database time.
Server-side enforce decision — measured
The real PolicyService.enforcePreExecution path: agent-status lookup + policy load + evaluation, DB-inclusive, against a local Postgres. Excludes HTTP/auth framing and the model-backed semantic tier.
Semantic tier & scaled throughput — pending
These figures are dominated by the chosen LLM (semantic tier) and by worker fan-out (throughput). They will be measured on representative infrastructure before publication, never estimated.
§ METHODOLOGY
How we measure
So the published numbers are reproducible and comparable across releases.
What the measured figures cover
The pattern-tier figures cover in-process evaluation only: the cost of evaluating a trace against a set of policies in the enforcement engine (parse-cached CEL expression evaluation), excluding network and database time. This is the part of enforcement latency Execlave controls directly. It is reported as p50/p95/p99 rather than as an average, because tail latency is what governance SLAs care about.
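To illustrate why percentiles are reported instead of an average, here is a minimal sketch of a p50/p95/p99 summary. It assumes the nearest-rank percentile definition; the checked-in harness may compute percentiles differently, and the sample values are hypothetical, not Execlave measurements.

```typescript
// Nearest-rank percentile over latency samples (any unit).
// Assumption: nearest-rank method; the real harness may use another definition.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; multiply before dividing to keep integer inputs exact.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

function summarize(samples: number[]) {
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
    mean,
  };
}

// Hypothetical samples: 98 fast calls plus 2 slow outliers.
const hypotheticalSamples: number[] = [
  ...new Array<number>(98).fill(10),
  500,
  500,
];
const summary = summarize(hypotheticalSamples);
console.log(summary);
```

In this hypothetical data set the mean sits between the fast calls and the outliers and matches no request that actually happened, while p99 lands on the slow tail directly.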
Measurement provenance
Measured on Node.js v22.20.0, single process, one core, caches warmed. Each "pass" evaluates a representative set of 5 expression policies (cost/size/environment comparisons) against one trace, over 200,000 iterations after a 20,000-iteration warmup. The harness is checked in at backend/scripts/bench-enforcement.ts and calls the same evaluateExpression path the runtime uses. Re-run it to reproduce the measurement on your own hardware.
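The warmup-then-measure structure described above can be sketched as follows. This is not the checked-in harness: the `Trace` shape, the five policy functions, and `runPass` are hypothetical stand-ins for the real policies and the evaluateExpression call, and timing each pass individually with `performance.now()` is an assumption about how samples are collected.

```typescript
// Hypothetical trace shape; the real trace type is not shown in this page.
type Trace = { costUsd: number; sizeBytes: number; environment: string };
type Policy = (t: Trace) => boolean;

// Hypothetical stand-ins for 5 cost/size/environment comparison policies.
const policies: Policy[] = [
  (t) => t.costUsd < 1.0,
  (t) => t.costUsd >= 0,
  (t) => t.sizeBytes <= 1_000_000,
  (t) => t.sizeBytes > 0,
  (t) => t.environment !== "sandbox",
];

// One "pass": evaluate every policy against one trace.
function runPass(trace: Trace): number {
  let passed = 0;
  for (const policy of policies) {
    if (policy(trace)) passed++;
  }
  return passed;
}

// Returns one latency sample per pass, in microseconds.
function bench(trace: Trace, iterations: number, warmup: number): number[] {
  let sink = 0; // accumulate results so the work cannot be optimized away
  for (let i = 0; i < warmup; i++) sink += runPass(trace); // untimed warmup
  const samplesUs = new Array<number>(iterations);
  for (let i = 0; i < iterations; i++) {
    const t0 = performance.now(); // milliseconds, fractional
    sink += runPass(trace);
    samplesUs[i] = (performance.now() - t0) * 1000;
  }
  if (sink < 0) throw new Error("unreachable"); // keeps `sink` live
  return samplesUs;
}

const sampleTrace: Trace = { costUsd: 0.02, sizeBytes: 2048, environment: "production" };
const passSamples = bench(sampleTrace, 200_000, 20_000);
console.log(`collected ${passSamples.length} samples`);
```

At microsecond scale the timer calls themselves add measurable overhead, so a harness may instead time batches of passes and divide; the sketch keeps per-pass timing only because it is easier to read.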
End-to-end measurement
The server-side figures come from driving the real PolicyService.enforcePreExecution path against a live Postgres: agent-status lookup, policy load, and expression evaluation, including every database round-trip the SDK enforce endpoint makes, under the least-privilege app role with row-level security on. Measured on Node.js v22.20.0 against a local Postgres (sub-millisecond network), over 5,000 calls after a 500-call warmup; the harness is checked in at backend/scripts/bench-enforce-e2e.ts. The figures exclude the HTTP/auth framing (sub-millisecond, and not part of "enforcement") and the model-backed semantic tier. Production database round-trips are slower than localhost, so treat these as a floor, not a ceiling.
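A DB-inclusive measurement differs from the in-process one mainly in that each call is asynchronous. The sketch below shows one way to time such calls; it assumes calls are issued sequentially (one in flight at a time), which this page does not state. `EnforceFn` and the stub are hypothetical placeholders for a call into PolicyService.enforcePreExecution, whose real signature is not shown here.

```typescript
// Hypothetical signature standing in for one enforce decision.
type EnforceFn = () => Promise<unknown>;

// Returns one latency sample per call, in milliseconds.
async function benchAsync(
  enforce: EnforceFn,
  calls: number,
  warmup: number,
): Promise<number[]> {
  // Untimed warmup: lets connection pools and caches settle first.
  for (let i = 0; i < warmup; i++) await enforce();
  const samplesMs: number[] = [];
  for (let i = 0; i < calls; i++) {
    const t0 = performance.now();
    await enforce(); // assumption: sequential, one call in flight at a time
    samplesMs.push(performance.now() - t0);
  }
  return samplesMs;
}

// Stub decision with no database behind it, for illustration only.
const stubEnforce: EnforceFn = async () => ({ allowed: true });

benchAsync(stubEnforce, 5_000, 500).then((samplesMs) => {
  console.log(`collected ${samplesMs.length} samples`);
});
```

Because the stub does no I/O, its timings say nothing about real enforcement latency; only the loop structure carries over.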
What is NOT in these numbers
The semantic tier (model-backed checks via the Python processing service) depends on the chosen LLM and is not yet published. Horizontally-scaled throughput (traces/sec across workers) is likewise not yet published. We do not blend the microsecond evaluation, the millisecond server-side decision, and the model-backed tier into one headline figure, and we do not estimate the tiers we have not measured.
Why per-tier, never blended
Pattern-tier (in-process, microsecond-scale) and semantic-tier (model-backed, millisecond-to-second-scale) enforcement differ by orders of magnitude. Collapsing them into a single "enforcement latency" number would be misleading, so each tier is reported separately and labelled with exactly what it includes.
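The arithmetic behind this can be sketched with placeholder magnitudes. The two constants below are round orders of magnitude chosen only for illustration (1 microsecond and 1 second); they are not Execlave measurements, and `semanticShare` is a hypothetical traffic mix.

```typescript
// Placeholder magnitudes only, NOT measured figures.
const PATTERN_US = 1; // microsecond scale
const SEMANTIC_US = 1_000_000; // second scale, expressed in microseconds

// Mean latency of a traffic mix in which `semanticShare` of requests
// (between 0 and 1) go through the semantic tier.
function blendedMeanUs(semanticShare: number): number {
  if (semanticShare < 0 || semanticShare > 1) {
    throw new Error("semanticShare must be in [0, 1]");
  }
  return (1 - semanticShare) * PATTERN_US + semanticShare * SEMANTIC_US;
}

for (const share of [0.01, 0.1, 0.5]) {
  console.log(
    `semantic share ${share}: blended mean ${blendedMeanUs(share).toFixed(0)} µs`,
  );
}
```

With these placeholders, even a 1% semantic share yields a blended mean roughly 10,000 times the pattern-tier value and roughly 100 times below the semantic-tier value, so the single figure describes neither tier and shifts with the traffic mix rather than with enforcement performance.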