Benchmarks
Benchmark reporting is meant to demonstrate discipline, not puffery. Public numbers are directional; reproduce them on your own hardware and workload before making any decision.
Public benchmark snapshot
- 5 canonical scenes: contact, constraints, vehicles, cloth, and fluid lanes
- 20% regression gate: current conservative CI threshold
- CSV / JSON / HTML: artifact trail published for review
- Reference machine noted: i7-12700K + RTX 3080 in the package overview
Canonical scene sweep
| Scene | Scale | Avg step | Max step | Why it matters |
|---|---|---|---|---|
| Sphere Stack (10K) | 10,000 bodies | 5.234 ms | 6.123 ms | contact stress |
| Ragdoll Stack | 100 ragdolls / 500 bodies | 3.891 ms | 4.234 ms | constraint complexity |
| Vehicle Scene | 10 vehicles / 50 bodies | 2.456 ms | 2.890 ms | mixed constraints + friction |
| Cloth + Wind | 500–1000 particles | 0.500 ms | 0.700 ms | soft-body deformation lane |
| Fluid Spray | 10,000+ particles | 1.200 ms | 1.500 ms | particle persistence |
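The avg/max step figures above can be collected with a simple timing harness. The sketch below is illustrative only: `measure_steps` and the dummy workload are hypothetical stand-ins for the engine's actual world-step call and instrumentation.

```python
import time

def measure_steps(step_fn, n_steps=1000):
    """Time n_steps calls to step_fn; return (avg, max) step cost in ms."""
    samples = []
    for _ in range(n_steps):
        t0 = time.perf_counter()
        step_fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return sum(samples) / len(samples), max(samples)

# Dummy workload standing in for a physics world step.
avg_ms, max_ms = measure_steps(lambda: sum(i * i for i in range(10_000)))
print(f"avg {avg_ms:.3f} ms, max {max_ms:.3f} ms")
```

Reporting both average and worst-case step cost, as the table does, matters because a smooth average can hide frame-time spikes.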
Public figures are taken from the reference package snapshot; re-run them on the target buyer's workload before any decision.
Artifact trail
- Baseline file: proof/EXPECTED_OUTPUTS/world_step_benchmark_baseline.json
- Trend history: proof/RUNS/world_step_benchmark_history.csv
- Trend snapshots: proof/RUNS/world_step_benchmark_trend_<timestamp>.json
- Current policy: conservative 20% regression threshold until CI variance is better characterized
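The 20% regression gate can be expressed as a comparison of current results against the stored baseline. This is a minimal sketch under assumed data shapes: the scene names and the flat scene-to-milliseconds mapping are illustrative, not the actual schema of the baseline JSON.

```python
THRESHOLD = 0.20  # conservative 20% regression gate

def check_regressions(baseline, current, threshold=THRESHOLD):
    """Return scenes whose avg step cost grew past the threshold vs. baseline."""
    failures = []
    for scene, base_ms in baseline.items():
        cur_ms = current.get(scene)
        if cur_ms is not None and cur_ms > base_ms * (1 + threshold):
            failures.append((scene, base_ms, cur_ms))
    return failures

# In CI, `baseline` would be loaded from the baseline JSON artifact.
baseline = {"sphere_stack_10k": 5.234, "ragdoll_stack": 3.891}
current = {"sphere_stack_10k": 6.500, "ragdoll_stack": 3.900}
print(check_regressions(baseline, current))
# → [('sphere_stack_10k', 5.234, 6.5)]  (6.500 > 5.234 * 1.2)
```

A CI job would fail the build when the returned list is non-empty, which is what makes the threshold a gate rather than a report.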
How the page should be read
- What is measured: world-step cost, scene scaling, and subsystem behavior across named workloads.
- What is not claimed: universal performance leadership across every machine, build mode, and scene type.
- Why this matters: the key buyer signal is repeatability, named scenes, stored artifacts, and a visible regression policy.