Here’s a clear, SPRT-style reading of your head-to-head match.
SPRT interpretation — Revolution_BASE vs Revolution_DEV
What the data says
- Scoreline & volume: 1,960 games played; Revolution_BASE scored 58% vs 42% for Revolution_DEV with an overall draw rate ≈48%.
- Elo gap (two methods):
  - BayesElo: BASE +25 Elo vs DEV −25 Elo → ~+50 Elo gap.
  - Ordo: BASE 3757 vs DEV 3700 → +57 Elo gap; White advantage = 0.00 reported for the pool.
Uncertainty & confidence
- BayesElo reports ±7 Elo per player; treating these as independent standard errors and combining them in quadrature gives SE(diff) = √(7² + 7²) ≈ 9.9 Elo, so a 95% CI for the gap is roughly +30 to +70 Elo (50 ± 1.96 × 9.9). That implies an overwhelming probability (>99.99%) that BASE is stronger given this sample (1,960 games).
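The interval arithmetic above can be reproduced with a minimal sketch. It assumes the two ±7 figures behave like independent standard errors and that a normal approximation is adequate; the function names are illustrative only and do not come from BayesElo.

```python
import math

def gap_interval(gap, err_a, err_b, z=1.96):
    """Combine two per-player error terms in quadrature.

    Returns (se, low, high) for the rating gap. Assumes the two errors
    are independent standard errors (an assumption, not a BayesElo fact).
    """
    se = math.sqrt(err_a ** 2 + err_b ** 2)
    return se, gap - z * se, gap + z * se

def prob_positive(gap, se):
    """P(true gap > 0) under a normal approximation centred on the observed gap."""
    return 0.5 * (1.0 + math.erf(gap / (se * math.sqrt(2.0))))

se, low, high = gap_interval(50.0, 7.0, 7.0)
print(f"SE(diff) = {se:.1f} Elo, 95% CI = [{low:+.1f}, {high:+.1f}]")
print(f"P(BASE stronger) = {prob_positive(50.0, se):.6f}")
```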
SPRT verdict (how to read this)
- In typical engine testing, the SPRT is set up as H₀: DEV is not stronger (≤0 Elo) vs H₁: DEV is stronger by a target (e.g., +10 Elo).
- Your results show DEV is ~50–57 Elo weaker than BASE. Therefore, a standard SPRT would very quickly accept H₀ and reject H₁ (i.e., no improvement; in fact, a clear regression) at conventional error rates (e.g., α = β = 5%).
- Put simply: the test decisively says “DEV does not meet the gain target and is significantly worse than BASE.”
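To make the verdict concrete, here is a rough sketch of the log-likelihood ratio (LLR) such a test would accumulate, using the common normal approximation on win/draw/loss counts. Two assumptions to note: the counts are reconstructed from the rounded percentages above (42% score, 48% draws, 1,960 games, seen from DEV's side), and Fastchess with penta=true uses a pentanomial model, so its reported LLR would differ somewhat.

```python
import math

def expected_score(elo):
    """Logistic expected score for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_bounds(alpha, beta):
    """Wald's lower and upper LLR stopping bounds."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

def llr_normal_approx(wins, draws, losses, elo0, elo1):
    """Normal-approximation LLR for H1 (elo1) against H0 (elo0)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean ** 2  # per-game score variance
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)

# Approximate DEV counts rebuilt from rounded percentages (assumption).
wins, draws, losses = 353, 941, 666

llr = llr_normal_approx(wins, draws, losses, elo0=0.0, elo1=10.0)
lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
print(f"LLR = {llr:.2f}, bounds = [{lower:.2f}, {upper:.2f}]")
if llr <= lower:
    print("H0 accepted: DEV does not reach the +10 Elo target")
elif llr >= upper:
    print("H1 accepted")
else:
    print("No decision yet")
```

In a live SPRT the LLR is checked as games arrive, so a deficit this large would cross the lower bound long before 1,960 games were played.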
Practical takeaway
- Treat the current DEV as a regression of roughly 50–57 Elo at this time control and in this pool. Review the latest patches in DEV (evaluation, pruning/LMP/LMR, move ordering, time management) to identify where the strength was lost, then re-run a smaller confirmation SPRT once fixes land.
Quick cross-check (sanity)
- A 58% score translates to about +56 Elo via the standard conversion 400 · log₁₀(0.58/0.42), which aligns with both BayesElo (+50) and Ordo (+57). The consistency across tools supports the conclusion.
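The conversion used in this cross-check is easy to script. A minimal sketch follows; the function name is illustrative.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (strictly between 0 and 1) to an Elo difference
    using the logistic model: 400 * log10(score / (1 - score))."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

print(f"{elo_from_score(0.58):+.1f} Elo")
```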
Here are copy-paste ready Fastchess SPRT commands (Windows) for your typical targets. I’ve used the correct Fastchess syntax for engine options (option.* rather than initstr), included a sane Fishtest-style adjudication and openings setup, and set a large cap on rounds; the run stops automatically as soon as the SPRT hits a boundary.
Standard patch test — H₀: 0 Elo vs H₁: +10 Elo (α=β=5%)
fastchess.exe ^
-recover -repeat -games 2 -rounds 2000 -concurrency 4 -output format=cutechess ^
-report penta=true -ratinginterval 50 -scoreinterval 50 ^
-openings file="C:\books\UHO_Lichess_4852_v1.epd" format=epd order=random plies=16 ^
-resign movecount=3 score=600 -draw movenumber=34 movecount=8 score=20 ^
-engine name="Revolution_DEV" cmd="C:\engines\revolution_dev.exe" dir="C:\engines" tc=1.0+0.01 option.Hash=32 ^
-engine name="Revolution_BASE" cmd="C:\engines\revolution_base.exe" dir="C:\engines" tc=1.0+0.01 option.Hash=32 ^
-each proto=uci option.Threads=1 ^
-pgnout "C:\tests\sprt_dev_vs_base.pgn" ^
-sprt elo0=0 elo1=10 alpha=0.05 beta=0.05
Why this layout?
- option.Threads=1 per engine keeps the STC runs repeatable, and -concurrency sets how many games run in parallel. The command above uses 4; with your 48 logical threads you can raise it as far as -concurrency 24 (2 engines × 24 games in parallel).
- Cutechess-style output (-output format=cutechess) gives familiar, tool-friendly logs.
- The -rounds value is just a cap (2000 above; raise it, e.g. to 20000, for more headroom); SPRT will stop early once a boundary is crossed.
Citations for syntax and options: the Fastchess docs/examples for -engine … option.*, -each proto=uci, -openings … format=epd … plies, adjudication flags, etc., match the Stockfish “Running Fastchess” page (official-stockfish.github.io). Cutechess-style output is documented in the Fastchess README’s “Enhanced Cutechess Output” section (GitHub).
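The concurrency arithmetic above (2 engines × 24 games on 48 logical threads) can be sketched as a small helper. The function name and defaults are hypothetical, and the result is an upper bound, not a Fastchess recommendation; many testers leave a few threads free for the operating system.

```python
import os

def max_concurrency(logical_threads, threads_per_engine=1, engines_per_game=2):
    """Upper bound on games in parallel so engine threads do not exceed logical threads."""
    per_game = threads_per_engine * engines_per_game
    return max(1, logical_threads // per_game)

print(max_concurrency(48))                   # the 48-thread machine discussed above
print(max_concurrency(os.cpu_count() or 1))  # whatever machine this runs on
```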
Micro-gain test — H₀: 0 Elo vs H₁: +5 Elo (α=β=5%)
Use this when you expect very small improvements and can tolerate longer runs.
fastchess.exe ^
-recover -repeat -games 2 -rounds 3000 -concurrency 2 -output format=cutechess ^
-report penta=true -openings file="C:\books\UHO_Lichess_4852_v1.epd" format=epd order=random plies=16 ^
-resign movecount=3 score=600 -draw movenumber=34 movecount=8 score=20 ^
-engine name="Revolution_DEV" cmd="C:\engines\revolution_dev.exe" dir="C:\engines" tc=1.0+0.01 option.Hash=32 ^
-engine name="Revolution_BASE" cmd="C:\engines\revolution_base.exe" dir="C:\engines" tc=1.0+0.01 option.Hash=32 ^
-each proto=uci option.Threads=1 ^
-pgnout "C:\tests\sprt_dev_vs_base_5elo.pgn" ^
-sprt elo0=0 elo1=5 alpha=0.05 beta=0.05
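Why the +5 Elo test runs longer can be estimated with Wald's approximation for the expected stopping time. The sketch below rests on several assumptions: the true gain equals the H₁ target, scores follow a win/draw/loss model with a 48% draw rate (borrowed from the match above), the normal approximation holds, and overshoot at the boundary is ignored. Treat the output as an order-of-magnitude guide, not a prediction of what Fastchess will report.

```python
import math

def expected_score(elo):
    """Logistic expected score for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def wald_expected_games(elo1, draw_rate, alpha=0.05, beta=0.05, elo0=0.0):
    """Approximate expected number of games when the true strength equals elo1."""
    s0, s1 = expected_score(elo0), expected_score(elo1)
    var = (1.0 - draw_rate) / 4.0          # per-game score variance near a 50% score
    drift = (s1 - s0) ** 2 / (2.0 * var)   # mean LLR gained per game under H1
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    expected_llr_at_stop = (1.0 - beta) * upper + beta * lower
    return expected_llr_at_stop / drift

games_10 = wald_expected_games(10.0, draw_rate=0.48)
games_5 = wald_expected_games(5.0, draw_rate=0.48)
print(f"+10 Elo target: ~{games_10:.0f} games")
print(f"+5 Elo target:  ~{games_5:.0f} games ({games_5 / games_10:.1f}x)")
```

Halving the target roughly quadruples the expected number of games, because the per-game LLR drift scales with the square of the score difference between the two hypotheses.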
Quick regression check (blitzier) — H₀: 0 vs H₁: +10 Elo at 10+0.1
fastchess.exe ^
-recover -repeat -games 2 -rounds 1500 -concurrency 2 -output format=cutechess ^
-report penta=true -openings file="C:\books\UHO_Lichess_4852_v1.epd" format=epd order=random plies=16 ^
-resign movecount=3 score=600 -draw movenumber=34 movecount=8 score=20 ^
-engine name="Revolution_DEV" cmd="C:\engines\revolution_dev.exe" dir="C:\engines" tc=10+0.1 option.Hash=64 ^
-engine name="Revolution_BASE" cmd="C:\engines\revolution_base.exe" dir="C:\engines" tc=10+0.1 option.Hash=64 ^
-each proto=uci option.Threads=1 ^
-pgnout "C:\tests\sprt_dev_vs_base_10s.pgn" ^
-sprt elo0=0 elo1=10 alpha=0.05 beta=0.05
Notes & small gotchas
- Don’t use initstr. Fastchess sets UCI options via option.<Name>=<Value>, either per engine or under -each (e.g., option.Hash=32, option.Threads=1). That’s the supported path and matches the official example (official-stockfish.github.io).
- Openings input: -openings file=… format=epd|pgn order=random plies=N is supported; the Stockfish page shows a working template (official-stockfish.github.io).
- Resume a stopped run: Fastchess saves state (config.json), and you can resume with -config file=config.json. Handy if Windows reboots mid-test (dogeystamp.com).

Jorge Ruiz