SPRT: Revolution_BASE vs Revolution_DEV

Here’s a clear, SPRT-style reading of your head-to-head match.

SPRT interpretation — Revolution_BASE vs Revolution_DEV

What the data says

  • Scoreline & volume: 1,960 games played; Revolution_BASE scored 58% vs 42% for Revolution_DEV with an overall draw rate ≈48%.
  • Elo gap (two methods):
    • BayesElo: BASE +25 Elo vs DEV −25 Elo, a gap of about +50 Elo.
    • Ordo: BASE 3757 vs DEV 3700, a gap of +57 Elo; White advantage = 0.00 reported for the pool.

Uncertainty & confidence

  • BayesElo reports ±7 Elo per player. Treating that figure as a standard error and combining the two in quadrature gives SE(diff) = √(7² + 7²) ≈ 9.9 Elo, so a 95% CI for the gap is about +30 to +70 Elo. That implies an overwhelming probability (>99.99%) that BASE is stronger, given this sample of 1,960 games.
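The interval arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch under a normal approximation. It assumes, as the text does, that BayesElo's ±7 can be read as a per-player standard error and that the two errors are independent. The variable names are illustrative only.

```python
import math

# Values quoted in the text. Reading BayesElo's +/-7 as a standard error
# per player is an assumption carried over from the text.
gap = 50.0        # BASE minus DEV, in Elo
se_player = 7.0

# Independent errors add in quadrature.
se_diff = math.sqrt(se_player**2 + se_player**2)

# Two-sided 95% interval under a normal approximation.
z95 = 1.96
ci_low = gap - z95 * se_diff
ci_high = gap + z95 * se_diff

# P(gap > 0) from the standard normal CDF, written with erf.
z = gap / se_diff
p_base_stronger = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(f"SE(diff) = {se_diff:.1f} Elo")
print(f"95% CI   = {ci_low:+.1f} to {ci_high:+.1f} Elo")
print(f"P(BASE stronger) = {p_base_stronger:.7f}")
```

If the ±7 is instead a 95% bound rather than a standard error, the interval would be narrower, so the conclusion that BASE is stronger does not depend on this reading.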

SPRT verdict (how to read this)

  • In typical engine testing, the SPRT is set up as H₀: DEV is not stronger (≤0 Elo) vs H₁: DEV is stronger by a target (e.g., +10 Elo).
  • Your results show DEV is ~50–57 Elo weaker than BASE. Therefore, a standard SPRT would very quickly accept H₀ and reject H₁ (i.e., no improvement; in fact, a clear regression) at conventional error rates (e.g., α = β = 5%).
  • Put simply: the test decisively says “DEV does not meet the gain target and is significantly worse than BASE.”
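To make the verdict concrete, here is a rough sketch of the SPRT decision using Wald's stopping bounds and a normal-approximation log-likelihood ratio (LLR). It is not Fastchess's exact computation, which works from game-pair (pentanomial) statistics. The win/draw/loss counts are hypothetical: they are back-solved from the 1,960 games, the 42% score for DEV and the 48% draw rate, because the real counts are not given.

```python
import math

def expected_score(elo: float) -> float:
    """Logistic expected score for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_bounds(alpha: float, beta: float) -> tuple[float, float]:
    """Wald's lower and upper LLR stopping bounds."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

def approx_llr(wins: int, draws: int, losses: int,
               elo0: float, elo1: float) -> float:
    """Normal-approximation LLR of H1 (elo1) against H0 (elo0)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean**2  # per-game score variance
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)

# Hypothetical W/D/L from DEV's point of view, reconstructed from the
# percentages in the text (not the actual match counts).
games = 1960
draws = round(0.48 * games)
wins = round(0.42 * games - 0.5 * draws)
losses = games - wins - draws

lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
llr = approx_llr(wins, draws, losses, elo0=0.0, elo1=10.0)

print(f"bounds = ({lower:.2f}, {upper:.2f}), LLR = {llr:.1f}")
if llr <= lower:
    print("accept H0")
elif llr >= upper:
    print("accept H1")
else:
    print("continue")
```

With these reconstructed counts the LLR comes out near −20, far below the lower bound of about −2.94, which is why the test would have stopped in favour of H₀ long before 1,960 games.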

Practical takeaway

  • Treat the current DEV as a regression of roughly 50–57 Elo at this time control and pool. Focus on the latest patches in DEV (evaluation, pruning/LMP/LMR, move ordering, time management) to identify where the strength was lost, then re-run a smaller confirmation SPRT once fixes land.

Quick cross-check (sanity)

  • A 58% score translates to +56 Elo via the standard conversion 400·log₁₀(0.58/0.42), which aligns with both BayesElo (+50) and Ordo (+57). The consistency across tools supports the conclusion.
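The conversion used in this cross-check is short enough to verify directly. The sketch below implements the logistic score-to-Elo formula and its inverse; the function names are illustrative and not taken from any rating tool.

```python
import math

def score_to_elo(score: float) -> float:
    """Elo difference implied by an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_to_score(elo: float) -> float:
    """Inverse conversion: expected score for an Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

elo = score_to_elo(0.58)
print(round(elo))                    # 56
print(round(elo_to_score(elo), 2))   # 0.58
```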

Here are copy-paste-ready Fastchess SPRT commands (Windows) for your typical targets. I’ve used the correct Fastchess syntax for engine options (no initstr; use option.*), included a sane Fishtest-style adjudication and openings setup, and set a -rounds cap so the run stops automatically as soon as SPRT hits a boundary.

Standard patch test — H₀: 0 Elo vs H₁: +10 Elo (α=β=5%)

fastchess.exe ^
 -recover -repeat -games 2 -rounds 2000 -concurrency 4 -output format=cutechess ^
 -report penta=true -ratinginterval 50 -scoreinterval 50 ^
 -openings file="C:\books\UHO_Lichess_4852_v1.epd" format=epd order=random plies=16 ^
 -resign movecount=3 score=600 -draw movenumber=34 movecount=8 score=20 ^
 -engine name="Revolution_DEV"  cmd="C:\engines\revolution_dev.exe"  dir="C:\engines"  tc=1.0+0.01  option.Hash=32 ^
 -engine name="Revolution_BASE" cmd="C:\engines\revolution_base.exe" dir="C:\engines" tc=1.0+0.01  option.Hash=32 ^
 -each proto=uci option.Threads=1 ^
 -pgnout "C:\tests\sprt_dev_vs_base.pgn" ^
 -sprt elo0=0 elo1=10 alpha=0.05 beta=0.05

Why this layout?

  • option.Threads=1 per engine (repeatable STC). The command above uses -concurrency 4; on a machine with 48 logical threads you can raise it to -concurrency 24 (2 engines × 24 games in parallel).
  • -output format=cutechess for familiar, tool-friendly logs.
  • -rounds 2000 is just a cap (4,000 games with -games 2); SPRT will stop early once a boundary is crossed.

Citations for syntax and options: Fastchess docs/examples for -engine … option.*, -each proto=uci, -openings … format=epd … plies, adjudication flags, etc., match the Stockfish “Running Fastchess” page. official-stockfish.github.io
The cutechess output format is documented in the Fastchess README’s “Enhanced Cutechess Output”. GitHub

Micro-gain test — H₀: 0 Elo vs H₁: +5 Elo (α=β=5%)

Use this when you expect very small improvements and can tolerate longer runs.

fastchess.exe ^
 -recover -repeat -games 2 -rounds 3000 -concurrency 2 -output format=cutechess ^
 -report penta=true -openings file="C:\books\UHO_Lichess_4852_v1.epd" format=epd order=random plies=16 ^
 -resign movecount=3 score=600 -draw movenumber=34 movecount=8 score=20 ^
 -engine name="Revolution_DEV"  cmd="C:\engines\revolution_dev.exe"  dir="C:\engines"  tc=1.0+0.01  option.Hash=32 ^
 -engine name="Revolution_BASE" cmd="C:\engines\revolution_base.exe" dir="C:\engines" tc=1.0+0.01  option.Hash=32 ^
 -each proto=uci option.Threads=1 ^
 -pgnout "C:\tests\sprt_dev_vs_base_5elo.pgn" ^
 -sprt elo0=0 elo1=5 alpha=0.05 beta=0.05

Quick regression check (blitzier) — H₀: 0 vs H₁: +10 Elo at 10+0.1

fastchess.exe ^
 -recover -repeat -games 2 -rounds 1500 -concurrency 2 -output format=cutechess ^
 -report penta=true -openings file="C:\books\UHO_Lichess_4852_v1.epd" format=epd order=random plies=16 ^
 -resign movecount=3 score=600 -draw movenumber=34 movecount=8 score=20 ^
 -engine name="Revolution_DEV"  cmd="C:\engines\revolution_dev.exe"  dir="C:\engines"  tc=10+0.1 option.Hash=64 ^
 -engine name="Revolution_BASE" cmd="C:\engines\revolution_base.exe" dir="C:\engines" tc=10+0.1 option.Hash=64 ^
 -each proto=uci option.Threads=1 ^
 -pgnout "C:\tests\sprt_dev_vs_base_10s.pgn" ^
 -sprt elo0=0 elo1=10 alpha=0.05 beta=0.05

Notes & small gotchas

  • Don’t use initstr. Fastchess sets UCI options via option.<Name>=<Value> either per -engine or under -each (e.g., option.Hash=32, option.Threads=1). That’s the supported path and matches the official example. official-stockfish.github.io
  • Openings input: -openings file=… format=epd|pgn order=random plies=N is supported; the Stockfish page shows a working template. official-stockfish.github.io
  • Resume a stopped run: Fastchess saves state (config.json)—you can resume with -config file=config.json. Handy if Windows reboots mid-test. dogeystamp.com

Jorge Ruiz

connoisseur of both chess and anthropology, a combination that reflects his deep intellectual curiosity and passion for understanding both the art of strategic chess books
