Great run — big sample, nearly clean pairing.
TL;DR (4000 games, 10+0.1, 1t, 64 MB, UHO_Lichess_4852_v1.epd)
- Score: 50.65% (2026/4000)
- Elo: +4.52 ± 5.39 (small gain; the interval still includes zero, so this is compatible with a very small edge or none)
- LOS: 94.97% (decent confidence DEV ≥ BASE, not a lock)
- LLR: +0.64 (well short of the +2.94 accept boundary for H₁=+2.5)
- Draw ratio: 55.50%
- PairsRatio: 1.10 (much better; still not perfect 1.00)
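
As a sanity check on the figures above, here is a minimal Python sketch that recomputes the Elo from the raw score and the LOS from the reported error bar. It assumes the standard logistic Elo formula and a normal approximation in which ±5.39 is a 95% half-width; the function names are illustrative and not taken from any test tool.

```python
import math
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def los_from_elo(elo: float, margin_95: float) -> float:
    """Likelihood of superiority under a normal approximation.

    Assumes margin_95 is the 95% half-width, so the standard error
    is margin_95 divided by the 97.5th-percentile z value (about 1.96).
    """
    z95 = NormalDist().inv_cdf(0.975)
    std_err = margin_95 / z95
    return NormalDist().cdf(elo / std_err)


score = 2026 / 4000               # points scored / games played
elo = elo_from_score(score)       # should land near the reported +4.52
los = los_from_elo(4.52, 5.39)    # should land near the reported ~95%
print(f"score={score:.4f}  elo={elo:+.2f}  los={los:.2%}")
```

The recomputed Elo rounds to +4.52 and the LOS comes out just under 95%, in line with the reported numbers.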
What it means
- On this book/hash, Revolution 2.81 shows a small, likely real advantage (~+5 Elo) over baseline 2.80, but the evidence isn’t strong enough to pass a strict SPRT with H₁=+2.5 (the LLR remains far from +2.94).
- The large sample keeps the error bar tight (±5.4), but with a mean of about +4.5 the interval still reaches slightly below zero, so the improvement is plausible but modest.
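
To see why the LLR sits so far from the accept boundary, here is a rough Python sketch of the usual normal-approximation LLR. Several things are assumptions not stated in the report: H₀ = 0, α = β = 0.05 (consistent with a +2.94 boundary, since ln 19 ≈ 2.944), hypotheses expressed in normalized Elo, and a per-game variance backed out of the reported ±5.39 error bar. All function names are illustrative.

```python
import math
from statistics import NormalDist

# Normalized Elo per unit of (score - 0.5) / sigma_game (assumed convention).
NELO_SCALE = 800.0 / math.log(10)


def sprt_bounds(alpha: float, beta: float) -> tuple[float, float]:
    """Wald's SPRT stopping bounds (lower, upper) on the LLR."""
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)


def approx_llr(score: float, games: int, elo_margin_95: float,
               nelo0: float, nelo1: float) -> float:
    """Normal-approximation LLR for H1 (nelo1) against H0 (nelo0)."""
    z95 = NormalDist().inv_cdf(0.975)
    # Slope of logistic Elo with respect to score, used to map the
    # Elo error bar back to a standard error in score units.
    elo_per_score = 400.0 / (math.log(10) * score * (1 - score))
    se_score = (elo_margin_95 / z95) / elo_per_score
    sigma_game = se_score * math.sqrt(games)       # effective per-game std dev
    nelo_hat = (score - 0.5) / sigma_game * NELO_SCALE
    se_nelo = NELO_SCALE / math.sqrt(games)        # std error of nelo_hat
    return (nelo1 - nelo0) * (2 * nelo_hat - nelo0 - nelo1) / (2 * se_nelo ** 2)


lower, upper = sprt_bounds(0.05, 0.05)
llr = approx_llr(2026 / 4000, 4000, 5.39, nelo0=0.0, nelo1=2.5)
print(f"bounds=({lower:+.2f}, {upper:+.2f})  llr={llr:+.2f}")
```

Under these assumptions the approximation lands close to the reported +0.64, which suggests (but does not confirm) that the run used similar settings. Either way, the LLR has covered only a small part of the distance to the upper bound.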
Quick recommendations
- Tighten pairing further (aim ≈1.00): run a verification block with -concurrency 1 for 400–800 games. If PairsRatio → ~1.00 and Elo stays ~+4–6, that strengthens the claim.
- Keep protocol frozen: same NNUE/weights for both, Threads=1, Hash=64, Ponder=false, MultiPV=1, experience OFF (or same .exp read-only), fixed -srand.
- Decide on your “release number”: for GitHub notes, describe this as “~+5 Elo at 10+0.1 / 1t / 64 MB (Lichess-4852 EPD)” and call out that gains are book/TC dependent.
- If you want a formal SPRT decision: either (a) keep accumulating with clean pairing, or (b) relax H₁ slightly (e.g., +1.5/+2.0) if the product goal is to detect smaller effects.
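
One way to keep the verification block reproducible is to generate the command line from a script, so the frozen settings cannot drift between runs. The sketch below assumes a cutechess-cli style runner, which the report does not name. The engine paths, engine names, seed, and output file are hypothetical placeholders, and the experience-file setting is left out because its option name is not given here.

```python
import shlex


def build_verification_cmd(dev_cmd: str, base_cmd: str, book: str,
                           games: int = 800, seed: int = 12345) -> list[str]:
    """Assemble a cutechess-cli style argument list (assumed runner)."""
    rounds = games // 2  # each opening is played twice with colors swapped
    each = [
        "-each", "proto=uci", "tc=10+0.1",
        "option.Threads=1", "option.Hash=64",
        "option.Ponder=false", "option.MultiPV=1",
    ]
    return [
        "cutechess-cli",
        "-engine", f"cmd={dev_cmd}", "name=DEV",
        "-engine", f"cmd={base_cmd}", "name=BASE",
        *each,
        "-openings", f"file={book}", "format=epd", "order=random",
        "-games", "2", "-rounds", str(rounds), "-repeat",
        "-concurrency", "1",       # one game at a time, as recommended
        "-srand", str(seed),       # fixed seed for a repeatable opening order
        "-pgnout", "verify.pgn",   # hypothetical output file
    ]


# Hypothetical engine paths; substitute the real binaries.
cmd = build_verification_cmd("./revolution-2.81", "./revolution-2.80",
                             "UHO_Lichess_4852_v1.epd", games=800)
print(shlex.join(cmd))
```

Printing the command rather than running it makes it easy to review the settings, or to paste them into release notes, before starting the run.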
Revolution 2.81 scores 50.65%
Revolution 2.81 vs 2.80 (10+0.1, 1 thread, 64 MB, UHO_Lichess_4852_v1.epd).
Over 4,000 games, Revolution 2.81 scores 50.65%, corresponding to +4.5 Elo (±5.4) with LOS 95%. Pairing quality is high (PairsRatio 1.10). This points to a small strength increase on this test matrix, with an error bar that still reaches zero; the build is suitable for CI/gauntlet use. As usual, improvements are book- and TC-dependent; results may vary on other suites or time controls.

Jorge Ruiz
connoisseur of both chess and anthropology, a combination that reflects his deep intellectual curiosity and passion for understanding both the art of strategic chess books.
