SPRT Test: Revolution 2.0 versus Baseline

Test Summary

Time control: 10+0.1 (10 seconds base plus 0.1 seconds increment per move, a short time control)
Engines tested: Revolution (baseline) vs Revolution1 (new version)
Games played: 2178
Book: UHO_2024_8mvs_+085_+094.pgn (UHO, Unbalanced Human Openings: 8-move lines that give White a deliberate edge, roughly +0.85 to +0.94 going by the file name)
Hash: 32 MB (kept deliberately small for throughput)
Threads: 1

✅ Interpreting Results

Score %: 47.54% for Revolution (baseline) → means Revolution1 scored higher (52.46%).
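Elo difference:
- Elo -17.08 ± 7.97 (from Revolution's perspective) → Revolution1 is about 17 Elo stronger, and the interval excludes zero.
- nElo -31.33 ± 14.59 → normalized Elo scales the score difference by the spread of the results, so it is not directly comparable to the plain Elo figure; on that scale Revolution1 is about 31 nElo ahead.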
LOS (Likelihood of Superiority): 0.00% → the estimated probability that Revolution (baseline) is the stronger engine is effectively zero, so Revolution1 is almost certainly the better one.
Draw ratio: 49.59% (540 of the 1089 game pairs finished 1-1), fairly typical for this TC and book.
Win/Loss ratio: 483W / 590L → Revolution1 wins ~22% more decisive games than it loses.
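Pentanomial vector: [26, 294, 540, 219, 10] → number of game pairs (each opening played once with each colour) in which Revolution scored 0, 0.5, 1, 1.5 and 2 points; the 1089 pairs account for all 2178 games.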
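LLR (Log-Likelihood Ratio): -1.16 → a negative value means the data so far speak against the hypothesis that Revolution is stronger, i.e. Revolution1 is favoured. The SPRT is only formally decided once the LLR crosses a stopping bound (about ±2.94 with the common α = β = 0.05 setting); the bounds used in this test are not stated here.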
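The Elo and nElo figures above can be reproduced from the pentanomial vector alone. The following Python sketch is one way to do it, not the test tool's own code: the helper name `pentanomial_summary` is hypothetical, the ± values are assumed to be 1.96 standard errors (a 95% interval), and the nElo scaling is simply the one that reproduces the reported number.

```python
import math

# Pentanomial counts from Revolution's (baseline) perspective:
# game pairs that scored 0, 0.5, 1, 1.5 and 2 points.
PENTA = [26, 294, 540, 219, 10]


def pentanomial_summary(counts, z=1.96):
    """Hypothetical helper: Elo and nElo with z-sigma margins from pair counts."""
    pairs = sum(counts)
    # Average score per game within each pair category: 0, 0.25, 0.5, 0.75, 1.
    scores = [i / 4 for i in range(5)]
    mean = sum(c * s for c, s in zip(counts, scores)) / pairs
    var = sum(c * (s - mean) ** 2 for c, s in zip(counts, scores)) / pairs
    se = math.sqrt(var / pairs)  # standard error of the mean score

    # Logistic Elo and its margin (delta method: slope of the Elo curve times SE).
    elo = -400 * math.log10(1 / mean - 1)
    slope = 400 / (math.log(10) * mean * (1 - mean))
    elo_margin = z * slope * se

    # Normalized Elo: score difference in units of the per-game standard deviation.
    scale = 800 / math.log(10)
    sigma_game = math.sqrt(2 * var)
    nelo = (mean - 0.5) / sigma_game * scale
    nelo_margin = z * se / sigma_game * scale

    return {
        "games": 2 * pairs,
        "score": mean,
        "elo": elo,
        "elo_margin": elo_margin,
        "nelo": nelo,
        "nelo_margin": nelo_margin,
    }


summary = pentanomial_summary(PENTA)
print(f"Games: {summary['games']}, score: {summary['score']:.2%}")
print(f"Elo:  {summary['elo']:.2f} +/- {summary['elo_margin']:.2f}")
print(f"nElo: {summary['nelo']:.2f} +/- {summary['nelo_margin']:.2f}")
```

With the counts from this test, the sketch gives a score of 47.54%, Elo -17.08 ± 7.97 and nElo -31.33 ± 14.59, matching the summary. That match is what suggests the ± figures are 95% margins rather than 1σ.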

🧾 Bottom Line

Revolution1 gained about +17 Elo over Revolution in this test (about +31 on the normalized Elo scale).
With 2178 games, the confidence interval is already quite tight (±8 Elo).
The LOS of 0% means we can confidently say Revolution1 is not weaker; it is stronger.
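To see why the LOS rounds to 0%, it can be estimated from the pentanomial counts with a normal approximation. This is a minimal sketch under that assumption; the function name `los_from_pentanomial` is hypothetical and the test tool may compute LOS differently.

```python
import math


def los_from_pentanomial(counts):
    """Hypothetical helper: P(true score > 50%) under a normal approximation."""
    pairs = sum(counts)
    scores = [i / 4 for i in range(5)]  # per-game score of each pair category
    mean = sum(c * s for c, s in zip(counts, scores)) / pairs
    var = sum(c * (s - mean) ** 2 for c, s in zip(counts, scores)) / pairs
    z = (mean - 0.5) / math.sqrt(var / pairs)
    # Standard normal CDF evaluated at z.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Counts from Revolution's (baseline) perspective.
los = los_from_pentanomial([26, 294, 540, 219, 10])
print(f"LOS for Revolution: {los:.2%}")
print(f"LOS for Revolution1: {1 - los:.2%}")
```

The baseline's score sits more than four standard errors below 50%, so its LOS is a tiny positive number that prints as 0.00%.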

I'll use the standard rule that the Elo uncertainty ($\sigma$) falls as $1/\sqrt{N}$, assuming the same TC, book and draw rate.

Current data:

$N_0 = 2178$ games
Current error: the reported $\pm 7.97$ Elo is a 95% margin ($1.96\sigma$). The pentanomial counts above give a standard error of about 4.07 Elo, and $1.96 \times 4.07 \approx 7.97$, so $\sigma_0 \approx 7.97 / 1.96 \approx 4.07$ Elo.

Projection formula: $N_{\text{target}} = N_0 \left(\frac{\sigma_0}{\sigma_{\text{target}}}\right)^2$

Case A) You want ±5 Elo as 1σ

$N_A = 2178 \left(\frac{4.07}{5}\right)^2 \approx \mathbf{1443\ games}$

This target is already met: the 2178 games played give $\sigma_0 \approx 4.07$ Elo, so no additional games are needed.

Case B) You want ±5 Elo at 95%

(95% ≈ $\pm 1.96\sigma$ ⇒ $\sigma_{\text{target}} = 5/1.96 \approx 2.55$; the ratio $\sigma_0/\sigma_{\text{target}}$ equals the ratio of the 95% margins, $7.97/5$.) $N_B = 2178 \left(\frac{7.97}{5}\right)^2 \approx \mathbf{5534\ games}$

Additional games ≈ 5534 − 2178 = 3356.
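The projection formula can be sketched as a small Python helper. The name `games_needed` is hypothetical; it assumes the current and target margins are quoted at the same confidence level and that the TC, book and draw rate stay the same.

```python
import math


def games_needed(n_current, margin_current, margin_target):
    """Total games for the target margin, using the 1/sqrt(N) rule of thumb.

    Both margins must be at the same confidence level (both 1-sigma or both 95%).
    """
    if n_current <= 0 or margin_current <= 0 or margin_target <= 0:
        raise ValueError("game count and margins must be positive")
    return math.ceil(n_current * (margin_current / margin_target) ** 2)


# Shrinking the current 7.97 Elo margin to 5 Elo at the same confidence level.
total = games_needed(2178, 7.97, 5.0)
extra = max(0, total - 2178)
print(f"Total games: {total}, additional games: {extra}")
```

For this test, the helper returns 5534 total games, i.e. 3356 more than the 2178 already played.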

Quick Notes

These figures assume the same conditions (10+0.1, 1 thread, 32 MB, same book and draw ratio ~50%).
If the draw rate changes, the per-game variance shifts a little (a higher draw rate lowers it), but the $1/\sqrt{N}$ rule of thumb is still a good guide for planning when to stop.
