SPRT Result — wordfish_dev_070925_v2.0.1 vs Revolution (10+0.1)
Status: Strong Regression (SPRT not yet terminated)
Test WF_dev_070925_v2.0.1_vs_REV_10+0.1_run2_784, opening book UHO_2024_8mvs_+085_+094.pgn, α = β = 0.05 (Wald bounds A ≈ -2.944, B ≈ +2.944).
Sequential evidence: LLR = -2.63, still above the lower bound A, so the SPRT has not formally terminated.
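
The quoted Wald bounds follow from the error rates alone. A minimal sketch, assuming the usual Wald approximation on the log-likelihood-ratio scale (the function name `wald_bounds` is illustrative, not part of any testing tool):

```python
import math

def wald_bounds(alpha: float, beta: float) -> tuple[float, float]:
    """Return Wald's approximate SPRT thresholds (A, B) on the LLR scale."""
    lower = math.log(beta / (1.0 - alpha))   # LLR <= lower: accept H0
    upper = math.log((1.0 - beta) / alpha)   # LLR >= upper: accept H1
    return lower, upper

A, B = wald_bounds(0.05, 0.05)
print(f"A = {A:.3f}, B = {B:+.3f}")  # A = -2.944, B = +2.944
```

With α = β the bounds are symmetric, which is why the report lists ±2.944.
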
| Metric | Value |
|---|---|
| Engines | wordfish_dev_070925_v2.0.1 vs Revolution |
| Time control | 10+0.1 (1 thread) |
| Games | 784 |
| Score | 343.5 / 784 (43.81 %) |
| W / L / D | 145 / 242 / 397 |
| Draw ratio | 50.64 % |
| Elo (±) | -43.21 ± 12.39 |
| nElo (±) | -85.75 ± 24.32 |
| LOS | 0.00 % |
| LLR / bounds | -2.63 (A=-2.944, B=+2.944) |
| Pentanomial | [11, 124, 210, 45, 2] |
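
The table entries can be cross-checked from the raw counts. The sketch below is a reconstruction under stated assumptions, not the testing tool's own code: it assumes the logistic Elo model, a 95 % interval built from the pentanomial (game-pair) variance, nElo defined as the mean score offset divided by the per-game standard deviation and scaled by 800/ln 10, and the common trinomial LOS formula. All variable names are illustrative.

```python
import math

wins, losses, draws = 145, 242, 397
penta = [11, 124, 210, 45, 2]             # pairs scoring 0, 0.5, 1, 1.5, 2 points

games = wins + losses + draws
score = wins + 0.5 * draws
score_rate = score / games
draw_ratio = draws / games

def elo_from_score(s: float) -> float:
    """Logistic Elo difference implied by an expected score s in (0, 1)."""
    return -400.0 * math.log10(1.0 / s - 1.0)

elo = elo_from_score(score_rate)

# Pentanomial statistics: treat each game pair as one sample in [0, 1].
pairs = sum(penta)
pair_scores = [0.0, 0.25, 0.5, 0.75, 1.0]
mean = sum(c * x for c, x in zip(penta, pair_scores)) / pairs
var = sum(c * (x - mean) ** 2 for c, x in zip(penta, pair_scores)) / pairs

z = 1.959964                              # two-sided 95 % normal quantile
half_width = z * math.sqrt(var / pairs)   # interval half-width on the score scale
elo_err = (elo_from_score(mean + half_width)
           - elo_from_score(mean - half_width)) / 2.0

scale = 800.0 / math.log(10.0)
nelo = (mean - 0.5) / math.sqrt(2.0 * var) * scale
nelo_err = z * scale / math.sqrt(games)   # ignores the uncertainty in var itself

# Trinomial likelihood of superiority (draws carry no information here).
los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

print(f"score {score}/{games} ({100 * score_rate:.2f} %), draws {100 * draw_ratio:.2f} %")
print(f"Elo {elo:.2f} +/- {elo_err:.2f}, nElo {nelo:.2f} +/- {nelo_err:.2f}")
print(f"LOS {100 * los:.2f} %")
```

Under these assumptions the computed values agree with the table to within rounding, and the W / L / D and pentanomial rows yield the same total score.
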
Recommendation:
Given the strong negative signal (Elo ≈ -43, LOS = 0.00 %), you can stop the test now for engineering purposes.
If you need a formal SPRT outcome, continue until LLR ≤ A (≈ -2.944); the current LLR of -2.63 is about 0.31 short of that bound.
For faster future screens when you suspect a regression, consider α = β = 0.10 (Wald bounds ≈ ±2.197) or a regression-oriented hypothesis pair (e.g., 0 vs. −5 Elo).
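
The stopping rule behind these recommendations can be sketched as follows. This is a hedged illustration: `sprt_decision` is a hypothetical helper, and which engine-strength claims H0 and H1 stand for depends on the test configuration, which this report does not state.

```python
import math

def sprt_decision(llr: float, alpha: float, beta: float) -> str:
    """Apply Wald's stopping rule to a log-likelihood ratio."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr <= lower:
        return "accept H0"
    if llr >= upper:
        return "accept H1"
    return "continue"

for rate in (0.05, 0.10):
    print(f"alpha = beta = {rate:.2f}: {sprt_decision(-2.63, rate, rate)}")
```

With the same hypotheses and the same LLR of -2.63, the looser α = β = 0.10 setting would already have crossed the lower bound, while α = β = 0.05 still asks for more games. Changing the hypothesis pair instead would change the LLR itself, so that case cannot be read off this number.
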
