SPRT TESTING UHO 8mvs_big_+80_+109.pgn 60s+0.1

Overall score: Wordfish 1.0 dev 260825 vs Wordfish base: 3649 - 3582 - 2769 [0.503] 10000 games
Wordfish 1.0 dev playing White: 2569 - 1033 - 1398 [0.654] 5000 games
Wordfish 1.0 dev playing Black: 1080 - 2549 - 1371 [0.353] 5000 games
White vs Black: 5118 - 2113 - 2769 [0.650] 10000 games
Elo difference: 2.3 +/- 5.8, LOS: 78.5 %, DrawRatio: 27.7 %
SPRT: llr 0.309 (10.5%), lbound -2.94, ubound 2.94

Step 1: Understanding the numbers

  1. Overall score (3649-3582-2769) is given as wins-losses-draws for Wordfish 1.0 dev vs Wordfish base across 10,000 games. This is the only ordering that reproduces the reported score and draw ratio.
    • Wins: 3649
    • Losses: 3582
    • Draws: 2769
    • Score fraction: (3649 + 2769/2) / 10000 = 0.503 → slightly above 50%, meaning the dev version scored slightly better in this sample.
  2. Colour breakdown:
    • Dev as White: 2569 wins, 1033 losses, 1398 draws → score fraction 0.654.
    • Dev as Black: 1080 wins, 2549 losses, 1371 draws → score fraction 0.353.
  3. White vs Black, all games: 5118 White wins, 2113 Black wins, 2769 draws → White scored 0.650 regardless of which engine had the White pieces.
  4. SPRT (Sequential Probability Ratio Test) data:
    • llr = 0.309 → the log-likelihood ratio is small, about 10.5% of the way to the upper bound.
    • lbound = -2.94, ubound = 2.94 → the LLR lies between the bounds, so the test has accepted neither hypothesis and remains inconclusive.
  5. Elo difference:
    • Mean: +2.3 Elo
    • Error margin: ±5.8 (95% confidence)
    • LOS (likelihood of superiority): 78.5%
    • Draw ratio: 27.7%
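
The headline figures can be recomputed from the raw counts. The following is a minimal sketch, assuming the score triple is ordered wins - losses - draws (the reading that reproduces the reported 0.503 and 27.7%), the usual logistic Elo model, and a normal approximation for LOS based on decisive games only. The variable names are illustrative and not taken from the test tool.

```python
import math

# Counts from the overall score line, read as wins - losses - draws.
wins, losses, draws = 3649, 3582, 2769
games = wins + losses + draws

# A win counts 1 point, a draw half a point.
score = (wins + 0.5 * draws) / games
draw_ratio = draws / games

# Logistic Elo model: Elo = -400 * log10(1/score - 1).
elo = -400 * math.log10(1 / score - 1)

# Likelihood of superiority: normal approximation using decisive games only
# (assumed formula; draws carry no information about which side is better).
los = 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))

print(f"score {score:.3f}  draws {draw_ratio:.1%}  Elo {elo:+.1f}  LOS {los:.1%}")
```

Under these assumptions the values agree with the report to the printed precision, which supports the wins - losses - draws reading.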

Step 2: Interpretation

  1. Overall strength:
    • Wordfish 1.0 dev scored slightly better than the base version in this sample.
    • +2.3 Elo is very modest; with a 95% error margin of ±5.8 Elo, the confidence interval (roughly -3.5 to +8.1) includes zero, so the dev version has not been shown to be stronger in a strict statistical sense.
  2. Colour asymmetry:
    • Dev scores 0.654 with White and 0.353 with Black.
    • The base engine shows almost the same split: its score with White is 1 - 0.353 = 0.647, and White scores 0.650 over all games regardless of engine.
    • The asymmetry therefore comes from the opening book rather than from the dev version. Judging by the file name, UHO 8mvs_big_+80_+109.pgn is an unbalanced book whose positions already favour White, and each engine plays White in half of the games.
  3. Draw ratio & SPRT:
    • Draw ratio 27.7% is relatively low, which is expected with unbalanced openings; many decisive games → higher variance per game.
    • SPRT LLR 0.309 lies between the bounds (-2.94, 2.94) → the sequential test has crossed neither threshold, so the result is not statistically conclusive.
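
As an illustration of how the bounds relate to the LLR, here is a small sketch of the standard Wald SPRT stopping rule. The error rates alpha = beta = 0.05 are an assumption: the report does not state them, but they reproduce the printed bounds of ±2.94. The function name `sprt_decision` is hypothetical.

```python
import math

# Assumed error rates; NOT stated in the report, inferred from the +/-2.94 bounds.
alpha = beta = 0.05

# Wald's SPRT thresholds on the log-likelihood ratio.
lower = math.log(beta / (1 - alpha))
upper = math.log((1 - beta) / alpha)

llr = 0.309  # value from the report


def sprt_decision(llr, lower, upper):
    """Return the SPRT verdict for a given log-likelihood ratio."""
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"


# Fraction of the distance from 0 to the upper bound covered so far.
progress = llr / upper

print(f"bounds [{lower:.2f}, {upper:.2f}]  progress {progress:.1%}  ->",
      sprt_decision(llr, lower, upper))
```

The 10.5% printed next to the LLR in the report matches this ratio of the LLR to the upper bound.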

Step 3: Elo Gain Estimation

  • The provided Elo difference: +2.3 ± 5.8
  • This means:
    • Expected gain for Wordfish 1.0 dev vs base: 2.3 Elo
    • Error margin (95% confidence): 5.8 Elo
    • LOS 78.5%: the estimated probability that dev is the stronger engine; moderate, not definitive.
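
The ±5.8 can be checked against the raw counts. The sketch below assumes the counts are wins - losses - draws, treats each game as an independent result of 1, 0.5 or 0, and uses a normal approximation with z = 1.96; these are modelling assumptions, not something the report states. Under them the margin comes out at about 5.8 Elo, which is the half-width of a 95% interval rather than one standard deviation.

```python
import math

wins, losses, draws = 3649, 3582, 2769  # overall score line
games = wins + losses + draws
score = (wins + 0.5 * draws) / games

# Per-game variance of the result (1, 0.5 or 0) around the mean score.
variance = (wins * (1 - score) ** 2
            + draws * (0.5 - score) ** 2
            + losses * (0 - score) ** 2) / games
std_error = math.sqrt(variance / games)  # standard error of the mean score


def elo(s):
    """Convert a score fraction to an Elo difference (logistic model)."""
    return -400 * math.log10(1 / s - 1)


z = 1.96  # two-sided 95% interval under a normal approximation
low = elo(score - z * std_error)
high = elo(score + z * std_error)
margin = (high - low) / 2

print(f"Elo {elo(score):+.1f}, 95% interval [{low:+.1f}, {high:+.1f}], margin {margin:.1f}")
```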

Interpretation in practical terms:

  • The dev version may be marginally stronger, but the improvement is very small and might not be reliably detectable in short tournament runs.
  • The White advantage is clear, but it applies to both engines: it reflects the unbalanced opening book and cancels out in the overall score.
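
To separate the effect of colour from the effect of the engine, the per-colour lines can be compared directly. This is a rough sketch, assuming each triple is wins - losses - draws from the dev engine's point of view; the base engine's score with White is then one minus dev's score with Black. The names are illustrative.

```python
# Per-colour results for the dev engine: (wins, losses, draws).
dev_white = (2569, 1033, 1398)
dev_black = (1080, 2549, 1371)


def score(wins, losses, draws):
    """Score fraction, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


dev_as_white = score(*dev_white)
dev_as_black = score(*dev_black)

# When dev plays Black, base plays White, so the scores are complementary.
base_as_white = 1 - dev_as_black

# Average White score across both engines: the opening-book bias.
white_bias = (dev_as_white + base_as_white) / 2

print(f"dev as White  {dev_as_white:.3f}")
print(f"base as White {base_as_white:.3f}")
print(f"White overall {white_bias:.3f}")
print(f"engine effect with White {dev_as_white - base_as_white:+.3f}")
```

Both engines score about 0.65 with White, so the colour gap is a property of the openings; the difference between the engines themselves is well under one percentage point.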

Step 4: Summary Table

Metric                           Value        Interpretation
Overall score fraction           0.503        Slight edge for dev
Elo difference                   +2.3 ± 5.8   Marginal, within statistical noise
LOS (likelihood of superiority)  78.5%        Moderate confidence
Draw ratio                       27.7%        Few draws, many decisive games
Dev score with White             0.654        High because the book favours White
Dev score with Black             0.353        Low because the book favours White
SPRT llr                         0.309        Between bounds, test inconclusive

✅ Conclusion

  • Wordfish 1.0 dev shows an estimated Elo gain of +2.3 over Wordfish base.
  • The improvement is not statistically robust: the 95% error margin is ±5.8 Elo, so the interval includes zero.
  • The large White/Black gap comes from the unbalanced opening book and affects both engines about equally; it does not show where the gain comes from.
  • The SPRT is inconclusive (LLR 0.309 against bounds of ±2.94); further testing would be required to confirm a true performance increase.

In short, Wordfish 1.0 dev may be marginally stronger, with a small estimated gain of roughly +2 Elo. The evidence leans toward an improvement but is not conclusive.
