SPRT TEST Revolution dev versus Baseline

Snapshot of the current match

  • Score (10+0.1, 1T, 32MB, UHO_2024_8mvs_+085_+094): 18–15–27 (W–L–D from DEV's side, 60 games), 52.5% for DEV.
  • Elo: +17.4 ± 49.5 (not statistically significant).
  • LOS: 75.6% (suggestive, but not conclusive).
  • Draw rate: 46.7% (healthier, closer to parity conditions than earlier).
  • LLR: 0.04 — effectively “no verdict” yet; far from any SPRT boundary.
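
As a quick check on the headline numbers, here is a minimal sketch that recomputes the score percentage and the Elo point estimate from the raw result. It assumes 18–15–27 is read as wins–losses–draws from DEV's side (the reading consistent with 52.5%) and uses the standard logistic Elo formula. The function name `score_and_elo` is illustrative only. The error margin and LOS are not reproduced here, because they depend on the variance model the match tool uses.

```python
import math


def score_and_elo(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Return (score fraction, logistic Elo difference) for a W-L-D result.

    Only valid for 0 < score < 1; a 0% or 100% score has no finite Elo.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo


# Result from the snapshot, read as W-L-D for DEV (an assumption).
score, elo = score_and_elo(wins=18, losses=15, draws=27)
print(f"score = {score:.1%}, Elo = {elo:+.1f}")  # score = 52.5%, Elo = +17.4
```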

How this compares to earlier runs (regression from the beginning)

  1. Early tests: DEV was badly negative vs the base (often −90 to −150 Elo) and showed a catastrophic collapse as Black (e.g., near-zero win rate with Black), heavily distorted by White-biased books and some option mismatches.
  2. Mid-stage fixes: After aligning time management (defaults, no MinThink/SlowMover hacks), enforcing color-pairing per line, and cleaning UCI mismatches, results moved toward rough parity but remained volatile; many runs were short and White-skewed.
  3. Now: DEV is slightly ahead (+17 Elo), with wide error bars. The draw rate rose vs earlier (where it hovered ~30–40%), which usually indicates better comparability and fewer “free points” from adjudication/over-pruning. The Black collapse signal is no longer obvious in this small sample, but with only 60 games you can’t call it fixed.

What to take from this

  • Direction of travel: from clearly worse → about even / marginally better.
  • Confidence: still low due to the small sample (n=60) and ±50 Elo uncertainty.
  • SPRT status: LLR ~0 means keep playing; you’re nowhere near accept/reject thresholds.
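
To put "nowhere near the thresholds" in numbers, the sketch below computes the usual Wald SPRT stopping bounds and compares the current LLR against them. The error rates alpha = beta = 0.05 are an assumption (a common choice in engine testing); the post does not state which values this run uses. The function name `sprt_bounds` is illustrative.

```python
import math


def sprt_bounds(alpha: float, beta: float) -> tuple[float, float]:
    """Wald SPRT stopping bounds on the log-likelihood ratio (LLR)."""
    lower = math.log(beta / (1.0 - alpha))   # at or below: accept H0 (no gain)
    upper = math.log((1.0 - beta) / alpha)   # at or above: accept H1 (gain)
    return lower, upper


# Assumed error rates; not taken from the post.
lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
llr = 0.04  # current LLR from the snapshot

if llr >= upper:
    verdict = "accept H1"
elif llr <= lower:
    verdict = "accept H0"
else:
    verdict = "continue"

print(f"bounds = [{lower:.2f}, {upper:.2f}], LLR = {llr:.2f} -> {verdict}")
# bounds = [-2.94, 2.94], LLR = 0.04 -> continue
```

Under these assumed error rates, an LLR of 0.04 sits almost exactly midway between the two bounds, so the test has no basis for stopping yet.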

Recommended next steps (quick, practical)

  1. Grow the sample to at least 400–800 games at the same settings before judging (LLR will move; the CI should shrink to roughly ±15–25 Elo, depending on the draw rate and on how the tool computes the interval).
  2. Color sanity: continue to track per-color scores; if possible, also run an original + mirrored suite pair and combine, to fully remove residual book bias.
  3. Hold conditions fixed: 1 thread, 32 MB, same book, no extra time options, Ponder off, MultiPV=1.
  4. Watch these indicators:
    • Draw rate (should stabilize),
    • White vs Black split (if Black dips again, we revisit SEE/TT/NMP gates),
    • Time losses and illegal option warnings in logs.
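
For a rough sense of how much a longer run tightens the estimate, the sketch below scales the current ±49.5 Elo margin by 1/√n. This assumes the per-game variance stays the same as in the 60-game sample; a different draw rate, or a tool that computes the interval per game rather than per opening pair, would shift the result by several Elo. The function name `projected_margin` is illustrative.

```python
import math


def projected_margin(current_margin: float, current_games: int, target_games: int) -> float:
    """Scale an error margin by 1/sqrt(n), assuming unchanged per-game variance."""
    return current_margin * math.sqrt(current_games / target_games)


# Current margin and sample size from the snapshot.
for target in (400, 800):
    margin = projected_margin(49.5, 60, target)
    print(f"{target} games: about ±{margin:.1f} Elo")
# 400 games: about ±19.2 Elo
# 800 games: about ±13.6 Elo
```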

Bottom line: today’s result is encouraging—a marked improvement over the early negative regressions—but not yet decisive. Keep the current setup and extend the run so the statistics can speak clearly.
