Revolution v.2.70 dev-210925

Summary

I centralized the engine name in version.h and changed it to revolution v.2.70 dev-210925, so that any build without special flags displays the correct identifier.
I updated the entry points and utilities (main, misc, UCI, and options) to include the shared version header and emit the new name to the console, the UCI protocol, and information reports.
Scaled correction-history contributions before blending into static evaluation, tapering their influence as depth grows and when experience guidance is available while still rewarding early continuation data.
Introduced king-file exposure analysis so correction-history updates are damped or amplified when enemy pressure opens files toward our king, reducing optimistic pruning in dangerous situations.
Propagated the presence of experience data to all workers and periodically refreshed NNUE network handles during iterative deepening to stay aligned with upstream evaluation improvements.

Test

Point estimate (for context)

Mean score: $S = (W + 0.5D)/N = 0.506$ → ΔElo ≈ +4.17 (matches your +4.2).
Formula: $\Delta \text{Elo} = 400 \log_{10}\!\left(\frac{S}{1-S}\right)$.
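
As a quick check, the point estimate can be recomputed from the game counts used in this test (W=240, L=231, D=279). This is a minimal sketch of the logistic formula above; the function name `elo_from_score` is illustrative and not part of any engine or testing tool.

```python
import math

def elo_from_score(score: float) -> float:
    """Convert a mean score in (0, 1) to an Elo difference (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

# Game counts from this test run
wins, losses, draws = 240, 231, 279
games = wins + losses + draws

# A win counts 1, a draw 0.5, a loss 0
mean_score = (wins + 0.5 * draws) / games
print(f"S = {mean_score:.3f}, Elo = {elo_from_score(mean_score):+.2f}")
```

With these counts the script should report S = 0.506 and roughly +4.17 Elo.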

SPRT with typical hypotheses

Assume the usual H₀: Δ=0 Elo vs H₁: Δ=+2.5 Elo (or your earlier +10 Elo) and α=β=0.05.
SPRT decision boundaries (in LLR units): A = +2.944, B = −2.944.
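
These boundaries follow from Wald's standard approximation, A = ln((1−β)/α) and B = ln(β/(1−α)). A small sketch, assuming that approximation is what your runner uses (`sprt_bounds` is an illustrative name):

```python
import math

def sprt_bounds(alpha: float, beta: float) -> tuple[float, float]:
    """Wald's approximate SPRT bounds on the log-likelihood ratio."""
    upper = math.log((1.0 - beta) / alpha)   # accept H1 once LLR >= upper
    lower = math.log(beta / (1.0 - alpha))   # accept H0 once LLR <= lower
    return lower, upper

lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
print(f"B = {lower:+.3f}, A = {upper:+.3f}")
```

With α=β=0.05 both bounds equal ln(19) in magnitude, which is where ±2.944 comes from.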

Using your counts (W=240, L=231, D=279), and treating the draw rate as fixed under both hypotheses (so draws cancel in the LLR), the current log-likelihood ratio is:

H₁ = +2.5 Elo : LLR ≈ +0.053
H₁ = +10 Elo : LLR ≈ +0.064
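
For reference, here is one way to compute a fixed-draw-rate LLR of this kind. It is only a sketch under an explicit assumption: each hypothesis fixes the expected score through the logistic Elo formula, the draw rate is held at its observed value, and only decisive games contribute. Other parameterizations give somewhat different values, so it need not reproduce the figures quoted above; the helper names are illustrative.

```python
import math

def expected_score(elo: float) -> float:
    """Expected score for an Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def decisive_win_prob(elo: float, draw_rate: float) -> float:
    """P(win | game is decisive) when the draw rate is held fixed (assumed model)."""
    win = expected_score(elo) - draw_rate / 2.0
    return win / (1.0 - draw_rate)

def llr_fixed_draws(wins: int, losses: int, draws: int,
                    elo0: float, elo1: float) -> float:
    """Log-likelihood ratio of H1 vs H0 using decisive games only."""
    draw_rate = draws / (wins + losses + draws)
    p0 = decisive_win_prob(elo0, draw_rate)
    p1 = decisive_win_prob(elo1, draw_rate)
    return wins * math.log(p1 / p0) + losses * math.log((1.0 - p1) / (1.0 - p0))

for elo1 in (2.5, 10.0):
    llr = llr_fixed_draws(240, 231, 279, elo0=0.0, elo1=elo1)
    print(f"H1 = +{elo1} Elo: LLR = {llr:+.3f}")
```

Whatever the exact model, the result stays a tiny fraction of the ±2.944 needed for a decision.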

You’re miles away from either boundary (±2.944). That’s why your runner shows SPRT: llr 0.

How many more games would SPRT need?

With the observed draw rate (d ≈ 0.372) and empirical “true” win probability for decisive games $\hat p \approx 0.510$, the expected LLR gain per game is extremely small:

vs H₁=+2.5 Elo: $E[\Delta \text{LLR}] \approx 7.0\times 10^{-5}$ / game
vs H₁=+10 Elo: $E[\Delta \text{LLR}] \approx 8.5\times 10^{-5}$ / game

At that rate, to reach A = +2.944 you’d need on the order of 30,000 games (same ballpark to hit B if the true Δ were ≤ 0). In short: with Δ ≈ +4 Elo and ~37% draws, SPRT will converge painfully slowly.
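
The estimate is just the upper boundary divided by the expected per-game gain. A rough sketch using the per-game rates quoted above (taken as given, not re-derived here); it ignores the random walk of the LLR, so treat the result as an order of magnitude only.

```python
import math

def games_to_bound(bound: float, gain_per_game: float) -> float:
    """Expected number of games for the LLR to drift to `bound`."""
    return bound / gain_per_game

upper_bound = math.log(0.95 / 0.05)  # A for alpha = beta = 0.05

# Expected LLR gain per game, as quoted in this post
per_game_gain = {"+2.5 Elo": 7.0e-5, "+10 Elo": 8.5e-5}

for label, gain in per_game_gain.items():
    needed = games_to_bound(upper_bound, gain)
    print(f"H1 = {label}: about {needed:,.0f} games")
```

Both hypotheses land in the tens of thousands of games, far more than the 750 played so far.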

Practical takeaways

Your current data says: about even, maybe small +4 Elo, not significant (your CI ±19.7 Elo is right).
SPRT is the wrong hammer here unless you push conditions to be more decisive or raise the tested effect size.
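
The ±19.7 Elo margin can be reproduced approximately from the same counts. This sketch assumes a normal approximation for the mean score and converts the score margin to Elo with the slope of the logistic formula (delta method); your tool may compute its interval slightly differently, and `elo_margin_95` is an illustrative name.

```python
import math

def elo_margin_95(wins: int, losses: int, draws: int) -> float:
    """Approximate 95% half-width of the Elo estimate (normal approximation)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    std_error = math.sqrt(variance / games)
    # Derivative of 400*log10(S/(1-S)) with respect to S
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return 1.96 * std_error * slope

margin = elo_margin_95(240, 231, 279)
print(f"95% margin: +/- {margin:.1f} Elo")
```

Since the margin is several times larger than the +4 Elo point estimate, the interval comfortably includes zero.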

If you want SPRT to bite, do one (or more) of these:

Test a larger effect size (e.g., H₁=+15 Elo). You’ll quickly accumulate negative evidence if the true Δ ~ +4, or positive if the build is genuinely ≥+15.
Lower draw rate: use a sharper subset (the “worst Black lines” package we built), or add resignation/adjudication (you already use -draw 50 5 / -resign movecount=3 score=700 in subset runs).
Switch to fixed-N testing: pick N = 2000–4000, then evaluate with BayesElo/Ordo; this is often faster and clearer when Δ is small.

Download revolution v.2.70 dev-210925