ZurgaaPOLY 18.1AI SSE-251004 vs revolution 2.81 021025

Technical Match Report: revolution 2.81 (021025) vs ZurgaaPOLY 18.1AI SSE-251004 — PGN-Backed Findings from a 1,000-Game Trial (881 Completed)

This report analyses a long match between two modern chess engines, revolution 2.81 (021025) and ZurgaaPOLY 18.1AI SSE-251004, using the supplied PGN plus rating summaries from Ordo and BayesElo. Although the match was scheduled for 1,000 games, the artifacts currently include 881 completed games. Ordo rates revolution ahead by ~29 Elo (3700.0 vs 3671.3) with a 54% score over 881 games, while BayesElo shows a similar +30 Elo gap (revolution +15, ZurgaaPOLY −15, each with a ±10 Elo error bar) and a notably low 26% draw rate.

Data & Methods

  • Artifacts analysed
    • Ordo output: 881 games, points and ratings per engine.
    • BayesElo output: per-engine Elo, error bars, draw rate, and total games (881).
    • PGN (“games.pgn”): complete headers and moves for the played games (881 present).
  • Time control: 60+1 as recorded in the PGN headers.
  • Primary endpoints: match score, Elo gap (Ordo & BayesElo), draw rate, colour balance, and structural biases detectable from the PGN.

Headline Results

  • Overall score (Ordo): revolution 476.5 / 881 = 54%, ZurgaaPOLY 404.5 / 881 = 46%.
  • Ordo ratings: revolution 3700.0, ZurgaaPOLY 3671.3, Δ ≈ +28.7 Elo for revolution.
  • BayesElo ratings: revolution +15 ±10 Elo, ZurgaaPOLY −15 ±10 Elo (symmetric anchor), implying Δ ≈ +30 Elo; draws ≈ 26%.
  • Games completed: 881/1000 at the time of the exported files (both Ordo and BayesElo summaries reflect 881 games).

What the PGN Reveals (structure & bias)

A pass over the PGN (881 games) shows an extreme first-move bias:

  • Result distribution (all games): White wins 651, Black wins 1, Draws 229
    White win rate ≈ 73.9%, Black win rate ≈ 0.1%, Draw rate ≈ 26.0%.
  • Side balance is otherwise fair: revolution plays White 441 times and Black 440 times (mirrored for ZurgaaPOLY).
  • Per-side outcome for revolution:
    • As White: 361–0–80 (W–L–D) → 90.9% score.
    • As Black: 1–290–149 (W–L–D) → 17.2% score.
  • Average game length: ~78.7 moves (157.3 ply).
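The colour split above can be reproduced from the PGN headers alone. The following is a minimal sketch, assuming each game carries standard White, Black and Result tags; the function names (iter_headers, tally) are illustrative and not part of any tool used in this report.

```python
import re
from collections import Counter

# Matches a PGN tag pair such as: [White "revolution 2.81 021025"]
HEADER = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')
DECISIVE_OR_DRAWN = ("1-0", "0-1", "1/2-1/2")


def iter_headers(pgn_text):
    """Yield one dict of header tags per game found in the PGN text."""
    tags = {}
    in_headers = False
    for line in pgn_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            # A tag line following movetext starts a new game.
            if not in_headers and tags:
                yield tags
                tags = {}
            in_headers = True
            tags[match.group(1)] = match.group(2)
        else:
            in_headers = False
    if tags:
        yield tags


def tally(pgn_text, engine):
    """Count results overall and split by the colour `engine` played."""
    by_result, as_white, as_black = Counter(), Counter(), Counter()
    for tags in iter_headers(pgn_text):
        result = tags.get("Result", "*")
        if result not in DECISIVE_OR_DRAWN:
            continue  # skip unfinished games
        by_result[result] += 1
        if tags.get("White") == engine:
            as_white[result] += 1
        elif tags.get("Black") == engine:
            as_black[result] += 1
    return by_result, as_white, as_black
```

Calling tally(open("games.pgn").read(), "revolution 2.81 021025") should return counters comparable to the figures listed above, provided the player names in the file match that string exactly.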

Interpretation

Such a near-absence of Black wins is not typical of balanced engine-vs-engine testing and strongly suggests an opening-suite artefact (e.g., a set of lines systematically bad for Black, insufficient colour-swapping per position, or a lopsided book). In effect, both engines score heavily with White and bleed with Black. The +30 Elo edge for revolution comes from converting more of its White games into wins (361 of 441, against 290 of 440 for ZurgaaPOLY) within a highly White-favoured environment, not from a colour-neutral contest.

Statistical Confidence (back-of-the-envelope)

Taking revolution's 881-game score of 54.09% and applying the logistic Elo mapping yields ≈ +28.5 Elo with a rough 95% CI of ≈ [+5, +51] Elo, broadly consistent with the Ordo/BayesElo summaries and their uncertainty. This again underscores that the direction of the result is credible, but its magnitude is entangled with opening bias.
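The estimate above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions, not the method Ordo or BayesElo use: it applies the standard logistic mapping Elo = 400·log10(p / (1 − p)) and a normal-approximation interval on the score. The function names and the binomial switch are illustrative.

```python
import math


def elo_from_score(p):
    """Logistic Elo difference implied by an expected score p in (0, 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_interval(wins, draws, losses, z=1.96, binomial=False):
    """Return (low, mid, high) Elo using a normal approximation on the score.

    binomial=True treats every game as a win/loss trial (variance p(1-p));
    binomial=False uses the per-game variance with draws counted as 0.5.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    if binomial:
        variance = p * (1.0 - p)
    else:
        variance = (wins + 0.25 * draws) / n - p * p
    se = math.sqrt(variance / n)
    return (elo_from_score(p - z * se),
            elo_from_score(p),
            elo_from_score(p + z * se))


# W/D/L for revolution taken from the Ordo table in this report.
low, mid, high = elo_interval(362, 229, 290, binomial=True)
print(f"binomial approx: {mid:+.1f} Elo, 95% CI [{low:+.1f}, {high:+.1f}]")
```

With the binomial approximation the interval comes out close to the one quoted above. Counting draws at their actual variance narrows it by a few Elo on each side, which leaves the conclusion unchanged.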

Practical Takeaways

  1. Revolution is ahead—modestly: Across 881 games, revolution leads by roughly +30 Elo in both Ordo and BayesElo views.
  2. But the match conditions are not colour-neutral: The PGN shows an extreme White advantage, rendering the current Elo gap non-portable to fair, book-balanced conditions.
  3. Draw rate is unusually low (≈26%): This is compatible with skewed opening lines that collapse for one side rather than leading to balanced middlegames.

Ordo

#  Player                        Rating (Ordo)  Points  Played  Score  W    D    L
1  revolution 2.81 021025        3700.0         476.5   881     54%    362  229  290
2  ZurgaaPOLY 18.1AI SSE-251004  3671.3         404.5   881     46%    290  229  362
Ordo: White advantage = 0.00 · Draw rate (equal opponents) = 50.00%.
Observed (PGN): Draw rate = 26.0% · White wins 651, Black wins 1, Draws 229 · Games 881.
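As a sanity check, the Ordo table and the per-colour PGN counts should describe the same 881 games. The sketch below recomputes points and score from W/D/L and confirms that revolution's White and Black records add up to its Ordo row; the Record class is a hypothetical helper introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Win/draw/loss record for one engine (illustrative helper)."""
    wins: int
    draws: int
    losses: int

    @property
    def played(self):
        return self.wins + self.draws + self.losses

    @property
    def points(self):
        return self.wins + 0.5 * self.draws

    @property
    def score(self):
        return self.points / self.played

    def __add__(self, other):
        return Record(self.wins + other.wins,
                      self.draws + other.draws,
                      self.losses + other.losses)


# Figures copied from this report.
ordo_revolution = Record(wins=362, draws=229, losses=290)
as_white = Record(wins=361, draws=80, losses=0)
as_black = Record(wins=1, draws=149, losses=290)

combined = as_white + as_black
print(combined == ordo_revolution)             # per-colour split vs Ordo row
print(ordo_revolution.points, ordo_revolution.played)
print(f"{ordo_revolution.score:.2%}")
```

The same check can be run for ZurgaaPOLY by swapping wins and losses.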

Recommendations for a Colour-Fair Rerun

To obtain a trustworthy, portable Elo estimate:

  • Use a balanced, curated opening suite (e.g., UHO-style or comparable), pair-swapped per position (A/B colours), and sampled uniformly across the match.
  • Fix adjudication and resignation logic to avoid premature collapses and to converge draw rates towards typical engine-vs-engine baselines at the chosen time control.
  • Neutralise confounders: disable/clear experience files, freeze NNUE networks, pin hash sizes and thread counts, and log exact command lines.
  • Target ≥1,000 completed games (not just scheduled), then export Ordo/BayesElo again.
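Whether a run was actually pair-swapped can be verified after the fact. The following is a minimal sketch: it assumes each game can be reduced to a tuple (opening_id, white, black), where opening_id is a hypothetical key such as the FEN tag or a hash of the book moves, and it reports every opening that was not played equally often with both colour assignments.

```python
from collections import defaultdict


def unpaired_openings(games):
    """Return openings whose colour assignments are not mirrored.

    `games` is an iterable of (opening_id, white, black) tuples.
    An opening is considered paired when every (white, black) pairing
    occurs exactly as often as the reversed (black, white) pairing.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for opening_id, white, black in games:
        counts[opening_id][(white, black)] += 1

    unpaired = {}
    for opening_id, pairings in counts.items():
        for (white, black), n in pairings.items():
            # .get() avoids inserting keys while iterating.
            if pairings.get((black, white), 0) != n:
                unpaired[opening_id] = dict(pairings)
                break
    return unpaired


example = [
    ("opening-1", "revolution", "ZurgaaPOLY"),
    ("opening-1", "ZurgaaPOLY", "revolution"),
    ("opening-2", "revolution", "ZurgaaPOLY"),  # never replayed with colours swapped
]
print(sorted(unpaired_openings(example)))
```

An empty result means every opening was mirrored. A non-empty one lists the positions to replay or exclude before computing ratings.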

Minimal Reproducibility Notes (what to archive with the release)

  • Artifacts:
    • games.pgn (881 currently)
    • ordo.txt and Bayeselo.txt (rating snapshots)
  • Report the fixed settings: engine SHA/build, NNUE file hashes, Threads, Hash, opening suite ID and seed, adjudication rules, and book-pairing policy (A/B colour swap enabled).
  • Example BayesElo session (scripted), one command per line:
      readpgn games.pgn
      elo
      mm
      exactdist
      ratings
  • Example Ordo call (a sketch; in Ordo, -p names the input PGN and -o the output file, so confirm the flags with ordo --help and adapt paths to your environment):
      ordo -q -J -p games.pgn -o ordo.txt

Conclusion

Under the present (colour-skewed) conditions, revolution 2.81 (021025) shows a small but consistent lead over ZurgaaPOLY 18.1AI SSE-251004: about +30 Elo with a 54% match score across 881 completed games. However, the overwhelming White bias in the PGN means the result does not constitute a colour-neutral rating. A rerun with a balanced, pair-swapped opening suite and the same engines should yield a more robust, portable Elo estimate suitable for publication and downstream benchmarking.

Jorge Ruiz
