
Wordfish in playchess.com machines room


Overview of Results (September 18, 2025)

On September 18, 2025, the Wordfish chess engine (operating under the PlayChess nickname “Desvelemosafrica”) delivered a highly solid performance in the PlayChess.com “machines” room. Over the course of 92 games played that day, Wordfish remained undefeated, scoring a total of 49.5 points (approximately 54% score). This score was achieved through a large majority of drawn games and a handful of wins – remarkably, Wordfish did not lose a single game. Its Ordo rating for the day was computed around 3700, which placed it on par with virtually all the other top engines active that day. In fact, the Ordo ranking shows Wordfish at 3700 Elo, identical to the cluster of leading engines, while only the clearly outperformed engines dropped significantly below that mark (e.g. DRAHA at ~3545). Wordfish’s undefeated run and plus score underscore its competitiveness among the elite engines present.
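As a sanity check, the headline numbers are internally consistent; a short Python sketch (using only the win/draw totals reported in this article) reproduces the score:

```python
# Wordfish's reported day totals (from this article): 92 games,
# 7 wins, 85 draws, 0 losses. Score = wins + draws / 2.
wins, draws, losses = 7, 85, 0
games = wins + draws + losses
score = wins + 0.5 * draws

print(games)                           # 92
print(score)                           # 49.5
print(round(100 * score / games, 1))   # 53.8 -> "approximately 54%"
```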

Win/Loss/Draw Statistics by Color

A closer look at Wordfish’s results reveals an interesting split between games played with the White pieces versus the Black pieces. Out of its 92 games, Wordfish played 44 games as White and 48 as Black. As White, Wordfish scored exactly 50%: all 44 games were draws, with not a single victory (or loss) achieved with the white pieces. In contrast, Wordfish scored a higher percentage as Black: in 48 games, it achieved 7 wins and 41 draws (with 0 losses). This means all of Wordfish’s decisive results came when it was playing Black, while every White game remained peacefully drawn. Such a scenario is statistically unusual in human play, but in engine play (especially at high levels) it can occur – here the overall “white advantage” was essentially nil (White in fact scored no higher than Black). The engines’ precise play neutralized the first-move edge, and in Wordfish’s case, Black even ended up delivering all the wins.

Wordfish’s results broken down by color. As White, Wordfish drew 100% of its 44 games (no wins or losses). As Black, Wordfish won 7 out of 48 games (green segment) and drew the remaining 41 – yielding a plus score as Black. It did not suffer any losses with either color.

This color asymmetry highlights Wordfish’s exceptionally solid play with White (no losses, but also no decisive breakthroughs) and its opportunistic scoring with Black. Wordfish managed to win several games as Black against select opponents (as discussed below), while holding all stronger opponents to draws even when Wordfish had the second move. In practical terms, Wordfish’s Black repertoire proved capable of not only equalizing but occasionally seizing winning chances. Meanwhile, with White, Wordfish did press but ultimately every game was held by the opponent – indicating that on this day Wordfish could not convert any initiative into a full point. For an engine, scoring more wins as Black than White is a bit counter-intuitive, and likely reflects the specific opponents and openings encountered rather than a fundamental preference. It’s a testament to Wordfish’s defensive resilience that neither color saw any losses; however, the lack of White wins also suggests room for improvement in leveraging first-move advantages or in selecting sharper lines as White to unbalance the game.
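The per-color percentages discussed above follow directly from the raw counts; a small sketch (figures taken from this article’s tallies) makes the comparison explicit:

```python
def score_pct(wins, draws, losses):
    """Percentage score: a win counts 1 point, a draw half a point."""
    games = wins + draws + losses
    return 100 * (wins + 0.5 * draws) / games

white = score_pct(0, 44, 0)   # all 44 White games drawn
black = score_pct(7, 41, 0)   # 7 wins and 41 draws as Black

print(white)            # 50.0
print(round(black, 1))  # 57.3 -- the Black plus score
```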

Notable Opponents and Matchups

Wordfish’s marathon session featured games against a wide array of high-profile engine opponents in the PlayChess machine room. Many of these opponents were top-tier Stockfish-derived engines or other cutting-edge chess engines, making the machine room on this date a true clash of heavyweights. Some of the notable opponents (with their engine identification and match outcomes) included:

  • Amonfritz (running Stockfish 17.1): 4 games, all drawn. Amonfritz is an example of an official Stockfish-based engine, and Wordfish held its ground with four solid draws in their mini-match. Neither engine could break through the other’s defenses, reflecting the razor-thin margins at this level.
  • Yekoyeko (running Claw 2.0): 3 games, all drawn. Claw 2.0 is a strong Stockfish derivative known for its unique tuning. Wordfish managed a 50% score here as well, drawing all games. Notably, Yekoyeko (Claw) was the top-rated engine of the day by BayesElo at +29 Elo, yet Wordfish comfortably drew their encounters – a sign that Wordfish can stand toe-to-toe with the day’s leader.
  • Srye (using Artemis 18.2 and ShashChess 6.0 in different games): 7 games, all draws. Srye is an engine user that switched between two strong engines (Artemis, a Stockfish fork with a specialized net, and ShashChess, another SF derivative emphasizing more positional play). Wordfish neutralized both variants, resulting in a peaceful 3½–3½ overall with Srye across those games.
  • Gariz (running a Stockfish development version from 2025-09-06): 4 games, all drawn. This shows Wordfish handling a very recent Stockfish dev build with no losses. Gariz’s games ended drawn, indicating parity between Wordfish and the bleeding-edge Stockfish code.
  • AutoLearning (Stockfish AutoLearn): 8 games, all draws. This opponent was running an experimental Stockfish that automatically learns from games. Wordfish again maintained equilibrium in all 8 encounters, giving up no losses (and also scoring no wins) – a testament to how even self-learning engines could not topple Wordfish’s defenses.
  • PSYKOTRANCE69 (Eman 9.90 engine): 5 games, all draws. Eman is a well-known Stockfish variant with modified evaluation; these games were also drawn. Wordfish showed it can handle engines with alternative evaluation approaches without trouble.
  • Kaltago (Wordfish 2.41 dev – an earlier version of Wordfish): 6 games, all drawn. In a showdown against its own predecessor version, Wordfish 2.42 did not manage to score a win. The fact that Wordfish 2.42 and 2.41 drew all their games suggests that the newer version, while at least as strong, did not demonstrate a clear superiority over the older version in head-to-head play. This may hint that the improvements in the update were evolutionary rather than game-breaking.
  • Peace on Earth X (Wordfish 2.42 from an earlier compile date): 3 games, all draws. This appears to be another instance of Wordfish (perhaps a different user running a similar version); unsurprisingly, engines of essentially the same lineage also drew all their games.
  • DeepBlueOcean (Stockfish 17.1 and dev builds): 7 games (across two identifications) – all drawn. Wordfish faced DeepBlueOcean in multiple games; whether it was using the official SF17.1 release or a dev build, the result was always a draw. This again underscores how closely matched these top engines were – none could score a decisive win over Wordfish.
  • BrainLearn and Other SF Forks: Wordfish encountered engines like Sebastian6430 (BrainLearn 31), Thinkharder (Obsidian dev 16), Great Ozzie (JudaS++ engine), Ultima (JudaS++ 1.02), Hippo100 (Remix NNUE engine), among others – typically in 1–3 game mini-matches. Across all these encounters, Wordfish drew every game, showing tremendous consistency. These opponents represent various branches of the Stockfish family and other independent engines, each with their own tweaks; none of them managed to outplay Wordfish in a single game.

In summary, Wordfish drew against every top engine it faced; no opponent of similar caliber could score a win against it. This is an impressive feat: engines like Stockfish dev, ShashChess, Claw, Eman, BrainLearn, etc., all tried and failed to defeat Wordfish that day. The only engine that Wordfish managed to beat (rather than just draw) was DRAHA, which we’ll examine next. It’s worth noting that the landscape of opponents included several Stockfish derivatives and contemporaries – Wordfish’s ability to hold them all to draws indicates that it belongs firmly in the top echelon of this field. The flip side, of course, is that Wordfish also wasn’t able to defeat those peers either, leading to many split points.

Decisive Wins and ECO Performance

All 7 of Wordfish’s wins on this day came at the expense of a single opponent: DRAHA (an engine user who ran a classical AI engine, possibly an older or less advanced program). Wordfish played a total of 12 games against DRAHA (11 games against “DRAHA, AI 50.3” and 1 game against the base “DRAHA” engine), scoring 7 wins and 5 draws in those encounters. In other words, DRAHA was clearly overmatched – it failed to win any game and lost the majority. This one matchup provided Wordfish with all of its decisive victories. The Ordo rating reflects DRAHA’s struggling performance: DRAHA’s rating fell to 3485.5, a full ≈215 Elo points below Wordfish’s 3700 on the day. In BayesElo terms, DRAHA’s engine registered at –136 Elo, the lowest on the list (with a mere 23% score). These numbers highlight how lopsided the Wordfish–DRAHA encounters were. Wordfish capitalized on this to boost its overall score, while all other matches remained drawn. In practical terms, DRAHA likely had weaknesses that Wordfish exploited – possibly older evaluation or inferior hardware – leading to multiple decisive results.

Interestingly, all those wins occurred in a particular opening family. An analysis of the PGN games shows that Wordfish’s wins over DRAHA came when DRAHA (playing White) ventured into the Ruy Lopez (Spanish) Opening, specifically an ECO code C67 line. Wordfish as Black repeatedly defended and outplayed DRAHA in that line. In fact, every single decisive game was in ECO C67 – highlighting that Wordfish found its scoring chances in this classical 1.e4 e5 opening. Outside of the C67 Spanish, Wordfish did not notch any wins; all other openings it played led to draws. This suggests that either DRAHA kept playing the same opening where Wordfish had an edge, or that Wordfish’s best preparation/performance on the day was in that specific opening. By contrast, in all other ECO codes (other openings) Wordfish’s games were draws.

If we look broadly at Wordfish’s opening repertoire that day, it heavily featured classical king-pawn openings and a variety of defenses, but without decisive effect apart from the DRAHA games. The distribution of ECO codes in Wordfish’s games was skewed towards certain categories. As the chart below shows, about 61% of Wordfish’s games were in ECO category “C” – which corresponds to 1.e4 e5 open games (e.g. the Ruy Lopez, Italian Game, etc.). The next largest chunk (~20%) were “B” category (semi-open games, typically 1.e4 with a different response such as Sicilian or French), around 14% “D” category (closed openings starting with 1.d4), and only a few games in “A” (flank openings) or “E” (Indian defenses and others). This tells us that most games involved mainstream, open chess – and that is indeed where Wordfish scored its wins (within the C category).

Breakdown of the openings (by ECO code category) in Wordfish’s games on Sept 18, 2025. A whopping 61% of the games fell under category “C” (open 1.e4 e5 games – predominantly Ruy Lopez lines). The next most common were “B” (semi-open defenses like the Sicilian, 20%) and “D” (closed 1.d4 games, 14%). Minor categories “A” (flank/irregular openings) and “E” (Indian defenses) together constituted only about 5% of games. This indicates Wordfish spent the vast majority of its time in classical e4/e5 battles.
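A breakdown like this is obtained by bucketing each game’s ECO code by its first letter (A–E). Below is a stdlib-only sketch on a few hypothetical ECO tags; the real tally reported above was run over the full games.pgn the same way:

```python
from collections import Counter

# Hypothetical ECO tags, as they would appear in PGN [ECO "..."] headers.
eco_tags = ["C67", "C67", "B90", "D37", "C65", "E60", "A10", "C88"]

# Bucket by the first letter: A (flank), B (semi-open), C (open),
# D (closed), E (Indian).
categories = Counter(tag[0] for tag in eco_tags)
total = sum(categories.values())

for cat in sorted(categories):
    print(cat, f"{100 * categories[cat] / total:.1f}%")
```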

From a performance by ECO perspective, Wordfish scored slightly above 50% in the “C” category overall (thanks to those wins vs DRAHA in the Ruy Lopez), and exactly 50% in all other opening categories. In the Sicilians and other semi-open games (B ECO) Wordfish did not lose, but also did not win any – those encounters (e.g. a couple Najdorf Sicilian games, etc.) were all draws. The same holds for the Queen’s Gambit/Indian defenses (D and E categories) – solid draws across the board. This reinforces a pattern: Wordfish was extraordinarily hard to beat in any opening, and it required an obviously weaker opponent in a well-trodden opening (the Spanish) for it to score victories. We can say that Wordfish’s opening preparation and play were generally robust, yielding at least a draw in all cases, and that its best results came in the most heavily played opening (Spanish). The lack of wins in other lines might imply that against equally matched engines, the openings tended toward theoretical draws; whereas against DRAHA, even a mainline Spanish provided enough imbalance for Wordfish to convert points.

Elo Rating Comparisons (Wordfish vs. Opponents)

Both the Ordo and BayesElo rating systems gave insight into Wordfish’s standing relative to its competition on this date. Ordo ratings, which were calculated from the game results, placed Wordfish at 3700 Elo, effectively tying it with all the top engines that day. In fact, every engine in the upper ranks (from Amonfritz through Wordfish itself) ended up sharing that same rating to the displayed precision, reflecting the extensive drawing balance among them. Only DRAHA (and its variant with “AI 50.3”) were significantly lower rated by Ordo, in the roughly 3485–3545 range. This means that, based on Ordo’s algorithm, Wordfish and the cluster of Stockfish-derived peers were essentially in a dead heat – none had a performance clearly above the rest (50% draws all around tends to equalize ratings). DRAHA’s much lower Ordo rating quantifies the one-sided nature of its results against the field (0% in one case, 23% in the other).

The BayesElo list (another ranking computed with a Bayesian Elo model) provides a slightly different perspective, because it takes into account not just score percentages but also draws and prior distributions. According to BayesElo, Wordfish (Desvelemosafrica) achieved a rating of about +9 Elo on the day. This placed it at rank 10 among the engines listed – meaning there were a few engines rated higher, a result of subtle performance differences. For instance, the top engine by BayesElo was Yekoyeko (Claw 2.0) at +29 Elo, and a couple of others like Leyla995, Thinkharder, DeepBlueOcean, Kaltago clustered in the +24 to +16 range ahead of Wordfish. All of those engines, however, had very few games (often 1–3 games total) and 50% scores – their higher Elo in BayesElo likely arose from facing slightly lower-rated opponents or winning a single game elsewhere. Wordfish’s +9 Elo in BayesElo reflects its solid 54% score over a large sample of 92 games. Notably, almost all of Wordfish’s prominent rivals also ended up around +9 Elo (many exactly 9, in fact, due to the prevalence of draws). Engines like Gariz, Srye, Auryn, CM9000, etc., all appear with single-digit Elos and 50% scores – essentially tied with Wordfish in performance. The only engines with substantially negative BayesElo were, unsurprisingly, DRAHA and its AI 50.3 version at –136 and –152 Elo, bringing up the bottom of the list. This huge rating deficit for DRAHA mirrors its very poor results (23% score) against the field. In summary, BayesElo confirms that Wordfish was among the top engines but not singularly the top – it was part of a pack of closely matched engines all within a few Elo points of each other, separated by tiny statistical differences. At the same time, Wordfish was hundreds of Elo stronger than the weakest competitor it faced (DRAHA), which is exactly where it picked up its wins.

Relationship between Wordfish’s performance and opponent strength. The scatter plot shows Wordfish’s score percentage against each opponent, plotted versus the opponent’s BayesElo rating for that day. We see that for nearly all strong opponents (Elo ~0 to +30 range), Wordfish scored about 50% (indicated by the cluster of blue X’s at the 50% line). Only against the low-rated DRAHA (around –136 Elo) did Wordfish score significantly higher, about 79% (red point). This illustrates that Wordfish traded draws with its peers, and achieved decisive wins only against the much weaker engine.

As the above chart and data indicate, Wordfish’s performance was highly level-dependent: versus the top-tier engines, results were consistently drawn (50% score each), whereas versus the clearly weaker engine, Wordfish’s score jumped dramatically. This is expected behavior in a competitive Elo ecosystem – engines of roughly equal strength tend to split points, and even a small rating gap can yield a more lopsided score. It’s also noteworthy that Wordfish’s 92% draw rate overall was one of the highest, reflecting its style or preparation: it was extremely difficult to beat, but also found it difficult to beat others of similar strength. High draw rates are typical in engine vs. engine play at the top level (BayesElo even assumes ~50% draw rate for equal opponents), and Wordfish epitomized that on this day.
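The roughly 79% score against an opponent about 215 Elo weaker is close to what the standard logistic Elo model predicts. A minimal check (this is the generic Elo expectation formula, not the exact Ordo or BayesElo computation):

```python
def expected_score(rating_diff):
    """Standard logistic Elo expectation for the higher-rated side."""
    return 1 / (1 + 10 ** (-rating_diff / 400))

# Wordfish (~3700 Ordo) vs DRAHA (~3485): a gap of roughly 215 points.
print(round(expected_score(215), 3))  # ~0.775, close to the observed ~79%
```

That the observed score tracks the model this closely over only a dozen games is partly the luck of a small sample, but it shows the rating gap and the game results telling the same story.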

Standing Among Stockfish Derivatives and Competitors

Wordfish is itself a derivative of the Stockfish engine (Wordfish 2.42 is based on Stockfish code with some custom modifications). In the context of the PlayChess machine room on Sep 18, it was surrounded by a crowd of Stockfish cousins – and it fared extremely well among them. No Stockfish-derived engine could beat Wordfish that day: not the official Stockfish 17.1 (Amonfritz, DeepBlueOcean), not dev builds (Gariz, Bingow, etc.), nor specialized forks like ShashChess (Polomo, Srye), nor others like Claw (Yekoyeko) or Eman (PSYKOTRANCE69). This is a strong endorsement of Wordfish’s strength – it has kept pace with the rapid evolution of Stockfish and its offshoots. In fact, the Ordo ranking lumps Wordfish with all these top engines at the same 3700 rating because their head-to-head results were draws across the board. Wordfish essentially proved equal to the best engines present.

However, Wordfish also did not demonstrably exceed the others – its inability to score wins against them means it didn’t pull ahead in Elo or ranking. For instance, another Wordfish-based entrant (“Peace on Earth X”) also scored all draws with Wordfish. The BayesElo list shows Wordfish slightly behind a few others, but the margins are very small (a handful of Elo points). It’s fair to say Wordfish secured its place in the top tier, but as one member among a pack of closely matched engines. On that day it wasn’t the absolute top dog (Claw 2.0 edged it by scoring a win elsewhere to reach 29 Elo), but it was certainly near the summit.

When comparing Wordfish to other Stockfish derivatives, one might consider factors like style and innovation. Many forks attempt different tweaks: e.g., ShashChess focuses on more positional/playstyle changes, Claw and Eman experiment with different neural net evaluations, BrainLearn incorporates learning from games, etc. Wordfish’s distinguishing features (in its 2.42 version) aren’t explicitly documented in this data, but the results suggest its playing strength is effectively in the same league as these experiments. One empirical observation: Wordfish’s extreme drawishness could imply it is tuned for stability – perhaps favoring solid play and not taking undue risks. It might also reflect the general “engine convergence” at the top: all these programs, being based on Stockfish NNUE, tend to find the best moves and often end up neutralizing each other. In that sense, Wordfish’s performance is very similar to its peers – a lot of theoretical draws, a handful of decisive games only against weaker opponents.

It’s also instructive to compare Wordfish vs. Stockfish (official): Stockfish 17.1 was represented by Amonfritz and DeepBlueOcean accounts. Those encounters were all drawn, indicating Wordfish 2.42 can hold the latest official Stockfish version without issue. Against other notable forks like Artemis (run by Srye) or JudaS++ (Great Ozzie, Ultima, etc.), again Wordfish showed itself equal. In essence, none of the alternative engines showed a clear superiority over Wordfish in actual play. This suggests that whatever innovations Wordfish has under the hood are effective enough to keep it on par with the cutting-edge. For example, if Wordfish includes custom tuning or search modifications, they are yielding a strength roughly matching the best from others. It also suggests that Wordfish’s evolution (up to version 2.42) has successfully kept up with the rapid improvements that Stockfish and its NNUE-based derivatives have made through 2025.

Performance Trends and Insights

A few empirical insights emerge from Wordfish’s performance on this day:

  • Rock-Solid Defense: Wordfish was never defeated in 92 games – a formidable accomplishment. Even playing dozens of games against top engines, it maintained perfect defense. This implies exceptional tactical sharpness and endgame reliability; any mistake by Wordfish would likely have been seized by those opponents, yet it made none serious enough to lose.
  • Difficulty in Securing Wins: The flip side is that Wordfish also struggled to score wins against engines of comparable strength. All draws as White indicate that even with first-move initiative, Wordfish could not break the equilibrium (possibly a credit to opponents’ defense as well). Its wins as Black came only when a noticeable gap in opponent strength existed. This highlights a common theme in top-level computer chess – engines are so strong and well-matched that decisive results are rare. For Wordfish, it may point to a relatively conservative or balanced style, where it plays “correct” moves that make it very hard to beat, but also may not press as recklessly for wins when there’s no clear opportunity.
  • Opening Preparedness: The concentration of games in certain openings (especially the Ruy Lopez/Spanish) suggests that Wordfish was comfortable repeatedly entering those lines. The fact that it scored wins there (and nowhere else) could imply that Wordfish (or its operator) specifically prepared that opening to target weaker opponents. It also underscores how critical opening choice is: many draws in other openings show that if both engines know the line, the game will likely steer towards a drawish outcome. Wordfish might consider exploring sharper or less trodden variations when aiming for wins against its equals. Conversely, its success in the Spanish indicates a deep familiarity with those positions – a strength to continue leveraging.
  • Comparison to Predecessors: Wordfish 2.42’s draw with Wordfish 2.41 (Kaltago) shows that improvements between versions did not translate into easy wins; this could suggest that the differences were subtle (perhaps efficiency, minor evaluation tweaks, etc.) rather than a major strength jump. It’s a reminder that as engines approach perfection, even a stronger version might only be marginally better – not enough to force a decisive result except over many games. The evolutionary nature of Wordfish’s development means each version increment might yield a few Elo points gain. In the fiercely competitive engine world, that’s significant but not always immediately visible in head-to-head short matches.
  • BayesElo vs Ordo nuances: Wordfish’s BayesElo rating of +9 was a bit behind the top, partly because it played so many games (92) that its score, while positive, was diluted by many draws. Engines that played very few games could appear at the top of the BayesElo list if they happened to score 50% against a high-rated opponent or got one win – essentially benefiting from small sample variance. Wordfish’s large sample size gives a more robust indication of its strength. The takeaway is that Wordfish, over a large number of games, proved itself reliably strong – there were no random upsets or off-days in that sample. Consistency is a virtue in rating terms, even if it means the rating doesn’t spike as high as someone who, say, won 1 out of 1 game. In a longer run, Wordfish’s steadiness would likely make it a top contender for any ranking list.

Conclusion and Outlook

Wordfish’s performance on September 18, 2025 reflects a mature, competitive chess engine at the top of its game. It demonstrated elite defensive prowess, going unbeaten in a gauntlet of 90+ games against the strongest engines available. In the PlayChess machine room context – a brutal arena of engine competition – Wordfish proved that it can hang with the very best Stockfish derivatives and even the official Stockfish, without giving ground. Its plus score (seven wins against zero losses) came entirely from appropriately dispatching a weaker engine, which is exactly what a top engine should do – exploit any weaker opponent for full points while conceding nothing to peers.

That said, the results also highlight areas for potential improvement and future evolution of Wordfish. The fact that it did not score any win against engines of similar caliber suggests that there is still room to push the envelope in terms of aggression or creativity. Perhaps Wordfish’s developers could incorporate more novel neural network tweaks, or adjust its evaluation function to take more calculated risks in positions that are objectively equal but offer practical win chances. Many drawn games were likely very balanced – finding ways to introduce imbalance (without compromising soundness) could be key to converting a few more draws into wins. This is, of course, the holy grail of engine design: how to win against an equally perfect opponent? Wordfish might look into techniques other forks have tried – like reinforcement learning of openings, alternative search heuristics, or endgame tablebase innovations – to gain that edge.

Another aspect is opening diversification. Wordfish leaned heavily on mainline openings (particularly 1.e4 e5 Spanish). While it clearly knows these inside out, top opponents also know them, yielding many theoretical draws. Exploring offbeat but strong lines could catch an opponent “out of book” and lead to a decisive result. Some engines prepare surprise opening novelties in engine-vs-engine play to avoid the well-trodden paths that lead to sterile equality. If Wordfish can incorporate such strategies, it might tip some drawn encounters in its favor next time.

From an innovation standpoint, Wordfish is part of the broader trend of Stockfish-based engines trying to differentiate themselves. Its standing in the machine room this day shows it is on par with the leading edge. The continued development (the version numbering suggests frequent updates) will likely aim to build on this foundation – maintaining the solidity while improving the scoring capability. As a community, these engines collectively push the boundary of chess strength; Wordfish’s contribution is evident in how well it performed.

In conclusion, Wordfish emerges as a formidable engine that has effectively joined the top ranks of computer chess. It showcases the culmination of Stockfish lineage strength with its own twist of reliability. To truly shine above the rest, the next steps for Wordfish might involve nurturing a bit more “killer instinct” – finding wins where others settle for draws – without sacrificing its legendary stability. The September 18, 2025 performance was an excellent demonstration of competitiveness and consistency. With continued innovation, Wordfish could convert more of those near-equal battles into victories, potentially climbing from being one of the pack leaders to a clear front-runner in future machine room showdowns. The engine’s evolution will be exciting to watch, as it balances on that razor’s edge between maintaining perfection and taking the leap to seize victory.

Sources: The analysis above was derived from the provided PGN game file of Wordfish’s games on 2025-09-18, along with the Ordo rating list and BayesElo rankings computed for those games. Key data points such as game results, scores, and ratings have been cited from these sources for accuracy. For example, Wordfish’s game tally and score are documented in the Ordo output, and its draw rate and BayesElo rating are noted in the BayesElo summary. The rating gap of DRAHA is evidenced by both Ordo and BayesElo listings. These sources underpin the performance evaluation and comparisons discussed.

Wordfish (Desvelemosafrica) — Performance on 18 Sep 2025

PlayChess.com machines room

At-a-Glance

Total games: 92 · Wins: 7 · Draws: 85 · Losses: 0

Figures


Stacked bar chart: Wordfish results by color (wins, draws, losses)
Figure 1. Results by color (White vs Black) — 18 Sep 2025.
Bar chart showing count of games by ECO category A–E
Figure 2. Opening distribution by ECO category (A–E).
Scatter plot of Wordfish score percentage vs opponent BayesElo
Figure 3. Score percentage vs opponent BayesElo (day list).

Notes & Sources

  • Games parsed from games.pgn (filtered by Date=2025‑09‑18 and nickname contains “Desvelemosafrica”).
  • Ratings cross‑referenced from ordo.txt and bayeselo.txt.
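The filtering described above can be sketched with the standard library alone. The sample PGN below is hypothetical, and a real pipeline might prefer a dedicated PGN parser; this sketch only reads the header tags:

```python
import re

# Matches PGN header tag pairs such as: [Date "2025.09.18"]
TAG_RE = re.compile(r'\[(\w+) "([^"]*)"\]')

def filter_games(pgn_text, date="2025.09.18", nick="Desvelemosafrica"):
    """Return header dicts of games on `date` where a player name contains `nick`."""
    games, headers = [], {}
    for line in pgn_text.splitlines() + [""]:   # sentinel flushes the last game
        m = TAG_RE.match(line.strip())
        if m:
            headers[m.group(1)] = m.group(2)
        elif headers:                           # first non-tag line ends the header block
            players = headers.get("White", "") + "|" + headers.get("Black", "")
            if headers.get("Date") == date and nick in players:
                games.append(headers)
            headers = {}
    return games

# Hypothetical single-game PGN in the shape the article describes.
sample = """[Event "Machines room"]
[Date "2025.09.18"]
[White "DRAHA, AI 50.3"]
[Black "Desvelemosafrica"]
[Result "0-1"]
[ECO "C67"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 0-1
"""
hits = filter_games(sample)
print(len(hits), hits[0]["ECO"], hits[0]["Result"])  # 1 C67 0-1
```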
Jorge Ruiz

A connoisseur of both chess and anthropology, a combination that reflects his deep intellectual curiosity and his passion for understanding the art of strategic play.
