Adjudication Rules in Computer Chess Rating Publication

When readers look at a computer chess rating list, they often focus on the visible outputs: Elo numbers, ranks, win rates, draw rates, and total game counts. Yet the trustworthiness of those numbers depends on a much less glamorous layer underneath: adjudication policy. In other words, what exactly counts as a finished game, what counts as a technical loss, what is excluded from the rating pool, and how are tablebase or manual decisions reported?

That question is not secondary. It is central. Adjudication rules determine which results are allowed to become evidence.

This is why a serious rating workflow cannot stop at “games were played and ratings were computed.” It must also explain how games were closed, how incidents were classified, and how exceptional cases were documented. TCEC has long published explicit information on draw adjudication, win adjudication, critical engine bugs, and tablebase adjudication. CCRL, while different in structure and purpose, also shows why formal testing conditions, large game archives, and downloadable PGN matter to public confidence. IJCCRL should therefore present its own methodology in the same spirit: not as a replacement for TCEC, CCRL, or any other established resource, but as a precise statement of how IJCCRL treats results inside its own broadcast, archive, downloads, winners, and rating workflow.

The principle is straightforward: rating publication becomes more credible when the path from live game to archived result is transparent.

Why adjudication rules matter

A rating list is an evidence summary. It compresses many individual games into a numerical model. But compression only works if the underlying records are coherent. If some games are natural completions, some are auto-adjudicated, some are ended by timeouts, some are interrupted by communication failures, and some are removed after manual review, then readers need to know that. Otherwise the final rating table risks looking more certain than the evidence really is.

This is why adjudication rules matter so much in computer chess ratings. They sit between raw gameplay and published statistics. They decide whether a game enters the sample, enters with a note, or is excluded.

A robust publication policy therefore needs to answer at least five questions:

Under what conditions may a game be adjudicated automatically as a draw or win?
How are crashes, disconnects, UCI failures, or timeouts classified?
How are tablebase-resolved positions treated?
When is manual intervention allowed?
How are adjudicated or excluded games reported without overstating the reliability of the result?

Those questions are not merely operational. They shape the credibility of the final rating output.

Automatic draw and win adjudication versus natural game completion

The cleanest game is a naturally completed game. A game that ends in checkmate, stalemate, repetition, the 50-move rule, or an ordinary resignation-like engine result produces a straightforward record. But in engine tournaments, organizers often use adjudication rules to avoid wasting time in technically dead positions or obviously decisive evaluations.

TCEC provides a clear public example. Its published rules explain that, in addition to normal threefold repetition and the 50-move rule, a game may be adjudicated drawn from move 35 onward if both engines’ evaluations remain within a narrow near-equality band for a set number of recent plies. TCEC also publishes a win adjudication rule based on both engines holding a large winning evaluation for a sustained period. In addition, Cutechess may automatically adjudicate endgames with six men or fewer using Syzygy tablebases. Whether one agrees with every threshold is less important than the methodological lesson: the rule is public, the trigger is known, and the audience is told when such a rule exists.
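To make the shape of such a rule concrete, here is a minimal sketch of evaluation-based adjudication. It is not TCEC's or IJCCRL's implementation: the function name `adjudicate` and every threshold constant are hypothetical placeholders chosen only for illustration, and a real rule may carry extra conditions that this sketch ignores.

```python
from typing import Optional, Sequence

# All thresholds are hypothetical placeholders, not published TCEC or IJCCRL values.
DRAW_BAND_CP = 15        # |eval| at or below this counts as "near equality"
DRAW_MIN_MOVE = 35       # earliest full move at which draw adjudication may fire
DRAW_WINDOW_PLIES = 10   # number of most recent plies that must all qualify
WIN_THRESHOLD_CP = 1000  # |eval| at or above this counts as "decisive"
WIN_WINDOW_PLIES = 10    # number of most recent plies that must all qualify


def adjudicate(evals_cp: Sequence[int], move_number: int) -> Optional[str]:
    """Return "1/2-1/2", "1-0", "0-1", or None if play should continue.

    evals_cp holds one evaluation per ply, in centipawns from White's point
    of view, reported alternately by the two engines as they move. Because
    each window spans plies from both engines, both must agree for a rule
    to fire.
    """
    draw_window = evals_cp[-DRAW_WINDOW_PLIES:]
    if (move_number >= DRAW_MIN_MOVE
            and len(draw_window) == DRAW_WINDOW_PLIES
            and all(abs(e) <= DRAW_BAND_CP for e in draw_window)):
        return "1/2-1/2"

    win_window = evals_cp[-WIN_WINDOW_PLIES:]
    if len(win_window) == WIN_WINDOW_PLIES:
        if all(e >= WIN_THRESHOLD_CP for e in win_window):
            return "1-0"
        if all(e <= -WIN_THRESHOLD_CP for e in win_window):
            return "0-1"
    return None
```

Keeping the thresholds as named constants at the top is deliberate: whatever values an event uses, they can then be published alongside the results.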

That is the standard IJCCRL should emulate in spirit. An adjudicated draw is not inherently problematic. An adjudicated win is not inherently problematic either. The problem appears only when adjudication exists but is not disclosed.

For rating publication, the correct distinction is therefore not “natural completion good, adjudication bad.” The correct distinction is “transparent adjudication good, opaque adjudication bad.”

IJCCRL should explain clearly whether a given event used:

only natural completion,
natural completion plus automatic draw adjudication,
natural completion plus automatic win adjudication,
natural completion plus Syzygy-based endgame adjudication,
or a mixed regime with post-game manual review in exceptional cases.

That disclosure is important because the interpretation of results changes with the closure method. If a closed event used draw adjudication heavily, readers may want to know that many balanced endgames did not continue to full natural resolution. If a format used aggressive win adjudication, readers may want reassurance that the threshold was conservative and symmetrical.

In rating work, clarity beats silence.

Crashes, UCI failures, readyok/proxy failures, and timeout language

This is where methodological writing becomes most valuable. Not every non-standard result is a chess result. Some are technical incidents.

TCEC publicly singles out serious play-limiting bugs, such as crashes or interface communication problems. Its archive rules also note that games lost on time, via stall, or by disconnect are discarded from its official ratings list. That is an important precedent because it separates tournament outcome from rating evidence. A game may exist as part of event history while still being treated differently in rating publication.

IJCCRL should apply the same logic and describe it in precise language.

A practical audit taxonomy for IJCCRL could look like this:

1. Crash

An engine process terminates unexpectedly or becomes non-functional in a way attributable to the engine side rather than ordinary chess play.

Recommended audit language:
“Game closed as engine crash.”
“Counted in event history.”
“Included in ratings only if IJCCRL event policy for that competition explicitly allows crash results; otherwise excluded from the rating sample.”

2. UCI failure

The engine fails to comply with the protocol layer in a meaningful way: malformed responses, missing bestmove, illegal move output, or failure to complete expected UCI interaction.

Recommended audit language:
“Game closed as UCI protocol failure.”
“Technical result, not interpreted as ordinary chess evidence.”

3. readyok failure

The engine does not answer readiness checks in the expected time window, preventing normal continuation. In an IJCCRL environment this can also appear symmetrically if the orchestration layer is disrupted and both sides are affected.

Recommended audit language:
“Game interrupted by readyok timeout.”
“Classified as technical interruption.”
“If symmetrical or infrastructure-related, report explicitly and avoid inflating competitive conclusions.”

4. Proxy or relay failure

The engine may be healthy, but the communication chain between engine, proxy, scheduler, or live relay breaks. This is especially relevant in a broadcast ecosystem with separate layers for engine orchestration and public display.

Recommended audit language:
“Game affected by proxy/relay communication failure.”
“Event note retained.”
“Excluded from rating inference unless the result can be clearly attributed to one engine under pre-declared rules.”

5. Timeout

Timeout language must be used carefully. A timeout can be an authentic chess loss on time, but it can also be the visible symptom of a deeper technical malfunction. The publication should not collapse those cases into one vague label.

A good audit note should therefore distinguish between:

ordinary loss on time during normal play,
stall or hang leading to time forfeit,
scheduler/proxy timing failure,
readyok timeout before proper continuation.

Without that distinction, readers cannot evaluate whether the result belongs inside a rating sample.
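The four timeout cases above can be sketched as a small classifier. This is only an illustration under stated assumptions: the `TimeoutEvidence` fields are hypothetical signals that an orchestration log might expose, and the precedence order (infrastructure first, then readyok, then stall) is one possible choice, not a declared IJCCRL rule.

```python
from dataclasses import dataclass
from enum import Enum


class TimeoutCause(Enum):
    LOSS_ON_TIME = "ordinary loss on time during normal play"
    STALL = "stall or hang leading to time forfeit"
    SCHEDULER_PROXY = "scheduler/proxy timing failure"
    READYOK = "readyok timeout before proper continuation"


@dataclass(frozen=True)
class TimeoutEvidence:
    # Hypothetical fields; real logs may expose different signals.
    engine_was_searching: bool   # engine emitted search output shortly before the flag fell
    readyok_pending: bool        # an "isready" was sent and never answered
    infrastructure_alert: bool   # scheduler or proxy reported a fault of its own


def classify_timeout(ev: TimeoutEvidence) -> TimeoutCause:
    """Pick the most specific label; infrastructure faults take precedence."""
    if ev.infrastructure_alert:
        return TimeoutCause.SCHEDULER_PROXY
    if ev.readyok_pending:
        return TimeoutCause.READYOK
    if not ev.engine_was_searching:
        return TimeoutCause.STALL
    return TimeoutCause.LOSS_ON_TIME
```

Checking the infrastructure signal first reflects the concern raised earlier: a failure outside the engine should not be reported as if one competitor had lost on time.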

Tablebase positions and manual decisions

Tablebase resolution is one of the clearest examples of why adjudication policy must be written down. In reduced-material endgames, Syzygy tablebases provide objective win-draw-loss (WDL) and distance-to-zeroing (DTZ) information for positions within their piece-count coverage. That makes them extremely useful operationally. It also creates publication responsibilities.

If a game reaches a six-man tablebase position and the tournament software auto-adjudicates from that point, the published result should say so. If a game reaches a tablebase-known win but the engine then crashes before the move is played, the incident should not be silently disguised as a normal over-the-board-style conversion. If a human organizer manually confirms a result because the position is tablebase-proven, that too should be recorded.

The key issue is not whether tablebases are legitimate. They clearly are. The key issue is whether the reader is told when they were used.
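A minimal sketch of how that disclosure could be generated from a final position follows. It does not probe any tablebase; it only counts the men in a FEN string and produces a note. The function names, the wording of the notes, and the default limit of six men are assumptions made for illustration.

```python
def count_men(fen: str) -> int:
    """Count all pieces, kings included, in the piece-placement field of a FEN."""
    placement = fen.split()[0]
    return sum(1 for ch in placement if ch.isalpha())


def tablebase_note(fen: str, auto_adjudicated: bool, max_men: int = 6) -> str:
    """Return a disclosure sentence about tablebase involvement.

    max_men is a hypothetical event setting (six-man coverage assumed here).
    """
    men = count_men(fen)
    if auto_adjudicated:
        if men > max_men:
            # The record claims a tablebase closure outside the declared coverage.
            raise ValueError(f"{men} men exceeds declared tablebase limit of {max_men}")
        return f"Closed by Syzygy adjudication in a {men}-man position."
    if men <= max_men:
        return f"Reached a {men}-man tablebase position; played on without adjudication."
    return "No tablebase involvement."
```

The `ValueError` branch is the useful part for auditing: a record that claims a tablebase closure in a position outside the declared coverage is flagged instead of being published silently.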

Manual decisions deserve even stricter discipline. They should be rare, explicit, and framed as exceptions. A methodological article from IJCCRL should state that manual intervention is reserved for cases such as:

corrupted or incomplete game records,
symmetrical infrastructure failures,
clearly documented relay interruptions,
post-game verification of tablebase-determined positions,
or the removal of non-chess technical incidents from a rating dataset.

It should also state what manual intervention is not for. It is not for “correcting” an inconvenient chess result. It is not for polishing the story of an event. It is not for making a rating table look cleaner. Manual action is a documentation tool, not a narrative tool.

That distinction protects editorial integrity.

How to report adjudicated games without inflating conclusions

One of the most common publication mistakes is overstatement. A tournament report may treat a technically interrupted game as if it were a clean competitive result. A rating summary may cite totals without telling readers that some entries were excluded or adjusted. A winners page may emphasize the final score while leaving incident notes buried elsewhere.

The better approach is layered reporting.

Event layer

At the event level, all closures should be visible in the historical record. If a game ended by crash, timeout, or adjudication, the event note should say so. The event history should preserve what happened.
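One way to keep the closure visible in the historical record is to write it into the PGN headers. The sketch below uses the Termination tag and value set defined by the PGN standard. The mapping from closure types to those values is a hypothetical choice, and `ClosureType` is an invented supplemental tag name, not an established convention.

```python
from typing import List

# Termination values defined by the PGN standard.
PGN_TERMINATION_VALUES = frozenset({
    "abandoned", "adjudication", "death", "emergency",
    "normal", "rules infraction", "time forfeit", "unterminated",
})

# Hypothetical mapping; an event could reasonably choose differently.
CLOSURE_TO_TERMINATION = {
    "natural completion": "normal",
    "draw adjudication": "adjudication",
    "win adjudication": "adjudication",
    "Syzygy adjudication": "adjudication",
    "manual decision": "adjudication",
    "loss on time": "time forfeit",
    "UCI failure": "rules infraction",
    "crash": "emergency",
    "readyok timeout": "emergency",
    "proxy failure": "emergency",
}


def closure_tags(closure_type: str) -> List[str]:
    """Return PGN tag lines recording how a game was closed."""
    termination = CLOSURE_TO_TERMINATION[closure_type]  # KeyError on unknown types
    return [
        f'[Termination "{termination}"]',
        f'[ClosureType "{closure_type}"]',  # invented supplemental tag
    ]
```

Because the standard value set is coarse, the supplemental tag carries the finer audit label, so the archived PGN remains self-describing even when separated from the event page.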

Audit layer

At the audit level, the classification should be refined. Was the incident engine-side, infrastructure-side, symmetrical, or unresolved? Did it affect only the event narrative, or also the rating sample?

Rating layer

At the rating level, only the declared evidence policy should control inclusion. If time forfeits, stalls, disconnects, or relay failures are excluded from ratings, the rating methodology should say that directly. If some categories are included, that too should be stated clearly.

This layered approach reduces confusion. It also allows a publication to remain honest without becoming unreadable.

A useful sentence pattern is:

“The event score includes all officially recorded games. The rating sample applies the IJCCRL rules-and-audit policy and may exclude technical incidents such as relay failures, readyok interruptions, or non-chess disconnects.”

That is the kind of sentence readers understand immediately. It does not hide the event history, but it prevents inflated statistical certainty.
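The separation between event score and rating sample can be sketched in a few lines. The `GameRecord` fields and the contents of `RATING_EXCLUDED` are assumptions for illustration; the actual exclusion list is whatever the declared evidence policy says.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Hypothetical policy: closure types kept out of the rating sample.
RATING_EXCLUDED: FrozenSet[str] = frozenset({
    "crash", "UCI failure", "readyok timeout", "proxy failure",
})


@dataclass(frozen=True)
class GameRecord:
    game_id: str
    result: str        # "1-0", "0-1", or "1/2-1/2"
    closure_type: str  # one of the closure labels used in the audit notes


def rating_sample(games: Iterable[GameRecord],
                  excluded: FrozenSet[str] = RATING_EXCLUDED) -> List[GameRecord]:
    """Keep only games whose closure type the policy admits as rating evidence."""
    return [g for g in games if g.closure_type not in excluded]


def coverage_line(games: List[GameRecord]) -> str:
    """One-line disclosure of how many event games reached the rating sample."""
    kept = len(rating_sample(games))
    return (f"Event games: {len(games)}; rating sample: {kept}; "
            f"excluded as technical incidents: {len(games) - kept}")


games = [
    GameRecord("G1", "1-0", "natural completion"),
    GameRecord("G2", "1/2-1/2", "draw adjudication"),
    GameRecord("G3", "0-1", "proxy failure"),
]
print(coverage_line(games))
```

The event list is never mutated; the rating sample is a filtered view of it, which mirrors the idea that event history preserves everything while the rating layer applies a declared policy.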

An IJCCRL audit-note template

To make the methodology operational, IJCCRL should publish and reuse a standard audit-note template. A concise template might look like this:

IJCCRL Audit Note

Event:
Track: Original UCI / Derived Stockfish / other
Time control:
Game number(s):
Closure type: natural completion / draw adjudication / win adjudication / Syzygy adjudication / crash / UCI failure / readyok timeout / proxy failure / loss on time / manual decision
Trigger summary: short factual description
Symmetry status: one-sided / symmetrical / infrastructure-side
PGN status: preserved / corrected / excluded / replaced by note
Event-table treatment: included / annotated
Rating-sample treatment: included / excluded / included with note
Reason for rating treatment:
Manual review performed by:
Further evidence: download pack / archive entry / winners page / event page

This is not bureaucratic excess. It is exactly the kind of lightweight formalism that helps a rating ecosystem scale without losing trust.
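As a rough illustration of how the template could be enforced in software, the sketch below models a subset of its fields as a dataclass that rejects values outside the template's own option lists. The class name, field names, and the sample values in the usage are hypothetical.

```python
from dataclasses import dataclass

CLOSURE_TYPES = (
    "natural completion", "draw adjudication", "win adjudication",
    "Syzygy adjudication", "crash", "UCI failure", "readyok timeout",
    "proxy failure", "loss on time", "manual decision",
)
SYMMETRY_STATUSES = ("one-sided", "symmetrical", "infrastructure-side")
RATING_TREATMENTS = ("included", "excluded", "included with note")


@dataclass(frozen=True)
class AuditNote:
    # Subset of the template fields; the rest would follow the same pattern.
    event: str
    game_numbers: str
    closure_type: str
    trigger_summary: str
    symmetry_status: str
    rating_treatment: str
    reason: str

    def __post_init__(self) -> None:
        # Reject any value that is not one of the template's listed options.
        checks = (
            ("closure_type", self.closure_type, CLOSURE_TYPES),
            ("symmetry_status", self.symmetry_status, SYMMETRY_STATUSES),
            ("rating_treatment", self.rating_treatment, RATING_TREATMENTS),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")

    def render(self) -> str:
        """Render the note in the plain-text layout of the template."""
        return "\n".join([
            "IJCCRL Audit Note",
            f"Event: {self.event}",
            f"Game number(s): {self.game_numbers}",
            f"Closure type: {self.closure_type}",
            f"Trigger summary: {self.trigger_summary}",
            f"Symmetry status: {self.symmetry_status}",
            f"Rating-sample treatment: {self.rating_treatment}",
            f"Reason for rating treatment: {self.reason}",
        ])


note = AuditNote(
    event="Example event",  # hypothetical values throughout
    game_numbers="17",
    closure_type="readyok timeout",
    trigger_summary="No readyok received before the next move request.",
    symmetry_status="infrastructure-side",
    rating_treatment="excluded",
    reason="Technical interruption, not chess evidence.",
)
print(note.render())
```

Validating against fixed option lists keeps the audit vocabulary stable across events, which is what lets readers compare notes from different competitions.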

Why this matters for IJCCRL specifically

IJCCRL is building a connected workflow: live broadcast, closed-event reporting, downloadable packs, archive preservation, winners surfaces, and rating publication. That means adjudication language cannot remain informal. Once a project has public event history and public rating surfaces, readers need a stable bridge between them.

That bridge is the rules-and-audit framework.

A reader who comes in through the homepage should be able to understand the broad project through the main hub for chess engine rating lists at https://ijccrl.com/. A reader who wants the historical context should be able to move to the archive at https://ijccrl.com/archive/. A reader who wants evidence should be able to inspect downloadable material at https://ijccrl.com/downloads/. If appropriate to the editorial flow, a current event context can also be connected through https://ijccrl.com/events/, and the methodological explanation itself should live in the rules-and-audit area at https://ijccrl.com/rules-and-audit/.

That internal structure matters because it prevents ratings from becoming isolated numbers. It connects the claim to the record.

This article should therefore be presented not as a competitor to CCRL or TCEC, and not as a universal rulebook for all computer chess. It should be presented as an IJCCRL methodological article: a statement of how IJCCRL classifies and reports adjudication-related cases within its own tournament and publication ecosystem.

That is the right level of ambition. It is useful, specific, and credible.

Conclusion

Adjudication rules matter because they decide which results enter the evidence chain. In computer chess rating publication, that is not a minor technicality. It is the difference between a rating list that merely displays numbers and a rating list that can explain where those numbers came from.

A strong methodology distinguishes natural completions from automatic adjudications. It separates chess outcomes from technical incidents. It records when Syzygy tablebases or manual decisions are involved. It states whether crashes, stalls, disconnects, readyok failures, or proxy interruptions are included in event history, in ratings, in both, or in neither. And it publishes those choices in language readers can inspect.

That is the direction IJCCRL should take. Not to replace established institutions, but to strengthen its own evidence workflow: live broadcast, event reporting, downloads, archive, winners, and rating publication joined by one transparent audit vocabulary.

If ratings are the public summary, adjudication policy is the hidden grammar underneath. The more clearly that grammar is written, the more trust the summary can earn.

Sources consulted before drafting

TCEC Archive and Rules Information: https://tcec-chess.com/archive/
CCRL 40/15 Index: https://computerchess.org.uk/ccrl/4040/
CCRL 40/15 About: https://computerchess.org.uk/ccrl/4040/about.html
CCRL 40/15 Games / Downloads: https://computerchess.org.uk/ccrl/4040/games.html
IJCCRL Archive: https://ijccrl.com/archive/
IJCCRL Downloads: https://ijccrl.com/downloads/

Jorge Ruiz Centelles

Philologist and lover of African social anthropology
