Optimising Windows and hardware for UCI Chess Engines

Windows 10 Pro Workstation and Server-Class Hardware for UCI Chess Engines

Executive summary

UCI engines are essentially CPU-bound scientific workloads with strong sensitivity to latency, cache locality and memory bandwidth. On Windows 10 Pro Workstation running on multi-socket Xeon/EPYC servers—or high-core desktops—the largest gains come from:

  • Correct NUMA placement (pinning engines and GUIs to specific sockets).
  • Choosing thread counts that match physical cores per socket (12 or 24 on a 12C/24T node, etc.).
  • Ensuring Turbo Boost and appropriate power policy (Ultimate Performance) are enabled.
  • Using Large Pages for the transposition table (with care), sized sensibly (e.g., 4–16 GB per engine).
  • Reducing OS “noise”: indexing, Game DVR, background updates during runs.
  • Measuring the effect with PerfMon counters and repeatable Data Collector Sets rather than intuition.

This guide gives you a principled workflow: set hardware/firmware, prepare Windows, set process affinity/priority, pick engine options, verify with PerfMon, iterate. You’ll also get drop-in shell (PowerShell and CMD) snippets for everyday operations.


1) First principles: how UCI engines use your machine

1.1 CPU, threads, and NUMA

Modern engines are heavily search-bound; their parallelism scales well within one NUMA node (a CPU socket with attached memory channels), but often scales sub-linearly across sockets due to remote memory access and coherency traffic. Two implications follow:

  • On a dual-socket server (e.g., 2×12 cores with Hyper-Threading), the best starting point is to keep one engine instance within one socket: 12 threads (physical-only) or 24 threads (including SMT).
  • If you must use “odd sizes” (e.g., 20T on a 24-logical node), prioritise 12 physical + 8 SMT instead of a random 20/24 selection.
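The "12 physical + 8 SMT" choice can be written down as an affinity mask. The sketch below assumes that Windows numbers the two hardware threads of core k as logical CPUs 2k and 2k+1 within the node. That is the usual layout, but it is an assumption here and should be confirmed on your machine (for example with Sysinternals Coreinfo). The function name `smt_aware_mask` is illustrative and does not come from any tool.

```python
def smt_aware_mask(physical_cores: int, smt_siblings: int = 0) -> int:
    """Build a node-relative affinity mask (bit 0 = first logical CPU of the node).

    Assumes logical CPUs 2k and 2k+1 are the two hardware threads of core k.
    This enumeration order is an assumption; verify it on your system.
    """
    if not 0 <= smt_siblings <= physical_cores:
        raise ValueError("smt_siblings must be between 0 and physical_cores")
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (2 * core)          # first hardware thread of every core
        if core < smt_siblings:
            mask |= 1 << (2 * core + 1)  # sibling thread on the first N cores
    return mask


print(hex(smt_aware_mask(12)))      # 12 threads, one per physical core
print(hex(smt_aware_mask(12, 8)))   # 20 threads: 12 cores plus 8 siblings
print(hex(smt_aware_mask(12, 12)))  # 24 threads: the whole node
```

Under that assumption, a contiguous 20-bit mask such as FFFFF covers 10 cores with both siblings rather than 12 cores plus 8 siblings, which is why the mask is built per core.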

1.2 Memory and Large Pages

Engines allocate a large contiguous transposition table (TT). Large Pages (which require the “Lock pages in memory” privilege and an engine that requests 2 MB pages) can reduce TLB pressure and page management overhead. Gains vary by CPU, OS and allocation pattern; on servers with abundant RAM the benefit is often measurable, but it must be configured carefully to avoid starving the OS.

1.3 Storage

For testing, storage matters less than CPU/memory, but tablebases and opening books benefit from SSDs. On enterprise RAID controllers, OS-level TRIM/UNMAP may not pass through; that is a normal limitation and not a blocker for engine throughput.

1.4 OS interference

Indexing, Game Bar/DVR, real-time antivirus scanning in engine folders and update tasks can add unpredictability. Disable or sandbox them during runs; re-enable for normal desktop usage.


2) Firmware (BIOS/UEFI) settings that actually move the needle

Every vendor’s naming differs; check your platform manual and translate accordingly.

  • Power/Turbo policy
    • Enable Intel Turbo Boost / AMD Precision Boost.
    • For servers: OS Control / High Performance / Maximum Performance (avoid “Static Low Power”).
  • C-States
    • Disable C1E and deep package C-states while benchmarking to minimise wake latencies; you can revert later if you need energy savings.
  • NUMA exposure
    • Disable Node Interleaving so Windows sees real NUMA nodes (one per socket).
    • Prefer Clustered/“Local” NUMA group size (vendor wording varies).
  • Hyper-Threading / SMT
    • Leave Enabled; evaluate both 1×threads/core (physical only) and 2×threads/core (with SMT) per node.
  • Memory RAS features
    • Keep ECC; disable aggressive “patrol scrubbing” during short tests if it causes background traffic (rarely needed).
  • PCIe power management
    • Disable deep ASPM states if you see DPC spikes (usually not required on a stable workstation).

Reboot into Windows only when BIOS is set; OS tuning without Turbo/NUMA is wasted effort.


3) Windows 10 Pro Workstation: power, timers and noise reduction

3.1 Use the “Ultimate Performance” plan and remove HPET forcing

  • Do select the Ultimate Performance power plan, which unlocks aggressive CPU policies and disables core parking.
  • Do not force HPET via useplatformclock. Windows on modern systems prefers the TSC, which is far cheaper to read.

Batch (run as Administrator):

:: Ultimate Performance
powercfg -duplicatescheme e9a42b02-d5df-448d-aa00-03f14749eb61 >nul 2>&1
powercfg -setactive e9a42b02-d5df-448d-aa00-03f14749eb61

:: 100% min/max processor policy (harmless on Ultimate but explicit)
powercfg /setacvalueindex scheme_current sub_processor PROCTHROTTLEMIN 100
powercfg /setacvalueindex scheme_current sub_processor PROCTHROTTLEMAX 100
:: Re-apply the current scheme so the changed values take effect
powercfg /setactive scheme_current

:: Ensure no HPET forcing
bcdedit /deletevalue useplatformclock 2>nul

3.2 Turn off gaming and indexing features during test windows

Batch (Admin):

:: Disable Game Mode / DVR noise
reg add "HKCU\Software\Microsoft\GameBar" /v AutoGameModeEnabled /t REG_DWORD /d 0 /f
reg add "HKLM\SOFTWARE\Microsoft\GameBar" /v AllowAutoGameMode /t REG_DWORD /d 0 /f
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\GameDVR" /v AllowGameDVR /t REG_DWORD /d 0 /f

:: Disable Search indexing service (optional—re-enable after tests)
sc stop WSearch & sc config WSearch start= disabled

3.3 Defender exclusions for engine folders and binaries

PowerShell (Admin):

# Replace paths and processes with your actual engine/test tree
Add-MpPreference -ExclusionPath "C:\Engines","C:\Gauntlet","C:\Program Files\ChessBase"
Add-MpPreference -ExclusionProcess "revolution.exe","fastchess.exe","ChessProgram20.exe"

(If you use a third-party AV, configure equivalent exclusions.)

3.4 Visual effects and background priorities (optional)

On systems that double as desktops, the GUI can pre-empt a bit of CPU. Reducing compositor overhead is fine but usually low impact compared to affinity/NUMA. You can set “Best performance” visuals or “Programs” in System → Advanced → Performance. Measure before/after.


4) Large Pages (Lock Pages in Memory) the right way

4.1 What “Disabled” means in your shell

whoami /priv showing Disabled for SeLockMemoryPrivilege merely reflects the current shell token. Engines typically enable the privilege inside their own process just before allocating the TT in 2 MB pages. That is expected. The real checks are:

  • Does the privilege appear listed for your user after re-login?
  • Does RAMMap show Large Page usage rising when the engine runs?
  • (Optional) Does a per-process token inspection show Enabled=True for the engine process?

4.2 Granting the privilege with PowerShell only

PowerShell (Admin): adds the current user to “Lock pages in memory” via secedit; re-login afterward.

# --- Enable SeLockMemoryPrivilege for the CURRENT user ---
$TargetAccount = "$env:USERDOMAIN\$env:USERNAME"
$Sid = ([System.Security.Principal.NTAccount]$TargetAccount).Translate([System.Security.Principal.SecurityIdentifier]).Value
$InfExport = Join-Path $env:TEMP 'secpol_lp_export.inf'
$InfNew    = Join-Path $env:TEMP 'secpol_lp_new.inf'
$Db        = Join-Path $env:TEMP 'secpol_lp.sdb'

secedit /export /cfg $InfExport | Out-Null
$content = Get-Content -Raw -LiteralPath $InfExport
if ($content -notmatch '\[Privilege Rights\]') { $content += "`r`n[Privilege Rights]`r`n" }

$lines = $content -split "`r`n", -1
$found = $false
for ($i=0; $i -lt $lines.Length; $i++) {
    if ($lines[$i] -match '^\s*SeLockMemoryPrivilege\s*=\s*(.*)$') {
        $found = $true
        $rhs = $Matches[1].Trim()
        $vals = @()
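        # @(...) keeps $vals an array even for a single entry, so += appends instead of concatenating strings
        if ($rhs.Length -gt 0) { $vals = @($rhs -split '\s*,\s*' | Where-Object { $_ -ne '' }) }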
        $acc = "*$Sid"
        if ($vals -notcontains $acc) { $vals += $acc }
        $lines[$i] = "SeLockMemoryPrivilege = " + ($vals -join ',')
        break
    }
}
if (-not $found) {
    $idx = [Array]::IndexOf($lines, '[Privilege Rights]')
    if ($idx -lt 0) { $lines += '[Privilege Rights]'; $idx = $lines.Length - 1 }
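    # Guard the tail slice: when the section header is the last line, a descending range would re-append it
    $tail = @()
    if ($idx -lt $lines.Length - 1) { $tail = $lines[($idx+1)..($lines.Length-1)] }
    $lines = @($lines[0..$idx]) + @("SeLockMemoryPrivilege = *$Sid") + $tail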
}
Set-Content -LiteralPath $InfNew -Value ($lines -join "`r`n") -Encoding Unicode
secedit /configure /db $Db /cfg $InfNew /areas USER_RIGHTS | Out-Null
Write-Host "Added Lock pages in memory to $TargetAccount ($Sid). Please sign out/in."

Reboot or sign out/in to refresh your token. Then confirm:

whoami /priv | Select-String -Pattern 'SeLockMemoryPrivilege'

4.3 Selecting a sensible hash size

Even with 256 GB RAM, start modestly (e.g., 4–16 GB for TT/Hash per engine). Large Pages need physically contiguous memory, so an oversized request can fail once RAM has become fragmented. Monitor pagefile/commit and keep tens of GB free for the OS and other tasks.
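The sizing rule can be checked with a little arithmetic before launching anything. The sketch below assumes 2 MB large pages and treats "tens of GB free" as a 32 GB reserve; both the function name `hash_plan` and the `os_reserve_gib` default are illustrative assumptions, not values taken from any engine.

```python
LARGE_PAGE_MIB = 2  # large-page size assumed for this sketch


def hash_plan(hash_gib: int, instances: int, total_ram_gib: int,
              os_reserve_gib: int = 32) -> dict:
    """Estimate large-page count and remaining headroom for a set of engines."""
    pages_per_instance = hash_gib * 1024 // LARGE_PAGE_MIB
    total_hash_gib = hash_gib * instances
    headroom_gib = total_ram_gib - total_hash_gib - os_reserve_gib
    return {
        "pages_per_instance": pages_per_instance,
        "total_hash_gib": total_hash_gib,
        "headroom_gib": headroom_gib,
        "fits": headroom_gib >= 0,  # False means the plan eats into the reserve
    }


print(hash_plan(hash_gib=8, instances=4, total_ram_gib=256))
print(hash_plan(hash_gib=16, instances=16, total_ram_gib=256))
```

A plan that "fits" on paper can still fail at allocation time if memory is fragmented, so treat the result as an upper bound rather than a guarantee.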


5) Process placement: priority, affinity and NUMA pinning

5.1 CMD: launchers that pin by node

Separate GUI and gauntlet to different sockets to avoid resource contention. Examples:

:: Gauntlet on NUMA Node 0 (logical CPUs 0..23), High priority
start "Gauntlet-Node0" /high /node 0 /affinity FFFFFF cmd /c "run_fastchess_gauntlet.bat"

:: ChessBase GUI (ChessProgram20.exe) on NUMA Node 1, High priority, 20 logical CPUs
:: With /node, the mask is relative to that node: FFFFF = the node's first 20 CPUs (absolute 24..43)
start "ChessProgram20-Node1-20T" /high /node 1 /affinity FFFFF "C:\Program Files\ChessBase\ChessProgram20.exe"

Notes

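  • /node N sets NUMA node N as the preferred node for the process.
  • /affinity MASK assigns a hexadecimal bit mask of allowed logical CPUs.
  • On a 24-logical node, the full mask is FFFFFF (hex). When /node is combined with /affinity, the mask is interpreted relative to that node, as if the node's CPUs began at bit 0. The entire node 1 is therefore /node 1 /affinity FFFFFF. The absolute form FFFFFF000000 applies only when /node is omitted, or when setting ProcessorAffinity from PowerShell.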
  • For engines, prefer 12T (physical) or 24T (with SMT) per socket, measure both.
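Two mask conventions are in play: `start /node N /affinity` takes a mask relative to the node (the help text of `start /?` describes it as the node's processor mask shifted to begin at bit zero), while PowerShell's ProcessorAffinity takes an absolute mask within the processor group. The sketch below converts between them, assuming equally sized nodes numbered contiguously inside a single processor group (at most 64 logical CPUs). The helper names are illustrative.

```python
from typing import Tuple


def absolute_mask(relative_mask: int, node: int, logical_per_node: int = 24) -> int:
    """Shift a node-relative mask to its absolute position in the processor group."""
    if relative_mask <= 0 or relative_mask >> logical_per_node:
        raise ValueError("mask must be non-zero and fit inside one node")
    return relative_mask << (node * logical_per_node)


def cpu_range(mask: int) -> Tuple[int, int]:
    """Return the lowest and highest logical CPU index set in a mask."""
    if mask <= 0:
        raise ValueError("mask must be a positive integer")
    lowest = (mask & -mask).bit_length() - 1   # isolate the lowest set bit
    highest = mask.bit_length() - 1
    return lowest, highest


node1_20t = absolute_mask(0xFFFFF, node=1)
print(hex(node1_20t), cpu_range(node1_20t))
```

Printing the CPU range before applying a mask is a cheap way to catch a mask that lands on the wrong socket.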

5.2 PowerShell: pin an existing process

If the GUI spawns the engine separately, you can “enforce” affinity after launch:

# Set processor affinity for a running process (by name). ProcessorAffinity takes an absolute
# mask within the processor group, so 20 threads on node 1 are bits 24..43.
$mask = [IntPtr]0xFFFFF000000
Get-Process -Name 'revolution*' -ErrorAction SilentlyContinue |
  ForEach-Object { $_.ProcessorAffinity = $mask }

You can wrap that in a loop to wait for the process for up to 60 seconds.


6) Engine configuration choices that interact with Windows

  • Threads:
    • Start with physical cores per socket (e.g., 12), then try physical+SMT (24).
    • Avoid arbitrary counts (e.g., 20/24) unless you understand the masks you’re applying.
  • Ponder:
    • Disable (Ponder off) in formal tests to avoid cross-instance interference and fair scheduling issues.
  • Hash:
    • Size TT in GiB sensibly (4–16 GB per instance is common on large-RAM hosts). Large Pages magnify the benefit but also the risk of starving the OS if oversized.
  • Tablebases (Syzygy):
    • Place on a fast SSD. Consider mapping TB folders to a RAM cache only if you can spare the memory and you can demonstrate a throughput gain for your time control.
  • Books/openings:
    • Keep in a local SSD folder excluded from AV scanning; avoid network shares for deterministic timings.
  • Limit Strength / Skill:
    • Ensure these are disabled for rating tests (common gotcha when copying GUIs or .ini files between hosts).

7) Measurement: the PerfMon methodology that avoids self-deception

Optimisation without measurement is guesswork. Windows Performance Monitor (PerfMon) gives reproducible counters, either in real-time or via a Data Collector Set that writes .blg logs for offline analysis.

7.1 Which counters to collect (and why)

CPU and frequency

  • \Processor(_Total)\% Processor Time – baseline utilisation.
  • \Processor Information(_Total)\% Processor Performance – indicates Turbo (>100 %).
  • \Processor Information(*)\Processor Frequency – instantaneous per-core frequency (if available).
  • \System\Processor Queue Length – runnable threads waiting (prolonged > #cores suggests saturation).

NUMA locality

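  • \NUMA Node Memory(*)\Local Node Accesses/sec
  • \NUMA Node Memory(*)\Remote Node Accesses/sec

If remote accesses climb while your engine runs, your affinity/placement is leaking across sockets. Confirm that these two counters are actually exposed on your build (typeperf -q "NUMA Node Memory" lists the set); if only capacity counters such as Available MBytes appear, measure locality with a hardware PMU tool such as Intel PCM or AMD uProf instead.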

Process-level detail

  • \Process(revolution*)\% Processor Time (or the engine you use).
  • \Process(revolution*)\Private Bytes, Working Set – check actual TT footprint.
  • \Thread(revolution*)\Context Switches/sec – very high values can reflect thread contention or timer noise.

OS background noise

  • \Processor(*)\% DPC Time, \Processor(*)\% Interrupt Time – driver/ISR load.
  • \PhysicalDisk(_Total)\Avg. Disk sec/Read – if you involve TBs; spikes may indicate contention.

7.2 Building a reusable Data Collector Set (GUI)

  1. Open perfmon.exe → Data Collector Sets → User Defined → New → Data Collector Set.
  2. Choose Create manually (Advanced) → Performance Counter.
  3. Add the counters listed above (adjust process names to your actual engine).
  4. Sample interval: 1 second for test sessions is fine.
  5. Log format: Binary (.blg), output to a folder with ample space.
  6. Start the set → run your workload → stop → open the .blg in Performance Monitor for analysis.

7.3 Scripting a Data Collector Set (logman)

PowerShell (Admin):

$set = "UCI_Engines"
$log = "C:\PerfLogs\UCI_Engines.blg"
# Remove any previous definition of the set (output is discarded if none exists)
logman delete $set *> $null

# One -c switch followed by all counter paths; -si 00:00:01 samples once per second
logman create counter $set -c `
 "\Processor(_Total)\% Processor Time" `
 "\Processor Information(_Total)\% Processor Performance" `
 "\NUMA Node Memory(*)\Local Node Accesses/sec" `
 "\NUMA Node Memory(*)\Remote Node Accesses/sec" `
 "\Process(revolution*)\% Processor Time" `
 "\Process(revolution*)\Private Bytes" `
 "\Thread(revolution*)\Context Switches/sec" `
 -si 00:00:01 -o $log -f bincirc -max 200

# Start the collector, run the workload, then stop it:
logman start $set
# logman stop UCI_Engines

(Adjust the sampling interval -si ([[hh:]mm:]ss), process names, and the -max circular log size in MB according to your runs.)

7.4 Reading results

  • If \% Processor Performance never exceeds ~100 %, revisit Turbo and BIOS power mode.
  • If Remote Node Accesses/sec are significant compared to local, your engine is crossing NUMA boundaries → tighten affinity.
  • If Context Switches/sec for engine threads are excessive and vary with GUI activity, consider headless runs for formal testing.
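These checks can be automated once the binary log has been converted to CSV, for example with `relog UCI_Engines.blg -f CSV -o UCI_Engines.csv`. The sketch below is one possible way to apply the Turbo and context-switch checks to such a file. The host name, sample values and the `summarise` helper are made up for illustration, and real exports may contain more columns.

```python
import csv
import io


def summarise(csv_text: str) -> dict:
    """Summarise Turbo and context-switch columns of a PerfMon CSV export."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def samples(suffix: str) -> list:
        # Collect numeric samples from every column whose name ends with suffix
        values = []
        for idx, name in enumerate(header):
            if name.endswith(suffix):
                for row in data:
                    try:
                        values.append(float(row[idx]))
                    except (ValueError, IndexError):
                        pass  # blank cells mark missing samples
        return values

    perf = samples(r"\% Processor Performance")
    ctx = samples(r"\Context Switches/sec")
    return {
        "turbo_seen": bool(perf) and max(perf) > 100.0,
        "peak_perf_pct": max(perf) if perf else None,
        "mean_ctx_switches": sum(ctx) / len(ctx) if ctx else None,
    }


# Hypothetical three-sample export used only to exercise the function
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\HOST\Processor Information(_Total)\% Processor Performance","\\HOST\Thread(revolution/0)\Context Switches/sec"',
    '"01/01/2025 10:00:00.000","98.5","1200"',
    '"01/01/2025 10:00:01.000","112.0","800"',
    '"01/01/2025 10:00:02.000"," ","1000"',
])
print(summarise(SAMPLE))
```

Matching on the counter suffix rather than the full path keeps the script independent of the host name that PerfMon prepends to every column.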

8) Storage and file system considerations (brief)

  • SSD for TBs and books: latencies help when the engine occasionally touches TBs; for pure mid-game search, impact is smaller.
  • TRIM / ReTrim: on many enterprise RAID controllers the OS cannot pass TRIM to logical volumes; do not force it. A periodic defrag C: /O is sufficient housekeeping for the file system metadata on SSDs.
  • NTFS cluster size: do not change unless you truly know you need it (gains are marginal and trade-offs are real).

9) Putting it together: practical recipes

9.1 Separate gauntlet and GUI by socket (two launchers)

CMD launcher #1 – Gauntlet on Node 0 (1T per engine)

@echo off
setlocal
pushd "%~dp0"
echo Gauntlet on NUMA Node 0 (CPUs 0..23), High priority...
start "Gauntlet-Node0" /high /node 0 /affinity FFFFFF cmd /c "run_fastchess_gauntlet.bat"
popd

CMD launcher #2 – ChessBase GUI on Node 1 (20T or 24T)

@echo off
set "GUI=C:\Program Files\ChessBase\ChessProgram20.exe"
if not exist "%GUI%" (
  echo Cannot find ChessProgram20.exe at: %GUI%
  pause & exit /b 1
)
:: 20 logical threads on node 1. With /node the mask is node-relative,
:: so FFFFF selects the node's first 20 CPUs (absolute 24..43):
start "ChessProgram20-Node1-20T" /high /node 1 /affinity FFFFF "%GUI%"
:: For the full node (24 logical), use: FFFFFF

If the GUI spawns the engine in a separate process, add a post-pin step:

# After GUI launch, enforce affinity on the engine for 60 s
$mask = [IntPtr]0xFFFFF000000
$deadline = (Get-Date).AddSeconds(60)
while((Get-Date) -lt $deadline){
  $p = Get-Process -Name 'revolution*' -ErrorAction SilentlyContinue
  if($p){ $p | ForEach-Object{ $_.ProcessorAffinity = $mask }; break }
  Start-Sleep -Milliseconds 500
}

9.2 Enable “Lock pages in memory” with PowerShell (no .bat)

See §4.2 for the script that modifies Privilege Rights via secedit.

9.3 Defender exclusions (repeat)

Add-MpPreference -ExclusionPath "C:\Engines","C:\Gauntlet","C:\Program Files\ChessBase"
Add-MpPreference -ExclusionProcess "revolution.exe","fastchess.exe","ChessProgram20.exe"

9.4 Verify Turbo and idle with a 5-second snapshot

Get-Counter '\Processor Information(_Total)\% Processor Performance',
            '\Processor(_Total)\% Idle Time' -SampleInterval 1 -MaxSamples 5 |
  Select-Object -Expand CounterSamples |
  Select Path, CookedValue | Format-Table -Auto

Expect % Processor Performance > 100 under sustained load.

9.5 Measure NUMA locality for your engine

# Quick view: local vs remote memory accesses per NUMA node (system-level)
Get-Counter '\NUMA Node Memory(*)\Local Node Accesses/sec',
            '\NUMA Node Memory(*)\Remote Node Accesses/sec' -SampleInterval 1 -MaxSamples 10

If remote accesses are persistently non-trivial during single-socket runs, double-check that your engine is not crossing nodes.

9.6 Inspect the engine process token for SeLockMemoryPrivilege (optional)

# Returns "Present, Enabled=True" once the engine has enabled LP in-process
# (Run with the engine already thinking.)

$src = @"
using System;
using System.Text;
using System.Runtime.InteropServices;
public static class PrivCheck {
  [StructLayout(LayoutKind.Sequential)] public struct LUID { public uint LowPart; public int HighPart; }
  [StructLayout(LayoutKind.Sequential)] public struct LUID_AND_ATTRIBUTES { public LUID Luid; public UInt32 Attributes; }
  const int TokenPrivileges=3, SE_PRIVILEGE_ENABLED=0x2, PROCESS_QUERY_LIMITED_INFORMATION=0x1000, PROCESS_QUERY_INFORMATION=0x0400, TOKEN_QUERY=0x0008;
  [DllImport("kernel32.dll", SetLastError=true)] static extern IntPtr OpenProcess(int access, bool inherit, int pid);
  [DllImport("kernel32.dll", SetLastError=true)] static extern bool CloseHandle(IntPtr h);
  [DllImport("advapi32.dll", SetLastError=true)] static extern bool OpenProcessToken(IntPtr ph, int access, out IntPtr th);
  [DllImport("advapi32.dll", SetLastError=true)] static extern bool GetTokenInformation(IntPtr th, int cls, IntPtr buf, int len, out int ret);
  [DllImport("advapi32.dll", SetLastError=true, CharSet=CharSet.Unicode)]
  static extern bool LookupPrivilegeName(string sys, ref LUID luid, StringBuilder name, ref int cch);
  public static string Check(int pid, string targetPriv) {
    IntPtr ph = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, false, pid);
    if (ph == IntPtr.Zero) ph = OpenProcess(PROCESS_QUERY_INFORMATION, false, pid);
    if (ph == IntPtr.Zero) return "OpenProcess failed";
    try {
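      // Declared separately so the code also compiles with the C# 5 compiler used by Windows PowerShell 5.1
      IntPtr th;
      if (!OpenProcessToken(ph, TOKEN_QUERY, out th)) return "OpenProcessToken failed";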
      try {
        int len=0; GetTokenInformation(th, TokenPrivileges, IntPtr.Zero, 0, out len);
        IntPtr buf = System.Runtime.InteropServices.Marshal.AllocHGlobal(len);
        try {
          if (!GetTokenInformation(th, TokenPrivileges, buf, len, out len)) return "GetTokenInformation failed";
          uint count = (uint)System.Runtime.InteropServices.Marshal.ReadInt32(buf);
          IntPtr ptr = new IntPtr(buf.ToInt64() + 4);
          for (uint i=0;i<count;i++){
            LUID_AND_ATTRIBUTES la = (LUID_AND_ATTRIBUTES)System.Runtime.InteropServices.Marshal.PtrToStructure(ptr, typeof(LUID_AND_ATTRIBUTES));
            int nlen=256; var name = new StringBuilder(nlen); var luid=la.Luid;
            if (LookupPrivilegeName(null, ref luid, name, ref nlen)){
              if (string.Equals(name.ToString(), targetPriv, StringComparison.OrdinalIgnoreCase)){
                bool enabled = (la.Attributes & SE_PRIVILEGE_ENABLED) != 0;
                return "Privilege " + name + ": Present, Enabled=" + enabled;
              }
            }
            ptr = new IntPtr(ptr.ToInt64() + System.Runtime.InteropServices.Marshal.SizeOf(typeof(LUID_AND_ATTRIBUTES)));
          }
          return "Privilege " + targetPriv + ": NOT present in process token";
        } finally { System.Runtime.InteropServices.Marshal.FreeHGlobal(buf); }
      } finally { CloseHandle(th); }
    } finally { CloseHandle(ph); }
  }
}
"@
Add-Type -TypeDefinition $src
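function Get-ProcessSeLockMemoryPrivilege([string]$ProcessOrPid){
  # $PID is a read-only automatic variable, so a different name is used;
  # if/else instead of the ternary operator keeps this working on Windows PowerShell 5.1
  if ($ProcessOrPid -match '^\d+$') { $targetPid = [int]$ProcessOrPid }
  else { $targetPid = (Get-Process -Name $ProcessOrPid -ErrorAction Stop | Select-Object -First 1).Id }
  [PrivCheck]::Check($targetPid, "SeLockMemoryPrivilege")
}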
# Example:
Get-ProcessSeLockMemoryPrivilege revolution

10) Troubleshooting patterns and decision checklist

  1. CPU stays around 50 % of total
    • Expected when you use ~half the machine (e.g., 20–24 threads on a 48-logical system). To utilise more, either:
      • Increase threads on the GUI engine (e.g., from 20→24 on a 24-logical socket), or
      • Increase gauntlet concurrency on the other node (match physical cores).
  2. % Processor Performance never >100 %
    • Turbo may be limited. Check BIOS: Turbo enabled, power regulator set to OS Control or High Performance. Verify with PerfMon frequency counters.
  3. Remote Node Accesses/sec are high
    • Affinity leak or NUMA interleaving. Re-apply /node and /affinity, verify masks, and avoid cross-socket TB folders.
  4. Engine reports Large Pages: no / allocation fails
    • Ensure the user launching the engine has Lock pages in memory. Reduce Hash size. Close other LP-consuming processes. Sign out/in after privilege assignment.
  5. DPC/ISR spikes
    • Update NIC/storage drivers; disable unnecessary devices; avoid HPET forcing; use Ultimate Performance. Generally rare on workstation builds.

11) Example “golden” workflow for a dual-socket 12C/24T server

  1. Firmware: Turbo=On, OS Control=On, C1E=Off, Node Interleaving=Off, SMT=On.
  2. Windows: Ultimate Performance, no HPET forcing, indexing & Game DVR off for the test window, Defender exclusions set.
  3. Large Pages: assign privilege, sign out/in; start with 8 GB–16 GB Hash.
  4. Placement:
    • Node 0: gauntlet (1T per engine), concurrency ≤ physical cores.
    • Node 1: GUI + main engine, 12T (physical) or 24T (incl. SMT).
  5. PerfMon: start the Data Collector Set; run test; stop and analyse.
  6. Iterate: pick the best thread count & Hash size for your time control and test protocol.
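The "concurrency ≤ physical cores" rule in step 4 can be turned into a small helper. The sketch below is a heuristic under stated assumptions rather than a tournament-manager feature: with Ponder off, only the side to move searches, so one game keeps roughly one engine's threads busy; with Ponder on, both engines search at once. The name `gauntlet_concurrency` is illustrative.

```python
def gauntlet_concurrency(physical_cores: int, threads_per_engine: int = 1,
                         ponder: bool = False) -> int:
    """Suggest how many games to run in parallel on one NUMA node."""
    if physical_cores < 1 or threads_per_engine < 1:
        raise ValueError("core and thread counts must be positive")
    # Ponder doubles the load per game because both engines search simultaneously
    busy_threads_per_game = threads_per_engine * (2 if ponder else 1)
    return max(1, physical_cores // busy_threads_per_game)


print(gauntlet_concurrency(12))                        # 1T engines, Ponder off
print(gauntlet_concurrency(12, threads_per_engine=2))  # 2T engines, Ponder off
print(gauntlet_concurrency(12, ponder=True))           # 1T engines, Ponder on
```

Treat the result as a starting point and confirm it with the PerfMon counters from §7 before committing to a long run.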

12) SEO-friendly FAQ (quick answers people search for)

Q: How do I optimise Windows 10 for chess engines?
A: Use the Ultimate Performance plan, keep Turbo Boost on, pin engines to a single NUMA node, choose threads = physical cores (or physical+SMT) per socket, set Defender exclusions, and enable Large Pages carefully. Validate with PerfMon.

Q: Should I disable Hyper-Threading?
A: Usually no; measure both. Many engines benefit modestly from SMT when confined to one socket. Always compare 1× vs 2× threads per core on your node.

Q: Do Large Pages always help?
A: Often, but not universally. Gains depend on CPU, OS, allocator and TT patterns. Keep Hash reasonable (4–16 GB), ensure the privilege is set, and confirm with RAMMap and performance deltas.

Q: Why does my shell show “SeLockMemoryPrivilege: Disabled” while the engine says “Large Pages: yes”?
A: Because enablement is per process. The engine enables the privilege in its own token at allocation time. Your shell’s token state is irrelevant to the engine’s runtime state.

Q: Should I force HPET on Windows 10?
A: No for these workloads. Let Windows use TSC (default). Forcing HPET increases timer overhead on most modern systems.


13) Script index (copy-paste ready)

  • Power plan + policy + “no HPET forcing” (CMD, Admin)
    See §3.1.
  • Disable Game Bar/DVR and indexing (CMD, Admin)
    See §3.2.
  • Defender exclusions (PowerShell, Admin)
    See §3.3.
  • Enable Lock Pages (PowerShell, Admin)
    See §4.2.
  • Pin GUI and gauntlet by NUMA (CMD; and PowerShell post-pin)
    See §5.1, §5.2.
  • PerfMon snapshot for Turbo & Idle (PowerShell)
    See §9.4.
  • logman Data Collector Set (PowerShell/CMD, Admin)
    See §7.3.

14) Cautions, risks and responsible use

  • Back up: Before changing firmware or registry settings, create a restore point/system backup.
  • Privilege scope: Do not grant Lock pages in memory to “Everyone”. Assign it only to the account(s) that launch engines. Large Pages can be abused for memory-locking denial-of-service.
  • Hash sizing: Oversized TT can cause allocation failure or push the OS into memory pressure. Start modestly; scale up while observing Commit (GB), Available (GB) and the pagefile.
  • Indexing and security: Disabling Search/Indexing and excluding folders from AV reduces OS noise, but also removes protections. Re-enable or re-scope exclusions when you’re done testing.
  • HPET/TSC: Do not force HPET unless you have a very specific reason and have measured a gain (rare).
  • NUMA masks: A wrong mask can silently move threads across sockets, hurting performance. Test masks on a throwaway session first.
  • BIOS experimentation: Vendors expose advanced knobs (uncore frequencies, scrubbing, package C-states). Change one variable at a time and document your baseline to avoid chasing ghosts.
  • Domain policies (GPO): In managed environments, domain GPOs may override local Privilege Rights. Coordinate with your admin.
  • Reproducibility: Keep a runbook (engine build, options, thread counts, OS build, firmware version, masks used, counters recorded) so others can replicate your results.
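A runbook entry is easier to keep if it is generated rather than typed. The sketch below shows one possible record layout serialised to JSON. The class name `RunRecord`, its field names and the example values are illustrative assumptions, and the firmware version has to be filled in by hand because nothing here reads it from the machine.

```python
import json
import platform
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """One runbook entry describing a single test run."""
    engine_build: str
    threads: int
    hash_mib: int
    affinity_mask: str
    firmware_version: str = "unknown"
    counters: list = field(default_factory=list)
    # OS description as reported by Python's standard library at creation time
    os_build: str = field(default_factory=platform.platform)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values for illustration only
record = RunRecord(
    engine_build="example-engine 1.0",
    threads=12,
    hash_mib=8192,
    affinity_mask="0x555555",
    counters=[r"\Processor Information(_Total)\% Processor Performance"],
)
print(record.to_json())
```

Storing one JSON file per run next to the .blg log keeps the settings and the measurements together.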

Closing note

Chess engines are deterministic programs; the OS and hardware are not. The path to a faster system is measure → change one thing → measure again. Combine the NUMA placement patterns, sensible thread/Hash sizing, Large Pages where appropriate, and a repeatable PerfMon setup. The tools and scripts in this guide should get you there predictably—just apply them with care, document your changes, and always verify the results before trusting them in long-running rating tests or tournament broadcasts.

Jorge Ruiz

connoisseur of both chess and anthropology, a combination that reflects his deep intellectual curiosity and passion for understanding both the art of strategic chess books
