how it thinks

minimax search in quoridor

On each turn the computer considers every legal move it could make. For each, it imagines the opponent's best reply. For each of those, its own best response. And so on. The method has a name — minimax — and a simple rule: look deeper and play better. The trouble is that the tree of possibilities explodes.

the search tree

Each row below is one ply of lookahead. The green path is the principal variation: the sequence of moves the computer considers best, assuming the opponent also plays their strongest reply at every step. That is the move it plays.

computer's turn opponent's turn principal variation

the explosion

From the opening position, the computer has 131 legal moves: three pawn steps plus 128 places to put a wall. The opponent has roughly as many replies. Add another layer, and another, and the numbers run away from you.

depth	full tree	with pruning + ordering
1	131	~131
2	17,161	~2,000
3	2,248,091	~50,000
4	294,499,921	~500,000
5	38,579,489,651	~5,000,000

The first column is how many positions the computer would have to examine if it checked every possibility. The second is how many it actually checks. Two tricks do the work. Alpha-beta pruning skips any branch that has already been shown to be worse than an alternative. Candidate filtering ignores walls placed far from the action, where they cannot affect either player's path. Together they reduce the work by four orders of magnitude at depth 4.

how long that takes

Measured on this device, starting from the opening position:

depth 1

~30 ms

depth 2

~250 ms

depth 3

~1.4 s

depth 4

~14 s

log scale

Every extra ply costs five to ten times as much time as the one before (the chart is on a log scale to keep the shorter bars visible; the actual growth is exponential). The computer has a two-second budget per move, which caps it at depth 4. In the cluttered middlegame it usually manages three; in simpler endgames it gets to four.

why it plays well

At the bottom of the search tree, every position needs a score. The formula is short:

score = (opponent's distance − my distance) × 10 + (my walls − opponent's walls) × 2

Two ingredients: am I closer to my goal than my opponent, and do I have more walls in hand. Nothing else. The computer doesn't know about openings or traps or good squares. It just counts path length and wall inventory, and reaches far enough forward in the search to let tactics — jumps, forks, forced sequences — emerge on their own.

There is one positional shortcut worth mentioning. The computer never considers placing a wall far from every pawn. Walls in the corner, with nothing to block, are nearly always pointless, and examining them slows the search. Filtering them out leaves 20 to 40 candidates instead of 128. It is a safe simplification — but it does mean the computer can miss an unusual long-range wall play.

A strong human plays roughly the same way: count path lengths, imagine a few replies, pick the move that leaves you best off. The computer just does it more exhaustively.

open source at github.com/TomMHarris/quoridor