← back to game

how it thinks

minimax search in quoridor

On each turn the computer considers every legal move it could make. For each, it imagines the opponent's best reply. For each of those, its own best response. And so on. The method has a name — minimax — and a simple rule: look deeper and play better. The trouble is that the tree of possibilities explodes.

the search tree

Each row below is one ply of lookahead. The green path is the principal variation: the sequence of moves the computer considers best, assuming the opponent also plays their strongest reply at every step. That is the move it plays.

computer's turn opponent's turn principal variation

the explosion

From the opening position, the computer has 131 legal moves: three pawn steps plus 128 places to put a wall. The opponent has roughly as many replies. Add another layer, and another, and the numbers run away from you.

depth full tree with pruning + ordering
1 131 ~131
2 17,161 ~2,000
3 2,248,091 ~50,000
4 294,499,921 ~500,000
5 38,579,489,651 ~5,000,000

The first column is how many positions the computer would have to examine if it checked every possibility. The second is how many it actually checks. Two tricks do the work. Alpha-beta pruning skips any branch that has already been shown to be worse than an alternative. Candidate filtering ignores walls placed far from the action, where they cannot affect either player's path. Together they reduce the work by roughly three orders of magnitude at depth 4 — and four by depth 5.

how long that takes

Measured on this device, from a cluttered middlegame position:

depth 1
~0.5 ms
depth 2
~5 ms
depth 3
~30 ms
depth 4
~0.4 s

log scale

Each extra ply costs several times as much as the one before (the chart is on a log scale to keep the shorter bars visible; the real growth is exponential). The hardest setting searches to depth four — well under a second here, comfortably inside its budget — and the optional move hint goes one ply deeper still, to depth five (about 1.7 seconds).

why it plays well

At the bottom of the search tree, every position needs a score. The formula is short:

score = (opponent's distance − my distance) × 10
+ (my walls − opponent's walls) × 2
+ (5 if it is my turn, otherwise −5)

Three ingredients: am I closer to my goal than my opponent, do I have more walls in hand, and is it my turn to move. The last is worth half a step — whoever is on the move can shorten their path next — and it keeps the computer from treating a position as equal when really one side is a tempo ahead. Nothing else. The computer doesn't know about openings or traps or good squares. It just counts path length, wall inventory and whose turn it is, and reaches far enough forward in the search to let tactics — jumps, forks, forced sequences — emerge on their own. (Reaching a goal is scored separately, as a decisive ±10000 that swamps everything above, so a win is always preferred to merely looking good.)

There is one positional shortcut worth mentioning. The computer never considers placing a wall far from every pawn. Walls in the corner, with nothing to block, are nearly always pointless, and examining them slows the search. Filtering them out leaves 20 to 40 candidates instead of 128. It is a safe simplification — but it does mean the computer can miss an unusual long-range wall play.

The three difficulty settings change only how far ahead it looks. Medium and hard share the formula above and search two plies and four respectively. Easy looks just one move ahead: it scores its own immediate moves and plays greedily, grabbing a wall when one clearly slows you and otherwise simply racing — plus a last-ditch wall when it has fallen behind and you are nearly home. So easy reacts; medium and hard plan.

A note on the constants. The ×10 and ×2 are hand-chosen, not learned — but the ×2 isn't claiming a wall in hand is "really" worth a fifth of a step. A wall's value shows up when it is placed: the opponent's path lengthens and the first term jumps. The small second term is just a spending cost — keep a reserve, and don't burn a wall unless it buys more than a fifth of a step, which any useful wall easily does. (This project also contains a separate, self-trained neural network — an AlphaZero-style policy-and-value net that learned its own evaluation from self-play — but the browser keeps the hand-tuned formula, because it runs instantly with nothing to download and no server to call.)

A strong human plays roughly the same way: count path lengths, imagine a few replies, pick the move that leaves you best off. The computer just does it more exhaustively.

open source at github.com/TomMHarris/quoridor