Every Chess Engine Makes the Same Wrong Assumption. Maxwell Doesn't.

There is one assumption so deeply embedded in the architecture of every chess engine that it's almost never stated outright. It's the foundation of Stockfish, Leela Chess Zero, Komodo, Berserk, and every other engine that has ever defeated a grandmaster or crushed you online. This assumption dates back to Claude Shannon's seminal 1950 paper on computer chess. And it is fundamentally wrong in a way that changes everything.
The assumption: your opponent plays perfectly.
More precisely, at every node in the search tree, the engine assumes the opponent will choose the move that minimizes your score—as if they possessed unlimited time, a perfect evaluation function, and no psychological quirks. This is the core of minimax. The engine doesn't model you. It models a ghost—an idealized, omniscient version of you that never blunders, never falls for a pawn fork, and never greedily snatches a poisoned pawn.
That ghost does not exist. Yet every top engine plays as if it does.
The Practical Cost of a Perfect-Opponent Model
When you assume a perfect adversary, many moves become "suboptimal" that are, in fact, highly effective against a real human. Traps, swindles, pseudo-sacrifices—these are lines a perfect opponent would refuse, but a human might accept. Minimax assigns them a low score and discards them. Consequently, the engine walks right past the sequence that would beat you, searching only for the line that beats the ghost.
This isn't a minor oversight. The entire lexicon of practical chess—zugzwang traps, poisoned pawns, Tal's speculative sacrifices—describes positions where the "objectively best" play diverges from the "best play against this opponent." Grandmasters constantly adjust to their opponent's tendencies. Their engines do not.
The central question of the Maxwell project was: What happens if the engine thinks about the opponent, too?
FEAS: One Parameter That Changes Everything
The first breakthrough was replacing the hard minimax backup with a softened, probabilistic alternative. Instead of assuming the opponent always plays the single best move, we model them as choosing from a probability distribution over moves, weighted by move quality. Formally, this is the quantal response model from bounded-rationality game theory: the probability of a move is proportional to exp(β · value), where β is an inverse-temperature parameter.
- β → ∞ recovers minimax: the opponent always plays the best move.
- β → 0 recovers expectimax: the opponent plays randomly.
- A finite β models a fallible but structured opponent.
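To make the role of β concrete, here is a minimal Python sketch of the quantal response distribution described above. The function name `quantal_response` and the sample values are illustrative assumptions, not taken from Maxwell's code.

```python
import math

def quantal_response(values, beta):
    """Return P(move) proportional to exp(beta * value).

    `values` are move values from the opponent's own point of view
    (higher is better for the opponent).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(values)
    weights = [math.exp(beta * (v - peak)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

values = [1.0, 0.5, -2.0]  # hypothetical move values, in pawns

uniform = quantal_response(values, beta=0.0)   # random play
soft = quantal_response(values, beta=1.0)      # fallible but structured
sharp = quantal_response(values, beta=50.0)    # close to always-best

for name, dist in [("beta=0", uniform), ("beta=1", soft), ("beta=50", sharp)]:
    print(name, [round(p, 3) for p in dist])
```

With β = 0 every move gets the same probability, and as β grows nearly all of the mass moves onto the best move, which matches the two limits in the list above.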
We call this operator family Free-Energy Adversarial Search (FEAS). Its key property is Pareto-safety: at β → ∞, you recover exact minimax, so FEAS can never be worse than minimax against a perfect opponent. All the risk is on the upside.
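The text does not spell out the exact FEAS backup formula, so the sketch below assumes the simplest reading: the backed-up score is our expected score when the opponent's reply is drawn from the quantal response distribution. The name `soft_backup` and the sample scores are hypothetical. The sketch illustrates the limiting behaviour claimed above: a large β reproduces the hard minimum, and β = 0 gives the plain average.

```python
import math

def soft_backup(our_scores, beta):
    """Expected score for us after the opponent replies.

    Assumes a zero-sum game, so the opponent's value for a reply is the
    negative of our score and P(reply) is proportional to
    exp(-beta * our_score).
    """
    low = min(our_scores)
    # Shift by the minimum so the largest weight is exactly 1.0.
    weights = [math.exp(-beta * (s - low)) for s in our_scores]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, our_scores)) / total

scores = [0.3, 1.2, 2.5]  # hypothetical scores (in pawns) after each reply

hard = min(scores)                              # classic minimax backup
near_minimax = soft_backup(scores, beta=200.0)  # approaches the minimum
average = soft_backup(scores, beta=0.0)         # uniform-random opponent
softened = soft_backup(scores, beta=1.0)        # somewhere in between

print(hard, near_minimax, average, softened)
```

Under this reading the softened value is a weighted average of the reply scores, so it can never fall below the minimax value for the same node.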
The engine estimates β online, from the opponent's actual moves during the game—observing what they play and updating a running maximum-likelihood estimate of their rationality. If you play perfectly, the engine converges to minimax. If you blunder, it starts exploiting.
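One simple way to realise such an estimate, offered as a sketch rather than as Maxwell's actual estimator, is a grid search over candidate β values that maximises the log-likelihood of the moves seen so far. The grid, the function names, and the toy observations are all assumptions made for illustration.

```python
import math

def log_likelihood(beta, observations):
    """Log-probability of the observed choices under P(move) ~ exp(beta * value).

    Each observation is (values, chosen): the values of all legal moves from
    the opponent's point of view, and the index of the move actually played.
    """
    total = 0.0
    for values, chosen in observations:
        peak = max(values)
        # Numerically stable log of the normalising constant.
        log_z = beta * peak + math.log(
            sum(math.exp(beta * (v - peak)) for v in values)
        )
        total += beta * values[chosen] - log_z
    return total

def estimate_beta(observations, grid):
    """Return the grid value of beta that best explains the observations."""
    return max(grid, key=lambda b: log_likelihood(b, observations))

GRID = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]  # hypothetical candidates
position = [1.0, 0.0, -1.0]                         # hypothetical move values

# An opponent who always finds the best move ...
careful = [(position, 0)] * 10
# ... and one whose choices look unrelated to move quality.
sloppy = [(position, 0), (position, 1), (position, 2)] * 3

beta_careful = estimate_beta(careful, GRID)
beta_sloppy = estimate_beta(sloppy, GRID)
print(beta_careful, beta_sloppy)
```

Re-running `estimate_beta` after each observed move gives a running estimate. For the always-best opponent the estimate is pushed to the top of the grid, which is the minimax end of the family.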
This is intriguing, but not the sharpest result.
The Dangerous Lever: Structured Move-Type Bias
While varying β with position complexity yielded a measurable per-decision edge, it faced a stubborn limitation: exploitable states were rare and couldn't be manufactured. The engine could recognize when a position was tactically complex and your decision quality might drop—but it couldn't force you into that complexity.
The breakthrough came from asking a different question. Instead of asking where you err (complexity), we asked what kind of move you err on.
Real players—especially below master level—exhibit systematic move-type biases. The most universal one: they overvalue captures. When a piece can be taken, humans feel a powerful urge to take it, even when the resulting position is worse. This isn't random noise; it's a structural bias. Given a choice between a capture and a quiet move of equal objective value, a human will preferentially capture.
A biased opponent doesn't merely have a lower β. They possess a distorted perception of move values. Their perceived value of a reply r is:
perceived(r) = true(r) + greed · captured_value(r)
Their true strength might be high—they won't blunder mate in one—but they systematically overestimate moves that win material.
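The perceived-value formula above can be plugged straight into the quantal response model. The following sketch does that under stated assumptions: values are in pawns from the opponent's point of view, and the function names and the two-move example are hypothetical.

```python
import math

def perceived_values(true_values, captured_values, greed):
    """perceived(r) = true(r) + greed * captured_value(r), per reply r."""
    return [t + greed * c for t, c in zip(true_values, captured_values)]

def reply_distribution(true_values, captured_values, beta, greed):
    """Quantal response over the opponent's *perceived* values."""
    perceived = perceived_values(true_values, captured_values, greed)
    peak = max(perceived)
    weights = [math.exp(beta * (p - peak)) for p in perceived]
    total = sum(weights)
    return [w / total for w in weights]

# Two replies: a sound quiet move, and a capture of a pawn that loses a pawn net.
true_values = [0.5, -1.0]
captured_values = [0.0, 1.0]

unbiased = reply_distribution(true_values, captured_values, beta=2.0, greed=0.0)
greedy = reply_distribution(true_values, captured_values, beta=2.0, greed=2.0)

print("P(capture), greed=0:", round(unbiased[1], 3))
print("P(capture), greed=2:", round(greedy[1], 3))
```

The same β is used in both calls. Only the greed term changes, and it is enough to flip the capture from the unlikely reply to the likely one.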
And here is what makes this dangerous: a biased perception is steerable. You can offer a capture.
Poisoned Baits: Engineering Exploitable Positions
If you know an opponent overvalues captures, you can place a piece en prise that is actually a trap. The piece looks free. Their biased model says it's a free piece. They take it. They lose.
This isn't a new idea in chess: it's the essence of the poisoned pawn and of every trap built around a tempting capture. What is new is building an engine that:
- Estimates the greed parameter online from observed opponent moves.
- Modifies its search to account for the distorted opponent model.
- Actively selects baits—moves that are suboptimal against a rational opponent but become optimal once the bias is factored in.
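A one-ply sketch of the third step might look like the following. It is an illustration under assumptions, not Maxwell's search: each candidate move lists the opponent's replies as hypothetical `(our_score, captured_value)` pairs, the game is treated as zero-sum, and β and greed are taken as already known.

```python
import math

def expected_score(replies, beta, greed):
    """Our true expected score when the opponent picks replies by perceived value.

    `replies` is a list of (our_score, captured_value) pairs. The opponent's
    true value for a reply is -our_score; greed inflates captures.
    """
    perceived = [-score + greed * captured for score, captured in replies]
    peak = max(perceived)
    weights = [math.exp(beta * (p - peak)) for p in perceived]
    total = sum(weights)
    return sum(w * score for w, (score, _) in zip(weights, replies)) / total

def choose_move(moves, beta, greed):
    """Pick the move with the best true expected score against this opponent."""
    return max(moves, key=lambda name: expected_score(moves[name], beta, greed))

def minimax_choice(moves):
    """Classic worst-case choice, for comparison."""
    return max(moves, key=lambda name: min(score for score, _ in moves[name]))

moves = {
    # A sound move: every reply leaves us slightly better.
    "solid": [(0.3, 0.0), (0.5, 0.0)],
    # A bait: the correct reply equalises, but grabbing the pawn loses.
    "bait": [(0.0, 0.0), (3.0, 1.0)],
}

print(minimax_choice(moves))                    # worst-case reasoning
print(choose_move(moves, beta=2.0, greed=0.0))  # fallible but unbiased opponent
print(choose_move(moves, beta=2.0, greed=4.0))  # opponent who overvalues captures
```

In this toy position both minimax and the unbiased model prefer the solid move. The bait is selected only once the greed term is large enough to make the capture attractive to the opponent.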
We demonstrated this mechanism in a controlled experiment across 5,000 real chess positions at depth 3. Three decision algorithms competed: minimax (β → ∞), value-QR (knows β but not the bias), and bait (knows β and greed). Each was scored by true expected value, in centipawns per decision, against a greedy opponent:
| Greed | Minimax | Value-QR | Bait | Bait − Value-QR |
|---|---|---|---|---|
| 0.0 | 215.9 cp | 223.3 cp | 223.3 cp | +0.0 cp (z=0.0) |
| 0.5 | 207.4 cp | 210.1 cp | 213.0 cp | +2.9 cp (z=17.9) |
| 1.0 | 209.8 cp | 209.2 cp | 217.6 cp | +8.4 cp (z=21.9) |
| 2.0 | 232.5 cp | 228.5 cp | 248.6 cp | +20.1 cp (z=27.1) |
| 4.0 | 257.8 cp | 252.1 cp | 287.3 cp | +35.2 cp (z=29.9) |
Two findings stand out:
The edge grows steadily with the bias. At greed=0 (no bias), the bait engine offers zero advantage over generic exploitation, which is Pareto-safety by construction. At greed=4 (an opponent whose perceived value of a capture is inflated by four times the captured material), the bait engine gains +35.2 centipawns per decision over generic exploitation and +29.5 cp over minimax. The z-scores at every nonzero greed level (17.9 to 29.9) indicate these are not statistical flukes.
This is fundamentally different from complexity exploitation. Complexity-based β adjustment yields a real but small and unsteerable edge. Move-type bias is everywhere: any position with a tempting capture is an opportunity, and the engine can create those opportunities by engineering positions where a piece is en prise. The engine doesn't wait for an exploitable position to arise; it manufactures one.
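The per-row edges can be recomputed directly from the table. The short script below only restates the published numbers; the variable names are ours.

```python
# (greed, minimax, value_qr, bait), all in centipawns, copied from the table.
ROWS = [
    (0.0, 215.9, 223.3, 223.3),
    (0.5, 207.4, 210.1, 213.0),
    (1.0, 209.8, 209.2, 217.6),
    (2.0, 232.5, 228.5, 248.6),
    (4.0, 257.8, 252.1, 287.3),
]

bait_vs_value_qr = [round(bait - qr, 1) for _, _, qr, bait in ROWS]
bait_vs_minimax = [round(bait - mm, 1) for _, mm, _, bait in ROWS]
value_qr_vs_minimax = [round(qr - mm, 1) for _, mm, qr, _ in ROWS]

for (greed, *_), a, b, c in zip(
    ROWS, bait_vs_value_qr, bait_vs_minimax, value_qr_vs_minimax
):
    print(f"greed={greed}: bait-QR={a:+.1f}  bait-minimax={b:+.1f}  QR-minimax={c:+.1f}")
```

Read this way, the table also shows that value-QR falls slightly behind plain minimax once greed reaches 1.0 or more, which is consistent with the argument that knowing β without knowing the bias is not enough.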
What Maxwell Does That No Other Engine Does
Concretely: Maxwell plays a different game against you than it plays against Stockfish.
Against Stockfish (β → ∞, greed → 0), it collapses to pure minimax. There's no point offering bait to an opponent that won't take it. The engine recognizes this and stops.
Against a human who reliably grabs material, it starts engineering positions where material looks free. It's not that Maxwell plays worse against Stockfish—it's that it adds an entire dimension against humans. Minimax is merely a special case.
The unifying principle, which we call the exploitation thesis, is: exploit the conditional structure of your opponent's errors, not just their general weakness. Generic exploitation (knowing they're imperfect) yields a few centipawns. Structured exploitation (knowing how they're imperfect) delivers an edge that kept growing with the bias across the range we tested, from +2.9 cp at greed=0.5 to +35.2 cp at greed=4.
This is the difference between a tactician who thinks, "This opponent blunders sometimes," and one who has watched twenty games and knows, "They always take with the queen."
Honest Caveats and Future Directions
We would be remiss not to note the gap between per-decision advantage and full-game wins.
The per-decision result (+35 cp, z=29.9) is extremely clean. However, translating it to game wins is harder. At depth 5 with an oracle greed estimate, the exploit engine wins more games in a sweet spot (greed 0.5–1.0, approximately +18% win rate), but the effect shrinks with larger sample sizes. The online learner—estimating greed from in-game moves—currently shows only marginal results.
The reason is data: greed estimation requires sufficient observations, and a single 40-move game doesn't provide many captures to observe. The realistic deployment is cross-game modeling—accumulating a fingerprint of a specific opponent across many games, which a Lichess bot can do. That is the obvious next step.
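As a rough sketch of what cross-game accumulation could look like, the code below pools observations per opponent and fits greed by grid search on the pooled likelihood. Everything here is an assumption made for illustration: β is treated as known, the opponent identifiers and the observation format are hypothetical, and no Lichess API is involved.

```python
import math
from collections import defaultdict

def greed_log_likelihood(greed, beta, observations):
    """Log-probability of observed replies under the perceived-value model.

    Each observation is (true_values, captured_values, chosen_index), with
    values from the opponent's point of view.
    """
    total = 0.0
    for true_values, captured_values, chosen in observations:
        perceived = [t + greed * c for t, c in zip(true_values, captured_values)]
        peak = max(perceived)
        log_z = beta * peak + math.log(
            sum(math.exp(beta * (p - peak)) for p in perceived)
        )
        total += beta * perceived[chosen] - log_z
    return total

def estimate_greed(observations, beta, grid):
    """Return the grid value of greed that best explains the observations."""
    return max(grid, key=lambda g: greed_log_likelihood(g, beta, observations))

history = defaultdict(list)  # opponent id -> observations pooled over games

def record_game(opponent, observations):
    history[opponent].extend(observations)

GRID = [0.0, 0.5, 1.0, 2.0, 4.0]
choice = ([0.5, -1.0], [0.0, 1.0])  # quiet move vs. losing pawn grab

record_game("grabber", [(*choice, 1)] * 3)                    # game 1
record_game("grabber", [(*choice, 1)] * 2 + [(*choice, 0)])   # game 2
record_game("careful", [(*choice, 0)] * 4)

grabber_greed = estimate_greed(history["grabber"], beta=2.0, grid=GRID)
careful_greed = estimate_greed(history["careful"], beta=2.0, grid=GRID)
print(grabber_greed, careful_greed)
```

Each game contributes only a handful of capture decisions, but the pooled history keeps growing, which is the point of building the fingerprint across games rather than within one.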
A deeper open question remains: greed bias is just one dimension of a full opponent fingerprint. Real opponents have multiple structured biases—overvaluing checks, undervaluing retreats, mishandling pawn structures. The production form of this idea is a learned bias(move-features) vector, estimated online, with per-bias gates. That is a richer, more powerful system. The proof of mechanism shows the direction is correct.
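One possible shape for that generalisation, sketched under our own assumptions, is to replace the single greed term with a weighted sum over move features, where each bias can be switched on or off by a gate. The feature names, weights, and gate settings below are hypothetical and are not learned from any data.

```python
def perceived_value(true_value, features, weights, gates):
    """perceived = true + sum of weight * feature over biases whose gate is on."""
    bias = sum(
        weights.get(name, 0.0) * amount
        for name, amount in features.items()
        if gates.get(name, False)
    )
    return true_value + bias

# Hypothetical fingerprint for one opponent.
weights = {"captured_value": 2.0, "gives_check": 0.3, "is_retreat": -0.4}
# A gate stays off until there is enough evidence to trust that bias.
gates = {"captured_value": True, "gives_check": True, "is_retreat": False}

capture_with_check = {"captured_value": 1.0, "gives_check": 1.0, "is_retreat": 0.0}
quiet_retreat = {"captured_value": 0.0, "gives_check": 0.0, "is_retreat": 1.0}

print(perceived_value(-1.0, capture_with_check, weights, gates))
print(perceived_value(0.5, quiet_retreat, weights, gates))

# With only the capture gate on, this reduces to the single-parameter greed model.
greed_only = {"captured_value": True}
print(perceived_value(-1.0, capture_with_check, weights, greed_only))
```

Keeping the gates separate from the weights means an uncertain bias can be ignored entirely, so the model falls back to the unbiased values when nothing has been learned yet.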
Where Maxwell Fits in the Literature
To be precise about novelty:
Maia (McIlroy-Young et al., 2020) built a neural network that predicts human moves—it plays human-like, not strong. That's a different goal: human-likeness, not exploitation.
KL-regularized search (piKL) (Jacob et al., 2022) regularizes toward a human policy prior. Again, the goal is modeling humans to play like them, not to beat them by modeling their mistakes.
Maxwell does something neither of these does: it estimates an opponent's bias online, in real time, and uses that estimate to select moves that are objectively suboptimal against a rational opponent but superior against that particular opponent. The engine adapts to you, not to humanity-in-aggregate.
No deployed engine—Stockfish, Leela, Dragon—does this. They model a single, rational, universal ghost opponent. Maxwell models you.
The Swindler Engine: A New Chess Archetype
There is a chess archetype that computers have never embodied: the swindler. Mikhail Tal, Alexei Shirov, Magnus Carlsen in practical play—players who don't just find the objectively best moves, but find the moves that are hardest for this particular opponent to handle. They read their opponent and set traps tailored to what they know.
That is what we are building. An engine that, when it detects you are greedy, starts leaving pieces hanging. When it senses you panic under time pressure, it creates complications. An engine that is, against a perfect opponent, a rigorous mathematician—and against you, a swindler.
Minimax is not wrong. It is right in the worst case. But the worst case is not the case we are usually in.