Chess Like Poker: Why One Bad Game Doesn't Matter
26 May 2026 · Jordan Morris
The hardest thing about playing poker well is ignoring the result of any one hand. A great player can shove all-in with aces, get rivered by an opponent who should have folded, and lose the pot anyway. The professional response is “yes, that’s poker.” Variance is the cost of doing business. What pays the rent over a thousand hands is the quality of every decision.
Chess players don’t usually think this way. We tilt. We replay a loss in our head for days. A blunder in round three echoes through round four, and the rating drop feels permanent. Poker players tilt too, by the way. The difference is that the great ones have learned not to: separating result from decision is the skill the entire game is built on.
The rating drop isn’t permanent. Most of it isn’t even very long-lasting. And we can prove it with real data.
The claim
The Glicko-2 rating system that lichess uses shares a self-correcting property with its relatives, including the Glicko variant chess.com uses and the Elo system FIDE uses: every game you play after a loss is biased toward recovery. Your rating is lower than your true skill, so the expected score the system assigns you against your usual opponents is lower than what you will actually score, which means a win nets more points than usual, and a loss costs fewer.
If you keep playing the same decisions you always do, the gap between your actual rating and “where you would have been” shrinks every single game. The decision quality is what matters in the long run; one bad game just adds noise that the system actively damps out.
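A minimal sketch of that pull toward recovery, assuming a plain Elo-style update with a fixed step size rather than lichess's actual Glicko-2. The names `expected_score` and `update`, the step size `K = 12.0`, and the ratings are all illustrative and not taken from the experiment code. It compares the same player at 2000 and at a dented 1988, facing a 2000-rated opponent:

```python
def expected_score(rating: float, opponent: float) -> float:
    """Elo-style logistic expected score (a simplification of Glicko)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


K = 12.0  # hypothetical fixed per-game step size


def update(rating: float, opponent: float, result: float) -> float:
    """Rating after one game; result is 1.0 (win), 0.5 (draw) or 0.0 (loss)."""
    return rating + K * (result - expected_score(rating, opponent))


undented = 2000.0  # where the player "would have been"
dented = 1988.0    # the same player, 12 points down after one bad game
opponent = 2000.0

win_gain_true = update(undented, opponent, 1.0) - undented
win_gain_dented = update(dented, opponent, 1.0) - dented
loss_cost_true = undented - update(undented, opponent, 0.0)
loss_cost_dented = dented - update(dented, opponent, 0.0)

# Gap between the two trajectories after the next game, for either result.
gap_after_win = update(undented, opponent, 1.0) - update(dented, opponent, 1.0)
gap_after_loss = update(undented, opponent, 0.0) - update(dented, opponent, 0.0)

print(f"win:  +{win_gain_true:.2f} undented vs +{win_gain_dented:.2f} dented")
print(f"loss: -{loss_cost_true:.2f} undented vs -{loss_cost_dented:.2f} dented")
print(f"gap after next game: {gap_after_win:.2f} (win) / {gap_after_loss:.2f} (loss)")
```

Under these assumptions the 12-point gap shrinks by the same small amount whichever way the next game goes, because the dented rating gains a little more for a win and gives up a little less for a loss.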
To put numbers on it, we ran an experiment on a real lichess player.
The setup
We picked MassterofMayhem, who has played over forty-seven thousand rated blitz games on lichess: plenty of volume to make the statistics meaningful. We pulled every one of those games via the lichess API. For each game lichess tells us his pre-game rating, the opponent’s rating, the result, and the exact rating change lichess applied. That last field is the key.
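As a rough illustration of what one such record yields, here is a sketch that reduces a single exported game to the four fields listed above. It assumes the JSON shape of the lichess game export as we understand it (`players.<color>.rating`, `players.<color>.ratingDiff`, and a top-level `winner` that is absent for draws). The helper name `parse_game` and the sample values are hypothetical; this is not the experiment's own loader.

```python
def parse_game(record: dict, username: str) -> dict:
    """Reduce one exported game to the fields the experiment needs."""
    players = record["players"]
    # Find which colour our player had in this game.
    me = next(
        color
        for color in ("white", "black")
        if players[color].get("user", {}).get("name", "").lower() == username.lower()
    )
    them = "black" if me == "white" else "white"

    winner = record.get("winner")  # assumed to be missing when the game is drawn
    if winner is None:
        result = 0.5
    else:
        result = 1.0 if winner == me else 0.0

    return {
        "pre_rating": players[me]["rating"],
        "opponent_rating": players[them]["rating"],
        "result": result,
        "rating_change": players[me].get("ratingDiff", 0),
    }


# Made-up record for illustration only; not a real game.
sample = {
    "rated": True,
    "speed": "blitz",
    "players": {
        "white": {"user": {"name": "MassterofMayhem"}, "rating": 2100, "ratingDiff": -9},
        "black": {"user": {"name": "some_opponent"}, "rating": 1890, "ratingDiff": 9},
    },
    "winner": "black",
}

row = parse_game(sample, "MassterofMayhem")
print(row)
```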
Rather than try to replicate lichess’s full Glicko-2 implementation (the small details quietly compound), we treat lichess’s reported rating changes as ground truth and back out, for each game, the effective “K factor” lichess used:
K = (lichess’s actual rating change) ÷ (result − expected score)
Then for the counterfactual we apply that same K to the flipped result, with the expected score recomputed from the counterfactual rating, which differs slightly from the actual one. The actual trajectory matches lichess exactly; the counterfactual evolves at lichess’s effective per-game magnitudes. The experiment code has the full method, including a vanilla Glicko-2 reconstruction we tried first and discarded after it overstated rating swings by a factor of about three.
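The two steps above can be sketched for a single game as follows. The post does not say which expected-score formula is used, so this assumes the Elo-style logistic curve; the function names, the zero-surprise guard, and the example numbers are also assumptions made for illustration.

```python
from typing import Optional


def expected_score(rating: float, opponent: float) -> float:
    """Assumed Elo-style logistic expected score."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def effective_k(rating_change: float, result: float,
                rating: float, opponent: float) -> Optional[float]:
    """Back out K = rating change / (result - expected score)."""
    surprise = result - expected_score(rating, opponent)
    if abs(surprise) < 1e-9:
        return None  # guard added here: K is undefined when nothing surprising happened
    return rating_change / surprise


def counterfactual_change(k: float, cf_result: float,
                          cf_rating: float, opponent: float) -> float:
    """Apply the same K to the counterfactual result and rating."""
    return k * (cf_result - expected_score(cf_rating, opponent))


# Illustrative numbers: a 2000-rated player loses 9 points to an 1800.
k = effective_k(rating_change=-9.0, result=0.0, rating=2000.0, opponent=1800.0)
cf_gain = counterfactual_change(k, cf_result=1.0, cf_rating=2000.0, opponent=1800.0)
initial_gap = cf_gain - (-9.0)

print(f"effective K = {k:.2f}, counterfactual gain = {cf_gain:+.2f}, gap = {initial_gap:.2f}")
```

One consequence of this construction: when a decisive result is flipped and both trajectories start from the same rating, the initial gap equals the effective K for that game.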
The experiment itself: pick one of his losses. Replay history with the result flipped, so he wins it instead. Replay every game he played afterwards against the same opponents in the same order. Measure the gap between the two trajectories game by game.
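The replay can be sketched as one loop over the games from the flipped one onward. This is not the code in the repository: the `Game` record, `gap_curve`, and the synthetic history generator are hypothetical, and the expected score is again assumed to be the Elo-style logistic curve. The synthetic history stands in for real data so that the block runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Game:
    pre_rating: float  # actual rating before the game
    opponent: float    # opponent's rating
    result: float      # 1.0 win, 0.5 draw, 0.0 loss
    change: float      # rating change actually applied


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def gap_curve(games: list, flip_index: int) -> list:
    """Gap (counterfactual minus actual) after each game from flip_index on."""
    cf_rating = games[flip_index].pre_rating
    gaps = []
    for i, g in enumerate(games[flip_index:]):
        surprise = g.result - expected_score(g.pre_rating, g.opponent)
        k = g.change / surprise if abs(surprise) > 1e-9 else 0.0
        # Only the first game has its result flipped; the rest are replayed as played.
        result = 1.0 - g.result if i == 0 else g.result
        cf_rating += k * (result - expected_score(cf_rating, g.opponent))
        gaps.append(cf_rating - (g.pre_rating + g.change))
    return gaps


def synthetic_history(n: int, start: float = 2000.0, k: float = 11.0) -> list:
    """Toy history: alternating losses and wins against nearby opponents."""
    games, rating = [], start
    for i in range(n):
        opponent = 2000.0 + 50.0 * ((i % 5) - 2)
        result = 0.0 if i % 2 == 0 else 1.0
        change = k * (result - expected_score(rating, opponent))
        games.append(Game(rating, opponent, result, change))
        rating += change
    return games


games = synthetic_history(400)
gaps = gap_curve(games, flip_index=0)  # game 0 is a loss in the toy history
print(f"initial gap {gaps[0]:.2f}, after 100 games {gaps[100]:.2f}, at the end {gaps[-1]:.3f}")
```

In this toy setup the gap starts at the step size and shrinks with every game replayed, which is the behaviour the real trajectories are being measured for.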
Experiment A: one specific loss
We picked a clear-cut upset: a loss where he was rated more than two hundred points above his opponent. A game he really “should have won.”
In the moment, this loss cost him about 11.4 rating points compared to the parallel universe where he won. That’s the initial gap. Here’s what happens to it over the next several thousand games:
The curve drops fast at first, then keeps shrinking until it hits zero. By the time he had played another 165 games, the gap had fallen below one rating point. After 211 games, the gap was below half a point, meaning his displayed rating (rounded to the nearest integer) was identical to the counterfactual. Zero visible trace of the loss.
For a player active in blitz, a few hundred games is days, not months.
Experiment B: but is that game cherry-picked?
A single example proves nothing. Maybe we happened to pick the one loss where the opponents that came afterwards were friendly. So we ran the same experiment 200 times.
We took 200 random losses from his established blitz period, flipped each one to a win, simulated the rest of his career under that change, and recorded the gap-over-time curve. The typical pattern:
Every single curve collapses toward zero. None of them stay perturbed. The exact recovery time varies depending on what came next: sometimes he had a string of wins that absorbed the change quickly, sometimes a string of losses dragged it out, but the destination is always the same.
The numbers behind that fan chart:
| Games since the flipped loss | Median gap (rating points) | 25–75% range |
|---|---|---|
| 0 (immediately after) | +11.3 | +10.8 to +12.0 |
| 1 | +11.1 | +10.6 to +11.8 |
| 5 | +10.4 | +10.0 to +11.1 |
| 10 | +9.7 | +9.2 to +10.2 |
| 25 | +7.7 | +7.3 to +8.1 |
| 50 | +5.2 | +4.9 to +5.6 |
| 100 | +2.4 | +2.3 to +2.6 |
| 250 | +0.2 | +0.2 to +0.3 |
| 500 | +0.0 | +0.0 to +0.0 |
| 1000 | +0.0 | +0.0 to +0.0 |
| 2500 | +0.0 | +0.0 to +0.0 |
| 5000 | +0.0 | +0.0 to +0.0 |
The median initial gap from a flipped game is about 11.3 rating points. By 250 games later it’s typically already well below one (a median of +0.2). The median time to converge below 0.5 rating points, to be effectively invisible, is 203 games.
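Summary figures of this kind can be computed from a set of gap curves along the following lines. The curves here are toy exponential decays with made-up rates, not the experiment's data, and `first_below` and `summarize` are hypothetical helper names; only `statistics.median` and `statistics.quantiles` come from the Python standard library.

```python
import statistics


def first_below(curve: list, threshold: float = 0.5):
    """Number of games until the gap first drops below the threshold."""
    for n, gap in enumerate(curve):
        if abs(gap) < threshold:
            return n
    return None  # never converged within the recorded games


def summarize(curves: list, checkpoints: list) -> list:
    """(games since flip, median gap, 25th percentile, 75th percentile) rows."""
    rows = []
    for n in checkpoints:
        gaps = [c[n] for c in curves if n < len(c)]
        q1, med, q3 = statistics.quantiles(gaps, n=4)
        rows.append((n, med, q1, q3))
    return rows


# Toy curves: an 11-point gap decaying at slightly different per-game rates.
rates = [0.013, 0.014, 0.0155, 0.017, 0.018]
curves = [[11.0 * (1.0 - r) ** n for n in range(400)] for r in rates]

rows = summarize(curves, checkpoints=[0, 10, 50, 100, 250])
times = [first_below(c) for c in curves]
median_time = statistics.median(times)

for n, med, q1, q3 in rows:
    print(f"{n:>4} games: median {med:+.1f} ({q1:+.1f} to {q3:+.1f})")
print(f"median games to fall below 0.5: {median_time}")
```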
What this means at the board
The Glicko system does the math because we don’t. That’s the whole point of having a rating system. But it’s worth internalising what the math says, because it changes how to think about a result.
You don’t owe your rating anything. A loss you “shouldn’t” have had isn’t a debt you have to grind back to even. The rating system will grind it back for you, automatically, as long as you keep playing the same way you always do.
The decision is what compounds, not the result. If you played that losing game with sound decisions (calculated the lines you should have, managed your time, didn’t blunder), then the loss is just variance. You made a +EV bet and got an unlucky outcome. Keep making +EV bets.
Tilt is the actual cost, not the loss. If a round-three blunder makes you play round four scared, or aggressive, or distracted, now you’re paying a real price, because round four’s decision quality genuinely dropped. The rating system can’t recover from worse decisions the way it can from unlucky outcomes.
This is the poker bit. Phil Galfond once said that the difference between a good player and a great one isn’t how they play their best hands; it’s how they handle the worst beats. The math in this post is just the chess version of the same idea: the long run rewards consistency, and “consistency” means consistency of decisions, not consistency of outcomes.
So next Tuesday, when you drop a half-point you weren’t expecting to drop, the math says: shrug, write down what you learned, and play your next move the same way you’d have played it if you were +30 instead of −30.
The rating, eventually, catches up.
Methodology, simulator code, and raw data are at
scripts/poker_analogy/.
Caveats: opponent rating deviations are estimated from the lichess
“provisional” flag rather than reconstructed exactly; the counterfactual
assumes he would have faced the same opponents in the same order, which
is a reasonable approximation for a single-game perturbation but not for
larger ones. None of this applies to over-the-board tournament chess, where a
loss can knock you out of contention, which is a separate kind of cost
that no rating system can refund.