One very enjoyable part of writing “Game Changer” was that Natasha and I got the chance to interview Demis Hassabis, CEO of DeepMind. One question we asked Demis was whether it was hard to analyse results (such as those from AlphaZero) because the systems are so complicated. Demis’ answer:
“It’s very complicated, but certainly no more complicated than the brain. Probably substantially less complicated because these systems are still a lot smaller in terms of the number of neurons and connections. We also have full access and control over every moment-by-moment thing that the machine is doing, which we don’t even have with brain imaging. So my argument is, our understanding should be at least as good as with the brain, and I would argue that we should be in a better position than we are with the brain, because we have all these extra controls over what the system is doing”.
I thought of this just after witnessing 3600-rated Leela Zero make an uncharacteristic blunder against Stockfish in a slightly worse position in game 49 of the TCEC Season 20 SuperFinal.
Stockfish NNUE – Leela Zero [E87]
TCEC Season 20 – Superfinal https://tcec-chess.com (49.1), 23.01.2021
The first in a series of missteps by Leela in a slightly worse position. Despite White’s pressure on the h-file, Black is defending fairly comfortably. However, with 78…Kh8 Leela places its king on a square that will become tactically vulnerable in an unexpected way. Stockfish preferred 78…Rf4, keeping Black’s disadvantage to a minimum.
Stockfish’s evaluation rose ominously to nearly a pawn’s advantage after this move. Leela, however, was unaware of the gathering storm clouds and played the next two moves very quickly.
79…Bb5 80.Bc2 Bd7
By chasing White’s light-squared bishop away from the f1–a6 diagonal, Black prepares the strong manoeuvre …Nb5–d4 without allowing the exchange of the black knight. However, the light-squared bishop’s presence on the b1–h7 diagonal gives White a fine attacking idea!
Threatening Rxh7+, winning the black queen. We see why the black king is poorly placed on h8!
82…f4 83.Qg2 Rff7
Desperation! 83…Rfg8 was Leela’s intention from afar, exploiting the king’s shuffle to h8 by pinning the g6-pawn to the white queen. It is a devilishly clever defence, but it fails to a spectacular idea! 84.Qg5
Threatening Qxe7 followed by Rxh7+ mate! 84…Qxg5 85.Rxh7+ Rxh7 86.Rxh7#
84.Rh6 Bb5 85.gxf7 Qxf7
86.Qd2 Qd7 87.Bd3 Bxd3 88.Qxd3
and Stockfish finished off the game some 35 moves later.
88…Nb5 89.Qc4 Nc7 90.Qc3 Kg8 91.Qe1 Nb5 92.Qa5 Nc7 93.Qb6 Ne8 94.Qb8 Re7 95.Ka2 Kf8 96.Rg1 Rf7 97.Re6 Re7 98.Rxe7 Kxe7 99.Rg8 Qd8 100.Qxd8+ Kxd8 101.b3 axb3+ 102.Kxb3 h5 103.Rh8 Kd7 104.a4 Nc7 105.Rxh5 Kc8 106.Rh8+ Kd7 107.a5 Na6 108.Kc4 Nc7 109.Rb8 Na6 110.Rb6 Nb4 111.a6 Nxa6 112.Rxa6 Ke7 113.Kb5 c4 114.Kxc4 Kf6 115.Rxd6+ Ke7 116.Re6+ Kf7 117.Kc5 Kg7 118.d6 Kf7 119.Kd5 Kg7 120.d7 Kf8 121.d8Q+ Kg7 122.Qd7+ Kf8 123.Re8# 1–0
What had Leela been thinking? How had such a blunder occurred? Is it possible to peer into Leela’s brain and see what happened?
The answer is… partly! I’d like to introduce a great tool to you called “Nibbler”. This tool can be downloaded from https://github.com/fohristiwhirl/nibbler and offers a deeper look into Leela’s “thoughts” than is otherwise possible with ChessBase or the Fritz GUI.
After downloading Nibbler and pointing it at a Leela installation (which I refer to as “my Leela” in the text to differentiate it from “TCEC Leela” playing the SuperFinal!), you are ready to go! I’ll just point out some of the interesting details that you can see with Nibbler and relate those to the human experience of thinking.
This is Nibbler analysing the critical position where Leela started to go astray with 78…Kh8. I’ll highlight a few elements of the GUI:
Shows the main Black move and “my Leela’s” evaluation (from the point of view of the side it is analysing). 45 means a 45% expected score for Black; in other words, Black is slightly worse.
Shows the expected score of two Black moves: 78…Kh8 (45.1%) and 78…Nb5 (28.4%). So “my Leela” thinks 78…Kh8 is much better than 78…Nb5.
Seems technical and mysterious but is quite simple.
Nodes: 64.2M is the total number of nodes (positions, each reached by one half-move from the position before it) that Leela has examined while analysing.
N/s: 11.3K is the number of nodes per second that “my Leela” is analysing. That may sound like a lot (well, humans manage about 2 or 3!) and my hardware is quite decent, but Leela reaches speeds of 100K–200K nodes per second at the TCEC, so you can see how amazing the TCEC hardware is!
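Because these two figures are linked by simple division, a quick sketch makes the scale concrete. It assumes the search speed stays constant, which is only roughly true, and the 150K N/s figure for the TCEC hardware is an assumed midpoint of the 100K–200K range above, not a measured value.

```python
def search_time(nodes: float, nodes_per_second: float) -> float:
    """Seconds needed to search `nodes` at a constant speed (a simplifying assumption)."""
    return nodes / nodes_per_second


def as_hours_minutes(seconds: float) -> str:
    """Format a duration in seconds as 'Xh YYm', dropping any leftover seconds."""
    total_minutes = int(seconds // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


# "My Leela": 64.2M nodes at 11.3K N/s
print(as_hours_minutes(search_time(64.2e6, 11.3e3)))  # 1h 34m

# The same node count at an assumed TCEC-like speed of 150K N/s
print(f"{search_time(64.2e6, 150e3) / 60:.1f} minutes")  # 7.1 minutes
```

At 11.3K N/s the 64.2M nodes correspond to about an hour and a half of searching, which fits the 1hr 34 mins test run mentioned further on.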
These are interesting figures but perhaps not very useful in themselves. Let’s look, however, at the figures given at the end of each variation.
This is the summary at the end of the line 78…Kh8.
N:98.93% [63.5M] shows that of the 64.2M nodes it looked at, 63.5M were related to the move 78…Kh8! In other words, it hardly looked at anything else! WDL is the win, draw and loss breakdown of “my Leela’s” expected score for 78…Kh8 (45.1%). So out of 10 games, “my Leela” would expect to win about 2, draw about 5 and lose about 3.
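The link between the WDL split and the expected score is simply that a draw counts as half a point. Here is a minimal sketch using the rounded “2 wins, 5 draws, 3 losses out of 10” split from above, rather than Nibbler’s exact figures.

```python
def expected_score(win: float, draw: float, loss: float) -> float:
    """Expected score from a win/draw/loss split, counting a draw as half a point."""
    if abs(win + draw + loss - 1.0) > 1e-6:
        raise ValueError("win, draw and loss probabilities must add up to 1")
    return win + draw / 2


# Rounded split for 78...Kh8 from Black's side: about 2 wins, 5 draws, 3 losses in 10
print(f"{expected_score(0.2, 0.5, 0.3):.0%}")  # 45%
```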
You may notice as well that the moves do not seem to be displayed in any specific order:
The percentages seem completely random! And yet this is the order in which “my Leela” has ranked its moves. On what basis you may ask? On how much it has looked at each move:
It may be a little hard to see, but the first move has had 98.9% of Leela’s time, the second move has had 0.32%, the third move 0.15% etc… As far as I understand it, Leela always plays the move it has looked at the most. That means that if it has looked at a bad move a LOT, it will take a HUGE effort to change its mind!
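To make the visit-count rule concrete, here is a minimal sketch. It assumes, as described above, that Leela simply plays its most-visited move. The 78…Kh8 numbers come from the Nibbler output. The 64,200 visits for 78…Rf4 are 0.1% of 64.2M (the share quoted below), and its 0.460 expected score is purely hypothetical. I chose it to show that a better score does not help a move that has hardly been searched.

```python
from dataclasses import dataclass


@dataclass
class RootMove:
    move: str
    visits: int      # nodes searched below this move
    expected: float  # current expected score for the side to move (0 to 1)


def rank_moves(moves: list[RootMove]) -> list[RootMove]:
    """Order moves by visits, most-visited first; the expected score plays no part."""
    return sorted(moves, key=lambda m: m.visits, reverse=True)


def visits_to_overtake(moves: list[RootMove], challenger: str) -> int:
    """Extra visits a challenger needs to strictly exceed the current leader's count."""
    leader = rank_moves(moves)[0]
    current = next(m for m in moves if m.move == challenger)
    if current is leader:
        return 0
    return leader.visits - current.visits + 1


root = [
    RootMove("78...Kh8", 63_500_000, 0.451),
    RootMove("78...Rf4", 64_200, 0.460),  # hypothetical expected score
]
print(rank_moves(root)[0].move)              # 78...Kh8
print(visits_to_overtake(root, "78...Rf4"))  # 63435801
```

Suppose every one of the next 63.4M nodes went to 78…Rf4. At 11.3K N/s that would still take around an hour and a half, which is the “HUGE effort” in numbers.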
Bearing these parameters in mind, let’s go through what happened in the game, using the statistics from the TCEC and some extra data I generated on my own hardware. For the purposes of this investigation, we’ll treat the two as interchangeable. They aren’t a perfect match, since my hardware (“my Leela”) is weaker and the version of Leela at the TCEC (“TCEC Leela”) is bleeding edge, but they aren’t miles apart either.
78…Kh8 was a big mistake according to Stockfish, which recommended 78…Rf4 instead. Its evaluation jumped from +0.52 to +0.92 in White’s favour. Stockfish displays its evaluation in the old-fashioned “pawns up” way, so that is 52 to 92 centipawns: nearly a pawn up.
However, as we have seen from my Nibbler output, “my Leela” spends almost all its effort on that move; in my tests Stockfish’s recommendation 78…Rf4 was given just 0.1% of the search. The longer the search ran (I ran my test for 1hr 34 mins!), the more of it “my Leela” devoted to 78…Kh8.
If we translate that to human terms, Leela was a victim of tunnel vision, focusing on one move to the exclusion of anything else. It’s one of the things that Russian coaches such as Mark Dvoretsky and Artur Yusupov have warned against, encouraging their students to draw up a list of candidate moves before starting to calculate!
79.Bd3 was unexpected for Leela: it was way down in 14th on my Leela’s list of expected moves! It rejected it after a cursory examination due to 79…Nb5. Once 79.Bd3 was played on the board, Leela thought a little bit further.
Its initial thoughts were all focused on 79…Nb5, and “my Leela” was happy! However, after a little while longer…
Leave “my Leela” long enough (16.2M nodes searched) and its evaluation plummets. It sees that 79…Nb5 loses to the 80.Nf5 sacrifice, and its new top move (79…Bb5) gets an expected score of just 27.9% (roughly 1 win, 3 draws and 6 losses out of 10). A big fall!
However, if we look at “TCEC Leela’s” evaluation at this point…
…it gives a WDL breakdown of 11.3% White win, 88.7% Draw and 0% Black win (0.16 is the conversion of this expected score to centipawns which the TCEC does to be able to compare easily with Stockfish). That’s just a slight disadvantage. Why is that?
Well, “TCEC Leela” moved 79…Bb5 very quickly, taking just 34 seconds for its move.
In that short time, it searched 2.3M nodes (about 14M fewer than the 16.2M of my 24-minute effort), and this was presumably not enough for it to see all the dangers in the position!
In human terms, Leela reacted impulsively to an unexpected move, flashing out a reply when it would have been well-advised to think a little longer!
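The TCEC figures above are given per side, so putting them on the same expected-score scale as Nibbler’s only needs two steps: count a draw as half a point, and swap wins and losses to change point of view. Here is a short sketch. The TCEC’s own conversion of this score to 0.16 is not reproduced here.

```python
def flip(wdl: tuple[float, float, float]) -> tuple[float, float, float]:
    """Swap the point of view: one side's wins are the other side's losses."""
    win, draw, loss = wdl
    return (loss, draw, win)


def expected_score(wdl: tuple[float, float, float]) -> float:
    """Win probability plus half the draw probability."""
    win, draw, _ = wdl
    return win + draw / 2


white_view = (0.113, 0.887, 0.0)  # TCEC Leela after 79...Bb5, from White's side
black_view = flip(white_view)
print(f"White {expected_score(white_view):.2%}, Black {expected_score(black_view):.2%}")
# White 55.65%, Black 44.35%
```

Black’s 44.35% is far more optimistic than the 27.9% that “my Leela” arrived at after 16.2M nodes.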
“TCEC Leela” played like a steam train again (10 seconds) searching 1.8M nodes… and blundered! Its main line is not impressive!
In fact, “my Leela” only found 81.Nf5 after examining many more nodes (8.4M):
It would probably have taken “TCEC Leela” just about 50 seconds to a minute to discover it… but 10 seconds was too little.
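The “50 seconds to a minute” estimate comes from dividing the nodes needed by the speed “TCEC Leela” showed on its previous move. A sketch of that arithmetic follows. It assumes the speed stays constant and that “my Leela’s” 8.4M nodes carry over to “TCEC Leela”, as agreed earlier.

```python
def nodes_per_second(nodes: float, seconds: float) -> float:
    """Average search speed over one move."""
    return nodes / seconds


def seconds_to_reach(target_nodes: float, speed: float) -> float:
    """Time to search target_nodes at a constant speed (a simplifying assumption)."""
    return target_nodes / speed


tcec_speed = nodes_per_second(1.8e6, 10)  # 80...Bd7: 1.8M nodes in 10 seconds
print(f"{tcec_speed:,.0f} N/s")            # 180,000 N/s
print(f"{seconds_to_reach(8.4e6, tcec_speed):.0f} s to reach 8.4M nodes")  # 47 s to reach 8.4M nodes
```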
In human terms, Leela was rushing (perhaps seeking to keep time in reserve for the ensuing struggle) and missed the critical moment of the game, which was effectively decided by 81.Nf5!
We can see that Leela suffered from tunnel-vision, concentrating on 78…Kh8 to the exclusion of anything else – and then rushed its next two moves, searching too few nodes to enable it to spot the tactic. However, did it miss 81.Nf5 completely? That’s not really something that engines do; unlike humans, they see everything! What did Leela see against 81.Nf5?
This is what “my Leela” saw after analysing a measly 16K nodes, and that was enough for it to reject 81.Nf5 and put it on the back burner for good!
The first three moves of its line, 81.Nf5 gxf5 82.g6 f4 83.Qg2 Rfg8, were good, but it missed White’s lovely idea 84.Qg5!!, overloading the black queen.
In human terms, Leela saw a dangerous attacking line for White, quickly spotted a plausible refutation, and its sense of danger let it down as the tactical alarm bells failed to ring!
In general, my experience is that if strong neural nets like Leela Zero are going to miss tactics, then queen sacrifices for no material (like 84.Qg5) are extremely strong candidates! Neural nets work on probabilities and the likelihood that giving away your queen for nothing is a good move is generally low! It’s quite different for traditional engines that can be explicitly programmed to search deeper when certain conditions apply (for example, a king under attack such as in this case). Neural nets must discover all these complexities and exceptions through their own self-play!
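Leela’s search is built around an AlphaZero-style PUCT rule, in which the policy network’s probability (“prior”) for a move scales how much exploration that move gets. The toy sketch below is only one level deep. Its priors, values, c_puct = 1.5 and unvisited-move value of 0 are all made up, not Leela’s actual settings. Even so, it shows how a move the network considers unlikely can be starved of visits when it is in fact the best move.

```python
import math


def puct_search(priors: dict[str, float], values: dict[str, float],
                budget: int, c_puct: float = 1.5, fpu: float = 0.0) -> dict[str, int]:
    """Toy one-level PUCT search that returns how many visits each move receives.

    priors: the policy network's probability for each move
    values: the expected score a visit to that move returns (fixed here for simplicity)
    fpu:    the value assumed for a move that has not been visited yet
    """
    visits = {move: 0 for move in priors}

    def score(move: str, parent_visits: int) -> float:
        q = values[move] if visits[move] else fpu
        u = c_puct * priors[move] * math.sqrt(parent_visits) / (1 + visits[move])
        return q + u

    for _ in range(budget):
        parent_visits = 1 + sum(visits.values())
        best = max(priors, key=lambda move: score(move, parent_visits))
        visits[best] += 1
    return visits


values = {"quiet": 0.45, "queen_sac": 0.90, "other": 0.30}       # hypothetical
low_prior = {"quiet": 0.95, "queen_sac": 0.002, "other": 0.048}  # "nobody gives away the queen"
high_prior = {"quiet": 0.65, "queen_sac": 0.30, "other": 0.05}

print(puct_search(low_prior, values, 1000))   # the sacrifice is never visited
print(puct_search(high_prior, values, 1000))  # the sacrifice collects most visits
```

With a prior of 0.002 the sacrifice never gets a single visit in 1000 playouts. Raise its prior to 0.30 and the same value makes it the most-visited move. A traditional engine’s hand-written search extensions are a different way of forcing a look at such moves.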
I hope you enjoyed this little dive into the Leela brain!