Friday, November 14, 2014

Chess Analytics: Analyzing Championship Chess for Strategy

Introduction

I recently became very interested in learning how to play chess.  I knew the rules and could play casually, but I wanted to learn how to really play.  Which tactics should I use?  Where should I put my pieces?  What strategies should I employ?  I started playing a lot on a mobile app and also got a book on chess to learn more about the game and how I should strategize my play.

In the midst of this, I began to think that there was probably data available on chess games and the moves good players typically make.  And I figured that, if I could analyze the data, I could learn from other people's games and apply those lessons to how I go about playing chess.

What I Did

I looked around online for a chess database.  At chess.com I found a file of world chess championship games from 1886-2013.  In it were 932 championship games in PGN format.  The file looked like this:


For each game there was a header recording details about the game, followed by the series of moves each player took with the final result.  I needed to convert this data into a tabular structure to do my analysis, so I created an R script to pull in the data, restructure the data for each game, separate out each move in a game, and then analyze each move on its own.  My final data sets looked like this:
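The R script itself is not reproduced here.  As a rough illustration of the restructuring step, a minimal Python sketch might look like the following.  Everything in it is an assumption made for illustration: the function names (parse_game, parse_san), the column names, and the sample PGN text, which is a made-up fragment and not a game from the dataset.  It handles plain movetext only (no nested variations), and it records a castling move as a King move to its destination square.

```python
import re

# Illustrative PGN text: a made-up fragment, NOT a game from the dataset.
SAMPLE_PGN = """[Event "Example"]
[White "Player A"]
[Black "Player B"]
[Result "1/2-1/2"]

1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. Bg5 Be7 5. e3 O-O 1/2-1/2
"""

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}
HEADER_RE = re.compile(r'\[(\w+)\s+"([^"]*)"\]')
# Piece letter (absent for Pawns), optional disambiguation, optional capture,
# destination square, optional promotion, optional check/mate marker.
SAN_RE = re.compile(
    r"^(?P<piece>[KQRBN])?[a-h]?[1-8]?(?P<capture>x)?"
    r"(?P<to>[a-h][1-8])(?:=(?P<promo>[QRBN]))?(?P<check>[+#])?$"
)


def parse_san(token, side, ply):
    """Turn one move in algebraic notation into a flat record (one table row)."""
    san = token.rstrip("!?")          # drop annotation marks such as "!" or "?"
    core = san.rstrip("+#")
    row = {"ply": ply, "side": side, "san": san,
           "check": san.endswith(("+", "#"))}
    if core in ("O-O", "O-O-O"):      # castling: record the King's destination
        rank = "1" if side == "white" else "8"
        row.update(piece="K", to=("g" if core == "O-O" else "c") + rank,
                   capture=False, promotion=None)
        return row
    m = SAN_RE.match(san)
    if m is None:
        raise ValueError(f"unrecognized move: {token!r}")
    row.update(piece=m.group("piece") or "P", to=m.group("to"),
               capture=m.group("capture") is not None,
               promotion=m.group("promo"))
    return row


def parse_game(pgn_text):
    """Split one game into its header dict and a list of per-move records."""
    headers = dict(HEADER_RE.findall(pgn_text))
    movetext = " ".join(line for line in pgn_text.splitlines()
                        if not line.startswith("["))
    movetext = re.sub(r"\{[^}]*\}", " ", movetext)   # drop {comments}
    movetext = re.sub(r"\d+\.+", " ", movetext)      # drop move numbers
    tokens = [t for t in movetext.split() if t not in RESULTS]
    moves = [parse_san(t, "white" if i % 2 == 0 else "black", i + 1)
             for i, t in enumerate(tokens)]
    return headers, moves


headers, moves = parse_game(SAMPLE_PGN)
for row in moves:
    print(row["ply"], row["side"], row["piece"], row["to"])
```

Each record is one row of a move-level table, which is the shape a pivot table or any other aggregation needs.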



I created pivot tables off of these data sets, which populated an 8x8 set of cells to mimic a chess board.  Using conditional formatting I created a heat map of where various activities happened on the chess board.  The results are visually displayed below.
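The pivot-table step amounts to counting how often each square appears and laying the counts out as an 8x8 grid.  A minimal Python sketch of that idea follows, assuming the destination squares have already been extracted as strings like "d4".  The function names and the short list of squares are hypothetical and only there to show the layout.

```python
from collections import Counter

FILES = "abcdefgh"


def board_counts(squares):
    """Count destination squares and arrange them as an 8x8 grid.

    Row 0 is rank 8 (black's side, drawn at the top) and row 7 is rank 1,
    matching the orientation of the boards in this post.
    """
    counts = Counter(squares)
    return [[counts.get(f"{f}{rank}", 0) for f in FILES]
            for rank in range(8, 0, -1)]


def render(grid):
    """Return the grid as text with rank labels on the left and files below."""
    lines = [f"{rank} " + " ".join(f"{n:3d}" for n in row)
             for rank, row in zip(range(8, 0, -1), grid)]
    lines.append("  " + " ".join(f"{f:>3}" for f in FILES))
    return "\n".join(lines)


# Hypothetical destination squares, for illustration only.
white_destinations = ["d4", "c4", "c3", "g5", "e3", "f3", "d4"]
grid = board_counts(white_destinations)
print(render(grid))
```

Coloring each cell by its count (conditional formatting in a spreadsheet, or any plotting library) turns this grid into the heat map.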

Understanding the Results

The images below can be understood as follows.  The black side of the board is at the top, while the white side is at the bottom.  The files (i.e., columns) are marked in green as a-h, while the ranks (i.e., rows) are marked in blue as 1-8.  The black horizontal strip shows where the black pieces would line up on the 8th rank, while the white strip shows where the white pieces would line up on the 1st rank.  The pieces are R for Rook, N for Knight, B for Bishop, Q for Queen, and K for King.  The black Pawns all start on the 7th rank and the white Pawns on the 2nd rank; these are not visually indicated below.

On the board itself (the light-to-dark blue 8x8 grid), lighter squares saw few instances of play, while the darkest squares saw the most.  The number in the middle of each square is the total number of instances of play on that square, for the activity under discussion, across the 932 games in the dataset.

The white moves are represented in the image on the left, while the black moves are represented in the image on the right.

Let's move to the results...

Results: Where Should You Move?

The right move at any given moment depends on the specific situation your pieces are in, but the visualizations below offer some insight into where and how chess masters generally try to position and use their pieces.  Trusting in their expertise, we should try to do the same in our own play if we want to be successful.






















In the course of an entire game, white concentrates pieces in the center.  The two squares that stand out are f3 and d4, though c3 and c4 are also common destinations.  Black also concentrates pieces in the center.  The important squares seem to be f6 and d5, followed by c6 and e7.  It seems that it is important to control the middle of the board by having one's pieces located there.

Breakdown By Piece

Bishops:





















White clearly wants Bishops on d3 and e3 to protect the center and guard the diagonals.  Notice that a Bishop on g5 sits on the diagonal leading to the black Queen's home square (d8), and a Bishop on b5 sits on the diagonal leading to the black King's home square (e8), so each puts pressure on that piece.  Black mimics this movement to a lesser extent, playing a more defensive role by positioning Bishops on d7 and e7.  Notice that both white and black try to keep the Bishops side by side so that their diagonals combine to create a wide path of attack.  Bishops move around most in the middle diamond of the board.

Knights:





















Both white and black move the Knights out towards the middle of the board, where they can attack and support pieces in the center, and also protect each other from Pawn attacks. These pieces move around most in the middle of the board.


Rooks:


















 



Rooks appear to be most effective in the back row.  White places one Rook on f1 (as a result of castling), and then both move back and forth to the left of the castled King, putting pressure on pieces across the board or providing support to pieces in the front lines.  Black does the same by castling the Rook to f8, and then having both pieces move along its back rank.  Rooks are also active across the board and all along the d-file.


Queen:




Queens look similar to Bishops: they move to the center and then along the diagonals.  From the end of a diagonal (a4 for white, a5 for black), the Queen puts pressure on the opposing King's home square (e8 or e1).  After an initial defensive post in front of her own King, the Queen moves around the middle of the board and attacks the opponent's Queen square.

King:


 





















The King is most commonly castled on the kingside, and then stays pretty close to home, moving towards the center of the board occasionally (to support other pieces, avoid check, etc.).

Pawns:




Pawns cannot move backwards (hence the #REF! where there are no PivotTable values).  White most often advances a Pawn to d4, with c4 and e4 also common.  These Pawns are trying to control the middle of the board.  Black does the same by moving Pawns to d5, e6 (to support d5), c5, and e5.  Black also often captures the white Pawn on c4 with the Pawn on d5.  Pawn moves become less frequent farther up the board, and very few Pawns reach the back rank for promotion.

Other Results

Here are some other results I found.

Activity:

Which pieces are the most active?  White's Pawns move the most (unsurprising since there are so many), comprising 25% of moves.  Pawns are followed by the Rooks (20%), Knights (17%), Bishops (16%), Queen (11%), and then the King (11%).  Black's pieces move with roughly the same distribution.
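A breakdown like this can be computed directly from the move notation, because every non-Pawn move starts with the piece letter.  Here is a minimal Python sketch of the calculation.  The function names and the list of moves are hypothetical, and counting castling as a King move is an assumption of the sketch, not necessarily how the percentages above were produced.

```python
from collections import Counter


def moving_piece(san):
    """Return the letter of the piece that moves, with "P" for Pawns."""
    if san.startswith("O-O"):        # castling, counted here as a King move
        return "K"
    return san[0] if san[0] in "KQRBN" else "P"


def activity_shares(san_moves):
    """Percentage of moves made by each piece type, most active first."""
    counts = Counter(moving_piece(s) for s in san_moves)
    total = sum(counts.values())
    return {piece: round(100 * n / total, 1)
            for piece, n in counts.most_common()}


# Hypothetical white moves, for illustration only.
white_moves = ["d4", "c4", "Nc3", "Bg5", "e3", "Nf3", "Bd3", "O-O", "Rc1", "Qe2"]
shares = activity_shares(white_moves)
print(shares)
```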

Exchanges:























White captures pieces most often on c5, d5, d4, and c4.  Black most often captures pieces on d5, c4, d4, and c5.  So this set of squares on the middle-left of the board is where most of the fighting is occurring.  While white's other pieces follow this pattern, white's Rooks capture most often on the d file, with most captures happening on d8 (where black's Queen would be).  Black, on the other hand, often has his Bishops capture on c3 and f3, which would put white's King and Queen in danger.  Black's Rooks also capture often on the d file, particularly on d8 and d1.

Pawns are responsible for 29-31% of captures, followed by Rooks at 18-19%, Bishops at 18-19%, Knights at 16-17%, the Queen at 13%, and the King at 4%.

Checks:

The piece that moves is usually the piece giving check, but not always: a move can also uncover a check from a piece that did not move.  The game notation marks only that a move gave check, not which piece delivered it, and without replaying the whole game it can be hard to tell where the check is coming from.  So in what follows I will speak of a piece moving to "create a check."  For our purposes, it is likely safe to assume that the piece creating the check is usually also the piece giving it.
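To make the "create a check" convention concrete, here is a minimal Python sketch that tallies checking moves by the piece that moved and the square it moved to, using only the "+" (check) or "#" (mate) suffix in the notation.  The names and sample moves are hypothetical.  As described above, a discovered check is credited to the moving piece, and the sketch ignores the rare case of castling with check.

```python
import re
from collections import Counter

# Matches only moves that end in "+" or "#".
# Group 1: piece letter (empty for a Pawn).  Group 2: destination square.
CHECK_RE = re.compile(
    r"^([KQRBN]?)[a-h]?[1-8]?x?([a-h][1-8])(?:=[QRBN])?[+#]$"
)


def checks_by_piece_and_square(san_moves):
    """Count checking moves, keyed by (moving piece, destination square)."""
    tally = Counter()
    for san in san_moves:
        m = CHECK_RE.match(san)
        if m:
            tally[(m.group(1) or "P", m.group(2))] += 1
    return tally


# Hypothetical moves, for illustration only.
sample_moves = ["Bb5+", "Nf3", "Qd8+", "Rxd8+", "e4", "Nf6+", "Bb5+"]
tally = checks_by_piece_and_square(sample_moves)
for (piece, square), n in tally.most_common():
    print(piece, square, n)
```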























White most often creates a check as a result of a move to d8, e6, and f6.  Black similarly creates a check in a move to c3, d2, and d1.  These checks result from pieces positioned close to the King.  White's Bishop is an exception to this, most often creating a check from b5 to black's King on e8.  The Knight creates a check from f6 and the Queen creates a check from d8.  The Rook creates a check from any position on the left side of black's two back ranks (7 and 8).  Black's Bishop creates a check from c3, the Knight creates a check from f3, and the Queen creates a check from d1, d2, or c2.  The Rook creates a check primarily from d1.

Rooks create a check the most at 39-40%, followed by the Queen at 28-31%, Knights at 12-15%, Bishops at 10-12%, and Pawns at 5-6%. 

Pawn Promotion:












White most often promotes a Pawn on the a8 and b8 squares.  Black most often promotes a Pawn on the a1 and b1 squares.  So Pawn promotions tend to happen on the side away from the castled King.  Also, in these games, the Pawn was always promoted to a Queen.


Openings:






















This is how white tries to position himself in the first three moves: a Pawn to d4, a Pawn to c4, a Knight to f3, a Knight to c3, a Pawn to e4, and possibly a Bishop to b5 (in order to put pressure on the black King).  In other words, white moves the middle Pawns forward and supports them by bringing the Knights out.  This is done to control the center of the board.

Black opens up in this way: a Knight to f6, a Pawn to d5, a Pawn to e6, a Knight to c6, and a Pawn to e5.  Black may also move a Bishop to b4 to try to check the white King.  So black opens in basically the same way as white.  The goal in the opening is to establish control of the middle of the board.
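Opening tallies like these come from looking only at the first three moves by each side in every game.  A minimal Python sketch of that slicing follows.  The function name and the three move lists are hypothetical: they are standard opening sequences typed in for illustration, not games taken from the dataset.  Moves are counted by their notation as written, so a capture such as "dxc4" is counted separately from a plain "c4".

```python
from collections import Counter


def opening_moves(games, n_moves=3):
    """Tally each side's moves over the first n_moves moves of every game.

    Each game is a list of moves in order: white, black, white, black, ...
    """
    white, black = Counter(), Counter()
    for moves in games:
        for ply, san in enumerate(moves[: 2 * n_moves]):
            (white if ply % 2 == 0 else black)[san] += 1
    return white, black


# Hypothetical games, for illustration only.
games = [
    ["d4", "d5", "c4", "e6", "Nc3", "Nf6", "Bg5", "Be7"],
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Ba4", "Nf6"],
    ["d4", "Nf6", "c4", "e6", "Nf3", "d5", "Nc3", "Be7"],
]
white, black = opening_moves(games)
print(white.most_common())
print(black.most_common())
```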


Lessons Learned

Here are some takeaways to guide our general chess play:
  • Try to control the middle of the board by having your pieces located and supported there.  In the opening, move your Pawns and Knights forward towards the center.  But be prepared to have active fighting in the middle as well.
  • Keep Bishops in the center and close together to attack both flanks.  Occasionally move them forward along diagonals to either flank to attack the King or Queen from a safe distance.
  • Move your Knights out to the center behind your Pawns to support them and attack in the center.  Use your Knights actively around the board, but don't try to use them to check the King.
  • Keep Rooks in the back after castling on the kingside.  Let them actively support and attack pieces in the center and across the board.  Be on the lookout to put pressure on your opponent's Queen and to check the King from a distance.  Move them across the board to check the King when it is safe to do so and use them actively when the board has cleared to capture other pieces.
  • Keep the Queen close and in front of your King initially.  When the board has cleared, move the Queen across the board to put pressure on and to check the King.
  • Castle the King on the kingside and keep him safely in the back.  Move him towards the center only when necessary.
  • Move your center Pawns forward, and use them to attack and support in the center of the board.  Use them to capture and clear the board for your other more powerful pieces.

I hope these insights help you in your chess play.  Good luck!

Saturday, November 1, 2014

Let's Begin...

I have participated in blogs in the past and my experience has been mixed.  I and others will generate a lot of good posts and discussion, and then that initial enthusiasm will die out as we all turn our attention to the many other pressing matters in our lives.  As a result, the blog sits and stalls out, and often is deleted after a time.

Or the response to a particular post is so overwhelming that responding becomes a consuming and burdensome exercise that never ceases.  Consequently, my attitude towards the blog becomes that of a beast viewing its burden, reluctant to shoulder it again or add to the work load.  Better to let it sit than to create more work for myself.

And yet I recognize the value of blogging personally for me.  It has generated good conversations in the past with others, and I have enjoyed the thought and work that has gone into various posts.  It has kept my mind active and my skills sharp.  It provides an outlet for creative work that is above and beyond what I do for my day job.

This is why, despite the challenges of blogging, a slight disdain for self-promotion, and a desire to avoid reinforcing a culture that encourages everyone to have an opinion on everything (and holds all opinions to be of equal worth), I am drawn back to it.  In starting this blog, I hope to avoid the pitfalls while reaping the rewards of blogging.  I will focus on the two areas I know something about: (1) philosophy and (2) data analytics.  I will focus on one, or the other, or often, the two of them together.  For example, I have done work on trends in PhD job placements in academic philosophy and done surveys on the relationship of philosophical views to demographic data.  I hope to continue to do this same sort of work, while expanding into other areas ripe for analysis that pique my interest.

I will do this at a pace sustainable for my current work/home life, which practically means generating a post or addressing comments once or twice a month.  This blog will be about what I am interested in doing, and will be updated however often I am interested in doing so.  And since I am the only contributor, the pace and content are in my control.  I believe this is the only sustainable way to keep up a blog.  Thus, it can remain a blessing to me (and perhaps to any others who may read it) without becoming a burden.

In conclusion, I hope someone out there finds something that I write and do on this blog insightful, intriguing, and interesting.  If not, if only I benefit from what I do here, it will still have been worth it.

Let's begin...