World Cup 2018 draw

I ran a little experiment yesterday during the World Cup draw that seemed to go well, so I thought I’d write it up here and collect all the bits and pieces together.

A shared spreadsheet

First of all, I realised that the quick spreadsheet I’d built myself to track the draw didn’t have any code in it, and would therefore work fine as a Google Sheet that I could share with other people who might be interested.

[here’s the link]

I use Google Drive and its apps all the time at work, although with only one or two people jointly editing or reviewing documents, so I thought it’d be interesting to see what happened when I shared one more widely. It seemed to handle almost 100 simultaneous viewers during the draw yesterday with no problem, so I’ll definitely look for more opportunities to share stuff in this way.

Who got the luckiest – and unluckiest – draw?

Something else I was interested in was how hard the draw was for each team. You’ll see in the sheet that I had a go at calculating the “group of death” from the World Football Elo ratings, which in my book give a much more credible measurement of team strength than the official FIFA rankings.

However, the pot you’re in has a huge impact on the average rating of the teams you’re going to be drawn with, so I also wanted to work out how every team’s draw compared with every other draw they could have gotten. Before the draw I calculated every possible combination of teams that could be drawn into any one group and filtered out the invalid ones (i.e. those with too many teams from the same confederation).
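As a rough illustration of that enumerate-and-filter step (a sketch only, not how the spreadsheet does it), the Python below takes one team from each pot and throws away any group that breaks a confederation limit. The miniature pots and team names are placeholders, and the limits (two UEFA teams per group, one from any other confederation) are an assumption about the draw rules rather than something spelled out above.

```python
from collections import Counter
from itertools import product

# Hypothetical miniature pots of (team, confederation) pairs.
# Placeholder names only, not the real 2018 pots.
POTS = [
    [("P1a", "UEFA"), ("P1b", "CONMEBOL")],
    [("P2a", "UEFA"), ("P2b", "CONMEBOL")],
    [("P3a", "UEFA"), ("P3b", "CAF")],
    [("P4a", "AFC"), ("P4b", "CAF")],
]

# Assumed limits: up to two UEFA teams per group, one from any other confederation.
MAX_PER_CONFEDERATION = {"UEFA": 2}


def is_valid(group):
    """Check that no confederation exceeds its limit within one group."""
    counts = Counter(confederation for _, confederation in group)
    return all(
        n <= MAX_PER_CONFEDERATION.get(confederation, 1)
        for confederation, n in counts.items()
    )


def valid_groups(pots):
    """Every one-team-per-pot combination that passes the confederation check."""
    return [group for group in product(*pots) if is_valid(group)]


groups = valid_groups(POTS)
for group in groups:
    print([team for team, _ in group])
```

With the real pots, the same filter would be run once per team by fixing that team's slot in its own pot.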

This left me with an average of roughly 270 possible draws for each team, so for each of these I then added up the Elo ratings of all the other nations and ranked the actual draw against these. You can see a summary on the “Nations” page of the shared spreadsheet, but I also graphed it here:
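The ranking step can be sketched in a few lines of Python. Everything here is made up for illustration: the opponent names and Elo ratings are placeholders, and `draw_rank` is a hypothetical helper that counts how many possible draws had a strictly lower combined rating than the one actually received.

```python
from itertools import product


def draw_rank(actual_total, possible_totals):
    """Return (rank, count); rank 1 means the lowest possible opponent total."""
    easier = sum(1 for total in possible_totals if total < actual_total)
    return easier + 1, len(possible_totals)


# Hypothetical Elo ratings for the candidate opponents from the other three pots.
elo = {"B1": 1900, "B2": 1800, "C1": 1750, "C2": 1700, "D1": 1650, "D2": 1600}

# Every possible set of three opponents (assumed already filtered for validity).
possible = list(product(["B1", "B2"], ["C1", "C2"], ["D1", "D2"]))
totals = [sum(elo[team] for team in opponents) for opponents in possible]

# The draw the team actually received in this toy example.
actual = ("B2", "C2", "D1")
rank, count = draw_rank(sum(elo[team] for team in actual), totals)
print(f"Draw ranked {rank} of {count} by combined opponent rating")
```

Ties are treated generously here: a draw with the same total as the actual one does not count as easier.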

So Uruguay were the luckiest team of all, as they got the lowest-rated team possible from Pots 1 and 4, plus the second lowest from Pot 2. Only getting Tunisia instead of Egypt would have resulted in their three Group A opponents having a lower combined Elo rating. Portugal came off worst, getting the highest-ranked team in Pot 2, the highest-ranked non-European team in Pot 3 and the second-toughest team that was available in Pot 4.

As an aside, I’m aware that I’ve taken a relatively simple approach to this by just adding the ratings up. As per this discussion on Twitter, you could argue that it’d be easier to get out of a group containing one really strong opponent and two easier ones than out of one containing three equally rated teams with the same combined score.
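To put a rough number on that argument, here is a small sketch using the standard Elo expected-score formula (ignoring home advantage). The ratings are invented, and expected score per match is only a loose stand-in for the chance of getting out of a group, so treat it as an illustration rather than a result.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))


team = 1800                  # hypothetical rating of the team in question
uneven = [2000, 1700, 1700]  # one strong opponent and two weaker ones
even = [1800, 1800, 1800]    # three equally rated opponents, same combined rating

uneven_total = sum(expected_score(team, r) for r in uneven)
even_total = sum(expected_score(team, r) for r in even)

print(f"Combined rating: {sum(uneven)} vs {sum(even)}")
print(f"Expected score:  {uneven_total:.3f} vs {even_total:.3f}")
```

Both groups have the same combined rating, but the expected score over the three matches differs, which is exactly the information a simple sum hides.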

When’s the earliest that any two teams can meet?

It was fun watching this fill up in the “Earliest” tab of the shared sheet during the draw yesterday and seeing what the draw meant for the Pot 1 teams in particular. Basically the way the knockout stages are structured means that you’re guaranteed to have an opponent from one specific group in the round of 16, another two groups in the quarter-finals and the remaining four groups in the semis, so it was easy to calculate the earliest possible stage that any two teams could meet once they’d been drawn into their groups.

To take an example from near the top-right corner, Brazil can meet Germany in the last 16, but neither can meet Argentina or France until the semi-finals.
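The rule described above boils down to a couple of integer divisions. This sketch assumes the groups are paired A/B, C/D, E/F and G/H for the round of 16, with A to D and E to H forming the two quarter-final blocks; the group letters used in the example (Brazil in E, Germany in F, France in C, Argentina in D) are also assumptions, since they aren't listed in this post.

```python
GROUPS = "ABCDEFGH"


def earliest_stage(group_a, group_b):
    """Earliest stage at which teams from two groups could meet."""
    i, j = GROUPS.index(group_a), GROUPS.index(group_b)
    if i == j:
        return "group stage"
    if i // 2 == j // 2:   # paired groups, e.g. E and F
        return "round of 16"
    if i // 4 == j // 4:   # same block of four groups, e.g. E and G
        return "quarter-finals"
    return "semi-finals"   # the remaining four groups


# Assumed group letters for the teams mentioned above.
team_groups = {"Brazil": "E", "Germany": "F", "France": "C", "Argentina": "D"}

print(earliest_stage(team_groups["Brazil"], team_groups["Germany"]))
print(earliest_stage(team_groups["Brazil"], team_groups["Argentina"]))
print(earliest_stage(team_groups["Germany"], team_groups["France"]))
```

Note that this gives the earliest possible meeting only; whether it actually happens depends on where each team finishes in its group.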

Who are each team *likely* to meet though?

The limitation of the graphic above is that it only shows what’s possible, not what’s probable. I did have a go at building a simple prediction model from the Elo ratings too, which definitely isn’t rigorous enough to bet on but seems to give believable results. I churned through 10,000 simulations and came up with this estimate of who England could face at each stage (assuming they made it all the way to the final of course):
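The model itself isn't described here, so the following is only a guess at the general shape of such a simulation, not the one actually used. It assumes the standard Elo expected-score formula is treated as a straight win probability for a single knockout tie (no draws, extra time or penalties), and the ratings are invented.

```python
import random


def win_probability(rating_a, rating_b):
    """Standard Elo expected score, used here as team A's chance of winning."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def simulate_knockout(rating_a, rating_b, rng):
    """Return True if team A wins one simulated knockout tie."""
    return rng.random() < win_probability(rating_a, rating_b)


def estimate_win_rate(rating_a, rating_b, n_sims=10_000, seed=2018):
    """Fraction of simulated ties won by team A."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    wins = sum(simulate_knockout(rating_a, rating_b, rng) for _ in range(n_sims))
    return wins / n_sims


# Hypothetical ratings for two teams.
rate = estimate_win_rate(1900, 1800)
print(f"Simulated win rate: {rate:.3f}")
print(f"Model probability:  {win_probability(1900, 1800):.3f}")
```

A full tournament simulation would chain ties like this through the group stage and the bracket, tallying which opponents turn up at each round.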

I had a few questions about other teams’ prospects but it wasn’t practical to churn out 31 more of these, so instead I bumped up the number of simulations to 100,000 and created a summary of the four most likely opponents for each team in each pot. You’ll notice that the England percentages are slightly different to the ones above as the sim doesn’t run exactly the same each time (plus the larger number of simulations probably smoothed the numbers out a bit).
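For a sense of how much wobble to expect between runs, the standard error of a simulated percentage is roughly sqrt(p(1-p)/n). The sketch below assumes the simulations are independent and uses a made-up 20% chance of meeting a given opponent.

```python
from math import sqrt


def standard_error(p, n_sims):
    """Standard error of a proportion estimated from n_sims independent runs."""
    return sqrt(p * (1 - p) / n_sims)


p = 0.2  # hypothetical chance of facing a particular opponent
se_10k = standard_error(p, 10_000)
se_100k = standard_error(p, 100_000)

print(f"10,000 sims:  +/- {se_10k * 100:.2f} percentage points")
print(f"100,000 sims: +/- {se_100k * 100:.2f} percentage points")
```

Ten times as many simulations shrinks the noise by a factor of about 3.2 (the square root of 10), which fits with the percentages shifting slightly between the two runs.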