Introducing outcome matrices

Introduction

Earlier this week I finally got around to creating an automated summary graphic for my match previews, which will hopefully save people from wading through entire galleries of graphics to get an overall picture of what’s likely to happen.

On a roll, I thought I’d attempt something in a similar vein for the post-match analysis. I already crank out swathes of match timelines which show how each match unfolded, but I didn’t have a compact way of comparing what happened to what was expected. I also saw an opportunity to add an extra measure that I’ve seen pioneered elsewhere: a post-match probability for how likely the outcome was based on the chances that each side created.

The old numbers bit

As a reminder, the first of these two numbers is a pre-match prediction of the match result based on how many goals’ worth of chances each team is expected to create, which is in turn driven by the two clubs’ current E Ratings.

The likelihood of each team scoring a specific number of goals in the match is calculated using a probability distribution, and then all of the scorelines which result in a home win are added up (e.g. 1-0, 2-1, 3-1, 3-2 and so on). The same process is repeated for the scorelines which result in draws and away wins, so you end up with three probabilities, one for each of the three possible outcomes, which add up to 100%.
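For anyone who wants to see the adding-up step spelled out, here is a minimal sketch. It assumes each team's goals follow independent Poisson distributions, which is a common choice but not necessarily the distribution used for these graphics, and the function names and the cap of 10 goals per team are purely illustrative.

```python
from math import exp, factorial


def poisson_pmf(goals, mean):
    """Probability of scoring exactly `goals` when `mean` goals are expected.

    ASSUMPTION: a Poisson distribution; the post does not name the
    distribution it actually uses.
    """
    return exp(-mean) * mean ** goals / factorial(goals)


def outcome_probabilities(home_expected, away_expected, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Every scoreline from 0-0 up to max_goals-max_goals is given a
    probability, and each one is added to the bucket it belongs to.
    """
    home_win = draw = away_win = 0.0
    for home_goals in range(max_goals + 1):
        for away_goals in range(max_goals + 1):
            # ASSUMPTION: the two teams' goal counts are independent.
            p = poisson_pmf(home_goals, home_expected) * poisson_pmf(away_goals, away_expected)
            if home_goals > away_goals:
                home_win += p
            elif home_goals == away_goals:
                draw += p
            else:
                away_win += p
    # Scorelines beyond the cap are ignored, so rescale to sum to exactly 1.
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


# Hypothetical inputs: 1.5 expected goals for the hosts, 1.0 for the visitors.
home, draw, away = outcome_probabilities(1.5, 1.0)
print(f"home {home:.1%}, draw {draw:.1%}, away {away:.1%}")
```

The cap on goals is just a convenience: scorelines beyond it are so unlikely at typical football scoring rates that ignoring them barely moves the totals.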

The new numbers bit

The second number is the new bit, although it actually uses the exact same calculation method as above. The difference is that it takes the actual quality of chances created during the match as its inputs instead: the numbers you see on the match timelines that add up the “expected goals” value of all chances created by each team.

The point of doing this is to quantify how surprising (or not) the outcome of the match was based on the chances created. This allows us to identify which games ended in a seemingly fair result and which results may have owed something to random chance.
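As a rough illustration of the post-match number, the sketch below adds up the expected goals value of each team's chances and feeds the two totals into the same kind of scoreline calculation, then reads off the probability of the result that actually happened. The Poisson distribution, the function names and the chance values are all assumptions made up for the example, not the real model or real match data.

```python
from math import exp, factorial


def goal_probability(goals, mean):
    # ASSUMPTION: Poisson-distributed goals; the real model may differ.
    return exp(-mean) * mean ** goals / factorial(goals)


def result_probabilities(home_mean, away_mean, max_goals=10):
    """Sum scoreline probabilities into home / draw / away buckets."""
    probs = {"home": 0.0, "draw": 0.0, "away": 0.0}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = goal_probability(h, home_mean) * goal_probability(a, away_mean)
            key = "home" if h > a else "away" if a > h else "draw"
            probs[key] += p
    # Rescale so the three buckets sum to 1 despite the cap on goals.
    total = sum(probs.values())
    return {key: value / total for key, value in probs.items()}


def post_match_probability(home_chances, away_chances, actual_result):
    """Probability of the actual result given the chances each side created.

    `home_chances` and `away_chances` are lists of per-chance expected
    goals values; their sums are the totals shown on a match timeline.
    """
    probs = result_probabilities(sum(home_chances), sum(away_chances))
    return probs[actual_result]


# Made-up chance values for a home side that won despite creating far less.
home_chances = [0.05, 0.10, 0.30, 0.08, 0.25]
away_chances = [0.76, 0.76, 0.40, 0.12, 0.35, 0.20]
print(f"{post_match_probability(home_chances, away_chances, 'home'):.1%}")
```

A low value here means the winners were outplayed on chances and the result owed more to finishing or luck; a high value means the scoreline broadly matched the balance of play.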

The graphical bit

Having two values to compare will always draw me towards my favourite visualisation of all time: the noble scatter plot. Rather than commandeer the vividly-coloured versions that are the staple industry of this blog, I’ve gone for a more minimalist version in keeping with the style of my other matchday graphics.

What I’ve done is to plot the two probabilities described above against each other for every game in a round of fixtures, which gives us a handy visual way to see which matches played out as expected and which sprung a surprise. Below is an example using a round of League 2 matches from a few weeks ago:

round-review-test

We can see in the top right that both Plymouth and Carlisle lived up to their pre-match expectations. Both were given a better than 50% chance of winning their matches (reading from the horizontal axis) and did so. We can see from their high position on the vertical axis that both victories look to have been deserved: the chances they created relative to their opponents were enough to deliver a win over 50% of the time.

There were some surprises though. At the bottom we can see that Cambridge weren’t particularly fancied to beat Accrington: the horizontal axis suggests that they had just under a 30% chance of doing so. However, win they did, although it doesn’t look like a convincing result as the balance of chances created looks like it would have led to a win in fewer than 10% of cases. Let’s call up that timeline to see what went on:

2016-10-01-cambridge-accrington

So Cambridge netted twice from fewer than one goal’s worth of chances while Accrington fired in attempts of around three times the overall quality. The huge leap in the visitors’ line is the two injury-time penalties that they failed to convert (as per the BBC match report), so we can see how easily this game could have gone the other way.

The biggest shock of the weekend was Hartlepool’s 3-0 win at Grimsby, which looks to have had only around a one in five chance of happening based on our pre-match (horizontal) probability. It looks like the Mariners had the better of the match itself too, as the vertical axis shows the probability of the away win based on the chances created was below 30%. Again we can dig out the timeline and take a look:

2016-10-01-grimsby-hartlepool

So this wasn’t quite as much of a smash and grab as the last one, hence it’s higher up the vertical axis, but while Hartlepool did more than enough to merit a place on the scoresheet, Grimsby can still feel a bit hard done by. They look to have created the better chances by around half a goal overall, notably a couple of close-range efforts in first-half stoppage time, and could well have got something from this game on another day.

Summary

Hopefully people will find these useful as a quick way to see how each match lived up to expectations and how surprising each result was. The aim is to add them to the standard weekly output after a round of fixtures, in addition to the match timelines and E Ratings updates.