E ratings

Note: Even with the maths kept to a minimum this post is a bit on the dry side, so I’ve stuck a quick-ish explanation up front and the bulk of the detail underneath.

The short version

E ratings are my way of tracking each club’s attacking and defensive strength over time. They use the same “expected goals” concept that powers my match timelines, where every shot taken and faced is given a value based on how likely shots of that type are to go in. By adding up all these values we can work out how many goals each club “should” be scoring and conceding. The reason for doing this is that it has been shown to be a more reliable measure of a team’s underlying strength than treating all of their shots equally.

There are three numbers for each club:

  1. An attack rating, which basically equates to how many goals the club would be expected to score against an average opponent in their division.
  2. A defence rating, which measures how many goals they’d be expected to concede against an average opponent.
  3. A difference score which is just the attack rating minus the defence one, giving their expected goal difference in a match against an average opponent. This can be used to rank teams overall.

Before a match, the attack and defence ratings of both clubs are compared to calculate an expectation for the quality of chances that each will create. If a club ends up creating chances with a higher combined expected goals value than predicted, or allowing chances with a lower combined value, then the corresponding rating will improve. However if they underperform relative to expectations then the relevant rating will drop. The size of the change is proportional to how much they over or under-achieved.

This sounds more complicated than just averaging each team’s performance, like I do in the scatter graphics, so why do it? The main reason is to reduce the effect of the fixture schedule: if we just averaged everything without accounting for opponent strength then we wouldn’t be able to tell the difference between a club’s ability genuinely improving or worsening and the impact of an easy or difficult run of games.

The E ratings should therefore provide a useful way of tracking the strength of a club’s attack and defence over time. We can use this information to dig deeper into their underlying performances and predict how well they might do in future.

That’s pretty much it in a nutshell. If you’re interested in more detail, below is the original (long) explanation of my motivation and the approach I’ve taken.

Introduction

I’ve smashed together two things that I’ve wanted to bring to lower league football for a long time:

  1. An “expected goals” measure which assesses clubs based on the quality of the shots they take and face, in addition to their quantity.
  2. A ratings system (inspired by the Elo ratings) which tracks the “strength” of clubs over time and factors in the relative ratings of their opponents.

There are plenty of examples of both of these out there, but I haven’t seen them combined before. I wanted to do this because I like different things about each approach:

  1. Expected goals is a way of weighting shots based on their goalscoring potential rather than just counting them, which gives an intuitively fairer way of assessing a club’s attacking and defensive performance.
  2. Elo-type ratings systems give a single, current measure of team strength that can be tracked over time to help tell the “story of the season”, and which moves based on how a team performed relative to expectation against each opponent.

Combining them also allows me to sidestep things I like less about them:

  1. Expected goals tend to be averaged over a season or a rolling window of recent matches, but this allows ratings to be influenced by the fixture schedule, i.e. a run of easy or hard matches can make it look like a team is getting better or worse when they aren’t.
  2. Ratings systems tend to be calculated based on results, but shots can tell us more about a team’s underlying ability. They also usually provide a single “strength” number, whereas I prefer to look at attack and defence separately. Finally, seeing as I’ll mostly be comparing clubs within the same division, using results wouldn’t add much to what we can already see in the league table.

What I’ve done is build a ratings system to track each club’s expected goals scored and conceded, based on the quality of the shots they’ve taken and faced, rather than the actual goals they’ve scored. This gives two numbers for each club, which can be updated after every round of league fixtures:

  1. How many goals they are expected to create per match (intended as a measure of attacking ability)
  2. How many goals they are expected to concede per match (intended as a measure of defensive ability)

I like that these numbers are still “real” quantities rather than abstract ratings: they tell you how many goals a given team would be expected to score and concede in a match against an average opponent.

They will rise and fall over time based on the quality of the shots that a club creates and allows compared with what was expected of it. If I want to rank clubs by a single measure I can just subtract (2) from (1) to get “expected goal difference per match”, which can serve as an overall “strength” rating, but being able to quantify attacking and defensive performance separately feels more interesting to me.

Some examples

If you’re already bored of me stumbling through this explanation and want to skip to some examples of it in action, then here are some reviews of last season I’ve put together for illustrative purposes:

If you want some more details, here’s some further information on the two ingredients, but feel free to skip past this if you’re already familiar with them:

Ingredient 1: Expected goals

People like Michael Caley have already done a much better job of building and explaining expected goals models than I can, but in a nutshell the way they work is that each type of shot has a goal value assigned to it based on how often shots of that type are scored. For example, if a certain type of shot went in 1 time in every 10, it would have an expected goal value of 0.1.

There’s a really nice summary of the concept of expected goals here:

If you add up all these values for each match, you’ll get totals for how many goals a team “should” have scored and conceded, assuming everyone is equally good at converting and saving shots over the long run. That last bit is obviously not true, but shooting and saving are currently difficult skills to quantify and vary quite a bit, and this approach has been shown – and is widely considered – to be an upgrade on simply counting all shots equally.
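
As a minimal illustration of that adding-up, here’s a quick sketch in Python. The shot values are invented rather than taken from my model, and are only there to show the mechanics:

```python
# Illustrative only: the expected goal values below are invented, not taken from my model.
shots = [
    {"team": "Home", "xg": 0.08},  # long-range effort
    {"team": "Home", "xg": 0.35},  # close-range shot
    {"team": "Away", "xg": 0.12},  # header from a corner
    {"team": "Away", "xg": 0.05},  # speculative volley
]

def expected_goals(shots, team):
    """Add up the expected goal values of every shot taken by one team."""
    return sum(shot["xg"] for shot in shots if shot["team"] == team)

print(round(expected_goals(shots, "Home"), 2))  # 0.43 - what the home side "should" have scored
print(round(expected_goals(shots, "Away"), 2))  # 0.17
```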

As an example, Tottenham were a significant outlier on my scatter plots a few seasons ago because they were taking loads of shots, but a lot of them were “low value” efforts from a long way outside the area and thus largely wasted. If you just counted shots you might have concluded that they were in with a shot at the title, but once you correct for the quality of those efforts a more realistic picture emerges.

In the lower leagues we don’t have as much data to play with as there is for top flight clubs – many of the components described in the two links above simply aren’t measured – but we have shot type and approximate location, which is better than nothing.

Taking inspiration from several others working with this level of data, I’ve subsequently introduced two small weighting factors: one which assigns greater goalscoring potential to shots which were on target and another which reflects recent shot conversion rates for and against each team. The rationale for doing this is that shot accuracy and conversion owe something to the quality of the shot and the quality of the finish respectively. Therefore by dosing the ratings with the appropriate proportion of these factors we can improve their accuracy without over-reacting to unsustainable trends in the data.
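
To make that a bit more concrete, here’s a hypothetical sketch of what those two factors could look like. The multiplier and blending proportion below are placeholders chosen purely to show the shape of the adjustment, not the weights I actually use:

```python
# Hypothetical sketch: the numbers here are placeholders, not the real weights.
ON_TARGET_BOOST = 1.2    # assumed multiplier for shots that were on target
CONVERSION_WEIGHT = 0.1  # assumed share given to recent actual conversion rates

def adjusted_shot_value(base_xg, on_target):
    """Give on-target shots a little more goalscoring potential than off-target ones."""
    return base_xg * ON_TARGET_BOOST if on_target else base_xg

def blended_xg(shot_xg_total, recent_conversion_rate, league_avg_conversion_rate):
    """Mix a small dose of recent finishing form into the purely shot-based total."""
    finishing_factor = recent_conversion_rate / league_avg_conversion_rate
    return shot_xg_total * (1 - CONVERSION_WEIGHT + CONVERSION_WEIGHT * finishing_factor)

print(adjusted_shot_value(0.10, on_target=True))  # roughly 0.12
print(blended_xg(1.50, recent_conversion_rate=0.12, league_avg_conversion_rate=0.10))  # roughly 1.53
```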

Ingredient 2: a ratings system

As I mentioned earlier, expected goals for and against are usually totalled up for the current season or shown as a “rolling average” e.g. over the last 20 or so matches, but this allows the fixture schedule to distort the numbers.

To give an example, say that a team had played 10 matches of their season and had faced a lot of weak teams: their rating would be inflated above its true level because they’d find it easier to create good chances than a team of equal strength who’d had a tougher start. Likewise if a team played a few easy matches in a row, you’d see their average start to rise and erroneously assume they were getting better.

What I like about the Elo ratings system is that it compares what you achieve with what you were expected to achieve, rather than treating all matches as being equal. Every team has a “strength rating” and when two teams play each other these are compared to calculate what should happen. Afterwards, each team’s rating is modified either up or down in proportion to how they fared compared to that expectation.

For example, when England beat Switzerland in 2014, their rating going into the match was 1837 and Switzerland’s a similar 1819. England’s 2-0 win saw them gain 37 points and Switzerland lose the same number. However, when England later beat Lithuania (who had a much lower rating of just 1440) 4-0, the two teams’ ratings only moved by 3 points, given that this was much closer to what was expected of them both.
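
For reference, here’s the textbook Elo update in code. The football Elo ratings behind the England example also weight the update by things like match importance and margin of victory, which is why the exact point swings above differ a little from this simplified version:

```python
# Standard Elo update, simplified for illustration.
def expected_score(rating_a, rating_b):
    """Probability-like expectation for team A, from the standard Elo formula."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, actual_a, k=40):
    """Move each rating in proportion to how team A fared versus expectation.

    k controls how quickly ratings move; 40 is in the range used for competitive internationals.
    """
    change = k * (actual_a - expected_score(rating_a, rating_b))
    return rating_a + change, rating_b - change

# England (1837) beat Switzerland (1819): near-equal ratings, so the win moves both a fair amount.
print(update(1837, 1819, actual_a=1.0))
# England (1837) beat Lithuania (1440): the win was expected, so the ratings barely move.
print(update(1837, 1440, actual_a=1.0))
```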

It’s worth mentioning that there’s already an excellent club version of the Elo system, but it only goes down as far as the second tier of English football. It’s also worth pointing out that the E ratings are not equivalent to an Elo rating: I just like the way that an Elo rating behaves, but the formulae used to calculate these ratings are different.

Combining the two

Here’s how it works in practice. I apologise in advance for using the word “expected” so much:

  1. Ahead of a given fixture, each club will have an attack rating (telling you how many goals’ worth of shots they are expected to score in a typical match) and a defence rating (measuring how much they’re expected to concede) based on data from their previous matches. For most teams these values will be somewhere between 1 and 2 goals per match (the long-term average is around 1.3).
  2. Before each match, the attack and defence ratings are adjusted for home advantage and then compared to reach an expected “expected goal” tally for each team.
  3. Once the match has been played, the actual expected goal values of each team’s shots are added up and compared with the “expected” expected goal values calculated in (2). The attack and defence ratings for each team are then adjusted according to whether they have over or under-performed.
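
To make those three steps a little more tangible, here’s a rough sketch of the cycle in code. This is not the exact set of equations I use: the home advantage multiplier, the learning rate and the way the two ratings are combined below are simplified stand-ins chosen purely for illustration.

```python
# A simplified sketch of the update cycle, not the exact E ratings equations.
# The constants below are illustrative stand-ins rather than the values I actually use.
LEAGUE_AVG = 1.3     # long-term average expected goals per team per match
HOME_BOOST = 1.1     # assumed home advantage multiplier
LEARNING_RATE = 0.1  # assumed speed at which ratings react to a single match

def pre_match_expectation(attack_rating, opp_defence_rating, at_home):
    """Step 2: combine one side's attack rating with the other side's defence rating."""
    expectation = attack_rating * opp_defence_rating / LEAGUE_AVG
    return expectation * HOME_BOOST if at_home else expectation / HOME_BOOST

def post_match_update(attack_rating, opp_defence_rating, expected_xg, actual_xg):
    """Step 3: nudge both ratings in proportion to the over- or under-performance."""
    error = actual_xg - expected_xg
    return (attack_rating + LEARNING_RATE * error,
            opp_defence_rating + LEARNING_RATE * error)

# Example: a strong home attack (1.6) up against a leaky away defence (1.5).
expected = pre_match_expectation(1.6, 1.5, at_home=True)
new_attack, new_opp_defence = post_match_update(1.6, 1.5, expected, actual_xg=2.2)
```

The same comparison runs in the other direction for the away side’s attack against the home side’s defence.
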
Why do this at all?

The point of doing this is to allow clubs to be compared over time in a way that hopefully strikes the right balance between simplicity, transparency and credibility.

What I like about them is:

  • Measuring clubs on the quality of the chances they create and allow feels “fairer” than just looking at their results – we have the league table for that, after all – and provides a way of flagging which clubs may be under or over-performing in said table.
  • It allows me to track the attacking and defensive performances of clubs separately, although I can still net them off to get a single strength rating if needed.
  • There’s a much lower chance of a change in rating being caused by the fixture schedule, as opponent strength is taken into account.
  • When I’ve back-tested this against data from prior seasons, it yields results and trends which resemble the output of a standard “expected goals” model fairly closely, which reassures me that I haven’t buggered the maths up too much.

Limitations of expected goals

However, it’s worth pointing out that expected goals don’t tell the whole story, even if we had perfect information about the expected goals value of every shot taken and faced. The differences between the actual and expected goals scored and conceded by a club can be explained by three reasons, which aren’t mutually exclusive:

  1. They’ve been lucky (or unlucky) in the games they’ve played so far. There’s no reliable way of quantifying this, so we just have to live with it and rely on luck averaging out in the long run.
  2. They’ve played unusually easy (or hard) opponents so far. The way the ratings are adjusted based on how each team’s rating suggested they would perform will help here, but if a team’s underlying ability changes rapidly it may take the ratings a while to catch up.
  3. They’ve genuinely performed more (or less) effectively than the average club. Every team’s finishing and shot-stopping ability will be different, but again it’s a tricky thing to quantify. I’ve chosen to adopt the approach that several others have taken in building a “composite team rating” and introduced a small corrective factor to the ratings based on how well teams are converting and preventing chances.

Challenges of this approach

No model or system is perfect and I’ve had to deal with quite a few challenges when building this, including:

  • Balancing the ratings so that they change quickly enough to be relevant without overreacting to “form”. I’ve back-tested a variety of different equations and weighting factors using data from previous seasons to get the optimum balance, but if a team undergoes major changes then it may take the rating a while to catch up.
  • Adjusting a team’s ratings when they move between divisions. When a team is promoted or relegated I amend their ratings based on an “exchange rate” derived from how the performances of previously promoted and relegated teams have changed (there’s a rough sketch of this after the list), but again much depends on how many changes the club makes in response to their new circumstances.
  • Compromising between prediction and explanation. When building the model that powers the ratings I ran a series of tests to see which equations and factors gave the most realistic results, but quickly discovered that some matched a team’s recent performances more closely while others were better at predicting how they would do in future. As I value both and only want to operate one set of ratings, I’ve chosen the approach which gives the best combined result, but this means that it’s neither as explanatory nor as predictive as it would be if I’d picked just one.
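
On the second point above, here’s a purely hypothetical illustration of what a division “exchange rate” could look like. The factors are invented for the sake of the example; the real ones come from how previously promoted and relegated clubs have fared:

```python
# Hypothetical illustration: these factors are invented, not the ones I actually use.
EXCHANGE_RATES = {
    ("League Two", "League One"): {"attack": 0.90, "defence": 1.10},
    ("League One", "League Two"): {"attack": 1.10, "defence": 0.90},
}

def rescale_ratings(attack_rating, defence_rating, from_division, to_division):
    """Convert a club's ratings into the 'currency' of the division they're moving to."""
    factors = EXCHANGE_RATES[(from_division, to_division)]
    return attack_rating * factors["attack"], defence_rating * factors["defence"]

# A promoted club would be expected to score less and concede more at the higher level.
print(rescale_ratings(1.7, 1.1, "League Two", "League One"))  # roughly (1.53, 1.21)
```
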
The name

Given that “Experimental 361’s Expected Goal Elo-style Rating” is horrible and has lots of ‘e’s in it, I’ll just be calling them “E ratings”.

Using these ratings

The motivation for doing all of this in the first place was to come up with something a bit more rigorous and explanatory to complement my scatter plots, which are powered by raw shot counts and therefore omit valuable information about where shots are coming from.

The current plan is to use them to:

  • Preview matches and rounds of fixtures in a bit more detail – knowing about each team’s attacking and defensive strength, plus which way these numbers are moving, should prove useful.
  • Review matches after they’ve been played – and the season so far – to assess how teams are performing compared to each other and to individual expectations.
  • Model and simulate matches – and the rest of the season – to try and predict what could happen in future. This system is aimed at being more explanatory than predictive, but some simple simulations using these ratings as inputs should still provide something of interest.
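
As a final illustration of that last point, here’s one simple way the ratings could feed a match simulation: the pre-match step described earlier gives each side an expected goals figure, and a common approach (assumed here, rather than necessarily exactly how my simulations will work) is to treat those figures as the means of Poisson distributions and sample repeatedly:

```python
# Illustrative sketch: treating each side's expected goals as a Poisson mean is a
# common modelling assumption; the input figures below are made up.
import numpy as np

def simulate_match(home_exp_goals, away_exp_goals, n=10_000, seed=42):
    """Estimate home win / draw / away win probabilities by repeated sampling."""
    rng = np.random.default_rng(seed)
    home_goals = rng.poisson(home_exp_goals, n)
    away_goals = rng.poisson(away_exp_goals, n)
    return {
        "home_win": float(np.mean(home_goals > away_goals)),
        "draw": float(np.mean(home_goals == away_goals)),
        "away_win": float(np.mean(home_goals < away_goals)),
    }

print(simulate_match(1.6, 1.1))
```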