Introducing “E ratings”

Note: Even with the maths kept to a minimum this post is a bit on the dry side, so I’ve stuck a quick-ish explanation and some examples up front, with more detail underneath.


Ahead of the new season I’ve smashed together two things that I’ve wanted to bring to lower league football for a long time:

  1. An “expected goals” measure which assesses clubs based on the quality of the shots they take and face, in addition to their quantity.
  2. A ratings system (like the Elo ratings) which tracks the “strength” of clubs over time and factors in the relative ratings of their opponents.

There are plenty of examples of both of these things out there but I haven’t seen them combined before. I wanted to do this because I like different things about both of these approaches:

  1. Expected goals methodologies weight shots based on their goalscoring potential rather than just counting them, which gives an intuitively fairer way of assessing a club’s attacking and defensive performance.
  2. Elo-type ratings systems give a single, current measure of team strength that can be tracked over time to help tell the “story of the season”, and which moves based on how a team performed relative to expectation against each opponent.

Combining them also allows me to sidestep things I like less about them:

  1. Expected goals tend to be averaged over a season or a rolling set number of matches, but this allows ratings to be influenced by the fixture schedule i.e. a run of easy or hard matches can make it look like a team is getting better or worse when they aren’t.
  2. Ratings systems tend to be calculated based on results, but analysts rely more on shots as a measure of a team’s underlying ability. They also usually provide a single “strength” number when I prefer to look at attack and defence separately. Finally, seeing as I’ll mostly be comparing clubs within the same division, using results wouldn’t add much to what we can already see in the league table.

What I’ve done is build a ratings system to track each club’s expected goals scored and conceded, rather than the actual goals they scored and conceded. This gives two numbers for each club, which can be updated after every round of league fixtures:

  1. How many goals they are expected to create per match (intended as a measure of attacking ability)
  2. How many goals they are expected to concede per match (intended as a measure of defensive ability)

They will rise and fall over time based on the quality of the shots that a club creates and allows compared with what was expected of it. If I want to rank clubs by a single measure I can just subtract (2) from (1) to get “expected goal difference per match”, which can serve as an overall “strength” rating, but being able to quantify attacking and defensive performance separately feels more interesting to me.

Some examples

If you’re already bored of me stumbling through this explanation and want to skip to some examples of it in action, then here are some reviews of last season I’ve put together for illustrative purposes:

If you want some more details, here’s some further information on the two ingredients, but feel free to skip past this if you’re already familiar with them:

Ingredient 1: Expected goals

People like Sander Ijtsma and Michael Caley have already done a much better job of building and explaining Expected Goals models than I can, but in a nutshell the way they work is that each type of shot has a goal value assigned to it based on how often shots of that type are scored. For example, if a certain type of shot went in 1 time in every 10, it would have an expected goal value of 0.1.

If you add up all these values for each match, you’ll get totals for how many goals a team “should” have scored and conceded, assuming everyone is equally good at converting and saving shots over the long run. That last bit is obviously not true, but shooting and saving are currently difficult skills to quantify and vary quite a bit, and this approach is widely considered to be an upgrade on simply counting shots.
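
To show the adding-up step, here’s a minimal sketch in Python. The shot types and their goal values are made up purely for illustration (they are not taken from any real model), and `expected_goals` is just a name I’ve picked for the example.

```python
# Hypothetical goal values per shot type: the share of shots of each type
# that end up as goals. These figures are invented for illustration only.
SHOT_VALUES = {
    "penalty": 0.75,
    "close_range": 0.30,
    "inside_area": 0.10,
    "outside_area": 0.03,
}


def expected_goals(shots):
    """Add up the goal value of every shot a team took in a match."""
    return sum(SHOT_VALUES[shot_type] for shot_type in shots)


# Two teams with very different shot profiles.
many_long_shots = ["outside_area"] * 15 + ["inside_area"] * 3  # 18 shots
few_good_shots = ["close_range"] * 3 + ["inside_area"] * 4     # 7 shots

print(round(expected_goals(many_long_shots), 2))  # 0.75
print(round(expected_goals(few_good_shots), 2))   # 1.3
```

With these invented values, the team taking 18 mostly long-range shots ends up with fewer expected goals than the team taking 7 better ones, which a raw shot count would hide.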

As an example, Tottenham were a significant outlier on my scatter plots a few seasons ago because they were taking loads of shots, but a lot of them were “low value” efforts from a long way outside the area and thus largely wasted. If you just counted shots you might have concluded that they were in with a shot at the title, but once you correct for the quality of those efforts a more realistic picture emerges.

In the lower leagues we don’t have as much data to play with as there is for top flight clubs – many of the components described in the two links above simply aren’t measured – but we have shot type and approximate location, which is better than nothing.

Ingredient 2: Elo ratings

As I mentioned earlier, expected goals for and against are usually totalled up for the current season or shown as a “rolling average” e.g. over the last 20 or so matches, but this allows the fixture schedule to distort the numbers.

To give an example, say that a team had played 10 matches of their season and had faced a lot of weak teams: their rating would be inflated above its true level because they’d find it easier to create good chances than a team of equal strength who’d had a tougher start. Likewise if a team played a few easy matches in a row, you’d see their average start to rise and erroneously assume they were getting better.

What I like about the Elo ratings system is that it compares what you achieve with what you were supposed to achieve, rather than treating all matches as being equal. Every team has a “strength rating” and when two teams play each other these are compared to calculate what should happen. Afterwards, each team’s rating is modified either up or down in proportion to how they fared compared to that expectation.

For example, when England beat Switzerland in 2014, their rating going into the match was 1837 and Switzerland’s a similar 1819. England’s 2-0 win saw them gain 37 points and Switzerland lose the same number. However, when England later beat Lithuania (rated just 1440) 4-0, the two teams’ ratings only moved by 3 points, given that this was much closer to what was expected of them both.
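
For anyone who wants to see the mechanics, here’s a sketch of the textbook Elo update in Python. It is the generic version only: the `k` value of 40 is an assumption, and as far as I’m aware the football versions also adjust for things like home advantage, match importance and margin of victory, none of which are included here. That means it won’t reproduce the exact point changes quoted above, but it does show the same pattern.

```python
def expected_score(rating, opponent_rating):
    """Generic Elo expectation: 0.5 for equal teams, close to 1 for a big favourite."""
    return 1 / (1 + 10 ** ((opponent_rating - rating) / 400))


def elo_update(rating, opponent_rating, result, k=40):
    """Return both teams' new ratings after a match.

    result is 1 for a win, 0.5 for a draw and 0 for a defeat (from the first
    team's point of view). k controls how far ratings move and is an assumed
    value here.
    """
    change = k * (result - expected_score(rating, opponent_rating))
    return rating + change, opponent_rating - change


# A win over a similarly rated side moves the ratings a lot...
close_match = elo_update(1837, 1819, result=1)
# ...while a win over a much weaker side barely moves them.
mismatch = elo_update(1837, 1440, result=1)

print(close_match[0] - 1837, mismatch[0] - 1837)
```

Whatever one team gains the other loses, and the size of the change shrinks as the result gets closer to what the ratings already predicted.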

It’s worth mentioning that there’s already an excellent club version of the Elo system, but it only goes down as far as the second tier of English football.

Combining the two

Here’s how it works in practice. I apologise in advance for using the word “expected” so much:

  1. Ahead of a given fixture, each club will have an attack rating (telling you how many goals’ worth of shots they are expected to create in a typical match) and a defence rating (measuring how much they’re expected to concede) based on data from their previous matches. For most teams these values will be somewhere between 1 and 2 goals per match (the long-term average is around 1.3).
  2. Before each match, the attack and defence ratings are adjusted for home advantage and then compared to reach an expected “expected goal” tally for each team.
  3. Once the match has been played, the actual expected goal values of each team’s shots are added up and compared with the “expected” expected goal values calculated in (2). The attack and defence ratings for each team are then adjusted according to whether they have over or under-performed.
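
To make the three steps above a bit more concrete, here’s a rough Python sketch. It is not the actual E ratings formula: the way the ratings are compared (a simple average of one team’s attack and the other’s defence), the home advantage of 0.15 goals and the update weight of 0.05 are all placeholder assumptions, chosen only to illustrate the shape of the calculation. The `Club` class and function names are likewise made up for the example.

```python
from dataclasses import dataclass

HOME_ADVANTAGE = 0.15  # assumed boost (in expected goals) for the home side
UPDATE_WEIGHT = 0.05   # assumed share of the over/under-performance applied


@dataclass
class Club:
    attack: float   # expected goals created per match
    defence: float  # expected goals conceded per match (lower is better)


def pre_match_expectation(home, away):
    """Step 2: compare ratings to get an "expected" expected goal tally per team."""
    home_exp = (home.attack + away.defence) / 2 + HOME_ADVANTAGE
    away_exp = (away.attack + home.defence) / 2 - HOME_ADVANTAGE
    return home_exp, away_exp


def update_ratings(home, away, home_xg, away_xg):
    """Step 3: nudge ratings by how far each team's shots beat expectation."""
    home_exp, away_exp = pre_match_expectation(home, away)
    home_diff = home_xg - home_exp
    away_diff = away_xg - away_exp
    home.attack += UPDATE_WEIGHT * home_diff
    away.defence += UPDATE_WEIGHT * home_diff
    away.attack += UPDATE_WEIGHT * away_diff
    home.defence += UPDATE_WEIGHT * away_diff


home, away = Club(attack=1.5, defence=1.1), Club(attack=1.2, defence=1.4)
update_ratings(home, away, home_xg=2.0, away_xg=0.6)
print(home, away)
print(home.attack - home.defence)  # single "strength" number if needed
```

In this made-up match the home side creates more and allows less than expected, so its attack rating rises and its defence rating (goals conceded) falls, with the away side’s ratings moving the opposite way.
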

Why do this at all?

The point of doing this is to allow clubs to be compared over time in a way that hopefully strikes the right balance between simplicity, transparency and credibility.

What I like about them is:

  • Measuring clubs on the quality of the chances they create and allow feels “fairer” than just looking at their results – we have the league table for that, after all – and provides a way of flagging which clubs may be under or over-performing in said table.
  • It allows me to track the attacking and defensive performances of clubs separately, although I can still net them off to get a single strength rating if needed.
  • There’s a much lower chance of a change in rating being caused by the fixture schedule, as opponent strength is taken into account.
  • When I’ve back-tested this against data from prior seasons, it yields results and trends which resemble the output of a standard “expected goals” model fairly closely, so I’ve not buggered the maths up too much.

Given that “Experimental 361’s Expected Goal Elo-style Rating” is both horrible and has lots of ‘e’s in it, I’ll just be calling them “E ratings”.

Using these ratings

The motivation for doing all of this in the first place was to come up with something a bit more rigorous and explanatory to complement my scatter plots, which are powered by raw shot counts and therefore omit valuable information about where shots are coming from.

The current plan is to use them to:

  • Preview matches and rounds of fixtures in a bit more detail – knowing about each team’s attacking and defensive strength, plus which way these numbers are moving, should prove useful.
  • Review matches after they’ve been played – and the season so far – to assess how teams are performing compared to each other and to individual expectations.
  • Model and simulate matches – and the rest of the season – to try and predict what could happen in future. This system is aimed at being more explanatory than predictive, but some simple simulations using these ratings as inputs should still provide something of interest.
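
As a taste of the simulation idea, here’s one simple way it could be sketched in Python. This is an assumption on my part rather than a description of a finished method: it treats each team’s goals as a Poisson draw around their expected goal tally for the match, which is a common simplification, and the input figures of 1.6 and 1.0 are invented.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    goals, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return goals
        goals += 1


def simulate_match(home_exp, away_exp, runs=10_000, seed=1):
    """Estimate home win / draw / away win chances from expected goal tallies."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    tally = {"home": 0, "draw": 0, "away": 0}
    for _ in range(runs):
        home_goals = poisson_sample(home_exp, rng)
        away_goals = poisson_sample(away_exp, rng)
        if home_goals > away_goals:
            tally["home"] += 1
        elif home_goals < away_goals:
            tally["away"] += 1
        else:
            tally["draw"] += 1
    return {outcome: count / runs for outcome, count in tally.items()}


print(simulate_match(1.6, 1.0))
```

Running the same idea over every remaining fixture, many times over, would give a rough spread of possible final league tables.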