Scatter plots

Scatter plots are probably the graphics I’m most widely known for, and the ones I produce most often. They were among the first graphics I produced (back in early 2012) and have evolved several times since then. They currently look something like this:

If you strip away all the colour, these are simple scatter plots showing a combination of three basic measures, each with an attacking and defensive flavour:

  • The average number of shots taken or faced per match – a simple measure of how busy an attack or defence is.
  • The average number of shots per goal scored or conceded – an equally simple measure of how efficient an attack or defence is.
  • The average number of expected goals created or allowed per match – a widely popular measure of chance quality (explained here).

These can be cut and combined in various ways to summarise and compare the performance of a group of teams.
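
As a rough illustration of the three measures above (a sketch, not the code behind these graphics), the calculation from a team's running totals might look like this. The `TeamTotals` container, its field names and the dictionary keys are all hypothetical, and returning `None` when a team has no goals to divide by is my own assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TeamTotals:
    """Hypothetical container for one team's totals over a set of matches."""
    matches: int
    shots_for: int
    shots_against: int
    goals_for: int
    goals_against: int
    xg_for: float
    xg_against: float


def per_match(total: float, matches: int) -> float:
    """Average per match; assumes at least one match has been played."""
    return total / matches


def shots_per_goal(shots: int, goals: int) -> Optional[float]:
    """Shots needed per goal; None when there are no goals (assumed handling)."""
    return shots / goals if goals else None


def measures(t: TeamTotals) -> dict:
    """The three basic measures, each in an attacking and defensive flavour."""
    return {
        "shots_taken_pm": per_match(t.shots_for, t.matches),
        "shots_faced_pm": per_match(t.shots_against, t.matches),
        "shots_per_goal_scored": shots_per_goal(t.shots_for, t.goals_for),
        "shots_per_goal_conceded": shots_per_goal(t.shots_against, t.goals_against),
        "xg_created_pm": per_match(t.xg_for, t.matches),
        "xg_allowed_pm": per_match(t.xg_against, t.matches),
    }
```

Dividing season totals (rather than averaging match-by-match ratios) keeps the shots-per-goal figure defined even when a team fails to score in individual matches.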

Quadrants

The axes of each graph are centred on the median for ease of comparison, which also allows me to divide the graphic into quadrants (with thanks to Chris Anderson for suggesting the median rather than the mean). The quadrant names provide some broad categories, which are mainly intended for fun (as some of the names imply), although the further inside a quadrant a team sits, the more likely the label is to apply.

For the shots taken vs. shots faced graphic, the names are pretty self-explanatory. For example, the top left quadrant, “Quiet attack, busy defence”, contains teams that take fewer shots than average while allowing their opponents a higher than average number.
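
To make the median-centred split concrete, here is a minimal sketch of how a team could be assigned to a quadrant on the shots chart. Only “Quiet attack, busy defence” is a name quoted above; the other three labels are formed by analogy and are assumptions, as is the choice to treat a team sitting exactly on a median as “quiet”. The team names and numbers are made up.

```python
from statistics import median


def shots_quadrant(taken: float, faced: float,
                   taken_median: float, faced_median: float) -> str:
    """Label a team by comparing its per-match shots with the division medians."""
    # Assumption: a value exactly on the median counts as "quiet".
    attack = "Busy attack" if taken > taken_median else "Quiet attack"
    defence = "busy defence" if faced > faced_median else "quiet defence"
    return f"{attack}, {defence}"


# Made-up (shots taken, shots faced) per-match averages for five teams.
teams = {
    "A": (14.0, 9.0),
    "B": (9.0, 14.0),
    "C": (12.0, 12.5),
    "D": (10.0, 10.0),
    "E": (11.0, 11.0),
}

taken_median = median(t for t, _ in teams.values())
faced_median = median(f for _, f in teams.values())

labels = {
    name: shots_quadrant(t, f, taken_median, faced_median)
    for name, (t, f) in teams.items()
}
```

Using the median means that, ties aside, half the teams fall on each side of each axis, which keeps the quadrants reasonably balanced even if one or two sides are extreme outliers.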

The expected goals chart works in much the same way, except that because expected goals adjust for chance quality, “more” and “less” can be replaced with “better” and “worse” as appropriate.

For the attacking effectiveness chart, the quadrants are currently named as follows:

  • Constant threat – teams that take more shots per match than average and are also better than average at converting them.
  • Energetically wasteful – like the previous quadrant, these teams shoot more often than average, but they are below average at converting their shots.
  • Languidly clinical – the flipside of “energetically wasteful”, these teams shoot less often than average but need fewer attempts than the average team to score.
  • Ineffectual – the worst of both worlds, these teams take fewer shots than average and need more attempts than average to score each goal.
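
The four attacking categories above boil down to two yes/no questions, which could be sketched as follows. This assumes “average” means the division median (as the axes are centred on it) and that a team exactly on a median falls on the less flattering side; the function and parameter names are hypothetical.

```python
def attacking_quadrant(shots_pm: float, shots_per_goal: float,
                       median_shots_pm: float,
                       median_shots_per_goal: float) -> str:
    """Name the attacking effectiveness quadrant for one team."""
    shoots_more = shots_pm > median_shots_pm
    # Fewer shots needed per goal means better conversion.
    converts_better = shots_per_goal < median_shots_per_goal

    if shoots_more and converts_better:
        return "Constant threat"
    if shoots_more:
        return "Energetically wasteful"
    if converts_better:
        return "Languidly clinical"
    return "Ineffectual"
```

For example, with medians of 12 shots per match and 9 shots per goal, a side taking 14 shots per match but needing 11 per goal would come out as “Energetically wasteful”.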

The same divisions apply for the defensive effectiveness chart:

  • Formidable – teams that allow fewer shots per match than average and have also withstood more shots than the average side for each goal they’ve conceded.
  • Competent but busy – teams that face more shots per match than average but, like “formidable” sides, have absorbed more shots than average for each goal conceded.
  • Languidly clinical – the flipside of “competent but busy”, these teams permit opponents fewer attempts than average but their defence requires a lower-than-average number of shots to penetrate.
  • Pushovers – the worst of both worlds, these teams allow more shots than average and require fewer efforts to breach than the average side.

Stripes

The stripes are basically contours, coloured in a simple “greener = better, redder = worse” style. For each division they are based on the historical distribution of one measure per chart:

  • Shots taken vs. faced – the average of shots taken minus shots faced per match.
  • Attacking effectiveness – the average number of goals scored per match.
  • Defensive effectiveness – the average number of goals conceded per match.
  • Expected goals – the average of expected goals created minus allowed per match.
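
Put another way, each chart has a single number behind its stripes. A minimal sketch of that mapping, assuming a team's per-match averages are held in a dictionary (the chart keys and field names here are hypothetical):

```python
# One stripe measure per chart, computed from a team's per-match averages.
STRIPE_MEASURES = {
    "shots": lambda t: t["shots_taken_pm"] - t["shots_faced_pm"],
    "attack": lambda t: t["goals_scored_pm"],
    "defence": lambda t: t["goals_conceded_pm"],
    "xg": lambda t: t["xg_created_pm"] - t["xg_allowed_pm"],
}

# Made-up per-match averages for a single team.
team = {
    "shots_taken_pm": 12.5,
    "shots_faced_pm": 10.5,
    "goals_scored_pm": 1.5,
    "goals_conceded_pm": 1.0,
    "xg_created_pm": 1.5,
    "xg_allowed_pm": 1.25,
}

stripe_values = {chart: f(team) for chart, f in STRIPE_MEASURES.items()}
```

Note that for the defensive chart a lower value is better, so the colour scale runs the opposite way to the other three.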

Each stripe corresponds to a percentile band of that distribution and is labelled on either the top or right edge of the plot. I originally divided the stripes into deciles (e.g. “top 10%”, “11-20%”, “21-30%” etc.), but this produced a lot of small stripes in the middle and not much distinction around the edges, where things are usually more interesting. I’ve therefore used the top and bottom 2% as the outer categories (in testing this gave a more useful spread than the top 1%, particularly as in a 24-team division the title winner alone represents just over 4% of the teams), followed by the top and bottom 5%, 10% and 25%.
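
A minimal sketch of how those bands could be derived from a historical distribution, using Python's standard `statistics.quantiles`. The band labels (including “middle 50%” for everything between the 25% bands), the choice of the "inclusive" interpolation method and the rule that a value exactly on a boundary falls into the lower band are all assumptions for illustration, not a description of how the graphics are actually built. It also assumes higher values are better; for goals conceded the order would be reversed.

```python
from statistics import quantiles

# Percentile cut points implied by the 2%, 5%, 10% and 25% bands.
CUTS = (2, 5, 10, 25, 75, 90, 95, 98)
LABELS = (
    "bottom 2%", "bottom 5%", "bottom 10%", "bottom 25%",
    "middle 50%",  # assumed label for the unbanded middle
    "top 25%", "top 10%", "top 5%", "top 2%",
)


def band_thresholds(history):
    """Return the values at each cut-point percentile of the history."""
    pct = quantiles(history, n=100, method="inclusive")  # 99 cut points
    return [pct[p - 1] for p in CUTS]


def band(value, thresholds):
    """Label a value by counting how many thresholds it exceeds."""
    return LABELS[sum(value > t for t in thresholds)]


# Made-up history: 101 evenly spaced values from 0 to 100.
history = list(range(101))
thresholds = band_thresholds(history)
```

The arithmetic behind the 2% choice is easy to check: one team in a 24-team division is 1/24 ≈ 4.2% of the field, so a “top 1%” stripe would be narrower than a single league position.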

Limitations

These charts are intended to be simple: they don’t provide a rigorous analysis of tactics or effectiveness, but rather a broad indication of how teams are doing relative to one another, which can be used as a springboard for more detailed discussion or analysis. Simply counting shots, as I’ve done here, treats them all equally, when in reality a penalty or a shot from point-blank range has a much greater chance of going in than a 30-yard header. Being in a greener stripe here doesn’t mean a side is better full stop, just better by this measure. There are plenty of reasons why a club’s true performance might differ from its position on one of these charts, including:

  • A disproportionately high number of the shots they’re taking or facing are of a very high or very low quality.
  • If the graphic is produced during a season, then the fixtures they’ve played so far may have been unusually easy or difficult.
  • Luck also plays its part – the effect of injuries, tiredness, refereeing decisions and even just the bounce of the ball can all favour specific teams more than others in the short term.