Pre-season predictions: how did the model do?

Shortly after the regular EFL season had finished I compiled a set of club-by-club graphics tracking how each club’s league position changed over time compared with my model’s predictions. When doing so I noticed that even the pre-season predictions (made without any knowledge beyond last season’s performances) were surprisingly accurate, so I thought I’d try to work out just how good they were.

I didn’t have the foresight to track bookmakers’ odds during the season, but then remembered that I’d averaged and ranked a load of them shortly after the fixtures were announced in order to compile the fixture difficulty matrices for the Championship, League 1 and League 2. While the odds may well have moved before the start of the season, even at that early stage they would have reflected more information than the model’s ratings, so this doesn’t feel like an unfair comparison.
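Roughly speaking, that averaging and ranking step looks something like the sketch below, with made-up clubs and odds standing in for the real data: convert each bookmaker’s outright price into an implied probability, average those per club and sort from strongest to weakest.

```python
# Illustrative only: made-up clubs and odds rather than the real pre-season data.

def implied_probability(decimal_odds):
    """Convert decimal odds into an implied win probability (ignoring the bookmaker's margin)."""
    return 1.0 / decimal_odds

# Hypothetical outright "to win the league" odds from three bookmakers
outright_odds = {
    "Club A": [3.50, 3.75, 3.60],
    "Club B": [5.00, 4.50, 5.50],
    "Club C": [11.0, 10.0, 12.0],
    "Club D": [34.0, 29.0, 41.0],
}

# Average the implied probabilities for each club across the bookmakers...
average_strength = {
    club: sum(implied_probability(price) for price in prices) / len(prices)
    for club, prices in outright_odds.items()
}

# ...then sort from strongest to weakest to get an odds-implied finishing order
ranking = sorted(average_strength, key=average_strength.get, reverse=True)
for position, club in enumerate(ranking, start=1):
    print(position, club, round(average_strength[club], 3))
```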

What I’ve done below is to compare each club’s actual league finish against both what I predicted and what the average odds “predicted” in pre-season to see where the differences were:

The Championship is definitely the most unpredictable of the three EFL divisions. As the bold numbers at the bottom show, the average difference between each club’s final league position and where the model expected them to finish was 3.7 places, although the odds were much further out with an average “error” of 5.2.
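These “average error” figures (here and for the other two divisions) are simply the mean of the absolute differences between each club’s predicted and actual finishing position, as in this minimal sketch with made-up clubs and positions:

```python
# Mean absolute error between predicted and actual league positions.
# The clubs and positions here are made up purely to show the calculation.

predicted_position = {"Club A": 1, "Club B": 4, "Club C": 10, "Club D": 18}
actual_position    = {"Club A": 2, "Club B": 1, "Club C": 15, "Club D": 17}

errors = [abs(predicted_position[club] - actual_position[club]) for club in predicted_position]
average_error = sum(errors) / len(errors)

print(round(average_error, 1))  # 2.5 here: each prediction was two to three places out on average
```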

Both got Newcastle’s triumph and Rotherham’s relegation correct, while the model was right to be more optimistic about Brighton, Sheffield Wednesday, Huddersfield and Fulham. However, it was more surprised than the odds were by Reading’s achievements, and was too optimistic about Bristol City and Wolves.

The model’s pre-season predictions performed impressively in League 1, with an average error of just 2.8 places compared to the bookmakers’ 4.8. Both had the same predicted top three of Sheffield United, Bradford and Millwall, but the model was correct to be more optimistic in its assessment of Scunthorpe and Fleetwood.

Both were surprised by Southend, who turned a poor start around to push for a play-off finish, but Rochdale’s top-half finish and Coventry’s struggles were better anticipated by the model based on last season’s showings.

The model performed marginally better still in League 2, with an average prediction error of 2.7 league places, completing a clean sweep against the early bookmakers’ odds, although the odds themselves were closer to the mark here than in League 1. Both guessed three of the eventual top four correctly: Portsmouth and Doncaster, plus one of Luton and Plymouth apiece.

Both failed to predict a top-half finish for Stevenage or a relegation battle for Cheltenham, but the model was less surprised by Exeter’s strong performances and Leyton Orient’s struggles. However, its faith in Accrington and its poor opinion of Notts County were both misplaced (although the latter looked accurate until Kevin Nolan was parachuted in and worked his magic).

Summary

I was pleased to see that the ratings model was more accurate overall than the average bookmakers’ odds taken early in pre-season, particularly as it’s only powered by match information. However, I suspect that the average differences would have been closer if I’d captured the odds on the eve of the season, when the prices would surely have moved to reflect a near-complete set of transfers plus performances in pre-season friendlies.

Edit: The other thing to bear in mind is that the model being more “accurate” here doesn’t mean it has beaten the market. All I’ve done is rank the pre-season odds – I wouldn’t consider betting using my model’s predictions until I’d compared them against the odds themselves. The margin that bookies typically apply to outright bets like this could well be big enough to cancel out any accuracy advantage enjoyed by the model.
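To give a rough sense of that margin: summing the implied probabilities across every outcome in an outright market gives a figure above 1, and the excess (the “overround”) is the built-in edge any model would need to overcome. A quick sketch with made-up odds:

```python
# The bookmaker's margin ("overround") on an outright market: implied probabilities
# summed across every possible outcome come to more than 1, and the excess is the margin.
# These decimal odds are made up and cover a small hypothetical market.

outright_odds = [2.0, 3.2, 4.5, 9.0, 13.0]  # one price for each possible winner

implied_probabilities = [1.0 / price for price in outright_odds]
overround = sum(implied_probabilities) - 1.0

print(f"Overround: {overround:.1%}")  # roughly 22% for these made-up prices
```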

I’ll try to remember to capture the average odds a bit closer to the start of the season next time around, but either way I’m happy with how well the model performed and genuinely surprised by how well it did compared to “the market”.