Assessing the Projections: 2015 Major League Soccer regular season

The 2015 Major League Soccer regular season concluded last Sunday.  It’s time to engage in that exercise in humility in which I look back at projections made at the start of the season and assess them.

Here is what was projected at the beginning of the season – first by conference:

[Image: MLS_EC2015_Projections (projected 2015 Eastern Conference table)]

[Image: MLS_WC2015_Projections (projected 2015 Western Conference table)]

And then the single table:

[Image: MLS2015_Overall_Projections (projected 2015 overall league table)]

Here is how the final league table finished at conference level, with the Pythagorean expectation of each team's record thrown in:

Eastern Conference:

                                    Standard Table                Pythagorean
Team                        GP   W   D   L  GF  GA   GD  Pts     W   D   L  Pts    Δ
New York Red Bulls          34  18   6  10  62  43   19   60    17   8   9   59   +1
Columbus Crew SC            34  15   8  11  58  53    5   53    14   8  12   50   +3
Montreal Impact             34  15   6  13  48  44    4   51    13   9  12   48   +3
DC United                   34  15   6  13  43  45   -2   51    12   9  13   45   +6
New England Revolution      34  14   8  12  48  47    1   50    13   9  12   48   +2
Toronto FC                  34  15   4  15  58  58    0   49    13   8  13   47   +2
Orlando City SC             34  12   8  14  46  56  -10   44    11   8  15   41   +3
New York City FC            34  10   7  17  49  58   -9   37    11   8  15   41   -4
Philadelphia Union          34  10   7  17  42  55  -13   37    10   8  16   38   -1
Chicago Fire                34   8   6  20  43  58  -15   30    10   8  16   38   -8

Western Conference:

                                    Standard Table                Pythagorean
Team                        GP   W   D   L  GF  GA   GD  Pts     W   D   L  Pts    Δ
FC Dallas                   34  18   6  10  52  39   13   60    15   9  10   54   +6
Vancouver Whitecaps FC      34  16   5  13  45  36    9   53    14  10  10   52   +1
Portland Timbers            34  15   8  11  41  39    2   53    13  10  11   49   +4
Seattle Sounders FC         34  15   6  13  44  36    8   51    14  10  10   52   -1
LA Galaxy                   34  14   9  11  56  46   10   51    15   8  11   53   -2
Sporting Kansas City        34  14   9  11  48  45    3   51    13   9  12   48   +3
San Jose Earthquakes        34  13   8  13  41  39    2   47    13  10  11   49   -2
Houston Dynamo              34  11   9  14  42  49   -7   42    11   9  14   42    0
Real Salt Lake              34  11   8  15  38  48  -10   41    10   9  15   39   +2
Colorado Rapids             34   9  10  15  33  43  -10   37    10  10  14   40   -3

The final single table looks like this:

                                    Standard Table                Pythagorean
Team                        GP   W   D   L  GF  GA   GD  Pts     W   D   L  Pts    Δ
New York Red Bulls          34  18   6  10  62  43   19   60    17   8   9   59   +1
FC Dallas                   34  18   6  10  52  39   13   60    15   9  10   54   +6
Vancouver Whitecaps FC      34  16   5  13  45  36    9   53    14  10  10   52   +1
Columbus Crew SC            34  15   8  11  58  53    5   53    14   8  12   50   +3
Portland Timbers            34  15   8  11  41  39    2   53    13  10  11   49   +4
Seattle Sounders FC         34  15   6  13  44  36    8   51    14  10  10   52   -1
Montreal Impact             34  15   6  13  48  44    4   51    13   9  12   48   +3
DC United                   34  15   6  13  43  45   -2   51    12   9  13   45   +6
LA Galaxy                   34  14   9  11  56  46   10   51    15   8  11   53   -2
Sporting Kansas City        34  14   9  11  48  45    3   51    13   9  12   48   +3
New England Revolution      34  14   8  12  48  47    1   50    13   9  12   48   +2
Toronto FC                  34  15   4  15  58  58    0   49    13   8  13   47   +2
San Jose Earthquakes        34  13   8  13  41  39    2   47    13  10  11   49   -2
Orlando City SC             34  12   8  14  46  56  -10   44    11   8  15   41   +3
Houston Dynamo              34  11   9  14  42  49   -7   42    11   9  14   42    0
Real Salt Lake              34  11   8  15  38  48  -10   41    10   9  15   39   +2
New York City FC            34  10   7  17  49  58   -9   37    11   8  15   41   -4
Philadelphia Union          34  10   7  17  42  55  -13   37    10   8  16   38   -1
Colorado Rapids             34   9  10  15  33  43  -10   37    10  10  14   40   -3
Chicago Fire                34   8   6  20  43  58  -15   30    10   8  16   38   -8

When I ran the numbers on this season's collection of projections, I started to feel bad; then I compared them to last year's performance and felt slightly better, but still depressed.  Preseason projections are always going to be somewhat unreliable when it comes to point totals, but I do believe that a sufficiently good model contains some nugget of reality about each team's chances.  That said, small differences in the projections manifested themselves as significant deviations from reality.

As was the case last season, the current projection model does a good job of identifying the teams that make it to the MLS Cup Playoffs: four of the six playoff teams in the Eastern Conference, and five of the six in the West.  To be fair, it's a little misleading to look at accuracy alone, so here is a confusion matrix of the projection model's performance:

                         Predicted
                         Playoffs   No Playoffs
Actual   Playoffs            9           3
         No Playoffs         3           5

If we use the projection model to predict, or, in machine learning language, classify who will make the playoffs (not the original intent, but for the sake of argument), the accuracy rate of this season's model is 70% (14 of 20 correct: 9 playoff teams and 5 non-playoff teams).  Because the model always classifies exactly 12 teams as playoff teams, the recall and precision rates are identical: 75% (9 of 12).  Overall, the model's playoff prediction error is a little better than blindly predicting that every team makes the playoffs (30% versus 40%), but not by much.
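For transparency, here is a minimal sketch of how those figures fall out of the matrix above; the variable names are mine, not anything from the projection code:

```python
# Playoff classification metrics from the 2015 confusion matrix above.
tp, fn = 9, 3   # actual playoff teams: classified correctly / incorrectly
fp, tn = 3, 5   # actual non-playoff teams: classified incorrectly / correctly

total = tp + fn + fp + tn               # 20 MLS teams
accuracy = (tp + tn) / total            # 14/20 = 0.70

# The model always sends exactly 12 teams to the playoffs, so fp == fn
# and precision equals recall.
precision = tp / (tp + fp)              # 9/12 = 0.75
recall = tp / (tp + fn)                 # 9/12 = 0.75

# Baseline: blindly classify all 20 teams as playoff teams.
baseline_error = 1 - (tp + fn) / total  # 8/20 = 0.40, vs the model's 0.30
```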

If we use the projection model to predict points and places, it does a less impressive job.  The RMSE of final league position is about 40% worse than last season's model, but the RMSE of final point totals is about the same as last season's (8.3126 versus 8.3161).  The preseason model predicted a closely contested regular season, with just nine points separating first place from 13th, which meant that any deviation in points would produce a significant rise or fall in the league table.  That result comes down to the quality of the expected goals that are fed into the Pythagorean expectation to project total points.
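For reference, RMSE here is just the root mean square error taken over the 20 teams' projected and actual values.  A minimal sketch (the commented-out inputs are hypothetical stand-ins, not the actual projections):

```python
import math

def rmse(predicted, actual):
    """Root mean square error over paired team-level values."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

# Hypothetical usage with projected vs actual point totals for all 20 teams:
# projected_pts = [54, 52, 51, ...]   # preseason projections
# actual_pts    = [60, 60, 53, ...]   # final table
# rmse(projected_pts, actual_pts)     # -> about 8.31 for this season's model
```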

There are always projections that are spot on, and some that are true clangers.  The biggest misses in a positive direction were New York Red Bulls and Montreal Impact, who performed at least 12 points ahead of expectations.  Both teams posted offensive and defensive goal statistics that ran well ahead of expectations, and Montreal received a huge boost from the summer signing of Didier Drogba.  The biggest misses in a negative direction were Philadelphia Union and Chicago Fire, who finished 11 and 23 points below expectations, respectively; both sides had the worst combined offensive and defensive goal performance relative to preseason expectations.  Little wonder that the Union made a big change in the front office; the Fire are expected to do the same.  The few projections that were spot on were Columbus Crew SC, DC United, Orlando City SC, and Houston Dynamo.  Finally, it's worth noting that of the teams that performed at least five goals better in defense than expected, only the San Jose Earthquakes missed the playoffs.  Sporting Kansas City, who earned the final Western playoff spot at the Quakes' expense, allowed one more goal than expected preseason.  (But again, did the teams really perform that well, or did it merely appear that way thanks to a biased expected goals model?)

Let's assume that we had a model that projected each team's expected goals perfectly.  The Pythagorean expectation would then look like the last five columns in the final tables above.  In this context, the point totals represent the expected points of an average team with identical goal statistics.  The RMSEs of the final points and positions are greatly reduced (3.44 points and 2.43 places, respectively), and at the level of individual teams the deviation between prediction and reality is much smaller.  Nevertheless, you can still identify some teams that appear to have over- or under-performed their expectations.  DC United and FC Dallas had Pythagorean residuals of +6, which would have made a difference in United's position in the table but not much in their playoff destiny (perhaps an away match in the knockout round).  At the other end of the scale, Chicago's -8 Pythagorean residual meant that when it rained, it poured.
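For readers unfamiliar with the metric, a Pythagorean expectation converts a team's goals scored and allowed into an expected points total.  The sketch below uses a generic single-exponent form with a crude fixed-share draw adjustment; the exponent, the draw handling, and the functional form of the model actually used in these posts may well differ, so treat it purely as an illustration:

```python
def pythagorean_points(gf, ga, games=34, exponent=1.7, draw_rate=0.25):
    """Rough Pythagorean point expectation for a 34-match MLS season.

    exponent and draw_rate are illustrative assumptions, not the values
    used in the actual projection model.
    """
    win_share = gf ** exponent / (gf ** exponent + ga ** exponent)
    draws = games * draw_rate            # assume a fixed share of draws
    wins = (games - draws) * win_share   # split the remainder by win share
    return round(3 * wins + draws)

# Example: New York Red Bulls, 62 GF / 43 GA over 34 matches.
pythagorean_points(62, 43)   # -> 58 here; the table's Pythagorean model gives 59
```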

If we make another confusion matrix for the Pythagorean expectation (perfect expected goals model), we get the following result:

                         Predicted
                         Playoffs   No Playoffs
Actual   Playoffs           11           1
         No Playoffs         1           7

This gives an accuracy rate of 90% (18/20) and recall/precision of 91.7% (11/12).  That is fantastic, of course, but it's much easier to make accurate predictions with perfect data.

Statistical Summary:

Pred Goals   Actual Goals   % Diff Goals   RMSE Pts   RMSE GF   RMSE GA   RMSE Pos
       938            937         -0.10%       8.31      7.88      7.63       6.89

This season's projection will hopefully be the last of these league projections as currently designed.  I'm a relative newcomer to predictive analytics in football, and I've viewed these attempts at league projection as a learning experience and a bit of a challenge.  I've definitely learned a lot, chief among the lessons being the need to have ownership of the expected (or projected) goal models and to assess the uncertainty in these expectations and projections more systematically and quantitatively.  I don't know when the new version of the league projection will be ready, but the plan is to develop expected goal models that are a bit less opaque and to deploy uncertainty bounds around the point projections.  In other words, go Bayesian.

 
