What I’m learning from the Soccer Pythagorean
Categories: Soccer Pythagorean: Theory, Team Performance
I've had an opportunity to apply the soccer Pythagorean to various leagues and competitions, and I'm noticing some characteristics of the formula.
First of all, the Pythagorean exponent varies significantly from league to league. It may also vary from season to season within a league, but I haven't checked that yet. In every case so far the exponent has stayed between 1.3 and 1.8, which indicates that the teams' goal distributions are skewed toward the smaller numbers. That result makes sense: teams don't score five or more goals very often in soccer.
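To get a feel for how much the exponent matters, here is a minimal sketch in Python. It uses the classic baseball-style Pythagorean form, GF^x / (GF^x + GA^x), which is an assumption on my part for illustration: the soccer version also has to account for draws and is not spelled out in this post. The function name and the goal totals are hypothetical.

```python
def pythagorean_share(goals_for, goals_against, exponent):
    """Classic Pythagorean expectation: GF^x / (GF^x + GA^x).

    Assumption: this is the baseball-style form, used only to show
    the effect of the exponent. It ignores draws entirely.
    """
    if goals_for < 0 or goals_against < 0:
        raise ValueError("goal totals must be non-negative")
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    if gf + ga == 0:
        raise ValueError("at least one goal total must be positive")
    return gf / (gf + ga)


# Hypothetical season totals: 60 goals scored, 40 allowed.
# Compare the two ends of the 1.3 to 1.8 range seen so far.
for x in (1.3, 1.8):
    print(x, round(pythagorean_share(60, 40, x), 3))
```

For a team that outscores its opponents, a larger exponent pushes the expected share further above one half, so the spread between 1.3 and 1.8 is not a cosmetic difference.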
Second, the soccer Pythagorean consistently overestimates the number of wins and draws in a season. I have yet to see a team finish above its Pythagorean point estimate. In general, the formula overestimates the point total by 12-16 points, which is four or five wins' worth at three points per win and happens to be the stated accuracy of the baseball Pythagorean. I've also applied the formula to the World Cup qualifying tournaments in CONCACAF and CONMEBOL, and while I still see the same overestimation of points, the point differences are much smaller.
Third, the soccer Pythagorean estimates the relative places of the teams very well. I think the ability of the Pythagorean to estimate final placement is perhaps the best measure of a team's ability to meet statistical expectations. My guess is that, for predicting over/underperformance in a season, the Pythagorean is only useful if it is calculated at the halfway point and compared with the results at the end of the season. That test would be more difficult to carry out because I would need the goal scoring data up to the halfway point of the season, and that would require access to the match scores for each round of the league.
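If the round-by-round match scores were available, collecting the halfway-point goal totals would be straightforward. The sketch below assumes a hypothetical layout of (round, home team, away team, home goals, away goals) tuples; the function name, the team names, and the scores are all made up for illustration.

```python
from collections import defaultdict


def goals_through_round(matches, last_round):
    """Sum goals for/against per team over matches up to last_round.

    `matches` is an iterable of
    (round_no, home, away, home_goals, away_goals) tuples.
    This layout is an assumption, not a real data feed format.
    """
    totals = defaultdict(lambda: [0, 0])  # team -> [goals_for, goals_against]
    for round_no, home, away, home_goals, away_goals in matches:
        if round_no > last_round:
            continue  # skip matches played after the cutoff round
        totals[home][0] += home_goals
        totals[home][1] += away_goals
        totals[away][0] += away_goals
        totals[away][1] += home_goals
    return {team: tuple(t) for team, t in totals.items()}


# Made-up scores for two hypothetical teams over three rounds.
sample = [
    (1, "A", "B", 2, 1),
    (2, "B", "A", 0, 0),
    (3, "A", "B", 1, 3),
]
halfway = goals_through_round(sample, last_round=2)
print(halfway)
```

The per-team totals at the cutoff could then be fed into the Pythagorean and compared against the final table.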
It looks like I'll have to think of a notion of "second-order" wins/draws in order to make the Pythagorean more accurate. It ends up becoming a correction to the goals scored and allowed with respect to the expected goals scored and allowed in a league. I need to give some more thought to how you would describe that mathematically.
Thanks to Dave, Eric, and the others who pointed me to the Pythagorean formula; it's been neat to think of the math foundations behind a deceptively simple formula, which turned out to be a little too simple for soccer. Now to make it a little less intimidating, but I think we're well past that point!