Update on the Soccer Pythagorean derivation
Categories: Soccer Pythagorean: Theory, Team Ranking Models
** BUMPED to the top with updates added **
I want to give an update on my efforts to derive a Pythagorean (or Pythagorean-like) formula for use in soccer. As I said in other posts, the most challenging part is coming up with the term that captures the probability of a drawn result. This is difficult when you assume that goals follow a continuous probability distribution, because the probability that teams X and Y score exactly the same number of goals is zero for a continuous distribution. I tried to work around this by assuming a Poisson distribution and working out the probability of a drawn result, but I end up with some nasty infinite sums that don't simplify to anything elementary. Also, the Poisson distribution has only one parameter, so there is nothing left to adjust once the mean is fixed:
P(i) = P(lambda,i)
where lambda is the average number of goals per match over the season.
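To make the Poisson draw term concrete, here is a minimal Python sketch that evaluates the infinite sum numerically by cutting it off at a maximum score. It assumes the two teams' goal counts are independent, and the function names, the cutoff `max_goals`, and the lambda values in the usage note are made up for illustration; this is not the derivation itself.

```python
import math


def poisson_pmf(lam, i):
    """Probability of exactly i goals for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** i / math.factorial(i)


def poisson_draw_probability(lam_x, lam_y, max_goals=30):
    """Approximate P(draw) = sum over n of P_X(n) * P_Y(n).

    Assumes the two teams score independently (an assumption of this
    sketch). The infinite sum is truncated at max_goals; the remaining
    terms are negligible for realistic scoring rates.
    """
    return sum(
        poisson_pmf(lam_x, n) * poisson_pmf(lam_y, n)
        for n in range(max_goals + 1)
    )
```

For example, `poisson_draw_probability(1.0, 1.0)` comes out at roughly 0.31. The sum converges quickly, which is why a numerical cutoff works even though the sum itself doesn't reduce to elementary functions.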
I was looking through some of the papers that I have on soccer goal distributions, and I found that some researchers use an extreme value distribution to model goal distributions during a season. As a matter of fact, I discussed one of these papers on this blog a few months ago. Extreme value distributions are flexible like Weibull distributions and can be used to handle extreme events in soccer, like your 6-0 or 13-1 results. I'm thinking that an extreme value distribution might be a better one to use. Perhaps I could also look at the probability of teams X and Y scoring goals within a continuous interval like [-0.5, 0.5] or [2.5, 3.5] (where the actual number of goals scored is at the center of the interval) in order to come up with a draw probability.
I have a midterm in my course this week, but I hope to have something to present on here soon.
(Oh, and an explanation of one of the math terms: the notations [a,b] and (a,b) in mathematics are used to represent an interval between a and b. The square brackets mean that a and b are included in the interval; the round brackets mean that a and b are not included. Infinity can never be reached, so there is always a round bracket on that end: (-∞, b] or [a, +∞). )
UPDATE (18 Oct, 9:20pm): I made a breakthrough in my derivation of a "Pythagorean" for soccer teams and leagues. I kept the Weibull distribution and solved for the probability that the goals scored by teams X and Y both fall within the same continuous interval. The actual number of goals is a discrete number that lies at the center of the interval. I got an expression for the probability of an n-score draw, where n is the number of goals, which you can sum up to whatever number of goals you wish. Unfortunately the resulting expression is much more complicated than the one that predicts a clear win, and includes some special functions that were canceled out in the first term.
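As a rough numerical illustration of the interval idea (not the closed-form expression, which is for the separate post), the sketch below assigns n goals the probability that a Weibull variable lands in [n - 0.5, n + 0.5] and sums the n-score draw terms. The location shift of -0.5, the shared shape parameter, independence between the two teams, and all function and parameter names are assumptions made for this example.

```python
import math


def weibull_cdf(x, scale, shape, loc=-0.5):
    """CDF of a Weibull distribution shifted to start at loc (assumed -0.5)."""
    if x <= loc:
        return 0.0
    return 1.0 - math.exp(-(((x - loc) / scale) ** shape))


def prob_goals(n, scale, shape, loc=-0.5):
    """Probability assigned to n goals: the mass on [n - 0.5, n + 0.5]."""
    upper = weibull_cdf(n + 0.5, scale, shape, loc)
    lower = weibull_cdf(n - 0.5, scale, shape, loc)
    return upper - lower


def weibull_draw_probability(scale_x, scale_y, shape, max_goals=20):
    """Sum of n-score draw probabilities for n = 0..max_goals.

    Assumes independent scoring and a common shape parameter for both
    teams; both are assumptions of this sketch.
    """
    return sum(
        prob_goals(n, scale_x, shape) * prob_goals(n, scale_y, shape)
        for n in range(max_goals + 1)
    )
```

With the shift at -0.5, the intervals for n = 0, 1, 2, ... tile the whole range of the distribution, so the per-score probabilities add up to one. Raising `max_goals` corresponds to summing the draw term up to whatever number of goals you wish.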
I'll write a separate post that presents the complete formula and my derivation in an attached PDF.
UPDATE #2 (19 Oct, 11:30pm): I forgot to move this post to the top. And I won't be able to write that post tonight. I had too much going on; sorry.