Trinomial trees, Markov chains, and qualifying odds for the World Cup
Categories: Markov Processes, Monte Carlo Simulations
I have a couple of posts in the pipeline for my other website that explore a hypothetical merger between CONCACAF and CONMEBOL, and I started thinking about the probability of Caribbean countries qualifying for the World Cup from CONCACAF and how that might change in a larger 50-team region. (With 30 associations, the Caribbean countries would maintain the balance of power in a merged American confederation.) That got me thinking: just what is the probability of any national team qualifying for the World Cup?
An exact answer doesn't exist, but I think the answer could be approximated by looking at the problem as a trinomial tree. A trinomial tree is a computational tool for pricing options, but at this point I'm more interested in the conceptual model. A soccer team has three possible match results: win, lose, or draw. (I'm putting aside complicating factors like the away goals rule or penalty kick shootouts.) Each match has those three possible outcomes, so it's possible to visualize a team's path through a competition by the tree below:
That's after two games, and there are nine possible paths that a team can take. After three matches (say, group play at the World Cup), there are 27. In fact, there are 3^n possible paths for a team playing n games.
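To make the path counting concrete, here is a minimal sketch in Python that enumerates every branch of the tree. The names `OUTCOMES` and `enumerate_paths` are my own, made up for illustration; the only idea taken from above is that each match branches three ways.

```python
from itertools import product

# The three branches at every node of the tree.
OUTCOMES = ("W", "D", "L")

def enumerate_paths(n_matches):
    """Return every possible sequence of results over n_matches games."""
    return list(product(OUTCOMES, repeat=n_matches))

two_match_paths = enumerate_paths(2)
for path in two_match_paths:
    print("".join(path))            # WW, WD, WL, DW, ...

print(len(two_match_paths))         # 9
print(len(enumerate_paths(3)))      # 27
```

Enumerating like this is only practical for short series, since the number of paths triples with every extra game.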
So how might you use this to calculate qualification probabilities? I'm still thinking my way through the process, but you would first have to come up with the probability of winning a match against a given opponent. Perhaps you could use a ranking system (Elo, SPI, even FIFA) to derive win/loss/draw probabilities against an opponent when playing home or away. With three points for a win and one for a draw, there are certain point totals that guarantee defeat in a two-match series (0 or 1 points), as well as point totals that guarantee success (4 or 6). Point totals of 2 or 3 could result in a win by aggregate goals, the away goals rule, or the penalty kick tiebreaker, which makes any odds calculations complicated in a hurry.
If you put those tiebreakers aside for a moment, it's possible to model the series result probabilities as a Markov chain, which is a useful tool for modeling discrete processes where the state of the process at a future step depends only on the state at the current step. There are several separate Markov chains during the qualification process in CONCACAF: up to two two-match series (two for the bottom 22 teams, one for everyone else), one six-match series, and one ten-match series.
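As a rough sketch of the Markov chain idea, take the state to be a team's accumulated points, with each match moving the team to a new state that depends only on the current total and that match's result. The win/draw/loss probabilities below are invented purely for illustration (they are not derived from any ranking system), and the function names are my own.

```python
from collections import defaultdict

POINTS = {"W": 3, "D": 1, "L": 0}

def step(dist, probs):
    """Advance the points distribution by one match.

    dist maps a points total to its probability; probs maps each
    outcome ("W", "D", "L") to its probability for this match.
    """
    nxt = defaultdict(float)
    for pts, p in dist.items():
        for outcome, q in probs.items():
            nxt[pts + POINTS[outcome]] += p * q
    return dict(nxt)

def series_distribution(match_probs):
    """Points distribution after playing each match in match_probs."""
    dist = {0: 1.0}  # every team starts the series on zero points
    for probs in match_probs:
        dist = step(dist, probs)
    return dist

# Hypothetical probabilities for the home and away legs (assumptions).
home = {"W": 0.5, "D": 0.3, "L": 0.2}
away = {"W": 0.3, "D": 0.3, "L": 0.4}

dist = series_distribution([home, away])
p_through = dist[4] + dist[6]   # totals that guarantee success
p_out = dist[0] + dist[1]       # totals that guarantee defeat
p_tiebreak = dist[2] + dist[3]  # decided by goals, away goals, or penalties

for pts in sorted(dist):
    print(pts, round(dist[pts], 4))
```

The same `series_distribution` call works for the six-match and ten-match stages by passing a longer list of per-match probabilities, although turning a points total into a qualification outcome there depends on what the other teams in the group do, which this sketch ignores.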
As I said, this can get complicated very quickly, and I know that I need to flesh out all the details. It has the makings of a very intriguing problem — as if I don't have enough to do already.