Recognizing the limits of soccermetrics

I had meant to comment on Tim Vickery's well-deserved ridicule of the latest IFFHS rankings, but The Run of Play blog posted some comments of its own and provided a link to Vickery's post. The authors divide the current state of soccer statistics into two categories: (1) those that are useless and downright counterproductive (the 'rankings-ism' that Vickery pillories), and (2) those that aren't good enough because of their current lack of sophistication and completeness.

I agree with both parts. I have yet to meet a rankings system in soccer that I liked; the ones I have seen range from mediocre to laughable. The coefficient systems are a little better because they are limited in scope (they're trying to allocate slots for tournaments, not say who the best team is), but I'm not going to claim that they're perfect. And that dovetails nicely with the second point.

I believe that those of us engaged in statistical analysis for soccer should be upfront about the limitations of such an approach as a route to a complete understanding of the game. One example that Vickery gave was statisticians labeling a pass "good" or "bad" simply according to whether it reached the intended target. A "good" pass may turn out to be "bad" because it was sent to the wrong location on the field and the team ultimately lost possession. A "bad" pass could have been a "good" pass had the intended player done a little better (e.g. had he continued his run or stayed onside). I think that a sophisticated statistical analysis can capture the first event, but I'm not sure that it can capture the second. How do you capture what could have been? Maybe someone a lot smarter than me has an idea, but at this time I don't know. I also don't believe that scorelines can be predicted with any kind of precision, although that should be obvious to just about anyone. (It does appear that the distribution of score results follows one of two well-known statistical distributions, which I think is fascinating.)
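To make the distinction concrete, here is a minimal sketch in Python of how a pass might be labeled by its outcome rather than by completion alone. Everything in it is an assumption made for illustration: the `PassEvent` record, its two fields, and the label strings are hypothetical and do not come from Vickery's post or from any real data provider.

```python
from dataclasses import dataclass


@dataclass
class PassEvent:
    """Hypothetical pass record; the field names are illustrative assumptions."""
    reached_target: bool   # did the ball reach the intended teammate?
    possession_kept: bool  # did the team still have the ball shortly afterwards?


def label_pass(event: PassEvent) -> str:
    """Label a pass by what happened next, not only by whether it arrived."""
    if event.reached_target and event.possession_kept:
        return "completed, possession kept"
    if event.reached_target:
        # The first case above: on target, but it led to a turnover.
        return "completed, possession lost"
    # The second case above: the record says the pass failed, but not whether
    # a better run or an onside receiver would have rescued it.
    return "incomplete, cause unknown"


passes = [
    PassEvent(reached_target=True, possession_kept=True),
    PassEvent(reached_target=True, possession_kept=False),
    PassEvent(reached_target=False, possession_kept=False),
]
labels = [label_pass(p) for p in passes]
```

The last branch is the honest one: with only these two fields, nothing in the data says what could have been, and that is exactly the gap I don't know how to close.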

I've given a number of technical presentations in the past on my research (referring to my day jobs here), and listened to many more. I know that every new algorithm or heuristic that is presented is supposed to be the bee's knees, and will enable all sorts of new capabilities not seen before, but it would be really nice if researchers were frank about the limitations of what they have developed, and about the simplifications that made the development possible. And those simplifications and limitations DO exist. Such frankness may make it a little more difficult to get papers accepted to conferences or to secure funding, but it would go a long way toward maintaining credibility with the public. I hope to be upfront about the limitations of the algorithms that I present on this site.

A very common critique of statistical analysis in soccer is that the Beautiful Game does not lend itself to being reduced to statistics, and that such a reduction would diminish its aesthetic appeal. An excellent retort is given in the Run of Play post: "Poetry and humanity aren't going to be saved by ignorance." At the same time, however, not everything about this wonderful game is going to be revealed by numbers. And that's okay. A little mystery isn't bad.
