What is a good statistic?
Categories: Analytics Perspectives, High-Level Discussions
Back in November 2009 there was a discussion of soccer statistics on the Sounders At Heart website, and this site was cited, among others. I followed some of the thread, and it brought me back to a question I have probably only mentioned briefly: what is a good statistic in soccer?
Perhaps we can answer the question by asking what is not a good statistic in soccer. In my opinion, it’s a statistic that is useless; that is, one that reveals nothing illuminating about the match, its trends, or the comparative abilities of the players or teams involved.
One example would be most ranking systems for national teams and especially club teams. Perhaps you might be able to say who the top two or three sides are, but beyond that it’s just piles of teams that are closely matched with one another. Other examples would be the time of possession per team, the average distance run, and the passes made/completed. I’m not sure how any of these measures correlates to the final score; average distance can tell you how hard a player is working, but not how effectively he is playing, at least not directly. I think the passing statistic can be made useful, but as it stands right now it does not differentiate a completed pass that was the wrong one to make from an incomplete pass that would have created an excellent opportunity on goal had the receiving player controlled the ball properly.
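One way to test whether a field statistic tracks the final score would be to compute a correlation coefficient between the two over a set of matches. What follows is only a minimal sketch in Python, not a finished analysis: the function name `pearson_r` is my own, and the sample numbers are made up purely for illustration and are not real match data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, invented for illustration only (NOT real match data):
# one team's possession percentage and its goal difference in six matches.
possession_pct = [62, 48, 55, 41, 70, 50]
goal_difference = [1, 0, -1, 2, 0, 1]

r = pearson_r(possession_pct, goal_difference)
print(f"possession vs. goal difference: r = {r:.2f}")
```

A value of r near +1 or -1 would suggest the statistic moves with the result, while a value near 0 would suggest it says little about it. Even a strong correlation would not show that possession causes goals, and a real check would need far more matches than six.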
A poor statistic can be not just useless but also counterproductive, obscuring more knowledge and understanding than it provides. The various ranking systems all too often fall into this category; time of possession could be counterproductive as well. I’ve heard some fans describe “Team X has never lost when Player Y has scored” as a useless stat. I prefer to think of it as a useless factoid rather than any kind of measurement derived from raw data.
By contrast, a good statistic should be useful, add to knowledge and understanding, and be relevant to the issue at hand. A good statistic, in my mind, should contribute toward answering the question “Who has contributed the most toward the final outcome of the match?” You can rephrase the question to ask who has done this within a team, or over a season or a competition. I keep coming back to the notion of “goal value”; I believe that such a metric, if developed successfully, would go a long way toward determining which player, by his actions, contributed the most to his side’s success. Parallel metrics might have to be developed for goalkeepers and perhaps defenders, with some way of isolating the goalkeeper’s work from that of his defensive unit.
One thing that sets statistics in soccer apart from those in other sports is that these “bad” statistics are actually useful in some situations. When the result is emphatic, the possession and offensive statistics typically point in one direction. But even for 3-0 or 4-1 results the statistics may be misleading, and for closer results the meaning of the field statistics becomes even more muddled. It speaks to a lack of precision in the statistics that are used in the game.
So those are some of my thoughts on what is — and is not — a good statistic in soccer. It’s a question that I’m sure I will revisit many times in the future.