Goal parsing tool for soccer results matrix
You might have seen what is typically known in the UK as a "results matrix": a grid of every result in a league season, with the home teams listed down the side, the away teams across the top, and each cell holding the scoreline of one match. Here is one example for the Premier League; here is another for La Liga. It is an elegant and compact way to present results in leagues where every team plays every other team twice, once at home and once away. However, it makes collecting each team's goalscoring data for a season a tedious job.
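To make the layout concrete, here is a small Python illustration (not the Perl script discussed below) of how a results matrix maps to data. The row gives the home team, the column gives the away team, and the cell gives the score. The three team names and all of the scores are invented for illustration.

```python
# Hypothetical three-team league; the names and scores are invented.
# Convention: row = home team, column = away team, "X-X" on the diagonal.
TEAMS = ["TeamA", "TeamB", "TeamC"]
MATRIX = [
    ["X-X", "2-1", "0-0"],
    ["1-1", "X-X", "3-2"],
    ["0-2", "1-0", "X-X"],
]


def result(home, away):
    """Return (home_goals, away_goals) for the match `home` vs `away`."""
    i, j = TEAMS.index(home), TEAMS.index(away)
    if i == j:
        raise ValueError("a team does not play itself")
    home_goals, away_goals = MATRIX[i][j].split("-")
    return int(home_goals), int(away_goals)


print(result("TeamB", "TeamC"))  # (3, 2): TeamB beat TeamC 3-2 at home
print(result("TeamC", "TeamB"))  # (1, 0): the reverse fixture
```

Each pair of teams therefore appears twice, once on each side of the diagonal.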
To this end, I've written a script that converts a results matrix into per-team goalscoring data in columnar form. In that format I can feed the data directly into my mathematical and statistical packages to examine the distribution of goals and compute summary statistics. I doubt that I'm the first person to do something like this, but I haven't seen similar code published elsewhere. I'm also sure I'm not the only person who would find such a script useful, so I'm sharing it here.
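Once a team's goals sit in a single column, summary statistics take only a few lines. The sketch below uses Python's standard library rather than the statistical packages mentioned above. It builds a frequency table of goals per match, along with the mean and the population variance. Comparing the mean and variance is a common first check of whether goal counts look roughly Poisson-distributed. The goal counts are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance

# Hypothetical column of goals scored by one team, one entry per match.
goals = [2, 0, 1, 3, 1, 1, 0, 2]


def summarize(goals):
    """Return a goals-per-match frequency table, the mean, and the population variance."""
    freq = Counter(goals)
    # Include every count from 0 up to the maximum, even if it never occurred.
    table = {k: freq[k] for k in range(max(goals) + 1)}
    return table, mean(goals), pvariance(goals)


table, avg, var = summarize(goals)
print(table)     # {0: 2, 1: 3, 2: 2, 3: 1}
print(avg, var)  # 1.25 0.9375
```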
The script is called ParseMatrix and is written in Perl. It is best run from the command line with two arguments:
./ParseMatrix <matrix.file> <team.file>
The matrix file contains only the match results, with no team names. Each cell on the diagonal, where a team would meet itself, must hold a placeholder instead of a scoreline (I use X-X, but any non-numeric characters will do). The team file is a single-column list of the league's team names, in the same order as the rows of the matrix. Team names may not contain spaces, so a name like "Manchester United" must be written as "Manchester_United".
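The following Python sketch shows how the two input files might be read and checked. It is not the ParseMatrix code. It assumes that scorelines are written as `<home>-<away>` with no internal spaces and that cells are separated by whitespace. The names `read_teams`, `parse_cell` and `read_matrix` are hypothetical. Any non-numeric cell is treated as a placeholder and stored as `None`.

```python
import re

# Assumption: a scoreline is "<home goals>-<away goals>", e.g. "2-1".
SCORE = re.compile(r"^(\d+)-(\d+)$")


def read_teams(lines):
    """Return team names, one per non-blank line; names may not contain spaces."""
    teams = []
    for line in lines:
        name = line.strip()
        if not name:
            continue
        if len(name.split()) != 1:
            raise ValueError(f"team name contains a space: {name!r}")
        teams.append(name)
    return teams


def parse_cell(cell):
    """Return (home_goals, away_goals), or None for a non-numeric placeholder."""
    m = SCORE.match(cell)
    return (int(m.group(1)), int(m.group(2))) if m else None


def read_matrix(lines, n_teams):
    """Parse an n x n results matrix (rows = home teams, columns = away teams)."""
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) != n_teams or any(len(row) != n_teams for row in rows):
        raise ValueError(f"expected a {n_teams} x {n_teams} matrix")
    matrix = [[parse_cell(cell) for cell in row] for row in rows]
    for i in range(n_teams):
        if matrix[i][i] is not None:
            raise ValueError(f"row {i + 1}: diagonal cell must be a placeholder")
    return matrix
```

With real files, you would pass open file objects, e.g. `read_teams(open(team_path))` followed by `read_matrix(open(matrix_path), len(teams))`, where the two paths are whatever you give on the command line.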
The script reads the matrix into a two-dimensional array and, for each team, compiles the goals scored and allowed. Home matches are read across the team's row and away matches down the team's column. The script writes each team's data to an output file named Goals_<TeamName>.dat, though you can of course change the filename to anything you wish.
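The compilation step can be sketched in Python as follows. This is an illustration of the row/column logic, not the Perl implementation. It assumes the matrix has already been parsed into a list of rows in which each played match is a `(home_goals, away_goals)` tuple and each placeholder is `None`. The function names are hypothetical. The tab-separated layout of each Goals_<TeamName>.dat file (a venue flag H or A, then goals scored, then goals allowed) is also an assumption made for illustration.

```python
from pathlib import Path


def team_goals(matrix, i):
    """Return (venue, scored, allowed) tuples for team i.

    matrix[h][a] is (home_goals, away_goals), or None for a placeholder.
    Home matches are read across row i; away matches down column i.
    """
    n = len(matrix)
    rows = []
    for j in range(n):  # across the row: team i at home
        cell = matrix[i][j]
        if j != i and cell is not None:
            rows.append(("H", cell[0], cell[1]))
    for j in range(n):  # down the column: team i away
        cell = matrix[j][i]
        if j != i and cell is not None:
            rows.append(("A", cell[1], cell[0]))
    return rows


def write_goal_files(teams, matrix, outdir="."):
    """Write one tab-separated Goals_<TeamName>.dat file per team."""
    for i, name in enumerate(teams):
        path = Path(outdir) / f"Goals_{name}.dat"
        with path.open("w") as fh:
            for venue, scored, allowed in team_goals(matrix, i):
                fh.write(f"{venue}\t{scored}\t{allowed}\n")


# Hypothetical three-team example.
example = [
    [None, (2, 1), (0, 0)],
    [(1, 1), None, (3, 2)],
    [(0, 2), (1, 0), None],
]
for venue, scored, allowed in team_goals(example, 0):
    print(venue, scored, allowed)
# H 2 1
# H 0 0
# A 1 1
# A 2 0
```

Calling `write_goal_files(teams, matrix)` would then write one file per team into the current directory. Note that for away matches the two numbers in each cell are swapped, because the away team's goals are the second number in the scoreline.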
At present the script works only for completed seasons, or for results matrices in which every cell without a result holds a placeholder. You do have to insert the placeholders by hand; I haven't gotten around to automating that step. I'll leave that as an exercise for an enterprising reader.
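As one possible starting point for that exercise, here is a Python sketch. It assumes the matrix has been copied from a web page as tab-separated text, in which the diagonal and any unplayed fixtures appear as empty fields. Both that input format and the function name `fill_placeholders` are hypothetical. The function replaces each empty field with X-X and emits whitespace-separated rows.

```python
PLACEHOLDER = "X-X"


def fill_placeholders(tsv_lines, placeholder=PLACEHOLDER):
    """Turn tab-separated matrix rows into space-separated rows,
    replacing every empty field with a placeholder.

    Assumption (hypothetical input format): empty fields mark the diagonal
    and any fixture that has not been played yet.
    """
    rows = []
    for line in tsv_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines only; a row of empty fields is kept
        rows.append([cell.strip() or placeholder for cell in line.split("\t")])
    n = len(rows)
    if any(len(row) != n for row in rows):
        raise ValueError("matrix is not square; check for missing tabs")
    return [" ".join(row) for row in rows]


filled = fill_placeholders(["\t2-1\t0-0\n", "1-1\t\t\n", "0-2\t1-0\t\n"])
for row in filled:
    print(row)
# X-X 2-1 0-0
# 1-1 X-X X-X
# 0-2 1-0 X-X
```

Splitting on tabs, rather than on arbitrary whitespace, keeps empty fields visible so they can be filled. A source that uses a different separator would need a different split.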
Without further delay, here's the code. I hope you find it useful.