An update on Soccermetrics software projects
Categories: Database Development, Software Development
It’s been a very long time since I’ve written about Soccermetrics software projects, so in this post I’d like to write about the current status and the roadmap ahead.
Soccermetrics has a number of open-source projects that you can find at our GitHub page. Some projects are definitions of tables and views that make up database schemas at varying levels of complexity, others are client libraries for the Soccermetrics APIs, and one is a fully featured, and ultimately over-engineered, desktop-based football data entry application. A few projects haven’t seen activity in several months, while others haven’t been touched in years. I did take a break from writing software after shutting down the Soccermetrics Connect API, but I have been working on other private projects since then (e.g. football career forecasting, MLS front-office efficiency).
I’ve been looking at the codebase, and it is impressive how much has been done over the last five years: well into five figures in lines of code, and possibly six figures. Much of the code has done its job and done it well, but it was written before I knew certain best practices in software and database design. I’ve learned a lot since then, so it’s time to refactor the code and open some of it up to the public.
So here’s a walk through these projects:
Marcotti Database Models
I had written design specifications and schemas to capture match data at varying levels of complexity. I had also written what was essentially my own ORM to handle database operations. It was a learning experience for sure, and I’m proud of what I created, but the most important thing that I learned was that custom ORMs are a big pain in the butt. I was not comfortable with off-the-shelf ORMs such as SQLAlchemy when I was getting started, but I’ve learned a lot about web development in the intervening years and the time has come to use a model-based approach.
The plan is to convert the database table definitions in Marcotti-Light to model definitions in SQLAlchemy, and then repeat the process across the other schemas (probably in the order Marcotti-Summary, Marcotti, and Marcotti-Events). The database schemas have matured thanks to their use on various projects, and I’m a lot more knowledgeable about SQLAlchemy than I was several years ago (and it’s now at version 1.0!).
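To give a sense of what that conversion looks like, here is a minimal sketch of two related tables written as SQLAlchemy declarative models. The table and column names (`countries`, `clubs`) and the `demo` helper are hypothetical and are not taken from the actual Marcotti schemas; the point is only to show a table definition turning into a Python class.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, relationship

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases keep it in the declarative extension
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Country(Base):
    """Hypothetical lookup table, used here for illustration only."""
    __tablename__ = "countries"

    id = Column(Integer, primary_key=True)
    name = Column(String(60), nullable=False, unique=True)
    clubs = relationship("Club", back_populates="country")


class Club(Base):
    """Hypothetical table with a foreign key back to countries."""
    __tablename__ = "clubs"

    id = Column(Integer, primary_key=True)
    name = Column(String(60), nullable=False)
    country_id = Column(Integer, ForeignKey("countries.id"), nullable=False)
    country = relationship("Country", back_populates="clubs")


def demo():
    """Create the tables in an in-memory SQLite database and round-trip a row."""
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)  # emits CREATE TABLE from the models
    session = Session(engine)
    try:
        england = Country(name="England")
        # Adding the club also saves the related country via the relationship.
        session.add(Club(name="Arsenal", country=england))
        session.commit()
        club = session.query(Club).filter_by(name="Arsenal").one()
        return club.name, club.country.name
    finally:
        session.close()
```

With models like these, the insert and lookup logic that a hand-rolled ORM would have to implement comes from the session and the relationship definitions instead.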
Not all of the database schemas are available to the public — Marcotti and Marcotti-Summary are public, but Marcotti-Events and Marcotti-Light are not. I kept them private because I had created an analytics library within each repository, but I’ll separate the schemas into their own repositories and open-source them within a couple of weeks. Marcotti-Events is a little more complicated to release because there are libraries that interface with proprietary data feeds such as Opta, Press Association, and other sports data companies, so those libraries will have to be separated if the rest of the repository is to be made public.
I should point out to the uninitiated that these repositories contain database schemas. They do NOT contain match data of any kind.
Marcotti Analytics Library
This repository contains code that governs database access and data pre-processing for specific types of analyses. The most complicated code concerns writing data to the database and retrieving it again. What’s really great about switching to SQLAlchemy is that I can retire a lot of that code and clean up the data pre-processing and analysis code. A new test suite would be a good thing as well. For now I’d say that this library won’t be open-sourced, but I’m of two minds on the matter.
Marcotti-Desktop
This unwieldy thing is a desktop-based data entry application for the Marcotti match databases (called FMRD at the time). The data that it collected was actually pretty basic (match results and match information), but it required a lot of work on the UI and backend logic. At least it was my first major project in Python, and I got to learn the Qt library as well. The repository is already public, but there are days when I just want to put a match to the repo and walk away. I might write a similar project for the web in the future (as several people recommended to me years ago), but I don’t plan on maintaining this repo and it may disappear at some undetermined point in the future.
Match Result App
This repository is the code for the ResultPage online app. It’s supposed to work as a thin client, with calls to a light version of the Soccermetrics API. I hadn’t done anything with the page or the backend data in years, and it was slowly costing Soccermetrics money, so I shut it down permanently last week. I wrote that code to learn more about web development, and I wouldn’t say that everything I wrote was in keeping with best practice. I don’t have a problem with sharing the code, so I’ll open-source the repo and people can do with it as they wish.
Soccermetrics API Server Code
There are actually two repositories that contain code that runs the Soccermetrics API servers. One is the full Soccermetrics Connect API, which I have to say is pretty impressive. The other is a lighter version of Soccermetrics Connect. They will be refactored to incorporate changes in other repositories, but they will not be open-sourced anytime soon. The clients are public, but the Connect API is not open to the public right now. (It might be soon, but “soon” is an undetermined time in the future.)
Soccermetrics Executive Dashboard
This project was (and is) an analytics dashboard application that displays top-level metrics to executive-level personnel at clubs and league organizations. It’s a thin client that interfaces with an API to receive its content, and there are very nice visualizations of executive-level metrics at various levels of granularity. There are no plans to open-source this repository at this time.
Custom Soccermetrics GitHub Page
There is a repository that allows for a custom Soccermetrics page on GitHub. As you can tell, I haven’t done much with it. I was trying to think of what to do with such a page beyond posting a list of software packages, which ultimately isn’t very interesting, but I have more of an idea now. I had used Sphinx and Bootstrap to create some beautiful documentation pages for the Soccermetrics APIs, but I will more than likely take a different approach by using Jekyll (which is what powers GitHub Pages anyway).
So that’s a review of the projects that are currently on the Soccermetrics GitHub page, with an eye toward making some of them public in the very near future. It’s my hope that you find them useful, and if you have ideas for improvements, please submit them. Thank you for your continued interest in Soccermetrics software projects.