cfanalytics

05/01/18

My first open source python package!

https://github.com/raybellwaves/cfanalytics

I'm an avid CrossFitter (people often joke that you know someone does CrossFit because they will tell you immediately). Every year people take part in the Open, a competition where you post your results online and the best athletes go on to qualify for the Games, where the best of the best compete.

Around 200,000 people take part, so I knew it would be a good dataset to play around with. There is something fun about using Python for a hobby: it doesn't feel like work and there is no deadline pressure. It also feeds back into my day-to-day work, and it is a nice thing to add to the CV. I would encourage everyone to have a fun 'work' side project.

r/crossfit was my main resource for learning how to scrape data from the website ("scrape" means downloading the data displayed on a webpage); like most of Reddit, every subreddit has techies hiding in it. I also would not have started this project if I had not found a package that scraped the 2017 data (and had some bugs in it). Those bugs were the tinder that got me to fork (copy) the project and try to fix them. Eventually I rewrote the package.

I got to play with some cool Python packages: aiohttp, pandas, xarray, cartopy and salem.

I posted the data on r/crossfit and various people picked it up and did their own thing with it. It is motivating to know that you are doing something useful for someone. The GitHub traffic for the repository over the last two weeks shows that, even though I haven't touched it since March, people are still viewing and cloning it. Most of the views are coming from this post. I seem to remember that at the peak there were about 200 views in one day and about 50 people cloned it.

Most of my time on the package went into data management (a very important step in any data analysis). Towards the end I finally did some geospatial data analysis by comparing the regions (after the Open there are nine regional competitions). I documented my experience of creating a map (see below) using cartopy on r/python. In addition, I compared results within a city using the salem package, which is good for plotting point data on top of Google Maps (see below).
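As a rough idea of what "comparing the regions" can look like once the data is tidied into a table, here is a minimal pandas sketch. It is not the code from cfanalytics: the column names (`region`, `overall_rank`), the region labels and the numbers are all made up for illustration. It counts athletes and takes the median overall rank per region, best region first.

```python
import pandas as pd

# Hypothetical toy data: column names, region labels and ranks are invented.
df = pd.DataFrame({
    "athlete": ["A", "B", "C", "D", "E", "F"],
    "region": ["Region A", "Region A", "Region B", "Region B", "Region C", "Region C"],
    "overall_rank": [10, 250, 40, 90, 5, 600],
})


def summarise_regions(df: pd.DataFrame) -> pd.DataFrame:
    """Count athletes and take the median overall rank per region (lower is better)."""
    return (
        df.groupby("region")["overall_rank"]
        .agg(athletes="count", median_rank="median")  # named aggregation
        .sort_values("median_rank")  # best (lowest) median rank first
    )


summary = summarise_regions(df)
print(summary)
```

The median is used rather than the mean because a handful of very low placings can drag a region's average around.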

I'll pick the project up again next year and create an interactive notebook using mybinder.org so that non-techies can find their scores in the dataset and plot them.