Rubik's cube data analysis

I learnt how to solve a cube in 2012 when I was in college. Then I forgot everything (as things usually go!). Thanks to the pandemic, I got a new cube and relearnt the easiest algorithm all over again. I thought it would be interesting to record data as I solved the cube and see what I could interpret from it. (Spoiler alert: I did not discover a new fast method. I am still slow! :( )

I collected data over a week and ended up with 129 random samples. For each solve I recorded the time of day, how many times in a row I had been solving and, most important of all, how long it took me to solve the cube. Over the 129 tries, I averaged 108.805 sec with a minimum of 72 sec and a maximum of 222.04 sec. As I practiced more, my average got better. This makes sense. As we all know, 'Practice makes perfect', or at least better!
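For anyone who wants to reproduce numbers like these, here is a minimal sketch of the summary statistics and the running average. The values in `solve_times` are made-up placeholders, not my real 129 measurements, and the function names are just ones picked for illustration.

```python
from statistics import mean

# Hypothetical placeholder times in seconds, NOT the real recorded data.
solve_times = [95.2, 110.4, 80.0, 130.8, 101.6]


def summarize(times):
    """Return the average, fastest and slowest solve time."""
    return {"mean": mean(times), "min": min(times), "max": max(times)}


def running_mean(times):
    """Average of all solves so far, after each solve (in recorded order)."""
    total = 0.0
    averages = []
    for count, seconds in enumerate(times, start=1):
        total += seconds
        averages.append(total / count)
    return averages


summary = summarize(solve_times)
print(summary)
print(running_mean(solve_times))
```

If the running average drifts downwards over the week, that is the "practice makes better" effect showing up in the data.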

Next, I checked what influence solving the cube consecutively had on the average solving time. To be clear, I mean I would solve, scramble and repeat. If I took a break to watch a YouTube video, the streak was lost. The maximum number of times I solved the cube at a stretch was 15.

The graph looks absolutely horrendous! What you can just about make out is that I got better as I got the hang of it (around turn 3-4), but then, as time went by, the solving time was on the rise.
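The averages behind a graph like that can be computed by grouping the solves by their position in the streak. This is only a sketch: the `(position, seconds)` pairs below are invented for illustration, and I am assuming the position is recorded as 1 for the first solve after a break, 2 for the next, and so on.

```python
from collections import defaultdict
from statistics import mean

# (position in the streak, solve time in seconds); hypothetical values only.
solves = [(1, 120.0), (2, 110.0), (3, 100.0), (1, 130.0), (2, 100.0)]


def mean_time_by_position(rows):
    """Average solve time for each streak position, sorted by position."""
    grouped = defaultdict(list)
    for position, seconds in rows:
        grouped[position].append(seconds)
    return {position: mean(times) for position, times in sorted(grouped.items())}


averages = mean_time_by_position(solves)
for position, avg in averages.items():
    print(position, avg)
```

Positions with only one or two solves behind them will be noisy, which may be part of why the graph is so hard to read.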

Then I wanted to see how frequently I hit a particular time range. I should mention the mean and standard deviation here, but, oh well!
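Counting how often each time range comes up, along with the mean and standard deviation, takes only a few lines. The sketch below uses placeholder times and an assumed bin width of 20 seconds. Neither comes from my actual data.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical placeholder times in seconds, NOT the real recorded data.
solve_times = [75.0, 95.2, 101.6, 110.4, 118.9, 130.8, 205.5]


def bin_counts(times, width=20):
    """Count solves per time range; each key is the lower edge of a bin."""
    counts = Counter(int(t // width) * width for t in times)
    return dict(sorted(counts.items()))


histogram = bin_counts(solve_times)
for lower_edge, count in histogram.items():
    print(f"{lower_edge}-{lower_edge + 20} sec: {count}")

print("mean:", mean(solve_times))
print("sample standard deviation:", stdev(solve_times))
```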

Next, I did some of my favorite analysis: regression and classification. I have recently learnt Machine Learning techniques with Codecademy, and what better place to use that very limited knowledge? I chose Multiple Linear Regression. The inputs are the time of the day and the consecutiveness, and the output is, well, the solving time. That didn't go that well. The Regressor predicted whatever! Maybe I did something wrong or this is just not the correct tool, let me know! So yeah, it predicted an average of 112-ish seconds, which is not bad considering I average 108 sec. But I guess the input parameters are wrong or the sample size needs to be bigger.
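To make the regression step concrete, here is a stdlib-only sketch of multiple linear regression that solves the least-squares normal equations directly. It is not the code I ran, and a library such as scikit-learn would do the same job in fewer lines. The data is invented, and encoding the time of day as an hour from 0 to 23 is an assumption made for this example.

```python
def fit_linear_regression(features, targets):
    """Least-squares fit of target = intercept + sum(weight * feature).

    Solves the normal equations (X^T X) w = X^T y by Gauss-Jordan
    elimination and returns [intercept, weight_1, weight_2, ...].
    """
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = []
    for i in range(n):
        line = [sum(r[i] * r[j] for r in rows) for j in range(n)]
        line.append(sum(r[i] * y for r, y in zip(rows, targets)))
        aug.append(line)
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; no unique fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(coefficients, feature_row):
    """Predicted solve time for one (hour, streak position) row."""
    weights = coefficients[1:]
    return coefficients[0] + sum(w * x for w, x in zip(weights, feature_row))


# Hypothetical rows: (hour of day, position in streak) and solve time in seconds.
features = [(8, 1), (9, 2), (13, 1), (20, 3), (21, 5), (22, 4)]
targets = [120.0, 112.0, 104.0, 98.0, 109.0, 101.0]

coefficients = fit_linear_regression(features, targets)
print(coefficients)
print(predict(coefficients, (14, 2)))
```

When the inputs explain very little of the variation, the fitted weights stay small and every prediction lands close to the overall mean, which would be consistent with a regressor that keeps answering "about 112 seconds".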

Not defeated by that, I hoped to find my answer using a K-nearest Neighbors Classifier. I normalised the data using a min-max function. I converted the outputs into 0 and 1 based on whether the solving time was lower than my average or not. I tried different numbers of nearest neighbors and got the highest accuracy (93%) for k = 3.
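The classifier pipeline described above (min-max scaling, 0/1 labels, k-nearest neighbors, accuracy for a given k) can be sketched in plain Python as follows. It is an illustration rather than my original code. The data is invented, labelling a faster-than-average solve as 1 is an assumption, and so is scoring accuracy on a separate set of held-out points.

```python
from collections import Counter
from statistics import mean


def min_max_normalize(column):
    """Rescale a list of numbers to the range 0..1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def label_against_average(times):
    """1 if a solve was faster (lower) than the average time, else 0."""
    average = mean(times)
    return [1 if t < average else 0 for t in times]


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_points, train_labels, test_points, test_labels, k):
    """Fraction of held-out points whose label is predicted correctly."""
    hits = sum(
        knn_predict(train_points, train_labels, point, k) == label
        for point, label in zip(test_points, test_labels)
    )
    return hits / len(test_points)


# Hypothetical data: hour of day, position in streak, solve time in seconds.
hours = [8, 9, 13, 20, 21, 22]
streaks = [1, 2, 1, 3, 4, 5]
times = [120.0, 115.0, 100.0, 95.0, 90.0, 98.0]

points = list(zip(min_max_normalize(hours), min_max_normalize(streaks)))
labels = label_against_average(times)

# Queries are given in the normalised 0..1 space used for training.
for k in (1, 3, 5):
    print(k, knn_predict(points, labels, (0.0, 0.1), k))
```

With two classes, an odd k avoids tied votes. Comparing `accuracy` across several values of k on held-out solves is one way to arrive at a choice like k = 3.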

From this result, I can input a time of day and which consecutive solve I am on, and all it will tell me is whether I will do better or worse than my average, and that too with 93% accuracy. But hey! At least that is something. If I keep adding more data, that accuracy should hopefully improve!


To conclude, I made some data interpretations which are interesting to me, though they might not be to you. But I learnt a lot of new tricks while writing the code. Even though some would consider this a failure (me included!), I still love coding! I can't wait for the next idea to strike me.

P.S. Please send me your concerns, suggestions and corrections at Gmail. Send love, not hate!

Material used: 1. Cube from Amazon. That's it!

Algorithm used: This video by J.Perm.

Disclaimer: I use Python 3.7.6 for all purposes.