Just Dance is an incredibly popular video game where players dance along with an example dancer on the screen. Usually, a popular song is being played and the user is scored based on how well they complete the dance moves.
Not Just Dance uses the same premise as Just Dance, with a few tweaks. There are only 5 possible dance moves that the user can be asked to do, and currently they are all set to the song "Beat It" by Michael Jackson, which is around 138 beats per minute. The user moves at each down beat and is scored on whether they complete the correct dance move and on how "on beat" they complete it.
Using the MATLAB Cloud application on your phone, hold the phone in the orientation described in the gifs above. Select the accelerometer recording option and start recording at 100.0 Hz. Choose a dance move to complete, and dance it to the beat of "Beat It" by Michael Jackson.
Then, upload this dance file and change the file name in the player data section. Run the MATLAB script and see your score! The script is available in the GitHub repository linked below.
With the phone facing away from them, the player moves their hand up or down at each down beat.
With the phone initially facing the ground, the player rotates their hands around each other.
With the phone facing the player's left, the player claps their hands.
With the phone facing the ground at all times, the player swings their forearms at each down beat with their upper arms parallel to the ground.
With the phone facing away from them, the player swings their arm from an up position to a halfway down position at each down beat.
To better understand the data our algorithm is working with, we have modeled the motion of Dance Move 2, our most reliable, highest-scoring dance move.
The free body diagram on the right shows a cross section view of the phone at the peaks of the up and down swings as well as when the linear acceleration is zero and the phone is parallel to the ground. The diagram allows us to predict what a graph of linear acceleration will look like for each axis.
In an ideal world, there would be no acceleration in the X axis, since the motion takes place in the Y-Z plane. However, due to the mechanics of arms, the player frequently moves their hand slightly closer towards their body on the upswing, causing our data in the X axis to look more like a very noisy version of the data in the Z axis.
The swinging motion of the arms creates angular acceleration along the Y axis; however, since the accelerometer only measures linear acceleration, we have excluded angular acceleration from our model. The Y axis experiences linear acceleration only when the arm is at the peaks of its swing. Since the downswing only takes the accelerometer parallel to the floor, slightly past the Y axis, it produces very small spikes of acceleration. We see larger spikes on the upswing due to the contribution of gravity to linear acceleration.
The Z axis of the phone is aligned with the direction of motion for this movement. Therefore, if the player is dancing on beat, we expect a periodic waveform with the same dominant frequency as the music, in this case 2.33 Hz. Because the phone is parallel to the ground when the acceleration due to the motion of the arm is 0, the data is shifted to center around 9.8 m/s² to account for acceleration due to gravity.
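As a rough illustration of the Z-axis model described above, the following Python sketch generates an idealized signal: a sinusoid at the beat frequency, offset by gravity. It is not the project's MATLAB code. The amplitude of 3.0 m/s² and the function names are assumptions chosen for illustration, and a tempo of 138 beats per minute works out to 2.3 Hz, close to the 2.33 Hz quoted above.

```python
import math

BPM = 138      # approximate tempo quoted for "Beat It"
FS = 100.0     # sampling rate in Hz, from the recording instructions
G = 9.8        # offset due to gravity, in m/s^2


def beat_frequency_hz(bpm):
    """Convert a tempo in beats per minute to a frequency in Hz."""
    return bpm / 60.0


def ideal_z_signal(duration_s, amplitude=3.0, bpm=BPM, fs=FS):
    """Idealized Z-axis samples: a sine wave at the beat frequency, centered on G.

    The amplitude is a hypothetical value, not one measured in the project.
    """
    f = beat_frequency_hz(bpm)
    n = int(duration_s * fs)
    return [G + amplitude * math.sin(2 * math.pi * f * k / fs) for k in range(n)]


signal = ideal_z_signal(10.0)
print(beat_frequency_hz(BPM), sum(signal) / len(signal))
```

Real recordings will be much noisier than this, but the offset and the dominant frequency are the two features the model predicts.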
To show how the algorithm works, we will walk through how it processes your data, step by step. In this example, we will pretend that the player has just danced Move 2 and is looking to find out their score.
The player is scored on two criteria: 1) whether they danced the correct dance move, and 2) how "on beat" they were. We recorded control data of a player dancing each of the dance moves to the best of their abilities. To determine which dance move the player is doing, we first align the player data set (x, y, z) with each of the 5 control data sets (x, y, z) using the "find delay" function (finddelay in MATLAB). We then trim the resulting delay from the start of each pair of data sets and snip the two data sets in the pair so they are the same length.
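The alignment step can be sketched in plain Python as follows. This is a simplified stand-in for a delay-finding function, not the project's MATLAB code: it picks the lag that maximizes the raw sum of products between the two signals, and the names estimate_delay and align_and_trim are hypothetical. For real accelerometer data it would probably help to subtract each signal's mean first, which this sketch does not do.

```python
def estimate_delay(x, y, max_lag):
    """Return the lag d (in samples) that maximizes sum(x[i] * y[i + d]).

    A positive d means y is delayed relative to x.
    """
    best_lag, best_val = 0, float("-inf")
    for d in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(len(x)):
            j = i + d
            if 0 <= j < len(y):
                total += x[i] * y[j]
        if total > best_val:
            best_lag, best_val = d, total
    return best_lag


def align_and_trim(x, y, max_lag):
    """Drop the leading delay from the later signal, then cut both to equal length."""
    d = estimate_delay(x, y, max_lag)
    if d > 0:
        y = y[d:]
    elif d < 0:
        x = x[-d:]
    n = min(len(x), len(y))
    return x[:n], y[:n], d


control = [0, 0, 1, 2, 1, 0, 0, 0]
player = [0, 0, 0, 0, 1, 2, 1, 0]   # same shape, started 2 samples later
print(align_and_trim(control, player, max_lag=4))
```

In the walkthrough above, this would be repeated for the player data against each of the 5 control data sets.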
The player data lines up best with control data for dance move 2, as the player has also danced dance move 2.
Aligned x-axis player data and x-axis control Dance Move 2 data
Aligned y-axis player data and y-axis control Dance Move 2 data
Aligned z-axis player data and z-axis control Dance Move 2 data
Then, with the cropped and aligned data, we take the cross correlation of the accelerometer data to measure how similar the player data is to the control data. This is quantified in the estimated cross correlation array, which MATLAB outputs for a range of lags. The maximum cross correlation between the player data set and the control data set is then found in x, y, and z, and the three maxima are averaged to give the cross correlation value for the player data set and that control data set. This value is calculated between the player data set and each of the control data sets, and the results are ordered in decreasing value. The cross correlation plots for all three axes are shown below between the player data and dance move 2. The highest cross correlation value is with control data for dance move 2, as the player has also danced dance move 2.
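The ranking step described above could look roughly like this in Python. It is a sketch under assumptions, not the project's MATLAB code: the cross correlation is normalized by the signal energies so that values from different control sets are comparable (the source does not say whether the authors normalize), and the function names and the max_lag default are hypothetical.

```python
import math


def xcorr_peak(x, y, max_lag):
    """Largest normalized cross correlation between x and y over lags -max_lag..max_lag."""
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        return 0.0
    best = float("-inf")
    for d in range(-max_lag, max_lag + 1):
        total = sum(x[i] * y[i - d] for i in range(len(x)) if 0 <= i - d < len(y))
        best = max(best, total / norm)
    return best


def average_peak(player, control, max_lag=10):
    """Average the per-axis peak values; data sets are dicts with keys 'x', 'y', 'z'."""
    return sum(xcorr_peak(player[a], control[a], max_lag) for a in "xyz") / 3


def rank_moves(player, controls, max_lag=10):
    """Return (move, score) pairs sorted from highest to lowest score."""
    scores = {move: average_peak(player, c, max_lag) for move, c in controls.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


fast = [1, -1, 1, -1, 1, -1, 1, -1]
slow = [1, 1, -1, -1, 1, 1, -1, -1]
controls = {1: {"x": fast, "y": fast, "z": fast},
            2: {"x": slow, "y": slow, "z": slow}}
player = {"x": slow, "y": slow, "z": slow}
print(rank_moves(player, controls, max_lag=3))
```

With the toy data shown, the player signal matches the control for move 2 exactly, so move 2 comes first in the ranking.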
Cross correlation measures how related two series are. To demonstrate how this works, consider two time series, x(t) and y(t), that have been aligned and cut to the same length. The equation below shows how the cross correlation is calculated.
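A common unnormalized form of the cross correlation at delay d, consistent with the description here, is given below. The exact normalization used in the project is not stated, so this form is an assumption; the sum runs over the samples t for which both x(t) and y(t - d) exist.

```latex
R_{xy}(d) = \sum_{t} x(t)\, y(t - d)
```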
The cross correlation is calculated using this equation for all the possible delays. By delays, we mean shifting the y function by the delay value and recalculating the cross correlation value at that delay. By plotting the cross correlation values against their corresponding delays, we can find out at which delay the two signals are most related.
In the case of our dance move data, we have already aligned the signals and chopped off the excess, so a lag of 0 should be the point at which they are the most correlated. If the aligning function didn't completely work, the cross correlation would still find the lag at which they are the most aligned. Aligning and chopping the data simply ensures that any wait time before the dance move begins is removed.
Cross-correlation for different lags between x-axis player data and x-axis control dance move 2 data
Cross-correlation for different lags between y-axis player data and y-axis control dance move 2 data
Cross-correlation for different lags between z-axis player data and z-axis control dance move 2 data
The correlation R-value between the player data set and all of the control data sets is also calculated in a similar way and ordered in decreasing value. The image to the left shows the average R values in x, y, and z for the 5 dance moves. As you can see, the average R value is greatest for the control data set for dance move 2.
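For reference, an R-value of this kind can be computed as a Pearson correlation coefficient. The sketch below assumes that definition and assumes the two signals have already been aligned and cut to the same length; the function names are hypothetical, and the project's MATLAB code may compute the value differently.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant signal has no defined correlation; treated as 0 here
    return cov / (spread_x * spread_y)


def average_r(player, control):
    """Average the R-value over the x, y, and z axes (dicts keyed by axis name)."""
    return sum(pearson_r(player[a], control[a]) for a in "xyz") / 3


print(pearson_r([1, 2, 3], [2, 4, 6]), pearson_r([1, 2, 3], [3, 2, 1]))
```

Because the means are subtracted, the constant gravity offset in the accelerometer data does not affect this value.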
The two ordered arrays for the cross correlation and the R-values are compared.
These are the cross correlation values for the 5 dance moves, in decreasing order.
These are the R-values for the 5 dance moves, in decreasing order.
If the "top score" for the two arrays links to the same control dance move, as shown here, it can be said that the player was dancing that dance move.
If they aren't the same, we check whether the top scorer in each array is the runner-up in the other; in other words, whether the algorithm can't decide between two dance moves. If this is the case, we check the FFT dance score for the two possibilities. We give the player the benefit of the doubt by assuming they are dancing the dance move with the better score. How this FFT score is calculated is shown below.
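The decision logic can be sketched in Python as below. It rests on two assumptions that the text does not confirm: that "can't decide between two dance moves" means each ranking's top move is the other ranking's runner-up, and that the function returns None when neither condition holds. The name decide_move and the fft_score callback are hypothetical.

```python
def decide_move(xcorr_ranking, r_ranking, fft_score):
    """Pick the detected dance move from two rankings (best move first).

    fft_score is a function mapping a move to its "on the beat" score; it is
    only consulted when the two rankings disagree about the top move.
    """
    top_x, top_r = xcorr_ranking[0], r_ranking[0]
    if top_x == top_r:
        return top_x
    # Assumed tie case: the two rankings swap their first and second places.
    if xcorr_ranking[1] == top_r and r_ranking[1] == top_x:
        return max((top_x, top_r), key=fft_score)
    return None  # assumption: no move is reported when the rankings disagree further


scores = {1: 300.0, 2: 475.7}
print(decide_move([2, 1, 3, 4, 5], [2, 3, 1, 4, 5], scores.get))
print(decide_move([2, 1, 3, 4, 5], [1, 2, 3, 4, 5], scores.get))
```

In the second call the rankings disagree, so the move with the higher FFT score is chosen, which gives the player the benefit of the doubt.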
Our metric for how well a player is dancing a dance move is how "on the beat" they are moving. For comparison, we have recorded control accelerometer data of someone dancing each dance move well. The FFT of this control data is compared to the FFT of the player data: the 5 most prominent frequencies of each are compared against each other in order. If the difference between two frequencies is less than 0.7, we add 100 points to the "on the beat" score. If the difference is greater than 0.7 but less than 1.3, we add a variable score that depends on the size of the difference (as a percentage). The graphs below show the FFT data in the x, y, and z axes between the player dance data and dance move 2.
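A minimal Python sketch of this scoring rule follows. The thresholds 0.7 and 1.3 and the 100 points per frequency come from the description above; the linear ramp between the thresholds and the score of 0 beyond 1.3 are assumptions, since the exact formula for the variable score is not given. The function name is hypothetical.

```python
def on_beat_score(player_freqs, control_freqs, full=0.7, zero=1.3):
    """Compare prominent frequencies pairwise, in order, and total the points.

    Below `full` a pair earns 100 points. Between `full` and `zero` the points
    fall linearly from 100 to 0 (an assumed formula). Beyond `zero` a pair
    earns nothing (also an assumption).
    """
    score = 0.0
    for fp, fc in zip(player_freqs, control_freqs):
        diff = abs(fp - fc)
        if diff < full:
            score += 100.0
        elif diff < zero:
            score += 100.0 * (zero - diff) / (zero - full)
    return score


control_peaks = [2.3, 4.6, 6.9, 9.2, 11.5]   # made-up example frequencies
player_peaks = [2.4, 4.5, 7.9, 9.2, 11.5]
print(on_beat_score(player_peaks, control_peaks))
```

With 5 frequencies per comparison, the maximum under this rule is 500 points.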
The Fourier transform is a series of matrix operations that takes a signal in the time domain and transforms it into the frequency domain. Every signal can be expressed as a sum of simpler sinusoidal signals. When a signal is expressed in the frequency domain, each of its component signals appears as a spike on the graph. The magnitude of the spike is proportional to the amount of that component frequency present in the signal. This is useful because it allows us to find the main component frequencies in our data.
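To make the idea concrete, here is a small Python sketch that computes a discrete Fourier transform directly from its definition and picks out the strongest frequencies. It is far slower than a real FFT and is not the project's MATLAB code. The function names are hypothetical, and taking the largest-magnitude bins is a simplification of finding the "most prominent" peaks.

```python
import cmath
import math


def dft_magnitudes(signal, fs):
    """Return (frequencies in Hz, magnitudes) for the positive-frequency DFT bins.

    The mean is removed first, so a constant offset such as gravity does not
    dominate the spectrum, and the zero-frequency bin is skipped.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    freqs, mags = [], []
    for k in range(1, n // 2 + 1):
        acc = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        mags.append(abs(acc) / n)
    return freqs, mags


def top_frequencies(signal, fs, count=5):
    """Frequencies of the `count` largest-magnitude bins, strongest first."""
    freqs, mags = dft_magnitudes(signal, fs)
    order = sorted(range(len(mags)), key=lambda i: mags[i], reverse=True)
    return [freqs[i] for i in order[:count]]


fs = 100.0
samples = [9.8 + math.sin(2 * math.pi * 2.5 * t / fs) for t in range(200)]
print(top_frequencies(samples, fs, count=1))
```

The test signal here is a 2.5 Hz sine wave offset by 9.8, sampled at 100 Hz for 2 seconds, so the strongest bin falls at 2.5 Hz.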
FFT of the x-axis control dance 2 data set and the FFT of the x-axis player data set
FFT of the y-axis control dance 2 data set and the FFT of the y-axis player data set
FFT of the z-axis control dance 2 data set and the FFT of the z-axis player data set
Since we expected the player to dance dance move 2 and the algorithm found that they did, we give the player 500 points. Since the player's FFT peaks were similar enough to those of the control data set, we give the player a further 475.7 points. Thus, this player has a total of 975.7 points.
If you would like to access the code, clone the following repository: https://github.com/naviatolin/NotJustDance. The code is located in the .mlx file. Feel free to collect your own data and test our algorithm yourself! Make sure that your collected data and all the control data are located in the same local folder as the .mlx file, as this ensures that there are no path errors when loading the data. Have fun!
We chose this project because Anna and I already really enjoyed playing the game Just Dance. As engineers, we were very curious to try to develop a similar dance move recognition algorithm that involved some aspects of scoring. We learnt a lot about how to visually compare two signals mathematically, which proved to be a more difficult task than we initially anticipated.