Playlist-Assist is a personal DJ for your Spotify playlists. After logging into the site, start playing a song on any of your Spotify-enabled devices and watch as Playlist-Assist begins the 'ingestion' phase. First, it extracts information about every song in your playlist and reshuffles the list so the songs flow together more smoothly. Second, it analyzes the audio content of the currently playing song and looks for 'beat-pairs' with the song that's next in the queue. A 'beat-pair' is a pair of beats from two different songs that sound nearly identical.
After generating a list of beats that your current song and the next song share, Playlist-Assist begins tracking your progress through the current song. Once you reach a jumpable beat, a command is issued to Spotify to switch to the next song in the queue at the timestamp listed in the beat-pair. These seamless, beat-matched transitions continue until you stop or run out of songs in the playlist.
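To give a rough idea of what that command looks like: one way to ask Spotify to start the next track at a specific timestamp is the Web API's 'Start/Resume Playback' endpoint. The sketch below is my illustration of the idea, not Playlist-Assist's actual code, and the helper name jumpToBeatPair is made up.

// Hypothetical sketch: start the next track at the beat-pair's landing timestamp.
// Assumes `accessToken` was granted the user-modify-playback-state scope.
async function jumpToBeatPair(accessToken, nextTrackUri, landingMs) {
  const response = await fetch('https://api.spotify.com/v1/me/player/play', {
    method: 'PUT',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      uris: [nextTrackUri],   // the next song in the reshuffled queue
      position_ms: landingMs, // the landing beat's timestamp from the beat-pair
    }),
  });
  if (!response.ok) {
    throw new Error(`Jump failed with status ${response.status}`);
  }
}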
This week I added a playlist display to the app's interface so the user can see the songs in their queue, along with timestamps so they know when to expect a jump. I also implemented a self-adjusting timer system that accounts for drift, measures the user's position in their song with sub-millisecond accuracy, and prevents desync with the user's player.
I started the week by revisiting my existing codebase. During the weeks spent working on other projects, I had come up with some ideas for interface improvements, so the first thing I did was implement them. The main addition I wanted was a way for the user to see their playlist queue, because during Playlist-Assist's ingestion phase the playlist the user is currently listening to gets reshuffled. In an attempt to judge the 'mood' of songs, Spotify (quietly) assigns every song in its catalog numeric values in a few different categories. Some examples of these categories are acousticness, danceability, energy, instrumentalness, liveness, and speechiness. Comparing songs by their category ratings and shuffling accordingly makes the job of creating beat-matched transitions much easier, because songs that end up next to each other (hopefully) already sound relatively similar. I went with a simple stack of info cards that the user can scroll through. Each card shows information like the title, the artist name, and an image of the album cover. Spotify includes all of this information when I pull the user's playlist data from the Spotify API.
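For the curious, pulling that card data boils down to one call to Spotify's playlist-tracks endpoint. The sketch below is illustrative; the helper name and the exact fields kept are my own choices, not necessarily the app's.

// Hypothetical sketch: pull a playlist's tracks and keep just what the cards need.
async function getCardData(accessToken, playlistId) {
  const response = await fetch(
    `https://api.spotify.com/v1/playlists/${playlistId}/tracks`,
    { headers: { Authorization: `Bearer ${accessToken}` } }
  );
  if (!response.ok) {
    throw new Error(`Playlist request failed with status ${response.status}`);
  }
  const { items } = await response.json();
  return items.map(({ track }) => ({
    id: track.id,
    title: track.name,
    artist: track.artists.map((artist) => artist.name).join(', '),
    albumCover: track.album.images[0]?.url, // Spotify lists the largest image first
  }));
}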
snazzy scrollable song cards to show queue
logging the drift on every iteration
The second thing I did this week was revisit the timer system. One of the main problems I'll have to face is accurately measuring the user's position in their song. If the website is going to make a transition at a given timestamp, it first needs to know where the user is in their song (to know whether they're at that timestamp). The best way to do this would be to register a webhook (an event listener) so that when Spotify detects that the user has reached the target timestamp, it alerts me and I can make the transition. Unfortunately, webhooks aren't supported by the Spotify API. The second-best option is to open a longer-term connection with the Spotify servers and listen on maximum intervals for changes. By waiting the longest possible time on each request, I can maximize my listening window while minimizing the resources required. This technique is called long-polling, and yet again, Spotify doesn't support it. That leaves me with the third-best option: keeping my own timer locally. I do this by making one API call when playback starts to get the current timestamp, then synchronizing a local clock to it so I can infer the user's current position. I can also periodically resync this local timer with more API calls to make sure the user isn't fast-forwarding or rewinding their music.
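A minimal sketch of that initial sync, assuming the standard 'currently playing' endpoint (the helper name and structure are mine, not the app's actual code):

// Hypothetical sketch: anchor a local clock to the player's reported position.
// Returns a function that estimates the user's current position in the song (ms).
async function syncLocalPosition(accessToken) {
  const response = await fetch(
    'https://api.spotify.com/v1/me/player/currently-playing',
    { headers: { Authorization: `Bearer ${accessToken}` } }
  );
  const { progress_ms } = await response.json();
  const syncedAt = performance.now(); // high-resolution local timestamp

  // Infer the current position from the local time elapsed since the sync.
  return () => progress_ms + (performance.now() - syncedAt);
}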
For this local timer to work, every millisecond needs to be tracked and accounted for, or else the local timer will become desynced from the user's actual player and the jump won't be performed at the correct time. The first step of keeping accurate time is (obviously) the timer itself. I need a timer that has sub-millisecond accuracy and can adjust itself for the natural drift that all computer programs experience. On every iteration, the timer needs to calculate the drift (the extra time spent executing code and waiting on the scheduler) since its last step and subtract that drift from the interval (the expected time between iterations).
In the second screenshot, you can see the phrase 'it ran!' printed at the start of every iteration, along with the drift since the last one. That drift is then fed back into the interval for the next iteration, creating a self-adjusting timer.
The third screenshot shows the same timer, but instead of logging the drift on every iteration, it logs the Unix epoch time (the time in ms since 00:00:00 UTC on January 1, 1970) so we can see the timer in action. This timer was set to fire the 'it ran!' code every second. You can see that every iteration is performed exactly 1000 ms (1 second) after the last, with sub-millisecond error, meaning the timer works as intended.
logging unix epoch timestamps
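For reference, the drift correction described above boils down to something like this sketch built on setTimeout; the real implementation may differ in the details.

// Hypothetical sketch: a self-adjusting timer that subtracts measured drift
// from the next interval so ticks stay aligned with the intended schedule.
function startSelfAdjustingTimer(intervalMs, onTick) {
  let expected = performance.now() + intervalMs;

  function step() {
    const drift = performance.now() - expected; // how late this tick fired
    onTick(drift);
    expected += intervalMs;
    // Shorten the next wait by the drift we just measured (never below zero).
    setTimeout(step, Math.max(0, intervalMs - drift));
  }

  setTimeout(step, intervalMs);
}

// Fires roughly every second and logs the drift, like in the screenshots above.
startSelfAdjustingTimer(1000, (drift) => console.log('it ran!', drift));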
This week was all about pre-processing. In order for the beat-finding algorithm to work well, I need to create an ideal environment for it. Trying to find jumps between songs that sound dramatically different is possible, but the results are highly unlikely to sound good. The jump from a calm instrumental track to a high-energy rap/rock track isn’t going to sound smooth no matter how statistically similar the beats are.
The pre-processing (ingestion) phase rearranges the user's current playlist to meet two goals:
Similar-sounding tracks are arranged next to each other, creating a 'flow' through the playlist.
The user's currently playing song stays in the first position of the playlist, and the following tracks are organized around it.
The first step is to convert each song's track information (given to me by Spotify) into a vector. While reducing the data to vectors, it's important to make sure the track IDs are preserved so we can retrieve additional information about the organized tracks later.
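Roughly, that conversion might look like the sketch below, which pulls the audio features for a batch of track IDs and keeps each track's ID next to its vector. The handful of feature keys listed here is just a sample; the real vectors use more of the categories.

// Hypothetical sketch: fetch Spotify's audio features for a batch of tracks
// and reduce each one to a plain numeric vector, keeping the track ID attached.
const FEATURE_KEYS = [
  'acousticness', 'danceability', 'energy',
  'instrumentalness', 'liveness', 'speechiness',
];

async function toFeatureVectors(accessToken, trackIds) {
  const response = await fetch(
    `https://api.spotify.com/v1/audio-features?ids=${trackIds.join(',')}`,
    { headers: { Authorization: `Bearer ${accessToken}` } }
  );
  const { audio_features } = await response.json();
  return audio_features.map((features) => ({
    id: features.id, // preserve the track ID for later lookups
    vector: FEATURE_KEYS.map((key) => features[key]),
  }));
}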
Once I have the vectors, I'm ready to compare. The first solution that came to mind was the K-means clustering algorithm. K-means clustering is a way of grouping similar things together based on their characteristics. It starts by guessing where the centers of the groups might be, assigns each item to the group with the nearest center, recomputes the centers, and repeats until the groups stop changing much. After the clustering was complete, I'd be able to find the current song and build out the playlist starting from its cluster.
The issues with this technique begin when we leave that first cluster. While clustering is great at showing me a handful of similar vectors given an input vector, it’s not as good at creating a larger picture. Knowing which cluster should follow the current song’s cluster adds another layer of complexity and doesn’t even guarantee that the playlist will retain ‘flow’.
A better way to create flow would be to start with the current song and compare it to every other vector in the queue, then pick out the best match and put it right after the first song. I could then repeat this process for the newly placed song (and the songs after that) until the queue has been fully organized. This guarantees that every song is followed by its best match among the remaining tracks and that the songs flow outward from the first track.
The metric I chose for comparing the vectors is cosine similarity. Cosine similarity measures how similar two vectors are based on the cosine of the angle between them in a multi-dimensional space. I chose it because it's a measure I'm familiar with. Unfortunately, cosine similarity tends to be less reliable on low-dimensional vectors (mine are only 12 dimensions), because the angle between two vectors becomes less distinctive as the number of dimensions decreases. Because of this, I'll probably experiment with some other measures (maybe the 2-norm / Euclidean distance?) and try to find the most accurate one. But until then, cosine similarity will have to do.
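Putting the ordering idea and the similarity measure together, a minimal sketch (with made-up function names, not the project's actual code) could look like this:

// Hypothetical sketch: cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy ordering: start from the currently playing track, then repeatedly
// append the most similar remaining track to the end of the chain.
function orderByFlow(currentTrack, remainingTracks) {
  const ordered = [currentTrack];
  const pool = [...remainingTracks];

  while (pool.length > 0) {
    const last = ordered[ordered.length - 1];
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      const score = cosineSimilarity(last.vector, pool[i].vector);
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    }
    ordered.push(pool.splice(bestIndex, 1)[0]);
  }
  return ordered;
}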
As this project grows bigger, my code is becoming less and less consistent. This week I started rewriting some of my older code and added a new terminal element to the UI so people can get updates about what the app is doing. The first thing I did was add error handling to all my API calls so that if they fail, I get some information on what went wrong. I already had error handling in place for most of my calls, but there were some stragglers that I patched up. The second thing I did was standardize my variable naming conventions. A naming convention is the pattern you use to name things. There are lots of different conventions, but the popular ones are camelCase and snake_case. Ultimately it's just a personal preference, and there's no serious advantage to using one casing over another. I decided to use snake_case because it feels more readable and snakes are cooler than camels. I also removed some mysterious underscores from my analysis functions. The functions that had them were actually some of the first code I ever wrote for this project.
Bottom function is named with a mysterious underscore
An example of camelCase being used
I wasn't just renaming variables all week, though. I also created a little logging window so the user can get text updates as the app works through their playlist and songs. I used a class to manage the logger, so writing an update is as simple as:
terminal.log('Queue shuffled successfully!');
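The logger class itself can be very small; something along these lines, where the element ID and class name are just placeholders rather than the app's real ones:

// Hypothetical sketch: a tiny logger class backing the on-page terminal.
class Terminal {
  constructor(elementId) {
    this.element = document.getElementById(elementId);
  }

  log(message) {
    const line = document.createElement('div');
    line.textContent = message;
    this.element.appendChild(line);
    this.element.scrollTop = this.element.scrollHeight; // keep the newest line visible
  }
}

const terminal = new Terminal('terminal'); // after this, the call shown above just works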
I also added a stop button to match the start button so that people can turn the app off. Before, it would turn off when you close the tab or refresh the page, but now you can do it whenever you'd like. If there's anyone thinking about doing an independent project next year, you'll probably want to work on weekends, because I'm not sure if I'm going to have time to really polish this app.
This week I began the process of changing my beat-finding algorithm. This is how my program decides which milliseconds to jump from and land at. My original algorithm was primarily a placeholder, so not much thought went into it. It essentially just compared two measurements that Spotify makes available in their API: timbre and pitch. Timbre can be described as the 'quality' of a sound; it's why different instruments playing the same pitch still sound distinct. A cello and a piano both playing a C are pretty easy to tell apart. Pitch is the way of classifying frequencies that most people are familiar with (A, B, C, C#, and so on). My old algorithm would take two songs, measure their pitch and timbre, then compare the results using Euclidean distance (the square root of the sum of squared differences). This week, I switched that process up a bit.
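For a rough picture of what that placeholder comparison looked like: Spotify's audio analysis describes a song as a series of segments, each with 12 pitch values and 12 timbre values, and the old approach boiled down to measuring the Euclidean distance between those vectors. The sketch below is my reconstruction, not the project's actual code.

// Hypothetical sketch: Euclidean distance between two segments' combined
// pitch + timbre vectors, roughly what the old placeholder compared.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

function segmentDistance(segmentA, segmentB) {
  // Each segment from Spotify's audio analysis carries 12 pitch values
  // and 12 timbre values; concatenate them into one 24-dimensional vector.
  const vectorA = [...segmentA.pitches, ...segmentA.timbre];
  const vectorB = [...segmentB.pitches, ...segmentB.timbre];
  return euclideanDistance(vectorA, vectorB);
}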
Video will appear unavailable - just click "Watch on YouTube" to view it
The most recent showcase
The first step was scheduling a meeting with Mr. Moon, because I don't know that much about music. During our meeting, he seemed to think the most important thing was placing songs that already sound similar next to each other in the queue. I also realized that music is super complicated, and I was probably never going to be able to perfectly meld two songs together at energetic moments like a real DJ. After I left the meeting, I decided that instead of looking for beats that sound really similar, I was going to look for beats that are really quiet. By picking beats that are quiet, the difference between them becomes significantly less relevant, and I was hoping the resulting transition would be smoother. This was pretty easy to do: I just measured the loudness of each beat (expressed in dB), stored the data in a vector, and compared the vectors using Euclidean distance / the 2-norm (again). Then I tried out my new algorithm and recorded some demo videos.
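A sketch of how that quiet-beat search might look, assuming each candidate beat already has a loudness value (in dB) attached to it from Spotify's audio analysis; the silence reference point and the helper name are my own assumptions, not the app's actual code.

// Hypothetical sketch: pick the beat-pair where both beats are quietest.
// Assumes each beat object carries a start time (seconds) and a loudness
// value (dB, more negative = quieter) pulled from Spotify's audio analysis.
function quietestBeatPair(currentSongBeats, nextSongBeats) {
  const SILENCE_DB = -60; // assumed reference point for "silent"
  let best = null;

  for (const from of currentSongBeats) {
    for (const to of nextSongBeats) {
      // Euclidean distance of the pair's loudness vector from near-silence:
      // a small distance means both beats are quiet.
      const distance = Math.hypot(from.loudness - SILENCE_DB, to.loudness - SILENCE_DB);
      if (best === null || distance < best.distance) {
        best = { jumpAt: from.start, landAt: to.start, distance };
      }
    }
  }
  return best;
}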