Assignment 2

Alexander Shatov / Unsplash

Is Spotify a form of the Mandala?

Spotify, a music streaming service loved by all. A platform that serves as a beacon to music lovers, one that everyone from children to grandmothers can enjoy. Spotify brings us all together, whether it's sharing our yearly Spotify Wrapped or hating the desktop updates that no one asks for. However, is there a price for this service, apart from the fee that we pay (I hope no one uses Spotify with ads)? To find the answer to this question, I decided to ask Spotify for the data that they store about me. Hopefully, what they sent is all that they track, which is unlikely, but one can hope.

The question, then, is whether my Spotify data contains enough markers to let someone see through my soul. What do the people at Spotify know about me? If my data is being sold, what do the people buying it know about me? If there is enough data, Spotify could be just like the Mandala Cube in Jennifer Egan's book The Candy House. The Mandala Cube is a service where people can opt in to join the collective unconscious and look into each other's memories while giving others access to their own. Spotify is a bit more one-sided, as I just get to listen to music, and they get to potentially peer through my soul.

Why choose Spotify data?

For me, choosing Spotify as my data source made the most sense for a few reasons. Firstly, I had been using Spotify for a few years, so there was plenty of data about me. Secondly, I use Spotify quite frequently, so I was curious about what I could uncover. Lastly, Spotify Wrapped had just come out, and I wanted to know if there was some extra info I could uncover through my own analysis.

How easy was it to request the data?

Spotify has a dedicated page where you can request data. They have two types of requests that you can make: a short-term data request which contains the past 2-3 years, and an extended dataset that contains all of your streaming data. I requested the extended data, which took a bit more than 30 days for me. Your mileage may vary.

What does the data look like?

The above screenshot shows what the data looks like. It comes in multiple JSON files, which I collected into one large file for ease of use. It also contains some sensitive fields, such as IP addresses, so I cleaned the data and kept only the useful fields. In total, I streamed over 152,000 times, which is insane. That adds up to around 192 days of streaming time across 22,000 unique songs. After removing songs that I had listened to for less than 5 minutes, this number dropped to 15,000 unique songs, which is still a lot.
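As a rough sketch of the merge-and-clean step (not the exact code used for this post), the files could be combined and stripped of sensitive fields along these lines. The field names in KEEP_FIELDS are assumptions about the extended streaming history export and should be checked against your own files. The helper names load_streams and summarize are made up for illustration.

```python
import json
from pathlib import Path

# Assumed field names from the extended streaming history export.
# Anything not listed here (for example IP addresses) is dropped.
KEEP_FIELDS = [
    "ts",                                 # timestamp of the stream (UTC)
    "ms_played",                          # milliseconds streamed
    "conn_country",                       # country code of the connection
    "master_metadata_track_name",
    "master_metadata_album_artist_name",
    "spotify_track_uri",
]


def load_streams(folder):
    """Read every JSON file in `folder` and keep only the fields listed above."""
    streams = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            for record in json.load(f):
                streams.append({key: record.get(key) for key in KEEP_FIELDS})
    return streams


def summarize(streams):
    """Return the stream count, total days streamed, and number of unique tracks."""
    total_ms = sum(s["ms_played"] or 0 for s in streams)
    unique_tracks = {s["spotify_track_uri"] for s in streams if s["spotify_track_uri"]}
    return {
        "streams": len(streams),
        "days_streamed": total_ms / (1000 * 60 * 60 * 24),
        "unique_tracks": len(unique_tracks),
    }
```

Calling summarize(load_streams("my_spotify_data")) on a folder of exported files (the folder name is hypothetical) would give the kind of totals quoted above.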

The Dark Side

This graph is particularly interesting to me, since I can trace my location over the past decade using this data. Although there are some data points that do not reflect reality, such as the multiple points in Czechia from when I was using a VPN because Spotify was blocked in my home country, this graph is largely accurate. I was in Pakistan until 2020, when I came to the UAE for Candidate Weekend. I then started going to university here at NYU Abu Dhabi, which is reflected by the back and forth between PK and AE since 2021. The data even captures my visits to other countries since then, like my visit to Azerbaijan, then Czechia (it wasn't a VPN this time), and then Georgia. It was even able to track my stop in Dubai, since I was getting a connecting flight from there to Czechia. Since people are more likely to listen to music while traveling, Spotify data can be quite accurate in tracking your movement. Not only that, but since it keeps logs of your IP address, there is much more personal data that can be extracted from this collected data.

The good thing is that the data isn't too clean. I removed extraneous values during the cleaning process, which showed me a perfect tracking of my location over time. However, things like VPNs mess up this tracking, and the messy data you see on the left is what you would get upon first inspection. Sure, there are ways to denoise this by averaging out the location or removing locations with very low frequency, but it requires some effort.
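One of the denoising ideas mentioned above, removing locations with very low frequency, could look roughly like this. It is only a sketch: the conn_country field name is an assumption about the export, and the function name and 1% threshold are hypothetical.

```python
from collections import Counter


def drop_rare_countries(streams, min_share=0.01):
    """Keep only streams from countries that make up at least `min_share` of all streams."""
    counts = Counter(s["conn_country"] for s in streams)
    total = sum(counts.values())
    keep = {country for country, n in counts.items() if n / total >= min_share}
    return [s for s in streams if s["conn_country"] in keep]
```

A frequency cutoff like this cannot tell a VPN blip from a genuine short trip, so it would also throw away real visits that only produced a handful of streams. That is part of why cleaning this kind of data takes some effort.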

Data Present in the Requested Dataset

I wanted to see what kind of things I could figure out about my listening habits from just the raw data available from the request, without using any API to get detailed records. I visualized my data using Tableau after cleaning it using Python.

Spatial Patterns

Let's go back to the theme of extracting more analytics in the vein of the yearly Spotify Wrapped. Apart from the countries I've streamed from in the past few years, I wanted to see what spatial information I could infer from my streaming records. I've lived in two countries for an extended period of time: Pakistan and the United Arab Emirates. So, I was curious to see what the split was between these two countries when it came to the total number of songs streamed in each. To my surprise, the numbers are almost equal: they differ only by hundreds of streams, which is pretty close considering there are upwards of 150,000 streams.

Seeing this made me realize that I really have spent 3 years in the UAE, something I hadn't really thought about much. So, this statistic brought about a little session of self-reflection.

Number of songs streamed per country

Top 100 Songs

One of the first things I wanted to look at was my Top 100 Song List. I ranked the songs based on the total time I spent listening to them. The song I've listened to the most is Poison Tree by Grouper. I expected this, because even though I first heard this song a year or two ago, I have been listening to this on repeat ever since. What's also interesting is that only a few songs take up a large chunk of the total time I spend listening to music, even when it comes to my Top 100 songs of all time (Spotify only).

Artist and Song - Color and size represent total minutes streamed.  The marks are labeled by Artist and Song.

Comparing cumulative stream time per song widened the gap between the top 100 songs and the other songs even further. Now, only 100 out of the 23,000 tracks account for 10% of the total stream time. Given I've streamed my top song, Poison Tree, for over 10 hours, this seems pretty reasonable. Even though I believe that I explore music quite a lot, it's easy to see that I have clear favorites that I keep coming back to.

To put this into perspective, I compared the total number of times the top 100 songs have been listened to versus the other songs, and the top 100 songs take up a staggering 5% of the total streams. This means that I stream them a lot, considering the other songs number around 23,000.

Just to be sure this wasn't skewed data because I replayed the same songs over and over again without listening to them, I decided to do this comparison with the total streaming time, which was easy to calculate since the raw data contained the time streamed per stream.
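The two comparisons above, share of streams and share of streaming time for the top 100 songs, could be computed with something like the sketch below. The field names are assumptions about the export and top_n_share is a hypothetical helper, not the method actually used for the charts.

```python
from collections import defaultdict


def top_n_share(streams, n=100):
    """Return (share of streams, share of listening time) for the top n tracks by time."""
    plays = defaultdict(int)
    ms = defaultdict(int)
    for s in streams:
        uri = s["spotify_track_uri"]
        if uri is None:
            continue  # skip records without a track identifier
        plays[uri] += 1
        ms[uri] += s["ms_played"]

    total_plays = sum(plays.values())
    total_ms = sum(ms.values())
    if total_plays == 0 or total_ms == 0:
        return 0.0, 0.0

    # Rank tracks by total time listened, as in the Top 100 list.
    top = sorted(ms, key=ms.get, reverse=True)[:n]
    return (
        sum(plays[uri] for uri in top) / total_plays,
        sum(ms[uri] for uri in top) / total_ms,
    )
```

If the time share comes out higher than the stream share, the top songs are mostly being played through. Skipped or restarted plays would push the two numbers the other way.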

Artist Variety

Again wanting to confirm whether I am as exploratory with music as I believe I am, I decided to look at the artists I listen to, with the ones I listen to the most up top. This time, I measured the number of different songs that I have listened to for each artist. To keep things fair and not inflate the numbers, I filtered out songs that I've spent less than 20 minutes listening to. With this done, we see that I really explore the artists that I like, often branching out and listening to other music by the same artist. I've listened to at least 2-3 songs for most artists. This does reflect reality, as I find myself trying to find music similar to what I like using Spotify Radio, or just exploring the artist's discography when I get music fatigue from my current rotation of tracks. You can really see from this data that I like to explore music.
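The count described above (distinct songs per artist, ignoring songs with less than 20 minutes of total listening) could be sketched as follows. The field names are assumptions about the export and songs_per_artist is a hypothetical helper.

```python
from collections import defaultdict


def songs_per_artist(streams, min_minutes=20):
    """Count distinct songs per artist, ignoring songs listened to for under `min_minutes`."""
    ms_per_song = defaultdict(int)
    for s in streams:
        key = (s["master_metadata_album_artist_name"], s["master_metadata_track_name"])
        ms_per_song[key] += s["ms_played"]

    counts = defaultdict(int)
    for (artist, _song), ms in ms_per_song.items():
        if ms >= min_minutes * 60_000:  # 60,000 ms per minute
            counts[artist] += 1
    return dict(counts)
```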

Temporal Elements

I explored the temporal element of my listening habits next. First, we see how my hourly music listening habits have changed over the years. Between 2016 and 2019, my peak listenership remained below 2000 minutes. 2020 is the year I started going to college, and we see a massive upward trend as I listen to music more frequently across all hours of the day. The peak listenership is in 2021, when I had my first on-campus semester at NYUAD. It has since declined, but the numbers have been consistently higher than pre-university numbers. This makes sense to me: I started getting into music more because the people I met introduced me to music very different from what I was used to listening to. This made me want to explore, and I started listening to a greater variety of songs, in many different languages, as a way to connect with the people I met here.
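A year-by-hour breakdown like the one described above could be built from the raw timestamps roughly as follows. This sketch assumes the ts field is a UTC timestamp in the form 2021-03-05T14:23:11Z. The utc_offset_hours parameter is a hypothetical convenience for shifting to local time.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def minutes_by_year_and_hour(streams, utc_offset_hours=0):
    """Sum minutes listened for every (year, hour of day) pair."""
    totals = defaultdict(float)
    for s in streams:
        moment = datetime.strptime(s["ts"], "%Y-%m-%dT%H:%M:%SZ")
        moment += timedelta(hours=utc_offset_hours)  # shift from UTC to local time
        totals[(moment.year, moment.hour)] += s["ms_played"] / 60_000
    return dict(totals)
```

A single fixed offset is a simplification. Someone who moves between countries, as in this dataset, would need a different offset per period to get true local hours.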

Exploratory Habits

Next, I wanted to see how exploratory I have been across the months, and whether there are specific time periods when I'm more likely to listen to songs that I haven't listened to before. To do this, I combined my streaming data with the data about my library, which contains the songs that I have saved and that I know. Then, I could use a track identifier to see if a streamed song was in my library or not. After doing this, I visualized monthly listenership:

There are a few interesting things here. Firstly, I used to have some months where I did not listen to music at all. That has changed, and I am listening to music every month now.

Something that's interesting is that I am listening to music that is not in my library for the same number of minutes as I used to before 2021, which is confusing because I am listening to so much more music now. At first, I thought that I was just going back to my library and listening to the same songs, but I know that I listen to a much larger variety of music now than before. So what's the issue? This can be explained by a change in the way I save music to my library. Previously, I used to be very selective about the songs that I added to my library. As a result, only the songs that I knew really well and had listened to a lot were in my library. Now, I find that it's much more convenient to add songs to my library for easier access, even if I haven't listened to them much. I then remove them if I dislike them, but I stretch out my listening sessions over different periods. With this change in saving habits, it's hard to really see the shift in music exploration behavior. As such, this data isn't really a good metric to judge that.
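The library check described in this section, matching each stream against saved tracks by track identifier and then totaling minutes per month, could be sketched like this. It assumes the saved tracks have already been loaded into a set of track URIs. The library_uris name and the field names are assumptions.

```python
from collections import defaultdict


def monthly_minutes_by_library(streams, library_uris):
    """Split minutes listened per month into library and non-library tracks."""
    totals = defaultdict(lambda: {"in_library": 0.0, "not_in_library": 0.0})
    for s in streams:
        month = s["ts"][:7]  # "YYYY-MM" prefix of the timestamp
        in_library = s["spotify_track_uri"] in library_uris
        bucket = "in_library" if in_library else "not_in_library"
        totals[month][bucket] += s["ms_played"] / 60_000
    return {month: dict(split) for month, split in totals.items()}
```

Because the library is a snapshot from the time of the export, a song saved last week counts as "in library" for streams from years ago. That is the same limitation as the one described above.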

Data Requested via Spotify API

Even though the data sent by Spotify gave me lots of information, I know that Spotify has MUCH more data about the different artists and songs that they have on their platform. Features like the song's BPM, the genre of music that an artist creates, the danceability and so on are present in their databases. Since Spotify made it convenient for me and included a unique identifier field, called the track URI, in the data that they sent, getting the extra data should have been a piece of cake.

However, I got rate-limited for using the Spotify API, and the annoying thing is that Spotify doesn't disclose what the limit is. I found out because my code kept giving me a 429 (Too Many Requests) error. Still, I was able to extract genre information about the songs that I've streamed, and I'll be exploring those next.
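Rate-limited responses from the Spotify Web API come back with HTTP status 429 and a Retry-After header giving the number of seconds to wait. One common way to cope, shown here only as a sketch, is to wrap each request in a retry loop. The fetch callable is a hypothetical stand-in for whatever function performs the request. It is assumed to return a (status_code, headers, body) tuple.

```python
import time


def call_with_retry(fetch, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status != 429:
            return status, body
        # Prefer the server's Retry-After value (seconds); otherwise back off
        # exponentially: 1s, 2s, 4s, ...
        wait = float(headers.get("Retry-After", default_wait * 2 ** attempt))
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing sleep as a parameter is just a design convenience. It lets the loop be tested without actually waiting.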

I did the same thing as before, grouping up the genres and measuring which ones I've listened to the most. The metric I used here is the number of minutes listened, and modern rock is my top genre, with pop, rock, indie, and alt rock coming next. I filtered out genres that I have listened to for less than 100 minutes in total, as there is a lot of subgenre data included through the API request, and I don't think I know a genre if I haven't even spent ~1.5 hours listening to it.

I did the same thing again but changed the metric to the number of distinct songs listened to, and by that measure I've listened to more pop songs than anything else. But I think total listening time is a much better metric.
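The minutes-per-genre ranking with the 100-minute cutoff could be computed along these lines. This is a sketch under assumptions: it supposes the genre lookups have already been collected into a dictionary mapping artist name to a list of genres (the hypothetical artist_genres), and it counts a stream toward every genre attached to its artist.

```python
from collections import defaultdict


def genre_minutes(streams, artist_genres, min_minutes=100):
    """Total minutes per genre, dropping genres with under `min_minutes` of listening."""
    totals = defaultdict(float)
    for s in streams:
        artist = s["master_metadata_album_artist_name"]
        for genre in artist_genres.get(artist, []):
            totals[genre] += s["ms_played"] / 60_000
    return {genre: minutes for genre, minutes in totals.items() if minutes >= min_minutes}
```

Counting a stream once per genre means the genre totals add up to more than the total listening time, so they are better read as a ranking than as a breakdown.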

Connection to The Candy House

Spotify, much like the collective unconscious in The Candy House, is a subscription that allows people to share data. In the case of the Mandala Cube, this data was memory data, and so users of the Mandala Cube could access one another's memories. Spotify is similar: you use it as a service to listen to music, but the platform stores a ton of data about your listening habits, and even spatial markers on you. Also, since Spotify allows users to view what their friends are listening to, it comes even closer to the Mandala Cube, which lets you peer into the lives of others. I can see who is listening to what type of music and infer things about them through their listening habits. I can also get my data from Spotify and go back in time, just as the Mandala Cube lets you see memories that you've forgotten.

What would Roxy say about my data?

Roxy, a character in The Candy House, is an interesting character. She's a recovering drug addict, and the parts of the story that show us her character and her life are very raw, because drug addiction is a very real problem, and we see the effects of her struggle with substance abuse in the form of her escapist attitude. She is a proponent of the Mandala Cube, as she uses it to go back in time and relive the memories of the trip to London with her father, and her strained relationship with her sister after that trip. Given that she uses the Mandala Cube, I feel that Roxy would be really interested in Spotify data in general, not just mine, because of the way it allows you to relive the past. If music is an integral part of your life, you can really tap into past memories just by listening to the songs you used to listen to. I feel that if Roxy used Spotify, she'd be excited about her yearly Wrapped, and would want to know more about the data that Spotify stores on her so that she could come back and relive her past memories.
