A system that allows you to dig into your listening data to re-discover music that you have already listened to, but might have forgotten.
Figure 1: Timeline of listening activity showing the total number of tracks played per day, December 2024 - May 2025.
Figure 2: A LangGraph network diagram illustrating the 7-node processing pipeline of the AI agent.
Figure 3: Entity-Relationship (ER) diagram of the music listening database. Also available at https://dbdiagram.io/d/last-fm_erdiagram-68a9d7b11e7a6119674638b6.
Keywords
Music Tech, Streaming data, Human-Centered Intelligent/AI Systems.
Links
GitHub page (data collection, EDA, and music agent notebooks).
Background
While modern music streaming services typically focus on discovering entirely new music, a user's own past listening history can often be a more powerful source for personalized and interesting recommendations.
Aim
The aim of this project was to create a system that enables the intelligent re-discovery of tracks and artists from a user's own listening history.
Approach
The project follows an end-to-end data science pipeline. A custom Last.fm API wrapper fetches personal music streaming data, which is then stored in a SQL database for efficient querying. A dedicated notebook provides interactive visualizations of listening patterns for exploratory analysis. Finally, an AI-powered music discovery agent allows natural language interaction with the database. The agent is built as a LangGraph workflow and combines sentence transformers with regex patterns for intent classification and entity extraction. It supports queries such as "tell me about Radiohead", "what were my top artists last month", and "find music similar to Radiohead".
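As an illustration of the regex half of the intent classification and entity extraction step, here is a minimal sketch. The intent labels, patterns, and the classify function are hypothetical and are not taken from the project's code. The sentence-transformer component is left out; in a fuller version it could plausibly act as a fallback when no pattern matches.

```python
import re
from typing import Optional, Tuple

# Hypothetical intent labels and patterns (assumptions for illustration only).
# Order matters: the first pattern that matches decides the intent.
INTENT_PATTERNS = [
    ("similar_artists", re.compile(r"\bsimilar to\s+(?P<artist>.+)$", re.IGNORECASE)),
    ("top_artists", re.compile(r"\btop artists\b", re.IGNORECASE)),
    ("artist_info", re.compile(r"\btell me about\s+(?P<artist>.+)$", re.IGNORECASE)),
]


def classify(query: str) -> Tuple[str, Optional[str]]:
    """Return (intent, artist) for a query; artist is None if not captured."""
    text = query.strip().rstrip("?.!")
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(text)
        if match:
            # Patterns without an 'artist' group yield an empty groupdict.
            artist = match.groupdict().get("artist")
            return intent, artist.strip() if artist else None
    # A semantic (embedding-based) classifier could take over here.
    return "unknown", None


print(classify("tell me about Radiohead"))
print(classify("what were my top artists last month"))
print(classify("find music similar to Radiohead"))
```

With the three example queries above, this sketch returns the intents artist_info, top_artists, and similar_artists respectively, with "Radiohead" extracted as the artist where one is named.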
Findings
During the exploratory data analysis (EDA), a network analysis of artist similarity relationships was performed, based on the similarity scores returned by the Last.fm API. The analysis revealed a network of 328 artists connected through 377 relationships, with an average of 2.3 connections per artist. The network was composed of somewhat fragmented communities, though this is likely because the dataset is a temporal snapshot. For the time period explored, the most connected nodes were jazz ensembles and Italian artists.
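The reported average follows from the node and edge counts: in an undirected network each relationship contributes to the degree of two artists, so the average degree is 2 x 377 / 328, or about 2.3. The sketch below checks that arithmetic and shows one simple way fragmentation could be measured, by counting connected components. The function names and the toy edge list are illustrative assumptions, not the project's data or code.

```python
from collections import defaultdict


def average_degree(num_nodes: int, num_edges: int) -> float:
    """Average degree of an undirected graph: each edge touches two nodes."""
    return 2 * num_edges / num_nodes


def connected_components(edges):
    """Group nodes into connected components with an iterative depth-first search.

    Only nodes that appear in at least one edge are considered.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Counts reported in the findings.
print(round(average_degree(328, 377), 1))

# Toy edge list (hypothetical artists) with two separate communities.
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
print(len(connected_components(toy_edges)))
```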
The main outcome of this work was a simple chat interface that allows users to interact with their listening history. In particular, the system lets a user find biographical information about artists they have listened to and identify similar artists in the database. In addition to viewing their top streamed tracks, users can explore artists iteratively without writing any SQL or Python code.
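To make concrete what the chat interface spares the user from writing, here is a sketch of the kind of query that could sit behind a "top artists last month" request. The scrobbles table, its columns, the sample rows, and the top_artists function are all hypothetical; the project's actual schema is the one shown in the ER diagram.

```python
import sqlite3


def top_artists(conn, start, end, limit=5):
    """Return (artist, plays) pairs for plays with start <= played_at < end."""
    query = """
        SELECT artist, COUNT(*) AS plays
        FROM scrobbles
        WHERE played_at >= ? AND played_at < ?
        GROUP BY artist
        ORDER BY plays DESC, artist ASC
        LIMIT ?
    """
    return conn.execute(query, (start, end, limit)).fetchall()


# In-memory database with a hypothetical single-table schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scrobbles (artist TEXT, track TEXT, played_at TEXT)")
conn.executemany(
    "INSERT INTO scrobbles VALUES (?, ?, ?)",
    [
        ("Radiohead", "Reckoner", "2025-04-03T10:00:00"),
        ("Radiohead", "Nude", "2025-04-10T21:15:00"),
        ("Miles Davis", "So What", "2025-04-12T08:30:00"),
        ("Radiohead", "Airbag", "2025-03-28T19:45:00"),
    ],
)

# ISO 8601 timestamps sort lexicographically, so string comparison is enough here.
print(top_artists(conn, "2025-04-01", "2025-05-01"))
```

Parameterized placeholders keep user-derived values such as date ranges out of the SQL string itself, which matters when the inputs originate from free-form chat messages.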