Music Re-Discoverer

A system that allows you to dig into your listening data to re-discover music that you have already listened to, but might have forgotten.

A bar graph showing the number of tracks played per day from December 2024 to May 2025. The data shows a peak in listening activity around the New Year.

Figure 1: Timeline of listening activity showing the total number of tracks played per day, December 2024 - May 2025.

A node graph of a 7-node AI agent processing pipeline. The graph starts with the 'Classify Intent' node, which leads to 'Extract Entities'. From there, the flow splits into three distinct parallel nodes: 'Describe My Listening', 'Get Biographical Information', and 'Get Music Recommendations'. The output from these three paths merges and then proceeds to the 'Execute SQL Query' node, and finally to the 'Generate Response' node.

Figure 2: A LangGraph network diagram illustrating the 7-node processing pipeline of the AI agent.
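The topology in Figure 2 can be written down as a small dependency graph. The sketch below uses only the Python standard library rather than LangGraph itself. The node names come from the figure, but the edge list is an assumption read off the figure description and is not taken from the project's code.

```python
from graphlib import TopologicalSorter

# Node names are taken from Figure 2. The edges are an assumption based on
# the figure description, not the project's actual LangGraph definition.
BRANCHES = [
    "Describe My Listening",
    "Get Biographical Information",
    "Get Music Recommendations",
]

# Map each node to the set of nodes that must run before it.
predecessors = {
    "Classify Intent": set(),
    "Extract Entities": {"Classify Intent"},
    **{branch: {"Extract Entities"} for branch in BRANCHES},
    "Execute SQL Query": set(BRANCHES),
    "Generate Response": {"Execute SQL Query"},
}

# static_order() yields one valid execution order for the graph.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

The three branch nodes may appear in any order relative to each other, since nothing in the graph forces one to run before another.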

An Entity-Relationship (ER) diagram showing the database schema for music listening data. The diagram illustrates the relationships between the main entities: artists, albums, tracks and tags.

Figure 3: Entity-Relationship (ER) diagram of the music listening database. Also available at https://dbdiagram.io/d/last-fm_erdiagram-68a9d7b11e7a6119674638b6.
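To give a feel for the kind of query such a schema supports, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names, the `plays` table, and the sample rows are all hypothetical placeholders. The actual schema is the one shown in Figure 3.

```python
import sqlite3

# Hypothetical, simplified schema: the real table and column names are
# those in the ER diagram, and the "plays" table is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tracks (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    artist_id INTEGER NOT NULL REFERENCES artists(id)
);
CREATE TABLE plays (
    track_id INTEGER NOT NULL REFERENCES tracks(id),
    played_at TEXT NOT NULL  -- ISO 8601 date string
);
""")

# Made-up placeholder rows, for illustration only.
conn.executemany("INSERT INTO artists VALUES (?, ?)",
                 [(1, "Artist A"), (2, "Artist B")])
conn.executemany("INSERT INTO tracks VALUES (?, ?, ?)",
                 [(1, "Song 1", 1), (2, "Song 2", 1), (3, "Song 3", 2)])
conn.executemany("INSERT INTO plays VALUES (?, ?)",
                 [(1, "2025-01-01"), (2, "2025-01-01"), (1, "2025-01-02"),
                  (3, "2025-01-02"), (3, "2025-02-10")])


def top_artists(conn, start, end, limit=5):
    """Return (artist name, play count) rows for start <= played_at < end."""
    return conn.execute(
        """
        SELECT a.name, COUNT(*) AS n_plays
        FROM plays p
        JOIN tracks t ON t.id = p.track_id
        JOIN artists a ON a.id = t.artist_id
        WHERE p.played_at >= ? AND p.played_at < ?
        GROUP BY a.id
        ORDER BY n_plays DESC, a.name
        LIMIT ?
        """,
        (start, end, limit),
    ).fetchall()


january = top_artists(conn, "2025-01-01", "2025-02-01")
print(january)
```

Passing the date bounds as query parameters, rather than formatting them into the SQL string, keeps the query safe to reuse with values extracted from user input.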

Keywords

Music Tech, Streaming Data, Human-Centered Intelligent/AI Systems.

Links

GitHub page (data collection, EDA, music agent notebooks).

Technologies

Python (LangGraph, SBERT, sklearn, plotly), SQL (SQLite), Last.fm API, Google Colab.

Background

While modern music streaming services typically focus on discovering entirely new music, a user's own past listening history can often be a more powerful source for personalized and interesting recommendations.

Aim

The aim of this project was to create a system that enables the intelligent re-discovery of tracks and artists from a user's own listening history.

Approach

The project follows a complete data science pipeline. It begins with data collection, using a custom Last.fm API wrapper to fetch personal music streaming data. This data is then structured into a SQL database for efficient querying. A dedicated notebook provides interactive visualizations of listening patterns for exploratory analysis. Finally, an AI-powered music discovery agent allows natural language interaction with the database. The agent uses a LangGraph workflow and combines sentence transformers with regex patterns for intent classification and entity extraction, supporting queries such as "tell me about Radiohead", "what were my top artists last month", and "find music similar to Radiohead".
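The regex side of intent classification and entity extraction can be illustrated with a short sketch. The intent labels, patterns, and the `parse_query` function below are hypothetical, modeled only on the three example queries. They are not the project's actual rules, and the sentence-transformer component is not shown.

```python
import re

# Illustrative assumptions: these labels and patterns are modeled on the
# three example queries and are not the project's actual rules.
INTENT_PATTERNS = [
    ("recommendations",
     re.compile(r"\bsimilar to\s+(?P<artist>.+)$", re.IGNORECASE)),
    ("biography",
     re.compile(r"\btell me about\s+(?P<artist>.+)$", re.IGNORECASE)),
    ("listening",
     re.compile(r"\bmy top (?P<kind>artists|tracks|albums)\b"
                r"(?:\s+(?P<period>.+))?$", re.IGNORECASE)),
]


def parse_query(text):
    """Return (intent, entities) for the first matching pattern.

    Falls back to ("unknown", {}) when no pattern matches.
    """
    cleaned = text.strip().rstrip("?.!")
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(cleaned)
        if match:
            # Keep only the named groups that actually captured something.
            entities = {key: value.strip()
                        for key, value in match.groupdict().items() if value}
            return intent, entities
    return "unknown", {}


for query in ("tell me about Radiohead",
              "what were my top artists last month",
              "find music similar to Radiohead"):
    print(query, "->", parse_query(query))
```

Regex rules like these only cover phrasings that were anticipated in advance, which is one reason to pair them with an embedding-based classifier for everything else.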

Findings

A network analysis of artist similarity relationships, based on the Last.fm API similarity score, was performed during the exploratory data analysis (EDA). This analysis revealed a network of 328 artists connected through 377 relationships, with an average of 2.3 connections per artist. The network was composed of somewhat fragmented communities, though this is likely because the dataset is a temporal snapshot. For the time period explored, the most connected nodes were jazz ensembles and Italian artists.
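The reported average follows from the node and edge counts if the similarity network is treated as undirected, so that each relationship contributes to the degree of two artists. The sketch below makes that arithmetic explicit. The undirected reading is an assumption, and the density figure is an additional derived quantity, not one reported in the analysis.

```python
# Counts reported in the network analysis.
n_artists = 328
n_relationships = 377

# Assumption: the graph is undirected, so every edge adds one connection
# to each of its two endpoints.
average_degree = 2 * n_relationships / n_artists

# Density: the share of all possible artist pairs that are connected.
density = 2 * n_relationships / (n_artists * (n_artists - 1))

print(round(average_degree, 1))  # 2.3
print(density)
```

A low density is consistent with the fragmented community structure described above.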

The main outcome of this work was a simple chat interface that allows users to interact with their listening history. In particular, the system lets a user find biographical information about artists they have listened to, as well as identify other similar artists in the database. In addition to looking at top streamed tracks, users can interactively explore many artists through an iterative process that does not require them to write any SQL or Python code.
