● Created a model that labels segments of a song as Chorus, Verse, Intro, Outro, or Bridge, with the goal of improving on existing “AutoMix” features from providers like Spotify and Apple Music.
● Compared the performance of CNN, LSTM, and combined CNN-LSTM architectures.
● Handled preprocessing, model tuning and training, and postprocessing on 1000+ songs from SALAMIdb and the Harmonix Set, achieving 70% accuracy. This is competitive with baseline models that were developed under more favorable conditions.
Research on this specific task is very new; the most recent publication is SongFormer, from late 2025. With less than one-sixth of their data, a significantly simpler model, and only three weeks of work, we built a model that rivaled their baseline.
Our model combines an LSTM and a CNN. It takes positional and sequencing context, RMS and spectrum averages, and spectrogram data as input, and classifies song segments created using Librosa. We overcame three big hurdles:
Relatively small data sets
Label confusion, as the spectrograms of a chorus and a verse can look very similar
Very inconsistent data quality
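To make the model inputs described above more concrete, here is a minimal sketch of how one segment could be summarised into RMS statistics, band-averaged spectrum values, and positional context. It is an illustration under stated assumptions, not our actual pipeline: the function name `segment_features`, the frame and hop sizes, the eight frequency bands, and the layout of the feature vector are all hypothetical. The project itself used Librosa; this sketch uses plain NumPy only so that it runs without extra dependencies.

```python
import numpy as np


def segment_features(y, sr, start_s, end_s, index, n_segments,
                     frame=2048, hop=512, n_bands=8):
    """Summarise one song segment as a flat feature vector.

    Hypothetical layout (an assumption for illustration):
    [rms_mean, rms_std, n_bands band averages,
     relative start, relative end, relative index].
    """
    track_len_s = len(y) / sr
    seg = y[int(start_s * sr):int(end_s * sr)]

    # Pad very short segments so that at least one full frame exists.
    if len(seg) < frame:
        seg = np.pad(seg, (0, frame - len(seg)))

    # Slice the segment into overlapping frames.
    n_frames = 1 + (len(seg) - frame) // hop
    frames = np.stack([seg[i * hop:i * hop + frame] for i in range(n_frames)])

    # Per-frame RMS energy, summarised by its mean and standard deviation.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))

    # Magnitude spectrogram (frames x frequency bins) from a windowed FFT.
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))

    # Time-averaged spectrum, pooled into a few coarse frequency bands.
    avg_spectrum = spec.mean(axis=0)
    band_avgs = [band.mean() for band in np.array_split(avg_spectrum, n_bands)]

    # Positional context: where the segment sits within the track.
    position = [start_s / track_len_s,
                end_s / track_len_s,
                index / max(1, n_segments - 1)]

    return np.concatenate([[rms.mean(), rms.std()], band_avgs, position])
```

Calling `segment_features(y, sr, 0.0, 1.5, index=0, n_segments=2)` on a mono signal `y` returns a vector of 13 values with the default settings. In a CNN and LSTM combination, the full spectrogram (`spec` here) would typically feed the convolutional part, while a sequence of per-segment vectors like this one would give the recurrent part its ordering context.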
Our full report can be found here. Please read it if you are interested in how we overcame these hurdles, our detailed results, and how we plan to apply our findings in the future.