Song Genre Visualisation and Classification

Song Genre Classification

A song genre is a categorization of music based on shared musical characteristics, such as instrumentation, rhythm, tempo, melody, harmony, and vocal style. Genres can also be defined by their cultural or historical context, or by their lyrical themes.

This project tries to predict the genre of a song using a machine learning approach.

The code classifies songs into genres from audio features: tempo, spectral centroid, and rhythm features, all extracted with librosa functions.
The dataset is a reduced version of the GTZAN dataset obtained from Kaggle, with songs organized by genre.
Features are extracted from the audio files, and the dataset is split into training and testing sets.
A linear Support Vector Machine (SVM) classifier is trained on the training set and evaluated on the testing set.
Classification accuracy is calculated using scikit-learn's accuracy_score function.
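
To make one of the features above more concrete, here is a minimal sketch of what a spectral centroid measures: the magnitude-weighted average frequency of a signal, often described as its "brightness". The project itself obtains this feature through librosa; the NumPy function below, the name spectral_centroid, and the synthetic test tones are assumptions made purely for illustration, and the function works on the whole signal at once rather than frame by frame.

```python
import numpy as np


def spectral_centroid(signal, sample_rate):
    """Return the magnitude-weighted mean frequency (Hz) of a whole signal.

    Illustrative helper only; it is not the librosa function used in the project.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(len(signal))                  # taper the edges to limit leakage
    magnitudes = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        return 0.0                                    # silent input: no meaningful centroid
    return float((freqs * magnitudes).sum() / total)


# Synthetic one-second tones stand in for real audio (assumption for the demo).
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 440.0 * t)
high_tone = np.sin(2 * np.pi * 1760.0 * t)

low_centroid_hz = spectral_centroid(low_tone, sample_rate)
high_centroid_hz = spectral_centroid(high_tone, sample_rate)
```

For a pure tone the centroid sits at the tone's own frequency. Real music mixes many frequencies, so the centroid shifts upward as more energy moves into the higher part of the spectrum.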


Song Genre Classification code

Click here to access the dataset

20% of the dataset was held out for testing, and the model reached an accuracy of 50% on that test set.
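
The split, training, and scoring steps can be sketched as follows. This is not the notebook's code: the feature matrix is random placeholder data, the genre names are a hypothetical subset, and the choice of LinearSVC plus feature scaling is an assumption (a linear SVM could equally be built with SVC(kernel="linear")). Only the 20% test split and the use of accuracy_score come from the description above.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Placeholder data: 25 "songs" per genre with 3 features each
# (standing in for tempo, spectral centroid, and a rhythm feature).
genres = ["blues", "classical", "jazz", "rock"]  # hypothetical subset of genres
n_per_genre = 25
X = np.vstack([
    rng.normal(loc=i, scale=1.0, size=(n_per_genre, 3))
    for i in range(len(genres))
])
y = np.repeat(genres, n_per_genre)

# Hold out 20% of the songs for testing, keeping the genre proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale features using statistics from the training set only (an added assumption).
scaler = StandardScaler().fit(X_train)

clf = LinearSVC(max_iter=10000)
clf.fit(scaler.transform(X_train), y_train)

y_pred = clf.predict(scaler.transform(X_test))
accuracy = accuracy_score(y_test, y_pred)  # fraction of test songs labelled correctly
```

The accuracy printed by this sketch says nothing about the real model, because the data is synthetic. As a point of reference, a classifier that guesses uniformly among k equally represented genres would score about 1/k.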

Visualizations

The following code is a tool to visualize audio signals in the time and frequency domains. It can plot the waveform, spectrogram and pitch contour for a given audio file.


Code

jazz.00000.wav

Audio file

A waveform is a graphical representation of a signal in the time domain, showing how the amplitude (strength) of the signal varies over time.
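
As a small sketch of what goes on the two axes of a waveform plot: each sample's position divided by the sample rate gives its time in seconds, and the sample value is the amplitude. The helper name waveform_axes and the synthetic decaying tone are assumptions for illustration; with a real file, the samples and sample rate would come from an audio loader such as librosa.load.

```python
import numpy as np


def waveform_axes(samples, sample_rate):
    """Return (times in seconds, amplitudes) ready to be plotted as a waveform."""
    amplitudes = np.asarray(samples, dtype=float)
    times = np.arange(len(amplitudes)) / sample_rate  # sample index -> seconds
    return times, amplitudes


# Synthetic half-second decaying tone stands in for a real recording.
sample_rate = 22050
t = np.arange(int(0.5 * sample_rate)) / sample_rate
samples = np.exp(-4.0 * t) * np.sin(2 * np.pi * 220.0 * t)

times, amplitudes = waveform_axes(samples, sample_rate)
duration_s = len(amplitudes) / sample_rate
peak_amplitude = float(np.max(np.abs(amplitudes)))

# With matplotlib available, plt.plot(times, amplitudes) would draw the waveform.
```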

A spectrogram is a visual representation of the spectrum of frequencies in a signal as they vary with time. It provides a way to analyze the frequency content of a signal over time, making it particularly useful for studying how the spectral characteristics of a sound change over different portions of an audio recording.

In a spectrogram, the horizontal axis represents time, the vertical axis represents frequency, and the color intensity (or darkness) represents the magnitude or power of the frequencies present at a particular time.
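
The three quantities just described (time, frequency, and magnitude) can be computed with a short-time Fourier transform. The sketch below is a plain NumPy version written for illustration, not the project's code; the function name spectrogram_db, the frame and hop sizes, and the two-tone test signal are all assumptions.

```python
import numpy as np


def spectrogram_db(signal, sample_rate, frame_length=1024, hop_length=256):
    """Return (freqs, times, db), where db[f, t] is the magnitude in decibels.

    Assumes the signal is at least frame_length samples long.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length

    # Cut the signal into overlapping, windowed frames: one row per frame.
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])

    # FFT of each frame; transpose so rows are frequencies and columns are times.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    db = 20.0 * np.log10(np.maximum(magnitude, 1e-10))  # floor avoids log(0)

    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sample_rate)                    # vertical axis
    times = (np.arange(n_frames) * hop_length + frame_length / 2) / sample_rate  # horizontal axis
    return freqs, times, db


# Test signal: 440 Hz for the first half second, 880 Hz for the second.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 440.0 * t),
                  np.sin(2 * np.pi * 880.0 * t))

freqs, times, db = spectrogram_db(signal, sample_rate)
first_peak_hz = freqs[np.argmax(db[:, 0])]    # strongest frequency in the first frame
last_peak_hz = freqs[np.argmax(db[:, -1])]    # strongest frequency in the last frame
```

Drawing db as an image, with times along the horizontal axis and freqs along the vertical axis, gives the usual spectrogram picture. The jump from one tone to the other shows up as a bright band that steps upward halfway through.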

A pitch contour is a graphical representation of how pitch varies over time in an audio signal. In speech and music, pitch is the perceived frequency of a sound, heard as its "highness" or "lowness", and it usually tracks the sound's fundamental frequency. Plotting pitch against time makes rises, falls, and other fluctuations visible.
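
One simple way to estimate such a contour is to find, frame by frame, the lag at which the signal best matches a shifted copy of itself (autocorrelation) and convert that lag to a frequency. The sketch below illustrates that idea only. The page does not say which pitch tracker the project uses, so the method, the function name pitch_contour, and every parameter value here are assumptions. It also has no voicing detection, so it reports a pitch even for silence or noise.

```python
import numpy as np


def pitch_contour(signal, sample_rate, frame_length=2048, hop_length=512,
                  fmin=80.0, fmax=1000.0):
    """Estimate one pitch value (Hz) per frame using autocorrelation."""
    signal = np.asarray(signal, dtype=float)
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered
    n_frames = 1 + (len(signal) - frame_length) // hop_length

    times = np.empty(n_frames)
    pitches = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop_length
        frame = signal[start:start + frame_length]
        frame = frame - frame.mean()                      # remove any DC offset

        # Keep the non-negative lags of the autocorrelation.
        ac = np.correlate(frame, frame, mode="full")[frame_length - 1:]

        # The strongest peak inside the allowed lag range gives the period.
        lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
        pitches[i] = sample_rate / lag
        times[i] = (start + frame_length / 2) / sample_rate
    return times, pitches


# Test signal: 220 Hz for the first half second, 330 Hz for the second.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 220.0 * t),
                  np.sin(2 * np.pi * 330.0 * t))

times, pitches = pitch_contour(signal, sample_rate)
```

Plotting pitches against times gives the contour. Because lags are whole numbers of samples, the estimates are quantized, and the quantization gets coarser at higher pitches.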
