🎵 Pitch refers to the frequency of a sound, with different frequencies producing different musical notes.
🎶 Note encompasses both pitch and duration, represented by symbols on a staff.
🎤 Tone, or timbre, describes the unique sound quality of a musical note influenced by overtones and instrument characteristics.
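Since pitch is a frequency and a note is a named pitch, one can be mapped to the other. The following is a minimal sketch, assuming twelve-tone equal temperament with A4 tuned to 440 Hz. The function name `frequency_to_note` and the sharps-only spelling are illustrative choices, not something prescribed by the analysis here.

```python
import math

# Sharps-only spelling of the 12 pitch classes (an illustrative convention).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Map a frequency in Hz to the nearest equal-tempered note.

    Returns (note_name, cents_offset), where cents_offset is how far the
    input lies above (+) or below (-) the named note.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note numbers place A4 at 69, with 12 semitones per octave.
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = int(round(midi))
    cents = 100.0 * (midi - nearest)
    octave = nearest // 12 - 1  # MIDI note 60 is C4
    return f"{NOTE_NAMES[nearest % 12]}{octave}", cents


print(frequency_to_note(440.0))   # ('A4', 0.0)
print(frequency_to_note(261.63))  # nearest note is C4, offset close to 0 cents
```

Duration is not covered by this mapping; it would have to come from how long consecutive analysis windows stay on the same note.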
Pitch Estimation with CREPE
Estimating the fundamental frequency of a monophonic sound recording, also known as pitch tracking, is a core task in audio processing, with applications in speech processing and music information retrieval.
CREPE (Convolutional REpresentation for Pitch Estimation) is a deep learning-based pitch estimation algorithm that was proposed in 2018. It is a convolutional neural network (CNN) that operates directly on the time-domain waveform of an audio signal to estimate its pitch. The CNN learns to extract features from the time-domain waveform that are relevant to pitch estimation. These features are then used to predict the pitch of the audio signal.
CREPE is trained in a supervised fashion on audio signals with known pitch labels. In the original paper these were music recordings re-synthesized so that the pitch annotations are exact, and robustness was then checked by adding several kinds of background noise to the test audio. This is what allows CREPE to estimate pitch reliably across a range of instruments and recording conditions.
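CREPE is commonly run through the open-source `crepe` Python package. The sketch below assumes that package and its TensorFlow dependency are installed. The helper names and the 0.5 confidence threshold are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def estimate_pitch_with_crepe(audio, sr, step_size_ms=10):
    """Run CREPE on a mono waveform and return (time, frequency, confidence).

    Requires `pip install crepe` plus TensorFlow.
    """
    import crepe  # imported lazily so the helper below works without it

    time, frequency, confidence, _activation = crepe.predict(
        audio, sr, viterbi=True, step_size=step_size_ms, verbose=0
    )
    return time, frequency, confidence


def keep_confident_frames(time, frequency, confidence, min_confidence=0.5):
    """Drop frames whose confidence is below a (hypothetical) threshold."""
    mask = np.asarray(confidence) >= min_confidence
    return np.asarray(time)[mask], np.asarray(frequency)[mask]
```

CREPE reports a confidence value per frame, so filtering on it is a simple way to ignore silent or unvoiced windows before converting pitch to notes.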
Spectral Centroid
The spectral centroid is a fundamental audio feature that characterizes the "center of mass" of a sound spectrum: the average of the frequencies present, weighted by their magnitudes. It is widely used in audio analysis, mainly for timbre characterization, and is a useful companion to pitch estimates.
A higher spectral centroid indicates that the audio signal contains a significant amount of high-frequency content.
Conversely, a lower spectral centroid suggests that the audio signal is dominated by low-frequency components.
It provides insights into the "brightness" or "dullness" of a sound, with higher values associated with brighter sounds and lower values with duller sounds.
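The "center of mass" definition translates directly into a few lines of NumPy. This is a minimal sketch for a single frame, assuming a Hann window and a magnitude (not power) spectrum; the function name `spectral_centroid` is illustrative.

```python
import numpy as np


def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency (Hz) of one audio frame."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))  # taper to reduce spectral leakage
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    total = magnitude.sum()
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, 0 is returned by convention
    return float((freqs * magnitude).sum() / total)


sr = 8000
t = np.arange(1024) / sr
dull = np.sin(2 * np.pi * 500 * t)     # energy concentrated at 500 Hz
bright = np.sin(2 * np.pi * 3000 * t)  # energy concentrated at 3000 Hz
print(spectral_centroid(dull, sr) < spectral_centroid(bright, sr))  # True
```

Applying the function to successive windows of a recording gives a per-window brightness curve.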
Audio file considered for analysis
Estimations of Pitch, Note and Spectral Centroid per window of the audio file
Here's a brief comparison between a few common methods for pitch estimation:
Autocorrelation:
Autocorrelation measures the similarity between a signal and a time-shifted version of itself. Peaks at non-zero lags mark candidate periods, and the pitch is the sample rate divided by the lag of the chosen peak.
Good for periodic signals with clear fundamental frequencies, but susceptible to noise and to octave errors when a multiple of the true period produces a comparable peak.
Simple to implement: it only requires computing correlations at a range of time shifts.
Widely used in speech analysis due to its simplicity and effectiveness in detecting fundamental frequencies in voiced speech.
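The autocorrelation approach can be sketched as follows. This is a bare-bones version that picks the strongest peak inside an assumed search range of 50 to 1000 Hz; the function name and the range are illustrative assumptions.

```python
import numpy as np


def autocorr_pitch(frame, sr, fmin=50.0, fmax=1000.0):
    """Estimate pitch (Hz) from the strongest autocorrelation peak."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC so it does not dominate
    # Keep only non-negative lags: ac[k] = sum_n frame[n] * frame[n + k]
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = max(1, int(sr / fmax))            # shortest period considered
    lag_max = min(int(sr / fmin), len(ac) - 1)  # longest period considered
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


sr = 22050
t = np.arange(2048) / sr
print(autocorr_pitch(np.sin(2 * np.pi * 220 * t), sr))  # close to 220 Hz
```

Because the lag is an integer number of samples, the estimate is quantized; interpolating around the peak would refine it.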
Fast Fourier Transform (FFT):
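FFT analyzes the frequency content of a signal by transforming it from the time domain to the frequency domain; the pitch is then read off as the frequency of the strongest spectral peak.
Effective for signals with clear and steady tones but less accurate with noisy or transient signals, and its frequency resolution is limited by the length of the analysis window.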
Efficient computationally, especially when using optimized FFT algorithms.
Commonly used in music applications for pitch analysis and detecting fundamental frequencies in harmonic-rich signals.
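A minimal sketch of FFT-based pitch estimation by peak picking is shown below. It assumes the fundamental is the strongest component in the frame, which does not hold for every instrument; the function name and the 50 Hz lower bound are illustrative.

```python
import numpy as np


def fft_peak_pitch(frame, sr, fmin=50.0):
    """Return the frequency (Hz) of the largest FFT magnitude bin above fmin."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))  # taper to reduce spectral leakage
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    magnitude[freqs < fmin] = 0.0  # ignore DC and very low rumble
    return float(freqs[np.argmax(magnitude)])


sr = 22050
t = np.arange(4096) / sr
print(fft_peak_pitch(np.sin(2 * np.pi * 440 * t), sr))  # within one bin of 440 Hz
```

With 4096 samples at 22050 Hz, neighbouring bins are about 5.4 Hz apart, which bounds the accuracy of this simple version.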
Harmonic Product Spectrum (HPS):
HPS multiplies the magnitude spectrum by copies of itself downsampled by integer factors (2, 3, and so on), so that the harmonics line up at the fundamental and reinforce it.
Effective for identifying harmonic structures, especially in music signals.
Moderately complex due to spectral analysis and combining spectra.
Useful in music analysis where harmonic content and identifying fundamental frequencies among harmonics are crucial.
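The multiply-the-downsampled-spectra idea can be sketched as below. The number of harmonics (4) and the 50 Hz lower bound are assumed settings, and the function name `hps_pitch` is illustrative.

```python
import numpy as np


def hps_pitch(signal, sr, num_harmonics=4, fmin=50.0):
    """Estimate pitch (Hz) with a basic Harmonic Product Spectrum."""
    signal = np.asarray(signal, dtype=float)
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    n = len(spectrum) // num_harmonics  # bins available at every downsampling factor
    hps = spectrum[:n].copy()
    for h in range(2, num_harmonics + 1):
        # Taking every h-th bin moves the h-th harmonic onto the fundamental's bin.
        hps *= spectrum[::h][:n]
    k_min = int(np.searchsorted(freqs, fmin))
    k = k_min + int(np.argmax(hps[k_min:]))
    return float(freqs[k])


sr = 8000
t = np.arange(4000) / sr
# A 220 Hz tone whose second harmonic (440 Hz) is louder than its fundamental.
amps = [0.3, 1.0, 0.5, 0.4]
tone = sum(a * np.sin(2 * np.pi * 220 * (i + 1) * t) for i, a in enumerate(amps))
print(hps_pitch(tone, sr))  # close to 220 Hz, although the largest FFT peak is at 440 Hz
```

The example tone is chosen so that plain peak picking would report the octave above, which is the case HPS is meant to handle.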
YIN Algorithm:
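The YIN algorithm computes the squared difference between a signal and shifted versions of itself, normalizes it by its running mean, and takes the first lag at which the result dips below a threshold as the period of the fundamental frequency.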
Provides high precision and robustness, even in noisy environments.
Moderately complex: on top of the difference calculation it adds normalization and thresholding steps.
Commonly used in both music and speech analysis due to its accuracy in detecting pitch, especially in challenging conditions.
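The core of YIN can be sketched in three steps: difference function, cumulative mean normalization, and absolute threshold. This is a simplified sketch that omits the interpolation and local-search refinements of the published algorithm; the threshold of 0.1 and the 50 to 1000 Hz range are assumed settings.

```python
import numpy as np


def yin_pitch(frame, sr, fmin=50.0, fmax=1000.0, threshold=0.1):
    """Simplified YIN pitch estimate (Hz) for one frame."""
    frame = np.asarray(frame, dtype=float)
    tau_min = max(1, int(sr / fmax))
    tau_max = int(sr / fmin)
    w = len(frame) - tau_max  # integration window length
    if w <= 0:
        raise ValueError("frame too short for the requested fmin")

    # Step 1: squared difference d(tau) between the frame and its shifted copy.
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = frame[:w] - frame[tau:tau + w]
        d[tau] = np.dot(diff, diff)

    # Step 2: cumulative mean normalized difference, d(tau) / mean(d[1..tau]).
    cmnd = np.ones(tau_max + 1)
    running_sum = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(running_sum, 1e-12)

    # Step 3: first lag below the threshold, followed down to its local minimum.
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sr / tau

    # No dip below the threshold: fall back to the global minimum.
    return sr / (tau_min + int(np.argmin(cmnd[tau_min:])))


sr = 22050
t = np.arange(2048) / sr
print(yin_pitch(np.sin(2 * np.pi * 220 * t), sr))  # close to 220 Hz
```

Choosing the first dip rather than the deepest one is what reduces octave errors compared with plain autocorrelation.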
Subharmonic Summation:
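For each candidate fundamental, this technique sums the spectral energy found at its integer multiples, usually with decreasing weights. Equivalently, every spectral peak contributes to its subharmonics (integer fractions of its own frequency), and the candidate with the largest sum is chosen.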
Effective for identifying the fundamental frequency even when the fundamental itself is weak, because the subharmonics of the stronger partials still coincide at it.
Can be complex, especially when dealing with multiple subharmonics or non-linear systems.
Useful in specific musical contexts and specialized applications where subharmonic relationships are prominent, like some wind instruments or vocal techniques.
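A simplified sketch of subharmonic summation on a linear-frequency FFT grid follows. The published method works on a logarithmic frequency axis; the version here, the decay factor of 0.84, the five harmonics, and the function name `shs_pitch` are all assumptions made for illustration.

```python
import numpy as np


def shs_pitch(signal, sr, num_harmonics=5, decay=0.84, fmin=50.0, fmax=1000.0):
    """Estimate pitch (Hz) by summing weighted magnitudes at each candidate's harmonics."""
    signal = np.asarray(signal, dtype=float)
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)

    k_min = max(1, int(np.searchsorted(freqs, fmin)))
    # The highest harmonic of every candidate must stay inside the spectrum.
    k_max = min(int(np.searchsorted(freqs, fmax)), (len(spectrum) - 1) // num_harmonics)
    candidates = np.arange(k_min, k_max + 1)

    score = np.zeros(len(candidates))
    for h in range(1, num_harmonics + 1):
        # Harmonic h of candidate bin k sits at bin h * k; higher harmonics weigh less.
        score += decay ** (h - 1) * spectrum[candidates * h]
    return float(freqs[candidates[np.argmax(score)]])


sr = 8000
t = np.arange(4000) / sr
# Partials at 400, 600 and 800 Hz: harmonics of 200 Hz with the fundamental missing.
tone = sum(np.sin(2 * np.pi * f * t) for f in (400, 600, 800))
print(shs_pitch(tone, sr))  # close to 200 Hz
```

The example has no energy at 200 Hz at all, yet the summed score is highest there because all three partials are multiples of it.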