Pitch Detection Overview

This page broadly describes several modern approaches to pitch detection.

Klapuri's Iterative Approach

This technique was proposed and refined in three or four papers by Anssi Klapuri (U. Tampere, Finland) between 1997 and 2006.

  • Intuitive and fairly straightforward
  • Achieves an error rate below 5% on polyphonic mixtures of up to 4 voices
  • Real-time potential of the algorithm has not yet been evaluated
How it works:
  1. The signal is acquired, and noise suppression is performed.
  2. Predominant F0 estimation is performed.
  3. The spectrum is smoothed, and the F0 choice is re-evaluated.
  4. The harmonic series for the detected F0 is removed from the spectrum.
  5. The number of remaining voices is estimated.
  6. Return to step 2 if necessary.
  • Each of these steps relies on further tricks and more elaborate techniques to handle the details and reduce error rates.
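The estimate-and-cancel loop above can be illustrated with a deliberately simplified sketch. It is not Klapuri's published algorithm: noise suppression (step 1) and spectral smoothing (step 3) are omitted, salience is a plain 1/h-weighted harmonic sum over a fixed candidate grid, detected partials are simply zeroed, and the polyphony estimate (step 5) is replaced by a hypothetical `stop_ratio` rule. All function and parameter names are illustrative assumptions.

```python
def harmonic_salience(spectrum, f0, bin_hz, n_harmonics):
    """1/h-weighted sum of the magnitudes at the first n_harmonics partials of f0."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        k = round(h * f0 / bin_hz)  # nearest bin to the h-th harmonic
        if k >= len(spectrum):
            break
        total += spectrum[k] / h
    return total


def iterative_f0_estimates(spectrum, bin_hz, f_min=100.0, f_max=1000.0, f_step=1.0,
                           n_harmonics=10, stop_ratio=0.3, max_voices=6):
    """Hypothetical estimate-and-cancel loop over a magnitude spectrum (list of floats)."""
    residual = list(spectrum)  # copy, so the caller's spectrum is left untouched
    count = int(round((f_max - f_min) / f_step)) + 1
    candidates = [f_min + i * f_step for i in range(count)]

    def salience(f0):
        return harmonic_salience(residual, f0, bin_hz, n_harmonics)

    found, first = [], None
    while len(found) < max_voices:
        best_f0 = max(candidates, key=salience)  # step 2: predominant F0
        best = salience(best_f0)
        # Stand-in for step 5 (assumption, not Klapuri's polyphony estimator):
        # stop once the strongest remaining candidate is weak relative to the first.
        if best <= 0.0 or (first is not None and best < stop_ratio * first):
            break
        if first is None:
            first = best
        found.append(best_f0)
        # Step 4, crudely: zero the bins nearest to each detected harmonic.
        for h in range(1, n_harmonics + 1):
            k = round(h * best_f0 / bin_hz)
            if k >= len(residual):
                break
            residual[k] = 0.0
    return found
```

For example, a synthetic spectrum at 1 Hz per bin holding two harmonic tones at 220 Hz and 330 Hz (partial amplitudes 1/h) yields both fundamentals. Zeroing also deletes partials the two voices share (660 Hz, 1320 Hz, ...), so this crude cancellation weakens the second detection; a more careful subtraction is needed to preserve them. The 1/h weighting is what keeps the subharmonic candidate at 110 Hz, which collects partials from both voices, from winning the first round.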

The McLeod Pitch Method (MPM)

This technique is described in "A Smarter Way to Find Pitch" (Philip McLeod and Geoff Wyvill, 2005, New Zealand).

  • high frequency resolution (differences of ±1 cent are reliably detected)
  • accurate with small time windows (512 to 4096 samples at 44,100 Hz)
  • detects pitch, not fundamental frequency
  • runs in real time (confirmed)
  • uses the FFT to compute the autocorrelation function (ACF) quickly (Rabiner, L. and Schafer, R., Digital Processing of Speech Signals, Prentice Hall, 1978); the paper gives the algorithm, which is simple
  • only works on monophonic sounds
  • fundamental period detection scheme relies on tuned thresholds
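The FFT shortcut rests on the Wiener-Khinchin relation: the ACF is the inverse transform of the power spectrum |X(f)|², provided the frame is zero-padded to at least twice its length so that the circular correlation does not wrap around. Below is a minimal, dependency-free sketch; the hand-written radix-2 FFT stands in for a real FFT library, and the function names are illustrative, not taken from the paper or from Rabiner and Schafer.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def acf_via_fft(x):
    """r(tau) = sum_j x[j] * x[j + tau] for tau = 0 .. len(x) - 1, via the FFT."""
    n = len(x)
    size = 1
    while size < 2 * n:  # pad to >= 2n so the circular correlation cannot wrap
        size *= 2
    spectrum = fft(list(x) + [0.0] * (size - n))
    power = [abs(c) ** 2 for c in spectrum]  # |X(f)|^2, real-valued
    # Inverse FFT via conjugation: ifft(P) = conj(fft(conj(P))) / size, and P is real.
    r = fft(power)
    return [r[tau].real / size for tau in range(n)]


def acf_direct(x):
    """Reference O(n^2) computation of the same quantity."""
    n = len(x)
    return [sum(x[j] * x[j + tau] for j in range(n - tau)) for tau in range(n)]
```

The direct sum costs O(W²) per frame of W samples, while the FFT route costs O(W log W), which is what makes real-time operation practical for windows of a few thousand samples.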
How it works:
  • Computes a "Normalized Square Difference Function" (NSDF)
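    • Computes the ACF, r'(τ) = Σ x_j·x_{j+τ}
    • Computes a second summation, m'(τ) = Σ (x_j² + x_{j+τ}²), as a normalization value, so that the NSDF n'(τ) = 2r'(τ) / m'(τ) lies between -1 and 1 (both sums run over j = 0 to W-1-τ for a window of W samples)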
    • Time is centered over the sample window.
  • Selects key maxima from the NSDF output
    • Finds all local maxima
    • Identifies the regions of the output bounded on the left by a positively sloped zero-crossing and on the right by a negatively sloped zero-crossing
    • The largest maximum within each of these regions is a "key" maximum
    • The precise position of each of the key maxima is determined using parabolic interpolation over the surrounding points
  • Chooses a key maximum as the pitch
    • Identifies the highest key maximum and multiplies its height by a constant k to obtain a threshold.
    • Chooses the first key maximum (smallest lag) whose height is above that threshold.
    • k is typically in the range 0.8 to 1.0
  • The clarity measure is the actual height of the selected peak.
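The steps above can be sketched compactly as follows. This is an illustration under stated assumptions, not McLeod's reference code: the NSDF is computed directly in O(W²) for readability (the r'(τ) term could instead use the FFT, as noted earlier), "above that threshold" is treated as "at or above" so that k = 1.0 still returns a pitch, the default k = 0.9 is an assumed value within the 0.8 to 1.0 range, and a region left open at the end of the window is ignored. All names are hypothetical.

```python
import math


def nsdf(x):
    """Normalized square difference function n'(tau) = 2 r'(tau) / m'(tau)."""
    w = len(x)
    out = []
    for tau in range(w):
        r = sum(x[j] * x[j + tau] for j in range(w - tau))            # ACF term r'(tau)
        m = sum(x[j] ** 2 + x[j + tau] ** 2 for j in range(w - tau))  # normalization m'(tau)
        out.append(2.0 * r / m if m > 0 else 0.0)
    return out


def key_maxima(n):
    """Index of the largest value in each region that opens at a positively
    sloped zero crossing and closes at the next negatively sloped one."""
    start = 1
    while start < len(n) and n[start] > 0:  # skip the lobe around tau = 0
        start += 1
    keys, best = [], None
    for tau in range(start, len(n)):
        if n[tau - 1] <= 0 < n[tau]:       # positive-going crossing: open a region
            best = tau
        elif n[tau - 1] > 0 >= n[tau]:     # negative-going crossing: close it
            if best is not None:
                keys.append(best)
            best = None
        elif best is not None and n[tau] > n[best]:
            best = tau
    return keys  # a region still open at the end of the window is ignored


def parabolic_peak(n, i):
    """Refine a peak with the parabola through samples i-1, i, i+1."""
    if i <= 0 or i >= len(n) - 1:
        return float(i), n[i]
    a, b, c = n[i - 1], n[i], n[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0:
        return float(i), b
    shift = 0.5 * (a - c) / denom  # vertex offset from i, in samples
    return i + shift, b - 0.25 * (a - c) * shift


def mpm_pitch(x, sample_rate, k=0.9):
    """Return (frequency_hz, clarity) for a monophonic frame, or None."""
    n = nsdf(x)
    peaks = [parabolic_peak(n, i) for i in key_maxima(n)]  # (lag, height), by lag
    if not peaks:
        return None
    threshold = k * max(height for _, height in peaks)
    for lag, height in peaks:
        if height >= threshold:               # first key maximum at or above threshold
            return sample_rate / lag, height  # clarity = height of the chosen peak
    return None


if __name__ == "__main__":
    sr = 44100
    frame = [math.sin(2 * math.pi * 220 * i / sr) + 0.8 * math.sin(2 * math.pi * 440 * i / sr)
             for i in range(1024)]
    print(mpm_pitch(frame, sr))
```

In the demo frame the 440 Hz partial is strong, yet the NSDF is negative near its period (about 100 samples) because the 220 Hz component is out of phase there, so the first key maximum sits near a lag of 200 samples and the reported pitch is close to 220 Hz.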
Ideas for improvement:
  • Take the FFT of the correlation, thereby finding the dominant period
    • this might not work on an ACF-like function that was itself computed via the FFT in the first place
    • maybe with smoothing?