Estimation of inharmonicity in Kora recordings using harpsichord analysis techniques
1 Introduction
This project is concerned with the kora, a 21-string ‘lute-harp’ commonly played in West African folk music. The kora is tuned to a 7-note scale. It is played with both hands, with the thumbs plucking the lower-pitched strings to provide an ostinato or groove (the Kumbengo), and the fingers picking out more detailed melodic lines (the Birimitingo). Recordings of virtuoso kora players such as Toumani Diabaté and Ballaké Sissoko have received widespread critical acclaim through the popularisation of ‘world music’, yet very few examples of music analysis research exist for this instrument. Conversely, the harpsichord is a widely studied and well-understood instrument, with many examples of research into harpsichord acoustics and tuning in existence [Tidhar et al., 2010, Chadefaux, 2013, Dixon et al., 2012]. The kora and the harpsichord have similar spectral characteristics, because both instruments use plucked strings to generate tones. It is therefore possible to apply well-documented analysis techniques from the harpsichord to the kora. In this study I will present a method of estimating the inharmonicity of different kora recordings, using an algorithm described in [Dixon et al., 2012].
Figure 1: Toumani and Sidiki Diabaté, two kora players from Mali
The kora is constructed from a large resonant chamber made from a calabash gourd. The 21 strings are often made using fishing wire, with many thicker strands woven together to provide the lower bass notes. They are attached to a long neck with leather straps, which can be moved up and down the neck for tuning, although many modern koras make use of tuning pegs similar to those seen on a guitar. At the other end, the strings attach to a floating bridge with notches to hold each string in place. Two vertical posts on either side of the bridge allow the performer to hold the kora upright, facing the strings, with the thumb and forefinger free to pluck the strings.
Figure 2: How a kora is constructed
The 21 strings on a traditional kora are arranged in two parallel lines running vertically down the side of the bridge. The strings are typically tuned to a heptatonic scale. Running up and down scales is achieved by alternately plucking with the left and right hand. In solo performances, koras are tuned with a tonic close to F, but this can be higher or lower depending on the musician’s preference.
Figure 3: How a kora is tuned
Typical kora compositions feature the Kumbengo: an ostinato bass pattern that provides a rhythmic accompaniment, with pedal notes reinforcing the tonality; and the Birimitingo: the melody played with the forefingers, often featuring rapid movements up and down the scale, with many improvised flourishes and ornaments. Traditionally, the kora was played by a Jali or Griot: an equivalent to a European bard, who would provide an oral history or social commentary through folk songs [Charry, 2000]. Kora performers tend to take artistic license with their interpretations of folk songs, and adapt, omit or add entirely new musical ideas to well-known songs. This makes transcription a particularly difficult task, although some attempts have been made to notate kora music using Western notation with an alternative stave layout [Loquenz], and tablature for guitar interpretations [Gripper].
This study employs techniques from Dixon’s work on pitch and inharmonicity estimation for harpsichord recordings [Dixon et al., 2012]. The original work describes an algorithm capable of automatic note transcription and temperament estimation of a given harpsichord recording, using accurate inharmonicity judgment to improve the fundamental pitch estimation. The present study deals only with the inharmonicity detection process.
4.1 Inharmonicity
Figure 4: Frequencies of true and ideal harmonics of a plucked string
Inharmonicity is a phenomenon found in plucked string instruments, in which the true harmonics of a plucked string are slightly higher than integer multiples of the fundamental. The frequency of the kth partial above the fundamental can be calculated using equation 1 [Fletcher, 1964]:
$$f_k = k f_0 \sqrt{1 + B k^2} \tag{1}$$
where B is the inharmonicity constant. If the kth and jth partial frequencies, $f_k$ and $f_j$, are known, the inharmonicity constant can be calculated as follows:
$$B_{j,k} = \frac{j^2 f_k^2 - k^2 f_j^2}{k^4 f_j^2 - j^4 f_k^2} \tag{2}$$
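The relationship between the two equations can be checked numerically. The following is a minimal sketch, assuming that equation 1 takes the standard stiff-string form $f_k = k f_0 \sqrt{1 + B k^2}$ and that equation 2 is obtained from it by eliminating $f_0$; the function names and the example values of $f_0$ and B are illustrative and do not come from the recordings.

```python
import math

def partial_frequency(k, f0, B):
    """Assumed form of equation 1: frequency of the k-th partial of a stiff string."""
    return k * f0 * math.sqrt(1.0 + B * k * k)

def pairwise_inharmonicity(j, f_j, k, f_k):
    """Assumed form of equation 2: B from the j-th and k-th partial frequencies."""
    numerator = j**2 * f_k**2 - k**2 * f_j**2
    denominator = k**4 * f_j**2 - j**4 * f_k**2
    return numerator / denominator

# Round trip with illustrative values: synthesise two partials, then recover B.
f0_example, B_example = 220.0, 2e-5
f2 = partial_frequency(2, f0_example, B_example)
f5 = partial_frequency(5, f0_example, B_example)
B_recovered = pairwise_inharmonicity(2, f2, 5, f5)
```

Because $f_0$ cancels in the ratio, the estimate of B depends only on the two measured partial frequencies and their partial numbers.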
The overall inharmonicity for each string can be estimated by combining the pairwise inharmonicities $B_{i,i+1}$ between consecutive partials across the N detected harmonics $k_1$ to $k_N$. Various methods exist for this operation, although taking the median of all pairwise inharmonicities has been shown to be the least susceptible to outliers caused by miscalculation or noisy input. Once the overall inharmonicity for each string or pitch has been established, it is possible to construct an inharmonicity profile, as seen in figure 5. This plot describes the inharmonicity for a harpsichord, which rises with pitch:
Figure 5: Inharmonicity profile for harpsichord taken from [Dixon et al., 2012]
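The robustness of the median can be illustrated with a small synthetic example. This is a sketch only: it assumes the stiff-string model $f_k = k f_0 \sqrt{1 + B k^2}$, uses consecutive partial pairs, and perturbs one partial by an arbitrary 3 Hz to imitate a mis-detected peak. The function name and all numerical values are hypothetical.

```python
import math
import statistics

def consecutive_inharmonicities(partials):
    """partials: dict mapping partial number k to its measured frequency in Hz."""
    ks = sorted(partials)
    estimates = []
    for j, k in zip(ks, ks[1:]):
        f_j, f_k = partials[j], partials[k]
        numerator = j**2 * f_k**2 - k**2 * f_j**2
        denominator = k**4 * f_j**2 - j**4 * f_k**2
        estimates.append(numerator / denominator)
    return estimates

# Synthetic note (illustrative values): f0 = 200 Hz, B = 1e-4, eight partials.
true_B = 1e-4
partials = {k: k * 200.0 * math.sqrt(1.0 + true_B * k * k) for k in range(1, 9)}
partials[5] += 3.0  # imitate one badly located peak

estimates = consecutive_inharmonicities(partials)
median_B = statistics.median(estimates)
mean_B = statistics.mean(estimates)
```

Only the two pairs that involve the perturbed partial are affected, so the median stays at the true value while the mean is pulled away from it.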
4.2 Method
By analysing the spectrogram in Sonic Visualiser, it was found that the low notes in the kumbengo part could be easily separated from the rest of the recording by applying a high pass filter at 300 Hz with a slope of 48 dB/octave. This removes many of the overlapping fundamental frequencies and should reduce the likelihood of a partial peak of a low note being incorrectly classified as a fundamental pitch. As the lengths of these recordings can reach 10-12 minutes, automatically searching for single notes within the algorithm would increase the computation time dramatically. Analysing only short sections of the music was considered, but this would also reduce the number of notes analysed. In order to maximise the size of the dataset whilst avoiding heavy computational load, it was decided to manually search for and cut out suitable notes for analysis from an existing recording. This was done in the Logic Pro X DAW.
Figure 6: Spectrogram of Toumani Diabaté’s recording of ‘Kaira’. Note the ‘gap’ between low and high notes at 300 Hz
Figure 7: Using Logic Pro X to search for and cut out individual notes
This process was performed on two recordings of ‘Kaira’, by Toumani Diabaté and Mamadou Sidiki Diabaté, yielding 56 and 51 separate sound files respectively. Three more datasets were used to verify the performance of the system. These consisted of recordings of individual notes from a 1720s Blanchet harpsichord, obtained from http://sonimusicae.free.fr/blanchet1-en.html.
The inharmonicity estimation algorithm is performed as follows:
Step 1: Perform STFT on single note waveform
In order to estimate the fundamental frequency, an STFT is performed with the following parameters:
Downsample to 11.025 kHz
Hamming window
Window length = 512 (4096 in original paper: too large for notes < 0.25s)
Hop size = 256
FFT size = 8192 (zero padding factor of 16 for the 512-sample window; a factor of 2 applies to the 4096-sample window of the original paper)
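The consequences of these parameter choices can be worked out directly. The short sketch below only performs arithmetic on the values listed above; the variable names are illustrative.

```python
# First-pass STFT parameters as listed above.
fs_down = 11025      # sampling rate after downsampling, in Hz
window_length = 512  # samples
hop_size = 256       # samples
fft_size = 8192      # samples after zero padding

window_seconds = window_length / fs_down   # duration of one analysis window
hop_seconds = hop_size / fs_down           # time step between frames
bin_spacing_hz = fs_down / fft_size        # spacing of the zero-padded FFT bins
padding_factor = fft_size // window_length

# Window duration if the 4096-sample window of the original paper were kept.
original_window_seconds = 4096 / fs_down
```

At this sampling rate a 512-sample window lasts about 46 ms, whereas a 4096-sample window would last about 0.37 s, which is longer than a 0.25 s note. The zero-padded bins are spaced about 1.35 Hz apart.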
Figure 8: Single frame from the STFT of a single note extracted from ‘Kaira’, recorded by Toumani Diabaté
Step 2: estimate f0 by searching for peak frequencies in the amplitude spectrum X(n,i)
An adaptive thresholding technique is used to find locally significant frequency bins in each FFT frame:
Calculate moving weighted mean μ(n,i) and moving weighted standard deviation σ(n,i) of |X(n,i)|
“Locally salient” bins are counted when a spectral bin |X(n,i)| exceeds the moving mean plus half the standard deviation:
$$|X(n,i)| > \mu(n,i) + 0.5\,\sigma(n,i) \tag{3}$$
and is within 25 dB of the global maximum bin amplitude $X_{\max}$:
$$20 \log_{10} \frac{|X(n,i)|}{X_{\max}} > -25 \tag{4}$$
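The two conditions in step 2 can be sketched for a single frame as follows. This is not the implementation used in the study: the weighting of the moving mean and standard deviation is not specified here, so a uniform (unweighted) window is assumed, the global maximum is taken over the frame, and the `half_width` parameter is hypothetical.

```python
import math

def salient_bins(mags, half_width=8, std_factor=0.5, range_db=25.0):
    """Indices of locally salient bins in one magnitude-spectrum frame.

    Assumptions: uniform moving window of 2 * half_width + 1 bins, and the
    global maximum taken over this frame only.
    """
    floor = max(mags) * 10.0 ** (-range_db / 20.0)  # 25 dB below the maximum
    salient = []
    for i, m in enumerate(mags):
        lo = max(0, i - half_width)
        hi = min(len(mags), i + half_width + 1)
        local = mags[lo:hi]
        mean = sum(local) / len(local)
        variance = sum((x - mean) ** 2 for x in local) / len(local)
        exceeds_local = m > mean + std_factor * math.sqrt(variance)
        if exceeds_local and m >= floor:
            salient.append(i)
    return salient

# Toy spectrum: flat noise floor, two strong peaks and one weak peak.
spectrum = [0.01] * 64
spectrum[20] = 1.0
spectrum[40] = 0.5
spectrum[50] = 0.02  # locally prominent, but about 34 dB below the maximum
peaks = salient_bins(spectrum)
```

In the toy spectrum the weak peak at bin 50 passes the local test but is rejected by the 25 dB condition, so only bins 20 and 40 are kept.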
Step 3: Find fundamental frequency by quadratic interpolation
From the locally salient bins found in step 2, the maximum bin provides a rough estimate of the peak frequency. The true fundamental frequency can be found by quadratic interpolation of the log magnitude of the peak frequency bin and the two adjacent bins:
Take the log magnitude of the peak bin, $a_p$, and of its two adjacent bins, $a_{p-1}$ and $a_{p+1}$.
The true frequency is the corresponding frequency of the peak bin plus an offset δ, and is found using
$$f_0 = (p + \delta)\,\frac{f_s'}{N} \tag{5}$$
where p is the peak bin number, $f_s'$ is the sampling frequency after downsampling, and N is the number of frequency bins (the FFT size).
δ is defined as the peak of the parabola through the points $(-1, a_{p-1})$, $(0, a_p)$ and $(1, a_{p+1})$, obtained using
$$\delta = \frac{a_{p-1} - a_{p+1}}{2\,(a_{p-1} - 2a_p + a_{p+1})} \tag{6}$$
Figure 9: Estimating true frequency using quadratic interpolation
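Step 3 can be sketched as a single function. The sketch assumes the standard three-point parabolic peak formula applied to natural-log magnitudes, and takes N to be the FFT size; the function name and the synthetic test spectrum are illustrative only.

```python
import math

def interpolate_peak(mags, p, fs, n_fft):
    """Refine the frequency of peak bin p by quadratic interpolation.

    mags holds linear magnitudes; the parabola is fitted to their logarithms.
    """
    a_prev = math.log(mags[p - 1])
    a_peak = math.log(mags[p])
    a_next = math.log(mags[p + 1])
    delta = (a_prev - a_next) / (2.0 * (a_prev - 2.0 * a_peak + a_next))
    return (p + delta) * fs / n_fft

# Synthetic peak with a Gaussian shape, which is an exact parabola in the
# log domain, centred between bins at position 60.3.
true_position = 60.3
spectrum = [math.exp(-((i - true_position) ** 2) / 50.0) for i in range(128)]
peak_bin = max(range(len(spectrum)), key=lambda i: spectrum[i])
estimate_hz = interpolate_peak(spectrum, peak_bin, 11025, 8192)
expected_hz = true_position * 11025 / 8192
```

For a real windowed sinusoid the log-magnitude peak is only approximately parabolic, so the recovered offset is an approximation rather than exact as it is in this synthetic case.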
Step 4: Perform a second STFT to find partial frequencies
A second STFT is performed on the signal in order to locate local peaks at the bins containing partial frequency amplitudes. The following parameters are used:
Sampling frequency 44.1 kHz (no downsampling)
Blackman-Harris window
Window length = 4096
Hop size = 1024
FFT size = 16384
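As a rough illustration of the second-pass settings, the sketch below generates a window and derives the bin spacing. The variant of the Blackman-Harris window is not stated above, so the common 4-term form with its standard coefficients is assumed, in its symmetric version; the function name is hypothetical.

```python
import math

def blackman_harris(n_samples):
    """Symmetric 4-term Blackman-Harris window (assumed variant)."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    window = []
    for n in range(n_samples):
        x = 2.0 * math.pi * n / (n_samples - 1)
        value = a0 - a1 * math.cos(x) + a2 * math.cos(2 * x) - a3 * math.cos(3 * x)
        window.append(value)
    return window

fs_full = 44100
analysis_window = blackman_harris(4096)
bin_spacing_hz = fs_full / 16384     # spacing of the zero-padded FFT bins
window_seconds = 4096 / fs_full      # duration of one analysis window
hop_seconds = 1024 / fs_full         # time step between frames
```

With these settings the bins are about 2.69 Hz apart and each window spans roughly 93 ms.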
Step 5: Estimate partial frequency location using standard inharmonicity constant
In order to classify peaks in the second STFT as partials belonging to the current note, only peaks found within certain windows around the estimated partial frequencies are considered. To begin with, the partial frequencies are estimated using the $f_0$ estimate from step 3 and a standard inharmonicity constant $B = 2 \times 10^{-5}$.
Using equation 1, the frequencies of the first 40 partials are estimated
For every partial k, a peak in the spectrum is searched for within 30 cents above or below the estimated partial frequency.
If no peak is found within this window, this partial is not counted
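The search windows of step 5 can be sketched as follows, assuming the stiff-string form of equation 1 and expressing the 30 cent tolerance as a frequency ratio of $2^{30/1200}$. The function name and the example $f_0$ are illustrative.

```python
import math

def partial_search_windows(f0, B=2e-5, n_partials=40, cents=30.0):
    """Return (k, lower, centre, upper) frequency bounds in Hz for each partial."""
    ratio = 2.0 ** (cents / 1200.0)
    windows = []
    for k in range(1, n_partials + 1):
        centre = k * f0 * math.sqrt(1.0 + B * k * k)  # assumed equation 1
        windows.append((k, centre / ratio, centre, centre * ratio))
    return windows

# Illustrative fundamental, not taken from the recordings.
windows = partial_search_windows(175.0)
k, lower, centre, upper = windows[9]  # tenth partial
```

Because the tolerance is defined in cents, the window width in Hz grows in proportion to the partial frequency.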
Step 6: Inharmonicity Estimation
Now that partial frequencies have been estimated, the inharmonicity constant B can be refined using equation 2.
Find the inharmonicity between every detected partial using equation 2
Take the median of all inharmonicity values to obtain a new value for B
Repeat steps 5 and 6 to refine the inharmonicity estimation
Iterate until B converges or until 100 iterations
Repeat steps 1-6 for every note
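Steps 5 and 6 together form a simple fixed-point iteration, sketched below for one note. Several details are assumptions made for illustration: the spectral peaks are supplied as a plain list of frequencies, the closest peak inside each window is chosen, only consecutive detected partials are paired, and the convergence tolerance `tol` is hypothetical.

```python
import math
import statistics

def refine_inharmonicity(f0, peaks, B=2e-5, n_partials=40, cents=30.0,
                         tol=1e-9, max_iter=100):
    """Iteratively refine B from a list of spectral peak frequencies in Hz."""
    ratio = 2.0 ** (cents / 1200.0)
    for _ in range(max_iter):
        # Step 5: match peaks to predicted partials within +/- 30 cents.
        matched = {}
        for k in range(1, n_partials + 1):
            target = k * f0 * math.sqrt(1.0 + B * k * k)
            near = [f for f in peaks if target / ratio <= f <= target * ratio]
            if near:
                matched[k] = min(near, key=lambda f: abs(f - target))
        # Step 6: median of pairwise estimates from consecutive matched partials.
        ks = sorted(matched)
        estimates = []
        for j, k in zip(ks, ks[1:]):
            f_j, f_k = matched[j], matched[k]
            numerator = j**2 * f_k**2 - k**2 * f_j**2
            denominator = k**4 * f_j**2 - j**4 * f_k**2
            estimates.append(numerator / denominator)
        if not estimates:
            break  # fewer than two partials found; keep the current B
        new_B = statistics.median(estimates)
        if abs(new_B - B) < tol:
            return new_B
        B = new_B
    return B

# Synthetic note (illustrative values): f0 = 150 Hz, B = 1e-4, twenty partials.
synthetic_peaks = [k * 150.0 * math.sqrt(1.0 + 1e-4 * k * k) for k in range(1, 21)]
B_refined = refine_inharmonicity(150.0, synthetic_peaks)
```

On this noise-free example the estimate settles after two passes. With real recordings the set of matched partials can change between passes, which is why the loop is repeated.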
The algorithm was first tested on the Blanchet harpsichord datasets. These datasets consist of a single .wav file for each note on the keyboard. The lengths of the files range from 3 to 14 s.
Figures 10-12: Inharmonicity profiles for Blanchet Harpsichord, stop 1, 2 and "lute"
The scatter graphs shown in figures 10 and 11 show a promisingly similar correlation between pitch and inharmonicity to that seen in figure 5. Keeping in mind that no statistical analysis has been performed on this data, it can be seen that the large spread in inharmonicity values at lower frequencies corresponds with the more variable predicted inharmonicity at similar frequencies in figure 5. The presence of negative inharmonicity values implies some error, as B should never be negative for any realistic values of $f_0$, $f_k$ and $f_j$ from equation 2.
It is unclear why the ‘lute’ stop inharmonicity profile appears, when inspected by eye, so much less correlated than the others. Clearly, further adaptation is needed for a truly robust system.
The same algorithm was then applied to the kora recordings. Unlike the harpsichord dataset, these datasets often contain many different examples of the same note. As inharmonicity has been shown to vary with amplitude, there is now a range of inharmonicity values for individual frequencies, unlike the more linear plots seen above. These are not ‘clean’ samples, and while care was taken to find isolated notes from the recording, interference from other notes could not be totally avoided, especially due to the harp-like nature of the instrument, which allows notes to sustain after a new note is played.
Figures 13 and 14: Inharmonicity profiles for ‘Kaira’ recorded by Toumani Diabaté and Mamadou Sidiki Diabaté
The vertical clusters of points on both scatter plots show that at least the fundamental frequency estimation is relatively consistent, especially in figure 13. The range of values for inharmonicity is also promising, save for one outlier in figure 14. No clear positive or negative correlation between pitch and inharmonicity can be seen in these plots, but statistical analysis could show an inharmonicity response unique to the kora.
To verify the $f_0$ estimation performance, the system was tested with a tone consisting of three sine waves at 440 Hz, 880.9 Hz and 1232.9 Hz, representing a note A4 with inharmonicity $B_{1,2} = 7.75 \times 10^{-4}$. This resulted in a calculated $f_0$ value of 441.5 Hz, which is 1.5 Hz (roughly 6 cents) above the 440 Hz component. As the system is not intended to run in real time, extending the STFT could yield more accurate $f_0$ readings, with the longer computation time being an acceptable trade-off.
This study presents a largely successful implementation of Dixon’s harpsichord inharmonicity estimation algorithm for recordings of the kora. $f_0$ estimation is consistent across separate occurrences of the same note; however, the accuracy could be improved. Verification with the harpsichord datasets provided the expected result of a positive correlation between pitch and inharmonicity, although the poor performance with the lute dataset is problematic. The range of inharmonicity values for the kora (of the order of $10^{-4}$) seems reasonable when compared with the results from Dixon’s paper and the harpsichord analysis.
The first step for future work on this project would be to extend the inharmonicity estimation further by collating the inharmonicity value for each occurrence of the same note and taking an average with a confidence rating obtained from the interquartile range (IQR), as per the original study. This would allow trends in the inharmonicity profiles to be more easily measured and compared across different recordings and performances.
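One possible shape for this extension is sketched below. It is only an illustration of the idea: the median is assumed as the average, the IQR is computed with the default method of Python's `statistics.quantiles`, and the note labels and B values are invented for the example rather than taken from the results.

```python
import statistics

def summarise_by_note(estimates):
    """estimates: list of (note_label, B) pairs, one per analysed sound file.

    Returns {note_label: (median_B, iqr)}; iqr is None for a single occurrence.
    """
    grouped = {}
    for note, b in estimates:
        grouped.setdefault(note, []).append(b)
    summary = {}
    for note, values in grouped.items():
        if len(values) < 2:
            summary[note] = (values[0], None)
            continue
        q1, _, q3 = statistics.quantiles(values, n=4)
        summary[note] = (statistics.median(values), q3 - q1)
    return summary

# Invented values for illustration only.
example = [("F2", 1.0e-4), ("F2", 1.2e-4), ("F2", 0.9e-4), ("F2", 1.1e-4),
           ("A2", 2.0e-4)]
summary = summarise_by_note(example)
```

A narrow IQR for a note would indicate consistent estimates across its occurrences, while a wide one would flag notes whose inharmonicity values deserve less confidence.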
Due to the effect of amplitude on inharmonicity, it would be interesting to include the amplitudes of each note in the profile plots, in order to visualise to what extent this effect is present in kora performances.
Aside from inharmonicity, accurate pitch recognition of kora recordings would allow analysis of tuning schemes, in a similar manner to the temperament classification described in the original study. It would be interesting to see how the kora is tuned in terms of Western tuning schemes, and to compare different tunings across different performers and regions.
In conclusion, this project presents the first steps towards deeper analysis of the kora. Expanding and building on techniques such as this allows the kora and other instruments to be studied without requiring direct access to the instrument. This would allow for large comparison studies to be made using only recordings of the instrument, as well as making such research more accessible where there is difficulty in taking measurements of the instrument in situ.
[Chadefaux, 2013] Chadefaux, D. (2013). Analysis of the harpsichord plectrum-string interaction. (March 2016).
[Charry, 2000] Charry, E. S. (2000). Mande Music: Traditional and Modern Music of the Maninka and Mandinka of Western Africa. University of Chicago Press, Chicago.
[Dixon et al., 2012] Dixon, S., Mauch, M., and Tidhar, D. (2012). Estimation of harpsichord inharmonicity and temperament from musical recordings. The Journal of the Acoustical Society of America, 131(1):878.
[Fletcher, 1964] Fletcher, H. (1964). Normal Vibration Frequencies of a Stiff Piano String. The Journal of the Acoustical Society of America, 36(1):203.
[Gripper] Gripper, D. Reading the Kora Scores. http://www.derekgripper.com/african-guitar/reading-kora-scores/
[Loquenz] Loquenz, H. transcriptions. http://www.kora-music.com/e/transkriptionen.htm
[Tidhar et al., 2010] Tidhar, D., Mauch, M., and Dixon, S. (2010). High precision frequency estimation for harpsichord tuning classification. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pages 61–64.