December 15, 2023
I read Daniel Levitin’s This is Your Brain on Music, an introduction to music from a neuroscientist’s point of view, in my sophomore year of high school. I loved it enough to write about pursuing a music cognition minor in my application to Northwestern University. Today, I study math and statistics at NU—my promised career in music cognition never having materialized—but I still enjoy learning about the inner workings of music.
This is the first of two posts on mathematical topics in music. This one will be about the nuances behind tuning, and the next will be about symmetries in the chromatic scale.
Here’s a question. Sound waves are physical phenomena, and we perceive the frequency of a sound wave as a pitch—say, the pitch A4 corresponding to a physical frequency of 440 Hz. If a trumpet and violin play the same pitch, how can we tell them apart? What makes a trumpet sound like a trumpet and not a violin?
The answer is part physics and part cognition. When we hear an instrument play, we’re not just hearing one sound wave. Instead, we hear a whole series of frequencies, typically of decreasing intensity, at whole-number multiples of the fundamental frequency. That is, we hear the 440 Hz, as well as a weaker 880 Hz (A5, an octave up), then 1320 Hz (E6) weaker still, then 1760 Hz (A6), and so on. This is called the harmonic or overtone series.
The amplitude (loudness) of each harmonic differs by instrument. For example, the first two harmonics of a flute have relatively high amplitudes, followed by much weaker successive harmonics. The harmonics of a violin taper off more gradually. This distinct pattern of harmonics is part of an instrument’s timbre, the quality that makes a flute sound like a flute or a violin like a violin. Our brains essentially do math really quickly to piece together a series of frequencies (440 Hz, 880 Hz, etc.) into the perceived sound of one instrument.
The harmonic series, characterized by clean whole-number ratios between successive harmonics, is closely related to tuning. Say you have an A4 at 440 Hz. The ratio it forms with the next harmonic, at 880 Hz, is 2:1. We associate this 2:1 ratio between frequencies with the musical interval of an octave, so 880 Hz corresponds to A5. Taking the next harmonic, 1320 Hz, with the A5 gives a 3:2 ratio, which we call a perfect fifth, landing on E6. Further up the series, 5:4 gives a major third, 9:8 a major second, and so on.
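To make those ratios concrete, here is a minimal Python sketch that lists the first few harmonics of a fundamental and the ratio between each neighbouring pair. The function names `harmonics` and `successive_ratios` are made up for this illustration.

```python
from fractions import Fraction


def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics: integer multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def successive_ratios(count):
    """Ratios between neighbouring harmonics: 2/1, 3/2, 4/3, ..."""
    return [Fraction(n + 1, n) for n in range(1, count)]


series = harmonics(440, 5)
ratios = successive_ratios(5)

# Pair each harmonic with the one above it and show the ratio between them.
for low, high, ratio in zip(series, series[1:], ratios):
    print(f"{low} Hz -> {high} Hz  ratio {ratio.numerator}:{ratio.denominator}")
```

The 2:1 octave and 3:2 fifth show up between the first three harmonics, and 5:4 appears between the fourth and fifth.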
Just intonation is a tuning system built from these “pure” ratios found in the harmonic series. If you want to build a major seventh chord (C E G B in C major) using just intonation, you start with the frequency of the C in the root and relate the other notes in the chord to that root: the E at a 5:4 ratio, the G at 3:2, and the B at 15:8. If the root note had a frequency of 100 Hz, the other notes would have frequencies of 125 Hz, 150 Hz, and 187.5 Hz.
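As a quick check of that arithmetic, the chord can be sketched in a few lines of Python. The dictionary `MAJOR_SEVENTH` and the helper `just_chord` are names invented for this example.

```python
from fractions import Fraction

# Just-intonation ratios for a major seventh chord, relative to the root.
MAJOR_SEVENTH = {
    "root": Fraction(1, 1),
    "major third": Fraction(5, 4),
    "perfect fifth": Fraction(3, 2),
    "major seventh": Fraction(15, 8),
}


def just_chord(root_hz, ratios=MAJOR_SEVENTH):
    """Scale each pure ratio by the root frequency to get the chord in Hz."""
    return {name: float(root_hz * ratio) for name, ratio in ratios.items()}


chord = just_chord(100)
for name, hz in chord.items():
    print(f"{name}: {hz} Hz")
```

Swapping in a different root, say 440 for an A, scales every note by the same ratios.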
Having a physical basis in the harmonic series gives just intonation a feeling of stability. Performers who can produce a continuous range of pitches, like singers or string players, will often justly tune their chords.
But just intonation has a famous problem. You may have seen it referred to as “why you can’t tune a piano” or “why a piano is never perfectly in tune.”
Suppose you want to assign a frequency to each note in a piano—twelve notes per octave, seven white keys and five black keys. Again for simplicity, we’ll take a starting frequency of 100 Hz and call it C. Now we happily assign G the frequency of 150 Hz, forming a nice 3:2 ratio.
We can do it again starting with the G. Multiply the 150 Hz by a 3:2 ratio to get 225 Hz, corresponding to the D a fifth above G. Because we perceive notes an octave apart as the same note, we can divide the frequency by 2 to drop the D down an octave at 112.5 Hz. Repeating this process—multiplying by 1.5 to go up a perfect fifth, then dividing by 2 when necessary to place a note in the correct octave—can generate all twelve notes in the chromatic scale:
C = 100 Hz
G = 150 Hz
D = 112.5 Hz
A = 168.75 Hz
E = 126.56 Hz
B = 189.84 Hz
F#/Gb = 142.38 Hz
C#/Db = 106.79 Hz
G#/Ab = 160.18 Hz
D#/Eb = 120.14 Hz
A#/Bb = 180.20 Hz
F = 135.15 Hz
By their construction, these are the only frequencies based on the 100 Hz root which preserve the “pure” ratio of 3:2 between fifths. But notice that extending it once more, moving a fifth above F to the original C, gives 202.73 Hz, which disagrees with the 200 Hz you'd expect.
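The stacking procedure is easy to reproduce. Below is a rough Python sketch, assuming a 100 Hz root and using exact fractions so no rounding creeps in; `NOTE_ORDER` and `stack_fifths` are names chosen for this example.

```python
from fractions import Fraction

# Notes in the order they appear when climbing by perfect fifths from C.
NOTE_ORDER = ["C", "G", "D", "A", "E", "B", "F#", "C#", "G#", "D#", "A#", "F"]


def stack_fifths(root_hz=100):
    """Go up by 3:2 repeatedly, halving whenever a note leaves the root's octave."""
    freq = Fraction(root_hz)
    scale = {}
    for name in NOTE_ORDER:
        scale[name] = freq
        freq *= Fraction(3, 2)
        if freq >= 2 * root_hz:
            freq /= 2  # drop down an octave
    return scale


scale = stack_fifths(100)
for name, freq in scale.items():
    print(f"{name} = {float(freq):.2f} Hz")

# One more fifth above F should land on the C an octave above the root.
closing_c = scale["F"] * Fraction(3, 2)
print(f"C (after twelve fifths) = {float(closing_c):.2f} Hz")

# The mismatch with a true octave is 3^12 / 2^19, often called the Pythagorean comma.
print(closing_c / 200)
```

Because 3^12 can never equal a power of two, the mismatch is not a rounding artifact: no amount of extra precision closes the circle.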
It gets worse. Notice that E at 126.56 Hz, for example, disagrees with the “pure” ratio of 5:4 (125 Hz) we would expect from a note a major third above C.
So we see that you can’t have a twelve-note scale which has twelve just-in-tune fifths. And the rest of the intervals? No luck. You certainly can’t tune a piano in a way that preserves perfect intervals between every note. How do we tune a piano, then? Using approximations.
Twelve-tone equal temperament (12-TET) uses powers of twelfth roots of two to create its frequencies. 12-TET creates equal spacing between notes. Because we perceive pitches logarithmically—for example, every octave increase requires multiplying the frequency by two, rather than adding some constant to the frequency—“equal spacing” means the ratio of frequencies between notes should be the same.
There are twelve notes in an octave, so after applying this “equal spacing” twelve times, we expect to have doubled our frequency. Therefore, the ratio between any two consecutive notes, an interval called a semitone or half step, should be the twelfth root of two. If C is 100 Hz, then C# is 100*2^(1/12) = 105.95 Hz, D is 100*2^(2/12) = 112.25 Hz, etc.
The twelfth root of two is irrational, so we’re certainly not going to have nice perfect intervals. G, which under just intonation should be 150 Hz, instead comes to 100*2^(7/12) = 149.83 Hz: not quite, but close! E would ideally be 125 Hz but comes out to 125.99 Hz. The musical world decided that these deviations, which are barely perceptible, are worth accepting because the same approximately good intervals work in all twelve keys.
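For comparison, here is a small Python sketch of the 12-TET frequencies, again assuming a 100 Hz C. The helper name `equal_temperament` is just for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def equal_temperament(root_hz=100.0):
    """Each semitone multiplies the frequency by the twelfth root of two."""
    return {name: root_hz * 2 ** (i / 12) for i, name in enumerate(NOTE_NAMES)}


tet = equal_temperament(100.0)
for name, hz in tet.items():
    print(f"{name} = {hz:.2f} Hz")
```

Every pair of neighbouring notes has the same ratio, which is why a melody keeps its shape when it is shifted to start on any of the twelve notes.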
We measure slight differences in pitch in cents. One cent is a hundredth of a semitone, which is a frequency ratio of the 1200th root of 2. Humans can’t perceive a one-cent difference in pitch. Somewhere around 10 cents seems to be the threshold for perceptibility, but it can vary by person and by other sonic characteristics like tone quality. A 25 cent difference is reliably perceptible, according to Wikipedia.
The difference between 12-TET and just intonation came up when I played trumpet in high school, though I didn’t realize that’s what it was at the time. Our band director would isolate individual chords, and if we were on the third of a major chord—E in C major—we were told to “lip down” our pitch slightly. (Trumpet players can adjust their facial muscles, called their embouchure, to change a note’s pitch slightly.)
This is because a just-in-tune major third has a 1.25 frequency ratio with the root note, while a 12-TET third has around a 1.26 ratio. Because 1.25*2^(14/1200)=1.26 approximately, the 12-TET third is about 14 cents sharper than the just-in-tune third. That's a big enough difference to matter even for a high school band.
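The cents figure comes from taking a logarithm of the frequency ratio. A minimal Python sketch, where `cents` is a helper written for this example:

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


just_third = 5 / 4
tet_third = 2 ** (4 / 12)

# How far the equal-tempered major third sits above the just one.
third_gap = cents(tet_third / just_third)
print(f"major third: 12-TET is {third_gap:.2f} cents sharp")

# The same comparison for the perfect fifth (3:2 versus seven semitones).
fifth_gap = cents(2 ** (7 / 12) / (3 / 2))
print(f"perfect fifth: 12-TET is {fifth_gap:.2f} cents off")
```

The fifth comes out only about two cents flat, which is why it was the third, and not the fifth, that needed lipping down.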
If you want to hear what the difference sounds like, Jacob Collier (dubbed “your favorite musician’s favorite musician” by Rolling Stone) demonstrated.
Collier is famous, at least within certain circles of music, for using just intonation to modulate—change key—to G half sharp in his version of “In the Bleak Midwinter” (he’s also famous for his five Grammy Awards). That's a key between G and G#/Ab, slightly offset from the song's initial tuning.
I’m not going to explain how he did it in full detail, but here’s an idea of how it worked. Beginning with a root note in the traditional A4 = 440 Hz tuning, he built up a chord using just intonation, which we’ve seen will produce notes that differ from the 12-TET norms by a few cents.
For the next chord in the progression, he chose a root note related to those notes rather than the original root, again building a justly in-tune chord, this time outside the A4 = 440 Hz framework. This compounded the difference in cents.
Repeat, and eventually he reaches a tuning system completely offset from where he started, hence half sharp. David Bruce has a fuller explanation on YouTube, and there’s a really impressive transcription by June Lee if you just want to see the magic unfold (~4:10 is the timestamp for when it happens).
What I like about the Jacob Collier example is that it shows that people can actually create interesting music using the mathematical nuances behind tuning. However esoteric it may seem—and it does seem quite esoteric—understanding music on this level can provide a new creative avenue for enterprising musicians like Collier. As always, creativity comes from a deep foundational understanding.