Western Music From First Principles

by Tom Lokovic


Why do we have twelve notes per octave?

Why do we have sharp and flat notes?

Why do we have major and minor scales?

Why do we talk about what "key" a song is in? Why is that even a concept?

As an engineer who is now learning music, I find the notation and terminology to be both fascinating and mind-boggling. Introductory material tends to start with concepts like sharps, flats, and major and minor keys, which are easy enough to understand, but very difficult to motivate if you take them for granted. For me, it's not enough to know the concepts--I need to know why these are the concepts we use. Must they be this way? Are they preordained by physics and the way our ears work? Or are they an arbitrary inheritance from our predecessors?

It should be no surprise that the answer is "both". Musical structures are influenced heavily by science and mathematics and physiology, and also by evolving tradition. However, our conventions so permeate the teaching of music that it can be difficult to tease apart the why. When the science and history are explained, it is usually in terms of our modern structures, which is exactly the wrong way around!

This page attempts to describe Western music notation the "right" way around: from first principles. I'll start with the science (the part that Must Be That Way) and then progress through just enough history (the part that Happens to Be That Way) to understand why Western music is structured the way it is, without taking the basic concepts for granted.

Note: I have done my best to tease apart the cause-and-effect and to present a clear story in which the tail does not wag the dog, but I am not an expert in this area. Please feel free to contact me with comments or corrections.

Frequency and Pitch

The frequency of a sound wave is its rate of repetition, generally measured as repetitions per second (hertz). An ideal sound consisting of a single sine wave would have exactly one frequency. However, most real sounds (including musical notes) consist of many overlaid frequencies, so that one could describe the full frequency spectrum of the sound. This spectrum is a physical property of the sound independent of any listener.

When a human hears a sound, the auditory system picks out a fundamental frequency, which determines whether we hear the sound as higher or lower than some other sound. This perceived frequency is known as pitch, and it is not a property of the sound per se, but of the interaction of the sound with the auditory system. Perception of pitch is affected by many aspects of the sound, the environment, the context, and the listener. For our purposes, though, pitch is the "effective" or "perceived" frequency of a sound.

Intervals and Octaves

The relationship between two pitches is known as an interval. You can think of an interval as the "space between" or "distance between" the two frequencies. Intervals can be measured by frequency ratios (e.g., 3:2) or on a logarithmic scale called cents. As we will see, music has names for many different kinds of intervals.
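As a minimal sketch (the function name is illustrative), a frequency ratio converts to cents using the standard definition of 1200 cents per 2:1 octave. One convenient property: stacking intervals multiplies their ratios but simply adds their sizes in cents.

```python
import math


def ratio_to_cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents = one 2:1 octave)."""
    return 1200 * math.log2(ratio)


print(round(ratio_to_cents(2), 3))      # 1200.0 (octave)
print(round(ratio_to_cents(3 / 2), 3))  # 701.955 (perfect fifth)

# Ratios multiply, cents add: a fifth stacked on a fourth makes an octave.
print(round(ratio_to_cents(3 / 2) + ratio_to_cents(4 / 3), 6))  # 1200.0
```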

The empty interval, or the interval between a note and itself, is called unison.

The next most fundamental interval is the octave. This is the interval between any frequency and its double (or its half). For example, 440Hz and 880Hz are an octave apart. When two pitches are an octave apart (or any multiple of an octave apart), the human ear perceives them as two versions of the same note (one being higher than the other). Most music systems, including the Western one, are built around structures of pitches that repeat from one octave to the next.
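Octave equivalence is easy to express in code. This hypothetical helper "folds" any frequency into a single reference octave by repeatedly halving or doubling it; the 440 Hz reference is an arbitrary choice made for illustration.

```python
def fold_into_octave(freq: float, base: float = 440.0) -> float:
    """Halve or double freq until it lies in [base, 2 * base)."""
    if freq <= 0:
        raise ValueError("frequency must be positive")
    while freq >= 2 * base:
        freq /= 2
    while freq < base:
        freq *= 2
    return freq


# Notes whole octaves apart all fold to the same point in the reference octave.
for f in (110.0, 220.0, 440.0, 880.0, 1760.0):
    print(f, "->", fold_into_octave(f))  # each one folds to 440.0
```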

Consonance, Dissonance, and Expectation

To the human ear, some intervals sound "better" than others. Broadly speaking, notes that "sound good together" are said to exhibit consonance, and notes which do not are said to exhibit dissonance. Consonance and dissonance have both objective (physiological) and subjective (historical) aspects.

Physiologically, consonance seems to correlate strongly with harmonics and overtones. That is, intervals with small-integer ratios (3:2, 5:4, 4:3, etc.) will generally be more consonant than others. Apart from the unison and the octave, the perfect fifth (a 3:2 ratio) is, broadly, the most consonant interval of all, and functions as an essential building block common to most music systems. As we will see below, musical structures are heavily influenced by the desire to create harmonically pure intervals.

However, there is also a subjective aspect. Historically, some intervals which were once considered dissonant are now considered consonant, and vice versa. For example, in Renaissance music the perfect fourth (a 4:3 ratio) was considered quite dissonant, while in modern music the perfect fourth is one of the more consonant intervals. Clearly, it's not just the physics at play here; why the change over time?

The answer: changes in the music that people are used to hearing. Conventions evolve over time, and as a result, listener expectations evolve. Many aspects of music sound "good" to listeners simply because that's what they're used to hearing. Listener expectation is a huge part of music, and some musical concepts (like those in the next section) only make sense in the context of an audience that is so acclimated to a set of conventions that they (the listeners) have strong expectations. The expectations may be subconscious (as they generally are for non-musicians), but if they are violated, the listener will notice.

Tension/Resolution and Tonality

Composers must know something about what their audience expects. Music is generally only successful if it follows convention enough to be comfortable, but violates expectations enough to be interesting. Violating expectation creates tension in the listener's mind, while returning to it creates release or resolution. A song without tension is unbearably boring--it always does what you expect. A song without resolution will feel incomplete and dissatisfying, like the listener is "left hanging". Composers strive to combine tension and release in order to achieve some desired effect on the listener.

One way in which music may create tension and release is through tonality. When a piece of music tends to center on a particular pitch, to return repeatedly to that pitch, and to prefer interval structures based on that pitch, then the music is called tonal. The pitch in question is called the tonal center or tonic. You can think of the tonic as the "center of gravity" of the tune: as the song moves away from the tonic, the listener experiences tension; as it returns to the tonic, the listener experiences resolution.

Like pitch, tonality is not a property of the music itself: it is an effect that music may have on certain listeners. In this case, it's those listeners who are acclimated to that style of tonality. For those listeners, it will feel that the song should "resolve" to the tonic. They expect it to return to the tonic because, well, other music in that style has conditioned them to expect it. Until the song finds the tonic, the song feels unfinished, and if a song finishes on something other than the tonic, it will feel dissatisfying.

Not all music is tonal. Some music is completely atonal, while other music may wander through different tonalities or exhibit periods of temporary atonality. Western music, however, is overwhelmingly tonal. Much of the musical structure is dedicated to establishing tonal expectations in the listener, then reinforcing those expectations, and then manipulating them to create some effect. We will return to the topic of tonality later, in the context of scales.

Just Intonation

In the interval of an octave there are infinitely many frequencies to choose from. However, most instruments (and most notational systems) make only some of those frequencies available. If you wish to discretize the octave into a finite set of pitches, you must choose how many pitches and how they will be spaced, frequency-wise. A specific collection of pitches intended for use in music is called a tuning system.

Most music traditions throughout history have pursued just intonation: intervals based on small-integer ratios, which correspond to harmonic relationships and thus achieve a high level of consonance. The ancient Chinese Guqin, Classical Greek Pythagorean Tuning, and the Renaissance-era meantone temperament are all examples of just intonation or of close approximations to it; meantone, strictly speaking, narrows its fifths slightly in order to make its thirds purer. (As we will see later, modern Western music is not based on just intonation.)

Just intonation does not prescribe a specific spacing for the notes--just that harmonic relationships are present. It also doesn't prescribe a specific number of notes. Both have varied greatly across traditions, influenced by evolving instruments and music styles. In the Western tradition, pentatonic, hexatonic, and heptatonic systems (5, 6, and 7 notes per octave) have been most common.

The Lyre and Diatonic Tuning

It turns out one of the keys to understanding Western music structure is the way the Lyre was tuned in Ancient Greece. This stringed instrument went through many forms and tuning systems, but the one that left its mark on our tradition was the four-stringed Lyre with diatonic tuning, which is described in this section.

The set of four notes on the Lyre was known as a tetrachord, and a tetrachord was designed to cover a perfect fourth (4:3 ratio). Of the various ways to arrange the three intervals in a tetrachord, the most common was "diatonic": two large intervals and a small one. (In Pythagorean tuning, the ratios were 9/8, 9/8, and 256/243.)
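A quick check with exact fractions confirms that the three Pythagorean steps quoted above stack up to a perfect fourth (stacking intervals multiplies their ratios). The small 256/243 step is traditionally called the limma.

```python
from fractions import Fraction

# Pythagorean diatonic tetrachord: two whole tones and one small limma.
tetrachord = [Fraction(9, 8), Fraction(9, 8), Fraction(256, 243)]

span = Fraction(1)
for step in tetrachord:
    span *= step  # stacking intervals multiplies their ratios

print(span)  # 4/3, a perfect fourth
```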

Note that two tetrachords back-to-back don't quite fill an octave (4/3 * 4/3 = 16/9 < 2). The Greeks constructed their octaves from two tetrachords with an additional "filler" interval in between. The result was a seven-note (heptatonic) tuning system. (If you count both endpoints it's eight, which is where the name "octave" comes from. The term was coined in the context of heptatonic systems, though now we use it to describe a doubling of frequency regardless of the number of notes involved.)
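The same arithmetic shows what the "filler" interval has to be, and that two tetrachords around it close the octave exactly. This sketch assumes both tetrachords use the Pythagorean diatonic steps from the previous paragraph and places the filler between them.

```python
from fractions import Fraction

tetrachord = [Fraction(9, 8), Fraction(9, 8), Fraction(256, 243)]
fourth = Fraction(4, 3)

# Two stacked fourths fall short of the octave; the gap is the filler interval.
filler = Fraction(2) / (fourth * fourth)
print(filler)  # 9/8

# Tetrachord + filler + tetrachord: seven steps spanning exactly 2:1.
steps = tetrachord + [filler] + tetrachord
total = Fraction(1)
for step in steps:
    total *= step
print(len(steps), total)  # 7 2
```

The resulting sequence of steps is large, large, small, large, large, large, small.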

This heptatonic system with diatonic structure--two tetrachords spanning the octave with an extra interval in between--became the basis for much of the Western music tradition. However, as we will see, it evolved into a slightly different form.

Equal Temperament

Though just intonation achieves a high level of consonance, it has two notable disadvantages.

    1. It is not possible in general to cover the octave with consistent harmonically related intervals. As a result, some intervals are left badly out of tune; these "wolf intervals" sound bad and cannot be used.
    2. The intervals are unequal in size, which makes it difficult to transpose the music up or down in pitch.
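To make point 1 concrete, here is one standard illustration, assuming Pythagorean tuning (pure 3:2 fifths): twelve stacked fifths ought to land on the starting note seven octaves up, but they overshoot by a small ratio called the Pythagorean comma. In a tuning built from pure fifths, that leftover has to go somewhere, and it ends up concentrated in a single badly mistuned fifth, the wolf.

```python
import math
from fractions import Fraction

# Stack twelve pure fifths (3:2) and compare against seven pure octaves (2:1).
twelve_fifths = Fraction(3, 2) ** 12
seven_octaves = Fraction(2) ** 7

comma = twelve_fifths / seven_octaves  # the Pythagorean comma
print(comma)                                        # 531441/524288
print(round(1200 * math.log2(float(comma)), 2))     # 23.46 (cents)
```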

Transposing is still possible with just intonation: you simply re-tune your instrument. Modulation, on the other hand, is essentially impossible with just intonation. Modulation refers to the practice of moving the tonal center around during the course of a song, and doing so freely requires intervals of equal size, which just intonation does not provide. In the seventeenth century, as composers came to rely heavily on modulation as a form of expression, they gradually adopted a tuning system which had previously been out of favor: equal temperament.

Equal temperament is a tuning system in which the notes are (logarithmically) evenly spaced throughout the octave--that is, every pair of adjacent notes has the same frequency ratio. This eliminates wolf intervals because the intervals cover the octave perfectly; and it makes modulation (key changing) possible because the intervals all have the same ratio.
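As a sketch, 12-TET frequencies follow from a single reference pitch: each semitone multiplies the frequency by the twelfth root of two. The A4 = 440 Hz reference is an assumption (a common modern standard, not something the tuning system itself requires).

```python
A4 = 440.0  # assumed reference pitch in Hz (a common modern standard)


def tet12_frequency(semitones_from_a4: int, reference: float = A4) -> float:
    """Frequency a given number of 12-TET semitones above (or below) the reference."""
    return reference * 2 ** (semitones_from_a4 / 12)


for n in (0, 1, 2, 12, -12):
    print(n, round(tet12_frequency(n), 2))

# Every pair of adjacent notes has the same ratio, the twelfth root of two.
print(round(tet12_frequency(4) / tet12_frequency(3), 6))  # 1.059463
```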

However, equal temperament has a catch: the resulting interval ratios are irrational numbers, not small-integer ratios. This means that the intervals in equal temperament are not true harmonics and cannot be as consonant as those in just intonation. The price of free modulation and of eliminating the wolf intervals is that every interval in equal temperament except the octave is slightly out of tune.
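To see how small "slightly" is, this sketch compares a few 12-TET intervals with the just ratios they approximate, measuring the difference in cents. The choice of these three intervals, and of 5:4 as the just major third, is an assumption for illustration. The fifth and fourth come out roughly 2 cents off, while the major third is off by roughly 14 cents.

```python
import math

# interval name -> (number of 12-TET semitones, just ratio it approximates)
JUST = {
    "perfect fifth": (7, 3 / 2),
    "perfect fourth": (5, 4 / 3),
    "major third": (4, 5 / 4),
}

errors = {}
for name, (semitones, just_ratio) in JUST.items():
    tet_ratio = 2 ** (semitones / 12)
    errors[name] = 1200 * math.log2(tet_ratio / just_ratio)  # + means 12-TET is wider
    print(f"{name}: {tet_ratio:.4f} vs {just_ratio:.4f} ({errors[name]:+.2f} cents)")
```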

Twelve Tone Equal Temperament

The amount of dissonance depends on the chosen number of notes per octave; some choices approximate just intonation better than others. It turns out twelve-tone equal temperament (12-TET) makes a better approximation than any of the nearby equal temperaments. Though many note combinations are quite dissonant, others are close enough to small-integer ratios to be nearly indistinguishable from them.
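One way to test that claim, as a simplified sketch: for each division of the octave into n equal steps, find the step count closest to a pure 3:2 fifth and measure the error in cents. Scoring only the fifth, and treating 5 to 23 steps as "nearby", are assumptions; other criteria give other answers (19-TET, for example, approximates the 5:4 major third better than 12-TET does, at the cost of a worse fifth).

```python
import math

FIFTH_CENTS = 1200 * math.log2(3 / 2)  # a pure 3:2 fifth, about 701.96 cents


def best_fifth_error(n: int) -> float:
    """Cents error of the closest n-TET interval to a pure fifth."""
    step = 1200 / n
    k = round(FIFTH_CENTS / step)  # number of equal steps nearest the fifth
    return abs(k * step - FIFTH_CENTS)


errors = {n: best_fifth_error(n) for n in range(5, 24)}
best = min(errors, key=errors.get)
print(best, round(errors[best], 2))  # 12 1.96
```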

Today, twelve-tone equal temperament is the standard in Western music. Studies show that modern Westerners are attuned to the amount of dissonance in 12-TET and can find other tuning systems (such as Eastern systems based on just intonation) jarring. Those other systems are more harmonically pure but perhaps less flexible (in terms of modulation) than the Western one.

As we will see, our notation and terminology are heavily influenced by this choice.

Scales and "In the Key Of"

The adoption of a 12-note octave did not mean that composers suddenly started writing music that used 12 notes instead of 7 (or 5). The prevailing music styles continued to be based heavily on heptatonic and pentatonic structure, but they were adapted to work within the framework of 12-TET. This let them play music in familiar styles while fitting it into a more flexible structure.

When you choose any given subset of the pitches in a tuning system, you get another (smaller) tuning system. For example, dropping every other note in 12-TET would give a hexatonic system with fewer notes and larger intervals. Similarly, dropping other subsets of notes can yield a variety of heptatonic, pentatonic, and other tuning systems. When used in this way, 12-TET serves not as the tuning system per se, but as the union of a great many tuning systems, or the totality of notes from which the whole collection is drawn. The medieval term gamut referred to this encompassing set of notes.

The common term for a sequence of pitches drawn from 12-TET is a scale. A scale is both a sequence of notes (ascending through the octave) and a repertoire of notes to be used (exclusively or preferentially) in a piece of music. Examples of scales are chromatic (all notes in 12-TET), whole tone (every other note), blues, and the major and minor scales which we discuss below.

In tonal music, the first note in the scale is assumed to be the tonic. If a piece of music is tonal and conforms to a scale and has the scale's first note as its tonic, then the music is said to be "in the key of" that scale. A listener familiar with that key will expect the song to stay (mostly) within the scale, and to resolve eventually to the tonic.

Representing Legacy Systems in 12-TET

It was natural, even with the adoption of 12-TET, to preserve the properties that were familiar from prevailing musical systems. However, in 12-TET, the ratio between any two adjacent frequencies is the twelfth root of two. Legacy systems based on just intonation could not, in general, be represented exactly within 12-TET. Instead, they were approximated in the new system in the form of scales.

Today, those musical systems that survive are frequently described as scales within 12-TET, whether they originated before or after its inception. This is part of why it's difficult to tease apart cause-and-effect. Even those systems which predate 12-TET are now described as if they always fit within that structure.

The interval between adjacent pitches in 12-TET is called a semitone or half step, and two consecutive semitones form a whole tone or whole step. Scales are often described by the starting note and the sequence of whole- and half-steps that traverse the appropriate subset of 12-TET.
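A scale's step pattern is easy to turn into notes. In this sketch, the note spellings (sharps only, starting from C) and the hyphenated W/H string format are simplifying assumptions; the example builds the whole tone scale.

```python
NOTE_NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]
STEP_SIZES = {"W": 2, "H": 1}  # whole step = 2 semitones, half step = 1


def scale_from_steps(root: str, pattern: str) -> list[str]:
    """List the notes of the scale that starts on root and follows pattern."""
    steps = [STEP_SIZES[s] for s in pattern.split("-")]
    if sum(steps) != 12:
        raise ValueError("pattern must span exactly one octave (12 semitones)")
    index = NOTE_NAMES.index(root)
    notes = []
    for size in steps:
        notes.append(NOTE_NAMES[index % 12])
        index += size
    return notes


print(scale_from_steps("C", "W-W-W-W-W-W"))  # whole tone scale
# ['C', 'D', 'E', 'F♯', 'G♯', 'A♯']
```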

The Diatonic in 12-TET

The diatonic system was adapted to 12-TET in the following way. Each tetrachord becomes W-W-H (whole step, whole step, half step), and two tetrachords are connected with a whole step to complete an octave. Thus the complete pattern for the diatonic scale is W-W-H-W-W-W-H.

The resulting scale is only an approximation of the original diatonic tuning, but it is fairly close.
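To put a rough number on "fairly close", this sketch walks up the W-W-H-W-W-W-H pattern twice, once in 12-TET semitones and once with Pythagorean diatonic steps (9/8 for W, 256/243 for H), and records how far each 12-TET scale degree lies from its Pythagorean counterpart. Treating Pythagorean tuning as the "original" is an assumption; Greek theorists described other diatonic tunings too. Each degree lands within about 10 cents, and the octave matches exactly.

```python
import math
from fractions import Fraction

# Step sizes: Pythagorean ratios and 12-TET semitones for whole (W) and half (H) steps.
PYTHAGOREAN = {"W": Fraction(9, 8), "H": Fraction(256, 243)}
SEMITONES = {"W": 2, "H": 1}
PATTERN = "W-W-H-W-W-W-H".split("-")

ratio = Fraction(1)
semitones = 0
deviations = []  # 12-TET position minus Pythagorean position, in cents
for step in PATTERN:
    ratio *= PYTHAGOREAN[step]
    semitones += SEMITONES[step]
    pythagorean_cents = 1200 * math.log2(float(ratio))
    deviations.append(100 * semitones - pythagorean_cents)

print([round(d, 1) for d in deviations])
```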

Sharps and Flats

As we mentioned above, the diatonic scale is heptatonic. The seven notes in this and other heptatonic scales have historically been written as A-B-C-D-E-F-G. When reckoned in the context of a twelve-note octave, however, there are "skipped" notes that don't have letter names. Rather than reorder the letters to include all twelve notes (A-B-C-D-E-F-G-H-I-J-K-L, which would make sense to me!), composers retained the existing lettering and referred to the skipped notes as sharp or flat relative to the primary notes.

The names of the twelve notes-per-octave in 12-TET, starting from C, are thus:

    C, C♯/D♭, D, D♯/E♭, E, F, F♯/G♭, G, G♯/A♭, A, A♯/B♭, B

Notice the W-W-H-W-W-W-H pattern that enumerates the original heptatonic notes. This should emphasize how influential the diatonic system (dating back to the Lyre in ancient Greece) has been on Western music. Non-diatonic scales exist and are used regularly, but they are less convenient to describe than diatonic scales because our very terminology has the diatonic pattern baked into it.
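As a check on that observation, this sketch lists the twelve names starting from C (joining the sharp and flat spellings with a slash is a formatting choice made here) and recovers the step pattern between the plain lettered notes.

```python
NOTE_NAMES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F", "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]

# Positions (0-11) of the notes that have plain letter names.
naturals = [i for i, name in enumerate(NOTE_NAMES) if len(name) == 1]

# Distance from each natural to the next, wrapping around to C an octave up.
steps = [(naturals[(k + 1) % 7] - naturals[k]) % 12 for k in range(7)]
pattern = "-".join("W" if s == 2 else "H" for s in steps)
print(pattern)  # W-W-H-W-W-W-H
```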

The Many Diatonic Scales

The so-called diatonic scale is, to be precise, not a single scale, but a class of scales. The note names above are clearly motivated by a diatonic pattern starting at C, but the pattern can just as easily begin at any of the 12 notes. For example, if you start at E then you get a scale with the notes E-F♯-G♯-A-B-C♯-D♯. If you start at D then you get D-E-F♯-G-A-B-C♯.
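The same construction works from any starting note. This sketch spells every black key as a sharp, which is a simplifying assumption: it reproduces the E and D examples above, but for a flat key such as F it would print A♯ where musicians write B♭.

```python
SHARP_NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]
DIATONIC_STEPS = [2, 2, 1, 2, 2, 2, 1]  # W-W-H-W-W-W-H in semitones


def diatonic_from(root: str) -> list[str]:
    """Start the diatonic pattern on root, spelling black keys as sharps."""
    index = SHARP_NAMES.index(root)
    notes = []
    for step in DIATONIC_STEPS:
        notes.append(SHARP_NAMES[index % 12])
        index += step
    return notes


print("-".join(diatonic_from("E")))  # E-F♯-G♯-A-B-C♯-D♯
print("-".join(diatonic_from("D")))  # D-E-F♯-G-A-B-C♯
```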

So there are twelve diatonic subsets of 12-TET, each formed by beginning the pattern at a different note. (The twelve subsets are distinct because seven and twelve share no common factor, so no shift smaller than an octave maps the pattern onto itself.) But as we said above, a scale is defined not just by the subset, but by the starting point as well. You could choose whatever tonic you like, but it turns out two configurations have come to dominate in Western music.
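The distinctness claim can be checked by brute force: shift a set of pitch classes (numbered 0 = C, 1 = C♯, and so on, an assumed convention) by every amount from 0 to 11 semitones and count the different sets that result. For contrast, the symmetric whole tone scale repeats itself after a shift of just two semitones.

```python
def transpositions(pitch_classes: set[int]) -> set[frozenset[int]]:
    """Every distinct set reachable by shifting all notes up 0 to 11 semitones."""
    return {
        frozenset((p + shift) % 12 for p in pitch_classes) for shift in range(12)
    }


diatonic = {0, 2, 4, 5, 7, 9, 11}  # C D E F G A B
whole_tone = {0, 2, 4, 6, 8, 10}   # C D E F♯ G♯ A♯

print(len(transpositions(diatonic)))    # 12
print(len(transpositions(whole_tone)))  # 2
```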

Major and Minor Scales

If the tonic appears at the beginning of the W-W-H-W-W-W-H pattern, then the scale is called a major scale. Major scales have some pleasing properties (several of the intervals closely match strong harmonics) and have come to dominate in Western music.

If the tonic appears on the sixth note of the pattern (so that the effective pattern is instead W-H-W-W-H-W-W), the scale is a minor scale. While minor scales are used less often than major ones, they are used more often than the many other configurations of the diatonic that are possible.
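Because the pattern is cyclic, the minor pattern is just the major pattern read from a different starting point. This sketch rotates the major pattern so that it begins on its sixth step, which gives the form usually called the natural minor.

```python
MAJOR = ["W", "W", "H", "W", "W", "W", "H"]


def rotate(pattern: list[str], start: int) -> list[str]:
    """Read the same cyclic step pattern starting from a different scale degree."""
    return pattern[start:] + pattern[:start]


# Starting on the sixth note (index 5) gives the minor pattern.
print("-".join(rotate(MAJOR, 5)))  # W-H-W-W-H-W-W
```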

Major and minor scales get their names from their tonic notes. For example, the major scale starting at B is B Major.

Because major and minor scales use different rotations of the same pattern, every major scale has a corresponding minor scale (its relative minor) which has exactly the same set of notes. However, because they have different tonics, a major and its relative minor will sound dramatically different. Broadly, people find that major scales sound "stronger" or "happier", and minor scales sound "weaker" or "sadder", even when made up of the same notes.
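This sketch builds C major and its relative minor and confirms that they share a note set. It assumes the relative minor's tonic is the sixth note of the major scale (nine semitones above the major tonic) and spells notes with sharps only.

```python
SHARP_NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # W-W-H-W-W-W-H
MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]  # W-H-W-W-H-W-W


def build(root: str, steps: list[int]) -> list[str]:
    """Notes of the scale that starts on root and follows the step sizes."""
    index = SHARP_NAMES.index(root)
    notes = []
    for step in steps:
        notes.append(SHARP_NAMES[index % 12])
        index += step
    return notes


def relative_minor(major_root: str) -> str:
    """Tonic of the relative minor: the sixth note, 9 semitones above the major tonic."""
    return SHARP_NAMES[(SHARP_NAMES.index(major_root) + 9) % 12]


c_major = build("C", MAJOR_STEPS)
a_minor = build(relative_minor("C"), MINOR_STEPS)
print(c_major)                       # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
print(a_minor)                       # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(set(c_major) == set(a_minor))  # True
```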

The enumeration of all major and minor scales, and the pairings of major scales with their relative minors, can be seen in a diagram called the Circle of Fifths. C Major, the one at the top of the circle, is the particular scale on which the naming convention for notes is based. That scale (along with its relative minor, A minor, which uses the same notes) is the only one that looks like the heptatonic schemes of olde. All other scales incorporate sharps or flats that are a byproduct of the 12-note-per-octave nomenclature.
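The circle itself can be generated by repeatedly stepping up a perfect fifth, which is 7 semitones in 12-TET; because 7 and 12 share no common factor, twelve such steps visit every note exactly once before returning to C. This sketch pairs each major key with its relative minor (nine semitones up) and, as a simplification, spells every key with sharps, so for example A♯ stands in for the B♭ found on most drawings of the circle.

```python
NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]

# Each step clockwise around the circle moves up a perfect fifth (7 semitones).
circle = [(7 * k) % 12 for k in range(12)]

for major in circle:
    relative_minor = (major + 9) % 12  # sixth degree of the major scale
    print(f"{NAMES[major]} major / {NAMES[relative_minor]} minor")
```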

How Does Tonality Work?

When presented with a song in a major key, most Western listeners will "tune in" to the key automatically, and thus come to expect a particular kind of resolution (or at least they'll know it when they hear it). This is true even when they have no idea which specific major key it is. (Only people with perfect pitch can identify that without a reference note.) How does this work? How can a song evoke a tonal center, and how can listeners, even untrained ones, reliably sense the tonal center?

As we described above, music is tonal when it successfully establishes expectations in the listener about a tonal center. But not all songs achieve this. To achieve tonality, a song needs to stay (mostly) within some scale. And just as important, there must be a listener who is acclimated to that specific scale.

But even that is not enough. Only some scales are well-suited for establishing tonal expectations. The whole tone scale, for example, consists of every other note in 12-TET, which means every note is separated from its neighbors by a whole tone. This scale is highly atonal: it does not evoke any particular tonal center. The diatonic scales, on the other hand, are highly tonal. Why, then, are some scales well suited for tonality and others are not?

For one thing, the whole tone scale is uniform: there are no irregularities in the spacing of its notes. There are no intrinsic landmarks to suggest where the scale starts or stops. This makes it possible to emphasize any given tonic, or none at all.

The diatonic scales, on the other hand, are not only irregular (a mixture of whole steps and half steps), but also asymmetric (one sequence of two whole steps and another of three whole steps). This creates a tonal landscape that has a distinctive shape, and listeners can become familiar enough with this shape to orient themselves towards the tonic.
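The difference between uniform and irregular scales can be measured with a small sketch: start the step pattern on each note in turn and count how many different patterns appear. This is just one possible way to formalize "landmarks", chosen here for illustration.

```python
def distinct_rotations(steps: list[int]) -> int:
    """Count the different step patterns seen when starting on each note."""
    return len({tuple(steps[i:] + steps[:i]) for i in range(len(steps))})


whole_tone = [2, 2, 2, 2, 2, 2]   # uniform: no landmarks
diatonic = [2, 2, 1, 2, 2, 2, 1]  # W-W-H-W-W-W-H

print(distinct_rotations(whole_tone))  # 1
print(distinct_rotations(diatonic))    # 7
```

Every note of the whole tone scale sees the same surroundings, while each of the seven diatonic notes sits in a position that a familiar listener could, in principle, tell apart.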

In major scales, for example, the tonic is the only note that has both another scale note a half step below it (the "leading tone") and another scale note a perfect fourth above it. A listener who hears these notes in proximity to each other is being given a subtle cue as to the tonality. However, it's an ambiguous cue: the song might be in a major key or it might be in the relative minor--or, for that matter, in any of the other (less common) diatonic scales that share the same set of notes but have different tonics.
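This sketch checks the claim for C major by searching the scale for notes that have a scale note one semitone below and another five semitones (a perfect fourth) above. Pitch classes are numbered 0 to 11 from C, an assumed convention. Only C qualifies; and because A minor uses exactly the same notes, the same search would point to C even in a song that is really in A minor, which is the ambiguity described above.

```python
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes: C D E F G A B
NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]


def tonic_candidates(scale: set[int]) -> list[str]:
    """Scale notes with a scale note 1 semitone below and 5 semitones above."""
    return [
        NAMES[p]
        for p in sorted(scale)
        if (p - 1) % 12 in scale and (p + 5) % 12 in scale
    ]


print(tonic_candidates(C_MAJOR))  # ['C']
```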

So the shape of the scale is important, but in general it's not enough to convey the specific key to the listener.

Cadences and Other Tonal Cues

As we saw above, some scales provide tonal cues by the nature of their shape. A sequence of notes which reveals this shape is sometimes sufficient to signal the key for a listener.

In practice, however, common idioms are used to strengthen the sense of tonality and to distinguish among similar keys. For example, songs in Western music usually begin and end with the tonic chord. Clearly this need not be the case, but departing from this convention eliminates a very familiar tonal cue, and may disorient listeners.

In the interior of a song, there are idioms that commonly signal transition away from the tonic or back towards it. For example, the ii-V-I turnaround is a particular chord progression which so permeates modern Western music that it serves as a very strong signal of return-to-the-tonic. Listeners who have never heard the name of this idiom nonetheless are acclimated to its use, and so its appearance serves as a strong tonal cue for them.
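The chords in a ii-V-I come from stacking thirds on degrees 2, 5, and 1 of the major scale (lowercase numerals mark minor chords). This sketch builds plain triads in C major with sharp-only spellings; in jazz the progression is more often played with seventh chords, such as Dm7-G7-Cmaj7.

```python
NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # W-W-H-W-W-W-H in semitones


def major_scale(root: str) -> list[str]:
    """The seven notes of the major scale on root (sharp spellings only)."""
    index = NAMES.index(root)
    notes = []
    for step in MAJOR_STEPS:
        notes.append(NAMES[index % 12])
        index += step
    return notes


def triad(scale: list[str], degree: int) -> list[str]:
    """Stack two thirds on a 1-based scale degree: degrees d, d+2, d+4."""
    i = degree - 1
    return [scale[(i + k) % 7] for k in (0, 2, 4)]


c_major = major_scale("C")
for numeral, degree in (("ii", 2), ("V", 5), ("I", 1)):
    print(numeral, triad(c_major, degree))
# ii ['D', 'F', 'A']
# V ['G', 'B', 'D']
# I ['C', 'E', 'G']
```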

The ii-V-I turnaround is an instance of the more general idea of cadences. A cadence is an idiomatic chord progression which signals movement relative to the tonic, and is used not only to establish key, but also to suggest to the listener where the musical piece is going. For example, if a song transitions into a new key, certain cadences may be used to signal the change and orient the listener to the new key.

In summary, tonality is a fairly general concept, but any given implementation of it requires listeners who are familiar not only with the scales involved, but also the conventional cadences for each scale which are used to move relative to the tonic.

[Questions: What other features aid tonality? Are some diatonic scales more strongly tonal? Can any irregular, asymmetric scale evoke tonality? I'm not sure.]


We are now in a position to answer the questions posed in the introduction. These answers are simplistic, but at least they don't resort to saying "because that's the way it is." To me, this is progress.

Why do we have twelve notes per octave?

Because equal temperament allows for variety in modulation, and twelve-tone equal temperament does the best job of accommodating the older tuning systems that composers were used to.

Why do we have sharp and flat notes?

Because people were comfortable with the existing naming scheme for heptatonic scales, and rather than change it for 12-TET, they just augmented it.

Why do we have major and minor scales?

Because the ancient and influential diatonic tuning can be adapted into 12-TET in a variety of ways, but two of the forms in particular have caught on and come to dominate.

Why do we talk about what "key" a song is in? Why is that even a concept?

Tonality, or orienting the listener to a central, resolving tone, is a useful way to create tension and resolution, and so various systems for establishing tonality have developed over time. Scales that work well for establishing tonality have been preferred over those that don't, and conventions have been adopted that help orient listeners to specific tonalities.