23. Speech
Vowels are mostly made distinguishable by differences in the distribution of energy among their harmonics. As vowels are produced with the vocal tract open, these differences are mostly generated through appropriate positioning of the elements in the vocal tract. This alters the resonance in the vocal cavities which in turn modifies the distribution of energy among the harmonics. The acoustics of vowels can be visualized using spectrograms, which display the acoustic energy at each frequency, and how this changes with time.
Figure 1. Spectrogram of vowels [i, u, ɑ]. [ɑ] is a low vowel, so its F1 value is higher than that of [i] and [u], which are high vowels. [i] is a front vowel, so its F2 is substantially higher than that of [u] and [ɑ], which are back vowels.
A vowel is therefore determined by the configuration of the vocal tract and not by the tension in the vocal folds. A singer can vocalize an “ah” and change the fundamental frequency of the voice drastically while still holding a sound that is clearly recognizable as the vowel “ah”. Conversely, the singer can keep the tension in the vocal folds constant while cycling through all the vowels.
Figure 2. Real-time MRI of the vocal tract during English speech. Watch the video at https://upload.wikimedia.org/wikipedia/commons/9/9f/Real-time_MRI_-_Speaking_%28English%29.ogv.
The frequencies that are emphasized at a given configuration of the vocal tract are called formants. When the tension in the vocal folds is changed without moving the vocal tract, the formants stay the same, but the fundamental frequency changes. This results in a different set of harmonics being matched by the formants and emphasized by resonance in the vocal tract.
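The relationship between harmonics and formants can be sketched numerically. The following is a minimal illustration, not a model of the voice: the formant frequencies in FORMANTS_HZ are hypothetical values for one fixed vocal tract shape, and the function names are chosen here for illustration. It lists which harmonic lies closest to each formant as the fundamental frequency changes while the formants stay put.

```python
def harmonics(f0, max_hz=4000.0):
    """Return the harmonic frequencies (integer multiples of f0) up to max_hz."""
    count = int(max_hz // f0)
    return [f0 * k for k in range(1, count + 1)]


def nearest_harmonic(f0, formant_hz):
    """Return (harmonic number, frequency) of the harmonic closest to a formant."""
    k = max(1, round(formant_hz / f0))
    return k, f0 * k


# Hypothetical formant frequencies in Hz for one fixed vocal tract shape.
# These are assumptions for illustration, not measurements.
FORMANTS_HZ = (700.0, 1100.0, 2500.0)

if __name__ == "__main__":
    # Changing f0 models a change in vocal fold tension only.
    for f0 in (100.0, 150.0, 220.0):
        matches = [nearest_harmonic(f0, f) for f in FORMANTS_HZ]
        print(f"f0 = {f0:.0f} Hz -> harmonics nearest the formants: {matches}")
```

With the formants held fixed, each fundamental frequency places a different harmonic number nearest each formant, which is why the vowel stays recognizable while the pitch changes.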
One important aspect of vowel production is that the muscle contractions that produce the various vowels all vary along continuous ranges without categorical steps. A person’s concept of a given vowel is a learned convention that is employed when producing the vowel and when recognizing it in another person’s speech. These conventions do not hold across languages, and they also vary among regions that speak the same language.
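One way to picture a learned category is as a prototype in a continuous acoustic space, with each incoming sound assigned to the nearest prototype. The sketch below assumes that simplification. The two "listeners" and their prototype values for the first two formants (in Hz) are hypothetical, and plain Euclidean distance in Hz is a crude stand-in for perceptual distance. It shows the same sound being assigned to different vowel categories under different conventions.

```python
import math

# Hypothetical prototypes: (first formant, second formant) in Hz.
# Both listeners share a category for [i] but have learned different [e] targets.
LISTENER_A = {"i": (280.0, 2250.0), "e": (500.0, 1900.0)}
LISTENER_B = {"i": (280.0, 2250.0), "e": (400.0, 2100.0)}


def classify(f1, f2, prototypes):
    """Return the label whose prototype is nearest to the point (f1, f2)."""
    return min(prototypes, key=lambda label: math.dist((f1, f2), prototypes[label]))


if __name__ == "__main__":
    token = (360.0, 2150.0)  # one sound, somewhere between the prototypes
    print("Listener A hears:", classify(*token, LISTENER_A))
    print("Listener B hears:", classify(*token, LISTENER_B))
```

Nothing about the sound itself changes between the two calls; only the learned reference points differ.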
Formant charts are frequently used to analyze the continuum of the vowel space and to study accents, dialects, pathologies and the boundaries between vowels. The first formant, abbreviated “F1”, corresponds to vowel openness (vowel height). Open vowels have high F1 frequencies, while close vowels have low F1 frequencies, as can be seen in the accompanying spectrogram. The vowels [i] and [u] have similarly low first formants, whereas [ɑ] has a higher first formant.
The second formant, F2, corresponds to vowel frontness. Back vowels have low F2 frequencies, while front vowels have high F2 frequencies. This is very clear in the spectrogram, where the front vowel [i] has a much higher F2 frequency than the other two vowels. However, in open vowels, the high F1 frequency forces a rise in the F2 frequency as well, so an alternative measure of frontness is the difference between the second and first formants. For this reason, some people prefer to plot F1 against F2 – F1.
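The effect of using F2 – F1 instead of F2 can be checked with a few numbers. The formant values below are rough illustrative figures chosen for this sketch (they are assumptions, not readings from Figure 1), and the function name is hypothetical. The code ranks the three vowels from most front to most back under each measure.

```python
# Rough illustrative (F1, F2) values in Hz. These are assumed for the example.
VOWELS = {
    "i": (280.0, 2250.0),
    "u": (310.0, 870.0),
    "ɑ": (710.0, 1100.0),
}


def rank_by_frontness(vowels, use_difference=True):
    """Order vowel symbols from most front to most back.

    With use_difference=True the measure is F2 - F1; otherwise it is raw F2.
    """
    def measure(symbol):
        f1, f2 = vowels[symbol]
        return f2 - f1 if use_difference else f2

    return sorted(vowels, key=measure, reverse=True)


if __name__ == "__main__":
    print("Ranked by F2:     ", rank_by_frontness(VOWELS, use_difference=False))
    print("Ranked by F2 - F1:", rank_by_frontness(VOWELS, use_difference=True))
```

With these assumed values, raw F2 places the open vowel [ɑ] ahead of [u] in frontness, while F2 – F1 removes the boost that the high F1 gives to F2 and places [ɑ] last.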
Figure 3. A formant chart showing the stem vowel space of the Kyrgyz language. Vowels are produced in different parts of the acoustic space formed when the frequency of the first formant (F1) is plotted against that of the second formant (F2).
Vowels are more traditionally defined phonetically or phonologically.
In the phonetic definition, a vowel is a sound, such as the English "ah" or "oh", produced with an open vocal tract. It is median, oral, frictionless and continuant, meaning that the air escapes along the middle of the tongue, through the mouth, without restriction and continuously. There is no significant build-up of air pressure at any point above the glottis. This contrasts with consonants, such as the English "sh" [ʃ], which have a constriction or closure at some point along the vocal tract.
In the phonological definition, a vowel is defined as the nucleus of a syllable. In oral languages, phonetic vowels normally form the nucleus of many or all syllables, whereas consonants form the onset and (in languages that have them) coda.
The traditional view of vowel production is based on articulatory features that determine a vowel's quality and distinguish it from other vowels. Daniel Jones developed the cardinal vowel system to describe vowels in terms of tongue height (vertical dimension), tongue backness (horizontal dimension) and roundedness (lip articulation). These three parameters are indicated in the schematic quadrilateral IPA vowel diagram. The vowel quadrilateral is not a perfect mapping of tongue position, but it is an intuitive morphological approximation of how the tongue influences the acoustics of the vowels.
Figure 4. The original vowel quadrilateral, from Jones' articulation. The vowel trapezoid of the modern IPA is a simplified rendition of this diagram.
Vowel height is named for the vertical position of the tongue relative to either the roof of the mouth or the aperture of the jaw. It correlates closely with the first formant (lowest resonance of the voice), abbreviated F1. In close vowels, also known as high vowels, such as [i] and [u], the first formant is consistent with the tongue being positioned close to the palate, high in the mouth, whereas in open vowels, also known as low vowels, such as [a], F1 is consistent with the jaw being open and the tongue being positioned low in the mouth. Height is defined by the inverse of the F1 value: The higher the frequency of the first formant, the lower (more open) the vowel.
Figure 5. Idealized tongue positions of cardinal front vowels with highest point indicated.
The International Phonetic Alphabet defines seven degrees of vowel height, but no language is known to distinguish all of them:
close (high)
near-close (near-high)
close-mid (high-mid)
mid (true-mid)
open-mid (low-mid)
near-open (near-low)
open (low)
The parameter of vowel height appears to be the primary cross-linguistic feature of vowels in that all spoken languages use height as a contrastive feature. No other parameter, even backness or rounding (see below), is used in all languages. Some languages have vertical vowel systems in which, at least at a phonemic level, only height is used to distinguish vowels.
Vowel backness is named for the position of the tongue during the articulation of a vowel relative to the back of the mouth. As with vowel height, however, it is defined by a formant of the voice, in this case the second, F2. In front vowels, such as [i], the frequency of F2 is relatively high, which generally corresponds to a position of the tongue forward in the mouth, whereas in back vowels, such as [u], F2 is low, consistent with the tongue being positioned towards the back of the mouth.
The International Phonetic Alphabet defines five degrees of vowel backness: front, near-front, central, near-back and back.
Although English has vowels at five degrees of backness, there is no known language that distinguishes five degrees of backness without additional differences in height or rounding.
The conception of the tongue moving independently in two directions, high–low and front–back, is not supported by articulatory evidence. The natural movements of the tongue are better characterized by the three directions of movement that it can take from its neutral position: front, raised, and retracted. In this alternative classification, front vowels can be secondarily qualified as close or open, as in the traditional conception.
Figure 6. Front, raised and retracted are the three articulatory dimensions of vowel space.
Roundedness is named after the rounding of the lips in some vowels. Because lip rounding is easily visible, vowels may be commonly identified as rounded based on the articulation of the lips. Acoustically, rounded vowels are identified chiefly by a decrease in F2, although F1 is also slightly decreased.
In most languages, roundedness is a reinforcing feature of mid to high back vowels rather than a distinctive feature. Usually, the higher a back vowel, the more intense is the rounding.
Nasalization refers to whether some of the air escapes through the nose. In nasal vowels, the velum (soft palate) is depressed and some air travels through the nasal cavity. An oral vowel is a vowel in which all air escapes through the mouth. French, Polish and Portuguese contrast nasal and oral vowels.
Voicing describes whether the vocal cords are vibrating during the articulation of a vowel. Most languages have only voiced vowels, but several Native American languages, such as Cheyenne and Totonac, contrast voiced and devoiced vowels. Vowels are devoiced in whispered speech. In Japanese and in Quebec French, vowels that are between voiceless consonants are often devoiced.
Some languages contrast vowels made with the root of the tongue advanced or retracted, vowels made with or without secondary narrowings in the vocal tract, or tense and lax vowels.
Vowels are characterized not by the fundamental frequency of vocal fold vibration, but by the distribution of energy among their harmonics. This distribution is a function of the resonance patterns in the vocal cavities, which can be modified by moving the anatomy of the vocal tract. Vowels are not categorically different in production mode or sound. The categorization is learned, and the boundaries vary among languages, dialects and accents. The articulation of vowels mostly involves raising, protracting or retracting the tongue.
Harmonic, formant, vowel space, formant chart, vowel height, vowel backness, vowel roundedness, nasalization, phonation.
Figure 1 by anonymous, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=221013
Figure 2 by Biomedizinische NMR Forschungs GmbH. http://www.biomednmr.mpg.de. CC BY-SA 3.0. Martin Uecker, Shuo Zhang, Dirk Voit, Alexander Karaus, Klaus-Dietmar Merboldt, and Jens Frahm, Real-time magnetic resonance imaging at a resolution of 20 ms, NMR in Biomedicine 23: 986–994 (2010) DOI:10.1002/nbm.1585.
Figure 3 by Peter238 - Own work, based on the formant chart in "Jonathan North Washington - Phonetic and Phonological Problems in Kyrgyz: A Fulbrighter's plans for gathering data in the field", page 10., CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=39437469
Figure 4 by Kwamikagami - Copied from the 1949 Principles of the IPA, p. 6., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=56805128
Figure 5 by Badseed - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=3539944
Figure 6 by Kwamikagami - modified my file IPA vowel chart 2005.png, CC BY-SA 3.0, https://en.wikipedia.org/w/index.php?curid=46937434
Figure 7 by WellsTribute - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=7452974