Auditory Perception

The physical perception of sound

Sound waves are mechanical waves. When a sound such as speech is perceived, the ear converts these waves into electrical signals, a process known as mechanotransduction (Ashmore, 2007). Before turning to how sound is perceived, this section briefly outlines the anatomy of the hearing system.

As shown in Fig. 1, the ear consists of the outer ear, the middle ear and the inner ear (Howard & Angus, 2009). The outer ear and the middle ear are filled with air, while the inner ear contains fluid. The outer ear comprises the pinna and the auditory canal and is separated from the middle ear by the eardrum (tympanic membrane). The eardrum is linked to the inner ear by the auditory ossicles, three small bones known as the malleus, the incus and the stapes.

Fig. 1: Anatomy of the human ear (modified from Crowne,

While the malleus is attached to the eardrum, the stapes is connected to the oval window, which gives access to the inner ear (Howard & Angus, 2009). The inner ear consists of three regions: the cochlea, which is involved in hearing, and the semicircular canals and the vestibule, which are involved in coordination and balance (Howard & Angus, 2009).

The cochlea contains three canals: the vestibular canal (scala vestibuli), the tympanic canal (scala tympani) and the cochlear canal or duct (scala media). The cochlear duct houses the spiral organ, also known as the organ of Corti (Fig. 2).


Fig. 2: Cross-section of the organ of Corti within the cochlea (modified from Wikipedia,

The scala tympani is separated from the middle ear by the round window, which maintains a constant volume of fluid. The organ of Corti has around 16,000 hair cells that act as sound receptors. They sit on the floor of the cochlear canal, the basilar membrane, with their specialised projections, the stereocilia (the hairs of the hair cells), embedded in the tectorial membrane (Howard & Angus, 2009). The nerve fibres from these cells form a spiral bundle, the auditory nerve (also known as the cochlear nerve) (Ashmore, 2007). The organ of Corti has one row of inner hair cells and three rows of outer hair cells.

The perception of sound starts when variations in air pressure arrive at the ear in the form of sound waves and travel through the auditory canal, causing the tympanic membrane to vibrate. The tympanic membrane converts the acoustic pressure variations into mechanical vibrations, which are then transmitted through the auditory ossicles of the middle ear (Howard & Angus, 2009).

The mechanical movements are then passed on to the fluid that fills the cochlea within the inner ear. As they travel through the scala vestibuli and the scala tympani, they cause the basilar membrane to oscillate, which bends the stereocilia of the hair cells. The hair cells convert the mechanical (acoustic) signal into an electrical (neural) signal, and this information is sent in the form of action potentials via the cochlear nucleus and the inferior colliculus to the medial geniculate nucleus (MGN) of the thalamus (Ashmore, 2007; Gazzaniga, Ivry, & Mangun, 2009). From there it is projected to the primary auditory cortex in the superior part of the temporal lobe, which results in the sensation of hearing (Ashmore, 2007; Gazzaniga, Ivry, & Mangun, 2009).

The physical perception of formant frequency

As shown in Fig. 3, different parts of the organ of Corti within the cochlea respond to different sound frequencies (Howard & Angus, 2009; Rossing, Moore, & Wheeler, 2002). The apex of the organ is sensitive to low frequencies, whereas the base is sensitive to high frequencies. The travelling waves therefore reach their peak at frequency-dependent positions along the basilar membrane. Neurons within the cochlea that respond to speech are accordingly organised by their frequency sensitivity, an arrangement known as tonotopic organisation.

Fig. 3: Different regions of the organ of Corti and their sensitivity to different frequencies (modified from Encyclopaedia Britannica,
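The tonotopic mapping described above can be made concrete with a small sketch. It assumes the empirical place-frequency fit usually attributed to Greenwood, f(x) = A(10^(ax) - k), with the constants commonly quoted for the human cochlea (A = 165.4 Hz, a = 2.1, k = 0.88) and x as the relative position along the basilar membrane from the apex (0) to the base (1). Neither the formula nor the constants come from the sources cited in this section, and the function names are illustrative only.

```python
import math

# Constants commonly quoted for the human cochlea. They are assumptions
# for this sketch and are not taken from the sources cited in the text.
A_HZ = 165.4   # scaling constant in Hz
ALPHA = 2.1    # slope, for position given as a fraction of membrane length
K = 0.88       # low-frequency correction term


def place_to_frequency(x):
    """Characteristic frequency (Hz) at relative position x (0 = apex, 1 = base)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 (apex) and 1 (base)")
    return A_HZ * (10 ** (ALPHA * x) - K)


def frequency_to_place(freq_hz):
    """Inverse mapping: relative position (0 = apex, 1 = base) for a frequency in Hz."""
    return math.log10(freq_hz / A_HZ + K) / ALPHA


if __name__ == "__main__":
    # Print the characteristic frequency at a few positions along the membrane.
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"x = {x:.2f} -> {place_to_frequency(x):8.1f} Hz")
```

Under these assumptions the apex maps to roughly 20 Hz and the base to roughly 20 kHz, which matches the qualitative picture of low frequencies at the apex and high frequencies at the base.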

Each neuron responds, to some extent selectively, to acoustic energy at a particular frequency. Thus, when acoustic energy is concentrated in formants at particular frequencies, the neurons tuned to those frequencies signal its presence to the brain in the form of action potentials carried by the cochlear nerve. The cochlea has therefore been likened to a bank of frequency filters (Howard & Angus, 2009; Rossing, Moore, & Wheeler, 2002).
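The filter-bank analogy can be illustrated with a toy sketch. It is not a physiological model of the cochlea: each "channel" simply measures the signal energy at one frequency using the Goertzel recurrence, and the sample rate, channel frequencies and two-component test signal (standing in for two formants) are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 8000                              # Hz, assumed for the demo
N_SAMPLES = 800                                 # 0.1 s of signal
CHANNELS = [250, 500, 1000, 1500, 2000, 3000]   # hypothetical channel frequencies (Hz)


def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Signal power at freq_hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def filter_bank(samples, channel_freqs_hz, sample_rate_hz):
    """Return a dict mapping each channel frequency to the power it detects."""
    return {f: goertzel_power(samples, f, sample_rate_hz) for f in channel_freqs_hz}


def demo_signal():
    """Two sinusoids at 500 Hz and 1500 Hz, standing in for two formants."""
    return [
        math.sin(2.0 * math.pi * 500 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2.0 * math.pi * 1500 * n / SAMPLE_RATE)
        for n in range(N_SAMPLES)
    ]


if __name__ == "__main__":
    response = filter_bank(demo_signal(), CHANNELS, SAMPLE_RATE)
    for freq, power in response.items():
        print(f"{freq:5d} Hz channel: power = {power:12.2f}")
```

Only the 500 Hz and 1500 Hz channels respond strongly to the test signal, which mirrors the idea that neurons tuned to the frequencies present in the sound are the ones that report it.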

The physical perception of pitch

The distinction between high-pitched and low-pitched sounds is made possible by the basilar membrane, which becomes narrower from the apex to the base. When the sound signal is transmitted by inner hair cells at the apical end of the basilar membrane, the sound is perceived as low in pitch. In contrast, when the signal is sent from inner hair cells at the basal end, the sound is perceived as high in pitch, because the basilar membrane is stiff and narrow at the base (Howard & Angus, 2009; Rossing, Moore, & Wheeler, 2002).

The physical perception of sound amplitude

The distinction between loud and soft sounds is supported by the organ of Corti (Howard & Angus, 2009; Rossing et al., 2002). Loud sounds cause the fluid in the scala vestibuli to exert more pressure, so that the organ of Corti vibrates more strongly and a larger number of hair cells is stimulated over a larger region of the basilar membrane. As a result, more action potentials are triggered in the auditory nerve, which is interpreted as a sound of high intensity (Howard & Angus, 2009; Rossing et al., 2002). Conversely, soft sounds are perceived when fewer hair cells are excited, so that fewer action potentials are generated in the auditory nerve than during the perception of loud sounds (Howard & Angus, 2009; Rossing et al., 2002).
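The recruitment idea described above can be sketched with a deliberately simple toy model. It assumes that excitation along the basilar membrane follows a bell-shaped curve centred on the place of maximum vibration and that a hair cell counts as stimulated once the excitation at its position exceeds a fixed threshold. The curve shape, width, threshold and amplitude units are hypothetical choices for illustration. Only the figure of around 16,000 hair cells comes from the text.

```python
import math

N_HAIR_CELLS = 16000   # approximate number of hair cells given in the text


def recruited_cells(amplitude, peak_place=0.5, width=0.05, threshold=1.0,
                    n_cells=N_HAIR_CELLS):
    """Count cells whose excitation exceeds the threshold.

    Cells are spread evenly over relative positions 0..1. The excitation is a
    Gaussian bump of the given amplitude centred on peak_place; all of these
    parameters are illustrative assumptions, not physiological measurements.
    """
    count = 0
    for i in range(n_cells):
        x = i / (n_cells - 1)
        excitation = amplitude * math.exp(-0.5 * ((x - peak_place) / width) ** 2)
        if excitation > threshold:
            count += 1
    return count


if __name__ == "__main__":
    # Amplitudes are in arbitrary units relative to the threshold.
    for amplitude in (0.5, 2.0, 10.0):
        print(f"amplitude {amplitude:5.1f}: {recruited_cells(amplitude)} cells stimulated")
```

In this sketch a stimulus below the threshold recruits no cells, and raising the amplitude widens the region above threshold, so more cells are stimulated, in line with the account of loud and soft sounds given above.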



Ashmore, J. (2007). Hearing. Sound (pp. 65-88). Cambridge: Cambridge University Press.

Crowne, D. Hearing. Retrieved June, 2013, from

Encyclopaedia Britannica. Basilar membrane: Analysis of sound frequencies. Retrieved June, 2013, from

Gazzaniga, M., Ivry, R., & Mangun, G. (2009). Cognitive neuroscience: The biology of the mind. New York: Norton.

Howard, D. M., & Angus, J. A. S. (2009). Acoustics and psychoacoustics. Oxford: Focal Press.

Rossing, T., Moore, R., & Wheeler, P. (2002). The science of sound. San Francisco, CA: Addison Wesley.

Wikipedia, the free encyclopedia. Cochlea-cross section. Retrieved June, 2013, from