As you learned in the previous section, sound waves change air pressure. These changes are the basis for recording sound with analogue or digital methods. Both methods require a microphone that translates the changes in air pressure into electrical signals.
In a dynamic microphone like the one shown above, sound waves enter through the microphone head, and the changing air pressure forces the diaphragm and metal coil to move back and forth past the magnet. This creates a varying voltage in the coil: an electrical signal. Analogue and digital recording methods process this signal in different ways.
Analogue recording methods store the electrical signal directly in some form of physical medium.
Analogue recording dates back to 1877, when Thomas Edison, better known for developing the incandescent lightbulb, invented a mechanical machine called the phonograph, which could record and reproduce sound.
Ten years later, Emile Berliner created the gramophone to improve upon the storage capability of the phonograph, which wore out its storage media very quickly. The storage medium Berliner invented was similar to a vinyl record!
These inventions could store sound recordings, but because storage media were physically changed during the recording process, the recordings were nearly impossible to edit. In addition, playing back sound also changed the storage media, so each storage medium was viable for only a limited number of playbacks before it was too damaged to read.
In the 20th century, Fritz Pfleumer’s invention, the magnetophone, overcame these limitations by using magnetic tape as a storage medium. The magnetophone encoded the electrical signals from the microphone via a process of magnetisation; reading the magnetic tape to play back sound did not damage the tape. In addition, tapes could be edited through the process of splicing: a metal blade cuts the tape, and sections of tape are joined together.
An analogue sound recording is stored safely on a mechanical device, disk, or magnetic tape, and we can easily convert the analogue recording back into sound. Computers, however, don’t understand analogue and need another way to process sound.
In the electrical signal from a microphone, the voltage changes depending upon changes in air pressure (i.e. the sound waves). A computer converts these changes in voltage into a digital signal using an analogue-to-digital converter, or ADC. The digital signal is then stored in the form of binary numbers.
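To make the idea concrete, here is a minimal sketch of what an ADC does in software terms. The reference voltage of 3.3 V and the 10-bit resolution are illustrative assumptions (typical of the small ADC chips used with a Raspberry Pi), not values from the text:

```python
def adc_convert(voltage, v_ref=3.3, bits=10):
    """A simplified model of an ADC: convert an analogue voltage
    into an n-bit integer. v_ref and bits are assumed values."""
    levels = 2 ** bits                        # a 10-bit ADC has 1024 levels
    # Clamp to the ADC's input range, then scale to an integer level
    voltage = max(0.0, min(voltage, v_ref))
    return min(int(voltage / v_ref * levels), levels - 1)

print(adc_convert(0.0))    # lowest level: 0
print(adc_convert(1.65))   # mid-range voltage: level 512
print(adc_convert(3.3))    # highest level: 1023
```

The important point is that a continuous voltage becomes one of a fixed number of integer levels, which can then be stored as binary numbers.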
Possible classroom project: you can easily create an ADC by adding a small chip to a Raspberry Pi.
The resulting binary numbers are available for the computer to:
Edit: there’s no need to splice sections of tape anymore; the binary numbers can be manipulated directly (a variety of software programs, e.g. Audacity, make this very easy)
Play back: sending the binary numbers directly to the loudspeaker would be like sending a stream of ‘loudspeaker on’ and ‘loudspeaker off’ commands — this would not be nice to listen to! So when you want to play sound back, the computer converts the binary numbers back to an analogue signal using a digital-to-analogue converter (DAC) before sending it to the loudspeaker.
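A DAC performs the inverse of the conversion above. This sketch assumes the same illustrative 3.3 V range and 10-bit samples; the function name is hypothetical:

```python
def dac_convert(sample, v_ref=3.3, bits=10):
    """A simplified model of a DAC: convert an n-bit integer sample
    back into an analogue voltage. v_ref and bits are assumed values."""
    levels = 2 ** bits
    # Scale the integer level back onto the 0..v_ref voltage range
    return sample / (levels - 1) * v_ref

print(dac_convert(0))      # lowest level -> 0.0 V
print(dac_convert(1023))   # highest level -> 3.3 V
```

Feeding the loudspeaker this smoothly varying voltage, rather than the raw stream of bits, is what makes the playback sound like the original signal.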
You may have heard the term ‘high fidelity’ in the context of stereos or CD players, where it’s often abbreviated as hi-fi. The term refers to the quality at which sound is recorded or played back. To produce enough data for a high-fidelity digital sound recording, the computer needs to check the electrical signal coming from the microphone many thousands of times per second. This process of checking is known as sampling, and you will learn about it in the next step, where we’ll look more closely at how digital sound recording works.
As I described in the previous step, a computer records sound by converting an analogue electrical signal into a digital signal. This involves taking lots of individual measurements to approximate the form of the entire sound wave; this process is called sampling.
Let’s look at how this works for the electrical signal produced by a very simple sound wave, represented as a sine wave of the voltage change over time.
The computer samples the microphone’s electrical signal at lots of different time points, with the same interval between each time point and the next.
The number of samples the computer takes per second is known as the sample rate. Common sample rates are 44.1 kHz, 48 kHz, and 96 kHz. kHz stands for kilohertz, or 1000 samples per second, so 44.1 kHz represents 44100 samples per second. Sample rates this high are needed to capture the full range of frequencies in a sound wave.
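The relationship between sample rate and sample times can be sketched in a few lines of Python. The 440 Hz sine wave here is an illustrative choice (concert pitch A), not something specified in the text:

```python
import math

SAMPLE_RATE = 44_100   # samples per second (44.1 kHz)
FREQUENCY = 440.0      # an assumed 440 Hz test tone

# One second of samples: the sample times are evenly spaced,
# 1 / SAMPLE_RATE seconds apart.
samples = [math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE)
           for n in range(SAMPLE_RATE)]

print(len(samples))    # 44100 samples for one second of audio
```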
Let’s look at one sample the computer takes:
You might think the computer could just read the height of the wave (which represents the electrical signal from the microphone) and store that value, but there’s a problem with that: the electrical signal is analogue, meaning it’s continuous. This means that no number of bits is enough to store the value entirely accurately. Instead, the computer has to set a sample resolution, which determines how accurately the computer represents the strength of the electrical signal.
The sample resolution is dictated by the number of bits the computer uses to store a sample value. For example, if the computer uses three bits, it can represent eight different levels of sample value; what the computer actually stores is the level that the analogue signal is closest to:
Representing a point on a continuous scale with a discrete value is known as quantisation. Although there is a difference between the measured value and the stored value, a high enough sample resolution (using more bits) lets the computer get very close to the actual signal level. It’s quite common to use 16 or 24 bits, allowing the computer to represent 65,536 or 16,777,216 different levels.
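The three-bit example above can be sketched as follows. This is a minimal illustration, assuming the signal is scaled to the range −1.0 to 1.0; the function name is hypothetical:

```python
def quantise(value, bits=3):
    """Map a value in the range -1.0..1.0 to the nearest of 2**bits
    levels, returning (stored_level, value_that_level_represents)."""
    levels = 2 ** bits                         # 3 bits -> 8 levels
    # Scale -1..1 onto 0..levels-1 and round to the nearest level
    level = round((value + 1) / 2 * (levels - 1))
    reconstructed = level / (levels - 1) * 2 - 1
    return level, reconstructed

level, approx = quantise(0.5)
print(level)            # the level the computer actually stores
print(abs(0.5 - approx))  # the quantisation error: small but non-zero
```

With only eight levels the quantisation error is audible; with 65,536 levels (16 bits) the stored value lands far closer to the real signal.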
Quantisation affects sound recordings in ways that you’ve probably encountered: it can cause distortions. The sample resolution set for a recording determines a maximum quantisation level. If the volume of the sound being recorded pushes the signal above this level, the signal becomes “clipped”, which means the recording will sound distorted.
This concept of sample resolution might remind you of something you learned about when we discussed image files in Week 2: bit depth. We defined this as the storage space each pixel needs to represent the available range of different colours; more shades of colours produce a more detailed image. Sample resolution is the matching concept for sound files: a higher sample resolution produces a more detailed sound recording. This is why sample resolution is also called audio bit depth.
The computer performs quantisation each time it samples the signal from the microphone, and by repeating this process, it produces a series of binary numbers that represent the height of the wave at many different time points.
The computer stores this binary representation as a file and can use it to edit, modify, or reproduce the recorded sound.
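Putting the whole pipeline together, the following sketch samples one cycle of a sine wave and quantises each sample to 3 bits. The sample rate of 8 is unrealistically low, chosen only so the output is short enough to read:

```python
import math

SAMPLE_RATE = 8   # unrealistically low, just for illustration
BITS = 3          # sample resolution: 2**3 = 8 levels

# Sample one cycle of a sine wave and quantise each sample,
# storing the result as binary numbers.
binary_samples = []
for n in range(SAMPLE_RATE):
    value = math.sin(2 * math.pi * n / SAMPLE_RATE)    # analogue signal
    level = round((value + 1) / 2 * (2 ** BITS - 1))   # quantise
    binary_samples.append(format(level, "03b"))        # store as binary

print(binary_samples)
```

The printed list of binary numbers is, in miniature, exactly what a digital sound file contains: one quantised value per sample, ready to be edited or sent through a DAC for playback.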