How to design an audio playback system
Introduction
An audio playback path requires an understanding of audio formats, audio postprocessing, and audio hardware.
Background
Audio system engineering is a mix of signal processing, codec design, amplifier design, and acoustics. It is an interdisciplinary field that requires system-level thinking and design.
Goal
The goal of this article is to provide an introduction to audio playback system operation basics, an architecture overview, and the main design areas.
Audio System Overview
Audio Pipeline
Content application -> audio service (background service) -> audio driver (kernel) -> audio hardware -> speaker.
A content application generates the audio source; Spotify is one example.
An audio service routes audio from the content application down to the low-level audio drivers.
The audio driver sends the digital audio stream over the wire to the audio hardware.
The audio hardware converts the digital audio stream to an analog signal, which is played out by the speaker.
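To make the pipeline concrete, here is a minimal Python sketch that models each stage as a function handing a buffer of samples to the next stage. The stage names and the sine-wave "content app" are illustrative placeholders, not a real OS audio stack.

```python
import numpy as np

SAMPLE_RATE = 48_000  # Hz

def content_app(duration_s=0.1):
    """Content application: generates the audio source (here, a 440 Hz tone)."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    return (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

def audio_service(pcm):
    """Audio service: mixes/routes streams and hands them to the driver."""
    return np.clip(pcm, -1.0, 1.0)  # e.g. apply system volume / safety clamp

def audio_driver(pcm):
    """Audio driver: converts float samples to the wire format (16-bit LPCM)."""
    return (pcm * 32767).astype(np.int16)

def audio_hardware(frames):
    """Audio hardware: DAC + amplifier; here we just report what would be played."""
    print(f"Playing {len(frames)} frames at {SAMPLE_RATE} Hz")

# Content app -> audio service -> driver -> hardware -> speaker
audio_hardware(audio_driver(audio_service(content_app())))
```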
Audio Format
Uncompressed Audio: LPCM
Bit depth: 16/32 bits
Sampling rate: 44.1 kHz, 48 kHz, and 96 kHz
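As a quick sanity check on these numbers, the raw LPCM data rate is simply sample rate × bit depth × channel count. A small sketch:

```python
def lpcm_bitrate(sample_rate_hz, bit_depth, channels):
    """Raw (uncompressed) LPCM data rate in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

# 48 kHz, 16-bit stereo -> 1536 kbps (192 kB/s)
print(lpcm_bitrate(48_000, 16, 2))
# 44.1 kHz, 16-bit stereo (CD quality) -> 1411.2 kbps
print(lpcm_bitrate(44_100, 16, 2))
```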
Audio Hardware
Amplifier
Class-D amplifiers are commonly used.
Drivers
Full range
10 W rated output
4 Ω coil impedance
DSP
Can be either external hardware or internal to the application processor.
Note: a simple speaker design does not need a dedicated DSP.
Audio Postprocessing
Limiter: dynamically attenuates very intense audio peaks to reduce the risk of over-driving the speaker (see the sketch after this list).
Compressor (i.e. dynamic range compression): dynamically compresses audio peaks so they stay under a predefined level, reducing the distortion generated by the speaker at high output power.
Equalization (sound effects): used to shape the sound signature of the system. Audio engineers adjust the amplitude at frequencies of interest to achieve the desired sound effect.
Crossover filter: if multiple drivers are used (e.g. a 2-way system containing a woofer (20 Hz to 2 kHz) and a tweeter (2 kHz to 20 kHz)), a crossover filter splits the audio stream into two frequency bands, one for the woofer and one for the tweeter.
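To give a rough feel for two of these blocks, below is a minimal sketch of a hard-knee peak limiter and a 2-way crossover built from Butterworth filters (scipy). Real products use look-ahead limiters and matched crossovers (e.g. Linkwitz-Riley); the threshold and corner frequency here are arbitrary example values.

```python
import numpy as np
from scipy.signal import butter, lfilter

def limiter(x, threshold=0.8):
    """Hard-knee peak limiter: scale down samples whose magnitude exceeds the threshold."""
    peaks = np.abs(x)
    gain = np.where(peaks > threshold, threshold / np.maximum(peaks, 1e-12), 1.0)
    return x * gain

def crossover(x, fs=48_000, fc=2_000, order=4):
    """Split a signal into woofer (low-pass) and tweeter (high-pass) bands at fc."""
    b_lo, a_lo = butter(order, fc, btype="low", fs=fs)
    b_hi, a_hi = butter(order, fc, btype="high", fs=fs)
    return lfilter(b_lo, a_lo, x), lfilter(b_hi, a_hi, x)

fs = 48_000
t = np.arange(fs) / fs
signal = 0.6 * np.sin(2 * np.pi * 200 * t) + 0.6 * np.sin(2 * np.pi * 5_000 * t)
woofer_band, tweeter_band = crossover(limiter(signal), fs=fs)
```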
System Interface
Data
I2S: a 4-wire bus interface commonly used for audio data transfer. The same bus signals also carry related framing formats such as PCM (stereo channels) and TDM (up to 16 channels of audio).
Control:
I2C: a two-wire interface that is widely used for sending control and command signals.
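For a flavor of the control path, here is a hedged sketch of writing an amplifier register over I2C on a Linux host using the smbus2 library. The device address (0x4C) and the volume register (0x02) are made-up example values; the real register map comes from the amplifier datasheet.

```python
from smbus2 import SMBus

AMP_I2C_ADDR = 0x4C   # hypothetical amplifier 7-bit address (check the datasheet)
REG_VOLUME   = 0x02   # hypothetical volume control register

def set_amp_volume(level):
    """Write an 8-bit volume value to the amplifier over I2C bus 1 (/dev/i2c-1)."""
    with SMBus(1) as bus:
        bus.write_byte_data(AMP_I2C_ADDR, REG_VOLUME, level & 0xFF)

set_amp_volume(0x30)
```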
Audio System Clock
All audio subsystems MUST be referenced to the same clock source. Different clock sources have inherent clock drift, and drift between two clock sources introduces acoustic misalignment, resulting in unwanted distortions such as beating when two speakers are each driven from a separate clock.
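To see why this matters numerically, the sketch below estimates how quickly two nominally 48 kHz devices whose clocks differ by a typical crystal tolerance (100 ppm is an assumed example value) drift apart by a full sample.

```python
def seconds_per_sample_of_drift(sample_rate_hz, offset_ppm):
    """Time for two clocks offset by `offset_ppm` to drift apart by one sample."""
    samples_drift_per_second = sample_rate_hz * offset_ppm * 1e-6
    return 1.0 / samples_drift_per_second

# Two 48 kHz devices with a 100 ppm relative offset drift one full sample apart
# roughly every 0.21 s, i.e. about 6 ms of skew accumulates per minute.
print(seconds_per_sample_of_drift(48_000, 100))
```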
How is common clocking achieved?
The clock source inside an SoC is generated by a crystal oscillator that is highly precise and stable, which is desirable for data transfer and timing. This clock is used as the master clock for all subsystems within the SoC, one of which is the audio subsystem. We must ensure that audio hardware such as the CODEC, DSP, amplifiers, and digital microphones is referenced to this master clock.
The MCLK line of the I2S interface can be used as this external reference clock output, or a dedicated clock-output pin that pinmuxes the internal system clock out of the SoC can be used instead.
What if an external digital audio source needs to be mixed with the internal audio playback?
Ideally all audio equipment is synchronized to a common master clock; however, external digital audio is sampled by an external clock source that has a frequency offset relative to the internal master clock reference. Mixing two streams with slightly different effective sample rates causes misalignment that results in audible errors. Hence, any cross-clock-domain audio system (i.e. two or more separate clocks) needs an asynchronous sample rate converter (ASRC) to address this issue.
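Conceptually, an ASRC continuously re-times the incoming samples onto the local clock grid. The sketch below shows the idea with plain linear interpolation; production ASRCs use high-order polyphase filters and track the clock ratio adaptively, so this is only illustrative.

```python
import numpy as np

def asrc_linear(x, ratio):
    """Re-time input samples onto the local clock.
    ratio = external_rate / internal_rate, e.g. 1.0001 for a +100 ppm source."""
    n_out = int(len(x) / ratio)
    # Positions of the local-clock output samples, expressed in input-sample time.
    positions = np.arange(n_out) * ratio
    return np.interp(positions, np.arange(len(x)), x)

fs = 48_000
t = np.arange(fs) / fs
external = np.sin(2 * np.pi * 1_000 * t)       # stream sampled by the external clock
aligned = asrc_linear(external, ratio=1.0001)  # now on the internal clock grid
```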
AI Speaker Design Example
The following example is a generic voice-command speaker.
Block Description and Design
Speaker Amplifiers
We choose a Class-D amplifier with an integrated I2S audio interface to simplify the design. Since I2S is an audio interface that supports stereo channels, one I2S interface can drive two speaker amplifiers.
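Because one stereo I2S link carries both amplifier channels, the playback software simply interleaves left/right samples into a single buffer. The sketch below shows that framing; assigning channel 0 to "amplifier A" and channel 1 to "amplifier B" is an example convention, not a fixed rule.

```python
import numpy as np

def interleave_stereo(left, right):
    """Pack two mono channels into one interleaved L/R buffer for a stereo I2S link."""
    frames = np.empty(2 * len(left), dtype=np.int16)
    frames[0::2] = left    # channel 0 -> amplifier A
    frames[1::2] = right   # channel 1 -> amplifier B
    return frames

left  = np.full(4, 1000, dtype=np.int16)
right = np.full(4, -1000, dtype=np.int16)
print(interleave_stereo(left, right))  # [ 1000 -1000  1000 -1000 ...]
```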
DSP
The DSP chosen here synchronizes speaker audio playback and microphone audio capture for voice processing. Inside this DSP, postprocessing algorithms such as the compressor, EQ, etc. can be applied per product requirements.
For voice processing, acoustic echo cancellation is generally used along with a noise reduction algorithm. To select the right DSP for the audio processing, one needs to estimate the memory requirements (KB), the computational horsepower (DMIPS), and the DSP architecture (e.g. HiFi 3). For simple audio processing, a generic microcontroller with DSP instructions and a floating-point unit (e.g. ARM Cortex-M4F) can be used as well.
Note: the DSP block is for illustration purposes. Many application processors have audio ports that support PDM microphones and I2S, and audio postprocessing can run in software or on an internal DSP/coprocessor.
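As a rough sizing exercise for the DSP selection above, one can estimate the processing load from each algorithm's per-sample operation count and the sample rate. All numbers below are assumed placeholders for illustration only; the real figures come from profiling the actual algorithms.

```python
def estimate_mips(sample_rate_hz, ops_per_sample):
    """Very rough compute estimate in millions of operations per second."""
    return sample_rate_hz * ops_per_sample / 1e6

# Assumed per-sample costs (hypothetical); replace with profiled values.
blocks = {"AEC": 400, "noise reduction": 250, "EQ (8 biquads)": 40, "limiter": 10}
total = sum(estimate_mips(16_000, ops) for ops in blocks.values())
print(f"~{total:.0f} MIPS at a 16 kHz voice rate")  # compare against the DSP's DMIPS budget
```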
System Clock
The common reference clock is generated from the SoC system clock and fed to the DSP, and the DSP in turn feeds it to the amplifier and the microphones. This keeps the audio playback and capture synchronized.
Summary and Conclusion
Understand the audio pipeline of an audio playback system.
Learn the functionality of the different audio hardware blocks.
Learn that a common reference clock is a MUST to keep the audio system synchronized and reduce audio errors.
Go over the design steps for a voice-command speaker.
To build a high-quality audio system, we must use the right parts for the right audio processing. The clocking scheme is where the most problems are often seen early in audio design.