Audio compression

Sound Compression using DCT

Discrete Fourier Transform

The numbers X0,...,XN-1 described in the first formula are called the Discrete Fourier Transform (DFT) of the sequence x0,...,xN-1.

The Inverse Discrete Fourier Transform (IDFT) of the sequence X0,...,XN-1 is given by the numbers x0,...,xN-1 from the second formula.

In the literature, there is no standard definition of the DFT; instead, you can find other versions with different signs and normalization factors.

For example, here in Scipy for Python.

The Discrete Cosine Transform

Similarly, the Discrete Cosine Transform (DCT) of the sequence x0,...,xN-1 has different versions, as does the Inverse Discrete Cosine Transform (IDCT) of the sequence.

Sound Waves

The sound wave representing the audio is a continuous function defined on the finite interval [0,T]. Analog sound is digitized by taking a certain number of samples per second and storing these samples in digital form with a certain precision ("bit depth").

For example, in an audio CD we have

Sample rate: 44100 (44100 samples are taken per second, or 44.1 kHz)

Sound quality (or Bit Depth): 16 bits (each sample is stored in a 16-bit variable)

Bitrate = 44100*16*2 = 1,411,200 bit/s = 1.41 Mbit/s if the sound is stereo.

That is, 1.41/8 = 0.1764 MB/s o 10.58 MB/min.

Here, the Python code for the example download.

import matplotlib.pyplot as plt #paquete para graficar

import numpy as np #paquete de operaciones matemáticas

from scipy.io import wavfile #paquete para cargar archivos wav

from scipy.fftpack import dct, idct #paquete para dct y idct

frec, x = wavfile.read('audio1.wav') # leemos el archivo wav

x = x[:,1] # el audio es estéreo, para simplificar nos quedamos con un solo canal

# calculamos la DCT # cuidado, sino normalizamos idct(dct(x)) no da x

X = dct(x, norm="ortho")

# en F guardo las frecuencias ordenadas de manera ascendente

F = np.sort(np.abs(X))

# en indices[k] guardo la posición que ocupaba originalmente en X en elemento F[k]

indices = np.argsort(np.abs(X))

# quiero que F esté ordenado descendientemente (sort ordena ascendente)

F = F[::-1]

indices = indices[::-1]

# X es del tipo "list" lo paso al tipo "array de numpy"

Xantes = X

X = np.array(X)

F = np.array(F)

# nos quedamos con "cuanto" de las frecuencias (1=100%)

cuanto = 0.5

cuantasFrecuencias = int(len(X)*cuanto)

cuantaEnergia = np.linalg.norm( X[ indices[range(0,cuantasFrecuencias)] ] )/np.linalg.norm(X)

# al resto de las frecuencias las ponemos en 0

X[ indices[range(cuantasFrecuencias+1, len(X))] ] =0

# reconstruimos el sonido con las nuevas frecuencias

xx= idct(X, norm="ortho")

print("Frecuencias totales:", len(X))

print("Frecuencias que consideramos:",cuantasFrecuencias, "que conforman el", cuanto*100,"% de la energia")

# guardamos el nuevo sonido

wavfile.write("Naudio.wav", frec, xx)

#graficamos la señal/frecuencias viejas/nuevas

fig1 = plt.subplots()

plt.plot(Xantes, 'r,')

plt.plot(X,'g,')

fig2 = plt.subplots()

plt.plot(x, 'r,')

plt.plot(xx, 'g,')

plt.show()

A particular example:

We take Q = 0.8. Here you can see the first 2500 coefficients of the vector X, with those necessary to reach 80% of the energy shown in red. All the blue points are discarded

With Q = 0.8. In blue, we see the values of the original sound x. In red, we see the values reconstructed from the coefficients sufficient to reach 80% of the original energy.

Histogram of the original sound. The reds correspond to high coefficients, decreasing with orange, yellow, green, cyan, and blue, which represents null coefficients.

Histogram of the "compressed" sound. The reds correspond to high coefficients, decreasing with orange, yellow, green, cyan, and blue, which represents null coefficients.

Page updated

Google Sites

Report abuse