We shall build a block-DCT-based encoding scheme and study its compression performance on two images in Python. Performance is judged on metrics such as PSNR, compression ratio and bits per pixel.
The compression pipeline is divided into two stages: preprocessing followed by channel-wise compression.
First, the input image is read. If it is a grayscale (single-channel) image, no preprocessing is required. If it is an RGB image, it is first converted to YUV, and the U and V channels are downsampled by 2, so the three channels together hold 2/3 of the original samples (the chroma-subsampling idea behind formats such as 4:2:0). This step is lossy, as the discarded information is irretrievably lost, but since the U and V channels carry little fine detail there is not much visual loss. This step alone gives a compression ratio of 1.5.
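A minimal sketch of this preprocessing step, assuming BT.601 conversion coefficients and simple pairwise averaging for the chroma downsampling (neither choice is specified in the post):

```python
import numpy as np

def rgb_to_yuv_sub(img):
    """Convert an RGB image (H x W x 3) to Y, U, V planes, with each
    chroma plane downsampled by 2 (a sketch of the preprocessing step)."""
    # BT.601 conversion coefficients (an assumption; the post does not name the matrix)
    r, g, b = (img[..., k].astype(np.float64) for k in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b + 128.0
    v = 0.615 * r - 0.51499 * g - 0.10001 * b + 128.0
    # halve each chroma plane by averaging horizontal pairs of samples
    def half(ch):
        w = ch.shape[1] - ch.shape[1] % 2
        return 0.5 * (ch[:, 0:w:2] + ch[:, 1:w:2])
    return y, half(u), half(v)
```

For a mid-gray input, Y stays at 128 while the halved U and V planes sit near the 128 offset, as expected for a neutral colour.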
Preprocessing block
Channel wise compression
At this point we have three channels, which need to be encoded separately. In block-based coding, we divide the image into blocks of size 8x8, 16x16 or 32x32. Some schemes use a fixed block size, that is, the whole image is divided into equal-sized blocks. Here we shall experiment with variable block sizes.
First we divide the channel into 32x32 blocks. If a block's variance is very high, we subdivide it into sixteen 8x8 blocks; if the variance is medium, we subdivide it into four 16x16 blocks. This subdivision gives us large blocks in regions of low variance and smaller blocks in regions of high variance.
In Python, scipy.fftpack offers 1-D DCT and IDCT functions, so the 2-D transforms are built from the 1-D ones using the following separable method.
from scipy import fftpack

def dct2d(x):
    return fftpack.dct(fftpack.dct(x.T, norm='ortho').T, norm='ortho')

def idct2d(x):
    return fftpack.idct(fftpack.idct(x.T, norm='ortho').T, norm='ortho')
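A quick sanity check (a usage sketch) that this separable pair really inverts and behaves as an orthonormal transform:

```python
import numpy as np
from scipy import fftpack

def dct2d(x):
    return fftpack.dct(fftpack.dct(x.T, norm='ortho').T, norm='ortho')

def idct2d(x):
    return fftpack.idct(fftpack.idct(x.T, norm='ortho').T, norm='ortho')

# the pair should invert exactly (up to floating-point error), and with
# norm='ortho' the DC coefficient of an all-ones block equals the block size
blk = np.random.rand(8, 8)
assert np.allclose(idct2d(dct2d(blk)), blk)
assert np.isclose(dct2d(np.ones((8, 8)))[0, 0], 8.0)
```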
First we calculate the variance of all 32x32 blocks and divide them into three groups (low, mid and high) based on variance. The variances are sorted, and the t1 and t2 percentiles are used as the thresholds. Then the DCT is applied using block sizes dependent on variance.
def doDCT(ch, th1, th2, display):
    sh = ch.shape
    v = getVarTh(ch) # get variance of each 32x32 block
    thresholds = [np.percentile(v, th1), np.percentile(v, th2)]
    dctSize = [[-1 for j in range(sh[1]//32)] for i in range(sh[0]//32)]
    blkSize = {0: 8, 1: 16, 2: 32}
    coefflist = [[None for j in range(sh[1]//32)] for i in range(sh[0]//32)]
    for i in range(sh[0]//32): # outer loop over block rows
        for j in range(sh[1]//32): # inner loop over block columns
            blk = ch[32*i:32*(i+1), 32*j:32*(j+1)].astype('float32')
            if v[i][j] > thresholds[1]: # depending on variance, choose DCT block size: use 8x8
                dctSize[i][j] = 0
            elif v[i][j] > thresholds[0]: # use 16x16
                dctSize[i][j] = 1
            else: # use 32x32
                dctSize[i][j] = 2
            subblocksize = blkSize[dctSize[i][j]]
            numsubblocks = 32 // subblocksize
            coeffs = []
            for kk in range(numsubblocks): # DCT on one 32x32, four 16x16 or sixteen 8x8 sub-blocks
                for mm in range(numsubblocks):
                    coeffs += [dct2d(blk[kk*subblocksize:(kk+1)*subblocksize, mm*subblocksize:(mm+1)*subblocksize])]
            coefflist[i][j] = coeffs
    return coefflist, dctSize
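The helper getVarTh used above is not listed in the post; a minimal sketch consistent with how it is called (the per-32x32-block variance) might look like:

```python
import numpy as np

def getVarTh(ch):
    """Variance of each 32x32 block of a channel. (A sketch of the helper
    used by doDCT; the original implementation is not shown in the post.)"""
    h, w = ch.shape[0] // 32, ch.shape[1] // 32
    return [[float(np.var(ch[32*i:32*(i+1), 32*j:32*(j+1)]))
             for j in range(w)]
            for i in range(h)]
```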
Left: Moon image with variable block sizes. Variances are sorted for all 32x32 blocks. Then blocks with less than 50 percentile use 32x32 DCT. Blocks with variance between 50 to 80 percentile use 16x16, and the top 20 percentile use 8x8 DCT
Bottom: Similar thresholding (50,80 percentile) for the baboon image. All three channels are shown. U and V channels are downsampled.
We observe that regions of more details have smaller block sizes, while near constant regions have large block sizes.
We can choose the thresholds for the block sizes (that is, at which percentile of variance we shift to a different size). Some samples are provided below. If we use th1=th2=0, only 8x8 blocks are used. Similarly, for th1=th2=100, only 32x32 blocks are used. For further analysis we use th1=50 and th2=80, whose blocking results are shown above.
th1 = 0, th2 = 0
th1 = 0, th2 = 50
th1 = 100, th2 = 100
For quantization, we use JPEG's standard quantization matrix and scale it using JPEG's Q factor mechanism. Quantization is a lossy step.
The Q factor is an integer between 1 and 100. A scaling factor s is calculated from it as shown below. Each element of the base quantization matrix is then multiplied by s/1000 to get the new quantization matrix. Finally, each DCT coefficient is divided elementwise by the corresponding matrix entry and rounded.
def getS(Q):
    return 5000 // Q if Q < 50 else (200 - 2*Q if Q != 100 else 1)
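A few sample values of the scaling factor, to show the shape of the mapping (low Q means coarse quantization, Q=100 nearly none):

```python
def getS(Q):
    return 5000 // Q if Q < 50 else (200 - 2*Q if Q != 100 else 1)

# s falls quickly as quality rises
print(getS(10), getS(50), getS(90), getS(100))  # → 500 100 20 1
```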
JPEG's base quantization matrix is 8x8. There are two matrices, one for the Y channel and one for the U and V channels; the U and V channels are more coarsely quantized. Since we also use block sizes of 16x16 and 32x32, the original 8x8 matrix is upsampled to 16x16 or 32x32 using the following code:
def expandQMtx(q, factor): # expand the 8x8 quantization matrix to 16x16 or 32x32
    newQ = np.zeros((8*(2**factor), 8*(2**factor)))
    for i in range(8):
        for j in range(8):
            newQ[i*(2**factor):(i+1)*(2**factor), j*(2**factor):(j+1)*(2**factor)] = q[i][j]
    return newQ
The following code applies quantization.
def quantize(coefflist, baseMtx, Q):
    s = getS(Q)
    qMtx = baseMtx * (s/1000.)
    qDict = {1: expandQMtx(qMtx, 2), 4: expandQMtx(qMtx, 1), 16: qMtx} # expand the base matrix for the 32x32 and 16x16 cases
    roundedCoeff = [[None for j in range(len(coefflist[0]))] for i in range(len(coefflist))]
    for i in range(len(coefflist)): # for each of the 32x32 blocks
        for j in range(len(coefflist[0])):
            cff = coefflist[i][j]
            q = qDict[len(cff)] # one 32x32, four 16x16 or sixteen 8x8 sub-blocks
            roundedCoeff[i][j] = [np.round(np.divide(k, q)).astype(int) for k in cff] # quantization step
    return roundedCoeff
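The decoder reverses this step by multiplying back by the same matrices before applying the inverse DCT. A minimal sketch of that inverse (not shown in the post; qDict maps sub-block count to quantization matrix, as in quantize above):

```python
import numpy as np

def dequantize(roundedCoeff, qDict):
    """Inverse of the quantization step above: multiply each rounded
    coefficient sub-block by its quantization matrix. (A sketch.)"""
    out = [[None for j in range(len(roundedCoeff[0]))] for i in range(len(roundedCoeff))]
    for i in range(len(roundedCoeff)):
        for j in range(len(roundedCoeff[0])):
            cff = roundedCoeff[i][j]
            q = qDict[len(cff)]  # pick the matrix matching the sub-block count
            out[i][j] = [np.multiply(k, q) for k in cff]
    return out
```

The rounding in quantize is where the information is actually lost; dequantize only undoes the scaling.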
Run-length encoding (RLE) is a lossless compression technique that compresses long runs of zeros effectively. After quantization, we expect the high-frequency components to have become zero, so RLE can help here.
To perform run length encoding, we first need to perform zigzag scan on the 2-D DCT coefficients to make them 1-D. This scan order ensures we go from low to high frequencies gradually, thus enabling long runs of zeros.
Zigzag scan of 8x8 DCT coefficients. From Wikipedia.
The function to get the zigzag scan order is given below:
def getZigzagScanOrder(sz):
    scanOrder = [(0, 0)]
    r = 0; c = 0
    dir = 'b' # 'b' = border, 'd' = down, 'u' = up
    while len(scanOrder) < sz*sz:
        if dir == 'b': # at a border: step along it and reverse the diagonal direction
            if r == 0: # top border
                c += 1; dir = 'd'
            elif r == sz-1: # bottom border
                c += 1; dir = 'u'
            elif c == 0: # left border
                r += 1; dir = 'u'
            elif c == sz-1: # right border
                r += 1; dir = 'd'
            else:
                assert False
        elif dir == 'u': # direction is 'up'
            if r == 0 or c == sz-1: # if direction is 'up' and we are at a border, change mode to 'border'
                dir = 'b'; continue
            else: # if direction is 'up' and we are not at a border, go 'up'
                r -= 1; c += 1
        else: # direction is 'down'
            if c == 0 or r == sz-1: # if direction is 'down' and we are at a border, change mode to 'border'
                dir = 'b'; continue
            else: # if direction is 'down' and we are not at a border, go 'down'
                r += 1; c -= 1
        scanOrder.append((r, c))
    return scanOrder
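An equivalent, more compact way to generate the same scan order (a cross-check sketch, not from the post) is to sort the indices by anti-diagonal:

```python
def zigzag_order(sz):
    # sort all (row, col) indices by anti-diagonal r+c; within a diagonal,
    # even diagonals run bottom-left to top-right, odd ones the reverse
    idx = [(r, c) for r in range(sz) for c in range(sz)]
    return sorted(idx, key=lambda rc: (rc[0] + rc[1],
                                       rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

# flatten an 8x8 block of quantized coefficients in zigzag order:
# flat = [blk[r][c] for r, c in zigzag_order(8)]
```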
Run-length encoding in JPEG is done using the convention ((runlength, size), amplitude). Runlength is the number of zeros preceding a non-zero value, size is the number of bits needed to code the amplitude, and amplitude is the actual value. Runlength and size together form one symbol, and amplitude forms another. After performing RLE, we could apply Huffman coding to those symbols and bring the size down even more. However, since Python does not have a readily available built-in Huffman coding function (unlike MATLAB), we skip entropy coding here.
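The encoding side of this convention can be sketched as follows (end-of-block and runlength-overflow handling are omitted for brevity):

```python
def rle_encode(seq):
    """Run-length encode a 1-D coefficient sequence using the
    ((runlength, size), amplitude) convention described above. (A sketch;
    trailing zeros are dropped rather than marked with an end-of-block symbol.)"""
    out, run = [], 0
    for x in seq:
        if x == 0:
            run += 1
        else:
            size = int(abs(x)).bit_length()  # bits needed to code the amplitude
            out.append(((run, size), x))
            run = 0
    return out
```

For example, the zigzag-scanned sequence 0, 0, 5, 0, -1, 0, 0, 0 encodes as ((2, 3), 5), ((1, 1), -1).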
We plot the number of bits per pixel, the compression ratio and the PSNR against Q and obtain the following graphs. We observe the expected trends: bits per pixel and PSNR increase with quality, while the compression ratio decreases. For the same quality factor, the baboon image needs more bits per pixel than the moon image as it is a colour image, and correspondingly the moon image achieves higher PSNR.
The moon image is grayscale, hence needs 8 bits per pixel uncompressed. The baboon image is RGB, hence needs 8x3 = 24 bits per pixel; after downsampling the U and V channels it needs only 2/3 of that, i.e. 16 bits per pixel. We observe that as Q approaches 100, the bits per pixel for moon approach 8, while those for baboon approach 16, and the compression ratio drops towards 1.
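The metrics themselves are straightforward; a sketch of the formulas used (the post does not show its exact measurement code):

```python
import numpy as np

def psnr(orig, recon, peak=255.0):
    """PSNR in dB between the original and reconstructed images."""
    mse = np.mean((orig.astype(np.float64) - recon.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

def bits_per_pixel(total_bits, h, w):
    """Bits per pixel of the encoded stream for an h x w image."""
    return total_bits / (h * w)
```

The compression ratio is then simply the uncompressed bits per pixel (8 or 24) divided by the encoded bits per pixel.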
Number of bits per pixel vs Q
PSNR vs Q
Compression ratio vs Q
In the run-length coding, we must specify the number of bits used to encode the runlength. If this number is too large or too small, bits are wasted. We can plot bits per pixel against the number of runlength bits to find the optimal allocation.
For two Q-factor values (5 and 10), we observe that 2 bits for the runlength gives the best performance.
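The trade-off behind this choice can be sketched: with b bits, runs longer than 2^b - 1 must be split with extra escape symbols, so short-run streams favour small fields and long-run streams favour large ones (a rough cost model; the post's exact accounting is not shown):

```python
def stream_bits(symbols, run_bits, size_bits=4):
    """Estimate total bits for a list of ((runlength, size), amplitude)
    symbols when run_bits bits are allotted to the runlength field.
    Over-long runs cost extra escape symbols. (A rough sketch.)"""
    total = 0
    for (run, size), _amp in symbols:
        escapes = run // (2**run_bits)  # extra symbols for over-long runs
        total += (escapes + 1) * (run_bits + size_bits) + size
    return total

# short runs favour a narrow runlength field, long runs a wide one
short_runs = [((1, 3), 5)] * 10
long_runs = [((40, 3), 5)] * 10
```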