1. Forward Transform
For the forward transform stage, many transforms can be used. Some of them are listed below.
• Walsh Transform
• Hadamard Transform
• Walsh-Hadamard Transform (WHT)
• Haar Transform
• Slant Transform
• KL Transform
The Karhunen-Loeve (KL) transform gives the optimal transformation between the domains. However, it is not used in practice because it has no fast implementation, and that is an issue in large systems with high bit rates and high bandwidths. On the other hand, the Discrete Cosine Transform (DCT) is usually used for this application, as it can be implemented to execute quickly and it provides additional benefits such as separating the DC component from the AC components and concentrating the higher-frequency components towards the end of the coefficient vector.
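As a minimal sketch of this stage (not the report's exact code), the 2-D DCT of an 8×8 block can be computed with SciPy; the helper names dct2/idct2 and the JPEG-style level shift are illustrative assumptions.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # 2-D type-II DCT with orthonormal scaling, applied along rows then columns
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    # Inverse 2-D DCT, reversing the forward transform
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

# Example: transform a single 8x8 block of pixel values
block = np.random.randint(0, 256, (8, 8)).astype(float)
coeffs = dct2(block - 128)            # level-shift (JPEG-style), then transform
print(coeffs[0, 0])                   # DC coefficient (top-left)
print(np.round(coeffs[4:, 4:], 2))    # high-frequency corner, typically small
back = idct2(coeffs) + 128            # round trip recovers the original block
```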
Another advantage of transforming the time-domain signal (the image) to the frequency domain is that the extracted features and the way the coefficient values are distributed lend themselves to difference coding or run-length coding.
As shown above, the amplitudes of the highest-frequency components (the rarely occurring ones) are attenuated, as they do not carry a large share of the information needed to recreate the image at a satisfactory level.
2. Quantization
Quantization is a lossy compression step: it maps the existing values into a number of windows (intervals) of values, and when these values are rounded, the amount of information to be coded shrinks further. Quantization can be done in two ways, namely level quantization and quantization using a quantization matrix. Both methods were implemented in the project.
Quantization by levels
Here, the number of levels can be chosen to match the required high-, medium-, and low-quality outputs; in this project they were Q = 32, Q = 16, and Q = 8 respectively. A quantizer along these lines is sketched below.
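The following is a minimal sketch, assuming Q is interpreted as the number of uniform levels spanning the coefficient range (so Q = 32 is the finest); this interpretation and the helper names are assumptions, not taken from the report.

```python
import numpy as np

def quantize_levels(coeffs, q_levels):
    """Uniformly quantize a coefficient block into q_levels bins
    (assumption: Q is the number of levels across the value range)."""
    lo, hi = float(coeffs.min()), float(coeffs.max())
    step = (hi - lo) / q_levels or 1.0        # guard against a constant block
    indices = np.round((coeffs - lo) / step)  # integer bin indices to code
    return indices.astype(int), lo, step

def dequantize_levels(indices, lo, step):
    # Reconstruct approximate coefficient values from the indices
    return indices * step + lo

# Usage: a smaller Q (fewer levels) discards more detail
coeffs = np.random.randn(8, 8) * 50           # stand-in for a DCT block
idx, lo, step = quantize_levels(coeffs, q_levels=8)
recovered = dequantize_levels(idx, lo, step)
print(np.max(np.abs(recovered - coeffs)))     # worst-case error is about step/2
```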
Quantization by matrix
In this approach, the matrix of DCT coefficients is quantized element-wise using another matrix of choice. There are several standard options corresponding to different quantization levels, and the Q10, Q50, and Q90 quantization matrices were used in the implementation of the system. The first few elements (top-left) are usually kept small to preserve the low-frequency data, while the elements furthest away are made large to attenuate the amplitudes of the higher-frequency components, which do not carry information that contributes significantly to the image reconstruction. The following image shows the negligible difference made by the higher-frequency components of an image; therefore, zeroing them out does not affect the user experience greatly.
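The sketch below assumes the standard JPEG luminance table as the Q50 matrix and the common JPEG-style scaling rule to derive Q10 and Q90; the actual matrices used in the project may differ.

```python
import numpy as np

# Standard JPEG luminance table, commonly used as the "Q50" matrix
Q50 = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

def scale_quant_matrix(quality):
    """Derive a quantization matrix (e.g. Q10, Q90) from the Q50 table
    using the common JPEG-style scaling rule."""
    s = 5000 / quality if quality < 50 else 200 - 2 * quality
    return np.clip(np.floor((Q50 * s + 50) / 100), 1, 255)

def quantize(dct_block, qmatrix):
    # Element-wise division and rounding; many high-frequency entries become 0
    return np.round(dct_block / qmatrix)

def dequantize(q_block, qmatrix):
    return q_block * qmatrix

Q10, Q90 = scale_quant_matrix(10), scale_quant_matrix(90)
```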
3. Entropy Coding
Entropy coding is a lossless coding method and is used regardless of the media's specific characteristics. The data is input as a sequence, and depending on the number of occurrences (or the probability) of each symbol, new codewords are assigned to the existing symbols, which are then used for encoding and transmission. Decoding is completely reversible since the schemes are lossless; run-length, Huffman, and arithmetic coding are examples of entropy coding. Huffman coding was used as the encoding method in this project.
Huffman Encoding
Huffman coding is a variable-length coding scheme that uses the statistical properties of each symbol to assign codewords to them. It is most efficient when the symbol probabilities vary widely: fewer bits are used for the more frequent symbols, and the opposite approach is taken for the less frequent symbols. The algorithm is as follows.
1. Initialization: put all symbols on a list sorted according to their frequencies.
2. Repeat until the list has only one symbol left:
a. From the list, pick the two symbols with the lowest frequency counts.
b. Form a Huffman subtree that has these two symbols as child nodes and create a parent node.
c. Assign the sum of the children's frequency counts to the parent and insert it into the list such that the order is maintained.
d. Delete the children from the list.
3. Assign a codeword to each leaf based on the path from the root.
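A minimal sketch of the procedure above, using a priority queue in place of the sorted list; the function name and the string-based codeword bookkeeping are illustrative choices, not the report's implementation.

```python
import heapq, itertools
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table following the steps above."""
    freq = Counter(symbols)
    counter = itertools.count()              # unique tie-breaker for the heap
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, next(counter), {s: ''}) for s, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol input
        return {s: '0' for s in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # pick the two lowest counts
        f2, _, right = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in left.items()}   # prepend branch bits
        merged.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

codes = huffman_codes("AAAABBBCCD")
encoded = ''.join(codes[s] for s in "AAAABBBCCD")
print(codes, len(encoded), 'bits')           # frequent symbols get shorter codes
```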
Huffman coding is a greedy algorithm and yields an optimal prefix code for the given symbol frequencies, but this does not always reach the theoretical optimum. From Shannon's source coding theorem, we can obtain the optimal rate for transmission (the entropy), and it is usually slightly lower than what we achieve with Huffman coding.
Run Length Coding
Run-length coding is another coding scheme that was used in the implementation stage; it is useful when consecutive symbols in the sequence are identical. The basic idea can be sketched as follows.
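A minimal sketch, assuming runs are encoded as (symbol, run-length) pairs; the exact run-length format used in the project (for example, zero-run coding of quantized DCT coefficients) may differ.

```python
def rle_encode(sequence):
    """Collapse runs of identical symbols into (symbol, run-length) pairs."""
    runs = []
    for s in sequence:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1                 # extend the current run
        else:
            runs.append([s, 1])              # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    # Expand each pair back into the original sequence (lossless)
    return [s for s, n in runs for _ in range(n)]

# 0,0,0,0,5,5,0,0,0,9  ->  (0,4), (5,2), (0,3), (9,1)
print(rle_encode([0, 0, 0, 0, 5, 5, 0, 0, 0, 9]))
```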
All of the concepts listed above were used in the implementation of the video coding system, and some other techniques not listed above were used as well. Those methodologies are as follows.
4. Motion Estimation and Motion Vector Generation
A video is a set of images put together, so the same compression techniques can be used for video compression as well. However, there are other methods that can further reduce the bandwidth requirement and speed up transmission. The spatial redundancies within frames are addressed by the coding methods mentioned above; the temporal redundancies between frames should also be addressed to further reduce the memory and bandwidth requirements. Motion estimation and motion vector generation are used for this purpose. We consider a set of consecutive frames and look for similarities between them so that only that information needs to be encoded, improving compression. A frame may be coded on its own, predicted from a previous frame, or predicted from both previous and future frames; these are called I, P, and B frames respectively.
To extract this information, we use motion estimation techniques to determine which blocks have been displaced in subsequent frames and by how much. Some motion estimation techniques are listed below.
• Sequential Search
• Logarithmic Search
• Hierarchical Search
The frames are first divided into blocks of a chosen size, and these methods are used to figure out where each block has ended up in the next frame. The displacement vector corresponding to the movement is called the Motion Vector, and it is also recorded to be transmitted. This block-matching process can be done across a limited number of neighbouring frames, and that is a choice for the developer to make.
Therefore, we no longer have to transmit all the frames one by one; instead we can send the reference frame, the next frame's motion information, and the remaining difference between the motion-compensated reference frame and the next frame that is needed to properly recreate the frame on the receiver's side. This difference is called the Residue, and these operations have resulted in:
a. The original frame from which the motions were obtained (the Reference Frame)
b. Motion Vector
c. Residue
We only have to transmit these three instead of the whole frame. The Residue is also known as 'Control Data', and the Reference Frame and Motion Vector are also known as 'Motion Data'. The best-matching block is chosen by evaluating a loss function; Mean Squared Error (MSE), Mean Absolute Difference (MAD), and Sum of Absolute Differences (SAD) are commonly used. SAD was used as the loss function in the implementation, as sketched below.
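A minimal sketch of exhaustive (sequential) block matching with SAD as the loss, returning the motion vector and residue for one block; the block size, search range, function names, and the frame names in the usage comment are assumptions.

```python
import numpy as np

def sad(a, b):
    # Sum of Absolute Differences: the loss used to pick the best match
    return np.abs(a.astype(int) - b.astype(int)).sum()

def match_block(ref, target_block, top, left, search=8):
    """Exhaustively search the reference frame around (top, left) for the
    best match to the current frame's block; return motion vector and residue."""
    n = target_block.shape[0]
    best_mv, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue                      # candidate falls outside the frame
            cost = sad(ref[y:y + n, x:x + n], target_block)
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    dy, dx = best_mv
    predicted = ref[top + dy:top + dy + n, left + dx:left + dx + n]
    residue = target_block.astype(int) - predicted.astype(int)
    return best_mv, residue

# Usage on one 16x16 block (frame names are placeholders for 2-D numpy arrays):
# mv, res = match_block(reference_frame, current_frame[0:16, 0:16], 0, 0)
```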
The residue is important for an acceptable recreation of the image; the image created with just block matching and the reference frame is called the 'Compensated Image'. As can be seen below, it results in a blocky image, and the residue must be added to create the required form of the next frame.
The rate-distortion curve shows that the more you compress something, the more distortion there will be. Therefore, it is important to find an optimal point where the best quality is achieved while the rate is not unnecessarily large. For the distortion calculation, several quality metrics can be used; some of them are listed below.
• Mean Square Error (MSE)
• Peak Signal-to-Noise Ratio (PSNR)
• Structural Similarity Index (SSIM)
• VQM (NTIA General Model)
The first three listed above were used in the implementation to obtain rate-distortion curves of the system, and the optimal point was chosen for the transmission system. These values were calculated for the different quantization levels discussed earlier; 10 different quantization levels were used to obtain the results.
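A minimal sketch of the MSE and PSNR calculations behind such rate-distortion points (SSIM would typically come from a library such as scikit-image); the names in the usage comment are illustrative placeholders, not variables from the report.

```python
import numpy as np

def mse(original, reconstructed):
    # Mean Square Error between the original and reconstructed frames
    diff = original.astype(float) - reconstructed.astype(float)
    return np.mean(diff ** 2)

def psnr(original, reconstructed, peak=255.0):
    # Peak Signal-to-Noise Ratio in dB for 8-bit imagery
    m = mse(original, reconstructed)
    return float('inf') if m == 0 else 10 * np.log10(peak ** 2 / m)

# One rate-distortion point per quantization setting, e.g.:
# rd_points = [(bits_used[q], psnr(frame, decoded[q])) for q in q_settings]
# (bits_used, decoded, q_settings are placeholders for the encoder's outputs)
```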