This week I will develop and compare 2 lossless codecs for binary images. These codecs convert a binary image to a symbol stream of integers (but not to actual bitstreams) and back.
Activity 1: Run length codec for binary images
We have encountered run length coding for binary image. It performs better for binary images because in such images, usually there are long runs of whites or blacks that is succinctly represented with a single number. In this section I will describe a simple codec (both the encoder and the decoder) that encoded a binary image into a sequence of positive numbers.
Description of Simple Run length Codec
We consider that the image is composed of 0 (black) and 1 (white). We assume that each row starts with a 1 and then we encode the whole row with a string of numbers. Each number represents the run length of the current pixel. The next number in the sequence would represent the run length of the flipped version of the current bit.
Lets see an example. I will mark the encoded symbol stream with underline. On encoding 1 1 0 0 0 1 0 0 0 0 1 1, we get 2 3 1 4 2, because we have 2 ones, followed by 3 zeroes etc. Similarly 0 0 1 1 0 1 1, is encoded as 0 2 2 1 2, because we assume we start with one, but since there were no ones in the beginning, that run length is 0. That is followed by 2 zeroes, 2 ones, 1 zero and so on.
Now we introduce the symbol 0 as representing a new line. That is when encoding, if we reach the end of a row, a 0 is added to the symbol stream. Similarly on decoding side, if the decoder encounters a 0, then it starts a new row in the image. Note since we are using 0 as a symbol for new line, it causes ambiguity as 0 could also mean a run length of length 0 (which is possible when the row begins with a 0, as shown in the second example above). Therefore if a runlength is of length n, I encode it using the symbol n+1, so that the symbol 0 uniquely represents a newline. Thus the symbol 1 would represent a run length of length 0. In the decoder side, when a non zero symbol is encountered, the decoder understands that it should subtract 1 from it to get the actual runlength. Note that the possible symbols thus formed can be positive numbers only (including 0) and the scheme is lossless.
Lets consider the following image:
1 1 0 0 1 1 1 0 1
0 0 0 0 1 1 0 0 0
This is encoded as 3 3 4 2 2 0 1 5 3 4 0. Note that a runlength of length n is coded by the symbol n+1.
Code
Lets see the code for the encoder.
%0 used for newline and a run of n positions is encoded as n+1
function [symbolstream, symbols, bitsneeded] = encode_runlength(img)
sz = size(img);
symbolstream = [];
bitsneeded = 0;
symbols = [0 0]; %unique symbols present and their count
for i = 1:sz(1)
currrun = 0; currpixel = 1; %current run length and current value of pixel
for j = 1:sz(2)
if (img(i,j) ~= currpixel) %current bit has flipped
currpixel = 1 - currpixel; %flip the current bit
%add the symbol to the encoder stream
symbolstream = [symbolstream currrun + 1];
%number of bits needed to represent this symbol in simple binary
bitsneeded = bitsneeded + ceil(log2(currrun + 1));
symbols = addsymbol(symbols, currrun);
currrun = 1;
else
currrun = currrun + 1; %increment the run if the bit hasn't flipped
end
end
symbols = addsymbol(symbols, currrun+1);
symbolstream = [symbolstream currrun + 1 0]; %add newline symbol
bitsneeded = bitsneeded + 1;
symbols(1,2) = symbols(1,2) + 1; %the 1st symbol is 0 (new line)
end
end
Now lets see the function for the decoder.
function decodedimg = decode_runlength(symbolstream)
decodedimg = [];
currrow = [];
currpixel = 1;
for i = 1:length(symbolstream)
if (symbolstream(i) == 0) %newline encountered
decodedimg = [decodedimg; currrow]; %append row to image
currrow = [];
currpixel = 1; %since a row is beginning we assume the first pixel is 1
else
%add n-1 number of 'currpixel', where n is the symbol
currrow = [currrow repmat(currpixel, 1, symbolstream(i)-1)];
currpixel = 1 - currpixel;
end
end
end
Now let us see some results. I read in an image that is possible to binarize and find the 50 percentile gray level value and binarize the image on that threshold. Below are the original images, binarised image and decoded image (that is I first encode the image and then decode it using the codec described above).
Activity 2: Run length codec using prediction for binary images
Now I describe a second approach where I predict the current row based on the previous row and then encode the difference.
Description of Predictive Runlength Codec
At the encoder, the first row of the image is coded using the previous coding scheme. Now consider all rows n > 1. Then we consider the difference between the current row and the previous row. If at position j the difference is zero then it means that this pixel at the jth position is same as the jth position of the previous row. If there is a non-zero difference (1 or -1)
then we add the start of the flip and the length of the flip to the symbol stream. Lets work out an example to make this clear.
Say we wish to encode the following:
1 1 1 0 0 0 0 0 1 1
1 1 0 0 0 1 1 1 1 1
The first line is encoded as 4 6 3 0 (according to the last scheme). Now to encode the 2nd row, lets see the difference of the 2nd and the 1st row. The difference is: 0 0 -1 0 0 1 1 1 0 0. Thus we see a non zero values at 3, 6, 7 and 8 (that is the 2nd row and the 1st row differ at these points). We encode this as 3 1 6 3. What it means is take the 1st row. Flip the 3rd bit and continue flipping for length one. Then flip the 6th bit and continue flipping for 3 places (ie 6th, 7th and 8th bits are flipped).
Now note that the starting point of the flip can be anything from 0 to the width of the picture, that is it can take a wide range of numbers. To get over that we can code the flip start incrementally. Lets take an example. Say a particular row is encoded as 3 1 7 2. We could use the following stream: 3 1 4 2. It means that at position 3 there is a flip of length 1. Then at position (3+4 = 7), there is a flip of length 2. So instead of adding the symbol 7 in the bitstream, we add the symbol 4. Since the last flip position was 3, we add 4 to 3 and get 7. Note that since the symbol 0 is used to denote newline, and we could have flips at position 0, so we encode the positions by adding 1 to them. So instead of 3 1 4 2, we have 4 1 5 2.
Thus this stream:
1 1 1 0 0 0 0 0 1 1
1 1 0 0 0 0 1 1 1 1
1 2 3 4 5 6 7 8 9 10
gets encoded as: 4 6 3 0 4 1 5 2 0. The first row (4 6 3 0) is same as earlier encoder. The next symbols (4 1 5 2 0) tell us that at position 4-1 = 3, for 1 runlength the bits differ from 1st row. Then at position 3+5-1 for a run length of 2, the bits in the second row differ from bits in the first row. The differing bits and their positions are shown in red.
Code
The encoder is shown below
function [symbolstream, symbols, bitsneeded] = encode_runlength_mod(img)
sz = size(img);
symbolstream = [];
bitsneeded = 0;
symbols = [0 0]; %unique symbols present and their count
%process first row separately using normal runlength encoder
[symbolstream symbols bitsneeded] = encode_runlength(img(1,:));
prevrow = img(1,:);
startflip = 0;
prevflip = 0;
for i = 2:sz(1) %start the modified encoding from the second row, as first row is encoded using normal runlength encoding
diffrow = img(i,:) - prevrow; %difference of current row and previous row
currpixel = img(1,1);
startflip = 0;
prevflip = 0;
for j = 1:sz(2)
if (diffrow(j) ~= 0)
if (startflip == 0)
startflip = j;
end
else
if (startflip ~= 0)
symbolstream = [symbolstream startflip-prevflip+1 j-startflip]; %insert the pair (start of flip, length of flip)
bitsneeded = bitsneeded + ceil(log2(startflip-prevflip+1)) + ceil(log2(j-startflip));
symbols = addsymbol(symbols, startflip-prevflip+1); symbols = addsymbol(symbols, j-startflip);
prevflip = startflip;
startflip = 0;
end
end
end
prevrow = img(i, :);
symbolstream = [symbolstream 0];
bitsneeded = bitsneeded + 1;
symbols(1,2) = symbols(1,2) + 1; %the 1st symbol stored is 0 (new line)
end
end
The decoder is as follows:
function decodedimg = decode_runlength_mod(symbolstream)
decodedimg = [];
currrow = [];
currpixel = 1;
temp = find(symbolstream == 0);
decodedimg = decode_runlength(symbolstream(1:temp(1))); %1st row is decoded using normal runlength decoder
remaining = symbolstream(temp+1:end);
prevrow = decodedimg(1,:);
beginnew = 1;
currrow = decodedimg(1,:);
i = 1;
prevflip = 0;
while(1)
if remaining(i) == 0 %new row
decodedimg = [decodedimg; currrow];
prevrow = currrow;
beginnew = 1;
i = i+1;
startflip = 0;
prevflip = 0;
else
if beginnew==1
beginnew = 0;
currrow = prevrow;
end
startflip = remaining(i) + prevflip - 1; %start of flip is old flip point + current symbol -1
flipdist = remaining(i+1); %runlength of flip
for k = startflip:startflip + flipdist - 1 %flip the required bits in this loop
currrow(k) = 1 - currrow(k);
end
prevflip = startflip;
i = i+2; %read 2 symbols (flip start, length of flip)
end
if (i > length(remaining)) break; end %break if all symbols have been used up
end
end
We get exactly same results (that is perfectly lossless encoding and decoding), as expected. Since the results are same as shown in the table of images above, I am not making a new table here.
Activity 3: Comparision of the two codecs
Now that we have seen the 2 codecs (Simple (codec1) and Predictive (codec2)), lets compare them. Consider an image of width W, then both codecs generate a symbol stream containing values from the set {0, 1, 2, ..., W+1}. Both codecs use the horizontal redundancies present in the image
Before discussing the advantages of the Predictive codec, lets keep in mind the following:
1. Its better if we have a smaller number of symbols, as we can use small number of bits for each symbol then.
2. Its better if some symbols have very high probability,, as then short bit stream code words can be assigned to them and a lot of compression is achieved.
Now lets look atthe predictive codecs advantages:
1. It uses the vertical redundancies also. Since each row looks very much like the row above it, the error between them is very less, hence we can save a lot if we only encode the difference.
2. Also since the difference is probably not much between consecutive rows we have short runs of flips. Thus smaller symbols become more probable, making it possible to encode them with small number of bits. Thus this scheme concentrates the probability towards the smaller symbols.
To compare the two codecs, I collected the following statistics:
1. the number of unique symbols used. (the less the better)
2. Max value of symbol. (the less the better)
3. Histogram (relatively fewer symbols should have more occurrences)
4. No of bits if constant length binary converted bitstream is generated. Here each symbol uses ceil(log2(max)) number of bits, where max is the maximum value among all the symbols used. This is because to encode max, we need atleast that many bits and since it is a fixed length scheme, all symbols are encoded using that many bits. Clearly its not very efficient. (the less the better)
5. No of bits if each symbol is encoded in binary using minimal number of bits. So to encode the symbol 5 we need 3 bits and for 9 we need 4 bits. So to encode the symbol stream 5 9 5, we need 10 bits. Note this kind of binary conversion using variable code length is ambiguous and is not a practical thing. I just use this as a metric to compare the codecs. (the less the better)
The table below compares the numbers for the 3 images shown above. In each cell the left hand number is for codec 1 and the right hand number is for codec 2. The number in green is indicates better performance
Image
1
2
3
Image size (# pixels)
50336
50421
50625
Max value of symbol
# unique symbols
Bits used for constant length binary symbols
Bits used for variable length binary symbols
Comments
Note that the constant length encoding actually gives more bits than the original image, but the variable bit length can give much less. Codec 2 outperforms codec 1 in all departments
Since the max value of symbol for codec 2 is much less, it uses smaller number of bits for all the symbols and thus has better performance for constant length.
Since the max value of a symbol is same, constant length bitstream is almost same for both. But codec 2 is better when variable length is considered
260
160
78
66
123300
119552
24497
25974
344
197
15
71
84015
49984
23107
13481
226
226
91
86
30184
31032
12178
9577
If we compare the two codecs, we see that the predictive codec mostly outperforms the simple codec, mostly because it gives lesser number of unique symbols and the max value of its symbols are less too. Lets now look at the histograms of the symbols generated by both codecs for the three test images.
Histogram of symbols for image 1 using codec 1
Histogram of symbols for image 1 using codec 2
Histogram of symbols for image 2 using codec 1
Histogram of symbols for image 2 using codec 2
Histogram of symbols for image 3 using codec 1
Histogram of symbols for image 3 using codec 2
In the first image we see that the range of symbols is same for both, so nothing much can be concluded here. But in image 2, codec uses smaller number of symbols and in image 3, codec 2 a few symbols have a very large concentration (around 1500) but in codec 1, the maximum used symbol is only around 450.
Hopefully next time I will explore converting the symbol stream into actual bitstreams, perform more rigorous analysis and observe which codec does better.