Bit Stream Input/Output

Bit Stream Input/Output


Data compression techniques involve, at the fundamental level, the purpose of compressing a fixed-bit symbol into a smaller set of bits. As such, the programmer will ultimately need to output bits into a file. Likewise, a function must be able to read a single bit or multiple bits from a compressed file. This section describes an include file (GTBITIO.C) which contains bit-oriented input/output (I/O) functions that allow reading and writing bits from and to a file.

The file GTBITIO.C can be a little complicated for the absolute beginner. First it uses the common technique of creating input and output buffers. It is well-known to programmers that file-reading and writing bytes one at a time is a very time-consuming process and it just slows the program down. To remedy this situation, "a buffer in fast RAM" is usually created to act as a temporary data storage array, and functions are employed to read and write larger "chunks" of bytes. These functions are the C standard input/output functions fread() and fwrite(), respectively.

The fread() function gets a chunk of bytes from the file device and stores that chunk into the input buffer created by the program. Conversely, the function fwrite() writes to the actual file the chunks of bytes stored in the output buffer. Since larger bytes are read or written at a time, less actual file read/write is done (i.e. less function calls) which greatly speeds up the program. The file GTBITIO.C contains the following global FILE pointer variables:

The buffers that we use for input and ouput must also be declared as well as the "counter" variables that track the number of bytes/bits read or written so far. We dynamically allocate memory for the output and input buffers. The buffer pointers (i.e. both pointers to unsigned char) are named pbuf and gbuf respectively.

unsigned char *pbuf, *gbuf, p_cnt = 0, g_cnt = 0;

unsigned int pbuf_count, nfread;

The buffers must be of sufficiently larger size and must be a multiple of 1024. 

Because we are just using a pointer for the output buffer and we initialize or reset the output buffer to all 'ZERO' bytes to start outputting bits, I chose to create separate functions for writing a 'ONE' and 'ZERO' bit. The function for writing a ZERO bit (e.g., advance_buf()) just increments the counter p_cnt. This counter is the one that tells if we have already written 8 bits so that the next byte should be next one to write to. Incrementing the counter by 1 is fast since we know that the output buffer is composed of all ZERO bits initially


/* ---- writes a ZERO (0) bit. ---- */

#define put_ZERO() advance_buf()


/* just increment the pbuf buffer for faster processing. */

#define advance_buf()       \

{                          \

    if ( (++p_cnt) == 8 ) { \

        p_cnt = 0; \

        if ( (++pbuf_count) == pBUFSIZE ){ \

            pbuf = pbuf_start; \

            fwrite( pbuf, pBUFSIZE, 1, pOUT ); \

            memset( pbuf, 0, pBUFSIZE ); \

            pbuf_count = 0; \

            nbytes_out += pBUFSIZE; \

        } \

        else pbuf++; \

    } \

}



It is only when we write a 'ONE' bit  (through the function pset_bit()) that we actually manipulate the current byte of the output buffer. Therefore, each time we write a ONE bit, we must also advance the bit counter to account for the bit written:


#define pset_bit() *pbuf |= (1<<p_cnt)


/* ---- writes a ONE (1) bit. ---- */

#define put_ONE() { pset_bit(); advance_buf(); }



More importantly, the function put_nbits() outputs more bits at a time.


When 8 bits are already written to the first byte in the output buffer, the output buffer pointer is also just incremented by 1 (i.e., pbuf++) so that the next bit-writes will simply write to the next byte in the buffer. When the number of bytes written equals the size of the output buffer (pBUFSIZE) we reset everything.  This is done in the advance_buf() function. Naturally, it is easy to recognize that it is not always possible to actually output bytes that are an exact multiple of the output buffer size. It is highly possible that there are still remaining bytes in the buffer not actually written yet by advance_buf(). These bytes are flushed out (using fwrite()) to the output file in the flush_put_buffer() function. Moreover, we must also consider that all bits written are not exactly a multiple of 8 bits, so some remaining bits still must be included in flushing the actual bytes. This is done by adding 1 to pbuf_cnt if and only if p_cnt is greater than 0, meaning at least a bit was written:


void flush_put_buffer( void )

{

    if ( pbuf_count || p_cnt ) {

        fwrite( pbuf_start, pbuf_count+(p_cnt?1:0), 1, pOUT );

:

:

    }

}



Similarly, the input buffer also makes it faster to fetch bits from the input stream. First, a size of gBUFSIZE is read from the actual file device and copied to the input buffer gbuf using the fread() function. It is this buffer that we use for reading the actual bits. When we have finished reading 8 bits (indicated by g_cnt), the buffer pointer is just incremented by one, and this new byte position is now the one used for reading bits. Further, the get_nbits()  function inputs more bits at a time and is definitely faster.


After reading all bits in the buffer, we simply fread() another chunk of bytes from the actual file. This is done as long as there are bytes in the input file. Finally, at the end of the program that uses this bit I/O library file, it is a good programming practice to close the input and output files using fclose() and also free the input and output buffers. The GTBITIO library is available in the algorithms implementations ZIP files featured in this site. You can now implement your best compression ideas using just the given library routines. Note that these are not the fastest bit I/O routines today. This is something like 1990s technology. I wrote them just to be different than the other implementations. But they are moderately fast enough for our purposes. It is your destiny to improve them and write your own faster put_nbits() and get_nbits() functions.