Data Compression Algorithm

Both Cntrlx and Maestro use a 12-bit, 16-channel data acquisition board to record the animal's eye position and velocity, head position and velocity, and other relevant analog signals. The experiment designer usually chooses to record only a subset of the 16 available channels. The selected channels are recorded in the same sequence at the beginning of each scan, or "tick", during an experiment. Each digitized sample is a signed 12-bit number, falling in the range [-2048..2047]. Thus, if the N channels {c1, ..., cN} were recorded over the course of M ticks, then the uncompressed analog data stream would consist of a sequence of N*M 12-bit integers:

c1(0), ... cN(0), c1(1), ... cN(1), ..., c1(M-1), ... cN(M-1).

Given the practicalities of writing data to file in byte-sized units, it would be a laborious operation to pack the 12-bit samples (one and a half bytes!) into our 1024-byte records. Each sample could be stored in the Maestro data file's analog data record as a 16-bit integer, but that wastes considerable space -- particularly when you consider the potentially large data sets that can be generated in Continuous mode. Instead, since the recorded analog signals change relatively slowly most of the time, Maestro saves only the difference between successive samples on each selected channel:

D1(0), ... DN(0), D1(1), ... DN(1), ..., D1(M-1), ... DN(M-1), where Dn(m) = cn(m) - cn(m-1) for m > 0, and Dn(0) = cn(0).
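The differencing step can be sketched in C as follows. This is only an illustration under stated assumptions, not Maestro's own code: the function name compute_differences and the interleaved array layout are hypothetical, chosen to mirror the sample ordering shown above.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of Maestro): converts an interleaved sample
// stream c1(0)..cN(0), c1(1)..cN(1), ... into the difference stream
// D1(0)..DN(0), D1(1)..DN(1), ...  Each channel's "previous" sample is taken
// to be 0 before the first scan, so that Dn(0) = cn(0).
void compute_differences(const short *samples, int *diffs,
                         size_t nChannels, size_t nScans)
{
    assert(samples != NULL && diffs != NULL);
    for (size_t m = 0; m < nScans; m++) {
        for (size_t n = 0; n < nChannels; n++) {
            size_t idx = m * nChannels + n;
            // the same channel's sample from the previous scan sits nChannels entries back
            int prev = (m == 0) ? 0 : samples[idx - nChannels];
            diffs[idx] = (int)samples[idx] - prev;
        }
    }
}
```

For two channels recorded over three scans, the samples {10, -5, 12, -5, 2047, -2048} would yield the differences {10, -5, 2, 0, 2035, -2043}.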

It will often be the case that the difference between successive samples is small enough to fit into a single byte, and Maestro takes advantage of this fact to compress the analog "difference" stream using the following algorithm:

    1. Compute the "difference" sample Dn(m).

    2. If the absolute value of Dn(m) < 64, then add 64 and store it as a single byte. Thus, [-63..63] is mapped to [0x01..0x7F]. Observe that bit 7 of the compressed sample is always 0.

    3. Otherwise, compute (Dn(m) + 4096) | 0x8000. This transformation maps the range of values [-2048..-64] to [0x8800..0x8FC0], and [64..2047] to [0x9040..0x97FF]. Observe that bit 15 of the two-byte result is always 1. The result is stored as a two-byte integer, with the high byte first.
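The three steps above can be sketched as a small C encoder for a single difference sample. This is a minimal sketch, not Maestro's actual implementation: the function name encode_difference and its signature are hypothetical. The assertion accepts any difference in [-4095..4095], the widest span that subtracting two signed 12-bit samples can produce; that bound is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical encoder for one difference sample (not Maestro's own code).
// Writes 1 or 2 bytes to out and returns the number of bytes written.
size_t encode_difference(int diff, unsigned char *out)
{
    // assumption: diff is the difference of two signed 12-bit samples
    assert(diff >= -4095 && diff <= 4095);

    if (diff > -64 && diff < 64) {
        // |diff| < 64: offset by 64 so the byte lies in [0x01..0x7F], bit 7 clear
        out[0] = (unsigned char)(diff + 64);
        return 1;
    }

    // otherwise offset by 4096, set bit 15, and store the high byte first
    unsigned int code = (unsigned int)(diff + 4096) | 0x8000u;
    out[0] = (unsigned char)(code >> 8);
    out[1] = (unsigned char)(code & 0xFFu);
    return 2;
}
```

Because the first byte written always carries the one-byte/two-byte flag in its top bit, a decoder can walk the stream without any separate length information.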

This scheme can compress the analog data by a factor of 2 at best, relative to storing every sample as a 16-bit integer; that limit is reached only when every difference fits in a single byte. Typical compression factors for actual recorded data are probably in the range of 1.5 to 2. To decompress the encoded data stream, we simply read in each byte, check its most significant bit to determine whether it is a single-byte compressed sample or the high byte of a two-byte encoded sample, then perform the inverse of the transformation described above to recover the original "difference" sample. Of course, in order to segregate the uncompressed difference stream into the individual recorded channel traces, we need to know which analog channels were saved and the channel scan order -- which is why that information is stored in the data file header.

The pseudo-code snippet below implements the decompression task. It assumes the compressed data stream has already been extracted from the analog data records in the data file and stored in a byte buffer. It also assumes that the compressed data stream is correct and complete (no error checking code). Note that it leaves the channel trace data in its digitized form. Most analysis programs will ultimately wish to convert the digitized samples to real units -- degrees, degrees/second, or volts. Let S represent the digitized signal trace. Maestro's external laboratory apparatus is carefully calibrated so that S * 0.025 is position in degrees subtended at the eye (if the channel is recording a position signal, of course!), while S * 0.09189 is velocity in degrees/second. Finally, since Maestro's analog input device has 12-bit resolution and a bipolar range of ±10 volts, (S*20)/4096 is the analog signal in volts.
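The unit conversions just described might be written as the small C helpers below. The scale factors are the ones quoted in the text; the function and macro names are hypothetical and exist only for this sketch. A 20-volt span (-10 V to +10 V) spread over 4096 counts gives a result in volts.

```c
#include <assert.h>

// Scale factors quoted in the text; all names here are hypothetical.
#define POS_DEG_PER_COUNT    0.025     // degrees per count, position channels
#define VEL_DEGPS_PER_COUNT  0.09189   // degrees/second per count, velocity channels

// digitized position sample -> degrees subtended at the eye
double raw_to_position_deg(short s) { return s * POS_DEG_PER_COUNT; }

// digitized velocity sample -> degrees/second
double raw_to_velocity_degps(short s) { return s * VEL_DEGPS_PER_COUNT; }

// digitized sample -> volts, for a 12-bit converter with a bipolar +/-10 V range
double raw_to_volts(short s)
{
    assert(s >= -2048 && s <= 2047);   // valid signed 12-bit sample
    return (s * 20.0) / 4096.0;
}
```

Which conversion applies depends on what the channel was recording, so a caller has to pick the helper per channel.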

#include <stdlib.h>   // malloc
#include <string.h>   // memset

// [in] the compressed analog data stream extracted from data file, and the header record
char* pCompressedStream;
CXFILEHDR header;

// [out] buffers to hold the recorded channel traces after decompression
short* traces[CXH_MAXAI];

// allocate buffers only for those AI channels that were recorded; all other channel trace buffers are set to NULL
int i;
for( i = 0; i < CXH_MAXAI; i++ ) traces[i] = NULL;
for( i = 0; i < header.nchans; i++ )
{
   int ch = header.chlist[i];
   traces[ch] = (short*) malloc( header.nScansSaved*sizeof(short) );
}

// the previously decompressed sample for each AI channel; all channels init'd to 0 at "t=-1"
int iLastSample[CXH_MAXAI];
memset( iLastSample, 0, CXH_MAXAI*sizeof(int) );

// uncompress the data stream, one channel scan's worth at a time...
int nBytes = 0;
int nScans = 0;
while( nScans < header.nScansSaved )
{
   for( i = 0; i < header.nchans; i++ )
   {
      int ch = header.chlist[i];
      char cByte = pCompressedStream[nBytes++];
      short shTemp;
      if( cByte & 0x080 )                       // bit 7 set, next datum is 2 bytes
      {
         shTemp = (cByte & 0x7F);               // high byte with the flag bit cleared
         shTemp <<= 8;
         cByte = pCompressedStream[nBytes++];
         shTemp |= 0x00FF & ((short) cByte);    // merge in the low byte, masking any sign extension
         shTemp -= 4096;                        // undo the +4096 offset
      }
      else                                      // bit 7 clear, next datum is 1 byte
      {
         shTemp = cByte - 64;                   // undo the +64 offset
      }

      // add recovered difference to prev sample to get current sample and save this to channel's trace buffer
      iLastSample[ch] += (int) shTemp;
      traces[ch][nScans] = (short) iLastSample[ch];
   }
   ++nScans;
}