Data storage and data compression

Measurement of data storage

A bit is the basic unit of data storage in computing and has the value 1 or 0. The word comes from 'binary digit'. The byte is the smallest addressable unit of memory in a computer. 1 byte is 8 bits. A 4-bit number is called a nibble – half a byte.

1 byte of memory wouldn’t allow you to store very much information so memory size is measured in the multiples shown in Table 1.4:

▼ Table 1.4 Memory size using denary values

The above system of numbering is now used only for some storage devices, and it is technically inaccurate when applied to memory. It is based on the SI (base 10) system of units where 1 kilo is equal to 1000.

A 1 TB hard disk drive would allow the storage of 1 × 10^12 bytes according to this system.

However, since memory size is actually measured in terms of powers of 2, another system has been adopted by the IEC (International Electrotechnical Commission) that is based on the binary system (Table 1.5):

This system is more accurate. Internal memories (such as RAM and ROM) should be measured using the IEC system. A 64 GiB RAM could, therefore, store 64 × 2^30 bytes of data (68 719 476 736 bytes).
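The difference between the two systems can be checked with a short Python sketch. The dictionary and function names here are illustrative assumptions, not part of any standard library.

```python
# SI (denary) prefixes use powers of 1000; IEC (binary) prefixes use powers of 1024.
SI_UNITS = {"KB": 1000**1, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}
IEC_UNITS = {"KiB": 1024**1, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def to_bytes(amount, unit):
    """Convert an amount given in an SI or IEC unit to a number of bytes."""
    factor = SI_UNITS.get(unit) or IEC_UNITS.get(unit)
    if factor is None:
        raise ValueError(f"unknown unit: {unit}")
    return amount * factor


print(to_bytes(1, "TB"))    # 1000000000000  (1 × 10^12)
print(to_bytes(64, "GiB"))  # 68719476736    (64 × 2^30)
```

The gap between the two systems grows with each prefix: 1 KiB is 2.4% larger than 1 KB, but 1 TiB is almost 10% larger than 1 TB.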

Calculation of file size

In this section we will look at the calculation of the file size required to hold a bitmap image and a sound sample.

The file size of an image is calculated as: image resolution (in pixels) × colour depth (in bits)

The size of a mono sound file is calculated as: sample rate (in Hz) × sample resolution (in bits) × length of sample (in seconds)

Both formulae give a size in bits; divide by 8 to convert to bytes.
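The two formulae can be written as small Python functions. This is a minimal sketch: the function names are made up for illustration, and the `channels` parameter is an added assumption so that the mono formula can also cover stereo sound.

```python
def image_size_bits(width_px, height_px, colour_depth_bits):
    """Image size in bits = image resolution (pixels) × colour depth (bits)."""
    return width_px * height_px * colour_depth_bits


def sound_size_bits(sample_rate_hz, sample_resolution_bits, seconds, channels=1):
    """Sound size in bits = sample rate × sample resolution × length (× channels)."""
    return sample_rate_hz * sample_resolution_bits * seconds * channels


def bits_to_mib(bits):
    """Convert bits to MiB: divide by 8 for bytes, then by 1024 × 1024."""
    return bits / 8 / (1024 * 1024)


# A 2048 × 2048 pixel image with a colour depth of 16 bits
bits = image_size_bits(2048, 2048, 16)
print(bits)               # 67108864
print(bits_to_mib(bits))  # 8.0
```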

Example 1

A photograph is 1024 × 1080 pixels and uses a colour depth of 32 bits. How many photographs of this size would fit onto a memory stick of 64 GiB?

1 Multiply number of pixels in vertical and horizontal directions to find total number of pixels = (1024 × 1080) = 1 105 920 pixels

2 Now multiply number of pixels by colour depth then divide by 8 to give the number of bytes = (1 105 920 × 32)/8 = 35 389 440/8 = 4 423 680 bytes

3 64 GiB = 64 × 1024 × 1024 × 1024 = 68 719 476 736 bytes

4 Finally divide the memory stick size by the file size = 68 719 476 736/4 423 680 = 15 534 photos (rounded down to a whole number).
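The arithmetic in this example can be reproduced in a few lines of Python. The variable names are illustrative; integer division (`//`) is used because only whole photographs can be stored.

```python
width_px, height_px = 1024, 1080
colour_depth_bits = 32
stick_bytes = 64 * 1024**3  # 64 GiB expressed in bytes

# Pixels × colour depth gives bits; divide by 8 to get bytes per photo.
photo_bytes = width_px * height_px * colour_depth_bits // 8

# Whole photos that fit on the memory stick (any remainder is discarded).
photos = stick_bytes // photo_bytes

print(photo_bytes)  # 4423680
print(photos)       # 15534
```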

Example 2

A camera detector has an array of 2048 by 2048 pixels and uses a colour depth of 16 bits. Find the size of an image taken by this camera in MiB.

1 Multiply number of pixels in vertical and horizontal directions to find total number of pixels = (2 048 × 2 048) = 4 194 304 pixels

2 Now multiply number of pixels by colour depth = 4 194 304 × 16 = 67 108 864 bits

3 Now divide number of bits by 8 to find the number of bytes in the file = (67 108 864)/8 = 8 388 608 bytes

4 Now divide by 1024 × 1024 to convert to MiB = (8 388 608)/(1 048 576) = 8 MiB.

Example 3

An audio CD has a sample rate of 44 100 Hz and a sample resolution of 16 bits. The music being sampled uses two channels to allow for stereo recording. Calculate the file size for a 60-minute recording.

1 Size of file = sample rate (in Hz) × sample resolution (in bits) × length of sample (in seconds)

2 Size of sample = (44 100 × 16 × (60 × 60)) = 2 540 160 000 bits

3 Multiply by 2 since there are two channels being used = 5 080 320 000 bits

4 Divide by 8 to find number of bytes = (5 080 320 000)/8 = 635 040 000 bytes

5 Divide by 1024 × 1024 to convert to MiB = 635 040 000 / 1 048 576 ≈ 605.6 MiB.
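The same steps can be checked in Python. This is only a sketch of the calculation above, with illustrative variable names.

```python
sample_rate_hz = 44_100
sample_resolution_bits = 16
channels = 2            # stereo
seconds = 60 * 60       # 60-minute recording

# Mono formula multiplied by the number of channels.
total_bits = sample_rate_hz * sample_resolution_bits * seconds * channels
total_bytes = total_bits // 8
size_mib = total_bytes / (1024 * 1024)

print(total_bits)          # 5080320000
print(total_bytes)         # 635040000
print(round(size_mib, 1))  # 605.6
```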

Data Compression

The calculations in Section 1.3.2 show that sound and image files can be very large. It is therefore necessary to reduce (or compress) the size of a file for the following reasons:

» to save storage space on devices such as the hard disk drive/solid state drive

» to reduce the time taken to stream a music or video file

» to reduce the time taken to upload, download or transfer a file across a network

» to use less network bandwidth – this is the maximum rate of transfer of data across a network, measured in bits per second. Bandwidth is used up whenever a file is uploaded or downloaded, for example, from a server. Compressed files contain fewer bits of data than uncompressed files, so they use less bandwidth and take less time to transfer.

» to reduce costs. For example, when using cloud storage, the cost is based on the size of the files stored. Also an internet service provider (ISP) may charge a user based on the amount of data downloaded.
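The effect of file size on transfer time can be estimated with a short Python sketch. The 10 Mbit/s bandwidth and the 90% size reduction are hypothetical values chosen for illustration, and the function name is made up; a real transfer would also be affected by network overheads.

```python
def transfer_time_seconds(file_size_bytes, bandwidth_bits_per_second):
    """Ideal transfer time: total bits divided by bandwidth in bits per second."""
    return file_size_bytes * 8 / bandwidth_bits_per_second


bandwidth = 10_000_000                 # hypothetical 10 Mbit/s connection
uncompressed_bytes = 635_040_000       # the 60-minute stereo recording from Example 3
compressed_bytes = uncompressed_bytes // 10  # assumed 90% reduction in size

print(transfer_time_seconds(uncompressed_bytes, bandwidth))  # roughly 508 seconds
print(transfer_time_seconds(compressed_bytes, bandwidth))    # roughly 51 seconds
```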

1.3.4 Lossy and lossless file compression

File compression can either be lossless or lossy.

Lossy file compression

With this technique, the file compression algorithm eliminates unnecessary data from the file. This means the original file cannot be reconstructed once it has been compressed.

Lossy file compression results in some loss of detail when compared to the original file. The algorithms used in the lossy technique have to decide which parts of the file need to be retained and which parts can be discarded.

For example, when applying a lossy file compression algorithm to:

» an image, it may reduce the resolution and/or the bit/colour depth

» a sound file, it may reduce the sampling rate and/or the resolution.
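To see why the original cannot be recovered, consider reducing the resolution of sound samples from 16 bits to 8 bits. The Python sketch below is a deliberately crude illustration, not how a real lossy codec works; the function names and sample values are hypothetical.

```python
def reduce_resolution(samples, from_bits, to_bits):
    """Keep only the most significant bits of each sample (data is discarded)."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]


def restore_resolution(samples, from_bits, to_bits):
    """Scale samples back up; the discarded low bits cannot be recovered."""
    shift = to_bits - from_bits
    return [s << shift for s in samples]


original = [1000, 1001, 1002, 40000]           # hypothetical unsigned 16-bit samples
reduced = reduce_resolution(original, 16, 8)   # each sample now fits in 8 bits
restored = restore_resolution(reduced, 8, 16)

print(reduced)   # [3, 3, 3, 156]
print(restored)  # [768, 768, 768, 39936]
```

The reduced samples need half the storage, but three different original values have become the same value, so no algorithm can tell them apart again.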

Lossy files are smaller than lossless files which is of great benefit when considering storage and data transfer rate requirements.

Common lossy file compression algorithms are:

» MP3 (strictly MPEG-1 Audio Layer III, although often called MPEG-3) and MP4 (MPEG-4)

» JPEG.

MP3 and MP4

MP3 files are used for playing music on computers or mobile phones. This compression technology will reduce the size of a normal music file by about 90%. While MP3 music files can never match the sound quality found on a DVD or CD, the quality is satisfactory for most general purposes.

But how can the original music file be reduced by 90% while still retaining most of the music quality? Essentially the algorithm removes sounds that the human ear can’t hear properly. For example:

» removal of sounds outside the human ear range

» if two sounds are played at the same time, only the louder one can be heard by the ear, so the softer sound is eliminated. This is called perceptual music shaping.

MP4 files are slightly different to MP3 files. This format allows the storage of multimedia files rather than just sound – music, videos, photos and animation can all be stored in the MP4 format. As with MP3, this is a lossy file compression format, but it still retains an acceptable quality of sound and video. Movies, for example, could be streamed over the internet using the MP4 format without losing any real discernible quality.

JPEG

When a camera takes a photograph, it produces a raw bitmap file which can be very large in size. These files are temporary in nature. JPEG is a lossy file compression algorithm used for bitmap images. As with MP3, once the image is subjected to the JPEG compression algorithm, a new file is formed and the original file can no longer be reconstructed.

The JPEG file reduction process is based on two key concepts:

» human eyes don’t detect differences in colour shades quite as well as they detect differences in image brightness (the eye is less sensitive to colour variations than it is to variations in brightness)

» by separating pixel colour from brightness, images can be split into 8 × 8 pixel blocks, for example, which then allows certain ‘information’ to be discarded from the image without causing any real noticeable deterioration in quality.
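The idea of separating colour from brightness can be sketched in Python. The conversion below uses the YCbCr coefficients commonly associated with JPEG files; the text above does not specify them, so treat them, the function names and the choice of halving the colour resolution as assumptions made for illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Split an 8-bit RGB pixel into brightness (Y) and two colour values (Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def samples_in_block(block_size=8, chroma_step=1):
    """Count stored values in one block when colour is kept every `chroma_step` pixels."""
    brightness = block_size * block_size
    colour = 2 * (block_size // chroma_step) ** 2
    return brightness + colour


print(samples_in_block(8, 1))  # 192 values: full brightness and full colour
print(samples_in_block(8, 2))  # 96 values: full brightness, colour at half resolution
```

Because the eye is less sensitive to colour than to brightness, keeping colour at half resolution in each direction halves the number of stored values in this sketch while leaving the brightness detail untouched.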

Lossless file compression

With this technique, all the data from the original uncompressed file can be reconstructed. This is particularly important for files where any loss of data would be disastrous (e.g. when transferring a large and complex spreadsheet or when downloading a large computer application).

Lossless file compression is designed so that none of the original detail from the file is lost.
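A lossless round trip can be demonstrated with Python's standard `zlib` module. The text does not name a particular lossless algorithm, so zlib is used here simply as a convenient example, and the input data is made up.

```python
import zlib

# Hypothetical, highly repetitive data (3000 bytes), which compresses well.
original = b"AAAAAAAAAABBBBBBBBBBCCCCCCCCCC" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # compressed length is much smaller
print(restored == original)            # True: every byte is recovered
```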
