Data Representation

Learning Outcomes

• explain the terms bit, byte, kilobyte, megabyte, gigabyte and terabyte;

• demonstrate that 2^n different values can be represented with n bits (maximum n = 8);

• Binary and decimal

• perform conversions from decimal to binary and from binary to decimal for a maximum of 8 bits;

• demonstrate how the two’s complement system can represent positive and negative numbers in binary using 8 bits;

• demonstrate how American Standard Code for Information Interchange (ASCII) and Unicode are used to represent characters;

Bits and Bytes

Units of Storage

Computers use a variety of memory techniques to store data. All data is stored in digital format using a number system known as binary. A Binary digIT (known as a BIT) is either a 0 or a 1 and is the smallest unit of storage. A group of bits, typically eight, is referred to as a byte. A single character (such as a letter or digit) is typically represented by one byte.

The capacity of storage in a typical computer or peripheral is also measured in bytes.

•1024 Bytes = 1 Kilobyte

•1024 Kilobytes = 1 Megabyte

•1024 Megabytes = 1 Gigabyte

•1024 Gigabytes = 1 Terabyte

These terms are usually used to describe disk capacity, other data storage capacity and system memory. Today, the capacity of a hard drive is commonly described in terabytes.

Question: Calculate the number of bytes in a 2 gigabyte USB memory pen.
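A calculation like this can be checked with a short program. The sketch below is only an illustration: the dictionary `UNITS` and the function `to_bytes` are names made up for this example, and it assumes the 1024-based unit sizes listed above.

```python
# Number of bytes in each unit, using the 1024-based sizes from the notes.
# UNITS and to_bytes are hypothetical names chosen for this sketch.
UNITS = {
    "byte": 1,
    "kilobyte": 1024,
    "megabyte": 1024 ** 2,
    "gigabyte": 1024 ** 3,
    "terabyte": 1024 ** 4,
}


def to_bytes(amount, unit):
    """Convert an amount in the given unit into a number of bytes."""
    return amount * UNITS[unit]


print(to_bytes(2, "gigabyte"))  # 2147483648
```

The same function can be used to compare units, for example dividing `to_bytes(1, "terabyte")` by `to_bytes(1, "gigabyte")`.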

Binary and Decimal

•The number of bits used determines the number of different values that can be represented: n bits give 2^n values, so the greater the number of bits, the greater the number of values. For example, 3 bits allow eight different values: 000, 001, 010, 011, 100, 101, 110 and 111.
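The 2^n rule can be demonstrated by listing every pattern for a given number of bits. This is a minimal sketch; the function name `all_values` is an assumption made for this example.

```python
def all_values(n):
    """Return every bit pattern that can be made with n bits."""
    # Counting from 0 up to 2^n - 1 and padding with zeros gives each pattern once.
    return [format(i, "0{}b".format(n)) for i in range(2 ** n)]


patterns = all_values(3)
print(patterns)       # ['000', '001', '010', '011', '100', '101', '110', '111']
print(len(patterns))  # 8
```

Calling `all_values(8)` returns 256 patterns, which is the maximum needed for these notes.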

•In the decimal (sometimes referred to as denary) numbering system, each column of a whole number has a value of units, tens, hundreds, thousands, etc. as we move along the number from right to left. Mathematically these values are written as powers of 10, starting from the right-hand side: 10^0, 10^1, 10^2, 10^3, etc. Each position further to the left of the decimal point represents the next higher power of 10.

•The binary numbering system is used in all digital and computer-based systems. Binary numbers follow the same rules as decimal numbers. The main difference is that the decimal system uses powers of ten whereas the binary system uses powers of two. For 8 bits the place values, from left to right, are 128, 64, 32, 16, 8, 4, 2 and 1, so each binary place value has an equivalent decimal number.
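One common way to convert a decimal number to binary is to work along the place values from left to right, writing a 1 whenever the place value fits into what is left of the number. The sketch below assumes that method and an 8-bit limit (0 to 255); `PLACE_VALUES` and `decimal_to_binary` are names chosen for this example.

```python
# Place values for an 8-bit binary number, most significant first.
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def decimal_to_binary(number):
    """Convert a whole number from 0 to 255 into an 8-bit binary string."""
    if not 0 <= number <= 255:
        raise ValueError("number must fit in 8 bits (0 to 255)")
    bits = ""
    for value in PLACE_VALUES:
        if number >= value:
            bits += "1"      # this place value is used
            number -= value  # carry on with what is left
        else:
            bits += "0"      # this place value is not used
    return bits


print(decimal_to_binary(77))  # 01001101
```

Here 77 = 64 + 8 + 4 + 1, so the 64, 8, 4 and 1 columns hold a 1.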

When converting from binary to decimal, the decimal number is the sum of the place values (powers of 2) of every column that holds a 1. Consider changing the binary value 11001110 to decimal: 128 + 64 + 8 + 4 + 2 = 206.
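The binary to decimal rule can be written as a short loop that adds up a power of 2 for each 1 digit. This is an illustrative sketch only; the function name `binary_to_decimal` is an assumption.

```python
def binary_to_decimal(bits):
    """Add up the power of 2 for every column that holds a 1."""
    total = 0
    power = 0
    # Start from the right-hand side, where the place value is 2^0.
    for digit in reversed(bits):
        if digit == "1":
            total += 2 ** power
        power += 1
    return total


print(binary_to_decimal("11001110"))  # 206
```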

Two's Complement

Using the two’s complement system to represent positive and negative numbers in binary

Consider the problem of representing both positive and negative integers over a given range using only ones and zeroes. This means the sign has to be built into the binary representation. We can do this by making the most significant bit (MSB) a sign bit, which tells us whether the number is positive or negative. In two’s complement, if the MSB is 1 the number is negative, and if the MSB is 0 the number is positive (or zero). The MSB has a place value of -128, so the column headings for an 8-bit two’s complement number look like this:

-128  64  32  16  8  4  2  1
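Both directions of two's complement can be sketched in code. The example below assumes 8 bits (so the range is -128 to 127), uses the "invert the bits and add 1" method for negative numbers, and uses a place value of -128 for the MSB when converting back. The function names are made up for this illustration.

```python
def to_twos_complement(number):
    """Return the 8-bit two's complement pattern for a number from -128 to 127."""
    if not -128 <= number <= 127:
        raise ValueError("number must be between -128 and 127")
    if number >= 0:
        return format(number, "08b")
    positive = format(-number, "08b")  # binary of the positive equivalent
    inverted = "".join("1" if bit == "0" else "0" for bit in positive)
    return format(int(inverted, 2) + 1, "08b")  # add 1 to the LSB


def from_twos_complement(bits):
    """Convert an 8-bit two's complement pattern back to decimal."""
    headings = [-128, 64, 32, 16, 8, 4, 2, 1]
    return sum(value for value, bit in zip(headings, bits) if bit == "1")


print(to_twos_complement(-64))           # 11000000
print(from_twos_complement("11100001"))  # -31
```

For -64 the steps are 01000000, then inverted to 10111111, then 1 is added to give 11000000.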

Representing Text

Representing characters using ASCII and Unicode

A character set is the range of symbols that may be represented on a particular computer or device. These symbols, called characters, include letters, digits and punctuation marks. A character set also contains control characters, which are non-printing and are used for special purposes such as an end-of-file marker. Each character has a unique binary value to represent it. Two examples of character sets are ASCII and Unicode.

ASCII (American Standard Code for Information Interchange)

Standard ASCII uses a 7-bit code and represents 128 characters, including control characters. When a character is stored in a byte, the 8th bit is not part of the code itself; it can be used as a parity bit for error checking. The characters are coded as consecutive numbers from 0 to 127, and these characters and control codes are encoded as simple, unsigned binary integers. Every character has a unique binary pattern, as shown in the table below.
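The link between a character and its 7-bit code can be explored with Python's built-in `ord` and `chr` functions. This is a small sketch; the helper name `ascii_bits` is an assumption made for this example.

```python
def ascii_bits(character):
    """Return the 7-bit ASCII pattern for a single character."""
    code = ord(character)  # the character's code number
    if code > 127:
        raise ValueError("not a standard ASCII character")
    return format(code, "07b")


for character in "Hi!":
    print(character, ord(character), ascii_bits(character))
# H 72 1001000
# i 105 1101001
# ! 33 0100001

print(chr(65))  # A
```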

ASCII was designed to represent English-language text for an American user base, and is therefore insufficient for representing text in almost any language other than American English.

UNICODE

Unicode is a worldwide character standard. It allows for the interchange, processing and display of the written texts of the majority of the diverse languages of the modern world. Whereas ASCII is limited to the characters used in American English, Unicode is a single coded character set that incorporates characters from almost all of the world's languages. To allow for this increase in the number of characters, Unicode was designed as a 16-bit extension of ASCII, giving 65536 possible codes (the standard has since grown beyond 16 bits). As Unicode uses more bits per character, it also requires more memory than ASCII.
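The difference in memory use can be seen by encoding the same text in two ways. The sketch below uses Python's `"ascii"` codec and the `"utf-16-be"` codec as a stand-in for a 16-bit Unicode code; that choice of codec, and the helper name `size_in_bytes`, are assumptions made for this illustration.

```python
def size_in_bytes(text, encoding):
    """Return how many bytes the text needs in the given encoding."""
    return len(text.encode(encoding))


print(size_in_bytes("Hello", "ascii"))      # 5  (one byte per character)
print(size_in_bytes("Hello", "utf-16-be"))  # 10 (two bytes per character)

# Code numbers: the first is inside the ASCII range, the others are not.
print(ord("A"), ord("é"), ord("€"))  # 65 233 8364
```

Trying `"é".encode("ascii")` raises a `UnicodeEncodeError`, because that character has no ASCII code.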

Keywords

Past Paper Questions

1 (a) (i) How many bytes are there in each of the following?

A megabyte: 1024 to the power of 2

A terabyte: 1024 to the power of 4 [2]

(ii) A file requires 4 megabytes of storage. How many files of this size could be held in a gigabyte? You must show your work. [3]

1 gigabyte = 1024 megabytes, so 1024 / 4 = 256 files

(b) The two’s complement system can be used to represent positive and negative numbers.

(i) Describe how negative numbers are represented in two’s complement using 8 bits. [2]

The binary of the positive equivalent is inverted [1] and 1 is added to the least significant bit (LSB) [1]

Alternatively: the first bit has a place value of -128. When this bit is 1, the remaining place values are added to -128 to determine the negative number.

(ii) Show how the decimal number -64 can be represented as a two’s complement binary number using 8 bits. [3]

The decimal number 64 is 01000000 in binary [1]

Invert: 10111111 [1]

Add 1 to LSB: 11000000 [1]

ASCII

• ASCII uses 7 bits

• ASCII can represent 128 characters

• The 8th bit can be used for error checking

• Alternatively, ASCII can represent 2^8 characters/ASCII uses 8 bits to represent 256 characters

Unicode

• Unicode can represent 2^16 characters/Unicode uses 16 bits to represent 65536 characters

Points of comparison

• Unicode can represent 2^8 (or 256) times as many characters as ASCII, and this eliminates the need to have different character sets for different languages

• ASCII uses fewer bits than Unicode, which could result in faster processing/reduced memory requirements

2 (a) Explain how both positive and negative numbers can be represented in binary using the two’s complement system. [4]

• The most significant bit (MSB) is used as a sign bit

• The MSB is 0 for a positive number

• The MSB is 1 for a negative number

• For a positive number, place values are used

• A negative number is stored as the two’s complement (of its positive equivalent)

• To get the two’s complement, invert the bits and add 1 to the LSB

(b) Convert each of the following two’s complement binary numbers into decimal. Show your working out.

00001111

11100001 [4]

00001111 = 8 + 4 + 2 + 1 = 15

11100001 = -128 + 64 + 32 + 1 = -31

3 (a) By converting both units to bytes, calculate how many gigabytes there are in a terabyte. [3 Marks]

One gigabyte = 1024 to the power of 3 bytes

One terabyte = 1024 to the power of 4 bytes

One terabyte = 1024 gigabytes

(b) 128 characters can be represented in ASCII using 7 bits. Suggest a use for the eighth bit in each byte. [1]

• It can be used for error checking ... as a parity bit

• It can be used to represent additional characters/extend the number of characters which can be represented ... so that graphics characters/accented letters can be represented
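As an illustration of the parity idea, the sketch below adds a parity bit in front of a 7-bit ASCII code to make a full byte. It assumes even parity (the total number of 1s in the byte is made even) and assumes the parity bit is stored as the leftmost bit. The notes do not specify either choice, and the function name `add_even_parity` is made up for this example.

```python
def add_even_parity(character):
    """Return an 8-bit pattern: an even parity bit followed by the 7-bit ASCII code."""
    code = ord(character)
    if code > 127:
        raise ValueError("not a standard ASCII character")
    seven_bits = format(code, "07b")
    # Assumption: even parity, so the parity bit is 1 only when the
    # 7-bit code contains an odd number of 1s.
    parity = "1" if seven_bits.count("1") % 2 == 1 else "0"
    return parity + seven_bits


print(add_even_parity("A"))  # 01000001
print(add_even_parity("C"))  # 11000011
```

If a single bit is changed during transmission, the number of 1s in the byte becomes odd, so the receiver can tell that an error has occurred.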

(c) Describe one advantage and one disadvantage of using Unicode instead of ASCII to represent characters. Advantage Disadvantage [4]

• More characters can be represented

• Unicode uses 16 bits

• Unicode can represent 2^16 characters/65536 characters

• 2^9 (512) times as many characters can be represented

• Allows characters from most of the world’s languages to be represented/emoticons can be represented

A character set is all of the characters (letters, numbers and symbols) which can be represented on a computer. ASCII is an example of a character set; it uses 7 bits. Unicode is another example; it uses 16 bits.

(d) How many characters can be represented in ASCII? [1]

128 (codes 0 to 127)

(e) ASCII uses 7 bits in a byte. Suggest a use for the 8th bit. [2]

• It can be used for error checking ... as a parity bit

• It can be used to represent additional characters/extend the number of characters which can be represented ... so that graphics characters/accented letters can be represented

(f) The capacity of a particular hard drive is 1TB. Write down what ‘TB’ stands for and state how many gigabytes it contains. [2 Marks]

TB: terabyte

Number of gigabytes: 1024

(g) Discuss the benefit of using Unicode instead of ASCII as a character set. [3]

ASCII can represent 128 characters because it uses 7 bits. Unicode can represent 65536 characters because it uses 16 bits. It can be used to represent a wider range of characters including many languages, symbols and emoticons.

(h) Describe one advantage and one disadvantage of using Unicode instead of ASCII to represent characters. [4 Marks]

Advantage

It uses 16 bits, which means it can represent 65536 characters, including many languages, symbols and emoticons.

Disadvantage

Unicode uses more bits per character than ASCII, which increases memory/storage requirements and can result in slower processing.
