Floating Point Representation

Course Content Specification

Describe and exemplify floating-point representation of positive and negative real numbers, using the terms mantissa and exponent.
Describe the relationship between the number of bits assigned to the mantissa/exponent, and the range and precision of floating-point numbers.

So far we have used binary to represent integers (whole numbers) and two's complement to store negative numbers. But there is a problem: we cannot yet store real numbers (numbers with decimal portions). It is also cumbersome to store large integers.

We are accustomed to using fixed point notation, where the decimal point stays in one place: digits to the left of the point are the integer part and digits to the right are the decimal portion.

E.g. 10.75

Here 10 is the integer portion and 0.75 is the decimal portion. To get around the problem of storing real numbers and large integers, the computer uses floating point representation.

The number above demonstrates the location of the mantissa and exponent.

So in this instance the mantissa would be 25 and the exponent would be 5

The computer only stores the mantissa and the exponent. It does not need to store the base, as this will always be 2, which saves memory.

Advantage of floating point

It takes up less space for storing large numbers and allows real numbers to be stored.

Disadvantage of floating point

The main disadvantage of floating point is that the computer has to split its storage space between the mantissa and the exponent. If not enough bits are assigned to the mantissa, numbers have to be rounded, which causes rounding errors. There has to be a trade-off between accuracy and the range of numbers that we can represent.

Floating Point Number Structure

There is an IEEE (Institute of Electrical and Electronics Engineers) standard, IEEE 754-2008, that defines the structure of a floating point number. It defines 4 sizes of binary floating point number:

16 bit sometimes known as Half precision
32 bit sometimes known as Single precision
64 bit sometimes known as Double precision
128 bit sometimes known as Quadruple precision

A 32-bit floating point number (single precision) has the following structure:

Sign bit and Mantissa (24 bits) - the sign part is considered part of the mantissa
Exponent 8 bits

Lesson Video - Floating Point Representation

Floating Point Representations.MP4

A worked example

In decimal first

250.03125

First you convert the integer part of the number into binary (as you have done previously)

250 = 1111 1010

Now convert the decimal portion of the number (although this would usually be done for you in the exam).

Decimal fraction => .03125

Multiply the fraction by 2. The whole-number part of the result (0 or 1) is the next binary digit, and the remainder (r) is carried forward to the next step. Continue until the remainder is 0.

0.03125 * 2 = 0 r 0.0625

0.0625 * 2 = 0 r 0.125

0.125 * 2 = 0 r 0.25

0.25 * 2 = 0 r 0.5

0.5 * 2 = 1 r 0

Binary fraction = 0.00001
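
The repeated multiplication above can be sketched in a few lines of Python. This is only an illustration: the function name fraction_to_binary and the max_bits cut-off are my own choices, not part of the course material. Fraction is used so that the arithmetic is exact.

```python
from fractions import Fraction

def fraction_to_binary(fraction: str, max_bits: int = 16) -> str:
    """Convert a decimal fraction in [0, 1) to binary by repeated doubling.

    max_bits is a hypothetical safety limit for fractions that never
    reach a remainder of 0 (0.1, for example).
    """
    remaining = Fraction(fraction)       # exact value, no float rounding
    digits = []
    while remaining != 0 and len(digits) < max_bits:
        remaining *= 2
        if remaining >= 1:               # whole-number part is the next bit
            digits.append("1")
            remaining -= 1               # carry the remainder forward
        else:
            digits.append("0")
    return "0." + ("".join(digits) or "0")

print(fraction_to_binary("0.03125"))     # 0.00001
```

Each pass through the loop is one line of the working shown above.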

So far we have : 1111 1010.00001 (250.03125)

But we need it in the format .1111 1010 0000 1 (the point to the left of the first 1). To get there the point has to move 8 places to the left, so the exponent is 8.

So back to our example

Sign Bit = 0 (the number is positive)

Mantissa = .1111 1010 0000 1 (0.9766845703125 in decimal)

Exponent = 0000 1000 (8)

As a check: 0.9766845703125 x 2^8 = 250.03125
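
To check this kind of working, here is a minimal sketch that moves the point until the number starts with .1 and counts the moves as the exponent. The name normalise and the 32-digit limit are assumptions for illustration, and it only handles positive numbers.

```python
from fractions import Fraction

def normalise(number: str):
    """Return (mantissa_digits, exponent) so that
    number = 0.mantissa_digits (in binary) * 2**exponent.
    Sketch for positive numbers only."""
    value = Fraction(number)
    if value <= 0:
        raise ValueError("this sketch only handles positive numbers")
    exponent = 0
    while value >= 1:                 # point moves left: exponent goes up
        value /= 2
        exponent += 1
    while value < Fraction(1, 2):     # point moves right: exponent goes down
        value *= 2
        exponent -= 1
    digits = []
    while value != 0 and len(digits) < 32:   # 32 is an arbitrary limit
        value *= 2
        if value >= 1:
            digits.append("1")
            value -= 1
        else:
            digits.append("0")
    return "".join(digits), exponent

print(normalise("250.03125"))         # ('1111101000001', 8)
```

The mantissa digits would then be padded with zeros on the right to fill however many bits the mantissa has been given.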

What about small numbers?

If we are trying to convert the number: 0.125

In binary this would be 0.001

The leading bit after the . has to be a 1, so this time the point has to move to the right (2 places, giving .1), which means the exponent is negative.

So as the exponent is -2 this would be stored using two's complement notation: 1111 1110
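
As a reminder of how a negative exponent such as -2 looks in two's complement, here is a small sketch. The function name exponent_bits and the default width of 8 bits are assumptions chosen to match the examples on this page.

```python
def exponent_bits(exponent: int, width: int = 8) -> str:
    """Two's complement bit pattern of an exponent in `width` bits."""
    lowest = -(2 ** (width - 1))
    highest = 2 ** (width - 1) - 1
    if not lowest <= exponent <= highest:
        raise ValueError(f"{exponent} does not fit in {width} bits")
    # Taking the remainder mod 2**width wraps negative values round,
    # which is exactly the two's complement pattern.
    return format(exponent % (2 ** width), f"0{width}b")

print(exponent_bits(-2))   # 11111110
print(exponent_bits(8))    # 00001000
```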

Tutorial Video on creating the Decimal Portion

Although it is unlikely to be asked in the exam, I have put together a short video on how to create the decimal portion of a floating point number, such as the 0.5 in 12.5.

Creating the decimal portion.MP4

Floating Point Worked Examples

Worked Example 1

We are using

8 bits for the exponent
16 bits for the mantissa
(1 of which is the sign bit)

102.9375 = 1100110.1111

Sign = 0 (+ve)

Number = 1100110.1111 -> Needs to be .11001101111

Exponent = 7 = 0000 0111

Number = 0 110011011110000 00000111

Worked Example 2

We are using

8 bits for the exponent
16 bits for the mantissa
(1 of which is the sign bit)

250.75 = 11111010.11

Sign = 0 (+ve)

Number = 11111010.11 -> Needs to be .1111101011

Exponent = 8 = 00001000

Number = 0 111110101100000 00001000

Worked Example 3 - negative exponent

We are using

8 bits for the exponent
16 bits for the mantissa
(1 of which is the sign bit)

0.0009765625 = 0.0000000001

Sign = 0 (+ve)

Number = 0.0000000001 -> Needs to be .1

Exponent = -9 = 1111 0111

Number = 0 100000000000000 11110111
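
One way to check a finished answer is to decode it again. The sketch below assumes the layout used in these worked examples (1 sign bit, 15 mantissa bits, 8 exponent bits in two's complement); the name decode is hypothetical, and treating a sign bit of 1 as "negate the value" is my assumption, since no negative number is worked through here.

```python
from fractions import Fraction

def decode(bits: str) -> Fraction:
    """Decode a 24-bit pattern: 1 sign bit, 15 mantissa bits, 8 exponent bits."""
    bits = bits.replace(" ", "")
    if len(bits) != 24:
        raise ValueError("expected 24 bits")
    sign, mantissa, exponent = bits[0], bits[1:16], bits[16:]
    mantissa_value = Fraction(int(mantissa, 2), 2 ** 15)   # .bbbb... as a fraction
    exponent_value = int(exponent, 2)
    if exponent[0] == "1":                # two's complement: top bit set means negative
        exponent_value -= 2 ** 8
    value = mantissa_value * Fraction(2) ** exponent_value
    # Assumption: a sign bit of 1 simply negates the value.
    return -value if sign == "1" else value

print(float(decode("0 110011011110000 00000111")))   # 102.9375
print(float(decode("0 111110101100000 00001000")))   # 250.75
print(float(decode("0 100000000000000 11110111")))   # 0.0009765625
```

The three patterns are built from the sign, mantissa and exponent found in each worked example, so each one should decode back to the number we started with.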

Further Practice

Convert the following decimal numbers into single precision floating point numbers:

123.046875
124.28125
100.34375

Allocating more bits to Mantissa

As can be seen from the example earlier, the more bits we allocate to the mantissa, the more accurately we can represent decimal fractions.
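
To see the effect, the sketch below stores 250.03125 with mantissas of different lengths. It is an illustration only: the name store_truncated is hypothetical, it handles positive numbers only, and it assumes extra mantissa bits are simply cut off (a real system might round instead).

```python
from fractions import Fraction

def store_truncated(number: str, mantissa_bits: int) -> Fraction:
    """Value actually kept when a positive number is normalised and
    its mantissa is cut down to `mantissa_bits` bits."""
    value = Fraction(number)
    if value <= 0:
        raise ValueError("this sketch only handles positive numbers")
    exponent = 0
    while value >= 1:                  # normalise: point moves left
        value /= 2
        exponent += 1
    while value < Fraction(1, 2):      # normalise: point moves right
        value *= 2
        exponent -= 1
    kept = int(value * 2 ** mantissa_bits)   # assumption: extra bits are dropped
    return Fraction(kept, 2 ** mantissa_bits) * Fraction(2) ** exponent

for bits in (8, 10, 13):
    print(bits, float(store_truncated("250.03125", bits)))
# 8 250.0
# 10 250.0
# 13 250.03125
```

With fewer than 13 mantissa bits the .03125 is lost, because the final 1 of .1111101000001 does not fit.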

Allocating more bits to Exponent

Whereas the more bits we allocate to the exponent, the more places we can move the point, which means we can represent a larger range of numbers.
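
The effect on range can be sketched as follows, assuming a two's complement exponent and a mantissa that always starts with .1 as in the examples above. The function names are my own and the figures are for positive numbers only.

```python
from fractions import Fraction

def exponent_range(exponent_bits: int):
    """Smallest and largest exponent a two's complement field can hold."""
    return -(2 ** (exponent_bits - 1)), 2 ** (exponent_bits - 1) - 1

def magnitude_range(mantissa_bits: int, exponent_bits: int):
    """Smallest and largest positive values with a mantissa starting .1"""
    lowest, highest = exponent_range(exponent_bits)
    smallest = Fraction(1, 2) * Fraction(2) ** lowest            # .1000... * 2**lowest
    largest_mantissa = Fraction(2 ** mantissa_bits - 1, 2 ** mantissa_bits)  # .1111...
    largest = largest_mantissa * Fraction(2) ** highest
    return smallest, largest

for bits in (4, 8):
    print(bits, exponent_range(bits))
# 4 (-8, 7)
# 8 (-128, 127)

smallest, largest = magnitude_range(15, 4)
print(float(smallest), float(largest))
```

Going from a 4-bit to an 8-bit exponent lets the point move up to 127 places instead of 7, which is where the much larger range comes from.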

Mnemonic: MARE (Mantissa Accuracy Range Exponent)
