13.3 Floating-Point Numbers, Representation and Manipulation

13.3 Floating Point Representation Files

Revision

This section is intended for those who have already read the PowerPoints, watched the videos, etc. If the videos are unhelpful, do look for others.

To convert from unnormalised/normalised floating point to denary:

    • The implied binary point always follows the most significant bit (one in from the left).

    • Look at the exponent first.

      • If it is negative (starts with a 1), find its magnitude using two's complement (flip the bits and add 1).

      • If the exponent is negative, the radix point moves that many places left; otherwise it moves right.

    • If the mantissa is negative (starts with a 1), convert it using two's complement first, and remember that the final answer is negative.

    • Now move the radix point by the value calculated from the exponent. If moving left, pad with 0s on the left before moving the point.

    • Now calculate the answer using fixed-point binary
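The conversion above can be checked with a short Python sketch. It is not the manual method: instead of moving the radix point, it does the equivalent arithmetic (mantissa value multiplied by 2 to the power of the exponent). The function names are made up for illustration, and it assumes both the mantissa and the exponent are two's complement bit strings with the binary point after the first mantissa bit.

```python
def twos_complement_to_int(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # a leading 1 means the value is negative
        value -= 1 << len(bits)
    return value


def float_to_denary(mantissa: str, exponent: str) -> float:
    """Decode a two's complement mantissa and exponent to denary.

    Assumes the binary point sits after the first mantissa bit.
    """
    exp = twos_complement_to_int(exponent)
    # Read the mantissa as an integer, then scale it so the point sits
    # one place in from the left.
    frac = twos_complement_to_int(mantissa) / (1 << (len(mantissa) - 1))
    return frac * 2.0 ** exp


print(float_to_denary("01100000", "0011"))  # 0.11 with exponent +3 -> 6.0
print(float_to_denary("10100000", "1110"))  # 1.01 with exponent -2 -> -0.1875
```

Use it to check answers you have worked out by hand, not as a replacement for the steps.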

To convert from denary to normalised floating point:

    1. Calculate the absolute (unsigned) value of the number using fixed-point binary. However, note whether the number is negative.

    2. Move the radix point so that it immediately precedes the first binary digit 1. If moving right, the exponent will be negative; otherwise it will be positive. E.g. 0.0000110 becomes .11 with an exponent of -4.

    3. Calculate the exponent from the number of places moved above. If negative, convert using two's complement.

    4. If you have not already done so, add a 0 (the sign bit) before the radix point. Remove any other leading zeros. The mantissa MUST start 0.1

    5. If the number is negative, convert the mantissa using two's complement. It should now start 1.0
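As a way of checking answers to the steps above, here is a minimal Python sketch. It takes an arithmetic shortcut instead of following the steps literally: it halves or doubles the value until the mantissa falls in the normalised range, counting the shifts as the exponent. The function names and the default sizes (8-bit mantissa, 4-bit exponent) are assumptions for illustration. Bits that do not fit in the mantissa are simply dropped.

```python
import math


def to_twos_complement(value: int, bits: int) -> str:
    """Return value as a two's complement bit string of the given width."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def denary_to_float(x: float, m_bits: int = 8, e_bits: int = 4) -> tuple[str, str]:
    """Encode x as a normalised (mantissa, exponent) pair of bit strings."""
    if x == 0:
        raise ValueError("zero has no normalised form")
    exp = 0
    # A positive mantissa must lie in [0.5, 1) so that it starts 0.1;
    # a negative mantissa must lie in [-1, -0.5) so that it starts 1.0.
    while x >= 1 or x < -1:
        x /= 2                      # point moves left, exponent goes up
        exp += 1
    while 0 < x < 0.5 or -0.5 <= x < 0:
        x *= 2                      # point moves right, exponent goes down
        exp -= 1
    if not -(1 << (e_bits - 1)) <= exp < (1 << (e_bits - 1)):
        raise ValueError("exponent does not fit in e_bits")
    # Scale to an integer; floor drops any bits that do not fit.
    mantissa = math.floor(x * (1 << (m_bits - 1)))
    return to_twos_complement(mantissa, m_bits), to_twos_complement(exp, e_bits)


print(denary_to_float(0.046875))  # 0.0000110 -> ('01100000', '1100')
print(denary_to_float(-6))        # ('10100000', '0011')
```

Because it normalises negative values directly, this sketch always returns a mantissa starting 01 or 10.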

Two's Complement

There is a useful PowerPoint file listed below, along with the lesson slides, giving many different example calculations.



Fixed Point Binary & Floating Point Binary

Before you can grasp the concept of floating point numbers, it's important to understand how to convert a fractional number (i.e. a value with an integer and/or fractional part) into fixed-point binary. The PowerPoint below, of course, goes into this detail, but here are some useful videos to help. The videos in the Specific Topic Videos section are instructional videos very relevant to the course, while the videos in the General Videos section will help give a different perspective or more detailed information.
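To illustrate fixed-point binary, here is a small Python sketch that adds up the place values on either side of the binary point. The function name is made up for illustration, and it assumes an unsigned value written as a string such as 110.011.

```python
def fixed_point_to_denary(bits: str) -> float:
    """Convert an unsigned fixed-point binary string to denary."""
    whole, _, frac = bits.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    # Places after the point are worth 1/2, 1/4, 1/8 and so on.
    for place, bit in enumerate(frac, start=1):
        if bit == "1":
            value += 2.0 ** -place
    return value


print(fixed_point_to_denary("110.011"))    # 4 + 2 + 1/4 + 1/8 = 6.375
print(fixed_point_to_denary("0.0000110"))  # 1/32 + 1/64 = 0.046875
```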


Normalisation



In normalised scientific notation, there is just one digit before the decimal point (e.g. 3.5664 x10^3) and this number must be a non-zero digit. The exponent ensures that the correct magnitude of the number is maintained.

When representing real numbers in binary floating-point format, normalisation means that the first bit after the sign bit of the mantissa should be significant. This is important so that as few significant bits as possible are lost, because only a limited number of bits can be stored.

The video below shows you how to normalise numbers, but why must the first two digits be different? If you follow scientific notation correctly, a number such as 34.6 would become 3.46 x 10^1; equally, 0.002 would become 2 x 10^-3. In binary, the first significant digit is always a 1. Therefore, +ve numbers would start with a 1 and, because we add a 0 to represent the sign, we get 01. Equally, if the number was negative, after applying two's complement, the binary value would start 10.

How do you represent 0 or -1 in normalised two's complement? It's complicated and, in reality, processors generally use an IEEE standard, which has special exceptions. Equally, numbers such as 0 and -1 shouldn't be stored as a floating point number anyway, but would be denormalised. For the exam, you will need to know that any mantissa starting 00 or 11 is NOT normalised.
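The exam rule of thumb (a mantissa starting 00 or 11 is not normalised) comes down to checking whether the first two bits differ. A minimal Python sketch, with a made-up function name:

```python
def is_normalised(mantissa: str) -> bool:
    """Rule of thumb: the first two bits of the mantissa must differ."""
    return len(mantissa) >= 2 and mantissa[0] != mantissa[1]


for m in ["01100000", "10100000", "00110000", "11100000"]:
    print(m, is_normalised(m))
# 01100000 True
# 10100000 True
# 00110000 False
# 11100000 False
```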

Normalisation is also critical because we need to preserve the most significant digits. By normalising, we make full use of the bits available to us and, potentially, sacrifice only the less important digits. Also, normalised values are easier to calculate with. E.g. (4.4 * 10^7) * (3.2 * 10^-3) can easily be calculated as (4.4 * 3.2) * 10^(7 + -3) = 14.08 * 10^4: the mantissas are multiplied and the exponents are added.

Exceptions

There are a couple of tricky values that can cause difficulty with normalisation, for example -0.5 and -0.25. As with the largest negative value, the tricky part is that, when normalised, you end up with a single 1 digit. We state that, as a rule of thumb, a normalised value starts either 01 or 10. Values such as -0.5, -0.25 or even -1 are the exceptions: following the usual steps leaves a mantissa starting 11, while the fully normalised mantissa is 1.000000 in every case (the exponent, of course, accounting for the magnitude).

-0.25 in normalised two's complement (10-bit mantissa and 6-bit exponent) is officially

1.000000000 with an exponent of 111110 (i.e. -1 x 2^-2)

However, when you do this yourself, you might end up with

1.100000000 and an exponent of 111111 (i.e. -0.5 x 2^-1), which, annoyingly, breaks our rule of thumb yet still gives the correct value.

Another example is -0.5, which (using an 8-bit mantissa and a 4-bit exponent) gives 11000000 and an exponent of 0000. However, we should still normalise this, so we shift the mantissa one place left to get 10000000 and, because we shifted the value, we set the exponent to 1111 (-1).
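The shift described for -0.5 and -0.25 can be sketched in Python as follows. The function name is made up, and the exponent is kept as an ordinary integer for simplicity (so -1 corresponds to 1111 in a 4-bit exponent). Each left shift doubles the mantissa, so the exponent drops by one to keep the overall value the same.

```python
def normalise(mantissa: str, exponent: int) -> tuple[str, int]:
    """Shift the mantissa left until its first two bits differ."""
    if "1" not in mantissa:
        raise ValueError("zero cannot be normalised")
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"   # shift left by one place
        exponent -= 1                   # compensate for the doubling
    return mantissa, exponent


print(normalise("11000000", 0))      # -0.5  -> ('10000000', -1)
print(normalise("1100000000", -1))   # -0.25 -> ('1000000000', -2)
print(normalise("00110000", 0))      # 0.375 -> ('01100000', -1)
```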

The exam board does not want to trip you up, so it's very unlikely you'll be asked these values.

Useful Links

Wikibooks: Floats WITH PRACTICE QUESTIONS

Floating-point theory - Teach ICT

Two's Complement Youtube playlist

Tutorial on data representation - Nanyang Technological University: https://www3.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html

Normalised binary calculator

General Videos

Why we use two's complement



Floating-point numbers

A general video covering floating point numbers, including types of error


Specific Topic Videos

Converting to two's complement

Converting a negative decimal number to two's complement format. Remember, a positive number is just the value itself. A number stored in two's complement using n bits (even a positive number, which needs no conversion) can only hold values from -2^(n-1) up to 2^(n-1) - 1. E.g. using 5 bits you can go from -16 to +15.
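The range rule and the flip-and-add-1 method can be sketched in Python like this. The function names are made up for illustration.

```python
def twos_complement_range(bits: int) -> tuple[int, int]:
    """Smallest and largest values that fit in the given number of bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def to_twos_complement(value: int, bits: int) -> str:
    """Convert a denary integer to a two's complement bit string."""
    low, high = twos_complement_range(bits)
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    if value >= 0:
        return format(value, f"0{bits}b")   # positive: just the value itself
    positive = format(-value, f"0{bits}b")  # write out the magnitude
    flipped = "".join("1" if b == "0" else "0" for b in positive)
    return format(int(flipped, 2) + 1, f"0{bits}b")  # flip, then add 1


print(twos_complement_range(5))    # (-16, 15)
print(to_twos_complement(-5, 5))   # 11011
print(to_twos_complement(-16, 5))  # 10000
```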


Converting two's complement float to decimal


Normalising numbers