Floating Point Representation



Floating Point Representation of Real numbers

In denary the integer 25000 can be written as

2.5 x 10⁴

This number is in floating point format :

Mantissa x 10^Exponent

In binary, though, we express the exponent as a power of 2, so every binary floating point number is of the form :

Mantissa x 2^Exponent

The mantissa would be a fixed point fraction and the exponent would be an integer.

There can be many floating point representations of the same number...

For example, the integer 40 can be written as
= 20 x 2¹
= 10 x 2²
= 5 x 2³
= 2.5 x 2⁴
= 1.25 x 2⁵
= 0.625 x 2⁶

Which one do we use? We use the last one, where the mantissa lies between 0.5 and 1 (0.5 ≤ mantissa < 1 for a positive number). This is said to be in normalised form. It gives the greatest precision, because no mantissa bits are wasted on leading zeros.
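The list of representations above can be checked with a short Python sketch. It relies on the standard library function math.frexp, which happens to return a mantissa in the same range (0.5 up to, but not including, 1) used for normalised form here. The helper name normalise is only an illustrative choice.

```python
import math

def normalise(value):
    """Split value into (mantissa, exponent) with value == mantissa * 2**exponent.

    For a positive value, math.frexp returns 0.5 <= mantissa < 1,
    which matches the normalised form described in the text.
    """
    mantissa, exponent = math.frexp(value)
    return mantissa, exponent

# Every pair from the list represents the same number, 40.
candidates = [(20, 1), (10, 2), (5, 3), (2.5, 4), (1.25, 5), (0.625, 6)]
for m, e in candidates:
    assert m * 2 ** e == 40

print(normalise(40))  # (0.625, 6)
```

Only the last pair in the list, 0.625 x 2⁶, is the one that frexp returns.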

Example

A floating point number system uses 16-bit numbers: 8 bits for the (signed) mantissa and 8 bits for the (signed) exponent.

Convert the following binary number to denary.

01010001 00000101

Step 1 : The Exponent 00000101

This is a positive integer :

Sign	64	32	16	8	4	2	1
0	0	0	0	0	1	0	1

The exponent value is 4 + 1 = 5

Step 2 : The Mantissa 0.1010001

This is a positive fraction (binary point after the sign bit):

The exponent is 5, so perform an arithmetic left shift 5 times (this moves the binary point 5 places to the right):

0.1010001
01.010001 (once)
010.10001 (twice)
0101.0001 (3 times)
01010.001 (4 times)
010100.01 (5 times)

Sign	16	8	4	2	1	.	0.5	0.25
0	1	0	1	0	0	.	0	1

The final answer : 16 + 4 + 0.25 = 20.25
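The two steps above can be sketched in Python as a check on the arithmetic. This is a minimal sketch that assumes "signed" means two's complement for both mantissa and exponent, and that the binary point sits immediately after the mantissa's sign bit. The function names twos_complement and decode are illustrative only.

```python
def twos_complement(bits):
    """Interpret a string of '0'/'1' characters as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        # A leading 1 means the number is negative (assumed two's complement).
        value -= 1 << len(bits)
    return value

def decode(mantissa_bits, exponent_bits):
    """Return the denary value of mantissa x 2^exponent."""
    exponent = twos_complement(exponent_bits)
    # The binary point follows the sign bit, so the remaining
    # len - 1 bits are all fraction bits.
    mantissa = twos_complement(mantissa_bits) / (1 << (len(mantissa_bits) - 1))
    return mantissa * 2 ** exponent

print(decode("01010001", "00000101"))  # 20.25
```

Here the mantissa 0.1010001 is 0.6328125 and the exponent is 5, so the result is 0.6328125 x 32 = 20.25, matching the place values 16 + 4 + 0.25 in the table.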

Converting a real number into floating point form.

(This example uses a 12-bit signed mantissa and a 4-bit signed exponent - make sure you read any exam questions carefully for the storage setup)

Example : Convert 14.625 into floating point form.

Step 1 : Convert the integer and fraction parts of the number into binary:

Sign	...	32	16	8	4	2	1
0	...	0	0	1	1	1	0

14 = 01110

Sign . 0.5 0.25 0.125 0.0625 0.03125 0.015625 ...

0 . 1 0 1 0 0 0 ...

.625 = 0.101

So 14.625 = 01110.1010000 (adding 0s to make it up to 12 bits)

Step 2 : Perform a number of arithmetic right shifts (divide mantissa by 2) until the binary point is in the correct position (after the sign bit). For each shift, add 1 to the exponent...

(Think of this as moving the binary point to the left a number of places)

Mantissa	Exponent
01110.1010000	0
0111.01010000	1
011.101010000	2
01.1101010000	3
0.11101010000	4

Step 3 : Convert the exponent into a binary integer.

Exponent = 4 = 0100

So final answer :

14.625 = 011101010000 0100
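The conversion can also be sketched in Python. This sketch makes several assumptions: it handles positive values only, it uses math.frexp to do the normalising shifts in one step, it truncates any mantissa bits that do not fit, and it stores the exponent in two's complement. The function name encode and its parameters are illustrative only.

```python
import math

def encode(value, mantissa_bits=12, exponent_bits=4):
    """Return (mantissa, exponent) bit strings for a positive value."""
    if value <= 0:
        raise ValueError("this sketch only handles positive values")
    # frexp gives 0.5 <= mantissa < 1, i.e. the normalised form.
    mantissa, exponent = math.frexp(value)
    lowest = -(1 << (exponent_bits - 1))
    highest = (1 << (exponent_bits - 1)) - 1
    if not lowest <= exponent <= highest:
        raise OverflowError("exponent does not fit in the exponent field")
    # One bit is the sign, the rest are fraction bits; extra bits are truncated.
    scaled = int(mantissa * (1 << (mantissa_bits - 1)))
    mantissa_field = format(scaled, f"0{mantissa_bits}b")
    # Masking gives the two's complement pattern for negative exponents.
    exponent_field = format(exponent & ((1 << exponent_bits) - 1), f"0{exponent_bits}b")
    return mantissa_field, exponent_field

print(encode(14.625))  # ('011101010000', '0100')
```

Changing mantissa_bits and exponent_bits adapts the sketch to a different storage setup, such as the 8-bit mantissa and 8-bit exponent of the earlier example.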

Advantages and Disadvantages

The advantage of using floating point form is that a much greater range of numbers (both very large and very small) can be represented than with fixed point form.

The disadvantages :

more storage space needed
slower processing times
lack of precision - some real numbers can only be represented approximately
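The lack of precision can be illustrated with a small Python sketch. It assumes an 8-bit signed mantissa (7 fraction bits), truncates any bits that do not fit, and works with positive values only. The function name round_trip is illustrative only.

```python
import math

def round_trip(value, mantissa_bits=8):
    """Return the value actually stored when the mantissa is cut to mantissa_bits."""
    mantissa, exponent = math.frexp(value)  # normalised: 0.5 <= mantissa < 1
    fraction_bits = mantissa_bits - 1       # one bit is used for the sign
    # Keep only the fraction bits that fit, discarding the rest.
    stored = int(mantissa * (1 << fraction_bits)) / (1 << fraction_bits)
    return stored * 2 ** exponent

print(round_trip(20.25))  # 20.25 (fits exactly)
print(round_trip(0.1))    # 0.099609375 (only an approximation)
```

20.25 survives unchanged because its mantissa needs only 7 fraction bits. 0.1 has no finite binary expansion, so it can only be stored approximately however many mantissa bits are used.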