Floating Point Numbers - Yr 2 Only

From TRCCompSci - AQA Computer Science

Floating point is a way of representing numbers in binary in which the position of the binary point can move. By changing how a fixed number of bits is split between its two parts, the range and precision of the representation can be adjusted.

A floating point number consists of 2 parts: a mantissa, which holds the significant binary digits of the number, and an exponent, which says how far to shift the binary point. The mantissa determines the precision of the number (i.e. how close to the exact value it is). The exponent determines the range of the number.

Overview

https://www.youtube.com/watch?v=dcIDAnfp8Dc&list=PLCiOXwirraUDGCeSoEPSN-e2o9exXdOka&index=4

Range & Precision

https://www.youtube.com/watch?v=qVo2kHCtX2M&list=PLCiOXwirraUDGCeSoEPSN-e2o9exXdOka&index=7

A floating point number has 2 key parts, the mantissa and the exponent. For a fixed total number of bits, the sizes chosen for the exponent and mantissa are always a trade-off between range and precision.

Mantissa

Controls the precision of the number: the more bits used for the mantissa, the more precise the value represented.

Exponent

Controls the range of values that can be represented: the more bits used for the exponent, the wider the range.
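
To illustrate the trade-off, the sketch below compares two ways of splitting 16 bits between the mantissa and the exponent. It assumes both parts are stored in two's complement, as in the examples on this page; the function names are made up for illustration.

```python
def largest_positive(mantissa_bits: int, exponent_bits: int) -> float:
    """Largest value: mantissa 0.111...1 with the largest positive exponent."""
    mantissa = 1 - 2.0 ** -(mantissa_bits - 1)
    exponent = 2 ** (exponent_bits - 1) - 1
    return mantissa * 2.0 ** exponent


def mantissa_step(mantissa_bits: int) -> float:
    """Place value of the last mantissa bit (smaller means more precise)."""
    return 2.0 ** -(mantissa_bits - 1)


# Same 16 bits, split two different ways
for m_bits, e_bits in [(10, 6), (12, 4)]:
    print(m_bits, e_bits, largest_positive(m_bits, e_bits), mantissa_step(m_bits))
```

Moving two bits from the exponent to the mantissa makes the mantissa step four times smaller, but the largest representable value drops from about 2.1 billion to just under 128.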

Normalisation

For a floating point number to be normalised, and so make the best use of the available bits, its mantissa must begin with "0.1" for a positive number and "1.0" for a negative number. Any deviation from this wastes bits, as the same number could be represented with a smaller mantissa.

For positive numbers, this means that all leading 0s after the sign bit should be removed, with the exponent reduced to compensate.

For negative numbers, this means that all leading 1s after the sign bit should be removed, with the exponent reduced to compensate.
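
The rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the mantissa is given as a two's complement bit string and the exponent as an ordinary integer; `is_normalised` and `normalise` are hypothetical helper names, and the range of the exponent is not checked.

```python
def is_normalised(mantissa: str) -> bool:
    """A normalised mantissa starts 01 (positive) or 10 (negative)."""
    return mantissa[:2] in ("01", "10")


def normalise(mantissa: str, exponent: int) -> tuple[str, int]:
    """Shift the mantissa left until its first two bits differ."""
    if "1" not in mantissa:
        return mantissa, exponent  # zero cannot be normalised
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"  # drop a redundant leading bit
        exponent -= 1                  # compensate in the exponent
    return mantissa, exponent


print(normalise("0001001000", 6))  # ('0100100000', 4), both represent 9
print(normalise("1110100000", 3))  # ('1010000000', 1), both represent -1.5
```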

https://www.youtube.com/watch?v=RcY0aiSsyqI&list=PLCiOXwirraUDGCeSoEPSN-e2o9exXdOka&index=8


Example

For example, the number 32 could be represented by a floating point number with an 8 bit mantissa and a 5 bit exponent.

32 in binary is:

00100000

The mantissa would be as follows:

0.1000000

The exponent must move the binary point 6 places to the right so that the leading 1 has a place value of 32. It must therefore have a value of 6:

00110
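
As a quick check of this example, the mantissa and exponent can be recombined in Python. Because both bit patterns here are positive, plain `int(..., 2)` is enough; a negative mantissa or exponent would need a two's complement conversion instead.

```python
mantissa_bits = "01000000"  # 0.1000000 with the point after the sign bit
exponent_bits = "00110"

# Dividing by 2^7 places the binary point after the first (sign) bit
mantissa = int(mantissa_bits, 2) / 2 ** (len(mantissa_bits) - 1)  # 0.5
exponent = int(exponent_bits, 2)                                  # 6

value = mantissa * 2 ** exponent
print(value)  # 32.0
```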

Converting from Binary to Denary

  1. Write down the mantissa, with the binary point inserted after the sign bit. (Leave off any trailing 0s.)
  2. If the mantissa is negative (sign bit = 1) then
    1. find the two's complement of the mantissa and record that the value is negative
  3. If the exponent is negative (sign bit = 1) then
    1. find the two's complement of the exponent and record that it is negative
  4. Calculate the value of the exponent in denary
  5. If the exponent is positive then
    1. move the point in the mantissa to the right by the number of places given by the exponent
  6. else {if the exponent is negative}
    1. move the point in the mantissa to the left by the number of places given by the exponent
  7. Convert the mantissa to denary to obtain the answer

Example 1

Convert the number 0100100100 000100 to denary

  1. Write down mantissa, including point, without trailing 0s = 0.1001001
  2. Do nothing (Mantissa not negative)
  3. Do nothing (Exponent not negative)
  4. Calculate exponent (000100) in denary = 4
  5. Adjust point in mantissa (move point 4 places right) = 1001.001
  6. Convert mantissa to denary = 9.125

Answer = 9.125

Example 2

Convert the number 1010000000 000101 to denary

  1. Write down mantissa, including point, without trailing 0s = 1.01
  2. Mantissa negative so find the twos complement = - 0.11
  3. Do nothing (Exponent not negative)
  4. Calculate exponent (000101) in denary = 5
  5. Adjust point in mantissa (move point 5 places right) = - 11000.
  6. Convert mantissa to denary = - 24

Answer = - 24

Example 3

Convert the number 1010000000 111101 to denary

  1. Write down mantissa, including point, without trailing 0s = 1.01
  2. Mantissa negative so find the twos complement = - 0.11
  3. Exponent negative so find the twos complement = - 000011
  4. Calculate exponent (- 000011) in denary = - 3
  5. Adjust point in mantissa (move point 3 places left) = - 0.00011
  6. Convert mantissa to denary = - 0.09375

Answer = - 0.09375
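
The three worked examples above can be checked with a short Python sketch. Rather than moving the point by hand, it treats each part as a two's complement integer and scales the mantissa so that the binary point sits after the sign bit. The function names are illustrative only.

```python
def twos_complement_value(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":             # sign bit set, so the value is negative
        value -= 1 << len(bits)
    return value


def float_to_denary(mantissa: str, exponent: str) -> float:
    """Convert a two's complement mantissa and exponent to denary."""
    # Place the binary point immediately after the mantissa's sign bit
    m = twos_complement_value(mantissa) / (1 << (len(mantissa) - 1))
    e = twos_complement_value(exponent)
    return m * 2.0 ** e


print(float_to_denary("0100100100", "000100"))  # 9.125
print(float_to_denary("1010000000", "000101"))  # -24.0
print(float_to_denary("1010000000", "111101"))  # -0.09375
```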

Converting from Denary to Binary

  1. Convert the size of the denary number (ignoring its sign) to an unsigned binary number (the mantissa)
  2. Normalise this (move the point so that it sits immediately in front of the leading 1)
  3. If the number is negative then
    1. represent it as its two's complement equivalent
  4. Count the number of places the point has been moved to give exponent
  5. If point moved left then
    1. exponent is positive
  6. else {if point moved right}
    1. exponent is negative
  7. Convert exponent to twos complement binary (6-bits in this case)
  8. Add 0’s to the mantissa if necessary (to give 10 bits in this case)

Example 1

Convert 123.5 to floating point form

  1. Convert number (123.5) to pure binary = 1111011.1
  2. Normalise mantissa = 0.11110111
  3. (Number not negative)
  4. The point has moved 7 places left, so exponent = 7
  5. Convert exponent to twos complement binary = 000111
  6. Add 0’s to the mantissa = 0.111101110

Answer = 0111101110 000111

Example 2

Convert 0.1875 to floating point form

  1. Convert number (0.1875) to pure binary = 0.0011
  2. Normalise mantissa = 0.11
  3. (Number not negative)
  4. The point has moved 2 places right, so exponent = - 2
  5. Convert exponent to twos complement binary = 111110
  6. Add 0’s to the mantissa = 0.110000000

Answer = 0110000000 111110

Example 3

Convert -0.375 to floating point form

  1. Convert number (-0.375) to pure binary = - 0.011
  2. Normalise mantissa = - 0.11
  3. Number negative so find twos complement = 1.01
  4. The point has moved 1 place right, so exponent = - 1
  5. Convert exponent to twos complement binary = 111111
  6. Add 0’s to the mantissa = 1.010000000

Answer = 1010000000 111111
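
The procedure and examples above can be sketched in Python as follows. This is a minimal illustration, assuming a 10-bit mantissa and 6-bit exponent by default. It normalises by repeated halving or doubling instead of counting places, rounds down any value that does not fit the mantissa exactly, and does not check that the exponent fits in its bits. `to_bits` and `denary_to_float` are hypothetical names.

```python
import math


def to_bits(value: int, width: int) -> str:
    """Two's complement bit string of a signed integer."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def denary_to_float(value: float, mantissa_bits: int = 10,
                    exponent_bits: int = 6) -> str:
    if value == 0:
        return "0" * mantissa_bits + " " + "0" * exponent_bits
    mantissa, exponent = value, 0
    # Point moves left (exponent goes up) while the mantissa is too large
    while mantissa >= 1 or mantissa < -1:
        mantissa /= 2
        exponent += 1
    # Point moves right (exponent goes down) until the mantissa is normalised
    while -0.5 <= mantissa < 0.5:
        mantissa *= 2
        exponent -= 1
    # Scale so the binary point sits after the sign bit, rounding down
    scaled = math.floor(mantissa * (1 << (mantissa_bits - 1)))
    return to_bits(scaled, mantissa_bits) + " " + to_bits(exponent, exponent_bits)


print(denary_to_float(123.5))   # 0111101110 000111
print(denary_to_float(0.1875))  # 0110000000 111110
print(denary_to_float(-0.375))  # 1010000000 111111
```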

Errors

Overflow & Underflow

https://www.youtube.com/watch?v=vH7pQxWTTio&list=PLCiOXwirraUDGCeSoEPSN-e2o9exXdOka&index=9

Underflow

Underflow occurs when a number is too close to zero to be represented with the number of bits allocated, so it is stored as zero.
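
As a rough illustration, the sketch below works out the smallest positive normalised value for the 10-bit mantissa and 6-bit exponent format used in the examples on this page, and tests whether a positive value would fall below it. The constant and function names are assumptions made for this example.

```python
EXPONENT_BITS = 6

# Most negative two's complement exponent: 100000 = -32
MIN_EXPONENT = -(2 ** (EXPONENT_BITS - 1))

# Smallest positive normalised mantissa is 0.1 in binary (0.5 in denary)
SMALLEST_POSITIVE = 0.5 * 2.0 ** MIN_EXPONENT


def underflows(value: float) -> bool:
    """True if a positive value is too close to zero to be stored."""
    return 0 < value < SMALLEST_POSITIVE


print(SMALLEST_POSITIVE)   # about 1.16e-10
print(underflows(1e-11))   # True
print(underflows(0.1875))  # False
```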

Overflow

Overflow occurs when a number is too large to be represented with the number of bits allocated. This is hard to show in floating point because the range of values is so large, but it is the same problem as in standard binary:

01111111 is 127 in two's complement binary (i.e. the leading 0 is in the -128 place value).
Adding 00000001 to this value gives 10000000.
This is an overflow, because 127 + 1 = 128, but 10000000 is -128.
128 cannot be represented in 8-bit two's complement.
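
The same overflow can be reproduced in Python by keeping only the lowest 8 bits of a sum and reading them back as two's complement. `add_8bit` is a made-up helper for this illustration.

```python
def add_8bit(a: int, b: int) -> int:
    """Add two numbers, keeping only 8 bits, and read the result as two's complement."""
    raw = (a + b) & 0xFF                      # discard anything beyond 8 bits
    return raw - 256 if raw >= 128 else raw   # top bit has place value -128


print(format((127 + 1) & 0xFF, "08b"))  # 10000000
print(add_8bit(127, 1))                 # -128
print(add_8bit(100, 27))                # 127
```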

Rounding

https://www.youtube.com/watch?v=e9QsbmwckbE&list=PLCiOXwirraUDGCeSoEPSN-e2o9exXdOka&index=6

Not all values can be represented in floating point because the mantissa has a limited size. Some values need more digits than the mantissa has available to be represented exactly, so they are stored as the nearest value that does fit.
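
To see why, the sketch below generates the binary digits of a fraction by repeated doubling, using Python's `fractions` module so that the arithmetic is exact. The function name is illustrative. The fractional part of 10.8 never terminates in binary, so any fixed-size mantissa has to cut it short.

```python
from fractions import Fraction


def binary_fraction_digits(value: Fraction, digits: int) -> str:
    """First `digits` binary places of a fraction between 0 and 1."""
    bits = []
    for _ in range(digits):
        value *= 2
        if value >= 1:
            bits.append("1")
            value -= 1
        else:
            bits.append("0")
    return "".join(bits)


print(binary_fraction_digits(Fraction(8, 10), 12))  # 110011001100 (repeats forever)
print(binary_fraction_digits(Fraction(3, 4), 4))    # 1100 (exact after 2 places)
```

Keeping only the first two fractional bits of 10.8 gives 1010.11, which is 10.75.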

Absolute Error

The absolute error is the difference between the number you wanted to represent and the number actually stored. So if you wanted to represent 10.8, but the stored value is 10.75, the absolute error is 0.05.

Relative Error

The relative error takes the absolute error and expresses it as a percentage of the original value. In the example above, where 10.8 is stored as 10.75, the absolute error is 0.05, so the relative error is (0.05/10.8) * 100, which is approximately 0.463%.
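
Both calculations can be written as small Python functions. This sketch uses the `decimal` module so that the denary values 10.8 and 10.75 are handled exactly; the function names are illustrative.

```python
from decimal import Decimal


def absolute_error(intended: Decimal, stored: Decimal) -> Decimal:
    """Size of the difference between the intended and stored values."""
    return abs(intended - stored)


def relative_error_percent(intended: Decimal, stored: Decimal) -> Decimal:
    """Absolute error as a percentage of the intended value."""
    return absolute_error(intended, stored) / abs(intended) * 100


intended, stored = Decimal("10.8"), Decimal("10.75")
print(absolute_error(intended, stored))                    # 0.05
print(round(relative_error_percent(intended, stored), 3))  # 0.463
```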

Revision Quiz

1. What are the two components of floating point?

Mantissa
This is the first part of the floating point
Fixed point
INCORRECT
Hexadecimal
INCORRECT
Exponent
This is the second part of the floating point

2. What is the purpose of the mantissa?

The mantissa determines the order of the number
INCORRECT
The mantissa determines the range of the number
INCORRECT This is the role of the exponent
The mantissa determines the precision of the number
The mantissa controls the precision of the number, the more bits used for the mantissa the more precise the value represented.
The mantissa does not have a purpose in floating point
INCORRECT

3. What is the purpose of the exponent?

The exponent determines the order of the number
INCORRECT
The exponent determines the range of the number
The exponent controls the range of values that can be represented, the more bits used for the exponent the wider the range.
The exponent determines the precision of the number
INCORRECT This is the role of the mantissa
The exponent does not have a purpose in floating point
INCORRECT

4. What is it called when a number is too small to be represented with the number of bits allocated?

Underflow
CORRECT
Overflow
INCORRECT
Relative Error
INCORRECT
Absolute Error
INCORRECT

5. What is it called when a number is so large it cannot be represented with the number of bits allocated?

6. What is it called when the absolute error is represented as a percentage of the original number?

7. What is the denary value of this floating point number (10 bit mantissa, 6 bit exponent, two's complement)? 0110010000 000011


8. What is the floating point value of this denary number (10 bit mantissa, 6 bit exponent, two's complement)? -0.875


9. What is the floating point value of this denary number (10 bit mantissa, 6 bit exponent, two's complement)? -100


10. What is the denary value of this floating point number (10 bit mantissa, 6 bit exponent, two's complement)? 0100100000 000011


11. What is the denary value of this floating point number (10 bit mantissa, 6 bit exponent, two's complement)? 0100111000 00011


12. What is the floating point value of this denary number (10 bit mantissa, 6 bit exponent, two's complement)? 99

