The Pentium division Flaw - Chapter 1
Here we briefly review some simple concepts you probably already know.
(100) | ones | - | - | + | | + | - | - | tenths | (10-1) |
(101) | tens | - | + | | | | | | + | - | hundredths | (10-2) |
(102) | hundreds | + | | | | | | | | | | + | thousandths | (10-3) |
| | | | | | | | | | | | | | |
| 4 | 1 | 9 | . | 5 | 8 | 3 | |
| ^ | -decimal point |
(20) | ones | - | - | - | + | | + | - | - | - | halves | (1/2) |
(21) | twos | - | - | + | | | | | | + | - | - | fourths | (1/4) |
(22) | fours | - | + | | | | | | | | | | + | - | eighths | (1/8) |
(24) | eights | + | | | | | | | | | | | | | | + | sixteenths | (1/16) |
| | | | |
| | | | | | | | | | | | | |
| 1 | 0 | 1 | 1 | . | 1 | 0 | 1 | 1 | |
| ^ | -radix point |
Given an ordinary decimal number such as |
|
14.195835 |
|
we can move the decimal point to the right or left by multiplying
or dividing by powers of ten. For example, to move the decimal
point right two places, we multiply by 102 = 100 |
|
14.195835 * 100 = 1419.5835 |
Given an ordinary binary number such as |
|
1011.1011 |
|
we can move the radix point to the right or left by multiplying
or dividing by powers of two. For example, to move the decimal
point right two places, we multiply by 22 = 4 |
|
1011.1011 * 4 = 101110.11 |
Given an ordinary decimal number such as |
|
14.195835 |
|
we can move the decimal point so it lies directly after the first
digit by multiplying or dividing by an appropriate power of 10. In
our example: |
14.195835 | | = | | 1.4195835 | | * | | 101 |
| mantissa | | exponent |
In this representation 1.4195835 is called the mantissa, and 101
is called the exponent. |
Given an ordinary binary number such as |
|
1011.1011 |
|
we can move the radix point so it lies directly after the first
digit by multiplying or dividing by an appropriate power of 2. In
our example: |
1011.1011 | | = | | 1.0111011 | | * | | 23 |
| mantissa | | exponent |
In this representation 1.0111011 is called the mantissa, and 23
is called the exponent. |
dividend numerator 12
quotient = -------- = ----------- Example: 3 = --
divisor denominator 4
quotient 3
+--------- Example: +---
divisor | dividend 4 | 12
Given any two decimal numbers that we wish to divide, such as:
14.195835
----------
119.716320
we can normalize the values using exponential notation to obtain:
1.4195835 * 101 1.4195835 101 1.4195835
----------------- = ---------- * ---- = ---------- * 10(1-2)
1.19716320 * 102 1.19716320 102 1.19716320
Thus the problem has been reduced to the division of two
normalized numbers followed by a shifting of the decimal point.
Given any two binary numbers that we wish to divide, such as:
1011.1011
---------
11.001000
we can normalize the values using exponential notation to obtain:
1.0111011 * 2^3 1.0111011 2^3 1.0111011
--------------- = --------- * --- = --------- * 2(3-1)
1.1001000 * 2^1 1.1001000 2^1 1.1001000
Thus the problem has been reduced to the division of two
normalized numbers followed by a shifting of the radix point. |
|
The important point to note is that any division operation can be
reduced to dividing normalized mantissas. The exponents and signs
can be handled separately. |
The Pentium uses IEEE standard 594 for representing floating point
numbers. In this standard a single precision floating point variable
is represented using a 24 bit mantissa and an exponent which can take
on values between +127 and -126.
The Pentium also uses two's compliment notation to represent negative
numbers. A positive number is negated by complementing all the bits
(one's complement, i.e. "flip the bits") and then adding 1 to the
result.
Carry-Save addition is a method of quickly reducing the sum of three
variables A, B, and C, down to the sum of two variables PARTIAL_SUM and
CARRY, using only bitwise logical operations (AND, OR, Exclusive-OR).
With Carry-Save addition we operate on all of the columns at once,
placing the partial sum of that column at the bottom, and saving any
overflow as a carry digit at the top. The bottom is our PARTIAL_SUM,
and the carry digits at the top become our CARRY. The true sum of
(A + B + C) is thus reduced to the sum (PARTIAL_SUM + CARRY).
EXAMPLE #1:
CARRY= 0010.1100010
------------
A=101.11010011
B=001.01100110 CARRY= 0010.1100010
C=000.01100000 PARTIAL_SUM= 100.11010101
------------ -------------
PARTIAL_SUM= 100.11010101 TOTAL= 0111.10011001
Note that:
PARTIAL_SUM = (A ^ B ^ C)
CARRY = (A & B) | (A & C) | (B & C)
^ represents Exclusive-OR
& represents AND
| represents OR
Thus both PARTIAL_SUM and CARRY are calculated quickly using nothing
but bitwise logical operations.
The one drawback to Carry-Save addition is if we want to know the true
sum of (A + B + C) we still have to add (PARTIAL_SUM + CARRY), and this
can take time. The classic method of adding two numbers is to start
with the right most column and add it up, placing the partial sum for
that column below and any overflow as carry digits above the next
column to the left. Then move left one column and repeat. Continue in
this manner, adding one column at a time, until all the columns have
been added and the calculation is complete. This method of addition is
slow since each number may have 64 bits, and we must loop through all
of the columns, working on only one column at a time, propagating the
overflow on to the next column as carry bits.
However, it may be that we don't need an exact answer but that an
approximate answer will do fine. We can get an approximate answer by
adding just the first few columns and ignoring the rest. For Example,
we can calculate an approximate answer to example #1 above
by considering just the first 7 columns:
CARRY= 0010.110
PARTIAL_SUM= 100.110
-------------
approx total= 0111.100
Here our total is approximate, but it is very close to the actual
answer. In this case the total dropped down from the true answer to
the first integral multiple of 0.001 below the true answer.
Consider now example #2 below which shows a slightly different
PARTIAL_SUM and CARRY. Notice the actual sum is identical to the sum
we had before, but the approximate total obtained by adding the first
seven bits only is lower this time:
EXAMPLE #2:
CARRY= 0010.1011101 CARRY= 0010.101
PARTIAL_SUM= 100.11011111 PARTIAL_SUM= 100.110
------------- -------------
actual total= 0111.10011001 approx total= 0111.011
This time our approximate total dropped down to the second integral
multiple of 0.001 rather than the first integral multiple as it did
before. The difference is in example #1 the sum of the truncated parts
is less than 0.001, whereas in example #2 the sum of the truncated
parts is greater than 0.001:
Example #1: Example #2:
|truncated |truncated
|part |part
| |
CARRY= . 0010 CARRY= . 1101
PARTIAL_SUM= . 10101 PARTIAL_SUM= . 01111
------------- -------------
TOTAL= .___11001 TOTAL= .__101001
Mathematically our approximate total is equal to the true total minus
the sum of the truncated parts:
approx total = total - (sum of truncated parts)
Consider now a worst case scenario in which all the truncated bits are
1's. Convince yourself that even in this case the sum of the truncated
parts will always be less than (2 x 0.001). Consider now the other
extreme scenario where all the truncated bits are 0's. Convince
yourself that in this trivial case the sum of the truncated parts is 0.
We have thus determined the bounds on our approximate total:
total >= approx total > total - (2 x 0.001)
Our approximate total will always be equal to or lower than the true
answer, and the error will always be less than (2 x 0.001). This is a
crucial result so make sure you understand it.