The Pentium division Flaw - Chapter 4
And now for an example where we reproduce the Pentium bug. In the
Microsoft windows calculator try 5506153/294911
Correct: 18.67055823621364
Pentium: 18.66990719233938
If you get the correct answer then either you have the newer Pentium
chip with the bug corrected, or you're using a calculator program which
is "Pentium aware". "Pentium aware" software looks for and catches any
incorrect division result caused by the bug and recalculates the result
using some other method so you still get the right answer.
To demonstrate the division here we must first convert our decimal
divisor and dividend to binary numbers. This can easily be done
using the Microsoft windows calculator with View set to Scientific.
Enter 5506153, then click on BIN to switch to binary mode. The result
is:
5506153 = 10101000000010001101001.0
= 1.01010000000100011010010 * 2^22
We set the exponent to 222 so the radix point falls right after the
leading 1. This gives us our normalized mantissa for the dividend.
The divisor, 294911, is similarly converted:
294911 = 1000111111111111111.00000
= 1.00011111111111111100000 * 2^18
Our division becomes:
1.01010000000100011010010 * 2^22
quotient = -----------------------------------
1.00011111111111111100000 * 2^18
1.01010000000100011010010
= ------------------------- * 2^(22-18)
1.00011111111111111100000
We will perform the division on the mantissa values, and then multiply
the result by 2(22-18) = 24 = 16 at the very end to get our
quotient answer.
To perform the division we will be calculating:
R[j+1] = 4 * (R[j] - qD)
which we will rewrite as:
R[j+1] = 4 * (R[j] + -qD)
The steps are:
a: determine q from our table given R[j] and D
b: calculate -qD
c: add (R[j] + -qD)
d: multiply by 4 (move binary point right by two bits) to get R[j+1]
Note how easy it is to calculate -qD when the possible values of q are
(2,1,0,-1,-2}. To multiply by 2 we just shift the bits left one
position. To negate the value we just flip the bits and add one to the
lowest bit, except we don't add that lowest bit to -qD, because that
would take time to perform the addition, instead we place it
add it in slightly later when it's more convenient.
Nomenclature:
ps - partial sum
pps - previous partial sum
c - carry
pc - previous carry
D - Divisor
D~ - Approximate Divisor. In our example below this is 1.0001 The
divisor truncated after the fourth bit following the radix
point.
R~ - Approximate Remainder.
_ _ _
2 - A bar over a number indicates negation. 1 = -1 and 2 = -2
Our initial conditions are:
Divisor (D)
= 0001.000111111111111111000000000000000000000000000000000000000000
Dividend (initial remainder R)
= 0001.010100000001000110100100000000000000000000000000000000000000
ITERATION #1:
1a: | Choose next quotient digit q. |
| Given R~=0001.010 (and D~=1.0001), examine the lookup table to find
the cell corresponding to these values. D~ runs across the bottom
of the table, and R~ runs up and down the side of the table. The
lookup table says next q digit is 1. |
q digits (base 4): 1. = 1.0 (decimal)
1b: | Calculate -qD |
| q=1, so this is just -D. Negation is performed by flipping the
bits and adding 1 to the lowest bit. However, instead of adding
that 1 to the lowest bit of -qD as would normally be done, we
instead add it to the carry (c) result. Since we know the lowest
bit of the carry (c) result will always be clear (since there's no
lower bits to set it), adding that extra low bit here means just
setting the bit, whereas adding that extra low bit to -qD, as would
normally be done, could take time, requiring many clock cycles to
figure out the new -qD after adding in that lowest bit. |
-qD = 1110.111000000000000000111111111111111111111111111111111111111111
1c: | Add (R + -qD) |
| R for this step is our initial Dividend, and we just calculated
-qD. Here we start using our Carry-Save format. We add the bits,
putting the sum bits in partial_sum (ps) below, and saving the
carry bits in carry (c) above. Our answer, the new remainder, can
be obtained by adding ps + c, but that would take time, so for now
we'll leave the two values separate. |
|
| Note also we added to carry (c) the lowest bit leftover from step
1b when we negated D. This will only be necessary when calculating
-qD involves negation, which occurs only when q=1 or q=2. Note that
this carry bit, being the lowest one, will always be clear, so there
is no time wasted adding the bit here. |
lowest bit added--+
from step 1b v
c = 0000.100000000000000001001000000000000000000000000000000000000001
-----------------------------------------------------------------
R = 0001.010100000001000110100100000000000000000000000000000000000000
-qD = 1110.111000000000000000111111111111111111111111111111111111111111
-----------------------------------------------------------------
ps = 1111.101100000001000110011011111111111111111111111111111111111111
1d: | Multiply by 4. We do this by shifting the bits left two places. |
c*4 = 0010.000000000000000100100000000000000000000000000000000000000100
ps*4 = 1110.110000000100011001101111111111111111111111111111111111111100
We have now calculated R[1] = 4*(R[0] - qD) as intended. This ends
the first iteration.
ITERATION #2:
2a: | Choose next quotient digit q. |
| Let us call c*4 from the previous iteration 'pc' for
previous_carry, and let us call ps*4 from the previous iteration
'pps' for previous_partial_sum. Our new remainder R[1] at this
moment is pps + pc. To save time the Pentium adds only the first
eight bits of pps and pc, obtaining an approximate value R~[1] for
the current remainder which is good enough for using with the
lookup table to determine what next quotient digit to choose. |
pc = 0010.000 000000000000100100000000000000000000000000000000000000100
pps = 1110.110 000000100011001101111111111111111111111111111111111111100
--------
R ~= 0000.110
| Note the R~ value obtained here will be either the first or the
second integral multiple of 0.001 below the true value R. Because
R~ might be the second integral multiple of 0.001 below the true
value R, it is important that the actual staircase divisions within
the lookup table be chosen so the q value chosen in this case is
still an acceptable q value. Thus every cell
in the P-D plot given in chapter 3 must return a value suitable also for
the cell directly above it.
|
|
| Now examine the lookup table and find the cell corresponding to
divisor D=1.001 and remainder R=0.110 From the table our next
quotient digit q is 1. |
q digits (base 4): 1.1 = 1.25 (decimal)
2b: | Calculate -qD. q=1, so -qD is the same as in iteration #1. Again,
as explained in iteration #1, since we perform a negation here we
also set the lowest bit of the carry (c) result. |
-qD = 1110.111000000000000000111111111111111111111111111111111111111111
2c: | Add (R + -qD). We have -qD from previous step. R is pps + pc.
|
| We create new partial sum (ps) and new carry (c). |
lowest bit added--+
from step 2b v
c = 0000.100000000000000001001000000000000000000000000000000000000001
-----------------------------------------------------------------
pc = 0010.000000000000000100100000000000000000000000000000000000000100
pps = 1110.110000000100011001101111111111111111111111111111111111111100
-qD = 1110.111000000000000000111111111111111111111111111111111111111111
-----------------------------------------------------------------
ps = 0010.001000000100011101110000000000000000000000000000000000000111
2d: | Multiply by 4 (shift the bits left two places). |
c*4 = 0110.000000000000000101111111111111111111111111111111111111100100
ps*4 = 1000.100000010001110111000000000000000000000000000000000000011100
ITERATION #3:
The steps are the same from here on out.
pc = c*4 from previous iteration.
pps = pps*4 from previous iteration.
3a: | Add first 8 bits of pc and pps to get R~: |
pc = 0110.000 000000000000101111111111111111111111111111111111111100100
pps = 1000.100 000010001110111000000000000000000000000000000000000011100
--------
R ~= 1110.100
| Given R~=1110.100 (and D~=1.0001), lookup table says next q digit is -1. |
(40) ones-+ +--(1/4) (4-1)
| |+-(1/16) (4-2)
_
q digits (base 4): 1.11 = 1.1875 (decimal)
3b: | Calculate -qD. q=-1, so this is just D, and there is no low bit
to add to carry (c) since there was no negation. |
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
3c: | Add (R + -qD). (R = pps + pc) |
c = 0000.00000010001110111000000000000000000000000000000000000000100
-----------------------------------------------------------------
pc = 0110.000000000000000101111111111111111111111111111111111111100100
pps = 1000.100000010001110111000000000000000000000000000000000000011100
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps = 1111.100111101110001101111111111111111111111111111111111111111000
ITERATION #4:
4a: | Choose next quotient digit q. |
| Add first 8 bits of pc and pps to get R~: |
pc = 0000.000 010001110111000000000000000000000000000000000000000100000
pps = 1110.011 110111000110111111111111111111111111111111111111111100000
--------
R ~= 1110.011
| Given R~=1110.011 (and D~=1.0001), lookup table says next q digit is -1. |
(40) ones-+ +---(1/4) (4-1)
| |+--(1/16) (4-2)
| ||+-(1/64) (4-3)
__
q digits (base 4): 1.111 = 1.171875 (decimal)
4b: | Calculate -qD. q=-1, so this is just D, and there is no low bit
to add to carry (c) since there was no negation. |
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
4c: | Add (R + -qD). (R = pps + pc) |
c = 0000.00110111110111111000000000000000000000000000000000000100000
-----------------------------------------------------------------
pc = 0000.000010001110111000000000000000000000000000000000000000100000
pps = 1110.011110111000110111111111111111111111111111111111111111100000
-qD= 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps = 1111.011011001001110000111111111111111111111111111111111111000000
4d: | Multiply by 4 (shift the bits left two places) |
c*4 = 0000.110111110111111000000000000000000000000000000000000100000000
ps*4 = 1101.101100100111000011111111111111111111111111111111111100000000
ITERATION #5:
5a: | Add first 8 bits of pc and pps to get R~ |
pc = 0000.110 111110111111000000000000000000000000000000000000100000000
pps = 1101.101 100100111000011111111111111111111111111111111111100000000
--------
R ~= 1110.011
| Given R~=1110.011 (and D~=1.0001), lookup table says next q digit is -1. |
(40) ones-+ +----(1/4) (4-1)
| |+---(1/16) (4-2)
| ||+--(1/64) (4-3)
| |||+-(1/256) (4-4)
___
q digits (base 4): 1.1111 = 1.16796875 (decimal)
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
(It's just D)
5c: | Add (R + -qD) |
| R = pps + pc, so this becomes (pc + pps + -qD), which we add using
Carry-Save format, creating a partial sum (ps) and a carry (c): |
c = 0011.00111110111111011000000000000000000000000000000000100000000
-----------------------------------------------------------------
pc = 0000.110111110111111000000000000000000000000000000000000100000000
pps = 1101.101100100111000011111111111111111111111111111111111100000000
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps = 1100.011100101111000100111111111111111111111111111111111000000000
5d: | Multiply by 4 (shift the bits left two places) |
| Multiply partial sum (ps) and carry (c) by 4 (there is no lb to add this time): |
c*4 = 1100.111110111111011000000000000000000000000000000000100000000000
ps*4 = 0001.110010111100010011111111111111111111111111111111100000000000
ITERATION #6:
6a: | Add first 8 bits of pc and pps to get R~ |
pc = 1100.111 110111111011000000000000000000000000000000000100000000000
pps = 0001.110 010111100010011111111111111111111111111111111100000000000
--------
R ~= 1110.101
| Given R~=1110.101 (and D=1.0001), lookup table says next q
digit is -1. |
(40) ones-+ +-----(1/4) (4-1)
| |+----(1/16) (4-2)
| ||+---(1/64) (4-3)
| |||+--(1/256) (4-4)
| ||||+-(1/1024) (4-5)
____
q digits (base 4): 1.11111 = 1.1669921875 (decimal)
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
(It's just D, and there is no low bit to add in later.)
6c: | Add pc + pps + -qD, creating partial sum ps and carry c: |
c = 0011.10110111111011011000000000000000000000000000000100000000000
-----------------------------------------------------------------
pc = 1100.111110111111011000000000000000000000000000000000100000000000
pps = 0001.110010111100010011111111111111111111111111111111100000000000
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps = 1100.001011111100110100111111111111111111111111111111000000000000
6d: | Multiply psum and carry by 4 (there is no lb to add this time) |
c*4 = 1110.110111111011011000000000000000000000000000000100000000000000
ps*4 = 0000.101111110011010011111111111111111111111111111100000000000000
ITERATION #7:
7a: | Add first 8 bits of pc and pps to get R~ |
pc = 1110.110 111111011011000000000000000000000000000000100000000000000
pps = 0000.101 111110011010011111111111111111111111111111100000000000000
--------
R ~= 1111.011
| Given R~=1111.011 (and D~=1.0001), lookup table says next q digit is -1 |
_____
q digits (base 4): 1.111111 = 1.166748046875 (decimal)
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
(It's just D, and there is no low bit lb to add in later.)
7c: | Add prev carry, prev psum, and -qD, creating psum and carry: |
c = 0001.00111111011011011000000000000000000000000000100000000000000
-----------------------------------------------------------------
pc = 1110.110111111011011000000000000000000000000000000100000000000000
pps = 0000.101111110011010011111111111111111111111111111100000000000000
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps = 1111.011111110111110100111111111111111111111111111000000000000000
7d: | Multiply psum and carry by 4 (there is no low bit lb to add this time) |
c*4 = 0100.111111011011011000000000000000000000000000100000000000000000
ps*4 = 1101.111111011111010011111111111111111111111111100000000000000000
ITERATION #8:
8a: | Add first 8 bits of pc and pps to get R~ |
pc = 0100.111 111011011011000000000000000000000000000100000000000000000
pss = 1101.111 111011111010011111111111111111111111111100000000000000000
--------
R ~= 0010.110
| Given R~=0010.110 (and D~=1.0001), lookup table says next q digit is 2 |
_____
q digits (base 4): 1.1111112 = 1.1668701171875 (decimal)
-qD = 1101.110000000000000001111111111111111111111111111111111111111111
8c: | Add prev carry, prev psum, and -qD, creating psum and carry: |
c = 1011.11111011011010001111111111111111111111111100000000000000000
-----------------------------------------------------------------
pc = 0100.111111011011011000000000000000000000000000100000000000000000
pps = 1101.111111011111010011111111111111111111111111100000000000000000
-qD = 1101.110000000000000001111111111111111111111111111111111111111111
-----------------------------------------------------------------
ps = 0100.110000000100001010000000000000000000000000111111111111111111
8d: | Multiply psum and carry by 4 (there is no low bit lb to add this time) |
c*4 = 1111.111011011010001111111111111111111111111100000000000000000100
ps*4 = 0011.000000010000101000000000000000000000000011111111111111111100
ITERATION #9: (HITS THE ERROR CELL)
9a: | Add first 8 bits of pc and pps to get R~ |
pc = 1111.111 011011010001111111111111111111111111100000000000000000100
pss = 0011.000 000010000101000000000000000000000000011111111111111111100
--------
R ~= 0010.111
| Given R~=0010.111 (and Divisor~=1.0001), the next quotient digit
should be 2, but lookup table hits error cell, incorrectly
specifying a quotient digit of 0 instead. There is no recovery
from this point on. The resulting quotient value for the division
will be less than it should be. This is the Pentium bug. |
_____
q digits (base 4): 1.11111120 = 1.1668701171875 (decimal)
^-should be 2
-qD = 0000.000000000000000000000000000000000000000000000000000000000000
(since q=0, -qD=0)
9c: | Add prev carry, prev psum, and -qD, creating psum and carry: |
c = 0110.00000010000001000000000000000000000000000000000000000000100
-----------------------------------------------------------------
pc = 1111.111011011010001111111111111111111111111100000000000000000100
pps = 0011.000000010000101000000000000000000000000011111111111111111100
-qD = 0000.000000000000000000000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps = 1100.111011001010100111111111111111111111111111111111111111111000
9d: | Multiply psum and carry by 4 (there is no low bit lb to add this time) |
c*4 = 1000.000010000001000000000000000000000000000000000000000000100000
ps*4 = 0011.101100101010011111111111111111111111111111111111111111100000
ITERATION #10:
10a: | Add first 8 bits of pc and pps to get R~ |
pc = 1000.000 010000001000000000000000000000000000000000000000000100000
pps = 0011.101 100101010011111111111111111111111111111111111111111100000
--------
R ~= 1011.101
| Given R~=1011.101 (and D~=1.0001), lookup table says next q digit is
out of bounds, below the -8/3D line. Experimental results suggests
the Pentium returns q=0 when R~ is out of bounds. |
_____
q digits (base 4): 1.111111200 = 1.1668701171875 (decimal)
-qD = 0000.000000000000000000000000000000000000000000000000000000000000
(since q=0, -qD=0)
10c: | Add prev carry, prev psum, and -qD, creating psum and carry: |
c = 0000.00000000000000000000000000000000000000000000000000000100000
-----------------------------------------------------------------
pc = 1000.000010000001000000000000000000000000000000000000000000100000
pps = 0011.101100101010011111111111111111111111111111111111111111100000
-qD = 0000.000000000000000000000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps = 1011.101110101011011111111111111111111111111111111111111111000000
10d: | Multiply psum and carry by 4 (there is no low bit lb to add this time) |
c*4 = 0000.000000000000000000000000000000000000000000000000000100000000
ps*4 = 1110.111010101101111111111111111111111111111111111111111100000000
ITERATION #11:
11a: | Add first 8 bits of pc and pps to get R~ |
pc = 0000.000 000000000000000000000000000000000000000000000000100000000
pps = 1110.111 010101101111111111111111111111111111111111111111100000000
--------
R ~= 1110.111
| Given R~=1110.111 (and D~=1.0001), lookup table says next q digit is -1 |
_____ _
q digits (base 4): 1.1111112001 = 1.1668691635131836 (decimal)
|
+--error cell hit. All further q
digits are wrong.
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
11c: | Add prev carry, prev psum, and -qD, creating psum and carry: |
c = 0000.00010101101111111000000000000000000000000000000000100000000
-----------------------------------------------------------------
pc = 0000.000000000000000000000000000000000000000000000000000100000000
pps = 1110.111010101101111111111111111111111111111111111111111100000000
-qD = 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps = 1111.111101010010000000111111111111111111111111111111111000000000
11d: | Multiply psum and carry by 4 |
c*4 = 0000.010101101111111000000000000000000000000000000000100000000000
ps*4 = 1111.110101001000000011111111111111111111111111111111100000000000
ITERATION #12:
12a: | Add first 8 bits of pc and pps to get R~ |
pc = 0000.010 101101111111000000000000000000000000000000000100000000000
pps = 1111.110 101001000000011111111111111111111111111111111100000000000
--------
R ~= 0000.000
| Given R~=0000.000 (and D~=1.0001), lookup table says next q digit is 0 |
_____ _
q digits (base 4): 1.11111120010 = 1.1668691635131836 (decimal)
-qD = 0000.000000000000000000000000000000000000000000000000000000000000
(since q=0, -qD=0)
12c: | Add prev carry, prev psum, and -qD, creating psum and carry: |
c = 0000.10101001000000000000000000000000000000000000000100000000000
-----------------------------------------------------------------
pc = 0000.010101101111111000000000000000000000000000000000100000000000
pps = 1111.110101001000000011111111111111111111111111111111100000000000
-qD = 0000.000000000000000000000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps = 1111.100000100111111011111111111111111111111111111111000000000000
12d: | Multiply psum and carry by 4 |
c*4 = 0010.101001000000000000000000000000000000000000000100000000000000
ps*4 = 1110.000010011111101111111111111111111111111111111100000000000000
The remaining iterations continue in a similar manner. A total of 28
iterations are required for a double precision answer. The final
quotient digits are:
_____ _ _ ___ _ __
q digits: 1.111111200101221122211001110 = 1.1668691995212115 (decimal)
(base 4)
We now break the q digits into positive q digits and negative q digits,
and subtract.
quotient = (pos q digits) - (neg q digits)
_ _ _ _ _ _ _ _ _ _ _ _ _
q digits: 1. 1 1 1 1 1 1 2 0 0 1 0 1 2 2 1 1 2 2 2 1 1 0 0 1 1 1 0
pos q digits: 1.010000000000100000000001001000000010100001000000000100
neg q digits: 0.000101010101000000010000100001011000000100000001010000
--------------------------------------------------------
quotient: 1.001010101011011111110000100110101010011100111110110100
Finally, we multiply our quotient value here by 2^(22-18) to obtain our
final answer to the original problem (this is equivalent to just moving
the radix point over 4 places):
answer = 10010.10101011011111110000100110101010011100111110110100
Converting our binary answer to decimal we have:
answer = 18.66990719233938 (decimal)
This is exactly the answer the defective Pentium gives.
[See Tim Coe Source Code for a program written in C which
emulates the Pentium Division Flaw.]