The Pentium division Flaw - Chapter 4

Chapter 3          Intel Inside          Chapter 5
index
Back to Deley's Homepage



CHAPTER 4:    EXAMPLE OF BUG

And now for an example where we reproduce the Pentium bug. In the Microsoft windows calculator try 5506153/294911

     Correct: 18.67055823621364
     Pentium: 18.66990719233938

If you get the correct answer then either you have the newer Pentium chip with the bug corrected, or you're using a calculator program which is "Pentium aware". "Pentium aware" software looks for and catches any incorrect division result caused by the bug and recalculates the result using some other method so you still get the right answer.

To demonstrate the division here we must first convert our decimal divisor and dividend to binary numbers. This can easily be done using the Microsoft windows calculator with View set to Scientific. Enter 5506153, then click on BIN to switch to binary mode. The result is:

      5506153 = 10101000000010001101001.0

              = 1.01010000000100011010010  *  2^22


We set the exponent to 222 so the radix point falls right after the leading 1. This gives us our normalized mantissa for the dividend. The divisor, 294911, is similarly converted:

       294911 = 1000111111111111111.00000

              = 1.00011111111111111100000  *  2^18


Our division becomes:

                1.01010000000100011010010  *  2^22
     quotient = -----------------------------------
                1.00011111111111111100000  *  2^18


                1.01010000000100011010010
              = -------------------------  *  2^(22-18)
                1.00011111111111111100000


We will perform the division on the mantissa values, and then multiply the result by 2(22-18) = 24 = 16 at the very end to get our quotient answer.

To perform the division we will be calculating:

      R[j+1] = 4 * (R[j] - qD)

which we will rewrite as:

      R[j+1] = 4 * (R[j] + -qD)

The steps are:
   a:  determine q from our table given R[j] and D
   b:  calculate -qD
   c:  add (R[j] + -qD)
   d:  multiply by 4 (move binary point right by two bits) to get R[j+1]

Note how easy it is to calculate -qD when the possible values of q are (2,1,0,-1,-2}. To multiply by 2 we just shift the bits left one position. To negate the value we just flip the bits and add one to the lowest bit, except we don't add that lowest bit to -qD, because that would take time to perform the addition, instead we place it add it in slightly later when it's more convenient.

Nomenclature:
  ps  - partial sum
  pps - previous partial sum
  c   - carry
  pc  - previous carry
  D   - Divisor
  D~  - Approximate Divisor.  In our example below this is 1.0001  The
        divisor truncated after the fourth bit following the radix
        point.
  R~  - Approximate Remainder.
  _                                              _          _
  2   - A bar over a number indicates negation.  1 = -1 and 2 = -2

Our initial conditions are:

  Divisor (D)
       = 0001.000111111111111111000000000000000000000000000000000000000000

  Dividend (initial remainder R)
       = 0001.010100000001000110100100000000000000000000000000000000000000

ITERATION #1:

1a: Choose next quotient digit q.
Given R~=0001.010 (and D~=1.0001), examine the lookup table to find the cell corresponding to these values. D~ runs across the bottom of the table, and R~ runs up and down the side of the table. The lookup table says next q digit is 1.
      q digits (base 4): 1. = 1.0 (decimal)

1b:Calculate -qD
q=1, so this is just -D. Negation is performed by flipping the bits and adding 1 to the lowest bit. However, instead of adding that 1 to the lowest bit of -qD as would normally be done, we instead add it to the carry (c) result. Since we know the lowest bit of the carry (c) result will always be clear (since there's no lower bits to set it), adding that extra low bit here means just setting the bit, whereas adding that extra low bit to -qD, as would normally be done, could take time, requiring many clock cycles to figure out the new -qD after adding in that lowest bit.
   -qD = 1110.111000000000000000111111111111111111111111111111111111111111

1c:Add (R + -qD)
R for this step is our initial Dividend, and we just calculated -qD. Here we start using our Carry-Save format. We add the bits, putting the sum bits in partial_sum (ps) below, and saving the carry bits in carry (c) above. Our answer, the new remainder, can be obtained by adding ps + c, but that would take time, so for now we'll leave the two values separate.

Note also we added to carry (c) the lowest bit leftover from step 1b when we negated D. This will only be necessary when calculating -qD involves negation, which occurs only when q=1 or q=2. Note that this carry bit, being the lowest one, will always be clear, so there is no time wasted adding the bit here.
                                                       lowest bit added--+
                                                       from step 1b      v
     c = 0000.100000000000000001001000000000000000000000000000000000000001
         -----------------------------------------------------------------
     R = 0001.010100000001000110100100000000000000000000000000000000000000
   -qD = 1110.111000000000000000111111111111111111111111111111111111111111
         -----------------------------------------------------------------
    ps = 1111.101100000001000110011011111111111111111111111111111111111111
1d:Multiply by 4. We do this by shifting the bits left two places.
  c*4  = 0010.000000000000000100100000000000000000000000000000000000000100
  ps*4 = 1110.110000000100011001101111111111111111111111111111111111111100

We have now calculated R[1] = 4*(R[0] - qD) as intended. This ends the first iteration.




ITERATION #2:

2a: Choose next quotient digit q.
Let us call c*4 from the previous iteration 'pc' for previous_carry, and let us call ps*4 from the previous iteration 'pps' for previous_partial_sum. Our new remainder R[1] at this moment is pps + pc. To save time the Pentium adds only the first eight bits of pps and pc, obtaining an approximate value R~[1] for the current remainder which is good enough for using with the lookup table to determine what next quotient digit to choose.
    pc = 0010.000 000000000000100100000000000000000000000000000000000000100
   pps = 1110.110 000000100011001101111111111111111111111111111111111111100
         --------
    R ~= 0000.110
    Note the R~ value obtained here will be either the first or the second integral multiple of 0.001 below the true value R. Because R~ might be the second integral multiple of 0.001 below the true value R, it is important that the actual staircase divisions within the lookup table be chosen so the q value chosen in this case is still an acceptable q value. Thus every cell in the P-D plot given in chapter 3 must return a value suitable also for the cell directly above it.

Now examine the lookup table and find the cell corresponding to divisor D=1.001 and remainder R=0.110 From the table our next quotient digit q is 1.
      q digits (base 4): 1.1 = 1.25 (decimal)
2b:Calculate -qD. q=1, so -qD is the same as in iteration #1. Again, as explained in iteration #1, since we perform a negation here we also set the lowest bit of the carry (c) result.
   -qD = 1110.111000000000000000111111111111111111111111111111111111111111
2c:Add (R + -qD). We have -qD from previous step. R is pps + pc.
We create new partial sum (ps) and new carry (c).
                                                       lowest bit added--+
                                                       from step 2b      v
     c = 0000.100000000000000001001000000000000000000000000000000000000001
         -----------------------------------------------------------------
    pc = 0010.000000000000000100100000000000000000000000000000000000000100
   pps = 1110.110000000100011001101111111111111111111111111111111111111100
   -qD = 1110.111000000000000000111111111111111111111111111111111111111111
         -----------------------------------------------------------------
    ps = 0010.001000000100011101110000000000000000000000000000000000000111
2d:Multiply by 4 (shift the bits left two places).
   c*4 = 0110.000000000000000101111111111111111111111111111111111111100100
  ps*4 = 1000.100000010001110111000000000000000000000000000000000000011100



ITERATION #3:

The steps are the same from here on out.
    pc  = c*4 from previous iteration.
    pps = pps*4 from previous iteration.
3a: Add first 8 bits of pc and pps to get R~:
    pc = 0110.000 000000000000101111111111111111111111111111111111111100100
   pps = 1000.100 000010001110111000000000000000000000000000000000000011100
         --------
    R ~= 1110.100
    Given R~=1110.100 (and D~=1.0001), lookup table says next q digit is -1.

               (40) ones-+ +--(1/4)  (4-1)
                         | |+-(1/16) (4-2)
                            _
      q digits (base 4): 1.11        = 1.1875 (decimal)

3b:Calculate -qD.  q=-1, so this is just D, and there is no low bit to add to carry (c) since there was no negation.
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
3c:Add (R + -qD).  (R = pps + pc)
     c = 0000.00000010001110111000000000000000000000000000000000000000100
         -----------------------------------------------------------------
    pc = 0110.000000000000000101111111111111111111111111111111111111100100
   pps = 1000.100000010001110111000000000000000000000000000000000000011100
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
         -----------------------------------------------------------------
    ps = 1111.100111101110001101111111111111111111111111111111111111111000




ITERATION #4:

4a: Choose next quotient digit q.
Add first 8 bits of pc and pps to get R~:
    pc = 0000.000 010001110111000000000000000000000000000000000000000100000
   pps = 1110.011 110111000110111111111111111111111111111111111111111100000
         --------
    R ~= 1110.011
    Given R~=1110.011 (and D~=1.0001), lookup table says next q digit is -1.
               (40) ones-+ +---(1/4)  (4-1)
                         | |+--(1/16) (4-2)
                         | ||+-(1/64) (4-3)
                            __
      q digits (base 4): 1.111       = 1.171875 (decimal)
4b:Calculate -qD. q=-1, so this is just D, and there is no low bit to add to carry (c) since there was no negation.
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
4c:Add (R + -qD).  (R = pps + pc)
     c = 0000.00110111110111111000000000000000000000000000000000000100000
         -----------------------------------------------------------------
    pc = 0000.000010001110111000000000000000000000000000000000000000100000
   pps = 1110.011110111000110111111111111111111111111111111111111111100000
    -qD= 0001.000111111111111111000000000000000000000000000000000000000000
         -----------------------------------------------------------------
    ps = 1111.011011001001110000111111111111111111111111111111111111000000
4d:Multiply by 4 (shift the bits left two places)
   c*4 = 0000.110111110111111000000000000000000000000000000000000100000000
  ps*4 = 1101.101100100111000011111111111111111111111111111111111100000000




ITERATION #5:

5a: Add first 8 bits of pc and pps to get R~
    pc = 0000.110 111110111111000000000000000000000000000000000000100000000
   pps = 1101.101 100100111000011111111111111111111111111111111111100000000
         --------
    R ~= 1110.011
    Given R~=1110.011 (and D~=1.0001), lookup table says next q digit is -1.
               (40) ones-+ +----(1/4)   (4-1)
                         | |+---(1/16)  (4-2)
                         | ||+--(1/64)  (4-3)
                         | |||+-(1/256) (4-4)
                            ___
      q digits (base 4): 1.1111       = 1.16796875 (decimal)
5b:Calculate -qD:
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
         (It's just D)
5c:Add (R + -qD)
R = pps + pc, so this becomes (pc + pps + -qD), which we add using Carry-Save format, creating a partial sum (ps) and a carry (c):
     c = 0011.00111110111111011000000000000000000000000000000000100000000
         -----------------------------------------------------------------
    pc = 0000.110111110111111000000000000000000000000000000000000100000000
   pps = 1101.101100100111000011111111111111111111111111111111111100000000
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
         -----------------------------------------------------------------
    ps = 1100.011100101111000100111111111111111111111111111111111000000000
5d:Multiply by 4 (shift the bits left two places)
Multiply partial sum (ps) and carry (c) by 4 (there is no lb to add this time):
   c*4 = 1100.111110111111011000000000000000000000000000000000100000000000
  ps*4 = 0001.110010111100010011111111111111111111111111111111100000000000




ITERATION #6:

6a: Add first 8 bits of pc and pps to get R~
    pc = 1100.111 110111111011000000000000000000000000000000000100000000000
   pps = 0001.110 010111100010011111111111111111111111111111111100000000000
         --------
    R ~= 1110.101
    Given R~=1110.101 (and D=1.0001), lookup table says next q digit is -1.
               (40) ones-+ +-----(1/4)    (4-1)
                         | |+----(1/16)   (4-2)
                         | ||+---(1/64)   (4-3)
                         | |||+--(1/256)  (4-4)
                         | ||||+-(1/1024) (4-5)
                            ____
      q digits (base 4): 1.11111       = 1.1669921875 (decimal)
6b:Calculate -qD:
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
         (It's just D, and there is no low bit to add in later.)
6c:Add pc + pps + -qD, creating partial sum ps and carry c:
     c = 0011.10110111111011011000000000000000000000000000000100000000000
         -----------------------------------------------------------------
    pc = 1100.111110111111011000000000000000000000000000000000100000000000
   pps = 0001.110010111100010011111111111111111111111111111111100000000000
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
         -----------------------------------------------------------------
    ps = 1100.001011111100110100111111111111111111111111111111000000000000
6d:Multiply psum and carry by 4 (there is no lb to add this time)
   c*4 = 1110.110111111011011000000000000000000000000000000100000000000000
  ps*4 = 0000.101111110011010011111111111111111111111111111100000000000000




ITERATION #7:

7a: Add first 8 bits of pc and pps to get R~
    pc = 1110.110 111111011011000000000000000000000000000000100000000000000
   pps = 0000.101 111110011010011111111111111111111111111111100000000000000
         --------
    R ~= 1111.011
    Given R~=1111.011 (and D~=1.0001), lookup table says next q digit is -1
                            _____
      q digits (base 4): 1.111111   = 1.166748046875 (decimal)
7b:Calculate -qD:
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
         (It's just D, and there is no low bit lb to add in later.)
7c:Add prev carry, prev psum, and -qD, creating psum and carry:
     c = 0001.00111111011011011000000000000000000000000000100000000000000
         -----------------------------------------------------------------
    pc = 1110.110111111011011000000000000000000000000000000100000000000000
   pps = 0000.101111110011010011111111111111111111111111111100000000000000
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
         -----------------------------------------------------------------
    ps = 1111.011111110111110100111111111111111111111111111000000000000000
7d:Multiply psum and carry by 4 (there is no low bit lb to add this time)
   c*4 = 0100.111111011011011000000000000000000000000000100000000000000000
  ps*4 = 1101.111111011111010011111111111111111111111111100000000000000000




ITERATION #8:

8a: Add first 8 bits of pc and pps to get R~
    pc = 0100.111 111011011011000000000000000000000000000100000000000000000
   pss = 1101.111 111011111010011111111111111111111111111100000000000000000
         --------
    R ~= 0010.110
    Given R~=0010.110 (and D~=1.0001), lookup table says next q digit is 2
                            _____
      q digits (base 4): 1.1111112  = 1.1668701171875 (decimal)
8b:Calculate -qD:
   -qD = 1101.110000000000000001111111111111111111111111111111111111111111
8c:Add prev carry, prev psum, and -qD, creating psum and carry:
     c = 1011.11111011011010001111111111111111111111111100000000000000000
         -----------------------------------------------------------------
    pc = 0100.111111011011011000000000000000000000000000100000000000000000
   pps = 1101.111111011111010011111111111111111111111111100000000000000000
   -qD = 1101.110000000000000001111111111111111111111111111111111111111111
         -----------------------------------------------------------------
    ps = 0100.110000000100001010000000000000000000000000111111111111111111
8d:Multiply psum and carry by 4 (there is no low bit lb to add this time)
   c*4 = 1111.111011011010001111111111111111111111111100000000000000000100
  ps*4 = 0011.000000010000101000000000000000000000000011111111111111111100




ITERATION #9: (HITS THE ERROR CELL)

9a: Add first 8 bits of pc and pps to get R~
    pc = 1111.111 011011010001111111111111111111111111100000000000000000100
   pss = 0011.000 000010000101000000000000000000000000011111111111111111100
         --------
    R ~= 0010.111
    Given R~=0010.111 (and Divisor~=1.0001), the next quotient digit should be 2, but lookup table hits error cell, incorrectly specifying a quotient digit of 0 instead. There is no recovery from this point on. The resulting quotient value for the division will be less than it should be. This is the Pentium bug.
                            _____
      q digits (base 4): 1.11111120 = 1.1668701171875 (decimal)
                                  ^-should be 2
9b:Calculate -qD:
   -qD = 0000.000000000000000000000000000000000000000000000000000000000000
         (since q=0, -qD=0)
9c:Add prev carry, prev psum, and -qD, creating psum and carry:
     c = 0110.00000010000001000000000000000000000000000000000000000000100
         -----------------------------------------------------------------
    pc = 1111.111011011010001111111111111111111111111100000000000000000100
   pps = 0011.000000010000101000000000000000000000000011111111111111111100
   -qD = 0000.000000000000000000000000000000000000000000000000000000000000
         -----------------------------------------------------------------
    ps = 1100.111011001010100111111111111111111111111111111111111111111000
9d:Multiply psum and carry by 4 (there is no low bit lb to add this time)
   c*4 = 1000.000010000001000000000000000000000000000000000000000000100000
  ps*4 = 0011.101100101010011111111111111111111111111111111111111111100000




ITERATION #10:

10a: Add first 8 bits of pc and pps to get R~
    pc = 1000.000 010000001000000000000000000000000000000000000000000100000
   pps = 0011.101 100101010011111111111111111111111111111111111111111100000
         --------
    R ~= 1011.101
    Given R~=1011.101 (and D~=1.0001), lookup table says next q digit is out of bounds, below the -8/3D line. Experimental results suggests the Pentium returns q=0 when R~ is out of bounds.
                            _____
      q digits (base 4): 1.111111200 = 1.1668701171875 (decimal)
10b:Calculate -qD:
   -qD = 0000.000000000000000000000000000000000000000000000000000000000000
         (since q=0, -qD=0)
10c:Add prev carry, prev psum, and -qD, creating psum and carry:
     c = 0000.00000000000000000000000000000000000000000000000000000100000
         -----------------------------------------------------------------
    pc = 1000.000010000001000000000000000000000000000000000000000000100000
   pps = 0011.101100101010011111111111111111111111111111111111111111100000
   -qD = 0000.000000000000000000000000000000000000000000000000000000000000
         -----------------------------------------------------------------
    ps = 1011.101110101011011111111111111111111111111111111111111111000000
10d:Multiply psum and carry by 4 (there is no low bit lb to add this time)
   c*4 = 0000.000000000000000000000000000000000000000000000000000100000000
  ps*4 = 1110.111010101101111111111111111111111111111111111111111100000000




ITERATION #11:

11a: Add first 8 bits of pc and pps to get R~
    pc = 0000.000 000000000000000000000000000000000000000000000000100000000
   pps = 1110.111 010101101111111111111111111111111111111111111111100000000
         --------
    R ~= 1110.111
Given R~=1110.111 (and D~=1.0001), lookup table says next q digit is -1
                            _____   _
      q digits (base 4): 1.1111112001 = 1.1668691635131836 (decimal)
                                  |
                                  +--error cell hit.  All further q
                                     digits are wrong.
11b:Calculate -qD:
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
11c:Add prev carry, prev psum, and -qD, creating psum and carry:
     c = 0000.00010101101111111000000000000000000000000000000000100000000
         -----------------------------------------------------------------
    pc = 0000.000000000000000000000000000000000000000000000000000100000000
   pps = 1110.111010101101111111111111111111111111111111111111111100000000
   -qD = 0001.000111111111111111000000000000000000000000000000000000000000
         -----------------------------------------------------------------
    ps = 1111.111101010010000000111111111111111111111111111111111000000000
11d:Multiply psum and carry by 4
   c*4 = 0000.010101101111111000000000000000000000000000000000100000000000
  ps*4 = 1111.110101001000000011111111111111111111111111111111100000000000




ITERATION #12:

12a: Add first 8 bits of pc and pps to get R~
    pc = 0000.010 101101111111000000000000000000000000000000000100000000000
   pps = 1111.110 101001000000011111111111111111111111111111111100000000000
         --------
    R ~= 0000.000
Given R~=0000.000 (and D~=1.0001), lookup table says next q digit is 0
                            _____   _
      q digits (base 4): 1.11111120010 = 1.1668691635131836 (decimal)
12b:Calculate -qD:
   -qD = 0000.000000000000000000000000000000000000000000000000000000000000
         (since q=0, -qD=0)
12c:Add prev carry, prev psum, and -qD, creating psum and carry:
     c = 0000.10101001000000000000000000000000000000000000000100000000000
         -----------------------------------------------------------------
    pc = 0000.010101101111111000000000000000000000000000000000100000000000
   pps = 1111.110101001000000011111111111111111111111111111111100000000000
   -qD = 0000.000000000000000000000000000000000000000000000000000000000000
         -----------------------------------------------------------------
    ps = 1111.100000100111111011111111111111111111111111111111000000000000
12d:Multiply psum and carry by 4
   c*4 = 0010.101001000000000000000000000000000000000000000100000000000000
  ps*4 = 1110.000010011111101111111111111111111111111111111100000000000000




The remaining iterations continue in a similar manner. A total of 28 iterations are required for a double precision answer. The final quotient digits are:

               _____   _  _ ___  _   __
  q digits: 1.111111200101221122211001110 = 1.1668691995212115 (decimal)
            (base 4)

We now break the q digits into positive q digits and negative q digits, and subtract.
quotient = (pos q digits) - (neg q digits)

                     _ _ _ _ _       _     _   _ _ _     _       _ _
      q digits: 1. 1 1 1 1 1 1 2 0 0 1 0 1 2 2 1 1 2 2 2 1 1 0 0 1 1 1 0

  pos q digits: 1.010000000000100000000001001000000010100001000000000100
  neg q digits: 0.000101010101000000010000100001011000000100000001010000
                --------------------------------------------------------
      quotient: 1.001010101011011111110000100110101010011100111110110100


Finally, we multiply our quotient value here by 2^(22-18) to obtain our final answer to the original problem (this is equivalent to just moving the radix point over 4 places):

       answer = 10010.10101011011111110000100110101010011100111110110100

Converting our binary answer to decimal we have:

       answer = 18.66990719233938 (decimal)

This is exactly the answer the defective Pentium gives.

[See Tim Coe Source Code for a program written in C which emulates the Pentium Division Flaw.]




Chapter 3          Intel Inside          Chapter 5
index
Back to Deley's Homepage