The following paper describes the details behind the floating point division bug in the hardware divide unit of Intel Corporation's Pentium(TM) processor. The Pentium is Intel's next generation of IBM-PC compatible microprocessors following the i486 CPU family. The original Pentium processor was introduced into the market in May of 1993, and an estimated two million were sold. A year later it was discovered that in certain very rare instances a division operation returned a result that was slightly incorrect.
Intel corrected the bug in June of 1994, and any new Pentium computer sold after January of 1995 probably has the newer corrected chip in it. Intel will also replace, upon request, any Pentium chip that has the floating point division flaw with a new one; just ask. I myself have never bothered replacing my flawed Pentium chip. The flaw is too insignificant to affect everyday users like myself. All my software runs just fine with no problems, and quite fast I might add. Actually that part about no problems is a lie: I have endless problems with bugs in software for IBM-PC compatible computers, but none of the bugs are the result of the Pentium's floating point division flaw explained here.
For most people the flaw itself is just an entertaining curiosity since it's rare and the error in precision when it does occur is small. The flaw does arouse one's curiosity though, and it is quite interesting to study the details behind the flaw, as it brings together many aspects of computer science. Like any technical field, computer science can be made easy to understand, or it can be made difficult to understand. I hope most people find this paper easy to understand.
2.1 | EXAMPLE | |
2.2 | ABSTRACT DIVISION | |
2.3 | THE ITERATIVE FORMULA FOR DIVISION | |
2.4 | BOUNDS ON RESULTING REMAINDER | |
2.5 | LOOKUP TABLE | |
2.6 | NEGATIVE QUOTIENT DIGITS |
3.1 | THE SRT DIVISION ALGORITHM | |
3.2 | THE ITERATIVE FORMULA FOR DIVISION BASE 4 | |
3.3 | BOUNDS ON RESULTING REMAINDER | |
3.4 | THE PENTIUM LOOKUP TABLE (P-D PLOT) | |
3.5 | ITERATION USING THE P-D PLOT AS A LOOKUP TABLE |
CHAPTER 4 | EXAMPLE OF BUG | |
CHAPTER 5 | OBSERVATIONS ON HITTING THE ERROR CELL | |
CHAPTER 6 | TOPICS FOR FURTHER STUDY | |
CHAPTER 7 | HISTORY OF THE BUG'S DISCOVERY | |
CHAPTER 8 | HOW TO TEST FOR THE PENTIUM BUG | |
CHAPTER 9 | PENTIUM JOKES | |
REFERENCES |
Here we briefly review some simple concepts you probably already know.
(10^0)  ones     ----+ +---- tenths       (10^-1)
(10^1)  tens     ---+| |+--- hundredths   (10^-2)
(10^2)  hundreds --+|| ||+-- thousandths  (10^-3)
                   ||| |||
                   419.583
                      ^-- decimal point
(2^0)   ones     -----+ +----- halves      (1/2)
(2^1)   twos     ----+| |+---- fourths     (1/4)
(2^2)   fours    ---+|| ||+--- eighths     (1/8)
(2^3)   eights   --+||| |||+-- sixteenths  (1/16)
                   |||| ||||
                   1011.1011
                       ^-- radix point
Given an ordinary decimal number such as
    14.195835
we can move the decimal point to the right or left by multiplying or dividing by powers of ten. For example, to move the decimal point right two places, we multiply by 10^2 = 100:
    14.195835 * 100 = 1419.5835
Given an ordinary binary number such as
    1011.1011
we can move the radix point to the right or left by multiplying or dividing by powers of two. For example, to move the radix point right two places, we multiply by 2^2 = 4:
    1011.1011 * 4 = 101110.11
Given an ordinary decimal number such as
    14.195835
we can move the decimal point so it lies directly after the first digit by multiplying or dividing by an appropriate power of 10. In our example:
    14.195835  =  1.4195835  *  10^1
                  mantissa      exponent
In this representation 1.4195835 is called the mantissa, and 10^1 is called the exponent.
Given an ordinary binary number such as
    1011.1011
we can move the radix point so it lies directly after the first digit by multiplying or dividing by an appropriate power of 2. In our example:
    1011.1011  =  1.0111011  *  2^3
                  mantissa      exponent
In this representation 1.0111011 is called the mantissa, and 2^3 is called the exponent.
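As a rough illustration of the radix point shifting and normalization described above, here is a minimal Python sketch. The helper names `parse_binary` and `normalize` are my own inventions for this example, and exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction

def parse_binary(text: str) -> Fraction:
    """Convert a binary string such as '1011.1011' to an exact Fraction."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    if frac:
        value += Fraction(int(frac, 2), 2 ** len(frac))
    return value

def normalize(value: Fraction):
    """Return (mantissa, exponent) with 1 <= mantissa < 2
    and value == mantissa * 2**exponent (positive values only)."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    exponent = 0
    while value >= 2:      # radix point moves left one place
        value /= 2
        exponent += 1
    while value < 1:       # radix point moves right one place
        value *= 2
        exponent -= 1
    return value, exponent

mantissa, exponent = normalize(parse_binary("1011.1011"))
print(mantissa == parse_binary("1.0111011"), exponent)
```

Multiplying by 4 in the same way moves the radix point two places: `parse_binary("1011.1011") * 4` equals `parse_binary("101110.11")`.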
           dividend     numerator                    12
quotient = -------- = -----------      Example: 3 = --
           divisor    denominator                     4

          quotient                      3
        +---------      Example:      +---
divisor | dividend                  4 | 12
 14.195835
----------
119.716320

1011.1011
---------
11.001000

1.0111011 * 2^3   1.0111011   2^3   1.0111011
--------------- = --------- * --- = --------- * 2^(3-1)
1.1001000 * 2^1   1.1001000   2^1   1.1001000
Thus the problem has been reduced to the division of two normalized numbers followed by a shifting of the radix point.
The important point to note is that any division operation can be reduced to dividing normalized mantissas. The exponents and signs can be handled separately.
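The reduction can be checked numerically. The following is only a sketch using the binary example above (1011.1011 divided by 11.001000); the variable names are made up for illustration, and exact fractions stand in for the hardware's bit patterns.

```python
from fractions import Fraction

# Mantissas and exponents from the example above.
m_a, e_a = Fraction(0b10111011, 2 ** 7), 3   # 1.0111011 * 2^3
m_b, e_b = Fraction(0b11001000, 2 ** 7), 1   # 1.1001000 * 2^1

# Divide the normalized mantissas, handle the exponents separately.
mantissa_quotient = m_a / m_b
exponent = e_a - e_b
result = mantissa_quotient * 2 ** exponent

# Direct division of the original numbers 1011.1011 / 11.001
direct = Fraction(0b10111011, 2 ** 4) / Fraction(0b11001, 2 ** 3)
print(result == direct)
```

Both routes give the same value, which is the point of the reduction: the divider only ever has to deal with mantissas between 1 and 2.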
The Pentium uses IEEE standard 754 for representing floating point numbers. In this standard a single precision floating point variable is represented using a 24 bit mantissa (23 stored bits plus an implied leading 1) and an exponent which can take on values between +127 and -126.
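To see the single precision layout for yourself, the sketch below pulls a value apart with Python's standard `struct` module. The function name `decode_single` is hypothetical, and the sketch only handles ordinary (normalized) numbers, not zeros, denormals, infinities, or NaNs.

```python
import struct

def decode_single(x: float):
    """Split a number, stored as IEEE single precision, into
    (sign, unbiased exponent, 23 stored mantissa bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127   # remove the exponent bias of 127
    stored_mantissa = bits & 0x7FFFFF        # the leading 1 is implied, not stored
    return sign, exponent, stored_mantissa

# 1011.1011 binary = 11.6875 decimal = 1.0111011 * 2^3
sign, exponent, stored = decode_single(11.6875)
print(sign, exponent, format(stored, "023b"))
```

For 11.6875 the stored bits begin with 0111011, which are exactly the digits after the radix point in the normalized mantissa 1.0111011.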
The Pentium also uses two's complement notation to represent negative numbers. A positive number is negated by complementing all the bits (one's complement, i.e. "flip the bits") and then adding 1 to the result.
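The "flip the bits and add 1" rule is easy to try out. This is a small sketch with an assumed word width of 8 bits; the function name and the width are my choices for the example, not anything specific to the Pentium.

```python
def twos_complement_negate(value: int, bits: int = 8) -> int:
    """Negate `value` within a fixed-width word: flip every bit, then add 1."""
    mask = (1 << bits) - 1
    flipped = value ^ mask          # one's complement
    return (flipped + 1) & mask     # add 1, discard any carry out of the word

negative_five = twos_complement_negate(0b00000101)
print(format(negative_five, "08b"))
```

Adding a number to its two's complement gives zero once the carry out of the top bit is discarded, which is why ordinary adder hardware can be used for subtraction.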
Carry-Save addition is a method of quickly reducing the sum of three variables A, B, and C, down to the sum of two variables PARTIAL_SUM and CARRY, using only bitwise logical operations (AND, OR, Exclusive-OR). With Carry-Save addition we operate on all of the columns at once, placing the partial sum of that column at the bottom, and saving any overflow as a carry digit at the top. The bottom is our PARTIAL_SUM, and the carry digits at the top become our CARRY. The true sum of (A + B + C) is thus reduced to the sum (PARTIAL_SUM + CARRY).
        CARRY=  0010.1100010
                ------------
            A=   101.11010011
            B=   001.01100110              CARRY=  0010.1100010
            C=   000.01100000        PARTIAL_SUM=   100.11010101
                ------------                       -------------
  PARTIAL_SUM=   100.11010101              TOTAL=  0111.10011001
PARTIAL_SUM = (A ^ B ^ C)
CARRY       = (A & B) | (A & C) | (B & C), shifted left one column

    ^ represents Exclusive-OR
    & represents AND
    | represents OR
Thus both PARTIAL_SUM and CARRY are calculated quickly using nothing but bitwise logical operations.
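Here is a minimal sketch of Carry-Save addition using the values A, B, and C from the example above. The numbers are held as plain integers with 8 fraction bits (so 101.11010011 becomes 0b10111010011), and the function name `carry_save_add` is just a label chosen for this illustration.

```python
def carry_save_add(a: int, b: int, c: int):
    """Reduce three addends to two using only bitwise operations."""
    partial_sum = a ^ b ^ c
    # Each column's overflow belongs to the next column to the left.
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry

A = 0b10111010011   # 101.11010011
B = 0b00101100110   # 001.01100110
C = 0b00001100000   # 000.01100000

partial_sum, carry = carry_save_add(A, B, C)
print(format(partial_sum, "011b"), format(carry, "012b"))
print(partial_sum + carry == A + B + C)
```

Note that no carry ever ripples from column to column inside `carry_save_add`; the slow column-by-column work is postponed until PARTIAL_SUM and CARRY are finally added together.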
The one drawback to Carry-Save addition is if we want to know the true sum of (A + B + C) we still have to add (PARTIAL_SUM + CARRY), and this can take time. The classic method of adding two numbers is to start with the right most column and add it up, placing the partial sum for that column below and any overflow as carry digits above the next column to the left. Then move left one column and repeat. Continue in this manner, adding one column at a time, until all the columns have been added and the calculation is complete. This method of addition is slow since each number may have 64 bits, and we must loop through all of the columns, working on only one column at a time, propagating the overflow on to the next column as carry bits.
However, it may be that we don't need an exact answer but that an approximate answer will do fine. We can get an approximate answer by adding just the first few columns and ignoring the rest. For example, we can calculate an approximate answer to example #1 above by considering just the first seven columns:
        CARRY=  0010.110
  PARTIAL_SUM=   100.110
                -------------
 approx total=  0111.100
Here our total is approximate, but it is very close to the actual answer. In this case the total dropped down from the true answer to the first integral multiple of 0.001 below the true answer.
Consider now example #2 below which shows a slightly different PARTIAL_SUM and CARRY. Notice the actual sum is identical to the sum we had before, but the approximate total obtained by adding the first seven bits only is lower this time:
        CARRY=  0010.1011101                CARRY=  0010.101
  PARTIAL_SUM=   100.11011111         PARTIAL_SUM=   100.110
                -------------                       -------------
 actual total=  0111.10011001        approx total=  0111.011
This time our approximate total dropped down to the second integral multiple of 0.001 rather than the first integral multiple as it did before. The difference is in example #1 the sum of the truncated parts is less than 0.001, whereas in example #2 the sum of the truncated parts is greater than 0.001:
 Example #1:                          Example #2:
                   |truncated                           |truncated
                   |part                                |part
                   |                                    |
        CARRY= .    0010                     CARRY= .    1101
  PARTIAL_SUM= .    10101              PARTIAL_SUM= .    11111
               -------------                        -------------
        TOTAL= .___11001                     TOTAL= .__111001
Mathematically our approximate total is equal to the true total minus the sum of the truncated parts:
approx total = total - (sum of truncated parts)
Consider now a worst case scenario in which all the truncated bits are 1's. Convince yourself that even in this case the sum of the truncated parts will always be less than (2 x 0.001). Consider now the other extreme scenario where all the truncated bits are 0's. Convince yourself that in this trivial case the sum of the truncated parts is 0.
We have thus determined the bounds on our approximate total:
total >= approx total > total - (2 x 0.001)
Our approximate total will always be equal to or lower than the true answer, and the error will always be less than (2 x 0.001). This is a crucial result so make sure you understand it.
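The bound can be checked against both examples. In this sketch the numbers are again integers with 8 fraction bits, three fraction bits are kept, and `truncated_sum` is a name I made up; it simply chops the low bits off each input before adding.

```python
def truncated_sum(partial_sum: int, carry: int, frac_bits: int = 8, kept_bits: int = 3) -> int:
    """Add only the leading columns, ignoring the truncated low bits."""
    drop = frac_bits - kept_bits
    return ((partial_sum >> drop) + (carry >> drop)) << drop

UNIT = 1 << 5   # binary 0.001 expressed with 8 fraction bits

# (PARTIAL_SUM, CARRY) for example #1 and example #2
examples = [
    (0b10011010101, 0b001011000100),   # 100.11010101 and 0010.1100010
    (0b10011011111, 0b001010111010),   # 100.11011111 and 0010.1011101
]

for partial_sum, carry in examples:
    total = partial_sum + carry
    approx = truncated_sum(partial_sum, carry)
    # total >= approx total > total - (2 x 0.001)
    print(format(total, "012b"), format(approx, "012b"),
          total >= approx > total - 2 * UNIT)
```

Both examples have the same true total, yet the approximate totals differ by one unit of 0.001, exactly as described above.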
                 <-- quotient
    +---------
  3 | 7.203125   <-- Dividend (starting Remainder R)
  |
  Divisor (D)

      +-- quotient digit q
      2
    +---------
  3 | 7.203125

      2
    +---------
  3 | 7.203125   <-- Remainder R
      6          <-- (2 * 3 = 6)
     ---------
      1.203125   <-- New Remainder R

      2
    +---------
  3 | 7.203125
      6
     ---------
      1.203125   <-- New Remainder R
      ========
     12.03125    <-- R = 10 * R

  0 <= R < 10*D     (D = Divisor, 3 in this example)
We now repeat those three steps above for division in a more general abstract setting:
  q    - quotient digit
  D    - Divisor
  R[j] - Remainder after the j'th iteration.
         (The starting remainder R[0] is the Dividend.)

    +---------
  D | R[j]       <-- Remainder / Divisor

      +-- quotient digit q
      q
    +---------
  D | R[j]

      q
    +---------
  D | R[j]
     -qD
     ---------
      Rnew[j]    <-- Rnew[j] = (R[j] - qD)
Step 3.
To prepare for the next iteration, move radix point right one place. (In base 10 the radix point is more familiarly known as the decimal point, and we multiply by 10. In base 4, which the Pentium uses for division, we multiply by 4.)

      q
    +---------
  D | R[j]
     -qD
     ---------
      Rnew[j]    <-- Rnew[j] = (R[j] - qD)
      ========
      R[j+1] = base * Rnew[j]     (base = 10 for decimal,
                                           4 for radix 4)
R[j+1] = base * (R[j] - q[j]*D)
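The iterative formula translates almost directly into code. The sketch below is ordinary long division, choosing each quotient digit as the largest q with q*D not exceeding R; the function name `long_division_digits` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def long_division_digits(dividend, divisor, base=10, count=6):
    """Generate quotient digits with R[j+1] = base * (R[j] - q[j]*D)."""
    remainder = Fraction(dividend)
    digits = []
    for _ in range(count):
        q = remainder // divisor                        # largest digit with q*D <= R
        digits.append(int(q))
        remainder = base * (remainder - q * divisor)    # shift radix point right
    return digits

# The earlier example: 7.203125 divided by 3
print(long_division_digits(Fraction("7.203125"), 3))
```

The first digit returned sits in front of the decimal point, so the digits 2, 4, 0, 1, 0, 4 read as the quotient 2.40104...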
  (D * n.nnnnn...) <= R[j+1] < (D * m.mmmmm...)

  where: n = lowest quotient digit we can use
         m = highest quotient digit we can use
            9 | 9 4 3 2 1 1 1 1 1
            8 | 8 4 2 2 1 1 1 1 0
            7 | 7 3 2 1 1 1 1 0 0
            6 | 6 3 2 1 1 1 0 0 0
 Current    5 | 5 2 1 1 1 0 0 0 0
 Remainder  4 | 4 2 1 1 0 0 0 0 0
            3 | 3 1 1 0 0 0 0 0 0
            2 | 2 1 0 0 0 0 0 0 0
            1 | 1 0 0 0 0 0 0 0 0
            0 +------------------
              0 1 2 3 4 5 6 7 8 9
                   Divisor
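Every entry in the table above appears to be the whole-number part of (remainder / divisor), so the table can be regenerated in a few lines. This is only a sketch under that assumption; `build_lookup_table` is a name invented for the example.

```python
def build_lookup_table(max_value: int = 9):
    """Map (remainder, divisor) to a quotient digit, using floor(remainder / divisor)."""
    return {
        (r, d): r // d
        for r in range(1, max_value + 1)
        for d in range(1, max_value + 1)
    }

table = build_lookup_table()

# Print the rows from remainder 9 down to remainder 1.
for r in range(9, 0, -1):
    print(r, "|", " ".join(str(table[(r, d)]) for d in range(1, 10)))
```

Looking up a digit is then a single table access, for example `table[(7, 3)]` for a remainder of 7 and a divisor of 3.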
       _
  16 = 24     (i.e. 1*10 + 6 = 2*10 + -4)
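A number written with negative digits is evaluated exactly like any other positional number. The following sketch (the function name `signed_digits_value` is my own) shows that the digit lists [1, 6] and [2, -4] name the same value in base 10.

```python
def signed_digits_value(digits, base=10):
    """Evaluate a list of digits, most significant first; digits may be negative."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value

print(signed_digits_value([1, 6]), signed_digits_value([2, -4]))
```

The same function works in base 4, where the allowed digits will run from -2 to +2.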
R[j+1] = 4 * (R[j] - q[j]*D) (3-1)
      _ _____
 (D * 2.22222...) < R[j+1] < (D * 2.22222...)
      [base 4]                    [base 4]
        _
 where: 2 = lowest quotient digit we can use (-2)
        2 = highest quotient digit we can use

 _ _____
 2.22222... (base 4) = -8/3 (decimal)
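If the value 8/3 looks like it came out of nowhere, the sketch below sums the base 4 digits 2.2222... one at a time and shows the total closing in on 8/3 (and, with every digit negated, on -8/3). The function name `repeating_twos_base4` is invented for this illustration.

```python
from fractions import Fraction

def repeating_twos_base4(terms: int) -> Fraction:
    """Value of 2.222... (base 4) cut off after `terms` digits."""
    return sum(Fraction(2, 4 ** k) for k in range(terms))

limit = Fraction(8, 3)
for terms in (1, 2, 5, 10):
    partial = repeating_twos_base4(terms)
    # The gap to 8/3 shrinks by a factor of 4 with each extra digit.
    print(terms, partial, limit - partial)
```

The gap after n digits is exactly (8/3) / 4^n, so the repeating expansion converges to 8/3.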
(D * -8/3) < R[j+1] < (D * 8/3)
(-8/3)*D < 4*(R[j] - qD) < (8/3)*D (3-2)
FOR q=+2: (-8/3)*D < 4*(R[j] - 2*D) < (8/3)*D      becomes:  (4/3)*D < R[j] < (8/3)*D
FOR q=+1: (-8/3)*D < 4*(R[j] - 1*D) < (8/3)*D      becomes:  (1/3)*D < R[j] < (5/3)*D
FOR q=0:  (-8/3)*D < 4*(R[j] - 0*D) < (8/3)*D      becomes:  (-2/3)*D < R[j] < (2/3)*D
FOR q=-1: (-8/3)*D < 4*(R[j] - (-1)*D) < (8/3)*D   becomes:  (-5/3)*D < R[j] < (-1/3)*D
FOR q=-2: (-8/3)*D < 4*(R[j] - (-2)*D) < (8/3)*D   becomes:  (-8/3)*D < R[j] < (-4/3)*D
If R is between (5/3)*D and (4/3)*D we can choose either q=2 or q=1.
If R is between (2/3)*D and (1/3)*D we can choose either q=1 or q=0.
If R is between (-1/3)*D and (-2/3)*D we can choose either q=0 or q=-1.
If R is between (-4/3)*D and (-5/3)*D we can choose either q=-1 or q=-2.
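The overlapping ranges can be explored directly from inequality (3-2). This sketch lists every quotient digit q from -2 to +2 that keeps the next remainder within bounds for a given R and D; the function name `allowed_digits` is hypothetical, and exact fractions are used so the boundary cases come out right.

```python
from fractions import Fraction

def allowed_digits(r, d):
    """Return every q in -2..2 satisfying (-8/3)*D < 4*(R - q*D) < (8/3)*D."""
    bound = Fraction(8, 3) * d
    return [q for q in range(-2, 3) if -bound < 4 * (r - q * d) < bound]

D = Fraction(1)
for r in (Fraction(3, 2), Fraction(1), Fraction(1, 2), Fraction(-3, 2), Fraction(3)):
    print(r, allowed_digits(r, D))
```

Where two digits are listed, either choice is valid; that freedom is what lets a lookup table work from an approximate remainder. An empty list means R is out of bounds.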
Above (8/3)D is out of bounds. If we ever end up there, something has gone wrong. The Pentium returns a value of q=0 if we ever hit this area.
Between (8/3)D and (5/3)D we must have q=2. Each cell which is in or partly overlaps this area must have the value 2. The five error cells, shown in red, have a value of 0 instead of 2.
Between (5/3)D and (4/3)D we can have either q=2 or q=1. The actual division line (as best as we know) is the staircase line drawn in brown. Cells above that staircase shaded light blue have q=2, cells below that staircase shaded yellow have q=1. Cells immediately above the division line, marked with a '?', are cells which might return either q=2 or q=1 depending upon other factors that will be discussed in chapter 4. [2004: I apologize, I no longer remember the reason for the '?' in these cells, and chapter 4 doesn't explain them. Chapter 4 does explain why every cell must return a value suitable also for the cell directly above it. The dividing lines shown here I believe are from Tim Coe's Source Code.]
Between (4/3)D and (2/3)D we must have q=1. Each cell which is in or partly overlaps this area must have the value 1.
Between (2/3)D and (1/3)D we can have either q=1 or q=0. The actual division line (as best as we know) is the staircase line drawn in brown. Cells above that staircase have q=1, cells below that staircase have q=0. Cells immediately above the division line, marked with a '?', are cells which might return either q=1 or q=0 depending upon other factors that will be discussed in chapter 4.
Between (1/3)D and (-1/3)D we must have q=0. Each cell which is in or partly overlaps this area must have the value 0.
Between (-1/3)D and (-2/3)D we can have either q=0 or q=-1. The actual division line (as best as we know) is the staircase line drawn in brown. Cells above that staircase have q=0, cells below that staircase have q=-1. Note this division staircase is not a mirror image of the one in the positive plane, but is instead shifted down by two cells. Cells immediately above the division line, marked with a '?', are cells which might return either q=0 or q=-1 depending upon other factors that will be discussed in chapter 4.
Between (-2/3)D and (-4/3)D we must have q=-1. Each cell which is in or partly overlaps this area must have the value -1.
Between (-4/3)D and (-5/3)D we can have either q=-1 or q=-2. The actual division line (as best as we know) is the staircase line drawn in brown. Cells above that staircase have q=-1, cells below that staircase have q=-2. Note again this division staircase is not a mirror image of the one in the positive plane, but is instead shifted down by two cells. Cells immediately above the division line, marked with a '?', are cells which might return either q=-1 or q=-2 depending upon other factors that will be discussed in chapter 4.
Between (-5/3)D and (-8/3)D we must have q=-2. Each cell which is in or partly overlaps this area must have the value -2. Note there are no error cells down here in the negative plane. The 5 error cells are all in the positive plane.
Below (-8/3)D is out of bounds. If we ever end up there, something has gone wrong. The Pentium returns a value of q=0 if we ever hit this area. (We actually do hit this area after hitting the error cell.)
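Given:
    D - Divisor (note this value remains fixed)
    R - Dividend (initial Remainder)

Step 1: Look up the quotient digit q in the P-D plot, using the current remainder R and the divisor D.

Step 2: Calculate the next remainder:
    R[j+1] = 4 * (R[j] - q*D)

Repeat steps 1 & 2 until enough quotient digits have been determined.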
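The two-step loop can be sketched in Python as follows. Be aware of the big assumption here: this is NOT the Pentium's P-D lookup table. In place of the table, the sketch picks q by rounding the exact ratio R/D to the nearest digit in -2..2, which always lands inside the permitted ranges derived earlier. The function name `srt_radix4` is made up for the example.

```python
from fractions import Fraction

def srt_radix4(dividend, divisor, steps=12):
    """Radix 4 digit recurrence with quotient digits in -2..2.
    Assumes |dividend| < (8/3) * divisor, e.g. two normalized mantissas."""
    r = Fraction(dividend)
    d = Fraction(divisor)
    quotient = Fraction(0)
    digits = []
    for j in range(steps):
        # Step 1 (idealized stand-in for the table lookup): choose q.
        q = max(-2, min(2, round(r / d)))
        digits.append(q)
        quotient += Fraction(q, 4 ** j)
        # Step 2: R[j+1] = 4 * (R[j] - q*D)
        r = 4 * (r - q * d)
    return quotient, r, digits

# Mantissas from the earlier example: 1.0111011 / 1.1001000 (binary)
n = Fraction(0b10111011, 2 ** 7)
d = Fraction(0b11001000, 2 ** 7)
quotient, remainder, digits = srt_radix4(n, d)
print(digits)
print(float(quotient), float(n / d))
```

Because each digit may be negative, a later digit can correct an earlier one that was chosen slightly too large. After n steps the quotient is accurate to within (8/3) / 4^n, so each iteration contributes two more bits.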