From coe@vitsemi.com Mon Nov 28 07:14:48 EST 1994 Article: 21882 of comp.sys.intel Newsgroups: comp.sys.intel Path: news.mathworks.com!hookup!olivea!news.hal.COM!decwrl!amd!netcomsv!vitsemi!coe From: coe@vitsemi.com (Tim Coe) Subject: Re: Glaring FDIV bug in Pentium! Message-ID: <1994Nov28.063342.23267@vitsemi.com> Sender: coe@vitsemi.com (Tim Coe) Organization: Vitesse Semiconductor Date: Mon, 28 Nov 94 06:33:42 GMT Lines: 548 There is a C model of the Pentium hardware divider at the end of this message that accurately predicted many of the stated failing divides, and accurately confirms all failing divides of which I am aware. I worked on an IEEE hardware FPU from 1989-1991. As an FPU designer I am naturally interested in algorithms for hardware arithmetic. I am currently working on something completely different, but I still occasionally support related development tasks. I saw the first post relating to the Pentium FDIV bug in comp.sys.intel. When I saw the post from Andreas Gruss (included), I saw a pattern and the opportunity to completely reverse engineer Intel's divider. I took to this task with great vigor, as it is very rare that one gets visibility into the details of someone else's leading edge design. I decided to post my results when it appeared to me that Intel was not coming clean with the characteristics of the bug. The best characteristic and only characteristic of the bug to come from Intel is its 1 in 9 billion probability of occurring with random operands. The worst characteristic of the bug is that the specific operands that are most at risk are integers +/- very small deltas. The integers 3, 9, 15, 21, and 27 minus very small deltas are THE at risk divisors. (In particular the maximum expressible single precision, double precision, and extended precision numbers less than 3, 9...27 are all seriously at risk divisors.) The other bad characteristic of this bug that I did not hear >from Intel is that the worst case error induced by the bug was considerably greater than the 4 parts per billion error observed by Professor Nicely. It appeared to me that Intel was attempting to minimize its exposure by focusing on the 1 in 9 billion probability of error that it publicized and the 4 part per billion error observed by Professor Nicely. I posted my conclusions so that the Intel user community could be a peer to Intel when determining what applications may be at risk due to this bug. I think Intel does outstanding technical work. After all, the only reason I was reading comp.sys.intel was that I was considering the purchase of a P90 system. After this brouhaha I will still buy a P90 system, though when I do I will ask for a fixed chip and a guarantee that if I find after receiving my system that it does not contain said fixed chip that the seller will replace the unfixed chip posthaste. I regard the fact that the bug occurred as completely excusable, for I have designed many chips and therefore designed many bugs. I posted an additional program not included here that scanned single precision operands for errors induced that were greater that one single precision least significant bit. I received back a list of 1738 problem single precision divisions (out of 64 trillion). Herb Savage provided the list. The following divisors and their binary scalings (by this I mean different only in the binary exponent) appear to account for >95% of the divide errors: 3.0 > divisor >= 3.0 - 36*(2^-22) 9.0 > divisor >= 9.0 - 36*(2^-20) 15.0 > divisor >= 15.0 - 36*(2^-20) 21.0 > divisor >= 21.0 - 36*(2^-19) 27.0 > divisor >= 27.0 - 36*(2^-19) A divide with a divisor in one of the above ranges has roughly a 1 in 200000 chance of suffering loss of precision in double extended precision operations. The other <5% of the divide errors can be accounted for by changing the above 36 to 2048. All dividends are somewhat at risk versus the above divisors. The following formula identifies dividends that are at particularly high risk for errors in general and also for relatively large errors: dividend = intdividend + deltadividend or dividend = intdividend - deltadividend divisor = intdivisor - deltadivisor intdivisor = 3, 9, 15, 21, 27 and one of the following must hold true, which one depends on the exponent in the IEEE representation of the dividend in question: intdividend = intdivisor/3 mod intdivisor intdividend = 2*intdivisor/3 mod intdivisor The restrictions on the above deltadividend and deltadivisor are somewhat complex, the details of which are left as an exercise for the reader. ;-) I have not worked out the restrictions in detail. Here are the previous posts to comp.sys.intel. Read and enjoy. -Tim Coe coe@vitsemi.com ---- First and Second Post text ---- On a Packard Bell P90 PC I performed the following calculation using Microsoft Windows Desk Calculator: (4195835 / 3145727) * 3145727 [typo corrected from earlier posts] The result was 4195579. This represents an error of 256 or one part in ~16000. ak@ananke.s.bawue.de (Andreas Kaiser) writes >Usually, the division is correct (what did you expect?). Just a few >operands are divided wrong. My results (P90) with ~25.000.000.000 >random arguments (within 1..2^46), with even results divided by two >until odd, to assure unique mantissa patterns (the binary exponent >doesn't care, of course). > > 3221224323 > 12884897291 > 206158356633 > 824633702441 > 1443107810341 > 6597069619549 > 9895574626641 > 13194134824767 > 13194134826115 > 13194134827143 > 13194134827457 > 13194138356107 > 13194139238995 > 26388269649885 > 26388269650425 > 26388269651561 > 26388276711601 > 26388276712811 > 52776539295213 > 52776539301125 > 52776539301653 > 52776539307823 > 52776553426399 > > Gruss, Andreas > >-------------------- >-- Andreas Kaiser -- internet: ak@ananke.s.bawue.de >-------------------- fidonet: 2:246/8506.9 Analysis of these numbers reveals that all but 2 of them are of the form: 3*(2^(K+30)) - 1149*(2^(K-(2*J))) - delta*(2^(K-(2*J))) where J and K are integers greater than or equal to 0, and delta is a real number that has varying ranges depending on J but can generally be considered to be between 0 and 1. The 2*J terms in the above equation leads to the conclusion that the Pentium divider is an iterative divider that computes 2 bits of quotient per cycle. (This is in agreemnent with the quoted 39 cycles per extended long division from the Pentium data book. The technical name for this type of divider is radix 4) The extremely low probability of error (1 in 10^10) implies that the remainder is being held in carry save format. (Carry save format is where a number is represented as the sum of two numbers. This format allows next remainder calculation to occur without propagating carries. The reason that carry save format is implied by the error probability is that it is very difficult but not impossible to build up long coincident sequences of ones in both the sum word and the carry word.) I assumed the digit set was -2, -1, 0, 1, and 2. (Having 5 possible digits in a radix 4 divider allows a necessarry margin for error in next digit selection. When doing long division by hand the radix 10 and 10 possible digits allow no margin for error.) Taking the above into consideration I wrote the tentative model of Pentium divide hardware included below so that I might watch what bit patterns developed in the remainder. After running the numbers that were known to fail and numbers near them that appeared not to fail I determined the conditions for failure listed in the program. Analysis of the precise erroneous results returned on the bad divides indicates that a bit (or bits) is being subtracted >from the remainder at or near its most significant bit. A modeling of this process is included in the program. The program accurately explains all the published errors and accurately predicted the error listed at the beginning of the article. The determination of the quotient from the sequence of digits is left as an exercise for the reader ;-). I would like to thank Dr. Nicely for providing this window into the Pentium architecture. ---- Third Post ---- Since then I performed the following calculations in Microsoft Windows Desk Calculator on a Pentium machine with the following results: (41.999999/35.9999999)*35.9999999 - 41.999999 ==> (-0.75)*(2^-13) (48.999999/41.9999999)*41.9999999 - 48.999999 ==> (-1.0)*(2^-13) (55.999999/47.9999999)*47.9999999 - 55.999999 ==> (-1.0)*(2^-13) (62.999999/53.9999999)*53.9999999 - 62.999999 ==> (-1.0)*(2^-13) (54.999999/59.9999999)*59.9999999 - 54.999999 ==> (-1.0)*(2^-13) (5244795/3932159)*3932159 - 5244795 ==> (-1.0)*(2^8) I chose these calculations in anticipation of them exposing further Pentium FDIV failure modes. They did. The size of the erroneous results are exactly consistant with the final version of tentive Pentium divider model included below and in no way can be attributed to a Desk Calculator bug. The existance of these results pins most of the digit selection thresholds included in the model. I also performed the following calculations that did NOT produce erroneous results: (38.499999/32.9999999)*32.9999999 - 38.499999 ==> 0 (45.499999/38.9999999)*38.9999999 - 45.499999 ==> 0 I have been following this thread with great interest. One misperception that needs clearing is that this is an extended precision problem. This bug hits between 50 and 2000 single precision dividend divisor pairs (out of a total of 64 trillion.) Another misperception is related to the magnitude of the relative error. I would propose the following table of probabilities of getting the following relative errors when performing random double extended precision divides: relerror = (correct_result - Pentium_result)/correct_result Error Range | Probability ------------------------------------------- 1e-4 < relerror | 0 1e-5 < relerror < 1e-4 | 0.3e-11 1e-6 < relerror < 1e-5 | 0.6e-11 1e-7 < relerror < 1e-6 | 0.6e-11 1e-8 < relerror < 1e-7 | 0.6e-11 . . 1e-18 < relerror < 1e-17 | 0.6e-11 1e-19 < relerror < 1e-18 | 0.6e-11 Examination of the above divide failures reveals that both the dividend and divisor are integers minus small deltas. Also notable is the induced error is roughly delta^(2/3). The integers in the divisors are actually restricted to those listed and their binary scalings. The integers in the dividends may be much more freely chosen. This type of dividend divisor pair actually occurs quite often when forward integrating trajectories off metastable points. This is because metastable points in systems often have certain exactly integral characteristics and as a path diverges from the metastable point these characteristics slowly diverge >from their integral values. If the forward integration algorithm happens to divide these characteristics, and they happen to be for example 7 and 3, it will get nailed. The divider model includes support for up to 60 bits of divisor and up to 64 bits of dividend. The last four bits of dividend are kludged in. Here is a list of failing dividend divisor mantissas in hex. A dash between two numbers indicates an inclusive failing range. Compile the program and run these numbers through it and watch the bits dance: 800bf6 bffffc a00ef6 effffc a808d2 8fffe e00bd2 bfffe a7ffd2 8fffe c3ffd2 a7ffe dfffd2 bfffe fbffd2 d7ffe f9ffdc7 efffe b9feab7-b9feabf 8fff b9ffab0e-b9ffab7f 8fffc -the following double extended pair fails 3 times!!! c3ffd2eb0d2eb0d2 a7ffe e00bd229315 bfffe 9fffef5-9fffeff effff4 9ffff21-9ffff3f effff8 9ffff4d-9ffff7f effffc f008e35-f008e3f 8ffff4 f008e6d-f008e7f 8ffff6 f008ea1-f008ebf 8ffff8 f008ed9-f008eff 8ffffa f008f0d-f008f3f 8ffffc f008f45-f008f7f 8ffffe f008f7e 8ffffff1 f0023e 8fffff8 effff0d 8ffffc a808d1b-a808d3f 8fffe a808d67-a808d7f 8fffe4 a808db3-a808dbf 8fffe8 a808dff 8fffec An example run of the program (using the first reported error): ---Enter dividend mantissa in hex: 8 ---Enter divisor mantissa in hex: bfffffb829 ---next digit 1 ---1111000000000000000000000001000111110101101111111111111111111100 ---0000000000000000000000000000000000000000000000000000000000000100 ---11110000000000000000000000010001 iteration number 1 ---. ---. ---. ---next digit -1 ---0011111111100100101011110100110000010111010000000000000000000000 ---1101111111111111111110110110010010010000000000000000000000000000 ---00011111111001001010101010110000 iteration number 14 ---next digit 2 ---A bug condition has been detected. ---Enter 0 for correct result or 1 for incorrect result: 1 ---0000000001101101010100001000000111110110011111111111111111111100 ---1111111100100101010110100110010010010010000000000000000000000100 ---11111111100100101010101011100101 iteration number 15 ---next digit 0 ---1111110100100000001010111001010110010001111111111111111111100000 ---0000000100101010100000000000010010010000000000000000000000100000 ---11111110010010101010101110011001 iteration number 16 ---. ---. ---. -Tim Coe coe@vitsemi.com #include main() { unsigned r0, r1, r2, r3, r4, r5, r6, s0, s1; unsigned t0, t1, t2, t3, cycle, f, incorrect, spup; unsigned thr_m2_m1, thr_m1_0, thr_0_1, thr_1_2, positive, errornum; char line[30], *linepoint; r0 = 0x0bffffc0; r1 = 0; r2 = 0x0800bf60; r3 = 0; printf("First digit of mantissas must be between 8 and f\n"); printf("Enter dividend mantissa in hex: "); *(line+15) = '0'; scanf("%s", line); linepoint = line; while (*linepoint != '\0') linepoint++; while (linepoint < line + 15) *linepoint++ = '0'; *(line+16) = '\0'; sscanf(line+15, "%x", &spup); spup = (spup >> 2) | (12 & (spup << 2)); *(line+15) = '\0'; sscanf(line+7, "%x", &r3); *(line+7) = '\0'; sscanf(line, "%x", &r2); printf("Enter divisor mantissa in hex: "); scanf("%s", line); linepoint = line; while (*linepoint != '\0') linepoint++; while (linepoint < line + 15) *linepoint++ = '0'; *(line+15) = '\0'; sscanf(line+7, "%x", &r1); *(line+7) = '\0'; sscanf(line, "%x", &r0); r4 = 0; r5 = 0; t0 = r2; while (!(t0 & 1)) t0 = t0 >> 1; printf("%d\n", t0); t0 = r0; while (!(t0 & 1)) t0 = t0 >> 1; printf("%d\n", t0); /* These thresholds are VERY tentative. */ /* There may be bugs in them. */ t0 = r0 >> 22; /* Next threshold is strongly indicated */ /* by the failure of 1/9895574626641 */ if (t0 < 36) thr_0_1 = 3; /* Next threshold is strongly indicated */ /* by the failure of 1/824633702441 */ else if (t0 < 48) thr_0_1 = 4; /* Next threshold is strongly indicated */ /* by the failure of 5244795/3932159 */ else if (t0 < 60) thr_0_1 = 5; else thr_0_1 = 6; thr_m1_0 = 254 - thr_0_1; if (t0 < 33) thr_1_2 = 11; else if (t0 < 34) { printf("This model does not correctly handle\n"); printf("this divisor. The Pentium divider\n"); printf("undoubtly handles this divisor correctly\n"); printf("by some means that I have no evidence\n"); printf("upon which speculate.\n"); exit(); } /* Next threshold is strongly indicated */ /* by the failure of 41.999999/35.9999999 */ else if (t0 < 36) thr_1_2 = 12; else if (t0 < 39) thr_1_2 = 13; /* Next threshold is strongly indicated */ /* by the failure of 1/1443107810341 and */ /* by the failure of 48.999999/41.9999999 */ else if (t0 < 42) thr_1_2 = 14; else if (t0 < 44) thr_1_2 = 15; /* Next threshold is strongly indicated */ /* by the failure of 55.999999/47.9999999 */ else if (t0 < 48) thr_1_2 = 16; /* Next threshold is strongly indicated */ /* by the failure of 62.999999/53.9999999 */ else if (t0 < 54) thr_1_2 = 18; /* Next threshold is strongly indicated */ /* by the failure of 54.999999/59.9999999 */ else if (t0 < 60) thr_1_2 = 20; else thr_1_2 = 23; thr_m2_m1 = 254 - thr_1_2; if (t0 == 35) errornum = 22; else if (t0 == 41) errornum = 26; else if (t0 == 47) errornum = 30; else if (t0 == 53) errornum = 34; else if (t0 == 59) errornum = 38; else errornum = 128; incorrect = 0; cycle = 1; /* The cycle limit would be ~34 instead of */ /* 18 for double extended precision. */ while (cycle < 18) { t0 = 255 & ((r2 >> 24) + (r4 >> 24)); if ((t0 > thr_m1_0) || (t0 < thr_0_1)) { s0 = 0; s1 = 0; positive = 0; printf("next digit 0\n"); } else if (t0 > thr_m2_m1) { s0 = r0; s1 = r1; positive = 0; printf("next digit -1\n"); } else if (t0 < thr_1_2) { s0 = ~r0; s1 = ~r1; positive = 4; printf("next digit 1\n"); } else if (t0 & 128) { s0 = (r0 << 1) | (r1 >> 31); s1 = r1 << 1; positive = 0; printf("next digit -2\n"); } else { s0 = ~((r0 << 1) | (r1 >> 31)); s1 = ~(r1 << 1); positive = 4; printf("next digit 2\n"); if ((t0 == errornum) && (((r2 >> 21) & 7) == 7) && (((r4 >> 21) & 7) == 7)) { printf("A bug condition has been detected.\n"); printf("Enter 0 for correct result or 1 for incorrect result: "); scanf("%d", &incorrect); if (incorrect) { /* These amounts that are subtracted from the */ /* remainder have NOT been extensively verified. */ if (errornum == 22) s0 = s0 - (3 << 25); else s0 = s0 - (4 << 25); } } } t0 = s0 ^ r2 ^ r4; t1 = s1 ^ r3 ^ r5; t2 = (s0 & r2) | (s0 & r4) | (r2 & r4); t3 = (s1 & r3) | (s1 & r5) | (r3 & r5); r2 = (t0 << 2) | (t1 >> 30); r3 = t1 << 2; r4 = (t2 << 3) | (t3 >> 29); r5 = (t3 << 3) | positive | (spup & 3); spup = spup >> 2; t0 = r2; f = 32; while (f--) { if (t0 & (1 << 31)) putchar('1'); else putchar('0'); t0 = t0 << 1; } t0 = r3; f = 32; while (f--) { if (t0 & (1 << 31)) putchar('1'); else putchar('0'); t0 = t0 << 1; } putchar('\n'); t0 = r4; f = 32; while (f--) { if (t0 & (1 << 31)) putchar('1'); else putchar('0'); t0 = t0 << 1; } t0 = r5; f = 32; while (f--) { if (t0 & (1 << 31)) putchar('1'); else putchar('0'); t0 = t0 << 1; } putchar('\n'); t0 = r2 + r4; f = 32; while (f--) { if (t0 & (1 << 31)) putchar('1'); else putchar('0'); t0 = t0 << 1; } printf(" iteration number %d\n", cycle++); } }