## The Pentium division Flaw - Chapter 1

Introduction                   Chapter 2

## CHAPTER 1: REVIEW

Here we briefly review some simple concepts you probably already know.

### 1.1    INTERPRETING THE VALUE OF A DECIMAL NUMBER:

 (100) ones - - + + - - tenths (10-1) (101) tens - + | | + - hundredths (10-2) (102) hundreds + | | | | + thousandths (10-3) | | | | | | 4 1 9 . 5 8 3 ^ -decimal point

### 1.2    INTERPRETING THE VALUE OF A BINARY NUMBER:

 (20) ones - - - + + - - - halves (1/2) (21) twos - - + | | + - - fourths (1/4) (22) fours - + | | | | + - eighths (1/8) (24) eights + | | | | | | + sixteenths (1/16) | | | | | | | | 1 0 1 1 . 1 0 1 1 ^ -radix point

### 1.3    MOVING THE DECIMAL POINT OF A BASE 10 NUMBER:

 Given an ordinary decimal number such as 14.195835 we can move the decimal point to the right or left by multiplying or dividing by powers of ten. For example, to move the decimal point right two places, we multiply by 102 = 100 14.195835 * 100 = 1419.5835

### 1.4    MOVING THE RADIX POINT OF A BINARY NUMBER:

 Given an ordinary binary number such as 1011.1011 we can move the radix point to the right or left by multiplying or dividing by powers of two. For example, to move the decimal point right two places, we multiply by 22 = 4 1011.1011 * 4 = 101110.11

### 1.5    NORMALIZED NOTATION: BASE 10

 Given an ordinary decimal number such as 14.195835 we can move the decimal point so it lies directly after the first digit by multiplying or dividing by an appropriate power of 10. In our example:

 14.195835 = 1.4195835 * 101 mantissa exponent

 In this representation 1.4195835 is called the mantissa, and 101 is called the exponent.

### 1.6    NORMALIZED NOTATION: BASE 2

 Given an ordinary binary number such as 1011.1011 we can move the radix point so it lies directly after the first digit by multiplying or dividing by an appropriate power of 2. In our example:

 1011.1011 = 1.0111011 * 23 mantissa exponent

 In this representation 1.0111011 is called the mantissa, and 23 is called the exponent.

### 1.7    DIVISION NOMENCLATURE

```               dividend    numerator                        12
quotient = -------- = -----------         Example:  3 = --
divisor    denominator                        4

quotient                         3
+---------         Example:     +---
divisor | dividend                    4 | 12
```

### 1.8    DIVISION OF NORMALIZED NUMBERS: BASE 10

Given any two decimal numbers that we wish to divide, such as:

```                    14.195835
----------
119.716320
```

we can normalize the values using exponential notation to obtain:

1.4195835 * 101      1.4195835   101      1.4195835
-----------------  =  ---------- * ----  =  ---------- * 10(1-2)
1.19716320 * 102     1.19716320   102     1.19716320

Thus the problem has been reduced to the division of two normalized numbers followed by a shifting of the decimal point.

### 1.9    DIVISION OF NORMALIZED NUMBERS: BASE 2

Given any two binary numbers that we wish to divide, such as:

```                   1011.1011
---------
11.001000
```

we can normalize the values using exponential notation to obtain:

```       1.0111011 * 2^3     1.0111011   2^3     1.0111011
---------------  =  --------- * ---  =  --------- * 2(3-1)
1.1001000 * 2^1     1.1001000   2^1     1.1001000
```

 Thus the problem has been reduced to the division of two normalized numbers followed by a shifting of the radix point. The important point to note is that any division operation can be reduced to dividing normalized mantissas. The exponents and signs can be handled separately.

### 1.10    PENTIUM REPRESENTATION OF FLOATING POINT NUMBERS

The Pentium uses IEEE standard 594 for representing floating point numbers. In this standard a single precision floating point variable is represented using a 24 bit mantissa and an exponent which can take on values between +127 and -126.

The Pentium also uses two's compliment notation to represent negative numbers. A positive number is negated by complementing all the bits (one's complement, i.e. "flip the bits") and then adding 1 to the result.

Carry-Save addition is a method of quickly reducing the sum of three variables A, B, and C, down to the sum of two variables PARTIAL_SUM and CARRY, using only bitwise logical operations (AND, OR, Exclusive-OR). With Carry-Save addition we operate on all of the columns at once, placing the partial sum of that column at the bottom, and saving any overflow as a carry digit at the top. The bottom is our PARTIAL_SUM, and the carry digits at the top become our CARRY. The true sum of (A + B + C) is thus reduced to the sum (PARTIAL_SUM + CARRY).

EXAMPLE #1:

```          CARRY= 0010.1100010
------------
A=101.11010011
B=001.01100110                CARRY= 0010.1100010
C=000.01100000          PARTIAL_SUM=  100.11010101
------------                       -------------
PARTIAL_SUM= 100.11010101                TOTAL= 0111.10011001
```

Note that:

```     PARTIAL_SUM = (A ^ B ^ C)
CARRY       = (A & B) | (A & C) | (B & C)

^  represents Exclusive-OR
&  represents AND
|  represents OR
```

Thus both PARTIAL_SUM and CARRY are calculated quickly using nothing but bitwise logical operations.

The one drawback to Carry-Save addition is if we want to know the true sum of (A + B + C) we still have to add (PARTIAL_SUM + CARRY), and this can take time. The classic method of adding two numbers is to start with the right most column and add it up, placing the partial sum for that column below and any overflow as carry digits above the next column to the left. Then move left one column and repeat. Continue in this manner, adding one column at a time, until all the columns have been added and the calculation is complete. This method of addition is slow since each number may have 64 bits, and we must loop through all of the columns, working on only one column at a time, propagating the overflow on to the next column as carry bits.

However, it may be that we don't need an exact answer but that an approximate answer will do fine. We can get an approximate answer by adding just the first few columns and ignoring the rest. For Example, we can calculate an approximate answer to example #1 above by considering just the first 7 columns:

```         CARRY= 0010.110
PARTIAL_SUM=  100.110
-------------
approx total= 0111.100
```

Here our total is approximate, but it is very close to the actual answer. In this case the total dropped down from the true answer to the first integral multiple of 0.001 below the true answer.

Consider now example #2 below which shows a slightly different PARTIAL_SUM and CARRY. Notice the actual sum is identical to the sum we had before, but the approximate total obtained by adding the first seven bits only is lower this time:

EXAMPLE #2:

```         CARRY= 0010.1011101               CARRY= 0010.101
PARTIAL_SUM=  100.11011111        PARTIAL_SUM=  100.110
-------------                     -------------
actual total= 0111.10011001       approx total= 0111.011
```

This time our approximate total dropped down to the second integral multiple of 0.001 rather than the first integral multiple as it did before. The difference is in example #1 the sum of the truncated parts is less than 0.001, whereas in example #2 the sum of the truncated parts is greater than 0.001:

```     Example #1:                          Example #2:
|truncated                           |truncated
|part                                |part
|                                    |
CARRY=     .   0010                  CARRY=     .   1101
PARTIAL_SUM=     .   10101           PARTIAL_SUM=     .   01111
-------------                        -------------
TOTAL=     .___11001                 TOTAL=     .__101001
```

Mathematically our approximate total is equal to the true total minus the sum of the truncated parts:

```    approx total = total - (sum of truncated parts)
```

Consider now a worst case scenario in which all the truncated bits are 1's. Convince yourself that even in this case the sum of the truncated parts will always be less than (2 x 0.001). Consider now the other extreme scenario where all the truncated bits are 0's. Convince yourself that in this trivial case the sum of the truncated parts is 0.

We have thus determined the bounds on our approximate total:

```     total >= approx total > total - (2 x 0.001)
```

Our approximate total will always be equal to or lower than the true answer, and the error will always be less than (2 x 0.001). This is a crucial result so make sure you understand it.

Introduction                   Chapter 2