How Computers Generate Random Numbers - Chapter 4

Chapter 3          die4          Chapter 5
index
Back to Deley's Homepage



4.0    LINEAR CONGRUENTIAL RANDOM NUMBER GENERATORS


4.1    GENERATORS TESTED FOR THIS REPORT AND THEIR DEFINITION

The following generators were tested for this report. SEED is a value which gets updated at each iteration. X is the "random" number returned at each iteration. Each generator needs to be given an initial value of SEED to start with. Choosing good initial SEED values is discussed in sections 4.3, 4.4, and 4.5

1.  MTH$RANDOM
Used by VMS FORTRAN, VMS BASIC, and others.
   Definition:
   SEED = (69069*SEED + 1) mod 2**32
X = SEED/2**32
Returns real in range [0,1)     {including 0, excluding 1}



2.  RANDU
Obsolete but still found on some systems
   Definition:
   SEED = (65539*SEED) mod 2**31
X = SEED/2**31
Returns real in range [0,1)     {including 0, excluding 1}



3.  C and ANSI C
The ANSI C standard only states that rand() is a random number generator which generates integers between [0,RAND_MAX] inclusive, with RAND_MAX being a value defined in stdlib.h, and RAND_MAX being at least 32767.

The rand() function suggested by Brian W. Kernighan and Dennis M. Ritchie in their book "The C programming language" is the one shown below.
   Definition:

   #define RAND_MAX 32767          /* (2**15 - 1) */

SEED = (1103515245*SEED + 12345) mod 2**15
X = SEED
Returns integer in range [0, RAND_MAX]

Note this is not the only random number generator that can be used. Any random number generator can be used. The only stipulation is that RAND_MAX be at least 32767.

The rand() generator used by VAX C is almost the same except it uses a modulus of 2**32 instead of 2**15, giving RAND_MAX the value 2147483647.

Definition:

SEED = (1103515245*SEED + 12345) mod 2**31
X = SEED
Returns integer in range [0, 2**31)     {including 0, excluding 2**31}



4.  Microsoft C
The rand() function in the Microsoft C library v4.0.
   Definition:
   SEED = (214013*SEED + 2531011) mod 2**31
X = int( SEED/2**16 )
Returns integer in range [0, 2**15)     {0 inclusive, 2**15 exclusive}



5.  Turbo Pascal
The random function in Turbo Pascal v6.0
   Definition:
   SEED = (134775813*SEED + 1) mod 2**32
X = int( SEED/2**16 )    {i.e. the upper 16 bits of SEED}
Returns integer in range [0, 2**16)     {0 inclusive, 2**16 exclusive}





4.2 GENERAL FORMAT OF LINEAR CONGRUENTIAL RANDOM NUMBER GENERATORS
These generators all fall into the category of linear congruential generators. The general form of a linear congruential generator is:

SEED = (A * SEED + C) mod M
X = f(SEED)

The user supplies an initial integer value for SEED. On each call to the random number generator a new value of SEED is calculated by taking the current value of SEED, multiplying by A, adding C, and taking the remainder MOD M of the result. The new value of SEED is an integer between 0 and M-1. This new value of SEED is then converted into a value X which is returned to the user.

First note some obvious properties of sequences generated by this method:

1.  The modulus M is an upper bound on the number of different values SEED can take on.

2.If the new value of SEED is the same as one we had before, the sequence will begin to repeat itself in a cyclic manner.

3.All sequences generated by a linear congruential formula will eventually enter a cycle which repeats itself endlessly.

A good linear congruential formula will generate a long sequence of numbers before repeating itself. The maximum length a sequence can possibly be before repeating itself is M, the modulus. This is because there are only M different possible values SEED can take on. A linear congruential formula will generate a sequence of maximum length M if and only if the following three conditions are met (See Knuth for a proof of this theorem):

i)  C is relatively prime to M;
ii)B = (A - 1) is a multiple of P, for every prime P dividing M;
iii)B = (A - 1) is a multiple of 4, if M is a multiple of 4.




4.3 THE MTH$RANDOM ALGORITHM:
We now take a look at the MTH$RANDOM function. The algorithm MTH$RANDOM used to generate successive random numbers is:

SEED = (69069*SEED + 1) MOD 2**32
X = SEED/2**32


This is a linear congruential generator with:

A = 69069
C = 1
M = 2**32

Note first that MTH$RANDOM satisfies the three necessary and sufficient conditions above which insure that the sequence it generates is of maximum length:

i) 1 is relatively prime to 2**32
since 1 is relatively prime to all numbers.
ii)69068 is a multiple of 2, which is the only prime dividing 2**32.
iii)69068 is a multiple of 4.

Thus the sequence generated by MTH$RANDOM is of maximal length, generating all 2**32 possible values of SEED before repeating itself.

Note for the MTH$RANDOM function if SEED is initially an ODD value then the new value of SEED will always be an even value. And if SEED is an EVEN value, then the new value of SEED will be an ODD value. Thus if the algorithm is repeatedly called, the value of SEED will alternate between EVEN and ODD values.




4.4 THE RANDU ALGORITHM
Now let us look at the RANDU function. The algorithm RANDU uses to generate successive random numbers is:

SEED = (65539*SEED) MOD 2**31
X = SEED/2**31

This is a linear congruential generator with:

A = 65539
C = 0
M = 2**31

Note first that the modulus M for RANDU is 2**31 whereas the modulus M for MTH$RANDOM is 2**32. The maximum sequence length for RANDU is thus at least 1/2 of what it is for MTH$RANDOM. Note also that if SEED is initially an odd value, the new SEED generated will also be an odd value. Similarly, if SEED is initially an even value, the new SEED generated will also be an even value. Thus there are at least two separate disjoint cycles for the RANDU generator, depending upon whether you start with an even SEED value or and odd SEED value.

Actually, the situation is even worse than that. For odd SEED values there are two separate disjoint cycles, one generated by the SEED value 1, and one generated by the SEED value 5. Although we can generate each full cycle by initializing SEED to any value the cycle takes on, it is convenient to refer to the smallest SEED value the cycle takes on as the one which generates the cycle. This helps us differentiate between different cycles. We will denote the sequence generated by the SEED value 1 as <1>, and the sequence generated by the SEED value 5 as <5>.

The cycles <1> and <5> each contain 536,870,912 values. Together they account for all of the (2**31)/2 possible odd SEED values. The following table summarizes the cycles for odd values of SEED:

TABLE 4.2.1    RANDU WITH ODD VALUES OF SEED
CYCLE    LENGTH OF CYCLE
<1>536,870,912
<5>536,870,912

Even values of SEED do not have it so nice. There are 30 different disjoint cycles using even SEED values. Some of them are quite short:

TABLE 4.2.2    RANDU WITH EVEN VALUES OF SEED
CYCLE    LENGTH OF CYCLE
<2>268435456
<4>134217728
<8>67108864
<10>268435456
<16>33554432
<20>134217728
<32>16777216
<40>67108864
<64>8388608
<80>33554432
<128>4194304
<160>16777216
<256>2097152
<320>8388608
<512>1048576
<640>4194304
<1024>524288
<1280>2097152
<2048>262144
<2560>1048576
<4096>131072
<5120>524288
<8192>65536
<10240>262144
<16384>32768
<20480>131072
<32768>16384
<40960>65536
<81920>32768
<163840>16384
---------------
Total:1073709056


There are a total of (2**31)/2 = 1,073,741,824 possible even SEED values. So far we've accounted for 1,073,709,056 of them. The remaining 32768 SEED values are ones for which the 31 bit binary representation of them has the lower 16 bits set to 0. These SEED values are treated by RANDU as if the SEED value were 1, and they result in the cycle <1>.

We can see from this that even values of SEED should be avoided when using RANDU since they can give us very short cycles.




4.5 STARTING THE MTH$RANDOM GENERATOR
The initial SEED value for the MTH$RANDOM generator can be any integer value between 0 and (2**32)-1 = 4294967295. If we could "randomly" choose a number in this range and insure that each number had an equal probability of being chosen, we'd be set. However, human nature being what it is, simply picking a number out of the air usually does not satisfy the criterion for randomness, especially when there are over four billion numbers to choose from.

For many applications it doesn't matter what the initial few numbers from the random number generator are, and ANY initial seed value will suffice. Note also some myths about choosing the initial seed value for the MTH$RANDOM generator. The VAX/VMS FORTRAN manual recommends using an odd value. This piece of advice is obviously left over from the RANDU days. The obsolete RANDU generator does require an odd seed value, but for the MTH$RANDOM generator the seed value alternates between even and odd values at each iteration, so there is no advantage to starting with an odd value.

Another incorrect comment about the MTH$RANDOM generator can be found in the VMS source code comments. There it indicates that the MTH$RANDOM generator might do poorly in a 3-D test for randomness, but actually MTH$RANDOM does quite well in a 3-D test. It's the RANDU generator which does poorly in a 3-D test.

More important than starting the MTH$RANDOM generator to get one random sequence is the problem of restarting the generator to get several different sequences. You may wish to run a simulation several times and use a different random sequence each time.

One simple method would be to use consecutive seed values for each run. For example, you could generate ten different random sequences using initial seed values of 1 through 10. If we compare side by side these ten random sequences the first thing we'll note is the first value of each sequence is nearly identical. This is because there is a very strong correlation between the initial SEED value and the first value output from the generator. The simple relationship between SEED and the first output value is approximately:

x

where mod 1 here indicates when SEED/62183.7 becomes greater than 1 we wrap around and start counting from 0 again.

Thus if SEED changes by 10 the first value changes by only (10/62183.7) ~= 0.00016 . Note that this correlation between the initial SEED value and the first value output from the MTH$RANDOM generator will happen no matter what the initial value of SEED is. Large initial values of SEED are no better than small initial values of SEED.

Besides the initial expected correlation of the first value output from the generator, there are also unexpected correlations between other sequence elements. For the example above, not only are the first values of all 10 sequences nearly identical, there is also a strong correlation for elements 6, 13, 39, 41, 54, 60, 65, 72, 82, and 84.

It seems to be not all that easy to get away from this problem of occasional correlation between different sequences. If instead of using the seed values {1,2,3,...,10} you used the seed values {1,1001,2001,3001,...,9001}, you would still get occasional elements which are strongly correlated, they just occur in slightly different places. In this example they occur at elements 1, 11, 17, 32, and 80.

For many applications this occasional correlation may not be important. Each sequence when taken as a whole is quite different from the other ones.

One method for dealing with the correlation between the initial SEED value and the first number output from the random number generator is to simply throw away the first number. The second number output from the generator has a much weaker correlation with the initial SEED value. It is much more "random" in that very small changes in the initial SEED value produce very large and somewhat unpredictable changes in it's value. And since you're throwing away numbers you might as well throw away the second number too. The third number output from the MTH$RANDOM generator is quite random compared to the initial SEED value.

There is another way to view this operation of throwing away the first two values. We need to come up with a good "random" initial SEED value to start the generator with. So we can use our random number generator to help us generate a random SEED value to start our random number generator with! We select an initial nonrandom SEED value, then run our random number generator a few cycles to generate a random SEED value. We then restart our random number generator with the new random SEED value, and the output is then a properly initialized random sequence. Note that this is the same as starting with a nonrandom SEED value and throwing away the first few numbers output from the generator.

Another method to deal with correlation is to use the shuffling technique to be described in section 8 (Shuffling). This technique essentially removes all problems of correlation and also greatly improves the randomness of the numbers. If shuffling is used, then ANY set of initial seed values may be used with confidence including something as simple as {1,2,...10}.




4.6 STARTING THE RANDU GENERATOR
The RANDU generator should only be started with an odd SEED value, since even initial SEED values can result in sequences with very short periods. Note that for RANDU, if the initial SEED value is odd, then all subsequent SEED values will be odd.

The rest of the problems mentioned for MTH$RANDOM also apply to RANDU, except that instead of throwing away the first two random numbers output you should instead throw away the first ten numbers output.

Actually you shouldn't be using the RANDU generator at all because of it's very poor 3-D performance.




4.7 STARTING THE ANSI C GENERATOR
The ANSI C generator is very much like the MTH$RANDOM generator. All comments for the MTH$RANDOM generator also apply to the ANSI C generator. See the comments above for the MTH$RANDOM generator.




Chapter 3          die4          Chapter 5
index
Back to Deley's Homepage