How Computers Generate Random Numbers - Chapter 3

Chapter 2          die3          Chapter 4
index
Back to Deley's Homepage



3.0    ANALYSIS OF A SEQUENCE OF RANDOM NUMBERS

Enough theory for now. Let's turn to the analysis of a sequence of random numbers. There's nothing "random" about any specific sequence of numbers. But we can analyze the sequence of numbers to see if we can find any sufficient reason to doubt the hypothesis that the numbers could have come from multiple independent trials of a chance experiment.

In the following subsections we test the MTH$RANDOM and RANDU random number generators using chisq tests and compare the results. We will soon find out why the infamous RANDU generator is considered a very poor random number generator. The details of what's inside these generators is deferred until section 5.0 . For now accept that we can set up a random number generator to generate random numbers between 1 and NBINS, and that we can use a SEED value of 1 to initialize these generators.



3.1    ONE-DIMENSIONAL TESTS

Our first test then is to see if each possible outcome appears to be equally likely. Imagine we have a number of bins, and we toss balls at the bins. After throwing a certain number of balls at the bins, we count up how many balls fell in each bin. We then apply the chisq test to determine if the balls appear to have an equal probability of falling in each bin.

Arbitrarily choosing NBINS = 30 (number of bins), SEED = 1 (initial SEED value), and BALLS = 300, the chisq test was performed 10 times on the MTH$RANDOM generator with the following results:

30 BINS,   300 BALLS,   10 chisq TESTS,   MTH$RANDOM
BALLS= 300   chisq = 35.1999969   PROB= 0.8019531
BALLS= 300chisq = 22.8000031PROB= 0.2143845
BALLS= 300chisq = 36.7999992PROB= 0.8485908
BALLS= 300chisq = 19.8000011PROB= 0.1009650
BALLS= 300chisq = 48.7999992PROB= 0.9878765
BALLS= 300chisq = 29.3999996PROB= 0.5556139
BALLS= 300chisq = 22.8000011PROB= 0.2143840
BALLS= 300chisq = 36.5999985PROB= 0.8432655
BALLS= 300chisq = 29.3999996PROB= 0.5556139
BALLS= 300chisq = 18.6000004PROB= 0.0688844

Here chisq is the chi-square value, and PROB is the percentage of random sequences (with 30 - 1 = 29 degrees of freedom) which would have a lower chisq value. An extremely high value of PROB, say above 0.990, would indicate that some bins had a much higher than expected number of balls in them while other bins have a much lower than expected number of balls in them. In general, an extremely high value of PROB indicates that the experimental results do not reinforce the hypothesis that each bin had an equally likely chance of getting a ball. So far all of the PROB values appear reasonable.

We can further analyze the results of the 10 chisq tests above by noting that the distribution of the 10 chisq values should approximate the chisq distribution (for 30 - 1 = 29 degrees of freedom). We can apply the KS test to the 10 chisq values to see how well their distribution approximates the chisq distribution for d.f. = 29.

KS     D= 0.2019531      PROB= 0.8093885

Here D is the KS D value which is the maximum distance between the approximate and the real chisq function, and PROB is the percentage of times we would have a larger D value, meaning the approximation could be worse. A very low value of PROB here means it couldn't get much worse. Anything above a very low value means our distribution bears at least some resemblance to the function we're comparing it to. Again this result appears reasonable.

The following values were obtained when the above chisq test was applied 10 times to the RANDU generator:

30 BINS,   300 BALLS,   10 chisq TESTS,   RANDU
BALLS= 300   chisq = 31.3999996   PROB= 0.6531839
BALLS= 300chisq = 60.7999992PROB= 0.9995093
BALLS= 300chisq = 33.4000015PROB= 0.7381005
BALLS= 300chisq = 24.3999977PROB= 0.2910264
BALLS= 300chisq = 20.8000011PROB= 0.1336691
BALLS= 300chisq = 16.6000023PROB= 0.0319657
BALLS= 300chisq = 32.0000000PROB= 0.6801265
BALLS= 300chisq = 30.2000027PROB= 0.5959312
BALLS= 300chisq = 31.2000027PROB= 0.6439425
BALLS= 300chisq = 45.5999985PROB= 0.9742958

And a KS test on the above 10 chisq values:

KS     D= 0.2959312      PROB= 0.3452127


Again there is nothing as yet unusual about either the MTH$RANDOM or RANDU generators.

We can also test the MTH$RANDOM and RANDU generators using the KS test directly to see how well the output fits the hypothesized uniform distribution.


KS TEST ON MTH$RANDOM (Start with SEED = 1)
KS   NPOINTS= 10      D= 0.2142200     PROB= 0.7484154
KSNPOINTS= 100D= 0.0944867PROB= 0.3338285
KSNPOINTS= 1000D= 0.0314864PROB= 0.2746516
KSNPOINTS= 10000D= 0.0079555PROB= 0.5514056
KSNPOINTS= 100000D= 0.0017763PROB= 0.9105781
KSNPOINTS= 1000000D= 0.0009270PROB= 0.3565834


KS TEST ON MTH$RANDOM (Start with SEED = 1)
KS   NPOINTS= 10      D= 0.6555050     PROB= 0.0003705
KSNPOINTS= 100D= 0.1326773PROB= 0.0591586
KSNPOINTS= 1000D= 0.0337385PROB= 0.2050490
KSNPOINTS= 10000D= 0.0063568PROB= 0.8138239
KSNPOINTS= 100000D= 0.0042999PROB= 0.0495556
KSNPOINTS= 1000000D= 0.0007990PROB= 0.5457213


Note the only thing questionable here is the extremely low value of PROB on RANDU for the first 10 points. This indicates that perhaps a seed value of 1 for the RANDU generator is not a good place to start it. Indeed, the first 10 numbers from RANDU starting with SEED = 1 are:


RANDU first 10 numbers starting with seed = 1
      1      0.0001831
2 0.0032959
3 0.0444950
4 0.5339386
5 0.0068024
6 0.8734164
7 0.1705012
8 0.3222913
9 0.9906484
10 0.7260775


Notice that half of the first 10 values are either below 0.006 or above 0.99. This alone does not indicate a problem with the RANDU generator, it does indicate that a SEED value of 1 is a poor choice to start it with. Note that the MTH$RANDOM generator does not have this problem when started with SEED = 1.



3.2    TWO-DIMENSIONAL TESTS

Our original hypothesis about our sequence of supposedly random numbers was that it could have come from a repeated chance experiment in which each possible outcome was equally likely and each test of the chance experiment was independent of all previous tests. So far we have been testing the sequence to see if each possible outcome appears to be equally likely. The remainder of our tests now focus on checking the hypothesis that each outcome is independent of all previous outcomes.

If we throw a die and get a 5, what is the probability that the next throw of the die will be a 3? If the probability is still 1/6, we say the next outcome is independent of the previous outcome. i.e. knowledge of the previous outcome does not give us any help in predicting the next outcome.

The mathematical definition of statistical independence between two random variables can be written as:

Pr( X(i+1) | X(i) ) = Pr( X(i+1) )


The serial test in two dimensions checks to see if there is any correlation between two consecutive outcomes of the random number generator. This is a chisq test performed on pairs of random numbers. For the die throwing experiment, we would construct a total of 36 bins numbered BIN(1,1) - BIN(6,6). One bin for each possible pair of numbers we may get. We then throw the die twice to get an ordered pair of numbers, and drop a ball in the corresponding bin. We throw the die enough times so that each bin is expected to get an average of at least 5 balls. Then we perform a chisq test on the results to see if the balls appear to be more or less evenly distributed among all the bins.

This serial test for pairs of outcomes was done on both the MTH$RANDOM and RANDU generators with the following results:

SERIAL 2D   BINS=30x30   BALLS = 9000   chisq TESTS = 10   MTH$RANDOM
BALLS= 900   chisq = 895.799865   PROB= 0.4761399
BALLS= 900chisq = 945.200134PROB= 0.8615244
BALLS= 900chisq = 883.600036PROB= 0.3633031
BALLS= 900chisq = 905.000000PROB= 0.5624363
BALLS= 900chisq = 902.398986PROB= 0.5382197
BALLS= 900chisq = 911.800170PROB= 0.6241364
BALLS= 900chisq = 932.400573PROB= 0.7863315
BALLS= 900chisq = 865.400085PROB= 0.2157318
BALLS= 900chisq = 909.599670PROB= 0.6043593
BALLS= 900chisq = 901.799438PROB= 0.5325246

KS test to see how well the 10 chisq values fit the chi-square distribution:

KS     D= 0.2667681      PROB= 0.4751010

SERIAL 2D   BINS=30x30   BALLS = 9000   chisq TESTS = 10   RANDU
BALLS= 900   chisq = 848.799926   PROB= 0.1168594
BALLS= 900chisq = 891.799987PROB= 0.4386667
BALLS= 900chisq = 874.199829PROB= 0.2827876
BALLS= 900chisq = 884.199768PROB= 0.3687753
BALLS= 900chisq = 799.599975PROB= 0.0077433
BALLS= 900chisq = 841.800598PROB= 0.0864621
BALLS= 900chisq = 867.800720PROB= 0.2330545
BALLS= 900chisq = 892.601013PROB= 0.4461077
BALLS= 900chisq = 952.400085PROB= 0.8945085
BALLS= 900chisq = 921.599914PROB= 0.7069101

KS test to see how well the 10 chisq values fit the chi-square distribution:

KS     D= 0.3631590      PROB= 0.1430003


So far there is nothing blatantly wrong with either the MTH$RANDOM or RANDU generators. All the chisq values are within reason, and the distribution of the chisq values appears to fit the chi-square distribution.


3.3    THREE-DIMENSIONAL TESTS

We can now expand the serial test to triples of numbers. Suppose you are throwing a die, and you know that the results of the last two throws were a 2 and then a 5. What is the probability that you will now throw a 3? It should still be 1/6, because the outcome of this throw should not be affected by the outcome of the previous two throws. We can test for this by constructing 6**3 = 216 bins labeled BIN(1,1,1) - BIN(6,6,6), one bin for each possible triple of outcomes. We then throw the die three times to get a triple of numbers, and then drop a ball in the corresponding bin. We throw the die enough times so that each bin is expected to get an average of at least 5 balls. Then we perform a chisq test on the results to see if the balls appear to be more or less evenly distributed among the bins.

This 3-dimensional test extends the test for statistical independence. If we assume the first two variables are independent, then this test checks to see if the third variable is also independent. The mathematical definition of statistical independence between two random variables can be written as:

Pr( X(i+2) | X(i+1),X(i) ) = Pr( X(i+2) )

This serial test for triples of outcomes was done on both the MTH$RANDOM and RANDU generators. The generators were set up to simulate the throwing of a 30 sided die. A total of 270000 balls were tossed so that an average of 10 balls were expected to fall in each bin. A chisq test was then performed on the results. This was repeated a total of 10 times, and a KS test was done on the resulting chisq values to see how well they fit the chi-square distribution. Below are the results:


SERIAL 3D BINS=30x30x30 BALLS = 270000 MTH$RANDOM

SERIAL 3D   BINS=30x30   BALLS = 270000   chisq TESTS = 10   MTH$RANDOM
BALLS= 270   chisq = 27233.4375   PROB= 0.8438070
BALLS= 270chisq = 26732.8027PROB= 0.1262939
BALLS= 270chisq = 26866.4550PROB= 0.2845250
BALLS= 270chisq = 26765.3710PROB= 0.1561499
BALLS= 270chisq = 26650.6250PROB= 0.0659529
BALLS= 270chisq = 26665.5117PROB= 0.0751096
BALLS= 270chisq = 27165.1523PROB= 0.7621238
BALLS= 270chisq = 26861.5625PROB= 0.2786521
BALLS= 270chisq = 27002.1171PROB= 0.5027421
BALLS= 270chisq = 27090.8613PROB= 0.6547577

KS test to see how well the 10 chisq values fit the chi-square distribution:
KS     D= 0.3162388      PROB= 0.2699622


SERIAL 3D BINS=30x30x30 BALLS = 270000 RANDU

SERIAL 3D   BINS=30x30   BALLS = 270000   chisq TESTS = 10   RANDU
BALLS= 270   chisq = 454159.375   PROB= 1.0000000
BALLS= 270chisq = 452698.125PROB= 1.0000000
BALLS= 270chisq = 455055.343PROB= 1.0000000
BALLS= 270chisq = 455938.281PROB= 1.0000000
BALLS= 270chisq = 454747.718PROB= 1.0000000
BALLS= 270chisq = 453017.812PROB= 1.0000000
BALLS= 270chisq = 453818.781PROB= 1.0000000
BALLS= 270chisq = 454553.593PROB= 1.0000000
BALLS= 270chisq = 453205.687PROB= 1.0000000
BALLS= 270chisq = 454122.062PROB= 1.0000000

KS test to see how well the 10 chisq values fit the chi-square distribution:

KS     D= 1.0000000      PROB= 0.0000000

It is immediately obvious that the RANDU generator has failed miserably on the serial test for triples! The chisq value is way too high on every one of the 10 trials.



3.4    HIGHER DIMENSIONAL TESTS

We can easily expand the chisq tests for higher dimensions. One thing to note though, the number of bins increases exponentially as the number of dimensions increases. Thus there is a practical limit as to how high a dimension we can carry this test. In section 5 (Ratings for Various Generators) we analyze 5 popular random number generators by using the chisq test for dimensions 1 through 8. Higher dimensions are not as serious since such a large number of random numbers are required before some nonrandomness can be detected.

Before we use these tests to analyze some random number generators we should first take a look at what's inside these generators and how to properly use them.




Chapter 2          die3          Chapter 4
index
Back to Deley's Homepage