How Computers Generate Random Numbers - Chapter 3
Enough theory for now. Let's turn to the analysis of a sequence of
random numbers. There's nothing "random" about any specific sequence of
numbers. But we can analyze the sequence of numbers to see if we can
find any sufficient reason to doubt the hypothesis that the numbers
could have come from multiple independent trials of a chance experiment.
In the following subsections we test the MTH$RANDOM and RANDU random
number generators using tests and compare the results. We
will soon find out why the infamous RANDU generator is considered a very
poor random number generator. The details of what's inside these
generators is deferred until section 5.0 . For now accept that we can
set up a random number generator to generate random numbers between 1
and NBINS, and that we can use a SEED value of 1 to initialize these
generators.
Our first test then is to see if each possible outcome appears to be
equally likely. Imagine we have a number of bins, and we toss balls at
the bins. After throwing a certain number of balls at the bins, we
count up how many balls fell in each bin. We then apply the
test to determine if the balls appear to have an equal probability of
falling in each bin.
Arbitrarily choosing NBINS = 30 (number of bins), SEED = 1 (initial SEED
value), and BALLS = 300, the test was performed 10 times on
the MTH$RANDOM generator with the following results:
30 BINS, 300 BALLS, 10 TESTS, MTH$RANDOM |
BALLS= 300 | | = 35.1999969 | | PROB= 0.8019531 |
BALLS= 300 | | = 22.8000031 | | PROB= 0.2143845 |
BALLS= 300 | | = 36.7999992 | | PROB= 0.8485908 |
BALLS= 300 | | = 19.8000011 | | PROB= 0.1009650 |
BALLS= 300 | | = 48.7999992 | | PROB= 0.9878765 |
BALLS= 300 | | = 29.3999996 | | PROB= 0.5556139 |
BALLS= 300 | | = 22.8000011 | | PROB= 0.2143840 |
BALLS= 300 | | = 36.5999985 | | PROB= 0.8432655 |
BALLS= 300 | | = 29.3999996 | | PROB= 0.5556139 |
BALLS= 300 | | = 18.6000004 | | PROB= 0.0688844 |
Here is the chi-square value, and PROB is the percentage of random
sequences (with 30 - 1 = 29 degrees of freedom) which would have a lower
value. An extremely high value of PROB, say above 0.990, would
indicate that some bins had a much higher than expected number of balls
in them while other bins have a much lower than expected number of balls
in them. In general, an extremely high value of PROB indicates that the
experimental results do not reinforce the hypothesis that each bin
had an equally likely chance of getting a ball. So far all of the PROB
values appear reasonable.
We can further analyze the results of the 10 tests above by
noting that the distribution of the 10 values should approximate
the distribution (for 30 - 1 = 29 degrees of freedom). We
can apply the KS test to the 10 values to see how well their
distribution approximates the distribution for d.f. = 29.
KS D= 0.2019531 PROB= 0.8093885
Here D is the KS D value which is the maximum distance between the
approximate and the real function, and PROB is the
percentage of times we would have a larger D value, meaning the
approximation could be worse. A very low value of PROB here means it
couldn't get much worse. Anything above a very low value means our
distribution bears at least some resemblance to the function we're
comparing it to. Again this result appears reasonable.
The following values were obtained when the above test was
applied 10 times to the RANDU generator:
30 BINS, 300 BALLS, 10 TESTS, RANDU |
BALLS= 300 | | = 31.3999996 | | PROB= 0.6531839 |
BALLS= 300 | | = 60.7999992 | | PROB= 0.9995093 |
BALLS= 300 | | = 33.4000015 | | PROB= 0.7381005 |
BALLS= 300 | | = 24.3999977 | | PROB= 0.2910264 |
BALLS= 300 | | = 20.8000011 | | PROB= 0.1336691 |
BALLS= 300 | | = 16.6000023 | | PROB= 0.0319657 |
BALLS= 300 | | = 32.0000000 | | PROB= 0.6801265 |
BALLS= 300 | | = 30.2000027 | | PROB= 0.5959312 |
BALLS= 300 | | = 31.2000027 | | PROB= 0.6439425 |
BALLS= 300 | | = 45.5999985 | | PROB= 0.9742958 |
And a KS test on the above 10 values:
KS D= 0.2959312 PROB= 0.3452127
Again there is nothing as yet unusual about either the MTH$RANDOM or
RANDU generators.
We can also test the MTH$RANDOM and RANDU generators using the KS test
directly to see how well the output fits the hypothesized uniform
distribution.
KS TEST ON MTH$RANDOM (Start with SEED = 1) |
KS | | NPOINTS= | 10 | | D= 0.2142200 | | PROB= 0.7484154 |
KS | | NPOINTS= | 100 | | D= 0.0944867 | | PROB= 0.3338285 |
KS | | NPOINTS= | 1000 | | D= 0.0314864 | | PROB= 0.2746516 |
KS | | NPOINTS= | 10000 | | D= 0.0079555 | | PROB= 0.5514056 |
KS | | NPOINTS= | 100000 | | D= 0.0017763 | | PROB= 0.9105781 |
KS | | NPOINTS= | 1000000 | | D= 0.0009270 | | PROB= 0.3565834 |
KS TEST ON MTH$RANDOM (Start with SEED = 1) |
KS | | NPOINTS= | 10 | | D= 0.6555050 | | PROB= 0.0003705 |
KS | | NPOINTS= | 100 | | D= 0.1326773 | | PROB= 0.0591586 |
KS | | NPOINTS= | 1000 | | D= 0.0337385 | | PROB= 0.2050490 |
KS | | NPOINTS= | 10000 | | D= 0.0063568 | | PROB= 0.8138239 |
KS | | NPOINTS= | 100000 | | D= 0.0042999 | | PROB= 0.0495556 |
KS | | NPOINTS= | 1000000 | | D= 0.0007990 | | PROB= 0.5457213 |
Note the only thing questionable here is the extremely low value of
PROB on RANDU for the first 10 points. This indicates that perhaps a
seed value of 1 for the RANDU generator is not a good place to start
it. Indeed, the first 10 numbers from RANDU starting with SEED = 1
are:
RANDU first 10 numbers starting with seed = 1 |
1 | | 0.0001831 |
2 | | 0.0032959 |
3 | | 0.0444950 |
4 | | 0.5339386 |
5 | | 0.0068024 |
6 | | 0.8734164 |
7 | | 0.1705012 |
8 | | 0.3222913 |
9 | | 0.9906484 |
10 | | 0.7260775 |
Notice that half of the first 10 values are either below 0.006 or above
0.99. This alone does not indicate a problem with the RANDU generator,
it does indicate that a SEED value of 1 is a poor choice to start it
with. Note that the MTH$RANDOM generator does not have this problem when
started with SEED = 1.
Our original hypothesis about our sequence of supposedly random numbers
was that it could have come from a repeated chance experiment in which
each possible outcome was equally likely and each test of the chance
experiment was independent of all previous tests. So far we have been
testing the sequence to see if each possible outcome appears to be
equally likely. The remainder of our tests now focus on checking the
hypothesis that each outcome is independent of all previous outcomes.
If we throw a die and get a 5, what is the probability that the next
throw of the die will be a 3? If the probability is still 1/6, we say
the next outcome is independent of the previous outcome. i.e. knowledge
of the previous outcome does not give us any help in predicting the next
outcome.
The mathematical definition of statistical independence between two
random variables can be written as:
Pr( X(i+1) | X(i) ) = Pr( X(i+1) )
The serial test in two dimensions checks to see if there is any
correlation between two consecutive outcomes of the random number
generator. This is a test performed on pairs of random
numbers. For the die throwing experiment, we would construct a total of
36 bins numbered BIN(1,1) - BIN(6,6). One bin for each possible pair of
numbers we may get. We then throw the die twice to get an ordered pair
of numbers, and drop a ball in the corresponding bin. We throw the die
enough times so that each bin is expected to get an average of at least
5 balls. Then we perform a test on the results to see if the
balls appear to be more or less evenly distributed among all the bins.
This serial test for pairs of outcomes was done on both the MTH$RANDOM
and RANDU generators with the following results:
SERIAL 2D BINS=30x30 BALLS = 9000 TESTS = 10 MTH$RANDOM |
BALLS= 900 | | = 895.799865 | | PROB= 0.4761399 |
BALLS= 900 | | = 945.200134 | | PROB= 0.8615244 |
BALLS= 900 | | = 883.600036 | | PROB= 0.3633031 |
BALLS= 900 | | = 905.000000 | | PROB= 0.5624363 |
BALLS= 900 | | = 902.398986 | | PROB= 0.5382197 |
BALLS= 900 | | = 911.800170 | | PROB= 0.6241364 |
BALLS= 900 | | = 932.400573 | | PROB= 0.7863315 |
BALLS= 900 | | = 865.400085 | | PROB= 0.2157318 |
BALLS= 900 | | = 909.599670 | | PROB= 0.6043593 |
BALLS= 900 | | = 901.799438 | | PROB= 0.5325246 |
KS test to see how well the 10 values fit the chi-square
distribution:
KS D= 0.2667681 PROB= 0.4751010
SERIAL 2D BINS=30x30 BALLS = 9000 TESTS = 10 RANDU |
BALLS= 900 | | = 848.799926 | | PROB= 0.1168594 |
BALLS= 900 | | = 891.799987 | | PROB= 0.4386667 |
BALLS= 900 | | = 874.199829 | | PROB= 0.2827876 |
BALLS= 900 | | = 884.199768 | | PROB= 0.3687753 |
BALLS= 900 | | = 799.599975 | | PROB= 0.0077433 |
BALLS= 900 | | = 841.800598 | | PROB= 0.0864621 |
BALLS= 900 | | = 867.800720 | | PROB= 0.2330545 |
BALLS= 900 | | = 892.601013 | | PROB= 0.4461077 |
BALLS= 900 | | = 952.400085 | | PROB= 0.8945085 |
BALLS= 900 | | = 921.599914 | | PROB= 0.7069101 |
KS test to see how well the 10 values fit the chi-square
distribution:
KS D= 0.3631590 PROB= 0.1430003
So far there is nothing blatantly wrong with either the MTH$RANDOM or
RANDU generators. All the values are within reason, and the
distribution of the values appears to fit the chi-square
distribution.
We can now expand the serial test to triples of numbers. Suppose you
are throwing a die, and you know that the results of the last two throws
were a 2 and then a 5. What is the probability that you will now throw
a 3? It should still be 1/6, because the outcome of this throw should
not be affected by the outcome of the previous two throws. We can test
for this by constructing 6**3 = 216 bins labeled BIN(1,1,1) -
BIN(6,6,6), one bin for each possible triple of outcomes. We then
throw the die three times to get a triple of numbers, and then drop
a ball in the corresponding bin. We throw the die enough times so that
each bin is expected to get an average of at least 5 balls. Then we
perform a test on the results to see if the balls appear to
be more or less evenly distributed among the bins.
This 3-dimensional test extends the test for statistical independence.
If we assume the first two variables are independent, then this test
checks to see if the third variable is also independent. The
mathematical definition of statistical independence between two random
variables can be written as:
Pr( X(i+2) | X(i+1),X(i) ) = Pr( X(i+2) )
This serial test for triples of outcomes was done on both the
MTH$RANDOM and RANDU generators. The generators were set up to simulate
the throwing of a 30 sided die. A total of 270000 balls were tossed so
that an average of 10 balls were expected to fall in each bin. A
test was then performed on the results. This was repeated a
total of 10 times, and a KS test was done on the resulting
values to see how well they fit the chi-square distribution. Below are
the results:
SERIAL 3D BINS=30x30x30 BALLS = 270000 MTH$RANDOM
SERIAL 3D BINS=30x30 BALLS = 270000 TESTS = 10 MTH$RANDOM |
BALLS= 270 | | = 27233.4375 | | PROB= 0.8438070 |
BALLS= 270 | | = 26732.8027 | | PROB= 0.1262939 |
BALLS= 270 | | = 26866.4550 | | PROB= 0.2845250 |
BALLS= 270 | | = 26765.3710 | | PROB= 0.1561499 |
BALLS= 270 | | = 26650.6250 | | PROB= 0.0659529 |
BALLS= 270 | | = 26665.5117 | | PROB= 0.0751096 |
BALLS= 270 | | = 27165.1523 | | PROB= 0.7621238 |
BALLS= 270 | | = 26861.5625 | | PROB= 0.2786521 |
BALLS= 270 | | = 27002.1171 | | PROB= 0.5027421 |
BALLS= 270 | | = 27090.8613 | | PROB= 0.6547577 |
KS test to see how well the 10 values fit the chi-square
distribution:
KS D= 0.3162388 PROB= 0.2699622
SERIAL 3D BINS=30x30x30 BALLS = 270000 RANDU
SERIAL 3D BINS=30x30 BALLS = 270000 TESTS = 10 RANDU |
BALLS= 270 | | = 454159.375 | | PROB= 1.0000000 |
BALLS= 270 | | = 452698.125 | | PROB= 1.0000000 |
BALLS= 270 | | = 455055.343 | | PROB= 1.0000000 |
BALLS= 270 | | = 455938.281 | | PROB= 1.0000000 |
BALLS= 270 | | = 454747.718 | | PROB= 1.0000000 |
BALLS= 270 | | = 453017.812 | | PROB= 1.0000000 |
BALLS= 270 | | = 453818.781 | | PROB= 1.0000000 |
BALLS= 270 | | = 454553.593 | | PROB= 1.0000000 |
BALLS= 270 | | = 453205.687 | | PROB= 1.0000000 |
BALLS= 270 | | = 454122.062 | | PROB= 1.0000000 |
KS test to see how well the 10 values fit the chi-square
distribution:
KS D= 1.0000000 PROB= 0.0000000
It is immediately obvious that the RANDU generator has failed miserably
on the serial test for triples! The value is way too high
on every one of the 10 trials.
We can easily expand the tests for higher dimensions. One
thing to note though, the number of bins increases exponentially as the
number of dimensions increases. Thus there is a practical limit as to
how high a dimension we can carry this test. In section 5 (Ratings for
Various Generators) we analyze 5 popular random number generators by
using the test for dimensions 1 through 8. Higher
dimensions are not as serious since such a large number of random
numbers are required before some nonrandomness can be detected.
Before we use these tests to analyze some random number generators we
should first take a look at what's inside these generators and how to
properly use them.