How Computers Generate Random Numbers

How Computers Generate Random Numbers - Chapter 3

Chapter 2		Chapter 4
	index
	Back to Deley's Homepage

3.0 ANALYSIS OF A SEQUENCE OF RANDOM NUMBERS

Enough theory for now. Let's turn to the analysis of a sequence of random numbers. There's nothing "random" about any specific sequence of numbers. But we can analyze the sequence of numbers to see if we can find any sufficient reason to doubt the hypothesis that the numbers could have come from multiple independent trials of a chance experiment.

In the following subsections we test the MTH$RANDOM and RANDU random number generators using

tests and compare the results. We will soon find out why the infamous RANDU generator is considered a very poor random number generator. The details of what's inside these generators is deferred until section 5.0 . For now accept that we can set up a random number generator to generate random numbers between 1 and NBINS, and that we can use a SEED value of 1 to initialize these generators.

3.1 ONE-DIMENSIONAL TESTS

Our first test then is to see if each possible outcome appears to be equally likely. Imagine we have a number of bins, and we toss balls at the bins. After throwing a certain number of balls at the bins, we count up how many balls fell in each bin. We then apply the

test to determine if the balls appear to have an equal probability of falling in each bin.

Arbitrarily choosing NBINS = 30 (number of bins), SEED = 1 (initial SEED value), and BALLS = 300, the

test was performed 10 times on the MTH$RANDOM generator with the following results:

30 BINS, 300 BALLS, 10 TESTS, MTH$RANDOM
BALLS= 300	= 35.1999969	PROB= 0.8019531
BALLS= 300	= 22.8000031	PROB= 0.2143845
BALLS= 300	= 36.7999992	PROB= 0.8485908
BALLS= 300	= 19.8000011	PROB= 0.1009650
BALLS= 300	= 48.7999992	PROB= 0.9878765
BALLS= 300	= 29.3999996	PROB= 0.5556139
BALLS= 300	= 22.8000011	PROB= 0.2143840
BALLS= 300	= 36.5999985	PROB= 0.8432655
BALLS= 300	= 29.3999996	PROB= 0.5556139
BALLS= 300	= 18.6000004	PROB= 0.0688844

Here

is the chi-square value, and PROB is the percentage of random sequences (with 30 - 1 = 29 degrees of freedom) which would have a lower

value. An extremely high value of PROB, say above 0.990, would indicate that some bins had a much higher than expected number of balls in them while other bins have a much lower than expected number of balls in them. In general, an extremely high value of PROB indicates that the experimental results do not reinforce the hypothesis that each bin had an equally likely chance of getting a ball. So far all of the PROB values appear reasonable.

We can further analyze the results of the 10

tests above by noting that the distribution of the 10

values should approximate the

distribution (for 30 - 1 = 29 degrees of freedom). We can apply the KS test to the 10

values to see how well their distribution approximates the

distribution for d.f. = 29.

KS D= 0.2019531 PROB= 0.8093885

Here D is the KS D value which is the maximum distance between the approximate and the real

function, and PROB is the percentage of times we would have a larger D value, meaning the approximation could be worse. A very low value of PROB here means it couldn't get much worse. Anything above a very low value means our distribution bears at least some resemblance to the function we're comparing it to. Again this result appears reasonable.

The following values were obtained when the above

test was applied 10 times to the RANDU generator:

30 BINS, 300 BALLS, 10 TESTS, RANDU
BALLS= 300	= 31.3999996	PROB= 0.6531839
BALLS= 300	= 60.7999992	PROB= 0.9995093
BALLS= 300	= 33.4000015	PROB= 0.7381005
BALLS= 300	= 24.3999977	PROB= 0.2910264
BALLS= 300	= 20.8000011	PROB= 0.1336691
BALLS= 300	= 16.6000023	PROB= 0.0319657
BALLS= 300	= 32.0000000	PROB= 0.6801265
BALLS= 300	= 30.2000027	PROB= 0.5959312
BALLS= 300	= 31.2000027	PROB= 0.6439425
BALLS= 300	= 45.5999985	PROB= 0.9742958

And a KS test on the above 10

values:

KS D= 0.2959312 PROB= 0.3452127

Again there is nothing as yet unusual about either the MTH$RANDOM or RANDU generators.

We can also test the MTH$RANDOM and RANDU generators using the KS test directly to see how well the output fits the hypothesized uniform distribution.

KS TEST ON MTH$RANDOM (Start with SEED = 1)
KS	NPOINTS=	10	D= 0.2142200	PROB= 0.7484154
KS	NPOINTS=	100	D= 0.0944867	PROB= 0.3338285
KS	NPOINTS=	1000	D= 0.0314864	PROB= 0.2746516
KS	NPOINTS=	10000	D= 0.0079555	PROB= 0.5514056
KS	NPOINTS=	100000	D= 0.0017763	PROB= 0.9105781
KS	NPOINTS=	1000000	D= 0.0009270	PROB= 0.3565834

KS TEST ON MTH$RANDOM (Start with SEED = 1)
KS	NPOINTS=	10	D= 0.6555050	PROB= 0.0003705
KS	NPOINTS=	100	D= 0.1326773	PROB= 0.0591586
KS	NPOINTS=	1000	D= 0.0337385	PROB= 0.2050490
KS	NPOINTS=	10000	D= 0.0063568	PROB= 0.8138239
KS	NPOINTS=	100000	D= 0.0042999	PROB= 0.0495556
KS	NPOINTS=	1000000	D= 0.0007990	PROB= 0.5457213

Note the only thing questionable here is the extremely low value of PROB on RANDU for the first 10 points. This indicates that perhaps a seed value of 1 for the RANDU generator is not a good place to start it. Indeed, the first 10 numbers from RANDU starting with SEED = 1 are:

RANDU first 10 numbers starting with seed = 1
1		0.0001831
2		0.0032959
3		0.0444950
4		0.5339386
5		0.0068024
6		0.8734164
7		0.1705012
8		0.3222913
9		0.9906484
10		0.7260775

Notice that half of the first 10 values are either below 0.006 or above 0.99. This alone does not indicate a problem with the RANDU generator, it does indicate that a SEED value of 1 is a poor choice to start it with. Note that the MTH$RANDOM generator does not have this problem when started with SEED = 1.

3.2 TWO-DIMENSIONAL TESTS

Our original hypothesis about our sequence of supposedly random numbers was that it could have come from a repeated chance experiment in which each possible outcome was equally likely and each test of the chance experiment was independent of all previous tests. So far we have been testing the sequence to see if each possible outcome appears to be equally likely. The remainder of our tests now focus on checking the hypothesis that each outcome is independent of all previous outcomes.

If we throw a die and get a 5, what is the probability that the next throw of the die will be a 3? If the probability is still 1/6, we say the next outcome is independent of the previous outcome. i.e. knowledge of the previous outcome does not give us any help in predicting the next outcome.

The mathematical definition of statistical independence between two random variables can be written as:

Pr( X(i+1) | X(i) ) = Pr( X(i+1) )

The serial test in two dimensions checks to see if there is any correlation between two consecutive outcomes of the random number generator. This is a

test performed on pairs of random numbers. For the die throwing experiment, we would construct a total of 36 bins numbered BIN(1,1) - BIN(6,6). One bin for each possible pair of numbers we may get. We then throw the die twice to get an ordered pair of numbers, and drop a ball in the corresponding bin. We throw the die enough times so that each bin is expected to get an average of at least 5 balls. Then we perform a

test on the results to see if the balls appear to be more or less evenly distributed among all the bins.

This serial test for pairs of outcomes was done on both the MTH$RANDOM and RANDU generators with the following results:

SERIAL 2D BINS=30x30 BALLS = 9000 TESTS = 10 MTH$RANDOM
BALLS= 900	= 895.799865	PROB= 0.4761399
BALLS= 900	= 945.200134	PROB= 0.8615244
BALLS= 900	= 883.600036	PROB= 0.3633031
BALLS= 900	= 905.000000	PROB= 0.5624363
BALLS= 900	= 902.398986	PROB= 0.5382197
BALLS= 900	= 911.800170	PROB= 0.6241364
BALLS= 900	= 932.400573	PROB= 0.7863315
BALLS= 900	= 865.400085	PROB= 0.2157318
BALLS= 900	= 909.599670	PROB= 0.6043593
BALLS= 900	= 901.799438	PROB= 0.5325246

KS test to see how well the 10

values fit the chi-square distribution:

KS D= 0.2667681 PROB= 0.4751010

SERIAL 2D BINS=30x30 BALLS = 9000 TESTS = 10 RANDU
BALLS= 900	= 848.799926	PROB= 0.1168594
BALLS= 900	= 891.799987	PROB= 0.4386667
BALLS= 900	= 874.199829	PROB= 0.2827876
BALLS= 900	= 884.199768	PROB= 0.3687753
BALLS= 900	= 799.599975	PROB= 0.0077433
BALLS= 900	= 841.800598	PROB= 0.0864621
BALLS= 900	= 867.800720	PROB= 0.2330545
BALLS= 900	= 892.601013	PROB= 0.4461077
BALLS= 900	= 952.400085	PROB= 0.8945085
BALLS= 900	= 921.599914	PROB= 0.7069101

KS test to see how well the 10

values fit the chi-square distribution:

KS D= 0.3631590 PROB= 0.1430003

So far there is nothing blatantly wrong with either the MTH$RANDOM or RANDU generators. All the

values are within reason, and the distribution of the

values appears to fit the chi-square distribution.

3.3 THREE-DIMENSIONAL TESTS

We can now expand the serial test to triples of numbers. Suppose you are throwing a die, and you know that the results of the last two throws were a 2 and then a 5. What is the probability that you will now throw a 3? It should still be 1/6, because the outcome of this throw should not be affected by the outcome of the previous two throws. We can test for this by constructing 6**3 = 216 bins labeled BIN(1,1,1) - BIN(6,6,6), one bin for each possible triple of outcomes. We then throw the die three times to get a triple of numbers, and then drop a ball in the corresponding bin. We throw the die enough times so that each bin is expected to get an average of at least 5 balls. Then we perform a

test on the results to see if the balls appear to be more or less evenly distributed among the bins.

This 3-dimensional test extends the test for statistical independence. If we assume the first two variables are independent, then this test checks to see if the third variable is also independent. The mathematical definition of statistical independence between two random variables can be written as:

Pr( X(i+2) | X(i+1),X(i) ) = Pr( X(i+2) )

This serial test for triples of outcomes was done on both the MTH$RANDOM and RANDU generators. The generators were set up to simulate the throwing of a 30 sided die. A total of 270000 balls were tossed so that an average of 10 balls were expected to fall in each bin. A

test was then performed on the results. This was repeated a total of 10 times, and a KS test was done on the resulting

values to see how well they fit the chi-square distribution. Below are the results:

SERIAL 3D BINS=30x30x30 BALLS = 270000 MTH$RANDOM

SERIAL 3D BINS=30x30 BALLS = 270000 TESTS = 10 MTH$RANDOM
BALLS= 270	= 27233.4375	PROB= 0.8438070
BALLS= 270	= 26732.8027	PROB= 0.1262939
BALLS= 270	= 26866.4550	PROB= 0.2845250
BALLS= 270	= 26765.3710	PROB= 0.1561499
BALLS= 270	= 26650.6250	PROB= 0.0659529
BALLS= 270	= 26665.5117	PROB= 0.0751096
BALLS= 270	= 27165.1523	PROB= 0.7621238
BALLS= 270	= 26861.5625	PROB= 0.2786521
BALLS= 270	= 27002.1171	PROB= 0.5027421
BALLS= 270	= 27090.8613	PROB= 0.6547577

KS test to see how well the 10

values fit the chi-square distribution:

KS D= 0.3162388 PROB= 0.2699622

SERIAL 3D BINS=30x30x30 BALLS = 270000 RANDU

SERIAL 3D BINS=30x30 BALLS = 270000 TESTS = 10 RANDU
BALLS= 270	= 454159.375	PROB= 1.0000000
BALLS= 270	= 452698.125	PROB= 1.0000000
BALLS= 270	= 455055.343	PROB= 1.0000000
BALLS= 270	= 455938.281	PROB= 1.0000000
BALLS= 270	= 454747.718	PROB= 1.0000000
BALLS= 270	= 453017.812	PROB= 1.0000000
BALLS= 270	= 453818.781	PROB= 1.0000000
BALLS= 270	= 454553.593	PROB= 1.0000000
BALLS= 270	= 453205.687	PROB= 1.0000000
BALLS= 270	= 454122.062	PROB= 1.0000000

KS test to see how well the 10

values fit the chi-square distribution:

KS D= 1.0000000 PROB= 0.0000000

It is immediately obvious that the RANDU generator has failed miserably on the serial test for triples! The

value is way too high on every one of the 10 trials.

3.4 HIGHER DIMENSIONAL TESTS

We can easily expand the

tests for higher dimensions. One thing to note though, the number of bins increases exponentially as the number of dimensions increases. Thus there is a practical limit as to how high a dimension we can carry this test. In section 5 (Ratings for Various Generators) we analyze 5 popular random number generators by using the

test for dimensions 1 through 8. Higher dimensions are not as serious since such a large number of random numbers are required before some nonrandomness can be detected.

Before we use these tests to analyze some random number generators we should first take a look at what's inside these generators and how to properly use them.

Chapter 2		Chapter 4
	index
	Back to Deley's Homepage