How Computers Generate Random Numbers - Chapter 2




2.0    INFERENTIAL STATISTICS

Up until now we have been dealing with probability theory. We started with a chance experiment in which the probability of each result is given, and then worked our way forward to find out what the probability of a particular compound experiment would be. We now turn to the field of inferential statistics, where we attempt to do the reverse. Given a sequence of numbers created by N trials of a chance experiment, we attempt to work our way backwards and estimate what the defining properties of the chance experiment are.


2.1    THE PROBABILITY OF ERROR

For our die throwing experiment, we would like to estimate the probability of each outcome based on experimental data. If we throw the die N times, then the best estimate we could make for the probability of a particular outcome zeta would be to count the number of times zeta occurred in N trials and form the ratio:

(number of times zeta occurred in N trials) / N


If we define a frequency function f(zeta) to be the number of times zeta occurred in N trials, we then would have:

Pr(zeta) = f(zeta) / N

This is our estimate of the probability of zeta based on N trials of the chance experiment. Note that some people use this equation as the definition for the probability of an event zeta when N goes to infinity:

Pr(zeta) = limit as N -> infinity of f(zeta) / N

For our die throwing experiment, assume we want to estimate the probability of zeta1 (the side with one dot shows up). Say we throw the die 6 times, and out of those 6 times zeta1 occurred twice. We then calculate the probability of zeta1 as 2/6.
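As a minimal illustration in Python (the sequence of rolls here is made-up example data, not from the text):

    rolls = [1, 3, 5, 1, 6, 2]        # N = 6 hypothetical trials of the die
    N = len(rolls)
    f_zeta1 = rolls.count(1)          # f(zeta1): number of times zeta1 occurred
    print(f_zeta1 / N)                # 0.3333..., i.e. our estimate 2/6 of Pr(zeta1)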

We now have the problem of deciding which value to accept as correct, and how much faith we can place in our choice. Let us define two possible hypotheses, which we will denote as H0 and H1.

H0 : The probability of zeta1 is 1/6
H1 : The probability of zeta1 is 2/6

In choosing which hypothesis to believe, there are two possible errors we could make, known as a type I error and a type II error:

Type I error : Choose H1 when H0 is actually true.
Type II error : Choose H0 when H1 is actually true.

We can determine the probability of making each type of error by examining the theoretical probability distributions for f(zeta) for each hypothesis being tested.

We rolled the die six times, so the possible number of times zeta1 can occur is between 0 and 6 times. Let f(zeta1) be the number of times zeta1 occurred in six rolls of the die. The following binomial formula will calculate the probability of having zeta1 occur k times in n trials:

Pr( f(zeta1) = k ) = C(n,k) * p^k * q^(n-k)

Where:
Pr( f(zeta1) = k ) = probability of exactly k occurrences of zeta1 in n trials

f(zeta1) = number of occurrences of zeta1 in n trials

n = number of trials

p = Pr(zeta1)     (probability of zeta1)

q = 1 - p

C(n,k) = n! / ( k! * (n-k)! )      (n things taken k at a time)


With n = 6 trials, and assuming hypothesis H0 to be true (the probability of throwing zeta1 is 1/6), we obtain the following table:

f(zeta1)      Pr ( f(zeta1) | H0 )      sideways bar graph
   0          0.334897             |================
   1          0.401877             |====================
   2          0.200938             |==========
   3          0.053583             |==
   4          0.008037             |
   5          0.000643             |
   6          0.000021             |

As you can see from the numbers, the highest probability is for throwing one zeta1, although there is still a high probability for throwing zero or two zeta1's. If we repeat the binomial calculations this time assuming hypothesis H1 is true (the probability of zeta1 is 2/6), we arrive at the following slightly different table:

f(zeta1)      Pr ( f(zeta1) | H1 )      sideways bar graph
   0          0.087791             |====
   1          0.263374             |=============
   2          0.329218             |================
   3          0.219478             |==========
   4          0.082304             |====
   5          0.016460             |=
   6          0.001371             |
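The numerical columns of the two tables above can be reproduced with a short Python sketch (the binomial_pmf helper and the bar scaling are ours, chosen only for illustration):

    from math import comb

    def binomial_pmf(k, n, p):
        # Pr( f(zeta1) = k ) = C(n,k) * p**k * q**(n-k), with q = 1 - p
        return comb(n, k) * p**k * (1 - p)**(n - k)

    n = 6
    for label, p in [("H0", 1/6), ("H1", 2/6)]:
        print(f"Pr( f(zeta1) = k | {label} ),  p = {p:.4f}")
        for k in range(n + 1):
            pr = binomial_pmf(k, n, p)
            print(f"  {k}   {pr:.6f}   |{'=' * round(pr * 50)}")   # rough sideways bar graph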

We can now calculate the probability of making a type I error and of making a type II error.

Define:

M : f(zeta1) = 2      (the event that zeta1 occurs twice in six rolls)
H0 : The probability of zeta1 is 1/6
H1 : The probability of zeta1 is 2/6

Notation:

Pr(A|B) : The probability of A given B

From Bayes's Theorem we then have:

Pr(H0|M) = Pr(M|H0) * Pr(H0) / Pr(M)

Applying the principle of total probability to Pr(M):

Pr(H0|M) = Pr(M|H0) * Pr(H0) / [ Pr(M|H0) * Pr(H0) + Pr(M|H1) * Pr(H1) ]

and if we allow both hypotheses H0 and H1 to be equally likely, so that Pr(H0) = Pr(H1) = 1/2:

Pr(H0|M) = Pr(M|H0) / [ Pr(M|H0) + Pr(M|H1) ]

Similarly, we can repeat the above steps to obtain:

Pr(H1|M) = Pr(M|H1) / [ Pr(M|H0) + Pr(M|H1) ]


From these two formulas and the tables above we can now calculate the probability of making a type I error and a type II error.

First note that

Pr(M|H0) = probability of rolling zeta1 two out of six times when the probability of rolling zeta1 is 1/6
         = 0.200938   (from the first table above)

Pr(M|H1) = probability of rolling zeta1 two out of six times when the probability of rolling zeta1 is 2/6
         = 0.329218   (from the second table above)

Then from the above formulas and data we have:

Pr( type I error )  = Pr(H0|M)
                    = Pr(M|H0) / [ Pr(M|H0) + Pr(M|H1) ]
                    = 0.200938 / ( 0.200938 + 0.329218 )
                    = 0.379

Pr( type II error ) = Pr(H1|M)
                    = Pr(M|H1) / [ Pr(M|H0) + Pr(M|H1) ]
                    = 0.329218 / ( 0.200938 + 0.329218 )
                    = 0.621

In this case we have a 38% chance of being wrong if we choose to believe Pr(zeta1) = 2/6 and a 62% chance of being wrong if we choose to believe Pr(zeta1) = 1/6. Either way we're likely to be wrong. The data does not allow us to differentiate between the two possibilities with much confidence.
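The same calculation takes only a few lines of Python (the binomial_pmf helper is ours, repeated from the sketch above):

    from math import comb

    # After observing M: f(zeta1) = 2 in n = 6 rolls, with Pr(H0) = Pr(H1) = 1/2.
    def binomial_pmf(k, n, p):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    pr_M_given_H0 = binomial_pmf(2, 6, 1/6)   # 0.200938...
    pr_M_given_H1 = binomial_pmf(2, 6, 2/6)   # 0.329218...
    total = pr_M_given_H0 + pr_M_given_H1

    print("Pr(type I error)  = Pr(H0|M) =", pr_M_given_H0 / total)   # about 0.379
    print("Pr(type II error) = Pr(H1|M) =", pr_M_given_H1 / total)   # about 0.621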


Let us repeat the above calculations this time assuming we rolled zeta1 five out of six times. This would lead us to the estimate that Pr(zeta1) = 5/6. Let us create a new hypothesis:

H1 : The probability of zeta1 is 5/6

and let us repeat the binomial calculations assuming hypothesis H1 is true (the probability of zeta1 is 5/6). We arrive at the following table:

f(zeta1)      Pr ( f(zeta1) | H1 )      sideways bar graph
   0          0.000021             |
   1          0.000643             |
   2          0.008037             |
   3          0.053583             |==
   4          0.200938             |==========
   5          0.401877             |====================
   6          0.334897             |================

Following the same procedure as above:

Define:

M : f(zeta1) = 5      (the event that zeta1 occurs five times in six rolls)
H0 : The probability of zeta1 is 1/6
H1 : The probability of zeta1 is 5/6


Pr( type I error )  = Pr(H0|M)
                    = Pr(M|H0) / [ Pr(M|H0) + Pr(M|H1) ]
                    = 0.000643 / ( 0.000643 + 0.401877 )
                    = 0.0016

Pr( type II error ) = Pr(H1|M)
                    = Pr(M|H1) / [ Pr(M|H0) + Pr(M|H1) ]
                    = 0.401877 / ( 0.000643 + 0.401877 )
                    = 0.9984

In this case, given a choice between believing the theoretician who claims Pr(zeta1) = 1/6 and believing our own tests which claim Pr(zeta1) = 5/6, if we choose to believe the theoretician we have a 99.8% chance of being wrong, and if we choose to believe our own experimental results we have only about a 0.2% chance of being wrong. Clearly the odds are in favor of our being right. And we only threw the die six times!

This then is why a sequence consisting of all ones { 1, 1, ..., 1 } is unlikely to be from a compound experiment in which each outcome is equally likely. If we hypothesize that each outcome is equally likely (hypothesis H0), then all elements in our sample space Sn do have an equal probability. But if we conjure up an alternative hypothesis about the probability of outcomes (hypothesis H1), then all the elements in our sample space Sn no longer have an equal probability. The probability will be higher for some elements and lower for other elements. Our hypothesis H1 is specifically formulated to give whatever sequence we actually got the highest probability possible. If the probability of our sequence occurring under our original H0 hypothesis is very low, then it is almost certain to lose out to an alternative hypothesis H1. Thus if all we want to do is test whether or not to reject our original H0 hypothesis, all we need to do is see what the probability of our sequence occurring under our original H0 hypothesis is. If it is very low, we can reject our H0 hypothesis with the implied knowledge that a better hypothesis H1 probably exists. We don't actually have to formulate a better hypothesis H1; we just have to know that we could make one if we cared to.

Thus from here on out we won't go the whole distance of formulating an alternative hypothesis H1 and then calculating the probabilities of type I errors and type II errors. We'll just say if an actual outcome of a compound experiment has a very low probability of occurring under our H0 hypothesis (or null hypothesis), then that hypothesis is probably not true.




2.2    GENERAL TEST PROCEDURE

Our general procedure then for testing the results of a compound experiment is:

1.   Formulate a null hypothesis H0 about the single chance experiment which was repeated N times to create a sequence of N values. To test a sequence of supposedly random numbers, our null hypothesis H0 is that each outcome of the chance experiment is equally likely, and that each trial of the chance experiment is independent of all previous trials.

2.   Formulate a real valued function g which somehow tests the null hypothesis H0. To test a sequence of supposedly random numbers, our function g might be one which counts the number of occurrences of a particular outcome.

3.   Under our null hypothesis H0, mathematically define a sequence of N random variables:

( X1, X2, ..., XN )

and apply our function g to the sequence of N random variables, creating a new random variable Z:

Z = g( X1, X2, ..., XN )

Then determine the probability density function of Z either by mathematical calculations or by finding a table of the particular probability density function we're interested in.

4.   Take the specific sequence of values supposedly obtained by performing a chance experiment N times:

( x1, x2, ..., xN )

and apply the function g to obtain a specific value z:

z = g( x1, x2, ..., xN )

5.   Determine from the probability density function of Z how likely or unlikely we are to obtain our value of z assuming our null hypothesis H0 is true. If the probability is very low, we may reject our hypothesis H0 as being most likely incorrect.
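When the probability density function of Z is hard to work out analytically, one practical way to carry out step 3 is to simulate the chance experiment under H0 many times and build up the distribution of Z empirically. Below is a minimal Python sketch of the five steps; the test function g (a simple count of one outcome), the observed sequence, and all names are ours, chosen only for illustration:

    import random

    def g(sequence):
        # Step 2: a simple test function -- count occurrences of outcome 1.
        return sequence.count(1)

    def simulate_H0(n_trials):
        # Step 3 (by simulation): one run of the chance experiment under H0,
        # a fair six-sided die thrown n_trials times.
        return [random.randint(1, 6) for _ in range(n_trials)]

    # Step 4: the specific observed sequence (hypothetical example data).
    observed = [1, 3, 1, 1, 6, 1, 1, 2]
    z = g(observed)

    # Step 5: estimate how often a value at least this large occurs under H0.
    n_sim = 100_000
    count = sum(g(simulate_H0(len(observed))) >= z for _ in range(n_sim))
    print("estimated Pr( Z >= z | H0 ) =", count / n_sim)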




2.3    THE CHI-SQUARE TEST

We come now to our first real test for a sequence of random numbers. The test is the chisq (Chi-Square, pronounced ki-square) test. It tests the assumption that the probability distribution for a given chance experiment is as specified. For our die throwing experiment, it tests the hypothesis that each possible outcome is equally likely, with a probability of 1/6. The formula for our chisq g function is:
Define:

N = number of trials
k = number of possible outcomes of the chance experiment
f(zetai) = number of occurrences of zetai in N trials
E(zetai) = The expected number of occurrences of zetai in N trials = N*Pr(zetai).

chisq = SUM[ i = 1 to k ] ( f(zetai) - E(zetai) )^2 / E(zetai)
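A minimal Python sketch of this statistic (the function name and argument layout are ours) takes the observed counts f(zetai) and the probabilities Pr(zetai) assumed by the null hypothesis:

    def chi_square_statistic(counts, probabilities):
        # chisq = sum over i of ( f(zetai) - E(zetai) )**2 / E(zetai),
        # where E(zetai) = N * Pr(zetai) and N is the total number of trials.
        N = sum(counts)
        expected = [N * p for p in probabilities]
        return sum((f - e)**2 / e for f, e in zip(counts, expected))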

We now go through the steps as defined above:
1.   Formulate the null hypothesis H0. Our null hypothesis H0 for the die throwing experiment is that each of the six possible outcomes is equally likely with a probability of 1/6.

2.   Formulate a real valued function g which somehow tests the null hypothesis. Our g function is the chisq test. (This is not exactly the chisq function, but for N large enough it approaches the chisq distribution.) Note that if the result chisq is zero, then each possible outcome occurred exactly the expected number of times. On the other hand, if the result chisq is very large, it indicates that one or more of the possible outcomes of the chance experiment occurred far less or far more than would be expected if our null hypothesis H0 were true.

3.   Determine the probability density function of Z = g(X1,X2,...,XN). As mentioned in step 2, our chisq test function g is not exactly the chisq function, but for N large enough it approaches the chisq distribution, for which tables are available in any book on probability theory or any book of mathematical tables such as the CRC Standard Mathematical Tables. The stated rule of thumb is that N should be large enough so that every possible outcome is expected to occur at least 5 times. Thus for our die throwing experiment we should repeat it at least 30 times, since 30*(1/6) = 5.

Each row of the chisq distribution table is for a particular number of "degrees of freedom" which we will abbreviate to df. The value to choose for df is one less than the number of possible outcomes for our chance experiment. Thus for our die throwing experiment where there are 6 possible outcomes, we have df = 6 - 1 = 5.

Below is the chisq distribution for 5 degrees of freedom:

p=1%      p=5%      p=25%      p=50%      p=75%      p=95%      p=99%
0.5543    1.1455    2.675      4.351      6.626      11.07      15.09

(Note: See NUMERICAL RECIPES: The Art of Scientific Computing listed in the references for computer code that will calculate chisq values so you don't have to look them up in a table.)

This is the cumulative chisq distribution. If we recall that the distribution itself is typically bell shaped, then the extremes of the bell shaped curve, where the probability is very low, correspond to the extremes of the cumulative distribution. Thus if we are below the p=5% mark or above the p=95% mark, the corresponding probability is quite low. On the other hand, if we are between the p=25% mark and the p=75% mark, the corresponding probability is high.

(Note that some tables list the probability for (1 - p) instead of p, so that 0.5543 is listed as p=99% instead of p=1%, and 15.09 is listed as p=1% instead of p=99%. It should be easy to tell by inspection which probability the table is listing.)

4.   We can now apply our chisq test to a specific outcome of rolling the die 30 times. Assume we threw the die 30 times and obtained the following results:

f(zeta1) = 7
f(zeta2) = 5
f(zeta3) = 4
f(zeta4) = 6
f(zeta5) = 6
f(zeta6) = 2

Then, with E(zetai) = 30*(1/6) = 5 for each outcome, our chisq test becomes:

chisq = [ (7-5)^2 + (5-5)^2 + (4-5)^2 + (6-5)^2 + (6-5)^2 + (2-5)^2 ] / 5

      = 3.2

5.   We can now check our result of 3.2 against the chisq table above in step 3 and note that it falls between the p=25% mark and the p=50% mark. This is a high probability region and so we are not led to believe our null hypothesis H0 is false.
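The worked example can be checked in Python; if SciPy is available, its chi-square cumulative distribution gives the cumulative probability directly (the variable names are ours):

    from scipy.stats import chi2   # cumulative chi-square distribution

    counts = [7, 5, 4, 6, 6, 2]            # f(zeta1) ... f(zeta6) from 30 rolls
    N = sum(counts)                        # 30 trials
    expected = [N * (1/6)] * 6             # E(zetai) = 5 under H0

    x = sum((f - e)**2 / e for f, e in zip(counts, expected))
    print("chisq =", x)                                   # 3.2
    print("cumulative probability =", chi2.cdf(x, 5))     # about 0.33, between the 25% and 50% marks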

Further notes on chisq:

Note that not only are overly large values of chisq considered highly improbable, but so are overly small values of chisq. Whereas overly large values of chisq lead us to believe the probabilities of each possible outcome of our chance experiment are not what we thought them to be, overly small values of chisq lead us to believe the N trials of our chance experiment were not independent.

As an example, if we threw a single die 6 times, there would be 6**6 or 46656 possible outcomes. If we calculate the chisq value for each of these possible outcomes and tabulate them, we get the following table:

chisq      f(chisq)
---------------
   0          720
   2        10800
   4        16200
   6         9000
   8         7200
  12         2100
  14          450
  20          180
  30            6

Although a chisq value of 0 happens 720 times, a chisq value of 2 happens 10800 times, making it much more likely. We can compare our chisq values here with the chisq distribution table above for 5 degrees of freedom. Our values of chisq = 2 through 12, which appear to be the most likely ones, lie above the p=5% mark and below the p=95% mark on the table, where we would expect them to lie. Note that this comparison was for N = 6, although we have as a rule of thumb that N should be at least 30 for this test.
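The table above can be reproduced by brute force, enumerating all 46656 possible sequences of six rolls. A minimal Python sketch:

    from itertools import product
    from collections import Counter

    tally = Counter()
    for rolls in product(range(1, 7), repeat=6):       # all 6**6 = 46656 sequences
        counts = [rolls.count(face) for face in range(1, 7)]
        chisq = sum((f - 1)**2 / 1 for f in counts)    # expected count is 6*(1/6) = 1
        tally[chisq] += 1

    for value in sorted(tally):
        print(f"{value:4.0f}   {tally[value]:6d}")     # matches the table above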

Another note about the chisq distribution: most tables of the chisq distribution go up to 30 degrees of freedom. Above that, the distribution approaches the normal distribution. If your chance experiment has more than 30 degrees of freedom (number of possible outcomes minus one), you can use the following formula to convert a chisq value for df > 30 into a value from the normal distribution with mean = 0 and variance = 1:

Given:   chisq   and   df

Z = [ (chisq/df)^(1/3) - ( 1 - 2/(9*df) ) ] / sqrt( 2/(9*df) )

Here Z is a normally distributed variable with mean = 0 and variance = 1. As an example, let's assume we are simulating the random choosing of a single card from a deck of 52. There are 52 possible outcomes. We will need to repeat this experiment 52*5=260 times so that each possible outcome occurs about 5 times. Our number of degrees of freedom is 52 - 1 = 51. We calculate chisq and get chisq = 30.4 . We now calculate Z from the above equation given chisq = 30.4 and df = 51:

Z = [ (30.4/51)^(1/3) - ( 1 - 2/(9*51) ) ] / sqrt( 2/(9*51) )

  = -2.33

We now look in a table of the cumulative normal distribution and find that this value of Z corresponds to the first 1% of the distribution. Thus our chisq value is much lower than expected. We could conclude that something is probably amiss.
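A small Python sketch of the conversion (this is the cube-root, or Wilson-Hilferty, normal approximation; the function name is ours):

    from math import sqrt

    def chisq_to_normal(chisq, df):
        # Approximate standard normal deviate for a chisq value when df > 30.
        return ((chisq / df)**(1/3) - (1 - 2/(9*df))) / sqrt(2/(9*df))

    print(chisq_to_normal(30.4, 51))   # about -2.33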

One last note about chisq. If you are simulating the flipping of a coin, then the number of degrees of freedom is 1 (two possible outcomes). Our rule of thumb about performing enough trials so that each possible outcome is expected to occur at least 5 times is not adequate in this case. Because the number of degrees of freedom is so low, we need to compensate by increasing the number of trials: enough so that each outcome is expected to occur at least 75 times.




2.4 THE KOLMOGOROV-SMIRNOV TEST

If we repeat the die throwing experiment 30 times, we can perform one chisq test on the results. If we repeat the die throwing experiment 3000 times, we can either perform one chisq test to get an idea of the long term probabilities of each possible outcome, or we can perform 100 chisq tests, one for each block of 30 trials, to get an idea of the relative short term fluctuations. In the latter case, we would expect the distribution of our 100 chisq values to approach the chisq distribution.

This brings us to our next subject, which is testing a set of data points to see how closely it matches a given continuous distribution. The Kolmogorov-Smirnov (KS) test compares a set of nc data points Snc (e.g. a set of chisq values) with a given cumulative distribution function P(x) (e.g. the chisq cumulative distribution).

Given a set of nc data points Snc, we can create a cumulative distribution function Snc(x) from the data points. The function will be a step function, rising from 0 to 1, with each step size equal to 1/nc. Below is a generalized example which compares a cumulative distribution step function Snc(x) for 9 data values against a continuous cumulative distribution P(x) for a function which is uniform between 0 and 1:
[Figure: Kolmogorov-Smirnov statistic D. A measured distribution of values in x (shown as N dots on the lower abscissa) is to be compared with a theoretical distribution whose cumulative probability distribution is plotted as P(x). A step-function cumulative probability distribution Snc(x) is constructed, one which rises an equal amount at each measured point. D is the greatest distance between the two cumulative distributions. (Reprinted from NUMERICAL RECIPES: The Art of Scientific Computing)]

The KS test is now very simple. Define D as the maximum vertical difference between these two functions:

D = max |Snc(x) - P(x)|

And define DQ as:

DQ = sqrt(nc) * D

where nc is the number of data points.

We can then go directly to a table for the KS test (similar to the chisq table), and read off where in the cumulative distribution our DQ value lies, remembering that values below p=5% only occur 5% of the time, and values above p=95% only occur 5% of the time. Tables for this KS test are not as easy to find as tables for the chisq test. Fortunately, as in the chisq case, if our number of data values nc in set Snc is greater than 30, the distribution approaches something which can be easily calculated as follows:

Given:   D = max |Snc(x) - P(x)|
         nc > 30   (number of data points)

the probability of obtaining a value of DQ as large as or larger than the one observed is, to a good approximation:

Pr( DQ >= observed value ) = 2 * SUM[ j = 1 to infinity ] (-1)^(j-1) * exp( -2 * j^2 * DQ^2 )

If this probability is very low, we may reject the hypothesis that our data points came from the distribution P(x).
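A minimal Python sketch of the whole KS procedure against the uniform distribution on [0, 1] (the function names, the example data, and the 100-term series cutoff are ours):

    import random
    from math import sqrt, exp

    def ks_statistic(data, cdf):
        # D = max |Snc(x) - P(x)|, where Snc(x) is the step function built from the data.
        data = sorted(data)
        nc = len(data)
        d = 0.0
        for i, x in enumerate(data):
            # The step function jumps from i/nc to (i+1)/nc at x, so check both sides.
            d = max(d, abs(i / nc - cdf(x)), abs((i + 1) / nc - cdf(x)))
        return d

    def ks_probability(dq, terms=100):
        # Asymptotic Pr( DQ >= observed value ), good for nc greater than about 30.
        return 2 * sum((-1)**(j - 1) * exp(-2 * j * j * dq * dq)
                       for j in range(1, terms + 1))

    data = [random.random() for _ in range(40)]   # 40 hypothetical data points
    D = ks_statistic(data, lambda x: x)           # P(x) = x, uniform on [0, 1]
    DQ = sqrt(len(data)) * D
    print("D =", D, " DQ =", DQ, " Pr =", ks_probability(DQ))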