
If we define a frequency function f(⚀) to be the number of times ⚀ (the side with one dot) occurred in N trials, we would then have:

Pr(⚀) ≈ f(⚀) / N

This is our estimate of the probability of ⚀ based on N trials of the chance experiment. Note that some people use this equation, with N going to infinity, as the definition of the probability of an event:

Pr(⚀) = lim (N→∞) f(⚀) / N

For our die-throwing experiment, assume we want to estimate the probability of ⚀ (the side with one dot shows up). Say we throw the die 6 times, and out of those 6 times ⚀ occurred twice. We then calculate the probability of ⚀ as 2/6.
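This frequency estimate can be sketched in a short simulation. The seed and the number of trials below are arbitrary choices of mine, used only to make the sketch reproducible:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

N = 100_000  # number of trials (illustrative choice)
# f counts how many times the side with one dot comes up
f = sum(1 for _ in range(N) if random.randint(1, 6) == 1)

estimate = f / N  # the frequency estimate f/N; approaches 1/6 as N grows
print(estimate)
```

With a large N the estimate settles close to the true probability 1/6, illustrating why the limit form is sometimes taken as the definition.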

We now have the problem of deciding which value to accept as correct, and how much faith we can place in our choice. Let us define two possible hypotheses, which we will denote H0 and H1.

H0 : The probability of ⚀ is 1/6

H1 : The probability of ⚀ is 2/6

In choosing which hypothesis to believe, there are two possible errors we could make, known as a type I error and a type II error:

Type I error : Choose H1 when H0 is actually true.

Type II error : Choose H0 when H1 is actually true.

We can determine the probability of making each type of error by examining the theoretical probability distributions of f(⚀) under each hypothesis being tested.

We rolled the die six times, so the possible number of times ⚀ can occur is between 0 and 6. Let f(⚀) be the number of times ⚀ occurred in six rolls of the die. The following binomial formula calculates the probability of ⚀ occurring exactly k times in n trials:

Pr( f(⚀) = k ) = C(n,k) p^k q^(n-k)

Where:

Pr( f(⚀) = k ) = probability of exactly k occurrences of ⚀ in n trials

f(⚀) = number of occurrences of ⚀ in n trials

n = number of trials

p = Pr(⚀) (probability of ⚀)

q = 1 - p

C(n,k) = n! / ( k! (n-k)! ) (n things taken k at a time)
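A direct transcription of the binomial formula, using Python's `math.comb` for C(n,k), reproduces the tables that follow:

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr( f = k ): probability of exactly k occurrences in n trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Under H0 (p = 1/6) with n = 6 trials, print the whole distribution:
for k in range(7):
    print(k, round(binom_pmf(k, 6, 1/6), 6))
```

Running the loop with p = 2/6 instead reproduces the second table.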

With n = 6 trials, and assuming hypothesis H0 to be true (the probability of throwing ⚀ is 1/6), we obtain the following table:

f(⚀) | Pr( f(⚀) \| H0 ) | sideways bar graph
---|---|---
0 | 0.334897 | ================
1 | 0.401877 | ====================
2 | 0.200938 | ==========
3 | 0.053583 | ==
4 | 0.008037 |
5 | 0.000643 |
6 | 0.000021 |

As you can see from the numbers, the highest probability is for throwing one ⚀, although there is still a substantial probability of throwing zero ⚀'s or two ⚀'s. If we repeat the binomial calculations, this time assuming hypothesis H1 is true (the probability of ⚀ is 2/6), we arrive at the following slightly different table:

f(⚀) | Pr( f(⚀) \| H1 ) | sideways bar graph
---|---|---
0 | 0.087791 | ====
1 | 0.263374 | =============
2 | 0.329218 | ================
3 | 0.219478 | ==========
4 | 0.082304 | ====
5 | 0.016460 | =
6 | 0.001371 |

We can now calculate the probability of making a type I error and of making a type II error.

Define:

M : f(⚀) = 2 (the event that ⚀ occurs twice in six rolls)

H0 : The probability of ⚀ is 1/6

H1 : The probability of ⚀ is 2/6

Notation:

Pr(A|B) : The probability of A given B

From Bayes's Theorem we then have:

Pr(H0|M) = Pr(M|H0) Pr(H0) / Pr(M)

Applying the principle of total probability to Pr(M):

Pr(H0|M) = Pr(M|H0) Pr(H0) / [ Pr(M|H0) Pr(H0) + Pr(M|H1) Pr(H1) ]

and if we take both hypotheses H0 and H1 to be equally likely, so that Pr(H0) = Pr(H1) = 1/2,

Pr(H0|M) = Pr(M|H0) / [ Pr(M|H0) + Pr(M|H1) ]

Similarly, we can repeat the above steps to obtain

Pr(H1|M) = Pr(M|H1) / [ Pr(M|H0) + Pr(M|H1) ]

From these two formulas and the tables above we can now calculate the probability of making a type I error and a type II error.

First note that

Pr(M|H0) = probability of rolling ⚀ two out of 6 times when the probability of rolling ⚀ is 1/6
         = 0.200938 (from the table above)

Pr(M|H1) = probability of rolling ⚀ two out of 6 times when the probability of rolling ⚀ is 2/6
         = 0.329218 (from the table above)

Then from the above formulas and data we have:

Pr( type I error )  = Pr(H0|M) = Pr(M|H0) / ( Pr(M|H0) + Pr(M|H1) )
                    = 0.200938 / ( 0.200938 + 0.329218 )
                    = 0.379

Pr( type II error ) = Pr(H1|M) = Pr(M|H1) / ( Pr(M|H0) + Pr(M|H1) )
                    = 0.329218 / ( 0.200938 + 0.329218 )
                    = 0.621
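The arithmetic above can be checked directly; the two conditional probabilities are the table entries quoted in the text:

```python
pr_m_h0 = 0.200938  # Pr(M|H0): two ones in six rolls when p = 1/6
pr_m_h1 = 0.329218  # Pr(M|H1): two ones in six rolls when p = 2/6

# With Pr(H0) = Pr(H1) = 1/2, the priors cancel in Bayes's Theorem:
pr_type1 = pr_m_h0 / (pr_m_h0 + pr_m_h1)  # Pr(H0|M)
pr_type2 = pr_m_h1 / (pr_m_h0 + pr_m_h1)  # Pr(H1|M)
print(round(pr_type1, 3), round(pr_type2, 3))
```

The two error probabilities necessarily sum to 1, since one of the two hypotheses must be true under this setup.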

In this case we have a 38% chance of being wrong if we choose to believe Pr(⚀) = 2/6, and a 62% chance of being wrong if we choose to believe Pr(⚀) = 1/6. Either way we're likely to be wrong. The data does not allow us to differentiate between the two possibilities with much confidence.

Let us repeat the above calculations, this time assuming we rolled ⚀ five out of six times. This would lead us to the estimate that Pr(⚀) = 5/6. Let us create a new hypothesis:

H1 : The probability of ⚀ is 5/6

and repeat the binomial calculations assuming hypothesis H1 is true (the probability of ⚀ is 5/6). We arrive at the following table:

f(⚀) | Pr( f(⚀) \| H1 ) | sideways bar graph
---|---|---
0 | 0.000021 |
1 | 0.000643 |
2 | 0.008037 |
3 | 0.053583 | ==
4 | 0.200938 | ==========
5 | 0.401877 | ====================
6 | 0.334897 | ================

Following the same procedure as above:

Define:

M : f(⚀) = 5 (the event that ⚀ occurs five times in six rolls)

H0 : The probability of ⚀ is 1/6

H1 : The probability of ⚀ is 5/6

Pr( type I error )  = Pr(H0|M) = Pr(M|H0) / ( Pr(M|H0) + Pr(M|H1) )
                    = 0.000643 / ( 0.000643 + 0.401877 )
                    = 0.0016

Pr( type II error ) = Pr(H1|M) = Pr(M|H1) / ( Pr(M|H0) + Pr(M|H1) )
                    = 0.401877 / ( 0.000643 + 0.401877 )
                    = 0.9984

In this case, given a choice between believing the theoretician who claims Pr(⚀) = 1/6 and believing our own tests, which claim Pr(⚀) = 5/6, if we choose to believe the theoretician we have a 99.8% chance of being wrong, and if we choose to believe our own experimental results we have only about a 0.2% chance of being wrong. Clearly the odds are in favor of our being right. And we only threw the die six times!

This, then, is why a sequence consisting of all ones { 1, 1, ..., 1 } is unlikely to come from a compound experiment in which each outcome is equally likely. If we hypothesize that each outcome is equally likely (hypothesis H0), then all elements in our sample space Sn do have an equal probability. But if we conjure up an alternative hypothesis about the probability of outcomes (hypothesis H1), then the elements in our sample space Sn no longer have equal probability: the probability will be higher for some elements and lower for others. Our hypothesis H1 is specifically formulated to give whatever sequence we actually got the highest probability possible. If the probability of our sequence occurring under our original hypothesis H0 is very low, then it is almost certain to lose out to an alternative hypothesis H1. Thus if all we want to do is test whether or not to reject our original hypothesis H0, all we need to do is see what the probability of our sequence is under H0. If that probability is very low, we can reject H0 with the implied knowledge that a better hypothesis H1 probably exists. We don't actually have to formulate a better hypothesis H1; we just have to know that we could make one if we cared to.

Thus from here on out we won't go the whole distance of formulating an alternative hypothesis H1 and then calculating the probabilities of type I errors and type II errors. We'll just say if an actual outcome of a compound experiment has a very low probability of occurring under our H0 hypothesis (or null hypothesis), then that hypothesis is probably not true.

1. Formulate a null hypothesis H0 about the single chance experiment which was repeated N times to create a sequence of N values. To test a sequence of supposedly random numbers, our null hypothesis H0 is that each outcome of the chance experiment is equally likely, and that each trial of the chance experiment is independent of all previous trials.

2. Formulate a real-valued function g which somehow tests the null hypothesis H0. To test a sequence of supposedly random numbers, our function g might be one which counts the number of occurrences of a particular outcome.

3. Under our null hypothesis H0, mathematically define a sequence of N random variables:

   ( X1, X2, ..., XN )

   and apply our function g to the sequence of N random variables, creating a new random variable Z:

   Z = g( X1, X2, ..., XN )

   Then determine the probability density function of Z, either by mathematical calculation or by finding a table of the particular probability density function we're interested in.

4. Take the specific sequence of values supposedly obtained by performing a chance experiment N times:

   ( x1, x2, ..., xN )

   and apply the function g to obtain a specific value z:

   z = g( x1, x2, ..., xN )

5. Determine from the probability density function of Z how likely or unlikely we are to obtain our value of z, assuming our null hypothesis H0 is true. If the probability is very low, we may reject our hypothesis H0 as most likely incorrect.

Define:

N = number of trials

k = number of possible outcomes of the chance experiment

f(i) = number of occurrences of outcome i in N trials

E(i) = the expected number of occurrences of outcome i in N trials = N·Pr(i)

We now go through the steps as defined above:

1. Formulate the null hypothesis H0. Our null hypothesis H0 for the die-throwing experiment is that each of the six possible outcomes is equally likely, with a probability of 1/6.

2. Formulate a real-valued function g which somehow tests the null hypothesis. Our g function is the chi-square (χ²) test:

   V = Σ ( f(i) - E(i) )² / E(i), summed over the k possible outcomes i

   (This is not exactly the χ² function, but for N large enough it approaches the χ² distribution.) Note that if the result is zero, then each possible outcome occurred exactly the expected number of times. On the other hand, if the result is very large, it indicates that one or more of the possible outcomes of the chance experiment occurred far less or far more often than would be expected if our null hypothesis H0 were true.

3. Determine the probability density function of Z = g(X1, X2, ..., XN). As mentioned in step 2, our test function g is not exactly the χ² function, but for N large enough it approaches the χ² distribution, for which tables are available in any book on probability theory or any book of mathematical tables, such as the CRC Standard Mathematical Tables. The stated rule of thumb is that N should be large enough so that every possible outcome is expected to occur at least 5 times. Thus for our die-throwing experiment we should repeat it at least 30 times, since 30·(1/6) = 5.

Each row of the χ² distribution table is for a particular number of "degrees of freedom", which we will abbreviate df. The value to choose for df is one less than the number of possible outcomes of our chance experiment. Thus for our die-throwing experiment, where there are 6 possible outcomes, we have df = 6 - 1 = 5.

Below is the χ² distribution for 5 degrees of freedom:

p=1% | p=5% | p=25% | p=50% | p=75% | p=95% | p=99%
---|---|---|---|---|---|---
0.5543 | 1.1455 | 2.675 | 4.351 | 6.626 | 11.07 | 15.09

(Note: See NUMERICAL RECIPES: The Art of Scientific Computing, listed in the references, for computer code that will calculate χ² values so you don't have to look them up in a table.)

This is the cumulative χ² distribution. If we recall that the χ² distribution itself is typically bell-shaped, then the extremes of the bell-shaped curve, where the probability is very low, correspond to the extremes of the cumulative distribution. Thus if we are below the p=5% mark or above the p=95% mark, the corresponding probability is quite low. On the other hand, if we are between the p=25% mark and the p=75% mark, the corresponding probability is high.

(Note that some tables list the probability for (1 - p) instead of p, so that 0.5543 is listed as p=99% instead of p=1%, and 15.09 is listed as p=1% instead of p=99%. It should be easy to tell by inspection which probability the table is listing.)

4. We can now apply our χ² test to a specific outcome of rolling the die 30 times. Assume we threw the die 30 times and obtained the following results:

   f(⚀) = 7
   f(⚁) = 5
   f(⚂) = 4
   f(⚃) = 6
   f(⚄) = 6
   f(⚅) = 2

   With E(i) = 30·(1/6) = 5 for each face, our χ² test becomes:

   V = ( (7-5)² + (5-5)² + (4-5)² + (6-5)² + (6-5)² + (2-5)² ) / 5 = 16/5 = 3.2
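The χ² value for this step can be verified in a few lines, using the counts from the 30 rolls above:

```python
counts = [7, 5, 4, 6, 6, 2]           # f(i) for the six faces, 30 rolls total
expected = sum(counts) / len(counts)  # E(i) = 30 * (1/6) = 5 under H0

# V = sum of (f(i) - E(i))^2 / E(i) over all six faces
v = sum((f - expected) ** 2 / expected for f in counts)
print(v)
```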

5. We can now check our result of 3.2 against the table above in step 3 and note that it falls between the p=25% mark and the p=50% mark. This is a high-probability region, and so we are not led to believe our null hypothesis H0 is false.

Further notes on χ²:

Note that not only are overly large values of χ² considered highly improbable, but so are overly small values of χ². Whereas overly large values of χ² lead us to believe the probabilities of each possible outcome of our chance experiment are not what we thought them to be, overly small values of χ² lead us to believe the N trials of our chance experiment were not independent.

As an example, if we threw a single die 6 times, there would be 6^6 = 46656 possible outcomes. If we calculate the χ² value for each of those possible outcomes and tabulate them, we get the following table:

χ² value | number of occurrences
---|---
0 | 720
2 | 10800
4 | 16200
6 | 9000
8 | 7200
12 | 2100
14 | 450
20 | 180
30 | 6

Although a χ² value of 0 happens 720 times, a χ² value of 2 happens 10800 times, making it much more likely. We can compare the values here with the χ² distribution table above for 5 degrees of freedom. Our χ² values of 2 through 12, which appear to be the most likely ones, lie above the p=5% mark and below the p=95% mark on the table, where we would expect them to lie. Note that this comparison was for N = 6, although our rule of thumb says N should be at least 30 for this test.
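The table can be reproduced by brute force, enumerating all 6^6 = 46656 equally likely outcomes of six rolls and computing the χ² value of each:

```python
from itertools import product
from collections import Counter

dist = Counter()
for roll in product(range(6), repeat=6):
    counts = [roll.count(face) for face in range(6)]
    # With 6 rolls, E(i) = 6 * (1/6) = 1 for every face,
    # so V = sum of (f(i) - 1)^2
    v = sum((f - 1) ** 2 for f in counts)
    dist[v] += 1

for v in sorted(dist):
    print(v, dist[v])
```

A χ² value of 0 requires all six faces to appear exactly once, which happens in 6! = 720 ways, matching the first row of the table.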

Another note about the χ² distribution: most tables of the χ² distribution go up to 30 degrees of freedom. Above that, the χ² distribution approaches the normal distribution. If your chance experiment has more than 30 degrees of freedom (number of possible outcomes minus one), you can use the following formula to convert a χ² value V with df > 30 into a value from the normal distribution with mean = 0 and variance = 1:

Given: V (the χ² value) and df (the degrees of freedom),

Z = [ (V/df)^(1/3) - (1 - 2/(9·df)) ] / √( 2/(9·df) )

Here Z is a normally distributed variable with mean = 0 and variance = 1. As an example, let's assume we are simulating the random choosing of a single card from a deck of 52. There are 52 possible outcomes. We will need to repeat this experiment 52·5 = 260 times so that each possible outcome is expected to occur about 5 times. Our number of degrees of freedom is 52 - 1 = 51. We calculate χ² and get V = 30.4. We now calculate Z from the above equation, given V = 30.4 and df = 51:

Z = [ (30.4/51)^(1/3) - (1 - 2/459) ] / √(2/459)
  = -2.33

We now look up Z in a table of the cumulative normal distribution and find that this value of Z corresponds to the first 1% of the distribution. Thus our χ² value is much lower than expected. We could conclude that something is probably amiss.
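The df > 30 conversion can be sketched as follows. The formula coded here is the Wilson-Hilferty cube-root approximation, which reproduces the -2.33 above; that it is the exact formula the original text used is an assumption on my part:

```python
from math import sqrt

def chisq_to_normal(v, df):
    """Map a chi-square value v with df > 30 degrees of freedom to an
    approximately standard-normal Z (Wilson-Hilferty approximation)."""
    c = 2 / (9 * df)
    return ((v / df) ** (1 / 3) - (1 - c)) / sqrt(c)

# The card-deck example: V = 30.4 with df = 51
print(round(chisq_to_normal(30.4, 51), 2))  # -2.33
```

As a sanity check, a χ² value equal to df (the expected value for large df) maps to a Z near zero.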

One last note about χ². If you are simulating the flipping of a coin, then the number of degrees of freedom is 1 (two possible outcomes). Our rule of thumb of performing enough trials so that each possible outcome is expected to occur at least 5 times is not adequate in this case. Because the number of degrees of freedom is so low, we need to compensate by increasing the number of trials: enough trials so that each outcome is expected to occur at least 75 times.

This brings us to our next subject, which is testing a set of data points to see how closely it matches a given continuous distribution. The Kolmogorov-Smirnov (KS) test compares a set Snc of nc data points with a given cumulative distribution function P(x).

Given a set of nc data points Snc, we can create a cumulative distribution function Snc(x) from the data points. The function will be a step function, rising from 0 to 1, with each step size equal to 1/nc. Below is a generalized example which compares a cumulative distribution step function Snc(x) for 9 data values against a continuous cumulative distribution P(x) for a function which is uniform between 0 and 1:

[Figure: Kolmogorov-Smirnov statistic D. A measured distribution of values in x (shown as N dots on the lower abscissa) is to be compared with a theoretical distribution whose cumulative probability distribution is plotted as P(x). A step-function cumulative probability distribution Snc(x) is constructed, one which rises an equal amount at each measured point. D is the greatest distance between the two cumulative distributions. (Reprinted from NUMERICAL RECIPES: The Art of Scientific Computing)]

The KS test is now very simple. Define D as the maximum vertical difference between these two functions:

D = max |Snc(x) - P(x)|

And define DQ as:

DQ = √nc · D

where nc is the number of data points.

We can then go directly to a table for the KS test (similar to the χ² table) and read off where in the cumulative distribution our DQ value lies, remembering that values below the p=5% mark occur only 5% of the time, and values above the p=95% mark occur only 5% of the time. Tables for the KS test are not as easy to find as tables for the χ² test. Fortunately, as in the χ² case, if our number of data values nc in set Snc is greater than 30, the distribution approaches something which can be calculated easily, as follows:

Given: D = max |Snc(x) - P(x)| and nc > 30 (number of data points), the probability of observing a value of D this large or larger is approximately

Pr ≈ Q_KS( √nc · D ), where Q_KS(λ) = 2 Σ_{j=1}^{∞} (-1)^(j-1) e^(-2 j² λ²)
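The whole KS procedure can be sketched in Python. The asymptotic significance function Q_KS below is the series given in Numerical Recipes (which the text cites); evaluating it at √nc · D for nc > 30 is my assumption about how the original formula was applied:

```python
from math import sqrt, exp

def ks_statistic(data, cdf):
    """D = maximum vertical distance between the empirical step
    function Snc(x) and the theoretical cumulative distribution P(x)."""
    data = sorted(data)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data):
        # the step function jumps from i/n to (i+1)/n at x
        d = max(d, abs((i + 1) / n - cdf(x)), abs(cdf(x) - i / n))
    return d

def q_ks(lam, terms=100):
    """Asymptotic KS significance: 2 * sum (-1)^(j-1) exp(-2 j^2 lam^2)."""
    return 2 * sum((-1) ** (j - 1) * exp(-2 * j * j * lam * lam)
                   for j in range(1, terms + 1))

# Illustrative data: 100 evenly spread points against the uniform cdf on [0,1]
data = [(i + 0.5) / 100 for i in range(100)]
d = ks_statistic(data, lambda x: x)
print(d, q_ks(sqrt(len(data)) * d))
```

A large Q_KS value (near 1) means D is unremarkably small; a very small value means the data are unlikely to have come from the hypothesized distribution.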
