Binomial Proportion Monte Carlo I prepared this lesson for the son of a seventh grade friend who was working on a project for a Math/Science Fair. It involves simulating the results of flips of a fair coin, where the estimated parameter is the probability of success. As you probably know, an estimated binomial probability is a consistent estimator, and his project should demonstrate this. In statistics, “Monte Carlo” is refers to using a mathematical model of something of interest (often a sampling distribution) along with random number generators to address the effects of varying assorted parameters. Here we have the outcomes from 1,000,000 samples. P2 is the proportion of heads in each sample of two flips of a fair coin. As you can see, there is a lot of error in the estimation of the true proportion (.5). If you had just one sample, 50% of the time you would be off as much as possible (proportion = 0 or 1). The FREQ Procedure P2 Frequency Percent 0 25045 25.05 0.5 49701 49.70 1 25254 25.25 Here we have the outcomes from 1,000,000 samples. P10 is the proportion of heads in each sample of ten flips of a fair coin. As you can see, there is much less error in the estimation of the true proportion (.5). If you had just one sample, 66% of the time you would be off by not more than .1 (.4 proportion .6). P10 Frequency Percent Cumulative Cumulative Frequency Percent 0 102 0.10 102 0.10 0.1 916 0.92 1018 1.02 0.2 4442 4.44 5460 5.46 0.3 11549 11.55 17009 17.01 0.4 20356 20.36 37365 37.37 0.5 24875 24.88 62240 62.24 0.6 20692 20.69 82932 82.93 0.7 11584 11.58 94516 94.52 0.8 4368 4.37 98884 98.88 0.9 1003 1.00 99887 99.89 1 113 0.11 100000 100.00 Here we also have the outcomes from 1,000,000 samples. P20 is the proportion of heads in each sample of twenty flips of a fair coin. As you can see, there is even less error in the estimation of the true proportion (.5). If you had just one sample, 74% of the time you would be off by not more than .1 (.4 proportion .6). P20 Frequency Percent Cumulative Cumulative Frequency Percent 0.05 2 0.00 2 0.00 0.1 26 0.03 28 0.03 0.15 123 0.12 151 0.15 0.2 452 0.45 603 0.60 0.25 1422 1.42 2025 2.03 0.3 3707 3.71 5732 5.73 0.35 7300 7.30 13032 13.03 0.4 11829 11.83 24861 24.86 0.45 15990 15.99 40851 40.85 0.5 17873 17.87 58724 58.72 0.55 16259 16.26 74983 74.98 0.6 11944 11.94 86927 86.93 0.65 7283 7.28 94210 94.21 0.7 3693 3.69 97903 97.90 0.75 1519 1.52 99422 99.42 0.8 447 0.45 99869 99.87 0.85 104 0.10 99973 99.97 0.9 27 0.03 100000 100.00 The MEANS Procedure Variable Std Dev P2 0.3546092 P10 0.1577809 P20 0.1115296 So, how much are we likely to be offer when we have just one sample of 2, 10, or 20 flips. We could compute the mean absolute deviation. This would involve finding, for each sample, the amount by which it was off (observed proportion minus .5) and then average those deviations, ignoring the sign of the deviations. A much more common procedure is to compute the standard deviation, which the square root of the mean squared deviation. This statistic is called the “standard error.” In the table above are the observed standard deviations for each of the three sample sizes. Notice that as the number of coin flips increases, the expected amount of error decreases. This is a very important property of estimators, a property known as “consistency.” pq , where p is the probability of n success (heads in this case), q is the probability of failure (number of heads), and n is the sample .5(.5) size (number of coin flips). For n = 2, the standard error is .354 . For n = 10, the standard 2 .5(.5) .5(.5) error is .158 . For n = 20, the standard error is .112 . Notice that these theoretical 10 20 values are exceptionally close to those observed. If we had all of eternity to get an infinite number of samples (and if p really were exactly .5), the observed standard errors would perfectly match the theoretical standard error. It is known that the standard error of a proportion is equal to SAS Code for This Monte Carlo options formdlim='-' pageno=min nodate; DATA coinflips; drop I; do Sample=1 to 100000; array Ys[20] Y1-Y20; do I = 1 to 20; Ys[I]=UNIFORM(0); array Head[20] Head1-Head20; Head[I} = 0; if Ys[I] >.5 then Head[I] = 1; end; output; end; *proc print; Data Flips; set coinflips; P2 = (Head1 + Head2)/2; P10 = Sum(of Head1-Head10)/10; P20 = Sum(of Head1-Head20)/20; *proc print; Proc Freq; Table P2 P10 P20 / plots=freqplot; run; Proc Means std; var P2 P10 P20; run;