Example
A student attempts a multiple-choice exam (options A to F for each question) but, having done no work, selects his answer to each question by rolling a fair die (A = 1, B = 2, etc.). If the exam contains 100 questions, what is the probability of obtaining a mark below 20?

Simulation
Now let us simulate a large number of realisations of students using this random method of answering multiple-choice questions. We still require the same Binomial distribution, with n = 100 and p = 1/6. This can be done in R using the function rbinom. For example, let's simulate 1000 students:

> xsim <- rbinom(1000, 100, 1/6)
> xsim
  [1] 18 22  9 17 18 20 21 16  8 18 11 16 16 13 16 14 25 15 16 17
 [21] 13 25 11 24 17 16 13 21 10 17 18 10 17 18 19 17 19 15 13 12
 [41] 15 11 21 23 19 14 19 25 23 19 20 17 17 15 16 14 13 16 17 14
 [61] 24 21 19  8 18 20 22 16 15 20 19 17 13 15 13 21 22 12 12 12
 [81] 11 14 11 12 16 16 17 21 17 16 17 14  9 17 16 17 12 20 16 17
[101] 18 13 15 16 12 15 17 16 17 26 18 14 21 15 10 23 12 16 16 12
[121] 17 18 22 17 18 14 19 22 13 17 21 15 21 16 17 16 16 28 16 17
[141] 18 19 16 11 14 18 16 18 18 14 20 13 19 19 22 22 13 17 19 17
[161] 18 20 11 22 19 25 15 15 17 18  5 15 14 13 18 15 17 15 20 17
[181] 16 14 23 17 16 10 12 16 21 30 16 13 22 14 15 16 17 14 16 18
[201] 14 20 16 19 25 14 15 24 22 19 15 17 22 10 20 13 10 15 14 22
[221] 17 12 16 19 20 17 15 21 14 13 21 11 19  9 21 22 16 13 13 12
[241] 14 13 18  8 14 18 10 16 10 12 21 18 15 17 16  8 19 17 11 18
[261] 23 17 20 16 12 20 11 16 22 17 16 13 22 20 15 15 20 17 22 14
[281] 18 23 18 20 20 16 19 16 15 19 18 17 14 22 15 24 17 15 17 22
[301] 18 22 10 19 24 21 16 14 11 14 20 15 21 11 17 16 20 19 13 14
[321] 17 17 19 15 17 13 18 23 16 12 25 13 13 21 19 16 20 27 19 18
[341] 18 24 15 23 13 13 14 15 23 13 19 15 11 19 17 12 15 15 17 14
[361] 18 20 17 13 16 14 13 20 18 15 18 16 17 20 14 19 21 12 13 17
[381] 22 17 19 16 14 18 16 18 12 16 13 15 16  9 15 16 18 22 14 16
[401] 14 17 12 16 21 16 21 13 14 19 18 18 16 19 17 17 17 13 17 11
[421] 16 16 13 10 26 12 20 17 11 19 18 12 15 14 14 20 15 15 15 11
[441] 18 23 20 23 13 12 18 22 12 16 13 21 22 14 18 21 17 12 19 16
[461] 17 18 15 22 22 20 15 16 13 12 19 22 16 20 19 19 16  8 15 12
[481] 29 26 19 16 20 15 11 22 15 20 21 14 16 13 17 15 10 13 17 12
[501] 18 20 17 14 13 19 23 11 27 19 17 16 17 20 21 15 20 20 21 19
[521] 21 16 13 21 16 19 13  9 10 20 12 18 14 13 18 19 22 19 21 18
[541]  6 17 17 19 19 22 23 18 13 12 17 16 21 16 18 21 19 13 22 19
[561] 20 17 18 15 17 15 15 10 18 13 23 17 14 23 22 10 18 11 11 18
[581] 16 17 14 13  9 12 14 14 21 23 24 19 12 15 17 18 11 14 19 19
[601] 19 16 17 13 13 15 17 18 17 13  9 19 18 22 17 13 14 22 13 23
[621] 23 19 19 16 24 14 17 18 17 13 16 12  7 15 17 16 18 22 19 15
[641] 16 18 18 13 20 18 12  6 15 11 16 19 12 13 11 17 11 15 11 19
[661] 17 16 16 21 12 18 20 19 16 14 18 17 16 14 11 17 17 16 17 17
[681] 17 18 16 18 12 18 18 20 19 13 12 16 14 13 13  6 15 12 19 14
[701] 20 17 16 14 21 19 15 26 17 20 12 24 13 11 19 21 18 13  9 16
[721]  9 16 17 16 15 12 11 21 21 13 19 13 13 16 11 17 15 19 22 19
[741] 11 13 14 16 20 15 16 12 18 14 12 14 21 12 23 21 19 10 24 17
[761] 17 19 19 15 18 12 14 14 14 20 12 20 12 21 19 20 21 20 17 18
[781] 15 12 16 23 16 16 19 15 12 14 21 25 12 19 20 22 17 16 21 20
[801] 23 24 17 20 17 19 14 22 20 25 10 12 15 16  7 14 14 18 22 10
[821] 15 22 23 18 12 10 14 18 15 15 18 10 21 11 20 15 20 10 13 16
[841] 16 17 22 19 19 16  8 20 17 13 21 16 25 16 13 17 14 17 19 21
[861] 17 19 14 22 20 18 14 19 17 23 20 18 14 11 16 18 26 24 24 18
[881] 21 16 23 20 14 16 15 13 14 11 12 13 14 16 18 17 16 17 13 20
[901] 22  8 17 17 16 16 14 22 17 18 18 21 15 11 20 21 18 15 19 21
[921] 16 22 14 12 16 20 16 21 11 13 19 14 23 12 12 17 14 15 26 17
[941] 18 14 21 17 14 24 21 12 21 13 20 22 11 20 10 16 16 15 19 13
[961] 16 15 16 17  9 14 11 12 19 17 16 15 21 14 15 14 15 17 15 16
[981] 19 11 15 17 17 17 11 18 21 14 15 17 18 16 11 22 19 16 14 15

It makes sense now to look at properties of these 1000 simulations, which have been placed in the vector
“xsim”.

> mean(xsim)
[1] 16.624
> median(xsim)
[1] 17
> sd(xsim)
[1] 3.778479
> var(xsim)
[1] 14.2769

Now compare the values obtained from the simulations with the theoretical values from the probability distribution (mean np = 100 x 1/6 and variance np(1 - p) = 100 x 1/6 x 5/6):

            SIMULATION   THEORETICAL
MEAN            16.624      16.66667
VARIANCE       14.2769      13.88889

A full summary of the results of the simulation is given by:

> table(xsim)
xsim
  5   6   7   8   9  10  11  12  13  14  15  16  17
  1   3   2   7  10  21  40  57  72  80  82 118 118
 18  19  20  21  22  23  24  25  26  27  28  29  30
 85  83  61  55  46  25  14   9   6   2   1   1   1

Adding the counts for marks 5 to 19 shows that 779 of the 1000 simulated students scored below 20, so the simulation estimates the required probability as roughly 0.78 (the exact value is given by pbinom(19, 100, 1/6)).

A histogram of the results can also be plotted:

> hist(xsim)

Notice that a BARPLOT of xsim does NOT produce a useful graph, because barplot draws one bar per element of the vector (1000 bars, in the order the students were simulated) rather than one bar per possible mark:

> barplot(xsim)

A barplot of the TABLE of xsim does work, though, since the table holds the frequency of each mark:

> barplot(table(xsim))

Poisson Distribution
The Poisson distribution is used to model the number of events occurring within a given time interval. The formula for the Poisson probability mass function is

$$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots$$

where λ is the parameter of the distribution: the average number of events in the given time interval.

Some events are rather rare: they don't happen that often. For instance, car accidents are the exception rather than the rule. Still, over a period of time, we can say something about the nature of rare events. An example is the improvement of traffic safety, where the government wants to know whether seat belts reduce the number of deaths in car accidents. Here, the Poisson distribution can be a useful tool for answering questions about the benefits of seat-belt use. Other phenomena that often follow a Poisson distribution are infant deaths, the number of misprints in a book, the number of customers arriving in a given period, and the number of activations of a Geiger counter.

The distribution was derived by the French mathematician Siméon Denis Poisson in 1837, and a famous early application was the description of the number of deaths by horse kicks in the Prussian army.

Example
Arrivals at a bus stop follow a Poisson distribution with an average of 4.5 every quarter of an hour.
Obtain a barplot of the distribution (assume a maximum of 20 arrivals in a quarter of an hour) and calculate the probability of fewer than 3 arrivals in a quarter of an hour.

The probabilities of 0 up to 2 arrivals can be calculated directly from the formula with λ = 4.5:

$$p(0) = \frac{4.5^0 e^{-4.5}}{0!} = e^{-4.5} = 0.01111$$

Similarly, p(1) = 0.04999 and p(2) = 0.11248. So the probability of fewer than 3 arrivals is

$$P(X < 3) = p(0) + p(1) + p(2) = 0.01111 + 0.04999 + 0.11248 = 0.17358$$

R Code
As with the Binomial distribution, the functions dpois and ppois will do the calculations for you.

> x <- dpois(0:20, 4.5)
> x
 [1] 1.110900e-02 4.999048e-02 1.124786e-01 1.687179e-01 1.898076e-01
 [6] 1.708269e-01 1.281201e-01 8.236295e-02 4.632916e-02 2.316458e-02
[11] 1.042406e-02 4.264389e-03 1.599146e-03 5.535504e-04 1.779269e-04
[16] 5.337808e-05 1.501258e-05 3.973919e-06 9.934798e-07 2.352979e-07
[21] 5.294202e-08
> barplot(x, names.arg = 0:20)

Now check that ppois gives the same answer (ppois is the cumulative distribution function, so ppois(2, 4.5) gives P(X <= 2)):

> ppois(2, 4.5)
[1] 0.1735781

Consider a collection of graphs for different values of λ:

[Figure: barplots of the Poisson distribution for λ = 3, 4, 5, 6 and 10]

In the last case (λ = 10), the probability of 20 arrivals is no longer negligible, so values up to, say, 30 would have to be considered.

Properties of Poisson
The mean and variance are both equal to λ. The sum of independent Poisson variables is a further Poisson variable with mean equal to the sum of the individual means.

As well as cropping up in the situations already mentioned, the Poisson distribution provides an approximation to the Binomial distribution.

Approximation: if n is large and p is small, then the Binomial distribution with parameters n and p, B(n, p), is well approximated by the Poisson distribution with parameter λ = np, i.e. by the Poisson distribution with the same mean.

Example
Binomial situation, n = 100, p = 0.075. Calculate the probability of fewer than 10 successes.
> pbinom(9, 100, 0.075)
[1] 0.7832687

This would have been very tricky to calculate by hand, as the factorials are very large and the probabilities very small.

For the Poisson approximation to the Binomial, λ is equal to np, i.e. 100 x 0.075, so λ = 7.5:

> ppois(9, 7.5)
[1] 0.7764076

So the approximation is correct to 2 decimal places (both values round to 0.78). By hand, the Poisson calculation would also have been much simpler than the Binomial one.
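The comparison between the exact Binomial answer and its Poisson approximation can also be rebuilt directly from the two mass-function formulas. The following R sketch is an illustration rather than part of the original notes; the variable names (binom_terms, pois_terms, exact, approx) are hypothetical. It sums the terms for k = 0, ..., 9 and checks the totals against pbinom() and ppois().

```r
# Illustrative sketch (hypothetical variable names): rebuild P(X < 10) for
# X ~ B(100, 0.075) and its Poisson approximation from the mass functions.
n <- 100
p <- 0.075
lambda <- n * p   # Poisson parameter with the same mean as B(n, p): 7.5
k <- 0:9          # "fewer than 10 successes" means k = 0, 1, ..., 9

# Binomial mass function: choose(n, k) * p^k * (1 - p)^(n - k)
binom_terms <- choose(n, k) * p^k * (1 - p)^(n - k)

# Poisson mass function: lambda^k * exp(-lambda) / k!
pois_terms <- lambda^k * exp(-lambda) / factorial(k)

exact  <- sum(binom_terms)   # exact Binomial probability
approx <- sum(pois_terms)    # Poisson approximation with lambda = np

# The hand-built sums should agree with R's cumulative functions
stopifnot(isTRUE(all.equal(exact, pbinom(9, n, p))))
stopifnot(isTRUE(all.equal(approx, ppois(9, lambda))))

# Term-by-term comparison, then the two totals and their difference
print(round(cbind(k, binomial = binom_terms, poisson = pois_terms), 5))
print(round(c(exact = exact, approx = approx, difference = exact - approx), 4))
```

The printed totals should reproduce the pbinom and ppois values quoted in the example (about 0.783 and 0.776). The term-by-term table also shows where the two distributions differ most, which is near the common mean of 7.5.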