Week 9-25-06 and some preparation for exam 2.

2 NORMAL DISTRIBUTION, BERNOULLI TRIALS, BINOMIAL DISTRIBUTION, POISSON DISTRIBUTION

3 [Figure: normal curve. Note the point of inflexion; note the balance point.]

4 [Figure: normal curve with MEAN = 100 and SD = 15, point of inflexion marked.]

5 [Figure: normal curve labeled 5 and 50.]

6 [Figure: normal curve labeled 6.3 and 39.7.]

7 Illustrated for the standard normal (mean = 0, SD = 1): ~68%.

8 Illustrated for the standard normal (mean = 0, SD = 1): ~95%.

9 [Figure: normal curve with mean 100 and SD 15. ~68/2 = 34% of the area lies between 85 and 100; ~95/2 = 47.5% lies between 100 and 130.]

11 [Figure: IQ scale (mean 100, SD 15) aligned with the standard normal Z scale (mean 0, SD 1).]

13 $P(Z > 0) = P(Z < 0) = 0.5$
$P(Z > 2.66) = 0.5 - P(0 < Z < 2.66) = 0.5 - 0.4961 = 0.0039$
$P(Z < 1.92) = 0.5 + P(0 < Z < 1.92) = 0.5 + 0.4726 = 0.9726$

14 With 0 < p < 1 and q = 1 - p:
x    p(x)
1    p      (1 denotes “success”)
0    q      (0 denotes “failure”)
     __
     1

15 P(success) = P(X = 1) = p
P(failure) = P(X = 0) = q
e.g. X = “sample voter is Democrat.” Population has 48% Dem., so p = 0.48, q = 0.52, and P(X = 1) = 0.48.

16 $P(S_1 S_2 F_3 F_4 F_5 F_6 S_7) = p^3 q^4$; just write $P(SSFFFFS) = p^3 q^4$.
“The answer only depends upon how many of each, not their order.”
e.g. 48% Dem, 5 sampled with replacement: $P(\text{Dem Rep Dem Dem Rep}) = 0.48^3 \, 0.52^2$.

17 e.g. P(exactly 2 Dems out of sample of 4) = P(DDRR) + P(DRDR) + P(DRRD) + P(RDDR) + P(RDRD) + P(RRDD) $= 6 \, (0.48^2)(0.52^2) \approx 0.374$. There are 6 ways to arrange 2D 2R.

18 e.g. P(exactly 3 Dems out of sample of 5) = P(DDDRR) + P(DDRDR) + P(DDRRD) + P(DRDDR) + P(DRDRD) + P(DRRDD) + P(RDDDR) + P(RDDRD) + P(RDRDD) + P(RRDDD) $= 10 \, (0.48^3)(0.52^2) \approx 0.299$. There are 10 ways to arrange 3D 2R. Same as the number of ways to select 3 from 5.

19 There are 5! ways to arrange 5 things in a line. Do it thus (1:1 with arrangements): select 3 of the 5 to go first in line, arrange those 3 at the head of the line, then arrange the remaining 2 after.
5! = (ways to select 3 from 5) × 3! × 2!
So the number of ways must be 5!/(3! 2!) = 10.

20 Let random variable X denote the number of “S” in n independent Bernoulli p-trials. By definition, X has a Binomial distribution, and for each of x = 0, 1, 2, …, n,
$P(X = x) = \frac{n!}{x!\,(n-x)!} \, p^x q^{n-x}$
e.g.
$P(\text{44 Dems in sample of 100 voters}) = \frac{100!}{44!\,56!} \, 0.48^{44} \, 0.52^{100-44} = 0.05812$.

21 $n!/(x!\,(n-x)!)$ is the count of how many arrangements there are of a string of x letters “S” and n-x letters “F.”
$p^x q^{n-x}$ is the shared probability of each string of x letters “S” and n-x letters “F.”
(Define $0! = 1$ and $p^0 = q^0 = 1$, and the formula goes through for every one of x = 0 through n.)
$\binom{n}{x}$ is short for the arrangement count $n!/(x!\,(n-x)!)$.

23 n = 10, p = 0.4: mean = n p = 4, sd = root(n p q) ~ 1.55

24 n = 30, p = 0.4: mean = n p = 12, sd = root(n p q) ~ 2.683

25 n = 100, p = 0.4: mean = n p = 40, sd = root(n p q) ~ 4.89898

26 $p(x) = e^{-\text{mean}} \, \text{mean}^x / x!$ for x = 0, 1, 2, … ad infinitum

27 e.g. X = number of times the ace of spades turns up in 104 tries.
X ~ Poisson with mean 2 (= 104/52).
$p(x) = e^{-\text{mean}} \, \text{mean}^x / x!$
e.g. $p(3) = e^{-2} \, 2^3 / 3! \approx 0.18$

28 e.g. X = number of raisins in MY cookie. Batter has 400 raisins and makes 144 cookies.
E X = 400/144 ~ 2.78 per cookie
$p(x) = e^{-\text{mean}} \, \text{mean}^x / x!$
e.g. $p(2) = e^{-2.78} \, 2.78^2 / 2! \approx 0.24$ (around 24% of cookies have 2 raisins)

29 THE FIRST BEST THING ABOUT THE POISSON IS THAT THE MEAN ALONE TELLS US THE ENTIRE DISTRIBUTION!
Note: Poisson sd = root(mean).

30 E X = 400/144 ~ 2.78 raisins per cookie
sd = root(mean) = 1.67 (for Poisson)

31 THE SECOND BEST THING ABOUT THE POISSON IS THAT FOR A MEAN AS SMALL AS 3 THE NORMAL APPROXIMATION WORKS WELL.
[Figure: normal approximation with mean 2.78 and sd = root(mean) = 1.67 (special to Poisson).]

32 E X = 127.8 accidents. If Poisson, then sd = root(127.8) = 11.3049 and the approximate distribution is:
[Figure: normal curve with mean 127.8 accidents and sd = root(mean) = 11.3 (special to Poisson).]

34 The overwhelming majority of samples of n from a population of N can stand in for the population.
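
The binomial and Poisson probabilities quoted in the examples above can be checked numerically. What follows is only a minimal sketch in Python using the standard library; the function names binomial_pmf and poisson_pmf are illustrative choices and do not come from the slides.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for n independent Bernoulli trials with success probability p."""
    q = 1 - p
    # math.comb(n, x) is the arrangement count n! / (x! (n-x)!)
    return math.comb(n, x) * p**x * q**(n - x)


def poisson_pmf(x, mean):
    """P(X = x) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean**x / math.factorial(x)


# Binomial examples with p = 0.48 (48% Democrats).
print(binomial_pmf(2, 4, 0.48))     # slides report ~ 0.374
print(binomial_pmf(3, 5, 0.48))     # slides report ~ 0.299
print(binomial_pmf(44, 100, 0.48))  # slides report 0.05812

# Poisson examples: ace of spades (mean 2) and raisins (mean 400/144).
print(poisson_pmf(3, 2))            # slides report ~ 0.18
print(poisson_pmf(2, 400 / 144))    # slides report ~ 0.24

# Standard deviations: root(n p q) for the binomial, root(mean) for the Poisson.
print(math.sqrt(10 * 0.4 * 0.6))    # slides report ~ 1.55
print(math.sqrt(400 / 144))         # slides report 1.67
```

The raisin example here uses the unrounded mean 400/144 rather than 2.78, so the printed value may differ from the slide's in the later decimal places.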
36 Sample size n must be “large.” For only a few characteristics at a time, such as profit, sales, dividend. SPECTACULAR FAILURES MAY OCCUR!

37 With-replacement

38 With-replacement vs without replacement.

39 This sample is obviously “not representative.”

40 Rule of thumb: With and without replacement are about the same if root[(N-n)/(N-1)] ~ 1.

41 They would have you believe the population is {8, 9, 12, 42} and the sample is {42}. A SET is a collection of distinct entities.

42 IF THE OVERWHELMING MAJORITY OF SAMPLES ARE “GOOD SAMPLES” THEN WE CAN OBTAIN A “GOOD” SAMPLE BY RANDOM SELECTION.

43 Digits are made to correspond to letters:
a = 00-02, b = 03-05, …, z = 75-77
Random digits then give random letters.
1559 9068 … (Table 14, pg. 809)
15 59 90 68 etc. (split into pairs)
f t * w etc. (take chosen letters)
For samples without replacement just pass over any duplicates.

44 The Great Trick is far more powerful than we have seen. A typical sample closely estimates such things as a population mean or the shape of a population density. But it goes beyond this to reveal how much variation there is among sample means and sample densities. A typical sample not only estimates population quantities. It estimates the sample-to-sample variations of its own estimates.

45 The average account balance is $421.34 for a random with-replacement sample of 50 accounts. We estimate from this sample that the average balance is $421.34 for all accounts. From this sample we also estimate and display a “margin of error”: $421.34 +/- $65.22.

46 NOTE: Sample standard deviation s may be calculated in several equivalent ways, some sensitive to rounding errors, even for n = 2.

47 The following margin of error calculation for n = 4 is only an illustration. A sample of four would not be regarded as large enough.
Profits per sale = {12.2, 15.3, 16.2, 12.8}.
Mean = 14.125, s = 1.92765, root(4) = 2.
Margin of error = +/- 1.96 (1.92765 / 2)
Report: 14.125 +/- 1.8891.
A precise interpretation of margin of error will be given later in the course, including the role of 1.96. The interval 14.125 +/- 1.8891 is called a “95% confidence interval for the population mean.”
We used: $(12.2-14.125)^2 + (15.3-14.125)^2 + (16.2-14.125)^2 + (12.8-14.125)^2 = 11.1475$, so s = root(11.1475 / 3) = 1.92765.

48 A random with-replacement sample of 50 stores participated in a test marketing. In 39 of these 50 stores (i.e. 78%) the new package design outsold the old package design. We estimate from this sample that 78% of all stores will sell more of new vs old. We also estimate a “margin of error” of +/- 11.5%.
Figured in the Binomial setup: 1.96 root(pHAT qHAT)/root(n) = 1.96 root(.78 .22)/root(50) = 0.114823

49 A sample of only n = 600 from a population of N = 500 million. (FINE resolution)
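
The two margin-of-error calculations above (for a sample mean and for a sample proportion) can be reproduced with a short script. This is a hedged sketch in Python using only the standard library; the names Z_95, mean_margin, and proportion_margin are illustrative and are not taken from the slides. It assumes s is the sample standard deviation with divisor n - 1, which matches the slide's s = 1.92765.

```python
import math
import statistics

Z_95 = 1.96  # multiplier used in the slides for a 95% interval


def mean_margin(sample):
    """Margin of error for a sample mean: 1.96 * s / root(n)."""
    s = statistics.stdev(sample)  # sample sd, divisor n - 1
    return Z_95 * s / math.sqrt(len(sample))


def proportion_margin(successes, n):
    """Margin of error for a sample proportion: 1.96 * root(pHAT qHAT) / root(n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    return Z_95 * math.sqrt(p_hat * q_hat) / math.sqrt(n)


# Profits-per-sale illustration (n = 4 is too small in practice).
profits = [12.2, 15.3, 16.2, 12.8]
print(statistics.mean(profits))   # slides report 14.125
print(statistics.stdev(profits))  # slides report 1.92765
print(mean_margin(profits))       # slides report 1.8891

# Test marketing: new design outsold old in 39 of 50 stores.
print(proportion_margin(39, 50))  # slides report 0.114823
```

Using statistics.stdev avoids hand-computing the sum of squared deviations, which is one of the rounding-sensitive routes the note on sample standard deviation warns about.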