ESTIMATION AND TESTING POPULATION PROPORTIONS

INTERVAL ESTIMATION OF THE POPULATION PROPORTION π

Recall the binomial experiment in which we randomly select individuals from a population and record, for each individual, which of two categories they belong to. One of the two categories is defined as a “Success” and by default the other is a “Failure”.

e.g. categories that are exhaustive and mutually exclusive:
• male vs. female
• succeed vs. fail
• herbaceous vs. woody
• trained vs. untrained

The population proportion of successes is denoted $\pi$ and the sample proportion of successes is denoted

$$\hat{\pi} = \frac{\text{number of observed successes}}{\text{number of trials}}.$$

e.g. Suppose we are studying ESP and we perform an experiment in which an individual is tested with three different playing cards. Each test consists of a card being held up and the individual guessing which card it is. If they do not have ESP, then they have a 1/3 chance ($\pi = 1/3 \approx 0.33$) of picking a card correctly. Now suppose I run this experiment with the person being tested 25 times ($n = 25$) and they get 10 correct. Then I observed a sample proportion of $\hat{\pi} = 10/25 = 0.40$ correct responses.

If the sample size $n$ is sufficiently large (both $n\pi \ge 5$ and $n(1-\pi) \ge 5$), then the sample proportion $\hat{\pi}$ is approximately Normally distributed with mean $\mu_{\hat{\pi}} = \pi$ and variance

$$\sigma^2_{\hat{\pi}} = \frac{\pi(1-\pi)}{n}.$$

Since we don’t know the population proportion, we estimate the unknown variance with

$$s^2_{\hat{\pi}} = \frac{\hat{\pi}(1-\hat{\pi})}{n}$$

and we check whether the sample size is sufficiently large by checking that both $n\hat{\pi} \ge 5$ and $n(1-\hat{\pi}) \ge 5$.

e.g. Parents of autistic children are often told that their child is autistic around 1-2 years of age, approximately the same age at which children receive their MMR vaccinations (measles, mumps and rubella). As a result some parents claim that the vaccine causes autism. To test this, a study was done to estimate the rate of autism in children who receive the MMR vaccine.
In a sample of 8,500 randomly selected children who did receive the MMR vaccine, the proportion with autism was 0.00282. Can we assume approximate normality?

$\hat{\pi} =$ ______    $n =$ ______
$n\hat{\pi} =$ ______    $n(1-\hat{\pi}) =$ ______
$s^2_{\hat{\pi}} = \hat{\pi}(1-\hat{\pi})/n =$ ______

A large-sample 95% confidence interval for the population proportion $\pi$ is

$$\hat{\pi} \pm 1.96\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$

• Large-sample means that the sampling was done randomly and the sample size is sufficiently large to invoke the Central Limit Theorem (CLT).
• 1.96 is the z-score, $z^*$, that makes the following statement true: $0.95 = \Pr(-z^* < Z < +z^*)$. We use it because we are relying on the CLT, which states that sample proportions are approximately Normally distributed for large samples.

The formula is easily adapted for other confidence levels. Simply replace 1.96 with the appropriate number from the table below. The z critical values for common confidence levels are:

Confidence level     80%     90%     95%     98%     99%     99.9%
z critical value     1.28    1.645   1.96    2.33    2.58    3.29

e.g. Autistic children. A 95% CI for the true proportion of children who have received the MMR vaccine that are autistic is given by

$$\hat{\pi} \pm 1.96\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.00282 \pm 1.96(0.0005752) = 0.00282 \pm 0.00113$$

We interpret this to mean that we are 95% confident that the true proportion of children who have received the MMR vaccine that are autistic is within the interval (0.00169, 0.00395).

e.g. A researcher flew to the South Pacific and collected 150 fiddler crabs. For each crab she recorded whether the left or right pincer was dominant and observed that 20 crabs were left-pincered. Calculate a 90% C.I. to estimate the true proportion of left-pincered crabs on the island.

1) Is the sample size large enough to use our method?
2) $\hat{\pi} = 20/150 = 0.133$, and $s_{\hat{\pi}} = \sqrt{\hat{\pi}(1-\hat{\pi})/n} =$ ______
3) 90% confidence $\Rightarrow z =$ ______
4) The 90% C.I. then is ______
5) What if we had calculated a 95% C.I.? Would it be wider or narrower than the 90% C.I.?
6) What if she had seen 13.3% based on a sample of 300 crabs? Would the 90% C.I. be wider or narrower than the one based on 150 crabs?

Defn: Confidence intervals can be written in the form

point estimate ± MARGIN OF ERROR

where the margin of error (ME) is the product of the critical value and the standard deviation of the point estimate.

Suppose the scientist is planning to repeat the fiddler crab experiment and wants to calculate a 95% confidence interval with a margin of error of no more than 2.5%. How big a sample should she take in the new experiment?

$$\text{ME} = 1.96\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.025$$

From the earlier experiment, an estimate for $\pi$ is $\hat{\pi} = 0.133$, so we’ll use that. Now we need to solve

$$1.96\sqrt{\frac{0.133(1-0.133)}{n}} = 0.025$$

for $n$.

The general equation for the sample size needed to achieve a specified margin of error (ME) when estimating a population proportion is:

$$n = \pi_0(1-\pi_0)\left(\frac{z_{\alpha/2}}{\text{ME}}\right)^2$$

where
• $\pi_0$ is hypothesized as the likely true proportion in the population (if completely unsure, use $\pi_0 = 0.5$)
• $z_{\alpha/2}$ is the z critical value for the desired confidence level $(1-\alpha)100\%$
• ME is the desired margin of error (in decimals)

TESTING THE POPULATION PROPORTION π

Let’s walk through one example, put the pieces into a testing procedure for proportions, and then use the procedure in another example.

e.g. Autistic children. Some parents claim that the MMR vaccine causes autism. To test this, a study was done to compare the rate of autism in children who receive the MMR vaccine to the known population rate for children who do not receive the vaccine. Among those who did not receive the vaccine, the proportion of children with autism is 0.0021. In a sample of 8,500 randomly selected children who did receive the MMR vaccine, the proportion with autism was 0.0028.
Is this sufficient evidence to indicate that the vaccine is related to autism?

1. Hypotheses: $H_0$: $\pi =$ ______    $H_A$: $\pi$ ______

2. Significance level: $\alpha =$ ______

3. If the null hypothesis is true (which is assumed until proven otherwise), then the distribution of the sample proportion $\hat{\pi}$ that we get from doing such an experiment has a mean of

$$\mu_{\hat{\pi}} = \pi = 0.0021$$

and a standard deviation of

$$\sigma_{\hat{\pi}} = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.0021(0.9979)}{8500}} \approx 0.0004965$$

Further, the distribution of $\hat{\pi}$ is approximately Normal in shape if the sample size is big enough (both $n\pi \ge 5$ and $n(1-\pi) \ge 5$).

Check:
$n\pi = 8500(0.0021) = 17.85 \ge 5$
$n(1-\pi) = 8500 - 17.85 = 8482.15 \ge 5$

So, suppose $H_0$ is true. Is a value of $\hat{\pi} = 0.0028$ sufficiently larger than $\pi_0 = 0.0021$ to imply that the true rate is larger than 0.0021? Let’s convert the observed sample proportion to a z-score so that we can interpret the difference more easily:

$$z = \frac{\hat{\pi} - \mu_{\hat{\pi}}}{\sigma_{\hat{\pi}}} = \frac{0.0028 - 0.0021}{0.0004965} \approx 1.41$$

(this z-score assumes that $H_0$ is true!)

This says that 0.0028 is 1.41 standard deviation units above the hypothesized value of 0.0021. Is this very likely if the null hypothesis is true? Is it supportive of $H_0$ or $H_A$?

What is the p-value associated with this z-score? We calculate p-value $= \Pr(Z > 1.41)$:

$$\Pr(Z > 1.41) = 1 - \Pr(Z \le 1.41) = 1 - 0.9207 = 0.0793$$

So, the probability that a random sample would yield a sample proportion of 0.0028 or more by chance alone when the null hypothesis is true is approximately 8%.

So, are the data sufficiently contradictory of $H_0$ for us to reject it?
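The z-score and p-value in this example can also be reproduced numerically rather than from a Normal table. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the notes, and the upper-tail alternative ($\pi > 0.0021$) is assumed from the way the p-value is computed above.

```python
from math import sqrt
from statistics import NormalDist

pi_0 = 0.0021    # hypothesized proportion under H0 (rate without the vaccine)
pi_hat = 0.0028  # observed sample proportion among vaccinated children
n = 8500         # sample size

# Large-sample check: both expected counts under H0 should be at least 5.
large_enough = n * pi_0 >= 5 and n * (1 - pi_0) >= 5

# Standard deviation of the sample proportion, assuming H0 is true.
sigma = sqrt(pi_0 * (1 - pi_0) / n)

# z-score of the observed proportion and the upper-tail p-value Pr(Z > z).
z = (pi_hat - pi_0) / sigma
p_value = 1 - NormalDist().cdf(z)

print(f"large enough: {large_enough}")
print(f"sigma = {sigma:.7f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

Run as written, this should give a z-score of about 1.41 and a p-value of about 0.079, in line with the table-based calculation.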
Large Sample Hypothesis Test of a Population Proportion

Null hypothesis: $H_0$: $\pi = \pi_0$, where $\pi_0$ is the hypothesized value

Alternative hypothesis is one of three:
a) $H_A$: $\pi > \pi_0$
b) $H_A$: $\pi < \pi_0$
c) $H_A$: $\pi \ne \pi_0$

Test statistic:

$$z = \frac{\hat{\pi} - \pi_0}{\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}}}$$

where $n$ is the sample size and $\hat{\pi}$ is the observed sample proportion

P-value: depends on the alternative hypothesis:
a) p-value $= \Pr(Z > z)$
b) p-value $= \Pr(Z < z)$
c) p-value $= 2\Pr(Z < -|z|)$

Decision rule: reject $H_0$ if p-value $\le \alpha$

Assumptions:
1. $n$ is large enough for $\hat{\pi}$ to be approximately Normally distributed ($n\pi_0 \ge 5$ and $n(1-\pi_0) \ge 5$)
2. the sampling was random

e.g. The incidence rate of a certain type of chromosome defect in adult males in the U.S. is believed to be 1 in 80. A random sample of 1000 men in prison revealed 20 men with defects. Is there evidence to suggest that the rate for prisoners differs from that in the general population? Use a significance level of 0.05.

Null hypothesis: $H_0$: $\pi = \pi_0 = 1/80 = 0.0125$
Alternative hypothesis: $H_A$: $\pi \ne 0.0125$

Check assumptions:
1) $n\pi_0 = 1000(0.0125) = 12.5 \ge 5$ and $n(1-\pi_0) = 987.5 \ge 5$
2) sampling was given to be random

Test statistic:

$$z = \frac{\hat{\pi} - \pi_0}{\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}}} = \frac{0.02 - 0.0125}{\sqrt{\dfrac{0.0125(1-0.0125)}{1000}}} = 2.1347$$

P-value: $2\Pr(Z < -|z|) = 2\Pr(Z < -2.13) = 2(0.0166) = 0.0332$

Conclusion: reject the null hypothesis since $0.0332 < 0.05 = \alpha$. There is sufficient evidence based on this sample to conclude that the population of adult males in the U.S. penal system has a different rate of this chromosome defect than the general adult male population.

e.g. Suppose a genetic crossing experiment was performed. If independent sorting of the genes occurs, then it is expected that 25% of the offspring would display a certain characteristic. If a particular type of non-independent event occurs, the proportion should be smaller.
The experiment resulted in 50 plants out of 230 having the characteristic. Is this sufficient evidence to reject independent sorting of the genes?

Significance level: $\alpha = 0.10$

Null hypothesis: $H_0$: $\pi = \pi_0 =$ ______
Alternative hypothesis: $H_A$: $\pi$ ______

Check assumptions:
1) $n\pi_0 =$ ______    $n(1-\pi_0) =$ ______
2) sampling? ______

Test statistic: ______

P-value: ______

Conclusion: ______
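The large-sample test procedure for a population proportion (hypotheses, assumption check, test statistic, p-value) can be collected into a single function. The sketch below is not part of the original notes: the function name `proportion_z_test` and its `alternative` argument are hypothetical, and it relies only on Python's standard library. It is applied here to the chromosome-defect example (20 defects among 1000 men, $\pi_0 = 1/80$).

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, pi_0, alternative="two-sided"):
    """Large-sample z test of H0: pi = pi_0 (hypothetical helper).

    alternative is "greater", "less", or "two-sided".
    Returns the test statistic z and the p-value.
    """
    # Assumption check: n*pi_0 and n*(1 - pi_0) must both be at least 5.
    if n * pi_0 < 5 or n * (1 - pi_0) < 5:
        raise ValueError("sample too small for the Normal approximation")

    pi_hat = successes / n
    z = (pi_hat - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)

    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Chromosome-defect example: 20 of 1000 prisoners, H0: pi = 1/80.
z, p_value = proportion_z_test(20, 1000, 1 / 80, alternative="two-sided")
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```

The computed p-value comes out near 0.033. It can differ slightly in the later decimals from a table-based value, because the table lookup rounds z to two decimal places first.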