Confidence Intervals and Hypothesis Tests with Proportions

What happens to your confidence as the interval gets smaller?
Your confidence level decreases with smaller intervals.

Confidence level
• Is the success rate of the method used to construct the interval
• Using this method, ____% of the time the intervals constructed will contain the true population parameter

Critical value (z*)
• Found from the confidence level
• The upper z-score with probability p lying to its right under the standard normal curve

Confidence level    Tail area    z*
90%                 .05          1.645
95%                 .025         1.96
99%                 .005         2.576

Confidence interval for a population proportion:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Statistic ± Critical value × Standard deviation of the statistic
(Critical value × Standard deviation of the statistic is the margin of error.)
But do we know the population proportion? No, so the standard deviation is computed with p-hat in place of p.

What are the steps for performing a confidence interval?
1.) Assumptions
• SRS of context
• Approximate Normal distribution because np > 10 & n(1-p) > 10
• Population is at least 10n
2.) Calculations
3.) Conclusion
We are ________% confident that the true proportion context is between ______ and ______.

• As the confidence level increases, do the intervals generally get wider or more narrow? Explain.
• As the sample size increases, do the intervals generally get wider or more narrow? Explain.
• When 100 confidence intervals are generated, why are they all different?
• If the confidence level selected is 90%, about how many of 100 intervals will cover the true percentage of orange balls? Will exactly this number of intervals cover the true percentage each time 100 intervals are created? Explain.

A May 2000 Gallup Poll found that 38% of a random sample of 1012 adults said that they believe in ghosts. Find a 95% confidence interval for the true proportion of adults who believe in ghosts.

Step 1: check assumptions!
Assumptions:
• Have an SRS of adults
• np = 1012(.38) = 384.56 & n(1-p) = 1012(.62) = 627.44. Since both are greater than 10, the distribution can be approximated by a normal curve.
• Population of adults is at least 10,120.

Step 2: make calculations

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .38 \pm 1.96\sqrt{\frac{.38(.62)}{1012}} = (.35,\ .41)$$

Step 3: conclusion in context
We are 95% confident that the true proportion of adults who believe in ghosts is between 35% and 41%.

The manager of the dairy section of a large supermarket took a random sample of 250 egg cartons and found that 40 cartons had at least one broken egg. Find a 90% confidence interval for the true proportion of egg cartons with at least one broken egg.

Step 1: check assumptions!
Assumptions:
• Have an SRS of egg cartons
• np = 250(.16) = 40 & n(1-p) = 250(.84) = 210. Since both are greater than 10, the distribution can be approximated by a normal curve.
• Population of cartons is at least 2500.

Step 2: make calculations

$$.16 \pm 1.645\sqrt{\frac{.16(.84)}{250}} = (.122,\ .198)$$

Step 3: conclusion in context
We are 90% confident that the true proportion of egg cartons with at least one broken egg is between 12.2% and 19.8%.

To find sample size:

$$m = z^* \sqrt{\frac{p(1-p)}{n}}$$

However, since we have not yet taken a sample, we do not know a p-hat (or p) to use!

Another Gallup Poll is taken in order to measure the proportion of adults who approve of attempts to clone humans. What sample size is necessary to be within ±0.04 of the true proportion of adults who approve of attempts to clone humans with a 95% confidence interval?

Use p-hat = .5 (this gives the largest possible standard deviation, so the sample size is large enough whatever the true proportion is):

$$.04 = 1.96\sqrt{\frac{.5(.5)}{n}}$$

Divide by 1.96:

$$\frac{.04}{1.96} = \sqrt{\frac{.5(.5)}{n}}$$

Square both sides:

$$\left(\frac{.04}{1.96}\right)^2 = \frac{.25}{n}$$

$$n = 600.25$$

Round up on sample size: n = 601

What are hypothesis tests?
Calculations that tell us if the sample statistic (p-hat) occurs by random chance or not OR . . . if it is statistically significant.

Is it . . .
– a random occurrence due to natural variation? (Is it one of the sample proportions that are likely to occur?)
– an occurrence due to some other reason? (Is it one that isn't likely to occur?)

Statistically significant means that it is NOT a random chance occurrence!

These calculations (called the test statistic) will tell us how many standard deviations a sample proportion is from the population proportion!

Steps:
1) Assumptions
2) Hypothesis statements & define parameters
3) Calculations
4) Conclusion, in context

Assumptions for z-test:
• Have an SRS of context
• Distribution is (approximately) normal because both np > 10 and n(1-p) > 10
• Population is at least 10n
YEA! These are the same assumptions as confidence intervals!!

How to write hypothesis statements
• Null hypothesis – is the statement (claim) being tested; this is a statement of "no effect" or "no difference"
  H0:
• Alternative hypothesis – is the statement that we suspect is true
  Ha:

How to write hypotheses:
Null hypothesis
H0: parameter = hypothesized value
Alternative hypothesis
Ha: parameter > hypothesized value
Ha: parameter < hypothesized value
Ha: parameter ≠ hypothesized value

Facts to remember about hypotheses:
• Hypotheses ALWAYS refer to populations (use parameters, never statistics)
• The alternative hypothesis should be what you are trying to prove!
• ALWAYS define your parameter in context!

Activity: For each pair of hypotheses, indicate which are not legitimate & explain why
a) H0 and Ha both state that the parameter equals 15. The alternative must be NOT equal!
b) H0 and Ha are written in terms of x̄ and 123. Must use a parameter (population); x̄ is a statistic (sample).
c) H0 and Ha are written in terms of the population proportion and .1. The symbol used is the population proportion!
d) H0 uses p and .4; Ha uses p and .6. Must use the same number as H0!
e) H0 and Ha are written in terms of p-hat and .1. P-hat is a statistic, not a parameter!

P-value (the statistic is our p-hat!)
• Assuming H0 is true, the probability that the statistic would have a value as extreme or more extreme than what is actually observed

Notice that this is a conditional probability.

Why not find the probability that the p-hat equals a certain value? Remember that in continuous distributions, we cannot find probabilities of a single value!

In other words . . . What is the probability of getting values more (or less) than our p-hat? We can use normalcdf to find this probability.

Level of significance
• Is the amount of evidence necessary before we begin to doubt that the null hypothesis is true
• Is the probability that we will reject the null hypothesis, assuming that it is true
• Denoted by α
  – Can be any value
  – Usual values: 0.1, 0.05, 0.01
  – Most common is 0.05

Statistically significant
• Our statistic (p-hat) is statistically significant if the p-value is as small or smaller than the level of significance (α).

Decisions:
• If p-value < α, "reject" the null hypothesis at the α level. (Our "guilty" verdict.)
• If p-value > α, "fail to reject" the null hypothesis at the α level. (Our "not guilty" verdict.)
Remember that the verdict is never "innocent", so we can never decide that the null is true!

Facts about p-values:
• ALWAYS make the decision about the null hypothesis!
• Large p-values show support for the null hypothesis, but never that it is true!
• Small p-values show support that the null is not true.
• Double the p-value for two-tail (≠) tests
• Never "accept" the null hypothesis!

Calculating p-values
• For z-test statistic (z)
  – Use normalcdf(lb,ub) to find the probability of the test statistic or more extreme
  – Remember the standard normal curve is comprised of z's where μ = 0 and σ = 1
Since we are in the standard normal curve, we do not need μ and σ here.
We will see how to compute this value tomorrow.

Writing Conclusions:
1) A statement of the decision being made (reject or fail to reject H0) & why (linkage)
AND
2) A statement of the results in context. (state in terms of Ha)

"Since the p-value < (>) α, I reject (fail to reject) the H0. There is (is not) sufficient evidence to suggest that Ha."
Be sure to write Ha in context (words)!

Formula for hypothesis test:

$$\text{Test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{SD of statistic}}$$

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$

Example 5: A company is willing to renew its advertising contract with a local radio station only if the station can prove that more than 20% of the residents of the city have heard the ad and recognize the company's product. The radio station conducts a random sample of 400 people and finds that 90 have heard the ad and recognize the product. Is this sufficient evidence for the company to renew its contract?

Assumptions:
• Have an SRS of people
• np = 400(.2) = 80 & n(1-p) = 400(.8) = 320. Since both are greater than 10, this distribution is approximately normal.
• Population of people is at least 4000.
Use the parameter in the null hypothesis to check assumptions!

H0: p = .2
Ha: p > .2
where p is the true proportion of people who heard the ad

$$z = \frac{.225 - .2}{\sqrt{\dfrac{.2(.8)}{400}}} = 1.25$$

p-value = .1056, α = .05
Use the parameter in the null hypothesis to calculate the standard deviation!

Since the p-value > α, I fail to reject the null hypothesis. There is not sufficient evidence to suggest that the true proportion of people who heard the ad is greater than .2. The company will not renew its advertising contract with the radio station.

Calculate the appropriate confidence interval for the above problem.
CI = (.19066, .25934) (a 90% interval, which matches a one-sided test at α = .05)

How do the results from the confidence interval compare to the results of the hypothesis test?
The confidence interval contains the parameter of .2, thus providing no evidence that more than 20% had heard the ad.
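The one-sample calculations above (confidence interval, sample size, and z-test) can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function names are hypothetical, it uses only the standard library's statistics.NormalDist in place of a calculator's normalcdf, and it assumes the assumptions (SRS, np and n(1-p) large enough, population at least 10n) have already been checked by hand.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal curve: mu = 0, sigma = 1 (plays the role of normalcdf/invNorm).
STD_NORMAL = NormalDist()


def critical_value(conf_level):
    """Upper z* that leaves (1 - conf_level) / 2 in each tail."""
    return STD_NORMAL.inv_cdf(1 - (1 - conf_level) / 2)


def one_prop_ci(p_hat, n, conf_level):
    """Hypothetical helper: p_hat +/- z* * sqrt(p_hat(1 - p_hat) / n)."""
    margin = critical_value(conf_level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size(margin, conf_level, p_guess=0.5):
    """Hypothetical helper: solve m = z* sqrt(p(1 - p) / n) for n, rounding up."""
    z_star = critical_value(conf_level)
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


def one_prop_ztest(p_hat, n, p0, alternative):
    """Hypothetical helper: z-test for a proportion.

    The standard deviation uses p0, the value in the null hypothesis.
    alternative is ">", "<", or "!=" (two-tail, so the tail area is doubled).
    """
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == ">":
        p_value = 1 - STD_NORMAL.cdf(z)
    elif alternative == "<":
        p_value = STD_NORMAL.cdf(z)
    else:
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


# Ghosts poll: 95% interval for p_hat = .38, n = 1012.
print(one_prop_ci(0.38, 1012, 0.95))

# Cloning poll: sample size for a margin of .04 at 95% confidence.
print(sample_size(0.04, 0.95))

# Example 5: H0: p = .2, Ha: p > .2, with 90 of 400 in the sample.
z, p_value = one_prop_ztest(90 / 400, 400, 0.2, ">")
print(z, p_value)
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Because the critical value is computed rather than read from the table, results can differ from the hand calculations in the last decimal place.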
Two-Sample Proportions Inference

Assumptions:
• Two, independent SRS's from populations (or randomly assigned treatments)
• Populations at least 10n
• Normal approximation for both:
  n1p1 > 10, n1(1-p1) > 10, n2p2 > 10, n2(1-p2) > 10

Formula for confidence interval:
CI = statistic ± critical value × SD of statistic

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

The square root is the standard error; z* times the standard error is the margin of error.
Note: use p-hat when p is not known

Example 1: At Community Hospital, the burn center is experimenting with a new plasma compress treatment. A random sample of 316 patients with minor burns received the plasma compress treatment. Of these patients, it was found that 259 had no visible scars after treatment. Another random sample of 419 patients with minor burns received no plasma compress treatment. For this group, it was found that 94 had no visible scars after treatment.

What is the shape & standard error of the sampling distribution of the difference in the proportions of people with no visible scars between the two groups?

Since n1p1 = 259, n1(1-p1) = 57, n2p2 = 94, n2(1-p2) = 325 and all > 10, the distribution of the difference in proportions is approximately normal.

$$SE = \sqrt{\frac{.82(.18)}{316} + \frac{.22(.78)}{419}} = 0.0296$$

What is a 95% confidence interval of the difference in proportion of people who had no visible scars between the plasma compress treatment & control group?

Assumptions:
(For the population condition: since these are all burn patients, we can add 316 + 419 = 735. If the populations are not the same, you MUST list them separately.)
• Have 2 independent randomly assigned treatment groups
• Both distributions are approximately normal since n1p1 = 259, n1(1-p1) = 57, n2p2 = 94, n2(1-p2) = 325 and all > 10
• Population of burn patients is at least 7350.

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (.82 - .22) \pm 1.96\sqrt{\frac{.82(.18)}{316} + \frac{.22(.78)}{419}} = (.537,\ .654)$$

(The proportions are shown rounded; the interval is computed with the unrounded values 259/316 and 94/419.)

We are 95% confident that the true difference in proportion of people who had no visible scars between the plasma compress treatment & control group is between 53.7% and 65.4%.

Example 2: Suppose that researchers want to estimate the difference in proportions of people who are against the death penalty in Texas & in California. If the two sample sizes are the same, what size sample is needed to be within 2% of the true difference at 90% confidence?

$$.02 = 1.645\sqrt{\frac{.5(.5)}{n} + \frac{.5(.5)}{n}}$$

Since both n's are the same size, you have common denominators, so add!

$$.02 = 1.645\sqrt{\frac{.25 + .25}{n}}$$

n ≈ 3382.5, so round up: n = 3383 (in each sample)

Hypothesis statements:
H0: p1 = p2    (equivalently H0: p1 - p2 = 0)
Ha: p1 > p2    (Ha: p1 - p2 > 0)
Ha: p1 < p2    (Ha: p1 - p2 < 0)
Ha: p1 ≠ p2    (Ha: p1 - p2 ≠ 0)
Be sure to define both p1 & p2!

Since we assume that the population proportions are equal in the null hypothesis, the variances are equal. Therefore, we pool the variances!

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

Formula for hypothesis test:

$$\text{Test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{SD of statistic}}$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Usually p1 - p2 = 0

Example 4: A forest in Oregon has an infestation of spruce moths. In an effort to control the moth, one area has been regularly sprayed from airplanes. In this area, a random sample of 495 spruce trees showed that 81 had been killed by moths. A second nearby area receives no treatment. In this area, a random sample of 518 spruce trees showed that 92 had been killed by the moth. Do these data indicate that the proportion of spruce trees killed by the moth is different for these areas?

Assumptions:
• Have 2 independent SRS of spruce trees
• Both distributions are approximately normal since n1p1 = 81, n1(1-p1) = 414, n2p2 = 92, n2(1-p2) = 426 and all > 10
• Population of spruce trees is at least 10,130.
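H0: p1 = p2
Ha: p1 ≠ p2
where p1 is the true proportion of trees killed by moths in the treated area
and p2 is the true proportion of trees killed by moths in the untreated area

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.16 - .18}{\sqrt{.17(.83)\left(\dfrac{1}{495} + \dfrac{1}{518}\right)}} = -0.59$$

(The proportions are shown rounded; z is computed with the unrounded values 81/495, 92/518, and pooled p-hat = 173/1013.)

P-value = 0.5547, α = 0.05

Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that the proportion of spruce trees killed by the moth is different for these areas.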
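The two-sample formulas can be checked the same way. The sketch below is an illustration only, with hypothetical function names, using Python's standard-library statistics.NormalDist. It assumes the two-sample assumptions have already been verified, uses the unpooled standard error for the confidence interval, and uses the pooled p-hat for the test of H0: p1 = p2.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal curve: mu = 0, sigma = 1.
STD_NORMAL = NormalDist()


def two_prop_ci(x1, n1, x2, n2, conf_level):
    """Hypothetical helper: (p1_hat - p2_hat) +/- z* * SE, unpooled SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_star = STD_NORMAL.inv_cdf(1 - (1 - conf_level) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


def two_prop_ztest(x1, n1, x2, n2):
    """Hypothetical helper: two-tail z-test of H0: p1 = p2.

    Pools the samples, p_hat = (x1 + x2) / (n1 + n2), because H0 assumes
    the two population proportions are equal.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Double the tail area for the two-tail (not equal) alternative.
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


# Example 1 (burn patients): 259 of 316 vs. 94 of 419, 95% confidence.
print(two_prop_ci(259, 316, 94, 419, 0.95))

# Example 4 (spruce moths): 81 of 495 vs. 92 of 518, Ha: p1 != p2.
z, p_value = two_prop_ztest(81, 495, 92, 518)
print(z, p_value)
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Working from the counts rather than rounded proportions is a deliberate choice here: the rounded values .82 and .22 (or .16, .18, .17) shift the results slightly away from the answers shown on the slides.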