MATH/STAT 352: LECTURE 17
Sections 6.1, 6.2, 6.12: Tests of Hypotheses, Large Sample Test for the Population Mean, Fixed Level of Significance Tests.

TESTING HYPOTHESES FOR A SINGLE SAMPLE
PROBLEM: Make a statement about a parameter.
GOAL: Find out (DECIDE) whether the sample data support or contradict this statement.
PROCESS: Test the hypothesis (statement).
Examples:
1. Is a coin fair? Is the probability of H equal to 0.5? Statement or hypothesis: P(H) = 0.5.
2. The average height in a population is 64 in. Hypothesis: average height μ = 64.

TESTING HYPOTHESES - INTRODUCTION
Definition. A hypothesis is a statement about an unknown parameter. Based on sample data, we TEST whether the HYPOTHESIS is true or false.
DECISION: Accept (more precisely, do not reject) or reject the hypothesis.
- If the hypothesis is consistent with the data, we accept it (there is no reason to reject it).
- Otherwise, we reject the hypothesis in favor of an alternative, against which we judge our hypothesis.

NULL AND ALTERNATIVE HYPOTHESES
Null hypothesis Ho: a statement denoting "no effect" or "no change".
Alternative hypothesis Ha: reflects the "expected change", the "research hypothesis".
IDEA: We hold on to Ho as true and reject it ONLY IF there is sufficient evidence against it.
EXAMPLE: A coin is tossed 100 times and gives 62 H. Is it a fair coin? Assume the coin is fair until proven otherwise.
Ho: P(H) = 0.5
Alternative? The coin is not fair.
Ha: P(H) ≠ 0.5
This is a two-sided alternative (the possible values of P(H) lie on both sides of 0.5).

NULL AND ALTERNATIVE HYPOTHESES - EXAMPLES (contd.)
One-sided alternatives.
1. Coin tossing example (contd.). New question: does the coin favor H?
Ho: P(H) ≤ 0.5
Ha: P(H) > 0.5
2. Suppose instead that the coin came up H 25 times. Does the coin favor T?
Ho: P(H) ≥ 0.5
Ha: P(H) < 0.5
Example 2. A new drug for high blood pressure comes to market. Decide whether it is significantly better than the old one.
Ho: The old drug is the same as the new one (both drugs are equally effective).
Ha: The new drug is better than the old one (the new drug is more effective).

Example: Identify the null and alternative hypotheses. (The alternative is written Ha here; some texts write H1.)
a) The proportion of drivers who admit to running red lights is greater than 0.5.
• Ho: p ≤ 0.5, Ha: p > 0.5
b) The mean height of professional basketball players is at most 7 ft.
• Ho: μ ≤ 7 ft, Ha: μ > 7 ft
c) The standard deviation of IQ scores of actors is equal to 15.
• Ho: σ = 15, Ha: σ ≠ 15

Example: ProCare Industries claimed that couples using their product Gender Choice would have girls at a rate greater than 50% (0.5). In an experiment in which 100 couples used Gender Choice in an attempt to have a baby girl, exactly 52 girls were born.
NOTE: Under normal circumstances the proportion of girls p is 0.5, so the claim that Gender Choice is effective can be expressed as p > 0.5.
Ho: p ≤ 0.5
Ha: p > 0.5
Is the observed sample unusual? Could it happen by chance? Assume p = 0.5. Using a normal distribution (mean np = 50, standard deviation √(np(1 − p)) = 5, with continuity correction) as an approximation to the binomial distribution, we find P(52 or more girls in 100 births) = P(Z ≥ (51.5 − 50)/5) = P(Z ≥ 0.3) = 0.3821.
Conclusion: We do not reject random chance as a reasonable explanation. We conclude that the proportion of girls born to couples using Gender Choice is not significantly greater than the proportion (0.5) we would expect by random chance.

EXAMPLE: The average score on midterms in a calculus class in past years was 70. This year a sample of 100 students averaged 73. Are the students smarter this year? Assume scores follow a normal distribution with σ = 10.
Solution. Let μ = true mean midterm score this year (unknown) and X = score, so X ~ N(μ, σ) with σ = 10. The sample mean is x̄ = 73 and n = 100.
Ho: μ ≤ 70 (no change from last year)
Ha: μ > 70 (smarter students this year)
If Ho is true (μ = 70), how likely is it to observe x̄ of 73 or higher? By the CLT (or Fact 2), X̄ has mean $\mu_{\bar{X}} = 70$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 10/\sqrt{100} = 1$, so
$P(\bar{X} \ge 73) = P\left(Z \ge \frac{73 - 70}{1}\right) = P(Z \ge 3) = 0.0013,$
which is very small! So the data suggest that Ho is not true.
DECISION: Reject Ho.
CONCLUSION: Students are smarter this year.

NOTES
The data is the truth. If the chance of observing what actually happened is small when Ho is true, then the data tell us that Ho is not true. If the data we observed are likely to come up when Ho is true, the data support Ho.
When in doubt about a one-sided alternative, use a two-sided alternative.

Example: A market survey company MS claims that more than 25% of Internet users pay their bills online. A recent survey of 50 Internet users in Nevada showed that only 5% pay their bills online. Assuming that the MS claim holds in NV, the chance of such survey results is very small, below 0.0005. Do the survey results provide evidence in support of the MS claim in NV?
Answer: If the claim holds, the chance of observing what was observed is very small, so the claim is probably not true. The data do not support MS's claim.

TESTING HYPOTHESES: P-VALUE APPROACH
QUESTION: How do we decide whether or not to reject Ho?
A p-value is the probability, computed assuming Ho is true, of observing a value of the test statistic at least as contradictory to Ho (favoring Ha) as the observed value.
In the calculus test example, for the sample of 100 observations with sample mean 73:
p-value = $P(\bar{X} \ge 73 \mid \mu = 70) = 0.0013$.
If the p-value is small, then (assuming Ho is true) there is only a small probability of observing what we observed, so we have evidence against Ho and we reject Ho. Typically, we reject Ho for p-values below 0.01 or 0.05. The p-value is also called the observed significance level.

TESTING HYPOTHESES: ERRORS
CORRECT DECISIONS AND ERRORS

                    DECISION
TRUTH          Reject Ho           Do not reject Ho
Ho true        Type I error        Correct decision
Ho false       Correct decision    Type II error

TESTING HYPOTHESES: FIXED LEVEL OF SIGNIFICANCE APPROACH
2 TYPES OF ERROR:
Type I error: Reject Ho when it is true.
Type II error: Do not reject Ho when it is false.
LEVEL OF SIGNIFICANCE α OF A TEST = the probability of a Type I error we are willing to tolerate.
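To make the two error types concrete, here is a small Python simulation sketch in the calculus-midterm setting (μo = 70, σ = 10, n = 100, one-sided Ha: μ > 70, α = 0.05). It estimates how often the test rejects Ho when Ho is true (Type I error) and how often it fails to reject Ho when it is false. The false-Ho value μ = 72, the number of replications, the seeds, and the function names z_test_upper and rejection_rate are hypothetical choices made only for illustration. For comparison, the theoretical values are α = 0.05 and β = P(Z < z_0.05 − (72 − 70)/(10/√100)) ≈ 0.36.

```python
import math
import random
from statistics import NormalDist


def z_test_upper(sample, mu0, sigma):
    """p-value of the one-sided z-test of Ho: mu <= mu0 vs Ha: mu > mu0 (sigma known)."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 1 - NormalDist().cdf(z)  # P(Z > z), computed at the boundary mu = mu0


def rejection_rate(true_mu, mu0=70, sigma=10, n=100, alpha=0.05, reps=5000, seed=0):
    """Fraction of simulated N(true_mu, sigma) samples of size n for which Ho is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_upper(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / reps


# Ho true (mu = 70): every rejection is a Type I error.
type1 = rejection_rate(true_mu=70, seed=1)
# Ho false (hypothetical mu = 72): every non-rejection is a Type II error.
type2 = 1 - rejection_rate(true_mu=72, seed=2)

print(f"estimated P(Type I error)  = {type1:.3f}  (alpha = 0.05)")
print(f"estimated P(Type II error) = {type2:.3f}  (theory: about 0.36)")
```

Passing a larger n (for example, n=200) to rejection_rate shrinks the estimated Type II error rate, while the Type I error rate stays near α.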
Our procedures are constructed so that, for a given significance level α, they have the smallest possible chance of a Type II error. Usually the significance level is chosen before any data are collected; it is up to the researcher.

Controlling Type I and Type II Errors; Power of a Test
Let β denote the probability of a Type II error.
- For any fixed α, an increase in the sample size n causes a decrease in β.
- For any fixed sample size n, a decrease in α causes an increase in β. Conversely, an increase in α causes a decrease in β.
- To decrease both α and β, increase the sample size.
Power of a test. The power of a hypothesis test is the probability 1 − β of rejecting a false null hypothesis. It is computed for a particular significance level α and a particular value of the population parameter that is an alternative to the value assumed true under the null hypothesis. That is, the power of a test is the probability of supporting an alternative hypothesis that is true.

6.2: LARGE SAMPLE TEST FOR THE POPULATION MEAN
Setup: the population is normal and σ is known, OR σ is not known but the sample is LARGE and the population is close to normal (σ is then replaced by the sample standard deviation s).

ONE-SAMPLE Z-TEST
STEP 1. State the hypotheses:
Ho: μ = μo vs Ha: μ ≠ μo (two-sided), or
Ho: μ ≤ μo vs Ha: μ > μo, or Ho: μ ≥ μo vs Ha: μ < μo (one-sided).
STEP 2. Compute the test statistic:
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
STEP 3. Compute the critical value for the test, which determines the critical (rejection) region. Here $z_\alpha$ denotes the value with $P(Z > z_\alpha) = \alpha$. The critical value depends on Ha:
Two-sided alternative: critical value = $z_{\alpha/2}$.
One-sided alternatives: critical value = $z_\alpha$ if Ha: μ > μo; critical value = $-z_\alpha$ if Ha: μ < μo.

FIXED LEVEL OF SIGNIFICANCE PROCEDURE (contd.)
STEP 4. DECISION. The critical (rejection) region depends on Ha:
Ha: μ ≠ μo: reject Ho if $|z| > z_{\alpha/2}$;
Ha: μ > μo: reject Ho if $z > z_\alpha$;
Ha: μ < μo: reject Ho if $z < -z_\alpha$.
STEP 5. Answer the question in the problem.

P-VALUE APPROACH (Steps 1 and 2 as above)
STEP 3. Compute the p-value from the value z of the test statistic.
Two-sided test, Ha: μ ≠ μo: p-value = 2P(Z > |z|).
One-sided tests: Ha: μ > μo: p-value = P(Z > z); Ha: μ < μo: p-value = P(Z < z).
STEP 4.
DECISION: Reject Ho if the p-value < significance level α.
STEP 5. Answer the question in the problem.

EXAMPLE 1. Suppose the verbal SAT scores of 100 students give an average of 500, and it is known that σ = 100. Test the hypothesis that the true mean SAT score for this population is 475 versus a two-sided alternative. Use significance level α = 5%.
Solution. n = 100, x̄ = 500, σ = 100, μo = 475, α = 5%.
STEP 1. Ho: μ = 475, Ha: μ ≠ 475.
STEP 2. Test statistic: $z = \frac{500 - 475}{100/\sqrt{100}} = 2.5$.
STEP 3. Critical value: $z_{\alpha/2} = z_{0.025} = 1.96$.
STEP 4. |z| = 2.5 > 1.96, so reject Ho.
STEP 5. There is enough evidence to support the claim that the true mean SAT score for this population differs significantly from 475.
P-value: P(Z > 2.5) = 0.0062, so p-value = 2(0.0062) = 0.0124.

EXAMPLE 2. Suppose that the mean height of men is 66". A sample of 36 women yielded a mean height of 62". Are women, on average, shorter than men? Use σ = 10 and a 1% significance level. Compute the p-value for your test.
Solution. n = 36, x̄ = 62, σ = 10, μo = 66, α = 1%.
STEP 1. Ho: μ ≥ 66, Ha: μ < 66.
STEP 2. Test statistic: $z = \frac{62 - 66}{10/\sqrt{36}} = -2.4$.
STEP 3. Critical value: $-z_\alpha = -z_{0.01} = -2.33$.
STEP 4. z = −2.4 < −2.33, so reject Ho.
STEP 5. There is enough evidence to support the claim that, on average, women are shorter than men.
P-value = $P(\bar{X} \le 62) = P\left(Z \le \frac{62 - 66}{10/\sqrt{36}}\right) = P(Z \le -2.4) = 0.0082$.

Testing hypotheses in MINITAB
Example: SAT scores. Use Stat, Basic Statistics, 1-Sample Z; use Summarized data; check “Perform hypothesis test” and set the hypothesized mean; in Options, set the confidence level (1 − significance level) and the alternative hypothesis.
Results:
One-Sample Z
Test of mu = 475 vs not = 475
The assumed standard deviation = 100

  N   Mean  SE Mean  95% CI             Z      P
100  500.0     10.0  (480.4, 519.6)  2.50  0.012

Since p-value = 0.012 < 0.05 = significance level, we reject Ho and conclude that the mean SAT score for this population differs significantly from 475.

Testing hypotheses in MINITAB: Heights example.
Results:
One-Sample Z
Test of mu = 66 vs < 66
The assumed standard deviation = 10

 N   Mean  SE Mean  99% Upper Bound      Z      P
36  62.00     1.67            65.88  -2.40  0.008

Since p-value = 0.008 < 0.01 = significance level, we reject Ho and conclude that women are, on average, shorter than men.
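The one-sample z-test procedure above (both the fixed-level and the p-value approaches) fits in a few lines of Python. This is an illustrative sketch, not MINITAB's implementation: the function name one_sample_z and its arguments are hypothetical, and it uses exact normal quantiles from Python's statistics.NormalDist (for example z_0.01 = 2.326 rather than the table value 2.33). It reproduces Examples 1 and 2 and the confidence bounds printed in the two MINITAB outputs.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def one_sample_z(xbar, mu0, sigma, n, alpha, alternative):
    """Large-sample z-test for a population mean (hypothetical helper).

    alternative: "two-sided" (Ha: mu != mu0), "greater" (Ha: mu > mu0) or "less" (Ha: mu < mu0).
    Returns (z, critical value, p-value, reject Ho?).
    """
    se = sigma / math.sqrt(n)
    z = (xbar - mu0) / se
    if alternative == "two-sided":
        crit = Z.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
        p = 2 * Z.cdf(-abs(z))            # 2 P(Z > |z|)
        reject = abs(z) > crit
    elif alternative == "greater":
        crit = Z.inv_cdf(1 - alpha)       # z_alpha
        p = 1 - Z.cdf(z)                  # P(Z > z)
        reject = z > crit
    elif alternative == "less":
        crit = -Z.inv_cdf(1 - alpha)      # -z_alpha
        p = Z.cdf(z)                      # P(Z < z)
        reject = z < crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, crit, p, reject


# Example 1 (SAT scores): Ho: mu = 475 vs Ha: mu != 475, alpha = 0.05
z1, c1, p1, r1 = one_sample_z(500, 475, 100, 100, 0.05, "two-sided")
# Example 2 (heights): Ho: mu >= 66 vs Ha: mu < 66, alpha = 0.01
z2, c2, p2, r2 = one_sample_z(62, 66, 10, 36, 0.01, "less")
print(f"SAT:     z = {z1:.2f}, critical value = {c1:.2f}, p-value = {p1:.4f}, reject Ho: {r1}")
print(f"Heights: z = {z2:.2f}, critical value = {c2:.2f}, p-value = {p2:.4f}, reject Ho: {r2}")

# Confidence interval / bound shown in the MINITAB output
se1 = 100 / math.sqrt(100)
ci = (500 - Z.inv_cdf(0.975) * se1, 500 + Z.inv_cdf(0.975) * se1)  # 95% two-sided CI
se2 = 10 / math.sqrt(36)
upper = 62 + Z.inv_cdf(0.99) * se2                                 # 99% upper bound
print(f"SAT 95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"Heights 99% upper bound: {upper:.2f}")
```

The bounds illustrate the link between tests and intervals: Ho is rejected at level α exactly when μo falls outside the matching 100(1 − α)% interval or bound. Here 475 lies outside (480.4, 519.6), and 66 lies above the upper bound 65.88.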