Chapter 7 Hypothesis Testing with One Sample

Section 7.1 – Overview

Hypothesis – a claim or statement about a property of a population

Hypothesis Test – a standard procedure for testing a claim about a property of a population

Below is a general outline of a hypothesis test. We will discuss each of these steps in more detail in Section 7.2.

Steps for Hypothesis Testing

1. Determine the hypotheses.

Null Hypothesis: an assumption concerning the value of the population parameter being studied (usually represents no effect, no change, no difference, etc.)
Notation: H0

Alternative Hypothesis: a statement that specifies an alternative set of possible values for the population parameter that is not included in the null hypothesis (states the result for which we hope to find evidence)
Notation: H1 (or HA or Ha)

Note: The null hypothesis may or may not be true. We will carry out a study and then determine whether we have strong enough evidence to conclude that the null hypothesis is false (meaning our evidence suggests that H1 is true).

2. Obtain a simple random sample of n observations from the desired population and calculate the observed sample statistic.

For example, if we want to test something about a population proportion (p), then we would calculate the sample proportion (p̂). If we want to test something about a population mean (µ), then we would calculate the sample mean (x̄). The test statistic is the corresponding z-score (or t-score) for the observed statistic under the assumption that the null hypothesis is true.

3. Determine the “strength” of your evidence.

The evidence is strong if the outcome we observe is highly unlikely to occur by chance, assuming the null hypothesis is true (so the alternative hypothesis offers a more plausible explanation of the data). The evidence is weak if the outcome we observe can easily occur by chance, assuming the null hypothesis is true. We measure the strength of the evidence by calculating a P-value.
P-value: the probability of obtaining a sample outcome as extreme or more extreme than the actual observed outcome, assuming the null hypothesis is true. The smaller the P-value, the stronger the evidence is against H0. (Informally, a small P-value means we run little risk of wrongly rejecting a true null hypothesis. The P-value is not the probability that H0 is true.)

4. Draw a conclusion.

If the P-value is “small”, then we reject H0 in favor of H1. If the P-value is “large”, then we fail to reject H0, meaning we cannot conclude H1.

Note: You may NEVER conclude that the null is true. Unfortunately, you CANNOT be certain that you have made the correct conclusion.

Section 7.2 – Basics of Hypothesis Testing

Null and Alternative Hypotheses

Look at exercises 5, 7, 9, and 11 on page 335.

Note: The claim that you wish to support must be worded so that it becomes the alternative hypothesis.

Test Statistic

Proportion: $z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$
Mean, σ known: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Mean, σ not known: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

Look at Examples 21 and 23

Critical Region, Significance Level, Critical Value, and P-value

Critical Region – the set of all values of the test statistic that cause us to reject H0 (the set of all values that are highly unlikely to occur by chance if H0 is true)

Significance Level – the probability (denoted α) that we choose to use to determine if an outcome is highly unlikely

Critical Value(s) – the value(s) that separate the critical region from the rest of the sampling distribution

Two-tailed test – the critical region lies in both tails of the sampling distribution (H1 contains ≠)
One-tailed test – the critical region lies in a single tail
Left-tailed – the critical region lies in the left tail (H1 contains <)
Right-tailed – the critical region lies in the right tail (H1 contains >)

P-value – the probability of obtaining a sample outcome as extreme or more extreme than the actual observed outcome, assuming the null hypothesis is true.

Look at exercises 25, 26, 27, 29, and 31

Statistically Significant – a result is statistically significant when the test statistic falls in the critical region.

Decisions and Conclusions

Decision Criterion:
Traditional Method – reject H0 if the test statistic falls in the critical region; otherwise, fail to reject H0.
P-value Method – reject H0 if the P-value is less than or equal to α; otherwise, fail to reject H0.
Another Option – report the P-value and leave the decision to the reader.

Look at exercises 33 and 35

Giving the P-value is ALWAYS more informative than just stating whether or not the results are statistically significant.
Advantages to this Approach: When the P-value is reported, the decision of whether or not to reject the null hypothesis is left up to the reader. For example, suppose a P-value of .03 is reported. If you, the reader, think that a 5% level of significance (α = .05) is sufficient, then you would choose to reject the null hypothesis in favor of the alternative hypothesis, because .03 < .05. If, however, a second reader thinks that a 5% level of significance is insufficient and would rather use α = .01, then he or she would fail to reject the null hypothesis, because .03 > .01.

Publishing Our Results: P-values are very often reported when describing the results of studies in many fields. Therefore, it is very important to understand what they are telling you.

Example: The financial aid office of a university asks a sample of students about their employment and earnings. The report says, “For academic year earnings, a significant difference (P-value = .038) was found between the sexes, with men earning more on the average.”

Interpretation: If there really is no difference in academic year earnings between the sexes, then we would have seen a difference this big or bigger in only 3.8% of all samples. (That is, if chance alone were at work, results at least this extreme would occur only 3.8% of the time. This is not the probability that the null hypothesis is true.)

Consequences of Our Decisions – Type I & Type II Errors

The first thing to note is that for any hypothesis test, there are four possible outcomes, two of which are correct and two of which are incorrect.

                        Actual Truth
Decision                H0 is true          H1 is true
Reject H0               Type I Error        Correct decision
Fail to Reject H0       Correct decision    Type II Error

Type I Error: We reject H0 when it is true.

Type II Error: We fail to reject H0 when it is false.

Probability of a Type I Error: The probability that the test statistic falls in the critical region when the null hypothesis is true.
Notation: α

Probability of a Type II Error: The probability that the test statistic does not fall in the critical region when the null hypothesis is false.
Notation: β

Power – the probability (1 – β) of rejecting a false null hypothesis (See pages 331-333 and exercise 43 for more details.)

Implications of Rejecting or Failing to Reject the Null Hypothesis

If a test statistic falls in the critical region, it does not prove that the null hypothesis is false. Instead, it indicates that we have strong evidence to believe it is not true. When the test statistic falls in the critical region, there are two possibilities.
1) The null hypothesis really is false.
2) By bad luck, we have observed a very unlikely event in our sample.

Similarly, if the test statistic does not fall in the critical region, it does not prove that the null hypothesis is true. Instead, it indicates that our evidence is not strong enough to reject the null hypothesis. (This is the reason we do not want to say that we accept the null hypothesis.)

Since we assume that the null hypothesis is true in the beginning, it takes strong evidence from the data to reject it. Usually we will choose (or be given) a small α, such as .05 or .01. By choosing α small, we can guarantee that our chance of making a Type I Error is small.

Statistical significance is NOT the same as practical importance. With a small sample, we are unlikely to reject the null hypothesis unless the true difference is large. As the sample size increases, it becomes more likely that we will reject the null hypothesis, even when the true difference is small. Hence, if a very large sample is used, we may reject the null hypothesis, thus reporting that our test statistic is statistically significant (meaning that it fell in the critical region), even if the difference is not of any practical importance.

Section 7.3 – Testing a Claim About a Proportion

Requirements
1. The data was gathered by using a simple random sampling method.
2. The conditions for the binomial distribution are satisfied. (Independent trials and each trial has two possible outcomes.)
3. Both np and nq are greater than or equal to 5.
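The proportion test under these requirements can be sketched in a few lines of Python. This is only an illustration, not a procedure from the textbook: the function name `one_proportion_z_test`, the `tail` argument, and the data (58 successes in 100 trials, H0: p = 0.5, H1: p > 0.5, α = .05) are all assumptions made up for the example. The P-value comes from the standard normal distribution via `statistics.NormalDist` in the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, tail="two"):
    """Return (z, p_value) for a test of H0: p = p0.

    x    -- number of successes in the sample (hypothetical input)
    n    -- sample size
    p0   -- value of p claimed by the null hypothesis
    tail -- "left", "right", or "two", matching the direction of H1
    """
    q0 = 1 - p0
    # Requirement 3: both np and nq must be at least 5.
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("np and nq must both be at least 5")

    p_hat = x / n                          # sample proportion
    z = (p_hat - p0) / sqrt(p0 * q0 / n)   # test statistic, assuming H0 is true

    std_normal = NormalDist()              # mean 0, standard deviation 1
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value


# Made-up data: 58 successes in 100 trials, H0: p = 0.5, H1: p > 0.5.
z, p_value = one_proportion_z_test(58, 100, 0.5, tail="right")
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, P-value = {p_value:.4f}, decision: {decision}")
```

With these made-up numbers the test statistic is about 1.6 and the P-value is a little above .05, so at α = .05 we fail to reject H0. A reader using a larger α could decide differently, which is why reporting the P-value itself is useful.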
Examples – Exercises 1, 5, 7, 11, and 17

Section 7.4 – Testing a Claim About a Mean: σ Known

Requirements
1. The data was gathered by using a simple random sampling method.
2. The value of the population standard deviation, σ, is known.
3. Either the population is already normal or n ≥ 30 so the Central Limit Theorem can be applied.

Examples – Exercises 5, 7, 9, and 11

Section 7.5 – Testing a Claim About a Mean: σ Not Known

Requirements
1. The data was gathered by using a simple random sampling method.
2. The value of the population standard deviation, σ, is not known.
3. Either the population is already normal or n ≥ 30 so the Central Limit Theorem can be applied.

Examples – Exercises 1, 2, 3, 4, 5, 7, 9, 13, 15, and 17

Section 7.6 – Testing a Claim About a Standard Deviation or Variance

Requirements
1. The data was gathered by using a simple random sampling method.
2. The population has a normal distribution.

Test Statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$ with d.f. = n – 1

Examples – Exercises 1 and 5
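As a rough illustration of the Section 7.6 test statistic, the sketch below computes χ² = (n – 1)s²/σ² and its degrees of freedom from a sample. The function name `chi_square_statistic`, the sample values, and the claimed σ = 2 are hypothetical and chosen only to keep the arithmetic easy to follow. The sketch stops at the test statistic; the critical value or P-value would still come from a chi-square table or statistical software.

```python
from statistics import variance


def chi_square_statistic(sample, sigma0):
    """Return (chi2, df) for a test of H0: sigma = sigma0.

    sample -- list of observations (assumed to come from a normal population)
    sigma0 -- standard deviation claimed by the null hypothesis
    """
    n = len(sample)
    s_squared = variance(sample)              # sample variance, divides by n - 1
    chi2 = (n - 1) * s_squared / sigma0 ** 2  # test statistic
    df = n - 1                                # degrees of freedom
    return chi2, df


# Made-up sample of n = 8 observations, testing H0: sigma = 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
chi2, df = chi_square_statistic(sample, sigma0=2)
print(f"chi-square = {chi2:.2f} with d.f. = {df}")
```

Here the sample mean is 5 and the squared deviations sum to 32, so (n – 1)s² = 32 and the test statistic is 32/4 = 8 with 7 degrees of freedom.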