Author: Brenda Gunderson, Ph.D., 2012 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution-NonCommercial-Share Alike 3.0 Unported License: http://creativecommons.org/licenses/by-nc-sa/3.0/ The University of Michigan Open.Michigan initiative has reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The attribution key provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarification regarding the use of content. For more information about how to attribute these materials visit: http://open.umich.edu/education/about/terms-of-use. Some materials are used with permission from the copyright holders. You may need to obtain new permission to use those materials for other uses. This includes all content from: Mind on Statistics Utts/Heckard, 4th Edition, Cengage L, 2012 Text Only: ISBN 9781285135984 Bundled version: ISBN 9780538733489 SPSS and its associated programs are trademarks of SPSS Inc. for its proprietary computer software. Other product names mentioned in this resource are used for identification purposes only and may be trademarks of their respective companies. Attribution Key For more information see: http:://open.umich.edu/wiki/AttributionPolicy Content the copyright holder, author, or law permits you to use, share and adapt: Creative Commons Attribution-NonCommercial-Share Alike License Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain. Make Your Own Assessment Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. Public Domain – Ineligible. WOrkds that are ineligible for copyright protection in the U.S. (17 USC §102(b)) *laws in your jurisdiction may differ. Content Open.Michigan has used under a Fair Use determination Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act (17 USC § 107) *laws in your jurisdiction may differ. Our determination DOES NOT mean that all uses of this third-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair. To use t his content you should conduct your own independent analysis to determine whether or not your use will be Fair. Module 5: One-Sample t-Test Procedures Objective: In this module, you will learn an important statistical technique that will allow you to answer the question, “Was our observation due to chance, or is it more significant?” The objective is to guide you through the ideas behind tests of statistical significance and the statistical language involved. This module first presents a general overview of testing. Then, the activities discuss the one-sample t-test for a population mean, as well as providing practice in answering the question of whether an event can be attributed to chance. Overview: Hypothesis Tests: A test of hypotheses (or significance test) is a procedure designed to assess what the evidence provided by the data says about some statement about a population parameter. The key elements of a statistical test include: a null and alternative hypothesis, assumptions that must be checked, a test statistic and its p-value, a decision, and a conclusion. These components depend on the distribution of the population from which the data come as well as the question being studied. Hypothesis Testing Steps: 1. Determine appropriate null and alternative hypotheses. 2. Check assumptions for performing the test and calculate the test statistic. 3. Calculate the p-value under the assumption the null hypothesis is true. 4. Determine if the result is statistically significant. 5. Report a conclusion in the context of the problem. The first step of a hypothesis test is to identify the hypotheses; this step is crucial, as it dictates the procedures for the remainder of the test. The null hypothesis, H0, represents the status quo or statement of no effect. That is, it is the model from which the experimenter would like to show a deviation. The alternative hypothesis, Ha (or sometimes H1), represents the experimenter's new model, or what the experimenter would like to show. In both instances, the hypotheses are postulated about the population parameter of interest. The alternative hypothesis can take three different forms – it may be a denial of the null hypothesis (uses ; called a two-sided test), or it may specify a direction of interest (uses > or <; called a one-sided test). Notice that we never test for equality in Ha. 71 The purpose of a significance test is to assess whether or not the observed data are consistent with the null hypothesis (within the reasonable bounds of sampling variability). If the data seem unlikely to occur if the null hypothesis is assumed true, then we would reject the statement made in the null hypothesis. To help us make this decision, we use a test statistic, which represents a summary of the data. More specifically, a test statistic is a random variable related to the hypotheses of interest that has a known probability distribution (under the null hypothesis), and will be examined for evidence in favor of or against H0. In hypothesis testing, another frequently reported value is the p-value, a number that is used to indicate the degree of significance of the data. The p-value is the probability of getting a test statistic as extreme or more extreme than the observed value of the test statistic, assuming the null hypothesis is true. Here “extreme” means in the direction of Ha, or providing more evidence against H0 . We must decide in advance how much evidence against H0 we will require for rejection. This designated amount of evidence is called the level of significance, denoted by α (alpha). Common values of α are 0.01, 0.05, and 0.10. If the p-value is less than or equal to α, we make the decision to reject H0. If we reject the null hypothesis, the results of the test are said to be statistically significant at level α. A “significant” result in the statistical sense does not necessarily imply an “important” result in the practical sense. It simply means that such a difference from the null hypothesis is not likely to happen just by chance. Of course, no procedure is perfect, and as such, there are two types of errors possible during hypothesis testing. If the null hypothesis is true but the decision is to reject H0, then we say that a Type I error has occurred. A Type II error occurs when the alternative hypothesis is true, but we fail to reject H0. Each type of error has a probability of occurring. If the null hypothesis is true, the level of significance, α, is also the probability of a Type I error, while the probability of a Type II error is denoted by β. Another important component of a hypothesis test is its power. The power of a test measures its ability to detect an alternative hypothesis when it is true. Power of a particular test is calculated as the probability that the test will reject H0 when the alternative hypothesis is true. Since we just learned that β is the probability that we did NOT reject H0 when the Ha is true, we can see that power is represented by 1 – β. 72 Result Associated Probability Reject H0 Type I Error α Do Not Reject H0 Correct Decision 1–α Reject H0 Correct Decision 1 – β = power Do Not Reject H0 Type II Error β Truth Decision Made H0 True Ha True A Few Notes: 1. In practice we want to protect the status quo, so we are usually most concerned with Type I error. We set Type I error at a constant value by fixing the significance level α. 2. Most tests we describe have the smallest for given α. 3. For a fixed sample size n, there is a tradeoff between α and – decreasing one type of error increases the other. Ideally, we want the probabilities of making a mistake to be small. However, once a decision is made, it is either right or wrong – only prior to looking at the data can we talk about probabilities of making these two types of errors. 73 One-Sample t-Tests: A one-sample t-test is used to test whether the mean of a quantitative variable is significantly different from some value. This value, the test value or μ0 , is given in the null hypothesis (H0: μ = μ0), and is often taken to be zero (i.e., H0: μ = 0). The test relies on two key assumptions: (1) the data is a random sample and (2) the data are observations from a normally distributed population. The Central Limit Theorem (see Module 4) allows the assumption of normality to be more relaxed if our sample size, n, is large. (Generally, “large” means more than 30 observations, although it also depends somewhat on how serious the data depart from normality.) In Module 4 (CLT) you worked with an applet that demonstrated the CLT using a large sample size of 25. Thus we will accept at least 25 as large enough for inference about means. Once the hypotheses are stated, we carry out the test by first calculating the test statistic, = x̅−µ0 𝑠/√𝑛 . The test statistic tells us how many standard errors the sample mean, x̅, is from the test value, μ0 . To summarize the statistical importance of this distance, we calculate the p-value using the distribution of the test statistic. Our test statistic, t, has a t-distribution with n-1 degrees of freedom (df), which is written as t(n-1). Formula Card: 74 Activity 1: Is There Salmonella enteritidis in the Ice Cream? Background: A massive multi-state outbreak of food-borne illness was attributed to Salmonella enteritidis. Epidemiologists determined that the source of the illness was ice cream. To determine the level of Salmonella enteritidis in the ice cream, they sampled nine production runs from the company that produced the ice cream. The levels of Salmonella enteritidis in Most Probable Number/Gram (MPN/g) are given in the salmonella.sav data set (Source Lyman Ott et. al., pg 232). The researchers would like to use these data to determine whether the average level of Salmonella enteritidis in the ice cream is greater than 0.3 MPN/g, since such levels are considered very dangerous using a 5% significance level. Time ordering of the data is present since the sample is collected from production runs. Task: Perform a test to assess if the average level of Salmonella enteritidis in the ice cream is significantly larger than 0.3 MPN/g. Recall: Write out the Five Steps for conducting a test of hypotheses (reference page 53). 1. 2. 3. 4. 5. Clarify: Before conducting any test, here are a set of questions to ask yourself: How many populations are there? One Two How many variables are there? Two What is the response variable? What type of variable is the response? What type of parameter would be useful for summarizing this response? (See Supplement 3) Proportion Mean Other One 75 Categorical More than two Quantitative Procedure: Use your answers to these questions, to guide you to the appropriate inference procedure. You may refer back to Supplement 3: Name that Scenario for assistance. The appropriate inference procedure for this scenario is __________________________________________________________ , and the specific parameter of interest is ___________________ . Hypothesis Test: You can now implement the Five Steps for conducting a test of hypotheses. 1. State the Hypotheses: H0: ___________ = ___________ and Ha: ___________ ___________ , where __________ represents: Remember: Your hypotheses and parameter definition should always be a statement about the population(s) under study. 2. Checking the Assumptions and Computing the Test Statistic Assumptions: a. For this scenario, we need to assume that the data are a ________________ sample. To check this assumption, we would make a ________ plot (if there was time order) of the observations, and look for ____________________ . b. We also need to assume that the responses come from a __________________ distributed ___________________ . To check this assumption, we would make a __________ plot. c. Comment on each assumption below, using graphs and output when appropriate. 76 d. Is the assumption of normality really that important with a sample of size of 9? Why? If the sample size were larger, would this assumption be as important? Why? Test-Statistic: e. Generate the t-test output using Analyze> Compare Means> One-Sample T Test. Enter a test value of __________ (this is the value you are testing against from the null hypothesis). f. What is the value of the test statistic? ____ = ____________ g. Provide an interpretation of the test statistic value. (To guide you, see Supplement 6.) h. What is the distribution of the test statistic if the null hypothesis is true? Note: This is not the same as the distribution of the population that the data were drawn from, and will be the model used to find the p-value. 3. Calculate the p-Value: 1. What is the SPSS reported p-value? _____________ . Is this the p-value we want? ________ b. Draw a picture of the desired p-value. distribution and x-axis. Include labels for both the c. Ultimately, the p-value we want is _____________ d. Provide an interpretation of the p-value (see Supplement 6). 77 4. Decision: What is your decision at a 5% significance level? Reject H0 Remember: Reject H0 Fail to Reject H0 Fail to Reject H0 Results statistically significant Results not statistically significant 5. Conclusion: What is your conclusion in the context of the problem? Note: Conclusions should always include a reference to the population parameter of interest. Be careful that your conclusion is not too strong; you can say that you have sufficient evidence or something equivalent, but do NOT say that we have proven anything. 78 Activity 2: Testing the Population Mean with p-Value Practice For each of the following sets of hypotheses and test statistic values, provide a sketch of the p-value. You can assume the degrees of freedom are 20 throughout. Then use your computer with the pval() function in R (accessible in its own folder under “Lab Info” folder, which is in the “Resources” folder on the Stats 250 CTools site), your calculator (if it has the capabilities), or the partial t distribution table provided below, and compute the p-value (or bounds for the p-value). Finally consider H0 at a 5% significance level, and circle the corresponding statistical decision. df 20 1.28 0.108 1.50 0.075 1. H0: μ = 20, Ha: μ > 20, Reject H0 Absolute Value of t-Statistic 1.65 1.80 2.00 2.33 0.057 0.043 0.030 0.015 t = 2.58 Do Not Reject H0 Sketch: 2. H0: μ = 6, Reject H0 Ha: μ > 6, t = 2.12 Do Not Reject H0 Sketch: 79 2.58 0.009 3.00 0.004 3. H0: μ = 0, Reject H0 Ha: μ < 0, t = -3.20 Do Not Reject H0 Sketch: 4. H0: μ = 6, Reject H0 Ha: μ ≠ 6, t = -1.87 Do Not Reject H0 Sketch: Check Your Understanding: Note that while the alternative hypothesis and significance level must be set in advance, we can think about what would have happened if the alternative had originally been two-sided. Complete the following to understand how the analysis of the ice cream data in Activity 1 would change assuming the alternative was twosided. How would the following change? a. Test statistic: -2.205 0 2.205 b. Distribution of thee test statistic assuming H0 is true: t(8) t(9) t(16) c. Interpretation of the value of the test statistic: d. p-value: 0.0295 0.059 1-0.0259 e. Decision at a 5% significance level: Reject H0 1-0.059 Fail to Reject H0 Think About It: If you were going to repeat the Salmonella study and wanted to increase the power of the test, which of the following could you do? Increase alpha Increase beta Increase the sample size, n Decrease the sample size, n 80 Example Exam Question on One-Sample t-Tests The Healthy Life Company produces canned whole pineapples. The production line is supposed to yield cans with an average weight of 1 kg. In order to assess that the production line is operating properly, the quality control statistician at Healthy Life selects a random sample of 36 cans produced over the course of the shift (keeping track of the order number). The latest sample gave a mean weight of 980 grams (1 kg = 1000 grams). Based on this sample, an estimate of the average distance that possible sample means are from the true population mean is about 8.33 grams. a. The value of 8.33 grams described above corresponds to what statistical term or notation? (Be specific.) b. The hypotheses to be tested are H0: = 1000 grams versus Ha: 1000 grams. To examine the data, various graphs were made. One graph is given below. State what assumption can be checked using the graph and comment on the validity of the assumption. Assumption: ____________________________________________. Based on this graph, this assumption appears (circle one): valid not valid because: WEIGHT 1100 1050 1000 950 900 850 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 Order number 81 c. The test is to be performed using a 5% significance level. The observed t-test statistic is t = –2.4 and the corresponding 2-tailed p-value is 0.022. Consider the statements below and clearly circle all those that are correct statements for this statistical analysis. The null hypothesis is rejected at the 5% level. The results are not statistically significant at the 5% level. The probability that the null hypothesis is true is 0.022. If this process were operating as it should and repeated random samples of 36 cans were obtained, we would observe a t statistic of –2.4 or smaller or a t statistic of 2.4 or larger in about 2.2% of the repetitions. d. What would have been the p-value … i. if Ha: < 1000 grams? ii. if Ha: > 1000 grams? e. Compute a 95% CI for the population mean weight. Do you expect your 95% CI to contain 1000? Why? 82