Author(s): Brenda Gunderson, Ph.D., 2011

License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Non-commercial–Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/

We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarification regarding the use of content. For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.

Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition. Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.

Some material may be sourced from: Mind on Statistics, Utts/Heckard, 3rd Edition, Duxbury, 2006. Text Only: ISBN 0495667161. Bundled version: ISBN 1111978301. Material from this publication used with permission.

Attribution Key (for more information see: http://open.umich.edu/wiki/AttributionPolicy)

Use + Share + Adapt
{ Content the copyright holder, author, or law permits you to use, share and adapt. }
Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105)
Public Domain – Expired: Works that are no longer protected due to an expired copyright term.
Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.
Creative Commons – Zero Waiver
Creative Commons – Attribution License
Creative Commons – Attribution Share Alike License
Creative Commons – Attribution Noncommercial License
Creative Commons – Attribution Noncommercial Share Alike License
GNU – Free Documentation License

Make Your Own Assessment
{ Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. }
Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in your jurisdiction may differ
{ Content Open.Michigan has used under a Fair Use determination. }
Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your jurisdiction may differ
Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair. To use this content you should do your own independent analysis to determine whether or not your use will be Fair.

Module 5: One Sample t-Test Procedures

Objectives: In this module you will learn an important statistical technique that will allow you to answer the question, “Was it due to chance, or is there something else?” The objective is to guide you in the understanding of the ideas behind tests of statistical significance and the statistical language involved. This module first presents a general overview of testing. The activity discusses the one-sample t test for a population mean. Module 6 will cover the paired data scenario for a population mean difference and Module 7 will discuss inference for comparing two population means based on independent random samples.

Overview of testing: A test of hypotheses or significance test is a procedure designed to assess the evidence provided by the data in favor of some statement about a population parameter.
Elements of a statistical test include: a null and alternative hypothesis, assumption checking, a test statistic, a p-value, a decision, and a conclusion. The choice of the test statistic depends on the distribution of the population from which the data come as well as the hypotheses being considered.

The null hypothesis, H0, represents the status quo or statement of no effect. It is generally the model that the experimenter would like to replace. The alternative hypothesis, Ha, usually represents the experimenter's new model, what the experimenter would like to support (some texts will write the alternative as H1). It may be a denial of the null hypothesis (two-sided test, ≠) or it may specify a direction of interest (one-sided test; >, <).

The purpose of significance tests is to assess whether or not the observed data are consistent with the null hypothesis within the reasonable bounds of sampling variability. If the data seem to be unlikely to occur if the null hypothesis is assumed to be true, then we would reject the statement made in the null hypothesis.

The test statistic is a summary of the data that is used to help make the decision. It is a random variable related to the hypotheses of interest having a known probability distribution (under the null hypothesis) and will be examined for evidence for or against H0.

In hypothesis testing, it is often preferable to report the p-value, a number that is used to indicate the degree of significance of the data. The p-value is the probability of getting a test statistic as extreme or more extreme than the observed value of the test statistic, assuming the null hypothesis is true. Here “extreme” means in the direction of Ha or providing more evidence against H0.

Hypothesis Testing Steps:
1. Determine appropriate null and alternative hypotheses.
2. Check assumptions for performing the test and calculate the test statistic.
3. Calculate the p-value under the assumption the null hypothesis is true.
4. Determine if the result is statistically significant.
5. Report a conclusion in the context of the problem.

We must decide in advance how much evidence against H0 we will insist on. This designated amount of evidence is called the level of significance and denoted by α (alpha). Common values of α are 0.01, 0.05, and 0.10. The decision is made to reject H0 if the p-value is less than or equal to α. If we reject the null hypothesis, the results of the test are said to be statistically significant at level α. A “significant” result in the statistical sense does not necessarily imply an “important” result in the practical sense. It means simply that such a difference from the null hypothesis is “not very likely to happen just by chance”.

There are two types of errors that can be made in hypothesis testing. If the null hypothesis is true but the decision is to reject H0, then a Type I error is said to have occurred. However, failing to reject H0 when the alternative hypothesis is true is called a Type II error. If the null hypothesis is true, the level of significance α is also the probability of a Type I error. The probability of a Type II error is denoted by β. The power of a test measures its ability to detect an alternative hypothesis when it is true. Power against a particular alternative is calculated as the probability that the test will reject H0 when the alternative hypothesis is true and thus represented by 1 – β.

Truth      Decision Made    Result
H0 True    Reject H0        Type I Error
H0 True    Not Reject H0    Correct Decision
Ha True    Reject H0        Correct Decision
Ha True    Not Reject H0    Type II Error

A few notes:
1. In practice we want to protect the status quo, so we are most concerned with Type I error. We fix the Type I error rate at a constant value by fixing the significance level α.
2. Most tests we describe have the smallest β for given α.
3. For a fixed sample size n, there is a tradeoff between α and β. Decreasing one type of error increases the other.
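The tradeoff between the two error probabilities can be made concrete with a small calculation. The sketch below is an illustration only and rests on assumptions that are not part of this module: it uses a one-sided z-test with a known population standard deviation (simpler than the t-test discussed later), and the function name `type2_error` and all numeric values are hypothetical.

```python
from statistics import NormalDist


def type2_error(alpha, mu0, mu_alt, sigma, n):
    """Beta for an upper-tailed z-test of H0: mu = mu0 vs Ha: mu > mu0.

    Assumes sigma (the population standard deviation) is known, which is a
    simplification used here only to keep the arithmetic transparent.
    """
    se = sigma / n ** 0.5
    # H0 is rejected when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta = P(sample mean <= cutoff) when the true mean is mu_alt.
    return NormalDist(mu_alt, se).cdf(cutoff)


# Hypothetical setting: null mean 0, true mean 0.5, sigma 1, n = 25.
for alpha in (0.01, 0.05, 0.10):
    beta = type2_error(alpha, mu0=0.0, mu_alt=0.5, sigma=1.0, n=25)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With n held fixed, the printed β should shrink as α grows, which is the tradeoff described in note 3; calling the function with a larger n at the same α shows how a bigger sample lowers β without raising α.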
Ideally we want the probabilities of making a mistake to be small. However, once a decision is made, it is either right or wrong – only prior to looking at the data can we talk about probabilities of making these two types of errors.

Overview of One Sample t-test: The one sample t-test is used to test whether the mean of a quantitative variable is significantly different from some value. This value, the test value, is given in the null hypothesis (H0: µ = µ0) and is often taken to be zero; i.e., H0: µ = 0. The test relies on two key assumptions: (1) the data are a random sample and (2) the data are observations from a normally distributed population. Thanks to the central limit theorem (see Module 3) the assumption of normality can be relaxed if our sample size, n, is large. (Generally, ‘large’ means more than 30 observations, though it depends somewhat on how seriously the data depart from normality.)

To carry out the test we first calculate the test statistic:

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

The test statistic, t, tells us how many standard errors the sample mean, x̄, is from the test value, µ0. To summarize the statistical importance of this distance we can calculate a p-value using the t-distribution with n − 1 degrees of freedom (df). The distribution of the test statistic is t(n − 1) rather than normal because the standard error is computed from the sample standard deviation, s, in place of the unknown population standard deviation; like the sample mean, x̄, s is a random variable (it depends on the data).

Formula card: [image not reproduced]

Activity 1: Is there Salmonella enteritidis in the ice cream?

Background: A massive multistate outbreak of food-borne illness was attributed to Salmonella enteritidis. Epidemiologists determined that the source of the illness was ice cream. They sampled nine production runs from the company that had produced the ice cream to determine the level of Salmonella enteritidis in the ice cream. The levels of Salmonella enteritidis in Most Probable Number/Gram (MPN/g) are given in the Salmonella.sav data set (Source: Lyman Ott et al., pg 232).
The researchers would like to use these data to determine, using a 5% significance level, whether the average level of Salmonella enteritidis in the ice cream is greater than 0.3 MPN/g, since such levels are considered to be very dangerous. Time ordering of the data is present since the sample is collected from production runs.

Task: Perform a test to assess if the average level of Salmonella enteritidis is significantly larger than 0.3 MPN/g.

Recall: Write out the Five Steps for conducting a test of hypotheses (Reference page 51).
1.
2.
3.
4.
5.

Before conducting any test, here is a set of questions to ask yourself:
How many populations are there?    One    Two    More than two
How many variables are there?    One    Two
What is the response variable?
What type of variable is the response?    Categorical    Quantitative
What type of parameter would be useful for summarizing this response?    Proportion    Mean    Other (see Supplement 3)

Based on the answers to these questions, you should be able to identify the appropriate inference procedure. You may refer back to Supplement 3 – Name that Scenario for assistance.

The appropriate inference procedure for this scenario is ______________________________ and the specific parameter of interest is ___________________ .

1. State the hypotheses:
H0: _____ = _____________
Ha: ______ _____________
where _____ represents:
*Your parameter definition should always be a statement about the population(s) under study.

2. Assumption Checks and Computing the Test Statistic
Assumptions:
a. For this scenario, we need to assume that the data are a ___________ sample. To check this assumption, we would make a _______ plot (if there was time order) of the observations and look for _____________________________________.
b. We also need to assume that the responses come from a ______________ distributed _________________ . To check this assumption, we would make a _______ plot.
c. Comment on each plot about whether these assumptions appear satisfied.
[Figure: Normal Q-Q Plot of SAL; horizontal axis: Observed Value, vertical axis: Expected Normal Value]
[Figure: plot of SAL against Sequence number (1 to 9)]

d. Is the assumption of normality really that important with a sample of size 9? Why? If the sample size were larger would this assumption be as important? Why?

Test-statistic:
e. Generate the t-test output. Use Analyze> Compare Means> One-Sample T-Test.
f. The test value is _____ (this is the null value from the null hypothesis).
g. What is the value of the test statistic?
h. Provide an interpretation of the test statistic value. (To guide you, see supplement 6 in your lab module notebook.)
i. What is the distribution of the test statistic if the null hypothesis is true? This is the same as asking what model you use to find the p-value.

3. Calculate the p-value:
a. What is the SPSS reported p-value? _____________. Is it the p-value we want? _____
b. Draw a picture of the p-value we want. Note you should be able to label the distribution based on question 2 (i).
c. So, our p-value is _____________________
d. Provide an interpretation of the p-value (see supplement 6).

4. Decision: What is your decision at a 5% significance level?    Reject H0    Fail to reject H0
Remember:
Reject H0            Results statistically significant
Fail to reject H0    Results not statistically significant

5. Conclusion: What is your conclusion in the context of the problem?
Conclusions should not be too strong; i.e., say you have sufficient evidence or equivalent, do NOT say we have proven. Conclusions should always include a reference to the population parameter of interest.

Activity 2: Testing the Population Mean with p-value practice

For each of the following sets of hypotheses and test statistic values, provide a sketch of the p-value. You can assume the degrees of freedom are 20 throughout.
Then use your computer with the pval() function in R, your calculator (if it has the capabilities), or the provided partial t distribution table and compute the p-value (or bounds for the p-value) and circle the corresponding statistical decision (using a 5% significance level). Note: This partial t distribution table can also be found on your formula card in Table A.3.

        Absolute Value of t-Statistic
Df      1.28     1.50     1.65     1.80     2.00     2.33     2.58     3.00
20      0.108    0.075    0.057    0.043    0.030    0.015    0.009    0.004

(In the four items below, the relational symbol in each Ha, and any sign on the t value, did not survive in the source text and are marked [?].)

1. H0: µ = 20, Ha: µ [?] 20, t = [?]2.58
Sketch:
Reject H0    Do Not Reject H0

2. H0: µ = 6, Ha: µ [?] 6, t = [?]2.12
Sketch:
Reject H0    Do Not Reject H0

3. H0: µd = 0, Ha: µd [?] 0, t = [?]3.20
Sketch:
Reject H0    Do Not Reject H0

4. H0: µ = 6, Ha: µ [?] 6, t = [?]1.87
Sketch:
Reject H0    Do Not Reject H0

Check Your Understanding: Note that while the alternative hypothesis and significance level must be set in advance, we can think about what would have happened if the alternative had originally been two-sided. Complete the following to understand how the analysis of the ice-cream data in Activity 1 would change assuming the alternative was two-sided (instead of one-sided as you did on page 54).

The test statistic would be:    -2.205    0    2.205
The distribution of the test statistic assuming the null hypothesis is true is:    t(8)    t(9)    t(16)
Your interpretation of the value of the test statistic would change to:
The p-value would be:    0.0295    0.059    1-0.0295    1-0.059
Your new decision at a 5% level would be:    Reject H0    Fail to reject H0

If you were going to repeat the Salmonella study and wanted to increase the power of the test, which of the following could you do?
Increase alpha    Increase beta    Increase the sample size, n    Decrease the sample size, n

Example Exam Question on One-Sample t-test

The Healthy Life Company produces canned whole pineapples. The production line is supposed to yield cans with an average weight of 1 kg.
In order to assess whether the production line is operating as it should, the quality control statistician at Healthy Life selects a random sample of 36 cans produced over the course of the shift (keeping track of the order number). The latest sample gave a mean weight of 980 grams (1 kg = 1000 grams). Based on this sample, an estimate of the average distance that possible sample means are from the true population mean is about 8.33 grams.

a. The value of 8.33 grams described above corresponds to what statistical term or notation? (be specific)

b. The hypotheses to be tested are H0: µ = 1000 grams versus Ha: µ ≠ 1000 grams. To examine the data, various graphs were made. One graph is given below. State what assumption can be checked using the graph and comment on the validity of the assumption.

[Figure: plot of WEIGHT (vertical axis, 850 to 1100) against Order number (1 to 35)]

Assumption: ____________________________________________.
Based on this graph, this assumption appears (circle one):    valid    not valid
because:

c. The test is to be performed using a 5% significance level. The observed t-test statistic is t = –2.4 and the corresponding 2-tailed p-value is 0.022. Consider the statements below and clearly circle all those that are correct statements for this statistical analysis.
The null hypothesis is rejected at the 5% level.
The results are not statistically significant at the 5% level.
The probability that the null hypothesis is true is 0.022.
If this process were operating as it should and repeated random samples of 36 cans were obtained, we would observe a t statistic of –2.4 or smaller or a t statistic of 2.4 or larger in about 2.2% of the repetitions.

d. What would have been the p-value …
i. if Ha: µ < 1000 grams?
ii. if Ha: µ > 1000 grams?

e. Compute a 95% CI for the population mean weight. Do you expect your 95% CI to contain 1000? Why?
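As a companion to the activities in this module, the one-sample t computations (test statistic, degrees of freedom, and one- and two-sided p-values) can be sketched in code. This is a minimal illustration under stated assumptions, not the course's pval() function or SPSS output: it uses only the Python standard library, approximates the t-distribution tail area by numerical integration (Simpson's rule), and the names `t_pdf`, `t_upper_tail`, `one_sample_t` and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for T ~ t(df) using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3      # P(0 <= T <= |t|)
    tail = 0.5 - area         # P(T >= |t|), by symmetry about 0
    return tail if t >= 0 else 1 - tail


def one_sample_t(data, mu0):
    """Test statistic and p-values for H0: mu = mu0 (needs n >= 2)."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)   # standard error of the sample mean
    t = (mean(data) - mu0) / se       # standard errors between x-bar and mu0
    df = n - 1
    upper = t_upper_tail(t, df)
    return {
        "t": t,
        "df": df,
        "p_greater": upper,                       # Ha: mu > mu0
        "p_less": 1 - upper,                      # Ha: mu < mu0
        "p_two_sided": 2 * min(upper, 1 - upper)  # Ha: mu != mu0
    }


# Hypothetical sample (not the Salmonella data) tested against mu0 = 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2]
print(one_sample_t(sample, mu0=5.0))
```

As a sanity check, `t_upper_tail(2.00, 20)` should agree, after rounding to three decimals, with the 0.030 entry for |t| = 2.00 in the partial t table of Activity 2. The three p-values are returned together so the effect of the choice of Ha on the p-value, as explored in Check Your Understanding, is visible from a single call.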