Author(s): Brenda Gunderson, Ph.D., 2011

License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Non-commercial–Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/

We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material.

Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarification regarding the use of content.

For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.

Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition.

Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.

Some material may be sourced from: Mind on Statistics, Utts/Heckard, 3rd Edition, Duxbury, 2006. Text Only: ISBN 0495667161. Bundled version: ISBN 1111978301. Material from this publication used with permission.

Attribution Key
For more information see: http://open.umich.edu/wiki/AttributionPolicy

Use + Share + Adapt
{ Content the copyright holder, author, or law permits you to use, share and adapt. }
Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105)
Public Domain – Expired: Works that are no longer protected due to an expired copyright term.
Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.
Creative Commons – Zero Waiver
Creative Commons – Attribution License
Creative Commons – Attribution Share Alike License
Creative Commons – Attribution Noncommercial License
Creative Commons – Attribution Noncommercial Share Alike License
GNU – Free Documentation License

Make Your Own Assessment
{ Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. }
Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in your jurisdiction may differ

{ Content Open.Michigan has used under a Fair Use determination. }
Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your jurisdiction may differ

Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair. To use this content you should do your own independent analysis to determine whether or not your use will be Fair.

Module 6: Paired t Procedures

Objectives: In this module you will learn how to construct a confidence interval and perform a paired t-test when two quantitative variables are collected in pairs. You will make a confidence interval for, and test hypotheses about, the population mean difference, μ_D. You will be able to state how confident you are in your interval estimate or in your decision.

Overview: Matched or paired data result from a deliberate experimental design scheme. For example, suppose we are examining the effect of a drug on a certain type of response. The drug is administered to a group of people, and the response of each individual is measured both before and after the drug is given. Or consider an experiment where rats are matched by weight, and then one rat in each matched pair receives a new diet and the other rat receives a control diet. These types of design are called paired data designs.
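For reference, the two procedures named in the objectives can be written compactly. This is a sketch in commonly used notation, not a copy of the course formula card: the symbols d̄ (sample mean of the differences), s_d (sample standard deviation of the differences), n (number of pairs), and t* (the t multiplier for the chosen confidence level) are assumed here.

```latex
% Confidence interval for the population mean difference \mu_D
\bar{d} \;\pm\; t^{*}\,\frac{s_d}{\sqrt{n}}, \qquad \text{df} = n - 1

% Paired t test statistic for H_0 : \mu_D = 0
t \;=\; \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1
```

Both expressions use only the n within-pair differences; the two original measurements enter only through those differences.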
Note that paired designs can occur when you have two measurements on the same individual or when you have two individuals that have been matched or paired prior to administering a treatment. The inference procedures for a paired data design are based on the one sample of differences, and thus the one-sample t procedures from Module 5 can be used. We are interested in estimating or testing hypotheses about the population mean difference μ_D, generally with a hypothesized value of zero, indicating no difference on average.

Formula Card: [image not reproduced]

Activity: Do books purchased from Borders (in-store) cost more on average than if purchased online at Amazon.com?

Background: In recent years, the popularity of purchasing books via the Internet has increased dramatically. The conventional bookstore no longer dominates the sales of books. The most influential factor that sways customers into purchasing books online is lower prices when compared to local bookstores. A group of Statistics 350 students decided to compare Amazon.com prices with Borders bookstore (Ann Arbor) prices based on a sample of 40 books, selected from a wide range of categories. For Amazon, a standard ground shipping charge of $4.29 and a local state tax of 6% were included in the cost. The corresponding costs are available in the SPSS data set called books.sav (Source: Stat 350 group project, 2004). Do the data provide sufficient evidence to conclude that, on average, Borders (in-store) books are more expensive than Amazon.com books?

Task: Perform the appropriate paired t-test regarding the mean difference in book price, μ_D, where the differences are computed as "Borders less Amazon" (i.e., 'price at Borders' minus 'price on Amazon').

Before conducting any test, here is a set of questions to ask yourself:

How many populations are there?
    One    Two    More than two
How many variables are there?
    One    Two
What is the response variable?
What type of variable is the response?
    Categorical    Quantitative
What type of parameter would be useful for summarizing this response?
    Proportion    Mean    Other (see Supplement 3)

Based on the answers to these questions, you should be able to identify the appropriate inference procedure. You may refer back to Supplement 3 – Name that Scenario for assistance.

The appropriate inference procedure for this scenario is ______________________________ and the specific parameter of interest is ___________________ .

NOTE: Why is this a paired procedure?

1. State the hypotheses:
   H0: __________ = __________
   Ha: _______________________
   where _____ represents
   Your parameter definition should always be a statement about the population(s) under study.

2. Assumption Checks and Computing the Test Statistic:

   Assumptions:
   a. For this scenario, we need to assume that the sampled differences are a ________ sample. To check this assumption, we would make a _______ plot (if there were a time order) of the _________________ and look for _____________________________________.
   b. We also need to assume that the ___________________ of differences is normally distributed. To check this assumption, we would make a _____________ plot of the __________________.
   c. We will assume the assumptions are reasonable for this example.

   Test statistic:
   d. Generate the paired t-test output. Use Analyze > Compare Means > Paired-Samples T-Test. Note: If you want a CI, you can use Options to change the confidence level from 95%.
   e. The test value is _____ (this is the null value from the null hypothesis).
   f. What is the value of the test statistic?
   g. What is the distribution of the test statistic if the null hypothesis is true? This is the same as asking what model you use to find the p-value.

3. Calculate the p-value:
   a. What is the SPSS reported p-value? _____________. Is it the p-value we want? _____
   b. Draw a picture of the p-value we want.
   c. So, our p-value is _____________________
   d. Provide an interpretation of the p-value.

4.
Decision: What is your decision at a 5% significance level?
    Reject H0    Fail to reject H0

   Remember:
   Reject H0           ->  Results statistically significant
   Fail to reject H0   ->  Results not statistically significant

5. Conclusion: State your conclusion in the context of the problem.
   Conclusions should not be too strong: say you have sufficient evidence (or equivalent); do NOT say we have proven. Conclusions should always include a reference to the population parameter of interest.

Check Your Understanding:
1. The denominator of the test statistic is the standard error of the sample mean difference. The following two sentences attempt to interpret a standard error. Which one is correct and why?

   "The standard error of the sample mean estimates roughly the average distance of the sample mean from the population mean"

   "The standard error of the sample mean difference estimates roughly the average distance of the observed differences from the population mean differences"

Think About It: What is the connection between the paired t-test procedure and the one-sample t-test procedure from Module 5? How could you carry out this test via the one-sample procedure? Try this and compare your results. Comment on your findings.

Example Exam Question on Paired t-Test

A utilization study was conducted to see how often two rooms of a sports facility were being used during the lunch hour. The number of people in each room was counted at 12:30 noon each Monday for 10 weeks. The results are summarized below.

Week    1 = Dance Studio    2 = Weight Room    D = Difference
 1             16                  11                  5
 2             20                  13                  7
 3              7                   8                 -1
 4             12                  14                 -2
 5             11                  12                 -1
 6             19                  15                  4
 7             23                   9                 14
 8             12                  15                 -3
 9             25                  18                  7
10             16                  10                  6

Suppose you want to test the hypothesis of no difference between the utilization of the two rooms against the alternative that the dance studio is used by more people during the lunch hour on average. You conduct a (matched) paired t-test and enter the above data into SPSS to obtain the following output.
[SPSS Paired Samples Test output: table not recoverable from the source]

a. The observed test statistic is given as t = 2.13. State what this value tells you about the location of the sample mean difference of 3.6.

b. State the appropriate null and alternative hypotheses, and define the parameter of interest.
   H0: _______ _____________________
   Ha: _______ ____________________
   where ______ is ____________________________________________________.

c. Report the p-value for the test in part (b) and the decision using a significance level of 0.10.
   p-value: _____________________
   Decision: (circle)    Reject H0    Fail to Reject H0

d. A decision was made in part (c). Which type of error could have been made? Use the appropriate statistical name to identify the mistake.
   Error:

e. Circle each of the following statements that is an assumption required for performing the paired t-test.
   …the population standard deviation for the difference in room use is known.
   …the numbers using the dance studio are normally distributed.
   …the difference in room use is normally distributed.
   …the standard deviation for the number using the dance studio is equal to the standard deviation for the number using the weight room.
   …the numbers using the dance studio are independent of the numbers using the weight room.
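As a cross-check on the room-utilization example, the test statistic can be recomputed from the weekly counts by treating the ten differences as a single sample, which is the one-sample route suggested in "Think About It". The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the course materials, and the p-value is not computed here because it requires the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics

# Weekly head counts from the exam question's table (weeks 1-10).
dance_studio = [16, 20, 7, 12, 11, 19, 23, 12, 25, 16]
weight_room = [11, 13, 8, 14, 12, 15, 9, 15, 18, 10]

# Differences are "dance studio minus weight room", matching column D.
differences = [d - w for d, w in zip(dance_studio, weight_room)]

n = len(differences)
mean_diff = statistics.mean(differences)   # sample mean difference
sd_diff = statistics.stdev(differences)    # sample SD (n - 1 divisor)
se_diff = sd_diff / math.sqrt(n)           # standard error of the mean difference
null_value = 0                             # hypothesized mean difference under H0
t_stat = (mean_diff - null_value) / se_diff
df = n - 1                                 # degrees of freedom

print(f"mean difference = {mean_diff:.1f}")
print(f"standard error  = {se_diff:.3f}")
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

Run as written, the sketch reproduces the sample mean difference of 3.6 and the test statistic t = 2.13 quoted in part (a), using nothing but the column of differences.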