Chapter 24 – Comparing Means

Boxplot of data

Early in the semester, we used boxplots to compare the distributions of different data sets. We will use them again here when we compare the means of different data sets. A boxplot gives a quick visual sense of what we should be testing and whether there seems to be a difference between the means.

Boxplots Example

Below are figures from the text representing the performance of AA alkaline batteries from a major brand name and a generic competitor.

Figures from DeVeaux, Intro to Stats

Sampling Distribution: Difference Between 2 Sample Means

When the conditions are met, the sampling distribution of the standardized sample difference between the means of 2 independent groups,

\[ t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)} \]

can be modeled by a Student's t-model with a number of degrees of freedom found by a special formula. We estimate the standard error with

\[ SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

Assumptions and Conditions

Independence Assumption
  - Randomization condition
  - 10% condition
Normal Population Assumption
  - Nearly Normal condition (check both groups)
Independent Groups Assumption
  - Can't use related or matched pairs

Two-Sample t-Interval for Difference between Means

When the conditions are met, we are ready to find the confidence interval for the difference between means of two independent groups. The confidence interval is

\[ (\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2) \]

where the standard error of the difference of the means is

\[ SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

The critical value $t^*_{df}$ depends on the particular confidence level, C, that you specify and on the number of degrees of freedom, which we get from the sample sizes and a special formula.

Example: SSHA Test

The Survey of Study Habits and Attitudes (SSHA) was given to male and female first-year students in a selected private school. Most studies suggest that the mean SSHA score for men is lower than that in a comparable group of women. Is this true for first-year students at this college?
Let's explore in Minitab using the dataset: SSHA.mtw

Example: SSHA.MTW

  - Generate histograms of each data set to see if the Nearly Normal condition is met.
  - Generate a boxplot to see if it looks like there is a difference between each gender's performance on the test.
  - Construct a 95% confidence interval using Minitab.
  - State your conclusion.

Two-Sample t-Test for Difference between Means

We test the hypothesis $H_0: \mu_1 - \mu_2 = \Delta_0$, where the hypothesized difference, $\Delta_0$, is almost always 0, using the statistic

\[ t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE(\bar{y}_1 - \bar{y}_2)} \]

The standard error is

\[ SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

When the conditions are met and the null hypothesis is true, this statistic can be closely modeled by a Student's t-model with a number of degrees of freedom given by a special formula. We use that model to obtain a P-value.

Example: Front vs. Back of the Class

Hours spent studying per week are reported by students in a class survey. The 99 students who sat in the front studied an average of 16.4 hours per week with a standard deviation of 10.85 hours. The 94 students who sat in the back studied an average of 10.9 hours per week with a standard deviation of 8.41 hours.

Test the claim that the students in the front of the class studied more on average than the students in the back. Use Minitab to perform a hypothesis test. Make sure to state your hypotheses before beginning the test.

Homework

Chapter 24: 1, 7, 9, 11, 19, 25, 27, 33, 43
Exam 3: Wednesday, 11/30
Project 2: Due Monday, 11/28
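Worked sketch: Front vs. Back of the Class by hand

As a check on the Minitab output for the front-versus-back example, the standard error and t statistic can be computed directly from the summary statistics. The sketch below is one way to do this in Python, not part of the course's Minitab workflow. It takes the hypotheses to be H0: mu_front - mu_back = 0 against HA: mu_front - mu_back > 0. The notes do not spell out the "special formula" for the degrees of freedom; the code assumes it is the Welch-Satterthwaite approximation, and the function name welch_summary is made up for this illustration.

```python
import math

def welch_summary(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Return (se, t, df) for a two-sample t-test from summary statistics.

    Assumption: the degrees of freedom use the Welch-Satterthwaite
    approximation, which the notes refer to only as "a special formula".
    """
    v1 = sd1 ** 2 / n1          # s1^2 / n1
    v2 = sd2 ** 2 / n2          # s2^2 / n2
    se = math.sqrt(v1 + v2)     # standard error of the difference in means
    t = ((mean1 - mean2) - delta0) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t, df

# Group 1 = front of the class, group 2 = back of the class.
se, t, df = welch_summary(16.4, 10.85, 99, 10.9, 8.41, 94)
print(f"SE = {se:.3f}, t = {t:.2f}, df = {df:.1f}")
# prints: SE = 1.393, t = 3.95, df = 183.7
```

The one-sided P-value is the area above this t value under a Student's t-model with about 184 degrees of freedom. The sketch stops short of computing it, since that step needs a t-distribution table or software such as Minitab.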