Statistics 101: Section L - Laboratory 6

In today’s lab we are going to look more closely at least squares regression, the interpretation of slopes and intercepts, and how to account for another variable in regression.

Activity 1: From Lab 1 we have data on Total Weight (bag and contents), Net Weight (contents only), and Number of M&Ms. We want to determine the weight of an empty bag and the weight of a single M&M. Below are some simple summary statistics.

                  # bags    mean
  Total Weight     110      21.01 g
  Net Weight       110      20.24 g
  Number           110      23.64

1. Weight of an empty bag?

   (a) From the simple summary statistics, come up with an estimate of the weight of a single empty bag. Explain your reasoning.

   (b) If we have Total Weight as the response variable and Net Weight as the explanatory variable, give an interpretation of the intercept of the line that relates Total Weight to Net Weight.

   The least squares regression equation relating Total Weight to Net Weight is:

       Predicted Total Weight = 2.55 + 0.912*Net Weight

   (c) What is the value of the least squares intercept? Given your answers in (a) and (b), does this value look right? Explain briefly.

   (d) What is the value of the least squares slope? Why does this value seem wrong? Hint: Think of the interpretation of the slope within the context of the problem. What should the value of the slope be?

   (e) If we force the slope to be the correct value, the least squares estimate of the intercept is 0.77. How does this compare with your estimate in (a)?

2. Weight of a single M&M?

   (a) From the simple summary statistics, come up with an estimate of the weight of a single M&M. Explain your reasoning.

   (b) If we have Net Weight as the response variable and Number as the explanatory variable, give an interpretation of the slope of the line that relates Net Weight to Number.

   The least squares regression equation relating Net Weight to Number is:

       Predicted Net Weight = 2.25 + 0.761*Number

   (c) What is the value of the least squares slope?
       Given your answers in (a) and (b), does this value look right? Explain briefly.

   (d) What is the value of the least squares intercept? Why does this value seem wrong? Hint: Think of the interpretation of the intercept within the context of the problem. What should the value of the intercept be?

   (e) If we force the intercept to be the correct value, the least squares estimate of the slope is 0.856. How does this compare with your estimate in (a)?

Activity 2: In this activity, we will look closely at the relationship between salary levels and the number of years employees have been at a particular company. A random sample of 25 employees was selected. Annual salary ($1000s) and years with the company are given below.

  Employee     Salary   Years
  Adams          45       15
  Brown          37       17
  Carter         55       25
  Davis          32       13
  Eaves          35        2
  Ford           40       10
  Green          47       17
  Higgins        35       17
  Irvin          27        1
  Jones          38        4
  King           53       25
  Lane           35       15
  Martin         32        1
  North          38        6
  Owen           39       20
  Pitt           29        3
  Quincy         39       21
  Roberts        48       19
  Smith          29        5
  Turner         32        1
  Underwood      49       20
  Vance          50       22
  Wilson         31       10
  Young          38        7
  Ziegler        40        8

1. Looking at the Plot of Salary By Years, describe the general pattern of the relationship between salary and years with the company. Be sure to mention direction, form, and strength.

2. From the Bivariate Fit of Salary By Years, what is the equation of the least squares regression line? Give the value of the slope and its interpretation. Give the value of the intercept and its interpretation.

3. In terms of the company, what does the slope represent? What does the intercept represent?

4. What is the value of R²? What is the interpretation of this value?

5. Describe the residual plot. Is there anything in the residual plot that would cause you not to use linear regression to predict salary from years with the company?

6. What do you think might explain the pattern in the residual plot?
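The quantities that questions 2 through 5 read off the JMP output (slope, intercept, R², and residuals) can also be computed by hand from the table. The sketch below is one minimal way to do that in plain Python, assuming the salary and years values have been transcribed correctly from the table above. The helper names fit_line and r_squared are illustrative only and are not part of JMP.

```python
# Years with the company and salary ($1000s) for the 25 sampled employees,
# listed in the same order as the table (Adams through Ziegler).
years = [15, 17, 25, 13, 2, 10, 17, 17, 1, 4, 25, 15, 1,
         6, 20, 3, 21, 19, 5, 1, 20, 22, 10, 7, 8]
salary = [45, 37, 55, 32, 35, 40, 47, 35, 27, 38, 53, 35, 32,
          38, 39, 29, 39, 48, 29, 32, 49, 50, 31, 38, 40]


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Fraction of the variation in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2
                   for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


intercept, slope = fit_line(years, salary)

# Residual = observed salary minus predicted salary.
residuals = [s - (intercept + slope * yr) for yr, s in zip(years, salary)]

print(f"Predicted Salary = {intercept:.2f} + {slope:.3f}*Years")
print(f"R^2 = {r_squared(years, salary, intercept, slope):.3f}")
for yr, res in sorted(zip(years, residuals)):
    print(f"Years = {yr:2d}   residual = {res:6.2f}")
```

Listing the residuals in order of Years is a rough text substitute for the residual plot: look for runs of residuals with the same sign, or for two separate bands of points, rather than a patternless scatter around zero.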
One possible factor influencing salary could be the gender of the employee. Here are the genders, salaries, and years with the company of the 25 employees.

  Employee     Salary   Years   Gender
  Adams          45       15      m
  Brown          37       17      f
  Carter         55       25      m
  Davis          32       13      f
  Eaves          35        2      m
  Ford           40       10      m
  Green          47       17      m
  Higgins        35       17      f
  Irvin          27        1      f
  Jones          38        4      m
  King           53       25      m
  Lane           35       15      f
  Martin         32        1      m
  North          38        6      m
  Owen           39       20      f
  Pitt           29        3      f
  Quincy         39       21      f
  Roberts        48       19      m
  Smith          29        5      f
  Turner         32        1      m
  Underwood      49       20      m
  Vance          50       22      m
  Wilson         31       10      f
  Young          38        7      m
  Ziegler        40        8      m

1. On the JMP output, circle the points that correspond to the women. What do you notice about the difference between men and women on the scatterplot?

2. When you get to this point, ask for the JMP output on the Bivariate Fit of Salary By Years for each Gender. What is the regression equation for the women? What is the regression equation for the men?

3. How much of the variation in women’s salaries is explained by the linear relationship with years? How much of the variation in men’s salaries is explained by the linear relationship with years?

4. Compare the slopes and intercepts for the men and women. What differences do you see? What does this mean in terms of starting salaries and raises at the company?

5. What would be the predicted salary for a man with 20 years at this company? For a woman with 20 years at this company?

6. What underlying factors could be contributing to the difference that you see in salaries between men and women?

7. Are the salaries at this company fair to women? What additional information would help you answer this question?
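The separate fits for men and women in questions 2 through 5 can be sketched the same way outside JMP. The example below is a minimal illustration in plain Python, assuming the (years, salary, gender) values have been transcribed correctly from the table above. The names fit_line, by_gender, fits, and predicted_at_20 are hypothetical helpers for this sketch, not JMP features.

```python
# (years, salary in $1000s, gender) for the 25 employees, Adams through Ziegler.
employees = [
    (15, 45, "m"), (17, 37, "f"), (25, 55, "m"), (13, 32, "f"), (2, 35, "m"),
    (10, 40, "m"), (17, 47, "m"), (17, 35, "f"), (1, 27, "f"), (4, 38, "m"),
    (25, 53, "m"), (15, 35, "f"), (1, 32, "m"), (6, 38, "m"), (20, 39, "f"),
    (3, 29, "f"), (21, 39, "f"), (19, 48, "m"), (5, 29, "f"), (1, 32, "m"),
    (20, 49, "m"), (22, 50, "m"), (10, 31, "f"), (7, 38, "m"), (8, 40, "m"),
]


def fit_line(points):
    """Return (intercept, slope, r_squared) for a list of (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For a simple linear fit, R^2 is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Split the sample into one list of (years, salary) points per gender.
by_gender = {"f": [], "m": []}
for yrs, sal, gender in employees:
    by_gender[gender].append((yrs, sal))

# Fit a separate least squares line to each group.
fits = {g: fit_line(pts) for g, pts in by_gender.items()}

# Predicted salary ($1000s) at 20 years with the company, per group.
predicted_at_20 = {g: a + b * 20 for g, (a, b, _) in fits.items()}

for label, g in (("Women", "f"), ("Men", "m")):
    a, b, r2 = fits[g]
    print(f"{label}: Predicted Salary = {a:.2f} + {b:.3f}*Years, "
          f"R^2 = {r2:.3f}, predicted at 20 years = {predicted_at_20[g]:.2f}")
```

Fitting each group separately lets both the intercept and the slope differ between men and women, which is what question 4 asks you to compare. A difference in fitted lines describes this sample only; it does not by itself identify the cause of the difference.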