Statistics 101: Section L - Laboratory 6

In today’s lab we are going to take a closer look at least squares regression, the interpretation of slopes and
intercepts, and how to account for another variable in regression.
Activity 1: From Lab 1 we have data on Total Weight (bag and contents), Net Weight (contents only) and
Number of M&Ms. We want to determine the weight of an empty bag and the weight of a
single M&M. Below are some simple summary statistics.
Variable        # bags    Mean
Total Weight    110       21.01 g
Net Weight      110       20.24 g
Number          110       23.64
1. Weight of an empty bag?
(a) From the simple summary statistics, come up with an estimate of the weight of a single empty
bag. Explain your reasoning.
(b) If we have Total Weight as the response variable and Net Weight as the explanatory variable, give
an interpretation of the intercept of the line that relates Total Weight to Net Weight.
The least squares regression equation relating Total Weight to Net Weight is:
Predicted Total Weight = 2.55 + 0.912*Net Weight
(c) What is the value of the least squares intercept? Given your answers in a) and b), does this value
look right? Explain briefly.
(d) What is the value of the least squares slope? Why does this value seem wrong? Hint: Think of
the interpretation of the slope within the context of the problem. What should the value of the
slope be?
(e) If we force the slope to be the correct value, the least squares estimate of the intercept is 0.77.
How does this compare with your estimate in a)?
2. Weight of a single M&M?
(a) From the simple summary statistics, come up with an estimate of the weight of a single M&M. Explain your
reasoning.
(b) If we have Net Weight as the response variable and Number as the explanatory variable, give an
interpretation of the slope of the line that relates Net Weight to Number.
The least squares regression equation relating Net Weight to Number is:
Predicted Net Weight = 2.25 + 0.761*Number
(c) What is the value of the least squares slope? Given your answers in a) and b), does this value
look right? Explain briefly.
(d) What is the value of the least squares intercept? Why does this value seem wrong? Hint: Think
of the interpretation of the intercept within the context of the problem. What should the value
of the intercept be?
(e) If we force the intercept to be the correct value, the least squares estimate of the slope is 0.856.
How does this compare with your estimate in a)?
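The arithmetic behind parts (a) and (e) of both questions can be checked from the summary means alone. The sketch below is only an illustration and is not part of the lab's JMP output; all variable names are made up for this example. It relies on two facts: a least squares line passes through the point of means, and when the slope is fixed at 1 the least squares intercept equals mean(y) - mean(x). The ratio of means in the last step is just the simple estimate asked for in 2(a); the least squares slope of a line forced through the origin depends on the individual bags, which are not listed in this handout.

```python
# Means taken from the summary table in Activity 1.
mean_total = 21.01   # grams, bag plus contents
mean_net = 20.24     # grams, contents only
mean_number = 23.64  # M&Ms per bag (a count, not a weight)

# A least squares line passes through (mean x, mean y), so plugging the
# mean of x into each fitted equation should return the mean of y.
total_at_mean_net = 2.55 + 0.912 * mean_net
net_at_mean_number = 2.25 + 0.761 * mean_number

# With the slope fixed at 1, minimizing the squared errors gives
# intercept = mean(y) - 1 * mean(x), i.e. the average bag weight.
bag_weight = mean_total - 1.0 * mean_net

# Simple estimate of one M&M's weight: average net weight per candy.
weight_per_candy = mean_net / mean_number

print(f"Total Weight predicted at mean Net Weight: {total_at_mean_net:.2f} g")
print(f"Net Weight predicted at mean Number:       {net_at_mean_number:.2f} g")
print(f"Estimated weight of an empty bag:          {bag_weight:.2f} g")
print(f"Estimated weight of a single M&M:          {weight_per_candy:.3f} g")
```

Comparing the last two printed values with the forced-fit estimates quoted in parts 1(e) and 2(e) is a quick way to check your own answers to parts 1(a) and 2(a).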
Activity 2: In this activity, we will look closely at the relationship between salary levels and the number
of years employees have been at a particular company. A random sample of 25 employees was selected.
Information on annual salary ($1000s) and years with the company are given below.
Employee     Salary   Years
Adams        45       15
Brown        37       17
Carter       55       25
Davis        32       13
Eaves        35       2
Ford         40       10
Green        47       17
Higgins      35       17
Irvin        27       1
Jones        38       4
King         53       25
Lane         35       15
Martin       32       1
North        38       6
Owen         39       20
Pitt         29       3
Quincy       39       21
Roberts      48       19
Smith        29       5
Turner       32       1
Underwood    49       20
Vance        50       22
Wilson       31       10
Young        38       7
Ziegler      40       8
1. Looking at the Plot of Salary By Years, describe the general pattern of the relationship between salary
and years with the company. Be sure to mention direction, form and strength.
2. From the Bivariate Fit of Salary By Years, what is the equation for the least squares regression line?
Give the value of the slope and its interpretation. Give the value of the intercept and its interpretation.
3. In terms of the company, what does the slope represent? What does the intercept represent?
4. What is the value of R²? What is the interpretation of this value?
5. Describe the residual plot. Is there anything in the residual plot that would cause you not to use linear
regression to predict salary from years with the company?
6. What do you think might explain the pattern in the residual plot?
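The lab reads the fitted line, R² and the residual plot from JMP output. For readers without JMP, the following is a minimal sketch of the same calculations in plain Python, assuming the salary and years values from the table above. The function name least_squares and the variable names are hypothetical, chosen only for this example, and the printed numbers should be checked against the JMP output rather than taken as the official answers.

```python
# (years with company, salary in $1000s) for the 25 sampled employees,
# listed in the same order as the table: Adams through Ziegler.
data = [
    (15, 45), (17, 37), (25, 55), (13, 32), (2, 35),
    (10, 40), (17, 47), (17, 35), (1, 27), (4, 38),
    (25, 53), (15, 35), (1, 32), (6, 38), (20, 39),
    (3, 29), (21, 39), (19, 48), (5, 29), (1, 32),
    (20, 49), (22, 50), (10, 31), (7, 38), (8, 40),
]


def least_squares(points):
    """Return (intercept, slope, r_squared) for y regressed on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # In simple linear regression, R^2 is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


intercept, slope, r_squared = least_squares(data)

# Residual = observed salary minus predicted salary.
residuals = [salary - (intercept + slope * years) for years, salary in data]

print(f"Predicted Salary = {intercept:.2f} + {slope:.3f}*Years")
print(f"R^2 = {r_squared:.3f}")
for (years, _), resid in sorted(zip(data, residuals)):
    print(f"Years = {years:2d}   residual = {resid:6.2f}")
```

Listing the residuals in order of years is a rough text substitute for the residual plot: look for runs of same-signed residuals or a spread that changes with years.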
One possible factor influencing salary could be the gender of the employee. Here are the gender, salary, and
years with the company for each of the 25 employees.
Employee     Salary   Years   Gender
Adams        45       15      m
Brown        37       17      f
Carter       55       25      m
Davis        32       13      f
Eaves        35       2       m
Ford         40       10      m
Green        47       17      m
Higgins      35       17      f
Irvin        27       1       f
Jones        38       4       m
King         53       25      m
Lane         35       15      f
Martin       32       1       m
North        38       6       m
Owen         39       20      f
Pitt         29       3       f
Quincy       39       21      f
Roberts      48       19      m
Smith        29       5       f
Turner       32       1       m
Underwood    49       20      m
Vance        50       22      m
Wilson       31       10      f
Young        38       7       m
Ziegler      40       8       m
1. On the JMP output, circle the points that correspond to the women. What do you notice about the
difference between men and women on the scatterplot?
2. When you get to this point, ask for the JMP output on the Bivariate Fit of Salary By Years for each
Gender. What is the regression equation for the women? What is the regression equation for the men?
3. How much of the variation in women’s salaries is explained by the linear relationship with years? How
much of the variation in men’s salaries is explained by the linear relationship with years?
4. Compare the slopes and intercepts for the men and women. What differences do you see? What does
this mean in terms of starting salaries and raises at the company?
5. What would be the predicted salary for a man with 20 years at this company? For a woman with 20
years at this company?
6. What underlying factors could be contributing to the difference that you see in salaries between men
and women?
7. Are the salaries at this company fair to women? What additional information would help you answer
this question?
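The separate fits by gender come from JMP in this lab. As a cross-check, here is a minimal sketch in plain Python that splits the 25 employees by the gender codes in the table above, fits a least squares line to each group, and evaluates each line at 20 years. The helper fit_line and the other names are hypothetical, introduced only for this illustration; compare the printed values with the JMP output before relying on them.

```python
# (years with company, salary in $1000s, gender) in table order,
# Adams through Ziegler.
employees = [
    (15, 45, "m"), (17, 37, "f"), (25, 55, "m"), (13, 32, "f"), (2, 35, "m"),
    (10, 40, "m"), (17, 47, "m"), (17, 35, "f"), (1, 27, "f"), (4, 38, "m"),
    (25, 53, "m"), (15, 35, "f"), (1, 32, "m"), (6, 38, "m"), (20, 39, "f"),
    (3, 29, "f"), (21, 39, "f"), (19, 48, "m"), (5, 29, "f"), (1, 32, "m"),
    (20, 49, "m"), (22, 50, "m"), (10, 31, "f"), (7, 38, "m"), (8, 40, "m"),
]


def fit_line(points):
    """Return (intercept, slope, r_squared) for salary regressed on years."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)


# Split the sample into one list of (years, salary) pairs per gender.
groups = {"m": [], "f": []}
for years, salary, gender in employees:
    groups[gender].append((years, salary))

predictions = {}
for gender, points in groups.items():
    intercept, slope, r_squared = fit_line(points)
    # Predicted salary for an employee with 20 years at the company.
    predictions[gender] = intercept + slope * 20
    print(f"{gender}: n = {len(points)}, "
          f"Predicted Salary = {intercept:.2f} + {slope:.3f}*Years, "
          f"R^2 = {r_squared:.3f}, "
          f"prediction at 20 years = {predictions[gender]:.2f}")
```

In each fitted line the intercept plays the role of a starting salary and the slope the average raise per year, which is the comparison question 4 asks for.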