Uploaded by Margaux Lauren Tan

Lectures #11 and #12 - Additional Practice Solution
(1) A corporation administers an aptitude test to all new sales representatives. Management is
interested in the extent to which this test is able to predict their eventual success. The
accompanying table records average weekly sales (in ten thousands of dollars) and aptitude test
scores for a random sample of five representatives. Use a 5% significance level wherever
appropriate.
Test Score (x)   Weekly Sales (y)   \(x_i-\bar{x}\)   \((x_i-\bar{x})^2\)   \(y_i-\bar{y}\)   \((y_i-\bar{y})^2\)   \((x_i-\bar{x})(y_i-\bar{y})\)
4                1                  -1                1                     -3                9                     3
6                5                   1                1                      1                1                     1
5                4                   0                0                      0                0                     0
6                7                   1                1                      3                9                     3
4                3                  -1                1                     -1                1                     1
Sum: 25          20                  0                4                      0                20                    8

\( \bar{x} = \frac{25}{5} = 5 \), \( \bar{y} = \frac{20}{5} = 4 \)
\( s_y^2 = \frac{\sum (y_i-\bar{y})^2}{n-1} = \frac{20}{4} = 5 \)
\( s_x^2 = \frac{\sum (x_i-\bar{x})^2}{n-1} = \frac{4}{4} = 1 \)
a. Calculate the covariance between test score and weekly sales and the correlation between
test score and weekly sales.
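\( \mathrm{Cov}(X,Y) = s_{xy} = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{n-1} = \frac{8}{4} = 2 \)
\( r = \frac{s_{xy}}{s_x s_y} = \frac{2}{\sqrt{1}\,\sqrt{5}} \approx 0.894 \)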
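The covariance and correlation asked for in part a can be checked with a few lines of plain Python. This is a minimal sketch; the variable names (`scores`, `sales`, `ss_xy`, and so on) are illustrative and not part of the original solution.

```python
import math

# Data from problem (1): test scores (x) and weekly sales (y, in $10,000s)
scores = [4, 6, 5, 6, 4]
sales = [1, 5, 4, 7, 3]
n = len(scores)

x_bar = sum(scores) / n  # 5.0
y_bar = sum(sales) / n   # 4.0

# Sums of squared deviations and of cross-products
ss_x = sum((x - x_bar) ** 2 for x in scores)                           # 4.0
ss_y = sum((y - y_bar) ** 2 for y in sales)                            # 20.0
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(scores, sales))  # 8.0

s_xy = ss_xy / (n - 1)              # sample covariance
r = ss_xy / math.sqrt(ss_x * ss_y)  # sample correlation

print(f"covariance = {s_xy:.3f}, correlation = {r:.4f}")
# covariance = 2.000, correlation = 0.8944
```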
b. Using the least squares method, find the simple linear regression equation to predict
weekly sales from test score.
\( b_1 = \frac{s_{xy}}{s_x^2} = \frac{2}{1} = 2 \)
\( b_0 = \bar{y} - b_1\bar{x} = 4 - (2)(5) = -6 \)
\( \widehat{\text{Sales}} = -6.00 + 2.00\,\text{Score} \)
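The least squares formulas in part b translate directly into code. The helper below, `least_squares`, is a hypothetical name introduced for illustration; it is a sketch of the textbook formulas rather than any particular library's routine.

```python
scores = [4, 6, 5, 6, 4]  # test scores (x)
sales = [1, 5, 4, 7, 3]   # weekly sales (y, in $10,000s)

def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_x        # equals s_xy / s_x^2; the (n - 1) factors cancel
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

b0, b1 = least_squares(scores, sales)
print(b0, b1)  # -6.0 2.0
```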
c. Interpret the regression coefficients.
The sample slope tells us that for each additional point attained on the test, average
weekly sales are estimated to increase by $20,000.
The sample intercept tells us that when the test score is 0, average weekly sales are
estimated to be -$60,000. This is an extrapolation, since the observed test scores range only from 4 to 6.
This study source was downloaded by 100000835462848 from CourseHero.com on 10-29-2021 08:06:42 GMT -05:00
https://www.coursehero.com/file/6961803/Lectures11and12AdditionalPracticeSolutions/
d. Find the coefficient of determination and explain its meaning.
\( R^2 = \frac{SSR}{SST} = \frac{16}{20} = 0.80 \)
\( SST = \sum (y_i-\bar{y})^2 = (n-1)s_y^2 = (4)(5) = 20 \)
\( SSR = b_1^2 \sum (x_i-\bar{x})^2 = b_1^2 (n-1) s_x^2 = (2)^2(4)(1) = 16 \)
We can explain 80% of the variation in weekly sales by relating weekly sales to the test score.
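The sums of squares in part d follow from quantities already computed. The sketch below takes the sample variances and slope from the earlier parts as given; the variable names are illustrative.

```python
n = 5
s2_x, s2_y = 1.0, 5.0  # sample variances of test score and weekly sales
b1 = 2.0               # least squares slope from part b

sst = (n - 1) * s2_y            # total sum of squares: 20.0
ssr = b1 ** 2 * (n - 1) * s2_x  # regression sum of squares: 16.0
sse = sst - ssr                 # error sum of squares: 4.0
r_squared = ssr / sst           # coefficient of determination: 0.8

print(sst, ssr, sse, r_squared)  # 20.0 16.0 4.0 0.8
```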
e. Find the residuals from the regression.
\( e_i = y_i - \hat{y}_i \), where \( \hat{y} = -6.00 + 2.00x \)

Test Score (x)   Weekly Sales (y)   \(\hat{y}\)   \(e = y - \hat{y}\)
4                1                  2             1 - 2 = -1
6                5                  6             5 - 6 = -1
5                4                  4             4 - 4 = 0
6                7                  6             7 - 6 = 1
4                3                  2             3 - 2 = 1

f. Find a point estimate of σε.
\( s_\varepsilon = \sqrt{MSE} = \sqrt{1.333} = 1.1547 \)
\( SSE = SST - SSR = 20 - 16 = 4 \)
Or, \( SSE = \sum e_i^2 = (-1)^2 + (-1)^2 + (0)^2 + (1)^2 + (1)^2 = 4 \)
\( MSE = \frac{SSE}{n-p-1} = \frac{4}{3} = 1.333 \)
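The residuals from part e and the standard error of the estimate from part f can be reproduced as follows. This is a sketch that takes the fitted coefficients from part b as given; the names `fitted`, `residuals`, and `s_eps` are illustrative.

```python
import math

scores = [4, 6, 5, 6, 4]
sales = [1, 5, 4, 7, 3]
b0, b1 = -6.0, 2.0  # fitted coefficients from part b
p = 1               # number of predictors

fitted = [b0 + b1 * x for x in scores]              # 2, 6, 4, 6, 2
residuals = [y - f for y, f in zip(sales, fitted)]  # -1, -1, 0, 1, 1

sse = sum(e ** 2 for e in residuals)  # 4.0
mse = sse / (len(scores) - p - 1)     # 4/3
s_eps = math.sqrt(mse)                # point estimate of sigma_epsilon

print(residuals, round(s_eps, 4))  # [-1.0, -1.0, 0.0, 1.0, 1.0] 1.1547
```

Least squares residuals always sum to zero when the model includes an intercept, which gives a quick sanity check on the table.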
g. What are the required data conditions for the inference procedures we discussed to be
reliable?
\( \varepsilon_i \overset{iid}{\sim} N(0, \sigma) \). The error terms are independent and identically distributed according to the
Normal distribution with a mean of 0 and some constant standard deviation, σ.
h. Find a 95% confidence interval estimate of the population slope. Interpret it.
\( b_1 \pm t_{\alpha/2,\,n-k-1}\, s_{b_1} = 2 \pm 3.182(0.5774) = (0.16, 3.84) \)
\( s_{b_1} = \sqrt{\frac{MSE}{\sum (x_i-\bar{x})^2}} = \sqrt{\frac{MSE}{(n-1)s_x^2}} = \sqrt{\frac{1.333}{4}} = 0.5774 \)
\( t_c = \pm t_{0.025,\,3} = \pm 3.182 \)
We can be 95% confident that each additional point earned on the test is associated with
between $1,600 and $38,400 in additional weekly sales, on average.
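The slope interval in part h can be verified numerically. Python's standard library has no Student t quantile function, so this sketch hard-codes the table value 3.182 used in the solution; the other inputs come from the earlier parts.

```python
import math

mse = 4 / 3     # mean squared error from part f
ss_x = 4.0      # sum of squared deviations of x, (n - 1) * s_x^2
b1 = 2.0        # least squares slope
t_crit = 3.182  # t_{0.025, 3}, table value taken from the solution

se_b1 = math.sqrt(mse / ss_x)  # standard error of the slope
margin = t_crit * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"s_b1 = {se_b1:.4f}, CI = ({lower:.2f}, {upper:.2f})")
# s_b1 = 0.5774, CI = (0.16, 3.84)
```

Because sales are recorded in units of $10,000, the endpoints correspond to $1,600 and $38,400 per additional test point.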
i. What are the hypotheses for the test to determine if test score is a significant predictor of
weekly sales?
H0: β1 = 0
Ha: β1 ≠ 0
j. Find the F test statistic and the F-critical point(s) for the test described in question #i.
What is your conclusion?
\( F_{obs} = \frac{MSR}{MSE} = \frac{16/1}{4/3} = 12 \)
\( F_c = 10.128 \) (α = 0.05, with 1 and 3 degrees of freedom)
[Figure: F distribution with the "Do Not Reject H0" region below Fc = 10.128 and the "Reject H0" region above it, α = 0.05]
Since Fobs > Fc, reject H0 and conclude that using the linear model that relates weekly
sales to test score provides significantly more explanation of the variation in weekly
sales than y-bar does.
k. Find the t-test statistic and the t-critical point(s) for the test described in question #i.
What is your conclusion?
\( t_{obs} = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{2 - 0}{0.5774} = 3.464 \)
\( t_c = \pm t_{0.025,\,3} = \pm 3.182 \)
[Figure: t distribution with rejection regions of area α/2 in each tail, below -3.182 and above 3.182]
Since tobs > 3.182, reject H0 and conclude that using the linear model that relates weekly
sales to test score provides significantly more explanation of the variation in weekly
sales than y-bar does.
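In simple linear regression the F test and the two-sided t test of the slope are equivalent, because the F statistic is the square of the t statistic. The sketch below recomputes both statistics from the sums of squares; the critical values are the table values quoted in the solution, and the variable names are illustrative.

```python
import math

ssr, sse = 16.0, 4.0
n, p = 5, 1
ss_x = 4.0  # sum of squared deviations of x
b1 = 2.0

msr = ssr / p
mse = sse / (n - p - 1)
f_obs = msr / mse  # about 12

se_b1 = math.sqrt(mse / ss_x)
t_obs = (b1 - 0) / se_b1  # about 3.464, testing H0: beta1 = 0

f_crit, t_crit = 10.128, 3.182  # table values for alpha = 0.05
reject_by_f = f_obs > f_crit
reject_by_t = abs(t_obs) > t_crit

print(round(f_obs, 3), round(t_obs, 3), reject_by_f, reject_by_t)
# 12.0 3.464 True True
```

The two tests therefore always reach the same decision when there is a single predictor.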
l. Find 95% confidence interval for E(y) at x = 5.
\( \hat{y} \pm t_{\alpha/2,\,n-k-1}\, s_{\hat{y}} \)
\( \hat{y} = -6 + 2(5) = 4 \)
\( 4 \pm 3.182\sqrt{1.333\left(\frac{1}{5} + \frac{(5-5)^2}{4}\right)} = (2.36, 5.64) \) × $10,000
m. Find 95% prediction interval for y at x = 5.
\( \hat{y} \pm t_{\alpha/2,\,n-k-1}\, s_{\hat{y}} \)
\( 4 \pm 3.182\sqrt{1.333\left(1 + \frac{1}{5} + \frac{(5-5)^2}{4}\right)} = (-0.025, 8.025) \) × $10,000
n. Should the model be used to predict y when x = 10? Explain.
No. The sample data include values of X only in the range 4 to 6, so predicting Y when
X = 10 would be an extrapolation.
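The interval formulas from parts l and m, and the extrapolation concern in part n, can be explored with one small function. This is a sketch under the problem's numbers: `interval_at` is a hypothetical helper, and the t value 3.182 is the table value used in the solution.

```python
import math

def interval_at(x0, predict=False):
    """95% interval at x0 for problem (1).

    Returns the confidence interval for E(y), or the prediction
    interval for a single new y when predict=True.
    """
    b0, b1 = -6.0, 2.0            # fitted line from part b
    n, x_bar, ss_x = 5, 5.0, 4.0  # sample size, mean of x, sum of squares of x
    mse = 4 / 3                   # from part f
    t_crit = 3.182                # t_{0.025, 3}, table value
    y_hat = b0 + b1 * x0
    spread = 1 / n + (x0 - x_bar) ** 2 / ss_x
    if predict:
        spread += 1.0             # extra variance of one new observation
    half_width = t_crit * math.sqrt(mse * spread)
    return y_hat - half_width, y_hat + half_width

ci_5 = interval_at(5)                 # about (2.36, 5.64)
pi_5 = interval_at(5, predict=True)   # about (-0.025, 8.025)
pi_10 = interval_at(10, predict=True)
print(ci_5, pi_5, pi_10)
```

At x = 10 the prediction interval is more than twice as wide as at x = 5, and that widening reflects only sampling uncertainty. It does not account for the possibility that the linear relationship fails outside the observed range of 4 to 6, which is the deeper reason to avoid the extrapolation.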
(2) The Pearson coefficient of correlation r equals 1 when there is no:
a. explained variation
b. unexplained variation (answer). The unexplained variation is based on the residuals.
When r = 1 the relationship is deterministic (all points fall on a straight line), so all
residuals will be 0.
c. y-intercept in the model
d. outliers
(3) In a regression problem, if the coefficient of determination is 0.95, this means that:
a. 95% of the y values are positive
b. 95% of the variation in y can be explained by the variation in x
c. 95% of the x values are equal
d. 95% of the variation in x can be explained by the variation in y
(4) In simple linear regression, which of the following statements indicate no linear relationship
between the variables x and y?
a. Coefficient of determination is 1.0
b. Coefficient of correlation is 0.0
c. Sum of squares for error is 0.0
d. Sum of squares for regression is relatively large
(5) A scatter diagram includes the following data points:
x   3   2   5    4    5
y   8   6   12   10   14

Two regression models are proposed: (1) \( \hat{y} = 1.2 + 2.5x \), and (2) \( \hat{y} = 3 + 2.0x \). Using the least
squares method, which of these regression models provides the better fit to the data? Why?
The better equation is (1). It is the one that results in the lower SSE. Find the residuals using
both equations, square them, and sum the squared residuals: SSE = 4.95 for model (1) versus SSE = 5.00 for model (2).
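The comparison of the two candidate lines can be carried out as follows. This sketch assumes the flattened data table pairs up as (3, 8), (2, 6), (5, 12), (4, 10), (5, 14) and that the models read ŷ = 1.2 + 2.5x and ŷ = 3 + 2.0x; the function name `sse` is illustrative.

```python
xs = [3, 2, 5, 4, 5]
ys = [8, 6, 12, 10, 14]

def sse(b0, b1):
    """Sum of squared residuals for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

sse_1 = sse(1.2, 2.5)  # model (1)
sse_2 = sse(3.0, 2.0)  # model (2)

print(round(sse_1, 2), round(sse_2, 2))  # 4.95 5.0
better = 1 if sse_1 < sse_2 else 2
```

Under these assumptions the two values are close, so the comparison depends on computing the residuals carefully rather than judging by eye.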
(6) A simple linear regression between X and Y using 25 observations has SSR = 100.
The sample variance for Y is 6.25. Construct the Regression ANOVA table.
Source of Variation   SS    df               MS       F
Regression            100   p = 1            100      46
Error                 50    n - p - 1 = 23   2.1739
Total                 150   n - 1 = 24

\( SST = (n-1)s_y^2 = (24)(6.25) = 150 \)
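Every entry of the ANOVA table follows from the three numbers given in the problem. The sketch below fills in the table step by step; the variable names are illustrative.

```python
n, p = 25, 1  # observations and predictors
ssr = 100.0   # regression sum of squares (given)
s2_y = 6.25   # sample variance of Y (given)

sst = (n - 1) * s2_y  # 150.0
sse = sst - ssr       # 50.0
df_reg, df_err, df_tot = p, n - p - 1, n - 1
msr = ssr / df_reg    # 100.0
mse = sse / df_err    # about 2.1739
f_ratio = msr / mse   # about 46

print(sst, sse, df_err, round(mse, 4), round(f_ratio, 2))
# 150.0 50.0 23 2.1739 46.0
```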
(7) Below are the (partial) simple linear regression results for the 12 observations of monthly
advertising (in $100’s) and sales (in $1000’s) values given below.
Sales         18   40   88   50   82   53   36   18   86   32   61   63
Advertising    8   20   30   20   32   21   18   10   32   13   23   28

Covariances: Sales, Advertising
              Sales      Advertising
Sales         606.3864
Advertising   196.3864   67.2955

a) Fill in the missing information. (Note: This should not require a great deal of
calculation.)

Summary
Multiple R                              R-Square   Adjusted R-Square   StErr of Estimate
\( \sqrt{0.9451} \approx 0.9722 \)      0.9451     0.9396              6.050251124

ANOVA Table
              Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio    p-Value
Explained     p = 1                6304.195         6304.195          172.2197   < 0.0001
Unexplained   n - p - 1 = 10       366.055          36.6055

Regression Table
              Coefficient     Standard Error   t-Value   p-Value    Lower 95%   Upper 95%
Constant      -9.763255657    5.0378           -1.9380   0.0814     -20.9884    1.4619
Advertising   2.918270854     0.22237          13.123    < 0.0001   2.4228      3.4138

\( MSE = s^2 = (6.0502)^2 = 36.6055 \)
\( SSE = (df_E)(MSE) = (10)(36.6055) = 366.055 \)
\( s_y^2 = 606.3864 \), so \( SST = \sum (y_i-\bar{y})^2 = (12-1)(606.3864) = 6670.25 \)
\( SSR = SST - SSE = 6670.25 - 366.055 = 6304.195 \)
Use the relationship \( t_{obs} = \frac{b_j - \beta_j}{s_{b_j}} \) to find \( s_{b_0} \) and the t-value for \( b_1 \).
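The chain of relationships used to fill in the output can be written as a short script. This is a sketch that starts from the values printed in the partial output (standard error of estimate, variance of sales, coefficients, one t-value, and one standard error) and derives the rest; the variable names are illustrative.

```python
import math

n, p = 12, 1
s = 6.050251124                     # StErr of Estimate (given)
s2_y = 606.3864                     # sample variance of Sales (given)
b0, b1 = -9.763255657, 2.918270854  # coefficients (given)
t_b0 = -1.9380                      # t-value for the constant (given)
se_b1 = 0.22237                     # standard error of the slope (given)

mse = s ** 2                 # about 36.6055
sse = (n - p - 1) * mse      # about 366.055
sst = (n - 1) * s2_y         # about 6670.25
ssr = sst - sse              # about 6304.195
r_square = ssr / sst         # about 0.9451
multiple_r = math.sqrt(r_square)
f_ratio = (ssr / p) / mse    # about 172.22

# Under H0: beta_j = 0, t = b_j / s_bj, so each missing piece can be recovered
se_b0 = b0 / t_b0            # about 5.0378
t_b1 = b1 / se_b1            # about 13.123

print(round(mse, 4), round(ssr, 3), round(r_square, 4), round(multiple_r, 4))
```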
b) What is the least squares equation to predict sales given advertising?
\( \widehat{\text{Sales}} = -9.763 + 2.918\,\text{AD} \)
c) Discuss how strong this model is.
The model is fairly strong. The coefficient of determination is R2 = 0.9451, which tells us
that 94.51% of the variability in sales values can be explained by the
amount of advertising. The standard error of the estimate is s = 6.05 ($1000's). This
must be considered in the context of the sales values, which range from 18 to 88 with
an average of 52.25. While s = 6.05 is not excessively large in this context, it isn't small
either.
d) What is the 95% confidence interval estimate of the population slope?
This can be read directly from the StatTools output: (2.4228, 3.4138).
It is calculated as \( b_1 \pm (t^*)\,s_{b_1} = 2.91827 \pm (2.228)(0.22237) \).
e) What does this model predict will happen to sales if advertising is decreased by $1000?
A decrease in advertising of $1000 is a decrease of 10 units of advertising. The model predicts
that sales increase by 2.9183 units of y for each additional unit of advertising. Therefore, a 10 unit
decrease in advertising is predicted to be accompanied by a 29.183 ($1000's) = $29,183
decrease in sales.
f) What does this model predict will happen to sales if advertising is increased by $2000?
Since the model is linear, every 1 unit change in x is accompanied by a 2.9183 unit change in y (in
the same direction). Therefore, a 20 unit increase in advertising is predicted to be accompanied by a
2.9183 × 20 = 58.366 unit ($1000's) = $58,366 increase in sales.
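The only subtlety in these two predictions is the unit conversion: advertising is measured in $100's and sales in $1000's. The sketch below makes that explicit; `predicted_sales_change` is a hypothetical helper, and the slope is the rounded value 2.9183 used in the solution text.

```python
SLOPE = 2.9183     # change in sales units per unit of advertising (rounded)
AD_UNIT = 100      # one advertising unit = $100
SALES_UNIT = 1000  # one sales unit = $1000

def predicted_sales_change(ad_change_dollars):
    """Predicted change in sales (dollars) for a change in advertising (dollars)."""
    ad_units = ad_change_dollars / AD_UNIT
    return SLOPE * ad_units * SALES_UNIT

decrease = predicted_sales_change(-1000)  # about -29,183
increase = predicted_sales_change(2000)   # about 58,366
print(round(decrease), round(increase))   # -29183 58366
```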
(8) For the following set of residual plots, discuss whether or not all of the assumptions required
for inference in regression have been satisfied.