
KING FAHD UNIVERSITY OF PETROLEUM & MINERALS
DEPARTMENT OF MATHEMATICS and STATISTICS
DHAHRAN, SAUDI ARABIA
STAT212: BUSINESS STATISTICS II (091)
Quiz #5, Jan 03, 2010.
Time: two days
Instructor: Dr Abdulkadir Hussein;
Student name:_________________________
ID:____________________________
section:____
Q1: A publisher wanted to figure out which variables contribute to the number of
books sold. Volumes sold was taken as the dependent variable Y, and other variables,
such as the number of competing books, were taken as independent variables. Answer
the following questions.
a. From the correlation matrix below, which variables seem to be significantly
correlated with Y? Justify your answer.
Correlations: Volumes Sold Y, Pages X1, Competing Books X2, Advertising Budget X3, Age of Author X4

                    Volumes Sold Y    Pages X1    Competing Books    Advertising Budg
Pages X1                     0.622
                             0.013

Competing Books              0.355       0.501
                             0.194       0.057

Advertising Budg             0.620       0.091              0.384
                             0.014       0.746              0.158

Age of Author X4             0.485      -0.019             -0.113               0.265
                             0.067       0.947              0.687               0.340

Cell Contents: Pearson correlation
               P-Value
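As a rough check on how the P-values in the correlation matrix arise, the sketch below converts each correlation with Y into a t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), and compares it with a tabulated critical value. Two things here are assumptions rather than part of the quiz: the sample size n = 15 (inferred from Total DF = 14 in the later ANOVA table) and the 5% significance level. The names `t_from_r`, `N` and `T_CRIT` are illustrative only.

```python
import math

def t_from_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

N = 15          # assumption: inferred from Total DF = 14 in the ANOVA table
T_CRIT = 2.160  # two-sided 5% critical value of t with 13 df (table value)

# Correlations of each predictor with Volumes Sold Y, copied from the matrix.
r_with_y = {
    "Pages X1": 0.622,
    "Competing Books X2": 0.355,
    "Advertising Budget X3": 0.620,
    "Age of Author X4": 0.485,
}

t_stats = {name: t_from_r(r, N) for name, r in r_with_y.items()}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT]

for name, t in t_stats.items():
    print(f"{name:24s} r = {r_with_y[name]:.3f}  t = {t:.2f}")
print("Significant at the 5% level:", significant)
```

Comparing |t| with the critical value is equivalent to comparing the printed P-value with 0.05, so the same conclusion can be read directly from the second number in each cell.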
Q2: The publisher used the best subsets variable selection method and obtained the
following MINITAB output. Use the output to decide the best set of variables to be used
in a regression model for predicting the volumes sold. Justify your answer.
Best Subsets Regression: Volumes Sold versus Pages X1, Competing Bo, ...

Response is Volumes Sold Y

Predictor columns: X1 = Pages, X2 = Competing Books, X3 = Advertising Budget, X4 = Age of Author

Vars   R-Sq   R-Sq(adj)   Mallows Cp        S   X1  X2  X3  X4
   1   38.6        33.9         28.5   41.593   X
   1   38.5        33.8         28.6   41.644           X
   2   70.7        65.8          9.9   29.930   X       X
   2   63.3        57.2         14.6   33.463   X           X
   3   83.5        79.0          3.6   23.436   X       X   X
   3   74.1        67.1          9.7   29.357   X   X   X
   4   84.5        78.3          5.0   23.850   X   X   X   X
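The columns of a best subsets table are linked by simple formulas, and a short sketch can make the usual selection criteria concrete. The code below recomputes R-Sq(adj) = 1 - (1 - R-Sq)(n - 1)/(n - k - 1) for each row and then looks for the row with the highest R-Sq(adj) and the lowest Mallows Cp. The sample size n = 15 is an assumption (inferred from Total DF = 14 in the later ANOVA table), and the tuple layout and the name `adj_r_sq` are illustrative only.

```python
N = 15  # assumption: inferred from Total DF = 14 in the ANOVA table

# (number of predictors k, R-Sq %, R-Sq(adj) %, Mallows Cp, S), one tuple per row.
rows = [
    (1, 38.6, 33.9, 28.5, 41.593),
    (1, 38.5, 33.8, 28.6, 41.644),
    (2, 70.7, 65.8, 9.9, 29.930),
    (2, 63.3, 57.2, 14.6, 33.463),
    (3, 83.5, 79.0, 3.6, 23.436),
    (3, 74.1, 67.1, 9.7, 29.357),
    (4, 84.5, 78.3, 5.0, 23.850),
]

def adj_r_sq(r_sq_pct, k, n=N):
    """Adjusted R-squared (in percent) from R-squared (in percent)."""
    return 100.0 * (1.0 - (1.0 - r_sq_pct / 100.0) * (n - 1) / (n - k - 1))

# Recomputed values agree with the printed column up to rounding of R-Sq.
recomputed = [round(adj_r_sq(r_sq, k), 1) for k, r_sq, _, _, _ in rows]

best_by_adj = max(range(len(rows)), key=lambda i: rows[i][2])
best_by_cp = min(range(len(rows)), key=lambda i: rows[i][3])

print("Recomputed R-Sq(adj):", recomputed)
print("Row with highest R-Sq(adj):", rows[best_by_adj])
print("Row with lowest Mallows Cp:", rows[best_by_cp])
```

When the two criteria point to the same row, as they do here, that row is the natural candidate; the justification should still mention that it also has the smallest S.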
Q3: Finally, the publisher decided to use all the variables in the regression model.
Use the MINITAB output below to answer the questions that follow.
Regression Analysis: Volumes Sold versus Pages X1, Competing Bo, ...

The regression equation is
Volumes Sold Y = - 125 + 0.176 Pages X1 - 1.57 Competing Books X2
                 + 1.59 Advertising Budget X3 + 1.61 Age of Author X4

Predictor                   Coef   SE Coef       T       P     VIF
Constant                 -125.31     31.08   -4.03   0.002
Pages X1                 0.17590   0.03977    4.42   0.001   1.369
Competing Books X2        -1.574     1.996   -0.79   0.449   1.687
Advertising Budget X3     1.5917    0.4445    3.58   0.005   1.353
Age of Author X4          1.6137    0.6250    2.58   0.027   1.152

S = 23.8497   R-Sq = 84.5%   R-Sq(adj) = 78.3%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         4   30960.3   7740.1   13.61   0.000
Residual Error    10    5688.1    568.8
Total             14   36648.4
Residual Plots for Volumes Sold Y
[Figure: four-panel residual plot. Panels: Normal Probability Plot (Percent vs. Residual); Versus Fits (Residual vs. Fitted Value); Histogram (Frequency vs. Residual); Versus Order (Residual vs. Observation Order, observations 1 to 15).]
a. Test the hypothesis that, overall, the regression model is significant.
Hypotheses:
Critical region:
Test statistic:
Decision and conclusion:
b. Test the hypothesis that the age of the author is significant.
c. Interpret the coefficient of the age of the author.
d. Using the residual plots above, check the regression assumptions of
normality, independence, and constant variance of the regression errors.
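The test statistics requested in parts a and b can be rebuilt from the sums of squares and coefficients in the four-predictor regression output, which is a useful check on the arithmetic. The sketch below computes F = MSR/MSE, R-Sq, R-Sq(adj), S, and the t statistic for Age of Author X4 (coefficient 1.6137, standard error 0.6250). The 5% significance level and the tabulated critical values `F_CRIT` and `T_CRIT` are assumptions, since the quiz does not state a level.

```python
import math

# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_reg, ss_err, ss_tot = 30960.3, 5688.1, 36648.4
df_reg, df_err, df_tot = 4, 10, 14

ms_reg = ss_reg / df_reg            # mean square for regression
ms_err = ss_err / df_err            # mean square error
f_stat = ms_reg / ms_err            # overall F statistic
r_sq = ss_reg / ss_tot              # coefficient of determination
r_sq_adj = 1.0 - ms_err / (ss_tot / df_tot)
s = math.sqrt(ms_err)               # standard error of the estimate

# t statistic for Age of Author X4: coefficient divided by its standard error.
t_age = 1.6137 / 0.6250

F_CRIT = 3.48   # assumption: alpha = 0.05, F table value with (4, 10) df
T_CRIT = 2.228  # assumption: alpha = 0.05 two-sided, t table value with 10 df

print(f"F = {f_stat:.2f}, reject H0 (all slopes zero): {f_stat > F_CRIT}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%, S = {s:.4f}")
print(f"t(Age of Author X4) = {t_age:.2f}, reject H0 (slope zero): {abs(t_age) > T_CRIT}")
```

The recomputed values should match the printed F, R-Sq, R-Sq(adj), S and T entries, and the decisions agree with comparing the printed P-values against 0.05.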
Q4: In the following regression, the independent variables X1 and X2 are
continuous, whereas X3 and X4 are dummy variables and X5 = X1*X4 (that is, an
interaction term between X1 and X4). Answer the following questions.
The regression equation is
y = 9.45 + 2.98 x1 - 0.503 x2 + 1.54 x3 + 2.86 x4 + 2.98 x5

Predictor       Coef   SE Coef        T       P       VIF
Constant       9.454     5.136     1.84   0.069
x1            2.9842    0.2534    11.78   0.000     5.300
x2          -0.50282   0.03741   -13.44   0.000     1.042
x3            1.5412    0.4332     3.56   0.001     1.034
x4             2.855     5.676     0.50   0.616   152.238
x5            2.9770    0.2810    10.59   0.000   153.019

S = 2.01791   R-Sq = 99.5%   R-Sq(adj) = 99.5%

Analysis of Variance

Source            DF      SS      MS         F       P
Regression         5   80021   16004   3930.33   0.000
Residual Error    94     383       4
Total             99   80404
a. Interpret the coefficient of X4.
b. Identify the variables which exhibit multicollinearity.
c. In your opinion, would you suggest removing some of the variables that have a
multicollinearity problem? Why or why not?
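Two relationships behind the last regression output may help when reading it, and the sketch below illustrates both. First, each VIF equals 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors, so R_j^2 = 1 - 1/VIF. Second, because x5 = x1*x4, the fitted difference between x4 = 1 and x4 = 0 depends on x1. The cutoff `VIF_CUTOFF = 10` is a common rule of thumb and an assumption here (some texts use 5), and `x4_effect` is a hypothetical helper name.

```python
# VIF values copied from the regression output.
vifs = {"x1": 5.300, "x2": 1.042, "x3": 1.034, "x4": 152.238, "x5": 153.019}

VIF_CUTOFF = 10.0  # assumption: rule-of-thumb threshold, not given in the quiz

# R_j^2: share of predictor j's variation explained by the other predictors.
r_sq_j = {name: 1.0 - 1.0 / vif for name, vif in vifs.items()}
flagged = [name for name, vif in vifs.items() if vif > VIF_CUTOFF]

def x4_effect(x1):
    """Fitted change in y when x4 goes from 0 to 1, holding x1, x2, x3 fixed."""
    b4, b5 = 2.855, 2.9770  # coefficients of x4 and x5 = x1*x4
    return b4 + b5 * x1

for name in vifs:
    print(f"{name}: VIF = {vifs[name]:8.3f}  R_j^2 = {r_sq_j[name]:.4f}")
print("VIF above cutoff:", flagged)
print("Effect of x4 at x1 = 0:", x4_effect(0.0))
```

A high VIF between a dummy variable and its own interaction term is expected by construction, which is worth keeping in mind when deciding whether either variable should be dropped.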