hw_02n_sol

Solution - Homework 2
Use the data set HW_2 to complete this assignment and regress Y on X
1. Create boxplots for both X and Y. Are there any outliers?
No outliers identified. See boxplot below.
[Figure: Boxplot of Y, X]
2. Make a Scatterplot with Regression. Does there appear to be a linear relationship?
Yes, there does appear to be a linear relationship.
[Figure: Scatterplot of Y vs X, with Y on the vertical axis and X on the horizontal axis]
3. Check for outliers using the semi-studentized method. Are there any outliers? If so, what are the
absolute values of the semi-studentized residuals?
No. All semi-studentized residuals have an absolute value less than four, and no points
were identified as potential outliers.
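The check above can be sketched in code. A semi-studentized residual is the raw residual divided by the square root of MSE, and a point is flagged when its absolute value reaches four. The following is a minimal sketch that assumes a simple linear regression fitted by least squares; the data values and the names `semistudentized_residuals`, `x_demo`, and `y_demo` are hypothetical and are not the HW_2 data.

```python
import math

def semistudentized_residuals(x, y):
    """Fit Y on X by least squares and return e_i / sqrt(MSE) for each point."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - 2)
    return [e / math.sqrt(mse) for e in resid]

# Hypothetical illustration data (NOT the HW_2 values).
x_demo = [1, 2, 3, 4, 5, 6]
y_demo = [0.011, 0.019, 0.032, 0.038, 0.052, 0.060]

e_star = semistudentized_residuals(x_demo, y_demo)
# Indices of points whose |e*| is four or more (potential outliers).
flagged = [i for i, e in enumerate(e_star) if abs(e) >= 4]
```

With real data, `x_demo` and `y_demo` would be replaced by the X and Y columns of the data set.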
4. Do a check of normality by using a probability plot of the residuals. Include: a) the null and
alternative hypotheses, b) the p-value of the test, c) your decision based on a 0.05 level of
significance, and d) Minitab copy of your plot.
a) Ho: The residuals come from a normal distribution
Ha: The residuals do not come from a normal distribution
b) p-value is 0.033
c) Since p-value is less than 0.05 we reject Ho and will conclude the error terms are NOT
normally distributed
d)
[Figure: Probability Plot of RESI1, Normal - 95% CI. Percent vs RESI1. Mean = -3.59012E-17, StDev = 0.01228, N = 23, AD = 0.797, P-Value = 0.033]
5. Do a check of equal variances by performing a Modified Levene Test, Breusch-Pagan Test,
and White's Test. Include: a) the null and alternative hypotheses, b) the test statistic c) p-value
and DF of the test (the df for the BP and White test only), and d) your decision based on a 0.05
level of significance.
Modified Levene Test
a) Ho: The variances are equal
Ha: The variances are not equal
b) Test statistic = 9.45
c) The p-value is 0.006.
NOTE: Remember that Levene’s test is more robust against violations of normality than
the F-test, making the Levene test a better overall test of equal variances. The only condition
for the Levene test is that the variable being tested is continuous.
d) Since the p-value is less than 0.05 we conclude that the assumption of equal variances
is NOT satisfied.
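For illustration, the modified Levene (Brown-Forsythe) idea can be sketched as follows: split the residuals into two groups (typically low X and high X), take absolute deviations from each group's median, and compare the group means of those deviations with a pooled two-sample t statistic. With two groups, the square of that t statistic is the equivalent F statistic. The source does not state how the groups were formed or which form Minitab reported, so the grouping, the data, and the name `modified_levene` below are assumptions.

```python
import math
import statistics

def modified_levene(group1, group2):
    """Return (t, F) for the two-group modified Levene test, where F = t**2."""
    med1 = statistics.median(group1)
    med2 = statistics.median(group2)
    # Absolute deviations of each residual from its own group median.
    d1 = [abs(v - med1) for v in group1]
    d2 = [abs(v - med2) for v in group2]
    n1, n2 = len(d1), len(d2)
    m1, m2 = sum(d1) / n1, sum(d2) / n2
    # Pooled variance of the absolute deviations.
    pooled = (sum((d - m1) ** 2 for d in d1) +
              sum((d - m2) ** 2 for d in d2)) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, t * t

# Hypothetical residuals for a low-X group and a high-X group (NOT HW_2 data).
low_x_resid = [-1.0, 0.0, 1.0]
high_x_resid = [-4.0, 0.0, 4.0]
t_stat, f_stat = modified_levene(low_x_resid, high_x_resid)
```

The t statistic is referred to a t distribution with n1 + n2 - 2 degrees of freedom, or equivalently the F statistic to an F distribution with 1 and n1 + n2 - 2 degrees of freedom.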
Breusch-Pagan Test
a) Ho: All slopes are equal to zero
Ha: At least one slope differs from 0
b) Test statistic F = 16.38
c) The DF = 1,21 and p-value is 0.001
d) Since the p-value is less than 0.05 we conclude that the assumption of equal variances
is NOT satisfied.
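The hypotheses about slopes and the DF of 1, 21 (with N = 23) are consistent with an auxiliary regression of the squared residuals on X, where the overall F test of that regression serves as the test statistic. The sketch below shows that F form under this assumption; the function names and the data are hypothetical, and Minitab's exact procedure is not shown in the source.

```python
def simple_regression_f(x, y):
    """Overall F statistic and (numerator, denominator) DF for regressing y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssr = sxy ** 2 / sxx                          # regression sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
    sse = sst - ssr                               # error sum of squares
    f = (ssr / 1) / (sse / (n - 2))
    return f, (1, n - 2)

def breusch_pagan_f(x, residuals):
    """F form of the test: regress squared residuals on x and test the slope."""
    squared = [e * e for e in residuals]
    return simple_regression_f(x, squared)

# Hypothetical X values and residuals (NOT the HW_2 data).
x_demo = [1, 2, 3, 4, 5]
resid_demo = [0.1, -0.2, 0.2, -0.4, 0.5]
bp_f, bp_df = breusch_pagan_f(x_demo, resid_demo)
```

A large F (small p-value) indicates that the squared residuals trend with X, which is evidence against constant variance.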
White's Test
a) Ho: All slopes are equal to zero
Ha: At least one slope differs from 0
b) Test statistic F = 10.90
c) The DF = 2, 20 and p-value is 0.001
d) Since the p-value is less than 0.05 we conclude that the assumption of equal variances
is NOT satisfied.
6. Perform a Lack of Fit Test using both Pure Error and Data Subsetting to check if linear
regression function is appropriate. Include: a) the null and alternative hypotheses, b) the correct
F-statistic, DF and p-value of the Pure Error test, c) the results of the Data Subsetting test, and d)
your decision based on a 0.05 level of significance, and e) Minitab copy of your ANOVA output
and the Data Subsetting results.
a) Ho: The linear regression function is appropriate
Ha: The linear regression function is not appropriate
b) F-statistic is 0.51, DF = 2, 19 and p-value is 0.610
c) The p-value for data subsetting is 0.000 indicating the linear model is not a good fit.
d) Since the Pure Error p-value is greater than 0.05 we conclude that the error is due
more to random variation within each X than to lack of model fit. However, the low p-value for
the data subsetting comes from possible curvature in the model.
e)
Analysis of Variance
Source            DF        SS        MS       F      P
Regression         1  0.036190  0.036190  229.00  0.000
Residual Error    21  0.003319  0.000158
  Lack of Fit      2  0.000168  0.000084    0.51  0.610
  Pure Error      19  0.003151  0.000166
Total             22  0.039509
R denotes an observation with a large standardized residual.
Possible lack of fit at outer X-values (P-Value = 0.000)
Overall lack of fit test is significant at P = 0.000
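As a quick arithmetic check of the Pure Error test, the F-statistic can be recomputed from the sums of squares and degrees of freedom in the ANOVA output: F = MS(Lack of Fit) / MS(Pure Error). The short sketch below uses only the values reported above; the variable names are illustrative.

```python
# Values copied from the Minitab ANOVA output above.
ss_regression = 0.036190
ss_residual = 0.003319
ss_lack_of_fit = 0.000168
ss_pure_error = 0.003151
df_residual, df_lack_of_fit, df_pure_error = 21, 2, 19

# Residual error splits into lack of fit plus pure error.
ss_split = ss_lack_of_fit + ss_pure_error

ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
ms_pure_error = ss_pure_error / df_pure_error
f_lack_of_fit = ms_lack_of_fit / ms_pure_error   # about 0.51

mse = ss_residual / df_residual
f_regression = ss_regression / mse               # about 229, up to rounding

print(round(f_lack_of_fit, 2), round(f_regression, 1))
```

The small difference between the recomputed regression F and the reported 229.00 comes from the rounding of the printed sums of squares.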
7. Perform a Box-Cox analysis on Y to see if any transformation is suggested. Include the a)
estimated and rounded lambda values, b) the interpretation of this value, and c) the Box-Cox plot.
NOTE: This can only be done using Minitab Version 15 or higher – i.e. student version 14
does not contain Box-Cox program.
a) Estimated value is 0.21 and rounded lambda is 0.00
b) The rounded value implies we should apply a log transformation on Y.
c)
[Figure: Box-Cox Plot of Y, StDev vs Lambda (using 95.0% confidence). Estimate = 0.21, Lower CL = -0.13, Upper CL = 0.58, Rounded Value = 0.00]
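To make the interpretation concrete: the 95% confidence interval for lambda (-0.13 to 0.58) contains 0, and lambda = 0 corresponds to the natural log. The sketch below uses the common textbook form of the Box-Cox transformation, (y^lambda - 1) / lambda, with log(y) at lambda = 0. The exact scaling Minitab uses internally is not shown in the source, so treat the function `box_cox` and the sample values as illustrative assumptions.

```python
import math

def box_cox(y, lam):
    """Textbook Box-Cox transform of a single positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)           # limiting case as lambda approaches 0
    return (y ** lam - 1) / lam

# Hypothetical positive response values (NOT the HW_2 data).
y_demo = [0.02, 0.05, 0.10, 0.15]
log_y = [box_cox(v, 0.0) for v in y_demo]          # rounded lambda = 0.00
est_y = [box_cox(v, 0.21) for v in y_demo]         # estimated lambda = 0.21
```

Both transforms preserve the ordering of the Y values; the rounded value is usually preferred because a log transformation is easier to interpret.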
8. Create a transformation of Y using the natural log. Using these transformed Y values check
the assumption for normality and use the BP method to check constant variance. Include the a)
hypotheses, b) test statistic, c) p-value and d) decision. Use 0.05 as level of significance.
Normality
a) Ho: The residuals come from a normal distribution
Ha: The residuals do not come from a normal distribution
b) AD = 0.355
c) p-value = 0.429
d) Since p-value is greater than 0.05 we fail to reject the null hypothesis. The assumption
of normality is plausible.
Variance
a) Ho: All slopes are equal to zero
Ha: At least one slope differs from 0
b) F = 0.82
c) p-value = 0.375
d) Since p-value is greater than 0.05 we fail to reject the null hypothesis. The assumption
of constant variance is plausible.
9. Using the transformed Y-values conduct lack of fit tests using the Pure Error and Data
Subsetting options. What is a) the p-value for both tests and b) the conclusion for both tests? Use
alpha of 5%.
a) The p-value for both tests is 0.000.
b) With the p-value being less than 0.05 we reject the null hypothesis and conclude that
the model is not a good fit and that the lack of fit is not due to random error alone.
10. Create a new, squared term for X by squaring each X value. Regress the transformed Y
values on both the X and X-squared terms (i.e. put both these X terms in the predictor field in
Minitab). Perform a lack of fit tests using the Pure Error and Data Subsetting options. What is a)
the p-value for both tests and b) the conclusion for both tests, c) what is your overall general
conclusion about model fit? Use alpha of 5%.
a) The p-value for Pure Error is 0.018, and for Data Subsetting the p-value is greater
than 0.1.
b) With the Pure Error p-value being less than 0.05, this indicates that adding a
squared term to the model does not result in a well-fitted model. Most likely more term(s) need to be added.
Conversely, the data subsetting indicates that the squared term corrects for possible curvature in
X.
c) In this case, the Pure Error test is comparing two models, one with and one without the
squared term, concluding the simpler model is a better fit than the multiple model. This
conflicts with the data subsetting results that show the squared term corrects for curvature. Since
we have replicates in X and have satisfied normality and constant variance with the natural log of
Y, we will follow the Pure Error results and conclude that the model is still not a good fit for the data. The
best solution would be to find additional predictors.