UK Spirits Sales Analysis

British Spirits Example
Simple Linear Regression Over Time
Response Variable (Y) – Per capita consumption of Spirits
Predictor Variable (X) – Indexed Price-to-income ratio
Time Period – Annual Data 1870-1938
Source: J. Durbin and G.S. Watson, “Testing for Serial Correlation in Least Squares Regression. II.” Biometrika, 38 (June 1951), pp. 159-177.
Model: $Y = \beta_0 + \beta_1 X + \varepsilon$
Step 1: Plot the consumption versus the indexed price-to-income ratio.
[Figure: Consumption vs Price/Income. Consumption (vertical axis, about 1 to 2.2) plotted against the price-to-income ratio (horizontal axis, about 0.75 to 1.25).]
Step 2: Fit a simple linear regression model:
             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept        5.16          0.258        20.01    .0000     4.64       5.67
price_inc       -3.14          0.238       -13.17    .0000    -3.62      -2.66
Thus, the fitted equation is: $\hat{Y} = 5.16 - 3.14X$
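For readers working outside Excel/PHStat, the quantities in the Step 2 table can be reproduced from the raw data with the textbook least-squares formulas. The sketch below is a minimal NumPy version under stated assumptions: the function name `fit_simple_regression` is hypothetical, the default `t_crit=1.996` is only an approximate 0.975 t quantile for 67 degrees of freedom, and the data in the usage line is a toy example, not the spirits series.

```python
import numpy as np

def fit_simple_regression(x, y, t_crit=1.996):
    """Least-squares fit of y = b0 + b1*x with standard errors, t statistics
    and confidence intervals. t_crit should be the two-sided critical value
    for n - 2 degrees of freedom; 1.996 approximates it for n = 69 (assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx   # slope
    b0 = y_bar - b1 * x_bar                         # intercept
    resid = y - (b0 + b1 * x)
    mse = np.sum(resid ** 2) / (n - 2)              # residual mean square
    se_b1 = np.sqrt(mse / sxx)
    se_b0 = np.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))
    coefs = np.array([b0, b1])
    ses = np.array([se_b0, se_b1])
    return {
        "coef": coefs,
        "se": ses,
        "t": coefs / ses,
        "ci": np.column_stack([coefs - t_crit * ses, coefs + t_crit * ses]),
        "resid": resid,
    }

# Toy data (not the spirits series), just to show the call.
fit = fit_simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit["coef"], fit["se"])
```

With the actual 69 annual observations, the `coef`, `se`, `t` and `ci` entries should match the Coefficients, Standard Error, t Stat and Lower/Upper 95% columns up to rounding.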
Step 3: Obtain a histogram of the residuals (I copied the residuals to the original spreadsheet).
[Figure: Histogram of Residuals. Frequency (vertical axis, 0 to 12) of the residuals (horizontal axis, about -0.25 to 0.25).]
The first bin (0 cases) counts the residuals less than –0.25, the second bin (9 cases) counts
those between –0.25 and –0.20, and so on. The distribution is centered at 0 but is not
particularly mound-shaped.
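The bin description above follows the spreadsheet convention in which each bin is labeled by its upper limit. A minimal sketch of that counting rule, assuming Excel-style bins (a value exactly equal to a limit falls in that limit's bin) and bin limits from -0.25 to 0.25 in steps of 0.05; the helper name `binned_counts` is hypothetical.

```python
import numpy as np

def binned_counts(values, upper_limits):
    """Count values per bin, spreadsheet style (assumed convention):
    bin 0 holds values <= upper_limits[0], bin k holds values in
    (upper_limits[k-1], upper_limits[k]], and a final overflow bin
    holds values > upper_limits[-1]."""
    values = np.asarray(values, dtype=float)
    limits = np.asarray(upper_limits, dtype=float)
    # side="left" returns the index of the first limit that is >= the value
    idx = np.searchsorted(limits, values, side="left")
    return np.bincount(idx, minlength=limits.size + 1)

# Assumed bin upper limits: -0.25, -0.20, ..., 0.25
limits = np.round(np.arange(-0.25, 0.2501, 0.05), 2)

# Toy residuals (not the spirits data)
print(binned_counts([-0.3, -0.25, -0.22, 0.0, 0.3], [-0.25, -0.20, 0.25]))  # [2 1 1 1]
```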
Step 4: Plot the residuals versus the fitted values $\hat{Y}$ (I copied these values to the original spreadsheet).
[Figure: Residuals vs Fitted Values. Residuals (vertical axis, -0.3 to 0.3) plotted against the fitted values (horizontal axis, 1.2 to 2.2).]
Note that there is some evidence of non-constant error variance (but I’ve seen much
worse in practice).
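To put a rough number on the visual impression of non-constant variance, one can compare the spread of the residuals for the lower and upper halves of the fitted values. This is only a crude descriptive check, not a formal test; the function name `spread_by_fitted_half` is hypothetical, and the coefficients default to the Step 2 estimates.

```python
import numpy as np

def spread_by_fitted_half(x, y, b0=5.16, b1=-3.14):
    """Residual standard deviations for observations whose fitted values
    fall below vs. at-or-above the median fitted value.
    Defaults are the Step 2 estimates (Y-hat = 5.16 - 3.14 X)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = b0 + b1 * x
    resid = y - fitted
    low = fitted < np.median(fitted)
    return resid[low].std(ddof=1), resid[~low].std(ddof=1)

# Toy data with an identity fit (not the spirits data): the spread grows with the fitted value
low_sd, high_sd = spread_by_fitted_half(
    [1, 2, 3, 4, 5, 6], [1.1, 1.9, 3.1, 3.0, 6.0, 5.0], b0=0.0, b1=1.0
)
print(high_sd / low_sd)
```

A ratio near 1 is consistent with constant variance; a ratio far from 1 echoes the funnel shape one looks for in the plot.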
Step 5: Plot the residuals versus X (this plot was produced automatically by PHStat, but I
rescaled it).
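[Figure: price_inc Residual Plot. Residuals (vertical axis, -0.3 to 0.3) plotted against price_inc (horizontal axis, 0.9 to 1.25).]
This is a mirror image of the residuals versus fitted values plot, rescaled: the fitted values are a decreasing linear function of X ($\hat{Y} = 5.16 - 3.14X$), so observations with large X have small fitted values and vice versa.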
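A quick check of the mirror-image claim: because the slope is negative, sorting observations by X lists them in exactly the reverse order of sorting by fitted value, while the residual (vertical) coordinates are unchanged. The X values below are arbitrary illustrative numbers, not the data.

```python
import numpy as np

b0, b1 = 5.16, -3.14                          # Step 2 estimates
x = np.array([0.93, 1.21, 0.78, 1.05, 1.17])  # illustrative X values, not the data
fitted = b0 + b1 * x

# Left-to-right order of the points on each plot's horizontal axis
order_by_x = np.argsort(x)
order_by_fitted = np.argsort(fitted)
print(np.array_equal(order_by_x, order_by_fitted[::-1]))  # True
```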
Step 6: Plot the residuals versus year.
[Figure: Residuals vs Year. Residuals (vertical axis, -0.3 to 0.3) plotted against year (horizontal axis, 1860 to 1960).]
Residuals close in time are very similar, which clearly indicates positive autocorrelation
among the residuals. This is by far the most serious violation of the model assumptions
seen in the four residual plots.
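The clustering in the time plot can be summarized by the lag-1 autocorrelation of the residuals taken in time order; a value near +1 corresponds to the pattern described above. A minimal sketch with an illustrative function name and toy residuals:

```python
import numpy as np

def lag1_autocorrelation(resid):
    """Sample lag-1 autocorrelation of a series given in time order."""
    e = np.asarray(resid, dtype=float)
    d = e - e.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d ** 2)

# Toy residuals with runs of the same sign (not the spirits data)
print(lag1_autocorrelation([1, 1, 1, -1, -1, -1]))  # 0.5
```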
Step 7: Conduct the Durbin-Watson test for positively correlated errors.
H0: Errors are not positively correlated over time
HA: Errors are positively correlated over time
Test Statistic: $DW = \dfrac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$, where $e_i$ is the $i$-th residual in time order and $n = 69$.
Durbin-Watson Calculations
Sum of Squared Difference of Residuals   0.099348772
Sum of Squared Residuals                 1.401388918
Durbin-Watson Statistic                  0.070893076
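The Durbin-Watson statistic is simply the ratio of the two sums in the table. The sketch below computes it from residuals in time order and reproduces the ratio from the reported sums; the function name `durbin_watson` is chosen for illustration (statsmodels provides an equivalent `statsmodels.stats.stattools.durbin_watson`, not used here). Since DW is approximately 2(1 - r1), where r1 is the lag-1 autocorrelation of the residuals, a value of 0.071 corresponds to r1 of roughly 0.96.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum_{i=2}^n (e_i - e_{i-1})^2 / sum_{i=1}^n e_i^2 (residuals in time order)."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Reproduce the statistic from the two sums reported in the table
ss_diff = 0.099348772   # sum of squared differences of residuals
sse = 1.401388918       # sum of squared residuals
dw = ss_diff / sse
print(round(dw, 6))  # 0.070893

# Toy residuals (not the spirits data): long same-sign runs give a small DW
print(durbin_watson([1, 1, 1, -1, -1, -1]))
```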
Decision Rule ($n = 69$ observations, $k = 1$ predictor, $\alpha = 0.05$ significance level):
conclude HA if $DW < d_L = 1.58$; conclude H0 if $DW > d_U = 1.64$; the test is inconclusive if $1.58 \le DW \le 1.64$.
Here $DW = 0.071$ is far below $d_L = 1.58$, so we clearly conclude in favor of HA: there is serious positive autocorrelation among the errors.
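The three-way decision rule, including the inconclusive region between the bounds, fits in a small function. The bounds are not computed here; they must be looked up in a Durbin-Watson table for the given n, k and significance level, and the function name is illustrative.

```python
def dw_decision(dw, d_lower, d_upper):
    """One-sided Durbin-Watson test for positive autocorrelation.
    d_lower, d_upper: tabled bounds d_L and d_U for the given n, k and alpha."""
    if dw < d_lower:
        return "conclude HA (positive autocorrelation)"
    if dw > d_upper:
        return "conclude H0 (no positive autocorrelation)"
    return "inconclusive"

# Bounds quoted above for n = 69, k = 1, alpha = 0.05
print(dw_decision(0.070893076, d_lower=1.58, d_upper=1.64))
```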
Step 8: Regression statistics and the Analysis of Variance (for completeness only; the tests
are not appropriate given the autocorrelation found in Step 7).
Regression Statistics
Multiple R           0.849290817
R Square             0.721294892
Adjusted R Square    0.717135114
Standard Error       0.144624523
Observations         69

ANOVA
             df   SS            MS            F             Significance F
Regression    1   3.626825052   3.626825052   173.3974597   2.93887E-20
Residual     67   1.401388918   0.020916253
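Every entry in the Regression Statistics table follows from the two ANOVA sums of squares together with n = 69 and k = 1. The short check below recomputes them from the reported values in plain Python; variable names are illustrative.

```python
import math

# Values from the ANOVA table
ssr, sse = 3.626825052, 1.401388918   # regression and residual sums of squares
n, k = 69, 1                          # observations, predictors

df_reg, df_res = k, n - k - 1         # 1 and 67
msr, mse = ssr / df_reg, sse / df_res
f_stat = msr / mse
r2 = ssr / (ssr + sse)                       # R Square
adj_r2 = 1 - (1 - r2) * (n - 1) / df_res     # Adjusted R Square
se = math.sqrt(mse)                          # Standard Error of the estimate, Se
multiple_r = math.sqrt(r2)                   # Multiple R (|correlation| for one predictor)

print(f"MSE={mse:.6f} F={f_stat:.2f} R2={r2:.4f} adjR2={adj_r2:.4f} Se={se:.4f} R={multiple_r:.4f}")
```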
Our estimate of the standard deviation of the random errors ($\sigma$) is $S_e = 0.1446$. The model
explains 72.13% of the variation in consumption ($R^2 = 0.7213$). Note that the F-statistic is
highly significant. We would certainly conclude there is an association (it is actually
negative, based on the 95% confidence interval for $\beta_1$ from Step 2). The interval was
(-3.62, -2.66), which lies entirely below 0. However, the standard errors of regression
coefficients can be biased downward when the errors are positively autocorrelated, as they
are here. This suggests that the confidence interval is probably too narrow.
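The interval is b1 ± t·SE(b1), so its half-width is directly proportional to the standard error: if SE(b1) is understated because of the autocorrelation, the interval is too narrow by the same factor. Recomputing it from the rounded Step 2 output, assuming t ≈ 1.996 for 67 degrees of freedom (an approximate table value):

```python
# Reported slope estimate and standard error from Step 2
b1, se_b1 = -3.14, 0.238
t_crit = 1.996   # approximate 0.975 quantile of t with 67 df (assumed table value)

lower, upper = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(f"95% CI for beta1: ({lower:.2f}, {upper:.2f})")
```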