British Spirits Example
Simple Linear Regression Over Time

Response Variable (Y): Per capita consumption of spirits
Predictor Variable (X): Indexed price-to-income ratio
Time Period: Annual data, 1870-1938
Source: J. Durbin and G.S. Watson, “Testing for Serial Correlation in Least Squares Regression. II.” Biometrika 38 (June 1951), pp. 159-177.

Model: $Y = \beta_0 + \beta_1 X + \varepsilon$

Step 1: Plot the consumption versus the indexed price-to-income ratio.

[Figure: Consumption vs Price/Income; x-axis: price-to-income ratio (0.75 to 1.25), y-axis: consumption (1 to 2.2)]

Step 2: Fit a simple linear regression model:

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept   5.16           0.258            20.01     .0000     4.64        5.67
price_inc   -3.14          0.238            -13.17    .0000     -3.62       -2.66

Thus, the fitted equation is: $\hat{Y} = 5.16 - 3.14 X$

Step 3: Obtain a histogram of the residuals (I copied residuals to original spreadsheet).

[Figure: Histogram of residuals; x-axis: residuals (bins labeled -0.25 to 0.25), y-axis: frequency (0 to 12)]

The first bin (0 cases) represents the number less than -0.25, the second bin (9 cases) represents the number between -0.25 and -0.20, and so on. The distribution is centered at 0, but not particularly mound shaped.

Step 4: Plot the residuals versus $\hat{Y}$ (I copied these values to original spreadsheet).

[Figure: Residuals vs fitted values; x-axis: fitted values (1.2 to 2.2), y-axis: residuals (-0.3 to 0.3)]

Note that there is some evidence of non-constant error variance (but I’ve seen much worse in practice).

Step 5: Plot the residuals versus X (this was automatically printed by PHStat, but I rescaled it).

[Figure: price_inc Residual Plot; x-axis: price_inc (0.9 to 1.25), y-axis: residuals (-0.3 to 0.3)]

This is a mirror image of the residuals versus fitted values plot, re-scaled: because the fitted slope is negative, large values of X correspond to small fitted values.

Step 6: Plot the residuals versus year.
[Figure: Residuals vs year; x-axis: year (1860 to 1960), y-axis: residuals (-0.3 to 0.3)]

Residuals close in time are very similar, which clearly shows positive autocorrelation among the residuals. This is by far the most serious violation of model assumptions among the four graphs.

Step 7: Conduct the Durbin-Watson test for positively correlated errors.

H0: Errors are not positively correlated over time
HA: Errors are positively correlated over time

Test Statistic: $DW = \dfrac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$, with $n = 69$

Durbin-Watson Calculations
Sum of Squared Difference of Residuals   0.099348772
Sum of Squared Residuals                 1.401388918
Durbin-Watson Statistic                  0.070893076

Decision Rule (n=69 observations, k=1 predictor, $\alpha$=0.05 significance level): Conclude HA if DW < dL = 1.58; conclude H0 if DW > dU = 1.64; otherwise the test is inconclusive.

Here DW = 0.0709 is far below dL = 1.58, so we clearly conclude in favor of HA. There is serious autocorrelation among errors.

Step 8: Regression statistics and the Analysis of Variance (for completeness; the tests are not appropriate given the conclusion of Step 7).

Regression Statistics
Multiple R          0.849290817
R Square            0.721294892
Adjusted R Square   0.717135114
Standard Error      0.144624523
Observations        69

ANOVA
             df   SS            MS            F             Significance F
Regression   1    3.626825052   3.626825052   173.3974597   2.93887E-20
Residual     67   1.401388918   0.020916253

Our estimate of the standard deviation of the random errors ($\sigma$) is Se = 0.1446. The model explains 72.13% of the variation in consumption ($R^2$ = 0.7213). Note that the F-statistic is highly significant. We would certainly conclude there is an association (it is actually negative, based on the 95% confidence interval for $\beta_1$ from Step 2). The interval was (-3.62, -2.66), which is entirely below 0. However, it has been found that the standard errors of regression coefficients can be biased downward when errors are not independent. This leads us to believe this confidence interval is probably too narrow.
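The arithmetic in Steps 7 and 8 can be checked from the reported sums alone. The following is a minimal Python sketch, not the spreadsheet or PHStat procedure used above. The function names durbin_watson and dw_decision are hypothetical helpers introduced for illustration, and the numeric inputs are copied from the Durbin-Watson and ANOVA tables (the 69 individual residuals are not reproduced in this handout, so the helper that takes a residual series is shown but only the reported sums are plugged in).

```python
import math


def durbin_watson(residuals):
    """DW = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_decision(dw, d_lower, d_upper):
    """One-sided test for positive autocorrelation (hypothetical helper)."""
    if dw < d_lower:
        return "conclude HA"
    if dw > d_upper:
        return "conclude H0"
    return "inconclusive"


# Step 7: sums reported in the Durbin-Watson calculations table.
sum_sq_diff = 0.099348772   # sum of squared differences of successive residuals
sum_sq_resid = 1.401388918  # sum of squared residuals
dw = sum_sq_diff / sum_sq_resid

# Critical values quoted in the decision rule (n=69, k=1, alpha=0.05).
decision = dw_decision(dw, d_lower=1.58, d_upper=1.64)

# Step 8: quantities that follow from the ANOVA sums of squares.
ss_regression = 3.626825052
ss_residual = 1.401388918
df_regression, df_residual = 1, 67

r_squared = ss_regression / (ss_regression + ss_residual)
ms_residual = ss_residual / df_residual
f_statistic = (ss_regression / df_regression) / ms_residual
s_e = math.sqrt(ms_residual)  # estimate of the error standard deviation

print(f"DW = {dw:.4f} -> {decision}")
print(f"R^2 = {r_squared:.4f}, F = {f_statistic:.2f}, Se = {s_e:.4f}")
```

Given the full residual series from the spreadsheet, durbin_watson(residuals) should reproduce the same statistic as the ratio of the two reported sums.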