Simple Linear Regression - Bus Company Cost Function

advertisement
Simple Linear Regression
British Bus Company Expenses/Mileage
Description: Total deflated expenses (Y, in £100,000s) and Car Miles (X, in millions) for
a British bus company over 33 time periods.
Data:
Period
1
2
3
4
5
6
7
8
9
10
11
Cost (Y) Miles (X) Period
Cost (Y) Miles (X) Period
Cost (Y) Miles (X)
2.139
3.147
12
2.027
3.141
23
2.244
3.844
2.126
3.16
13
1.985
2.928
24
2.153
3.276
2.153
3.197
14
1.956
3.063
25
2.025
3.184
2.153
3.173
15
2.004
3.096
26
2.007
3.037
2.154
3.292
16
2.001
3.096
27
2.018
3.142
2.282
3.561
17
2.015
3.158
28
2.021
3.159
2.456
4.013
18
2.132
3.338
29
2.004
3.139
2.599
4.244
19
2.195
3.492
30
2.093
3.203
2.509
4.159
20
2.437
4.019
31
2.139
3.307
2.345
3.776
21
2.623
4.394
32
2.27
3.585
2.059
3.232
22
2.523
4.251
33
2.464
4.073
Proposed Model:
Y   0  1 X  
 ~ N (0, ) (independe nt over time)
^
Regression Fit Summary (EXCEL): Y  0.65  0.45 X
^
  0.046
Regression Statistics
Multiple R
0.97284955
R Square
0.94643625
Adjusted R Square
0.94470838
0.04625823
Standard Error
Observations
33
ANOVA
df
Regression
Residual
Total
1
31
32
Coefficients
Intercept
Miles(X)
0.64963277
0.44672959
SS
MS
F
1.172087528 1.172088 547.7496
0.066334533 0.00214
1.238422061
Standard Error
t Stat
0.066359737 9.789562
0.019087704 23.40405
P-value
5.31E-11
2.89E-21
Significance F
2.8938E-21
Lower 95%
Upper
95%
0.51429112 0.784974
0.40779994 0.485659
Scatterplot of data with least squares regression equation superimposed:
SPSS:
Costs vs Miles (British Bus Company)
A
A
2.60
costs = 0.65 + 0.45 * miles
R-Square = 0.95
A
A
A A
A
2.40
A
co
st
s
A
A
A
A
2.20
AA
A
A
AA
AA
A
A
2.00
A
AAAA
A A AA
A
3.00
3.50
4.00
miles
Measures of Variation:
n
Total sum of Squares: SST   (Yi  Y ) 2  1.2384 df T  n  1  33  1  32
i 1
Costs vs Miles (British Bus Company)
A
A
2.60
A
A
A A
A
2.40
costs
A
A
A
A
A
2.20
AA
A
A
AA
AA
Mean = 2.19
A
A
2.00
A
A
AAA
A A AA
A
3.00
3.50
4.00
miles
Mean
n
^
Regression Sum of Squares: SSR   (Y i  Y ) 2  1.1721
df R  1
i 1
n
^
Error Sum of Squares: SSE   (Yi  Y i ) 2  0.0663
i 1
df E  n  2  33  2  31
ANOVA
df
Regression
Residual (Error)
Total
1
31
32
Coefficient of Determination: r 2 
SS
1.172087528
0.066334533
1.238422061
SSR 1.1721

 0.9465
SST 1.2384
The regression model explains about 95% of the variation in costs.
R Square
0.94643625
Coefficient of Correlation: r  sign (b1 ) r 2   0.9465  0.9729
Very strong positive correlation between costs and miles of service
Multiple R
0.97284955
Residual standard deviation (aka standard error of estimate):
^
  se  SYX
2
^


 Yi  Y i 

SSE
  0.0663  0.00214  0.0462

 i 1 
n2
n2
31
n
“Typical” distance from actual costs to fitted costs is 0.046.
Standard Error
0.04625823
Residual Analysis
^
Plot of Residuals vs Fitted values ei vs Y i
Residuals
0.1
Residuals (e)
0.05
0
1.8
2.2
2.6
3
-0.05
-0.1
-0.15
Fitted (Yhat)
Points are approximately a random cloud around 0…consistent with linear relation and
constant error variance.
Plot of Residuals vs Period ei vs i
Residuals
0.1
-0.05
-0.1
-0.15
Period
Errors appear correlated over time. Do not appear to be independent.
Histogram of residuals
0.
07
5+
0.
05
0.
02
5
0
-0
.0
25
-0
.0
5
12
10
8
6
4
2
0
-0
.1
25
-0
.0
75
Frequency
Histogram
bins
Not too consistent with normal distribution…but sample is not very large.
33
31
29
27
25
23
21
19
17
15
13
11
9
7
5
3
0
1
Residual
0.05
Durbin-Watson Test
H0: Errors are independent (no autocorrelation among residuals)
HA: Errors are positively autocorrelated
n
TS : D 
 (e
i 2
i
 ei 1 ) 2
n
e
i 1

2
i
0.0790
 1.1603
0.0663
Durbin-Watson Calculations
Sum of Squared Difference of Residuals
Sum of Squared Residuals
0.076970088
0.066334533
Durbin-Watson Statistic
1.160332102
Decision Rule (k=1, n=33, 0.05):
If D < dL = 1.38 then conclude Positive autocorrelation
If D > dU = 1.51 then conclude independent
Otherwise withhold judgment
We conclude autocorrelation is present. Other methods would be more appropriate for
this data. We will continue with the analysis on this data, knowing that while estimates
are still unbiased, standard errors are too small.
Test whether costs are linearly associated with miles of service
provided:
t-test
Parameter:1
Estimate: b1 = 0.4467
Estimated standard error: S b1  0.0191
H 0 : 1  0 H A : 1  0
0.4467  0
 23.4
0.0191
RR : | t obs |  t.025,332  2.0395
TS : t obs 
P  value  2 P(t  23.4)  0
Coefficients
Intercept
Miles(X)
0.64963277
0.44672959
Standard Error
t Stat
0.066359737 9.789562
0.019087704 23.40405
P-value
5.31E-11
2.89E-21
Lower 95%
Upper
95%
0.51429112 0.784974
0.40779994 0.485659
F-Test
H 0 : 1  0 H A : 1  0
MSR
SSR / 1
1.1721 / 1 1.1721



 548.0
MSE SSE /( n  2) 0.0663 / 31 0.0021
 t.05,1,31  4.16
TS : Fobs 
RR : Fobs
P  value  P( F  548.0)  0
ANOVA
df
Regression
Residual
Total
1
31
32
SS
MS
F
1.172087528 1.172088 547.7496
0.066334533 0.00214
1.238422061
Significance F
2.8938E-21
95% CI for 1: 0.4467  2.0395(0.0191)  0.4467  0.0390  (0.4077 , 0.4857)
We can be 95% confident that as miles of service increases by 1 unit (1 million miles),
costs increase on average by between 0.41 and 0.49 units (£100,000s).
Coefficients
Intercept
Miles(X)
0.64963277
0.44672959
Standard Error
t Stat
0.066359737 9.789562
0.019087704 23.40405
P-value
5.31E-11
2.89E-21
Lower 95%
Upper
95%
0.51429112 0.784974
0.40779994 0.485659
Estimate of Population Mean of all Periods where X=3, and prediction for a single
future period where X=3:
Confidence Interval Estimate
Data
X Value
Confidence Level
3
95%
Intermediate Calculations
Sample Size
Degrees of Freedom
t Value
Sample Mean
Sum of Squared Difference
Standard Error of the Estimate
h Statistic
Predicted Y (YHat)
33
31
2.039515
3.450879
5.873144
0.046258
0.064917
1.989822
For Average Y
Interval Half Width
Confidence Interval Lower Limit
Confidence Interval Upper Limit
0.024038
1.965784
2.013859
For Individual Response Y
Interval Half Width
Prediction Interval Lower Limit
Prediction Interval Upper Limit
0.097358
1.892463
2.08718
We can be very confident that mean cost among all periods with 3 units of service is
between 1.966 and 2.014 (recall units).
We would predict that if next period we were to provide 3 units of service, our costs
would be between 1.892 and 2.087.
Results of analysis that “adjusts” for autocorrelated errors (SAS,
PROC AUTOREG):
The SAS System
The AUTOREG Procedure
Dependent Variable
cost
Ordinary Least Squares Estimates
SSE
MSE
SBC
Regress R-Square
Durbin-Watson
Variable
Intercept
miles
0.06633453
0.00214
-104.27226
0.9464
1.1603
DFE
Root MSE
AIC
Total R-Square
31
0.04626
-107.26528
0.9464
DF
Estimate
Standard
Error
t Value
Approx
Pr > |t|
1
1
0.6496
0.4467
0.0664
0.0191
9.79
23.40
<.0001
<.0001
Estimates of Autoregressive Parameters
Lag
Coefficient
Standard
Error
t Value
1
2
-0.226275
-0.383562
0.171492
0.171492
-1.32
-2.24
Yule-Walker Estimates
SSE
MSE
SBC
Regress R-Square
Durbin-Watson
Variable
Intercept
miles
DF
1
1
0.04578728
0.00158
-109.0495
0.9448
1.9519
Estimate
0.6629
0.4445
DFE
Root MSE
AIC
Total R-Square
Standard
Error
0.0710
0.0200
29
0.03974
-115.03553
0.9630
t Value
9.34
22.28
Approx
Pr > |t|
<.0001
<.0001
Note that our slope estimate and standard error have changed only slightly by
fitting this model. All inferences remain same.
Source: J. Johnston (1956), "Scale, Costs, and Profitability in Road
Passenger Transport," The Journal of Industrial Economics Vol 4, pp207-223.
Download