Ch06-ClassNotes

Chapter 6: Simple Linear Regression
Regression Line
The line that best fits a collection of X-Y data – the line that minimizes the sum of
squared vertical distances (errors or residuals) from the points to the line. The line is
also known as the "least squares line". The fitted straight line is of the form Ŷ = b₀ + b₁X.
Least Squares Method
Y = Observation, Ŷ = Fit
Error (residual) = Y – Ŷ
SSE = Sum of squared errors = Σ(Y – Ŷ)² = Σ(Y – b₀ – b₁X)²
The least squares method chooses the values of b₀ and b₁ that minimize the sum of
squared errors (SSE).
Also, Y = Ŷ + (Y – Ŷ), i.e. Y = Fit + Residual
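As a quick illustration, b₀ and b₁ can be computed from the standard closed-form least squares solution, b₁ = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)² and b₀ = Ȳ – b₁X̄. The sketch below uses made-up data and plain Python; the variable names are illustrative.

```python
# Least squares fit by hand (made-up data; a minimal sketch, not a
# library implementation).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
# Closed-form least squares solution: b1 = Sxy / Sxx, b0 = ybar - b1*xbar
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx            # slope (SLOPE in Excel)
b0 = ybar - b1 * xbar     # intercept (INTERCEPT in Excel)
fit = [b0 + b1 * xi for xi in x]                 # Y-hat values
resid = [yi - fi for yi, fi in zip(y, fit)]      # Y - Y-hat
sse = sum(e ** 2 for e in resid)                 # sum of squared errors
```

Any other choice of b₀ and b₁ applied to the same data would give a larger SSE.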
Statistical Model for Straight-Line Regression
Population model: Y = β₀ + β₁X + ε
i.e. Y = μY|X + ε, where μY|X = β₀ + β₁X = Mean of Y for the given X value
Assumption: The deviations ε are assumed to be independent and normally distributed
with mean 0 and standard deviation σ.
Estimation
Unknowns to be estimated are β₀, β₁ and σ.
Estimate for β₀ = b₀ = INTERCEPT
Estimate for β₁ = b₁ = SLOPE
Estimate for σ = S = Sy.x = Standard error of the estimate = STEYX
Decomposition of variance
Y = Fit + Residual = Ŷ + (Y – Ŷ)
Subtracting Ȳ from both sides, (Y – Ȳ) = (Ŷ – Ȳ) + (Y – Ŷ)
Squaring and summing both sides (the cross-product term vanishes for a least squares fit),
Σ(Y – Ȳ)² = Σ(Ŷ – Ȳ)² + Σ(Y – Ŷ)²
i.e. SST = SSR + SSE
SST = Sum of Squares Total = Σ(Y – Ȳ)²
SSR = Sum of Squares Regression = Σ(Ŷ – Ȳ)²
SSE = Sum of Squares due to Error = Σ(Y – Ŷ)² = SST – SSR
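The decomposition can be checked numerically. This sketch reuses the closed-form least squares fit on made-up data:

```python
# Numerical check of SST = SSR + SSE on made-up data (a sketch).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
fit = [b0 + b1 * xi for xi in x]
sst = sum((yi - ybar) ** 2 for yi in y)              # total variation
ssr = sum((fi - ybar) ** 2 for fi in fit)            # explained by the line
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))  # left in the residuals
assert abs(sst - (ssr + sse)) < 1e-9                 # decomposition holds
```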
ANOVA Table
Source        Sum of squares   df     Mean Square        F-test
Regression    SSR              1      MSR = SSR/1        F = MSR/MSE
Error         SSE              n-2    MSE = SSE/(n-2)
Total         SST              n-1
(Note: MSE = S²y.x and √MSE = Sy.x = Standard error of estimate)
Coefficient of Determination: r² = SSR/SST
Sample Correlation Coefficient: rxy = (the sign of b₁)·√r²
where b₁ = the slope of the regression equation
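A short sketch of both quantities on made-up data; the sign of the slope is attached to the square root with `math.copysign`:

```python
import math

# r^2 = SSR/SST and rxy = (sign of b1) * sqrt(r^2), on made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
fit = [b0 + b1 * xi for xi in x]
sst = sum((yi - ybar) ** 2 for yi in y)
ssr = sum((fi - ybar) ** 2 for fi in fit)
r2 = ssr / sst                            # fraction of variation explained
r = math.copysign(math.sqrt(r2), b1)      # correlation carries the slope's sign
```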
Hypothesis Testing:
Given the four assumptions stated earlier, the model for Y is Y = β₀ + β₁X + ε. This
regression model is statistically significant only if β₁ ≠ 0. Therefore, the hypotheses for
testing whether or not the regression is significant are as follows.
Ho: β₁ = 0; Ha: β₁ ≠ 0
To test the above hypotheses, either a t-test or an F-test may be used.
t Test: t = b₁ / s(b₁), with df = n-2
F Test: F = MSR/MSE; also note, F = t²
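Both test statistics can be computed by hand. The sketch below uses made-up data and the usual standard error of the slope, s(b₁) = S/√Σ(X – X̄)² (a standard formula, not stated in the text above):

```python
import math

# t- and F-statistics for Ho: beta1 = 0 (made-up data; a sketch).
# s(b1) is taken as S / sqrt(Sxx), the usual slope standard error.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
fit = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
ssr = sum((fi - ybar) ** 2 for fi in fit)
mse = sse / (n - 2)             # = S^2
msr = ssr / 1
s = math.sqrt(mse)              # standard error of the estimate, Sy.x
sb1 = s / math.sqrt(sxx)        # standard error of the slope
t = b1 / sb1
f = msr / mse
assert abs(f - t ** 2) < 1e-9   # F = t^2, as noted above
```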
Forecasting Y
Point prediction: Ŷ = b₀ + b₁X
Interval prediction: Ŷ ± t·Sf, where Sf = Standard error of the forecast, with df = n-2
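The text does not spell out Sf; a common form for predicting an individual Y at X = x₀ is Sf = S·√(1 + 1/n + (x₀ – X̄)²/Σ(X – X̄)²). The sketch below assumes that form, uses made-up data, and reads the t multiplier from a t table (t ≈ 3.182 for df = 3 at 95%):

```python
import math

# 95% prediction interval for an individual Y at x0 (made-up data).
# Assumes Sf = S * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx), a common form.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))            # standard error of the estimate
x0 = 6
yhat = b0 + b1 * x0                     # point prediction
sf = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
t = 3.182                               # t(0.025, df=3), from a t table
lower, upper = yhat - t * sf, yhat + t * sf
```

Note that Sf grows as x₀ moves away from X̄, so the interval widens for forecasts far from the center of the data.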
Estimated Simple Linear Regression Equation: y = b₀ + b₁x + e
b₀ = the y-intercept, b₁ = the slope of the line
Review of assumptions
1. The mean of Y (μY) = β₀ + β₁X
2. For a given X, the Y values follow a normal distribution
3. The dispersion (variance) of the Y values remains constant everywhere along the line.
4. The error terms (e) are independent
Analysis of residuals
Recall the assumptions made for statistical analysis of regression:
1. The underlying relation is linear (μY = β₀ + β₁X)
2. The errors are independent
3. The errors have constant variance
4. The errors are normally distributed
Residual plots used for verifying assumptions
1. Histogram of residuals – checks the normality assumption; moderate deviation from
bell shape is permitted.
2. Residuals (on y-axis) v. fitted values Ŷ (on x-axis) – checks the linearity assumption;
if the plot is not completely random, a transformation may be considered.
3. Residuals v. explanatory variable (x) – also checks the linear model and the constant
variance assumption.
4. Residuals over time (for time series) – used for time-series data; checks all
assumptions.
5. Autocorrelation of residuals – checks for independence of residuals.
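A simple numeric companion to the autocorrelation plot is the lag-1 autocorrelation of the residuals (values near 0 are consistent with independence). The residuals below are made up for illustration:

```python
# Lag-1 autocorrelation of residuals (a sketch; made-up residuals).
def lag1_autocorr(resid):
    n = len(resid)
    mean = sum(resid) / n
    num = sum((resid[i] - mean) * (resid[i - 1] - mean) for i in range(1, n))
    den = sum((e - mean) ** 2 for e in resid)
    return num / den

resid = [-0.8, 0.6, 1.0, -0.6, -0.2]   # residuals from some fitted line
r1 = lag1_autocorr(resid)              # near 0: no strong serial pattern
```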
Variable transformations
1. 1/x:     Y = β₀ + β₁(1/X) + ε
2. log(x):  Y = β₀ + β₁log(X) + ε
3. √x:      Y = β₀ + β₁√X + ε
4. x²:      Y = β₀ + β₁X² + ε
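In practice, each transformation is applied to X first and then ordinary least squares is run on the transformed pairs. A sketch for the 1/x case, on made-up data generated near y = 2 + 8/x:

```python
# Fit Y = b0 + b1*(1/X) + e by transforming X, then running ordinary
# least squares on (1/X, Y). Data are made up, roughly y = 2 + 8/x.
def fit_line(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1          # (intercept, slope)

x = [1, 2, 4, 5, 10]
y = [10.1, 6.1, 3.9, 3.6, 2.8]
z = [1 / xi for xi in x]                 # transformed predictor
b0, b1 = fit_line(z, y)                  # estimates near 2 and 8
```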
Growth Curves
Growth curves are used for long-term forecasts using annual data.
Exponential growth
Sales increase over time by the same percentage each time period.
Y = b₀·b₁^t
Log(Y) = Log(b₀) + Log(b₁)·t = b₀' + b₁'t, where b₀' = Log(b₀) and b₁' = Log(b₁)
Conversely, b₀ = 10^b₀' and b₁ = 10^b₁'
Linear growth:
Sales increase over time by the same amount. The log transformation may be used to
convert exponential growth into linear growth.
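A sketch of the log-transform procedure: regress log₁₀(Y) on t, then back-transform. The data are made up and exactly exponential (Y = 100·1.1^t, i.e. 10% growth per period), so the fit recovers the parameters:

```python
import math

# Exponential growth fit via the log transform (made-up data).
def fit_line(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

t = [0, 1, 2, 3, 4]
sales = [100 * 1.1 ** ti for ti in t]        # Y = b0 * b1^t
logy = [math.log10(yi) for yi in sales]      # linear in t
b0p, b1p = fit_line(t, logy)                 # b0', b1' on the log scale
b0 = 10 ** b0p                               # starting level
b1 = 10 ** b1p                               # growth factor per period
```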