Ch14-ClassNotes

Chapter 14: Simple Linear Regression

Independent Variable (X)

The variable that is doing the predicting or explaining.

Dependent Variable (Y)

The variable that is being predicted or explained by the regression equation.

Simple Linear Regression

Regression involving only two variables, X and Y. The relationship between the two variables is approximated by a straight line.

Regression Model: Y = β₀ + β₁X + ε

The model describing how the dependent variable (Y) is related to the independent variable (X) in simple linear regression.

Regression Equation: E(Y) = β₀ + β₁X

Estimated Regression Equation: Ŷ = b₀ + b₁X (estimated from sample data). Also called the "fitted" line.

Least Squares Method

Define: eᵢ = Yᵢ − Ŷᵢ. "e" is also called the "error" or "residual". The method of least squares estimates the values of b₀ and b₁ such that Σeᵢ² = Σ(Yᵢ − Ŷᵢ)² is minimized.
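
A quick sketch of this calculation in Python (the x and y values below are made up for illustration, not course data):

import numpy as np

# Made-up sample data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

x_bar, y_bar = x.mean(), y.mean()

# Least squares estimates: b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²,  b0 = ȳ − b1·x̄
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

y_hat = b0 + b1 * x             # fitted values Ŷi
e = y - y_hat                   # residuals ei = Yi − Ŷi
print(b0, b1, np.sum(e ** 2))   # Σei² is the quantity the method minimizes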

Sums of Squares

Total Sum of Squares (SST) = Σ(Yᵢ − Ȳ)²

Sum of Squares due to Regression (SSR) = Σ(Ŷᵢ − Ȳ)²

Sum of Squares due to Error (SSE) = Σ(Yᵢ − Ŷᵢ)²

Note: SST = SSR + SSE

Coefficient of Determination (r²)

A measure of the proportion of the variation in the dependent variable that is explained by the estimated regression equation. It is a measure of how well the estimated regression equation fits the data.
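
Continuing the same made-up example, the three sums of squares and r² can be computed and the identity SST = SSR + SSE verified:

import numpy as np

# Made-up data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation in Y
ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the regression
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

print(np.isclose(sst, ssr + sse))      # SST = SSR + SSE
print(ssr / sst)                       # coefficient of determination r²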

Correlation Coefficient (r)

A statistical measure of the strength of the linear relationship between two variables.

Mean Square Error (MSE or S²)

The unbiased estimate of the variance, σ², of the error term ε.

Standard Error of the Estimate (S)

The square root of the mean square error. It is the estimate of σ, the standard deviation of the error term ε.

Outlier

A data point or observation that is unusual compared to the remaining data.

Influential Observation

An observation that has a strong influence or effect on the regression results.

Leverage

A measure of how far an observation's x value is from the mean of the x values, and hence of the potential influence that observation has on the regression results; observations with high leverage can be influential.

Excel functions

Estimate for β₀ = b₀ : INTERCEPT(y-range,x-range)

Estimate for β₁ = b₁ : SLOPE(y-range,x-range)

Coefficient of determination (r²) : RSQ(y-range,x-range)

Estimate for σ = S = Sy.x : Standard error of the estimate = STEYX(y-range,x-range)
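
For anyone working outside Excel, roughly the same quantities can be obtained in Python with scipy.stats.linregress; this is only a sketch with made-up data, not part of the Excel material:

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

res = stats.linregress(x, y)
print(res.intercept)       # b0, like Excel's INTERCEPT
print(res.slope)           # b1, like Excel's SLOPE
print(res.rvalue ** 2)     # r², like Excel's RSQ

# Standard error of the estimate (Excel's STEYX): sqrt(SSE / (n − 2))
y_hat = res.intercept + res.slope * x
s_yx = np.sqrt(np.sum((y - y_hat) ** 2) / (len(x) - 2))
print(s_yx)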

LINEST Array Function:

1. Highlight 5 rows and as many columns as the number of X variables plus 1.

2. Enter the function =LINEST(y-range,x-range,,1)

3. Press Control-Shift-Enter (to enter it as an array formula).

Output:

Row 1 (coefficients): bₖ, bₖ₋₁, …, b₁, b₀

Row 2 (standard errors of the b's): Sbₖ, Sbₖ₋₁, …, Sb₁, Sb₀

Row 3 (R² and Sy.x): R², Sy.x

Row 4 (F and df): F, df of MSE

Row 5 (SS's): SSR, SSE
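
The numpy sketch below (made-up data again) reproduces the same five-row layout for the one-X case, which is handy for checking a LINEST result:

import numpy as np

# Made-up data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

n = len(x)
X = np.column_stack([np.ones(n), x])          # design matrix: [1, x]
b = np.linalg.lstsq(X, y, rcond=None)[0]      # [b0, b1]
y_hat = X @ b
sse = np.sum((y - y_hat) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
df = n - 2                                    # df of MSE for one X variable
mse = sse / df

# Standard errors of b0 and b1 from MSE · (XᵀX)⁻¹
cov_b = mse * np.linalg.inv(X.T @ X)
se_b = np.sqrt(np.diag(cov_b))                # [Sb0, Sb1]

r2 = ssr / (ssr + sse)
s_yx = np.sqrt(mse)
F = (ssr / 1) / mse

# The five LINEST rows (coefficients listed b1, b0, matching LINEST's right-to-left order)
print([b[1], b[0]])          # Row 1
print([se_b[1], se_b[0]])    # Row 2
print([r2, s_yx])            # Row 3
print([F, df])               # Row 4
print([ssr, sse])            # Row 5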

Forecasting Y

Point prediction: Ŷ = b₀ + b₁X

Estimated Simple Linear Regression Equation

Ŷ = b₀ + b₁x, where b₀ = the y-intercept and b₁ = the slope of the line.

SSE = Σ(y − b₀ − b₁x)² = Σe² (the quantity minimized by the least squares method)

Review of assumptions

1. The mean of Y (μY) = β₀ + β₁X.

2. For a given X, the Y values follow a normal distribution.

3. The dispersion (variance) of the Y values remains constant everywhere along the line.

4. The error terms (ε) are independent.
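
These assumptions are usually checked informally from the residuals; the sketch below (made-up data; the Shapiro–Wilk call is just one possible normality check, not something required by the notes) prints the residuals against x and a rough normality test:

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

slope, intercept, r, p, se = stats.linregress(x, y)
residuals = y - (intercept + slope * x)

# Assumption 3: residual spread should look roughly constant across x
# (normally judged from a residual-vs-x plot; here we just print the pairs)
for xi, ei in zip(x, residuals):
    print(xi, round(ei, 3))

# Assumption 2: residuals roughly normal (Shapiro–Wilk as one informal check)
print(stats.shapiro(residuals))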

Decomposition of variance

Σ(Y − Ȳ)² = Σ(Ŷ − Ȳ)² + Σ(Y − Ŷ)², i.e. SST = SSR + SSE

SST = Sum of Squares Total = Σ(Y − Ȳ)²

SSR = Sum of Squares Regression = Σ(Ŷ − Ȳ)²

SSE = Sum of Squares due to Error = Σ(Y − Ŷ)² = SST − SSR

ANOVA Table

Source       Sum of Squares   df      Mean Square
Regression   SSR              1       MSR = SSR/1
Error        SSE              n − 2   MSE = SSE/(n − 2)
Total        SST              n − 1

F-test

F = MSR/MSE

(Note: MSE = S²y.x, and √MSE = Sy.x = the standard error of the estimate.)
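
The ANOVA quantities can be computed directly; again a sketch with made-up data:

import numpy as np

# Made-up data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
ssr = sst - sse

msr = ssr / 1              # one independent variable, so df = 1
mse = sse / (n - 2)
F = msr / mse
s_yx = np.sqrt(mse)        # standard error of the estimate

print("Source      SS        df   MS")
print(f"Regression  {ssr:8.3f}  1    {msr:8.3f}")
print(f"Error       {sse:8.3f}  {n-2}    {mse:8.3f}")
print(f"Total       {sst:8.3f}  {n-1}")
print("F =", F, "   Sy.x =", s_yx)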

Coefficient of Determination: r² = SSR/SST

Sample Correlation Coefficient: r_xy = (the sign of b₁)√(coefficient of determination) = (the sign of b₁)√r², where b₁ = the slope of the estimated regression equation.
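
A short check with the same made-up data, confirming that (sign of b₁)√r² matches the ordinary sample correlation:

import numpy as np

# Made-up data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
r_xy = np.sign(b1) * np.sqrt(r2)

print(r_xy)
print(np.corrcoef(x, y)[0, 1])   # should match the value above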

Hypothesis testing for β₁

Given the four assumptions stated earlier, the model for Y is Y = β₀ + β₁X + ε. This regression model is statistically significant only if β₁ ≠ 0. Therefore, the hypotheses for testing whether or not the regression is significant are as follows.

H₀: β₁ = 0;  Ha: β₁ ≠ 0

To test the above hypotheses, either a t test or an F test may be used.

t Test: t = b₁ / Sb₁ (the values of b₁ and Sb₁ are from the LINEST output)

F Test: F = MSR / MSE. Also note that F = t².
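
A sketch of the t test in Python (made-up data; scipy.stats.linregress supplies b₁ and Sb₁):

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

n = len(x)
res = stats.linregress(x, y)                 # res.slope = b1, res.stderr = Sb1
t = res.slope / res.stderr                   # t = b1 / Sb1
p_value = 2 * stats.t.sf(abs(t), df=n - 2)   # two-tailed p-value, df = df of MSE

print(t, p_value)        # p_value should match res.pvalue
print(t ** 2)            # equals the F statistic, since F = t² in simple regression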

Confidence interval for β₁: b₁ ± t(α/2)·Sb₁, where df for t = df of MSE
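
And the corresponding interval, again only as a sketch with made-up data and a 95% confidence level:

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

n = len(x)
res = stats.linregress(x, y)                   # res.slope = b1, res.stderr = Sb1
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # t(α/2) for 95%, df = df of MSE

print(res.slope - t_crit * res.stderr,         # lower limit of b1 ± t(α/2)·Sb1
      res.slope + t_crit * res.stderr)         # upper limit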

Interval Estimation for Y

Confidence interval estimate of an individual Y (Yind): Ŷ ± t(α/2)·S_ind, where df for t = df of MSE, and S_ind comes from the special regression output.

Confidence interval estimate of μyp (the mean value of Y at x = xp): Ŷ ± t(α/2)·S_Ŷp, where df for t = df of MSE, and S_Ŷp is related to S_ind by S²_ind = MSE + S²_Ŷp.
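
A sketch of both intervals, assuming the standard textbook form S_Ŷp = √(MSE·hp), where hp is the leverage at xp defined just below (made-up data, 95% level):

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])
x_p = 6.0                                  # the X value we are forecasting at

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat_p = b0 + b1 * x_p                    # point prediction Ŷ at x_p

mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
h_p = 1 / n + (x_p - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)   # leverage at x_p

s_yhat_p = np.sqrt(mse * h_p)              # S_Ŷp, standard error of the mean response
s_ind = np.sqrt(mse + s_yhat_p ** 2)       # S_ind, from S²_ind = MSE + S²_Ŷp

t_crit = stats.t.ppf(0.975, df=n - 2)      # 95% intervals, df = df of MSE
print(y_hat_p - t_crit * s_yhat_p, y_hat_p + t_crit * s_yhat_p)   # interval for the mean of Y
print(y_hat_p - t_crit * s_ind, y_hat_p + t_crit * s_ind)         # interval for an individual Y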

Leverage of observation i: hᵢ = 1/n + (xᵢ − x̄)² / Σ(xᵢ − x̄)²
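
A short check of the leverage formula with made-up x values:

import numpy as np

# Made-up x values, for illustration only
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])

n = len(x)
s_xx = np.sum((x - x.mean()) ** 2)
h = 1 / n + (x - x.mean()) ** 2 / s_xx     # leverage of each observation

print(h)            # x values far from x̄ get the largest leverage
print(h.sum())      # leverages sum to 2 (the number of parameters) in simple regression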
