Decomposition of Sum of Squares

• The total sum of squares (SS) in the response variable is

$$SSTO = \sum (Y_i - \bar{Y})^2$$

• The total SS can be decomposed into two main sources: the error SS and the regression SS.

• The error SS is $SSE = \sum e_i^2$.

• The regression SS is $SSR = b_1^2 \sum (X_i - \bar{X})^2$.

It is the amount of variation in the $Y_i$'s that is explained by the linear relationship of $Y$ with $X$.
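
As a quick numeric sketch (made-up illustrative data, not from the course), the three sums of squares can be computed directly from the least squares fit:

```python
import numpy as np

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx   # least squares slope
b0 = ybar - b1 * xbar                        # least squares intercept

e = y - (b0 + b1 * x)                        # residuals

SSTO = np.sum((y - ybar) ** 2)               # total SS
SSE = np.sum(e ** 2)                         # error SS
SSR = b1 ** 2 * Sxx                          # regression SS

print(SSTO, SSR + SSE)                       # decomposition: these agree
```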


Claims

• First, SSTO = SSR + SSE, that is

$$\sum (Y_i - \bar{Y})^2 = b_1^2 \sum (X_i - \bar{X})^2 + \sum e_i^2$$

• Proof:….
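
A sketch of the standard argument, for reference (one way the proof goes; the details follow from the normal equations):

```latex
% Write each deviation as fitted part plus residual and square:
Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + e_i
\quad\Rightarrow\quad
\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2
  + 2 \sum (\hat{Y}_i - \bar{Y}) e_i + \sum e_i^2 .
% The cross term vanishes by the normal equations (\sum e_i = 0, \sum X_i e_i = 0):
\sum (\hat{Y}_i - \bar{Y}) e_i = (b_0 - \bar{Y}) \sum e_i + b_1 \sum X_i e_i = 0 .
% Finally, \hat{Y}_i - \bar{Y} = b_1 (X_i - \bar{X}), so the first term equals
% b_1^2 \sum (X_i - \bar{X})^2 = SSR, which gives both decompositions at once.
```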

• An alternative decomposition is

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

• Proof: Exercises.


Analysis of Variance Table

• The decomposition of SS discussed above is usually summarized in an analysis of variance (ANOVA) table as follows:
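
For simple linear regression the table takes the standard form (reconstructed here from the quantities defined above):

Source of Variation    df       SS       MS                   F
Regression             1        SSR      MSR = SSR/1          MSR/MSE
Error                  n − 2    SSE      MSE = SSE/(n − 2)
Total                  n − 1    SSTO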

• Note that the MSE is $s^2$, our estimate of $\sigma^2$.


Coefficient of Determination

• The coefficient of determination is

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$

• It must satisfy $0 \le R^2 \le 1$.

• $R^2$ gives the proportion of variation in the $Y_i$'s that is explained by the regression line.


Claim

• $R^2 = r^2$; that is, the coefficient of determination is the square of the correlation coefficient.

• Proof:…
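
A minimal numeric check of the claim (same kind of made-up illustrative data as above; `np.corrcoef` returns the sample correlation matrix):

```python
import numpy as np

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
e = y - (b0 + b1 * x)

SSTO = np.sum((y - ybar) ** 2)
R2 = 1 - np.sum(e ** 2) / SSTO     # coefficient of determination
r = np.corrcoef(x, y)[0, 1]        # sample correlation coefficient

print(R2, r ** 2)                  # the two agree
```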


Important Comments about $R^2$

• It is a useful measure but…

• There is no absolute rule about how big it should be.

• It is not resistant to outliers.

• It is not meaningful for models with no intercept.

• It is not useful for comparing models unless the same $Y$ is used and one set of predictors is a subset of the other.


ANOVA F Test

• The ANOVA table gives us another test of $H_0: \beta_1 = 0$.

• The test statistic is

$$F_{stat} = \frac{MSR}{MSE}$$

• Derivations …
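
A minimal sketch of the computation (made-up illustrative data; `scipy.stats.f.sf` gives the upper-tail p-value of the $F_{1,\,n-2}$ distribution):

```python
import numpy as np
from scipy import stats

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(y)
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar
e = y - (b0 + b1 * x)

MSR = b1 ** 2 * Sxx / 1             # SSR / df, with df = 1
MSE = np.sum(e ** 2) / (n - 2)      # SSE / df, with df = n - 2

F = MSR / MSE
p_value = stats.f.sf(F, 1, n - 2)   # P(F_{1, n-2} > F)
print(F, p_value)
```

Under $H_0$, $F_{stat} \sim F_{1,\,n-2}$; for simple linear regression this $F$ is the square of the $t$ statistic for testing $\beta_1 = 0$.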


Prediction of Mean Response

• Very often, we want to use the estimated regression line to make predictions about the mean of the response for a particular $X$ value (assumed to be fixed).

• We know that the least squares line $\hat{Y} = b_0 + b_1 X$ is an estimate of $E(Y \mid X) = \beta_0 + \beta_1 X$.

• Now, we can pick a point, $X = x^*$ (in the range of the $X$'s in the regression data), and estimate $E(Y \mid X = x^*) = \beta_0 + \beta_1 x^*$ by

$$\hat{Y}^* = b_0 + b_1 x^*$$

Claim:

$$Var(\hat{Y}^* \mid X = x^*) = \sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}} \right)$$

Proof:

• This is the variance of the estimate of $E(Y \mid X = x^*)$.
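
A sketch of one standard derivation of the claim, using the fact that $\bar{Y}$ and $b_1$ are uncorrelated:

```latex
% Since b_0 = \bar{Y} - b_1 \bar{x}, the fitted value at x^* can be rewritten as
\hat{Y}^* = \bar{Y} + b_1 (x^* - \bar{x}) .
% \bar{Y} and b_1 are uncorrelated, with Var(\bar{Y}) = \sigma^2 / n and
% Var(b_1) = \sigma^2 / S_{XX}, so
Var(\hat{Y}^* \mid X = x^*)
  = \frac{\sigma^2}{n} + (x^* - \bar{x})^2 \frac{\sigma^2}{S_{XX}}
  = \sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}} \right).
```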


Confidence Interval for E(Y | X = x*)

• For a given $x$ value, $x^*$, a 100(1 − α)% CI for the mean value of $Y$ is

$$\hat{Y}^* \pm t_{n-2;\,\alpha/2} \, s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}$$

where $s = \sqrt{MSE}$.
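
A minimal sketch of the interval computation (made-up illustrative data; `x_star` and `alpha` are illustrative choices):

```python
import numpy as np
from scipy import stats

# Made-up illustrative data and choices
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
x_star, alpha = 3.5, 0.05

n = len(y)
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar

s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # s = sqrt(MSE)

y_hat_star = b0 + b1 * x_star
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)                # t_{n-2; alpha/2}
se_mean = s * np.sqrt(1 / n + (x_star - xbar) ** 2 / Sxx)

print(y_hat_star - t_crit * se_mean, y_hat_star + t_crit * se_mean)
```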


Example

• Consider the smoking and cancer data.

• Suppose we wish to predict the mean mortality index when the smoking index is 101, that is, when x* = 101….


Prediction of New Observation

• Suppose we want to predict a particular value of $Y$, say $Y^*$, when $X = x^*$.

• The predicted value of a new point measured when $X = x^*$ is

$$\hat{Y}^* = b_0 + b_1 x^*$$

• Note, the above predicted value is the same as the estimate of $E(Y \mid X = x^*)$.

• However, there are two sources of variability in this prediction. The first is due to the regression line being estimated by $b_0 + b_1 X$. The second one is due to $\varepsilon^*$, i.e., points don't fall exactly on the line.

• To calculate the variance of the error of prediction we look at the difference $Y^* - \hat{Y}^*$....
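
A sketch of where the extra variance comes from, combining the two sources above (this is the quantity behind the prediction interval below):

```latex
% Y^* is a new observation, independent of the data used to fit the line, so
Var(Y^* - \hat{Y}^*) = Var(Y^*) + Var(\hat{Y}^*)
  = \sigma^2 + \sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}} \right)
  = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}} \right).
```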


Prediction Interval for New Observation

• A 100(1 − α)% prediction interval for $Y^*$ when $X = x^*$ is

$$\hat{Y}^* \pm t_{n-2;\,\alpha/2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}$$

• This is not a confidence interval; CI’s are for parameters and we are estimating a value of a random variable.

• The prediction interval is wider than the CI for $E(Y \mid X = x^*)$.
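
A minimal sketch comparing the two half-widths at the same point (same made-up data and choices as the CI sketch above; the extra 1 under the square root is what widens the PI):

```python
import numpy as np
from scipy import stats

# Same made-up data and choices as the CI sketch above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
x_star, alpha = 3.5, 0.05

n = len(y)
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
se_mean = s * np.sqrt(1 / n + (x_star - xbar) ** 2 / Sxx)      # CI for E(Y | X = x*)
se_pred = s * np.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / Sxx)  # PI for a new Y*

print(t_crit * se_mean, t_crit * se_pred)   # PI half-width is always larger
```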


Dummy Variable Regression

• A dummy or indicator variable takes two values: 0 or 1.

• It indicates which category an observation is in.

• Example…

• Interpretation of the regression coefficients in a dummy variable regression…
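
A minimal sketch with a made-up 0/1 predictor, showing the usual interpretation: the intercept estimates the mean of the group coded 0, and the slope estimates the difference in group means:

```python
import numpy as np

# Made-up data: x is a dummy variable (0 = group A, 1 = group B)
x = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = np.array([5.1, 4.8, 5.3, 5.0, 7.2, 6.9, 7.4, 7.1])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

# With a 0/1 predictor the least squares coefficients are group summaries:
#   b0 = sample mean of the group coded 0
#   b1 = difference in sample means (group 1 minus group 0)
print(b0, y[x == 0].mean())
print(b1, y[x == 1].mean() - y[x == 0].mean())
```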

