Analysis of Variance (ANOVA) for Simple Linear Regression

The variability in the Y values can be partitioned into two pieces:

    \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

    Total Sum of Squares = Regression Sum of Squares + Error (or Residual) Sum of Squares

    SSTO = SSREG + SSE

We can organize the results of a simple linear regression analysis in an ANOVA table.

    Source       D.F.     Sum of Squares   Mean Square   F           P-value
    Regression   df_REG   SSREG            MSREG         MSREG/MSE   P(T^2 >= MSREG/MSE), T^2 ~ F(df_REG, df_E)
    Error        df_E     SSE              MSE
    Total        df_TO    SSTO

For simple linear regression the table becomes:

    Source       D.F.    Sum of Squares                 Mean Square                          F           P-value
    Regression   1       \sum (\hat{Y}_i - \bar{Y})^2   \sum (\hat{Y}_i - \bar{Y})^2 / 1     MSREG/MSE   P(T^2 >= MSREG/MSE), T^2 ~ F(1, n-2)
    Error        n - 2   \sum (Y_i - \hat{Y}_i)^2       \sum (Y_i - \hat{Y}_i)^2 / (n - 2)
    Total        n - 1   \sum (Y_i - \bar{Y})^2

The F-statistic MSREG/MSE is used to test

    H_0: \mu\{Y|X\} = \beta_0  versus  H_A: \mu\{Y|X\} = \beta_0 + \beta_1 X for some \beta_1 \neq 0,

or H_0: \beta_1 = 0 vs. H_A: \beta_1 \neq 0 for short.

The test is equivalent to the t-test that we learned about previously because

    (1) F = MSREG/MSE = \hat{\beta}_1^2 / [SE(\hat{\beta}_1)]^2 = t^2, and

    (2) T^2 ~ F with 1 and n - 2 d.f. \iff T ~ t with n - 2 d.f.

The F-statistic MSREG/MSE is a special case of the F-statistic used to compare full and reduced models:

    F = \frac{[RSS(red.) - RSS(full)] / [df_{RSS(red.)} - df_{RSS(full)}]}{RSS(full) / df_{RSS(full)}}

Recall that our null and alternative hypotheses are H_0: \mu\{Y|X\} = \beta_0 versus H_A: \mu\{Y|X\} = \beta_0 + \beta_1 X for some \beta_1 \neq 0. The full model corresponds to the situation where \beta_1 can be any value. The reduced model forces \beta_1 to be 0, just like H_0.

Exercise: Write down formulas for RSS(red.), RSS(full), df_{RSS(red.)}, and df_{RSS(full)} for the special case of simple linear regression, and show that the resulting reduced vs. full model F-statistic is the same as F = MSREG/MSE. (The second sketch at the end of these notes checks this numerically.)

Because SSTO = SSREG + SSE, we may write

    1 = SSREG/SSTO + SSE/SSTO.

SSE/SSTO is the proportion of the total variation in the Y values that was not explained by the regression of Y on X. The remaining proportion, 1 - SSE/SSTO = SSREG/SSTO, is known as the coefficient of determination: it is the proportion of the variation in the Y values that was explained by the regression of Y on X. It can be shown that the coefficient of determination equals the square of the sample linear correlation coefficient between X and Y:

    1 - SSE/SSTO = SSREG/SSTO = r^2.
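The following is a minimal Python sketch, with made-up data and variable names of my choosing (nothing here comes from the notes), that builds the simple linear regression ANOVA table entries by hand and numerically checks the decomposition SSTO = SSREG + SSE, the equivalence F = t^2 between the F-test and the slope t-test, and the identity SSREG/SSTO = r^2. It assumes numpy and scipy are available.

    # Sketch: ANOVA table for simple linear regression, built from the formulas above.
    # The data are hypothetical, chosen only for illustration.
    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])
    n = len(y)

    # Least-squares estimates for mu{Y|X} = beta0 + beta1 * X
    beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    y_hat = beta0_hat + beta1_hat * x

    # Sums of squares: SSTO = SSREG + SSE
    ssto = np.sum((y - y.mean())**2)
    ssreg = np.sum((y_hat - y.mean())**2)
    sse = np.sum((y - y_hat)**2)
    assert np.isclose(ssto, ssreg + sse)

    # Mean squares and the F-statistic with (1, n-2) d.f.
    msreg = ssreg / 1
    mse = sse / (n - 2)
    F = msreg / mse
    p_value = stats.f.sf(F, 1, n - 2)          # P(T^2 >= F), T^2 ~ F(1, n-2)

    # Equivalence with the t-test of H0: beta1 = 0
    se_beta1 = np.sqrt(mse / np.sum((x - x.mean())**2))
    t = beta1_hat / se_beta1
    assert np.isclose(F, t**2)                               # F = t^2
    assert np.isclose(p_value, 2 * stats.t.sf(abs(t), n - 2))  # same p-value

    # Coefficient of determination equals the squared sample correlation
    r = np.corrcoef(x, y)[0, 1]
    assert np.isclose(ssreg / ssto, r**2)

    print(f"F = {F:.3f}, p = {p_value:.4g}, R^2 = {ssreg/ssto:.4f}")

The assert lines are the point of the sketch: each one fails if the corresponding identity from the notes were violated.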
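A second self-contained sketch, reusing the same hypothetical data, evaluates the general reduced vs. full model F-statistic directly and confirms it coincides with MSREG/MSE; this is the numerical check referenced in the exercise above. Here the reduced model \mu\{Y|X\} = \beta_0 is fit by \bar{Y}, so RSS(red.) = SSTO with n - 1 d.f., while the full model is the simple linear regression, so RSS(full) = SSE with n - 2 d.f.

    # Sketch: reduced vs. full model F-statistic (hypothetical data, as above).
    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])
    n = len(y)

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x

    rss_red = np.sum((y - y.mean())**2)   # reduced model: RSS = SSTO, df = n - 1
    rss_full = np.sum((y - y_hat)**2)     # full model:    RSS = SSE,  df = n - 2

    # F = {[RSS(red.) - RSS(full)] / [df_red - df_full]} / {RSS(full) / df_full}
    F = ((rss_red - rss_full) / ((n - 1) - (n - 2))) / (rss_full / (n - 2))

    # The numerator reduces to SSREG/1 = MSREG and the denominator to MSE,
    # so this is the same statistic as MSREG/MSE from the ANOVA table.
    msreg = np.sum((y_hat - y.mean())**2) / 1
    mse = rss_full / (n - 2)
    assert np.isclose(F, msreg / mse)

    print(f"F = {F:.3f}, p = {stats.f.sf(F, 1, n - 2):.4g}")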