Review for Unit II Exam (Correlation and Regression)
Statistics 1040
Dr. McGahagan

Handouts from Michael Wichura are especially recommended; Philip Stark's chapters are also recommended.

Problems for Chapter 8. Correlation: Review Exercises 3, 5, 7, 8, 9, 11.
Problems for Chapter 9. More on Correlation: Review Exercises 2, 4, 5, 9, 12.

Terms for chapters 8 and 9 (Correlation)

Terminology, with EcLS commands to illustrate:
  Bivariate data: (scatter 0.5 500 10 20 3 9)
  Scatterplot: (plot y x)
  "Football shaped": key command H for "convex hull"
  Point of averages: key command x for crosshairs
  Linear equation: y = a + bx
  5-number summary for bivariate data: means, SDs, correlation coefficient
  Covariance (average of product of deviations): (covariance x y)
  Correlation coefficient: (corr x y)
    rho = (covariance x y) / (SDx * SDy)

Problems with correlation coefficient:
  non-linear relations. Example: (use anscombe), (corr y1 x), (corr y2 x), and plot the two.
  outliers, which inflate or reduce the correlation coefficient. Illustrate with (use anscombe); (plot y3 x) -- outlier reduces rho; (plot y4 x2) -- outlier increases rho
  attenuation (looking at a limited slice reduces correlation; see ch. 9, review ex. 4 for the opposite -- expanding the range of data increases correlation)
  scale of graphed data (see p. 145 graphs)
  ecological correlation (correlation of averages is stronger than for individuals); see Andrew Gelman (Red State, Blue State)
  spurious correlation (due to chance, unrecognized influences, or selection bias)

Problems for Chapter 10: 3, 4, 6, 9, 10
Problems for Chapter 11: 4, 5, 7
Problems for Chapter 12. The Regression Line: 1, 2, 5, 7, 9, 10

Terminology for chapters 10, 11, 12 (Regression)
  SD box (mean +/- 2 SDs) and SD line
  Regression line (how does it differ from the SD line?)
  Regression effect and regression fallacy (what is the connection?) -- Galton and regression effect.
  Permanent income (Friedman) as illustration of regression effect
  Two regression lines: (regress y x) and (regress x y) -- why are they not the same?
    How does the Okun data illustrate this? (use okun), (regress gdpgap ugap), (regress ugap gdpgap)
  R-squared
  RMS error for regression = standard error of regression = SE of regression = (sqrt (1 - rsq)) * SDy
  SE of coefficients, t-statistics, and the test of whether coefficients differ from zero
  Residual = Data - Fit; residual plot (why should you always use one?)
  Problems: omitted variables, non-linearity, heteroscedasticity, autocorrelation

See the handouts on the web, "Worked Regression Examples" (under Chapter 10; includes the Okun example) and "Reading a Regression" (under Chapter 12), and know how to read a regression table.
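
For checking the formulas above by hand, here is a minimal Python sketch (not EcLS). It uses the first Anscombe data set (x, y1). The function names covariance, corr, and regress are hypothetical stand-ins chosen to echo the EcLS commands; they are not the EcLS implementation. The SD is assumed to divide by n, which is the convention under which the RMS error shortcut holds exactly.

```python
import math

# Anscombe's first data set (x, y1), from Anscombe (1973).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """SD dividing by n (assumed convention, not necessarily what EcLS does)."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))


def covariance(u, v):
    """Average of the products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def corr(u, v):
    """rho = covariance / (SD of u * SD of v)."""
    return covariance(u, v) / (sd(u) * sd(v))


def regress(dep, ind):
    """Least-squares line dep = a + b * ind; returns (a, b)."""
    b = covariance(dep, ind) / sd(ind) ** 2
    a = mean(dep) - b * mean(ind)
    return a, b


r = corr(x, y)

# Two regression lines: y on x, and x on y.
a_yx, b_yx = regress(y, x)
a_xy, b_xy = regress(x, y)

# RMS error computed directly from the residuals (Data - Fit) ...
residuals = [yi - (a_yx + b_yx * xi) for xi, yi in zip(x, y)]
rms_error = math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))

# ... and from the shortcut sqrt(1 - r^2) * SDy.
shortcut = math.sqrt(1 - r ** 2) * sd(y)

print("r =", round(r, 3))
print("slope of y on x:", round(b_yx, 3))
print("slope of x on y, rewritten as y per x:", round(1 / b_xy, 3))
print("RMS error (residuals):", round(rms_error, 4))
print("RMS error (shortcut): ", round(shortcut, 4))
```

The two slopes multiply to r squared, so the two regression lines coincide only when r is +1 or -1. That is one way to approach the "why not the same?" question. The two RMS error figures should agree up to rounding.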