Review for Unit II Exam (Correlation and Regression), Statistics 1040, Dr. McGahagan

Handouts from Michael Wichura are especially recommended; Philip Stark's chapters are also recommended.
Problems for Chapter 8. Correlation: Review exercises 3, 5, 7, 8, 9, 11.
Problems for Chapter 9. More on Correlation: Review exercises 2, 4, 5, 9, 12.
Terms for chapters 8 and 9 (Correlation)
Terminology:                                    EcLS commands to illustrate:
Bivariate data                                  (scatter 0.5 500 10 20 3 9)
Scatterplot                                     (plot y x)
"Football shaped"                               key command H for "convex hull"
Point of averages                               key command x for crosshairs
Linear equation                                 y = a + bx
5-number summary for bivariate data             Means, SDs, correlation coefficient
Covariance (average of product of deviations)   (covariance x y)
Correlation coefficient                         (corr x y)
                                                rho = (covariance x y) / (SD x * SD y)
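The covariance and correlation definitions can be checked by hand on a small data set. The following is a minimal sketch in Python rather than EcLS; the function names and the five data points are chosen only for illustration, and the SD here divides by n, to match the "average of product of deviations" definition.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    # SD as the root-mean-square deviation from the mean (divides by n)
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def covariance(x, y):
    # Average of the products of the deviations from the two means
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def corr(x, y):
    # rho = covariance / (SD x * SD y)
    return covariance(x, y) / (sd(x) * sd(y))

# Small illustrative data set: mean x = 4, SD x = 2, mean y = 7, SD y = 4
x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]

print(covariance(x, y), corr(x, y))
```

For these points the covariance is 3.2 and the correlation is 3.2 / (2 * 4) = 0.4.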
Problems with correlation coefficient:
non-linear relations: Example: (use anscombe), (corr y1 x) (corr y2 x) and plot the two.
outliers, which can inflate or reduce the correlation coefficient. Illustrate with (use anscombe); (plot y3 x) -- outlier reduces rho; (plot y4 x2) -- outlier increases rho
attenuation (looking at a limited slice of the data reduces the correlation; see ch. 9, review ex. 4 for the opposite -- expanding the range of the data increases the correlation)
scale of graphed data (see p. 145 graphs)
ecological correlation (correlation of averages is stronger than for individuals); example: Andrew Gelman (Red State, Blue State)
spurious correlation (due to chance or unrecognized influences or selection bias)
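The effect of a single outlier on the correlation coefficient can be seen without EcLS. The sketch below is in Python and assumes the standard published values of Anscombe's third data set, typed in by hand; the variable names are illustrative and need not match the EcLS data file.

```python
import math

def corr(x, y):
    # Correlation as covariance divided by the product of the SDs
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sdx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sdy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sdx * sdy)

# Anscombe's third data set (assumed standard values)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

r_all = corr(x, y3)

# Drop the single outlier at (13, 12.74) and recompute
kept = [(a, b) for a, b in zip(x, y3) if b != 12.74]
r_trimmed = corr([a for a, _ in kept], [b for _, b in kept])

print(round(r_all, 3), round(r_trimmed, 3))
```

With the outlier included the correlation is roughly 0.82; once it is dropped the remaining ten points lie almost exactly on a line, so the correlation is very close to 1.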
Problems for Chapter 10: 3, 4, 6, 9, 10
Problems for Chapter 11: 4, 5, 7
Problems for Chapter 12. The Regression Line: 1, 2, 5, 7, 9, 10
Terminology for chapters 10, 11, 12 (Regression)
SD box (mean +/- 2 SDs) and SD line
Regression line (how does it differ from SD line?)
Regression effect and regression fallacy (what is the connection?) -- Galton and regression effect.
Permanent income (Friedman) as illustration of regression effect
Two regression lines (regress y x) and (regress x y) -- Why not the same?
How does Okun data illustrate? (use okun), (regress gdpgap ugap), (regress ugap gdpgap)
R-squared
RMS error for regression = Standard error of regression = SE of regression = (sqrt (1 - rsq)) * SD y
SE of coefficients, T-statistics and the test of whether coefficients differ from zero.
Residual = Data - Fit; residual plot (why should you always use one?)
Problems: omitted variables, non-linearity, heteroscedasticity, autocorrelation
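Two items in this list can be verified numerically: that the regression of y on x and of x on y are different lines, and that the RMS error equals sqrt(1 - rsq) * SD y. The following is a rough Python sketch, not EcLS; the regress function here is a hypothetical stand-in that only mimics the argument order of the EcLS command, the data are illustrative, and all SDs and the RMS error divide by n (regression software usually divides by n - 2, so its reported SE of regression will be somewhat larger).

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def corr(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sd(x) * sd(y))

def regress(y, x):
    # Regression of y on x: slope = r * SD y / SD x,
    # and the line passes through the point of averages
    slope = corr(x, y) * sd(y) / sd(x)
    intercept = mean(y) - slope * mean(x)
    return intercept, slope

x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]

a_yx, b_yx = regress(y, x)   # predicts y from x
a_xy, b_xy = regress(x, y)   # predicts x from y
r = corr(x, y)

# RMS error computed directly from the residuals, then by the shortcut
residuals = [yi - (a_yx + b_yx * xi) for xi, yi in zip(x, y)]
rms_error = math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))
shortcut = math.sqrt(1 - r ** 2) * sd(y)

print(b_yx, b_xy, b_yx * b_xy, r ** 2)
print(rms_error, shortcut)
```

Here the slope of y on x is 0.8 and the slope of x on y is 0.2. If the two lines were the same, the second slope would be 1 / 0.8 = 1.25; instead the product of the two slopes equals r squared (0.16), so the lines coincide only when r is +1 or -1.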
See handouts on the web on "Worked Regression Examples" (under chapter 10; includes Okun
example) and "Reading a Regression" (under Chapter 12), and know how to read a regression table.
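As an aid to reading a regression table, the slope, its SE, and its t-statistic can be reproduced by hand for a one-variable regression. This is a minimal Python sketch using the standard textbook formulas for simple regression, with illustrative data; the layout and labels of actual EcLS output may differ.

```python
import math

x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)                     # sum of squared x deviations
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))    # sum of cross products

slope = sxy / sxx
intercept = my - slope * mx

# Residual = Data - Fit
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Residual SD with n - 2 degrees of freedom (two coefficients estimated)
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

se_slope = s / math.sqrt(sxx)     # SE of the slope coefficient
t_slope = slope / se_slope        # t-statistic for the test that the slope is zero

print(slope, se_slope, t_slope)
```

For these five points the slope is 0.8 with an SE of about 1.06, giving a t-statistic of about 0.76: far too small to conclude that the slope differs from zero.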