Political Science 30: Political Inquiry
Linear Regression II: Making Sense of Regression Results

• Interpreting SPSS regression output
    • Coefficients for independent variables
    • Fit of the regression: R Square
• Statistical significance
    • How to reject the null hypothesis
• Multivariate regressions
    • College graduation rates
    • Ethnicity and voting

Linear Regression: Review

• We want to draw the line that best represents the relationship between the IV (X) and the DV (Y).
• Y = a + b*X
• The line allows us to predict the DV given a value of the IV.
• Regression finds the values of a and b that minimize the distance between the points and the line (specifically, the sum of the squared vertical distances).
• Technically, a and b are population parameters. We only get to calculate sample statistics, a-hat and b-hat.

Interpreting SPSS regression output

[Scatterplot: Graduation Rate (y-axis, 0 to 100) against Average SAT Score (x-axis, 0 to 1600) with the fitted regression line; Rsq = 0.3454. Annotations on the plot: Slope or “coefficient”; Y-intercept or “constant”; How tight is the fit?]

Interpreting SPSS regression output

An SPSS regression output includes two key tables for interpreting your results:

• A “Coefficients” table that contains the y-intercept (or “constant”) of the regression, a coefficient for every independent variable, and the standard error of each coefficient.
• A “Model Summary” table that gives you information on the fit of your regression.

Interpreting SPSS regression output: Coefficients

Coefficients (Dependent Variable: Graduation Rate)

                       Unstandardized           Standardized
Model 1                B          Std. Error    Beta          t        Sig.
(Constant)             4.236      7.048                       .601     .549
Average SAT Score      5.88E-02   .007          .588          8.778    .000

In this class, we will ONLY LOOK AT UNSTANDARDIZED COEFFICIENTS!

• The y-intercept is 4.2 percentage points, with a standard error of 7.0 percentage points.
• The coefficient for SAT scores is 0.059 percentage points per SAT point, with a standard error of 0.007.

Interpreting SPSS regression output: Coefficients

Reading the constant and the coefficient off the Coefficients table gives the prediction equation:

Est. Graduation Rate = 4.2 + 0.059 * Average SAT Score

Interpreting SPSS regression output: Coefficients

The y-intercept or constant is the predicted value of the dependent variable when the independent variable takes on the value of zero.

• This basic model predicts that when a college admits a class of students who averaged zero on their SAT, 4.2% of them will graduate.
• The constant is not the most helpful statistic.

Interpreting SPSS regression output: Coefficients

The coefficient of an independent variable is the predicted change in the dependent variable that results from a one-unit increase in the independent variable.

• A college whose students' SAT scores are one point higher on average is predicted to have a graduation rate that is 0.059 percentage points higher.
• Increasing SAT scores by 200 points leads to a predicted (200)(0.059) = 11.8 percentage point rise in the graduation rate.

Interpreting SPSS regression output: Fit of the Regression

Model Summary (Predictors: (Constant), Average SAT Score)

Model 1    R       R Square    Adjusted R Square    Std. Error of the Estimate
           .588    .345        .341                 12.45%

The R Square measures how closely a regression line fits the data in a scatterplot.

• It can range from zero (no explanatory power) to one (perfect prediction).
• An R Square of 0.345 means that differences in SAT scores can explain 35% of the variation in college graduation rates. Key sentence for your homework!

R Square Examples

Statistical Significance

What would the null hypothesis look like in a scatterplot?

• If the independent variable has no effect on the dependent variable, the scatterplot should look random, the regression line should be flat, and its slope should be zero.
• Null hypothesis: The regression coefficient (b) for an independent variable equals zero.
• Can we reject the null hypothesis that b = 0 based on our estimate, b-hat?
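
The coefficient and R Square interpretations above can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS workflow used in the course: the constant and slope are copied from the Coefficients table, while the function names and the four-college dataset are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
# Values copied from the SPSS Coefficients table in the slides.
CONSTANT = 4.236      # y-intercept (a-hat), in percentage points
SAT_COEF = 5.88e-02   # slope (b-hat), percentage points per SAT point


def predict_grad_rate(avg_sat):
    """Predicted graduation rate (percent) for a given average SAT score."""
    return CONSTANT + SAT_COEF * avg_sat


def predicted_change(sat_increase):
    """Predicted change in graduation rate for a rise in average SAT score."""
    return SAT_COEF * sat_increase


def fit_line(xs, ys):
    """Return (a_hat, b_hat) minimizing the sum of squared vertical distances."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b_hat = sxy / sxx
    a_hat = mean_y - b_hat * mean_x
    return a_hat, b_hat


def r_square(xs, ys, a_hat, b_hat):
    """Share of the variation in ys explained by the line a_hat + b_hat*x."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total


# Hypothetical data for four colleges (NOT from the slides).
sat_scores = [800, 1000, 1200, 1400]
grad_rates = [50, 65, 60, 85]

a_hat, b_hat = fit_line(sat_scores, grad_rates)
print(f"Fitted line: {a_hat:.1f} + {b_hat:.3f} * SAT")
print(f"R Square: {r_square(sat_scores, grad_rates, a_hat, b_hat):.3f}")
print(f"Slide model at SAT 1000: {predict_grad_rate(1000):.1f}%")
print(f"Slide model, +200 SAT points: {predicted_change(200):.1f} points")
```

Using the unrounded slope 0.0588 gives a 200-point effect of 11.76, which matches the slide's 11.8 after rounding.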
Statistical Significance

• Our formal test of statistical significance asks whether we can be confident that a regression coefficient for the population differs from zero.
• Just like in a difference in means/proportions test, the “standard error” is the standard deviation of the sampling distribution.
• If a coefficient is more than two standard errors away from zero, we can reject the null hypothesis (that it equals zero).

Statistical Significance

• So, if a coefficient is more than twice the size of its standard error, we reject the null hypothesis with 95% confidence. (Two is a rule of thumb; the exact large-sample cutoff is 1.96.)
• This works whether the coefficient is negative or positive.
• The coefficient/standard error ratio is called the “test statistic” or “t-stat.”
• A t-stat bigger than 2 or less than -2 indicates a statistically significant relationship.

Interpreting SPSS regression output: T-Stats

Coefficients (Dependent Variable: Graduation Rate)

                       Unstandardized           Standardized
Model 1                B          Std. Error    Beta          t        Sig.
(Constant)             4.236      7.048                       .601     .549
Average SAT Score      5.88E-02   .007          .588          8.778    .000

Multivariate Regressions

• A “multivariate regression” uses more than one independent variable (or confound) to explain variation in a dependent variable.
• The coefficient for each independent variable reports its effect on the DV, holding constant all of the other IVs in the regression.
• Thought experiment: Comparing two colleges founded in the same year with the same student/faculty ratio, what is the effect of SATs?

Multivariate Regressions

Independent variables: Year of Founding, SAT Scores, Tuition, Student/Faculty Ratio
Dependent variable: Graduation Rates

Multivariate Regressions

Again, we want to estimate coefficients:

Est. Grad. Rate = a + b1*SAT Score + b2*Year Founded + b3*Tuition + b4*Faculty Ratio

Multivariate Regressions

Coefficients (Dependent Variable: Graduation Rate)

                           Unstandardized           Standardized
Model 1                    B          Std. Error    Beta          t        Sig.
(Constant)                 59.187     47.203                      1.254    .212
Year school was founded    -2.1E-02   .023          -.072         -.917    .361
Average SAT Score          4.2E-02    .010          .410          4.224    .000
In-state Tuition           8.4E-04    .000          .208          2.109    .037
Student/faculty ratio      -.206      .329          -.054         -.626    .533

Model Summary (Predictors: (Constant), Student/faculty ratio, Year school was founded, Average SAT Score, In-state Tuition)

Model 1    R       R Square    Adjusted R Square    Std. Error of the Estimate
           .630    .397        .377                 12.11%

Multivariate Regressions

• Holding all other factors constant, a 200 point increase in SAT scores leads to a predicted (200)(0.042) = 8.4 percentage point increase in the graduation rate, and this effect is statistically significant.
• Controlling for other factors, a college that is 100 years younger should have a graduation rate that differs by (100)(-0.021) = -2.1, that is, 2.1 percentage points lower, but this effect is not significantly different from zero.
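
The "twice the standard error" rule and the "holding other factors constant" predictions can be checked with a short Python sketch. It is a rough illustration only: the B and Std. Error values are copied from the multivariate Coefficients table, the function names are hypothetical, and In-state Tuition is left out because its standard error is displayed as .000 after rounding, so the ratio cannot be recomputed from the printed table. Since the printed values are rounded, the recomputed t-stats differ slightly from the t column SPSS reports.

```python
# (B, Std. Error) pairs copied from the multivariate Coefficients table.
# In-state Tuition is omitted: its Std. Error is displayed as .000 (rounded).
COEFFICIENTS = {
    "(Constant)": (59.187, 47.203),
    "Year school was founded": (-2.1e-02, 0.023),
    "Average SAT Score": (4.2e-02, 0.010),
    "Student/faculty ratio": (-0.206, 0.329),
}


def t_stat(b, std_error):
    """Test statistic: the coefficient divided by its standard error."""
    return b / std_error


def is_significant(b, std_error, cutoff=2.0):
    """Rule of thumb from the slides: |t| > 2 rejects the null that b = 0."""
    return abs(t_stat(b, std_error)) > cutoff


def predicted_change(variable, delta):
    """Predicted change in the DV when one IV rises by delta, others held fixed."""
    b, _ = COEFFICIENTS[variable]
    return b * delta


for name, (b, se) in COEFFICIENTS.items():
    verdict = "significant" if is_significant(b, se) else "not significant"
    print(f"{name}: t = {t_stat(b, se):.2f} ({verdict})")

print(f"+200 SAT points: {predicted_change('Average SAT Score', 200):+.1f} points")
print(f"Founded 100 years later: {predicted_change('Year school was founded', 100):+.1f} points")
```

Taking the absolute value of the t-stat is what makes the rule work for negative and positive coefficients alike.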