Soc. 504: Multiple Regression
Lecture 10, Oct. 31, 2006

In reading chapter 12, you can skip the bottom of p. 402 and p. 403; also skip section 11.7

A. Multiple regression
1. Multiple OLS regression is a straightforward extension of bivariate OLS regression
   a. I will discuss just two independent variables, but the same logic holds for 3+ independent variables
2. Assumptions
   a. Linearity: the relationship between each independent variable and the dependent variable is linear
   b. Formula: Yi = a + b1X1 + b2X2 + Ui (so the predicted value is Ŷ = a + b1X1 + b2X2)
      (1) U is a "disturbance" term that captures all the unmeasured and unmeasurable other factors that influence the value of Y
   c. The formula assumes that the value of Y depends on random error as well as on omitted independent variables
3. The formula for multiple regression with two independent variables defines a regression plane in 3-dimensional space, not a straight line [transparency]
   a. Imagine the cases scattered in a 3-dimensional space with Y on the vertical axis and X1 and X2 each on a horizontal axis at right angles to each other
   b. We cannot visualize the regression hypersurface if there are more than two independent variables
4. The regression plane is positioned to minimize the squared deviations of Yi from the regression plane
5. Interpreting the Y-intercept: a = the value of Y when both X1 and X2 = 0
   a. If X1 and X2 have no meaningful zero, the intercept term is not meaningful
6. Interpreting the effects of the independent variables: multiple regression estimates the partial effects of X1 and X2 on Y
   a. b1 represents the effect of X1 on Y, holding X2 constant; likewise, b2 represents the effect of X2 on Y, holding X1 constant
      (1) Thus, bYX1 is the partial effect of X1 on Y
      (2) Some people denote the partial effect of X1 on Y as bYX1.X2
   b. The partial regression coefficient bYX1 indicates how much Y changes (in units of Y) with a one-unit change in X1 when we statistically control for X2
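The points above (fitting a plane by least squares, reading off the intercept and the two partial slopes) can be sketched numerically. This is a minimal illustration with made-up data, not an example from the lecture; the variable names and the "true" coefficients are arbitrary.

```python
# Hypothetical sketch: fit Yi = a + b1*X1 + b2*X2 + Ui by OLS.
# numpy's least-squares solver positions the plane to minimize the
# squared deviations of Yi from the fitted regression plane.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Generate Y from a known plane plus a random disturbance U
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# Design matrix: a column of 1s for the intercept a, then X1 and X2
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
print(a, b1, b2)  # estimates close to the true 1.0, 2.0, -0.5
```

With enough cases and a small disturbance, the estimated a, b1, and b2 recover the values used to generate the data.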
   c. Ways to think about controlling for a second independent variable in multiple regression
      (1) A crude way to understand what is going on in multiple regression: think of it as bivariate regression in which you regress Y on X1 and then regress the unexplained variation in Y (the residual) on X2. In this sense, you give X1 first crack at explaining Y and then let X2 have a shot at it. This is not what really happens.
      (2) Conceptualize multiple regression as literally holding X2 constant: imagine dividing our sample into a whole bunch of subsamples based on the values of X2
          (a) This would yield an estimate of the effect of X1 on Y for every possible value of X2
          (b) Imagine slicing the regression plane into very thin slices, one slice for each value of X2, with each slice showing the effect of X1 on Y, and then regressing Y on X1 for each slice defined by X2
          (c) If X1 and X2 are completely independent (i.e., unrelated), each regression coefficient bYX1 would be the same size, regardless of the value of X2. And each would estimate the effect of X1 on Y, net of X2.
      (3) More generally, if X1 and X2 are uncorrelated, the partial regression coefficient for X1, net of X2 (bYX1.X2) = the bivariate regression coefficient for X1 (bYX1)
      (4) Important: the stronger the correlation between X1 and X2, the more of X1's effect on Y you remove when you control for X2 (and vice versa)

B. When we have more than two independent variables, Stata fits our data to a k-dimensional hyperplane (where k = the number of independent variables) that cuts through a (k + 1)-dimensional scattergram.
1. Example 1 from A&F p. 384
2. Example 2 from Stata output

C. Assessing the size of the effects of more than one independent variable on Y
1. Comparing b1 with b2; in other words, comparing the effect on Y of a one-unit change in X1 with that of a one-unit change in X2
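The claim in point (3) above — with uncorrelated predictors, the partial slope bYX1.X2 equals the bivariate slope bYX1 — can be checked numerically. A sketch with made-up data (none of it from the lecture); X2 is constructed to have exactly zero sample correlation with X1 so the equality is exact rather than approximate.

```python
# Sketch: when X1 and X2 are uncorrelated in the sample, controlling
# for X2 does not change the slope on X1.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)

# Build X2 exactly uncorrelated with X1 (zero sample covariance) by
# removing the linear part of X1 from a fresh random draw.
z = rng.normal(size=n)
A = np.column_stack([np.ones(n), x1])
x2 = z - A @ np.linalg.lstsq(A, z, rcond=None)[0]

y = 3.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

# Bivariate slope: regress Y on X1 alone
b_bivariate = np.linalg.lstsq(A, y, rcond=None)[0][1]

# Partial slope: regress Y on X1 and X2 together
B = np.column_stack([np.ones(n), x1, x2])
b_partial = np.linalg.lstsq(B, y, rcond=None)[0][1]

print(b_bivariate, b_partial)  # identical up to floating-point error
```

Conversely, the more strongly X1 and X2 are correlated (point (4)), the more the two slopes would diverge.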
   a. e.g., each additional average year of experience of a school's teachers might increase students' WASL scores by 7 points, whereas each additional computer might raise students' WASL scores by 2 points
   b. Unless all of our independent variables are measured in the same unit (e.g., years), we cannot compare their relative importance for the value of Y
   c. One way around this is to convert our independent variables into unit-free variables before we do a multiple regression
      (1) Do this by transforming all our independent variables to Z-scores
   d. It is easier to standardize the b for each independent variable by computing standardized regression coefficients
      (1) We symbolize these as either β1 or b1*
      (2) b* gives the standard-unit change in Y with each standard-unit change in X
      (3) The conversion formula is b* = b(sXi/sY)
          (a) N.b.: this is identical to the way that we convert b to r in the bivariate case; we multiply b by the ratio sXi/sY
      (4) Transforming b to β allows us to directly compare the effects of independent variables measured in different units; it tells us how many standard units of Y change with a change of one standard unit of X
          (a) Standardized regression coefficients tell us the relative importance of the various independent variables for a given dependent variable
          (b) Thus, βs allow us to compare the effects of apples and oranges on health
          (c) The larger the b*, the more relatively important the variable
   e. In sum, in multiple regression we can transform the regression slopes b into βs, which are analogues of r, in order to assess the relative strength of the effects of the independent variables
2. R2: the "coefficient of determination"
   a. For a bivariate association, r measures the strength of the association between X and Y, and r2 tells us the proportion of variation in Y that X explains
   b. In multivariate regression, R (read "multiple R") is analogous to r, but when we square R we get the proportion of the variation in Y that all the independent variables in a regression equation together explain.
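The two routes to standardized coefficients described above — z-scoring all the variables first, versus applying the conversion b* = b(sXi/sY) to the raw slopes — give the same answer. A sketch with made-up data and arbitrary units (nothing here is from the lecture):

```python
# Sketch: standardized regression coefficients two ways.
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(scale=4.0, size=n)   # measured in (say) years
x2 = rng.normal(scale=0.5, size=n)   # measured in some other unit
y = 10.0 + 0.7 * x1 + 6.0 * x2 + rng.normal(size=n)

# Raw (unstandardized) slopes
X = np.column_stack([np.ones(n), x1, x2])
_, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Route 1: the conversion formula b* = b * (s_x / s_y)
beta1 = b1 * x1.std(ddof=1) / y.std(ddof=1)
beta2 = b2 * x2.std(ddof=1) / y.std(ddof=1)

# Route 2: z-score every variable, then regress (no intercept needed,
# since every z-scored variable has mean 0)
def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

Z = np.column_stack([zscore(x1), zscore(x2)])
beta1_z, beta2_z = np.linalg.lstsq(Z, zscore(y), rcond=None)[0]

print(beta1, beta1_z)  # same value either way
```

The raw slopes b1 and b2 are in different units and cannot be compared directly, but beta1 and beta2 can be: the larger |b*|, the relatively more important the variable.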
      Like r2, R2 is a PRE measure.
   c. The theoretical formula for R2 is the same as that for r2: (TSS − SSE)/TSS
   d. The more independent variables, the greater their likely explanatory power; to adjust for this inflation of R2, most computer programs also provide the "adjusted R2," also known as R-bar-square

D. The order in which you enter independent variables in the regression equation
1. "Stepwise"
2. Why do it?
3. Why do any analysis?
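The R2 formula from C.2 above, and the adjustment for the number of predictors, can be computed by hand. A sketch with made-up data (the adjusted-R2 formula, 1 − (1 − R2)(n − 1)/(n − k − 1), is the standard one most packages report, though the lecture does not spell it out):

```python
# Sketch: R^2 = (TSS - SSE)/TSS, plus the adjusted R^2 ("R-bar-square")
# that corrects for the number of independent variables k.
import numpy as np

rng = np.random.default_rng(3)
n, k = 150, 2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ coef

tss = ((y - y.mean()) ** 2).sum()   # total variation in Y
sse = ((y - y_hat) ** 2).sum()      # variation left unexplained
r2 = (tss - sse) / tss              # proportion of Y's variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(r2, adj_r2)  # adjusted R^2 is always a bit smaller than R^2
```

Because (n − 1)/(n − k − 1) > 1, the adjusted value is always below the raw R2, and the gap grows as more independent variables are added relative to the sample size.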