Multivariate Analysis

Soc. 504: Multiple Regression
Lecture 10, Oct. 31, 2006
In reading chapter 12, you can skip bottom of p. 402 and p. 403; also skip section 11.7
A. Multiple regression
1. MR OLS regression is a straightforward extension of bivariate OLS regression
a. I will discuss just two indep. vars., but same logic for 3+ indep. vars.
2. Assumptions
a. Linearity: relationship between all independent variables and dependent
variable is linear
b. Formula: Yi = a + b1X1i + b2X2i + Ui
(1) Ui is a “disturbance” term that captures all the unmeasured and unmeasurable
other factors that influence the value of Y
c. The formula assumes that the value of Y depends on random error as well as omitted
independent variables
3. the formula for multiple regression with two independent variables defines a regression
plane in 3-dimensional space, not a straight line [transparency]
a. imagine the cases scattered in a 3-dimensional space with Y on the vertical axis and
X1 and X2 each on a horizontal axis at right angles to each other
b. we cannot visualize the regression hypersurface if there are more than two
independent variables.
4. the regression plane is positioned to minimize the sum of squared deviations of Yi from the
regression plane
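As a rough sketch of what the least-squares fitting does, the code below fits the plane Ŷ = a + b1X1 + b2X2 by solving the normal equations directly in pure Python. The data values are made up for illustration (constructed so that Y = 1 + 1·X1 + 1.5·X2 exactly); real software like Stata does the same minimization, just more robustly.

```python
def ols_two_predictors(y, x1, x2):
    """Fit y = a + b1*x1 + b2*x2 by ordinary least squares.

    Builds the 3x3 normal equations (X'X)beta = X'y and solves them
    with Gaussian elimination. Returns (a, b1, b2).
    """
    n = len(y)
    cols = [[1.0] * n, list(x1), list(x2)]  # design columns: constant, x1, x2
    A = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(3)]
         for i in range(3)]                                   # X'X
    b = [sum(ci * yi for ci, yi in zip(cols[i], y)) for i in range(3)]  # X'y
    # Gaussian elimination with partial pivoting
    for k in range(3):
        piv = max(range(k, 3), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        b[k], b[piv] = b[piv], b[k]
        for r in range(k + 1, 3):
            f = A[r][k] / A[k][k]
            for c in range(k, 3):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    beta = [0.0, 0.0, 0.0]
    for k in (2, 1, 0):  # back substitution
        beta[k] = (b[k] - sum(A[k][c] * beta[c] for c in range(k + 1, 3))) / A[k][k]
    return tuple(beta)

# Made-up data satisfying y = 1 + 1*x1 + 1.5*x2 exactly
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [5.0, 4.5, 10.0, 9.5, 15.0, 14.5]
a, b1, b2 = ols_two_predictors(y, x1, x2)  # recovers a=1, b1=1, b2=1.5
```

Because the made-up Y values lie exactly on a plane, OLS recovers the coefficients exactly; with a disturbance term U added, the estimates would only approximate them.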
5. Interpreting the Y-intercept: a = the value of Y when both X1 and X2 = 0
a. if X1 and X2 have no meaningful zero, the intercept term is not meaningful
6. Interpreting the effects of the independent variables: multiple regression estimates the
partial effects of X1 and X2 on Y.
a. b1 represents the effect of X1 on Y, holding X2 constant; b2 is the effect of X2 on Y, holding X1 constant
(1) thus, bYX1 is the partial effect of X1 on Y
(2) some people denote the partial effect of X1 on Y as bYX1.X2
b. the partial regression coefficient bYX1 indicates how much Y changes (in units of Y)
with a one-unit change in X1 when we statistically control for X2
c. ways to think about controlling for a second independent variable in multiple
regression
(1) Crude way to understand what’s going on in multiple regression: think of it
as bivariate regression in which you regress Y on X1 and then regress the
unexplained variation in Y (the residual) on X2. In this sense, you give X1 first
crack at explaining Y and then let X2 have a shot at it. This is not what really
happens.
(2) Conceptualize multiple regression as literally holding X2 constant: Imagine
literally holding X2 constant by dividing our sample up into a whole bunch of
subsamples based on the values of X2
(a) This would yield an estimate of the effect of X1 on Y for every possible
value of X2
(b) imagine slicing the regression plane into very thin slices, one slice for each
value of X2, with each slice showing the effect of X1 on Y, and then
regressing Y on X1 for each slice defined by X2
(c) if X1 and X2 are completely independent (i.e., unrelated), each regression
coefficient bYX1 would be of equal size, regardless of the value of X2. And each
would estimate the effect of X1 on Y, net of X2.
(3) More generally, if X1 and X2 are uncorrelated, the partial regression
coefficient for X1, net of X2 (bYX1.X2) = the bivariate regression coefficient for X1 (bYX1)
(4) Important: the stronger the correlation between X1 and X2, the more of X1’s
effect on Y you remove when you control for X2 (and vice versa)
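The claim in (3) can be checked numerically. The sketch below uses made-up, perfectly uncorrelated predictors (X1 and X2 orthogonal) and computes both the bivariate slope of Y on X1 and the partial slope of X1 controlling for X2, using the standard covariance-based formula for the two-predictor case; the two come out identical.

```python
def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

# Made-up orthogonal (uncorrelated) predictors
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]  # y = 2*x1 + 3*x2 exactly

# Bivariate slope of Y on X1
b_y1 = cov(x1, y) / cov(x1, x1)

# Partial slope of X1 net of X2, two-predictor formula:
# b_{Y1.2} = (cov(x1,y)*var(x2) - cov(x2,y)*cov(x1,x2))
#            / (var(x1)*var(x2) - cov(x1,x2)**2)
num = cov(x1, y) * cov(x2, x2) - cov(x2, y) * cov(x1, x2)
den = cov(x1, x1) * cov(x2, x2) - cov(x1, x2) ** 2
b_y1_2 = num / den  # equals b_y1 (= 2) because cov(x1, x2) = 0
```

With correlated predictors, cov(x1, x2) would be nonzero and the partial slope would differ from the bivariate one, which is exactly point (4).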
B. When we have more than two independent variables, Stata fits our data to a k-dimensional
hyperplane (where k = number of independent variables) that cuts through a (k + 1)-
dimensional scattergram.
1. Example 1 from A&F p. 384
2. Example 2 from Stata output
C. Assessing the size of the effects of more than one independent variable on Y
1. comparing b1 with b2; in other words, the effect of a one-unit change in X1 on Y with that
of a one-unit change in X2
a. e.g., each additional average year of experience of a school’s teachers might increase
students’ WASL scores by 7 points, whereas each additional computer might raise
students’ WASL scores by 2 points
b. unless all of our indep. vars. are measured in the same unit (e.g., years), we cannot
compare their relative importance for the value of Y
c. one way around this is to convert our indep. vars into unit-free variables before we do
a multiple regression.
(1) do this by transforming all our indep. vars. to Z-scores
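The Z-score transformation mentioned above is simple to sketch in code: subtract the mean and divide by the standard deviation, leaving a unit-free variable with mean 0 and standard deviation 1. The data values are made up.

```python
def zscores(v):
    """Convert a variable to unit-free Z-scores: (value - mean) / sd."""
    m = sum(v) / len(v)
    s = (sum((a - m) ** 2 for a in v) / (len(v) - 1)) ** 0.5
    return [(a - m) / s for a in v]

x = [2, 4, 6, 8]   # made-up values, e.g. years of teacher experience
z = zscores(x)     # mean of z is 0; standard deviation of z is 1
```

Regressing Y on Z-scored independent variables makes their slopes directly comparable, since every predictor is now measured in standard deviations.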
d. easier to standardize the b for each independent variable by computing standardized
regression coefficients for each variable
(1) We symbolize these as either β1 or b1*
(2) b* gives the standard-unit change in Y with each standard-unit change in X
(3) The conversion formula is b* = b(sX/sY)
(a) nb: this is identical to the way that we convert b to r: we
multiply b by the ratio (sX/sY)
(4) transforming b to β allows us to directly compare the effects of independent
vars. measured in different units; it tells us how many standard units of Y change
with a change in one standard unit of X.
(a) standardized regression coefficients tell us the relative importance for
a given dep var. of various independent variables.
(b) Thus, βs allow us to compare the effects of apples and oranges on
health.
(c) The larger the b*, the more relatively important the variable
e. In sum, in multiple regression we can transform the regression slope b to βs which are
analogues of r in order to assess the relative strength of the effects of the independent
variables.
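For simplicity, the sketch below checks the conversion b* = b(sX/sY) in the bivariate case, where the standardized slope should equal r exactly. The data values are made up.

```python
def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation."""
    m = mean(v)
    return (sum((a - m) ** 2 for a in v) / (len(v) - 1)) ** 0.5

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

# Made-up data: X in years, Y in test points
x = [1, 2, 3, 4, 5]
y = [10, 14, 15, 19, 22]

b = cov(x, y) / cov(x, x)        # unstandardized slope (points per year)
b_star = b * sd(x) / sd(y)       # standardized slope: b* = b(sX/sY)
r = cov(x, y) / (sd(x) * sd(y))  # in the bivariate case, b* equals r
```

In multiple regression the same conversion applies to each partial slope, giving the βs that let us compare predictors measured in different units.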
2. R2: The “coefficient of determination”
a. For a bivariate association, r measures the strength of the association between X and
Y, and r2 tells us the proportion of variation in Y that X explains.
b. In multivariate regression R (read “multiple R”) is analogous to r, but when we
square R we get the proportion of the variation in Y that all the independent variables in a
regression equation explain. Like r2, R2 is a PRE measure.
c. the theoretical formula for R2 is the same as that for r2: (TSS-SSE)/TSS
d. the more independent variables, the greater their likely explanatory power; to adjust
for this inflation of R2, most computer programs also provide the “adjusted R2,”
known as R-bar-squared
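The formulas in c and d can be sketched numerically. The observed and fitted values below are made up (they are not from a real regression); the point is the arithmetic: R2 = (TSS − SSE)/TSS, and the standard adjustment 1 − (1 − R2)(n − 1)/(n − k − 1) shrinks R2 to penalize extra predictors.

```python
# Made-up observed values and fitted values from a regression with k = 2 predictors
y     = [5.0, 7.0, 9.0, 11.0, 13.0, 12.0]
y_hat = [5.5, 6.5, 9.0, 10.5, 12.5, 13.0]
k = 2
n = len(y)

y_bar = sum(y) / n
tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared errors
r2 = (tss - sse) / tss                                 # proportion of variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)          # adjusted R2 (R-bar-squared)
```

The adjusted value is always at most R2, and the gap widens as k grows relative to n.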
D. The order in which you enter independent variables in the regression equation
1. “stepwise”
2. Why do it?
3. Why do any analysis?