SPS 580 Lecture 8 Dummy vars 2.0 Spurious effects Causal models

I. CONSTRUCTED VARIABLES 1.0 -- BRANCH QUESTIONS

LSTRAIL (BRANCH question). ASKED: 1992, 1993, 1996
In the past year, has any adult in your household gone to a trail for walking, hiking or bicycling?

  Code  Label          N
  1     Yes        5,334
  2     No         3,795
  8     DK            26
        Total      9,155

LSTRAILX [ASK: IF YES]. ASKED: 1993, 1996
How many times in the past year did any adult from your household use a trail for walking, hiking or bicycling?

  Code  Label               N
  1     Once              165
  2     2-6 Times       1,236
  3     7-11 Times        398
  4     About 1/month     243
  5     13-40 Times       642
  6     Almost 1/wk       169
  7     About 1/wk        118
  8     53+ Times         467
  98    DK                 69
        Total           3,507

  o Data set doesn't include 0 times
  o Codes are not interval

TRAILuseSCALE -- part way done, then interval

  Part-way code   Interval code   Label               N
  0               0               zero            3,795
  1               1               Once              165
  2               4               2-6 Times       1,236
  3               9               7-11 Times        398
  4               12              About 1/month     243
  5               20              13-40 Times       642
  6               20              Almost 1/wk       169
  7               20              About 1/wk        118
  8               20              53+ Times         467
                                  DK (98, set to M)  69
                                  Total           7,303

  o create a new variable with 0 included
  o recode to make it interval
  o collapse extreme values to avoid outlier issues
  o DK is missing, not zero

[Figure: Number of Trail Uses / year. Share of cases at scale values 0, 1, 4, 9, 12, 20: 52%, 2%, 17%, 6%, 3%, 19%. Mean = 5.46]
[Figure: Average Trail Use by income quarter. Lowest quarter $ 3.1, 2nd Qtr $ 5.0, 3rd Qtr $ 6.9, Top Qtr $ 7.5. Slope = 1.51]

II. DUMMY VARIABLES 2.0 -- All 2+ nominal, ordinal variables

Marital Status (policy-relevant nominal variable)

  Code  Label                    N      %
  1     Single               8,371    23%
  2     Other adults in HH   6,284    17%
  3     Partnership          1,491     4%
  4     Married             20,189    56%
        Total               36,335   100%

  o Added to data set, in syntax file
  o LOOK AT INSTRUCTIONS to resolve inconsistencies; SPSS executes instructions in order

Marital status -> Household Income
[Figure: Average HH income percentile]
REGRESSION ANALYSIS . . .
Rules for choosing a reference category
  o Large case base
  o Intuitive comparison

[Figure values, average HH income percentile by marital status: Single 33.2, Other adults in HH 43.9, Partnership 54.7, Married 59.3]

  Dummy Var (Ref = Married)    Slope        T
  Single                       -26.1    -72.1
  Other adults in HH           -15.3    -38.0
  Partnership                   -4.6     -6.3

  o Married is the reference category; (K-1) dummies

MULTIPLE REGRESSION RESULTS (three-way regression)

                                    Zero order   Partial Slope   Explained by Education
  Marital Status (Ref = Married)
    Single                            -26.1        -24.6 **            6%
    Other adults in HH                -15.3        -12.8 **           16%
    Partnership                        -4.6         -3.9 **           16%
  Education (Ref = 0-11)
    HSG                                             14.2 **
    Some college                                    22.1 **
    College grad+                                   34.5 **
  ** p < .05

  o Explained by Education = amount of zero order effect due to education
  o The Education rows are the dummy variable slopes for education
  o ** p < .05 is how to report significance tests

Use these procedures for more complicated typologies, e.g.
  o Single Men
  o Men married to an employed woman
  o Men married to a non-employed woman

III. CAUSAL MODELING -- A logical system for diagramming causal order and explaining causal impact

A. ZERO ORDER
Start with . . . THEORY 1: Higher income people are more likely to be in excellent health . . .
  X1 = Income -> Y = Health Status (Self-report Exc v. Good/Fair/Poor)
  Proportion “Excellent” = .229 + .004*(Income percentile)
  T(slope) = 28
  [Figure: Proportion "Excellent" Health (axis .00 to .70) by Income Percentile (5, 18, 29, 43, 58, 71, 83, 95)]

B. Two variable system graph
  X1 --B1--> Y
  o Two variable system: squares are variables, arrows are paths, B’s are slopes
  o B1 is estimated with the zero order, bivariate regression of Y on X1

C. INTERVENING . . . THEORY 1a: The income difference is due to healthier life styles (walking, hiking, bicycling)
  Income -> Outdoor Activity -> Health

DATA MINING CHECK . . . X2 -> Y: Significantly related, not curvilinear
  Hiking -> health: B = .009, T = 8
  [Figure: Proportion "Excellent" Health (axis .00 to .60) by Frequency of outdoor activity (0, 1, 4, 9, 12, 20)]

DATA MINING CHECK . . .
X1 -> X2: Significantly related, not curvilinear
  Outdoor activity score = 2.59 + .0613 * (Income pctile)
  T(slope) = 19
  [Figure: Frequency of outdoor activity (axis 0.0 to 10.0) by Income Percentile (5, 18, 29, 43, 58, 71, 83, 95)]

D. Three variable system graph, X2 intervening
  X1 --B1--> Y
  X1 --B2--> X2
  X2 --B3--> Y
  o B1, B3 are estimated with the stepwise multiple regression, X1 first, then X2
  o Analyze % of Zero order explained by X2
  o B2 is the zero order, bivariate regression of X2 on X1

MULTIPLE REGRESSION RESULTS

                       Zero order   Partial Slope   Explained by Outdoor Activity
  Income Percentile      0.0037       0.0033 **              10%
  Outdoor Activity                    0.0058 **
  ** p < .05

10% of the income effect is explained by life style choices.

X2 is called an intervening variable because it is caused by X1, and goes on to cause Y.

The impact of controlling the intervening variable shows how much of the causal impact of X1 (income) on Y (health) goes down this causal pathway (lifestyle choice): 10%.
  o At present we derive the causal impact by subtraction (ZERO - PARTIAL)
  o But in a three-variable system you can also estimate it directly by multiplying B2*B3
  o ZERO ORDER B1 = PARTIAL B1 + B2 * B3
                .0037 = .0033 + .0613 * .0058

The graph tells you the regression equation to run and how to report the results.

E. CAUSALLY PRIOR
THEORY 1b: The income difference in health status is explained by the fact that higher income people are more likely to have had a higher education, and higher educated people know more about how to live a healthy life style.

1. When X2 is causally prior, the explanation is due to the fact that both X1 and Y depend on some third factor.
  X1 --B1--> Y
  X2 --B2--> X1
  X2 --B3--> Y
  o B1, B3 are estimated with the stepwise multiple regression, X1 first, then X2
  o Analyze % of Zero order explained by X2, using stepwise output
  o B2 is the zero order, bivariate regression of X1 on X2

2.
When X2 is causally prior, the amount of explanation of the ZERO ORDER relationship is referred to as a SPURIOUS COMPONENT.

MULTIPLE REGRESSION RESULTS

                            Zero order   Partial Slope   Spurious result of Education
  Income Percentile           0.0037       0.0025 **              34%
  Education (Ref = 0-11)
    HSG                                    0.08 **
    Some college                           0.15 **
    College grad+                          0.24 **
  ** p < .05

  o 34%: lots of explanation
  o K-1 dummies comprise X2. If any are significant, then you leave them all in the equation and the report.

It is a SPURIOUS COMPONENT because it is a part of the apparent relationship between income and health that is due to the fact that both depend on a third variable -- education.

You find the spurious component by controlling for the causally prior variable (education) and looking at the remaining (PARTIAL) relationship between income and health.

You have to get the spurious component by subtraction; you can’t multiply B1*B2, because that now means something else.

IV. DETERMINING CAUSAL ORDER AMONG VARIABLES

If you don’t know the causal order between X1 and Y, there is very little research you can do.
  [Diagram: X1 and Y with unknown causal order, depicted as a 2-headed arrow or as 2 arrows]
  o Can’t use slopes, since they are measures of 1-way causal impact
  o Can do correlational analysis (Chi square): there is a difference, but we don’t know which way it goes

There is no statistical test for determining causal order; it has to come from your knowledge of how the world works. It is arbitrary. It is approximate.

A. Rules for assigning causal order X -> Y
  a. X happened first, earlier in life … adolescent experiences -> adult experiences
  b. Y starts after X stops … education -> earnings
  c. Change in X precedes change in Y … divorced -> happiness
  d. X never changes, Y sometimes changes … gender -> employment status
  e. X doesn’t change much, Y changes more often … income -> TV usage, opinions

B.
Specifying the causal order for control variables determines whether you are
  o elaborating the reasons for how a causal chain works (Income -> health)
  o or showing that the causal impact is not as great as people think, because some of the apparent ZERO ORDER relationship is spurious

V. A FULLY SPECIFIED SYSTEM . . . Prior and Intervening

THEORY 1c: The income difference in health status is explained partly by healthier life style and partly by the fact that higher income people are more likely to have had a higher education, and higher educated people know more about how to live a healthy life style.

A. Four variable system, X2 prior, X3 intervening
  [Diagram: path graph of X1, X2 (prior), X3 (intervening) and Y, with paths labeled B1 through B6]
  o Zero order, Direct, Indirect and Spurious effects to be measured

B. Calculate a three step regression model, listwise deletion . . .

  Model                                                      B        t      p
  1  X1 ZERO ORDER
     (Constant)                                            .1930    16.3   .000
     incomeINTERVAL  Percentile of HH income               .0042    20.3   .000
  2  X1 + prior vars (X2)
     (Constant)                                            .1168     7.1   .000
     incomeINTERVAL  Percentile of HH income               .0028    12.0   .000
     educHSG  dummy var HSG vs 0-11                        .0996     5.1   .000
     educANYCOLL  dummy any coll vs 0-11                   .1380     6.9   .000
     educCOLLGRAD  dummy coll grad vs 0-11                 .2612    12.6   .000
  3  X1 + priors (X2) + intervening vars (X3)
     (Constant)                                            .1099     6.7   .000
     incomeINTERVAL  Percentile of HH income               .0026    11.2   .000
     educHSG  dummy var HSG vs 0-11                        .0958     4.9   .000
     educANYCOLL  dummy any coll vs 0-11                   .1271     6.3   .000
     educCOLLGRAD  dummy coll grad vs 0-11                 .2436    11.6   .000
     TRAILuseSCALE                                         .0045     5.6   .000

C.
Analyze the full regression equation from model 3

  Proportion “Excellent Health” = .11 + .0026*Income Percentile + .10*High School Grad + .13*Any College + .24*College Grad + .005*Lifestyle Choice

REGRESSION RESULTS

                              Slope       T
  Income Percentile          0.0026    11.2
  Education (Ref = 0-11)
    HSG                      0.0958     4.9
    Some college             0.1271     6.3
    College grad+            0.2436    11.6
  Outdoor Activity           0.0045     5.6

  o Summarize the slopes: comment on significance, pattern of dummies, etc.

D. Analyze the causal connections in the system

Key Accounting equation . . .
  Zero Order = Direct Effect + Sum of all spurious components + Sum of all indirect effects

CAUSAL ANALYSIS

                             Slope    % of Zero Order   How obtained
  Zero Order Effect         0.0042                      1. X1 slope from model 1 (zero order)
  Causal Effect
    Direct                  0.0026         62%          3. X1 slope from model 3 (all vars)
    Indirect                0.0002          4%          4. Zero order - spurious - direct
    Total Causal Effect     0.0028         66%          5. Direct + Indirect
  Spurious Effect           0.0014         34%          2. X1 slope from model 1 minus X1 slope from model 2 (prior vars)

VI. CAVEATS, OBSERVATIONS

A. Direct effect = amount not explained so far. What is the meaning of causality? Unmeasured process variables (OK); unmeasured prior variables (not OK).

B. The slope summary allows you to say what is significant and what is not; it does not allow you to say which variable is more important, because the units are different. To compare magnitudes of slopes, you need a standardized unit. We will cover this next week.

C. Income slope is changing slightly between examples due to listwise deletion.

Assignment 8:
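Worked check of the key accounting equation in section V.D. The sketch below is only an illustration in Python, not part of the SPSS workflow used in the lecture. The helper name decompose_effects and its argument names are hypothetical; the three inputs are the income slopes as printed (rounded) in the three step regression table.

```python
# Illustrative sketch only: decompose_effects is a hypothetical helper,
# not an SPSS procedure and not part of the lecture materials.

def decompose_effects(zero_order, with_priors, with_all):
    """Split the zero order X1 slope using the key accounting equation.

    zero_order  : X1 slope from model 1 (X1 alone)
    with_priors : X1 slope from model 2 (X1 + prior vars)
    with_all    : X1 slope from model 3 (X1 + priors + intervening vars)
    """
    spurious = zero_order - with_priors          # step 2: model 1 minus model 2
    direct = with_all                            # step 3: model 3 slope
    indirect = zero_order - spurious - direct    # step 4: what is left over
    total_causal = direct + indirect             # step 5
    return {
        "zero_order": zero_order,
        "direct": direct,
        "indirect": indirect,
        "total_causal": total_causal,
        "spurious": spurious,
    }


# Income slopes as printed (rounded) in the three step regression table.
effects = decompose_effects(0.0042, 0.0028, 0.0026)

for name, value in effects.items():
    share = value / effects["zero_order"]
    print(f"{name:<13} {value:.4f}  {share:.0%} of zero order")
```

Because the inputs are rounded to four decimals, the printed shares can differ by about a point from the percentages in the CAUSAL ANALYSIS table, which were presumably computed from unrounded slopes.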