SPS 580 Lecture 5 eta Bivariate regression F Rsq notes

I. MORE STRATEGIES ABOUT MAKING A SCALE TO TEST A THEORY

Idea . . . Higher-income households are less pessimistic about economic change in their neighborhood
Theory . . . Income -> Attitude on Neighborhood Economic Future

THREE VARIABLES FOR NEIGHBORHOOD PESSIMISM SCALE

  nbrchg     Neighborhood Changes In Past Five Years
  nbrfutr    In 5 Yrs Neighborhood Will Be
  nbinvest   Would Invest In Current Neighborhood

  ORIGINAL CODE   nbrchg           nbrfutr          nbinvest                    RECODE
  1               Better  2,372    Better  2,748    Good Investment   7,125     = 0
  2               Worse   1,727    Worse   1,541    Better Elsewhere  1,775     = 1
  3               Same    4,886    Same    4,545                                = 0
  8 DK                      149              300                        239     = 0
  7 Ref                       3                2                         15     = 9 MIS
  9 NA                                                                          = 9 MIS

Neighborhood pessimism scale: an OK scale

  Score      0     1     2     3    total
  Percent   67%   18%   10%    5%   100%

  Skewed distribution (not good)
  Scores 1 to 3 are the tail (= skew)

Nice scale (scores 0 to 3; bar heights 19%, 25%, 26%, 30%, listed by size, not by score)

  Not skewed at all . . . No tail
  Mean = 1.54
  Lots of variance
  Maximum chance for finding significant differences

Horrible scale

  Score      0     1     2     3
  Percent   91%    6%    2%    1%

  Skewed: nothing but tail, in reality at best a dichotomy
  Mean = 0.13 (i.e. almost no variance)
  Almost everybody already has the same score
  No differences will be significant

RULE OF THUMB: If you have a choice, code component variables to maximize scale variance
  E.g., Sum of (“Very satisfied” = 1) vs. Sum of (“Very satisfied + somewhat satisfied” = 1)
  Unless you have a really good reason not to (e.g., policy importance of the “rare” cases)

II. INTRO TO BIVARIATE REGRESSION

Theory . . . Income -> Attitude on Neighborhood Economic Future

A. CODE YOUR VARIABLES     X (0,1)   Y (int)

INCOME RECODED TO (0,1) DICHOTOMY

  variable labels incomeadj 'HH income percentile in year survey was conducted'.
  value labels incomeadj 0 '<10%' 1 '10-25%' 2 '25-33%' 3 '33-50%' 4 '50-66%' 5 '66-75%' 6 '75-90%' 7 '90%+'.
  missing values incomeadj (99).
  RECODE incomeadj (0 thru 3=0) (4 thru 9=1) (ELSE=9) INTO income50pct.
  VARIABLE LABELS income50pct 'above or below median'.
  value labels income50pct 0 'below median' 1 'above median'.
  missing values income50pct (9).

ALWAYS CODE DICHOTOMIES (0,1), whether they are X (indep) or Y (dep)
THERE ARE NO GOOD REASONS NOT TO DO THIS

B. DESCRIBE DEPENDENT VARIABLE

  Neighborhood pessimism scale
    Low         0
    High        3
    Mean        0.52
    Variance    0.7280

  PQ table explains the dependent variable
  Mean/range = Index of skew

C. PERCENTAGE TABLE

NOT USUALLY DONE, BUT CRUCIAL TO DO ONCE IN ORDER TO UNDERSTAND THIS STUFF

  Conditional distributions of %s

                       Score on Neighborhood Pessimism Scale
  Household Income       0     1     2     3    total
  0 Below median        58%   23%   13%    6%   100%
  1 Above median        75%   14%    7%    3%   100%

  Chi Square (3) = 198    p < .05    Phi = .188
  Is there a (sig) difference? Where’s the action?

D. TABLE OF MEANS

  Conditional means, variances: Score on Neighborhood Pessimism Scale

  Household Income     Mean    Variance    SEM       N
  0 Below median       .681    .8551       .01788    2,675
  1 Above median       .386    .5834       .01418    2,901
  Total                .528    .7354       .01148    5,576

  Difference = .386 - .681 = -.295
  SE(Diff) = .023
  T (like chi sq) = -12.9    p < .05
  Eta (like phi) = .172
  Is it sig? Is it large, medium or small?

E. GRAPH THE CONDITIONAL MEANS

  (Bar graph: Score on Neighborhood Pessimism Scale; .68 for 0 Below median income, .39 for 1 Above median income; vertical axis .00 to .80)

  Total Variance               Var Y = .7354
  Conditional Variances . . .  Var Y | X=0 = .8551
                               Var Y | X=1 = .5834
  Avg Conditional Variance     = .7138

  Eta Squared = 1 – (Avg Conditional Variance / Total Variance) = .03
  Measures how much variance in Y is “explained” by X ( = 3%)
  Eta = SQRT(Eta^2) . . . like phi, a coefficient that indexes magnitude of correlation

III. AND NOW, HERE’S JOHNNY . . . REGRESSION

A.
WHAT IT’S ABOUT

Regression is based on the idea of an equation to predict the average score of Y as a function of X

  Predicted average (Y) = a + B(x)      how the math works
  Y = a + B(x)                          usually expressed this way

  a = the intercept, the predicted average on Y when X = 0
  B = the slope = how much the predicted average on Y changes when X goes up by 1

So in our example
  a = .681 = predicted average on Y for below median income (when X = 0)
  B = -.295 = how much the predicted average on Y changes when X goes up by 1

REGRESSION EQUATION . . . Y = .681 - .295(x)

Solve for predicted values of Y . . .
  below median $ . . . Y = .681 - .295*0 = .681
  above median $ . . . Y = .681 - .295*1 = .386

B. GET THE COMPUTER TO DO IT

  ANALYZE > REGRESSION > LINEAR
    Dependent      nbhdscale
    Independent    income50pct
    OPTIONS        exclude cases pairwise
    STATISTICS     Descriptives
    CONTINUE, OK

  Coefficients
                                        Unstandardized Coefficients    Standardized Coefficients
  Model 1                               B        Std. Error            Beta                         t          Sig.
  (Constant)                            .669     .016                                               42.280     .000
  income50pct above or below median     -.293    .023                  -.172                        -13.030    .000

  The output for the regression equation is in the section on Coefficients
  Intercept = a = .669
  Slope = B = -.293
  Y = .669 - .293(x)
  T-test for the slope = -13    p < .05

  Model Summary
  R       R Square    Adjusted R Square    Std. Error of the Estimate
  .172    .030        .029                 .84057

  The output for explained variance is in the Model Summary
  R^2 = explained variance = .029 (.030 unadjusted)
  It measures the “goodness of fit” of the linear regression model to the observed data
  It is the same as ETA^2 when X is a dichotomy; in general that will not be the case

C. SUMMARIZE THE RESULTS

                          Impact on Neighborhood Pessimism
  Predictor (coding)      Slope    T-test     significant?    R Sq
  Income (0,1)            -.293    -13.030    yes             .029

IV. MAKE X AN INTERVAL VARIABLE     X (int 0,3)   Y (int)

ALWAYS CODE INTERVAL VARS STARTING with (0)

A.
Look over the pattern

                       Score on Neighborhood Pessimism Scale
  Household Income       0     1     2     3    total
  0 Lowest qtr          52%   25%   15%    7%   100%
  1 Second qtr          64%   20%   12%    5%   100%
  2 Third qtr           70%   17%    8%    4%   100%
  3 Top qtr             80%   11%    6%    3%   100%

  Whenever feasible, look at the XTAB to see what’s going on
  ChiSq(9) = 270    Phi = .22    p < .05
  Is it sig, where’s the action?

B. Table of Means

                              Neighborhood Pessimism Score
  0 Lowest Income Quarter     0.78
  1 Second qtr                0.57
  2 Third qtr                 0.46
  3 Top qtr                   0.31
  Total                       0.53

  3+ means . . . can’t do a simple T-test
  ETA^2 = .040 (4% explained variance)
  ETA = .200

C. Run the Regression

  Coefficients(a)
                              Unstandardized Coefficients    Standardized Coefficients
  Model 1                     B        Std. Error            Beta                         t          Sig.
  (Constant)                  .752     .019                                               40.087     .000
  incomeQUARTER quarter       -.152    .010                  -.199                        -15.150    .000
  a. Dependent Variable: nbhdscale

D. Write the equation, solve for Y as a function of X

  Y = .752 - .152(x)

  x =             0       1       2       3
  Predicted Y     0.75    0.60    0.45    0.30

  linear pattern of predicted means (LINEAR Regression)

E. Plot the Observed and Predicted Means

  (Graph: Impact of Household Income on Neighborhood Pessimism Score; Observed and Predicted means for 0 Lowest Income Quarter, 1 Second qtr, 2 Third qtr, 3 Top qtr; vertical axis 0.00 to 0.80)

F. SUMMARIZE THE RESULTS

                            Impact on Neighborhood Pessimism
  Predictor (coding)        Slope    T-test    significant?    R Sq    Eta Sq
  Income Quarter (0,3)      -.152    -15.1     yes             .039    .040

  Difference between R^2 and Eta^2 shows the difference between a LINEAR prediction model and a CURVILINEAR prediction model
  Papers had several curvilinear patterns
    Age HH electronics, financial service use, social taxes, safety of parks, honesty of charities
  Topic of curves is for the future

V. MAKE Y A DICHOTOMOUS VARIABLE

A.
Advantageous choice of Y recode
to highlight the data pattern (causal relation) and to simplify the analysis and presentation

                       Score on Neighborhood Pessimism Scale
  Household Income       0     1     2     3    total
  0 Lowest qtr          52%   25%   15%    7%   100%
  1 Second qtr          64%   20%   12%    5%   100%
  2 Third qtr           70%   17%    8%    4%   100%
  3 Top qtr             80%   11%    6%    3%   100%

  Recode Y . . . three possible cut points
    (0 vs. 1+)
    (0-1 vs. 2+)
    (0-2 vs. 3)

  CONSIDERATIONS ON THE OPTIMAL RECODE
    Highlight the data pattern
    Maximize variance
    Focus on policy-relevant group (if there is one)

B. Look over the pattern, Table of Means

  When Y is (0,1) the proportion and the mean are the same thing, so you only need one table

                              Percent pessimistic
  0 Lowest Income Quarter     48%
  1 Second qtr                36%
  2 Third qtr                 30%
  3 Top qtr                   20%
  Total                       33%

  Chi Sq(3) = 264    p < .05    Phi = .217    Eta = .217    Eta^2 = .047

C. Graph the data

  (Bar graph: Percent pessimistic; 48% for 0 Lowest Income Quarter, 36% for 1 Second qtr, 30% for 2 Third qtr, 20% for 3 Top qtr; vertical axis 0% to 60%)

D. Perform the Regression

  Coefficients(a)
                              Unstandardized Coefficients    Standardized Coefficients
  Model 1                     B        Std. Error            Beta                         t          Sig.
  (Constant)                  .468     .010                                               45.413     .000
  incomeQUARTER quarter       -.091    .005                  -.217                        -16.557    .000

E. Write the equation, solve for Y as a function of X

  Y = .468 - .091(x)

  x =             0       1       2       3
  Predicted Y     0.47    0.38    0.29    0.19

  linear pattern of Exp(Y) = a + B(x)

F. Compare Observed Y with Predicted Y

  Examination of Outliers, Residuals

  Income Qtr    Observed Y    Predicted Y    Residual (Difference)
  x = 0         0.48          0.47            0.01
  x = 1         0.36          0.38           -0.01
  x = 2         0.30          0.29            0.01
  x = 3         0.20          0.19            0.00

  Residual means Observed minus Predicted
  The pattern of residuals tells you if there is curvilinearity (along with inspection of the graph, and (ETA^2 – R^2))

G. SUMMARIZE THE RESULTS

                            Impact on Neighborhood Pessimism
  Predictor (coding)        Slope    T-test    significant?    R Sq    Eta Sq
  Income Quarter (0,3)      -.091    -16.6     yes             .047    .047

ASSIGNMENT 5:

A.
Develop a theory of interest relating two interval-level measures X and Y
  1. Create an interval dependent variable (Y) of interest. Recode as necessary to deal with outliers, skewness. Show the PQ percentage BAR GRAPH for the scale.
  2. Create an interval independent variable (X) of interest. Recode as necessary to deal with outliers, skewness. Show the PQ percentage BAR GRAPH for the scale.

B. Analyze the data according to three different data analysis situations
  1. X and Y both interval
  2. X dichotomous, Y interval
  3. X dichotomous, Y dichotomous

C. For each data analysis situation:
  1. Show the pattern, show the Table of Means
  2. Graph the data
  3. Perform the regression – show the result
  4. Write the equation, solve for Y as a function of X – show illustrative results
  5. Compare Observed Y with Predicted Y – comment on curvilinearity, outliers
  6. Summarize the results in a table
  7. State your conclusion

D. In each section, use enough English so I can follow what you are doing without having to memorize computer words.
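
The hand calculations in the notes (eta squared from the conditional variances, the intercept and slope read off the two group means, and the predicted means from Y = .752 - .152(x)) can be checked with a short script. This is only a sketch in Python, not part of the SPSS workflow used in class. It assumes that the "Avg Conditional Variance" is the N-weighted average of the two group variances, which is the reading that comes closest to the .7138 shown in the notes. The function and variable names are made up for illustration.

```python
import math

# Group statistics copied from the Table of Means
# (X = income50pct, Y = neighborhood pessimism scale).
groups = {
    0: {"mean": 0.681, "var": 0.8551, "n": 2675},  # below median income
    1: {"mean": 0.386, "var": 0.5834, "n": 2901},  # above median income
}
total_var = 0.7354  # variance of Y for all cases together


def eta_squared(groups, total_var):
    """Return 1 - (average conditional variance / total variance).

    Assumption: the average is weighted by each group's N.
    """
    n_total = sum(g["n"] for g in groups.values())
    avg_cond_var = sum(g["n"] * g["var"] for g in groups.values()) / n_total
    return 1 - avg_cond_var / total_var


def predict(a, b, x):
    """Predicted average on Y for a given X, using Y = a + B(x)."""
    return a + b * x


eta2 = eta_squared(groups, total_var)
eta = math.sqrt(eta2)

# With a (0,1) predictor, the intercept is the mean of the X = 0 group
# and the slope is the difference between the two group means.
a = groups[0]["mean"]
b = groups[1]["mean"] - groups[0]["mean"]

# Predicted means for income quarters 0..3 from the interval-X regression.
quarter_predictions = [round(predict(0.752, -0.152, x), 2) for x in range(4)]

print("eta squared:", round(eta2, 3))
print("eta:", round(eta, 3))
print("intercept a:", a, "slope B:", round(b, 3))
print("predicted means by quarter:", quarter_predictions)
```

The hand-computed intercept (.681) differs slightly from the SPSS constant (.669) reported in the Coefficients table; the script reproduces only the hand calculation from the Table of Means.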