Exploring Causality Using Multiple Regression and Path Analysis

SPSS Mini 6: Exploring Causality Using Multiple Regression and Path Analysis

Introduction: This final assignment uses the CCHS 2007 dataset and will be the core of the analysis for your major research project. It consists of a multi-part multiple regression analysis and a causal path analysis like the one demonstrated in class during Lecture 8 (Multivariate 1). Your multiple regression analysis will consist of 1) your main dependent variable, 2) one or two main independent variables, and 3) one or two control/intervening variables. For this assignment, do not use more than 5-6 variables, since this will make your interpretation much more difficult! You can add any additional variables later on.

This assignment is more extensive than the previous ones you have done in the lab. The deadline for SPSS Mini 6 (this assignment) is Nov. 17th, so you will have two weeks (and two lab periods) to complete it. Please save all of your output in a separate file, but print only the output for Part II.

The final SPSS assignment is designed to be part of your major research project. It is an individual assignment that each of you is required to submit. However, lab partners can work co-operatively and should provide each other with suggestions and feedback whenever possible. Like the other assignments, it is worth 5% of your mark.

Upon completion of the assignment, you will have completed the core analytical component of your final project, although in many cases further analysis in the form of a revised causal model, t-tests, ANOVA, or contingency tables may still be necessary to make sense of the variables that you have chosen for your project. The critique that you receive upon submission of the assignment can be used to 'perfect' the SPSS analysis for your project. When submitting the assignment, attach only the output from Part II, plus a separate page with your final causal model and accompanying calculations.

Since the CCHS data set is very large and contains over 1100 variables, you may wish to make some changes to it before you begin. Following are a few suggestions:

a) Eliminate the variables not needed for your project and save only the variables that you will be using, or may conceivably use later in the analysis. Go to File > Save As... > Variables and uncheck the unnecessary variables, saving the remaining variables in a new .sav file.
b) If you are planning to select for certain cases (for instance, if you only want to look at cases from a certain health region or province), do so by going to Data > Select Cases and saving the result as the new .sav file.
c) Once you have your new .sav file, make all your changes (recoding, etc.) to this new file, saving the changes as you go. Make sure you do not save over the original CCHS file, in case you need more from it later on!

Part I Preliminary Steps:

1. In a few sentences, outline the problem for your major research project.
2. What is your DV?
3. What is (are) your main (one or two) independent variable(s)?
4. What other IVs (one or two control/intervening variables) are you planning to use?
5. Find the variables of interest in the CCHS data set. Examine the questions and levels of measurement of your dependent variable and your main independent variables. List this information below.
6. Create an index to use as one of the main variables in your analysis. Your DV must be at the interval-ratio level; if it is not, create the index for your DV. Otherwise, create an index to measure one of your main IVs. To do this, find a series of related variables with similar answer categories and follow the techniques suggested in Babbie Ch. 8 (SPSS Mini 1). If desired, you can create more than one index for your analysis, but if you are using ordinal variables with more than five categories, you can use them "as is." Briefly describe your index below.
7. Run frequencies and histograms (Analyze > Descriptive Statistics > Frequencies), asking for a normal curve and skewness/kurtosis statistics for all interval-ratio variables, summated ordinal indexes, and ordinal variables. You do not need to print any of this output. Just examine it and save the file for future use if necessary.
8. Check the histograms and statistics for skewness and kurtosis. If any variables are extremely skewed, they are not appropriate for a multiple regression analysis and should be treated differently. If they are only moderately skewed, merely note this below and go on with the assignment. For your final project, your best choice would be to recode the problematic variables and use them in a multivariate contingency analysis (similar to what you did in Mini Assignments 3 and 5); alternatively, they can be recoded and used as 'dummy' variables in your multiple regression analysis. Do not use a dummy variable for your DV! Note: at a more advanced level, you could do algebraic transformations on the offending variables, but that is beyond the scope of this course ;-) Comment on your histograms below and see me if in doubt about any of this.
9. Draw a rough preliminary model of your problem below, showing the inter-relationships between the variables and using arrows to indicate the causal direction.
10. Create a bivariate correlation matrix incorporating the variables from above. (At this point, don't worry if not all your variables are at the interval-ratio level, because you are merely checking to see if your proposed relationships between your DV and other variables are 'real'.) Go to Analyze > Correlate > Bivariate for this. List your DV first, followed by all of your IVs.
11. Examine the correlations of your DV with the independent variables. Are all variables significantly related to your DV? List the variables, Pearson's r, and p-values (sig. level) below. Use asterisks to indicate significance (* = p < .05, ** = p < .01, *** = p < .001).
12. Examine the correlations between your IVs. Do any of the inter-correlations seem very high (.700 or above), indicating multicollinearity? List the r and p-values between all IVs below, indicating which variables may not be appropriate for this analysis.
13. Drop any variables that are not significantly correlated with your DV unless you have theoretical reasons for keeping them in the model. If there are indications of multicollinearity in the IVs, drop one of the two variables that are highly correlated. Revise your hypothetical causal model.
14. Run a revised correlation matrix and check for changes.
15. On a separate page, draw your revised causal model, indicating all relationships you expect to find. You will use this diagram for the regression analysis and path analysis in Part II. Submit this page with your assignment.

Part II Your Multiple Regression Analysis (Note: print and submit the output for this section):

1. Create scatterplots for all variables at the interval-ratio or summated ordinal level (i.e., your index). Check for linearity and homoscedasticity and briefly comment on this below.
2. Run a multiple regression analysis, using the process outlined in Lecture 9 (Multivariate Analysis 2), that incorporates at least three other variables. If any of your IVs are at the nominal level, you will need to first recode them into binary "dummy" variables to use in the analysis. Note: if there are more than two categories, you will need to collapse the categories to create a binary variable (see Babbie). Go to Analyze > Regression > Linear to do the analysis. Under Statistics, select Estimates, Model Fit, Descriptives, and Part and Partial Correlations. Under Options, leave the default, Exclude cases listwise, selected.
3. Examine the Model Summary. Interpret R and Adjusted R² below.
4. Examine the ANOVA table. Interpret F and the p-value below.
5. Examine the Coefficients table. Interpret the slopes, significance, and Beta values below.
6. Look at the zero-order and partial correlations. Look for indications of spuriousness and multicollinearity. Comment on any large changes below.
7. Enter the Beta weights (the path coefficients) from your multiple regression analysis onto your revised model diagram (the one on the separate page).
8. Use regression to calculate the Betas for the other causal relationships. You will need to run a partial regression analysis for each endogenous IV in your model. Exogenous variables (those that have no prior causes in your model) can be ignored. Enter Beta weights for all paths onto your model.

Part III Path Analysis: Calculating the Causal Effects

1. Do a complete path analysis according to the guidelines given in class (Lecture 8, Multivariate 1). Show your calculations below the causal model on the separate page attached to this assignment.
2. Which variable(s) have the greatest causal effect (direct and indirect) on your DV? How much unexplained variability (error, or 1 - R²) is there in your model? Do you think the model is correctly specified, over-identified (unnecessary variables), or under-identified (missing crucial IVs that should have been included)?
3. How might you revise the model at a later date?
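
The effect calculations asked for in Part III can be sketched outside SPSS as a cross-check. The example below is only an illustration under stated assumptions: it uses Python with NumPy rather than SPSS, the data are simulated, and the variable names (x1 and x2 as exogenous IVs, m as an endogenous intervening variable, y as the DV) and the helper standardized_betas are hypothetical. It follows the usual decomposition (direct effect = Beta on the path into the DV; indirect effect = product of the Betas along the path through the intervening variable; total causal effect = direct + indirect), which may differ in detail from the guidelines given in Lecture 8.

```python
import numpy as np


def standardized_betas(y, X):
    """OLS on z-scored variables: returns (Beta weights, R squared)."""
    zy = (y - y.mean()) / y.std()
    zX = (X - X.mean(axis=0)) / X.std(axis=0)
    betas, *_ = np.linalg.lstsq(zX, zy, rcond=None)
    residuals = zy - zX @ betas
    r_squared = 1.0 - residuals.var() / zy.var()
    return betas, r_squared


# Simulated, purely hypothetical data (NOT CCHS values).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)                            # exogenous IV
x2 = 0.3 * x1 + rng.normal(size=n)                 # exogenous IV, correlated with x1
m = 0.5 * x1 + 0.2 * x2 + rng.normal(size=n)       # endogenous intervening variable
y = 0.4 * x1 + 0.1 * x2 + 0.3 * m + rng.normal(size=n)  # DV

# Main regression: Betas for every path pointing directly at the DV.
(p_y_x1, p_y_x2, p_y_m), r2_y = standardized_betas(y, np.column_stack([x1, x2, m]))

# Regression for the endogenous IV: Betas for the paths pointing at m.
(p_m_x1, p_m_x2), r2_m = standardized_betas(m, np.column_stack([x1, x2]))

# Decompose each causal effect on the DV.
effects = {}
for name, direct, path_to_m in [("x1", p_y_x1, p_m_x1), ("x2", p_y_x2, p_m_x2)]:
    indirect = path_to_m * p_y_m  # product of Betas along x -> m -> y
    effects[name] = {"direct": direct, "indirect": indirect, "total": direct + indirect}
effects["m"] = {"direct": p_y_m, "indirect": 0.0, "total": p_y_m}

unexplained = 1.0 - r2_y           # proportion of DV variance left unexplained
error_path = np.sqrt(unexplained)  # coefficient usually drawn on the error arrow

for name, e in effects.items():
    print(f"{name}: direct={e['direct']:.3f} indirect={e['indirect']:.3f} total={e['total']:.3f}")
print(f"unexplained variance (1 - R^2) = {unexplained:.3f}, error path = {error_path:.3f}")
```

The variable with the largest absolute total effect is the one with the greatest causal effect in this toy model. In a real project, the Betas would come from the SPSS Coefficients tables rather than from simulated data.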
