Exploring Causality Using Multiple Regression and Path Analysis

SPSS Mini 6: Exploring Causality Using Multiple Regression and Path Analysis
Introduction:
This final assignment will use the CCHS 2007 dataset and will be the core of your analysis for your
major research project. It consists of a multi-part multiple regression analysis and a causal path
analysis like the one demonstrated in class during Lecture 8 (Multivariate 1).
Your multiple regression analysis will consist of 1) your main dependent variable, 2) one or two main
independent variables and should incorporate 3) one or two control/intervening variables. For this
assignment, do not use more than 5-6 variables since this will make your interpretation much more
difficult! You can add any additional variables later on.
This assignment is more extensive than the previous ones you have done in the lab. The deadline for
SPSS Mini 6 (this assignment) is Nov. 17th, so you will have two weeks (and two lab periods) to
complete it. Please save all of your output in a separate file but print only the output for Part II.
The final SPSS assignment is designed to be part of your major research project. It is an individual
assignment that each of you is required to submit. However, lab partners can work co-operatively and
should provide each other with suggestions and feedback whenever possible. Like the other
assignments, it is worth 5% of your mark. Upon completion of the assignment, you will have
completed the core analytical component of your final project, although in many cases further analysis
in the form of a revised causal model, t-tests, ANOVA, or contingency tables may still be necessary to
make sense of the variables that you have chosen for your project. The critique that you receive upon
submission of the assignment can be used to 'perfect' the SPSS analysis for your project.
In submitting the assignment, attach only the output from Part II and attach a separate page with your
final causal model and accompanying calculations.
Since the CCHS data set is very large and contains over 1100 variables, you may wish to make some
changes to it before you begin. A few suggestions follow.
a)
Eliminate the variables not needed for your project and save only the variables that you will be
using, or may conceivably use later in the analysis. Go to File>Save As...>Variables and
uncheck the unnecessary variables, saving the remaining variables in a new .sav file.
b)
If you are planning to select for certain cases (for instance, if you only want to look at cases from
a certain health region or province), do so by going to Data>Select Cases and saving the result as
the new .sav file.
c)
Once you have your new .sav file, make all your changes (recoding, etc.) to this new file, saving
the changes as you go. Make sure you do not save over the original CCHS file, in case you need
more from it later on!
Part I Preliminary Steps:
1.
In a few sentences, outline your problem for your major research project.
2.
What is your DV?
3.
What is (are) your main (one or two) independent variable(s)?
4.
What other IV's (one or two controls/intervening variables) are you planning to use?
5.
Find the variables of interest in the CCHS data set. Examine the questions and levels of
measurement of your dependent variable and your main independent variables. List this
information below.
6.
Create an index to use as one of the main variables in your analysis. Your DV must be at the
interval-ratio level, and if it is not, create the index for your DV. Otherwise, create an index to
measure one of your main IV's. To do this, find a series of related variables with similar answer
categories and follow the techniques suggested in Babbie Ch. 8 (SPSS Mini 1). If desired, you
can create more than one index for your analysis but if you are using ordinal variables with more
than five categories, you can use them "as is." Briefly describe your index below.
7.
Run frequencies and histograms (Analyze>Descriptive Statistics>Frequencies), asking for a normal
curve and skewness/kurtosis statistics for all interval-ratio variables, summated ordinal indexes,
and ordinal variables. You do not need to print any of this output – just examine it and save the
file for future use if necessary.
8.
Check histograms and statistics for skewness and kurtosis. Comment on this below. If any
variables are extremely skewed, they are not appropriate for a multiple regression analysis and
should be treated differently. If they are only moderately skewed, merely note this below and go
on with the assignment. For your final project, your best choice would be to recode the
problematic variables and use them in a multivariate contingency analysis (similar to what you
did in Mini Assignments 3 and 5) or alternatively, they can be recoded and used as 'dummy'
variables in your multiple regression analysis. Do not use a dummy variable for your DV!
Note: at a more advanced level, you could do algebraic transformations on the offending
variables, but that is beyond the scope of this course ;-) Comment on your histograms below and
see me if in doubt about any of this.
9.
Draw a rough preliminary model of your problem below, showing the inter-relationships between
the variables and using arrows to indicate the causal direction.
10.
Create a bivariate correlation matrix incorporating the variables from above. (At this point, don't
worry if not all your variables are at the interval-ratio level, because you are merely checking to
see if your proposed relationships between your DV and other variables are 'real'.) Go to
Analyze > Correlate > Bivariate for this. List your DV first, followed by all of your IV's.
11.
Examine the correlations of your DV with the independent variables. Are all variables
significantly related to your DV? List the variables, Pearson's r, and p-values (sig. level) below.
Use asterisks to indicate significance (* = p < .05, ** = p < .01, *** = p < .001).
12.
Examine the correlations between your IV's. Do any of the inter-correlations seem very high
(.700 or above) indicating multi-collinearity? List the r and p-values between all IV's below,
indicating which variables may not be appropriate for this analysis.
13.
Drop any variables that are not significantly correlated with your DV unless you have theoretical
reasons for keeping them in the model. If there are indications of multi-collinearity in the IV's,
drop one of the two variables that are highly correlated. Revise your hypothetical causal model.
14.
Run a revised correlation matrix and check for changes.
15.
On a separate page, draw your revised causal model, indicating all relationships you expect to
find. You will use this diagram for the regression analysis and path analysis in Part II. Submit
this page with your assignment.
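If you want to double-check the multi-collinearity screening in steps 12 and 13 outside SPSS, the logic can be sketched in a few lines of Python. This is only an illustration: the variable names and data values below are hypothetical (they are not CCHS variables), and the only number taken from this assignment is the .700 cutoff from step 12. The sketch computes Pearson's r by hand for every pair of IV's and lists the pairs at or above the cutoff. It does not compute p-values, so significance should still be read from the SPSS correlation matrix.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear(ivs, cutoff=0.700):
    """Return (name1, name2, r) for every IV pair with |r| >= cutoff."""
    flagged = []
    for (name1, x1), (name2, x2) in combinations(ivs.items(), 2):
        r = pearson_r(x1, x2)
        if abs(r) >= cutoff:
            flagged.append((name1, name2, round(r, 3)))
    return flagged


# Hypothetical toy data: six cases, three IV's (names are made up).
ivs = {
    "stress_index": [1, 2, 3, 4, 5, 6],
    "work_hours": [2, 4, 6, 8, 10, 13],  # rises almost in step with stress_index
    "age": [30, 25, 41, 22, 38, 29],
}

for name1, name2, r in flag_collinear(ivs):
    print(f"Possible multi-collinearity: {name1} and {name2} (r = {r})")
```

In this toy data only the stress_index and work_hours pair is flagged, so one of those two would be a candidate for dropping in step 13.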
Part II Your Multiple Regression Analysis (Note: print and submit the output for this section):
1.
Create scatterplots for all variables at the interval-ratio or summated ordinal level (i.e., your
index). Check for linearity and homoscedasticity and briefly comment on this below.
2.
Run a multiple regression analysis using the process outlined in Lecture 9 (Multivariate Analysis
2) that incorporates at least three other variables. If any of your IV's are at the nominal level, you
will need to first recode them into binary "dummy" variables to use in the analysis. Note: if there
are more than two categories, you will need to collapse the categories to create a binary variable
(see Babbie). Go to Analyze>Regression>Linear to do the analysis. Under Statistics, check off
Estimates, Model Fit, Descriptives, and Part and Partial Correlations. Under Options, leave the
default listwise exclusion of missing cases selected.
3.
Examine the Model Summary. Interpret R and Adj. R² below.
4.
Examine the ANOVA table. Interpret F and p-value below.
5.
Examine the Coefficients table. Interpret Slopes, significance and Beta values below.
6.
Look at the zero-order and partial correlations. Look for indications of spuriousness and
multi-collinearity. Comment on any large changes below.
7.
Enter the Beta Weights (the path coefficients) from your multiple regression analysis onto your
revised model diagram (the one on the separate page.)
8.
Use regression to calculate the Betas for the other causal relationships. You will need to run a
partial regression analysis for each endogenous IV in your model. Exogenous variables (those
that have no prior causes in your model) can be ignored. Enter beta weights for all paths onto
your model.
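To see where the Beta weights come from, it can help to work a small case by hand. The Python sketch below is an illustration only and is not a substitute for your SPSS output. It assumes a regression with exactly two IV's and uses the standard formulas that express the standardized coefficients in terms of the three zero-order correlations. The function names and the correlation values (.50, .40, .30) are hypothetical.

```python
def betas_two_predictors(r_y1, r_y2, r_12):
    """Standardized coefficients (Betas) for a DV regressed on two IV's.

    r_y1, r_y2: zero-order correlations of the DV with IV 1 and IV 2.
    r_12:       zero-order correlation between the two IV's.
    """
    denom = 1 - r_12 ** 2
    if denom == 0:
        # Perfectly correlated IV's: the Betas cannot be estimated.
        raise ValueError("IV's are perfectly collinear")
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


def r_squared(r_y1, r_y2, r_12):
    """R-squared for the two-IV model: sum of Beta times zero-order r."""
    beta_1, beta_2 = betas_two_predictors(r_y1, r_y2, r_12)
    return beta_1 * r_y1 + beta_2 * r_y2


# Hypothetical correlations, for illustration only.
beta_1, beta_2 = betas_two_predictors(r_y1=0.50, r_y2=0.40, r_12=0.30)
print(f"Beta 1 = {beta_1:.3f}, Beta 2 = {beta_2:.3f}")
print(f"R-squared = {r_squared(0.50, 0.40, 0.30):.3f}")
```

Two points follow from these formulas. When the IV's are correlated with each other, each Beta is smaller than the corresponding zero-order r, which is the kind of change step 6 asks you to look for. And for an endogenous IV with only one prior cause (step 8), the regression has a single predictor, so the Beta is simply Pearson's r between the two variables.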
Part III Path Analysis: Calculating the Causal Effects
1.
Do a complete path analysis according to the guidelines given in class (Lecture 8 Multivariate 1).
Show your calculations below the causal model on the separate page attached to this assignment.
2.
Which variable(s) have the greatest causal effect (direct and indirect) on your DV? How much
unexplained variability (error or 1 - R²) is there in your model? Do you think the model is
correctly specified, over-identified (unnecessary variables) or under-identified (missing crucial
IV's that should have been included)?
3.
How might you revise the model at a later date?
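The arithmetic behind the causal effects can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from Lecture 8: if the class guidelines differ, follow them. It assumes the common convention that an indirect effect is the product of the Betas along a path, and that the total causal effect is the direct effect plus the sum of all indirect effects. The three-variable model, the variable names, the Beta values, and the R² value are all hypothetical.

```python
def total_effect(paths, source, target):
    """Sum, over every directed route from source to target, of the
    product of the path coefficients along that route.

    paths maps (cause, effect) pairs to Beta weights. The model is
    assumed to be recursive (no feedback loops), so this terminates.
    """
    if source == target:
        return 1.0
    return sum(
        beta * total_effect(paths, effect, target)
        for (cause, effect), beta in paths.items()
        if cause == source
    )


# Hypothetical model: income -> stress -> health, plus income -> health.
paths = {
    ("income", "stress"): 0.30,
    ("income", "health"): 0.42,
    ("stress", "health"): 0.27,
}

direct = paths.get(("income", "health"), 0.0)
total = total_effect(paths, "income", "health")
indirect = total - direct  # here: 0.30 * 0.27, the route through stress

print(f"direct = {direct:.3f}, indirect = {indirect:.3f}, total = {total:.3f}")

# Unexplained variability in the DV, using a hypothetical R-squared.
r_squared = 0.32
print(f"unexplained = {1 - r_squared:.2f}")
```

Comparing the total effects of each IV computed this way is one way to answer question 2 about which variable has the greatest causal effect on your DV.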