Selecting a data analysis technique: the first steps

Bivariate Analysis (two variables)
The questions we want to answer are these:
• Are these variables related, or are they
independent of each other?
• Does the variability of one distribution tell us
anything about the variability of the other?
To select the right technique for answering these
questions, we first have to determine THE
LEVEL OF MEASUREMENT OF THE
VARIABLES.
Both Variables are Categoric
If both variables are categoric variables (nominal,
ordinal, or dichotomous), then we examine their
relationship using
• Crosstabs (we make a table)
• Chi-square (test of significance)
• Measures of association
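As a rough sketch of the three steps above, the following builds a crosstab, computes the chi-square statistic by hand, and reports Cramér's V as one possible measure of association. The data are hypothetical and the function names (`crosstab`, `chi_square`) are made up for illustration. Choosing Cramér's V is an assumption, since the slides do not name a specific measure.

```python
from math import sqrt

def crosstab(pairs):
    """Count how often each (row category, column category) pair occurs."""
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    counts = {r: {c: 0 for c in cols} for r in rows}
    for r, c in pairs:
        counts[r][c] += 1
    return rows, cols, counts

def chi_square(rows, cols, counts):
    """Return (chi-square statistic, degrees of freedom, Cramér's V)."""
    n = sum(counts[r][c] for r in rows for c in cols)
    row_tot = {r: sum(counts[r][c] for c in cols) for r in rows}
    col_tot = {c: sum(counts[r][c] for r in rows) for c in cols}
    chi2 = 0.0
    for r in rows:
        for c in cols:
            # Expected count if the two variables were independent.
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (counts[r][c] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    cramers_v = sqrt(chi2 / (n * min(len(rows) - 1, len(cols) - 1)))
    return chi2, df, cramers_v

# Hypothetical survey responses (invented for illustration only).
observations = (
    [("Catholic", "agree")] * 30 + [("Catholic", "disagree")] * 20
    + [("Buddhist", "agree")] * 15 + [("Buddhist", "disagree")] * 35
)
rows, cols, counts = crosstab(observations)
chi2, df, v = chi_square(rows, cols, counts)
print(counts)
print(f"chi-square = {chi2:.2f}, df = {df}, Cramér's V = {v:.2f}")
```

To judge significance, compare the statistic with the critical value for the given degrees of freedom in a chi-square table (3.84 for df = 1 at the .05 level).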
Both Variables are Interval-Ratio
If both variables are interval-ratio variables (and,
for the independent variable, that can also include
dichotomous “dummy” variables), then we:
• Look at the scatterplot. Does it look linear?
• Use linear regression analysis:
  • correlation coefficient (r)
  • ordinary least squares regression coefficient (Is it
    significant?)
  • the coefficient of determination (R²).
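The quantities listed above can be computed directly from sums of squares. The sketch below is for a single independent variable, uses made-up numbers, and uses a hypothetical function name (`simple_regression`). It returns a t statistic for the slope, which would be compared with a t table at n − 2 degrees of freedom. It does not draw the scatterplot.

```python
from math import sqrt

def simple_regression(x, y):
    """Bivariate OLS: slope, intercept, r, R-squared, and t for the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # OLS regression coefficient
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)              # correlation coefficient
    r_squared = r ** 2                     # coefficient of determination
    # t statistic for the slope (undefined if the fit is perfect, r = ±1).
    t_stat = r * sqrt((n - 2) / (1 - r_squared))
    return {"slope": slope, "intercept": intercept, "r": r,
            "r_squared": r_squared, "t": t_stat, "df": n - 2}

# Hypothetical values (invented for illustration only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_regression(x, y)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```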
Independent Variable is Categoric,
Dependent Variable is Interval-Ratio
• ANOVA: Analysis of variance, comparing
variance within groups and variance between
groups
• Special type of “compare means” procedure
(Are the means of the dependent variable
different among the independent variable
categories?)
• F-test of significance
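The comparison of between-group and within-group variance can be sketched as follows. The scores are hypothetical and `one_way_anova` is an illustrative name, not a library function. The resulting F would be checked against an F table using the two degrees-of-freedom values.

```python
def one_way_anova(groups):
    """groups maps each category to a list of dependent-variable values.

    Returns (F, df_between, df_within).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Variation of the group means around the grand mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Variation of the cases around their own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical test scores by residential mobility (invented data).
scores = {
    "a lot": [60, 65, 70],
    "some": [70, 75, 80],
    "little": [80, 85, 90],
}
f_ratio, df_between, df_within = one_way_anova(scores)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```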
Interval-Ratio Independent Variables,
Dichotomous Dependent Variable
Use logistic regression.
(This is not bivariate, but we’re just looking ahead.)
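To give a feel for what logistic regression estimates, here is a minimal sketch with one interval-ratio predictor and a 0/1 outcome. It fits the coefficients by Newton-Raphson, which is an assumption chosen to keep the example short; a statistics package would normally do this step. The data and the names `fit_logistic` and `predict` are hypothetical.

```python
from math import exp

def predict(b0, b1, x):
    """Predicted probability that the outcome equals 1."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * x)))

def fit_logistic(x, y, iterations=25):
    """Estimate intercept b0 and slope b1 by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = predict(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient for the intercept
            g1 += (yi - p) * xi       # gradient for the slope
            h00 += w                  # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix inverse) times gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: years of education and a 0/1 outcome (invented).
education = [8, 10, 12, 12, 14, 16, 16, 18]
outcome = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(education, outcome)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"P(outcome = 1 | 18 years) = {predict(b0, b1, 18):.3f}")
```

A positive slope means the predicted probability of the outcome rises with the predictor. This Newton-Raphson sketch breaks down if the two outcome groups are perfectly separated by the predictor, so the example data overlap on purpose.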
Selecting Data Analysis Techniques:
Examples

Is an individual’s religious choice (e.g., atheist,
Buddhist, Catholic) related to self-description as an
“adventurous eater” (agree, not sure, disagree)?
Crosstabs

Are countries’ suicide rates related to their homicide
rates?
Regression analysis

Do individuals with different sexual orientations
(heterosexual, gay/lesbian, bisexual) have different
(mean) GPAs at this university?
ANOVA

Is residential mobility (a lot, some, little) related to
performance on standardized achievement tests?
ANOVA
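The selection rules from the preceding slides can be summarized as a small lookup, sketched below. The function name `choose_technique` and the level labels are illustrative. Treating a dichotomous independent variable with an interval-ratio dependent variable as a regression case (via a dummy variable) is one reading of the slides, not the only possible one.

```python
CATEGORIC = {"nominal", "ordinal", "dichotomous"}

def choose_technique(iv_level, dv_level):
    """Map the levels of measurement of IV and DV to a technique."""
    if iv_level in CATEGORIC and dv_level in CATEGORIC:
        return "crosstabs"
    if dv_level == "interval-ratio":
        if iv_level in ("interval-ratio", "dichotomous"):
            # A dichotomous IV can enter regression as a 0/1 dummy variable.
            return "linear regression"
        return "ANOVA"  # nominal or ordinal IV
    if iv_level == "interval-ratio" and dv_level == "dichotomous":
        return "logistic regression"
    return "not covered here"

# The four examples above, coded by level of measurement.
print(choose_technique("nominal", "ordinal"))                # religion, eating
print(choose_technique("interval-ratio", "interval-ratio"))  # suicide, homicide
print(choose_technique("nominal", "interval-ratio"))         # orientation, GPA
print(choose_technique("ordinal", "interval-ratio"))         # mobility, scores
```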
Logistic Regression Examples
(not bivariate)
• Are income, years of education, gender, and
race-ethnicity (changed into 0/1 dummy
variables) related to voting Republican or not-Republican?
• Are mother’s years of education, respondent’s
household income, and % below poverty line at
the high school attended related to a
dichotomous variable: childbearing before high
school graduation or not?
SPSS/PASW: What to Click
Both variables categoric: Analyze–Descriptive
Statistics–Crosstabs.
Both variables interval-ratio: Graphs–Legacy
Dialogs–Scatter/Dot (for scatterplot) and then
Analyze–Regression–Linear and Analyze–
Correlate.
Categoric IV (with more than two categories) and
interval-ratio DV: Analyze–Compare Means–One-Way
ANOVA.
Warning: A Relationship
Does NOT Mean Causality!
When we find that two variables are related to
each other (using one of the data analysis
techniques), that does NOT necessarily mean that
the independent variable is a cause of the
dependent variable.
What do we mean by “cause”?
To be continued….