class 20 correlation

Planned Contrast: Execution (Conceptual)
1. Must predict pattern of interaction before gathering data.
Predict that Democratic women will be most opposed to gun instruction
in school, compared to Democratic men, Republican men, and
Republican women.
[Figure: mean Rating (0 to 5) by party (Republican, Democrat), plotted separately for Male and Female]
Post Hoc Tests
Do female democrats differ from other groups?
Group                    Mean
1 = Male/Republican      5.00
2 = Male/Democrat        4.50
3 = Female/Republican    4.75
4 = Female/Democrat      2.75
Conduct six t tests? NO. Why not?
Doing so capitalizes on chance.
Solution: Post hoc tests of multiple comparisons.
Post hoc tests consider the inflated likelihood of Type I error
Kent's favorite: the Tukey test of multiple comparisons,
which is the most generous.
NOTE: Post hoc tests can be done on any multiple set of means,
not only on planned contrasts.
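To make "capitalizes on chance" concrete, the following is a minimal Python sketch of how the familywise Type I error rate grows when six separate t tests are run at alpha = .05. It assumes the tests are independent, which pairwise comparisons sharing groups are not, so the figure is an approximation rather than an exact rate. The function name is illustrative only.

```python
from math import comb


def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


n_groups = 4
n_pairs = comb(n_groups, 2)        # number of pairwise comparisons among 4 groups
fw = familywise_error(n_pairs)     # approximate familywise error rate

print(n_pairs, round(fw, 3))
```

With four groups there are six pairwise comparisons, and the approximate chance of at least one false positive is roughly five times the nominal .05.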
Conducting Post Hoc Tests
1. Recode data from multiple factors into single factor, as per
planned contrast.
2. Run oneway ANOVA statistic
3. Select "posthoc tests" option.
ONEWAY
gunctrl BY genparty
/CONTRAST= -1 -1 -1 3
/STATISTICS DESCRIPTIVES
/MISSING ANALYSIS
/POSTHOC = TUKEY ALPHA(.05).
Note: It is not necessary to conduct a planned contrast in order to conduct a post hoc test.
Selected posthoc test: /POSTHOC = TUKEY
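For readers who want to see what the /CONTRAST weights do, here is a rough Python sketch that applies the weights -1 -1 -1 3 to the four group means reported in the output. It uses the pooled within-groups mean square (12.5 / 12) and equal group sizes of 4, both taken from the ANOVA output; the slides do not show SPSS's contrast table, so the printed values are a hand calculation, not a reproduction of that output.

```python
from math import sqrt

# Contrast weights and group means in genparty order:
# male republican, male democrat, female republican, female democrat.
weights = [-1, -1, -1, 3]
means = [5.00, 4.50, 4.75, 2.75]
n_per_group = 4
ms_within = 12.5 / 12              # within-groups SS / df from the ANOVA table

# A valid contrast needs weights that sum to zero.
weights_sum = sum(weights)

# Contrast estimate: weighted sum of the group means.
estimate = sum(w * m for w, m in zip(weights, means))

# Standard error assuming equal variances and equal n per group.
se = sqrt(ms_within * sum(w ** 2 for w in weights) / n_per_group)

t_stat = estimate / se             # evaluated on the within-groups df (12)

print(weights_sum, estimate, round(se, 4), round(t_stat, 3))
```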
Post hoc Tests, Page 1
Descriptives (gunctrl)

Group              N   Mean    Std. Deviation  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
male republican    4   5.0000  .81650          .40825      3.7008        6.2992        4.00     6.00
male democrat      4   4.5000  1.29099         .64550      2.4457        6.5543        3.00     6.00
female republican  4   4.7500  .95743          .47871      3.2265        6.2735        4.00     6.00
female democrat    4   2.7500  .95743          .47871      1.2265        4.2735        2.00     4.00
Total              16  4.2500  1.29099         .32275      3.5621        4.9379        2.00     6.00
ANOVA (gunctrl)

Source          Sum of Squares  df  Mean Square  F      Sig.
Between Groups  12.500          3   4.167        4.000  .035
Within Groups   12.500          12  1.042
Total           25.000          15
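As a cross-check on the output above, this short Python sketch rebuilds the ANOVA sums of squares and the F ratio from the group sizes, means, and standard deviations in the Descriptives table. It also computes the standard error of a pairwise mean difference under equal group sizes, which is the Std. Error that appears in the Tukey output. The variable names are illustrative, and the p value is not computed here.

```python
from math import sqrt

# Group summaries copied from the Descriptives table: (n, mean, SD).
groups = {
    "male republican":   (4, 5.00, 0.81650),
    "male democrat":     (4, 4.50, 1.29099),
    "female republican": (4, 4.75, 0.95743),
    "female democrat":   (4, 2.75, 0.95743),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups SS: group size times squared distance from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within-groups SS: (n - 1) times each group's variance.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

# Standard error of a difference between two group means (equal n of 4).
se_diff = sqrt(2 * ms_within / 4)

print(round(ss_between, 3), round(ss_within, 3),
      round(f_ratio, 3), round(se_diff, 5))
```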
Post Hoc Tests, Page 2
Multiple Comparisons
Dependent Variable: gunctrl
Tukey HSD

(I) genparty       (J) genparty       Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
male republican    male democrat      .50000                 .72169      .898  -1.6426       2.6426
                   female republican  .25000                 .72169      .985  -1.8926       2.3926
                   female democrat    2.25000*               .72169      .039  .1074         4.3926
male democrat      male republican    -.50000                .72169      .898  -2.6426       1.6426
                   female republican  -.25000                .72169      .985  -2.3926       1.8926
                   female democrat    1.75000                .72169      .125  -.3926        3.8926
female republican  male republican    -.25000                .72169      .985  -2.3926       1.8926
                   male democrat      .25000                 .72169      .985  -1.8926       2.3926
                   female democrat    2.00000                .72169      .070  -.1426        4.1426
female democrat    male republican    -2.25000*              .72169      .039  -4.3926       -.1074
                   male democrat      -1.75000               .72169      .125  -3.8926       .3926
                   female republican  -2.00000               .72169      .070  -4.1426       .1426

*. The mean difference is significant at the .05 level.
Data Management Issues
Setting up data file
Checking accuracy of data
Disposition of data
Why obsess on these details? Murphy's Law
If something can go wrong, it will go wrong,
and at the worst possible time.
Errars Happin!
Creating a Coding Master
1. Get survey copy
2. Assign variable names
3. Assign variable values
4. Assign missing values
5. Proof master for accuracy
6. Make spare copy, keep in file drawer
Coding Master
[Figure: sample coding master, labeled with variable names and variable values]
Note: Variable values are not needed for scales.
Cleaning Data Set
1. Exercise in delay of gratification
2. Purpose: Reduce random error
3. Improve power of inferential stats.
Complete Data Set
Note: Are any cases missing data?
Checking Descriptives
Are any “Minimums” too low?
Are any “Maximums” too high?
Do Ns indicate missing data?
Do SDs indicate extreme outliers?
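The four questions above can be turned into a small routine. The sketch below is one possible Python version, not a reproduction of any SPSS procedure: the function name, the use of None for missing data, and the sample ratings are all hypothetical. The legal range would come from the coding master.

```python
from statistics import mean, stdev


def check_variable(values, low, high):
    """Summarize one variable and flag entries outside the legal range.

    values: list of numbers, with None marking missing data.
    low, high: smallest and largest legal codes from the coding master.
    """
    present = [v for v in values if v is not None]
    return {
        "n": len(present),                          # does N reveal missing data?
        "n_missing": len(values) - len(present),
        "min": min(present),                        # is the minimum too low?
        "max": max(present),                        # is the maximum too high?
        "mean": mean(present),
        "sd": stdev(present),                       # an inflated SD hints at outliers
        "out_of_range": [v for v in present if not low <= v <= high],
    }


# Hypothetical 1-7 ratings; 44 is a plausible keying error for 4.
ratings = [4, 5, None, 3, 44, 6, 2, 5]
report = check_variable(ratings, low=1, high=7)
print(report)
```

In this made-up example the maximum of 44 and the very large SD both point to the same data-entry slip.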
Checking Correlations Between Variables
Do variables correlate in the expected manner?
Using Cross Tabs to Check for
Missing or Erroneous Data Entry
Case A: Expect equal cell sizes

Gender     Oldest  Youngest  Only Child
Males      10      10        20
Females    5       15        20
TOTAL      15      25        40
Case B: Impossible outcome

Number of Siblings  Oldest  Youngest  Only Child
None                4       3         6
One                 3       4         0
More than one       3       4         2
TOTAL               10      10        8
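A check like Case B can be automated once the cross tab is in hand. The Python sketch below stores the Case B counts as a nested dictionary and flags any non-empty cell in which an only child is recorded as having siblings. The rule itself is an assumption about which cells the slide treats as impossible; other rules (for example, an oldest child with no siblings) could be added the same way.

```python
# Case B counts: siblings category -> birth-order category -> frequency.
crosstab = {
    "none":          {"oldest": 4, "youngest": 3, "only child": 6},
    "one":           {"oldest": 3, "youngest": 4, "only child": 0},
    "more than one": {"oldest": 3, "youngest": 4, "only child": 2},
}


def impossible_cells(table):
    """Return (siblings, birth_order, count) for cells that cannot occur.

    Assumed rule: an only child cannot have one or more siblings.
    """
    flagged = []
    for siblings, row in table.items():
        for birth_order, count in row.items():
            if birth_order == "only child" and siblings != "none" and count > 0:
                flagged.append((siblings, birth_order, count))
    return flagged


bad = impossible_cells(crosstab)
print(bad)
```

Each flagged cell identifies cases whose raw survey forms should be pulled by ID number and re-checked.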
Storing Data
Raw Data
1. Hold raw data in secure place
2. File raw data by ID #
3. Hold raw data for at least 5 years post publication, per APA
Automated Data
1. One pristine source, one working file, one syntax file
2. Back up, Back up, Back up
3. Use external hard drive as back-up for PC
File Raw Data Records By ID Number
01-20
21-40
41-60
61-80
81-100
101-120
COMMENT SYNTAX FILE GUN CONTROL STUDY SPRING 2007.
COMMENT DATA MANAGEMENT.
IF (gender = 1 & party = 1) genparty = 1 .
EXECUTE .
IF (gender = 1 & party = 2) genparty = 2 .
EXECUTE .
IF (gender = 2 & party = 1) genparty = 3 .
EXECUTE .
IF (gender = 2 & party = 2) genparty = 4 .
EXECUTE .
COMMENT ANALYSES.
UNIANOVA
gunctrl BY gender party
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = DESCRIPTIVE
/CRITERIA = ALPHA(.05)
/DESIGN = gender party gender*party .
ONEWAY
gunctrl BY genparty
/CONTRAST= -1 -1 -1 3
/STATISTICS DESCRIPTIVES
/MISSING ANALYSIS
/POSTHOC = TUKEY ALPHA(.05).
Save Syntax File!!!
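The recode step in the data-management part of the syntax file can be mirrored outside SPSS. This is a minimal Python sketch of the same logic: two 2-level factors are combined into one 4-level factor. The coding (gender 1 = male, 2 = female; party 1 = Republican, 2 = Democrat) is inferred from the order of the group labels in the output, and the sample cases are hypothetical.

```python
def recode_genparty(gender, party):
    """Combine gender and party codes into a single genparty code.

    Mirrors the four IF statements in the syntax file. Returns None when
    either code is not 1 or 2 (for example, a missing-value code).
    """
    mapping = {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 4}
    return mapping.get((gender, party))


# Hypothetical cases; the last one carries an invalid party code of 9.
cases = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 9)]
codes = [recode_genparty(g, p) for g, p in cases]
print(codes)
```

Returning None for unexpected codes keeps a keying error from silently landing in one of the four groups.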
Research Project Notebook
Purpose: All-in-one handy summary of research project
Content:
1. Administrative (timeline, list of staff, etc.)
2. Overview of Research
3. Experiment Materials
* Surveys
* Consents, debriefings
* Manipulations
* Procedures summary/instructions
4. IRB materials
* Application
* Approval
5. Data
* Coding forms
* Syntax file
* Primary outcomes
Correlation
Class 20
Today's Class Covers
What and why of measures of association
Covariation
Pearson's r correlation coefficient
Partial Correlation
Comparing two correlations
Non-Parametric correlations
Do Variables Relate to One Another?
Is teacher pay related to performance?
Positive
Is exercise related to illness?
Negative
Is CO2 related to global warming?
Positive
Is platoon cohesion related to PTSD?
Negative
Is TV viewing related to shoe size?
Zero
Exercise and Illness
1. How many times a week do you exercise? _____
2. How many days have you missed school this term due
to illness? _____
3. How many hours of sleep do you get each night? ____
Interpreting Correlations
[A] Exercise    [B] Illness    [C] Sleep Hours
A --> B: Exercise reduces illness
B --> A: Illness reduces exercise
C --> (A & B): Third variable (sleep) affects exercise and illness simultaneously
Exercise and Illness Data
(fabricated)

subject  exercise.days  sleep.hours  sick.days
1        5              7            0
2        3              6            2
3        4              8            1
4        6              7            1
5        2              6            3
6        4              7            1
7        1              5            7
8        7              6            3
9        4              7            3
10       3              6            3
11       5              7            2
12       2              6            4
13       3              5            2
14       3              6            4
Description of Data
[Figure: Scatterplot of Exercise and Days Sick, with regression line]
[Figure: Co-variation of exercise days and sick days; y-axis: # Days (0 to 8), x-axis: Subject Number (1 to 14)]
Covariation Formula

$$ cov(x, y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1} $$

$$ cov(exercise, sickness) = \frac{(-3.32) + (0.40) + (-0.46) + \dots + (-1.02)}{14 - 1} = \frac{-23}{13} = -1.77 $$
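The covariation formula is easy to check by hand or in code. The following Python sketch applies it to the exercise.days and sick.days columns of the fabricated data set. Carried at full precision, the sum of cross-products is about -22.7 and the covariance is about -1.75; the slide's -1.77 appears to come from rounding that sum to -23.

```python
# exercise.days and sick.days for subjects 1-14 of the fabricated data set.
exercise_days = [5, 3, 4, 6, 2, 4, 1, 7, 4, 3, 5, 2, 3, 3]
sick_days = [0, 2, 1, 1, 3, 1, 7, 3, 3, 3, 2, 4, 2, 4]


def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations over N - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = [(a - mean_x) * (b - mean_y) for a, b in zip(x, y)]
    return sum(cross_products) / (n - 1)


cov_xy = covariance(exercise_days, sick_days)
print(round(cov_xy, 2))
```

The negative sign says that subjects who exercise more than average tend to have fewer sick days than average.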
Problem with Covariation
"To all health and exercise researchers: Please send
us your exercise and health covariations."
Team 1: exercise = days per week exercise, covariation = -1.77
Team 2: exercise = hours per week exercise, covariation = -34.00
What if all we have are the covariations?
How do we compare them?
How would we know, in this case, whether Team 1 showed a
larger, smaller, or equal covariation than did Team 2?
Pearson Correlation Coefficient
$$ r = \frac{cov_{xy}}{s_x s_y} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{(N - 1)\, s_x s_y} $$
Pearson r (“rho”): -1.00 to + 1.00
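The sketch below, in Python, shows why r solves the Team 1 versus Team 2 problem: changing the units of a variable changes the covariance but leaves r untouched. It uses the fabricated exercise and sick-day data. The "hours" variable is a hypothetical rescaling (two hours per exercise day) invented only to illustrate the point; it is not Team 2's data.

```python
from math import sqrt

exercise_days = [5, 3, 4, 6, 2, 4, 1, 7, 4, 3, 5, 2, 3, 3]
sick_days = [0, 2, 1, 1, 3, 1, 7, 3, 3, 3, 2, 4, 2, 4]


def covariance(x, y):
    """Sample covariance with N - 1 in the denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    s_x = sqrt(covariance(x, x))   # the covariance of x with itself is its variance
    s_y = sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)


# Hypothetical change of units: assume each exercise day lasts 2 hours.
exercise_hours = [2 * d for d in exercise_days]

cov_days = covariance(exercise_days, sick_days)
cov_hours = covariance(exercise_hours, sick_days)
r_days = pearson_r(exercise_days, sick_days)
r_hours = pearson_r(exercise_hours, sick_days)

print(round(cov_days, 2), round(cov_hours, 2))
print(round(r_days, 3), round(r_hours, 3))
```

The covariance doubles with the change of units, while r stays the same, so r can be compared across studies that measured exercise differently.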
Using R² to Interpret Correlation
R² = r² = amount of variance shared between correlated
variables.
Correl: exercise.hours, sick.days = .613
R² = .613² = .376
“About 38% of variability in sick days is explained by
variability in exercise hours.”
Variation in Sick Days Explained by Exercise Hours
R² = .613² = .376
Exercise hours = 37.6%
[Figure: Number of Sick Days Last Term, axis from 0 to 7 with 2.5 marked]
Partial Correlation
Issue: How much does Variable 1 explain Variable 2,
AFTER accounting for the influence of Variable 3?
Sickness and Exercise Study: How much does
exercise explain days sick, AFTER accounting for the
influence of nightly hours of sleep?
Partial Correlation answers this question.
Partial Correlation
[Figure: Venn diagram of Sick Days, Exercise Days, and Sleep Hours]
var. explained = .376
var. explained = .277
var. explained by sleep alone (.17)
var. explained by exercise alone (.04)
var. explained by exercise + sleep (.21)
Partial Correlations in SPSS
PARTIAL CORR /VARIABLES= exercise.days sick.days BY sleep.hours
/SIGNIFICANCE=TWOTAIL /MISSING=LISTWISE.
PARTIAL CORR /VARIABLES= sleep.hours sick.days by exercise.days
/SIGNIFICANCE=TWOTAIL /MISSING=LISTWISE.
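A first-order partial correlation can also be computed directly from the three pairwise correlations. The Python sketch below does this for the fabricated data set, using the standard formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). The printed values come from the 14-row data table and are not guaranteed to match the rounded figures in the slide's diagram.

```python
from math import sqrt

exercise = [5, 3, 4, 6, 2, 4, 1, 7, 4, 3, 5, 2, 3, 3]
sleep = [7, 6, 8, 7, 6, 7, 5, 6, 7, 6, 7, 6, 5, 6]
sick = [0, 2, 1, 1, 3, 1, 7, 3, 3, 3, 2, 4, 2, 4]


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the influence of z from both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_ex_sick = pearson_r(exercise, sick)
r_ex_sleep = pearson_r(exercise, sleep)
r_sleep_sick = pearson_r(sleep, sick)

# Exercise and sick days, controlling for sleep.
pr_exercise = partial_r(r_ex_sick, r_ex_sleep, r_sleep_sick)
# Sleep and sick days, controlling for exercise.
pr_sleep = partial_r(r_sleep_sick, r_ex_sleep, r_ex_sick)

print(round(pr_exercise, 3), round(pr_sleep, 3))
```

Both partial correlations are smaller in magnitude than the corresponding zero-order correlations, because exercise and sleep overlap in what they explain about sick days.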
Non-Parametric Correlations
Assumptions of Correlations
1. Normally distributed data
2. Homogeneity of variance
3. Interval data (at least)
What if Assumptions Not Met?
Spearman's rho: Data are ordinal.
Kendall's tau: Data are ordinal, but small sample, and
many scores have the same ranking
Parametric Correlations
Var. A (Watch TV):       1 hr, 2 hr, 3 hr, 4 hr, 5 hr
Var. B (Eat Fast Food):  1 day, 2 days, 3 days, 4 days, 5 days
Non-Parametric Correlations
Var. A (Watch TV):       Never, Daily, Weekly, Monthly, Yearly
Var. B (Eat Fast Food):  Never, Daily, Weekends, Holidays, Leap Years
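For ordinal data like the example above, Spearman's rho can be computed as a Pearson correlation on ranks, with tied scores sharing the average of their ranks. The Python sketch below is one way to do that by hand. The numeric frequency codes (0 = never through 4 = daily) and the six respondents are hypothetical, and Kendall's tau is not covered here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson r computed on the ranked scores."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical ordinal codes: 0 = never, 1 = yearly, 2 = monthly,
# 3 = weekly, 4 = daily.
watch_tv = [0, 1, 2, 3, 4, 4]
eat_fast_food = [0, 2, 1, 3, 4, 3]
rho = spearman_rho(watch_tv, eat_fast_food)
print(round(rho, 3))
```

Because only the rank order matters, any relabeling of the categories that preserves their order gives the same rho.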
Comparing Correlations
Issue: How do we know if one correlation is different
from another?
Example: Is the nightly-sleep / sick days correl.
different from the TV hours /sick days correl?
Difference Between Correlations
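Difference between 2 independent correlations:

$$ z = \frac{z_{r_1} - z_{r_2}}{\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}} $$

Difference between 2 dependent correlations:

$$ t_{difference} = (r_{xy} - r_{zy}) \sqrt{\frac{(n - 3)(1 + r_{xz})}{2\,(1 - r_{xy}^2 - r_{xz}^2 - r_{zy}^2 + 2 r_{xy} r_{xz} r_{zy})}} $$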
Link to calculator for two ind. samples correlations
http://faculty.vassar.edu/lowry/rdiff.html
Note:
Assumes independent samples
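Both difference tests can be sketched in a few lines of Python. The first function converts each r to Fisher's z (the inverse hyperbolic tangent) and divides the difference by its standard error; the second implements the t statistic for two dependent correlations that share the variable y, evaluated on n - 3 degrees of freedom. The input correlations and sample sizes below are hypothetical and are not results from the lecture's data.

```python
from math import atanh, erf, sqrt


def z_independent(r1, n1, r2, n2):
    """z test for the difference between two independent correlations."""
    z_r1, z_r2 = atanh(r1), atanh(r2)            # Fisher r-to-z transform
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z_r1 - z_r2) / se


def two_tailed_p(z):
    """Two-tailed p value for a standard normal z."""
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)


def t_dependent(r_xy, r_zy, r_xz, n):
    """t (df = n - 3) for two correlations sharing variable y in one sample."""
    denom = 2 * (1 - r_xy ** 2 - r_xz ** 2 - r_zy ** 2
                 + 2 * r_xy * r_xz * r_zy)
    return (r_xy - r_zy) * sqrt((n - 3) * (1 + r_xz) / denom)


# Hypothetical inputs chosen only to exercise the formulas.
z = z_independent(0.50, 103, 0.30, 103)
p = two_tailed_p(z)
t = t_dependent(r_xy=0.50, r_zy=0.30, r_xz=0.40, n=103)

print(round(z, 3), round(p, 3), round(t, 3))
```

The dependent-samples t would be compared against a t distribution with n - 3 degrees of freedom; that lookup is left out to keep the sketch free of third-party libraries.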