Planned Contrast: Execution (Conceptual)

1. Must predict the pattern of interaction before gathering data.

Predict that Democratic women will be most opposed to gun instruction in school, compared to Democratic men, Republican men, and Republican women.

[Figure: mean rating (0 to 5) by party (Republican, Democrat), plotted separately for males and females]

Post Hoc Tests

Do female Democrats differ from other groups?

    Group   Gender/Party         Mean
    1       Male/Republican      5.00
    2       Male/Democrat        4.50
    3       Female/Republican    4.75
    4       Female/Democrat      2.75

Conduct six t tests? NO. Why not? Doing so capitalizes on chance: if the six tests were independent, the chance of at least one Type I error at alpha = .05 each would be 1 - .95^6, or about .26.

Solution: Post hoc tests of multiple comparisons. Post hoc tests take the inflated likelihood of Type I error into account.

Kent's favorite: the Tukey test of multiple comparisons, which is the most generous.

NOTE: Post hoc tests can be done on any set of multiple means, not only on planned contrasts.

Conducting Post Hoc Tests

1. Recode data from multiple factors into a single factor, as per planned contrast.
2. Run the oneway ANOVA statistic.
3. Select the "post hoc tests" option.

    ONEWAY gunctrl BY genparty
      /CONTRAST= -1 -1 -1 3
      /STATISTICS DESCRIPTIVES
      /MISSING ANALYSIS
      /POSTHOC = TUKEY ALPHA(.05).

Note: The /POSTHOC = TUKEY subcommand is the selected post hoc test.
Note: It is not necessary to conduct a planned contrast in order to conduct a post hoc test.

Post Hoc Tests, Page 1

Descriptives (gunctrl)

    Group               N   Mean     Std. Deviation   Std. Error   95% CI Lower   95% CI Upper   Minimum   Maximum
    male republican     4   5.0000   .81650           .40825       3.7008         6.2992         4.00      6.00
    male democrat       4   4.5000   1.29099          .64550       2.4457         6.5543         3.00      6.00
    female republican   4   4.7500   .95743           .47871       3.2265         6.2735         4.00      6.00
    female democrat     4   2.7500   .95743           .47871       1.2265         4.2735         2.00      4.00
    Total              16   4.2500   1.29099          .32275       3.5621         4.9379         2.00      6.00

ANOVA (gunctrl)

    Source           Sum of Squares   df   Mean Square   F       Sig.
    Between Groups   12.500            3   4.167         4.000   .035
    Within Groups    12.500           12   1.042
    Total            25.000           15

Post Hoc Tests, Page 2

Multiple Comparisons
Dependent Variable: gunctrl
Tukey HSD

    (I) genparty        (J) genparty        Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
    male republican     male democrat         .50000                .72169       .898   -1.6426        2.6426
    male republican     female republican     .25000                .72169       .985   -1.8926        2.3926
    male republican     female democrat      2.25000*               .72169       .039     .1074        4.3926
    male democrat       male republican      -.50000                .72169       .898   -2.6426        1.6426
    male democrat       female republican    -.25000                .72169       .985   -2.3926        1.8926
    male democrat       female democrat      1.75000                .72169       .125    -.3926        3.8926
    female republican   male republican      -.25000                .72169       .985   -2.3926        1.8926
    female republican   male democrat         .25000                .72169       .985   -1.8926        2.3926
    female republican   female democrat      2.00000                .72169       .070    -.1426        4.1426
    female democrat     male republican     -2.25000*               .72169       .039   -4.3926        -.1074
    female democrat     male democrat       -1.75000                .72169       .125   -3.8926         .3926
    female democrat     female republican   -2.00000                .72169       .070   -4.1426         .1426

*. The mean difference is significant at the .05 level.

Data Management Issues

Setting up the data file
Checking accuracy of data
Disposition of data

Why obsess on these details?

Murphy's Law: If something can go wrong, it will go wrong, and at the worst possible time.

Errars Happin!

Creating a Coding Master

1. Get survey copy
2. Assign variable names
3. Assign variable values
4. Assign missing values
5. Proof master for accuracy
6. Make spare copy, keep in file drawer

Coding Master

[Figure: annotated coding master, with callouts marking variable names and variable values]

Note: Variable values are not needed for scales.

Cleaning Data Set

1. Exercise in delay of gratification
2. Purpose: Reduce random error
3. Improve power of inferential stats.

Complete Data Set

Note: Are any cases missing data?

Checking Descriptives

Are any "Minimums" too low?
Are any "Maximums" too high?
Do Ns indicate missing data?
Do SDs indicate extreme outliers?

Checking Correlations Between Variables

Do variables correlate in the expected manner?
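The descriptive checks listed above (minimums, maximums, Ns) can also be sketched outside SPSS. The following is a minimal Python illustration, not part of the original slides: the valid ranges in `VALID_RANGES`, the 1-to-7 range for `gunctrl`, and the four cases are all hypothetical stand-ins for a real coding master and data file.

```python
# Hypothetical coding master: valid (min, max) for each variable.
# The 1-7 range for gunctrl is an assumption made for illustration.
VALID_RANGES = {"gender": (1, 2), "party": (1, 2), "gunctrl": (1, 7)}

# Hypothetical cases; None marks a missing value.
cases = [
    {"id": 1, "gender": 1, "party": 1, "gunctrl": 5},
    {"id": 2, "gender": 2, "party": 2, "gunctrl": 3},
    {"id": 3, "gender": 2, "party": 1, "gunctrl": 55},    # keying error
    {"id": 4, "gender": 1, "party": None, "gunctrl": 4},  # missing party
]


def check_descriptives(cases, valid_ranges):
    """Return N, min, max, and the IDs of suspect cases for each variable."""
    report = {}
    for var, (lo, hi) in valid_ranges.items():
        present = [(c["id"], c[var]) for c in cases if c[var] is not None]
        values = [v for _, v in present]
        report[var] = {
            "n": len(values),                      # does N indicate missing data?
            "min": min(values, default=None),      # is the minimum too low?
            "max": max(values, default=None),      # is the maximum too high?
            "missing_ids": [c["id"] for c in cases if c[var] is None],
            "out_of_range_ids": [i for i, v in present if not lo <= v <= hi],
        }
    return report


report = check_descriptives(cases, VALID_RANGES)
for var, stats in report.items():
    print(var, stats)
```

Because raw records are filed by ID number, returning the IDs of suspect cases makes it easy to pull the paper survey and correct the entry.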
Using Cross Tabs to Check for Missing or Erroneous Data Entry

Case A: Expect equal cell sizes

    Gender     Oldest   Youngest   Only Child
    Males      10       10         20
    Females     5       15         20
    TOTAL      15       25         40

Case B: Impossible outcome

    Number of Siblings   Oldest   Youngest   Only Child
    None                  4        3         6
    One                   3        4         0
    More than one         3        4         2
    TOTAL                10       10         8

Storing Data

Raw Data
1. Hold raw data in a secure place
2. File raw data by ID #
3. Hold raw data for at least 5 years post publication, per APA

Automated Data
1. One pristine source, one working file, one syntax file
2. Back up, Back up, Back up
3. Use external hard drive as back-up for PC

File Raw Data Records By ID Number

01-20, 21-40, 41-60, 61-80, 81-100, 101-120

    COMMENT SYNTAX FILE GUN CONTROL STUDY SPRING 2007.

    COMMENT DATA MANAGEMENT.

    IF (gender = 1 & party = 1) genparty = 1 .
    EXECUTE .
    IF (gender = 1 & party = 2) genparty = 2 .
    EXECUTE .
    IF (gender = 2 & party = 1) genparty = 3 .
    EXECUTE .
    IF (gender = 2 & party = 2) genparty = 4 .
    EXECUTE .

    COMMENT ANALYSES.

    UNIANOVA gunctrl BY gender party
      /METHOD = SSTYPE(3)
      /INTERCEPT = INCLUDE
      /PRINT = DESCRIPTIVE
      /CRITERIA = ALPHA(.05)
      /DESIGN = gender party gender*party .

    ONEWAY gunctrl BY genparty
      /CONTRAST= -1 -1 -1 3
      /STATISTICS DESCRIPTIVES
      /MISSING ANALYSIS
      /POSTHOC = TUKEY ALPHA(.05).

Save Syntax File!!!

Research Project Notebook

Purpose: All-in-one handy summary of research project

Content:
1. Administrative (timeline, list of staff, etc.)
2. Overview of Research
3. Experiment Materials
   * Surveys
   * Consents, debriefings
   * Manipulations
   * Procedures summary/instructions
4. IRB materials
   * Application
   * Approval
5. Data
   * Coding forms
   * Syntax file
   * Primary outcomes

Correlation
Class 20

Today's Class Covers

What and why of measures of association
Covariation
Pearson's r correlation coefficient
Partial Correlation
Comparing two correlations
Non-Parametric correlations

Do Variables Relate to One Another?

    Is teacher pay related to performance?    Positive
    Is exercise related to illness?           Negative
    Is CO2 related to global warming?         Positive
    Is platoon cohesion related to PTSD?      Negative
    Is TV viewing related to shoe size?       Zero

Exercise and Illness

1. How many times a week do you exercise? _____
2. How many days have you missed school this term due to illness? _____
3. How many hours of sleep do you get each night? _____

Interpreting Correlations

[A] Exercise   [B] Illness   [C] Sleep Hours

A --> B: Exercise reduces illness
B --> A: Illness reduces exercise
C --> (A & B): Third variable (sleep) affects exercise and illness simultaneously

Exercise and Illness Data (fabricated)

    subject   exercise.days   sleep.hours   sick.days
     1        5               7             0
     2        3               6             2
     3        4               8             1
     4        6               7             1
     5        2               6             3
     6        4               7             1
     7        1               5             7
     8        7               6             3
     9        4               7             3
    10        3               6             3
    11        5               7             2
    12        2               6             4
    13        3               5             2
    14        3               6             4

Description of Data

[Figure: Scatterplot of exercise and days sick, with regression line]

[Figure: Co-variation of exercise days and sick days; y-axis: # Days (0 to 8), x-axis: Subject Number (1 to 14)]

Covariation Formula

$$\mathrm{cov}(x, y) = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$$

$$\mathrm{cov}(\text{exercise}, \text{sickness}) = \frac{(-3.32) + (0.40) + (-0.46) + \dots + (-1.02)}{14 - 1} = \frac{-23}{13} = -1.77$$

Problem with Covariation

"To all health and exercise researchers: Please send us your exercise and health covariations."

Team 1: exercise = days per week of exercise, covariation = -1.77
Team 2: exercise = hours per week of exercise, covariation = -34.00

What if all we have are the covariations? How do we compare them? How would we know, in this case, whether Team 1 showed a larger, smaller, or equal covariation than did Team 2? Covariation depends on the units of measurement, so the two values cannot be compared directly.

Pearson Correlation Coefficient

$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{(N - 1)\, s_x s_y}$$

Pearson r (population parameter: "rho") ranges from -1.00 to +1.00.

Using R2 to Interpret Correlation

$$R^2 = r^2$$ = proportion of variance shared between the correlated variables.
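The covariation and Pearson r formulas above can be checked against the fabricated exercise and illness data. This is a minimal Python sketch using only the standard library; the helper names are illustrative, and the `exercise_hours` conversion (two hours per exercise day) is a hypothetical assumption used only to show the effect of changing units.

```python
import math

# Fabricated data from the exercise and illness table.
exercise_days = [5, 3, 4, 6, 2, 4, 1, 7, 4, 3, 5, 2, 3, 3]
sick_days = [0, 2, 1, 1, 3, 1, 7, 3, 3, 3, 2, 4, 2, 4]


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over N - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def std_dev(xs):
    """Sample standard deviation (the covariance of a variable with itself)."""
    return math.sqrt(covariance(xs, xs))


def pearson_r(xs, ys):
    """Pearson r: covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


cov = covariance(exercise_days, sick_days)
r = pearson_r(exercise_days, sick_days)
print(f"cov = {cov:.2f}, r = {r:.3f}, R^2 = {r ** 2:.3f}")

# Hypothetical change of units: assume 2 hours of exercise per exercise day.
exercise_hours = [2 * d for d in exercise_days]
cov_hours = covariance(exercise_hours, sick_days)
r_hours = pearson_r(exercise_hours, sick_days)
print(f"cov (hours) = {cov_hours:.2f}, r (hours) = {r_hours:.3f}")
```

Run on these data, the covariance comes out at about -1.75 (the slide's -1.77 reflects rounding the numerator to -23) and r at about -.61. Rescaling exercise to hours doubles the covariance but leaves r unchanged, which is why r, not covariation, lets two research teams compare results.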
Correl: exercise.days, sick.days = -.613

$$R^2 = (-.613)^2 = .376$$

"About 38% of variability in sick days is explained by variability in exercise days."

Variation in Sick Days Explained by Exercise Days

$$R^2 = (-.613)^2 = .376$$

[Figure: Number of sick days last term (axis from 0 to 7, with 2.5 marked); portion of variation explained by exercise days = 37.6%]

Partial Correlation

Issue: How much does Variable 1 explain Variable 2, AFTER accounting for the influence of Variable 3?

Sickness and Exercise Study: How much does exercise explain days sick, AFTER accounting for the influence of nightly hours of sleep?

Partial correlation answers this question.

Partial Correlation

[Figure: Venn diagram of Sick Days overlapping Exercise Days and Sleep Hours. Labels: var. explained = .376; var. explained = .277; var. explained by sleep alone (.17); var. explained by exercise alone (.04); var. explained by exercise + sleep (.21)]

Partial Correlations in SPSS

    * Exercise days and sick days, controlling for sleep hours.
    PARTIAL CORR
      /VARIABLES= exercise.days sick.days BY sleep.hours
      /SIGNIFICANCE=TWOTAIL
      /MISSING=LISTWISE.

    * Sleep hours and sick days, controlling for exercise days.
    PARTIAL CORR
      /VARIABLES= sleep.hours sick.days BY exercise.days
      /SIGNIFICANCE=TWOTAIL
      /MISSING=LISTWISE.

Non-Parametric Correlations

Assumptions of Correlations
1. Normally distributed data
2. Homogeneity of variance
3. Interval data (at least)

What if Assumptions Not Met?
Spearman's rho: Data are ordinal.
Kendall's tau: Data are ordinal, but the sample is small and many scores have the same ranking.

Parametric Correlations (example: interval data)

    Var. A: Watch TV   Var. B: Eat Fast Food
    1 hr               1 day
    2 hr               2 day
    3 hr               3 day
    4 hr               4 day
    5 hr               5 day

Non-Parametric Correlations (example: ordinal data)

    Var. A: Watch TV   Var. B: Eat Fast Food
    Never              Never
    Daily              Daily
    Weekly             Weekends
    Monthly            Holidays
    Yearly             Leap Years

Comparing Correlations

Issue: How do we know if one correlation is different from another?

Example: Is the nightly-sleep / sick days correlation different from the TV hours / sick days correlation?

Difference Between Correlations

Difference between 2 independent correlations, where each z_r is the Fisher z transformation of the corresponding r:

$$z = \frac{z_{r_1} - z_{r_2}}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$

Difference between 2 dependent correlations:

$$t_{\text{difference}} = (r_{xy} - r_{zy}) \sqrt{\frac{(n - 3)(1 + r_{xz})}{2\,(1 - r_{xy}^2 - r_{xz}^2 - r_{zy}^2 + 2\, r_{xy} r_{xz} r_{zy})}}$$

Link to calculator for correlations from two independent samples:
http://faculty.vassar.edu/lowry/rdiff.html
Note: Assumes independent samples
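The two difference tests above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a replacement for the linked calculator: the function names are made up for this example, the correlations and sample sizes in the usage lines are hypothetical, and the dependent-correlations function returns only the t statistic and its degrees of freedom (n - 3), leaving the p value to a t table or statistics package.

```python
import math


def fisher_z(r):
    """Fisher z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def compare_independent(r1, n1, r2, n2):
    """z statistic and two-tailed p for correlations from two independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p from the normal curve
    return z, p


def compare_dependent(r_xy, r_zy, r_xz, n):
    """t statistic and df for two correlations that share variable y in one sample."""
    det = 1 - r_xy ** 2 - r_xz ** 2 - r_zy ** 2 + 2 * r_xy * r_xz * r_zy
    t = (r_xy - r_zy) * math.sqrt((n - 3) * (1 + r_xz) / (2 * det))
    return t, n - 3


# Hypothetical values, chosen only to exercise the formulas.
z, p = compare_independent(r1=0.5, n1=103, r2=0.3, n2=103)
print(f"independent: z = {z:.3f}, two-tailed p = {p:.3f}")

t, df = compare_dependent(r_xy=0.5, r_zy=0.3, r_xz=0.2, n=103)
print(f"dependent: t({df}) = {t:.3f}")
```

The independent-samples test fits two separate studies, such as two research teams; the dependent test fits the slide's example, where the sleep / sick days and TV hours / sick days correlations come from the same people.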