RStats Statistics and Research Camp 2014

Slide 1 Welcome!
Helen Reid, PhD
Dean of the College of Health and Human Services, Missouri State University

Slide 2 Welcome
• Goal: keeping up with advances in quantitative methods
• Best practices: use familiar tools in new situations
– Avoid common mistakes
Todd Daniel, PhD

Slide 3 RStats Institute
Coffee Advice
Ronald A. Fisher

Slide 4 Box, 1976
Cambridge, England, 1920

Slide 5 Dr. Muriel Bristol
Familiar Tools
• Null Hypothesis Significance Testing (NHST)
• a.k.a. Null Hypothesis Decision Making
• a.k.a. Statistical Hypothesis Inference Testing
p < .05: the probability of finding these results given that the null hypothesis is true

Slide 6 Benefits of NHST
• All students are trained in NHST
• Easier to engage researchers
• Results are not complex
Everybody is doing it

Slide 7 Statistically Significant Difference
What does p < .05 really mean?
1. There is a 95% chance that the alternative hypothesis is true
2. This finding will replicate 95% of the time
3. If the study were repeated, the null would be rejected 95% of the time
(All three are common misreadings.)

Slide 8 The Earth Is Round, p < .05
What you want: the probability that a hypothesis is true given the evidence.
What you get: the probability of the evidence assuming that the (null) hypothesis is true.

Slide 9 Cohen, 1994
Pigs Might Fly

Slide 10
• Null: there are no flying pigs (H0: P = 0)
• Random sample of 30 pigs; one can fly (1/30 = .033)
• What kind of test?
– Chi-square?
– Fisher exact test?
– Binomial?

Slide 11 Why do you even need a test?
Daryl Bem and ESP
• Assumed random guessing, p = .50
• Found subject success of 53%, p < .05
• Too much power? Everything counts in large amounts
• What if Bem had set p = 0? One clairvoyant v. a group that guesses 53%
In the real world, the null is always false.
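The binomial option from the flying-pig slide, and Bem's "too much power" problem, can both be sketched with SciPy. This is a minimal illustration; the trial counts below are hypothetical, not Bem's actual data.

```python
from scipy.stats import binomtest

# Test a 53% hit rate against the null of random guessing (p = .50).
# Same observed rate, two very different sample sizes:
small = binomtest(53, 100, p=0.50)        # 53/100 hits
large = binomtest(5300, 10000, p=0.50)    # 53% again, but a huge sample

print(f"n=100:    p = {small.pvalue:.3f}")
print(f"n=10000:  p = {large.pvalue:.6f}")
# With enough trials, the same 53% rate becomes "significant" --
# everything counts in large amounts.
```

This is why the slide asks whether you even need a test: with a point null like P = 0, a single flying pig settles the question, and with enough power, any nonzero deviation from .50 will eventually reach p < .05.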
Slide 12 Problems with NHST
Cohen (1994) reservations:
• Non-replicable findings
• Poor basis for policy decisions
• False sense of confidence
NHST is "a potent but sterile intellectual rake who leaves in his merry path a long train of ravished maidens but no viable scientific offspring." – Paul Meehl

Slide 13 What to do then? (Cohen, 1994)
• Learn basic methods that improve your research (learn NHST)
• Learn advanced techniques and apply them to your research (RStats Camp)
• Make professional connections and access professional resources

Slide 14 Agenda
9:30 Best Practices (and Poor Practices) in Data Analysis
11:00 Moderated Regression
12:00–12:45 Lunch (with Faculty Writing Retreat)
1:00 Effect Size and Power Analysis
2:00 Meta-Analysis
3:00 Structural Equation Modeling

Slide 15 RStats Statistics and Research Camp 2014
Best Practices and Poor Practices, Session 1
R. Paul Thomlinson, PhD, Burrell

Slide 16 Poor Practices: Common Mistakes

Slide 17 Mistake #1: No Blueprint

Slide 18 Mistake #2: Ignoring Assumptions
Pre-Checking Data Before Analysis

Slide 19 Assumptions Matter
• Data: I call you and you don't answer.
• Conclusion: you are mad at me.
• Assumption: you had your phone with you.
If my assumptions are wrong, they prevent me from looking at the world accurately.

Slide 20 Assumptions for Parametric Tests
"Assumptions behind models are rarely articulated, let alone defended. The problem is exacerbated because journals tend to favor a mild degree of novelty in statistical procedures. Modeling, the search for significance, the preference for novelty, and the lack of interest in assumptions -- these norms are likely to generate a flood of nonreproducible results." – David Freedman, Chance, 2008, v. 21, no. 1, p. 60

Slide 21 Assumptions for Parametric Tests
"... all models are limited by the validity of the assumptions on which they ride." – Collier, Sekhon, and Stark, Preface (p. xi) to Freedman, David A., Statistical Models and Causal Inference: A Dialogue with the Social Sciences
Parametric tests based on the normal distribution assume:
– Interval or ratio level data
– Independent scores
– Normal distribution of the population
– Homogeneity of variance

Slide 22 Assessing the Assumptions
• Assumption of interval or ratio data
– Look at your data to make sure you are measuring using scale-level data
– This is common and easily verified

Slide 23 Independence
• Techniques are least likely to be robust to departures from assumptions of independence.
• Sometimes a rough idea of whether model assumptions might fit can be obtained by plotting the data or plotting residuals from a tentative use of the model. Unfortunately, these methods are typically better at telling you when the model assumption does not fit than when it does.

Slide 24 Independence
• Assumption of independent scores
– Addressed during research design
– Each individual in the sample should be independent of the others
• The errors in your model should not be related to each other.
• If this assumption is violated, confidence intervals and significance tests will be invalid.

Slide 25 Assumption of Normality
• You want your distribution not to be skewed
• You want your distribution not to have kurtosis
– At least, not too much of either

Slide 26 Normally Distributed Something or Other
• The normal distribution is relevant to:
– Parameters
– Confidence intervals around a parameter
– Null hypothesis significance testing
• This assumption tends to get incorrectly translated as "your data need to be normally distributed."
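A quick numeric check of this normality assumption uses skewness and kurtosis, which can be computed outside SPSS as well. A minimal Python sketch on simulated scores (the data here are made up purely for illustration):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
normal_scores = rng.normal(loc=50, scale=10, size=500)   # roughly symmetric
skewed_scores = rng.exponential(scale=10, size=500)      # strongly right-skewed

for label, scores in [("normal", normal_scores), ("skewed", skewed_scores)]:
    # kurtosis() reports excess kurtosis: 0 for a perfect normal curve
    print(f"{label}: skew = {skew(scores):.2f}, kurtosis = {kurtosis(scores):.2f}")
```

Values near zero on both statistics are what you hope to see; the exponential sample will show a clearly positive skew.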
Slide 27 Assumption of Normality
• Both skew and kurtosis can be measured with a simple test run for you in SPSS
– Values exceeding +3 or −3 indicate severe skew

Slide 28 Assessing Normality with Numbers

Slide 29 Tests of Normality (SPSSExam.sav)
• Kolmogorov-Smirnov test
– Tests whether data differ from a normal distribution
– Significant = non-normal data
– Non-significant = normal data
• Non-significant is the ideal

Slide 30 The P-P Plot

Slide 31 Histograms & Stem-and-Leaf Plots (SPSSExam.sav)
Double-click on the histogram in the Output window to add the normal curve

Slide 32 When Does the Assumption of Normality Matter?
• Normality matters most in small samples
– The central limit theorem allows us to forget about this assumption in larger samples.
• In practical terms, as long as your sample is fairly large, outliers are a much more pressing concern than normality

Slide 33 Assessing the Assumptions
• Assumption of homogeneity of variance
– Only necessary when comparing groups
– Levene's test

Slide 34 Assessing Homogeneity of Variance: Graphs
Number of hours of ringing in ears after a concert

Slide 35 Assessing Homogeneity of Variance: Numbers
• Levene's test
– Tests whether variances in different groups are the same.
– Significant = variances not equal
– Non-significant = variances are equal
• Non-significant is ideal
• Variance ratio (VR)
– With 2 or more groups
– VR = largest variance / smallest variance
– If VR < 2, homogeneity can be assumed.
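The Kolmogorov-Smirnov, Levene, and variance-ratio checks above have direct SciPy equivalents. A sketch on two simulated groups (illustrative data; note also that a K-S test run against parameters estimated from the same sample is known to be lenient, which is why SPSS applies a Lilliefors-style correction):

```python
import numpy as np
from scipy.stats import kstest, levene, zscore

rng = np.random.default_rng(7)
group_a = rng.normal(100, 15, 80)
group_b = rng.normal(100, 16, 80)

# Kolmogorov-Smirnov: significant p = non-normal; non-significant = normal
ks_stat, ks_p = kstest(zscore(group_a), 'norm')

# Levene's test: significant p = variances not equal
lev_stat, lev_p = levene(group_a, group_b)

# Variance ratio: largest over smallest sample variance; VR < 2 is acceptable
variances = sorted([group_a.var(ddof=1), group_b.var(ddof=1)])
vr = variances[-1] / variances[0]

print(f"K-S p = {ks_p:.3f}, Levene p = {lev_p:.3f}, VR = {vr:.2f}")
```

As on the slides, non-significant p values and a VR below 2 are the outcomes that let the parametric analysis proceed.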
Slide 36 Spotting Problems with Linearity or Homoscedasticity

Slide 37 Mistake #3: Ignoring Missing Data

Slide 38 Missing Data

Slide 39 Amount of Missing Data
"It is the lion you don't see that eats you."
• The APA Task Force on Statistical Inference (1999) recommended that researchers report patterns of missing data and the statistical techniques used to address the problems such data create
• Report as a percentage of complete data
– "Missing data ranged from a low of 4% for attachment anxiety to a high of 12% for depression."
• If calculating total or scale scores, impute the values for the items first, then calculate the scale

Slide 40 Pattern of Missing Data
• Missing Completely At Random (MCAR)
– No pattern; not related to variables
– Accidentally skipped one; got distracted
• Missing At Random (MAR)
– Pattern does not differ between groups
• Not Missing At Random (NMAR)
– Parents who feel competent are more likely to skip the question about interest in parenting classes

Slide 41 Pattern of Missing Data
Distinguish between MCAR and MAR
• Create a dummy variable with two values: missing and non-missing
– SPSS: recode into a new variable
• Test the relation between the dummy variable and the variables of interest
– If not related: data are either MCAR or NMAR
– If related: data are MAR or NMAR
• Little's (1988) MCAR test
– Missing Values Analysis add-on module in SPSS 20
– A non-significant p value for this test indicates the data are MCAR

Slide 42 What if my Data are NMAR?
• You're not screwed
• Report the pattern and amount of missing data

Slide 43 Listwise Deletion
• Cases with any missing values are deleted from the analysis
– Default procedure for SPSS
• Problems
– If the cases are not MCAR, the remaining cases are a biased subsample of the total sample
– The analysis will be biased
– Loss of statistical power
• A dataset of 302 respondents dropped to 154 cases

Slide 44 Pairwise Deletion
• Cases are excluded only if data are missing on a required variable
– Correlating five variables: a case missing data on one variable would still be used on the other four
• Problems
– Uses different cases for each correlation (n fluctuates)
– Difficult to compare correlations
– May distort multivariate analyses

Slide 45 Mean Substitution
• Missing values are imputed with the mean value of that variable
• Problems
– Produces biased means with data that are MAR or NMAR
– Underestimates variance and correlations
• Experts strongly advise against this method

Slide 46 Regression Substitution
• Existing scores are used to predict missing values
• Properties and problems
– Produces unbiased means under MCAR or MAR
– Produces biases in the variances
• Experts advise against this method

Slide 47 Pattern-Matching Imputation
• Hot-deck imputation
– Values are imputed by finding participants who match the case with missing data on other variables
• Cold-deck imputation
– Information from external sources is used to determine the matching variables

Slide 48
• Does not require specialized programs
• Has been used with survey data
• Reduces the amount of variation in the data
Stochastic Imputation Methods
• Stochastic = random
– Does not systematically change the mean; gives unbiased variance estimates
• Maximum likelihood (ML) strategies
– Observed data are used to estimate parameters, which are then used to estimate the missing scores
– Provides "unbiased and efficient" parameters
– Useful for exploratory factor analysis and internal consistency calculations

Slide 49 Multiple
Imputation (MI)
• Create several imputed data sets (3–5)
• Analyze each data set and save the parameter estimates
• Average the parameter estimates to get an unbiased parameter estimate
– Most complex procedure
– Computer-intensive

Slide 50 Handling Missing Data
• Read: examine the published literature to find similar situations
• Choose an appropriate method
– Expectation maximization
– Multiple imputation
– Maximum likelihood
• Report the method chosen to handle the data and give a brief rationale for that selection

Slide 51 Mistake #4: Ignoring Outliers

Slide 52 Outliers
Outliers can change the nature of the relationship

Slide 53 Outliers (extreme example)
• Univariate outlier
– "Outliers are people, too."
– Check for them
• Multivariate outlier
– Should be removed
– Find with the Mahalanobis distance test

Slide 54 Spotting Outliers with Graphs (MusicFestival.sav)

Slide 55

Slide 56 Mistake #5: Ignoring Effect Size

Slide 57 Ignoring Effect Size
• Effect size is the magnitude of the findings
• Post hoc: easy, non-controversial
• A priori: used for statistical power analysis
• Power: probability of rejecting the null when the null is false
– The ability to find a difference where one exists

Slide 58 More Significant?
• Imagine you are comparing two tests. The first test is significant: z = 2.01, p < .05, two-tailed. The second is significant: z = 8.37, p < .0001, two-tailed.
• Is the second more significant than the first?
– No, it is only a less likely result. We want to know how BIG the effect was.

Slide 59 How Does Significance Differ from Effect Size?
You failed to record a 25¢ charge to your checking account. Was your 25¢ deficit due to random variation, or was it a real mistake? A real mistake. Will that mistake have a big effect? No. A real effect, but a small one.
You recorded a $200 payment as a $200 deposit. Was your $400 deficit due to random variation, or was it a real mistake? A real mistake. Will that mistake have a big effect? Yes.
A real effect and a large effect size.

Slide 60 Effect Size
• How big was the effect the treatment had?
– The critical value does not tell you the effect size
• Hypothesis testing tells you whether an effect is significant
– You should also report the effect size
– r
– d

Slide 61 Cohen's d Effect Size
r = .1, d = .2: small effect; the effect explains 1% of the total variance
r = .3, d = .5: medium effect; the effect accounts for 9% of the total variance
r = .5, d = .8: large effect; the effect accounts for 25% of the variance
Free effect size calculator at: http://www.missouristate.edu/rstats/110161.htm

Slide 62 Mistake #6: Making Continuous Categorical
Mean or Median Splits

Slide 63 Making Continuous Categorical

Slide 64 Bad Idea—Don't Do It
• Results in:
– Lost information—why throw away all those data?
– Reduced statistical power
– Increased likelihood of Type II error
• Only justified when:
– The distribution of the variable is highly skewed
– The variable's relationship to another variable is non-linear

Slide 65 Mistake #7: Misunderstood Analysis
"But that's how we have always done it."

Slide 66 MANOVA then ANOVA
• Study of 222 MANOVAs in six journals
• Common: MANOVA followed by ANOVAs
• MANOVA controls for Type I error
• Protected F test
• "A significant MANOVA difference need not imply that any significant ANOVA effect or effects exist…" – Huberty & Morris, 1989

Slide 67 When to Use ANOVA
• Outcome variables are conceptually independent
– Effects of using clickers, teacher interaction, and student ability on Algebra concept attainment, Geometry concept attainment, Musical concept attainment, and classroom interaction?
– Use four 3-Way ANOVAs

Slide 68 When to Use ANOVA
• Research is exploratory
– Study of new treatment or outcome variables
– Non-confirmatory
• Reexamine bivariate relationships in a multivariate context
– Outcome variables were previously studied in univariate contexts
– Useful for comparisons

Slide 69 When to Use ANOVA
• Selecting a comparison group
– Demonstrate that two or more groups are similar on a number of descriptors
• Problem
– If both IV-1 and IV-2 are significant, but IV-2 is highly correlated with IV-1, then IV-2 is not really contributing
– MANOVA can control for this

Slide 70 When to Use MANOVA
• Are there any overall interactions or main effects present?
• Variable selection
– Do I need all these DVs?
– Find the parsimonious DV combination
• Variable ordering
– Assess the relative contribution of DVs to group differences
• Variable system structure

Slide 71 When to Use MANOVA
• Variable system structure
– Identify a construct that underlies the DVs
• More of an art than statistical science
– System: a collection of conceptually related variables that underlies a construct
– Five attitude DVs, reduced to 2 (Watterson, Joe, Cole, & Sells, 1980)
– Reduced 21 DVs on student performance to 2 constructs: academic performance and personal growth (Hackman & Taber, 1979)

Slide 72 So
• MANOVA and ANOVA address different research questions
– One may have little bearing on the other
• Controlling for Type I errors with a preliminary MANOVA is a myth
• Whether using MANOVA or multiple ANOVAs, report the intercorrelations of the variables.
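The d and r effect sizes from Slide 61 can be computed directly. A minimal sketch on simulated groups (the d-to-r conversion below assumes equal group sizes, and Cohen's r and d benchmarks are separate conventions rather than exact conversions of one another):

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

def d_to_r(d):
    """Approximate r from d, assuming equal group sizes."""
    return d / np.sqrt(d ** 2 + 4)

rng = np.random.default_rng(0)
treatment = rng.normal(0.5, 1.0, 200)   # true standardized difference: 0.5
control = rng.normal(0.0, 1.0, 200)

d = cohens_d(treatment, control)
r = d_to_r(d)
print(f"d = {d:.2f}, r = {r:.2f}, variance explained = {r**2:.1%}")
```

Reporting d or r alongside the p value answers the question Slide 58 raises: not "how unlikely was this result?" but "how big was the effect?"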
Slide 73 Confidence Intervals

Slide 74 Confidence Intervals
• When estimating the population mean, the best guess is the sample mean
• The sample mean is very precise, but it is unlikely to be 100% accurate
– Any outcome has some measurement error

Slide 75 Confidence Intervals
• Another way to estimate the population value is a confidence interval
– The mean should be between this and that
• The confidence interval is not very specific, but we are very confident that the real mean is contained within its range
– The average movie ticket is $6.83
– Tickets will probably cost between $5 and $8

Slide 76 Confidence Intervals
• Dugong et al. (2008)
– Plankton consumption by sharks at the National Aquarium
• True mean (all basking sharks): 15 million
• Sample mean (sharks at the National Aquarium): 17 million
• Confidence interval estimates
– 12 to 22 million (contains the true value)
– 16 to 18 million (misses the true value)
– CIs are constructed such that 95% of them contain the true value.

Slide 77 Basking Shark
FIGURE 2. The confidence intervals for the number of plankton (in millions) consumed by a basking shark at one time (horizontal axis) for 50 different samples (vertical axis)

Slide 78 Moving Beyond NHST: Next Steps

Slide 79 The Four Parameters
1. Alpha significance criterion (p < .05)
2. The sample size
3. The population effect size
4. The power of the test
Any one is a function of the other three.

Slide 80 1. Power
• Before conducting a study, you should do a power analysis
• Power is the probability of not making a Type II error
– Power = 1 − β
• We find the effect when it is truly there
– We want to maximize power
D. Wayne Mitchell, PhD

Slide 81 Type I and Type II Errors
• Type I error
– Occurs when we believe that there is a genuine effect in our population when, in fact, there isn't.
– The probability is the α-level (usually .05)
• Type II error
– Occurs when we believe that there is no effect in the population when, in reality, there is.
– The probability is the β-level (often .2)

Slide 82 2. Effect Size
• A significant alpha tells us the results were (most likely) not accidental
• Effect size tells us whether the effect was large or small
– Gas prices
• Effect size can be used in meta-analysis
Melissa Meier, PhD

Slide 83 3. Complex Relationships
• NHST tells us that differences exist between groups
• Complex relationships can exist among variables
• Structural equation modeling
Kayla Jordan, RStats

Slide 84 4. Mediation and Moderation
• NHST tells us what differences exist
• Mediation tells us how relationships between variables change
• Moderation tells us when relationships between variables exist
Todd Daniel, PhD

Slide 85 Take a Break

Slide 86
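The confidence-interval and power ideas from Slides 74–80 can both be sketched with SciPy. This is a minimal, illustrative implementation, not the camp's own procedure; the power function uses the standard noncentral-t approximation for a two-sample t test.

```python
import numpy as np
from scipy import stats

def mean_ci(sample, confidence=0.95):
    """t-based confidence interval for a population mean."""
    m, se = np.mean(sample), stats.sem(sample)
    half_width = se * stats.t.ppf((1 + confidence) / 2, df=len(sample) - 1)
    return m - half_width, m + half_width

def two_sample_power(d, n_per_group, alpha=0.05):
    """Power (1 - beta) of a two-sided, two-sample t test, via the noncentral t."""
    df = 2 * n_per_group - 2
    noncentrality = d * np.sqrt(n_per_group / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, noncentrality)
            + stats.nct.cdf(-t_crit, df, noncentrality))

# A medium effect (d = .5) needs roughly 64 participants per group
# to reach the conventional power of .80
print(round(two_sample_power(0.5, 64), 2))
```

Running 50 simulated samples through `mean_ci` would reproduce the Slide 77 picture: roughly 95% of the intervals cover the true mean, and any single interval either contains it or misses it. Fixing any three of the four parameters on Slide 79 (alpha, n, effect size, power) determines the fourth, which is exactly what `two_sample_power` demonstrates.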