RStats Camp Best Practices - Missouri State University

RStats Statistics and Research Camp 2014
Welcome!
Slide 1
Helen Reid, PhD
Dean of the College of Health and
Human Services
Missouri State University
Slide 2
Welcome
• Goal: keeping up with advances in
quantitative methods
• Best practices: use familiar tools in new
situations
– Avoid common mistakes
Todd Daniel, PhD
Slide 3
RStats Institute
Coffee Advice
Ronald A. Fisher
Slide 4
Box, 1976
Cambridge, England 1920
Slide 5
Dr. Muriel Bristol
Familiar Tools
• Null Hypothesis Significance Testing (NHST)
• a.k.a. Null Hypothesis Decision Making
• a.k.a. Statistical Hypothesis Inference Testing
p < .05
The probability of finding these results
given that the null hypothesis is true
Slide 6
Benefits of NHST
• All students trained in NHST
• Easier to engage researchers
• Results are not complex
Everybody is doing it
Slide 7
Statistically Significant Difference
What does p < .05 really mean?
1. There is a 95% chance that the
alternative hypothesis is true
2. This finding will replicate 95% of the
time
3. If the study was repeated, the null
would be rejected 95% of the time
Slide 8
The Earth is Round, p < .05
What you want…
• The probability that a hypothesis is true
given the evidence
What you get…
• The probability of the evidence assuming
that the (null) hypothesis is true.
Slide 9
Cohen, 1994
Pigs Might Fly
Slide 10
• Null: There are no flying pigs
H0: P = 0
• Random sample of 30 pigs
• One can fly 1/30 = .033
• What kind of test?
– Chi-Square?
– Fisher exact test?
– Binomial?
Slide 11
Why do you even need a test?
Daryl Bem and ESP
• Assumed random guessing p = .50
• Found subject success of 53%, p < .05
• Too much power?
– Everything counts in large amounts
• What if Bem set p = 0 ?
– One clairvoyant v. group that guesses 53%
In the real world, the null is always false.
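The "too much power" point can be sketched without any ESP data at all: the same 53% hit rate against a null of p = .50 is non-significant in a small sample but highly significant in a huge one. A minimal illustration in Python (scipy assumed available; the counts are made up, not Bem's data):

```python
# Same 53% "success rate," two sample sizes, null of p = .50.
from scipy.stats import binomtest

p_small = binomtest(53, n=100, p=0.5).pvalue      # 53/100 correct
p_large = binomtest(5300, n=10000, p=0.5).pvalue  # 5300/10000 correct

print(p_small)  # not significant
print(p_large)  # highly significant: with a large n, everything counts
```

The effect (3 percentage points) is identical in both tests; only the sample size changed.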
Slide 12
Problems with NHST
Cohen (1994) reservations
Non-replicable findings
Poor basis for policy decisions
False sense of confidence
NHST is “a potent but sterile intellectual rake who
leaves in his merry path a long train of ravished
maidens but no viable scientific offspring”
- Paul Meehl
Slide 13
Cohen, 1994
What to do then?
• Learn basic methods that improve your
research (learn NHST)
• Learn advanced techniques and apply
them to your research (RStats Camp)
• Make professional connections and
access professional resources
Slide 14
Agenda
9:30 Best Practices (and Poor Practices)
In Data Analysis
11:00 Moderated Regression
12:00 – 12:45 Lunch (with Faculty Writing Retreat)
1:00 Effect Size and Power Analysis
2:00 Meta-Analysis
3:00 Structural Equation Modeling
Slide 15
RStats Statistics and Research Camp 2014
Best Practices and Poor
Practices
Session 1
R. Paul Thomlinson PhD
Burrell
Slide 16
Poor Practices
Common Mistakes
Slide 17
Mistake #1
No Blueprint
Slide 18
Mistake #2
Ignoring Assumptions
Pre-Checking Data Before Analysis
Slide 19
Assumptions Matter
• Data: I call you and you don’t answer.
• Conclusion: you are mad at me.
• Assumption: you had your phone with you.
If my assumptions are wrong, it prevents
me from looking at the world accurately
Slide 20
Assumptions
for Parametric Tests
• "Assumptions behind models are rarely articulated,
let alone defended. The problem is exacerbated
because journals tend to favor a mild degree of
novelty in statistical procedures. Modeling, the
search for significance, the preference for novelty,
and the lack of interest in assumptions -- these
norms are likely to generate a flood of
nonreproducible results."
– David Freedman, Chance 2008, v. 21 No 1, p. 60
Slide 21
Assumptions
for Parametric Tests
• "... all models are limited by the validity of the
assumptions on which they ride."
Collier, Sekhon, and Stark, Preface (p. xi) to
Freedman David A., Statistical Models and Causal
Inference: A Dialogue with the Social Sciences.
• Parametric tests based on the normal
distribution assume:
– Interval or Ratio Level Data
– Independent Scores
– Normal Distribution of the Population
– Homogeneity of Variance
Slide 22
Assessing the Assumptions
• Assumption of Interval or Ratio Data
– Look at your data to make sure you are
measuring using scale-level data
– This is common and easily verified
Slide 23
Independence
• Techniques are least likely to be robust to
departures from assumptions of independence.
• Sometimes a rough idea of whether or not
model assumptions might fit can be obtained by
either plotting the data or plotting residuals
obtained from a tentative use of the
model. Unfortunately, these methods are
typically better at telling you when the model
assumption does not fit than when it does.
Slide 24
Independence
• Assumption of Independent Scores
– Done during research construction
– Each individual in the sample should be
independent of the others
• The errors in your model should not be
related to each other.
• If this assumption is violated:
– Confidence intervals and significance tests will
be invalid.
Slide 25
Assumption of Normality
• You want your distribution
to not be skewed
• You want your distribution
to not have kurtosis
– At least, not too much of either
Slide 26
Normally Distributed Something or
Other
• The normal distribution is relevant to:
– Parameters
– Confidence intervals around a parameter
– Null hypothesis significance testing
• This assumption tends to get incorrectly
translated as ‘your data need to be normally
distributed’.
Slide 27
Assumption of Normality
• Both skew and kurtosis can be
measured with a simple test run for
you in SPSS
– Values exceeding +3 or -3 indicate severe skew or kurtosis
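The slide refers to SPSS output; a rough non-SPSS equivalent of the same check can be sketched with scipy (note SPSS reports the statistics with standard errors, while this just computes the sample values):

```python
# Compute sample skew and excess kurtosis for a roughly normal sample.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=500)

sk = skew(x)
ku = kurtosis(x)   # "excess" kurtosis: 0 for a perfectly normal distribution
print(sk, ku)      # both near 0, comfortably inside the +/-3 rule of thumb
```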
Slide 28
Assessing Normality with Numbers
Slide 29
Tests of Normality
• Kolmogorov-Smirnov Test (SPSSExam.sav)
– Tests if data differ from a normal distribution
– Significant = non-Normal data
– Non-Significant = Normal data
• Non-Significant is the ideal
Slide 30
The P-P Plot
Slide 31
SPSSExam.sav
Histograms & Stem-and Leaf Plots
Slide 32
Double-click on Histogram in Output window to add the normal curve
When does the Assumption of
Normality Matter?
• Normality matters most in small samples
– The central limit theorem allows us to forget
about this assumption in larger samples.
• In practical terms, as long as your sample is
fairly large, outliers are a much more
pressing concern than normality
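The central limit theorem claim above is easy to verify by simulation: means of samples drawn from a heavily skewed population are themselves close to normally distributed.

```python
# CLT demonstration: sample means from a skewed population.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
population = rng.exponential(scale=1.0, size=100_000)  # skew near 2

sample_means = np.array([
    rng.choice(population, size=50).mean() for _ in range(2000)
])

print(skew(population))    # strongly skewed
print(skew(sample_means))  # much closer to 0: roughly normal
```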
Slide 33
Assessing the Assumptions
• Assumption of Homogeneity of
Variance
– Only necessary when comparing groups
– Levene’s Test
Slide 34
Assessing Homogeneity of Variance
Graphs
Number of hours of ringing in ears after a concert
Slide 35
Assessing Homogeneity of Variance
Numbers
• Levene’s Tests
– Tests if variances in different groups are the
same.
– Significant = Variances not equal
– Non-Significant = Variances are equal
• Non-Significant is ideal
• Variance Ratio (VR)
– With 2 or more groups
– VR = Largest variance/Smallest variance
– If VR < 2, homogeneity can be assumed.
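Both checks on this slide (Levene's test and the variance ratio) can be sketched outside SPSS with scipy and numpy:

```python
# Levene's test plus the VR rule of thumb on simulated groups.
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(7)
group1 = rng.normal(100, 15, 100)  # SD 15
group2 = rng.normal(100, 15, 100)  # same SD: homogeneous variances
group3 = rng.normal(100, 45, 100)  # SD 45: clearly unequal variances

p_equal = levene(group1, group2).pvalue
p_unequal = levene(group1, group3).pvalue

variances = [np.var(g, ddof=1) for g in (group1, group2)]
vr = max(variances) / min(variances)  # VR < 2 -> assume homogeneity

print(p_equal, p_unequal, vr)
```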
Slide 36
Spotting problems with Linearity or
Homoscedasticity
Slide 37
Mistake #3
Ignoring Missing Data
Slide 38
Missing Data
Slide 39
It is the lion you don’t see that eats you
Amount of Missing Data
• APA Task Force on Statistical Inference (1999)
recommended that researchers report patterns
of missing data and the statistical techniques
used to address the problems such data create
• Report as a percentage of complete data
– “Missing data ranged from a low of 4% for
attachment anxiety to a high of 12% for
depression.”
• If calculating total or scale scores, impute
the values for the items first, then calculate
scale
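Reporting missing data as a percentage per variable is a one-liner with pandas (the variable names below are made up to mirror the slide's example):

```python
# Percentage of missing data per variable, APA-style reporting.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "attachment_anxiety": [2.0, np.nan, 3.5, 4.0, 1.5,
                           2.5, np.nan, 3.0, 2.0, 4.5],
    "depression":         [1.0, 2.0, np.nan, np.nan, 3.0,
                           np.nan, 2.5, 1.5, 2.0, 3.5],
})

pct_missing = (df.isna().mean() * 100).round(1)
print(pct_missing)  # 20.0% for attachment_anxiety, 30.0% for depression
```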
Slide 40
Pattern of Missing Data
• Missing Completely At Random (MCAR)
– No pattern; not related to variables
– Accidentally skipped one; got distracted
• Missing At Random (MAR)
– Missingness is related to other observed
variables, but not to the missing value itself
• Not Missing At Random (NMAR)
– Parents who feel competent are more likely to skip the question
about interest in parenting classes
Slide 41
Pattern of Missing Data
Distinguish between MCAR and MAR
• Create a dummy variable with two values:
missing and non-missing
– SPSS: recode new variable
• Test the relation between dummy variable and
the variables of interest
– If not related: data are either MCAR or NMAR
– If related: data are MAR or NMAR
• Little’s (1988) MCAR Test
– Missing Values Analysis add-on module in SPSS
20
– If the p value for this test is not significant, indicates
data are MCAR
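The dummy-variable check described above can be sketched directly: code each case as missing/non-missing, then test whether that indicator relates to another observed variable. Here missingness is simulated to depend on the observed score, so the test comes out significant (i.e., the data are not MCAR):

```python
# Dummy-variable missingness check on simulated (non-MCAR) data.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
competence = rng.normal(0, 1, 300)
answered = competence < 0.5   # simulate: high scorers skip the question

p = ttest_ind(competence[answered], competence[~answered]).pvalue
print(p)  # significant: missingness is related to an observed variable
```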
Slide 42
What if my Data are NMAR?
• You’re not screwed
• Report the pattern and amount of
missing data
Slide 43
Listwise Deletion
• Cases with any missing values are
deleted from analysis
– Default procedure for SPSS
• Problems
– If the cases are not MCAR remaining cases
are a biased subsample of the total sample
– Analysis will be biased
– Loss of statistical power
• Dataset of 302 respondents dropped to 154
cases
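Listwise deletion is easy to demonstrate with pandas (SPSS's default behavior, sketched): any case missing a value on any analysis variable is dropped, and the sample can shrink dramatically.

```python
# Listwise deletion: only fully complete cases survive.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0, 5.0],
    "y": [2.0, np.nan, 3.0, 4.0, 6.0],
    "z": [1.0, 1.0, 2.0, np.nan, 2.0],
})

complete = df.dropna()           # listwise deletion
print(len(df), len(complete))    # 5 cases shrink to 2 complete cases
```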
Slide 44
Pairwise Deletion
• Cases are excluded only if data are
missing on a required variable
– Correlating five variables: case that was
missing data on one variable would still be
used on the other four
• Problems
– Uses different cases for each correlation (n
fluctuates)
– Difficult to compare correlations
– May mess with multivariate analyses
Slide 45
Mean Substitution
• Missing values are imputed with the
mean value of that variable
• Problems
– Produces biased means with data that are
MAR or NMAR
– Underestimates variance and correlations
• Experts strongly advise against this
method
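The variance problem the slide warns about is mechanical and easy to see: filling missing values with the mean adds points exactly at the mean, which shrinks the variance.

```python
# Mean substitution shrinks variance.
import numpy as np
import pandas as pd

s = pd.Series([10.0, 12.0, np.nan, 15.0, np.nan, 20.0, 8.0, np.nan])

var_observed = s.var()                   # variance of observed scores only
var_filled = s.fillna(s.mean()).var()    # after mean substitution

print(var_observed, var_filled)          # the filled variance is smaller
```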
Slide 46
Regression Substitution
• Existing scores are used to predict
missing values
• Produces unbiased means under MCAR or
MAR
• Problem
– Produces biases in the variances
• Experts advise against this method
Slide 47
Pattern-Matching Imputation
• Hot-Deck Imputation
– Values are imputed by finding participants who
match the case with missing data on other
variables
• Cold-Deck Imputation
– Information from external sources is used to
determine the matching variables
• Does not require specialized programs
• Has been used with survey data
• Reduces the amount of variation in the
data
Slide 48
Stochastic Imputation Methods
• Stochastic = random
– Does not systematically change the mean;
gives unbiased variance estimates
• Maximum Likelihood (ML) Strategies
– Observed data are used to estimate
parameters, which are then used to estimate
the missing scores
– Provides “unbiased and efficient” parameters
– Useful for exploratory factor analysis and
internal consistency calculations
Slide 49
Multiple Imputation (MI)
• Create several imputed data sets (3 – 5)
• Analyze each data set and save the
parameter estimates
• Average the parameter estimates to get
an unbiased parameter estimate
– Most complex procedure
– Computer-intensive
Slide 50
Handling Missing Data
• Read: Examine published literature to
find similar situations
• Choose an appropriate method
– Expectation maximization
– Multiple imputation
– Maximum likelihood
• Report the method chosen to handle the
data and give a brief rationale for that
selection
Slide 51
Mistake #4
Ignoring Outliers
Slide 52
Outliers
Outliers can change the nature of the relationship
Slide 53
Extreme example
Outliers
• Univariate outlier
– “Outliers are people, too.”
– Check for
• Multivariate outlier
– Should be removed
– Find with Mahalanobis test
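The Mahalanobis check mentioned above can be sketched outside SPSS (which can save these distances during a regression run). Distances are compared against a chi-square cutoff, commonly at p < .001:

```python
# Flag multivariate outliers via squared Mahalanobis distance.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
X = rng.normal(0, 1, size=(100, 2))
X[0] = [6.0, -6.0]                 # plant an obvious multivariate outlier

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mean
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared distances

cutoff = chi2.ppf(0.999, df=2)     # p < .001 criterion, df = # variables
outliers = np.where(d2 > cutoff)[0]
print(outliers)                    # the planted case 0 is flagged
```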
Slide 54
Spotting Outliers With Graphs
MusicFestival.sav
Outlier
Slide 55
Slide 56
Mistake #5
Ignoring Effect Size
Slide 57
Ignoring Effect Size
Effect size is the magnitude of the findings
Post hoc: easy, non-controversial
A priori: used for statistical power analysis
Power: probability of rejecting the null
when the null is false
– The ability to find a difference where one exists
Slide 58
More Significant?
• Imagine you are comparing two tests.
– The first test is significant, z = 2.01, p < .05,
two-tailed
– The second is significant, z = 8.37, p < .0001,
two-tailed
• Is the second more significant than the
first?
– No, it is only a less likely result.
We want to know how BIG the effect was
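The two p values on this slide are easy to reproduce; the point is that the second is smaller only in the sense of "less likely under the null," not "bigger effect."

```python
# Two-tailed p values for the slide's z statistics.
from scipy.stats import norm

p1 = 2 * norm.sf(2.01)   # z = 2.01
p2 = 2 * norm.sf(8.37)   # z = 8.37

print(p1)   # about .044, just under .05
print(p2)   # vanishingly small, but not a "bigger" effect
```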
Slide 59
How does Significance Differ From
Effect Size
You failed to record a 25¢ charge to your checking
account
Was your 25¢ deficit due to random variation or was it a real
mistake.
Real mistake
Will that mistake have a big effect?
No. Real effect but a small effect
You recorded a $200 payment as a $200 deposit
Was your $400 deficit due to random variation or was it a real
mistake.
Real mistake
Will that mistake have a big effect?
Yes. Real effect and a large effect size
Slide 60
Effect Size
• How big was the effect the treatment
had
– Critical value does not tell you effect size
• Hypothesis testing tells if an effect is
significant
– You should also report the effect size
–r
–d
Slide 61
Cohen’s d Effect Size
r = .1, d = .2: small effect; explains 1% of the total variance
r = .3, d = .5: medium effect; accounts for 9% of the total variance
r = .5, d = .8: large effect; accounts for 25% of the variance
Free effect size calculator at:
http://www.missouristate.edu/rstats/110161.htm
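As an alternative to a calculator, both statistics can be computed directly. A sketch with made-up group scores (the r/d pairings on the slide are the usual approximate benchmarks, and the d-to-r conversion below assumes roughly equal group sizes):

```python
# Cohen's d from two groups, then the standard d-to-r conversion.
import numpy as np

group_a = np.array([12.0, 14.0, 11.0, 15.0, 13.0, 12.0, 14.0, 13.0])
group_b = np.array([10.0, 11.0, 9.0, 12.0, 10.0, 11.0, 9.0, 10.0])

n_a, n_b = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n_a - 1) * group_a.var(ddof=1)
                     + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2))
d = (group_a.mean() - group_b.mean()) / pooled_sd

r = d / np.sqrt(d**2 + 4)   # d-to-r conversion for equal-n groups
print(d, r, r**2)           # r**2 = proportion of variance explained
```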
Slide 62
Mistake #6
Making Continuous Categorical
Mean or Median Splits
Slide 63
Making Continuous Categorical
Slide 64
Bad Idea—Don’t Do It
• Results in:
– Lost information—why throw away all those
data??
– Reduced statistical power
– Increased likelihood of Type II error
• Only justified when:
– Distribution of the variable is highly skewed
– The variable’s relationship to another variable is
non-linear.
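The information loss from a median split is visible in a short simulation: dichotomizing a continuous predictor weakens its observed correlation with the outcome.

```python
# Median split vs. full continuous predictor.
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(0, 1, 500)
y = 0.6 * x + rng.normal(0, 1, 500)         # genuine linear relation

x_split = (x > np.median(x)).astype(float)  # median split: high vs. low

r_full = np.corrcoef(x, y)[0, 1]
r_split = np.corrcoef(x_split, y)[0, 1]
print(r_full, r_split)   # the split correlation is noticeably smaller
```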
Slide 65
Mistake #7
Misunderstood Analysis
“But that’s how we have
always done it.”
Slide 66
MANOVA then ANOVA
• Study of 222 MANOVAs in six journals
• Common: MANOVA followed by ANOVAs
• MANOVA controls for Type I error
• Protected F Test
• “A significant MANOVA difference need not
imply that any significant ANOVA effect or
effects exist…”
Huberty & Morris, 1989
Slide 67
When to Use ANOVA
• Outcome variables are conceptually
independent
– Effects of using clickers, teacher interaction, and
student ability on Algebra concept attainment,
Geometry concept attainment, Musical concept
attainment, and classroom interaction?
– Use four 3-Way ANOVAs
Slide 68
When to Use ANOVA
• Research is exploratory
– Study of new treatment or outcome variables
– Non-confirmatory
• Reexamine bivariate relationships in
multivariate context
– Outcome variables were previously studied in
univariate contexts
– Useful for comparisons
Slide 69
When to Use ANOVA
• Selecting a comparison group
– Demonstrate that two or more groups are
similar on a number of descriptors
• Problem
• If both IV-1 and IV-2 are significant, but IV-2
is highly correlated with IV-1, then IV-2 is
not really contributing
– MANOVA can control for this
Slide 70
When to Use MANOVA
• Are there any overall interactions or main
effects present?
• Variable Selection
– Do I need all these DVs?
– Find the parsimonious DV combination
• Variable Ordering
– Assess the relative contribution of DVs to
group differences
• Variable System Structure
Slide 71
When to Use MANOVA
• Variable System Structure
– Identify a construct that underlies the DVs
• More of an art than statistical science
– System: collection of conceptually related
variables that underlies a construct
– Five attitude DVs, reduced to 2 (Watterson, Joe, Cole, & Sells, 1980)
– Reduced 21 DVs on student performance to 2
constructs: academic performance and personal
growth (Hackman & Taber, 1979)
Slide 72
So
• MANOVA and ANOVA address different
research questions
– One may have little bearing on the other
• Controlling for Type I errors with
preliminary MANOVA is a myth
• Whether using MANOVA or multiple
ANOVAs, report the intercorrelation of the
variables.
Slide 73
Confidence Intervals
Slide 74
Confidence Intervals
• When estimating the population
mean, the best guess is the sample
mean
• The sample mean is very precise,
but it is unlikely to be
100% accurate
– Any outcome has some
measurement error
Slide 75
Confidence Intervals
• Another way to estimate the population
value is a Confidence Interval
– The mean should be between this and that
• The confidence interval is not very
specific but we are very confident that
the real mean is contained within its
range
– The average movie ticket is $6.83
– Tickets will probably cost between $5 and $8
Slide 76
Confidence Intervals
• Dugong et al. (2008)
– Plankton consumption by sharks at National
Aquarium
• True Mean (all basking sharks)
– 15 Million
• Sample Mean (sharks at National Aquarium)
– 17 Million
• Confidence Interval estimate
– 12 to 22 million (contains true value)
– 16 to 18 million (misses true value)
– CIs constructed such that 95% of the
CIs contain the true value.
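The "95% of CIs contain the true value" claim can be checked by simulation: build many intervals from repeated samples and count how often they capture the true mean (hypothetical numbers below, not the shark data).

```python
# Coverage of 95% confidence intervals, by simulation.
import numpy as np

rng = np.random.default_rng(8)
true_mean, sd, n = 15.0, 3.0, 40   # hypothetical "millions of plankton"

hits = 0
reps = 2000
for _ in range(reps):
    sample = rng.normal(true_mean, sd, n)
    half = 1.96 * sample.std(ddof=1) / np.sqrt(n)
    if sample.mean() - half <= true_mean <= sample.mean() + half:
        hits += 1

coverage = hits / reps
print(coverage)   # close to 0.95
```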
Slide 77
Basking Shark
FIGURE 2: The confidence intervals of the
number of plankton (in millions) consumed by a
basking shark at one time (horizontal axis)
for 50 different samples (vertical axis)
Slide 78
Moving Beyond NHST
Next Steps
Slide 79
The Four Parameters
1. Alpha significance criterion (p < .05)
2. The sample size
3. The population effect size
4. The power of the test.
Any one is a function of the other three
Slide 80
1. Power
• Before conducting a study, you should do a
power analysis
• Power is the probability of not making a
Type II error
– Power = 1 − β
• We find the effect when it is truly there
– We want to maximize power
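Power as "the probability of finding the effect when it is truly there" can be estimated by brute-force simulation, which avoids relying on any particular power-analysis tool (the d = 0.8, n = 30 values below are arbitrary choices for illustration):

```python
# Estimate power (and the Type I error rate) by simulating t tests.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(9)
reps, n, d = 2000, 30, 0.8

sig_effect = sum(
    ttest_ind(rng.normal(d, 1, n), rng.normal(0, 1, n)).pvalue < 0.05
    for _ in range(reps)
)
sig_null = sum(
    ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue < 0.05
    for _ in range(reps)
)

power = sig_effect / reps      # rejection rate when the effect is real
alpha_rate = sig_null / reps   # rejection rate when the null is true
print(power, alpha_rate)       # power well above alpha_rate (about .05)
```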
D. Wayne Mitchell PhD
Slide 81
Type I and Type II Errors
• Type I Error
– Occurs when we believe that there is a
genuine effect in our population, when
in fact there isn’t.
– The probability is the α-level (usually .05)
• Type II Error
– Occurs when we believe that there is
no effect in the population when, in
reality, there is.
– The probability is the β-level (often .2)
Slide 82
2. Effect Size
• A significant alpha tells us the results were
(most likely) not accidental
• Effect size tells us whether the effect was
large or small
– Gas prices
• Effect size can be used in meta-analysis
Melissa Meier PhD
Slide 83
3. Complex Relationships
• NHST tells us that differences exist between
groups
• Complex relationships can exist among
variables
• Structural Equation Modeling
Kayla Jordan, RStats
Slide 84
4. Mediation and Moderation
• NHST tells us what differences exist
• Mediation tells us how relationships
between variables change
• Moderation tells us when relationships
between variables exist
Todd Daniel PhD
Slide 85
Take a Break
Slide 86