Effect size - Department of Education

Funded through the ESRC’s Researcher
Development Initiative
Session 1.2: Introduction
Prof. Herb Marsh
Ms. Alison O’Mara
Dr. Lars-Erik Malmberg
Department of Education,
University of Oxford
 Meta-analysis is an increasingly popular tool for
summarising research findings
 Cited extensively in research literature
 Relied upon by policymakers
 Important that we understand the method, whether
we conduct or simply consume meta-analytic
research
 Should be one of the topics covered in all
introductory research methodology courses
 Meta-analysis: a statistical analysis of a set of
estimates of an effect (the effect sizes), with the
goal of producing an overall (summary) estimate of
the effects. Often combined with analysis of
variables that moderate/predict this effect
 Systematic review: a comprehensive, critical,
structured review of studies dealing with a certain
topic. They are characterised by a scientific,
transparent approach to study retrieval and
analysis
 Most meta-analyses start with a systematic review
 Coding: the process of extracting the information
from the literature included in the meta-analysis.
Involves noting the characteristics of the studies in
relation to a priori variables of interest (qualitative)
 Effect size: the numerical outcome to be analysed
in a meta-analysis; a summary statistic of the data
in each study included in the meta-analysis
(quantitative)
 Summarise effect sizes: central tendency,
variability, relations to study characteristics
(quantitative)
The meta-analysis process:
1. Establish research question
2. Define relevant studies
3. Locate and collate studies
4. Develop code materials
5. Pilot coding; coding
6. Data entry and effect size calculation
7. Main analyses
8. Supplementary analyses
 Comparison of treatment & control groups
What is the effectiveness of a reading skills program for
treatment group compared to an inactive control group?
 Pretest-posttest differences
Is there a change in motivation over time?
 What is the correlation between two variables?
What is the relation between teaching effectiveness and
research productivity?
 Moderators of an outcome
Does gender moderate the effect of a peer-tutoring
program on academic achievement?
 Do you wish to generalise your findings to other
studies not in the sample?
 Do you have multiple outcomes per study? E.g.:
 achievement in different school subjects;
 5 different personality scales;
 multiple criteria of success
 Such questions determine the choice of meta-analytic model
 fixed effects
 random effects
 multilevel
Brown, S. A. (1990). Studies of educational interventions and outcomes in diabetic adults: A meta-analysis revisited. Patient Education and Counseling, 16, 189-215.
 Need to have explicit inclusion and exclusion criteria
 The broader the research domain, the more detailed
they tend to become
 Refine criteria as you interact with the literature
 Components of detailed criteria
 distinguishing features
 research respondents
 key variables
 research methods
 cultural and linguistic range
 time frame
 publication types
Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for
developing a coding scheme for meta-analysis. Western Journal of
Nursing Research, 25, 205-222
Search electronic databases (e.g., ISI,
Psychological Abstracts, Expanded Academic
ASAP, Social Sciences Index, PsycINFO, and
ERIC)
Examine the reference lists of included studies
to find other relevant studies
If including unpublished data, email researchers
in your discipline, take advantage of Listservs,
and search Dissertation Abstracts International
 “motivation” OR “job satisfaction” produces
ALL articles that contain EITHER motivation OR
job satisfaction anywhere in the text
 inclusive, larger yield
 “motivation” AND “job satisfaction” will capture
only those subsets that have BOTH motivation
AND job satisfaction anywhere in the text
 restrictive, smaller yield
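The OR/AND search logic above can be illustrated with a toy example. The abstracts and the `search` helper below are hypothetical, purely for illustration:

```python
# Toy illustration of Boolean search logic on a small set of abstracts.
abstracts = {
    1: "intrinsic motivation in the workplace",
    2: "job satisfaction among teachers",
    3: "motivation and job satisfaction of nurses",
    4: "classroom management strategies",
}

def search(terms, mode):
    """Return IDs of abstracts matching ALL terms ('and') or ANY term ('or')."""
    op = all if mode == "and" else any
    return {i for i, text in abstracts.items()
            if op(term in text for term in terms)}

or_hits = search(["motivation", "job satisfaction"], "or")    # inclusive
and_hits = search(["motivation", "job satisfaction"], "and")  # restrictive
print(sorted(or_hits))   # [1, 2, 3]  -> larger yield
print(sorted(and_hits))  # [3]        -> smaller yield
```

As the output shows, OR retrieves every abstract containing either term, while AND retrieves only the subset containing both.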
Screening steps:
1. Check abstract & title. Not relevant? DISCARD. Relevant? Continue.
2. Check the participants and results sections. Not relevant? DISCARD. Relevant? COLLECT.
 Inclusion process
usually requires
several steps to
cull inappropriate
studies
 Example from Bazzano, L. A., Reynolds, K., Holder, K. N., & He, J. (2006). Effect of Folic Acid Supplementation on Risk of Cardiovascular Diseases: A Meta-analysis of Randomized Controlled Trials. JAMA, 296, 2720-2726
 The researcher must have a thorough knowledge of
the literature.
 The process typically involves (Brown et al.,
2003):
a) reviewing a random subset of studies to be
synthesized,
b) listing all relevant coding variables as they appear
during the review,
c) including these variables in the coding sheet, and
d) pilot testing the coding sheet on a separate subset
of studies.
 Coded data usually fall into the following four basic
categories:
1. methodological features
 Study identification code
 Type of publication
 Year of publication
 Country
 Participant characteristics
 Study design (e.g., random assignment, representative
sampling)
2. substantive features
 Variables of interest (e.g., theoretical framework)
3. study quality
 ‘Total’ measure of quality & study design
4. outcome measures - Effect size information
 The code book guides the coding process
 Almost like a dictionary or manual
 “...each variable is theoretically and operationally
defined to facilitate intercoder and intracoder
agreement during the coding process. The
operational definition of each category should be
mutually exclusive and collectively exhaustive”
(Brown et al., 2003, p. 208).
Code Sheet (example entries)
Study ID: 1
Year of publication: 99
Publication type (1-5): 2
Geographical region (1-7): 1
Total sample size: 87
Total number of males: 41
Total number of females: 46
Code Book (excerpt)
Publication type (1-5):
1. Journal article
2. Book/book chapter
3. Thesis or doctoral dissertation
4. Technical report
5. Conference paper
 From Brown, et al. (2003).
 Code sheet = Table 1.
 Code book = Table 4.
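As an illustration only (not from Brown et al.), the kind of code sheet entry shown above could be held as a typed record, with the code book supplying the category labels. All names here are hypothetical:

```python
# Hypothetical sketch: one coding-sheet entry as a typed record,
# with the code book serving as the dictionary of category labels.
from dataclasses import dataclass

PUBLICATION_TYPES = {  # from the example code book
    1: "Journal article",
    2: "Book/book chapter",
    3: "Thesis or doctoral dissertation",
    4: "Technical report",
    5: "Conference paper",
}

@dataclass
class StudyRecord:
    study_id: int
    year: int              # two-digit year, as coded on the sheet
    publication_type: int  # key into PUBLICATION_TYPES
    region: int            # geographical region code (1-7)
    n_total: int
    n_male: int
    n_female: int

    def validate(self):
        # Mutually exclusive, exhaustive categories; counts must reconcile.
        assert self.publication_type in PUBLICATION_TYPES
        assert self.n_male + self.n_female == self.n_total

record = StudyRecord(1, 99, 2, 1, 87, 41, 46)
record.validate()
print(PUBLICATION_TYPES[record.publication_type])  # Book/book chapter
```

Encoding the code book as data makes the "mutually exclusive and collectively exhaustive" requirement checkable at entry time.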
Random selection of papers coded by both
coders
Meet to compare code sheets
Where there is discrepancy, discuss to reach
agreement
Amend code materials/definitions in code book
if necessary
May need to do several rounds of piloting, each
time using different papers
Coding should ideally be done independently by 2
or more researchers to minimise errors and
subjective judgements
Ways of assessing the amount of agreement
between the raters:
 Percent agreement
 Cohen’s kappa coefficient
 Correlation between different raters
 Intraclass correlation
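The first two agreement measures can be sketched for two coders rating the same studies on a categorical variable. The ratings below are hypothetical:

```python
# Minimal sketch: percent agreement and Cohen's kappa for two coders.
from collections import Counter

def percent_agreement(a, b):
    """Proportion of studies on which the two coders gave the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each coder's marginal category rates.
    p_exp = sum(ca[k] * cb[k] for k in set(a) | set(b)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)

coder1 = [1, 2, 2, 1, 3, 2, 1, 1]   # hypothetical codes
coder2 = [1, 2, 1, 1, 3, 2, 2, 1]
print(percent_agreement(coder1, coder2))          # 0.75
print(round(cohens_kappa(coder1, coder2), 3))     # 0.579
```

Kappa is lower than raw percent agreement because some matches would occur by chance alone, which is why it is usually preferred for categorical codes.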
 Lipsey & Wilson (2001) present many formulae for
calculating effect sizes from different information
 However, need to convert all effect sizes into a
common metric, typically based on the “natural”
metric given research in the area. E.g.:
 Standardized mean difference
 Odds-ratio
 Correlation coefficient
 Standardized mean difference
 Group contrasts
 Treatment groups
 Naturally occurring groups
 Inherently continuous construct
 Odds-ratio
 Group contrasts
 Treatment groups
 Naturally occurring groups
 Inherently dichotomous construct
 Correlation coefficient
 Association between variables
 Standardized mean difference:
ES = (Mean_Males - Mean_Females) / SD_pooled
 Odds ratio (2x2 table with cells a, b, c, d):
ES = ad / bc
 Correlation coefficient:
ES = r
 Effect sizes can be calculated from a range of reported statistics: means and standard deviations, correlations, p-values, F-statistics, t-statistics, d, and SE.
 From Brown et al. (2003).
 Table 3
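The standardized mean difference and odds ratio formulas above can be sketched directly; the input values here are hypothetical:

```python
# Sketch of two of the effect size metrics above.
from math import sqrt

def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """ES = difference in group means over the pooled standard deviation."""
    sd_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

def odds_ratio(a, b, c, d):
    """ES = ad / bc for a 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Hypothetical group summaries: means 52 vs 48, SD 10, n = 40 each.
print(round(standardized_mean_difference(52.0, 10.0, 40, 48.0, 10.0, 40), 2))  # 0.4
print(odds_ratio(20, 10, 10, 20))  # 4.0
```

For the correlation metric no computation is needed: the reported r is itself the effect size (ES = r).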
Includes the entire population of studies to be
considered; do not want to generalise to other
studies not included (e.g., future studies).
All of the variability between effect sizes is due to
sampling error alone. Thus, the effect sizes are only
weighted by the within-study variance.
Effect sizes are independent.
There are 2 general ways of conducting a fixed
effects meta-analysis: ANOVA & multiple regression
The analogue to the ANOVA homogeneity analysis
is appropriate for categorical variables
 Looks for systematic differences between groups of
responses within a variable
Multiple regression homogeneity analysis is more
appropriate for continuous variables and/or when
there are multiple variables to be analysed
 Tests the ability of groups within each variable to predict
the effect size
 Can include categorical variables in multiple regression
as dummy variables. (ANOVA is a special case of
multiple regression)
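The fixed effects weighting described above, with each effect size weighted only by its inverse within-study variance, can be sketched as follows, together with the Q homogeneity statistic. The effect sizes and variances are made up for illustration:

```python
# Fixed effects pooling: weights are inverse within-study variances.
effect_sizes = [0.30, 0.45, 0.20, 0.55]   # hypothetical study effect sizes
variances    = [0.02, 0.03, 0.01, 0.04]   # within-study sampling variances

weights = [1 / v for v in variances]
pooled = sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)
se_pooled = (1 / sum(weights)) ** 0.5

# Q statistic for the homogeneity test (chi-square with k - 1 df):
Q = sum(w * (es - pooled) ** 2 for w, es in zip(weights, effect_sizes))

print(round(pooled, 3))     # 0.306
print(round(se_pooled, 3))  # 0.069
```

Note how the study with the smallest variance (0.01) dominates the pooled estimate, pulling it toward 0.20; that is the defining behaviour of inverse-variance weighting.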
Is only a sample of studies from the entire
population of studies to be considered; want to
generalise to other studies not included (including
future studies).
Variability between effect sizes is due to sampling
error plus variability in the population of effects.
Effect sizes are independent.
 Variations in sampling schemes can introduce heterogeneity into the results, i.e., more than one intercept in the solution
 Heterogeneity: between-study variation in effect
estimates is greater than random (sampling)
variance
 Could be due to differences in the study design,
measurement instruments used, the researcher, etc
 Random effects models attempt to account for
between-study differences
 If the homogeneity test is rejected (it almost always
will be), it suggests that there are larger differences
than can be explained by chance variation (at the
individual participant level). There is more than one
“population” in the set of different studies.
 The random effects model helps to determine how
much of the between-study variation can be
explained by study characteristics that we have
coded.
 The total variance associated with the effect sizes
has two components, one associated with
differences within each study (participant level
variation) and one between study variance
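One common way to estimate the between-study component is the DerSimonian-Laird moment estimator of tau-squared; the slides do not name an estimator, so this choice is an assumption on our part. A sketch with hypothetical data:

```python
# Random effects weights add an estimated between-study variance (tau^2,
# DerSimonian-Laird moment estimator, an assumed choice) to each study's
# within-study variance. Data are hypothetical.
effect_sizes = [0.30, 0.45, 0.20, 0.55]
variances    = [0.02, 0.03, 0.01, 0.04]
k = len(effect_sizes)

w = [1 / v for v in variances]                      # fixed effects weights
fixed = sum(wi * es for wi, es in zip(w, effect_sizes)) / sum(w)
Q = sum(wi * (es - fixed) ** 2 for wi, es in zip(w, effect_sizes))
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)                  # between-study variance

w_re = [1 / (v + tau2) for v in variances]          # random effects weights
pooled_re = sum(wi * es for wi, es in zip(w_re, effect_sizes)) / sum(w_re)
print(round(tau2, 4), round(pooled_re, 3))
```

Because tau-squared is added to every study's variance, the weights become more equal across studies, so the random effects estimate sits slightly further from the most precise study than the fixed effects estimate does.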
Meta-analytic data is inherently hierarchical (i.e.,
effect sizes nested within studies) and has random
error that must be accounted for.
Effect sizes are not necessarily independent
Allows for multiple effect sizes per study
 Level 2: study component
 Publications
 Level 1: outcome-level component
 Effect sizes
Similar to a multiple regression equation, but
accounts for error at both the outcome (effect size)
level and the study level
Start with the intercept-only model, which
incorporates both the outcome-level and the study-level components (analogous to the random effects
model multiple regression)
Expand model to include predictor variables, to
explain systematic variance between the study
effect sizes
Fixed, random, or multilevel?
Generally, if more than one effect size per study is
included in sample, multilevel should be used
However, if there is little variation at the study level and/or if there are no predictors included in the model, the results of multilevel modelling meta-analyses are similar to random effects models
 Do you wish to generalise your findings to other studies not in the sample?
Yes – random effects or multilevel
No – fixed effects
 Do you have multiple outcomes per study?
Yes – multilevel
No – random effects or fixed effects
 Publication bias
 Fail-safe N (Rosenthal, 1991)
 Trim and fill procedure (Duval & Tweedie, 2000a, 2000b)
 Sensitivity analysis
 E.g., Vevea & Woods (2005)
 Power analysis
 E.g., Muncer, Craigie, & Holmes (2003)
 Study quality
 Quality weighting (Rosenthal, 1991)
 Use of kappa statistic in determining validity of quality
filtering for meta-analysis (Sands & Murphy, 1996).
 Regression with “quality” as a predictor of effect size
(see Valentine & Cooper, 2008)
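Of the publication bias diagnostics above, the fail-safe N (Rosenthal, 1991) is simple enough to sketch: it estimates how many unpublished null-result studies would be needed to raise the combined one-tailed p-value above .05 (critical z = 1.645). The z-values below are hypothetical:

```python
# Rosenthal's fail-safe N: N_fs = (sum of z / 1.645)^2 - k,
# where k is the number of included studies.
z_values = [2.1, 1.8, 2.5, 1.4, 2.9]   # hypothetical one z per study
k = len(z_values)
z_sum = sum(z_values)

n_fs = (z_sum / 1.645) ** 2 - k
print(round(n_fs))  # 37
```

A large fail-safe N relative to k suggests the summary effect is robust to the "file drawer" of unpublished null results; a small one suggests caution.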
 Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for developing a coding scheme for meta-analysis. Western Journal of Nursing Research, 25, 205-222.
 Duval, S., & Tweedie, R. (2000a). A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89-98.
 Duval, S., & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 455-463.
 Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.
 Muncer, S. J., Craigie, M., & Holmes, J. (2003). Meta-analysis and power: Some suggestions for the use of power in research synthesis. Understanding Statistics, 2, 1-12.
 Rosenthal, R. (1991). Quality-weighting of studies in meta-analytic research. Psychotherapy Research, 1, 25-28.
 Sands, M. L., & Murphy, J. R. (1996). Use of kappa statistic in determining validity of quality filtering for meta-analysis: A case study of the health effects of electromagnetic radiation. Journal of Clinical Epidemiology, 49, 1045-1051.
 Valentine, J. C., & Cooper, H. M. (2008). A systematic and transparent approach for assessing the methodological quality of intervention effectiveness research: The Study Design and Implementation Assessment Device (Study DIAD). Psychological Methods, 13, 130-149.
 Vevea, J. L., & Woods, C. M. (2005). Publication bias in research synthesis: Sensitivity analysis using a priori weight functions. Psychological Methods, 10, 428-443.