Meta-Analysis IGSE

Doing Synthesis and
Meta-Analysis in
Applied Linguistics
Lourdes Ortega
University of Hawai‘i at Mānoa
National Tsing Hua University
Taiwan, June 8, 2011
Please cite as:
Ortega, L. (2011). Doing synthesis and meta-analysis in applied linguistics. Invited workshop at Tsing Hua University, Taipei, June 8, 2011.
Copyright © Lourdes Ortega, 2011
Research synthesis (including meta-analysis)
1. What is it?
2. Why do it?
3. How do we do it?
4. An example…
5. Challenges?
6. Value?
What is research synthesis?
The reviewing continuum (secondary research)
Narrative → Systematic
Literature review … Synthesis … Meta-analysis
So, what is meta-analysis, specifically?
…one specific kind of research synthesis…
Secondary analysis of quantitative analyses
Each primary study is a data point
Goal: what are the main ‘effects’ or ‘relationships’ found across many studies?
Strictly speaking, only quantitative studies qualify
Why do it?
Traditional literature reviews… …have led to unending debates:
What does the evidence “say”? According to whom? How do we know who is right?
e.g.: Critical Period Hypothesis (Hyltenstam et al. vs. Birdsong)
e.g.: error correction (Ferris vs. Truscott)
Typical strategies of traditional reviews?
Tables summarizing many studies, e.g., from Krashen et al. (1979)
Vote-counting technique, e.g.: error correction in L2 writing
Limitations:
• Idiosyncratic methodology: no specific set of methods, up to mysterious expertise
• Evidentiary warrants difficult to judge: experts are always vested, and therefore vulnerable to the charge of bias
• Statistical significance has serious pitfalls: over-reliance on statistical significance (but magnitude, not just generalizability, is of interest to social scientists!)
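The vote-counting problem can be made concrete with a small sketch. The five studies below are hypothetical and invented for illustration; they are not drawn from the error-correction literature. All of them find the same moderate effect, d = 0.50, but most have small samples. Significance is judged with a normal approximation, z = d / SE(d), using the usual large-sample variance of d for two groups of equal size. Both the data and the 1.96 cutoff are assumptions of this sketch.

```python
import math
from collections import Counter
from statistics import fmean

# Hypothetical studies: (Cohen's d, participants per group). Not real data.
STUDIES = [(0.5, 20), (0.5, 20), (0.5, 25), (0.5, 30), (0.5, 100)]

def se_of_d(d, n_per_group):
    """Large-sample standard error of d for two groups of equal size."""
    n1 = n2 = n_per_group
    variance = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return math.sqrt(variance)

def vote_count(studies, critical_z=1.96):
    """Tally each study as 'significant' or 'not significant' (two-tailed, normal approximation)."""
    votes = Counter()
    for d, n in studies:
        z = d / se_of_d(d, n)
        votes["significant" if abs(z) > critical_z else "not significant"] += 1
    return votes

votes = vote_count(STUDIES)
mean_d = fmean(d for d, _ in STUDIES)
print(dict(votes))  # {'not significant': 4, 'significant': 1}
print(mean_d)       # 0.5
```

A vote count reads these studies as mostly "no effect". Averaging the effect sizes instead shows a consistent moderate effect that the small studies did not have the power to detect.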
What does the evidence “say”? According
to whom? How do we know who is
right?
SOLUTION in the late 1970s
Methods for reviewing, from “art” into “science”:
Systematic, not arbitrary
More than the sum of the parts
Replicable
Secondary, yes... but empirically accountable, & discovering new truths in old data
How do we do it?
Norris & Ortega (2006a, 2006b)
Norris, J. M., & Ortega, L. (2007). The future of
research synthesis in applied linguistics:
Beyond art or science. TESOL Quarterly, 41,
805-815.
Norris, J. M., & Ortega, L. (2010). Timeline:
Research synthesis. Language Teaching, 43,
461-479.
Ortega, L. (2010). Research synthesis. In B.
Paltridge & A. Phakiti (Eds.), Companion to
research methods in applied linguistics (pp. 111-126). London: Continuum.
Norris, J. M. (2012). Meta-analysis. In C. Chapelle
(Ed.), Encyclopedia of applied linguistics.
Malden, MA: Wiley.
What are the definitional features of all syntheses (including meta-analyses)?
1. Principled selection of primary studies
2. Systematic coding of each study for main variables
3. Direct use of the evidence reported (not the authors’ interpretations) across studies
1. Principled selection of studies
• Sampling is central to empirical research → what population are we trying to understand?
Random [experimental]
Purposive [qualitative]
• Sampling is central to synthesis, as well
Complete [secondary research should be based on the full universe of studies that have investigated the same thing]
Search & Retrieval of Literature
• The literature search is a key step in systematic synthesis (some direction: In'nami & Koizumi, 2010) → identify all studies that are relevant
Exhaustive [electronic, hand, footnote chasing, invisible college]
Replicable [fully explained in report]
• 1st: electronic searches
• 2nd: other techniques:
Manual searches of journals
Footnote chasing
Forward searches with Web of Science
Website searches of key contributing scholars
Polite email requests to authors & experts
Inclusion & Exclusion criteria
• All potentially relevant studies must then be examined to decide: include or exclude (“apples or oranges?”)
Exclusion criteria [explain each reason for exclusion and give examples]
Inclusion criteria [all criteria satisfied]
Full rationale [tables, appendices, philosophy of inclusivity or selectivity]
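The rule "include only if all criteria are satisfied, and document every reason for exclusion" can be sketched as a screening function. The study fields, the criteria, and the candidate studies below are all hypothetical. A real synthesis would define its own criteria and report them in full.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Candidate:
    """A potentially relevant study found during the search (hypothetical fields)."""
    citation: str
    is_primary_study: bool
    investigates_l2_instruction: bool
    has_comparison_group: bool

# Hypothetical eligibility criteria: (reason recorded on exclusion, test the study must pass)
CRITERIA = [
    ("not a primary study", lambda s: s.is_primary_study),
    ("does not investigate L2 instruction", lambda s: s.investigates_l2_instruction),
    ("no comparison group", lambda s: s.has_comparison_group),
]

def screen(study):
    """Include only if every criterion is satisfied; otherwise list every failed criterion."""
    reasons = [reason for reason, passes in CRITERIA if not passes(study)]
    return len(reasons) == 0, reasons

candidates = [
    Candidate("Study A", True, True, True),
    Candidate("Review B", False, True, False),
    Candidate("Study C", True, True, False),
]

included = []
exclusion_log = Counter()  # how often each reason was applied, for the report's tables
for c in candidates:
    ok, reasons = screen(c)
    if ok:
        included.append(c.citation)
    else:
        exclusion_log.update(reasons)

print(included)             # ['Study A']
print(dict(exclusion_log))  # {'not a primary study': 1, 'no comparison group': 2}
```

The function records every failed criterion, not only the first one. That keeps the exclusion log complete enough to support the tables and appendices a full rationale calls for.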
What are the definitional features of all syntheses (including meta-analyses)?
1. Principled selection of studies: literature search + study eligibility criteria (inclusion/exclusion)
2. Systematic coding of each study
• Eliciting evidence with consistency, just as when surveying, interviewing, or testing participants
Asking research questions of the literature:
• What variables are important?
• How (and how well) have they been investigated?
• What are the findings across studies?
Coding book to identify study features that answer questions (multiple coders):
Publication features: year; author; published or fugitive? (journal, book, dissertation, presentation)
Methodological features: sample size; design; reliability; stats used; etc.
Substantive features: e.g., How was “explicit” instruction defined? How was “learning” measured? Means, SD, etc.?
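A coding book can be mirrored in a fixed record structure. Every study is then coded on the same fields, and values outside the coding book's categories are caught early. The field names, the category list, and the mapping of dissertations and presentations to "fugitive" literature are assumptions made for this sketch. They are not the coding book of any particular synthesis.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping: journals and books count as published,
# dissertations and presentations as fugitive (unpublished) literature.
PUBLICATION_TYPES = {"journal": "published", "book": "published",
                     "dissertation": "fugitive", "presentation": "fugitive"}

@dataclass
class CodedStudy:
    # Publication features
    author: str
    year: int
    publication_type: str
    # Methodological features
    sample_size: int
    design: str
    reliability_reported: bool
    # Substantive features (answers to the synthesis questions)
    explicit_instruction_definition: Optional[str] = None
    learning_measure: Optional[str] = None
    # Descriptive statistics kept for later effect size calculation, e.g. {"experimental": 78.5}
    means: dict = field(default_factory=dict)
    sds: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject values that are not categories in the coding book
        if self.publication_type not in PUBLICATION_TYPES:
            raise ValueError(f"unknown publication type: {self.publication_type}")

    @property
    def status(self):
        return PUBLICATION_TYPES[self.publication_type]

study = CodedStudy("Author X", 1998, "dissertation", 40, "pretest-posttest control group", False,
                   learning_measure="grammaticality judgment test")
print(study.status)  # fugitive
```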
What are the definitional features of all syntheses (including all meta-analyses)?
1. Principled selection of studies
2. Systematic coding of each study for main variables: coding book, standardization, intercoder reliability
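Intercoder reliability is often reported two ways: as simple percent agreement, and as Cohen's kappa, which discounts the agreement two coders would reach by chance. The sketch below assumes two coders and one categorical variable. The codes are invented for illustration.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Proportion of studies to which both coders assigned the same code."""
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: products of the two coders' marginal proportions, summed over categories
    expected = sum(freq_a[c] * freq_b[c] for c in set(codes_a) | set(codes_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes for "type of instruction" in 10 studies
coder_1 = ["explicit", "explicit", "implicit", "explicit", "implicit",
           "implicit", "explicit", "explicit", "implicit", "explicit"]
coder_2 = ["explicit", "explicit", "implicit", "implicit", "implicit",
           "implicit", "explicit", "explicit", "explicit", "explicit"]

print(percent_agreement(coder_1, coder_2))       # 0.8
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.58
```

The two coders agree on 80% of the studies. Because both use "explicit" about 60% of the time, however, chance alone would produce 52% agreement, so kappa is noticeably lower.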
3. Trust the evidence, not the authors
• Record carefully what authors report and how they report it,…
• But ultimately, analyze what the evidence they present tells us, not what they say it means…
• Seeking an objective view across studies of the accumulated state of knowledge…
When aggregating and averaging findings is the goal, as in meta-analysis…
How do we compare, combine, and interpret findings across numerous quantitative studies of the same thing?
Effect sizes & confidence intervals
Effect size: What is it?
An estimate of the magnitude or strength of a quantitative finding:
…how much difference?
…how much improvement?
…how closely related?
Effect sizes: absolute scales
Scale | Study 1 | Study 2
1. Percent | Experimental group 30% better than control | Experimental group 20% better than control
2. Correlation | Motivation & achievement, r = .36 | Motivation & achievement, r = .78
3. Known measure | Pre-post TOEFL score: 450 → 575 | Pre-post TOEFL score: 450 → 495
Q: What happens when studies do not report findings on comparable scales?
Effect sizes: standardized
• d is also simple to calculate and to interpret, and it takes into account the variability of scores in both groups
Effect size d = the average of the experimental group minus the average of the control group, divided by the pooled standard deviation of both groups.
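The definition translates directly into code. This sketch uses the common pooled-SD formula, which weights each group's variance by its degrees of freedom (n - 1). That weighting is an assumption here, because the slide says only "pooled standard deviation". The means, SDs, and group sizes are hypothetical.

```python
import math

def pooled_sd(sd_exp, n_exp, sd_ctrl, n_ctrl):
    """Pooled SD of two groups; each variance is weighted by its degrees of freedom."""
    return math.sqrt(((n_exp - 1) * sd_exp ** 2 + (n_ctrl - 1) * sd_ctrl ** 2)
                     / (n_exp + n_ctrl - 2))

def cohens_d(mean_exp, sd_exp, n_exp, mean_ctrl, sd_ctrl, n_ctrl):
    """(experimental mean - control mean) / pooled SD."""
    return (mean_exp - mean_ctrl) / pooled_sd(sd_exp, n_exp, sd_ctrl, n_ctrl)

# Hypothetical posttest results for two groups of 20 learners
print(cohens_d(80, 10, 20, 75, 10, 20))           # 0.5
print(round(cohens_d(80, 12, 20, 75, 8, 20), 2))  # 0.49
```

A 5-point advantage gives d = 0.50 when both groups have an SD of 10. When the spreads differ (12 vs. 8), the pooled SD rises slightly, and d is a little lower.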
Effect sizes: standardized
• Difference between experimental and control groups in standard deviation units (Cohen’s d)
[Figure: overlapping score distributions of the experimental and control groups, showing the difference between them, in two panels: no sizeable effect (d = 0.10) and very large effect (d = 3.00)]
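One way to read such a figure numerically is Cohen's U3. If both groups are normally distributed with equal spread, the proportion of the control group scoring below the average experimental learner equals the standard normal CDF evaluated at d. The normality and equal-variance assumptions belong to this sketch, not to the slide.

```python
from statistics import NormalDist

def cohens_u3(d):
    """Share of the control group scoring below the experimental-group mean.

    Assumes both groups are normally distributed with equal standard deviations.
    """
    return NormalDist().cdf(d)

for d in (0.10, 3.00):
    print(f"d = {d:.2f}: {cohens_u3(d):.1%} of controls fall below the experimental mean")
# d = 0.10: 54.0% of controls fall below the experimental mean
# d = 3.00: 99.9% of controls fall below the experimental mean
```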
Effect sizes for meta-analysis
Study 1 → effect size 1
Study 2 → effect size 2
Study 3 → effect size 3
Study 4 → effect size 4
Study 5 → effect size 5
Study … → …
= average effect size
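The "average effect size" can be a simple mean of the study effect sizes. Larger studies estimate effects more precisely, however, so meta-analysts often weight each study by its sample size. This sketch computes both averages for three hypothetical studies.

```python
from statistics import fmean

# Hypothetical effect sizes and total sample sizes from three studies
effect_sizes = [0.8, 1.2, 0.4]
sample_sizes = [20, 40, 100]

unweighted = fmean(effect_sizes)
# Weighting by N lets larger, more precise studies count for more
n_weighted = sum(d * n for d, n in zip(effect_sizes, sample_sizes)) / sum(sample_sizes)

print(round(unweighted, 2))  # 0.8
print(round(n_weighted, 2))  # 0.65
```

Here the largest study also has the smallest effect, so weighting by N pulls the average down from .80 to .65.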
Interpreting effect sizes: What does d really tell us?
d < .30 | .30 < d < .80 | d > .80
"The terms 'small,' 'medium,' and 'large' are relative,
not only to each other, but to the area of behavioral science
or even more particularly to the specific content and
research method being employed in any given
investigation..." (Cohen, 1988, p. 25)
The average is not enough → Confidence Intervals
“The margin of error in an observation”
The stroll from the hotel to the University is, on average, 10 minutes, plus or minus 3 minutes:
Lower bound = 7 minutes | Average = 10 minutes | Upper bound = 13 minutes (95% certainty)
Confidence Intervals in Meta-analysis
CIs tell us about the certainty with which we can interpret an average effect size.
Effect Sizes and Confidence Intervals in Meta-analysis
Effect | N (studies) | K (contrasts) | Mean d | SD d | 95% CI lower | 95% CI upper
Avg. effect of instructional treatment | 49 | 98 | .96 | .87 | .78 | 1.14
We can be 95% certain that the actual effect of instruction lies between .78 and 1.14
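As a rough check, the interval can be recomputed from the table's mean, SD, and K. This sketch makes two simplifying assumptions: the 98 contrasts are treated as independent observations, and the critical value comes from the normal (z) distribution. These are not necessarily the procedure behind the reported interval.

```python
import math
from statistics import NormalDist

def ci_for_mean(mean, sd, k, level=0.95):
    """Normal-approximation confidence interval for the mean of k independent values."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * sd / math.sqrt(k)
    return mean - margin, mean + margin

lower, upper = ci_for_mean(mean=0.96, sd=0.87, k=98)
print(f"{lower:.2f} to {upper:.2f}")  # 0.79 to 1.13
```

The result, about .79 to 1.13, is close to but not identical with the reported .78 to 1.14. Small differences of this kind can arise from rounding in the reported mean and SD, or from using a t rather than a z critical value.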
Why does it help to focus on effect sizes?
e.g., research on the effects of smoking in the 1960s:
• There is a statistically significant difference in mortality rates between smokers and non-smokers.
• Smoking up to half a pack a day (fewer than 10 cigarettes) increases the chance of mortality by 40% when compared to non-smokers.
• Smoking two packs or more a day increases the risk of death by three times to 120% when compared to non-smokers.
(U.S. Department of Health, Education, and Welfare Report, 1967)
And what about small effects: can they be important too?
r = .034, a truly ‘tiny’ effect!
Regular aspirin consumption and decrease in heart attacks = 3.4% decrease = at least 3 out of 100 who would not have a heart attack if they regularly took aspirin.
d = .30, a small magnitude effect!
Effects of reading tutorials for underachieving students were the same for untrained peer tutoring and for highly trained teachers engaging in longer hours of tutoring. Both are important!
Interpreting effect sizes: complex, contextualized, not absolute
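The aspirin figure follows the logic of Rosenthal and Rubin's binomial effect size display (BESD). The BESD recasts a correlation r as a difference in "success" rates, 0.5 + r/2 versus 0.5 - r/2, so r = .034 becomes 3.4 per 100. The slide does not name the BESD, so applying it here is an assumption of this sketch. The same applies to converting the tutoring d to r with the equal-group-size formula r = d / sqrt(d^2 + 4).

```python
import math

def d_to_r(d):
    """Convert d to r, assuming two groups of equal size."""
    return d / math.sqrt(d ** 2 + 4)

def besd(r):
    """Binomial effect size display: 'success' rates implied by r for treated vs. untreated groups."""
    return 0.5 + r / 2, 0.5 - r / 2

treated, untreated = besd(0.034)
print(f"{treated:.3f} vs {untreated:.3f}")  # 0.517 vs 0.483, i.e. 3.4 per 100

r_tutoring = d_to_r(0.30)
print(round(r_tutoring, 3))                      # 0.148
print([round(x, 3) for x in besd(r_tutoring)])   # [0.574, 0.426]
```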
What are the definitional features of all syntheses (including all meta-analyses)?
1. Principled selection of studies
2. Systematic coding of each study for main variables
3. Direct use of the evidence reported (not the authors’ interpretations): effect sizes, confidence intervals, other kinds of new data based on old
How do we do it?
An example of synthesis + meta-analysis
In applied linguistics, the first full-blown synthesis and meta-analysis:
Norris, J. M., & Ortega, L. (2000).
Effectiveness of L2 instruction: A
research synthesis and quantitative
meta-analysis. Language Learning,
50, 417-528.
Step 1: Problem Specification
Effects of instruction: inductive, traditional grammar, dictogloss, recasts, garden path, input flood, input enhancement, input processing, consciousness-raising, task-based interaction
Focus of Norris & Ortega: L2 instruction → L2 learning
RQ 1 & 2: Instruction overall? By type?
RQ 3: Effect of outcome measures?
RQ 4: Instructional intensity?
RQ 5: Durability of effects?
RQ 6: Quality of research practices?
Step 2: Literature search
• 1st: electronic searches
• 2nd: other techniques:
Manual searches of 14 journals
Footnote chasing of 25 reviews
Footnote chasing of each study included
Step 3: Study eligibility criteria
Potentially relevant 250 >>
>> relevant for synthesis 77 >>
>> adequate for meta-analysis 49
Step 4: Coding of study features
Type of instruction: FonF, FonFS, explicit,
implicit
Type of outcome measure: metalinguistic,
selected, constrained, free
Intensity of instruction: Brief (less than 1 hr),
short (between 1 and 2 hrs), medium (between
3 and 6 hrs), long (more than 7 hrs)
Durability of effects: effect sizes on delayed
tests
Steps 5 & 6: Analyze, display, interpret
[Figures: findings for RQ 1 & 2 (effectiveness), RQ 3 (type of measure), RQ 4 (intensity), and RQ 5 (durability)]
RQ 1-5 (meta-analysis part):
How effective is L2 instruction?
• Clearly more effective than no instruction or only meaningful exposure to L2 → d = 0.96, based on 49 studies
• Explicit instruction is superior in the short term to implicit instruction → d = 1.13 versus d = 0.54, based on 69 and 29 contrasts, respectively
• But focus on form and focus on formS are equally effective → d = 1.00 for form versus 0.93 for formS, based on 43 and 55 contrasts, respectively
• Effects are durable → delayed post-tests from 22 studies: d = 1.02
RQ6 (synthesis part): Research practices
• Too many variables in a single design → need to simplify designs, increase N
• No pre-test (18%), no true control group (83%) → need to always include both
• Poor reporting standards (52% no SD, 84% no instrument reliability, 57% no set alpha) → editors need to demand better reporting
• Misuse of statistical inference (no assumptions checked or met, parametric stats on small samples, no consideration of magnitude) → researchers need better training in statistics if they insist on using such methods
Since then…accumulation of meta-analyses
In 2000, when Norris & Ortega was published, there
were only 2 other published systematic syntheses in
applied linguistics. As of 2010, Norris & Ortega
identified 23 in their Timeline, most published since
2006.
Motivation: Masgoret & Gardner (2003)
Interaction: Keck et al. (2006), Mackey & Goo (2007)
Oral feedback: Russell & Spada (2006), Lyster & Saito (2010), Li
(2010)
Use of glosses in CALL: Taylor (2006 & 2009), Abraham (2008)
Some challenges for research synthesis in L2 research…
Publication bias: “file drawer problem”
• Well-known phenomenon, present in all the social sciences (Rosenthal, 1979; Rothstein et al., 2005)
• Little understood in applied linguistics
• Include fugitive literature
• Check for publication bias
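One of the oldest checks for the file drawer problem is the fail-safe N proposed by Rosenthal (1979). It estimates how many unpublished studies with an average null result would have to be sitting in file drawers to make the combined result nonsignificant. This sketch combines z scores with Stouffer's method and uses a one-tailed alpha of .05. The z values are hypothetical. The fail-safe N has known limitations, and it is usually complemented by other checks such as funnel plots and trim-and-fill.

```python
from statistics import NormalDist

def fail_safe_n(z_values, alpha=0.05):
    """Rosenthal's fail-safe N.

    Returns the number of unretrieved studies with mean z = 0 that would pull
    the combined Stouffer z, sum(z) / sqrt(k + X), down to the one-tailed critical value.
    """
    k = len(z_values)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05, one-tailed
    return sum(z_values) ** 2 / z_crit ** 2 - k

# Hypothetical z scores from five retrieved studies
print(round(fail_safe_n([2.1, 1.8, 2.5, 1.2, 3.0]), 1))  # 36.5
```

In this example, about 37 hidden null studies would be needed to overturn a result based on five retrieved studies.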
Quality: “garbage in, garbage out”
• The quality of a synthesis can only be as good as the quality of the primary studies that are synthesized in it...
• But how do we judge quality? Publication type? Methodology ratings? Exclusions?
Ethics
Anticipate consequences of synthesis:
Would it prematurely close the area for research?
Would it be taken as a personal attack on researchers/labs?
What is the potential for findings to be (mis)appropriated by audiences (policy makers, teachers, …)?
High-tech statistication,
cookie-cutter approach
“... conceptual vacuum when technical
meta-analytic expertise is not coupled
with deep knowledge of the theoretical
and conceptual issues at stake in the
research domain under review…”
(Norris & Ortega, 2006b, p. 37)
Meta-analysis only, no interest in quantitative synthesis of other kinds/scope:
Thomas (1994, 2006)
Ortega (2003)
?????
New-generation meta-analyses bypass
synthesis:
Li (2010)
Lyster & Saito (2010)
Plonsky (2011)
Spada & Tomita (2010)
Qualitative synthesis?
No interest either in exploring qualitative synthesis… Only Téllez & Waxman (2006) in applied linguistics
Yet, much contemporary research in applied linguistics is qualitative, and increasingly more is mixed-methods… both worth synthesizing!
And there are options to draw from in education, health sciences, and other fields!
Meta-ethnography
(Noblit & Hare, 1988;
see Téllez & Waxman, 2006)
Qualitative Comparative Analysis
(Ragin, 1999)
Critical Interpretive Synthesis
(Dixon-Woods et al., 2006)
Value?
There is huge value in systematic synthesis (including meta-analysis):
Secondary research, yes... but:
• Empirically accountable
• Conceptually illuminating: discovering new truths in old data
Sustained progress…
• Much improvement in certain reporting practices (LL, MLJ in particular)
• Larger N in primary studies = more trustworthy analyses
• Use of increasingly sophisticated techniques in meta-analyses… study quality criteria, weighting (by N, reliability, variance), fixed/random effects models, sensitivity analysis, trim-and-fill estimations, publication bias, etc.
• Use of meta-analytic software, e.g.:
http://www.meta-analysis.com
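Two of the techniques listed above can be sketched together: weighting by variance, and fixed versus random effects models. In the fixed-effect model, each study's d is weighted by the inverse of its sampling variance. The random-effects version first estimates the between-study variance (tau^2, here with the DerSimonian-Laird method) and adds it to each study's variance before weighting. The studies are hypothetical. Dedicated meta-analytic software would also report standard errors, intervals, and further heterogeneity statistics.

```python
def variance_of_d(d, n1, n2):
    """Usual large-sample sampling variance of d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

def pool(studies):
    """Inverse-variance pooled d: fixed-effect and DerSimonian-Laird random-effects estimates."""
    ds = [d for d, _, _ in studies]
    vs = [variance_of_d(d, n1, n2) for d, n1, n2 in studies]
    w = [1 / v for v in vs]
    fixed = sum(wi * di for wi, di in zip(w, ds)) / sum(w)
    # Heterogeneity: Q statistic, then between-study variance tau^2 (truncated at zero)
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, ds))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ds) - 1)) / c)
    w_re = [1 / (v + tau2) for v in vs]
    random_ = sum(wi * di for wi, di in zip(w_re, ds)) / sum(w_re)
    return fixed, random_, tau2

# Hypothetical studies: (d, n experimental, n control)
studies = [(0.8, 20, 20), (1.2, 15, 15), (0.3, 50, 50)]
fixed, random_, tau2 = pool(studies)
print(f"fixed = {fixed:.2f}, random = {random_:.2f}, tau^2 = {tau2:.3f}")
# fixed = 0.56, random = 0.69, tau^2 = 0.127
```

Adding tau^2 to every study's variance makes the random-effects weights more even. As a result, the two small studies with larger effects pull the pooled estimate up from about .56 to about .69.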
But only if applied linguists cultivate “the will to synthesis”
“we envision synthetic methodologies as
advancing our ability to produce new
knowledge by carefully building upon,
expanding, and transforming what has been
accumulated over time ... However, ... all
knowledge is bound by context and
purpose...”
(Norris & Ortega, 2006b, p. 37)
Thank You
lortega@hawaii.edu
References
Abraham, L. B. (2008). Computer-mediated glosses in second language
reading comprehension and vocabulary learning: A meta-analysis. Computer
Assisted Language Learning, 21, 199-226.
Dixon-Woods, M., Bonas, S., Booth, A., Jones, D. R., Miller, T., Sutton, A. J.,
et al. (2006). How can systematic reviews incorporate qualitative research? A
critical perspective. Qualitative Research, 6, 27-44.
Keck, C. M., Iberri-Shea, G., Tracy-Ventura, N., & Wa-Mbaleka, S. (2006).
Investigating the empirical link between task-based interaction and
acquisition: A meta-analysis. In J. M. Norris & L. Ortega (Eds.), Synthesizing
research on language learning and teaching (pp. 91-131). Amsterdam: John
Benjamins.
Krashen, S., Long, M. H., & Scarcella, R. (1979). Accounting for child-adult
differences in second language rate and attainment. TESOL Quarterly, 13, 573-582.
Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis.
Language Learning, 60, 309-365.
Lyster, R., & Saito, K. (2010). Oral feedback in classroom SLA: A meta-analysis.
Studies in Second Language Acquisition, 32(2).
Mackey, A., & Goo, J. M. (2007). Interaction research in SLA: A meta-analysis and research
synthesis. In A. Mackey (Ed.), Conversational interaction in second language
acquisition: A collection of empirical studies (pp. 407-452). New York: Oxford
University Press.
Masgoret, A.-M., & Gardner, R. C. (2003). Attitudes, motivation, and second
language learning: A meta-analysis of studies conducted by Gardner and
associates. Language Learning, 53, 123-163.
Noblit, G. W., & Hare, R. D. (1988). Meta-ethnography: Synthesizing qualitative
studies. Newbury Park, CA: Sage.
Norris, J. M. (2012). Meta-analysis. In C. Chapelle (Ed.), Encyclopedia of
applied linguistics. Malden, MA: Wiley.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research
synthesis and quantitative meta-analysis. Language Learning, 50, 417-528.
Norris, J. M., & Ortega, L. (Eds.). (2006a). Synthesizing research on language
learning and teaching. Amsterdam: John Benjamins.
Norris, J. M., & Ortega, L. (2006b). The value and practice of research
synthesis for language learning and teaching. In J. M. Norris & L. Ortega
(Eds.), Synthesizing research on language learning and teaching (pp. 3-50).
Amsterdam: John Benjamins.
Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied
linguistics: Beyond art or science. TESOL Quarterly, 41, 805-815.
Norris, J. M., & Ortega, L. (2010). Research timeline: Research synthesis.
Language Teaching, 43, 461-479.
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2
proficiency: A research synthesis of college-level L2 writing. Applied
Linguistics, 24, 492-518.
Ortega, L. (2010). Research synthesis. In B. Paltridge & A. Phakiti (Eds.),
Companion to research methods in applied linguistics (pp. 111-126). London:
Continuum.
Plonsky, L. (2011). The effectiveness of second language strategy instruction:
A meta-analysis. Language Learning, 61(4).
Ragin, C. C. (1999). Using Qualitative Comparative Analysis to study causal
complexity. Health Services Research, 34(5, Part 2), 1225-1239.
Russell, J., & Spada, N. (2006). The effectiveness of corrective feedback for
the acquisition of L2 grammar: A meta-analysis of the research. In J. M.
Norris & L. Ortega (Eds.), Synthesizing research on language learning and
teaching (pp. 133-164). Amsterdam: John Benjamins.
Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and
type of language feature: A meta-analysis. Language Learning, 60, 263-308.
Taylor, A. M. (2006). The effects of CALL versus traditional L1 glosses on
L2 reading comprehension. CALICO Journal, 23, 309-318.
Taylor, A. M. (2009). CALL-based versus paper-based glosses: Is there a
difference in reading comprehension? CALICO Journal, 27, 147-160.
Téllez, K., & Waxman, H. C. (2006). A meta-synthesis of qualitative research
on effective teaching practices for English Language Learners. In J. M.
Norris & L. Ortega (Eds.), Synthesizing research on language learning and
teaching (pp. 245-277). Amsterdam: John Benjamins.
Thomas, M. (1994). Assessment of L2 proficiency in second language
acquisition research. Language Learning, 44, 307-336.
Thomas, M. (2006). Research synthesis and historiography: The case of
assessment of second language proficiency. In J. M. Norris & L. Ortega
(Eds.), Synthesizing research on language learning and teaching (pp. 279-298).
Amsterdam: John Benjamins.