Meta-analysis - The University of Tennessee at Chattanooga

Lecture 14 - Meta-Analysis and Quasi-Experiments
From The American Heritage College Dictionary. 3rd Ed. (1997). Houghton Mifflin Company.
The analysis of analyses.
The units of analysis are not individual participant scores but whole study scores.
Two major functions – identical to the functions of all research, but on a larger scale . . .
1) Descriptive: To summarize results (usually relationships between DVs and IVs) across
collections of studies
Summaries of relationships across studies used to be done with narrative reviews.
Problems with narrative reviews:
No systematic way of synthesizing the results of individual studies.
Conclusions were variable, not reliable.
Two reviewers might review the same studies and arrive at different conclusions.
Part of the problem was due to the tendency to report findings as “significant” or “not significant”.
Such a focus on reporting left the impression that scientists didn’t know what they were doing – else
why didn’t they all report identical “significance” results?
The age of meta-analysis changed the culture to one of reporting quantitative results for studies,
rather than simply “significant” or “not significant”.
The emphasis on quantitative results from individual studies led to the need to summarize those
quantitative results, which is part of meta-analysis.
Quasi-Experiments and Meta-Analysis Lecture - 1
8/17/3
Why do we need descriptive meta-analyses?
Wonderlic <---> Conscientiousness scores from 7 studies conducted at UTC . . .
Correlations shown in red are significant (p < .05).
Example of a “descriptive summarize relationships” meta-analysis:
Connolly, J. J., & Viswesvaran, C. (2000). The role of affectivity in job satisfaction: A
meta-analysis. Personality and Individual Differences, 29, 265-281.
This meta-analysis was related to the dispositional theory of job satisfaction – persons are satisfied
on the job because that’s the way they are, not because of the type of job they have or the way
they’re treated.
The researchers found a mean correlation of .49 between a measure of positive affectivity and job
satisfaction across studies. This suggests that roughly 24% (.49² ≈ .24) of the variance in job
satisfaction could be due to individual differences in affectivity.
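The variance figure comes from squaring the correlation. A minimal check of that arithmetic, using only the .49 reported above:

```python
# The proportion of variance two variables share is the squared correlation.
mean_r = 0.49  # mean correlation reported in the meta-analysis above
shared_variance = mean_r ** 2
print(f"r = {mean_r:.2f}, r squared = {shared_variance:.2f}")
```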
This could have been done within a single organization, but it carries much more weight when done
across multiple organizations because the mean across organizations is less likely to be influenced
by possible weirdness of one organization.
2) Inferential: To examine factors that affect sizes of relationships across collections of
studies. Such factors are called moderators.
Example of a “moderation” meta-analysis:
Griffeth, R. W., Hom, P. W., & Gaertner, S. (2000). A meta-analysis of antecedents and correlates
of employee turnover: Update, moderator tests, and research implications for the next millennium.
Journal of Management, 26, 463-488.
They found that correlations between Pay and turnover depended on whether the organization
offered reward contingencies or not. High pay -> low turnover only when reward contingencies
were offered. Otherwise the correlation was essentially zero. So presence of reward contingencies
moderated the pay -> turnover relationship.
This study could not have been conducted in one organization with only one policy. It required
multiple organizations, some with reward contingencies and others without them.
Why not conduct just one big study?
1. Because it is likely not possible for a single researcher to conduct a study as big as the
combination of many other studies.
Some meta-analyses have Ns of 10,000 or more.
2. Because the use of multiple studies increases generalizability of the results – external validity.
From an interview with Frank Schmidt . . .
Those advocating it usually argue that a single large N study can provide the same precision of estimation as the
numerous smaller N studies that go into a meta-analysis. But with a single large N study, there is no way to
estimate SD-rho or SD-delta, the variation of the population parameters. This means there is no way to
demonstrate external validity. A meta-analysis based on a large number of small studies based on different
populations, measurement scales, time periods, and so on, allows one to assess the effects of these
methodological differences.
6 Steps, from Understanding Meta-Analysis, by Joseph A. Durlak
1. Formulating the research question
Which effects do we wish to summarize?
What relationships do we wish to study?
Reference: Hunter, J. E., & Schmidt, F. L. (2004). Methods of Meta-Analysis: Correcting Error and Bias in Research Findings (2nd ed.). Thousand Oaks, CA: Sage.
2. Performing the literature search for articles to include in the M-A.
A. Computerized searches of databases, such as PsycINFO.
B. Manual searches of journal tables of contents.
C. Searches of reference lists of relevant articles.
D. Calls or emails to people doing research in the area.
E. Being cognizant of the round-file (file-drawer) effect – a positive bias in rs that comes from
looking only at published articles.
From McAbee, S. & Oswald, F. L. (2013). The criterion-related validity of personality measures for
predicting GPA: A meta-analytic validity competition.
Literature Search
A systematic search of the literature on the relationship between
the Big Five personality traits and GPA in postsecondary student
populations was conducted to identify relevant studies for the
current meta-analysis. Articles published between January 1992
(coinciding with the publication of the NEO-FFI and NEO-PI–R;
Costa & McCrae, 1992; and Goldberg’s, 1992, unipolar Markers)
and February 2012 were considered for inclusion. Studies were
located using several online databases, including PsycINFO,
ERIC, and Google Scholar. The following keywords were entered
into the respective databases to locate articles in various combinations:
personality, Big Five, Five Factor Model, academic performance,
academic achievement, GPA, and grade. Following
database searches, the following online journal records were manually
searched, starting from 2005 to online-first publications as of
February 2012: Contemporary Educational Psychology, Human
Performance, Intelligence, Learning and Individual Differences,
Journal of Applied Psychology, Journal of Educational Psychology,
Journal of Personality, Journal of Personality and Social
Psychology, Journal of Research in Personality, Personality and
Individual Differences. Finally, the references of previous meta-analyses
and large reviews on personality and academic performance
were searched to identify additional studies (Noftle &
Robins, 2007; O’Connor & Paunonen, 2007; Poropat, 2009; Richardson
et al., 2012; Trapmann et al., 2007). Article abstracts, and
Method sections as necessary, were examined to determine
whether the study included measures of personality related to
academic performance outcomes (e.g., GPA, course grades).
3. Coding each study.
For each study, identify which value that study has on each independent variable being
investigated.
E.g., suppose the dependent variable is efficacy of therapy. For each study, code the type of therapy:
Behavioral or talking
Group or individual
Professional therapist or lay person
In the McAbee and Oswald study mentioned above, each study was coded for the type of Big Five
questionnaire used, among other things. A table from that article (not reproduced here) describes
the coding.
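A coding sheet is just one record per study with one field per coded variable. The sketch below shows one possible layout for the therapy example; the class name, field names, and the three studies are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class CodedStudy:
    """One row of a hypothetical coding sheet for a therapy meta-analysis."""
    study_id: str
    therapy_type: str  # "behavioral" or "talking"
    delivery: str      # "group" or "individual"
    provider: str      # "professional" or "lay"

# Invented studies, for illustration only.
coded = [
    CodedStudy("S1", "behavioral", "group", "professional"),
    CodedStudy("S2", "talking", "individual", "lay"),
    CodedStudy("S3", "behavioral", "individual", "professional"),
]

# Count how many studies fall at each level of one coded variable.
counts = {}
for study in coded:
    counts[study.therapy_type] = counts.get(study.therapy_type, 0) + 1
print(counts)
```

Counting the levels first is a quick check that each level of a coded variable has enough studies to be worth comparing.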
4. Compute the appropriate effect size for each study.
Most meta-analyses use either . . .
A. A mean difference measure analogous to d = (μ1 − μ2)/σ.
This measure, or measures equivalent to it, has been defined for many common research
designs. It can often be computed from reported values of t or F.
B. A correlation measure, i.e., Pearson r.
Regardless of whether d or r is computed, you must compute the same measure from each study.
When you publish research, please think of meta-analyses.
Can what you report be used for a meta-analysis?
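As a rough illustration of computing an effect size from a reported t or F, the sketch below uses the standard conversions for a two-group, between-subjects comparison. The function names and the example values (t = 2.5, 20 per group) are assumptions made up for this sketch, and other designs need other formulas.

```python
import math

def d_from_t(t, n1, n2):
    """Standardized mean difference from an independent-groups t."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

def d_from_f(f, n1, n2):
    """Same conversion for a two-group F (1 numerator df), where F = t**2.
    The sign is lost in F, so it has to be restored from the group means."""
    return d_from_t(math.sqrt(f), n1, n2)

def r_from_t(t, df):
    """Point-biserial correlation from an independent-groups t."""
    return t / math.sqrt(t * t + df)

# Hypothetical report: t(38) = 2.5 with 20 participants per group.
t, n1, n2 = 2.5, 20, 20
d = d_from_t(t, n1, n2)
r = r_from_t(t, n1 + n2 - 2)
print(f"d = {d:.3f}, r = {r:.3f}")
```

Whichever measure is chosen, every study in the meta-analysis has to be converted to that same measure.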
5. Statistical analysis.
From each study, a measure of effect size is computed. There may be more than one measure if more
than one DV was measured.
Descriptive Studies
A weighted mean effect size is computed using the following formula (illustrated assuming the
effect size for study i is r_i, based on a sample of size N_i):
$$\bar{r} = \frac{\sum_i N_i r_i}{\sum_i N_i}$$
Strictly, this should be called the weighted r-bar.
The variance of the effect sizes is computed using the formula
$$S_r^2 = \frac{\sum_i N_i (r_i - \bar{r})^2}{\sum_i N_i}$$
The mean and variance are adjusted for
A) Reliability of the independent variable
B) Reliability of the dependent variable
C) Range restriction of the dependent variable.
The variance is also adjusted for
D) Sampling error.
These adjustments are somewhat complex, but doable with a good spreadsheet program.
There are several meta-analysis programs available, some for free.
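A minimal sketch of the descriptive computations, assuming a made-up set of four studies. The weighted mean and variance follow the formulas above. The sampling-error variance uses the approximation (1 − r-bar²)² / (mean N − 1), which is an assumption of this sketch and not a formula stated in these notes. The reliability and range-restriction adjustments are not shown.

```python
# Hypothetical (N_i, r_i) pairs; the values are invented for illustration.
studies = [(50, 0.10), (100, 0.20), (150, 0.30), (200, 0.25)]

total_n = sum(n for n, _ in studies)
k = len(studies)

# Sample-size-weighted mean correlation (the weighted r-bar).
r_bar = sum(n * r for n, r in studies) / total_n

# Sample-size-weighted variance of the observed correlations.
var_r = sum(n * (r - r_bar) ** 2 for n, r in studies) / total_n

# Variance expected from sampling error alone, based on the average sample
# size (an assumed approximation, see the lead-in).
mean_n = total_n / k
var_e = (1 - r_bar ** 2) ** 2 / (mean_n - 1)

# Variance left after removing sampling error, floored at zero.
var_residual = max(0.0, var_r - var_e)

print(f"r-bar = {r_bar:.3f}, S2r = {var_r:.4f}, "
      f"sampling error = {var_e:.4f}, residual = {var_residual:.4f}")
```

In this invented data set the observed variance is smaller than the variance expected from sampling error, so nothing is left over to explain.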
6. Testing for existence of systematic differences between studies
Skipped the rest of this page in 2015
Cochran’s Q statistic
$$\chi^2_{K-1} = \frac{N}{(1 - \bar{r}^2)^2} \, S_r^2$$
If the χ² is not significant, then it will be assumed that the effect sizes came from only one
population of studies and the mean is an estimate of the mean effect size of that population of
studies.
If the χ² is significant, then it must be assumed that there are two or more populations represented
by the group of studies, and some attempt to distinguish among them must be made.
Cochran, W. G. (1954). The combination of estimates from different experiments. Biometrics, 10,
101–129.
Higgins and Thompson’s I² statistic
Higgins and Thompson’s (2002) I² value indexes the proportion of variance due to between-study
differences.
Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in meta-analysis. Statistics
in Medicine, 21, 1539–1558. doi:10.1002/sim.1186
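The two heterogeneity indexes can be sketched together. The code below applies the chi-square formula given earlier in this step, taking N to be the total sample size across studies (an assumption, since the notes do not define N), and then computes I² as (Q − df)/Q, floored at zero, treating that chi-square as Q. The function name and the four studies are invented.

```python
def heterogeneity(studies):
    """Return (chi-square, df, I-squared in percent) for (N_i, r_i) pairs."""
    total_n = sum(n for n, _ in studies)
    r_bar = sum(n * r for n, r in studies) / total_n
    var_r = sum(n * (r - r_bar) ** 2 for n, r in studies) / total_n
    # Chi-square with K - 1 degrees of freedom; N assumed to be total N.
    q = total_n / (1 - r_bar ** 2) ** 2 * var_r
    df = len(studies) - 1
    # Share of the statistic that exceeds its expected value under homogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical studies with visibly different correlations.
studies = [(100, 0.05), (100, 0.45), (100, 0.10), (100, 0.50)]
q, df, i2 = heterogeneity(studies)

CRITICAL_CHI2_DF3 = 7.815  # .05 critical value of chi-square with 3 df
print(f"chi-square({df}) = {q:.2f}, significant: {q > CRITICAL_CHI2_DF3}, I2 = {i2:.0f}%")
```

A significant chi-square and a large I², as in this invented example, are the signal to go looking for moderators.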
Moderation effects: Identifying sources of systematic differences between studies.
Individual studies are treated as rows of a data matrix.
Individual study effect sizes are the scores that are analyzed.
Each column contains either an effect size dependent variable or a potential moderating variable.
t-tests, analysis of variance, or regression analysis may be used to examine the relationship between
the dependent variables and potential moderators.
Since the dependent variables in these analyses are measures of relationships, a significant t,
or F, or significant r in a regression analysis is an indication of moderation.
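A minimal sketch of a moderator test, with each study as one row and its effect size as the score being analyzed. The eight rows and the moderator levels are invented. The test is an ordinary pooled-variance t that ignores study sample sizes, which is a simplification chosen for this sketch.

```python
import math
from statistics import mean, variance

# One row per study: (effect size r, level of a hypothetical moderator).
rows = [
    (0.30, "level_a"), (0.35, "level_a"), (0.28, "level_a"), (0.33, "level_a"),
    (0.02, "level_b"), (0.05, "level_b"), (-0.01, "level_b"), (0.04, "level_b"),
]

group_a = [r for r, level in rows if level == "level_a"]
group_b = [r for r, level in rows if level == "level_b"]

# Pooled-variance independent-groups t on the study effect sizes.
n_a, n_b = len(group_a), len(group_b)
pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (mean(group_a) - mean(group_b)) / std_error

CRITICAL_T_DF6 = 2.447  # two-tailed .05 critical value of t with 6 df
print(f"t({n_a + n_b - 2}) = {t_stat:.2f}, moderation indicated: {abs(t_stat) > CRITICAL_T_DF6}")
```

Because the scores are themselves correlations, a significant t here means the size of the relationship depends on the moderator.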
Examples of results of a meta-analysis:
From . . .
[Table not reproduced here. Its column annotations read as follows.]
Weighted mean of observed, unadjusted correlations.
True-score correlation: estimate of population correlation adjusted for predictor and criterion
reliability.
Operational validity: for selection purposes, estimate of population correlation adjusted only for
criterion reliability.
Most people report the operational validity when estimating a population correlation for selection
situations.
They report the true-score correlation when estimating a population correlation in contexts other
than selection.
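The difference between the two corrected estimates can be sketched with the classical correction for attenuation. The observed correlation and the two reliabilities below are invented, the function names are assumptions, and range-restriction corrections are left out.

```python
import math

def true_score_correlation(r_obs, rxx, ryy):
    """Correct for unreliability in both predictor (rxx) and criterion (ryy)."""
    return r_obs / math.sqrt(rxx * ryy)

def operational_validity(r_obs, ryy):
    """Correct for criterion unreliability only, as in selection contexts."""
    return r_obs / math.sqrt(ryy)

# Hypothetical values: observed r, predictor reliability, criterion reliability.
r_obs, rxx, ryy = 0.30, 0.80, 0.60
print(f"observed r            = {r_obs:.3f}")
print(f"operational validity  = {operational_validity(r_obs, ryy):.3f}")
print(f"true-score correlation = {true_score_correlation(r_obs, rxx, ryy):.3f}")
```

The operational validity leaves predictor unreliability in place because, in selection, decisions are made from the imperfect test scores actually obtained.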
Example 2 from . . .
Ignore this – it refers to parts of the table not
copied here.
THE ROLE OF EMOTIONAL INTELLIGENCE IN LEADERSHIP EFFECTIVENESS: A META-ANALYSIS
A Thesis Presented for the Master of Science Degree
The University of Tennessee at Chattanooga
Ashleigh D. Farrar
May 2009
Abstract
Leaders are an essential element of the business world. While good leaders can provide many benefits for
an organization, unsuccessful leaders can be detrimental. The notion that emotional intelligence plays a
part in whether a leader is effective or not effective has recently been introduced. This study sought to
unify the literature evaluating the possible link between emotional intelligence and leadership
effectiveness. Meta-analytic techniques were used to analyze this relationship. Results revealed that
overall, there is a positive relationship between emotional intelligence and leadership effectiveness. Also,
while the type of emotional intelligence measure used served as a moderator to this relationship, a second
and third meta-analysis supported the overall positive relationship of emotional intelligence and
leadership effectiveness for each type of EI.
Results
The central aim of the present study was to examine the overall relationship of EI and leadership
effectiveness. The initial meta-analysis was conducted using all of the included studies. The results of
this meta-analysis are provided in Table 1. A total of 20 correlations were used from 20 studies, with a
total sample size of 3,295. After correcting for unreliability in both EI and leadership effectiveness
measures, the sample-size-weighted mean rho linking the constructs was .458. The 80% credibility
interval did not include zero, indicating that there was a relationship between EI and leadership
effectiveness. These results supported Hypothesis 1.
Ashley’s data
Ashley’s results
Table 1: All Studies
k     Total Sample Size   Mean Rho   Variance of Rho   80% Credibility
20    3295                0.457      0.028             .24-.67

Table 2: EI Mixed Model Measures
k     Total Sample Size   Mean Rho   Variance of Rho   80% Credibility
12    2265                0.427      0.030             .20-.65

Table 3: EI Ability Model Measures
k     Total Sample Size   Mean Rho   Variance of Rho   80% Credibility
8     1030                0.536      0.013             .39-.68
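The 80% credibility intervals in these tables can be approximately recovered from the mean rho and the variance of rho, assuming the usual construction of mean ± z × SD with z taken from the normal distribution. The function name is made up for this sketch. Small discrepancies from the reported limits are expected because the tabled inputs are rounded.

```python
import math
from statistics import NormalDist

def credibility_interval(mean_rho, var_rho, level=0.80):
    """Interval expected to contain `level` of the population correlations."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 for 80%
    half_width = z * math.sqrt(var_rho)
    return mean_rho - half_width, mean_rho + half_width

# (mean rho, variance of rho) as reported in the three tables above.
tables = {
    "All studies": (0.457, 0.028),
    "EI mixed model": (0.427, 0.030),
    "EI ability model": (0.536, 0.013),
}
for label, (rho, var) in tables.items():
    low, high = credibility_interval(rho, var)
    print(f"{label}: {low:.3f} to {high:.3f}")
```

An interval that excludes zero is what the thesis reads as evidence of a relationship across the whole population of studies.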
From Frank Schmidt’s presentation at the 2012 RCIO conference
Table 1. Selection methods for job performance
Columns: r = operational validity; R = multiple R with GMA; Gain = gain in validity over GMA; % gain = % gain in validity; GMA wt and Suppl. wt = standardized regression weights for GMA and for the supplement.

Selection procedure/predictor          r      R      Gain   % gain  GMA wt  Suppl. wt
GMA tests                              .65
Integrity tests                        .46    .78    .130   20%     .63     .43
Employment interviews (structured)     .58    .76    .117   18%     .52     .43
Employment interviews (unstructured)   .60    .75    .099   15%     .48     .41
Conscientiousness                      .22 !  .70    .053   8%      .67     .27
Reference checks                       .26    .70    .050   8%      .65     .26
Biographical data measures             .35    .68    .036   6%      .91     -.34
Job experience                         .13    .67    .023   4%      .66     .17
Person-job fit measures                .18    .67    .020   3%      .64     .16
SJT (knowledge)                        .26    .67    .018   3%      .76     -.19
Assessment centers                     .37    .66    .014   2%      .78     -.19
Peer ratings                           .49    .66    .013   2%      .55     .16
T & E point method                     .11    .66    .009   1%      .65     .11
Years of education                     .10    .66    .008   1%      .65     .10
Interests                              .10    .66    .008   1%      .65     .10
Emotional Intelligence (ability)       .24    .65    .007   1%      .70     -.11
Emotional Intelligence (mixed)         .24    .65    .005   1%      .63     .09
GPA                                    .34    .65    .004   1%      .71     -.10
Person-organization fit measures       .13    .65    .004   1%      .64     .07
Work sample tests                      .33    .65    .003   0%      .69     -.07
SJT (behavioral tendency)              .26    .65    .000   0%      .64     .03
Emotional Stability                    .12    .65    .000   0%      .64     .02
Job tryout procedure                   .44    .65    .000   0%      .63     .02
Behavioral consistency method          .45    .65    .000   0%      .64     .02
Job knowledge                          .48    .65    .000   0%      .65     -.01

Table 2. Selection methods for training performance
Columns: r = operational validity; R = multiple R with GMA; Gain = gain in validity over GMA; % gain = % gain in validity; GMA wt and Suppl. wt = standardized regression weights for GMA and for the supplement.

Selection procedure/predictor          r      R      Gain   % gain  GMA wt  Suppl. wt
GMA tests                              .67
Integrity tests                        .43    .78    .109   16%     .65     .40
Biographical data measures             .30    .74    .073   11%     1.04    -.50
Conscientiousness                      .25    .73    .061   9%      .69     .29
Employment interviews                  .48    .72    .051   8%      .57     .28
Reference checks                       .23    .71    .038   6%      .67     .23
Years of education                     .20    .70    .029   4%      .67     .20
Interests                              .18    .69    .024   4%      .67     .18
Peer ratings                           .36    .67    .002   0%      .70     -.06
Emotional Stability                    .14    .67    .001   0%      .66     .03
Job experience (years)                 .01    .67    .000   0%      .67     .01

Note. Operational Validity estimates in parentheses are what is reported in Schmidt and Hunter (1998, Table 2).
Selection procedures whose operational validity is equal to and greater than .10 are listed in the order of gain in
operational validity.
Unless otherwise noted, all operational validity estimates are corrected for measurement error in the criterion measure
and indirect range restriction (IRR) on the predictor measure to estimate operational validity for applicant populations.
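The columns of these tables are linked by simple arithmetic. As a sketch, the code below uses the standard identity that R² equals the sum of each standardized weight times that predictor's validity, and applies it to the Integrity tests row of the job performance table. The function name is invented, and because the tabled values are rounded the results only approximate the reported R, gain, and % gain.

```python
import math

def multiple_r(beta_gma, r_gma, beta_supp, r_supp):
    """Multiple R from standardized weights and validities: R^2 = sum(beta * r)."""
    return math.sqrt(beta_gma * r_gma + beta_supp * r_supp)

# Values read from the job performance table: GMA tests and Integrity tests.
r_gma = 0.65
r_integrity = 0.46
beta_gma, beta_integrity = 0.63, 0.43

big_r = multiple_r(beta_gma, r_gma, beta_integrity, r_integrity)
gain = big_r - r_gma              # gain in validity over GMA alone
pct_gain = 100 * gain / r_gma     # gain as a percentage of GMA validity

print(f"R = {big_r:.3f}, gain = {gain:.3f}, % gain = {pct_gain:.0f}%")
```

The same arithmetic shows why a predictor with a sizable validity of its own can add almost nothing: what matters is how much it contributes beyond GMA.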
McAbee & Oswald’s results – Prediction of GPA
Quasi-Experiments (not covered in 2015)
True Experiment
Design in which individual participants are randomly assigned to conditions.
Sources: Campbell & Stanley; Cook & Campbell; Shadish, Cook, & Campbell.
Quasi-Experiment
Anything else.
Most often: Designs in which different treatments are assigned (perhaps randomly) to
already existing groups.
Always: Designs comparing subject variables, such as gender, graduate program, age, or any
prior condition.
Sometimes: Designs for which participants could be randomly assigned but are not, for
one reason or another.
Some authors differentiate designs for which assignment of conditions to groups is possible
(e.g., Training programs to different buildings) from those for which it is not (e.g., Gender).
Pretest-Posttest with Nonequivalent Groups Design
Also called the Nonequivalent Control Groups Design with Pretest (NECG with Pretest
Design)
The most frequently discussed Quasi-Experimental Design
The design involves pretests on both groups, making one the experimental group and the other the
control group, then taking a post observation of both.
Diagrammed as
         Pre     Condit    Post
         O1      XE        O2
         ------------------------
         O1      XC        O2
The line signifies nonequivalence.
The true experimental counterpart is the Randomized Groups Design (RG Design).
    R    O1      XE        O2
    R    O1      XC        O2
(R denotes random assignment.)
Note that in the Randomized Groups design, a pretest is not necessary, because the groups are
theoretically equivalent prior to the administration of research conditions.
But it's certainly possible and advisable to take a pretest in the Randomized Groups design. With the
appropriate analysis, having a pretest can increase power of the comparison between the groups.
Examples
1. Program to compare efficacy of a new diet program vs. an existing diet program.
O1 is weight prior to beginning the diet
O2 is weight after the diet
XE is the new diet
XC is the old diet
Obviously, if participants could be randomly assigned to the diet type, then this would be a
Randomized Groups design.
But if individual participants could not be randomly assigned, then the design would be NECG.
Pre             Condit      Post
Prior Weight    New Diet    Post Weight
----------------------------------------
Prior Weight    Old Diet    Post Weight
2. Comparison of I-O vs. Research program students in statistics.
O1 is knowledge of statistical concepts taught in the course prior to the start of the semester
O2 is the same test of knowledge of statistical concepts at the end of the course
XE is the I-O program (or it could be the Research program)
XC is the Research program (or it could be the I-O program)
Pre        Condit      Post
Pretest    I-O         Posttest
--------------------------------
Pretest    Research    Posttest
A few outcomes of the NECG design and their interpretations . .
Assuming pretest and posttest are commensurable
1. Best possible Outcome
Analysis: Between Subjects / Within Subjects analysis of variance
Group Factor:
Level 1 = Experimental Group
Level 2 = Control Group
Time Factor:
Level 1 = Pretest measure
Level 2 = Posttest measure
1. Main Effect of Time Factor could be found if both groups increase performance from pre- to
posttest.
2. Main Effect of Group Factor could be found depending on the specific results.
3. But Interaction is the key. If the interaction is significant and of the form shown below, this
would indicate that the Experimental group increased significantly more than the Control group.
Moreover, since the Experimental group performance was below that of the Control group on the
pretest, any explanation of the result in terms of pre-existing differences would be difficult.
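[Figure: pretest and posttest means. The Experimental group starts below the Control group at pretest (E1 < C1) and ends above it at posttest (E2 > C2).]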
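With two groups and two time points, testing the Group × Time interaction is equivalent to comparing the groups on their pre-to-post gain scores. The sketch below does that comparison with a pooled-variance t on invented scores arranged to mimic the best possible outcome, in which the Experimental group starts lower and ends higher.

```python
import math
from statistics import mean, variance

# Hypothetical (pretest, posttest) pairs for each participant.
experimental = [(10, 18), (12, 19), (9, 17), (11, 20), (13, 21)]
control = [(14, 16), (15, 16), (13, 15), (16, 18), (14, 17)]

def gains(group):
    """Posttest minus pretest for every participant in the group."""
    return [post - pre for pre, post in group]

gain_e, gain_c = gains(experimental), gains(control)

# Interaction contrast: how much more the Experimental group gained.
diff_in_gains = mean(gain_e) - mean(gain_c)

# Pooled-variance independent-groups t on the gain scores.
n_e, n_c = len(gain_e), len(gain_c)
pooled_var = ((n_e - 1) * variance(gain_e) + (n_c - 1) * variance(gain_c)) / (n_e + n_c - 2)
t_stat = diff_in_gains / math.sqrt(pooled_var * (1 / n_e + 1 / n_c))

print(f"mean gain E = {mean(gain_e)}, mean gain C = {mean(gain_c)}, "
      f"t({n_e + n_c - 2}) = {t_stat:.2f}")
```

A significant t only says the gains differ. Whether that supports a treatment effect still depends on the pattern of means, as the outcomes that follow illustrate.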
2. 2nd best possible outcome
One Between Subjects / One Within Subjects analysis: Significant effects
1. Main Effect of Time Factor might be found if both groups increased from pre- to posttest.
2. Main Effect of Group Factor might be found in some circumstances.
3. Interaction is the key result here, though.
[Figure: pretest and posttest means for the Experimental (E) and Control (C) groups, with similar pretest means.]
Again, an explanation in terms of pre-existing group differences would be difficult to accept in
view of the similarity of performance on the pretest.
3. The dark side of the NECG with Pretest design: Regression to the Mean or Maturation
One Between Subjects / One Within Subjects analysis: Significant effects
1. ME of Time, possibly
2. ME of Group, possibly
3. Interaction is significant.
The point being made here is that you can't blindly follow the significant effects. You must consider
the pattern of differences. Even if the interaction were significant, it might be argued that the
interaction was due to regression to the mean or to maturation effects.
[Figures: two panels of pretest and posttest means for the Control (C) and Experimental (E) groups, titled “Maturation Explanation” and “RTTM Explanation”.]
Children enrolled in Head Start program increased more from pretest to posttest than children not
enrolled. But the result may have been due to differential regression to the mean. The Head Start
children may have been from the lower tail of the distribution on the pretest. They would be
expected to score higher on the posttest simply due to regression to the mean. Those in the control
group may have been more likely to be near the middle of the distribution, and thus would not be
expected to change much from pretest to posttest.
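Differential regression to the mean can be demonstrated with a small simulation in which no treatment exists at all. The score distribution, the amount of measurement error, and the 20% cutoffs below are arbitrary choices for this sketch and are not taken from any Head Start data.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Each simulated child has a stable true score. Pretest and posttest add
# independent measurement error, and no treatment effect is built in.
n = 5000
true_scores = [rng.gauss(50, 10) for _ in range(n)]
pre = [t + rng.gauss(0, 10) for t in true_scores]
post = [t + rng.gauss(0, 10) for t in true_scores]

# "Enroll" the lowest 20% on the pretest; use the middle 20% as a comparison.
order = sorted(range(n), key=lambda i: pre[i])
low_group = order[: n // 5]
middle_group = order[2 * n // 5 : 3 * n // 5]

def mean_gain(indices):
    """Average posttest minus pretest for the selected children."""
    return mean([post[i] - pre[i] for i in indices])

low_gain = mean_gain(low_group)
middle_gain = mean_gain(middle_group)
print(f"gain for lowest 20%: {low_gain:.1f}, gain for middle 20%: {middle_gain:.1f}")
```

The group selected from the lower tail gains several points while the middle group barely moves, which is the same pattern a real treatment effect would produce.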
4. Main effects only outcome
One Between Subjects / One Within Subjects analysis: Significant effects
1. ME of Time, possibly
2. ME of Group, possibly
3. Interaction not significant.
Again, you can't blindly follow the significant effects. You must consider the pattern of differences.
This outcome does not demonstrate a treatment effect. The control and treatment groups increased
by the same amount, so the difference between them at pretest was the same as at posttest.
[Figure: pretest and posttest means for the Experimental (E) and Control (C) groups; the two lines are parallel.]
Example from Shadish, Cook, & Campbell (2002).
Carter, Winkler, & Biddle (1987) evaluated the effects of the National Institutes of Health (NIH)
Research Career Development Award, a program designed to improve the research careers of
promising scientists.
Those who received the RCDA did better on the posttest than those who did not receive the
award.
But the difference on the pretest was about the same as the difference on the posttest,
so the final difference may have been due to pre-existing differences:
those who got the award were better to start with (which is why they got the award), and they
were about as much better at the end of the study.