Quantitative Research Syntheses: Meta-Analysis
Advanced Biostatistics
Dean C. Adams
Lecture 14 EEOB 590C

Today
•Methods for quantitative research synthesis
•Brief history of methods for combining results from prior studies
•Vote-counting
•Combined probability method
•Meta-analysis
For further information on these approaches see:
Cooper and Hedges (1994). Handbook of Research Synthesis.
Hedges and Olkin (2000). Statistical Methods for Meta-Analysis.
Rosenberg, Adams, and Gurevitch (2000). MetaWin: Statistical Software for Meta-Analysis. Vsn 2.

Synthesizing Prior Research
•One important goal of science is synthesizing existing knowledge
  •What does a body of literature say about a particular topic?
  •Does existing published evidence support a particular hypothesis?
  •Is there a general ‘consensus’ about the importance of a hypothesis?
•This is an obvious question to ask (what do we already know?)
•Literature reviews are a common approach: usually narrative
•Other, more quantitative methods exist
•Three main approaches:
  •Vote-counting
  •Combined probability methods
  •Meta-analysis

Quantitative Research Synthesis: A Brief History
•Quantitative research synthesis (QRS) is as old as modern statistics
•First QRS: Pearson (1904) calculated average correlation from several studies on effectiveness of typhoid vaccine
•Early 20th century: narrative reviews most common (and still are)
•1930s: several methods for combining probabilities developed (but infrequently used)
•1970s: ‘modern’ meta-analytic methods for combining effect sizes from independent studies developed by Glass (1976), Rosenthal etc.
•Currently, meta-analytic methods are common in social sciences and medicine; use in ecology and evolutionary biology is increasing

QRS: Beginnings
•ANY research synthesis begins with a hypothesis (e.g., does smoking significantly increase cancer rates?)
•Published studies* are then obtained via a literature search (e.g., keyword search on Web of Science, Scholar.Google, Biological Abstracts, etc.)
•Unusable articles are discarded based on certain criteria (e.g., incomplete information)
•Remaining articles are reviewed and summarized in some way
*Note: unpublished studies that can be obtained from authors can also be included

QRS: Vote-Counting
•Begin with hypothesis and set of published studies
•Results from each study classified as 1 of 3 outcomes
  •Significant in expected direction
  •Significant in unexpected direction
  •Not significant
•Calculate proportion of each class; the class with the highest proportion represents the ‘support’ (for, against, equivocal)
•Advantages: quick and easy to calculate, intuitive
•Disadvantages: overly conservative, low statistical power (# nonsignificant findings > expected # significant findings), ignores magnitude of effects of studies, not sensitive to sample sizes (all studies treated equally)

QRS: Combined Probability Methods
•Begin with hypothesis and set of published studies (with significance levels)
•Combine probabilities in some way
•Many methods exist for various distributions (uniform, normal, t, $\chi^2$, etc.: see Becker, 1994 in Handbook of Research Synthesis: Cooper & Hedges)
•Advantages: relatively easy to calculate, sample sizes taken into account (b/c use exact probabilities), general approach (can almost always obtain p-value from a study)
•Disadvantages: don’t directly assess magnitude of study effects, cannot assess direction of effects, cannot assess whether effects are homogeneous
•Often called omnibus tests (only depend on exact probabilities of each study)

Some Combined Probability Methods
•Minimum P method (Tippett, 1931): uses uniform distribution, significant if any study is significant at the $\alpha^*$-level: $\alpha^* = 1 - (1 - \alpha)^{1/n}$
•Sum of logs method (Fisher, 1932): uses inverse $\chi^2$ distribution, significant if $P\left(-2\sum_{i=1}^{n} \ln p_i\right) < 0.05$ from $\chi^2$ with 2n df (n is # studies and $p_i$ are study significance levels)
•Sum of Z method (Stouffer et al., 1949): uses normal distribution, significant if probability of $Z = \sum_{i=1}^{n} Z(p_i) / \sqrt{n}$ is < 0.05 ($Z(p_i)$ are Z-scores for study p-values)
•Sum of p method (Edgington, 1972): uses uniform distribution, significant if $P = \left(\sum_{i=1}^{n} p_i\right)^n / n!$ is < 0.05 (n is # studies and $p_i$ are study significance levels)

Example: Fisher’s Approach
•Sum of logs method (Fisher, 1932): uses inverse $\chi^2$ distribution, significant if $P\left(-2\sum_{i=1}^{n} \ln p_i\right) < 0.05$ from $\chi^2$ with 2n df (n is # studies and $p_i$ are study significance levels)
•$p_i$: 0.06; 0.02; 0.035; 0.001; 0.24
•$\ln(p_i)$: -2.81; -3.91; -3.35; -6.91; -1.43
•$-2\sum \ln(p_i) = 36.83$; $P_{\chi^2} \approx 0.00006$ with 10 df: significant
•Note: the statistic requires natural logs. Base-10 logs (-1.22; -1.70; -1.46; -3; -0.62) give $-2\sum \log_{10}(p_i) = 16$ and $P \approx 0.10$, which would wrongly suggest a non-significant (NS) result

QRS: Meta-Analysis
•Approach that combines weighted effect sizes for each study to assess overall significance
•Allows the interpretation of the strength of the statistical finding, not just whether or not there is significance
•M-A model can be generalized to address more complicated synthesis questions
•Requires calculating an effect size and weight for each study
•Meta-analysis has two steps:
  •Calculate effect sizes (and weights) for each study
  •Summarize effect sizes to address hypothesis (m-a model)

Effect Sizes
•Effect size: statistical measure of the magnitude of a factor in the data (how much does smoking increase cancer rates?)
•Different types of primary data require different effect size estimates (some data types have several possible effect sizes)
•Many test statistics are a form of effect size (e.g., $t = (\bar{X}_1 - \bar{X}_2) / s_{\bar{X}_1 - \bar{X}_2}$ is a standardized mean difference effect size)
•Use of effect sizes in QRS is desirable because they ‘standardize’ results from independent studies and express them in a common way (i.e., all results expressed as t-values)
•Weights are inverse of effect size variance: $w = 1/v$
•Effect sizes are typically transformed so range is $-\infty$ to $+\infty$

Effect Sizes From $\bar{X}$ and $s$
•Powerful effect sizes, but require much data from studies
•Require means, sample sizes, and std from experimental and control group ($\bar{X}^C$ & $\bar{X}^E$, $s^C$ & $s^E$, $N^C$ & $N^E$)
•Most are variants on standardized mean difference (like t-test)
Name/s, equation, and variance:
•Glass’ $\Delta$: $\Delta = (\bar{X}^E - \bar{X}^C) / s^C$; variance: $v_\Delta = \frac{N^C + N^E}{N^C N^E} + \frac{\Delta^2}{2(N^C - 1)}$
•Hedges’ g: $g = (\bar{X}^E - \bar{X}^C) / S$; variance: $v_g = \frac{N^C + N^E}{N^C N^E} + \frac{g^2}{2(N^C + N^E - 2)}$
•Cohen’s d: $d_{Cohen} = (\bar{X}^E - \bar{X}^C) / S_{Cohen}$; variance: $v_{d_{Cohen}} = \frac{N^C + N^E}{N^C N^E} + \frac{d_{Cohen}^2}{2(N^C + N^E - 2)}$
•Hedges’ d: $d = J (\bar{X}^E - \bar{X}^C) / S$; variance: $v_d = \frac{N^C + N^E}{N^C N^E} + \frac{d^2}{2(N^C + N^E)}$
•Response ratio: $\ln R = \ln(\bar{X}^E / \bar{X}^C)$; variance: $v_{\ln R} = \frac{(s^E)^2}{N^E (\bar{X}^E)^2} + \frac{(s^C)^2}{N^C (\bar{X}^C)^2}$

Effect Sizes From 2 X 2 Tables
•Common in medicine: for data summarized by 2 X 2 table

              Treatment      Control        Total
Response      A              B              A + B
No Response   C              D              C + D
Total         n_t = A + C    n_c = B + D    N = A + B + C + D

•From table calculate $P_t = A / n_t$ and $P_c = B / n_c$: used for effect sizes
Name/s, equation, and variance:
•Rate difference (risk difference): $RD = P_t - P_c$; variance: $v_{RD} = \frac{P_t(1 - P_t)}{n_t} + \frac{P_c(1 - P_c)}{n_c}$
•Relative rate (risk ratio, rate ratio): $RR = P_t / P_c$; variance: $v_{\ln RR} = \frac{1 - P_t}{n_t P_t} + \frac{1 - P_c}{n_c P_c}$
•Odds ratio (relative odds): $OR = \frac{P_t(1 - P_c)}{P_c(1 - P_t)}$; variance: $v_{\ln OR} = \frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}$

Effect Sizes From Correlations
•Useful when only summary statistics are available
•Convert all test-statistics to correlations, then convert these to Fisher’s Z-transform: $z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$; variance: $v_z = \frac{1}{n - 3}$
•Common transformations (statistic: conversion)
  •Z*: $r = Z / \sqrt{N}$
  •t: $r = \sqrt{t^2 / (t^2 + df)}$
  •F: $r = \sqrt{F / (F + df)}$
  •$\chi^2_{(1)}$: $r = \sqrt{\chi^2 / N}$
*Probabilities can be converted to Z as standard normal deviates

Meta-Analytic Models
•Summarize effect sizes to assess significance
•Standard statistical summary variables: mean, variance
  •Cumulative Effect Size: weighted mean of effect sizes
  •Homogeneity Statistic: quantifies variation in effect sizes (analogous to SS). Are effect sizes homogeneous?
•Method of summary depends upon model for effect size variation
  •No structure: all studies belong to one ‘population’
  •Categorical structure: studies belong to groups
  •Continuous structure: studies covary with continuous variable
•For models with structure (categorical, continuous), variables are often called moderator variables (groups, covariate, etc.)
•All models are actually special cases of same model

Meta-Analysis: No Structure
•Model: All studies belong to same group
•Example Ho: Is there an effect of competition on plant communities?
•Cumulative effect size: $\bar{E} = \sum_{i=1}^{n} w_i E_i / \sum_{i=1}^{n} w_i$; variance: $s^2_{\bar{E}} = 1 / \sum_{i=1}^{n} w_i$
•CI: $\bar{E} \pm t_{\alpha/2[n-1]} \times s_{\bar{E}}$; $\bar{E}$ is significant if its CI does not bracket 0.0
•Homogeneity: $Q_T = \sum_{i=1}^{n} w_i E_i^2 - \left(\sum_{i=1}^{n} w_i E_i\right)^2 / \sum_{i=1}^{n} w_i$ or $Q_T = \sum_{i=1}^{n} w_i (E_i - \bar{E})^2$
•Test against $\chi^2$ (n-1 df)
•Significant $Q_T$ implies samples are NOT homogeneous
•Implies structure in data: may be captured by a moderator variable

Meta-Analysis: Categorical Structure
•Model: Studies belong to different groups
•Example Ho: Does competition differ among habitats (terrestrial, marine, etc.)?
•For each group calculate:
  •$\bar{E}_j = \sum_{i=1}^{k_j} w_{ij} E_{ij} / \sum_{i=1}^{k_j} w_{ij}$
  •$s^2_{\bar{E}_j} = 1 / \sum_{i=1}^{k_j} w_{ij}$
  •CI: $\bar{E}_j \pm t_{\alpha/2[k_j - 1]} \times s_{\bar{E}_j}$
  •$Q_{W_j} = \sum_{i=1}^{k_j} w_{ij} (E_{ij} - \bar{E}_j)^2$
•Test if each group is different from zero
•Test if groups differ: $Q_T = Q_M + Q_E$
  •$Q_M = \sum_{j=1}^{m} \sum_{i=1}^{k_j} w_{ij} (\bar{E}_j - \bar{E})^2$
  •$Q_E = \sum_{j=1}^{m} Q_{W_j} = \sum_{j=1}^{m} \sum_{i=1}^{k_j} w_{ij} (E_{ij} - \bar{E}_j)^2$
  •Test $Q_M$ vs. $\chi^2$ with m-1 df, where m is # groups
•Significant $Q_M$ implies groups are different (significant $Q_E$ implies there is still structure remaining)

Meta-Analysis: Continuous Structure
•Model: Study effect sizes covary with continuous variable
•Example Ho: Does competition intensity change with age?
•Use Weighted GLM: $E_i = b_0 + b_1 X_i$
  •$b_1 = \dfrac{\sum_{i=1}^{n} w_i X_i E_i - \frac{\sum_{i=1}^{n} w_i X_i \sum_{i=1}^{n} w_i E_i}{\sum_{i=1}^{n} w_i}}{\sum_{i=1}^{n} w_i X_i^2 - \frac{\left(\sum_{i=1}^{n} w_i X_i\right)^2}{\sum_{i=1}^{n} w_i}}$
  •$b_0 = \dfrac{\sum_{i=1}^{n} w_i E_i - b_1 \sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$
  •Test slope and intercept by $Z_{b_1} = b_1 / s_{b_1}$ and $Z_{b_0} = b_0 / s_{b_0}$
•Homogeneity: $Q_M = b_1^2 / s^2_{b_1}$ ($Q_M$ vs. $\chi^2$ with 1 df)
•Significant $Q_M$ implies X explains significant component of variation in E

Meta-Analysis: Comments
•What are we doing? Summarizing effect sizes as if ‘primary’ data
•If $w_i$ = 1.0, then we’re calculating standard means & SS:
  $\bar{E} = \sum_{i=1}^{n} w_i E_i / \sum_{i=1}^{n} w_i$ and $Q_T = \sum_{i=1}^{n} w_i (E_i - \bar{E})^2$
•Also note, $Q_T$ is partitioned, just like SS
•Therefore, think of meta-analysis as ANOVA, regression, etc.
•Meta-analytic models are actually Weighted GLM
•Weighted GLM is a standard statistical method used to account for different weights of objects (recall PGLS for phylogeny)
•Since meta-analysis is analyzed in this general framework, more complicated designs can also be tested (e.g., ANCOVA, 2-factor ANOVA, etc.)

Meta-Analysis: Weighted GLM
•Represent analyses using standard matrix algebra: $E = X\beta + \varepsilon$, with
  $E = \begin{bmatrix} E_1 \\ \vdots \\ E_n \end{bmatrix}$, $X = \begin{bmatrix} 1 & X_{11} & \cdots & X_{p1} \\ \vdots & \vdots & & \vdots \\ 1 & X_{1n} & \cdots & X_{pn} \end{bmatrix}$
  (For no structure, X is vector of 1’s)
•W is ‘in’ the error term: $W = \mathrm{diag}(w_1, \ldots, w_n)$ ($w_i$ is the inverse of the variance)
•Solve model as: $\beta = (X^t W X)^{-1} X^t W E$
•$Q_T$, $Q_M$, etc. calculated as weighted SS
•Allows for simple to complicated designs
•Can be generalized to multivariate (though multivariate effect sizes are nearly impossible to obtain for a set of published studies!)

Meta-Analysis: Fixed vs. Random Models
•All previous models are ‘fixed effects’ models
•Fixed-effects model: assume only one true effect size shared by all studies (studies therefore only differ by sampling error)
•Random-effects model: assume studies differ by sampling error and random component (pooled study variance: $\sigma^2_{pooled}$)
•$\sigma^2_{pooled}$ found from running a fixed effects model
•$\sigma^2_{pooled}$ is incorporated in weights for random model: $w_{i(rand)} = \dfrac{1}{v_i + \sigma^2_{pooled}}$
•No Structure: $\sigma^2_{pooled} = \dfrac{Q_T - (n - 1)}{\sum_{i=1}^{n} w_i - \frac{\sum_{i=1}^{n} w_i^2}{\sum_{i=1}^{n} w_i}}$
•Categorical Model: $\sigma^2_{pooled} = \dfrac{Q_E - (n - m)}{\sum_{j=1}^{m} \left( \sum_{i=1}^{k_j} w_{ij} - \frac{\sum_{i=1}^{k_j} w_{ij}^2}{\sum_{i=1}^{k_j} w_{ij}} \right)}$
•Continuous Model: $\sigma^2_{pooled} = \dfrac{Q_E - (n - 2)}{\sum_{i=1}^{n} w_i - \frac{\sum w_i^2 \sum w_i X_i^2 - 2 \sum w_i^2 X_i \sum w_i X_i + \sum w_i \sum w_i^2 X_i^2}{\sum w_i \sum w_i X_i^2 - \left(\sum w_i X_i\right)^2}}$ (all sums over i = 1..n)

Meta-Analysis: Example
•Competition in biological communities (Gurevitch et al., 1992)
•Subset of data (N=43) from 3 habitats (terrestrial, lentic, marine)
•Data: mean, std, n data from experiment/control
•Ho: Does competition differ among habitats?

Part of data:
Study  Habitat      Nc  Ne  Xc       Xe        Sc       Se       d        var(d)
1      Terrestrial  7   7   78.14    79.71     40.65    40.65    0.0362   0.2858
2      Terrestrial  7   7   18.86    26        9.17     9.17     0.7289   0.3047
3      Terrestrial  6   6   -1.8     -2.1      0.49     0.49     0.5651   0.3466
4      Terrestrial  5   5   -2.2     -2.8      0.224    0.447    1.5329   0.5175
5      Terrestrial  7   7   -2.1     -3        0.265    0.529    2.0139   0.4306
6      Terrestrial  6   6   -2.3     -4.2      0.49     1.225    1.8799   0.4806
7      Terrestrial  3   3   85.3     285.7     115.008  153.806  1.1806   0.7828
8      Terrestrial  3   3   0        3         0        2.425    1.3996   0.8299
9      Terrestrial  3   3   0        2         0        2.078    1.0889   0.7655
10     Terrestrial  3   3   0        1.67      0        1.732    1.0909   0.7658
11     Terrestrial  5   5   17       17        7.603    5.367    0        0.4
12     Terrestrial  5   5   47       37        10.286   9.391    -0.9171  0.4421
13     Terrestrial  4   4   87       272       37.712   183.532  1.2142   0.5921
14     Terrestrial  18  20  -0.113   0.294     0.255    0.215    1.6975   0.1435
15     Terrestrial  20  20  -0.163   0.412     0.588    0.218    1.2709   0.1202
16     Terrestrial  18  20  0.14     0.632     0.38     0.359    1.3051   0.128
17     Terrestrial  20  20  -0.184   0.259     0.326    0.238    1.5213   0.1289
18     Terrestrial  20  20  -0.075   0.354     0.487    0.182    1.1438   0.1164
19     Terrestrial  20  20  0.147    0.541     0.34     0.299    1.2062   0.1182
20     Lentic       4   4   281.11   -201.03   158.038  27.52    3.6961   1.3538

Meta-Analysis: Results
•E: Group effect sizes differed from zero (except lentic)
•QM: Effect sizes differed among groups
•Conclusion: competition occurs and differs among habitats

Group         #Studies   E+       df   95% CI
-------------------------------------------------------------
Terrestrial   19         1.1417   18   0.8999 to 1.3
Lentic        2          4.1072   1    -7.1465 to 15.3609
Marine        22         0.7985   21   0.5419 to 1.0550
E++           43         1.0099        0.8408 to 1.1789

Model     df   Q         Prob(Chi-Square)
--------------------------------------------------
Between   2    16.4798   0.00026
Within    40   69.5016   0.00262
--------------------------------------------------
Total     42   85.9814   0.00007

[Figure: effect sizes for Lentic, Terrestrial, Grand Mean, and Marine; x-axis: Effect Size]

Meta-Analysis: Publication Bias
•Common concern is that only studies with significant results get published, resulting in bias
•Can be assessed in a number of ways:
•Funnel Plot: plot effect size vs. sample size: should be funnel shaped (larger variance with smaller n). If overabundance of extreme values (for given n) with lack of data ‘in’ funnel, might be publication bias

[Figure: funnel plot of d (y-axis) vs. Nc (x-axis)]

Meta-Analysis: Publication Bias Cont.
•Rank-Correlation Tests: Look at rank-correlation of standardized effect size vs. sample size: $E_i^* = \dfrac{E_i - \bar{E}}{\sqrt{v_i^*}}$ where $v_i^* = v_i - \left(\sum_{j} v_j^{-1}\right)^{-1}$
•Fail-Safe Numbers: For the ‘file drawer problem’. How many non-significant studies must be added to change result to non-significant (if large #, then result is robust): $N_R = \dfrac{\left(\sum_{i=1}^{n} Z(p_i)\right)^2}{Z_\alpha^2} - n$
  •n: # studies, $Z(p_i)$: Z-scores for study significance values, $Z_\alpha$: Z-score for the 1-tail probability $\alpha$
•Normal Quantile Plot: Standardized effect size vs. normal quantile (gaps or strange nonlinearities may indicate publication bias)

[Figure: normal quantile plot; y-axis: Standardized Effect Size, x-axis: Normal Quantile]

Cumulative Meta-Analysis
•Rank studies by some criterion (e.g., year of publication)
•Perform meta-analysis on 1st 2 studies, then 1st 3, 1st 4, etc.
•Plot cumulative effect sizes (with CI)
•Addresses when a synthesized result could be determined

Meta-Analysis: Resampling Tests
•Adams et al., 1997 (Ecology) proposed some resampling methods
  •Randomization for assessing significance of Q-statistics
  •Bootstrapping for assessing CI of cumulative effect sizes
•Removes assumptions of testing vs. $\chi^2$ distribution
Adams et al. 1997. Ecology. 78:1277-1283.
See also: Rosenberg, Adams, Gurevitch. 2000. MetaWin. Sinauer Assoc.

Phylogenies and Meta-Analysis
•When studies come from a set of related taxa, phylogenetic nonindependence is an issue
•Phylogenetic meta-analysis recently developed (Adams, 2008)
•Both PGLS and meta-analysis are GLS models, so can be combined
  •M-A: $\beta = (X^t W X)^{-1} X^t W E$
  •PGLS: $\beta = (X^t \Sigma^{-1} X)^{-1} X^t \Sigma^{-1} Y$
Steps
1. SVD of $\Sigma$: obtain transformation matrix (D) [see Garland and Ives, 2000. Am. Nat.]
2. Transform X and E as: $E_{new} = DE$, $X_{new} = DX$
3. Solve meta-analysis with transformed data: $\beta_{pma} = (X_{new}^t W X_{new})^{-1} X_{new}^t W E_{new}$
•NOTE: this is a fixed effects, Brownian motion model (method generalized by Lajeunesse, 2009)
Adams. 2008. Evolution. 62:567-572.
Also: Lajeunesse. 2009. Am. Nat. 174:369-381.
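
Worked sketch: combined probabilities and a fixed-effects summary
As a recap of two calculations from this lecture, the following Python sketch computes Fisher's sum-of-logs test for the five example p-values, Hedges' d with its variance, and the no-structure (fixed-effects) cumulative effect size with $Q_T$. It is a minimal illustration, not the MetaWin implementation. The function names (`fisher_combined_p`, `hedges_d`, `fixed_effects_summary`) are hypothetical, and two ingredients are assumptions not spelled out in the slides: the pooled standard deviation S, and the usual small-sample correction $J = 1 - 3 / (4(N^C + N^E - 2) - 1)$.

```python
import math


def fisher_combined_p(pvals):
    """Fisher's sum-of-logs test: returns (-2 * sum(ln p_i), combined P)."""
    stat = -2.0 * sum(math.log(p) for p in pvals)  # natural logs
    # For even df = 2n the chi-square upper tail has a closed form:
    # P = exp(-stat/2) * sum_{j=0}^{n-1} (stat/2)^j / j!
    half, term, total = stat / 2.0, 1.0, 1.0
    for j in range(1, len(pvals)):
        term *= half / j
        total += term
    return stat, math.exp(-half) * total


def hedges_d(xe, xc, se, sc, ne, nc):
    """Hedges' d and its variance from group means, SDs, and sample sizes."""
    # Pooled standard deviation (assumed form of S).
    s = math.sqrt(((ne - 1) * se ** 2 + (nc - 1) * sc ** 2) / (ne + nc - 2))
    # Small-sample correction J (assumed standard form).
    j = 1.0 - 3.0 / (4.0 * (ne + nc - 2) - 1.0)
    d = j * (xe - xc) / s
    v = (nc + ne) / (nc * ne) + d ** 2 / (2.0 * (nc + ne))
    return d, v


def fixed_effects_summary(effects, variances):
    """No-structure model: weighted mean effect, its variance, and Q_T."""
    w = [1.0 / v for v in variances]  # weights are inverse variances
    sum_w = sum(w)
    e_bar = sum(wi * ei for wi, ei in zip(w, effects)) / sum_w
    q_t = sum(wi * (ei - e_bar) ** 2 for wi, ei in zip(w, effects))
    return e_bar, 1.0 / sum_w, q_t


# Fisher example from the lecture (five study p-values).
stat, p = fisher_combined_p([0.06, 0.02, 0.035, 0.001, 0.24])
print(f"-2 sum ln(p) = {stat:.2f}, P = {p:.5f}")

# Study 1 of the competition data set (Xe, Xc, Se, Sc, Ne, Nc).
d1, v1 = hedges_d(79.71, 78.14, 40.65, 40.65, 7, 7)
print(f"d = {d1:.4f}, var(d) = {v1:.4f}")
```

For study 1 this reproduces the tabled d = 0.0362 and var(d) = 0.2858 to four decimals, which suggests the assumed forms of S and J match those used for the table. Passing the d and var(d) columns to `fixed_effects_summary` gives the fixed-effects cumulative effect size for whatever subset of studies is supplied; with all weights equal it reduces to the ordinary mean and sum of squares.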