Estimating the prevalence of ‘file drawer’ studies in coordinate-based meta-analysis Pantelis Samartsidis1, Angela Laird2, Peter Fox3, Timothy D. Johnson4, and Thomas Nichols1 1University of Warwick 2 Florida International University 3 4 UTHSCSA University of Michigan The ‘file drawer’ problem is one type of bias in meta-analysis [Easterbrook, 1993]. It refers to research studies that are initiated but are not published due to lack of significance, either by cause of authors’ hesitation to submit or perhaps because of rejection by journals that are reluctant to publish negative results [Ioannidis, 2005]. In the field of fMRI, David et al [2013] found evidence of such publication biases, but to date there has been no work on estimating the fundamental ‘file drawer’ quantity, the number of missing studies. • With no covariates, the fit is easily visualised, and the Poisson is seen to be a poor fit, while the Negative Binomial fits well. 0.15 Results 0.15 Introduction ● ● ● ● ● ● ● Density ● Density ● ● ● ● ● 0.05 ● 0.05 In this work, we propose a method for estimating the number of non-significant studies omitted from a large cohort of studies, making use of the coordinate data in the BrainMap database [Laird et al, 2005]. 0.10 0.10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 • Extensive database consisting of coordinate data obtained from 2562 scientific papers. • Several experiments (questions) per study. ● 15 ● ● ● ● ● ● 20 ● ● ● ● ● ● ● 0.00 ● ● BrainMap ● ● ● 0.00 Motivating dataset: BrainMap database ● ● 25 0 5 10 15 20 Foci per contrast Foci per contrast Poisson Negative binomial 25 • Based on AIC, the Negative Binomial GLM fits the data best (AIC=14164.46) as compared to the simple Negative Binomial (AIC=14175.8), the Poisson GLM (AIC=21911.07) and the simple Poisson (AIC=22379.28). • Invaluable resource meta-analyses. A first look at the data • We consider the 12292 experiments (i.e. contrasts) as the units of observation • Bar plot shows the empirical distribution of the total number of foci per experiment (Fig. 1; zoom-in show in Fig. 2). • We fit the GLM assuming a Negative Binomial. The model predicts that for the mean sample size and the mean year the prevalence of zero-foci experiments is p̂Z = 9.02 (CI [7.32, 10.72]), averaged across all levels of the factors. • Each value or level of a covariate implies a different distribution and pZ . Here we show Bootstrap means and confidence intervals for pZ as a function of publication year and sample size. Increasing year and study size are associated with reduced missing studies: • No zero-foci results. Objectives 20 • Use intrinsic statistical characteristics of (non-zero) count data to infer zero counts. 20 • Estimate the number of unobserved zero-foci studies . mean 95% CI mean 95% CI 1990 1995 60 80 100 2005 2010 10 2015 0 10 20 Publication Year 0 5 10 15 Foci per contrast Foci per contrast Fig. 1 Fig. 2 20 25 30 • For each level of a factor we have fitted values that can be used to generate p̂Z (below left, mean + Bootstrap CI). Likewise, a given factor level has a predicted distribution (below right). The fewer fewer foci reported, the more likely zero foci are. ● 0.20 ●● ● Density ● 0.15 0.20 ● 0.10 ● ● ● Density Density Zero-Truncated Modelling ● 0 10 ^ p Z Interpretation: the total number of missing experiments per 100 published. • To avoid issues with correlated data we randomly choose one contrast from each study. • Model comparison with the AIC. • Confidence intervals with the Bootstrap. 15 0.05 20 ●●●● 25 0 ● ● ● ●● ●● 5 10 15 ●● ●●● 20 Normal mapping Other 0.20 ● ● ● ●●●● ●●● 25 ● ● ● ● ● ● ●● ●● 0 5 10 15 ●● ●●● 20 ●●●●● ●● 25 Foci per contrast ● ● ● ● ● ● ● ● ● ● ● ● ● 5 10 ● 0.05 ● 0.10 0.15 0 ● Foci per contrast 0.20 0.05 20 ●●●●● ● ● ●● ●● 15 ●● ●●● 20 Foci per contrast ●●●● ●●●● 25 ● ● ● ● 0.00 5 ●●● 15 0.00 Disease effects ● 10 ●● ● Foci per contrast ●● ● ã Total number of participants: 1-395 with median 12 ã Year: 1985-2014 with median 2004 ã Context: Disease effects, Drug effects, Linguistic effects, Normal Mapping and Other (any other that appears less than 30 times) ã Paradigm: 27 levels, e.g. Reward, Face Monitor/Discrimination, Semantic Monitor/Discrimination, n-back. ●● Density Density • Once θ is estimated, the prevalence of zero-count studies, pz, can be estimated as: • Covariates used, from BrainMap: ● 0.15 Drug effects ● • Study characteristics can be considered as covariates in a Generalised Linear Model: T E [Yi | X] = exp X β + i , ● 0.10 n = 1, 2, . . . 0 5 Linguistic effects ● • For the purposes of this work we will assume that π(·) is the Poisson or Negative Binomial probability mass function. ● 0.00 ● ● 0.00 ● ● ● ● ● ● ● ● ● 0.05 • A zero-truncated count distribution occurs when we restrict the domain of a count distribution π (n | θ) to the positive integers: Normal mapping ●● ● ● ● ● Linguistic effects 0.15 0.20 Drug effects 0.15 Other ● 0.10 Methods P (Y = 0) pz = × 100 1 − P (Y = 0) 50 Participants Disease effects π(n | θ) πZT(n | θ) = P (Y = n) = , 1 − π(0 | θ) 40 0.10 40 2000 0.05 20 5 0 1985 0.00 0 15 ^ : zero−foci experiments per 100 published p Z 15 10 0 5 ^ : zero−foci experiments per 100 published p Z 0.14 0.12 0.10 0.08 0.04 ? 0.00 0.02 0.06 Density 0.08 0.06 0.04 0.02 0.00 Density 0.10 0.12 0.14 • Account for covariates (e.g. publication year, sample size). 0 5 10 ● ● ●● ●● 15 ●●● ●●●●● 20 ●●●● 25 Foci per contrast Discussion Conclusion • Negative Binomial distribution provides a better fit than the Poisson distribution • Magnitude of zero-foci results varies, but is generally around 10 per 100 experiments. Some of these could be attributed to negative contrasts reported in published papers, but some are surely are never published. • Covariates are significant (due to large database) but each only have small effect. Future work • Include some published studies may indeed report negative results, not registered in BrainMap. • Bayesian methodologies can allow to account for correlation between the experiments of a given study and maybe some complex censoring mechanisms (e.g. zero-count experiments that are being upgraded). References [1] David et al 2013, PLoS ONE 8(7): e70104. [2] Easterbrook (1991), The Lancet 337: 867-872. [3] Ioannidis (2005), PLoS Medicine 2(8): e124. [4] Laird et al (2005), Neuroinformatics 3: 65-78. [5] Rigby et al (2005), Applied Statistics 54: 507-554.