Accepted Manuscript

Bayesian statistical methods in genetic association studies: Empirical examination of statistically non-significant Genome Wide Association Studies (GWAS) meta-analyses in cancers: A systematic review

Jae Hyon Park, Dong Il Geum, Michael Eisenhut, Hans J. van der Vliet, Jae Il Shin

PII: S0378-1119(18)31096-5
DOI: https://doi.org/10.1016/j.gene.2018.10.057
Reference: GENE 43313
To appear in: Gene
Received date: 12 September 2018
Accepted date: 19 October 2018

Please cite this article as: Jae Hyon Park, Dong Il Geum, Michael Eisenhut, Hans J. van der Vliet, Jae Il Shin, Bayesian statistical methods in genetic association studies: Empirical examination of statistically non-significant Genome Wide Association Studies (GWAS) meta-analyses in cancers: A systematic review. Gene (2018), https://doi.org/10.1016/j.gene.2018.10.057

Original article

Jae Hyon Park, MD*1, Dong Il Geum, MS*1, Michael Eisenhut, MD2, Hans J. van der Vliet, MD, PhD3, Jae Il Shin, MD, PhD4

1. Department of Radiology, Yonsei University College of Medicine, Severance Hospital, Seoul, Korea.
2. Luton & Dunstable University Hospital NHS Foundation Trust, United Kingdom.
3. Department of Medical Oncology, VU University Medical Center, Amsterdam, The Netherlands.
4.
Department of Pediatrics, Yonsei University College of Medicine, Severance Children’s Hospital, Seoul, Korea.

*3874 words

Correspondence to Jae Il Shin, M.D., Ph.D.
Address: 50 Yonsei-ro, Seodaemun-gu, C.P.O. Box 8044, Department of Pediatrics, Yonsei University College of Medicine, Seoul 120-752, Republic of Korea
Tel: +82-2-2228-2050; Fax: +82-2-393-9118; E-mail: shinji@yuhs.ac

*These authors contributed equally to this work.

The authors declare no potential conflicts of interest.

Key messages

Although there have been many articles on the meta-analysis of GWAS in the field of cancer, very few studies mention the possibility of false-positive results and the various caveats in determining whether a genotype-phenotype association is truly noteworthy.

It is still possible to obtain “noteworthy” Bayesian results (i.e. false positive report probability (FPRP) and Bayesian false discovery probability (BFDP)) even at higher P-values that are generally not considered statistically significant for GWAS, and our results show that these false-positive results are much more prevalent in borderline genetic association studies.

Even when the initial GWAS results are found non-noteworthy through either FPRP or BFDP, the results of the meta-analysis of these GWAS are found noteworthy in many cases, mainly due to the effect of the pooled sample size.

We should be cautious when applying Bayesian statistics in the field of genetic epidemiology for the evaluation of false-positive results and in screening truly noteworthy genotype-phenotype associations that are used for future bench-to-bedside translation.

Bayesian statistical methods (i.e.
FPRP and Bayesian False Discovery Probability (BFDP))

Abstract

A Bayesian statistical method was developed to assess the noteworthiness of a single nucleotide polymorphism (SNP)-phenotype association that shows statistical significance in various observational studies, but it has seldom been applied to GWAS meta-analyses in cancers. Data (i.e. allelic frequency, odds ratio, 95% confidence interval, etc.) on various SNP-cancer associations were extracted from meta-analyses of GWAS and the National Human Genome Research Institute (NHGRI) Catalog of Published GWAS and were used to compute the false positive report probability (FPRP) and Bayesian false discovery probability (BFDP) to evaluate whether some statistically nonsignificant SNP-cancer associations can be “falsely” labeled noteworthy. Independent paired T-tests showed a direct relationship between SNP-cancer P-values and both FPRP and BFDP estimates. However, a discrepancy in the number of noteworthy associations between the P-value comparison and either FPRP or BFDP was found using data extracted from meta-analyses of GWAS and the GWAS Catalog. Most P-values of associations with nonsignificant P-values but with noteworthy FPRP and BFDP estimates were within the range of 10⁻⁶ to 5×10⁻⁸. A poorly selected genome-wide significance threshold and inclusion of a nonsignificant SNP-phenotype association in the noteworthy test can, with either a noteworthy FPRP or BFDP computation, give a false impression of noteworthiness to a nonsignificant association.

Keywords: GWAS; Polymorphism; FPRP; BFDP; P-value

Introduction

During the past few decades, an unprecedented advance in genotyping technologies has allowed for a marked increase in the publication of genome-wide association studies (GWAS).
Through GWAS, identification of novel single nucleotide polymorphisms (SNPs) that influence human susceptibility to physiological traits or complex diseases has expanded our knowledge of the genetic architecture of complex traits (McCarthy et al., 2008). However, while the number of publications on SNP-phenotype associations continues to rise, concerns have been raised that some of the findings reported as statistically significant could not be reproduced in subsequent studies (Ioannidis, 2007). It has taken the field some time to realize that the traditional interpretation of association studies, in which associations with P values less than 5% are considered true positive findings, is not stringent enough. Numerous statistical approaches have since been suggested to identify “noteworthy” associations that have significantly higher chances of being “true” associations. These include methods that account for multiple testing, such as statistical independence methods (Martin et al., 2007), and methods that adjust the error rate, such as the Bonferroni correction (Bland and Altman, 1995) and the false discovery rate (Storey and Tibshirani, 2003). More recently, the false positive report probability (FPRP) proposed by Wacholder et al. (2004), which is the probability of no true association between a genetic variant and disease given a statistically significant finding, has been regarded as one of the important statistical methods for judging true noteworthiness. Similarly, a more recent development of FPRP known as the Bayesian false discovery probability (BFDP) (Wakefield, 2007) has also been suggested. Both methods of assessing the noteworthiness of an association are based on Bayesian statistics, which use a prior probability and a Bayes factor to obtain the posterior probability, that is, the conditional probability of the unobserved quantity of interest given the observed data.
These Bayesian procedures provide a level of confidence that is easy to interpret and are preferred by clinicians and researchers.

Likewise, there have been many individual GWAS and meta-analyses of GWAS in the field of cancer. Using popular Bayesian statistics (i.e. FPRP and BFDP), many primary investigators have claimed to have found truly noteworthy associations, and various detailed descriptions of these associations have been suggested as potential future genomic diagnostic and prognostic biomarkers in many publications. However, our study shows, using popular Bayesian statistics (i.e. FPRP and BFDP), that not all statistically significant genome-wide associations are in truth noteworthy associations. Moreover, we also show that a genome-wide association with a non-significant P-value can be falsely reported as noteworthy using conventional Bayesian statistics. This paper demonstrates that the task of determining whether a genotype-phenotype association is truly noteworthy is very difficult even when equipped with the most advanced statistical tools.

Methods

Meta-analyses that analyzed the association between SNPs and cancer risk using GWAS were first identified by performing a literature search of the PubMed database using the search terms meta-analysis, genome wide association, GWAS, and cancer on August 2, 2018. A total of 655 articles were identified through our search, and 29 articles were finally selected (Figure 1). “GWAS meta-analyses,” defined as meta-analyses that analyzed only GWAS or articles that had at least conducted a subgroup meta-analysis using only GWAS, were included in our review, while meta-analyses that analyzed any combination of GWAS, general observational studies such as case-control studies (i.e.
case-control, nested case-control), iSelect SNP genotyping array (the iCOGS array) studies, and meta-analyses that did not examine the association between SNP and cancer were excluded. Moreover, meta-analyses that analyzed GWAS with subsequent replication studies were also excluded, since this paper focuses on the “statistical methods” of assessing the noteworthiness of a genome-wide association instead of being a comprehensive summary report of all noteworthy associations. Data extracted from each meta-analysis, including the gene variant’s location (i.e. chromosome #, position #, and locus #), gene name, cancer of interest, allelic frequency (AF) (i.e. risk AF, minor AF, effect AF), odds ratio (OR), 95% confidence interval (CI), number of GWAS analyzed, publication bias, heterogeneity (i.e. Ph, I², tau²) and sample size, are summarized in Supplementary Table S1. Similarly, the GWAS data (i.e. ethnicity, specific sample size, OR, 95% CI, GWAS source) used to run the meta-analysis in each meta-analysis paper, defined here as “GWAS raw data,” were also extracted and are tabulated in Supplementary Table S2.

Out of the total 401 SNP-cancer associations for which we extracted these data (Supplementary Table S1), only 66 SNP-cancer associations were finally selected because (1) their results were statistically significant (i.e. P<0.05 and the 95% CI excluded 1.0), (2) the comparison data (i.e. OR, 95% CI, etc.) needed for the FPRP and BFDP estimation for the particular association could be accessed through the GWAS Catalog, and (3) the primary investigator provided “GWAS raw data” in his or her GWAS meta-analysis (Supplementary Table S7). Only associations meeting all three criteria were considered for our analysis because only then could we compute and compare separate FPRP and BFDP values using the results from the GWAS Catalog, GWAS meta-analyses, and raw GWAS data.
For each statistically significant association reported, an FPRP and a BFDP value were determined using the methods previously described by Wacholder et al. (2004) and Wakefield (2007), respectively. In the case of FPRP, the value was determined using the P value, the statistical power of the test, and a prior probability for the association. Since the prior probability is highly subjective, we analyzed prior probabilities ranging from 10⁻¹ to 10⁻⁶ at the statistical power to detect an OR of 1.5 as recommended by Wacholder et al. (2004), which is more conservative than the median reported OR of 1.2 in our analysis. A summary of the FPRP values estimated using data from the GWAS meta-analyses and “GWAS raw data” is shown in Supplementary Table S3. Not all FPRP values could be obtained, either because some GWAS meta-analyses did not report the OR and 95% CI of the “GWAS raw data” or because the FPRP could not be calculated due to a mathematical error in the process of calculating the inverse of the cumulative normal distribution (Supplementary Table S3). More importantly, the OR and 95% CI of the gene variant associations on which we performed the FPRP computations were also extracted from the National Human Genome Research Institute (NHGRI) Catalog of Published GWAS, which is a publicly available, manually curated collection of up-to-date published GWAS with all SNP-trait associations with P < 10⁻⁵. Data extracted from the GWAS Catalog are summarized in Supplementary Tables S4 and S5. These data were used to calculate the FPRP and BFDP values for comparison with the FPRP and BFDP values estimated using reported data from the meta-analyses identified in our literature search.
Considering the narrow literature search criteria we used, the FPRP and BFDP values estimated using the GWAS Catalog reports can be assumed to be closer to the “true” noteworthiness, since they are based on all available data published by papers referenced by the GWAS Catalog, which are not limited to meta-analyses of GWAS (i.e. results of various case-control studies and iCOGS studies are also reflected in the reports of the GWAS Catalog). Thirty of the 66 SNP-cancer associations that were analyzed for the FPRP computation were selected for the final analyses (i.e. FPRP and BFDP calculations for Table 1 and Table 2). Selection of a SNP-cancer association was based on the criterion that an FPRP value for the reported GWAS Catalog and GWAS meta-analyses, as well as the OR and 95% CI, could be calculated. Prior probabilities of 10⁻³ and 10⁻⁶, which are considered to be nonsignificant by GWAS interpretation, were used for the final analyses shown in Table 1 and Table 2 as recommended by Wacholder et al. (2004). A SNP-cancer association was considered “noteworthy” when the FPRP value was less than 0.2 in all analyses, as suggested by Wacholder et al. (2004). All FPRP computations were performed using the Excel spreadsheet provided by Wacholder et al. (2004). Similar to the FPRP, the BFDP value of each SNP-cancer association was calculated with prior probabilities of 10⁻³ and 10⁻⁶ via an Excel spreadsheet (http://faculty.washington/edu/jonno/cv.html). In addition, the BFDP value was considered noteworthy if less than 0.8, as recommended by Wakefield (2007).
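
The FPRP and BFDP values in this review were computed with the Excel spreadsheets cited above. The Python sketch below is only an illustration of the general form of the two formulas as published by Wacholder et al. (2004) and Wakefield (2007), not a reproduction of those spreadsheets. It assumes that the log OR is normally distributed with a standard error recovered from the 95% CI, that the two-sided P-value implied by the OR and CI serves as the significance level in the FPRP, and that the BFDP prior variance places 95% of the prior mass between OR 1/1.5 and 1.5. All function names and the example input (OR 1.20, 95% CI 1.12 to 1.29) are hypothetical.

```python
from math import erfc, exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_975 = STD_NORMAL.inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def log_or_and_se(odds_ratio, ci_lower, ci_upper):
    """Return ln(OR) and its standard error, recovered from the 95% CI."""
    se = (log(ci_upper) - log(ci_lower)) / (2 * Z_975)
    return log(odds_ratio), se


def two_sided_p(odds_ratio, ci_lower, ci_upper):
    """Two-sided Wald P-value; erfc keeps precision for very small tails."""
    beta, se = log_or_and_se(odds_ratio, ci_lower, ci_upper)
    return erfc(abs(beta) / se / sqrt(2))


def fprp(odds_ratio, ci_lower, ci_upper, prior, target_or=1.5):
    """FPRP = alpha(1 - prior) / (alpha(1 - prior) + power * prior)."""
    beta, se = log_or_and_se(odds_ratio, ci_lower, ci_upper)
    z_obs = abs(beta) / se
    alpha = two_sided_p(odds_ratio, ci_lower, ci_upper)
    # Power of a two-sided test at level alpha to detect target_or.
    # The critical value at level alpha equals z_obs by construction,
    # so no inverse normal CDF is needed at this step (an assumption of
    # this sketch, not necessarily how the spreadsheet proceeds).
    shift = abs(log(target_or)) / se
    power = STD_NORMAL.cdf(shift - z_obs) + STD_NORMAL.cdf(-shift - z_obs)
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)


def bfdp(odds_ratio, ci_lower, ci_upper, prior, prior_or_bound=1.5):
    """BFDP = ABF * PO / (ABF * PO + 1), with ABF for null vs alternative."""
    beta, se = log_or_and_se(odds_ratio, ci_lower, ci_upper)
    v = se ** 2                             # variance of ln(OR)
    w = (log(prior_or_bound) / Z_975) ** 2  # assumed prior variance of ln(OR)
    z2 = beta ** 2 / v
    abf = sqrt((v + w) / v) * exp(-0.5 * z2 * w / (v + w))
    prior_odds_null = (1 - prior) / prior
    return abf * prior_odds_null / (abf * prior_odds_null + 1)


if __name__ == "__main__":
    example = (1.20, 1.12, 1.29)  # hypothetical OR and 95% CI
    print("P =", two_sided_p(*example))
    for prior in (1e-3, 1e-6):
        print(prior, fprp(*example, prior), bfdp(*example, prior))
```

With this hypothetical input the implied P-value lies above 5×10⁻⁸, yet both measures fall below their cutoffs (0.2 for FPRP, 0.8 for BFDP) at a prior probability of 10⁻³ and above them at 10⁻⁶.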
In the case of BFDP, only two GWAS raw datasets were computed (Table 2), compared to the four GWAS raw datasets computed in the case of FPRP (Table 1), because two of the four GWAS raw datasets used for the FPRP estimation in Table 1 were incomplete due to missing data in their reference meta-analyses and, thus, these results could not be compared to those of the BFDP estimation.

After evaluating the noteworthiness of an association based on the FPRP and BFDP values, we calculated the percentages of noteworthiness based on (1) a simple P-value comparison (P < 5×10⁻⁸), (2) an FPRP calculation, and (3) a BFDP calculation of data reported by the GWAS Catalog, GWAS meta-analyses and GWAS raw datasets for comparison. The mean and standard error of noteworthy FPRP and BFDP values derived from data of the GWAS meta-analyses were further categorized into three different P-value ranges using the P-value reported by the GWAS meta-analyses, and an independent, paired T-test was performed using SPSS for Windows version 18.0 (SPSS Inc., Chicago, Illinois, USA) to find a relationship between the FPRP, BFDP and the P-value of the original dataset (Table 1). Lastly, for associations with a statistically non-significant P-value (P ≥ 5×10⁻⁸) but with noteworthy FPRP (FPRP<0.2) and BFDP (BFDP<0.8) values, we categorized the P-values into eight different ranges for a comparison study (Figure 3).

Results

Out of the 30 SNP-cancer associations for which the FPRP values were estimated, the number of associations for which the FPRP value was considered not noteworthy increased in the order of GWAS Catalog, GWAS meta-analysis, and GWAS raw data for both prior probabilities of 10⁻³ and 10⁻⁶ (Table 1). A similar trend was observed in the case of BFDP for both prior probabilities (Table 2).
However, overall, more associations were found non-noteworthy using BFDP compared to FPRP for the GWAS Catalog (8% vs. 5%; prior probability of 10⁻⁶) and the GWAS meta-analysis (20.0% vs. 13.3%; prior probability of 10⁻³, and 66.7% vs. 53.3%; prior probability of 10⁻⁶), while a similar number of associations was found noteworthy using both BFDP and FPRP in the case of the GWAS raw data at both prior probabilities of 10⁻³ and 10⁻⁶. Furthermore, some of the associations with non-noteworthy FPRP differed from those that were considered non-noteworthy according to the BFDP computation. SNP IDs for the associations with such discrepancies were rs7579899, rs12105918, rs13314271, rs3806624, rs4888262, rs10911251, rs3217901, and rs59336. Overall, the BFDP computation resulted in a smaller number of noteworthy associations (128 associations vs. 137 associations from FPRP).

When comparing the percentages of noteworthy associations based on the criteria of (1) P-value comparison (P < 5×10⁻⁸), (2) FPRP estimation, and (3) BFDP estimation, the percentage of noteworthy associations based on the P-value was much lower than those found using either FPRP or BFDP (Table 2A and 2B), indicating that some of the associations were found noteworthy by means of FPRP and BFDP estimation even though their respective P-values were greater than the 5×10⁻⁸ needed to reach genome-wide significance in the first place (Panagiotou and Ioannidis, 2012). The discrepancy in the number of noteworthy associations between the P-value comparison and either FPRP or BFDP was more evident in the results analyzed using a prior probability of 10⁻³ than in those using 10⁻⁶, and was largest in the analysis that used the reports of GWAS meta-analyses compared to those derived from either the GWAS Catalog or GWAS raw data.
However, categorizing the P-values into three different ranges (10⁻⁵ ≤ P-value < 0.05, 10⁻¹⁰ ≤ P-value < 10⁻⁵, and 10⁻¹⁵ ≤ P-value < 10⁻¹⁰) and comparing the mean FPRP and BFDP of each range using an independent paired T-test showed that the mean FPRP and BFDP of associations with higher P-values were statistically significantly higher than those of associations with lower P-values, indicating that, in general, an increasing P-value results in a large enough increase in both FPRP and BFDP to make an association non-noteworthy, as expected.

Nonetheless, the P-value distribution of the associations with P-values greater than 5×10⁻⁸ but with noteworthy FPRP and BFDP values from data extracted from the GWAS Catalog and GWAS meta-analyses showed that many of the P-values for these associations were in the range from 10⁻⁶ to 5×10⁻⁸, with the highest number of associations falling in the range between 10⁻⁶ and 10⁻⁷ (Figure 3A). As expected, while many of the associations with P-values greater than 5×10⁻⁸ from the reports of GWAS raw data had their P-values falling within the range between 10⁻¹ and 10⁻⁴, those also with noteworthy FPRP and BFDP values had their P-values falling within the range between 10⁻⁴ and 10⁻⁷ (Figure 3A, 3B, and 3D). However, this trend was only observed for the associations whose noteworthy FPRP and BFDP values were calculated using a prior probability of 10⁻³ rather than those computed using a prior probability of 10⁻⁶ (Figure 3B and 3D), because very few associations had noteworthy FPRP and BFDP values at the lower prior probability of 10⁻⁶ using data from GWAS raw data, all of which had P-values less than 5×10⁻⁸.
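
The comparison reported above, which singles out associations with P ≥ 5×10⁻⁸ but noteworthy FPRP and BFDP values and groups them by P-value range, can be sketched in Python as follows. The three cutoffs are those stated in the Methods. The decade-wide bins, the record layout, and the function names are assumptions made for illustration (the eight ranges of Figure 3 are not spelled out in the text), and the example records are invented, not data from this review.

```python
from collections import Counter
from math import floor, log10

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold used in this review
FPRP_CUTOFF = 0.2     # noteworthy if FPRP is below this value
BFDP_CUTOFF = 0.8     # noteworthy if BFDP is below this value


def is_discordant(p_value, fprp, bfdp):
    """True when P is not genome-wide significant but both measures are noteworthy."""
    return p_value >= GENOME_WIDE_P and fprp < FPRP_CUTOFF and bfdp < BFDP_CUTOFF


def decade_exponent(p_value):
    """Return e such that 10**e <= p_value < 10**(e + 1)."""
    return floor(log10(p_value))


def tally_discordant(records):
    """Count discordant associations per P-value decade.

    records: iterable of (label, p_value, fprp, bfdp) tuples.
    """
    counts = Counter()
    for _label, p_value, fprp, bfdp in records:
        if is_discordant(p_value, fprp, bfdp):
            counts[decade_exponent(p_value)] += 1
    return counts


if __name__ == "__main__":
    # Invented records for illustration only.
    records = [
        ("snp_a", 3e-7, 0.01, 0.20),
        ("snp_b", 2e-9, 0.001, 0.05),
        ("snp_c", 4e-5, 0.50, 0.90),
        ("snp_d", 6e-8, 0.05, 0.30),
    ]
    for exponent, n in sorted(tally_discordant(records).items()):
        print(f"1e{exponent} <= P < 1e{exponent + 1}: {n}")
```

Binning by decade is only one possible choice; any other set of ranges can be substituted by replacing `decade_exponent`.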
Discussion

To the best of our knowledge, this paper is the first to use empirical data, especially data extracted from a literature search of GWAS meta-analyses and from the GWAS Catalog, to show that not all associations found “noteworthy” using Bayesian statistics such as FPRP and BFDP can be considered truly “noteworthy”. Here, we show that even associations with a P-value greater than 5×10⁻⁸ and less than 0.05 can have noteworthy FPRP (FPRP<0.2) and BFDP (BFDP<0.8) values and may thus in some instances allow an investigator to falsely label a “non-noteworthy” association as “noteworthy”. This finding is problematic not only because it implies that some of the published associations previously reported as “noteworthy” using the above Bayesian approach may have to be reevaluated, but also because it underscores the need for a more careful method of selecting associations that should undergo the “noteworthiness” test, or a more rigorous method of classifying an association as positive, in addition to the previously used criteria such as the Venice criteria (Ioannidis et al., 2008), both FPRP and BFDP, higher statistical power (i.e. power greater than 80%), and low heterogeneity (i.e. I² less than 50%). Our study shows that the Bayesian approach is not rigorous enough to be used as a single method of finding a noteworthy association, as was previously done by Dong et al. (2008).

Unfortunately, many investigators still assess the noteworthiness of an association using either FPRP or BFDP whenever the association is found to be statistically significant by an observational study (i.e. when P<0.05 and the 95% CI excludes 1.0). For GWAS, the genome-wide significance threshold has previously been suggested to be P-value < 5×10⁻⁸, and this threshold was also used in our P-value comparison study (Panagiotou and Ioannidis, 2012).
However, there still exists some ambiguity as to what constitutes the most suitable genome-wide significance threshold. At a practical level, some early GWAS used a threshold of P < 1×10⁻⁷ (Maraganore et al., 2005; Stacey et al., 2007; Zanke et al., 2007), while another threshold (e.g. P < 5×10⁻⁷) has been less frequently used (Wellcome Trust Case Control Consortium, 2007). The general rule, however, is that associations with P < 5×10⁻⁸ are considered replicable (Hoggart et al., 2008) and associations with P ≥ 1×10⁻⁷ are not accepted unless proven through a more rigorous replication. These thresholds are based on the estimated effective number of independent tests (~10⁶ tests) if all common SNPs in HapMap have been tested with direct genotyping at the α = 0.05 level, and thus the Bonferroni correction for multiple testing, where the number of tests is in the order of 10⁶, would be 0.05/10⁶ or 5×10⁻⁸. Considering that genome-wide significance depends heavily on the study population, the minor allele frequencies of the SNPs, different linkage disequilibrium patterns, and the types of genetic data or arrays used for the analysis (Hoggart et al., 2008; Orr and Chanock, 2008; Pe'er et al., 2008), obtaining a true genome-wide significance threshold is nearly impossible and thus a threshold P-value for what constitutes statistical significance worthy of assessing noteworthiness via FPRP or BFDP is highly indeterminate. Such ambiguity is very troublesome, since our results indicate that it is not just the associations with P-values under the conventional threshold that have the possibility of noteworthiness through FPRP and BFDP estimates. To make matters worse, according to our results, most of the associations with P > 5×10⁻⁸ and noteworthy FPRP and BFDP values were computed using data extracted from the GWAS Catalog and GWAS meta-analyses and had their P-values falling within the range between 10⁻⁵ and 5×10⁻⁸.
This range encompasses the range of P-values between 10⁻⁷ and 5×10⁻⁸ defined for the “borderline” associations, which have been termed “borderline” because these associations are highly prone to statistical insignificance in studies attempting replication (Panagiotou and Ioannidis, 2012). Though a previous study (Panagiotou and Ioannidis, 2012) reported that about 73% of these borderline associations can be successfully replicated in subsequent GWAS, whether this phenomenon holds true across various types of SNPs and diseases of interest is not certain. Overall, our results can be interpreted in two different ways: (1) if an investigator is using a higher genome-wide significance threshold than that conventionally used (e.g. P < 5×10⁻⁸) in order to better reflect perhaps the quality of the study population, minor allelic frequency, and linkage patterns, then in the likely scenario that an association turns out to be statistically significant for a “noteworthiness” test (i.e. its P-value is between 5×10⁻⁸ and the selected threshold), the investigator should bear in mind that this association could in truth be not noteworthy even with a noteworthy FPRP or BFDP estimate; or (2) if an investigator uses the conventional threshold of P < 5×10⁻⁸ and decides to neglect an association with a P-value in the “borderline” range, he or she may be neglecting a truly noteworthy association with a noteworthy FPRP or BFDP estimate. In either case, meticulous attention must be given to associations with a P-value near the selected threshold, and the results of the FPRP or BFDP estimation should be interpreted carefully.

The problem with selecting a threshold P-value becomes even more challenging for the results of observational studies, whose associations are considered statistically significant if the P-value is simply less than 0.05, unlike those of GWAS.
Since GWAS results generally have much smaller P-values than those reported by observational studies, one can expect that observational studies will give rise to a higher number of false-positive noteworthy associations compared to GWAS. However, in observational studies, the threshold P-value is generally fixed at 0.05 and the study’s small sample size allows for a P-value that is highly responsive to changes in the number of case patients.

While it is difficult to find a threshold P-value that could determine whether an association is statistically significant enough to enter a noteworthy test (in our case, FPRP or BFDP), our independent, paired T-test results indicate that, in general, associations with lower P-values are found to be truly noteworthy using the Bayesian approach. One could argue that the P-value acts as a dependent variable in the Bayesian function, but a careful review of both the FPRP and BFDP functions shows that neither directly uses the given P-value of an association (Supplementary Table S6). Instead, the sample size used for a particular association study is thought to allow for this direct relationship. In FPRP, as the prior probability decreases (or as the allelic frequency decreases), a larger sample size is needed to achieve an FPRP of 0.2 (Wacholder et al., 2004), and in BFDP, a larger sample size provides greater power to identify levels of significance with higher posterior odds for a given P-value (Wakefield, 2007). Thus, in general, a larger sample size used for an association gives rise to a higher chance of obtaining statistical significance (i.e. P-value < threshold), and this also increases the posterior odds’ chance of being larger than the threshold used for noteworthiness (i.e. 0.2 for FPRP and 0.8 for BFDP) for a given prior probability, thus allowing both the P-value and the FPRP and BFDP estimates to have a direct relationship.
This phenomenon is also evident from our results in Table 1 and Table 2, wherein the greatest number of non-noteworthy associations is found in the GWAS raw data, GWAS meta-analysis and GWAS Catalog in descending order, which is also the order in which the study sample size increases. Generally, a genetic association that is found significant or noteworthy in its initial GWAS undergoes multiple replication studies. Both initial and replicative studies are then used together in a meta-analysis to conclude that, from a large sample size perspective, a genetic association is indeed significant. However, our finding shows that even associations without statistical significance can, with a pooled sample size (i.e. addition of subsequent replication studies or via meta-analysis), become statistically significant and even noteworthy via the Bayesian approach’s function. This problem is even more serious for a meta-analysis that uses an arbitrary number of GWAS or observational studies, or one that excludes certain individual studies, to make an association statistically significant.

Some limitations of this study are that we (1) refrained from discussing some criticisms of FPRP focused on Bayesian issues, such as the determination of prior probabilities for the two hypotheses, the restricted FPRP setting (normal distribution with equal and known variances) that does not entirely reflect the realistic setting, and the admittedly simplistic model (Lucke, 2009); and (2) may have overly simplified our explanations of the FPRP and BFDP and of Bayesian statistics in order to focus on the issues of practical application and the problems clinicians face in interpreting the results of the noteworthiness test, rather than on the theoretical basis, since there are many investigators who may not be specialized in Bayesian statistics but are familiar enough with the tools to use them for their analyses.
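
The pooled-sample-size effect discussed in this section can be illustrated numerically. The Python sketch below pools several identical, individually non-noteworthy study estimates with inverse-variance fixed-effect weights and recomputes the P-value and the FPRP. The fixed-effect model is chosen here only for simplicity (the reviewed meta-analyses may have used other models), the FPRP follows the general form given by Wacholder et al. (2004) with the observed P-value taken as the significance level, and the study values (OR 1.20, standard error 0.07 on the log scale, four studies) are hypothetical.

```python
from math import erfc, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (ln OR, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return pooled, sqrt(1 / sum(weights))


def two_sided_p(beta, se):
    """Two-sided Wald P-value for a log OR and its standard error."""
    return erfc(abs(beta) / se / sqrt(2))


def fprp(beta, se, prior, target_or=1.5):
    """FPRP = alpha(1 - prior) / (alpha(1 - prior) + power * prior)."""
    z_obs = abs(beta) / se
    alpha = two_sided_p(beta, se)
    shift = abs(log(target_or)) / se
    # Two-sided power to detect target_or at significance level alpha.
    power = STD_NORMAL.cdf(shift - z_obs) + STD_NORMAL.cdf(-shift - z_obs)
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)


if __name__ == "__main__":
    single = (log(1.20), 0.07)  # hypothetical study: OR 1.20, SE 0.07
    pooled = pool_fixed_effect([single] * 4)
    for name, (beta, se) in (("single", single), ("pooled", pooled)):
        print(name, two_sided_p(beta, se), fprp(beta, se, prior=1e-3))
```

In this hypothetical case each single study has an FPRP above 0.2 at a prior probability of 10⁻³, whereas the pooled estimate has an FPRP below 0.2 even though its P-value is still above 5×10⁻⁸.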
Overall, determining the right threshold P-value for a genetic association to be worthy of entering the noteworthy test is challenging, and selection of a wrong threshold can give a falsely noteworthy association. In addition, while the Bayesian approaches (i.e. FPRP and BFDP), in general, assign noteworthiness to associations with low, statistically significant P-values, both computations are not independent of the effects of the sample size. Here, we ask clinicians to be wary of these limitations when interpreting the results of genetic association studies.

Acknowledgements. None.

Conflicts of interest. None.

Author contributions. J.P and D.I.G designed research; J.P, D.I.G and J.I.S performed research and analyzed data. J.P, D.I.G, M.E, H.J.V and J.I.S wrote the paper.

References

Wellcome Trust Case Control Consortium, 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-78.
Bland, J.M. and Altman, D.G., 1995. Multiple significance tests: the Bonferroni method. BMJ 310, 170.
Dong, L.M., Potter, J.D., White, E., Ulrich, C.M., Cardon, L.R. and Peters, U., 2008. Genetic susceptibility to cancer: the role of polymorphisms in candidate genes. JAMA 299, 2423-36.
Hoggart, C.J., Clark, T.G., De Iorio, M., Whittaker, J.C. and Balding, D.J., 2008. Genome-wide significance for dense SNP and resequencing data. Genet Epidemiol 32, 179-85.
Ioannidis, J.P., 2007. Non-replication and inconsistency in the genome-wide association setting. Hum Hered 64, 203-13.
Ioannidis, J.P., Boffetta, P., Little, J., O'Brien, T.R., Uitterlinden, A.G., Vineis, P., Balding, D.J., Chokkalingam, A., Dolan, S.M., Flanders, W.D., Higgins, J.P., McCarthy, M.I., McDermott, D.H., Page, G.P., Rebbeck, T.R., Seminara, D. and Khoury, M.J., 2008. Assessment of cumulative evidence on genetic associations: interim guidelines. Int J Epidemiol 37, 120-32.
Lucke, J.F., 2009. A critique of the false-positive report probability.
Genet Epidemiol 33, 145-50.
Maraganore, D.M., de Andrade, M., Lesnick, T.G., Strain, K.J., Farrer, M.J., Rocca, W.A., Pant, P.V., Frazer, K.A., Cox, D.R. and Ballinger, D.G., 2005. High-resolution whole-genome association study of Parkinson disease. Am J Hum Genet 77, 685-93.
Martin, L.J., Woo, J.G., Avery, C.L., Chen, H.S., North, K.E., Au, K., Broet, P., Dalmasso, C., Guedj, M., Holmans, P., Huang, B., Kuo, P.H., Lam, A.C., Li, H., Manning, A., Nikolov, I., Sinha, R., Shi, J., Song, K., Tabangin, M., Tang, R. and Yamada, R., 2007. Multiple testing in the genomics era: findings from Genetic Analysis Workshop 15, Group 15. Genet Epidemiol 31 Suppl 1, S124-31.
McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P. and Hirschhorn, J.N., 2008. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9, 356-69.
Orr, N. and Chanock, S., 2008. Common genetic variation and human disease. Adv Genet 62, 1-32.
Panagiotou, O.A. and Ioannidis, J.P., 2012. What should the genome-wide significance threshold be? Empirical replication of borderline genetic associations. Int J Epidemiol 41, 273-86.
Pe'er, I., Yelensky, R., Altshuler, D. and Daly, M.J., 2008. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol 32, 381-5.
Stacey, S.N., Manolescu, A., Sulem, P., Rafnar, T., Gudmundsson, J., Gudjonsson, S.A., Masson, G., Jakobsdottir, M., Thorlacius, S., Helgason, A., Aben, K.K., Strobbe, L.J., Albers-Akkers, M.T., Swinkels, D.W., Henderson, B.E., Kolonel, L.N., Le Marchand, L., Millastre, E., Andres, R., Godino, J., Garcia-Prats, M.D., Polo, E., Tres, A., Mouy, M., Saemundsdottir, J., Backman, V.M., Gudmundsson, L., Kristjansson, K., Bergthorsson, J.T., Kostic, J., Frigge, M.L., Geller, F., Gudbjartsson, D., Sigurdsson, H., Jonsdottir, T., Hrafnkelsson, J., Johannsson, J., Sveinsson, T., Myrdal, G., Grimsson, H.N., Jonsson, T., von Holst, S., Werelius, B., Margolin, S., Lindblom, A., Mayordomo, J.I., Haiman, C.A., Kiemeney, L.A., Johannsson, O.T., Gulcher, J.R., Thorsteinsdottir, U., Kong, A. and Stefansson, K., 2007. Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet 39, 865-9.
Storey, J.D. and Tibshirani, R., 2003. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100, 9440-5.
Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L. and Rothman, N., 2004. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst 96, 434-42.
Wakefield, J., 2007. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am J Hum Genet 81, 208-27.
Zanke, B.W., Greenwood, C.M., Rangrej, J., Kustra, R., Tenesa, A., Farrington, S.M., Prendergast, J., Olschwang, S., Chiang, T., Crowdy, E., Ferretti, V., Laflamme, P., Sundararajan, S., Roumy, S., Olivier, J.F., Robidoux, F., Sladek, R., Montpetit, A., Campbell, P., Bezieau, S., O'Shea, A.M., Zogopoulos, G., Cotterchio, M., Newcomb, P., McLaughlin, J., Younghusband, B., Green, R., Green, J., Porteous, M.E., Campbell, H., Blanche, H., Sahbatou, M., Tubacher, E., Bonaiti-Pellie, C., Buecher, B., Riboli, E., Kury, S., Chanock, S.J., Potter, J., Thomas, G., Gallinger, S., Hudson, T.J. and Dunlop, M.G., 2007. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet 39, 989-94.

Table 1. Calculated FPRP values for each SNP ID at prior probabilities of 0.001 and 0.000001, with statistical power to detect an OR of 1.5, using the OR and 95% CI reported in the GWAS catalog, the GWAS meta-analysis, and the GWAS raw data used for and reported by the GWAS meta-analysis (a maximum of 4 sets of raw data were analyzed)

Prior Probability of 0.001 (Expected for a Candidate Gene)

| SNP ID (rs) | GWAS Catalog | GWAS Meta | GWAS Raw Data 1 | GWAS Raw Data 2 | GWAS Raw Data 3 | GWAS Raw Data 4 |
|---|---|---|---|---|---|---|
| rs7579899 | 7.18873E-05 | 0.217010919 | 0.966728248 | 0.447046669 | N/A | N/A |
| rs4953345 | 1.78501E-09 | 1.32301E-05 | 0.072094967 | 0.000566742 | N/A | N/A |
| rs12105918 | 0.144229351 | 0.000210456 | 0.305719064 | 0.551614305 | N/A | N/A |
| rs72858496 | 6.24935E-13 | 5.2017E-05 | 0.267351214 | 0.473744775 | N/A | N/A |
| rs7105934 | 4.68753E-09 | 1.2724E-05 | 0.000341673 | 0.625252318 | N/A | N/A |
| rs718314 | 2.44427E-06 | 0.802020793 | 0.870163687 | 0.996915547 | N/A | N/A |
| rs4765623 | 1.22167E-07 | 0.217010919 | 0.150471685 | 0.980371294 | N/A | N/A |
| rs3217810 | 3.07367E-05 | 0.00290674 | 0.991531177 | 0.998377395 | 0.995408064 | 0.997911489 |
| rs17879961 | 6.00213E-07 | 3.91155E-06 | 0.224574983 | 0.997491532 | 0.999202121 | 0.1210079 |
| rs13314271 | 3.15083E-05 | 0.04094744 | 0.99483476 | 0.985035047 | 0.996448128 | 0.988397509 |
| rs3806624 | 1.85665E-10 | 0.149033717 | 0.968131758 | 0.830558794 | 0.981788204 | N/A |
| rs20541 | 2.52559E-05 | 2.35165E-07 | 9.85358E-06 | 0.991179851 | 0.964732338 | N/A |
| rs444929 | 0.003946647 | 0.003946647 | 0.706866392 | 0.967376514 | 0.75519144 | N/A |
| rs1860661 | 9.14459E-06 | 1.45705E-05 | 0.011453285 | 0.694530155 | 0.958080589 | N/A |
| rs3817198 | 8.01445E-10 | 0.671653697 | 0.994163969 | 0.000686867 | N/A | N/A |
| rs17021463 | 2.44427E-06 | 0.004396377 | 0.217307661 | 0.874438882 | N/A | N/A |
| rs12699477 | 5.34936E-06 | 0.00037535 | 0.278155362 | 0.311786558 | N/A | N/A |
| rs4888262 | 1.85665E-10 | 0.000141706 | 0.11404148 | 0.355074933 | N/A | N/A |
| rs603965 | 1.02835E-05 | 1.11569E-05 | 0.126790586 | 0.024977207 | N/A | N/A |
| rs11903757 | 7.9967E-06 | 0.003536869 | 0.961321595 | 0.998000414 | 0.967512601 | 0.830641557 |
| rs10911251 | 0.002766038 | 0.000169251 | 0.972952691 | 0.994210551 | 0.995304574 | 0.946246085 |
| rs3217901 | 0.000169251 | 0.025695761 | 0.900546291 | 0.998208397 | 0.997925193 | 0.997039656 |
| rs59336 | 0.002766038 | 0.000169251 | 0.997290718 | 0.998292138 | 0.977303366 | 0.903729432 |
| rs17530068 | 0.000686867 | 0.031819289 | 0.012905608 | 0.965268608 | 0.998204583 | N/A |
| rs2284378 | 7.9967E-06 | 0.000686867 | 0.692646105 | 0.447046669 | 0.93936687 | N/A |
| rs3765524 | 2.54705E-06 | 1.18482E-10 | 8.6767E-06 | 0.002747928 | N/A | N/A |
| rs3781264 | 7.8929E-07 | 6.73668E-08 | 2.45866E-05 | 0.589878719 | N/A | N/A |
| rs10411210 | 1.22167E-07 | 5.26569E-05 | 0.95655674 | 0.357665573 | 0.940648956 | 0.710672002 |
| rs961253 | 2.45089E-07 | 0.00363619 | 0.992700115 | 0.985992961 | 0.896800977 | 0.149033717 |
| rs4444235 | 7.56027E-06 | 0.020330418 | 0.578906683 | 0.982003947 | 0.76677849 | 0.969481017 |
| No. of non-noteworthy results | 0 (0.0%) | 4 (13.3%) | 20 (66.7%) | 26 (86.7%) | 30 (100.0%) | 28 (93.3%) |

Prior Probability of 0.000001 (Expected for a Random Single Nucleotide Polymorphism)

| SNP ID (rs) | GWAS Catalog | GWAS Meta | GWAS Raw Data 1 | GWAS Raw Data 2 | GWAS Raw Data 3 | GWAS Raw Data 4 |
|---|---|---|---|---|---|---|
| rs7579899 | 0.067133187 | 0.996408487 | 0.999965619 | 0.998765858 | N/A | N/A |
| rs4953345 | 1.78679E-06 | 0.013070426 | 0.987305487 | 0.362094589 | N/A | N/A |
| rs12105918 | 0.994107454 | 0.174038892 | 0.997736427 | 0.99918861 | N/A | N/A |
| rs72858496 | 6.2556E-10 | 0.049494425 | 0.997269814 | 0.998891499 | N/A | N/A |
| rs7105934 | 4.6922E-06 | 0.012576697 | 0.254916775 | 0.999401603 | N/A | N/A |
| rs718314 | 0.002440748 | 0.999753457 | 0.999850962 | 0.999996909 | N/A | N/A |
| rs4765623 | 0.000122274 | 0.996408487 | 0.994391504 | 0.999979999 | N/A | N/A |
| rs3217810 | 0.029849939 | 0.74477616 | 0.999991467 | 0.999998376 | 0.999995392 | 0.999997909 |
| rs17879961 | 0.000600453 | 0.003900206 | 0.996562451 | 0.999997488 | 0.999999202 | 0.992795613 |
| rs13314271 | 0.030576387 | 0.977136804 | 0.999994813 | 0.999984823 | 0.999996439 | 0.999988273 |
| rs3806624 | 1.85851E-07 | 0.994328166 | 0.999967117 | 0.999796237 | 0.999981469 | N/A |
| rs20541 | 0.024658348 | 0.000235345 | 0.009767188 | 0.99999111 | 0.999963481 | N/A |
| rs444929 | 0.798640721 | 0.798640721 | 0.999585891 | 0.999966311 | 0.999676261 | N/A |
| rs1860661 | 0.009070788 | 0.014375621 | 0.920619601 | 0.99956081 | 0.999956292 | N/A |
| rs3817198 | 8.02246E-07 | 0.999511864 | 0.999994136 | 0.407592287 | N/A | N/A |
| rs17021463 | 0.002440748 | 0.815505181 | 0.996414728 | 0.999856574 | N/A | N/A |
| rs12699477 | 0.005326216 | 0.273185402 | 0.997414183 | 0.997799735 | N/A | N/A |
| rs4888262 | 1.85851E-07 | 0.124241877 | 0.992298791 | 0.998188794 | N/A | N/A |
| rs603965 | 0.010189038 | 0.011044843 | 0.993166872 | 0.96246622 | N/A | N/A |
| rs11903757 | 0.007941194 | 0.78036328 | 0.999959807 | 0.999997998 | 0.999966456 | 0.999796356 |
| rs10911251 | 0.735203457 | 0.144896277 | 0.999972229 | 0.999994183 | 0.999995287 | 0.999943252 |
| rs3217901 | 0.144896277 | 0.96350341 | 0.999889685 | 0.999998207 | 0.999997923 | 0.999997034 |
| rs59336 | 0.735203457 | 0.144896277 | 0.999997286 | 0.999998291 | 0.9999768 | 0.999893592 |
| rs17530068 | 0.407592287 | 0.970499644 | 0.929014653 | 0.999964056 | 0.999998203 | N/A |
| rs2284378 | 0.007941194 | 0.407592287 | 0.999556901 | 0.998765858 | 0.999935522 | N/A |
| rs3765524 | 0.002543123 | 1.18601E-07 | 0.00861066 | 0.733919121 | N/A | N/A |
| rs3781264 | 0.000789456 | 6.74296E-05 | 0.024020639 | 0.999305913 | N/A | N/A |
| rs10411210 | 0.000122274 | 0.050072903 | 0.999954631 | 0.998209099 | 0.999936971 | 0.999593453 |
| rs961253 | 0.000245274 | 0.785089954 | 0.999992654 | 0.999985808 | 0.999885054 | 0.994328166 |
| rs4444235 | 0.007511042 | 0.954071713 | 0.99927386 | 0.999981693 | 0.999696239 | 0.999968553 |
| No. of non-noteworthy results | 5 (16.7%) | 16 (53.3%) | 27 (90.0%) | 30 (100.0%) | 30 (100.0%) | 30 (100.0%) |

Abbreviations: GWAS, genome-wide association studies; FPRP, false-positive report probability; Meta, meta-analysis; SNP, single nucleotide polymorphism; OR, odds ratio; CI, confidence interval; N/A, not available; No., number.
Cells highlighted in red are FPRP values ≥ 0.2 and thus are not considered noteworthy.
Cells highlighted in brown or marked N/A are cells for which FPRP values could not be calculated because the odds ratio and confidence interval were not reported in the GWAS meta-analysis. When summing the non-noteworthy results, "N/A" was also counted as non-noteworthy.
Cells highlighted in blue show that the number of non-noteworthy results based on the FPRP calculation differs from the number of non-noteworthy results based on the BFDP calculation (see Table 2).
FPRP values that are underlined and bolded give rise to noteworthiness that differs from that obtained through the BFDP calculation (see Table 2); SNP IDs that are bolded are the ones with discrepancies.
*P-values reported by the GWAS meta-analyses for each SNP ID (rs) are provided in Table 2 (the SNP IDs in Table 1 and Table 2 are the same).

Table 2. Calculated BFDP values for each SNP ID at prior probabilities of 0.001 and 0.000001, using the OR and 95% CI reported in the GWAS catalog, the GWAS meta-analysis, and the GWAS raw data used for and reported by the GWAS meta-analysis (a maximum of 2 sets of raw data were analyzed). Calculations of various parameters (Z-statistics, r “shrinkage factor”, approximate Bayes factor, etc.)
are listed in Supplementary Table S5.

Prior Probability of 0.001 (Expected for a Candidate Gene)

| SNP ID (rs) | GWAS Catalog | P-value* | GWAS Meta | P-value* | GWAS Raw Data 1 | GWAS Raw Data 2 |
|---|---|---|---|---|---|---|
| rs7579899 | 0.009364 | 2.00E-09 | 0.921833 | 1.43E-04 | 0.998506 | 0.963092 |
| rs4953345 | 2.67E-07 | 5.00E-10 | 0.00098 | 5.26E-10 | 0.776396 | 0.030597 |
| rs12105918 | 0.788397 | 2.00E-08 | 0.009155 | 3.38E-07 | 0.869662 | 0.958878 |
| rs72858496 | 7.84E-11 | 2.00E-07 | 0.002312 | 1.72E-07 | 0.839498 | 0.94609 |
| rs7105934 | 3.7E-07 | 8.00E-14 | 0.000499 | 4.93E-09 | 0.004527 | 0.968435 |
| rs718314 | 0.000323 | 9.00E-10 | 0.992496 | 4.17E-03 | 0.993998 | 0.999691 |
| rs4765623 | 2.27E-05 | 3.00E-08 | 0.921833 | 2.30E-04 | 0.880994 | 0.99896 |
| rs3217810 | 0.003364 | 6.00E-08 | 0.145845 | 3.40E-07 | 0.998545 | 0.998938 |
| rs17879961 | 2.94E-10 | 1.00E-13 | 3.73E-06 | 1.96E-11 | 0.344165 | 0.998538 |
| rs13314271 | 0.004913 | 7.00E-10 | 0.832196 | 2.76E-03 | 0.999756 | 0.999355 |
| rs3806624 | 2.94E-08 | 1.00E-12 | 0.878779 | 2.00E-04 | 0.998653 | 0.981636 |
| rs20541 | 0.001098 | 1.00E-08 | 1.94E-05 | 1.00E-10 | 0.000613 | 0.998717 |
| rs444929 | 0.201958 | 3.00E-06 | 0.162202 | 3.10E-06 | 0.985515 | 0.996146 |
| rs1860661 | 0.000971 | 4.00E-10 | 0.001029 | 1.90E-08 | 0.37097 | 0.968661 |
| rs3817198 | 3.66E-07 | 2.00E-11 | 0.992887 | 1.10E-02 | 0.999795 | 0.048085 |
| rs17021463 | 0.000323 | 1.00E-08 | 0.167764 | 6.09E-06 | 0.889369 | 0.991639 |
| rs12699477 | 0.000626 | 6.00E-09 | 0.019904 | 7.16E-07 | 0.917735 | 0.914488 |
| rs4888262 | 2.94E-08 | 5.00E-12 | 0.007827 | 1.39E-07 | 0.792886 | 0.922876 |
| rs603965 | 5.32E-06 | 2.00E-11 | 3.76E-05 | 7.96E-11 | 0.227909 | 0.176272 |
| rs11903757 | 0.001136 | 4.00E-08 | 0.19319 | 1.38E-06 | 0.994645 | 0.998902 |
| rs10911251 | 0.305261 | 9.00E-08 | 0.019509 | 1.34E-06 | 0.998163 | 0.998448 |
| rs3217901 | 0.029203 | 3.00E-07 | 0.670696 | 1.71E-06 | 0.993259 | 0.999162 |
| rs59336 | 0.305261 | 4.00E-07 | 0.019509 | 7.64E-07 | 0.999693 | 0.999369 |
| rs17530068 | 0.068147 | 3.00E-07 | 0.622245 | 3.50E-06 | 0.399579 | 0.998112 |
| rs2284378 | 0.001136 | 1.00E-08 | 0.048085 | 6.50E-07 | 0.988188 | 0.963092 |
| rs3765524 | 0.00022 | 2.00E-09 | 1.3E-08 | 3.15E-14 | 0.000604 | 0.063731 |
| rs3781264 | 7.08E-05 | 4.00E-09 | 5.82E-06 | 5.77E-11 | 0.001485 | 0.966942 |
| rs10411210 | 2.27E-05 | 5.00E-09 | 0.003496 | 4.90E-08 | 0.995464 | 0.87604 |
| rs961253 | 5.36E-05 | 2.00E-10 | 0.214815 | 8.90E-07 | 0.999392 | 0.998957 |
| rs4444235 | 0.001493 | 8.00E-10 | 0.585294 | 1.80E-06 | 0.972681 | 0.998786 |
| No. of non-noteworthy results | 0 (0.0%) | | 6 (20.0%) | | 20 (66.7%) | 26 (86.7%) |

Prior Probability of 0.000001 (Expected for a Random Single Nucleotide Polymorphism)

| SNP ID (rs) | GWAS Catalog | P-value* | GWAS Meta | P-value* | GWAS Raw Data 1 | GWAS Raw Data 2 |
|---|---|---|---|---|---|---|
| rs7579899 | 0.90442 | 2.00E-09 | 0.999915 | 1.43E-04 | 0.999999 | 0.999962 |
| rs4953345 | 0.000267 | 5.00E-10 | 0.4954 | 5.26E-10 | 0.999712 | 0.969319 |
| rs12105918 | 0.999732 | 2.00E-08 | 0.902425 | 3.38E-07 | 0.99985 | 0.999957 |
| rs72858496 | 7.85E-08 | 2.00E-07 | 0.69877 | 1.72E-07 | 0.999809 | 0.999943 |
| rs7105934 | 0.00037 | 8.00E-14 | 0.333356 | 4.93E-09 | 0.819875 | 0.999967 |
| rs718314 | 0.244246 | 9.00E-10 | 0.999992 | 4.17E-03 | 0.999994 | 1 |
| rs4765623 | 0.022239 | 3.00E-08 | 0.999915 | 2.30E-04 | 0.999865 | 0.999999 |
| rs3217810 | 0.771624 | 6.00E-08 | 0.994183 | 3.40E-07 | 0.999999 | 0.999999 |
| rs17879961 | 2.94E-07 | 1.00E-13 | 0.003719 | 1.96E-11 | 0.9981 | 0.999999 |
| rs13314271 | 0.831712 | 7.00E-10 | 0.999799 | 2.76E-03 | 1 | 0.999999 |
| rs3806624 | 2.94E-05 | 1.00E-12 | 0.999862 | 2.00E-04 | 0.999999 | 0.999981 |
| rs20541 | 0.523795 | 1.00E-08 | 0.019004 | 1.00E-10 | 0.380406 | 0.999999 |
| rs444929 | 0.996068 | 3.00E-06 | 0.994866 | 3.10E-06 | 0.999985 | 0.999996 |
| rs1860661 | 0.493106 | 4.00E-10 | 0.507687 | 1.90E-08 | 0.998309 | 0.999968 |
| rs3817198 | 0.000367 | 2.00E-11 | 0.999993 | 1.10E-02 | 1 | 0.980607 |
| rs17021463 | 0.244246 | 1.00E-08 | 0.995069 | 6.09E-06 | 0.999876 | 0.999992 |
| rs12699477 | 0.385506 | 6.00E-09 | 0.953114 | 7.16E-07 | 0.99991 | 0.999907 |
| rs4888262 | 2.94E-05 | 5.00E-12 | 0.887593 | 1.39E-07 | 0.999739 | 0.999917 |
| rs603965 | 0.005298 | 2.00E-11 | 0.036283 | 7.96E-11 | 0.996627 | 0.995353 |
| rs11903757 | 0.532459 | 4.00E-08 | 0.995845 | 1.38E-06 | 0.999995 | 0.999999 |
| rs10911251 | 0.997732 | 9.00E-08 | 0.952192 | 1.34E-06 | 0.999998 | 0.999998 |
| rs3217901 | 0.967857 | 3.00E-07 | 0.99951 | 1.71E-06 | 0.999993 | 0.999999 |
| rs59336 | 0.997732 | 4.00E-07 | 0.952192 | 7.64E-07 | 1 | 0.999999 |
| rs17530068 | 0.986524 | 3.00E-07 | 0.999394 | 3.50E-06 | 0.998501 | 0.999998 |
| rs2284378 | 0.532459 | 1.00E-08 | 0.980607 | 6.50E-07 | 0.999988 | 0.999962 |
| rs3765524 | 0.180241 | 2.00E-09 | 1.3E-05 | 3.15E-14 | 0.376773 | 0.985536 |
| rs3781264 | 0.066217 | 4.00E-09 | 0.005792 | 5.77E-11 | 0.59825 | 0.999966 |
| rs10411210 | 0.022239 | 5.00E-09 | 0.778354 | 4.90E-08 | 0.999995 | 0.999859 |
| rs961253 | 0.050893 | 2.00E-10 | 0.996362 | 8.90E-07 | 0.999999 | 0.999999 |
| rs4444235 | 0.599445 | 8.00E-10 | 0.999293 | 1.80E-06 | 0.999972 | 0.999999 |
| No. of non-noteworthy results | 8 (26.7%) | | 20 (66.7%) | | 27 (90.0%) | 30 (100.0%) |

Abbreviations: GWAS, genome-wide association studies; BFDP, Bayesian false discovery probability; Meta, meta-analysis; SNP, single nucleotide polymorphism; OR, odds ratio; CI, confidence interval; N/A, not available; No., number.
Cells highlighted in red are BFDP values ≥ 0.8 and thus are not considered noteworthy.
Cells highlighted in blue show that the number of non-noteworthy results based on the BFDP calculation differs from the number of non-noteworthy results based on the FPRP calculation (see Table 1).
BFDP values that are underlined and bolded give rise to noteworthiness that differs from that obtained through the FPRP calculation (see Table 1); SNP IDs that are bolded are the ones with discrepancies.
*P-values are those reported by the GWAS meta-analyses for each SNP ID, not the FPRP P-values, which can be found in Supplementary Table S4.

Figure legends

Figure 1. Flow chart of literature search.

Figure 2. 2A & 2B: Percentages of noteworthiness based on (1) P-value comparison, (2) FPRP calculation, and (3) BFDP calculation, using the reports from the GWAS catalog, the GWAS meta-analyses, and the GWAS raw data reported by the GWAS meta-analyses. In Figure 2A, the prior probability of 0.001 for a candidate gene was used, while in Figure 2B, the prior probability of 0.000001 for a random single nucleotide polymorphism was used for the calculation of FPRP and BFDP. 2C & 2D: Summary of the calculated mean and standard error of FPRP and BFDP values using reports of GWAS meta-analyses, with P-values categorized into one of three ranges (a summary of the independent and paired t-tests is shown in Supplementary Table S6). In Figure 2C, the prior probability of 0.001 for a candidate gene was used, while in Figure 2D, the prior probability of 0.000001 for a random single nucleotide polymorphism was used for the calculation of FPRP and BFDP.

Figure 3. P-value distributions of SNP associations with P-values ≤ 5 × 10⁻⁸ regardless of FPRP or BFDP noteworthiness (Figure 3A), with P-values ≤ 5 × 10⁻⁸ and noteworthy FPRP values (FPRP < 0.2) (Figures 3B and 3C), and with P-values ≤ 5 × 10⁻⁸ and noteworthy BFDP values (BFDP < 0.8) (Figures 3D and 3E), calculated using data provided by the GWAS catalog, GWAS meta-analysis, GWAS raw data 1 and GWAS raw data 2. The X-axis is the range of P-values, while the Y-axis is the percentage of SNP associations falling into a particular range. In Figures 3B and 3D, noteworthy FPRP and BFDP were estimated with a prior probability of 0.001, while in Figures 3C and 3E they were estimated with a prior probability of 0.000001.

Abbreviation list

GWAS, genome-wide association studies; FPRP, false-positive report probability; BFDP, Bayesian false discovery probability; Meta, meta-analysis; PP, prior probability
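
As an aid to reading Tables 1 and 2, the following is a minimal Python sketch of how FPRP and BFDP depend on the prior probability, assuming the standard definitions of Wacholder et al. (2004) and Wakefield (2007). The function names and the helper `rescale_prior` are illustrative and do not come from the manuscript. Statistical power and the prior variance `w` are taken as direct inputs here, so the sketch does not reproduce the manuscript's step of deriving them from a reported OR and 95% CI. Because both measures are a fixed data-dependent factor multiplied by the prior odds of no association, a value computed at one prior can be converted to the other prior by rescaling its odds.

```python
import math


def fprp(p_value: float, power: float, prior: float) -> float:
    """False-positive report probability for an observed P-value.

    prior is the prior probability of a true association (e.g. 0.001).
    """
    false_positive = p_value * (1.0 - prior)
    true_positive = power * prior
    return false_positive / (false_positive + true_positive)


def approximate_bayes_factor(z: float, v: float, w: float) -> float:
    """Approximate Bayes factor of H0 versus H1 (Wakefield-style form).

    z is the Z-statistic, v the variance of the log OR estimate, and
    w the prior variance of the log OR (an assumed input in this sketch).
    """
    r = w / (v + w)  # the "shrinkage factor" r
    return math.exp(-0.5 * z * z * r) / math.sqrt(1.0 - r)


def bfdp(abf: float, prior: float) -> float:
    """Bayesian false discovery probability from an ABF and a prior."""
    prior_odds_null = (1.0 - prior) / prior
    weighted = abf * prior_odds_null
    return weighted / (1.0 + weighted)


def rescale_prior(prob: float, prior_from: float, prior_to: float) -> float:
    """Convert an FPRP or BFDP computed at prior_from to prior_to.

    Hypothetical helper: it strips the old prior odds and applies the new ones.
    """
    if prob >= 1.0:  # values rounded to 1 in a table stay at 1
        return 1.0
    odds = prob / (1.0 - prob)
    odds *= (prior_from / (1.0 - prior_from)) * ((1.0 - prior_to) / prior_to)
    return odds / (1.0 + odds)


def is_noteworthy(value: float, threshold: float) -> bool:
    """Noteworthy if below the threshold (0.2 for FPRP, 0.8 for BFDP)."""
    return value < threshold


if __name__ == "__main__":
    # GWAS Catalog FPRP for rs7579899 at prior 0.001 (Table 1)
    print(rescale_prior(7.18873e-05, 1e-3, 1e-6))
    # GWAS Catalog BFDP for rs7579899 at prior 0.001 (Table 2)
    print(rescale_prior(0.009364, 1e-3, 1e-6))
```

Under these assumptions, the two printed values should agree, within rounding, with the corresponding entries for rs7579899 at a prior probability of 0.000001 in Tables 1 and 2. This illustrates that lowering the prior by three orders of magnitude multiplies the odds of a false positive by roughly 1000, which is why many associations that are noteworthy at 0.001 are no longer noteworthy at 0.000001.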