
Bayesian statistical methods in genetic association studies - empirical examination of statistically non-significant GWAS meta-analyses in cancers

Accepted Manuscript
Bayesian statistical methods in genetic association studies:
Empirical examination of statistically non-significant Genome
Wide Association Studies (GWAS) meta-analyses in cancers: A
systematic review
Jae Hyon Park, Dong Il Geum, Michael Eisenhut, Hans J. van der
Vliet, Jae Il Shin
PII: S0378-1119(18)31096-5
DOI: https://doi.org/10.1016/j.gene.2018.10.057
Reference: GENE 43313
To appear in: Gene
Received date: 12 September 2018
Accepted date: 19 October 2018
Please cite this article as: Jae Hyon Park, Dong Il Geum, Michael Eisenhut, Hans J.
van der Vliet, Jae Il Shin, Bayesian statistical methods in genetic association studies:
Empirical examination of statistically non-significant Genome Wide Association Studies
(GWAS) meta-analyses in cancers: A systematic review. Gene (2018),
https://doi.org/10.1016/j.gene.2018.10.057
This is a PDF file of an unedited manuscript that has been accepted for publication. As
a service to our customers we are providing this early version of the manuscript. The
manuscript will undergo copyediting, typesetting, and review of the resulting proof before
it is published in its final form. Please note that during the production process errors may
be discovered which could affect the content, and all legal disclaimers that apply to the
journal pertain.
Original article
Bayesian statistical methods in genetic association studies: empirical examination of
statistically non-significant Genome Wide Association Studies (GWAS) meta-analyses in
cancers: a systematic review
Jae Hyon Park, MD*1, Dong Il Geum, MS*1, Michael Eisenhut, MD2, Hans J. van der Vliet,
MD, PhD3, Jae Il Shin, MD, PhD4.
1. Department of Radiology, Yonsei University College of Medicine, Severance Hospital,
Seoul, Korea.
2. Luton & Dunstable University Hospital NHS Foundation Trust, United Kingdom.
3. Department of Medical Oncology, VU University Medical Center, Amsterdam, The
Netherlands.
4. Department of Pediatrics, Yonsei University College of Medicine, Severance Children’s
Hospital, Seoul, Korea.
*3874 words
Correspondence to Jae Il Shin, M.D., Ph.D.
Address: 50 Yonsei-ro, Seodaemun-gu, C.P.O. Box 8044, Department of Pediatrics, Yonsei
University College of Medicine, Seoul 120-752, Republic of Korea
Tel: +82-2-2228-2050; Fax: +82-2-393-9118; E-mail: shinji@yuhs.ac
*These authors contributed equally to this work.
The authors declare no potential conflicts of interest
Key messages
• Although there have been many articles on the meta-analysis of GWAS in the field
of cancer, very few studies mention the possibility of false positive results and the various
caveats in determining whether a genotype-phenotype association is truly noteworthy.
• It is still possible to obtain “noteworthy” Bayesian results (i.e. false positive report
probability (FPRP) and Bayesian False Discovery Probability (BFDP)) even at higher P-values
that are in general not considered statistically significant for GWAS, and our results show
that these false positive results are much more prevalent in the borderline genetic
association studies.
• While the initial GWAS results are found non-noteworthy through either FPRP or
BFDP, results of the meta-analysis of these GWAS results are found noteworthy in
many cases, mainly due to the effects of pooled sample size.
• We should be cautious when applying Bayesian statistics in the field of genetic
epidemiology for the evaluation of false positive results and in screening truly
noteworthy genotype-phenotype associations that are used for future bench-to-bedside
translations.
Bayesian statistical methods (i.e. FPRP and Bayesian False Discovery Probability (BFDP))
Abstract
A Bayesian statistical method was developed to assess the noteworthiness of a single
nucleotide polymorphism (SNP)-phenotype association that shows statistical significance in
various observational studies, but it has seldom been applied to GWAS meta-analyses in
cancers. Data (i.e. allelic frequency, odds ratio, 95% confidence interval, etc.) on various
SNP-cancer associations were extracted from meta-analyses of GWAS and the National
Human Genome Research Institute (NHGRI) Catalog of Published GWAS and were used to
compute the false positive report probability (FPRP) and Bayesian false discovery probability
(BFDP) to evaluate whether some statistically nonsignificant SNP-cancer associations can be
“falsely” labeled noteworthy. Independent paired t-tests showed a direct relationship between
SNP-cancer P-values and both FPRP and BFDP estimates. However, a discrepancy in the
number of noteworthy associations between P-value comparison and either FPRP or BFDP
was found using data extracted from meta-analyses of GWAS and the GWAS Catalog. Most
associations with nonsignificant P-values but with noteworthy FPRP and BFDP
estimates had P-values within the range of 10⁻⁶ to 5 × 10⁻⁸. A poorly selected genome-wide
significance threshold and inclusion of a nonsignificant SNP-phenotype association in the
noteworthiness test can, with either a noteworthy FPRP or BFDP computation, give a false
impression of noteworthiness to a nonsignificant association.
Keywords: GWAS; Polymorphism; FPRP; BFDP; P-value
Introduction
During the past few decades, an unprecedented advance in genotyping technologies has
allowed for a marked increase in the publications of genome-wide association studies
(GWAS). Through GWAS, identification of novel single nucleotide polymorphisms (SNPs)
that influence human susceptibility to physiological traits or complex diseases has expanded
our knowledge of the genetic architecture of complex traits (McCarthy et al., 2008). However,
while the number of publications on SNP-phenotype associations continues to rise, concerns
have been raised that some of the findings reported as statistically
significant could not be reproduced in subsequent studies (Ioannidis, 2007). It has taken the
field some time to realize that the traditional interpretation of association studies, where
associations with P values less than 5% are considered true positive findings, is not
sufficiently stringent. Numerous statistical approaches have since been suggested to identify
“noteworthy” associations that have significantly higher chances of being “true” associations.
These include methods that account for multiple testing, such as statistical
independence methods (Martin et al., 2007), and methods that adjust the error rate, such as the
Bonferroni correction (Bland and Altman, 1995) and the false discovery rate (Storey and
Tibshirani, 2003). More recently, the false positive report probability (FPRP) of
Wacholder et al. (Wacholder et al., 2004), which is the probability of no true association
between a genetic variant and disease given a statistically significant finding, has been
regarded as one of the important statistical methods to judge true noteworthiness. Similarly,
a more recent development of FPRP known as the Bayesian false discovery probability
(BFDP) (Wakefield, 2007) has also been suggested. Both methods of assessing the
noteworthiness of an association are based on Bayesian statistics, which uses a prior probability and a
Bayes factor to obtain the posterior probability, which is the conditional probability of the
unobserved quantity of interest given the observed data. These Bayesian procedures provide a
level of confidence that is easy to interpret and are preferred by clinicians and researchers.
Likewise, there have been many individual GWAS and meta-analyses of GWAS in the field
of cancer. Using popular Bayesian statistics (i.e. FPRP and BFDP), many primary
investigators have claimed to have found truly noteworthy associations, and in many
publications these associations have been described in detail as potential future genomic
diagnostic and prognostic biomarkers. However, our study shows, using popular
Bayesian statistics (i.e. FPRP and BFDP), that not all statistically significant genome-wide
associations are in truth noteworthy associations. Moreover, here we also show that a
genome-wide association with a non-significant P-value can also be falsely reported as being
noteworthy using conventional Bayesian statistics. This paper demonstrates that the task of
determining whether a genotype-phenotype association is truly noteworthy is very difficult
even when equipped with the most advanced statistical tools.
Methods
Meta-analyses that analyzed the association between SNP and cancer risk using GWAS were
first identified by performing a literature search of the PubMed database using the search terms
meta-analysis, genome wide association, GWAS, and cancer on August 2, 2018. In total, 655
articles were identified through our search methods, and finally 29 articles were selected (Figure 1).
“GWAS meta-analyses,” which are meta-analyses that analyzed only GWAS or articles that
had at least conducted a subgroup meta-analysis using only GWAS, were included in our
review, while meta-analyses that analyzed any combination of GWAS, general observational
studies such as case-control studies (i.e. case-control, nested case-control), iSelect SNP
genotyping array (the iCOGS array) studies, and meta-analyses that did not examine the
association between SNP and cancer were excluded. Moreover, meta-analyses that analyzed
GWAS with subsequent replication studies were also excluded since this paper focuses on the
“statistical methods” of assessing the noteworthiness of a genome-wide association instead of
being a comprehensive summary report of all noteworthy associations. Data extracted from
each meta-analysis, including the gene variant’s location (i.e. chromosome #, position #, and
locus #), gene name, cancer of interest, allelic frequency (AF) (i.e. risk AF, minor AF, effect
AF), odds ratio (OR), 95% confidence interval (CI), number of GWAS analyzed, publication
bias, heterogeneity (i.e. Ph, I², tau²) and sample size, are summarized in
Supplementary Table S1. Similarly, GWAS data (i.e. ethnicity, specific sample size, OR, 95% CI, GWAS
source) used to run the meta-analysis in each meta-analysis paper, defined here as “GWAS
raw data,” were also extracted and are tabulated in Supplementary Table S2. Out of the
total 401 SNP cancer associations for which we extracted these data (Supplementary Table
S1), only 66 SNP cancer associations were finally selected because (1) their results were
statistically significant (i.e. P<0.05 and the 95% CI excluded 1.0), (2) the comparison
data (i.e. OR, 95% CI, etc.) needed for the FPRP and BFDP estimation for this particular
association could be accessed through the GWAS Catalog, and (3) the primary investigator
provided “GWAS raw data” in his or her GWAS meta-analysis (Supplementary Table S7).
Only associations meeting all three criteria were considered for our analysis because only
then could we compute and compare separate FPRP and BFDP values using the results from
the GWAS Catalog, GWAS meta-analyses, and raw GWAS data.
For each statistically significant association reported, an FPRP and a BFDP value were
determined using methods previously described by Wacholder et al. (Wacholder et al., 2004)
and Wakefield (Wakefield, 2007), respectively. In the case of FPRP, an FPRP value was
determined using the P value, the statistical power of the test, and a prior probability for the
association. Since the prior probability is highly subjective, we analyzed the full range of prior
probabilities from 10⁻¹ to 10⁻⁶ at the statistical power to detect an OR of 1.5 as recommended
by Wacholder et al. (Wacholder et al., 2004), which is more conservative than the median
reported OR of 1.2 in our analysis. A summary of the FPRP values estimated using data from
the GWAS meta-analyses and “GWAS raw data” is shown in Supplementary Table S3. Not
all FPRP values could be obtained, since either some GWAS meta-analyses did not report the
OR and 95% CI of the “GWAS raw data” or the FPRP could not be calculated due to a
mathematical error in the process of calculating the inverse of the cumulative normal
distribution (Supplementary Table S3). More importantly, the OR and the 95% CI of the gene
variant associations on which we performed the FPRP computations were also extracted from
the National Human Genome Research Institute (NHGRI) Catalog of Published GWAS,
which is a publicly available, manually curated collection of up-to-date published GWAS with
all SNP-trait associations with P < 10⁻⁵. Data extracted from the GWAS Catalog are
summarized in Supplementary Tables S4 and S5. These data were used to calculate the
FPRP and BFDP values for our comparison with the FPRP and BFDP values estimated using
reported data from the meta-analyses from our literature search. Considering the narrow
literature search criteria we used, the FPRP and BFDP values estimated using the GWAS
Catalog reports can be assumed to be closer to the “true” noteworthiness, since they are based on
all available data published by papers referenced by the GWAS Catalog, which are not
limited to the meta-analyses of GWAS (i.e. results of various case-control studies and iCOGS
studies are also reflected in the reports of the GWAS Catalog). Thirty out of the total 66 SNP
cancer associations that were analyzed for the FPRP computation were selected for the final
analyses (i.e. FPRP and BFDP calculations for Table 1 and Table 2). Selection of a SNP-cancer
association was based on the criterion that an FPRP value could be calculated from the OR
and 95% CI reported by both the GWAS Catalog and the GWAS meta-analyses. Prior
probabilities of 10⁻³ and 10⁻⁶, which are considered to be nonsignificant by GWAS
interpretation, were used for the final analyses shown in Table 1 and Table 2 as recommended
by Wacholder et al. (Wacholder et al., 2004). The estimated FPRP was considered significant
and a SNP cancer association was found “noteworthy” when the FPRP value was less than
0.2 in all analyses, as suggested by Wacholder et al. (Wacholder et al., 2004). All FPRP
computations were performed using the Excel spreadsheet provided by Wacholder et
al. (Wacholder et al., 2004). Similar to the FPRP, the BFDP value of each SNP cancer
association was calculated with prior probabilities of 10⁻³ and 10⁻⁶ via an Excel spreadsheet
(http://faculty.washington.edu/jonno/cv.html). In addition, the BFDP value was considered
noteworthy if less than 0.8 as recommended by Wakefield (Wakefield, 2007). In the case of
BFDP, only two GWAS raw datasets were computed (Table 2) compared to the four GWAS
raw datasets computed in the case of FPRP (Table 1) because two out of four GWAS raw
datasets used for the FPRP estimation in Table 1 were incomplete due to missing data in their
reference meta-analyses, and thus, these results could not be compared to those of BFDP
estimation.
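As an illustration of the FPRP step described above, the following is a minimal Python sketch, not the Wacholder et al. spreadsheet itself. It assumes that the standard error of the log OR is recovered from the reported 95% CI, that the observed two-sided P-value serves as the significance level, and that power is evaluated against an alternative OR of 1.5. The function names and the example OR and CI are hypothetical and are not taken from the Supplementary Tables.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def fprp(odds_ratio, ci_low, ci_high, prior, alt_or=1.5):
    """Sketch of FPRP = alpha*(1 - prior) / (alpha*(1 - prior) + power*prior).

    Assumption: alpha is the two-sided P-value implied by the OR and 95% CI,
    and power is the chance of reaching that level if the true OR were alt_or.
    """
    # Standard error of log(OR) recovered from the width of the 95% CI.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)
    z_obs = abs(math.log(odds_ratio)) / se
    # Two-sided P-value; erfc keeps precision for very small tail areas.
    alpha = math.erfc(z_obs / math.sqrt(2.0))
    # Two-sided power to exceed |z_obs| under the alternative OR.
    shift = math.log(alt_or) / se
    power = (1.0 - normal_cdf(z_obs - shift)) + normal_cdf(-z_obs - shift)
    return alpha * (1.0 - prior) / (alpha * (1.0 - prior) + power * prior)


# Hypothetical association: OR 1.20 (95% CI 1.10-1.31), P of roughly 4e-5.
for prior in (1e-3, 1e-6):
    value = fprp(1.20, 1.10, 1.31, prior)
    label = "noteworthy" if value < 0.2 else "not noteworthy"
    print(f"prior={prior:g}  FPRP={value:.3f}  {label}")
```

Under these assumptions the hypothetical association falls below the 0.2 cut-off at a prior of 10⁻³ but not at 10⁻⁶, even though its P-value is far above 5×10⁻⁸.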
After evaluating the noteworthiness of an association based on the FPRP and BFDP values,
we calculated the percentages of noteworthiness based on (1) a simple P-value comparison (P
< 5×10⁻⁸), (2) an FPRP calculation, and (3) a BFDP calculation of data reported by the GWAS
Catalog, GWAS meta-analyses and GWAS raw dataset for comparisons. The mean and
standard error of noteworthy FPRP and BFDP values derived from data of the GWAS
meta-analyses were further categorized into three different P-value ranges using the P-value
reported by the GWAS meta-analyses, and an independent, paired t-test was performed using
SPSS for Windows version 18.0 (SPSS Inc., Chicago, Illinois, USA) to find a relationship
between the FPRP, BFDP and the P-value of the original dataset (Table 1). Lastly, for
associations with a statistically non-significant P-value (P ≥ 5×10⁻⁸) but with noteworthy
FPRP (FPRP<0.2) and BFDP (BFDP<0.8) values, we categorized the P-values into eight
different ranges for a comparison study (Figure 3).
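The BFDP criterion used in the Methods can be sketched in the same spirit. The snippet below is a hedged Python illustration of Wakefield's approximate Bayes factor and the resulting BFDP, not the author's spreadsheet. It assumes a normal prior on the log OR whose 97.5th percentile corresponds to an OR of 1.5; that prior width, the function name, and the example values are assumptions made only for illustration.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def bfdp(odds_ratio, ci_low, ci_high, prior, prior_or_upper=1.5):
    """Sketch of BFDP = ABF*PO / (1 + ABF*PO).

    ABF is the approximate Bayes factor for the null against the alternative,
    PO is the prior odds of the null, and prior is the prior probability
    that the association is real.
    """
    log_or = math.log(odds_ratio)
    # V: sampling variance of log(OR), recovered from the 95% CI width.
    v = ((math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)) ** 2
    # W: prior variance of log(OR) (assumed: 95% of prior ORs below 1.5).
    w = (math.log(prior_or_upper) / Z_975) ** 2
    z_squared = log_or ** 2 / v
    abf = math.sqrt((v + w) / v) * math.exp(-0.5 * z_squared * w / (v + w))
    prior_odds_null = (1.0 - prior) / prior
    weighted = abf * prior_odds_null
    return weighted / (1.0 + weighted)


# Hypothetical association: OR 1.20 (95% CI 1.10-1.31).
for prior in (1e-3, 1e-6):
    value = bfdp(1.20, 1.10, 1.31, prior)
    label = "noteworthy" if value < 0.8 else "not noteworthy"
    print(f"prior={prior:g}  BFDP={value:.3f}  {label}")
```

Note that the function takes only the OR, its CI, and the prior as inputs; the P-value never enters directly, so any link between P-value and BFDP arises through the standard error and hence the sample size.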
Results
Out of the 30 SNP cancer associations for which the FPRP values were estimated, the number
of associations for which the FPRP value was considered not noteworthy was found to
increase from least to greatest in the order of GWAS Catalog, GWAS meta-analysis, and
GWAS raw data for both prior probabilities of 10⁻³ and 10⁻⁶ (Table 1). A similar trend was
observed in the case of BFDP for both prior probabilities (Table 2). However, overall, more
associations were found noteworthy using BFDP compared to FPRP for the GWAS Catalog
(8% vs. 5%; prior probability of 10⁻⁶) and the GWAS meta-analysis (20.0% vs. 13.3%; prior
probability of 10⁻³, and 66.7% vs. 53.3%; prior probability of 10⁻⁶) while a similar number of
associations was found noteworthy using both BFDP and FPRP in the case of the GWAS raw
data at both prior probabilities of 10⁻³ and 10⁻⁶. Furthermore, some of the associations with
non-noteworthy FPRP differed from those that were considered non-noteworthy according to
the BFDP computation. SNP IDs for the associations with such discrepancies were rs7579899,
rs12105918, rs13314271, rs3806624, rs4888262, rs10911251, rs3217901, and rs59336.
Overall, the BFDP computation resulted in a smaller number of noteworthy associations (128
associations vs. 137 associations from FPRP).
When comparing the percentages of noteworthy associations based on the criteria of (1) P-value
comparison (P < 5×10⁻⁸), (2) FPRP estimation, and (3) BFDP estimation, the
percentage of noteworthy associations based on the P-value was much lower compared to
those found using either FPRP or BFDP (Table 2A and 2B), indicating that some of the
associations were found noteworthy by means of FPRP and BFDP estimation even though
their respective P-values were greater than the 5×10⁻⁸ needed to reach a genome wide
significance in the first place (Panagiotou and Ioannidis, 2012). The discrepancy in the
number of noteworthy associations between the P-value comparison and either FPRP or
BFDP was more evident in the results analyzed using a prior probability of 10⁻³ than that of
10⁻⁶, and was revealed to be largest in the analysis that used the reports of GWAS meta-analysis
compared to those derived from either the GWAS Catalog or GWAS raw data.
However, categorizing the P-values into three different ranges (10⁻⁵ ≤ P-value < 0.05,
10⁻¹⁰ ≤ P-value < 10⁻⁵, and 10⁻¹⁵ ≤ P-value < 10⁻¹⁰) and comparing the mean FPRP and BFDP of
each range using an independent paired t-test showed that the mean FPRP and BFDP of
associations with higher P-values were statistically significantly higher than those of
associations with lower P-values, indicating that in general an increasing P-value results in a
large enough increase in both FPRP and BFDP to make an association non-noteworthy, as
expected.
Nonetheless, the P-value distribution of the associations with P-values greater than 5×10⁻⁸ but
with noteworthy FPRP and BFDP values from data extracted from the GWAS Catalog and
GWAS meta-analyses showed that many of the P-values for these associations were in the
range from 10⁻⁶ to 5×10⁻⁸, with the highest number of associations falling in the range
between 10⁻⁶ and 10⁻⁷ (Figure 3A). As expected, while many of the associations with P-values
greater than 5×10⁻⁸ from the reports of GWAS raw data had their P-values falling within the
range between 10⁻¹ and 10⁻⁴, those also with noteworthy FPRP and BFDP values had their
P-values falling within the range between 10⁻⁴ and 10⁻⁷ (Figure 3A, 3B, and 3D). However, this
trend was only observed for the associations whose noteworthy FPRP and BFDP values were
calculated using a prior probability of 10⁻³ rather than those computed using a prior
probability of 10⁻⁶ (Figure 3B and 3D) because very few associations had noteworthy FPRP
and BFDP values at the lower prior probability of 10⁻⁶ using data from GWAS raw data, all of
which had P-values less than 5×10⁻⁸.
Discussion
To the best of our knowledge, this paper is the first to use empirical data, especially data
extracted from a literature search of GWAS meta-analyses and those from the GWAS Catalog,
to prove that not all associations with “noteworthiness” found using Bayesian statistics
such as FPRP and BFDP can be considered truly “noteworthy”. Here, we show that even
associations with a P-value greater than 5×10⁻⁸ and less than 0.05 can have noteworthy FPRP
(FPRP<0.2) and BFDP (BFDP<0.8) values and may thus in some instances allow an
investigator to falsely label a “non-noteworthy” association as “noteworthy”. This finding is
problematic not only because it implies that some of the published associations previously
reported as “noteworthy” using the above Bayesian approach may have to be reevaluated, but
also because it underscores a need for a more careful method of selecting associations that should
undergo the “noteworthiness” test or a more rigorous method of classifying an association as
positive in addition to the previously used criteria such as the Venice criteria (Ioannidis et al.,
2008), both FPRP and BFDP, higher statistical power (i.e. power greater than 80%), and low
heterogeneity (i.e. I² less than 50%). Our study shows that the Bayesian approach is not
rigorous enough to be used as a single method of finding a noteworthy association, as
previously done by Dong et al. (Dong et al., 2008).
Unfortunately, many investigators still assess the noteworthiness of an association using
either FPRP or BFDP whenever the association is found to be statistically significant by an
observational study (i.e. when P<0.05 and the 95% CI excluded 1.0). For GWAS, the genome
wide significance threshold has been previously suggested to be P-value < 5×10⁻⁸ and this
threshold was also used in our P-value comparison study (Panagiotou and Ioannidis, 2012).
However, there still exists some ambiguity as to what constitutes the most suitable genome
wide significance threshold. At a practical level, some early GWAS used a threshold of P <
1×10⁻⁷ (Maraganore et al., 2005; Stacey et al., 2007; Zanke et al., 2007) while another
threshold (e.g. P < 5×10⁻⁷) has been less frequently used (2007). The general rule, however, is
that associations with P < 5×10⁻⁸ are considered replicable (Hoggart et al., 2008) and
associations with P ≥ 1×10⁻⁷ are not accepted unless proven through a more rigorous
replication. These thresholds are based on the estimated effective number of independent tests
(~10⁶ tests) if all common SNPs in HapMap have been tested with direct genotyping at the α
= 0.05 level and thus the Bonferroni correction for multiple testing where the number of tests
is in the order of 10⁶ would be 0.05/10⁶ or 5×10⁻⁸. Considering that the genome wide
significance depends heavily on the study population, minor allele frequencies of SNPs,
different linkage disequilibrium patterns, and types of genetic data or arrays used for the
analysis (Hoggart et al., 2008; Orr and Chanock, 2008; Pe'er et al., 2008), obtaining a true
genome wide significance threshold is nearly impossible and thus, a threshold P-value for
what constitutes a statistical significance worthy of assessing the noteworthiness via FPRP or
BFDP is highly indeterminate. Such ambiguity is very troublesome since our results indicate
that it is not just the associations with P-values under the conventional threshold that have the
possibility of noteworthiness through FPRP and BFDP estimates. To make matters worse,
according to our results, most of the associations with P > 5×10⁻⁸ and noteworthy FPRP and
BFDP values were computed using data extracted from the GWAS Catalog and GWAS
meta-analysis and had their P-values falling within the range between 10⁻⁵ and 5×10⁻⁸. This range
encompasses the range of P-values between 10⁻⁷ and 5×10⁻⁸ defined for the “borderline”
associations, which have been termed “borderline” because these associations are highly
prone to statistical insignificance in studies attempting replication (Panagiotou and Ioannidis,
2012). Though a study (Panagiotou and Ioannidis, 2012) did previously report that about 73%
of these borderline associations can be successfully replicated in subsequent GWAS, whether
this phenomenon holds true across various types of SNPs and the disease of interest is not
certain. Overall, our results can be interpreted in two different ways: (1) if an investigator is
using a higher genome-wide significance threshold than that conventionally used (e.g. P <
5×10⁻⁸) in order to better reflect perhaps the quality of the study population, minor allelic
frequency, and linkage patterns, then in the likely scenario that an association turns out to be
statistically significant for a “noteworthiness” test (i.e. its P-value is between 5×10⁻⁸ and the
selected threshold), the investigator should bear in mind that this association could be in truth
not noteworthy even with a noteworthy FPRP or BFDP estimate; or (2) if an investigator
uses the conventional threshold of P < 5×10⁻⁸ and decides to neglect an association with a
P-value in the “borderline” range, he or she may possibly be neglecting a truly noteworthy
association with a noteworthy FPRP or BFDP estimate. In either case, meticulous attention
must be given to associations with a P-value near the selected threshold and results of the
FPRP or BFDP estimation should be interpreted carefully.
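The threshold arithmetic in this paragraph can be made concrete with a short Python sketch. It assumes roughly 10⁶ effective independent tests and takes the “borderline” band to run from 5×10⁻⁸ to 10⁻⁷, as stated above; the function names and labels are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# About one million effective independent tests at alpha = 0.05.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)  # approximately 5e-8
BORDERLINE_UPPER = 1e-7  # upper edge of the "borderline" band


def classify_p_value(p):
    """Place a P-value relative to the conventional GWAS threshold."""
    if p < GENOME_WIDE:
        return "genome-wide significant"
    if p <= BORDERLINE_UPPER:
        return "borderline"
    return "not genome-wide significant"


for p in (1e-9, 8e-8, 1e-5):
    print(f"P={p:g}: {classify_p_value(p)}")
```

A different assumed number of effective tests shifts the threshold proportionally, which is one reason a single fixed cut-off is hard to justify across study populations and arrays.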
The problem with selecting a threshold P-value becomes even more challenging for results
of observational studies, whose associations are considered to be statistically significant if the
P-value is simply less than 0.05, unlike that of GWAS. Since GWAS results generally have
much smaller P-values than those reported by the observational studies, one can expect that
the observational studies will give rise to a higher number of false-positive noteworthy
associations compared to that of GWAS. However, in observational studies, the threshold
P-value is generally fixed at 0.05 and the study’s small sample size allows for a P-value that is
highly responsive to the change in the number of case patients.
While it is difficult to find a threshold P-value that could determine whether an association
is statistically significant enough to enter a noteworthy test (in our case, FPRP or BFDP), our
independent, paired T-test results indicate that, in general, associations with lower P-values
are found to be truly noteworthy using the Bayesian approach. One could argue that the
P-value acts as a dependent variable in the Bayesian function but a careful review of both FPRP
and BFDP functions shows that neither directly uses the given P-value of an association
(Supplementary Table S6). Instead, the sample size used for a particular association study is
thought to allow for this direct relationship. In FPRP, as the prior probability decreases (or as
the allelic frequency decreases), a larger sample size is needed to achieve an FPRP of
0.2 (Wacholder et al., 2004) and in BFDP, a larger sample size provides a greater power to
identify levels of significance with higher posterior odds for a given P-value (Wakefield,
2007). Thus, in general, a larger sample size used for an association gives rise to a higher
chance of obtaining a statistical significance (i.e. P-value < threshold) and this also increases
the posterior odds’ chance of being larger than the threshold used for noteworthiness (i.e. 0.2 for
FPRP and 0.8 for BFDP) for a given prior probability, thus allowing both the P-value and
FPRP and BFDP estimates to have a direct relationship. This phenomenon is also evident
from our results in Table 1 and Table 2 wherein the greatest number of non-noteworthy
associations is found in the GWAS raw data, GWAS meta-analysis and GWAS Catalog in
descending order, which is also the order by which the study sample size increases. Generally,
a genetic association that is found significant or noteworthy in its initial GWAS undergoes
multiple replication studies. Both initial and replicative studies are then used together in a
meta-analysis to conclude that, from a large sample size perspective, a genetic association is
indeed significant. However, our finding shows that even associations without statistical
significance can, with pooled sample size (i.e. addition of subsequent replication studies or
via meta-analysis), become statistically significant and even noteworthy via the Bayesian
approach’s function. This problem is even more serious for a meta-analysis that uses any
arbitrary number of GWAS or observational studies or one that excludes certain individual
studies to make an association statistically significant.
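The effect of pooled sample size described above can be illustrated with a small Python sketch of fixed-effect inverse-variance pooling. This is an assumed, simplified pooling model and not the method of any specific meta-analysis in our review; the five identical studies are hypothetical numbers chosen only to show the mechanism.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Return log(OR) and its standard error recovered from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)
    return math.log(odds_ratio), se


def two_sided_p(log_or, se):
    """Two-sided P-value for a normally distributed log(OR) estimate."""
    return math.erfc(abs(log_or) / se / math.sqrt(2.0))


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling; returns (log OR, SE)."""
    estimates = [log_or_and_se(*study) for study in studies]
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


# Hypothetical input: five GWAS each reporting OR 1.20 (95% CI 1.05-1.37).
studies = [(1.20, 1.05, 1.37)] * 5
single_p = two_sided_p(*log_or_and_se(*studies[0]))
pooled_log_or, pooled_se = pool_fixed_effect(studies)
pooled_p = two_sided_p(pooled_log_or, pooled_se)

print(f"single-study P = {single_p:.2e}")
print(f"pooled OR = {math.exp(pooled_log_or):.2f}, pooled P = {pooled_p:.2e}")
```

In this hypothetical case no single study comes close to 5×10⁻⁸, yet the pooled estimate crosses it with an unchanged OR, because pooling only shrinks the standard error.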
Some limitations of this study are that we (1) refrained from discussing some criticisms of
FPRP focused on the Bayesian issues such as the determination of prior probabilities for the
two hypotheses, restricted FPRP setting (normal distribution with equal and known variances)
that does not entirely reflect the realistic setting, and admittedly simplistic model (Lucke,
2009); and (2) may have overly simplified our explanations of the FPRP and BFDP and also
regarding the Bayesian statistics in order to focus on the issues of practical application and
problems with interpreting the results of the noteworthiness test by clinicians rather than on
the matter of theoretical basis since there are many investigators who may not be specialized
in Bayesian statistics but are familiar enough with the tools to use them for the analyses.
Overall, determining the right threshold P-value for a genetic association to be worthy of
entering the noteworthiness test is challenging and selection of a wrong threshold can give a
falsely noteworthy association. In addition, while the Bayesian approach (i.e. FPRP and BFDP), in
general, assigns noteworthiness to associations with low statistically significant P-values, both
computations are not independent from the effects of the sample size. Here, we ask clinicians
to be wary of these limitations when interpreting the results of genetic association studies.
Acknowledgements
None
Conflicts of interest.
None.
Author contributions.
J.P and D.I.G designed research; J.P, D.I.G and J.I.S performed research and analyzed data.
J.P, D.I.G, M.E, H.J.V and J.I.S wrote the paper.
References
, 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000
shared controls. Nature 447, 661-78.
Bland, J.M. and Altman, D.G., 1995. Multiple significance tests: the Bonferroni method. BMJ 310,
170.
Dong, L.M., Potter, J.D., White, E., Ulrich, C.M., Cardon, L.R. and Peters, U., 2008. Genetic
susceptibility to cancer: the role of polymorphisms in candidate genes. JAMA 299, 2423-36.
Hoggart, C.J., Clark, T.G., De Iorio, M., Whittaker, J.C. and Balding, D.J., 2008. Genome-wide
significance for dense SNP and resequencing data. Genet Epidemiol 32, 179-85.
Ioannidis, J.P., 2007. Non-replication and inconsistency in the genome-wide association setting.
Hum Hered 64, 203-13.
Ioannidis, J.P., Boffetta, P., Little, J., O'Brien, T.R., Uitterlinden, A.G., Vineis, P., Balding, D.J.,
Chokkalingam, A., Dolan, S.M., Flanders, W.D., Higgins, J.P., McCarthy, M.I., McDermott, D.H.,
Page, G.P., Rebbeck, T.R., Seminara, D. and Khoury, M.J., 2008. Assessment of cumulative
evidence on genetic associations: interim guidelines. Int J Epidemiol 37, 120-32.
Lucke, J.F., 2009. A critique of the false-positive report probability. Genet Epidemiol 33, 145-50.
Maraganore, D.M., de Andrade, M., Lesnick, T.G., Strain, K.J., Farrer, M.J., Rocca, W.A., Pant, P.V.,
Frazer, K.A., Cox, D.R. and Ballinger, D.G., 2005. High-resolution whole-genome association
study of Parkinson disease. Am J Hum Genet 77, 685-93.
Martin, L.J., Woo, J.G., Avery, C.L., Chen, H.S., North, K.E., Au, K., Broet, P., Dalmasso, C., Guedj, M.,
Holmans, P., Huang, B., Kuo, P.H., Lam, A.C., Li, H., Manning, A., Nikolov, I., Sinha, R., Shi, J.,
Song, K., Tabangin, M., Tang, R. and Yamada, R., 2007. Multiple testing in the genomics era:
findings from Genetic Analysis Workshop 15, Group 15. Genet Epidemiol 31 Suppl 1,
S124-31.
McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P. and Hirschhorn,
J.N., 2008. Genome-wide association studies for complex traits: consensus, uncertainty and
challenges. Nat Rev Genet 9, 356-69.
Orr, N. and Chanock, S., 2008. Common genetic variation and human disease. Adv Genet 62, 1-32.
Panagiotou, O.A. and Ioannidis, J.P., 2012. What should the genome-wide significance threshold be?
Empirical replication of borderline genetic associations. Int J Epidemiol 41, 273-86.
Pe'er, I., Yelensky, R., Altshuler, D. and Daly, M.J., 2008. Estimation of the multiple testing burden for
genomewide association studies of nearly all common variants. Genet Epidemiol 32, 381-5.
Stacey, S.N., Manolescu, A., Sulem, P., Rafnar, T., Gudmundsson, J., Gudjonsson, S.A., Masson, G.,
Jakobsdottir, M., Thorlacius, S., Helgason, A., Aben, K.K., Strobbe, L.J., Albers-Akkers, M.T.,
Swinkels, D.W., Henderson, B.E., Kolonel, L.N., Le Marchand, L., Millastre, E., Andres, R.,
Godino, J., Garcia-Prats, M.D., Polo, E., Tres, A., Mouy, M., Saemundsdottir, J., Backman, V.M.,
Gudmundsson, L., Kristjansson, K., Bergthorsson, J.T., Kostic, J., Frigge, M.L., Geller, F.,
Gudbjartsson, D., Sigurdsson, H., Jonsdottir, T., Hrafnkelsson, J., Johannsson, J., Sveinsson, T.,
Myrdal, G., Grimsson, H.N., Jonsson, T., von Holst, S., Werelius, B., Margolin, S., Lindblom, A.,
Mayordomo, J.I., Haiman, C.A., Kiemeney, L.A., Johannsson, O.T., Gulcher, J.R.,
Thorsteinsdottir, U., Kong, A. and Stefansson, K., 2007. Common variants on chromosomes
2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat
Genet 39, 865-9.
Storey, J.D. and Tibshirani, R., 2003. Statistical significance for genomewide studies. Proc Natl Acad
Sci U S A 100, 9440-5.
Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L. and Rothman, N., 2004. Assessing the
probability that a positive report is false: an approach for molecular epidemiology studies.
J Natl Cancer Inst 96, 434-42.
Wakefield, J., 2007. A Bayesian measure of the probability of false discovery in genetic
epidemiology studies. Am J Hum Genet 81, 208-27.
Zanke, B.W., Greenwood, C.M., Rangrej, J., Kustra, R., Tenesa, A., Farrington, S.M., Prendergast, J.,
Olschwang, S., Chiang, T., Crowdy, E., Ferretti, V., Laflamme, P., Sundararajan, S., Roumy, S.,
Olivier, J.F., Robidoux, F., Sladek, R., Montpetit, A., Campbell, P., Bezieau, S., O'Shea, A.M.,
Zogopoulos, G., Cotterchio, M., Newcomb, P., McLaughlin, J., Younghusband, B., Green, R.,
Green, J., Porteous, M.E., Campbell, H., Blanche, H., Sahbatou, M., Tubacher, E., Bonaiti-Pellie, C., Buecher, B., Riboli, E., Kury, S., Chanock, S.J., Potter, J., Thomas, G., Gallinger, S.,
Hudson, T.J. and Dunlop, M.G., 2007. Genome-wide association scan identifies a colorectal
cancer susceptibility locus on chromosome 8q24. Nat Genet 39, 989-94.
Table 1. Calculated FPRP values for each SNP ID at prior probabilities of 0.001 and 0.000001 and a statistical power to detect an OR of 1.5, using the OR and 95% CI reported in the GWAS catalog, the GWAS meta-analysis, and the GWAS raw data used for and reported by the GWAS meta-analysis (a maximum of 4 sets of raw data were analyzed)

Prior Probability of 0.001 (Expected for a Candidate Gene)
SNP ID (rs) | GWAS Catalog | GWAS Meta | GWAS Raw Data 1 | GWAS Raw Data 2 | GWAS Raw Data 3 | GWAS Raw Data 4
rs7579899 | 7.18873E-05 | 0.217010919 | 0.966728248 | 0.447046669 | N/A | N/A
rs4953345 | 1.78501E-09 | 1.32301E-05 | 0.072094967 | 0.000566742 | N/A | N/A
rs12105918 | 0.144229351 | 0.000210456 | 0.305719064 | 0.551614305 | N/A | N/A
rs72858496 | 6.24935E-13 | 5.2017E-05 | 0.267351214 | 0.473744775 | N/A | N/A
rs7105934 | 4.68753E-09 | 1.2724E-05 | 0.000341673 | 0.625252318 | N/A | N/A
rs718314 | 2.44427E-06 | 0.802020793 | 0.870163687 | 0.996915547 | N/A | N/A
rs4765623 | 1.22167E-07 | 0.217010919 | 0.150471685 | 0.980371294 | N/A | N/A
rs3217810 | 3.07367E-05 | 0.00290674 | 0.991531177 | 0.998377395 | 0.995408064 | 0.997911489
rs17879961 | 6.00213E-07 | 3.91155E-06 | 0.224574983 | 0.997491532 | 0.999202121 | 0.1210079
rs13314271 | 3.15083E-05 | 0.04094744 | 0.99483476 | 0.985035047 | 0.996448128 | 0.988397509
rs3806624 | 1.85665E-10 | 0.149033717 | 0.968131758 | 0.830558794 | 0.981788204 | N/A
rs20541 | 2.52559E-05 | 2.35165E-07 | 9.85358E-06 | 0.991179851 | 0.964732338 | N/A
rs444929 | 0.003946647 | 0.003946647 | 0.706866392 | 0.967376514 | 0.75519144 | N/A
rs1860661 | 9.14459E-06 | 1.45705E-05 | 0.011453285 | 0.694530155 | 0.958080589 | N/A
rs3817198 | 8.01445E-10 | 0.671653697 | 0.994163969 | 0.000686867 | N/A | N/A
rs17021463 | 2.44427E-06 | 0.004396377 | 0.217307661 | 0.874438882 | N/A | N/A
rs12699477 | 5.34936E-06 | 0.00037535 | 0.278155362 | 0.311786558 | N/A | N/A
rs4888262 | 1.85665E-10 | 0.000141706 | 0.11404148 | 0.355074933 | N/A | N/A
rs603965 | 1.02835E-05 | 1.11569E-05 | 0.126790586 | 0.024977207 | N/A | N/A
rs11903757 | 7.9967E-06 | 0.003536869 | 0.961321595 | 0.998000414 | 0.967512601 | 0.830641557
rs10911251 | 0.002766038 | 0.000169251 | 0.972952691 | 0.994210551 | 0.995304574 | 0.946246085
rs3217901 | 0.000169251 | 0.025695761 | 0.900546291 | 0.998208397 | 0.997925193 | 0.997039656
rs59336 | 0.002766038 | 0.000169251 | 0.997290718 | 0.998292138 | 0.977303366 | 0.903729432
rs17530068 | 0.000686867 | 0.031819289 | 0.012905608 | 0.965268608 | 0.998204583 | N/A
rs2284378 | 7.9967E-06 | 0.000686867 | 0.692646105 | 0.447046669 | 0.93936687 | N/A
rs3765524 | 2.54705E-06 | 1.18482E-10 | 8.6767E-06 | 0.002747928 | N/A | N/A
rs3781264 | 7.8929E-07 | 6.73668E-08 | 2.45866E-05 | 0.589878719 | N/A | N/A
rs10411210 | 1.22167E-07 | 5.26569E-05 | 0.95655674 | 0.357665573 | 0.940648956 | 0.710672002
rs961253 | 2.45089E-07 | 0.00363619 | 0.992700115 | 0.985992961 | 0.896800977 | 0.149033717
rs4444235 | 7.56027E-06 | 0.020330418 | 0.578906683 | 0.982003947 | 0.76677849 | 0.969481017
No. of non-noteworthy results | 0 (0.0%) | 4 (13.3%) | 20 (66.7%) | 26 (86.7%) | 30 (100.0%) | 28 (93.3%)

Prior Probability of 0.000001 (Expected for a Random Single Nucleotide Polymorphism)
SNP ID (rs) | GWAS Catalog | GWAS Meta | GWAS Raw Data 1 | GWAS Raw Data 2 | GWAS Raw Data 3 | GWAS Raw Data 4
rs7579899 | 0.067133187 | 0.996408487 | 0.999965619 | 0.998765858 | N/A | N/A
rs4953345 | 1.78679E-06 | 0.013070426 | 0.987305487 | 0.362094589 | N/A | N/A
rs12105918 | 0.994107454 | 0.174038892 | 0.997736427 | 0.99918861 | N/A | N/A
rs72858496 | 6.2556E-10 | 0.049494425 | 0.997269814 | 0.998891499 | N/A | N/A
rs7105934 | 4.6922E-06 | 0.012576697 | 0.254916775 | 0.999401603 | N/A | N/A
rs718314 | 0.002440748 | 0.999753457 | 0.999850962 | 0.999996909 | N/A | N/A
rs4765623 | 0.000122274 | 0.996408487 | 0.994391504 | 0.999979999 | N/A | N/A
rs3217810 | 0.029849939 | 0.74477616 | 0.999991467 | 0.999998376 | 0.999995392 | 0.999997909
rs17879961 | 0.000600453 | 0.003900206 | 0.996562451 | 0.999997488 | 0.999999202 | 0.992795613
rs13314271 | 0.030576387 | 0.977136804 | 0.999994813 | 0.999984823 | 0.999996439 | 0.999988273
rs3806624 | 1.85851E-07 | 0.994328166 | 0.999967117 | 0.999796237 | 0.999981469 | N/A
rs20541 | 0.024658348 | 0.000235345 | 0.009767188 | 0.99999111 | 0.999963481 | N/A
rs444929 | 0.798640721 | 0.798640721 | 0.999585891 | 0.999966311 | 0.999676261 | N/A
rs1860661 | 0.009070788 | 0.014375621 | 0.920619601 | 0.99956081 | 0.999956292 | N/A
rs3817198 | 8.02246E-07 | 0.999511864 | 0.999994136 | 0.407592287 | N/A | N/A
rs17021463 | 0.002440748 | 0.815505181 | 0.996414728 | 0.999856574 | N/A | N/A
rs12699477 | 0.005326216 | 0.273185402 | 0.997414183 | 0.997799735 | N/A | N/A
rs4888262 | 1.85851E-07 | 0.124241877 | 0.992298791 | 0.998188794 | N/A | N/A
rs603965 | 0.010189038 | 0.011044843 | 0.993166872 | 0.96246622 | N/A | N/A
rs11903757 | 0.007941194 | 0.78036328 | 0.999959807 | 0.999997998 | 0.999966456 | 0.999796356
rs10911251 | 0.735203457 | 0.144896277 | 0.999972229 | 0.999994183 | 0.999995287 | 0.999943252
rs3217901 | 0.144896277 | 0.96350341 | 0.999889685 | 0.999998207 | 0.999997923 | 0.999997034
rs59336 | 0.735203457 | 0.144896277 | 0.999997286 | 0.999998291 | 0.9999768 | 0.999893592
rs17530068 | 0.407592287 | 0.970499644 | 0.929014653 | 0.999964056 | 0.999998203 | N/A
rs2284378 | 0.007941194 | 0.407592287 | 0.999556901 | 0.998765858 | 0.999935522 | N/A
rs3765524 | 0.002543123 | 1.18601E-07 | 0.00861066 | 0.733919121 | N/A | N/A
rs3781264 | 0.000789456 | 6.74296E-05 | 0.024020639 | 0.999305913 | N/A | N/A
rs10411210 | 0.000122274 | 0.050072903 | 0.999954631 | 0.998209099 | 0.999936971 | 0.999593453
rs961253 | 0.000245274 | 0.785089954 | 0.999992654 | 0.999985808 | 0.999885054 | 0.994328166
rs4444235 | 0.007511042 | 0.954071713 | 0.99927386 | 0.999981693 | 0.999696239 | 0.999968553
No. of non-noteworthy results | 5 (16.7%) | 16 (53.3%) | 27 (90.0%) | 30 (100.0%) | 30 (100.0%) | 30 (100.0%)

Abbreviations: GWAS, genome-wide association studies; FPRP, false positive report probability; Meta, meta-analysis; SNP, single nucleotide polymorphism; OR, odds ratio; CI, confidence interval; N/A, not available; No., number
Cells highlighted in red are FPRP values ≥ 0.2 and thus are not considered noteworthy.
Cells highlighted in brown or marked N/A are cells for which FPRP values could not be calculated because the odds ratio and confidence interval were not reported in the GWAS meta-analysis.
When summing the non-noteworthy results, “N/A” was also counted as non-noteworthy.
Cells highlighted in blue show that the number of non-noteworthy results based on the FPRP calculation differs from the number of non-noteworthy results based on the BFDP calculation (see Table 2).
FPRP values that are underlined and bolded give rise to noteworthiness that differs from that obtained through the BFDP calculation (see Table 2); SNP IDs that are bolded are the ones with discrepancies.
*P-values reported by the GWAS meta-analyses for each SNP ID (rs) are provided in Table 2 (the SNP IDs in Table 1 and Table 2 are the same).
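
As an illustration of how values like those in Table 1 can be derived, the following is a minimal Python sketch, not the authors' actual computation. It assumes a normal approximation on the log-OR scale, takes the two-sided P-value of the observed estimate as the alpha level, and assumes a particular form for the power to detect an OR of 1.5; the function names `fprp` and `count_non_noteworthy` are hypothetical. The counting helper applies the rule stated in the notes above, where N/A cells count as non-noteworthy.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def fprp(odds_ratio, ci_lower, ci_upper, prior, alt_or=1.5):
    """FPRP from a reported OR and 95% CI at a given prior probability."""
    # Standard error of ln(OR), recovered from the width of the 95% CI.
    se = (log(ci_upper) - log(ci_lower)) / (2 * 1.96)
    z = abs(log(odds_ratio)) / se
    # Two-sided P-value of the observed estimate, used as the alpha level.
    alpha = 2 * STD_NORMAL.cdf(-z)
    # Assumed power formula: chance of exceeding the observed z
    # if the true OR equals alt_or.
    power = STD_NORMAL.cdf(abs(log(alt_or)) / se - z)
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)


def count_non_noteworthy(values, threshold=0.2):
    """Count cells with FPRP >= threshold; None (N/A) is also non-noteworthy."""
    return sum(1 for v in values if v is None or v >= threshold)


# GWAS Raw Data 4 column of Table 1 at a prior probability of 0.001 (None = N/A).
raw_data_4 = (
    [None] * 7 + [0.997911489, 0.1210079, 0.988397509] + [None] * 9
    + [0.830641557, 0.946246085, 0.997039656, 0.903729432] + [None] * 4
    + [0.710672002, 0.149033717, 0.969481017]
)
print(count_non_noteworthy(raw_data_4))  # 28, matching the 28 (93.3%) reported
```

Because the prior enters only through the prior odds, the same estimate always yields a larger FPRP at a prior of 0.000001 than at 0.001, which is the pattern seen across both halves of the table.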
Table 2. Calculated BFDP values for each SNP ID at prior probabilities of 0.001 and 0.000001, using the OR and 95% CI reported in the GWAS catalog, the GWAS meta-analysis, and the GWAS raw data used for and reported by the GWAS meta-analysis (a maximum of 2 sets of raw data were analyzed). Calculations of various parameters (Z-statistics, r “shrinkage factor”, approximate Bayes factor, etc.) are listed in Supplementary Table S5

Prior Probability of 0.001 (Expected for a Candidate Gene)
SNP ID (rs) | GWAS Catalog | P-value* | GWAS Meta | P-value* | GWAS Raw Data 1 | GWAS Raw Data 2
rs7579899 | 0.009364 | 2.00E-09 | 0.921833 | 1.43E-04 | 0.998506 | 0.963092
rs4953345 | 2.67E-07 | 5.00E-10 | 0.00098 | 5.26E-10 | 0.776396 | 0.030597
rs12105918 | 0.788397 | 2.00E-08 | 0.009155 | 3.38E-07 | 0.869662 | 0.958878
rs72858496 | 7.84E-11 | 2.00E-07 | 0.002312 | 1.72E-07 | 0.839498 | 0.94609
rs7105934 | 3.7E-07 | 8.00E-14 | 0.000499 | 4.93E-09 | 0.004527 | 0.968435
rs718314 | 0.000323 | 9.00E-10 | 0.992496 | 4.17E-03 | 0.993998 | 0.999691
rs4765623 | 2.27E-05 | 3.00E-08 | 0.921833 | 2.30E-04 | 0.880994 | 0.99896
rs3217810 | 0.003364 | 6.00E-08 | 0.145845 | 3.40E-07 | 0.998545 | 0.998938
rs17879961 | 2.94E-10 | 1.00E-13 | 3.73E-06 | 1.96E-11 | 0.344165 | 0.998538
rs13314271 | 0.004913 | 7.00E-10 | 0.832196 | 2.76E-03 | 0.999756 | 0.999355
rs3806624 | 2.94E-08 | 1.00E-12 | 0.878779 | 2.00E-04 | 0.998653 | 0.981636
rs20541 | 0.001098 | 1.00E-08 | 1.94E-05 | 1.00E-10 | 0.000613 | 0.998717
rs444929 | 0.201958 | 3.00E-06 | 0.162202 | 3.10E-06 | 0.985515 | 0.996146
rs1860661 | 0.000971 | 4.00E-10 | 0.001029 | 1.90E-08 | 0.37097 | 0.968661
rs3817198 | 3.66E-07 | 2.00E-11 | 0.992887 | 1.10E-02 | 0.999795 | 0.048085
rs17021463 | 0.000323 | 1.00E-08 | 0.167764 | 6.09E-06 | 0.889369 | 0.991639
rs12699477 | 0.000626 | 6.00E-09 | 0.019904 | 7.16E-07 | 0.917735 | 0.914488
rs4888262 | 2.94E-08 | 5.00E-12 | 0.007827 | 1.39E-07 | 0.792886 | 0.922876
rs603965 | 5.32E-06 | 2.00E-11 | 3.76E-05 | 7.96E-11 | 0.227909 | 0.176272
rs11903757 | 0.001136 | 4.00E-08 | 0.19319 | 1.38E-06 | 0.994645 | 0.998902
rs10911251 | 0.305261 | 9.00E-08 | 0.019509 | 1.34E-06 | 0.998163 | 0.998448
rs3217901 | 0.029203 | 3.00E-07 | 0.670696 | 1.71E-06 | 0.993259 | 0.999162
rs59336 | 0.305261 | 4.00E-07 | 0.019509 | 7.64E-07 | 0.999693 | 0.999369
rs17530068 | 0.068147 | 3.00E-07 | 0.622245 | 3.50E-06 | 0.399579 | 0.998112
rs2284378 | 0.001136 | 1.00E-08 | 0.048085 | 6.50E-07 | 0.988188 | 0.963092
rs3765524 | 0.00022 | 2.00E-09 | 1.3E-08 | 3.15E-14 | 0.000604 | 0.063731
rs3781264 | 7.08E-05 | 4.00E-09 | 5.82E-06 | 5.77E-11 | 0.001485 | 0.966942
rs10411210 | 2.27E-05 | 5.00E-09 | 0.003496 | 4.90E-08 | 0.995464 | 0.87604
rs961253 | 5.36E-05 | 2.00E-10 | 0.214815 | 8.90E-07 | 0.999392 | 0.998957
rs4444235 | 0.001493 | 8.00E-10 | 0.585294 | 1.80E-06 | 0.972681 | 0.998786
No. of non-noteworthy results | 0 (0.0%) | | 6 (20.0%) | | 20 (66.7%) | 26 (86.7%)

Prior Probability of 0.000001 (Expected for a Random Single Nucleotide Polymorphism)
SNP ID (rs) | GWAS Catalog | P-value* | GWAS Meta | P-value* | GWAS Raw Data 1 | GWAS Raw Data 2
rs7579899 | 0.90442 | 2.00E-09 | 0.999915 | 1.43E-04 | 0.999999 | 0.999962
rs4953345 | 0.000267 | 5.00E-10 | 0.4954 | 5.26E-10 | 0.999712 | 0.969319
rs12105918 | 0.999732 | 2.00E-08 | 0.902425 | 3.38E-07 | 0.99985 | 0.999957
rs72858496 | 7.85E-08 | 2.00E-07 | 0.69877 | 1.72E-07 | 0.999809 | 0.999943
rs7105934 | 0.00037 | 8.00E-14 | 0.333356 | 4.93E-09 | 0.819875 | 0.999967
rs718314 | 0.244246 | 9.00E-10 | 0.999992 | 4.17E-03 | 0.999994 | 1
rs4765623 | 0.022239 | 3.00E-08 | 0.999915 | 2.30E-04 | 0.999865 | 0.999999
rs3217810 | 0.771624 | 6.00E-08 | 0.994183 | 3.40E-07 | 0.999999 | 0.999999
rs17879961 | 2.94E-07 | 1.00E-13 | 0.003719 | 1.96E-11 | 0.9981 | 0.999999
rs13314271 | 0.831712 | 7.00E-10 | 0.999799 | 2.76E-03 | 1 | 0.999999
rs3806624 | 2.94E-05 | 1.00E-12 | 0.999862 | 2.00E-04 | 0.999999 | 0.999981
rs20541 | 0.523795 | 1.00E-08 | 0.019004 | 1.00E-10 | 0.380406 | 0.999999
rs444929 | 0.996068 | 3.00E-06 | 0.994866 | 3.10E-06 | 0.999985 | 0.999996
rs1860661 | 0.493106 | 4.00E-10 | 0.507687 | 1.90E-08 | 0.998309 | 0.999968
rs3817198 | 0.000367 | 2.00E-11 | 0.999993 | 1.10E-02 | 1 | 0.980607
rs17021463 | 0.244246 | 1.00E-08 | 0.995069 | 6.09E-06 | 0.999876 | 0.999992
rs12699477 | 0.385506 | 6.00E-09 | 0.953114 | 7.16E-07 | 0.99991 | 0.999907
rs4888262 | 2.94E-05 | 5.00E-12 | 0.887593 | 1.39E-07 | 0.999739 | 0.999917
rs603965 | 0.005298 | 2.00E-11 | 0.036283 | 7.96E-11 | 0.996627 | 0.995353
rs11903757 | 0.532459 | 4.00E-08 | 0.995845 | 1.38E-06 | 0.999995 | 0.999999
rs10911251 | 0.997732 | 9.00E-08 | 0.952192 | 1.34E-06 | 0.999998 | 0.999998
rs3217901 | 0.967857 | 3.00E-07 | 0.99951 | 1.71E-06 | 0.999993 | 0.999999
rs59336 | 0.997732 | 4.00E-07 | 0.952192 | 7.64E-07 | 1 | 0.999999
rs17530068 | 0.986524 | 3.00E-07 | 0.999394 | 3.50E-06 | 0.998501 | 0.999998
rs2284378 | 0.532459 | 1.00E-08 | 0.980607 | 6.50E-07 | 0.999988 | 0.999962
rs3765524 | 0.180241 | 2.00E-09 | 1.3E-05 | 3.15E-14 | 0.376773 | 0.985536
rs3781264 | 0.066217 | 4.00E-09 | 0.005792 | 5.77E-11 | 0.59825 | 0.999966
rs10411210 | 0.022239 | 5.00E-09 | 0.778354 | 4.90E-08 | 0.999995 | 0.999859
rs961253 | 0.050893 | 2.00E-10 | 0.996362 | 8.90E-07 | 0.999999 | 0.999999
rs4444235 | 0.599445 | 8.00E-10 | 0.999293 | 1.80E-06 | 0.999972 | 0.999999
No. of non-noteworthy results | 8 (26.7%) | | 20 (66.7%) | | 27 (90.0%) | 30 (100.0%)

Abbreviations: GWAS, genome-wide association studies; BFDP, Bayesian false discovery probability; Meta, meta-analysis; SNP, single nucleotide polymorphism; OR, odds ratio; CI, confidence interval; N/A, not available; No., number
Cells highlighted in red are BFDP values ≥ 0.8 and thus are not considered noteworthy.
Cells highlighted in blue show that the number of non-noteworthy results based on the BFDP calculation differs from the number of non-noteworthy results based on the FPRP calculation (see Table 1).
BFDP values that are underlined and bolded give rise to noteworthiness that differs from that obtained through the FPRP calculation (see Table 1); SNP IDs that are bolded are the ones with discrepancies.
*P-values are those reported by the GWAS meta-analyses for each SNP ID, not the FPRP P-values, which can be found in Supplementary Table S4.
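
The quantities named in the Table 2 caption (Z-statistic, shrinkage factor r, approximate Bayes factor) can be sketched in Python as follows. This is an illustrative outline in the spirit of Wakefield (2007), not the authors' exact procedure. The prior variance W is an assumption here, chosen so that an OR of 1.5 sits at the upper 97.5% point of the prior; the manuscript's actual choice is given only in its supplementary material. The names `approximate_bayes_factor`, `bfdp` and `rescale_prior` are hypothetical.

```python
from math import exp, log, sqrt


def approximate_bayes_factor(odds_ratio, ci_lower, ci_upper, upper_or=1.5):
    """Approximate Bayes factor (null versus alternative) from an OR and 95% CI."""
    # Variance of ln(OR), recovered from the width of the 95% CI.
    v = ((log(ci_upper) - log(ci_lower)) / (2 * 1.96)) ** 2
    # Assumed prior variance of ln(OR) under the alternative.
    w = (log(upper_or) / 1.96) ** 2
    z = log(odds_ratio) / sqrt(v)  # Z-statistic
    r = w / (v + w)                # shrinkage factor
    return exp(-z * z * r / 2) / sqrt(1 - r)


def bfdp(abf, prior):
    """BFDP given an ABF and the prior probability of a true association."""
    prior_odds_null = (1 - prior) / prior
    return abf * prior_odds_null / (abf * prior_odds_null + 1)


def rescale_prior(value, old_prior, new_prior):
    """Re-express a BFDP (< 1) computed at old_prior for new_prior, same ABF."""
    abf = value / (1 - value) * old_prior / (1 - old_prior)
    return bfdp(abf, new_prior)


# rs12105918, GWAS Meta: 0.009155 at a prior of 0.001 in Table 2.
print(round(rescale_prior(0.009155, 0.001, 0.000001), 3))  # 0.902
```

The rescaled value agrees with the 0.902425 listed for the same SNP at a prior of 0.000001, up to rounding of the tabulated input. This shows that the two halves of Table 2 differ only through the prior odds.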
Figure legends
Figure 1. Flow chart of literature search
Figure 2. 2A & 2B: Percentages of noteworthiness based on (1) P-value comparison, (2) FPRP
calculation, and (3) BFDP calculation based on the reports from GWAS catalog, GWAS meta-analysis, and GWAS raw data reported by the GWAS meta-analyses. In Figure 2A, the prior
probability of 0.001 for a candidate gene was used while in Figure 2B, the prior probability of
0.000001 for a random single nucleotide polymorphism was used for the calculation of FPRP and
BFDP. 2C & 2D: Summary of calculated mean and standard error of FPRP and BFDP values using
reports of GWAS meta-analyses with P-values categorized into one of three ranges (Summary of the
independent and paired T-test is shown in the Supplementary Table S6). In Figure 2C, the prior
probability of 0.001 for a candidate gene was used while in Figure 2D, the prior probability of
0.000001 for a random single nucleotide polymorphism was used for the calculation of FPRP and
BFDP.
Figure 3. P-value distributions of SNP associations with P-values ≤ 5 × 10⁻⁸ regardless of FPRP or
BFDP noteworthiness (Figure 3A), with P-values ≤ 5 × 10⁻⁸ and noteworthy FPRP values (FPRP < 0.2)
(Figures 3B and 3C), and with P-values ≤ 5 × 10⁻⁸ and noteworthy BFDP values (BFDP < 0.8) (Figures 3D
and 3E), calculated using data provided by the GWAS catalog, GWAS meta-analysis,
GWAS raw data 1 and GWAS raw data 2. The X-axis is the range of P-values and the Y-axis is the
percentage of SNP associations falling into a particular range. In Figures 3B and 3D,
noteworthy FPRP and BFDP were estimated with a prior probability of 0.001; in Figures 3C and 3E,
they were estimated with a prior probability of 0.000001.
Abbreviation list
GWAS, genome-wide association studies; FPRP, false positive report probability; BFDP, Bayesian false
discovery probability; Meta, meta-analysis; PP, prior probability