Supplementary Methods and Appendix A

Data extraction

For each eligible association and each eligible article reporting it, we recorded: the PMID of each pertinent article, the name of the first author, the journal and year of publication, the SNP or other genetic variant analyzed, the chromosomal region, the implicated gene(s) or locus (as judged by the GWAS investigators), the genetic model selected, the phenotype of interest, and the ancestry group(s) of the included subjects. Data extraction was performed by two investigators, with a third investigator arbitrating discrepancies and ambiguous information.

Selection of associations - clarifications

Whenever two or more eligible GWAS addressed the same association, we merged all the data from these investigations. The same applied to a GWAS that did not itself fulfill the GWS criterion when one or more other GWAS on the same association did. For eligible associations that appeared in back-to-back publications (same issue of a journal) without cross-publication validation, we combined the final replication samples from both publications by meta-analysis. We considered both the outcomes selected upfront in the discovery stage and additional outcomes assessed in the final replication stage; for example, bipolar disorder datasets could have been included in the replication of schizophrenia-related genes on the grounds of defining a broader severe mental disease phenotype, and SNPs associated with eosinophil count could have been tested subsequently for asthma on the grounds of a common biological pathway. We excluded secondary clinical outcomes that were a subset of the primary clinical outcome (e.g. primary = stroke, secondary = ischemic stroke) or vice versa.
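The meta-analytic combination of replication samples from back-to-back publications described above can be illustrated with a minimal Python sketch. It assumes inverse-variance fixed-effects pooling of per-dataset log odds ratios (the model these methods name for per-ancestry meta-analyses); the function name `fixed_effects_pool` and the example inputs are hypothetical and not taken from the original analysis.

```python
import math
from statistics import NormalDist


def fixed_effects_pool(log_ors, ses):
    """Inverse-variance fixed-effects pooling of per-dataset log odds ratios.

    Each dataset is weighted by 1 / SE^2; returns the pooled log OR, its SE,
    and the summary OR with a 95% confidence interval.
    (Hypothetical helper for illustration only.)
    """
    if not log_ors or len(log_ors) != len(ses):
        raise ValueError("need at least one dataset and one SE per log OR")
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)        # SE of the pooled log OR
    z = NormalDist().inv_cdf(0.975)                  # two-sided 95% quantile, about 1.96
    return {
        "log_or": pooled,
        "se": pooled_se,
        "or": math.exp(pooled),
        "ci95": (math.exp(pooled - z * pooled_se), math.exp(pooled + z * pooled_se)),
    }


# Hypothetical example: final replication samples from two back-to-back papers.
result = fixed_effects_pool([0.10, 0.20], [0.05, 0.05])
print(round(result["or"], 3), [round(x, 3) for x in result["ci95"]])
```

With equal standard errors the two hypothetical samples receive equal weight, so the pooled log OR is 0.15 and its standard error shrinks to about 0.035, smaller than either input.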
If the primary clinical outcome of a GWAS was excluded, its secondary clinical outcomes were evaluated for inclusion, and if more than one qualified, we assessed only the one reported first by the GWAS investigators in the text of the article. When several genetic models of inheritance were eligible (e.g. log-additive, additive, dominant, recessive, model-free), we selected the one presented by the original investigators as the main analysis. If several models were presented in the main analysis, we chose the allelic (log-additive) model. Whenever the original article had not meta-analyzed populations of the same ancestry, we performed a fixed-effects meta-analysis per ancestry group to document eligibility.

Selection of replication datasets to avoid the winner’s curse

Effect sizes for genetic associations are expected to be inflated when the associations are selected because they pass a threshold of statistical significance or of some other selection metric (e.g. a p-value, Bayes factor, or false-discovery-rate threshold). When only the SNPs that pass the threshold are carried forward to the next stage, their effect estimates can be inflated by the winner’s curse, and the inflation is greater when the data have limited power to detect an association at the desired threshold. This inflation could bias comparisons of effect sizes between ancestry groups if datasets from different ancestry groups come from different stages of the discovery and replication process and these stages used different selection thresholds. We therefore classified datasets into the following three groups. The first group (“agnostic discovery”) included datasets subjected to agnostic testing on a genome-wide association platform with the aim of discovering new variants.
The second group (“trimming down”) included datasets from intermediate stage(s) in which genetic markers were moved forward to the next stage only if they passed a threshold (p-value or other) in the data of that stage, or in the combined data of that stage and the previous stage(s). The third group (“replication”) included datasets in which variants surviving the previous stage(s) were assessed in one or more independent datasets, with no subsequent stage in which only a selected subset of the best SNPs was tested. This third group was split into subgroup 1, in which the combined previous data had not yet reached GWS, and subgroup 2, in which the combined previous data had already reached GWS. Of note, when a new GWAS reported results on specific SNPs that had previously been found to be GWS in other ancestry group(s), this testing represents replication (third group), even though these SNPs may have been tested as part of the agnostic genome-wide platform. For analyses other than the assessment of GWS, we excluded the ‘agnostic discovery’ and ‘trimming down’ datasets in order to minimize bias arising from inflated effects.

Effect metrics

For each population dataset included in an eligible association, we extracted the allele or genotype counts to compute the natural logarithm of the odds ratio (OR) and its standard error under the selected genetic model. When allele and genotype counts were not given, we directly used the reported OR and 95% confidence interval (CI), or estimated this information indirectly from other published information (e.g. from the reported OR and P value). Whenever a study had used a family-based design or there was any concern about cryptic relatedness (e.g. in the deCODE Icelandic populations), we preferred relatedness-adjusted over unadjusted estimates.
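The three routes to a per-dataset log OR and standard error described above (allele counts, reported OR with 95% CI, reported OR with P value) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the 2x2 table is an allelic (log-additive) table with no zero cells, the standard error uses Woolf's formula, the reported 95% CI is symmetric on the log scale, and the reported P value is two-sided from a Wald test. All function names and example counts are hypothetical.

```python
import math
from statistics import NormalDist

# Two-sided 95% quantile of the standard normal distribution (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def log_or_from_allele_counts(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic log OR and Woolf standard error from a 2x2 allele-count table.

    Assumes every cell is positive (no continuity correction is applied).
    """
    cells = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(cells) <= 0:
        raise ValueError("all allele counts must be positive")
    log_or = math.log((case_risk * ctrl_other) / (case_other * ctrl_risk))
    se = math.sqrt(sum(1.0 / n for n in cells))  # Woolf: sqrt(1/a + 1/b + 1/c + 1/d)
    return log_or, se


def log_or_from_ci(odds_ratio, ci_lower, ci_upper):
    """Log OR and SE from a reported OR and its 95% CI (symmetric on the log scale)."""
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z95)
    return math.log(odds_ratio), se


def log_or_from_p(odds_ratio, p_two_sided):
    """Log OR and SE back-calculated from a reported OR and two-sided Wald P value."""
    if odds_ratio == 1 or not 0 < p_two_sided < 1:
        raise ValueError("need OR != 1 and 0 < P < 1 to back-calculate a standard error")
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)  # |z| statistic implied by P
    log_or = math.log(odds_ratio)
    return log_or, abs(log_or) / z


# Hypothetical example: 600/400 risk/other alleles in cases, 500/500 in controls.
b, se = log_or_from_allele_counts(600, 400, 500, 500)
ci = (math.exp(b - Z95 * se), math.exp(b + Z95 * se))
p = 2 * (1 - NormalDist().cdf(b / se))
print(round(math.exp(b), 3), round(se, 4))  # OR and SE of the log OR
print(round(log_or_from_ci(1.5, *ci)[1], 4), round(log_or_from_p(1.5, p)[1], 4))
```

Round-tripping the hypothetical table (OR = 1.5) through its derived 95% CI or P value recovers the same standard error, which is a quick consistency check when an article reports only summary statistics.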
For family-based studies and studies with possible cryptic relatedness, when we calculated estimates ourselves, we also recorded the study's genomic inflation factor lambda and used it to adjust the standard error of the effect size (the standard error is multiplied by the square root of lambda). For associations with quantitative traits rather than binary phenotypes (e.g. lipid levels, body mass index, weight, height, and waist circumference), we performed the same analyses using the standardized mean difference (SMD) instead of the OR. The SMD expresses the effect as a multiple of the standard deviation of the trait in the population. The same lambda correction applied to SMD estimates. Summary odds ratios were calculated with both fixed-effects and random-effects models. Fixed-effects models assume that all studies estimate a common underlying genetic effect and that their results differ by chance alone. Random-effects models allow the studies to have genuine differences in their results and therefore also incorporate a between-study variance in their estimates. Random-effects models are generally more conservative; that is, they provide wider confidence intervals when there is between-study heterogeneity.

APPENDIX A. Eligible genome-wide association study publications.

1. Cho, Y.S. et al. A large-scale genome-wide association study of Asian populations uncovers genetic factors influencing eight quantitative traits. Nat Genet 41, 527-34 (2009).
2. Dehghan, A. et al. Association of three genetic loci with uric acid concentration and risk of gout: a genome-wide association study. Lancet 372, 1953-61 (2008).
3. Easton, D.F. et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 447, 1087-93 (2007).
4. Eeles, R.A. et al. Identification of seven new prostate cancer susceptibility loci through a genome-wide association study. Nat Genet 41, 1116-21 (2009).
5. Frayling, T.M. et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316, 889-94 (2007).
6. Graham, R.R. et al. Genetic variants near TNFAIP3 on 6q23 are associated with systemic lupus erythematosus. Nat Genet 40, 1059-61 (2008).
7. Gudbjartsson, D.F. et al. Variants conferring risk of atrial fibrillation on chromosome 4q25. Nature 448, 353-7 (2007).
8. Gudbjartsson, D.F. et al. Sequence variants affecting eosinophil numbers associate with asthma and myocardial infarction. Nat Genet 41, 342-7 (2009).
9. Gudbjartsson, D.F. et al. Many sequence variants affecting diversity of adult human height. Nat Genet 40, 609-15 (2008).
10. Gudmundsson, J. et al. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat Genet 39, 631-7 (2007).
11. Han, J.W. et al. Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. Nat Genet 41, 1234-7 (2009).
12. Harley, J.B. et al. Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants in ITGAM, PXK, KIAA1542 and other loci. Nat Genet 40, 204-10 (2008).
13. Hom, G. et al. Association of systemic lupus erythematosus with C8orf13-BLK and ITGAM-ITGAX. N Engl J Med 358, 900-9 (2008).
14. Ikram, M.A. et al. Genomewide association studies of stroke. N Engl J Med 360, 1718-28 (2009).
15. Kozyrev, S.V. et al. Functional variants in the B-cell gene BANK1 are associated with systemic lupus erythematosus. Nat Genet 40, 211-6 (2008).
16. Lei, S.F. et al. Genome-wide association scan for stature in Chinese: evidence for ethnic specific loci. Hum Genet 125, 1-9 (2009).
17. Lettre, G. et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet 40, 584-91 (2008).
18. Loos, R.J. et al. Common variants near MC4R are associated with fat mass, weight and risk of obesity. Nat Genet 40, 768-75 (2008).
19. Musone, S.L. et al. Multiple polymorphisms in the TNFAIP3 region are independently associated with systemic lupus erythematosus. Nat Genet 40, 1062-4 (2008).
20. O'Donovan, M.C. et al. Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat Genet 40, 1053-5 (2008).
21. Sanna, S. et al. Common variants in the GDF5-UQCC region are associated with variation in human height. Nat Genet 40, 198-203 (2008).
22. Satake, W. et al. Genome-wide association study identifies common variants at four loci as genetic risk factors for Parkinson's disease. Nat Genet 41, 1303-7 (2009).
23. Simon-Sanchez, J. et al. Genome-wide association study reveals genetic risk underlying Parkinson's disease. Nat Genet 41, 1308-12 (2009).
24. Sleiman, P.M. et al. Variants of DENND1B associated with asthma in children. N Engl J Med 362, 36-44 (2010).
25. Stacey, S.N. et al. Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet 39, 865-9 (2007).
26. Steinthorsdottir, V. et al. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat Genet 39, 770-5 (2007).
27. Takeuchi, F. et al. Confirmation of multiple risk Loci and genetic impacts by a genome-wide association study of type 2 diabetes in the Japanese population. Diabetes 58, 1690-9 (2009).
28. Tenesa, A. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat Genet 40, 631-7 (2008).
29. Unoki, H. et al. SNPs in KCNQ1 are associated with susceptibility to type 2 diabetes in East Asian and European populations. Nat Genet 40, 1098-102 (2008).
30. Weedon, M.N. et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 40, 575-83 (2008).
31. Yasuda, K. et al. Variants in KCNQ1 are associated with susceptibility to type 2 diabetes mellitus. Nat Genet 40, 1092-7 (2008).
32. Zeggini, E. et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 40, 638-45 (2008).
33. Zhou, X. et al. HLA-DPB1 and DPB2 are genetic loci for systemic sclerosis: a genome-wide association study in Koreans with replication in North Americans. Arthritis Rheum 60, 3807-14 (2009).