Senapati et al. Validation of CeD variants in Indian population

Evaluation of European coeliac disease risk variants in a north Indian population

Short title: Validation of CeD variants in Indian population

Sabyasachi Senapati1,*, Javier Gutierrez-Achury2,*, Ajit Sood3, Vandana Midha3, Agata Szperl2, Jihane Romanos2, Alexandra Zhernakova2, Lude Franke2, Santos Alonso4, Thelma BK1,#, Cisca Wijmenga2,#, Gosia Trynka2,5,#

1) Department of Genetics, University of Delhi South Campus, New Delhi, India
2) University of Groningen, University Medical Hospital Groningen, Department of Genetics, Groningen, The Netherlands
3) Dayanand Medical College and Hospital, Ludhiana, Punjab, India
4) Department of Genetics, Physical Anthropology and Animal Physiology, University of the Basque Country, Spain
5) Current address: Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA

* These authors contributed equally to the study
# These authors contributed equally to this study

Correspondence: Cisca Wijmenga PhD, Department of Genetics, University Medical Centre Groningen, PO Box 30001, 9700 RB Groningen, The Netherlands.
(phone: +31 50 361710; fax: +31 50 3617230; e-mail: c.wijmenga@umcg.nl)

Supplementary information: 1 figure – 3 tables

Abstract

Studies in European populations have contributed to a better understanding of the genetics of complex diseases. In coeliac disease, for example, studies of over 23,000 European samples have reported association to the HLA locus and another 39 loci. However, these associations have not been evaluated in detail in other ethnicities. We sought to better understand how disease-associated loci that have been mapped in Europeans translate to disease risk in a population with a different ethnic background.
We therefore performed a validation of European risk loci for coeliac disease in 497 cases and 736 controls of north Indian origin. Using a dense-genotyping platform (Immunochip), we confirmed the strong association to the HLA region (rs2854275, p=8.2x10^-49). Outside the HLA, three loci showed at least suggestive association: ANK3 on chromosome 10 (rs4948256, p=9.3x10^-7), OSBPL5 on chromosome 11 (rs4758538, p=8.6x10^-5) and DOK6 on chromosome 18 (rs17080877, p=2.7x10^-5). We directly replicated five previously reported European variants (p<0.05; mapping to loci harbouring FASLG/TNFSF18, SCHIP1/IL12A, PFKFB3/PRKCQ, ZMIZ1 and ICOSLG). Using a transferability test we further confirmed association at PFKFB3/PRKCQ (rs744254, p=2.8x10^-4) and PTPRK/THEMIS (rs4142030, p=3.4x10^-4). The north Indian population has a higher degree of consanguinity than Europeans and we therefore explored the role of recessively acting variants, which replicated the HLA locus (rs9271850, p=3.7x10^-23) and suggested a role for four additional loci. To our knowledge, this is the first replication study of coeliac disease variants in a non-European population.

Keywords: Trans-ethnic replication; Coeliac disease; Genetic associations

Introduction

Hundreds of common risk variants have been established for immune-mediated diseases, mainly from genome-wide association studies (GWAS) in populations of European ancestry [1]. Although a large proportion of the European association signals could be generalized to other ethnic populations, studies using multi-ethnic cohorts were able to identify additional risk variants [2-5].

Coeliac disease (CeD) is a common autoimmune disorder, with a similar prevalence among Europeans and north Indians (1-3%) [6]. The largest genetic risk to CeD is conferred by the HLA haplotypes, which account for approximately 35-40% of the risk [7]. Recent progress in understanding the genetic architecture of CeD has identified 39 non-HLA loci [8-11]. Although these analyses provided much insight into genetic risk factors for CeD, they were largely based on European populations, so the contribution of these associations to risk in non-European populations still needed to be assessed.

In this first major study on the genetic factors in CeD in a non-European population, we explored the contribution of reported risk variants and searched for new associated variants using a cohort of 1,233 north Indian individuals. We acknowledge that about half of this cohort contributed to the previously published Immunochip study. However, due to the large size of the European cohorts, it is likely that any effect derived from the north Indian population would have been overwhelmed by the European effects (620 north Indian samples, which is only 2.5% of the total 24,269 samples). In the current study, we have doubled the sample size of the north Indian cohort and analysed the transferability of known European variants to this population. We also took advantage of the higher degree of consanguinity in the north Indian population [12] and assessed the excess homozygosity to explore the possibility of recessively acting risk variants.

Materials and Methods

Ethical Statement

This study was approved by the respective institutional and university ethical committees. Informed written consent was received from all participants.

Study Populations

We used two ethnically distinct cohorts from northern India and the Netherlands. The Indian cases (n = 497) and controls (n = 736) were recruited from Dayanand Medical College and Hospital, Ludhiana, in the Punjab (northern India). The Indian controls included blood donors who tested negative for CeD serology. All the Indian subjects had self-reported northern Indian ethnicity.
The Dutch cases (n = 1150) and controls (n = 1173) were recruited from three university medical centres (Utrecht, Leiden, and VUmc, Amsterdam) in the Netherlands. A small proportion of cases were recruited via the Dutch CeD patients' society. In both cohorts, CeD patients were diagnosed according to standard clinical, serological and histopathological criteria, following the ESPGHAN criteria [13]. Most of these samples have been described elsewhere [11].

DNA Extraction and Genotyping

Most DNA samples were derived from blood; for a small proportion of the Dutch cases and controls, DNA was derived from saliva. Samples were hybridized on the Immunochip platform, a custom-made chip with 196,524 markers [11]. Genotyping was carried out according to Illumina's protocol at the genotyping facility of the University Medical Centre Groningen. The variant calls were exported from Genome Studio with human genome NCBI36/hg18 coordinates and then converted into GRCh37/hg19 coordinates using the UCSC liftOver tool [14].

Genotype Data Quality Control

We manually inspected cluster plots for about 150,000 markers and adjusted them if required. For about 20,000 SNPs the assay performed poorly; those variants were marked as uncalled and excluded from the final analysis. This left 178,111 high quality SNPs that were exported from Illumina's Genome Studio. We required samples to have a call rate ≥98%. Individuals showing a high degree of genetic relatedness (estimated kinship coefficient >0.2) or with a discordant sex were removed. Population outliers were identified by principal component analysis (PCA) implemented in EIGENSOFT 5.0.1 [15,16]. We excluded markers with a call rate lower than 99% or with significant deviation from Hardy-Weinberg equilibrium (p < 1x10^-3) and were left with 176,799 variants.
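The marker-level filters described above can be illustrated with a short sketch. It is not the pipeline used in this study: the function names are hypothetical, the Hardy-Weinberg test is a 1-degree-of-freedom chi-square approximation chosen for brevity (the specific HWE test used is not stated here), and a marker is treated as deviating from HWE when its p-value falls below 1x10^-3.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test for Hardy-Weinberg equilibrium.

    Illustrative approximation only; exact tests are usually preferred
    for low genotype counts.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: HWE cannot be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_marker_qc(n_aa, n_ab, n_bb, n_missing,
                     min_call_rate=0.99, hwe_alpha=1e-3):
    """Keep a marker if call rate >= 99% and HWE p-value >= 1e-3."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_total == 0:
        return False
    call_rate = n_called / n_total
    return (call_rate >= min_call_rate
            and hwe_chisq_pvalue(n_aa, n_ab, n_bb) >= hwe_alpha)
```

For example, a marker with genotype counts 100/200/100 and no missing calls passes both filters, whereas a marker with counts 200/0/200 fails the HWE filter because heterozygotes are entirely absent.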
A large number of Immunochip variants were of low frequency, and our study was underpowered to assess their association in the north Indian cohort; we therefore filtered out SNPs with a minor allele frequency (MAF) <10%. This excluded 88,083 and 87,442 variants in the north Indian and Dutch cohorts, respectively. We then used only the set of variants shared across the two cohorts, resulting in 88,716 SNPs in the final analysis. Genomic inflation for each cohort was estimated using 3,016 'neutral' variants that are present on the Immunochip array for the replication of a GWAS for reading and writing skills and are therefore unlikely to be confounded by immune signals.

Statistical Analysis

Chip-wide analysis: For both populations we applied additive and recessive models using logistic regression, including the first four principal components and gender as covariates. To determine a significant association we used an Immunochip-wide cut-off of p = 3.5x10^-6, calculated as 0.05 divided by 14,136 LD-independent SNPs (r^2 < 0.1 computed across 100 SNPs, with a 10-SNP sliding window) with MAF > 10% in the north Indian cohort. We used p < 1x10^-4 to report a suggestive association. To reassess the significance of the associations detected in this test we also performed a minimum of 100,000 permutations to compute empirical p-values.

Replication: We used two approaches to test for replication, in the north Indian cohort, of the 39 non-HLA CeD loci that have been reported in Europeans [11]: the exact SNP replication, and the locus transferability.

Exact SNP replication: The reported index SNPs (representing primary as well as secondary, independent signals) were tested for association in the north Indian cohort. We considered replication to be significant at p < 0.05.
During the quality control, we removed 13 of the total of 57 independent SNPs associated to the 39 loci because of MAF < 10%. The exact SNP replication was therefore applied to 44 reported SNPs, representing 38 distinct loci.

Locus transferability test: We performed a transferability test to account for linkage disequilibrium (LD) differences across different ethnic populations. If the causal SNP is poorly tagged by the associated SNP reported in the discovery population (European), the differences in LD between the north Indian and Dutch populations could result in either a lack of replication or a different localization of the association signal. Published LD boundaries were used to define each of the 39 non-HLA loci [11]. First, we identified all the LD-correlated markers for each locus, measured by r^2 ≥ 0.05 between the index SNP and all the variants present in CeD loci using the Dutch controls. We then checked the association status of these variants in the north Indian Immunochip dataset for transferability. SNP rs13132308 from the IL2/IL21 locus was not present in the north Indian dataset and was replaced by its best proxy SNP rs13151961 (r^2 > 0.9 based on the Dutch controls). The ELMO1 gene and SCHIP1/IL12A locus were excluded because the index variants for these loci, or their proxies, failed the 10% MAF filter. We required at least one significant variant per locus. The significance threshold was assessed in two ways. Firstly, across all tested loci (37) we identified 142 LD-independent SNPs (pairwise r^2 < 0.1) and used p = 0.05/142 = 3.52x10^-4 as the significance threshold. Secondly, we employed a permutation-based test as an alternative way to account for LD structure. We performed 1,000 permutations for which we randomized phenotype labels.
We identified a nominal p-value for each of the 37 loci in each of the permutation results. We then defined the 5th percentile of the nominal p-values as a significance threshold.

Runs of homozygosity (ROH): The north Indian population has a reported higher degree of consanguinity [12,17-19]; we therefore sought to identify whether there were significantly associated runs of homozygous variants. We first pruned both datasets for LD using r^2 ≥ 0.8 as a threshold to avoid detecting false-positive regions due to homozygous stretches of LD-correlated SNPs [20-22]. This is particularly likely to occur for the Immunochip, as it includes regions with a high density of markers that are often in strong LD with each other. We then identified regions with ROH, which were defined as at least 50 contiguous homozygous SNPs, uninterrupted by heterozygous positions, and with no more than 500 kb distance between contiguous SNPs. Association of ROH regions was tested using PLINK v1.07 [23]. We considered a locus significant at p < 0.006, a Bonferroni correction for the total number of ROH found per dataset (8) in our analysis. We assessed the false discovery rate (FDR) by running 10,000 permutations in which we randomly shuffled the phenotypes and reassessed the significance of association.

Data availability

Genotype data for the north Indian cohort are available through the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/) under the accession number EGAS00001000849. The summary statistics can be obtained by contacting the authors directly.

Results

We obtained high quality genotype data for 88,716 common SNPs that were shared between the north Indian and Dutch cohorts. PCA revealed different clustering patterns between the two cohorts, consistent with their different ethnicities.
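Returning briefly to the run-of-homozygosity definition given in the Materials and Methods (at least 50 contiguous homozygous SNPs, no heterozygous position in between, and no more than 500 kb between contiguous SNPs), the scan for a single individual on a single chromosome can be sketched as follows. This is an illustration only and not the PLINK implementation used in the study: the function name and input layout are hypothetical, genotypes are assumed to be coded as alternate-allele counts (0, 1 or 2) after LD pruning, and missing genotypes are assumed to be absent.

```python
def find_roh(genotypes, positions, min_snps=50, max_gap_bp=500_000):
    """Return (start_bp, end_bp, n_snps) for each run of homozygosity.

    genotypes: allele counts (0, 1 or 2) ordered by position.
    positions: base-pair positions, same order and length as genotypes.
    """
    runs = []
    start = None  # index of the first SNP in the current run, if any
    for i, g in enumerate(genotypes):
        homozygous = g in (0, 2)
        gap_ok = (start is not None
                  and positions[i] - positions[i - 1] <= max_gap_bp)
        if homozygous and gap_ok:
            continue  # the current run simply extends to SNP i
        # The run (if any) ends at SNP i - 1: a heterozygous call or a
        # gap larger than max_gap_bp interrupts it.
        if start is not None and i - start >= min_snps:
            runs.append((positions[start], positions[i - 1], i - start))
        start = i if homozygous else None
    # Close a run that reaches the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1],
                     len(genotypes) - start))
    return runs
```

With 60 homozygous SNPs followed by one heterozygous SNP and then 30 homozygous SNPs, only the first stretch is reported, because the second is shorter than 50 SNPs. Testing the detected regions for association with disease status is a separate step.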
Furthermore, the case-control samples were homogeneous within each of the datasets (Supplementary Figure 1). We used the first four principal components to control for possible population substructure and did not observe significant genomic inflation in any of the datasets (λ_India = 1.036, λ_Netherlands = 1.039).

The strongest association in the north Indian cohort was present at the HLA locus. As in the Dutch sample, the SNP rs2854275 was the top signal (p_India = 8.17x10^-49, OR_India = 6.97 [5.38-9.04]; p_Dutch = 4.413x10^-121, OR_Dutch = 10.08 [8.3-12.23]). This replicated the known, very strong effect of HLA in coeliac disease. No other locus reached genome-wide significance (Figure 1 and Table 3). SNP rs4948256 at 10q21.2 in the ANK3 gene reached our Immunochip-wide significance threshold (p = 9.28x10^-7, p_permuted = 1x10^-6). SNPs rs4758538 at 11p15.4 in the OSBPL5 gene (p = 8.57x10^-5, p_permuted = 7.93x10^-5) and rs17080877 at 18q22.2 near the DOK6 gene (p = 2.68x10^-5, p_permuted = 1.7x10^-5) were both suggestively associated. None of these associations could be replicated in the Dutch cohort.

Replication of European Associations in the North Indian Cohort Using Exact and Transferability Tests

Exact SNP replication in the north Indian cohort was observed at five previously reported loci, corresponding to FASLG/TNFSF18 (rs12142280, p = 0.002, OR = 0.69 [0.54-0.87]), ICOSLG (rs58911644, p = 0.004, OR = 0.68 [0.51-0.88]), PFKFB3/PRKCQ (rs2387397, p = 0.02, OR = 0.77 [0.61-0.96]), SCHIP1/IL12A (rs2561288, p = 0.03, OR = 1.2 [1.01-1.44]) and ZMIZ1 (rs1250552, p = 0.03, OR = 1.2 [1.01-1.44]) (Table 1). At all these variants, the directions of association were the same as those previously reported (Supplementary Table 1).
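As a rough consistency check on reported values of this kind, a two-sided p-value can be recovered from an odds ratio and its confidence interval. The sketch below rests on assumptions that are not stated in the text: that the bracketed intervals are 95% Wald intervals, symmetric on the log-odds scale, and that the p-values come from the corresponding Wald test. The function name is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate two-sided p-value implied by an OR and its 95% CI."""
    # Standard error of log(OR), from the width of the CI on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))
```

For rs12142280 (OR = 0.69 [0.54-0.87]) this gives a p-value of roughly 0.002, in line with the reported p = 0.002. Because the published odds ratios and interval bounds are rounded, small discrepancies for other SNPs are to be expected.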
Three of these five loci, FASLG/TNFSF18, ZMIZ1 and ICOSLG, previously reached p < 0.05 in the north Indian samples employed in the initial study by Trynka et al. [11] (Supplementary Table 1).

Using the transferability analysis of the 37 tested loci, we observed two associations that passed two independent significance thresholds (Table 2). One was at the PFKFB3/PRKCQ locus, represented by the SNP rs744254 (p = 2.8x10^-4, OR = 0.70 [0.57-0.84]); the other was at the PTPRK/THEMIS locus, rs4142030 (p = 3.4x10^-4, OR = 1.39 [1.16-1.66]) (Supplementary Table 2).

Recessive Analysis in the North Indian Population

The ROH analysis identified 16 genomic regions with more than 50 consecutive homozygous SNPs each; two regions showed suggestive association with CeD in the north Indian cohort. They were located on chromosomes 2q11.2 (p = 4.1x10^-3, FDR = 0.11) and 6p21.33 (p = 1.4x10^-4, FDR = 0.0001) (Supplementary Table 3). Neither of these regions replicated in the Dutch cohort.

Complementary to the ROH analysis, we also applied a recessive model to test for association across all 88,716 SNPs. Apart from the SNP rs9271850 (p = 3.67x10^-23, OR = 0.12 [0.08-0.18]) located in the HLA locus, we observed four variants (MAF > 0.25) at four loci that reached the threshold of suggestive association (p < 1x10^-4) (Figure 1 and Table 3). SNP rs744253 at 10p15.1 (p = 9.27x10^-5, OR = 0.3859; p_permuted = 6.7x10^-5) corresponds to the PFKFB3/PRKCQ locus that was replicated in the direct association and transferability tests described above.
The remaining three loci were new suggestive associations, with the strongest observed at the intergenic SNP rs963872 (p = 1.2x10^-5, OR = 2.01 [1.47-2.75]; p_permuted = 1.2x10^-5) at 5p15.33, near the IRX2 gene, followed by rs10994257 (p = 1.8x10^-5, OR = 2.5 [1.67-3.99]; p_permuted = 5x10^-6) at 10q21.2, mapping to the ANK3 gene, and rs6010998 (p = 6.17x10^-5, OR = 2.8 [1.67-4.66]; p_permuted = 2.5x10^-5) at 20q13.33 in the RTEL1 gene.

Discussion

Recent trans-ethnic studies have indicated that a large proportion of common disease- or trait-associated variants established in populations of European descent also contribute to the risk in other ethnic groups [4,24,25]. Yet this observation cannot be generalized because only a few GWAS have been conducted in non-European populations [26]. Novel loci have been reported in these limited studies [27,28]. Thus, more multi-ethnic studies may help identify not only shared disease determinants, but also additional risk variants, which may provide leads towards a better understanding of the disease biology [29].

To our knowledge there have been very few genetic association studies for coeliac disease performed in non-European populations, and those that have been published have focused only on the HLA effects and were based on very small sample sizes [30-36]. In this study we attempted to validate known genetic determinants of CeD and identify potentially novel loci in the north Indian population. We used the Immunochip genotyping platform for this study because it offers the highest coverage of regions associated to immune-related disorders, including those conferring risk to coeliac disease.

Given our modest sample size and the limited power of our study, it is likely that the real replication rate of the European associations in the north Indian population is much greater than we can report here.
In this context, it should be mentioned that although half of the samples of the north Indian cohort were previously analysed in the study by Trynka et al. [11], that study focused strongly on the effects in samples of European ancestry (97% of the 24,269 samples were of European descent). In the current study we wanted to focus specifically on the role of the European variants in the north Indian population and to explore potential new associations that could have been diluted in the previous study due to the dominance of the European cohorts. Our study suffers from a small sample size and lack of replication; for these reasons we acknowledge that the results presented here need to be validated further in future replication studies.

We successfully replicated HLA and five of the 38 tested European loci (13%); these included the loci harbouring the FASLG/TNFSF18, SCHIP1/IL12A, PFKFB3/PRKCQ, ZMIZ1, and ICOSLG genes. The transferability test further confirmed association to the PFKFB3/PRKCQ locus and identified an additional association to a variant at the PTPRK/THEMIS locus with suggestive evidence of association. The top association at this locus, rs4142030, has a very low correlation with the European index variants (r^2 = 0.083, D' = 0.53) and is located in intron 2 of THEMIS, in contrast to the European-associated SNP, which is located in intron 30 of PTPRK. Given that our study was underpowered, we acknowledge that these suggestive associations must be further replicated in north Indian populations. However, it is tempting to speculate that this localization of the association signal could result from the independent effects of each of the genes in the different ethnicities.
Both genes are of functional relevance to CeD aetiology, and a recent study showed that both genes have higher mRNA expression levels in CeD patients than in controls [37].

The analysis of the recessive effects implicated multiple genomic regions at a suggestive significance. With the run of homozygosity test, we found a suggestive association to a region at 2q11.2 that spans 1.14 Mb and includes the LYG1, TXNDC9, EIF5B, REV1, AFF3 and LONRF2 genes. This region has been associated with multiple immune-related phenotypes, including type 1 diabetes (T1D) and rheumatoid arthritis (RA) [38-40]. The second region maps to the HLA locus at 6p21 and contains the MICA gene, which is also a locus with a pleiotropic effect in phenotypes such as Behcet's disease, RA, or HCV-induced hepatocellular carcinoma [41-47]. The Immunochip-wide analysis under a recessive model suggested four loci that have also been linked to other phenotypes, e.g. ANK3, which has been associated with psychiatric disorders [48,49], and other loci that include PFKFB3/PRKCQ and RTEL1/TNFRSF6B, reported to be associated with immune-related diseases such as T1D, RA, and inflammatory bowel disease [39,50-63]. Apart from HLA, the ANK3 locus was the only one that passed the Immunochip-wide significance threshold when we considered the additive model. None of these loci were replicated in the Dutch cohort, which could reflect a population-specific effect, but they require further replication in the north Indian population to robustly establish their association with a genome-wide significance.
Although the genes mapping to the associated loci that we report here appear to be relevant candidates for coeliac disease pathogenesis, our study was biased towards identifying biologically relevant genes because the Immunochip was designed to specifically target regions previously reported as associated to immune-related diseases.

Acknowledgements

We thank Jackie Senior for carefully reading the manuscript. We received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement no. 2012-322698 to CW. GT is supported by a Rubicon grant from The Netherlands Organization for Scientific Research (NWO). BKT received funding from a JC Bose fellowship, Department of Science and Technology, Government of India. SS is supported by a Senior Research Fellowship from the Council for Scientific and Industrial Research (CSIR), New Delhi, India. AZ is supported by a grant from the Dutch Reumafonds (11-1-101) and a Rosalind Franklin Fellowship from the University of Groningen, the Netherlands.

Conflict of interest

The authors declare they have no conflicts of interest.

References

1. Welter D, Macarthur J, Morales J et al: The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic acids research 2014; 42: D1001-1006.
2. Ng MC, Saxena R, Li J et al: Transferability and fine mapping of type 2 diabetes loci in African Americans: the Candidate Gene Association Resource Plus Study. Diabetes 2013; 62: 965-976.
3. Liu CT, Ng MC, Rybin D et al: Transferability and fine-mapping of glucose and insulin quantitative trait loci across populations: CARe, the Candidate Gene Association Resource. Diabetologia 2012; 55: 2970-2984.
4. Teslovich TM, Musunuru K, Smith AV et al: Biological, clinical and population relevance of 95 loci for blood lipids. Nature 2010; 466: 707-713.
5. Bustamante CD, Burchard EG, De la Vega FM: Genomics for the world. Nature 2011; 475: 163-165.
6. Makharia GK, Verma AK, Amarchand R et al: Prevalence of celiac disease in the northern part of India: a community based study. Journal of gastroenterology and hepatology 2011; 26: 894-900.
7. Gutierrez-Achury J, Coutinho de Almeida R, Wijmenga C: Shared genetics in coeliac disease and other immune-mediated diseases. Journal of internal medicine 2011; 269: 591-603.
8. van Heel DA, Franke L, Hunt KA et al: A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21. Nature genetics 2007; 39: 827-829.
9. Hunt KA, Zhernakova A, Turner G et al: Newly identified genetic risk variants for celiac disease related to the immune response. Nature genetics 2008; 40: 395-402.
10. Dubois PC, Trynka G, Franke L et al: Multiple common variants for celiac disease influencing immune gene expression. Nature genetics 2010; 42: 295-302.
11. Trynka G, Hunt KA, Bockett NA et al: Dense genotyping identifies and localizes multiple common and rare variant association signals in celiac disease. Nature genetics 2011; 43: 1193-1201.
12. Reich D, Thangaraj K, Patterson N, Price AL, Singh L: Reconstructing Indian population history. Nature 2009; 461: 489-494.
13. Husby S, Koletzko S, Korponay-Szabo IR et al: European Society for Pediatric Gastroenterology, Hepatology, and Nutrition guidelines for the diagnosis of coeliac disease. Journal of pediatric gastroenterology and nutrition 2012; 54: 136-160.
14. Hinrichs AS, Karolchik D, Baertsch R et al: The UCSC Genome Browser Database: update 2006. Nucleic acids research 2006; 34: D590-598.
15. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics 2006; 38: 904-909.
16. Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS genetics 2006; 2: e190.
17. Narang A, Jha P, Rawat V et al: Recent admixture in an Indian population of African ancestry. American journal of human genetics 2011; 89: 111-120.
18. Metspalu M, Romero IG, Yunusbayev B et al: Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia. American journal of human genetics 2011; 89: 731-744.
19. Moorjani P, Thangaraj K, Patterson N et al: Genetic evidence for recent population mixture in India. American journal of human genetics 2013.
20. Lencz T, Lambert C, DeRosse P et al: Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proceedings of the National Academy of Sciences of the United States of America 2007; 104: 19942-19947.
21. McQuillan R, Leutenegger AL, Abdel-Rahman R et al: Runs of homozygosity in European populations. American journal of human genetics 2008; 83: 359-372.
22. Keller MC, Simonson MA, Ripke S et al: Runs of homozygosity implicate autozygosity as a schizophrenia risk factor. PLoS genetics 2012; 8: e1002656.
23. Purcell S, Neale B, Todd-Brown K et al: PLINK: a tool set for whole-genome association and population-based linkage analyses. American journal of human genetics 2007; 81: 559-575.
24. Waters KM, Stram DO, Hassanein MT et al: Consistent association of type 2 diabetes risk variants found in europeans in diverse racial and ethnic groups. PLoS genetics 2010; 6.
25. Kurreeman F, Liao K, Chibnik L et al: Genetic basis of autoantibody positive and negative rheumatoid arthritis risk in a multi-ethnic cohort derived from electronic health records. American journal of human genetics 2011; 88: 57-69.
26. Need AC, Goldstein DB: Next generation disparities in human genomics: concerns and remedies. Trends in genetics : TIG 2009; 25: 489-494.
27. Negi S, Juyal G, Senapati S et al: A genome-wide association study reveals ARL15, a novel non-HLA susceptibility gene for Rheumatoid arthritis in north Indians. Arthritis and rheumatism 2013.
28. Tabassum R, Chauhan G, Dwivedi OP et al: Genome-wide association study for type 2 diabetes in Indians identifies a new susceptibility locus at 2q21. Diabetes 2013; 62: 977-986.
29. Okada Y, Wu D, Trynka G et al: Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 2014; 506: 376-381.
30. Brar P, Lee AR, Lewis SK, Bhagat G, Green PH: Celiac disease in African-Americans. Digestive diseases and sciences 2006; 51: 1012-1015.
31. Almeida RC, Gandolfi L, De Nazare Klautau-Guimaraes M et al: Does celiac disease occur in Afro-derived Brazilian populations? American journal of human biology : the official journal of the Human Biology Council 2012; 24: 710-712.
32. Remes-Troche JM, Ramirez-Iglesias MT, Rubio-Tapia A, Alonso-Ramos A, Velazquez A, Uscanga LF: Celiac disease could be a frequent disease in Mexico: prevalence of tissue transglutaminase antibody in healthy blood donors. Journal of clinical gastroenterology 2006; 40: 697-700.
33. Perez-Bravo F, Araya M, Mondragon A et al: Genetic differences in HLA-DQA1* and DQB1* allelic distributions between celiac and control children in Santiago, Chile. Human immunology 1999; 60: 262-267.
34. Akbari MR, Mohammadkhani A, Fakheri H et al: Screening of the adult population in Iran for coeliac disease: comparison of the tissue-transglutaminase antibody and anti-endomysial antibody tests. European journal of gastroenterology & hepatology 2006; 18: 1181-1186.
35. Srivastava A, Yachha SK, Mathias A, Parveen F, Poddar U, Agrawal S: Prevalence, human leukocyte antigen typing and strategy for screening among Asian first-degree relatives of children with celiac disease. Journal of gastroenterology and hepatology 2010; 25: 319-324.
36. El-Akawi ZJ, Al-Hattab DM, Migdady MA: Frequency of HLA-DQA1*0501 and DQB1*0201 alleles in patients with coeliac disease, their first-degree relatives and controls in Jordan. Annals of tropical paediatrics 2010; 30: 305-309.
37. Bondar C, Plaza-Izurieta L, Fernandez-Jimenez N et al: THEMIS and PTPRK in celiac intestinal mucosa: coexpression in disease and after in vitro gliadin challenge. European journal of human genetics : EJHG 2013.
38. Sandholm N, Salem RM, McKnight AJ et al: New susceptibility loci associated with kidney disease in type 1 diabetes. PLoS genetics 2012; 8: e1002921.
39. Stahl EA, Raychaudhuri S, Remmers EF et al: Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nature genetics 2010; 42: 508-514.
40. Todd JA, Walker NM, Cooper JD et al: Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nature genetics 2007; 39: 857-864.
41. Bratanic N, Smigoc Schweiger D, Mendez A, Bratina N, Battelino T, Vidan-Jeras B: An influence of HLA-A, B, DR, DQ, and MICA on the occurrence of Celiac disease in patients with type 1 diabetes. Tissue antigens 2010; 76: 208-215.
42. Tinto N, Ciacci C, Calcagno G et al: Increased prevalence of celiac disease without gastrointestinal symptoms in adults MICA 5.1 homozygous subjects from the Campania area. Digestive and liver disease: official journal of the Italian Society of Gastroenterology and the Italian Association for the Study of the Liver 2008; 40: 248-252.
43. Martin-Pagola A, Ortiz L, Perez de Nanclares G, Vitoria JC, Castano L, Bilbao JR: Analysis of the expression of MICA in small intestinal mucosa of patients with celiac disease. Journal of clinical immunology 2003; 23: 498-503.
44. Fernandez L, Fernandez-Arquero M, Gual L et al: Triplet repeat polymorphism in the transmembrane region of the MICA gene in celiac disease. Tissue antigens 2002; 59: 219-222.
45. Kumar V, Kato N, Urabe Y et al: Genome-wide association study identifies a susceptibility locus for HCV-induced hepatocellular carcinoma. Nature genetics 2011; 43: 455-458.
46. Hou S, Yang Z, Du L et al: Identification of a susceptibility locus in STAT4 for Behcet's disease in Han Chinese in a genome-wide association study. Arthritis and rheumatism 2012; 64: 4104-4113.
47. Eleftherohorinou H, Hoggart CJ, Wright VJ, Levin M, Coin LJ: Pathway-driven gene stability selection of two rheumatoid arthritis GWAS identifies and validates new susceptibility genes in receptor mediated signalling pathways. Human molecular genetics 2011; 20: 3494-3506.
48. Chen SJ, Chao YL, Chen CY et al: Prevalence of autoimmune diseases in in-patients with schizophrenia: nationwide population-based study. The British journal of psychiatry: the journal of mental science 2012; 200: 374-380.
49. Ludvigsson JF, Osby U, Ekbom A, Montgomery SM: Coeliac disease and risk of schizophrenia and other psychosis: a general population cohort study. Scandinavian journal of gastroenterology 2007; 42: 179-185.
50. Anderson CA, Boucher G, Lees CW et al: Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nature genetics 2011; 43: 246-252.
51. Athanasiu L, Mattingsdal M, Kahler AK et al: Gene variants associated with schizophrenia in a Norwegian genome-wide study are replicated in a large European cohort. Journal of psychiatric research 2010; 44: 748-753.
52. Barrett JC, Clayton DG, Concannon P et al: Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature genetics 2009; 41: 703-707.
53. Chen DT, Jiang X, Akula N et al: Genome-wide association study meta-analysis of European and Asian-ancestry samples identifies three novel loci associated with bipolar disorder. Molecular psychiatry 2013; 18: 195-205.
54. Cooper JD, Smyth DJ, Smiles AM et al: Meta-analysis of genome-wide association study data identifies additional type 1 diabetes risk loci. Nature genetics 2008; 40: 1399-1401.
55. Ferreira MA, O'Donovan MC, Meng YA et al: Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar disorder. Nature genetics 2008; 40: 1056-1058.
56. Franke A, McGovern DP, Barrett JC et al: Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nature genetics 2010; 42: 1118-1125.
57.
Hsu YH, Zillikens MC, Wilson SG et al: An integration of genome-wide 17 association study and gene expression profiling to prioritize the discovery of 18 novel susceptibility Loci for osteoporosis-related traits. PLoS genetics 2010; 6: 19 e1000977. 20 58. Liu Y, Blackwood DH, Caesar S et al: Meta-analysis of genome-wide 21 association data of bipolar disorder and major depressive disorder. Molecular 22 psychiatry 2011; 16: 2-4. 23 24 59. Rajaraman P, Melin BS, Wang Z et al: Genome-wide association study of glioma and meta-analysis. Human genetics 2012; 131: 1877-1888. Page 21 of 24 Senapati et al. 1 60. Validation of CeD variants in Indian population Raychaudhuri S, Remmers EF, Lee AT et al: Common variants at CD40 and 2 other loci confer risk of rheumatoid arthritis. Nature genetics 2008; 40: 1216- 3 1223. 4 61. 5 6 influences glioma risk. Human molecular genetics 2011; 20: 2897-2904. 62. 7 8 9 10 Sanson M, Hosking FJ, Shete S et al: Chromosome 7p11.2 (EGFR) variation Shete S, Hosking FJ, Robertson LB et al: Genome-wide association study identifies five susceptibility loci for glioma. Nature genetics 2009; 41: 899-904. 63. Sklar P, Ripke S, Scott LJ et al: Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nature genetics 2011; 43: 977-983. 11 12 Page 22 of 24 Senapati et al. Validation of CeD variants in Indian population 1 Titles and legends to figures 2 Figure 1. Additive and recessive association results in the north Indian population. 3 Manhattan plot of association under A) an additive and B) a recessive model for the 4 north Indian (blue dots) and Dutch populations (black dots). The Immunochip-wide 5 significance cut-off (p = 3.5x10-6) is depicted as a green dotted line and suggestive 6 significance threshold as a red dotted line (p = 1.0x10-4). We excluded the HLA 7 region. 8 Supplementary Figure 1. Principal component analysis in the north Indian and 9 Dutch populations. 
PCA showed that cases (red crosses) and controls (red rhombs) were homogeneous in both datasets and were in line with their respective reference populations.

Tables

Table 1. Five loci that were significantly (p < 0.05) replicated in the north Indian cohort in the exact SNP test.

Table 2. Two loci significantly replicated in the north Indian cohort in the transferability test.

Table 3. Association results for Immunochip-wide additive and recessive models in the north Indian population compared to the same model applied to the Dutch cohort. We used an Immunochip-wide p-value cut-off of p = 3.5 × 10⁻⁶ to report a significant association and p = 1 × 10⁻⁴ for suggestive association.