Online Repository

Genetic association of key Th1/Th2 pathway candidate genes, IRF2, IL6, IFNGR2, STAT4 and IL4RA, with atopic asthma in the Indian population.

Amrendra Kumar¹, Sudipta Das¹, Anurag Agarwal², Indranil Mukhopadhyay³ and Balaram Ghosh¹,²

¹ Molecular Immunogenetics Laboratory, ² Centre of Excellence for Translational Research in Asthma and Lung Disease, Institute of Genomics and Integrative Biology, Delhi-110007
³ Human Genetics Unit, Indian Statistical Institute, Kolkata, India

Address of Correspondence
Dr. Balaram Ghosh, Ph.D.
Molecular Immunogenetics Laboratory,
Institute of Genomics and Integrative Biology
Mall Road, Delhi-110007
Phone No.: 91-11-27662580
Fax No.: 91-11-27667471, 91-11-27416489
E-mail ID: bghosh@igib.res.in

Supplementary methods

Gene and SNP selection

Ideally we would have preferred to include all genes reported to modulate Th1/Th2/Th17 differentiation, development and/or functions. However, limited in our choice by the available resources, we selected a total of 33 genes focusing on the Th1/Th2 pathway (Supplementary Table B), with emphasis on the important mediators of the IL-4 (IL-4RA, STAT6), IFNG (IFNGR1, IFNGR2, STAT1) and IL-12 (IL-12A, IL-12B, IL-12RB1, IL-12RB2, STAT4) signaling pathways, or genes modulating the expression and function of these pathways (IRF1, IRF2, ATF2, TBET etc.; Supplementary Table B). Genetic studies of the IFNG [1] and IL-4 [2] genes have been reported previously from this lab and are not a part of the current study. A detailed description of the mechanisms through which these genes modulate Th1/Th2 pathways is outside the scope of this article, and readers are referred to the appropriate articles/papers (Supplementary Table B). We have also included four genes (INPP4A, HSPH1, ITLN1 and RPS6KB2) that were found to be differentially expressed in microarray datasets in a meta-analysis from our laboratory [3], for several reasons. We have already reported the identification of INPP4A as a novel asthma candidate gene, which has been replicated in another population/study [3, 4]. Also, in preliminary studies using microsatellite markers in the other three genes, a suggestive association with asthma was found. This evidence, and the fact that their gene products have been shown to be involved in the regulation of immune homeostasis mechanisms by modulating cell cycle, apoptosis, etc., motivated a detailed genetic association analysis of these genes with dense marker selection. It has also been demonstrated that T helper cell differentiation is controlled by the cell cycle [5]. Finally, inclusion of these genes and validation of our preliminary report had the potential to yield novel findings.

Subjects/individuals in this study belong to the Indo-European caste groups, referred to as IE-LPs in IGVDB (Indian Genome Variation Database) [6], that have been shown to have the closest genetic affinity to CEU (Utah residents with Northern and Western European ancestry) among the HapMap populations. The following criteria were adopted for selecting SNPs: (1) we selected CEU as the reference population; (2) we selected tag SNPs (minor allele frequency ≥ 5% and r² threshold set at 0.8); (3) since HapMap CEU is not exactly similar to our population, we also ensured that the selected (HapMap validated) SNPs covered the entire gene (distance between adjacent SNPs not more than ~4 to 5 kb); (4) efforts were made to include previously reported SNPs in these genes (if not present among the tag SNPs), provided these were amenable to the assay (based on SNP score; discussed below).
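As an illustration only (not the authors' selection pipeline), criteria (2) and (3) above can be checked with a minimal Python sketch. The Snp record, the identifiers snpA to snpD, and the positions and frequencies are hypothetical values chosen for the example; the 5 kb limit is taken from the upper end of the "~4 to 5 kb" spacing stated above. The sketch covers only the frequency filter and the spacing check; r²-based tagging is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str       # hypothetical identifier, for illustration only
    position: int   # base-pair coordinate within the gene region
    maf: float      # minor allele frequency in the reference panel


def common_snps(snps, min_maf=0.05):
    """Keep SNPs whose minor allele frequency is at least min_maf."""
    return [s for s in snps if s.maf >= min_maf]


def coverage_gaps(snps, max_gap_bp=5000):
    """Return adjacent SNP pairs that lie more than max_gap_bp apart."""
    ordered = sorted(snps, key=lambda s: s.position)
    return [
        (a.rsid, b.rsid, b.position - a.position)
        for a, b in zip(ordered, ordered[1:])
        if b.position - a.position > max_gap_bp
    ]


# Hypothetical candidate panel for one gene.
panel = [
    Snp("snpA", 1000, 0.21),
    Snp("snpB", 4500, 0.02),   # fails the MAF >= 5% criterion
    Snp("snpC", 9000, 0.12),
    Snp("snpD", 12000, 0.30),
]

kept = common_snps(panel)
gaps = coverage_gaps(kept)
for left, right, distance in gaps:
    print(f"gap of {distance} bp between {left} and {right}: add a SNP here")
```

Any gap reported by such a check would correspond to a region where an additional HapMap validated SNP would have to be added to satisfy criterion (3).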
After selection, all the SNPs were submitted to the Illumina Assay Design Tool for scoring prior to OPA (Oligonucleotide Pool All) design. The SNP scores were supplied by Illumina Inc. and range from 0 to 1.1. The SNP score reflects the ability to design a successful assay (SNP score < 0.4: low success rate, high risk to OPA; SNP score 0.4 - 0.6: moderate success rate, moderate risk to OPA; SNP score 0.6 - 1.1: high success rate, low risk to OPA). For the present study a SNP score of 0.6 was selected as the lowest cutoff. SNPs failing to achieve this score were replaced (with the nearest neighboring HapMap validated SNPs) and submitted again to Illumina Inc. for rescoring, and this process was repeated until a SNP score ≥ 0.6 was achieved for all the SNPs. The final list was then submitted to Illumina Inc. for OPA design and synthesis.

Genotyping and data cleaning

Samples were genotyped with the Illumina Bead Array system in accordance with the manufacturer’s protocol [7]. Briefly, the OPA, querying a set of SNPs, was hybridized simultaneously to genomic DNA, followed by allele specific primer extension and ligation. Subsequently, a set of fluorescently labeled universal primers (Cy3 and Cy5 labeled P1 and P2, respectively) was added and PCR was carried out, generating multiple labeled amplicons representing hundreds of different SNPs. These fluorescent products were then combined with beads on the Sentrix Array Matrix (SAM). The address sequences within the PCR amplicons hybridize to their complementary sequences on the beads, and the fluorescence on each bead is quantified, resulting in a signal associated with a particular address sequence. Each bead type is represented approximately 25 times on the array to improve the accuracy of the signal. After hybridization, the SAM was scanned using the BeadStation 500 BeadArray Reader.
The hybridization intensities from the BeadArray Reader were used for data processing, clustering and genotype calling using the genotyping module in the BeadStudio package v3. The GenCall module of BeadStudio was used to generate genotype calls.

The genotype clusters generated for each SNP locus by GenCall were inspected visually on two-dimensional plots and corrected manually where needed; a GenTrain score threshold of > 0.25 was set to call a SNP successfully genotyped [6]. We retained markers for further analysis if they had a call rate above 90%, at most one reproducibility error, at most five Mendelian errors, and were consistent with HWE at the level of P > 0.001.

Data/statistical analysis

Hardy-Weinberg equilibrium and Linkage disequilibrium

Hardy-Weinberg equilibrium (HWE) for patients as well as controls was calculated using PLINK [8] (http://pngu.mgh.harvard.edu/~purcell/plink/). Pair-wise LD among the SNPs in the case and control populations was measured by two complementary measures, Lewontin’s standardized LD coefficient (D′) and Pearson’s correlation (r²), using the software Haploview v 4.2 [9]. Tag SNP selection was done using Tagger as implemented in Haploview, setting a threshold of r² ≥ 0.8.

Single marker association analysis

Case-control association analysis was performed for each polymorphism with the Armitage trend test using PLINK [8]. Odds ratios (OR) were also calculated using 2×2 contingency tables (http://home.clara.net/sisa/). Associations between serum IgE levels and alleles of markers were analyzed in cases and controls using PLINK (option --assoc --perm). Logistic regression analysis was performed to evaluate the effect of age and sex, if any, on the association of various genotypes with asthma or serum total IgE levels.
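For readers unfamiliar with the trend test, the following Python sketch shows a textbook form of the Armitage trend statistic together with an odds ratio from a 2×2 table. It is an illustration under stated assumptions and not the PLINK implementation: additive weights (0, 1, 2) are assumed, the 2×2 table is assumed to be built from allele counts, and the genotype counts are invented for the example.

```python
import math


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts (AA, Aa, aa).

    Returns the 1-df chi-square statistic and its p value.
    Additive weights (0, 1, 2) are an assumption of this sketch.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases
    sw_r = sum(w * r for w, r in zip(weights, cases))
    sw_n = sum(w * t for w, t in zip(weights, totals))
    sww_n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sw_r - n_cases * sw_n) ** 2
    denominator = n_cases * n_controls * (n * sww_n - sw_n ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def allelic_odds_ratio(cases, controls):
    """Odds ratio of the minor allele from a 2x2 allele-count table."""
    case_minor = cases[1] + 2 * cases[2]
    case_major = 2 * cases[0] + cases[1]
    ctrl_minor = controls[1] + 2 * controls[2]
    ctrl_major = 2 * controls[0] + controls[1]
    return (case_minor * ctrl_major) / (case_major * ctrl_minor)


# Invented genotype counts (AA, Aa, aa) for one marker.
case_counts = (30, 20, 10)
control_counts = (10, 20, 30)

chi2, p_value = armitage_trend(case_counts, control_counts)
odds_ratio = allelic_odds_ratio(case_counts, control_counts)
print(f"trend chi2 = {chi2:.2f}, p = {p_value:.2e}, allelic OR = {odds_ratio:.2f}")
```

The sketch does not guard against degenerate tables (for example, a marker with a single observed genotype), which would need to be excluded beforehand.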
For the family based analysis (with binary trait asthma and log10 serum IgE levels), FBAT (Family Based Association Test) [10, 11] was used (http://www.biostat.harvard.edu/~fbat), since this method has several reported advantages and is best suited for heterogeneous family structures. We also used the TDT (transmission disequilibrium test) as implemented in PLINK [8] (plink --tdt option) as a tool to validate observations made from the FBAT analysis.

Haplotypic association analysis

Haplotypic association analyses were performed using PLINK [8] and HBAT [10, 11] (implemented in FBAT). PLINK uses the standard Expectation-Maximization (E-M) algorithm to estimate haplotypes and then performs standard family based and population based (unrelated individuals) association testing. Since we genotyped a large number of markers in most of the genes, we used sliding window haplotypic association analysis [12] (in the case-control cohort) with a window size of 5 to avoid a large number of haplotypes with frequency less than 0.05 (for genes where fewer than five markers are used for analysis, the software by default constructs haplotypes from as many markers as are included for that gene), using the option (plink --bfile mydata --hap-window 5 --hap-assoc). The default (plink --file mydata --hap myfile.hlist --hap-assoc) option was also used to calculate the global scores of association for regions (marker combinations) showing the highest scores of association. In families, haplotype based TDT association tests were performed using PLINK [8] (plink --file mydata --hap myfile.hlist --hap-tdt) for regions (marker combinations) identified as regions of highest significance in the case-control analysis. Furthermore, excess transmission of the multi-locus haplotypes was also tested using HBAT as implemented in the FBAT [10, 11] package, using the additive model with bi-allelic (individual haplotype based) and multi-allelic (global test of association) models. Since the frequencies of some of the haplotypes showing suggestive associations were found to be low, we used the Monte Carlo permutation approach as implemented in FBAT (hbat --p option).

Combining p values (Fisher’s combined probability testing)

We have performed family based and case-control studies (single gene or gene-wise analysis) in which independent sets of samples were used, i.e. individuals in the families and in the case-control cohorts are non-overlapping. However, these samples are drawn from the same population. In each of the analyses (family based and case-control), p values were obtained for all the association tests that we performed, using methods appropriate to each type of sample. We used Fisher’s method, or Fisher’s combined probability test [13], for combining the p values obtained from the family based and case-control association analyses. Fisher’s combined probability test is a technique for data fusion or meta-analysis [13]. In its basic form, it is used to combine the results from several independent tests. Combining p values from different tests of the same hypothesis is an important method and has been suggested to provide greater strength for decision-making [13], as it is based on more information from both types of samples considered here. Note that only p values obtained in the allelic association analysis have been combined, since this was the screening step for identifying statistically significant associations.

Correction for multiple testing

For single marker association analyses with asthma and serum total IgE, we applied the Benjamini-Hochberg method for multiple testing correction after combining the evidence from the family based and case-control association studies.
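The two steps described above, combining p values and then correcting them, can be sketched in Python as follows. This is a minimal illustration and not the authors' code: it uses the standard form of Fisher's statistic, X = -2 Σ ln(p), referred to a chi-square distribution with 2k degrees of freedom, and the standard Benjamini-Hochberg step-up adjustment. The p values are invented, and p values of exactly zero are not handled.

```python
import math


def fisher_combined(p_values):
    """Combine independent p values with Fisher's method.

    X = -2 * sum(ln p) follows a chi-square distribution with 2k degrees
    of freedom under the null; for even degrees of freedom the upper tail
    has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    tail = math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )
    return min(1.0, tail)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented per-marker p values from the two independent sample sets.
family_p = [0.05, 0.20, 0.01]
case_control_p = [0.05, 0.50, 0.03]

combined = [fisher_combined([f, c]) for f, c in zip(family_p, case_control_p)]
corrected = benjamini_hochberg(combined)
for marker, (raw, adj) in enumerate(zip(combined, corrected), start=1):
    print(f"marker {marker}: combined p = {raw:.4f}, BH-adjusted p = {adj:.4f}")
```

Combining before correcting mirrors the order stated in the text: each marker contributes one combined p value, and the adjustment is then applied across markers.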
For the haplotype based association studies, since we were interested only in risk haplotypes within a gene that had been selected on the basis of our single marker association analyses, we adjusted p values using the Benjamini-Hochberg method for the number of haplotypes within each gene. In the 5-locus sliding window haplotype analyses, a total of 402, 4, 166, 5 and 74 haplotypes in the IRF2, IL6, STAT4, IFNGR2 and IL4RA genes, respectively, were generated, tested and adjusted for in the case-control analyses, while in the family based analyses 10, 4, 6, 6 and 6 haplotypes were tested in the IRF2, IL6, STAT4, IFNGR2 and IL4RA genes, respectively, and p value corrections were made accordingly.

References

1. Kumar, A. & Ghosh, B. A single nucleotide polymorphism (A --> G) in intron 3 of IFNgamma gene is associated with asthma. Genes and immunity. 9, 294-301 (2008).
2. Nagarkatti, R., Kumar, R., Sharma, S.K. & Ghosh, B. Association of IL4 gene polymorphisms with asthma in North Indians. International archives of allergy and immunology. 134, 206-212 (2004).
3. Sharma, M., Batra, J., Mabalirajan, U., Sharma, S., Nagarkatti, R., Aich, J. et al. A genetic variation in inositol polyphosphate 4 phosphatase A enhances susceptibility to asthma. American journal of respiratory and critical care medicine. 177, 712-719 (2008).
4. Rogers, A.J., Raby, B.A., Lasky-Su, J.A., Murphy, A., Lazarus, R., Klanderman, B.J. et al. Assessing the reproducibility of asthma candidate gene associations, using genomewide data. American journal of respiratory and critical care medicine. 179, 1084-1090 (2009).
5. Bird, J.J., Brown, D.R., Mullen, A.C., Moskowitz, N.H., Mahowald, M.A., Sider, J.R. et al. Helper T Cell Differentiation Is Controlled by the Cell Cycle. Immunity. 9, 229-237 (1998).
6. Indian Genome Variation Consortium. Genetic landscape of the people of India: a canvas for disease gene exploration. J Genet. 87, 3-20 (2008).
7. Fan, J.B., Gunderson, K.L., Bibikova, M., Yeakley, J.M., Chen, J., Wickham Garcia, E. et al. in Methods in Enzymology, Vol. 410 (eds. Alan, K. & Brian, O.), 57-73 (Academic Press, 2006).
8. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D. et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. American Journal of Human Genetics. 81, 559-575 (2007).
9. Barrett, J.C., Fry, B., Maller, J. & Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics (Oxford, England). 21, 263-265 (2005).
10. Laird, N.M., Horvath, S. & Xu, X. Implementing a unified approach to family-based tests of association. Genetic epidemiology. 19 Suppl 1, S36-42 (2000).
11. Rabinowitz, D. & Laird, N. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Human heredity. 50, 211-223 (2000).
12. Li, Y., Sung, W.-K. & Liu, J.J. Association Mapping via Regularized Regression Analysis of Single-Nucleotide–Polymorphism Haplotypes in Variable-Sized Sliding Windows. American Journal of Human Genetics. 80, 705-715 (2007).
13. Fisher, R.A. Statistical methods for research workers. (Oliver and Boyd: Edinburgh, 1932).