Supplementary Information (doc 93K)

Online Repository
Genetic association of key Th1/Th2 pathway candidate genes, IRF2, IL6, IFNGR2, STAT4 and IL4RA, with atopic asthma in the Indian population.
Amrendra Kumar¹, Sudipta Das¹, Anurag Agarwal², Indranil Mukhopadhyay³ and Balaram Ghosh¹,²
¹Molecular Immunogenetics Laboratory, ²Centre of Excellence for Translational Research in Asthma and Lung Disease, Institute of Genomics and Integrative Biology, Delhi-110007
³Human Genetics Unit, Indian Statistical Institute, Kolkata, India
Address of Correspondence
Dr. Balaram Ghosh, Ph.D.
Molecular Immunogenetics Laboratory,
Institute of Genomics and Integrative Biology
Mall Road, Delhi-110007
Phone No.: 91-11-27662580
Fax No.: 91-11-27667471, 91-11-27416489
E-mail ID: bghosh@igib.res.in
Supplementary methods
Gene and SNP selection
Ideally, we would have preferred to include all genes reported to modulate Th1/Th2/Th17 differentiation, development and/or functions. However, limited in our choice by the available resources, we selected 33 genes in total focusing on the Th1/Th2 pathway (Supplementary Table B), where emphasis was given to the important mediators of the IL-4 (IL-4RA, STAT6), IFNG (IFNGR1, IFNGR2, STAT1) and IL-12 (IL-12A, IL-12B, IL-12RB1, IL-12RB2, STAT4) signaling pathways, or genes modulating expression and function of these pathways (IRF1, IRF2, ATF2, TBET etc.; Supplementary Table B). Genetic studies of the IFNG¹ and IL-4² genes have been reported previously from this lab and are not a part of the current study. A detailed description of the mechanisms through which these genes modulate Th1/Th2 pathways is outside the scope of this article, and readers are referred to the appropriate papers (Supplementary Table B). It should be mentioned here that we have included four genes (INPP4A, HSPH1, ITLN1 and RPS6KB2) that were found to be differentially expressed in microarray datasets in a meta-analysis from our laboratory³, for several reasons. We have already reported the identification of INPP4A as a novel asthma candidate gene, which has been replicated in another population/study³,⁴. Also, in preliminary studies using microsatellite markers in the other three genes, a suggestive association with asthma was found. This evidence, and the fact that their gene products have been shown to be involved in the regulation of immune homeostasis mechanisms by modulating cell cycle, apoptosis, etc., motivated a detailed genetic association analysis of these genes with dense marker selection. Moreover, it has been demonstrated that T helper cell differentiation is controlled by the cell cycle⁵. Inclusion of these genes and validation of our preliminary report could also lead to novel findings.
Subjects/individuals in this study belong to the Indo-European caste groups, referred to as IE-LPs in IGVDB (Indian Genome Variation Database)⁶, that have been shown to have the closest genetic affinity to CEU (Utah residents with Northern and Western European ancestry) among the HapMap populations. The following criteria were adopted for selecting SNPs: (1) we selected CEU as the reference population; (2) we selected tag SNPs (minor allele frequency ≥ 5% and r² threshold set at 0.8); (3) since HapMap CEU is not exactly similar to our population, we also ensured that HapMap-validated SNPs were selected covering the entire gene (distance between adjacent SNPs not more than ~4 to 5 kb); (4) efforts were made to include previously reported SNPs in these genes (if not present among the tag SNPs) if these were amenable to the assay (based on SNP score; discussed below).
After selection, all the SNPs were submitted to the Illumina Assay Design Tool for scoring prior to OPA (Oligonucleotide Pool All) design. The SNP scores were supplied by Illumina Inc. and range from 0 to 1.1. The SNP score reflects the ability to design a successful assay (SNP score < 0.4: low success rate, high risk to OPA; SNP score 0.4 - 0.6: moderate success rate, moderate risk to OPA; SNP score 0.6 - 1.1: high success rate, low risk to OPA). For the present study, a SNP score of 0.6 was selected as the lowest cutoff. SNPs failing to achieve this score were replaced with the nearest neighboring HapMap-validated SNPs and resubmitted to Illumina Inc. for rescoring, and this process was repeated until a SNP score ≥ 0.6 was achieved for all the SNPs. The final list was then submitted to Illumina Inc. for OPA design and synthesis.
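The score bands and the 0.6 cutoff described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example rs identifiers are hypothetical, the scores themselves come from Illumina and are not computed here, and assigning the boundary values 0.4 and 0.6 to the upper band is an assumption (the text only fixes that a score ≥ 0.6 is acceptable).

```python
def classify_snp_score(score):
    """Map an Illumina SNP score (0 to 1.1) to the risk band quoted in the text."""
    if not 0.0 <= score <= 1.1:
        raise ValueError("SNP score is expected to lie between 0 and 1.1")
    if score < 0.4:
        return "low success rate"
    if score < 0.6:  # assumption: a score of exactly 0.6 counts as 'high'
        return "moderate success rate"
    return "high success rate"


def snps_needing_replacement(scores, cutoff=0.6):
    """Return the SNPs whose score is below the cutoff, sorted by name."""
    return sorted(name for name, score in scores.items() if score < cutoff)


# Hypothetical scores, for illustration only.
example_scores = {"rs0000001": 0.35, "rs0000002": 0.58, "rs0000003": 0.91}
to_replace = snps_needing_replacement(example_scores)
```

In the workflow described above, each SNP returned by such a filter would be swapped for a neighboring HapMap-validated SNP and rescored until none falls below the cutoff.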
Genotyping and data cleaning
Samples were genotyped with the Illumina Bead Array system in accordance with the manufacturer’s protocol⁷. Briefly, the OPA, querying a set of SNPs, is hybridized simultaneously to genomic DNA followed by allele-specific primer extension and ligation. Subsequently, a set of fluorescently labeled universal primers (Cy3- and Cy5-labeled P1 and P2, respectively) were added and PCR was carried out, generating multiple labeled amplicons representing hundreds of different SNPs. These fluorescent products were then combined with beads on the Sentrix Array Matrix (SAM). The address sequences within the PCR amplicons hybridize to their complementary sequences on the beads, and the fluorescence on each bead is quantified, resulting in a signal associated with a particular address sequence. Each bead type is represented approximately 25 times on the array to improve the accuracy of the signal. After hybridization, the SAM was scanned using the BeadStation 500 BeadArray reader. The hybridization intensities from the BeadArray reader were used for data processing, clustering and genotype calling using the genotyping module in the BeadStudio package v3. The GenCall module of BeadStudio was used to generate genotype calls.
82
The genotype clusters generated for each SNP locus by GenCall were edited manually
83
after visual inspection of clusters on two-dimensional plot. All the genotype clusters were
84
inspected and corrected manually; the threshold for GenTrain score of > 0.25 was set to call a
85
SNP successfully genotyped 6. We retained markers for further analysis if the call rate was above
86
90%, maximum of one reproducibility error, maximum of five Mendelian errors and showed
87
consistency with HWE at the level of P > 0.001.
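The marker retention rules above amount to a simple conjunction of thresholds. The following Python sketch shows one way to express them; the `MarkerQC` record and its field names are hypothetical, and the per-marker values would in practice come from BeadStudio and PLINK output rather than being entered by hand.

```python
from dataclasses import dataclass


@dataclass
class MarkerQC:
    """Per-marker quality metrics (hypothetical record layout)."""
    gentrain: float               # GenTrain score from clustering
    call_rate: float              # fraction of samples with a genotype call
    reproducibility_errors: int   # discordant calls among replicate samples
    mendelian_errors: int         # inheritance inconsistencies in families
    hwe_p: float                  # Hardy-Weinberg equilibrium p value


def passes_qc(marker):
    """Apply the thresholds stated in the text; all must hold."""
    return (
        marker.gentrain > 0.25
        and marker.call_rate > 0.90
        and marker.reproducibility_errors <= 1
        and marker.mendelian_errors <= 5
        and marker.hwe_p > 0.001
    )


# Illustrative values only, not data from the study.
good = MarkerQC(0.80, 0.98, 0, 2, 0.40)
bad = MarkerQC(0.80, 0.85, 0, 2, 0.40)  # fails on call rate
```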
Data/statistical analysis
Hardy-Weinberg equilibrium and Linkage disequilibrium
Hardy-Weinberg equilibrium (HWE) for patients as well as controls was calculated using PLINK⁸ (http://pngu.mgh.harvard.edu/~purcell/plink/). Pair-wise LD among the SNPs in the case and control populations was measured by two complementary measures, Lewontin’s standardized LD coefficient (D′) and the squared Pearson correlation (r²), using the software Haploview v4.2⁹. Tag SNP selection was done using Tagger as implemented in Haploview, setting a threshold of r² ≥ 0.8.
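To make the two LD measures concrete, the sketch below computes |D′| and r² for a pair of bi-allelic SNPs from a known haplotype frequency and the two allele frequencies. It is a textbook calculation written for illustration, not Haploview's implementation: Haploview estimates haplotype frequencies from unphased genotypes, whereas here they are assumed to be given, and the function name and example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for alleles A and B at two bi-allelic SNPs.

    p_ab: frequency of the A-B haplotype
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Allele B always travels with allele A, but allele frequencies differ:
# |D'| is at its maximum while r^2 stays well below the 0.8 tagging threshold.
d_prime, r_squared = ld_measures(p_ab=0.3, p_a=0.5, p_b=0.3)
```

The example illustrates why the two measures are complementary: |D′| reaches 1 whenever at least one of the four haplotypes is absent, while r² approaches 1 only when the two SNPs also have matching allele frequencies and can therefore serve as proxies for each other.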
Single marker association analysis
Case-control association analysis was performed for each polymorphism using the Armitage trend test in PLINK⁸. Odds ratios (OR) were also calculated using 2×2 contingency tables (http://home.clara.net/sisa/). Associations between serum IgE levels and alleles of markers were analyzed in cases and controls using PLINK (options --assoc --perm). Logistic regression analysis was performed to evaluate the effect of age and sex, if any, on the association of various genotypes with asthma or serum total IgE levels.
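For readers unfamiliar with the trend test, the following Python sketch shows the standard Cochran-Armitage statistic for genotype counts under additive scores (0, 1, 2), together with an odds ratio from a 2×2 table. It is a minimal illustration under stated assumptions, not PLINK's code: the function names and the genotype counts are hypothetical, and the p value uses the asymptotic chi-square distribution with 1 degree of freedom.

```python
import math


def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.

    cases, controls: counts for genotypes carrying 0, 1 and 2 copies
    of the tested allele. Returns (chi-square, asymptotic p value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)   # all individuals
    r = sum(cases)    # all cases
    sum_xr = sum(x * ri for x, ri in zip(scores, cases))
    sum_xn = sum(x * ni for x, ni in zip(scores, totals))
    sum_xxn = sum(x * x * ni for x, ni in zip(scores, totals))
    denom = r * (n - r) * (n * sum_xxn - sum_xn ** 2)
    if denom == 0:
        raise ValueError("test undefined: no variation in phenotype or genotype")
    chi2 = n * (n * sum_xr - r * sum_xn) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed/unexposed cases; c, d = controls."""
    return (a * d) / (b * c)


# Hypothetical genotype counts, for illustration only.
chi2, p_value = armitage_trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
```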
For the family based analysis (with the binary trait asthma and log10 serum IgE levels), FBAT (Family Based Association Test)¹⁰,¹¹ was used (http://www.biostat.harvard.edu/~fbat), since this method has several reported advantages and is best suited for heterogeneous family structures. We have also used the TDT (transmission disequilibrium test) as implemented in PLINK⁸ (plink --tdt option) as a tool to validate observations made from the FBAT analysis.
Haplotypic association analysis
Haplotypic association analyses were performed using PLINK⁸ and HBAT¹⁰,¹¹ (implemented in FBAT). PLINK uses the standard Expectation-Maximization (E-M) algorithm to estimate haplotypes and then performs standard family based and population based (unrelated individuals) association testing. Since we have genotyped a large number of markers in most of the genes, to avoid a large number of haplotypes with frequency less than 0.05, we have used sliding window haplotypic association analysis¹² (in the case-control cohort) with a window size of 5 (for genes where fewer than five markers are used for analysis, the software by default constructs haplotypes for as many markers as are included for that gene) using the option (plink --bfile mydata --hap-window 5 --hap-assoc). The default option (plink --file mydata --hap myfile.hlist --hap-assoc) was also used to calculate the global scores of association for regions (marker combinations) showing the highest scores of association. In families, haplotype based TDT association tests were performed using PLINK⁸ (plink --file mydata --hap myfile.hlist --hap-tdt) for regions (marker combinations) identified as regions of highest significance in the case-control analysis. Furthermore, excess transmission of the multi-locus haplotypes was also tested using HBAT as implemented in the FBAT¹⁰,¹¹ package, using the additive model with bi-allelic (individual haplotype based) and multi-allelic (global test of association) models. Since the frequencies of some of the haplotypes showing suggestive associations were found to be low, we used the Monte Carlo permutation approach as implemented in FBAT (hbat --p option).
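The sliding window scheme described above can be pictured as enumerating every run of five consecutive markers within a gene. The Python sketch below does only that enumeration, assuming markers are already ordered by position; it does not estimate haplotypes or test association, which PLINK handles. The function name and marker labels are hypothetical.

```python
def sliding_windows(markers, size=5):
    """List consecutive marker windows of the given size.

    If a gene has no more markers than the window size, a single window
    containing all of its markers is returned, mirroring the behaviour
    described in the text for genes with fewer than five markers.
    """
    if size < 1:
        raise ValueError("window size must be at least 1")
    if not markers:
        return []
    if len(markers) <= size:
        return [tuple(markers)]
    return [tuple(markers[i:i + size]) for i in range(len(markers) - size + 1)]


# Hypothetical, position-ordered marker labels for one gene.
gene_markers = ["m1", "m2", "m3", "m4", "m5", "m6", "m7"]
windows = sliding_windows(gene_markers)
```

With seven markers this gives three overlapping windows (m1 to m5, m2 to m6, m3 to m7), each of which would then be tested separately.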
Combining p values (Fisher’s combined probability testing)
We have performed family based and case-control studies (single gene or gene-wise analysis) in which independent sets of samples have been used, i.e. individuals in the families and in the case-control cohorts are non-overlapping. However, these samples are drawn from the same population. In each of the analyses (family based and case-control), p values have been obtained for all the association tests that we performed, using methods appropriate for the different types of samples. We used Fisher’s method, or Fisher’s combined probability test¹³, for combining the p values obtained from the family based and case-control association analyses. Fisher’s combined probability test is a technique for data fusion or meta-analysis¹³. In its basic form, it is used to combine the results from several independent tests. Combining p values from different tests, where the same hypothesis is under testing, is an important method and has been suggested to provide higher strength towards decision-making¹³, as it is based on more information from both types of samples considered here. It is to be noted that only p values obtained in the allelic association analysis have been combined, since this was the screening step for identifying statistically significant associations.
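Fisher's method combines k independent p values through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The Python sketch below implements this with the closed-form upper tail available for even degrees of freedom, so it needs only the standard library. The function name and the example p values are hypothetical and are not results from the study.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    k = len(p_values)
    # Half of Fisher's statistic: X / 2 = -sum(ln p_i).
    half_x = -sum(math.log(p) for p in p_values)
    # For a chi-square with even df 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical family based and case-control p values for one SNP.
combined = fisher_combined_p([0.05, 0.05])
```

For two tests the result reduces to p1·p2·(1 − ln(p1·p2)), so two p values of 0.05 combine to roughly 0.017.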
Correction for multiple testing
For single marker association analyses with asthma and serum total IgE, we applied the Benjamini-Hochberg method for multiple testing correction after combining evidence from the family based and case-control association studies. For the haplotype based association studies, since we were interested only in risk haplotypes within a gene that were selected based on our single marker association analyses, we performed p value adjustments using the Benjamini-Hochberg method for the number of haplotypes within each gene. In the 5-locus sliding window haplotype analyses, a total of 402, 4, 166, 5 and 74 haplotypes in the IRF2, IL6, STAT4, IFNGR2 and IL4RA genes, respectively, were generated, tested and adjusted for in the case-control analyses, while in the family based analyses 10, 4, 6, 6 and 6 haplotypes were tested in the IRF2, IL6, STAT4, IFNGR2 and IL4RA genes, respectively, and p value corrections were made accordingly.
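The Benjamini-Hochberg adjustment named above can be sketched as follows. This is a generic step-up implementation in Python, given for illustration only: the function name and the four example p values are hypothetical, and in the setting described above the input list would hold the p values for all markers, or for all haplotypes tested within one gene.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for four haplotypes within one gene.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Here the two smallest p values are both adjusted to 0.02 and the two largest to 0.04, showing how the adjustment depends on the number of tests performed within the gene.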
References
1. Kumar, A. & Ghosh, B. A single nucleotide polymorphism (A --> G) in intron 3 of IFNgamma gene is associated with asthma. Genes and immunity. 9, 294-301 (2008).
2. Nagarkatti, R., Kumar, R., Sharma, S.K. & Ghosh, B. Association of IL4 gene polymorphisms with asthma in North Indians. International archives of allergy and immunology. 134, 206-212 (2004).
3. Sharma, M., Batra, J., Mabalirajan, U., Sharma, S., Nagarkatti, R., Aich, J. et al. A genetic variation in inositol polyphosphate 4 phosphatase a enhances susceptibility to asthma. American journal of respiratory and critical care medicine. 177, 712-719 (2008).
4. Rogers, A.J., Raby, B.A., Lasky-Su, J.A., Murphy, A., Lazarus, R., Klanderman, B.J. et al. Assessing the reproducibility of asthma candidate gene associations, using genomewide data. American journal of respiratory and critical care medicine. 179, 1084-1090 (2009).
5. Bird, J.J., Brown, D.R., Mullen, A.C., Moskowitz, N.H., Mahowald, M.A., Sider, J.R. et al. Helper T Cell Differentiation Is Controlled by the Cell Cycle. Immunity. 9, 229-237 (1998).
6. Genetic landscape of the people of India: a canvas for disease gene exploration. J Genet. 87, 3-20 (2008).
7. Fan, J.B., Gunderson, K.L., Bibikova, M., Yeakley, J.M., Chen, J., Wickham Garcia, E. et al. in Methods in Enzymology, Vol. 410 (eds. Alan, K. & Brian, O.), 57-73, (Academic Press 2006).
8. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, Manuel A R., Bender, D. et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. American Journal of Human Genetics. 81, 559-575 (2007).
9. Barrett, J.C., Fry, B., Maller, J. & Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics (Oxford, England). 21, 263-265 (2005).
10. Laird, N.M., Horvath, S. & Xu, X. Implementing a unified approach to family-based tests of association. Genetic epidemiology. 19 Suppl 1, S36-42 (2000).
11. Rabinowitz, D. & Laird, N. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Human heredity. 50, 211-223 (2000).
12. Li, Y., Sung, W.-K. & Liu, J.J. Association Mapping via Regularized Regression Analysis of Single-Nucleotide–Polymorphism Haplotypes in Variable-Sized Sliding Windows. American Journal of Human Genetics. 80, 705-715 (2007).
13. Fisher, R.A. Statistical methods for research workers. (Oliver and Boyd: Edinburgh, 1932).