Supplementary Information

The germline sequence variant rs2736100_C in TERT associates with myeloproliferative neoplasms

Oddsson A1*, Kristinsson SY2,3*, Helgason H1, Gudbjartsson DF1, Masson G1, Sigurdsson A1, Jonasdottir A1, Jonasdottir A1, Steingrimsdottir H3, Vidarsson B3, Reykdal S3, Eyjolfsson GI5, Olafsson I6, Onundarson PT2,3, Runarsson G3, Sigurdardottir O4, Kong A1, Rafnar T1, Sulem P1, Thorsteinsdottir U1,2 & Stefansson K1,2

1 deCODE Genetics/Amgen Inc., 101 Reykjavik, Iceland
2 Faculty of Medicine, University of Iceland, 101 Reykjavik, Iceland
3 Department of Hematology, Landspitali, The National University Hospital of Iceland, 101 Reykjavik, Iceland
4 Department of Clinical Biochemistry, Akureyri Hospital, 600 Akureyri, Iceland
5 The Laboratory in Mjodd, RAM, 109 Reykjavik, Iceland
6 Department of Clinical Biochemistry, Landspitali, The National University Hospital of Iceland, 101 Reykjavik, Iceland

Content
Table S1-S8
Supplemental methods
References

Table S1: Number of directly and familially imputed MPN cases and controls in the study.

Phenotype                                    N        Chip imputed   Familially imputed
Myeloproliferative neoplasm (ph-negative)    237      112            125
- Polycythemia vera                          98       40             58
- Essential thrombocythemia                  40       27             13
- Primary myelofibrosis                      26       9              15
Controls                                     34 128   16 128         18 000

Of the 237 MPN cases, 74 were without sub-phenotype (PV, ET or PMF) classification and one MPN case had two sub-phenotypes assigned.

Table S2: Association with MPN in Iceland of the JAK2 variant rs1034072_A reported in this study and the previously reported variant rs10974944_G.

                                                  MPN GWAS
                                                  All                   Chip-typed
SNP ID       Position (hg18)   Allele   AF (%)    P              OR     P              OR     P*     OR*    r2 ‡
rs1034072    chr9:5078903      A/T      28.2      3.19 x 10^-7   1.85   4.19 x 10^-5   1.78   0.29   1.64   0.91
rs10974944   chr9:5060831      G/C      28.7      1.90 x 10^-6   1.78   7.54 x 10^-5   1.75   0.86   1.09

Allele: minor allele/major allele; AF: allele frequency (shown for minor allele); OR: odds ratio (shown for minor allele).
* Values for rs1034072 and rs10974944 adjusted for each other. Only chip-typed individuals were used in the conditional analysis (cases N = 112, controls N = 16,128).
‡ r2 between rs1034072 and rs10974944.

Table S3: Association with MPN of previously reported variants and those that associate with MPN at P < 10^-5 at the TERT locus, with and without conditioning on rs2736100.

                                                                      MPN GWAS
                                                                      All                  Chip-typed
SNP ID       Position (hg18) Allele  AF (%)  Reported phenotypes      P              OR    P-unadj        OR-unadj  P-adj*  OR-adj*  r2 ‡
rs2736100    chr5:1339516    C/A     49.3    LA,IPF,LC,GL,TC,TL,BCC   6.39 x 10^-10  2.09  4.38 x 10^-7   2.02      NA      NA       1.000
NA           chr5:1345642    TT/-    47.8    -                        1.29 x 10^-8   1.96  2.98 x 10^-5   1.78      0.27    1.22     0.450
rs2853677    chr5:1340194    G/A     41.6    LA                       2.24 x 10^-8   1.92  1.59 x 10^-4   1.68      0.45    1.14     0.411
rs2735940    chr5:1349486    A/G     48.1    -                        3.28 x 10^-8   1.93  4.92 x 10^-5   1.75      0.35    1.19     0.448
rs7705526    chr5:1338974    A/C     34.6    -                        3.35 x 10^-8   1.92  2.75 x 10^-6   1.90      0.14    1.33     0.524
rs2853672    chr5:1345983    C/A     47.8    -                        5.11 x 10^-8   1.91  7.01 x 10^-5   1.73      0.41    1.16     0.450
rs2736099    chr5:1340340    A/G     36.6    -                        9.22 x 10^-7   1.78  1.31 x 10^-4   1.69      0.36    1.17     0.393
rs78559769   chr5:1429174    T/C     2.5     -                        3.82 x 10^-6   3.32  1.19 x 10^-4   3.13      0.003   2.42     0.022
rs2736108    chr5:1350488    T/C     30.0    -                        5.56 x 10^-6   1.74  4.43 x 10^-4   1.65      0.17    1.24     0.207
NA           chr5:1350077    A/ACC   28.9    -                        7.99 x 10^-6   1.72  8.08 x 10^-4   1.61      0.23    1.21     0.206
rs2736107    chr5:1350854    T/C     28.5    -                        9.46 x 10^-6   1.72  6.90 x 10^-3   1.49      0.46    1.13     0.205
NA           chr5:1349255    AG/A    29.2    -                        9.74 x 10^-6   1.72  9.21 x 10^-2   1.47      0.55    1.10     0.204
rs2736098    chr5:1347086    T/C     27.4    BCC                      2.10 x 10^-5   1.70  3.89 x 10^-3   1.52      0.37    1.15     0.173
rs4635969    chr5:1361552    A/G     20.2    TC                       3.01 x 10^-3   0.62  3.50 x 10^-2   0.68      0.16    0.77     0.018
rs2853676    chr5:1341547    T/C     26.5    GL                       6.91 x 10^-3   1.41  5.50 x 10^-3   1.50      0.57    1.10     0.220
rs4975709    chr5:1930280    C/A     23.7    CVD                      6.01 x 10^-2   1.28  0.12           1.27      0.06    1.33     0.002
rs401681     chr5:1375087    T/C     45.4    PSA,ME,UBC,PC,LC,BCC     6.31 x 10^-2   0.80  0.54           0.92      0.74    0.96     0.002
rs4975616    chr5:1368660    G/A     42.5    LC                       6.91 x 10^-2   0.80  0.69           0.95      0.95    0.99     0.003
rs31489      chr5:1395714    A/C     42.4    LA                       8.91 x 10^-2   0.82  0.48           0.91      0.87    0.98     0.010
rs31490      chr5:1397458    A/G     43.8    CLL                      0.11           0.83  0.60           0.93      0.88    0.98     0.004
rs402710     chr5:1373722    T/C     36.0    LC                       0.27           0.87  0.84           0.97      0.65    1.07     0.015
rs12653946   chr5:1948829    T/C     40.4    PC                       0.29           1.13  0.09           1.26      0.08    1.27     0.001
rs10069690   chr5:1332790    T/C     25.6    UBC, CLL                 0.89           1.02  0.26           1.19      0.37    0.87     0.169
rs2242652    chr5:1333028    A/G     22.3    PC                       0.96           1.01  0.48           1.12      0.27    0.83     0.138

OR: odds ratio (shown for minor allele); AF: allele frequency (shown for minor allele); Allele: minor allele/major allele; Reported phenotypes: known associations of diseases and traits with the index SNPs. BCC: basal cell carcinoma; CVD: cardiovascular disease risk factors; CLL: chronic lymphocytic leukemia; GL: glioma; IPF: idiopathic pulmonary fibrosis; LA: lung adenocarcinoma; LC: lung cancer; ME: melanoma; MPN: myeloproliferative neoplasms; UBC: urinary bladder cancer; PC: pancreatic cancer; PSA: prostate specific antigen levels; TC: testicular germ cell cancer; TL: telomere length.
* Adjusted for rs2736100. Only chip-typed individuals were used in the conditional analysis (cases N = 112, controls N = 16,128).
‡ r2 correlation between rs2736100 and the listed variants.

Table S4: Association with MPN in Iceland of variants reported to affect telomere length.

                                                    MPN in Iceland                  Telomere length
SNP ID*      Position (hg18)   Gene     Allele   AF (%)   P              OR     AF (%)*   P*              Effect (SD)*
rs10936599   chr3:170974795    TERC     C        79.2     0.92           1.02   74.8      2.54 x 10^-31   0.097
rs2736100    chr5:1339516      TERT     C        49.3     6.39 x 10^-10  2.09   48.6      4.38 x 10^-19   0.078
rs7675998    chr4:164227270    NAF1     G        80.4     0.94           1.02   78.3      4.35 x 10^-16   0.074
rs9420907    chr10:105666455   OBFC1    C        12.3     0.83           1.04   13.5      6.90 x 10^-11   0.069
rs11125529   chr2:54329370     ACYP2    A        16.3     2.85 x 10^-3   1.53   14.2      7.50 x 10^-10   0.056
rs8105767    chr19:22007281    ZNF208   G        32.0     0.425          1.10   29.1      1.11 x 10^-9    0.048
rs755017     chr20:61892066    RTEL1    G        14.2     0.82           1.04   13.1      6.71 x 10^-9    0.062

Allele: effect allele; OR: odds ratio (shown for effect allele); AF: allele frequency (shown for effect allele).
* Data drawn from Codd et al. 2013, which reported association of the listed SNPs with telomere length.

Table S5: Association of rs2736100_C in TERT and rs1034072_A in JAK2 with various blood cell counts in Iceland.

                                TERT rs2736100_C              JAK2 rs1034072_A
Trait               N           P              Effect (SD)    P              Effect (SD)
Red blood cells     76 739      9.07 x 10^-6   0.019          0.91           0.001
White blood cells   126 853     4.10 x 10^-4   0.016          7.74 x 10^-4   0.017
- Basophils         99 809      8.19 x 10^-2   0.006          0.02           0.008
- Eosinophils       99 862      2.50 x 10^-2   -0.010         3.01 x 10^-4   0.018
- Granulocytes      99 473      3.55 x 10^-4   0.016          4.82 x 10^-4   0.017
- Monocytes         100 271     2.55 x 10^-2   0.010          0.38           0.004
- Lymphocytes       100 270     0.78           0.001          0.95           0.000
Platelet            103 441     5.26 x 10^-8   0.025          2.50 x 10^-4   -0.019

Table S6: Association of rs2736100_C in TERT with hematological disorders in Iceland.

Phenotype                                        N       P      OR
Myeloid
Chronic myeloid leukemia                         85      0.93   0.98
Acute myeloid leukemia                           291     0.30   1.14
Lymphoid
Monoclonal gammopathy of unknown significance    1 122   0.68   0.98
Multiple myeloma                                 414     0.12   0.85
Waldenstroms                                     86      0.06   1.46
Non Hodgkins lymphoma                            800     0.96   1.00
Hodgkins lymphoma                                256     0.44   0.91
Follicular non Hodgkins lymphoma                 149     0.67   1.06
Chronic lymphocytic leukemia                     309     0.56   0.94

Table S7: Mutation status of JAK2V617F in MPN patients.

Phenotype                                          N JAK2V617F-positive   N JAK2V617F-negative   % JAK2V617F-positive
Myeloproliferative neoplasms (ph-neg.) (N = 62)    43                     19                     69.35
- Polycythemia vera (N = 31)                       25                     6                      80.64
- Essential thrombocythemia (N = 11)               6                      5                      54.55
- Primary myelofibrosis (N = 5)                    3                      2                      60.00

MPN cases with available blood samples drawn within two years before or after MPN diagnosis were included in the analysis (N = 62).

Table S8: The effect of the germline risk alleles rs1034072_A in JAK2 and rs2736100_C in TERT on somatic JAK2V617F allele burden.

Germline risk allele   Control AF (%)   Phenotype      P†     JAK2V617F allele burden effect per germline risk allele†
TERT rs2736100_C       49.32            MPN (n = 60)   0.77   1.45 %
                                        PV (n = 30)    0.98   0.17 %
JAK2 rs1034072_A       28.21            MPN (n = 60)   0.02   10.28 %
                                        PV (n = 29)    0.03   15.07 %

MPN cases with available blood samples drawn within two years before or after MPN diagnosis were included in the analysis (N = 62). The mean JAK2V617F somatic allele burden among the 62 MPN cases used is 22%.
† Linear regression analysis was performed after adjusting for time from blood draw date to MPN diagnosis to estimate the JAK2V617F somatic allele burden effect per germline risk allele.

Supplemental methods

Study population
The study group consists of 237 patients diagnosed with MPN from 1956 until the end of 2012 according to the nationwide Icelandic Cancer Registry [14]. These include the sub-diagnoses PV (N = 98), ET (N = 40) and PMF (N = 26). In total, 74 had unclassifiable MPN and one individual had two sub-diagnoses. Median age at diagnosis was 70 years (range 20-96) and the median time since diagnosis was 15 years (range 1-57). 48% of the patients were males. The controls consist of 34 128 Icelanders recruited through different research projects at deCODE genetics.
The Data Protection Authority of Iceland and the National Bioethics Committee of Iceland approved this study. All participants signed written informed consent prior to participation in the study. All personal identifiers associated with blood samples, medical information, and genealogies were encrypted by the Data Protection Authority, using a third-party encryption system.

Illumina SNP Chip Genotyping
Genotyping was performed with methods previously described [11]. Icelandic chip-typed samples were assayed with the Illumina HumanHap300, HumanCNV370, HumanHap610, HumanHap1M, HumanHap660, Omni-1, Omni 2.5 or Omni Express bead chips at deCODE genetics. SNPs were excluded if they had (i) yield less than 95%, (ii) minor allele frequency less than 1% in the population, (iii) significant deviation from Hardy-Weinberg equilibrium in the controls (P < 0.001), (iv) an excessive inheritance error rate (over 0.001) or (v) a substantial difference in allele frequency between chip types (in which case the SNP was removed from just a single chip if that resolved all differences, but from all chips otherwise). All samples with a call rate below 97% were excluded from the analysis. For the HumanHap series of chips, 304,937 SNPs were used for long range phasing, whereas for the Omni series of chips 564,196 SNPs were included. The final set of SNPs used for long-range phasing was composed of 707,525 SNPs.
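
The SNP-level exclusion criteria (i) to (iv) can be sketched as a simple predicate. This is only an illustration under stated assumptions: the field names and the `passes_qc` helper are hypothetical, the Hardy-Weinberg test is assumed to be a 1-df chi-square test (the text does not say which test was used), the minor allele frequency is computed from the same genotype counts rather than from the whole population, and criterion (v), the comparison between chip types, is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical field names)."""
    yield_rate: float              # fraction of samples with a genotype call
    n_aa: int                      # control genotype counts: first homozygote
    n_ab: int                      # heterozygotes
    n_bb: int                      # second homozygote
    inheritance_error_rate: float  # rate of Mendelian inheritance errors


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square Hardy-Weinberg test (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def minor_allele_frequency(s):
    n = s.n_aa + s.n_ab + s.n_bb
    f = (2 * s.n_bb + s.n_ab) / (2 * n)
    return min(f, 1.0 - f)


def passes_qc(s):
    """Apply thresholds (i)-(iv) quoted in the text."""
    return (
        s.yield_rate >= 0.95
        and minor_allele_frequency(s) >= 0.01
        and hwe_chi2_pvalue(s.n_aa, s.n_ab, s.n_bb) >= 0.001
        and s.inheritance_error_rate <= 0.001
    )
```

A SNP with genotype counts 2500/5000/2500 in controls, 99% yield and no inheritance errors passes this check, whereas the same SNP with counts 5000/0/5000 fails on the Hardy-Weinberg criterion.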

Single track SNP assay genotyping
Single SNP genotyping using the Centaurus (Nanogen) single track genotyping assay [1] was done to verify the accuracy of the imputation of the TERT rs2736100_C variant in the Icelandic samples.

Whole Genome Sequencing
Paired-end libraries for sequencing were prepared according to the manufacturer’s instructions (Illumina, TruSeq™). Whole genome sequencing was performed for 2,230 Icelanders, selected for various conditions. All of the individuals were sequenced at a depth of at least 10X (average sequencing depth = 22X).
Template DNA fragments were hybridized to the surface of flow cells (GA PE cluster kit (v2) or HiSeq PE cluster kits (v2.5 or v3)) and amplified to form clusters using the Illumina cBot. In brief, DNA (2.5-12 pM) was denatured, followed by hybridization to grafted adaptors on the flow cell. Isothermal bridge amplification using Phusion polymerase was then followed by linearization of the bridged DNA, denaturation, blocking of 3’ ends and hybridization of the sequencing primer. Sequencing-by-synthesis (SBS) was performed on Illumina GAIIx and/or HiSeq 2000 instruments. Paired-end libraries were sequenced at 2 x 101 (HiSeq) or 2 x 120 (GAIIx) cycles of incorporation and imaging using the appropriate TruSeq™ SBS kits. Each library or sample was initially run on a single GAIIx lane for QC validation followed by further sequencing on either GAIIx (≥ 4 lanes) or HiSeq (≥ 1 lane) with targeted raw cluster densities of 500-800 k/mm2, depending on the version of the data imaging and analysis packages (SCS2.6-2-9/RTA1.6-1.9, HCS1.3.8-1.4.8/RTA1.10.36-1.12.4.2). Real-time analysis involved conversion of image data to base-calling in real-time.

Sample preparation
Paired-end libraries for sequencing were prepared according to the manufacturer’s instructions (Illumina, TruSeq™). In short, approximately 1 µg of genomic DNA, isolated from frozen blood samples, was fragmented to a mean target size of 300 bp using a Covaris E210 instrument. The resulting fragmented DNA was end repaired using T4 and Klenow polymerases and T4 polynucleotide kinase with 10 mM dNTP, followed by addition of an ”A” base at the ends using Klenow exo fragment (3’ to 5’-exo minus) and dATP (1 mM). Sequencing adaptors containing ”T” overhangs were ligated to the DNA products, followed by agarose (2%) gel electrophoresis. Fragments of about 400-500 bp were isolated from the gels (QIAGEN Gel Extraction Kit), and the adaptor-modified DNA fragments were PCR enriched for ten cycles using Phusion DNA polymerase (Finnzymes Oy) and a PCR primer cocktail (Illumina). Enriched libraries were further purified using AMPure XP beads (Beckman-Coulter). The quality and concentration of the libraries were assessed with the Agilent 2100 Bioanalyzer using the DNA 1000 LabChip (Agilent). Barcoded libraries were stored at -20 °C. All steps in the workflow were monitored using an in-house laboratory information management system with barcode tracking of all samples and reagents.

Alignment and SNP calling
Reads were aligned to NCBI Build 36 of the human reference sequence using Burrows-Wheeler Aligner (BWA) 0.5.9 [2]. Alignments were merged into a single BAM file and marked for duplicates using Picard 1.55 (http://picard.sourceforge.net/). Only non-duplicate reads were used for the downstream analysis.
Variants were called using Genome Analysis Toolkit (GenomeAnalysisTK) 1.2-29-g0acaf2d [3] by applying base quality score recalibration and INDEL realignment, and performing SNP and INDEL discovery and genotyping using standard hard filtering [4]. Variants were annotated using SNP effect predictor (snpEff) and Genome Analysis Toolkit 1.4-9-g1f1233b, keeping only the highest-impact effect [3,5].

Genotype imputation
Long range phasing of all chip-genotyped individuals was performed with methods described previously [6,7]. In brief, phasing is achieved using an iterative algorithm which phases a single proband at a time given the available phasing information about everyone else who shares a long haplotype identically by state with the proband. Given the large fraction of the Icelandic population that has been chip-typed, accurate long range phasing is available genome-wide for all chip-typed Icelanders. SNPs and INDELs identified through sequencing were imputed into all chip-typed Icelanders who had been phased with long range phasing, using the same model as used by IMPUTE [8]. The genotype data from sequencing can be ambiguous due to low sequencing coverage. In order to phase the sequencing genotypes, an iterative algorithm was applied for each SNP with alleles 0 and 1. We let H be the long range phased haplotypes of the sequenced individuals and applied the following algorithm:
1. For each haplotype $h$ in H, use the Hidden Markov Model of IMPUTE to calculate, for every other $k$ in H, the likelihood, denoted $\gamma_{h,k}$, of $h$ having the same ancestral source as $k$ at the SNP.
2. For every $h$ in H, initialize the parameter $\theta_h$, which specifies how likely the one allele of the SNP is to occur on the background of $h$, from the genotype likelihoods obtained from sequencing. The genotype likelihood $L_g$ is the probability of the observed sequencing data at the SNP for a given individual assuming $g$ is the true genotype at the SNP. If $L_0$, $L_1$ and $L_2$ are the likelihoods of the genotypes 0, 1 and 2 in the individual that carries $h$, then set
$$ \theta_h = \frac{L_2 + \tfrac{1}{2} L_1}{L_2 + L_1 + L_0}. $$
3. For every pair of haplotypes $h$ and $k$ in H that are carried by the same individual, use the other haplotypes in H to predict the genotype of the SNP on the backgrounds of $h$ and $k$:
$$ \tau_h = \sum_{l \in H \setminus \{h\}} \gamma_{h,l}\, \theta_l \quad \text{and} \quad \tau_k = \sum_{l \in H \setminus \{k\}} \gamma_{k,l}\, \theta_l. $$
Combining these predictions with the genotype likelihoods from sequencing gives un-normalized updated phased genotype probabilities:
$$ P_{00} = (1 - \tau_h)(1 - \tau_k) L_0, \quad P_{10} = \tau_h (1 - \tau_k) \tfrac{1}{2} L_1, \quad P_{01} = (1 - \tau_h) \tau_k \tfrac{1}{2} L_1, \quad P_{11} = \tau_h \tau_k L_2. $$
Now use these values to update $\theta_h$ and $\theta_k$ to
$$ \theta_h = \frac{P_{10} + P_{11}}{P_{00} + P_{01} + P_{10} + P_{11}} \quad \text{and} \quad \theta_k = \frac{P_{01} + P_{11}}{P_{00} + P_{01} + P_{10} + P_{11}}. $$
4. Repeat step 3 while the maximum difference between iterations is greater than a convergence threshold $\epsilon$. We used $\epsilon = 10^{-7}$.
Given the long range phased haplotypes and $\theta$, the allele of the SNP on a new haplotype $h$ not in H is imputed as
$$ \sum_{l \in H} \gamma_{h,l}\, \theta_l. $$
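
The iteration can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the haplotype-sharing likelihoods γ are taken as a precomputed matrix (in the study they come from the IMPUTE hidden Markov model), the update formulas are one reading of steps 2 to 4 above, the updates are applied in place within a sweep, and the function names and the `max_iter` guard are hypothetical.

```python
def phase_snp(gamma, pairs, likelihoods, eps=1e-7, max_iter=1000):
    """Iteratively estimate theta[h], the probability of allele 1 on haplotype h.

    gamma[h][l]  -- likelihood that h shares its ancestral source with l (assumed given)
    pairs        -- list of (h, k) haplotype indices carried by the same individual
    likelihoods  -- list of (L0, L1, L2) genotype likelihoods, one per individual
    """
    n = len(gamma)
    theta = [0.0] * n
    # Step 2: initialise theta from the genotype likelihoods.
    for (h, k), (l0, l1, l2) in zip(pairs, likelihoods):
        theta[h] = theta[k] = (l2 + 0.5 * l1) / (l0 + l1 + l2)

    for _ in range(max_iter):
        max_diff = 0.0
        for (h, k), (l0, l1, l2) in zip(pairs, likelihoods):
            # Step 3: predict the allele on each background from the other haplotypes.
            tau_h = sum(gamma[h][l] * theta[l] for l in range(n) if l != h)
            tau_k = sum(gamma[k][l] * theta[l] for l in range(n) if l != k)
            # Un-normalised phased genotype probabilities.
            p00 = (1 - tau_h) * (1 - tau_k) * l0
            p10 = tau_h * (1 - tau_k) * 0.5 * l1
            p01 = (1 - tau_h) * tau_k * 0.5 * l1
            p11 = tau_h * tau_k * l2
            total = p00 + p01 + p10 + p11
            if total == 0.0:
                continue  # prediction and sequencing data are incompatible; keep old values
            new_h = (p10 + p11) / total
            new_k = (p01 + p11) / total
            max_diff = max(max_diff, abs(new_h - theta[h]), abs(new_k - theta[k]))
            theta[h], theta[k] = new_h, new_k
        # Step 4: stop once no parameter moved by more than eps.
        if max_diff <= eps:
            break
    return theta


def impute_allele(gamma_new, theta):
    """Expected allele on a new haplotype, given its sharing likelihoods with H."""
    return sum(g * t for g, t in zip(gamma_new, theta))
```

In a toy example with a heterozygous individual whose two haplotypes share ancestry with a homozygous carrier and a homozygous non-carrier respectively, the loop resolves the heterozygote's phase after a single sweep.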

Genotype imputation information
The information measure value of genotype imputation was estimated by the ratio of the variance of imputed expected allele counts and the variance of the actual allele counts:
$$ \frac{\mathrm{Var}(E(\theta \mid \text{chip data}))}{\mathrm{Var}(\theta)}, $$
where $\theta$ is the allele count. $\mathrm{Var}(E(\theta \mid \text{chip data}))$ was estimated from the observed variance of the imputed expected counts and $\mathrm{Var}(\theta)$ was estimated by $p(1 - p)$, where $p$ is the allele frequency.
In the present MPN GWAS, only variants with an information measure value > 0.9 were used. The imputed genotype information measure value for rs2736100 is 0.98. To validate the imputation we directly genotyped rs2736100 in 7 281 Icelanders by single track Centaurus genotyping assay [1]. The correlation (r2) between directly genotyped and imputed allele counts was 0.94.
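
As a small worked example of the information measure, the ratio can be computed from a list of imputed expected allele counts per haplotype. The function name is hypothetical, and the sketch assumes that the allele frequency p is estimated as the mean of the expected counts and that the population (divide-by-n) variance is used.

```python
def imputation_info(expected_counts):
    """Ratio of Var(E(theta | chip data)) to p(1 - p).

    expected_counts -- imputed expected allele counts per haplotype, each in [0, 1]
    """
    n = len(expected_counts)
    p = sum(expected_counts) / n  # allele frequency estimate (assumption)
    denom = p * (1.0 - p)
    if denom == 0.0:
        return 0.0  # monomorphic SNP: no information to measure
    var_imputed = sum((e - p) ** 2 for e in expected_counts) / n
    return var_imputed / denom


def keep_variant(expected_counts, threshold=0.9):
    """Apply the > 0.9 information filter quoted in the text."""
    return imputation_info(expected_counts) > threshold
```

Imputation that is certain about every haplotype (all expected counts equal to 0 or 1) gives a value of 1, and imputation that returns the allele frequency for every haplotype gives 0.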

Familial imputation (in-silico genotyping)
In addition to imputing sequence variants from the whole genome sequencing effort into chip genotyped individuals, we also performed a second imputation step where genotypes were imputed into relatives of chip genotyped individuals, creating in-silico genotypes. The inputs into the second imputation step are the fully phased (in particular every allele has been assigned a parent of origin) imputed and chip type genotypes of the available chip typed individuals. The algorithm used to perform the second imputation step consists of:
1. For each ungenotyped individual (the proband), find all chip genotyped individuals within two meioses of the individual. The six possible types of two-meiosis relatives of the proband are (ignoring more complicated relationships due to pedigree loops): parents, full and half siblings, grandparents, children and grandchildren. If all pedigree paths from the proband to a genotyped relative go through other genotyped relatives, then that relative is excluded. For example, if a parent of the proband is genotyped, then the proband’s grandparents through that parent are excluded. If the number of meioses in the pedigree around the proband exceeds a threshold (we used 12), then relatives are removed from the pedigree until the number of meioses falls below 12, in order to reduce computational complexity.
2. At every point in the genome, calculate the probability for each genotyped relative sharing with the proband based on the autosomal SNPs used for phasing. A multipoint algorithm based on the hidden Markov model Lander-Green multipoint linkage algorithm using fast Fourier transforms is used to calculate these sharing probabilities [9,10]. First, single point sharing probabilities are calculated by dividing the genome into 0.5 cM bins and using the haplotypes over these bins as alleles. If there are informative haplotypes in the pedigree around the proband, denote by $v$ the inheritance vector (sharing pattern) [9]. Haplotypes that are the same, except at most at a single SNP, are treated as identical. Given the haplotype frequencies in each bin, the single point distribution of $v$ can be calculated as in classical multipoint linkage analysis [9]. When the haplotypes in the pedigree are incompatible over a bin, a uniform probability distribution was used for that bin. The most common causes for such incompatibilities are recombinations in members belonging to the pedigree, phasing errors and genotyping errors. Note that since the input genotypes are fully phased, the single point information is substantially more informative than for unphased genotypes; in particular, one haplotype of the parent of a genotyped child is always known. The single point distributions are then convolved using the multipoint algorithm to obtain multipoint sharing probabilities at the center of each bin, just as in the original Lander-Green algorithm [9]. Genetic distances were obtained from the most recent version of the deCODE genetic map [11].
3. Based on the sharing probabilities at the center of each bin, all the SNPs from the whole genome sequencing are imputed into the proband. Consider imputing the genotype of the paternal allele of a SNP located at $x$, flanked by bins with centers at $x_{\mathrm{left}}$ and $x_{\mathrm{right}}$. Starting with the left bin, going through all possible inheritance vectors $v$, let $I_v$ be the set of haplotypes of genotyped individuals that share identically by descent within the pedigree with the proband’s paternal haplotype given the inheritance vector $v$, let $P(v)$ be the probability of $v$ at the left bin (this is the output from step 2 above), and let $e_i$ be the expected allele count of the SNP for haplotype $i$. Then
$$ e_v = \frac{\sum_{i \in I_v} e_i}{|I_v|} $$
is the expected allele count of the paternal haplotype of the proband given $v$, and an overall estimate of the allele count given the sharing distribution at the left bin is obtained from
$$ e_{\mathrm{left}} = \sum_v P(v)\, e_v. $$
If $I_v$ is empty then no relative shares with the proband’s paternal haplotype given $v$ and thus there is no information about the allele count. We therefore store the probability that some genotyped relative shared the proband’s paternal haplotype,
$$ O_{\mathrm{left}} = \sum_{v :\, I_v \neq \emptyset} P(v), $$
and an expected allele count, conditional on the proband’s paternal haplotype being shared by at least one genotyped relative:
$$ c_{\mathrm{left}} = \frac{\sum_{v :\, I_v \neq \emptyset} P(v)\, e_v}{\sum_{v :\, I_v \neq \emptyset} P(v)}. $$
In the same way calculate $O_{\mathrm{right}}$ and $c_{\mathrm{right}}$. Linear interpolation is then used to get estimates at the SNP from the two flanking bins:
$$ O = O_{\mathrm{left}} + \frac{x - x_{\mathrm{left}}}{x_{\mathrm{right}} - x_{\mathrm{left}}} (O_{\mathrm{right}} - O_{\mathrm{left}}), \qquad c = c_{\mathrm{left}} + \frac{x - x_{\mathrm{left}}}{x_{\mathrm{right}} - x_{\mathrm{left}}} (c_{\mathrm{right}} - c_{\mathrm{left}}). $$
If $\theta$ is an estimate of the population frequency of the SNP then $Oc + (1 - O)\theta$ is an estimate of the allele count for the proband’s paternal haplotype. Similarly, an expected allele count can be obtained for the proband’s maternal haplotype.
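
Step 3 can be illustrated with a short sketch. It assumes that the sharing probabilities from step 2 are already available for the two flanking bins as lists of (P(v), expected allele counts of the haplotypes sharing with the proband under v); the function names and this input layout are hypothetical and chosen only for the illustration.

```python
def bin_summary(patterns):
    """Summarise one bin.

    patterns -- list of (prob_v, counts_v), where counts_v holds the expected allele
                counts e_i of the genotyped haplotypes that share with the proband's
                paternal haplotype under inheritance vector v (empty if none share)
    Returns (O, c): the probability that some genotyped relative shares the haplotype,
    and the expected allele count conditional on such sharing.
    """
    shared_prob = 0.0
    weighted_count = 0.0
    for prob_v, counts_v in patterns:
        if counts_v:  # inheritance vectors without sharing carry no information
            e_v = sum(counts_v) / len(counts_v)
            shared_prob += prob_v
            weighted_count += prob_v * e_v
    c = weighted_count / shared_prob if shared_prob > 0 else 0.0
    return shared_prob, c


def interpolate(x, x_left, x_right, left_value, right_value):
    """Linear interpolation between the centers of the two flanking bins."""
    w = (x - x_left) / (x_right - x_left)
    return left_value + w * (right_value - left_value)


def paternal_allele_count(x, x_left, x_right, left_patterns, right_patterns, pop_freq):
    """Expected allele count O*c + (1 - O)*theta for the proband's paternal haplotype."""
    o_left, c_left = bin_summary(left_patterns)
    o_right, c_right = bin_summary(right_patterns)
    o = interpolate(x, x_left, x_right, o_left, o_right)
    c = interpolate(x, x_left, x_right, c_left, c_right)
    return o * c + (1.0 - o) * pop_freq
```

When no relative shares the haplotype at either bin, O is 0 and the estimate falls back to the population frequency, which is the behaviour the last formula of step 3 describes.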

Association testing
Logistic regression was used to test for association between SNPs and disease, treating disease status as the response and expected genotype counts from imputation or allele counts from direct genotyping as covariates in Iceland, as described previously [12]. Testing was performed using the likelihood ratio statistic.
Multivariate logistic regression analysis was performed conditioning on a given marker by adjusting for the estimated allele count on the basis of imputation of this marker in Iceland. The genomic control correction factor was the same as used for the unadjusted association analysis. A forward selection multiple logistic regression model was used to further define the extent of the genetic association. Briefly, all imputed SNPs located within the interval of 500 kb were tested for possible incorporation into a multiple-regression model. In a stepwise fashion, a SNP was added to the model if it had the smallest P-value among all SNPs not yet included in the model and if it had a P-value below the locus-wide significance threshold.
To account for the relatedness and stratification within our case and control sample sets, we applied the method of genomic control based on chip markers. For the MPN versus control comparison, the correction factor based on the genomic control was 1.11.
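
The effect of the genomic control correction on a single test can be sketched as follows. The sketch assumes the standard form of genomic control, in which the 1-df chi-square (likelihood ratio) statistic is divided by the correction factor before the P-value is computed; the function names are hypothetical.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def genomic_control_pvalue(stat, correction_factor=1.11):
    """P-value after dividing the test statistic by the genomic control factor."""
    return chi2_1df_pvalue(stat / correction_factor)
```

With a correction factor above 1, every corrected P-value is larger than the uncorrected one, so the correction makes the association tests more conservative.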

Real-time quantitative PCR assay for JAK2V617F
The somatic JAK2V617F mutation was screened for using a real-time quantitative PCR assay performed as described previously [13]. Briefly, PCR amplification and detection were performed on an ABI Prism 7900HT Sequence Detection System (Applied Biosystems) with an initial step of 10 minutes at 95°C, followed by 40 cycles of 15 seconds at 95°C and 1 minute at 60°C. DNA from a healthy JAK2V617F non-carrier and from a homozygous JAK2V617F carrier, as determined by Sanger sequencing, were mixed in various proportions to generate a standard curve for JAK2V617F/JAK2Total against ΔCt (CtJAK2V617F - CtJAK2WT) to estimate JAK2V617F somatic allele burden. All samples were measured in duplicate, and the mean ΔCt was used to calculate JAK2V617F/JAK2WT.
Based on this measurement, we classified individuals as having positive JAK2V617F somatic mutation status if the allele burden was 5% or higher. In addition, we correlated the number of copies (0, 1 or 2) of each of the two germline MPN risk alleles (rs2736100_C and rs1034072_A) with JAK2V617F somatic allele burden (ranging from 0% to 85%) after adjusting for time between MPN diagnosis and blood sampling.
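
How a standard curve turns a mean ΔCt into an allele burden can be sketched as below. The functional form is an assumption made for the illustration and is not taken from the cited protocol: log2 of the mutant-to-wild-type ratio is modelled as a straight line in ΔCt, fitted by least squares to the mixed standards, and the burden is the mutant fraction of total JAK2. All function names are hypothetical; only the 5% positivity threshold comes from the text.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_standard_curve(standard_fractions, standard_delta_ct):
    """Fit log2(mutant/wild-type) against delta Ct for standards with known fractions.

    standard_fractions -- known mutant fractions strictly between 0 and 1
    """
    log_ratios = [math.log2(f / (1.0 - f)) for f in standard_fractions]
    return fit_line(standard_delta_ct, log_ratios)


def allele_burden(delta_ct_duplicates, curve):
    """Estimated JAK2V617F fraction of total JAK2 from duplicate delta Ct values."""
    intercept, slope = curve
    mean_delta_ct = sum(delta_ct_duplicates) / len(delta_ct_duplicates)
    ratio = 2.0 ** (intercept + slope * mean_delta_ct)
    return ratio / (1.0 + ratio)


def is_jak2_positive(burden, threshold=0.05):
    """Positive mutation status if the allele burden is 5% or higher."""
    return burden >= threshold
```

Under ideal amplification efficiency the fitted slope is close to -1, so each additional cycle of ΔCt halves the estimated mutant-to-wild-type ratio.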

References
1. Kutyavin IV, Milesi D, Belousov Y, Podyminogin M, Vorobiev A et al. A novel endonuclease IV post-PCR genotyping system. Nucleic Acids Res 34, e128 (2006).
2. Li, H. & Durbin, R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics (Oxford, England) 25, 1754-1760 (2009).
3. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research 20, 1297-1303 (2010).
4. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics 43, 491-498 (2011).
5. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80-92 (2012).
6. Kong, A. et al. Detection of sharing by descent, long-range phasing and haplotype imputation. Nature genetics 40, 1068-1075 (2008).
7. Kong, A. et al. Parental origin of sequence variants associated with complex diseases. Nature 462, 868-874 (2009).
8. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nature genetics 39, 906-913 (2007).
9. Lander, E. S. & Green, P. Construction of multilocus genetic linkage maps in humans. Proceedings of the National Academy of Sciences of the United States of America 84, 2363-2367 (1987).
10. Kruglyak, L. & Lander, E. S. Faster multipoint linkage analysis using fourier transforms. Journal of computational biology: a journal of computational molecular cell biology 5, 1-7 (1998).
11. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099-1103 (2010). PMID: 20981099.
12. Helgason, H. et al. A rare nonsynonymous sequence variant in c3 is associated with high risk of age-related macular degeneration. Nature genetics (2013).
13. Levine RL, Belisle C, Wadleigh M, Zahrieh D, Lee S, Chagnon P. X-inactivation-based clonality analysis and quantitative JAK2V617F assessment reveal a strong association between clonality and JAK2V617F in PV but not ET/MMM, and identifies a subset of JAK2V617F-negative ET and MMM patients with clonal hematopoiesis. Blood 107, 4139-4141 (2006).