Supplementary Materials for
Haplotype-based approach for noninvasive prenatal tests of Duchenne muscular
dystrophy (DMD) using cell-free fetal DNA in maternal plasma
Yan Xu, MS1,2﹟; Xuchao Li, MS4﹟; Huijuan Ge, ME4; Bing Xiao, MD1,2; Yanyan
Zhang, MS4; Xiao-Min Ying, BS1,2; Xiaoyu Pan, BE4; Lei Wang, MD1,3; Weiwei Xie,
BM4; Lin Ni, BS1,2; Shengpei Chen, BE4; Wen-Ting Jiang, MS1,2; Ping Liu, MM4; Hui
Ye, BS1,2; Ying Cao, BS1,2; Jing-Min Zhang, MD1,2; Yu Liu, BS1,2; Zu-Jing Yang,
MD2,3; Ying-Wei Chen, MD1,2; Fang Chen, MS4*; Hui Jiang, MS4*; and Xing Ji,
MS1,2*
1. Methods
Identification of the underlying mutations in the proband and the mother by
multiplex ligation-dependent probe amplification (MLPA)
Large deletions/duplications were detected by MLPA using SALSA MLPA kits
P034 and P035 DMD (MRC-Holland). The analysis was performed according to the
manufacturer’s recommendations. The FAM-labeled PCR products were separated by
capillary electrophoresis on an ABI Prism 3730 Genetic Analyzer (Applied
Biosystems) using ROX 500 as the size standard. The data were analyzed using the
Microsoft Excel software. Sanger sequencing and qPCR were used to confirm the
abnormal reading from a single probe to exclude the possibility of a SNP under a
probe or primer binding site.
qPCR
For each family, the gDNA of the parents, proband and fetus was used for
prenatal analysis. The DNA copy numbers for specific exons were determined using
the DNA-binding dye SYBR Green I. The reference gene ALB, which was
quantified simultaneously in separate tubes, was used to correct for possible
variation in the DNA input amounts. The normal male and female control samples
were pooled DNA obtained from 10 normal males and 10 normal females,
respectively. The amplification mixtures (20 μL) contained 10 μL of SYBR Green I
master mix (Takara), 0.2 μM each primer, 10 nM ROX fluorescein, 3.8 μL of
DNase/RNase-free water and 10 ng of template DNA. The no-template control (NTC)
included DNase/RNase-free water instead of DNA. The cycling conditions were as
follows: 30 s at 95°C, followed by 40 cycles of 95°C for 5 s and 60°C for 34 s.
Each sample was run in triplicate on an ABI 7500 machine (Applied Biosystems)
with an SD < 0.15. The
results were analyzed using the ABI 7500 software. The primers used for qPCR are
listed in Table S1.
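The text does not state how the exon copy number was derived from the qPCR readings beyond the use of the ABI 7500 software. As an illustration only, the following sketch assumes the widely used 2^-ΔΔCt calculation, with ALB as the reference and the pooled normal DNA as the calibrator. The function name, its arguments, and the reading of "SD < 0.15" as the standard deviation of the triplicate Ct values are assumptions, not details taken from the study.

```python
from statistics import mean, stdev


def relative_copy_number(target_ct, ref_ct, control_target_ct, control_ref_ct,
                         max_sd=0.15):
    """Relative exon copy number by the 2^-ddCt method (an assumed approach).

    Each argument is a list of triplicate Ct values:
      target_ct / ref_ct                 -> test sample, DMD exon / ALB
      control_target_ct / control_ref_ct -> pooled normal control, DMD exon / ALB
    """
    # Reject replicate sets whose spread reaches the assumed SD threshold.
    for replicates in (target_ct, ref_ct, control_target_ct, control_ref_ct):
        if stdev(replicates) >= max_sd:
            raise ValueError("triplicate SD is not below the threshold")

    # Normalise the exon signal to ALB to correct for DNA input differences.
    d_ct_sample = mean(target_ct) - mean(ref_ct)
    d_ct_control = mean(control_target_ct) - mean(control_ref_ct)

    # Ratio of the sample to the normal control.
    return 2.0 ** -(d_ct_sample - d_ct_control)
```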
Sanger sequencing
The specific primers for amplifying exon 67 of the DMD gene and SRY gene
were designed using Primer 3.0 (Table S1). DNA was amplified in a 20-μL reaction
volume, including 10 μL of PCR mix (2× HS Taq PCR Mix, TransGen Biotech,
TAKARA), 0.2 μM each primer, 100 ng of genomic DNA and 7.2 μL of
DNase/RNase-free water. The PCR cycling conditions were as follows: initial
denaturation at 95°C for 5 min, followed by 35 cycles at 94°C for 30 s, 55°C for 30 s,
and 72°C for 30 s, and a final extension at 72°C for 10 min. Sequence analysis was
performed using an ABI 3730XL DNA Analyzer (Applied Biosystems, USA). The
primers used for PCR are listed in Table S1.
Linkage analysis
The cytogenetic locations of these markers as well as the length of the amplified
products were obtained from the Human Genome Database and the Marshfield
Medical Center database. According to the identified mutation in the DMD gene, four
closely linked microsatellites among DXS1235, DXS1236, DXS1237, DXS1238,
DXS1241, DXS1242, DXS1214, DXS992, and STR07A were selected to determine
the haplotype of the fetus and to exclude maternal contamination. The sense primers
were labeled with FAM fluorophores, and the PCR products were separated by
capillary electrophoresis. The data were collected and analyzed using a 3730XL
genetic analyzer (Applied Biosystems).
Short reads alignment and parental SNP calling
The short reads that were generated using Illumina HiSeq 2000 sequencing were
mapped to the human reference genome (NCBI 37) using SOAP2. Then, we
performed SNP calling using SOAPsnp with the default parameters. Filters (Q>20 &
depth≥8) were set to guarantee the accuracy of the parental and probands’ genotypes.
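The genotype filter described above (Q>20 and depth≥8) can be sketched as follows. The `SnpCall` record and its field names are hypothetical and do not correspond to the actual SOAPsnp output columns.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical record for one called site (not the SOAPsnp format)."""
    chrom: str
    pos: int
    genotype: str
    quality: int
    depth: int


def passes_filter(call, min_quality=20, min_depth=8):
    """Keep a call only if quality > 20 and depth >= 8, as stated in the text."""
    return call.quality > min_quality and call.depth >= min_depth
```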
Haplotyping in parent-offspring trios
We constructed the haplotypes using a trio-based strategy. For chromosome X, the
parental and proband’s haplotypes were inferred from the genotype information of
the trio, as imposed by Mendel’s laws. For example, suppose the genotype of the
father is “A”, that of the mother is “AT” and that of the proband is “AT”. In
this case, the “A” must be inherited from the father, and the “T” must be
inherited from the mother. Here, we defined the parental allele that was passed
to the offspring as haplotype 0 and the other as haplotype 1. Thus, we could
phase the “A” to haplotype 0 and the “T” to haplotype 1 in the mother, whereas
phasing the father was much easier because of the haploid status of
chromosome X.
Calculation of the overall sequencing error rate
Sequencing errors can originate during both PCR-based library construction and
next-generation sequencing. At loci where both parents were homozygous for the
same allele, the fetal genotype must also be homozygous for that allele unless a
de novo mutation has occurred. As de novo mutations are extremely rare,
occurring 18-74 times per offspring [1], we assumed that all the discordant
bases arose from sequencing error. Thus, we calculated the sequencing error rate
using the SNP sites on chromosome 22 that were homozygous with the same genotype
in both parental genomes but showed different bases in the plasma. This rate is
an important parameter in the following mathematical model.
Haplotyping of the fetus based on the HMM
1. Basic denotation
The number of loci on a given chromosome was denoted as $N_c$, while the total
number of loci was denoted as $N^*$. The haplotypes of the father and mother
were recorded as $FH = \{fh_0\}$ and $MH = \{mh_0, mh_1\}$, respectively, where
$mh_k = \{m_{i,k}\}$, $fh_k = \{f_{i,k}\}$, $k \in \{0,1\}$,
$i = 1, 2, 3, \ldots, N_c$, and $\forall f_{i,k}, m_{i,k} \in \{A, C, G, T\}$.
The unknown fetal haplotype was denoted as $H = \{h_0, h_1\}$, where
$h_0 = \{m_{i,x_i}\}$ and $h_1 = \{f_{i,y_i}\}$. Therefore, $q_i = \{x_i, y_i\}$
was the hidden state that we needed to decipher, and the set of potential hidden
states was denoted as $Q$.
In maternal plasma sequencing, we denoted the sequenced bases as $S = \{s_i\}$,
where $s_i = \{n_{i,A}, n_{i,C}, n_{i,G}, n_{i,T}\}$ indicates the sequencing
depth of each base at locus $i$. For other parameters in the maternal plasma,
the average cff-DNA concentration and the average sequencing error were denoted
as $\varepsilon$ and $e$, respectively.
2. Initial state distribution $\pi = \{\pi_j\}$, $j \in Q$. Due to the lack of
prior probability, we defined $\pi_j = \Pr(q_1 = j) = \frac{1}{2}$, representing
the same initial probability of each hidden state.
3. Transition probabilities matrix $A = \{a_{jk}\}$ ($j, k \in Q$), where
$$a_{jk} = \Pr(q_i = k \mid q_{i-1} = j) =
\begin{cases}
1 - p_r, & x_i = x_{i-1},\ y_i = y_{i-1} \\
p_r, & x_i \neq x_{i-1},\ y_i = y_{i-1}
\end{cases}$$
$p_r = r_e / N^*$, and $r_e$ was the average frequency of recombination between
gametes, where we used $r_e = 30$ for the whole genome.
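As a minimal sketch of the transition matrix, the code below assumes $p_r = r_e / N^*$ and two hidden states (the index of the transmitted maternal haplotype), which matches the initial probability of 1/2 per state given above. The function name and the example locus count are hypothetical.

```python
def transition_matrix(n_total_loci, r_e=30.0):
    """Two-state transition matrix for the maternal haplotype index.

    p_r = r_e / N* is the per-locus probability of switching haplotypes,
    where r_e is the number of recombination events assumed genome-wide.
    """
    p_r = r_e / n_total_loci
    if not 0.0 <= p_r <= 1.0:
        raise ValueError("r_e / N* must be a probability")
    # Row j, column k: probability of moving from state j to state k.
    return [[1.0 - p_r, p_r],
            [p_r, 1.0 - p_r]]
```

For example, with a hypothetical 3,000 loci, `transition_matrix(3000)` gives a switching probability of 0.01 between adjacent loci.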
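4. Observation symbol probabilities matrix $B = \{b_{i,j}(s_i)\}$ ($j \in Q$),
where
$$b_{i,j}(s_i) = \Pr(s_i \mid q_i = j, \{m_0, m_1\})
= \frac{(n_{i,A} + n_{i,C} + n_{i,G} + n_{i,T})!}{n_{i,A}!\, n_{i,C}!\, n_{i,G}!\, n_{i,T}!}
\times (P_{i,A})^{n_{i,A}} \times (P_{i,C})^{n_{i,C}} \times (P_{i,G})^{n_{i,G}} \times (P_{i,T})^{n_{i,T}}$$
$$P_{i,base} = \Pr(base \mid q_i = j, \{m_0, m_1\})
= \sum_{k \in \{0,1\}} \frac{1}{2}(1 - \varepsilon)\,\Delta(base, m_k)
+ \frac{1}{2}\,\varepsilon \times \Delta(base, m_{x_i})
+ \frac{1}{2}\,\varepsilon \times \Delta(base, f_{y_i})$$
with $\varepsilon$ the average cff-DNA concentration and $e$ the average
sequencing error, and the indicator function
$$\Delta(x, y) =
\begin{cases}
1 - e, & x = y \\
e / 3, & x \neq y
\end{cases}$$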
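The observation probability can be sketched as a multinomial likelihood evaluated in log space, which avoids overflow in the factorials at high depth. The symbols follow the notation above, reading the cff-DNA concentration as `eps` and the sequencing error as `e`. The function names are hypothetical, the hidden state is reduced to the maternal haplotype index `x` with a single paternal allele `f`, and `e` is assumed to be greater than zero so that every logarithm is defined.

```python
from math import lgamma, log

BASES = "ACGT"


def delta(x, y, e):
    """Probability of reading base x when the true base is y."""
    return 1.0 - e if x == y else e / 3.0


def base_probs(m0, m1, f, x, eps, e):
    """Expected frequency of each base in plasma for hidden state x."""
    maternal = (m0, m1)
    probs = {}
    for b in BASES:
        # Maternal DNA: fraction (1 - eps), split evenly over both haplotypes.
        p = sum(0.5 * (1.0 - eps) * delta(b, mk, e) for mk in maternal)
        # Fetal DNA: fraction eps, half maternally and half paternally inherited.
        p += 0.5 * eps * delta(b, maternal[x], e)
        p += 0.5 * eps * delta(b, f, e)
        probs[b] = p
    return probs


def log_emission(counts, probs):
    """Log multinomial probability of the observed base counts."""
    n = sum(counts[b] for b in BASES)
    result = lgamma(n + 1)
    for b in BASES:
        result -= lgamma(counts[b] + 1)
        result += counts[b] * log(probs[b])
    return result
```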
5. Viterbi algorithm [3]
(1) Initialization: $\delta_1(q_1) = \pi_{q_1} \cdot b_{1,q_1}(s_1)$
(2) Iteration
$$\delta_i(q_i) = \max_{q_{i-1} \in Q} \left( \delta_{i-1}(q_{i-1}) \cdot a_{q_{i-1} q_i} \right) b_{i,q_i}(s_i)$$
$$\Psi_i(q_i) = \arg\max_{q_{i-1} \in Q} \delta_{i-1}(q_{i-1}) \cdot a_{q_{i-1} q_i}$$
(3) Termination and backtracking
The final optimized hidden state: $q^*_{N_c} = \arg\max_{q_{N_c} \in Q} \delta_{N_c}(q_{N_c})$
The optimized path: $q^*_{i-1} = \Psi_i(q^*_i)$, $i = 2, 3, \ldots, N_c$
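The three Viterbi steps can be sketched as follows. This is a generic log-space implementation, not the authors' code: it takes precomputed log initial, transition and emission probabilities, and the function and argument names are hypothetical.

```python
def viterbi(log_pi, log_a, log_b):
    """Most likely hidden-state path.

    log_pi[j]   : log initial probability of state j
    log_a[j][k] : log transition probability from state j to state k
    log_b[i][j] : log emission probability of locus i under state j
    """
    n_states = len(log_pi)
    # (1) Initialization
    delta = [[log_pi[j] + log_b[0][j] for j in range(n_states)]]
    psi = [[0] * n_states]
    # (2) Iteration: keep the best predecessor of every state at every locus
    for i in range(1, len(log_b)):
        row, back = [], []
        for k in range(n_states):
            best_j = max(range(n_states),
                         key=lambda j: delta[i - 1][j] + log_a[j][k])
            back.append(best_j)
            row.append(delta[i - 1][best_j] + log_a[best_j][k] + log_b[i][k])
        delta.append(row)
        psi.append(back)
    # (3) Termination and backtracking from the last locus to the first
    path = [max(range(n_states), key=lambda j: delta[-1][j])]
    for i in range(len(log_b) - 1, 0, -1):
        path.append(psi[i][path[-1]])
    path.reverse()
    return path
```

The returned list holds one state index per locus; in this setting a change of index between adjacent loci would correspond to an inferred recombination point in the transmitted maternal haplotype.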
2. Figure
Figure S1. Pedigrees and the inherited mutations that were identified in the DMD
gene for the eight analyzed families. The male probands are indicated by arrows. The
inherited mutations and the week of gestation (wk) of the mother (pregnant) for each
family are shown in the figure. The mothers in these families were mutation carriers
with genotypes of one mutant allele and one wild-type allele.
3. Tables
Table S1. Primer sequences used for PCR/qPCR in this study

| Location | Forward primer (5'-3') | Reverse primer (5'-3') | Product length | Test |
|---|---|---|---|---|
| DMD E2 | TCATAATGGAAAGTTACTTTGGTTG | CACAGGTACATAGTCCATTTTGAAA | 219 bp | qPCR |
| DMD E17 | ACAATTTTATTTGGCTTCAATATGG | GACATTACAGGTACCCGAGGATT | 448 bp | qPCR |
| DMD E22 | GGCAAAGTGTGAAACAATTAAGTG | TGGGCAAACTACCATACTTGTCAGAAT | 317 bp | qPCR |
| DMD E23 | TCATCTACTTTGTTTACATGTTTGAA | ACAGTGTATCGTTAGGGAAAAA | 397 bp | qPCR |
| DMD E45 | TGTCTTTCTGTCTTGTATCCTTTGG | CTGCTAAAATGTTTTCATTCCTATTAGA | 399 bp | qPCR |
| DMD E47 | GATAGACTAATCAATAGAAGCAAAG | GGGAGGAGGCTGGTATGTG | 342 bp | qPCR |
| DMD E56 | TCCAAATTCACATTCATCGC | CCAGTTACTTGTGCTAAGACAATGAG | 329 bp | qPCR |
| ALB E12 | AGCTATCCGTGGTCCTGAAC | TTCTCAGAAAGTGTGCATATATCTG | 202 bp | qPCR |
| DMD E67 | TGGCTACTCTTGAGAATTGCTACTG | CTGCCTACTGAAGAGCTAATATGAGA | 369 bp | PCR |
| SRY | CTAAGTATCAGTGTGAAACGGG | CCTTCCGACGAGGTCGATAC | 279 bp | PCR |
| DXS1235 | AAGGTTCCTCCAGTAACAGATTTGG | TATGCTACATAGTATGTCCTCAGAC | | Linkage analysis |
| DXS1236 | CGTTTACCAGCTCAAAATCTCAAC | CATATGATACGATTCGTGTTTTGC | | Linkage analysis |
| DXS1237 | GAGGCTATAATTCTTTAACTTTGGC | CTCTTTCCCTCTTTATTCATGTTAC | | Linkage analysis |
| DXS1238 | TCCAACATTGGAAATCACATTTCAA | TCATCACAAATAGATGTTTCACAG | | Linkage analysis |
| DXS1241 | TGTCTGTCTTCAGTTATATG | ATAACTTACCCAAGTCATGT | | Linkage analysis |
| DXS1242 | TCTTGATATATAGGGATTATTTGTGTTTGTTATAC | ATTATGAAACTATAAGGAATAACTCATTTAGC | | Linkage analysis |
| DXS1214 | TAGAACCCAAATGACAACCA | TAGAACCCAAATGACAACCA | | Linkage analysis |
| DXS992 | AAGAATGGGACTCCATTTCA | AAGAATGGGACTCCATTTCA | | Linkage analysis |
| STR07A | TTCTGGTTTTCTGGTCTG | TTCTGGTTTTCTGGTCTG | | Linkage analysis |
Table S2. Data production of deep sequencing for target enrichment region

| Family | Mother coverage | Mother depth | Father coverage | Father depth | Proband coverage | Proband depth | Plasma coverage | Plasma depth |
|---|---|---|---|---|---|---|---|---|
| F01 | 95.64% | 37.08 | 95.55% | 30.50 | 95.79% | 60.10 | 95.92% | 29.06 |
| F02 | 95.57% | 31.09 | 95.50% | 27.49 | 95.53% | 39.67 | 95.95% | 30.10 |
| F03 | 95.08% | 50.47 | 95.89% | 42.47 | 90.79% | 12.25 | 95.88% | 21.84 |
| F04 | 95.53% | 29.42 | 94.56% | 11.93 | 95.19% | 18.55 | 95.90% | 28.48 |
| F05 | 95.66% | 35.19 | 96.08% | 121.74 | 95.73% | 47.97 | 95.92% | 35.62 |
| F06 | 95.61% | 31.77 | 95.81% | 54.90 | 95.48% | 28.56 | 95.91% | 26.38 |
| F07 | 95.69% | 45.11 | 95.57% | 36.16 | 95.49% | 27.02 | 95.94% | 31.43 |
| F08 | 95.19% | 22.97 | 93.67% | 9.13 | 95.02% | 22.12 | 95.86% | 21.50 |
Table S3. The inferred SNP genotypes compared with direct fetal gDNA sequencing data on maternal chromosome X and the DMD gene region

| Family | Heterozygous SNP sites, chrX | Heterozygous SNP sites, DMD | Total SNP sites, chrX | Total SNP sites, DMD |
|---|---|---|---|---|
| F01 | 5663 (92.35%) | 265 (71.70%) | 5,580,677 (99.99%) | 156,597 (99.95%) |
| F02 | 5859 (78.53%) | 276 (78.62%) | 5,587,961 (99.98%) | 141,948 (99.96%) |
| F03 | 2005 (84.34%) | 62 (100.00%) | 2,413,321 (99.99%) | 46,402 (100.00%) |
| F04 | 5164 (78.78%) | 285 (100.00%) | 5,087,381 (99.98%) | 147,877 (100.00%) |
| F05 | 5700 (89.46%) | 233 (100.00%) | 5,638,886 (99.99%) | 154,455 (100.00%) |
| F06 | 5241 (87.67%) | 249 (87.95%) | 5,251,571 (99.99%) | 150,469 (99.98%) |
| F07 | 6165 (87.67%) | 250 (77.20%) | 5,620,763 (99.99%) | 160,426 (99.96%) |
| F08 | 4685 (82.18%) | 208 (79.81%) | 4,612,512 (99.98%) | 112,682 (99.96%) |
Reference
1. Francioli LC, Menelaou A, Pulit SL, et al. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 2014;46(8):818-825.