Supplementary Information

Model
We are interested in modeling the statistical limitations of cell-free DNA based
diagnostics for Mendelian and mainly recessive diseases. Since the amount of fetal DNA
in maternal blood varies widely and increases over the course of pregnancy, it is
uncertain whether most early gestation cases will have sufficient fetal DNA to derive a
statistically meaningful result.
In selecting the correct model, we think of the different DNA alleles of a target
region the same as different-colored discrete marble balls in an effectively infinitely large
bag. The bag, representing the cell-free portion of the circulatory system, would be well
mixed and have many copies of both marbles or alleles (as an estimate, one target would
specifically have about 1000 copies/mL * 5000 mL plasma = 5 million copies in the
entire plasma circulation). A standard blood draw with a subsequent unbiased counting
measurement would be similar to blindly pouring marbles out of the bag and then counting
the marbles of each color. We assume that drawing a particular allele or marble
does not affect the probability of which marbles are drawn next. We can then assume
that the counts of the two marbles/alleles each follow a Poisson distribution and
are independent of each other.
To aim at the theoretical maximum, we assume perfect technical execution with
no bias in the process to either allele. We also assume that the mother is a carrier and is
therefore heterozygous at the mutation site. The calculations based on Poisson
distributions of finite DNA allele counts of a target allele are below:
Definitions:
$\varepsilon$ = fetal fraction = $\dfrac{\text{fetal DNA}}{\text{total DNA}}$,
NT = # total alleles = NM + NW,
NM = # mutant alleles,
NW = # wildtype alleles
Figure S1: Cell-free DNA can be divided into DNA from a fetal cell origin or maternal
cell origin. The fraction that is fetal cell derived is fetal fraction (diagonal lines). In any
scenario, the mother contributes equal amounts of both alleles. A fetus that is
heterozygous or a silent carrier (and unaffected) also contributes equal amounts of DNA.
However a fetus that is homozygous contributes all its DNA or ε * NT additional DNA to
the homozygous allele count.
Based on Figure S1, we can infer that:
If the fetus is homozygous, then on average: NM - NW = ε * NT
If the fetus is heterozygous, then on average: NM - NW = 0
We then calculate the standard deviation for each allelic count in both scenarios
and then calculate the combined standard deviation after the subtraction NM - NW (the
variance of the sum or difference of two independent normally distributed variables is
equal to the sum of the two variances). We find conveniently that in both scenarios the
standard deviation is the same, $\sqrt{N_T}$.
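Homozygous case:
$$\sigma_{N_M} = \sqrt{0.5\,N_T + 0.5\,\varepsilon N_T}$$
$$\sigma_{N_W} = \sqrt{0.5\,N_T - 0.5\,\varepsilon N_T}$$
$$\sigma_{N_M - N_W} = \sqrt{\left(\sqrt{0.5\,N_T + 0.5\,\varepsilon N_T}\right)^2 + \left(\sqrt{0.5\,N_T - 0.5\,\varepsilon N_T}\right)^2} = \sqrt{N_T}$$
Heterozygous case:
$$\sigma_{N_M} = \sigma_{N_W} = \sqrt{0.5\,N_T}$$
$$\sigma_{N_M - N_W} = \sqrt{\left(\sqrt{0.5\,N_T}\right)^2 + \left(\sqrt{0.5\,N_T}\right)^2} = \sqrt{N_T}$$
(same as the homozygous case)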
The Z-statistic or Z-score can be calculated by dividing the allelic count difference
by the common standard deviation. Because the denominator is the same in both scenarios,
the Z-scores can be compared regardless of fetal genotype. With the simplified
analytical equation below, we can generate curves relating the Z-score to the fetal
fraction (ε) and the total counts (Figure S2).
Z  score (theoretical) =
 * NT
NT
Equation S1: Theoretical average Z-score based on a Poisson approximation for a
single SNP when the fetus is homozygous.
Z  score (empirical) =
Equation S2:
NM - NW
NT
Empirically derived Z-score for a single SNP when the fetus is
homozygous.
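Equations S1 and S2 can be evaluated directly. The following is a minimal sketch, not the authors' code; the function names and the example inputs are hypothetical and chosen only to exercise the formulas.

```python
import math


def theoretical_z(fetal_fraction, total_counts):
    """Equation S1: eps * N_T / sqrt(N_T), which simplifies to eps * sqrt(N_T)."""
    return fetal_fraction * total_counts / math.sqrt(total_counts)


def empirical_z(mutant_counts, wildtype_counts):
    """Equation S2: (N_M - N_W) / sqrt(N_T), with N_T = N_M + N_W."""
    total_counts = mutant_counts + wildtype_counts
    return (mutant_counts - wildtype_counts) / math.sqrt(total_counts)


# Hypothetical inputs (not taken from Table S1).
print(theoretical_z(0.05, 10_000))
print(empirical_z(5_250, 4_750))
```

The two calls use matching inputs: counts of 5,250 and 4,750 differ by 5% of their 10,000 total, so the empirical value coincides with the theoretical one for ε = 0.05.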
The allele counts used in Equation S2 refer to the lowest molecular count reached at any
point in the sample processing and measurement method. For example, if a single
target was counted one million times after an amplification step but only 1000
molecules were present prior to amplification, then the total counts should be
renormalized to 1000 rather than 1 million, and all calculations should be based on
values recalculated on the basis of the 1000 value.
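One way to carry out this renormalization is to scale both allele counts by the same factor before applying Equation S2. The sketch below assumes proportional scaling, which the text does not specify; `renormalized_z` and its arguments are hypothetical names.

```python
import math


def renormalized_z(mutant_counts, wildtype_counts, bottleneck_total):
    """Z-score after scaling observed counts down to the bottleneck molecule count.

    Assumption: both alleles are scaled by the same factor, so their ratio
    is preserved while the total is reduced to `bottleneck_total`.
    """
    observed_total = mutant_counts + wildtype_counts
    # Never scale counts up: the bottleneck can only reduce the effective total.
    scale = min(1.0, bottleneck_total / observed_total)
    scaled_mutant = mutant_counts * scale
    scaled_wildtype = wildtype_counts * scale
    return (scaled_mutant - scaled_wildtype) / math.sqrt(scaled_mutant + scaled_wildtype)


# Hypothetical example: one million post-amplification counts, 1000 input molecules.
raw_z = (525_000 - 475_000) / math.sqrt(1_000_000)
print(raw_z, renormalized_z(525_000, 475_000, 1_000))
```

Under this assumption the Z-score shrinks by the square root of the scaling factor, which is why counts inflated by amplification overstate confidence if left uncorrected.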
The calculations here are for genetic content that is measurable. An example of
genetic content that is in the plasma but not measurable is an allele physically located at
the edge of a DNA fragment, which would therefore not be amenable to PCR amplification
(although it would be measurable with sequencing). An 80 bp amplicon will amplify only
about half of the strands containing the targeted allele, given that the allele location
is evenly distributed on the stereotypical 160 bp strand. Published efforts have shown
that shorter amplicons can effectively enrich for fetal content, presumably because fetal
DNA fragments are shorter [1].
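The "about half" figure follows from a simple geometric argument, sketched below. It assumes a single fixed fragment length, a uniformly distributed fragment position relative to the amplicon, and a continuous approximation; `amplifiable_fraction` is a hypothetical helper, not part of the original analysis.

```python
def amplifiable_fraction(fragment_length_bp, amplicon_length_bp):
    """Fraction of allele-containing fragments that span the whole amplicon.

    Assumption: the fragment's position relative to the amplicon is uniform,
    so (L - a) out of L possible offsets leave the full amplicon inside it.
    """
    if amplicon_length_bp >= fragment_length_bp:
        return 0.0
    return (fragment_length_bp - amplicon_length_bp) / fragment_length_bp


for amplicon in (40, 60, 80, 120):
    print(amplicon, amplifiable_fraction(160, amplicon))
```

Under these assumptions, shorter amplicons capture a larger share of the fragments, which is consistent with the direction of the effect described above.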
One key point here is that some samples may lack the fetal fraction and blood
quantity to reach the minimal theoretical threshold for confidence. Other samples may
reach the minimal theoretical threshold but will still have overlap between the theoretical
distributions for a homozygous and a heterozygous fetus. A measurement that falls by
chance into the overlap would be indeterminate. Plasma contains roughly 1000-2000 copies
per mL (~500-1000 copies per mL of blood), and when the fetal fraction is 5% (the average
value for the first trimester), one needs 20,000 copies, or 20 mL of blood, to achieve good
separation of the homozygous-fetus and heterozygous-fetus scenarios. At 2%, it is likely
that more than 100,000 copies (~100 mL) will be necessary to ensure that more than
99% of samples will be distinguishable, a requirement that is unlikely to be practical. In
routine practice, a tube may contain 10 mL of blood, and although several tubes are
routinely drawn for pregnant women, many of them are used for a battery of other routine
tests. While a blood donation has a much larger volume, at 1 pint or about 450-500 mL, the
logistics of transportation and sample processing are practical barriers. Fortunately, for
the vast majority of cases, it may be that only a few tubes of blood are necessary, and this
or a related model will be employed to ensure that the result is statistically confident for
one fetal genotype and does not fall into a region of overlap between the distributions of
the two fetal possibilities.
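The conversion from a target copy number to a blood volume is simple arithmetic, shown here as a small sketch. The function name is hypothetical; the concentrations and copy targets are the ones quoted in the paragraph above, and the 10 mL tube size is the routine-practice figure mentioned there.

```python
import math


def blood_volume_ml(target_copies, copies_per_ml_blood):
    """Blood volume (mL) needed to collect `target_copies` of one target."""
    return target_copies / copies_per_ml_blood


TUBE_VOLUME_ML = 10  # tube size quoted in the text

for target in (20_000, 100_000):
    for concentration in (500, 1_000):  # copies per mL of blood, range from the text
        volume = blood_volume_ml(target, concentration)
        tubes = math.ceil(volume / TUBE_VOLUME_ML)
        print(f"{target} copies at {concentration} copies/mL: {volume:.0f} mL, {tubes} tubes")
```

The 20 mL and ~100 mL figures in the text correspond to the upper end of the concentration range; at the lower end the required volume doubles.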
While we have described the case of a homozygous mutation, recessive
Mendelian diseases can occur as a compound heterozygous combination of 2 alleles on
different locations on the same gene. For example, one mutation can be a premature stop
codon on exon 2 and another could be a point mutation in exon 5 in a critical active site
of the gene’s protein. Compound heterozygous states will effectively disable both copies
of the same critical gene and can occur frequently depending on the population in
question. The same essential model and equations presented above can be used for
compound heterozygous scenarios. For these scenarios, it is critical to distinguish
between a heterozygous and a homozygous non-diseased allele state at the maternal mutation
site. If the maternal site is heterozygous and the paternal site also has a mutation, then
it would imply a disease phenotype. Measuring the paternal site is less technically
challenging and is similar to the measurement of the fetal fraction.
The model described can also be applied to multiple haplotype-linked markers.
If the markers are assumed to be 100% associated with the mutation, then the allelic
counts of each marker are summative.
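Summing counts across linked markers before applying Equation S2 can be sketched as follows. This assumes perfect linkage and no amplification bias between loci; `pooled_z` and the example counts are hypothetical.

```python
import math


def pooled_z(marker_counts):
    """Z-score from summed counts over linked markers.

    `marker_counts` is a sequence of (mutation_linked, wildtype_linked) pairs,
    one pair per marker. Assumes every marker is 100% linked to the mutation.
    """
    pairs = list(marker_counts)
    total_mutant = sum(mutant for mutant, _ in pairs)
    total_wildtype = sum(wildtype for _, wildtype in pairs)
    return (total_mutant - total_wildtype) / math.sqrt(total_mutant + total_wildtype)


one_site = [(525, 475)]      # hypothetical: 1000 counts at a single site
ten_sites = one_site * 10    # the same counts at 10 linked sites
print(pooled_z(one_site), pooled_z(ten_sites))
```

With identical counts at each site, pooling k markers raises the Z-score by a factor of the square root of k, because the count difference grows with k while the standard deviation grows only with its square root.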
Figure S2: Theoretical Z-score averages for various molecular counts and fetal fractions
if the fetus is homozygous (based on Equation S1). Heterozygous Z-scores always
average zero. Confident calls involve either a high Z-score from a homozygous fetus or a
near-zero Z-score from a heterozygous fetus. The Z-score distributions of the two fetal
genotypes will not significantly overlap when the average theoretical Z-score of a
homozygous fetus is over 4 (almost all of each distribution is within 2 Z-scores in each
direction). Thus, with a fetal fraction of 15%, 1400 counts would result in a confident
call in almost all cases; for a fetal fraction of 5%, 20,000 counts would be necessary.
However, even with some overlap between the two distributions, the empirically
counted Z-score can still fall outside the zone of overlap by chance and result in a
confident diagnosis. To take into account haplotype-linked SNPs, the counts of alleles
from other loci linked to the mutation are summative. For example, 1000 counts from 10
sites will be effectively 10,000 counts. Note that this assumes negligible amplification
bias when amplifying the haplotype-linked loci.
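The degree of overlap between the two Z-score distributions can be estimated with a normal approximation, as in the sketch below. It assumes that both distributions have unit standard deviation (the Z-score is already divided by the common standard deviation) and that a call is made at the midpoint between the two means; the midpoint rule and the function names are illustrative assumptions, not a procedure stated in the text.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def miscall_rate(mean_z_homozygous):
    """Chance that a homozygous sample falls below a midpoint threshold.

    By symmetry this is also the chance that a heterozygous sample
    (mean Z of zero) falls above the same threshold.
    """
    threshold = mean_z_homozygous / 2.0
    return normal_cdf(threshold - mean_z_homozygous)


for mean_z in (2.0, 4.0, 6.0):
    print(f"mean Z = {mean_z}: miscall rate = {miscall_rate(mean_z):.5f}")
```

A mean Z-score of 4 places the midpoint 2 standard deviations from each mean, which corresponds to the "within 2 Z-scores in each direction" criterion in the caption.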
Figure S3: Example readout of droplets from a cell-free sample with two alleles that
correspond to the two respective fluorophores FAM and VIC.
Table S1: Markers for mutation and haplotype-linked positions, their digital PCR counts,
and calculated Z-score.
Probe #
dbSNP #
Wildtype
Allele
Wildtype
Counts
Mutation
Counts
Total
Droplets
G
Mutation
associated
allele
A
249
(direct)
rs1219182
57
656.2
927.4
43939
249
(postPCR)
213
214
218
219a
219b
223
rs1219182
57
G
A
5440.0
3807.6
25520
rs4715130
rs6923124
rs2229384
rs7750918
rs3729619
rs4469291
C
G
T
TA[G]TT
CG[A]AG
AATTTTT[A]A
A
G
G
T
T
233
234
241
243
rs7774688
rs4573082
rs497734
rs9369836
Normalized Z-score
5.97
7.12
T
A
C
TA[C]TT
CG[T]AG
AATTTTT[T]A
A
A
T
G
C
2099.6
3521.0
1288.0
3473.0
1774.0
2532.9
1665.9
4555.4
1534.3
4565.7
2031.6
3223.3
27798
27897
25520
27262
28103
27557
4.56
5.58
3.28
4.88
3.82
4.46
2556.8
1954.2
3010.7
2768.2
3208.0
2337.6
4024.9
3467.7
27847
27798
27897
28427
3.28
2.80
4.72
5.15
1. Sikora A, Zimmermann BG, Rusterholz C, Birri D, Kolla V, Lapaire O, Hoesli I,
Kiefer V, Jackson L, Hahn S: Detection of Increased Amounts of Cell-Free Fetal DNA
with Short PCR Amplicons. Clinical Chemistry 2010, 56:136-138.