Supplementary Method 2 Algorithm for detection of possible variants

Supplementary Method 2
Algorithm for detection of possible variants from resequencing microarray data
1. Background
This algorithm was developed to complement ABACUS, the algorithm used in GSEQ4.0.
GSEQ4.0 occasionally overlooks variants that can easily be detected by visual
inspection of the signal pattern, partly because ABACUS emphasizes the
accuracy of base calling. To minimize the number of “false negatives”, even at the
cost of an increased number of “false positives”, we developed the following
algorithm, which incorporates our experience with visual inspection.
2. Calculation of ratios of signal intensities
Supplementary Figure S1. Examples of signal intensities corresponding to individual
tiled oligonucleotides.
[Figure: bar graphs of signal intensity for the A, C, G, and T probes of the
antisense and sense strands. Panel (A): homozygous (T). Panel (B): heterozygous
(C/T). The three highest intensities are labeled s1, s2, and s3 for the sense
probes and s1’, s2’, and s3’ for the antisense probes.]
(A) In the bar graph, the highest signal intensity in the sense probe is T and
that in the antisense probe is A, thus unequivocally confirming the base (T in
the sense strand).
(B) In this example, the highest (C) and the second highest signal (T)
intensities are comparable and the signal intensities of C and T are sufficiently
higher than those of A and G in the sense strand. Similarly, the signal
intensities of A and G are sufficiently higher than those of C and T in the
antisense strand, thus unequivocally confirming that the base is heterozygous
C/T.
To implement the reasoning illustrated in Supplementary Fig. S1 in the algorithm,
we first calculate the following parameters.
# For sense probes:
X = the highest intensity (s1) / the second highest intensity (s2)
Y = the second highest intensity (s2) / the third highest intensity (s3)
# For antisense probes:
X’ = the highest intensity (s1’) / the second highest intensity (s2’)
Y’ = the second highest intensity (s2’) / the third highest intensity (s3’)
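The ratio calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name intensity_ratios, the dictionary input format, and the intensity values are hypothetical and do not come from the original program, and strictly positive intensities are assumed.

```python
def intensity_ratios(intensities):
    """Return (X, Y, ranked_bases) for the probes of one strand.

    `intensities` maps each base ("A", "C", "G", "T") to its signal
    intensity. X = s1 / s2 and Y = s2 / s3, where s1 >= s2 >= s3 are the
    three highest intensities. Positive intensities are assumed, so no
    guard against division by zero is included.
    """
    # Bases ordered from the highest to the lowest intensity.
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    s1, s2, s3 = (intensities[base] for base in ranked[:3])
    return s1 / s2, s2 / s3, ranked


# Hypothetical sense-probe intensities resembling a heterozygous C/T position.
sense = {"A": 100.0, "C": 1200.0, "G": 80.0, "T": 1000.0}
x, y, ranked = intensity_ratios(sense)
```

Applying the same function to the antisense probes gives X’ and Y’. The ranked base list is returned as well because the heterozygous rules compare which bases give the highest and second highest signals on each strand.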
3. Algorithm to select candidates for homozygous and heterozygous variants
# Detection of a homozygous variant:
If the highest signal intensities at a position are sufficiently higher than the
second highest signal intensities in both the sense and antisense strands (condition [I]
described below), the position is selected as a candidate for a homozygous variant.
-- Condition [I]: X>1.25 and X’>1.25
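Condition [I] reduces to a comparison of two ratios against the 1.25 cutoff. The following is a minimal sketch; the function and parameter names are hypothetical, and it evaluates only condition [I] as written above.

```python
THRESHOLD = 1.25  # cutoff stated in condition [I]


def is_homozygous_candidate(x_sense, x_antisense, threshold=THRESHOLD):
    """Condition [I]: X > 1.25 and X' > 1.25.

    x_sense is X (s1 / s2 for the sense probes) and x_antisense is X'
    (s1' / s2' for the antisense probes).
    """
    return x_sense > threshold and x_antisense > threshold


# Hypothetical ratios: a clear single peak on both strands passes,
# whereas a weak peak on either strand fails.
clear_peak = is_homozygous_candidate(3.0, 2.5)
weak_sense = is_homozygous_candidate(1.1, 2.5)
```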
# Detection of a heterozygous variant:
1. If the bases with the highest intensity in the sense and antisense strands are
concordant with (i.e., complementary to) the bases with the second highest intensities
in the antisense and sense strands, respectively (Supplementary Fig. S2), AND if the
second highest intensities are sufficiently higher than the third highest intensities in
both strands (condition [II]), then this position is selected as a candidate for a
heterozygous variant.
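Supplementary Figure S2. Example of heterozygous base (C/T)
[Figure: bar graphs of signal intensity for the A, C, G, and T probes of the
antisense and sense strands, with the three highest intensities labeled s1’, s2’,
and s3’ (antisense) and s1, s2, and s3 (sense).]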
The base with the highest intensity in the sense strand (C) is concordant
with the base with the second highest intensity in the antisense strand (G),
whereas the base with the second highest intensity in the sense strand (T) is
concordant with the base with the highest intensity in the antisense strand (A).
--Condition [II]: Y > 1.25 and Y’ > 1.25
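The first heterozygous rule combines a base-pairing check with condition [II]. The sketch below is one possible reading of it, assuming that “concordant” means Watson-Crick complementary, as in the legend of Supplementary Fig. S2. The function names and all intensity values are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ranked_bases(intensities):
    """Bases ordered from the highest to the lowest signal intensity."""
    return sorted(intensities, key=intensities.get, reverse=True)


def is_het_candidate_rule1(sense, antisense, threshold=1.25):
    """Crossed concordance between strands plus condition [II]."""
    s = ranked_bases(sense)
    a = ranked_bases(antisense)
    # Highest sense base pairs with the second highest antisense base,
    # and the second highest sense base pairs with the highest antisense base.
    crossed = COMPLEMENT[s[0]] == a[1] and COMPLEMENT[s[1]] == a[0]
    y = sense[s[1]] / sense[s[2]]                # Y  = s2 / s3
    y_prime = antisense[a[1]] / antisense[a[2]]  # Y' = s2' / s3'
    return crossed and y > threshold and y_prime > threshold


# Hypothetical intensities patterned on Supplementary Fig. S2 (C/T).
het_sense = {"A": 90.0, "C": 1100.0, "G": 70.0, "T": 1000.0}
het_antisense = {"A": 900.0, "C": 60.0, "G": 800.0, "T": 80.0}
# Hypothetical homozygous T position for contrast.
hom_sense = {"A": 50.0, "C": 60.0, "G": 40.0, "T": 1000.0}
hom_antisense = {"A": 900.0, "C": 30.0, "G": 50.0, "T": 40.0}

het_result = is_het_candidate_rule1(het_sense, het_antisense)
hom_result = is_het_candidate_rule1(hom_sense, hom_antisense)
```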
2. If the bases with the highest intensity are concordant between the sense and
antisense strands, AND if the bases with the second highest intensities are also
concordant between the sense and antisense strands (Supplementary Fig. S3), the
algorithm selects this position as a candidate for a heterozygous variant if s2 and s2’
are sufficiently higher than s3 and s3’, respectively, AND if s1 and s1’ are comparable to
s2 and s2’, respectively (condition [III]-1).
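Supplementary Figure S3. Example of heterozygous base (C/T) (2)
[Figure: bar graphs of signal intensity for the A, C, G, and T probes of the
antisense and sense strands, with the three highest intensities labeled s1’, s2’,
and s3’ (antisense) and s1, s2, and s3 (sense).]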
The base with the highest intensity in the sense strand (T) is concordant
with the base with the highest intensity in the antisense strand (A), whereas
the base with the second highest intensity in the sense strand (C) is concordant
with the base with the second highest intensity in the antisense strand (G).
--Condition [III]-1: (Y >= 1.25 and Y’ >= 1.25) and ((X < 1.3 or X’ < 1.3) and (X < 2 and X’ < 2))
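Condition [III]-1 can be transcribed directly into a Boolean expression. The sketch below covers only the ratio test and assumes that the concordance of the highest and second highest bases between strands has already been established. The function and variable names are hypothetical.

```python
def condition_iii_1(x, y, x_prime, y_prime):
    """Condition [III]-1 on the ratios X, Y (sense) and X', Y' (antisense)."""
    # s2 and s2' are sufficiently higher than s3 and s3'.
    second_above_third = y >= 1.25 and y_prime >= 1.25
    # s1 and s2 are comparable: at least one strand has a ratio below 1.3,
    # and neither strand has a ratio of 2 or more.
    top_two_comparable = (x < 1.3 or x_prime < 1.3) and (x < 2 and x_prime < 2)
    return second_above_third and top_two_comparable


# Hypothetical ratio sets.
passes = condition_iii_1(1.2, 5.0, 1.5, 4.0)
fails_comparable = condition_iii_1(1.5, 5.0, 1.5, 4.0)  # neither X nor X' < 1.3
fails_upper_bound = condition_iii_1(1.2, 5.0, 2.5, 4.0)  # X' is not below 2
```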
Because condition [III]-1 is rather strict, we employed two additional
criteria ([III]-2 and [III]-3). They are based on the following observation: when the
signal intensity of one of the strands is low, we frequently observe that the highest
and the second highest signal intensities are comparable (indicative of a
heterozygous variant) but the second highest signal intensity is not sufficiently higher
than the third highest signal intensity in that strand (see the antisense probes in
Supplementary Fig. S4). In such circumstances, if the signal intensities of the other
strand are highly indicative of a heterozygous variant, the base should be considered a
candidate.
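Supplementary Figure S4. Example of heterozygous base (C/T) where signal
intensities of all the antisense probes are substantially low.
[Figure: bar graphs of signal intensity for the A, C, G, and T probes of the
antisense and sense strands, with the three highest intensities labeled s1’, s2’,
and s3’ (antisense) and s1, s2, and s3 (sense).]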
Although the signal intensities of the sense probes are highly indicative of
the C/T heterozygous variant, the signal intensities of all the antisense probes
are similarly low and do not fulfill condition [III]-1.
--Condition [III]-2: (X < 2 and Y > 1.25) and (1 < X’ < 1.05 and 1.1 < Y’ < 1.25)
--Condition [III]-3: (X’ < 2 and Y’ > 1.25) and (1 < X < 1.05 and 1.1 < Y < 1.25)
Taken together, we consider a position a candidate heterozygous variant
when any of the conditions [III]-1 to [III]-3 is fulfilled.
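The three conditions can be combined as sketched below. Condition [III]-3 is condition [III]-2 with the roles of the two strands exchanged, so the sketch reuses one function for both. All names and example ratios are hypothetical, and the base concordance between strands is assumed to have been checked separately.

```python
def condition_iii_1(x, y, x_prime, y_prime):
    return (y >= 1.25 and y_prime >= 1.25) and (
        (x < 1.3 or x_prime < 1.3) and (x < 2 and x_prime < 2)
    )


def condition_iii_2(x, y, x_prime, y_prime):
    """Strong sense pattern, weak antisense signal."""
    return (x < 2 and y > 1.25) and (1 < x_prime < 1.05 and 1.1 < y_prime < 1.25)


def condition_iii_3(x, y, x_prime, y_prime):
    """Strong antisense pattern, weak sense signal ([III]-2 with strands swapped)."""
    return condition_iii_2(x_prime, y_prime, x, y)


def is_het_candidate_rule2(x, y, x_prime, y_prime):
    """True when any of conditions [III]-1 to [III]-3 is fulfilled."""
    conditions = (condition_iii_1, condition_iii_2, condition_iii_3)
    return any(cond(x, y, x_prime, y_prime) for cond in conditions)


# Hypothetical ratios for a position with uniformly weak antisense signals.
weak_antisense = (1.2, 6.0, 1.03, 1.15)
rescued = is_het_candidate_rule2(*weak_antisense)
strict_only = condition_iii_1(*weak_antisense)
```

In this hypothetical case the position fails condition [III]-1 because Y’ is below 1.25, but it is still selected through condition [III]-2.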
4. Example showing usefulness of algorithm
(A) Detection of the 11 mutations by our program compared with the GSEQ4.0
algorithm at various quality score thresholds
Supplementary Figure S5. Numbers of mutations detected by GSEQ alone and those
detected by combination of GSEQ and our algorithm
[Figure: number of detected mutations (y-axis, 0 to 12) plotted against the quality
score threshold used in GSEQ (x-axis, 0 to 12) for two series, “GSEQ alone” and
“GSEQ + our algorithm”.]
We detected 11 base substitutions using the 50 kb format resequencing array
analysis. Our algorithm detected all of them, whereas the GSEQ4.0 algorithm
(ABACUS) detected only 10 of them even at a low quality score threshold
(Supplementary Fig. S5).
(B) Signal intensity of the mutation that was not detected by GSEQ4.0 but was
detected by our algorithm
Supplementary Figure S6. Bar graphs show the signal intensities corresponding to
c.1741 of SPAST of a patient and those of a control
[Figure: bar graphs of signal intensity for the A, C, G, and T antisense probes
and the A, C, G, and T sense probes.]
The bar graphs show the signal intensities derived from oligonucleotides
corresponding to the target base of c.1741. (Red bars, signal intensities
obtained from a sample with c.1741C>T mutation; black bars, signal
intensities obtained from a sample with the wild-type sequence.)
GSEQ4.0 alone did not detect the mutation (c.1741C>T; R581X in SPAST) even
at a low quality score threshold. The actual signal patterns of the substitution are shown
as red bars in Supplementary Fig. S6. Note that the signal pattern of the sample with the
c.1741C>T mutation is distinct from that of a wild-type sample, and the substitution is
easily detected by visual inspection of the bar graph.
Our algorithm calculated X=1.1, Y=3.0, X’=1.8, and Y’=4.6 for this position.
These values fulfilled condition [II] described above, and the algorithm identified the
base as heterozygous C/T. Subsequent direct nucleotide sequence analysis confirmed this
call.
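As a quick check of the reported values, the ratio parts of conditions [I] and [II] can be evaluated directly. This sketch uses only the four ratios quoted above. The concordance of bases between strands that the first heterozygous rule also requires is not re-checked here, because the underlying intensities are not listed in the text. The variable names are hypothetical.

```python
# Ratios reported for c.1741 of SPAST in the patient sample.
x, y, x_prime, y_prime = 1.1, 3.0, 1.8, 4.6

# Condition [I] (homozygous candidate): X > 1.25 and X' > 1.25.
condition_i = x > 1.25 and x_prime > 1.25
# Condition [II] (ratio part of the first heterozygous rule): Y > 1.25 and Y' > 1.25.
condition_ii = y > 1.25 and y_prime > 1.25

print(condition_i, condition_ii)  # prints: False True
```

The position fails condition [I] because X is below 1.25, and it satisfies condition [II], in line with the heterozygous call reported above.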