Supplementary Method 2
Algorithm for detection of possible variants from resequencing microarray data

1. Background

The algorithm was developed to complement ABACUS, the base-calling algorithm of GSEQ4.0. GSEQ4.0 occasionally overlooks variants that can easily be detected by visual inspection of the signal pattern, partly because ABACUS emphasizes the accuracy of base calling. To minimize the number of “false negatives”, even if the number of “false positives” increases, the following algorithm was developed to incorporate the experience gained from visual inspection.

2. Calculation of ratios of signal intensities

Supplementary Figure S1. Examples of signal intensities corresponding to individual tiled oligonucleotides.
[Bar graphs of signal intensity for the A, C, G and T probes of the antisense and sense strands. Panel (A): homozygous (T). Panel (B): heterozygous (C/T). In each panel, s1, s2 and s3 mark the highest, second highest and third highest intensities among the sense probes, and s1’, s2’ and s3’ mark those among the antisense probes.]
(A) In the bar graph, the highest signal intensity among the sense probes is T and that among the antisense probes is A, thus unequivocally confirming the base (T in the sense strand).
(B) In this example, the highest (C) and the second highest (T) signal intensities are comparable, and the signal intensities of C and T are sufficiently higher than those of A and G in the sense strand. Similarly, the signal intensities of A and G are sufficiently higher than those of C and T in the antisense strand, thus unequivocally confirming that the base is heterozygous C/T.

To implement the reasoning illustrated in Supplementary Fig. S1 in the algorithm, we first calculate the following parameters.

# For sense probes:
X = the highest intensity (s1) / the second highest intensity (s2)
Y = the second highest intensity (s2) / the third highest intensity (s3)

# For antisense probes:
X’ = the highest intensity (s1’) / the second highest intensity (s2’)
Y’ = the second highest intensity (s2’) / the third highest intensity (s3’)

3. Algorithm to select candidates as homozygous and heterozygous variants
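Before the conditions in this section are applied, the ratios defined in Section 2 have to be computed for each strand. The following Python sketch shows one way this could be done; it is an illustration, not the authors' implementation. The function name `rank_and_ratios`, the dictionary layout and the example intensities are assumptions made for this sketch, and all intensities are assumed to be positive.

```python
from typing import Dict, List, Tuple

BASES = ("A", "C", "G", "T")


def rank_and_ratios(intensities: Dict[str, float]) -> Tuple[List[str], float, float]:
    """Rank the four probes of one strand and compute s1/s2 and s2/s3.

    Returns the bases sorted by descending intensity, followed by
    X = s1 / s2 and Y = s2 / s3 (X' and Y' when given antisense probes).
    Intensities are assumed to be strictly positive.
    """
    ranked = sorted(BASES, key=lambda base: intensities[base], reverse=True)
    s1, s2, s3 = (intensities[base] for base in ranked[:3])
    return ranked, s1 / s2, s2 / s3


# Hypothetical intensities for the sense probes at one position
# (made-up numbers, chosen to resemble a C/T heterozygous pattern).
sense = {"A": 100.0, "C": 900.0, "G": 120.0, "T": 1000.0}
ranked, x, y = rank_and_ratios(sense)
print(ranked[:2], round(x, 2), round(y, 2))
```

The same function would be called a second time with the antisense intensities to obtain X’ and Y’.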
# Detection of a homozygous variant:
If the highest signal intensity at a position is sufficiently higher than the second highest signal intensity in both the sense and antisense strands (condition [I] below), the position is selected as a candidate for a homozygous variant.
--Condition [I]: X > 1.25 and X’ > 1.25

# Detection of a heterozygous variant:
1. If the base with the highest intensity in the sense strand is concordant (i.e., complementary) with the base with the second highest intensity in the antisense strand, and the base with the highest intensity in the antisense strand is concordant with the base with the second highest intensity in the sense strand (Supplementary Fig. S2), AND if the second highest intensities are sufficiently higher than the third highest intensities in both strands (condition [II]), then this position is selected as a candidate for a heterozygous variant.

Supplementary Figure S2. Example of heterozygous base (C/T)
[Bar graph of signal intensity for the A, C, G and T probes of the antisense and sense strands, with s1, s2, s3 and s1’, s2’, s3’ marked.]
The base with the highest intensity in the sense strand (C) is concordant with the base with the second highest intensity in the antisense strand (G), whereas the base with the second highest intensity in the sense strand (T) is concordant with the base with the highest intensity in the antisense strand (A).
--Condition [II]: Y > 1.25 and Y’ > 1.25

2. If the bases with the highest intensity are concordant between the sense and antisense strands, AND if the bases with the second highest intensities are also concordant between the sense and antisense strands (Supplementary Fig. S3), the algorithm selects this position as a candidate for a heterozygous variant when s2 and s2’ are sufficiently higher than s3 and s3’, respectively, AND s1 and s1’ are comparable to s2 and s2’, respectively (condition [III]-1).

Supplementary Figure S3.
Example of heterozygous base (C/T) (2)
[Bar graph of signal intensity for the A, C, G and T probes of the antisense and sense strands, with s1, s2, s3 and s1’, s2’, s3’ marked.]
The base with the highest intensity in the sense strand (T) is concordant with the base with the highest intensity in the antisense strand (A), whereas the base with the second highest intensity in the sense strand (C) is concordant with the base with the second highest intensity in the antisense strand (G).
--Condition [III]-1: (Y >= 1.25 and Y’ >= 1.25) and ((X < 1.3 or X’ < 1.3) and (X < 2 and X’ < 2))

Because condition [III]-1 is rather strict, we employed additional criteria ([III]-2 and [III]-3). These are based on the following observation: when the signal intensity of one of the strands is low, we frequently observe that the highest and the second highest signal intensities are comparable (indicative of a heterozygous variant) but the second highest signal intensity is not sufficiently higher than the third highest signal intensity in that strand (the antisense probes shown in Supplementary Fig. S4). In such circumstances, if the signal intensities of the other strand are highly indicative of a heterozygous variant, the base should be considered as a candidate.

Supplementary Figure S4. Example of heterozygous base (C/T) where signal intensities of all the antisense probes are substantially low.
[Bar graph of signal intensity for the A, C, G and T probes of the antisense and sense strands, with s1, s2, s3 and s1’, s2’, s3’ marked.]
Although the signal intensities of the sense probes are highly indicative of the C/T heterozygous variant, the signal intensities of all the antisense probes are similarly low and do not fulfill condition [III]-1.
--Condition [III]-2: (X < 2 and Y > 1.25) and (1 < X’ < 1.05 and 1.1 < Y’ < 1.25)
--Condition [III]-3: (X’ < 2 and Y’ > 1.25) and (1 < X < 1.05 and 1.1 < Y < 1.25)

Taken together, we consider a position as a candidate for being heterozygous when any of the conditions [III]-1 to [III]-3 is fulfilled.

4.
Example showing usefulness of algorithm

(A) Detection of 11 mutations found by our program compared with the GSEQ4.0 algorithm with various quality score thresholds

Supplementary Figure S5. Numbers of mutations detected by GSEQ alone and those detected by combination of GSEQ and our algorithm
[Plot of the number of detected mutations (vertical axis, 0 to 12) against the quality score threshold used in GSEQ (horizontal axis, 0 to 12), with two series: GSEQ alone, and GSEQ + our algorithm.]

We detected 11 base substitutions using the 50 kb format resequencing array analysis. The algorithm detected all of them, whereas the GSEQ4.0 algorithm (ABACUS) detected only 10 of them even with a low quality score threshold (Supplementary Fig. S5).

(B) Signal intensity of the mutation that is not detected by GSEQ4.0 but detected by the algorithm

Supplementary Figure S6. Bar graphs show the signal intensities corresponding to c.1741 of SPAST of a patient and those of a control
[Bar graphs of signal intensity for the A, C, G and T antisense probes and the A, C, G and T sense probes.]
The bar graphs show the signal intensities derived from oligonucleotides corresponding to the target base of c.1741. (Red bars, signal intensities obtained from a sample with the c.1741C>T mutation; black bars, signal intensities obtained from a sample with the wild-type sequence.)

GSEQ4.0 alone did not detect the mutation (c.1741C>T; R581X in SPAST) even at a low quality score threshold. The actual signal pattern of the substitution is shown as red bars in Supplementary Fig. S6. Note that the signal pattern of the sample with the c.1741C>T mutation is distinct from that of the wild-type sample, and the substitution is easily detected by visual inspection of the bar graph. Our algorithm calculated X=1.1, Y=3.0, X’=1.8, and Y’=4.6. These values fulfilled condition [II] described above, and the algorithm identified the base as heterozygous C/T. Subsequent direct nucleotide sequence analysis confirmed this result.
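The conditions of Section 3 can be written compactly as predicates on the four ratios. The Python sketch below is an illustration under stated assumptions rather than the authors' program: the function names, the `COMPLEMENT` table used to test concordance, and the rank orders passed in the usage example are all hypothetical. Only the ratios X=1.1, Y=3.0, X’=1.8 and Y’=4.6 come from the c.1741 example above; the source does not report the probe rankings, so the pattern of Supplementary Fig. S2 is assumed. The sketch also does not compare calls against a reference sequence.

```python
from typing import Sequence

# Watson-Crick pairing, used here to decide whether a sense base and an
# antisense base are "concordant" (assumption of this sketch).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def cond_I(x: float, xp: float) -> bool:
    """Condition [I]: homozygous candidate."""
    return x > 1.25 and xp > 1.25


def cond_II(y: float, yp: float) -> bool:
    """Condition [II]: second highest clearly above third on both strands."""
    return y > 1.25 and yp > 1.25


def cond_III_1(x: float, y: float, xp: float, yp: float) -> bool:
    return (y >= 1.25 and yp >= 1.25) and ((x < 1.3 or xp < 1.3) and (x < 2 and xp < 2))


def cond_III_2(x: float, y: float, xp: float, yp: float) -> bool:
    """Sense strand informative, antisense strand uniformly weak."""
    return (x < 2 and y > 1.25) and (1 < xp < 1.05 and 1.1 < yp < 1.25)


def cond_III_3(x: float, y: float, xp: float, yp: float) -> bool:
    """Antisense strand informative, sense strand uniformly weak."""
    return (xp < 2 and yp > 1.25) and (1 < x < 1.05 and 1.1 < y < 1.25)


def heterozygous_candidate(sense_rank: Sequence[str], anti_rank: Sequence[str],
                           x: float, y: float, xp: float, yp: float) -> bool:
    """Apply case 1 (Fig. S2 pattern) or case 2 (Fig. S3 pattern).

    sense_rank / anti_rank list the bases by descending intensity;
    only the first two entries are used.
    """
    s1, s2 = sense_rank[0], sense_rank[1]
    a1, a2 = anti_rank[0], anti_rank[1]
    if COMPLEMENT[s1] == a2 and COMPLEMENT[s2] == a1:  # case 1
        return cond_II(y, yp)
    if COMPLEMENT[s1] == a1 and COMPLEMENT[s2] == a2:  # case 2
        return (cond_III_1(x, y, xp, yp)
                or cond_III_2(x, y, xp, yp)
                or cond_III_3(x, y, xp, yp))
    return False


# Ratios reported for c.1741; the rank orders are hypothetical.
x, y, xp, yp = 1.1, 3.0, 1.8, 4.6
print(cond_I(x, xp))                                                 # False
print(cond_II(y, yp))                                                # True
print(heterozygous_candidate(("C", "T"), ("A", "G"), x, y, xp, yp))  # True
```

With these ratios, condition [I] fails because X is below 1.25, while condition [II] holds, which is consistent with the heterozygous call reported above.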