Supplementary Information

Supplemental methods
Illumina library preparations
10 µg of whole genome DNA was sheared to 3 kb fragments using the Covaris S2 Adaptive
Focused Acoustic (AFA) instrument (Covaris, Massachusetts, USA) and miniTubes
(Covaris) according to the manufacturer’s instructions, followed by purification with the PCR
Purification Kit (Qiagen, Venlo, The Netherlands). Next, the mate pair sequencing protocol
from Illumina (Illumina 2-5kb mate pair protocol v2) was used to make the libraries
according to the manufacturer’s instructions with minor modifications. Qiaquick spin columns
were used for purification instead of the QIAEX II suspension. The enrichment of adapter-modified
DNA fragments was done by PCR with custom multiplex primers instead of the
primers PE 1.0 and PE 2.0.
Analysis of the mate pair data
Cluster analysis
Deletions were called when mates had a correct orientation and an insert size larger than the
median insert size + 2 SD; insertions were called when mates had a correct orientation and an
insert size smaller than the median insert size - 2 SD. Tandem duplications or inversions were
called from mate pair reads with everted or inverted orientation, respectively (Supplementary
Figure S1). For all aberrations both the insert size and the orientation were taken into
account. All discordant reads are isolated from the mapping data. To avoid false discordant
pairs due to mapping problems, a local realignment is attempted for each of the pairs identified
as potential translocations. The ClustalW-powered realignment 1 is performed within a region
limited by the insert size + 2 SD (95% confidence interval size). If the read can be properly
mapped with respect to the other read, the aligned pair is discarded from further analysis and
labeled as a false discordant (concordant) pair. During this filtering step discordant reads are
also evaluated for whether they fall in known segmental duplications or RepeatMasker regions, or
coincide with an hg19 Self Chain record (UCSC data tables, 2).
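The calling rules above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the class `MatePair`, the function `classify_pair` and the orientation labels are hypothetical names, and treating mates on different chromosomes as potential translocations is an assumption made here for illustration. For brevity the sketch labels everted and inverted pairs on orientation alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatePair:
    orientation: str              # "correct", "everted" or "inverted" (labels assumed here)
    insert_size: int              # distance between the mapped mates, in bp
    same_chromosome: bool = True  # assumption: False marks a potential translocation


def classify_pair(pair: MatePair, median_insert: float, sd: float) -> Optional[str]:
    """Return a variant label for a discordant pair, or None if the pair is concordant."""
    if not pair.same_chromosome:
        return "translocation"
    if pair.orientation == "everted":
        return "tandem_duplication"
    if pair.orientation == "inverted":
        return "inversion"
    # Correct orientation: only the insert size can make the pair discordant.
    if pair.insert_size > median_insert + 2 * sd:
        return "deletion"
    if pair.insert_size < median_insert - 2 * sd:
        return "insertion"
    return None
```

With a median insert size of 3000 bp and an SD of 300 bp, a correctly oriented pair is labeled a deletion above 3600 bp and an insertion below 2400 bp.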
Once a filtered list of discordant pairs is generated, these are grouped in clusters covering the
same structural variant. The clustering algorithm loads all pairs and compares them with the
existing clusters. A matching cluster is identified based on the overlapping region of the
potential breakpoint site, variant type and similarity of the insert size of the pairs. If multiple
clusters are possible, the closest matching cluster is selected. If no matching cluster is
retrieved, a new cluster is created. Aberrant clusters were only retained when the number of
mate pairs per cluster was equal to or exceeded a preset cut-off.
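The clustering step can be sketched as a greedy assignment, shown below under several assumptions. All names (`Pair`, `Cluster`, `cluster_pairs`) and the parameters `window`, `size_tol` and `min_pairs` are hypothetical, all pairs are assumed to map to a single chromosome, and the distance measure used to pick the closest cluster is one possible choice and not necessarily the one used in this study.

```python
from dataclasses import dataclass, field


@dataclass
class Pair:
    variant_type: str
    left: int   # mapped position of the first mate (bp)
    right: int  # mapped position of the second mate (bp)

    @property
    def insert_size(self) -> int:
        return abs(self.right - self.left)


@dataclass
class Cluster:
    variant_type: str
    pairs: list = field(default_factory=list)

    def centre(self):
        n = len(self.pairs)
        return (sum(p.left for p in self.pairs) / n,
                sum(p.right for p in self.pairs) / n)

    def distance(self, pair: Pair) -> float:
        cl, cr = self.centre()
        return abs(pair.left - cl) + abs(pair.right - cr)

    def matches(self, pair: Pair, window: int, size_tol: int) -> bool:
        # Same variant type, both mates near the cluster, similar insert size.
        cl, cr = self.centre()
        mean_size = sum(p.insert_size for p in self.pairs) / len(self.pairs)
        return (pair.variant_type == self.variant_type
                and abs(pair.left - cl) <= window
                and abs(pair.right - cr) <= window
                and abs(pair.insert_size - mean_size) <= size_tol)


def cluster_pairs(pairs, window=3000, size_tol=1000, min_pairs=2):
    """Assign each pair to the closest matching cluster, or start a new one."""
    clusters = []
    for pair in pairs:
        candidates = [c for c in clusters if c.matches(pair, window, size_tol)]
        if candidates:
            min(candidates, key=lambda c: c.distance(pair)).pairs.append(pair)
        else:
            clusters.append(Cluster(pair.variant_type, [pair]))
    # Retain only clusters that reach the preset cut-off.
    return [c for c in clusters if len(c.pairs) >= min_pairs]
```

Because the assignment is greedy, the result can depend on the order in which pairs are loaded; sorting pairs by position first is one way to make it reproducible.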
This cut-off of mate pairs per cluster was set based on coverage. For a subset of the samples
we raised the cut-off until a set of known variants was no longer detected in the resulting
variant list. The relation between this cut-off at the detection limit and the coverage was
calculated. This was then used to impute a cut-off for each experiment in the complete set,
based on its coverage. In general this means that experiments with a higher overall coverage
will have a higher cut-off.
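As an illustration of this imputation, the sketch below fits a straight line through (coverage, cut-off) points from a calibration subset and applies it to a new experiment. The form of the relation is not specified here, so the linear model is an assumption, as are the function names `fit_line` and `impute_cutoff`, the rounding down, and the floor of two pairs per cluster.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_cutoff(coverage, slope, intercept, minimum=2):
    """Predict the mate-pair cut-off for an experiment from its coverage."""
    # Rounding down and enforcing a minimum are choices made for this sketch.
    return max(minimum, math.floor(slope * coverage + intercept))
```

For example, `fit_line` would be called once on the calibration subset, and `impute_cutoff` once per experiment in the complete set.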
The variant type of a cluster can be assigned based on the signature caused by the structural
variant 3. Detection of simple duplications, inversions, deletions, insertions and translocations
is automated, but complex variants require manual interpretation of the cluster pattern.
Depth of coverage analysis
For this purpose CNV-seq 4 was implemented. Experiments were grouped according to GC-bias
to form the reference pool for normalization.
The number of mapped reads is counted using sliding windows along the chromosomes. The
size of these windows is determined by the overall coverage of that chromosome, and the
coverage ratio is calculated between a reference set and the sample. Deletions or duplications
were called when a genomic region had a log2 ratio below -0.50 or above 0.45 with a p-value
<0.001. By setting a p-value cut-off, a list indicating putative copy number variant regions is
generated. CNVs mapping in segmental duplications were discarded. Data was visualized in
the in-house developed browser (http://medgen.ugent.be/vivar/) (Sante et al. in preparation).
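The thresholds above can be expressed as a short sketch. It does not reproduce CNV-seq: the p-value is taken as an input and not computed, the library-size normalization is an assumption, and the function names `log2_ratio` and `call_window` are hypothetical.

```python
import math


def log2_ratio(sample_count, reference_count, sample_total, reference_total):
    """Log2 coverage ratio of one window, normalized by total mapped reads."""
    sample_fraction = sample_count / sample_total
    reference_fraction = reference_count / reference_total
    return math.log2(sample_fraction / reference_fraction)


def call_window(log2r, p_value, del_threshold=-0.50, dup_threshold=0.45, p_cutoff=0.001):
    """Return "deletion", "duplication" or None for one genomic region."""
    if p_value >= p_cutoff:
        return None
    if log2r < del_threshold:
        return "deletion"
    if log2r > dup_threshold:
        return "duplication"
    return None
```

A region with half the expected read count has a log2 ratio of about -1 and is called a deletion if its p-value is below 0.001.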
Filtering strategies
Aberrations that overlap with a DGV (Database of Genomic Variants,
http://projects.tcag.ca/variation/) entry for 75% or more were filtered out. The remaining list
of aberrations was subsequently compared with the pool of samples (i.e. all patients and parents)
and our internal dataset. In this step, the variant calls from each patient were compared to
variants found in other patients from our cohort. Here we assumed that SVs with the same
breakpoints found in more than four unrelated patients are likely to be common variants in the
population and less likely to be pathogenic. If an aberration was present in at least four other
patients, it was filtered out.
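The two filters can be sketched as follows. The sketch assumes that the 75% overlap is measured as the fraction of the aberration covered by a single DGV entry, that intervals are half-open (start, end) tuples on the same chromosome, and that the number of other patients carrying the aberration has already been counted. The names `overlap_fraction` and `keep_call` are hypothetical.

```python
def overlap_fraction(call, entry):
    """Fraction of the call interval (start, end) that is covered by the entry."""
    length = call[1] - call[0]
    if length <= 0:
        return 0.0
    shared = min(call[1], entry[1]) - max(call[0], entry[0])
    return max(0, shared) / length


def keep_call(call, dgv_entries, n_other_patients, min_overlap=0.75, max_other=4):
    """Return True if the aberration passes both the DGV and the cohort filter."""
    # Filter 1: covered for 75% or more by a DGV entry.
    if any(overlap_fraction(call, entry) >= min_overlap for entry in dgv_entries):
        return False
    # Filter 2: present in at least four other patients of the cohort.
    if n_other_patients >= max_other:
        return False
    return True
```

A reciprocal-overlap criterion would be an equally plausible reading of the DGV filter and would require checking the overlap in both directions.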
Sanger sequencing of the breakpoints
Using the cluster data (i.e. position and orientation), unique primers were designed in a way
that the PCR amplicons span the breakpoints. PCR products were purified from gel when
needed. Subsequent capillary sequencing was performed using the ABI 3730XL Genetic
Analyzer (Applied Biosystems). Using BLAST and BLAT software, sequencing reads were
aligned to the human reference genome (GRCh37, hg19). These hits were analyzed manually
to determine the exact breakpoints and breakpoint characteristics.
Quantitative PCR (qPCR)
The copy numbers were analyzed using 5 ng DNA, 2.5 μl of SsoAdvanced 2x Mastermix
(Bio-Rad, Nazareth, Belgium) and 5 μM primers in a total volume of 5 μl. Assays amplifying
ZNF80 and GPR15 genomic DNA were used for normalization (RTPrimerDB #1021 and
#1022) 5. Analysis was performed as described previously 6 with the qBasePlus software
(http://www.biogazelle.com).
References
1. Larkin MA, Blackshields G, Brown NP et al: Clustal W and Clustal X version 2.0.
Bioinformatics (Oxford, England) 2007; 23: 2947-2948.
2. Fujita PA, Rhead B, Zweig AS et al: The UCSC Genome Browser database: update 2011.
Nucleic acids research 2011; 39: D876-882.
3. Medvedev P, Stanciu M, Brudno M: Computational methods for discovering structural
variation with next-generation sequencing. Nature methods 2009; 6: S13-20.
4. Xie C, Tammi MT: CNV-seq, a new method to detect copy number variation using
high-throughput sequencing. BMC bioinformatics 2009; 10: 80.
5. Lefever S, Vandesompele J, Speleman F, Pattyn F: RTPrimerDB: the portal for real-time PCR
primers and probes. Nucleic acids research 2009; 37: D942-945.
6. D'Haene B, Vandesompele J, Hellemans J: Accurate and objective copy number profiling
using real-time quantitative PCR. Methods 2010; 50: 262-270.