Supplementary Information

Supplemental methods

Illumina library preparations
Whole-genome DNA (10 µg) was sheared to 3 kb fragments with the Covaris S2 Adaptive Focused Acoustic (AFA) instrument (Covaris, Massachusetts, USA) and miniTubes (Covaris) according to the manufacturer's instructions, followed by purification with the PCR Purification Kit (Qiagen, Venlo, The Netherlands). Next, libraries were prepared with the Illumina mate pair sequencing protocol (Illumina 2-5kb mate pair protocol v2) according to the manufacturer's instructions, with minor modifications: QIAquick spin columns were used for purification instead of the QIAEX II suspension, and adapter-modified DNA fragments were enriched by PCR with custom multiplex primers instead of primers PE 1.0 and PE 2.0.

Analysis of the mate pair data

Cluster analysis
Deletions and insertions were called from mate pairs with the correct orientation and an insert size larger than the median insert size + 2 SD (deletions) or smaller than the median insert size - 2 SD (insertions). Tandem duplications and inversions were called from mate pairs with an everted or inverted orientation, respectively (Supplementary Figure S1). For all aberrations, both the insert size and the orientation were taken into account. All discordant reads are extracted from the mapping data. To avoid false discordant pairs caused by mapping problems, a local realignment is attempted for each pair identified as a potential translocation. This realignment (ClustalW, 1) is restricted to a region bounded by the insert size + 2 SD (approximately the 95% confidence interval). If the read can be mapped properly with respect to its mate, the pair is labeled as a false discordant (i.e. concordant) pair and discarded from further analysis. During this filtering step, discordant reads are also evaluated for whether they fall within known segmental duplications or RepeatMasker regions, or coincide with an hg19 Self Chain record (UCSC data tables, 2).
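The pair-level calling rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the `ReadPair` record, its orientation labels ("correct", "everted", "inverted") and the rule that mates mapping to different chromosomes mark a potential translocation are assumptions made for the example, and all function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import median, stdev


@dataclass
class ReadPair:
    """Hypothetical minimal record for one mapped mate pair."""
    chrom1: str
    chrom2: str
    insert_size: int   # distance between the mates on the reference
    orientation: str   # assumed labels: "correct", "everted" or "inverted"


def insert_size_bounds(insert_sizes, n_sd=2.0):
    """Return (median - n_sd * SD, median + n_sd * SD) of the library insert sizes."""
    m = median(insert_sizes)
    sd = stdev(insert_sizes)
    return m - n_sd * sd, m + n_sd * sd


def classify_pair(pair, lower, upper):
    """Assign a putative variant type to a single mate pair (simplified sketch)."""
    if pair.chrom1 != pair.chrom2:
        # Assumption: interchromosomal mates are treated as potential translocations.
        return "translocation"
    if pair.orientation == "everted":
        return "tandem_duplication"
    if pair.orientation == "inverted":
        return "inversion"
    # Correct orientation: only an outlying insert size makes the pair discordant.
    if pair.insert_size > upper:
        return "deletion"
    if pair.insert_size < lower:
        return "insertion"
    return "concordant"
```

For a library with insert sizes of 2900, 3000, 3100, 3000 and 3000 bp, the bounds are roughly 2859 and 3141 bp, so a correctly oriented pair 5000 bp apart is classified as a deletion. In this simplified version, everted and inverted pairs are typed by orientation alone, whereas the text states that insert size was also taken into account for all aberrations.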
Once a filtered list of discordant pairs has been generated, the pairs are grouped into clusters, each covering a single structural variant. The clustering algorithm loads all pairs and compares each pair with the existing clusters. A matching cluster is identified based on overlap of the potential breakpoint region, the variant type and the similarity of the insert sizes of the pairs. If several clusters match, the closest matching cluster is selected; if none matches, a new cluster is created. Clusters were retained only when they contained at least a preset minimum number of mate pairs. This cut-off was set based on coverage: for a subset of the samples, we raised the cut-off until a set of known variants was no longer detected in the resulting variant list, and calculated the relation between the cut-off at this detection limit and the coverage. This relation was then used to impute a cut-off for each experiment in the complete set from its coverage. In general, experiments with a higher overall coverage therefore receive a higher cut-off. The variant type of a cluster can be assigned from the signature caused by the structural variant 3. Detection of simple duplications, inversions, deletions, insertions and translocations is automated, but complex variants require manual interpretation of the cluster pattern.

Depth of coverage analysis
Depth of coverage analysis was performed with CNV-seq 4. To construct reference pools for normalization, experiments were grouped according to their GC bias. The number of mapped reads is counted in sliding windows along each chromosome; the window size is determined by the overall coverage of that chromosome, and the coverage ratio is calculated between the reference set and the sample. Deletions or duplications were called when a genomic region had a log2 ratio below -0.50 or above 0.45, respectively, with a p-value < 0.001.
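A minimal Python sketch of the greedy clustering step, the minimum-support filter and the window-level CNV call thresholds described above. Several details are assumptions not stated in the text: each pair is reduced to a hypothetical candidate breakpoint interval (`start`, `end`), "closest matching" is taken to mean the smallest difference between the pair's insert size and the cluster's mean insert size, a cluster's breakpoint region is narrowed to the intersection of its members' intervals, and `max_insert_diff` is a hypothetical tolerance. The CNV function only mirrors the stated thresholds (log2 ratio below -0.50 or above 0.45, p < 0.001); it does not reimplement the statistics of CNV-seq.

```python
from dataclasses import dataclass, field


@dataclass
class Pair:
    """Hypothetical discordant pair reduced to its candidate breakpoint interval."""
    variant_type: str
    start: int
    end: int
    insert_size: int


@dataclass
class Cluster:
    variant_type: str
    start: int
    end: int
    pairs: list = field(default_factory=list)

    def mean_insert(self):
        return sum(p.insert_size for p in self.pairs) / len(self.pairs)


def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def cluster_pairs(pairs, max_insert_diff):
    """Greedily assign each pair to the closest matching cluster or open a new one."""
    clusters = []
    for p in pairs:
        best, best_diff = None, None
        for c in clusters:
            # A match requires the same variant type, an overlapping breakpoint
            # region and a similar insert size.
            if c.variant_type != p.variant_type:
                continue
            if overlap(c.start, c.end, p.start, p.end) == 0:
                continue
            diff = abs(c.mean_insert() - p.insert_size)
            if diff > max_insert_diff:
                continue
            if best is None or diff < best_diff:
                best, best_diff = c, diff
        if best is None:
            clusters.append(Cluster(p.variant_type, p.start, p.end, [p]))
        else:
            best.pairs.append(p)
            # Narrow the candidate breakpoint region to the shared interval.
            best.start = max(best.start, p.start)
            best.end = min(best.end, p.end)
    return clusters


def retain(clusters, min_pairs):
    """Keep clusters supported by at least the coverage-derived cut-off."""
    return [c for c in clusters if len(c.pairs) >= min_pairs]


def call_window(log2_ratio, p_value, del_cut=-0.50, dup_cut=0.45, alpha=0.001):
    """Apply the depth-of-coverage thresholds to one window or region."""
    if p_value >= alpha:
        return None
    if log2_ratio < del_cut:
        return "deletion"
    if log2_ratio > dup_cut:
        return "duplication"
    return None
```

With these rules, two overlapping deletion pairs with insert sizes of 8000 and 8100 bp merge into one cluster, while a deletion pair 50 kb away or an inversion pair at the same locus each start a new cluster; with a cut-off of two pairs, only the merged cluster is retained.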
By setting a p-value cut-off, a list of putative copy number variant regions is generated. CNVs mapping in segmental duplications were discarded. Data were visualized in the in-house developed browser (http://medgen.ugent.be/vivar/) (Sante et al., in preparation).

Filtering strategies
Aberrations overlapping a DGV (Database of Genomic Variants, http://projects.tcag.ca/variation/) entry by 75% or more were filtered out. The remaining aberrations were subsequently compared with the pool of samples (i.e. all patients and parents) and our internal dataset: the variant calls from each patient were compared with the variants found in the other patients of our cohort. We assumed that SVs with the same breakpoints found in more than four unrelated patients are likely to be common variants in the population and therefore less likely to be pathogenic. Accordingly, an aberration present in at least four other patients was filtered out.

Sanger sequencing of the breakpoints
Using the cluster data (i.e. position and orientation), unique primers were designed so that the PCR amplicons spanned the breakpoints. PCR products were gel-purified when needed. Capillary sequencing was subsequently performed on the ABI 3730XL Genetic Analyzer (Applied Biosystems). Sequencing reads were aligned to the human reference genome (GRCh37, hg19) with BLAST and BLAT, and the hits were analyzed manually to determine the exact breakpoints and breakpoint characteristics.

Quantitative PCR (qPCR)
Copy numbers were analyzed using 5 ng DNA, 2.5 μl of SsoAdvanced 2x Mastermix (Bio-Rad, Nazareth, Belgium) and 5 μM primers in a total volume of 5 μl. Assays amplifying ZNF80 and GPR15 genomic DNA were used for normalization (RTPrimerDB #1021 and #1022) 5. Analysis was performed as described previously 6 with the qBasePlus software (http://www.biogazelle.com).

References
1. Larkin MA, Blackshields G, Brown NP et al: Clustal W and Clustal X version 2.0. Bioinformatics 2007; 23: 2947-2948.
2. Fujita PA, Rhead B, Zweig AS et al: The UCSC Genome Browser database: update 2011. Nucleic Acids Res 2011; 39: D876-882.
3. Medvedev P, Stanciu M, Brudno M: Computational methods for discovering structural variation with next-generation sequencing. Nat Methods 2009; 6: S13-20.
4. Xie C, Tammi MT: CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 2009; 10: 80.
5. Lefever S, Vandesompele J, Speleman F, Pattyn F: RTPrimerDB: the portal for real-time PCR primers and probes. Nucleic Acids Res 2009; 37: D942-945.
6. D'Haene B, Vandesompele J, Hellemans J: Accurate and objective copy number profiling using real-time quantitative PCR. Methods 2010; 50: 262-270.