Supplemental data

TESTING PHASE AND TECHNOLOGY SELECTION

The first step of NGS diagnostic implementation was to test the platforms available in our laboratory together with various enrichment strategies. A panel of 20 control DNAs harbouring routine BRCA mutations as well as difficult cases was tested with different enrichment, sequencing and bioinformatics procedures in order to define the pros and cons of each technology from a diagnostic perspective.

We first tested the accuracy of the SOLiDv3 system by fragment sequencing of the 20 barcoded patient samples, using in-house multiplex PCRs and the Multiplicom BRCA MASTR assay v0.1 enrichment kit, both targeting approximately 40 kb of BRCA1 and BRCA2 coding sequence. Four barcodes performed poorly, with a low number of attributed reads. Following BFAST mapping and variant calling with SAMtools, most false positives were filtered out using the minimum coverage, the minimum allele ratio and the distribution of the heterozygosity frequency as the main filtering parameters. All specific point mutations were successfully located and identified, along with the expected polymorphisms. Large rearrangements were found by comparing normalized exonic coverage between patients.

Following these promising results in terms of sequencing accuracy, we focused on target enrichment, a critical step which must ensure that the whole region of interest (ROI) is properly covered. Consequently, different enrichment strategies were tested in combination with SOLiDv4 and/or PGM sequencing. A SureSelect liquid capture panel of 43 genes involved in oncogenetics, targeting promoters, exons and flanking regions, was designed with the help of Agilent. The same 43-gene panel was also designed using the Selector technology from HaloGenomics, along with a restricted format targeting the BRCA1 and BRCA2 coding sequences. We also included in our evaluation the optimized version of the BRCA Multiplicom assay (MASTR assay v2.0).
Lastly, 5 control DNAs from our panel were sent to RainDance for enrichment by the RainStorm microdroplet-based technology on their 142-gene oncology panel. In all cases, emphasis was put on design quality, i.e. our main concern was to ensure complete capture of the ROI. For capacity reasons, gene panels were sequenced with the SOLiDv4 paired-end chemistry, while BRCA-restricted enrichments were sequenced on the Ion Torrent PGM using the 316 chip with 200 bp read length.

Agilent’s liquid capture provided a depth of coverage ranging from 1X to 1472X (mean 629X) for BRCA1 and from 1X to 1260X (mean 472X) for BRCA2. Eighteen percent of the reads were off target, and enrichment failed for GC-rich regions such as the 5’ part of the RB1 gene and some first exons. More subtly, strand bias occurred in well-covered exons. This is a real issue despite good depth of coverage, because some true variants are filtered out at amplicon extremities owing to an unbalanced forward/reverse ratio. Insufficient overlap in the bait design combined with the short SOLiD read length is probably the cause, so an optimized bait design should solve the problem. The promising Selector approach, on the other hand, gave poor results, with only 15% of reads on target and uneven depth of coverage. Similar results were obtained with the dedicated Selector BRCA kit, prompting us to recommend technological improvement before any clinical use. We also observed a lack of coverage with the RainStorm technology, at least for a few BRCA exons, which currently precludes its use for diagnostic purposes. Finally, the Multiplicom enrichment provided complete and even depth of coverage for BRCA1 and BRCA2 (min=1, max=8025, average=3501), thus appearing, at this point in time, as a method of choice for small-target enrichment in a diagnostic setting.
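The strand-bias problem described above for liquid capture can be illustrated with a minimal sketch. It assumes that per-strand read counts are available for each candidate variant and that a variant is kept only if depth, allele ratio and strand balance all exceed a threshold. The class name, function name, threshold values and read counts below are hypothetical and chosen for illustration only; they are not the parameters applied to the SOLiD data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Per-strand read counts at one candidate variant position (toy model)."""
    name: str
    fwd_ref: int  # forward-strand reads supporting the reference allele
    rev_ref: int  # reverse-strand reads supporting the reference allele
    fwd_alt: int  # forward-strand reads supporting the variant allele
    rev_alt: int  # reverse-strand reads supporting the variant allele

    @property
    def depth(self) -> int:
        return self.fwd_ref + self.rev_ref + self.fwd_alt + self.rev_alt

    @property
    def allele_ratio(self) -> float:
        # Fraction of all reads that carry the variant allele.
        return (self.fwd_alt + self.rev_alt) / self.depth if self.depth else 0.0

    @property
    def minor_strand_fraction(self) -> float:
        # Fraction of variant reads found on the less represented strand.
        alt = self.fwd_alt + self.rev_alt
        return min(self.fwd_alt, self.rev_alt) / alt if alt else 0.0


def passes_filters(call: VariantCall,
                   min_depth: int = 30,                 # hypothetical threshold
                   min_allele_ratio: float = 0.3,       # hypothetical threshold
                   min_strand_fraction: float = 0.2) -> bool:  # hypothetical
    """Keep a call only if depth, allele ratio and strand balance are all adequate."""
    return (call.depth >= min_depth
            and call.allele_ratio >= min_allele_ratio
            and call.minor_strand_fraction >= min_strand_fraction)


# Two heterozygous variants with identical depth (400X) and allele ratio (50%).
balanced = VariantCall("mid_amplicon", fwd_ref=100, rev_ref=100, fwd_alt=95, rev_alt=105)
biased = VariantCall("amplicon_end", fwd_ref=110, rev_ref=90, fwd_alt=196, rev_alt=4)

for call in (balanced, biased):
    print(call.name, call.depth, call.allele_ratio,
          call.minor_strand_fraction, passes_filters(call))
```

In this toy case both calls have the same depth and allele ratio, yet the second is rejected solely because almost all variant reads come from one strand, which is how a true variant can be lost despite good coverage.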
To summarize this pilot phase, two enrichment procedures appear diagnostic-compatible: Multiplicom and Agilent SureSelect, the latter provided that the bait design is optimized and bearing in mind that GC-rich regions cannot be analysed. Another drawback of liquid capture is that automation is mandatory for high-throughput diagnosis of a large number of cases with rapid turnaround time, and such automation is still tedious and expensive. Regarding sequencers, the SOLiD system, despite its sequencing accuracy, appeared inadequate for clinical diagnostics because of its run time (10 days for v4 paired-end sequencing). The PGM, on the other hand, has a shorter, diagnostic-compatible run time (2 hours). Overall, considering the constraints in terms of enrichment quality, automation and turnaround time, we chose a combination of BRCA Multiplicom enrichment and PGM sequencing on a 318 chip for BRCA clinical diagnosis.

TESTING PHASE, BIOINFORMATICS PARAMETERS

Read mapping

SOLiDv3 fragment reads were mapped with BFAST 0.6.2a onto the BRCA1/2 amplicons of the Multiplicom BRCA enrichment kit, whereas all SOLiDv4 paired-end reads were aligned with BFAST+BWA 0.6.5a onto the human reference genome hg19 after conversion of the csfasta/qual files into fastq format with solid2fastq. F3 and F5 reads were mapped in color space with the alignment tools BFAST and BWA (ref), respectively, with default parameters. Then, after local alignment and read pairing (with the -v 215 -s 54 -S 4.0 options for the insert size parameters), only pairs showing the best unique alignment of reads were kept for further analysis.

SNP/Indel detection and filtering

SNVs from SOLiDv3 data were identified with the Samtools 0.1.13 pileup program and the varFilter.pl Perl script with default parameters.
For all SOLiDv4 experiments, SNVs and indels were detected with the UnifiedGenotyper of the GATK v1.0.5 suite after preprocessing of the read pairs: PCR duplicates were marked, reads were locally realigned around known indels, and base quality scores were recalibrated using dbSNP132. The exome variant filters (Q<30, QD<5, HRun>5, SB>-0.1) were then used to filter out both SNV and indel false positives.

Exonic rearrangement detection

For the SOLiDv3 dataset, exonic rearrangements were detected by comparing normalized mean coverage ratios between samples. For all other sequencing datasets, copy-number variations were detected using a depth-of-coverage method based on read counts. For each captured or amplified region, read counts were first computed with the multiBamCov program from BEDTools, then normalized and compared, using the Bioconductor R package DESeq, with the mean of all samples from the same experiment taken as a control.

Supplemental table 1. Mutations and rare variants found in the validation set. Nucleotide positions are numbered on the basis of the coding sequences NM_007294.2 and NM_000059.3 for BRCA1 and BRCA2, respectively. Nucleotide numbering reflects cDNA numbering, with +1 corresponding to the A of the ATG translation initiation codon in the reference sequence. The number of occurrences is indicated in brackets. (*): large rearrangements found using the electrophoresis step (see text).

| Variant type | Gene | Description | Expected consequences | Identified with reference method | Identified with NextGene v2.3 (*) (Coverage/Mutation ratio) | Identified with academic pipeline (Coverage/Mutation ratio) |
|---|---|---|---|---|---|---|
| Large rearrangements | BRCA1 | c.671-?_c.4185+? del | p.? | Yes | Yes (*) | Yes |
| Large rearrangements | BRCA1 | c.871-?_c.547+? dup | p.? | Yes | Yes (*) | Yes |
| Large rearrangements | BRCA2 | c.426-?_c.(1910_6841) dup | p.? | Yes | Yes (*) | Yes |
| Large rearrangements | BRCA2 | c.(?_-227)_(*902_?) dup | p.? | Yes | Yes (*) | Yes |
| Insertions/Deletions | BRCA1 | c.19_47del | p.(Arg7CysfsTer24) | Yes | Yes (37X/70%) | Yes (52X/40%) |
| Insertions/Deletions | BRCA1 | c.68_69del | p.(Glu23ValfsTer17) | Yes | Yes (35X/62%) | No |
| Insertions/Deletions | BRCA1 | c.3416_3427delinsC | p.(Ser1139ThrfsTer6) | Yes | Yes (82X/45%) | Yes (60X/35%) |
| Insertions/Deletions | BRCA2 | c.68-7dup | p.? | Yes | Yes (84X/20%) | No |
| Insertions/Deletions | BRCA2 | c.5722_5723del | p.(Leu1908ArgfsTer2) | Yes | Yes (330X/56%) | Yes (333X/55%) |
| Insertions/Deletions | BRCA2 | c.5946del | p.(Ser1982ArgfsTer22) | Yes | Yes (92X/46%) | Yes (94X/45%) |
| Insertions/Deletions | BRCA2 | c.6514_6515del | p.(Ser2172ThrfsTer3) | Yes | Yes (416X/44%) | Yes (447X/41%) |
| Insertions/Deletions | BRCA2 | c.6591_6592del | p.(Glu2198AsnfsTer4) | Yes | Yes (413X/31%) | Yes (435X/24%) |
| Insertions/Deletions | BRCA2 | c.10095delinsGAATTATATCT | p.(Ser3366AsnfsTer4) | Yes | Yes (177X/40%) | Yes (173X/36%) |
| Nucleotide substitutions | BRCA1 | c.301+7G>A | p.? | Yes | Yes (111X/56%) | Yes (115X/59%) |
| Nucleotide substitutions | BRCA1 | c.314A>G | p.(Tyr105Cys) | Yes | Yes (289X/44%) | Yes (257X/45%) |
| Nucleotide substitutions | BRCA1 | c.994C>T | p.(Arg332Trp) | No | Yes (294X/47%) | Yes (296X/49%) |
| Nucleotide substitutions | BRCA1 | c.2458A>G | p.(Lys820Glu) | Yes | Yes (153X/56%) | Yes (137X/60%) |
| Nucleotide substitutions | BRCA1 | c.3748G>A | p.(Glu1250Lys) | Yes | Yes (383X/47%) | Yes (346X/45%) |
| Nucleotide substitutions | BRCA1 | c.4393A>C | p.(Ile1465Leu) | Yes | Yes (124X/50%) | Yes (116X/52%) |
| Nucleotide substitutions | BRCA1 | c.4535G>T | p.(Ser1512Ile) | Yes | Yes (153X/46%) | Yes (142X/47%) |
| Nucleotide substitutions | BRCA1 | c.4812A>G | p.(=) | Yes | Yes (543X/46%) | Yes (484X/51%) |
| Nucleotide substitutions | BRCA1 | c.4956G>A | p.(Met1652Ile) | Yes | Yes (76X/50%) | Yes (72X/50%) |
| Nucleotide substitutions | BRCA2 | c.-11C>T | p.? | Yes | Yes (895X/48%) | Yes (986X/50%) |
| Nucleotide substitutions | BRCA2 | c.68-17A>G | p.? | Yes | Yes (112X/40%) | Yes (37X/41%) |
| Nucleotide substitutions | BRCA2 | c.68-7T>A (2) | p.? | No, No | Yes, Yes (Minimum: 312X/41%) | Yes, Yes (Minimum: 172X/45%) |
| Nucleotide substitutions | BRCA2 | c.125A>G | p.(Tyr42Cys) | Yes | Yes (458X/54%) | Yes (413X/55%) |
| Nucleotide substitutions | BRCA2 | c.1151C>T | p.(Ser384Phe) | Yes | Yes (810X/77%) | No |
| Nucleotide substitutions | BRCA2 | c.1788T>C | p.(=) | Yes | Yes (241X/42%) | Yes (219X/43%) |
| Nucleotide substitutions | BRCA2 | c.1938C>T | p.(=) | Yes | Yes (260X/55%) | Yes (234X/56%) |
| Nucleotide substitutions | BRCA2 | c.1964C>G | p.(Pro655Arg) | Yes | Yes (1082X/48%) | Yes (1024X/51%) |
| Nucleotide substitutions | BRCA2 | c.3079A>G | p.(Ser1027Gly) | Yes | Yes (715X/39%) | Yes (790X/40%) |
| Nucleotide substitutions | BRCA2 | c.3252T>C | p.(=) | Yes | Yes (1006X/51%) | Yes (998X/49%) |
| Nucleotide substitutions | BRCA2 | c.3264T>C | p.(=) | Yes | Yes (4065X/39%) | Yes (3556X/46%) |
| Nucleotide substitutions | BRCA2 | c.3516G>A | p.(=) | Yes | Yes (271X/47%) | Yes (233X/45%) |
| Nucleotide substitutions | BRCA2 | c.4068G>A | p.(=) | Yes | Yes (448X/47%) | Yes (278X/48%) |
| Nucleotide substitutions | BRCA2 | c.4090A>C | p.(Ile1364Leu) | Yes | Yes (71X/42%) | Yes (78X/44%) |
| Nucleotide substitutions | BRCA2 | c.4584C>T | p.(=) | Yes | Yes (999X/50%) | Yes (1221X/46%) |
| Nucleotide substitutions | BRCA2 | c.4585G>A | p.(Gly1529Arg) | Yes | Yes (78X/59%) | Yes (77X/60%) |
| Nucleotide substitutions | BRCA2 | c.4956G>A | p.(Met1652Ile) | Yes | Yes (44X/50%) | Yes (42X/52%) |
| Nucleotide substitutions | BRCA2 | c.5199C>T (2) | p.(=) | No, Yes | Yes, Yes (Minimum: 166X/30%) | Yes, Yes (Minimum: 145X/32%) |
| Nucleotide substitutions | BRCA2 | c.5418A>G | p.(=) | Yes | Yes (3948X/50%) | Yes (4299X/50%) |
| Nucleotide substitutions | BRCA2 | c.6037A>T | p.(Lys2013Ter) | Yes | Yes (78X/53%) | Yes (67X/52%) |
| Nucleotide substitutions | BRCA2 | c.6100C>T | p.(Arg2034Cys) | Yes | Yes (909X/49%) | Yes (1114X/47%) |
| Nucleotide substitutions | BRCA2 | c.6785T>C | p.(Met2262Thr) | No | Yes (404X/49%) | Yes (372X/50%) |
| Nucleotide substitutions | BRCA2 | c.7017G>C | p.(Lys2339Asn) | Yes | Yes (106X/44%) | Yes (108X/43%) |
| Nucleotide substitutions | BRCA2 | c.7319A>G | p.(His2440Arg) | Yes | Yes (2452X/44%) | Yes (2507X/47%) |
| Nucleotide substitutions | BRCA2 | c.8850G>T | p.(Lys2950Asn) | Yes | Yes (180X/39%) | Yes (155X/46%) |
| Nucleotide substitutions | BRCA2 | c.8851G>A | p.(Ala2951Thr) | Yes | Yes (246X/44%) | Yes (232X/45%) |
| Nucleotide substitutions | BRCA2 | c.9649-19G>A | p.? | Yes | Yes (966X/50%) | Yes (856X/49%) |
| Nucleotide substitutions | BRCA2 | c.9730G>A | p.(Val3244Ile) | Yes | Yes (7277X/47%) | Yes (8242X/50%) |
| Nucleotide substitutions | BRCA2 | c.10110G>A | p.(=) | Yes | Yes (511X/50%) | Yes (464X/52%) |

Supplemental table 2: Bioinformatic tools and parameters used in the academic pipeline.
| Stage | Analysis step | Program | Command line |
|---|---|---|---|
| MAPPING | Mapping | TMAP | tmap mapall -g 0 -n 8 -f <reference_file> -r <sff_file> -s <output_sam_file> -v -Y stage1 map1 map2 map3 |
| MAPPING | Sort SAM file | PicardTools | SortSam.jar I=<input_sam_file> O=<output_sam_file> SO=coordinate |
| MAPPING | Get BAM | Samtools | samtools view -q 10 -bSt <reference_index_file> -o <output_bam_file> <input_sam_file> |
| MAPPING | Focus on targets | BEDTools | intersectBed -abam <input_bam_file> -b <exons_bed_file> |
| MAPPING | Coverage statistics | GATK | GenomeAnalysisTK.jar -R <reference_file> -I <exons_bam_file> -T DepthOfCoverage -L <exons_bed_file> -ct 50 -ct 100 -ct 200 -ct 300 --omitDepthOutputAtEachBase |
| SNV/INDEL DETECTION | Variant calling | TVC | variantCaller.py -l -k -L <log_file> -o <flow_order_seq> -p <GermLine_json_file> -r <TVC_bin_directory> -b <exons_bed_file> <output_dir> <reference_file> <exons_bam_file> |
| SNV/INDEL DETECTION | Variant calling (modified parameters in the <GermLine_json_file>) | TVC | max_alternate_alleles : 3; min-bayesian-score : 0.1; min-var-freq : 0; gatk-min-score : 900 |
| SNV/INDEL DETECTION | Variant filtering (SNVs) | PostProcessVCF.pl | PostProcessVCF.pl --vcf SNP_variants.vcf --mincov 30 --varfreq 0.3 --strdfreq 0.2 --out <snps_output> |
| SNV/INDEL DETECTION | Variant filtering (indels) | PostProcessVCF.pl | PostProcessVCF.pl --vcf Indel_variants.vcf --mincov 30 --varfreq 0.2 --strdfreq 0.2 --out <indels_output> |
| SNV/INDEL DETECTION | Variant annotation | Annovar | convert2annovar.pl -format vcf4 --includeInfo <snps/indels_vcf_file> |
| SNV/INDEL DETECTION | Variant annotation | Annovar | annotate_variation.pl --geneanno --buildver hg19 <snps/indels_var_file> <Annovar_bin_dir>/humandb |
| REARRANGEMENT DETECTION | Read count for each amplicon and for all samples | BEDTools | multiBamCov -bams <exons_bam_files> -bed <amplicons_bed_file> |
| REARRANGEMENT DETECTION | Differential read count analysis | DESeq | Rscript --vanilla rgt_script.R <FDR> <number_of_multiplexes> <read_count_matrix> |

Supplemental table 3: Minimum input requirements for analysis.

| Requirement | NextGene pipeline | Academic pipeline |
|---|---|---|
| High-throughput reads | 34 619 | 34 619 |
| Reads at QV20 and more | 92.91 % | 92.91 % |
| Minimum number of aligned reads | 30 972 | 33 420 |
| Minimum number of matched bases | 5 524 125 | 5 566 449 |
| Minimum coverage observed | 30X | 35X |
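The rearrangement-detection step of Supplemental table 2 (per-amplicon read counts, normalized and compared with the mean of all samples of the same experiment) can be sketched as follows. This is a simplified ratio-based stand-in written for illustration; it is not the DESeq analysis performed by rgt_script.R, whose content is not given here. The function names, the 0.7/1.3 ratio thresholds, the sample and amplicon labels and the read counts are all hypothetical.

```python
from statistics import mean


def normalized_ratios(counts):
    """counts: {sample: {amplicon: read_count}} -> {sample: {amplicon: ratio}}.

    Each count is scaled by the sample's total reads, then divided by the
    mean scaled value of that amplicon over all samples (the "control").
    """
    fractions = {}
    for sample, per_amplicon in counts.items():
        total = sum(per_amplicon.values())
        if total == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        fractions[sample] = {amp: n / total for amp, n in per_amplicon.items()}

    amplicons = list(next(iter(counts.values())))
    ratios = {sample: {} for sample in counts}
    for amp in amplicons:
        reference = mean(fractions[sample][amp] for sample in counts)
        for sample in counts:
            ratios[sample][amp] = (fractions[sample][amp] / reference
                                   if reference else float("nan"))
    return ratios


def flag_rearrangements(ratios, low=0.7, high=1.3):
    """Flag amplicons whose ratio suggests a deletion or a duplication.

    The thresholds are illustrative assumptions, not validated cut-offs.
    """
    calls = {}
    for sample, per_amplicon in ratios.items():
        for amp, ratio in per_amplicon.items():
            if ratio <= low:
                calls[(sample, amp)] = "possible deletion"
            elif ratio >= high:
                calls[(sample, amp)] = "possible duplication"
    return calls


# Toy read-count matrix: five controls and one carrier of a heterozygous
# deletion of a single amplicon (half the expected reads).
amplicons = ["amplicon_1", "amplicon_2", "amplicon_3", "amplicon_4"]
counts = {f"control_{i}": {amp: 1000 for amp in amplicons} for i in range(1, 6)}
counts["carrier"] = {amp: 1000 for amp in amplicons}
counts["carrier"]["amplicon_2"] = 500

ratios = normalized_ratios(counts)
calls = flag_rearrangements(ratios)
print(calls)
```

Because the control is the mean of all samples, a rearrangement shared by many samples in the same run would shift the reference itself, which is one reason a statistical model such as the one in DESeq is preferable to fixed ratio cut-offs in practice.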