Considerations for Analyzing Targeted NGS Data – BRCA

Tim Hague, CTO
Introduction
• BRCA1 and BRCA2 are best known as 'cancer susceptibility' genes
• In fact, the proteins they encode repair damage in DNA
• Large number of known deleterious mutations
• Disproportionate number of indels
History
• Mary-Claire King discovered BRCA1 and BRCA2 and published their function
• Myriad Genetics won the patent
[Figure: Distribution of known BRCA1 deletions >3 bp; x-axis: indel size (nt)]
"Large-scale deletions could account for as many as one-third of all BRCA1 mutations in some populations."
Dominique Stoppa-Lyonnet, Curie Institute
BRCA are tumor suppressor genes.
82% lifetime chance of developing breast/ovarian cancer.
Science 2004, 306:2187-2191
>1,500 deleterious BRCA mutations
17 kbp coding region with a mutation rate of 1/2000
NGS-based BRCA screening
Leeds UK, Newgene UK, Ghent Belgium
DIY genetic test published by Salzberg
82% chance of cancer
>90% chance of a false positive/negative
What kind of NGS data?
• False negatives must be avoided
• Precision of both the sequencing data and the data analysis is key
• Looking for indels: indel detection ability is a key criterion
• Repeats are also an issue in the BRCA region
BRCA Repeats
Homopolymer Errors
Homopolymer errors look like small indels and can cause noise.
Problem for:
• Roche 454
• Ion Torrent
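One way to reduce this noise is to flag small indel calls that fall inside or next to a homopolymer run, so they can be reviewed separately. The sketch below is illustrative only: the function names, the toy sequence, and the `min_len=4` threshold are assumptions, not values from the talk.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, end, base) for each run of >= min_len identical bases.

    Coordinates are 0-based and end-exclusive. min_len=4 is an
    illustrative threshold, not a value taken from the talk.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, pos + length, base))
        pos += length
    return runs


def in_homopolymer(indel_pos, runs):
    """True if a 0-based indel position lies inside or directly beside a run."""
    return any(start - 1 <= indel_pos <= end for start, end, _ in runs)


# Toy sequence (hypothetical, not BRCA): a 6 bp A run and a 5 bp T run.
seq = "ACGTAAAAAAGCTTTTTGCA"
runs = homopolymer_runs(seq)
print(runs)
print(in_homopolymer(7, runs), in_homopolymer(1, runs))
```

An indel flagged this way is not necessarily wrong; the flag only marks calls that deserve a second look on platforms prone to homopolymer errors.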
Long Reads
Read length is a limiting factor for insertion detection.
When searching for indels, long reads can help. Long reads can also help with repeats.
Roche 454 has the longest reads.
Real examples with Roche 454 data
Paired Reads
• Paired reads can also help to increase effective 'read length'
• Illumina MiSeq now has a 2x250 bp protocol
• Comparison of 9 open-source and commercial NGS analysis tools
• In silico test with a mutated reference BRCA gene
• 2211 known BRCA variants: 1341 SNPs, 320 insertions and 551 deletions
• Full GATK pipeline used for variant calling, including quality recalibration and indel realignment
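The in silico setup can be pictured as applying a list of known variants to the reference sequence and then generating reads from the result. The sketch below covers only the first step, on a toy sequence. The `apply_variants` helper and its VCF-style allele convention (indels share an anchor base) are assumptions for illustration, not the procedure used in the talk.

```python
def apply_variants(reference, variants):
    """Return a mutated copy of `reference`.

    Each variant is (pos, ref, alt) with a 0-based position and
    VCF-style alleles (indels include the anchor base).
    Variants must not overlap.
    """
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        pieces.append(reference[cursor:pos])  # unchanged stretch
        pieces.append(alt)                    # mutated allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


# Toy reference and variants (hypothetical, not real BRCA data).
reference = "ACGTACGTACGT"
variants = [
    (1, "C", "T"),    # SNP
    (4, "A", "AGG"),  # 2 bp insertion
    (8, "ACG", "A"),  # 2 bp deletion
]
mutated = apply_variants(reference, variants)
print(mutated)
```

Because the inserted and deleted lengths cancel out here, the mutated sequence has the same length as the reference; in general it will not.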
BWA: sensitivity
                      Paired End (PE)   Single End (SE)
Overall sensitivity   99.2%             94.5%
SNPs found            99.5%             99.5%
Insertions found      99.4%             89.4%
Deletions found       98.5%             85.5%
BWA: error counts
                      Paired End (PE)   Single End (SE)
False negatives       17                121
False positives       23                168
The longest deletions (60 bp and above) were not found with either PE or SE data.
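The overall sensitivity figures follow directly from the false negative counts and the 2211 variants in the test set. The short check below reproduces them. The `precision` helper is an extra derived measure that the talk does not report; it is included only as an assumption about how the false positive counts might be used.

```python
def sensitivity(total_variants, false_negatives):
    """Fraction of known variants recovered: TP / (TP + FN)."""
    return (total_variants - false_negatives) / total_variants


def precision(total_variants, false_negatives, false_positives):
    """Fraction of reported calls that are real: TP / (TP + FP).

    Not reported in the talk; shown only as a derived measure.
    """
    true_positives = total_variants - false_negatives
    return true_positives / (true_positives + false_positives)


TOTAL = 2211  # known BRCA variants in the in silico test

pe_sens = sensitivity(TOTAL, 17)   # paired end
se_sens = sensitivity(TOTAL, 121)  # single end
print(f"PE sensitivity: {pe_sens:.1%}")  # PE sensitivity: 99.2%
print(f"SE sensitivity: {se_sens:.1%}")  # SE sensitivity: 94.5%
```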
Indel sizes - BWA Single End
Indel sizes - BWA Paired End
Other Tools
• Most other alignment tools showed a similar trend: much better results overall with paired data
• Only two of the tools tested found the longest deletions, even with paired data
Paired Reads - Conclusions
• Much better for reliable variant detection than single reads of equivalent length
• Provided much better coverage in the BRCA region (spanning small repeats)
If available, paired reads should be preferred
Indel Detection - Conclusions
• Not all tools are good at finding indels.
• Burrows-Wheeler-based aligners can't find indels beyond a few base pairs in single reads, but can make better use of paired data if indel realignment is also used.
• They still can't detect the longest indels (there is just a gap in coverage).
If indel detection is required, an indel-sensitive tool should be used
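Since a missed long indel shows up only as a gap in coverage, one simple follow-up is to scan per-base depth for uncovered stretches and review those regions by hand or with another tool. The sketch below assumes depth values are already available as a list; the function name, the `min_depth` threshold, and the toy depths are illustrative.

```python
def coverage_gaps(depths, min_depth=1):
    """Return 0-based, end-exclusive (start, end) intervals with depth < min_depth."""
    gaps = []
    gap_start = None
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if gap_start is None:
                gap_start = pos  # a new gap opens here
        elif gap_start is not None:
            gaps.append((gap_start, pos))  # the gap closed at the previous base
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, len(depths)))  # gap runs to the end of the region
    return gaps


# Toy per-base depths (hypothetical): a 4 bp gap and a 1 bp gap at the end.
depths = [30, 28, 25, 0, 0, 0, 0, 27, 31, 0]
gaps = coverage_gaps(depths)
print(gaps)
```

A gap is only a candidate: it can also come from a failed amplicon or capture probe, so it does not by itself prove a deletion.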
Overall - Conclusions
• None of the alignment tools found all the variants
• Sufficiently accurate results will almost certainly require analyzing the same data with more than one tool
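Running more than one tool raises the question of how to merge the call sets. A minimal sketch, assuming each tool's calls have already been normalized to (chrom, pos, ref, alt) tuples: the union keeps false negatives low, and the per-variant support count shows which calls only one tool made. The tool names and variants below are hypothetical.

```python
from collections import Counter


def combine_calls(callsets):
    """Count how many tools reported each variant.

    callsets maps a tool name to a set of (chrom, pos, ref, alt) tuples.
    """
    support = Counter()
    for calls in callsets.values():
        support.update(calls)  # each tool counts at most once per variant
    return support


# Hypothetical call sets from two tools.
callsets = {
    "tool_a": {("chr17", 100, "A", "G"), ("chr17", 250, "CT", "C")},
    "tool_b": {("chr17", 100, "A", "G"), ("chr13", 400, "G", "GAA")},
}
support = combine_calls(callsets)
union = set(support)  # every variant reported by any tool
consensus = {v for v, n in support.items() if n == len(callsets)}
single_tool = union - consensus  # candidates for manual review
print(len(union), consensus)
```

Given that false negatives must be avoided, the union is the safer starting point here, with single-tool calls queued for confirmation rather than discarded.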