Clonal Sequencing for Genetics and Cancer in a Research and Diagnostic Setting
Graham Taylor

Leeds
People
Leeds University
• Joanne Morgan
• David Parry
• Claire Logan
• David Bonthron
• Colin Johnson
• Eamonn Sheriden
• Chris Inglehearn
• Ian Carr
• Alex Markham
Leeds NHS
• Nick Camm
• Helen Lindsay
• Antigone Tzika
• Josie Hayes
• Christopher Watson
• Lampros Mavrogiannis
• Ruth Charlton
• Paul Roberts
• Leeds Health Stars
Why bother…
On résiste à l'invasion des armées; on ne résiste pas à l'invasion des idées.
Victor Hugo, Histoire d'un Crime (1852)
There is one thing stronger than all the armies in the world, and that is an idea whose time has come.
Sequencing in 2007
The scale of activity
The rate of change
courtesy Stephanie Cohen & Steve Brenner
Hype Cycle
$1,000 genome
Cost‐effective
Genetic tests
Projects
Leeds University
• Gene dosage
• Targeted re‐sequencing
• ChIP‐seq
• Transcriptome
• Tumour resequencing
Leeds NHS
• Re‐sequencing Long PCR products
• Gene dosage
• Targeted resequencing
• Tumour resequencing
The Pathway to Terabase sequencing
• Generation 1:
– Sanger/Capillary Format/Nanomolar scale
– 96 reads per run
• Generation 2
– Clonal/array format 454, Illumina, SOLiD
• Generation 2.1
– Higher cluster density Clonal/Array Format/Zeptomolar scale
– 1 billion reads per run
• Generation 2.1b
– Mini versions of 2.1
• Generation 3:
– Single molecule
Generation 2.1 sequencers: 10^12 base sequence capacity
• Reagent cost approx £3,000
• Average genome coverage >250
• Average exome coverage >20,000
– £300 (reagent) exome at 2,000‐fold coverage
• aCGH will not compete
– Reduction of CNV‐seq costs by at least 10‐fold
– Ability to identify translocations
– Ability to report SNPs (uniparental disomy, autozygosity)
• Further reduction in cost of single/few gene analyses will not have much impact as major costs are now outside of the sequencing workflow
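The coverage and cost figures on this slide follow from simple division. A minimal sketch, assuming a haploid genome of about 3.1 Gbases and an exome of about 50 Mbases (the genome size is an assumption, not a figure from the slide), and ignoring duplicates, off-target reads and library preparation costs:

```python
# Back-of-envelope figures for one "Generation 2.1" run.
RUN_CAPACITY_BASES = 1e12     # from the slide: 10^12 bases per run
RUN_REAGENT_COST_GBP = 3000   # from the slide: approx £3,000 per run
GENOME_SIZE_BASES = 3.1e9     # assumption: approximate haploid human genome
EXOME_SIZE_BASES = 50e6       # assumption: approx 50 Mbases


def mean_coverage(capacity_bases, target_bases):
    """Mean fold coverage if the whole run is spent on one target."""
    return capacity_bases / target_bases


def reagent_cost_at_coverage(target_bases, coverage):
    """Share of one run's reagent cost needed to reach `coverage` on a target."""
    fraction_of_run = target_bases * coverage / RUN_CAPACITY_BASES
    return fraction_of_run * RUN_REAGENT_COST_GBP


genome_cov = mean_coverage(RUN_CAPACITY_BASES, GENOME_SIZE_BASES)
exome_cov = mean_coverage(RUN_CAPACITY_BASES, EXOME_SIZE_BASES)
exome_cost = reagent_cost_at_coverage(EXOME_SIZE_BASES, 2000)

print(f"genome coverage: {genome_cov:.0f}-fold")
print(f"exome coverage: {exome_cov:.0f}-fold")
print(f"exome at 2,000-fold: £{exome_cost:.0f} reagent")
```

Under these assumptions an exome at 2,000‐fold coverage uses one tenth of a run, which is where the £300 reagent figure comes from.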
2.1b (mini‐sequencers)
• 10^9–10^10 base capacity
• Roche Junior (14 Gbase/£3,000)
• Illumina MiSeq (1.5 Gbase/£500)
• Ion Torrent (1 Gbase/?)
• Mid range sequencers would replace most conventional sequencing applications but not achieve the economies of scale of the large scale sequencers, e.g. exomes, CNV‐seq, mapping re‐arrangements
Approx cost per base Q1 2012
Whole Genome
CNV‐seq
Multiplex NGS for pre‐ and post‐natal genetic diagnosis of copy number variation in phenotypically‐abnormal constitutional cases
Antigoni Tzika, Kelly Cohen, Paul Roberts
Resolution comparable to that obtained from array‐CGH platforms (240 kb for ISCA 8x60K, 225 kb for NGRL 4x44K) can be achieved using 2 million reads per sample. With increased reads, resolution to the level of the base pair is feasible with NGS.
Read pairs will map translocations to the exact position
Aneuploidy detection
CNV‐seq
0.5Mb BlueGnome BAC array
Multiplexing and effect on resolution
• Increasing multiplexing decreases resolution
• Basic patterns remain visible, with higher throughput and lower cost
5X multiplex
10X multiplex
80X multiplex
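The trade‐off between multiplexing and resolution can be sketched numerically. The example below assumes that each multiplexed pool occupies one channel of about 40 million reads, that reads are shared equally between samples, and that resolution scales inversely with read count from the reference point of 2 million reads for roughly 240 kb. These are simplifying assumptions for illustration, not measured results.

```python
READS_PER_CHANNEL = 40e6        # assumption: one GAIIx channel per pool
REFERENCE_READS = 2e6           # 2 million reads per sample ...
REFERENCE_RESOLUTION = 240e3    # ... gives roughly array-like resolution (240 kb)


def reads_per_sample(plex, reads_per_channel=READS_PER_CHANNEL):
    """Reads available to each sample if the channel is shared equally."""
    return reads_per_channel / plex


def approx_resolution_bases(reads):
    """Scale the reference point, assuming resolution is inversely
    proportional to the number of reads per sample."""
    return REFERENCE_RESOLUTION * REFERENCE_READS / reads


for plex in (5, 10, 80):
    reads = reads_per_sample(plex)
    resolution_kb = approx_resolution_bases(reads) / 1e3
    print(f"{plex}X multiplex: {reads / 1e6:.1f} M reads per sample, "
          f"approx {resolution_kb:.0f} kb resolution")
```

Real resolution also depends on read mappability and GC bias, so the numbers should be read as an order‐of‐magnitude guide only.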
Next Generation Sequencing (NGS) and Genetic Diagnostics
• NGS diagnostics is an inevitable trend because:
– Staffing and reagent costs are reduced
– Data output is increased and automated
• There will gradually be a transition in diagnostics from testing one or two genes to several, many and possibly all genes
• Since February 2010 the Leeds Genetics service has delivered BRCA1 & BRCA2 sequence using CPA‐accredited NGS based on the Illumina GAIIx
– Improved reporting times
– Reduced costs
– Reduced retest rate
The case for gene‐centric analysis
(or, “We have run out of money, we shall have to think”‐ Rutherford)
Coverage
• 1 flow cell (GAIIx) = 7 channels
• 1 channel = 40 million reads
• 1 read = 36 – 300 bases
• 1 channel = 1 – 12 Gigabases
• 1 flow cell = 24 haploid genome equivalents
• At mean coverage of 100, one channel can cover 120 Megabases (NB exome is approx 50Mbases)
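The channel yield and the 120 Megabase figure can be checked with a few lines of arithmetic. This is a minimal sketch that treats every read as full length and usable, which real runs will not quite achieve:

```python
READS_PER_CHANNEL = 40e6  # reads per GAIIx channel, from the figures above


def channel_yield_bases(read_length, reads=READS_PER_CHANNEL):
    """Total bases from one channel at a given read length."""
    return reads * read_length


def coverable_target_bases(yield_bases, coverage):
    """Largest target that can be covered at the requested mean coverage."""
    return yield_bases / coverage


low_yield = channel_yield_bases(36)     # shortest reads
high_yield = channel_yield_bases(300)   # longest reads
target_at_100x = coverable_target_bases(high_yield, 100)

print(f"channel yield: {low_yield / 1e9:.2f} to {high_yield / 1e9:.0f} Gigabases")
print(f"target at 100-fold coverage: {target_at_100x / 1e6:.0f} Megabases")
```

At 300 bases per read one channel therefore has room for an exome of approx 50 Mbases at a mean coverage of about 240.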
Reagent Costs
• 1 genome (x24) $10,000 (1‐plex)
• 1 exome (x50) $1,000 (1‐plex)
• 1 cardiome (1Mbase) $ 100 (10‐plex)*
• 1 gene (e.g BRCA1) $15 (96‐plex)*
*Library preparation weighting not fully costed
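One way to read these figures is as a flow cell cost divided by channels and then by the number of samples sharing a channel. The sketch below assumes that the $10,000 genome uses a whole 7‐channel flow cell and that the other tests use one channel each; that derivation is an assumption, and the slide's values appear to be rounded.

```python
FLOW_CELL_COST_USD = 10_000   # assumption: the 1-plex genome uses a whole flow cell
CHANNELS_PER_FLOW_CELL = 7    # from the Coverage figures


def reagent_cost_per_sample(plex, channels=1):
    """Sequencing reagent cost per sample, excluding library preparation."""
    channel_cost = FLOW_CELL_COST_USD / CHANNELS_PER_FLOW_CELL
    return channel_cost * channels / plex


tests = {
    "genome (7 channels, 1-plex)": reagent_cost_per_sample(1, channels=7),
    "exome (1 channel, 1-plex)": reagent_cost_per_sample(1),
    "cardiome (1 channel, 10-plex)": reagent_cost_per_sample(10),
    "single gene (1 channel, 96-plex)": reagent_cost_per_sample(96),
}
for name, cost in tests.items():
    print(f"{name}: ${cost:,.2f}")
```

The calculation makes the point of the slide explicit: at high plex the sequencing reagents become a small part of the total, and library preparation starts to dominate.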
Multiplexing for cost efficiency
Sample tagging – 6 base barcodes (potentially 4^6 = 4,096 variations)
Standard adaptor sequence
Library insert
Barcode sequence
GGTGGC
Sequencing primer
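Sorting pooled reads back into samples by barcode can be sketched as follows. The example assumes the barcode is the first 6 bases of each read and that only exact matches are accepted; the second barcode, the sample names and the reads are hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 6


def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode.

    Returns {sample: [insert sequences]}; reads whose barcode is not
    recognised are collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# GGTGGC is the barcode shown on the slide; everything else is made up.
barcodes = {"GGTGGC": "sample_01", "ACGTAC": "sample_02"}
reads = ["GGTGGCTTAGGCA", "ACGTACGGGTTTA", "NNNNNNACGTACG"]

result = demultiplex(reads, barcodes)
for sample, inserts in result.items():
    print(sample, inserts)
```

A production version would usually tolerate one mismatch in the barcode, which is only safe if the chosen barcodes differ from each other at several positions.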
BRCA1 & BRCA2 analysis
• 671 variants (326 BRCA1, 345 BRCA2)
• 77 variants identified previously were all detected
• All pathogenic variants were detected
NGS in the Leeds Genetics lab
• Familial breast/ovarian cancer
– NGS replaced existing Sanger service for BRCA1 and BRCA2 for all diagnostic referrals in February 2010
– First UK NGS diagnostic reports issued in March 2010
• Hereditary non-polyposis colorectal cancer
– NGS replaced existing Sanger service for MLH1, MSH2 and MSH6 for all diagnostic referrals in October 2010
New services introduced May 2011
• Hypertrophic cardiomyopathy
– MYBPC3, MYH7, TNNI3, TNNT2
• Pheochromocytoma & paraganglioma
– PRKAR1A, RET, SDH5, SDHB, SDHC, SDHD, TMEM127, VHL
• Marfan syndrome
– FBN1
Results / benefits
[Figure: Average BRCA reporting times. Y axis: reporting time (working days), 0 to 60. X axis: month, Nov-09 to Feb-11.]
• Multi‐gene testing
• Reduced lab costs
• Increased capacity
• Increased reliability
• Improvement in turnaround times
• Improvements in patient care pathways
• Close working relationship with research groups
The deliverable, the desirable and the fundable
• $1K genomes/exomes
• Targeted resequencing
– Incremental revision of existing services
– Stratified medicine
• Germline
• Somatic
• Syndrome‐omes
– Cardiome
– Retinome
– Ciliome
– “Cancer chip”
• Screening
– Prenatal
– Carrier
• Autozygous
• General recessive
What does this mean for clinical diagnostics?
The currency of genetic analysis will become DNA sequence
– Sequence count
– Sequence variation
– Sequence arrangement
– The currency is dropping to commodity prices
– Skill set required is less lab, more informatics biased
– genes < pathways < “omes” < genomes
Genome Informatics
• Use the co‐ordinates of the reference genome
• Use the reference sequence
• Use the annotations to the reference sequence
• Overlay experimental data
• Filter and output results
• Use scripts, e.g. Perl, Java, Python
• Use web applications, e.g. Galaxy
Handling the data locally
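The Genome Informatics steps listed above (reference co‐ordinates, annotations, overlay, filter, output) can be sketched in a short script. This is an illustration only: the intervals, gene names and variants are hypothetical, positions are assumed to be 1‐based and inclusive, and a real pipeline would read standard annotation and variant files rather than in‐line lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 1-based, inclusive (assumption)
    end: int
    name: str


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


def annotate(variants, annotations):
    """Overlay variants on annotated intervals and keep only the overlaps."""
    hits = []
    for variant in variants:
        for interval in annotations:
            same_chrom = variant.chrom == interval.chrom
            if same_chrom and interval.start <= variant.pos <= interval.end:
                hits.append((variant, interval.name))
    return hits


# Hypothetical co-ordinates, for illustration only.
annotations = [
    Interval("chr17", 1000, 2000, "GENE_A exon 1"),
    Interval("chr13", 500, 900, "GENE_B exon 2"),
]
variants = [
    Variant("chr17", 1500, "C", "T"),
    Variant("chr17", 5000, "G", "A"),   # outside any annotated interval
    Variant("chr13", 900, "A", "G"),
]

hits = annotate(variants, annotations)
for variant, name in hits:
    print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alt}\t{name}")
```

The nested loop is adequate for a gene panel; for exome‐scale data an interval index or a sorted sweep would be the more usual choice.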
Python driven Web front end for Genome Informatics
Now getting up to speed with NGS tools
Array capture
Switches from more than 20 PCRs to cover two genes to a single PCR to enrich more than 24 genes (pathway‐ or syndrome‐driven testing)
Phenome
• Cardiome
• Ciliome
• Retinome
• Kinome
• …
Samples need to be multiplexed to be cost‐effective, but multiplexing reduces capture efficiency
Autozygome
Bradford has a population of 470,000, with 6,000 live births in 2006. Although only 18% of the Bradford population is of south Asian origin, 50% of births are to south Asian families.
Autosomal recessive (AR) diseases occur when a child inherits two copies of a gene, one from each parent, both genes carrying a harmful mutation. The chance of having a child with an AR condition is increased if both parents are blood relatives. In communities in which consanguineous marriage is common, there is a significant increase in the prevalence of AR disease. A 1993 study from Birmingham recorded a 16‐fold increase in AR diseases in the offspring of consanguineous Pakistani couples, compared to non‐consanguineous couples.
Exome
• One exome: 22,000 variants
• Filtering results
• Unknown success rate
• Exome sequencing is now within reach of Regional Genetics Services
• Recent experience in Leeds: 3 exomes, 3 pathogenic variants
PCR‐ome
Preliminary(!) data on accuracy
<50 bases 1/700, proportional to read length
>50 bases, less accurate than 1/700, proportional to read length and square of read length
Detection and quantification of rare mutations with massively parallel sequencing
Isaac Kinde, Jian Wu, Nick Papadopoulos, Kenneth W. Kinzler, and Bert Vogelstein
NGS Genotypes
• Generated using a custom Perl script that converts FASTA or FASTQ files to an xls genotype report
• Good signal:noise (noise <1%)
• Direct reporting of genotypes
• Forward and reverse reads support QC
• Unmatched reads available for further processing
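The general idea of counting reads per allele, per strand, and keeping unmatched reads aside can be sketched as below. This is not the Perl script used in Leeds: it is a Python illustration that assumes each allele is identified by an exact sequence match, and the allele sequences and reads are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_alleles(reads, alleles):
    """Count reads supporting each allele on each strand.

    `alleles` maps a label to the sequence that identifies it.
    Returns (Counter keyed by (label, strand), list of unmatched reads).
    """
    counts = Counter()
    unmatched = []
    for read in reads:
        for label, seq in alleles.items():
            if seq in read:
                counts[(label, "+")] += 1
                break
            if reverse_complement(seq) in read:
                counts[(label, "-")] += 1
                break
        else:
            # No allele matched on either strand.
            unmatched.append(read)
    return counts, unmatched


# Hypothetical allele sequences and reads, for illustration only.
alleles = {"ref": "ACGTGGA", "alt": "ACGTAGA"}
reads = ["TTACGTGGATT", "GGTCTACGTCC", "TTTTTTTTT"]

counts, unmatched = count_alleles(reads, alleles)
print(dict(counts))
print("unmatched:", unmatched)
```

Comparing the forward and reverse counts for each allele gives the strand‐based QC check, and the fraction of reads supporting a minor allele gives a simple estimate of noise.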
Scale up: From 5 amplicons, 10 cases to 350 amplicons, 10 cases
de novo variants
A custom Perl script groups and counts unmatched reads, which are then used in BLASTn queries against the reference genome build
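Grouping and counting unmatched reads before a BLASTn search might look like the sketch below. It is a Python illustration rather than the Perl script itself; the `min_count` threshold, the FASTA header format and the example reads are hypothetical, and the BLASTn step is left out.

```python
from collections import Counter


def group_unmatched(reads, min_count=2):
    """Collapse identical reads and keep those seen at least `min_count` times,
    most frequent first."""
    counts = Counter(reads)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


def to_fasta(groups):
    """Format grouped reads as FASTA text, recording the count in each header."""
    lines = []
    for index, (seq, n) in enumerate(groups, start=1):
        lines.append(f">group{index}_count{n}")
        lines.append(seq)
    return "\n".join(lines)


# Hypothetical unmatched reads, for illustration only.
unmatched_reads = ["AAC", "AAC", "GGT", "AAC", "TTT", "GGT"]
groups = group_unmatched(unmatched_reads)
fasta_text = to_fasta(groups)
print(fasta_text)
```

Dropping singletons removes most sequencing errors before the search, at the cost of missing variants present in very few reads.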
KRAS codon 12: 6 base insertion
KRAS codon 11: C>T
How do I transfer cheap sequencing into high value diagnostics?
• Demand/Targets
– Phenotype driven
– Public Health driven
– Clinical demand driven
– Clinical utility driven
• Commissioners
• Providers
– State funded or commercial?
– Staff training and accreditation needs
Consequences
• Any large scale re‐sequencing effort will find many variants of unknown clinical significance
• These need to be recorded and to be searchable
• In this way, over time, we will learn more about their frequency and clinical significance
• But only if we have the database!
• Support your local/national/international mutation database
Thanks to
Academic
• Joanne Morgan
• David Parry
• Claire Logan
• David Bonthron
• Colin Johnson
• Eamonn Sheriden
• Chris Inglehearn
• Ian Carr
• Alex Markham
Service
• Nick Camm
• Helen Lindsay
• Antigone Tzika
• Josie Hayes
• Christopher Watson
• Lampros Mavrogiannis
• Ruth Charlton
• Paul Roberts
• Leeds Health Stars