High throughput CNV analysis - 2 billion data points in 20 weeks
John Anson PhD, Oxford Gene Technology

Who are OGT and what do we do?
• Founded in 1995 by Professor Ed Southern
• Fundamental IP on microarray technology
• High throughput microarray servicing
• Cytogenetics (CytoSure™) products and service
• Single cell analysis (CellScribe™)

OGT Business Model
• Clinical & Genomics solutions
  Applications of arrays from the lab to the clinic
• Biomarker discovery
  Protein and DNA biomarkers
• Digital microarrays/single cell analysis for analysing genomic events at single cell level
• Licensing fundamental array patents
OGT is a molecular medicine company providing advanced clinical genetics services and developing innovative molecular diagnostics

Many issues face scientists investigating the genetic basis of disease
• Most diseases have a complex molecular basis
• Large scale multi-faceted studies are required
• Methods employed need to be sensitive, specific, reliable, with a high capacity and quick turnaround

Conventional biomarkers studied
DNA
• Single nucleotide polymorphisms (SNPs)
RNA
• Messenger RNA (mRNA) markers
Protein
Metabolites

Prospective biomarkers
DNA
• Single nucleotide polymorphisms (SNPs)
• Epigenetic (methylation)
• Copy number variation (CNV)
RNA
• Messenger RNA (mRNA) markers
• MicroRNA (miRNA) markers
Protein
Metabolites

Copy Number Variation

Genotyping CNVs using aCGH
• Test genomic DNA: label with Cy5
• Reference genomic DNA: label with Cy3
• Hybridise
• Wash and scan
• Feature extract & load into analysis software

Challenge: Scaling up aCGH for high throughput CNV genotyping
• Labelling protocol needs to be “robot friendly”
• Automated slide washing required
• Tracking of samples and assay performance essential
• Ozone monitoring and control
• Powerful computational capability required for data analysis and data storage

High Throughput CNV Service Project
• 24 leading human geneticists
• Copy number variations (CNVs)
• >20,000 DNA samples, and
  associated reference sample
• 7 disease areas
• 3,000 controls
• >1000 DNA samples processed per week
OGT, in collaboration with Agilent Technologies, provided high throughput servicing of the WTCCC CNV arrays

High throughput samples are prepared using automation
• Experimental error is minimised, promoting consistency
• Metrics fed directly into LIMS assuring accuracy
• >600 samples and controls per day

Quality control is paramount at every step
• QC check to ensure that dye incorporation and yield metrics are achieved prior to hybridisation
• Data consistency is continually monitored, immediately highlighting variation and minimising waste of:
  • sample
  • expense
  • time

Hybridisation capacity of ~300 slides in parallel
• Hybridisation oven temperatures are monitored constantly to ensure:
  • optimal hybridisation conditions
  • more reliable results

Automated slide washing ensuring reduced variability
• Automated slide washing eliminates manual handling and hence potential variation
• Provides batch to batch reproducibility for large studies
• Ozone-controlled and monitored environment preserving dyes and data quality
• Wash buffer, reagent temperatures, agitation rates and wash times are controlled and reported in LIMS

Fully automated array scanning, data download & analysis
• Ozone-controlled environment protects against dye degradation
• Data securely stored and backed up in real-time (batches of 192 slides)
• Data transferred rapidly and securely

LIMS provides full traceability at all stages
• Samples are barcoded and tracked throughout the process
• All sample and array QC metrics
• All mastermix and consumable batches used
• All equipment utilised
  • PCR machine (and calibration)
  • Hybridisation oven (continuous temperature recording)
  • Centrifuges
  • Spectrophotometers (and frequent calibration)
  • Automated washing stations (temperatures and speeds)
  • Scanners
• Ozone levels (washing and scanning environments)

LIMS is customisable & provides full traceability at all
stages
• >40 QC checks completed for every sample processed – complete reassurance

LIMS provides full traceability at every stage
• Customised modules for: plate handling, unique array IDs, hybridisation oven, and scanner
• QC on all reagents and consumables: digest and labelling mixes, incubation block, clean-up kit, cot-1 solution and wash solutions

Example in-process control metrics
• OGT internal control sample pass rate >98%
[Figure: DLRS of control samples over time (Wk1 to Wk19); y-axis "DLRSD" from 0 to 0.5 with EXCELLENT, GOOD and POOR bands; annotated events: "Heat block failure" and "Purification module failure"]
DLRS = derivative log ratio spread
This metric is a measure of the reliability of the data to detect & call CNV aberrations

Signal Intensity of Control Samples
[Figure: Signal intensity (Cy3) of control samples over time (Wk1 to Wk19), with EXCELLENT, GOOD and POOR bands]
Signal intensity is consistently high and gives excellent QC metrics throughout 19 weeks.

Signal to Noise Ratio of Control Samples
[Figure: Signal-to-noise (Cy3) of control samples over time (Wk1 to Wk19), with EXCELLENT, GOOD and POOR bands]
Signal-to-noise is consistently excellent (based on Agilent QC metrics) throughout 19 weeks. This ensures optimal dynamic range of data, enabling greater confidence in CNV calling.

Some numbers...
• 20,146 samples processed
• 40,292 labelling reactions
• 20,146 arrays hybridised
• 2.1 × 10^9 data points generated
• 5.1 terabytes of data shipped
• Processing time 20 weeks

WTCCC HapMap Study published October 2009

Findings from the study
• Any two genomes differ by more than 1000 CNVs, or around 0.8% of a person's genome sequence.
• Most of these CNVs are deletions, with a minority being duplications.
• Two consequences are particularly striking in this study of apparently healthy people:
  • 75 regions have jumped around in the genomes of these samples
  • more than 250 genes can lose one of the two copies without obvious consequences, and a further 56 genes can fuse together, potentially to form new composite genes

But...
“We have not found large numbers of common CNVs that we can tie strongly to disease.
There remains much to be discovered and much to understand and our freely available genotyped collection will drive that discovery.”
Matt Hurles, Wellcome Trust Sanger Institute

Products from OGT Services
• Using OGT Services to identify growing applications for Products
  • Gene expression
  • Array Comparative Genome Hybridisation (aCGH)
  • ChIP on chip
  • Methylation/CpG islands
  • miRNA profiling

Karyotyping – gross changes
• Example of a normal karyotype
• Example of Down’s Syndrome (trisomy 21), the most common numerical abnormality found in newborns. It is characterised by an extra chromosome 21.

Karyotyping - deletions
http://www.pathology.washington.edu/galleries/Cytogallery/main.php?file=digeorge%20syndrome
Accessed 19.09.06

CytoSure family

CytoSure™ Cytogenetics range
• Arrays – OGT designed, manufactured by Agilent:
  • Syndrome Plus
    • 2x105k developed with Prof. Joris Vermeesch (Belgium)
    • 4x44k ISCA approved
    • 4x180k
    • 4x180k ISCA approved
  • Chromosome X
    • 4x44k
    • 2x105k
  • Aneuploidy (new)
    • 8x15k (220Kb)
  • DMD (new)
    • 4x44k developed with Emory
• Labelling kit & Ancillaries
• Analysis Software
• Oligome custom arrays

CytoSure software designed for cytogenetics
[Screenshot: CytoSure software tracks labelled Syndrome, Genes, Recom hot, CNVs, Confirmation, Patient data (del), Patient data (dup)]
Data courtesy Greenwood clinic, South Carolina

Summary of OGT’s capability
Designed to meet the needs of large scale CNV genotyping studies:
• Data quality – rigorous QC critical (proven in WTCCC study)
• Throughput – currently >1,200 samples per week (scalable)
• Data handling and storage – proprietary LIMS
• Custom array design service – from whole genome exploration to focussed interrogation in key regions of interest

Where next for high throughput CNV genotyping?
• Several large scale projects on the horizon
• New array formats provide flexibility between content coverage and cost
• CytoSure software adapted for data analysis
• Routine processing of clinical samples

Impact of next gen sequencing?
Potential for a symbiotic relationship

Acknowledgements
• High throughput CNV genotyping:
  • Graham Speight
  • Nicole Sparkes
  • Andrew Rogers
  • Sandra Lam
  • Tom Nicholls
  • John Shovelton
• CytoSure
  • Doug Hurd
  • John Shovelton
  • Volker Brenner

Thank you
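
Note on the DLRS metric: the QC slides define DLRS as the derivative log ratio spread but do not give a formula. The sketch below shows one commonly used robust formulation, which is an assumption here and not necessarily the calculation used in this project: take the differences between log2 ratios of adjacent probes, measure their interquartile range, and rescale it to a standard deviation. The function name `dlrs`, the simulated noise level of 0.15 and the probe count are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def dlrs(log_ratios):
    """Robust spread of differences between adjacent probe log2 ratios.

    `log_ratios` must be ordered by genomic position.
    """
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    if len(diffs) < 2:
        raise ValueError("need at least three probes")
    q1, _median, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    # The IQR of a normal distribution spans 2 * z(0.75) standard deviations (about 1.349).
    iqr_per_sd = 2 * statistics.NormalDist().inv_cdf(0.75)
    # Differencing two independent probes doubles the variance, hence the sqrt(2).
    return (q3 - q1) / (iqr_per_sd * math.sqrt(2))


# Simulated array (assumption): 20,000 probes with a per-probe noise SD of 0.15.
rng = random.Random(0)
flat = [rng.gauss(0.0, 0.15) for _ in range(20_000)]
# Same noise with a single-copy gain, log2(3/2), over the second half of the probes.
gain = [x + (math.log2(1.5) if i >= 10_000 else 0.0) for i, x in enumerate(flat)]

print(f"DLRS without aberration: {dlrs(flat):.3f}")
print(f"DLRS with aberration:    {dlrs(gain):.3f}")
```

Because only one adjacent-probe difference spans the breakpoint, the two values should be nearly identical. That is the reason a derivative-based spread suits in-process QC: it tracks probe-to-probe noise and is largely insensitive to genuine copy number changes in the sample.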