NEXT GENERATION SEQUENCING (BIG DATA), IT AND THE CLINICAL PRACTICE
Che Martin Ph.D | PMP
Katrina Fox-O’Malley
Bulent Oral

BIG DATA … WHAT IS IT REALLY?
- Has been made more popular due to advances in computing and technology
- Data that is:
  - High volume
  - Variable
  - Complex
- Can be mined to extract valuable information

MODERN APPLICATIONS - INFORMATION EQUALS IMPROVED EFFICIENCY & PROFIT
Google:
- Search engine and company/product-client matching
UPS:
- ORION program uses data such as speed, direction, performance and other operations to optimize their performance.
Clinical and Research:
- Research applications:
  - Identification of new regulatory elements in pathogens
  - Identification of new drug targets
- Clinical applications: precision medicine
Sources: Thomas H. Davenport and Jill Dyche, "Big Data in Big Companies," May 2013. McKinsey Global Institute, "Big data: The next frontier for innovation, competition, and productivity."

NEXT GENERATION SEQUENCING AND PRECISION MEDICINE
Precision Medicine
- Match treatment and diagnosis to a person's molecular profile.
- Advances in molecular biology, genomics and other technologies allow:
  - The molecular/genetic characterization of a patient’s cancer
  - In some cases, applying these results to treatment strategies that target the molecular basis of the particular patient’s cancer.
- Involves diagnostic tests, one of which is NGS, to obtain molecular information about the cancer.

NGS STEPS -BIG DATA- HOW DOES IT WORK
- Samples are subjected to targeted sequencing: known cancer genes that are well characterized as mutational hot spots
- Sequencer produces reads of sequence data
  - ~1.5 GB wells and ~0.1 GB fastq files (contain reads) per sample
Source: https://rdp.cme.msu.edu/tutorials/init_process/RDPtutorial_INITIAL-PROCESS.html

NGS STEPS -BIG DATA- HOW DOES IT WORK
- Reads are quality checked and clipped
- Clipped reads are:
  - Aligned to the reference genome via one of many algorithms.
  - Converted to produce a BAM file.
    ~0.3 GB (per sample)
  - Saved for visualization as part of downstream analysis.
- BAM is processed by variant calling software (many different algorithms)

NGS STEPS -BIG DATA- HOW DOES IT WORK
- Variants (genetic differences from the reference genome, the “normal expectation”) are identified and confirmed via visualization.
- Information is saved in VCF files
Source: http://www.sustc-genome.org.cn/pgi/documentation.php

NGS STEPS -BIG DATA- HOW DOES IT WORK
Bioinformaticians:
- Design pipelines to parse and annotate identified variants with information required for the clinical workflow:
  - Specific mutation (peptide)
  - Identifying relevant clinical trials
  - Identifying published references
- Design verification pipelines (some cases)
- Design an infrastructure and pipelines to format this data to be received by clinical LIMS software.

[Diagram: data flow between the Electronic Medical Record, the Laboratory Information System, the Sequencer (Big Data) and the Bioinformatics Pipeline. Labels: ADT, Patient Registration, Orders, Reports/Results, Storage, Report Generation, Patient Report, VCF]

LABORATORY INFORMATION SYSTEM (LIS)

SPECIMEN TRACKING AND PROTOCOL DOCUMENTATION

SHARING DATA WITH BIOINFORMATICS PIPELINE
- Export aliases
- Import new data and match with sample aliases

TECHNOLOGIST REVIEW

CLINICAL REPORT SENT TO EMR – HL7/PDF

REPORT CONTENT: PATHOLOGY & SPECIMEN DETAILS
Cancer Gene Panel 50 w/ Interp
Targeted Next Generation Sequencing
Collected: 3/20/2015 11:17 AM
Received: 3/30/2015 8:26 AM
Verified Date/Time:
Specimen Information:
Surgical Path No.: S15-8646
Specimen Type: Paraffin Embedded Tissue
Tumor Type: Metastatic lung carcinoma
Block No.: B1
Neoplastic Cell Content: 50%
Institution:

REPORT CONTENT: CLASSIFICATION OF VARIANTS
Result: The following variants were detected in the patient's specimen:

Tier 1
Gene Variant: None detected
Type of Mutation:
Cosmic ID:

Tier 2
Gene Variant: KRAS, c.34G>T, p.G12C
Type of Mutation: SNV
Cosmic ID: COSM516
Variants in Tier 2 may be associated with clinical trials.
Please check www.clinicaltrials.gov for details.

Tier 3
Gene Variant: TP53, c.830G>T, p.C277F
Type of Mutation: SNV
Cosmic ID: COSM10749

Classification of variants:
Variants are classified based on current evidence for clinical actionability.
Tier 1 – Clinical utility has been demonstrated - Actionable / Clinically Relevant variants.
-Variants in genes with approved therapeutic implications in specified tumors.
-Variants with potential diagnostic/classification, prognostic implications.
Tier 2 – Clinical utility/actionability has been documented.
-Variants with approved therapeutic implications in a different tumor type.
-Novel variants in genes that have approved therapeutic implications.
-Variants associated with clinical trials.
Tier 3 – Variants of Uncertain Significance (VUS)
-Variants may be associated with little or no established cancer risk.

DISCLAIMERS/REFERENCES/E-SIGNATURE
METHOD: DNA was extracted from macrodissected, paraffin-embedded tumor of the patient using the QIAamp Kit (Qiagen, Valencia, CA). The extracted DNA was amplified and subjected to Next Generation Sequencing (NGS) using the Ion Ampliseq Cancer Panel hotspot v2 on the Ion Torrent Personal Genome Machine (Life Technologies). The targeted gene panel is designed to detect mutations/variants in 50 key cancer-related genes. This test was validated for mutations/Single Nucleotide Variants (SNV) in the BRAF, EGFR, KRAS and JAK2 genes. The limit of detection is precise and reproducible at approximately 5% with approximately 400X coverage and 2.5% with 1000X coverage. The data obtained was analyzed with the Torrent Suite Software v 4.2. DNA sequences used as references for this panel of genes can be found at http://www.ncbi.nlm.nih.gov/refseq/rsg/. The mutation nomenclature is based on the recommendations from the Human Genome Variation Society http://www.hgvs.org/mutnomen/.

Limitations: This mutation panel is designed to detect targeted mutations only.
The 50 genes covered are not all sequenced in their entirety. Mutations outside the 207 interrogated amplicons will not be detected. Variants of uncertain origin (germline versus somatic origin) cannot be determined unequivocally.

REFERENCES:
Sequist LV et al., First-Line Gefitinib in Patients With Advanced Non–Small-Cell Lung Cancer Harboring Somatic EGFR Mutations. J. Clin. Oncol., 2008, 26, 2442-2449.

Verified Date/Time: 04/06/15 3:02 PM
By: John Doe M.D. (Electronic Signature)
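
APPENDIX SKETCH: FROM VCF RECORD TO TIERED REPORT LINE
The annotation step described in the bioinformatics pipeline slides (parse each variant in the VCF, attach the clinical information, and hand a formatted result to the LIS) might look roughly like the Python sketch below. It is only an illustration under stated assumptions, not the pipeline actually used: the INFO keys GENE, CDS and AA, the TIER_TABLE lookup, the function names, and the POS values in the example lines are all hypothetical. The two table entries simply mirror the sample report above and stand in for a curated knowledge base.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical tier lookup keyed by (gene, protein change). The entries
# mirror the sample report; a real pipeline would query a curated source.
TIER_TABLE: Dict[Tuple[str, str], int] = {
    ("KRAS", "p.G12C"): 2,
    ("TP53", "p.C277F"): 3,
}


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    info: Dict[str, str]


def parse_vcf_line(line: str) -> Variant:
    """Parse the eight fixed, tab-separated columns of one VCF data line."""
    chrom, pos, _id, ref, alt, _qual, _filter, info = (
        line.rstrip("\n").split("\t")[:8]
    )
    info_dict: Dict[str, str] = {}
    for field in info.split(";"):
        key, _, value = field.partition("=")  # flag fields get an empty value
        info_dict[key] = value
    return Variant(chrom, int(pos), ref, alt, info_dict)


def tier_of(variant: Variant) -> Optional[int]:
    """Return the tier from the lookup table, or None if not classified."""
    key = (variant.info.get("GENE", ""), variant.info.get("AA", ""))
    return TIER_TABLE.get(key)


def report_line(variant: Variant) -> str:
    """Format a variant in the style of the report's 'Gene Variant' lines."""
    tier = tier_of(variant)
    label = f"Tier {tier}" if tier is not None else "Unclassified (needs review)"
    gene = variant.info.get("GENE", "?")
    cds = variant.info.get("CDS", "?")
    aa = variant.info.get("AA", "?")
    return f"{label} Gene Variant: {gene}, {cds}, {aa}"


if __name__ == "__main__":
    # Example data lines. POS values are placeholders, not real coordinates,
    # and GENE/CDS/AA are assumed annotation keys added by an earlier step.
    lines = [
        "chr12\t1000\t.\tC\tA\t.\tPASS\tGENE=KRAS;CDS=c.34G>T;AA=p.G12C",
        "chr17\t2000\t.\tC\tA\t.\tPASS\tGENE=TP53;CDS=c.830G>T;AA=p.C277F",
    ]
    for text in lines:
        print(report_line(parse_vcf_line(text)))
```

Returning None for a variant missing from the table, rather than defaulting it to Tier 3, is a deliberate choice in this sketch: it leaves unclassified variants visible for the technologist review step instead of silently assigning them a tier.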