MITOCHONDRIAL DISEASE SEQUENCE DATA RESOURCE (MSeqDR) CONSORTIUM: A Global Grass-Roots Effort to Compile, Organize, Annotate, and Analyze Whole Exome Datasets from Individuals with Suspected Mitochondrial Disease

Marni J. Falk, M.D., FACMG
Assistant Professor of Pediatrics
Division of Human Genetics
The Children’s Hospital of Philadelphia
University of Pennsylvania Perelman School of Medicine
Philadelphia, Pennsylvania
falkm@email.chop.edu

DISCLOSURES
Marni J. Falk, M.D. is
• Organizer, Mitochondrial Disease Sequence Data Resource (MSeqDR) Consortium
  – Pilot project development funding from UMDF and NAMDC (U54, NINDS & NICHD, NIH)
• Chair, Scientific and Medical Advisory Board & Member, Board of Trustees, United Mitochondrial Disease Foundation
• Consultant, GlaxoSmithKline
• Consultant, Mitokyne

GENOMIC BASIS OF MITOCHONDRIAL DISEASE
• Mitochondrial disease is highly heterogeneous in causes and features
  ‒ Traditional single-gene testing has had limited diagnostic success
  ‒ Newer genomics technologies enable comprehensive and efficient testing for all known genetic causes in dual genomes
  ‒ >200 nuclear genes
  ‒ All 37 mtDNA genes
  ‒ Diagnose >50% of complex mitochondrial diseases in one test*
  ‒ Novel disease gene discovery
• We have entered a computationally sophisticated molecular diagnostic age for understanding subclasses of mitochondrial disease**
*Calvo S, Mootha V, Ann Rev Genom Hum Genet, 2010; **McCormick E et al, 2012, Disc Med

MSeqDR Consortium Rationale
• MSeqDR Consortium was initiated at the June 2012 Annual Meeting of the United Mitochondrial Disease Foundation (UMDF) to create an international source of genomic information in suspected mito diseases
  – Utilize genomic data being generated in clinical and research labs world-wide
    • Large-scale sequencing panels of mtDNA genome & multiple nuclear genes
    • Whole exome or genome data
  – Once initial data analysis is complete, dataset itself remains highly valuable
    • Informs allele frequency in >10,000
non-synonymous exome variants per person
  – Link to phenotypic and laboratory data
    • “Missed” disease-causing mutations might later be identified by other bioinformatics tools or investigators
    • Collective exome analysis in specific subgroups or single-gene disorders may reveal variants that modify phenotypes or predict response to specific therapies
  – Common exome/genome scale data repository and analysis tools are needed to maximize data utility across the mitochondrial disease community
    • Optimize tools for mito disease that integrate with existing genomic databases

MSeqDR Consortium Structure
WORKING GROUP 1: TECHNOLOGY AND BIOINFORMATICS
• WG1 Co-Chairs:
  – Marni Falk, MD (CHOP/UPenn), Xiaowu Gai, PhD (Loyola), Curt Scharfe, MD, PhD (Stanford)
• WG1 Advisors:
  – Lisa Brooks, PhD (NHGRI, NIH), Deanna Church, PhD (NCBI, NIH)
WORKING GROUP 2: PHENOTYPING, DATABASING, IRB CONCERNS, SECURITY, AND ACCESS
• WG2 Co-Chairs:
  – Patrick Chinnery, MD, PhD (Newcastle), Lee-Jun Wong, PhD (Baylor), and Peter White, PhD (CHOP/Penn)
• WG2 Advisors:
  – Donna Maglott, PhD (NCBI, NIH) and Yaffa Rubinstein, PhD (NCATS, ORDR)
WORKING GROUP 3: MITOCHONDRIAL DNA SPECIFIC CONCERNS
• WG3 Co-Chairs:
  – Vincent Procaccio, PhD (Angers) and Douglas Wallace, PhD (CHOP/UPenn)
• WG3 Advisor:
  – Richard Cotton, PhD (Melbourne/Human Variome Project)

MSeqDR Prototype Development Project
• Prototype Development Project – Leaders:
  – Marni Falk, MD (CHOP): Organization
  – Xiaowu Gai, PhD (MEEI/Harvard): Bioinformatics Pipeline + Tools
  – Stephan Zuchner, MD (Miami): Data Visualization and Mining
• Seeking global input from Mito Disease investigators
• Dedicated MSeqDR bioinformatician – Lishuang Shen, PhD
  • Establish MSeqDR website and tools
  • Exome data file handling and server issues
  • Reannotate exome data to deposit in data visualization tool(s)
  • Coordinate comparative analysis project of bioinformatics pipelines
  • Extract/link phenotypic data from NAMDC or other patient registries
  • Optimize user-friendly web
interfaces to mine exome and mtDNA data
https://mseqdr.org
Xiaowu Gai, PhD, Lishuang Shen, PhD

MSeqDR: Major Components
– Web Portal = MSeqDR.ORG
  • Incorporates WordPress, facilitates community participation, controlled access
  • https://mseqdr.org
– MSeqDR G-browse - Gene or variant level analytic support
  • Graphical user interface for accessing exome variants (public & private) in the context of other genomic annotations in individual “tracks”
  • Variant data sharing: other G-browse instances, UCSC Genome Browser, etc
  • mtDNA-specific analysis and interpretation tools
  • http://gmod.org/wiki/GBrowse
– BioDAS Server = Aggregate data distribution tool
  • ProServer up and running, useful to distribute MSeqDR variant data
  • Enable sharing private variant data at various investigator comfort levels
– MSeqDR LSDB – Mitochondrial Disease Locus Specific Database
  • >1,300 nuclear genes and mtDNA genes – curated information of gene/transcript/variant/phenotype/disease
  • http://www.lovd.nl/3.0/home (Leiden University Medical Center)

Providing centralized access to different genome mining tools
• HBCR = Exome Data Annotation Tool
  • Human Basepair Codon Resource for web-based variant annotation
  • Custom variant annotation tool developed by Xiaowu Gai, PhD
  • Dr.
Xiaowu Gai, Massachusetts Eye and Ear Infirmary
• GEM.app = Exome-level dataset mining tool
  • Genome management application
  • Web-based exome analysis of individuals-families-cohorts
  • Stephan Zuchner, MD, PhD – University of Miami
    – Already “live” with >4,500 exomes from neuromuscular disease patients/families
    – Data archive
  • MSeqDR optimized
    – Common exome reannotation, mtDNA genome mining, CDEs, etc
    – Common login for MSeqDR users
    – Incorporate Global Universal IDs (GUIDs)
    – Will display individual exome data in MSeqDR Gbrowse custom “Tracks”

Providing ready access to different mtDNA-specific genome mining tools
mtDNA GENOME Analysis & Variant Pathogenicity Assessment Tools
• MitoMAP – MitoMaster – MitoWIKI
  – Human mtDNA genome database
  – Dr. Douglas Wallace, Marie Lott, Jeremy Leipzig at CHOP
• Mito Tool Box / HmtDB
  – HmtDB variant analysis, including heteroplasmy and haplogroup calling
  – Drs. Marcella Attimonelli, Maria Angela Diroma, Italy
• MT.AT
  – mtDNA Variant Annotation & Prioritization Tool
  – Dr. Fons Stassen, Maastricht University, The Netherlands
• Phylotree mtDNA tree build 16 (19 Feb 2014)
  – rCRS-based mtDNA haplogroup analysis
  – Dr. Mannis van Oven, The Netherlands
• Phy-Mer
  – Mitochondrial haplogroup classifier (alignment-free; reference-independent)
  – Daniel Navarro-Gomez, Massachusetts Eye and Ear Infirmary
• MitoBreak
  – mtDNA breakpoints database
  – Drs. Filipe Pereira, Joana Damas

MSeqDR G-browse: Aggregate Data Analysis at the Gene or Variant Level

MSeqDR Gbrowse: Novel annotation tracks
Community-specific & Custom Tracks

MSeqDR Gbrowse: mtDNA gene analysis
Visualize mtDNA Genome Variation
User Data, Ensembl Genes, RSRS vs rCRS, ClinVar Variants, Ensembl Variants, MitoMap Variants, HmtDB Variants, PhyloTree Variants
Reference differences: rCRS vs.
RSRS
Known pathogenic variants: ClinVar, Ensembl, MitoMap
Predicted pathogenicity of mtDNA variants: HmtDB (rCRS)
Predicted pathogenicity of mtDNA variants: HmtDB (RSRS)
Haplogroup Defining Variants: PhyloTree
Visualize mtDNA Variants
Linked variant data tables by track

MSeqDR Gbrowse: Nuclear gene analysis
Nuclear Genome: POLG
POLG Mutation Database, Transgenomic Nuclear Mitome Panel, MSeqDR Exomes, EVS Exomes
Visualize Nuclear Genes
Browse Nuclear Gene Variation

MSeqDR Gbrowse: Interrogate public & MSeqDR variant data
Nuclear-Mito Data Sets
• POLG Mutation Database: http://tools.niehs.nih.gov/polg/ – Dr. William C. Copeland
• NuclearMitome – Comprehensive Sequence Analysis of 448 Nuclear Mitochondrial Genes
• MSeqDR Exomes – aggregate data of 1,043 exomes analyzed and shared by Dr. Xiaowu Gai
• MitoCarta: an inventory of 1,013 human mito localized genes – Dr. Vamsi Mootha
• GeneDx – 101 Gene Panel, gene content only currently; variant content coming (?) – Dr. Renkui Bai and Sherri Bale
• MitoPhenome – 174 Genes – Dr.
Curt Scharfe
Phenome Portal for Mitochondrial Diseases
HPO for Mitochondrial Diseases

MSeqDR-LSDB
Curation and analysis at gene-transcript-variant-disease levels in all nuclear & mtDNA mitochondrial genes
Browse, Search: Genes, Diseases, Variants

MSeqDR-LSDB: >1,300 nDNA & mtDNA mito genes
Mito-Gene: POLG
Browse Gene Data
Integration with other browsers
Integration with other resources

MSeqDR-LSDB: Variant Data Curation
Search
Browse Variants
Browse Individual Variant Data
ClinVar Data Integration
Browse Diseases
Search

MSeqDR TOOLS
Providing access to public and custom tools for individual exome and mtDNA genome analysis
• HBCR: Human BP Codon Resources
  – Exome dataset Variant Annotation
  – Xiaowu Gai, PhD
• MT.AT: mtDNA genome variant annotation tool
  – Fons Stassen, PhD, Genome Center Maastricht
• MSeqDR – GEM.app
  – Stephan Zuchner, MD, University of Miami

MSeqDR exome data annotation pipeline for individual exome deposition & analysis
Xiaowu Gai, PhD

Comparative WES Data analysis pilot project underway
• No Gold Standard exists for NGS data preparation or analysis
• Assess relative strengths and weaknesses of existing pipelines
• Determine optimal strategy for common exome data analysis

Columns: Comparative Metric, Pipeline 1, Pipeline 2, Pipeline 3
Sample Characteristics
  – Sample ID
  – Subject Gender
  – Family Relationship
Exome Variant Detection Performance
  – Total # Variants
  – Total # INDELS
  – Total # Homozygous Variants
  – Total # Heterozygous Variants
  – % de Novo
Exome Variant Characteristics
  – Coding Variant #
  – Coding Variants - % de Novo
  – Synonymous Variant #
  – Non-synonymous Variant #
  – Splice Variant #
  – UTR Variant #
  – Public (dbSNP 137) Known Variant #
  – Novel (based on dbSNP) Variant #
Xiaowu Gai, PhD, Lishuang Shen, PhD

MSeqDR-GEM.APP integrated software tool: Web-based exome dataset mining
Stephan Zuchner, MD, University of Miami

Using gem.app to identify a novel mitochondrial disease gene

Phenotype-Exome Data Integration and Consent

Incorporation of Phenotype Capture and Display Tools
• REDCap – Research
electronic data capture tool
  – Free, web-based, clickable data entry
  – Custom design tools to capture any desired data type
• Common data elements (CDE) optimized for mitochondrial disease
• Integrate with NAMDC data capture tools and fields

REDCap-based Mito Disease Data Capture
Claire Sheldon, MD, PhD, Elizabeth McCormick, MS, Jeff Miller, PhD

Self manage account & data access
Phil Yeske (UMDF), Sharon Terry (Genetic Alliance), Robert Shelton (Private Access)

MSeqDR “go-live” preparation underway
• MSeqDR tools technical optimization and response to community feedback
  – G-browse optimization (variant blog, add lab-specific aggregate level data tracks)
  – GUID system implementation and assignment to all data types
  – Phenotype data integration (existing data on subjects vs newly collected)
    • Integrate NINDS Common Data Elements for mitochondrial disease
    • HPO ontology tree-like structure to vary phenotype data according to access rights
  – mtDNA genome analysis optimization
    • Integrate with MitoMap, MitoMaster, Mito Tool Box, etc
  – Haplogroup and heteroplasmy analysis
• Ethical use and oversight
  – Data security protections (aggregate data, cloud computing)
  – Develop web portal to enable patient privacy access for genetic +/- deidentified phenotype data to be deposited into MSeqDR
    • Genetic Alliance IRB approval
    • Translate access page into different languages
  – Develop data access and use oversight committee
    • Clinical diagnostic labs, researchers, physicians, family support groups, etc
• Long-term financial support structure
  – Commercial diagnostic laboratories and UMDF assessing long-term support options
  – Integrate with NIH Clinical Genome Project and related resources (dbGaP, ClinVar)

CONCLUSIONS
• The international mitochondrial disease community is collaborating through the MSeqDR Consortium to create a unified genomic database that facilitates diagnosis and improved understanding of individual mitochondrial diseases
• MSeqDR website (http://mseqdr.luhs.org/) provides a
common entry for clinicians, diagnostic labs, and researchers involved in genomic analyses in suspected mitochondrial disease
  – Provide flexible and continually updated suite of web-based and open access software tools accessible by clinicians, labs, and researchers from their office/clinic desktop to securely mine exome data in a real-time setting
  – Exploit collective information of variant allele frequencies in a large cohort of individuals with suspected mitochondrial disease (gem.app, G-browse, etc), potentially linked to relevant phenotype & laboratory data
  – Accelerate pace and accuracy of known & novel gene discovery in mito disease
  – Deposit aggregate-level deidentified exome or variant data to share at various levels of comfort (BioDAS Server)
  – Provide improved knowledge and centralized resource for locus and variant allele frequencies in nuclear AND mtDNA-based mitochondrial diseases
  – Assist with transfer of anonymized data to public resources (NIH, Global Alliance)

Acknowledgements
MEEI/Harvard: Xiaowu Gai, PhD; Lishuang Shen, PhD
University of Miami: Stephan Zuchner, MD; Michael Gonzalez, PhD
CHOP: Claire Sheldon, MD, PhD; Elizabeth McCormick, MS, CGC
NICHD, NIH: Danuta Krotoski, PhD; Melissa Parisi, MD, PhD

MSeqDR PROTOTYPE DEVELOPMENT PARTICIPANTS:
• Doug Wallace, Michio Hirano, Doug Kerr, Curt Scharfe, Li Dong, Hakon Hakonarson, Bruce Cohen, Amy Goldstein, Richard Haas, Russell Saneto (USA)
• Marcella Attimonelli, Mannis van Oven (Italy)
• Holger Prokisch (Germany)
• Mark Tarnopolsky, Isabella Thiffault (Canada)
• Richard Rodenburg, Jan Smeitink, IFM de Coo, Bert Smeets, Fons Stassen (The Netherlands)
• Virginia Brilhante (Finland)
• Yasushi Okazaki (Japan)
• Donna Maglott, Wendy Rubinstein (NCBI)
• Heidi Rehm (ClinGen)
• Clinical diagnostic laboratories:
  • Jeana DaRe, David Ralph (Transgenomic)
  • Renkui Bai, Sherri Bale (GeneDx)
  • Richard Boles, Christine Stanley (Courtagen)

UMDF
Chuck Mohan, CEO
Dan Wright, President
Philip Yeske, PhD
Janet Owens
Cliff
Gorski

FUNDING
United Mitochondrial Disease Foundation
NAMDC Pilot Grant Award #NAMDC7407 (NINDS/NICHD, NIH)
U01-HG006546 (NHGRI, NIH)

MSeqDR WEBSITE
https://mseqdr.org
falkm@email.chop.edu