A Unified Clinical Genomics Database
NHGRI - U41 Genomic Resource Grant
www.iccg.org

Variant Analysis for General Genome Report
• 3-5 million variants
• Genes: ~20,000 coding/splice variants
• Published as disease-causing: 20-40 “pathogenic” variants
  – Review evidence for variant pathogenicity
• Rare (<1%) CDS/splice variants; LOF in disease-associated genes: 30-50 variants
  – Review evidence for gene-disease association and LOF role
• Pharmacogenetics: 5-10 variants

Classification of Reported Pathogenic Variants Found in Human Genomes
• Pathogenic: 2%
• Likely pathogenic: 1%
• Uncertain significance: 52%
• Likely benign: 26%
• Benign: 18%

U41 Genomic Resource Grant: A Unified Clinical Genomics Database
To raise the quality of patient care by:
• Standardizing the annotation and interpretation of genomic variants
• Sharing variant and case-level data through a centralized database for clinical and research use
• Implementing an evidence-based expert consensus process for curating genes and variant interpretations

Supporting data collection, submission and curation
• Work with NCBI to design ClinVar to meet the needs of the community
• Develop data dictionary, ontologies, and work with standards bodies
• Define data submission and access policies for variant and case-level data including genotypes and phenotypes
• Work with labs to solicit and support data submission
• Evidence-based curation of structural variants (Riggs et al. 2012)
• Evidence-based curation of sequence variants (ACMG Committee work in progress)
• Develop a gene-centric resource to define the medical exome and provide tools to support use in genomic medicine
• Work with vendors to improve reagents for genomic analysis (CMA, WES, WGS)

NIH NCBI ClinVar
www.ncbi.nlm.nih.gov/clinvar

ClinVar Submitters
Submitter                                                   Variants   Genes
OMIM                                                           23524    3077
Harvard Medical School and Partners Healthcare                  6996     155
InVitae Inc.                                                    5526       4
International Standards For Cytogenomic Arrays                  4194      46
GeneReviews                                                     2913     287
ARUP Laboratories                                               1415       6
LabCorp                                                         1391     140
Sharing Clinical Reports Project                                 902       2
Finland Institute for Molecular Medicine                         840      39
Tuberous Sclerosis Database                                      431       1
ClinSeq Project                                                  425      35
Leiden Muscular Dystrophy Database                               220      10
GeneDx                                                           205       3
Emory Genetics Laboratory                                         48      13
American College of Medical Genetics and Genomics                 23       1
Osteogenesis Imperfecta Database; University of Leicester         15       3
Ambry Genetics                                                    10       1
Other laboratories (19)                                           52      25
Total                                                          49130    3848

Sequencing Laboratories Which Have Agreed to Share Data
• Alfred I Dupont Hospital for Children
• All Children's Hospital St. Petersburg
• Ambry Laboratories
• ARUP
• Athena Diagnostics
• Baylor Medical Genetic Laboratories
• Boston Children's Hospital
• Boston University
• Children's Hospital of Philadelphia
• Children's Mercy Hospital, Kansas City
• Cincinnati Children's Hospital
• City of Hope Molecular Diagnostic Lab
• CureCMD
• Denver Genetic Laboratories
• Detroit Medical Center
• Emory University
• Fullerton Genetics Laboratory
• GeneDx
• Cleveland Clinic
• Greenwood Genetics
• Harvard-Partners Lab for Molec. Medicine
• Henry Ford Hospital
• Huntington Medical Research Institutes
• Illumina Clinical Services Lab
• Indiana University/Purdue University
• InSiGHT
• LabCorp / Integrated Genetics / Correlagen
• Masonic Medical Research Laboratory
• Mayo Clinic
• Mt. Sinai School of Medicine
• Nationwide Children's Hospital
• Nemours Biomolecular Core, Jefferson Medical
• Oregon Health Sciences University
• Providence Sacred Heart Medical Center
• Quest Diagnostics
• SickKids Molecular Genetic Laboratory
• Transgenomics
• University of Chicago
• University of Michigan
• University of Nebraska Medical Center
• University of Oklahoma
• University of Penn
• University of Sydney
• University of Washington
• Women and Children's Hospital
• Wayne State University School of Medicine
• Yale University

Documenting arguments will improve the evidence-based assessment of variants

U41/ClinVar pilot project
Comparison of three laboratories' classifications for variants in 12 RASopathy genes: BRAF, CBL, HRAS, KRAS, MAP2K1, MAP2K2, NRAS, PTPN11, RAF1, SHOC2, SOS1, SPRED1

Scope                           Number of alleles
Total submitted to ClinVar                    997
Multiple assertions                           269
20% discrepant                                 53

53 discrepancies:
• 60% differed based upon likelihood (Benign vs LB, P vs LP)
• 34% differed VUS vs Likely Pathogenic/Likely Benign
• 6% differed VUS vs Pathogenic

Lab Classification Differences
• 84% of differences were Lab A reporting a more aggressive assertion (Pathogenic/Benign) than Labs B/C (LP, LB, VUS)
• 16% of differences were Labs B/C reporting a more aggressive assertion than Lab A

ACMG Lab QA Committee on the Interpretation of Sequence Variants
• ACMG: Sue Richards (chair), Heidi Rehm (co-chair), Sherri Bale, David Bick, Soma Das, Wayne Grody, Madhuri Hegde, Elaine Spector
• AMP: Julie Gastier-Foster, Elaine Lyon
• CAP: Nazneen Aziz, Karl Voelkerding

Evidence supporting pathogenicity (check all that apply):
I. Stand-alone
□ Truncating variant (e.g. nonsense, frameshift, canonical +/-1,2 splice sites, initiation codon) in a gene where loss of function is a known mechanism of disease [1]
□ Same amino acid change as a previously established pathogenic variant regardless of nucleotide change [2]
II. Strong
□ De novo (paternity confirmed) [3]
□ Well-established in vitro or in vivo functional studies supportive of a deleterious effect on the gene or gene product [4]
□ Case-control studies show a p value <0.01 for enrichment in cases [6]
III. Supporting
□ Located in a mutational hot spot and/or experimentally well-characterized functional domain [7]
□ Variant occurs in a gene with high clinical specificity and sensitivity for a particular phenotype and the proband has multiple, specific features of the disease [8]
□ Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc) [9]
□ Type of variant fits known pathogenic variant spectrum for the disease [10]
□ Variant frequency in control data: absent from controls in Exome Sequencing Project & 1000 Genomes, OR case-control studies show p value between 0.01-0.05 for enrichment in cases (only applies if well-phenotyped populations are available) and frequency is below highest general population minor allele frequency (MAF) expected for disease: [6]
    General guidance: Autosomal dominant MAF <0.4%
    General guidance: X-linked MAF <0.4% males
    General guidance: Autosomal recessive MAF <1%
□ For recessive disorders, detected in trans with a pathogenic variant [11]
□ Assumed de novo, but without confirmation of paternity [3]
□ In-frame deletions/insertions in a non-repeat region or stop-loss variants [12]
□ Co-segregation with disease [5]
□ Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before [2]

5 Categories: Pathogenic, Likely pathogenic, Uncertain significance, Likely benign, Benign
Pathogenic = 1 stand-alone OR 2 strong OR 1 strong + ≥3 supporting
Likely pathogenic = 1 strong + 2 supporting OR ≥4 supporting
Benign = 1 stand-alone OR 2 strong OR 1 strong + ≥3 supporting
Likely benign = 1 strong + 2 supporting OR ≥4 supporting

Evidence supporting benign classification (check all that apply):
I. Stand-alone
□ For autosomal recessive: MAF ≥1% [6]
□ For autosomal dominant: ≥0.4% or lower depending on disease frequency and penetrance [6]
□ For X-linked: ≥0.4% or lower in males depending on disease frequency and penetrance [6]
□ Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder with full penetrance at an early age [6]
II. Strong
□ Well-established in vitro or in vivo functional studies show no deleterious effect on protein function or splicing [4]
□ Observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder [11]
□ Variant present in multiple mammalian species despite adjacent conservation [9]
III. Supporting
□ Located in a highly variable region without a known function [7]
□ Multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact, etc) [9]
□ Type of variant does not fit known pathogenic variant spectrum [10]
□ Case-control studies show comparable frequencies (e.g. p > 0.05) [6]
□ Variant in a dominant gene that does not segregate in a family [5] or is found in a case with an alternate cause of disease [13]
□ Observed in cis with a pathogenic variant [11]
*Variants should be classified as Uncertain Significance if other criteria are unmet

Curation - ClinVar
[Figure: curation levels. Labels: QC and expert consensus; practice guidelines (evidence-based review, guideline); expert curation; ClinVar; inter-laboratory multi-source curation; intra-laboratory single-source curation; large variant datasets, uncurated (dbSNP/dbVar)]

Analysis of LOF Variants - single genome
[Figure: flowchart of 82 LOF variants below 5% MAF from one case]
• Novel/Rare: 41; Reported: 8; Common: 33
• Pathogenic: 2 (both AR; 1 novel, 1 known)
• VUS: 1 (novel)
• Excluded: 46
  – False positives: 13
  – Weak gene-disease association: 14
  – Non-Mendelian: 10
  – LOF not a disease mechanism: 2
• Update database

Gene-centric resource
1. Define genes with medical relevance
2. Technical challenges
   • High GC
   • Pseudogenes/homologies
   • Repeat expansions
   • Common sites of structural variation
3. Variant types (denote common vs rare types)
   • Sequence variants (substitutions, small indels)
     – Loss-of-function vs. gain-of-function
   • CNV: haploinsufficient vs. triplosensitive
   • Other structural changes (translocations, inversions, etc)
   • Imprinted loci
   • Repeat expansions
4. Medically relevant transcripts
5. Gene regions of pathogenic relevance
6. Patterns of inheritance (dominant, recessive, X-linked, mitochondrial, de novo, etc)
7. Phenotypes and evidence base for phenotype associations
8. Available approaches to define variant pathogenicity (assays, tools, etc)
9. Clinical utility measures
10. Clinical decision support opportunities
Initiated through collaboration amongst CHOP, Emory, and Harvard/Partners and Structural Variant workgroup

U41 - Working with Existing Efforts
• NCBI (ClinVar, dbSNP, dbVar, dbGaP, GTR) and EBI
• NHGRI (CRVR, eMERGE, CSER, ROR), IRDiRC
• Regulatory and Standards: ACMG, CAP, CDC, FDA, ASHG, AMP, CMGS, Global Alliance
• Locus Specific Databases (LSDBs – LOVD and non-LOVD)
• InSiGHT, PharmGKB, MSeqDB, CFTR2, ENIGMA, etc
• Human Variome Project and HGVS
• PhenoDB (Ada Hamosh) and Human Phenotype Ontology (Peter Robinson)
• OMIM (Ada Hamosh) and GeneReviews (Bonnie Pagon)
• Patient Advocacy Groups (Genetic Alliance, Patient CrossRoads, UNIQUE, Disease Specific Groups)
• Industry partners (reagents, instruments, software, etc)

ClinGen: The Clinical Genome Resource Program
Collaboration between:
• NHGRI U41 Grant
  – PIs: Ledbetter (Geisinger), Martin (Geisinger), Nussbaum (UCSF), Mitchell (Utah), Rehm (Partners/Harvard)
• NHGRI U01 “Clinically Relevant Variant Resource” Grants
  – Grant 1 PIs: Bustamante (Stanford), Plon (Baylor)
  – Grant 2 PIs: Berg (UNC), Ledbetter (Geisinger), Watson (ACMG)
• NCBI – ClinVar

ClinGen Delegation of Responsibilities
[Figure: responsibilities by grant. Headings: Data Collection; Curation; IT/Biofx; Community. Labels: Structural Variation; Variant Curation – Clinical Significance; Sequence Variation; Gene-Variant Pairs – Actionability; Other Genomic Data; Clinical Domain Curation; Phenotype; Machine Learning Curation; Data Extraction; Education; Data Analysis; ELSI/Actionability; Data Dissemination; Laboratory; Bioinformatics/IT; EHR Integration; Community; Patient Registry. Legend: U41; UNC, Geisinger, ACMG U01; Stanford, Baylor U01]

ClinGen System Interactions
[Figure: system diagram. Labels: Labs (Genotypes & Phenotypes); Patient Registries; Private; Controlled Access; Public Access; LSDBs; dbGaP; OMIM; Medical Lit; Case-level Data; Crowdsourced Curation; PharmGKB; ClinVar; Variant-level Data; Data; Gene Resource (Medical Exome, Actionability); CNV Curation Tool (JIRA); Population Datasets; Expert Curated Variants; CoreDB; External Informatics Activities Enabled; Application Interface; Machine Learning Algorithms; EHR Interface; Portal for the Public; Disease Area Curation Tool; Disease WGs; Clinical Domain WGs; Expert Curation of Genes and Variants by Clinical Domain and Disease Area Workgroups]

International Collaboration for Clinical Genomics
– Over 190 institutional members
– Over 2800 individual members
Annual Conference June 10-12, 2014, Bethesda, MD
– Attendees include laboratory directors, physicians, genetic counselors, researchers, parents, government employees, regulatory agency representatives, and vendor partners

U41 Principal Investigators and Workgroups
NIH U41 PIs: David Ledbetter (Geisinger), Christa Martin (Geisinger), Joyce Mitchell (Utah), Robert Nussbaum (UCSF), Heidi Rehm (Harvard)

Sequence Variant Workgroup
Madhuri Hegde (co-chair, Emory), Sherri Bale (co-chair, GeneDx), Carlos Bustamante (Stanford), Soma Das (U Chicago), Matt Ferber (Mayo), Birgit Funke (Harvard/MGH), Marc Greenblatt (UVM), Elaine Lyon (ARUP), Donna Maglott (NCBI), Sharon Plon (Baylor), Heidi Rehm (Harvard/Partners), Avni Santani (CHOP), Patrick Willems (Gendia)

Structural Variant Workgroup
Erik Thorland (co-chair, Mayo), Swaroop Aradhya (co-chair, InVitae), Deanna Church (NCBI), Hutton Kearney (Fullerton), Charles Lee (Jackson Labs), Christa Martin (Emory), Sarah South (ARUP), Chad Shaw (Baylor), Karin Wain (Utah)

Phenotyping Workgroup
David Miller (chair, Harvard), Ada Hamosh (Hopkins), Karen Eilbeck (Utah), Monica Giovanni (Geisinger), Robert Green (Harvard/BWH), Mike Murray (Geisinger), Robert Nussbaum (UCSF), Erin Riggs (Emory), Peter Robinson (Berlin), Steven Van Vooren (Cartagenia), Patrick Willems (Gendia)

Engagement, Education and Access Workgroup
Andy Faucett (chair, Geisinger), Erin Riggs (Emory), Danielle Metterville (Partners), Genetic Counselors from participating laboratories

Bioinformatics and IT Workgroup
Karen Eilbeck (co-chair) and Sandy Aronson (co-chair)
ARUP: Brendon O’Fallon; Cartagenia: Steven Van Vooren; Emory: Stuart Tinker; GeneDx: Rhonda Brandon, Lisa Vincent; Mayo: Eric Klee; NCBI: Deanna Church, Jennifer Lee, Donna Maglott, George Riley; Partners Healthcare: Eugene Clark, Larry Babb, Matt Varugheese; University of Chicago: Teja Nelakuditi; Utah: Karen Eilbeck, Shawn Rynearson

Consultants
Les Biesecker, Johan den Dunnen, Robert Green, Ada Hamosh, Laird Jackson, Stephen Kingsmore, Jim Ostell, Sue Richards, Peter Robinson, Lisa Salberg, Joan Scott, Sharon Terry
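
Worked example: combining evidence into the 5 categories
The draft combining rules on the "5 Categories" slide (Pathogenic = 1 stand-alone OR 2 strong OR 1 strong + ≥3 supporting; Likely pathogenic = 1 strong + 2 supporting OR ≥4 supporting; the same thresholds for Benign and Likely benign) can be written as a small decision function. The Python sketch below is illustrative only and is not the committee's implementation. The names EvidenceCounts, tier and classify are hypothetical, "2 strong" is read as "at least 2", and returning Uncertain significance when both pathogenic and benign thresholds are met is an assumption, since the slides do not say how conflicting evidence is resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceCounts:
    """Number of checked criteria at each strength (hypothetical container)."""
    stand_alone: int = 0
    strong: int = 0
    supporting: int = 0


def tier(e: EvidenceCounts):
    """Return 'definite', 'likely', or None for one direction of evidence."""
    # 1 stand-alone OR 2 strong OR 1 strong + >=3 supporting
    if e.stand_alone >= 1 or e.strong >= 2 or (e.strong >= 1 and e.supporting >= 3):
        return "definite"
    # 1 strong + 2 supporting OR >=4 supporting
    if (e.strong >= 1 and e.supporting >= 2) or e.supporting >= 4:
        return "likely"
    return None


def classify(pathogenic: EvidenceCounts, benign: EvidenceCounts) -> str:
    """Map pathogenic and benign evidence counts to one of the 5 categories."""
    p, b = tier(pathogenic), tier(benign)
    if p and b:
        # Assumption: conflicting evidence is left as uncertain.
        return "Uncertain significance"
    if p == "definite":
        return "Pathogenic"
    if p == "likely":
        return "Likely pathogenic"
    if b == "definite":
        return "Benign"
    if b == "likely":
        return "Likely benign"
    # Slide footnote: uncertain if other criteria are unmet.
    return "Uncertain significance"


if __name__ == "__main__":
    # One strong criterion (e.g. confirmed de novo) plus two supporting criteria.
    print(classify(EvidenceCounts(strong=1, supporting=2), EvidenceCounts()))
```

Keeping the threshold logic in one tier function reflects that the slide uses identical counts for the pathogenic and benign directions; only the label differs.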