Second Review Document

Rare Genetic Disease Database

Sarthak Khillan
19BCB0118
+91 9999144653
sarthak.khillan2019@vitstudent.ac.in

P Mridula Menon
18BCB0125
+91 9819869005
pmridula.menon2018@vitstudent.ac.in

G. Mahesh Chowdhary
19BCB0139

B.Tech. in Computer Science and Engineering
School of Computer Science & Engineering

Abstract

Here, we describe a dataset of rare genetic diseases with a known genetic background, supplemented with manually extracted provenance for the disease itself and for the discovery of the underlying genetic cause. We preprocessed and curated a collection of 3500+ rare genetic diseases and linked them to their causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described each rare disease, and of the publications that identified the causative genes, will also be added using information from OMIM, PubMed, Wikipedia, and Google Scholar. The dataset relies on publicly available data from Orphanet, OMIM and ClinVar and on publications with a PubMed identifier; by making these data interoperable and linked, we can now analyse them together. We hope this analysis will help trace the timeline of rare disease and causative gene discovery and link it to developments in methods.

Introduction

A genetic disease is caused by a change, or mutation, in an individual's DNA sequence. These mutations can be due to an error in DNA replication or to environmental factors, such as cigarette smoke and exposure to radiation, which cause changes in the DNA sequence. Depending on where these mutations occur, they can have little or no effect, or may profoundly alter the biology of cells in our body, resulting in a genetic disorder. A rare genetic disease is a genetic disease that affects a very small section of the population.
In this project we will create a database with detailed information on rare genetic diseases and the genes associated with them. It will serve as an inventory and encyclopedia of rare genetic diseases and the genes involved.

Methodology Adapted

● We will create the database with information from primary databases such as OMIM, Orphanet, ClinVar and gnomAD.
● We will clean, organize and curate the collected data in an SQL back end and integrate it with a website where users can search for the information they need using parameters such as disease name, gene name, OMIM ID, etc.
● Every disease will be catalogued, and details such as genetic information, prevalence, symptoms and onset will be displayed along with other information.
● Our database will be presented as an open-source platform whose information other users can freely use.

RESULTS:

Curated and cleaned database imported into MongoDB and MySQL. Following are the images:

Work to be done:

● Adding more required data entries, if needed, to the curated database of 3500+ rare diseases that we have imported onto the SQL back end.
● Further checking the data entries and tables to remove duplicates and any irrelevant information we may have missed.
● Integrating the back end with the front end by connecting the SQL database to the front end.
● Designing the front end for ease of use, with easy keyword search options.
● Test-running the website for multiple results and checking its efficiency.

Contributions:

Mridula identified the datasets from OMIM, Orphanet and ClinVar that needed to be cleaned and processed for import, helped curate the database, and worked on the literature review and most of the documentation. Sarthak and Mahesh downloaded the datasets and converted them to the required JSON files, coded the pipeline that loads the cleaned database into the SQL database, and cleaned, organized and processed the data entries.

Literature Review / References
No. | Title and authors | Highlights | Conclusion

[1]
Title: Rare genetic diseases: update on diagnosis, treatment and online resources
Authors: Robert E. Pogue, Denise P. Cavalcanti, Shreya Shanker, Rosangela V. Andrade, Lana R. Aguiar, Juliana L. de Carvalho, Fabrício F. Costa
Highlights: Rare diseases collectively impact a large portion of the world's population. Advances in technology are allowing a better understanding of rare disorders. Molecular techniques are helping improve the efficiency of the diagnostic process. Several therapeutic strategies are also being developed for rare genetic diseases. Web-based tools have started helping healthcare professionals in differential diagnosis.
Conclusion: Rare genetic diseases collectively impact a significant portion of the world's population. For many diseases there is limited information available, and clinicians can find it difficult to differentiate between clinically similar conditions. This leads to problems in genetic counseling and patient treatment. The biomedical market is affected because pharmaceutical and biotechnology industries do not see advantages in addressing rare disease treatments, or because the cost of the treatments is too high. By contrast, technological advances including DNA sequencing and analysis, together with computer-aided tools and online resources, are allowing a more thorough understanding of rare disorders. The paper discusses how the collection of various types of information, together with the use of new technologies, is facilitating diagnosis and, consequently, treatment of rare diseases.

[2]
Title: International Cooperation to Enable the Diagnosis of All Rare Genetic Diseases
Authors: Kym M. Boycott, Ana Rath, Jessica X. Chong, Taila Hartley, Fowzan S. Alkuraya, Gareth Baynam, Anthony J. Brookes, Michael Brudno, Angel Carracedo, Johan T. den Dunnen, Stephanie O. M. Dyke, Xavier Estivill, Jack Goldblatt, Catherine Gonthier, Stephen C. Groft, Ivo Gut, Ada Hamosh, Philip Hieter, Hanns Lochmüller
Highlights: Provision of a molecularly confirmed diagnosis in a timely manner for children and adults with rare genetic diseases shortens their "diagnostic odyssey," improves disease management, and fosters genetic counseling with respect to recurrence risks while assuring reproductive choices. In a general clinical genetics setting, the current diagnostic rate is approximately 50%, but for those who do not receive a molecular diagnosis after the initial genetics evaluation, that rate is much lower. Diagnostic success for these more challenging affected individuals depends to a large extent on progress in the discovery of genes associated with, and mechanisms underlying, rare diseases.
Conclusion: Thus, continued research is required for moving toward a more complete catalog of disease-related genes and variants. The International Rare Diseases Research Consortium (IRDiRC) was established in 2011 to bring together researchers and organizations invested in rare disease research to develop a means of achieving molecular diagnosis for all rare diseases. The paper reviews the current and future bottlenecks to gene discovery and suggests strategies for enabling progress in this regard. Each successful discovery will define potential diagnostic, preventive, and therapeutic opportunities for the corresponding rare disease, enabling precision medicine for this patient population.

[3]
Title: An innovative portal for rare genetic diseases research: The semantic Diseasecard
Authors: Pedro Lopes, José Luís Oliveira
Highlights: Diseasecard is a web portal delivering rapid access to rare diseases resources. Data integrated from distinct sources are managed in a semantic web environment. The aggregated semantic network for rare diseases is available through an advanced API. Diseasecard is freely available online at http://bioinformatics.ua.pt/diseasecard.
Conclusion: Diseasecard's reasoning is to build a true lightweight knowledge base covering rare genetic diseases. Developed with the latest semantic web technologies, this portal delivers unified access to a comprehensive network for researchers, clinicians, patients and bioinformatics developers. With in-context access covering over 20 distinct heterogeneous resources, Diseasecard's workspace provides access to the most relevant scientific knowledge regarding a given disorder, whether through direct common identifiers or through full-text search over all connected resources. In addition to its user-oriented features, Diseasecard's semantic knowledge base is also available for direct querying, enabling everyone to include rare genetic diseases knowledge in new or existing information systems.

[4]
Title: PhenomeCentral: A Portal for Phenotypic and Genotypic Matchmaking of Patients with Rare Genetic Diseases
Authors: Orion J. Buske, Marta Girdea, Sergiu Dumitriu, Bailey Gallinger, Taila Hartley, Heather Trang, Andriy Misyura, Tal Friedman, Chandree Beaulieu, William P. Bone, Amanda E. Links, Nicole L. Washington, Melissa A. Haendel, Peter N. Robinson, Cornelius F. Boerkoel, David Adams, William A. Gahl, Kym M. Boycott, Michael Brudno
Highlights: The discovery of disease-causing mutations typically requires confirmation of the variant or gene in multiple unrelated individuals, and a large number of rare genetic diseases remain unsolved due to difficulty identifying second families. To enable the secure sharing of case records by clinicians and rare disease scientists, the authors have developed the PhenomeCentral portal (https://phenomecentral.org). Each record includes a phenotypic description and relevant genetic information (exome or candidate genes).
Conclusion: PhenomeCentral addresses the increasing need for computational approaches to identify individuals affected by the same or overlapping phenotypes and mutations in the same gene, thereby enabling novel gene discovery.

[5]
Title: Developing a national collaborative study system for rare genetic diseases
Authors: Michael S Watson, Charles Epstein, R Rodney Howell, Marilyn C Jones, Bruce R Korf, Edward R B McCabe & Joe Leigh Simpson
Highlights: There are thousands of rare genetic diseases and many genetic and nongenetic contributors to common genetic diseases. The evidence base that is currently available about the great majority of these conditions is limited to case studies and relatively small observational study sets derived from one or several institutions. Hence, the statistical power in any one study is usually quite limited. Further, in the absence of organized registries and data collection on particular patient groups, the information available is weak and the patient resources that are available are limited. The meeting in which these issues were raised resulted in a set of proposed principles and associated recommendations as to how best to achieve the vision of creating an extensive and comprehensive collaboration of professional and lay communities to enable translational research to improve clinical care and therapies for persons with rare genetic diseases.
Conclusion: The development of a mechanism through which the aforementioned issues and activities can be coordinated will be essential to moving forward. Significant amounts of money are already being spent on individual components of the system that is envisioned in this report. Much of the work is multidisciplinary and involves clinical service providers, clinical investigators, industry, government, and the public. Further, because many people with rare genetic diseases are identified in NBS programs, the involvement of States and public health programs will be necessary. The nature of the work will require the involvement of organized medicine to develop consensus practice guidelines for care and related data collection. Few organizations bridge this wide array of interest groups and expertise as does the ACMG.

[6]
Title: Addressing challenges in the diagnosis and treatment of rare genetic diseases
Authors: Kym M. Boycott & Diego Ardigó
Highlights: The past 5 years have seen an unprecedented rate of discovery of genes that cause rare diseases and, with it, a commensurate increase in the number of diagnosable but nevertheless untreatable disorders. Here, the authors discuss the increasing opportunity for diagnosis and therapy of rare diseases and how to tackle the associated challenges.
Conclusion: It is clear that the grand challenges for rare diseases will require international cooperation and stakeholder engagement at an unprecedented level. The International Rare Diseases Research Consortium (IRDiRC), a partnership of nearly 50 organizations in 18 nations (with combined yearly funding of US$2 billion), is attempting just this with its recently announced strategic goals for 2017–2027: diagnosis of rare disease patients within 1 year or entrance into a globally coordinated diagnostic and research pipeline; 1,000 new therapies, chiefly for diseases with no approved treatment; and the introduction of methods for assessing the impact of diagnoses and therapies on patient wellbeing. The success of such initiatives will be crucial for leveraging the increasing opportunity for diagnosis and therapy of rare diseases.

[7]
Title: Phenogenon: Gene to phenotype associations for rare genetic diseases
Authors: Nikolas Pontikos, Cian Murphy, Ismail Moghul, Gavin Arno, Kaoru Fujinami, Yu Fujinami, Dayyanah Sumodhee, Susan Downes, Andrew Webster, Jing Yu, UK Inherited Retinal Dystrophy Consortium, Phenopolis Consortium
Highlights: As high-throughput sequencing is increasingly applied to the molecular diagnosis of rare Mendelian disorders, a large number of patients with diverse phenotypes have their genetic and phenotypic data pooled together to uncover new gene-phenotype relations. The authors introduce Phenogenon, a statistical tool that combines Human Phenotype Ontology (HPO) annotated patient phenotypes, gnomAD allele population frequency, and Combined Annotation Dependent Depletion (CADD) score for variant pathogenicity, in order to jointly predict the mode of inheritance and gene-phenotype associations.
Conclusion: The authors have developed a statistical tool, Phenogenon, to detect and visualise "gene-HPO-MOI" relationships. The approach has suggested some strong candidate relationships and correctly recapitulated existing relationships. The adoption of the HPO nomenclature by large rare disease sequencing projects leads the authors to believe Phenogenon will be of increasing utility in understanding gene-phenotype-MOI relationships as genetics is phased into routine NHS practice.

[8]
Title: New Diagnostic Approaches for Undiagnosed Rare Genetic Diseases
Highlights: Accurate diagnosis is the cornerstone of medicine; it is essential for informed care and promoting patient and family well-being. However, families with a rare genetic disease (RGD) often spend more than five years on a diagnostic odyssey of specialist visits and invasive testing that is lengthy, costly, and often futile, as 50% of patients do not receive a molecular diagnosis. The current diagnostic paradigm is not well designed for RGDs, especially for patients who remain undiagnosed after the initial set of investigations, and thus requires an expansion of approaches in the clinic. Leveraging opportunities to participate in research programs that utilize new technologies to understand RGDs is an important path forward for patients seeking a diagnosis. Given recent advancements in such technologies and international initiatives, the prospect of identifying a molecular diagnosis for all patients with RGDs has never been so attainable, but achieving this goal will require global cooperation at an unprecedented scale.
Conclusion: These are promising times for the RGD community; never before has the prospect of identifying a molecular diagnosis for all RGD patients been so attainable. To overcome this grand challenge, however, will require global cooperation at an unprecedented scale. We need to ensure that all undiagnosed RGD patients are identified and seen by clinicians with expertise in RGD diagnosis, have access to appropriate genetic testing, and are offered opportunities to share their phenotypic data, genotypic data, and biological samples with researchers; only then will we be able to understand the cause of each and every RGD. The IRDiRC's goal over the next 10 years, for each RD patient to receive a diagnosis within one year of coming to medical attention, will be achievable if we put the well-being of the patients and their families at the center of everything that we do.

[9]
Title: KDBI: Kinetic Data of Bio-molecular Interactions database
Highlights: The authors introduce a new database of Kinetic Data of Bio-molecular Interactions (KDBI) aimed at providing experimentally determined kinetic data of protein-protein, protein-RNA, protein-DNA, protein-ligand, RNA-ligand and DNA-ligand binding or reaction events described in the literature.
Conclusion: KDBI currently contains 8273 entries of biomolecular binding or reaction events. There are a total of 1380 proteins, 143 nucleic acids and 1395 small molecules included in the database. Work is underway to collect kinetic data published in earlier years, which is expected to significantly increase the number of entries in the database. Rapid advances in proteomics pathways and networks are expected to stimulate more interest in the quantitative aspects of biomolecular interactions, including kinetic data. The availability of an increasing amount of kinetic data can better serve the need for mechanistic investigation, quantitative study and simulation of cellular processes and events.

[10]
Title: MvirDB - a microbial database of protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications
Highlights: To facilitate rapid identification of sequences and characterization of genes for signature discovery, the authors have collected all publicly available (as of this writing), organized sequences representing known toxins, virulence factors, and antibiotic resistance genes in one convenient database, which they believe will be of use to the bio-defense research community.
Conclusion: MvirDB is a centralized resource (data warehouse) comprising all publicly accessible, organized sequence data for protein toxins, virulence factors and antibiotic resistance genes. Protein entries in MvirDB are annotated using a high-throughput, fully automated computational annotation system; annotations are updated periodically to ensure that results are derived using current public database and open-source tool releases. Tools provided for using MvirDB include a web-based browser tool and a BLAST interface.
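
The cleaning and keyword search steps listed under Methodology Adapted can be illustrated with a short sketch. It is not the project's actual pipeline. It assumes Python's built-in sqlite3 module as a stand-in for the MySQL back end, and the table name disease_gene, its three columns, the helper functions and the sample records are all hypothetical placeholders rather than entries from the curated database.

```python
import sqlite3

# Hypothetical records; names and identifiers are placeholders,
# not rows from the project's curated database.
RAW_RECORDS = [
    {"disease_name": "Example disease A", "gene_symbol": "GENE1", "omim_id": "000001"},
    {"disease_name": "example disease a ", "gene_symbol": "gene1", "omim_id": "000001"},  # duplicate
    {"disease_name": "Example disease B", "gene_symbol": "GENE2", "omim_id": "000002"},
]


def clean(records):
    """Tidy whitespace, upper-case gene symbols, drop duplicate (OMIM ID, gene) pairs."""
    seen = set()
    cleaned = []
    for rec in records:
        row = (
            " ".join(rec["disease_name"].split()),
            rec["gene_symbol"].strip().upper(),
            rec["omim_id"].strip(),
        )
        key = (row[2], row[1])
        if key not in seen:  # keep only the first occurrence of each pair
            seen.add(key)
            cleaned.append(row)
    return cleaned


def build_db(rows):
    """Load cleaned rows into an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE disease_gene ("
        " disease_name TEXT NOT NULL,"
        " gene_symbol TEXT NOT NULL,"
        " omim_id TEXT NOT NULL,"
        " UNIQUE (omim_id, gene_symbol))"
    )
    conn.executemany("INSERT INTO disease_gene VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search(conn, term):
    """Match a term against disease name (substring), gene symbol or OMIM ID (exact)."""
    cur = conn.execute(
        "SELECT disease_name, gene_symbol, omim_id FROM disease_gene"
        " WHERE disease_name LIKE ? OR gene_symbol = ? OR omim_id = ?"
        " ORDER BY disease_name",
        (f"%{term}%", term.upper(), term),
    )
    return cur.fetchall()


conn = build_db(clean(RAW_RECORDS))
print(search(conn, "gene2"))
```

Passing the search term as a bound parameter rather than formatting it into the SQL string keeps user input from the website's search box out of the query text. The same pattern carries over to a MySQL driver, although the placeholder syntax may differ.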