
BioDB Review1

Second Review Document
Rare Genetic Disease Database
Sarthak Khillan
+91 9999144653
P Mridula Menon
+91 9819869005
G. Mahesh Chowdhary
Computer Science and Engineering
School of Computer Science & Engineering
Here, we describe a dataset with information about rare genetic diseases that have a known genetic
background, supplemented with manually extracted provenance for each disease and for the discovery of
its underlying genetic cause. We preprocessed and curated a collection of 3500+ rare genetic diseases and
linked them to their causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols.
The PubMed identifiers of the publications that first described each rare disease, and of the publications
that identified the causative genes, will also be added using information from OMIM, PubMed, Wikipedia
and Google Scholar. This dataset relies on publicly available data from Orphanet, OMIM and ClinVar and
on publications with a PubMed identifier, but because we have made the data interoperable and linked,
it can now be analysed as a whole. We hope our analysis will help trace the timeline of rare disease and
causative gene discovery and link it to developments in …
A genetic disease is caused by a change, or mutation, in an individual’s DNA sequence. These mutations
can be due to errors in DNA replication or to environmental factors, such as cigarette smoke and
exposure to radiation, that cause changes in the DNA sequence.
Depending on where these mutations occur, they can have little or no effect, or may profoundly alter
the biology of cells in our body, resulting in a genetic disorder. A rare genetic disease is a genetic disease
that affects only a very small proportion of the population.
In this project we will create a database with detailed information on rare genetic diseases and their
associated genes. It will serve as an inventory and encyclopedia of rare genetic diseases and the genes
involved.
Methodology Adopted
We will build the database from information in primary databases such as OMIM,
Orphanet, ClinVar and gnomAD.
We will clean, organize and curate the collected data in an SQL back end and integrate it with a
website where users can search for the information they need using parameters such as
disease name, gene name and OMIM ID.
Every disease will be catalogued, and details such as genetic information, prevalence, symptoms and
age of onset will be displayed along with other relevant information.
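To make the planned search concrete, here is a minimal sketch of how the disease-gene links and a search by disease name, gene symbol or OMIM ID could be stored and queried. It assumes a hypothetical three-table schema (`disease`, `gene`, `disease_gene`) and uses Python's built-in `sqlite3` as a stand-in for the MySQL back end; the table names, column names and toy rows are illustrative placeholders, not project data.

```python
import sqlite3

# Illustrative schema only: table/column names are assumptions, and SQLite
# stands in for the project's MySQL back end.
SCHEMA = """
CREATE TABLE disease (
    disease_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    omim_id    TEXT UNIQUE,
    prevalence TEXT,
    onset      TEXT
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    hgnc_symbol TEXT NOT NULL UNIQUE,
    ensembl_id  TEXT UNIQUE
);
-- Link table: a disease may have several causative genes and vice versa.
CREATE TABLE disease_gene (
    disease_id INTEGER NOT NULL REFERENCES disease(disease_id),
    gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
    PRIMARY KEY (disease_id, gene_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with toy placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO disease (disease_id, name, omim_id) VALUES (?, ?, ?)",
        [(1, "Example disease A", "000001"), (2, "Example disease B", "000002")],
    )
    conn.executemany(
        "INSERT INTO gene (gene_id, hgnc_symbol) VALUES (?, ?)",
        [(1, "GENE1"), (2, "GENE2")],
    )
    # Disease A is linked to two genes; disease B to one.
    conn.executemany(
        "INSERT INTO disease_gene VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)]
    )
    return conn


def search(conn, disease_name=None, gene_symbol=None, omim_id=None):
    """Return (disease name, OMIM ID, gene symbol) rows matching all given filters."""
    clauses, params = [], []
    if disease_name:
        clauses.append("d.name LIKE ?")  # substring match on the disease name
        params.append(f"%{disease_name}%")
    if gene_symbol:
        clauses.append("g.hgnc_symbol = ? COLLATE NOCASE")  # case-insensitive match
        params.append(gene_symbol)
    if omim_id:
        clauses.append("d.omim_id = ?")
        params.append(omim_id)
    where = " AND ".join(clauses) or "1 = 1"  # no filters: return every link
    sql = (
        "SELECT d.name, d.omim_id, g.hgnc_symbol "
        "FROM disease d "
        "JOIN disease_gene dg ON dg.disease_id = d.disease_id "
        "JOIN gene g ON g.gene_id = dg.gene_id "
        f"WHERE {where} ORDER BY d.name, g.hgnc_symbol"
    )
    return conn.execute(sql, params).fetchall()


conn = build_demo_db()
print(search(conn, gene_symbol="gene2"))
# [('Example disease A', '000001', 'GENE2'), ('Example disease B', '000002', 'GENE2')]
```

A separate link table is used because one disease can have several causative genes and one gene can be linked to several diseases. The user-typed search terms are passed as `?` parameters rather than pasted into the SQL string, so they cannot change the query itself.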
Our database will be presented as an open-source platform where other users can freely use the …
The curated and cleaned database has been imported into MongoDB and MySQL, as shown in the following images:
Work to be done:
Add more data entries, if required, to the curated database of 3500+ rare diseases that has already
been imported into the SQL back end.
Re-check the data entries and tables to remove duplicates and any irrelevant information that may
have been missed.
Integrate the back end with the front end by connecting the SQL database to the website.
Design the front end for ease of use, with simple keyword search options.
Test-run the website with multiple queries and check its efficiency.
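The duplicate check in the list above could start with something like the following sketch. It assumes records are plain dictionaries with hypothetical `disease`, `gene` and `source` keys, and treats two records as duplicates when the disease name (ignoring case and extra spaces) and the gene symbol (ignoring case) match; real curation would likely also compare OMIM or Orphanet identifiers.

```python
def normalise(text):
    """Lower-case and collapse runs of whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(records):
    """Keep the first record seen for each (disease name, gene symbol) pair."""
    seen = set()
    unique = []
    for rec in records:
        # The comparison key ignores case; the kept record retains its original spelling.
        key = (normalise(rec["disease"]), rec["gene"].upper())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Toy placeholder records (not real data): the second differs from the first
# only in case and spacing, so it is treated as a duplicate.
records = [
    {"disease": "Example disease A", "gene": "GENE1", "source": "OMIM"},
    {"disease": "example  disease a", "gene": "gene1", "source": "Orphanet"},
    {"disease": "Example disease B", "gene": "GENE2", "source": "ClinVar"},
]
print([r["source"] for r in deduplicate(records)])  # ['OMIM', 'ClinVar']
```

Because the first occurrence is kept, the input order decides which source "wins"; sorting the records by a preferred source beforehand is one way to make that choice explicit.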
Mridula identified the datasets from OMIM, Orphanet and ClinVar that needed to be cleaned and processed
for import, helped curate the database, and worked on the literature review and most of the
documentation. Sarthak and Mahesh downloaded the datasets, converted them to the required JSON files,
and coded the pipeline that loads the cleaned data into the SQL database, along with cleaning,
organizing and processing the data entries.
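The JSON-to-SQL step described above might look roughly like this minimal sketch. The JSON layout (one object per disease with a `genes` list) and the table names are assumptions for illustration, and `sqlite3` again stands in for MySQL. The whole load runs in one transaction, so a malformed record rolls back the batch instead of leaving half-loaded tables.

```python
import json
import sqlite3

# Hypothetical JSON layout: one object per disease, listing its causative genes.
# The project's real JSON files may use different keys.
RAW_JSON = """
[
  {"name": "Example disease A", "omim_id": "000001", "genes": ["GENE1", "GENE2"]},
  {"name": "Example disease B", "omim_id": "000002", "genes": ["GENE2"]}
]
"""

# Illustrative tables; SQLite stands in for the MySQL back end.
SCHEMA = """
CREATE TABLE IF NOT EXISTS disease (
    disease_id INTEGER PRIMARY KEY, name TEXT NOT NULL, omim_id TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS gene (
    gene_id INTEGER PRIMARY KEY, hgnc_symbol TEXT NOT NULL UNIQUE);
CREATE TABLE IF NOT EXISTS disease_gene (
    disease_id INTEGER NOT NULL REFERENCES disease(disease_id),
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    PRIMARY KEY (disease_id, gene_id));
"""


def load_records(conn, records):
    """Insert diseases, genes and their links; all-or-nothing per call."""
    conn.executescript(SCHEMA)
    with conn:  # commits on success, rolls back if any statement raises
        for rec in records:
            cur = conn.execute(
                "INSERT INTO disease (name, omim_id) VALUES (?, ?)",
                (rec["name"], rec["omim_id"]),
            )
            disease_id = cur.lastrowid
            for symbol in rec["genes"]:
                # A gene shared by several diseases is stored only once.
                conn.execute(
                    "INSERT OR IGNORE INTO gene (hgnc_symbol) VALUES (?)", (symbol,)
                )
                (gene_id,) = conn.execute(
                    "SELECT gene_id FROM gene WHERE hgnc_symbol = ?", (symbol,)
                ).fetchone()
                conn.execute(
                    "INSERT OR IGNORE INTO disease_gene VALUES (?, ?)",
                    (disease_id, gene_id),
                )


conn = sqlite3.connect(":memory:")
load_records(conn, json.loads(RAW_JSON))
for table in ("disease", "gene", "disease_gene"):
    print(table, conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0])
```

With the toy input this reports 2 diseases, 2 genes and 3 disease-gene links, since `GENE2` is shared and stored once.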
Literature Review / References
Rare genetic diseases: update on diagnosis, treatment and online resources
Authors: Robert …, Rosangela V. Andrade, Lana R. Aguiar, Juliana L. de Carvalho, Fabrício F. Costa
Rare diseases collectively impact a large portion of the world’s population. Advances in technology are
allowing a better understanding of rare disorders. Molecular techniques are helping improve the
efficiency of the diagnostic process. Several therapeutic strategies are also being developed for rare
genetic diseases. Web-based tools have started helping healthcare professionals with differential diagnosis.
Rare genetic diseases collectively impact a significant portion of the world’s population. For many
diseases there is limited information available, and clinicians can find it difficult to differentiate between
clinically similar conditions. This leads to problems in genetic counseling and patient treatment. The
biomedical market is affected because pharmaceutical and biotechnology industries do not see
advantages in addressing rare disease treatments, or because the cost of the treatments is too high. By
contrast, technological advances including DNA sequencing and analysis, together with computer-aided
tools and online resources, are allowing a more thorough understanding of rare disorders. The authors
discuss how the collection of various types of information, together with the use of new technologies,
is facilitating diagnosis and, consequently, treatment of rare diseases.

International Cooperation to Enable the Diagnosis of All Rare Genetic Diseases
Authors: Ángel Carracedo, Johan T. den Dunnen, Stephanie O.M. Dyke, Xavier Estivill, Jack …, … Gonthier,
Stephen C. Groft, Ivo Gut, Ada Hamosh, Philip Hieter, Hanns Lochmüller
Provision of a molecularly confirmed diagnosis in a timely manner for children and adults with rare
genetic diseases shortens their “diagnostic odyssey,” improves disease management, and fosters genetic
counseling with respect to recurrence risks while assuring reproductive choices. In a general clinical
genetics setting, the current diagnostic rate is approximately 50%, but for those who do not receive a
molecular diagnosis after the initial genetics evaluation, that rate is much lower. Diagnostic success for
these more challenging affected individuals depends to a large extent on progress in the discovery of
genes associated with, and mechanisms underlying, rare diseases.
Thus, continued research is required to move toward a more complete catalog of disease-related genes
and variants. The International Rare Diseases Research Consortium (IRDiRC) was established in 2011 to
bring together researchers and organizations invested in rare disease research to develop a means of
achieving molecular diagnosis for all rare diseases. The authors review the current and future
bottlenecks to gene discovery and suggest strategies for enabling progress in this regard. Each
successful discovery will define potential diagnostic, preventive, and therapeutic opportunities for the
corresponding rare disease, enabling precision medicine for this patient population.

An innovative portal for rare genetic diseases research: The semantic Diseasecard
Authors: Pedro Lopes, José Luís
Diseasecard’s reasoning is to build a true lightweight knowledge base covering rare genetic diseases.
Data integrated from distinct sources are managed in a semantic web environment. The aggregated
semantic rare diseases network is available through an advanced API. Diseasecard is freely available
online at …
Diseasecard is a web portal delivering rapid access to rare diseases resources. Developed with the latest
semantic web technologies, this portal delivers unified access to a comprehensive network for
researchers, clinicians, patients and bioinformatics developers. With in-context access covering over 20
distinct heterogeneous resources, Diseasecard’s workspace provides access to the most relevant
scientific knowledge regarding a given disorder, whether through direct common identifiers or through
full-text search over all connected resources. In addition to its user-oriented features, Diseasecard’s
semantic knowledge base is also available for direct querying, enabling everyone to include rare
genetic diseases knowledge in new or existing information systems.

PhenomeCentral: A Portal for Phenotypic and Genotypic Matchmaking of Patients with Rare Genetic Diseases
Authors: Orion J. Buske, Marta Girdea, Sergiu Dumitriu, Bailey Gallinger, Taila Hartley, Heather Trang,
Andriy Misyura, Tal …, … Beaulieu, William P. Bone, Amanda E. Links, Nicole L. Washington,
Melissa A. Haendel, Peter N. Robinson, Cornelius F. …, … Adams, William A. Gahl, Kym M. Boycott,
Michael Brudno
The discovery of disease-causing mutations typically requires confirmation of the variant or gene in
multiple unrelated individuals, and a large number of rare genetic diseases remain unsolved due to
difficulty identifying second families. To enable the secure sharing of case records by clinicians and
rare disease scientists, the authors developed the PhenomeCentral portal (https://phenomecentral.org).
Each record includes a phenotypic description and relevant genetic information (exome or candidate …
PhenomeCentral addresses the increasing need for computational approaches to identify individuals
affected by the same or overlapping phenotypes and mutations in the same gene, thereby enabling
novel gene …

Developing a national collaborative study system for rare genetic diseases
Authors: Michael S Watson, Charles Epstein, R Rodney Howell, Marilyn C Jones, Bruce R Korf,
Edward R B McCabe and Joe Leigh Simpson
There are thousands of rare genetic diseases and many genetic and nongenetic contributors to common
genetic diseases. The evidence base that is currently available about the great majority of these
conditions is limited to case studies and relatively small observational study sets derived from one or
several institutions. Hence, the statistical power of any one study is usually quite limited. Further, in the
absence of organized registries and data collection on particular patient groups, the information
available is weak and the patient resources that are available are limited. The meeting at which these
issues were raised resulted in a set of proposed principles and associated recommendations on how
best to achieve the vision of an extensive and comprehensive collaboration of professional and lay
communities to enable translational research to improve clinical care and therapies for persons with
rare genetic diseases.
The development of a mechanism through which the aforementioned issues and activities can be
coordinated will be essential to moving forward. Significant amounts of money are already being spent
on individual components of the system envisioned in this report. Much of the work is multidisciplinary
and involves clinical service providers, clinical investigators, industry, government, and the public.
Further, because many people with rare genetic diseases are identified in newborn screening (NBS)
programs, the involvement of state and public health programs will be necessary. The nature of the
work will require the involvement of organized medicine to develop consensus practice guidelines for
care and related data collection. Few organizations bridge this wide array of interest groups and
expertise as does the ACMG.

Addressing challenges in the diagnosis and treatment of rare genetic diseases
Authors: Kym M. Boycott and Diego Ardigó
The past 5 years have seen an unprecedented rate of discovery of genes that cause rare diseases and,
with it, a commensurate increase in the number of diagnosable but nevertheless untreatable disorders.
The authors discuss the increasing opportunity for diagnosis and therapy of rare diseases and how to
tackle the associated challenges.
It is clear that the grand challenges for rare diseases will require international cooperation and
stakeholder engagement at an unprecedented level. The International Rare Diseases Research
Consortium (IRDiRC), a partnership of nearly 50 organizations in 18 nations (with combined yearly
funding of US$2 billion), is attempting just this with its recently announced strategic goals for
2017–2027: diagnosis of rare disease patients within 1 year or entrance into a globally coordinated
diagnostic and research pipeline; 1,000 new therapies, chiefly for diseases with no approved treatment;
and the introduction of methods for assessing the impact of diagnoses and therapies on patient
well-being. The success of such initiatives will be crucial for leveraging the increasing opportunity for
diagnosis and therapy of rare diseases.

Phenogenon: Gene to phenotype associations for rare genetic diseases
Authors: Nikolas Pontikos, Cian Murphy, Ismail Moghul, Gavin Arno, Kaoru Fujinami, Yu Fujinami,
Dayyanah Sumodhee, Susan Downes, Andrew Webster, Jing Yu, UK Inherited Retinal Dystrophy
Consortium, Phenopolis Consortium
As high-throughput sequencing is increasingly applied to the molecular diagnosis of rare Mendelian
disorders, a large number of patients with diverse phenotypes have their genetic and phenotypic data
pooled together to uncover new gene-phenotype relations. The authors introduce Phenogenon, a
statistical tool that combines Human Phenotype Ontology (HPO)-annotated patient phenotypes, gnomAD
allele population frequency, and the Combined Annotation Dependent Depletion (CADD) score for variant
pathogenicity, in order to jointly predict the mode of inheritance (MOI) and gene-phenotype relationships.
In conclusion, the authors developed a statistical tool, Phenogenon, to detect and visualise
“gene-HPO-MOI” relationships. The approach suggested some strong candidate relationships and
correctly recapitulated existing relationships. The adoption of the HPO nomenclature by large rare
disease sequencing projects leads the authors to believe Phenogenon will be of increasing utility in
understanding gene-phenotype-MOI relationships as genetics is phased into routine NHS …

New Diagnostic Approaches for Undiagnosed Rare Genetic Diseases
Accurate diagnosis is the cornerstone of medicine; it is essential for informed care and for promoting
patient and family well-being. However, families with a rare genetic disease (RGD) often spend more
than five years on a diagnostic odyssey of specialist visits and invasive testing that is lengthy, costly,
and often futile, as 50% of patients do not receive a molecular diagnosis. The current diagnostic
paradigm is not well designed for RGDs, especially for patients who remain undiagnosed after the initial
set of investigations, and thus requires an expansion of approaches in the clinic. Leveraging
opportunities to participate in research programs that use new technologies to understand RGDs is an
important path forward for patients seeking a diagnosis. Given recent advancements in such
technologies and international initiatives, the prospect of identifying a molecular diagnosis for all
patients with RGDs has never been so attainable, but achieving this goal will require global cooperation
at an unprecedented scale.
These are promising times for the RGD community; never before has the prospect of identifying a
molecular diagnosis for all RGD patients been so attainable. Overcoming this grand challenge, however,
will require global cooperation at an unprecedented scale. We need to ensure that all undiagnosed RGD
patients are identified and seen by clinicians with expertise in RGD diagnosis, have access to appropriate
genetic testing, and are offered opportunities to share their phenotypic data, genotypic data, and
biological samples with researchers; only then will we be able to understand the cause of each and
every RGD. The IRDiRC's goal over the next 10 years (for each rare disease patient to receive a diagnosis
within one year of coming to medical attention) will be achievable if we put the well-being of the
patients and their families at the center of everything that we do.

KDBI: Kinetic Data of Bio-molecular Interactions
The authors introduce a new database of Kinetic Data of Bio-molecular Interactions (KDBI) aimed at
providing experimentally determined kinetic data of protein-protein, protein-RNA, protein-DNA,
protein-ligand, RNA-ligand and DNA-ligand binding or reaction events described in the literature.
KDBI currently contains 8273 entries of biomolecular binding or reaction events. There are a total of
1380 proteins, 143 nucleic acids and 1395 small molecules included in the database. Work is underway
to collect kinetic data published in earlier years, which is expected to significantly increase the number
of entries in the database. Rapid advances in proteomics pathways and networks are expected to
stimulate more interest in the quantitative aspects of biomolecular interactions, including kinetic data.
The availability of an increasing amount of kinetic data can better serve the need for mechanistic
investigation, quantitative study and simulation of cellular processes and events.

MvirDB: a microbial database of protein toxins, virulence factors and antibiotic resistance genes for
bio-defence applications
To facilitate rapid identification of sequences and characterization of genes for signature discovery, the
authors collected all publicly available (as of this writing), organized sequences representing known
toxins, virulence factors, and antibiotic resistance genes in one convenient database, which they believe
will be of use to the bio-defense research community.
MvirDB is a centralized resource (data warehouse) comprising all publicly accessible, organized sequence
data for protein toxins, virulence factors and antibiotic resistance genes. Protein entries in MvirDB are
annotated using a high-throughput, fully automated computational annotation system; annotations are
updated periodically to ensure that results are derived using current public database and open-source
tool releases. Tools provided for using MvirDB include a web-based browser tool and BLAST …