Supplementary Information

Wang et al.
Text summary
Supplementary information: Content of the database; Querying, searching and visualization;
Data sources; Implementation; Supplementary Figures 1 and 2.
Content of the database
We collected literature indexed in the PubMed database and published before September
2012. Using keyword combinations, we automatically screened thousands of abstracts and
full-text articles with in-house scripts, and the relevant hits were then inspected
manually. In total, more than 370 publications were reviewed and 1149 curated entries
were documented (807 lncRNA-associated, 229 miRNA-associated, 13 piRNA-associated and
100 snoRNA-associated) for three mammals (866 Homo sapiens-associated, 251 Mus
musculus-associated and 32 Rattus norvegicus-associated entries) (Table 1). These
ncRNA-disease entries cover 224 non-redundant lncRNAs, 100 non-redundant miRNAs, 2
non-redundant piRNAs and 8 non-redundant snoRNAs associated with 175 disease terms. In
the current version of MNDR, each entry contains detailed information on an
ncRNA–disease relationship, including RNA category, species, ncRNA symbol, disease,
tissue, interaction gene symbol, expression direction of the ncRNA, a literature
reference and a detailed description. To help researchers access information from
external resources, we linked lncRNAs to the lncRNAdb database (www.lncrnadb.com/) or
the Functional lncRNA database (www.valadkhanlab.org/database/); miRNAs to the miRBase
database (www.mirbase.org); and snoRNAs to the sno/scaRNAbase
(bioinfo.fudan.edu.cn/snoRNAbase.nsf) or snoRNA-LBME-db database
(www-snorna.biotoul.fr). Similarly, functional data on the interaction genes are linked
to the commonly used NCBI Gene resource (www.ncbi.nlm.nih.gov/gene). Through these
links, users can efficiently retrieve genomic and disease-associated data. MNDR also
welcomes submissions of experimentally identified novel ncRNA–disease relationships
from researchers. In addition, all ncRNA-disease relationships can be downloaded
directly in Excel format.
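The entry fields and totals described above can be pictured as a simple record plus a tally. The following Python sketch is illustrative only: the class name `NcRNADiseaseEntry`, its field names and the placeholder records are assumptions made for this example and are not the actual MNDR schema or data. Only the numeric counts are taken from the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NcRNADiseaseEntry:
    """One curated ncRNA-disease relationship (hypothetical field names)."""
    rna_category: str          # "lncRNA", "miRNA", "piRNA" or "snoRNA"
    species: str               # e.g. "Homo sapiens"
    ncrna_symbol: str
    disease: str
    tissue: str
    interaction_gene: str
    expression_direction: str  # direction of ncRNA expression
    reference: str             # literature reference
    description: str


def count_by_category(entries):
    """Tally entries per RNA category."""
    return Counter(entry.rna_category for entry in entries)


# Placeholder records for illustration only; none of these come from MNDR.
example_entries = [
    NcRNADiseaseEntry("lncRNA", "Homo sapiens", "LNC_A", "disease X",
                      "tissue 1", "GENE_1", "up", "REF_1", "placeholder"),
    NcRNADiseaseEntry("lncRNA", "Mus musculus", "LNC_B", "disease Y",
                      "tissue 2", "GENE_2", "down", "REF_2", "placeholder"),
    NcRNADiseaseEntry("miRNA", "Rattus norvegicus", "MIR_C", "disease X",
                      "tissue 1", "GENE_3", "up", "REF_3", "placeholder"),
]
example_counts = count_by_category(example_entries)

# Consistency check on the totals reported in the text.
reported_by_category = {"lncRNA": 807, "miRNA": 229, "piRNA": 13, "snoRNA": 100}
reported_by_species = {"Homo sapiens": 866, "Mus musculus": 251,
                       "Rattus norvegicus": 32}
total_by_category = sum(reported_by_category.values())
total_by_species = sum(reported_by_species.values())
```

Both totals come to 1149, matching the number of curated entries stated above.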
Querying, searching and visualization
MNDR provides an interface for convenient retrieval of all relationships between
diverse ncRNAs and diseases. Users can browse and obtain full lists of the ncRNAs
involved in any given disease through three paths. Path 1: users browse the
ncRNA-disease relationships by selecting options for RNA category, species and RNA
symbol, and obtain a list of relationships for any combination of these three.
Alternatively, researchers can retrieve ncRNA-disease relationships for a specified
disease term (Path 2) or tissue (Path 3). The main result table contains RNA category,
species, RNA symbol, disease, tissue, PMID and detail. Clicking the “detail” link in a
record gives access to more specific information, such as interaction gene symbol,
expression of the ncRNA, detection method of ncRNA expression, PMID, reference title
and a detailed description. To help users examine the ncRNA-mediated interaction
network in disease conditions, MNDR also provides visualization functionality: the
global ncRNA-mediated disease network of each of the three mammals can be displayed
rapidly and independently as an embedded interactive network using Cytoscape Web
(cytoscapeweb.cytoscape.org/). Multiple data resources can be combined in a single
visualization for each of the three mammals. Because the visualization supports
panning and zooming, users can examine the ncRNAs associated with specific diseases
within the global ncRNA interaction network of the Mammalian ncRNA-disease repository
(MNDR).
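The three browsing paths amount to filtering the entry table on different columns. The sketch below shows one way this could be expressed in Python. It is not the MNDR implementation (which the authors describe as HTML and PHP); the function `filter_entries`, the dictionary keys and the placeholder records are all assumptions made for illustration.

```python
def filter_entries(entries, rna_category=None, species=None, symbol=None,
                   disease=None, tissue=None):
    """Return the entries that match every criterion that is not None."""
    criteria = {
        "rna_category": rna_category,
        "species": species,
        "symbol": symbol,
        "disease": disease,
        "tissue": tissue,
    }
    active = {key: value for key, value in criteria.items() if value is not None}
    return [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in active.items())
    ]


# Placeholder records for illustration only; none of these come from MNDR.
entries = [
    {"rna_category": "lncRNA", "species": "Homo sapiens", "symbol": "LNC_A",
     "disease": "disease X", "tissue": "tissue 1"},
    {"rna_category": "miRNA", "species": "Mus musculus", "symbol": "MIR_B",
     "disease": "disease X", "tissue": "tissue 2"},
    {"rna_category": "lncRNA", "species": "Homo sapiens", "symbol": "LNC_C",
     "disease": "disease Y", "tissue": "tissue 1"},
]

# Path 1: any combination of RNA category, species and RNA symbol.
path1 = filter_entries(entries, rna_category="lncRNA", species="Homo sapiens")
# Path 2: a specified disease term.
path2 = filter_entries(entries, disease="disease X")
# Path 3: a specified tissue.
path3 = filter_entries(entries, tissue="tissue 2")
```

Leaving a criterion unset means "any value", which is what allows Path 1 to accept any combination of its three selectors.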
Data sources
To collect all available ncRNA symbols, we first integrated three major types of
ncRNAs: lncRNA symbols collected from lncRNAdb¹ and the Functional lncRNA database,²
miRNA symbols collected from miRBase,³ and snoRNA symbols collected from
sno/scaRNAbase⁴ and snoRNA-LBME-db.⁵ Because research on other ncRNAs, including
promoter-associated small RNAs (PASRs), PIWI-interacting RNAs (piRNAs), promoter
upstream transcripts (PROMPTs), transcription initiation RNAs (tiRNAs) and
TSS-associated RNAs (TSSa-RNAs),⁶ is still in its infancy, we searched the PubMed
database using these ncRNA category names in place of specific ncRNA symbols. The list
of disease terms was collected from the MeSH (Medical Subject Headings) vocabulary,
which is created and maintained by the National Library of Medicine. To reduce the
burden of manual curation, we wrote scripts that search all abstracts and full-text
articles in the PubMed database for the following keyword combinations: (each ncRNA
symbol or ncRNA category name) and (each species: Homo sapiens, Mus musculus and
Rattus norvegicus) and (each disease name). Because the miR2Disease database already
provides a resource of disease-associated miRNAs in human, we have not integrated that
information into the MNDR database.
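The keyword combinations described above form a Cartesian product of ncRNA terms, species and disease names. The in-house scripts are not given in the text, so the following Python sketch only illustrates how such query strings could be generated. The function name `build_queries`, the quoting style and the example inputs are assumptions; no PubMed request is made.

```python
from itertools import product

# The three species named in the text.
SPECIES = ("Homo sapiens", "Mus musculus", "Rattus norvegicus")


def build_queries(ncrna_terms, disease_terms, species=SPECIES):
    """Yield one boolean query per (ncRNA term, species, disease) combination."""
    for term, organism, disease in product(ncrna_terms, species, disease_terms):
        yield f'"{term}" AND "{organism}" AND "{disease}"'


# Example inputs chosen for illustration: one ncRNA symbol, one ncRNA
# category name, and one MeSH-style disease term.
queries = list(build_queries(["HOTAIR", "piRNA"], ["Breast Neoplasms"]))
first_query = queries[0]
```

With two ncRNA terms, three species and one disease term, this produces six query strings. The number of combinations grows multiplicatively with the size of each list, which is why automated screening before manual inspection is useful.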
Implementation
The MNDR database runs in a Windows environment. It was implemented using HTML and
PHP, a widely used general-purpose scripting language for web development. The
interface consists of web pages designed and implemented in HTML/CSS. It has been
tested in several web browsers, including Google Chrome, Safari, Mozilla Firefox and
Internet Explorer.
Supplementary Figures
Supplementary Figure 1. The largest sub-network of the ncRNA-associated disease
network based on the MNDR database. Triangular, square, diamond and circular nodes
represent lncRNAs, miRNAs, snoRNAs and protein-coding genes, respectively.
Supplementary Figure 2. The largest sub-network of the ncRNA-associated disease
network obtained by integrating the MNDR database and miR2Disease data. Triangular,
square, diamond and circular nodes represent lncRNAs, miRNAs, snoRNAs and
protein-coding genes, respectively.
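The text does not state how the largest sub-network in the Supplementary Figures was extracted. One plausible reading is that it is the largest connected component of the network. The Python sketch below illustrates that reading under this assumption; the function name `largest_subnetwork` and the placeholder edges are hypothetical and do not come from MNDR.

```python
from collections import defaultdict, deque


def largest_subnetwork(edges):
    """Return the node set of the largest connected component.

    `edges` is an iterable of (node_a, node_b) pairs in an undirected network.
    """
    adjacency = defaultdict(set)
    for node_a, node_b in edges:
        adjacency[node_a].add(node_b)
        adjacency[node_b].add(node_a)

    visited = set()
    best = set()
    for start in adjacency:
        if start in visited:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        visited |= component
        if len(component) > len(best):
            best = component
    return best


# Placeholder edges for illustration only; none of these come from MNDR.
example_edges = [
    ("ncRNA_A", "disease_1"),
    ("ncRNA_B", "disease_1"),
    ("ncRNA_B", "gene_1"),
    ("ncRNA_C", "disease_2"),
]
biggest = largest_subnetwork(example_edges)
```

In this toy network the component containing `disease_1` has four nodes and is returned, while the two-node component around `disease_2` is discarded.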
Supplementary Information is available at the Cell Death and Disease website.
References
1. Amaral PP, Clark MB, Gascoigne DK, Dinger ME, Mattick JS. lncRNAdb: a
reference database for long noncoding RNAs. Nucleic Acids Research 2011,
39(Database issue): D146-151.
2. Niazi F, Valadkhan S. Computational analysis of functional long noncoding RNAs
reveals lack of peptide-coding capacity and parallels with 3' UTRs. RNA 2012,
18(4): 825-843.
3. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for
microRNA genomics. Nucleic Acids Research 2008, 36(Database issue):
D154-158.
4. Xie J, Zhang M, Zhou T, Hua X, Tang L, Wu W. Sno/scaRNAbase: a curated
database for small nucleolar RNAs and Cajal body-specific RNAs. Nucleic Acids
Research 2007, 35(Database issue): D183-187.
5. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human
H/ACA and C/D box snoRNAs. Nucleic Acids Research 2006, 34(Database issue):
D158-162.
6. Esteller M. Non-coding RNAs in human disease. Nature Reviews Genetics 2011,
12(12): 861-874.