BEL Framework Resources (namespaces, equivalences, documents)
August 2012

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

Overview
• The BEL Framework accesses resource files during compilation
  – For checking and equivalencing namespace values
  – For augmenting the KAM to increase connectivity
• A set of resource files is maintained by OpenBEL
• Flexible: the default resources can be substituted or augmented with user-provided documents

Contents
• Resource locations
• Resources
  – Namespaces
  – Equivalences
  – Annotations
  – BEL Documents
• Creating and Using Custom Namespaces

Resource Locations
• Resources provided by the BEL Framework can be found here:
  – http://resource.belframework.org/belframework/1.0/
• They can also be downloaded from GitHub:
  – https://github.com/OpenBEL/openbel-framework-resources

The BEL Framework Configuration Includes a Resource Index
• Provides locations for namespace, equivalence, and augmentation documents
• Can be used as provided (default) or modified to point to custom namespaces, equivalences, etc.
• Default Resource Index:
  – http://resource.belframework.org/belframework/1.0/index.xml

BEL Namespaces
• OpenBEL supports 32 namespaces for:
  – genes/RNAs/proteins
  – protein families
  – named complexes
  – biological processes
  – chemicals
• Namespace documents (.belns) have a specific format
  – Include entity encodings to enforce BEL function semantics
• Users can provide custom namespaces

Supported Namespaces
• Genes, RNAs, microRNAs, proteins (6 namespaces)
  – Entrez Gene Ids (human, mouse, and rat only)
  – HUGO gene symbols
  – MGI gene symbols
  – RGD gene symbols
  – SwissProt accession numbers
  – SwissProt names
• Affymetrix Probe Sets (9 namespaces)
  – Human, mouse, and rat probe set identifiers
• Protein families (3 namespaces)
  – Selventa Protein Families (human, mouse, rat)
• Chemicals (3 namespaces)
  – ChEBI names
  – ChEBI Ids
  – Selventa legacy chemicals
• Biological processes and pathologies (5 namespaces)
  – GO names
  – GO Ids
  – MeSH Phenomena and Processes [G]
  – MeSH Diseases [C]
  – Selventa legacy diseases
• Named Complexes (5 namespaces)
  – Selventa Named Complexes (human, mouse, rat)
  – GO Cellular Components names
  – GO Cellular Components Ids
• Cellular locations (3 namespaces)
  – MeSH Cellular structures [A11.284]
  – GO Cellular Components names
  – GO Cellular Components Ids

BEL Namespace Documents
• Namespaces are .belns files
  – Text files with header information and values
• Values include encoding information
  – Which BEL functions are valid to apply to this entity

Namespace Entity Encoding

Encoding Value    Valid BEL Functions
B                 bp(), path()
O                 path()
R                 r(), m()
M                 m()
P                 p()
G                 g()
A                 a(), r(), m(), p(), g(), complex()
C                 complex()

• Example value from the HGNC namespace:
  – A2ML1-AS1 (A2ML1 antisense RNA 1), encoded as "GR", is a valid value for a gene or RNA abundance, but not for a protein abundance

BEL Equivalence Files
• A BEL Equivalence File (.beleq) is associated with each BEL namespace
• Each namespace value in the equivalence file is associated with a universally unique identifier (UUID)
  – 32 hexadecimal digits
• Values with the same UUID are equivalenced
  – Terms that apply the same function to equivalenced values are coalesced into a single node during compilation
• Values in a namespace file are not required to be included in the associated equivalence file

[Figure: example equivalences for the MGI namespace]

Examples of BEL Equivalencing
• The following three protein abundance terms are equivalent:
  – p(HGNC:AKT1)
    • The abundance of the protein designated by HUGO gene symbol ‘AKT1’
  – p(EGID:207)
    • The abundance of the protein designated by Entrez Gene Id 207 (AKT1 Human)
  – p(SPAC:P31749)
    • The abundance of the protein designated by SwissProt accession number P31749 (AKT1 Human)
• The following two biological process terms are equivalent:
  – bp(MESHPP:apoptosis)
    • The biological process designated by the MeSH Phenomena and Processes heading ‘apoptosis’
  – bp(GOID:0006915)
    • The biological process designated by the GO Id 0006915 (apoptotic process)

BEL Annotations
• BEL Annotations and BEL Terms are completely separate
• Annotations are associated with BEL Statements to express context information about the statement
  – Source of the knowledge
    • Citation, Evidence
  – Biological system
    • Cell line, Body part, Species
• 22 Annotation Types are provided with the BEL Framework
  – 2 reserved types: Citation and Evidence
  – 20 additional types defined by .belanno documents
• Additional Annotation Types can be defined by the user
  – Each requires a name that is unique within the BEL document and a domain of allowable values, given as a list, a .belanno document, or a regular expression

Annotations Can Be Applied to Individual BEL Statements or Groups of Statements

Source: PMID 1234567

  Cell Type: Fibroblast
  Tissue: Lung
    kin(p(X)) increases p(Z);
    p(X) increases r(Y);
  Causal relationships demonstrated in lung fibroblasts, reported in PMID 1234567

  Cell Type: Endothelial Cell
  Tissue: Liver
    p(X) increases r(Y);
  Causal relationship demonstrated in liver endothelial cells, reported in PMID 1234567

Each Statement is distinct: these Statements have different sets of contexts

Citation Annotation Format
• The Citation annotation is composed of a comma-separated list containing up to 6 fields.
  – SET Citation = {"PubMed","Cell","16962653","2006-10-07","Jacinto E|Facchinetti V|Liu D|Soto N|Wei S|Jung SY|Huang Q|Qin J|Su B",""}

Field  Required  Contents
1      Yes       Type of Citation. One of the following strings: “Book”, “PubMed”, “Journal”, “Online Reference”, or “Other”.
2      Yes       Name of the Citation. This is typically the journal reference or book name.
3      Yes       Reference. An identifier that can be used to link to the citation. For books this is usually the ISBN, for PubMed citations it is the PubMed ID, and for other types it could be a URL pointing to the reference, such as a Wikipedia page.
4      No        Date of publication in ISO 8601 format (YYYY-MM-DD).
5      No        Authors. A “|”-delimited list of authors for the reference.
6      No        Comments. Optional information, such as an abstract, that can be stored along with the reference. Limit is 4000 characters.
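
The Citation field layout described above can be checked mechanically. The following is a minimal Python sketch, not part of the BEL Framework: the function name `parse_citation`, the constant names, and the dictionary keys are hypothetical. It assumes every field is double-quoted and comma-separated as in the example, and it does not handle escaped quotes inside a field.

```python
import re
from datetime import date

# Allowed values for field 1, taken from the Citation field description.
CITATION_TYPES = {"Book", "PubMed", "Journal", "Online Reference", "Other"}
# Hypothetical key names for the six fields, in order.
FIELD_NAMES = ["type", "name", "reference", "date", "authors", "comment"]


def parse_citation(line):
    """Parse a 'SET Citation = {...}' line into a dict (illustrative only)."""
    match = re.match(r'\s*SET\s+Citation\s*=\s*\{(.*)\}\s*$', line)
    if match is None:
        raise ValueError("not a SET Citation line")
    # Collect each double-quoted field; escaped quotes are not handled.
    fields = re.findall(r'"([^"]*)"', match.group(1))
    if not 3 <= len(fields) <= 6:
        raise ValueError("a Citation needs 3 to 6 fields")
    fields += [""] * (6 - len(fields))  # pad omitted optional fields
    citation = dict(zip(FIELD_NAMES, fields))
    if citation["type"] not in CITATION_TYPES:
        raise ValueError("unknown citation type: " + citation["type"])
    if not citation["name"] or not citation["reference"]:
        raise ValueError("name and reference are required")
    if citation["date"]:
        date.fromisoformat(citation["date"])  # ValueError on a malformed date
    if len(citation["comment"]) > 4000:
        raise ValueError("comment exceeds 4000 characters")
    # Authors are "|"-delimited; an empty field means no authors were given.
    authors = citation["authors"]
    citation["authors"] = authors.split("|") if authors else []
    return citation
```

Passing the example line from this slide to `parse_citation` yields a dictionary whose `type` is "PubMed", whose `reference` is "16962653", and whose `authors` list has nine entries.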

BEL Resource Documents
• BEL Resource Documents are used in compilation Phase III for network augmentation
  – BEL documents
  – Relevant assertions are identified and added to the network
• Include:
  – Gene Scaffolding
    • g(EG:123) transcribedTo r(EG:123)
    • r(EG:123) translatedTo p(EG:123)
  – Protein Family membership
    • p(PFH:"AKT Family") hasMembers list(p(HGNC:AKT1), p(HGNC:AKT2), p(HGNC:AKT3))
  – Named Complex components
    • complex(NCH:"9-1-1 Complex") hasComponents list(p(HGNC:HUS1),p(HGNC:RAD1),p(HGNC:RAD9A))
  – Orthology (2.0.0 and future)

Creating Custom Namespaces
• Allows use of a vocabulary not specifically supported by the BEL Framework
  – Including equivalencing to other namespaces
• Detailed directions can be found here:
  – http://openbel-framework.readthedocs.org/en/latest/tutorials/building_custom_namespaces.html
• Requires:
  – Namespace file in .belns format
  – URL for the .belns file
  – Customized resource index
  – Updated BEL Framework configuration file pointing to new resource index
• Optional:
  – Equivalence file in .beleq format
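
Each value in a custom namespace carries an encoding from the Namespace Entity Encoding table earlier in this deck. The following minimal Python sketch shows how such an encoding could be checked against a BEL function. It is illustrative only and is not the BEL Framework's implementation: the names `VALID_FUNCTIONS` and `is_valid_use` are hypothetical, the mapping is copied from that table, and only the short function forms shown there are covered. It assumes that a multi-character encoding such as "GR" permits the union of what each character permits, which is consistent with the A2ML1-AS1 example.

```python
# Encoding character -> BEL functions it permits, copied from the
# Namespace Entity Encoding table in this document.
VALID_FUNCTIONS = {
    "B": {"bp", "path"},
    "O": {"path"},
    "R": {"r", "m"},
    "M": {"m"},
    "P": {"p"},
    "G": {"g"},
    "A": {"a", "r", "m", "p", "g", "complex"},
    "C": {"complex"},
}


def is_valid_use(function, encoding):
    """Return True if any character of the encoding permits the function.

    Assumption: a multi-character encoding is the union of its characters.
    Unknown encoding characters are treated as permitting nothing.
    """
    return any(function in VALID_FUNCTIONS.get(ch, ()) for ch in encoding)


# A2ML1-AS1 is encoded as "GR" in the HGNC namespace.
print(is_valid_use("g", "GR"))  # gene abundance
print(is_valid_use("r", "GR"))  # RNA abundance
print(is_valid_use("p", "GR"))  # protein abundance
```

Under these assumptions the first two checks succeed and the third fails, matching the statement that A2ML1-AS1 is valid for a gene or RNA abundance but not for a protein abundance.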