BEL Framework

BEL Framework v2.0.0
August 2012
This work is licensed under the Creative Commons Attribution 3.0 Unported License.
To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/
or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain
View, California, 94041, USA.
BEL Framework Overview
• Current version 2.0.0 released June 29, 2012
– Open source
• The BEL Framework includes:
– BEL Compiler
– KAM store
– Tools
– Web and Java APIs
• API = Application Programming Interface
– Can be used by software to access information from KAMs
• KAM Navigator uses the Web API
• Whistle uses the Java API
– Web Server
Knowledge User Workflow: BEL Framework and Applications
[Figure: workflow diagram. Labels: Application; BEL Framework; BEL Framework API; KAM Store; BEL Compiler; Encrypted portable KAM; BEL Documents. Note on diagram: multiple KAMs can be imported for use by the application.]
Contents
• KAMs and the KAM store
• BEL Compiler
– Running the BEL Compiler
– Phase I - Compiler Expansions
– Phase II - Equivalencing
– Phase III - Compiler Augmentations
• BEL Framework Tools
Knowledge Assembly Model (KAM)
• A knowledge base in network form
• Composed of Nodes (KamNode) and Edges
(KamEdge)
• Each KamNode represents one or more BEL Terms
drawn from one or more BEL Documents
• Each KamEdge represents one or more BEL
Statements from one or more BEL Documents
KamNodes
• Nodes represent one or more BEL terms
• KamNodes are coalesced wherever possible by the
equivalencing engine (Phase II)
KamEdges
• Represent assertions supported by one or more BEL
Statements
• Querying a KamEdge will return:
– Each BEL Statement supporting the assertion
– Note: assertions are coalesced based solely on the semantic triple
after equivalencing, independent of Annotations
• Querying a BEL Statement will return:
– The BEL Document the statement was recorded in
– The list of assertions for the statement
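The coalescing rule above can be sketched in a few lines of Python. This is an illustration only, not the framework's implementation: the `Statement` record, its field names, and `build_edges` are hypothetical, the annotation values are made up, and the terms are assumed to be already equivalenced.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """Hypothetical record for one BEL Statement (not a framework class)."""
    subject: str
    relationship: str
    obj: str
    document: str
    annotations: tuple = ()


def build_edges(statements):
    """Group statements by (subject, relationship, object), ignoring annotations."""
    edges = defaultdict(list)
    for st in statements:
        edges[(st.subject, st.relationship, st.obj)].append(st)
    return dict(edges)


# Two statements from different documents with different (made-up) annotations.
statements = [
    Statement("kin(p(HGNC:AKT1))", "->", "p(HGNC:RELA)", "doc1.bel",
              (("Species", "9606"),)),
    Statement("kin(p(HGNC:AKT1))", "->", "p(HGNC:RELA)", "doc2.bel",
              (("Species", "10090"),)),
]
edges = build_edges(statements)
for triple, support in edges.items():
    print(triple, "supported by", [st.document for st in support])
```

Both statements land on one edge because only the triple is used as the key; each supporting statement still carries its own document and annotations.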
KAM Store
• The database that stores KAMs
• Default database is Derby
– Can configure to use MySQL or other databases
• Put KAMs into the KAM Store by:
– Compiling a KAM (belc.cmd)
– Importing a KAM (tools\KamManager.cmd --import)
• Access KAMs via:
– APIs
– Exporting a KAM (tools\KamManager.cmd --export)
© 2012, Open BEL Community
KAMs Are Compiled from BEL Documents
• The BEL Compiler compiles one or more BEL Documents
into a Knowledge Assembly Model (KAM)
• Multi-Phase compiler/assembler:
1. Compiler – compiles each BEL Document into a proto-network
2. Equivalencer – merges proto-networks by equivalencing
analogous nodes across namespaces
3. Augmenter – increases KAM computability by injecting terms
and relationships from additional sources of prior knowledge
(e.g. relationships connecting RNAs to their corresponding
proteins)
4. Assembler – Generates final network and supporting evidence
structures
• Users can change compiler parameters to control the
knowledge assembly process
KAM Compilation Phases
[Figure: compilation pipeline. BEL Documents pass through the Compiler, Equivalencer, Augmenter, and Final Assembler to produce the Compiled KAM. Network Resources shown: Namespace & Annotation Tables; Equivalence Tables; Other Prior Knowledge.]
Running the BEL Compiler
• From BEL Framework folder:
– belc.cmd (Windows)
– belc.sh (Linux or OS X)
• Ensure that the server is not running
• Required:
– BEL document(s)
• Specify filename(s) with -f
• OR specify path to folder of BEL documents with -p
– KAM name
• Specify with -k
– KAM description
• Specify with -d
>belc.cmd -f myDoc.bel -k myKAM -d "my KAM description"
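For scripted builds, the required options above can be assembled programmatically. The helper below is a hypothetical sketch: `build_belc_command` is not part of the BEL Framework, it uses only the -f, -p, -k, and -d options named on this slide, and it assumes (unverified) that -f is repeated once per file when several documents are given.

```python
def build_belc_command(kam_name, description, files=(), folder=None,
                       script="belc.sh"):
    """Assemble a belc argument list from the options named on the slide."""
    # The slide asks for document files OR a folder, so require exactly one.
    if bool(files) == bool(folder):
        raise ValueError("give exactly one of: BEL document files, or a folder")
    args = [script]
    if files:
        for name in files:
            args += ["-f", name]  # assumption: one -f per document
    else:
        args += ["-p", folder]
    args += ["-k", kam_name, "-d", description]
    return args


command = build_belc_command("myKAM", "my KAM description",
                             files=["myDoc.bel"], script="belc.cmd")
print(command)
```

The list form can be handed to `subprocess.run` from the BEL Framework folder, with the server stopped as noted above.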
Phase I Expansions
• List expansions
• Inner terms
• Protein modifications
• Reactions
• Nested statements
• Reciprocal statements
List Expansion - hasMembers
• Phase I expands hasMembers relationships to individual
hasMember relationships
• All hasMembers relationship statements are removed
p(PFH:"AKT Family") hasMembers \
list(p(HGNC:AKT1),p(HGNC:AKT2),p(HGNC:AKT3))
becomes
p(PFH:"AKT Family") hasMember p(HGNC:AKT1)
p(PFH:"AKT Family") hasMember p(HGNC:AKT2)
p(PFH:"AKT Family") hasMember p(HGNC:AKT3)
List Expansion - hasComponents
• Phase I expands hasComponents relationships to individual
hasComponent relationships
• All hasComponents relationship statements are removed
complex(NCH:"IkappaB Kinase Complex") hasComponents \
list(p(HGNC:CHUK), p(HGNC:IKBKB), p(HGNC:IKBKG))
becomes
complex(NCH:"IkappaB Kinase Complex") hasComponent p(HGNC:CHUK)
complex(NCH:"IkappaB Kinase Complex") hasComponent p(HGNC:IKBKB)
complex(NCH:"IkappaB Kinase Complex") hasComponent p(HGNC:IKBKG)
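The two list expansions above follow the same pattern, sketched here in Python. The function and table names are hypothetical, and the statement is assumed to be already parsed into a subject, a relationship, and a list of member terms.

```python
# Plural list relationships and the singular relationship each expands to.
LIST_RELATIONSHIPS = {
    "hasMembers": "hasMember",
    "hasComponents": "hasComponent",
}


def expand_list_statement(subject, relationship, members):
    """Return one (subject, relationship, object) triple per list element.

    The plural statement itself is not returned, mirroring its removal.
    """
    singular = LIST_RELATIONSHIPS[relationship]
    return [(subject, singular, member) for member in members]


triples = expand_list_statement(
    'p(PFH:"AKT Family")',
    "hasMembers",
    ["p(HGNC:AKT1)", "p(HGNC:AKT2)", "p(HGNC:AKT3)"],
)
for subject, relationship, obj in triples:
    print(subject, relationship, obj)
```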
complexAbundance Expansion
• Phase I preprocesses complexAbundance() terms
and injects individual hasComponent relationships
complex(p(HGNC:GTF2E1),p(HGNC:GTF2E2))
becomes
complex(p(HGNC:GTF2E1),p(HGNC:GTF2E2))
complex(p(HGNC:GTF2E1),p(HGNC:GTF2E2))\
hasComponent p(HGNC:GTF2E1)
complex(p(HGNC:GTF2E1),p(HGNC:GTF2E2))\
hasComponent p(HGNC:GTF2E2)
compositeAbundance Expansion
• Phase I preprocesses compositeAbundance() terms and
injects individual includes relationships
composite(a(CHEBI:"deoxyribonucleic acid"), a(CHEBI:"NAD(+)")) \
-> ribo(p(HGNC:PARP1))
becomes
composite(a(CHEBI:"deoxyribonucleic acid"), a(CHEBI:"NAD(+)"))
composite(a(CHEBI:"deoxyribonucleic acid"), a(CHEBI:"NAD(+)")) includes \
a(CHEBI:"deoxyribonucleic acid")
composite(a(CHEBI:"deoxyribonucleic acid"), a(CHEBI:"NAD(+)")) includes \
a(CHEBI:"NAD(+)")
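The complexAbundance and compositeAbundance expansions can both be sketched as "split the term's top-level arguments, then emit one relationship per argument". The Python below is a rough illustration with hypothetical helper names. It handles only the short forms complex() and composite() with enumerated members, not named complexes such as complex(NCH:"...").

```python
def split_top_level(term):
    """Split 'func(arg1, arg2)' into (func, [args]), respecting nesting and quotes."""
    func, _, rest = term.partition("(")
    body = rest[:-1]  # drop the closing parenthesis of the outer call
    args, depth, in_quotes, current = [], 0, False, ""
    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")":
            depth -= 1
        if ch == "," and depth == 0 and not in_quotes:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        args.append(current.strip())
    return func, args


# Relationship injected for each member of the enclosing term.
MEMBER_RELATIONSHIP = {"complex": "hasComponent", "composite": "includes"}


def expand_members(term):
    """Return (term, relationship, member) triples for a complex or composite."""
    func, args = split_top_level(term)
    return [(term, MEMBER_RELATIONSHIP[func], arg) for arg in args]


composite = 'composite(a(CHEBI:"deoxyribonucleic acid"), a(CHEBI:"NAD(+)"))'
for triple in expand_members(composite):
    print(triple)
```

Tracking quotes matters here: the parentheses inside "NAD(+)" must not change the nesting depth.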
Inner Terms Expansion
• Phase I expands inner terms to relate abundances to
activity terms using actsIn relationships
phos(p(HGNC:DUSP1)) =| kin(p(HGNC:MAPK8))
becomes
phos(p(HGNC:DUSP1)) =| kin(p(HGNC:MAPK8))
p(HGNC:DUSP1) actsIn phos(p(HGNC:DUSP1))
p(HGNC:MAPK8) actsIn kin(p(HGNC:MAPK8))
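A minimal Python sketch of the inner-term rule above: if a term is an activity function wrapped around an abundance, emit an actsIn relationship from the abundance to the activity. The function name is hypothetical, and the set of short-form activity functions is an assumption based on BEL 1.0 naming, not taken from the compiler.

```python
# Assumed short forms of the BEL activity functions.
ACTIVITY_FUNCTIONS = {
    "act", "cat", "chap", "gtp", "kin", "pep", "phos", "ribo", "tport", "tscript",
}


def acts_in(term):
    """Return (abundance, 'actsIn', term) for an activity term, else None."""
    func, _, rest = term.partition("(")
    if func not in ACTIVITY_FUNCTIONS or not rest.endswith(")"):
        return None
    # Everything inside the outer parentheses is the inner abundance term.
    return (rest[:-1], "actsIn", term)


for term in ("phos(p(HGNC:DUSP1))", "kin(p(HGNC:MAPK8))", "p(HGNC:DUSP1)"):
    print(term, "->", acts_in(term))
```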
Protein Modification Expansion
• Phase I expands proteinModification() sub-terms to associate
a modified protein abundance with the root protein
abundance
p(HGNC:MAPK1, pmod(P,T)) => kin(p(HGNC:MAPK1))
becomes
p(HGNC:MAPK1, pmod(P, T)) => kin(p(HGNC:MAPK1))
p(HGNC:MAPK1) hasModification p(HGNC:MAPK1, pmod(P,T))
p(HGNC:MAPK1) actsIn kin(p(HGNC:MAPK1))
Variant Expansion
• Phase I expands fusion(), truncation(), and substitution()
sub-terms to associate a protein variant abundance with
the parent (reference) protein abundance
p(HGNC:KRAS, sub(G,12,V))
becomes
p(HGNC:KRAS, sub(G,12,V))
p(HGNC:KRAS) hasVariant p(HGNC:KRAS, sub(G,12,V))
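The protein modification and variant expansions share one idea: strip the modifier argument to recover the root protein, then link root to modified form. The sketch below is illustrative Python with hypothetical names. It assumes the short forms pmod, sub, trunc, and fus, and a single modifier per term.

```python
# Modifier short form -> relationship from the root protein to the modified form.
MODIFIER_RELATIONSHIP = {
    "pmod": "hasModification",
    "sub": "hasVariant",
    "trunc": "hasVariant",
    "fus": "hasVariant",
}


def first_top_level_comma(term):
    """Index of the first comma directly inside the outer call, or -1."""
    depth, in_quotes = 0, False
    for i, ch in enumerate(term):
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            return i
    return -1


def root_relationship(term):
    """Return (root, relationship, term) for a modified protein, else None."""
    comma = first_top_level_comma(term)
    if comma == -1:
        return None  # unmodified term: nothing to expand
    modifier = term[comma + 1:].strip().split("(", 1)[0]
    relationship = MODIFIER_RELATIONSHIP.get(modifier)
    if relationship is None:
        return None
    return (term[:comma] + ")", relationship, term)


print(root_relationship("p(HGNC:KRAS, sub(G,12,V))"))
print(root_relationship("p(HGNC:MAPK1, pmod(P,T))"))
```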
Reaction Expansion
• Phase I expands reactants() and products() reaction sub-terms to
associate the reactant and product lists with their abundances
reaction(reactants(a(CHEBI:superoxide)), \
products(a(CHEBI:"hydrogen peroxide"),a(CHEBI:oxygen)))
becomes
reaction(reactants(a(CHEBI:superoxide)), \
products(a(CHEBI:"hydrogen peroxide"),a(CHEBI:oxygen)))
a(CHEBI:superoxide) reactantIn \
reaction(reactants(a(CHEBI:superoxide)), \
products(a(CHEBI:"hydrogen peroxide"),a(CHEBI:oxygen)))
reaction(reactants(a(CHEBI:superoxide)), \
products(a(CHEBI:"hydrogen peroxide"),a(CHEBI:oxygen))) \
hasProduct a(CHEBI:"hydrogen peroxide")
reaction(reactants(a(CHEBI:superoxide)), \
products(a(CHEBI:"hydrogen peroxide"),a(CHEBI:oxygen))) \
hasProduct a(CHEBI:oxygen)
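As a rough Python sketch of the reaction expansion, assuming the reaction term has already been parsed into its reactant and product lists (the function name is hypothetical):

```python
def expand_reaction(reaction_term, reactants, products):
    """Link each reactant and product to the reaction term."""
    triples = [(r, "reactantIn", reaction_term) for r in reactants]
    triples += [(reaction_term, "hasProduct", p) for p in products]
    return triples


rxn = ('reaction(reactants(a(CHEBI:superoxide)), '
       'products(a(CHEBI:"hydrogen peroxide"),a(CHEBI:oxygen)))')
triples = expand_reaction(
    rxn,
    reactants=["a(CHEBI:superoxide)"],
    products=['a(CHEBI:"hydrogen peroxide")', "a(CHEBI:oxygen)"],
)
for triple in triples:
    print(triple)
```

Note the direction: reactants point into the reaction (reactantIn), while the reaction points out to its products (hasProduct).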
Nested Statement Expansion
• The compiler will automatically expand nested
statements and create additional relationships from
the subject of the statement to the object of the
nested statement
– can be turned off using the --no-statement-expansion
switch
Default Nested Statement Expansion
• Phase I expands nested statements to link the subject of
the statement to the object of the nested statement
• The original statement is preserved as supporting
evidence for the derived assertions
p(HGNC:CLSPN) -> (kin(p(HGNC:ATR)) => p(HGNC:CHEK1, pmod(P)))
becomes
p(HGNC:CLSPN) -> p(HGNC:CHEK1, pmod(P))
kin(p(HGNC:ATR)) => p(HGNC:CHEK1, pmod(P))
p(HGNC:ATR) actsIn kin(p(HGNC:ATR))
p(HGNC:CHEK1) hasModification p(HGNC:CHEK1, pmod(P))
Modified Nested Statement Expansion
• When the --no-statement-expansion switch is set, the
compiler will instantiate the subject of the statement
and expand the nested statement but not couple the
two together.
• The original statement is removed
p(HGNC:CLSPN) -> (kin(p(HGNC:ATR)) => p(HGNC:CHEK1, pmod(P)))
becomes
kin(p(HGNC:ATR)) => p(HGNC:CHEK1, pmod(P))
p(HGNC:CLSPN)
p(HGNC:ATR) actsIn kin(p(HGNC:ATR))
p(HGNC:CHEK1) hasModification p(HGNC:CHEK1, pmod(P))
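The default and modified nested-statement behaviours can be compared side by side in a small Python sketch. Statements are modelled as (subject, relationship, object) tuples whose object may itself be a tuple. The function name and the `statement_expansion` parameter are hypothetical stand-ins for the compiler's behaviour with and without --no-statement-expansion. The actsIn and hasModification edges come from other expansions and are omitted.

```python
def expand_nested(statement, statement_expansion=True):
    """Return (edges, standalone_terms) for a possibly nested statement."""
    subject, relationship, obj = statement
    if not isinstance(obj, tuple):
        return [statement], []  # not nested: nothing to expand
    inner_object = obj[2]
    if statement_expansion:
        # Default: couple the outer subject to the object of the nested statement.
        return [(subject, relationship, inner_object), obj], []
    # Switch set: keep the nested statement, instantiate the subject on its own.
    return [obj], [subject]


nested = ("p(HGNC:CLSPN)", "->",
          ("kin(p(HGNC:ATR))", "=>", "p(HGNC:CHEK1, pmod(P))"))

print(expand_nested(nested))
print(expand_nested(nested, statement_expansion=False))
```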
Reciprocal Statement Expansion
• All KAM edges are directed
• Non-directed BEL relationships (positiveCorrelation,
negativeCorrelation, association) are expanded to be
expressed in both directions:
r(HGNC:IL8) positiveCorrelation path(MESHD:"Lung Neoplasms")
becomes
r(HGNC:IL8) positiveCorrelation path(MESHD:"Lung Neoplasms")
path(MESHD:"Lung Neoplasms") positiveCorrelation r(HGNC:IL8)
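A short Python sketch of the reciprocal expansion, using the three non-directed relationships listed above (the function name is hypothetical):

```python
NON_DIRECTED = {"positiveCorrelation", "negativeCorrelation", "association"}


def expand_reciprocal(triples):
    """Add the reverse edge for every non-directed relationship."""
    result = []
    for subject, relationship, obj in triples:
        result.append((subject, relationship, obj))
        reverse = (obj, relationship, subject)
        # Skip the reverse edge if the input already states it explicitly.
        if relationship in NON_DIRECTED and reverse not in triples:
            result.append(reverse)
    return result


edges = expand_reciprocal([
    ("r(HGNC:IL8)", "positiveCorrelation", 'path(MESHD:"Lung Neoplasms")'),
    ("kin(p(HGNC:AKT1))", "->", "p(HGNC:RELA)"),
])
for edge in edges:
    print(edge)
```

Directed relationships such as -> pass through unchanged.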
Phase II Equivalences
• Nodes are equivalenced based on:
– Namespace value UUID
• In .beleq resource file
– Equivalent unordered list
• complexes, composites, reactions
The BEL Framework Manages Equivalences
Between External IDs
• Equivalences between terms from different vocabularies are
provided to the BEL compiler
– AKT3 in the HGNC namespace and Entrez Gene ID 10000 refer to the
same gene
– p(HGNC:AKT3) and p(EG:10000) coalesce to a single node in a KAM
• Selection of preferred namespaces (a “Dialect”) is slated for a future release
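The UUID-based coalescing described above can be sketched in Python as follows. The table layout, the placeholder UUID string, and the function names are all hypothetical. The actual .beleq resource format is not reproduced here.

```python
# Hypothetical equivalence table: (namespace, value) -> shared UUID.
# "uuid-akt3" is a placeholder, not a real UUID from any resource file.
EQUIVALENCE_UUIDS = {
    ("HGNC", "AKT3"): "uuid-akt3",
    ("EG", "10000"): "uuid-akt3",
}


def node_key(function, namespace, value, table):
    """Terms with the same function and UUID share a key, so they coalesce."""
    uuid = table.get((namespace, value))
    if uuid is not None:
        return (function, uuid)
    return (function, namespace, value)  # no equivalence known: stay separate


def coalesce(terms, table):
    """Group (function, namespace, value) terms into KAM nodes."""
    nodes = {}
    for function, namespace, value in terms:
        key = node_key(function, namespace, value, table)
        nodes.setdefault(key, []).append((function, namespace, value))
    return nodes


nodes = coalesce(
    [("p", "HGNC", "AKT3"), ("p", "EG", "10000"), ("p", "HGNC", "AKT1")],
    EQUIVALENCE_UUIDS,
)
for key, members in nodes.items():
    print(key, members)
```

Keeping the function in the key means p(HGNC:AKT3) and r(HGNC:AKT3) remain distinct nodes even though they share a namespace value.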
Phase III Augmentations
• Gene Scaffolding
• Protein Families
• Named Complexes
• Orthology
Network Augmentation Order
[Figure: augmentation stage diagram. Stage labels: Gene Scaffolding; Protein Family Expansion; Named Complex Expansion; Protein Family Inclusion; Named Complex Inclusion; Orthology. Group labels: Basic Stages; Optional Stages.]
Gene Scaffolding
• Default behavior is to insert p(), r(), and g() nodes
and corresponding edges wherever a protein, RNA, or
gene abundance term is detected
• The compiler will only insert missing nodes and
edges
– Can be turned off with the --no-gene-scaffolding switch
p(HGNC:KRAS, sub(G, 12, V)) -> path(MESHD:Neoplasms)
becomes
p(HGNC:KRAS, sub(G, 12, V)) -> \
path(MESHD:Neoplasms)
p(HGNC:KRAS) hasVariant \
p(HGNC:KRAS, sub(G, 12, V))
r(HGNC:KRAS) >> p(HGNC:KRAS)
g(HGNC:KRAS) :> r(HGNC:KRAS)
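Gene scaffolding as described above amounts to adding two edges per gene product, and only when they are missing. A minimal Python sketch, with a hypothetical function name and edges stored as plain tuples using the relationship symbols from the example:

```python
def add_gene_scaffolding(edges, namespace, value):
    """Insert the r >> p and g :> r edges for one entity if not already present."""
    gene, rna, protein = (f"{func}({namespace}:{value})" for func in ("g", "r", "p"))
    for edge in [(rna, ">>", protein), (gene, ":>", rna)]:
        if edge not in edges:  # only insert what is missing
            edges.append(edge)
    return edges


edges = [("p(HGNC:KRAS)", "hasVariant", "p(HGNC:KRAS, sub(G, 12, V))")]
add_gene_scaffolding(edges, "HGNC", "KRAS")
add_gene_scaffolding(edges, "HGNC", "KRAS")  # second call adds nothing
for edge in edges:
    print(edge)
```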
Protein Family Expansion
• The compiler will automatically include protein
family members when a protein family term is
identified
– Can be turned off using the --no-protein-families switch
• The compiler can also search for protein families to
include when a protein family member is identified
– Can be enabled using the --expand-protein-families switch
• The compiler will automatically connect protein
family activity terms with the corresponding family
member activity terms
Protein Family Example 1
(Default Behavior)
p(HGNC:KRAS, sub(G,12,D)) -> kin(p(PFH:"MAPK JNK Family"))
becomes
p(HGNC:KRAS, sub(G,12,D)) -> kin(p(PFH:"MAPK JNK Family"))
p(HGNC:KRAS) hasVariant p(HGNC:KRAS, sub(G,12,D))
p(PFH:"MAPK JNK Family") actsIn kin(p(PFH:"MAPK JNK Family"))
p(PFH:"MAPK JNK Family") hasMember p(HGNC:MAPK8)
p(PFH:"MAPK JNK Family") hasMember p(HGNC:MAPK9)
p(PFH:"MAPK JNK Family") hasMember p(HGNC:MAPK10)
Gene scaffolding will also be added to p(HGNC:KRAS),
p(HGNC:MAPK8), p(HGNC:MAPK9), and p(HGNC:MAPK10)
Protein Family Example 2
(Default Behavior)
kin(p(HGNC:AKT1)) -> p(HGNC:RELA)
kin(p(PFH:"AKT Family")) =| bp(MESHPP:Apoptosis)
becomes
kin(p(HGNC:AKT1)) -> p(HGNC:RELA)
kin(p(PFH:"AKT Family")) =| bp(MESHPP:Apoptosis)
p(HGNC:AKT1) actsIn kin(p(HGNC:AKT1))
p(PFH:"AKT Family") actsIn kin(p(PFH:"AKT Family"))
p(PFH:"AKT Family") hasMember p(HGNC:AKT1)
p(PFH:"AKT Family") hasMember p(HGNC:AKT2)
p(PFH:"AKT Family") hasMember p(HGNC:AKT3)
kin(p(HGNC:AKT1)) isA kin(p(PFH:"AKT Family"))
Gene scaffolding would then be applied to p(HGNC:AKT1),
p(HGNC:AKT2), p(HGNC:AKT3), and p(HGNC:RELA)
Protein Family Example 3
(--expand-protein-families enabled)
kin(p(HGNC:AKT1)) -> p(HGNC:RELA)
becomes
kin(p(HGNC:AKT1)) -> p(HGNC:RELA)
p(HGNC:AKT1) actsIn kin(p(HGNC:AKT1))
p(PFH:"AKT Family") hasMember p(HGNC:AKT1)
p(PFH:"AKT Family") hasMember p(HGNC:AKT2)
p(PFH:"AKT Family") hasMember p(HGNC:AKT3)
Gene scaffolding would then be applied to p(HGNC:AKT1),
p(HGNC:AKT2), p(HGNC:AKT3), and p(HGNC:RELA)
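The difference between the default behaviour and --expand-protein-families in the examples above can be sketched in Python. The `PROTEIN_FAMILIES` table stands in for the compiler's protein family resource, and the function name and `expand_families` parameter are hypothetical. Only the hasMember edges are modelled, not the isA edges between activity terms.

```python
# Stand-in for the protein family resource; membership taken from the examples.
PROTEIN_FAMILIES = {
    'p(PFH:"AKT Family")': ["p(HGNC:AKT1)", "p(HGNC:AKT2)", "p(HGNC:AKT3)"],
}


def family_edges(terms, expand_families=False):
    """Return hasMember edges to inject for the protein terms seen in a network."""
    # Default: include members only for families that appear as terms.
    families = [t for t in terms if t in PROTEIN_FAMILIES]
    if expand_families:
        # Optional: also pull in families whose members appear as terms.
        families += [
            family for family, members in PROTEIN_FAMILIES.items()
            if family not in families and any(m in terms for m in members)
        ]
    return [(f, "hasMember", m) for f in families for m in PROTEIN_FAMILIES[f]]


terms = ["p(HGNC:AKT1)", "p(HGNC:RELA)"]
print(family_edges(terms))                        # no family term present
print(family_edges(terms, expand_families=True))  # family found via AKT1
```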
Named Complex Expansion
• The compiler will automatically include named
complex components when a named complex
term is identified
– Can be turned off using the --no-named-complexes switch
• The compiler can also search for named complexes
to include when a named complex member is
identified
– Can be enabled using the --expand-named-complexes
switch
Named Complex Expansion
(Default Behavior)
kin(complex(NCH:"IkappaB Kinase Complex")) => \
p(HGNC:NFKBIA, pmod(P,S,32))
becomes
kin(complex(NCH:"IkappaB Kinase Complex")) => \
p(HGNC:NFKBIA, pmod(P,S,32))
complex(NCH:"IkappaB Kinase Complex") actsIn \
kin(complex(NCH:"IkappaB Kinase Complex"))
p(HGNC:NFKBIA) hasModification p(HGNC:NFKBIA, pmod(P, S, 32))
complex(NCH:"IkappaB Kinase Complex") hasComponent p(HGNC:CHUK)
complex(NCH:"IkappaB Kinase Complex") hasComponent p(HGNC:IKBKB)
complex(NCH:"IkappaB Kinase Complex") hasComponent p(HGNC:IKBKG)
Gene scaffolding would then be applied to p(HGNC:CHUK),
p(HGNC:NFKBIA), p(HGNC:IKBKB), and p(HGNC:IKBKG)
BEL Framework Tools
• Found in the “tools” folder of the BEL Framework
• Two versions for each:
– .cmd (Windows)
– .sh (Linux, OS X)
• KamManager
– Use with -h to get full options list
– list KAMs in KAM store, export KAM to XGMML, delete KAM
• BelCheck
– check BEL document validity
• DocumentConverter
– convert between BEL script and xbel formats
• CacheManager
– Manage cached resources