Annotating SABIO-RK:
Integration of MIRIAM and SBO
Martin Golebiewski
Scientific Databases and Visualization Group
EML Research, Heidelberg
2nd BioModels.net Training Camp
13-15 January 2007, Manchester, UK
Why did we develop SABIO-RK?
• Biochemical model simulations need experimental reaction kinetics data
• Kinetic parameter values depend strongly on environmental conditions (temperature, pH, concentrations of reactants and modifiers, etc.)
• Enzyme characteristics vary between organisms, tissues and cellular locations
• Kinetic parameters can only be interpreted together with their corresponding kinetic laws
• Most databases do not link experimental kinetic data for individual reactions to the complete context described above
• Data must be easily accessible and interchangeable (data export for exchange)
We aimed at creating a database that collects and standardizes kinetic data,
relates the data to its biochemical, environmental and experimental context,
cross-links corresponding data and associates it with external resources
to make the data comparable and accessible in standard formats
Database population and access
SABIO-RK
• Merges information about biochemical reactions and pathways, mainly collected from other databases (e.g. KEGG), with corresponding kinetic data manually extracted from the literature (including the environmental context)
• Is curated manually, assisted by semi-automatic tools (e.g. lists of values)
• Unifies, systematically structures and interrelates the data
• Can be accessed through a web-based user interface and through web services
• Supports export of the data in SBML for exchange
• Links entities and expressions to complementary databases and ontologies
Database population: data extraction
Data source:
• Kinetic data contained in publications
• Text with non-local, highly scattered information
• Tables, formulas, graphs, pictures
• Some information is only given as a reference to another publication
Problems:
• No 1:1 relation between the paper and the input mask!
• No controlled vocabulary (e.g. different names for the same compound or enzyme)
→ fuzziness of descriptions
Full-text publication
SABIO-RK input interface
Problems in the database population
Missing or only partial information in the data source:
- Incomplete reactions (products not mentioned)
- Assay conditions missing or only given as a reference to another paper
- Kinetic law equation (or fitting equation) not described
Multiplicity of kinetic law types:
no common standard is used in publications (and hardly any is available, apart from SBO)
→ varying notations referring to different kinetic theories
Parameter units:
- Multiple definitions (e.g. katal or U for enzyme activities)
- Different compositions (e.g. µmol/s or µmol/(s*mg) for Vmax)
- Wrong parameter units (e.g. 1/s for Vmax)
Identification of compounds, reactions and enzymes:
- Ambiguous descriptions of chemical compounds or enzymes (e.g. missing stereochemical information for stereoisomers, simplifying trivial names, ...)
Data integration problems
e.g. parameter units: nmol/(min*mg) vs. U/mg
1 U = the amount of enzyme that catalyses the transformation of 1 µmol of substrate per minute under standard conditions
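To make values like these comparable, each reported specific activity can be converted into one reference unit. The following is a minimal Python sketch, assuming U/mg (µmol/(min*mg)) as the reference unit; the ASCII unit spellings ("umol" for µmol) and the lookup table are hypothetical and do not describe SABIO-RK's actual unit handling.

```python
from fractions import Fraction

# Hypothetical lookup table: factor that converts a specific activity into
# U/mg, using 1 U = 1 µmol/min and 1 kat = 1 mol/s.
TO_U_PER_MG = {
    "U/mg": Fraction(1),
    "umol/(min*mg)": Fraction(1),
    "nmol/(min*mg)": Fraction(1, 1000),  # 1 nmol = 0.001 µmol
    "umol/(s*mg)": Fraction(60),         # 1 µmol/s = 60 µmol/min
    "nkat/mg": Fraction(6, 100),         # 1 nkat = 1 nmol/s = 0.06 µmol/min
}


def to_u_per_mg(value: float, unit: str) -> float:
    """Return a specific activity expressed in U/mg."""
    try:
        factor = TO_U_PER_MG[unit]
    except KeyError:
        # Unknown or dimensionally wrong units (e.g. "1/s") are rejected.
        raise ValueError(f"unknown or dimensionally wrong unit: {unit!r}") from None
    # Exact rational arithmetic avoids rounding noise in the factors.
    return float(Fraction(value) * factor)
```

With this table, 250 nmol/(min*mg) becomes 0.25 U/mg. A unit that cannot be a specific activity, such as 1/s, raises an error instead of being converted silently, which is one way to catch the "wrong parameter unit" case.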
Annotations and controlled vocabularies
[Figure: SABIO-RK data model; entities with their attributes, → marks annotations to external resources or controlled vocabularies]
General Information: organism (→ NCBI ID), tissue, pathway, comments
Infosource: PubMed ID, title, authors, journal
Environment: buffer, pH, temperature
Reaction: stoichiometry, EC classification, enzyme variant
Protein complex (catalyzes the reaction): UniProt IDs
Kinetic Law: type (→ SBO), equation
Kinetic Parameter: name, type (e.g. Km, kcat; → SBO), value (range), standard deviation, corresponding species, comment, SBO ID
Unit: parameter units, defined as SBML units
Reactant, Modifier (Species; participate in the reaction): compound name (as given in the publication), role (e.g. substrate, inhibitor; → SBO), cellular location (→ Gene Ontology), comments (modifications etc.)
Compound (referred to by the species): recommended name, synonymic names, IDs in external databases (e.g. KEGG, ChEBI), additional information
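The entities in the data model above can be sketched as plain Python dataclasses to show how a kinetic parameter is tied to its law, its corresponding species and, through the species, to an annotated compound. This is a simplified, hypothetical model for illustration only; the class and field names are assumptions, not SABIO-RK's database schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Compound:
    recommended_name: str
    synonyms: list[str] = field(default_factory=list)
    # IDs in external databases, e.g. {"KEGG": "C00031"}
    external_ids: dict[str, str] = field(default_factory=dict)


@dataclass
class Species:
    """A reactant or modifier as named in the publication, linked to a compound."""
    name_in_publication: str
    compound: Compound
    role: str  # e.g. "substrate", "inhibitor" (annotated to SBO in SABIO-RK)
    cellular_location: Optional[str] = None  # annotated to Gene Ontology in SABIO-RK


@dataclass
class Environment:
    ph: Optional[float] = None
    temperature_celsius: Optional[float] = None
    buffer: Optional[str] = None


@dataclass
class KineticParameter:
    name: str
    type: str  # e.g. "Km", "kcat" (annotated to SBO in SABIO-RK)
    value: float
    unit: str
    species: Optional[Species] = None  # corresponding species, e.g. for Km or Ki
    std_dev: Optional[float] = None


@dataclass
class KineticLaw:
    type: str  # e.g. "Michaelis-Menten" (annotated to SBO in SABIO-RK)
    equation: str
    parameters: list[KineticParameter] = field(default_factory=list)
    environment: Environment = field(default_factory=Environment)
```

Following the links from a parameter to its species and then to the compound's external IDs is what makes a value interpretable and comparable across entries.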
Annotations of entities in SABIO-RK
Annotations shown to the user:
• Chemical compounds to KEGG Compound and ChEBI
• Enzymatic activities to ExPASy, KEGG, IntEnz, IUBMB and Reactome (query links in the user interface based on the EC classification)
• Enzyme protein complexes to UniProt/Swiss-Prot
• Cellular locations (compartments etc.) to Gene Ontology (as query links)
• Publications (data sources) to PubMed
Annotations integrated in SABIO-RK but not yet available in the output:
• Organisms to NCBI Taxonomy
• Kinetic law types and parameter types to SBO (Systems Biology Ontology)
• Species roles (substrate, product, modifier, etc.) to SBO
• Reactions to KEGG Reaction
More annotations following the MIRIAM standard are planned ...
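MIRIAM-style annotations pair a data resource with an identifier, written as a URN. The sketch below builds such URNs for the resources listed above. The namespace strings are assumptions based on the later MIRIAM Registry conventions, and the exact forms should be checked against the registry before use. In SBML, annotations of this kind are usually attached as RDF inside annotation elements.

```python
from urllib.parse import quote

# Assumed MIRIAM URN namespaces (later MIRIAM Registry conventions);
# verify against the registry before relying on them.
MIRIAM_NAMESPACES = {
    "KEGG Compound": "urn:miriam:kegg.compound",
    "KEGG Reaction": "urn:miriam:kegg.reaction",
    "ChEBI": "urn:miriam:obo.chebi",
    "UniProt": "urn:miriam:uniprot",
    "EC": "urn:miriam:ec-code",
    "GO": "urn:miriam:obo.go",
    "PubMed": "urn:miriam:pubmed",
    "Taxonomy": "urn:miriam:taxonomy",
    "SBO": "urn:miriam:biomodels.sbo",
}


def miriam_urn(resource: str, identifier: str) -> str:
    """Build a MIRIAM URN; colons inside IDs such as 'SBO:0000027' are percent-encoded."""
    return f"{MIRIAM_NAMESPACES[resource]}:{quote(identifier, safe='')}"
```

For example, the EC number of phosphofructokinase, 2.7.1.11, becomes `urn:miriam:ec-code:2.7.1.11`, and the SBO term for the Michaelis constant, SBO:0000027, becomes `urn:miriam:biomodels.sbo:SBO%3A0000027`.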
Controlled vocabularies in SABIO-RK
- To identify entities and terms unambiguously
- To facilitate the search, interpretation and comparison of the data
- To permit matching with other database resources based on a shared vocabulary
- To facilitate the integration of different database entries into kinetic models
Lists of values (LOV) in the input interface:
• Species (compounds) and species roles (e.g. substrate, product, modifier ...)
• Biochemical reactions and pathways
• Organisms (NCBI taxonomy), tissues and cellular locations
• Kinetic law types (e.g. 'Competitive inhibition' or 'Sequential ordered Bi Bi')
• Parameter types (e.g. Km, kcat, Vmax, Ki, Kd, rate constant, pH, pK ...)
• Parameter units (e.g. mM, µM, 1/s, nmol/min, U/(h*mg) ...)
• Corresponding species for kinetic parameters (e.g. for Km, Ki or concentrations)
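A list of values can guide a curator by accepting only listed terms and suggesting close matches for anything else. This minimal sketch uses Python's `difflib` for the suggestions. The excerpt of parameter types and the function name `match_lov` are hypothetical, and case-insensitive matching is a simplification that would be wrong wherever case distinguishes two terms.

```python
import difflib
from typing import Optional

# Hypothetical excerpt of a list of values for parameter types.
PARAMETER_TYPES = ["Km", "kcat", "Vmax", "Ki", "Kd", "rate constant", "pH", "pK"]


def match_lov(entry: str, lov: list[str]) -> tuple[Optional[str], list[str]]:
    """Return (accepted term, suggestions) for a curator's free-text entry."""
    key = entry.strip().casefold()
    # Accept an entry that equals a listed term, ignoring case and surrounding spaces.
    for term in lov:
        if term.casefold() == key:
            return term, []
    # Otherwise suggest up to three similar listed terms.
    folded = {term.casefold(): term for term in lov}
    close = difflib.get_close_matches(key, list(folded), n=3, cutoff=0.6)
    return None, [folded[c] for c in close]
```

An entry such as " km " is accepted as Km. A typo such as "Vmaxx" is not accepted, but Vmax is offered as a suggestion, so the stored value always comes from the controlled list.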
Other notation standards in SABIO-RK
Semi-controlled notation standards:
- Kinetic law equation (analyzed for mathematical correctness when entered)
- Enzyme variants (e.g. wildtype, mutant E540K, wildtype isoenzyme PFKL ...)
- Protein complex of the enzyme: e.g. (Q6UG02)*4 for a homotetramer
- Recombinant enzymes: e.g. 'expressed in Escherichia coli BL21(DE3)'
- Buffer composition in the experimental setup
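A semi-controlled notation such as (Q6UG02)*4 can still be checked by machine. The sketch below parses it into subunit counts per UniProt accession. The notation for heteromers assumed here, subunit terms joined by "*" as in (A)*2*(B)*2, is a hypothetical extension for illustration and is not a documented SABIO-RK rule.

```python
import re

# One subunit term: "(ACCESSION)" optionally followed by "*count".
TERM = re.compile(r"\((?P<acc>[A-Z0-9]+)\)(?:\*(?P<n>\d+))?")


def parse_complex(notation: str) -> dict[str, int]:
    """Parse e.g. '(Q6UG02)*4' into {'Q6UG02': 4}; raise ValueError if malformed."""
    text = notation.replace(" ", "")
    subunits: dict[str, int] = {}
    pos = 0
    while True:
        m = TERM.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse complex notation at position {pos}: {notation!r}")
        acc = m.group("acc")
        subunits[acc] = subunits.get(acc, 0) + int(m.group("n") or 1)
        pos = m.end()
        if pos == len(text):
            return subunits
        # Assumed separator between subunit terms.
        if text[pos] != "*":
            raise ValueError(f"expected '*' at position {pos}: {notation!r}")
        pos += 1
```

The sum of the counts gives the number of subunits, so (Q6UG02)*4 yields four copies of one accession, which is a homotetramer.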
Controlled vocabularies in SABIO-RK
List of values (LOV)
SABIO-RK input interface
Identifying chemical compounds
Every chemical compound can have multiple synonymous descriptions, e.g.:
Trivial name and systematic chemical description:
Valproic acid = 2-Propylpentanoic acid
Different parts of the molecule considered as the parent structure:
Acetyl phenol = Phenylacetate
Aberrant order of the substituents of a parent structure (prefixes):
2-Amino-6-methyl-4-pyrimidol = 6-Methyl-2-amino-4-pyrimidol
Description of substituents as prefix (like amino-) or suffix (like -amine):
1-(4-Iodo-2,5-dimethoxyphenyl)-2-aminopropane = 1-(4-iodo-2,5-dimethoxy-phenyl)propan-2-amine
3,17-Dioxoandrost-4-ene = 4-Androstene-3,17-dione
Different nomenclature systems (e.g. aberrant order of the morphemes):
2-Amino-6-methyl-4-pyrimidol = 2-Amino-6-methylpyrimidin-4-ol
2-Methylpropan-2-ol = 2-Hydroxy-2-methyl-propane
Normalization of compound names
Goals:
• Comparing and linking databases with names of chemical compounds,
i.e. synonym detection disregarding orthographic and (minor) morphosyntactic variance in naming
• Matching chemical compound names against existing synonym lists
(e.g. ChEBI, PubChem) to identify synonyms with differences in naming
not arising from orthographic variations, like trivial names and
systematic names.
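The first goal, detecting synonyms that differ only orthographically, can be sketched as a normalization function that maps each name to a comparison key. This is a hypothetical, deliberately simple rule set and not SABIO-RK's normalization. It case-folds the name, turns ".alpha." and "α" into "alpha", removes spaces and hyphens, and removes commas except those between locant digits, as in "2,5". Names that differ in more than spelling, such as trivial and systematic names, still need the synonym-list matching described in the second goal.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Map a compound name to a key that ignores orthographic variants."""
    s = name.casefold()
    s = re.sub(r"\.(alpha|beta|gamma|delta)\.", r"\1", s)  # ".alpha." -> "alpha"
    s = s.replace("α", "alpha").replace("β", "beta")
    s = re.sub(r"[\s\-]+", "", s)             # drop whitespace and hyphens
    s = re.sub(r"(?<!\d),|,(?!\d)", "", s)   # drop commas, except between digits (locants)
    return s


def group_synonyms(names: list[str]) -> dict[str, list[str]]:
    """Group names that share the same normalized key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for n in names:
        groups[normalize_name(n)].append(n)
    return dict(groups)
```

Applied to the synonym list on the next slide, ".alpha.-Phenylpropionic acid" and "ALPHA-PHENYLPROPIONIC ACID" fall into one group, while "Hydratropic acid" keeps its own key.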
Normalization of compound names
CompoundID: 10296
IUPAC Name: 2-phenylpropanoic acid
ID: 20986, NAME: alpha-Phenylpropionate
Normalized Name: alpha-phenylpropionate
Canonical SMILES: CC(C1=CC=CC=C1)C(=O)O
Synonyms:
Hydratropic acid
2-Phenylpropionic acid
2-Phenylpropanoic acid
alpha-Phenylpropioic acid
alpha-Methylphenylacetic acid
.alpha.-Phenylpropionic acid
alpha-Methylbenzeneacetic acid
Benzeneacetic acid, .alpha.-methyl-
.alpha.-Methylphenylacetic acid
.alpha.-Methylbenzeneacetic acid
ALPHA-PHENYLPROPIONIC ACID
Benzeneacetic acid, alpha-methyl-
(S)-alpha-Methylbenzeneacetic acid
Benzeneacetic acid, .alpha.-methyl-, (S)-
Benzeneacetic acid, .alpha.-methyl-, (R)-
Benzeneacetic acid, alpha-methyl-, (R)-
Benzeneacetic acid, alpha-methyl-, (S)-
Linguistically assisted compound analysis
[Figure: systematic compound name, structure and classification]
Access to SABIO-RK
Available interfaces:
• Web-based user interface for browsing and searching the data manually
• Web services (API access), which can be called automatically by external tools, e.g. by other databases or by simulation programs for biochemical network models
Both interfaces support the export of the data in SBML
SABIO-RK user interface: Query
SABIO-RK user interface: Query result
SABIO-RK user interface: Reaction
SABIO-RK user interface: Enzyme
SABIO-RK user interface: database entry with kinetic data
SBML export from SABIO-RK
Reactions are coupled in exported SBML files: if several reactions refer to the same species, that species is defined only once in the exported SBML file
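The coupling rule can be illustrated with a small exporter that keeps a lookup table from species name to SBML id, so that a species shared by two reactions is written once and referenced twice. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name, the single compartment "cell" and the generated ids are assumptions. The output is a bare SBML Level 2 skeleton without kinetic laws, units or annotations, it has not been validated against the SBML schema, and it is not SABIO-RK's exporter.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2"


def reactions_to_sbml(reactions: list[tuple[str, list[str], list[str]]]) -> str:
    """reactions: (reaction_id, substrate names, product names) tuples."""
    ET.register_namespace("", SBML_NS)
    q = lambda tag: f"{{{SBML_NS}}}{tag}"  # qualify a tag with the SBML namespace
    sbml = ET.Element(q("sbml"), level="2", version="1")
    model = ET.SubElement(sbml, q("model"), id="exported")
    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"), id="cell")
    species_list = ET.SubElement(model, q("listOfSpecies"))
    reaction_list = ET.SubElement(model, q("listOfReactions"))
    species_ids: dict[str, str] = {}  # species name -> SBML id (each defined once)

    def species_id(name: str) -> str:
        if name not in species_ids:
            sid = f"s{len(species_ids) + 1}"
            species_ids[name] = sid
            ET.SubElement(species_list, q("species"), id=sid, name=name, compartment="cell")
        return species_ids[name]

    for rid, substrates, products in reactions:
        reaction = ET.SubElement(reaction_list, q("reaction"), id=rid)
        for tag, names in (("listOfReactants", substrates), ("listOfProducts", products)):
            if not names:
                continue  # skip empty lists rather than writing empty elements
            refs = ET.SubElement(reaction, q(tag))
            for name in names:
                ET.SubElement(refs, q("speciesReference"), species=species_id(name))
    return ET.tostring(sbml, encoding="unicode")
```

When two glycolytic reactions are exported, D-Fructose 1,6-bisphosphate is the product of the first reaction and the reactant of the second. It appears once in the species list and is referenced by both reactions, which couples them into one network.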
Export of layout information in SBML:
- using the SBML layout extension
- to draw reaction maps
SABIO-RK API access
- Integration in simulation tools
- Cross-linking with other databases
- Several possible entry points
- Supports data export in SBML
Web service methods
Data in SABIO-RK: statistics
PubMed records: 923
Organisms: 312
Pathways: 90
Reactions: 9600
Enzymes: 416
Measured parameters:
- enzyme activities (rate constant, kcat or Vmax): 8118
- Km (Michaelis constant): 8701
- Ki (inhibition constant): 1774
as of 09/01/2007
Conclusions
• SABIO-RK is a web-accessible database containing biochemical reaction kinetics data for systems biologists and experimenters
• Merges general reaction information retrieved from external databases with kinetic data manually extracted from the literature
• Manual curation of the data with some semi-automatic support
• High degree of interrelation within the database
• Type of kinetics, modes of inhibition or activation and corresponding equations are shown with their parameters, measured values and experimental conditions
• Access through a web-based user interface or through web services (API)
• Export of the data in SBML from both interfaces
• Controlled vocabulary used and content annotated to ontologies and external resources
Future goals
• Information about detailed reaction mechanisms (elementary reaction steps)
• Expansion of the data export functions (more data, more annotations)
• Tools for information extraction and data integration
• Expand the usage of annotations and controlled vocabularies
• Extension of the database model to store signaling reactions
• Convince scientists to directly insert their kinetic data into SABIO-RK
SABIO-RK project team
and many more: students, colleagues at EML Research and other collaborators.
Financial support:
Workshop Invitation
Storage and Annotation of Reaction Kinetics’ Data
May 21-23, 2007
Heidelberg, Germany
http://projects.eml.org/sdbv/projects/events/workshop2007/index_html
Topics:
- Data generation
- Data storage and integration
- Data annotation
- Data usage
http://sabio.villa-bosch.de/SABIORK