Functionality and requirements.
Initial proposal.
Maciej Janik
Serves as storage for files containing experiment results, item descriptions, ontologies, etc.
Data is heterogeneous.
The amount of data will increase massively.
Hierarchical storage (as in a file system) is not enough: users need to add annotations.
Users should be able to find relevant data (relatively) easily.
What do we expect from Glyco Vault?
Insert a file into GV
Get a file from GV
Delete a specific file from GV
Annotate a file with metadata
Get a file's annotations
Find file(s)
Basic
ID (unique)
Name
Date created
User (owner)
File type (InForma)
Modification record
…
Specific (extended)
Machine used for experiment
Associated workflow
Names of used samples
Version number
Other linked files
…
File
ID: GV-123
Name: ABC.txt
Created: 2007-02-13
Owner: Will
Machine: PCR
Tissue: Parrot
Version: 9
…
We cannot limit properties to the currently known ones: the schema must be flexible enough to accommodate future changes.
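A minimal Java sketch of the split between basic and specific properties, assuming every extended property can be stored as a string. The class FileRecord and its methods are hypothetical, and only four of the basic properties are shown. Core properties are fixed fields, while specific ones live in an open map, so a new property needs no schema change.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical file record: basic properties are fixed fields,
// specific (extended) properties go into an open, insertion-ordered map.
final class FileRecord {
    final String id;          // unique, e.g. "GV-123"
    final String name;
    final LocalDate created;
    final String owner;
    private final Map<String, String> specific = new LinkedHashMap<>();

    FileRecord(String id, String name, LocalDate created, String owner) {
        this.id = id;
        this.name = name;
        this.created = created;
        this.owner = owner;
    }

    // Adds or replaces an extended property such as "Machine" or "Tissue".
    void set(String property, String value) {
        specific.put(property, value);
    }

    // Read-only view of the extended properties.
    Map<String, String> specificProperties() {
        return Collections.unmodifiableMap(specific);
    }
}
```

The example file above would then be built with the four core values and three calls: set("Machine", "PCR"), set("Tissue", "Parrot") and set("Version", "9").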
[Figure: the same file as a graph. A central File node (with its binaries) is linked to ID GV-123, Creation date 2007-02-13, Owner Will, Name ABC.txt, Version 9, Machine PCR and Tissue Parrot.]
Ontology with a file-centric view: each File is connected to its properties and to other Files.
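The same graph can be written as subject, predicate, object edges, which is the form an ontology store keeps. This is a sketch only: the record Triple, the predicate spellings and the choice of which objects are resources (owner, machine) rather than literals are assumptions.

```java
import java.util.List;

// One edge of the file-centric graph: subject, predicate, object.
// objectIsResource marks links to other nodes (resources) as opposed to literal values.
record Triple(String subject, String predicate, String object, boolean objectIsResource) {}

final class Gv123Graph {
    // The example file as edges from its ID; predicate names mirror the figure labels.
    static List<Triple> edges() {
        String file = "GV-123";
        return List.of(
            new Triple(file, "name", "ABC.txt", false),
            new Triple(file, "creationDate", "2007-02-13", false),
            new Triple(file, "owner", "Will", true),     // assumed: users are resources
            new Triple(file, "version", "9", false),
            new Triple(file, "machine", "PCR", true),    // assumed: machines are resources
            new Triple(file, "tissue", "Parrot", false));
    }
}
```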
Example workflow (PCR experiment):
Prepare samples → Run PCR machine → Get raw data → Statistical analysis → Final result
Data and descriptions captured along the workflow:
Description of samples
Well assignment
Person preparing
Date/time of experiment
PCR configuration
Machine description (use of resource)
Raw results (binaries)
Raw results (values with annotation)
Intermediate analytic data
Final values of fluorescence, assigned to specific genes
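Recording which files each produced file was derived from lets provenance be traced from a final result back to raw data and sample descriptions. A minimal sketch, assuming a simple "derived from" link between file IDs; the class Provenance and the link itself are hypothetical and not taken from ProPreO.

```java
import java.util.*;

// Hypothetical provenance store: each file ID maps to the IDs it was derived from.
final class Provenance {
    private final Map<String, List<String>> derivedFrom = new HashMap<>();

    void addDerivation(String output, String... inputs) {
        derivedFrom.computeIfAbsent(output, k -> new ArrayList<>()).addAll(Arrays.asList(inputs));
    }

    // All ancestors of a file, found by breadth-first search over derivedFrom links.
    // The seen set also guards against revisiting files reachable by several paths.
    Set<String> lineage(String fileId) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(fileId));
        while (!queue.isEmpty()) {
            for (String parent : derivedFrom.getOrDefault(queue.poll(), List.of())) {
                if (seen.add(parent)) {
                    queue.add(parent);
                }
            }
        }
        return seen;
    }
}
```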
Ontology layers:
CoreFileOnt: core descriptions for all stored files (id, name, user, date …)
ProPreO: processes, data provenance, machines, etc.
ProtocolSchema (several): descriptions of used protocols and produced data
UserDefined attributes: attributes and relations defined by users (not captured by other ontologies)
GlycO and others …: data-specific ontologies
Insert a simple file with its type
FileID insert(Object aBinaries, InForma aType)
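A minimal in-memory sketch of insert, assuming FileID and InForma are simple value types (their real definitions are not given here) and using byte[] in place of Object for the binaries. The "GV-<n>" ID scheme only imitates the GV-123 example and is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Placeholder value types; the proposal does not define their contents.
record FileID(String value) {}
record InForma(String typeName) {}

final class InsertSketch {
    private final AtomicLong counter = new AtomicLong();
    private final Map<FileID, byte[]> binaries = new HashMap<>();
    private final Map<FileID, InForma> types = new HashMap<>();

    // Stores a defensive copy of the binaries with their type and
    // returns a new unique ID of the hypothetical form "GV-<n>".
    FileID insert(byte[] aBinaries, InForma aType) {
        FileID id = new FileID("GV-" + counter.incrementAndGet());
        binaries.put(id, aBinaries.clone());
        types.put(id, aType);
        return id;
    }

    InForma typeOf(FileID id) {
        return types.get(id);
    }
}
```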
Manage annotations
boolean annotate(FileID aFile, Annotation aMetadata)
boolean removeAnnotation(FileID aFile, Annotation aMetadata)
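A sketch of the two annotation operations, assuming the boolean result means "the file's annotations changed"; the proposal does not say what the return value signals. Annotation is reduced here to a relation and a string value, and AnnotationStore is a hypothetical name.

```java
import java.util.*;

record FileID(String value) {}
record Annotation(String relation, String value) {}

final class AnnotationStore {
    private final Map<FileID, Set<Annotation>> byFile = new HashMap<>();

    // Returns true if the annotation was new for this file (Set.add semantics).
    boolean annotate(FileID aFile, Annotation aMetadata) {
        return byFile.computeIfAbsent(aFile, k -> new LinkedHashSet<>()).add(aMetadata);
    }

    // Returns true if the annotation existed on this file and was removed.
    boolean removeAnnotation(FileID aFile, Annotation aMetadata) {
        Set<Annotation> annotations = byFile.get(aFile);
        return annotations != null && annotations.remove(aMetadata);
    }
}
```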
Annotations
Relation Resource (object)
Relation Literal (value)
Use of the ProPreO ontology (in part)
Add standard, independent file metadata
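The two annotation shapes, a relation to a resource or a relation to a literal value, map naturally onto a small sealed type hierarchy. A sketch assuming Java 17; the type names, the relation names and the example URI below are hypothetical.

```java
// The object of an annotation is either a link to a resource or a literal value.
sealed interface AnnotationObject permits ResourceRef, Literal {}

// Link to a defined resource, e.g. an individual PCR machine in the ontology.
record ResourceRef(String uri) implements AnnotationObject {}

// Literal value with its datatype, e.g. a version number.
record Literal(String lexical, String datatype) implements AnnotationObject {}

record Annotation(String relation, AnnotationObject object) {
    boolean isLiteral() {
        return object instanceof Literal;
    }
}
```

For example, new Annotation("machine", new ResourceRef("http://example.org/gv#PCRMachine1")) links the file to a defined resource, while new Annotation("version", new Literal("9", "xsd:integer")) carries a plain value.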
Get a raw data file from GV
Object getFile(FileID aFile)
Returns the binaries associated with a file stored in Glyco Vault
Get file descriptions
Annotation[] describe(FileID aFile)
Returns all direct annotations of a file, for display to the user
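A sketch of retrieval, assuming files and their direct annotations are kept in in-memory maps; the class RetrievalSketch and its put helper are hypothetical. describe returns only the annotations stored directly on the file and does not follow links to other resources.

```java
import java.util.*;

record FileID(String value) {}
record Annotation(String relation, String value) {}

final class RetrievalSketch {
    private final Map<FileID, byte[]> binaries = new HashMap<>();
    private final Map<FileID, List<Annotation>> annotations = new HashMap<>();

    // Test helper standing in for insert + annotate.
    void put(FileID id, byte[] data, List<Annotation> direct) {
        binaries.put(id, data.clone());
        annotations.put(id, new ArrayList<>(direct));
    }

    // Returns a copy of the stored binaries; unknown IDs are reported explicitly.
    byte[] getFile(FileID aFile) {
        byte[] data = binaries.get(aFile);
        if (data == null) {
            throw new NoSuchElementException("No file " + aFile.value());
        }
        return data.clone();
    }

    // Direct annotations only; an unknown file yields an empty array.
    Annotation[] describe(FileID aFile) {
        return annotations.getOrDefault(aFile, List.of()).toArray(new Annotation[0]);
    }
}
```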
Get a file in a specific format using an InForma transformation
This should be external to GV
Find file by annotations
In the simple form, constraints describe annotations directly associated with the file
FileID[] find(AnnotationConstraint[] aConstraints)
An annotation constraint includes a relation, a value object and/or a filter on the value
For complex cases, SPARQL queries with constraints over the file ontology
GUI query builder/wizard with file metadata
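The simple form of find can be sketched as follows, assuming an AnnotationConstraint holds a relation plus an optional exact value and an optional filter (null meaning "unconstrained"). A file matches when every constraint is satisfied by at least one of its direct annotations; this conjunctive reading, and the class FindSketch, are assumptions.

```java
import java.util.*;
import java.util.function.Predicate;

record FileID(String value) {}
record Annotation(String relation, String value) {}

// Relation plus optional exact value and optional filter; null means "not constrained".
record AnnotationConstraint(String relation, String value, Predicate<String> filter) {
    boolean matches(Annotation a) {
        return a.relation().equals(relation)
            && (value == null || value.equals(a.value()))
            && (filter == null || filter.test(a.value()));
    }
}

final class FindSketch {
    private final Map<FileID, List<Annotation>> index = new LinkedHashMap<>();

    void add(FileID id, Annotation... annotations) {
        index.computeIfAbsent(id, k -> new ArrayList<>()).addAll(List.of(annotations));
    }

    // Every constraint must be met by at least one direct annotation of the file.
    // An empty constraint array therefore matches all files.
    FileID[] find(AnnotationConstraint[] aConstraints) {
        return index.entrySet().stream()
            .filter(e -> Arrays.stream(aConstraints)
                .allMatch(c -> e.getValue().stream().anyMatch(c::matches)))
            .map(Map.Entry::getKey)
            .toArray(FileID[]::new);
    }
}
```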
Search is an essential part of the system and the most complicated one.
Efficient (DB/KB performance)
Easy for users (GUI, Web)
Browsing methods should also be added for easier navigation
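A GUI query builder would ultimately have to emit SPARQL for the complex cases. This sketch turns a list of simple constraints into a SELECT query; the prefix http://example.org/gv#, the QueryConstraint shape and the variable naming are hypothetical, and values are inserted unescaped, so real user input would need escaping first.

```java
import java.util.List;

// Hypothetical builder input: relation, comparison operator and a SPARQL term as value.
record QueryConstraint(String relation, String op, String value) {}

final class SparqlBuilder {
    // One triple pattern per constraint: "=" binds the value directly,
    // any other operator binds a fresh variable and adds a FILTER on it.
    static String build(List<QueryConstraint> constraints) {
        StringBuilder q = new StringBuilder("PREFIX gv: <http://example.org/gv#>\nSELECT ?file WHERE {\n");
        int i = 0;
        for (QueryConstraint c : constraints) {
            if (c.op().equals("=")) {
                q.append("  ?file gv:").append(c.relation()).append(' ').append(c.value()).append(" .\n");
            } else {
                String v = "?v" + i++;
                q.append("  ?file gv:").append(c.relation()).append(' ').append(v).append(" .\n")
                 .append("  FILTER (").append(v).append(' ').append(c.op()).append(' ')
                 .append(c.value()).append(")\n");
            }
        }
        return q.append('}').toString();
    }
}
```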
Idea:
Faceted browsing for files, e.g.
http://e-culture.multimedian.nl/demo/facet
Use of Leon’s tool for browsing files and their descriptions
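Faceted browsing rests on one computation: for a chosen relation (the facet), count how many files carry each value, then narrow the file set when the user picks a value. A sketch of the counting step, assuming annotations are relation/value string pairs keyed by file ID; the class Facets is hypothetical and unrelated to how the linked demo is implemented.

```java
import java.util.*;
import java.util.stream.Collectors;

record Annotation(String relation, String value) {}

final class Facets {
    // For one relation, counts how many files carry each value (sorted by value).
    // distinct() per file ensures a file is counted once even if an annotation repeats.
    static Map<String, Long> counts(Map<String, List<Annotation>> files, String relation) {
        return files.values().stream()
            .flatMap(list -> list.stream().distinct())
            .filter(a -> a.relation().equals(relation))
            .collect(Collectors.groupingBy(Annotation::value, TreeMap::new, Collectors.counting()));
    }
}
```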
FileID insert(Object aBinaries, InForma aType)
boolean annotate(FileID aFile, Annotation aMetadata)
boolean removeAnnotation(FileID aFile, Annotation aMetadata)
Object getFile(FileID aFile)
Annotation[] describe(FileID aFile)
FileID[] find(AnnotationConstraint[] aConstraints)
boolean delete(FileID aFile)
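Collected into a single Java interface, the API reads as below. The four value types are placeholders, since the proposal names Annotation and AnnotationConstraint but does not define their fields.

```java
// Placeholder types: their real structure is not specified in this proposal.
record FileID(String value) {}
record InForma(String typeName) {}
record Annotation(String relation, Object object) {}
record AnnotationConstraint(String relation, Object value) {}

// The complete Glyco Vault operation set, one method per listed signature.
interface GlycoVault {
    FileID insert(Object aBinaries, InForma aType);
    boolean annotate(FileID aFile, Annotation aMetadata);
    boolean removeAnnotation(FileID aFile, Annotation aMetadata);
    Object getFile(FileID aFile);
    Annotation[] describe(FileID aFile);
    FileID[] find(AnnotationConstraint[] aConstraints);
    boolean delete(FileID aFile);
}
```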
Type: Annotation
Type: AnnotationConstraint
Access to the file ontology is needed to construct proper annotations
Use of defined resources, e.g. PCR Machine
Creation of resources
Specific instances
Classes and hierarchies
Listing available types and/or formats for annotations
Possible access:
Choice from pre-populated list
Result of ‘can query’
Suggest as you type (Lucene index)
List of classes with attributes and individuals (LSDIS library)
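Suggest-as-you-type amounts to prefix lookup over class and individual labels. The sketch below swaps the Lucene index for a plain sorted map (TreeMap), which is enough to show the idea but offers none of Lucene's tokenization or ranking; the class Suggester is hypothetical.

```java
import java.util.*;

// Prefix suggestions over ontology labels using a sorted map as a stand-in for Lucene.
final class Suggester {
    // Lower-cased label -> original label (later duplicates overwrite earlier ones).
    private final TreeMap<String, String> labels = new TreeMap<>();

    void add(String label) {
        labels.put(label.toLowerCase(Locale.ROOT), label);
    }

    // Keys starting with the prefix all fall in [prefix, prefix + Character.MAX_VALUE),
    // so one subMap call returns them in sorted order.
    List<String> suggest(String typed, int limit) {
        String p = typed.toLowerCase(Locale.ROOT);
        return labels.subMap(p, true, p + Character.MAX_VALUE, false).values().stream()
            .limit(limit)
            .toList();
    }
}
```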
The file system uses an ontology, so administrators must be able to modify the existing ontology to handle new types of annotations
Maybe the creation of some classes and instances can be delegated to specific users?
During development, the easiest option is direct ontology access
Direct insert/delete of triples
Use of SPARQL
Later, a GUI will be required
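For direct triple access during development, even a tiny in-memory store shows the operations involved: add, remove, and matching one pattern with wildcards, which is a small subset of a SPARQL basic graph pattern. This is a hypothetical sketch, not the Jena API that the architecture names.

```java
import java.util.*;

record Triple(String s, String p, String o) {}

// Minimal in-memory triple set: direct add/remove plus single-pattern matching,
// where a null argument acts as a wildcard.
final class TripleStore {
    private final Set<Triple> triples = new LinkedHashSet<>();

    boolean add(String s, String p, String o) {
        return triples.add(new Triple(s, p, o));
    }

    boolean remove(String s, String p, String o) {
        return triples.remove(new Triple(s, p, o));
    }

    List<Triple> match(String s, String p, String o) {
        return triples.stream()
            .filter(t -> (s == null || s.equals(t.s()))
                      && (p == null || p.equals(t.p()))
                      && (o == null || o.equals(t.o())))
            .toList();
    }
}
```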
[Figure: Architecture. Components: API; Jena (ontology management); D2R server; Oracle platform holding Data (bin), Data (ont) and the SQL experiment tables. Access paths: SPARQL, read-only JDBC, and full JDBC (administrator).]
Access via JDBC and SPARQL
Ontologies will hold a pointer to the SQL data (possibly a query) describing how to get the real values
(?) Data will be duplicated between the SQL tables and the ontology.
What should be granularity of ontology data?
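One way to keep values out of the ontology is for it to store only a pointer, here a SQL query, that is resolved on demand. In this sketch the executor function stands in for a read-only JDBC call; SqlPointer, PointerResolver and the pcr_results table are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// What the ontology would store: how to obtain the values, not the values themselves.
record SqlPointer(String query) {}

final class PointerResolver {
    // Stands in for a read-only JDBC call: query text in, rows (column -> value) out.
    private final Function<String, List<Map<String, Object>>> executor;

    PointerResolver(Function<String, List<Map<String, Object>>> executor) {
        this.executor = executor;
    }

    List<Map<String, Object>> resolve(SqlPointer pointer) {
        return executor.apply(pointer.query());
    }
}
```

The trade-off mirrors the open question above: pointers avoid duplicating data but cost a query per lookup, whereas values copied into the ontology can be queried directly with SPARQL.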
JDBC access is read-only for consistency purposes
Modification is allowed only for administrators
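A coarse sketch of the rule that only administrators may modify the experiment tables, assuming statements are checked before being sent over JDBC; Role and SqlAccessPolicy are hypothetical. Keyword checks like this are easy to get wrong, so the dependable way to keep ordinary JDBC access read-only is a database account that has only SELECT privileges.

```java
import java.util.Locale;

enum Role { USER, ADMINISTRATOR }

// Hypothetical gatekeeper: anyone may run SELECT statements,
// only administrators may run anything else.
final class SqlAccessPolicy {
    static boolean allowed(Role role, String sql) {
        String head = sql.stripLeading().toUpperCase(Locale.ROOT);
        boolean readOnly = head.startsWith("SELECT");
        return readOnly || role == Role.ADMINISTRATOR;
    }
}
```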