Interface for Glyco Vault Functionality and requirements. Initial proposal. Maciej Janik

advertisement

Interface for Glyco Vault

Functionality and requirements.

Initial proposal.

Maciej Janik

Glyco Vault

 Serves as a storage for files with experiment results, item descriptions, ontologies, etc.

 Data is heterogenous.

 Amount of data will be massively increasing.

 Hierarchical storage (as file system) is not enough – users need to add annotations.

 Users should be able to find interesting data

(relatively) easy.

Interface functionality (basic)

What we expect from Glyco Vault?

 Insert file to GV

 Get file from GV

 Delete specific file from GV

 Annotate with metadata

 Get file annotations

 Find file(s)

Sample file properties in GV

 Basic

 ID (unique)

 Name

 Date created

 User (owner)

 File type (InForma)

Modification record

 Specific (extended)

 Machine used for experiment

 Associated workflow

 Names of used samples

 Version number

Other linked files

File description (metadata)

File

ID: GV-123

Name: ABC.txt

Created: 2007-02-13

Owner: Will

Machine: PCR

Tissue: Parrot

Version: 9

We cannot limit properties to currently known – schema must be flexible to accommodate future changes.

Ontological File System for GV

GV-123

ID

2007-02-13

Creation date

Will

Owner

Binaries File

Name

ABC.txt

Version

9

Machine

PCR

Tissue

Parrot

Ontology with File-centric view. File is connected to its properties and other Files

Protocol data in GlycoVault

Prepare samples

Run PCR machine

PCR configuration

Machine description

(use of resource)

Get raw data

Statistical

Analysis

Final result

Intermediate analytic data

Description of samples

Well assignment

Person preparing

Date/time of experiment

Raw results (binaries)

Raw results

(values with annotation)

Final values of fluorescence

Assigned to specific genes

Ontologies for annotation

Descriptions of used protocols and produced data

Attributes and relations defined by users (not captured by other ontologies)

ProPreO

Core descriptions for all stored files

(id, name, user, date …)

Processes

Data provenance

Machines, etc.

ProtocolSchema

ProtocolSchema

ProtocolSchema

CoreFileOnt

UserDefined attributes

GlycO and others …

Data-specific ontologies

Insert and annotate file

 Insert simple file with its type

 FileID insert(Object aBinaries,

InForma aType)

 Manage annotations

 boolean annotate(FileID aFile,

Annotation aMetadata)

 boolean removeAnnotation(FileID aFile,

Annotation aMetadata)

 Annotations

Relation  Resource (object)

Relation  Literal (value)

 Use of PropreO ontolgy (in some part)

 Add standard, independent file metadata

Get file and its annotations

 Get raw data file from GV

 Object getFile(FileID aFile)

 return binaries associated with file stored in

Glyco Vault

 Get file descriptions

 Annotation[] describe(FileID aFile)

 return all direct annotations of a file to display for user

 Get file in specific format using InForma transformation

 should be external to GV

Search for files

 Find file by annotations

 In simple form, describe constraints directly associated with file

FileID[] find(AnnotationConstraint[] aConstraints)

Annotation constraint includes relation, value object and/or filter on value

 SPARQL queries with constraints in file ontology for complex cases

GUI query builder/wizard with file metadata

Search for files

 Search is essential part of system and most complicated one.

 Efficient (DB/KB performance)

 Easy for users (GUI, Web)

 Browsing methods should be also added for easer navigation

 Idea:

Faceted type browsing for files, eg.

http://e-culture.multimedian.nl/demo/facet

Use of Leon’s tool for browsing files and their descriptions

Interface (initial proposal)

FileID insert(Object aBinaries, InForma aType) boolean annotate(FileID aFile,

Annotation aMetadata) boolean removeAnnotation(FileID aFile,

Annotation aMetadata)

Object getFile(FileID aFile)

Annotation[] describe(FileID aFile)

FileID[] find(AnnotationConstraint[] aConstraints) boolean delete(FileID aFile)

Type: Annotation

Type: AnnotationConstraint

File ontology access

Access to file ontology is needed to construct proper annotation

Use of defined resources, eg. PCR Machine

Creation of resources

Specific instances

Classes and hierarchies

Listing available types and/or formats for annotations

Possible access:

Choice from pre-populated list

Result of ‘can query’

Suggest as you type (Lucene index)

 List of classes with attributes and individuals

(LSDIS library)

Administrators

 File system uses ontology, so administrators must be able to modify existing ontology to handle new types of annotations

 Maybe creation of some classes and instances can be pushed to specific users?

 During development easiest is to use direct ontology access

 Direct insert/delete of triples

Use of SPARQL

Later  some GUI interface would be required

SPARQL

GlycoVault on Oracle and Jena

JDBC read only

Oracle platform

Jena

Ont Management

D2R server

SQL experiment tables full JDBC

(administrator)

Data (bin)

Data (ont)

API

GlycoVault – hybrid access

 Access via JDBC and SPARQL

 Ontologies will have a pointer to SQL data, can be a query, how to get real values

 (?) Data will be duplicated between SQL tables and ontology.

What should be granularity of ontology data?

 JDBC access is read only for consistency purposes

Modification allowed only for administrators

Download