e-Science Data Information and Knowledge Transformation
BinX
An edikt Project Testbed
Ted Wen, Robert Carroll,
Denise Ecklund, Bob Gibbins,
Davy Virdee, Rob Baxter
Presentation outline
• edikt project
• A data problem
• BinX – today
– language
– library
– applications
• BinX – future
www.edikt.org
What is edikt?
• e-Science Data, Information and Knowledge Transformation
– a research and development activity designed to bridge the gap between application science and computer science in the realm of Grid-scale data
– take prototypes from CS and Grid research…
– …engineer them into robust tools…
– …for real application science problems…
– …test them under extreme science conditions…
– …and keep an eye on the commercial possibilities
• Team of 8 professional engineers, management & staff
• Funded by SHEFC; project started in May 2002
Current activities
• edikt::Eldas
– proving GGF’s GDSS for virtual organisations
– developing scalable data access technologies
• edikt::BinX
– data interchange for astronomy & particle physics (PP)
• edikt::Giggle and RLS
– evaluation of data replication technology for PP
• Bioinformatics
– data mediation to integrate multiple data sources
– data versioning to manage changing schemas
“eScience Data”
Real-World and In Silico Experiments
Research and discovery workflow
[Figure: workflow diagram linking Real-world Experiments, Data, Analysis, Abstract Model, In silico Experiments, Result Data, App areas 1 to 4 and Results, with boxes labelled “C” between stages]
• Workflow support tools
– Format converter
– Model builder
• Generic Tools
– Existing tools: XML processors
– New tools: Perl script generators, model description generators
Data integration & mediation
[Figure: Data from Real-world Experiments flows into an Integrator/Mediator, which produces Integrated Data]
• Distributed Geo-sensors
– One sensor type with overlapping observation regions
– Resolve conflicting values in the overlap
– Compute “total space”: min or max? If max, define missing values
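As a rough illustration of the geo-sensor case, the sketch below assumes each sensor reports values keyed by cell id on a shared grid. It reads “min” as the intersection of the observed regions and “max” as the whole grid with unobserved cells marked missing, and it resolves conflicts in the overlap by averaging. The names `merge_sensors` and `MISSING`, the grid model and the averaging rule are all assumptions made for illustration; none of them comes from edikt or BinX.

```python
from statistics import mean

MISSING = None  # assumed marker for cells that no sensor observed


def merge_sensors(readings, grid, total_space="max"):
    """Merge per-sensor readings {cell: value} into one map.

    total_space="min": keep only cells seen by every sensor (intersection).
    total_space="max": keep every cell in `grid`, filling gaps with MISSING.
    Conflicting values in overlapping cells are averaged (an assumed policy).
    """
    if total_space == "min":
        cells = set(grid)
        for reading in readings:
            cells &= set(reading)
    elif total_space == "max":
        cells = set(grid)
    else:
        raise ValueError("total_space must be 'min' or 'max'")

    merged = {}
    for cell in sorted(cells):
        values = [r[cell] for r in readings if cell in r]
        merged[cell] = mean(values) if values else MISSING
    return merged


sensor_a = {0: 10.0, 1: 12.0, 2: 14.0}
sensor_b = {2: 16.0, 3: 18.0}
grid = range(5)

print(merge_sensors([sensor_a, sensor_b], grid, "min"))
print(merge_sensors([sensor_a, sensor_b], grid, "max"))
```

Choosing “max” forces an explicit decision about how missing values are represented, which is the point the slide raises.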

• Public Biochemical Signalling DBs
– Match the input records
– Build integrated records
– Detect data value conflicts
– Resolve data value conflicts
[Figure: sources S1 to S6 feed a table of Reaction 1 … Reaction n, each with values D1, D2, D3]
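The four integration steps for the signalling databases (match, build, detect, resolve) can be sketched as follows. This is a minimal sketch under stated assumptions: records are plain dictionaries, matching is by an exact `reaction` key, and the resolution policy `prefer_s1` is hypothetical. Real mediation would need fuzzier matching and domain-specific resolution rules.

```python
from collections import defaultdict


def integrate(sources, key="reaction", resolve=None):
    """Match records across sources by `key`, merge them, and report conflicts."""
    # Step 1: match the input records that describe the same reaction.
    matched = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            matched[record[key]].append((source_name, record))

    integrated, conflicts = {}, []
    for reaction, group in matched.items():
        # Step 2: build one integrated record per reaction.
        candidates = defaultdict(dict)
        for source_name, record in group:
            for field, value in record.items():
                if field != key:
                    candidates[field][source_name] = value

        merged = {key: reaction}
        for field, by_source in candidates.items():
            distinct = set(by_source.values())
            if len(distinct) > 1:
                # Step 3: detect a data value conflict.
                conflicts.append((reaction, field, dict(by_source)))
                # Step 4: resolve it with the supplied policy, else leave unset.
                merged[field] = resolve(by_source) if resolve else None
            else:
                merged[field] = distinct.pop()
        integrated[reaction] = merged
    return integrated, conflicts


def prefer_s1(by_source):
    """Assumed policy: trust source S1 when it supplies a value."""
    return by_source.get("S1", next(iter(by_source.values())))


sources = {
    "S1": [{"reaction": "Reaction 1", "D1": 0.5, "D2": 7}],
    "S2": [{"reaction": "Reaction 1", "D1": 0.7, "D3": "x"},
           {"reaction": "Reaction 2", "D1": 1.2}],
}
integrated, conflicts = integrate(sources, resolve=prefer_s1)
print(integrated)
print(conflicts)
```

Returning the conflict list alongside the integrated records keeps detection separate from resolution, so a different policy can be tried without re-matching the inputs.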
Data subsets
[Figure: Legacy Data from Real-world Experiments (1953) and New Data from Real-world Experiments (today) feed Analysis / Results and New Analysis / New Results; connectors are labelled “S” and “C”]
• Legacy data was not organized for the new analysis
– Extract a data subset
– Define the subset by queries
Structural metadata query: “What is the minimum geo-space data coverage?”
Simple semantic query: “What reactions require 2 or more inhibitor agents to prevent the reaction?”
Complex semantic query: “What objects are contained in a 3-dimensional image?”
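To make “define the subset by queries” concrete, the sketch below expresses the simple semantic query as a predicate over in-memory records. The `Reaction` type, its `inhibitors` field and the sample data are hypothetical; a real legacy data set would need its own record model and query mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    name: str
    # Assumed meaning: agents that must all be present to prevent the reaction.
    inhibitors: list = field(default_factory=list)


def subset(records, predicate):
    """Return the records selected by a query expressed as a predicate."""
    return [r for r in records if predicate(r)]


legacy = [
    Reaction("R1", ["A"]),
    Reaction("R2", ["A", "B"]),
    Reaction("R3", ["B", "C", "D"]),
    Reaction("R4"),
]

# "What reactions require 2 or more inhibitor agents to prevent the reaction?"
needs_two_or_more = subset(legacy, lambda r: len(r.inhibitors) >= 2)
print([r.name for r in needs_two_or_more])
```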
BinX for binary data
• BinX is a foundation tool for these problems when the data is a structured binary file.
[Figure: three panels showing where BinX applies.
Workflow, format conversion: Binary data1 (with BinX XML1) passes through BinX-based format conversion to give Binary data2 (with BinX XML2).
Data Subsets: binary data from a real-world experiment, with a BinX XML description, yields a binary data subset (S).
Data Integration: binary data D1, D2, D3 from Exp1 to Exp3 (S1 to S3) is integrated into integrated binary data (I-D).]
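The idea behind description-driven format conversion can be sketched with Python's standard `struct` and `xml.etree.ElementTree` modules. The XML vocabulary below (`description`, `field`, `byteOrder`, `type`) is a hypothetical, simplified stand-in and is not the real BinX schema, and the functions `layout`, `read_records` and `convert` are illustrative names rather than the BinX library API. The sketch only shows how an external description lets generic code read a structured binary file and re-encode it in another byte order.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical, simplified description; NOT the real BinX schema.
DESCRIPTION = """
<description byteOrder="big">
  <field name="id" type="int32"/>
  <field name="temperature" type="float64"/>
</description>
"""

TYPE_CODES = {"int32": "i", "float64": "d"}  # struct format characters
ORDER_CODES = {"big": ">", "little": "<"}    # struct byte-order prefixes


def layout(description_xml, byte_order=None):
    """Build a struct format string and the field names from the description."""
    root = ET.fromstring(description_xml)
    order = byte_order or root.get("byteOrder")
    fields = root.findall("field")
    fmt = ORDER_CODES[order] + "".join(TYPE_CODES[f.get("type")] for f in fields)
    return fmt, [f.get("name") for f in fields]


def read_records(data, description_xml):
    """Decode every fixed-size record in `data` into a dict."""
    fmt, names = layout(description_xml)
    return [dict(zip(names, values)) for values in struct.iter_unpack(fmt, data)]


def convert(data, description_xml, target_order):
    """Re-encode the records in another byte order (a simple format conversion)."""
    src_fmt, _ = layout(description_xml)
    dst_fmt, _ = layout(description_xml, target_order)
    return b"".join(struct.pack(dst_fmt, *values)
                    for values in struct.iter_unpack(src_fmt, data))


big_endian = struct.pack(">id", 7, 21.5) + struct.pack(">id", 8, 19.0)
print(read_records(big_endian, DESCRIPTION))
little_endian = convert(big_endian, DESCRIPTION, "little")
```

Because the layout lives in the description rather than in the code, the same functions work for any file whose fields are covered by `TYPE_CODES`.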