e-Science Data Information and Knowledge Transformation
The edikt Project
An overview
Dr Rob Baxter, edikt project manager
What is edikt?
• e-Science Data, Information and Knowledge Transformation
– a research and development activity designed to bridge the gap between application science and computer science in the realm of Grid-scale data
  • take prototypes from CS and Grid research…
  • …engineer them into robust tools…
  • …for real application science problems…
  • …test them under extreme science conditions…
  • …and keep an eye on the commercial possibilities
• Leveraged from Edi-Gla NeSC bond
edikt Overview – 31 May 2016
www.edikt.org
What is edikt?
[Figure: "the virtuous circle" diagram. Labels: research funding; concepts & ideas; software development; tools & techniques; scientific applications; evaluation & testing; commercial application; market identification; product]
Where is edikt?
• Started in May 2002
– originally in the eScience Institute
• Moved March 2003
– across the road to Old College
  • newly refurbished offices, courtesy of the College of Science & Engineering
  • good team environment
Who are edikt?
• The Team
May 2002
July 2002
November 2002
March 2003
The Scientific Advisory Board
What are we doing?
• “Data engineering for e-Science” [© Prof. D. Gilbert, Sep 2003]
• Developing and testing tools in four areas
– astronomy
– experimental particle physics
– bioinformatics
– virtual organisations and the Grid
– others to follow…
Common ground
• Common requirements have emerged between application testbeds
– data source integration
  • VOs, astronomy, bioinformatics
– data format interchange
  • VOs, astronomy, bioinformatics, particle physics
– web/Grid service frameworks
  • everyone: the e-Science architecture du jour
• This commonality is reflected in activities
– more of a technology split than an application split
Early activities
• May-Jun 2002 (PM, 2 engineers)
– planning and infrastructure
• Jul-Sep 2002 (PM, 5 engineers)
– requirements and technology evaluations
  • some success, though tricky across the testbed areas of particle physics (PP), astronomy and bioinformatics
• Oct-Dec 2002 (PM, 8 engineers)
– current activities defined and kicked off
• Jan 2003 onwards (PM, architect, 8 engineers)
– full steam ahead!
Current activities
• edikt::Eldas
– developing scalable data access technologies
• edikt::Giggle
– evaluation of data replication technology for PP
• edikt::BinX
– data interchange for astronomy & PP (& bioinfo, Earth science?)
• edikt::Osage
– extending the Edinburgh Mouse Atlas
• Bioinformatics
– early explorations, new collaborations with Glasgow, Edinburgh
– £105k purchase underpinning CPU at Glasgow
• Earth sciences
– early explorations underway
edikt::Eldas
• Network data integration is a “holy grail”
– for science, for business
• OGSA Data Access and Integration Services
– uniform access to database systems over Grid
  • leverage web/Grid services technology
  • relational (SQL), semi-structured (XML) databases
  • files and directories (soon)
– related to OGSA-DAI e-Science GCP project
  • EPCC, IBM and others
  • producing specifications and reference codes
• The technology for enterprise integration?
– for VOs in science and business
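The idea of uniform access to relational and XML data sources can be pictured with a small Python sketch. It is an illustration only and does not use the OGSA-DAI or Eldas interfaces: the class names (`DataResource`, `RelationalResource`, `XmlResource`), the `query` method and the sample data are all hypothetical, and everything runs locally rather than over Grid services.

```python
import sqlite3
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class DataResource(ABC):
    """Hypothetical common interface: every source answers query() with a list of dicts."""

    @abstractmethod
    def query(self, expression):
        ...


class RelationalResource(DataResource):
    """Wraps a SQL database; the expression is an SQL statement."""

    def __init__(self, connection):
        self.connection = connection

    def query(self, expression):
        cursor = self.connection.execute(expression)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


class XmlResource(DataResource):
    """Wraps an XML document; the expression is an ElementTree path."""

    def __init__(self, root):
        self.root = root

    def query(self, expression):
        return [dict(element.attrib) for element in self.root.findall(expression)]


# Made-up sample data held in two different kinds of store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples (name TEXT, site TEXT)")
db.execute("INSERT INTO samples VALUES ('s1', 'Edinburgh')")
doc = ET.fromstring('<samples><sample name="s2" site="Glasgow"/></samples>')

# The caller treats both sources identically and merges the results.
resources = [
    (RelationalResource(db), "SELECT name, site FROM samples"),
    (XmlResource(doc), "sample"),
]
rows = [row for resource, expression in resources for row in resource.query(expression)]
print(rows)
```

The point of the sketch is the design choice: client code depends only on the shared interface, so a further source type (files and directories, say) could be added without changing the caller.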
edikt::Eldas
• Porting/testing of OGSA-DAI software
– expanding range of platforms, databases
– developing framework for strain testing
• EJB version of OGSA-DAI
– are EJBs best for scalability & performance?
– design analysis completed, implementation begun
  • alpha version on internal release for testing now
  • version 1.0 public release beginning Q4
  • expect to provide leverage for bioinfo, astro, other apps
• Complementary to OGSA-DAI projects
– common oversight through PM
edikt::Giggle
• EDG/Globus Replica Location Service (RLS)
– “intelligent data caching”
  • finding the “closest” copy of data to speed analysis
• Evaluation for Glasgow PP (Doyle)
– can RLS deliver for PP?
– early versions were difficult to work with…
• Latest version (T2, released in April) now under test
– functionality and performance test suites run
– final profiling work to complete
• Short follow-on to compare with other technologies
– latest version of SDSC’s SRB
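Finding the "closest" copy of a file can be sketched in a few lines of Python. This is not the RLS API: the catalogue layout, the host names, the cost table and the function `closest_replica` are assumptions made for illustration, with "closeness" modelled as an assumed network cost from the client to each host.

```python
from urllib.parse import urlparse

# Hypothetical replica catalogue: logical file name -> physical copies.
CATALOGUE = {
    "lfn:run42.dat": [
        "gsiftp://site-a.example.org/data/run42.dat",
        "gsiftp://site-b.example.org/data/run42.dat",
    ],
}

# Assumed cost from this client to each host (for example, round-trip time in ms).
COST_MS = {"site-a.example.org": 35.0, "site-b.example.org": 4.0}


def closest_replica(lfn, catalogue, cost):
    """Return the physical copy of lfn whose host has the lowest known cost."""
    replicas = catalogue.get(lfn, [])
    if not replicas:
        raise KeyError(f"no replicas registered for {lfn}")
    # Hosts missing from the cost table are treated as infinitely far away.
    return min(replicas, key=lambda url: cost.get(urlparse(url).hostname, float("inf")))


best = closest_replica("lfn:run42.dat", CATALOGUE, COST_MS)
print(best)
```

A real service would also have to keep the catalogue consistent as copies are created and deleted, which this sketch ignores.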
edikt::BinX
• Data format interchange a big issue
• The BinX XML Schema (Westhead/Bull, EPCC)
– describes the structure of binary files
– XML is “wordy” for storing data, but excellent for describing data
• BinX library, part of Astronomy testbed work
– though generating interest from PP, bioinformatics, materials modelling (RealityGrid), Earth science
– v1.0 released October 2003
– mini-seminar held 6 May; bigger one planned
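The principle that XML can describe binary data without storing it can be illustrated with a short Python sketch. The element and attribute names used here (`layout`, `field`, `byteOrder`, `name`, `type`) are hypothetical and are not the BinX schema, and the sketch does not use the BinX library; it only shows a description driving the decoding of a binary record.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description language for illustration; NOT the real BinX schema.
DESCRIPTION = """
<layout byteOrder="big">
  <field name="run" type="int32"/>
  <field name="energy" type="float64"/>
  <field name="flags" type="uint16"/>
</layout>
"""

# Hypothetical type names mapped to Python struct format codes.
TYPE_CODES = {"int32": "i", "uint16": "H", "float64": "d"}


def read_record(description_xml, payload):
    """Decode one binary record according to the XML description."""
    root = ET.fromstring(description_xml)
    order = ">" if root.get("byteOrder") == "big" else "<"
    fields = root.findall("field")
    fmt = order + "".join(TYPE_CODES[f.get("type")] for f in fields)
    values = struct.unpack(fmt, payload)
    return {f.get("name"): value for f, value in zip(fields, values)}


# Build a sample big-endian record, then decode it using only the description.
payload = struct.pack(">idH", 42, 1.5, 7)
record = read_record(DESCRIPTION, payload)
print(record)
```

The binary payload stays compact (14 bytes here), while the XML carries the byte order, field names and types that another application would need to read it.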
edikt::Osage
• Ontology-based Species Atlas for Gene Expression
• Extension of Edinburgh Mouse Atlas
– MRC Human Genetics Unit at WGH
• Phase I underway
– started September 2003
– porting object → relational db
– developing new query services
– leverage Eldas
Other bioinformatics
• Early exploratory work: emTree
– investigations of external memory indexing for large genome data with Glasgow DCS (Hunt)
• New collaborations emerging with Glasgow BRC (Gilbert)
• Strong prospects identified
– BPS, Amaze
– interest in leveraging Eldas, BinX
• Key science area for this year
Spoilt for choice
• In the pipeline:
– Earth science applications
– XML storage & annotation (Buneman, Ed Informatics)
– data mining (astronomy, bioinfo, others)
– further data replication technologies (PP, others)
– large indexes [emTree] (astronomy, bioinfo)
– application/data integration (everyone)
• Too many!
– although we’re always looking for interesting collaborations
Summary
• First year has focused on
– project & team infrastructure
– CS technology infrastructure
• Second year is focused on
– application science
  • put our key technologies to work
  • continue successful astronomy work
  • begin key projects in bioinformatics
  • begin new testbed in Earth science