e-Science Data Information and Knowledge Transformation
The edikt Project
An overview
Dr Rob Baxter, edikt project manager

What is edikt?
e-Science Data, Information and Knowledge Transformation
– a research development activity designed to bridge the gap between applications science and computer science in the realms of Grid-scale data
    take prototypes from CS and Grid research…
    …engineer them into robust tools…
    …for real application science problems…
    …test them under extreme science conditions…
    …and keep an eye on the commercial possibilities
Funded by SHEFC (initially £2.25M / 3 years)
Team of architect + 8 engineers
– plus support and overhead (me :-)

edikt Overview – 31 May 2016 www.edikt.org

What are we doing?
“Data engineering for e-Science” [© Prof. D. Gilbert, Sep 2003]
Developing and testing tools in four areas
– astronomy
– experimental particle physics
– bioinformatics
– virtual organisations and the Grid
– earth sciences (just beginning)

Common ground
Common requirements have emerged between application testbeds
– data source integration
    VOs, astronomy, bioinformatics
– data format interchange
    VOs, astronomy, bioinformatics, particle physics
– web/Grid service frameworks
    everyone - the e-Science architecture du jour
This commonality reflected in activities
– more technology split than application split

Current activities
edikt::Eldas
– developing scalable data access technologies on Grid/web svcs
edikt::BinX/AstroBinX
– data interchange for astronomy & PP (& bioinfo, Earth science?)
edikt::Osage
– extending the Edinburgh Mouse Atlas
Bioinformatics
– early explorations, new collaborations with Glasgow, Edinburgh (Prof. D Gilbert, Dr R. Baldock)
Earth sciences
– early explorations underway with Prof. G. Boulton

edikt::Eldas
Network data integration is a “holy grail”
– for science, for business
OGSA Data Access and Integration Services
– uniform access to database systems over Grid
    leverage web/Grid services technology
    relational (SQL), semi-structured (XML) databases
    files and directories
Eldas uses an EJB architecture for scalability
– version 1.0 complete end-of-year
– will trial in applications 2004
– complementary to OGSA-DAI e-Science GCP project

edikt::BinX & AstroBinX
Data format interchange a big issue
The BinX XML Schema (Westhead/Bull, EPCC)
– describes the structure of binary files
– XML is “wordy” for storing data, but excellent for describing data
BinX library, part of Astronomy testbed work
– though generating interest from PP, bioinformatics, materials modelling (RealityGrid), Earth science
– v1.0 released October 2003
– mini-seminar held 6 May; bigger one planned

edikt & geoscience
New application area (“testbed”) for 2004
– looking to staff at c. 1-2 FTEs
Initial ideas on data format interchange
– glaciation simulations
– input/output tools based (?) on BinX
Other ideas most welcome
– I’m here today to listen :-)

Summary
First year has focused on
– project & team infrastructure
– CS technology infrastructure
Second year is focused on
– application science
    put our key technologies to work
    continue successful astronomy work
    continue/expand key projects in bioinformatics
    begin new testbed in Earth science