e-Science Data Information and Knowledge Transformation
The edikt Project
An overview
Dr Rob Baxter, edikt project manager

What is edikt?
e-Science Data, Information and Knowledge Transformation
– a research development activity designed to bridge the gap between applications science and computer science in the realms of Grid-scale data
    take prototypes from CS and Grid research…
    …engineer them into robust tools…
    …for real application science problems…
    …test them under extreme science conditions…
    …and keep an eye on the commercial possibilities
Leveraged from Edi-Gla NeSC bond

edikt Overview – 31 May 2016 www.edikt.org

What is edikt?
[Diagram: “the virtuous circle”. Labels: tools & techniques; Software development; concepts & ideas; Scientific applications; evaluation & testing; Research; funding; Commercial application; market identification; Product]

Where is edikt?
Started in May 2002
– originally in the eScience Institute
Moved March 2003
– across the road to Old College
    newly refurbished offices, courtesy of the College of Science & Engineering
    good team environment

Who are edikt?
The Team
    May 2002, July 2002, November 2002, March 2003
The Scientific Advisory Board

What are we doing?
“Data engineering for e-Science” [© Prof. D.
Gilbert, Sep 2003]
Developing and testing tools in four areas
– astronomy
– experimental particle physics
– bioinformatics
– virtual organisations and the Grid
– others to follow…

Common ground
Common requirements have emerged between application testbeds
– data source integration
    VOs, astronomy, bioinformatics
– data format interchange
    VOs, astronomy, bioinformatics, particle physics
– web/Grid service frameworks
    everyone – the e-Science architecture du jour
This commonality reflected in activities
– more technology split than application split

Early activities
May-Jun 2002 (pm, 2 engineers)
– planning and infrastructure
Jul-Sep 2002 (pm, 5 engineers)
– requirements and tech evaluations
    some success, though tricky across the testbed areas of PP, astronomy and bioinformatics
Oct-Dec 2002 (pm, 8 engineers)
– current activities defined and kicked off
Jan 2003 onwards (pm, architect, 8 engineers)
– full steam ahead!

Current activities
edikt::Eldas
– developing scalable data access technologies
edikt::Giggle
– evaluation of data replication technology for PP
edikt::BinX
– data interchange for astronomy & PP (& bioinfo, Earth science?)
edikt::Osage
– extending the Edinburgh Mouse Atlas
Bioinformatics
– early explorations, new collaborations with Glasgow, Edinburgh
– £105k purchase underpinning CPU at Glasgow
Earth sciences
– early explorations underway

edikt::Eldas
Network data integration is a “holy grail”
– for science, for business
OGSA Data Access and Integration Services
– uniform access to database systems over Grid
    leverage web/Grid services technology
    relational (SQL), semi-structured (XML) databases
    files and directories (soon)
– Related to OGSA-DAI e-Science GCP project
    EPCC, IBM and others producing specifications and reference codes
The technology for enterprise integration?
– for VOs in science and business

edikt::Eldas
Porting/testing of OGSA-DAI software
– expanding range of platforms, databases
– developing framework for strain testing
EJB version of OGSA-DAI
– are EJBs best for scalability & performance?
– design analysis completed, implementation begun
    alpha version on internal release for testing now
    version 1.0 public release beginning Q4
    expect to provide leverage for bioinfo, astro, other apps
Complementary to OGSA-DAI projects
– common oversight through PM

edikt::Giggle
EDG/Globus Replica Location Service
– “intelligent data caching”
    finding the “closest” copy of data to speed analysis
Evaluation for Glasgow PP (Doyle)
– can RLS deliver for PP?
– early versions were difficult to work with…
Latest vn (T2, released in April) now under test
– functionality and performance test suites run
– final profiling work to complete
Short follow-on to compare with other techs
– latest vn.
of SDSC’s SRB

edikt::BinX
Data format interchange a big issue
The BinX XML Schema (Westhead/Bull, EPCC)
– describes the structure of binary files
– XML is “wordy” for storing data, but excellent for describing data
BinX library, part of Astronomy testbed work
– though generating interest from PP, bioinformatics, materials modelling (RealityGrid), Earth science
– v1.0 released October 2003
– mini-seminar held 6 May; bigger one planned

edikt::Osage
Ontology-based Species Atlas for Gene Expression
Extension of Edinburgh Mouse Atlas
– MRC Human Genetics Unit at WGH
Phase I underway
– started September 2003
– porting object relational db
– developing new query services
– leverage Eldas

Other bioinformatics
Early exploratory work: emTree
– investigations of external memory indexing for large genome data with Glasgow DCS (Hunt)
New collaborations emerging with Glasgow BRC (Gilbert)
Strong prospects identified
– BPS, Amaze
– interest in leveraging Eldas, BinX
Key science area for this year

Spoilt for choice
In the pipeline:
– Earth science applications
– XML storage & annotation (Buneman, Ed Informatics)
– data mining (astronomy, bioinfo, others)
– further data replication technologies (PP, others)
– large indexes [emTree] (astronomy, bioinfo)
– application/data integration (everyone)
Too many!
– although we’re always looking for interesting collaborations

Summary
First year has focused on
– project & team infrastructure
– CS technology infrastructure
Second year is focused on
– application science
    put our key technologies to work
    continue successful astronomy work
    begin key projects in bioinformatics
    begin new testbed in Earth science