CS5547 e-Science & Grid Computing
- introduction -
• What is e-Science? What is the Grid?
• Grid middleware
• Virtual Organisations - some issues
• Data access & integration
• Metadata
• MSc in e-Science Technology at-a-glance
http://www.csd.abdn.ac.uk/teaching/levelfive/CS5547

Some definitions
e-Science
“The large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet.
“Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists.” [nesc.ac.uk]
Grid
“An infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources.” [Foster & Kesselman, globus.org]

The Global Grid
[http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt]

UK SuperJANET 4/5 (Links up to 2.5 Gbit/s)
[http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt]

Scale, distribution, complexity
[Figure labels: Person, Cell]
• Multiscale modelling of the heart
• Multiscale modelling of cancer
[http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt]

Large Hadron Collider (LHC)
http://gridportal.hep.ph.ic.ac.uk/rtm/
[http://www.nesc.ac.uk/events/ahm2004/presentations/BobJones.ppt]

e-Science & engineering
[Diagram labels: Airline office, London Airport, New York Airport, Grid, Diagnostics Centre, Maintenance Centre, American data center, European data center, XTO]
Companies: Rolls-Royce, DS&S, Cybula
Universities: York, Leeds, Sheffield, Oxford
Engine Model
Case Based Reasoning
Signal Data Explorer
Engine flight data
[http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt]
“A significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed” [DS&S, 2004]

e-Science workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
[http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt]

Grid middleware: Globus toolkit (GT)
• The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. I. Foster, C. Kesselman, J. Nick, S. Tuecke. Open Grid Service Infrastructure WG, Global Grid Forum, 2002.
• The Anatomy of the Grid: Enabling Scalable Virtual Organizations. I. Foster, C. Kesselman, S. Tuecke. International J. Supercomputer Applications, 15(3), 2001.
[http://www.globus.org]

Grid & Web Services convergence
The definition of WSRF means that the Grid and Web services communities can move forward on a common base.
[http://www.globus.org]

Web & Grid Services
• WS-I: standards that have broad industry support and multiple interoperable implementations
• ‘WS-I+’ profile: specifications that are emerging from a standardisation process and are recognised as being ‘useful’
• Specifications that have entered or will enter a standardisation process but are not stable and are still experimental
[http://www.globus.org]

UK National Grid Service
Interfaces
• OGSI::Lite
Projects
• e-Minerals
• e-Materials
• Orbital Dynamics of Galaxies
• Bioinformatics (using BLAST)
• GEODISE project
• UKQCD Singlet meson project
• Census data analysis
• MIAKT project
• e-HTPX project
• RealityGrid (chemistry)
Users
• Leeds, Oxford, UCL, Cardiff, Southampton, Imperial, Liverpool, Sheffield, Cambridge, Edinburgh, QUB, BBSRC, CCLRC
[http://www.nesc.ac.uk/events/ahm2004/presentations/TonyHey.ppt]

Grid Virtual Organisations - some issues
Forming a VO dynamically
• partner identification
• Service Level Agreements (SLAs)
• QoS, trust, reputation
Operating a VO
• monitoring QoS
• perturbation: coping with failures - and new opportunities!
• policing: what went wrong? who’s to blame?
[www.conoise.org]

Grid Data Service
[Diagram labels: The Engine, perform document, response document, Query Activity, Transform Activity, Delivery Activity, Data Resource Implementation, Role Mapper, element, query, data, credentials, connection, role]
[http://www.ogsadai.org.uk/]

GDS - pipeline example
Activities: Sql Query Statement, Deliver ToURL
    <sqlQueryStatement name="statement">
      <expression>
        select * from myTable where id=10
      </expression>
      <resultSetStream name="MyOutput"/>
    </sqlQueryStatement>
    <deliverToURL name="deliverOutput">
      <fromLocal from="MyOutput"/>
      <toURL>
        ftp://anon:frog@ftp.example.com/home
      </toURL>
    </deliverToURL>
[http://www.ogsadai.org.uk/]

Grid data access & integration
Solutions in place to handle
• heterogeneous data storage
• pipelines / dataflows
• access control
• …
within the Grid service architecture
Major issues remain, including
• provenance - where did it come from, who did what to it?
• data quality - living with variable-quality data (www.qurator.org)
[http://www.ogsadai.org.uk/]
Not specific to e-Science! e.g.
see FirstDIG project

Metadata in e-Science
Publications
• formal/reviewed
• “grey”
• associated artefacts
Experiment datasets
• formally curated
• raw/pre-processed
• in vivo / in vitro / in silico
People
• expert directories
• communities of practice
Scientific method
• experiment workflow
• knowledge roles: hypotheses, observations, predictions, deductions, …
• discourse & natural arguments: proof, refutation, agreement, …
Projects
• formal/funded
• working groups

Managing scientific metadata
[Diagram labels: Evidence, Experiment, Hypothesis, Publication, Agrees With, Disagrees With]
e-Science metadata management platform

Fearlus-G pilot project
[Diagram labels: metadata schema (ontology), desktop client, metadata client, Globus client]

MSc e-Science Technologies: next…
CS5547 e-Science & Grid Computing
• Grid middleware, e-Science workflow, metadata
CS5553 Intelligent Architectures
• technologies for Virtual Organisations
CS5545 Data Interpretation & Communication
• technologies at the data/user-scientist interface
CS5544 E-Technology Workshop
• group project, with an e-Science application
CS5945 MSc Project in E-Technology
• potential to do a project with user-scientists
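
Illustration for the "Managing scientific metadata" slide: the diagram there links Hypothesis, Experiment and Publication items with labels such as Evidence, Agrees With and Disagrees With. A minimal Python sketch of how such links might be stored and queried is given here. It is not the platform's actual schema: the class names, the node identifiers (H1, E1, ...), the direction of each link, and the treatment of Evidence as a relation rather than a node are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Relation labels taken from the slide; treating them as edge types is an assumption.
AGREES_WITH = "Agrees With"
DISAGREES_WITH = "Disagrees With"
EVIDENCE = "Evidence"


@dataclass(frozen=True)
class Node:
    """A metadata item such as a hypothesis, experiment or publication."""
    kind: str  # node type named on the slide, e.g. "Hypothesis"
    name: str  # hypothetical identifier, invented for this sketch


class MetadataGraph:
    """Stores (subject, relation, object) statements about scientific items."""

    def __init__(self):
        # relation -> object -> set of subjects that point at that object
        self._incoming = defaultdict(lambda: defaultdict(set))

    def add(self, subject, relation, obj):
        """Record that `subject` is linked to `obj` by `relation`."""
        self._incoming[relation][obj].add(subject)

    def subjects(self, relation, obj):
        """Return every subject linked to `obj` by `relation`."""
        if relation not in self._incoming:
            return set()
        return set(self._incoming[relation].get(obj, set()))


def summarise(graph, hypothesis):
    """List, per relation, the names of items pointing at one hypothesis."""
    return {
        relation: sorted(n.name for n in graph.subjects(relation, hypothesis))
        for relation in (AGREES_WITH, DISAGREES_WITH, EVIDENCE)
    }


# Hypothetical example data: two rival hypotheses and one experiment.
h1 = Node("Hypothesis", "H1")
h2 = Node("Hypothesis", "H2")
h3 = Node("Hypothesis", "H3")
e1 = Node("Experiment", "E1")

graph = MetadataGraph()
graph.add(h2, AGREES_WITH, h1)
graph.add(h3, DISAGREES_WITH, h1)
graph.add(e1, EVIDENCE, h1)

print(summarise(graph, h1))
# {'Agrees With': ['H2'], 'Disagrees With': ['H3'], 'Evidence': ['E1']}
```

Indexing statements by relation and object keeps the question "what supports or contradicts this hypothesis?" a direct lookup. A real platform would more likely use an ontology-backed store, as the Fearlus-G slide's "metadata schema (ontology)" label suggests.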