Research at the National e-Science Centre
Dr. Dave Berry, Research Manager
www.nesc.ac.uk
6th November 2003

Three Pillars of e-Science Research
- Foundations
  - Edinburgh: Informatics; Physics & Astronomy
  - Glasgow: Computing Science; Physics & Astronomy
- Technology: EPCC; ETF & Testbeds; edikt; Repositories; Computing Industry
- Applications: Research Departments; Research Institutes; Other Universities; Commercial Customers
- Links between the pillars: Apply known results; Focus for new work; Enable new science; Steering of development

Information Grids
- Pillars and links: Foundations; Technology; Applications; Apply known results; Focus for new work; Enable new science; Steering of development
- Names on the diagram: Peter Buneman’s Group; Publishing Scientific Data; GridPP; ScotGrid; OGSA-DAI / DAIT (1,000th download, Sep 2003); edikt: eldas, BinX; ODD-Genes; AstroGrid; BRIDGES; FirstDIG; QCDGrid; Tony Doyle & Steve Playfer; Richard Kenway; Richard Baldock; Biological SpatioTemporal Databases

Computation Grids
- Pillars and links: Foundations; Technology; Applications; Apply known results; Focus for new work; Enable new science; Steering of development
- Names on the diagram: GridPP; ScotGrid; SunDCG (> 3000 doc downloads); ODD-Genes; PGPGrid; RealityGrid; Enhance; Murray Cole; Paul Cockshott

Fabrics and Platforms
- Pillars and links: Foundations; Technology; Applications; Apply known results; Focus for new work; Enable new science; Steering of development
- Names on the diagram: Joe Sventek; AMUSE; Dynamic Configuration of Grid Fabrics; Dependable Grid Services; MS.NETGrid; GridWeaver; OGSA Test Grid; IBM Grid Evaluation; Stuart Anderson; LCFG + SmartFrog

More foundations
- Service Composition: Alan Bundy
  - Deductive Synthesis Techniques …
  - Inferring QoS Properties for Grid Applications
- Mobile Code: Don Sannella, Stephen Gilmore
  - Mobile Resource Guarantees
- IRCs: CoAKTinG, EQUATOR (Austin Tate, Matthew Chalmers)
- Security: Technologies for Information Environment Security

More applications
- Physics: CDF Grid Development
- NeuroInformatics: David Wilshaw; Grid-enabled Modelling Tools and Databases for Neuroinformatics
- BioInformatics: e-Diamond (mammography); Rob Procter
- http://www.nesc.ac.uk/projects/

Data Repositories
- Medical Genetics: Generation Scotland
- Human Genetics Unit: Mouse Atlas; Nuclear Protein Database
- Roslin Institute: ArkDB
- Informatics: EUSTACE Corpus; FlyTrap
- GeoSciences: Antarctic Survey data; Continental seismic survey data; BGS offshore survey

Example: ODD-Genes
- ODD-Genes is a demonstrator
  - Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery
  - SunDCG’s TOG software allows for job submission on remote compute resources
  - OGSA-DAI provides access, control and discovery of data resources
- ODD-Genes used to investigate Wilms Tumour
  - Routine statistical conditioning of microarray results
  - Data-driven discovery of novel targets for investigation and potential therapy
- Collaborative project
  - NeSC/EPCC
  - Scottish Centre for Genomic Technology and Informatics (GTI)
  - Human Genetics Unit at MRC, Western General Hospital (HGU)
- "This project has demonstrated how Grid technologies can be used to enable true e-Science - discoveries that would not otherwise have been achieved without this infrastructure in place." Professor Peter Ghazal, Director, GTI.

SunDCG – Enabling Routine Statistical Conditioning
- Choose analysis to perform
- Automates analysis process
- Provides predetermined workflow
- Can run more than one analysis at a time
- Multiple reproducible avenues for investigation
- Reduces cost (human, machine), increases availability
- TOG enables this by allowing access to HPC resources

SunDCG Compute Scheduler
- Diagram labels: User A; User B; Grid Engine (two instances); Globus 2
- Integrates Grid Engine and Globus 2
- GE execution methods provide job submission/control
- GE job context stores job specific information
- Globus GSI for security
- Globus GRAM enables interaction with remote resource
- GASS for small data transfer, GridFTP for large datasets

OGSA-DAI - Results Investigation
- Multiple views of data: Raw; Heat Map; Cluster Map
- Wilms Tumour study takes a new direction: two genes appear significant in early development
- Researchers would like more info on these genes…

OGSA-DAI - Data Resource Discovery
- OGSA-DAI uses keywords to locate relevant data resources
- May return data resources previously unknown to researcher
- Researcher selects most interesting data resource to query for information about gene
- Researcher selects Mouse atlas: narrow, deep database of spatial gene expression in mice embryonic development
- Contrast with GTI database of broad, shallow genome-wide gene expression across multiple organisms, stages & conditions

OGSA-DAI - Data Resource Query
- OGSA-DAI returns data from query
- Data and annotation displayed
- Data contains references to related images
- Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression
- These show that the genes are stem cell markers
- Targets for focussed investigation, potential therapy

Data Access & Integration Services
- Components: Client; Registry; Factory; Grid Data Service (GDS); XML / Relational database
- Interaction types in the diagram: SOAP/HTTP; service creation; API interactions
- 1a. Request to Registry for sources of data about “x”
- 1b. Registry responds with Factory handle
- 2a. Request to Factory for access to database
- 2b. Factory creates GridDataService to manage access
- 2c. Factory returns handle of GDS to client
- 3a. Client queries GDS with XPath, SQL, etc
- 3b. GDS interacts with database
- 3c. Results of query returned to client as XML

Example: Mobile Resource Guarantees
The MRG technology consists of programming languages; type systems for the languages; logics for expressing statements of resource consumption; and proof technology for proving these statements.
- Camelot, a high-level functional programming language with objects and resource control;
- Grail, a strongly-typed intermediate language which is the target language of the Camelot compiler and is interconvertible with Java byte code;
- A cost model, a formal semantics for byte code execution which tracks execution time and space allocation;
- A byte code logic allowing the expression of costs, embedded in a generic proof system (Isabelle).

Resource-bounded mobile code

Relevance to Grids
- Grid service providers need to schedule competing requests for access to resources.
- With 25Kb of code and 1Pb of sky survey data it is infeasible to ship the data to the code.
- There are projects which have supported scientific programming in functional languages (e.g. Psicho).
- An alternative would be to transfer the MRG technology to Java or Java-like languages (ESC/Java, SpecialJ, and Pizza).

Example: AMUSE
Autonomic Management of Ubiquitous Systems for e-Health
- Automated management of complex distributed application systems
- Architectural pattern and prototype implementations for closed-loop management of such systems
- Policy-based management
- AMUSE will integrate these to address automated management of e-Health applications

Closed-loop Management Pattern (Self-Managed Cell)
- Diagram labels: Management Application; Measurement; Analysis, Simulation, Optimization; Raw Measurement; Provisioning; Trends & Prediction; Event Bus; Policy Management; Measurement Adapters; Service Goals; System Policy; “System” Configuration; Topology, Other; “System” Under Test

Two-level nesting
- Diagram labels: Management Application; Level n; Level n-1; Level n-2; Measurement; Analysis, Simulation, Optimization; Raw Measurement; Provisioning; Trends & Prediction; Event Bus (two instances); Measurement Adapter; Policy Management; Policy Agents; Service Goals; System Policy; Meas; Infer; Prov; “System” Config; “System” Configuration; Topology, Other

GGF: Standardisation
- Grid Research Oversight Committee & Programme Committee
  - Prof. Malcolm Atkinson
- Data Access and Integration Services Working Group
  - Dr Mario Antonioletti (Group Secretary & Editor), Dr Amy Krause (Editor)
  - Prof. Malcolm Atkinson, Dr Martin Westhead, Neil Chue Hong (Authors)
  - Dr. Mike Jackson
- Data Format Definition Language Working Group
  - Dr Martin Westhead (Founder and Chair)
- Job Submission Definition Language Working Group
  - Dr Ali Anjomshoaa (founder and chair)
- Open Grid Services Architecture Working Group
  - Dr Dave Berry
- Open Grid Services Infrastructure Working Group
  - Dr Mike Jackson, Daragh Byrne

Data Services
- GGF Data Access & Integration Services (DAIS)
  - OGSI-compliant interfaces to access relational and XML databases
  - Needs to be generalized to encompass other data sources (see next slide…)
- Generalized DAIS becomes the foundation for:
  - Replication: Data located in multiple locations
  - Federation: Composition of multiple sources
  - Provenance: How was data generated?

Future DAI Services
- Diagram labels: Client; Analyst; Problem Solving Environment; “scientific” Application coding scientific insights; Application Code; Data Registry; Data Access & Integration master; Semantic Meta data; GDS; GDS1; GDS2; GDS3; GDTS; GDTS1; GDTS2; Sx; Sy; XML database; Relational database
- Interaction types in the diagram: SOAP/HTTP; service creation; API interactions
- 1a. Request to Registry for sources of data about “x” & “y”
- 1b. Registry responds with Factory handle
- 2a. Request to Factory for access and integration from resources Sx and Sy
- 2b. Factory creates GridDataServices network
- 2c. Factory returns handle of GDS to client
- 3a. Client submits sequence of scripts each has a set of queries to GDS with XPath, SQL, etc
- 3b. Client tells analyst
- 3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation

Take Home Message
- In addition to our national services, NeSC has a thriving research programme
  - Foundation departments
  - Technology development (EPCC, NeSC, Globus Alliance)
  - Research scientists
- There are many opportunities for collaboration
  - Wide breadth of interest
  - Particular focus on scientific data
- OGSA-DAI is here now
  - Join in making better DAI services & standards
- Bioinformatics and Astronomy are Priority Application Areas
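
Sketch: Registry, Factory and Grid Data Service interaction
The numbered steps on the Data Access & Integration Services slide (1a to 3c) can be illustrated with a small, self-contained Python sketch. It is not the OGSA-DAI API. The class names Registry, Factory and GridDataService, their methods, the in-memory SQLite table and the geneA row are all hypothetical, and plain Python calls stand in for the SOAP/HTTP messages.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: manages access to one database."""

    def __init__(self, connection):
        self._connection = connection

    def query(self, sql, params=()):
        # Step 3b: the GDS interacts with the database.
        cursor = self._connection.execute(sql, params)
        columns = [description[0] for description in cursor.description]
        # Step 3c: results of the query are returned to the client as XML.
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical factory: creates a GDS for the database it fronts."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        # Steps 2b and 2c: create a GDS and hand it back to the client.
        return GridDataService(self._connection)


class Registry:
    """Hypothetical registry: maps keywords to factories."""

    def __init__(self):
        self._factories = {}

    def register(self, keyword, factory):
        self._factories.setdefault(keyword, []).append(factory)

    def find(self, keyword):
        # Steps 1a and 1b: look up factories for data about a keyword.
        return list(self._factories.get(keyword, []))


def demo():
    # Made-up placeholder data, standing in for a real data resource.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE expression (gene TEXT, stage TEXT)")
    connection.execute("INSERT INTO expression VALUES ('geneA', 'early')")

    registry = Registry()
    registry.register("gene expression", Factory(connection))

    factory = registry.find("gene expression")[0]   # 1a, 1b
    service = factory.create_service()              # 2a, 2b, 2c
    return service.query(                           # 3a, 3b, 3c
        "SELECT gene, stage FROM expression WHERE gene = ?", ("geneA",)
    )


if __name__ == "__main__":
    print(demo())
```

Keeping the registry lookup, service creation and query as three separate objects mirrors the three numbered stages on the slide: the client never touches the database directly, only the service handle it was given.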