Enabling Grids for E-sciencE
Introduction to Grid
Eddie.Aronovich@cs.tau.ac.il
www.eu-egee.org
INFSO-RI-508833

Acknowledgements
• This presentation is based on slides from:
– Roberto Barbera, University of Catania and INFN (EGEE Tutorial Roma, 02.11.2005)
– Mike Mineter, Concepts of grid computing
– Fabrizio Gagliardi, EGEE Project Director, CERN, Geneva, Switzerland (Naregi Symposium 2005, Tokyo)
– Fabrizio Gagliardi, EGEE Project Director, CERN, Geneva, Switzerland (APAC, 27 September 2005)
– Guy Warner, NeSC Training Team (An Induction to EGEE for GOSC and the NGS, NeSC, 8 December 2004)
INFSO-RI-508833 EGEE tutorial, Seoul Biomed Grid Induction (Tel-Aviv Univ), Feb 2006

EGEE project in 1K words
https://goc.grid-support.ac.uk/gridsite/monitoring/

From Phase I to II
• From the 1st EGEE EU Review in February 2005:
– “The reviewers found the overall performance of the project very good.”
– “… remarkable achievement to set up this consortium, to realize appropriate structures to provide the necessary leadership, and to cope with changing requirements.”
• EGEE I
– Large-scale deployment of the EGEE infrastructure to deliver production-level Grid services with a selected number of applications
• EGEE II
– Natural continuation of the project’s first phase
– Emphasis on providing an infrastructure for e-Science: increased support for applications, a more multidisciplinary Grid infrastructure, more involvement from industry
– Extending the Grid infrastructure world-wide: increased international collaboration (Asia-Pacific is already a partner!)

What more do we need?
• Processing power
• Storage
• A security-aware, integrative infrastructure
• A community-aware environment
Or what we may call….
e-Science
• What is e-Science? Collaborative science made possible by sharing resources (data, instruments, computation, people’s expertise...) across the Internet
– Often very compute-intensive
– Often very data-intensive (both creating new data and accessing very large data collections): data deluges from new technologies
– Crosses organisational boundaries
• Examples….

A good example: Particle Physics
• Large amounts of data produced in a few places: CERN, FNAL, KEK…
• Large, worldwide, organised collaborations (e.g. the LHC experiments at CERN) of computer-savvy scientists
• Computing and data management resources distributed world-wide, owned and managed by many different entities
• Large Hadron Collider (LHC) at CERN in Geneva, Switzerland:
– One of the most powerful instruments ever built to investigate matter
(Photo labels: Mont Blanc (4810 m); Downtown Geneva)

The LHC Experiments
• Large Hadron Collider (LHC):
– Four experiments: ALICE, ATLAS, CMS, LHCb
– 27 km tunnel
– Start-up in 2007

Orders of magnitude…
• 10–15 petabytes ≈ 20,000,000 CD-ROMs
• ~3000 m: 10 times the Eiffel Tower

EGEE – building e-infrastructure
EGEE is building a large-scale production grid service to:
• Underpin research, technology and public service
• Link with and build on national, regional and international initiatives
• Foster international cooperation both in the creation and in the use of the e-infrastructure
(Diagram: Pan-European Grid; Operations, Support and Training; Collaboration; Network infrastructure & Resource centres)

Grids and e-Infrastructure
• “Campus grids”: internal to an institute / university:
– “High throughput”: harvesting compute time
– Not really ‘a grid’ unless crossing administrative domains
– Can become a resource on a grid
– Example: Condor
– http://www.nesc.ac.uk/esi/events/556/
• Grids: cross administrative boundaries
– National scale: in the UK, NGS
– Regional efforts: in China, EUMedGrid, CrossGrid, SeeGrid
– International scale: in Europe, EGEE

e-Infrastructure
• Implementation blocks (diagram from a talk by Mario Campolargo, Brussels, 30 May 2005)

Contents
• “The Grid” vision
• What is “a grid”?
• Drivers of grid computing
• Implementation samples
• Grid status & standards
• The basis: authentication, authorisation, security
• So, what can it do?
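The orders-of-magnitude slide equates 10–15 petabytes with roughly 20,000,000 CD-ROMs, and the LHC slide later quotes 10 petabytes/year as 20 million CDs. A minimal Python sketch can check what that implies per disc, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and, for the reverse calculation, an assumed 700 MB CD-ROM capacity.

```python
# Sanity check of the "petabytes vs. CD-ROMs" comparison.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes).
PB = 10**15
MB = 10**6


def bytes_per_disc(total_bytes: int, n_discs: int) -> float:
    """Average payload each disc must hold for n_discs to store total_bytes."""
    return total_bytes / n_discs


def discs_needed(total_bytes: int, disc_capacity: int) -> int:
    """Whole discs needed to store total_bytes (ceiling division)."""
    return -(-total_bytes // disc_capacity)


if __name__ == "__main__":
    for petabytes in (10, 15):
        per_disc = bytes_per_disc(petabytes * PB, 20_000_000)
        print(f"{petabytes} PB on 20 million CDs: {per_disc / MB:.0f} MB per CD")
    # Assumption: 700 MB per CD-ROM.
    print(f"10 PB on 700 MB CDs: {discs_needed(10 * PB, 700 * MB):,} CDs")
```

So 20 million CDs corresponds to 500–750 MB per disc across the 10–15 PB range; at 700 MB per disc, 10 PB would need about 14.3 million CDs, the same order of magnitude as the slide's figure.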
The Grid Metaphor
(Diagram: mobile access, workstation, supercomputer / PC cluster, data storage, sensors, experiments and visualising, linked by GRID MIDDLEWARE over the Internet and networks)

The grid vision
• The grid vision is of “virtual computing” (+ information services to locate computation and storage resources)
– Compare the web: “virtual documents” (+ search engines to locate them)
• MOTIVATION: collaboration through sharing resources (and expertise) to expand the horizons of
– Research
– Commerce, engineering, …: “the knowledge economy”
– Public service: health, environment, …

“A grid”
• The initial vision: “The Grid”
• The present reality: many “grids”
• Each grid is an infrastructure enabling one or more “virtual organisations” to share computing resources
• What’s a VO?
– People in different organisations seeking to cooperate and share resources across their organisational boundaries
• Why establish a grid?
– Share data
– Pool computers
– Collaborate
(Diagram: a VO spanning Institutes A, B, C and D)

The Single Computer
• The operating system enables easy use of
– Input devices
– Processor
– Disks
– Display
– Any other attached devices
(Layers: Application software / Operating system / Disks, processor, memory, …)

Resources on a Local Area Network
The user just perceives “shared resources”, with no regard to their location in the organisation:
– Authenticated by username / password
– Authorised to use own files, …
(Layers: Application software / Middleware for sharing computers, servers, printers, … / Operating system on each computer / Resources connected by a LAN)

Resources on a grid
(Layers: Application software / Interface between application and grid / Grid middleware: “collective services” / Grid middleware on each resource / Operating system on each resource / Resources connected by the Internet)

A grid
• Grid middleware runs on each shared resource
– Data storage
– (Usually) batch jobs on pools of processors
• Users join VOs
• A virtual organisation negotiates with sites to agree access to resources
• Distributed services (both people and middleware) enable the grid
(Diagram label: INTERNET)

What characterises a grid?
• Co-ordinated resource sharing
– No centralised point of control
– Different administrative domains
• Standard, open, general-purpose protocols and interfaces
– NOT specific to an application
– EGEE and NGS support multiple VOs
• Delivering non-trivial qualities of service
– Co-ordinated to deliver combined services greater than the sum of the individual components
• http://www.gridtoday.com/02/0722/100136.html

The components of a Grid
• Resources: networking, computers, storage, data, instruments, …
• Grid middleware: the “operating system of the grid”
• Operations infrastructure: runs enabling services (people + software)
• Virtual Organization management: procedures for gaining access to resources

Key concepts
• Virtual organisation: people and resources collaborating across administrative and organisational boundaries
• Single sign-on: I connect to one machine, and some sort of “digital credential” is passed on to any other resource I use. This is the basis of:
– Authentication: how do I identify myself to a resource without a username/password for each resource I use?
– Authorisation: what can I do? Determined by
   • my membership of a VO
   • VO negotiations with resource providers
• Grid middleware runs on each resource
• The user just perceives “shared resources”, with no concern for location or owning organisation

Large Hadron Collider at CERN
• Data challenge:
– 10 petabytes/year of data!
– 20 million CDs each year!
• Simulation, reconstruction, analysis:
– LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors!
• Operational challenges
– Reliable and scalable through a project lifetime of decades
(Photo labels: Mont Blanc (4810 m); Downtown Geneva)

BLAST gridification
(Figure labels: input file with sequences Seq1, Seq2 … Seqn; UI; computing elements running BLAST against a DB; RESULT)

DAME: Grid-based tools and Infrastructure for Aero-Engine Diagnosis and Prognosis
(Figure labels: engine flight data; London Airport; New York Airport; airline office)
• “A Significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S.
Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed”, DS&S 2004
(Figure labels: Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center; XTO; Engine Model; Case Based Reasoning)
• Companies: Rolls-Royce, DS&S, Cybula
• Universities: York, Leeds, Sheffield, Oxford

If “The Grid” vision leads us here…
… then where are we now?

Grids: where are we now?
• Many key concepts identified and known
• Many grid projects have tested, and benefit from, these
• Major efforts now on establishing:
– Standards, a slow process (e.g. Global Grid Forum, http://www.gridforum.org/)
– Production grids for multiple VOs
   “Production” = reliable, sustainable, with commitments to quality of service
   • In Europe, EGEE
   • In the UK, National Grid Service
   • In the US, TeraGrid
   One stack of middleware that serves many research (and other!) communities
   Operational procedures and services (people!, policy, ..)
– New user communities
• … whilst research & development continues

The vision of 2001: convergence of Web Services and Grids
(Diagram labels: Open Grid Services Architecture; OGSI; Grid prototypes; Web services; World-wide web; High-end computing; High-throughput computing; Massively parallel computing; INTERNET)

Approaches to Security: 1
The Poor Security House

Approaches to Security: 2
The Paranoid Security House

Approaches to Security: 3
The Realistic Security House

Grid security and trust - 1
• Providers of resources (computers, databases, ..) need risks to be controlled: they are asked to trust users they do not know
– They trust a VO
– The VO trusts its users
• Users need
– Single sign-on: to be able to log on to one machine that can pass the user’s identity to other resources
– To trust the owners of the resources they are using
• Build middleware on a layer providing:
– Authentication: who wants to use/provide a resource
– Authorisation: what the user is allowed to do
– Security: reducing vulnerability, e.g. from outside the firewall
– Non-repudiation: knowing who did what
• Digital credentials and the “Grid Security Infrastructure” (GSI) middleware are the basis of production grids

Grid security and trust - 2
• Currently achieved by certification:
– A user’s identity has to be certified by one of the mutually recognised national Certification Authorities (CAs): see http://www.gridpma.org/; for the EU, go from there to http://marianne.in2p3.fr/datagrid/ca/catable-ca.html to find your CA
   • E.g. in Israel (IL), go to https://certificates.iucc.ac.il
– Resources are also certified by CAs
• User
– The user joins a VO
– The digital certificate is the basis of authentication and authorisation (AA)
– Your identity is passed to the other resources you use, where it is mapped to a local account; the mapping is maintained by the VO
• Commonly agreed policies establish the rights of a Virtual Organization to use resources

Grid security and trust - 3
• Certification and GSI provide
– Authentication
   The resource can trust the user
   The user can trust the resource provider
   …. so long as certificates are protected: they are your grid identity
– A basis for authorisation, so a VO can manage access to resources
   Resource providers trust the VO
   The VO trusts the user
– A mechanism for checking message integrity
   Messages are passed between machines
   Public/private key pairs protect message integrity as well as authentication
   • Messages are not (usually) encrypted, but message integrity is checked

Certificate Request
• The user generates a public/private key pair.
• The user sends the public key to the CA in a certificate request, along with proof of identity.
• The CA confirms the identity, signs the certificate and sends it back to the user.
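The trust chain on these slides (a CA signs the binding between an identity and a public key, a resource checks that signature, checks message integrity with the user's public key, then maps the identity to a local account) can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions, not GSI code: it uses textbook RSA with small, well-known primes and SHA-256, and the distinguished name, the `grid_map` table and the account name are hypothetical. Real grids use X.509 certificates and proper cryptographic libraries.

```python
import hashlib
from dataclasses import dataclass

# Toy model only: textbook RSA with small primes is NOT secure.
# It illustrates who signs what and who checks what.


@dataclass(frozen=True)
class KeyPair:
    n: int  # public modulus
    e: int  # public exponent
    d: int  # private exponent: never leaves its owner


def toy_keypair(p: int, q: int, e: int = 17) -> KeyPair:
    """Build a textbook RSA key pair from two primes p and q."""
    phi = (p - 1) * (q - 1)
    return KeyPair(n=p * q, e=e, d=pow(e, -1, phi))  # modular inverse (Python 3.8+)


def digest(message: bytes, n: int) -> int:
    """SHA-256 of the message, reduced below the modulus so it can be signed."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, key: KeyPair) -> int:
    """Sign with the private exponent."""
    return pow(digest(message, key.n), key.d, key.n)


def verify(message: bytes, signature: int, n: int, e: int) -> bool:
    """Check a signature using only the signer's public key (n, e)."""
    return pow(signature, e, n) == digest(message, n)


@dataclass(frozen=True)
class Certificate:
    subject: str       # the user's identity, e.g. a distinguished name
    n: int             # the user's public key ...
    e: int
    ca_signature: int  # ... bound to the subject by the CA's signature

    def body(self) -> bytes:
        return f"{self.subject}|{self.n}|{self.e}".encode()


def issue(subject: str, user_n: int, user_e: int, ca: KeyPair) -> Certificate:
    """CA side: after checking proof of identity, sign (subject, public key)."""
    unsigned = Certificate(subject, user_n, user_e, 0)
    return Certificate(subject, user_n, user_e, sign(unsigned.body(), ca))


def accept(cert, message, signature, ca_n, ca_e, grid_map):
    """Resource side: trust the CA, check the message, map to a local account."""
    if not verify(cert.body(), cert.ca_signature, ca_n, ca_e):
        return None  # certificate not signed by the trusted CA
    if not verify(message, signature, cert.n, cert.e):
        return None  # message altered, or not signed by the certificate holder
    return grid_map.get(cert.subject)  # None if the VO has not mapped this user


if __name__ == "__main__":
    ca = toy_keypair(999983, 1000003)    # the CA's key pair
    user = toy_keypair(104729, 1299709)  # generated by the user; d stays local
    cert = issue("/C=IL/O=ExampleUniv/CN=Jane Doe", user.n, user.e, ca)  # hypothetical DN
    grid_map = {cert.subject: "biomed001"}  # hypothetical VO-maintained mapping
    msg = b"submit job 42"
    sig = sign(msg, user)
    print(accept(cert, msg, sig, ca.n, ca.e, grid_map))
    print(accept(cert, b"submit job 43", sig, ca.n, ca.e, grid_map))
```

The first call is accepted and mapped to the local account; the second fails because the message no longer matches its signature. Note that the CA only ever sees the user's public values `(n, e)`, mirroring the slide: the private key never leaves the user.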
(Diagram labels: Cert; ID; private key encrypted on local disk)
Slide based on a presentation given by Carl Kesselman at GGF Summer School 2004

BioMed Overview
• Infrastructure
– ~3,000 CPUs
– ~12 TB of disk
– in 9 countries (map labels include Padova and Bari)
• >50 users in 7 countries working with 12 applications
• 18 research labs, 15 resource centres, 17 CEs, 16 SEs
(Chart: BIOMED number of jobs per month, 2004-09 to 2005-03)

Biomed Virtual Organisation
• ~70 users, 9 countries
• >12 applications (medical image processing, bioinformatics)
• ~3,000 CPUs, ~12 TB disk space
• ~100 CPU years, ~500K jobs in the last 6 months
(Chart: BIOMED jobs distribution, 2005-01 to 2005-08: registered, successful, cancelled and aborted jobs (number of jobs) and run duration estimate (years))

Bioinformatics
• GPS@: Grid Protein Sequence Analysis
– Gridified version of the NPSA web portal
   Offers protein databases and sequence analysis algorithms to bioinformaticians (3000 hits per day)
   Needs large databases and a large number of short jobs
– Objective: increased computing power
– Status: 9 bioinformatics software packages gridified
– Grid added value: open to a wider community, with larger bioinformatics computations
• xmipp_MLrefine
– 3D structure analysis of macromolecules
   From (very noisy) electron microscopy images
   Maximum-likelihood approach to find the optimal model
– Objective: study molecule interaction and chemical properties
– Status: algorithm being optimised and ported to 3D
– Grid added value: parallel computation of independent jobs on different resources

Drug Discovery
• Demonstrate the relevance and the impact of the grid approach to address drug discovery for neglected diseases
(Pipeline diagram: target discovery (target identification, target validation); lead discovery (lead identification, lead optimization); clinical phases (I–III). Computer Aided Drug Design (CADD) methods shown: database filtering, alignment, similarity analysis, vHTS, biophores, QSAR, ADMET, diversity selection, combinatorial libraries, de novo design)
• Duration: 12–15 years; costs: 500–800 million US $

Docking platform components
• Predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure
(Diagram: UI and grid infrastructure; targets family (~10); compounds database (~millions); parameter / scoring settings; software methods (~10))

First biomedical data challenge: World-wide In Silico Docking On Malaria (WISDOM)
• Significant biological parameters
– Two different molecular docking applications (Autodock and FlexX)
– About one million virtual ligands selected
– Target proteins from the parasite responsible for malaria
• Significant numbers
– Total of about 46 million ligands docked in 6 weeks
– 1 TB of data produced
– Up to 1000 computers in 15 countries used simultaneously, corresponding to about 80 CPU years
(Chart: domain distribution of FlexX run jobs: bg 597; com 1072; cy 383; de 715; uk 8106; es 5122; tw 827; ru 218; ro 337; pl 1877; fr 7580; nl 3356; it 3687; il 263; gr 2004)
• WISDOM open day, December 16th, 2005, Bonn (Germany)
– Discuss Data Challenge results
– Prepare next steps towards a malaria Grid (EGEE-II, Embrace, Bioinfogrid)
• Information: http://wisdom.eu-egee.fr

Medical imaging
• GATE
– Radiotherapy planning
   Improvement of precision by Monte Carlo simulation
   Processing of DICOM medical images
– Objective: very short computation time, compatible with clinical practice
– Status: development and performance testing
– Grid added value: parallelisation reduces computing time
• CDSS
– Clinical Decision Support System
   Assembling knowledge databases
   Using image classification engines
– Objective: access to knowledge databases from hospitals
– Status: from development to deployment, some medical end users
– Grid added value: ubiquitous, managed access to distributed databases and engines

Medical imaging
• SiMRI3D
– 3D Magnetic Resonance Image Simulator
   MRI physics simulation, parallel implementation
   Very compute-intensive
– Objective: offering an image simulator service to the research community
– Status: parallelised and now running on EGEE resources
– Grid added value: enables simulation of high-resolution images
• gPTM3D
– Interactive tool to segment and analyse medical images
   A non-gridified version is distributed in several hospitals
   Needs very fast scheduling of interactive tasks
– Objectives: shorten computation time using the grid
   Interactive reconstruction time: < 2 min, and scalable
– Status: development of the gridified version being finalised
– Grid added value: permanent availability of resources
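Several of these applications (the docking platform, WISDOM, xmipp_MLrefine, GATE) gain from the grid because their work splits into many independent jobs. The sketch below, under stated assumptions, shows one way a docking campaign could be decomposed into a targets × methods × compound-batches set of independent jobs, and what the quoted totals imply on average. The `DockingJob` type, `plan_jobs`, `implied_rates` and the target names are hypothetical illustrations, not the WISDOM tooling; CPU years are taken as 365.25-day years, and "6 months" for the BIOMED VO is approximated as 26 weeks.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical decomposition of a docking campaign into independent grid jobs.


@dataclass(frozen=True)
class DockingJob:
    """One independent job: dock a batch of compounds against one target."""
    target: str
    method: str
    compounds: Tuple[str, ...]


def plan_jobs(targets: Sequence[str], methods: Sequence[str],
              compounds: Sequence[str], batch_size: int) -> List[DockingJob]:
    """Split the targets x methods x compounds space into independent batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    jobs = []
    for target, method in product(targets, methods):
        for start in range(0, len(compounds), batch_size):
            batch = tuple(compounds[start:start + batch_size])
            jobs.append(DockingJob(target, method, batch))
    return jobs


SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: 365.25-day CPU years
SECONDS_PER_WEEK = 7 * 24 * 3600


def implied_rates(cpu_years: float, n_tasks: float, weeks: float):
    """Average CPU seconds per task and average number of busy CPUs."""
    cpu_seconds = cpu_years * SECONDS_PER_YEAR
    seconds_per_task = cpu_seconds / n_tasks
    mean_busy_cpus = cpu_seconds / (weeks * SECONDS_PER_WEEK)
    return seconds_per_task, mean_busy_cpus


if __name__ == "__main__":
    jobs = plan_jobs(["target-A", "target-B"], ["autodock", "flexx"],
                     [f"ligand-{i}" for i in range(10)], batch_size=4)
    print(len(jobs), "jobs; first:", jobs[0])
    # WISDOM: ~80 CPU years, ~46 million dockings, 6 weeks
    per_docking, busy = implied_rates(80, 46e6, 6)
    print(f"WISDOM: ~{per_docking:.0f} CPU s per docking, ~{busy:.0f} CPUs busy")
    # BIOMED VO: ~100 CPU years, ~500K jobs, ~6 months (26 weeks assumed)
    per_job, busy = implied_rates(100, 500e3, 26)
    print(f"BIOMED: ~{per_job:.0f} CPU s per job, ~{busy:.0f} CPUs busy")
```

With two targets, two methods and ten compounds in batches of four, this yields 12 jobs. The WISDOM totals work out to roughly 55 CPU seconds per docking and about 700 CPUs busy on average over the six weeks, consistent with the quoted peak of up to 1000 computers; the BIOMED figures imply average jobs of under two CPU hours. The batch size is a tunable assumption trading scheduling overhead against job length.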
Generic Applications
• EGEE Generic Applications Advisory Panel (EGAAP)
– UNIQUE entry point for “external” applications
– Reviews proposals and makes recommendations to EGEE management
   Deals with “scientific” aspects, not with technical details
   The Generic Applications group is in charge of introducing selected applications to the EGEE infrastructure
– 6 applications selected so far:
   Earth sciences (earth observation, geophysics, hydrology, seismology)
   MAGIC (astrophysics)
   Computational Chemistry
   PLANCK (astrophysics and cosmology)
   Drug Discovery
   E-GRID (e-finance and e-business)
   GRACE (grid search engine, ended Feb 2005)

Security & Intellectual Property (I)
• The existing EGEE grid middleware is distributed under an Open Source License developed by EU DataGrid
– No restriction on usage (scientific or commercial) beyond acknowledgement
– Same approach for new middleware
• Application software maintains its own licensing scheme
– Sites must obtain appropriate licenses before installation

Summary of grid computing concepts
• Flexible collaboration across multiple administrative domains
– Sharing data, computers, instruments, application software, ..
• Single sign-on to resources in multiple organisations
– Authorisation, authentication
• Need for people services as well as middleware services
– Credential authorities, VO managers, support
• The drive is towards
– Production services (reliable, sustainable, …, against which research projects can plan with confidence)
   In Europe, EGEE
   In the UK, National Grid Service
– Standards
– Empowering new user communities

Conclusions
• Grids are a powerful new tool for science, as well as for other fields
• Grid computing is used by different communities, such as biomedicine and HEP, as the most cost-effective computing model
• Investments in grid projects are growing world-wide
• We are here to help you join!

Who’s who in EGEE-IL
• David Horn
– Michal Finkelman-Reuven
• Eddie Aronovich
• NA3 (dissemination) team
– Vered Kunik
– Assaf Gotleib
• SA1 (technical) team
– Yan Ben-Hamou (TAU)
– Ofer Wald (OU)
– Lorne Levinson (WI)

Contacts
• Israeli Academic Grid (IAG): http://iag.iucc.ac.il/
• EGEE website: http://www.eu-egee.org
• How to test: https://gilda.ct.infn.it