Plateforme de Calcul pour les Sciences du Vivant

Grids, a new way to do science
V. Breton, CNRS-IN2P3
http://clrpcsv.in2p3.fr
V. Breton CUIC 2007

What is the Do Son ACGRID school about?
• The school is about grids
  – Grids of PC clusters: EGEE tutorial from Nov. 5th to 9th
  – Desktop grids: BOINC tutorial on Nov. 15th
• The school is about computational tools that use the grid
  – For data analysis: ROOT on Nov. 12th and TAVERNA on Nov. 13th
  – For simulation: GEANT4 on Nov. 14th
• The school will consist of courses and hands-on sessions
  – A grid has been deployed locally at IOIT for the duration of the school

Our goals for the school
• Train Asian engineers to install and operate grid services
  – Tutorial on grid installation (October 29th to Nov. 2nd)
• Train Asian researchers to use the services offered by the EGEE grid
  – Train users to call the grid services
  – Train users to deploy analysis and simulation tools which take advantage of the grid
• Deploy in Vietnam a grid infrastructure researchers can use
  – Machines bought for the school will be distributed among 5 sites: IOIT in Hanoi and HCMC, Hanoi University of Technology, Maison des Sciences et Technologies, Institut Français d’Informatique

What is the Grid?
• The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations
• In contrast, the Grid is a new computing infrastructure which provides seamless access to computing power, data and other resources distributed over the globe
• The name Grid is chosen by analogy with the electric power grid: you plug in to computing power without worrying about where it comes from, just as you plug in a toaster

Two kinds of grids
Volunteer computing vs grid infrastructures
  – Volunteer computing: BOINC tutorial on Nov. 15th
  – Grid infrastructures: EGEE grid tutorial, Nov. 5-9

What is driving grid development?
Data- and compute-intensive sciences are next-generation applications that have extreme needs but are likely to become mainstream in the next 5 years
• Natural Resources and the Environment (weather forecasting, earth observation, modeling and prediction of complex systems: river floods and earthquake simulation)
• Physics/Astronomy (data from different kinds of research instruments)
• Bioinformatics (study of the human genome and proteome to understand genetic diseases)
• Medical/Healthcare (imaging, diagnosis and treatment)
• Nanotechnology (design of new materials from the molecular scale)
• Engineering (design optimization, simulation, failure analysis, and remote instrument access and control)

Meteorology
• Early warning and detection systems are needed, e.g. for hurricanes
• Technology is advancing fast:
  – Infrared sensors on meteorological satellites now provide more and more detailed observations of the atmosphere
  – Research efforts continue the development of computer forecasting models capable of using satellite data to improve current weather-predicting skills
  – Meteorological studies are aided by the use of large computers for atmospheric modeling
• With easier and faster access to data and models, prediction becomes continually more efficient

Earth Observation
• Long-term global observations of the land surface, biosphere, solid Earth, atmosphere, and oceans produce huge amounts of data:
  – not in homogeneous data formats
  – not easy to locate
  – no obvious user-friendly interface
• Challenge: understanding the Earth as an integrated system
  – increased scope and more local detail mean ever more data
  – better understanding of the interrelations between components requires more analysing power
  – this translates into better forecasting

Climate Simulation
• Climate simulation already uses distributed computing
  – Example: the scientific experiment “Casino-21” tries to produce a forecast of the climate in the 21st century by a large-scale simulation
  – “Casino-21” uses a structure like the SETI@home project
• Grid infrastructures will provide new and more powerful ways of using distributed computing for climate simulation

Pollution
• Satellite monitoring:
  – helps scientists to understand changes in the atmosphere, track them and plan ways to reduce our environmental impact
• A wide variety of emissions is changing the chemistry and composition of our planet's atmosphere
• The atmosphere is a very complex chemical system
• So far data is used selectively
  – Increased analysing power gives access to a wider spectrum and optimizes turn-around times
The Vision
• An international network of scientists will be able to model a new flood of the Mekong river in real time, using meteorological and geological data from several centres across Europe
• UNOSAT:
  – Internet-based service providing high-quality maps to UN agencies, NGOs and other institutions of the humanitarian community
  – Grid technology allows raw satellite images to be reduced and processed into readable maps at a greater speed than would otherwise be possible
Access to a production-quality grid will change the way science and earth observation of all kinds are done

How does the grid work?
• The Grid relies on advanced software, called middleware, which ensures seamless communication between different computers in different parts of the world
• The Grid search engine finds not only the data the scientist needs, but also the data processing techniques and the computing power to carry them out
• It distributes the computing task to wherever in the world there is available capacity, and sends the result back to the scientist

Grid Challenges
• Share data between thousands of scientists with multiple interests
  – Need to support dynamic virtual organisations of geographically dispersed groups
• Ensure all data is accessible anywhere, anytime
  – Petabyte range of data needs to be available on demand
• Grow rapidly, yet remain reliable for more than a decade
  – Are we sure the current technologies will scale?
  – Transfer to industry to achieve economies of scale
• Standardisation process still ongoing
  – Merge of web services (OASIS) and grids (GGF) into WSRF
  – Must progress to avoid incompatible proprietary grids
• Cope with different management policies of grid sites
  – Link computer centres, not just single PCs, separately administered and owned
  – Need resource allocation policies and billing systems
• Ensure security
  – Medical applications have legal/ethical restrictions on data access
  – Avoid becoming a target for hackers

What is EGEE?
• EGEE
  – 1 April 2004 to 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
• EGEE-II
  – 1 April 2006 to 31 March 2008
  – 91 partners in 32 countries
  – 13 Federations
• Objectives
  – Large-scale, production-quality infrastructure for e-Science
  – Attracting new resources and users from industry as well as science
  – Maintain and further improve the “gLite” Grid middleware

Why did we choose to teach you about EGEE?
• EGEE is an operational grid infrastructure
  – More than 100,000 jobs per day
• EGEE offers real services to its user communities
  – Job and data management services are operational
• The EGEE infrastructure is used to analyze LHC data
  – Joining EGEE makes it possible to participate in LHC data analysis
• EGEE technology is well supported in Asia
  – Academia Sinica in Taiwan offers central services to user communities around Asia

What does EGEE provide?
• Simplified access (access to all the operational resources the user needs)
• On-demand computing (fast access to resources by allocating them efficiently)
• Pervasive access (accessible from any geographic location)
• Large-scale resources (of a scale that no single computer centre can provide)
• Sharing of software and data (in a transparent way)
• Improved support (use the expertise of all partners to offer in-depth support for all key applications)

Highlights of EGEE-II
• >200 VOs from several scientific domains
  – Astronomy & Astrophysics
  – Civil Protection
  – Computational Chemistry
  – Comp. Fluid Dynamics
  – Computer Science/Tools
  – Condensed Matter Physics
  – Earth Sciences
  – Fusion
  – High Energy Physics
  – Life Sciences
• Further applications under evaluation
[Figure: No. jobs / month - all; series: OPS, Non-LHC, LHC; annotation: 98k jobs/day]
Applications have moved from testing to routine and daily usage, with ~80-90% efficiency

EGEE-II middleware
• EGEE maintains and improves the gLite middleware distribution
• gLite 3
  – Publicly released on May 4, 2006
  – Convergence with LCG-2
  – Currently deploying version 3.1 on Scientific Linux
• Work management system
• Data management system
• Information system
• Resource brokering
• Security
[Figure: middleware timeline with labels LCG-2, gLite, prototyping, product, 2004, 2005, 2006, gLite 3.0]

Operations
Size of the infrastructure today:
• 237 sites in 45 countries
• ~36 000 CPUs
• ~5 PB disk, + tape MSS
• distributed operations
• copes well with increase in size and usage
[Figure: No. jobs / month - all; series: OPS, Non-LHC, LHC; annotation: 98k jobs/day]
[Figure: EGEE network diagram with labels Sites, Support Units, Users, NRENs, GGUS, ENOC, GÉANT2]
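
The brokering idea from the "How does the grid work?" slide (find the data, find free capacity, send the task there) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `Site` and `Job` classes, the `match` and `submit` functions, and the "most free CPUs" ranking rule are all hypothetical and are not taken from gLite or any EGEE service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """Hypothetical grid site: its free CPU count and the datasets it stores."""
    name: str
    free_cpus: int
    datasets: frozenset


@dataclass
class Job:
    """Hypothetical job request: CPUs needed and the dataset it must read."""
    name: str
    cpus: int
    dataset: str


def match(job: Job, sites: List[Site]) -> Optional[Site]:
    """Return the site that holds the job's dataset and has the most free CPUs.

    Returns None when no site can run the job right now.
    """
    candidates = [
        s for s in sites
        if job.dataset in s.datasets and s.free_cpus >= job.cpus
    ]
    if not candidates:
        return None
    # Assumed ranking rule: prefer the least loaded candidate site.
    return max(candidates, key=lambda s: s.free_cpus)


def submit(job: Job, sites: List[Site]) -> Optional[str]:
    """Reserve CPUs on the matched site and return its name (None if unmatched)."""
    site = match(job, sites)
    if site is None:
        return None
    site.free_cpus -= job.cpus
    return site.name
```

A real broker also has to handle authentication, queues, data transfer and failures; the sketch keeps only the matchmaking step so the principle stays visible.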
Applications
[Figure: VO CPU consumption, Norm. CPU (1K.SI2K-months) per month from Feb 2005 to Feb 2007; series: LHC, Non-LHC]
Total VOs: 204
Total Users: 5034
Affected People: 10200

The pilot applications
  – High Energy Physics, with the LHC Computing Grid (www.cern.ch/lcg), relies on a Grid infrastructure to store and analyse petabytes of real and simulated data. LCG is a major source of resources and requirements, with a hard deadline and no conventional solution available
  – In Biomedical Sciences, several communities are facing equally daunting challenges to cope with the flood of bioinformatics and healthcare data. They need access to large, distributed, non-homogeneous data and have significant on-demand computing requirements

LCG
• LCG: a collaboration of
  – The LHC experiments
  – The Regional Computing Centres
  – Physics institutes
• Mission:
  – Prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data
• Strategy:
  – Integrate thousands of computers at dozens of participating institutes worldwide into a global computing resource
  – Rely on software being developed in advanced grid technology projects, both in Europe and in the USA

WISDOM
• WISDOM: a collaboration of
  – Biology, Bioinformatics, Chemoinformatics laboratories
  – Grid infrastructure projects
• Mission:
  – In silico drug discovery against emerging and neglected diseases
• Strategy:
  – Centuries of CPU cycles used to dock millions of compounds during large-scale grid deployments
  – Secure data management of biochemical information
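
The WISDOM strategy works on a grid because each docking is independent of the others, so a compound library can be cut into chunks and each chunk sent as a separate job. The sketch below shows that splitting step and the CPU-time arithmetic behind it. The function names and every number passed to them are hypothetical illustrations, not WISDOM's actual tooling or figures.

```python
from typing import List, Sequence

SECONDS_PER_YEAR = 365 * 24 * 3600


def split_into_jobs(compound_ids: Sequence[str],
                    compounds_per_job: int) -> List[List[str]]:
    """Cut a compound list into independent chunks, one chunk per grid job."""
    if compounds_per_job <= 0:
        raise ValueError("compounds_per_job must be positive")
    return [
        list(compound_ids[i:i + compounds_per_job])
        for i in range(0, len(compound_ids), compounds_per_job)
    ]


def cpu_years(n_dockings: int, seconds_per_docking: float) -> float:
    """Total CPU time, in years, to run n_dockings on a single CPU."""
    return n_dockings * seconds_per_docking / SECONDS_PER_YEAR


def wall_clock_days(n_dockings: int, seconds_per_docking: float,
                    n_cpus: int) -> float:
    """Idealised elapsed time in days when the work is spread over n_cpus.

    Assumes perfect load balancing and no scheduling or transfer overhead.
    """
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return n_dockings * seconds_per_docking / n_cpus / 86400
```

With made-up inputs, `split_into_jobs(["c1", "c2", "c3", "c4", "c5"], 2)` yields three jobs, the last one holding the single leftover compound. The same arithmetic explains why a campaign that would take years on one machine becomes feasible once thousands of CPUs share the chunks.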
Dissemination and Training
www.eu-egee.org
[Figure: website statistics; series: Unique visitors, Links from Internet Search Engines]
• Comprehensive training programme in Europe, South America, Asia
• 110 events, > 1600 participants
ACGRID is one of these events

What is the Do Son ACGRID school about?
• Grids are about sharing
  – Resources (CPU, storage)
  – Knowledge
• The Do Son ACGRID school is about sharing knowledge
  – Sharing expertise in the installation and operation of grid services
  – Sharing expertise in the development and deployment of grid-enabled applications
• The Do Son ACGRID school is about building long-term collaboration
  – We are here to help Vietnamese engineers to run grid services
  – We are here to help Vietnamese scientists to develop and deploy grid-enabled applications
  – We are here to present high-performance tools for data analysis and simulation
• TAKE ADVANTAGE OF THIS OPPORTUNITY TO ADVANCE YOUR RESEARCH
  – Ask questions
  – Don’t hesitate to talk with the teachers

What should happen after the school?
• Grid services will be installed at several sites in Vietnam
  – In Hanoi: Hanoi University of Technology, IOIT, Institut Français d’Informatique
  – In HCMC: IOIT
• You will be able to use your grid certificates to access the EGEE grid through these sites
  – Possibility to join any other Virtual Organization
• You will benefit from the grid services like any other EGEE user
What you get out of the school
• Grids offer a unique opportunity to integrate research laboratories into international initiatives
  – Example: LHC
• Grids offer opportunities to start collaborations
  – Example: Telemedicine
    Installation of a grid-enabled medical imaging platform at IOIT in HCMC
    Joint application deployment between the platforms in HCMC and Clermont-Ferrand
It all depends on you!

Credits
• IOIT in Hanoi: Vu Duc Thy, Luong Chi Mai, Ngo Tran Anh and collaborators
• IOIT in HCMC: Do Van Long
• ASGC: Min Tsai, Jinny Chen and collaborators
• Our second-week speakers: Nicolas Maire, Sébastien Incerti, René Brun, Georgina Moulton
• HealthGrid: Nicolas Spalinger, Nathanaël Verhaeghe
• CNRS office in Hanoi: Bernard Mely, Le Tuyet Trinh
• CNRS-IN2P3: Vincent Bloch, Vincent Breton, Géraldine Fettahi, Matthieu Reichstadt, Denis Perret-Gallix, Jean Salzemann
• TEIN2: David West