What is the Grid?

Plateforme de Calcul pour les Sciences du Vivant
Grids, a new way to do science
V. Breton
CNRS-IN2P3
http://clrpcsv.in2p3.fr
V. Breton CUIC 2007
What is the Do Son ACGRID school about?
• The school is about grids
– Grids of PC clusters: EGEE tutorial from Nov. 5th to 9th
– Desktop grids: BOINC tutorial on Nov. 15th
• The school is about computational tools that use the grid
– For data analysis: ROOT on Nov. 12th and TAVERNA on Nov. 13th
– For simulation: GEANT4 on Nov. 14th
• The school will consist of courses and hands-on sessions
– A grid has been deployed locally at IOIT for the duration of the school
Our goals for the school
• Train Asian engineers to install and operate grid services
– Tutorial on grid installation (October 29th to Nov. 2nd)
• Train Asian researchers to use the services offered by the EGEE grid
– Train users to call the grid services
– Train users to deploy analysis and simulation tools which take advantage of the grid
• Deploy in Vietnam a grid infrastructure that researchers can use
– Machines bought for the school will be distributed across 5 sites:
◦ IOIT in Hanoi and HCMC
◦ Hanoi University of Technology
◦ Maison des Sciences et Technologies
◦ Institut Français d’Informatique
What is the Grid?
• The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations
• In contrast, the Grid is a new computing infrastructure which provides seamless access to computing power, data and other resources distributed over the globe
• The name Grid is chosen by analogy with the electric power grid: you plug in to computing power without worrying where it comes from, just as you plug in a toaster
Two kinds of grids
Volunteer computing vs. grid infrastructures
• Volunteer computing: BOINC tutorial on Nov. 15th
• Grid infrastructures: EGEE grid tutorial from Nov. 5th to 9th
What is driving grid development?
Data- and compute-intensive sciences are next-generation applications: their needs are extreme today but are likely to become mainstream in the next 5 years
• Natural Resources and the Environment (weather forecasting, earth observation, modeling and prediction of complex systems: river floods and earthquake simulation)
• Physics/Astronomy (data from different kinds of research instruments)
• Bioinformatics (study of the human genome and proteome to understand genetic diseases)
• Medical/Healthcare (imaging, diagnosis and treatment)
• Nanotechnology (design of new materials from the molecular scale)
• Engineering (design optimization, simulation, failure analysis and remote instrument access and control)
Meteorology
• Early warning and detection systems are needed, e.g. for hurricanes
• Technology is advancing fast:
– Infrared sensors on meteorological satellites now provide more and more detailed observations of the atmosphere
– Research efforts continue the development of computer forecasting models capable of utilizing satellite data to improve current weather-predicting skills
– Meteorological studies are aided by the use of large computers for atmospheric modeling
• With easier and faster access to data and models, prediction becomes continually more efficient
Earth Observation
• Long-term global observations of the land surface, biosphere, solid Earth, atmosphere, and oceans produce huge amounts of data:
– not in homogeneous data formats
– not easy to locate
– no obvious user-friendly interface
• Challenge: understanding the Earth as an integrated system
– increased scope and more local details mean ever more data
– to better understand the interrelations of different components one needs more analysing power
– this translates into better forecasting
Climate Simulation
• Climate simulation already uses distributed computing
– Example: the scientific experiment “Casino-21” tries to produce a forecast of the climate in the 21st century by a large-scale simulation
– “Casino-21” uses a structure like the SETI@home project
• Grid infrastructures will provide new and more powerful ways of using distributed computing for climate simulation
Pollution
• Satellite monitoring:
– helps scientists to understand changes in the atmosphere, track them and plan ways to reduce our environmental impact
• A wide variety of emissions is changing the chemistry and composition of our planet's atmosphere
• The atmosphere is a very complex chemical system
• So far data is used selectively
– Increased analysing power gives access to a wider spectrum of data and shortens turnaround times
The Vision
• An international network of scientists will be able to model a new flood of the Mekong river in real time, using meteorological and geological data from several centres across Europe
• UNOSAT:
– Internet-based service to provide high quality maps to UN agencies, NGOs and other institutions of the humanitarian community
– Grid technology allows raw satellite images to be reduced and processed into readable maps at a greater speed than would otherwise be possible
Access to a production-quality grid will change the way science and earth observation of all kinds are done
How does the grid work?
• The Grid relies on advanced software, called middleware, which ensures seamless communication between different computers and different parts of the world
• The Grid search engine not only finds the data the scientist needs, but also the data processing techniques and the computing power to carry them out
• It distributes the computing task to wherever in the world there is available capacity, and sends the result back to the scientist
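The matchmaking idea described above (find where the data is, find spare capacity, send the task there, return the result) can be illustrated with a toy sketch. This is not how any real middleware is implemented: the `Site` and `Job` classes, the site names, the dataset names and the "most free CPUs wins" rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical grid site (names and numbers are illustrative only)."""
    name: str
    free_cpus: int
    datasets: frozenset


@dataclass
class Job:
    """A hypothetical computing task that needs CPUs and one dataset."""
    name: str
    cpus: int
    dataset: str


def match(job, sites):
    """Pick a site that holds the job's dataset and has enough free CPUs.

    Assumed policy: among eligible sites, choose the one with the most
    free CPUs. Returns None when no site qualifies.
    """
    candidates = [
        s for s in sites
        if job.dataset in s.datasets and s.free_cpus >= job.cpus
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.free_cpus)


def submit(job, sites):
    """Send the job to the matched site and report back to the user."""
    site = match(job, sites)
    if site is None:
        return f"{job.name}: no site available"
    site.free_cpus -= job.cpus  # reserve the capacity the job uses
    return f"{job.name}: ran at {site.name}"


if __name__ == "__main__":
    sites = [
        Site("site-a", 4, frozenset({"flood-data"})),
        Site("site-b", 16, frozenset({"flood-data", "genome-data"})),
        Site("site-c", 64, frozenset({"genome-data"})),
    ]
    print(submit(Job("flood-model", 8, "flood-data"), sites))
```

In this sketch the user never names a site: the job only states what it needs, and the broker function decides where it runs, which is the point the slide makes about the middleware.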
Grid Challenges
• Share data between thousands of scientists with multiple interests
– Need to support dynamic virtual organisations of geographically dispersed groups
• Ensure all data is accessible anywhere, anytime
– Petabyte range of data needs to be available on demand
• Grow rapidly, yet remain reliable for more than a decade
– Are we sure the current technologies will scale?
– Transfer to industry to achieve economies of scale
• Standardisation process still ongoing
– Merge of web services (OASIS) and grids (GGF) into WSRF
– Must progress to avoid non-compatible proprietary grids
• Cope with different management policies of grid sites
– Link computer centres, not just single PCs, separately administered and owned
– Need resource allocation policies and billing systems
• Ensure security
– Medical applications have legal/ethical restrictions on data access
– Avoid becoming a target for hackers
What is EGEE?
• EGEE
– 1 April 2004 to 31 March 2006
– 71 partners in 27 countries, federated in regional Grids
• EGEE-II
– 1 April 2006 to 31 March 2008
– 91 partners in 32 countries
– 13 Federations
• Objectives
– Large-scale, production-quality infrastructure for e-Science
– Attracting new resources and users from industry as well as science
– Maintain and further improve “gLite” Grid middleware
Why did we choose to teach you about EGEE?
• EGEE is an operational grid infrastructure
– More than 100,000 jobs per day
• EGEE offers real services to its user communities
– Job and data management services are operational
• EGEE infrastructure is used to analyze LHC data
– Joining EGEE makes it possible to participate in LHC data analysis
• EGEE technology is well supported in Asia
– Academia Sinica in Taiwan offers central services to user communities around Asia
What does EGEE provide?
• Simplified access (access to all the operational resources the user needs)
• On demand computing (fast access to resources by allocating them efficiently)
• Pervasive access (accessible from any geographic location)
• Large scale resources (of a scale that no single computer centre can provide)
• Sharing of software and data (in a transparent way)
• Improved support (use the expertise of all partners to offer in-depth support for all key applications)
Highlights of EGEE-II
• >200 VOs from several scientific domains
– Astronomy & Astrophysics
– Civil Protection
– Computational Chemistry
– Comp. Fluid Dynamics
– Computer Science/Tools
– Condensed Matter Physics
– Earth Sciences
– Fusion
– High Energy Physics
– Life Sciences
• Further applications under evaluation
• Applications have moved from testing to routine and daily usage
– ~80-90% efficiency
[Figure: No. jobs / month, all VOs; y-axis 0 to 3,000,000; series: OPS, Non-LHC, LHC; annotation: 98k jobs/day]
EGEE-II middleware
• EGEE maintains and improves the gLite middleware distribution
• gLite 3
– Publicly released on May 4, 2006
– Convergence with LCG-2
– Currently deploying version 3.1
◦ On Scientific Linux
• Work management system
• Data management system
• Information system
• Resource brokering
• Security
[Figure: timeline 2004 to 2006 of LCG-2 and gLite, with stages labelled prototyping and product, ending with gLite 3.0 in 2006]
Operations
Size of the infrastructure today:
• 237 sites in 45 countries
• ~36,000 CPUs
• ~5 PB disk, plus tape MSS
• Distributed operations
• Copes well with increase in size and usage
[Figure: No. jobs / month, all VOs; y-axis 0 to 3,000,000; series: OPS, Non-LHC, LHC; annotation: 98k jobs/day]
[Figure: EGEE operations diagram; labels: EGEE, Network, Sites, Support Units, Users, NRENs, GGUS, ENOC, GÉANT2]
Applications
[Figure: VO CPU Consumption, Feb. 2005 to Feb. 2007; y-axis: Norm. CPU (1K.SI2K-months), 0 to 12,000; series: LHC, Non-LHC]
• Total VOs: 204
• Total Users: 5034
• Affected People: 10200
The pilot applications
– High Energy Physics with the LHC Computing Grid (www.cern.ch/lcg) relies on a Grid infrastructure to store and analyse petabytes of real and simulated data. LCG is a major source of resources and requirements, and faces a hard deadline with no conventional solution available
– In Biomedical Sciences, several communities are facing equally daunting challenges to cope with the flood of bioinformatics and healthcare data. They need access to large, distributed, non-homogeneous data and have significant on-demand computing requirements
LCG
• LCG: a collaboration of
– The LHC experiments
– The Regional Computing Centres
– Physics institutes
• Mission:
– Prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data
• Strategy:
– Integrate thousands of computers at dozens of participating institutes worldwide into a global computing resource
– Rely on software being developed in advanced grid technology projects, both in Europe and in the USA
WISDOM
• WISDOM: a collaboration of
– Biology, Bioinformatics, Chemoinformatics laboratories
– Grid infrastructure projects
• Mission:
– In silico drug discovery against emerging and neglected diseases
• Strategy:
– Centuries of CPU cycles used to dock millions of compounds during large-scale grid deployments
– Secure data management of biochemical information
Dissemination and Training
[Figure: www.eu-egee.org website traffic; y-axis 0 to 8000; series: Unique visitors, Links from Internet Search Engines]
• Comprehensive training programme in Europe, South America, Asia
• 110 events, > 1600 participants
ACGRID is one of these events
What is the Do Son ACGRID school about?
• Grids are about sharing
– Resources (CPU, storage)
– Knowledge
• The Do Son ACGRID school is about sharing knowledge
– Sharing expertise in the installation and operation of grid services
– Sharing expertise in the development and deployment of grid-enabled applications
• The Do Son ACGRID school is about building long-term collaboration
– We are here to help Vietnamese engineers to run grid services
– We are here to help Vietnamese scientists to develop and deploy grid-enabled applications
– We are here to present high-performance tools for data analysis and simulation
• TAKE ADVANTAGE OF THIS OPPORTUNITY TO ADVANCE YOUR RESEARCH
– Ask questions
– Don’t hesitate to talk with the teachers
What should happen after the school?
• Grid services will be installed at several sites in Vietnam
– In Hanoi: Hanoi University of Technology, IOIT, Institut Français d’Informatique
– In HCMC: IOIT
• You will be able to use your grid certificates to access the EGEE grid through these sites
– Possibility to join any other Virtual Organization
• You will benefit from the grid services like any other EGEE user
What you get out of the school
• Grids offer a unique opportunity to integrate research laboratories into international initiatives
– Example: LHC
• Grids offer opportunities to start collaboration
– Example: Telemedicine
◦ Installation of a grid-enabled medical imaging platform at IOIT in HCMC
◦ Joint application deployment between the platforms in HCMC and Clermont-Ferrand
It all depends on you!
Credits
• IOIT in Hanoi: Vu Duc Thy, Luong Chi Mai, Ngo Tran Anh and collaborators
• IOIT in HCMC: Do Van Long
• ASGC: Min Tsai, Jinny Chen and collaborators
• Nicolas Maire, Sébastien Incerti, René Brun, Georgina Moulton, our second-week speakers
• HealthGrid: Nicolas Spalinger, Nathanaël Verhaeghe
• CNRS office in Hanoi: Bernard Mely, Le Tuyet Trinh
• CNRS-IN2P3: Vincent Bloch, Vincent Breton, Géraldine Fettahi, Matthieu Reichstadt, Denis Perret-Gallix, Jean Salzemann
• TEIN2: David West