User communities and applications

Enabling Grids for E-sciencE
David Fergusson
28th February
www.eu-egee.org
INFSO-RI-508833
• What is the EGEE community?
– Researchers in eScience (applications NA4)
– eResearch
– European community
– World grid community
– Industry (industry forum)
• What is not the EGEE community?
eScience/eResearch
• EGEE’s initial focus is on specific scientific communities
– High Energy Physics (Large Hadron Collider)
– Biomedical
– Geology
– Chemistry
– Astrophysics
• Collaborating with other EU projects in other areas
– For example, digital libraries - DILIGENT
Applications in EGEE
• Production service supporting multiple VOs with different requirements
– Data
▪ Volume
▪ Location – distributed?
▪ Write Once or Update?
▪ Metadata archives?
▪ Controlled or open access?
– Computation
▪ High throughput (~ current LCG)
▪ High performance, supercomputing
– No. of sites, scientists,…
• Establish viable general process to bring other scientific communities on board
An EGEE community
• EGEE communities are based around the idea of Virtual Organisations.
• A Virtual Organisation:
– Owns shared computing resources
– Authenticates its members and authorises their access to resources
– Manages its own resources
EGEE: adding a VO
EGEE has a formal procedure for adding selected new user communities (Virtual Organisations):
• Negotiation with one of the Regional Operations Centres
• Seek balance between the resources contributed by a VO and those that they consume.
• Resource allocation will be made at the VO level.
• Many resources need to be available to multiple VOs: shared use of resources is fundamental to a Grid
The role of the pilot applications – HEP and Biomedicine
• Initial area of focus to establish a strong user base on which to build a broad EGEE user community
• Provide early feedback to the infrastructure activities on their experience with application deployment and VO management
• Act as guinea pigs and provide early feedback to the middleware developers on their experience with new services
EGEE pilot application: Large Hadron Collider
• Data Challenge:
– 10 Petabytes/year of data!
– 20 million CDs each year!
• Simulation, reconstruction, analysis:
– LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors!
• Operational challenges
– Reliable and scalable through project lifetime of decades
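The CD comparison in the data challenge figures can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 PB = 10^15 bytes), and the variable names are illustrative rather than taken from the slides.

```python
# Back-of-the-envelope check of the LHC data volume figures quoted above.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes).
PETABYTE = 10**15
MEGABYTE = 10**6

data_per_year_bytes = 10 * PETABYTE   # "10 Petabytes/year of data"
cds_per_year = 20_000_000             # "20 million CDs each year"

# Capacity per CD implied by the two figures on the slide.
mb_per_cd = data_per_year_bytes / cds_per_year / MEGABYTE

# Average data rate if the yearly volume were spread evenly over 365 days.
seconds_per_year = 365 * 24 * 3600
avg_rate_mb_per_s = data_per_year_bytes / seconds_per_year / MEGABYTE

print(f"Implied capacity per CD: {mb_per_cd:.0f} MB")
print(f"Average rate over a year: {avg_rate_mb_per_s:.0f} MB/s")
```

The implied 500 MB per disc is a little below the nominal capacity of a standard CD (roughly 650 to 700 MB), so the two figures are consistent as a rounded comparison.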
[Figure labels: Mont Blanc (4810 m), Downtown Geneva]
The characteristics of pilot HEP applications
• Very large scale from project day 1
• Virtual Organizations were already set up at project day 1
• Very centralized: jobs are sent in a very organized way
• Multi-grid: data challenges are deployed on several grids
– ALICE: LCG, Alien
– ATLAS: LCG, US Grid2003, Nordugrid
– CMS: LCG, US Grid2003
– LHCb: LCG, Dirac
The Large Hadron Collider
http://www.cern.ch
[Figure labels: LHC, ~9 km, SPS, CERN]
The LHC Experiments
Overview of experiences with LHC data challenges
• There was continual evolution throughout 2004, with LCG and the experiments gaining more experience in the development and use of an expanding LCG grid
• All experiments had excellent relations with LCG-EIS support, a model for the future support of VOs
• Global job efficiencies ranged from 60-80% as experience developed; they must reach 90+% for user analysis, which calls for new middleware developments and tighter operational procedures
• Sources of problems and losses
– Site configuration, management and stability
– Data Management (especially metadata handling)
– Difficult to monitor running jobs and causes of failure
• D0 in early 2005 showed that one can run with good efficiency with a set of well controlled sites
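One way to see why the gap between 60-80% and 90+% matters is to estimate how many submissions each successful job costs. The sketch below assumes, purely for illustration, that the quoted efficiency is the independent success probability of each attempt and that failed jobs are resubmitted until they succeed. Neither assumption comes from the data challenges themselves, and `expected_attempts` is a hypothetical helper.

```python
def expected_attempts(efficiency: float) -> float:
    """Mean submissions per successful job under a geometric retry model.

    Assumption (not from the slides): each attempt succeeds independently
    with probability `efficiency`, and failures are retried until success.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return 1.0 / efficiency


for eff in (0.60, 0.80, 0.90):
    attempts = expected_attempts(eff)
    wasted = attempts - 1.0  # extra submissions beyond the one that succeeds
    print(f"efficiency {eff:.0%}: {attempts:.2f} attempts per success, "
          f"{wasted:.2f} wasted")
```

Under this simple model, moving from 60% to 90% efficiency cuts the wasted submissions per successful job from about 0.67 to about 0.11.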
EGEE pilot application: BioMedical
• BioMedical
– Bioinformatics (gene/proteome database distribution)
– Medical applications (screening, epidemiology, image database distribution, etc.)
– Interactive application (human supervision or simulation)
– Security/privacy constraints
→ Heterogeneous data formats - Frequent data updates - Complex data sets - Long term archiving
http://egee-na4.ct.infn.it/biomed/applications.html
The characteristics of biomedical pilot applications
• Prototype level at project day 1
• VO was created after the project kicked off
• Very decentralized: application developers use the grid at their own pace
• Very demanding on services
– Compute intensive applications
– Applications requiring large numbers of short jobs
– Need for interactivity or guaranteed response time
• Resources were focused on the deployment of large scale applications on LCG-2
– Integration of Biomed VO used to identify issues relevant to all VOs to be deployed during EGEE lifetime
– Decentralized usage of the infrastructure highlights different weaknesses from the more centralized HEP data challenges
Status of Biomedical VO
• RLS, VO LDAP Server: CC-IN2P3
• 4 RBs: CNAF, IFAE, LAPP, UPV
• 15 resource centres
• 17 CEs (>750 CPUs)
• 16 SEs
• 4 RBs
• 1 RLS
• 1 LDAP Server
[Map labels: PADOVA, BARI]
Biomedical VO: production jobs on EGEE
Biomedical applications
– 3 batch-oriented applications ported to LCG-2
▪ SiMRI3D: medical image simulation
▪ xmipp_MLRefine: molecular structure analysis
▪ GATE: radiotherapy planning
– 3 high throughput applications ported to LCG-2
▪ CDSS: clinical decision support system
▪ GPS@: bioinformatics portal (multiple short jobs)
▪ gPTM3D: radiology images analysis (interactivity)
– New applications to join in the near future
▪ Especially in the field of drug discovery
EGEE pilot application: BioMedical
• BioMedical
– Bioinformatics (gene/proteome databases distributions)
– Medical applications (screening, epidemiology, image databases distribution, etc.)
– Interactive application (human supervision or simulation)
– Security/privacy constraints
→ Heterogeneous data formats - Frequent data updates - Complex data sets - Long term archiving
• BioMed applications deployed
– GATE - Geant4 Application for Tomographic Emission
– GPS@ - genomic web portal
– CDSS - Clinical Decision Support System
12 Biomed applications
• GATE: Geant4 Application for Tomographic Emission (LPC)
• Docking platform for tropical diseases: grid-enabled docking platform for in silico drug discovery (LPC)
• CDSS: Clinical Decision Support System (UPV)
• GPS@: Grid genomic web portal (IBCP)
• SiMRI 3D: Magnetic Resonance Image simulator (CREATIS)
• gPTM 3D: Interactive radiological image visualization and processing tool (LRI)
• xmipp_ML_refine: Macromolecular 3D structure analysis (CNB)
• xmipp_multiple_CTFs: CTF calculation for electron microscopy images (CNB)
• GridGRAMM: Molecular Docking web (CNB)
• GROCK: Mass screenings of molecular interaction (CNB)
• Mammogrid: Mammograms analysis (EU project)
• SPLATCHE: Genome evolution modeling (U. Berne/WHO)
...and more to come
• SPLATCHE
– first application being migrated from GILDA to biomed VO
• Pharmacokinetics in MRI (UPV)
– MRI registration for contrast agent diffusion study
• Some progress on biological sequences analysis (M. Lexa)
• ...
BLAST – comparing DNA or protein sequences
• BLAST is the first step for analysing new sequences: it compares DNA or protein sequences with others stored in personal or public databases. Ideal as a grid application.
– Requires resources to store databases and run algorithms
– Can compare one or several sequences against a database in parallel
– Large user community
Bio-medicine applications
• Bio-informatics
– Phylogenetics
– Search for primers
– Statistical genetics
– Bio-informatics web portal
– Parasitology
– Data-mining on DNA chips
– Geometrical protein comparison
• Medical imaging
– MR image simulation
– Medical data and metadata management
– Mammography analysis
– Simulation platform for PET/SPECT
[Legend: Applications deployed, Applications tested, Applications under preparation]
[Diagram labels: Medical images; Metadata (exam, image, patient, key, ACL, ...); Similar images; Low score images]
1. Query the medical image database and retrieve a patient image
2. Compute similarity measures over the database images (submit 1 job per image)
3. Retrieve most similar cases
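The three numbered steps of the medical image workflow can be sketched as follows. This is a minimal illustration, not the application's actual code: the in-memory `database`, the toy `similarity` measure and the function `retrieve_similar` are all assumptions made for the example, and a local thread pool stands in for submitting one grid job per image.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the medical image database: image id -> pixel values.
database = {
    "img_a": [10, 20, 30, 40],
    "img_b": [12, 18, 33, 41],
    "img_c": [200, 180, 90, 5],
}


def similarity(query, candidate):
    """Toy score: negative mean absolute difference (higher means more similar)."""
    return -sum(abs(q - c) for q, c in zip(query, candidate)) / len(query)


def retrieve_similar(query_id, top_k=1):
    # Step 1: query the database and retrieve the patient image.
    query = database[query_id]
    others = {k: v for k, v in database.items() if k != query_id}

    # Step 2: one independent task per database image
    # (the thread pool stands in for "submit 1 job per image").
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda img: similarity(query, img), others.values())
        scores = dict(zip(others, results))

    # Step 3: retrieve the most similar cases, best score first.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


print(retrieve_similar("img_a"))
```

Because each comparison depends only on the query image and one database image, the tasks in step 2 are independent of one another, which is what makes a one-job-per-image submission pattern possible.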
gPTM3D: Grid-Enabling Interactive Medical Analysis
[Diagram labels: Interaction, Acquire, Explore, Analyse, Interpret, Render]
Use case
Planning percutaneous nephrolithotomy
Evolution of biomedical applications
• Growing interest of the biomedical community
– Partners involved proposing new applications
– New application proposals (in various health-related areas)
– Enlargement of the biomedical community (drug discovery)
• Growing scale of the applications
– Progressive migration from prototypes to pre-production services for some applications
– Increase in scale (volume of data and number of CPU hours)
• Towards pre-production
– Several initiatives to build user-friendly portals and interfaces to existing applications in order to open them to an end-user community
A look at the future: the HealthGrid vision
In this context "Health" does not involve only clinical practice but covers the whole range of information from molecular level (genetic and proteomic information) over cells and tissues, to the individual and finally the population level (social healthcare).
[Diagram labels: HealthGRID; Public Health, Patient, Tissue/organ, Cell, Molecule; Association, Modelling, Computation; Patient related data; Databases; Computational recommendation; INDIVIDUALISED HEALTHCARE; MOLECULAR MEDICINE]
Earth Sciences in EGEE
• Research
– Earth observations by satellite
▪ ESA (IT), KNMI (NL), IPSL (FR), UTV (IT), RIVM (NL), SRON (NL)
– Climate:
▪ DKRZ (DE), IPSL (FR)
– Solid Earth Physics:
▪ IPGP (FR)
– Hydrology:
▪ Neuchâtel University (CH)
• Industry
– CGG: Geophysics Company (FR)
Climate Applications in EGEE
Model: Atmosphere, Ocean, Hydrology, Atmospheric and Marine chemistry…
Goal: Comparison of model outputs from different runs and/or institutes
Large volume of data (TB) from different model outputs, and experimental data
Runs are made on supercomputers
=> Link the EGEE infrastructure with supercomputer Grids (DEISA)
EXAMPLE: For the IPCC Assessment reports, many experiments are performed with different models (different spatial resolution, different timestep, different "physics", ...) and at various sites. The generated data need to be compared in a comprehensive and "unified" way.
Geophysics Applications
Seismic processing Generic Platform:
- Based on Geocluster, an industrial application, to be a starter of the core member VO.
- Includes several standard tools for signal processing, simulation and inversion.
- Open: any user can write new algorithms in new modules (shared or not)
- Free for academic research
- Controlled by license keys (opportunity to explore license issues at a grid level)
- Initial partners: F, CH, UK, Russia, Norway
Flood simulation
[Diagram labels: Sample: Vah river; Computer vision; Geographical Information Systems]
Results: flow + water depths
Computational Chemistry: molecular simulator
1. SURFACE: Construction of the Potential Energy Surface
2. DYNAMICS: Dynamical properties calculation
3. PROPERTIES: Calculation of averaged quantities
4. Good results? If no, loop back; if yes, end.
[Figure label: Ar - Benzene]
The MAGIC telescope
• Largest Imaging Air Cherenkov Telescope (17 m mirror dish)
• Located on Canary Island La Palma (@ 2200 m asl)
• Lowest energy threshold ever obtained with a Cherenkov telescope
→ Aim: detect γ-ray sources in the unexplored energy range: 30 (10) -> 300 GeV
The MAGIC Physics Program
[Diagram labels: Pulsars, AGNs, SNRs, GRBs, Cold Dark Matter]
→ Origin of Cosmic Rays
→ Cosmological γ-Ray Horizon
→ Tests of Quantum Gravity effects
Feedback to LCG-2 middleware developers and infrastructure
• From HEP applications
– Experiment Integration Support group and Grid Applications Group produced documents summarizing problems encountered in use of LCG-2
• From Biomed applications
– Very significant exchanges related to the set-up of the biomed VO and the deployment of relevant services
– Request to use MPI
Engineering applications
Grid Applications: art
The Thomson flat scanner, developed in 1990
140,000 photo-archives digitised in 6,000 dots x 8,000 lines in 5 years (1996-2001)
Books are being scanned in at 767 MB per page (1/2 Terabyte for the Gutenberg Bible)
Paintings are being scanned in at 30 GB each in the EU CRISATEL Project
Museo Virtual de Artes El Pais (MUVA)
http://www3.diarioelpais.com/muva/.
Who else can benefit from EGEE?
• EGEE Generic Applications Advisory Panel:
– For new applications
• EU projects: MammoGrid, Diligent, SEEGRID …
• Expression of interest: Planck/Gaia (astroparticle), SimDat (drug discovery)
http://agenda.cern.ch/age?a042351
Next meeting at EGEE conference (November)
New communities identification
• Through training, dissemination and outreach, communities already using advanced computing and keen to use EGEE infrastructure are identified
• These communities are encouraged to prepare a document describing their interest in using EGEE
• A scientific advisory panel (EGAAP) assesses the interested communities and selects those that seem most ready to deploy their applications on EGEE
GILDA, an infrastructure for dissemination and demonstration
• Goals
– Demonstration of grid operation for tutorials and outreach
– Initial deployment of new applications for testing purposes
• Key features
– Initiative of the INFN Grid Project using LCG-2 middleware
– On request, anyone can quickly receive a grid certificate and a VO membership allowing them to use the infrastructure for 2 weeks
– Certificate expires after two weeks but can be renewed
– User-friendly interface: Genius grid portal
• Very important for the first steps of new user communities onto the grid infrastructure
GILDA numbers
• 14 sites in 2 continents
• >1200 certificates issued, 10% renewed at least once
• >35 tutorials and demos performed in 10 months
• >25 jobs/day on average
• Job success rate above 96%
• >320,000 hits on the web site from tens of different countries
• >200 copies of the UI live CD distributed in the world
NA4 Applications and GILDA
• 7 Virtual Organizations supported:
– Biomed
– Earth Science Academy (ESR)
– Earth Science Industry (CGG)
– Astroparticle Physics (MAGIC)
– Computational Chemistry (GEMS)
– Grid Search Engines (GRACE)
– Astrophysics (PLANCK)
• Development of complete interfaces with GENIUS for 3 Biomed Applications: GATE, hadronTherapy, and Friction/Arlecore
• Development of complete interfaces with GENIUS for 4 Generic Applications: EGEODE (CGG), MAGIC, GEMS, and CODESA-3D (ESR) (see demos!)
• Development of complete interfaces with GENIUS for 16 demonstrative applications available on the GILDA Grid Demonstrator (https://grid-demo.ct.infn.it)
Summary
• EGEE and grids – not just physics
• For communities to benefit they need to know what grids can do for them – dissemination
• Many communities are beginning to adopt the grid
• EGEE has a mechanism for assisting communities onto the grid
Practical URLs
• homepages.nesc.ac.uk/~gcw
• grid-demo.ct.infn.it