Introduction to Grid for Biomed community

Enabling Grids for E-sciencE
Introduction to Grid
Eddie.Aronovich@cs.tau.ac.il
www.eu-egee.org
INFSO-RI-508833
Acknowledgements
• Presentation is based on slides from:
– Roberto Barbera, University of Catania and INFN (EGEE Tutorial, Roma, 02.11.2005)
– Mike Mineter, Concepts of grid computing
– Fabrizio Gagliardi, EGEE Project Director, CERN, Geneva, Switzerland (Naregi Symposium 2005 – Tokyo)
– Fabrizio Gagliardi, EGEE Project Director, CERN, Geneva, Switzerland (APAC, 27 September 2005)
– Guy Warner, NeSC Training Team (An Induction to EGEE for GOSC and the NGS NeSC, 8th December 2004)
EGEE tutorial, Seoul Biomed Grid Induction (Tel-Aviv Univ), Feb 2006
EGEE project in 1K words
https://goc.grid-support.ac.uk/gridsite/monitoring/
From Phase I to II
• From 1st EGEE EU Review in February 2005:
– “The reviewers found the overall performance of the project very good.”
– “… remarkable achievement to set up this consortium, to realize appropriate structures to provide the necessary leadership, and to cope with changing requirements.”
• EGEE I
– Large scale deployment of EGEE infrastructure to deliver production level Grid services with selected number of applications
• EGEE II
– Natural continuation of the project’s first phase
– Emphasis on providing an infrastructure for e-Science
▪ increased support for applications
▪ increased multidisciplinary Grid infrastructure
▪ more involvement from Industry
– Extending the Grid infrastructure world-wide
▪ increased international collaboration (Asia-Pacific is already a partner!)
What more do we need?
• Processing power
• Storage
• Security aware integrative infrastructure
• Community aware environment
Or what we may call….
e-Science
• What is e-Science?
Collaborative science that is made possible by the
sharing across the Internet of resources (data,
instruments, computation, people’s expertise...)
– Often very compute intensive
– Often very data intensive (both creating new data and accessing
very large data collections) – data deluges from new
technologies
– Crosses organisational boundaries
• Examples….
A good example: Particle Physics
• Large amount of data produced in a few places: CERN, FNAL,
KEK…
• Large worldwide organized collaborations (e.g. the LHC experiments at CERN) of computer-savvy scientists
• Computing and data management resources distributed world-wide
owned and managed by many different entities
• Large Hadron Collider (LHC)
at CERN in Geneva
Switzerland:
– One of the most powerful
instruments ever built to
investigate matter
[Figure labels: Mont Blanc (4810 m); Downtown Geneva]
The LHC Experiments
• Large Hadron Collider (LHC):
– four experiments:
▪ ALICE
▪ ATLAS
▪ CMS
▪ LHCb
– 27 km tunnel
– Start-up in 2007
Orders of magnitude…
10-15 Petabytes
~20,000,000 CD-ROMs
10 times the
Eiffel Tower
˜3000 m
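The CD-ROM comparison on this slide can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 PB = 10^15 bytes) and a 700 MB disc, neither of which is stated on the slide.

```python
# Back-of-the-envelope check of "10-15 Petabytes ~ 20,000,000 CD-ROM".
# Assumptions (not from the slide): 1 PB = 10**15 bytes, one CD-ROM holds 700 MB.
PETABYTE = 10**15
CD_CAPACITY_BYTES = 700 * 10**6


def cds_needed(petabytes: int) -> int:
    """Number of CD-ROMs needed to hold the given volume, rounded up."""
    total_bytes = petabytes * PETABYTE
    return -(-total_bytes // CD_CAPACITY_BYTES)  # ceiling division on integers


if __name__ == "__main__":
    for volume in (10, 15):
        millions = cds_needed(volume) / 1e6
        print(f"{volume} PB -> about {millions:.1f} million CD-ROMs")
```

Under these assumptions the upper end of the range lands close to the 20 million discs quoted on the slide.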
EGEE – building e-infrastructure
EGEE is building a large-scale
production grid service to:
• Underpin research,
technology and public service
• Link with and build on
national, regional and
international initiatives
• Foster international cooperation both in the creation and the use of the e-infrastructure
[Figure labels: Pan-European Grid; Operations, Support and training; Collaboration; Network infrastructure & Resource centres]
Grids and e-Infrastructure
• “Campus grids”: internal to an institute / university:
– “High throughput” – harvesting compute time
– Not really ‘a grid’ unless crossing administrative domains
– Can become a resource on a grid
– Example: Condor
– http://www.nesc.ac.uk/esi/events/556/
• Grids: cross administrative boundaries
– National scale: in UK, NGS
– Regional efforts: in China, EUMedGrid, CrossGrid, SeeGrid
– International scale: in Europe, EGEE
e-Infrastructure
• implementation blocks
From a talk by Mario Campolargo, Brussels, 30 May 2005
Contents
• “The Grid” vision
• What is “a grid”?
• Drivers of grid computing
• Implementation samples
• Grid Status & Standards
• The basis: authentication, authorisation, security
• So, what can it do?
The Grid Metaphor
[Figure labels: Mobile Access; Workstation; GRID MIDDLEWARE; Supercomputer, PC-Cluster; Data-storage, Sensors, Experiments; Visualising; Internet, networks]
The grid vision
• The grid vision is of “Virtual
computing” (+ information
services to locate computation,
storage resources)
– Compare: The web: “virtual
documents” (+ search engine
to locate them)
• MOTIVATION: collaboration
through sharing resources
(and expertise) to expand
horizons of
– Research
– Commerce – engineering, …
“the knowledge economy”
– Public service – health,
environment,…
“A grid”
• The initial vision: “The Grid”
• The present reality: Many “grids”
• Each grid is an infrastructure enabling one or more “virtual organisations” to share computing resources
• What’s a VO?
– People in different organisations seeking to cooperate and share resources across their organisational boundaries
• Why establish a Grid?
– Share data
– Pool computers
– Collaborate
[Figure labels: VO; Institute A; Institute B; Institute C; Institute D]
The Single Computer
• The Operating System enables easy use of
– Input devices
– Processor
– Disks
– Display
– Any other attached devices
[Figure layers: Application Software; Operating System; Disks, Processor, Memory, …]
Resources on a Local Area Network
User just perceives “shared
resources”, with no regard to
location in the organisation:
- Authenticated by username /
password
- Authorised to use own files,…
Application Software
Middleware for sharing
computers, servers, printers, …
Operating System on each
computer
Resources connected by a LAN
Resources on a grid
Application Software
Interface between app. and grid
Grid Middleware: “collective services”
Grid Middleware on each
resource
Operating System on each
resource
Resources connected by internet
A grid
• Grid middleware runs on each shared resource
– Data storage
– (Usually) batch jobs on pools of processors
• Users join VOs
• Virtual organisation negotiates with sites to agree access to resources
• Distributed services (both people and middleware) enable the grid
[Figure label: INTERNET]
What characterises a grid?
• Co-ordinated resource sharing
– No centralised point of control
– Different administrative domains.
• Standard, open, general-purpose protocols and
interfaces
– NOT specific to an application
– EGEE, NGS support multiple VO’s
• Delivering non-trivial qualities of service
– Co-ordinated to deliver combined services,
greater than sum of the individual components
• http://www.gridtoday.com/02/0722/100136.html
The components of a Grid
• Resources
– networking, computers, storage, data, instruments, …
• Grid Middleware
– the “operating system of the grid”
• Operations infrastructure
– Run enabling services (people + software)
• Virtual Organization management
– Procedures for gaining access to resources
Key concepts
• Virtual organisation: people and resources collaborating across administrative and organisational boundaries
• Single sign-on
– I connect to one machine – some sort of “digital credential” is passed on to any other resource I use, basis of:
▪ Authentication: How do I identify myself to a resource without username/password for each resource I use?
▪ Authorisation: what can I do? Determined by
  • My membership of VO
  • VO negotiations with resource providers
• Grid middleware runs on each resource
• User just perceives “shared resources” with no concern for location or owning organisation
Large Hadron Collider at CERN
• Data Challenge:
– 10 Petabytes/year of data !!!
– 20 million CDs each year!
• Simulation, reconstruction,
analysis:
– LHC data handling requires computing
power equivalent to ~100,000 of today's
fastest PC processors!
• Operational challenges
– Reliable and scalable through project
lifetime of decades
[Figure labels: Mont Blanc (4810 m); Downtown Geneva]
BLAST gridification
[Figure: an Input file of sequences (Seq1, Seq2, …, Seqn) on the UI is split and sent to Computing elements, each running BLAST against a DB; the outputs are gathered into RESULT]
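The split, run and merge pattern behind BLAST gridification can be sketched in a few lines. This is a minimal illustration, not the actual gridified BLAST: it assumes the input uses standard FASTA-style `>` header lines, and `run_blast_job` is a hypothetical stand-in for a job sent to a computing element (no real BLAST or grid middleware call is made).

```python
# Sketch of the split / run / merge pattern for independent sequence jobs.
# Everything here is illustrative: run_blast_job is a hypothetical placeholder
# for a job submitted to a computing element; threads stand in for remote sites.
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA-style text into (name, sequence) records."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def run_blast_job(record: tuple[str, str]) -> str:
    """Hypothetical placeholder for one BLAST run against a database copy."""
    name, sequence = record
    return f"{name}: searched {len(sequence)} residues"


def gridify(text: str, workers: int = 4) -> list[str]:
    """Run one independent job per sequence and collect results in input order."""
    records = split_fasta(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_blast_job, records))


if __name__ == "__main__":
    sample = ">Seq1\nACGT\nAC\n>Seq2\nGGTTAA\n"
    for line in gridify(sample):
        print(line)
```

Because each sequence is searched independently, the jobs need no communication with each other, which is what makes this kind of workload a natural fit for a grid.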
DAME: Grid based tools and Infrastructure for Aero-Engine Diagnosis and Prognosis
• “A significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DS&S. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed”, DS&S 2004
Companies: Rolls-Royce, DS&S, Cybula
Universities: York, Leeds, Sheffield, Oxford
[Figure labels: Engine flight data; London Airport; Airline office; New York Airport; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center; XTO; Engine Model; Case Based Reasoning]
If “The Grid”
vision leads us
here…
… then where are
we now?
Grids: where are we now?
• Many key concepts identified and known
• Many grid projects have tested, and benefit from, these
• Major efforts now on establishing:
– Standards (a slow process)
(e.g. Global Grid Forum, http://www.gridforum.org/ )
– Production Grids for multiple VOs
▪ “Production” = Reliable, sustainable, with commitments to quality of service
  • In Europe, EGEE
  • In UK, National Grid Service
  • In US, TeraGrid
▪ One stack of middleware that serves many research (and other!) communities
▪ Operational procedures and services (people!, policy,..)
– New user communities
• … whilst research & development continues
The vision of 2001: convergence of
Web Services and Grids
[Figure labels: Open Grid Services Architecture; OGSI; Grid prototypes; Web services; World-wide web; High-end computing; High throughput-computing; Massively parallel computing; INTERNET]
Approaches to Security: 1
The Poor Security House
Approaches to Security: 2
The Paranoid Security House
Approaches to Security: 3
The Realistic Security House
Grid security and trust -1
• Providers of resources (computers, databases,..) need risks to
be controlled: they are asked to trust users they do not know
– They trust a VO
– The VO trusts its users
• Users need
– single sign-on: to be able to log on to a machine that can pass the user’s identity to other resources
– To trust owners of the resources they are using
• Build middleware on layer providing:
– Authentication: who wants to use/provide resource
– Authorisation: what the user is allowed to do
– Security: reduce vulnerability, e.g. from outside the firewall
– Non-repudiation: knowing who did what
• Digital credentials and the “Grid Security Infrastructure”
middleware are the basis of production grids
Grid security and trust -2
• Currently, achieved by Certification:
– User’s identity has to be certified by one of the national Certification Authorities (CAs)
▪ CAs are mutually recognized, see http://www.gridpma.org/; for the EU, go via http://marianne.in2p3.fr/datagrid/ca/catable-ca.html to find your CA
  • E.g. in IL go to https://certificates.iucc.ac.il
– Resources are also certified by CAs
• User
– User joins a VO
– Digital certificate is basis of AA
– Identity passed to other resources you use, where it is
mapped to a local account – the mapping is maintained by
the VO
• Common agreed policies establish rights for a
Virtual Organization to use resources
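The idea that a certified identity is checked against VO membership and then mapped to a local account can be illustrated with a small model. This is only a sketch under stated assumptions: the subject names, VO names, account names and the `map_to_local_account` function are all invented for the example, and no real middleware configuration format is shown.

```python
# Illustrative model of "identity ... mapped to a local account".
# The subject names, VO names and account names below are invented;
# real sites use their own configuration, which is not reproduced here.

# Who belongs to which VO (hypothetical membership lists).
VO_MEMBERS = {
    "biomed": {"/C=IL/O=Example/CN=Alice Researcher"},
    "hep": {"/C=CH/O=Example/CN=Bob Physicist"},
}

# VOs this hypothetical site has agreed to serve, and the local account for each.
SITE_ACCOUNTS = {"biomed": "biomed001"}


def map_to_local_account(subject: str, vo: str) -> str:
    """Return the local account for a certificate subject, or raise PermissionError."""
    if subject not in VO_MEMBERS.get(vo, set()):
        raise PermissionError(f"{subject} is not a member of VO {vo!r}")
    if vo not in SITE_ACCOUNTS:
        raise PermissionError(f"this site has no agreement with VO {vo!r}")
    return SITE_ACCOUNTS[vo]


if __name__ == "__main__":
    print(map_to_local_account("/C=IL/O=Example/CN=Alice Researcher", "biomed"))
```

The two checks mirror the two trust relationships on the slides: the VO vouches for its users, and the site agrees a policy with the VO rather than with each user.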
Grid security and trust -3
• Certification and GSI provide
– Authentication
▪ Resource can trust user
▪ User can trust the resource provider
▪ … so long as certificates are protected – they are your grid identity
– A basis for Authorisation
▪ so a VO can manage access to resources
▪ Resource providers trust the VO
▪ The VO trusts the user
– Mechanism for checking message integrity
▪ Messages are passed between machines
▪ Public/private key pairs protect message integrity as well as authentication
  • Not (usually) encrypted but message-integrity is checked
Certificate Request
• User generates public/private key pair.
• User sends public key to CA along with proof of identity (Certificate Request).
• CA confirms identity, signs certificate and sends back to user.
• Private Key encrypted on local disk
[Figure labels: Certificate Request; Public Key; Cert; ID; Private Key]
slide based on presentation given by Carl Kesselman at GGF Summer School 2004
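The certificate request steps can be walked through with a toy model. This is a deliberately simplified sketch using textbook RSA on fixed Mersenne primes with no padding, so it is not secure and is not how real grid certificates (X.509) are produced; the identity string and all function names are assumptions made for illustration.

```python
# Toy walk-through of the certificate request steps with textbook RSA.
# NOT secure: fixed well-known primes, no padding. It only shows who signs what.
import hashlib


def make_keypair(p: int, q: int, e: int = 65537):
    """Return ((n, e), (n, d)) for distinct primes p and q."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, needs Python 3.8+
    return (n, e), (n, d)


def digest(message: bytes, n: int) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, private_key) -> int:
    n, d = private_key
    return pow(digest(message, n), d, n)


def verify(message: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)


# 1. User generates a public/private key pair (the private key stays with the user).
user_public, user_private = make_keypair(2**31 - 1, 2**61 - 1)
# The CA has its own key pair; its public key is assumed to be known to everyone.
ca_public, ca_private = make_keypair(2**89 - 1, 2**107 - 1)

# 2. User sends the public key plus identity to the CA (the certificate request).
request = f"CN=Example User;key={user_public}".encode()

# 3. CA confirms the identity and signs; request plus signature form the certificate.
certificate = (request, sign(request, ca_private))

if __name__ == "__main__":
    # Any resource holding the CA public key can check the certificate.
    print(verify(*certificate, ca_public))
```

Note that the CA never sees the user's private key: only the public key and the proof of identity travel in the request.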
BioMed Overview
• Infrastructure
– ~3,000 CPUs
– ~12 TB of disk
– in 9 countries
• >50 users in 7 countries working with 12 applications
• 18 research labs
• 15 resource centres, 17 CEs, 16 SEs
[Figure labels: PADOVA; BARI]
[Figure: BIOMED number of jobs per month, 2004-09 to 2005-03; vertical axis 0 to 25,000]
Biomed Virtual Organisation
• ~ 70 users, 9 countries
• > 12 Applications (medical image processing, bioinformatics)
• ~3000 CPUs, ~12 TB disk space
• ~100 CPU years, ~ 500K jobs last 6 months
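The usage figures above imply a typical job length and an average load, which a short calculation makes explicit. This is a rough sketch: it assumes that the ~100 CPU years and the ~500K jobs both refer to the same six-month window and uses a 365.25-day year; the function names are illustrative.

```python
# Rough averages implied by "~100 CPU years, ~500K jobs last 6 months".
# Assumptions (not from the slide): both figures cover the same 6 months,
# and a year is 365.25 days.
HOURS_PER_YEAR = 365.25 * 24


def average_job_hours(cpu_years: float, jobs: int) -> float:
    """Mean CPU time per job, in hours."""
    return cpu_years * HOURS_PER_YEAR / jobs


def average_busy_cpus(cpu_years: float, months: float) -> float:
    """Mean number of CPUs kept busy over the period."""
    return cpu_years / (months / 12)


if __name__ == "__main__":
    print(f"average job: {average_job_hours(100, 500_000):.2f} CPU hours")
    print(f"average load: {average_busy_cpus(100, 6):.0f} CPUs busy")
```

Under these assumptions the workload is dominated by many short jobs, and the average load is a small fraction of the ~3000 CPUs listed.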
[Figure: BIOMED jobs distribution per month, 2005-01 to 2005-08; series: registered jobs, successful jobs, cancelled jobs, aborted jobs (nb of jobs, 0 to 120000) and run duration estimate (years, 0 to 60)]
Bioinformatics
• GPS@: Grid Protein Sequence Analysis
– Gridified version of NPSA web portal
▪ Offering protein databases and sequence analysis algorithms to bioinformaticians (3000 hits per day)
▪ Need for large databases and a large number of short jobs
– Objective: increased computing power
– Status: 9 bioinformatics software packages gridified
– Grid added value: open to a wider community with larger bioinformatic computations
• xmipp_MLrefine
– 3D structure analysis of macromolecules
▪ From (very noisy) electron microscopy images
▪ Maximum likelihood approach to find the optimal model
– Objective: study molecule interaction and chem. properties
– Status: algorithm being optimised and ported to 3D
– Grid added value: parallel computation on different resources of
independent jobs
Drug Discovery
• Demonstrate the relevance and the impact of the grid
approach to address Drug Discovery for neglected
diseases
[Figure: drug discovery pipeline. Stages: Target discovery (Target Identification, Target Validation); Lead discovery (Lead Identification, Lead Optimization); Clinical Phases (I-III). Computer Aided Drug Design (CADD) methods: Database filtering; Alignment; Similarity analysis; vHTS; Biophores; QSAR; ADMET; diversity selection; Combinatorial libraries; de novo design]
Duration: 12 – 15 years, Costs: 500 - 800 million US $
Docking platform components
• Predict how small molecules, such as substrates or
drug candidates, bind to a receptor of known 3D
structure
[Figure labels: Grid infrastructure; UI; Targets family (~10); Compounds database (~millions); Parameter / scoring settings; Software methods (~10)]
First biomedical data challenge: World-wide In
Silico Docking On Malaria (WISDOM)
• Significant biological parameters
– two different molecular docking applications (Autodock and FlexX)
– about one million virtual ligands selected
– target proteins from the parasite responsible for malaria
• Significant numbers
– Total of about 46 million ligands docked in 6 weeks
– 1TB of data produced
– Up to 1000 computers in 15 countries used simultaneously, corresponding to about 80 CPU years
Domain distribution of FlexX run jobs: bg 597; com 1072; cy 383; de 715; uk 8106; es 5122; tw 827; ru 218; ro 337; pl 1877; fr 7580; nl 3356; it 3687; il 263; gr 2004
• WISDOM open day, December 16th, 2005, Bonn (Germany)
– Discuss Data Challenge results
– Prepare next steps towards a malaria Grid (EGEE-II, Embrace, Bioinfogrid)
Information: http://wisdom.eu-egee.fr
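The headline WISDOM numbers can be turned into per-docking and per-day figures with simple arithmetic. This is a rough sketch: the inputs are the rounded values quoted on the slide, while the 365.25-day year, the 42 days for six weeks and the variable names are assumptions made here.

```python
# Rough figures derived from the WISDOM slide.
# Inputs are the rounded numbers quoted on the slide; the calendar
# conventions (365.25-day year, 6 weeks = 42 days) are assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

CPU_YEARS = 80
DOCKINGS = 46_000_000
DAYS = 6 * 7

# FlexX run jobs per domain, as listed on the slide.
FLEXX_JOBS = {
    "bg": 597, "com": 1072, "cy": 383, "de": 715, "uk": 8106,
    "es": 5122, "tw": 827, "ru": 218, "ro": 337, "pl": 1877,
    "fr": 7580, "nl": 3356, "it": 3687, "il": 263, "gr": 2004,
}

seconds_per_docking = CPU_YEARS * SECONDS_PER_YEAR / DOCKINGS
average_busy_cpus = CPU_YEARS * 365.25 / DAYS
total_flexx_jobs = sum(FLEXX_JOBS.values())
top_domain = max(FLEXX_JOBS, key=FLEXX_JOBS.get)

if __name__ == "__main__":
    print(f"CPU time per docking: about {seconds_per_docking:.0f} s")
    print(f"average CPUs busy over {DAYS} days: about {average_busy_cpus:.0f}")
    print(f"FlexX jobs listed: {total_flexx_jobs}, most from .{top_domain}")
```

The average load comes out below the quoted peak of up to 1000 computers, which is consistent with the peak being reached only part of the time.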
Medical imaging
• GATE
– Radiotherapy planning
▪ Improvement of precision by Monte Carlo simulation
▪ Processing of DICOM medical images
– Objective: very short computation time compatible with clinical
practice
– Status: development and performance testing
– Grid Added Value: parallelisation reduces computing time
• CDSS
– Clinical Decision Support System
▪ Assembling knowledge databases
▪ Using image classification engines
– Objective: access to knowledge databases from hospitals
– Status: from development to deployment, some medical end
users
– Grid Added Value: ubiquitous, managed access to distributed
databases and engines
Medical imaging
• SiMRI3D
– 3D Magnetic Resonance Image Simulator
▪ MRI physics simulation, parallel implementation
▪ Very compute intensive
– Objective: offering an image simulator service to the research
community
– Status: parallelised and now running on EGEE resources
– Grid Added Value: enables simulation of high-res images
• gPTM3D
– Interactive tool to segment and analyse medical images
▪ A non gridified version is distributed in several hospitals
▪ Need for very fast scheduling of interactive tasks
– Objectives: shorten computation time using the grid
▪ Interactive reconstruction time: < 2min and scalable
– Status: development of the gridified version being finalized
– Grid Added Value: permanent availability of resources
Generic Applications
• EGEE Generic Applications Advisory Panel (EGAAP)
– UNIQUE entry point for “external” applications
– Reviews proposals and makes recommendations to EGEE management
▪ Deals with “scientific” aspects, not with technical details
▪ Generic Applications group in charge of introducing selected applications to the EGEE infrastructure
– 6 applications selected so far:
▪ Earth sciences (earth observation, geophysics, hydrology, seismology)
▪ MAGIC (astrophysics)
▪ Computational Chemistry
▪ PLANCK (astrophysics and cosmology)
▪ Drug Discovery
▪ E-GRID (e-finance and e-business)
▪ GRACE (grid search engine, ended Feb 2005)
Security & Intellectual Property (I)
• The existing EGEE grid middleware is distributed under an Open Source License developed by EU DataGrid
– No restriction on usage (scientific or commercial) beyond acknowledgement
– Same approach for new middleware
• Application software maintains its own licensing scheme
– Sites must obtain appropriate licenses before installation
Summary of grid computing concepts
• Flexible collaboration across multiple administrative
domains – sharing data, computers, instruments,
application software,..
• Single sign-on to resources in multiple organisations
– Authorisation, authentication
• Need for people-services as well as middleware
services
– credential authorities, VO managers, support
• The drive is towards
– Production services (reliable, sustainable,… – against which research projects can plan with confidence)
▪ In Europe, EGEE
▪ In UK, National Grid Service
– Standards
– Empowering new user communities
Conclusions
• Grids are a powerful new tool for science – as well as
other fields
• Grid computing is used by different communities, such as Biomedical and HEP, as the most cost effective computing model
• Investments in grid projects are growing world-wide
• We are here to help you join!
Who's who in EGEE-IL
• David Horn
– Michal Finkelman-Reuven
• Eddie Aronovich
• NA3 (dissemination) team
– Vered Kunik
– Assaf Gotleib
• SA1 (technical) team
– Yan Ben-Hamou (TAU)
– Ofer Wald (OU)
– Lorne Levinson (WI)
Contacts
• Israeli Academic Grid (IAG)
http://iag.iucc.ac.il/
• EGEE Website
http://www.eu-egee.org
• How to test
https://gilda.ct.infn.it