Middleware Development and Deployment Status
Tony Doyle, University of Glasgow
PPE & PPT Lunchtime Talk
9 November 2004
Contents
• What are the Challenges?
• What is the scale?
• How does the Grid work?
• What is the status of (EGEE) middleware development?
• What is the deployment status?
• What is GridPP doing as part of the international effort?
  – What was GridPP1?
  – Is GridPP a Grid?
  – What is planned for GridPP2?
  – What lies ahead?
• Summary
  – Why? What? How? When?
Science generates data and might require a Grid?
• Astronomy
• Healthcare
• Earth Observation
• Bioinformatics
• Digital Curation
• Collaborative Engineering
What are the challenges?
Must:
• share data between thousands of scientists with multiple interests
• link major (Tier-0 [Tier-1]) and minor (Tier-1 [Tier-2]) computer centres
• ensure all data are accessible anywhere, anytime
• grow rapidly, yet remain reliable for more than a decade
• cope with the different management policies of different centres
• ensure data security
• be up and running routinely by 2007
What are the challenges?
Data Management, Security and Sharing:
1. Software process
2. Software efficiency
3. Deployment planning
4. Link centres
5. Share data
6. Manage data
7. Install software
8. Analyse data
9. Accounting
10. Policies
Tier-1 Scale
Step-1: financial planning. Select hardware price assumptions using the GREEN cells (contained in the worksheet called "Assumptions"); the model for each row is AGGRESSIVE or STANDARD.

Table-1: Moore's Law assumptions (doubling times, in months)
Calendar Year   2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012
Disk Doubling     12    12    15    15    15    18    18    18    18    18    18
CPU Doubling      24    24    24    24    24    24    24    24    24    24    24
(Currently network performance doubles every year or so for unit cost.)

Step-2: compare to (e.g. Tier-1) experiment requirements:
Reqts 2008       ALICE   ATLAS     CMS    LHCb     SUM
CPU (kSI2K)       9100   16600   12600    9500   47800
Disk (TBytes)     3000    9200    8700    1300   22200
Tape (PBytes)      3.6       6     6.6     0.4    16.6
Number of T1s        5      11       7       6      29

Step-3: conclude that more than one centre is needed.
Step-4: A Grid?

Ian Foster / Carl Kesselman: "A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."
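To make Step-1 concrete, here is a minimal sketch (not from the talk) of the kind of arithmetic behind the financial-planning step: given a doubling time for price/performance, it projects how much capacity a fixed annual budget buys in later years. The unit prices and budget in the sketch are illustrative placeholders, not GridPP figures; only the doubling times come from Table-1.

```python
# Illustrative sketch of the Step-1 "financial planning" arithmetic:
# if hardware price/performance doubles every `doubling_months` months
# (Table-1), the capacity bought per pound grows as
# 2 ** (elapsed_months / doubling_months).
# The unit prices and budget below are made-up placeholders, NOT GridPP figures.

def capacity_per_pound(base_capacity_per_pound: float,
                       years_elapsed: float,
                       doubling_months: float) -> float:
    """Capacity obtainable per pound after `years_elapsed` years."""
    return base_capacity_per_pound * 2 ** (12.0 * years_elapsed / doubling_months)

cpu_ksi2k_per_pound_2004 = 0.01   # placeholder: kSI2K per pound in 2004
disk_tb_per_pound_2004 = 0.001    # placeholder: TB per pound in 2004
annual_budget_pounds = 1_000_000  # placeholder yearly spend

for year in range(2004, 2009):
    dt_cpu = 24                            # CPU doubling time from Table-1
    dt_disk = 15 if year <= 2006 else 18   # disk doubling time from Table-1
    cpu = annual_budget_pounds * capacity_per_pound(
        cpu_ksi2k_per_pound_2004, year - 2004, dt_cpu)
    disk = annual_budget_pounds * capacity_per_pound(
        disk_tb_per_pound_2004, year - 2004, dt_disk)
    print(f"{year}: ~{cpu:,.0f} kSI2K and ~{disk:,.0f} TB for one year's budget")
```

Repeating this for each year and comparing the totals against the 2008 experiment requirements above is what drives the Step-3 conclusion that more than one centre is needed.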
What is the Grid? The "Hour Glass" model:
I. Experiment Layer (e.g. Portals)
II. Application Middleware (e.g. Metadata)
III. Grid Middleware (e.g. Information Services)
IV. Facilities and Fabrics (e.g. Storage Services)
How do I start?
http://www.gridpp.ac.uk/start/
• Getting started as a Grid user
• Quick start guide for LCG2: the GridPP guide to starting as a user of the Large Hadron Collider Computing Grid.
• Getting an e-Science certificate: in order to use the Grid you need a Grid certificate. This page introduces the UK e-Science Certification Authority, which issues certificates to users. You can get a certificate from here.
• Using the LHC Computing Grid (LCG): CERN's guide on the steps you need to take in order to become a user of the LCG. This includes contact details for support.
• LCG user scenario: describes in a practical way the steps a user has to follow to send and run jobs on LCG and to retrieve and process the output successfully.
• Currently being improved..
Job Submission (behind the scenes)
[Diagram: the user prepares a JDL job description and an input "sandbox" on the UI and submits them to the Resource Broker. After authorisation and authentication, the Broker expands the JDL using dataset information from the Replica Catalogue and resource status published to the Information Service, then passes the job to the Job Submission Service, which dispatches it (as Globus RSL) to a Compute Element; data are staged to and from a Storage Element. Job submit events and job status are recorded by the Logging & Book-keeping service, which the user can query from the UI, and the output "sandbox" is returned on completion.]
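As a concrete illustration of the user-facing end of this chain, the sketch below writes a minimal JDL file and hands it to the LCG-2 command-line tools. It is a hedged example rather than material from the talk: the JDL attributes and the edg-job-submit / edg-job-status / edg-job-get-output commands are the standard EDG/LCG-2 ones, but the executable, file names and requirement value are placeholders.

```python
# Minimal sketch of submitting a job through the Resource Broker.
# Assumes an LCG-2 User Interface with the EDG command-line tools installed
# and a valid Grid proxy (grid-proxy-init) already created.
# The executable, sandbox file names and CPU-time requirement are placeholders.
import subprocess

jdl = """\
Executable    = "simulate.sh";
Arguments     = "1000";
StdOutput     = "sim.out";
StdError      = "sim.err";
InputSandbox  = {"simulate.sh"};
OutputSandbox = {"sim.out", "sim.err"};
Requirements  = other.GlueCEPolicyMaxCPUTime > 720;
"""

with open("simulate.jdl", "w") as f:
    f.write(jdl)

# Submit via the Resource Broker; the returned job identifier is used
# for status queries and output retrieval.
submit = subprocess.run(["edg-job-submit", "simulate.jdl"],
                        capture_output=True, text=True, check=True)
# Simplified parsing: the job identifier is the https:// URL in the output.
job_id = next(line.strip() for line in submit.stdout.splitlines()
              if line.strip().startswith("https://"))
print("Submitted:", job_id)

# Poll the Logging & Book-keeping service for the job status.
subprocess.run(["edg-job-status", job_id], check=True)

# Once the job is Done, fetch the output sandbox.
subprocess.run(["edg-job-get-output", "--dir", ".", job_id], check=True)
```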
Enabling Grids for E-sciencE (EGEE)
Deliver a 24/7 Grid service to European science:
• build a consistent, robust and secure Grid network that will attract additional computing resources
• continuously improve and maintain the middleware in order to deliver a reliable service to users
• attract new users from industry as well as science and ensure they receive the high standard of training and support they need
• 100 million euros over 4 years, funded by the EU
• >400 software engineers plus service support
• 70 European partners
Enabling Grids for E-sciencE
Prototype Middleware Status & Plans (I)
(Blue: deployed on the development testbed; Red: proposed)
• Workload Management
  – AliEn TaskQueue
  – EDG WMS (plus new TaskQueue and Information Supermarket)
  – EDG L&B
• Computing Element
  – Globus Gatekeeper + LCAS/LCMAPS
    · Dynamic accounts (from Globus)
  – CondorC
  – Interfaces to LSF/PBS (blahp)
  – "Pull components"
    · AliEn CE
    · gLite CEmon (being configured)
Enabling Grids for E-sciencE
Prototype Middleware Status & Plans (II)
• Storage Element
  – Existing SRM implementations
    · dCache, Castor, …
    · FNAL & LCG DPM
  – Simple interface defined (AliEn+BioMed)
  – gLite-I/O (re-factored AliEn-I/O)
• Catalogs
  – AliEn FileCatalog – global catalog
  – gLite Replica Catalog – local catalog
  – Catalog update (messaging)
  – FiReMan Interface
  – RLS (Globus)
• Metadata Catalog
• Information & Monitoring
  – R-GMA web service version; multi-VO support
• Data Scheduling
  – File Transfer Service (Stork+GridFTP)
  – File Placement Service
  – Data Scheduler
Enabling Grids for E-sciencE
Prototype Middleware Status & Plans (III)
• Security
  – VOMS as Attribute Authority and VO management
  – myProxy as proxy store
  – GSI security and VOMS attributes as enforcement
    · fine-grained authorization (e.g. ACLs)
    · Globus to provide a set-uid service on the CE
• Accounting
  – EDG DGAS (not used yet)
• User Interface
  – AliEn shell
  – CLIs and APIs
  – GAS
    · Catalogs
    · Integrate remaining services
• Package manager
  – Prototype based on AliEn backend
  – Evolve to final architecture agreed with the ARDA team
[GridPP organisation diagram: CB, PMB, Deployment Board (Tier-1/Tier-2, testbeds, rollout) and User Board (user feedback); middleware areas covering metadata, storage, workload, network, security and information & monitoring; application development with the experiments; requirements and service specification & provision linking GridPP to LCG, EGEE and ARDA.]
Middleware Development
• Grid Data Management
• Network Monitoring
• Configuration Management
• Storage Interfaces
• Information Services
• Security
Application Development
• ATLAS
• LHCb
• CMS
• BaBar (SLAC)
• SAMGrid (FermiLab)
• QCDGrid
• PhenoGrid
GridPP Deployment Status
GridPP deployment is part of LCG (currently the largest Grid in the world). The future Grid in the UK is dependent upon LCG releases.
Three Grids on a global scale in HEP (similar functionality):
                 Sites     CPUs
• LCG (GridPP)   90 (15)   8700 (1500)
• Grid3 [USA]    29        2800
• NorduGrid      30        3200
LCG Overview
By 2007:
- 100,000 CPUs
- more than 100 institutes worldwide
- building on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)
- the prototype went live in September 2003 in 12 countries
- extensively tested by the LHC experiments during this summer
Deployment Status (26/10/04)
• Incremental releases: significant improvements in reliability, performance and scalability
  – within the limits of the current architecture
  – scalability is much better than expected a year ago
• Many more nodes and processors than anticipated
  – installation problems of last year overcome
  – many small sites have contributed to MC productions
• Full-scale testing as part of this year's data challenges
• GridPP "The Grid becomes a reality" – widely reported (e.g. by the British Embassy websites in the USA and Russia)
Data Challenges
• Ongoing.. Grid and non-Grid production; the Grid contribution is now significant
• ALICE – 35 CPU years; Phase 1 done, Phase 2 ongoing
• CMS – 75 M events and 150 TB: the first of this year's Grid data challenges
Entering Grid Production Phase..
ATLAS Data Challenge
• 7.7 M GEANT4 events and 22 TB (ATLAS DC2 – LCG – September 7)
• UK ~20% of LCG
• Ongoing..
• Production across three Grids
• ~150 CPU years so far
• Largest total computing requirement
• Small fraction of what ATLAS needs..
[Pie chart, ATLAS DC2 CPU usage by Grid: LCG 41%, NorduGrid 30%, Grid3 29%; per-site shares range from under 1% to 14%. Sites: at.uibk, ca.triumf, ca.ualberta, ca.umontreal, ca.utoronto, ch.cern, cz.golias, cz.skurut, de.fzk, es.ifae, es.ific, es.uam, fr.in2p3, it.infn.cnaf, it.infn.lnl, it.infn.mi, it.infn.na, it.infn.roma, it.infn.to, it.infn.lnf, jp.icepp, nl.nikhef, pl.zeus, ru.msu, tw.sinica, uk.bham, uk.ic, uk.lancs, uk.man, uk.rl]
Totals: ~1350 kSI2k·months, ~95,000 jobs, ~7.7 million events fully simulated (Geant4), ~22 TB
Entering Grid Production Phase..
LHCb Data Challenge
424 CPU years (4,000 kSI2k·months), 186 M events
• UK's input significant (>1/4 of the total)
• LCG(UK) resource:
  – Tier-1: 7.7%
  – Tier-2 sites: London 3.9%, South 2.3%, North 1.4%
• DIRAC:
  – Imperial 2.0%, L'pool 3.1%, Oxford 0.1%, ScotGrid 5.1%
[Plot: 186 M events produced in Phase 1 (now completed); production reached 3-5 × 10^6 events/day with LCG in action, compared with 1.8 × 10^6 events/day with DIRAC alone; LCG was paused and restarted during the challenge.]
Entering Grid Production Phase..
Paradigm Shift
Transition to Grid… 424 CPU·years
Month   non-Grid (DIRAC) : Grid (LCG)   Share of DC'04
May     89% : 11%                       11%
Jun     80% : 20%                       25%
Jul     77% : 23%                       22%
Aug     27% : 73%                       42%
More Applications
ZEUS uses LCG
• Needs the Grid to respond to increasing demand for MC production
• 5 million Geant events on the Grid since August 2004
QCDGrid
• For UKQCD
• Currently a 4-site data grid
• Key technologies used: Globus Toolkit 2.4, European DataGrid, eXist XML database
• Managing a few hundred gigabytes of data
Issues
First large-scale Grid production problems are being addressed… at all levels:
"LCG-2 Middleware Problems and Requirements for LHC Experiment Data Challenges"
https://edms.cern.ch/file/495809/2.2/LCG2-Limitations_and_Requirements.pdf
Is GridPP a Grid?
http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
1. Coordinates resources that are not subject to centralized control. YES – this is why development and maintenance of LCG is important.
2. … using standard, open, general-purpose protocols and interfaces. YES – VDT (Globus/Condor-G) + EDG/EGEE (gLite) ~meet this requirement.
3. … to deliver nontrivial qualities of service. YES – the LHC experiments' data challenges over the summer of 2004 (http://agenda.cern.ch/fullAgenda.php?ida=a042133).
What was GridPP1?
GridPP Goal: to develop and deploy a large scale science Grid in the UK for the use of the Particle Physics community.
• A team that built a working prototype grid of significant scale:
  > 1,500 (7,300) CPUs
  > 500 (6,500) TB of storage
  > 1,000 (6,000) simultaneous jobs
• A complex project where 82% of the 190 tasks for the first three years were completed.
[GridPP1 project map (status date 1-Jan-04), seven top-level areas broken into numbered tasks:
1 CERN: LCG Creation, Applications, Fabric, Technology, Deployment
2 DataGrid: WP1-WP8
3 Applications: ATLAS, ATLAS/LHCb, LHCb, CMS, BaBar, CDF/D0, UKQCD, Other
4 Infrastructure: Tier-A, Tier-1, Tier-2, Testbed, Rollout, Data Challenges
5 Interoperability: Int. Standards, Open Source, Worldwide Integration, UK Integration
6 Dissemination: Presentation, Participation, Engagement
7 Resources: Deployment, Monitoring, Developing
Legend: metric OK / metric not OK; task complete / overdue / due within 60 days / not due soon / not active / no task or metric.]
Aims for GridPP2? From Prototype to Production
• 2001: separate experiments, resources and multiple accounts (BaBar, CDF, D0, ATLAS, ALICE, LHCb, CMS; CERN Computer Centre, RAL Computer Centre, 19 UK Institutes)
• 2004: prototype Grids (BaBarGrid, SAMGrid, GANGA, EDG, EGEE, ARDA, LCG; CERN prototype Tier-0 Centre, UK prototype Tier-1/A Centre, 4 UK prototype Tier-2 Centres)
• 2007: 'One' Production Grid (LCG; CERN Tier-0 Centre, UK Tier-1/A Centre, 4 UK Tier-2 Centres)
Planning: GridPP2 ProjectMap
GridPP2 Goal: to develop and deploy a large scale production quality grid in the UK for the use of the Particle Physics community.
[GridPP2 project map, with six top-level areas: 1 LCG, 2 M/S/N (Middleware, Security and Networking), 3 LHC Apps, 4 Non-LHC Apps, 5 Management and 6 External. Components shown include Tier-A, Tier-1, Tier-2 and Computing Fabric; Metadata, Data & Storage Management, Workload Management, Security, Information & Monitoring and Network; ATLAS, Ganga, LHCb and CMS; BaBar, CDF, D0, SAMGrid, UKQCD, PhenoGrid and a Portal; plus Grid Technology, Middleware Support, Grid Deployment, Grid Operations, LHC Deployment, Experiment Support, Planning, Dissemination, Interoperability, Engagement and Knowledge Transfer, all feeding a Production Grid.]
Structures agreed and in place (except LCG phase-2)
What lies ahead? Some mountain climbing..
• Annual data storage: 12-14 PetaBytes per year (a CD stack holding 1 year of LHC data would be ~20 km high, compared with Concorde at 15 km; "we are here" at 1 km)
• 100 Million SPECint2000, i.e. ~100,000 PCs (3 GHz Pentium 4)
• Importance of step-by-step planning… pre-plan your trip, carry an ice axe and crampons and arrange for a guide…
• In production terms, we've made base camp.
• Quantitatively, we're ~9% of the way there in terms of CPU (9,000 out of 100,000) and disk (3 PB out of 12-14 PB/year × 3 years)…
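A quick check of the "~9% of the way there" figure, as a minimal sketch; the only inputs are the numbers quoted on this slide.

```python
# Rough check of the "~9% of the way there" estimate using the figures above
# (2004 snapshot vs. the 2007 target).
cpu_now, cpu_target = 9_000, 100_000   # PCs (3 GHz Pentium 4 equivalents)
disk_now_pb = 3                         # PB of disk deployed so far
disk_target_pb = 13 * 3                 # ~12-14 PB/year over 3 years

print(f"CPU:  {cpu_now / cpu_target:.0%} of target")
print(f"Disk: {disk_now_pb / disk_target_pb:.0%} of target")
# -> CPU ~9%, disk ~8%: "base camp" on the way to the 2007 summit.
```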
1. Why? 2. What? 3. How? 4. When?
From the Particle Physics perspective the Grid is:
1. needed to utilise large-scale computing resources efficiently and securely
2. a) a working prototype running today on large testbed(s)…
   b) about seamless discovery of computing resources
   c) using evolving standards for interoperation
   d) the basis for computing in the 21st Century
   e) not (yet) as transparent or robust as end-users need
3. see the GridPP getting started pages (two-day EGEE training courses available)
4. a) now, at prototype level, for simple(r) applications (e.g. experiment Monte Carlo production)
   b) September 2007, ready for LHC, for more complex applications (e.g. data analysis)