Open Science Grid

High Energy & Nuclear Physics Experiments
and Advanced Cyberinfrastructure
www.opensciencegrid.org
Internet2 Meeting
San Diego, CA
October 11, 2007
Paul Avery
University of Florida
avery@phys.ufl.edu
Context: Open Science Grid
• Consortium of many organizations (multiple disciplines)
• Production grid cyberinfrastructure
• 75+ sites, 30,000+ CPUs: US, UK, Brazil, Taiwan
OSG Science Drivers
• Experiments at Large Hadron Collider
  - New fundamental particles and forces
  - 100s of petabytes, 2008 – ?
• High Energy & Nuclear Physics expts
  - Top quark, nuclear matter at extreme density
  - ~10 petabytes, 1997 – present
• LIGO (gravity wave search)
  - Search for gravitational waves
  - ~few petabytes, 2002 – present
[Chart: data growth and community growth, 2001–2009]
Future Grid resources
• Massive CPU (PetaOps)
• Large distributed datasets (>100 PB)
• Global communities (1000s)
• International optical networks
OSG History in Context
Primary Drivers: LHC and LIGO
[Timeline, 1999–2009: PPDG (DOE); GriPhyN (NSF); iVDGL (NSF); Trillium Grid3; OSG (DOE+NSF). LIGO preparation → LIGO operation. LHC construction, preparation, commissioning → LHC Ops. European Grid + Worldwide LHC Computing Grid; campus and regional grids.]
LHC Experiments at CERN
• 27 km tunnel in Switzerland & France
• Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
• Search for
  - Origin of mass
  - New fundamental forces
  - Supersymmetry
  - Other new particles
• 2008 – ?
Collisions at LHC (2008?)
Proton-proton collision parameters:
• Protons/bunch: 10^11
• Beam energy: 7 TeV x 7 TeV
• Luminosity: 10^34 cm^-2 s^-1
• 2835 bunches/beam
• Bunch crossing every 25 nsec (~20 collisions/crossing)
[Event diagram: protons, partons (quarks, gluons), Higgs → Z Z → e+ e- e+ e-, plus jets]
• Collision rate ~10^9 Hz
• New physics rate ~10^-5 Hz
• Selection: 1 in 10^14 (Higgs, SUSY, ...; arithmetic sketch below)
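The selection factor follows from the numbers above; a minimal back-of-the-envelope check in Python (mine, using only the figures quoted on this slide):

    # Back-of-the-envelope check of the LHC rate figures quoted above.
    crossing_interval_s = 25e-9        # one bunch crossing every 25 nsec
    collisions_per_crossing = 20       # ~20 collisions per crossing

    crossing_rate = 1 / crossing_interval_s                    # 4e7 Hz (40 MHz)
    collision_rate = crossing_rate * collisions_per_crossing   # ~8e8 Hz, i.e. ~10^9 Hz

    new_physics_rate = 1e-5            # Hz, quoted rate of "new physics" events
    selection = collision_rate / new_physics_rate              # ~10^14

    print(f"crossing rate  ~ {crossing_rate:.1e} Hz")
    print(f"collision rate ~ {collision_rate:.1e} Hz")
    print(f"selection      ~ 1 in {selection:.0e}")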
LHC Data and CPU Requirements
[Photos: CMS, ATLAS and LHCb detectors]
Storage
• Raw recording rate 0.2 – 1.5 GB/s (volume sketch below)
• Large Monte Carlo data samples
• 100 PB by ~2012
• 1000 PB later in decade?
Processing
• PetaOps (> 600,000 3 GHz cores)
Users
• 100s of institutes
• 1000s of researchers
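A rough sketch of how those recording rates translate into yearly raw-data volumes. The 1e7 live seconds per year is my assumption (a common rule of thumb), not a figure from the slide; Monte Carlo and derived datasets come on top:

    # Annual raw-data volume implied by the recording rates above.
    # Assumes ~1e7 live seconds of data taking per year (an assumption,
    # not a number from the slide).
    live_seconds_per_year = 1e7

    for rate_gb_per_s in (0.2, 1.5):
        petabytes = rate_gb_per_s * live_seconds_per_year / 1e6   # GB -> PB
        print(f"{rate_gb_per_s} GB/s  ->  ~{petabytes:.0f} PB/year of raw data")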
LHC Global Collaborations
CMS, ATLAS:
• 2000 – 3000 physicists per experiment
• USA is 20–31% of total
LHC Global Grid
• 5000 physicists, 60 countries
• 10s of petabytes/yr by 2009
• CERN / Outside = 10-20%
[Diagram (CMS Experiment): Online System → Tier 0 (CERN Computer Center) at 200 - 1500 MB/s → Tier 1 (FermiLab, Korea, Russia, UK) at 10-40 Gb/s → Tier 2 (OSG: U Florida, Caltech, UCSD) at >10 Gb/s → Tier 3 (FIU, Iowa, Maryland) at 2.5-10 Gb/s → Tier 4 (PCs, physics caches); see the sketch below]
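To make the hierarchy concrete, an illustrative Python sketch (my own, not from the talk) of the tier-to-tier links and what the quoted capacities mean for moving data; the 50 TB dataset is an arbitrary example:

    # Tier-to-tier link rates read off the diagram above (low ends taken);
    # the dataset size is an arbitrary example.
    TIER_LINKS_GBPS = {
        "Tier 0 -> Tier 1": 10,     # 10-40 Gb/s
        "Tier 1 -> Tier 2": 10,     # ">10 Gb/s"
        "Tier 2 -> Tier 3": 2.5,    # 2.5-10 Gb/s
    }

    def transfer_days(dataset_tb: float, link_gbps: float) -> float:
        """Days to move dataset_tb terabytes over a fully dedicated link."""
        seconds = dataset_tb * 8e12 / (link_gbps * 1e9)
        return seconds / 86400

    for hop, gbps in TIER_LINKS_GBPS.items():
        print(f"{hop}: 50 TB takes ~{transfer_days(50, gbps):.1f} days at {gbps} Gb/s")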
LHC Global Grid
• 11 Tier-1 sites
• 112 Tier-2 sites (growing)
• 100s of universities
(Figure: J. Knobloch)
LHC Cyberinfrastructure Growth: CPU
[Chart: installed CPU (MSI2000) by year, 2007–2010, broken down by CERN, Tier-1 and Tier-2 for ALICE, ATLAS, CMS and LHCb; ~100,000 cores in total; multi-core boxes bring AC & power challenges]
LHC Cyberinfrastructure Growth: Disk
[Chart: installed disk (PB) by year, 2007–2010, broken down by CERN, Tier-1 and Tier-2 for ALICE, ATLAS, CMS and LHCb; ~100 petabytes in total]
LHC Cyberinfrastructure Growth: Tape
[Chart: installed tape (PB) by year, 2007–2010, broken down by CERN and Tier-1 for ALICE, ATLAS, CMS and LHCb; ~100 petabytes in total]
HENP Bandwidth Roadmap
for Major Links (in Gbps)
Year | Production          | Experimental           | Remarks
2001 | 0.155               | 0.622-2.5              | SONET/SDH
2002 | 0.622               | 2.5                    | SONET/SDH; DWDM; GigE Integ.
2003 | 2.5                 | 10                     | DWDM; 1 + 10 GigE Integration
2005 | 10                  | 2-4 X 10               | λ Switch; λ Provisioning
2007 | 3 X 10              | ~10 X 10; 40 Gbps      | 1st Gen. λ Grids
2009 | ~8 X 10 or 2 X 40   | ~5 X 40 or ~20 X 10    | 40 Gbps λ Switching
2012 | ~5 X 40 or ~20 X 10 | ~25 X 40 or ~100 X 10  | 2nd Gen. λ Grids; Terabit Networks
2015 | ~Terabit            | ~MultiTbps             | ~Fill One Fiber

Paralleled by ESnet roadmap (conversion sketch below)
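For scale, a short conversion (mine, not from the talk) from per-link capacity to daily data volume at full utilization; real transfers achieve less:

    # Sustained data volume per day for a few of the roadmap's capacities.
    def tb_per_day(gbps: float) -> float:
        return gbps * 1e9 * 86400 / 8e12   # bits/s over a day -> terabytes

    for label, gbps in [("10 Gbps", 10), ("3 x 10 Gbps", 30), ("8 x 10 Gbps", 80)]:
        print(f"{label}: ~{tb_per_day(gbps):.0f} TB/day")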
HENP Collaboration with Internet2
www.internet2.edu
HENP SIG
HENP Collaboration with NLR
www.nlr.net
• UltraLight and other networking initiatives
• Spawning state-wide and regional networks (FLR, SURA, LONI, …)
US LHCNet, ESnet Plan 2007-2010:
30-80 Gbps US-CERN
[Map: ESnet4 and US-LHCNet topology, with hubs and metro rings at SEA, SNV, SDG, DEN, ALB, ELP, ATL, DC, CHI, NYC; BNL and FNAL on metropolitan area rings; links to AsiaPac, Japan, Australia and Europe (GEANT2, SURFNet, IN2P3, CERN)]
• US-LHCNet (NY-CHI-GVA-AMS), 2007-10: 30, 40, 60, 80 Gbps (3 to 8 x 10 Gbps US-CERN)
• ESnet4 Science Data Network core: 30-50 Gbps (40-60 Gbps circuit transport); ESnet production IP core ≥10 Gbps (enterprise IP traffic)
• ESnet MANs to FNAL & BNL; dark fiber to FNAL; peering with GEANT
• NSF/IRNC circuit; GVA-AMS connection via Surfnet or Geant2
• High-speed cross connects with Internet2/Abilene; major DOE Office of Science sites
Tier1–Tier2 Data Transfers: 2006–07
[Chart: Tier-1 → Tier-2 transfer rates from CSA06 (Sep. 2006) through Mar. 2007 and Sep. 2007, reaching ~1 GB/sec]
US: FNAL Transfer Rates to Tier-2 Universities
[Chart: FNAL Computing, Offline and CSA07 transfers to Nebraska, June 2007, reaching ~1 GB/s]
• One well-configured site, but ~10 such sites in the near future → network challenge (sketch below)
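A quick bit of arithmetic (mine, not from the slide) on why ten such sites become a network challenge:

    # Aggregate demand if ~10 Tier-2 sites each pull ~1 GB/s from FNAL.
    sites = 10
    per_site_gb_s = 1.0

    aggregate_gbps = sites * per_site_gb_s * 8      # GB/s -> Gb/s
    print(f"~{aggregate_gbps:.0f} Gb/s aggregate")  # ~80 Gb/s, i.e. several 10G links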
Current Data Transfer Experience
• Transfers are generally much slower than expected, or stop altogether
• Potential causes difficult to diagnose
  - Configuration problem? Loading? Queuing?
  - Database errors, experiment S/W error, grid S/W error?
  - End-host problem? Network problem? Application failure?
• Complicated recovery
• Insufficient information
  - Too slow to diagnose and correlate at the time the error occurs
• Result: lower transfer rates, longer troubleshooting times
• Need intelligent services, smart end-host systems (see the sketch after this list)
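As one illustration of what a "smart end-host" service could do, a minimal hypothetical sketch (not OSG or experiment software) that watches a transfer's throughput and flags stalls while they can still be correlated with network and host state:

    import time

    # Hypothetical stall detector for a long-running transfer: poll a
    # bytes-transferred counter and warn when throughput collapses.
    def watch_transfer(get_bytes_done, total_bytes, expected_mb_s=100.0, poll_s=30):
        """get_bytes_done: callable returning cumulative bytes transferred so far."""
        last = get_bytes_done()
        while last < total_bytes:
            time.sleep(poll_s)
            now = get_bytes_done()
            mb_s = (now - last) / poll_s / 1e6
            if now == last:
                print("WARNING: transfer appears stalled")
            elif mb_s < 0.1 * expected_mb_s:
                print(f"WARNING: {mb_s:.1f} MB/s, far below the expected "
                      f"{expected_mb_s:.0f} MB/s; check end hosts and network")
            last = now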
UltraLight
Integrating Advanced Networking in Applications
http://www.ultralight.org
Funded by NSF
10 Gb/s+ network
• Caltech, UF, FIU, UM, MIT
• SLAC, FNAL
• Int'l partners
• Level(3), Cisco, NLR
UltraLight Testbed
www.ultralight.org
Funded by NSF
Many Near-Term Challenges
• Network
  - Bandwidth, bandwidth, bandwidth
  - Need for intelligent services, automation
  - More efficient utilization of network (protocols, NICs, S/W clients, pervasive monitoring)
• Better collaborative tools
• Distributed authentication?
• Scalable services: automation
• Scalable support
END
Extra Slides
The Open Science Grid Consortium
[Diagram: communities linked through the Open Science Grid: U.S. grid projects, university facilities, multi-disciplinary facilities, laboratory centers, science projects & communities, LHC experiments, regional and campus grids, education communities, computer science, technologists (network, HPC, …)]
CMS: “Compact” Muon Solenoid
[Photo of the CMS detector; caption: "Inconsequential humans" (for scale)]
Collision Complexity: CPU + Storage
[Event displays: all charged tracks with pT > 2 GeV (+30 minimum bias events) vs. reconstructed tracks with pT > 25 GeV]
• 10^9 collisions/sec, selectivity: 1 in 10^13
LHC Data Rates: Detector to Storage
Physics filtering chain:
• Collision rate 40 MHz (~TBytes/sec)
• Level 1 Trigger (special hardware) → 75 KHz (75 GB/sec)
• Level 2 Trigger (commodity CPUs) → 5 KHz (5 GB/sec)
• Level 3 Trigger (commodity CPUs) → 100 Hz (0.15 – 1.5 GB/sec)
• Raw data to storage (+ simulated data); rejection sketch below
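The per-level rejection and the implied event size follow from the rates above; a small sketch (my arithmetic, using only this slide's numbers):

    # Trigger rates and output bandwidths as quoted on the slide.
    collision_rate_hz = 40e6
    l1_hz, l1_gb_s = 75e3, 75.0
    l2_hz, l2_gb_s = 5e3, 5.0
    l3_hz = 100.0

    print(f"Level 1 rejection: ~{collision_rate_hz / l1_hz:.0f}x")   # ~533x
    print(f"Level 2 rejection: ~{l1_hz / l2_hz:.0f}x")               # 15x
    print(f"Level 3 rejection: ~{l2_hz / l3_hz:.0f}x")               # 50x
    print(f"Overall:           ~{collision_rate_hz / l3_hz:.0e}x")   # ~4e5
    print(f"Event size implied: ~{l1_gb_s * 1e3 / l1_hz:.0f} MB (L1), "
          f"~{l2_gb_s * 1e3 / l2_hz:.0f} MB (L2)")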
LIGO: Search for Gravity Waves
• LIGO Grid: 6 US sites + 3 EU sites (UK & Germany: Birmingham, Cardiff, AEI/Golm)
* LHO, LLO: LIGO observatory sites
* LSC: LIGO Scientific Collaboration
Is HEP Approaching Productivity Plateau?
[Figure: the Gartner Group technology hype cycle (expectations vs. time) applied to HEP grids, with CHEP conferences marking the curve: Padova 2000, Beijing 2001, San Diego 2003, Interlaken 2004, Mumbai 2006, Victoria 2007]
(From Les Robertson)
Challenges from Diversity and Growth
• Management of an increasingly diverse enterprise
  - Sci/Eng projects, organizations, disciplines as distinct cultures
  - Accommodating new member communities (expectations?)
• Interoperation with other grids
  - TeraGrid
  - International partners (EGEE, NorduGrid, etc.)
  - Multiple campus and regional grids
• Education, outreach and training
  - Training for researchers, students
  - … but also project PIs, program officers
• Operating a rapidly growing cyberinfrastructure
  - 25K → 100K CPUs, 4 → 10 PB disk
  - Management of and access to rapidly increasing data stores (slide)
  - Monitoring, accounting, achieving high utilization
  - Scalability of support model (slide)
Collaborative Tools: EVO Videoconferencing
End-to-End Self Managed Infrastructure
REDDnet: National Networked Storage
• NSF funded project
  - Vanderbilt
• 8 initial sites
• Multiple disciplines
  - Satellite imagery
  - HENP
  - Terascale Supernova Initiative
  - Structural Biology
  - Bioinformatics
• Storage
  - 500 TB disk
  - 200 TB tape
• Brazil?
OSG Operations Model
• Distributed model
  - Scalability!
  - VOs, sites, providers
  - Rigorous problem tracking & routing
  - Security
  - Provisioning
  - Monitoring
  - Reporting
• Partners with EGEE operations