Open Science Grid
Linking Universities and Laboratories in National CyberInfrastructure
Paul Avery
University of Florida
avery@phys.ufl.edu
SURA Infrastructure Workshop
Austin, TX
December 7, 2005
Bottom-up Collaboration: “Trillium”
 Trillium = PPDG + GriPhyN + iVDGL
   PPDG: $12M (DOE) (1999 – 2006)
   GriPhyN: $12M (NSF) (2000 – 2005)
   iVDGL: $14M (NSF) (2001 – 2006)
   ~150 people with large overlaps between projects
   Universities, labs, foreign partners
 Strong driver for funding agency collaborations
   Inter-agency: NSF – DOE
   Intra-agency: Directorate – Directorate, Division – Division
 Coordinated internally to meet broad goals
   CS research, developing/supporting Virtual Data Toolkit (VDT)
   Grid deployment, using VDT-based middleware
   Unified entity when collaborating internationally
Common Middleware: Virtual Data Toolkit
[Diagram: VDT build-and-release pipeline]
 Sources (CVS) feed the NMI Build & Test facility, a Condor pool covering 22+ operating systems, which builds binaries and runs tests
 Tested builds from many contributors are packaged, with patching, into a Pacman cache, RPMs, and GPT source bundles
 A unique laboratory for testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!
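For illustration only, a minimal sketch of the fan-out idea behind such a build-and-test matrix: every component is built and tested on every supported platform, and only fully passing components are promoted to packaging. This is not the actual NMI/VDT tooling; the component names, platform names, and helper functions are hypothetical.

```python
# Illustrative sketch of a build-and-test matrix in the spirit of the NMI/VDT pipeline.
# Component list, platform list, and build_and_test() are hypothetical stand-ins.
from itertools import product

COMPONENTS = ["globus", "condor", "myproxy"]          # example component names
PLATFORMS = ["rhel3-x86", "rhel4-x86_64", "debian3"]  # stand-ins for 22+ OSes


def build_and_test(component: str, platform: str) -> bool:
    """Pretend to check out from CVS, build, and run tests; return pass/fail."""
    # Real pipeline: checkout -> configure -> build -> run test suite on a Condor pool.
    return True  # placeholder result


def promote_to_cache(component: str) -> None:
    """Pretend to package a component (Pacman cache / RPM / GPT bundle)."""
    print(f"packaging {component} for the release cache")


results = {
    (c, p): build_and_test(c, p) for c, p in product(COMPONENTS, PLATFORMS)
}

for component in COMPONENTS:
    if all(results[(component, p)] for p in PLATFORMS):
        promote_to_cache(component)
    else:
        print(f"{component}: failures, holding back from release")
```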
VDT Growth Over 3 Years (1.3.8 now)
www.griphyn.org/vdt/
[Chart: number of VDT components vs. time, Jan-02 through Apr-05, rising from under 10 to about 35 across the VDT 1.1.x, 1.2.x, and 1.3.x series. Milestones marked: VDT 1.0 (Globus 2.0b, Condor 6.3.1), VDT 1.1.7 (switch to Globus 2.2), VDT 1.1.8 (first real use by LCG), VDT 1.1.11 (Grid3).]
Components of VDT 1.3.5
 Globus 3.2.1
 Condor 6.7.6
 RLS 3.0
 ClassAds 0.9.7
 Replica 2.2.4
 DOE/EDG CA certs
 ftsh 2.0.5
 EDG mkgridmap
 EDG CRL Update
 GLUE Schema 1.0
 VDS 1.3.5b
 Java
 Netlogger 3.2.4
 Gatekeeper-Authz
 MyProxy 1.11
 KX509
 System Profiler
 GSI OpenSSH 3.4
 Monalisa 1.2.32
 PyGlobus 1.0.6
 MySQL
 UberFTP 1.11
 DRM 1.2.6a
 VOMS 1.4.0
 VOMS Admin 0.7.5
 Tomcat
 PRIMA 0.2
 Certificate Scripts
 Apache
 jClarens 0.5.3
 New GridFTP Server
 GUMS 1.0.1
VDT Collaborative Relationships
[Diagram: VDT collaborative relationships]
 Computer Science Research supplies techniques & software to the Virtual Data Toolkit; tech transfer carries the VDT to Science, Engineering, and Education communities, which return requirements, prototyping & experiments, and deployment feedback
 Partner science, networking, and outreach projects:
   U.S. Grids: Globus, Condor, NMI, TeraGrid, OSG
   International: EGEE, WLCG, Asia, South America
   Outreach: QuarkNet, CHEPREO, Digital Divide
 Other linkages: work force, CS researchers, industry
Major Science Driver:
Large Hadron Collider (LHC) @ CERN
 27 km tunnel in Switzerland & France
 Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
 Search for:
   Origin of Mass
   New fundamental forces
   Supersymmetry
   Other new particles
 Startup: 2007?
LHC: Petascale Global Science
 Complexity: millions of individual detector channels
 Scale: PetaOps (CPU), 100s of Petabytes (data)
 Distribution: global distribution of people & resources

BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries
CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries
LHC Global Data Grid (2007+)
 5000 physicists, 60 countries
 10s of Petabytes/yr by 2008
 1000 Petabytes in < 10 yrs?
[Diagram: CMS experiment tiered data grid]
 Online System → Tier 0 (CERN Computer Center) at 150 - 1500 MB/s
 Tier 0 → Tier 1 centers (USA, Korea, Russia, UK) at 10-40 Gb/s
 Tier 1 (USA) → Tier 2 centers (U Florida, Caltech, UCSD) at >10 Gb/s
 Tier 2 → Tier 3 sites (FIU, Iowa, Maryland: physics caches) at 2.5-10 Gb/s
 Tier 4: PCs
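A rough back-of-envelope check, using only the slide's "10s of Petabytes/yr" figure, of what that volume implies for sustained transfer rates (the arithmetic below is mine, not from the slide):

```python
# Back-of-envelope: sustained rate needed to move an annual data volume.
# The 10 PB/yr figure is taken from the slide; the rest is simple arithmetic.
PETABYTE_BITS = 1e15 * 8          # bits in one petabyte (decimal units)
SECONDS_PER_YEAR = 365 * 24 * 3600

annual_volume_pb = 10             # "10s of Petabytes/yr by 2008"
avg_rate_gbps = annual_volume_pb * PETABYTE_BITS / SECONDS_PER_YEAR / 1e9
print(f"{avg_rate_gbps:.1f} Gb/s sustained")   # ~2.5 Gb/s

# Real links are provisioned well above the average (10-40 Gb/s Tier 0 -> Tier 1)
# to absorb bursts, reprocessing passes, and transfers to multiple Tier 1 sites.
```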
Grid3 and
Open Science Grid
Grid3: A National Grid Infrastructure
 October 2003 – July 2005
 32 sites, 3,500 CPUs: Universities + 4 national labs
 Sites in US, Korea, Brazil, Taiwan
 Applications in HEP, LIGO, SDSS, Genomics, fMRI, CS
www.ivdgl.org/grid3
Grid3 Lessons Learned
 How to operate a Grid as a facility
   Tools, services, error recovery, procedures, docs, organization
   Delegation of responsibilities (Project, VO, service, site, …)
   Crucial role of Grid Operations Center (GOC)
 How to support people-to-people relations
   Face-face meetings, phone cons, 1-1 interactions, mail lists, etc.
 How to test and validate Grid tools and applications
   Vital role of testbeds
 How to scale algorithms, software, process
   Some successes, but “interesting” failure modes still occur
 How to apply distributed cyberinfrastructure
   Successful production runs for several applications
http://www.opensciencegrid.org
Open Science Grid: July 20, 2005
 Production Grid: 50+ sites, 15,000 CPUs “present” (available, but not all at one time)
 Sites in US, Korea, Brazil, Taiwan
 Integration Grid: 10-12 sites
OSG Operations Snapshot
[Operations snapshot: 30 days ending November 7]
OSG Participating Disciplines
 Computer Science (Condor, Globus, SRM, SRB): test and validate innovations, new services & technologies
 Physics (LIGO, Nuclear Physics, Tevatron, LHC): global Grid computing & data access
 Astrophysics (Sloan Digital Sky Survey): CoAdd, multiply-scanned objects
 Bioinformatics (Argonne GADU project): spectral fitting analysis; BLAST, BLOCKS, gene sequences, etc.
 Functional MRI (Dartmouth Psychological & Brain Sciences)
 University campus resources, portals, apps: CCR (U Buffalo), GLOW (U Wisconsin), TACC (Texas Advanced Computing Center), MGRID (U Michigan), UFGRID (U Florida), Crimson Grid (Harvard), FermiGrid (FermiLab Grid)
OSG Grid Partners
 TeraGrid
• “DAC2005”: run LHC apps on TeraGrid resources
• TG Science Portals for other applications
• Discussions on joint activities: security, accounting, operations, portals
 EGEE
• Joint Operations Workshops, defining mechanisms to exchange support tickets
• Joint Security working group
• US middleware federation contributions to core middleware gLite
 Worldwide LHC Computing Grid
• OSG contributes to LHC global data handling and analysis systems
 Other partners
• SURA, GRASE, LONI, TACC
• Representatives of VOs provide portals and interfaces to their user groups
Example of Partnership:
WLCG and EGEE
OSG Technical Groups & Activities
 Technical Groups address and coordinate technical areas
   Propose and carry out activities related to their given areas
   Liaise & collaborate with other peer projects (U.S. & international)
   Participate in relevant standards organizations
   Chairs participate in Blueprint, Integration and Deployment activities
 Activities are well-defined, scoped tasks contributing to OSG
   Each Activity has deliverables and a plan
   … is self-organized and operated
   … is overseen & sponsored by one or more Technical Groups
TGs and Activities are where the real work gets done
OSG Technical Groups (deprecated!)
 Governance: charter, organization, by-laws, agreements, formal processes
 Policy: VO & site policy, authorization, priorities, privilege & access rights
 Security: common security principles, security infrastructure
 Monitoring and Information Services: resource monitoring, information services, auditing, troubleshooting
 Storage: storage services at remote sites, interfaces, interoperability
 Support Centers: infrastructure and services for user support, helpdesk, trouble tickets
 Education / Outreach: training, interface with various E/O projects
 Networks (new): including interfacing with various networking projects
OSG Activities
 Blueprint: defining principles and best practices for OSG
 Deployment: deployment of resources & services
 Provisioning: connected to deployment
 Incident response: plans and procedures for responding to security incidents
 Integration: testing, validating & integrating new services and technologies
 Data Resource Management (DRM): deployment of specific Storage Resource Management technology
 Documentation: organizing the documentation infrastructure
 Accounting: accounting and auditing use of OSG resources
 Interoperability: primarily interoperability between grids
 Operations: operating Grid-wide services
OSG Integration Testbed:
Testing & Validating Middleware
[Map: Integration Testbed sites, including Taiwan, Brazil, and Korea]
Networks
Evolving Science Requirements for Networks
(DOE High Performance Network Workshop)
Science Areas | End2End Throughput Today | End2End Throughput in 5 years | End2End Throughput in 5-10 Years | Remarks
High Energy Physics | 0.5 Gb/s | 100 Gb/s | 1000 Gb/s | High bulk throughput
Climate (Data & Computation) | 0.5 Gb/s | 160-200 Gb/s | N x 1000 Gb/s | High bulk throughput
SNS NanoScience | Not yet started | 1 Gb/s | 1000 Gb/s + QoS for Control Channel | Remote control and time critical throughput
Fusion Energy | 0.066 Gb/s (500 MB/s burst) | 0.2 Gb/s (500 MB / 20 sec. burst) | N x 1000 Gb/s | Time critical throughput
Astrophysics | 0.013 Gb/s (1 TB/week) | N*N multicast | 1000 Gb/s | Computational steering and collaborations
Genomics Data & Computation | 0.091 Gb/s (1 TB/day) | 100s of users | 1000 Gb/s + QoS for Control Channel | High throughput and steering
See http://www.doecollaboratory.org/meetings/hpnpw/
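The "Today" column's small averages follow directly from the data volumes quoted in parentheses; a quick conversion sketch (my arithmetic, decimal units):

```python
# Convert a periodic data volume into an average end-to-end rate in Gb/s.
def avg_gbps(terabytes: float, period_seconds: float) -> float:
    bits = terabytes * 1e12 * 8            # TB -> bits (decimal units)
    return bits / period_seconds / 1e9     # bits/s -> Gb/s

WEEK = 7 * 24 * 3600
DAY = 24 * 3600

print(f"1 TB/week ~ {avg_gbps(1, WEEK):.3f} Gb/s")  # ~0.013 Gb/s (Astrophysics row)
print(f"1 TB/day  ~ {avg_gbps(1, DAY):.3f} Gb/s")   # ~0.09 Gb/s (Genomics row quotes 0.091)
```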
UltraLight
Integrating Advanced Networking in Applications
http://www.ultralight.org
10 Gb/s+ network:
• Caltech, UF, FIU, UM, MIT
• SLAC, FNAL
• Int’l partners
• Level(3), Cisco, NLR
Education
Training
Communications
iVDGL, GriPhyN Education/Outreach
Basics
 $200K/yr
 Led by UT Brownsville
 Workshops, portals, tutorials
 Partnerships with QuarkNet, CHEPREO, LIGO E/O, …
Grid Training Activities
 June 2004: First US Grid Tutorial (South Padre Island, TX)
   36 students, diverse origins and types
 July 2005: Second Grid Tutorial (South Padre Island, TX)
   42 students, simpler physical setup (laptops)
 Reaching a wider audience
   Lectures, exercises, video, on web
   Students, postdocs, scientists
   Coordination of training activities
   “Grid Cookbook” (Trauner & Yafchak)
   More tutorials, 3-4/year
   CHEPREO tutorial in 2006?
QuarkNet/GriPhyN e-Lab Project
http://quarknet.uchicago.edu/elab/cosmic/home.jsp
CHEPREO: Center for High Energy Physics Research
and Educational Outreach
Florida International University
 Physics Learning Center
 CMS Research
 iVDGL Grid Activities
 AMPATH network (S. America)
 Funded September 2003
 $4M initially (3 years)
 MPS, CISE, EHR, INT
Grids and the Digital Divide
Background
 World Summit on Information Society
 HEP Standing Committee on Interregional Connectivity (SCIC)
Themes
 Global collaborations, Grids and addressing the Digital Divide
 Focus on poorly connected regions
 Brazil (2004), Korea (2005)
Science Grid Communications
Broad set of activities (Katie Yurkewicz)
 News releases, PR, etc.
 Science Grid This Week
 OSG Newsletter
 Not restricted to OSG
www.interactions.org/sgtw
Grid Timeline
[Timeline figure, 2000–2007]
 Projects: PPDG ($9.5M), GriPhyN ($12M), iVDGL ($14M), CHEPREO ($4M), UltraLight ($2M), DISUN ($10M)
 Milestones: first US-LHC Grid testbeds, VDT 1.0, LIGO Grid, Grid Communications, Grid3 operations, OSG operations, Grid Summer Schools, Digital Divide Workshops, start of LHC (2007)
Future of OSG CyberInfrastructure
 OSG is a unique national infrastructure for science
   Large CPU, storage and network capability crucial for science
   Supporting advanced middleware
   Long-term support of the Virtual Data Toolkit (new disciplines & international collaborations)
 OSG currently supported by a “patchwork” of projects
   Collaborating projects, separately funded
   Developing workplan for long-term support
   Maturing, hardening facility
   Extending facility to lower barriers to participation
   Oct. 27 presentation to DOE and NSF
OSG Consortium Meeting: Jan 23-25
 University of Florida (Gainesville)
 About 100 – 120 people expected
   Funding agency invitees
 Schedule
   Monday morning: Applications plenary (rapporteurs)
   Monday afternoon: Partner Grid projects plenary
   Tuesday morning: Parallel
   Tuesday afternoon: Plenary
   Wednesday morning: Parallel
   Wednesday afternoon: Plenary
   Thursday: OSG Council meeting
Disaster Planning
Emergency Response
Grids and Disaster Planning /
Emergency Response
 Inspired by recent events
   Dec. 2004 tsunami in Indonesia
   Aug. 2005 Hurricane Katrina and subsequent flooding
   (Quite different time scales!)
 Connection of DP/ER to Grids
   Resources to simulate detailed physical & human consequences of disasters
   Priority pooling of resources for a societal good
   In principle, a resilient distributed resource
 Ensemble approach well suited to Grid/cluster computing (sketched below)
   E.g., given a storm’s parameters & errors, bracket likely outcomes
   Huge number of jobs required
   Embarrassingly parallel
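A minimal sketch of the embarrassingly parallel ensemble idea: sample each storm parameter within its quoted error and emit one small input file per independent job. The parameter names, ranges, and file format are hypothetical, not from any specific DP/ER code or submission system.

```python
# Sketch: turn a storm forecast with uncertainties into N independent grid jobs.
# Parameter names, ranges, and the job-file format are illustrative only.
import json
import random

N_MEMBERS = 1000   # ensemble size; each member is an independent simulation

# Central forecast values with (hypothetical) one-sigma uncertainties.
FORECAST = {
    "landfall_lon_deg": (-89.6, 0.5),
    "landfall_lat_deg": (29.9, 0.3),
    "max_wind_m_s": (75.0, 8.0),
    "forward_speed_m_s": (5.5, 1.0),
}

random.seed(42)  # reproducible ensemble

for member in range(N_MEMBERS):
    # Sample each parameter within its error to bracket likely outcomes.
    params = {
        name: random.gauss(mean, sigma) for name, (mean, sigma) in FORECAST.items()
    }
    # One small input file per job; a grid scheduler (e.g. Condor) would then
    # queue N_MEMBERS independent simulations -- embarrassingly parallel.
    with open(f"member_{member:04d}.json", "w") as f:
        json.dump(params, f)
```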
DP/ER Scenarios
 Simulating physical scenarios
   Hurricanes, storm surges, floods, forest fires
   Pollutant dispersal: chemical, oil, biological and nuclear spills
   Disease epidemics
   Earthquakes, tsunamis
   Nuclear attacks
   Loss of network nexus points (deliberate or side effect)
   Astronomical impacts
 Simulating human responses to these situations
   Roadways, evacuations, availability of resources
   Detailed models (geography, transportation, cities, institutions)
   Coupling human response models to specific physical scenarios
 Other possibilities
   “Evacuation” of important data to safe storage
DP/ER and Grids: Some Implications
 DP/ER scenarios are not equally amenable to a Grid approach
   E.g., tsunami vs. hurricane-induced flooding
   Specialized Grids can be envisioned for very short response times
   But all can be simulated “offline” by researchers
   Other “longer term” scenarios
 ER is an extreme example of priority computing
   Priority use of IT resources is common (conferences, etc.)
   Is ER priority computing different in principle?
 Other implications
   Requires long-term engagement with DP/ER research communities (atmospheric, ocean, coastal ocean, social/behavioral, economic)
   Specific communities with specific applications to execute
   Digital Divide: resources to solve problems of interest to the 3rd World
   Forcing function for Grid standards?
   Legal liabilities?
Grid Project References
 Open Science Grid: www.opensciencegrid.org
 Grid3: www.ivdgl.org/grid3
 Virtual Data Toolkit: www.griphyn.org/vdt
 GriPhyN: www.griphyn.org
 iVDGL: www.ivdgl.org
 PPDG: www.ppdg.net
 CHEPREO: www.chepreo.org
 UltraLight: www.ultralight.org
 Globus: www.globus.org
 Condor: www.cs.wisc.edu/condor
 WLCG: www.cern.ch/lcg
 EGEE: www.eu-egee.org
Extra Slides
Grid3 Use by VOs Over 13 Months
CMS: “Compact” Muon Solenoid
[Photo of the CMS detector, with “inconsequential humans” shown for scale]
LHC: Beyond Moore’s Law
[Chart: estimated CPU capacity at CERN, 1998-2010, in K SI95 (one 2 GHz Intel CPU = 0.1 K SI95). LHC CPU requirements climb toward roughly 5,000-6,000 K SI95, well above the Moore’s Law (2000) extrapolation.]
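For scale, a one-line conversion of the chart's capacity units into CPU counts, using the chart's calibration of 0.1 K SI95 per 2 GHz Intel CPU (the 5,000 K SI95 input is just an illustrative reading of the curve):

```python
# Convert the chart's K SI95 capacity figures into an equivalent CPU count,
# using the slide's calibration: one 2 GHz Intel CPU ~ 0.1 K SI95.
K_SI95_PER_CPU = 0.1

def cpus_needed(capacity_k_si95: float) -> int:
    return round(capacity_k_si95 / K_SI95_PER_CPU)

print(cpus_needed(5000))   # ~50,000 2 GHz CPUs for 5,000 K SI95
```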
Grids and Globally Distributed Teams
 Non-hierarchical: Chaotic analyses + productions
 Superimpose significant random data flows
Sloan Digital Sky Survey (SDSS)
Using Virtual Data in GriPhyN
[Chart: galaxy cluster size distribution from Sloan data. Log-log plot of number of clusters (1 to 100,000) versus number of galaxies per cluster (1 to 100).]
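Purely for illustration, a minimal sketch of how a cluster-size distribution like the one plotted could be tallied; the `cluster_sizes` list is a toy stand-in for the per-cluster galaxy counts derived from Sloan data.

```python
# Sketch: histogram of galaxy-cluster sizes, as plotted on the slide.
# `cluster_sizes` stands in for per-cluster galaxy counts derived from Sloan data
# via GriPhyN virtual-data workflows.
from collections import Counter

cluster_sizes = [1, 1, 2, 1, 3, 2, 1, 5, 1, 2, 8, 1]  # toy example values

distribution = Counter(cluster_sizes)  # size -> number of clusters of that size
for size in sorted(distribution):
    print(f"{distribution[size]:6d} clusters with {size} galaxies")
```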
The LIGO Scientific Collaboration (LSC)
and the LIGO Grid
 LIGO Grid: 6 US sites + 3 EU sites (Birmingham and Cardiff in the UK, AEI/Golm in Germany)
* LHO, LLO: LIGO observatory sites
* LSC: LIGO Scientific Collaboration