CASC Meeting
Washington, DC
July 14, 2004
Paul Avery
University of Florida avery@phys.ufl.edu
Trillium = PPDG + GriPhyN + iVDGL
Particle Physics Data Grid (PPDG): $12M (DOE) (1999 – 2004+)
GriPhyN: $12M (NSF) (2000 – 2005)
iVDGL: $14M (NSF) (2001 – 2006)
Basic composition (~150 people)
PPDG: 4 universities, 6 labs
GriPhyN: 12 universities, SDSC, 3 labs
iVDGL: 18 universities, SDSC, 4 labs, foreign partners
Expts: BaBar, D0, STAR, Jlab, CMS, ATLAS, LIGO, SDSS/NVO
Complementarity of projects
GriPhyN: CS research, Virtual Data Toolkit (VDT) development
PPDG: “End to end” Grid services, monitoring, analysis
iVDGL: Grid laboratory deployment using VDT
Experiments provide frontier challenges
Unified entity when collaborating internationally
Experiments at Large Hadron Collider: 100s of Petabytes (2007 – ?)
High Energy & Nuclear Physics experiments: ~1 Petabyte (1000 TB) (1997 – present)
LIGO (gravity wave search): 100s of Terabytes (2002 – present)
Sloan Digital Sky Survey: 10s of Terabytes (2001 – present)
Massive CPU (PetaOps)
Large distributed datasets (>100PB)
Global communities (1000s)
Large Hadron Collider: 27 km tunnel in Switzerland & France
Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
Search for origin of mass & supersymmetry (2007 – ?)
Common experiments, leadership, participants
CS research
Workflow, scheduling, virtual data
Common Grid toolkits and packaging
Virtual Data Toolkit (VDT) + Pacman packaging
Common Grid infrastructure: Grid3
National Grid for testing, development and production
Advanced networking
Ultranet, UltraLight, etc.
Integrated education and outreach effort
+ collaboration with outside projects
Unified entity in working with international projects
LCG, EGEE, South America
GriPhyN data grid architecture (figure)
Users: production teams, workgroups, single researchers (interactive user tools)
GriPhyN layers: virtual data tools; request planning & scheduling tools; request execution & management tools
Supporting services: resource management, security and policy, other Grid services
Resources: raw data source, transforms, distributed resources (code, storage, CPUs, networks)
Scale: PetaOps, Petabytes, performance
Complexity: Millions of individual detector channels
Scale: PetaOps (CPU), 100s of Petabytes (Data)
Distribution: Global distribution of people & resources
CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries
BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries
CMS Experiment data grid hierarchy (figure)
~10s of Petabytes/yr by 2007-8; ~1000 Petabytes in < 10 yrs?
Online System → Tier 0 (CERN Computer Center) at 0.1 - 1.5 GBytes/s
Tier 0 → Tier 1 national centers (USA, Korea, Russia, UK) at 10-40 Gb/s
Tier 2 / Tier 3: university sites and physics caches (U Florida, Caltech, UCSD, Iowa, Maryland, FIU) at >10 Gb/s and 2.5-10 Gb/s
Tier 4: PCs
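As a rough consistency check on these rates (my own arithmetic; the ~10^7 seconds of running per year is a standard HEP rule of thumb, not a number from this talk):
\[
1\ \mathrm{GB/s} \times 10^{7}\ \mathrm{s/yr} \approx 10^{16}\ \mathrm{bytes/yr} = 10\ \mathrm{PB/yr}
\]
which is in line with the ~10s of Petabytes/yr quoted above.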
Bandwidth roadmap for major HEP links (Gb/s; read "2-4 x 10" as 2-4 wavelengths of 10 Gb/s each)

Year | Production | Experimental | Remarks
2001 | 0.155      | 0.622 - 2.5  | SONET/SDH
2002 | 0.622      | 2.5          | SONET/SDH; DWDM; GigE integration
2003 | 2.5        | 10           | DWDM; 1+10 GigE integration
2005 | 10         | 2-4 x 10     | λ switch; λ provisioning
2007 | 2-4 x 10   | 1 x 40       | 1st gen. λ grids
2009 | 1-2 x 40   | ~5 x 40      | 40 Gb/s λ switching
2011 | ~5 x 40    | ~25 x 40     | Terabit networks
2013 | ~Terabit   | ~MultiTbps   | ~Fill one fiber
HEP: Leading role in network development/deployment
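To connect these link speeds to the data volumes on the earlier slides (my own arithmetic, assuming a fully utilized link):
\[
10\ \mathrm{Gb/s} = 1.25\ \mathrm{GB/s}, \qquad 1.25\ \mathrm{GB/s} \times 86{,}400\ \mathrm{s/day} \approx 108\ \mathrm{TB/day} \approx 0.1\ \mathrm{PB/day}
\]
so a single 10 Gb/s production link, kept full, moves roughly a Petabyte every ten days.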
Tier2 facility
20 – 40% of Tier1?
“1 FTE support”: commodity CPU & disk, no hierarchical storage
Essential university role in extended computing infrastructure
Validated by 3 years of experience with proto-Tier2 sites
Functions
Physics analysis, simulations
Support experiment software
Support smaller institutions
Official role in Grid hierarchy (U.S.)
Sanctioned by MOU with parent organization (ATLAS, CMS, LIGO)
Local P.I. with reporting responsibilities
Selection by collaboration via careful process
Non-hierarchical: Chaotic analyses + productions
Superimpose significant random data flows
VDT build & test process (figure)
Sources (CVS) → Build & Test Condor pool (37 computers): build binaries, test, patching, …
Inputs: RPMs, GPT source bundles
Output: packaged releases in a Pacman cache
Will use NMI processes later
VDT components (by contributor)
Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location Service (RLS)
ISI & UC: Chimera & related tools, Pegasus
NCSA: MyProxy, GSI OpenSSH
Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds
EDG & LCG: Make Gridmap, Certificate Revocation List Updater, Glue Schema/Info provider
LBL: PyGlobus, Netlogger
Caltech: MonALISA
VDT: VDT System Profiler, configuration software
Others: KX509 (U. Mich.)
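To make the list above concrete, here is a minimal sketch of how a user exercises the core VDT client tools (GSI, GRAM, GridFTP). The host names and paths are placeholders I made up, not real Grid3 endpoints:

% grid-proxy-init                                      # create a short-lived GSI proxy from the user's certificate
% globus-job-run gatekeeper.example.edu/jobmanager-fork /bin/hostname   # run a test job via GRAM
% globus-url-copy file:///tmp/result.dat \
      gsiftp://se.example.edu/data/result.dat          # copy output with GridFTP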
Growth of VDT components (chart; y-axis: number of components, 0-22)
VDT 1.0 (Globus 2.0b, Condor 6.3.1) → VDT 1.1.3, 1.1.4 & 1.1.5 (switch to Globus 2.2; pre-SC 2002) → VDT 1.1.7 → VDT 1.1.8 (first real use by LCG) → VDT 1.1.11 (Grid3) → VDT 1.1.14 (May 10)
Pacman language: defines software environments
Pacman interpreter: create, install, configure, update, verify environments
Sources and their native packaging: LCG (Scram), ATLAS (CMT), CMS DPE (tar/make), LIGO (tar/make), OpenSource (tar/make), Globus (GPT), NPACI/TeraGrid (tar/make), D0 (UPS-UPD), Commercial (tar/make)
Combine and manage software from arbitrary sources
“1 button install”: % pacman -get iVDGL:Grid3
Reduce burden on administrators
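A minimal sketch of that “1 button install” from the administrator's side; the cache name iVDGL:Grid3 comes from the slide above, while the install directory and the setup.sh step follow common VDT practice and may differ in detail at a given site:

% mkdir /opt/grid3; cd /opt/grid3      # choose an install area (path is arbitrary)
% pacman -get iVDGL:Grid3              # fetch, build, and configure the full Grid3 stack
% source ./setup.sh                    # environment script written by the installation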
Pacman caches (figure): VDT, iVDGL, LIGO, UCHEP, D-Zero, CMS/DPE, ATLAS, NPACI, all installed via % pacman
Remote experts define installation/config/updating for everyone at once
Trillium technology flow (figure)
Computer Science Research → (techniques & software) → Virtual Data Toolkit → (tech transfer) → Larger Science Community: U.S. Grids, Int'l, Outreach
Feedback: requirements, prototyping & experiments, production deployment
Partner physics & outreach projects: Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC Experiments, QuarkNet, CHEPREO
Other linkages: work force, CS researchers, industry
Grid3: 28 sites (universities + national labs)
2800 CPUs, 400–1300 jobs
Running since October 2003
Applications in HEP, LIGO, SDSS, Genomics
http://www.ivdgl.org/grid3
US-CMS Monte Carlo Simulation on Grid3 (chart: USCMS vs. non-USCMS resources)
Used = 1.5 × US-CMS resources
Build on Grid3 experience
Persistent, production-quality Grid, national + international scope
Ensure U.S. leading role in international science
Grid infrastructure for large-scale collaborative scientific research
Create large computing infrastructure
Combine resources at DOE labs and universities to effectively become a single national computing infrastructure for science
Provide opportunities for educators and students
Participate in building and exploiting this grid infrastructure
Develop and train scientific and technical workforce
Transform the integration of education and research at all levels
Iteratively build & extend Grid3 into a national infrastructure
Shared resources, benefiting broad set of disciplines
Strong focus on operations
Grid3 → Open Science Grid
Contribute US-LHC resources to provide initial federation
US-LHC Tier-1 and Tier-2 centers provide significant resources
Build OSG from laboratories, universities, campus grids, etc.
Argonne, Fermilab, SLAC, Brookhaven, Berkeley Lab, Jeff. Lab
UW Madison, U Florida, Purdue, Chicago, Caltech, Harvard, etc.
Corporations?
Further develop OSG
Partnerships and contributions from other sciences, universities
Incorporation of advanced networking
Focus on general services, operations, end-to-end performance
Open Science Grid organization (figure)
Participants: campuses, labs, service providers, sites, researchers, VO organizations, research grid projects, enterprise
Participants provide: resources, management, project steering groups
Structure: Consortium Board (1), Technical Groups (0…n, small), activities (0…N, large), joint committees (0…N, small)
Join U.S. forces into the Open Science Grid consortium
Application scientists, technology providers, resource owners, ...
Provide a lightweight framework for joint activities...
Coordinating activities that cannot be done within one project
Achieve technical coherence and strategic vision (roadmap)
Enable communication and provide liaising
Reach common decisions when necessary
Create Technical Groups
Propose, organize, oversee activities; peer & liaise with other groups
Now: Security and storage services
Later: Support centers, policies
Define activities
Contributing tasks provided by participants joining the activity
Complex matrix of partnerships: challenges & opportunities
NSF and DOE, Universities and Labs, physics and computing depts.
U.S. LHC and Trillium Grid Projects
Application sciences, computer sciences, information technologies
Science and education
U.S., Europe, Asia, North and South America
Grid middleware and infrastructure projects
CERN and U.S. HEP labs, LCG and Experiments, EGEE and OSG
~85 participants on OSG stakeholder mail list
~50 authors on OSG abstract to CHEP conference (Sep. 2004)
Meetings
Jan. 12: Initial stakeholders meeting
May 20-21: Joint Trillium Steering meeting to define program
Sep. 9-10: OSG workshop, Boston
Workshop on Grids and the Digital Divide, UERJ, Rio de Janeiro (screenshot of workshop web site: bulletins, registration, program, contacts)
Background: World Summit on Information Society; HEP Standing Committee on Inter-regional Connectivity (SCIC)
Themes: Global collaborations, Grids and addressing the Digital Divide
Next meeting: 2005 (Korea)
http://www.uerj.br/lishep2004
Education/Outreach basics
$200K/yr, led by UT Brownsville
Workshops, portals
Partnerships with CHEPREO, QuarkNet, …
Grid Summer School 2004: first of its kind in the U.S. (South Padre Island, Texas)
36 students, diverse origins and types (M, F, MSIs, etc.)
Marks new direction for Trillium
First attempt to systematically train people in Grid technologies
First attempt to gather relevant materials in one place
Today: Students in CS and Physics
Later: Students, postdocs, junior & senior scientists
Reaching a wider audience
Put lectures, exercises, video, on the web
More tutorials, perhaps 3-4/year
Dedicated resources for remote tutorials
Create “Grid book”, e.g. Georgia Tech
New funding opportunities
NSF: new training & education programs
CHEPREO: Center for High Energy Physics Research, Education and Outreach
Physics Learning Center
CMS Research
iVDGL Grid Activities
AMPATH network (S. America)
Funded September 2003: $4M initially (3 years), spanning 4 NSF Directorates!
Meeting April 8 in Washington DC
Brought together ~40 outreach leaders (including NSF)
Proposed Grid-based framework for common E/O effort
Grid3: www.ivdgl.org/grid3
PPDG: www.ppdg.net
GriPhyN: www.griphyn.org
iVDGL: www.ivdgl.org
Globus: www.globus.org
LCG: www.cern.ch/lcg
EU DataGrid: www.eu-datagrid.org
EGEE: egee-ei.web.cern.ch
Grid book, 2nd Edition: www.mkp.com/grid2
More than a web site
Organize datasets
Perform simple computations
Create new computations & analyses
View & share results
Annotate & enquire (metadata)
Communicate and collaborate
Easy to use, ubiquitous, no tools to install
Open to the community
Grow & extend
Initial prototype implemented by graduate student Yong Zhao and M. Wilde (U. of Chicago)
Virtual Data paradigm to express science processes
Unified language (VDL) to express general data transformations
Advanced planners, executors, monitors, predictors, fault recovery to make the Grid “like a workstation” (see the DAGMan sketch after this list)
Virtual Data Toolkit (VDT)
Tremendously simplified installation & configuration of Grids
Close partnership with and adoption by multiple sciences:
ATLAS, CMS, LIGO, SDSS, Bioinformatics, EU Projects
Broad education & outreach program (UT Brownsville)
25 graduate, 2 undergraduate; 3 CS PhDs by end of 2004
Virtual Data for QuarkNet Cosmic Ray project
Grid Summer School 2004, 3 MSIs participating
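For a flavor of what the “executors” hand to the Grid: Pegasus-style planners emit concrete workflows that Condor DAGMan (part of the VDT contents listed earlier) runs. A minimal DAG description looks like the sketch below; the job names and submit files are hypothetical:

# example.dag: three-step pipeline in DAGMan syntax (file names are made up)
JOB  generate     generate.sub
JOB  simulate     simulate.sub
JOB  reconstruct  reconstruct.sub
PARENT generate  CHILD simulate
PARENT simulate  CHILD reconstruct
RETRY  reconstruct 2            # re-try the last step twice on failure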
Careful planning and coordination essential to build Grids
Community investment of time/resources
Operations team needed to operate Grid as a facility
Tools, services, procedures, documentation, organization
Security, account management, multiple organizations
Strategies needed to cope with increasingly large scale
“Interesting” failure modes as scale increases
Delegation of responsibilities to conserve human resources
Project, Virtual Org., Grid service, site, application
Better services, documentation, packaging
Grid3 experience critical for building “useful” Grids
Frank discussion in “Grid3 Project Lessons” doc
Need for financial investment
Grid projects: PPDG + GriPhyN + iVDGL: $35M
National labs: Fermilab, Brookhaven, Argonne, LBL
Other expts: LIGO, SDSS
Critical role of collaboration
CS + 5 HEP experiments + LIGO + SDSS + biology
Makes possible large common effort
Collaboration sociology ~50% of issues
Building “stuff”: prototypes → testbeds → production Grids
Build expertise, debug software, develop tools & services
Need for “operations team”
Point of contact for problems, questions, outside inquiries
Resolution of problems by direct action, consultation w/ experts
VDT + Pacman: Installation and configuration
Simple installation, configuration of Grid tools + applications
Major advances over 14 VDT releases
Why this is critical for us
Uniformity and validation across sites
Lowered barriers to participation → more sites!
Reduced FTE overhead & communication traffic
The next frontier: remote installation
Further evolution of Grid3: (Grid3+, etc.)
Contribute resources to persistent Grid
Maintain development Grid, test new software releases
Integrate software into the persistent Grid
Participate in LHC data challenges
Involvement of new sites
New institutions and experiments
New international partners (Brazil, Taiwan, Pakistan?, …)
Improvements in Grid middleware and services
Storage services
Integrating multiple VOs
Monitoring
Troubleshooting
Accounting
iVDGL sites (map)
U.S. sites: LBL, Caltech, ISI, UCSD, SKC, UW Milwaukee, UW Madison, Fermilab, Iowa, Chicago, Michigan, Argonne, Indiana, Buffalo, PSU, Boston U, BNL, J. Hopkins, Hampton, Vanderbilt, Austin, Brownsville, UF, FIU
Legend: Tier1, Tier2, Other
Partners: EU, Brazil, Korea
iVDGL institutions and roles:
U Florida: CMS (Tier2), Management
Caltech: CMS (Tier2), LIGO (Management)
UC San Diego: CMS (Tier2), CS
Indiana U: ATLAS (Tier2), iGOC (operations)
Boston U: ATLAS (Tier2)
Harvard: ATLAS (Management)
Wisconsin, Milwaukee: LIGO (Tier2)
Penn State: LIGO (Tier2)
Johns Hopkins: SDSS (Tier2), NVO
Chicago: CS, Coord./Management, ATLAS (Tier2)
Vanderbilt*: BTEV (Tier2, unfunded)
Southern California: CS
Wisconsin, Madison: CS
Texas, Austin: CS
Salish Kootenai: LIGO (Outreach, Tier3)
Hampton U: ATLAS (Outreach, Tier3)
Texas, Brownsville: LIGO (Outreach, Tier3)
Fermilab: CMS (Tier1), SDSS, NVO
Brookhaven: ATLAS (Tier1)
Argonne: ATLAS (Management), Coordination
Most scientific data are not simple “measurements”
They are computationally corrected/reconstructed
They can be produced by numerical simulation
Science & eng. projects are more CPU and data intensive
Programs are significant community resources (transformations)
So are the executions of those programs (derivations)
Management of dataset dependencies critical!
Derivation: Instantiation of a potential data product
Provenance: Complete history of any existing data product
Previously: Manual methods
GriPhyN: Automated, robust tools
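To make the transformation vs. derivation distinction concrete, here is a schematic fragment in the spirit of GriPhyN's VDL. This is pseudo-code written only for illustration: the file names are invented and the exact Chimera VDL syntax differs in detail.

TR reconstruct( input rawfile, output summary, none calib ) {
    argument = "-calib " ${calib};
    argument = "-in "    ${rawfile};
    argument = "-out "   ${summary};
}

DV reconstruct( rawfile=@{input:"run1234.raw"},
                summary=@{output:"run1234.summary"},
                calib="muon-v3" );

The TR block is the transformation (the community's program and its interface); each DV is a derivation (one recorded execution with specific inputs and parameters), which is exactly the provenance needed to answer the questions that follow.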
Virtual data motivations (figure: Data, Transformation, Derivation linked by product-of, consumed-by/generated-by, execution-of)
“I’ve found some interesting data, but I need to know exactly what corrections were applied before I can trust it.”
“I’ve detected a muon calibration error and want to know which derived data products need to be recomputed.”
“I want to search a database for 3 muon SUSY events. If a program that does this analysis exists, I won’t have to write one from scratch.”
“I want to apply a forward jet analysis to 100M events. If the results already exist, I’ll save weeks of computation.”
Example LHC analysis tree (figure): branches for mass = 160; decay = bb, ZZ, WW; WW → leptons; WW → e; Pt > 20; other cuts
Scientist adds a new derived data branch & continues analysis
Sloan Data: galaxy cluster size distribution (log-log chart; x-axis: number of galaxies, 1-100; y-axis: counts, 1-100,000)
LIGO Grid: 6 US sites + 3 EU sites (Cardiff/UK, AEI/Germany)
LHO, LLO: observatory sites
LSC - LIGO Scientific Collaboration
(Map of LIGO Grid sites; EU sites at Birmingham, Cardiff, AEI/Golm)
Grids fundamentally alter the conduct of scientific research
“Lab-centric”: activities center around a large facility
“Team-centric”: resources shared by distributed teams
“Knowledge-centric”: knowledge generated & used by a community
Strengthens role of universities in research
Couples universities to data intensive science
Couples universities to national & international labs
Brings front-line research and resources to students
Exploits intellectual resources of formerly isolated schools
Opens new opportunities for minority and women researchers
Builds partnerships to drive advances in IT/science/eng
HEP ↔ physics, astronomy, biology, CS, etc.
“Application” sciences ↔ computer science
Universities ↔ laboratories
Scientists ↔ students
Research community ↔ IT industry