
Physics Grids and Open Science Grid

CASC Meeting

Washington, DC

July 14, 2004


Paul Avery

University of Florida, avery@phys.ufl.edu

U.S. “Trillium” Grid Consortium

Trillium = PPDG + GriPhyN + iVDGL

PPDG (Particle Physics Data Grid): $12M (DOE) (1999 – 2004+)

GriPhyN: $12M (NSF) (2000 – 2005)

iVDGL: $14M (NSF) (2001 – 2006)

Basic composition (~150 people)

PPDG: 4 universities, 6 labs

GriPhyN: 12 universities, SDSC, 3 labs

 iVDGL: 18 universities, SDSC, 4 labs, foreign partners

Expts: BaBar, D0, STAR, Jlab, CMS, ATLAS, LIGO, SDSS/NVO

Complementarity of projects

GriPhyN: CS research, Virtual Data Toolkit (VDT) development

PPDG: “End to end” Grid services, monitoring, analysis

 iVDGL: Grid laboratory deployment using VDT

Experiments provide frontier challenges

Unified entity when collaborating internationally


Trillium Science Drivers

Experiments at Large Hadron Collider: 100s of Petabytes (2007 – ?)

High Energy & Nuclear Physics expts: ~1 Petabyte (1000 TB) (1997 – present)

LIGO (gravity wave search): 100s of Terabytes (2002 – present)

Sloan Digital Sky Survey: 10s of Terabytes (2001 – present)


Future Grid resources

 Massive CPU (PetaOps)

 Large distributed datasets (>100PB)

 Global communities (1000s)


Large Hadron Collider (LHC) @ CERN

 27 km Tunnel in Switzerland & France

Experiments: CMS, ATLAS, ALICE, LHCb, TOTEM

Search for origin of mass & supersymmetry (2007 – ?)

Trillium: Collaborative Program of Work

Common experiments, leadership, participants

CS research

Workflow, scheduling, virtual data

Common Grid toolkits and packaging

Virtual Data Toolkit (VDT) + Pacman packaging

Common Grid infrastructure: Grid3

National Grid for testing, development and production

Advanced networking

Ultranet, UltraLight, etc.

Integrated education and outreach effort

+ collaboration with outside projects

Unified entity in working with international projects

LCG, EGEE, South America


Goal: Peta-scale Virtual-Data Grids for Global Science

[Architecture diagram: single researchers, workgroups, and production teams use interactive user tools layered over GriPhyN virtual data tools, request planning & scheduling tools, and request execution & management tools; these rest on resource management, security and policy, and other Grid services, all running over distributed resources (code, storage, CPUs, networks) at PetaOps/Petabytes performance scales, from raw data sources through transforms.]

LHC: Petascale Global Science

Complexity: Millions of individual detector channels

Scale: PetaOps (CPU), 100s of Petabytes (Data)

Distribution: Global distribution of people & resources

CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries

BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries

Global LHC Data Grid Hierarchy

CMS Experiment

~10s of Petabytes/yr by 2007-8

~1000 Petabytes in < 10 yrs?

[Hierarchy diagram:
Online System → Tier 0 (CERN Computer Center): 0.1 – 1.5 GBytes/s
Tier 0 → Tier 1 centers (USA, Korea, Russia, UK): 10 – 40 Gb/s
Tier 1 → Tier 2 centers (U Florida, Caltech, UCSD, …): >10 Gb/s
Tier 2 → Tier 3 physics caches (Iowa, Maryland, FIU, …): 2.5 – 10 Gb/s
Tier 3 → Tier 4: PCs]
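A quick cross-check of these rates (my arithmetic, not from the slide): a link sustained at GByte/s scale carries tens of petabytes per year, consistent with the online system's 0.1 – 1.5 GBytes/s and the "~10s of Petabytes/yr" estimate. A minimal Python sketch:

    # Yearly volume implied by a sustained link rate (idealized: no
    # duty-cycle or protocol losses).
    SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7 s

    def petabytes_per_year(rate_gbytes_per_s):
        """Yearly volume in PB for a link sustained at rate_gbytes_per_s GB/s."""
        return rate_gbytes_per_s * 1e9 * SECONDS_PER_YEAR / 1e15

    for rate in (0.1, 1.5):  # the online system's quoted range
        print("%.1f GB/s -> %.0f PB/yr" % (rate, petabytes_per_year(rate)))
    # 0.1 GB/s -> ~3 PB/yr; 1.5 GB/s -> ~47 PB/yr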

HEP Bandwidth Roadmap (Gb/s)

Year   Production   Experimental   Remarks
2001   0.155        0.622 – 2.5    SONET/SDH
2002   0.622        2.5            SONET/SDH; DWDM; GigE integration
2003   2.5          10             DWDM; 1+10 GigE integration
2005   10           2–4 × 10       λ switch; λ provisioning
2007   2–4 × 10     1 × 40         1st gen. λ grids
2009   1–2 × 40     ~5 × 40        40 Gb/s λ switching
2011   ~5 × 40      ~25 × 40       Terabit networks
2013   ~Terabit     ~MultiTbps     ~Fill one fiber

HEP: Leading role in network development/deployment

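To make the roadmap concrete (my arithmetic, not from the table): the idealized time to move one petabyte at a few of the production rates, assuming full link utilization:

    # Days to transfer 1 PB at a given rate (idealized: full utilization,
    # no protocol overhead).
    PETABYTE_BITS = 1e15 * 8

    for label, gbps in [("2001 (0.155 Gb/s)", 0.155),
                        ("2005 (10 Gb/s)", 10.0),
                        ("2011 (~5 x 40 Gb/s)", 200.0)]:
        days = PETABYTE_BITS / (gbps * 1e9) / 86400.0
        print("%-20s %8.1f days per PB" % (label, days))
    # ~597 days in 2001, ~9.3 days at 10 Gb/s, ~0.5 days at ~200 Gb/s:
    # petabyte-scale replication needs the 10-40 Gb/s generations and beyond.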

Tier2 Centers

Tier2 facility

20 – 40% of Tier1?

“1 FTE support”: commodity CPU & disk, no hierarchical storage

Essential university role in extended computing infrastructure

Validated by 3 years of experience with proto-Tier2 sites

Functions

Physics analysis, simulations

Support experiment software

Support smaller institutions

Official role in Grid hierarchy (U.S.)

Sanctioned by MOU with parent organization (ATLAS, CMS, LIGO)

Local P.I. with reporting responsibilities

Selection by collaboration via careful process


Analysis by Globally Distributed Teams

 Non-hierarchical: Chaotic analyses + productions

 Superimpose significant random data flows


Trillium Grid Tools: Virtual Data Toolkit

[Build flow diagram: sources (CVS) and contributors (VDS, etc.) feed the VDT Build & Test system, which runs on a Condor pool (37 computers); binaries are built, tested, and patched, then packaged into a Pacman cache, RPMs, and GPT source bundles. NMI build & test processes will be used later.]

Virtual Data Toolkit: Tools in VDT 1.1.12

Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location Service (RLS)

ISI & UC: Chimera & related tools, Pegasus

NCSA: MyProxy, GSI OpenSSH

Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds

EDG & LCG: Make Gridmap, Certificate Revocation List Updater, Glue Schema/Info provider

LBL: PyGlobus, NetLogger

Caltech: MonALISA

VDT: VDT System Profiler, configuration software

Others: KX509 (U. Mich.)


VDT Growth (1.1.14 Currently)

[Chart: number of VDT components per release, growing steadily from VDT 1.0 (Globus 2.0b, Condor 6.3.1) through VDT 1.1.3/1.1.4/1.1.5 (switch to Globus 2.2, pre-SC 2002), VDT 1.1.7, VDT 1.1.8 (first real use by LCG), and VDT 1.1.11 (Grid3), to VDT 1.1.14 (May 10).]

Packaging of Grid Software: Pacman

 Language: define software environments

 Interpreter: create, install, configure, update, verify environments

LCG: Scram
ATLAS: CMT
CMS DPE: tar/make
LIGO: tar/make
OpenSource: tar/make
Globus: GPT
NPACI/TeraGrid: tar/make
D0: UPS-UPD
Commercial: tar/make

Combine and manage software from arbitrary sources.

% pacman -get iVDGL:Grid3

“1 button install”:

Reduce burden on administrators

[Diagram: a single “% pacman” command installs from Pacman caches maintained remotely by VDT, iVDGL, LIGO, UCHEP, D-Zero, CMS/DPE, ATLAS, NPACI, …]

Remote experts define installation/configuration/updating for everyone at once
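Pacman's own package syntax isn't reproduced on the slide; purely as a conceptual sketch (the cache contents and package names below are hypothetical), the “1 button install” amounts to fetching a declarative environment definition and recursively installing its dependencies:

    # Conceptual sketch of a Pacman-style one-button install (illustrative
    # only; not Pacman's real syntax or API). A cache maps package names to
    # declarative definitions; dependencies are installed first, once each.
    CACHE = {
        "iVDGL:Grid3": {"depends": ["VDT:Core"], "steps": ["configure Grid3 site"]},
        "VDT:Core":    {"depends": ["Globus", "Condor"], "steps": ["install VDT"]},
        "Globus":      {"depends": [], "steps": ["unpack GPT bundle", "build"]},
        "Condor":      {"depends": [], "steps": ["install Condor RPM"]},
    }

    def install(name, done=None):
        """Install `name` and everything it depends on, exactly once each."""
        done = set() if done is None else done
        if name in done:
            return
        for dep in CACHE[name]["depends"]:
            install(dep, done)
        for step in CACHE[name]["steps"]:
            print("[%s] %s" % (name, step))  # a real tool would run the step
        done.add(name)

    install("iVDGL:Grid3")  # the moral equivalent of "% pacman -get iVDGL:Grid3"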

Trillium Collaborative Relationships

[Diagram, internal and external relationships: computer science research supplies techniques & software to the Virtual Data Toolkit; requirements, prototyping & experiments, and production deployment cycle between the VDT and partner physics projects (Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC experiments); tech transfer carries results to the larger science community, U.S. Grids, and international outreach; partner outreach projects include QuarkNet and CHEPREO; other linkages: work force, CS researchers, industry.]

Grid3: An Operational National Grid

28 sites: Universities + national labs

2800 CPUs, 400–1300 jobs

Running since October 2003

Applications in HEP, LIGO, SDSS, Genomics

http://www.ivdgl.org/grid3

Grid3: Three Months Usage


Production Simulations on Grid3

US-CMS Monte Carlo simulation

Resources used ≈ 1.5 × the dedicated US-CMS resources (non-USCMS Grid3 sites supplied the difference)

[Chart: USCMS vs. non-USCMS contributions]

Grid3 → Open Science Grid

Build on Grid3 experience

Persistent, production-quality Grid, national + international scope

Ensure U.S. leading role in international science

Grid infrastructure for large-scale collaborative scientific research

Create large computing infrastructure

Combine resources at DOE labs and universities to effectively become a single national computing infrastructure for science

Provide opportunities for educators and students

Participate in building and exploiting this grid infrastructure

Develop and train scientific and technical workforce

Transform the integration of education and research at all levels

http://www.opensciencegrid.org


Roadmap towards Open Science Grid

Iteratively build & extend Grid3 into a national infrastructure

Shared resources, benefiting broad set of disciplines

Strong focus on operations

Grid3 → Open Science Grid

Contribute US-LHC resources to provide initial federation

US-LHC Tier-1 and Tier-2 centers provide significant resources

Build OSG from laboratories, universities, campus grids, etc.

Argonne, Fermilab, SLAC, Brookhaven, Berkeley Lab, Jeff. Lab

UW Madison, U Florida, Purdue, Chicago, Caltech, Harvard, etc.

Corporations?

Further develop OSG

Partnerships and contributions from other sciences, universities

Incorporation of advanced networking

Focus on general services, operations, end-to-end performance


Possible OSG Collaborative Framework

[Diagram: campuses & labs, service providers, sites, researchers, VO organizations, research grid projects, and enterprise participants organized around a Consortium Board (1), Technical Groups (0…n, small), activities (0…N, large), and joint committees (0…N, small). Participants provide: resources, management, project steering groups.]

Open Science Grid Consortium

Join U.S. forces into the Open Science Grid consortium

Application scientists, technology providers, resource owners, ...

Provide a lightweight framework for joint activities...

Coordinating activities that cannot be done within one project

Achieve technical coherence and strategic vision (roadmap)

Enable communication and provide liaising

Reach common decisions when necessary

Create Technical Groups

Propose, organize, and oversee activities; liaise with peers

Now: Security and storage services

Later: Support centers, policies

Define activities

Contributing tasks provided by participants joining the activity


Applications, Infrastructure, Facilities


Open Science Grid Partnerships


Partnerships for Open Science Grid

Complex matrix of partnerships: challenges & opportunities

NSF and DOE, Universities and Labs, physics and computing depts.

U.S. LHC and Trillium Grid Projects

Application sciences, computer sciences, information technologies

Science and education

U.S., Europe, Asia, North and South America

Grid middleware and infrastructure projects

CERN and U.S. HEP labs, LCG and Experiments, EGEE and OSG

~85 participants on OSG stakeholder mail list

~50 authors on OSG abstract to CHEP conference (Sep. 2004)

Meetings

Jan. 12: Initial stakeholders meeting

May 20-21: Joint Trillium Steering meeting to define program

Sep. 9-10: OSG workshop, Boston


Education and Outreach


Grids and the Digital Divide

Rio de Janeiro, Feb. 16-20, 2004


Background

World Summit on Information Society

HEP Standing Committee on Inter-regional Connectivity (SCIC)

Themes

 Global collaborations, Grids and addressing the Digital Divide

Next meeting: 2005 (Korea)

http://www.uerj.br/lishep2004

iVDGL, GriPhyN Education / Outreach

Basics

 $200K/yr

 Led by UT Brownsville

 Workshops, portals

 Partnerships with

CHEPREO, QuarkNet, …


June 21-25 Grid Summer School

First of its kind in the U.S. (South Padre Island, Texas)

36 students of diverse origins and backgrounds (men and women, students from minority-serving institutions, etc.)

Marks new direction for Trillium

First attempt to systematically train people in Grid technologies

First attempt to gather relevant materials in one place

Today: Students in CS and Physics

Later: Students, postdocs, junior & senior scientists

Reaching a wider audience

Put lectures, exercises, video, on the web

More tutorials, perhaps 3-4/year

Dedicated resources for remote tutorials

Create “Grid book”, e.g. Georgia Tech

New funding opportunities

NSF: new training & education programs


CHEPREO: Center for High Energy Physics Research and Educational Outreach

Florida International University

 Physics Learning Center

 CMS Research

 iVDGL Grid Activities

 AMPATH network (S. America)

 Funded September 2003

 $4M initially (3 years)

 4 NSF Directorates!

UUEO: A New Initiative

Meeting April 8 in Washington DC

Brought together ~40 outreach leaders (including NSF)

Proposed Grid-based framework for common E/O effort


Grid Project References

Grid3: www.ivdgl.org/grid3

PPDG: www.ppdg.net

GriPhyN: www.griphyn.org

iVDGL: www.ivdgl.org

Globus: www.globus.org

LCG: www.cern.ch/lcg

EU DataGrid: www.eu-datagrid.org

EGEE: egee-ei.web.cern.ch

The Grid (book), 2nd Edition: www.mkp.com/grid2

Extra Slides


Outreach: QuarkNet-Trillium Virtual Data Portal

More than a web site

Organize datasets

Perform simple computations

Create new computations & analyses

View & share results

Annotate & enquire (metadata)

Communicate and collaborate

Easy to use, ubiquitous, no tools to install

Open to the community

Grow & extend

Initial prototype implemented by graduate student Yong Zhao and M. Wilde (U. of Chicago)


GriPhyN Achievements

Virtual Data paradigm to express science processes

Unified language (VDL) to express general data transformation

Advanced planners, executors, monitors, predictors, fault recovery

→ to make the Grid “like a workstation”

Virtual Data Toolkit (VDT)

Tremendously simplified installation & configuration of Grids

Close partnership with and adoption by multiple sciences:

ATLAS, CMS, LIGO, SDSS, Bioinformatics, EU Projects

Broad education & outreach program (UT Brownsville)

25 graduate and 2 undergraduate students; 3 CS PhDs by end of 2004

Virtual Data for QuarkNet Cosmic Ray project

Grid Summer School 2004, 3 MSIs participating


Grid3 Broad Lessons

Careful planning and coordination essential to build Grids

Community investment of time/resources

Operations team needed to operate Grid as a facility

Tools, services, procedures, documentation, organization

Security, account management, multiple organizations

Strategies needed to cope with increasingly large scale

“Interesting” failure modes as scale increases

Delegation of responsibilities to conserve human resources

Project, Virtual Org., Grid service, site, application

Better services, documentation, packaging

Grid3 experience critical for building “useful” Grids

Frank discussion in “Grid3 Project Lessons” doc


Grid2003 Lessons (1)

Need for financial investment

Grid projects: PPDG + GriPhyN + iVDGL: $35M

National labs: Fermilab, Brookhaven, Argonne, LBL

Other expts: LIGO, SDSS

Critical role of collaboration

5 HEP experiments + LIGO + SDSS + CS + biology

Makes possible large common effort

Collaboration sociology ~50% of issues

Building “stuff”: Prototypes → testbeds → production Grids

Build expertise, debug software, develop tools & services

Need for “operations team”

Point of contact for problems, questions, outside inquiries

Resolution of problems by direct action, consultation w/ experts


Grid2003 Lessons (2): Packaging

VDT + Pacman: Installation and configuration

Simple installation, configuration of Grid tools + applications

Major advances over 14 VDT releases

Why this is critical for us

Uniformity and validation across sites

Lowered barriers to participation → more sites!

Reduced FTE overhead & communication traffic

The next frontier: remote installation


Grid2003 and Beyond

Further evolution of Grid3: (Grid3+, etc.)

Contribute resources to persistent Grid

Maintain development Grid, test new software releases

Integrate software into the persistent Grid

Participate in LHC data challenges

Involvement of new sites

New institutions and experiments

New international partners (Brazil, Taiwan, Pakistan?, …)

Improvements in Grid middleware and services

Storage services

Integrating multiple VOs

Monitoring

Troubleshooting

Accounting


International Virtual Data Grid Laboratory

[Map: iVDGL sites, each marked Tier1, Tier2, or Other: SKC, LBL, Caltech, ISI, UCSD, UW Milwaukee, UW Madison, Fermilab, Iowa, Chicago, Michigan, Argonne, Indiana, Buffalo, PSU, Boston U, BNL, J. Hopkins, Hampton, Vanderbilt, Austin, Brownsville, UF, FIU; partners in the EU, Brazil, and Korea.]

Roles of iVDGL Institutions

U Florida: CMS (Tier2), Management
Caltech: CMS (Tier2), LIGO (Management)
UC San Diego: CMS (Tier2), CS
Indiana U: ATLAS (Tier2), iGOC (operations)
Boston U: ATLAS (Tier2)
Harvard: ATLAS (Management)
Wisconsin, Milwaukee: LIGO (Tier2)
Penn State: LIGO (Tier2)
Johns Hopkins: SDSS (Tier2), NVO
Chicago: CS, Coord./Management, ATLAS (Tier2)
Vanderbilt*: BTeV (Tier2, unfunded)
Southern California: CS
Wisconsin, Madison: CS
Texas, Austin: CS
Salish Kootenai: LIGO (Outreach, Tier3)
Hampton U: ATLAS (Outreach, Tier3)
Texas, Brownsville: LIGO (Outreach, Tier3)
Fermilab: CMS (Tier1), SDSS, NVO
Brookhaven: ATLAS (Tier1)
Argonne: ATLAS (Management), Coordination

“Virtual Data”: Derivation & Provenance

Most scientific data are not simple “measurements”

They are computationally corrected/reconstructed

They can be produced by numerical simulation

Science & eng. projects are more CPU and data intensive

Programs are significant community resources (transformations)

So are the executions of those programs (derivations)

Management of dataset dependencies critical!

Derivation: Instantiation of a potential data product

Provenance: Complete history of any existing data product

 Previously: Manual methods

 GriPhyN: Automated, robust tools

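A minimal sketch of this bookkeeping (my illustration in Python; GriPhyN's actual machinery is the VDL/Chimera toolchain, whose syntax is not reproduced here): record each derivation as an execution of a transformation over named inputs; provenance is then a recursive walk backward, and “what must be recomputed” is the walk forward:

    # Minimal virtual-data catalog sketch (illustrative only).
    CATALOG = {}  # output name -> (transformation, list of input names)

    def record(transformation, inputs, output):
        """Register how `output` is, or can be, derived."""
        CATALOG[output] = (transformation, inputs)

    def provenance(product, depth=0):
        """Complete history of an existing data product (walk backward)."""
        entry = CATALOG.get(product)
        label = product if entry is None else "%s  <-  %s" % (product, entry[0])
        print("  " * depth + label)
        if entry:
            for src in entry[1]:
                provenance(src, depth + 1)

    def downstream(product):
        """Everything that must be recomputed if `product` changes (walk forward)."""
        hits = set(out for out, (_, ins) in CATALOG.items() if product in ins)
        for h in list(hits):
            hits |= downstream(h)
        return hits

    # Hypothetical two-step pipeline: raw events -> calibrated -> selected.
    record("calibrate", ["raw_muons"], "calib_muons")
    record("select_WW", ["calib_muons"], "ww_candidates")
    provenance("ww_candidates")     # "exactly what corrections were applied?"
    print(downstream("raw_muons"))  # "which derived products need recomputing?"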

Virtual Data Motivations

“I’ve found some interesting data, but I need to know exactly what corrections were applied before I can trust it.”

“I’ve detected a muon calibration error and want to know which derived data products need to be recomputed.”

“I want to search a database for 3 muon SUSY events. If a program that does this analysis exists, I won’t have to write one from scratch.”

“I want to apply a forward jet analysis to 100M events. If the results already exist, I’ll save weeks of computation.”

[Diagram: a Data product is the product of / consumed by a Derivation, which is an execution of a Transformation.]

Virtual Data Example: LHC Analysis

[Derivation tree: from mass = 160, branches decay = bb, decay = ZZ, decay = WW; the WW branch refines to WW → leptons and WW → eν, with further selections (Pt > 20, other cuts) as leaves. A scientist adds a new derived data branch & continues the analysis.]
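Continuing the catalog sketch above (same hypothetical names): the scientist's new branch is just one more recorded derivation, and a catalog lookup before computing captures the “results already exist” reuse:

    # Add the new derived branch, then reuse rather than recompute.
    record("cut_pt20", ["ww_candidates"], "ww_enu_pt20")

    def materialize(product):
        """Run a derivation only if the product has no recorded recipe yet."""
        if product in CATALOG:
            print("%s: recorded derivation found; reuse it" % product)
        else:
            print("%s: no recorded derivation; compute and record it" % product)

    materialize("ww_enu_pt20")   # reuse: saves "weeks of computation"
    materialize("forward_jets")  # new analysis: must be derived and recorded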

Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN

[Chart: galaxy cluster size distribution in Sloan data; cluster counts (1 – 100,000, log scale) vs. number of galaxies per cluster (1 – 100).]

The LIGO International Grid

LIGO Grid: 6 US sites + 3 EU sites (Birmingham & Cardiff in the UK, AEI/Golm in Germany)

 LHO, LLO: observatory sites

 LSC - LIGO Scientific Collaboration


Grids: Enhancing Research & Learning

Fundamentally alters conduct of scientific research

“Lab-centric”: Activities center around large facility

“Team-centric”: Resources shared by distributed teams

“Knowledge-centric”: Knowledge generated/used by a community

Strengthens role of universities in research

Couples universities to data intensive science

Couples universities to national & international labs

Brings front-line research and resources to students

Exploits intellectual resources of formerly isolated schools

Opens new opportunities for minority and women researchers

Builds partnerships to drive advances in IT/science/eng

HEP → physics, astronomy, biology, CS, etc.

“Application” sciences ↔ computer science

Universities ↔ laboratories

Scientists ↔ students

Research community ↔ IT industry
