grid2003 - Physics at Louisiana Tech University

The Grid2003 Project:
An Application Laboratory for Science
D0SAR Workshop
Louisiana Tech University
April 7th, 2004
Jorge L. Rodriguez
University of Florida
Department of Physics
jorge@phys.ufl.edu
What is Grid2003/Grid3?
International Data Grid with dozens of sites
Serving applications across various disciplines
  HEP experiments (LHC, BTeV), bio-chemical, CS demonstrators…
Currently over 2000 CPUs available for use by over 100 users
A peak throughput of 1100 concurrent jobs with a completion efficiency of approximately 75%
Note: Grid2003 refers to the initial project from 8/2003 – 12/2003; Grid3 refers to the persistent grid infrastructure
Jorge L. Rodriguez: The Grid2003 Project
D0SAR Workshop Louisiana Tech
Grid3 Organization
Stakeholders:
US LHC Software and Computing Projects
  US ATLAS, US CMS
Grid projects (iVDGL, PPDG, GriPhyN)
  CS groups, VDT team, iGOC
GriPhyN experiments
  LIGO, SDSS as well as ATLAS and CMS
New collaborators
  Vanderbilt BTeV (Fermilab) Group
  Argonne computational biology group
  U Buffalo chemical structure
Contributors
Boston University
Caltech
Hampton University
Harvard University
Indiana University
Johns Hopkins University
Vanderbilt University
University of Oklahoma
University of Chicago
University of Florida
University of Michigan
University at Buffalo
Argonne National Laboratory
Brookhaven National Laboratory
Fermi National Accelerator Laboratory
Kyungpook National University
Lawrence Berkeley National Laboratory
University of California San Diego
University of New Mexico
University of Southern California-ISI
University of Texas, Arlington
University of Wisconsin-Madison
University of Wisconsin-Milwaukee
Contributors
* Team Leads
Argonne National Laboratory: Jerry Gieraltowski, Scott Gose, Natalia Maltsev, Ed May, Alex Rodriguez, Dinanath Sulakhe
Boston University: Jim Shank, Saul Youssef
Brookhaven National Laboratory: David Adams, Rich Baker, Wensheng Deng, Jason Smith, Dantong Yu
Caltech: Iosif Legrand, Suresh Singh, Conrad Steenberg, Yang Xia
Fermi National Accelerator Laboratory: Anzar Afaq, Eileen Berman, James Annis, Lothar Bauerdick, Michael Ernst, Ian Fisk, Lisa Giacchetti, Greg Graham, Anne Heavey, Joe Kaiser, Nickolai Kuropatkin, Ruth Pordes*, Vijay Sekhri, John Weigand, Yujun Wu
Hampton University: Keith Baker, Lawrence Sorrillo
Harvard University: John Huth
Indiana University: Matt Allen, Leigh Grundhoefer, John Hicks, Fred Luehring, Steve Peck, Rob Quick, Stephen Simms
Johns Hopkins University: George Fekete, Jan vandenBerg
Kyungpook National University/KISTI: Kihyeon Cho, Kihwan Kwon, Dongchul Son, Hyoungwoo Park
Lawrence Berkeley National Laboratory: Shane Canon, Jason Lee, Doug Olson, Iowa Sakrejda, Brian Tierney
University at Buffalo: Mark Green, Russ Miller
University of California San Diego: James Letts, Terrence Martin
University of Chicago: David Bury, Catalin Dumitrescu, Daniel Engh, Ian Foster, Robert Gardner*, Marco Mambelli, Yuri Smirnov, Jens Voeckler, Mike Wilde, Yong Zhao, Xin Zhao
University of Florida: Paul Avery, Richard Cavanaugh, Bockjoo Kim, Craig Prescott, Jorge L. Rodriguez, Andrew Zahn
University of Michigan: Shawn McKee
University of New Mexico: Christopher T. Jordan, James E. Prewett, Timothy L. Thomas
University of Oklahoma: Horst Severini
University of Southern California: Ben Clifford, Ewa Deelman, Larry Flon, Carl Kesselman, Gaurang Mehta, Nosa Olomu, Karan Vahi
University of Texas, Arlington: Kaushik De, Patrick McGuigan, Mark Sosebee
University of Wisconsin-Madison: Dan Bradley, Peter Couvares, Alan De Smet, Carey Kireyev, Erik Paulson, Alain Roy
University of Wisconsin-Milwaukee: Scott Koranda, Brian Moe
Vanderbilt University: Bobby Brown, Paul Sheldon
Grid3 Services
Software Packaging Service (Pacman)
  Virtual Data Toolkit (VDT)
  Additional middleware configuration packages
Monitoring Services
  MonALISA
  Ganglia
  Metrics Data Viewer
User Authentication Service
  Virtual Organization Management Service (VOMS)
Grid3 Operations
  The international Grid Operations Center (iGOC)
Grid3 Packaging
Grid Packaging Service
Packaging is the key to success!
  Automation in software installation greatly improves reliability of software deployments
  Pacman package manager is used in Grid3
Complete installation and site configuration is simplified to a single command:
  % pacman -get iVDGL:Grid3
In reality it takes a little more work. However…
ref. pacman --- http://physics.bu.edu/~youssef/pacman/
The VDT packages (vers 1.1.12)
Globus Alliance
  Grid Security Infrastructure (GSI)
  Job submission (GRAM)
  Information service (MDS)
  Data transfer (GridFTP)
  Replica Location (RLS)
Condor Group
  Condor/Condor-G
  DAGMan
  Fault Tolerant Shell
  ClassAds
EDG & LCG
  Make Gridmap
  Cert. Revocation List Updater
  Glue Schema/Info provider
ISI & UC
  Chimera & related tools
  Pegasus
NCSA
  MyProxy
  GSI OpenSSH
LBL
  PyGlobus
  Netlogger
Caltech
  MonALISA
VDT
  VDT System Profiler
  Configuration software
Others
  KX509 (U. Mich.)
Grid3 Monitoring
Monitoring Services
Ganglia - http://gocmon.uits.iupui.edu/ganglia-webfrontend
  Open source tool to collect cluster monitoring information such as CPU and network load, memory and disk usage
MonALISA - http://gocmon.uits.iupui.edu:8080/index.html
  Monitoring tool to support resource discovery, access to information and gateway to other information gathering systems
ACDC Job Monitoring System - http://acdc.ccr.buffalo.edu/statistics/acdc/fullsizeindexqueue.php
  Application uses Globus GRAM to query job managers and collect information about jobs. This information is stored in a DB and available for aggregated queries and browsing.
Metrics Data Viewer (MDViewer) - http://grid.uchicago.edu/metrics/
  Application to display and analyze information collected by the different monitoring tools; queries Metrics DBs at iGOC.
Globus MDS
  Information and Index Service for resource discovery, selection and optimization. GLUE schema with Grid3 extension
Monitoring Infrastructure
Grid3 Authentication
Grid3 Authentication
[Diagram: each site client (site a, site b, … site n) runs edg-mkgridmap, which collects user DNs from the VOMS servers and writes the DN mappings to the site's gridmap-file]
VOMS servers and the VOs they serve:
  iVDGL VOMS server: BTeV, LSC, iVDGL
  FNAL VOMS server: USCMS, SDSS
  BNL VOMS server: USATLAS
gridmap-file: mapping of user’s grid credentials (DN) to local site group account
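The mapping step in the authentication slide can be illustrated with a small sketch. This is not the edg-mkgridmap implementation; it only shows the idea of turning per-VO lists of user DNs into gridmap-file lines of the form `"DN" account`. The function name, the example DNs, the local account names, and the rule that the first VO listed wins for a DN appearing in more than one VO are all assumptions made for illustration.

```python
def build_gridmap(vo_members, vo_accounts):
    """Return gridmap-file lines mapping each user DN to a local group account.

    vo_members:  dict of VO name -> list of user DNs (as a VOMS server would supply)
    vo_accounts: dict of VO name -> local site group account (a site-local choice)
    """
    lines = []
    seen = set()
    for vo, dns in vo_members.items():
        account = vo_accounts[vo]
        for dn in dns:
            if dn in seen:
                # Assumption: a DN listed in several VOs keeps its first mapping.
                continue
            seen.add(dn)
            lines.append(f'"{dn}" {account}')
    return lines


# Hypothetical DNs and account names, for illustration only.
EXAMPLE_MEMBERS = {
    "uscms": ["/DC=org/DC=example/CN=Alice Example"],
    "usatlas": ["/DC=org/DC=example/CN=Bob Example",
                "/DC=org/DC=example/CN=Alice Example"],
}
EXAMPLE_ACCOUNTS = {"uscms": "uscms01", "usatlas": "usatlas1"}

if __name__ == "__main__":
    print("\n".join(build_gridmap(EXAMPLE_MEMBERS, EXAMPLE_ACCOUNTS)))
```

Mapping every member of a VO to one group account keeps the site's local account administration independent of the number of grid users.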
Grid3 Operations
Grid3 Operations: (iGOC)
http://www.ivdgl.org/grid2003/catalog
Grid3 Operations
Support and Policy
  Investigation and resolution of grid middleware problems at the level of 16-20 contacts per week
  With other iGOC personnel, develop Service Level Agreements for iVDGL Grid service systems and iGOC support service
  Membership Charter completed, which defines the process to add new VOs, sites and applications to the Grid Laboratory
  Support Matrix defining Grid3 and VO service providers and contact information
Grid2003 Applications
Project Application Overview
7 Scientific applications and 3 CS demonstrators
  All iVDGL experiments participated in the Grid2003 project
  A third HEP and two Bio-Chemical experiments also participated
Over 100 users authorized to run on Grid3
  Application execution performed by dedicated individuals
  Typically 1, 2 or 3 users ran the applications from a particular experiment
Participation from all Grid3 sites
  Sites categorized according to policies and resources
  Applications ran concurrently on most of the sites
  Large sites with generous local use policies were more popular
Scientific Applications
High Energy Physics Simulation and Analysis
USCMS: MOP, GEANT based full MC simulation and reconstruction
  Work flow and batch job scripts generated by McRunJob
  Jobs generated at MOP master (outside of Grid3), which submits them to Grid3 sites via Condor-G
  Data products are archived at Fermilab: SRM/dCache
USATLAS: GCE, GEANT based full MC simulation and reconstruction
  Workflow is generated by Chimera VDS, Pegasus grid scheduler and Globus MDS for resource discovery
  Data products archived at BNL: Magda and Globus RLS are employed
USATLAS: DIAL, Distributed analysis application
  Dataset catalogs built, n-tuple analysis and histogramming (data generated on Grid3)
BTeV: Full MC simulation
  Also utilizes the Chimera workflow generator and Condor-G (VDT)
Scientific Applications
Astrophysics and Astronomical
  LIGO/LSC: blind search for continuous gravitational waves
  SDSS: maxBcg, cluster finding package
Bio-Chemical
  SnB: Bio-molecular program, analyses on X-ray diffraction to find molecular structures
  GADU/Gnare: Genome analysis, compares protein sequences
Computer Science
  Evaluation of adaptive data placement and scheduling algorithms
CS Demonstrator Applications
Exerciser
  Periodically runs low priority jobs at each site to test operational status
NetLogger-grid2003
  Monitored data transfers between Grid3 sites via NetLogger instrumented pyglobus-url-copy
GridFTP Demo
  Data mover application using GridFTP designed to meet the 2TB/day metric
Running on Grid3

With information provided by the Grid3 information system, the user:
1. Composes a list of target sites
  Resources available
  Local site policies
2. Finds where to install the application and where to write data
  Use of Grid3 Information Index Service (~MDS)
  Provides pathname for $APP, $DATA, $TMP and $WNTMP
3. Sends and remotely installs the application from a local site
4. Submits job(s) through Globus GRAM
User never needs to interact with local site administrators other than through the Grid3 services!
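Steps 1 and 2 above can be sketched in a few lines. This is a minimal illustration, not the Grid3 information service API: the `SiteInfo` record, the function names, the example sites and the example paths are all hypothetical stand-ins for what a query to the information index would return.

```python
from dataclasses import dataclass


@dataclass
class SiteInfo:
    """Hypothetical record of what the information service publishes per site."""
    name: str
    free_cpus: int
    allowed_vos: frozenset   # stand-in for the local site policy
    paths: dict              # published locations, keyed by APP, DATA, TMP, WNTMP


def select_target_sites(sites, vo, min_free_cpus=1):
    """Step 1: keep sites whose policy admits the VO and that have free resources."""
    return [s for s in sites
            if vo in s.allowed_vos and s.free_cpus >= min_free_cpus]


def install_and_data_dirs(site, app_name):
    """Step 2: derive install and output locations from the published paths."""
    return {
        "install_dir": f"{site.paths['APP']}/{app_name}",
        "data_dir": f"{site.paths['DATA']}/{app_name}",
        "scratch_dir": site.paths["WNTMP"],
    }


# Illustrative sites only; names, CPU counts and paths are made up.
EXAMPLE_PATHS = {"APP": "/grid3/app", "DATA": "/grid3/data",
                 "TMP": "/grid3/tmp", "WNTMP": "/tmp"}
EXAMPLE_SITES = [
    SiteInfo("site_a", 64, frozenset({"uscms", "usatlas"}), EXAMPLE_PATHS),
    SiteInfo("site_b", 4, frozenset({"uscms"}), EXAMPLE_PATHS),
    SiteInfo("site_c", 100, frozenset({"sdss"}), EXAMPLE_PATHS),
]

if __name__ == "__main__":
    for site in select_target_sites(EXAMPLE_SITES, "uscms", min_free_cpus=10):
        print(site.name, install_and_data_dirs(site, "myapp"))
```

Steps 3 and 4 (staging the application and submitting through GRAM) are left out because they depend on the site's gatekeeper and the submission tool in use.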
Grid3 Metrics
Grid3 Metrics Collection
Grid3 monitoring applications (information consumers)
  MonALISA
  Metrics Data Viewer
Queries to persistent storage DB (on the gocmon server)
  MonALISA plots
  MDViewer plots
Grid3 Metrics Collection
[Plots: MDViewer and MonALISA]
Metrics Summary Table
Metric                                            Target      Grid2003 (“SC2003”)
Number of CPUs                                    400         2762 (27 sites)
Number of users                                   > 10        102 (16)
Number of applications                            > 4         10
Number of sites running concurrent applications   > 10        17
Peak number of concurrent jobs                    1000        1100
Data transfer per day                             > 2-3 TB    4.4 TB (11.12.03)
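To put the data-transfer figures in perspective, a daily volume can be converted into the sustained rate it implies. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which the slides do not specify; under that assumption 2 TB/day corresponds to roughly 23 MB/s sustained, and the 4.4 TB day to roughly 51 MB/s. The ratio table uses the targets and achieved values from the summary above, treating the "> N" targets as the bare number N.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12   # assumption: decimal terabytes
BYTES_PER_MB = 10**6    # assumption: decimal megabytes


def sustained_rate_mb_per_s(tb_per_day):
    """Average transfer rate (MB/s) needed to move tb_per_day in 24 hours."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB


# (target, achieved) pairs taken from the metrics summary table.
METRICS = {
    "Number of CPUs": (400, 2762),
    "Number of users": (10, 102),
    "Number of applications": (4, 10),
    "Sites running concurrent applications": (10, 17),
    "Peak number of concurrent jobs": (1000, 1100),
}


def achieved_over_target(metrics):
    """Ratio of achieved value to target for each metric."""
    return {name: achieved / target
            for name, (target, achieved) in metrics.items()}


if __name__ == "__main__":
    for volume in (2.0, 4.4):
        rate = sustained_rate_mb_per_s(volume)
        print(f"{volume} TB/day needs about {rate:.1f} MB/s sustained")
    for name, ratio in achieved_over_target(METRICS).items():
        print(f"{name}: {ratio:.1f}x target")
```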
Grid3 Status Summary
Current hardware resources
Total of 2693 CPUs
  Maximum CPU count
  Off project contribution > 60%
Total of 25 sites
  25 administrative domains with local policies in effect
  All across US and Korea
Running jobs
  Peak number of jobs 1100
  During SC2003 various scientific applications were running simultaneously across various Grid3 sites
USCMS and Grid3
[Plot: canonical USCMS resources vs. total resources with Grid3]
So far have completed about 14.2 million events
  Significant amount of resources provided over that available to USCMS alone
  About 1.4 times the event yield over dedicated USCMS resources
USCMS alone has utilized more than 147 CPU years on Grid3 resources!
  Another 20 CPU years by other Grid3 applications
USCMS and Grid3
[MonALISA plots: history over a 3-month period, Grid3 sites only and USCMS sites only]
Outlook and Conclusions
Grid3 Near Term Plans - What’s running on the grid now?
USCMS has just about completed its Pre-Challenge Production PCP04 but plans to continue its production runs
USCMS “MOP Regional Center” was asked to simulate 14.3 million JetMet events
  Higgs analyses signal and background events, 13 channels in all, about 75% of them background
  Particularly challenging run: a typical job runs for 5 days! Some run as long as 4 weeks!!
  Work done in preparation for CMS’ Data Challenge 04
DC04 in a nutshell
  Reconstruction at CERN (T0 center) of PCP04 “raw” data @ 25Hz
  Stream and catalog at Tier1 centers (FNAL …)
  Physics analysis in real time @ Tier1 and Tier2 sites
Grid3 Near Term Plans cont.
USATLAS, SDSS and LIGO
  USATLAS is in a development mode preparing for their DC2 challenge, which begins April 1st, and is currently using Grid3 to run tests.
  LIGO and SDSS are modifying their workflow generators to enhance reliability and improve productivity when running on the grid. Once work is completed they also intend to utilize the resources.
Bio-Chemical and Computer Science research
  CS research is ongoing, with of order 500 jobs submitted since SC2003. The work focuses on data management and scheduling. Many more of these experiments are planned.
Grid3 Near Term Plans cont.
New sites will be joining under existing VOs
New HEP experiment VO: CDF has begun work to port their software environment to Grid3
New CS applications:
  Virtual-organization-aware resource allocation
  The Sphinx grid scheduler
  Scalability and robustness of the VDT scheduling algorithms
  …
Lots more to do on evolving the Grid3 infrastructure and Operations model…
U.S. Open Science Grid
Goal: An integrated U.S. Grid infrastructure
  Grid computing infrastructure to support US scientific efforts
  CPU & storage resources from laboratories and universities
  DOE and NSF partnership
  Internet2, ESNet, state, international optical networks
Getting there: OSG-1 (Grid3+), OSG-2, …
  Series of releases with increasing functionality & scale
Initial meetings
  Sep. 17 @ NSF: Educators, scientists, etc.
  Jan. 12 @ Fermilab: Public discussion, planning sessions
Next steps
  White paper to be expanded into roadmap
  Presentation to funding agencies (May/June?)
Conclusion
A project to deploy a reasonably large distributed international data grid, consisting of tens of sites serving over one hundred users running applications from a variety of scientific disciplines, is successful.
Useful work was done on Grid3! 14 million events and counting
It is still being used!