Open Science Grid: Linking Universities and Laboratories in National CyberInfrastructure
Paul Avery, University of Florida, avery@phys.ufl.edu
SURA Infrastructure Workshop, Austin, TX, December 7, 2005

Bottom-up Collaboration: "Trillium"
• Trillium = PPDG + GriPhyN + iVDGL
    - PPDG: $12M (DOE), 1999-2006
    - GriPhyN: $12M (NSF), 2000-2005
    - iVDGL: $14M (NSF), 2001-2006
    - ~150 people, with large overlaps between projects
    - Universities, labs, foreign partners
• Strong driver for funding agency collaborations
    - Inter-agency: NSF - DOE
    - Intra-agency: Directorate - Directorate, Division - Division
• Coordinated internally to meet broad goals
    - CS research, developing/supporting the Virtual Data Toolkit (VDT)
    - Grid deployment, using VDT-based middleware
    - Unified entity when collaborating internationally

Common Middleware: Virtual Data Toolkit
[Diagram: VDT build-and-test pipeline. Sources (CVS) feed the NMI Build & Test facility, a Condor pool spanning 22+ operating systems; binaries are built and tested, then packaged (with patching) into a Pacman cache, RPMs, and GPT source bundles. Many contributors.]
• A unique laboratory for testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!

VDT Growth Over 3 Years (1.3.8 now)    www.griphyn.org/vdt/
[Chart: number of VDT components (0-35) vs. time, Jan 2002 through Apr 2005, for the VDT 1.1.x, 1.2.x, and 1.3.x series. Milestones: VDT 1.0 (Globus 2.0b, Condor 6.3.1); VDT 1.1.7 (switch to Globus 2.2); VDT 1.1.8 (first real use by LCG); VDT 1.1.11 (Grid3).]

Components of VDT 1.3.5
• Globus 3.2.1, Condor 6.7.6, RLS 3.0, ClassAds 0.9.7, Replica 2.2.4, DOE/EDG CA certs, ftsh 2.0.5, EDG mkgridmap, EDG CRL Update, GLUE Schema 1.0, VDS 1.3.5b, Java, Netlogger 3.2.4, Gatekeeper-Authz, MyProxy 1.11, KX509, System Profiler, GSI OpenSSH 3.4, MonALISA 1.2.32, PyGlobus 1.0.6, MySQL, UberFTP 1.11, DRM 1.2.6a, VOMS 1.4.0, VOMS Admin 0.7.5, Tomcat, PRIMA 0.2, Certificate Scripts, Apache, jClarens 0.5.3, new GridFTP server, GUMS 1.0.1

VDT Collaborative Relationships
[Diagram: Computer Science research supplies techniques & software to the Virtual Data Toolkit, which transfers technology to science, engineering, and education communities; requirements, prototyping & experiments, and deployment feedback flow back. Other linkages: work force, CS researchers, industry. Partner projects: U.S. Grids (Globus, Condor, NMI, TeraGrid, OSG); international (EGEE, WLCG, Asia, South America); outreach (QuarkNet, CHEPREO, Digital Divide).]

Major Science Driver: Large Hadron Collider (LHC) @ CERN
• 27 km tunnel in Switzerland & France; experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
• Search for origin of mass, new fundamental forces, supersymmetry, other new particles
• 2007 - ?

LHC: Petascale Global Science
• Complexity: millions of individual detector channels
• Scale: PetaOps (CPU), 100s of Petabytes (data)
• Distribution: global distribution of people & resources
• BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries
• CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries

LHC Global Data Grid (2007+)
• 5000 physicists, 60 countries
• 10s of Petabytes/yr by 2008
• 1000 Petabytes in < 10 yrs?
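A quick back-of-the-envelope check (not from the original slides; a minimal sketch assuming a nominal 10-30 PB/yr within the "10s of Petabytes/yr" quoted above, with decimal petabytes) shows why the tier links in the next diagram need hundreds of MB/s sustained:

    # Illustrative sketch: convert an assumed yearly LHC data volume into a
    # sustained transfer rate, for comparison with the tier bandwidths below.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    def sustained_rate_mb_per_s(petabytes_per_year):
        """Average rate (MB/s) needed to move the given volume in one year."""
        bytes_per_year = petabytes_per_year * 1e15      # decimal petabytes
        return bytes_per_year / SECONDS_PER_YEAR / 1e6  # bytes/s -> MB/s

    if __name__ == "__main__":
        for pb in (10, 30):  # nominal "10s of PB/yr" assumptions
            print(f"{pb} PB/yr -> ~{sustained_rate_mb_per_s(pb):.0f} MB/s sustained")
        # ~317 MB/s and ~951 MB/s: comparable to the 150-1500 MB/s
        # Online System -> Tier 0 link in the CMS tier diagram below.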
[Diagram: CMS tiered data flow. Online System → Tier 0 (CERN Computer Center) at 150-1500 MB/s; Tier 0 → Tier 1 centers (Korea, Russia, UK, USA) at 10-40 Gb/s; Tier 1 (USA) → Tier 2 centers (U Florida, Caltech, UCSD) at >10 Gb/s; Tier 2 → Tier 3 physics caches (FIU, Iowa, Maryland) at 2.5-10 Gb/s; Tier 4: PCs.]

Grid3 and Open Science Grid

Grid3: A National Grid Infrastructure    www.ivdgl.org/grid3
• October 2003 - July 2005
• 32 sites, 3,500 CPUs: universities + 4 national labs
• Sites in US, Korea, Brazil, Taiwan
• Applications in HEP, LIGO, SDSS, genomics, fMRI, CS
[Map of Grid3 sites, including Brazil]

Grid3 Lessons Learned
• How to operate a Grid as a facility
    - Tools, services, error recovery, procedures, docs, organization
    - Delegation of responsibilities (project, VO, service, site, …)
    - Crucial role of the Grid Operations Center (GOC)
• How to support people
    - People-to-people relations: face-to-face meetings, phone cons, 1-1 interactions, mail lists, etc.
• How to test and validate Grid tools and applications
    - Vital role of testbeds
• How to scale algorithms, software, process
    - Some successes, but "interesting" failure modes still occur
• How to apply distributed cyberinfrastructure
    - Successful production runs for several applications

Open Science Grid    http://www.opensciencegrid.org

Open Science Grid: July 20, 2005
• Production Grid: 50+ sites, 15,000 CPUs "present" (available, but not all at one time)
• Sites in US, Korea, Brazil, Taiwan
• Integration Grid: 10-12 sites
[Map of OSG sites, including Taiwan, S. Korea, and São Paulo]

OSG Operations Snapshot (November 7: 30 days)
[Screenshot of operations monitoring]

OSG Participating Disciplines
• Computer Science (Condor, Globus, SRM, SRB): test and validate innovations, new services & technologies
• Physics (LIGO, Nuclear Physics, Tevatron, LHC): global Grid computing & data access
• Astrophysics (Sloan Digital Sky Survey): CoAdd of multiply-scanned objects, spectral fitting analysis
• Bioinformatics: Argonne GADU project (BLAST, BLOCKS, gene sequences, etc.); Dartmouth Psychological & Brain Sciences (functional MRI)
• University campus resources, portals, apps: CCR (U Buffalo), GLOW (U Wisconsin), TACC (Texas Advanced Computing Center), MGRID (U Michigan), UFGRID (U Florida), Crimson Grid (Harvard), FermiGrid (FermiLab Grid)

OSG Grid Partners
• TeraGrid
    - "DAC2005": run LHC apps on TeraGrid resources
    - TG Science Portals for other applications
    - Discussions on joint activities: security, accounting, operations, portals
• EGEE
    - Joint operations workshops, defining mechanisms to exchange support tickets
    - Joint security working group
    - US middleware federation contributions to core middleware gLite
• Worldwide LHC Computing Grid
    - OSG contributes to LHC global data handling and analysis systems
• Other partners
    - SURA, GRASE, LONI, TACC
    - Representatives of VOs provide portals and interfaces to their user groups

Example of Partnership: WLCG and EGEE
[Diagram of the WLCG/EGEE partnership]

OSG Technical Groups & Activities
• Technical Groups address and coordinate technical areas
    - Propose and carry out activities related to their given areas
    - Liaise & collaborate with other peer projects (U.S. & international)
    - Participate in relevant standards organizations
    - Chairs participate in Blueprint, Integration and Deployment activities
• Activities are well-defined, scoped tasks contributing to OSG
    - Each Activity has deliverables and a plan
    - … is self-organized and operated
    - … is overseen & sponsored by one or more Technical Groups
• TGs and Activities are where the real work gets done

OSG Technical Groups (deprecated!)
• Governance: charter, organization, by-laws, agreements, formal processes
• Policy: VO & site policy, authorization, priorities, privilege & access rights
• Security: common security principles, security infrastructure
• Monitoring and Information Services: resource monitoring, information services, auditing, troubleshooting
• Storage: storage services at remote sites, interfaces, interoperability
• Support Centers: infrastructure and services for user support, helpdesk, trouble tickets
• Education / Outreach: training, interface with various E/O projects
• Networks (new): including interfacing with various networking projects

OSG Activities
• Blueprint: defining principles and best practices for OSG
• Deployment: deployment of resources & services
• Provisioning: connected to deployment
• Incidence response: plans and procedures for responding to security incidents
• Integration: testing, validating & integrating new services and technologies
• Data Resource Management (DRM): deployment of specific Storage Resource Management technology
• Documentation: organizing the documentation infrastructure
• Accounting: accounting and auditing use of OSG resources
• Interoperability: primarily interoperability between Grids
• Operations: operating Grid-wide services

OSG Integration Testbed: Testing & Validating Middleware
[Map of integration testbed sites, including Taiwan, Brazil, and Korea]

Networks

Evolving Science Requirements for Networks (DOE High Performance Network Workshop)
End-to-end throughput today, in 5 years, and in 5-10 years:
• High Energy Physics: today 0.5 Gb/s; 5 years 100 Gb/s; 5-10 years 1000 Gb/s; remarks: high bulk throughput
• Climate (data & computation): today 0.5 Gb/s; 5 years 160-200 Gb/s; 5-10 years N x 1000 Gb/s; remarks: high bulk throughput
• SNS NanoScience: today not yet started; 5 years 1 Gb/s; 5-10 years 1000 Gb/s + QoS for control channel; remarks: remote control and time-critical throughput
• Fusion Energy: today 0.066 Gb/s (500 MB/s burst); 5 years 0.2 Gb/s (500 MB / 20 sec. burst); 5-10 years N x 1000 Gb/s; remarks: time-critical throughput
• Astrophysics: today 0.013 Gb/s (1 TB/week); 5 years N*N multicast; 5-10 years 1000 Gb/s; remarks: computational steering and collaborations
• Genomics (data & computation): today 0.091 Gb/s (1 TB/day); 5 years 100s of users; 5-10 years 1000 Gb/s + QoS for control channel; remarks: high throughput and steering
(A short unit-conversion sketch reproducing the Gb/s figures appears after the Education/Outreach slide below.)
See http://www.doecollaboratory.org/meetings/hpnpw

UltraLight: Integrating Advanced Networking in Applications    http://www.ultralight.org
• 10 Gb/s+ network
• Caltech, UF, FIU, UM, MIT
• SLAC, FNAL
• Int'l partners
• Level(3), Cisco, NLR

Education, Training, Communications

iVDGL, GriPhyN Education/Outreach
• Basics
    - $200K/yr
    - Led by UT Brownsville
    - Workshops, portals, tutorials
    - Partnerships with QuarkNet, CHEPREO, LIGO E/O, …
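The unit-conversion sketch promised under the network-requirements table above: a minimal check (assuming decimal units, 1 TB = 1e12 bytes and 1 MB = 1e6 bytes) that the table's Gb/s entries follow from the raw rates quoted in parentheses.

    # Illustrative unit conversions for the DOE network-requirements table above.
    def gbps(num_bytes, seconds):
        """Average throughput in Gb/s for num_bytes moved over seconds."""
        return num_bytes * 8 / seconds / 1e9

    print(f"Genomics,     1 TB/day      -> {gbps(1e12, 86400):.3f} Gb/s")      # ~0.093 (table quotes 0.091)
    print(f"Astrophysics, 1 TB/week     -> {gbps(1e12, 7 * 86400):.3f} Gb/s")  # ~0.013
    print(f"Fusion burst, 500 MB / 20 s -> {gbps(500e6, 20):.1f} Gb/s")        # 0.2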
Grid Training Activities
• June 2004: First US Grid Tutorial (South Padre Island, TX)
    - 36 students, diverse origins and types
• July 2005: Second Grid Tutorial (South Padre Island, TX)
    - 42 students, simpler physical setup (laptops)
• Reaching a wider audience
    - Lectures, exercises, video, on web
    - Students, postdocs, scientists
• Coordination of training activities
    - "Grid Cookbook" (Trauner & Yafchak)
    - More tutorials, 3-4/year
    - CHEPREO tutorial in 2006?

QuarkNet/GriPhyN e-Lab Project
http://quarknet.uchicago.edu/elab/cosmic/home.jsp

CHEPREO: Center for High Energy Physics Research and Educational Outreach (Florida International University)
• Physics Learning Center
• CMS research
• iVDGL Grid activities
• AMPATH network (S. America)
• Funded September 2003: $4M initially (3 years); MPS, CISE, EHR, INT

Grids and the Digital Divide
• Background
    - World Summit on the Information Society
    - HEP Standing Committee on Interregional Connectivity (SCIC)
• Themes
    - Global collaborations, Grids and addressing the Digital Divide
    - Focus on poorly connected regions: Brazil (2004), Korea (2005)

Science Grid Communications
• Broad set of activities (Katie Yurkewicz)
    - News releases, PR, etc.
    - Science Grid This Week
    - OSG Newsletter
    - Not restricted to OSG
• www.interactions.org/sgtw

Grid Timeline
[Timeline, 2000-2007: GriPhyN ($12M), PPDG ($9.5M), iVDGL ($14M), VDT 1.0, first US-LHC Grid testbeds, LIGO Grid, Grid Communications, CHEPREO ($4M), Grid3 operations, UltraLight ($2M), DISUN ($10M), Grid Summer Schools, Digital Divide Workshops, OSG operations, start of LHC (2007).]

Future of OSG CyberInfrastructure
• OSG is a unique national infrastructure for science
    - Large CPU, storage and network capability crucial for science
    - Supporting advanced middleware
    - Long-term support of the Virtual Data Toolkit
    - New disciplines & international collaborations
• OSG currently supported by a "patchwork" of projects
    - Collaborating projects, separately funded
• Developing workplan for long-term support
    - Maturing, hardening facility
    - Extending facility to lower barriers to participation
    - Oct. 27 presentation to DOE and NSF

OSG Consortium Meeting: Jan 23-25, University of Florida (Gainesville)
• About 100-120 people expected
• Funding agency invitees
• Schedule
    - Monday morning: applications plenary (rapporteurs)
    - Monday afternoon: partner Grid projects plenary
    - Tuesday morning: parallel sessions
    - Tuesday afternoon: plenary
    - Wednesday morning: parallel sessions
    - Wednesday afternoon: plenary
    - Thursday: OSG Council meeting

Disaster Planning / Emergency Response

Grids and Disaster Planning / Emergency Response
• Inspired by recent events
    - Dec. 2004 tsunami in Indonesia
    - Aug. 2005 Katrina hurricane and subsequent flooding
    - (Quite different time scales!)
• Connection of DP/ER to Grids
    - Resources to simulate detailed physical & human consequences of disasters
    - Priority pooling of resources for a societal good
    - In principle, a resilient distributed resource
• Ensemble approach well suited to Grid/cluster computing (see the sketch at the end of this document)
    - E.g., given a storm's parameters & errors, bracket likely outcomes
    - Huge number of jobs required
    - Embarrassingly parallel

DP/ER Scenarios
• Simulating physical scenarios
    - Hurricanes, storm surges, floods, forest fires
    - Pollutant dispersal: chemical, oil, biological and nuclear spills
    - Disease epidemics
    - Earthquakes, tsunamis
    - Nuclear attacks
    - Loss of network nexus points (deliberate or side effect)
    - Astronomical impacts
• Simulating human responses to these situations
    - Roadways, evacuations, availability of resources
    - Detailed models (geography, transportation, cities, institutions)
    - Coupling human response models to specific physical scenarios
• Other possibilities
    - "Evacuation" of important data to safe storage

DP/ER and Grids: Some Implications
• DP/ER scenarios are not equally amenable to a Grid approach
    - E.g., tsunami vs hurricane-induced flooding
    - Specialized Grids can be envisioned for very short response times
    - Other "longer term" scenarios
    - But all can be simulated "offline" by researchers
• ER is an extreme example of priority computing
    - Priority use of IT resources is common (conferences, etc.)
    - Is ER priority computing different in principle?
• Other implications
    - Requires long-term engagement with DP/ER research communities (atmospheric, ocean, coastal ocean, social/behavioral, economic)
    - Specific communities with specific applications to execute
    - Digital Divide: resources to solve problems of interest to the 3rd World
    - Forcing function for Grid standards?
    - Legal liabilities?

Grid Project References
• Open Science Grid: www.opensciencegrid.org
• Grid3: www.ivdgl.org/grid3
• Virtual Data Toolkit: www.griphyn.org/vdt
• GriPhyN: www.griphyn.org
• iVDGL: www.ivdgl.org
• UltraLight: www.ultralight.org
• Globus: www.globus.org
• Condor: www.cs.wisc.edu/condor
• WLCG: www.cern.ch/lcg
• EGEE: www.eu-egee.org
• PPDG: www.ppdg.net
• CHEPREO: www.chepreo.org

Extra Slides

Grid3 Use by VOs Over 13 Months
[Chart: Grid3 usage by virtual organization over 13 months]

CMS: "Compact" Muon Solenoid
[Photo of the CMS detector, annotated to point out the "inconsequential humans" for scale]

LHC: Beyond Moore's Law
[Chart: estimated CPU capacity at CERN, 1998-2010, in K SI95 (one 2 GHz Intel CPU = 0.1K SI95), scale 0-6,000: LHC CPU requirements rise far above the Moore's Law (2000) extrapolation.]

Grids and Globally Distributed Teams
• Non-hierarchical: chaotic analyses + productions
• Superimpose significant random data flows

Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN
[Plot: galaxy cluster size distribution from Sloan data; number of clusters (1 to 100,000, log scale) vs. number of galaxies (1 to 100, log scale)]

The LIGO Scientific Collaboration (LSC) and the LIGO Grid
• LIGO Grid: 6 US sites + 3 EU sites (Cardiff/UK, AEI/Germany)
[Map of LIGO Grid sites: Birmingham, Cardiff, AEI/Golm]
• LHO, LLO: LIGO observatory sites
• LSC: LIGO Scientific Collaboration
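The ensemble sketch referenced in the Disaster Planning / Emergency Response slides: a minimal, purely hypothetical illustration of an embarrassingly parallel parameter sweep. The storm parameters, value ranges, executable name, and output file are invented for illustration and do not come from the talk.

    # Hypothetical sketch: perturb a storm's parameters within assumed errors
    # and emit one independent job description per ensemble member.
    import itertools, json

    # Nominal parameters and spreads (invented for illustration).
    landfall_lon  = [-90.0 + d for d in (-0.5, 0.0, 0.5)]   # degrees
    max_wind      = [120 + d for d in (-15, 0, 15)]         # knots
    forward_speed = [6 + d for d in (-2, 0, 2)]             # m/s

    jobs = []
    for i, (lon, wind, speed) in enumerate(
            itertools.product(landfall_lon, max_wind, forward_speed)):
        jobs.append({
            "job_id": i,
            "params": {"landfall_lon": lon, "max_wind": wind, "forward_speed": speed},
            "executable": "run_surge_model",   # hypothetical simulation binary
        })

    # Each member is independent, so the sweep maps trivially onto Grid sites.
    with open("ensemble_jobs.json", "w") as f:
        json.dump(jobs, f, indent=2)
    print(f"{len(jobs)} independent ensemble members generated")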