High Energy & Nuclear Physics Experiments and Advanced Cyberinfrastructure
www.opensciencegrid.org
Internet2 Meeting, San Diego, CA, October 11, 2007
Paul Avery, University of Florida, avery@phys.ufl.edu

Context: Open Science Grid
• Consortium of many organizations (multiple disciplines)
• Production grid cyberinfrastructure
• 75+ sites, 30,000+ CPUs: US, UK, Brazil, Taiwan

OSG Science Drivers
• Experiments at the Large Hadron Collider: new fundamental particles and forces; 100s of petabytes; 2008 – ?
• High Energy & Nuclear Physics experiments: top quark, nuclear matter at extreme density; ~10 petabytes; 1997 – present
• LIGO: search for gravitational waves; ~few petabytes; 2002 – present
• Future grid resources: massive CPU (PetaOps); large distributed datasets (>100 PB); global communities (1000s); international optical networks
(Timeline 2001–2009 showing data growth and community growth)

OSG History in Context
• Primary drivers: LHC and LIGO
(Timeline 1999–2009: LIGO preparation and operation; LHC construction, preparation, commissioning, and operations; PPDG (DOE), GriPhyN (NSF), and iVDGL (NSF) combining as Trillium; Grid3; OSG (DOE+NSF); in parallel, the European grid + Worldwide LHC Computing Grid and campus/regional grids)

LHC Experiments at CERN
• 27 km tunnel in Switzerland & France
• Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
• Search for: origin of mass, new fundamental forces, supersymmetry, other new particles

Collisions at LHC (2008?)
• Proton–proton collisions
• Bunches/beam: 2835
• Protons/bunch: 10^11
• Beam energy: 7 TeV x 7 TeV
• Luminosity: 10^34 cm^-2 s^-1
• Bunch crossing every 25 nsec (~20 collisions/crossing); proton → parton (quark, gluon)
(Diagram: Higgs production with decay to two Z bosons → e+e- + jets)
• Collision rate ~10^9 Hz; new physics rate ~10^-5 Hz; selection: 1 in 10^14 (SUSY, …)

LHC Data and CPU Requirements (CMS, ATLAS, LHCb)
• Storage: raw recording rate 0.2 – 1.5 GB/s; large Monte Carlo data samples; 100 PB by ~2012; 1000 PB later in the decade?
• Processing: PetaOps (> 600,000 3 GHz cores)
• Users: 100s of institutes, 1000s of researchers

LHC Global Collaborations
• CMS and ATLAS: 2000 – 3000 physicists per experiment
• USA is 20 – 31% of the total

LHC Global Grid (CMS example)
• 5000 physicists, 60 countries; 10s of petabytes/yr by 2009; CERN / outside resource ratio = 10 – 20%
• Online system → Tier 0 (CERN Computer Center) at 200 – 1500 MB/s
• Tier 0 → Tier 1 (Korea, Russia, UK, FermiLab) at 10 – 40 Gb/s
• Tier 1 → Tier 2 (OSG: U Florida, Caltech, UCSD) at >10 Gb/s
• Tier 2 → Tier 3 (physics caches: FIU, Iowa, Maryland) at 2.5 – 10 Gb/s; Tier 4: PCs

LHC Global Grid
• 11 Tier-1 sites
• 112 Tier-2 sites (growing)
• 100s of universities
(J. Knobloch)

LHC Cyberinfrastructure Growth: CPU
(Chart: projected CPU capacity in MSI2000, 2007 – 2010, stacked by experiment for CERN, Tier-1 and Tier-2; ~100,000 cores by 2010)
• Multi-core boxes bring AC & power challenges
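As a rough cross-check of the "LHC Data and CPU Requirements" slide, the short sketch below redoes the arithmetic. The ~10^7 seconds of accelerator live time per year and the one-operation-per-cycle conversion are my assumptions for illustration, not figures from the talk; with Monte Carlo samples, derived data, and replicas on top of the raw stream, the ~100 PB by ~2012 figure is consistent with this.

```python
# Back-of-envelope check of the "LHC Data and CPU Requirements" numbers.
# ASSUMPTIONS: ~1e7 s of accelerator live time per year, ~1 op per clock cycle.

RAW_RATE_GB_S = 1.5            # upper end of the quoted 0.2-1.5 GB/s raw rate
LIVE_SECONDS_PER_YEAR = 1e7    # assumed live time (not from the slides)

raw_pb_per_year = RAW_RATE_GB_S * LIVE_SECONDS_PER_YEAR / 1e6   # GB -> PB
print(f"Raw data per experiment: ~{raw_pb_per_year:.0f} PB/year")

CORES = 600_000                # "> 600,000 3 GHz cores"
CLOCK_HZ = 3e9
peta_ops = CORES * CLOCK_HZ / 1e15
print(f"Aggregate compute: ~{peta_ops:.1f} PetaOps")
```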
LHC Cyberinfrastructure Growth: Disk
(Chart: projected disk capacity in petabytes, 2007 – 2010, approaching ~160 PB, stacked by experiment for CERN, Tier-1 and Tier-2)

LHC Cyberinfrastructure Growth: Tape
(Chart: projected tape capacity in petabytes, 2007 – 2010, approaching ~160 PB, stacked by experiment for CERN and Tier-1)

HENP Bandwidth Roadmap for Major Links (in Gbps)
Year | Production          | Experimental           | Remarks
2001 | 0.155               | 0.622 – 2.5            | SONET/SDH
2002 | 0.622               | 2.5                    | SONET/SDH; DWDM; GigE integration
2003 | 2.5                 | 10                     | DWDM; 1 + 10 GigE integration
2005 | 10                  | 2 – 4 x 10             | Switching; provisioning
2007 | 3 x 10              | ~10 x 10; 40 Gbps      | 1st-generation grids
2009 | ~8 x 10 or 2 x 40   | ~5 x 40 or ~20 x 10    | 40 Gbps switching
2012 | ~5 x 40 or ~20 x 10 | ~25 x 40 or ~100 x 10  | 2nd-generation grids; terabit networks
2015 | ~Terabit            | ~Multi-Tbps            | ~Fill one fiber
Paralleled by the ESnet roadmap.

HENP Collaboration with Internet2
• www.internet2.edu
• HENP SIG

HENP Collaboration with NLR
• www.nlr.net
• UltraLight and other networking initiatives
• Spawning state-wide and regional networks (FLR, SURA, LONI, …)

US LHCNet, ESnet Plan 2007 – 2010: 30→80 Gbps US–CERN
• US-LHCNet (NY–CHI–GVA–AMS): 30, 40, 60, 80 Gbps over 2007 – 10 (3 to 8 x 10 Gbps US–CERN)
• ESnet4 SDN core: 30 – 50 Gbps; Science Data Network core circuit transport 40 – 60 Gbps; production IP ESnet core ≥10 Gbps enterprise IP traffic
• ESnet metropolitan-area rings (MANs) to FNAL & BNL; dark fiber to FNAL; peering with GEANT
• NSF/IRNC circuit; GVA–AMS connection via SURFnet or GEANT2
(Map: ESnet hubs, metropolitan area rings, major DOE Office of Science sites, lab-supplied and major international links, and high-speed cross connects with Internet2/Abilene; link capacities of 10 – 30 Gb/s and 2 x 10 Gb/s)

Tier1–Tier2 Data Transfers: 2006 – 07
(Chart: CMS Tier-1 → Tier-2 transfer rates from Sep. 2006 through Sep. 2007, including the CSA06 challenge, reaching ~1 GB/s)

US: FNAL Computing, Transfer Rates to Tier-2 Universities
(Chart: offline and CSA07 transfers, June 2007, peaking near 1 GB/s to Nebraska)
• One well-configured site can take ~1 GB/s, but ~10 such sites are expected in the near future: a network challenge

Current Data Transfer Experience
• Transfers are generally much slower than expected, or stop altogether
• Potential causes are difficult to diagnose: configuration problem? loading? queuing? database errors, experiment software error, grid software error? end-host problem? network problem? application failure?
• Recovery is complicated: insufficient information; too slow to diagnose and correlate at the time the error occurs
• Result: lower transfer rates, longer troubleshooting times
• Need: intelligent services, smart end-host systems

UltraLight: Integrating Advanced Networking in Applications
http://www.ultralight.org (funded by NSF)
• 10 Gb/s+ network
• Caltech, UF, FIU, UM, MIT
• SLAC, FNAL
• Int’l partners
• Level(3), Cisco, NLR
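One concrete reason wide-area transfers run "much slower than expected" (the Current Data Transfer Experience slide above) is untuned TCP on long round-trip paths, which is part of what UltraLight-class networking addresses. The sketch below computes the bandwidth-delay product for a 10 Gb/s US–CERN path; the 120 ms round-trip time and 64 KB default window are assumptions for illustration, not figures from the talk.

```python
# Why an untuned TCP host cannot fill a 10 Gb/s transatlantic path.
# ASSUMPTIONS: ~120 ms US-CERN round-trip time, 64 KB default TCP window.

LINK_BPS = 10e9        # UltraLight-class 10 Gb/s path
RTT_S = 0.120          # assumed round-trip time

bdp_bytes = LINK_BPS / 8 * RTT_S
print(f"Window needed to keep the pipe full: ~{bdp_bytes / 1e6:.0f} MB")

DEFAULT_WINDOW_BYTES = 64 * 1024
throughput_mbps = DEFAULT_WINDOW_BYTES * 8 / RTT_S / 1e6
print(f"Throughput with a 64 KB window: ~{throughput_mbps:.1f} Mb/s")
```

With these assumptions a single default-tuned flow tops out near 4 Mb/s, three orders of magnitude below the link capacity, which is why protocol tuning, capable NICs, and end-host monitoring recur in the challenges listed later.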
UltraLight Testbed
• www.ultralight.org (funded by NSF)
(Map of the UltraLight testbed)

Many Near-Term Challenges
• Network: bandwidth, bandwidth, bandwidth; need for intelligent services and automation; more efficient utilization of the network (protocols, NICs, S/W clients, pervasive monitoring)
• Better collaborative tools
• Distributed authentication?
• Scalable services: automation
• Scalable support

END

Extra Slides

The Open Science Grid Consortium
(Diagram: Open Science Grid at the center, linking U.S. grid projects, university facilities, multi-disciplinary facilities, laboratory centers, science projects & communities, LHC experiments, regional and campus grids, education communities, computer science, and technologists (network, HPC, …))

CMS: “Compact” Muon Solenoid
(Photo of the detector, with inconsequential humans for scale)

Collision Complexity: CPU + Storage
• +30 minimum bias events superimposed on each crossing
• All charged tracks with pt > 2 GeV vs. reconstructed tracks with pt > 25 GeV
• 10^9 collisions/sec, selectivity: 1 in 10^13

LHC Data Rates: Detector to Storage
• Physics filtering starts at 40 MHz (~TBytes/sec off the detector)
• Level 1 trigger (special hardware): 75 KHz, 75 GB/sec
• Level 2 trigger (commodity CPUs): 5 KHz, 5 GB/sec
• Level 3 trigger (commodity CPUs): 100 Hz, 0.15 – 1.5 GB/sec
• Raw data to storage (+ simulated data)
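The trigger cascade on the slide above implies specific per-level rejection factors; the short sketch below simply tabulates them from the quoted rates. It is a worked restatement of the slide's numbers, not a model of an actual trigger.

```python
# Per-level rejection implied by the quoted trigger rates (arithmetic only).

stages = [
    ("Bunch crossings",     40e6, None),  # 40 MHz, ~TBytes/sec off the detector
    ("Level 1 (hardware)",  75e3, 75.0),  # 75 kHz, 75 GB/s
    ("Level 2 (CPUs)",       5e3,  5.0),  # 5 kHz, 5 GB/s
    ("Level 3 (CPUs)",       1e2,  1.5),  # ~100 Hz, 0.15-1.5 GB/s to storage
]

prev = None
for name, rate_hz, gb_per_s in stages:
    keep = f"keeps 1 in {prev / rate_hz:,.0f}" if prev else ""
    bw = f", ~{gb_per_s:g} GB/s" if gb_per_s is not None else ""
    print(f"{name:20s} {rate_hz:>12,.0f} Hz{bw}  {keep}")
    prev = rate_hz

print(f"Overall: 1 crossing in {40e6 / 1e2:,.0f} reaches storage")
```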
LIGO: Search for Gravity Waves
• LIGO Grid: 6 US sites, 3 EU sites (UK & Germany: Birmingham, Cardiff, AEI/Golm)
• LHO, LLO: LIGO observatory sites
• LSC: LIGO Scientific Collaboration

Is HEP Approaching the Productivity Plateau?
(Chart: the Gartner Group technology hype cycle, expectations vs. time, applied to HEP grids, with CHEP conferences placed along the curve: Padova 2000, Beijing 2001, San Diego 2003, Interlaken 2004, Mumbai 2006, Victoria 2007; from Les Robertson)

Challenges from Diversity and Growth
• Management of an increasingly diverse enterprise: Sci/Eng projects, organizations, and disciplines as distinct cultures; accommodating new member communities (expectations?)
• Interoperation with other grids: TeraGrid, international partners (EGEE, NorduGrid, etc.), multiple campus and regional grids
• Education, outreach and training: training for researchers and students … but also project PIs and program officers
• Operating a rapidly growing cyberinfrastructure: 25K → 100K CPUs, 4 → 10 PB disk; management of and access to rapidly increasing data stores (slide); monitoring, accounting, achieving high utilization; scalability of support model (slide)

Collaborative Tools: EVO Videoconferencing
• End-to-end self-managed infrastructure

REDDnet: National Networked Storage
• NSF funded project (Vanderbilt); 8 initial sites; Brazil?
• Multiple disciplines: satellite imagery, HENP, Terascale Supernova Initiative, structural biology, bioinformatics
• Storage: 500 TB disk, 200 TB tape

OSG Operations Model
• Distributed model (VOs, sites, providers) → scalability!
• Rigorous problem tracking & routing
• Security, provisioning, monitoring, reporting
• Partners with EGEE operations
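To make the "problem tracking & routing" idea on the last slide concrete, here is a purely hypothetical sketch of routing a trouble ticket to the support unit that owns the failing piece. The categories, names, and routing table are invented for illustration and do not describe OSG's or EGEE's actual ticketing tools.

```python
# Hypothetical illustration of distributed problem routing (not OSG tooling).

from dataclasses import dataclass

@dataclass
class Ticket:
    summary: str
    category: str   # e.g. "site", "vo", "security", "network" (invented)
    resource: str   # affected site or VO

ROUTES = {
    "site":     lambda t: f"site administrators at {t.resource}",
    "vo":       lambda t: f"{t.resource} VO support",
    "security": lambda t: "grid security team",
    "network":  lambda t: "network provider NOC",
}

def route(ticket: Ticket) -> str:
    """Send the ticket to the support unit responsible for the failing piece."""
    handler = ROUTES.get(ticket.category, lambda t: "central operations center")
    return handler(ticket)

# Example: a storage failure at a hypothetical Tier-2 site
print(route(Ticket("SRM transfers failing", "site", "ExampleTier2")))
```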