High Energy Physics: Networks & Grid Systems for Global Science
Harvey B. Newman, California Institute of Technology
AMPATH Workshop, FIU, January 31, 2003

Next Generation Networks for Experiments: Goals and Needs
- Large data samples explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams
- Providing rapid access to event samples, subsets and analyzed physics results from massive data stores: from Petabytes by 2002 and ~100 Petabytes by 2007, to ~1 Exabyte by ~2012
- Providing analyzed results with rapid turnaround, by coordinating and managing the large but LIMITED computing, data handling and NETWORK resources effectively
- Enabling rapid access to the data and to the collaboration, across an ensemble of networks of varying capability
- Advanced integrated applications, such as Data Grids, rely on seamless operation of our LANs and WANs, with reliable, monitored, quantifiable high performance

LHC: Higgs Decay into 4 Muons (Tracker Only); 1000X the LEP Data Rate
- Signal event plus 30 minimum bias events
- All charged tracks with pT > 2 GeV; reconstructed tracks with pT > 25 GeV
- 10^9 events/sec; selectivity of 1 in 10^13 (like finding 1 person among a thousand world populations)

Transatlantic Net WG (HN, L. Price): Bandwidth Requirements in Mbps [*]

             2001    2002   2003   2004   2005   2006
  CMS         100     200    300    600    800   2500
  ATLAS        50     100    300    600    800   2500
  BaBar       300     600   1100   1600   2300   3000
  CDF         100     300    400   2000   3000   6000
  D0          400    1600   2400   3200   6400   8000
  BTeV         20      40    100    200    300    500
  DESY        100     180    210    240    270    300
  CERN BW   155-310   622   2500   5000  10000  20000

[*] BW requirements are increasing faster than Moore's Law. See http://gate.hep.anl.gov/lprice/TAN

HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps

  Year   Production             Experimental             Remarks
  2001   0.155                  0.622-2.5                SONET/SDH
  2002   0.622                  2.5                      SONET/SDH; DWDM; GigE integration
  2003   2.5                    10                       DWDM; 1 + 10 GigE integration
  2005   10                     2-4 X 10                 Switch; provisioning
  2007   2-4 X 10               ~10 X 10; 40 Gbps        1st gen. Grids
  2009   ~10 X 10 or 1-2 X 40   ~5 X 40 or ~20-50 X 10   40 Gbps switching
  2011   ~5 X 40 or ~20 X 10    ~25 X 40 or ~100 X 10    2nd gen. Grids; Terabit networks
  2013   ~Terabit               ~MultiTbps               ~Fill one fiber

Continuing the trend: ~1000 times bandwidth growth per decade. We are rapidly learning to use and share multi-Gbps networks.

DataTAG Project
- EU-solicited project: CERN, PPARC (UK), Amsterdam (NL) and INFN (IT), with US partners (DOE/NSF: UIC, NWU and Caltech)
- Main aims: ensure maximum interoperability between US and EU Grid projects; a transatlantic testbed for advanced network research
- 2.5 Gbps wavelength triangle from 7/02; to a 10 Gbps triangle by early 2003
- [Network map: New York and Geneva hubs interconnecting ABILENE, ESNET, CALREN 2, STAR-TAP, STARLIGHT, GEANT, SuperJANET4 (UK), GARR-B (IT), SURFnet (NL), INRIA/Atrium and VTHD (FR)]

Progress: Maximum Sustained TCP Throughput on Transatlantic and US Links
- 8-9/01:   105 Mbps with 30 streams, SLAC-IN2P3; 102 Mbps with 1 stream, CIT-CERN
- 11/5/01:  125 Mbps in one stream (modified kernel), CIT-CERN
- 1/09/02:  190 Mbps for one stream shared on 2 x 155 Mbps links
- 3/11/02:  120 Mbps disk-to-disk with one stream on a 155 Mbps link (Chicago-CERN)
- 5/20/02:  450-600 Mbps, SLAC-Manchester on OC12 with ~100 streams
- 6/1/02:   290 Mbps, Chicago-CERN, one stream on OC12 (modified kernel)
- 9/02:     850, 1350, 1900 Mbps, Chicago-CERN, with 1, 2, 3 GbE streams on an OC48 link
- 11-12/02: FAST: 940 Mbps in 1 stream, SNV-CERN; 9.4 Gbps in 10 flows, SNV-Chicago

Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/ and the Internet2 E2E Initiative: http://www.internet2.edu/e2e
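The records above rely on many parallel streams or modified kernels because, on a long transatlantic path, a default-sized TCP window cannot keep the pipe full. Below is a minimal sketch of the bandwidth-delay-product arithmetic, using assumed link rates and an assumed ~120 ms US-CERN round-trip time (illustrative values, not measurements from this talk):

```python
# Bandwidth-delay product: how many bytes must be "in flight" to fill a link.
# The link rates and the ~120 ms RTT below are illustrative assumptions.

def bytes_in_flight(link_mbps: float, rtt_ms: float) -> float:
    return link_mbps * 1e6 / 8 * (rtt_ms / 1e3)   # bytes

DEFAULT_WINDOW = 64 * 1024   # a typical untuned TCP window of the era (64 KB)

for link_mbps in (155, 622, 2500):
    bdp = bytes_in_flight(link_mbps, rtt_ms=120)
    print(f"{link_mbps:>5} Mbps @ 120 ms: ~{bdp / 1e6:4.1f} MB in flight "
          f"(~{bdp / DEFAULT_WINDOW:.0f}x a 64 KB default window)")
```

With a 64 KB window the sender can keep only ~64 KB per round trip in flight, roughly 4 Mbps at 120 ms, which is why untuned single streams stall far below the link rates listed above.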
FAST (Caltech): A Scalable, "Fair" Protocol for Next-Generation Networks: from 0.1 to 100 Gbps

Highlights of FAST TCP (SC2002, 11/02)
- With standard packet size: 940 Mbps in a single flow on one GE card, i.e. 9.4 petabit-m/sec, 1.9 times the Internet2 Land Speed Record (LSR)
- 9.4 Gbps with 10 flows, i.e. 37.0 petabit-m/sec, 6.9 times the LSR
- 22 TB transferred in 6 hours, in 10 flows
- Implementation: sender-side (only) modifications; delay (RTT) based; stabilized Vegas
- The Internet viewed as a distributed feedback system; theory and experiment developed together
- Next: 10 GbE; 1 GB/sec disk to disk
- C. Jin, D. Wei, S. Low; FAST team & partners. URL: netlab.caltech.edu/FAST
- [Figure: throughput milestones (I2 LSR 29.3.00, multiple streams; 9.4.02, 1 flow; 22.8.02, IPv6; SC2002 runs with 1, 2 and 10 flows) on test paths Sunnyvale-Chicago-Baltimore-Geneva spanning 1000-7000 km, with a TCP/AQM feedback-loop diagram]

FAST TCP: Aggregate Throughput
- Standard MTU; utilization averaged over > 1 hr
- Average utilization of 88-95% in runs with 1, 2, 7, 9 and 10 flows
- netlab.caltech.edu
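The "delay (RTT) based, stabilized Vegas" approach named above adjusts the congestion window from measured queueing delay rather than from packet loss. The sketch below illustrates that idea only; the update rule, gains and constants are assumptions for illustration, not presented as the published FAST TCP algorithm.

```python
# Illustrative delay-based window update in the spirit of Vegas/FAST:
# compare the smallest RTT seen (propagation only) with the current RTT
# (propagation + queueing) and steer the window so that roughly `alpha`
# of this flow's packets remain queued in the path. Constants are assumed.

def update_cwnd(cwnd: float, base_rtt: float, rtt: float,
                alpha: float = 200.0, gamma: float = 0.5) -> float:
    """One update per RTT; times in seconds, window in packets."""
    target = cwnd * (base_rtt / rtt) + alpha        # equilibrium point
    new_cwnd = (1 - gamma) * cwnd + gamma * target  # smooth toward the target
    return min(new_cwnd, 2.0 * cwnd)                # at most double per RTT

# On an empty 120 ms path the window ramps up without waiting for losses...
print(update_cwnd(cwnd=100.0, base_rtt=0.120, rtt=0.120))   # -> 200.0
# ...and levels off as queueing delay appears, instead of oscillating on loss.
print(update_cwnd(cwnd=4000.0, base_rtt=0.120, rtt=0.150))  # -> 3700.0
```

Because the controller reacts to delay, a single flow can hold a high-bandwidth, high-RTT path near full utilization without deep loss-recovery dips, which is the behavior the SC2002 utilization figures above illustrate.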
HENP Lambda Grids: Fibers for Physics
- Problem: extract "small" data subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte data stores
- Survivability of the HENP global Grid system, with hundreds of such transactions per day (circa 2007), requires that each transaction be completed in a relatively short time
- Example: take 800 seconds to complete the transaction. Then:

    Transaction Size (TB)    Net Throughput (Gbps)
      1                        10
     10                       100
    100                      1000  (capacity of fiber today)

  (e.g. 1 TB = 8 x 10^12 bits; 8 x 10^12 bits / 800 s = 10 Gbps)
- Summary: providing switching of 10 Gbps wavelengths within ~3-5 years, and Terabit switching within 5-8 years, would enable "Petascale Grids with Terabyte transactions", as required to fully realize the discovery potential of major HENP programs, as well as of other data-intensive fields

Data Intensive Grids Now: Large Scale Production
- Efficient sharing of distributed heterogeneous compute and storage resources
- Virtual Organizations and institutional resource sharing
- Dynamic reallocation of resources to target specific problems
- Collaboration-wide data access and analysis environments
- Grid solutions NEED to be scalable and robust: they must handle many Petabytes per year, tens of thousands of CPUs, and tens of thousands of jobs
- The Grid solutions presented here are supported in part by GriPhyN, iVDGL, PPDG, EDG and DataTAG; we are learning a lot from these current efforts
- For example: 1M events processed using the VDT, Oct.-Dec. 2002

Grids NOW, Beyond Production: Web Services for Ubiquitous Data Access and Analysis by a Worldwide Scientific Community
- Web Services: easy, flexible, platform-independent access to data (object collections in databases)
- Well adapted to use by individual physicists, teachers and students
- SkyQuery example (JHU/FNAL/Caltech), with Web Service based access to astronomy surveys
- The surveys can be queried individually or simultaneously via a Web interface
- The simplicity of the interface hides considerable server power (from stored procedures, etc.)
- This is a "traditional" Web Service, with no user authentication required

COJAC: CMS ORCA Java Analysis Component (Java3D, Objectivity, JNI, Web Services)
- Demonstrated Caltech to Rio de Janeiro and to Chile in 2002

LHC Distributed CM: HENP Data Grids Versus Classical Grids
- Grid projects have been a step forward for HEP and the LHC: a path to meet the "LHC Computing" challenges
- The "Virtual Data" concept is applied to large-scale automated data processing among worldwide-distributed regional centers
- The original Computational and Data Grid concepts are largely stateless, open systems, known to be scalable; analogous to the Web
- The classical Grid architecture has a number of implicit assumptions: the ability to locate and schedule suitable resources within a tolerably short time (i.e. resource richness); short transactions; relatively simple failure modes
- HEP Grids are data-intensive and resource-constrained: long transactions and some long queues; schedule conflicts, policy decisions and task redirection; a lot of global system state to be monitored and tracked

Current Grid Challenges: Secure Workflow Management and Optimization
- Maintaining a global view of resources and system state: coherent end-to-end system monitoring; adaptive learning, i.e. new algorithms and strategies for execution optimization (increasingly automated)
- Workflow: a strategic balance of policy versus moment-to-moment capability to complete tasks; balancing high levels of usage of limited resources against better turnaround times for priority jobs; goal-oriented algorithms, steering requests according to (yet to be developed) metrics
- Handling user-Grid interactions: guidelines; agents
- Building higher-level services, and an integrated, scalable user environment for the above

Distributed System Services Architecture (DSSA): CIT/Romania/Pakistan
- Agents: autonomous, auto-discovering, self-organizing and collaborative
- "Station Servers" (static) host mobile "Dynamic Services"
- Station Servers interconnect dynamically and form a robust fabric in which mobile agents travel, carrying a payload of (analysis) tasks
- Adaptable to Web services (OGSA) and to many platforms
- Adaptable to ubiquitous, mobile working environments
- [Diagram: Station Servers registering with Lookup / Discovery Services]
- Managing global systems of increasing scope and complexity, in the service of science and society, requires a new generation of scalable, autonomous, artificially intelligent software systems

MonALISA: A Globally Scalable Grid Monitoring System (I. Legrand, Caltech)
- Deployed on the US CMS Grid
- Agent-based; dynamic information / resource discovery mechanism
- Talks with other monitoring systems
- Implemented in Java/Jini; SNMP; WSDL / SOAP with UDDI
- Part of a global "Grid Control Room" service
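To make the station-server / lookup pattern above concrete, here is a minimal, generic sketch of service registration and discovery. The class, method and station names are invented for illustration; this is not the DSSA or MonALISA (Jini-based) API.

```python
# Minimal, generic sketch of the "station server registers with a lookup
# service; agents and clients discover it" pattern described above.
# Names and classes are invented; this is NOT the DSSA or MonALISA API.
from dataclasses import dataclass, field
import time

@dataclass
class LookupService:
    """In-memory stand-in for a Jini-style lookup/discovery service."""
    registry: dict = field(default_factory=dict)

    def register(self, name: str, attrs: dict, lease_s: float = 60.0) -> None:
        # Leases force stale entries to expire, so the fabric self-heals.
        self.registry[name] = (attrs, time.time() + lease_s)

    def lookup(self, key: str, value) -> list[str]:
        now = time.time()
        return [n for n, (attrs, exp) in self.registry.items()
                if exp > now and attrs.get(key) == value]

lookup = LookupService()
# A "station server" at a site registers itself and what it can host.
lookup.register("station.caltech", {"site": "Caltech", "service": "analysis"})
lookup.register("station.cern",    {"site": "CERN",    "service": "monitoring"})
# A mobile agent (or client) discovers where it can send its task payload.
print(lookup.lookup("service", "analysis"))   # ['station.caltech']
```

Lease-based registration is what lets such a fabric self-organize: a station that disappears simply stops renewing its lease and drops out of subsequent lookups.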
MONARC SONN: 3 Regional Centres "Learning" to Export Jobs (Day 9)
- [Simulation snapshot: CERN (30 CPUs), CALTECH (25 CPUs) and NUST (20 CPUs), interconnected at 1 MB/s with 150 ms RTT; mean efficiencies <E> of 0.83, 0.73 and 0.66 across the three centres, optimized by day 9]

NSF ITR: Globally Enabled Analysis Communities
- Develop and build Dynamic Workspaces
- Build Private Grids to support scientific analysis communities, using agent-based, peer-to-peer Web Services
- Construct autonomous communities operating within global collaborations
- Empower small groups of scientists (and teachers and students) to profit from and contribute to international big science
- Drive the democratization of science via the deployment of new technologies

Private Grids and P2P Sub-Communities in Global CMS
- 14,600 host devices; 7,800 registered users in 64 countries
- 45 network servers
- Annual growth of 2 to 3X

An Inter-Regional Center for Research, Education and Outreach, and CMS CyberInfrastructure
- Foster the strategic expansion of FIU and Brazil (UERJ) into CMS physics through Grid-based "computing"
- Development and operation, for science, of international networks, Grids and collaborative systems
- Focus on research at the high-energy frontier
- Developing a scalable Grid-enabled analysis environment, broadly applicable to science and education, made accessible through the use of agent-based (AI) autonomous systems and Web Services
- Serving under-represented communities at FIU and in South America: training and participation in the development of state-of-the-art technologies; developing the teachers and trainers
- Relevance to science, education and society at large: developing the future science and information S&E workforce; closing the Digital Divide