Slide 1: Symposium on Knowledge Environments for Science: HENP Collaboration & Internet2
Douglas Van Houweling, President & CEO, Internet2/UCAID
November 26, 2002

Slide 2: Overview
• High Energy Physics computing challenges
• Internet2 infrastructure issues
• Observations

Slide 3: HENP Computing Challenges
• Geographical dispersion of people and resources
• Complexity of the detector and the LHC environment
• Scale: tens of petabytes of data per year
• Major challenges associated with:
  • Communication and collaboration at a distance
  • Managing globally distributed computing and data resources
  • Cooperative software development and physics analysis
• 5,000+ physicists, 250+ institutes, 60+ countries

Slide 4: Data Grids
• Data Grids: new forms of distributed systems
• Four LHC experiments: ATLAS, CMS, ALICE, LHCb
• Data stored: ~40+ petabytes/year
• CPU: ~0.30+ PetaFLOPS
• LHC experiments will produce exabytes (1 EB = 10^18 bytes):
  • 0.1 EB in 2007
  • 1.0 EB by 2012

Slide 5: LHC Data Grid Hierarchy
[Diagram: the tiered LHC computing model]
• Online system (experiment at CERN): ~PByte/sec off the detector; ~100-1500 MBytes/sec into Tier 0
• Tier 0+1 (CERN): ~700k SI95; ~1 PB disk; tape robot
• Tier 1 (~2.5 Gbps from CERN): national centers, e.g. FNAL (200k SI95; 600 TB) and the IN2P3, INFN, and RAL centers
• Tier 2 (2.5 Gbps): regional Tier 2 centers
• Tier 3 (~2.5 Gbps): institutes, ~0.25 TIPS, physics data cache
• Tier 4 (0.1-10 Gbps): workstations
• CERN/outside resource ratio ~1:2; Tier 0 : Tier 1 : Tier 2 ~1:1:1
• Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels

Slide 6: Transatlantic Bandwidth Requirements (installed, Mbps)

             2001    2002    2003    2004    2005    2006
  CMS         100     200     300     600     800    2500
  ATLAS        50     100     300     600     800    2500
  BaBar       300     600    1100    1600    2300    3000
  CDF         100     300     400    2000    3000    6000
  D0          400    1600    2400    3200    6400    8000
  BTeV         20      40     100     200     300     500
  DESY        100     180     210     240     270     300
  CERN BW  155-310    622    2500    5000   10000   20000

Source: Transatlantic Net WG (H. Newman, L. Price), installed bandwidth; maximum link occupancy of 50% assumed. See http://gate.hep.anl.gov/lprice/TAN
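The 50% occupancy note gives a quick way to sanity-check these figures against the data volumes on Slide 4. The Python sketch below is illustrative only (it is not from the talk, and only the 40 PB/year figure comes from the slides); it converts a yearly archived volume into the average sustained throughput needed to move it, then doubles that rate per the table's occupancy assumption:

```python
# Back-of-the-envelope check (illustrative; not from the original talk).
# Converts a yearly data volume into the sustained throughput needed to
# move it, then applies the table's 50% maximum-link-occupancy assumption.

SECONDS_PER_YEAR = 365 * 24 * 3600

def required_installed_mbps(petabytes_per_year: float,
                            max_occupancy: float = 0.5) -> float:
    """Installed link capacity (Mbps) needed to ship a yearly volume."""
    bits_per_year = petabytes_per_year * 1e15 * 8            # PB -> bits
    average_mbps = bits_per_year / SECONDS_PER_YEAR / 1e6    # sustained rate
    return average_mbps / max_occupancy

# ~40 PB/year stored by the LHC experiments (Slide 4)
print(f"{required_installed_mbps(40):,.0f} Mbps")  # ~20,300 Mbps
```

Moving ~40 PB/year at no more than 50% link occupancy implies roughly 20 Gbps of installed capacity, the same order as the 20,000 Mbps CERN figure projected for 2006.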
Slide 7: Emerging DataGrid Community
• Grid Physics Network (GriPhyN): ATLAS, CMS, LIGO, SDSS
• Access Grid and VRVS: supporting group-based collaboration
• And others presented at this symposium

Slide 8: Current Grid Challenges
• Stable, high-performance network platform
• Standard core middleware
• Secure workflow management and optimization
• Maintaining a global view of resources and system state
• Workflow: a strategic balance of policy versus moment-to-moment capability to complete tasks
• Handling user-Grid interactions: guidelines; agents
• Building higher-level services and an integrated, scalable user environment for the above

Slide 9: DataTAG Project
• EU-solicited project: CERN, PPARC (UK), Amsterdam (NL), and INFN (IT), with US partners (DOE/NSF: UIC, NWU, and Caltech)
• Main aims:
  • Ensure maximum interoperability between US and EU Grid projects
  • Provide a transatlantic testbed for advanced network research
• 2.5 Gbps wavelength triangle from 7/02; to a 10 Gbps triangle by early 2003
[Map: testbed linking New York and Geneva via Abilene, ESnet, STARLIGHT, STAR-TAP, CALREN2, SuperJANET4 (UK), GARR-B (IT), GÉANT, SURFnet (NL), and INRIA/VTHD/Atrium (FR)]

Slide 10: Infrastructure Issues
• Network performance & stability
  • Abilene -> 10 Gb wavelength
  • End-to-end performance
  • National Light Rail
• Middleware
  • NSF Middleware Initiative
  • Core middleware: Shibboleth, etc.
• Application requirements
  • Multicast, IPv6

Slide 11: National Light Rail Footprint
[Map: NLR footprint; 15808 terminal, regen, or OADM sites and fiber routes linking SEA, POR, SAC, SVL, FRE, LAX, SDG, PHO, OGD, DEN, KAN, DAL, STR, WAL, NAS, ATL, OLG, RAL, PIT, CLE, CHI, WDC, NYC, BOS]
• NLR buildout starts in 2003: initially four 10 Gb wavelengths, growing to forty 10 Gb waves in the future
• NREN backbones reached 2.5-10 Gbps in 2002 in Europe, Japan, and the US
• US: transition now to an optical, dark-fiber, multi-wavelength R&E network

Slide 12: Some Thoughts…
• Technology is rapidly progressing: we can move more bits, faster, over many types of media
• Many changes in scientific practice are emerging:
  • Differentiation between data collectors and data analyzers
  • Synchronization of many instruments
  • Combination of simulation and observation
  • A shifting focus from instruments to datasets
  • And many more…

Slide 13: HENP Working Group
• High Energy and Nuclear Physics Working Group, formed in late 2001
• Needed: additional focus on the network-intensive aspects of HENP research
• Currently over 80 individuals participating

Slide 14: HENP: Experiment Example
• Large Hadron Collider (2006)
• Largest superconducting installation in the world
• Will generate multiple petabytes of data per year, at gigabytes per second
• One in a trillion events might lead to a major physics discovery

Slide 15: HENP: Applications
• Remote collaboration, VRVS
• Distributed data storage
• Distributed computation and databases
• Dynamic visualizations

Slide 16: NEESGrid
• Network for Earthquake Engineering Simulation: a "Grid" project
• Consists of 10 initial sites across the U.S. addressing the needs of structural, geotechnical, and tsunami researchers

Slide 17: NEESGrid: Applications
• Video as data
• Collaboration
• Remote instrumentation
• Distributed data storage
• Final goal: simultaneous physical and computational experiments

Slide 18: eVLBI (Astronomy)
• Electronic Very Long Baseline Interferometry
• Astronomers combine data from multiple antennas to create a single image more accurate than any single antenna could produce
• Requires coordination of multiple physical resources as well as advanced network services

Slide 19: eVLBI: Experiment Example
• Astronomers collect data about a star from many Earth-based antennas and send it to a specialized computer for analysis on a 24x7 basis
• Unlike physics, eVLBI is less concerned with data loss than with long-term stability
• The end goal is to send data at 1 Gb/s from over 20 antennas located around the globe
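To put the 1 Gb/s, 20-antenna goal in perspective, here is a similar illustrative arithmetic sketch (only the antenna count and per-antenna rate come from the slide above; everything else is assumed):

```python
# Aggregate eVLBI data flow (illustrative sketch; not from the talk).
ANTENNAS = 20
RATE_GBPS_PER_ANTENNA = 1.0   # sustained, 24x7 (per the slide)

aggregate_gbps = ANTENNAS * RATE_GBPS_PER_ANTENNA
terabytes_per_day = aggregate_gbps * 1e9 / 8 * 86_400 / 1e12

print(f"Aggregate into the correlator: {aggregate_gbps:.0f} Gb/s")
print(f"Daily volume: {terabytes_per_day:.0f} TB/day")  # ~216 TB/day
```

A continuous ~20 Gb/s, ~216 TB/day flow helps explain the emphasis on long-term stability over loss-free delivery: the stream never pauses, so the network must sustain the rate indefinitely rather than retransmit perfectly.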
Slide 20: eVLBI: Applications
• Advanced network protocol development
• Cooperation and participation across international networks
• Remote instrumentation
• Real-time data analysis allows flexibility and agility in response to transient astronomical events

Slide 21: www.internet2.edu

Slide 22: [blank slide]

Slide 23: Building Global Grids: Implications for Society
• Meeting the challenges of Petabyte-to-Exabyte Grids and Gigabit-to-Terabit networks will transform research in science and engineering
• These developments could create the first truly global virtual organizations (GVOs)
• If they succeed and are deployed widely as standards, they could lead to profound advances in industry, commerce, and society at large:
  • By changing the relationship between people and "persistent" information in their daily lives
  • Within the next five to ten years
• Realizing the benefits of these developments for society, and creating a sustainable cycle of innovation, compels us TO CLOSE THE DIGITAL DIVIDE

Slide 24: Closing the Digital Divide: What HENP and the World Community Can Do
• Spread the message: ICFA SCIC, IEEAF, et al. can help
• Help identify and highlight specific needs to work on: policy problems, last-mile problems, etc.
• Encourage joint programs [DESY's Silk project; Japanese links to SE Asia and China; AMPATH to South America]
  • NSF & @LIS proposals: US and EU to South America
• Make direct contacts and arrange discussions with government officials; ICFA SCIC is prepared to participate where appropriate
• Help start, and get support for, workshops on networks & Grids
• Encourage and help form funded programs
• Help form regional support & training groups (requires funding)

Slide 25: Technology and Stewardship
• Access to, and development of, leading infrastructures and new classes of information-rich systems carries obligations
• Stewardship: playing a leading role in making these assets usable by a broad sector of the world community
• Examples:
  • Develop devices and systems for the disabled, with no discrimination against any area of society
  • Develop standardized toolkits and portals for wide access from schools
  • Encourage joint programs and support from industry
  • Include strong education and outreach components in all medium and large research proposals (e.g., NSF)