PRISM: High-Capacity Networks that Augment Campus' General Utility Production Infrastructure
Philip Papadopoulos, PhD. Calit2 and SDSC

Some Perspective on 100Gbps
• DDR3 1600MHz memory DIMM = 12.8 GB/s (102.4 Gbps)
• Triton compute nodes (24GB/node) have enough memory capacity to source 100Gbps for ~2 seconds
• High-performance flash drive @ 500MB/sec: about 24 flash drives to fill 100Gbps
  – @ 250GB each (6TB total), ~8 minutes @ 100Gbps
• Data Oasis high-performance parallel file system @ SDSC (all 10GbE)
  – 64 servers @ 72TB each, 2GB/sec disk-to-network
  – 4.6PB (102 hours / 4.25 days @ 100Gbps)
• 100Gbps is really big from some perspectives, not so big from others (these figures are worked through in the short sketch following the Physical Connections slide below)

Terminating 100Gbps
• You land 100Gbps @ your campus; where does it go from there?
• What kinds of devices need to be connected?

Some History at UCSD: A Decade of Leading-Edge Research Networks
• 2002, ITR: The OptIPuter, $15M
  – Smarr, PI. Papadopoulos, Ellisman, UCSD Co-PIs. DeFanti, Leigh, UIC Co-PIs
  – "If the network ceases to be a bottleneck, how does that change the design of distributed programs?"
• 2004, Quartzite: MRI: Development of Quartzite, a Campus-wide, Terabit-Class, Field-Programmable, Hybrid Switching Instrument for Comparative Studies, $1.48M
  – Papadopoulos, PI. Smarr, Fainman, Ford, Co-PIs
  – "Make the network real for OptIPuter experiments"

OptIPuter Network (2005)
[Network diagram: dedicated fibers between sites link Linux clusters at SDSC, the SDSC Annex, JSOE/Engineering, SOM/Medicine, Preuss High School, CRCA, 6th College, Physical Sciences, Earth Sciences/SIO, and the Keck Collocation Node. Core switching: Chiaro Estara (6.4 Tbps backplane bandwidth) and Juniper T320 (0.320 Tbps backplane bandwidth) to CENIC and NLR. Scale: ~1/2 mile. Source: Phil Papadopoulos, SDSC; Greg Hidley, Cal-(IT)2]

Technology Motion
• Chiaro (out of business)
  – Replaced its capability with a Force10 E1200
  – Moved the physical center of the network to Atkinson Hall (Calit2)
• Juniper T320 (retired)
  – Upgraded by Campus/SDSC with a pair of MX960s
• Endpoints replaced/upgraded over time at all sites
• Quartzite introduced DWDM, all-optical, and wavelength switching
• What was constant?
  – The fiber plant (how we utilized it changed over time)
• What was growing?
  – Bigger data at an increasing number of labs; instrument capacity

PRISM@UCSD: Next Generation (NSF Award #OCI-1246396)
• NSF Campus Cyberinfrastructure Program (CC-NIE), $500K, 1/1/2013 start date. Papadopoulos, PI. Smarr, Co-PI
• Replace the Quartzite core
  – Packet switch only (hybrid not required)
  – 10GbE, 40GbE, 100GbE capability
  – "Small" switch: 11.5 Tbit/s full bisection, 1+ Tbit/s terminated in phase 0
• Expansion to more sites on and off campus
• Widen the freeway between SDSC and Calit2
  – Access to SDSC/XSEDE resources
  – Campus has committed to a 100Gb/s Internet2 connection; Prism is the natural termination network

Prism@UCSD: Expanding Network Reach for Big Data Users
Phil Papadopoulos, SDSC, Calit2, PI

Prism Core Switch – Arista Networks Next-Gen 7504: What 11.5Tb/s Looks Like (< 3kW)
• This is the Prism core switch (delivery in March 2013)
• It will have 10GbE (48 ports), 40GbE (36 ports), and 100GbE short-reach (2 ports), with 2 slots left empty for expansion

Physical Connections
• A variety of transceiver technologies
  – Copper 10Gbit and 40Gbit for inside the machine room
  – SR, LR SFP+ 10GbE, in-building and cross-campus
  – 10GbE DWDM 40km + passive multiplexers
    • Fiber conservation
    • Re-use of optics from Quartzite
    • Requires media conversion (DWDM XFPs)
    • VERY reliable: no multiplexer failures in 5+ years
  – 10GbE CWDM + passive multiplexers
    • SFP+ form factors (direct plug into the 7504)
  – 40GbE LR4, QSFP+ (internally CWDM)
• Choice of transceiver depends on where we are going, how much bandwidth is needed, and the connection point
  – E.g., Calit2 – SDSC: 12 x 10GbE (2 x LR + 10 DWDM), 2 fiber pairs
    • SDSC landing is 10GbE only (today)
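The figures quoted on the "Some Perspective on 100Gbps" slide and the Calit2 – SDSC aggregate above are simple to check. The following back-of-the-envelope sketch is not part of the original deck; it assumes decimal units (1 GB = 10^9 bytes) and the function name is illustrative only.

# Back-of-the-envelope checks of the 100 Gbps figures quoted in this deck.
# Assumes decimal units (1 GB = 1e9 bytes); all input values come from the slides.

LINK_GBPS = 100.0  # target link rate

def drain_seconds(capacity_bytes, link_gbps=LINK_GBPS):
    """Time to push a given capacity through the link at full rate."""
    return capacity_bytes * 8 / (link_gbps * 1e9)

# DDR3-1600 DIMM: 12.8 GB/s of memory bandwidth
print(12.8e9 * 8 / 1e9)                 # -> 102.4 Gbps

# Triton node: 24 GB of RAM can source 100 Gbps for ~2 seconds
print(drain_seconds(24e9))              # -> 1.92 s

# Flash: 500 MB/s per drive ("about 24" drives to fill the link)
print(100e9 / (500e6 * 8))              # -> 25.0 drives
print(drain_seconds(24 * 250e9) / 60)   # 6 TB total -> 8.0 minutes

# Data Oasis: 4.6 PB drained at 100 Gbps
print(drain_seconds(4.6e15) / 3600)     # -> ~102.2 hours (~4.26 days)

# Calit2 - SDSC: 12 x 10GbE lanes carried on 2 fiber pairs via LR + DWDM
print(12 * 10)                          # -> 120 Gbps aggregate

The small discrepancies (25 drives vs. "about 24", 1.92 s vs. "~2 seconds") are just rounding on the slides.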
What Is Our Rationale in Prism?
• Big data labs have particular burst bandwidth needs
  – At UCSD, the number of such labs today is roughly 20-25
• The campus backbone is 10GbE/20GbE and serves 50,000 users on a daily basis with ~80K IP addresses
  – One burst data transfer on Prism would saturate the campus backbone
  – Protect the campus network from big-data freeway users
  – Provide massive network capability in a cost-effective manner
• Software-defined networking (SDN) is an emerging technology to better handle configuration
  – SDN via OpenFlow will be supported on Prism (see the sketch after the Summary)
  – Combine the ability to experiment with a reduced risk of complete network disruption
• Easily bridge to identified networks
  – Prism ↔ UCSD production network (20GbE bridge == campus backbone)
  – Prism ↔ XSEDE resources (direct connect in the SDSC 7508s)
  – Prism ↔ off-campus, high-capacity networks (e.g. ESnet, 100GbE Internet2, NLR)
  – Prism ↔ the biotech mesa surrounding UCSD

OptIPuter/Quartzite Enabled SDSC to Build Low-Cost, High-Performance Storage
[Diagram: Data Oasis storage connected to the Prism core at 120Gbps]

Really Pushing Data from Storage (What 800+ Gb/s Looks Like), Jun 2012
[Traffic graph: MLAG carrying 485Gb/s + 350Gb/s]
• Saturation test, IOR through Lustre: 835 Gb/s = 104GB/sec
• Data Oasis was designed NOT to be an island; this is why we chose 10GbE instead of InfiniBand
• Papadopoulos set a performance target of 100+GB/sec for the Gordon Track 2 proposal (submitted in 2010); most people at SDSC thought it was "crazy"

Summary
• Big data + high-capacity inexpensive switching + high-throughput instruments + significant computing and data analysis capacity all form a "perfect storm"
  – The OptIPuter predicted this in 2002; Quartzite amplified that prediction in 2004. We are now here.
• You have to work on multiple ends of the problem
  – Devices, networks, cost ($)
• Key insight: recognize the fundamental differences between scaling challenges (e.g. the campus's 50K users vs. Prism's ~500 users, the 1%)
• Build for burst capacity
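The rationale slide above notes that SDN via OpenFlow will be supported on Prism, combining the ability to experiment with a reduced risk of complete network disruption. As an illustration only, here is a minimal sketch of that idea written against the Ryu OpenFlow 1.3 controller framework; Ryu, the lab subnet, the port number, and the priorities are all assumptions made for the example and are not part of the Prism design.

# Illustration only: steer traffic from a hypothetical big-data lab subnet onto a
# dedicated high-capacity port, while everything else falls through to normal
# forwarding. Run with: ryu-manager <this_file>.py
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

LAB_SUBNET = ('10.0.42.0', '255.255.255.0')  # hypothetical lab subnet (addr, mask)
FREEWAY_PORT = 48                            # hypothetical port toward the Prism core

class BigDataSteering(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        dp = ev.msg.datapath
        ofp = dp.ofproto
        parser = dp.ofproto_parser

        # High-priority rule: IPv4 traffic from the lab subnet exits the dedicated port.
        match = parser.OFPMatch(eth_type=0x0800, ipv4_src=LAB_SUBNET)
        self._add_flow(dp, 100, match, [parser.OFPActionOutput(FREEWAY_PORT)])

        # Low-priority default: everything else uses the switch's NORMAL pipeline,
        # so the experiment cannot disrupt ordinary production forwarding.
        self._add_flow(dp, 0, parser.OFPMatch(),
                       [parser.OFPActionOutput(ofp.OFPP_NORMAL)])

    def _add_flow(self, dp, priority, match, actions):
        ofp = dp.ofproto
        parser = dp.ofproto_parser
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=priority,
                                      match=match, instructions=inst))

The priority-100 rule touches only the identified big-data flows, while the priority-0 fall-through hands everything else to the switch's normal forwarding pipeline, which is one way to experiment with SDN without risking the production network.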