PRISM: High-Capacity Networks that Augment Campus’ General Utility Production Infrastructure
Philip Papadopoulos, PhD.
Calit2 and SDSC
Some Perspective on 100Gbps
• DDR3 1600MHz Memory DIMM = 12.8GB/s (102.4Gbps)
• Triton compute nodes (24GB/node) have enough memory capacity to source 100Gbps for ~2 seconds
• High-performance flash drive @ 500MB/sec; about 24 flash drives needed to fill 100Gbps
– @ 250GB each (6TB total) ~ 8 minutes @ 100Gbps
• Data Oasis High-Performance Parallel File System @ SDSC
(all 10GbE)
– 64 Servers @ 72TB each, 2GB/sec Disk-to-network
– 4.6PB (102 hours/4.25 Days @ 100Gbps)
⇒ 100Gbps is really big from some perspectives, not so from others (the arithmetic is sketched below).
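As a quick check, the slide's figures can be reproduced with a back-of-envelope sketch in Python (decimal units assumed; all sizes and rates are the ones listed above):

    # Back-of-envelope arithmetic for the 100Gbps comparisons above.
    # Assumes decimal units: 1 GB = 8 Gb, 1 TB = 1000 GB, 1 PB = 1000 TB.
    LINK_GBPS = 100.0

    node_memory_gb = 24                         # one Triton compute node
    print(node_memory_gb * 8 / LINK_GBPS)       # ~1.9 seconds to drain at line rate

    flash_total_gb = 24 * 250                   # ~24 flash drives x 250 GB = 6 TB
    print(flash_total_gb * 8 / LINK_GBPS / 60)  # ~8 minutes at 100 Gbps

    oasis_total_gb = 4.6e6                      # 4.6 PB of Data Oasis
    hours = oasis_total_gb * 8 / LINK_GBPS / 3600
    print(hours, hours / 24)                    # ~102 hours, ~4.25 days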
Terminating 100Gbps
• You land 100Gbps @ your campus, where
does it go from there?
• What kinds of devices need to be connected?
Some history at UCSD: A Decade of
Leading-edge Research Networks
• 2002. ITR: The OptIPuter, $15M
– Smarr, PI; Papadopoulos, Ellisman, UCSD Co-PIs; DeFanti, Leigh, UIC Co-PIs
– “If the network ceases to be a bottleneck, how does that change the design of distributed programs?”
• 2004, Quartzite: MRI: Development of Quartzite, a Campus-wide, Terabit-Class, Field-Programmable, Hybrid Switching Instrument for Comparative Studies, $1.48M
– Papadopoulos, PI; Smarr, Fainman, Ford, Co-PIs
– “Make the network real for OptIPuter experiments”
OptIPuter Network (2005)
[Campus diagram: dedicated fibers between sites link Linux clusters at SDSC, SDSC Annex, JSOE/Engineering, SOM/Medicine, Preuss High School, CRCA, 6th College, Phys. Sci., Keck, Earth Sciences/SIO, and the Collocation Node M to a Chiaro Estara core (6.4 Tbps backplane bandwidth); a Juniper T320 (0.320 Tbps backplane bandwidth) connects to CENIC and NLR. Scale: ~½ mile. Source: Phil Papadopoulos, SDSC; Greg Hidley, Cal-(IT)2]
Technology Motion
• Chiaro (out of business)
– Replaced capability with Force10 E1200
– Moved physical center of network to Atkinson Hall (Calit2)
• Juniper T320 (Retired) – Upgraded by Campus/SDSC with
pair of MX960s
• Endpoints replaced/upgraded over time at all sites
• Quartzite introduced DWDM, all-optical switching, and wavelength switching
• What was constant?
– The fiber plant (how we utilized it changed over time)
• What was growing?
– Bigger data at an increasing number of labs; instrument capacity
PRISM@UCSD: Next Generation
(NSF Award# OCI-1246396)
• NSF Campus Cyberinfrastructure Program (CC-NIE), $500K, 1/1/2013 start date. Papadopoulos, PI; Smarr, Co-PI
• Replace Quartzite Core
– Packet switch only (hybrid not required)
– 10GbE, 40GbE, 100GbE Capability
– “Small” switch – 11.5Tbit/s full-bisection, 1+Tbit/sec terminated in phase 0
• Expansion to more sites on/off campus
• Widen the freeway between SDSC and Calit2
– Access to SDSC/XSEDE resources
– Campus has committed to 100Gb/s Internet2 connection. Prism
is the natural termination network.
Prism@UCSD: Expanding Network Reach for Big Data Users
Phil Papadopoulos, SDSC, Calit2, PI
Prism Core Switch – Arista Networks Next-Gen 7504: What 11.5Tb/s looks like (<3kW)
This is the Prism core switch (delivery in March 2013). It will have 10GbE (48 ports), 40GbE (36 ports), and 100GbE short-reach (2 ports), with 2 slots left empty for expansion.
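A quick tally of the listed port counts (assuming every port runs at full line rate) shows where the "1+ Tbit/s terminated in phase 0" figure comes from:

    # Terminated line-rate capacity of the phase-0 Arista 7504 configuration.
    ports = {10: 48, 40: 36, 100: 2}   # GbE speed -> port count, from the slide
    total_gbps = sum(speed * count for speed, count in ports.items())
    print(total_gbps)                  # 2120 Gbps, i.e. "1+ Tbit/s" terminated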
Physical Connections
• A variety of Transceiver Tech
– Copper 10Gbit and 40Gbit for in-machine-room connections
– SR, LR SFP+ 10GbE for in-building and cross-campus runs
– 10GbE DWDM 40km + passive multiplexers
• Fiber conservation
• Re-use of optics from Quartzite
• Requires media conversion (DWDM XFPs)
• VERY reliable: no multiplexer failures in 5+ years, 1 transceiver failure
– 10GbE CWDM + Passive multiplexers
• SFP+ form factors (direct plug into 7504)
– 40GbE LR4, QSFP+ (internally CWDM)
• Choice of transceiver depends on where we are going, how much
bandwidth is needed, and the connection point
– E.g., Calit2 – SDSC: 12 x 10GbE (2 x LR + 10 DWDM), 2 fiber pairs (see the sketch after this list)
• SDSC landing is 10GbE only (today).
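For the Calit2 – SDSC example, a small sketch of the fiber-conservation arithmetic (the assumption that each unmultiplexed 10GbE link would otherwise need its own fiber pair is ours, not stated on the slide):

    # Fiber conservation on the Calit2 - SDSC path described above.
    links_10gbe = 2 + 10          # 2 x LR + 10 x DWDM channels
    fiber_pairs_used = 2          # from the slide
    print(links_10gbe * 10)       # 120 Gbps aggregate between Calit2 and SDSC
    # Without passive multiplexing, each 10GbE link would need its own pair:
    print(links_10gbe, "pairs ->", fiber_pairs_used, "pairs with DWDM muxes")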
What is our Rationale in Prism?
• Big Data Labs have particular burst bandwidth needs
– At UCSD. Number of labs today is roughly 20-25
• Campus backbone is 10GbE/20GbE and serves 50,000 users on a daily
basis with ~80K IP addresses
– One burst data transfer on Prism would saturate the campus backbone
– Protect the campus network from big data freeway users.
– Provide massive network capability in a cost-effective manner
• Software-defined networking (SDN) is an emerging technology to better handle configuration
– SDN via OpenFlow will be supported on Prism (a minimal flow-rule sketch follows this list)
– Combines the ability to experiment while reducing the risk of complete network disruption
• Easily bridge to identified networks
– Prism ↔ UCSD Production Network (20GbE bridge == Campus Backbone)
– Prism ↔ XSEDE resources (direct connect in SDSC 7508s)
– Prism ↔ Off-campus, high-capacity (e.g. ESnet, 100GbE Internet2, NLR)
– Prism ↔ Biotech Mesa surrounding UCSD
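As an illustration of the kind of flow steering OpenFlow makes possible (a minimal sketch, not Prism's actual configuration; the bridge name, lab subnet, and port number below are hypothetical, and Open vSwitch's ovs-ofctl stands in for whatever controller is ultimately used):

    import subprocess

    # Hypothetical flow rule: send traffic destined for a big-data lab subnet
    # (10.10.0.0/16 is made up) out a dedicated Prism-facing port (port 2),
    # leaving everything else on the default campus path.
    rule = "priority=200,ip,nw_dst=10.10.0.0/16,actions=output:2"
    subprocess.run(["ovs-ofctl", "add-flow", "prism-br", rule], check=True)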
Optiputer/Quartzite Enabled SDSC to Build Low-Cost High-Performance Storage
[Diagram: Data Oasis storage with a 120Gbps connection toward the Prism core]
Really Pushing Data from Storage (what 800+ Gbps looks like)
[Jun 2012 measurement: two MLAG-aggregated paths carrying 485Gb/s + 350Gb/s]
• Saturation test: IOR testing through Lustre: 835 Gb/s = 104GB/sec (sanity-checked in the sketch below)
• OASIS was designed NOT to be an island; this is why we chose 10GbE instead of IB
• Papadopoulos set a performance target of 100+GB/sec for the Gordon Track 2 proposal (submitted in 2010). Most people at SDSC thought it was “crazy”
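A quick sanity check on the quoted numbers, comparing against the Data Oasis design point listed earlier (the comparison is ours, not the slide's):

    # Saturation-test arithmetic for the Jun 2012 IOR/Lustre run.
    measured_gbps = 485 + 350                   # the two MLAG legs shown in the figure
    print(measured_gbps)                        # 835 Gb/s
    print(measured_gbps / 8)                    # ~104 GB/s, as reported
    design_gb_per_s = 64 * 2                    # 64 Oasis servers x 2 GB/s disk-to-network
    print(measured_gbps / 8 / design_gb_per_s)  # ~0.82 of the aggregate design point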
Summary
• Big Data + High Capacity inexpensive switching + High
Throughput Instruments + Significant Computing and
Data Analysis Capacity all form a “perfect storm”
– OptIPuter predicted this in 2002, Quartzite amplified that
prediction in 2004. We are now here.
• You have to work on multiple ends of the problem –
Devices, Networks, Cost$
• Key insight: Recognize the fundamental differences
between scaling challenges (e.g. Campus 50K users vs.
Prism’s 500 Users (the 1%))
• Build for Burst capacity