End-to-End Performance Tuning and Best Practices

End-to-End Performance Tuning and Best Practices
Wednesday, September 29, 2015
Moderator: Charlie McMahon, Tulane University
Jan Cheetham, University of Wisconsin-Madison
Chris Rapier, Pittsburgh Supercomputing Center
Paul Gessler, University of Idaho
Maureen Dougherty, University of Southern California (USC)
Slide 1
Paul Gessler
Professor & Director, Northwest Knowledge Network
University of Idaho
Slide 2
Enabling 10 Gbps connections to the Idaho Regional Optical Network
• UI Moscow campus network core
• Northwest Knowledge Network and DMZ
• DOE’s Idaho National Lab
• Implemented perfSONAR monitoring across Idaho
• Institute for Biological and Evolutionary Studies
Slide 3
Slide 4
Slide 5
Jan Cheetham
Research and Instructional Technologies Consultant
University of Wisconsin-Madison
Slide 6
University of Wisconsin Campus Network
[Network diagram: 100G connection to the Internet2 Innovation Network, Science DMZ, perfSONAR, campus network distribution, and research sites including SSEC, IceCube, HEP, Engineering, CHTC, WID, WEI, LOCI, and Biotech]
Slide 7
Diagnosing Network Issues
perfSONAR helped uncover problems such as:
• TCP window size issues on transfers to San Diego (see the sketch below)
• An optical fiber cut affecting a latency-sensitive link between SSEC and NOAA
• A line card failure causing dropped packets on a research partner's (WID) LAN
• Poorly performing transfers from internal data stores to distributed computing resources (HTCondor pools)
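For window-limited paths like the San Diego case, a quick pair of manual throughput tests often confirms the diagnosis. A minimal sketch using iperf3 (the destination DTN name is hypothetical; on perfSONAR hosts the regular tests would normally be scheduled through the toolkit rather than run by hand):

    # Baseline single-stream test with default socket buffers
    iperf3 -c dtn.sdsc.example.org -t 30

    # Repeat with a larger window; a big jump in throughput points to
    # window/buffer limits rather than loss or congestion. The larger
    # window only takes effect if net.core.rmem_max/wmem_max allow it.
    iperf3 -c dtn.sdsc.example.org -t 30 -w 64M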
Slide 8
Dealing with Firewalls
If you can't use a firewall:
• Apply a security baseline for research computing
If you must be behind a firewall:
• Upgrade the firewall to a high-speed backplane to allow 10G throughput to campus, in preparation for the campus network upgrade
• Plan to use SDN to shunt some traffic around the firewall (identified uses within our security policy); see the sketch below
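One way the SDN shunt can work, sketched with Open vSwitch as a stand-in (bridge name, port numbers, and the DTN subnet are placeholders; a production deployment would likely drive a hardware switch from a controller rather than use static rules):

    # Approved Science DMZ transfers (e.g., a DTN subnet) bypass the firewall:
    # forward them directly between the border-facing port 2 and DTN port 1.
    ovs-ofctl add-flow br0 "priority=200,ip,nw_src=10.130.0.0/24,actions=output:2"
    ovs-ofctl add-flow br0 "priority=200,ip,nw_dst=10.130.0.0/24,in_port=2,actions=output:1"

    # Everything else follows normal L2 forwarding toward the firewall path.
    ovs-ofctl add-flow br0 "priority=0,actions=normal"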
Slide 9
Challenges
• 100 GE line card failure (investigating a suspected buffer overflow)
• Separating spiky research traffic from the rest of the campus network traffic
• Distributed campus: getting the word out so everyone can take advantage
• Limitations of researchers' internal network environments
• Storage bottlenecks
Slide 10
Chris Rapier
Senior Research Programmer
Pittsburgh Supercomputing Center
Slide 11
XSight & Web10G
Goal: Use the metrics provided by Web10G to enhance workflows through early identification of pathological flows.
• A distributed set of Web10G-enabled listeners on Data Transfer Nodes across multiple domains
• Gather data on all flows of interest and collate it in a centralized DB (see the sketches below)
• Analyze the data to find marginal and failing flows
• Provide the NOC with actionable data in near real time
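The XSight listener works against the per-connection TCP statistics that Web10G exposes in the kernel, but the kind of data it samples can be illustrated with standard tools. A rough sketch, not the XSight implementation:

    # Periodically snapshot per-flow TCP state (cwnd, rtt, retransmissions)
    # for later analysis; Web10G exposes a much richer set of instruments.
    while true; do
        date -u +%FT%TZ
        ss -tin                      # -t TCP, -i internal TCP info, -n numeric
        sleep 10
    done >> /var/log/flow-snapshots.log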
Slide 12
Implementation
• Listener: a C application that periodically polls all TCP flows and applies a rule set to select flows of interest
• Database: InfluxDB, a time-series DB (see the sketch below)
• Analysis engine: currently applies a heuristic approach; development of models is in progress
• UI: a web-based logical map that lets engineers drill down to failing flows and display the collected metrics
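InfluxDB accepts writes over a simple HTTP line-protocol endpoint, so the hand-off from a listener to the central DB can be as small as the following sketch (database name, measurement, and tag/field names are illustrative, not the actual XSight schema):

    # Post one per-flow sample to the central time-series database;
    # InfluxDB timestamps the point on arrival if none is supplied.
    curl -XPOST 'http://influx.example.org:8086/write?db=xsight' --data-binary \
      'tcp_flow,src=dtn1.example.net,dst=dtn2.example.org rtt_ms=72.5,cwnd=425984,retrans=12'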
Slide 13
Results
• Analysis engine and UI are still in development
• Looking for partners for listener deployment (including NOCs)
• 6 months left under the EAGER grant; will be seeking to renew it
Slide 14
Maureen Dougherty
Director, Center for High-Performance Computing
USC
Slide 15
Trojan Express Network II
Goal: Develop a next-generation research network in parallel with the production network to address increasing research data transfer demands
• Leverage the existing 100G Science DMZ
• Instead of expensive routers, use cheaper high-end network switches
• Use OpenFlow running on a server to control the switch (see the sketch below)
• perfSONAR systems for metrics and monitoring
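As a rough illustration of the OpenFlow-from-a-server model, this is how a software switch would be pointed at an external controller; Open vSwitch stands in for the production hardware switch, whose own CLI differs, and the bridge name and controller address are placeholders:

    # Hand control of the switch's forwarding table to an OpenFlow
    # controller process running on a separate server.
    ovs-vsctl set-controller br0 tcp:192.0.2.10:6653
    ovs-vsctl set-fail-mode br0 secure   # no fallback to standalone L2 learning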
Trojan Express Network Buildout
Collaborative Bandwidth Tests
• 72.5 ms round trip between USC and Clemson
• 100 Gbps shared link
• 12-machine OrangeFS cluster at USC
– Each directly connected to a Brocade switch at 10 Gbps
• 12 clients at Clemson
• USC ran nuttcp sessions between pairs of USC and Clemson hosts (see the sketch below)
• Clemson ran file copies to the USC OrangeFS cluster
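A minimal sketch of one nuttcp sender/receiver pair (host name is hypothetical; the actual tests ran 12 such pairs in parallel):

    # On the Clemson receiver:
    nuttcp -S                              # start the nuttcp server

    # On the paired USC sender: 30-second test, 1-second interval reports,
    # socket buffer sized around the ~90 MB bandwidth-delay product
    nuttcp -T30 -i1 -w90m clemson-dtn01.example.edu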
Linux Network Configuration
Bandwidth-delay product: 72.5 ms × 10 Gbit/s = 90,625,000 bytes (≈90 MB)
• net.core.rmem_max = 96468992
• net.core.wmem_max = 96468992
• net.ipv4.tcp_rmem = 4096 87380 96468992
• net.ipv4.tcp_wmem = 4096 65536 96468992
• net.ipv4.tcp_congestion_control = yeah (YeAH-TCP)
• Jumbo frames enabled (MTU 9000)
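A sketch of how these values would be applied on a DTN (the values are the ones listed above; the interface name is a placeholder, and the settings belong in /etc/sysctl.d/ or /etc/sysctl.conf to persist across reboots):

    modprobe tcp_yeah                                   # YeAH-TCP module, if not built in
    sysctl -w net.core.rmem_max=96468992
    sysctl -w net.core.wmem_max=96468992
    sysctl -w net.ipv4.tcp_rmem="4096 87380 96468992"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 96468992"
    sysctl -w net.ipv4.tcp_congestion_control=yeah
    ip link set dev eth0 mtu 9000                       # jumbo frames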
Nuttcp Bandwidth Test
Peak transfer of 72 Gb/s with 9 nodes
Contact Information
Charlie McMahon, Tulane University
cpm@tulane.edu
Jan Cheetham, University of Wisconsin-Madison
jan.cheetham@wisc.edu
Chris Rapier, Pittsburgh Supercomputing Center
rapier@psc.edu
Paul Gessler, University of Idaho
paulg@uidaho.edu
Slide 21