UH SWARM: Dense perfSONAR Deployment With Small, Inexpensive Devices
Alan Whinery
U. Hawaii ITS
September 29, 2015
Slide 1
The Whole Small perfNode Thing

At a $2000 to $5000 price point, a typical perfSONAR node gets deployed at REN leaves and branches; deployment tends to be relatively sparse.

Some efforts focus on pS-node equivalence and Intel compatibility (~$200 price point).

Some focus on the value in smaller ($50) nodes: at that price point, you can buy 40 to 100 times as many nodes for the same money.
Slide 2
The Whole Small perfNode Thing

perfSONAR developer fork for small devices: https://github.com/perfsonar/project/wiki/perfSONAR-EndpointNode-Project

    $150 - $250 price range, focus on Intel architecture

RNP Brasil – MonIPE: http://monipe.rnp.br/

    Same type of nodes – ARM architecture – as our $50 price range

UH SWARM

    Our thing – BeagleBones, Raspberry Pis, etc. ARM, ~$50.
Slide 3
The Swarm


Wrote a paragraph into our CC-NIE campus networking proposal about making use of the recent availability of ~$50 computers to "sense" the network, using elements of perfSONAR.

This funded a project to deploy 100 nodes on one campus over 2 years, exploiting the ~$50 price point to build a dense mesh.
Slide 4
Goals/Challenges

Finding nodes to buy in the face of market exhaustion

Getting node deployment work-flow down to nil

Keeping recoveries of off-line nodes to a minimum

Tracking assets and reliability, generating metrics

Evaluating capabilities of the whole set-up

Developing a test program for many nodes

Slicing/Dicing data to see what it has to tell us

Developing visualizations and distillations to put tools in the hands of network maintainers, merging into the pS Toolkit
Slide 5
Devices We Have


Raspberry Pi – famous, $50, medium performance, file system on SD card, 100 Mb Ethernet, USB 2.0

BeagleBone Black – $50, more performance, FS on internal flash and/or SD card, 100 Mb Ethernet, USB 2.0

Honorable mention:

    CuBox i4 – $147, more performance, FS on SD, GigE, WiFi, USB 2.0

    MiraBox – $149, most performance, FS on SD, dual GigE, WiFi, USB 3.0
Slide 6
Reliability




Raspberry Pi (July 2014)

    UH ITS owns 47 – 1 has failed
    22 SD card hard failures
    10 file-system failures

BeagleBone Black Rev A/C (December 2013/April 2015)

    UH ITS owns 60 – 1 has corrupted firmware
    Of nodes in production, one had to be power-cycled, once

CuBox – one deployed, 6 months of service, zero problems (using SD from OEM)

Mirabox – promising, dual GigE ($150), wimpy kernel
Slide 7
SD Cards

DANE-ELEC 8 GB Class 4

    10 cards, 2 failures in light duty

SanDisk Ultra 8 GB Class 10

    10 cards, 0 hard failures, 3 FS corrupted in 42k hours

Kingston 8 GB Class 10

    10 cards, 0 hard failures, 7 FS corrupted in 42k hours

Kingston 4 GB Class 4

    20 hard failures in less than 20k hours (100% across 6 weeks, < 1000 hr MTBF)

SanDisk Ultra 8 GB Class 10

    Most recent batch of replacements
Slide 8
Year 1

Tried 10 BeagleBones, liked them

    And a few Raspberry Pis

The market vacuum around the release of BBB Rev. C made BBBs impossible to obtain

    Bought 43 Raspberries

Although we are going with BeagleBone Black for the completion, we could make Raspberries work if necessary.

Bought 2 Dell rack servers as test facilitators and data archives.
Slide 9
2nd Year Completion

50 BeagleBone Black Rev. C (4 GB internal flash)

    BBB internal flash is more reliable than SD
    Internal flash + SD card enables separating system/data partitions
    Better 100 Mb Ethernet performance

5 Raspberry Pi 2 Model B

As the number deployed approaches 100, we will be placing nodes in new/special roles.

Correlating topology from Netdot, and MRTG graphs, for context.
Slide 10
Management


Puppet / The Foreman

    https://puppetlabs.com/
    http://theforeman.org/

Easy to push changes and updates out to the swarm.

Also easy to push errors out to the swarm and end up needing 50 SSH sessions (sketch below).

Work-flow

Try to minimize per-node actions and attended setup

    RPi – ua-netinstall with tweaks for Puppetization
    BBB – custom SD card that auto-images the internal flash

Make individual nodes as interchangeable as possible

    (If you have a choice, use one type of device)
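Roughly what that 50-session chore looks like (a minimal Python sketch; the host names are hypothetical and this is not our actual tooling – Puppet normally handles pushes, `puppet agent -t` just forces a run):

import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical swarm inventory; real names would come from The Foreman.
HOSTS = ["swarm-%02d.example.edu" % n for n in range(1, 51)]

def run(host, cmd):
    # BatchMode makes a dead node fail fast instead of hanging on a prompt.
    return subprocess.run(["ssh", "-o", "BatchMode=yes", host, cmd],
                          capture_output=True, text=True, timeout=60)

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = {pool.submit(run, h, "sudo puppet agent -t"): h for h in HOSTS}
    for fut in as_completed(futures):
        host = futures[fut]
        try:
            r = fut.result()
            print(host, "ok" if r.returncode == 0 else "rc=%d" % r.returncode)
        except Exception as exc:  # timeout, unreachable node, etc.
            print(host, "unreachable:", exc)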
Slide 11
Characteristics Of Dense Sensor Deployment Within An Enterprise




A "sensor" is less complicated than a perfSONAR Toolkit node

A central perfSONAR buoy/MA orchestrates

Having many observations makes the loss of any single one less important.

You can correlate topology and test results to "triangulate" on the source of a problem.

It takes planning to avoid affecting user traffic

    Strategy is to "be" user traffic

The pS Toolkit as-built isn't really made for 100 nodes
Slide 12
Test Programs: powstream (owamp)

powstream puts 10 packets/second on the wire, 24 hours a day

    (there's been discussion about increasing the rate)

To some extent, apparently stochastic/probabilistic loss resembles stochastic/probabilistic loss at much higher rates – meaning the probabilistic loss that powstream encounters is probably the minimum of what a throughput test will encounter.
Slide 13
SideBar: Regular Global perfSONAR powstream

[Figure: many powstream paths, log-scaled loss mapped to a color gradient – an early idea about how to unify many graphs in front of your cerebral cortex.]

    Black = 0% loss
    Green → Yellow → Red gradient: low → medium → higher loss
    Gray = no data
    Log-scaled to avoid hiding low loss
    Time runs left to right; each 10-pixel row is one path
    In our campus network, everything was always black (no appreciable loss)
Slide 14
Test Programs: powstream (owamp)

powstream from a pS Toolkit node to/from each sensor node

Really, really, really boring at first glance. All loss appears to be about zero. There's always one or two paths losing a packet per day (1 in 864,000).

Standard deviation in latency is somewhat interesting; it may reflect queuing, and flares in latency std dev may precede loss events.

Longitudinal analysis reveals damaging loss rates that would otherwise be invisible.

Higher packet rates might expose low loss probabilities in shorter time (see the sketch below).
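Putting numbers on those last bullets (a back-of-the-envelope sketch; only the 10 pps rate comes from powstream itself, the higher rates are hypothetical):

# powstream: 10 packets/second, 24 hours a day
per_day = 10 * 86400          # 864,000 packets/day
p = 1.0 / per_day             # "one packet lost per day" ~ 1.16e-6 loss probability
print("loss probability ~ %.2e" % p)

# Expected wait for the first lost packet shrinks in proportion to probe rate:
for pps in (10, 100, 1000):
    hours = 1.0 / (p * pps) / 3600   # mean of the geometric wait, in hours
    print("%5d pps -> first loss expected after ~%.1f h" % (pps, hours))

At 10 pps a one-in-864,000 path shows about one loss per day; at 1000 pps the same loss probability would surface in about 15 minutes.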
Slide 15
[Figure: powstream loss graphs, 30 nodes, in/out]
Mathis, Semke, Mahdavi, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", ACM SIGCOMM Computer Communication Review, Vol. 27, No. 3, July 1997. (Slide used with permission.)
Speed Limits You Can't See

For 45 milliseconds RTT, the typical minimum to get onto the continental US from Hawaii:

Loss Rate    Packets Lost/Day     TCP AIMD Coastal Limit   TCP AIMD Coastal Limit
             (10 pps powstream)   @1460 MSS, 45 ms RTT     @8960 MSS, 45 ms RTT
                                  (Mbit/s)                 (Mbit/s)
1.82E-005    15.75                42.56                    261.18
2.25E-006    1.94                 121.11                   743.23
1.87E-006    1.62                 132.76                   814.72
9.38E-007    0.81                 187.58                   1151.16
6.05E-007    0.52                 233.55                   1433.28
5.93E-007    0.51                 236.03                   1448.52
3.35E-007    0.29                 314.03                   1927.21
2.51E-007    0.22                 362.49                   2224.57
1.74E-007    0.15                 435.64                   2673.49
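The limits in the table come from the Mathis et al. model cited above, throughput <= C * MSS / (RTT * sqrt(p)). The printed values are consistent with the conservative constant C ~= 0.7 (an inference from the numbers; the constant isn't stated on the slide). A sketch reproducing the first row:

def mathis_limit_mbps(mss_bytes, rtt_s, p, C=0.7):
    # TCP AIMD steady-state ceiling: C * MSS / (RTT * sqrt(p)), in Mbit/s
    return C * mss_bytes * 8 / (rtt_s * p ** 0.5) / 1e6

p = 1.82e-5                                         # first row's loss rate
print(round(p * 10 * 86400, 2))                     # packets lost/day at 10 pps -> ~15.7
print(round(mathis_limit_mbps(1460, 0.045, p), 2))  # -> ~42.6 Mbit/s
print(round(mathis_limit_mbps(8960, 0.045, p), 2))  # -> ~261.4 Mbit/s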
Slide 19
Test Programs:
50 Node Full Mesh TCP Throughput

At <= 100 Mbit/s, RPi and BBB throughput tests resemble real-life user flows

    Unlike a high-performance iperf tester, which "punches the network in the face"

I run a 50x50 full mesh iperf matrix (50 × 49 = 2450 tests) in about 7 hours (5-second tests).

Full-mesh traceroute is collected concurrently.

By scoring every hop encountered on the average performance of the paths it appears in, "per-hop confidence" can be derived (see the sketch after this list).

Using multi-rate UDP vs. TCP is worth investigating.
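A minimal sketch of that per-hop scoring on toy data (the data structures here are made up for illustration, not our test harness): each hop inherits the mean throughput of every path it appears in, so a hop scoring well below its neighbors becomes the suspect.

from collections import defaultdict

# Toy inputs: traceroute hop lists and iperf3 throughput (Mbit/s) per node pair.
paths = {
    ("A", "B"): ["10.0.0.1", "10.0.1.1", "10.0.2.1"],
    ("A", "C"): ["10.0.0.1", "10.0.3.1"],
    ("B", "C"): ["10.0.1.1", "10.0.3.1"],
}
mbps = {("A", "B"): 94.0, ("A", "C"): 40.0, ("B", "C"): 41.0}

scores = defaultdict(list)
for pair, hops in paths.items():
    for hop in hops:
        scores[hop].append(mbps[pair])

# "Per-hop confidence": mean throughput across the paths each hop appears in.
for hop in sorted(scores):
    print(hop, round(sum(scores[hop]) / len(scores[hop]), 1))
# 10.0.3.1 averages ~40 Mbit/s while the other hops average ~67+; it is the
# common element of the slow paths, so suspicion lands there.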
Slide 20
The Matrix

[Figure: full-mesh iperf3 result matrix; rows are sources, columns are destinations.]

Cut-out view of iperf3 tests to/from a chosen node: the highlighted row and column represent all tests to/from that chosen node.

Leaves one wondering what the correlation is between the pink squares showing retransmissions.
Slide 22
Correlating Full Mesh Throughput And Traceroute Results For Fault Isolation
Slide 23
[Figure: graph of per-hop "confidence", with colored links where retransmissions were observed (names/addresses obfuscated).]

This graph shows hops involved in in-bound throughput testing between a chosen node and all partners.

Each oval represents an IP interface as reported in traceroute output.

Graph rendered from test data with GraphViz (graphviz.org); a sketch of this step follows.
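A sketch of the rendering step: emit GraphViz DOT from per-hop scores and retransmission flags, then draw it with `dot -Tpng hops.dot -o hops.png` (the addresses, scores, and the retrans_edges structure are invented for illustration; GraphViz draws each node as an oval by default):

# One node per IP interface, edges along the traceroute path,
# colored red where retransmissions were observed.
path = ["10.0.0.1", "10.0.1.1", "10.0.3.1"]
score = {"10.0.0.1": 67.0, "10.0.1.1": 67.5, "10.0.3.1": 40.5}
retrans_edges = {("10.0.1.1", "10.0.3.1")}

with open("hops.dot", "w") as f:
    f.write("digraph hops {\n")
    for hop, s in score.items():
        f.write('  "%s" [label="%s\\n%.1f Mb/s"];\n' % (hop, hop, s))
    for a, b in zip(path, path[1:]):
        color = "red" if (a, b) in retrans_edges else "black"
        f.write('  "%s" -> "%s" [color=%s];\n' % (a, b, color))
    f.write("}\n")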
Data Archiving

perfSONAR MA

    Exposing some ways in which MA handling of long-term, diverse data could be optimized

Correlating such things as early/late "bathtub curve" failures per equipment life cycle (see Wikipedia's "bathtub curve" article)

Trending probabilistic loss by months/years

Etc.
Slide 25
Ongoing

perfSONAR Toolkit integration

    Not so much new development as making some pieces fit together

Correlation of other sources to zero in on a fault

    NetDot
    Flows/MRTG

Ancillary programs

    Log collection (honeypot-ish info)
    Name resolution tests
    v6/v4 precedence
Slide 26