ICFA/SCIC Monitoring Update

Monitoring Results Update
Les Cottrell – SLAC
Prepared for the ICFA-SCIC, CERN July 10, 2002
http://www.slac.stanford.edu/grp/scs/net/talk/icfa-jul02.html
Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end
Performance Monitoring (IEPM), also supported by IUPAP
New stuff
• Porting
• Prediction
• Trans-Pacific throughput
• Topology
• Next
– UDPmon
– Web services access to PingER and IEPM-BW data
– Integrate iperf with Web100
– MAGGIE proposal to DoE
High throughput IEPM-BW
• Monitoring about 35 sites from SLAC
• Ported from Solaris to Linux
• Automating & documenting porting procedures
• Installing monitoring hosts at:
– Manchester, FNAL, NIKHEF, UMich, Internet2, APAN-KR, INFN/Trieste
– Making toolkit code available for other measurement projects (Atlas/UMich, Internet 2 …)
SLAC IEPM-BW Deployment
Prediction
• Added exponentially weighted moving average (EWMA) to prediction method
• Less than 2% difference compared to just averaging last 5 points
[Figure: Iperf TCP throughput (Mbits/s, axis 150 to 450) from 6/23/02 to 7/5/02; series: Observed, Average, EWMA]
error = average(abs(forecast-observed)/observed)
33 hosts     error   stdev
Iperf TCP    10%     8%
Bbcp mem     17%     15%
Bbcp disk    15%     13%
bbftp        16%     12%
Simple method gives predictions within 20% on average
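The two forecasters and the error metric on this slide can be sketched in a few lines of Python. This is only an illustration, not the IEPM-BW code: the function names, the smoothing factor `alpha=0.3`, the 5-point warm-up, and the throughput samples are all assumptions made for the example.

```python
from statistics import mean


def moving_average_forecast(history, window=5):
    """Forecast the next value as the mean of the last `window` observations."""
    return mean(history[-window:])


def ewma_forecast(history, alpha=0.3):
    """Forecast the next value with an exponentially weighted moving average.

    alpha is an assumed smoothing factor; the slides do not state the one used.
    """
    estimate = history[0]
    for value in history[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def mean_relative_error(observed, forecaster, warmup=5):
    """error = average(abs(forecast - observed) / observed), one step ahead."""
    errors = []
    for i in range(warmup, len(observed)):
        forecast = forecaster(observed[:i])  # uses only earlier measurements
        errors.append(abs(forecast - observed[i]) / observed[i])
    return mean(errors)


# Made-up throughput samples in Mbits/s, for illustration only.
samples = [310, 295, 330, 342, 301, 288, 325, 350, 315, 298, 305, 340]

avg_error = mean_relative_error(samples, moving_average_forecast)
ewma_error = mean_relative_error(samples, ewma_forecast)
print(f"last-5 average error: {avg_error:.1%}")
print(f"EWMA error:           {ewma_error:.1%}")
```

Each forecast sees only the measurements taken before the point it predicts, so the error is a one-step-ahead figure comparable in spirit to the table above.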
Maximum Throughput on Transatlantic Links (155 Mbps)
• 8/01 105 Mbps reached with 30 Streams: SLAC-IN2P3
• 9/1/01 102 Mbps in One Stream: CIT-CERN
• 11/5/01 125 Mbps in One Stream (mod kernel): CIT-CERN
• 135 Mbps in One Stream (mod kernel): CIT-Chicago
• 1/09/02 190 Mbps for One Stream shared on two 155 Mbps links
• 3/11/02 120 Mbps Disk-to-Disk with One Stream on a 155 Mbps link (Chicago-CERN)
• BaBar Goal: 600 Mbps Throughput in 2002
Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/;
and the Internet2 E2E Initiative: http://www.internet2.edu/e2e
Trans-Oceanic throughput
About 1/3 of the hosts being measured have throughputs consistently over 155 Mbits/s.
Over 10% exceed 300 Mbits/s.
High throughput across oceans is now achievable.
Traceroute topology
• Traceping in VMS/DCL runs traceroutes hourly
and pings nodes along route
– Data from about 6 measurement sites
– Rewritten for Unix/perl
– Running in production at SLAC, with traceroutes to 33 high-performance sites
• Today reporting is via tables
• Need a more graphical way to look at routes
– Map the grid, provide easy-to-read topology maps of grid connections
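As a rough illustration of how traceroute output can feed a topology map, the Python sketch below turns one traceroute listing into an ordered hop list and then into graph edges. It is not the Traceping code: the function names and the sample listing are hypothetical, the addresses are documentation-range placeholders, and the parser assumes the common numeric (`traceroute -n` style) layout of hop number followed by address.

```python
import re

# Matches a hop line such as " 3  198.51.100.7  12.031 ms  11.984 ms"
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)")


def parse_hops(traceroute_text):
    """Return the responding hop addresses, in order of appearance."""
    hops = []
    for line in traceroute_text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or blank line
        address = match.group(2)
        if address == "*":
            continue  # hop did not respond; it is skipped, not guessed
        hops.append(address)
    return hops


def route_edges(source, hops):
    """Pair consecutive nodes on the path into (from, to) edges."""
    path = [source] + hops
    return list(zip(path, path[1:]))


# Hypothetical sample output using documentation-range addresses.
SAMPLE = """traceroute to 203.0.113.9 (203.0.113.9), 30 hops max, 40 byte packets
 1  192.0.2.1  0.412 ms  0.380 ms  0.371 ms
 2  * * *
 3  198.51.100.7  12.031 ms  11.984 ms  12.102 ms
 4  203.0.113.9  70.554 ms  70.498 ms  70.611 ms
"""

hops = parse_hops(SAMPLE)
edges = route_edges("monitor", hops)
for a, b in edges:
    print(f"{a} -> {b}")
```

Merging the edge lists from many destinations gives the tree that a topology map would draw. Because silent hops are dropped, an edge may span a router that did not answer.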
[Figure: SLAC to N. Am topology map; nodes colored by network provider; labeled nodes: ANL, LANL, LBL, Rice, ORNL, TRIUMF, Caltech; click on node for more detail]
[Figure: Rice to IEPM-BW sites by ISP; click here for RTT]
[Figure: Rice to IEPM-BW by RTT; labeled nodes: KEK.jp, SDSC, dl.uk, LANL, Triumf, Wisc, infn.it, ANL, Riken.jp, IN2P3; click on branch node for subgraph; click on end-node to see route details]
[Figure: Detailed subgraph]
Next steps
• Develop/extend management, analysis, reporting, navigating tools
• Get improved forecasters (in particular NWS multivariate tools) and
quantify how they work, provide tools to access
• Optimize intervals (using forecasts and lighter-weight measurements) and
durations; tie Web100 in to iperf to see when a transfer is out of slow-start
• Evaluate self rate limiting application (bbcp), look at using Web100 for
feedback loop
• Extend analysis of passive Netflow measurements
• Add gridFTP (with Allcock@ANL), UDPmon (RHJ@manchester) & new BW
measurers – netest (Jin@LBNL), pathrate, pathload (Dovrolis@Udel)
• Understand correlations, validate various tools, choose optimum set
• Make data available by std methods (e.g. web services, MDS, GMA, …) –
with Dantong@BNL, Jenny Schopf@ANL & Tierney@LBNL
• Get funding (MAGGIE proposal)
ICFA/SCIC Monitoring WG
Goals
• Obtain as uniform a picture as possible of
the present performance of the
connectivity used by the ICFA community
• For end of 2002, prepare a report on the
performance of HEP connectivity,
including, where possible, the identification
of any key bottlenecks or problem areas.
Administrivia
• ICFA-SCIC-MON web page created:
http://www.slac.stanford.edu/xorg/icfa/scic-netmon/
• Email list icfa-scic-mon@slac.stanford.edu set up
• Membership:
Person             From          Represents
Les Cottrell       SLAC          US/Babar/ESnet
Richard H-Jones    Manchester    UK/JAnet
Sergei Berezhnev   MSU, RUHEP    Russia/FSU
Sergio Novaes      FNAL          L. America
Fukuko Yuasa       KEK           Japan & E. Asia
Sylvain Ravot      Caltech       US/CMS
Daniel Davids      CERN          CERN, Europe, LHC
Shawn McKee        U Mich        Atlas/I2
Getting started
• Decide on regions (proposal), decide what
measurements/reports needed for ICFA report
• Goals:
– A physical region, generally recognized
– Similar connectivity and issues,
– Including countries with HENP requirements, and
with monitorable connections
– Limited number of regions
Top Level (10 regions)
[Map labels: Moscow, Siberia, Belorussia, Ukraine, Kazakhstan, Azerbaijan, Turkey, Georgia, Iran, Egypt, Israel, Mongolia, China, Japan, Korea, India, Colombia, Malaysia, Uganda, Peru, Brazil, Uruguay, Chile, S. Africa, Australia, Argentina]
– N. America, Latin America (includes Central & S. America), Europe, S. E.
Europe, FSU, Africa, S. E. Asia, S. W. Asia, Australasia (incl. Pacific Islands).
A few subdivisions (now 17 regions)
• Europe, break out Baltic States
• S. E. Asia, break out
– Japan, China
• S. W. Asia, break out
– S. Asia (IN, PK, Bangladesh..)
– Caucasus
• N. America, break apart U.S. & Canada
Questions
• Is Israel closer to Europe?
• Is Greece part of S. E. Europe?
• Do we break out Mid-East?
– Is Egypt part of Africa or Mid East?
• Do we break out S. FSU republics?
Content of Report
• Guidance on what to put in report?
– Methodology
• Low impact for developing regions, see previous report and
PingER pages, deployment
• Do we want to address hi-perf throughput (e.g. iperf), which is only
relevant for W. Europe, Japan & N. America?
– Performance of A&R nets
• Which ones?
– Hi-perf: ESnet, Internet 2, Janet, INFN, DFN, CAnet, IN2P3,
Renater, Japan
– Others
• Growth loss, derived throughput; traffic, improvements
• Current performance tables
Throughput quality improvements
TCPBW < MSS/(RTT*sqrt(loss))
• 80% annual improvement ~ factor 10 in 4 years
• Balkans not keeping up
[Figure: derived throughput trends by region; labeled series include China]
Macroscopic Behavior of the TCP Congestion Avoidance Algorithm, Mathis,
Semke, Mahdavi, Ott, Computer Communication Review 27(3), July 1997
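The bound above and the growth figure can be checked numerically. The Python sketch below evaluates MSS/(RTT*sqrt(loss)) exactly as written on the slide, with no constant factor. The function names and the example path parameters (1460-byte MSS, 150 ms RTT, 1% loss) are assumptions chosen only for illustration.

```python
from math import sqrt


def tcp_bw_bound_mbps(mss_bytes, rtt_ms, loss_fraction):
    """Upper bound on TCP throughput, MSS / (RTT * sqrt(loss)), in Mbits/s.

    loss_fraction is the packet loss probability (0.01 means 1%).
    """
    rtt_s = rtt_ms / 1000.0
    bits_per_second = (mss_bytes * 8) / (rtt_s * sqrt(loss_fraction))
    return bits_per_second / 1e6


def compound_growth(annual_rate, years):
    """Total improvement factor after `years` at a fixed annual rate."""
    return (1 + annual_rate) ** years


# Assumed example path: 1460-byte MSS, 150 ms RTT, 1% loss.
bound = tcp_bw_bound_mbps(1460, 150, 0.01)
print(f"bound: {bound:.2f} Mbits/s")

# 80% per year compounded over 4 years, as quoted on the slide.
factor = compound_growth(0.80, 4)
print(f"4-year factor: {factor:.1f}")
```

Since loss enters under a square root, cutting loss by a factor of four only doubles the bound, while halving the RTT doubles it directly.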
Losses: World by region, Jan '02
• <1% = good, <2.5% = acceptable, <5% = poor, >5% = bad
• Russia, S America bad
• Balkans, M East, Africa, S Asia, Caucasus poor

Loss (%) by monitored region (rows) and monitor country (columns, number of monitors in parentheses):

Region         BR    CA    DK    DE    HU    IT    JP    RU    CH    UK    US   Avg
(monitors)    (1)   (2)   (1)   (1)   (1)   (3)   (2)   (2)   (1)   (3)  (16)
Canada        1.8   1.6   0.3   0.5   9.0   0.3   1.4  21.7   0.7   0.7   0.5   3.5
US            0.4   2.6   0.2   0.3   8.0   0.1   1.4  13.8   0.3   1.3   0.9   2.7
E Asia        1.2   3.5   1.0   1.1   9.0   0.9   2.0   5.2   1.5   1.4   1.5   2.6
Europe        0.4   5.6   0.3   0.5   5.4   0.4   1.3  15.5   1.1   1.0   1.0   2.9
NET           1.7   6.2   1.0   1.3   8.0   1.6   3.6  21.9   0.7   0.8   0.9   4.3
S Asia        1.6   7.3   0.1   3.1   9.2   3.0   3.9  17.9   1.5   3.1   3.0   4.9
S America    24.1  11.3   0.6   0.9   6.7  12.9   7.7  23.0   9.3   1.1   6.6   9.5
Russia       35.9  24.1  22.2  13.4  23.8  21.7  13.6   0.7   8.7  24.1  12.7  18.3
Avg           7.5   6.9   2.8   2.4   9.8   3.7   3.9  13.8   3.1   3.2   2.8   4.4
Pairs          64   144    54    67    70   203   190   114   209   192  1990

Rows with blank cells (values in source order; column positions not recoverable):
COM          0.2 0.3 0.3 0.2
C America    0.9 0.9
Australasia  0.8 1.8 1.3
FSU          4.5 0.5 9.8 0.5 1.6 11.2 4.3 1.2 2.0 4.0
Balkans      3.8 3.8
Mid East     4.6 1.4 3.0 8.5 2.8 3.2 11.8 2.0 2.5 2.1 4.2
Africa       5.8 1.5 12.0 1.2 4.2 11.9 2.0 1.9 2.5 4.8
Baltics      5.3 0.8 2.3 7.7 2.2 3.5 10.8 4.8 2.1 3.9 4.3
Caucasus     3.2 3.2
Region        Avg (NA + WEU + JP)   Pairs
COM            0.27                    23
Canada         0.74                   126
US             0.88                  2149
C America      0.89                    19
Australasia    1.30                    18
E Asia         1.61                   215
Europe         1.38                   852
NET            2.00                    85
FSU            2.09                    48
Balkans        3.83                   109
Mid East       2.70                    57
Africa         2.72                    45
Baltics        3.12                    67
S Asia         3.12                    97
Caucasus       3.22                    19
S America      6.30                   203
Russia        17.57                    91
Avg            3.16
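The loss-quality thresholds quoted on this slide map directly onto a small classifier. The Python sketch below is one possible reading of them, applied to a few of the regional averages listed above. The function name is hypothetical, and treating a loss of exactly 5% as "bad" is an assumption, since the slide leaves that boundary open.

```python
def classify_loss(loss_percent):
    """Map a packet loss percentage to the quality labels used on the slide."""
    if loss_percent < 1.0:
        return "good"
    if loss_percent < 2.5:
        return "acceptable"
    if loss_percent < 5.0:
        return "poor"
    return "bad"  # assumption: exactly 5% is counted as bad


# A few regional averages taken from the table above (percent loss).
regional_loss = {
    "US": 0.88,
    "Europe": 1.38,
    "Balkans": 3.83,
    "S America": 6.30,
    "Russia": 17.57,
}

labels = {region: classify_loss(loss) for region, loss in regional_loss.items()}
for region, label in labels.items():
    print(f"{region:10s} {regional_loss[region]:6.2f}%  {label}")
```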