December 3rd

LSST end-to-end call – December 3rd
Topics for discussion
1) General updates since the last end-to-end call on July 23rd
a. Layer 2 Virtual Connectivity to NCSA:
i. Strategy discussed on July 23rd: extend VLANs 4001 and 4002 from La Serena to NCSA, U.S.
1. AURA: Albert (from REUNA) extended VLANs 4001 and 4002 up to AURA's edge; the AURA team is still working to drop the VLANs at the perfSonar interfaces (Aug 6th).
2. NCSA: On Oct 27th the Layer 2 circuit was created using the AL2S service from Jacksonville to Chicago. Paul is currently working to drop the circuit internally at the perfSonar server.
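"Dropping" a VLAN at a perfSonar interface usually means terminating the 802.1Q tag on a subinterface of the host. The notes do not say how the hosts are configured, so the following is a minimal sketch assuming a Linux perfSonar host with iproute2; the parent interface name `eth1` and the helper name `vlan_subinterface_commands` are hypothetical. It only builds the command strings, which keeps it safe to run and easy to review.

```python
# Hypothetical sketch: build iproute2 commands that terminate 802.1Q-tagged
# VLANs on a Linux perfSonar host. The interface name "eth1" is an assumption.

VALID_VLAN_IDS = range(1, 4095)  # usable 802.1Q VIDs are 1..4094 (0 and 4095 are reserved)


def vlan_subinterface_commands(parent: str, vlan_ids: list[int]) -> list[str]:
    """Return `ip link` commands that create and bring up one tagged
    subinterface per VLAN ID on the given parent interface."""
    commands = []
    for vid in vlan_ids:
        if vid not in VALID_VLAN_IDS:
            raise ValueError(f"invalid 802.1Q VLAN ID: {vid}")
        name = f"{parent}.{vid}"
        commands.append(f"ip link add link {parent} name {name} type vlan id {vid}")
        commands.append(f"ip link set dev {name} up")
    return commands


if __name__ == "__main__":
    # VLANs 4001 and 4002 are the ones being extended from La Serena to NCSA.
    for cmd in vlan_subinterface_commands("eth1", [4001, 4002]):
        print(cmd)
```

Addressing the subinterfaces and making the change persistent across reboots depend on the host's network configuration tooling and are left out.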
b. Phases 1 and 2 of end-to-end testing plan
i. Software update: all perfSonar servers were updated to the latest version (3.5).
ii. Topology update and improvements:
1. The two Brocade CES switches in Chile were replaced by two Brocade MLXe routers.
iii. Network and Host Tunings:
1. Further tunings were applied to the TCP/IP stack and the NIC.
2. Bandwidth test results improved, but packet loss was detected somewhere between Level3 and REUNA in Chile.
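The notes do not list which TCP/IP settings were changed. A common starting point for long paths is to size the socket buffer ceilings from the bandwidth-delay product (BDP), the number of bytes that must be in flight to keep the path full. The sketch below assumes a Linux host and a hypothetical 150 ms round-trip time (not a measured value from these tests); the helper names and the min/default buffer values are illustrative assumptions.

```python
import math


def bdp_bytes(rate_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return math.ceil(rate_bps * rtt_s / 8)


def buffer_sysctls(max_bytes: int) -> list[str]:
    """Linux sysctl lines that raise the socket buffer ceilings to max_bytes.
    The min/default values (4096 87380 and 4096 65536) are illustrative assumptions."""
    return [
        f"net.core.rmem_max = {max_bytes}",
        f"net.core.wmem_max = {max_bytes}",
        f"net.ipv4.tcp_rmem = 4096 87380 {max_bytes}",
        f"net.ipv4.tcp_wmem = 4096 65536 {max_bytes}",
    ]


if __name__ == "__main__":
    RTT_S = 0.150  # hypothetical Santiago-NCSA RTT; replace with the measured value
    target = bdp_bytes(1e9, RTT_S)  # 1 Gbps, the Phase 3 target rate
    print(f"BDP at 1 Gbps: {target} bytes")
    for line in buffer_sysctls(target):
        print(line)
```

Buffers sized this way only remove the window limit; they cannot compensate for packet loss on the path.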
iv. Test scenario and findings:
1. In Santiago we currently have two perfSonar servers: one in the Level3-Santiago datacenter and the other in REUNA's office.
2. From Level3-Santiago to NCSA we currently average 3.5 Gbps using TCP and 1 Gbps using UDP (the UDP test is capped at 1 Gbps to match the Phase 3 requirements).
3. From REUNA to NCSA we initially got an average of 300 Mbps using UDP, but the results have been getting worse, and TCP tests now show less than 50 Mbps. The 2% packet loss occurring between the two hosts explains the low bandwidth.
4. Conclusion: Phase 2, from Santiago to NCSA, can be considered partially complete, but we need to understand what is degrading the results on the span between Level3 and REUNA's office. These poor results might impact the next phase of the end-to-end plan.
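The link between 2% loss and very low TCP throughput can be made concrete with the Mathis et al. approximation for a single loss-limited TCP flow: throughput ≤ (MSS/RTT)·(C/√p), with C = √(3/2). This is a sketch under stated assumptions: a 1460-byte MSS and a hypothetical 150 ms RTT, neither of which comes from these notes. The model describes Reno-style congestion control, so treat the result as an order-of-magnitude check rather than a prediction for the actual hosts. The function names are hypothetical.

```python
import math

MATHIS_C = math.sqrt(3 / 2)  # constant for the periodic-loss model, about 1.22


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on one TCP flow's steady-state throughput
    (Mathis et al.): (MSS / RTT) * (C / sqrt(p)), in bits per second."""
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be in (0, 1)")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


def max_loss_for_rate(mss_bytes: int, rtt_s: float, rate_bps: float) -> float:
    """Invert the model: the loss rate at which one flow can still reach rate_bps."""
    return (mss_bytes * 8 * MATHIS_C / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    MSS = 1460    # assumed standard Ethernet MSS (no jumbo frames)
    RTT_S = 0.150  # hypothetical RTT, not a measured value
    bound = mathis_throughput_bps(MSS, RTT_S, 0.02)  # 2% loss, as observed
    print(f"per-flow bound at 2% loss: {bound / 1e6:.2f} Mbps")
    print(f"loss tolerable for 1 Gbps: {max_loss_for_rate(MSS, RTT_S, 1e9):.1e}")
```

Under these assumptions the per-flow bound at 2% loss is below 1 Mbps, and reaching 1 Gbps with one flow would require a loss rate on the order of 1e-8. This is why locating the loss matters more than further host tuning.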
c. Current Challenges:
i. In response to the performance issues seen so far, we are collecting more data about all physical paths interconnecting these perfSonar nodes, and monitoring switch interface counters such as utilization, errors, discards, broadcast and unicast packets, and interface buffers.
ii. There is also an effort to start the monitoring plan described in the document LSST Operations and Management Plan:
1. We will create a dashboard (or a similar view) in our Zabbix to provide all the information regarding the paths between the perfSonar nodes.
iii. I have requested from all networks involved in the tests access to their physical network documentation, as well as read-only SNMP access to all devices in the physical path. SNMP access to ports was requested in order to obtain statistics including unicast packets, bandwidth, interface drops/errors, and broadcast packets. This information is important for correlating network behavior with the test results. Current status:
1. REUNA: provided the diagram (Nov 3rd). We requested additional details, such as port names and bandwidth. They are still discussing internally whether SNMP access will be provided.
2. AURA: We haven't heard back yet.
3. NCSA: I will discuss this with Paul as soon as we finish the Layer 2 domain extension using the AL2S service.
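The SNMP interface counters requested above are cumulative, so utilization and error rates come from the difference between two polls. A minimal sketch, assuming the 64-bit IF-MIB counters (ifHCInOctets, ifHCInUcastPkts), the 32-bit ifInErrors counter, and ifHighSpeed (in Mbps) have already been collected by a poller such as Zabbix; the function names are hypothetical. Wraps are handled with modular arithmetic, assuming at most one wrap per interval. A device reboot also resets counters and would look like a wrap, so a real collector should also check for counter discontinuities (for example via sysUpTime).

```python
COUNTER32_MOD = 2 ** 32  # e.g. ifInErrors, ifInDiscards
COUNTER64_MOD = 2 ** 64  # e.g. ifHCInOctets, ifHCInUcastPkts


def counter_delta(prev: int, curr: int, modulus: int) -> int:
    """Increase of a wrapping SNMP counter between two polls (at most one wrap assumed)."""
    return (curr - prev) % modulus


def utilization_pct(prev_octets: int, curr_octets: int,
                    interval_s: float, if_high_speed_mbps: int) -> float:
    """Average inbound utilization over the polling interval, in percent."""
    bits = counter_delta(prev_octets, curr_octets, COUNTER64_MOD) * 8
    capacity_bits = interval_s * if_high_speed_mbps * 1_000_000
    return 100.0 * bits / capacity_bits


def errors_per_unicast(prev_errors: int, curr_errors: int,
                       prev_ucast: int, curr_ucast: int) -> float:
    """Inbound errors per unicast packet delivered during the interval
    (broadcast and multicast packets are ignored in this sketch)."""
    errors = counter_delta(prev_errors, curr_errors, COUNTER32_MOD)
    packets = counter_delta(prev_ucast, curr_ucast, COUNTER64_MOD)
    return errors / packets if packets else 0.0
```

For example, with 60-second polls on a 10 Gbps port (ifHighSpeed = 10000), an increase of 37,500,000,000 in ifHCInOctets corresponds to 50% average utilization.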
2) Future directions and conclusion
a. Continue documenting the physical path between the perfSonar nodes
i. To complete the requirements of the perfSonar deployment and advance to the next testing phase of the project, AmLight engineers need a clear understanding of what is occurring on the physical paths between the perfSonar servers.
b. Understand the root cause of the packet loss between Level3-Santiago and REUNA's office
i. This is an important step in establishing a correlation between the one-way delay measurements and the bandwidth tests.
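Correlating one-way delay (or loss) with bandwidth results requires aligning the two measurement series in time first. A minimal sketch, assuming the results have already been exported from the perfSonar measurement archive as (unix_timestamp, value) pairs; the export step is not shown, and the helper names and the 300-second bucket size are hypothetical. Samples are averaged into fixed time buckets, and a Pearson coefficient is computed over the buckets present in both series. A strong correlation points to where to look; it does not by itself prove a root cause.

```python
import math
from collections import defaultdict


def bucketize(samples: list[tuple[float, float]], bucket_s: float) -> dict[int, float]:
    """Average (timestamp, value) samples into fixed-width time buckets."""
    groups: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        groups[int(ts // bucket_s)].append(value)
    return {k: sum(v) / len(v) for k, v in groups.items()}


def aligned_series(a, b, bucket_s: float) -> tuple[list[float], list[float]]:
    """Return two equal-length lists covering only buckets present in both series."""
    ba, bb = bucketize(a, bucket_s), bucketize(b, bucket_s)
    common = sorted(ba.keys() & bb.keys())
    return [ba[k] for k in common], [bb[k] for k in common]


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

Usage: `pearson_r(*aligned_series(owd_samples, throughput_samples, 300))`, where both inputs are hypothetical exported series. The bucket width should be at least as long as the test interval of the less frequent measurement.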
c. Fix the current detailed high-level solution diagram and update the Confluence Wiki: I'll update all diagram information as soon as we finish extending the VLANs from La Serena to NCSA.
d. Start Phase 3: Santiago to NCSA, 1 Gbps guaranteed bandwidth
e. Support for QoS and bandwidth reservation at AmLight: we are currently in the process of adopting an OpenFlow solution to address this requirement.