LSST end-to-end call – December 3rd

Topics for discussion

1) General updates since the last end-to-end call on July 23rd
   a. Layer 2 virtual connectivity to NCSA:
      i. Strategy discussed on July 23rd: extend VLANs 4001 and 4002 from La Serena to NCSA, U.S.
         1. AURA: Albert (from REUNA) extended VLANs 4001 and 4002 up to AURA's edge; the AURA team is still working to drop the VLANs at the perfSonar interfaces (Aug 06th).
         2. NCSA: On Oct 27th, the Layer 2 circuit was created using the AL2S service from Jacksonville to Chicago. Paul is currently working to drop the circuit internally at the perfSonar server.
   b. Phases 1 and 2 of the end-to-end testing plan
      i. Software update: all perfSonar servers were updated to the latest version (3.5).
      ii. Topology update and improvements:
         1. The two Brocade CES switches in Chile were replaced by two Brocade MLXe routers.
      iii. Network and host tuning:
         1. Further tuning was applied to the TCP/IP stack and the NIC.
         2. Bandwidth test results improved, but packet loss was detected somewhere between Level3 and REUNA in Chile.
      iv. Test scenarios and interesting findings:
         1. In Santiago, we currently have two perfSonar servers: one in the Level3-Santiago datacenter and the second in REUNA's office.
         2. From Level3-Santiago to NCSA, we currently see an average of 3.5 Gbps using TCP and 1 Gbps using UDP (the UDP test is capped at 1 Gbps to match the requirements of Phase 3).
         3. From REUNA to NCSA, we initially got an average of 300 Mbps using UDP, but results have since gotten worse, and TCP tests now show less than 50 Mbps. The 2% packet loss between the two hosts explains the low bandwidth achieved.
         4. Conclusion: Phase 2, from Santiago to NCSA, can be considered partially completed, but we need to understand what is impacting the results on the span between Level3 and the REUNA office. The poor results might impact the next phase of the end-to-end plan.
   c. Current challenges:
      i. In response to the performance issues seen so far, we are collecting more data about all physical paths interconnecting these perfSonar nodes. We are also monitoring switch interface counters (utilization, errors, discards, broadcasts, and unicasts) and the switches' interface buffers.
      ii. We are also starting the monitoring plan described in the document "LSST Operations and Management Plan":
         1. We will create a dashboard in our Zabbix to present all the information about the paths between the perfSonar nodes.
      iii. I have asked all networks involved in the tests for access to their physical network documentation and for read-only SNMP access to all devices in the physical path. SNMP access to the ports was requested to collect statistics including unicast packets, bandwidth, interface drops/errors, and broadcast packets. This information is important for correlating network behavior with test results. Current status:
         1. REUNA: provided the diagram (Nov 3rd). We have requested additional details, such as port names, bandwidth, etc. They are still discussing internally whether SNMP access will be provided.
         2. AURA: We haven't heard back yet.
         3. NCSA: I will discuss this with Paul as soon as we finish the Layer 2 domain extension using the AL2S service.
2) Future directions and conclusion
   a. Continue documenting the physical path between the perfSonar nodes
      i. To complete the requirements of the perfSonar deployment and advance to the next testing phase of the project, AmLight engineers need a clear understanding of what is occurring on the physical paths between the perfSonar servers.
   b. Understand the root cause of the packet loss between Level3-Santiago and REUNA's office
      i. This is an important step toward correlating the one-way delay measurements with the bandwidth tests.
   c. Fix the current detailed high-level solution diagram and update the Confluence Wiki: I'll update all diagram information as soon as we finish extending the VLANs from La Serena to NCSA.
   d. Start Phase 3: Santiago to NCSA, 1 Gbps guaranteed bandwidth
   e. Support for QoS and bandwidth reservation at AmLight: we are currently in the process of adopting an OpenFlow solution to address this requirement.
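The notes attribute the sub-50 Mbps TCP results between REUNA and NCSA to roughly 2% packet loss. One way to see why loss at that level is so damaging is the well-known Mathis et al. approximation for a single loss-limited, Reno-style TCP flow: BW ≈ (MSS/RTT) · (C/√p), with C ≈ √(3/2). The minimal sketch below evaluates that bound. The 1460-byte MSS and 150 ms RTT are hypothetical placeholders, not values measured in these tests. Modern congestion control (e.g. CUBIC) and parallel streams will deviate from this simple model.

```python
import math

# Mathis et al. constant for a Reno-style flow with periodic loss: C = sqrt(3/2)
MATHIS_C = math.sqrt(3.0 / 2.0)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on single-flow TCP throughput, in bits/s.

    BW <= (MSS / RTT) * (C / sqrt(p)). Only meaningful for a loss-limited,
    Reno-style flow; treat it as an order-of-magnitude estimate.
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("MSS and RTT must be positive")
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be in (0, 1)")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Hypothetical path parameters (NOT measured): 1460-byte MSS, 150 ms RTT.
    mss, rtt = 1460, 0.150
    for p in (0.0001, 0.001, 0.02):
        mbps = mathis_throughput_bps(mss, rtt, p) / 1e6
        print(f"loss={p:.2%}  ->  ~{mbps:.2f} Mbps per flow")
```

Because throughput scales with 1/√p and 1/RTT, a long path combined with a loss rate in the percent range bounds a single flow far below the link capacity. This is consistent with treating the loss, rather than capacity, as the limiting factor. The real RTT between the perfSonar hosts should replace the placeholder value.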
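The read-only SNMP access requested from the partner networks is meant to turn raw interface counters into rates that can be lined up against test runs. Below is a minimal sketch of that conversion. It assumes two polls of the standard IF-MIB counters: ifHCInOctets and ifHCInUcastPkts (Counter64), and ifInErrors and ifInDiscards (Counter32). It derives utilization and error/discard rates between polls and handles a single counter wrap. The `InterfaceSample` type and the function names are hypothetical, and fetching the samples (via Zabbix, net-snmp, etc.) is left out.

```python
from dataclasses import dataclass

COUNTER32_MOD = 2**32
COUNTER64_MOD = 2**64


@dataclass
class InterfaceSample:
    """One SNMP poll of a single interface (hypothetical structure)."""
    timestamp_s: float
    in_octets: int      # IF-MIB ifHCInOctets (Counter64)
    in_ucast_pkts: int  # IF-MIB ifHCInUcastPkts (Counter64)
    in_errors: int      # IF-MIB ifInErrors (Counter32)
    in_discards: int    # IF-MIB ifInDiscards (Counter32)


def counter_delta(old: int, new: int, modulus: int) -> int:
    """Difference between two counter readings, tolerating one wrap."""
    return (new - old) % modulus


def interface_rates(prev: InterfaceSample, cur: InterfaceSample,
                    if_speed_bps: float) -> dict:
    """Per-second rates and utilization between two polls."""
    interval = cur.timestamp_s - prev.timestamp_s
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    octets = counter_delta(prev.in_octets, cur.in_octets, COUNTER64_MOD)
    pkts = counter_delta(prev.in_ucast_pkts, cur.in_ucast_pkts, COUNTER64_MOD)
    errors = counter_delta(prev.in_errors, cur.in_errors, COUNTER32_MOD)
    discards = counter_delta(prev.in_discards, cur.in_discards, COUNTER32_MOD)
    in_bps = octets * 8 / interval
    return {
        "in_bps": in_bps,
        "utilization_pct": 100.0 * in_bps / if_speed_bps,
        "in_ucast_pps": pkts / interval,
        "errors_per_s": errors / interval,
        "discards_per_s": discards / interval,
    }


if __name__ == "__main__":
    # Illustrative values only: a 60 s poll interval on a 10 Gbps port.
    prev = InterfaceSample(0.0, 0, 0, 0, 0)
    cur = InterfaceSample(60.0, 7_500_000_000, 50_000_000, 0, 120)
    print(interface_rates(prev, cur, if_speed_bps=10e9))
```

The 64-bit (HC) octet counters are used because 32-bit octet counters can wrap in well under a minute on multi-gigabit links. The modulo arithmetic only tolerates one wrap per poll interval.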
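Both the SNMP request and the root-cause item aim to correlate network behavior (interface discards and errors, one-way delay) with perfSonar throughput results. Below is a minimal sketch of that step. It assumes the two series have already been aligned to the same test intervals, and it computes a plain Pearson correlation coefficient with the standard library. The sample values in the usage lines are made up for illustration and are not measurements from these tests.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Illustrative, made-up values aligned per test interval (NOT real data):
    discards_per_s = [0.0, 0.5, 2.0, 8.0, 15.0]
    tcp_mbps = [290.0, 240.0, 150.0, 60.0, 40.0]
    r = pearson(discards_per_s, tcp_mbps)
    print(f"Pearson r between discards and TCP throughput: {r:.2f}")
```

A strongly negative coefficient on a given device or port would only flag it as a candidate for closer inspection. Correlation alone does not establish where the loss originates.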