Measuring Control Plane Latency in SDN-enabled Switches
Keqiang He, Junaid Khalid, Aaron Gember-Jacobson, Sourav Das, Chaithan Prakash, Aditya Akella, Li Erran Li and Marina Thottan

Latency in SDN
[Figure: SDN apps sit on top of a centralized controller (control plane), which exchanges OpenFlow msgs with the switches (data plane)]
Time taken to install 100 rules? Some XXX msecs? It can be as large as 10 secs!

Do SDN apps care about latency?
• Intra-DC traffic engineering: MicroTE [CoNEXT'11] routes predictable traffic on short time scales of 1-2 sec
• Fast failover: affected flows must be rerouted quickly in the face of failures; longer update times increase congestion and drops
• Mobility: SoftCell [CoNEXT'13] advocates SDN in cellular networks; route setup must complete within ~30-40 ms
Latency can significantly undermine the effectiveness of many SDN apps.

Factors contributing to latency?
• Control software design and distributed controllers
• Speed of control programs and network latency
• Latency in network switches: has not received much attention

Our Work
Two contributions, focused on latency in network switches:
• Systematic experiments to quantify control plane latencies in production switches
• Factors that affect latency, and low-level explanations

Elements of Latency – inbound latency
[Figure: switch internals: a CPU board (OF agent, CPU, memory, PCI, ASIC SDK, DMA) attached to the forwarding engine (switch fabric, hardware tables, PHY); a packet with no hardware match is passed up to the controller]
Inbound latency:
• I1: send to ASIC SDK
• I2: send to OF agent
• I3: send to controller

Elements of Latency – outbound latency
[Figure: same switch internals; a rule update from the controller travels down through the CPU board into the hardware tables]
Outbound latency:
• O1: parse OF msg
• O2: software schedules the rule
• O3: reordering of rules in the table
• O4: rule is updated in the hardware table

Measurement Scope
• Inbound latency: increases with flow arrival rate; increases with interference from outbound msgs; higher CPU usage for higher arrival rates
• Outbound latency: insertion, modification, deletion
Please see our paper for details.

Measurement Methodology

Switch        CPU     RAM    OF version   Flow table size   Data plane
Vendor A      2 GHz   2 GB   1.0          4096              40x10G + 4x40G
Vendor B-1.0  1 GHz   1 GB   1.0          896               14x10G + 4x40G
Vendor B-1.3  1 GHz   1 GB   1.3          1792 (ACL)        14x10G + 4x40G
Vendor C      ?       ?      1.0          750               48x10G + 4x40G

Insertion Latency Measurement
• Insert B rules in a burst (back-to-back); per-rule insertion latency = t1 - t0
• Pre-install 1 default "drop all" rule with low priority
[Figure: setup: the SDN controller talks to the OpenFlow switch over the control channel on eth0; Pktgen pushes 1 Gbps of 64B Ethernet packets into eth1; Libpcap captures the flows coming out of eth2. t0 = when a rule's flow_mod is sent, t1 = when the first packet matching that rule appears at the output.]
A post-processing sketch for computing t1 - t0 from these two traces follows below.
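The deck does not include the measurement tooling itself, so the following is only a minimal post-processing sketch of the per-rule latency computation described above, not the authors' tool. It assumes (hypothetically) that the controller side logs one "<dst_ip> <unix_timestamp>" line per flow_mod it sends (t0), that each rule matches a distinct destination IP, that the libpcap capture on eth2 is saved to a pcap file, and that the two clocks are synchronized.

from scapy.all import IP, rdpcap

def load_send_times(path):
    # Assumed log format: one "<dst_ip> <unix_timestamp>" line per flow_mod sent (t0).
    t0 = {}
    with open(path) as f:
        for line in f:
            dst_ip, ts = line.split()
            t0[dst_ip] = float(ts)
    return t0

def first_seen_times(pcap_path):
    # t1 = capture timestamp of the first packet of each flow seen on eth2,
    # i.e. the first packet forwarded by the newly installed rule.
    t1 = {}
    for pkt in rdpcap(pcap_path):
        if IP in pkt and pkt[IP].dst not in t1:
            t1[pkt[IP].dst] = float(pkt.time)
    return t1

if __name__ == "__main__":
    t0 = load_send_times("flow_mod_send_times.log")   # hypothetical file name
    t1 = first_seen_times("egress_eth2.pcap")         # hypothetical file name
    for dst_ip in sorted(t0, key=t0.get):
        if dst_ip in t1:
            latency_ms = (t1[dst_ip] - t0[dst_ip]) * 1e3
            print(f"{dst_ip}: per-rule insertion latency = {latency_ms:.2f} ms")

The same subtraction applies to the modification and deletion experiments below, with t1 taken from the first packet whose forwarding behavior changes (e.g., the first packet passed by the low-priority "pass all" rule once a dropping rule is deleted).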
Insertion Latency – Priority Effects
• Affected by priority patterns on all the switches we measured
• Per-rule insertion always takes on the order of msecs
[Figure: per-rule insertion delay (ms) vs. rule #, for a burst of 100 rules on the Vendor B-1.0 switch and a burst of 800 rules on the Vendor A switch: (a) same priority, (b) increasing / decreasing priority]

Insertion Latency – Table Occupancy Effects
• Affected by the types of rules already in the table
• Root causes: TCAM organization, rule priority & table occupancy; switch software overhead
[Figure: average completion time (ms) vs. burst size on the Vendor B-1.0 switch, with 100 and 400 rules already in the table: (a) low-priority rules inserted into a table holding high-priority rules, (b) high-priority rules inserted into a table holding low-priority rules]

Modification Latency Measurement
• Modify B rules in a burst (back-to-back); per-rule modification latency = t1 - t0
• Same setup as the insertion experiment; pre-install B dropping rules

Modification Latency
• Same as insertion latency on Vendor A
• 2x the insertion latency on Vendor B-1.3
• Much higher on Vendor B-1.0 and Vendor C: poorly optimized switch software
• Not affected by rule priority, but affected by table occupancy
[Figure: per-rule modification delay (ms) vs. rule # on the Vendor B-1.0 switch: (a) 100 rules in the table, (b) 200 rules in the table]

Deletion Latency Measurement
• Delete B rules in a burst (back-to-back); per-rule deletion latency = t1 - t0
• Pre-install B dropping rules & 1 "pass all" rule with low priority

Deletion Latency
• Higher than insertion latency on all the switches we measured
• Not affected by rule priority, but affected by table occupancy
• Deletion incurs TCAM reorganization
[Figure: per-rule deletion delay (ms) vs. rule # on the Vendor A switch: (a) 100 rules in the table, (b) 200 rules in the table]

Recommendations for SDN app designers
• Insert rules in a hardware-friendly order (see the sketch below)
• Consider parallel rule updates if possible
• Consider switch heterogeneity
• Avoid explicit rule deletion if a timeout can be applied
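The deck does not spell out what a hardware-friendly order is, so the following is only an illustrative sketch of the first recommendation: buffer a burst of pending rule updates and dispatch them sorted by priority rather than in arrival order, on the assumption that the target switch installs a burst cheapest when priorities arrive monotonically (which direction is cheapest is switch-specific, per the heterogeneity point above). RuleUpdate, install_burst, and send_flow_mod are hypothetical names, not any controller's real API.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RuleUpdate:
    priority: int
    match: dict        # e.g. {"ipv4_dst": "10.0.0.1"}
    actions: list      # e.g. ["output:2"]

def install_burst(updates: List[RuleUpdate],
                  send_flow_mod: Callable[[RuleUpdate], None],
                  descending: bool = True) -> None:
    # Dispatch the whole burst in one priority-sorted pass so that (hopefully)
    # the switch does not reshuffle its TCAM on every individual insertion.
    for upd in sorted(updates, key=lambda u: u.priority, reverse=descending):
        send_flow_mod(upd)

if __name__ == "__main__":
    burst = [RuleUpdate(p, {"ipv4_dst": f"10.0.0.{p}"}, ["output:2"])
             for p in (5, 1, 9, 3)]
    # Stand-in for a real controller call; here we just print the dispatch order.
    install_burst(burst, send_flow_mod=lambda u: print("flow_mod", u.priority, u.match))

Which direction (descending vs. ascending) is faster would have to be probed per switch model, which is what the heterogeneity recommendation amounts to in practice.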
Summary
• Latency in SDN is critical to many applications; long latency can greatly undermine the effectiveness of many SDN apps
• Common assumption: latency is small or constant. In reality: latency is high and variable
• Latency varies with platform, type of operation, rule priorities, table occupancy, and concurrent operations
• Key factors: TCAM organization, switch CPU, and inefficient software implementations
Future switch silicon and software need careful design in order to fully utilize the power of SDN!

Inbound Latency
• Increases with flow arrival rate
• CPU usage is higher for higher flow arrival rates

Flow arrival rate (packets/sec)   Mean delay per packet_in (msec)   CPU usage
100                               3.32                              15.7%
200                               8.33                              26.5%
(Vendor A switch)

Inbound Latency
• Increases with interference from outbound msgs
• Root causes: low-power CPU and software inefficiency
[Figure: inbound delay (ms) vs. flow # on the Vendor A switch at a flow arrival rate of 200/s: (a) with flow_mod/pkt_out, (b) without flow_mod/pkt_out]

How accurate?
• 500 flows, 1 Gbps of 64B Ethernet packets
• Inter-packet gap of a flow = 256 us, which bounds how precisely t1 can be timestamped for any single rule
• Solution: either increase the packet rate, reduce the number of flows, or measure the accumulated latency for many flows (worked out in the sketch below)
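A quick back-of-the-envelope check of the 256 us figure above, assuming only the 64-byte frame itself is counted against the 1 Gbps budget (no preamble or inter-frame gap):

LINE_RATE_BPS = 1e9        # 1 Gbps of generated traffic
PKT_BITS = 64 * 8          # 64-byte Ethernet packets
NUM_FLOWS = 500

total_pps = LINE_RATE_BPS / PKT_BITS        # ~1.95 Mpps across all flows
per_flow_pps = total_pps / NUM_FLOWS        # ~3906 packets/sec per flow
gap_us = 1e6 / per_flow_pps                 # inter-packet gap within one flow

print(f"inter-packet gap per flow = {gap_us:.0f} us")   # -> 256 us

Each flow samples the data plane only every ~256 us, so a single rule's t1 is uncertain by up to that gap; sending faster, using fewer flows, or summing latency over a whole burst of rules amortizes the error, as the slide suggests.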