TCP Performance Monitoring (and Pyretic background) Jennifer Rexford Fall 2014 (TTh 3:00-4:20 in CS 105) COS 561: Advanced Computer Networks http://www.cs.princeton.edu/courses/archive/fall14/cos561/ Assignments • Buffer bloat –Extension till 3pm Tuesday October 14 • Next assignment: two options –BGP measurement Study BGP (in)stability “in the wild” If you haven’t taken COS 461 TA: Linpeng Tang <linpengt@CS.Princeton.EDU> –SDN firewall Build an SDN controller application on Pyretic If you haven’t taken the fall 2013 SDN grad seminar TA: Xin Jin <xinjin@CS.Princeton.EDU> –If you didn’t take either course, you can pick 2 Applications Inside Data Centers …. …. Front end Aggregator Server …. …. Workers 3 Challenges of Datacenter Diagnosis • Large complex applications –Hundreds of application components –Tens of thousands of servers • New performance problems –Update code to add features or fix bugs –Change components while app is in operation • Old performance problems (Human factors) –Developers may not understand network well –Small packets, delayed ACK, etc. 4 Diagnosis in Data Centers App logs: #Reqs/sec Response time Host 1% req.>200ms delay App Applicationspecific OS SNAP: Diagnose net-app interactions Generic, fine-grained, and lightweight Packet trace: Filter out trace for long delay req. Too expensive Packet sniffer Switch logs: #bytes/pkts per minute Too coarse-grained 5 Collect Data in TCP Stack • TCP understands net-app interactions – Flow control: How much data apps want to read/write – Congestion control: Network delay and congestion • Collect TCP-level statistics – Defined by RFC 4898 – Already exists in today’s Linux and Windows OSes 6 TCP-level Statistics • Cumulative counters – Packet loss: #FastRetrans, #Timeout – RTT estimation: #SampleRTT, #SumRTT – Receiver: RwinLimitTime – Calculate the difference between two polls • Instantaneous snapshots – #Bytes in the send buffer – Congestion window size, receiver window size – Representative snapshots based on Poisson sampling 7 Life of Data Transfer Sender App • Application generates the data – No network problem • Copy data to send buffer Send Buffer – Send buffer not large enough • TCP sends data to the network Network Receiver – Fast retransmission – Timeout • Receiver receives the data and ACK – Not reading fast enough (CPU, disk, etc.) – Not ACKing fast enough (Delayed ACK) 8 Characterizing Performance Limitations Send Buffer #Apps that are limited for > 50% of the time Network Receiver 1 App 6 Apps – Send buffer not large enough – Fast retransmission – Timeout 8 Apps – Not reading fast enough (CPU, disk) – Not ACKing fast enough (Delayed ACK) 144 Apps 9 Discussion • What to do if the monitoring is too expensive? • What to do in a public cloud, where each tenant runs its own virtual machine? • What to do in the wide area, between a server and a (remote) client? 10 Programming Abstractions for SDN Controller Applications http://www.cs.princeton.edu/~jrex/pape rs/pyretic-login13.pdf 11 Simple, Open Data-Plane API • Prioritized list of rules –Pattern: match packet header bits –Actions: drop, forward, modify, send to controller –Priority: disambiguate overlapping patterns –Counters: #bytes and #packets 1. src=1.2.*.*, dest=3.4.5.* drop 2. src = *.*.*.*, dest=3.4.*.* forward(2) 3. src=10.1.2.3, dest=*.*.*.* send to controller 12 (Logically) Centralized Controller Controller Platform 13 Protocols Applications Controller Application Controller Platform 14 Programming SDNs • The Good – Network-wide visibility – Direct control over the switches – Simple data-plane abstraction • The Bad – Low-level programming interface – Functionality tied to hardware – Explicit resource control • The Ugly Images by Billy Perkins – Non-modular, non-compositional – Programmer faced with challenging distributed programming problem 15 Combining Many Networking Tasks Monolithic application Monitor + Route + FW + LB Controller Platform Hard to program, test, debug, reuse, port, … 16 Modular Controller Applications A module for each task Monitor Route FW LB Controller Platform Easier to program, test, and debug Greater reusability and portability 17 Modules Affect the Same Traffic Each module partially specifies the handling of the traffic Monitor Route FW LB Controller Platform How to combine modules into a complete application? 18 From Rules to a Policy Function • Located packet –A packet and its location (switch and port) • Policy function –From located packet to set of located packets • Examples –Original packet: identity –Drop the packet: none –Modified header: modify(f=v) –New location: fwd(a) 19 From Bit Patterns to Predicates • OpenFlow –No direct way to specify dstip!=10.0.0.1 –Requires two prioritized bitmatches Higher priority: dstip=10.0.0.1 Lower priority: * • Using boolean predicates –Providing &, |, and ~ –E.g., ~match(dstip=10.0.0.1) 20 Virtual Header Fields • Unified abstraction –Real headers: dstip, srcport, … –Packet location: switch and port –User-defined: e.g., traffic_class • Simple operations –Match: match(f=v) –Modify: modify(f=v) • Example –match(switch=A) & match(dstip=‘1.0.0.3’) 21 Power of Policy as a Function • Dynamic policy –A stream of policy functions • Composition –Parallel: Monitor + Route –Sequential: Firewall >> Route •A >> (B + C) >> D •(A >> P) + (B >> P) •(A + B)>>P 22 Equational Theory A sign of a well-conceived language == a simple equational theory • Commutative (+) – P + Q == Q + P • Associative (+) – (P + Q) + R == P + (Q + R) • Drop unit – P + drop == P • Associative (>>) – (P >> Q) >> R == P >> (Q >> R) 23 Equational Theory • Id unit (>>) – id >> P == P – P >> id == P • Drop zero (>>) – drop >> P == drop – P >> drop == drop • If commutes (>>) – (if q then P else Q) >> R == if q then (P >> R) else (Q >> R) 24 A Simple Use Case (Modular Reasoning) firewall = if srcip = 1.1.1.1 then drop else id router = ... app = firewall >> router app == firewall >> router == (if srcip = 1.1.1.1 then drop else id) >> router == if srcip = 1.1.1.1 then (drop >> router) else (id >> router) == if srcip = 1.1.1.1 then drop else (id >> router) == if srcip = 1.1.1.1 then drop else router 25 Compiling Parallel Composition dstip = 1.2.3.4 fwd(1) dstip = 3.4.5.6 fwd(2) srcip = 5.6.7.8 count Monitor on source + Route on destination Controller Platform 26 Compiling Parallel Composition dstip = 1.2.3.4 fwd(1) dstip = 3.4.5.6 fwd(2) srcip = 5.6.7.8 count Monitor on source + Route on destination Controller Platform srcip = 5.6.7.8, dstip = 1.2.3.4 fwd(1), count srcip = 5.6.7.8, dstip = 3.4.5.6 fwd(2), count srcip = 5.6.7.8 count dstip = 1.2.3.4 fwd(1) dstip = 3.4.5.6 fwd(2) 27 Compiling Sequential Composition srcip = 0*, dstip=1.2.3.4 dstip=10.0.0.1 srcip = 1*, dstip=1.2.3.4 dstip=10.0.0.2 Load Balancer >> dstip = 10.0.0.1 fwd(1) dstip = 10.0.0.2 fwd(2) Routing Controller Platform 28 Compiling Sequential Composition srcip = 0*, dstip=1.2.3.4 dstip=10.0.0.1 srcip = 1*, dstip=1.2.3.4 dstip=10.0.0.2 Load Balancer >> dstip = 10.0.0.1 fwd(1) dstip = 10.0.0.2 fwd(2) Routing Controller Platform srcip = 0*, dstip = 1.2.3.4 dstip = 10.0.0.1, fwd(1) srcip = 1*, dstip = 1.2.3.4 dstip = 10.0.0.2, fwd(2) 29 Queries as Buckets • Forwarding to a “bucket” –Q = packets(limit=1,group_by=['srcip']) • Callback functions –Q.register_callback(printer) • Multiple kinds of buckets –Packets: with limit on number –Packet counts: with time interval –Byte counts: with time interval 30 SQL-Like Query Language • Get what you ask for –Nothing more, nothing less • SQL-like query language –Familiar abstraction –Returns a stream –Intuitive cost model • Traffic Monitoring Select(bytes) * Where(in:2 & srcport:80) * GroupBy([dstmac]) * Every(60) Learning Host Location * Minimize controller overhead Select(packets) GroupBy([srcmac]) * –Filter using high-level patterns SplitWhen([inport]) * Limit(1) –Limit the # of values returned –Aggregate by #/size of packets 31 Next Steps • Assignments – Buffer bloat due 3pm Tuesday October 14 – BGP/Pyretic assignment due 5pm Friday November 14 • Project proposals – Up to two pages, due 5pm Friday October 17 • Next two weeks – Interdomain routing 32