Fixing TCP in Datacenters
Costin Raiciu
Advanced Topics in Distributed Systems 2011

TCP Primer
• Loss recovery
  – Fast retransmit
  – Timeouts
• Congestion Control
• Buffer sizing

TCP Incast
• Why does it happen?
• How bad is it?

TCP Incast Kills Throughput

Lab Setup, Artificial Synchronization

A datacenter example

How can we fix it?
• Application level
  – Add jitter
  – Reduce response size
  – Use aggregation

Jitter increases mean delay

Fixing Incast at the Transport Layer
• Quickly recover after timeouts
• Or just avoid the timeouts

Quickly recover after timeouts
• Remove RTOmin bound
• Millisecond or finer timer resolution
• A whole paper about this in SIGCOMM 2009
  – But is this enough?

Fixing Incast at Lower Layers
• Add more buffering to switches?
  – Expensive
• Add shared buffering?

Datacenter Traffic

Datacenter Traffic (2)

DCTCP
• Want to be robust to incast
• Want to avoid interference between short and long flows
• Want to avoid buffer pressure
• We can do all this with small buffer usage

How might we do that?
• TCP shouldn’t blow out the buffer
  – Use delay?
• Explicit Congestion Notification in switches
  – Switches tell you when to back off
• But TCP will underutilize the network if cwnd < 2*BDP when it backs off
  – Halving the window then leaves less than one BDP in flight

DCTCP
• Find out alpha, the fraction of packets that saw congestion
• Set cwnd = cwnd * (1 - alpha/2)
• Alpha is estimated using ECN signals
  – EWMA

DCTCP Convergence
• Consider what happens when a new connection starts
• How long does it take to reach equilibrium?
  – With TCP?
  – With DCTCP?
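The DCTCP update rule from the slides (estimate alpha as an EWMA of the fraction of ECN-marked packets, then set cwnd = cwnd * (1 - alpha/2)) can be sketched as follows. This is a minimal illustration, not a kernel implementation: the function names, the once-per-window update, the EWMA gain g = 1/16, and the one-segment floor on cwnd are assumptions that do not appear on the slides.

```python
def update_alpha(alpha, marked, total, g=1 / 16):
    """Return the new EWMA estimate of the fraction of ECN-marked packets.

    alpha  -- previous estimate, in [0, 1]
    marked -- packets in the last window that carried a congestion mark
    total  -- packets acknowledged in the last window
    g      -- EWMA gain (assumed value, not given on the slides)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = marked / total
    return (1 - g) * alpha + g * fraction


def dctcp_cwnd(cwnd, alpha, min_cwnd=1.0):
    """Scale cwnd by (1 - alpha/2), never going below min_cwnd (assumed floor)."""
    return max(min_cwnd, cwnd * (1 - alpha / 2))


if __name__ == "__main__":
    # Hypothetical trace: (marked, total) packets per window.
    windows = [(0, 10), (2, 10), (10, 10), (10, 10), (0, 10)]
    alpha, cwnd = 0.0, 10.0
    for marked, total in windows:
        alpha = update_alpha(alpha, marked, total)
        if marked > 0:
            # Back off only in windows that saw congestion marks.
            cwnd = dctcp_cwnd(cwnd, alpha)
        print(f"marked={marked}/{total} alpha={alpha:.4f} cwnd={cwnd:.2f}")
```

With alpha = 1 (every packet marked) the rule halves the window, just like standard TCP; with a small alpha the window shrinks only slightly, which is how DCTCP keeps queues short without giving up throughput.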

Conclusions
• TCP is heavily used in DCs
  – But sometimes it's not ideal
• Simple changes can fix its shortcomings
• The same problem (incast) can be fixed at many layers

Your Presentations
• Read your article very carefully, several times
• Try to understand the “gist” of it
  – What differentiates it from previous work
  – What is good about it
  – What is less good
• Did they achieve their goals?
• How would you design a solution to their problem?

Your Slides
• Aim for 40-50 slides at most
• Include as many animations as you can
• Rehearse the presentation at home a few times
• Do not overcrowd your slides
  – 2-4 bullets per slide are ideal
  – Anything more is difficult to read

Do NOT
• Add outline slides very often (or at all)
• Add a blank “Thank you” or “Questions” last slide
  – Always finish your talk on a slide with content
• Read from slides
• Stare at the screen
• Put everything you have to say on the slide