Moving the Internet's Transport Layer Ahead
Michael Welzl
University of Aberdeen, 9.1.2012

Part 1: the problem

It can't be changed
• Internet transport layer = TCP (1981), UDP (1980)
  – Service = what these protocols provide
  – Does not match the diversity of today's applications
• Probably only two truly significant (noticeable for users) changes:
  1. Addition of congestion control to TCP: 1988
  2. Change of the default TCP congestion control in Linux to BIC: 2004 (a bit later: CUBIC) ... not IETF-approved!

Many more standards exist
• Getting deployed:
  – Many, many TCP bug fixes
• Hardly getting deployed:
  – New protocols: SCTP, DCCP
• Newer things: can't evaluate deployment yet (but we don't want this to end up "in the red"!)
  – LEDBAT, MPTCP, ...

SCTP and DCCP in a nutshell
• SCTP: TCP++ ... mostly by removing "features"!
  – TCP without stream semantics and without the requirement for ordered or reliable delivery
  – and a few features added, e.g. multi-streaming (ordered delivery within streams only) and multi-homing
• DCCP: congestion control for real-time (multimedia) applications
  – various CCID specs define different congestion control behaviors
  – e.g. TFRC for a smoother rate (less jitter)

What's wrong?
• Internet design is inherently insecure
  – hence the tendency to disable/block anything that looks "strange"
  – TCP, UDP, and well-known applications (i.e. port numbers) are considered acceptable; everything else is "strange"
  ⇒ Application programmers don't use other transport protocols
• The design was supposed to be open...
  – "Be conservative in what you send, be liberal in what you accept"
  – Reality is different (Deep Packet Inspection, ...)

Internet design flaw: no abstraction
• OSI had the right idea! :-) Abstraction.
  [Figure: OSI layer model; source: A. Tanenbaum, Computer Networks]
  – Layers merely provide a service
  – Lower layers and their internal operation are hidden and could be replaced
• The transport layer should be especially easy to change!

A better Internet transport design
• A more abstract transport API:
  1. Applications say...
     – what kind of service they prefer
     – what kind of traffic they will generate
  2. Using its resources (protocols, signaling with the inner network, ...), the transport layer does its best (still best effort!) to provide a good service
     – It could try a new protocol and give up in case of failure
     – It could maybe even say: "you're getting a guarantee here!"

A better Internet transport design /2
• This has been said before
  – Bryan Ford and Janardhan Iyengar: "Breaking Up the Transport Logjam", HotNets-VII, October 2008. http://www.brynosaurus.com/pub/net/logjam.pdf
  – Michael Welzl: "A Case for Middleware to Enable Advanced Internet Services", NGNM'04 workshop, co-located with Networking 2004, Athens, Greece, 14 May 2004. http://heim.ifi.uio.no/~michawe/research/publications/ngnm04.pdf
• The problem might not have occurred with this...
  – but that doesn't help us now
  – so how can we get there?

What is needed
• Make it attractive to use new protocols
  1. Show benefits
     – surprisingly little done so far; some serious implementation issues (e.g. with Linux SCTP)
  2. Make it easy to use them
     – minimal change to applications, or even no change at all
• Make sure that there is no harm in trying them
  – Downward compatibility: fall back to TCP/UDP
  – Use tricks to get packets across

Part 2: things we can do
Transparent beneficial deployment of SCTP
[Florian Niederbacher: "Beneficial gradual deployment of SCTP", MSc. thesis, University of Innsbruck, February 2010]
[Michael Welzl, Florian Niederbacher, Stein Gjessing: "Beneficial Transparent Deployment of SCTP: the Missing Pieces", IEEE GlobeCom 2011, December 2011, Houston, Texas]

Underlying idea
• SCTP is already (somewhat) attractive
  – resilience can improve if it is used transparently (automatically use multi-homing)
• More benefits can be gained via transparent use of multi-streaming
  – map short TCP connections onto a long-lived SCTP association and exploit its large congestion window when this is beneficial

Step 1: general performance check

Step 2: implementation

Step 3: test

Conclusion from SCTP experiment
• Doing this right is probably worth it, but it's hard
  – a kernel implementation is required
  – fixes to SCTP are required
    • per-stream flow control; improving SCTP performance via auto-buffer tuning and pluggable congestion control
  – a protocol setup / TCP fall-back mechanism is required
  – either decide when to map (hard / ugly?) or let SCTP with multiple streams be more aggressive (like MulTCP); research is required to do this right

Trying to use a new protocol with TCP as a fall-back
[Bryan Ford, Janardhan Iyengar: "Efficient Cross-Layer Negotiation", HotNets 2009]
[D. Wing, A. Yourtchenko, P. Natarajan: "Happy Eyeballs: Trending Towards Success with SCTP", Internet-Draft draft-wing-tsvwg-happy-eyeballs-sctp-02, October 2010]
[Discussions with Bryan Ford, Jana Iyengar, Michael Tüxen, Joe Touch]

Just try!
• Happy Eyeballs
  – the most prominent (and maybe most realistic) among several suggested methods
  – originally proposed for IPv6 / IPv4 and SCTP / TCP, then split into two drafts
  – we focus on SCTP / TCP (and keep DCCP in mind)
• Algorithm:
  – send a TCP SYN and an SCTP INIT; use whichever answer arrives first
  – optional: delay TCP processing a bit in case the TCP SYN/ACK arrives before the SCTP INIT-ACK
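A minimal sketch of that racing idea in C, assuming an IPv4 client and a placeholder address and port (192.0.2.1:8080): both handshakes are started in parallel and whichever completes first is used. This simplifies the Happy Eyeballs draft considerably (no caching, no deliberate delaying of TCP), and a full implementation would keep waiting for the other transport if the first answer turns out to be an error.

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <netinet/in.h>   /* IPPROTO_SCTP needs kernel SCTP support */
    #include <arpa/inet.h>

    /* Start a non-blocking connect and return the socket. */
    static int start_connect(int proto, const struct sockaddr_in *dst)
    {
        int fd = socket(AF_INET, SOCK_STREAM, proto);
        if (fd < 0)
            return -1;
        fcntl(fd, F_SETFL, O_NONBLOCK);
        if (connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0 &&
            errno != EINPROGRESS) {
            close(fd);
            return -1;
        }
        return fd;
    }

    int main(void)
    {
        struct sockaddr_in dst;
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(8080);                      /* placeholder port    */
        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);  /* placeholder address */

        int sctp = start_connect(IPPROTO_SCTP, &dst);    /* sends the SCTP INIT */
        int tcp  = start_connect(IPPROTO_TCP, &dst);     /* sends the TCP SYN   */

        fd_set wfds;
        FD_ZERO(&wfds);
        if (sctp >= 0) FD_SET(sctp, &wfds);
        if (tcp  >= 0) FD_SET(tcp,  &wfds);
        int maxfd = sctp > tcp ? sctp : tcp;
        struct timeval tv = { 5, 0 };                    /* overall timeout     */

        if (select(maxfd + 1, NULL, &wfds, NULL, &tv) > 0) {
            int order[2] = { sctp, tcp };                /* prefer SCTP if both answered */
            for (int i = 0; i < 2; i++) {
                int fd = order[i], err = 0;
                socklen_t len = sizeof(err);
                if (fd < 0 || !FD_ISSET(fd, &wfds))
                    continue;
                getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
                if (err == 0) {
                    int other = (fd == sctp) ? tcp : sctp;
                    if (other >= 0) close(other);        /* drop the loser      */
                    printf("using %s\n", fd == sctp ? "SCTP" : "TCP");
                    /* ... use 'fd' for the application data ... */
                    close(fd);
                    return 0;
                }
            }
        }
        fprintf(stderr, "neither transport connected\n");
        if (sctp >= 0) close(sctp);
        if (tcp  >= 0) close(tcp);
        return 1;
    }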
The early-TCP-SYN-ACK problem
• When Florian started his thesis he tried this, and TCP always returned first
  – the obvious reason: more efficient processing in the server
• How to cope with this?
  – Ignore a late SCTP INIT-ACK: SCTP is then hardly ever used
  – Switch to SCTP on the fly: does not seem feasible
  – Delay TCP processing: always affects TCP connections
  – Use SCTP for later connections: delayed effect of SCTP, and one must know that there will be later connections

An idea: a "NOT-TCP" option / bit in the TCP SYN
[Figure: two client-server message-sequence diagrams.
Case 1, new server: the client sends an SCTP INIT plus a TCP SYN carrying "NOT-TCP"; the server answers with a TCP SYN-ACK carrying "NOT-TCP" and an SCTP INIT-ACK; the client continues the SCTP handshake with a COOKIE-ECHO.
Case 2, old server: the client sends an SCTP INIT plus a TCP SYN carrying "NOT-TCP"; the server answers with a plain TCP SYN-ACK; after a short wait the client completes the TCP handshake with an ACK, and a late SCTP INIT-ACK is ignored or used for the next connection.]

Not-TCP concerns
• Encoding issues
  – TCP option space is limited
  – Overloading a bit (e.g. CWR): will middleboxes kill such TCP SYNs? Bit overloading also limits the protocol choice
    • an option, in contrast, can carry "use any other protocol from this list"
  – Even processing an unknown TCP option is work (might be a problem for a busy server); to be evaluated
  – Is state needed?
• Not-TCP binds TCP and SCTP/DCCP/... ports together

NOT-TCP and state
[Figure: the new-server message ladder annotated with per-host state: TCP state ("+TCP") is created on both sides when the SYN / SYN-ACK carrying "NOT-TCP" are exchanged, and removed ("-TCP") once the SCTP INIT-ACK and COOKIE-ECHO arrive, leaving only SCTP state ("+SCTP").]
• Both sides can immediately tear down all TCP state after exchanging "Not-TCP" and an SCTP packet
• (Only?) advantage over TCP SYN cookies: no need for a teardown (FIN / FIN-ACK)

Better negotiation for later connections?
• Simply trying protocol x instead of y may not be good enough in the long run
  – scalability concerns when we try SIP/TLS/DCCP/IPv6 vs. SIP/UDP/IPv4 vs. ...
• Hosts should be able to negotiate the protocol combination; proposals exist
  – a separate negotiation protocol with preference graphs (Ford / Iyengar, HotNets)
  – signaling over HTTP (draft-wood-tae-specifying-uri-transports-08)
  – out-of-band, e.g. via a DNS server (Iyengar @ Hiroshima IETF)
• but: one must still try whether the chosen combination actually works!

Towards a Protocol-Independent Transport API
[Stefan Jörer: "A Protocol-Independent Internet Transport API", MSc. thesis, University of Innsbruck, December 2010]
[Michael Welzl, Stefan Jörer, Stein Gjessing: "Towards a Protocol-Independent Internet Transport API", FutureNet IV workshop, ICC 2011, June 2011, Kyoto, Japan]

Two approaches
• Top-down: start with application needs (the "QoS view")
  + flexible, good for future use of new protocols
  – loss of service granularity: some protocol features may never be used
  – been there, done that, achieved nothing...
• Bottom-up: a unified view of the services of all existing Internet transport protocols
  + no loss of granularity, all services preserved
  + a totally new approach with a concrete design space
  – may need to be updated in the future

Our chosen design method
• Bottom-up: TCP, UDP, SCTP, DCCP, UDP-Lite
  – start with feature lists from the key references
• Step 1: from the list of protocol features, carefully identify the application-relevant services
  – features that would not be exposed in the APIs of the individual protocols are protocol internals (e.g. ECN)
• Result: a table with a line for every possible combination of features
  – 43 lines: 32 SCTP, 3 TCP/UDP

Our chosen design method /2
• Step 2: carry out the obvious further reductions
  – e.g. flow control is coupled with congestion control
  – duplicates, subsets
• Apply common sense to go beyond the purely mechanical result of step 1
  – Question: would an application have a reason to say "no" to this service under certain circumstances?
  – Features that are just performance improvements if used correctly (i.e. depending on the environment, not the application) are not services

Result of Step 2
[Table legend: x = always on, empty = never on; p1 = partial error detection; t = total reliability, p2 = partial reliability; o = ordered, u = unordered]

API Design
• Goal: make usage attractive = easy
  – stick with what programmers already know: deviate as little as possible from the socket interface
• Most services are chosen upon socket creation
  – int socket(int domain, int service)
  – the service number identifies a line in the table
  – understandable aliases, e.g. PI_TCPLIKE_NODELAY, PI_TCPLIKE, PI_NO_CC_UNRELIABLE for lines 1-3
• Sending / receiving: provide sendmsg, recvmsg
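To make this concrete, here is a small sketch of how pi_socket() could map service identifiers onto existing protocols. The alias names for lines 1-3 and the use of service 12 for unordered reliable message delivery come from these slides; the numeric values, the PI_RELIABLE_UNORDERED_MSG alias and the concrete mapping are illustrative assumptions, not the implementation from the thesis.

    #include <errno.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <netinet/sctp.h>   /* lksctp-tools headers on Linux */

    #define PI_TCPLIKE_NODELAY        1   /* line 1: TCP-like, Nagle disabled     */
    #define PI_TCPLIKE                2   /* line 2: plain TCP-like service       */
    #define PI_NO_CC_UNRELIABLE       3   /* line 3: UDP-like datagram service    */
    #define PI_RELIABLE_UNORDERED_MSG 12  /* line 12: reliable unordered messages */

    /* Applications ask for a service; the mapping to a protocol stays hidden and
     * could later include fall-back logic (e.g. SCTP -> TCP). */
    int pi_socket(int domain, int service)
    {
        int fd, one = 1;

        switch (service) {
        case PI_TCPLIKE_NODELAY:
            fd = socket(domain, SOCK_STREAM, IPPROTO_TCP);
            if (fd >= 0)
                setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
            return fd;
        case PI_TCPLIKE:
            return socket(domain, SOCK_STREAM, IPPROTO_TCP);
        case PI_NO_CC_UNRELIABLE:
            return socket(domain, SOCK_DGRAM, IPPROTO_UDP);
        case PI_RELIABLE_UNORDERED_MSG:
            /* one-to-one SCTP socket; the unordered flag is set per message */
            fd = socket(domain, SOCK_STREAM, IPPROTO_SCTP);
            if (fd >= 0)
                setsockopt(fd, IPPROTO_SCTP, SCTP_NODELAY, &one, sizeof(one));
            return fd;
        default:
            errno = EPROTONOSUPPORT;
            return -1;
        }
    }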
API Design /2
• We classified features as
  – static: can only be chosen upon socket creation
    • flow characteristic
  – configurable: chosen upon socket creation, adjustable later with setsockopt
    • error detection, reliability, multi-homing
  – dynamic: no need to specify in advance
    • application PDU bundling (Nagle in TCP)
    • delivery order: socket option or flags field

Implementation example: unordered reliable message delivery with SCTP
• First variant (code shown on the slide): based on draft-ietf-tsvwg-sctpsocket-23
• Could not make this work in our testbed (suspected cause: a bug in the SCTP socket API)

Implementation example: unordered reliable message delivery with SCTP /2
• SCTP, version 2 (this worked):
  – socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP)
  – set SCTP_NODELAY with setsockopt
  – followed by (10 parameters!):
    sctp_sendmsg(sockfd, textMsg, msgLength, NULL, 0, 0, SCTP_UNORDERED, 1, 0, 0);
• PI_API version:
  – pi_socket(PF_INET, 12);
  – pi_sendmsg(sockfd, &msg, 0);
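For reference, the working SCTP variant expanded into a self-contained program. The destination address, port and payload are placeholders, error handling is minimal, and on Linux this needs an SCTP-capable kernel plus the lksctp-tools headers (compile with -lsctp).

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/sctp.h>
    #include <arpa/inet.h>

    int main(void)
    {
        const char *textMsg = "hello over SCTP";            /* placeholder payload */
        size_t msgLength = strlen(textMsg);
        struct sockaddr_in dst;
        int one = 1;

        int sockfd = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);
        if (sockfd < 0) { perror("socket"); return 1; }

        /* as on the slide: disable bundling delays */
        setsockopt(sockfd, IPPROTO_SCTP, SCTP_NODELAY, &one, sizeof(one));

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(5001);                         /* placeholder port    */
        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);     /* placeholder address */

        if (connect(sockfd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            perror("connect"); close(sockfd); return 1;
        }

        /* the 10-parameter call from the slide: unordered, reliable delivery on
         * stream 1; default PPID, no time-to-live limit, no context value */
        if (sctp_sendmsg(sockfd, textMsg, msgLength, NULL, 0,
                         0, SCTP_UNORDERED, 1, 0, 0) < 0)
            perror("sctp_sendmsg");

        close(sockfd);
        return 0;
    }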
Tricks to get packets across
[Janardhan Iyengar, Bryan Ford, Dishant Ailawadi, Syed Obaid Amin, Michael Nowlan, Nabin Tiwari, Jeff Wise: "Minion - an All-Terrain Packet Packhorse to Jump-Start Stalled Internet Transports", 8th International Workshop on Protocols for Future, Large-Scale & Diverse Network Transports (PFLDNeT), November 2010]
[Communication with David Ros Sanchez]

UDP vs. TCP
• Tunneling over UDP is the obvious choice: UDP doesn't do much (ports + checksum)
  – but what if UDP can't pass through?
• TCP approach #1: Minion
  – extend a TCP implementation with (most of the) SCTP functions, extending the header into the payload
  – key function: message-based processing, achieved via a separation marker

TCP approach #2: polymorphic headers
• Correct TCP on the wire, but with changed semantics
  – extend the new protocol's implementation to reformat headers as packets are transmitted / received

Conclusion
• There's a chance to get a more flexible Internet transport layer
  – a lot of useful, fun work to be done
• Join me!

Thank you! Questions?

Backup slides

Ensure that SCTP and DCCP are attractive
[Florian Niederbacher: "Beneficial gradual deployment of SCTP", MSc. thesis, University of Innsbruck, February 2010]
[Dragana Damjanovic: "Parallel TCP Data Transfers: A Practical Model and its Application", Ph.D. thesis, University of Innsbruck, February 2010]
[Dragana Damjanovic, Michael Welzl: "An Extension of the TCP Steady-State Throughput Equation for Parallel Flows and its Application in MulTFRC", IEEE/ACM Transactions on Networking 19(6), December 2011]
[Communication with Michael Tüxen and Morten Hustveit]

SCTP (as of February 2011)
• Essentially, SCTP = TCP++
  – what's the point if it performs worse than TCP?
  – so that should never happen
• Two sides to this
  – Implementation issues: FreeBSD is well maintained, but Linux has many problems
  – Specification issues: one SCTP association with N streams should never perform worse than N TCP connections (and should sometimes perform better)

Linux SCTP implementation
• Florian Niederbacher found problems
  – mainly: lack of auto-buffer tuning and of pluggable congestion control
• Auto-buffer tuning is now available (http://tinyurl.com/4bhxt74) and a patch has been submitted (http://tinyurl.com/45ng5d6)
• Pluggable congestion control has been implemented but doesn't work yet

More (smaller) issues
• Missing sender-dry event (a functional shortcoming; necessary for DTLS over SCTP)
  – Linux only; pointed out by Michael Tüxen
• Sending too little data in slow start
  – Linux only; detected by Florian, probably a wrong implementation of ABC or a side effect of burst mitigation
• Wrong calculation of header overhead
  – general problem; presented by Michael Tüxen @ PFLDNeT 2010

More (smaller) issues /2
• A message size > MTU is needed for efficient operation in small-RTT environments
  – Linux only; detected by Florian, confirmed by Stefan Jörer; probably caused by the overhead of the send/recv system calls
• Flow control might be buggy
  – pointed out by Michael Tüxen; not confirmed yet
• Making the implementation more up to date: "improving robustness against non-congestion events", Andreas Petlund's thin-stream work, spurious loss event detection

DCCP
• TCP-like, Smooth, and Smooth-SP flow characteristics, each with unordered delivery and either partial or full error protection
  – Smooth-SP is not a "feature" (!), it's a necessity
  – partial error protection is rather experimental
• So "smooth" (TCP-friendly) behavior is the only real news
  – TFRC congestion control; is this enough as a selling argument?

MulTFRC
• TFRC in a nutshell
  – smooth (less jitter) yet TCP-friendly sending rate
  – the receiver ACKs, the sender gets measurements
  – the sender continuously evaluates the TCP steady-state throughput equation and sends at the calculated rate
• We derived an extension of this equation which yields the rate of N flows
  – plug that into TFRC, and it becomes MulTFRC

Who cares?
• Bottleneck saturation of N TCPs (without queues): 100 - 100/(1+3N) % [Altman, E., Barman, D., Tuffin, B., and M. Vojnovic: "Parallel TCP Sockets: Simple Model, Throughput and Validation", Infocom 2006]
  – N=1: 75%, N=2: 85.7%, ..., N=6: 95%
  ⇒ MulTFRC with N=6 nicely saturates your bottleneck (except on high bandwidth*delay links)
  – no need to stripe data across multiple connections
  – less overhead (and only one setup/teardown)
• Future work?
  – N ∈ ℝ⁺: N can also be between 0 and 1; e.g. Mul-CUBIC-FRC?
    • useful for less important traffic and for multi-path (cf. MPTCP)
    • can give a knob to users
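A quick sanity check of the numbers quoted above: a few lines of C evaluating the Altman et al. formula 100 - 100/(1+3N) for N = 1..6.

    #include <stdio.h>

    int main(void)
    {
        /* Bottleneck saturation (in %) of N parallel TCP flows without queues,
         * using the formula quoted above: 100 - 100/(1+3N). */
        for (int n = 1; n <= 6; n++)
            printf("N = %d: %.1f%%\n", n, 100.0 - 100.0 / (1.0 + 3.0 * n));
        return 0;
    }
    /* prints 75.0, 85.7, 90.0, 92.3, 93.8 and 94.7 percent,
     * i.e. N = 6 gives roughly the 95% quoted on the slide */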
Towards a Protocol-Independent Transport API
[Stefan Jörer: "A Protocol-Independent Internet Transport API", MSc. thesis, University of Innsbruck, December 2010]
[Michael Welzl, Stefan Jörer, Stein Gjessing: "Towards a Protocol-Independent Internet Transport API", FutureNet IV workshop, ICC 2011, June 2011, Kyoto, Japan]

Two approaches
• Top-down: start with application needs (the "QoS view")
  + flexible, good for future use of new protocols
  – loss of service granularity: some protocol features may never be used
  – been there, done that, achieved nothing...
• Bottom-up: a unified view of the services of all existing Internet transport protocols
  + no loss of granularity, all services preserved
  + a totally new approach with a concrete design space
  – may need to be updated in the future

Our chosen design method
• Bottom-up: use TCP, UDP, SCTP, DCCP, UDP-Lite
  – start with feature lists from the key references
• Step 1: from the list of protocol features, carefully identify the application-relevant services
  – features that would not be exposed in the APIs of the individual protocols are protocol internals, e.g. ECN, selective ACK

Result of step 1
[Table legend: x = always on, empty = never on, 0/1 = can be turned on or off; 2/3/4 = choice between CCIDs 2, 3, 4; p1 = partial error detection; t = total reliability, p2 = partial reliability; s = stream, m = message; o = ordered, u = unordered]

Expansion
• A line for every possible combination of features
  – 43 lines: 32 SCTP, 3 TCP/UDP
• The list shows reduction possibilities (step 2)
  – e.g. flow control is coupled with congestion control
  – duplicates, subsets

Reduction method for step 2
• Remove services that seem unnecessary as a result of the step 1 expansion
• Apply common sense to go beyond the purely mechanical result of step 1
  – Question: would an application have a reason to say "no" to this service under certain circumstances?
  – Features that are just performance improvements if used correctly (i.e. depending on the environment, not the application) are not services

Step 2
• Connection orientation
  – removing it does not affect service diversity
  – user view: the API is always connection oriented; on the wire, a non-congestion-controlled service will always use UDP or UDP-Lite
  – a static distinction, made clear by documentation
• Delivery type
  – it is easy for the API to provide streams on top of a message transport (see the sketch below)
  – no need to expose this as a service
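A minimal sketch of the sender side of that claim, assuming an underlying message-oriented socket (e.g. SCTP) and the standard sendmsg() call; the helper name and the 1400-byte chunk size are illustrative choices, not part of the proposed API.

    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define CHUNK 1400   /* assumed per-message payload size */

    /* Write a byte buffer as a sequence of messages.  As long as the underlying
     * service is ordered and reliable, the receiver restores the byte stream by
     * simply concatenating the messages it reads. */
    ssize_t stream_write(int fd, const char *buf, size_t len)
    {
        size_t sent = 0;
        while (sent < len) {
            size_t n = (len - sent > CHUNK) ? CHUNK : len - sent;
            struct iovec iov = { (void *)(buf + sent), n };
            struct msghdr msg;
            memset(&msg, 0, sizeof(msg));
            msg.msg_iov = &iov;
            msg.msg_iovlen = 1;
            ssize_t r = sendmsg(fd, &msg, 0);
            if (r < 0)
                return sent ? (ssize_t)sent : -1;
            sent += (size_t)r;
        }
        return (ssize_t)sent;
    }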
Step 2, contd.
• Multi-streaming
  – a performance improvement, depending on environment conditions / congestion control behavior, not an application service
• Congestion control is renamed "flow characteristic"
• Multi-homing is kept although it is not an application service
  – this is part of a different discussion
  – it could be removed above our API

Result of Step 2
[Table: the reduced service table]

API Design
• Goal: make usage attractive = easy
  – stick with what programmers already know: deviate as little as possible from the socket interface
• Most services are chosen upon socket creation
  – int socket(int domain, int service)
  – the service number identifies a line in the table
  – understandable aliases, e.g. PI_TCPLIKE_NODELAY, PI_TCPLIKE, PI_NO_CC_UNRELIABLE for lines 1-3
• Sending / receiving: provide sendmsg, recvmsg; for services 1, 2, 11 and 17: send, recv

API Design /2
• We classified features as
  – static: can only be chosen upon socket creation
    • flow characteristic
  – configurable: chosen upon socket creation, adjustable later with setsockopt
    • error detection, reliability, multi-homing
  – dynamic: no need to specify in advance
    • application PDU bundling (Nagle in TCP)
    • delivery order: socket option or flags field

Implementation example: unordered reliable message delivery with SCTP
• First variant (code shown on the slide): based on draft-ietf-tsvwg-sctpsocket-23
• Could not make this work in our testbed (suspected cause: a bug in the SCTP socket API)

Implementation example: unordered reliable message delivery with SCTP /2
• SCTP, version 2 (this worked):
  – socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP)
  – set SCTP_NODELAY with setsockopt
  – followed by (10 parameters!):
    sctp_sendmsg(sockfd, textMsg, msgLength, NULL, 0, 0, SCTP_UNORDERED, 1, 0, 0);
• PI_API version:
  – pi_socket(PF_INET, 12);
  – pi_sendmsg(sockfd, &msg, 0);