
Moving the Internet’s Transport Layer Ahead
Michael Welzl
University of Aberdeen
9. 1. 2012
Part 1: the problem
2
It can’t be changed.
• Internet transport layer = TCP (1981), UDP (1980)
– Service = what these protocols provide
– Does not match the diversity of today’s applications
• Probably only two truly significant (noticeable for users) changes:
1. Addition of congestion control to TCP: 1988
2. Change of default TCP CC in Linux to BIC: 2004 (a bit later: CUBIC) ... not IETF-approved!
3
Many more standards exist
• Getting deployed:
– Many, many TCP bug fixes
• Hardly getting deployed:
– New protocols: SCTP, DCCP
• Newer things – can’t evaluate deployment yet (but don’t want this to end up “in the red”!)
– LEDBAT, MPTCP, …
4
SCTP and DCCP in a nutshell
• SCTP: TCP++ ... mostly by removing “features”!
– TCP without stream semantics and without required ordered or reliable delivery
– and a few features added, e.g. multistreaming (ordered delivery within streams only; see the sketch below) and multihoming
• DCCP: congestion control for real-time (multimedia) applications
– various CCID specs define different CC behaviors
– e.g. TFRC for a smoother rate (less jitter)
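As a quick illustration of the multistreaming idea, here is a minimal lksctp-based sketch (not from the talk): two messages are sent on different stream numbers of one association, so ordering holds within each stream but a loss on stream 0 does not hold back delivery on stream 1. The connected socket `sd`, the message contents, and the assumption of at least two negotiated output streams are illustrative choices.

```c
/* Sketch: SCTP multistreaming with the lksctp API. Messages sent on different
 * stream numbers of one association are ordered only within their stream.
 * `sd` is assumed to be a connected one-to-one SCTP socket whose association
 * has at least two outbound streams. Link with -lsctp. */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>   /* sctp_sendmsg() */

void send_on_two_streams(int sd)
{
    const char *a = "on stream 0", *b = "on stream 1";

    /* args: fd, data, len, to, tolen, ppid, flags, stream number, ttl, context */
    sctp_sendmsg(sd, a, strlen(a), NULL, 0, 0, 0, 0, 0, 0);  /* stream 0 */
    sctp_sendmsg(sd, b, strlen(b), NULL, 0, 0, 0, 1, 0, 0);  /* stream 1 */
}
```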
5
What’s wrong?
• Internet design is inherently insecure
– hence, tendency to disable/block things that look “strange”
– TCP, UDP, and special applications, i.e. port numbers, are considered acceptable; everything else is “strange”
⇒ Application programmers don’t use other transport protocols
• Design was supposed to be open...
– “Be conservative in what you send, be liberal in what you accept”
– Reality is different (Deep Packet Inspection, ...)
6
Internet design flaw: no abstraction
• OSI had the right idea: abstraction! :-)
[Figure source: A. Tanenbaum, Computer Networks]
– Layers merely provide a service
– Lower layers + their internal operation hidden ⇒ could be replaced
• Transport layer should be especially easy to change!
7
A better Internet transport design
A more abstract transport API
1. Applications say...
– what kind of service they prefer
– what kind of traffic they will generate
2. Using its resources (protocols, signaling with the inner network, ...), the transport layer does its best (still best effort!) to provide a good service
– Could try a new protocol, and give up in case of failure
– Could maybe also say: “you’re even getting a guarantee here!”
8
A better Internet transport design /2
• This has been said before
– Bryan Ford and Janardhan Iyengar: “Breaking Up the Transport Logjam”, HotNets-VII, October 2008. http://www.brynosaurus.com/pub/net/logjam.pdf
– Michael Welzl: “A Case for Middleware to Enable Advanced Internet Services”, NGNM'04 workshop, co-located with Networking 2004, Athens, Greece, 14 May 2004. http://heim.ifi.uio.no/~michawe/research/publications/ngnm04.pdf
• The problem might not have occurred with this...
– but this doesn’t help us now.
– so how can we get there?
9
What is needed
• Make it attractive to use new protocols
1. show benefits
• surprisingly little done so far; some serious implementation issues (e.g. with Linux SCTP)
2. make it easy to use them
• minimal change to applications, or even no change at all
• Make sure that there’s no harm in trying them
– Downward compatibility: fall back to TCP/UDP
– Use tricks to get packets across
10
Part 2: things we can do
11
Transparent beneficial deployment of SCTP
[Florian Niederbacher: “Beneficial gradual deployment of SCTP”, MSc. thesis, University of Innsbruck, February 2010]
[Michael Welzl, Florian Niederbacher, Stein Gjessing: “Beneficial Transparent Deployment of SCTP: the Missing Pieces”, IEEE GlobeCom 2011, December 2011, Houston, Texas]
12
Underlying idea
• SCTP is already (somewhat) attractive
– resilience can improve if used transparently (automatically use multihoming)
• Can get more benefits via transparent usage: using multi-streaming
– map short TCP connections onto a long SCTP association, exploit the large congestion window when this is beneficial
13
Step 1: general performance check
14
Step 2: implementation
15
Step 3: Test
16
Conclusion from SCTP experiment
• Doing this right is probably worth it, but it’s hard
– kernel implementation required
– fixes to SCTP required
• per-stream flow control, improving SCTP performance via auto-buffer-tuning and pluggable congestion control
– protocol setup / TCP fall-back mechanism required
– either decide when to map (hard / ugly?) or let SCTP with multiple streams be more aggressive (like MulTCP); research required for doing this right
17
Trying to use a new protocol with TCP as a fall-back
[Bryan Ford, Janardhan Iyengar: “Efficient Cross-Layer Negotiation”, HotNets 2009]
[D. Wing, A. Yourtchenko, P. Natarajan: “Happy Eyeballs: Trending Towards Success with SCTP”, Internet-draft draft-wing-tsvwg-happy-eyeballs-sctp-02, October 2010]
[Discussions with Bryan Ford, Jana Iyengar, Michael Tüxen, Joe Touch]
18
Just try!
• Happy Eyeballs
– most prominent (and maybe most realistic) among several suggested methods
– Originally proposed for IPv6 / IPv4 and SCTP / TCP, then split into two drafts
– We focus on SCTP / TCP (and keep DCCP in mind)
• Algorithm (see the sketch below):
– Send a TCP SYN + SCTP INIT; use the first answer
– Optional: delay TCP processing a bit in case the TCP SYN/ACK arrives before the SCTP INIT-ACK
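A minimal sketch of the racing step, assuming Linux with kernel SCTP support: both handshakes are started non-blocking and whichever socket becomes connected first is used, with a preference for SCTP if both are ready. The timeout value, the omission of SO_ERROR checks, and the omission of the optional TCP delay are simplifications for illustration, not the Happy Eyeballs draft's full algorithm.

```c
/* Minimal sketch of the racing step: start a TCP and an SCTP handshake
 * towards the same server and keep whichever completes first.
 * Placeholder logic: no SO_ERROR check after select(), no optional delay
 * for TCP, no retries. IPPROTO_SCTP requires SCTP support in the kernel. */
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

static int start_connect(int proto, const struct sockaddr_in *srv)
{
    int fd = socket(PF_INET, SOCK_STREAM, proto);
    fcntl(fd, F_SETFL, O_NONBLOCK);              /* SYN / INIT goes out, call returns */
    connect(fd, (const struct sockaddr *)srv, sizeof(*srv));
    return fd;
}

/* Returns a connected socket (SCTP preferred if both answered), or -1. */
int happy_eyeballs_connect(const struct sockaddr_in *srv)
{
    int tcp  = start_connect(IPPROTO_TCP, srv);
    int sctp = start_connect(IPPROTO_SCTP, srv);

    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(tcp, &wfds);
    FD_SET(sctp, &wfds);
    struct timeval tv = { 5, 0 };                /* overall timeout */

    /* Completion of a non-blocking connect() is reported as writability. */
    int maxfd = (tcp > sctp ? tcp : sctp) + 1;
    if (select(maxfd, NULL, &wfds, NULL, &tv) <= 0) {
        close(tcp);
        close(sctp);
        return -1;
    }
    if (FD_ISSET(sctp, &wfds)) {                 /* SCTP INIT-ACK arrived */
        close(tcp);
        return sctp;
    }
    close(sctp);                                 /* fall back to TCP */
    return tcp;
}
```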
19
The early TCP SYN-ACK problem
• When Florian started his thesis, he tried this, and he told me that TCP always returned first
– Obvious: more efficient processing in the server
• How to cope with this?
– Ignore late SCTP INIT-ACK: SCTP often not used
– Switch to SCTP on-the-fly: does not seem feasible
– Delay TCP processing: always affects TCP connections
– Use SCTP for later connections: delayed effect of SCTP, must know that there will be later connections
20
An idea: “NOT-TCP” option / bit in TCP SYN
[Figure: two client–server message sequence charts]
• Case 1 – new server:
– Client sends SCTP INIT + TCP SYN carrying “NOT-TCP”
– Server answers with TCP SYN-ACK carrying “NOT-TCP”, and with SCTP INIT-ACK
– Client continues the SCTP handshake with SCTP COOKIE-ECHO (TCP is abandoned)
• Case 2 – old server:
– Client sends SCTP INIT + TCP SYN carrying “NOT-TCP”
– Server answers with a plain TCP SYN-ACK; after a short WAIT, the client completes the TCP handshake (TCP ACK)
– A late SCTP INIT-ACK is ignored, or used for the next connection
21
Not-TCP concerns
• Encoding issues
– TCP option space limited
– Overloading a bit (e.g. CWR): will middle-boxes kill such TCP SYNs? Bit overloading also limits protocol choice – e.g. an option can carry “use any other protocol from this list”
– even processing an unknown TCP option is work (might be a problem for a busy server)
⇒ to be evaluated – state needed?
• Not-TCP binds TCP+SCTP/DCCP/.. ports together
22
NOT-TCP and state
[Figure: client–server message sequence chart, annotated with the transport state kept on each side. The client creates TCP + SCTP state and sends SCTP INIT + TCP SYN “NOT-TCP”; the server creates TCP state and answers with TCP SYN-ACK “NOT-TCP” and SCTP INIT-ACK; the client then drops its TCP state and sends SCTP COOKIE-ECHO, upon which the server also drops its TCP state and keeps only SCTP state]
• Both sides can immediately tear down all TCP state after exchanging “Not-TCP” and an SCTP packet
• (Only?) advantage over TCP SYN Cookies: no need for teardown (FIN / FIN/ACK)
23
Better negotiation for later connections?
• Simply trying protocol x instead of y may not be good enough in the long run
– Scalability concerns when we try: SIP/TLS/DCCP/IPv6 vs. SIP/UDP/IPv4 vs. ...
• Hosts should be able to negotiate the protocol combination; proposals exist
– separate protocol w/ preference graphs (HotNets Ford / Iyengar)
– signaling over HTTP (draft-wood-tae-specifying-uri-transports-08)
– out-of-band (e.g. DNS server) [Iyengar @ Hiroshima IETF]
• but: must also try if chosen combination works!
24
Towards a Protocol-Independent Transport API
[Stefan Jörer: “A Protocol-Independent Internet Transport API”, MSc. thesis, University of Innsbruck, December 2010]
[Michael Welzl, Stefan Jörer, Stein Gjessing: “Towards a Protocol-Independent Internet Transport API”, FutureNet IV workshop, ICC 2011, June 2011, Kyoto, Japan]
25
Two approaches
• Top-down: start with application needs (“QoS view”)
+ flexible, good for future use of new protocols
– loss of service granularity: some protocol features may never be used
– been there, done that, achieved nothing...
• Bottom-up: unified view of the services of all existing Internet transport protocols
+ No loss of granularity, all services preserved
+ Totally new approach, concrete design space
– May need to be updated in the future
26
Our chosen design method
• Bottom-up: TCP, UDP, SCTP, DCCP, UDP-Lite
– start with lists from key references
• Step 1: from the list of protocol features, carefully identify application-relevant services
– features that would not be exposed in the APIs of the individual protocols are protocol internals (e.g. ECN)
• Result: a table with a line for every possible combination of features
– 43 lines: 32 SCTP, 3 TCP/UDP
27
Our chosen design method
• Step 2: carry out obvious further reductions
– e.g. flow control coupled with congestion control
– duplicates, subsets
• Apply common sense to go beyond the purely mechanical result of step 1
– Question: would an application have a reason to say “no” to this service under certain circumstances?
– Features that are just performance improvements if they are used correctly (i.e. depending on environment, not app) are not services
28
Result of Step 2
[Table of resulting services not reproduced; legend:]
x = always on; empty = never on
P1 = partial error detection
t = total reliability; p2 = partial reliability
o = ordered; u = unordered
29
API Design
• Goal: make usage attractive = easy
– stick with what programmers already know: deviate as little as possible from the socket interface
• Most services chosen upon socket creation
– int socket(int domain, int service)
– the service number identifies a line number in the table
– understandable aliases: e.g. PI_TCPLIKE_NODELAY, PI_TCPLIKE, PI_NO_CC_UNRELIABLE for lines 1-3
• Sending / receiving: provide sendmsg, recvmsg
30
API Design /2
• We classified features as (see the sketch below)
– static: only chosen upon socket creation
• flow characteristic
– configurable: chosen upon socket creation + adjusted later with setsockopt
• error detection, reliability, multi-homing
– dynamic: no need to specify in advance
• application PDU bundling (Nagle in TCP)
• delivery order: socket option or flags field
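To make the classification concrete, here is a loose sketch of what such an API surface could look like. Apart from the three service aliases named on the previous slide, every identifier and value below (pi_setsockopt, PI_OPT_RELIABILITY, PI_MSG_UNORDERED, the option numbers) is hypothetical and only mirrors the structure described here; it is not taken from the actual PI-API implementation.

```c
/* Hypothetical sketch of a PI-API surface mirroring the static / configurable /
 * dynamic classification. Names and values are illustrative only. */
#include <sys/socket.h>
#include <sys/types.h>

/* static: the service (a line in the service table) fixes e.g. the flow
 * characteristic once and for all at creation time */
#define PI_TCPLIKE_NODELAY   1
#define PI_TCPLIKE           2
#define PI_NO_CC_UNRELIABLE  3
int pi_socket(int domain, int service);

/* configurable: may still be adjusted after creation, setsockopt-style */
#define PI_OPT_RELIABILITY   10        /* hypothetical option name */
int pi_setsockopt(int sockfd, int option, const void *value, socklen_t len);

/* dynamic: chosen per call, e.g. delivery order via the flags field */
#define PI_MSG_UNORDERED     0x1       /* hypothetical flag */
ssize_t pi_sendmsg(int sockfd, const struct msghdr *msg, int flags);
ssize_t pi_recvmsg(int sockfd, struct msghdr *msg, int flags);
```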
31
Implementation example: unordered reliable message delivery with SCTP
• Variant on the right (code listing not reproduced): based on draft-ietf-tsvwg-sctpsocket-23
• Could not make this work in our testbed (suspect: bug in the SCTP socket API)
32
Implementation example: unordered reliable message delivery with SCTP /2
• SCTP, version 2 (this worked)
– socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP)
– set SCTP_NODELAY with setsockopt
– followed by (10 parameters!): sctp_sendmsg(sockfd, textMsg, msgLength, NULL, 0, 0, SCTP_UNORDERED, 1, 0, 0);
• PI_API version
– pi_socket(PF_INET, 12);
– pi_sendmsg(sockfd, &msg, 0);
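For completeness, a self-contained sketch of the working SCTP variant above (socket creation, SCTP_NODELAY, unordered reliable send on stream 1); the server address, port, and message text are placeholders and error handling is minimal.

```c
/* Unordered, reliable message delivery with SCTP: the "version 2" steps above
 * in one piece. Placeholder address/port; link with -lsctp. */
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/sctp.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char *textMsg = "hello over SCTP";
    size_t msgLength = strlen(textMsg);

    /* Step 1: one-to-one style SCTP socket */
    int sockfd = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);

    /* Step 2: disable Nagle-style bundling */
    int one = 1;
    setsockopt(sockfd, IPPROTO_SCTP, SCTP_NODELAY, &one, sizeof(one));

    struct sockaddr_in server;
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(5001);                     /* placeholder port */
    inet_pton(AF_INET, "192.0.2.1", &server.sin_addr); /* placeholder address */
    if (connect(sockfd, (struct sockaddr *)&server, sizeof(server)) < 0) {
        perror("connect");
        return 1;
    }

    /* Step 3: unordered but reliable send on stream 1 (the 10 parameters) */
    sctp_sendmsg(sockfd, textMsg, msgLength,
                 NULL, 0, 0, SCTP_UNORDERED, 1, 0, 0);

    close(sockfd);
    return 0;
}
```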
33
Tricks to get packets across
[Janardhan Iyengar, Bryan Ford, Dishant Ailawadi, Syed Obaid Amin, Michael Nowlan, Nabin Tiwari, and Jeff Wise: “Minion—an All-Terrain Packet Packhorse to Jump-Start Stalled Internet Transports”, 8th International Workshop on Protocols for Future, Large-Scale & Diverse Network Transports (PFLDNeT), November 2010]
[Communication with David Ros Sanchez]
34
UDP vs. TCP
• Tunneling over UDP = the obvious choice: UDP doesn’t do much (ports + checksum)
– What if UDP can’t pass through?
• TCP approach #1: Minion
– Extend a TCP implementation with (most) SCTP functions, extending the header into the payload
– Key function: message-based processing, achieved via a separation marker (see the framing sketch below)
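One common way to realize such a separation marker is COBS-style byte stuffing: encode each message so that it contains no 0x00 bytes, then use a single 0x00 byte as the delimiter between messages in the TCP byte stream. The sketch below shows only the encoding side and is an illustration of the general technique, not code from Minion.

```c
/* Sketch of one way to realize a "separation marker" in a TCP byte stream:
 * COBS-style byte stuffing removes all 0x00 bytes from a message, so that
 * 0x00 can then be appended as an unambiguous message delimiter. */
#include <stddef.h>
#include <stdint.h>

/* Encode `len` bytes from `in` into `out` (out needs room for len + len/254 + 1
 * bytes). Returns the encoded length; the result contains no 0x00 bytes. */
size_t cobs_encode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t read = 0, write = 1, code_pos = 0;
    uint8_t code = 1;                   /* distance to the next zero / block end */

    while (read < len) {
        if (in[read] == 0) {            /* zero byte: store distance, start new block */
            out[code_pos] = code;
            code = 1;
            code_pos = write++;
        } else {
            out[write++] = in[read];
            if (++code == 0xFF) {       /* maximum block length reached */
                out[code_pos] = code;
                code = 1;
                code_pos = write++;
            }
        }
        read++;
    }
    out[code_pos] = code;
    return write;                       /* append a 0x00 afterwards as the marker */
}
```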
35
TCP approach #2: polymorphic headers
• Correct TCP on the wire, but changed semantics
– extend the new protocol’s implementation to reformat headers as packets are transmitted / received
36
Conclusion
• There’s a chance to get a more flexible Internet
transport layer
– A lot of useful, fun work to be done
• Join me! :-)
37
Thank you!
Questions?
38
Backup slides
39
Ensure that SCTP and DCCP are attractive
[Florian Niederbacher: “Beneficial gradual deployment of SCTP”, MSc. thesis, University of Innsbruck, February 2010]
[Dragana Damjanovic: “Parallel TCP Data Transfers: A Practical Model and its Application”, Ph.D. thesis, University of Innsbruck, February 2010]
[Dragana Damjanovic, Michael Welzl: “An Extension of the TCP Steady-State Throughput Equation for Parallel Flows and its Application in MulTFRC”, IEEE/ACM Transactions on Networking 19(6), December 2011]
[Communication with Michael Tuexen and Morten Hustveit]
40
SCTP (as of February 2011)
• Essentially, SCTP = TCP++
– what’s the point if it performs worse than TCP?
– so that should never happen
• Two sides to this
– Implementation issues: FreeBSD is well maintained, but Linux has many problems
– Specification issues: one SCTP association with N streams should never perform worse than N TCP connections (and sometimes better)
41
Linux SCTP implementation
• Florian Niederbacher detected problems
– mainly: lack of auto-buffer tuning and pluggable congestion control
• Auto-buffer tuning now available: http://tinyurl.com/4bhxt74 – and a patch has been submitted: http://tinyurl.com/45ng5d6
• Pluggable congestion control has been implemented but doesn’t work yet
42
More (smaller) issues
• Missing sender-dry event (functional shortcoming, necessary for DTLS over SCTP)
– Linux only; pointed out by Michael Tüxen
• Sending too little data in Slow-Start
– Linux only; detected by Florian, probably a wrong implementation of ABC or a side-effect of burst mitigation
• Wrong calculation of header overhead
– General problem, presented by Michael Tüxen @ PFLDNeT10
43
More (smaller) issues /2
• Message size > MTU needed for efficient operation in small-RTT environments
– Linux only; detected by Florian, confirmed by Stefan Jörer, probably caused by the overhead of send/recv system calls
• Flow control might be buggy
– Pointed out by Michael Tüxen; not confirmed yet
• Making the implementation more up-to-date: “improving robustness against non-congestion events”, Andreas Petlund’s thin-stream work, spurious loss event detection
44
DCCP
• TCP-like, Smooth, and Smooth-SP flow characteristics, each with unordered delivery and either partial or full error protection
– Smooth-SP is not a “feature” (!), it’s a necessity
– partial error correction is rather experimental
• So, “smooth” (TCP-friendly) behavior is the only real news
– TFRC congestion control; is this enough as a selling argument?
45
MulTFRC
• TFRC in a nutshell
– smooth (  less jitter) yet TCP-friendly sending rate
– receiver ACKs, sender gets measurements
– sender constantly calculates TCP steady-state throughput
equation, sends at calculated rate
• We derived an extension of this equation which
yields the rate of N flows
– plug that into TFRC, and it becomes MulTFRC
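For reference, the throughput equation that standard TFRC evaluates is the TCP steady-state equation from RFC 5348, with s the segment size, R the round-trip time, p the loss event rate, b the number of packets acknowledged per ACK and t_RTO the retransmission timeout; MulTFRC replaces it with the N-flow extension derived in the ToN paper (not reproduced here).

```latex
X = \frac{s}{R\,\sqrt{\frac{2bp}{3}} \;+\; t_{\mathrm{RTO}}\left(3\sqrt{\frac{3bp}{8}}\right) p \left(1 + 32 p^{2}\right)}
```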
46
Who cares?
• Bottleneck saturation of N TCPs (w/o queues): 100 - 100/(1+3N) %
[Altman, E., Barman, D., Tuffin, B., and M. Vojnovic, “Parallel TCP Sockets: Simple Model, Throughput and Validation”, Infocom 2006]
– N=1: 75%, N=2: 85.7%, ... N=6: 95% ⇒ MulTFRC with N=6 nicely saturates your bottleneck (except on a high bw*delay link)
– no need to stripe data across multiple connections
– less overhead (also only one setup/teardown)
• N ∈ R+ – can also be 0 < N < 1 (future work? e.g. Mul-CUBIC-FRC?)
– useful for less important traffic and multi-path (cf. MPTCP)
• can give a knob to users
47
Towards a Protocol-Independent Transport API
[Stefan Jörer: “A Protocol-Independent Internet Transport API”, MSc. thesis, University of Innsbruck, December 2010]
[Michael Welzl, Stefan Jörer, Stein Gjessing: “Towards a Protocol-Independent Internet Transport API”, FutureNet IV workshop, ICC 2011, June 2011, Kyoto, Japan]
48
Two approaches
• Top-down: start with application needs (“QoS view”)
+ flexible, good for future use of new protocols
– loss of service granularity: some protocol features may never be used
– been there, done that, achieved nothing...
• Bottom-up: unified view of the services of all existing Internet transport protocols
+ No loss of granularity, all services preserved
+ Totally new approach, concrete design space
– May need to be updated in the future
49
Our chosen design method
• Bottom-up: use TCP, UDP, SCTP, DCCP, UDP-Lite
– start with lists from key references
• Step 1: from the list of protocol features, carefully identify application-relevant services
– features that would not be exposed in the APIs of the individual protocols are protocol internals
– e.g. ECN, selective ACK
50
Result of step 1
• x = always on, empty = never on; 0/1 = can be turned on or off
• 2/3/4 = choice between CCIDs 2, 3, 4
• P1 = partial error detection; t = total reliability, p2 = partial reliability
• s = stream, m = message; o = ordered, u = unordered
51
Expansion
• A line for every possible combination of features
– 43 lines: 32 SCTP, 3 TCP/UDP
• List shows reduction possibilities (step 2)
– e.g. flow control coupled with congestion control
– duplicates, subsets
52
Reduction method for step 2
• Remove services that seem unnecessary as a result of the step 1 expansion
• Apply common sense to go beyond the purely mechanical result of step 1
– Question: would an application have a reason to say “no” to this service under certain circumstances?
– Features that are just performance improvements if they are used correctly (i.e. depending on environment, not app) are not services
53
Step 2
• Connection orientation
– Removing it does not affect service diversity
– User view: API is always connection oriented
– on the wire, non-congestion-controlled service will always use UDP or UDP-Lite
– static distinction, clear by documentation
• Delivery type
– easy for the API to provide streams on top of message transport
– no need to expose this as a service
54
Step 2, contd.
• Multi-streaming
– Performance improvement, depending on environment conditions / congestion control behavior, not an application service
• Congestion control renamed ⇒ “flow characteristic”
• Multi-homing kept although not an app. service
– this is part of a different discussion
– could be removed above our API
55
Result of Step 2
56
API Design
• Goal: make usage attractive = easy
– stick with what programmers already know: deviate as little as possible from the socket interface
• Most services chosen upon socket creation
– int socket(int domain, int service)
– the service number identifies a line number in the table
– understandable aliases: e.g. PI_TCPLIKE_NODELAY, PI_TCPLIKE, PI_NO_CC_UNRELIABLE for lines 1-3
• Sending / receiving: provide sendmsg, recvmsg; for services 1, 2, 11, 17: send, recv
57
API Design /2
• We classified features as
– static: only chosen upon socket creation
• flow characteristic
– configurable: chosen upon socket creation + adjusted later with setsockopt
• error detection, reliability, multi-homing
– dynamic: no need to specify in advance
• application PDU bundling (Nagle in TCP)
• delivery order: socket option or flags field
58
Implementation example: unordered reliable message delivery with SCTP
• Variant on the right (code listing not reproduced): based on draft-ietf-tsvwg-sctpsocket-23
• Could not make this work in our testbed (suspect: bug in the SCTP socket API)
59
Implementation example: unordered reliable message delivery with SCTP /2
• SCTP, version 2 (this worked)
– socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP)
– set SCTP_NODELAY with setsockopt
– followed by (10 parameters!): sctp_sendmsg(sockfd, textMsg, msgLength, NULL, 0, 0, SCTP_UNORDERED, 1, 0, 0);
• PI_API version
– pi_socket(PF_INET, 12);
– pi_sendmsg(sockfd, &msg, 0);
60