PPSP Protocol Considerations and Tracker Protocol
draft-gu-ppsp-tracker-protocol-01
Y. J. Gu, David A. Bryan, Y. Zhang, H. Liao
IETF-78 Maastricht, PPSP Session

My Thoughts…
• Trying to share a picture of what a PPSP deployment might look like
  – When possible, we want to reuse protocols. Where can we, and which ones?
• Given an architecture, what does the tracker protocol for it look like?
  – Short overview of the protocol details of BitTorrent
  – These are our own (very early!) thoughts, so they may be wrong, but we hope to stimulate discussion

What are we trying to do?
• There are two basic protocols (or protocol operations)
  – Tracker protocol
    • How to find other peers that share the information
    • Focus of this talk, but need to discuss the peer protocol a bit to make sense…
  – Peer protocol
    • How to get information from those peers once you have found them
• But we seem to be looking at two different tasks
  – Offline/time-shifted media (essentially file sharing)
  – Streaming/real-time media

Tracker Protocol
• Need some way to locate the peers that are sharing the same content
• Don’t have a direct protocol to reuse from the IETF
• Basic idea is that tracker functionality is on a single server, but could be distributed (DHT)
  – Note that this is essentially a client-server protocol. Could distribute as a DHT underneath, but the tracker-peer operation is basically C/S
  – I’ve heard a proposal to use RELOAD; this works if the tracker is made up of a distributed set of peers
  – Anaheim meeting indicated interest in the tracker as a server, so RELOAD’s only application here seems to be as an implementation detail underneath or as an alternate distributed implementation

BitTorrent as a Model?
• Approach seems well suited to the offline use case
• May not have all the information we want/need for streaming
  – Need to find peers “nearby” in the stream
  – Should the tracker attempt to do this?
  – If not, many peers may need to be contacted to find one in the “right place” (depending on window size, pause, etc.)
• Possible issues
  – Security (or lack of)
  – HTTP approach is somewhat heavy (but easy)
  – Do we want to incorporate metadata into the tracker (not offline in a torrent file)?
    • Need to specify syntax for these metadata files

Some BT Basics
• BT’s primary purpose/design is file sharing (not originally designed for live streaming)
• Peers that share a particular file cluster together to share portions of it, forming a swarm

BitTorrent Entities
• Peers: Hosts that hold some portion of the file shared in a swarm. Peers exchange blocks, and a set of blocks makes up a piece of the file
  – A Seed is a peer with the entire file
• Tracker: A central server that stores a mapping between a swarm and the peers participating in that swarm
  – Tracker doesn’t store which peers have which pieces, just the list of peers
  – Tracker’s address is learned offline (from the torrent file)…

File and Metadata
• The original sharer splits the file into pieces and computes a SHA-1 hash of each piece
• The list of pieces and their hashes, and the location of a tracker that will serve peers sharing this file, are placed into a metadata file called a torrent
• The original user places the torrent on a web server and registers with the tracker as the initial seed, holding all chunks

Example Startup
[Diagram: file split into Chunk1, Chunk2, Chunk3, …; torrent file; Tracker (T); seed peer (P)]
1) File is chopped up, SHA-1 sum generated for each chunk
2) Torrent file lists chunks, sums, and tracker to use for swarm
3) Torrent file is stored on web server
4) Peer connects to Tracker with entire file as seed

Example Join Swarm
[Diagram: torrent file listing Chunk1, Chunk2, Chunk3, …; Tracker (T); peers (P) in the swarm]
1) Peer connects to web site and obtains torrent file to locate tracker
2) Peer connects to Tracker to find other peers
3) Peer connects to other peers in swarm and exchanges chunks

Peer Exchange
• Peers exchange blocks or chunks
  – Swap smaller bits than described in the metafile; assembled chunks are verified against their SHA-1 hashes
• Simple gossip protocol
  – Generally unstructured, not a DHT
  – Connect to peers that may have desired chunks
  – Exchange bitfield msgs indicating which chunks each peer has
  – Send a request msg to ask for a chunk; it is returned in a piece msg
  – Once a piece is downloaded, it is advertised with a have msg
  – Interested/not interested and choked/not choked are used in flow control
  – keep-alive message

BitTorrent Protocol Details
• Regular HTTP is used to obtain the torrent from a web server
• The tracker protocol is also HTTP, essentially a GET to ask for a list of peers/join the swarm
• The peer protocol is a TCP wire protocol (binary)

Our (Strawman) Proposal
• New -01 version coming soon
  – Yingjie Gu, David Bryan, Yunfei Zhang, Hongluan Liao
• Currently propose binary protocol (but open)
  – Lightweight, aesthetic considerations
  – Could also use HTTP with XML or something similar
• Messages to connect to a tracker/disconnect
  – Credential verification, verify peer-ID later used by peer protocol
  – Credential issuance/peer-ID assignment not (necessarily) by tracker
• Messages to join/leave a swarm (and get list of peers)
  – Currently can store location in stream/get peers at this location…may be hard to implement
• Diagnostics between peers and tracker, keep-alive messages, query list of swarms from tracker
• Will describe in detail in later talk

Peer Protocol Considerations
• New transport is out of scope.
  – Offline and Streaming scenarios
• Need to reuse existing protocols
• SRTP/RTP for streaming. ??? for offline
• Should the work of LEDBAT be leveraged here?
• Lightweight gossip protocol between peers
  – Typical for BT is 20-50 peers, in an unstructured way
  – Is RELOAD suited for this, or will we need something lighter?
    • Try a RELOAD usage and find out?

What might this look like? Offline/Time Shifted Scenario
[Diagram: tracker (T) and peers (P), with links labelled as follows]
• New peer protocol (BT based?) to find peers, get metadata (not specific chunks)
• Existing transport protocol to obtain chunks from peer (leverage LEDBAT?)
• Lightweight gossip protocol between peers to find chunks (RELOAD usage or new?)

What might this look like? Streaming Scenario
[Diagram: tracker (T) and peers (P), with links labelled as follows]
• New peer protocol (BT based?) to find peers, get metadata, where in stream peer is?
• Existing transport protocol to stream the data (RTP/SRTP?)
• Lightweight gossip protocol between peers to start/stop stream to other peers

Protocol Reuse?
• May be many places where we can reuse existing protocols, but in some cases we would be using them for things we haven’t done before
  – LEDBAT?
  – RELOAD for lightweight/gossip protocol (not DHT)?
  – New protocol for Tracker or HTTP with XML bodies or something similar?

Tracker Protocol Proposal
• New -01 version released (quite a few changes)
  – Yingjie Gu, David A. Bryan, Yunfei Zhang, Hongluan Liao
• Key changes:
  – Changed names of several messages
    • Name and semantic meaning were not aligned
  – Added XML/HTTP Encoding
    • Authors don’t view encoding as that important right now -- the set of messages and their semantic meaning is what is critical
    • Encodings are proof of concept
• This is a basic overview, not a detailed description (can read the draft shortly, when we iterate, for all details)
• Still very early work -- primary focus is exploring problems through design/early implementation
• A number of hard questions are being left open for WG input, and I’ll talk about those today
• This is by no means complete right now -- lots of work left to do!!!
• Authors are very interested in suggestions

Messages
• CONNECT/DISCONNECT
  – Associate with server, verify credentials
  – DISCONNECT removes from all swarms, leaves system
• JOIN/LEAVE
  – Participate in a particular swarm (streaming or file for VoD)
  – Possibly a JOIN_CHUNK to allow for specifying where in a live stream (but big can of worms…)
• FIND
  – Given a swarm, locate a number of peers
  – Leaving room to specify criteria (where in stream (if we allow JOIN_CHUNK), capacity, possibly certain layers)
  – Quality (ALTO in the tracker, or do peers do ALTO?)
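The semantics of the CONNECT/DISCONNECT, JOIN/LEAVE and FIND messages listed above can be illustrated with a small in-memory model. This is only a sketch of the behaviour described on the slide, not the draft's wire protocol: the class name ToyTracker, the method names, the string peer and swarm IDs, and the default of 20 returned peers are all assumptions made for illustration, and credential verification is omitted.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyTracker:
    """Hypothetical in-memory tracker; models message semantics only."""
    connected: set = field(default_factory=set)  # peer IDs that have CONNECTed
    swarms: dict = field(default_factory=dict)   # swarm ID -> set of peer IDs

    def connect(self, peer_id):
        # CONNECT: associate with the tracker (credential checks omitted).
        self.connected.add(peer_id)

    def join(self, peer_id, swarm_id):
        # JOIN: participate in one swarm; assumed to require a prior CONNECT.
        if peer_id not in self.connected:
            raise PermissionError("peer must CONNECT before JOIN")
        self.swarms.setdefault(swarm_id, set()).add(peer_id)

    def leave(self, peer_id, swarm_id):
        # LEAVE: stop participating in one swarm, stay connected.
        self.swarms.get(swarm_id, set()).discard(peer_id)

    def find(self, peer_id, swarm_id, count=20):
        # FIND: return up to `count` other peers in the swarm, chosen at random.
        candidates = sorted(self.swarms.get(swarm_id, set()) - {peer_id})
        return random.sample(candidates, min(count, len(candidates)))

    def disconnect(self, peer_id):
        # DISCONNECT: remove the peer from all swarms, then from the system.
        for members in self.swarms.values():
            members.discard(peer_id)
        self.connected.discard(peer_id)
```

Selection criteria for FIND (position in stream, capacity, layers) are deliberately left out, since the slides treat them as open questions.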
Messages, Continued
• STATs messages
  – STAT_QUERY/STAT_REPORT
  – Send info to tracker / query about other peers
  – Tracker can also poll peers
• KEEPALIVE
  – Limit on live time for peers to tracker, so if no requests in a certain time, refresh connections
  – Another option is to expire either CONNECT or JOIN and require a subsequent call…
• QUERY (do we need this?)
  – Search for swarms/list swarms (Tracker protocol or should this be something else?)
  – Not currently in the draft

Open Issues/Considerations
• Binary vs. text encoding
  – Transport/security mechanism if not HTTP
• Need to define format for metadata describing the file
• Peer-IDs used in many messages, but assignment is offline. Do we want a version of CONNECT that issues IDs?
• Response of list of peers depends on peer protocol used -- IP address vs. Peer-ID only
• NAT traversal needs to be considered

Metafile
• Differs slightly for streaming/VoD
• VoD:
  – Needs to describe chunk format, number of chunks, breakdown (sizing), support for layered encodings, codec
  – MD5 sums of blocks (or collections of blocks)
• Streaming:
  – Codec, chunk size, but some fields likely don’t make sense (number of chunks, MD5 sums)
• It is very important to get this right, but hard!

Impact of Peer Prot. on Tracker Prot.
• If we use a DHT style peer protocol, with lookup, then at CONNECT time, the peers need to insert into the overlay
  – Tracker then only needs to return peer-ID (after bootstrap to locate peer when connecting)
• If we use an unstructured/gossip protocol, not clear this is ideal
  – Random connection to 20 peers in a system of millions likely means you need to provide an address
  – Or, just use a DHT for routing, nothing else?

NAT Traversal
• Easy for managed systems -- server is placed in a reachable location.
• Issue: unmanaged systems:
  – Can’t guarantee the tracker is in public address space
• Bigger issue: distributed tracker
  – Peers may very well be behind NATs
  – Fully distributed RELOAD may solve some of this
  – If just a few “super-peers”, how do we decide promotion (NAT detection is provably difficult/impossible…)

Conclusion
• This is early work -- much still to do
• Decoupling the design of the peer and tracker protocols may help with design, but some aspects are intertwined (for example peer list structure)
• Very much need advice drawn from existing implementations
  – Biggest question: Do we have the right commands?

PPSP Peer Protocol
draft-gu-ppsp-peer-protocol-00
Y.J. Gu, David A. Bryan
IETF-78 Maastricht, PPSP Session

Overview
• Draft covers:
  – Requirements for Peer Protocol
  – Simple example of a possible flow
  – Discussion of open issues
  – Skeleton (currently empty) for new protocol
    • Define binary and text strawman proposals such as in tracker protocol?
• Very early work (less developed than tracker protocol, but we will be extending/expanding)
  – Conversations with some vendors with deployment experience, including some since -00
• Main emphasis currently around requirements and open issues

Peer Protocol Requirements
• Location/Connection Requirements
  – Req 1: Once a client has a peerlist it SHOULD be able to locate peers and connect to them with no or minimal Tracker's assistance.
  – Req 2: The Peer Protocol MUST provide a mechanism for peers behind different NATs/Firewalls to connect with each other.

Peer Protocol Requirements
• Information Exchange Requirements
  – Req 3: The Peer Protocol SHOULD enable peers to request/return/exchange peerlists.
  – Req 4: The Peer Protocol SHOULD enable peers to request/return/exchange data availability, e.g. bitmap of chunks.
  – Req 5: The Peer Protocol MUST be able to carry different data structures for different applications.
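Req 4 mentions a "bitmap of chunks" as one form of data availability. A minimal sketch of such a bitmap follows. The function names are hypothetical, and the bit order (most significant bit of the first byte is chunk 0, as in BitTorrent's bitfield message) is an assumption; the slides do not define an encoding.

```python
from __future__ import annotations


def encode_bitmap(have: set[int], num_chunks: int) -> bytes:
    """Pack chunk indices into a bitmap, one bit per chunk."""
    buf = bytearray((num_chunks + 7) // 8)
    for idx in have:
        if not 0 <= idx < num_chunks:
            raise ValueError(f"chunk index {idx} out of range")
        # Assumed bit order: MSB of byte 0 represents chunk 0.
        buf[idx // 8] |= 0x80 >> (idx % 8)
    return bytes(buf)


def decode_bitmap(data: bytes, num_chunks: int) -> set[int]:
    """Recover the set of chunk indices whose bits are set."""
    if len(data) != (num_chunks + 7) // 8:
        raise ValueError("bitmap length does not match chunk count")
    return {i for i in range(num_chunks) if data[i // 8] & (0x80 >> (i % 8))}


def wanted_from(remote_bitmap: bytes, mine: set[int], num_chunks: int) -> list[int]:
    """Chunks the remote peer advertises that the local peer still lacks."""
    return sorted(decode_bitmap(remote_bitmap, num_chunks) - mine)
```

A peer holding only chunk 0 that receives a bitmap advertising chunks 0, 2 and 9 would learn from wanted_from that chunks 2 and 9 are worth requesting from that peer.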
Peer Protocol Requirements
• Transportation/Negotiation Requirements
  – Req 6: The Peer Protocol MUST be able to negotiate a transportation protocol that both peers can support.
• Security Requirements
  – Req 7: The Peer Protocol MUST guarantee peers' privacy.
  – Req 8: Peers SHOULD be able to verify the identity of remote peers.

Protocol Overview
• Currently looking at peer protocol in two logical parts
  – Location Portion
    • Locate and connect with Remote Peers
  – Signaling Portion
    • Get additional peerlist (optional?)
    • Exchange data availability (e.g. bitmap)
    • Negotiate transport mechanism (e.g. protocol, port…)
  – Could be two protocols…
  – Actual transfer is yet another aspect, but may be out of scope here (beyond negotiation using signaling portion)

Location Protocol
• Ties in intimately with tracker protocol
• The decision of identity type included in the peerlists in the tracker protocol will influence the choice of peer location mechanism
  – IP address, Peer-ID w/DHT, other?
• Effectiveness/decision on these options will influence the design of the tracker protocol.

Candidate Protocols
• Candidate protocols include RELOAD, SIP, a new protocol, combination…
  – RELOAD
    • In theory, works for unstructured, but untested.
    • Could use structured simply to connect peers/create connections?
  – SIP
    • Potentially heavier than RELOAD.
    • No direct mechanism to support location by Peer-ID?
  – RELOAD w/SIP
    • Use RELOAD to locate/connect, SIP to negotiate?
  – New protocol
    • Single new protocol for location and negotiation? New for only some functions with reuse for others?

NATs and Choice of Identifier
• Big issue in identifiers: NAT
• In current network, most peers are behind NAT.
• If IP addresses are used, ICE/STUN/TURN server functions may need to be deployed in PPSP system:
  – Tracker: Natural server/relay location, but will increase the burden on tracker and need long-term connection between peers and tracker.
  – 3rd party server?
• If Peer-IDs are used (particularly reusing RELOAD), can potentially leverage ICE support, adding tracker as a relay/STUN server when needed?

Conclusions
• One of the biggest questions, and one that has strong bearing on the tracker protocol, is the use of IP addresses or Peer-IDs (and a mechanism to locate them)
• NAT traversal is critical
• Need to identify set of operations, determine if reuse is possible
• Perhaps decide on identifier issue, then work on tracker protocol first?

Acknowledgements
• We would like to acknowledge the following for providing advice/suggestions/questions:
  – Roni Even, Yunfei Zhang, Hongluan Liao, Ning Zong, Daniel De Vera, Matias Barrios

Backup Slides

HTTP/XML Approach
• HTTP POST method used
  – May not be best approach
• Bodies are encoded in XML
• Tags defined in draft

Encoding, Transport for Binary
• Currently, we have proposed a binary protocol
  – Lightweight, low bandwidth for mobile devices
  – Basic ideas would apply to other encodings
• Right now, the transport is left unspecified -- looking for feedback from group
  – Since this is essentially a client-server like operation, preference is for a persistent secure connection approach (TLS/DTLS)
    • May want a different approach if we do a distributed tracker
  – Is there a good reason to use DTLS (UDP)?
• Fragmentation mechanism borrowed from RELOAD (first/last bits, offset)

Basic Shared Header
[Bit diagram, 32 bits per row; fields in wire order]
  PPSP Tracker Protocol Token    one 32-bit row
  Version | Reserved | Method    share one 32-bit row
  Transaction ID                 two 32-bit rows
  Fragmentation                  one 32-bit row
  Message Length                 one 32-bit row

Example Specific Request (JOIN)
[Bit diagram, 32 bits per row; fields in wire order]
  PeerID                         160 bits
  SwarmID                        128 bits
  Expiration Time                one 32-bit row

Messages/Responses
• Transaction ID used for correlation/retransmission
• Responses are numeric codes, optionally with bodies (for example when requesting a list of peers)
• Currently a pure request/response protocol
  – No need for anything else so far

BitTorrent References
• Official Protocol Specification (very limited!) is at http://bittorrent.org/beps/bep_0003.html
• Unofficial Specifications (much more detailed) at theory.org: http://wiki.theory.org/BitTorrentSpecification
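To make the Basic Shared Header and JOIN request layouts from the backup slides more concrete, here is a rough Python sketch that packs and unpacks a JOIN message with the standard struct module. Several details are assumptions, not taken from the slides or the draft: the split of the second header row into an 8-bit Version, 8-bit Reserved and 16-bit Method; the token value; the JOIN method code; a Fragmentation value of 0 for an unfragmented message; and Message Length counting only the body.

```python
import struct

# Assumed layout: token(32) version(8) reserved(8) method(16)
# transaction ID(64) fragmentation(32) message length(32), network byte order.
HEADER = struct.Struct("!IBBHQII")
# JOIN body per the slide: PeerID 160 bits, SwarmID 128 bits, expiration 32 bits.
JOIN_BODY = struct.Struct("!20s16sI")

TOKEN = 0x50505350   # hypothetical token value (ASCII "PPSP")
VERSION = 1          # hypothetical version number
METHOD_JOIN = 3      # hypothetical method code for JOIN


def pack_join(transaction_id: int, peer_id: bytes, swarm_id: bytes, expires: int) -> bytes:
    """Build header + JOIN body as one byte string."""
    if len(peer_id) != 20 or len(swarm_id) != 16:
        raise ValueError("PeerID must be 20 bytes and SwarmID 16 bytes")
    body = JOIN_BODY.pack(peer_id, swarm_id, expires)
    # Fragmentation is set to 0 here; its real bit layout is not modelled.
    header = HEADER.pack(TOKEN, VERSION, 0, METHOD_JOIN, transaction_id, 0, len(body))
    return header + body


def unpack_join(message: bytes):
    """Parse a JOIN message; returns (transaction_id, peer_id, swarm_id, expires)."""
    token, _version, _reserved, method, txid, _frag, length = HEADER.unpack_from(message)
    if token != TOKEN or method != METHOD_JOIN or length != JOIN_BODY.size:
        raise ValueError("not a well-formed JOIN message")
    peer_id, swarm_id, expires = JOIN_BODY.unpack_from(message, HEADER.size)
    return txid, peer_id, swarm_id, expires
```

Under these assumptions the header is 24 bytes and the JOIN body 40 bytes, so a complete JOIN request is 64 bytes; the transaction ID returned by unpack_join is what a response would echo for correlation.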