An Architecture for Internet Data Transfer
Niraj Tolia, Michael Kaminsky*, David G. Andersen, and Swapnil Patil
Carnegie Mellon University and *Intel Research Pittsburgh
Niraj Tolia © May 2006

Innovation in Data Transfer is Hard
[Figure: data transfer systems and techniques labeled with years from 2001 to 2006: CAN, Bittorrent, DataStaging, Riverbed, Pastry, CASPER, Segank, OpenDHT, BlueFS, USPS, ALM-FastReplica, Chord, IBP, Value-Caching, Bullet, LBFS, Avalanche, CoBlitz, Lookaside Caching]
• Imagine: You have a novel data transfer technique
• How do you deploy?
  1. Update HTTP. Talk to IETF. Modify Apache, IIS, Firefox, Netscape, Opera, IE, Lynx, Wget, …
  2. Update SMTP. Talk to IETF. Modify Sendmail, Postfix, Outlook…
  3. Give up in frustration

Barriers to Innovation in Data Transfer
• Applications bundle:
  • Content Negotiation: What data to send
    – Naming (URLs, directories, …)
    – Languages
    – Identification
    – …
  • Data Transfer: Getting the bits across
• Both are tightly coupled (e.g., HTTP, SMTP)
• Hinders innovation and evolution of new services

Solution: A Data Transfer Service
[Figure: Sender and Receiver, each with an Xfer Service. Labels: Application Protocol and Data; Data]
• Decouple content negotiation from data transfer
• Applications perform negotiation as before
• But hand data objects to the Transfer Service
• The Transfer Service is shared by applications

Extensible Transfer Architecture
[Figure: Sender and Receiver connected by the Application Protocol, each with an Xfer Service and Plugins. Plugin labels: Bittorrent, USB Keychain, Local Cache]
• Application-independent cache
• New network features
• Non-networked transfers

Transfer Service Benefits
• Apps. can reuse available transfer techniques
  • No reimplementation needed
• Easier deployment of new technologies
  • Applications need no modification
• Provides for cross-application sharing
  • Can interpose on all data transfers
• Handles transient disconnections

Outline
• Motivation
• Data Oriented Transfer (DOT) service
• Evaluation
• Open Issues and Future Work
• Conclusion

10,000 Foot View of Transfers using DOT
[Figure: Sender and Receiver, each above an Xfer Service. Labels: Request File X; put(X); read(), data; "?" marks on the steps not yet explained]
• How does the transfer service name data?
• How does the transfer service locate data?

DOT: Object Naming
• Application defined names are not portable
• Use content-naming for globally unique names
• Objects represented by an OID
[Figure: File Foo.txt, Cryptographic Hash, OID]
• Objects are further sub-divided into “chunks”
[Figure: File divided into chunks labeled Desc1, Desc2, Desc3]
• Each OID corresponds to a list of descriptors
• Descriptor lists allow for partial transfers

DOT: Object Location
• Data transfers in DOT are receiver driven
  • Receiver has better idea of available resources
• Senders specify ‘hints’ - potential data locations
  – dot://sender.example.com:12000/
  – dht://opendht.org/
  – …

A Transfer using DOT
[Figure: Sender and Receiver, each above an Xfer Service with Transfer Plugins. Labels: Request File X; put(X); OID, Hints; get(OID, Hints); read(), data]

DOT’s Modular Architecture
[Figure: Application above DOT, with Transfer Plugin (Network) and Storage Plugin (Local Storage) below. Interfaces: (1) Application API, (2) Transfer Plugin API, (3) Storage Plugin API]

Transfer Plugin API
• Simple API
  • get_descriptor_list( OID, hints )
  • get_chunks( descriptor_list, hints )
  • cancel_chunks( chunk_list )
• Transfer plugin chaining is easy
  • e.g., multipath plugin
[Figure: DOT, MultiPath Plugin, Transfer Plugins, Network]

Implementation
• In C++ using libasync event-driven library
• One storage plugin:
  – In-memory hash tables, disk backed.
• Three transfer plugins:
  • Default Xfer-Xfer plugin
  • Portable Storage plugin
  • Multipath plugin
• Applications
  • gcp, an scp-like tool for file transfers
  • A DOT-enabled Postfix email server
    – Included a socket-like adapter library

Current DOT Prototype
[Figure: SENDER, RECEIVER, and MIRROR, each running Xfer with plugins. Labels: USB, NET, wireless, Internet, NET (DSL), Multipath, cache]
• Application-independent cache
• Multipath and Mirror support
• Non-networked transfers

Evaluation
• Standard file transfer
• Portable Storage
• Multi-Path
• Case Study: Postfix Email Server
  • Capture and analysis of email trace
  • Evaluation of DOT-enabled SMTP server
  • Integration effort

Standard File Transfer Setup
[Figure: two machines connected through a Network Emulator]
• Two DOT-enabled machines
• Network Emulator
  • Evaluate various b/w + delay combinations
• Use gcp for the file transfers
  • Used 40MB, 4MB, 400KB, 40KB, 4KB files
  • Presenting 40MB here

Standard File Transfer
[Figure: bar chart of Transfer Time (sec) - 40 MB, for wget, scp, scp w/o encr., and gcp at 1000 Mb/s and 100 Mb/s. Bar labels: 0.5, 0.7, 1.2, 1.6; 3.6, 3.7, 3.7, 4.1]
• Overhead: hashing, extra RTT
• No noticeable overheads with latency

Portable Storage Experiment
[Figure: experiment link labeled 2 Mbit/s]
• 255 MB transfer over emulated DSL
  • Based on Virtual Machine transfers at Carnegie Mellon
• DOT preemptively copies data onto Flash drive
• Wait 5 minutes, plug flash drive into receiver
• Two drive speeds
  • 8MB/s - 1GB
  • 20MB/s - 2GB

Portable Storage Results
[Figure: transfer results plot. Annotations: Device Inserted; 1126s (~ 19 min)]

Multipath Plugin: Load Balancing
[Figure: machines connected through a Network Emulator. Labels: Gigabit; Experimental links]
• Varied capacity + delay of experimental links
• Compare fastest link alone with multipath plugin on both links; what speedup?
• Transferred 40MB file
• 128 KB socket buffer sizes

Multipath Plugin is Effective
  Link 1    Link 2    Single    Multipath    Savings
  100/0     100/0      3.59       1.90       47%
  100/0     10/0       3.59       3.54       1.4%
  100/66    100/66    43.33      22.97       47%
  100/66    10/66     43.33      23.20       46%
  100/66    1/66      43.33      38.25       12%
- 40 MB @ 100Mbit/s ideal: 3.2 seconds
- Multipath plugin nearly doubles throughput
- TCP effects dominate. Pipe not full.
- Multipath plugin doubles by adding second stream. Actual capacity irrelevant.

Postfix Email Trace Replay
• Generated 10,000 email messages from trace
  • Random data matched to chunk hash data
  • Preserves some similarity between messages
• Replayed through Postfix to a single local server
  Program          Seconds    Bytes Sent
  Postfix          468        172 MB
  Postfix + DOT    468        117 MB (68%)
• Postfix disk bound… DOT CPU overhead negligible
• Savings due to duplication within emails

Postfix Integration
• Integrated DOT with the Postfix mail server
  Program    LoC       Added LoC    %
  GTC Lib    --        421
  Postfix    70,824    184          0.3%
  smtpd      6,413     107          1.7%
  smtp       3,378     71           2.1%
• 1 part-time week, 1 student new to Postfix
  • Includes time to write generic adapter library

Discussion on Deployment
• Application Resilience
  • DOT is a service - it’s outside the control of the application.
  • Our Postfix falls back to normal SMTP if
    – No Transfer Service contact
    – Transfer keeps failing
• In the short term, a simple fallback is encouraged. However, this could interfere with some functions
  – DOT-based virus scanner…
• In the long term, DOT would be a part of a system’s core infrastructure

Future Work
• Security
  • Application encrypts before DOT - No block-based caching, reuse, mirroring, …
  • No encryption - Resembles the status quo
  • In progress: Convergent encryption
    – Requires integration with DOT chunking
• Application Preferences
  • Encryption, QoS, priorities, …
    – DOT might benefit from application input
  • Need an extensible way to express these

Conclusion
• DOT separates app. logic from data transfer
  • Makes it easier to extend both
• Architecture works well
  • Overhead low (especially in wide-area)
• Major benefits
  • Caching
  • Flexibility to implement new transfer techniques

Backup Slides

Normal SMTP
[Figure: message exchange between SMTP Client and Server: EHLO; 250 Hello; MAIL FROM: user; …; DATA; 250 OK]

DOT-Enabled SMTP
[Figure: message exchange between SMTP Client and Server, each with an Xfer Service: EHLO; 250 Hello; MAIL FROM: user; …; X-DOT-DATA (OID+Hints); 250 OK]

Convergent Encryption
[Figure: File divided into chunks labeled Hash1, Hash2, Hash3]
• Chunk_i is encrypted using Hash_i
• All identical cleartext blocks will map to the same encrypted block
• Hash_i is further encrypted using a private key

Standard File Transfer
[Figure: bar chart of Transfer Time (sec) - 40 MB, axis ticks 0.1 to 100.0, for wget, scp, scp w/o encr., and gcp. Network settings: 1000 Mb/s; 100 Mb/s; 100 Mb/s 66ms; 20 Mb/s 33ms; 5 Mb/s 66ms. Bar labels: 0.5, 0.7, 1.2, 1.6; 3.6, 3.7, 3.7, 4.1; 43.5, 50.4, 48.1, 44.9; 28.0, 25.5, 25.4, 24.0; 70.5, 72.3, 72.3, 72.0]

Mail Server Evaluation
• Trace: 159 days at low volume academic mail server
  • 458,861 messages
  • hash, size of: message, headers, body
• Message chunks
  • hash and size of each chunk
  • Static chunking and Rabin fingerprinting (Content-based block division)

DOT chunk caching benefits email
  Method          Total Bytes    Percent Bytes
  SMTP Default    6800 MB        -
  DOT body        5876 MB        86.41%
  Rabin body      5056 MB        74.35%
  Rabin whole     5496 MB        80.81%

Default GTC-GTC Transfer Protocol
[Figure: message exchange between Sender and Receiver: GET_DESCRIPTORS(OID); Desc list 1,2,…; GET_CHUNKS(… ); Chunk 1; Chunk 2; GET_CHUNKS(… )]
• GTC-GTC protocol mirrors transfer plugins
• Implemented as RPC calls
• (Fetches are actually pipelined)

Rabin Fingerprinting
[Figure: File Data with Rabin Fingerprints 4, 7, 8, 2, 8; each fingerprint equal to the Given Value (8) marks a Natural Boundary; the resulting chunks are labeled Hash 1 and Hash 2]

Rabin Fingerprinting: Examples of Edits
1. Original File
2. Addition in chunk – Changes only one hash
3. Addition creating a new breakpoint
4. Deletion changing size of chunk
Figure from “A Low-bandwidth Network File System”

DOT Objects Naming
• Objects represented by an OID
• Divided into “chunks”, each with a descriptor
• Each OID corresponds to a list of descriptors
• Data is fetched using descriptor lists
  • Supports partial transfers

Innovation in Data Transfer is Hard
• Imagine: You have a novel data transfer technique
  • Say… Bittorrent, a P2P protocol for sharing large files
• How do you deploy?
  1. Update HTTP. Talk to IETF. Modify Apache, IIS, Firefox, Netscape, Opera, IE, Lynx, Wget, …
  2. Update SMTP. Talk to IETF. Modify Sendmail, Postfix, Exchange, Mail.app, Eudora, …
  3. Give up in frustration

DOT’s Modular Architecture
[Figure: Application above DOT, with several Transfer Plugins (Network) and a Storage Plugin (Local Storage) below. Interfaces: (1) Application API, (2) Transfer Plugin API, (3) Storage Plugin API]

DOT Plugins
• Multipath plugin
  • List of sub-plugins
  • Balances load
• Portable storage plugin
  • Sender: Copies new data onto USB flash device
  • Receiver: Scans USB flash device for blocks
    – naïve filesystem layout, unoptimized
    – but effective

DOT email chunk caching evaluation
• Step 1: Caching analysis (infinite cache)
  • SMTP default: What was really sent
  • DOT body: Whole-body only caching, headers sent separately
  • Rabin body: Headers sent separately, rabin fingerprint chunking of body
  • Rabin whole: Headers+body chunked together
    • Easiest to implement for application. Just send data…
• Step 2: Trace replay through Postfix

Related Work
• BEEP
• Proxy-based data interposition approaches
  – RON, X-Bone, OCALA
• Other Content-Addressable Systems
  – Bittorrent, DHTs, DTNs, EMC’s Centera
• Using Content-Addressability to save on data transfers
  – CASPER, LBFS, Rhea et al., Spring et al.
• Portable Storage
  – Lookaside Caching, BlueFS
• Other transfer protocols
  – GridFTP, IBP, HTTP, etc.

Transfer plugin API
• get_descriptors( OID, hints )
• get_chunks( descriptor, hints )
• cancel_chunks( chunk_list )
• Hints specify a plugin + data
  • gtc://sender.example.com:12000/
  • dht://opendht.org/
  • …
• Transfer plugin chaining is easy
  • e.g., multipath plugin
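
The three plugin calls and the chaining idea can be illustrated with a small sketch. It is written in Python for brevity, while the prototype itself is in C++, and it is not the DOT code. Everything beyond the three call names and the two hint URLs is an assumption made for illustration: SHA-1 as the content hash, 4-byte fixed-size chunks, the `MemoryPlugin` and `MultipathPlugin` classes, the `plugins_for` and `fetch` helpers, and the round-robin split used to balance load.

```python
import hashlib
from urllib.parse import urlparse

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example has several chunks


def name(data: bytes) -> str:
    """Content-derived name. SHA-1 is an assumption, not taken from the slides."""
    return hashlib.sha1(data).hexdigest()


class MemoryPlugin:
    """Hypothetical transfer plugin that serves chunks from local dictionaries."""

    def __init__(self):
        self.descriptor_lists = {}  # OID -> ordered list of chunk descriptors
        self.chunks = {}            # descriptor -> chunk bytes
        self.served = 0             # number of chunks this plugin has returned

    def put(self, data: bytes) -> str:
        """Store an object as fixed-size chunks and return its OID."""
        pieces = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
        descriptors = [name(p) for p in pieces]
        self.chunks.update(zip(descriptors, pieces))
        oid = name(data)
        self.descriptor_lists[oid] = descriptors
        return oid

    def get_descriptors(self, oid, hints):
        return list(self.descriptor_lists[oid])

    def get_chunks(self, descriptors, hints):
        self.served += len(descriptors)
        return {d: self.chunks[d] for d in descriptors}

    def cancel_chunks(self, chunk_list):
        pass  # nothing is ever in flight in this synchronous sketch


class MultipathPlugin:
    """Chained plugin: exposes the same three calls, spreads work over sub-plugins."""

    def __init__(self, sub_plugins):
        self.sub_plugins = list(sub_plugins)

    def get_descriptors(self, oid, hints):
        return self.sub_plugins[0].get_descriptors(oid, hints)

    def get_chunks(self, descriptors, hints):
        fetched = {}
        n = len(self.sub_plugins)
        for i, plugin in enumerate(self.sub_plugins):
            share = descriptors[i::n]  # round-robin split (an assumption)
            if share:
                fetched.update(plugin.get_chunks(share, hints))
        return fetched

    def cancel_chunks(self, chunk_list):
        for plugin in self.sub_plugins:
            plugin.cancel_chunks(chunk_list)


def plugins_for(hints, registry):
    """Pick plugins by hint URL scheme, e.g. 'gtc' for gtc://host:port/."""
    schemes = [urlparse(h).scheme for h in hints]
    return [registry[s] for s in schemes if s in registry]


def fetch(plugin, oid, hints):
    """Receiver-driven fetch: descriptors first, then chunks, then verify the OID."""
    descriptors = plugin.get_descriptors(oid, hints)
    chunks = plugin.get_chunks(descriptors, hints)
    data = b"".join(chunks[d] for d in descriptors)
    if name(data) != oid:
        raise ValueError("reassembled object does not match its OID")
    return data


# Usage: two in-memory stand-ins hold the same object; the hints select both.
sender, mirror = MemoryPlugin(), MemoryPlugin()
payload = b"hello DOT transfer service!"
oid = sender.put(payload)
mirror.put(payload)
hints = ["gtc://sender.example.com:12000/", "dht://opendht.org/"]
multipath = MultipathPlugin(plugins_for(hints, {"gtc": sender, "dht": mirror}))
received = fetch(multipath, oid, hints)
```

Because `MultipathPlugin` implements the same three calls as the plugins it wraps, the caller cannot tell a chained plugin from a leaf plugin, which is what makes chaining cheap in this sketch. Verifying the reassembled bytes against the OID is possible only because the name is derived from the content.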