Presented at DSN-DCCS 2011 in Hong Kong on 6/28/11

TRODS
Transparent Recovery for Object Delivery Services
Wyatt Lloyd, Michael J. Freedman
Princeton University

[Figure: Client, Service with Servers; label "Connection Recovered!"]

Object Delivery Services
• Read-Only
• Static Content
• Webpages, Images, Videos

Work Now
• Can’t Modify Clients

Key Idea
• Coerce client to help
  – To identify connections that need recovery
  – To reliably store information
• Yet client is unmodified and unaware
  – Exploit TCP spec to control client’s stack

Object Delivery Cluster
[Figure: Service with Load Balancer, Liveness Monitor, Servers]

Failure
[Figure: Service with Load Balancer, Liveness Monitor, Servers]

TRODS
[Figure: Client, Service with Load Balancer, Liveness Monitor, Servers; label "?"]

TRODS
[Figure: Client, Service with Load Balancer, Liveness Monitor, Servers, Store; label "?"]

Road to Recovery
Step                            Technique
Redirect to live server         Liveness monitor updates load balancer
Induce client to send packet    Coerce client’s TCP stack
Continue Connection
  Determine Phase               Use packet + stored info
  Identify Object               Stored Info
  Find Offset                   Use packet + stored info

Coercing Clients
• Always Leave A Packet Unacknowledged
Exploit TCP Spec for Recovery Initiation!
[Figure: Client/Server packet exchange (SYN, SYN/ACK, ACK, Request, Response 1 to 3, FIN, FIN/ACK) with two Retransmit Queues, one holding SYN, Request, FIN/ACK and the other holding SYN/ACK, Response 1 to 3, FIN; label "Always Something Here"]

Continuing the Connection
• Determine Phase:
  1) TCP Setup
  2) HTTP Setup
  3) HTTP Download
  4) TCP Teardown
[Figure label: TRODS Saves Info]

Continuing the Download
• HTTP ObjectID
• Offset = TCP Ack – HTTP ObjectISN
[Figure: byte stream of SYN, HTTP Resp Header, HTTP Object; markers TCP ISN, HTTP ObjectISN, TCP Ack]

Persistent Store
• Key-Value Store
  + Corner Cases Handled
  + Unlimited Objects
  – Still Efficient (1 save only)
• TCP Timestamp
  + Very Efficient (1 machine only)
  – 1 Million Object Limit
  – Corner Cases
Exploit TCP Spec for Persistence!
[Figure labels: KV; packet IP, TCP, TS, Payload]

Recover the Connection
• Initiate New Connection
  – GET ObjectID …
  – Range: bytes=Offset-
• Splice Connections Together
• Works with Unmodified Servers!

TRODS
1) Packet Manipulation
   [Figure labels: Server stack TCP, TRODS, IP; packets "IP TCP …" and "IP TCP’ …"]
2) Protocol Inspection
   [Figure labels: Server stack TCP, TRODS, IP; Request, Response1, ObjID, ObjISN]
3) Blocks Connection
   [Figure labels: Server stack TCP, TRODS, IP; Response1, ObjID, ObjISN]
4) State Injection
   [Figure labels: Server stack TCP, TRODS, IP; packets "IP TCP …" and "IP TCP TS …"]
5) Recovery Initiation
   [Figure labels: Server stack TCP, TRODS, IP; "?", Ack]

Failure Walkthrough
[Figure: Client, Service with Load Balancer, Liveness Monitor, three Servers (each with a TCP, TRODS, IP stack), KV Store; packets SYN, SYN/ACK, ACK, Request, Response 1; stored ID, ISN]

Failure Walkthrough
[Figure: Client, Service with Load Balancer, Liveness Monitor ("!"), Servers (one marked "?"); packets ACK, FIN, Response 2 to 4]
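The offset rule on the Continuing the Download slide (Offset = TCP Ack – HTTP ObjectISN) and the Range header on the Recover the Connection slide can be sketched in a few lines of Python. This is only an illustration, not the TRODS kernel code: the function names are hypothetical, and the modulo 2**32 step is an assumption added here because TCP sequence numbers are 32-bit and wrap around, while the slides show only the plain subtraction.

```python
SEQ_MOD = 2 ** 32  # TCP sequence and acknowledgment numbers are 32-bit


def object_offset(tcp_ack: int, object_isn: int) -> int:
    """Offset = TCP Ack - HTTP ObjectISN.

    The result is taken modulo 2**32 (an assumption of this sketch) so that
    it stays correct when the sequence space wraps during the download.
    """
    return (tcp_ack - object_isn) % SEQ_MOD


def range_header(offset: int) -> str:
    """Format the Range header shown on the slide (hypothetical helper)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return f"Range: bytes={offset}-"


if __name__ == "__main__":
    # The client has acknowledged 500 bytes of the object.
    offset = object_offset(tcp_ack=1500, object_isn=1000)
    print(offset)
    print(range_header(offset))
```

With an ObjectISN of 1000 and a client ACK of 1500, the sketch yields offset 500, so the new connection would ask for the object starting at byte 500.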
Related Work
• New Transport
  – Trickles, SCTP, TCP Migrate, …
• TCP
  – FT-TCP, ST-TCP, Backdoors, …
• HTTP
  – CoRAL, …

Implementation
• Linux Kernel Module
• 3,000 lines of C
• ~CoRAL
  – Optimistic subset of CoRAL

Experiments
• Additional Latency
  – Normal
  – Failure
• Throughput
  – Lighttpd @ Princeton
  – Apache @ Emulab
  – Hybrid TS & KV Throughput
  – Failure

Normal Case Latency
• TRODS-TimeStamp (TS)
  – Median: + 0.009 ms
  – 99th: + 0.012 ms
• TRODS-Key-Value (KV)
  – Median: + 0.137 ms
  – 99th: + 0.148 ms

Recovery Latency
[Figure: CDF (0 to 1) of Additional Latency; x-axis marks ~0, .2ms, 20ms, 200ms, 3s; annotations ~50%, ~35%, ~15%, "Blink of an eye"]

Throughput Per Server
[Figure: Raw: 120 ops/s, 30 ops/s, 30 ops/s/server. Frontend: 120 ops/s, 30 ops/s, TPPS 20 ops/s/server]

Lighttpd
[Figure: Requests / Sec / Server (2500 to 22500) vs. Web Object Size (1KB to 128KB); series Unmodified, TRODS-TS, TRODS-KV, ~CoRAL; annotations 9%, 38%, 7%, 66%; KV/Server: 1/8, 1/4, 1/34, 1/2]

Apache
[Figure: Normalized TPPS (0 to 1) vs. Web Object Size (1KB to 64KB); series Unmodified, TRODS-TS, TRODS-KV, FT-TCP(cold), ~CoRAL, FT-TCP(hot)]

Summary
• Recover Object Delivery Connections
• Exploit TCP Specification to Coerce Unmodified Clients
  – To send recovery-starting packets
  – To provide persistent storage
• Evaluation
  – Low Latency
  – High Throughput Per Server
• Questions?
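The Throughput Per Server (TPPS) slide gives 120 ops/s aggregate throughput with 30 ops/s/server in one setup and 20 ops/s/server in the other. A small Python sketch of that arithmetic follows, assuming TPPS is aggregate throughput divided by the total number of machines. The machine counts (4 and 6) are not in the slide text; they are inferred here from 120 / 30 and 120 / 20 and should be read as assumptions, as should the function name.

```python
def throughput_per_server(total_ops_per_sec: float, machines: int) -> float:
    """Aggregate throughput divided by machine count (assumed TPPS definition)."""
    if machines <= 0:
        raise ValueError("machines must be positive")
    return total_ops_per_sec / machines


if __name__ == "__main__":
    # Assumed: 4 servers and no extra machines.
    print(throughput_per_server(120, 4))
    # Assumed: the same 4 servers plus 2 additional machines counted in the total.
    print(throughput_per_server(120, 6))
```

Under these assumptions the two calls reproduce the slide's 30 and 20 ops/s/server: machines that do not serve requests themselves still count in the denominator, which is why the metric penalizes designs that need extra hardware.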