SEEDING CLOUD-BASED SERVICES: DISTRIBUTED RATE LIMITING (DRL)
Kevin Webb, Barath Raghavan, Kashi Vishwanath, Sriram Ramabhadran, Kenneth Yocum, and Alex C. Snoeren

Seeding the Cloud
- Technologies to deliver on the promise of cloud computing
- Previously: process data in the cloud (Mortar)
  - Data produced/stored across providers
  - Find Ken Yocum or Dennis Logothetis for more info
- Today: control resource usage ("cloud control") with DRL
  - Services use resources at multiple sites (e.g., a CDN)
  - This complicates resource accounting and control
  - Goal: provide cost control

DRL Overview
- Example: cost control in a Content Distribution Network
- Abstraction: enforce a global rate limit across multiple sites
- Simple example: 10 flows, each limited as if there were a single, central limiter
- [Figure: a single limiter caps 10 flows at 100 KB/s total; with DRL, a limiter seeing 2 flows enforces 20 KB/s while a limiter seeing 8 flows enforces 80 KB/s]

Goals & Challenges
- Up to now:
  - Developed the architecture and protocols for distributed rate limiting (SIGCOMM '07)
  - One particular approach (FPS) is practical in the wide area
- Current goals:
  - Move DRL out of the lab and impact real services
  - Validate the SIGCOMM results in real-world conditions
  - Give an Internet testbed the ability to manage bandwidth in a distributed fashion
  - Improve the usability of PlanetLab
- Challenges:
  - Run-time overheads: CPU, memory, communication
  - Environment: link/node failures, software quirks

PlanetLab
- World-wide testbed for systems research
- Resources donated by universities, labs, etc.
- Experiments divided into VMs called "slices" (Vservers)
- [Figure: a controller (web server, PLC API, PostgreSQL on Linux 2.6) connected over the Internet to nodes, each running slices 1..N as Vservers on Linux 2.6]

PlanetLab Use Cases
- PlanetLab needs DRL!
  - Donated bandwidth
  - Ease of administration
- Machine room: limit local-area nodes to a single rate
- Per slice: limit experiments in the wide area
- Per organization: limit all slices belonging to an organization
- [Figure: machine-room example, with DRL limiters on several nodes sharing limits of 5 MBps and 1 MBps]

Design
- Each limiter runs a main event loop at a regular interval:
  - Estimate: observe and record outgoing demand
  - Allocate: determine the rate share of each node
  - Enforce: drop packets above the local limit
- Two allocation approaches:
  - GRD: global random drop (packet granularity)
  - FPS: flow proportional share (flow count as a proxy for demand)
- [Figure: input traffic feeds Estimate; Allocate (FPS) exchanges state with the other limiters at a regular interval; Enforce shapes the output traffic]
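The slides describe the estimate/allocate/enforce loop but include no code. The following is a minimal Python sketch of that loop under simplifying assumptions: flow count stands in for demand, peer counts are supplied directly rather than arriving via the gossip protocol, and the proportional-share formula is an illustration rather than the actual FPS algorithm from the SIGCOMM '07 paper. All names (Limiter, count_outgoing_flows, set_htb_rate) and constants are hypothetical.

```python
import random
import time

GLOBAL_LIMIT_BPS = 1_000_000   # hypothetical 1 Mbit/s global limit
INTERVAL_SECS = 0.5            # hypothetical estimate/allocate interval


def count_outgoing_flows():
    """Stub for the ulogd-based flow accounting; returns a fake demand."""
    return random.randint(0, 10)


def set_htb_rate(rate_bps):
    """Stub for reprogramming the HTB root class with the new rate."""
    print(f"local limit set to {rate_bps:.0f} bit/s")


class Limiter:
    """Simplified sketch of one DRL limiter's estimate/allocate/enforce loop."""

    def __init__(self, peer_flow_counts):
        # In the real system, peer demand arrives via the mesh/gossip
        # update protocol; here it is just a list we read from.
        self.peer_flow_counts = peer_flow_counts
        self.local_flows = 0

    def estimate(self):
        # Observe and record outgoing demand (flow count as the proxy).
        self.local_flows = count_outgoing_flows()

    def allocate(self):
        # Take a share of the global limit proportional to local demand --
        # a simplification standing in for FPS, not the actual algorithm.
        total = self.local_flows + sum(self.peer_flow_counts)
        if total == 0:
            return GLOBAL_LIMIT_BPS / (len(self.peer_flow_counts) + 1)
        return GLOBAL_LIMIT_BPS * self.local_flows / total

    def enforce(self, rate_bps):
        # Drop (or shape) traffic above the allocated rate via HTB.
        set_htb_rate(rate_bps)

    def run(self, rounds=3):
        for _ in range(rounds):
            self.estimate()
            self.enforce(self.allocate())
            time.sleep(INTERVAL_SECS)


if __name__ == "__main__":
    Limiter(peer_flow_counts=[2, 8]).run()
```

In the real implementation, estimation comes from the ulogd plug-in and enforcement reprograms the Hierarchical Token Bucket tree, as described next.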
Implementation Architecture
- Abstractions:
  - Limiter: handles communication and manages identities
  - Identity: parameters (limit, interval, etc.), machines and subsets
- Built upon standard Linux tools:
  - Userspace packet logging (ulogd) for estimation
  - Hierarchical Token Bucket (HTB) for enforcement
  - Mesh & gossip update protocols
  - Integrated with the PlanetLab software
- [Figure: input data flows through Estimate (ulogd), Allocate (FPS, run at a regular interval), and Enforce (HTB) to the output data]

Estimation using ulogd
- Userspace logging daemon
  - Already used by PlanetLab for efficient abuse tracking
  - Packets are tagged with a slice ID by IPTables
  - Receives outgoing packet headers via a netlink socket
- DRL is implemented as a ulogd plug-in
  - Gives us efficient flow accounting for estimation
  - Executes the Estimate, Allocate, Enforce loop
  - Communicates with the other limiters

Enforcement with Hierarchical Token Bucket
- Part of Linux Advanced Routing & Traffic Control
- A hierarchy of rate limits enforces DRL's rate limit
- Packets are attributed to leaves (slices)
- Packets move up the tree, borrowing tokens from parents
- [Figure: token bucket tree (root, classes A-D, leaves X-Z) with example token counts; a 1500-byte packet is charged to its leaf and borrows the remaining tokens from its ancestors]

Enforcement with Hierarchical Token Bucket (continued)
- Uses the same tree structure as PlanetLab
- Efficient control of sub-trees
- Updated every loop; each level is replenished
- The root limits the whole node

Citadel Site
- The Citadel (2 nodes)
  - Wanted a 1 Mbps traffic limit
  - Added a (horrible) traffic shaper with poor responsiveness (2 to 15 seconds)
- Running right now!
  - DRL cycles on and off every four minutes
  - Lets us observe DRL's impact without ground truth

Citadel Results – Outgoing Traffic
- [Figure: outgoing traffic over time against the 1 Mbit/s limit, with DRL cycling off and on]
- Data logged from running nodes
- Takeaways:
  - Without DRL, traffic is way over the limit
  - One node sends more than the other

Citadel Results – Flow Counts
- [Figure: number of flows over time]
- FPS uses flow count as a proxy for demand

Citadel Results – Limits and Weights
- [Figure: FPS weight and rate limit over time]

Lessons Learned
- Flow counting is not always the best proxy for demand
  - FPS state transitions were irregular
  - Added checks and dampening/hysteresis in the problem cases
- Estimation can happen after enforcement
  - ulogd only sees packets after HTB
  - FPS is forgiving of such software limitations
- HTB is difficult
  - HYSTERESIS variable
  - TCP segmentation offloading

Ongoing work
- Other use cases
- Larger-scale tests
- Complete PlanetLab administrative interface
- Standalone version
- Continue the DRL rollout on PlanetLab (UCSD's PlanetLab nodes soon)

Questions?
- Code is available from the PlanetLab svn: http://svn.planet-lab.org/svn/DistributedRateLimiting/
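To complement the HTB enforcement slides above, here is a toy Python sketch of the borrowing behavior they describe: each packet is charged to its leaf (slice) class and borrows any missing tokens from ancestor classes, and every class is replenished each loop. This is an illustration of the idea only, not the Linux HTB algorithm or DRL's actual configuration; the class, rates, and packet sizes are made up.

```python
class Bucket:
    """Toy token bucket class that can borrow from its parent, as in HTB."""

    def __init__(self, name, rate_bytes_per_tick, parent=None):
        self.name = name
        self.rate = rate_bytes_per_tick   # tokens added each loop
        self.tokens = rate_bytes_per_tick
        self.parent = parent

    def replenish(self):
        # "Replenish each level": every loop, each class regains its tokens.
        self.tokens = self.rate
        if self.parent:
            self.parent.replenish()

    def admit(self, size):
        """Charge a packet to this class, borrowing from ancestors if needed."""
        node, available, chain = self, 0, []
        while node and available < size:
            chain.append(node)
            available += node.tokens
            node = node.parent
        if available < size:
            return False              # not enough tokens anywhere: drop packet
        remaining = size
        for node in chain:            # consume tokens bottom-up ("borrowing")
            used = min(node.tokens, remaining)
            node.tokens -= used
            remaining -= used
        return True


if __name__ == "__main__":
    root = Bucket("root", rate_bytes_per_tick=3000)            # whole-node limit
    slice_a = Bucket("slice A", rate_bytes_per_tick=200, parent=root)
    slice_b = Bucket("slice B", rate_bytes_per_tick=600, parent=root)

    print(slice_a.admit(1500))   # True: 200 own tokens + 1300 borrowed from the root
    print(slice_b.admit(1500))   # True: 600 own tokens + 900 borrowed from the root
    print(slice_a.admit(1500))   # False: leaf and root tokens are exhausted
    slice_a.replenish()          # next loop iteration refills this leaf and its ancestors
    print(slice_a.admit(1500))   # True again after replenishment
```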