Semantics of Caching with SPOCA - A Stateless, Proportional, Optimally-Consistent Addressing Algorithm Ashish Chawla, Benjamin Reed, Karl Juhnke, Ghousuddin Syed Yahoo! Inc Video Platform 2 6/21/11 Video Platform 3 6/21/11 Simple Content Serving Architecture 4 6/21/11 Outline § Introduction § Problem Definition § SPOCA and Requirements § Evaluations § Conclusion 5 6/21/11 The Problem § The front-end server disks are a secondary bottleneck. § Eliminating redundant caching of content also reduces the load on the storage farm. § An intelligent request-routing policy can produce far more caching efficiency than even a perfect cache promotion policy that must labor under random request routing. § The cache promotion algorithm not enough. 6 6/21/11 Problems from Geographic Distribution 7 6/21/11 Problems from Geographic Distribution Reque sts v7 8 6/21/11 Problems from Geographic Distribution 9 6/21/11 Outline § Introduction § Problem Definition § SPOCA and Requirements § Evaluations § Conclusion 10 6/21/11 Requirements § Merge different delivery pools and manage the diverse requirements in an adaptive way. § Minimize caching disruptions when front-end server leaves or enters the pool - re-address as few files as possible to different servers. § Proportional distribution of files among servers does not necessarily result in a proportional distribution of requests (Power Law) 11 6/21/11 SPOCA and Zebra § Used in production in a global scenario for web-scale load. § Shows real world improvements over the simple off-theshelf solution. § Implements load balancing, fault tolerance, popular content handling, and efficient cache utilization with a single simple mechanism. 12 6/21/11 Traditional Approach 13 6/21/11 Complete Picture 14 6/21/11 Complete Picture – Inside Data Center 15 6/21/11 Zebra Algorithm § Handles the geographic component of request routing and content caching § Based on content popularity, Zebra decides when requests should be routed to content’s home locale and when the content should be cached in the nearest locale § We use bloom filters to determine popularity. 16 6/21/11 Tracking popularity add(vid1) Bloom Filter 17 6/21/11 Checking Popularity contains(vid1) Bloom Filter 18 6/21/11 What’s the problem here? § Everything will become popular. § No way to expire content in bloom filter § We use a sequence of bloom filters to track popularity. 19 6/21/11 Bloom Filter Representation 0 • vid1 • vid5 1 2 • vid8 • vid526 • vid2 • vid752 20 6/21/11 Bloom Filter Representation 0 1 • vid1 • vid5 21 2 • vid8 • vid526 6/21/11 Bloom Filter Representation add(vid8) 0 1 • vid1 • vid5 22 2 • vid8 • vid526 6/21/11 Bloom Filter Representation 0 1 • vid8 • vid1 • vid5 23 2 • vid8 • vid526 6/21/11 Bloom Filter Representation contains(vid3) 0 1 • vid8 • vid1 • vid5 24 2 • vid8 • vid526 6/21/11 Bloom Filter Representation contains(vid3) Unified Filter vid1, vid5, vid8, vid526 0 1 2 • vid8 • vid1 • vid5 25 • vid8 • vid526 6/21/11 Key Points § Zebra determines which serving cluster will handle a given request based on geolocality and popularity. § SPOCA determines which front-end server within that cluster will cache and serve the request. 26 6/21/11 SPOCA Algorithm § Goal: Maximize cache utilization at the front-end servers. § Simple content to server assignment function based on a sparse hash space. § Each front-end server is assigned a portion of the hash space according to its capacity. § The SPOCA routing function uses a hash function to map names to a point in a hash space. › Input = the name of the requested content › Output = the server that will handle the request. § Re-hashing happens till the result maps to a valid hash space. 27 6/21/11 SPOCA hash map example 28 6/21/11 Failure Handling 29 6/21/11 Elasticity 30 6/21/11 Popular Content § SPOCA minimizes the number of servers to maximize the aggregate number of cached objects. § For popular content we need to route requests to multiple front-end servers. § We store the hashed address of any requested content for a brief popularity window, 150 seconds in our case. § When the popularity window expires, the stored hash for each object is discarded. 31 6/21/11 !" # !$" 32 6/21/11 !"# $ !"# 33 6/21/11 Outline § Introduction § Problem Definition § SPOCA and Requirements § Evaluations § Conclusion 34 6/21/11 Scaling 5x w/o software improvements 35 6/21/11 Scaling 5x with software improvements 36 6/21/11 Memory cache hits 37 6/21/11 Cache Hit and Misses* 2/26 3/1 3/5 3/7 3/10 3/14 Download Cache Miss 9.7% 7.2% 4.3% 3.7% 1.8% 0.4% Download Cache HIT 90.3% 92.8% 95.7% 96.3% 98.2% 99.6% Flash Cache Miss 21.8% 13.5% 22.0% 14.8% 2.5% 0.7% Flash RAM hit 57.2% 81.4% 66.1% 71.5% 90.0% 90.1% * Download and Flash Pools in S1S data center 38 6/21/11 Conclusion § Zebra and SPOCA do not have any hard state to maintain or per object meta-data § Eliminates any per object storage overhead or management, simplifying operations. § Consolidate content serving into a single pool of servers that can handle files from a variety of different workloads. § Decouple serving and caching layers. § Cost savings and end user satisfaction are key success metrics. 39 6/21/11 40 6/21/11