Consistent Hashing: Load Balancing in a Changing World
David Karger, Eric Lehman, Tom Leighton, Matt Levine, Daniel Lewin, Rina Panigrahy
MIT

Caches can Load Balance
[Figure: server; items distributed among caches; users get items from caches]
Numerous items in central server.
Requests can swamp server.
Distribute items among caches.
Clients get items from caches.
Server gets only 1 request per item.

Who Caches What?
Each cache should hold few items
  » else cache gets swamped by clients
Each item should be in few caches
  » else server gets swamped by caches
  » and cache invalidations/updates are expensive
Browser must know the right cache
  » fast, local computation

A Solution: Hashing
[Figure: server; items assigned to caches by hash function; users use hash to compute cache for item]
Example: y = ax + b (mod n)
Intuition: assigns items to "random" caches
  » few items per cache
Easy to compute which cache holds an item

Problem: Adding Caches
Suppose a new cache arrives.
How do we work it into the hash function?
Natural change: y = ax + b (mod n+1)
Problem: changes the bucket for almost every item
  » every cache will be flushed
  » server gets swamped with new requests
Goal: when a bucket is added, few items move

Problem: Inconsistent Views
Each client knows about a different set of caches: its view
View affects choice of cache for item
  » Same item may hash to many places: caches swamp server with requests for item
  » Many items may hash to same place: clients swamp cache
Goal: despite views, items are evenly distributed, each into only a few caches

Solution: Consistent Hashing
Use a standard hash function to map caches and items to points in the unit interval.
  » "random" points spread uniformly
Item assigned to nearest cache in view
[Figure legend: item; cache (bucket)]
Computation as easy as a standard hash function

Properties
All buckets get roughly the same number of items (like standard hashing).
When the kth bucket is added, only a 1/k fraction of items move.
  » and only from a few caches
When a cache is added, minimal reshuffling of cached items is required.

Multiple View Properties
Despite multiple views, each cache gets few items
  » no cache overloaded
Despite multiple views, each item is in only a few caches.
  » server protected, cache updates easy
System tolerates multiple, inconsistent views of caches (also fault tolerant).

Load Balancing
Task: distribute items into buckets
  » Data to memory locations
  » Files to disks
  » Tasks to processors
  » Web pages to caches (our motivation)
Goal: even distribution

Problem: No Synchronization
Each user knows about a different set of caches: a view
View affects assignment of items to caches
Problems when there are multiple views:
[Figure: items assigned to one cache over 4 views (View 1 to View 4)]
The items assigned to a specific cache are different in each view.
These sets could be essentially disjoint for standard hash functions.
Over all views, the cache is responsible for too many items.
Cache not large enough to contain the active set of items

Multiple Views (cont.)
[Figure: item assigned to different caches in each of 4 views]
Item may be assigned to different caches in different views.
Standard hash function may assign item to a different cache in every view.
Result: item requested from many caches
  » Server swamped with requests for copies of the item
  » Hard to update cached copies

Problem: Adding Caches
New cache means new hash function
  » natural change: y = ax + b (mod n+1)
Standard hash functions completely redistribute items when the range of the function changes:
  » Every cache will be flushed
  » Server is swamped with requests, since items are reshuffled between caches
  » Need to broadcast the new hash function to all users at the same time: some kind of global synchronization?
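
The reshuffling problem above can be checked with a small experiment. The sketch below is illustrative only and is not taken from the slides: the constants a = 31 and b = 7, the use of SHA-256 as the "standard hash function", and the names linear_bucket, point, consistent_bucket and moved_fraction are all assumptions made for the example. It counts how many items change cache when one cache is added, first under y = ax + b (mod n), then under a literal reading of the consistent hashing slide (caches and items mapped to points in the unit interval, item assigned to the nearest cache in the view).

```python
import hashlib


def linear_bucket(x, n, a=31, b=7):
    """Standard hash from the slides: y = a*x + b (mod n); a and b are arbitrary."""
    return (a * x + b) % n


def point(key):
    """Map a key to a pseudo-random point in the unit interval [0, 1).

    SHA-256 stands in for the "standard hash function" (an assumption).
    """
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def consistent_bucket(item, view):
    """Assign the item to the cache in `view` whose point is nearest to the item's point."""
    p = point(("item", item))
    return min(view, key=lambda cache: abs(point(("cache", cache)) - p))


def moved_fraction(before, after, items):
    """Return (fraction of items whose cache changed, list of those items)."""
    moved = [x for x in items if before(x) != after(x)]
    return len(moved) / len(items), moved


items = list(range(1100))
n = 10  # caches are numbered 0..n-1; cache n is the new arrival

# Standard hashing: the range of the function changes from n to n+1.
mod_frac, _ = moved_fraction(
    lambda x: linear_bucket(x, n),
    lambda x: linear_bucket(x, n + 1),
    items,
)

# Consistent hashing: the view grows by one cache, the hash itself is unchanged.
old_view = list(range(n))
new_view = old_view + [n]
con_frac, con_moved = moved_fraction(
    lambda x: consistent_bucket(x, old_view),
    lambda x: consistent_bucket(x, new_view),
    items,
)

print(f"mod n -> mod n+1: {mod_frac:.1%} of items moved")
print(f"consistent hashing: {con_frac:.1%} of items moved")
```

With these constants, exactly 1000 of the 1100 items change bucket under the mod-based hash, so nearly every cache is flushed. Under the nearest-point rule an item can only move to the newly added cache, which is why existing caches never exchange items with each other. The linear scan in consistent_bucket is kept for clarity; a sorted list of cache points with binary search would be the usual choice for large views, and with a single point per cache the moved fraction is only roughly 1/k on average.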