Practical LFU implementation for Web Caching
George Karakostas, Telcordia
Dimitrios N. Serpanos, University of Patras

A simple caching environment
[Figure: a simple caching environment]

Basic assumptions
1. The number of all Web pages N is known. (only order of magnitude matters)
2. The system is closed. (yeah, right… but we won't care)
3. The requests for Web pages follow Zipf's Law. (plenty of experimental evidence)
4. The requests are statistically independent. (very strong assumption - counterintuitive(?))

Zipf-like distributions
Let $U = \{P_i : P_i \text{ is the } i\text{-th most popular page},\ 1 \le i \le N\}$.
Then $\Pr[P_i \text{ is requested}] = \frac{1}{i\,H_N}$, where $H_N = \sum_{j=1}^{N} \frac{1}{j}$ is the N-th harmonic number.
More generally: $\Pr[P_i \text{ is requested}] \propto \frac{1}{i^{\alpha}}$, where α is a constant between 0.6 and 0.9, depending on the particular request stream.

Popularities according to Zipf
[Figure: page popularities according to Zipf's law]
where α = 1.

Our motivation
• Serpanos & Wolf prove analytically the optimality of Perfect-LFU under assumptions 3 and 4.
• Breslau et al. studied the implications of assumptions 3 and 4. They give evidence for a Zipf-like distribution of page requests, and for the optimality of Perfect-LFU as a cache replacement policy.

But, if so...
Why don't people use Perfect-LFU?
Answer: Because it is 'Perfect' (i.e. impractical). Perfect-LFU needs to store statistics for all the pages requested since the beginning of cache operation. Hence the resources (time/space) needed are of order N.
Our contribution: We show that under assumptions 1-4 we can efficiently approximate the Perfect-LFU hit rate within any constant ε.

Chernoff bounds
Theorem [Chernoff]: The sum of R i.i.d.
random variables is close to its expected value with very high probability:
$$\Pr\left[\sum_{i=1}^{R} X_i \le (1-\varepsilon)\,E[X]\right] \le e^{-\Omega(\varepsilon^{2} R)}, \qquad X = \sum_{i=1}^{R} X_i$$
Observation 1: Under our assumptions, the number of requests for a page in a random trace is close to its expected value, i.e. proportional to its popularity.
Observation 2: With a small R we can distinguish the most popular objects.

Window-LFU
• Simple variation of Perfect-LFU.
• Instead of keeping statistics for all pages, keep them only for a sample of the request stream (called the window) of size $|W| = \mathrm{poly}(C)\, N^{1-\alpha} \ln(1/\varepsilon)$, where C is the cache size and ε is the error parameter.
• Cache the C most frequent pages in the sample.
Theorem: Under our assumptions,
$$\text{HIT-RATE}_{\text{W-LFU}}(C) \ge (1-\varepsilon)\,\text{HIT-RATE}_{\text{P-LFU}}(C)$$

Window placement
Observation: Under our assumptions, any sample of size |W| will achieve the Perfect-LFU hit rate.
[Figure: the request stream, with a new request arriving at the CACHE]

Locality
Two different types of locality phenomena:
• Temporal
• Popularity
Our window will be the |W| most recent requests, to take advantage of temporal locality as well.

Simulation results
[Figures: gnuplot plots of the simulation results; not recoverable from the source]
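The Window-LFU policy described on the slides above can be sketched in a few lines of Python. This is only an illustrative sketch under the deck's assumptions 1-4, not the authors' simulator: the names `zipf_probs`, `make_trace`, `WindowLFU`, `hit_rate` and `ideal_perfect_lfu_hit_rate`, and all parameter values, are hypothetical. For simplicity the sketch recomputes the cache contents on every request, which a practical implementation would avoid.

```python
import random
from collections import Counter, deque
from itertools import accumulate


def zipf_probs(n, alpha):
    """Request probabilities with Pr[P_i] proportional to 1 / i**alpha, i = 1..n."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def make_trace(n, alpha, length, seed=0):
    """Independent requests (assumption 4) drawn from a Zipf-like law (assumption 3)."""
    rng = random.Random(seed)
    cum = list(accumulate(zipf_probs(n, alpha)))
    return rng.choices(range(1, n + 1), cum_weights=cum, k=length)


class WindowLFU:
    """Keep the C most frequent pages among the |W| most recent requests."""

    def __init__(self, cache_size, window_size):
        self.cache_size = cache_size
        self.window_size = window_size
        self.window = deque()    # the |W| most recent requests
        self.counts = Counter()  # request counts inside the window only
        self.cache = set()

    def request(self, page):
        """Serve one request; return True on a cache hit."""
        hit = page in self.cache
        # Slide the window forward by one request.
        self.window.append(page)
        self.counts[page] += 1
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        # Naive refresh: cache the C most frequent pages in the window.
        self.cache = {p for p, _ in self.counts.most_common(self.cache_size)}
        return hit


def hit_rate(policy, trace):
    """Fraction of requests in the trace that hit the cache."""
    return sum(policy.request(p) for p in trace) / len(trace)


def ideal_perfect_lfu_hit_rate(n, alpha, cache_size):
    """Hit rate if the C truly most popular pages were always cached."""
    return sum(zipf_probs(n, alpha)[:cache_size])


if __name__ == "__main__":
    N, ALPHA, C, W = 1000, 0.8, 20, 500  # illustrative values only
    trace = make_trace(N, ALPHA, 5000)
    print("Window-LFU hit rate:", hit_rate(WindowLFU(C, W), trace))
    print("Ideal Perfect-LFU hit rate:", ideal_perfect_lfu_hit_rate(N, ALPHA, C))
```

Only counts for requests inside the window are stored, so the space used is of order |W| rather than N. Varying `W` in the script gives a rough feel for how the window size affects the gap to the ideal hit rate on a synthetic trace.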

Conclusions / Open problems
• Window-LFU is an efficient implementation of LFU.
• It takes advantage of the different types of locality to achieve better performance in practice than Perfect-LFU.
• How can we determine the window size dynamically? (A simple doubling heuristic performs very well.)
• How can we detect that the Zipf-like distribution parameters (N, α) have changed?