LRU-K Page Replacement Algorithm
CSCI 485 Lecture notes
Instructor: Prof. Shahram Ghandeharizadeh.

Outline
• History
• Motivation for LRU-K
• Alternatives to LRU-K
• LRU-K
• Design and implementation
• Conclusion

History
• LRU-K is attributed to Elizabeth J. O’Neil, Patrick E. O’Neil, and Gerhard Weikum:
  – The LRU-K Page Replacement Algorithm for Database Disk Buffering, ACM SIGMOD 1993, Washington D.C., pages 297-306.

Least Recently Used (LRU)
• When a new buffer page is needed, the buffer pool manager drops the page that has not been accessed for the longest time.
• Originally designed for patterns of use in instruction logic (Denning 1968).
• Limitation: It decides which page to drop based on too little information (only the time of the last reference).

Pseudo-code for LRU
LRU (page p)
  If p is in the buffer then
    LAST(p) = current time;
  Else
    i) min = current time + 1;
    ii) For all pages q in the buffer do
        a) If (LAST(q) < min) then
             victim = q
             min = LAST(q)
    iii) If victim is dirty then flush it to disk
    iv) Fetch p into the buffer frame held by victim
    v) LAST(p) = current time

Example 1: LRU Limitation
• Consider a non-clustered, primary B-tree index on the SS# attribute of the Employee table.
  – t(Emp) = 20,000
  – P(Emp) = 10,000 (2 records per disk page)
  – lp(I, Emp) = 100
  – Workload: queries that retrieve Emp records using exact-match predicates on the SS# attribute, e.g., SS#=940-98-7555
• If the B-tree is one level deep (a root node followed by the 100 leaf pages), the pattern of access is: Ir, I1, D1, Ir, I2, D2, Ir, I3, D3, ….
• Assume your buffer pool consists of 101 frames. What is the ideal way to assign leaf pages and data pages to these frames? What will LRU do?
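One way to explore the question above is a small simulation. The sketch below is illustrative only: it follows the LRU pseudo-code with a logical clock, and the class name LRUBuffer, the helper example1_references, and the choice of a uniformly random leaf and data page per query are assumptions that do not come from the lecture.

```python
import random


class LRUBuffer:
    """Buffer pool that mirrors the lecture's LRU pseudo-code."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.last = {}      # LAST(p) for every buffered page p
        self.clock = 0      # logical time: one tick per reference
        self.hits = 0
        self.misses = 0

    def reference(self, page):
        """Reference `page`; return the evicted page, or None if nothing was evicted."""
        self.clock += 1
        if page in self.last:               # p is in the buffer
            self.last[page] = self.clock
            self.hits += 1
            return None
        self.misses += 1
        victim = None
        if len(self.last) >= self.num_frames:
            # Steps i-ii: the victim is the page with the smallest LAST value.
            victim = min(self.last, key=self.last.get)
            del self.last[victim]           # step iii (flushing a dirty victim) is omitted
        self.last[page] = self.clock        # steps iv-v
        return victim


def example1_references(num_queries, seed=0):
    """Yield Ir, Ii, Dj for each exact-match query (hypothetical workload)."""
    rng = random.Random(seed)
    for _ in range(num_queries):
        yield "Ir"                          # root of the index
        yield f"I{rng.randrange(100)}"      # assumption: uniformly random leaf page
        yield f"D{rng.randrange(10_000)}"   # assumption: uniformly random data page


buf = LRUBuffer(num_frames=101)
for page in example1_references(num_queries=5_000):
    buf.reference(page)
print(f"hit ratio with 101 frames: {buf.hits / (buf.hits + buf.misses):.3f}")
```

In this sketch the root stays resident because it is referenced every third request, while every fetched data page takes a frame that a leaf page could otherwise have kept.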
• In our example, LRU lets data pages compete with the leaf pages, swapping them out. The result is more disk I/O than necessary.

Example 2: LRU Limitation
• A banking application with good locality of shared page references, e.g., 5000 buffered pages out of one million disk pages receive 95% of the references.
• Once a few batch processes begin sequential scans through all one million pages, the pages they reference swap out the 5000 buffered pages.

Possible Approaches
• Page pool tuning
• Query execution plan analysis
• LRU-K

Page pool tuning
• The DBA constructs page pools, separating different reference patterns into different buffer pools.
• Disadvantages:
  – Requires human effort.
  – What happens when new reference patterns are introduced, or existing reference patterns disappear?

Query execution plan analysis
• The query optimizer should provide hints about the usage pattern of a query plan.
  – The buffer pool manager employs FIFO for pages retrieved by a sequential scan.
  – The buffer pool manager employs LRU for index pages.
• In multi-user situations, query optimizer plans may overlap in complicated ways.
• What happens with Example 1?

LRU-K
• The victim page (the page to be dropped) is the one whose backward K-distance is the maximum of all pages in the buffer.
• Definition of backward K-distance bt(p,K): Given a reference string known up to time t (r1, r2, …, rt), the backward K-distance bt(p,K) is the distance backward from t to the Kth most recent reference to page p. If p has been referenced fewer than K times, bt(p,K) is infinite.

LRU-K (Cont…)
• Design limitations:
  – Early page replacement: An unpopular page may observe correlated references shortly after being referenced for the first time.
  – Extra memory: LRU-K retains the history of referenced pages, even those that are no longer in the buffer. LRU does not have this limitation; its memory requirement is well defined.

Early Page Replacement
Key observation
• Two correlated references are insufficient reason to conclude that independent references will occur.
• One solution: The system should not drop a page immediately after its first reference. Instead, it should keep the page around for a short period, until the likelihood of a dependent follow-up reference is minimal. Then the page can be dropped. In addition, correlated references should not affect the interarrival time between requests as observed by LRU-K.
• Correlated reference period = timeout.

Memory required by LRU-K
• Why not keep the last K references in the header of each disk page (instead of main memory)?
  – After all, when the page is memory resident, its last K references are available.

Memory required by LRU-K (Cont…)
• Forget the history of pages using the 5 minute rule. Pages that have not been referenced during the last 5 minutes lose their history.

Pseudo-code of LRU-K
• HIST & LAST are main memory data structures.
• Optimizations:
  – Use a tree search to find the page with the maximum backward K-distance.

Performance Analysis
• Compare LRU (LRU-1) with LRU-2 and LRU-3.
• Three different workloads.
• Measured metrics:
  – Cache hit ratio for a given buffer pool size.
  – How much larger should the buffer pool with LRU-1 be in order to perform the same as LRU-2? This value is represented as B(1)/B(2).

Workload 1: Two Pool Experiments
• Designed to resemble the LRU limitation of Example 1 shown earlier.
• Two pools of disk pages: N1 and N2.
• References alternate between the two pools. Within pool Ni, the referenced page is chosen at random.
• What is the probability of reference to a page in Pool N1?

Obtained results for Workload 1

Key Observations
• LRU-3 is identical or very close to optimal.
• Why would one not choose K=3?
  – For evolving access patterns, LRU-3 is less adaptive than LRU-2 because it needs more references to adapt to dynamic changes in reference frequencies.
  – LRU-3 requires a larger number of requests to forget the past.
• Recommendation: Advocate LRU-2 as a generally efficient policy.

Workload 2: Zipfian random access
• 1000 pages accessed using a Zipfian distribution of access.

Workload 3: Trace driven
• Gather traces for one hour from an OLTP system used by a large bank.
• Number of unique page references is 470,000.
• Key observation: LRU-2 is superior to both LRU and LFU.

Results for Workload 3

Workload 3
• LRU-2 is superior to both LFU and LRU.
• With small buffer sizes (< 600), LRU-2 improved the buffer hit ratio by more than a factor of 2.
• LFU is surprisingly good. Why not LFU?
  1. LFU never forgets previous references when it compares the priorities of pages. Hence, it cannot adapt to evolving access patterns.

LRU-K
• Advantages:
  1. Discriminates well between page sets with different levels of reference frequency, e.g., index versus data pages (Example 1).
  2. Detects locality of reference within query executions, across multiple queries in the same transaction, and across multiple transactions executing simultaneously.
  3. Does not require external hints.
  4. Fairly simple and incurs little bookkeeping overhead.
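
The pieces described in these notes (HIST, LAST, the correlated reference period, and victim selection by maximum backward K-distance) can be combined into a minimal sketch. It is not the lecture's pseudo-code, which is not reproduced here. The class name LRUKBuffer, the logical clock, the LRU tie-break, and the fallback used when no page is eligible are all assumptions made for illustration. The 5 minute rule for discarding old history is left out.

```python
class LRUKBuffer:
    """Sketch of an LRU-K buffer pool (K=2 by default)."""

    def __init__(self, num_frames, k=2, correlated_period=0):
        self.num_frames = num_frames
        self.k = k
        self.correlated_period = correlated_period
        self.hist = {}          # HIST(p): K timestamps, newest first; 0 = no such reference
        self.last = {}          # LAST(p): latest reference, correlated or not
        self.resident = set()   # pages currently in the buffer
        self.clock = 0          # logical time; the first reference happens at time 1

    def _pick_victim(self, now):
        # Only pages outside their correlated reference period are eligible.
        eligible = [q for q in self.resident
                    if now - self.last[q] > self.correlated_period]
        # Assumption: if no page is eligible, consider every resident page.
        candidates = eligible or list(self.resident)
        # Oldest K-th reference means maximum backward K-distance.
        # Assumption: ties are broken by LAST, i.e. plain LRU.
        return min(candidates, key=lambda q: (self.hist[q][-1], self.last[q]))

    def reference(self, page):
        """Reference `page`; return the evicted page, or None."""
        self.clock += 1
        now = self.clock
        if page in self.resident:
            if now - self.last[page] > self.correlated_period:
                # Uncorrelated reference: collapse the finished correlated period
                # so it does not distort interarrival times, then shift history.
                span = self.last[page] - self.hist[page][0]
                older = [h + span if h else 0 for h in self.hist[page][:-1]]
                self.hist[page] = [now] + older
            self.last[page] = now
            return None
        victim = None
        if len(self.resident) >= self.num_frames:
            victim = self._pick_victim(now)
            self.resident.remove(victim)    # HIST and LAST of the victim are retained
        old = self.hist.get(page)
        self.hist[page] = [now] + (old[:-1] if old else [0] * (self.k - 1))
        self.last[page] = now
        self.resident.add(page)
        return victim


buf = LRUKBuffer(num_frames=2, k=2)
evicted = [buf.reference(p) for p in ["A", "A", "B", "C", "B", "C"]]
print(evicted)
```

With two frames, the fourth reference (C) evicts B rather than A: A already has two references, while B has only one and therefore an infinite backward 2-distance. Plain LRU would have evicted A at that point.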