Lecture 5: Data Caching and Invalidation

§5.1 Basics
§5.2 Broadcast Disk Caching
§5.3 Cache Invalidation
§5.4 Cooperative Caching

§5.1 Basics

Recall the two basic modes of data access:
- Data dissemination: from the server to a large population of clients.
- Dedicated data access: from the server to an individual client, and from the client back to the server.

Unreliable links and disconnection in data dissemination
- Solution: cache useful data items.
- Problem with caching: data items may become outdated or inconsistent.
- Solution: use an invalidation mechanism to discard outdated items; consistency can be maintained with a good broadcast / transaction mechanism.

Unreliable links and disconnection in dedicated data access
- For data updates: tentative update at the local host.
- For data reads: try again later.
- Problem with tentative updates: the same item may be updated by different parties.
- Solution: data reintegration or reconciliation.

Data Caching
- Caching: maintaining, in client storage, a copy of a data item previously obtained from the server, for potential future use.
  - Operating systems often cache data in memory and remote data on local disk.
  - Normally, caching is done on demand: the client requests a needed item and keeps a local copy.
- Prefetching: the client requests a data item even if it is not needed now, and keeps a copy for future need.
  - Prefetching is done on an anticipatory basis.
- Hoarding: prefetching done in anticipation of future disconnection.

Goal
- Cache the data items most likely to be useful in the future, given a limited amount of cache storage and potential changes in data item values.

Performance
- Cache hit ratio
  - Percentage of accesses that result in a cache hit, or
  - Ratio of the volume of data retrieved from the cache to the total volume of requested data.
- Access response time
  - Average service time to complete a data item request.
  - Includes the time to retrieve the item from the broadcast channel or to request it from the server over a dedicated request channel.

Major Issues
- Granularity: what format to cache.
  - Page-based, item-based (tuple or object), attribute-based (part of an item), predicate-based (a logical fragment of related items), chunk-based (generalized page).
- Admission: what to cache.
  - Decide whether a requested item should reside in the cache.
- Replacement: what to remove to make room.
  - LRU, MRU, LFU, LRD, window, EWMA.
- Coherence/Consistency: how to ensure that the cache is valid.
  - Update-based, invalidation-based, lease-based, verification-based.

Granularity of Caching
- Cache table: identifies the cached unit (item, page or attribute).
- Translation mechanism: maps a requested item to the cached item in a transparent manner.
- Coarse granularity (page or chunk) exploits locality of reference to perform block transfers.
  - Useful for I/O, since physical storage is also page-based.
  - Large pages or chunks consume scarce wireless bandwidth.
  - More useful for the server to broadcast than for the client to cache (unless there is strong locality of reference).
- Fine granularity (object-based or attribute-based) is more flexible.
  - Useful for individualized access to items by a client.
  - Attribute-based caching provides the highest flexibility.
  - High cache-table overhead, especially for attribute-based caching.

Cache Admission/Placement
- Should an item be kept in the cache for future use?
  - The question arises when a data item arrives at the client and is used by the application.
- Admitting useless items wastes cache storage.
- Failing to admit useful items leads to high response time.
- Some metric is used to estimate the importance of an item, e.g. its access probability.
  - An item whose access probability is higher than that of some item in the cache should be admitted.

Cache Replacement
- When the cache is full, a victim must be selected for removal.
- Fixed-size replacement is easier.
- LRU
  - Used by most operating systems, with good performance.
  - Easy to implement with a stack ordered by access time, or with the second-chance algorithm.
- LRU(k)
  - Extends LRU to take into account the most recent k accesses (k ≥ 1).
  - More costly to implement, but performs better than LRU.
- MRU
  - Useful for some file systems, especially with cyclic request patterns.
  - Relatively poor performance in general.
- LFU
  - Keeps an access count per item and replaces the least-used item.
  - Old items with a high access count would never be replaced.
- LRD (least reference density)
  - Remedies LFU by normalizing the count by the period from an item's first access to now.
  - Replaces the item with the lowest reference density.
  - LRD with aging periodically divides the density by a certain factor, to reduce the effect of old accesses.
- Window
  - Keeps the access times of the most recent w accesses and removes the item with the smallest access density.
  - Higher storage cost for metadata, proportional to w.
- EWMA (exponentially weighted moving average)
  - A combination of standard LRD and Window.
  - Computes an access score for each item based on the inter-arrival time of consecutive accesses.
  - Maintains the score with exponential smoothing.
  - Easy to implement; good and relatively stable performance.

Cache Coherence
- The value of a cached item should be close to the most recently updated value.
- Traditional approaches in distributed systems:
  - Callback approach: write-update, write-invalidate.
  - Detection approach: validation.
- Update-based
  - Server broadcasts the updated value during the regular broadcast cycle or over a “change” channel.
  - Easy to implement, but impractical for large items with frequent changes.
- Invalidation-based
  - Server broadcasts an invalidation report to inform clients of changed values.
  - Clients delete items that have become invalid.
  - The more commonly adopted approach.
  - Difficulty with traditional approaches: invalidations may be missed because the wireless network is unreliable.
- Lease-based
  - Server promises that an item is valid for a certain period.
  - Client can use the item until the validity period expires.
  - A lease can be renewed either by the server via broadcast or by the client via request.
- Verification-based
  - Server does not perform any update or invalidation.
  - Client validates an item before using it.
  - Similar traffic to standard request/reply, but the reply can be small if the verification is positive.
  - More useful when combined with the lease-based approach: the client verifies on demand only when the lease expires, and requests lease renewal at the same time.

§5.2 Broadcast Disk Caching

[Figure: broadcast disk schedule]
   2  5  1 12  2  8 15 18  2  5 22 25  2  8 28 31
   9  6  3 13  9 10 16 19  9  6 23 21  9 10 29 32
  26  7  4 14 26 11 17 20 26  7 24 27 26 11 30 33

- A broadcast disk schedule can be generated from the access probabilities of items so as to minimize the expected access time.
- Caching in the presence of a broadcast disk differs from standard caching with LRU, LRD or EWMA replacement.
  - Deciding to cache an item based on its predicted future utility alone is not the best way.
  - One should cache an item that is not only highly useful in the future but also not due to arrive over the broadcast in the near future.
- Two decisions:
  - Whether an item should be cached (admission).
  - Which item to remove to make room for a new item (replacement).

Observation
- Hot items are broadcast very frequently, so caching them does not make good use of the limited cache.
- Cold items are broadcast very sparsely, but there are too many of them to cache, and each cached cold item contributes only marginally because of its low access rate.
- The decision on whether to cache an item should depend on the relative access need of the item.
  - It is better to cache an item whose local access probability is significantly greater than its broadcast frequency.
- The decision on which item to replace should depend on the expected arrival time of the selected item.
  - This can be approximated by the broadcast frequency of the item.
- Broadcast disk adopts a cost-based replacement mechanism that takes these factors into account.

PIX: access Probability Inverse broadcast frequency (X)
- Assumes perfect knowledge of data access probabilities.
- Cost = P / X, where P is the access probability of the item and X is the frequency at which the item is broadcast.
- Example
  - Item 1 is accessed 1% of the time and is broadcast 10 times in a broadcast cycle.
  - Item 2 is accessed 0.5% of the time and is broadcast 2 times in a broadcast cycle.
  - Item 1 should be replaced in order to keep item 2, even though item 1 is accessed more often, since item 1 is much easier to get over the broadcast.
- This P / X value is called the pix value of the item.
  - Pix value of item 1 = 0.01 / 10 = 0.001.
  - Pix value of item 2 = 0.005 / 2 = 0.0025.

PIX example (using the example from the broadcast disk lecture and the schedule shown above)
- Three broadcast disks A, B and C, with broadcast frequencies 4:2:1 and cycle length 48.
- Access probability of each item in group A, B and C: qA = 4/21, qB = 1/21, qC = 1/168.
- The cache has size 3; the access sequence (reference string) is 2, 5, 4, 7, 5, 3, 5, 9.
- PIX2 = PIX9 = 1/21; PIX5 = PIX7 = 1/42; PIX3 = PIX4 = 1/168.
- Trace:
  - 2, 5, 4 arrive: the cache contains 2, 5, 4, in that order.
  - 7 arrives: remove 4, which has the lowest PIX, and put in 7: 2, 5, 7.
  - 5 arrives: no action.
  - 3 arrives: 5 and 7 tie for the lowest PIX (1/42); 7 is the older of the two, so it is removed: 2, 5, 3.
  - 5 arrives: no action.
  - 9 arrives: remove 3, which has the lowest PIX: 2, 5, 9.

Approximating PIX
- The access probability may not be known.
- A page to be replaced must be compared with all cached pages to find the minimum.
- A practical approximation of PIX is needed.

LIX, a variation of LRU that approximates PIX
- Recall that LRU maintains a stack of cached pages (normally as a linked list).
  - When a page is accessed, it is moved to the top.
  - When a page is to be replaced, the bottom page is replaced.
- LIX makes use of the nature of the broadcast disk (a collection of N disks).
- LIX implementation:
  - Maintain a stack of cached pages as in LRU, but one stack per disk.
  - When a page is accessed, it is moved to the top of its stack.
  - When a new page enters the cache, the lix value of the bottom page of each of the N stacks (disks) is computed.
  - The page with the smallest lix value is chosen as the victim and is replaced.
  - The new page enters the stack of its own disk and is placed at the top of that stack.
- How to compute the lix value?
  - For each data item i, keep a running probability estimate pi:
      pi = α / (CurrentTime - ti) + (1 - α) * pi
    where
      ti is the time of the most recent access to the page, and
      α is a constant that weighs the most recent access against the previous running probability estimate.
  - lix(i) = pi / (broadcast frequency of i)

In-Class Questions
- What can be done if the broadcast frequency is also unknown?
- Is PIX suitable for a general data broadcasting system?

§5.3 Cache Invalidation

- Cache invalidation is the most important technique for ensuring the consistency or freshness of items cached by mobile clients.
- Traditional invalidation:
  - Server sends an invalidation message to a client to revoke its privilege to access a cached item.
  - Client removes the invalidated item and replies with an invalidation acknowledgement.
- Differences from traditional invalidation:
  - Server broadcasts invalidations to many clients, which may be in different states (some may be dozing).
  - Server may not even know the identity of some clients.
  - Clients may easily miss some invalidation reports.
  - Clients may become disconnected.
- How can the server ensure that an invalidation is heard properly? Should it repeat the report for newly joining clients?

Mobile cache invalidation is normally stateless
- There are too many mobile clients.
- Server may not know about some clients that arrive and cache some items.
- Clients may disconnect without being noticed.

A taxonomy of invalidation schemes:
- Content of the invalidation report.
- Mechanism of invalidation.
- Information to be kept at the server to construct the report.
Two ways to send invalidation reports (IRs)
- Synchronous: IRs are broadcast to clients periodically.
  - A client needs to listen to the reports periodically.
  - Before answering a request, the client either waits until the next report arrives, or initiates a pull request to the server.
- Asynchronous: an IR is broadcast to clients when a data item is updated.
  - To answer a query, the client either listens for the next report, or initiates a pull to the server.

[Figure: timeline of periodic reports with updates and queries, marking when a cached item is valid and when it is invalid]

Broadcast Timestamp Approach
- Server: periodic IR broadcast, every L seconds.
  - An IR entry <id, TSid> states that data item id was updated at time TSid.
  - The report contains all updates in the time window w (w ≥ L).
- Client: on receiving an IR,
  - if item i was cached before time TSi, invalidate it;
  - otherwise, set the cache time of i to now.
- To answer a query, the client:
  - receives an invalidation report and performs invalidation;
  - sends a query for the missing items over the uplink to the server.
- Tolerance of packet loss or disconnection:
  - OK: missing some reports, or disconnecting for a period shorter than w.
  - Problem: after missing reports for a period of w, the client must discard the whole cache.

Broadcast Timestamp Approach: Example
- There are 16 items in the system, L = 4, w = 16, now = 34.
- The invalidation report broadcast at time 34 is {<1,24>, <5,22>, <6,18>, <7,26>, <8,32>, <10,20>, <12,30>, <16,28>}.
- Assume that client C is disconnected from time 27 to 32 and that all items were cached at time 24.
- To query {1, 2, 6, 7, 9, 12, 14} at time 32, C waits for the report at time 34 and invalidates some items.
  - Item 1 is valid (cached at 24, updated at 24); item 6 is valid; item 7 is invalid (cached at 24, updated at 26); item 12 is invalid (updated at 30).
  - Items not in the report were last updated before time 18 and are valid.
- Thus, the query to the server contains items 7 and 12.
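The Broadcast Timestamp example can be checked with a short Python sketch. The function names `build_report` and `apply_report` and the dictionary layout are illustrative assumptions, not part of the lecture. The window is treated as closed at now - w so that item 6 (updated at 18) appears in the report, as it does in the example; the rule for dropping the whole cache after a gap longer than w is likewise an assumed reading of the slide.

```python
# Last update time of each item (ids 1..16), taken from the lecture's Id/TS table.
TS = [24, 16, 10, 6, 22, 18, 26, 32, 2, 20, 14, 30, 8, 4, 12, 28]
LAST_UPDATE = {i + 1: ts for i, ts in enumerate(TS)}


def build_report(last_update, now, w):
    """Server: <id, TS> pairs for items updated in the window [now - w, now]."""
    return {i: ts for i, ts in last_update.items() if now - w <= ts <= now}


def apply_report(cache_time, report, now, last_heard, w):
    """Client: return the cache (id -> cache time) that survives this report.

    Assumption: if the previous report was heard more than w ago, the
    window no longer covers the gap and the whole cache is dropped.
    """
    if now - last_heard > w:
        return {}
    survivors = {}
    for item, cached_at in cache_time.items():
        updated_at = report.get(item)
        if updated_at is not None and cached_at < updated_at:
            continue  # cached before the update: invalidate
        survivors[item] = now  # known to be valid as of this report
    return survivors


# Lecture example: L = 4, w = 16, now = 34; every item cached at time 24;
# client disconnected from 27 to 32, so the last report heard was at 26.
report = build_report(LAST_UPDATE, now=34, w=16)
cache = apply_report({i: 24 for i in LAST_UPDATE}, report,
                     now=34, last_heard=26, w=16)
query = [1, 2, 6, 7, 9, 12, 14]
missing = [i for i in query if i not in cache]
print(sorted(report))  # ids listed in the report
print(missing)         # items to request from the server
```

With the lecture's numbers this reproduces the report ids {1, 5, 6, 7, 8, 10, 12, 16} and the uplink query for items 7 and 12.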
Id   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
TS  24  16  10   6  22  18  26  32   2  20  14  30   8   4  12  28

Amnesic Terminal Approach
- An improvement on Broadcast Timestamp that reduces the size of the IR: the report carries only the ids of updated items.
  - These are the items updated between now - L (the last report) and now.
- Client is stateless with respect to IRs.
  - Upon receiving a report, invalidate all items listed in it.
- Advantage: smaller invalidation report.
- Disadvantage: the client cannot afford to miss even a single report.
  - A client that misses a report must discard the whole cache.

Amnesic Terminal Approach: Example (same Id/TS table)
- There are 16 items in the system, L = 4, now = 34.
- The invalidation report broadcast at time 34 is {8, 12}.
- To query {1, 2, 6, 7, 9, 12, 14}, wait for the next report at time 34.
- Case 1: no disconnection and no report loss.
  - Item 12 is in the report and is invalid; the others are all valid.
  - Thus, the query to the server contains only item 12.
- Case 2: a disconnection from time 27 to 33.
  - The report at time 30 was missed.
  - The last report received was at time 26, which is too old.
  - Discard all items in the cache and send a query to the server for all items.

Group-based Approach
- Addresses the problem of cache rebuilding after a long disconnection.
- Items are grouped so that some groups can be saved.
  - Items in a group need not be consecutive.
- Item IR <id, TSid>: items updated after now - w.
- Group IR <Gid, GTSGid>: groups updated after now - W.
  - W is the group update time window, W > w.
  - GTSGid is the timestamp of the latest update of the group's items within W.
- Report layout (report time | item IR | group IR):
    34 | 7,26  8,32  12,30  16,28 | G1,24  G2,22  G3,20  G4,12

Group-based Approach: Server operations
- First, construct the item IR.
- Then, construct the group IR.
- Computing GTS:
  - Consider only items not in the item IR.
  - GTS = max{t, now - W}, where t is
    - the largest update timestamp among the group's items with timestamp less than now - w, or
    - zero, if no such item is found.
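The server-side construction above can be sketched in Python as follows. The function name `build_group_report`, the dictionary layout and the group labels as strings are illustrative assumptions; the four groups of four consecutive items and the window sizes w = 8 and W = 24 are those of the lecture's worked example. Items updated exactly at now - w are placed in the item IR, which is an assumption chosen to match the sample report (item 7, updated at 26).

```python
# Last update time of each item (ids 1..16), taken from the lecture's Id/TS table.
TS = [24, 16, 10, 6, 22, 18, 26, 32, 2, 20, 14, 30, 8, 4, 12, 28]
LAST_UPDATE = {i + 1: ts for i, ts in enumerate(TS)}

# Grouping used in the lecture's worked example.
GROUPS = {
    "G1": [1, 2, 3, 4],
    "G2": [5, 6, 7, 8],
    "G3": [9, 10, 11, 12],
    "G4": [13, 14, 15, 16],
}


def build_group_report(last_update, groups, now, w, W):
    """Return (item_ir, group_ir) for a report broadcast at time `now`."""
    cutoff = now - w
    # Item IR: recent updates, listed individually with their timestamps.
    item_ir = {i: ts for i, ts in last_update.items() if ts >= cutoff}
    group_ir = {}
    for gid, members in groups.items():
        # Only items NOT in the item IR contribute to the group timestamp.
        older = [last_update[i] for i in members if last_update[i] < cutoff]
        t = max(older, default=0)  # zero if no such item is found
        group_ir[gid] = max(t, now - W)
    return item_ir, group_ir


item_ir, group_ir = build_group_report(LAST_UPDATE, GROUPS, now=34, w=8, W=24)
print(item_ir)   # recent item updates
print(group_ir)  # one timestamp per group
```

With the table's timestamps this yields the item IR {7: 26, 8: 32, 12: 30, 16: 28} and the group IR G1 = 24, G2 = 22, G3 = 20, G4 = 12, matching the sample report.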
Group-based Approach: Client operations (these also work after a long disconnection)
- For items in the item IR:
  - Invalidate those whose update time TSid is greater than the last valid time in the cache.
- For items older than now - w (those that would be discarded under Broadcast Timestamp):
  - Remove the items in group Gid if GTSGid > the timestamp of the last group report received.
  - Keep them otherwise.

Group-based Approach: Example, server side (using the Id/TS table above)
- L = 4, W = 24, w = 8, now = 34. Reports were broadcast at times 26, 30 and 34.
- Assume that there are four groups of items:
  - G1 = {1, 2, 3, 4}
  - G2 = {5, 6, 7, 8}
  - G3 = {9, 10, 11, 12}
  - G4 = {13, 14, 15, 16}
- The server constructs the group-based invalidation report.
  - Since w = 8, items updated between 26 and 34 are included in the item IR, i.e., 7, 8, 12, 16.
  - The group update timestamp is the maximum update timestamp in each group that is smaller than now - w = 26.
- Content of the invalidation report (report time | item IR | group IR):
    34 | 7,26  8,32  12,30  16,28 | G1,24  G2,22  G3,20  G4,12

Group-based Approach: Example, client side (same parameters and report)
- Assume a disconnection between time 23 and 33.
- At time 33, there is a query on {1, 2, 6, 7, 9, 12, 14}; the client waits until time 34 for the next report.
- Based on the item report, 7 and 12 are invalidated.
- Remaining queried items by group: G1 = {1, 2}, G2 = {6}, G3 = {9}, G4 = {14}.
- Groups G2, G3, G4: updated before 23, so their contents are valid.
- Group G1: updated at 24, so it is invalid; items 1 and 2 are invalidated.
- The query sent to the server is for {1, 2, 7, 12}.

Bit Sequence Approach
- Use n = log2 N bits for N data items.
- Use n lists with n timestamps to represent the updates of at most N/2 items.
[Figure: bit-sequence invalidation report over items 1 to 16. Legend: bit 1 = updated since TSi, bit 0 = not updated since TSi. Annotations: four items were updated since T3 = 26; two of them were updated since 30; one item was updated at T2 = 30 (item 12); one item was updated at T1 = 32 (item 8); B0 = 0, meaning no item was updated after T0 = 38. The bits of each sequence point, in order, to the items marked as updated in the previous sequence.]

- Purpose: to “compress” the IR to a smaller size.
- The report size is fixed in the bit-sequence approach.
- Use n = log2 N bits for N data items.
- Use n lists totalling 2N - 2 bits, with n timestamps, to represent the updates of at most N/2 items.
- An additional bit B0 and an additional timestamp TS0 can represent the fact that there has been no update since TS0.
- Cost for N/2 updated items, where N is the number of data items and t is the size of a timestamp:
  - Bit sequence: 2N - 1 + (n + 1) * t bits.
  - Broadcast timestamp: N/2 * (n + t) bits.

Bit Sequence Approach: Protocol
- Server constructs the invalidation bit sequences for the items and broadcasts the invalidation report periodically.
  - For a client whose last report is older than the last timestamp Tn, all items are assumed to have changed.
- Client invalidates items (Tc is the timestamp of the last invalidation report received):

    if T0 > Tc then
        if Tc < Tn then              // last report is too old
            remove whole cache
        else                         // check the bit sequences
            find the bit sequence Bi such that Ti ≤ Tc < Ti-1
            invalidate items marked “1” in Bi
    update Tc to now

- Answer the query with the remaining valid cached items.
- Send a query to the server for the missing items.

Selective Tuning in Invalidation
- Tuning time matters for energy consumption.
- To reduce tuning time, clients should be able to tune selectively to invalidation reports.
- General method: index the IR so that parts not of interest can be skipped.
  - A client is not interested in the invalidation of items that do not belong to its query.
- Selective tuning in the group-based approach:
  - The client invalidation protocol is query-based.
  - The group report is broadcast before the item report.
  - All item reports in the same group are sorted by id.
  - A partition symbol separates the item reports of different groups.
- Report layout:
    Group invalidation report: G1,T1,ptr1 | G2,T2,ptr2 | ... | Gn,Tn,ptrn
    Item invalidation report:  o11,t11  o12,t12  o13,t13 | o21,t21  ...

Selective Tuning in Invalidation: Example (using the Id/TS table above)
- L = 4, W = 24, w = 8, now = 34. Assume the same four groups G1, G2, G3 and G4.
- The client is disconnected from 23 to 33 and queries {1, 2, 6, 7, 14}.
- Content of the next invalidation report at time 34 (report time | group IR | item IR):
    34 | G1,24  G2,22  G3,20  G4,12 | 7,26  8,32  12,30  16,28
- G1: GTS1 = 24 > 23, so G1 is invalid. Items 1 and 2 must be requested.
- G2: GTS2 = 22 < 23, so G2 is valid. Keep the pointer in order to tune to the item reports of G2 later.
- G3: no items of interest in G3; doze until G4.
- G4: GTS4 = 12 < 23, so G4 is valid. Keep the pointer in order to tune to the item reports of G4 later.
- Doze until the item report of G2. Item 6 is valid and item 7 is invalid.
- Doze until the item report of G4. Item 14 is valid.
- Finally, send a query to the server for {1, 2, 7}.

§5.4 Cooperative Caching

- Caching may be done at different levels: client side, gateway, router, server side.
- Cooperative caching: sharing local cache copies among wireless nodes/users.
  - Especially effective in multi-hop wireless networks.
  - Caching along the data delivery path.
  - Accessing cached data from peer nodes.

[Figure: (a) Traditional caching, (b) Cooperative caching; each panel shows Data Server, Cache at Proxy, Cache at GW and two Cache at Client nodes]

Major Issues in Cooperative Caching
- Cache discovery: where to get cache copies.
- Cooperative placement/replacement: consider the needs of all nodes or of neighbor nodes.
- Cache consistency: consistency model and consistency strategy.

Discovery of Cooperative Caches
- A new problem in data caching: where is the cache copy, and how can a node know it?
Approaches to Cache Discovery
- Passive (transparent): use a cache copy if one is encountered on the way.
- Active:
  - Proactive: maintain a table of the nearest cache copies.
  - Reactive: query before requesting data.
- Pros and cons?

Transparent Cache Discovery
- Messages: Request + Reply.
[Figure: Request and Reply messages]

Proactive Cache Discovery
- Messages: Notification + Request + Reply.
- How to disseminate notifications?
[Figure: Notification, Request and Reply messages]

Reactive Cache Discovery
- Messages: Query + Request + Reply.
- Query all the nodes?
[Figure: Query, Request and Reply messages]

Cooperative Cache (Re-)Placement
- What data should be cached?
- The metric is at the core: how to evaluate the importance of a data item?
- Popularity vs. benefit.

Popularity-based Cache (Re-)Placement
- Will the data item be requested in the future?
- Access frequency (or probability), based on data request history.
  - Most frequently used for placement.
  - Least frequently used for replacement.
  - LRU, LFU, etc.
- Overall access frequency of “all” nodes.
  - Neighbors? How far away?

Benefit-based Cache (Re-)Placement
- Combines access frequency and access cost; similar to PIX.
  - Is the data item needed in the future? How frequently?
  - Is the data item costly to get if it is not cached?
- General idea: AccessFrequency * CostToGet.
- Overall access frequency of “all” nodes.
  - Neighbors? How far away?

Cooperative Cache Consistency
- Consistency model: how consistent the cache is.
  - Strong consistency: difficult for mobile computing.
  - Weak consistency: too weak.
  - Delta consistency: allows some variance in value/time.
  - Probabilistic consistency: consistent with some probability, e.g. 90%.
- Consistency strategy: how to maintain consistency.
  - Invalidation vs. update.
  - Push vs. pull, and hybrid.
  - Synchronous vs. asynchronous.
  - Stateful vs. stateless.

A Summary
- Concepts: caching, prefetching, hoarding; placement, consistency, coherence.
- Broadcast disk caching.
- Cache invalidation: Amnesic Terminal, Group-based Invalidation, Broadcast Timestamp, …
- Cooperative caching: cache discovery, cache (re-)placement, cache consistency.

Homework Questions
1.
Please discuss the difference between cooperative cache discovery and the routing problem.
2. What approach is used to guarantee the consistency of caches of webpage data? Is it a good solution?