Client-server caching and object stores
Benjamin Atkin
batkin@cs.cornell.edu

Client-server database design
  Low-level considerations
    How can database systems exploit powerful client machines?
    What implementation techniques are required?
  High-level considerations
    What interface is provided to applications?
    How can we efficiently implement it?

Overview
  Client-server systems
  Advantages of caching
  Object-oriented databases
  Wisconsin's Exodus storage manager
  Cache consistency and transactions
  Implementation of programming interface: QuickStore

Client-server systems
  Simplify the client machines
  Share services: filesystem, database, ...
  Run on powerful, dedicated hosts
  User machines are "clients" of servers
    enables data sharing
    centralised maintenance
    greater security
  e.g. Sun’s Network File System

Networks of workstations
  c. 1990: more powerful clients
  Move some processing to clients
    faster response time
    better utilisation of client machines
    less load on server
    greater scalability
    autonomy in the face of server failure

Naive client-server data access
  [Figure: "read blue object" ... "read blue object"]

Client-server caching
  [Figure: "read blue object" ... "read blue object"]

Caching principles
  Analogous to hardware caching
  Server stores the canonical copy of data
  Client caches the results of each read
  Subsequent accesses served from cache
  What if the data changes?
  Alternatives:
    "cheap to detect incorrect data", e.g.
      DNS
    "validate before use"
    "notify on change"

Caching in distributed file systems
  CMU's Andrew File System
    clients cache all files on local disks
    50 client machines for each server
  UC Berkeley's Sprite OS
    file cache competes with virtual memory
  Coda (follow-up to AFS), UCLA's Ficus
    client can completely disconnect from server
    prediction algorithms to determine what to cache

Disadvantages of caching
  Increases client workload, complexity
  We may cache the wrong data!
    potentially wasted network traffic
    uses valuable space in the cache
  Data consistency problem
    stale cached data
    simultaneous writeback

Client-server caching revisited
  Michael J. Franklin and Michael J. Carey

Dividing the work
  Query shipping
    clients send queries to the server
  Data shipping
    clients request data from server
    transactions run locally
    potential for caching

Why cache data?
  A client may
    read a data object repeatedly
    read and write an object
    execute multiple transactions on an object
  Cache an object and execute transactions locally
  Write back final value on commit

Database client caching
  [Figure: client/server timeline: begin transaction, read A, cache A, write A, end transaction, store A; begin transaction, read A, read B, ...]

The downside
  Introduces a consistency problem
  Increases work at client
  Slower under some conditions
  Potentially higher abort rates
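
A minimal sketch of the data-shipping pattern from the "Why cache data?" and "Database client caching" slides: the client fetches an object once, runs transactions against its cached copy, and writes back only the final value on commit. The class names `Server` and `CachingClient` and their methods are hypothetical, introduced only for illustration; the sketch has no locking or validation, so it exhibits exactly the consistency problem listed under "The downside".

```python
class Server:
    """Hypothetical server holding the canonical copy of every object."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0   # fetch requests served
        self.writes = 0  # write-backs received

    def fetch(self, key):
        self.reads += 1
        return self.data[key]

    def store(self, key, value):
        self.writes += 1
        self.data[key] = value


class CachingClient:
    """Hypothetical client with a cache that survives across transactions."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # key -> cached value (inter-transaction)
        self.dirty = set()  # keys written by the current transaction

    def begin(self):
        self.dirty.clear()

    def read(self, key):
        # Only a cache miss costs a round trip to the server.
        if key not in self.cache:
            self.cache[key] = self.server.fetch(key)
        return self.cache[key]

    def write(self, key, value):
        # Writes are applied locally and remembered for commit.
        self.cache[key] = value
        self.dirty.add(key)

    def commit(self):
        # Write back the final value of each modified object, once.
        for key in sorted(self.dirty):
            self.server.store(key, self.cache[key])
        self.dirty.clear()


server = Server({"A": 1, "B": 10})
client = CachingClient(server)

client.begin()
client.write("A", client.read("A") + 1)  # miss: A is fetched from the server
client.write("A", client.read("A") + 1)  # hit: served from the cache
client.commit()                          # one write-back, final value only

client.begin()
total = client.read("A") + client.read("B")  # A is still cached; B is a miss
client.commit()
```

Two updates to A cost one fetch and one write-back, and the second transaction reuses the cached A. If another client changed A in the meantime, this client would read stale data, which is what the caching strategies in the following slides are designed to prevent.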

Caching in EXODUS
  Small objects are grouped in fixed-size disk pages
  Caching and locking at the page level
  Client has buffer manager, lock manager
  Franklin+Livny investigate the best strategy for caching with transactions

Alternatives for caching
  Intra-transaction versus inter-transaction caching
  Caching locks as well as data
  Local versus global locking
  Optimistic versus centralised locking
  Invalidation versus propagation of updates

What to do on writeback?
  [Figure: begin transaction ... fetch blue object]

What to do on writeback?
  [Figure: commit transaction: propagate or invalidate?]

A taxonomy of strategies
  Primary-copy server 2PL
  Caching 2PL
    no lock caching, validate data before use
  Optimistic 2PL
    variants O2PL-Dynamic, O2PL-New Dynamic
  Callback locking

Optimistic 2PL
  During transaction acquire local locks
  At commit, validate with server
  Propagation variant requires 2PC
  Dynamic variant's propagation heuristic
    page is resident at receiving site
    accessed since last propagation of page
    previously invalidated this page incorrectly

Callback locking
  Global locks required during transaction
  On lock conflict, server calls back to revoke other clients' locks
  No validation required on commit
  CB-Read: cache only read locks
  CB-All: cache write locks as well, lock downgrade on conflict

Experiments
  Vary data access patterns
  Vary bottlenecks in the system

HOTCOLD workload, slow network

FEED workload, slow network

HICON workload, fast network

Summary
  CB-Read, O2PL-ND come out best
  CB-Read implemented in EXODUS
    lower abort rate than O2PL-ND
    scales better with data contention
  Natural consequences of the optimistic approach?

QuickStore: a high-performance mapped object store
  Seth J.
  White and David J. DeWitt

Object-oriented versus object-relational DBs
  Distributed application support
  Persistent store for program data
  Access through programming language (C++), not SQL
  Transactions over objects

The programming interface
  Application manipulates object identifiers (OIDs)
  "Swizzling" resolves OID to the object
    hardware swizzling: OID is a pointer, use VM manipulations to do mapping
    software swizzling: OID contains a pointer, indirection

Design alternatives
  White+DeWitt compare QuickStore, E
    QuickStore uses hardware swizzling
    E uses software swizzling, interpreter
  Both extend C++, over EXODUS storage manager (ESM)
  All objects are accessed in transactions

QuickStore structure
  [Figure labels: client, frame A, page a, buffer pool, ESM, object store]

Fine points of pointer swizzling
  Complication: objects can contain pointers to other objects
  When a page is mapped to a frame
    if a pointer in page points to a mapped page, make it point to the correct frame
    otherwise, make it point to a new frame
  Use page protection to catch accesses to non-mapped pages: Unix mmap

Page faults
  Page frames in memory have protection bits: read-only, no access, etc.
  Incorrect access generates a "fault"
  Protection faults can be handled by the application itself
  In QS, reference to no-access frame => bring the page from the object server

QuickStore page faults
  [Figure labels: client, 0x180, 0x223, fault?, object store]

The mapping procedure
  Fault on a pointer dereference
  Request a page from the server
  Load into buffer pool
  Rewrite pointers in page
  Map buffer slot to required frame

The ESM buffer manager
  Limited buffer pool space available
  Frame-to-page mapping may need to be removed to reclaim a buffer slot
  Modified clock algorithm for page replacement

Optimisations
  Rewriting pointers is expensive
    Store pointers in disk pages
    Try to remap page to its previous frame
  Changing protection bits is expensive!
    Try to change many at a time
  Log optimisation with page diffs

Hardware versus software swizzling
  Page-level swizzling obscures object identity
  Pointers to deleted objects are still valid
  Using VM pointers allows a more compact OID representation

Comparison: the OO7 benchmark
  Parts database representative of "CAD/CAM/CASE application"
  Multiple possible database sizes
  Hierarchical structure, composite parts
  Benchmark operations specified
    traversals of parts tree
    queries which retrieve random parts

Cold times, small database

Hot times, small database

Cold times, medium database

QuickStore and E compared
  QuickStore is not necessarily better!
    E performs better with low locality
  Compact representation
    small database: 6.6MB versus 10.5MB
    medium database: 54.2MB versus 94.1MB
  Log optimisation reduces commit times
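
As a recap of the "Fine points of pointer swizzling" and "The mapping procedure" slides, the steps can be sketched as a small simulation. This is not QuickStore code: a real implementation relies on mmap and hardware page protection, whereas here a dictionary lookup stands in for the protection fault. The names `ObjectStore`, `Client`, `root` and `deref`, the `("ptr", page, slot)` on-disk pointer format, and the use of a slot index as the offset within a frame are all assumptions made for illustration.

```python
PAGE_SIZE = 0x1000  # assumed frame size for this simulation


class ObjectStore:
    """Hypothetical server. A slot holds a plain value or a
    ("ptr", page_id, slot) pointer in an assumed on-disk format."""

    def __init__(self, pages):
        self.pages = pages
        self.requests = 0

    def fetch_page(self, page_id):
        self.requests += 1
        return list(self.pages[page_id])


class Client:
    def __init__(self, store):
        self.store = store
        self.frame_of = {}  # page id -> frame base address
        self.page_of = {}   # frame base address -> page id
        self.resident = {}  # frame base address -> swizzled page contents
        self.next_frame = 0x10000

    def _frame_for(self, page_id):
        # Reuse the page's frame if it has one; otherwise reserve a new
        # frame, which stays "no access" until the page is loaded.
        if page_id not in self.frame_of:
            base = self.next_frame
            self.next_frame += PAGE_SIZE
            self.frame_of[page_id] = base
            self.page_of[base] = page_id
        return self.frame_of[page_id]

    def _handle_fault(self, base):
        # Request the page, rewrite its pointers, then map it to the frame.
        raw = self.store.fetch_page(self.page_of[base])
        swizzled = []
        for slot in raw:
            if isinstance(slot, tuple) and slot[0] == "ptr":
                _, target_page, target_slot = slot
                addr = self._frame_for(target_page) + target_slot
                swizzled.append(("addr", addr))
            else:
                swizzled.append(slot)
        self.resident[base] = swizzled

    def root(self, page_id, slot):
        """Address of an entry point into the database."""
        return self._frame_for(page_id) + slot

    def deref(self, addr):
        base, slot = addr - addr % PAGE_SIZE, addr % PAGE_SIZE
        if base not in self.resident:  # stands in for a protection fault
            self._handle_fault(base)
        return self.resident[base][slot]


store = ObjectStore({
    "p1": [("ptr", "p2", 1), 7],
    "p2": [99, ("ptr", "p1", 1)],
})
client = Client(store)

kind, addr_in_p2 = client.deref(client.root("p1", 0))  # faults in p1
kind2, addr_in_p1 = client.deref(addr_in_p2)           # faults in p2
value = client.deref(addr_in_p1)                       # p1 already mapped
```

Loading p1 reserves a frame for p2 without fetching it; p2 is fetched only when one of its addresses is first dereferenced, and the pointer back into p1 resolves to the frame p1 already occupies.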