Thor: a Fast, Distributed, Persistent Object System
Andrew Myers
CS 632: Advanced Database Systems
22 Feb 01

Persistence
• Question: what is the right programming model for accessing persistent data? What is the right data model?
• Current persistent data:
  – Much data stored in relational databases
  – Much less data stored in object databases
  – Lots of data in flat files in file systems (every Windows machine in the world!)
• Structure of data sometimes encoded in directory structure (and relations)
• More often implicit in application code

7/12/2016 Thor: A Fast Distributed Persistent Object System 2

Structuring Persistent Data
• Huge amounts of data are going into digital formats (e.g., digital libraries)
• Defining suitable models for persistent data is important: the earlier the better
• Models must be flexible and extensible
• Should support safe sharing of data across applications and across distributed computing environments
• Good performance is also important

Impedance mismatch
• Problem: popular persistent data formats don’t look much like popular programming language models
• Persistent data: no pointers, no object identity, weak referential integrity, no garbage collection, no type checking
• Are these features important only for volatile data?
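
To make the "no object identity" point concrete, here is a minimal sketch in Python. JSON stands in for a flat persistent format; that choice, and the names `shared`, `parent`, and `restored`, are assumptions for illustration and are not part of Thor. Two fields that refer to one in-memory object come back from a save/load round trip as two unrelated copies.

```python
import json

# One object referenced from two places: the fields share identity in memory.
shared = {"value": 1}
parent = {"left": shared, "right": shared}
same_before = parent["left"] is parent["right"]

# "Save" to a flat text format and "load" it back (unparse, then parse).
restored = json.loads(json.dumps(parent))
same_after = restored["left"] is restored["right"]

# After the round trip, an update through one field no longer shows in the other.
restored["left"]["value"] = 2

print(same_before, same_after, restored["right"]["value"])
```

The flat format records values, not references, so the sharing is silently lost; an application that needs it has to add its own keys and re-link objects while parsing.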

Effect on Programs
• Program reads “file” of persistent data
• Creates convenient volatile in-memory data structures using parsing routines
• Data manipulated in volatile form
• Explicitly “saved” by converting back to persistent format (unparsing)
• Extra parsing & unparsing code with no support for correctness
• No fine-grained, concurrent sharing
• No pointers, no garbage collection

Orthogonal Persistence
• Idea: write application in any language you like (e.g., Java)
• Objects manipulated by the program are transparently persistent or volatile
• Persistence defined by reachability from root, not by type or explicit annotation
• Result: persistence for free; low-cost software development; more robust code

Thor
Provides standard single-machine programming model, but supports distributed persistent data transparently
• persistent objects with semantics
  – rich type system (Java+)
  – referential integrity
  – garbage collection
• distributed storage + caching
• sequential consistency: hides concurrent access, failures
• heterogeneous language support

Thor architecture
• Front ends (FEs) do computation, cache objects, provide application interface to persistence
• Object repository (OR) provides persistent storage of objects
[Figure: three Clients, three FEs, three ORs]

Programming model
• Each FE caches part of object universe
• Objects automatically fetched as needed
• 64-bit persistent object ids; 32-bit in-memory pointers
• Safe languages supported: Java, Theta
[Figure: FE (2^31 objects) within a universe of 2^64 objects]

Veneers
• Applications may be in unsafe language (C, C++)
• Object operations invoked via veneer
  – automatically generated stubs for
app language
• Reflective object system
  – objects point to their own implementations
  – impls are objects in OR
  – can discover interfaces dynamically
[Figure: Client (unsafe lang.) with Veneer, shared-memory pipe, FE (safe lang.), ORs]

Object References
[Figure: Client, FE, and OR. The FE holds cached object copies; the OR holds persistent objects. Labels: surrogate object (node marking); unswizzled pointer (edge marking); intra-node reference (32 bit); inter-node reference (64 bit via forwarding obj)]

Transactions
• Computation at FE is broken up into transactions separated by checkpoints
• Transaction is committed atomically to participating ORs via two-phase commit
[Figure: Client, FE, and three ORs]

Persistence by Reachability
• An OR has a root object
  – always persistent
  – always reachable
  – a lightweight directory
• Any object reachable from root becomes persistent at transaction commit
• No explicit declaration of persistence needed
• No type distinction between persistent and volatile objects: orthogonal persistence

Convenient programming model, strong semantic guarantees, and high performance too?
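
The persistence-by-reachability rule from the earlier slide can be sketched in a few lines of Python. This is an illustrative model only, not Thor's implementation: the `Obj` class, `reachable_from`, and `commit` are hypothetical names, and the sketch ignores caching, multiple ORs, and two-phase commit.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so objects can live in sets
class Obj:
    name: str
    refs: list = field(default_factory=list)


def reachable_from(root):
    """Return the set of objects reachable from root by following refs."""
    seen = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(obj.refs)
    return seen


def commit(root, persistent):
    """Hypothetical commit step: promote newly reachable objects to persistent."""
    newly = reachable_from(root) - persistent
    persistent |= newly
    return newly


root = Obj("root")
persistent = {root}         # the root is always persistent

doc = Obj("doc")
page = Obj("page")
scratch = Obj("scratch")    # never linked from the root, so it stays volatile
doc.refs.append(page)
root.refs.append(doc)       # linking into the root's graph is the only "declaration"

promoted = commit(root, persistent)
print(sorted(o.name for o in promoted))
```

Note that `doc`, `page`, and `scratch` all have the same type; what separates persistent from volatile is only whether a path from the root exists at commit time.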

Performance
• Performance comparison: OO7 benchmark
• Most generally accepted object-oriented database benchmark
• Similar to a CAD database: good model
  – mixture of very small and large objects (4W-32K)
  – various recursive traversals (with and without modification) of complex pointer structure
  – must run in a fixed amount of memory (so that only a fraction of the database can fit in memory)

Implementation options
• Relational database
• Conventional file system with read/write
• Conventional file system with memory-mapped files
• Object-oriented database
• Distributed object-oriented database (Thor)

Using Relational Database
[Figure: 15 levels]
• Problem: relational databases don’t implement pointers (object references) efficiently
• Must introduce extra keys and use an index to find the appropriate records: extra storage, locality problems

Memory-mapped files
• Memory-mapped files (mmap) avoid data duplication between application and OS file buffer cache
  – Buffer cache memory mapped directly into application VM
• Conventional file I/O uses twice the memory; can cache only half as much of persistent data in memory
[Figure: Application (volatile data) and OS kernel (buffer cache)]

Relative Performance for OO7?
Non-distributed object databases
Memory-mapped files
Thor
Simple file I/O
Object-relational databases
Relational databases

Relative Performance
• Object data in OO7 does not fit in memory, so fetches of persistent data into memory dominate performance
• System with fewest fetches wins

OO7 in C++, memory-mapped
• C++/OS application implementing OO7 benchmark
  – Objects in memory-mapped file
  – close() on file flushes memory to disk
• Weak semantic guarantees:
  – no concurrency control
  – no array bounds checks
  – no support for failure during write

Traversals
• Sparse vs. dense traversals
  – dense traversals use every page of disk storage effectively (unrealistic) (91%)
  – sparse traversals touch only a few objects on each page (3%)
  – Realistic bound [TN92]: 15-41% hit rate per page
• Read-only vs. read-write traversals
  – read-write traversals accumulate changes that must be written back to disk

Thor vs. C++/mmap (dense)
[Figure: time (sec) for Thor and C++/mmap on traversals T2a and T2b; label: 18MB]

Dense read-only traversal
[Figure: time (sec) vs. FE cache size (MB) for Thor and C++/mmap; annotations: 25% speedup, 15× speedup, 40% slowdown]

Other traversals
• C++/OS does best on unrealistically dense traversals
• Sparse traversals: Thor has up to 1000× relative performance
• C++ NFS server was given much more memory than Thor OR server (137MB vs. 36MB)

Conclusion
File systems are obsolete: they provide suboptimal performance and an even worse interface for programmers writing applications

Thor vs.
Quickstore
• Quickstore (commercial object-oriented database) has best published performance results for any OODB
• Not a distributed system
• Built on memory-mapped files; uses page-based memory management

Results
• Number of fetches:

               sparse   dense
  Thor         506      10.2k
  Quickstore   610      13.2k

• Thor has 21-25% fewer fetches
• No Quickstore results for medium-sized traversals; even more advantageous for Thor
• Conclusion: object caching beats page caching

Front end features
• Object storage managed by Hybrid Adaptive Caching (HAC) algorithm
• CLOCC optimistic concurrency control algorithm provides sequential consistency, best performance
• Techniques may be applicable to more conventional databases

Object repository
• Server cache speeds up client fetches
• Modified Object Buffer (MOB)
  – keeps track of object mods separately from cache
  – defers writes until necessary
  – reduces installation reads, allows write absorption
[Figure: Server with page cache, log, flusher, and MOB; labels: read; commit, abort]

More OR features
• Replicated ORs (log stability via replication)
• Referential integrity
  – object mobility (multiple oids per object) supported through OR surrogate objects, lazy forwarding
  – no centralized location service
  – distributed GC algorithm collects cycles efficiently
[Figure: FE and three ORs]

Other Issues
• Queries not directly supported in standard PLs or in Thor
  – can be coded using conventional data structures, but can high-performance queries be achieved?
  – may require moving code to data (function shipping); Thor model is data shipping
  – relational databases not obsolete
• Schema evolution: how to handle changes to software and data objects?
• Disconnected operation/long transactions

Reading
• Providing Persistent Objects in Distributed Systems (ECOOP ’99)
• Hybrid Adaptive Caching for Distributed Storage Systems (SOSP ’97)
• Safe and Efficient Sharing of Persistent Objects in Thor (SIGMOD ’96)
• The Language-Independent Interface of the Thor Persistent Object System, in Object-Oriented Multidatabase Systems