Thor: a Fast, Distributed, Persistent Object System CS 632—Advanced Database Systems

Thor: a Fast, Distributed,
Persistent Object System
Andrew Myers
CS 632—Advanced Database Systems
22 Feb 01
Persistence
• Question: what is the right programming
model for accessing persistent data? What is
the right data model?
• Current persistent data:
– Much data stored in relational databases
– Much less data stored in object databases
– Lots of data in flat files in file systems (every
Windows machine in the world!)
• Structure of data sometimes encoded in directory
structure (and in relations)
• More often implicit in application code
7/12/2016
Thor: A Fast Distributed Persistent Object System
2
Structuring Persistent Data
• Huge amounts of data are going into digital
formats (e.g., digital libraries)
• Defining suitable models for persistent data is
important -- the earlier the better
• Models must be flexible, extensible
• Should support safe sharing of data across
applications and across distributed computing
environments
• Good performance also important
Impedance mismatch
• Problem: popular persistent data
formats don’t look much like popular
programming language models
• Persistent data: no pointers, no object
identity, weak referential integrity, no
garbage collection, no type checking
• Are these features important only for volatile data?
Effect on Programs
• Program reads “file” of persistent data
• Creates convenient volatile in-memory data
structures using parsing routines
• Data manipulated in volatile form
• Explicitly “saved” by converting back to persistent
format (unparsing)
• Extra parsing & unparsing code with no support for
correctness
• No fine-grained, concurrent sharing
• No pointers, no garbage collection
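The loss of object identity in the parse/unparse cycle is easy to demonstrate. The following minimal Python sketch assumes a hypothetical flat-file format with one `name,friend` record per line. Two objects that share a reference in memory come back from a naive reload as two separate copies, and nothing in the format or the language catches the mistake.

```python
import io
from dataclasses import dataclass
from typing import List, Optional, TextIO

@dataclass(eq=False)  # eq=False: objects compare by identity, like pointers
class Person:
    name: str
    friend: Optional["Person"] = None

def unparse(people: List[Person], out: TextIO) -> None:
    # A pointer cannot be written to a flat file, so each reference is
    # flattened into the referenced object's name (an ad hoc key).
    for p in people:
        out.write(f"{p.name},{p.friend.name if p.friend else ''}\n")

def naive_parse(inp: TextIO) -> List[Person]:
    # Rebuilds each reference as a fresh object: sharing (object identity)
    # is silently lost unless extra code re-links records by key.
    people = []
    for line in inp.read().splitlines():
        name, friend = line.split(",")
        people.append(Person(name, Person(friend) if friend else None))
    return people

carol = Person("carol")
alice, bob = Person("alice", carol), Person("bob", carol)
print(alice.friend is bob.friend)  # True: one shared object in memory

buf = io.StringIO()
unparse([alice, bob, carol], buf)  # "save"
buf.seek(0)
a2, b2, c2 = naive_parse(buf)      # "load"
print(a2.friend is b2.friend)      # False: now two separate copies
```

The reloaded data still looks right, since every record names "carol". That is exactly why this class of bug is hard to notice without language support.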
Orthogonal Persistence
• Idea: write application in any language
you like (e.g., Java)
• Objects manipulated by the program
transparently persistent or volatile
• Persistence defined by reachability from
root, not by type or explicit annotation
• Result: persistence for free; low-cost
software development; more robust code
Thor
Provides standard single-machine programming
model, but supports distributed persistent data
transparently
• persistent objects with semantics
– rich type system (Java+)
– referential integrity
– garbage collection
• distributed storage + caching
• sequential consistency -- hides concurrent
access, failures
• heterogeneous language support
Thor architecture
• Front ends do computation, cache objects,
provide application interface to persistence
• Object repository (OR) provides persistent
storage of objects
[Figure: three clients, each attached to its own FE, and three ORs]
Programming model
• Each FE caches part of the object universe
• Objects automatically fetched as needed
• 64-bit persistent object ids; 32-bit in-memory pointers
• Safe languages supported: Java, Theta
[Figure: an FE (2^31 objects) within a universe of 2^64 objects]
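The following Python sketch illustrates this programming model under simplifying assumptions. `ObjectRepository` and `FrontEnd` are hypothetical classes, not Thor's code. The OR stores objects under 64-bit oids. The FE fetches an object transparently on a cache miss and hands out small local slot numbers, which stand in for 32-bit in-memory pointers and are capped at 2^31 entries to match the slide's figure.

```python
from typing import Dict, List

OID_BITS = 64    # width of persistent object ids (from the slide)
LOCAL_BITS = 32  # width of in-memory pointers (from the slide)

class ObjectRepository:
    """Hypothetical OR: stores object state under 64-bit oids."""
    def __init__(self) -> None:
        self.store: Dict[int, dict] = {}
        self.fetches = 0

    def fetch(self, oid: int) -> dict:
        self.fetches += 1
        return dict(self.store[oid])  # the FE receives a copy

class FrontEnd:
    """Hypothetical FE cache that fetches objects on demand (no eviction)."""
    def __init__(self, orep: ObjectRepository) -> None:
        self.orep = orep
        self.slots: List[dict] = []        # local slot number -> cached copy
        self.slot_of: Dict[int, int] = {}  # 64-bit oid -> local slot number

    def lookup(self, oid: int) -> int:
        if not 0 <= oid < 2 ** OID_BITS:
            raise ValueError("oid out of range")
        if oid not in self.slot_of:        # miss: fetch transparently
            if len(self.slots) >= 2 ** (LOCAL_BITS - 1):
                raise MemoryError("FE cache full")
            self.slot_of[oid] = len(self.slots)
            self.slots.append(self.orep.fetch(oid))
        return self.slot_of[oid]

    def get(self, oid: int) -> dict:
        return self.slots[self.lookup(oid)]

orep = ObjectRepository()
root_oid = 2 ** 40 + 7
orep.store[root_oid] = {"name": "root"}
fe = FrontEnd(orep)
print(fe.get(root_oid)["name"])  # root
fe.get(root_oid)                 # cache hit: no second fetch
print(orep.fetches)              # 1
```

A real FE must also evict objects to stay within its memory budget. The slides name HAC for that role, and this sketch never evicts.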
Veneers
• Applications may be in
unsafe language (C, C++)
• Object operations invoked
via veneer
– automatically generated
stubs for app language
• Reflective object system
– objects point to their own
implementations
– impls are objects in OR
– can discover interfaces
dynamically
[Figure: client (unsafe language) → veneer → FE (safe language), linked by shared memory / pipe; the FE talks to the ORs]
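Here is a rough sketch of how a veneer can forward operations. It assumes a hypothetical message format `(oid, method, args)`, and a direct function call stands in for the shared-memory or pipe channel. On the FE side, dispatch is reflective: the server looks up the operation on the object's implementation, and the same reflection lets a client discover the interface at run time. `VeneerStub` and `FrontEndServer` are illustrative names, not Thor's API. The sketch also uses a dynamic proxy in place of the generated stubs.

```python
from typing import Any, Dict, List, Tuple

class Counter:
    """An object implementation that runs at the FE (the safe side)."""
    def __init__(self) -> None:
        self.n = 0

    def incr(self, k: int) -> int:
        self.n += k
        return self.n

class FrontEndServer:
    """Hypothetical FE endpoint that executes calls for clients."""
    def __init__(self) -> None:
        self.objects: Dict[int, Any] = {}

    def interface(self, oid: int) -> List[str]:
        # Reflection: list the public operations of the object's class.
        cls = type(self.objects[oid])
        return [m for m in vars(cls)
                if not m.startswith("_") and callable(getattr(cls, m))]

    def handle(self, msg: Tuple[int, str, tuple]) -> Any:
        oid, method, args = msg
        if method not in self.interface(oid):  # reject unknown operations
            raise AttributeError(method)
        return getattr(self.objects[oid], method)(*args)

class VeneerStub:
    """Client-side proxy: turns method calls into (oid, method, args) messages."""
    def __init__(self, fe: FrontEndServer, oid: int) -> None:
        self._fe, self._oid = fe, oid

    def __getattr__(self, method: str):
        def call(*args: Any) -> Any:
            # In the architecture above this message would cross shared
            # memory or a pipe; here it is a direct function call.
            return self._fe.handle((self._oid, method, args))
        return call

fe = FrontEndServer()
fe.objects[42] = Counter()
c = VeneerStub(fe, 42)
print(fe.interface(42))  # ['incr']
c.incr(2)
print(c.incr(3))         # 5
```

The client never touches the object's memory. It only names an oid and an operation, so an unsafe client language cannot corrupt objects held at the FE.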
Object References
[Figure: client, FE holding cached object copies, and OR holding persistent objects. One panel shows node marking (surrogate objects), the other edge marking (unswizzled pointers). Intra-node references are 32 bits; inter-node references are 64 bits, via a forwarding object.]
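With edge marking, each reference field carries its own mark: unswizzled (holds a persistent oid) or swizzled (holds a direct pointer). The field is converted in place the first time it is followed. This minimal sketch assumes a hypothetical `fetch` callback that returns the object for an oid. It illustrates the idea only and is not Thor's object representation.

```python
from typing import Callable, Dict, List, Optional, Union

class Obj:
    def __init__(self, name: str) -> None:
        self.name = name
        self.next: Optional["Ref"] = None

class Ref:
    """A reference field: holds an oid until first use, then a direct pointer."""
    def __init__(self, oid: int) -> None:
        self.target: Union[int, Obj] = oid
        self.swizzled = False  # the per-edge mark

    def deref(self, cache: Dict[int, Obj], fetch: Callable[[int], Obj]) -> Obj:
        if not self.swizzled:
            oid = self.target
            if oid not in cache:      # object not yet at the FE: fetch it
                cache[oid] = fetch(oid)
            self.target = cache[oid]  # overwrite the oid with a pointer
            self.swizzled = True
        return self.target

store = {1: Obj("a"), 2: Obj("b")}  # stands in for the OR
store[1].next = Ref(2)
fetched: List[int] = []

def fetch(oid: int) -> Obj:
    fetched.append(oid)
    return store[oid]

cache: Dict[int, Obj] = {1: fetch(1)}
a = cache[1]
b1 = a.next.deref(cache, fetch)  # fetches oid 2 and swizzles the edge
b2 = a.next.deref(cache, fetch)  # follows the direct pointer
print(fetched, b1 is b2, a.next.swizzled)  # [1, 2] True True
```

Node marking moves the mark from each edge to the target. A reference points at a surrogate object standing in for the not-yet-fetched object, so pointer fields never need a per-field check.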
Transactions
• Computation at FE is broken up into
transactions separated by checkpoints
• Transaction is committed atomically to
participating ORs via two-phase commit
[Figure: a client and its FE committing a transaction to three ORs]
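Two-phase commit itself is a standard protocol. The sketch below shows its shape, with hypothetical in-memory `Participant` objects standing in for ORs. In the first phase, every participating OR is asked to prepare and vote. In the second, the transaction commits only if all votes are yes, and otherwise every participant aborts. Logging, timeouts, and crash recovery are omitted.

```python
from typing import Dict, List

class Participant:
    """Hypothetical OR taking part in a transaction."""
    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name, self.healthy = name, healthy
        self.data: Dict[str, int] = {}
        self.pending: Dict[str, int] = {}

    def prepare(self, writes: Dict[str, int]) -> bool:
        # Phase 1: stage the writes and vote yes, or vote no.
        if not self.healthy:
            return False
        self.pending = dict(writes)
        return True

    def commit(self) -> None:
        # Phase 2 (commit): install the staged writes.
        self.data.update(self.pending)
        self.pending = {}

    def abort(self) -> None:
        # Phase 2 (abort): discard the staged writes.
        self.pending = {}

def two_phase_commit(writes: Dict[str, Dict[str, int]],
                     ors: Dict[str, Participant]) -> bool:
    involved: List[Participant] = [ors[name] for name in writes]
    votes = [p.prepare(writes[p.name]) for p in involved]
    if all(votes):
        for p in involved:
            p.commit()
        return True
    for p in involved:
        p.abort()
    return False

ors = {"or1": Participant("or1"), "or2": Participant("or2")}
print(two_phase_commit({"or1": {"x": 1}, "or2": {"y": 2}}, ors))  # True
ors["or2"].healthy = False
print(two_phase_commit({"or1": {"x": 9}, "or2": {"y": 9}}, ors))  # False
print(ors["or1"].data)  # {'x': 1}: the aborted write was not installed
```

Atomicity comes from the vote. Because or2 refused, or1's staged write of 9 was discarded even though or1 itself was able to commit.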
Persistence by Reachability
• An OR has a root object
– always persistent
– always reachable
– a lightweight directory
• Any object reachable from root becomes
persistent at transaction commit
• No explicit declaration of persistence needed
• No type distinction between persistent and
volatile objects: orthogonal persistence
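At commit time, persistence by reachability amounts to a graph traversal: starting from the OR's root, every object reachable through references joins the persistent set. This minimal sketch assumes the heap is represented as a hypothetical dict that maps each object name to the names it references. A real system would also have to give newly persistent objects oids and ship them to ORs, which is omitted here.

```python
from collections import deque
from typing import Dict, List, Set

def reachable(root: str, refs: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search over object references, starting at the root."""
    seen = {root}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for child in refs.get(obj, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def commit(root: str, refs: Dict[str, List[str]], persistent: Set[str]) -> Set[str]:
    """Add everything reachable from the root; return the newly persistent objects."""
    newly = reachable(root, refs) - persistent
    persistent |= newly
    return newly

refs = {"root": ["dir"], "dir": ["a"], "a": ["b"], "tmp": ["b"]}
persistent = {"root"}
print(sorted(commit("root", refs, persistent)))  # ['a', 'b', 'dir']
print("tmp" in persistent)  # False: not reachable from the root, stays volatile
```

No type or annotation decides persistence here. `b` becomes persistent only because `a` points to it, while `tmp` also points to `b` but is not itself reachable.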
Convenient programming model,
strong semantic guarantees,
and high performance too?
Performance
• Performance comparison: OO7 benchmark
• Most generally accepted object-oriented
database benchmark
• Similar to a CAD database -- good model
– mixture of very small and large objects (4 words to 32 KB)
– various recursive traversals (w/ & w/o
modification) of complex pointer structure
– must run in a fixed amount of memory (so that
only a fraction of the database fits in memory)
Implementation options
• Relational database
• Conventional file system with read/write
• Conventional file system with memory-mapped files
• Object-oriented database
• Distributed object-oriented database
(Thor)
Using Relational Database
[Figure: an object structure labeled "15 levels"]
• Problem: relational databases don't implement
pointers (object references) efficiently
• Must introduce extra keys and use an index to find
the appropriate records: extra storage, locality problems
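The cost of extra keys can be seen with Python's built-in `sqlite3` module. In this sketch the schema (a `part` table chained by a `next_id` foreign key) is invented for illustration. Following each "pointer" in the relational version takes a separate query. In SQLite, an `INTEGER PRIMARY KEY` lookup is a B-tree search. The in-memory version simply follows a field.

```python
import sqlite3
from typing import List, Optional, Tuple

N = 10  # length of the example chain (arbitrary)

# Relational version: each "pointer" is a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, next_id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [(i, i + 1 if i < N else None, f"part{i}") for i in range(1, N + 1)],
)

def traverse_relational(start: int) -> Tuple[List[str], int]:
    labels: List[str] = []
    queries = 0
    cur: Optional[int] = start
    while cur is not None:
        # Every hop is a separate lookup through the primary-key index.
        cur, label = conn.execute(
            "SELECT next_id, label FROM part WHERE id = ?", (cur,)).fetchone()
        queries += 1
        labels.append(label)
    return labels, queries

# In-memory version: each reference is a direct pointer.
class Part:
    def __init__(self, label: str, nxt: Optional["Part"] = None) -> None:
        self.label, self.next = label, nxt

def traverse_memory(p: Optional[Part]) -> List[str]:
    labels: List[str] = []
    while p is not None:
        labels.append(p.label)
        p = p.next  # no key, no index: just follow the field
    return labels

head: Optional[Part] = None
for i in range(N, 0, -1):
    head = Part(f"part{i}", head)

labels, queries = traverse_relational(1)
print(labels == traverse_memory(head), queries)  # True 10
```

Both traversals visit the same records, but the relational one pays an index search per hop, and its rows need not be stored near one another on disk.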
Memory-mapped files
• Memory-mapped files (mmap) avoid data
duplication between the application and the OS file
buffer cache
– Buffer cache memory mapped directly into
application VM
• Conventional file I/O uses twice the
memory, so it can cache only half as much
persistent data in memory
[Figure: the application's volatile data in user space, above the OS kernel's buffer cache]
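Python's standard `mmap` module exposes the same OS facility. In this sketch, the file name and the fixed 8-byte record layout are arbitrary choices. A record is updated by writing directly into the mapped memory: there is no `write()` call and no second copy in an application buffer. `flush()` asks the OS to write the dirty pages back to the file.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<q")  # hypothetical layout: one 8-byte integer per record

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "objects.dat")
    with open(path, "wb") as f:
        f.write(RECORD.pack(0) * 4)  # four zeroed records

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
            # Update record 2 in place, through the mapping.
            RECORD.pack_into(mm, 2 * RECORD.size, 1234)
            mm.flush()  # ask the OS to write dirty pages back to the file

    with open(path, "rb") as f:
        data = f.read()

print(RECORD.unpack_from(data, 2 * RECORD.size)[0])  # 1234
```

Nothing here provides concurrency control or atomicity across a crash. A failure between two related in-place updates can leave the file inconsistent.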
Relative Performance for OO7
[Figure: relative OO7 performance, listing in order: non-distributed object databases, memory-mapped files, Thor, simple file I/O, object-relational databases, relational databases; a "?" appears above the list]
Relative Performance
• Object data in OO7 does not fit in
memory, so fetches of persistent data
into memory dominate performance
• The system with the fewest fetches wins
OO7 in C++, memory-mapped
• C++/OS application implementing the
OO7 benchmark
– Objects in memory-mapped file
– close() on file flushes memory to disk
• Weak semantic guarantees:
– no concurrency control
– no array bounds checks
– no support for failure during write
Traversals
• Sparse vs. dense traversals
– dense traversals use every page of disk storage
effectively (91%; unrealistic)
– sparse traversals touch only a few objects on
each page (3%)
– Realistic bound [TN92]: 15-41% hit rate per page
• Read-only vs. read-write traversals
– read-write traversals accumulate changes that
must be written back to disk
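These percentages measure how much of each fetched page a traversal actually uses. The sketch below shows that calculation under a hypothetical layout in which objects are numbered consecutively and packed a fixed number per page. The two example traversals are made up to show the dense and sparse extremes and are not the OO7 measurements.

```python
from typing import Iterable

def page_utilization(touched: Iterable[int], objects_per_page: int) -> float:
    """Fraction of the objects on fetched pages that a traversal actually uses.

    Assumes objects are numbered consecutively and packed objects_per_page
    to a page, so object k lives on page k // objects_per_page.
    """
    used = set(touched)
    if not used:
        return 0.0
    pages = {oid // objects_per_page for oid in used}
    return len(used) / (len(pages) * objects_per_page)

print(page_utilization(range(100), 10))         # 1.0 (dense: every object used)
print(page_utilization(range(0, 100, 10), 10))  # 0.1 (sparse: one object per page)
```

With page-granularity caching, the unused 90% of each page in the sparse case still occupies cache memory.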
Thor vs. C++/mmap (dense)
[Figure: running time (sec) of Thor vs. C++/mmap on dense traversals T2a and T2b; annotation: 18MB]
Dense read-only traversal
[Figure: running time (sec) vs. FE cache size (10-50 MB) for Thor and C++/mmap; annotations: 40% slowdown, 25% speedup, 15× speedup]
Other traversals
• C++/OS does best on unrealistically
dense traversals
• Sparse traversals: Thor performs up to 1000×
better
• The C++ NFS server was given much more
memory than the Thor OR server (137 MB
vs. 36 MB)
Conclusion
File systems are obsolete -- they
provide suboptimal performance
and an even worse interface for
programmers writing applications
Thor vs. Quickstore
• Quickstore (commercial object-oriented database) has the best published
performance results of any OODB
• Not a distributed system
• Built on memory-mapped files -- uses
page-based memory management
Results
• Number of fetches:
               sparse    dense
  Thor           506     10.2k
  Quickstore     610     13.2k
• Thor has about 17-23% fewer fetches (Quickstore needs 21-29% more)
• No Quickstore results for medium-sized
traversals, which should be even more advantageous for Thor
• Conclusion: object caching beats page
caching
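The following toy simulation shows why object caching can beat page caching. With a fixed cache budget, a page cache must keep whole pages, including objects the traversal never uses, while an object cache keeps individual objects. All assumptions are hypothetical: LRU replacement, uniform object size, and one fetch per miss, where a fetch brings in a whole page in both variants. The object-caching variant evicts objects one at a time and evicts unrequested objects first. This is not HAC or Quickstore's actual algorithm.

```python
from collections import OrderedDict
from typing import List

def page_cache_fetches(trace: List[int], per_page: int, capacity: int) -> int:
    """LRU cache of whole pages; capacity is measured in objects."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    max_pages, fetches = capacity // per_page, 0
    for oid in trace:
        page = oid // per_page
        if page in cache:
            cache.move_to_end(page)
        else:
            fetches += 1
            cache[page] = None
            if len(cache) > max_pages:
                cache.popitem(last=False)  # evict least recently used page
    return fetches

def object_cache_fetches(trace: List[int], per_page: int, capacity: int) -> int:
    """Fetch whole pages, but cache and evict individual objects (LRU)."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    fetches = 0
    for oid in trace:
        if oid in cache:
            cache.move_to_end(oid)
            continue
        fetches += 1
        page = oid // per_page
        for other in range(page * per_page, (page + 1) * per_page):
            if other != oid and other not in cache:
                cache[other] = None
                cache.move_to_end(other, last=False)  # unrequested: evict first
        cache[oid] = None                             # requested: most recent
        while len(cache) > capacity:
            cache.popitem(last=False)
    return fetches

sparse = [p * 10 for p in range(50)] * 3  # one used object on each of 50 pages
dense = list(range(100)) * 3              # every object on 10 pages
print(page_cache_fetches(sparse, 10, 100), object_cache_fetches(sparse, 10, 100))  # 150 50
print(page_cache_fetches(dense, 10, 100), object_cache_fetches(dense, 10, 100))    # 10 10
```

In the sparse case, the page cache holds only 10 of the 50 useful pages and misses on every access. The object cache keeps all 50 useful objects after the first pass. In the dense case, the two approaches tie, which is consistent with page-based systems doing best on dense traversals.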
Front end features
• Object storage managed by the Hybrid
Adaptive Caching (HAC) algorithm
• CLOCC optimistic concurrency control
algorithm provides sequential
consistency, best performance
• Techniques may be applicable to more
conventional databases
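CLOCC is a specific optimistic scheme with refinements this sketch omits. The sketch shows only the generic core of optimistic concurrency control. Transactions run without locks, buffer their writes, and record the version of every object they read. At commit, the server validates that nothing they read has changed since, and aborts the transaction otherwise. `Server` and `Transaction` are hypothetical simplifications.

```python
from typing import Dict

class Server:
    """Hypothetical store keeping a version number per object."""
    def __init__(self) -> None:
        self.value: Dict[str, int] = {}
        self.version: Dict[str, int] = {}

class Transaction:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.read_versions: Dict[str, int] = {}
        self.writes: Dict[str, int] = {}

    def read(self, key: str) -> int:
        if key in self.writes:                 # see our own buffered write
            return self.writes[key]
        self.read_versions.setdefault(key, self.server.version.get(key, 0))
        return self.server.value.get(key, 0)

    def write(self, key: str, val: int) -> None:
        self.writes[key] = val                 # buffered locally until commit

    def commit(self) -> bool:
        s = self.server
        # Validation: abort if anything we read has been changed by others.
        for key, ver in self.read_versions.items():
            if s.version.get(key, 0) != ver:
                return False
        for key, val in self.writes.items():
            s.value[key] = val
            s.version[key] = s.version.get(key, 0) + 1
        return True

srv = Server()
t1, t2 = Transaction(srv), Transaction(srv)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 10)
print(t1.commit())     # True
print(t2.commit())     # False: x changed after t2 read it
print(srv.value["x"])  # 1
```

The appeal for a client cache is that reads need no lock round trips to the server. The cost is that conflicting transactions abort and must be retried.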
Object repository
• Server cache speeds up client fetches
• Modified Object Buffer (MOB)
– keeps track of object mods separately from cache
– defers writes until necessary
– reduces installation reads, allows write absorption
[Figure: OR server internals: page cache (serving reads), MOB (receiving commits and aborts), log, and flusher]
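This toy version of the MOB idea rests on purely illustrative assumptions: objects are identified by hypothetical `(page, slot)` pairs, and a dictionary stands in for the disk. Committed modifications are buffered per object, so repeated writes to the same object are absorbed. Later, the flusher installs them page by page, reading each affected page once (the installation read) and writing it once.

```python
from collections import defaultdict
from typing import Dict, Tuple

Oid = Tuple[int, int]  # hypothetical (page number, slot) object id

class MOB:
    def __init__(self, disk: Dict[int, Dict[int, str]]) -> None:
        self.disk = disk                # page number -> {slot: value}
        self.mods: Dict[Oid, str] = {}  # latest committed value per object
        self.disk_reads = self.disk_writes = 0

    def commit(self, writes: Dict[Oid, str]) -> None:
        # Later commits overwrite earlier ones: write absorption.
        self.mods.update(writes)

    def read(self, oid: Oid) -> str:
        # Modifications not yet installed are served from the MOB
        # (lookups that go to the page cache/disk are not counted here).
        if oid in self.mods:
            return self.mods[oid]
        page, slot = oid
        return self.disk[page][slot]

    def flush(self) -> None:
        by_page: Dict[int, Dict[int, str]] = defaultdict(dict)
        for (page, slot), val in self.mods.items():
            by_page[page][slot] = val
        for page, slots in by_page.items():
            contents = dict(self.disk.get(page, {}))  # installation read
            self.disk_reads += 1
            contents.update(slots)
            self.disk[page] = contents                # one write per page
            self.disk_writes += 1
        self.mods.clear()

disk = {0: {0: "a", 1: "b"}, 1: {0: "c"}}
mob = MOB(disk)
for i in range(5):
    mob.commit({(0, 0): f"a{i}"})        # five commits to the same object
mob.commit({(0, 1): "b1", (1, 0): "c1"})
print(mob.read((0, 0)))                  # a4, served from the MOB
mob.flush()                              # 7 object writes -> 2 page writes
print(mob.disk_reads, mob.disk_writes)   # 2 2
```

Deferring installation is what makes absorption possible. The four overwritten values of object `(0, 0)` never reach the disk at all.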
More OR features
• Replicated ORs (log stability via replication)
• Referential integrity
– object mobility (multiple oids per object) supported
through OR surrogate objects, lazy forwarding
– no centralized location service
– distributed GC algorithm collects cycles efficiently
[Figure: an FE connected to three ORs]
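This simplified sketch shows object mobility with surrogates and lazy forwarding, using hypothetical names and oids written as `(or_name, local_id)` pairs. When an object moves, its old location keeps a small surrogate recording the new oid. A stale reference is corrected only when someone actually follows it, so no central location service is consulted.

```python
from typing import Dict, Tuple, Union

Oid = Tuple[str, int]  # hypothetical (OR name, local id)

class Surrogate:
    """Left at the old location; records where the object went."""
    def __init__(self, new_oid: Oid) -> None:
        self.new_oid = new_oid

class Repositories:
    def __init__(self) -> None:
        self.ors: Dict[str, Dict[int, Union[dict, Surrogate]]] = {}
        self.hops = 0  # surrogates traversed so far

    def move(self, old: Oid, new: Oid) -> None:
        obj = self.ors[old[0]][old[1]]
        self.ors.setdefault(new[0], {})[new[1]] = obj
        self.ors[old[0]][old[1]] = Surrogate(new)  # forwarding stub

    def resolve(self, holder: dict, field: str) -> dict:
        """Follow holder[field], fixing the reference lazily if the object moved."""
        oid = holder[field]
        target = self.ors[oid[0]][oid[1]]
        while isinstance(target, Surrogate):
            self.hops += 1
            oid = target.new_oid
            target = self.ors[oid[0]][oid[1]]
        holder[field] = oid  # lazy forwarding: update the stale reference
        return target

reps = Repositories()
reps.ors["or1"] = {1: {"name": "doc"}}
holder = {"ref": ("or1", 1)}
reps.move(("or1", 1), ("or2", 7))
print(reps.resolve(holder, "ref")["name"], holder["ref"])  # doc ('or2', 7)
reps.resolve(holder, "ref")
print(reps.hops)  # 1: the second access goes directly to or2
```

The object now has two oids, the old one (via the surrogate) and the new one. This is why referential integrity survives the move without finding and updating every referrer eagerly.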
Other Issues
• Queries not directly supported in standard
PLs or in Thor
– can be coded using conventional data structures,
but can high-performance queries be achieved?
– may require moving code to data (function
shipping); Thor model is data shipping
– relational databases not obsolete
• Schema evolution: how to handle changes to
software and data objects?
• Disconnected operation/long transactions
Reading
• Providing Persistent Objects in Distributed
Systems (ECOOP ’99)
• Hybrid Adaptive Caching for Distributed
Storage Systems (SOSP ’97)
• Safe and Efficient Sharing of Persistent
Objects in Thor (SIGMOD ’96)
• The Language-Independent Interface of the
Thor Persistent Object System, in Object-Oriented Multidatabase Systems