SIGMOD 2008 paper
OLTP through the looking glass,
and what we found there
Authors:
Stavros Harizopoulos @ HP
Daniel J. Abadi @ Yale
Samuel Madden @ MIT
Michael Stonebraker @ MIT
Supervisor: Dr Benjamin Kao
Presenter: For
What is Online Transaction Processing
(OLTP)?
• OLTP refers to a class of systems that facilitate
and manage transaction-oriented applications,
typically for data entry and retrieval
transaction processing (Wikipedia)
• OLTP databases were optimized for the computer
technology of the late 1970s, 30 years ago
2B3L
Main features of OLTP
• Buffer management to facilitate data transfer
between memory and disk
• B-tree for on-disk data storage
• Logging for recovery
• Locking to support concurrency
• Latching for accessing shared data structures
Motivation of the study
• Are OLTP databases still optimally designed
nowadays, given the hardware advancements?
• Requests from outside the DB community for
alternative DB architectures
Motivation of the study
Hardware advancement
• DB cost: in the millions 30 years ago; a few
thousand dollars nowadays
• Storage size: DB size >> memory 30 years ago;
memory > DB size nowadays
• Processing time for most transactions: in
microseconds nowadays
Motivation of the study
Requests from outside the DB community
• “database-like” storage system proposals from
Operating Systems and networking conferences
– varying forms of:
• concurrency
• consistency
• reliability
• replication
• queryability
Trend in OLTP-(1/5)
Cluster Computing
Trend in OLTP-(2/5)
Memory resident Databases
Trend in OLTP-(2/5)
Memory resident Databases
• OLTP data sizes don’t grow as fast as memory sizes
Trend in OLTP-(3/5)
Single Threading in OLTP System
Trend in OLTP-(3/5)
Single Threading in OLTP System
• A step backward from multithreaded to
single-threaded?
• Why multithreading?
– Prevent the CPU from idling while waiting for data
from disk
– Prevent long-running transactions from blocking
short transactions
• Neither reason is valid for a memory-resident DB
– No disk waits
– Long-running transactions run in a data warehouse
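The single-threaded, run-to-completion execution model above can be sketched as a simple loop; the function names and the toy bank transactions are illustrative, not from the paper:

```python
from collections import deque

def deposit(db):
    db["balance"] += 50
    return db["balance"]

def withdraw(db):
    db["balance"] -= 30
    return db["balance"]

def run_to_completion(queue, db):
    # One transaction at a time: each runs start-to-finish against the
    # in-memory database, so no other transaction can observe its
    # intermediate state, and no locks or latches are needed.
    results = []
    while queue:
        txn = queue.popleft()
        results.append(txn(db))
    return results

db = {"balance": 100}
results = run_to_completion(deque([deposit, withdraw]), db)
```

With no disk waits, there is nothing for a second thread to overlap with, which is the argument the slide makes.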
Trend in OLTP-(3/5)
Single Threading in OLTP System
• What about multiprocessors?
– Achieve a shared-nothing architecture by running
one virtual machine per processor
• What about networked disks?
– Feasible to partition transactions so that each
runs at a single site
Trend in OLTP-(4/5)
High Availability vs. Logging
• 24x7 service is achieved by using multiple sets of
hardware.
• Recovery is performed by copying missing state
from other database replicas.
• The log for recovery can then be avoided
Trend in OLTP-(4/5)
High Availability vs. Logging
[Figure: 24x7 service, with the production server recovering from a standby server]
Trend in OLTP-(5/5)
Transaction Variants
• Why transaction variants?
– The two-phase commit protocol harms the
performance of large-scale distributed DB systems
– Two-phase commit involves a commit-request phase
and a commit phase, both of which require every
server to participate.
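The two-phase commit flow described above can be sketched as follows; the class and function names are illustrative, not from the paper or any real system:

```python
class Participant:
    """A server taking part in the distributed transaction."""
    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit
        self.state = "init"

    def prepare(self):
        # Phase 1 (commit-request): vote yes or no.
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def finish(self, commit):
        # Phase 2 (commit): apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    # Phase 1: every server must vote yes for a global commit.
    all_yes = all(p.prepare() for p in participants)
    # Phase 2: broadcast the decision to every server.
    for p in participants:
        p.finish(all_yes)
    return all_yes
```

Both phases touch every participant, which is the round-trip cost the slide identifies as harmful at large scale.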
Trend in OLTP-(5/5)
Transaction Variants
• Trade consistency for performance
• Under eventual consistency, all writes eventually
propagate among the database servers.
Trend in OLTP-(5/5)
Transaction Variants
• Eventual consistency example: the Amazon shopping cart
Sx, Sy, Sz are different servers
D1, D2: add items to the cart
D3: delete an item from the cart
D4: add another item to the cart
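A hypothetical sketch of the shopping-cart anomaly: if divergent replicas are reconciled by a simple union of their items (an assumption made here for illustration, not a claim about Amazon's actual merge rule), an item deleted on one server can resurface after the merge:

```python
def merge(*replicas):
    # Reconcile divergent replicas by taking the union of their items.
    merged = set()
    for cart in replicas:
        merged |= cart
    return merged

cart_sx = {"D1-item", "D2-item"}     # D1, D2: items added on server Sx
cart_sy = set(cart_sx)
cart_sy.discard("D1-item")           # D3: item deleted on server Sy
cart_sz = set(cart_sx)
cart_sz.add("D4-item")               # D4: another item added on server Sz

# Once the writes propagate and the replicas merge,
# the item deleted on Sy reappears in the cart.
final = merge(cart_sx, cart_sy, cart_sz)
```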
Research groups interested in new DB
architectures
• Amazon
• HP
• NYU
• MIT
Trend in OLTP - Summary
Eventual consistency
DBMS modification by OLTP trends
• (1) A memory-resident DB can get rid of buffer
management
DBMS modification by OLTP trends
• (2) A single-threaded DB can avoid locking and
latching
DBMS modification by OLTP trends
• (3) Cluster computing avoids locking: instead of
one processor running multiple threads, each
processor is responsible for its own single thread
DBMS modification by OLTP trends
• (4) High availability can avoid using a log for
recovery purposes
[Figure: the production server recovering from the standby server]
DBMS modification by OLTP trends
• (5) Transaction variants avoid bookkeeping, i.e.,
logging
Case Study of DBMS, Shore
• Shore was developed at the University of
Wisconsin in the early 1990s
• It was designed to be a typed, persistent
object system borrowing from both file-system
and object-oriented database technologies
http://www.cs.wisc.edu/shore
Basic components of Shore
Removing Shore’s components
Remove Shore’s logging(1)
• Increase the log buffer size so that it is never
flushed to disk
Remove Shore’s locking(2)
• Configure the lock manager to always grant
lock requests
• Remove the code that handles ungranted lock
requests
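A minimal sketch of what "always granting" could look like; the class and its interface are invented for illustration and do not match Shore's actual lock manager:

```python
class AlwaysGrantLockManager:
    """Lock manager with the conflict check short-circuited away."""
    def __init__(self):
        self.granted = []          # kept only so releases still pair up

    def acquire(self, txn_id, resource, mode="X"):
        # Single-threaded execution means no conflicting transaction can
        # be running concurrently, so the conflict check (and every code
        # path that would queue or block an ungranted request) is gone.
        self.granted.append((txn_id, resource, mode))
        return True                # always granted, never blocks

    def release(self, txn_id, resource):
        self.granted = [g for g in self.granted
                        if not (g[0] == txn_id and g[1] == resource)]
```

Even with conflicts eliminated, the bookkeeping itself (list append, tuple allocation) still costs instructions, which is why the paper removes the lock-handling code entirely rather than just relaxing it.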
Remove Shore’s latching(3)
• Add if-else statements to avoid requesting
latches
• Replace the original latching-intensive B-tree with a latch-free B-tree
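The if-else bypass might look like the following sketch; the flag and the latch API are illustrative, not Shore's real code:

```python
USE_LATCHES = False   # single-threaded build: skip latch acquisition

def read_page(page, latch):
    # With USE_LATCHES False, the latch is never touched, so the latch
    # acquire/release instructions disappear from the hot path.
    if USE_LATCHES:
        latch.acquire()
    try:
        return page["data"]
    finally:
        if USE_LATCHES:
            latch.release()
```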
Remove Shore’s Buffer Mgt (4)
• Replace the buffer manager by directly invoking
malloc for memory allocation
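In spirit, the change replaces buffer-pool pinning with a direct allocation, as in this sketch (Python standing in for the C-level malloc call; all names are illustrative):

```python
PAGE_SIZE = 8192
heap = {}    # page_id -> bytearray, standing in for malloc'd memory

def get_page(page_id):
    # No eviction, pinning, or disk I/O: allocate on first touch
    # ("malloc(PAGE_SIZE)"), then hand back the in-memory page directly.
    if page_id not in heap:
        heap[page_id] = bytearray(PAGE_SIZE)
    return heap[page_id]
```

This removes the page-table lookup, pin/unpin pair, and replacement logic that a buffer manager performs on every page access.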
Shore after components removal
Benchmark for comparison (TPC-C)
• TPC-C is an industry standard used to measure
e-commerce performance
• TPC-C is designed to represent any industry
that must manage, sell, or distribute a product
or service
• Vendors include Microsoft, Oracle, IBM,
Sybase, Sun, HP, Dell, etc.
http://www.tpc.org/tpcc/default.asp
Benchmark for comparison
• 1 warehouse (~100MB) serves 10 districts, and
each district serves 3,000 customers.
Benchmark for comparison
• 5 concurrent transaction types in TPC-C
– New Order
– Payment
– Deliver Order
– Check Order Status
– Monitor Warehouse Stock Level
Experiment setup and measurement
• Single-core Pentium 4, 3.2 GHz, with 1 MB L2
cache, hyper-threading disabled, 1 GB RAM,
running Linux 2.6.
• 40,000 transactions of the New Order and
Payment types are run
• Results measured in
– 1) Throughput (# transactions completed / time)
– 2) Instruction count
Results after removing the
components (in throughput)
• The memory-resident Shore DB provided 640
transactions per second.
• The stripped-down Shore DB provided 12,700
transactions per second.
• The stripped-down Shore DB gave a 20x
improvement in throughput
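The reported speedup checks out arithmetically:

```python
# Throughput figures from the slide above.
baseline_tps = 640     # memory-resident Shore
stripped_tps = 12700   # stripped-down Shore

speedup = stripped_tps / baseline_tps   # just under 20x
```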
Results after removing the
components (in # instruction)
• Useful work accounts for less than 2% of the
instructions in a memory-resident DB
Effect of removing different
components for payment (1/6)
Effect of removing different
components for payment (2/6)
Effect of removing different
components for payment (3/6)
Effect of removing different
components for payment (4/6)
Effect of removing different
components for payment (5/6)
Effect of removing different
components for payment (6/6)
Effect of removing different
components for New Order
Instruction and cycle comparison
Implication for future OLTP engine
Concurrency Control
• Single-threaded transactions allow concurrency
control to be turned off
• But many DBMS applications are not
sufficiently well behaved for single-threaded
execution.
• Dynamic locking was experimentally the best
concurrency control with disk-resident data.
• What concurrency control protocol is best?
Implication for future OLTP engine
Multi-core support
• Virtualization: each core is a single-threaded
machine
• Intra-query parallelism: each processor
runs part of a single query
Implication for future OLTP engine
Replication Management
• Active-passive replication scheme with a log
– Replicas may not be consistent with the
primary unless a two-phase commit protocol is used
– A log is required
• Active-active replication scheme with
transactions
– Two-phase commit introduces large latency for
distributed replication
Implication for future OLTP engine
Weak Consistency
• Eventual consistency: data is not immediately
propagated across all nodes.
• To study how different degrees of workload
affect the consistency level of the data
Implication for future OLTP engine
Cache conscious B-Trees
• Cache misses in the B-tree code may well be
the new bottleneck for the stripped-down
system.
Conclusion
• The most significant overhead contributors are
buffer management and locking operations,
followed by logging and latching.
• A fully stripped-down system’s performance is
orders of magnitude better than an
unmodified system’s.
Thank you!