Quick Review of May 1 material
• Concurrent Execution and Serializability
– inconsistent concurrent schedules
– transaction conflicts
• conflict serializable == conflict equivalent to a serial schedule
– precedence graphs (directed edge shows R/W, W/W, or W/R conflict); a schedule is conflict serializable iff its precedence graph is acyclic
• Lock-based Protocols
– shared and exclusive locks
– deadlocks
– wait-for graph (directed edge shows “waiting for”)
– Two-phase locking protocol
• lock conversion
• strict and rigorous versions
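As a refresher on the precedence-graph test, here is a minimal Python sketch. It assumes a schedule is a list of (transaction, operation, data item) steps with operation "R" or "W"; that representation and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical schedule format: list of (transaction_id, "R" | "W", data_item).

def precedence_graph(schedule):
    """Edge Ti -> Tj when an operation of Ti conflicts with a later one of Tj."""
    edges = defaultdict(set)
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Depth-first search; reaching a node still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (unvisited)

    def visit(node):
        color[node] = GREY
        for nxt in edges.get(node, ()):
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    nodes = set(edges) | {n for succ in edges.values() for n in succ}
    return any(color[n] == WHITE and visit(n) for n in nodes)

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))
```

For example, `[("T1","R","A"), ("T2","W","A"), ("T1","W","A")]` yields edges T1 -> T2 and T2 -> T1, a cycle, so it is not conflict serializable.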
Database Recovery
• Computers may crash, stall, or lock up
• We require that transactions are Durable (the D in ACID)
– A completed transaction makes a permanent change to the
database that will not be lost
– How do we ensure durability, given the possibility of computer
crashes, hard disk failures, power surges, software bugs that lock up the
system, or all the myriad other things that can go wrong?
– How do we recover from failure (i.e., get back to a consistent state
that includes all recent changes)?
• The most widely used approach is log-based recovery
Backups
• Regular backups take a snapshot of the database status at a
particular moment in time
– used to restore data in case of catastrophic failure
– expensive operation (writing out the whole database)
• usually done no more than once a week, over the weekend when the
system usage is low
– smaller daily backups store only records that have been modified
since the last weekly backup; done overnight
– backups allow us to recover the database to a fairly recent
consistent state (yesterday’s), but are far too expensive to be used
to save running database modifications
– How do we ensure transaction (D) durability?
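The weekly-plus-daily scheme above is a full backup followed by differential backups. A minimal sketch, assuming each record carries a last-modified timestamp (a hypothetical layout: key -> (value, last_modified)); it ignores deletions, which a real differential backup would also have to capture.

```python
from datetime import datetime

# Hypothetical record layout: key -> (value, last_modified: datetime).

def weekly_backup(db):
    """Full backup: copy every record."""
    return dict(db)

def daily_backup(db, last_weekly_time):
    """Differential backup: only records modified since the last full backup."""
    return {k: rec for k, rec in db.items() if rec[1] > last_weekly_time}

def restore(weekly, daily):
    """Start from the full snapshot, then overlay the newer daily records."""
    db = dict(weekly)
    db.update(daily)
    return db
```

Restoring needs only the latest weekly backup plus the latest daily one, which is why each daily backup is taken relative to the weekly snapshot rather than to the previous day.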
Log-Based Recovery
• We store a record of recent modifications: the log.
– The log is a sequence of log records, recording all update activities in
the database. Each update log record describes a single database write
and has these fields:
• transaction identifier: what transaction performed the write
• data-item identifier: unique ID of the data item (typically the location
on disk)
• old value (what was overwritten)
• new value (value after the write)
– The log is a write-ahead log: a log record is written to stable storage
before the database item it describes is updated
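A minimal Python sketch of an update log record and the write-ahead rule. The in-memory dict and list stand in for the database and the stable log, and recording `None` as the old value of a never-written item is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class UpdateRecord:
    tid: str   # transaction identifier
    item: str  # data-item identifier (e.g. a disk location)
    old: Any   # value that was overwritten
    new: Any   # value after the write

class Database:
    def __init__(self):
        self.data = {}  # stands in for the database on disk
        self.log = []   # stands in for the log on stable storage

    def write(self, tid, item, value):
        # Write-ahead rule: append the log record BEFORE changing the data.
        self.log.append(UpdateRecord(tid, item, self.data.get(item), value))
        self.data[item] = value
```

Because the record is logged first, a crash between the two steps leaves a log entry with no database change, which recovery can safely redo or undo.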
Log-Based Recovery (2)
• The log holds four kinds of records:
– <T-ID, start-time>: transaction becomes active
– <T-ID, D-ID, V-old, V-new>: transaction makes a write
– <T-ID, commit-time>: transaction commits
– <T-ID, abort-time>: transaction aborts
• Log contains a complete record of all database activity
since the last backup
• Logs must reside on stable storage
– Assume each log record is written to the end of the log on stable
storage as soon as it is created.
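The assumption that each record reaches the end of the log on stable storage as soon as it is created can be approximated with an append-mode file that is forced to disk on every write. A sketch only: the text format `<T1, start>` is hypothetical, and `os.fsync` makes a record durable on one device, whereas true stable storage also needs replication.

```python
import os

def append_log_record(path, *fields):
    """Append one record such as <T1, start> to the end of the log file
    and force it to the storage device before returning."""
    line = "<" + ", ".join(str(f) for f in fields) + ">\n"
    with open(path, "a", encoding="utf-8") as log:
        log.write(line)
        log.flush()             # move Python's buffer into the OS
        os.fsync(log.fileno())  # ask the OS to write it to the device
```

Usage: `append_log_record(path, "T1", "start")`, then `append_log_record(path, "T1", "A", 5, 7)` for a write, then `append_log_record(path, "T1", "commit")`.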
Log-Based Recovery (3)
• Recovery operation uses two primitives:
– redo: reapply the logged update.
• Write V-new into D-ID
– undo: reverse the logged update
• Write V-old into D-ID
– both primitives ignore the current state of the data item -- they
don’t bother to read the value first.
– Applying several of these writes to the same data item leaves only the
effect of the last one, so repeating them does no harm as long as they
are done in the correct order, even if the correct result is already
in stable storage
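The two primitives can be sketched in a few lines of Python, assuming a log record is a (T-ID, D-ID, V-old, V-new) tuple and the database is a dict. Neither function reads the current value, so applying either one twice is the same as applying it once; order still matters when several records touch the same item.

```python
def redo(db, rec):
    """Write V-new into D-ID, ignoring the item's current value."""
    _tid, item, _old, new = rec
    db[item] = new

def undo(db, rec):
    """Write V-old into D-ID, ignoring the item's current value."""
    _tid, item, old, _new = rec
    db[item] = old
```

With r1 = ("T1", "A", 10, 20) and r2 = ("T1", "A", 20, 30), undoing r2 then r1 restores A to 10, while the wrong order (r1 then r2) leaves A at 20.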
Checkpoints
• When a system failure occurs, we examine the log to
determine which transactions need to be redone, and
which need to be undone.
• In theory we need to search the entire log
– time consuming
– most of the transactions in the log have already written their output
to stable storage. It won’t hurt the database to redo their results,
but every unnecessary redo wastes time.
• To reduce this overhead, database systems introduce
checkpoints.
Log-Based Recovery with Checkpoints
• So we have a crash and need to recover. What do we do?
• Three passes through the log between the checkpoint and
the failure
– go forward from the checkpoint to the failure to create the redo and
undo lists
• redo every transaction that committed before the failure
• undo every transaction that had not committed before the failure
– go backward from the failure to the checkpoint, doing the undos in
reverse log order
– go forward from the checkpoint to the failure, doing the redos in log order
– expensive -- three sequential scans of the active log
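The three passes can be sketched in Python as follows. The record shapes ("start", T), ("write", T, item, old, new), ("commit", T), ("abort", T) are hypothetical, and the sketch assumes every transaction in the log started after the checkpoint, as with the simple checkpoint that waits for active transactions to finish. Following the rule above, anything without a commit record (including an aborted transaction) goes on the undo list.

```python
def recover(log, db):
    """Three-pass recovery over the log records written after the checkpoint."""
    # Pass 1 (forward): build the redo and undo lists.
    started, committed = [], set()
    for rec in log:
        if rec[0] == "start":
            started.append(rec[1])
        elif rec[0] == "commit":
            committed.add(rec[1])
    redo_list = [t for t in started if t in committed]
    undo_list = [t for t in started if t not in committed]

    # Pass 2 (backward): restore old values written by uncommitted transactions.
    undo_set = set(undo_list)
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] in undo_set:
            _, _, item, old, _ = rec
            db[item] = old

    # Pass 3 (forward): reapply new values written by committed transactions.
    redo_set = set(redo_list)
    for rec in log:
        if rec[0] == "write" and rec[1] in redo_set:
            _, _, item, _, new = rec
            db[item] = new
    return redo_list, undo_list
```

Doing all undos before all redos means a committed value is never overwritten by an older value from an uncommitted transaction.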
Recovery Example
• First pass: T2 and T4 committed; T3 and T5 are uncommitted
• Second pass: undo T5, then T3
• Third pass: redo T4, then T2
Almost Final Stuff on Checkpoints
• Checkpointing usually speeds up recovery
– log prior to checkpoint can be archived
– without checkpoint the log may be very long; three sequential
passes through it could be very expensive
• During checkpointing:
– stop accepting new transactions and wait until all active
transactions commit
– flush the log to stable storage
– flush all dirty disk pages in the buffer to disk
– mark the stable-storage log record with a <checkpoint> marker
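A toy Python model of the four checkpointing steps. All class and attribute names are hypothetical; "waiting" is modeled by returning False while transactions are still active (new ones stay blocked), so the caller retries once they commit. The log is flushed before the dirty pages, in keeping with the write-ahead rule.

```python
class BufferManager:
    """Toy buffer pool and log, used only to illustrate a checkpoint."""

    def __init__(self):
        self.disk = {}        # page id -> contents on disk
        self.buffer = {}      # page id -> contents in memory
        self.dirty = set()    # pages changed since last written to disk
        self.log_tail = []    # log records not yet on stable storage
        self.stable_log = []  # log records already on stable storage
        self.active = set()   # transactions currently running
        self.accepting = True

    def checkpoint(self):
        """Returns False while still waiting for active transactions."""
        self.accepting = False  # step 1: admit no new transactions
        if self.active:
            return False        # wait: retry after the active ones commit
        self.stable_log.extend(self.log_tail)  # step 2: flush the log
        self.log_tail.clear()
        for page in sorted(self.dirty):        # step 3: flush dirty pages
            self.disk[page] = self.buffer[page]
        self.dirty.clear()
        self.stable_log.append(("checkpoint",))  # step 4: write the marker
        self.accepting = True
        return True
```

The cost of this simple version is the pause in step 1: no new work starts until every active transaction finishes.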
Final Stuff on Checkpoints
• Better checkpointing
– don’t wait for active transactions to finish, but don’t let them make
updates to the buffers or the update log during checkpointing
– write a checkpoint log record that includes the list L of transactions
active at the time of the checkpoint
<checkpoint, L>
– on recovery we must scan back past the checkpoint (possibly past
earlier checkpoints) to find all changes made by the transactions listed
in L, so that we can undo or redo them
– an even more elaborate scheme (called fuzzy checkpointing) allows
updates to continue while the checkpoint is being taken
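With <checkpoint, L> records, recovery must start at the earliest start record of any transaction in L. A sketch of finding that position, assuming hypothetical records ("start", T) and ("checkpoint", L); if the log has no checkpoint, the whole log is scanned.

```python
def recovery_start(log):
    """Index of the first log record recovery must examine."""
    cp = max((i for i, rec in enumerate(log) if rec[0] == "checkpoint"),
             default=None)
    if cp is None:
        return 0                  # no checkpoint: scan the entire log
    pending = set(log[cp][1])     # transactions active at the last checkpoint
    start = cp
    # Walk backward (past older checkpoints if needed) until every
    # transaction in L has had its start record found.
    for i in range(cp - 1, -1, -1):
        if not pending:
            break
        rec = log[i]
        if rec[0] == "start" and rec[1] in pending:
            pending.discard(rec[1])
            start = i
    return start
```

For `[("start","T1"), ("checkpoint",["T1"]), ("start","T2"), ("checkpoint",["T1","T2"])]` the scan has to go back past the first checkpoint to index 0, where T1 started.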
Deferred vs. Immediate Modification
• Immediate Database Modification
– basically what we’ve been discussing so far -- uncommitted
transactions may write values to disk during their execution
• Deferred Database Modification
– no writes to the database before the transaction is partially committed
(i.e., after the execution of its last statement)
– since uncommitted transactions never write to the database, there is no
need for the undo pass on recovery.
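Under deferred modification, recovery reduces to a single redo pass over committed transactions. A minimal sketch, assuming hypothetical records ("start", T), ("write", T, item, new), ("commit", T); the old value is dropped from write records here because nothing is ever undone.

```python
def recover_deferred(log, db):
    """Redo-only recovery: reapply committed writes in log order and
    ignore uncommitted ones, which never reached the database."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new = rec
            db[item] = new
```

Since redo is idempotent, running this again after a crash during recovery is harmless.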