Indexes (AKA Indices) - Prof. Yitz Rosenthal

Transactions
- Fehily book - chap. 14
- Mannino book - chap. 15 (up to 15.2)
Prof. Yitz Rosenthal
ACID
• There are 4 terms used in conjunction with
Transactions
– Atomic
– Consistent
– Isolated
– Durable
• Acronym: ACID
ATOMICITY and DURABILITY
Atomicity
• Operations in a transaction will be processed as
a single unit
• Either :
– ALL of the operations will happen
OR
– NONE of the operations will happen
Durability
• Once a transaction is completed, you are
GUARANTEED that the data will be stored in
the underlying database files correctly EVEN IF
THERE IS SOME UNFORESEEN
CATASTROPHIC EVENT (e.g. a Power Outage)
EXAMPLES
Examples
• Travel Agent
– Booking a departing and return flight as one purchase.
– You don't want to book the departing flight if there is a problem booking
the associated return flight at the same time.
– EVEN MORE SO: You don't want to book the return flight if you can't
book the departing flight at the same time.
• Banking ATM
– Transferring money from a savings account to a checking account. This
involves debiting the savings account and crediting the checking
account.
– You don't want to debit the savings account unless you can also credit
the checking account
– You also don't want to credit the checking account if there was a problem
debiting the savings account.
What can go wrong WITHOUT
TRANSACTIONS
• Imagine...
• Banking example:
– Step 1: Person at ATM requests to do a transfer and
presses the "OK" button on ATM.
– Step 2: DBMS performs the debit of the savings
account and writes the new amount to the database
files.
– Step 3: ***** POWER OUTAGE *****
(computer goes down)
– Step 4: When computer is rebooted, the savings
account was debited but the checking account was
never credited.
Other types of failures
• *** Power Outage *** is only one type of failure that can happen
to a transaction
• Other types of failures
– Program-detected failure - after debiting savings, the program queries
checking and notices that the balance in the checking account is somehow
negative. The program voluntarily stops the transfer since something is
fishy and issues a ROLLBACK command (see next few slides) to
undo the modifications made to the DB so far.
– Abnormal program termination - caused by a programming bug (e.g.
division by zero might cause an unexpected crash of the
program between debiting and crediting). The program never COMMITs, the
transaction times out, and the DBMS automatically rolls back the transaction.
– System failure - e.g. power glitch causes reboot of server
– Device failure - e.g. hard drive that contains database files crashes. If
transaction log is kept on a different hard drive and an earlier copy of the
database is backed up somewhere else, the current version of the
database can be recreated from the log file.
Buffered Writes
• Delayed (or buffered) writes
– Writes to database tables are not written to disk
immediately.
– When an application writes to a DB table (e.g.
insert, update, delete) the DBMS stores the
information in memory buffers.
– The information may actually be written to disk only
much, much later.
Durability
• Data is DURABLE
– Transactions GUARANTEE that if a system failure
(e.g. power outage) occurs after a transaction is
committed, the database will be able to be restored
to reflect the changes made by the transaction even
if the underlying table data was not written to the
database file.
(we'll see how soon).
Disks are SLOW, Memory is FAST
• Why are writes buffered?
– Memory is MUCH, MUCH faster than a disk drive.
• Why is that?
– The reason for this has to do with how disk drives
work (see next slides).
Tracks, Sectors & Clusters
• Disk drives are segmented into
– tracks (concentric circles) and
– sectors (pie slices)
• Each track and each sector has an identifying number
• A cluster is a particular area of the disk corresponding to a
specific track and specific sector
Cluster Size
• Every cluster on the disk stores the same
amount of data.
• The amount of data stored in a cluster is known
as the cluster size.
• Cluster sizes are usually powers of two:
– Example cluster sizes for different disk drives:
• 512 bytes
• 1024 bytes
• 4096 bytes
• etc.
Reads and Writes
• Every read and write to a disk drive will read or
write an entire cluster at a time.
• There is NO WAY for a disk drive to read or
write only part of a cluster.
• Therefore - PHYSICALLY, IT TAKES JUST AS
MUCH TIME TO READ OR WRITE 512 BYTES
ON A DISK DRIVE (if cluster size is 512 bytes)
AS IT DOES TO READ OR WRITE JUST ONE
BYTE.
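The whole-cluster rule can be put into numbers with a small sketch. The function name `clusters_touched` and the byte-offset model are assumptions made for illustration; they are not part of any real disk API.

```python
def clusters_touched(offset, length, cluster_size=512):
    """Number of whole clusters the drive must transfer to cover
    `length` bytes starting at byte position `offset`."""
    if length <= 0:
        return 0
    first = offset // cluster_size                 # cluster holding the first byte
    last = (offset + length - 1) // cluster_size   # cluster holding the last byte
    return last - first + 1


# Reading 1 byte costs the same single cluster transfer as reading 512 bytes.
print(clusters_touched(0, 1))     # 1
print(clusters_touched(0, 512))   # 1
# A 2-byte read that straddles a cluster boundary costs two transfers.
print(clusters_touched(511, 2))   # 2
```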
Logical Records and Physical
Records
• A physical record corresponds to the data in a
single disk cluster.
• A logical record corresponds to the data from a
single record in a particular table.
• Logical records for a specific table are all the
same size. (VARCHAR and VARBINARY data
are not stored in the logical record)
Storage of logical records in
physical records
• If the logical record size is SMALLER than the
physical record size (i.e. cluster size) then
multiple logical records are stored in a single
physical record.
• If the logical record size is LARGER than the
physical record size then a single logical record
will need to be split between 2 or more physical
records.
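As a rough worked example of the two cases above, the sketch below computes how many logical records fit in one physical record, and how many physical records one large logical record needs. Both function names are hypothetical, and the first function assumes records are never split across physical records when they are small enough to fit.

```python
def records_per_physical_record(logical_size, physical_size):
    """How many whole logical records fit in one physical record
    (assumes small records are not split across physical records)."""
    return physical_size // logical_size


def physical_records_needed(logical_size, physical_size):
    """How many physical records one logical record occupies
    (ceiling division, done with integers only)."""
    return -(-logical_size // physical_size)


# 100-byte rows in 4096-byte clusters: 40 rows per cluster, 96 bytes unused.
print(records_per_physical_record(100, 4096))   # 40
# A 10000-byte row needs 3 clusters of 4096 bytes.
print(physical_records_needed(10000, 4096))     # 3
```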
Not enough memory to hold
everything
• The amount of memory available to the DBMS
is generally NOT as large as the amount of disk
space available.
– Memory is much more expensive than disk space.
– Hardware limitations limit amount of memory that
can be placed on one machine.
Memory Buffers
• The DBMS creates memory buffers that are the
same size as the disk clusters.
• When the DBMS reads information from a
cluster, it copies that information to an
in-memory buffer which is the same size as the
cluster. This is known as a memory "page".
Paging
• What is "paging"?
– There generally are NOT ENOUGH memory pages
to store the whole database.
– When the DBMS needs to access data that is not
currently in memory, the DBMS
• Picks an in-memory page that is not being used and
writes it to disk
• The DBMS then reads the desired data from the disk into
the now available memory buffer.
Checkpoints
• What is a "checkpoint"?
– Once in a while, the DBMS ensures that the latest
copy of every page is on the disk.
– This is known as a "checkpoint"
– Checkpoints are necessary for the log mechanism
to work correctly.
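The paging and checkpoint ideas above can be sketched as a toy buffer pool. Everything here is an illustrative assumption: the class name `BufferPool`, the use of a plain dict to stand in for the disk, and the least-recently-used rule for picking the page to evict (the slides only say the DBMS picks a page that is not being used).

```python
from collections import OrderedDict


class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity = capacity     # number of in-memory pages available
        self.disk = disk             # dict: page number -> contents (stands in for the disk)
        self.pages = OrderedDict()   # page number -> (contents, dirty), least recently used first

    def _load(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)        # mark as most recently used
            return
        if len(self.pages) >= self.capacity:
            victim, (data, dirty) = self.pages.popitem(last=False)
            if dirty:
                self.disk[victim] = data           # write the evicted page back to disk
        self.pages[page_no] = (self.disk[page_no], False)

    def read(self, page_no):
        self._load(page_no)
        return self.pages[page_no][0]

    def write(self, page_no, data):
        self._load(page_no)
        self.pages[page_no] = (data, True)         # buffered: the disk is not touched yet

    def checkpoint(self):
        """Make sure the latest copy of every page is on disk."""
        for page_no, (data, dirty) in list(self.pages.items()):
            if dirty:
                self.disk[page_no] = data
                self.pages[page_no] = (data, False)


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(capacity=2, disk=disk)
pool.write(1, "A")
print(disk[1])     # a  (the write is still only in memory)
pool.checkpoint()
print(disk[1])     # A
```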
SQL Commands
• BEGIN TRANSACTION
– issued before any SQL statements that are part of
the transaction
• COMMIT
– issued after all SQL statements that are part of the
transaction
– Once the COMMIT statement is executed you are
guaranteed that the data is permanently in the
database even if unforeseen errors happen.
Example of transaction
BEGIN TRANSACTION;
UPDATE SAVINGS_ACCOUNTS
SET BALANCE = BALANCE - 500
WHERE ACCOUNT_NUMBER = 12345;
UPDATE CHECKING_ACCOUNTS
SET BALANCE = BALANCE + 500
WHERE ACCOUNT_NUMBER = 12345;
COMMIT TRANSACTION;
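For readers who want to run the transfer, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. SQLite's syntax differs slightly from the slide (plain BEGIN and COMMIT), and the helper names `make_demo_db`, `transfer` and `balances`, along with the starting balances, are assumptions made for the demo.

```python
import sqlite3


def make_demo_db():
    # isolation_level=None lets us issue BEGIN / COMMIT / ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE savings_accounts "
                 "(account_number INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE checking_accounts "
                 "(account_number INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO savings_accounts VALUES (12345, 1000)")
    conn.execute("INSERT INTO checking_accounts VALUES (12345, 200)")
    return conn


def transfer(conn, account, amount):
    """Debit savings and credit checking as a single unit."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE savings_accounts SET balance = balance - ? "
                     "WHERE account_number = ?", (amount, account))
        conn.execute("UPDATE checking_accounts SET balance = balance + ? "
                     "WHERE account_number = ?", (amount, account))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")   # neither update survives
        raise


def balances(conn, account=12345):
    s = conn.execute("SELECT balance FROM savings_accounts "
                     "WHERE account_number = ?", (account,)).fetchone()[0]
    c = conn.execute("SELECT balance FROM checking_accounts "
                     "WHERE account_number = ?", (account,)).fetchone()[0]
    return s, c


conn = make_demo_db()
transfer(conn, 12345, 500)
print(balances(conn))   # (500, 700)
```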
Other SQL commands: ROLLBACK
• Other SQL commands
– ROLLBACK
• A rollback command forces whatever was done in the
transaction so far to become "undone".
• Similar to the "undo" command on your word processor.
• This is used both with "stored procedures" and
application programs that interact with the database.
When the program encounters a condition after it started
processing the transaction that requires undoing the
transaction, the "stored procedure" or the application can
issue the ROLLBACK command.
• Example: if there is a transmission error in a distributed
database, the application program can ROLLBACK a
transaction once it is started.
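A program-issued ROLLBACK might look like the following sketch, again using sqlite3 as a stand-in DBMS. The `accounts` table, the `withdraw` function and the "negative balance" check are hypothetical; they only illustrate an application undoing its own work after noticing a problem partway through.

```python
import sqlite3


def withdraw(conn, account_id, amount):
    """Debit the account, but undo the debit if the balance would go negative."""
    conn.execute("BEGIN")
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                 (amount, account_id))
    (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                              (account_id,)).fetchone()
    if balance < 0:
        conn.execute("ROLLBACK")   # the UPDATE above becomes "undone"
        return False
    conn.execute("COMMIT")
    return True


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

print(withdraw(conn, 1, 500))   # False
print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0])   # 100
```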
SAVE POINTS
• Additional SQL commands
– SAVE TRAN mysavepoint1
• allows you to break up a long transaction into several
parts.
• You can create several savepoints
• Each savepoint is given a unique name (e.g.
mysavepoint1, mysavepoint2, etc.)
• at any point the program can issue a
ROLLBACK TRAN mysavepoint1
command to roll back the transaction to the specified
savepoint and then continue on from there.
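The savepoint idea can be tried out with sqlite3, with one caveat: SQLite spells the commands SAVEPOINT name and ROLLBACK TO name rather than SAVE TRAN and ROLLBACK TRAN as on the slide. The `steps` table is made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE steps (name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO steps VALUES ('part 1')")
conn.execute("SAVEPOINT mysavepoint1")
conn.execute("INSERT INTO steps VALUES ('part 2')")
conn.execute("ROLLBACK TO mysavepoint1")   # undoes only the work done after the savepoint
conn.execute("INSERT INTO steps VALUES ('part 2, retried')")
conn.execute("COMMIT")                     # 'part 1' and the retry are committed together

rows = [r[0] for r in conn.execute("SELECT name FROM steps ORDER BY rowid")]
print(rows)   # ['part 1', 'part 2, retried']
```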
LOG File
Transaction ID
• Many transactions can be executing
simultaneously.
• actions from Trans1 are often interspersed with
actions from Trans2
• Therefore, each transaction is assigned a
unique ID by the DBMS.
LOG File
• All changes to the database are recorded both
– in the underlying DB tables
AND
– in a TRANSACTION LOG FILE
Buffered Writes
• Information written to the DB tables can be
buffered to enhance performance.
• Information is not necessarily written to
permanent storage (i.e. the disk drive) when the
LOG File
• 3 types of records in the LOG file
– begin record
– commit record
– detail record
• 4th type (we'll discuss later)
– rollback record
Log BEGIN & COMMIT Records
• BEGIN record contains
– transaction id
• COMMIT record contains
– transaction id
Log DETAIL record
• There can be many DETAIL records for each
transaction
• Each DETAIL Record contains
– transaction id
– action (insert, update, delete)
– row id (used to uniquely identify the row in the
table)
– old & new values
(AKA before image & after image)
LOG File Implemented as a Table
• The LOG FILE is often implemented as a
special "hidden" database table, not available to
users.
• In this case each row in the table needs a
sequence number to indicate the order in which
records were written to the LOG
Database Recovery
Transactions to the rescue
– Step 1: Person at ATM requests to do a transfer and presses the "OK"
button on ATM.
– Step 2: DBMS writes a record to the LOG file indicating the changes to
be made to the savings_account table
– Step 3: DBMS writes new amount to the savings_account record.
– Step 4: ***** POWER OUTAGE *****
(computer goes down)
– Step 5: When computer is rebooted and DBMS server software is
restarted ...
• The recovery subsystem in the DBMS software attempts to "recover" the
database (this generally happens automatically - Recovery Transparency)
• The recovery subsystem looks through the LOG file and backs out any
changes to the database made by any Transaction for which there is no
COMMIT record
• To do so, the recovery subsystem must make sure that the value in the
savings_account record is equal to the "before image" of the record.
– Step 6: Database is restored as though no transfer ever happened.
– Step 7: Database comes online for regular processing.
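The backing-out step can be sketched in a few lines. The log format used here (a list of dicts with `type`, `tx`, `row`, `old` and `new` keys) and the dict that stands in for the table are assumptions for illustration; a real DBMS log is far more involved, and this sketch shows only the undo half of recovery.

```python
def recover(log, table):
    """Back out every change made by a transaction that has no COMMIT record."""
    committed = {rec["tx"] for rec in log if rec["type"] == "commit"}
    # Walk the log backwards so the oldest before image is restored last.
    for rec in reversed(log):
        if rec["type"] == "detail" and rec["tx"] not in committed:
            if rec["old"] is None:
                table.pop(rec["row"], None)     # undo an insert
            else:
                table[rec["row"]] = rec["old"]  # restore the before image
    return table


# State after the power outage: savings was debited, checking never credited.
log = [
    {"type": "begin", "tx": "T1"},
    {"type": "detail", "tx": "T1", "action": "update",
     "row": "savings:12345", "old": 1000, "new": 500},
]
table = {"savings:12345": 500, "checking:12345": 200}

print(recover(log, table))   # {'savings:12345': 1000, 'checking:12345': 200}
```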
Other scenarios for discussion ...
• Outage happened before record was written to
savings_account table file
Database BACKUPs
Backing up a DB
• DBAs should maintain backups of their entire
database in case something catastrophic
happens
Two types of backups
• 2 types of backups
– FULL backup
– INCREMENTAL backup
FULL backup
• COLD BACKUP
– A FULL backup on a database that is not active requires
backing up only the database files (tables, etc.)
• HOT BACKUP
– For 24X7 applications it is often impossible to perform a
COLD backup.
– A HOT backup requires backup of BOTH
• database files (i.e. tables, etc)
AND
• LOG files
INCREMENTAL BACKUPS
• In very large databases, it is often prohibitive to
back up the entire set of DB files (i.e. tables, etc.)
on a regular basis.
• Instead a single backup of the DB files can be
done at one time.
• After that backups of the LOG files can be
done.
ROLL FORWARD
• To restore a database that was backed up
INCREMENTALLY, the DBA uses a tool to
restore the DB.
• The log files are used to "ROLL FORWARD"
the changes that were made to the underlying
DB since the backup of the table files.
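Rolling forward is the mirror image of backing out: start from the backed-up copy and reapply the after images of committed transactions, in log order. As before, the dict-based log records and the `roll_forward` name are hypothetical and chosen only for this sketch.

```python
def roll_forward(backup, log):
    """Rebuild the current database from an old backup plus the log."""
    db = dict(backup)   # start from the backed-up table data
    committed = {rec["tx"] for rec in log if rec["type"] == "commit"}
    for rec in log:     # apply changes in the order they were logged
        if rec["type"] == "detail" and rec["tx"] in committed:
            if rec["new"] is None:
                db.pop(rec["row"], None)      # reapply a delete
            else:
                db[rec["row"]] = rec["new"]   # reapply the after image
    return db


backup = {"savings:12345": 1000, "checking:12345": 200}
log = [
    {"type": "begin", "tx": "T1"},
    {"type": "detail", "tx": "T1", "row": "savings:12345", "old": 1000, "new": 500},
    {"type": "detail", "tx": "T1", "row": "checking:12345", "old": 200, "new": 700},
    {"type": "commit", "tx": "T1"},
    {"type": "begin", "tx": "T2"},
    {"type": "detail", "tx": "T2", "row": "savings:12345", "old": 500, "new": 0},
]

# T2 never committed, so its change is not reapplied.
print(roll_forward(backup, log))   # {'savings:12345': 500, 'checking:12345': 700}
```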
CONSISTENCY
• Transactions always operate on a consistent view of
the data and when they end always leave the data in a
consistent state. Data may be said to be consistent as
long as it conforms to a set of invariants, such as no
two rows in the customer table have the same
customer id and all orders have an associated
customer row. While a transaction executes these
invariants may be violated, but no other transaction
will be allowed to see these inconsistencies, and all
such inconsistencies will have been eliminated by the
time the transaction ends.
ISOLATION
(and concurrency)
Concurrency and Isolation
• Concurrency:
– In a multi-user database, several programs are
working against the database at the same time.
• Transactions must guarantee that each
program "sees" a consistent view of the
underlying data without interference from the
other programs.
Types of problems if there is no
Isolation
• lost updates
• 2 transactions trying to update same value
• uncommitted dependency (AKA dirty read)
• 1 transaction reads data written by a 2nd transaction before the 2nd
transaction commits
• 2nd transaction does a ROLLBACK
• Inconsistent Retrievals
• incorrect summary (includes some changed records and some unchanged
records)
• phantom read
– TR1 selects some records
– TR2 writes some data that would have been retrieved by TR1's query
– TR1 runs the same query again, expecting same results, but gets different
results.
• nonrepeatable read
– TR1 reads a value
– TR2 changes the value
– TR1 reads same value again
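The first problem in the list, the lost update, is easy to reproduce by hand. The sketch below simply interleaves the steps of two imaginary transactions in one thread; the function name and the amounts are made up for illustration.

```python
def lost_update_demo():
    balance = {"value": 100}

    t1_read = balance["value"]        # TR1 reads 100
    t2_read = balance["value"]        # TR2 reads 100 before TR1 writes
    balance["value"] = t1_read + 50   # TR1 deposits 50 and writes 150
    balance["value"] = t2_read - 30   # TR2 withdraws 30 and writes 70: TR1's update is lost

    return balance["value"]


# Running the two transactions one after the other would give 100 + 50 - 30 = 120.
print(lost_update_demo())   # 70
```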
What can go wrong:
EXAMPLE
• Examples:
– see diagrams on pages 541 - 542 in Mannino
Concurrency Transparency
• Isolation is AUTOMATICALLY enforced by the
DBMS. The application programmer and the
DBA do not need to do anything other than start
and commit the transactions.
• This is known as "Concurrency transparency"
How does DBMS enforce
Isolation?
Simple method : Sequential
Execution
• Isolation via Sequential Execution
– DBMS can wait to perform a transaction until all
other transactions in the system have been
committed.
– This would cause VERY BAD performance for a
multi-user application.
– The goal of isolation is to make it look to the user
like the DBMS is doing sequential execution.
Term: Transaction Throughput
• Transaction Throughput
– The number of transactions a DBMS can perform
per unit time
– (more is better)
Motivation for studying
ISOLATION
Why Do DBAs need to understand
concurrency?
• Concurrency control adds overhead to DBMS
processing.
• Transactions can be structured to minimize or
maximize the amount of work the DBMS needs
to do.
Locking
Granularity of locks
• Coarsest to finest
– database lock (entire database)
– table locks
– row locks
– field lock
• Other types of locks
– page locks (i.e. physical records on disk or pages in
memory)
– index locks
Locking and Efficiency
• Coarse locks reduce locking overhead (fewer locks
for the DBMS to manage) but can cause individual
transactions to wait a long time
• Fine locks let more transactions run at once, so users
wait less, but the extra lock management can decrease
overall performance
Lock promotion
• DBMS "concurrency control manager" may
automatically "promote" a lock to a coarser
grained lock if it determines that would greatly
improve efficiency.
DEADLOCK
• Deadlock
– example: trying to reserve a seat on each leg of a
two leg journey
– (speak this out)
Deadlock recovery
• Deadlock recovery
– DBMS chooses one of the deadlocked transactions
and automatically does a ROLLBACK on it.
– Other transaction(s) can then proceed.
• Deadlock detection vs. Timeouts
– Deadlock detection algorithms are expensive to
implement.
– DBMS often uses timeouts to determine which
transactions are deadlocked.
– Timeout values should be chosen appropriately for
the application.
– In general, transactions should BE SHORT LIVED.
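Deadlock detection is commonly described in terms of a "waits-for" graph: a deadlock exists when the graph contains a cycle. The sketch below is one simple way to search for such a cycle; the function name `find_deadlock` and the dict-of-sets input are assumptions, not how any particular DBMS does it.

```python
def find_deadlock(waits_for):
    """waits_for maps each transaction to the set of transactions it is
    waiting on. Returns a list of transactions forming a cycle, or None."""
    visiting = []   # transactions on the current search path
    done = set()    # transactions already known not to lead to a cycle

    def visit(tx):
        if tx in done:
            return None
        if tx in visiting:
            return visiting[visiting.index(tx):]   # found a cycle
        visiting.append(tx)
        for other in waits_for.get(tx, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(tx)
        return None

    for tx in waits_for:
        cycle = visit(tx)
        if cycle:
            return cycle
    return None


# Two-leg journey: T1 holds a seat on leg A and waits for leg B,
# while T2 holds a seat on leg B and waits for leg A.
print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # ['T1', 'T2']
print(find_deadlock({"T1": {"T2"}, "T2": set()}))    # None
```

Once a cycle is found, the DBMS would pick one transaction from the returned list as the victim and roll it back.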
Types of Locks
• Types of Locks
– Shared lock (AKA read lock)
– Exclusive lock (AKA write lock)
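The usual rule for these two lock types is that many transactions may share a read lock on the same item, while a write lock must be held alone. A minimal sketch of that compatibility check follows; the function name `can_grant` and the "S" / "X" mode codes are assumptions for illustration.

```python
def can_grant(requested, held_by_others):
    """requested: "S" (shared) or "X" (exclusive).
    held_by_others: lock modes other transactions hold on the same item."""
    if requested == "S":
        # Readers can coexist with other readers, but not with a writer.
        return all(mode == "S" for mode in held_by_others)
    # A writer needs the item entirely to itself.
    return len(held_by_others) == 0


print(can_grant("S", ["S", "S"]))   # True
print(can_grant("S", ["X"]))        # False
print(can_grant("X", ["S"]))        # False
print(can_grant("X", []))           # True
```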
2 phase locking
• 2 phase locking
– ALL transactions in the database must follow the
following rule: A transaction must not acquire any
new locks after releasing any lock
– This will avoid the "lost updates" problem
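The two-phase rule is simple enough to check mechanically. The sketch below takes the ordered lock and unlock actions of one transaction and reports whether it ever acquires a lock after releasing one; the function name and the tuple format are hypothetical.

```python
def follows_two_phase_locking(actions):
    """actions: ordered list of ("lock", item) / ("unlock", item) pairs
    for ONE transaction."""
    released_any = False
    for kind, _item in actions:
        if kind == "unlock":
            released_any = True      # the shrinking phase has begun
        elif kind == "lock" and released_any:
            return False             # a new lock after a release breaks the rule
    return True


ok = [("lock", "savings"), ("lock", "checking"),
      ("unlock", "savings"), ("unlock", "checking")]
bad = [("lock", "savings"), ("unlock", "savings"),
       ("lock", "checking"), ("unlock", "checking")]

print(follows_two_phase_locking(ok))    # True
print(follows_two_phase_locking(bad))   # False
```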
Another modification
• Hold all exclusive (i.e. write) locks to end of
transaction
– This will avoid the "uncommitted dependency"
problem.
One more modification
• Hold all shared (i.e. read) locks until end of
transaction
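– This eliminates the following problems:
• incorrect summary
• nonrepeatable read
• phantom read (only when the locks also cover the range of rows
the query searched, e.g. a table lock; locks on existing rows
alone do not block newly inserted rows)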
Optimistic concurrency control
• Check to see if there is a conflict and do a
ROLLBACK if there is
• few conflicts = better performance than locking
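One common way to "check to see if there is a conflict" is to keep a version number on each row and compare it at write time. This is a sketch of that idea only; the `VersionedRow` class and `optimistic_update` function are made up for illustration and are not the method of any specific DBMS.

```python
class VersionedRow:
    def __init__(self, value):
        self.value = value
        self.version = 0   # incremented on every successful update


def optimistic_update(row, version_when_read, new_value):
    """Apply the update only if nobody changed the row since we read it."""
    if row.version != version_when_read:
        return False       # conflict: the caller should ROLLBACK and retry
    row.value = new_value
    row.version += 1
    return True


row = VersionedRow(100)
v1 = row.version   # TR1 reads the row
v2 = row.version   # TR2 reads the same row

print(optimistic_update(row, v1, 150))   # True: TR1 writes first
print(optimistic_update(row, v2, 70))    # False: TR2 detects the conflict
print(row.value)                         # 150
```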
Isolation Levels
• See chart on p. 558 in Mannino
– READ UNCOMMITTED
– READ COMMITTED
– REPEATABLE READ
– SERIALIZABLE
• Example
– SET TRANSACTION ISOLATION LEVEL READ COMMITTED
Performance Issues
Store LOG File on different Hard
drive
END OF PRESENTATION
CHECKPOINTS
Checkpoints
• Changes to the underlying tables are not
always written out to permanent storage when
they happen.
• Changes can reside in memory (volatile
storage) until a "checkpoint" happens.
• The DBMS will occasionally ensure that all
changes to the underlying tables (not the log
file) are written out. This is called a checkpoint.
Immediate Update
• Immediate update
– DBMS writes table data AFTER writing to the log file