Transactions
Fehily book - chap. 14
Mannino book - chap. 15 (up to 15.2)
Prof. Yitz Rosenthal

ACID
• There are 4 terms used in conjunction with transactions:
– Atomic
– Consistent
– Isolated
– Durable
• Acronym: ACID

ATOMICITY and DURABILITY

Atomicity
• Operations in a transaction are processed as a single unit.
• Either:
– ALL of the operations happen, OR
– NONE of the operations happen.

Durability
• Once a transaction is committed, you are GUARANTEED that its changes will be stored in the underlying database files correctly, EVEN IF THERE IS SOME UNFORESEEN CATASTROPHIC EVENT (e.g. a power outage).

EXAMPLES

Examples
• Travel agent
– Booking a departing and a return flight as one purchase.
– You don't want to book the departing flight if there is a problem booking the associated return flight at the same time.
– EVEN MORE SO: you don't want to book the return flight if you can't book the departing flight at the same time.
• Banking ATM
– Transferring money from a savings account to a checking account. This involves debiting the savings account and crediting the checking account.
– You don't want to debit the savings account unless you can also credit the checking account.
– You also don't want to credit the checking account if there was a problem debiting the savings account.

What can go wrong WITHOUT TRANSACTIONS
• Imagine...
• Banking example:
– Step 1: Person at ATM requests a transfer and presses the "OK" button on the ATM.
– Step 2: DBMS performs the debit of the savings account and writes the new amount to the database files.
– Step 3: ***** POWER OUTAGE ***** (computer goes down)
– Step 4: When the computer is rebooted, the savings account has been debited but the checking account was never credited.
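A minimal sketch of the ATM transfer as one atomic unit, using Python's built-in `sqlite3` module. The schema and starting balances are hypothetical (the table names mirror the SQL example later in these slides). A raised exception stands in for a failure between the debit and the credit; a real power outage is handled by the DBMS's log-based recovery, not by application code.

```python
import sqlite3

def make_bank():
    # Hypothetical in-memory schema: one savings and one checking account.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE savings_accounts (account_number INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE checking_accounts (account_number INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO savings_accounts VALUES (12345, 1000)")
    conn.execute("INSERT INTO checking_accounts VALUES (12345, 200)")
    conn.commit()
    return conn

def transfer(conn, account, amount, fail_between=False):
    """Debit savings and credit checking as ONE unit: all or nothing."""
    try:
        with conn:  # commits if the block finishes, rolls back if an exception escapes
            conn.execute(
                "UPDATE savings_accounts SET balance = balance - ? WHERE account_number = ?",
                (amount, account))
            if fail_between:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated failure after debit")
            conn.execute(
                "UPDATE checking_accounts SET balance = balance + ? WHERE account_number = ?",
                (amount, account))
    except RuntimeError:
        pass  # the debit has already been rolled back by the context manager

def balances(conn, account):
    s = conn.execute("SELECT balance FROM savings_accounts WHERE account_number = ?", (account,)).fetchone()[0]
    c = conn.execute("SELECT balance FROM checking_accounts WHERE account_number = ?", (account,)).fetchone()[0]
    return s, c
```

After a failed transfer both balances are unchanged (NONE of the operations happened); after a successful one both have changed (ALL of them happened).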
Other types of failures
• *** Power outage *** is only one type of failure that can happen to a transaction.
• Other types of failures:
– Program-detected failure: after debiting savings, the program queries checking and notices that the balance in the checking account is somehow negative. Since something is fishy, the program voluntarily stops the transfer and issues a ROLLBACK command (see next few slides) to undo the modifications made to the DB so far.
– Abnormal program termination caused by a programming bug (e.g. a division by zero might cause an unexpected crash of the program between debiting and crediting). The program never COMMITs, the transaction times out, and the DBMS automatically rolls back the transaction.
– System failure: e.g. a power glitch causes a reboot of the server.
– Device failure: e.g. the hard drive that contains the database files crashes. If the transaction log is kept on a different hard drive and an earlier copy of the database is backed up somewhere else, the current version of the database can be recreated by restoring the backup and replaying the log file.

Buffered Writes
• Delayed (or buffered) writes
– Writes to database tables are not written to disk immediately.
– When an application writes to a DB table (e.g. insert, update, delete), the DBMS stores the information in memory buffers.
– The information may actually be written to disk only much, much later.

Durability
• Data is DURABLE
– Transactions GUARANTEE that if a system failure (e.g. power outage) occurs after a transaction is committed, the database can be restored to reflect the changes made by the transaction, even if the underlying table data was never written to the database file (we'll see how soon).

Disks are SLOW, Memory is FAST
• Why are writes buffered?
– Memory is MUCH, MUCH faster than a disk drive.
• Why is the disk so much slower?
– The reason has to do with how disk drives work (see next slides).
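A toy model of delayed (buffered) writes, to show why buffering alone cannot give durability. The class and method names are hypothetical and this is not how any particular DBMS is built: `disk` is what survives a crash, `buffer` is volatile memory.

```python
class BufferedTable:
    """Toy model of buffered writes (illustration only)."""

    def __init__(self):
        self.disk = {}    # permanent storage: survives a crash
        self.buffer = {}  # memory buffers: lost on a crash

    def write(self, key, value):
        # The write lands in memory only; the disk copy is untouched for now.
        self.buffer[key] = value

    def read(self, key):
        # Readers see the newest value, whether it is still buffered or already on disk.
        return self.buffer.get(key, self.disk.get(key))

    def flush(self):
        # "Much, much later" the DBMS writes the buffered changes to disk.
        self.disk.update(self.buffer)
        self.buffer.clear()

    def crash(self):
        # A power outage wipes memory; only what reached the disk remains.
        self.buffer.clear()
```

If a transaction commits while its change is still only in `buffer`, a crash loses it. That gap is what the transaction log (covered later) closes.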
Tracks, Sectors & Clusters
• Disk drives are segmented into
– tracks (concentric circles) and
– sectors (pie slices)
• Each track and each sector has an identifying number.
• A cluster is a particular area of the disk corresponding to a specific track and a specific sector.

Cluster Size
• Every cluster on the disk stores the same amount of data.
• The amount of data stored in a cluster is known as the cluster size.
• Cluster sizes are usually powers of two:
– Example cluster sizes for different disk drives:
• 512 bytes
• 1024 bytes
• 4096 bytes
• etc.

Reads and Writes
• Every read and write to a disk drive reads or writes an entire cluster at a time.
• There is NO WAY for a disk drive to read or write only part of a cluster.
• Therefore, PHYSICALLY, IT TAKES JUST AS MUCH TIME TO READ OR WRITE 512 BYTES ON A DISK DRIVE (if the cluster size is 512 bytes) AS IT DOES TO READ OR WRITE JUST ONE BYTE.

Logical Records and Physical Records
• A physical record corresponds to the data in a single cluster.
• A logical record corresponds to the data from a single record (row) in a particular table.
• Logical records for a specific table are all the same size. (VARCHAR and VARBINARY data are not stored in the logical record.)

Storage of logical records in physical records
• If the logical record size is smaller than the physical record size (i.e. cluster size), then multiple logical records are stored in a single physical record.
• If the logical record size is LARGER than the physical record size, then a single logical record must be split between 2 or more physical records.

Not enough memory to hold everything
• The amount of memory available to the DBMS is generally NOT as large as the amount of disk space available.
– Memory is much more expensive than disk space.
– Hardware limitations limit the amount of memory that can be placed in one machine.

Memory Buffers
• The DBMS creates memory buffers that are the same size as the disk clusters.
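The cluster arithmetic above can be checked with a few lines of Python. The function names are hypothetical, and `records_per_cluster` assumes, as a simplification, that a small logical record is never split across two clusters (leftover space at the end of a cluster is simply unused).

```python
import math

def clusters_touched(offset, length, cluster_size):
    """Whole clusters the drive must read or write to cover bytes [offset, offset + length)."""
    first = offset // cluster_size
    last = (offset + length - 1) // cluster_size
    return last - first + 1

def records_per_cluster(record_size, cluster_size):
    """Logical records that fit in one physical record (assumes no record straddles two clusters)."""
    return cluster_size // record_size

def clusters_per_record(record_size, cluster_size):
    """Physical records needed for one logical record that is larger than a cluster."""
    return math.ceil(record_size / cluster_size)
```

With 512-byte clusters, writing 1 byte and writing 512 bytes both cost one cluster, while 20 bytes that straddle a cluster boundary (bytes 500-519) cost two. A 100-byte logical record packs 40 to a 4096-byte cluster, and a 1500-byte logical record needs 3 clusters of 512 bytes.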
• When the DBMS reads information from a cluster, it copies that information to an in-memory buffer that is the same size as the cluster. This buffer is known as a memory "page".

Paging
• What is "paging"?
– There generally are NOT ENOUGH memory pages to store the whole database.
– When the DBMS needs to access data that is not currently in memory, the DBMS
• picks an in-memory page that is not being used and, if its contents were modified, writes it to disk;
• then reads the desired data from the disk into the now available memory buffer.

Checkpoints
• What is a "checkpoint"?
– Once in a while, the DBMS ensures that the latest copy of every page is on the disk.
– This is known as a "checkpoint".
– Checkpoints are necessary for the log mechanism to work correctly.

SQL Commands
• BEGIN TRANSACTION
– issued before any SQL statements that are part of the transaction
• COMMIT
– issued after all SQL statements that are part of the transaction
– Once the COMMIT statement has executed, you are guaranteed that the data is permanently in the database even if unforeseen errors happen.

Example of transaction
    BEGIN TRANSACTION;
    UPDATE SAVINGS_ACCOUNTS
       SET BALANCE = BALANCE - 500
     WHERE ACCOUNT_NUMBER = 12345;
    UPDATE CHECKING_ACCOUNTS
       SET BALANCE = BALANCE + 500
     WHERE ACCOUNT_NUMBER = 12345;
    COMMIT TRANSACTION;

Other SQL commands: ROLLBACK
• Other SQL commands
– ROLLBACK
• A ROLLBACK command causes whatever was done in the transaction so far to be "undone".
• Similar to the "undo" command in your word processor.
• This is used both in "stored procedures" and in application programs that interact with the database. When the program encounters a condition, after it has started processing the transaction, that requires undoing the transaction, the "stored procedure" or the application can issue the ROLLBACK command.
• Example: if there is a transmission error in a distributed database, the application program can ROLLBACK a transaction once it has started.
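A toy buffer pool to illustrate paging and checkpoints. It assumes a least-recently-used (LRU) replacement policy, which is one common choice and not necessarily the one your DBMS uses; the class and attribute names are hypothetical. Only modified ("dirty") pages are written back when evicted, and a checkpoint forces every dirty page out to disk.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: a fixed number of page frames with LRU replacement (assumed policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk             # page_id -> contents (permanent storage)
        self.capacity = capacity     # number of in-memory page frames
        self.frames = OrderedDict()  # page_id -> contents, least recently used first
        self.dirty = set()           # pages changed in memory but not yet on disk
        self.disk_writes = 0

    def _evict_one(self):
        victim, contents = self.frames.popitem(last=False)  # least recently used page
        if victim in self.dirty:     # an unmodified page can simply be dropped
            self.disk[victim] = contents
            self.dirty.discard(victim)
            self.disk_writes += 1

    def get(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict_one()             # make room: this is "paging"
            self.frames[page_id] = self.disk[page_id]  # read the page from disk
        return self.frames[page_id]

    def put(self, page_id, contents):
        self.get(page_id)            # bring the page into memory first
        self.frames[page_id] = contents
        self.dirty.add(page_id)      # change is in memory only (buffered write)

    def checkpoint(self):
        # Ensure the latest copy of every modified page is on the disk.
        for page_id in list(self.dirty):
            self.disk[page_id] = self.frames[page_id]
            self.disk_writes += 1
        self.dirty.clear()
```

With two frames, modifying page 1 and then reading pages 2 and 3 forces page 1 out to disk; a later change to page 2 stays in memory until the checkpoint writes it.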
SAVE POINTS
• Additional SQL commands
– SAVE TRAN mysavepoint1
• allows you to break up a long transaction into several parts.
• You can create several savepoints.
• Each savepoint is given a unique name (e.g. mysavepoint1, mysavepoint2, etc.).
• At any point, the program can issue a ROLLBACK TRAN mysavepoint1 command to roll back the transaction to the specified savepoint and then continue on from there.

LOG File

Transaction ID
• Many transactions can be executing simultaneously.
• Actions from Trans1 are often interspersed with actions from Trans2.
• Therefore, each transaction is assigned a unique ID by the DBMS.

LOG File
• All changes to the database are recorded both
– in the underlying DB tables AND
– in a TRANSACTION LOG FILE

Buffered Writes
• Information written to the DB tables can be buffered to enhance performance.
• Information is not necessarily written to permanent storage (i.e. the disk drive) when the change is made.

LOG File
• 3 types of records in the LOG file
– begin record
– commit record
– detail record
• 4th type (we'll discuss later)
– rollback record

Log BEGIN & COMMIT Records
• BEGIN record contains
– transaction id
• COMMIT record contains
– transaction id

Log DETAIL record
• There can be many DETAIL records for each transaction.
• Each DETAIL record contains
– transaction id
– action (insert, update, delete)
– row id (used to uniquely identify the row in the table)
– old & new values (AKA before image & after image)

LOG File Implemented as a Table
• The LOG FILE is often implemented as a special "hidden" database table, not available to users.
• In this case, each row in the table needs a sequence number to indicate the order in which records were written to the LOG.

Database Recovery

Transactions to the rescue
– Step 1: Person at ATM requests a transfer and presses the "OK" button on the ATM.
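A minimal savepoint sketch using Python's `sqlite3`. SQLite spells the commands `SAVEPOINT name` and `ROLLBACK TO SAVEPOINT name` rather than the T-SQL `SAVE TRAN` / `ROLLBACK TRAN` shown above; the table and item names are hypothetical. Opening the connection with `isolation_level=None` stops the module from issuing its own implicit BEGIN, so the transaction boundaries are exactly the statements written here.

```python
import sqlite3

def savepoint_demo():
    # isolation_level=None: no implicit BEGIN/COMMIT; we manage the transaction ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute("BEGIN")
    conn.execute("INSERT INTO orders (item) VALUES ('part 1')")
    conn.execute("SAVEPOINT mysavepoint1")              # SQLite's equivalent of SAVE TRAN
    conn.execute("INSERT INTO orders (item) VALUES ('part 2')")
    conn.execute("ROLLBACK TO SAVEPOINT mysavepoint1")  # undo only the work after the savepoint
    conn.execute("INSERT INTO orders (item) VALUES ('part 2, retried')")
    conn.execute("COMMIT")                              # 'part 1' and the retry are kept
    return [row[0] for row in conn.execute("SELECT item FROM orders ORDER BY id")]
```

The rollback to the savepoint discards `'part 2'` but keeps `'part 1'`, and the transaction carries on from there and commits normally.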
– Step 2: DBMS writes a record to the LOG file indicating the changes to be made to the savings_account table.
– Step 3: DBMS writes the new amount to the savings_account record.
– Step 4: ***** POWER OUTAGE ***** (computer goes down)
– Step 5: When the computer is rebooted and the DBMS server software is restarted ...
• The recovery subsystem in the DBMS software attempts to "recover" the database (this generally happens automatically: recovery transparency).
• The recovery subsystem looks through the LOG file and backs out any changes to the database made by any transaction for which there is no COMMIT record.
• To do so, the recovery subsystem must make sure that the value in the savings_account record is equal to the "before image" of the record.
– Step 6: Database is restored as though no transfer ever happened.
– Step 7: Database comes online for regular processing.

Other scenarios for discussion ...
• Outage happened before the record was written to the savings_account table file

Database BACKUPs

Backing up a DB
• DBAs should maintain backups of their entire database in case something catastrophic happens.

Two types of backups
• 2 types of backups
– FULL backup
– INCREMENTAL backup

FULL backup
• COLD BACKUP
– A FULL backup of a database that is not active requires backing up only the database files (tables, etc.).
• HOT BACKUP
– For 24x7 applications it is often impossible to perform a COLD backup.
– A HOT backup requires backup of BOTH
• database files (i.e. tables, etc.) AND
• LOG files

INCREMENTAL BACKUPS
• In very large databases, it is often prohibitive to back up the entire set of DB files (i.e. tables, etc.) on a regular basis.
• Instead, a single backup of the DB files can be done at one time.
• After that, backups of the LOG files can be done.

ROLL FORWARD
• To restore a database that was backed up INCREMENTALLY, the DBA uses a tool to restore the DB.
• The log files are used to "ROLL FORWARD" the changes that were made to the underlying DB since the backup of the table files.
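A simplified sketch of both uses of the log described above: backing out uncommitted changes after a crash, and rolling forward committed changes onto a restored backup. The `LogRecord` fields follow the BEGIN/COMMIT/DETAIL slides, but the class, function names and row ids are hypothetical, and only UPDATE detail records are handled (inserts, deletes and rollback records are left out).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    seq: int                      # sequence number: order in which records were written
    tid: int                      # transaction id
    kind: str                     # "BEGIN", "DETAIL" or "COMMIT" (rollback records omitted)
    row_id: Optional[str] = None  # which row was changed (DETAIL only)
    before: Optional[int] = None  # before image (old value)
    after: Optional[int] = None   # after image (new value)

def committed_tids(log):
    return {r.tid for r in log if r.kind == "COMMIT"}

def undo_uncommitted(table, log):
    """Crash recovery: back out every change made by a transaction with no COMMIT record."""
    done = committed_tids(log)
    # Newest first, so the oldest before image of each row is the one left in place.
    for rec in sorted(log, key=lambda r: r.seq, reverse=True):
        if rec.kind == "DETAIL" and rec.tid not in done:
            table[rec.row_id] = rec.before
    return table

def roll_forward(backup, log):
    """Restore the backup, then replay after images of committed transactions in log order."""
    done = committed_tids(log)
    table = dict(backup)
    for rec in sorted(log, key=lambda r: r.seq):
        if rec.kind == "DETAIL" and rec.tid in done:
            table[rec.row_id] = rec.after
    return table
```

Setting each changed row to its before image gives the right answer whether or not the new value had already reached the table file, which covers the "outage happened before the record was written" scenario as well.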
CONSISTENCY

Consistency
• Transactions always operate on a consistent view of the data, and when they end they always leave the data in a consistent state. Data may be said to be consistent as long as it conforms to a set of invariants, such as "no two rows in the customer table have the same customer id" and "all orders have an associated customer row". While a transaction executes, these invariants may be violated, but no other transaction will be allowed to see these inconsistencies, and all such inconsistencies will have been eliminated by the time the transaction ends.

ISOLATION (and concurrency)

Concurrency and Isolation
• Concurrency:
– In a multi-user database, several programs are working against the database at the same time.
• Transactions must guarantee that each program "sees" a consistent view of the underlying data without interference from the other programs.

Types of problems if there is no Isolation
• Lost updates
– 2 transactions update the same value, and one overwrites the other's change.
• Uncommitted dependency (AKA dirty read)
– 1 transaction reads data written by a 2nd transaction before the 2nd transaction commits.
– The 2nd transaction then does a ROLLBACK.
• Inconsistent retrievals
– Incorrect summary (includes some changed records and some unchanged records)
– Phantom read
• TR1 selects some records.
• TR2 writes some data that would have been retrieved by TR1's query.
• TR1 runs the same query again, expecting the same results, but gets different results.
– Nonrepeatable read
• TR1 reads a value.
• TR2 changes the value.
• TR1 reads the same value again and gets a different result.

What can go wrong: EXAMPLE
• Examples:
– see diagrams on pages 541 - 542 in Mannino

Concurrency Transparency
• Isolation is AUTOMATICALLY enforced by the DBMS. The application programmer and the DBA do not need to do anything other than start and commit the transactions.
• This is known as "concurrency transparency".

How does DBMS enforce Isolation?
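The lost-update problem can be reproduced by hand-interleaving two read-modify-write transactions. This is a plain Python simulation (the function names and the dictionary standing in for the database are hypothetical), comparing the schedule R1, R2, W1, W2 with serial execution.

```python
def interleaved_withdrawals(balance, amount1, amount2):
    """No isolation: schedule is TR1 read, TR2 read, TR1 write, TR2 write."""
    db = {"balance": balance}
    t1_read = db["balance"]              # TR1 reads the balance
    t2_read = db["balance"]              # TR2 reads the same (old) balance
    db["balance"] = t1_read - amount1    # TR1 writes its result
    db["balance"] = t2_read - amount2    # TR2 overwrites it: TR1's update is lost
    return db["balance"]

def serial_withdrawals(balance, amount1, amount2):
    """Sequential execution: TR1 runs to completion, then TR2."""
    db = {"balance": balance}
    db["balance"] = db["balance"] - amount1
    db["balance"] = db["balance"] - amount2
    return db["balance"]
```

Starting from 1000 and withdrawing 100 and 200, the serial schedule ends at 700, but the interleaved schedule ends at 800: the 100 withdrawal silently disappears.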
Simple method: Sequential Execution
• Isolation via sequential execution
– The DBMS can wait to perform a transaction until all other transactions in the system have been committed.
– This would cause VERY BAD performance for a multi-user application.
– The goal of isolation is to make it look to the user like the DBMS is doing sequential execution.

Term: Transaction Throughput
• Transaction throughput
– The number of transactions a DBMS can perform per unit of time
– (more is better)

Motivation for studying ISOLATION

Why do DBAs need to understand concurrency?
• Concurrency control adds overhead to DBMS processing.
• Transactions can be structured to minimize or maximize the amount of work the DBMS needs to do.

Locking

Granularity of locks
• Coarsest to finest
– database lock (entire database)
– table locks
– row locks
– field locks
• Other types of locks
– page locks (i.e. physical records on disk or pages in memory)
– index locks

Locking and Efficiency
• Coarse locks mean fewer locks to manage, which improves overall performance, but they can cause individual transactions to wait a long time.
• Fine locks let more transactions run at the same time, which improves response time as perceived by users, but managing many more locks can decrease overall performance.

Lock promotion
• The DBMS "concurrency control manager" may automatically "promote" a lock to a coarser-grained lock if it determines that doing so would greatly improve efficiency.

DEADLOCK
• Deadlock
– Two (or more) transactions each hold a lock that another one needs, so each waits for the other and none can proceed.
– example: trying to reserve a seat on each leg of a two-leg journey
– (speak this out)

Deadlock recovery
• Deadlock recovery
– The DBMS chooses one of the deadlocked transactions and automatically does a ROLLBACK on it.
– The other transaction(s) can then proceed.
• Deadlock detection vs. timeouts
– Deadlock detection algorithms are expensive to implement.
– The DBMS often uses timeouts to determine which transactions are deadlocked.
– Timeout values should be chosen appropriately for the application.
– In general, transactions should BE SHORT-LIVED.
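One classic way to detect deadlock is to look for a cycle in a "wait-for" graph, where an edge T1 → T2 means T1 is waiting for a lock held by T2. The sketch below is a plain depth-first search over such a graph; the graph representation and transaction names are hypothetical, and choosing which transaction in the cycle to roll back is left to the caller.

```python
def find_deadlock(wait_for):
    """wait_for maps a transaction id to the set of transaction ids it is waiting on.
    Returns the transactions in one cycle (a deadlock), or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current path / fully explored
    color = {t: WHITE for t in wait_for}
    path = []

    def visit(t):
        color[t] = GRAY
        path.append(t)
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return path[path.index(u):]   # back edge: u ... t ... u is a cycle
            if color.get(u, WHITE) == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```

In the two-leg journey, T1 holds the seat on leg 1 and waits for leg 2 while T2 holds leg 2 and waits for leg 1, so the graph `{"T1": {"T2"}, "T2": {"T1"}}` contains a cycle; rolling back either transaction breaks it.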
Types of Locks
• Types of locks
– Shared lock (AKA read lock)
– Exclusive lock (AKA write lock)

2 phase locking
• 2 phase locking
– ALL transactions in the database must follow this rule: a transaction must not acquire any new locks after releasing any lock.
– This avoids the "lost updates" problem.

Another modification
• Hold all exclusive (i.e. write) locks until the end of the transaction
– This avoids the "uncommitted dependency" problem.

One more modification
• Hold all shared (i.e. read) locks until the end of the transaction
– This eliminates the following problems:
• incorrect summary
• nonrepeatable read
• phantom read (only if the locks also cover the ranges a query searched, e.g. via index locks; locks on existing rows alone do not stop new rows from being inserted)

Optimistic concurrency control
• Instead of locking, let transactions proceed, check whether there was a conflict (typically just before commit), and do a ROLLBACK if there was.
• Few conflicts = better performance than locking

Isolation Levels
• See chart on p. 558 in Mannino
– READ UNCOMMITTED
– READ COMMITTED
– REPEATABLE READ
– SERIALIZABLE
• Example
– SET TRANSACTION ISOLATION LEVEL READ COMMITTED

Performance Issues
• Store the LOG file on a different hard drive

END OF PRESENTATION

CHECKPOINTS

Checkpoints
• Changes to the underlying tables are not always written out to permanent storage when they happen.
• Changes can reside in memory (volatile storage) until a "checkpoint" happens.
• The DBMS will occasionally ensure that all changes to the underlying tables (not the log file) are written out. This is called a checkpoint.

Immediate Update
• Immediate update
– DB writes table data AFTER the log file
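A toy lock table combining shared/exclusive compatibility with the 2-phase rule. The class and method names are hypothetical. `acquire` returns False when the caller would have to wait (a real DBMS would queue it or time it out), and it refuses any new lock once the transaction has released one. Calling only `release_all` at commit or rollback gives the stricter "hold all locks to the end of the transaction" variant.

```python
class TwoPhaseLockError(Exception):
    pass

class LockManager:
    """Toy lock table: shared (S) / exclusive (X) locks plus the 2-phase rule."""

    def __init__(self):
        self.holders = {}       # item -> {tid: "S" or "X"}
        self.shrinking = set()  # transactions that have already released a lock

    def acquire(self, tid, item, mode):
        if tid in self.shrinking:
            raise TwoPhaseLockError(f"{tid} already released a lock; no new locks allowed")
        held = self.holders.setdefault(item, {})
        others = [m for t, m in held.items() if t != tid]
        if mode == "S":
            ok = all(m == "S" for m in others)  # S is compatible only with other S locks
        else:
            ok = not others                     # X is compatible with nothing
        if ok and held.get(tid) != "X":         # never downgrade an existing X lock
            held[tid] = mode
        return ok  # False: the caller must wait (or eventually time out)

    def release(self, tid, item):
        self.holders.get(item, {}).pop(tid, None)
        self.shrinking.add(tid)  # the transaction enters its shrinking phase

    def release_all(self, tid):
        # Strict variant: keep every lock until commit/rollback, then drop them together.
        for held in self.holders.values():
            held.pop(tid, None)
        self.shrinking.add(tid)
```

Two readers can share item A, a writer must wait until the other reader is gone, and a transaction that has released a lock is refused any new one.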