8. Transactions

The terminology used in this section is that all users (online interactive users or batch programs) issue transactions to the DBMS.

A TRANSACTION is an atomic unit of database work specified by a user to the DBMS (atomic means it is either executed to completion or not executed at all).
Transactions are often called QUERIES when they request only read access (i.e., QUERIES are READ-ONLY TRANSACTIONS).

A transaction is issued using the following constructs (these are the only ones we need for the discussion of Concurrency Control and Recovery, the DBMS issues covered in this section):

BEGIN  to initiate a transaction (most actual systems supply the BEGIN if the user doesn't, e.g., whenever a new SQL statement is encountered it is assumed to initiate a new transaction)
END  to end a transaction (usually either COMMIT for a successful END or ABORT for an unsuccessful END) (most actual systems supply this element if the user doesn't, e.g., if SQL statement execution is successful, then the DBMS supplies COMMIT, else ABORT)
READ  whenever any data is needed from the DB (e.g., in an SQL SELECT)
WRITE  whenever any data needs to be written to the DB (e.g., in an SQL INSERT or UPDATE)

In this set of notes, all other aspects of language, coding, etc. will be considered uninterpreted.

When a transaction arrives at the DBMS, a Transaction Manager (TM) is assigned to it (a code segment that acts on its behalf). The TM interfaces with other components, e.g., the Scheduler (SCHED), which it asks for permission to access particular data items. SCHED is like a policeman, giving permission to access the requested item(s). Its activity is called concurrency control. Once permission is granted for the TM to access data items, the Data Manager (DM) does the actual reads and writes.

There are several models for describing this interaction. We will describe two of them, Model-1 and Model-2.

Section 8 #1

Transaction Processing, Model-1

1.
TM makes requests to the Scheduler (SCHED) to read/write data item(s) or to commit/abort the transaction.
2. SCHED decides whether the request can be scheduled. If yes, it schedules the request (passes it to the DM on the TM's behalf). If no, it rejects the request and informs the TM.
3. DM reads/writes the data item or commits or aborts the transaction if possible; otherwise it returns a reject to SCHED (which returns it to the TM).
4. DM returns the value read (or an acknowledgement (ACK) of the write or commit request) to SCHED.
5. SCHED returns the same to the TM.

There can be one TM multithreaded by all transactions, or an individual TM assigned to each individual transaction.

[Diagram: message flow in Model-1]
Transaction Manager(s) -> Scheduler:  1. read, write, commit, abort
Scheduler -> Transaction Manager(s):  2, 3. reject;  5. value read, write/commit ack
Scheduler -> Data Manager:  2. read, write, commit, abort
Data Manager -> Scheduler:  3. reject;  4. value read, write/commit ack
Data Manager <-> Data on Disk:  3. read, write

Section 8 #2

Transaction Processing, Model-2 (assumed through the rest of the notes)

1. TM requests permissions from SCHED.
2. SCHED accepts or rejects the TM's permission requests.
3. TM requests DM to do read/write/commit/abort.
4. DM reads/writes the data item or commits or aborts the transaction if possible; otherwise it returns a reject to the TM.
5. DM returns the value read (or an acknowledgement (ACK) of the write or commit request) to the TM.

There can be one TM multithreaded by all transactions, or an individual TM assigned to each individual transaction.

[Diagram: message flow in Model-2]
Transaction Manager(s) -> Scheduler:  1. read, write, commit, abort (permission requests)
Scheduler -> Transaction Manager(s):  2. decision: accept or reject
Transaction Manager(s) -> Data Manager:  3. read, write, commit, abort
Data Manager <-> Data on Disk:  4. read, write
Data Manager -> Transaction Manager(s):  5. value read or ack; reject

Section 8 #3

Concurrency Control (CC) (the activity of the scheduler, SCHED)

We need concurrency control (AKA mutual exclusion) whenever there are shared system resources that cannot be used concurrently.
An illegal concurrent use of a shared resource is a conflict. E.g., one user reads JONES while another user changes JONES to SMITH. The read could get SMIES.
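The JONES/SMITH conflict above can be reproduced with a small deterministic simulation. This is a minimal sketch, assuming purely for illustration that the writer updates the record one character at a time and that the reader sees the whole record partway through the update. The function name `interleaved_read` and its parameters are hypothetical and do not belong to any DBMS.

```python
def interleaved_read(old, new, chars_written_before_read):
    """Simulate a read that overlaps an in-progress write.

    The writer replaces `old` with `new` one character at a time.
    The reader takes a snapshot after `chars_written_before_read`
    characters have been overwritten. Returns (value_seen, final_value).
    Assumes len(old) == len(new) to keep the illustration simple.
    """
    record = list(old)

    # Writer overwrites the first few characters.
    for i in range(chars_written_before_read):
        record[i] = new[i]

    # Reader reads the record in the middle of the update.
    value_seen = "".join(record)

    # Writer finishes the remaining characters.
    for i in range(chars_written_before_read, len(new)):
        record[i] = new[i]

    return value_seen, "".join(record)


seen, final = interleaved_read("JONES", "SMITH", 3)
print(seen, final)  # prints: SMIES SMITH
```

The value SMIES never existed as a committed value of the record, which is exactly what concurrency control is meant to prevent.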
The shared DBMS resources will be called data items.

DATA ITEM GRANULARITY is the level at which we treat Concurrency Control:
field level (logical level, very fine granularity)
record level (logical level, fine granularity)
page level (physical level, medium granularity)
file level (logical level, coarse granularity)
area level (logical level, quite coarse granularity)
database level (logical level, very coarse granularity)

We assume that a data item is a record (i.e., we assume logical, record-level granularity).

This means there are many more shared resources for a DBMS to manage than anywhere else (e.g., printers for an O/S), and thus CC is harder in a DBMS than anywhere else!
A DBMS may have 1,000,000 records or more.
An O/S may have to manage ~50 printers.
Ethernet Medium Access Protocol (unswitched) manages ONE shared wire.
Although you may have studied mutual exclusion before (e.g., in an Operating Systems course), it is a more complicated problem in a DBMS.

Section 8 #4

Concurrency Control cont.

In any resource management situation (Operating System, Network Operating System or DBMS...) there are "shared resources" and there are "users".

SHARED RESOURCE MANAGEMENT deals with how the system can ensure correct access to shared resources among concurrently executing transactions.

All answers seem to come from traffic control! (traffic intersections, construction zones, drive-up windows)

WAITING POLICY: If a needed resource is unavailable, the requester waits until it becomes available (e.g., intersection red light, Hardee's drive-up lane). This is how print jobs are managed by an OS.
Advantages: NO RESTARTING (no unnecessary progress loss), e.g., at Hardee's, they don't say "Go home! Come back later!"
Disadvantages: DEADLOCKS may happen unless they are managed. E.g., at a construction zone, if the two flag women don't coordinate, both traffic lines may start into the construction zone from opposite directions, resulting in a DEADLOCK in the middle!
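The construction-zone deadlock above amounts to a cycle of requesters, each waiting on the next. The following is a minimal sketch of how such a cycle could be detected, assuming each waiting requester waits on exactly one other requester. The function `find_deadlock` and the "waits-for" dictionary are illustrative assumptions, not something these notes prescribe.

```python
def find_deadlock(waits_for):
    """Return the list of requesters that form a waiting cycle, or None.

    `waits_for` maps each waiting requester to the single requester
    it is waiting on (an assumption made to keep the sketch short).
    """
    for start in waits_for:
        path = []
        node = start
        # Follow the chain of waits until it ends or repeats.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            # The chain came back to a requester already visited: a cycle.
            return path[path.index(node):]
    return None


# Both traffic lines entered the zone and each now waits for the other.
print(find_deadlock({"northbound": "southbound", "southbound": "northbound"}))
# prints: ['northbound', 'southbound']

# Only one line is waiting, so there is no deadlock.
print(find_deadlock({"northbound": "southbound"}))
# prints: None
```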
Another disadvantage is INCONSISTENT RESPONSE TIMES. At the Hardee's window, you may wait an hour or a minute. (Not so important at Hardee's (well, maybe it is if you're very hungry ;-), but at an Emergency Room?)

RESTART POLICY: If a needed resource is unavailable, the request is terminated and restarted later. E.g., when someone goes before a parole board, they either get their request or they restart the process later.
In Ethernet (unswitched) CSMA/CD, if node A wants to send a message to node B:
1. Carrier Sense (the "CS" part): the wire is checked for traffic; if it is busy (in use by another sender), A waits (according to some "back-off algorithm"), then checks again, and so on until the wire is idle; then A SENDs the message.
2. Collision Detection (the "CD" part): A listens to the bus until it is certain that its message did not collide with another concurrently sent message (the required length of wait time is the traversal_time of the wire, since there are terminators (absorbers) at each end).
Advantages of restart policies: simple, no deadlock.
Disadvantages: lower throughput, lost progress, long delays?, possible livelock.

Section 8 #5

Concurrency Control cont.

A Transaction is an atomic computation or program taking the database from one consistent state to another (without necessarily preserving consistency at each step of the way).
The transaction is an atomic unit of database work, i.e., the DBMS executes a transaction to completion or not at all, GUARANTEED.

If only one transaction is allowed to execute at a time and the database starts in a consistent state, then it will always end up in a consistent state!
The problem with such a SERIAL EXECUTION policy is that it is too inefficient!

A DBMS must guarantee the so-called ACID PROPERTIES of transactions:

ATOMICITY: A transaction is an all-or-nothing proposition. Either a transaction is executed by the DBMS to completion or all of its effects are erased completely. (Transaction = atomic unit of database workload.)
CONSISTENCY: Correct transactions take the database from one consistent state to another consistent state. Consistency is defined in terms of consistency constraints or "integrity constraints", e.g., entity integrity, referential integrity, other integrities.

ISOLATION: Each user is given the illusion of being the sole user of the system (by the concurrency control subsystem).

DURABILITY: The effects of a transaction are never lost after it is "committed" by the DBMS (i.e., after a COMMIT request is acked by the DBMS).

Section 8 #6

Execution types

SERIAL EXECUTION ensures most of the ACID properties (consistency and isolation for sure; it also helps with atomicity and durability), i.e., queue all transactions as they come in (into a FIFO queue?) and let each transaction execute to completion before the next even starts.
Serial execution may produce unacceptable execution delays (i.e., long response times) and low system utilization.

SERIALIZABLE EXECUTION is much, much better!
A concurrent execution of multiple transactions is called serializable if the operations (reads and writes) within the transactions are sequenced in such a way that the result is equivalent to that of some serial execution (i.e., it is as if the transactions had been executed serially).
Serializability facilitates ATOMICITY, CONSISTENCY and ISOLATION of concurrent, correct transactions just as well as SERIAL execution does, but allows much higher system throughput.

RECOVERABILITY facilitates DURABILITY (more on this later).
An execution is RECOVERABLE if every transaction that commits does so only after every other transaction it read from has committed.

Section 8 #7

Isolation Levels

SQL defines execution types or levels of isolation weaker than SERIALIZABILITY (they do not guarantee the ACID properties entirely, but they are easier to achieve).
REPEATABLE READ ensures that no value read or written by a transaction, T, is changed by any other transaction until T is complete, and that T can read only changes made by committed transactions.

READ COMMITTED ensures that no value written by a transaction, T, is changed by any other transaction until T is complete, and that T can read only changes made by committed transactions.

READ UNCOMMITTED ensures nothing (T can read changes made to an item by an ongoing transaction, and the item can be further changed while T is in progress).

There will be further discussion of these later in these notes. For now, please note there are several suggested paper topics in the topics file concerning isolation levels.
But also note that I think these other isolation levels are bunk!

Section 8 #8

Concurrent Transactions are transactions whose executions overlap in time (the individual operations (read/write of a particular data item) may be interleaved in time).

Again, the only operations we concern ourselves with are BEGIN, READ, WRITE, COMMIT, ABORT.
READ and WRITE are the operations that apply to data items.
A data item can be a field, record, file, area or DB (logical granules) or a page (physical granule). We assume record-level granularity.

A read(X) operation reads the current value of the data item, X, into a program variable (which we will also call X for simplicity).
Even though we will not concern ourselves with these details in this section, read(X) includes the following steps:
1. Find the address of the page containing X.
2. Copy that page to a main memory buffer (unless it is already in memory).
3. Copy the value of the data item, X, from the buffer to the program variable, X.

The write(X) operation writes the value of the program variable, X, into the database item X. It includes the following steps:
1. Find the address of the page containing X.
2. Copy that page to a main memory buffer (unless it is already in memory).
3. Copy the program variable, X, to the buffer area for X.
4. Write the buffer back to disk (this step can be deferred; its timing is governed by the DM).

Section 8 #9
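The read(X) and write(X) steps listed above can be sketched as a toy Data Manager with an in-memory buffer pool. This is only an illustration under stated assumptions: the "disk" is a Python dictionary of pages, the page address lookup is a dictionary from item name to page id, and the names `DataManager`, `page_of`, and `flush` are hypothetical, not taken from any real DBMS.

```python
class DataManager:
    """Toy DM: pages live on a simulated disk and are cached in buffers."""

    def __init__(self, disk, page_of):
        self.disk = disk        # page id -> {item name: value} (simulated disk)
        self.page_of = page_of  # item name -> page id (simulated address lookup)
        self.buffers = {}       # page id -> in-memory copy of the page
        self.dirty = set()      # page ids changed in memory but not yet on disk

    def _fetch(self, item):
        page_id = self.page_of[item]        # step 1: find the page containing X
        if page_id not in self.buffers:     # step 2: copy page to a buffer if needed
            self.buffers[page_id] = dict(self.disk[page_id])
        return page_id

    def read(self, item):
        page_id = self._fetch(item)
        return self.buffers[page_id][item]  # step 3: buffer -> program variable

    def write(self, item, value):
        page_id = self._fetch(item)
        self.buffers[page_id][item] = value  # step 3: program variable -> buffer
        self.dirty.add(page_id)              # step 4 is deferred until flush()

    def flush(self):
        # Step 4: write changed buffers back to disk, whenever the DM chooses.
        for page_id in list(self.dirty):
            self.disk[page_id] = dict(self.buffers[page_id])
        self.dirty.clear()


disk = {0: {"X": 10, "Y": 20}}
dm = DataManager(disk, {"X": 0, "Y": 0})
x = dm.read("X")      # x is 10
dm.write("X", x + 5)  # the buffer now holds 15; the disk still holds 10
dm.flush()            # the disk now holds 15
```

Keeping the write-back separate from write(X) is what lets the DM defer step 4; the cost is that a committed value may exist only in memory for a while.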