12/4

Transactions, Concluded, and the Future of Data Management
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
December 4, 2003
Slide content courtesy of Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke

Final Administrivia
- Project demos today and tomorrow
- Final exam handed out at the end of today’s class
- Finals plus project reports due by 1PM, 12/18/2003
  - Project reports should be ballpark 10-15 pages
  - Remember, quality and clarity of presentation matter!
- Also, email me a brief message detailing:
  - Your contributions to the project
  - Your group members’ contributions and your assessment of “group dynamics”
- Turn in at my office, 576 Levine Hall, or to my assistant, Kathy Venit, in 308 Levine Hall

Last Time…
- We were discussing isolation levels
  - How to keep transactions from interfering with one another
  - Or at least, how to minimize this
- Recall that the strongest version of isolation was serializability

Theory of Serializability
- A schedule of a set of transactions is a linear ordering of their actions
  - e.g., for the simultaneous deposits example: R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
- A serial schedule is one in which all the steps of each transaction occur consecutively
- A serializable schedule is one that is equivalent to some serial schedule (i.e., given any initial state, the final state is the same as one produced by some serial schedule)
- The example above is neither serial nor serializable

Questions of Concern
- Given a schedule S, is it serializable?
- How can we “restrict” transactions in progress to guarantee that only serializable schedules are produced?
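To see concretely why R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal) is not serializable, the following Python sketch replays it alongside the two serial orders. The starting balance of 100 and the deposit amounts of 10 and 20 are hypothetical (the slides give no figures), and each transaction is assumed to read X.bal into a private variable and later write back that value plus its deposit.

```python
# Hypothetical figures: the slides do not give a balance or deposit amounts.
INITIAL_BALANCE = 100
DEPOSITS = {1: 10, 2: 20}  # transaction id -> amount that transaction deposits


def run(schedule, initial=INITIAL_BALANCE, deposits=DEPOSITS):
    """Execute a schedule of ("R" | "W", txn) steps against one shared X.bal.

    A read copies X.bal into the transaction's private local variable;
    a write stores that local copy plus the transaction's deposit.
    """
    balance = initial
    local = {}
    for op, txn in schedule:
        if op == "R":
            local[txn] = balance
        elif op == "W":
            balance = local[txn] + deposits[txn]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return balance


serial_12 = [("R", 1), ("W", 1), ("R", 2), ("W", 2)]
serial_21 = [("R", 2), ("W", 2), ("R", 1), ("W", 1)]
interleaved = [("R", 1), ("R", 2), ("W", 1), ("W", 2)]  # R1 R2 W1 W2

serial_results = {run(serial_12), run(serial_21)}
print(serial_results)    # {130}
print(run(interleaved))  # 120: T1's deposit is lost
```

Both serial orders end at 130, while the interleaving ends at 120 because W2 overwrites W1 (a lost update). One initial state on which the interleaving disagrees with every serial order is enough to show that it is not serializable.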
Conflicting Actions
- Consider a schedule S in which there are two consecutive actions Ii and Ij of transactions Ti and Tj, respectively
  - If Ii and Ij refer to different data items, then swapping Ii and Ij does not matter
  - If Ii and Ij refer to the same data item Q, then swapping Ii and Ij matters if and only if at least one of the actions is a write
    - In Ri(Q) Wj(Q), Ti reads the value of Q from before Tj’s write; in Wj(Q) Ri(Q), Ti reads the value Tj wrote

Testing for Serializability
- Given a schedule S, we can construct a digraph G = (V, E) called a precedence graph
  - V: all transactions in S
  - E: Ti → Tj whenever an action of Ti precedes and conflicts with an action of Tj in S
- Theorem: A schedule S is conflict serializable if and only if its precedence graph contains no cycles
  - Note that testing for a cycle in a digraph can be done in time O(|V|^2)

An Example
[Figure: a schedule of transactions T1, T2, and T3 with reads and writes on X, Y, and Z, and its precedence graph. Cyclic: not serializable.]

Another Example
[Figure: a schedule of transactions T1, T2, and T3 with reads and writes on X and Y, and its precedence graph. Acyclic: serializable.]

Producing the Equivalent Serial Schedule
- If the precedence graph for a schedule is acyclic, then an equivalent serial schedule can be found by a topological sort of the graph
- For the second example, the equivalent serial schedule is:
  - R1(Y) W1(Y) R2(X) W2(X) R2(Y) W2(Y) R3(X) W3(X)

Locking and Serializability
- We said that to guarantee a serializable schedule, a transaction must hold all of its locks until it terminates (a condition called strict locking)
- It turns out that holding locks long enough is crucial to guaranteeing serializability
  - Note that the first (bad) example could have been produced if transactions acquired and immediately released locks
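The precedence-graph test and the topological sort can be sketched together in Python. This sketch uses Kahn's algorithm, one standard way to topologically sort a digraph: if it cannot place every transaction, the graph has a cycle. Schedules are lists of (transaction, operation, item) triples, a representation chosen for illustration. The exact schedules from the "An Example" and "Another Example" slides are not reproduced here, so the cyclic case reuses the simultaneous-deposits schedule, and `example2` is one assumed interleaving of the second example's transactions (T1 on Y; T2 on X, then Y; T3 on X) that is consistent with the serial schedule given on the slide.

```python
from collections import defaultdict, deque


def precedence_graph(schedule):
    """Return (transactions in first-appearance order, edges Ti -> Tj).

    An edge Ti -> Tj is added whenever an action of Ti precedes and conflicts
    with an action of Tj: same item, different transactions, at least one write.
    """
    txns = []
    edges = defaultdict(set)
    for txn, _, _ in schedule:
        if txn not in txns:
            txns.append(txn)
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return txns, edges


def serial_order(schedule):
    """Topologically sort the precedence graph (Kahn's algorithm).

    Returns an equivalent serial order of transactions, or None if the graph
    has a cycle (i.e., the schedule is not conflict serializable).
    """
    txns, edges = precedence_graph(schedule)
    indegree = {t: 0 for t in txns}
    for targets in edges.values():
        for dst in targets:
            indegree[dst] += 1
    ready = deque(sorted(t for t in txns if indegree[t] == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for dst in sorted(edges[t]):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return order if len(order) == len(txns) else None


# Simultaneous deposits R1(X) R2(X) W1(X) W2(X): edges T1 -> T2 and T2 -> T1.
deposits = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]

# Hypothetical interleaving of the second example's transactions.
example2 = [
    (2, "R", "X"), (2, "W", "X"),
    (1, "R", "Y"), (1, "W", "Y"),
    (3, "R", "X"), (3, "W", "X"),
    (2, "R", "Y"), (2, "W", "Y"),
]

print(serial_order(deposits))  # None
print(serial_order(example2))  # [1, 2, 3]
```

Building the graph by comparing every pair of actions is quadratic in the schedule length; Kahn's algorithm then runs in time linear in |V| + |E|, which is within the O(|V|^2) bound on the slide. For `example2` the edges are T2 → T3 (on X) and T1 → T2 (on Y), so the sort yields T1, T2, T3, matching R1(Y) W1(Y) R2(X) W2(X) R2(Y) W2(Y) R3(X) W3(X).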
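The point about releasing locks early can be checked with a toy shared/exclusive lock table. In this Python sketch, which is hypothetical and not how any particular DBMS implements locking, each read takes a shared (S) lock and each write an exclusive (X) lock; `strict=False` releases each lock immediately after its action, while `strict=True` keeps every lock (release at commit is not modeled). The "first (bad) example" is taken to be the simultaneous-deposits schedule.

```python
class LockTable:
    """Shared (S) / exclusive (X) locks on named items; a toy sketch only."""

    def __init__(self):
        self.holders = {}  # item -> {txn: mode}

    def can_grant(self, txn, item, mode):
        # Only locks held by *other* transactions can conflict.
        others = {t: m for t, m in self.holders.get(item, {}).items() if t != txn}
        if not others:
            return True
        return mode == "S" and all(m == "S" for m in others.values())

    def acquire(self, txn, item, mode):
        if not self.can_grant(txn, item, mode):
            return False
        held = self.holders.setdefault(item, {})
        if held.get(txn) != "X":  # upgrade S to X if requested; never downgrade
            held[txn] = mode
        return True

    def release(self, txn, item):
        self.holders.get(item, {}).pop(txn, None)


def replay(schedule, strict):
    """Replay (txn, op, item) steps; return the first step whose lock is refused, else None.

    "R" needs an S lock and "W" an X lock. With strict=False each lock is
    released right after its action; with strict=True all locks are kept.
    """
    table = LockTable()
    for txn, op, item in schedule:
        mode = "S" if op == "R" else "X"
        if not table.acquire(txn, item, mode):
            return (txn, op, item)
        if not strict:
            table.release(txn, item)
    return None


bad = [(1, "R", "X.bal"), (2, "R", "X.bal"), (1, "W", "X.bal"), (2, "W", "X.bal")]
print(replay(bad, strict=False))  # None: every lock is granted
print(replay(bad, strict=True))   # (1, 'W', 'X.bal')
```

With immediate release every request is granted, so the lost-update schedule can occur. With locks held, T1's write cannot be granted while T2 holds a shared lock on X.bal. A real lock manager would make T1 wait rather than reject the request; T2's write would then wait on T1's shared lock as well, a deadlock that a DBMS typically resolves by aborting one of the transactions.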
Well-Formed, Two-Phased Transactions
- A transaction is well-formed if it acquires at least a shared lock on Q before reading Q, or an exclusive lock on Q before writing Q, and doesn’t release the lock until the action is performed
  - Locks are also released by the end of the transaction
- A transaction is two-phased if it never acquires a lock after unlocking one
  - i.e., there are two phases: a growing phase, in which the transaction acquires locks, and a shrinking phase, in which locks are released

Two-Phased Locking Theorem
- If all transactions are well-formed and two-phased, then any schedule in which conflicting locks are never granted is serializable
  - i.e., there is a very simple scheduler!
- However, if some transaction is not well-formed or not two-phased, then there is some schedule in which conflicting locks are never granted but which fails to be serializable
  - i.e., one bad apple spoils the bunch

Summary of Transactions
- Transactions are all-or-nothing units of work, guaranteed despite concurrency or failures in the system
- Theoretically, the “correct” execution of transactions is serializable (i.e., equivalent to some serial execution)
- Practically, this may adversely affect throughput → isolation levels
  - With isolation levels, users can specify the level of “incorrectness” they are willing to tolerate

What to Look for Down the Road
- … well, no one really knows the answer to this…
- … but here are some hints, ideas, and hot directions
  - Sensors and streaming data
  - Peer-to-peer meets databases
  - “The Semantic Web”
  - Collaborative data sharing

Sensors and Streaming Data
- No databases at all…
- … instead we have networks of simple sensors
  - Madden, starting at MIT
  - Gehrke, Cornell
  - Widom, Stanford
- Queries are in SQL
- Data is live and “streaming”
- We compute aggregates over “windows”

What’s Interesting Here
- We’re not talking about data on disk; we’re talking about queries over “current readings”
- Sensors are generally “stupid” and may be battery-operated
- A lot of challenges are networking-related: how to aggregate data before it gets sent, etc.
- The next step (e.g., work initiated here @ Penn): including sensors that capture images, a very different problem!
  - This has many more compelling applications: security, monitoring, correlating multiple sensors, rescue operations, military logistics and coordination, etc.

Peer-to-Peer Computing
- Fundamentally, our model of DBMSs tends to be centralized
  - Even for data integration: there’s a single mediator
  - This has many implications: central administration, central coordination, etc.
- What can be gained from borrowing a page from peer-to-peer systems like Napster, Kazaa, etc.?
  - A better architecture?
  - Solutions to many problems unsolved by distributed DBMSs?
    - Replication, object location, distributed optimization, resiliency to failure, …
  - New types of applications, e.g., in integration?
P2P Work
- As a new architecture for storage and querying
  - PIER (Berkeley), P-Grid (EPFL), Medusa (MIT)
- A better way of thinking about translating and exchanging data
  - Piazza (Washington), Orchestra (Penn), Hyperion (Toronto), work at Trento

The Semantic Web
- In some ways, a very “pie-in-the-sky” vision
  - But some real and concrete problems might be partly solvable
- The goal is really very similar to data integration, where somehow we have mappings between the schemas
- Currently, most people in the SW community are from the knowledge representation community and use RDF
- Focus: very rich ways of describing schemas (“ontologies”) that blend querying with class definitions
  - “Teachers are people who teach students”; “Tenure-track professors are teachers at universities who can get tenure”; etc.
- Implicit take on the problem: if we create better languages for describing ontologies, it’s easier to mediate between schemas

Holes in the Semantic Web
- What issues and concerns came up in the data integration assignment you had?
- Do you think a richer schema language would help for these?
- Do you think “better normalization” would help?
- Fundamentally, we need:
  - Languages not only for describing relationships, but also for describing transformations between formats (e.g., XML schemas)
  - Automatic or partly automated ways of discovering mappings and correspondences
- These are all database problems, and the solution likely must come from the DB community
  - This is part of what P2P systems like Piazza and Hyperion try to address

My Take on the Future
- We’ve evolved from a world where data management was about controlling the data
- Instead, data management is about translating and transforming data using declarative languages
  - It should ultimately become much like TCP or SOAP: a set of standard services for “getting stuff” from one point to another, or from one form to another
  - It’s the plumbing that connects different applications using different formats
- Orchestra project at Penn: focuses on how to build a system for supporting collaborative science
  - People publish and map data in different schemas
  - What happens if people start updating it?
  - How do you propagate, manage, trace, and reconcile changes?
