>> Christian Konig: Good morning, everyone. Thank you for coming. It's my great pleasure to
introduce Sudip Roy. He is joining us from Cornell University, where he's co-advised by
Johannes Gehrke and Christoph Koch -- or Koch, which is how you correctly pronounce it. And also,
he has done a Microsoft Research internship here, and he's also done a Google internship. He is
the co-winner of the SIGMOD 2011 Best Paper Award, together with some of his colleagues at
Cornell, and today he will talk to us about lazy transaction execution models.
>> Sudip Roy: Thanks for the introduction, Christian. So today I'm going to present my thesis
work on lazy transaction execution models. So let me start by reminding you of what a
transaction is. A transaction is a single execution of a user program over a shared database state.
Informally, it is a basic unit of change, which the database sees, and this execution of the
program is guaranteed to satisfy the ACID properties. And I'm sure that all of you are familiar
with what ACID is, so I'm not going to go into the details of that, but let me show you what such
a user program usually looks like. So consider Mickey's transaction to book a seat on Flight 123.
So the fact that this program has to be executed transactionally is indicated by the keyword
START TRANSACTION. It has these four statements. First, Mickey selects a seat on Flight
123. It checks if there is something which is available. If not, then the transaction rolls back. If
there is something available, then it does two updates to the database. It deletes that seat from
the available table and it inserts a tuple corresponding to the reservation into the bookings table.
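To make the classical execution concrete, here is a minimal sketch of such a booking transaction in Python on top of the built-in sqlite3 module. The schema, the table and column names, and the seat data are assumptions made purely for illustration, not the actual program from the talk:

```python
import sqlite3

def setup(conn):
    conn.executescript("""
        CREATE TABLE available (flight TEXT, seat TEXT, is_window INTEGER);
        CREATE TABLE bookings  (flight TEXT, seat TEXT, passenger TEXT);
        INSERT INTO available VALUES
            ('F123', '1A', 1), ('F123', '1B', 0), ('F123', '1C', 0);
    """)

def book_any_seat(conn, flight, passenger):
    # Classical execution: a concrete seat is chosen and bound before commit.
    conn.execute("BEGIN")
    row = conn.execute(
        "SELECT seat FROM available WHERE flight = ? LIMIT 1", (flight,)
    ).fetchone()
    if row is None:                          # nothing available: roll back
        conn.execute("ROLLBACK")
        return None
    seat = row[0]
    conn.execute("DELETE FROM available WHERE flight = ? AND seat = ?", (flight, seat))
    conn.execute("INSERT INTO bookings VALUES (?, ?, ?)", (flight, seat, passenger))
    conn.execute("COMMIT")
    return seat

conn = sqlite3.connect(":memory:")
conn.isolation_level = None                  # manage BEGIN/COMMIT explicitly
setup(conn)
print(book_any_seat(conn, "F123", "Mickey")) # e.g. 1A, bound at commit time
```

The point to keep in mind for what follows is that the concrete seat is picked and fixed at commit time.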
Now, this of course is a very simplified form of what you would see in the real world. So before
I talk about how and why lazy execution is good, let me show you how we can execute this
transaction in a classical model and why this leads to suboptimal results. So consider this
following scenario, in which you have a flight in which you have three seats available. Now, the
available seats are in green. The already-reserved seats are in red, so 1A, 1B and 1C are
available. Let's say Mickey issues the transaction, the program which I just showed you, to book
any seat, and he gets seat 1A. After that, let's assume that Donald issues a similar transaction,
and he gets seat 1B. Finally, Minnie issues a transaction. However, Minnie has an additional
constraint that she only wants a window seat. Now, the only window seat which was available,
1A, has already been allotted to Mickey, and therefore Minnie's transaction had to abort. Now, if
you had known that Minnie's transaction was going to arrive, then you could have given Mickey the
seat 1C. Mickey didn't really care about which seat he got -- in which case, Minnie's transaction
would have committed. So let us see how a lazy execution model addresses this issue. So,
again, consider the same scenario. Now, Mickey says, book me any seat, and instead of
assigning a single seat to Mickey, I'm going to commit Mickey's transaction, and I'm going to
defer the assignment of seat to Mickey. Subsequently, I'm going to do the same for Donald's
transaction. I'm going to ensure that Mickey and Donald both have some seat, but I'm not going
to tell them which exact seat they have. Finally, in this case, when Minnie says that I want a
window seat, I can actually assign Minnie the window seat which was available, in this case, 1A.
So I made these two assumptions of what these transactions are doing. One is that there is a
flexibility in the value that is being written. That is, Mickey does not care which exact seat he
gets, as long as it satisfies a certain number of constraints, and second, that there is a delay
between the point at which the transaction commits and the point at which you read the values
which are written by the transaction. Now, humor me for now and assume that there is a broad
class of applications which satisfy these two assumptions, and I'm going to come back and
precisely identify what this class of applications is. Now, assuming that there is a class of
applications over which these two assumptions hold, the key idea is that we can lazily bind the
unread values in the transaction, and by doing so, we are creating some room to maximize some
notion of global utility. In this particular application, the global utility was to satisfy the
maximum number of user constraints, or equivalently to allow the maximum number of
transactions to successfully commit. So let me give you another example in which being lazy
helps. So consider a simple voting application in which we are using this votes table to keep a
tally of the election status. So in this case, the Democrats have 100,000 votes cast for them. The
Republicans have 75,000 votes. And I have three transactions which the application -- or three
user programs which the application can execute as transactions. One is to cast a vote for
Democrats, which basically just goes and updates the count variable of Democrats by one, a
second, to cast a vote for Republicans, which increments the value corresponding to
Republicans, and a third transaction, which checks who is leading, so it reads the Democrats' and
Republican counts, it compares the two values and displays who is the current leader in the
election. Furthermore, let us assume that this is a nationwide election, and I'm replicating this
votes table across two datacenters, one on the east coast and one on the west coast, and also that
the initial state of the database is that the Democrats are leading the Republicans by around
25,000 votes. So what happens if we execute this transaction under strong consistency? So
whenever I cast a vote -- in this case, a vote is cast for Republicans, I need to consistently change
my replica state across these two datacenters, right? So I need to inform -- so if I am executing
on the west coast datacenter, I need to synchronously inform the east coast datacenter of this
change, and I have to incur at least one round-trip latency. Now, of course, under strong
consistency, the programming model is very simple, because the user never has to bother about
perceiving inconsistent or two different replica states. The other extreme is to be eventually
consistent, in which you say that I'm not going to inform the other datacenter synchronously. I'm
going to do that asynchronously, so when my transaction executes on the west coast datacenter, I
commit it locally and I keep my fingers crossed and hope that this transient inconsistency
between the two replica states is not perceived by the application. And in this particular case,
it is actually not perceivable by the application, because all the application cares about is who is
leading, and that transaction is going to evaluate to the same thing over these two states.
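To make that concrete, here is a toy Python sketch of the three transaction programs and of the equivalence just described. The dictionaries stand in for the replicated votes table, and the concrete counts are only illustrative:

```python
def vote_democrat(db):        # T1: increment the Democrat tally
    db["Democrat"] += 1

def vote_republican(db):      # T2: increment the Republican tally
    db["Republican"] += 1

def who_is_leading(db):       # T3: the only read the application ever performs
    if db["Democrat"] > db["Republican"]:
        return "Democrats"
    if db["Republican"] > db["Democrat"]:
        return "Republicans"
    return "tie"

west = {"Democrat": 100_000, "Republican": 75_000}
east = {"Democrat": 100_000, "Republican": 75_000}
vote_republican(west)          # applied only on the west coast replica so far

# The replicas now hold different counts, but every read the application can
# issue (T3) returns the same answer, so they sit in the same equivalence class.
assert west != east and who_is_leading(west) == who_is_leading(east)
```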
However, you can imagine scenarios where the Democrats are leading by one, at which point if
you are executing in an eventually consistent model, then transaction T3 can see two different states,
and the application can basically perceive one in which the Democrats and Republicans are tied
and another in which the Democrats are actually leading. So this exposes -- so this type of
inconsistency then has to be handled at a higher level in the application, right? So can we get the
best of both worlds? That is, can we get the clean semantics of strong consistency as well as the
fast response times of eventual consistency? And the answer is, yes, we can, at least under
certain assumptions and for a certain class of applications, we can. So let me show you how. So
the key idea is to exploit flexibility in the reads which the application is making. So in the earlier
case, for this particular application, if these are the only three transactions which the application
can execute, then it doesn't really matter what exactly the Democrats' and Republican vote counts
are, as long as in both these database states the Democrats are leading, because the
application cannot actually perceive this difference. So, in some sense, they belong to the same
equivalence class of database states. In this case, a class in which the Democrats are leading. So
the idea then is to be lazy yet strongly consistent, and how can we do so? Instead of requiring
that the two replicas are always identical, which is what strong consistency does, we are now
going to enforce that the two replicas are always in equivalent states, as opposed to being
completely identical. And this will allow my two replicas to diverge, but I'm going to establish
certain bounds within which they are allowed to diverge, and these bounds are again defined by
the equivalence class. So how do I do it? I do it by using these global treaties. We can assume
these global treaties are contracts which all the replicas sign and say that, as long as any changes
which I am making are not going to violate the global treaty, I am good. Whenever I am good, I can execute things locally, but whenever I am in danger of violating a global treaty,
I need to inform the other replica. So, of course, I have shifted the onus of communication from
the transaction to enforcing this global treaty, and unless I have a mechanism of efficiently
enforcing this global treaty in a distributed manner, I would go back to the strongly consistent
case. So how do I enforce this global treaty? I project it into two local treaties such that, if each
of these replicas are making changes which do not violate the local treaties, I am sure that the
global treaty is not going to be violated. So intuitively, you can imagine that I had a budget of
around 25,000 votes until the boundary of the equivalence class, and I have partitioned that into
two, each of 12,500 votes. And if it is a little vague right now, it will become more concrete
when I get to the technical details and precisely define what this projection is, how we get these
global treaties. Yes.
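Just to give a rough sense of that budget intuition in code (the numbers are made up, and the real treaties are logical formulas derived from the application rather than simple counters), a sketch might look like this:

```python
# Global treaty for the voting example: the Democrats stay ahead, i.e. the
# combined state never leaves the "Democrats leading" equivalence class.
dem, rep = 100_000, 75_000
slack = dem - rep                      # 25,000 votes of room before the class boundary

# Project the global treaty into one local treaty per replica: each replica may
# absorb at most half of the remaining slack in new Republican votes before it
# has to synchronize with the other replica and renegotiate.
local_budget = {"east": slack // 2, "west": slack // 2}

def cast_republican_vote(replica):
    """Returns True if the vote was applied purely locally, with no round trip."""
    if local_budget[replica] == 0:     # local treaty about to be violated:
        return False                   # time to synchronize and renegotiate
    local_budget[replica] -= 1
    return True

print(cast_republican_vote("west"))    # True: handled locally
```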
>>: Are you also going to talk about works that relate to this idea, as well, because I would then
hold off my question until after then?
>> Sudip Roy: Yes, I am going to.
>>: Let me ask my question, and then if you are going to talk about it, you can defer it. How does it relate to two works? One is consistency rationing, from 2009 I think. And also escrow transactions and demarcation protocols.
>> Sudip Roy: Right, so consistency rationing basically said that we are going to classify -- we are going to have three classes of objects, one which have to be strongly
consistent, one which can be eventually consistent and one which are in between. So our work
basically says that you don't have to classify.
>>: But you are between categories, right? Because within certain constraints, you switch from
eventually consistent to strongly consistent and vice versa.
>> Sudip Roy: No. We are going to be always strongly consistent, except that if the application
doesn't actually require strong -- the application always requires strong consistency, but if the
application cannot perceive some inconsistencies, then I am going to exploit that flexibility in the
application to be inconsistent sometimes. But using these treaties, I'm going to ensure that from
the application's point of view, it is always strongly consistent. Regarding the escrow transaction
and demarcation protocol, and there are other protocols, like distributed divergence protocols, let
me come back to that. Right, so how are transactions executed in this lazy yet strongly
consistent way? So now, when you issue a set of transactions, they go to the west coast
datacenter, they are executed locally, and you can execute them locally as long as this local
treaty is not violated. Now, in this case, the Republican count has reached kind of a border of the
local treaty, and the next transaction pushes it over, as in it violates the local treaty, at which
point I synchronize with the east coast datacenter. I update it with the changes which had
happened in the east coast datacenter, and I renegotiate and establish a new set of local treaties.
So again, I made two assumptions. One is from the application's point of view, there are many
database states which are equivalent, and it doesn't really perceive how they are different, and the
second, that communication is expensive, which is a very mild assumption and is true in many
scenarios. And again, I'll request you to humor me for now, and I'll come back and identify a
class of applications over which these two assumptions hold. So assuming that these two
assumptions hold, the key idea is to lazily synchronize distributed state, and by doing this lazy
synchronization, we can minimize the amount of coordination without actually sacrificing the
consistency requirement. So the takeaway so far is that many applications have some
flexibility in the transactions. By exploiting this flexibility in transactions, we can be lazy, and
I've shown you one example in which this laziness creates room for optimizing. And I have
shown you another case where we can exploit this flexibility to be lazy, and this laziness would
reduce the amount of coordination required without sacrificing consistency. So that was my
introduction, and the outline for the rest of the talk is that I am going to first present a solution
for how we can be lazy and optimize resource allocation, and the class of applications for
which it is applicable. Second, I am going to show how laziness allows us to minimize
coordination, and that's my project on homeostasis. And, finally, I'm going to show some
experiments. So any questions on the high-level ideas so far? So let me start by revisiting the
original example, which I just showed you. So I had told you that we had these three
transactions. There was some flexibility in the values which were written, and I had also made
the second assumption that there is a delay between the point at which the transaction commits
and the point at which the values are read. And the key idea is that we were going to delay the
binding for these values which are not read by the transactions, and this will create some room
for optimization and we can maximize the global utility, and in this particular case, it was to
allow the maximum number of transactions to go through. So coming back to my earlier
promise of identifying what this class of application is, so there are many database applications
which use transactions to allocate -- yes.
>>: On this scenario, if you look at flight reservation applications of today, they don't
necessarily commit you to a seat, unless you specifically ask for it. So they would not be solving this at the database; they already solve this at the app level.
>> Sudip Roy: So the idea is that, yes, for this particular -- so the idea is not that you can do it
on -- okay, let me rephrase that. So, yes, you can write custom application logic to do so, which
is outside the database. But what we are claiming is that it is a more fundamental problem, and
therefore we are presenting an abstraction for all of these applications which can be used. More
than that, there are some interesting issues which arise now that you are executing this transaction and you have removed some part of it to execute at a later point: what happens to the traditional properties, the traditional ACID properties? In some sense, you're not executing it atomically. How do you reason about isolation, because now one transaction can actually be affected by another transaction? I don't know if that answers your question.
Right. So I'm going to use the word resources as an abstraction for these objects which are
allotted, and I'm going to assume that they are represented as data items in the database, and
you're using transactions to change the state which is associated with these data items. So this is
precisely an example of an application where those two assumptions hold. So SeatID is
basically a social seating platform, which provides social plugins, so that you can basically
choose who you sit next to in a flight. It may be one of your friends. It may be you can specify a
constraint like I want to sit with someone else from Microsoft Research or another technical guy.
Of course, you do not want to be in situations like this. Another field is -- another area where this kind of reservation arises is hotels, where you make a reservation but you don't really know which room you are allotted. You are allotted the room when you actually get to the check-in point, and FrontDesk Upsell is such a piece of hotel reservation software which, as they advertise,
intelligently makes the right offer for the right guest at check-in, but from the hotel's point of
view, they are maximizing the revenue by allocating rooms efficiently. Finally, I am sure all of
you have run into a scenario where you have some meetings which are scheduled, and someone
higher up in the hierarchy schedules another meeting, which leads to a cascading rescheduling of
meetings. In this case, of course, the time slots correspond to these resources. And this is
usually bad for graduate students like us, who end up with no sign of their advisers.
>>: Are you going to talk about them?
>> Sudip Roy: Yes, you could use Quantum Databases for it. Right, so again, going back to our solution,
we are going to delay the assignment of resources beyond the point at which the transaction commits, so as opposed
to a classical model in which first a user requests some resources with constraints, the system
assigns a resource and then the transaction commits, now we are going to move to a lazy model,
where the user requests the resource with some constraints, the transaction commits if there is a
feasible assignment which exists, and the actual assignment of resource takes place at some point
in the future when a read is performed over the database. That is when Mickey needs to know
which seat he is sitting in. And between the point at which the transaction commits and the seat
assignment takes place, the database is in a partially uncertain state, and we call this state a
quantum state. In this state, Mickey has a seat, but which seat is unknown. And the database
which manages this uncertainty is called a Quantum Database. So let me first show you at a
conceptual level how Quantum Database supports this lazy execution model. So let us assume a
scenario in which we have an empty flight reservation table. Now, Mickey's transaction arrives
and is executed, as opposed to a classical model of execution, in which the database transitions to
a single next state, which corresponds to whichever seat was allotted to Mickey. A Quantum
Database transitions to three possible states. That is, it maintains all possibilities, one in which
Mickey is sitting in 1A, one in which Mickey is sitting in 1B, and a third in which Mickey is
sitting in 1C. After that, when Donald's transaction arrives, Donald's transaction executes in
each of these three possible worlds, and that leads to an even larger number of possibilities. Finally,
when Minnie's transaction arrives, Minnie's transaction can only execute on two of these possible
worlds, the one in which there is a window seat which is available. So what we have effectively
done is, by delaying Mickey's seat assignment, and we delayed it by maintaining all of these
possibilities, we have allowed Minnie's transaction to successfully commit. More formally, the
Quantum Database is nothing but a set of possible database states which are reachable through
different choices made in the transactions. And you may find them similar to uncertain or
probabilistic databases, and they basically differ from probabilistic or incomplete databases in
three main ways. One, we are deliberately introducing some uncertainty, and we are doing so to
enable this late binding. Second, we always need to maintain a guarantee that the Quantum
Database eventually resolves to a single state. It doesn't really make sense for Mickey to have
two seats. And the third is a key design choice, which is from where the name Quantum
Database arises, and that is to keep uncertainty internal to the database. And let me come back to
this key design choice in a few slides. So so far, I have introduced what at a conceptual level
Quantum Database is. Let me now give you one specific way of implementing Quantum
Databases. Clearly, enumerating all of these possible worlds is infeasible. In fact, there can be
an exponential number of possible worlds, exponential in the number of transactions which you
are delaying, and there is a rich literature on maintaining these uncertain databases, Codd-tables,
C-tables and PC-tables. However, we choose the simple representation, so what we do is we
partition the Quantum Database into two states, one which is deterministic, and the other which
is a sequence of transactions which have committed but whose seat assignment or whose value
assignment has not taken place. So because these sequence of transactions has already
committed, the Quantum Database needs to ensure that there is a feasible assignment of
resources. We do not want Mickey to be in a situation where the transaction has already
committed and later you see that, well, I don't have a seat for you anymore. So we need to
maintain some kind of a system invariant. We need to maintain the logical formula which would
guarantee that this sequence of transactions can always execute. So the next question is, how do
we construct this invariant automatically? And in order to do this, we need to extract the user's
constraints from the transaction itself automatically. And doing this in its full generality is
difficult, and therefore we restrict it -- we require some hints from the user, and we require the user to write the transactions as these resource transactions in an extended SQL language, which
looks like this. So it has a SQL -- it has a conjunctive query initially, which says what are the
resources which are acceptable to me, in this case, only window seats on Flight 123. As opposed to a LIMIT ONE keyword, we now use a CHOOSE ONE keyword, which explicitly encodes this choice or flexibility. And finally, we have a FOLLOWED BY clause, which contains all the writes which are dependent on the resource which is selected, and these are the writes which are going to get
delayed and are going to get executed at some point in the future. Now, given transactions
which are written in this SQL form, I'm going to use equivalent datalog-like representation in
which the body of the datalog is going to correspond to this conjunctive query which is up here,
and the followed by clauses will be in the head, and I'm going to use the minus notation for a
deletion. I'm going to use a plus notation for insertion and updates can be modeled as a sequence
of deletion followed by another insertion. So going back to the problem of constructing this
invariant, we can do it now in two steps. First, we convert these two transactions to this
equivalent datalog-like form, and now we want to compose these transactions to construct a
single logical invariant, and we do this by unification. Yes.
>>: What is your followed by, the class of SQL inside? What is the class of constraints?
>> Sudip Roy: What is it?
>>: Does it delete values inside values, or do you allow sub-queries and stuff in there?
>> Sudip Roy: No. It just deletes and inserts.
>>: There isn't sort of atomic queries.
>> Sudip Roy: Yes. It's probably possible to extend it further, but we haven't looked into that.
Right, so let's say that this is the datalog-like representation for Mickey's transaction. This is the
datalog-like representation for Donald's transaction. Now, I construct an equivalent larger
transaction, which is a sequential composition of these two transactions. Now, you want to be
careful, because Donald's transaction executes on a database state which is obtained after
Mickey's transaction has executed, and therefore it should perceive the writes which Mickey's
transaction would have done. In this case, it would have deleted this particular seat, and
therefore it results in this additional constraint, which is based on unification between the heads
of all previous transactions and the body of the latest transaction. Now, this of course is a simple
example of how we do this composition. We have a general algorithm for composition and
proof of correctness in the paper, but I won't have time to go into that in this talk, but I am happy
to talk about it later offline. So, assuming that -- okay, so let me just point out that now that we have this composed transaction, as long as the body of this composed transaction has a valid grounding over the database, we are sure that this sequence of transactions can commit. So that
was the original goal of constructing this invariant. So how does the transaction execute? Over
a Quantum Database, effectively, it basically checks if the invariant, which can be -- if the
invariant with the extended sequence of transactions has a valid assignment or a valid grounding; if so, then you update the quantum state and you commit the transaction. But now the
assignment has not taken place. If not, then the transaction aborts. Okay. So finally, what
happens when you perform reads over the Quantum Database? At some point in time,
Mickey has to actually know his seat, so what happens in that case? And this goes back to the
design choice which we made earlier to keep uncertainty completely internal to the Quantum
Database. So let's say that this is the initial Quantum Database, one in which Mickey has both
seat 1B and 1C, and now, if Mickey queries -- Mickey issues a read query over this Quantum
Database, the Quantum Database, in order to keep the uncertainty completely internal, collapses
all possible worlds in which Mickey can have two different seats. Sorry, it collapses to a set of
possible worlds over which the read query has a completely deterministic answer. In this case, it
has eliminated one of these possible worlds. In general, it can actually be a set of possible
worlds. And we have a unification-based algorithm, which is not optimal, but it works in
practice. In fact, the optimal solution is actually [indiscernible]-complete, and it can be related
to a completely different problem of information disclosure through views, this famous paper of
Miklau and Suciu. So I hope if you understand this, then you now understand why we call it
Quantum Database. There's an analogy you can draw to Schrodinger's cat, that when the cat is
inside the box, it can be both dead and alive. But as soon as you open the box, which is in this
case issuing a read query, the cat can be either dead or alive, but not both.
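To make the mechanics concrete, here is a toy Python sketch of a Quantum Database for the seat example. It brute-forces the possible worlds rather than using the datalog-based composition and unification just described, which only works at this tiny scale; the names and data structures are illustrative assumptions, not the actual implementation:

```python
from itertools import permutations

SEATS = {"1A": {"window": True}, "1B": {"window": False}, "1C": {"window": False}}
pending = []          # committed requests whose seat binding is still deferred
bound = {}            # bindings that earlier reads have already fixed

def possible_worlds():
    # All assignments of distinct seats to the pending requests that satisfy
    # every request's constraint and agree with the bindings fixed by reads.
    worlds = []
    for combo in permutations(SEATS, len(pending)):
        world = {name: seat for (name, _), seat in zip(pending, combo)}
        ok_constraints = all(pred(SEATS[world[name]]) for name, pred in pending)
        ok_bound = all(world[name] == seat for name, seat in bound.items())
        if ok_constraints and ok_bound:
            worlds.append(world)
    return worlds

def request_seat(name, pred=lambda attrs: True):
    # Commit only if some possible world still exists; defer the actual binding.
    pending.append((name, pred))
    if possible_worlds():
        return "committed"
    pending.pop()
    return "aborted"

def read_seat(name):
    # A read collapses the quantum state: keep the largest set of worlds over
    # which the answer is deterministic, and fix the binding accordingly.
    counts = {}
    for world in possible_worlds():
        counts[world[name]] = counts.get(world[name], 0) + 1
    seat = max(counts, key=counts.get)
    bound[name] = seat
    return seat

print(request_seat("Mickey"))                          # committed, seat deferred
print(request_seat("Donald"))                          # committed, seat deferred
print(request_seat("Minnie", lambda a: a["window"]))   # committed: 1A is still free in some world
print(read_seat("Minnie"))                             # 1A, the only window seat
```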
>>: What happened to the state after the query? Do you take it back to the quantum state, or it
stays?
>> Sudip Roy: No. Once a read is -- so in effect, what is happening is a read is now also
changing the database state. A read internally may be converted to an update. Yes.
>>: What is the impact going to be?
>> Sudip Roy: So the impact is basically, in order to -- so the whole point of having these
possible worlds is by maintaining as many of these possibilities, I can optimize my resource
allocation, right? So to minimize the impact of reads, I want to maximize the number of possible
worlds which I retain and yet can answer the query deterministically.
>>: So what's the objective function you're optimizing when you collapse the --
>> Sudip Roy: In this case, the objective function we are maximizing is just to retain the maximum number of possible worlds after the collapse.
>>: How is that specified? Who specifies it?
>> Sudip Roy: We assume that that's the default in some sense. You can think of applications
where you would want to maximize some other notion, so let's say that if you want to maximize
revenue, then some possible worlds may be more beneficial for you than others.
>>: Do you need an extension to some syntax or new syntax to specify this, or is it --
>> Sudip Roy: Yes, you would. We don't support that as of now, but it's definitely something
which can be extended. Right, so the takeaway was that we exploited this flexibility in the
transactions which are executing to be lazy in binding some of the values which are not read
immediately in the transaction. And I presented Quantum Database, which basically optimizes
this resource allocation using lazy binding. So that concludes the part on Quantum Databases,
and I'm going to now move on to Homeostasis. So any other questions on Quantum Databases
so far?
>>: Any performance numbers? Like in terms of doing it outside the database, was it worth the
benefit -- do you get a gain in terms of performance, or do you gain in terms of --
>> Sudip Roy: You gain in terms of utility, so it's not exactly in terms of performance. You
may gain in terms of performance by implementing Quantum Database inside the database. Our
implementation was in the form of a middle tier which sits outside the database, and actually I'm
not going to show you the performance numbers for Quantum Databases just due to lack of time.
I have some backup slides where we can go for them.
>>: But to establish utility, you need to have a rich framework, right? I mean, it's also related to
the next question. So take your two examples. One was flight, another was hotel. In the hotel
case, there was an explicit goal that you want to maximize the revenue, so how you do express
that in your Quantum Database?
>> Sudip Roy: As I -- at this point, we don't support it. Yes. If we are to build a system, that's
definitely a useful add on that has to be supported. Other questions? Okay. So let us go back to
this example in which we were lazy, yet we were strongly consistent, and we achieved this by
exploiting the fact that we are going to allow these two database states to be in two different
states, yet two different states as long as they are equivalent to each other. And I kind of said
that we are going to use this global treaty, we are going to project it to these local treaties, and all
of this was a bit abstract. So in this part, I'm going to formalize and make all of this concrete.
So before I do that, I had also promised that I have these couple of assumptions, and I'm going to
come back and identify what exactly this class of applications are. So let us see a few examples.
Firstly, why is low latency important? Why do we really care about saving on the network round
trips? Now, there is a good deal of anecdotal evidence which suggests that even a 100-millisecond latency, in the case of Amazon, causes a 1% loss of revenue, and usually this figure
rises exponentially with the average latency. So, clearly, latency is something which is
important in order to -- which can directly be related to dollar values. And there are many
applications which satisfy the previous assumption. Let's say online shopping, in which the data
is actually replicated across different datacenters, and the flexibility is you can imagine -- and I'm
going to actually show you, my experiments are going to be over TPC-W benchmark, which is
an online shopping benchmark. So you don't really have to know how many items are there in
the stock exactly, so there is some flexibility in that, as long as they are sufficient for your order
to go through. Similarly, in auction systems, you only need to maintain which are the top set of
auctions. It doesn't really matter what the other lower values of auctions are. And, finally, this is
something which probably doesn't directly apply, at least right now, but you can imagine that, if
you can partition the application state for mobile devices, in which part of your application state
is in mobile, you are basically saying that you can make changes to some part of the application
state, which is on your mobile device. And as long as you're doing that, you don't have to
communicate to the server, then you can improve the app response time, because not every one of
your actions is now going to require communicating with the server. So here's the overview of
our solution. So in the first step, we are basically going to analyze the application transactions to
automatically identify this notion of flexibility, and the intuition is that we want to basically
identify which database states are equivalent, and therefore we are going to partition the space of
database states into equivalence classes. And we are going to build upon a rich literature on
program analysis, because effectively the transactions, as I said initially, are user programs. And
in the second step, once we identify these equivalence classes, we are going to exploit this
flexibility to minimize coordination. And again, the intuition is that instead of trying to enforce
that the two replicas are in completely identical state, I'm going to instead enforce that the two
replicas are in equivalent state. They may be non-identical. And there is -- coming back to
[Sudipta's] question about escrow transactions, demarcation protocols and distributed divergence control protocols, it may be a bit vague right now, but we are a significant generalization over
each of these techniques. Moreover, we do a number of other things, and I hope it will be more
obvious by the end of the talk. Let me come back at the end of the talk to revisit how exactly we
are different from each of them. So let us apply the solution to the voting example, right? So the
input in the voting example was this set of three transaction types. The output of the first step
would be these three equivalence classes, one in which the Democrats are leading, one in which
the Republicans are leading and one in which they are tied. And this is going to feed into the
second step, and then the output of the second step is going to be a protocol which ensures
consistency by requiring that the replicas always stay in the same equivalence class. So
whenever you are actually changing from one equivalence class to another, then the protocol is
going to ensure that that happens consistently and no one perceives that you are in two different
states. And that's how we are going to achieve strong consistency. So, with that, let me dive into
how we do step one, and I'll get on to step two later. So doing this analysis in full generality is
difficult, and I do not expect you to parse this. So we restrict the transactions to be expressed in
a particular subset of the language. This is the language. I do not expect you to actually parse
this. Let me just highlight a few key points. We assume that the database is a collection of
integers. We have these IO statements to read and write from the database. Right now, we
support only conditionals, if, then, else. We do not support for loops and while loops, but for
OLTP transactions, this is not a big restriction. And, finally, we have arithmetic expressions and
Boolean expressions. So assuming that transactions are executed -- transactions are expressed in
this language, this is what a transaction would look like. So it has a read statement, and I use the
hat notation to indicate local variables. The non-hat variables are stored in the database. So the
read(X, X-hat) would read the value of x from the database into the local variable X-hat.
Read(Y) would do the same for y, and then the transaction checks if X+Y is less than 10. If so, it
increments X. Otherwise, it decrements X, and finally, it writes that value back into the
database. So this is going to be my running example for the rest of the talk. So let us try to
formalize this notion of flexibility a bit more. Assume that we have these three database states.
These three all have different values of X and Y, and yet, from this transaction's perspective, if you execute this transaction on each of these database states, it is going to produce an identical
effect, the effect being increment X by one. So how can we represent concisely this entire set of
database states? To do so, we use symbolic tables, which basically have two columns. The first
column corresponds to a partition of the space of database states, and the second column is what
effect the execution of the transaction has. So if you consider this tuple, it says that over all
database states in which X+Y < 10, executing this transaction would have the effect of
incrementing X by one, and similarly for the other case. Now, of course, an application would have
multiple transactions, not just one transaction, so let's add another transaction to the mix. It's
very similar to the first transaction, except that now instead of writing to X, it is actually writing
to Y. And also, I have changed the threshold from 10 to 20. And here, you can see that that's
basically the symbolic table for transaction T2. Now, if these are the only two transactions
which are executed in the application, I can combine them to construct a joint symbolic table,
and I do so by taking a cross-product. Now, in normal cases, the cross-product would have four
tuples. One of them is degenerate, and therefore I have eliminated it, and therefore it has three
tuples. What does it say? It basically says that over all database states over which X+Y < 10,
executing Transaction 1 has the effect of incrementing X by one. Executing Transaction T2 has
the effect of incrementing Y by one. So I basically didn't explain how we construct the symbolic
table from this transaction, so let me show you how we do that. And again, we have a set of
inductive rules for constructing these symbolic tables from the transaction code. I do not expect
you to parse through them. Instead, let us look at an example construction. So this is, again, a
control flow graph for the transaction which I showed earlier, and we construct the symbolic
table in a bottom-up manner. So we start with the last statement -- in this case, it is a write, in
which case, executing only this statement overall database states would have the effect of
assigning the value of the local variable X-hat to X. And that's why the true indicates that it'll
have the same effect over all states. And as you work your way backward, in this case, you see
that, well, it is going to have the effect of incrementing along this branch. It'll have the effect of
decrementing along this branch. When you see an if statement, you see that in order to take this
part in the code, X+Y must be greater than 10. To take the other part, it has to be less than 10.
When you see a read statement, then you basically remove the local variables and substitute with
the corresponding database variables. You do the same thing for read(X), and finally, you end
up with the symbolic table, and this is exactly the symbolic table which I had shown you earlier.
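Here is a small Python sketch of the symbolic tables for the two running-example transactions and of the joint table built by the cross-product. Representing guards and effects as Python functions is just a shorthand for illustration, not the formalism used in the system:

```python
# Each symbolic-table row is (guard over the database state, effect on the state).
T1_table = [
    (lambda db: db["X"] + db["Y"] < 10,  lambda db: {**db, "X": db["X"] + 1}),
    (lambda db: db["X"] + db["Y"] >= 10, lambda db: {**db, "X": db["X"] - 1}),
]
T2_table = [
    (lambda db: db["X"] + db["Y"] < 20,  lambda db: {**db, "Y": db["Y"] + 1}),
    (lambda db: db["X"] + db["Y"] >= 20, lambda db: {**db, "Y": db["Y"] - 1}),
]

def joint_table(t1, t2):
    # Cross-product of the two symbolic tables. Rows whose combined guard is
    # unsatisfiable (e.g. X+Y < 10 and X+Y >= 20) are the degenerate ones the
    # talk drops, which a real implementation would prune.
    return [
        (lambda db, g1=g1, g2=g2: g1(db) and g2(db), (e1, e2))
        for g1, e1 in t1
        for g2, e2 in t2
    ]

db = {"X": 3, "Y": 4}
for guard, (effect1, effect2) in joint_table(T1_table, T2_table):
    if guard(db):   # X+Y = 7: both T1 and T2 take their increment branch
        print(effect1(db), effect2(db))   # {'X': 4, 'Y': 4} {'X': 3, 'Y': 5}
```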
Now, the key thing to note here is that the symbolic table only uses variables which are in the
database and does not have any references to local variables, because we have already
substituted these local variables with their corresponding database variables when they were read. So now that
we have constructed these symbolic tables, let us see how we can use these symbolic tables to
construct a protocol. So, again, the input to the second step is the output of the first step, in this
case, this giant symbolic table. And let us assume for simplicity that we are in a distributed case
in which one of the sites has the variable X, the other site has the variable Y, and the initial states
are 12 and 13. So what the Homeostasis Protocol does is it checks to which equivalence class
does my current state of the database belong to? So in this case, the values of X and Y being 12
and 13 indicate that it belongs to the third equivalence class, and it is going to use that to be a
global treaty. Let us assume that there is an efficient way of actually maintaining this global
treaty without requiring communication, and I'll come back to that in the next step. So now,
when I execute a transaction, what I do is basically I go and look up what effect that Transaction
T1 has in this particular equivalence class. In this case, it just decrements the value of X. So I
can keep on executing these transactions, as long as the overall state satisfies this global treaty.
So, finally, I will reach a stage where a transaction may actually cause a violation of this global
treaty, at which point I recheck and establish a new global treaty, and that begins a new round of
this Homeostasis Protocol. So what we have done is basically we have executed six transactions
in this case and incurred the cost of only two network latencies. If you had done it in a strongly
consistent manner, you would have incurred six network latencies. Now, of course, how many
network latencies you actually incur will depend on how big your equivalence class is. Yes.
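A naive Python sketch of one such execution on the running example is shown below. Note that this version still evaluates the treaty over the full state, which is exactly the issue the next question raises; the concrete numbers and the renegotiation step are placeholders for illustration:

```python
state = {"X": 12, "Y": 13}
round_trips = 0

def class_of(total):
    # Return the guard of the equivalence class containing a given value of X + Y.
    if total >= 20:
        return lambda db: db["X"] + db["Y"] >= 20
    if total >= 10:
        return lambda db: 10 <= db["X"] + db["Y"] < 20
    return lambda db: db["X"] + db["Y"] < 10

treaty = class_of(state["X"] + state["Y"])      # global treaty: X + Y >= 20

def run_T1():
    # In this equivalence class T1's effect, read off the joint symbolic table,
    # is simply "decrement X"; no other values need to be read to execute it.
    global treaty, round_trips
    next_state = {**state, "X": state["X"] - 1}
    if not treaty(next_state):
        round_trips += 1                        # treaty would be violated: synchronize
        treaty = class_of(next_state["X"] + next_state["Y"])   # and establish a new one
    state["X"] -= 1

for _ in range(6):                              # six executions of T1, as in the talk
    run_T1()
print(state, "synchronizations:", round_trips)  # one round trip instead of six
```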
>>: When you're in the state Y = 11, you don't know what is the global value of X, so how do
you validate the global treaty locally without communicating?
>> Sudip Roy: Right, so that comes back to the question of this magic, which I am going to
come to in the next slide. But before I do that, so we have a theorem that proves that the
Homeostasis Protocol actually produces one-copy serializable schedules. So, of course, the naive approach to
enforce this global treaty is to be aware of the global state, and that will require communication
and knowing the values of both X and Y at every step, which kind of defeats the whole purpose,
because we'll be back in the world of strong consistency. We want a lazy approach, and to do
this, we basically project this global treaty into a set of locally enforceable treaties. And, of
course, because we are projecting it into these locally enforceable treaties, and these locally
enforceable treaties are working on a limited state, they have to be more conservative, but we
require that these locally enforceable treaties would together imply the global treaty. So in this
case, to enforce that X+Y is greater than or equal to 20, one possible set of local treaties would
be X > 10 and Y > 10. So as opposed to now enforcing the global treaty, I'm now going to
enforce these local treaties. So I keep on executing transactions until I run into a treaty violation.
Now, given that these local treaties have to be more conservative, this violation is going to occur
earlier than in the previous case which I showed you, at which point you renegotiate and establish a
new set of treaties and the protocol goes on. So, of course, there are multiple possible ways of
projecting a global treaty into a set of local treaties, and if you assume that you know something
about the workload -- that is, you assume that you know that Transaction T1 is more frequent
than the other transaction, then you can find an optimal projection, projections which are least
likely to be violated. So in this case, this was the suboptimal solution of choosing 10 and 10,
which only allowed you to execute four transactions. As it turns out, this -- for this particular
sequence of transactions, the optimal projection is to have X greater than or equal to 9 and Y
greater than or equal to 11. This will allow you to execute six transactions without a violation.
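As a sketch of how such a projection might be computed with an SMT solver, here is a small example using the z3-solver Python bindings; the talk mentions Z3, but the exact encoding and the workload-driven objective the system uses are not reproduced here, so this only finds one valid projection and notes where the optimality criterion would go:

```python
from z3 import Ints, Solver, sat

X0, Y0 = 12, 13                 # state at the last synchronization point
tx, ty = Ints("tx ty")          # local treaties will be X >= tx (site 1) and Y >= ty (site 2)

s = Solver()
s.add(tx + ty >= 20)            # together, the local treaties must imply the global treaty X + Y >= 20
s.add(tx <= X0, ty <= Y0)       # and each local treaty must hold in the current state
s.add(tx >= 0, ty >= 0)

if s.check() == sat:
    m = s.model()
    print("one valid projection: X >=", m[tx], "and Y >=", m[ty])

# The real negotiator additionally encodes the expected workload and asks for the
# projection least likely to be violated -- for example, X >= 9 and Y >= 11 in the
# talk's example sequence -- rather than settling for any satisfying assignment.
```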
Yes?
>>: What happens if the write sets on the two sides have something in common, like if you're writing X and Y in both places? In this example, you're writing X on one side and Y on the other side, so you can kind of work around it, right? But if the write sets intersect, will it cause a problem?
>> Sudip Roy: It does cause a problem, yes, in which case you would actually be back in the
world of strong consistency, and there is nothing which you can do. Now, of course, in a
replicated scenario, all the state is available locally.
>>: But the same problem would show up in -- let's take your voting example. So there are two tables, and they're not -- they're replicated. You can't actually increment the vote in either place. You're sending all the writes to one place, but the reads --
>> Sudip Roy: No, no, no. No. I can make writes to both places. So in the example which I showed, I was actually casting both Republican and Democrat votes at both the datacenters.
>>: When you know the final tally, some subset of votes got incremented here, some subset
there. We just know the result. You won't know the actual data.
>> Sudip Roy: Yes, but that's kind of the whole point, that the application doesn't really need to
know the exact tally. At some point, when you synchronize, you are going to know the tally.
You are going to merge these two states, so it's not that the state is going to be always divergent.
It is going to reconcile periodically at synchronization points. No? Okay.
>>: Sudip, without knowing the workload, of course, you don't really know what the optimal
global treaty should be, right? Meaning you could have all the workload on one side or the other
and all the updates to only one of the variables, and you wouldn't -- your global treaty wouldn't
be able to adjust dynamically for that unless you waited more steps?
>> Sudip Roy: Right. So in which case you can have something which is similar to the idea of
doing -- dynamically estimating what the workload is. That is, if during the day some items are
ordered more frequently on the west coast -- sorry, in America than on the other side of the
globe, then you would allocate more budget to the datacenter in the US. So putting all of this
together, we basically developed this system called Homeostasis, and it has a number of
components, and let me just briefly walk you through it. So we assume that we are given a set of
transactions, which are to be run, and then we use a compiler to construct these giant symbolic
tables. Of course, we do not actually construct a single large, giant symbolic table for the entire
set of transactions. We in fact use techniques from the SDD-1 paper, which partitions -- it's actually [indiscernible] work, which partitions this entire set of transactions into
groups of interdependent transactions based on conflict graph analysis, and we construct a joint
symbolic table for each such interdependent group of transactions. We maintain a treaty for each
such group, so whenever a transaction is executed, the treaty enforcer allows a local execution if
the local treaty is not violated. If it is violated, then it goes and talks to -- then it initiates a round
of negotiation. The treaty negotiator goes and talks to the other replicas. It merges the changes
which have happened at the other replica since the last synchronization. Based on this new state,
new synchronized state of the database, it constructs an instance of a satisfiability problem. And
the solution to the satisfiability problem is the optimal partitioning of the global treaty into local
treaties. And then, it sets the new treaty, and that starts a new round of this homeostasis
protocol. So the overall takeaway is that there's a class of applications which have some
flexibility in their transactions. We can exploit these flexibilities to lazily propagate writes
without sacrificing consistency, and I showed you the Homeostasis, which is a system that
identifies and exploits this flexibility to minimize communication between different nodes in a
distributed or a replicated system. So with that, let me present to you some experimental results,
and as I had pointed out earlier, my experiments are going to focus on Homeostasis, but I'm
happy to talk about results on Quantum Databases after the talk. So the goal is to evaluate the
applicability of -- you had a question?
>>: You were talking about this global treaty. Do you have any constraints on what kind of
global treaty you can support? And are there any guidelines as to, once given the global treaty,
how can you translate into these more -- these local treaties? You gave an example, but it was --
>> Sudip Roy: Right. So the first question was what are the constraints on the global treaties?
We actually restricted the language to the fragment I just showed you, and by analyzing those
transactions, you can only get a certain class of global treaties, and precisely that's going to be
Peano arithmetic first-order logic. Now, in general, solving satisfiability problems over Peano arithmetic first-order logic is undecidable. We use some tricks to actually convert it into
Presburger arithmetic first-order logic, and that is decidable, as well as solvable. In fact, we use
Z3 software to do this, which is actually a Microsoft Research technology. Does that answer
your question? Right. So coming back to the experiments, we want to evaluate the benefits of
Homeostasis in a georeplicated setting, and more precisely, we want to answer the question as to
how often we can actually avoid coordination for realistic application workloads? Secondly, we
want to study this tradeoff between how much time we are spending in finding this optimal
projection of global treaties into local treaties and how does that correlate to how much savings
we get in coordination? So for the workload, we use TPC-W buy-confirm-like transactions, so we assume that there are 10,000 items in a database. We are assuming that each transaction is purchasing one to four pieces of a particular item. Initially, the database is populated with stock levels ranging from zero to 100 for each item, and based on the TPC-W specs, every time the
level actually goes to zero, the transaction automatically replenishes the stock level by adding
100 new pieces of the item. And we basically ran our experiments on EC2. We used m3.xlarge
instances, and the system was deployed across five different datacenters, Virginia, Ireland,
Oregon, Sao Paulo and Singapore. For the first two experiments, I'm just going to use two
replicas. That is going to be Virginia and Ireland. And for the third experiment, I'm going to
show from two to five, the behavior of the system as we add sites. Sorry. So let me explain
what this graph is. So on the X-axis, I have the sequence of transactions issued for one particular
item -- in this case, let's say item A. On the Y-axis, on this side, I have what is the view of the
stock level from the point of view of Replica 1. And on the Y-axis on the other side, I have the
transaction latencies. So the red line here corresponds to the stock value, and the green line
corresponds to the transaction latencies. So let us walk through this graph from left to right. So
let's say that I will start with this value of 100 for the red line. So now I'm executing transactions
locally using the Homeostasis Protocol, and that manifests itself in these low latencies, because
the transactions are executed locally. At this point, I witness a local treaty violation, so I need to
run a round of synchronization, and that requires communication between the replicas, which is
why there is a spike in this green plot. And, also, I witness a sharp cliff in the red line,
which corresponds to that, and that is because I am now synchronizing the state, so this change
was the number of purchases which happened at the other replica while I was running my
transactions locally. At this point, I have established a new set of local treaties, and the
execution continues locally again until I reach zero, at which point I replenish my stock and the
protocol proceeds. So how often do we benefit from this? To understand this, basically, this is a
transaction latency profile. On the X-axis, I have latencies in the log scale. On the Y-axis, I
have the cumulative probability. That is what fraction of the transactions are executing under a
particular latency value. So we are comparing against 2PC, which is strongly consistent and it
always takes a round-trip hit -- in this case, 200 milliseconds. And, of course, there's a sharp
cliff, because after 200 milliseconds, all transactions will be able to execute. However, these
four lines basically correspond to different settings of the optimization parameter, so the higher
value of L means that you are spending more time in finding optimal treaties, and therefore, you expect a larger number of transactions to execute locally. So in this case, almost for all four of
these parameters, you see that more than 85% of transactions were executed locally. Now, how
does that behavior change as we increase the number of sites? So what exactly happens when
we increase the number of sites? As I pointed out earlier, when you factorize a global treaty into
locally enforceable treaties, you need to be conservative. So those local treaties have to be more
and more conservative. And if you are factorizing it into more number of fragments, then they
have to be even more conservative. So as you go from two replicas to five replicas, your local
treaties are going to be increasingly more conservative and therefore more likely to be violated
easily. And this manifests itself in this downward shift of the inflection point, which basically
says that slightly fewer transactions are executed locally and you witness the treaty
violations more frequently. Now, the takeaway from this is basically, even with five sites, more
than 80% of the transactions were executed locally. So with that, let me mention some of the
related works. There has been quite a bit of interest in the database community. I'm sorry?
>>: Throughput.
>> Sudip Roy: Throughput, I probably have a backup slide on that. We did have an experiment
on that. I'm happy to show that to you offline.
>>: What are the inter-site communication delays?
>> Sudip Roy: So between -- so, of course, that depends on which two datacenters we are
talking about. It ranges between 100 milliseconds between east coast and west coast -- actually,
around 85 milliseconds -- to more than 250 milliseconds between Virginia and Singapore.
>>: Can you back up then? So when you're in the 10-millisecond range, everything's local.
>> Sudip Roy: Yes.
>>: And then you have this big jump as soon as the communication -- as soon as you start
renegotiating, then you're into cross-datacenter communication. That's when you get the big
jump?
>> Sudip Roy: Yes. No, that's when you get this jump in the X-axis. So what happens is, you
can actually do better. You can have some anti-entropy protocol, which runs in the background
and periodically reconciles the state between the two so that you can eliminate some of these
local treaty violations. However, you cannot eliminate them completely, because whenever you
transition across an equivalence class boundary, that has to happen consistently.
>>: Do your updates commute in this case?
>> Sudip Roy: Yes.
>>: So all you're really doing is you're just looking for better consistency of reads? Is that
what's going on? Because if the updates commute, then you can do multi-master replication
with impunity and you don't need to worry about cross-database delays or anything and
renegotiate. As long as the updates eventually reach their destination, everything will wash out.
>> Sudip Roy: However, I showed you in the voting example where -- yes, you were right
earlier when you said that we are doing reads. We want to ensure read consistency.
>>: It's all about read consistency.
>> Sudip Roy: Yes.
>>: His reordering transaction sort of requires read consistency, at least at that point, right? But
you could do an incremental -- you could save yourself a lot of renegotiating of treaties by
simply incrementally sending the updates from one side to the other offline, if you will, or in the
background.
>> Sudip Roy: Yes, so in this particular case, you would actually not witness this vertical
latency if you do this.
>>: And if you were willing to soften your requirements about updating so that you had
reordering by saying anything less than 10 you'll reorder, you could also soften the spike at that
point, as well.
>> Sudip Roy: Yes. So going back to the related work, there's been quite a bit of interest in this
field in general, but we are the first ones who actually adapt the consistency which the data store
provides to what is required by the application, and we do so by doing this program analysis.
Moreover, we are also the first ones who tried to adapt based on the transaction workload, which
none of the other protocols do. Moreover, these other protocols always assume that you are
given a simple constraint, which is like an inequality constraint, and you want to maintain
that constraint. Our protocol generalizes what this class of constraints is, as well as allows you
to switch from one constraint to another. Only that has to happen consistently. There's also been
some work in the programming languages community for program analysis and automatically
identifying atomic sections and also, in systems community, to assert a formula in a distributed
manner. So with that, let me summarize. So the key idea is that we want to exploit flexibility in
transactions, and I have identified classes of transactions where such flexibility is available to be
lazy when possible. And I have showed you one example in which this laziness created room for
optimization, and I presented Quantum Databases as a system which does that. I have showed
you another instance where this laziness minimizes the amount of coordination required without
sacrificing consistency, and I presented Homeostasis, which provides such semantic-based
adaptive consistency. So let me mention something about what I plan to do if I get an
opportunity here at Microsoft Research. One of the interesting things which I want to pursue is
can we synthesize concurrency control protocols automatically. Assume that you are given
correctness criteria in terms of, let's say, one-copy serializability, and you are given some information about the environment, as in what kind of hardware support you have, what is efficient, what is not efficient, some specification of the environment. And then can we
automatically synthesize the best concurrency control protocol? And there's been quite a bit of
recent very interesting work in program synthesis, some of it from Microsoft Research itself, and
it would be really interesting to investigate how we can apply some of those techniques to
automatically synthesize concurrency control protocols. The other direction of research which
would be interesting to pursue is no-knob cloud services. As computing moves to the cloud,
managing cloud services by administrators becomes increasingly difficult, so you
would want to have a system in which the cloud somehow automatically detects performance
anomalies or any other sort of anomalies and takes actions itself as far as possible to remove any
kind of anomalies. And the first step in this is, of course, diagnostics, and we have done some
initial work with Christian on this. Of course, the interesting question is once you have
identified what these anomalies are with some high-level idea of what the reason is, can you
close the loop and automatically improve the -- automatically take actions which improve the
performance of the cloud service? So I talked about Homeostasis and Quantum Databases today.
I have also worked -- I have also done some initial work on the Youtopia Project, which is about
designing declarative abstractions for data-driven coordination. You may have heard of
entangled queries and transactions. So the key idea there is basically with the rise of social
networking, you would want users -- users would actually want to issue transactions, which can
now talk to each other and take joint decisions. And we designed abstractions which allow you
to do this in a clean and efficient manner. Finally, I have done three internships, one with
Christian, in which we initiated this new project for robust diagnostic for cloud platforms. I have
done two other internships, one -- actually, both of them in Google Research in the Fusion
Tables team. For the first, I worked on spatial query processing in the Fusion Tables back end.
In fact, if you have used Fusion Tables or if you use it today, it's very likely that my code is
executed on the back end. And in the next internship, I worked on faceted navigation for data
exploration, and I'm happy to talk about any of these projects in the one-on-one meetings which I
have. So, with that, that concludes my talk. Thanks a lot for attending. I'm happy to answer any
other questions which you may have.
>> Christian Konig: Okay, any more questions? All right. Let's thank the speaker again.