>> Ravi Ramamurthy: Okay. Good morning, everybody. It's a pleasure to welcome Sudarshan. Most of you already know him. He's been a visiting researcher at MSR before. He's a professor at IIT Bombay, which has an excellent database research group. He's a co-author of a very popular undergrad textbook, and his research areas include query optimization, keyword search in databases, and more. Recently, he's been working on something called holistic query optimization, and he's going to talk about that today. So I'll hand it over to him.

>> S. Sudarshan: Thank you, Ravi. So this work was started by my Ph.D. student, Ravindra Guravannavar, or Ravi, as we all call him. And it is being continued by Karthik, who is a Ph.D. student currently working on this topic. So if there are any really hard questions, I'm going to ask you to e-mail them, but hopefully it should be okay. And then there are several Master's students, including Mahendra Chavan and a few others.

Okay. So what is the problem? So here you have a bunch of people waiting for a taxi. There's a long line, and there's one taxi in town. And in our context, the taxi is called a database system. So you execute a query: you go to the taxi, it takes a long time to go there, a long time to come back, and then the next guy starts over. So it's a very bad idea to live this way.

And why would you, you know, why would anyone do it this way? Because there are many situations where people write loops with query invocations inside. Sometimes it's because the programmer didn't know better, and it would be easy to change. Sometimes it's actually hard to change, because the query is not directly in the loop but is deeply nested inside other code. So this is useful not only for databases; the same problem even arises with web service requests. Our implementation has focused on databases, but the ideas presented here should be applicable to web services also.

So naive execution of iterative queries is obviously inefficient, as we said. The latency is a huge factor. In addition, what we have seen is that with multiple CPUs, the database can actually handle a lot of queries concurrently, especially if the data is in cache, so there's not much disk IO. And we're able to pump, you know, ten such queries into the database, and the response time, you know, hardly changes when you go to 20 or 30. These were small queries. Even for large queries, this kind of works, because if they share the scan, for example, you can get much better performance.

Okay. So how do we solve this problem? A lot of us have been working on query optimization for many years, but unfortunately, there's nothing the database can do. It's being sent a series of queries, and unless it can kind of guess that this is what is happening -- but even then there's little it can do, because at best it can precompute the query and keep it, but it's still synchronous and slow with latency. So you really have to go to the other side of the gap here and work on the application program. That's something we avoided for many years, because we thought: we are database people, we are not programming language people, we don't know anything about programming languages. But luckily, I had a Ph.D. student who not only knew databases, he also knew program analysis. So we started filling this gap. So as we say, it's time to think outside of the box for query optimization. So we're actually going to give two solutions to the problem I posed.
The first one is to use a bus, which is ecologically sound. And what is a bus in this context? It's basically batching of execution, as you can imagine. We had a lot of fun with these pictures.

So I'm first going to describe earlier work, which appeared in VLDB 2008. The problem was: given a program which makes calls on a database -- our implementation detects database calls and works on Java programs -- how do you automatically rewrite the program to replace a loop with calls by a single batch-oriented execution? The benefits are the obvious ones, such as [indiscernible] plans, sharing of disk IO, avoiding network round-trip delays -- all of those which I told you about.

So our approach is to actually transform imperative programs, and the way we do that transformation is: there are, luckily, tools available, including some at Microsoft which unfortunately I don't think we have access to, but we are using an open-source tool called SOOT, which analyzes Java code and converts it into an internal representation called Jimple. It then builds the data dependence and control flow graphs, collectively called the program dependence graph. And it also provides a bunch of APIs to detect things in this graph, so it makes our life a lot easier to do the analysis using this tool. It also allows us to move things around -- statements can be moved around -- so we can actually transform these programs. So now we have, basically, transformations applied to the program.

And then the second part is to actually rewrite the query to make it set oriented. Luckily, in this case, the SQL Server optimizer saved the day here, because it's really good at decorrelating queries. So all we had to do was give it a query which essentially used cross apply -- actually the outer apply version of cross apply -- and it did the rest of the stuff for us. So that part is actually on SQL Server.

Okay. So here is an example of a program, a small program, and how we rewrite it. See if I can use this pointer. Okay. So here is a connection which has been somehow set up, and you are preparing a query which counts the number of partkeys from part for a particular category, which is provided as a parameter. And then there's the loop, which goes through a category list, stepping through its elements using next. It binds the question-mark parameter to the category, executes the query, gets the count, and then adds it to the sum. So it's doing a fair amount of work, but it's still a very small example.

So here is what we end up with when we rewrite it. This part is the same. Now here, you will notice there are two loops where there was one. The key thing here is to split the loop into two parts, where the first part doesn't actually execute the query. Well, we'll see that in the asynchronous case it actually does initiate execution of the query, but in the batch-oriented version, all it does is step through the category list and, after binding the parameter, say addBatch. That's all that loop does.

And then this single statement does the executeBatch, using the rewritten query, which as I said is very simple. And then the last part is iterating over the results of this query and adding them up to get the sum. Okay? Of course, this all has to be syntactic -- we don't understand the semantics -- and, of course, a lot of conditions apply; you can't always do this.

>>: In this particular case, can't you just issue one query?
>>: It's probably harder to decide that [indiscernible] [inaudible] you could push it as a union of categories and [inaudible].

>> S. Sudarshan: Right.

>>: You could have potentially done that [indiscernible]. It's harder to do it.

>> S. Sudarshan: So there are two kinds of things that could happen. One case is where the category list is actually a list which is in the programming language; it is not coming from the database. And here, what we are doing is: this addBatch doesn't execute the query, it's actually just collecting all the values, and then this executeBatch is doing the whole execution.

>>: Still execute [indiscernible].

>> S. Sudarshan: No. It's one query. It's a single query, which is a batch-oriented version of it. Basically, it takes the parameters and makes a temporary table in the database, and then that table is applied on the basic query. What comes back are the parameters which went in, along with the regular attributes which were in the query. And then that comes back, and this part actually steps over it.

>>: I guess what I was asking is you could have pushed the sum --

>> S. Sudarshan: Into the database, yes.

>>: Yes. If he -- [indiscernible].

>> S. Sudarshan: Exactly. So in this case, maybe we could recognize that we can translate this. But in general, we're doing arbitrary stuff here, all dependent and so on. So we can actually handle it if -- I'm not showing it here, but the way we do it, this second loop would execute exactly what the original loop did, in the same order, over here. And so if you bring stuff over, for example, it still works.

And, in fact, coming to the other thing: if this thing over here was actually iterating over a relation in the database -- this is actually another common case, where you have a query which iterates over the result of another query -- then we could potentially short-circuit this part, realize that this is already in the database as a relation, and use that relation. We could; I don't think our current implementation does it.

>>: [indiscernible]. What happens [indiscernible]?

>> S. Sudarshan: Okay. So if you -- well, in the paper, we discuss how we could take a procedure and rewrite it to a batched version of the same procedure, where instead of taking one parameter it takes a set of parameters, and we can actually rewrite the procedure itself. It's a new signature, in theory.

>>: [indiscernible].

>> S. Sudarshan: Right.

>>: [indiscernible].

>> S. Sudarshan: Right. So if the procedure itself [indiscernible] is then [indiscernible] by [indiscernible], this is a possibility -- we could do it. But currently we will not handle [indiscernible].

>>: So a lot of the latency issue is avoided by simply packaging this whole thing up into a procedure, [indiscernible] down to the database and letting that procedure perhaps execute unmodified?

>> S. Sudarshan: Yeah, it's a query. It's not a procedure. It's just a query.

>>: Yeah. So what you're adding here is somehow driving more of the execution into the declarative part of the query when you optimize, is that it?

>> S. Sudarshan: That too, because we are telling the database to do this query on a whole batch of parameters. The optimizer can now do something set oriented, which it couldn't have done otherwise. So that's the second benefit: the first one is the latency, and the second is this set-oriented execution. Okay.
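A minimal sketch of the shape of this rewrite, assuming a hypothetical BatchStatement wrapper: addBatch, executeBatch, and getNextResult here are illustrative stand-ins for the paper's API, not standard JDBC, and the loop context is omitted.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.List;

    class BatchingSketch {

        // Stand-in for the rewriting tool's batching API (not standard JDBC):
        // addBatch() only records a parameter binding, executeBatch() runs one
        // set-oriented query, and getNextResult() returns results in submission order.
        interface BatchStatement {
            void setString(int index, String value) throws SQLException;
            void addBatch() throws SQLException;
            void executeBatch() throws SQLException;
            ResultSet getNextResult() throws SQLException;
        }

        // Original pattern: one blocking round trip to the database per iteration.
        static int originalLoop(Connection con, List<String> categories) throws SQLException {
            PreparedStatement stmt = con.prepareStatement(
                "SELECT COUNT(partkey) AS cnt FROM part WHERE category = ?");
            int sum = 0;
            for (String category : categories) {
                stmt.setString(1, category);
                ResultSet rs = stmt.executeQuery();
                rs.next();
                sum += rs.getInt("cnt");
            }
            return sum;
        }

        // Shape of the rewritten program: the loop is split in two around a single
        // batched execution. The first half only collects parameters; the second
        // half consumes the per-parameter results.
        static int batchedLoop(BatchStatement stmt, List<String> categories) throws SQLException {
            for (String category : categories) {
                stmt.setString(1, category);
                stmt.addBatch();                 // no query executed yet
            }
            stmt.executeBatch();                 // one set-oriented round trip
            int sum = 0;
            for (int i = 0; i < categories.size(); i++) {
                ResultSet rs = stmt.getNextResult();
                rs.next();
                sum += rs.getInt("cnt");
            }
            return sum;
        }
    }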
So I'm going to discuss some of these conditions. But first, what is this example here? This is a slightly more complex example. If you look here, in this case, we have a category coming in. And now look at the last part here: it says category = getParent(category). So now this step is after the query execution, and if you split the loop at that point, we actually don't have the ability to get all the categories up front.

So what this example is showing is that you can't just take the point in the loop where the query is and split it there, because there is a dependency here which feeds back into it. So this illustrates two things. The first is that if there are dependencies of this kind -- what are called loop-carried flow dependencies, where this value flows into the next iteration of the loop back here -- then you cannot split the loop. That's the first thing this illustrates, and that's one of the required conditions for doing the loop split.

But the good news is that we can actually do something about it. There is no reason why this statement had to be here; we could have moved it up, just before the query. But then there's a slight problem, which is that the category on which the query was to execute is now getting clobbered. However, we can use temporary variables and work around this. We'll see an example.

Okay. So there are several steps in this transformation, through which I'll show you how to handle complex cases like this. The first step is, obviously, to identify which queries we want to turn into batched execution. In our current implementation we just look for database calls, and we could perhaps do it only if they are inside a loop and so on. But anyway, the programs are small enough that we can get away with this for now. That's the first step. The intention is to split the loop at this point. But like I said, there is a loop-carried flow dependency here, which feeds back into the loop condition and also into this statement, so we can't directly split it.

So we do a data flow analysis here to look for this condition, and the third step is to try to reorder statements to see if we can remove the loop-carried flow dependency. Now, there are certain cases where it cannot be done, in which case we have to abandon it and not split the loop. But in this particular case, what has happened is: we have stored the incoming category into a temporary variable, and we have moved that statement -- category = getParent(category), which was done down here -- up here, because there is nothing in between which depends on it. Well, the only thing which depended on it was this: the category was being passed as a parameter to the query. But what we have done is created a temporary variable so we can use that over there. So now, if you see, in this resulting loop there's actually no dependency going back. Yeah?

>>: [Inaudible].

>> S. Sudarshan: Yeah, so a lot of conditions apply there. So there are many conditions. One is that there are all these functions which could potentially have side effects, and if a side effect affected one of these things, then we could have a problem. So we can do interprocedural analysis to decide this. And if any of these ran another query which affected the result of this query, then again we have trouble. So to make sure there's no problem, we actually have to see what other queries are shot to the database inside here. So we could do static analysis; in our tool currently, you know, we are implementing that part -- it's not really there. So, like I say, it's faith-based optimization: we trust that this currently doesn't cause problems.
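A rough sketch of the reordering just described, with getParent and the loop condition as illustrative stand-ins for whatever the application actually does; the value of category that the query needs is captured in a temporary so that nothing after the query feeds back into the statements before it.

    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    class ReorderingSketch {
        // Hypothetical stand-in for whatever computes the parent category.
        static String getParent(String category) { return null; }

        // Before: category is advanced *after* the query, so its value flows into
        // the next iteration's query and loop test -- a loop-carried flow
        // dependency that reaches statements before the query. No split possible.
        static int before(PreparedStatement stmt, String category) throws SQLException {
            int sum = 0;
            while (category != null) {
                stmt.setString(1, category);
                ResultSet rs = stmt.executeQuery();   // intended split point
                rs.next();
                sum += rs.getInt("cnt");
                category = getParent(category);       // feeds back into the next iteration
            }
            return sum;
        }

        // After reordering: the advance is hoisted above the query, and the value
        // the query needs is kept in a temporary, so no loop-carried flow
        // dependency crosses the split point any more.
        static int after(PreparedStatement stmt, String category) throws SQLException {
            int sum = 0;
            while (category != null) {
                String temp = category;               // preserve the value the query uses
                category = getParent(category);       // moved up; nothing in between depends on it
                stmt.setString(1, temp);
                ResultSet rs = stmt.executeQuery();   // loop can now be split here
                rs.next();
                sum += rs.getInt("cnt");
            }
            return sum;
        }
    }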
Yeah?

>>: So one thing is the extra conditions on the program. The other thing is on the data. For example, I wonder, if the query is a slightly more complicated query, whether these transformations are still fully semantics preserving. They are syntactic; are there no issues with [indiscernible]? [indiscernible] you wouldn't know on the database side.

>> S. Sudarshan: I think in this case, that problem will not arise, because the definition of the apply (the lateral cross join) is: for each value generated by the left side, you execute the query on the right side, and the right-side query is the original query. So assuming the SQL Server optimizer does it right, and I do believe it does it right, we should not have any problems with values. But you're right, if we tried to do the decorrelation analysis ourselves, we could run into trouble. So this is the reason we currently work only on SQL Server; the other guys don't fully implement this. Okay, I think Sybase does support some sort of lateral join, so we were trying to implement it on Sybase, but it's not yet functional, as far as I know.

Okay, so in this case, after this reordering, the loop-carried flow dependency is gone, and there are a few other conditions that determine whether you can split the loop at that point. I won't get into all the details. But the last step is how to split the loop. In the earlier program, I sugar-coated a lot of stuff to make it easier to understand what was going on, but this is what the transformation actually does. So the first loop did something; then we executed the query. Now, the second loop should do exactly the same set of iterations as the first loop did. Of course, what happens inside the loop is different. But if there's a variable which is defined in this loop and was used down here, then we had better preserve the value of that variable, so that when the same iteration of the loop is done over here, it sees the same value for that variable. So once we split the loop, we have to make sure that variable states are saved and then restored down here. Okay?

So in order to do that, we create this loop context table, which has entries -- one entry per iteration of the first loop. And the second loop simply goes over the same entries in the same order in which they were created. So it's really more of an [indiscernible] than a [indiscernible].

So what we do here is create a context. And in this case, let us see -- I think there is no variable that needed to go into the context. So the context was basically used because the original query execution here -- well, let me go back to the first slide to show the original query. Okay. So if you see here, in our earlier program I sugar-coated it by saying the query just returns one value. But in general, the query may return multiple rows, and you might actually be iterating over the values here. In this case, you know, we have removed a little bit of the sugar coating, and we say results dot next, because there's just one row, but it's getting that. But if you had a loop over the results, we have to make sure that in the second part of the split loop, we will be able to execute the same thing over there and get all the results of one particular query invocation. Okay. So that's part of what a loop context can do here.
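The loop context idea can be pictured roughly as follows; LoopContext and the per-iteration entries are illustrative, not the tool's actual classes. The first half of the split loop appends an entry per iteration, and the second half replays them in exactly the same order.

    import java.util.ArrayDeque;
    import java.util.Queue;

    class LoopContextSketch {
        // One entry per iteration of the first half of the split loop: it carries
        // the values of variables defined before the split and used after it,
        // plus whatever identifies the corresponding query submission.
        static final class LoopContext {
            final int iteration;
            final String category;     // example of a variable that crosses the split
            LoopContext(int iteration, String category) {
                this.iteration = iteration;
                this.category = category;
            }
        }

        // The "loop context table": entries are appended in the first loop and
        // consumed in exactly the same order by the second loop.
        final Queue<LoopContext> contexts = new ArrayDeque<>();

        void firstHalf(java.util.List<String> categories) {
            int i = 0;
            for (String c : categories) {
                // ... bind parameters, addBatch() / submitQuery() here ...
                contexts.add(new LoopContext(i++, c));   // save per-iteration state
            }
        }

        void secondHalf() {
            while (!contexts.isEmpty()) {
                LoopContext ctx = contexts.poll();       // same order as created
                // ... fetch the results for ctx's iteration and run the rest of
                //     the original loop body using ctx.category ...
            }
        }
    }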
So when we are doing our addBatch, we are setting a context: which iteration of the loop this was for is provided by the context. And here, we're saying the results are equal to statement [indiscernible] on the context, so we'll get exactly the same results that the corresponding iteration over here would have got. And then the rest is the same as before. Okay. Any questions?

So the last part is how to do the batch rewriting. This is the SQL Server syntax. Over here is the original query. We have created a batch table, and we have inserted all of the parameter values into the batch table -- we can use the batched form of the database insert statement so that we don't do multiple round trips; we do only a few round trips. And then here, we select from batchtable1 -- those are the parameters -- and all attributes of the query, from batchtable1 OUTER APPLY, and then over here is the query. And finally, we order by loopkey1 so that the output comes back in the correct order, so we can just go through it in sequence; we don't have to go back and forth in it.

>>: [indiscernible] doesn't matter, though, right? This final ordering.

>> S. Sudarshan: So this ordering is on the loop key, not of the query. If the query had an order by -- well, then what would happen? You'd have [indiscernible] order. So we'd do an order by loop key, and then within the loop key, whatever order was there for the original query. So the point is that many queries are issued, and we want the results to come back in the same order in which the queries were issued, because there is a loop there -- maybe it prints things out, maybe it doesn't. So the results have to come back in the same order.

>>: I was just thinking that there might be some loops, though, where that doesn't matter. Where you wouldn't have to do that.

>> S. Sudarshan: Yes, so that could be an optimization where we turn it off. But unless we know for sure that the order doesn't matter, this is safe.
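Roughly, the generated SQL Server statements have the following shape; the names batchtable1, loopkey1, and param1 follow the slide's naming and are placeholders, not the exact text the rewriter emits.

    class BatchSqlSketch {
        // Temporary table holding one row per loop iteration: the loop key and
        // the parameter values bound in that iteration.
        static final String CREATE_BATCH_TABLE =
            "CREATE TABLE batchtable1 (loopkey1 INT, param1 VARCHAR(50))";

        // Populated with a batched INSERT so only a few round trips are needed.
        static final String INSERT_PARAMS =
            "INSERT INTO batchtable1 (loopkey1, param1) VALUES (?, ?)";

        // The set-oriented form: the original parameterized query appears on the
        // right of OUTER APPLY, correlated to the parameter column, and the final
        // ORDER BY returns results in the order the iterations were issued.
        static final String BATCHED_QUERY =
            "SELECT b.loopkey1, b.param1, q.cnt " +
            "FROM batchtable1 b " +
            "OUTER APPLY (SELECT COUNT(partkey) AS cnt " +
            "             FROM part WHERE category = b.param1) q " +
            "ORDER BY b.loopkey1";
    }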
Okay. So that gives an overview. Like I said, I didn't go into the details of all the conditions, but we have those in the 2008 paper, and the more detailed version is there in Ravi's Ph.D. thesis. Okay.

But there are some limitations. Well, a limitation is an opportunity for a new paper: anything which you didn't do in the first place, you get one more paper out of. So that's what we did here, and I'll tell you some of the limitations -- so this was a paper published in ICDE. Well, that also has some limitations, which will hopefully be our next paper. Okay.

So the first limitation is that we were doing batching, and there are obviously many interfaces which don't give you the ability to do a batched query, so we have to do asynchronous submission; that's the only way for those. The second problem is that for certain programs, the query may actually vary across iterations, and so then what query are you batching? Batching assumes that across the loop, the query is fixed. Sometimes people add a few selection conditions depending on what parameters came in, and then the queries differ, so we cannot do batching as is. Then there are some inter-statement data dependencies, by which I mean we can't actually split the loop, so maybe we can't apply our transformations. And finally, like I said, even though we may not be able to batch, the multicore processing power on the client can be used to issue a number of queries in parallel and also fetch them back in parallel, so at least whatever work the client has to do can be parallelized, and whatever the server does can be parallelized, even if it's not set oriented. So basically, we exploit asynchronous query submission.

So in order to do this -- by the way, the original batching transformation did a whole lot of low-level calls and added a lot of [indiscernible] in there. So one of the problems was that when we saw a rewritten program, we couldn't understand what it was doing. You know, like I said, it's supposed to do the right thing, but we don't trust our programs that much; if it does something wrong and then you run it, well, you won't be happy. So what we ended up doing is building an API, so that a transformed program uses the API and is a lot easier to read. The programs that I showed you actually are based on that API; the original rewriting was actually much harder to read. The other thing is that once you have this API, if you don't trust our rewriting, you can still use the API: whatever we would have done automatically, you can do manually and get the same benefit. And finally, there's an improved set of transformation rules, including reordering and so on; I won't be getting into all of that.

So what we are doing in this case is to have a whole bunch of taxis.

>>: So I have one question.

>> S. Sudarshan: Yeah.

>>: It seems it should be very hard to argue that any [indiscernible] that you have [indiscernible], if you, for example, use [indiscernible] or if you use [indiscernible]. It seems --

>> S. Sudarshan: Yes.

>>: How would you find all that, you know, that you could have [indiscernible]?

>> S. Sudarshan: So we are assuming that the procedures which you call are not going to have side effects. Because, yeah, we are reordering the way in which you do things. If you have a procedure which was in the first part of the loop and a procedure in the second part, earlier they would run in lockstep: procedure one, procedure two, one, two, one, two. Now it's one many times, then two many times. So absolutely, if they have side effects, the whole thing will break.

>>: No, but my question was [indiscernible] won't have the side effects.

>> S. Sudarshan: So there are two parts, right. If you do a full-fledged interprocedural analysis, we could actually see what those procedures do and make sure they don't have side effects. The current implementation doesn't do that fully. The reason we don't do it is that, at least with the tool we are using, it's very slow. The problem is it doesn't just look at our procedures; it goes deep into all the system libraries which are there and starts analyzing all the libraries, which is crazy.

>>: Right.

>> S. Sudarshan: So we need a better tool which will just look at our procedures, and for the system things, you know, a contract about what side effects each has or doesn't have. So really it should work at that level, and then it would be efficient, because the time spent on our procedures here is not much; the problem is that it goes into all the libraries. So the current implementation doesn't actually do that. We are assuming that whatever procedures are there are side-effect free. But the remaining part, the dependencies within the loop, is what we are actually making sure is okay. Okay.
So the motivation is obvious: asynchronous submission can improve performance, and it is, in fact, widely used. Ajax is very widely used. It's also true that it's very hard to program in Ajax. It's okay for simple stuff, but if you want to do more complex things, you need a bunch of very smart programmers, and your average application programmer is not that smart. So one of our goals, in addition to what we have been doing, is to take an application which would run at a client -- you know, tablets or whatever, [indiscernible] or [indiscernible], so JavaScript or whatever other language the application is in -- and turn synchronous calls into asynchronous calls. We're not there yet; we would like to do it.

Okay. So asynchronous calls -- you all know what this is; this slide is for a different audience, so I'm going to skip it. Yeah?

>>: Has this been [indiscernible]? Asynchronous?

>> S. Sudarshan: Yes. So the programming -- actually, the web services community has certainly studied this; obviously, it is important for them. But whatever work we have seen has focused on straight-line code. So if you have a sequence of calls, then they would do the asynchronous submission ahead of time for that. But if you had a loop, then whatever techniques we've found do not work. Database people tend to write loops over some data and execute a number of queries, and that is something which the web services people somehow have not paid attention to. But for the straight-line case, indeed, there's been work from quite a while back -- 2003, 2004 even, there's been work.

Okay. So the ICDE paper had the following contributions. Like in the batch case, it automatically transforms the program. There is a statement reordering algorithm which is applicable both to the batch case and to the asynchronous case. In the earlier paper, we didn't have any guarantees; it was best effort. Later on, we developed an algorithm which could guarantee that wherever a reordering was possible which would allow us to split the loop, it would, in fact, do it. So there's a corresponding theorem, in detail in Ravi's thesis and briefly mentioned in the paper; I won't get into it. Then there's the API, as I said, and we also talk about some of the design challenges involved in making this happen. Yeah?

>>: [indiscernible] correcting the inefficiencies that the programmer may have? For example, he may not have added a [indiscernible] limit in his query, but you can look at the program and realize that he really only wants to [indiscernible].

>> S. Sudarshan: That's a nice idea -- thanks for [indiscernible]. It would be -- yeah.

>>: [Inaudible].

>> S. Sudarshan: So here's basically the same program as before. This time, what we have done is we have a handle for queries which have been submitted. The handle is actually used to fetch the result later, and what we do here is, instead of adding the query to a batch, we do a submitQuery over here, and the submitQuery immediately returns a handle, which we save in this handle array. And then the second part of the loop simply goes over the different handles, does a fetchResult on each handle, and then finishes up the loop body. Again, this is simplified to work for this program; the actual rewriting doesn't look quite like this -- we do have the loop context and all that in there. But conceptually, this is what happens. So conceptually, the API has execute -- sorry, execute is the blocking one -- which is split into a submit and a fetch.
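The asynchronous form of the split loop looks roughly like this; QueryHandle, submitQuery, and fetchResult are stand-ins for the API being described (the real rewrite also carries the loop context, omitted in this sketch).

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    class AsyncSketch {
        // Stand-ins for the asynchronous submission API described in the talk.
        interface AsyncExecutor {
            QueryHandle submitQuery(String sql, Object... params);          // enqueues, returns at once
            ResultSet fetchResult(QueryHandle handle) throws SQLException;  // blocks until ready
        }
        interface QueryHandle { }

        static int sumCounts(AsyncExecutor db, List<String> categories) throws SQLException {
            String sql = "SELECT COUNT(partkey) AS cnt FROM part WHERE category = ?";

            // First half of the split loop: submissions go into a queue that a
            // pool of worker threads drains; each call returns a handle immediately.
            List<QueryHandle> handles = new ArrayList<>();
            for (String category : categories) {
                handles.add(db.submitQuery(sql, category));
            }

            // Second half: block on each handle in submission order and finish
            // the original loop body.
            int sum = 0;
            for (QueryHandle h : handles) {
                ResultSet rs = db.fetchResult(h);
                rs.next();
                sum += rs.getInt("cnt");
            }
            return sum;
        }
    }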
Yeah?

>>: [Inaudible].

>> S. Sudarshan: Right. So that is a parameter which we can control as part of the configuration. The submitQuery actually doesn't immediately go and send the query; it simply adds it to -- I think I have a picture here. Yeah. So the submitted query is simply added to a submit queue, and then there's a bunch of worker threads over here, and we can control how many there are. I believe the current version of [indiscernible] actually has an asynchronous query submit; we do not use it as of now -- our tool is JDBC based, so we don't have it. But if we did this in the .NET framework, we could perhaps have avoided all this and used the asynchronous submission. Okay. So each of these threads is synchronous: it blocks until it gets a result back and then puts it into the result area, and then it's fetched over here.

Okay. So what do we have here? This is the same thing as we saw before. The challenge is the same as before: complex programs with arbitrary control flow, data dependencies, loop splitting requires variable values to be stored, and so on. So these are all the same problems that we had before; I just didn't explicitly list all of them before.

But let me say a little bit about what we do with some of these. For the data dependencies, I told you, there are conditions under which we can split. The second issue is, what about control dependencies? What if there is a query which is conditional in there? How do we handle it? So we use a fairly standard trick, which is to convert anything which is inside an if into guarded statements. We have a variable which stores the result of the if predicate, and then each of the statements within the then part is guarded by that variable being true, and the else part is guarded by it being false. So we have these guarded statements, and the control flow is basically gone. Conceptually, of course, we could skip the else part if we're doing the then part, but in this case we pretend that all the guards are actually evaluated. When we finish our transformation, there is a second stage where we take the Jimple code and get it back into Java, and at that point we actually go back and create the if-then-else again. So the final program actually hides all those details.

Okay. So we give here a few of the rules. Now, there were similar rules for batching, but I will focus on the rules which we use for the asynchronous part. The first one is the equivalence rule for loop splitting. The second is to convert control dependencies to flow dependencies -- this is the one I told you about, where the if-then-else can be turned into guarded commands. And then rules C1, C2, and C3, which are reorderings of statements. Again, I won't give all the details, but some of these generalize the batching rules and some of these simplify the batching rules. So I'm going to skip the details.
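The conversion of control dependencies into flow dependencies can be sketched as follows (an illustrative Java fragment, not the tool's Jimple output): the if predicate is captured in a guard variable, each branch statement is guarded by it, and the decompilation stage later turns the guards back into an if-then-else.

    class GuardedSketch {
        // Hypothetical callee used only to make the fragment self-contained.
        interface Dao {
            void runQuery(int id);
            void logSkip(int id);
        }

        // Original shape: the query is control-dependent on the if.
        static void original(boolean expensiveCustomer, Dao dao, int id) {
            if (expensiveCustomer) {
                dao.runQuery(id);        // executed only on the then-branch
            } else {
                dao.logSkip(id);
            }
        }

        // After conversion: the predicate becomes data (a guard variable), and
        // every statement is guarded by it, so there is no control dependency
        // left -- only a flow dependency on g. The decompilation stage later
        // turns the guards back into an if-then-else.
        static void guarded(boolean expensiveCustomer, Dao dao, int id) {
            boolean g = expensiveCustomer;   // guard variable
            if (g)  { dao.runQuery(id); }    // guarded statement: then-part
            if (!g) { dao.logSkip(id); }     // guarded statement: else-part
        }
    }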
Yeah?

>>: So one thing that you've been showing in the [indiscernible] is scale, right, because I think it's possible that you [inaudible].

>> S. Sudarshan: Yes.

>>: And how you control that, like why [inaudible].

>> S. Sudarshan: That's a good question. So if we reuse the same variable in the next loop, we will actually use a context object which stores the old value of the variable and keeps it around until the second loop. So certainly, there's an increase in the state. But the thing is, how much will this blow up, right? If you had a thousand iterations of the loop, you have, you know, a thousand-fold blowup of the state. Typically, what we have seen in these programs is that the state is just a few bytes. If you have very complex state, then that could be trouble -- for example, if you had data structures which are being updated and so on, and which are also used in the second part. The thing is, if a variable is not used in the second part, we don't have to save its state; it's only for the things which cross the boundary. So if you have complex data structures which cross the boundary, then we are in trouble, and we won't actually split the loop in that case. But as long as it's simple variables whose values we can save and restore, we do it. Now, my contention is that the number of iterations of the loop is not going to be very large. If it were, let's say, 10 million, your program would never have executed -- try doing 10 million round trips to a database. So it's not something you need to worry about.

>>: No, but it could be an issue in the [indiscernible].

>> S. Sudarshan: Yes. Yeah. So I think in this case, since we can control what is sent and when it comes back, that can be under the control of the API. The submit part simply spews out the whole thing. So if the [indiscernible], we can actually stop sending the queries -- we don't implement it, but there's no reason why we can't do this: we stop sending queries at some point, wait for the things submitted earlier to be consumed, and then send more.

>>: Isn't that true for any of the loops -- you could simply [indiscernible] iterations of the loop and sort of package things up in batches in order to avoid having to materialize?

>> S. Sudarshan: Yes, that is true. So we could -- the rewriting would be a little more complex; you'd have an outer loop and then an inner loop per mini-batch. So we could do it.

>>: [indiscernible] variables remain the same [indiscernible] arbitrary changes that are happening in the loop [indiscernible].

>> S. Sudarshan: It's possible; we have not excluded it. Yeah, if you realize that a variable is simply a counter, then we don't actually have to save its state. So those are optimizations that are possible but not currently implemented.

Okay. So this is -- well, I actually pasted two decks together, so there's a little bit of repetition between the batch part and this. I think this particular slide is the loop-carried dependencies, so I am going to skip this part of the slide. But the thing to notice is that when we did the earlier paper, the reordering was not complete. It just said: you can move this, and if that resolves the condition, then you can do the rewriting. But there was no specific algorithm to say what you should move, how you should do it, and in what order you do the reordering. So one of the contributions in this paper is an algorithm that decides when you can move something and, among the candidates, which one to move. And it actually does this iteratively, until either the condition for splitting is satisfied or it cannot move anything more.

So this is an example of the dependency graph -- a little bit of an inside view of what happens. These are the statements corresponding to this program over here. Now, again, this is a Java program, so the statements here are Java statements -- lines. But that's not [indiscernible].
We have something closer to byte code, and those are the statements there. But sticking to the Java statements, we are treating S2 as these two together, and then S3 and S4.

So let's look at some of the dependencies. The black ones here are the flow dependencies: S2 defines a variable here -- the variable count -- and S3 uses it, so that is a direct flow dependency. Then there are other kinds of dependencies. There are anti-dependencies: over here, there's an anti-dependency from S1 to S4, because S1 is reading something and later S4 writes to it. Then there are output dependencies. The dashed ones are loop carried, which means they go across iterations of the loop. So over here, the assignment to category is clobbered by the next iteration, which also assigns to the same variable. That is an output dependency, but it is loop carried because it happens in the next iteration, so we have a dotted red arrow here. Similarly, if you see here, category is assigned here, and then in the next iteration, the value of category that is read is whatever was assigned in the previous iteration. So there's also a flow dependency which is loop carried, from S4 to S4. So these are the kinds of things we put in the graph. I'll skip all the minor details. But finally, you'll see that from S4 to S1 there is a loop-carried flow dependency which goes from the second part of the loop back to the first part, because this assigns it and that reads it. Those are the ones which prevent splitting, and that's what we get rid of by reordering.

And we have already seen this particular example of creating a temp variable; I'm going to skip that slide. But let's see the same thing in terms of what happens to the graph. What we have done is added a new statement, S5, which is the temp assignment over there, and S2 has been rewritten to use the temp. So over here, there's a loop-carried flow dependency from the second part to the first part. Over here, after doing this reordering, you will notice that the place where we want to split is this: S2 is over here, this is the execute query, and we want to split the loop into a part that is before it and a part that is after it. And here, there is no such dependency going back. And that's why we are able to split the loop.

Okay. So here is the statement reordering algorithm. It takes as input a blocking query execution statement and the basic block, which is the loop itself. And wherever possible, it reorders the basic block such that no loop-carried flow dependencies cross the split boundary at the query statement Sq, and, of course, program equivalence is preserved. That's the formal definition, a statement of what it does.
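In graph terms, the condition being checked can be sketched like this (a simplified model, not the actual SOOT-based representation): the loop can be split at the query statement only if no loop-carried flow edge runs from the query or a later statement back to the query or an earlier one.

    import java.util.List;

    class DependenceSketch {
        enum Kind { FLOW, ANTI, OUTPUT }

        // One dependence edge between two statements of the loop body, with
        // statements identified by position in the body (S1 = 0, S2 = 1, ...).
        static final class Edge {
            final int from, to;
            final Kind kind;
            final boolean loopCarried;
            Edge(int from, int to, Kind kind, boolean loopCarried) {
                this.from = from; this.to = to;
                this.kind = kind; this.loopCarried = loopCarried;
            }
        }

        // Splitting is allowed at position 'split' (the query statement) if no
        // loop-carried flow dependence goes from the query or anything after it
        // back to the query or anything before it in a later iteration.
        static boolean canSplitAt(int split, List<Edge> edges) {
            for (Edge e : edges) {
                if (e.kind == Kind.FLOW && e.loopCarried
                        && e.from >= split && e.to <= split) {
                    return false;   // e.g. category assigned in S4, read by S1/S2
                }
            }
            return true;
        }
    }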
And again, there are a lot of details here; I will probably not get into all of them, but I'll just give you an idea of what we are trying to do. Basically, we want to find a statement which we can move to some other place in order to get rid of that loop-carried flow dependency, which is the thing that's not good. So the first step is to identify statements which we want to move. In these two cases, there's a variable V1 with a loop-carried flow dependency to this, and in this case, again, there's a V1 which is over here, which feeds back over here. The second set of cases is where the loop-carried flow dependency is from the query itself to something earlier, or from the query to something later which, in turn, feeds back. So these are the various cases, and we move different things around in the different cases; we won't get into all the details.

So the thing is to decide what to move in each of these cases. And then we have to see what other statements depend on the one which we want to move, because if we move something, something else may get affected. So we have to move it carefully; in some cases we can move a set of statements together. So in this case, we identify everything which is dependent and move all of those past the target. The dependent statements are moved past the target first, and then the statement itself is moved past the target. So this is the last step: once we have moved the statement past, we can now split the loop.

Okay. So a true dependence cycle in a data dependence graph is a directed cycle made up of only flow dependencies and loop-carried flow dependencies. And the theorem is that if a query execution statement doesn't lie on a true dependence cycle, then the reordering algorithm is successful in moving things around. So this was a guarantee which we didn't have in the earlier paper; in the ICDE paper, we have the algorithm which guarantees this.

The good thing is that pretty much all of this is applicable to both batching and asynchronous submission. And, in fact, in our API we have this one API, and then we have a flag somewhere which says do it as a batch or do it asynchronously. The API looks a little weird -- we say addBatch when, in the asynchronous case, it actually goes and executes the query -- but the nice thing is that the transformations are identical and the API is identical; it's just a flag whether you want to do batching or this. Of course, in certain cases you cannot do batching, in which case the flag is set to asynchronous only.

Okay. So that is a quick view of the kinds of things we do to rewrite the program. And this is the overall flow of what we do. We take the source Java file and parse it -- well, we use the SOOT framework, which does all of that: it converts to the Jimple representation and does the data flow analysis, the def-use representation, the dependency graph; all of this is done by SOOT up to here. This part is what we do: apply the rules to move things around. The thing is, once we move a statement, everything changes -- the dependency graph itself changes. So after we do any such move, we have modified the Jimple code, of course, so we have to do the data flow analysis again so that the dependency information is correct after the move, and then we can apply more transformation rules. Once we decide we are done, we decompile and give the target Java out.

So this API, like I said, can be used in auto-generated code, or it can be used manually. There is a loop context structure in the API which makes it easy to remember all those variables which were defined in the first part of the loop and used in the next, and it's the same API for batching and asynchronous submission. So this was demoed at ICDE also.

So that is a quick overview. Now let's move to the performance -- whether all of this is worth it. For the batching performance, there are a number of results in the earlier paper; I won't use all of them, but one or two of those are presented here. So what did we use for our evaluation? There were two public-domain benchmarks and two real-world applications, for one of which the company was having actual performance problems in a real application. They had built it in a modular fashion, okay.
Everything is an object, and to give a stock option to an employee you have a set of procedures which deal with one employee. Now, stock options are generally given as a batch: you give them to a number of employees. And it turned out that in their application, they used a really expensive computer with, you know, plenty of memory and everything, and in spite of all that, they had a window in which to process the stock grant, and they were running out of that window. So they came to us, and it worked out well because we had already been working on this problem. So that was a good connection.

>>: Did you include yourself among those who got the stock? [Laughter].

>> S. Sudarshan: No, unfortunately. [Inaudible]. Actually, that company turned out to be trouble. Once we gave them the idea, they said thank you and went away. So the idea was that we would work with them for a while and develop it, but I don't know what they did -- whether they used our idea and said bye-bye, or whether they decided not to use our idea. I have no idea. Okay. And this one, as I already told you, was something developed in this area.

So we used a dual-core machine. Even with dual core, we are actually getting a lot of benefit from having multiple threads. There were also some experiments on PostgreSQL -- the asynchronous part we can do on PostgreSQL; the batching transformation is a little trickier. So we looked at the impact of various things, such as iteration counts, the number of threads, and warm versus cold cache, since IO is a big issue. One of the things we thought was that if we increase the number of threads, there would be more random IO happening, which would destroy the normal sequentiality of a single execution, so we expected performance to actually become much worse. But surprisingly, you know, it didn't become much worse with either of these systems.

>>: [Inaudible].

>> S. Sudarshan: Yeah, for [indiscernible]. So actually, this is what happened first: we said, okay, let's have a query which does a lot more work -- and [indiscernible] it didn't matter. So I think the databases are fairly good at controlling the load internally.

Okay. So here are the numbers. There are two sets which are the cold cache numbers and two which are the warm cache numbers. This is SQL Server. If you go here, to the original program with four iterations, you can see that the transformed program is actually running worse with cold cache, and with warm cache the difference is even more -- you can't even see the original program down here. So the bottom line is that when the number of iterations is very low, the batched -- well, which one is this? I think this is the batched one. No, sorry, this is the asynchronous one, the one with the threads. So it actually becomes worse, potentially. But if you see, the time was actually very small anyway, so a ten-fold increase doesn't mean very much when it's ten milliseconds. But as the number of iterations increases, you can see that the transformed program -- like here, this is cold and this is warm -- was like nine seconds, when the original was 50. And in this case, it was 5.9 seconds, when the original was 46.4 seconds. So the improvement is substantial when the number of iterations is larger. And this was with ten threads.
So wonder whether you can -- it introduces a new estimation or prediction problem. Based on the program, not on the database. Would it be on both? Could you try to predict whether [indiscernible] on the right or whether you are in the other case on the left? >> S. Sudarshan: Yeah. So Ravi is actually working on that. I have not been involved in that part. He's been working with some of the programming language people to try to do static estimation of the number of loops that you would have. It is static or maybe dynamic based on previous ones. cost-based changes. So we need to do some The other thing is I'm not sure that this decrease in performance is because the database is certainly much more inefficient. It may just be that [indiscernible] of setting up an asynchronous call and fetching it. So it may not have any impact at all on the database. It only impacts the ->>: [indiscernible]. >> S. Sudarshan: >>: Yes. [indiscernible]. >> S. Sudarshan: Absolutely. Okay. Now, this one is thing impact with number of threads with one thread -- well, we are at 46.4. And the time decreased sharply. Starts levelling off after five. It improved up to somewhere around here, 30 or 40. And then it started increasing again. So experiments were done with ten threads. Could have been slightly better maybe with 30 threads. Or maybe some of that impact. But for the four iterations, it wouldn't have matters. 30 threads or four. It's all the same. >>: [Inaudible]. >> S. Sudarshan: Hm? Doesn't matter whether 24 >>: How many processors? How many processors? >> S. Sudarshan: So the database server, I think, was a dual core. That is here. 64 bit dual core is the database server machine. And the client -okay, this doesn't say, but I think the client was just a single core. >>: [indiscernible]. >> S. Sudarshan: hyper threading. Yeah, I think it had. That's, yeah, I'm pretty sure it had So it probably would equal into four cores, at least. Okay. That slide is done. Now, this is web service, where we coded this manually, because our code does not actually recognize web service calls. And again, here something which took almost 180 seconds, whether the database was free base with a web service API. And after about 95 threads, you can see there's improvement. They start levelling off after this. So there's a lot of potential for this. Now, what about batching versus asynchronous. If both are applicable, how do they compare? Sometimes only asynchronous is applicable so then this is not relevant. And if you see here, the first is original, the second is asynchronous, and third is batching. And as you can see, batching, whether it's applicable, it pretty much outperforms asynchronous. It's fairly clear, the number of messages you send over the network is reduced. The database can use a better plan so you should use batching if at all it is applicable. But if it's not, asynchronous still gives a substantial improvement. Okay. So that completes the talk. There are many directions for future research. The one which we are currently focusing on is this. So whatever I showed you was a query in a loop. Now, this works for certain applications, but there's a whole class of applications where the query is deep inside a procedure. So any application which uses, say, hibernate framework, it hides the SQL underneath. You just see an object model and you [indiscernible] on the object. And deep inside, it's either doing a SQL query or it is looking up something that is already cached. It's a [indiscernible] object or it's looking it up. 
The second direction is which calls to transform. As Jared was saying, we need to figure out how many iterations there are and then decide whether to split or not. Minimizing memory overheads -- there is certainly scope for some optimization there. How many threads to use -- our experiments showed something, but is that always the case? Maybe it depends on the load of the database server: if it's already heavily loaded and you start throwing a lot more work at it, you may be causing trouble, and then you're not actually consuming the results immediately. So can we control this in a slightly more sensible manner?

And the last part is actually quite interesting: transactions, which is work with David and Jared here. This is a big issue. We swept several things under the carpet. The first thing that we swept under the carpet is that, even for the simple read-only case, each of these threads opens a fresh connection to the database. Now how do you guarantee that all of these are running under the same transaction? It turns out that in theory, you can use the [indiscernible] interface to make all of these part of the same actual transaction. It also turns out that many databases don't actually support this feature, so it's a bit of a problem. Where it is supported, we can use it.

>>: For some of these things you might, in fact, want to have separate transactions, and not have the database run long transactions but rather run a bunch of short transactions instead.

>> S. Sudarshan: Yes. So if that is what you want, then we are already doing it. We are ignoring the effect of taking what used to be one transaction on one connection. But if you use auto-commit and each query was independent, then there's no issue at all.

>>: You need to preserve the semantics.

>>: Oh, sure, of course, [indiscernible] several transactions it's the same effect.

>>: But most of the so-called default transactions are a transaction per statement, which [indiscernible] another transaction that's involved, unless you do something explicit.

>> S. Sudarshan: So that case, you know, we do handle, as long as it's read-only. If you do updates asynchronously and things are [indiscernible], you're in deep trouble. So we obviously don't issue updates asynchronously.

>>: So it's safe to turn a bunch of disparate transactions into one big transaction, but it could have some bad effects on [inaudible].

>> S. Sudarshan: Yeah.
So, since we have access to the API, we can easily figure out whether your original program ran under a single transaction. In that case, we take all these connections and try to shoehorn them into one transaction. But as I said, it appears not to work quite right on the databases we've tried, so we've not been able to get it running so far. So with [indiscernible] that's [indiscernible]. With snapshot isolation, for the read-only case, it seems like a natural thing to say: here are all these connections, let them all use the same snapshot. Now, if the database decided to support this, it would be completely [indiscernible]; it's [indiscernible] for the database to do it. But we need an API from the database to allow this.

Okay. So that's it. Any other questions?

>>: So one of the things -- [indiscernible] background in compilers and optimization, and I think an interesting question is to what extent the kind of program transformations you're doing might be of use whether or not you've got a database call of some sort down at the bottom of the loop. What sort of conditions can you put on things, and what sort of generality can you get, to in fact obtain program transformations which might apply to other scenarios as well as the database scenario?

>> S. Sudarshan: That's a good question; it's indeed something we talked about. The first concern was that maybe all of this had already been done in the programming language community, and we could just use it and substitute, you know, substitute the statement by a JDBC call. It turns out that these transformations have very high overhead -- setting up the loop context and so on is very high overhead. This is not something that any compiler writer would put in in order to [indiscernible]. So most of the work we do makes sense only if whatever you're doing in the loop is expensive. So that is something they have not done. But the specific analysis for loop splitting and so on has some similarities with, you know, parallelizing compilers, where you want to take something which has multiple iterations and turn it into a parallel execution. So some of the analysis is very similar to work that happens in parallelizing compilers.

>>: Any other questions?

>> S. Sudarshan: Thank you.