>> Ravi Ramamurthy: Okay. Good morning, everybody. It's a pleasure to welcome Sudarshan. Most of you already know him. He's been a visiting researcher at MSR before. He's a professor at IIT Bombay, which has an excellent database research group. He's a co-author of a very popular undergrad textbook, and his research areas include query optimization, keyword [indiscernible] and databases. And recently, he's been working on something called holistic query optimization, and he's going to talk about that today. So I'll hand it over to him.
>> S. Sudarshan: Thank you, Ravi. So this work was done by -- it was started by my Ph.D. student, Ravindra Guravannavar, or Ravi, as we all call him. And it is being continued by Karthik, who is a Ph.D. student currently working on this topic. So if there are any really hard questions, I'm going to ask you to e-mail them. But hopefully, it should be okay. And then several Master's students contributed, including Mahendra Chavan and a few others.
Okay. So what is the problem? So here you have a bunch of people waiting for a taxi. There's a long line. And in our context, the taxi is a call to the database system. So you execute a query, you fill the taxi, start over. It takes a long time to go there, a long time to come back, and then the next guy. There's one taxi in town. So it's a very bad idea to live this way.
And why would you, you know, why would anyone do it this way? Because there are many situations where people code loops with queries [indiscernible]. Sometimes it's because a programmer didn't know better, and it would be easy to change. Sometimes it's actually hard to change, because the query is not directly in the loop but it's deeply [indiscernible].
So it's useful not only for databases; the same problem arises even with web service requests. Our implementation has focused on databases, but the ideas presented here should be applicable to web services also.
So naive execution of iterative queries is obviously inefficient, as we said. The latency is a huge factor. In addition, what we have seen is that with multiple CPUs, the database can actually handle a lot of queries concurrently, especially if their data is [indiscernible] in cache, so there's not much disk IO.

And we were able to pump, you know, ten-odd queries into the database, and the response time, you know, hardly changes when you go to 20 or 30. These were small queries.
Even for large queries, this kind of works, because if they share the scan, for
example, you can get much better performance.
Okay. So how do we solve this problem? So a lot of us have been working on query optimization for many years, but unfortunately, there's nothing the database can do. It's being sent a series of queries, and unless it can kind of guess that this is what is happening -- but even then there's little it can do, because at best it can precompute the query and keep it, but that is still [indiscernible] synchronous and slow with latency.
So you really have to go to the other side of the gap here and work on the application program. So that's something we avoided for many years, because we thought, we are database people, we are not programming people, we don't know anything about programming languages.
But luckily, I had a Ph.D. student who not only knew databases, he also knew program analysis. So we started filling this gap. So as we say, it's time to think outside of the box for query optimization.
So we're actually going to give two solutions to the problem I posed. The first one is to use a bus, which is also ecologically sound. And what is a bus in this context? It's basically a [indiscernible] of execution, as you can imagine. We had a lot of fun with these pictures.
So I'm first going to describe earlier work, which first appeared in VLDB 2008. The problem was: given a program which makes calls on a database -- our implementation detects database calls and works on Java programs -- how do you automatically rewrite the program to replace a loop with calls by a single batch oriented execution?
So the benefits are the obvious ones, such as [indiscernible] plan sharing, sharing of disk IO, network round trip delays, all of those which I told you.
So our approach is to actually transform imperative programs, and the way we do that transformation is there are, luckily, tools available, including some at Microsoft which unfortunately I don't think we have access to. But we are using an open source tool called SOOT, which analyzes Java code, converts it to an intermediate representation called Jimple, and then builds the data dependence and control flow graphs, collectively called the program dependence graph.
And it also provides a bunch of APIs to detect things in this graph. So it makes our life a lot easier to do the analysis using this tool. And it also allows us to move segments around, so the [indiscernible] statements can be moved around.
So we can actually transform these programs. So now we have, basically, transformations applied to the program.
And then the second part is to actually rewrite the query to make it set oriented. Luckily, you know, in this case, the SQL Server optimizer saved the day here, because it's really good at decorrelating queries. So all we had to do was give it a query which essentially used the cross apply, or the outer apply version of the cross apply, and it did the rest of the stuff for us. So that is actually on SQL Server.
Okay. So here is an example of a program, a small program, and how we rewrite
it. See if I can use this pointer.
Okay. So here is a connection which has been somehow set up, and you are preparing a query which counts the number of partkeys from part for a particular category, which is provided [indiscernible]. And then there's the loop, which goes through a category list, stepping through [indiscernible] elements using next. It binds the question mark parameter to the category, executes the query, gets the count and then adds it to the sum. So it's doing a fair amount of work, but still a very small example.
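To make the shape of that program concrete, here is a minimal JDBC sketch of the kind of loop being described; the table and column names (part, p_partkey, p_category) are assumed TPC-H-style names, not necessarily the ones on the slide.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.List;

    public class CategoryCountLoop {
        // One query execution -- one round trip -- per category in the list.
        static int totalCount(Connection con, List<String> categoryList) throws SQLException {
            PreparedStatement stmt = con.prepareStatement(
                    "SELECT COUNT(p_partkey) FROM part WHERE p_category = ?");
            int sum = 0;
            for (String category : categoryList) {
                stmt.setString(1, category);        // bind the ? parameter
                ResultSet rs = stmt.executeQuery(); // blocking call, round trip to the database
                rs.next();
                sum += rs.getInt(1);
                rs.close();
            }
            return sum;
        }
    }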
So here is what we do when we end up rewriting it. So this part is the same.
Now here, you will notice there are two loops where there was one. So the key
thing here is to split the loop into two parts, where the first part doesn't
actually execute the query. Well, we'll see that in the asynchronous case, it
actually does initiate execution of the query. But in the batch oriented
portion, all it does is it steps through the category list and now, after
binding the parameter, it says add batch. And that's all that loop does.
And then this single statement does an execute batch with the rewritten query, which as I said is very simple. And then the last part is iterating over the results of this query and adding them up to get the sum. Okay? Of course, this all has to be syntactic. We don't understand the semantics. And, of course, a lot of conditions apply; you can't always do this.
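Schematically, the rewritten program has the two-loop shape sketched below. The QueryBatcher interface is a hypothetical stand-in for the batching API being described, not the actual generated code (which also maintains a loop context table); it is only meant to show the split.

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.List;

    public class CategoryCountBatched {
        static int totalCount(QueryBatcher batcher, List<String> categoryList) throws SQLException {
            // First loop: only collects parameter bindings, no query execution.
            for (String category : categoryList) {
                batcher.addBatch(category);
            }
            // Single set-oriented execution of the rewritten query.
            ResultSet rs = batcher.executeBatch();
            int sum = 0;
            // Second loop: consumes one result row per original iteration, in order.
            while (rs.next()) {
                sum += rs.getInt("part_count");   // illustrative column label
            }
            return sum;
        }
    }

    // Hypothetical helper: wraps the parameter-table insert and the rewritten query.
    interface QueryBatcher {
        void addBatch(String param) throws SQLException;
        ResultSet executeBatch() throws SQLException;
    }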
>>: In this particular case, can't you just issue one query? It's probably
harder to decide that [indiscernible] [inaudible] you could push it as a union
of categories and [inaudible].
>> S. Sudarshan: Right.
>>: Could have potentially done that [indiscernible]. It's harder to do it.
>> S. Sudarshan: So there are two kinds of things that could happen. One case
is where the category list is actually a list which is in the programming
language. It is not coming from the database. And here, what we are doing is
this [indiscernible] doesn't execute the query. It's actually just collecting
all the values.
And then this execute batch is doing the whole execution.
>>: Still execute [indiscernible].
>> S. Sudarshan: No. It's one query. It's a single query, which is a batch oriented version of it. Basically it takes the parameters and makes a temporary table in the database, and then that table is [indiscernible] on the basic query. [Indiscernible] are the parameters which went in, along with the regular attributes which were in the query. And then that comes back, and this part actually steps over it.
>>: I guess what I was asking is you could have pushed the sum --
>> S. Sudarshan: Into the database, yes.
>>: Yes. If he -- [indiscernible].
>> S. Sudarshan: Exactly. So in this case, maybe we could [indiscernible] that we can translate this. But in general, we're doing arbitrary stuff here, all dependent and so on. So we actually can [indiscernible] if the -- this -- I'm not showing it here, but the way we do it, this would execute exactly what this loop did, in the same order, over here. And so we can [indiscernible] if you bring stuff over, for example, it still works.
And, in fact, coming to the other thing, if this thing over here was actually iterating over a relation in the database -- this is actually another common case. So you have a query which loops over a relation or another query inside. Then we could short circuit this part appropriately, and realize that this is already in the database as a relation and use that relation.
We could. I don't think our current implementation does it.
>>: [indiscernible].
>> S. Sudarshan: Yeah?
>>: Yeah. What happens [indiscernible].
>> S. Sudarshan: Okay. So if you -- well, in the paper, we discuss how we
could take a procedure and then rewrite it to a batched version of the same
procedure, where it takes one parameter. We pass a set of parameters, and we
can actually rewrite the procedure itself. It's a new signature in theory.
>>: [indiscernible].
>> S. Sudarshan: Right.
>>: [indiscernible].
>> S. Sudarshan: Right. So if it itself [indiscernible] is then [indiscernible] by [indiscernible], this is a possibility; [indiscernible] could do it. But currently we will not handle [indiscernible].
>>: So a lot of the latency issue is avoided by simply packaging this whole
thing up into a procedure [indiscernible] down the database and letting that
procedure perhaps execute unmodified?
>> S. Sudarshan: Yeah, it's a query. It's not a procedure. It's just a query.
>>: Yeah. So what you're adding to that here is somehow driving more of the execution into the declarative part of the query when you optimize, is that it?
>> S. Sudarshan: That too, because we are telling the database to do this query on a whole batch of parameters. The optimizer can now do something [indiscernible], which it couldn't have if you got into the -- so that's the second benefit. The first one is the latency, and the second is this.
Okay. So I'm going to discuss some of these conditions. But what is this example here? This is a slightly more complex example, where, if you look here, in this case we have a category coming in. And now look at the last part here. It says category is equal to get parent of category. So now this step here is after the query execution. So if you split the loop at this point, we actually don't have the ability to get all the categories. So what this example is showing is you can't just take the point in the loop where the query is and split it, because there is a dependency here which feeds back into it.
So this illustrates, well, two things. The first is that if there are dependencies, which are called loop-carried flow dependencies -- so this thing goes into the next iteration of the loop back here -- then you cannot split the loop. That's the first thing this illustrates. So that's one of the required conditions for doing the loop split.
But the good news is that we can actually do something about it. There is no reason why this had to be here. We could have moved it up here, just before this query. But then there's a slight problem, which is that the category on which the query was to execute is now getting clobbered, because [indiscernible]. We can use temporary variables and work around this. We'll see an example.
Okay. So there are several steps in this transformation, through which I'll show you how to handle complex cases like this. The first step is, obviously, to identify which queries we want to turn into batch execution.
And in our [indiscernible] implementation, we just look for database [indiscernible], and we could perhaps do it only if they are [indiscernible] of a loop and so on. But anyway, the [indiscernible] are small enough that we can get away with this for now. That's the first step. The intention is to split the loop at this point. But like I said, there is a loop-carried flow dependency here, which feeds back into the loop condition and also into this. So we can't directly split it.
So we do a data flow analysis here to look for this condition, and so the third step is to try to reorder statements to see if we can remove this loop-carried flow dependency. Now, there are certain cases where it cannot be done, in which case we have to abandon it and not split the loop.
But in this particular case, what has happened is we have stored the category coming in into a temporary variable, and we have moved that statement -- category gets the parent of category, which was done here -- up here, because there is nothing in between which depends on it. Well, the only thing which depended on it was this: the category was being passed as a parameter to the query. But what we have done here is created a [indiscernible] variable so we can use that over there.
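As a small, hand-written illustration of that reordering (getParent is a hypothetical helper, and the JDBC imports are the same as in the earlier sketch):

    // Before: category is advanced after the query, and the next iteration reads
    // it, so a loop-carried flow dependency crosses the query and blocks the split.
    static int sumBefore(PreparedStatement stmt, String category) throws SQLException {
        int sum = 0;
        while (category != null) {
            stmt.setString(1, category);
            ResultSet rs = stmt.executeQuery();
            rs.next();
            sum += rs.getInt(1);
            rs.close();
            category = getParent(category);     // feeds the next iteration
        }
        return sum;
    }

    // After reordering: the old value is saved in a temporary and getParent is
    // moved above the query, so no loop-carried flow dependency crosses the query.
    static int sumAfter(PreparedStatement stmt, String category) throws SQLException {
        int sum = 0;
        while (category != null) {
            String prevCategory = category;     // temporary preserves the old value
            category = getParent(category);     // moved up, stays in the first half
            stmt.setString(1, prevCategory);
            ResultSet rs = stmt.executeQuery(); // the loop can now be split here
            rs.next();
            sum += rs.getInt(1);
            rs.close();
        }
        return sum;
    }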
Okay. So now, if you see, in this resulting loop there's actually no dependency here. Yeah?
>>: [Inaudible].
>> S. Sudarshan: Yeah, so that is where a lot of conditions apply. So there are many conditions. One is there are all these functions which could potentially have side effects, and in this case, you know, if a side effect affected one of these things, then we could have a problem.
So we can do interprocedural analysis to decide this. And if any of these ran another query which affected the result of this query, then again we have trouble.
So to make sure there's no problem, we actually have to see what other queries are shot to the database inside here. So we could do static analysis. In our tool currently, you know, we are still implementing that part; it's not really there. So like I say, it's faith-based optimization. We trust that this currently doesn't cause problems. Yeah?
>>: So one thing is the extra conditions on the program. This thing is on the data, for example. I wonder if there is [indiscernible] -- if the query is a slightly more complicated query, often these transformations are not fully semantics preserving. They are [indiscernible]; there are no issues with [indiscernible]. The [indiscernible] is you wouldn't know on the database.
>> S. Sudarshan: I think in this case, that problem will not arise, because the [indiscernible] cross join definition is: for each value generated by the left side of the cross join, you execute the query on the right side, and the right side query is the original query. So assuming the SQL Server optimizer does it right, and I do believe it does it right, we should not have any problems with values.
But you're right, if we tried to do the decorrelation analysis ourselves, we could run into trouble. So this is the reason we currently work only on SQL Server; the other guys don't fully implement this.
Okay. So I think Sybase does support some sort of lateral join, so we were trying to implement it on Sybase, but it's not yet functional, as far as I know.
Okay, so in this case, after this [indiscernible], the loop-carried flow dependency is gone, and there are [indiscernible] conditions to check whether you can split the loop at that point. I won't get into all the details.
But the last step is how to split the loop. In the earlier program, I sugar
coated a lot of stuff to make it easier to understand what was going on. But
this is actually what a transformation does.
So the first loop did something. Then we executed the query. Now, the second
loop should do exactly the same set of things as the first loop did. Of
course, what happens inside the loop is different. But if there's a variable
which is defined in this loop and was used down here, then we better preserve
the value of that variable so that when the same iteration of the loop is done
over here, it sees the same value for that variable.
So once we split the loop, we have to make sure that variable states are saved and then restored down here. Okay? So in order to do that, we create this loop context table, which has entries: one entry per iteration of the first loop. And the second loop simply goes over the same entries in the same order in which they were created. So it's really more of an [indiscernible] than a [indiscernible].
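As a sketch of that loop context idea, assuming a trivial in-memory representation (the real structure is tied into the generated code and also carries the loop key that matches result rows back to iterations):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // One entry per iteration of the first loop; the second loop replays the
    // entries in the same order and restores any variables it needs.
    class LoopContextTable {
        static class Entry {
            final int loopKey;                        // identifies the iteration
            final Map<String, Object> savedVars = new HashMap<>();
            Entry(int loopKey) { this.loopKey = loopKey; }
        }

        private final List<Entry> entries = new ArrayList<>();

        Entry newEntry() {                            // called once per first-loop iteration
            Entry e = new Entry(entries.size());
            entries.add(e);
            return e;
        }

        List<Entry> entriesInOrder() { return entries; }  // iterated by the second loop
    }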
So what we do here is create a context. And in this case, let us see -- I think there is no variable that needed to go into the context. So the context was basically used because the original query execution here -- well, let me go back to the first slide to show the original query.
Okay. So if you see here, in our earlier program I sugar coated it by saying the query just returns one value. But in general, the query may return multiple values, and you might actually be iterating over the values here.
In this case, you know, we have removed a little bit of the sugar coating, and we say results [indiscernible] next, because there's just one value, but it's getting that. But if you had a loop, we have to make sure that in the second part of the split loop, we will be able to execute the same thing over there and get all the results of that one particular [indiscernible] location.
Okay. So that's part of what a loop context can do here. So when we are doing
our batch, we are setting a context. So which iteration of the loop was this
for is provided by the context. And here, we're saying the results are equal
to statement [indiscernible] on the context. So we'll get exactly the same
results that the corresponding iteration over here would have got. And then
the rest is the same as before.
Okay. Any questions?
So the last part is how to do the batch rewriting. So this is the SQL Server syntax. So over here is the original query. We have created a batch table, and we have inserted all of the values into the batch table. We can use the batched form of the database insert statement so that we don't do multiple round trips; we do only a few round trips.
And then here, we select from batch table 1 -- so those are the parameters and all attributes of the query -- from batch table 1 outer apply, and then over here is the query. And finally, we order by loop key 1 so that the output of this comes back in the correct order, so we can just go through it in sequence. We don't have to go back and forth in it.
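A sketch of what that rewritten query can look like for the running example, written here as a Java string; batch_table1, loop_key1, param1 and part_count are illustrative names, not the exact ones the tool generates.

    // The batch table holds one row per original iteration (loop_key1, param1).
    // OUTER APPLY pairs each parameter with the result of the original query,
    // and the ORDER BY lets the second loop consume the rows sequentially.
    String batchedQuery =
            "SELECT b.loop_key1, b.param1, q.part_count "
          + "FROM batch_table1 b "
          + "OUTER APPLY (SELECT COUNT(p_partkey) AS part_count "
          + "             FROM part WHERE p_category = b.param1) q "
          + "ORDER BY b.loop_key1";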
>>: [indiscernible] doesn't matter, though, right? This final ordering.
>> S. Sudarshan: So this ordering is on top of the query. If the query had an order by -- well, then what would happen? You'd have [indiscernible] order. So we'd do an order by loop key, and then within the loop key, whatever order was there for the original query.
So the point is that many queries are issued. We want the results to come back in the same order in which the queries were issued, because there is a loop there. Maybe it prints output there, maybe it doesn't. So the results have to come in the same order then.
>>: I was just thinking that there might be some loops, though, where that
doesn't matter. Where you wouldn't have to do that.
>> S. Sudarshan: Yes, so that could be an optimization where we turn it off.
But unless we know for sure that the order doesn't matter, this is safe.
Okay. So that was an overview. Like I said, I didn't go into the details of all the conditions, but we have those in the 2008 paper, and a more detailed version is there in Ravi's Ph.D. thesis.
Okay. But there are some limitations. Well, a limitation is an opportunity for a new paper. So anything which you didn't do in the first place, you get one more paper out of it. So that's what we did here. And then I'll tell you some of the limitations of -- so this next part was a paper published in ICDE. Well, that also has some limitations, which will be our next paper, hopefully.
Okay. So the first limitation is we were doing batching, and there are obviously many interfaces which don't give a batched -- the ability to do a batched query. So we have to do asynchronous submission. That's the only way for these things.
The second problem is that for certain programs, the query may actually vary across iterations, and so then what query are you batching? Batching assumes that across the loop, the query is fixed. Sometimes people add a few selection conditions, depending on what parameters came in, to the queries. So we cannot do batching as is.
And then there are some inter-statement data dependencies, by which I mean we can't actually split the loop. So maybe we can't apply our transformations.
And finally, like I said, even though we may not be able to batch, the multicore processing power on the client can be used to issue a number of queries in parallel and also fetch them back in parallel. So at least whatever work the client has to do can be parallelized, and the server, whatever it does, can be parallelized, even if it's not set oriented.
So basically, we exploit asynchronous query submission. In order to do this -- by the way, the original batching transformation did a whole lot of low-level calls and added a lot of [indiscernible] in there. So one of the problems was, when we saw a rewritten program, we couldn't understand what it was doing.
So, you know, like I said, [indiscernible] to do the right thing, but we don't trust our programs that much. If we do something wrong and then you run it, well, you won't be happy.
So what we ended up doing is building an API, so that a transformed program uses the API and is a lot easier to read. So the programs that I showed you actually are based on that API. The original one was actually much harder to read.
The other thing is that once you have this API, if you don't trust our
rewriting, you can still use the API and do whatever we would have done
automatically, you can do it manually and get the same benefit.
And finally, there's an improved set of transformation rules. I won't be getting into all of that, including reordering and so on.
So what we are doing in this thing is to have a whole bunch of taxis.
>>: So I have one question.
>> S. Sudarshan: Yeah.
>>: It seems it should be very hard to argue that any [indiscernible] that you have [indiscernible] extremely sudden [indiscernible] if you, for example, use [indiscernible] or if you use [indiscernible]. Seems --
>> S. Sudarshan: Yes.
>>: How would you find all that, you know, that you could have [indiscernible].
>> S. Sudarshan: So we are assuming that the procedures which you call are not going to have side effects, because yes, we are reordering the way in which you do things. So if you have a procedure which was in the first part of the loop and a procedure in the second part, earlier they would run in lockstep: procedure one, procedure two, one, two, one, two. And now it's one many times, then two many times.
So absolutely, it will break. If they have side effects, the whole thing will break.
>>: No, but my question was [indiscernible] won't have the side effects.
>> S. Sudarshan: So there are two parts, right. If you do a full fledged interprocedural analysis, we could actually see what those procedures do and make sure they don't have side effects. The current implementation doesn't do that fully. The reason we don't do it is, at least with the tool we are using, it's very slow.
The problem is it doesn't just look at our procedures. It starts and it goes deep into all the system libraries which are there and starts analyzing all the libraries, which is crazy.
>>: Right.
>> S. Sudarshan: So we need a better tool which will just look at our procedures. And for the system things, you know, [indiscernible] contract about what side effects it has or doesn't have. So really it should work at that level, and then it would be efficient. Because the time for [indiscernible] our procedures here is not much, but the problem is it goes into all the libraries.
So the current implementation doesn't actually do that. So we are assuming whatever procedures are there are side-effect free. But the remaining part, the dependencies within the loop, is what we are actually making sure is okay.
Okay. So the motivation is obviously asynchronous submission can improve
performance and it is, in fact, widely used. Ajax is very widely used. It's
also true that it's very hard to program in Ajax. So it's okay for simple
stuff, but if you want to do more complex things, you need a bunch of very
smart programmers, and your average application programmer is not that smart.
So one of our goals, in addition to what we have been doing, is to take an application which would run at a client, you know, tablets or whatever, [indiscernible] or [indiscernible], so JavaScript or whatever other language the application is in, and turn synchronous calls into asynchronous calls. We're not there yet. We would like to do it.
Okay. So asynchronous submission, you all know what this is. I'm going to skip this. This is for a different audience. Yeah?
>>: Has this been [indiscernible]?
>> S. Sudarshan: Yes.
>>: Asynchronous?
>> S. Sudarshan: Yeah. So the programming -- actually, the web services
community has certainly studied this. Obviously, it is important for them.
But whatever work we have seen has focused on straight line code. So if you
have a sequence of calls, then they would do [indiscernible] asynchronous
submission ahead of time for that thing.
But if you had a loop, then whatever techniques we've found do not work. So
database people tend to do loops over some data and execute a number of queries
and that is something which the web services people somehow have not paid
attention to. But for the straight line case, indeed, there's been work from
quite a while back. 2003, 4, even, there's been work.
Okay. So the ICDE paper had the following contributions. Like in the batch case, it automatically transforms the program. There is a statement reordering algorithm which is applicable both to the batched and to the asynchronous case. In the earlier paper, we didn't have any guarantees; it was like best effort.
Later on, we developed an algorithm which could guarantee that wherever a reordering was possible which would allow us to split the loop, it would, in fact, do that. So there's a corresponding theorem in Ravi's thesis in detail, and it is briefly mentioned in the paper. I won't get into it.
Then there's the API, as I said, and we also talk about some of the design challenges involved in making this happen. Yeah?
>>: [indiscernible] correcting the inefficiencies that the programmer may have? For example, he may not have added a [indiscernible] limit in this query. But you can look at the program and realize that he really only wants to [indiscernible].
>> S. Sudarshan: That's a nice idea, but thanks for [indiscernible]. It would be -- yeah.
>>: [Inaudible].
>> S. Sudarshan: So here's basically the same program as before. This time, what we have done is we have a handle for queries which have been submitted. The handle is actually used to fetch the result later, and what we do here is, instead of adding the query to a batch, we do a submit query over here, and the submit query immediately returns a handle, which we save in this handle array. And then the second part of the loop simply goes over the different handles. It does a fetch result on that handle, and then finishes up the loop.
Again, this is simplified to work for this program. The actual rewriting doesn't look like this. We do have the loop context and all that in there. But conceptually, this is what happens.
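Conceptually, the rewritten code looks something like this. AsyncExecutor, QueryHandle, submitQuery and fetchResult are hypothetical names standing in for the API being described, not its actual signatures.

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    public class CategoryCountAsync {
        static int totalCount(AsyncExecutor exec, List<String> categoryList) throws SQLException {
            List<QueryHandle> handles = new ArrayList<>();
            // First loop: submit every query without blocking.
            for (String category : categoryList) {
                handles.add(exec.submitQuery(
                        "SELECT COUNT(p_partkey) FROM part WHERE p_category = ?", category));
            }
            int sum = 0;
            // Second loop: fetch results in submission order; fetchResult blocks
            // only if that particular result has not arrived yet.
            for (QueryHandle h : handles) {
                ResultSet rs = exec.fetchResult(h);
                rs.next();
                sum += rs.getInt(1);
            }
            return sum;
        }
    }

    // Hypothetical interfaces for illustration only.
    interface QueryHandle {}

    interface AsyncExecutor {
        QueryHandle submitQuery(String sql, String param) throws SQLException;
        ResultSet fetchResult(QueryHandle handle) throws SQLException;
    }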
So conceptually, the API has a submit query and an execute -- sorry, execute is the blocking one, which is split into a submit and a fetch. Yeah?
>>: [Inaudible].
>> S. Sudarshan: Right. So that is a parameter which we can control as part of a configuration. So the submit query actually doesn't [indiscernible] go and send the query. It simply adds it to -- I think I have a picture here. Yeah. So submit query simply adds it to a submit queue, and then there's a bunch of worker processes over here. So we can control how many there are.
I believe the current versions of the [indiscernible] actually have an asynchronous query submit. We're not using it as of now. Our tool uses JDBC, so we don't have it. But if we did it in the .NET framework, we could perhaps have avoided all this and used the asynchronous submission.
Okay. So each of these threads is synchronous. It blocks until it gets a result back and then puts it into the result area, and then it's fetched over here.
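A minimal sketch of how such an executor could be built on plain JDBC and a fixed thread pool -- an illustration of the submit-queue-plus-workers architecture being described, not the tool's actual implementation. Each task opens its own connection, runs the blocking call, and materializes the rows so they can be fetched later.

    import java.sql.*;
    import java.util.List;
    import java.util.concurrent.*;

    // Each submitted query becomes a task; a fixed pool of worker threads runs the
    // blocking JDBC call, so the number of concurrent queries equals the pool size.
    class SimpleAsyncExecutor {
        private final ExecutorService workers;
        private final String url;

        SimpleAsyncExecutor(String jdbcUrl, int numThreads) {
            this.url = jdbcUrl;
            this.workers = Executors.newFixedThreadPool(numThreads);
        }

        Future<List<Object[]>> submit(String sql, Object... params) {
            return workers.submit(() -> {
                try (Connection con = DriverManager.getConnection(url);
                     PreparedStatement stmt = con.prepareStatement(sql)) {
                    for (int i = 0; i < params.length; i++) {
                        stmt.setObject(i + 1, params[i]);
                    }
                    try (ResultSet rs = stmt.executeQuery()) {
                        List<Object[]> rows = new java.util.ArrayList<>();
                        int cols = rs.getMetaData().getColumnCount();
                        while (rs.next()) {
                            Object[] row = new Object[cols];
                            for (int c = 1; c <= cols; c++) row[c - 1] = rs.getObject(c);
                            rows.add(row);
                        }
                        return rows;   // materialized result, fetched later via the Future
                    }
                }
            });
        }

        void shutdown() { workers.shutdown(); }
    }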
Okay. So what do we have here? This is the same thing as we saw before. The challenge is the same as before: complex programs with arbitrary control flow, data dependencies, and loop splitting requires variable values to be stored. So these are all the same problems that we had before. I didn't explicitly list all of them before.
But let me say a little bit about what we do with some of these. The data dependencies, I told you: there are conditions under which we can split the loop. The second issue is what about control dependencies. What if there is a query which is conditional in there? How do we handle it?
So we use a fairly standard trick, which is to convert anything which is inside an if into guarded statements. So we have a variable which stores the result of the if predicate, and then each of the things within the then part is guarded by that variable being true, and the else part is guarded by that thing being false.
So we have the usual guarded statements, and the control flow is basically gone. So conceptually, we do everything -- of course, we could skip the else part if you're doing the then part, but in this case, we pretend that all the guards are actually executed. But when we finish our transformation, there is a second stage where we take the [indiscernible] code and get it back into Java. So at that point, we actually go back and create [indiscernible] back. So the final program doesn't -- it actually hides all those details.
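A tiny hand-written illustration of that if-conversion; the variable names and the query are made up, and the real transformation works on the Jimple form rather than Java source.

    // Before the transformation, the query call is control dependent on the if:
    //   if (balance < limit) { rs = stmt.executeQuery(); fee = rs.getInt(1); }
    //   else { fee = -1; }
    // After if-conversion, the control dependency becomes a flow dependency on g,
    // and each guarded statement can be analyzed and reordered individually.
    static int guardedLookup(PreparedStatement stmt, int balance, int limit) throws SQLException {
        int fee = 0;
        boolean g = (balance < limit);          // guard variable holds the if predicate
        ResultSet rs = null;
        if (g)  { rs = stmt.executeQuery(); }   // then-part statement, guarded by g
        if (g)  { rs.next(); fee = rs.getInt(1); }
        if (!g) { fee = -1; }                   // else-part statement, guarded by !g
        return fee;
    }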
Okay. So we give here a few of the rules. Now, there were similar rules for batching, but I will focus on the rules which we use for the asynchronous part. The first one is the basic equivalence rule for loop splitting, that is, loop fission. The second is to convert control dependencies to flow dependencies; this is the one that I told you about, where the if-then-else can be turned into guarded commands. And then rules C1, C2, C3, which are reorderings of statements. Again, I won't give all the details.
But some of these generalize the batching rules and some of these simplify the batching rules.
So I'm going to skip the details. Yeah?
>>: So one thing that you've been showing in the [indiscernible] is scale, right, because I think it's possible that you [inaudible].
>> S. Sudarshan: Yes.
>>: And how do you control that, like why [inaudible].
>> S. Sudarshan: That's a good question. So we will use the variable and then reuse the same variable in the next loop. We're actually storing a loop context object which stores the old value of the variable and keeps it around until the second loop. So certainly, there's an increase in the state.
But the thing is, how much will this blow up, right? If you had a thousand iterations of the loop, you have, you know, a thousand-fold blow up of the state. Typically, what we have seen in these programs, the state is just a few bytes. If you have very complex state, then that could be trouble -- and if you had data structures which are being updated and so on, which are also used in the second part.
So the thing is, if a variable is not used in the second part, we don't have to save its state. So it's only for those things which cross the boundary. So if you have complex data structures which cross the boundary, then we are in trouble. So we won't actually split the loop in that case.
But as long as it's simple variables whose value we can save and restore, we do it.
Now, my contention is that the number of iterations of the loop is not going to be large. If it were, let's say, 10 million, your program would never have executed. Try doing 10 million round trips to a database. Your program wouldn't have executed. So it's not something you need to worry about.
>>: No, but it could be an issue in the [indiscernible].
>> S. Sudarshan: Yes. Yeah.
>>: [Inaudible].
>> S. Sudarshan: I think in this case, since, you know, we can control what is sent and when it comes back, that can be under the control of the API. The submit part simply spews out the whole thing. So if the [indiscernible], we can actually stop sending the queries to -- we don't implement it, but there's no reason why we can't do this. That is, you know, we stop sending queries at some point, wait for the things submitted earlier to be consumed, and then send more things.
>>: Isn't that true for any of the loop data? You could simply [indiscernible] iterations of the loop and sort of package things up in batches in order to avoid having to materialize?
>> S. Sudarshan: Yes, that is true. So we could -- the rewriting would be a
little more complex. So you'd have an outer loop and then an inner loop per
mini batch. So we could do it.
>>: [indiscernible] variables remain the same [indiscernible] arbitrary changes that are happening in the loop [indiscernible].
>> S. Sudarshan: It's possible; we have not excluded it. Yeah, if you realize that a variable is simply a counter, then we don't actually have to save its state. So those are optimizations that are possible but not currently implemented.
Okay. So this is -- well, actually, I pasted two decks together, so there's a little bit of repetition between the batching part and this. So I think this particular slide is the loop-carried dependencies, so I am going to skip this part of the slide.
But the thing to notice is, when we did the earlier paper, the reordering was not complete. It just, you know, said you can move this, and then if it satisfies the condition, then you can do the rewriting. But there was no specific algorithm to say what you should do, how you should do this, in what order you do the reordering.
So one of the contributions in this paper is an algorithm that decides when you can move something and, among the candidates, which one to move. And it actually does this iteratively, until either the condition for splitting is satisfied or it cannot move anything [indiscernible].
So this is an example of the dependency graph. So this is a little bit of an inside view of what happens. So these are the statements corresponding to this thing over here.
Now, again, over here this is a Java program, so the statements here are Java statements -- lines. But that's not [indiscernible]. We have something closer to byte code, and those are the statements here.
But sticking to the Java statements, we are treating S2 as these two together, and then S3 and S4. So let's look at some of the dependencies. The black ones here are the flow dependencies. That is, S2 defines a variable here, the variable count, and S3 uses it. So that is a direct flow dependency.
Then there are other kinds of dependencies. There are anti-dependencies. So over here, there's an anti-dependency from S1 to S4, because S1 is reading something and later S4 writes to it. So those are the anti-dependencies. Then there are output dependencies. In this case -- well, the dashed ones are loop carried, which means across iterations of the loop. So over here, the assignment to category is clobbered by the next iteration, which also assigns to the same thing. So that is an output dependency, but it is loop carried, because it's in the next iteration, so we have a dotted red arrow here.
Similarly, if you see here, category is assigned here, and then in the next iteration, the value of category read is whatever was assigned in the previous iteration. So there's also a flow dependency which is loop carried, from S4 to S4. So these are the kinds of things which we graph. I'll skip all the minor details.
But finally, you'll see that from S4 to S1, there is a loop-carried flow
dependency which goes from the second part of the loop back to the first part.
Because this assigns it and that reads it. So those are the ones which prevent
splitting, and that's what we get rid of by reordering.
And we already have seen this particular example of creating a temp variable.
I'm going to skip that slide.
But let's see the same thing in terms of what happens to the graph. So what we have done is added a new statement, S5, which is the temp assignment over there, and S2 has been rewritten to use the temp. So over here, there's a loop-carried flow dependency from the second part to the first part. Over here, after doing this reordering, you will notice that the part where we want to split is this: S2 is over here, this is the execute query, and we want to split the loop into a part that is before it and a part that is after it. And here, there is no such dependency going back. And that's why we are able to split the loop.
Okay. So there is the statement reordering algorithm. It takes as input the blocking query execution statement and the basic block, which is the loop itself. And wherever possible, it reorders the basic block such that no loop-carried flow dependencies cross the split boundary Sq. And, of course, program equivalence is preserved. That's the formal definition, a statement of what it does.
And again, there are a lot of details here. I will probably not get into all of them, but I'll just give you an idea of what we are trying to do. So basically, what we want to do is find a statement which we can move to some other place in order to get rid of that loop-carried flow dependency -- that's the thing that's not good.
So the first step is to identify statements which we want to move. So in these two cases, there's a V1 with a loop-carried flow dependency to this. And in this case, again, there's a V1 which is over here, which feeds back over here.
Now, the second set of cases is where the loop-carried flow dependency is from the query itself to something earlier, or from the query to something later, which, in turn, feeds back. So these are the various cases, and we move things around in the different cases. We won't get into all the details.
So the thing is to decide what to move in each of these cases. And then we have to see what other statements depend on the one which we want to move, because if we move something, something else may get affected.
So we have to move it carefully. We can move a set of statements together in some cases.
So in this case, we identify everything which is dependent and move all of those past the target. And finally -- well, the dependent statements are moved past the target first, and then the statement itself is moved past the target. Because if you don't do that -- well, we are splitting the loop there. So this is the last step: once we move the statement past, we can now split the loop.
Okay. So a true dependence cycle in a data dependence graph is a directed cycle made up of only the flow dependencies and the loop-carried flow dependencies. And the theorem is that if a query execution statement doesn't lie on a true dependence cycle, then the reordering algorithm is successful in moving things around.
So this was a guarantee which we didn't have in the earlier paper. In the ICDE paper, we have the algorithm which guarantees this.
The good thing is that pretty much all of this is applicable to both blocking and asynchronous execution. And, in fact, in our API [indiscernible], we have this one API, and then we have a flag somewhere which says do it as a batch or do it asynchronously. The API looks a bit weird -- we say add batch, when in the asynchronous case it actually goes and executes it. But the nice thing is that the transformations are identical. The API is identical. It's just a flag whether you want to do batching or this.
Of course, in certain cases you cannot do batching, in which case the flag is set to asynchronous only.
Okay. So that is a quick view of one of the kinds of things we do to rewrite the program. And this is the overall flow of what we do. We take the source Java file and parse it -- well, we use the [indiscernible] framework which does all of that. It converts it to the Jimple representation. The data flow analysis and def-use representation, the dependency graph, all of this is done by SOOT up to here.
This part is what we do: apply the rules to move things around. The thing is, once we move a statement, everything changes. The dependency graph itself changes. So after we do any such move, we have modified the Jimple code, of course, so we actually have to again do the data flow analysis so that the dependency information is correct after the move, and then we can again apply more transformation rules.
Once we decide we are done, we decompile and give the target Java out.
So this API, like I said, can be auto generated, or it can be manually used.
So there is a loop context structure in the API which makes it easy to remember
all those variables which were defined in the first part of the loop and used
in the next. And the same API for batching and asynchronous.
So this was demoed at ICDE also.
So that is a quick overview. Now let's move to the performance -- whether all of this is worth it.
So for the batching performance, there are a number of results in the earlier paper. I won't use all of them, but one or two of those are presented here.
So what did we use for doing our analysis? There were two public domain benchmarks. There were two real world applications, for one of which the company was having actual performance problems in a real application. So they had built it in a modular fashion, okay. So modular fashion: everything is an object, and to give a stock option to an employee, you have a set of procedures which deals with one employee.
And now stock options are generally given as a batch. You give them to a number of employees. And it turned out in their application, they used a really expensive computer with, you know, plenty of memory and everything. And in spite of all that, they had a window in which to process the stock grant, and they were running out of that window.
So they came to us, and it worked out well because we had already been working on this problem. So that was a good connect.
>>: Did you include yourself among those who got the stock?
[Laughter].
>> S. Sudarshan: No, unfortunately.
>>: [Inaudible].
>> S. Sudarshan: Actually, that company turned out to be very troublesome. Once we gave them the idea, they said thank you and went away. So the idea was we would work with them for a while and develop it, but I don't know what they did -- whether they used our idea and said bye-bye, or whether they decided not to use our idea, I have no idea.
Okay. So this one, as I already told you, was something developed in this area.
So we used a dual-core machine. With dual-core, we are actually getting a lot of benefit from having multiple threads. There were actually some experiments on PostgreSQL; the asynchronous part we can do on PostgreSQL. The batching transformation is a little trickier.
So we look at the impact of various things such as iteration counts, the number of threads, and the impact of warm versus cold cache, since IO is a big issue.
So one of the things we thought is, if we increase the number of threads, there would be more IOs happening, which would destroy the normal sequentiality of a single execution, so we thought performance would actually become much worse. But surprisingly, you know, it didn't become much worse with either of these systems.
>>: [Inaudible].
>> S. Sudarshan: Yeah, for [indiscernible]. So actually, this is what happened first. We said okay, let's have a query which does a lot more work. [indiscernible] It didn't matter. So I think the databases are fairly good at controlling the load internally.
Okay. So here are these things. There are two bars which are the cold cache numbers and then two which are the warm cache numbers. This is SQL Server. So if you go here, to the original program with four iterations, you can see that the transformed program is actually running worse with cold cache.
And with warm cache, the difference is even more. You can't even see the original program down here. So the bottom line is, when the number of iterations is very low, the batched -- well, which one is this? This is the -- I think this is the batched one. No, sorry, this is the asynchronous one. That is with the threads. So it actually becomes worse, potentially. But if you see, the time was actually very small anyway.
So a ten-fold increase doesn't mean very much when it's ten milliseconds. But as the number of iterations increases, you can see that the transformed program -- like here, this is cold and this is warm -- you can see that the transformed program was like nine seconds, when the original was 50. And in this case, it was 5.9 seconds, when the original was 46.4 seconds. So the improvement is substantial when the number of iterations is more. And this was with ten threads.
>>: So it doesn't matter for the smaller number of iterations, because the run time is so small anyway. Throughput-wise, it does matter when you're ten times more resource intensive. So I wonder whether you can -- it introduces a new estimation or prediction problem, based on the program, not on the database. Or would it be on both? Could you try to predict whether you are in the case on the right or whether you are in the other case on the left?
>> S. Sudarshan: Yeah. So Ravi is actually working on that. I have not been involved in that part. He's been working with some of the programming language people to try to do static estimation of the number of loop iterations that you would have. It could be static, or maybe dynamic based on previous runs. So we need to make some cost-based decisions.
The other thing is, I'm not sure that this decrease in performance is because the database itself is much more inefficient. It may just be the [indiscernible] of setting up an asynchronous call and fetching it. So it may not have any impact at all on the database. It only impacts the --
>>: [indiscernible].
>> S. Sudarshan: Yes.
>>: [indiscernible].
>> S. Sudarshan: Absolutely. Okay. Now, this one is the impact of the number of threads. With one thread -- well, we are at 46.4. And the time decreased sharply. It starts levelling off after five. It improved up to somewhere around here, 30 or 40, and then it started increasing again. So the experiments were done with ten threads. It could have been slightly better maybe with 30 threads. Or maybe some of that impact.
But for the four iterations, it wouldn't have mattered. It doesn't matter whether it's 30 threads or four. It's all the same.
>>: [Inaudible].
>> S. Sudarshan: Hm?
>>: How many processors?
>> S. Sudarshan: So the database server, I think, was a dual core. That is here: 64 bit dual core is the database server machine. And the client -- okay, this doesn't say, but I think the client was just a single core.
>>: [indiscernible].
>> S. Sudarshan: Yeah, I think it had -- that's, yeah, I'm pretty sure it had hyperthreading. So it would probably be equivalent to four cores, at least.
Okay. That slide is done. Now, this is the web service case, where we coded this manually, because our code does not actually recognize web service calls. And again, here is something which took almost 180 seconds, where the database was Freebase with a web service API. And after about 95 threads, you can see there's improvement. It starts levelling off after this.
So there's a lot of potential for this. Now, what about batching versus asynchronous? If both are applicable, how do they compare? Sometimes only asynchronous is applicable, so then this is not relevant. And if you see here, the first bar is the original, the second is asynchronous, and the third is batching. And as you can see, batching, where it's applicable, pretty much outperforms asynchronous. It's fairly clear: the number of messages you send over the network is reduced, and the database can use a better plan, so you should use batching if at all it is applicable. But if it's not, asynchronous still gives a substantial improvement.
Okay. So that completes the talk. There are many directions for future research. The one which we are currently focusing on is this. So whatever I showed you was a query in a loop. Now, this works for certain applications, but there's a whole class of applications where the query is deep inside a procedure. So in any application which uses, say, the Hibernate framework, it hides the SQL underneath. You just see an object model and you [indiscernible] on the object. And deep inside, it's either doing a SQL query or it is looking up something that is already cached -- it's a [indiscernible] object, or it's looking it up.
In this case, what can we do? If you had a loop over multiple objects, and whatever you're doing actually required running a query to fetch an object, well, we would like to prefetch those things.
So we can't do exactly the same kind of, you know, asynchronous execution of the query like we did, but what we can do is recognize that inside a procedure -- so here is a loop outside, here's a stack of calls, and deep inside a SQL query [indiscernible] is being executed. Let's say it's likely to be executed. And we know what the query is.
And if you can trace the way in which parameters are passed down, so that the parameters to the query are actually available up here in the stack, we can actually issue a prefetch all the way at the beginning. And then we don't touch the second part. We are not splitting loops. We're not doing anything.
So this is what is nice about the new approach: it is very non-intrusive. The earlier one actually did a lot of program rewriting. Now, the only rewriting is to issue a prefetch call. So that's work in progress, which is coming out quite well so far.
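A rough sketch of that non-intrusive prefetch idea; every type and name here is a hypothetical stand-in for the API being built, and the point is only that the single prefetch loop is inserted at the top while the existing call chain underneath is left untouched and finds its results already cached.

    import java.util.List;

    class PrefetchSketch {
        interface PrefetchCache { void submitPrefetch(String sql, Object param); }
        interface AccountService { String getAccount(int accountId); } // runs the query deep inside

        static void processAll(PrefetchCache cache, AccountService svc, List<Integer> accountIds) {
            for (int id : accountIds) {   // inserted by the rewriter: issue prefetches up front
                cache.submitPrefetch("SELECT * FROM account WHERE account_id = ?", id);
            }
            for (int id : accountIds) {   // original loop, unmodified
                String account = svc.getAccount(id);
                System.out.println(account);
            }
        }
    }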
The second direction is which calls to transform. As Jared was saying, we need to figure out how many loop iterations there are and then decide whether to split or not. Minimizing memory overheads -- there's some [indiscernible] optimization there, certainly.
How many threads to use: so our experiments showed something, but this is not always going to be the case. Maybe it depends on the load of the database server. If it's already heavily loaded and you start throwing a lot more work on it, you may be causing trouble. And then you're not actually using it immediately.
So can we control this in a slightly more sensible manner?
And the last part is actually quite interesting: transactions, which I discussed with David and Jared here. So this is a big issue. We swept several things under the carpet. The first thing that we swept under the carpet is, even for the simple read-only case, each of these threads opens a fresh connection to the database. Now how do you guarantee that all of these are running under the same transaction?
So it turns out that in theory, you can use the [indiscernible] interface to
make all of these part of the same actual transaction. It also turns out that
many databases don't actually support this feature. So it's a bit of a
problem.
So where that is supported, we can use it.
>>: Some of the things you might, in fact, want to have separate transactions
and not, in fact, have the database run long transactions but rather run a
bunch of short transactions instead.
>> S. Sudarshan: Yes. So if that is what you want, then we are already doing it. We were ignoring the effect of taking one -- what used to be one transaction, one connection. But if you use auto commit and each one was independent, then there's no issue at all.
>>: You need to preserve the semantics.
>>: Oh, sure, of course, [indiscernible] several transactions it's the same effect.
>>: But most of the so-called default transactions are transaction per statement, which [indiscernible] another transaction. That's what's involved unless you do something explicit.
>> S. Sudarshan: So that case, you know, we do handle, as long as it's read-only. If you do updates asynchronously and things are [indiscernible], you're in deep trouble. So we obviously don't issue updates asynchronously.
>>: So it's safe to turn a bunch of disparate transactions into one big transaction, but it could have some bad effects on [inaudible].
>> S. Sudarshan: Yeah. So since we have access to the API, we can easily figure out whether your original program ran under a single transaction. So in that case, we take all these connections and try to shoehorn them into one transaction. But as I said, it appears not to work quite right on the databases we've tried. So we've not been able to get it running so far.
So with [indiscernible], that's [indiscernible]. With snapshot isolation, with read-only, it seems like a natural thing to say, here are all these connections, let them all use the same snapshot. Now, if the database decided to support this, it would be completely [indiscernible]. It's [indiscernible] for the database to do it. But we need an API from the database to allow this.
Okay. So that's it. Any other questions?
>>: So one of the things that -- [indiscernible] background in compilers and optimization, and I think an interesting question is to what extent the kind of program transformations you're doing might be of use whether or not you've got a database query down in the bottom of the loop of some size. What sort of conditions can you put on the things, and what sort of generality can you get, to in fact get program transformations which might apply to other scenarios as well as the database scenario?
>> S. Sudarshan: That's a good question. It's, indeed, something we talked about. So the first concern was maybe all of this had already been done in the programming language community, so we could just use it and substitute, you know, the add statement by the JDBC call. It turns out that these are very high overhead -- setting up the loop context and so on is very high overhead. This is not something that any compiler writer would put in in order to [indiscernible]. So most of the work we do makes sense only if whatever you're doing is expensive. So that is something they have not done.
But the specific analysis for loop splitting and so on, that has some
similarities with, you know, the parallel compiler where you want to take
something which has multiple iterations and then turn it into a parallel
execution. So some of the analysis is very similar to work that happens in
parallel compilers.
>>: Any other questions?
>> S. Sudarshan: Thank you.