>> Nikhil Swamy: All right. Thank you all for coming. I'm really delighted to welcome Philip Wadler to MSR. He's visiting us for the next couple of days. Most of you probably know Phil from his work on all kinds of stuff: functional programming in Haskell, generics for Java, databases and XML, and query languages related to that stuff. And I guess he's going to talk mostly about query languages today and how that fits in with general purpose programming languages. So looking forward to hearing about that. Also a little plug for tomorrow. So at an unspecified time tomorrow Phil will talk about -- >> Philip Wadler: Unspecified location. >> Nikhil Swamy: And unspecified location, Phil will talk about relational parametricity. If you're interested in hearing about that, send me an e-mail. I'll announce it on some of the RiSE lists. If you're not part of RiSE, send me an e-mail and I'll tell you where that meeting will be at some point tomorrow. Without further ado, Phil, managing query -- >> Philip Wadler: Thank you very much. It's a great pleasure to be here and to be able to talk to this audience. I'm particularly pleased because I know some of the people in here are researchers and some are developers. And we'll talk about both those things. So thank you very much for coming. Just a bookkeeping thing to mention, for the more researchy ones of you, about this talk that Nikhil mentioned. A long time ago I did something called Theorems for Free. And the way the magic in it works is through something called semantic parametricity, due to John Reynolds. The way that semantic parametricity is explained is, not surprisingly, with semantics. And it turns out the semantic explanation of it gets a bit weighty and difficult. So a few years ago I wrote a paper called The Girard-Reynolds Isomorphism that explains the same thing but in a simpler way, using the translation between two systems, one representing the programming language and one representing a logic. 
And they're interesting translations, in fact, both ways. If you're at all interested in semantic parametricity this is a bit easier way of explaining what's going on. If you know about it and you want some extra insight or you don't know about it and you want to know what it is, then this would be an appropriate talk. And this is work I did many years ago that unlike most of my other work is not very highly cited and I think it should be. So I thought I'll go and talk to people about it. Right. Let's see. So that's the first thing out of the way. The second thing out of the way is Judith what are you doing here? You've seen this talk. >>: [indiscernible]. >> Philip Wadler: You can. This is the same talk I gave at DDFP, same title even. But there's more data. We've got more data since I gave it at DDFP. >>: I have my questions, you see. >> Philip Wadler: Even better. So the last time I gave this talk, at DDFP and also at the Midlands Graduate School, it had this title: The Essence of Language Integrated Query. I'm pleased to tell you that this paper will be presented at ICFP but not with this title. They said: Oh, no, we're not sure that's really the essence of language integrated query. So they said you have to change the title. So the new title is A Practical Theory of Language Integrated Query. In many ways this is a better title because in fact what I'm going to show you is that there's a practical side and a theoretical side to what we're doing and I'll talk a little bit about how those interact. But in some ways it's a worse title, because the old title alludes to one of John Reynolds' classic papers, The Essence of Algol. John Reynolds died recently. So it's a shame that that bit of tribute to him has gone away. I should mention, by the way, that the work on semantic parametricity is also John Reynolds' and so the talk tomorrow will certainly be a tribute to him. So the tribute's gone away, but his influence, of course, goes on. Right. 
What is the difference between theory and practice? You all know the answer to this riddle. In theory, there is no difference but in practice there is. How many people -- that's an old saw. How many people have heard that joke before? Yeah, all of you. >>: Attributed to young [indiscernible]. >> Philip Wadler: That I did not know. >>: But maybe he also [indiscernible]. >> Philip Wadler: Oh, okay. Good. I will get that citation from you afterwards so I can add it. So this is going to be our touchstone for the talk, what is the difference between theory and practice. And I'd like this to be an interactive talk. You guys all just interacted with me, good. If I say something you don't understand, please do ask. There are many, many, many database programming languages. Here are a few of them. I don't think I'll go through these in any detail at all, except to point out it goes all the way back to Kleisli many years ago, done by people at Penn who are now my colleagues at Edinburgh, so I ought to mention them. Kleisli in fact is named that because of the use of monads in this work, influenced by some things I had done, and it in turn has a deep influence on what I'm going to show you today. So there are many, many systems here. >>: [indiscernible]. >> Philip Wadler: Thank you. I better fix that. This is the one that I worked on. And one of the points is that all the way back to here you'll see the roots of the ideas I'm talking about. So these are very deep ideas, but I don't want to claim credit for that. What I'm going to be talking about is how do you take one of these database query languages that's a nice database query language and integrate it with a programming language that's a nice programming language? You've got the database language. You've got the programming language. How do you get them to play nicely together? And that, of course, is exactly what LINQ is intended to address. 
And LINQ was in fact in part based again on ideas of monads that go back to the Kleisli work and to my own work. Okay. And we're going to show a way of making this go even better. And these are our goals. What I'm going to do first is skip to the end and show you something. I really should move this slide to the beginning. So there's the whole talk for you. What I wanted to show you is this bit of results. So one of the contributions of this talk is I'm going to go through a small set of examples, and there are more examples in the paper. And these all cover a set of things that I think are important to do. In one way this is the essence of language integrated queries. We've tried to extract the essence of what are some of the important things we want to do. So these are all sample programs you can see in the talk or read in the paper. And here's what happens if you try to compile them under F# 2.0 or F# 3.0. And those little Xs mean it didn't compile. It fell over waving its legs in the air. You can see F# 2.0 falls over on some things and F# 3.0 falls over on some different things. And our stuff doesn't fall over on any of these. In fact, our stuff is guaranteed not to fall over on any of these. We will give you a theory. So here's the theory bit of it that says for things in this subset it always works. So that's quite nice. You can know for a certain set of programs that you might write as queries that they will not fall over kicking their legs in the air. Now the subset we deal with, this is the difference between theory and practice, the subset is smaller than everything you might want to do. There are other things you want to do. You will see that the key idea that we're doing here is we're taking programs and normalizing them. This is the amount of time it takes to normalize. The nice thing is, if you're accessing a database, accessing a database is expensive. So normalization, down in the noise. Right? 
You can afford the time to normalize if what you're doing is accessing a database. And that's what these figures show you. But as I say we only deal with a subset of queries in the theory. What happens in practice? So we took every single query that's in the F# 3.0 documentation and ran it through our system in the normalizer, and the point is they all work. So that's the practice side of it. In practice, this could be put in F# 3.0 today and more things would work and nothing would break. So that's the take-away lesson. So let me go back and explain how it works. So I mentioned that part of what we want to do is extract out a set of problems that capture the kinds of things you want to do. So here's my list to explain the kinds of things we want to do. Abstraction over values. Pretty much everybody can do that. Abstraction over predicates. You can do higher order queries. Composition of queries; composition is always a good thing. Dynamic generation of queries. That's very important, and something that's often not dealt with well. And type safety is a nice thing to have. We also want very much the Goldilocks property. The number of queries that you generate should be not too few and not too many. And what's just right in this case is very easy to characterize: just right means exactly one. So one query in your program should turn into one SQL query sent to the database. And I'm going to be working only with SQL, but I believe that the ideas I'm showing would extend to other things, for instance, queries written in a different query language like XQuery. Possibly even these ideas could extend to integrating GPU code into a general programming language, or something like that. So we want exactly one. We don't want too few. What too few means is no queries came out, it couldn't do it. Failure, kicking its legs in the air. And not too many. 
You don't want to have what looks like one query end up generating a thousand different queries, and one of the things I showed you in that result table does generate literally a thousand queries from one query, and you don't want that to happen. And then as I mentioned, the theory is not everything you could write in LINQ. The theory is basically dealing with select-from-where queries and exists and union. What we don't deal with is group by or sort by, and those of course would be very important. It would be nice to extend the theory to include those. As I've mentioned the practice includes those already. But the theory doesn't yet. We can't give you any guarantees about them. And just as a notational convention, doing exactly what happens in LINQ, every time I say list, I mean bag, because we're going to a database, so the ordering is not relevant. As I mentioned we're not dealing with sort by. Okay? So I will say list and bag interchangeably. Is everybody familiar with bag? Some people prefer multi-set. What do people use here, bag or multi-set? Fine, I will continue to say bag. Okay. So here's an example. So I've got a database consisting of some people and their ages. And some couples. And couples here have a her and a him. If you've been paying attention to what's going on with the law in the UK, very soon I'm going to have to apply schema update to this schema. But for the time being we'll be old-fashioned and have her and him. >>: I don't know how you're going to do that since the columns are distinct. >> Philip Wadler: Yeah, it's much easier to have a her and a him. You can have a partner one and partner two. >>: Yeah, but the problem is how do you really have two partners, neither one is first nor second. >> Philip Wadler: Yes, fortunately that entire question is orthogonal to this talk. So here's a typical query written in SQL. Find all the women -- what we're going to return is the woman's name and the difference between the woman's age and the man's age. 
And what we're going to be drawing from is couples as C, people as W, people as M. And of course W is going to be woman. M is going to be man. And the her must be the woman's name and the him must be the man's name and the woman's age is greater than the man's age. This is just finding all couples where the woman is older than the man and we're printing the woman's name and the difference in the ages. Indeed, if they were symmetric it would be hard to use this example. So here's our database. Well, wait, no, here's our database. But what we want to do is look at the data in the programming language. So we want some way of viewing the database as data in the programming language. There's an old idea about how we do this. We are going to view each table as a bag of records. And since we have several tables we actually have a record of bags of records. So here we have a record with two fields, people and couples. And people is this bag of records and each record has a name and an age. And couples, again, is a bag of records where each record has a her and a him. Everybody suitably bored? If there's something you don't understand, please do ask a question. No questions yet? Yes? >>: Before, you alluded to when you make a query and it just kind of keels over and puts its legs in the air. What exactly does that mean in essence? Does that mean the query is just invalid or -- >> Philip Wadler: We'll see examples where it might have trouble generating a suitable SQL query. So this is what happens in LINQ. You can write LINQ things that denote queries, but at runtime it tries to turn that into an SQL query and it fails. That's what I meant by kicking its legs in the air. Our type of database here is just a record of lists of records. So this is just encoding the type of the database. I'm saying here, okay, so DB prime is going to be this data. And I've put a prime on it to indicate you don't really want to do this. I'll explain why in a minute. 
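The SQL query described above can be tried out directly. What follows is a minimal Python sketch using the standard `sqlite3` module, not the F# or the slides from the talk. The table contents are an assumption: the slide is not in the transcript, so the names and ages below are illustrative values chosen to agree with the names mentioned later in the talk. The helper names `make_db` and `differences` are hypothetical.

```python
import sqlite3

# Illustrative sample data (an assumption; the slide contents are not in the transcript).
PEOPLE = [("Alex", 60), ("Bert", 55), ("Cora", 33),
          ("Drew", 31), ("Edna", 21), ("Fred", 60)]
COUPLES = [("Alex", "Bert"), ("Cora", "Drew"), ("Edna", "Fred")]


def make_db():
    """Build an in-memory SQLite database holding the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    conn.execute("CREATE TABLE couples (her TEXT, him TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", PEOPLE)
    conn.executemany("INSERT INTO couples VALUES (?, ?)", COUPLES)
    return conn


# Couples where the woman is older than the man, with the age difference.
DIFFERENCES_SQL = """
SELECT w.name AS name, w.age - m.age AS diff
FROM couples AS c, people AS w, people AS m
WHERE c.her = w.name AND c.him = m.name AND w.age > m.age
"""


def differences(conn):
    """Run the query; sort because a bag has no inherent order."""
    return sorted(conn.execute(DIFFERENCES_SQL).fetchall())


if __name__ == "__main__":
    print(differences(make_db()))
```

The result is sorted only to make it easy to compare, since the talk treats tables as bags where order does not matter.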
But the idea is we've got a construct that says read in the whole -- look at the people database. Read it in. Convert it to a data structure and store that in memory. And then we could write out our query now in ordinary F#. This is ordinary F#. This corresponds exactly to our query. It says iterate over the couples table. Let C vary over that. W vary over people. M vary over people. Do a condition, just the same one as we had before. The her field of C is the same as the name field of W. The him field of C is the same as the name field of M, and W's age is greater than M's age. And we yield, as before, a name field which is the woman's name and a difference field which is the difference between the woman's age and the man's age, and there we get the answer. There, we're done. We've now integrated our queries into our programming language, and we're writing them just as ordinary programs in our programming language. This is fine. Why did I put a prime down there? Well, of course this is fine except for one tiny problem. And the tiny problem is doing it this way is insane. Other than that, it's perfectly fine. Why is this insane? Most of you will probably have already figured this out. Here I've got about ten records. Of course, in a real database I might have 10,000 or 10 million or 10 billion or 10 trillion records, and reading 10 trillion records into the computer's memory might not be feasible. The other problem is if you look at this query, this specifies for C ranging over all of couples, for W ranging over all of people and for M ranging over all of people. So if these are large, the cost would be the size of couples times the size of people squared, because we ran over people twice. So that could be huge. This is at least cubic in the size of the database. And what you'd like is something that runs much faster than that. Of course, with indexing, as an actual SQL query, this would run much faster than that. 
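To make the "read it all into memory" version concrete, here is a small Python sketch of the same idea, with the database as a record of bags of records. It is a stand-in for the ordinary F# comprehension on the slide, not a copy of it. The sample data and the names `DB`, `differences` and `iterations` are assumptions made for illustration.

```python
# A record (dict) of bags (lists) of records (dicts).
# Illustrative sample data; the slide contents are not in the transcript.
DB = {
    "people": [
        {"name": "Alex", "age": 60}, {"name": "Bert", "age": 55},
        {"name": "Cora", "age": 33}, {"name": "Drew", "age": 31},
        {"name": "Edna", "age": 21}, {"name": "Fred", "age": 60},
    ],
    "couples": [
        {"her": "Alex", "him": "Bert"},
        {"her": "Cora", "him": "Drew"},
        {"her": "Edna", "him": "Fred"},
    ],
}


def differences(db):
    """For c in couples, w in people, m in people: filter, then yield a record."""
    return [
        {"name": w["name"], "diff": w["age"] - m["age"]}
        for c in db["couples"]
        for w in db["people"]
        for m in db["people"]
        if c["her"] == w["name"] and c["him"] == m["name"] and w["age"] > m["age"]
    ]


def iterations(db):
    """Number of (c, w, m) combinations the comprehension visits."""
    return len(db["couples"]) * len(db["people"]) ** 2


if __name__ == "__main__":
    print(differences(DB))
    print(iterations(DB))
```

Even on this tiny example the comprehension visits 3 * 6 * 6 = 108 combinations to find two rows, which is the cubic behaviour the talk warns about.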
So apart from the fact that this is insanely inefficient, it's perfect. So what we'd like to do is write something like this and generate SQL and actually have the SQL execute against the database. How do we do that? So the key idea which already exists and is used in F# for exactly this purpose is quotation. So the claim is quotation is the essence of language integrated query. So a quotation just means a data structure that represents an expression. And the type of a data structure representing expression of type A will be written Expr A. So here I've got Expr DB is the type of quotations that return values of type DB, where DB is exactly what we had before. And quotations in F# are written in these brackets written angle bracket at and at angle bracket. And the quoted code I will always have written in blue. So now instead of saying actually read in database people, we say just bind DB to a quotation of something that stands for access to database people. So now, before, remember, we just had here's an expression in F# of this type. It returns a list of name difference records. Now we're going to return an Expr of the same thing. So we just write the thing in quotation brackets. And other than that it's exactly as it was before. And now what we do instead of just saying differences, we say run of differences. So run is going to be this construct in our language that takes a quoted thing and runs it against the database. And we get the same answers before. So what does run do? It computes the quoted expression. It simplifies the quoted expression or normalizes it. It then takes the normalized quoted expression and translates that to SQL. We will see that once you've done normalization, it looks almost exactly like SQL. So this step is very easy. You ship it off to the database and run it. That step's very easy. You get back a table. You translate that table back to a data structure in the host language that can be processed. So all of these steps are very easy. 
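The steps of run can be sketched in Python. This is a simplified model under stated assumptions, not the F# implementation from the talk: a "quotation" is modelled as a small data structure (the hypothetical classes `Field`, `BinOp` and `Query`), the query is assumed to be already in normal form so the normalization step is skipped, and the sample data is illustrative. It shows only the last three steps: translate to SQL, ship it to the database, and convert the table back into host-language records.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """A column of a row variable, e.g. w.age."""
    row: str
    column: str


@dataclass(frozen=True)
class BinOp:
    """A binary operation the database supports, e.g. =, >, -, AND."""
    op: str
    left: object
    right: object


@dataclass(frozen=True)
class Query:
    """A normalized for/if/yield query."""
    sources: tuple  # ((alias, table), ...), one entry per 'for'
    where: object   # the 'if' condition
    select: tuple   # ((output_column, expression), ...), the 'yield'


def expr_sql(e):
    """Translate an expression tree to SQL text."""
    if isinstance(e, Field):
        return f"{e.row}.{e.column}"
    if isinstance(e, BinOp):
        return f"({expr_sql(e.left)} {e.op} {expr_sql(e.right)})"
    raise TypeError(f"unsupported expression: {e!r}")


def query_sql(q):
    """Translate a normalized query to a single SELECT statement."""
    select = ", ".join(f"{expr_sql(e)} AS {name}" for name, e in q.select)
    sources = ", ".join(f"{table} AS {alias}" for alias, table in q.sources)
    return f"SELECT {select} FROM {sources} WHERE {expr_sql(q.where)}"


def run(q, conn):
    """Send one SQL query and turn the resulting table into a list of records."""
    cur = conn.execute(query_sql(q))
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def make_db():
    """In-memory database with illustrative sample data (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    conn.execute("CREATE TABLE couples (her TEXT, him TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)",
                     [("Alex", 60), ("Bert", 55), ("Cora", 33),
                      ("Drew", 31), ("Edna", 21), ("Fred", 60)])
    conn.executemany("INSERT INTO couples VALUES (?, ?)",
                     [("Alex", "Bert"), ("Cora", "Drew"), ("Edna", "Fred")])
    return conn


# The quoted 'differences' query, written as data rather than as running code.
differences = Query(
    sources=(("c", "couples"), ("w", "people"), ("m", "people")),
    where=BinOp("AND",
                BinOp("AND",
                      BinOp("=", Field("c", "her"), Field("w", "name")),
                      BinOp("=", Field("c", "him"), Field("m", "name"))),
                BinOp(">", Field("w", "age"), Field("m", "age"))),
    select=(("name", Field("w", "name")),
            ("diff", BinOp("-", Field("w", "age"), Field("m", "age")))),
)

if __name__ == "__main__":
    print(query_sql(differences))
    print(run(differences, make_db()))
```

Because the query is a value rather than running code, nothing is read into memory until `run` sends the generated SQL to the database.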
Turns out this is rather easy as well. So in fact all these steps are pretty easy. The hard work, of course, is actually executing the SQL query, but we're using existing SQL implementations to do that. And now here's the guarantee. If you stick to the subset language that I mentioned, which is just the only construct we're going to use are for, if, yield and also concatenation of bags and the exist construct, which given a bag returns true or false depending on whether or not it's empty. So just testing a bag for emptiness, unioning bags and for, if and yield. If you just write your program using those and in addition your answer type is what is called flat, meaning it's a bag of record of scalars. So we've seen tables are just that. They're bags of record of scalars. So if your answer is a table, which it better be if it's the answer from an SQL query. The first requirement is just the answer has to have the obvious type for a table return from an SQL query. Second thing is you only do permitted operations. So those are for, if, yield, as we just saw. Union, exist, and, right, if you're doing addition or less than or whatnot, they better be operations that the database supports. Check that this number is prime. Can't do unless it's actually supporting your database language, which last time I looked at SQL it did not have an is prime primitive. And in particular recursion is not generally supported in SQL. You can't use recursion. And finally, your uses of database must all be consistent. Right? If we expand this program out each of these expands to a use of database, they all better be the same database, right? You tried access two different databases then you'll be in trouble. Of course we'll put as many tables in the database as we like, but it all had better be one database. So all these constraints are I think quite reasonable if you want to translate into SQL. Okay. So I mentioned all these great things we want to be able to do. Abstraction. 
Composition and dynamic generation of code, how does that work? So the first thing you want to do, of course, is to be able to abstract over values. So here's something range. And range takes a pair of integers and returns a list of names. It must be a table again. So we make a list of records. Each containing a name field. And we're going to find everybody whose age is greater than or equal to the first integer, and less than the second integer. So it's a function for each W in the database. Check the age. And notice this is a range over all people. So it's women and men. And then we yield the name. So we're going to find all people whose age they're 30 somethings between greater than or equal to 30 less than 40 and that's Cora and Drew. Notice, by the way, that the abstraction here is going on in the quoted code. I'll say more about that later. But that's actually slightly surprising. I'll come back to that. So that's fairly straightforward, right? Everything supports abstracting over values. That's the obvious thing that you want to be able to do. A more sophisticated thing to do is abstracting over predicates. So now let's take an arbitrary predicate over integers and return a list of names. So now we take the predicate as an argument and again iterate over everybody and people. And if the age satisfies the predicate then we return the name. The predicate of course can be between 30 and 40, and then it would return Cora and Drew as before or the predicate might be people whose age is even because mod works because it's supported in SQL. Again if we had prime here it wouldn't work because prime is not supported in SQL. So now we're invoking satisfies on an arbitrary function. Again, the function not surprisingly has to be written in the quoted language. So everything here is just inside quotes. Any questions about that yet? Okay. So now we can extract -- yes? 
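The meaning of these two abstractions can be shown with a short Python sketch. It deliberately ignores quotation and runs in memory, so it illustrates only what range and satisfies compute, not how the talk's system turns them into SQL. The sample ages are an assumption, and `range_` carries a trailing underscore only to avoid shadowing Python's built-in `range`.

```python
# Illustrative sample data; the slide contents are not in the transcript.
PEOPLE = [
    {"name": "Alex", "age": 60}, {"name": "Bert", "age": 55},
    {"name": "Cora", "age": 33}, {"name": "Drew", "age": 31},
    {"name": "Edna", "age": 21}, {"name": "Fred", "age": 60},
]


def range_(a, b):
    """Abstraction over values: names of people with a <= age < b."""
    return [{"name": w["name"]} for w in PEOPLE if a <= w["age"] < b]


def satisfies(p):
    """Abstraction over a predicate: names of people whose age satisfies p."""
    return [{"name": w["name"]} for w in PEOPLE if p(w["age"])]


if __name__ == "__main__":
    print(range_(30, 40))                        # the thirty-somethings
    print(satisfies(lambda x: 30 <= x < 40))     # same bag, via a predicate
    print(satisfies(lambda x: x % 2 == 0))       # people whose age is even
```

In the system described in the talk the predicate must itself be quoted and use only operations the database supports; an arbitrary Python lambda as used here has no such restriction, which is exactly why this sketch could not be shipped to SQL as it stands.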
>>: So the Expr of ints to names, I'm surprised it's not an Expr of int to an Expr of names. >> Philip Wadler: Right. So this is exactly the point I just mentioned. I said it's slightly surprising that this bit of the abstraction and this bit of the abstraction is happening inside the quoted code. And you just confirmed that by saying I'm surprised. I expected it to be Expr of int to Expr of names. I'll return to that later. I'm glad you said you were surprised. I was surprised, too. Okay. And then, of course, we want to compose queries. So here's something, getAge, which is going to run a loop over people and if the name matches, returns the age. And notice that using these constructs, the best we can get coming out is actually a list of ages, because there's no way of converting a list to a single value. That's not one of our supported operators. So the closest we can get to getAge is return a list of the ages. And then we can compose getAge with range. So given two strings, we will return everybody. We'll find the age of the first person, the age of the second person and we'll return everybody whose age is between, that's greater than or equal to the age of the first person and less than the age of the second person. So these are all people that are at least as old as Edna but younger than Bert. That turns out to be Cora, Drew and Edna. Now we're just in a very straightforward way composing queries to build a larger query. So we call getAge. It's going to return a list. Let A be everything in that list. Actually, there's only one possibility. The same for B, and find everything in range. We'd like to turn this into just -- right. [indiscernible] was kind enough to read this paper and he said what's hard about that? Execute getAge on the database. Execute getAge again on the database, execute range on the database. Three queries, you've got your answer. Yes indeedy. But we don't want to do it that way. We want one query. 
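The contrast between three queries and one can be sketched in Python with `sqlite3`. This is an illustration under assumptions, not output from the talk's normalizer: the single SELECT below is one plausible form a composed query could take, the sample data is illustrative, and the function names are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory database with illustrative sample data (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO people VALUES (?, ?)",
                     [("Alex", 60), ("Bert", 55), ("Cora", 33),
                      ("Drew", 31), ("Edna", 21), ("Fred", 60)])
    return conn


def compose_three_queries(conn, s, t):
    """The approach the talk wants to avoid: getAge, getAge, then range."""
    ages_s = [a for (a,) in conn.execute("SELECT age FROM people WHERE name = ?", (s,))]
    ages_t = [b for (b,) in conn.execute("SELECT age FROM people WHERE name = ?", (t,))]
    names = []
    for a in ages_s:
        for b in ages_t:
            rows = conn.execute(
                "SELECT name FROM people WHERE ? <= age AND age < ?", (a, b))
            names.extend(name for (name,) in rows)
    return sorted(names)


# One query: the two getAge lookups become extra row variables a and b.
COMPOSE_SQL = """
SELECT w.name
FROM people AS a, people AS b, people AS w
WHERE a.name = ? AND b.name = ? AND a.age <= w.age AND w.age < b.age
"""


def compose(conn, s, t):
    """Everybody at least as old as s but younger than t, in a single query."""
    return sorted(name for (name,) in conn.execute(COMPOSE_SQL, (s, t)))


if __name__ == "__main__":
    conn = make_db()
    print(compose_three_queries(conn, "Edna", "Bert"))
    print(compose(conn, "Edna", "Bert"))
```

Both functions return the same bag of names; the difference is only in how many round trips to the database they make.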
Every time we call run, it should be a single query. And it turns out turning that into a single SQL query is not entirely trivial. And then finally of course we would like dynamically generated queries. This is extremely important. You fiddle around with your Web interface. This is almost every Web program in existence. What you're really doing is through the Web interface building some data structure, turning that data structure into a query, executing that query on a database, and then displaying the answer. That's pretty much every Web application in existence. So as a simple example of that, here's a data structure that represents predicates. So above will mean greater than or equal to the given integer. Below means less than the given integer. We can take and, or and not of arbitrary predicates. So not surprisingly this structure T0 of type predicate represents the query we had before, which is people at least 30 and less than 40. And this is something different, not of below 30 or above 40, which happens to be extensionally the same predicate: not the same piece of code but true for the same values. Any questions about that? That's how we represent predicates. How would we turn this into a query? We use our Web interface. We build up a predicate and we want to query against the database. Again, this can be pretty straightforward. I'll write a function P that takes a predicate and returns an Expr of int to bool that, given an age, says whether the predicate is satisfied. There's an operation that takes an integer into an Expr of integer, which is called lift. This percent sign means splice into some code. We've been using that before, to splice in our database or a call to getAge or to range. People didn't ask about that. So I assume it's fairly straightforward. So that just takes a value of type Expr and splices it in to get a bigger value of type Expr. 
In this case what you want to splice in is an integer, so lift converts the integer to an Expr of integer and we can splice it in. And this is the obvious thing, the function over X that checks if the given integer is less than or equal to X. Similarly for below. And then for "and" we just recursively apply P to T and U to convert those into predicates and apply each of those to X and "and" the results. Similarly for or, similarly for not. You couldn't imagine a much more straightforward piece of code than this. Now, remember our database doesn't handle recursion. This is, of course, recursive code. We're recursively invoking P. But we're doing the recursion at query generation time, not query execution time. So recursion at query generation time is fine. Query execution time is not. Notice also, remember, you said before you expected the type to be something goes to Expr of something. Here we have this. The type of this code is: given a predicate, return an Expr of int to bool. The predicate argument can't be inside Expr because we're taking apart the predicate at query generation time, not query execution time. So things are inside the Expr when we want them to happen at query execution time, outside when we want them to happen at query generation time. And then not surprisingly, P of T0. You just expand that out using the code I gave you before and it turns into this, which is kind of messy, because we've got -- here's the above 30. And then here's the "and" of those two things. Here's the below 40. And this is, of course, needlessly complicated code but we'll normalize it. And the normalizing here is very easy. We just substitute X for X in this case. And we end up getting this for the normalized code, which is fine for executing. So now I can do satisfies of P applied to T0. And this gives, of course, the same answers as before. And of course since T1 is the same predicate, this gives the same answers as before. 
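The split between query generation time and query execution time can be sketched in Python. This is an illustration under assumptions rather than the F# from the talk: instead of building a quoted function and normalizing it, the hypothetical function `P` below generates SQL condition text directly, and the sample data is illustrative. The recursion over the predicate happens entirely in Python, before any SQL is sent.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Above:
    bound: int      # age >= bound


@dataclass(frozen=True)
class Below:
    bound: int      # age < bound


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Not:
    arg: object


def P(t, x):
    """Generate an SQL condition for predicate t applied to SQL expression x.

    The recursion runs at query generation time; the generated SQL has none.
    """
    if isinstance(t, Above):
        return f"({int(t.bound)} <= {x})"
    if isinstance(t, Below):
        return f"({x} < {int(t.bound)})"
    if isinstance(t, And):
        return f"({P(t.left, x)} AND {P(t.right, x)})"
    if isinstance(t, Or):
        return f"({P(t.left, x)} OR {P(t.right, x)})"
    if isinstance(t, Not):
        return f"(NOT {P(t.arg, x)})"
    raise TypeError(f"not a predicate: {t!r}")


def make_db():
    """In-memory database with illustrative sample data (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO people VALUES (?, ?)",
                     [("Alex", 60), ("Bert", 55), ("Cora", 33),
                      ("Drew", 31), ("Edna", 21), ("Fred", 60)])
    return conn


def satisfies(conn, t):
    """One SQL query whose WHERE clause is generated from the predicate t."""
    sql = f"SELECT w.name FROM people AS w WHERE {P(t, 'w.age')}"
    return sorted(name for (name,) in conn.execute(sql))


T0 = And(Above(30), Below(40))
T1 = Not(Or(Below(30), Above(40)))

if __name__ == "__main__":
    conn = make_db()
    print(P(T0, "w.age"))
    print(satisfies(conn, T0))
    print(satisfies(conn, T1))
```

`T0` and `T1` are different data structures that produce different SQL text, but they select the same people, matching the point that they are the same predicate without being the same code.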
Now we've got dynamically generated code because we can build T0 at runtime. So notice that T0 here, its type would be predicate, not Expr of predicate. But it generates an Expr. Okay. Any questions about that? That's the basic technique. There's one other thing we can do, which is really cool, which is nesting. Sometimes it's very useful to build a non-flat data structure as part of building your query. I'll give you what I think is a compelling example of this. And our point is going to be it's perfectly fine to nest as long as the answer itself is not nested but flat. Again, a good area for future work is what do you do if you want the answer itself to be nested. There's nice work by the Ferry group, which I showed at the beginning. They show that if your nesting is D deep, say you have a list of list of lists for your answer, that's three deep, they can do it with three queries. Not one, but three queries, the number depending on the depth of nesting. That's work that's been done. Adapting that to this work would be interesting future work. Let's restrict ourselves: the answer is flat but there's maybe some nesting in the query. Why would you want to do that? Here's an example. Here's some company. It's got four departments. Product, quality, research and sales, and each department has some employees. And each employee has tasks they can do. So Alex knows how to build stuff. Bert knows how to build stuff. Cora knows how to abstract, build and design things. Fred knows how to call. And so that makes sense because like Fred works in sales. So it makes sense he knows how to call. Cora knows how to abstract. That makes sense because she works in research. Alex knows how to build; that makes sense because she works in product. So how would I represent this organization? It's got three tables, departments, employees and tasks, that we've just seen. So there's our organization represented and ready to do a query. Here's the query I want to do. 
Find departments where every employee can do a given task. So here I'm asking find departments where every employee knows how to abstract. Turns out there are two departments like that. One is research. Not surprisingly. The other is quality. That's maybe surprising. But not too surprising, because if you go back and look you see that the quality department in this company has no employees. So if you ask is it true that every single employee in quality knows how to abstract, the answer is yes because there are none of them. So here's the query. Very straightforward, right? Well, let's see. We go over all the departments. Let D be a department. Let's go over all the employees. Find those employees that are in that department. And then for those find -- let's go over tasks. Find all the tasks done by that employee and check if the task is this task U that we're interested in. So U is the name, in this case abstract. So now we're forming an inner list here for each given employee of all the tasks that that employee can do and we're saying, right, if one of the tasks the employee can do is the given one, then yield an empty record. So then this list will be non-empty if the employee can do the given task. Then we take the negation of that. So this is now the list of all employees that cannot do the given task. And we check whether that list is empty, and yield the department if it is. If that list is empty, then that means there's no employee in that department that cannot do the given task. Nice straightforward piece of code, right? If you were writing SQL, the exact analog of this is the cleanest thing you could write. This is actually the naive code that somebody would write in SQL to answer this query. But as we've just seen, it's not really very clear. Can we structure this in a way to make the query easier to read? So that involves nesting. So here's a more logical way of structuring the data that we have. We're going to have a bag of departments -- a bag of records. 
And the record has a department name and a list of all the employees. For each employee we have the employee name and a list of all the tasks that employee can do. So Cora can do three tasks. Drew can do two tasks and so on. And quality has no employees. So now we want to build the nested data. That's easy. Here's the straightforward query that takes the three flat tables that I showed you and gives you a nested organization structure. Very straightforward. Then given a nested organization structure, so here's some higher order queries you might want. And these all actually exist in LINQ with pretty much these names. "Any", which takes a list of As and a predicate over As and returns true if some value in that list satisfies the given predicate. So this is just the exists predicate. "All" is the dual of that. So run over the negation of the predicate and as long as that's non-empty, this is just the normal way for defining for all in terms of their exist using double negation, using De Morgan's law. This returns everything in the list satisfies the given predicate and contains is just checking whether given a list of A's and a value A, it just checks there's some value in the list that is equal to the given U. So a list of Xs and a value U, just the predicate we use with any is X equal to U. So this is check for containment. Check whether the value A appears in the A list. So now we can rewrite expertise much more cleanly. Remember, before we had expertise prime. Here the prime doesn't mean this is insanely inefficient, here the prime means this is insanely difficult to read. So here's an equivalent query, which I think is not insanely difficult to read, which is just for each D in the nested organization, extract from that the employees, use the predicate that the tasks field of the employee contains the given task name U. And if that's true, yield that department name. 
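The nested version can be sketched in Python. As before, this runs in memory and ignores quotation, so it shows what the higher-order queries mean rather than how they would be normalized to a single flat SQL query. The table contents are assumptions reconstructed from the names and tasks mentioned in the talk, and `any_`, `all_` carry a trailing underscore only to avoid shadowing Python built-ins.

```python
# Three flat tables. Illustrative data, partly reconstructed from the talk;
# the slide contents are not in the transcript.
DEPARTMENTS = ["Product", "Quality", "Research", "Sales"]
EMPLOYEES = [("Product", "Alex"), ("Product", "Bert"), ("Research", "Cora"),
             ("Research", "Drew"), ("Research", "Edna"), ("Sales", "Fred")]
TASKS = [("Alex", "build"), ("Bert", "build"),
         ("Cora", "abstract"), ("Cora", "build"), ("Cora", "design"),
         ("Drew", "abstract"), ("Drew", "design"),
         ("Edna", "abstract"), ("Edna", "call"), ("Edna", "design"),
         ("Fred", "call")]


def nested_org():
    """Build the nested structure: departments -> employees -> tasks."""
    return [
        {"dpt": d,
         "employees": [
             {"emp": e, "tasks": [t for (e2, t) in TASKS if e2 == e]}
             for (d2, e) in EMPLOYEES if d2 == d]}
        for d in DEPARTMENTS
    ]


def any_(xs, p):
    """True if some element of xs satisfies p (an 'exists' over a bag)."""
    return len([x for x in xs if p(x)]) > 0


def all_(xs, p):
    """True if no element of xs fails p (double negation over any_)."""
    return not any_(xs, lambda x: not p(x))


def contains(xs, u):
    """True if u appears in xs."""
    return any_(xs, lambda x: x == u)


def expertise(u):
    """Departments where every employee can do task u."""
    return [d["dpt"] for d in nested_org()
            if all_(d["employees"], lambda e: contains(e["tasks"], u))]


if __name__ == "__main__":
    print(expertise("abstract"))
```

Quality appears in the answer because it has no employees, so `all_` holds vacuously, which is the behaviour the talk points out.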
So again running expertise over abstract we get quality and research just as before, because the two predicates are equivalent. Okay. So that's why you might want to do nesting. And now in just a moment I'll say how we support all this. >>: SQL should be extended with the nested structures. It would make so many people happy. >> Philip Wadler: Possibly you want a bigger programming language. But I'm not going to address that problem. I'm going to say SQL is a hard object to move. But you guys can change LINQ if you want to. And this doesn't even require any changes to LINQ, just a change to the LINQ query provider, the thing that changes the LINQ expression tree into SQL. So this is an easy way of getting all that power without needing to change SQL and getting it easily within LINQ. Now I'll go to the point you were surprised by and that I think everybody should have been surprised by. I was surprised by it. I was so surprised I can still remember exactly where I was when I realized: Oh, this is surprising! I was riding through -- in fact well named, the Links, in Edinburgh, with the castle to my left. What's going on here? So what's interesting here? The way we wrote range, we took a pair of integers, all done inside the quotes. And you might well have been surprised by that because you might have expected it outside the quotes. The natural thing to do is to say, well, really take two integers, but just to separate out the problems, let's say take two Exprs of integers, and we want to plunk those in in the right place in the query. Wouldn't that be the best way of doing it? Well, no, because that gets in the way of doing compositions that return just one query. So then when we did compose we'd like that to give a single query. As it was, that was easy. What would happen if we put the quotations on the outside? Put the quotations on the outside so let's make the minimum change we could. So we could also put these strings on the outside.
Let's even leave the strings on the inside. Let's just assume only this one did the expected thing and put them on the outside. What goes wrong? What goes wrong is we call -- again we call getAge of S and getAge of T for A and B. But A and B now are in -- range now is expecting quoted things. So we give it A quoted and B quoted. And in fact there are type systems for quotations where this is a fine bit of code and everybody is happy. There's something called MetaML, where this is a perfectly fine thing to do. F# supports typed quotations, but it does not support the type system of MetaML. What happens if you try to do this in F#? It gives you an error message. I will now give you a loose translation of the error message. The error message says what are you, crazy? You've given me a quotation for A, but A isn't bound within that quotation. Turns out A is bound in an enclosing quotation. But F# doesn't handle that. So if F# could handle it, this would be a perfectly fine way to do things. But it can't. So what you have to do is put it on the inside instead. That's why the surprising thing happens. >>: Expr is applicative? >> Philip Wadler: What do you mean when you say Expr is applicative? >>: I'm thinking applicative in Haskell, actually. >> Philip Wadler: Right. You have an Expr that's of a function, Expr of type A to B, and another Expr of type A, and you want to apply the two. >>: Yes. >> Philip Wadler: Here's some whiteboard. So the type of F is an Expr of A to B. And the type of X is an Expr of A. >>: Can you apply those -- is there [indiscernible] gives you an IP to that? >> Philip Wadler: We just do percent F and that does exactly what you want. So if you read the paper -- I'm not going to do it here, but if you read the paper, there are things that say these are what is called open quotation.
Open quotation is where a variable is not bound in the quotation. These are easy to deal with as long as you have a very sophisticated type system that includes in the type of the Expr the types of all -- sorry, the names and types of all the free variables that appear inside quotes. So it's very straightforward to deal with. It's just F# doesn't deal with it. And we show that if you take a very simple version of such a language, in fact basically we say instead of having an open variable, lambda abstract over it. That's a fairly straightforward thing to do. That's all we've done all the time. Instead of having free variables, we just abstract over them. So in fact if you had perfect -- if you were willing to change your programming language, you would be perfectly fine to add one of the many different ways of dealing with open quotation to it. Surprising result, the aha moment I had, because I was doing it for ages without realizing it, the paper was almost ready to send off. In the week we were sending it off I suddenly went, wait, this is amazing. We're putting all the quotations inside rather than outside. And the reason we're doing it is because then we don't need open quotation. And you never need open quotation because you can just lambda abstract instead. And that's all cheap because we're going to normalize things before we generate the query. So the fact that we've got lots of extra lambda abstractions that get applied doesn't matter. It all gets normalized out. So there's nothing deep here, but as I say it was surprising enough to almost make me fall off my bicycle. >>: Isn't there another problem, too, that range actually runs the query already, so you would -- >> Philip Wadler: No, you never run a query until you say run. >>: Range prime I thought was [indiscernible] to be running. >> Philip Wadler: No, range prime just had a prime because it's Expr to Expr. >>: I see. >> Philip Wadler: By the way, there's a typo here that should be Expr names not names. >>: [indiscernible].
>>: I didn't notice it was outside. >> Philip Wadler: There's the run. >>: But still range prime is a -- you so you would apply it to runtime still in the upper -- >> Philip Wadler: Yes. >>: In a system like MetaML you have the nested quotations, what you're saying is take MetaML and quotations and start applying... >> Philip Wadler: This would be perfectly fine -- >>: [indiscernible]. >> Philip Wadler: This would be perfectly fine in MetaML, yes. >>: But arbitrary MetaML do quotations, the misinformation do transformation, put all quotations inside and use lambda abstractions. >> Philip Wadler: So the interesting question is could any program in MetaML be changed to this other style, and I don't know the answer to that yet. >>: So you don't know the exact -- for this system it holds? But you don't know yet like when would it fail to -- >> Philip Wadler: Yes. I detailed in detail how and Haskell implements the MetaML system for the quotations. As of a week ago it has that in it as a feature. >>: Way advanced. [laughter]. >>: So behind, Tim. >> Philip Wadler: So what I'm saying here is in a language that only supports closed quotations, like F#, obviously you better use closed quotations if you can rather than open quotations. That works for us because of the normalization, and if you want a more complicated way of saying it, prefer quotations of functions to functions of quotations, which is, where you'd want a function that took a quotation to a quotation, we've got a quotation of a function. There's a more complicated example in the paper involving changing queries over XPath into SQL. Look at the paper. I won't show you that. But what I do want to do is take a couple of minutes to show you how it works. So this is scheduled from -- >>: We have the room until three. >> Philip Wadler: Until three but people probably want to go. People want to leave, should I finish at 2:30? Is that what people would like? >>: You can go up to 2:30 and maybe a bit past. >> Philip Wadler: Okay.
Let's aim for 2:30. So this is a very straightforward type system for the language we've got. All I want to show you, this is the typing for expressions. It's just what you would expect. Notice that we've got recursion as a construct in the language for expressions. Here's the typing for quoted terms. It's again exactly what you'd expect. Everything in blue this time. Notice that we've got access to the database in the language of quoted expressions, but not in the language of unquoted. So things outside of quotes you can use recursion. Things inside you can't. Now, in practice the way you do this is you just run over the quote and see is there anything not permitted there, like recursion. So at runtime you'll need to check that. In the theory of the language, we check for that in the type system. And then the key things are the things that move between quoted expressions and unquoted expressions. Notice quoted expressions need an extra thing in the type, so things in gamma are free variables that appear unquoted. Things in delta are free variables that appear inside the quotes, like fun X where X is inside a quote. And then the things that move between just move between the two judgments. Quoting moves from one judgment to the other. And antiquote takes it the other way around. Run takes the quoted thing and runs it, at type T where T, remember, is, what was it, bag of records of scalars. Your table type. And lift, remember, takes something of base type. So that's O. And turns it into an Expr representing the same thing. So now we need to normalize. The rules for normalization are all very familiar. If you've got a function application, substitute. If you have a record, extract the record field. These are both called beta reduction. They're very standard rules for for and yield. These are called the monad laws. If you've got a for of something that's immediately doing a yield, just substitute. If you have two fors you can rearrange them.
That's called the associativity law. And you have other laws that are very straightforward. If you've got a for of an if, turn it into an if of a for. If you have a for over the empty list, that of course would just be the empty list. If you have a for over concatenation, we didn't even see examples involving this, this is union, but that would just turn into a union of two fors, and of course if of true and if of false reduce in the obvious ways. Notice we're using the variant of if-then-else where the else clause returns empty. It's a where clause in SQL. >>: Don't you worry about captures at this point? >> Philip Wadler: Yes. Capture avoiding substitution, of course. And so these are very standard rules. They go way back. And then we need some non-standard rules just to make sure it's in SQL format. So SQL is not completely compositional. You can use union but only at the top level. If you have a for of a union, turn it into a union of fors. This, by the way, is the only place where we rearrange the order. This is why we have bags, apart from the fact SQL supports bags. So again if we have a for over an empty list we'll have to turn it into an empty list, turns out SQL doesn't support that. And these are just all straightforward things. The most interesting one is you cannot have two where clauses in SQL. So if you have two successive where clauses turn them into one. You can't have -- you can only have a where inside a for. If you have an if outside the for, push it inside. So this pushes all our ifs to the end. So let's -- and then this has all the standard properties you would want for both of these relations. >>: I'm surprised you pushed it to the end, bubble these to the top as much as you can. >> Philip Wadler: The most efficient thing is to bubble the ifs to the top. The thing SQL supports is at the end. SQL forces you to push the wheres to the end. What we'll do is push the wheres to the end and the SQL optimizer will bubble them up again.
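To make the rewrite rules just listed concrete, here is a small Python sketch that checks, on sample data, that each rule relates two expressions with the same meaning. It is not the normalizer from the paper: `for_`, `yield_`, `if_` and `same_bag` are hypothetical helpers that model the query constructs with Python lists, and the sample bags and functions are assumptions chosen only for illustration.

```python
from collections import Counter

def for_(xs, body):
    """`for x in xs do body(x)`: concatenate the results of body over xs."""
    return [y for x in xs for y in body(x)]

def yield_(x):
    """`yield x`: the singleton collection."""
    return [x]

def if_(cond, xs):
    """`if cond then xs` with an implicitly empty else (a SQL `where`)."""
    return xs if cond else []

def same_bag(a, b):
    """Equality up to order, i.e. equality as bags."""
    return Counter(a) == Counter(b)

# Sample data and bodies, chosen arbitrarily for the checks below.
xs, ys = [1, 2, 3], [3, 4]
f = lambda x: [x, x + 10]
g = lambda y: if_(y % 2 == 1, yield_(y * 2))

def laws_hold(c, d):
    """Check each rewrite rule for the given boolean conditions c and d."""
    return all([
        # for over yield: just substitute
        for_(yield_(7), f) == f(7),
        # nested fors: associativity
        for_(for_(xs, f), g) == for_(xs, lambda x: for_(f(x), g)),
        # for of an if becomes an if of a for
        for_(if_(c, xs), f) == if_(c, for_(xs, f)),
        # for over the empty list is empty
        for_([], f) == [],
        # for over a concatenation becomes a union of two fors
        for_(xs + ys, f) == for_(xs, f) + for_(ys, f),
        # SQL-specific: a union in the body is hoisted; equal only as bags
        same_bag(for_(xs, lambda x: f(x) + g(x)), for_(xs, f) + for_(xs, g)),
        # SQL-specific: two successive wheres become one
        if_(c, if_(d, xs)) == if_(c and d, xs),
        # SQL-specific: an if outside a for is pushed inside
        if_(c, for_(xs, f)) == for_(xs, lambda x: if_(c, f(x))),
    ])

all_laws_hold = all(laws_hold(c, d) for c in (True, False) for d in (True, False))

# The union-hoisting rule is the one that reorders results.
hoist_lhs = for_(xs, lambda x: f(x) + g(x))
hoist_rhs = for_(xs, f) + for_(xs, g)
order_changed = hoist_lhs != hoist_rhs

print(all_laws_hold, order_changed)  # -> True True
```

Every rule except the union-hoisting one preserves list order in this model, which matches the remark that this is the only place the order is rearranged and hence why the results are treated as bags.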
But this has all the standard properties. The reductions preserve typing. They're strongly normalizing and confluent. You can apply them in any order whatsoever. So all of this is very straightforward. >>: You said they're strongly normalizing but you didn't show the reductions for the recursive functions, right? >> Philip Wadler: We're reducing quoted terms. >>: [indiscernible]. >> Philip Wadler: But only reducing the quoted stuff and the quoted stuff can't include recursion. That's why it's all easy. And I should mention that you can find these rules at least going back to the old papers on Kleisli. Ezra Cooper, working on the Links team with me, wrote a paper that has essentially these rules and these rules in it. They were done all at once. Strong normalization was hard to prove. One of the small innovations in this paper is we break it into two sets of rules. For these, straightforward, strong normalization is well known. These you have to prove strong normalization but it's straightforward. >>: Do you really have strong reduction, can you de[indiscernible]. >> Philip Wadler: Yeah. >>: Beta outside, call by name reduction? >> Philip Wadler: Yeah, we're doing call by name reduction in the quoted terms. Doesn't matter. >>: Can you reduce even underneath that I suppose -- >> Philip Wadler: Yes because there's no side effects. >>: I don't see the rules that let you do that. >> Philip Wadler: What rule lets you do it? This rule. >>: Reduce N without having -- >> Philip Wadler: Sorry, it's the compatible closure of these rules so you can reduce anywhere. >>: Okay. >> Philip Wadler: Example. Remember compose. Let me show you how all these rules help us out. Here's compose of Edna and Bert. We take the definition of compose, expand it out. That involves calling age of and range. We expand everything out and we get this. So this is in fact what happens after you have spliced everything together. So at the point we need to normalize, this is what we're given.
So now we're going to normalize this. Let's see, what have we got? We've got this function applied to Edna and Bert. We'll substitute Edna and Bert. We've got this. We've got some fors with fors inside of them and ifs inside of them, but remember we have rules that would percolate those out. We go ahead and do that. We do it here again. Now we've just got a bunch of fors and ifs. But we've got ifs alternating with fors. We have these rules, push it to the end. We do that. We use the rule that combines this. So all these separate ifs now get combined together into one big if. And this looks exactly like SQL. Right? Select, where -- sorry. These are froms, this is where, and this is the select at the front. So this has to go to the front. But that's about it. So there's the corresponding SQL. And we execute that and we get the answer. Okay? And this is what you saw before, right? So -- oh, the nested one, if you run it in F# 3.0, it will run but it actually acts on the nested structure and to act on the nested structure it issues the query the same number of times as there are departments. So if you have 100 departments, this will execute 100 queries. Whereas we execute one query. So it's a lot faster. Everything else we're either slightly faster or slightly slower than F# in either variant. And they're all much of a muchness. And the point is the tables are big enough that the normalization time doesn't matter and is quite tiny. And this is from moderate sized tables with about 5,000 entries. For really big tables the normalization really doesn't matter. And this is, by the way, using a very slow normalizer. I'm sure you could write faster ones. >>: Why does F# 2 or F# 3 fall over in some cases? >> Philip Wadler: Ha, because they're not doing the normalization. They could be, but they're not. >>: What does it mean, like SQL they cannot generate SQL for it?
>> Philip Wadler: You can build up the Expr that represents the query and then you hand it over to the query provider, which is supposed to turn it into SQL. And it says, well, I don't know what to do here. >>: It changes between different releases. >> Philip Wadler: Changes between different releases. So for the subset I've described, we can always guarantee it works. And as I mentioned if you're outside that subset, for all the queries in the standard documentation, it works. And again the normalization time is small. So this is something that somebody could just sit down and implement today. Right. We've implemented it. Turns out there's one problem with F# 3.0 in hooking in and using your own data provider, your own thing that converts LINQ expressions into SQL queries. It would be very nice if somebody on the F# team would do that so people can start using this new technique. That can be done now. And there are all the details. But, please, you can download the paper from my website. There will be an updated version by the end of the month which will be the camera ready copy for ICFP and please have a look if you would like to see more details. So these were our goals. You've seen they've all been achieved. And finally I want to return to this old question of what is the difference between theory and practice? So we do have two different things. We've got the theory that I've shown you and we've got the practical implementation. Right. And the theory doesn't apply to all programs, but if you just do normalization over ordinary programs, it never makes things worse, not surprisingly. So anything that ran before still runs after you normalize it. And some things that didn't run before do run, if you normalize it. What we actually got here is a recipe. And the recipe says what do you do if you want to take some arbitrary domain-specific language and integrate it into your programming language?
Well, just write out the domain-specific language in the same syntax as your programming language and quote it. Actually, it's not necessarily. And then normalize it. The one thing -- what normalizations do you need? It will vary from language to language but you certainly want beta reduction. And sometimes maybe even beta reduction will be adequate. I'm guessing that for generating GPU code it's just beta reduction. I'm not sure. We'll look at that as an example next. Some reviewers of the paper pointed out, wait a minute, you do not need the host language and the quoted language to be the same. In fact, in our case they weren't really the same. It's the same syntax, but the host language has recursion. The quoted language has the database construct. So in fact they differ just slightly. In fact, you could just have your quoted language be completely different. Nobody does that, though, in practice. In practice, people implement quotation for the language that contains the quotation. So the quotations in F# are of F# code. The quotations in Haskell are of Haskell code and so on. So what is the difference between theory and practice in our work? Well, in theory there is a difference, but in practice there isn't. Thank you very much. [applause] >>: So I'm curious if you looked around and found evidence of people running into this limitation of F# and complaining about it, basically. Or sort of developer communities running into this obstacle, essentially. >> Philip Wadler: Right. That's a very good question. The answer is you can find various blog posts saying -- Tomas Petricek, for example, who is on the F# team, has done clever work in getting things like our dynamic queries to work. Done [indiscernible] if you want dynamic query to work here's how you do it. He gives a clever recipe. He doesn't have any theory that says it's always going to work or anything like that. So this is actually a very good question.
How often do people bump into this in practice? We don't know. There's no systematic data. There's a little bit of anecdotal data from people doing blog posts saying this is how I managed to get this thing to go through LINQ. But in fact I would say we don't have convincing knockdown evidence to prove it's a problem. >>: Moving chart because when people complain we say we'll fix that case? >> Philip Wadler: Looking at this, these are all things you'd like to do, and sometimes you can and sometimes you can't. So this is sort of our knock-down evidence, saying there's a problem. >>: Do you know about other languages, like C# LINQ providers, [indiscernible]. >> Philip Wadler: We've not tested any of this in C#. That's a good question. And C# doesn't actually give you quotation per se, it's all hidden; it gives you operators that build up expression trees but they don't have quotation per se. Doing this in C# would not be quite so easy. And one of our recommendations would be put quotation into your programming language because it gives people this as an option. >>: So F# [indiscernible] that is what is it that it provides in its more complicated algorithm? >> Philip Wadler: It's complicated enough we haven't looked at exactly what this is doing. We're hoping they will just adopt it and use it. >>: So you don't know, for example -- >> Philip Wadler: Drop a note saying you should implement Phil's stuff right now, I want it. Sorry? >>: For example, it's at least not the case that they're just missing the normalization phase or that they do -- >> Philip Wadler: No, it's exactly just that they're missing the normalization phase. You just add normalization and it will work, because the back end that we used for this is F# 3.0. But the only thing we changed is we did some normalization first, that's it. So just normalizing is all you need to do. >>: You said the quoted language does not support [indiscernible] of any sort.
Could we have that dropped in kind of feature where you can have -- providing any -- not expression, but any value passed in expanded in place. If I got that correctly. >> Philip Wadler: Right. You're referring to the lift operator that would convert an integer to an Expr of integer or string. >>: Yes. >> Philip Wadler: Right. >>: So in that case, is it -- well, is it the case -- >> Philip Wadler: But that's a value. You use recursion to compute the value but then you just have the value. >>: What about -- a function itself? Sorry if I misunderstood. >> Philip Wadler: Ah, lift applies to base types, not to functions. >>: Okay. So -- >> Philip Wadler: Functions you must start with quoted code. You can't take an arbitrary function and turn it into a quoted expression tree that describes that function. >>: Okay. Thank you. >>: Did you have a question? >>: I'll ask afterwards. >>: All right. Well, let's thank the speaker again. [applause]
