>> Nikhil Swamy: I'd like to welcome Adam Chlipala to MSR. Adam's a post-doc at Harvard, and he's going to talk to us today about his language Ur/Web, which is about metaprogramming AJAX apps with static types. >> Adam Chlipala: Thanks. So I'm going to be talking about a new programming language that makes it easier to build correct web applications, and I want to start by reviewing what the main challenges are in building web applications today. In the beginning of the web, things were pretty simple. All it was was clients asking web servers to give them particular static pages. And when you're building a website, the model you might have in mind is that your website is a big graph with interconnections between your different pages. But unfortunately the actual implementation of the website is a bunch of HTML pages that encode links between pages using URLs as text. So it's possible to make a mistake in a URL or remove a page that's still being referenced from a URL somewhere. And it's hard to have an idea of how a change to your application is going to affect the overall structure and whether you'll make it invalid somehow. So already there's an opportunity for using some higher-level language structure to make it easier to build correct applications. Then we come to the use of CGI scripts, which is a way for a request from a client to be serviced by having the web server spawn some external process, tell it what the client request says, and have that process decide which page should be returned to the client. And now things get even harder, because there's that same idea that you have a big web of pages that you need to keep consistent, but now it's sort of an infinite graph, because you can have different pages based on different inputs that the client might provide to your application. And that can take the form of, for instance, a forum website where users can come and post messages that everyone else can see.
And maybe there's a mischievous user who says that the body of his message contains some HTML, and maybe you didn't implement your application defensively enough and this HTML just gets transplanted directly into the page output that you show to other users visiting your website. And maybe this has some unintended side effect. And that could include things like including Javascript code in that HTML, which essentially would let users cause your website to tell other users' web browsers to run arbitrary code. This is what's called a code injection attack, where some insufficiently filtered input from the user is treated directly as a program to be evaluated. So it would be nice to have ways of being sure that your program doesn't have this kind of vulnerability in it. And it's also common for web applications to use relational databases as persistent back ends where they store their data. And for this to work, your application has to have in mind a view of what the database actually is, including which tables there are and which columns they have. And the database has its own version of that information. And if these two get out of sync then you can run into some trouble, including some potential security violations. And this is what's ended up being called the impedance mismatch problem in interfacing systems like this. And the way that this kind of interface has traditionally been implemented is just that your application constructs queries and commands as strings and just sends those over to the database, which interprets them as programs. And if you don't take sufficient care in constructing these strings, there are other possibilities for attacks, where maybe the user comes up with a clever string to enter in one of your forms, and because you're taking his input and including it in your SQL query but you forgot to, say, escape the string properly, the user finds a way to run arbitrary SQL commands as your web application.
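The two injection problems described here can be sketched in a few lines of Python. This is only an illustration, not anything shown in the talk: the `message` table, the function names, and the attack string are all made up for the example, and it uses Python's standard `sqlite3` and `html` modules rather than Ur/Web.

```python
import html
import sqlite3


def make_db():
    # Hypothetical forum table, invented for this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (author TEXT, body TEXT)")
    conn.execute("INSERT INTO message VALUES ('alice', 'hello')")
    conn.execute("INSERT INTO message VALUES ('bob', 'secret')")
    return conn


def find_unsafe(conn, author):
    # Vulnerable: user input is spliced into the query text, so the
    # database interprets part of the input as SQL code.
    query = "SELECT body FROM message WHERE author = '" + author + "'"
    return [row[0] for row in conn.execute(query)]


def find_safe(conn, author):
    # Safer: the input travels as a bound parameter and is only ever data.
    query = "SELECT body FROM message WHERE author = ?"
    return [row[0] for row in conn.execute(query, (author,))]


def render_message(body):
    # Escape markup so a message body is displayed as text, not run as HTML.
    return "<p>" + html.escape(body) + "</p>"


attack = "nobody' OR '1'='1"
print(find_unsafe(make_db(), attack))  # the WHERE clause now matches every row
print(find_safe(make_db(), attack))    # no author has this literal name
print(render_message("<script>alert(1)</script>"))
```

Both safe variants depend on the programmer remembering to use them every time; the talk's point is that a type system can rule out the unsafe versions altogether.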
And this can have bad consequences for the integrity of your data. So things get even more complicated when we move into what's called Web 2.0 today, which is a movement to have more of the structure of an application run inside the client web browser using Javascript, so that not every action causes a page refresh and takes a long time to reload the entire page. So what might happen is that things start out as in the old way, where a page is returned to the client, and it might contain some Javascript code that every once in a while decides it wants to make a remote procedure call to the server to ask it to change something or query some information. This call is traditionally implemented with a serialization format like XML, and the web server comes up with a response to your query, and the client then needs to use that information to perform some imperative modifications to the in-memory representation of the tree structure of the page that's being displayed. So this whole loop of making asynchronous queries and then updating the structure of the page without reloading the whole page is what's ended up being called the AJAX application style. You might want to go even further, and for applications like a mail client you might want to let it be the server that initiates some kind of communication, asynchronously. Maybe the server wants to tell the client that some new piece of mail is available to be displayed. But unfortunately with the Internet today you can't really directly implement this connection from the server to the client, because a lot of clients are behind firewalls that will disallow that kind of connection.
So in practice you need to have some convoluted, inverted way of implementing this, where the client makes an HTTP request, usually referred to as long polling, which says let me know when the next event happens, and both the client and the server are expecting that this connection might sit idle for a long time before anything interesting actually happens, and when something does happen the server can just return a response just like in the AJAX case. And then the client is responsible for reinitiating this connection. And implementing that manually can be complicated enough, but maybe your application is actually dynamically generating some Javascript code that's dependent on the client request, and that just makes getting this right even harder. So these examples show a bunch of different ways that it's possible to make a mistake in crafting a web application. There's this common pattern of taking the easy way out and implementing some kinds of interaction by interpreting strings as code at runtime, including for HTML, SQL, Javascript, or the formats like XML that are used for these RPC messages. And wherever you're interpreting strings as code, there's always a chance for something unexpected to make its way into the interpreter. And there are a bunch of different protocols and conventions for manipulating the structure of a document and using different styles of communication between the client and the server, where it's easy to make mistakes that have security consequences or just make it harder to get your program to a working state. There have been a bunch of solutions proposed recently that try to give you complete toolkits for building web applications. Ruby on Rails and Django are two of the most popular ones, which are based on some of the most common dynamic scripting languages. They provide libraries to encapsulate some of the details of this functionality.
But these are dynamically typed languages, and there's still a lot of interpreting strings as code in pretty ad hoc ways. So even if you test your application thoroughly, it's hard to know that it really respects all the abstractions that you'd like it to. To help get around that problem, there have been a bunch of recent prototypes of languages that use static type systems to rule out some of the potential problems, and Links and Ocsigen are two examples in this category. These are languages inspired by functional programming in the tradition of Haskell and ML that use type systems and more explicit formal representations to rule out some of these problems. And there's also the LINQ system, which handles, among other things, the impedance mismatch problem for communicating with a database. I think there are a few areas that are mostly neglected in this kind of research, and one of them is figuring out what are the right abstraction and modularity techniques for structuring a web application to make it easy to read the code and convince yourself that it has certain properties, much in the same way that we have come to encapsulate definitions of abstract data types inside of classes or modules so that we can read just a small amount of code and know something very definite about how that data structure behaves. For instance, you might want to take the total structure of your website and chop off a little chunk of it and be able to be sure that the rest of the application only interacts with that part that you chopped off in very well-defined ways. Maybe any links that go into it go through a particular formally defined interface in the form of a type signature. And so you can sort of formally or informally prove theorems to yourself about what this part of your application could actually do in practice.
You also might want to identify some chunks of your database that should be thought of as belonging to particular modules, and you'd like to know that no one ever touches those parts of the database except by going through a particular well-defined code interface that has a static type signature. And this lets you enforce invariants about the contents of the database in much the same way that we're used to doing for abstract data types in traditional programming. And it's also useful to think about the dynamic structure of a web page in the client and to be able to designate a subtree of that page that you could think of as belonging to a particular module of the application, and you'd like to know that none of the other pieces of your application can directly change that part of the page, but rather they'd have to go through a statically enforced interface to do so. So the language that I'm going to be telling you about, Ur/Web, is based on ideas from ML and Haskell, and it imports a bunch of the common abstraction and modularity features from those languages, and the big one for enabling this kind of modular reasoning is the ideas from the ML module system, which can be used for idioms like placing a database inside a module or placing a bunch of pages syntactically inside a module and using interfaces to control all the ways that they can be accessed from the outside. And the other main way that frameworks like Ruby on Rails and Django make life easier for programmers has been largely ignored in systems based on static types, and that's metaprogramming. One of the most popular features of Rails is something called scaffolding, where let's say that you have a new application that's based around a particular database table, and you want to get started right away building your application once you decide what the table looks like.
So you call a command-line code generator application, and you pass it a short description of what each of the columns in your table is and what its type is, and that generator builds a standard directory structure for your initial application that has among it some files of Ruby source code. And you can start using this right away. And it gives you a standard interface for things like listing the contents of your table and adding new rows and updating or deleting rows. And this turns out to be really popular and useful in mainstream web programming. But you could probably see this has some of the usual disadvantages of ad hoc code generation. Let's say that there's a new version published of the code generator that adds some features that you'd like to have in your application. But you took the output of the old generator and you just started hacking on it and adding your own new customizations. And so that gives you no really easy way to merge in the changes from the new code generator with your custom changes that you implemented in the source file. And it's also really hard to implement one of these generators correctly if all it's doing is manipulating source code strings in an unprincipled way without much static checking. So here's what we'd really like, or at least what I'd really like. It's sort of the same basic picture, except the thing that comes out in the end isn't just a directory structure with some text, it's a first-class value in your programming language that stands for a piece of the application. And this might be a function or a module in the ML and Haskell kind of view of the world. And this thing that comes out should have a static type that guarantees that it's free of certain kinds of abstraction violations like code injection attacks or dangling links and so on.
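As a rough picture of a generator whose output is a first-class value rather than generated source text, here is a small Python sketch. It is an analogy only: `make_crud`, the `book` table, and the dictionary-of-functions shape are hypothetical names chosen for the example, and Python gives none of the static guarantees the talk is after.

```python
import sqlite3


def make_crud(conn, table, columns):
    """Build list/add/delete operations for one table and return them as values.

    Hypothetical helper: the name and the dict-of-functions shape are
    illustrative only.
    """
    # Identifiers cannot be bound as SQL parameters, so restrict them to
    # plain names before splicing them into the statement text.
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)

    def list_rows():
        return conn.execute(f"SELECT {cols} FROM {table}").fetchall()

    def add_row(*values):
        if len(values) != len(columns):
            raise TypeError(f"expected {len(columns)} values, got {len(values)}")
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", values)

    def delete_where(column, value):
        if column not in columns:
            raise KeyError(column)
        conn.execute(f"DELETE FROM {table} WHERE {column} = ?", (value,))

    return {"list": list_rows, "add": add_row, "delete": delete_where}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
book = make_crud(conn, "book", ["title", "year"])
book["add"]("First", 1990)
book["add"]("Second", 2000)
book["delete"]("year", 2000)
print(book["list"]())
```

Because the generated operations are ordinary values, an improved `make_crud` can be picked up by re-running the program, with no generated files to merge by hand. What this sketch cannot do is check, before the program runs, that the column list matches the real table.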
Even better, we'd like the code generator to have a static type that guarantees that no matter which input you give it, if that input is well typed, then the thing that comes out is guaranteed to be free of any of this kind of application flaw. And since I'm used to functional programming, I'd like to go even further and be able to write a higher-order code generator that is parameterized on another piece of code that you might think of as a generator, maybe encapsulating some common pattern that's still specific to a particular application. And this should all work pretty naturally. So to make it possible to write this kind of program with static types, I've drawn on some features that are mostly associated with dependently typed programming languages like Coq and Agda. But from my perspective it's not the dependent types feature that's so important, where a type can contain a runtime value in it, but it's more the idea of type-level computation, where a type can be a program that computes a rich functional relationship between the properties of an input to a function and the properties that should hold of the output. And I'll be able to give you some more examples of what that means in this context and why it's useful a little later. So my proposed solution is based on a new language, which is more or less a general-purpose language, though I haven't used it for anything beyond web applications so far. It's called Ur. It has some abstraction and modularity features that are useful in a bunch of different domains. But on top of that I add a library with some support for some web-specific things. There's an encoding of HTML such that static types can validate the structure of your documents, and you can embed fragments of Ur code in documents for things like the event handler on a button, for code that should fire if the button gets clicked.
There's support for client-side AJAX-style programming without any explicit Javascript or XML, and support for the server push model of programming without any XML or any explicit polling of the kind that I described that must happen under the hood. And there's also a sort of orthogonal interface to an SQL database that uses static types to ensure that the communication is done properly. And all of this you can think of as a special standard library for Ur that turns it into a domain-specific language for web programming. And the implementation of the compiler for this language works by starting out with a parser that's specific to the web language, which understands some notations like HTML literal syntax and the syntax of SQL queries. And that's passed off to a type inference engine that's completely generic. It only deals with the Ur language, and all these other features are encoded in terms of an expressive type system. And then there are optimization and code generation phases that are specific to this web version of the language. And in the end the different parts of the code end up being run on the server side or the client side, and with the first kind their output is native code, and with the second kind they end up as Javascript that's embedded in pages that the server-side piece returns to clients. So let me give you an overview of the features of that general language, Ur, that make it possible to encode these libraries effectively. The main novel features are related to type-level computation, and one of the biggest of those features is support for type-level records. So here's an example of a type-level record. It looks sort of like ML list syntax with names attached to each element of the list. This is a record that you could think of as a mapping from names to types, and the name A is mapped to the type int, and the name B is mapped to the type float. So it's fairly standard looking.
But we might also think about having a record of functions over types. So this is where the type-level computation really comes in. We might have a record that says the name C has the value of a function that maps a type T to itself, and the name D has the value of a function that maps a type T to a pair type where both the first and second components of the pair are T. And we can also have records of other data structures built up from types, including pairs of types. All these different patterns end up being useful in developing and describing web applications. >>: Is there a distinction between the [inaudible]? >> Adam Chlipala: So the star notation is a description of other values, like the star type describes runtime values, whereas the comma is used for giving -- there's an overlap between the compile-time layer, which for some things describes the runtime layer, but some things are just data on their own. And the star is for describing the runtime layer and the comma is for describing stand-alone pieces of data that are only used at compile time. I don't know if that explained it well enough. But this is basically like System F-omega, if you're familiar with that. And the star is for building some type and the comma is for building [inaudible] record. I don't know if that does it. Okay. All right. And probably the most obvious thing to do with one of these type-level records is use it to build a type that describes a record that actually exists at runtime. And we can do that with the dollar sign operator, which takes a record of types and changes it or injects it into the set of types. And one of these types is standing for runtime records where the type of each field is described by whatever that field got mapped to in the record that you passed to the dollar sign. Yeah? >>: [inaudible] comma string [inaudible] record? [inaudible] comma string?
>> Adam Chlipala: It's -- that's a tuple, which is different from a record because you know ahead of time what all the fields are, whereas the record can vary. All right. So the usual ML syntax for a record type is really just a shorthand for, in the case of this example in the bottom line, taking the record on the top line and applying the dollar sign operator to translate it from a record of types into a type. And we can concatenate records inside our types, and we might think about writing a concatenation like this one that wants to combine two records that both map the name A to types but disagree on which type that is. And it turns out that to make type inference work well it's convenient to rule out concatenations like this one. So the type system is going to enforce that you can't concatenate two records that share any field names. So that one is not legal. And another important ingredient in the kind of encodings that I'll be showing you is a type-level map function, which works just like map in usual functional programming, except it's running only at compile time, sort of like the crazy things that happen in C++ templates, but in a more principled kind of way. So here's an example use of map that you might apply to the first record in the first line of this slide. If you have a record of types you might apply a function that transforms it by walking through and changing every type to a function type that uses the original type as both the domain and the range. You can also think about mapping over the second example in the second line of this slide, where you have a record of type functions, and you can step through each of these and replace each of those functions with what you get when you call that function on the type int. So if we apply this function to that second line of the slide we get a record that maps C to int and maps D to int star int. Yeah? >>: [inaudible] can you explain this map line again. >> Adam Chlipala: Okay. So.
>>: So one is R and one is a record? >> Adam Chlipala: Yes. So I'm thinking of R 1 as some record that's like the example on the top line of the slide, that is, an association of names with types. And so we're going to step through the record. And for each one of those types, we're going to replace it with what we get when we apply this function to the type. >>: Oh, so this is just like something that you apply on the syntax itself. It's got nothing to do with like runtime computation? >> Adam Chlipala: Right. Everything on this slide is a separate compile-time-only language that's only there for describing typing constraints. >>: [inaudible]. >> Adam Chlipala: And this is standard in programming with dependent types. But the difference here is that I'm trying to make it a little easier to use this for real programming. All right. So we could also do a map over that third line in the record constant examples, where we have pairs of types, and we can step over each of those and pull out the two pieces of the pair and use them to form a new function type. So now we can move on to the actual runtime part of the language, which is closer to what functional programmers are usually familiar with. We can write record constants in the same old ML-style way. Their types are implemented using this more general machinery, but you can more or less ignore that if you don't need to use that stuff directly. You can project out fields in the standard way. And we can also concatenate together two runtime records with syntax analogous to that from the previous slide. We can cut out fields of records to form a record with all the fields except that one. We might also want to concatenate two records that share a field name, in a similar way to the previous slide. But for the same reason I'm going to rule out concatenations that reuse field names, because it makes type inference trickier. So that one's no good.
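The record operations described on these slides can be imitated at run time in Python, purely as an analogy. In Ur the disjointness check and the type-level map happen in the type checker, before the program runs; here they are ordinary functions over dictionaries, and the names `concat`, `cut`, and `map_record` are made up for this sketch. A record of types is modelled as a dictionary from field names to Python type objects, and a function type as a `(domain, range)` pair.

```python
def concat(r1, r2):
    """Concatenate two records, refusing any shared field name.

    Ur rejects the clash at compile time; this sketch can only raise
    an error when the call actually happens.
    """
    shared = r1.keys() & r2.keys()
    if shared:
        raise ValueError(f"field names clash: {sorted(shared)}")
    return {**r1, **r2}


def cut(record, name):
    """Return a copy of the record with the field `name` removed."""
    if name not in record:
        raise KeyError(name)
    return {k: v for k, v in record.items() if k != name}


def map_record(f, record):
    """Apply f to every field's value while keeping the field names."""
    return {name: f(value) for name, value in record.items()}


# A "record of types": field names mapped to type objects.
types_1 = {"A": int, "B": float}

# Mapping each type t to a stand-in for the function type t -> t.
arrows = map_record(lambda t: (t, t), types_1)
print(arrows)

# Ordinary runtime records.
print(concat({"A": 1}, {"B": 2.5}))
print(cut({"A": 1, "B": 2.5}, "A"))
```

Calling `concat({"A": 1}, {"A": 2.5})` raises `ValueError` here, which corresponds to the concatenation that the talk says the type system rules out.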
We might want to write a function that abstracts over the idea of taking a record and adding the mapping A equals 0 to it. So here is a way of expressing that using polymorphism. We have a function that binds a type variable, FS, and we write formal arguments that are type variables inside square brackets. We bind the type variable FS, which stands for a set of fields and the types that they are mapped to, also called a row type a lot of the time in programming language theory. And we say that there's a value-level argument R, that is a record, whose fields are described by that type variable that we abstracted over. So if FS is empty, then R is an empty record, and if FS says B equals float, then R is a record with one field, B, that has a value of type float. And then we can just take R and concatenate this new binding A equals 0 onto it. Yes? >>: The dollar sign, is that an operator [inaudible] or is that a different name [inaudible]. >> Adam Chlipala: It's an operator. It's the operator that injects a record of types into the set of types. So the problem -- there's a problem with this definition, which is that there's no -- yes? >>: I have a question. So usually ML [inaudible] provides something like this: you have a polymorphic type argument and then you can just use the type argument in the place of a regular type. But here you put a dollar before the type. >> Adam Chlipala: Right. >>: Why are you doing that? >> Adam Chlipala: Because FS isn't a type, it's a record of types, and you can turn a record of types into a type by saying build the type that figures out its fields by consulting this record. >>: It's a record of types? Okay. >> Adam Chlipala: Should I say more about that or -- >>: Yes. >> Adam Chlipala: Okay. So the problem with this definition is that there's nothing here to force FS not to already include a mapping for A. And I said that we don't want to allow clashes of field names.
So this definition can't actually be accepted directly. Instead we can give this alternate definition, which is like the previous line but I add a disjointness constraint, this piece here, which takes essentially two different records and asserts that they must not share any field names. So this constraint is saying that A should not be among the fields that are used in FS. And with that, the type checker's able to verify that this concatenation doesn't induce any conflicts on the record field names. >>: [inaudible] framework developers using this [inaudible]. >> Adam Chlipala: I envision web application developers calling functions that have these types and not really knowing what's happening but being glad that it works. So this function can be assigned a static type, and it's a type that says for all FS, which are records of types, as indicated by taking type and putting curly braces around it, when this constraint is satisfied, then you can have a function of this type. Once you get used to the notation, it expresses that you're starting out with a record that has fields FS, and then you end up with a record that has more fields, where in particular you added A equals int to that record. There's another useful feature that works well in conjunction with this kind of thing, which is first-class names. Here's a function that is polymorphic over the name of a record field. That's this argument, NM, here. So it's a function that works for any name, as long as the constraint is satisfied that that name is not equal to A, and if that's true, let's build a two-field record that takes the name you asked for and assigns that the value zero and assigns A the value 1.0. >>: [inaudible]. >> Adam Chlipala: That's type inference, because it's used as a name here. There's a more explicit syntax if you want to make it clear for the reader. >>: [inaudible]. >> Adam Chlipala: Yeah.
All this generalizes the recursion, a recursive [inaudible] of types of types, otherwise known as kinds. And this function has a type which says for all names NM, when you know that NM is not equal to A, then you can produce a record of this kind, which is a normal type except instead of a known field we see the name variable NM appearing. We can also write a function that is polymorphic and implements the usual record projection operator and makes that a first-class function. We can do that by saying this is parameterized over three different type parameters: a field name, the type of that field, and the names and types of all the other fields. We need to know that this name does not overlap with any of those fields. And if so, we can take in a record whose fields are everything in this list that we passed in plus a mapping from the field we asked for to the type we asked for. And if that's true, then by looking at this, it's syntactically apparent that the field NM belongs to R, so we can just project that field out of R. >>: By the way, we are still trying to define functions that [inaudible] syntax, right? Are we defining functions that do computation at runtime? >> Adam Chlipala: This slide is runtime computations. >>: Okay. >> Adam Chlipala: Sorry. I think of expressions as saying that, but I realize that's not a universal convention. So the type of this function says there are these three type variables, where the first one's a name, the second one's a type, and the last one is a record of types. We have this same disjointness constraint like before, and then a type that expresses that we're looking inside this record, seeing what is associated with the name, and that type is what the function returns. And it's pretty easy to call a function like this.
It turns out that we can usually deduce by inference what the values of T and FS are, but the value of NM should be specified explicitly, otherwise we might project out the wrong field that just happens to have the type that we're looking for. So we can write an explicit type argument inside brackets. So here we pass pound B, which is the first-class value standing for the name B, and we can pass a record as the additional argument, and type inference figures out the rest of the details and automatically figures out which constraints need to be proved and proves them for us. >>: So the FS really is like a [inaudible] polymorphism that you might find [inaudible]. >> Adam Chlipala: So I'm more familiar with the [inaudible] for an example. I'm not sure what path it lies in. Definitely not surprised if they have something like this in DAC. >>: And it seems like the additional thing you're providing here is first-class record names, the name itself you can [inaudible]. >> Adam Chlipala: Right. It's the record names and -- >>: Record field names. >> Adam Chlipala: Yeah. Field names and the ability to write maps over records at the type level. And these disjointness constraints you can maybe think of as just necessary for supporting those features or as something separate, but those are the main syntactic differences that you see compared to [inaudible]. All right. So I know this is sort of abstract. So now I can actually show you a demo in a web browser that hopefully expresses why these are useful features to have. Obviously I have to start out with the hello world application. This is an application that just acts like a static page. And here is the source file for it. This is demonstrating the basic features of Ur/Web that handle the structure of applications explicitly so that the type system can make sure you don't mess that up. This is just an ML-style function definition that returns a piece of XML syntax included inline in the page.
And this syntax is parsed and type checked to make sure that you don't use tags in the wrong places or that you don't use a tag with a wrong attribute name, or other properties like that. We can build an application with a link, which doesn't sound that exciting, but at least the compiler is using type checking to make sure that there are no dangling links or malformed URLs in our application. And the way you build this application is you can just write two different function definitions. The second one, which is where we started, uses the normal HTML A tag, but instead of an [inaudible] with a string, it uses an alternate link attribute that contains an expression to evaluate when that link is clicked to generate the resulting page, and that just calls up to this target function, which is the second page that we visited. >>: So I have a question. Do we sometimes [inaudible] click on links doesn't go to new page, goes into the same page. That's a special [inaudible] it goes to somewhere else on the same page. >> Adam Chlipala: So you mean like [inaudible]. >>: Okay. So that's a different kind of link then. You can define that also. >> Adam Chlipala: So there's this concept of anchors in HTML where you can have links inside a page. I haven't actually implemented that yet in this system. There's probably a current way of having it -- >>: [inaudible] we have this problem that way. We have like we kind of [inaudible] current links. But problems like this like sometimes we [inaudible]. >> Adam Chlipala: Oh, I should probably not have hidden the location bar. Actually I need -- let me bring this up in a separate browser window so you can see what it looks like if I unhide the browser bar.
So everything is based around an ML module system, so module paths are used as unique identifiers for pages, and that's what you see up here: demo is the enclosing module for the whole demo, link is the source file that we were just looking at, and main is the name of a function within there. And you can have arbitrary nesting of submodules. >>: Okay. So if you [inaudible]. >>: So are you explaining that every -- the notion of A being generated is also coupled with the identifier of the page so you [inaudible] once the page is generated, you can type that in that box and go straight to that page? Is that what you mean? >> Adam Chlipala: Yes. The way the web server works is it parses part of the URL and treats it as a module path inside your program and finds the right function that's named by that path and runs it. There will be an example later about what happens when a dynamic page takes arguments. Those will just appear as extra parts of the URL. They'll sort of look like descending further into a file system tree. And they'll be parsed correctly based on the function's arguments. >>: So when you [inaudible] does that change your target? >> Adam Chlipala: Yes. Little things like pretty URLs are often overlooked in research projects, but I wanted to do that here. Yeah? >>: How does the function -- does the system decide to treat a function as a [inaudible] rather than just a -- >> Adam Chlipala: It has a type -- let me show you the type of this function. Just let me go back to full screen mode. So the type -- this is the signature of the application. It says that main is a function from no arguments to a transaction that produces a page. And these are the things that are treated as pages. >>: So it's based on return type? >> Adam Chlipala: Yes. And you can -- if you have some things that are pages but you don't want to advertise them as part of the interface, you can just leave them out of the signature.
They'll still be accessible if they're called from other pages. >>: So the type of target here was also [inaudible] transaction page? >> Adam Chlipala: Yes. There will be more complicated examples three demos from here. Okay. So we can write recursive loops between pages without much trouble. This is always fun. And all this is is two mutually recursive functions with sort of ML-style syntax for the mutual recursion. Main calls -- yes? >>: Question. But how did you know that there are only two pages, why doesn't the descent generate more and more pages? >> Adam Chlipala: Because a page is identified -- each page stands for a pointer to a particular location in the source code, namely one of these function definitions. And the source code's only finite, so you're not going to end up with -- >>: Oh, a page is not like a dynamic value? >> Adam Chlipala: A page is a function in the source code plus arguments to it. In this case, there are no arguments. So it was just a finite set. I will shortly have an example where there are arguments so you can think of it as having infinitely many pages. And that's one of the examples people like using the most to see how far they can make it go. But it keeps going. >>: So the previous example was type [inaudible] transaction page and that was all [inaudible] if we knew the URL was [inaudible] could we just go there? Would that work? >> Adam Chlipala: Yeah, that works fine. You can make all sorts of decisions on how much work you invest to keep the user from diverging from the flow through the application that you could actually get from starting from the first page. That's one of those things I don't work too hard to guarantee. There are just a few kinds of mistakes like that that are associated with common security vulnerabilities that there's special code to deal with.
Like there's something called cross-site request forgery where -- so you post on a forum a link to a well-known bank website that says something like transfer all my money to this guy over here, and users on the forum think it's a link to a funny picture of a kitten, they click on it, and their money gets transferred. There's a use of cryptographic signing to prevent that particular kind of thing. But mostly if you know the URL, you can -- if you know the URL, you could hit any page that has no side effects. And the compiler enforces that you really understand which pages don't have side effects. All right. So here's a page that actually involves conceptually infinitely many pages. We can just keep stepping through incrementing a counter, and there's no server-side state that tracks how far along the counter is. This is all stored in the client. And the way you can write this is as a function counter that takes one argument, which is where the counter stands. You can return a page that injects the current value of the counter and has two links, one to a recursive call where the counter is increased by one and one where the counter is decreased by one. And maybe it's useful to do this where you can see what the URL is. All that's happening is the argument of the function ends up automatically serialized and deserialized from the end of the URL. >>: I have a question but might be just obvious but you're taking a collection of web pages as a reactive program, right, and which is expecting input from the user when I want to click something. And as a result of that, the state of the program changes in the sense that its sort of program counter moves from one control location to another, and then again waits for an input, right? Is that how I should think of it? >> Adam Chlipala: Yes, except there's this property that the user can always guess URLs and jump to a different part of the state graph if he's clever enough.
But mostly that's how it will work. >>: I see. >>: The [inaudible]. >> Adam Chlipala: Right. The state lives only on the client, and the client can lie about its state if it -- yeah, if the user knows how to pick a URL that is actually valid for the application but is not where it should really be going next. >>: I see. So for example like when the focus of the user is on one page, that page has lots of [inaudible] on it [inaudible] links. Now, the computational meaning of clicking on a link is that the function corresponding to that page is going to be executed with a particular argument? Is that how I should think of it? >>: Yes. >>: Okay. Good. >> Adam Chlipala: All right. And here's an example, a more interactive page. This is a silly demo that just shows echoing back what the user enters into a form. And it's pretty easy to write this in a way where we're sure that the form actually is matched with its handler typewise. So let's start with the main function. It returns an HTML form that starts out looking pretty normal. There are two differences from the usual way of writing HTML or from actual HTML. This is sort of a stylized superset. Each of the inputs, like this text box here, instead of having an attribute name equals whatever, has an escaped piece of Ur syntax which says the name of this widget is the field name A. And so on for the other widgets. And there's a submit button that has an attribute that says the action for this button is the handler function. So when you click this button, run this function on the values of the widgets. And we can look at the handler function and see it takes a record R as an argument and it can just project out the values of the three widgets that we used, and each one of these has the appropriate type. In particular, the two that came from text boxes are strings and the one that came from a check box is boolean.
And we're going to be sure that this code doesn't erroneously try to use a field that we didn't define, or that it doesn't try to use one of those fields at a different type than the widget actually produces. >>: [inaudible] doing any analysis on these strings that are [inaudible]. >> Adam Chlipala: No. Any string is [inaudible]. >>: All right. Are you going to show us what it looks like not every string [inaudible]. >> Adam Chlipala: Well, if not every string is allowed then you just complain if you don't like the string that comes back. There are -- or you can have client-side code that watches the strings and complains in real time if you don't like them. But the -- what HTML gives you doesn't really provide a more general way of constraining strings. >>: So [inaudible] if you had some [inaudible] constraining the strings. >> Adam Chlipala: I can really only do what the browser lets me do and I don't know another way to do it. >>: So a question [inaudible]. If you went back to that form [inaudible] field A you supplied angle slash TD, now what you were saying before was that the handler will always return XML, that XML will be well formed, but I can break that form by inserting that [inaudible] basically. >> Adam Chlipala: You could if the syntax here meant something other than what it actually means. But this insertion code is abstraction preserving -- your string is not interpreted as HTML, it shows up as a string. >>: I think that -- I think that's the answer to the previous question, is that any strings [inaudible] but it is escaped by your injection syntax. >>: So what you're saying is that the problem of taking care of making sure that these strings are valid but they [inaudible] it's not [inaudible] problem, somebody else has to take care of that problem with either [inaudible] some other static checking on strings that come in and things like that. >>: Yes.
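The exchange above can be illustrated with a small Python sketch, assuming a hypothetical `Xml` wrapper type that keeps markup separate from plain strings. The names `Xml`, `text`, `tag` and `handler` are invented for illustration and are not Ur/Web's API. The only way to turn a user string into markup is `text`, which escapes it, so a submitted `</td>` shows up as characters instead of closing the table cell.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Xml:
    """Markup that is already known to be well formed."""
    markup: str


def text(s: str) -> Xml:
    # The single entry point from untrusted strings into markup: always escape.
    return Xml(html.escape(s))


def tag(name: str, *children: Xml) -> Xml:
    # Plain strings are rejected, so unescaped input cannot sneak in.
    for child in children:
        if not isinstance(child, Xml):
            raise TypeError("children must be Xml; wrap strings with text()")
    inner = "".join(child.markup for child in children)
    return Xml(f"<{name}>{inner}</{name}>")


def handler(a: str) -> Xml:
    """Echo back the value the user typed into field A."""
    return tag("tr", tag("td", text(a)))
```

Calling `handler("</td>")` yields `<tr><td>&lt;/td&gt;</td></tr>`, which is the escaping behavior confirmed later in the discussion. In Python the `isinstance` check happens at run time; the talk describes the type system ruling out the mistake statically.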
>> Adam Chlipala: Although I think in practice what this means is your form receives the string and you just write a function that inspects the string and decides if you like it or not. It's not really a very complicated procedure. And at the same time it's hard to think of how to do better than that. >>: So for example like I'm not -- I don't really work in this area but I keep hearing about people arguing about all the [inaudible]. >> Adam Chlipala: Yes. >>: So how does that fit into this story? >> Adam Chlipala: So the previous question sort of is suggesting that -- someone was asking, if the user enters something in one of these text boxes that has some HTML and you've got to display what he entered, does that get entered as HTML and maybe run some Javascript code? And the answer is that there's this quoting and anti-quoting syntax that always makes sure that strings never get interpreted as code. >>: I see. So the solution is that you program in Ur and then you -- whenever there's some interaction with the user in which he can provide strings you write some code that makes sure that the [inaudible] implement that? >> Adam Chlipala: And actually the type system enforces that you can never forget to do that properly. >>: So my understanding is if the user entered a close slash TD in the text box A, the runtime takes care of -- when it constructs the HTML page in response, the runtime takes care of escaping the [inaudible] tag with an ampersand LT and so on, so that it's not actually [inaudible]. >> Adam Chlipala: Yes. >>: [inaudible]. But maybe that gets a lot of [inaudible] for novice developers that actually does some kind of [inaudible] where in MySpace put in actually they tried to [inaudible] you go to web page [inaudible] virus and then you as you last will try to [inaudible] somebody else, and then you inject the virus on to your own web page. The next time somebody goes to your MySpace page they will get the attack. It just keeps on getting through.
So the problem there was actually it's fine to put in on the server side this [inaudible] code itself [inaudible] where MySpace put it in their coding, and in reality they forgot to filter some stuff out, so that based on some fancy CSS encoding stuff they still found a way through. So like partially the people try to do this, that's not actually addressing the problem. >> Adam Chlipala: I think that it doesn't address the problem if you want to allow users to provide their own content using as much of HTML as possible. But if you're willing to pretend it's 1995 in your validation, then it's pretty easy to rule out that kind of attack. And some people will be disappointed but you at least won't have any code injections. And you can gradually add more and more of that as you convince yourself you wrote your parser correctly. >>: [inaudible]. >> Adam Chlipala: Okay. All right. And then let me just show you briefly what it looks like to interface to a SQL database. Here's an interface for manipulating a SQL table. You can add a row to our table and delete. That's all this demo has. And I don't want to look -- whoops. Look in too much detail at the code, but I just wanted to point out here's a piece of SQL syntax that's included inline in the code, and that's type checked and verified to match the actual schema of the table, which you actually define up here using a special declaration form just like you were defining a function, and you give a record type that maps each one of the fields, otherwise known as columns of your table, into their types. And the -- and you can write SQL inline for querying from the table. Here's some inline SQL for inserting into the table with some anti-quotes for injecting Ur values into your command. And here's some delete command just off the bottom there. And all of this is type checked to match up with the schema.
And when the application starts, it reads the database's system catalog to check that the tables really have the types you said they would have. >>: [inaudible]. >> Adam Chlipala: That's more work to separate out your queries in a way. >>: [inaudible]. >> Adam Chlipala: Yeah. It just makes it cleaner to -- you can write what you really meant and not what was convenient for the implementer of the database library. And writing a version with question marks in the parameters is less direct than this. >>: [inaudible]. >> Adam Chlipala: I guess compared to anything with dynamic typing you can be sure statically that you don't mismatch the types of the parameters to your prepared statement. >>: [inaudible]. >> Adam Chlipala: Yes. That's this declaration. And you can declare module-local tables inside, say, a functor, and they have the same properties as an abstract type inside an ML module. >>: [inaudible]. >> Adam Chlipala: The application is responsible for deciding what the database looks like, and the compiler outputs a SQL script you can run to make sure the database looks like that. >>: [inaudible] modified the database [inaudible]. >> Adam Chlipala: You'll get a startup-time error. The program starts, it queries the system catalog and checks that everything is as it expects. With the way these are usually implemented it's hard to do that in the compiler because you'd have to somehow clamp down the database and prevent it from ever changing, which doesn't usually work. >>: [inaudible]. >> Adam Chlipala: Yeah. You can -- there can be an alternate version of the compiler that has a database and controls everything about it inside its own executable, and then there -- things will be more direct in some sense. But probably harder to administer because you'd have to rewrite all these high availability tools that the main database servers have.
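The startup check described above can be sketched in Python with the standard `sqlite3` module. This is an illustration under stated assumptions, not how the Ur/Web runtime is implemented: the table name `t`, its columns, and the function `check_schema` are hypothetical, and SQLite's `PRAGMA table_info` stands in for querying a database server's system catalog.

```python
import sqlite3
from typing import Dict

# The application's own declaration of what the database should look like
# (hypothetical table and column names).
EXPECTED: Dict[str, Dict[str, str]] = {
    "t": {"id": "INTEGER", "a": "TEXT"},
}


def check_schema(conn: sqlite3.Connection, expected: Dict[str, Dict[str, str]]) -> None:
    """Fail fast at startup if the live schema differs from the declared one."""
    for table, columns in expected.items():
        # Table names come from the program's own declarations, never from
        # user input, so interpolating them here is safe in this sketch.
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        actual = {row[1]: row[2].upper() for row in rows}
        if actual != columns:
            raise RuntimeError(
                f"table {table!r}: expected {columns}, found {actual}"
            )
```

A missing table produces an empty column set and is rejected the same way as a column with the wrong type, so the application refuses to start if the database has drifted from what the program declares.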
>>: So are you otherwise saying that [inaudible] given a descriptor of the database automatically generate code that say provides you with access [inaudible]. >> Adam Chlipala: Yes. That's more or less the next thing. I guess I have theoretically one minute or so to do that. >>: [inaudible]. >> Adam Chlipala: So here's my first example of the kind of metaprogramming that I think is the most neglected thing in the other projects that try to use static types in this setting. This is a really simple form echoing application. The interesting thing about it is that it's using a generic component so that the full implementation of this application is just this source file which says call this component, I'd like you to build me a form echoing application that uses these names. There's an underlying record field named A that should be printed with the label tic, and the same for B and C with tac and toe. Yeah? >>: So again I want to just understand what you just said. So you're saying that this metaprogramming means that the program that is going to be executed is actually constructed initially by performance of computation? >> Adam Chlipala: Yes. >>: And then that constructed program executes -- >> Adam Chlipala: This metaform.make is sort of like a function that is going to build your application for you by analyzing this structure. >>: And can you show me the constructed application? >> Adam Chlipala: Well, the code doesn't get output, it's all done internally inside the compiler so it's not easy for me to show it to you. >>: Okay. >> Adam Chlipala: It looks a lot like the earlier manual echos, if you were to expand it. >>: So can you say in English what that thing will do? >> Adam Chlipala: It's going to output an application that displays a form with one line for each of these record mappings, uses these strings as the labels of the widgets, and when you click the submit button, it prints all the values you entered.
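A rough Python analogue of the generic component just described, offered as a sketch only. The function name `make_form_echo`, the `/handler` action URL, and the dictionary representation are assumptions made for illustration; the real component is an Ur/Web functor that is expanded inside the compiler and checked statically, whereas this version builds the form and handler at run time.

```python
import html
from typing import Callable, Dict, Tuple

Values = Dict[str, str]


def make_form_echo(labels: Dict[str, str]) -> Tuple[Callable[[], str], Callable[[Values], str]]:
    """Build a form page and its handler from a mapping of field name to label."""

    def main() -> str:
        # One line of the form per field, in the order the mapping gives.
        rows = "".join(
            f'<li>{html.escape(label)}: <input name="{html.escape(field)}"></li>'
            for field, label in labels.items()
        )
        return f'<form action="/handler" method="post"><ul>{rows}</ul><input type="submit"></form>'

    def handler(values: Values) -> str:
        # The handler is generated from the same mapping, so it reads exactly
        # the fields the form defines.
        missing = set(labels) - set(values)
        if missing:
            raise KeyError(f"missing fields: {sorted(missing)}")
        rows = "".join(
            f"<li>{html.escape(label)}: {html.escape(values[field])}</li>"
            for field, label in labels.items()
        )
        return f"<ul>{rows}</ul>"

    return main, handler


# The whole "application" is one call naming the fields and their labels.
main, handler = make_form_echo({"A": "Tic", "B": "Tac", "C": "Toe"})
```

Because the form and the handler are derived from one description, they cannot drift apart, which is the property the talk attributes to the typed component.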
>>: I [inaudible] the output, what I meant is that how is this metaform.make saying -- >> Adam Chlipala: You want to know what algorithm it uses or -- >>: Something like that, you know. How does it construct the program? >> Adam Chlipala: I was going to flash up the source code at some point. >>: Okay. Okay. All right. >> Adam Chlipala: Question? >>: Yes. You say that all of this is more than just [inaudible]. >> Adam Chlipala: Yes. So here is the type signature of that component. So the first thing is this uses a richer type system than ML [inaudible]. And I guess that answer is short, but it sort of is the whole answer. It's just details [inaudible]. >>: [inaudible]. >> Adam Chlipala: So here is -- let me explain what the arguments are. First, the interesting thing: two of these arguments weren't actually written explicitly in the application because they can be inferred from the others. So that's one extension over -- functor arguments can be inferred in some cases, so that makes it easier to write shorter programs. But if we want to be completely explicit, what we're giving is a -- is a type variable FS of kind record of unit. Unit is like in ML. It's the kind that has just one value. So a record of unit is essentially a set of names. So this parameter to the -- to the functor tells us which field names this application wants us to use. There's another value of type folder FS; folder is a type that basically stands for permutations of the -- a set of field names. They fix an order of these names so if we would like to step through them in order, we can follow this instruction. So to implement this form that lists all these fields in order, we need to use this folder to step through them and for each field print it out. Finally, there's a record that gives a string name to each one of these fields. And that's expressed in a kind of unusual way. We would like to say somehow this is a record where it's a regular ML record type except every field has to use string.
We can't make any other choices. And the way that that's expressed is by saying we start out with a record of unit values, in other words, a record with information-free values for each of the fields, and we replace each of these placeholders with string by mapping over it. So we map over the record FS and we replace each of the values of each of the fields with string. In this way we say only string is allowed in our record. And it seems kind of a weird way to do it, but I couldn't come up with a more direct way that wasn't too specialized for this kind of example. >>: Would it be correct to say that the generated program could have been written directly by the user in Ur? >> Adam Chlipala: Yeah. It's just like C++ templates. This is a time-saving mechanism that doesn't increase expressivity in this instance. >>: I see. >> Adam Chlipala: And so this functor takes these things as input and it outputs a main function for the application which returns a page when called. >>: [inaudible]. >> Adam Chlipala: Looks like this. I don't want to explain all the details of this, but let me just point out the main action here is generating the form and generating its handler. In each case we call library functions with names that begin with fold. These are different ways of stepping through all the fields in a record and modifying a functional accumulator as we do so. And these need a few type arguments to be specified explicitly. The details of doing that are a little bit more intricate than ML and Haskell programmers are used to. But the final code, once you have a little practice reading it, isn't all that dense, and the total length of the file is pretty short, and in the end we get a pretty strong static guarantee that this dynamic process never outputs a malformed application, which is I think a good -- reasonable price to pay for the extra type complexity. All right. >>: [inaudible] metaprogramming is not related to [inaudible] applications.
So why doesn't everybody always do metaprogramming? Wouldn't it make sense? >> Adam Chlipala: Maybe a kind of snarky answer is that most people don't use metaprogramming because they don't believe in static types. Static types are so useful for making sure you don't screw it up that they're kind of handicapping themselves. >>: [inaudible] generated programs, writing programs [inaudible]? >> Adam Chlipala: Well, another answer. >>: [inaudible]. >> Adam Chlipala: Another answer is that metaprogramming is already really popular in C++ templates and Rails scaffolding and all sorts of other systems. But these are relatively clunky, unprincipled interfaces compared to ways of doing it like this that are more inspired by type theory and coming up with a minimal set of constructs to get the job done while still getting these strong guarantees. So maybe the question is why don't more people use better programming systems that are more principled and less [inaudible]. >>: So in your mind metaprogramming is very tightly coupled with static typing? >> Adam Chlipala: No. I just think that static typing is so useful for metaprogramming that more -- fewer people would give up on trying to write metaprograms if they could -- were using static types to help them catch the bugs in their code generators. All right. So I guess I'm done describing this part. Here's an example of my version of the most common Rails code generator, which is called CRUD for create, read, update and delete. This is the standard admin interface generator. Here's a table; we want to add rows to it, see which rows are in there now. Might want to be able to change the value of one of our existing rows or delete a row. Nothing very interesting there from a functionality standpoint. The interesting thing is that here's the file that implements this particular application. Let's define our table with its columns and types. And we call a functor, we tell it which table to build this application for.
We give it a title to display in the title bar. And we give it a record of metadata for each of our columns of the table that expresses things like how should we format values of this column when we display them, how should we render the widgets that take inputs for this column, and so on. And in each of these cases, I use one of the standard functions from the library for the default handling for a particular type. Yeah? >>: [inaudible] instead of just creating, filling out [inaudible] you can actually create a [inaudible] where you can actually create new tables and create new type of tables. >> Adam Chlipala: So first there are no dependent types here in the sense that there's a strict separation between runtime and compile time, and a compile-time thing never mentions a runtime thing. The feature that's here is an interesting idea of type-level computation. And there's no support right now for creating tables dynamically. Tables are only defined at the top levels of modules. It would be possible to add a way of creating a local table, but I'm not sure how that interacts with the use of type-level computation. You could do that in ML just as easily. You just wouldn't have as much information about the table type. Or am I missing some property? >>: I want to create a user interface for creating a table so I want to create an administrator panel for creating tables. >> Adam Chlipala: Yeah, I don't -- that would require actually using dependent types. >>: [inaudible] previous systems. >> Adam Chlipala: Right. Luckily that seems to be a relatively infrequent problem with applications that may want to do that. So this is domain specific [inaudible] I guess. >>: [inaudible]. >> Adam Chlipala: I'll just admit that that's outside the domain of this tool for now. >>: Do you think inherently or not? I mean it sounds like you should be able to do -- >> Adam Chlipala: Well, it's easy to say well you can solve that problem if you just add dependent types.
But it turns out to be more complicated than that in practice. Yeah? >>: [inaudible] could those be [inaudible] arbitrarily complicated expressions? >> Adam Chlipala: You mean this string constant? >>: Yeah. >> Adam Chlipala: This is just treated as a string. It's not interpreted. This is only for display purposes. You could -- you mean put an expression here that calls an infinite looping function? >>: Yes. >> Adam Chlipala: That would work fine. It would be just like in ML. Your program wouldn't hang when you started it. >>: [inaudible] or runtime? >> Adam Chlipala: [inaudible] runtime. So you would be able to visit other pages that didn't need to use that code but when you [inaudible] it wouldn't run forever. I guess. >>: [inaudible] recursive type functions? >> Adam Chlipala: No. The only kind of recursion is this very simple map-based recursion that isn't even explicitly recursive. It's important because when you're type checking you have to run programs and you don't want it to go forever. >>: I just want to understand the comment that mentioned some application and then he said okay [inaudible] and then he said oh, I don't have [inaudible] I can't express it. So I mean you can probably express it, but you cannot guarantee safety in the sense, right? You can write that down. >> Adam Chlipala: I could add features that make it possible to do that kind of thing, but the system as it is now guarantees that every table access is done properly. I could add another SQL interface that lets you access a table with fewer guarantees, and then that will be more easily compatible with what Leo asked about. But -- >>: What I don't understand is that are you saying that your programs that I [inaudible] crash? >> Adam Chlipala: It's the usual well-typed-programs-don't-go-wrong theorem extended to work across whole client-server interactions, more or less. >>: [inaudible] program? Can I say insert X and then Y? >> Adam Chlipala: You can write an assertion.
You can write code that fails if a particular boolean expression turns out to be false. >>: Right. But then I'm hoping your type system would be able to make sure that that boolean never is -- >> Adam Chlipala: Right. So this isn't about making sure your application invariants are never ignored, this is about making sure that the program doesn't crash by failing to follow the rules of some generic abstraction, like you shouldn't be able to submit a form and have an input interpreted as an integer when it was really a string. >>: Okay. So now what I was wondering is that since you allow for that loophole where you can crash if X is not less than Y, why can't that same loophole be used to encode the application? Why do you need to [inaudible]. >> Adam Chlipala: I guess I could add a library function that creates a table and does dynamic checking somehow in the way you seem to be suggesting. >>: [inaudible]. >> Adam Chlipala: It would have to be in the runtime system. It would have to be implemented outside the language. >>: I don't see your point though. I mean what do you mean by program crash if X is less than Y some -- it's not claiming that the program is correct. What I don't understand is that -- I mean, unless -- so either his language is complete or it's not. If it is, then what Leo is asking for is a computable function, right, and he should be able to do that. >> Adam Chlipala: But it's not quite that simple because this isn't just about computing arithmetic functions, this is about interfacing with particular systems like the database server. And there are strong guarantees about how that interaction can be done. >>: Oh, I see. It's not -- so there are some input outputs that would be for [inaudible] okay. >>: [inaudible] actually very particular use case where I want to [inaudible] create tables like I want [inaudible] create like maybe add on fields to their form or something.
And that's actually because it's domain specific and in this case it [inaudible]. I don't think actually [inaudible] that's why it's kind of excited about the possibility of being [inaudible]. >> Adam Chlipala: Let me just talk briefly about the interface of this component. I don't want to go into the details of this too much. But the interesting thing is here is where it's important to be able to write type-level functions and map over them. Because I define what metadata is needed for each column as a function from a pair of types to a record type. The pair of types tells you the application's representation of this column and the database's representation of this column. One piece of metadata is a parse function that goes from the database's representation to the application's representation, there's a function for converting the application representation into a piece of HTML, there's a function for building a form widget, and a few other pieces. And the type of the metadata needed for the whole usage of the functor is expressed by taking a record of pairs of types standing for your whole set of columns and mapping this metadata function over each of those pairs to form a record type with a dollar sign operator. And there's the functor signature. We see a non-ML-supported thing, which is a constraint, which is sort of like a precondition to using this functor: if you call this functor you must be able to prove that the column ID is not among the columns that you mentioned explicitly, because we're going to use special handling for ID. And just one more thing. This has all been sort of web 1.0 kind of stuff. I could also build the same sort of thing web 2.0 style. Here's a way of viewing and updating a database where you can make changes that you batch locally before you tell the server anything. And then execute them all in a bunch. At no point during this is the page refreshing. This is all using the usual client-side AJAX stuff.
And we can actually execute a batch of changes, we can click this update button to ask the server what the current table values are. And we can delete some of those and update and check that it really happened and so forth. And this application is built by a functor application just like in the previous case. It's really the same thing from this level of detail. >>: So you mentioned something at the beginning about this long polling kind of strategy to get data pushed from the [inaudible]. Could you use something like that here instead of having the [inaudible] update to get the changes to actually -- >> Adam Chlipala: This is a good place to demonstrate that. I also have a -- the last demo on this is a chat application that uses that server-to-client flow to notify of new messages. >>: [inaudible]. >> Adam Chlipala: Sure. Might not be able to explain it within the amount of time available, but okay. [inaudible] value goes update on click equals call this function to get the list of rows. That function is automatically RPCed to the server. And then set this client-side variable with that list and I'm sure you'll like it. >>: [inaudible] you said RPC -- >> Adam Chlipala: This is implemented using continuation-passing style, where the programmer doesn't need to think about that. >>: So [inaudible]. >> Adam Chlipala: And this setting influences what the page looks like because it uses FRP to propagate things. >>: So that's the [inaudible]. >> Adam Chlipala: LSS is -- I'm always not sure what the usual standard terminology for FRP is. Here I'm calling it a source, and the page is a signal which is a pure function of the source -- >>: [inaudible]. >> Adam Chlipala: It's like half. All right. I guess this is the natural ending point. I can -- I don't know. I can keep showing a couple of demos as long as we're allowed to keep the room and people have questions. >> Nikhil Swamy: It's probably time to wrap up. >> Adam Chlipala: I'm wrapped up.
>> Nikhil Swamy: So let's thank our speaker. [applause]