>> Tom Ball: Okay. Hello, everybody, and welcome. Thanks for coming to this talk. This work that Daniel Perelman is going to present originated out of some discussions that Sumit Gulwani and I had with Daniel's advisor, Dan Grossman, from the University of Washington, really about programmer productivity and ways to improve that. Come on in. So, yeah, Daniel will present Intellisense for the Twenty-First Century: Type-Directed Completion of Partial Expressions. So, Daniel, take it away.

>> Daniel Perelman: So today I'm going to talk about a technology that all of you are familiar with and probably use on a day-to-day basis and how I think that it can be improved. As Tom said, I'm Daniel Perelman. This is joint work with Sumit Gulwani and Thomas Ball here at MSR and Dan Grossman over at U-Dub.

So I'm going to start by walking through what I think is a rather typical API discovery task. I'm going to be talking about a programmer working on the Paint.NET image editor who has an image and a size and wants to shrink this image to that size, but the programmer is new to the API and doesn't know what method does this. The programmer is most likely going to start searching through the API by just bringing up an Intellisense window, typing in image., expecting this to be a method on the image, and just looking for things that have to do with shrinking. They're going to sort of skim through this list, scroll down, page down, page down, and find that there's nothing to do with shrinking in this list. Now, there are a few things in this list that could possibly apply. There's a size and width. But you can't tell by looking at this, but those are read-only properties, so they aren't helpful in modifying the image.

So now the programmer, having failed to find what they want here, is probably just going to start looking through the other lists using Intellisense, saying, oh, maybe it was under size, which it turns out doesn't make any sense because that is actually part of the .NET [inaudible] library and not part of Paint.NET. And after that they're probably going to try to start enumerating all of the static methods and hoping to find what they want somewhere in there, just scrolling down, trying to figure out if any of these classes or namespaces sound like they have to do with what they want to do. Say, oh, maybe it was static on document -- and it's not there either.

At this point the programmer is probably getting a bit frustrated, wondering who designed this API so poorly, and is going to run off to the Web to try and see if someone on the Internet can tell them how to do this. So they do a Web search. None of these results have much to do with what they're trying to program. Now, at this point maybe they just try using a better search engine and still get no results. They'll try a few different search queries with different keywords and find nothing. As it turns out, they're not going to find anything, because Paint.NET is a freeware project developed by a very small development team. So in order to find any information about it on the Internet, you'd have to send someone an e-mail or find them [inaudible] or something.

So the programmer could do that or continue searching through the list of methods. And eventually, going through all the namespaces, they'd find, oh, there's a resize action -- nope, that's not it either. It can resize layers, but not documents. Going back, you eventually come upon canvas size action: resize document.
So this whole -- this was a lot of work. I mean, obviously it didn't take too much time just to scroll through all these options, because this was a moderately small API and it came up moderately early alphabetically, so it wasn't too much trouble. But the programmer says: I have a document and a size. I want to shrink the document. I know there is a method to do this somewhere. I want to be able to tell the IDE this and get a result and just continue programming without having to spend this time searching for the method manually.

Current Intellisense is not designed for this task. It allows you to type things in left to right and gives you a complete alphabetic list of the next token based on what the receiver is, and it doesn't filter based on any particular information the programmer might have. That's reasonable, since Intellisense is designed to help you type quickly, not to find methods that you want to call, for the most part. It just happens to get used for that.

So my proposal is that instead the programmer should be able to tell the IDE: I want to call image.Shrink(newSize). The IDE says there is no method shrink, but here are these other methods that take an image and a new size and might be what you want. So this is saying: this is a list of methods it found that take type document and size, and also these other types. Because in this case the programmer probably didn't know a priori that they needed this anchor edge and a background color for when they're resizing the document. They weren't thinking of that. And so it could come up with other options that turn out to have nothing to do with what the programmer is thinking.

So the problem here is that unfamiliar APIs are difficult to navigate. And as we have more and larger libraries, it's more common that programmers will have to deal with APIs that they are not familiar with and have to be able to find what they want to do with them. And also, this programmer wanted to shrink the document, but the method they wanted to call was resize document. So knowing the right names for the API is part of the difficulty. So simply searching by name, which is what the current tools are helpful for, isn't as useful. So we have this programmer thought process. But what we want the IDE to be able to do is to support adding arguments to the query, reordering the arguments, renaming the method, or even appending path expressions to arguments. That is, perhaps the programmer provided one argument, one value, but the method actually only takes a specific property of that value. We want to be able to offer these as options to the programmer.

So there has been some prior work in doing type-based searches of code. The first one is Jungloids, which is a system for Java that's based on the concept of doing conversions from one type to another. The example is, in the Eclipse IDE they have an IFile, which is just a normal file reference to a Java file, and they want to get a syntax tree of [inaudible] Java file, which is represented as an ASTNode. To do this conversion there is this middle step of creating an ICompilationUnit that the programmer can't usually search for. So their tool will take these, the input type and the output type, and produce this code by searching through the API. So this is generating an entire expression at once. The actual tool they present uses the locals as the source types and the context as the return type -- that is, the user types ASTNode ast =, and then calls the tool.
This is somewhat different from what I'm proposing, because they're only talking about using one input value and not multiple, and I'm suggesting using multiple input values to create more complicated expressions. Secondly, there's InSynth for Scala, which also pulls in the locals and all the APIs around and tries to piece them together into different expressions. And this will create more complicated expressions, and they attempt to pare down the list by not returning results that violate contracts. This unfortunately doesn't seem to scale that well. They only present using it on APIs that have a few hundred methods. Lastly, there's the Hoogle tool for Haskell, which allows you to just go to a command line, type in a Haskell type, and it will return all of the functions from the Haskell library that have that type. This does not make a particular attempt to rank the results, and it also is searching by more or less exact type. It doesn't offer results that have other arguments that the user didn't think of.

So in all, what I'm suggesting that's different from these tools is that I want to be able to present methods that have arguments the user didn't think of providing, I want to rank these methods in a way that's useful to the user, and I want the user to specify which arguments they expect, which will help in filtering out which results are meaningful.

So my proposal is that I want to make searches that look like code but are incomplete: partial expressions. So I have this example where the user knows that they're in a key-pressed event handler. There's some collection of keys that are down, and they want to add information from the key event to that. To do this, the tool should be able to tell the user, oh, the right type you need is the key event's key data. Also, this collection needs to be told the key time info of when the key was pressed. Or, going back to my running example, if the user wants to shrink an image, the tool should be able to tell the user: you need to call resize document, put the image and the new size here, and then you also need to know these anchor edge and background color arguments.

The key here is that we are using types as the primary search key. That is, this search means: find me a method that takes a document and a size. So to write out these partial expressions, I'm writing them like this, with the question mark meaning we don't know what the method is and these curly braces to remind us that this is a subset of the possible arguments and they might not even be in the right order. So just in general, the question mark is a hole where something needs to be filled in. And I'd like to emphasize that this search style complements Intellisense. In particular, I expect this to be useful for API discovery and not particularly useful for typing fast.

So now I'm going to summarize my general approach to how this is going to be implemented. The workflow of this is that the user is typing some text into the IDE and this is somehow going to be parsed into a query. I have not worked on a parser; I do not have a particular UI in mind at the moment. So taking this method query and the static type context and a library interface, in order to know which methods could be called, this will be combined into determining a result set of every possible method that could be called with those arguments.
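As a rough sketch of what that enumeration step might look like, a query could simply carry the argument types the user supplied (and, optionally, an expected return type), and every exported method whose parameter list can absorb those types, in any order and possibly with extra parameters, becomes a candidate. This is illustrative only; the talk does not prescribe a concrete representation, and the type and member names below are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Illustrative only: a query is the set of argument types the user supplied,
// plus an optional expected return type (null if unknown, e.g. "var x = ...").
public sealed class PartialExpressionQuery
{
    public IReadOnlyList<Type> ArgumentTypes { get; }
    public Type ExpectedReturnType { get; }

    public PartialExpressionQuery(IReadOnlyList<Type> argumentTypes, Type expectedReturnType = null)
    {
        ArgumentTypes = argumentTypes;
        ExpectedReturnType = expectedReturnType;
    }
}

public static class CandidateEnumerator
{
    // Enumerate every public method in the given assemblies whose parameter list
    // can absorb all of the query's argument types, in any order, possibly with
    // extra parameters the user did not mention (the "add another ?" idea).
    public static IEnumerable<MethodInfo> FindCandidates(
        PartialExpressionQuery query, IEnumerable<Assembly> libraries)
    {
        foreach (var method in libraries
            .SelectMany(a => a.GetExportedTypes())
            .SelectMany(t => t.GetMethods(BindingFlags.Public |
                                          BindingFlags.Instance |
                                          BindingFlags.Static)))
        {
            // Treat the receiver of an instance method as an extra parameter,
            // so instance and static methods are handled uniformly.
            var paramTypes = method.GetParameters().Select(p => p.ParameterType).ToList();
            if (!method.IsStatic) paramTypes.Insert(0, method.DeclaringType);

            if (CanAssignAllArguments(query.ArgumentTypes, paramTypes))
                yield return method;
        }
    }

    // Greedy check: every supplied argument must be assignable to some distinct
    // parameter; any leftover parameters are the extra "?" holes.
    private static bool CanAssignAllArguments(IReadOnlyList<Type> args, List<Type> parms)
    {
        var remaining = new List<Type>(parms);
        foreach (var arg in args)
        {
            int slot = remaining.FindIndex(p => p.IsAssignableFrom(arg));
            if (slot < 0) return false;
            remaining.RemoveAt(slot);
        }
        return true;
    }
}
```

On the running example, a query built from a Document and a Size would match the resize document method (among many others, since its two extra parameters are simply left as holes), which is exactly why the ranking step that follows matters.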
Then, because this is going to tend to be a lot of methods, in order to make this useful, the results will be ranked using a heuristic ranking algorithm to present the method calls that the user most likely meant towards the top.

To formalize the queries, I have this query language, where the first two lines define a simple programming language where I have variable references, field lookups, and method calls. I'd like to highlight that I am writing instance methods with the receiver as the first argument, so instance and static methods work the same in this language, just to simplify things. And then I have these versions with the tilde on top to indicate the partial expressions, that need something to be filled in, where I can either add a dot question mark at the end of an expression or use a question mark and the curly braces as my method call, saying we don't know what method we're calling. So an example here is that you could imagine the Visual Studio UI taking image.Shrink(newSize), recognizing that there is no method shrink, and converting it into a query that looks like that.

Now, to generate the results, we take the query -- we have two arguments, so we just simply enumerate every method in the library that takes a document and a size or a size and a document. Now, you'll notice this would not actually find resize document, because resize document had two other arguments. So we need to also consider there being additional arguments past the ones that the user specified. And, also, as I had said before, we want to be able to support the user not specifying the property of an argument. In this example we have a string builder and an exception -- we have some debug log or something -- and we want to append the exception to the string builder, and we want the tool to be able to recommend, oh, you meant to append the stack trace of the exception.

And we can write this formally in the query language using rewriting rules. The first one says that we can just add a question mark onto our partial method calls, saying, oh, there's another argument we didn't know about. And the bottom row is talking about what we can do with the dot question mark: either just ignore it, or we could make a method call -- this could be a static method call with one argument or an instance method call with zero arguments -- or we could append a field lookup. And for those two you'll notice I left the dot question mark on to say there could be more than one.

Now, this is going to end up with a lot of results, especially once I add on that rule saying that we can do this multiple times. In fact, a little work will show this actually gets you an infinite number of results, which would take an awful lot of time to read through. So you could simplify this just by putting a hard cutoff on applying these rules: one, two, three times, something like that. On the other hand, you would still end up with a lot of results, so we need some way to rank them. In particular, here's an example of if you had a string and a string array, here's some results you might get. They vary from reasonable and probably what you meant to completely crazy and very unlikely to be what you meant. Some of these, like comparing the lengths to each other, which, unless you're doing something very strange, is not what you meant, or comparing their types, which always returns false, so certainly not what you meant.
Or concatting them all together. Or you could use the string as the format specifier when converting the length of the array to a string -- I actually didn't read the documentation, so I'm not even sure what that means. Or some more reasonable things. There's a few different string operations that make sense taking a string and an array of strings: you can join the array together, you could use the string as a format string, or you could split the string using the array [inaudible]. In the last case, you would need to specify the extra details of how you want the split to work.

So in order to -- sorry. In order to order these, I'm going to discuss a few different heuristics for doing the sorting. As I was reading through those, I mentioned some semantic reasons why they don't make any sense. I can't expect my algorithm to understand the semantics of these methods, so instead I'm going to rely on heuristics that seem to work well in practice.

The first heuristic I'm going to discuss is type distance. Since the concept of my search is that I'm searching for methods that match the types of the arguments I'm providing, I'm going to want to prefer methods that take the exact type I've provided. So if I provide a rectangle, methods that take a rectangle are probably more likely to have to do with what I'm doing than methods that take a shape, which in turn are much more likely to have to do with what I'm doing than methods that take an object. So I'm going to write this metric of how far up the type DAG I have to go and use that to prefer methods that take types lower down on this DAG.

Next, I had this concept of path expressions starting from the arguments the user provided, which certainly can be useful, as in the example I had before where I wanted the stack trace of an exception. But as you put more and more of them on, you tend to get less related to the original argument. So I'm going to want to prefer results that have fewer of them. In this example, .GetType().ToString() has very little to do with the original array of strings. It's unlikely to be what the user meant.

The last heuristic I'm going to discuss is working out subtypes of strings. Strings get used for a lot of different things: paths, font family names, icon names. And it would be nice if, when the user provides a font family name, we don't recommend file operations. So I have this example on the bottom where the user asks a query for something to do with a font family name, and we want to recommend create font and methods like that. In order for this to be useful, we also need this to be automatic for any subtype of string. That is, we can't hard code what a path is, what a font family name is, what an icon name is into the program. We need to be able to discover those automatically, and any other subtypes we might not have thought of yet. In order to do this, I'm going to use an algorithm that's based on the Lackwit paper, which did something similar for [inaudible] C. It's just a simple type inference algorithm where every method return type and parameter that involves a string gets a type variable assigned to it. We use basic dataflow and unification-based type inference in order to determine a set of subtypes of string. For example, there's these two methods, get temp file and get file name, and they both get used as the first argument to change extension. Therefore, my algorithm can know that they have the same type.
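A minimal sketch of that unification step, assuming each string-typed parameter or return position is identified by an arbitrary string key, might look like the following. The class and slot names are invented for illustration; the real analysis runs over dataflow facts extracted from the binaries.

```csharp
using System.Collections.Generic;

// Illustrative union-find over "string positions": each string-typed parameter
// or return slot gets an id, and slots that exchange values get unified, so
// the return values of two methods that both flow into the same parameter end
// up in the same inferred subtype of string.
public sealed class StringSubtypeInference
{
    private readonly Dictionary<string, string> parent = new Dictionary<string, string>();

    private string Find(string slot)
    {
        if (!parent.ContainsKey(slot)) parent[slot] = slot;
        if (parent[slot] != slot) parent[slot] = Find(parent[slot]); // path compression
        return parent[slot];
    }

    // Called once per observed dataflow edge, e.g. "the value returned by
    // GetTempFileName flows into the first parameter of ChangeExtension".
    public void Unify(string slotA, string slotB)
    {
        var rootA = Find(slotA);
        var rootB = Find(slotB);
        if (rootA != rootB) parent[rootA] = rootB;
    }

    // Two slots have the same inferred string subtype iff their roots agree.
    public bool SameSubtype(string slotA, string slotB) => Find(slotA) == Find(slotB);
}
```

For the example just given, unifying "GetTempFileName:return" with "ChangeExtension:arg0" and "GetFileName:return" with "ChangeExtension:arg0" puts both return values in the same equivalence class, so a query argument produced by either method can be steered towards methods whose parameters sit in that class.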
The problems with this method tend to derive from accidentally merging subtypes that are not actually the same, which means that we might present file methods when presented with a palette name, but we can still avoid showing, say, font family or icon name related methods. So it's not so bad. Also, I'd like to note that this takes some time but can be done in the background, so it doesn't need to be executed on every query.

So I don't expect you to read this slide. This is just showing that we have some score that these metrics are combined into, and we compute this score for each result in the result set and then sort the results, and that's the ranking mechanism.

Now, I'm going to talk about the experiment that I've performed. I have not yet implemented a UI for this, so the experiment is completely quantitative results simulating using this algorithm on existing code. So I did a binary analysis using the Phoenix compiler framework to read through several mature C# projects. For each call with at least two arguments, it took the call and produced a set of queries from it. For example, for this call to resize document, it would take each argument and make a query using each argument, and then make a set of queries using each pair of arguments and a set of queries using each triplet of arguments, and it would evaluate how well each of these queries did at guessing the call, which is in this case resize document. The evaluation would be: we have some ranked result list and we want to know what position number resize document shows up at. These are the projects that I used. They're all mature C# projects. It's a combination of both user-facing and library code as well as a combination of closed-source and open-source projects.

So here's the evaluation. On this graph, the rank along the bottom is what position number the correct result showed up in -- that is, the actual call we found in the code. So rank 1 means that using one of those queries -- that is, knowing just the types of up to three of the arguments -- the actual method call in the code showed up at the top of the list. That happened about 35 percent of the time. Moving across here up to 20, the correct result showed up in the top 20 about 80 percent of the time. In other words, we can predict a method call from its arguments, narrowing it down to a set of about 20 possible method calls, about 80 percent of the time.

I'm going to overlay on this graph a breakdown separating instance calls only and static calls only. At the top we can see that instance calls do slightly better, and it does somewhat notably worse on static calls. I'm not entirely sure why this is, but I suspect that it's because searching through all of the static methods is simply a larger search space. You have a question?

>>: This is an average over all of the different possible patterns you generated off the [inaudible]?

>> Daniel Perelman: No, I'm taking the best.

>>: [inaudible]

>> Daniel Perelman: I'm taking whichever one of those did the best, which is admittedly somewhat unfair.

>>: Did you find that certain patterns always did [inaudible] best or a small number of patterns were the best?

>> Daniel Perelman: Well, in general we're going to expect that the pattern that takes the most arguments is going to be the best, but I'm going to get into that more specifically in a moment.
>>: It's a great question, but if some method takes a Foo and two ints, you want your query to involve the Foo and not the two ints.

>>: Yeah, yeah, yeah.

>>: But for this experiment you tried all subsets including --

>> Daniel Perelman: The concept of this is that most likely the one or two arguments the user is going to think of, when thinking of what method they want to call, are going to be ones that are going to be successful at finding the method. Unfortunately, I can't automatically figure out which arguments the user would think of. But on the next slide I'm going to focus in on the results at just this 20 line.

So first, this is just a graph showing how many calls I looked at of each number of arguments, where I'm combining the receiver into the argument set. So this 3 here is the number of static method calls that took three arguments plus the number of instance method calls that took a receiver and two other arguments. So now, here, this is looking at only queries that had one argument. So the type of one of the arguments was able to predict a three-argument method call 54 percent of the time. So that's requiring only one of the three arguments. If I then add in queries that have a second argument -- that is, two of the types of the arguments -- I'm able to get up to getting the correct answer [inaudible] within the top 20 86 percent of the time. Now, if I throw in the third argument, so for our three-argument calls looking at the types of all of the arguments, I get practically no difference. So this is sort of supporting what I was saying before, that a few of the arguments are generally enough to decide which call we want to make, and that I wasn't necessarily gaining all that much by throwing in all three arguments. Now -- yes?

>>: Can you go back one slide. What does the inverse of this look like, in the way that no matter how many arguments you had [inaudible]?

>> Daniel Perelman: So I don't go above three arguments because when I added four arguments to the graph it wasn't visible. And obviously once I have all three arguments, that's the best I can do. So there's this space that didn't end up in the top 20 at all, which is the remaining 12 percent. I'm not sure what more I can say about that. Could you clarify your question?

>>: If you look at top 100 [inaudible]?

>> Daniel Perelman: Ah. Yeah. Although, if I go back to this slide, you'll see that after 20 it was notably leveling off. So it's going to be in there eventually, since obviously if you look in the top 10,000, they're all going to be in there somewhere. But you're not getting significant gains going much further, which is more or less why I cut off the graph here.

Okay. So this doesn't necessarily tell you much about how useful the tool would be, since I started this talk saying I'm going to make Intellisense better, but I haven't yet given any argument that this is actually better than Intellisense. Naturally, any comparison I make to Intellisense has to be an apples-to-oranges comparison, because they take different inputs. In particular, especially for static methods, Intellisense has to be told what the receiver is. This is a significant bonus on static methods, but it's still a bonus on instance methods, since if I'm specifying multiple arguments in my query, my algorithm has no information about which one is going to be the receiver. On the other hand, obviously knowing multiple arguments to the method is useful in filtering out which method you want to call.
So Intellisense doesn't use that information. So I'm going to present the comparison as a simulation of typing in the receiver, hitting dot, and looking at how many entries down in the Intellisense list you have to go, and taking the difference of that from how far down in the list my algorithm gives you have to go. Yeah?

>>: Well, then you give preferential treatment to the receiver. I mean, it seems like if the programmer is not sure how to use the API, if they have a receiver, that's something that they already have in their program. So they already kind of have a handle on it. Seems like you should treat that as special and not treat it so equally with all the other [inaudible].

>> Daniel Perelman: Oh. You're saying that I should allow the algorithm -- basically, it seemed to work pretty well when I didn't give preferential treatment to the receiver, so I didn't see any reason to change it. It might make sense in an actual implementation to say, yes, whichever one the [inaudible] receiver is probably the receiver, and try to put its methods first. Is there another question?

>>: So the list -- the Intellisense list is alphabetical?

>> Daniel Perelman: Yes.

>>: Okay. So have you considered the possibility of just reorganizing it by frequency of use, which would make it, you know, [inaudible]?

>> Daniel Perelman: No, I have not. There actually has been previous work that did that, but I haven't compared against that.

>>: It's also very interesting for API discovery: do you want the frequently used ones or the infrequently used ones?

>>: Most don't want equal.

>> Daniel Perelman: I actually had someone tell me previously that I should try taking all the frequently used methods and not presenting them, because obviously the programmer already knows them. I'm not sure which is actually the right choice. So to give an intuition for how this is going to be compared to -- sorry?

>>: [inaudible] Just a comment: in a way Intellisense is there so that we have to type less, versus [inaudible] you actually trying to discover something. So it's actually solving a completely different problem.

>> Daniel Perelman: Yes. So sort of the way I'm looking at this is Intellisense gets used for API discovery, but it wasn't really designed for it. So, yes, we're solving a different problem.

So to give some intuition on how this is going to compare to Intellisense: our algorithm is going to do notably better in instances where the arguments have interesting types. In particular, here I'm saying that there's an error-type enum as the argument. If you have an enum, there are probably going to be relatively few methods in your library that deal with that specific enum, so that's going to help our algorithm a lot, but give no benefit to Intellisense. The two are going to be about the same when you're dealing with, for instance, methods of System.Object, because there just aren't that many things to do with an object, and neither algorithm is going to have many options, so the difference can't be that big. My algorithm is going to do significantly worse, first, going back to that, if the user does want to call .Equals but has two enums -- then my algorithm is probably going to do poorly. Or if they just have two of some richer type than object, it's probably going to do poorly, because it's going to be preferring methods that take more precise types and the user is actually calling a method that takes object.
So of course, since we're in alphabetical order, Intellisense does significantly better finding methods whose names start with the letter A. So here's the comparison. The difference here -- the minus 50 means that it showed up in my algorithm 50 spaces higher than in Intellisense. So that happens about 5 percent of the time. We go over to 20: we get 20 spaces higher about 20 percent of the time. And at the other side, Intellisense does 20 spaces higher about 15 percent of the time. And there's this big middle space where they really do about the same, plus or minus 10, not a huge difference. Yes?

>>: [inaudible] using ranking as a proxy for the [inaudible] involved in looking at that list and figuring out which one's the right answer?

>> Daniel Perelman: Yes. Which is notably nonlinear, but yes.

>>: For the alphabetical one, if you kind of -- you can scan faster when it's alphabetical [inaudible]?

>> Daniel Perelman: A reasonable argument.

>>: Because in yours it's almost random --

>> Daniel Perelman: Yes.

>>: Which means that the reader --

>> Daniel Perelman: You're going to have to read through every single one.

>>: Actually slows you down significantly just to read.

>> Daniel Perelman: Yes.

>>: If it's not -- what you're looking for is not in that top 10.

>>: But you can also take the top-k and organize by assembly or by -- you know, by package. I mean, you're right. It would be additional work [inaudible] type for some k [inaudible].

>> Daniel Perelman: I should note that my heuristic ranking algorithm actually tends to get clumps of ties, which end up being alphabetically ordered. But still, you're right. It's going to end up looking significantly more random and therefore take more time to read through than an equal-size [inaudible] Intellisense.

So on top of this, we could imagine including in our query the return type we want, in which case that would significantly reduce the options in our list. But, once again, Intellisense doesn't support this. So this line jumps up about 10 percentage points. So now we can get about 30 spaces higher about 25 -- sorry, about 15, 20 percent of the time, or 20 spaces higher more like 20, 25 percent of the time. And, on the other hand, now we end up only 10 or 20 spaces worse than Intellisense only around 5 percent of the time. Also notice here that there's another 5 percent jump when we're talking about static methods, once again because my algorithm has to consider a larger search space. Yes?

>>: So I guess it really doesn't increase [inaudible] there is a model -- you have to have some kind of user model of what they would recognize as the correct answer. So they might know the name, for example, they might recognize the name, or they might have no clue what the name is. So what's your model?

>> Daniel Perelman: So this is just telling you what position it is, going down the list. So basically I'm asserting that the user does not know the name but will recognize it. This, once again, might not be a reasonable model. Yes?

>>: So this alphabetical ordering, this bothers me. It's almost completely unnecessary, in that, in a way, the Intellisense list, right, sure, it's alphabetical; whenever I hit Intellisense, I start somewhere in the middle of this list based on the last time I used it, whatever. So you'd think it should always say the distance is half the length of the list, right, for any Intellisense.
This whole alphabetical thing seems to be just -- I don't know, it just puts some random noise in this thing. It doesn't seem relevant, right? If I have to search through my Intellisense list, I will basically on average have to search [inaudible], right?

>> Daniel Perelman: Well --

[multiple people speaking at once]

>>: It goes back to the question: does the person know what alphabetically they're looking for?

>>: No, he's trying to -- I mean, he's right that you don't know the name, but you have to browse through the list [inaudible] one by one. So the assumption is that it will be half the list that you have [inaudible].

>>: In Intellisense you're reading one word on every line. This one, you're reading quite a long expression, right?

>>: No, Intellisense also gives you a tool tip and you want to [inaudible].

>>: [inaudible] tool tip.

>>: Yeah. You need to read all the arguments, because [inaudible] all the parameters, because the assumption here is we use the method, the parameters, to parse [inaudible].

>>: [inaudible]

>> Daniel Perelman: This will only show one [inaudible]. Yes?

>>: What is the -- is there an assumption that you're making that while the user doesn't know the method they're calling, or the sequence of methods they want to call, they do know the types? Like, why assume they don't know anything semantically about what they're trying to do? [laughter]

>> Daniel Perelman: Well, if the programmer has no idea what they want to do, I can't help them.

>>: Like what if they were [inaudible]?

[multiple people speaking at once]

>> Daniel Perelman: So if they wrote size but they meant rectangle, then for one thing there's likely to be a conversion from size to rectangle, in which case my algorithm can say, oh, maybe you want to convert to rectangle, and here are these methods that actually take rectangle, which get ranked a bit lower. So that could be helpful. And I'm saying that at some point in writing their code the programmer probably has some objects around that they want to work on for the next step. So that's sort of the base assumption. Yes?

>>: Now, this is based on analyzing calls and things that are in their program, right? Is other information there, or do you just analyze all the referenced symbols?

>> Daniel Perelman: I'm sorry, I'm not sure I understand the question.

>>: When you pull an [inaudible] do you analyze all these sorts of methods that are hanging around to figure out which one is relevant?

>> Daniel Perelman: Yeah. So I look at each call in the program individually. I look at that call and apply the query, and the heuristics are completely local.

>>: So if they are starting out with a blank project, they have nothing.

>> Daniel Perelman: If they're starting out with a blank project, that makes essentially no difference to this algorithm.

>>: No, no, but what's a [inaudible]? What are you searching through?

[multiple people speaking at once]

>> Daniel Perelman: Okay. Okay. So, yes, the libraries it's searching through are whatever libraries are being referenced by the project.

>>: What if the user didn't know they needed a reference library?

>>: I just mean, if we're in the scenario of the user discovering an API, they're going to be starting out, theoretically, with this new API, with a blank project that has nothing but a void Main method in it, and there are no calls in their program from which to figure out things that are useful and --

>> Daniel Perelman: This isn't reading other calls in the program.
But it does need to have as an input which libraries to search through, which is currently based off just whichever libraries are referenced by the binary I was analyzing.

>>: So it can only analyze patterns that are used in the library itself for accomplishing objectives.

>> Daniel Perelman: It is not looking at existing code to decide which library to use. It's purely -- it is only reading through libraries to find out what method signatures exist, and then it is ranking those method signatures.

>>: Only need metadata.

>>: You only need metadata.

>>: You only need metadata.

>>: Well, there's a query.

>>: And a query.

>>: [inaudible] I think he's just misleading, because to do his experiments he was using existing code to start out with a different call [inaudible].

>>: [inaudible] this self typing [inaudible].

[multiple people speaking at once]

>> Daniel Perelman: It's not local, but you could most likely get most of the information by reading through the binary of the library without having any code written by the user.

>>: So you have to analyze the implementation of a library.

>> Daniel Perelman: In order to get that information, yeah. Although that could be done ahead of time.

>>: If you need metadata for things, I mean, if you point at the Microsoft Symbol Server, you'll get a lot of libraries that exist, and maybe your search space is really, really large, but at least you'll then not miss a library [inaudible].

>>: [inaudible] I didn't find anything in the referenced libraries. Do you want me to continue?

>>: Yeah, but it's an interesting -- yeah, it's an interesting question how the thing scales to really huge numbers of [inaudible].

>> Daniel Perelman: Yes. I can't really comment on that, because that would depend on the mechanism you had for reading that metadata. I suspect that it wouldn't have too much difficulty. It just needs to be able to know the type hierarchy and know what method signatures are available.

>>: Okay.

>> Daniel Perelman: Other questions?

>>: So one other question. This seems to have the potential to really work well [inaudible] if you have a good type system. So what about -- do you have any experience or results about when there's only [inaudible] like integer, because there can be methods that always take integers, characters, whatever, right? And if you don't give me specific types, then this is not going to be able to search properly.

>> Daniel Perelman: Yes. This is going to be less useful with those types. As I had been talking about doing the subtyping thing for strings, you could certainly imagine trying to do that on other primitive types. But, yeah, it's not really the problem this is directed at solving, and I don't necessarily have a good idea of how to address that in a similar manner.

>>: How does this work in a world where people are using a lot of type inference, where the kind of receiver -- not the receiver type, but if I have a variable that I'm assigning into, I don't really know what the type is until I get the method that then returns that thing.

>> Daniel Perelman: That's why I showed this graph first, because I expect in the real world you don't know what type you want. So this graph is more realistic than this one. This one looks nice, but this probably represents the real world better, because you're usually going to be using the auto keyword or whatever and not know what type you actually want.

>>: I think this brings it back to the other question I have of: what are you trying to do?
Are you trying to save typing? Are you trying to actually make the user discover stuff? Because, again, that comparison to Intellisense seems to be the wrong thing here. For example, if I [inaudible] when I'm stuck, it's because I don't know how to get a text writer [inaudible].

>>: No, no, were you here when he said it's an apples-and-oranges comparison?

>>: Yeah [inaudible]

[multiple people speaking at once]

>> Daniel Perelman: I'm comparing to Intellisense because, as far as I know, reading through the Intellisense list is how programmers tend to find methods. So I'm not saying that it's necessarily very good for that, so it's not necessarily that difficult to beat, but --

>>: I guess what I'm trying to say, like, my argument is that the next graph is actually relevant, because often I know what I want but I don't know how to get it. So I need the text writer but I just can't [inaudible] text writer is abstract, just like this complicated thing I have to go through to get a text writer. So this would help, right? Because I can say here's the string and a text writer --

>>: As long as you're willing to write text writer.

[multiple people speaking at once]

>>: Just making the argument that comparing yourself to Intellisense alone is not really showing what it can do.

>> Daniel Perelman: Yes. I compared -- I talked about Jungloids before, which is addressing the specific problem of: I have one type, I want to get to this other type, and there's some random path in the middle that's completely undiscoverable. So that, I would think, is definitely something that belongs in the IDE and is definitely part of an implementation of this.

So my last graph is running time. This is the cumulative time to execute one query. This is saying that 95 percent of the time I can get these ranked results for a method query in under one second. I'd like to note once again this is not including the time to do the type inference algorithm, which can be run in the background ahead of time, for --

>>: What was the size of the background library you searched to give us one second?

>> Daniel Perelman: I don't know exactly. It was reading several dependent libraries whose binary sizes probably totaled up to a couple megabytes. I don't know. But including the .NET core library and a few other libraries.

>>: Now, when you say .NET [inaudible] library, what is that?

>> Daniel Perelman: Whichever assemblies were referenced by the code that it was analyzing. So not all of them, but whichever ones were actually referenced. So System.Core.dll and probably a couple other DLLs that were around. These were usually on the order of around a dozen external DLLs, but I don't have --

>>: So these were mature projects --

>> Daniel Perelman: Yes.

>>: So we can expect that that's something -- somewhat reasonable.

>> Daniel Perelman: Yes. Unfortunately that's not necessarily comparable to what's going to be visible from the Visual Studio context. This is what was visible from the binary context, which is hopefully comparable, but not obviously comparable.

>>: [inaudible] is it the ranking methods this time, or is it the -- I assume you already have all this stuff cached, right, so you're just kind of [inaudible].

>> Daniel Perelman: Yeah. And ranking each result takes time, just running the heuristics on each one and looking up all the data in the data structures. And of course this information isn't necessarily stored in the most efficient way.
It has the whole heavy weight of Phoenix loaded in the background.

>>: So should you take this as an upper bound?

>> Daniel Perelman: Yes. I expect an actual implementation to be able to run faster.

>>: Well, an upper bound of one second for each query?

>> Daniel Perelman: Yes. I sort of drew the line at one second, because I feel like that's a reasonable time to expect a user to wait for their dialog window to pop up, and much longer than that and they're going to get annoyed at the UI being too slow.

>>: That's faster than going out to Bing. I mean, if you're comparing somewhere between Intellisense and somewhere between searching online, it's still one second [inaudible].

>> Daniel Perelman: Waiting a second or two is --

>>: [inaudible] might argue that no, no, you've got to get it down to a quarter second, which we probably could, but, you know, we're talking on the order of human interactive speeds. So you care about things somewhere between a tenth of a second and a second and a half. That's where you have to play.

>>: And because a person is exploring this API and doesn't exactly know, they're already in the sort of thinking moment. It's not like they're in the process of typing really fast; they don't know [inaudible].

>>: [inaudible]

>> Daniel Perelman: Sorry.

>>: Have you looked at incorporating, like, chunky frameworks, like WPF or Silverlight?

>> Daniel Perelman: I have not looked at those and I am not familiar with those APIs, so I'm not sure how that would relate to this.

>>: Just thinking that a typical developer is probably working inside a framework like that, and so, you know, which is why Intellisense is really useful, because they're big and there are many different policies.

>> Daniel Perelman: I expect that for a GUI framework this should be useful, because GUI is definitely an area where you have relatively specific types.

>>: Are you saying, then, you should prefer -- if you're working within a framework, then those should get higher ranking?

>>: Well, that's a possibility. I think we're -- the question I have is about scale: if you have something like that and you throw that into the mix, does it make it much slower.

>>: It's just a noise. And, you know, you said you can only analyze based on what was available from the binary. I think the C# compiler strips out unused references. But if someone has a project, they may really have references to WPF and Silverlight and Windows and a very large number of libraries.

>> Daniel Perelman: So what my analysis is doing is, if there's any reference to a library, it has to load the DLL for that library and read the metadata from that. So hopefully it's actually getting all of the methods in that library, not just the ones that were actually called from the project.

>>: But I think there's -- I mean, there is a very solid point [inaudible] as future work here, which is, you know, if you put this in a development environment, you probably have a couple orders of magnitude more sort of methods nearby and noise and all this sort of stuff, and does the performance scale, does the ranking still work. These are all the questions you'd want to answer.

>> Daniel Perelman: The ranking algorithm should be, well, essentially linear in the number of methods you have to look at, because you have to look at each method, decide if you can assign the arguments to the method's parameters, and, if you can, assign a ranking score. Of course then you have a log factor from actually doing the sorting. But --
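As a concrete illustration of that score-and-sort loop, a sketch might look like the following. The particular weights and field names are invented here, since the talk only says the heuristics are combined into a single score and sorted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative candidate carrying the per-heuristic measurements described in the talk.
public sealed class Candidate
{
    public string Rendering { get; set; }          // e.g. "ResizeDocument(doc, size, ?, ?)"
    public int TotalTypeDistance { get; set; }     // steps up the type DAG, summed over arguments
    public int PathExpressionLength { get; set; }  // number of appended ".?" lookups or calls
    public int UnfilledHoles { get; set; }         // parameters the user did not supply
}

public static class Ranker
{
    // One linear pass to score each candidate, then an O(n log n) sort.
    // The weights are made up for this sketch; any monotone combination
    // of the heuristics would fit the description in the talk.
    public static List<Candidate> Rank(IEnumerable<Candidate> candidates)
    {
        double Score(Candidate c) =>
            3.0 * c.TotalTypeDistance +
            2.0 * c.PathExpressionLength +
            1.0 * c.UnfilledHoles;

        return candidates.OrderBy(Score)
                         .ThenBy(c => c.Rendering, StringComparer.Ordinal) // ties fall back to alphabetical
                         .ToList();
    }
}
```

Lower scores come first, so exact type matches, short path expressions, and fewer unfilled holes all push a candidate up, and tied candidates fall back to alphabetical order, matching the clumps of ties mentioned earlier.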
>>: You can create a fancy new thing. You can make this look so much -- you can make it look highly [inaudible].

[multiple people speaking at once]

>>: So you haven't gone there at all.

>> Daniel Perelman: No. I mean, there's obviously ways to improve on that.

>>: I mean, you could probably take all the DLLs in the .NET universe and make it pretty efficient on that.

>> Daniel Perelman: Yeah.

>>: Right? I mean, you would have to follow on types, so people typing [inaudible] you can probably do [inaudible].

>>: Think of Web search engines.

[multiple people speaking at once]

>>: [inaudible] to do the ranking, not to find the hits. So the problem Bing or Google have now is they have to rank things right, not to find --

>> Daniel Perelman: Yeah. I was mostly worried about keeping Phoenix from eating up all of my memory loading the DLLs. So I wasn't going to add to my memory problems by creating a fancy index.

>>: [inaudible] the reference as simple, I guess.

>> Daniel Perelman: So my future work here is to consider other types of partial expressions. I've just really been talking about partial expressions where I have a method call and arguments, and perhaps some of the arguments have that dot question mark on them saying that it's not exact and I want some path expression starting there. Next, I'd like to determine what would be a good user interface for this. I had some ideas scattered throughout the talk, but I'm not sure they're exactly what makes sense. And then to actually develop a Visual Studio plug-in for this, and once I have that, to perform a user study and find out how useful this actually is, so I can have some more concrete data on how good an API discovery tool this is.

So, conclusion: type-based extensions to Intellisense can likely be useful. And this partial expressions language is sort of the key of how to talk about it formally.

>> Tom Ball: Thank you very much.

[applause]

>> Tom Ball: Wow. Just a little under an hour. Any more questions?

>>: I recommend that you also include common type conversions as sort of pseudo dot question mark operators.

>> Daniel Perelman: Yeah. My sort of --

>>: [inaudible] would create too much noise.

>> Daniel Perelman: I mean, yes, my concept for the dot question mark basically is to generally just plug in the entire Jungloid result there and do any sort of conversion from one type to another; that was the concept. So, yes, if there are common type conversions that are not actually [inaudible] as an instance method, then it would make sense to put them there. Jungloids actually works by reading through all the existing code and trying to find type conversions that way. Yes?

>>: So I'd suggest -- I think you're going to get the most bang for your buck by designing a number of different UIs, for instance, around what the scenarios are that you think your users are going to be using this for. One thing about the Intellisense scenario is that if you type and you hit dot, you know, you're not going to wait ten seconds for something to happen; you're going to think the computer crashed. Whereas if you have a search box on a Web page, you're willing to wait a couple seconds because you know it's the Web. So a particular UI can give the user hints about the expectations of how long it will take, which means you don't have to speed things up unless you have a UI that convinces the user it's supposed to be faster. So by exploring different UIs, you can cue the user as to the real performance of your system.

>>: Yeah, I think -- I think you're absolutely right.
Another thing that's sort of totally orthogonal to that is that the quantitative data so far looked at, like, every method in some file. But in practice, you know, we need a better sense of which of those methods would actually be queried about, right, to make sure it's useful. And that's just completely 90 degrees off from how you do the query.

>> Daniel Perelman: That's definitely a problem, that my results are looking at all of the calls. I have no reason to believe that any particular segment of the calls are ones the programmer would need to do an API search for. Sorry. Yes?

>>: So I think another [inaudible] that might be, like, the question that I was asking about, a library always has integers [inaudible]. I think this can be complemented by, like, something similar to searching, like, [inaudible] even the -- I think there was a question about can we use context, for example [inaudible], because if somebody's putting a dot, there's something before that. So the name that they gave to it might be useful to filter out some results and stuff like that.

>> Daniel Perelman: Yes. I agree that could be useful. There's sort of the synonyms problem: the programmer is going to use one name; the API developer might use a different name. There has been some prior work in trying to, like, create thesauruses of synonyms. And I was basically ignoring this because I can't do anything about that automatically. But it's definitely worth considering for an actual tool.

>>: Just for clarity, I'd consider kind of playing around with the title a little bit. You say that your goal is not necessarily the [inaudible] Intellisense exceeds [inaudible] right now, but the API discovery scenario.

>> Daniel Perelman: Yes.

>>: And the title, by saying type-based Intellisense, seems to sort of imply that this is a wholesale change to Intellisense, whereas this is really a supplemental feature that's focused on one scenario. It -- it -- just to sort of message correctly what you're going after, it'd be -- I think it'd be better to say that, as opposed to type-based Intellisense, this is more an alternative to Intellisense, or similar to it.

>> Daniel Perelman: Yeah. Thank you.

>>: [inaudible] Intellisense is very -- when you think of Intellisense, it's very [inaudible] API discovery, that the idea -- the query expression for this API discovery is interesting, interesting in concept, but don't tie yourself necessarily to Intellisense. Think about, like, object browser and navigate-to and all the ways you really want to do API discovery [inaudible] natural flow of Intellisense [inaudible].

>> Daniel Perelman: Okay.

>>: But don't necessarily be too tied to the UI, because the piece that's really interesting is the API discovery piece.

>>: He can always use [inaudible].

>> Daniel Perelman: I guess I was sort of thinking this could be presented to the user as they type, in the same sense as Intellisense, which is sort of why I was thinking that. But, I agree, it's --

>>: Context is important.

>> Daniel Perelman: Yeah. I agree it's significantly different than Intellisense.

>>: I think Intellisense does -- it does have an API discovery piece, so that's interesting how it plays into it, but there are some things that you talk about that, you know -- in terms of how long does it take for a search or query. People will pay for it in a different form. So that's the interesting piece [inaudible].

>>: [inaudible] couple seconds [inaudible] keystroke and bring up another window to do that.

>>: And Tom's point [inaudible].
[multiple people speaking at once]

>>: [inaudible] then we've got the cursor, we know --

>> Daniel Perelman: And -- okay. And also the prior work of Jungloid [inaudible] had built upon the -- well, code completion [inaudible].

>>: Don't tie yourself -- at this point don't tie yourself to a specific kind of UI. Play around with a lot of them. But, for instance, in [inaudible] now, our code snippet experience, you know, you can type a question mark and you can hit tab, and that sort of brings up an inline sort of somewhat search list or a completion list. It's not that you can't do it inline and you have to do a dialog box or something else. Just be careful about getting into that workflow of when the user gets [inaudible]. There are other alternatives that you can sort of [inaudible].

>> Daniel Perelman: Okay. Thank you. Were there other people with hands up?

>> Tom Ball: Great. Thanks again.

[applause]