>> Tom Ball: Okay. Hello, everybody, and welcome. Thanks for coming to this talk. This
work that Daniel Perelman is going to present is -- originated out of some discussions that Sumit
Gulwani and I had with Daniel's advisor, Dan Grossman, from the University of Washington
really about programmer productivity and ways to improve that. Come on in.
So, yeah, Daniel will present Intellisense for the Twenty-First Century: Type-Directed
Completion of Partial Expressions.
So, Daniel, take it away.
>> Daniel Perelman: So today I'm going to talk about a technology that all of you are familiar
with and probably use on a day-to-day basis and how I think that it can be improved.
As Tom said, I'm Daniel Perelman. This is joint work with Sumit Gulwani and Thomas Ball
here at MSR and Dan Grossman over at U-Dub.
So I'm going to start walking through what I think is a rather typical API discovery task. I'm
going to be talking about a programmer working on the Paint.NET image editor who has an
image and a size and wants to shrink this image to that size, but the programmer is new to the
API and doesn't know what method does this.
The programmer is most likely going to start searching through the API by just bringing up an Intellisense window, going to type in image. and just expect this to be a method on the image and just look for things that have to do with shrinking. They're going to sort of skim through this list, scroll down, page down, page down, and find that there's nothing to do with shrinking in this list.
Now, there are a few things in this list that could possibly apply. There's a size and width. But
you can't tell by looking at this, but those are read-only properties, so they aren't helpful in
modifying the image.
So now the programmer, having failed to find what they want here, is probably just going to start looking through the other lists using Intellisense, saying, oh, maybe it was under size, which it turns out doesn't make any sense because this is actually part of the .NET [inaudible] library and not part of Paint.NET. And after that they're probably going to try to start enumerating all of the static methods and hoping to find what they want somewhere in here.
Just scrolling down, trying to figure out if any of these classes or name spaces sound like they have to do with what they want to do. Say, oh, maybe it was static on document -- and not there either.
At this point the programmer is probably getting a bit frustrated, wondering who designed this
API so poorly and is going to run off to the Web to do a Web search for -- to try and see if
someone on the Internet can tell them how to do this.
So they run a Web search. None of these results have much to do with what they're trying to program. Now, so at this point maybe they just try using a better search engine and still get no results. They'll try a few different search queries with different keywords and find nothing.
As it turns out, they're not going to find anything because Paint.NET is a freeware project
developed by a very small development team. So in order to find any information about it on the
Internet, you'd have to send someone an e-mail or find them [inaudible] or something.
So the programmer could do that or continue searching through the list of methods. And
eventually going through all the name spaces they'd find, oh, there's a least size action -- nope, that's not it either. It can resize layers, but not documents. Going back you eventually
come upon canvas size action, resize document.
So this whole -- this was a lot of work. I mean, obviously it didn't take too much time just to
scroll through all these options because this was a moderately small API and it came up
moderately early alphabetically, so it wasn't too much trouble.
But the programmer says I have a document and a size. I want to shrink the document. I know
there is a method to do this somewhere. I want to be able to tell the IDE this and get a result and
just continue programming without having to spend this time on searching for the method
manually.
Current Intellisense is not designed for this task. It allows you to type things in left to right and
gives you a complete alphabetic list of the next token based on what the receiver is and doesn't
filter based on any particular information the programmer might have.
It's reasonable since Intellisense is designed to help you type quickly, not find methods that you
want to call for the most part. It just happens to get used for that.
So my proposal is instead the programmer should be able to tell the IDE I want to call
image.Shrink(newSize). The IDE says there is no method shrink, but here are these other methods that take an image and a new size and might be what you want.
So this is saying this is a list of methods it found that take type document and size and also these
other types. Because in this case the programmer probably didn't know a priori that they needed
this anchor edge and a background color for when they're resizing the document. They weren't
thinking of that. And so the tool could come up with other options that turn out to have nothing to do with what the programmer was thinking of.
So the problem here is that numerous unfamiliar APIs are difficult to navigate. And as we have
more and larger libraries, it's more common that programmers will have to deal with APIs that
they are not familiar with and be able to find what they want to do with them.
And also this programmer wanted to shrink the document, but the method they wanted to call
was resize document. So knowing the right names for an API is part of the difficulty. So simply
searching by name, which is what the current tools are helpful for, isn't as useful.
So we have this programmer thought process. But what we want the IDE to be able to do is to
support adding arguments to the query, reordering the arguments, renaming the method or even
appending path expressions to arguments. That is, perhaps the programmer provided one
argument, one value, but the method actually only takes a specific property of that value. We
want to be able to offer these as options to the programmer.
So there has been some prior work in doing type-based searches of code. The first one is
Jungloids, which is a system for Java that's based on the concept of doing conversions from one
type to another. The example is in the Eclipse IDE they have an IFile which is just a normal file
reference to a Java file and they want to get a syntax tree which is [inaudible] Java file which is
represented as an AST node.
To do this conversion they have this middle step of creating an ICompilationUnit that the
programmer can't usually search for. So their tool will take these, the input type and the output
type, and produce this code by searching through the API.
So this is generating an entire expression at once. The actual tool they present uses the locals as
the source types and the context as the return type, as a new type AST node, AST equals, and
then call the tool.
This is somewhat different from what I'm proposing because they're only talking about using one
input value and not multiple, and I'm suggesting using multiple input values to create more
complicated expressions.
Secondly, there's InSynth for Scala which also pulls in the locals and all the APIs around and tries
to piece them together into different expressions. And this will create more complicated
expressions, and they attempt to pare down the list of it by not returning results that violate
contracts.
This unfortunately doesn't seem to scale that well. They only present using it in APIs that have a
few hundred methods.
Lastly, there's the Hoogle tool for Haskell which allows you to, just at a command line, type in
a Haskell type, and it will return all of the functions from the Haskell library that have that type.
This does not make a particular attempt to rank the results, and it also is searching by more or
less exact type. It doesn't offer results that have other arguments that the user didn't think of.
So in all, what I'm suggesting that's different from these tools is that I want to be able to present methods that have arguments the user didn't think of providing, I want to rank these methods in a
way that's useful to the user, and I want the user to specify which arguments that they expect,
which will help in filtering out which results are meaningful.
So a proposal is that I want to make searches that work like code but are incomplete, partial
expressions. So I have this example where the user knows that they're in a key-pressed event
handler. There's some collection of keys that are down and they want to add information from
the key event to that.
To do this, the tool should be able to tell the user, oh, the right type you need is the key event key
data. Also this collection needs to be told the key time info of when the key was pressed. Or
going back to my running example, if the user wants to shrink an image, the tool should be able to tell the user you need to call resize document, put in the image and new size here, and then
also need to know these anchor edge and background color arguments.
The key here is that we are using types as the primary search key. That is, this search means find
me a method that takes a document and a size. So to write out these partial expressions, I'm writing
them like this with the question mark meaning we don't know what the method is and these curly
braces to remind us that this is a subset of the possible arguments and they might not even be in
the right order.
So just in general the question mark is a hole where something needs to be filled in. And I'd like to emphasize that this search style complements Intellisense. In particular, I expect this to be useful for API discovery and not particularly useful for typing fast.
So now I'm going to summarize my general approach of how this is going to be implemented.
So the workflow of this is that the user is typing in some text into the IDE and this is somehow
going to be parsed into a query. I have not worked on a parser, I do not have a particular UI in
mind at the moment.
So taking this method query and the static type context and a library interface in order to know
which methods could be called, this will be combined into determining a result set of every
possible method that could be called with those arguments.
Then because this is going to tend to be a lot of methods, in order to make this useful, the results
will be ranked using a heuristic ranking algorithm to present the method calls that the user most
likely meant towards the top.
To formalize the queries, I have this query language where the first two lines define a simple
programming language where I have variable references, field lookups, and method calls. I'd
like to highlight that I am writing instance methods with the receiver as the first argument, so
instance and static methods work the same in this language just to simplify things.
And then I have these versions with the tilde on top to indicate the partial expressions that need
something to be filled in, where I can either put -- add a dot question mark at the end of an
expression or use a question mark and the curly braces as my method call saying we don't know
what method we're calling.
So an example here is that you could imagine the Visual Studio UI taking
image.Shrink(newSize), recognizing that there is no method shrink, and converting it into a query
that looks like that.
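(For illustration, a minimal C# sketch of one way these partial-expression queries could be represented as data; the class names and fields are hypothetical, not the speaker's implementation:)

    using System;
    using System.Collections.Generic;

    // A query term is either a concrete expression the user supplied or a hole
    // ("?" in the talk's notation) that the tool has to fill in.
    public abstract class QueryTerm { }

    // A local variable or parameter the user already has, e.g. image : Document.
    public sealed class VariableRef : QueryTerm
    {
        public string Name;
        public Type StaticType;
    }

    // A field or property lookup appended to another term, e.g. exception.StackTrace.
    public sealed class MemberLookup : QueryTerm
    {
        public QueryTerm Receiver;
        public string MemberName;
    }

    // ".?" in the talk's notation: the term may still need a path expression
    // (more lookups or calls) appended to it.
    public sealed class OpenTerm : QueryTerm
    {
        public QueryTerm Base;
    }

    // "?{a, b, ...}": an unknown method applied to an unordered, possibly
    // incomplete set of arguments, optionally with a desired return type.
    public sealed class MethodQuery
    {
        public List<QueryTerm> Arguments = new List<QueryTerm>();
        public Type ExpectedReturnType;   // null when the user does not specify one
    }
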
Now, to generate the results, we take the query, we have two arguments, so we just simply
enumerate every method in the library that takes a document and a size or a size and a document.
Now, you'll notice this would not actually find resize document because resize document had
two other arguments. So we need to also consider there being additional arguments past the ones
that the user specified.
And, also, as I had said before, we want to be able to support the user not specifying the property
of an argument. In this example we have a string builder, an exception, and we want to -- we
have some debug log or something, and we want to append the exception to the string builder
and we want the tool to be able to recommend, oh, you meant to append the stack trace of the
exception.
And we can write this formally from the query language using rewriting rules. The first one says
that we can just add a question mark onto our partial method calls saying, oh, there's another
argument we didn't know about.
And the bottom row is talking about what we can do with the dot question mark, either just
ignore it, or we could make a method call. We call this -- this could be a static method call with
one argument or an instance method call with zero arguments. Or we could append a field
lookup. And for those two you'll notice I left the dot question mark on to say there could be
more than one.
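(A rough sketch of that enumeration using plain .NET reflection: find every method whose parameter list can absorb the supplied argument types in some order, treating the receiver as an extra first argument and leaving unmatched parameters as the extra arguments the user did not specify. This only approximates the search described here -- it ignores the path-expression rewrites -- and the matching is deliberately simple:)

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Reflection;

    public static class CandidateSearch
    {
        // Returns methods from the given assemblies that could accept all of the
        // supplied argument types in some order, instance methods being treated
        // as taking their receiver as argument zero.
        public static IEnumerable<MethodInfo> Find(Type[] argTypes,
                                                   IEnumerable<Assembly> assemblies)
        {
            foreach (var method in assemblies
                         .SelectMany(a => a.GetExportedTypes())
                         .SelectMany(t => t.GetMethods(BindingFlags.Public |
                                                       BindingFlags.Instance |
                                                       BindingFlags.Static)))
            {
                var slots = method.GetParameters()
                                  .Select(p => p.ParameterType)
                                  .ToList();
                if (!method.IsStatic)
                    slots.Insert(0, method.DeclaringType);   // receiver as argument 0

                if (CanAssignAll(argTypes, slots))
                    yield return method;
            }
        }

        // Greedy matching: every supplied argument must fit some distinct parameter
        // slot; leftover slots are the "arguments the user didn't think of".
        private static bool CanAssignAll(Type[] args, List<Type> slots)
        {
            var remaining = new List<Type>(slots);
            foreach (var arg in args)
            {
                var slot = remaining.FirstOrDefault(s => s.IsAssignableFrom(arg));
                if (slot == null) return false;
                remaining.Remove(slot);
            }
            return true;
        }
    }
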
Now, this is going to end up with a lot of results, especially once I add on that rule saying that
we can do this multiple times. In fact, a little work will show this actually gets you an infinite
number of results, which would take an awful lot of time to read through.
So you could simplify this just by putting a hard cutoff on applying these rules -- one, two, three times, something like that. On the other hand, you would still end up with a lot of results, so we need some way to rank them.
In particular, here's an example of if you had a string and a string array, here's some results you
might get. They vary from reasonable and probably what you meant to completely crazy and very unlikely to be what you meant.
So some of these, like comparing the lengths to each other, which unless you're doing something
very strange, is not what you meant, or comparing their types which always returns false, so
certainly not what you meant. Or concatting them all together, or you could use the string as the format specifier when converting the length of the array to a string. I actually didn't read the
documentation, so I'm not even sure what that means.
Or some more reasonable things. There's a few different string operations that make sense taking a string and an array of strings: you can join the array together, you could format -- use the string as a format string, or you could split the string using the array [inaudible]. In the last case, you
would need to specify the extra details of how you want the split to work.
So in order to -- sorry. In order to order these, I'm going to discuss a few different heuristics for
doing the sorting. As I was reading these previously, I mentioned some semantic reasons why they don't
make any sense. I can't expect my algorithm to understand the semantics of these methods, so
instead I'm going to rely on heuristics that seem to work well in practice.
The first heuristic I'm going to discuss is type distance. Since the concept of my search is that
I'm searching for methods that are -- that match the types of the arguments I'm providing, I'm
going to want to prefer methods that take the exact type I've provided. So if I provide a
rectangle, methods that take a rectangle are probably more likely to have to do with what I'm
doing than methods that take a shape, which in turn are much more likely to have to do with what I'm doing
than methods that take an object.
So I'm going to write this metric of how far up the type DAG I have to go and use that to prefer -- prefer methods that take types lower down on this DAG.
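(One possible way to compute such a metric in C#, counting steps up the inheritance chain and treating an implemented interface as a single step; this is a sketch of the idea only, not the exact metric used in the work:)

    using System;

    public static class TypeDistance
    {
        // Distance from the argument's exact type up the type DAG to the
        // parameter's declared type: 0 for an exact match, larger values for
        // base classes, int.MaxValue if the argument cannot be used at all.
        public static int Distance(Type argumentType, Type parameterType)
        {
            if (!parameterType.IsAssignableFrom(argumentType))
                return int.MaxValue;
            if (argumentType == parameterType)
                return 0;
            if (parameterType.IsInterface)
                return 1;   // one conceptual step in this sketch

            int steps = 0;
            for (var t = argumentType; t != null; t = t.BaseType)
            {
                if (t == parameterType) return steps;
                steps++;
            }
            return steps;
        }
    }

So an exact match scores 0, a base class scores higher, and System.Object typically scores highest, which is what lets the ranking prefer methods that take the more specific types.
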
Next, I had this concept of path expressions starting from the arguments the user provided, which
certainly can be useful, as in the example I had before where I wanted the stack trace of some exception. But as you put more and more of them on, you tend to get less related to the original argument. So I'm going to want to prefer results that have fewer of them. In this example, dot get type dot to string has very little to do with the original array of strings. It's unlikely to be what
the user meant.
The last heuristic I'm going to discuss is working out subtypes of strings. Strings get used for a lot of different uses: paths, font family names, icon names. And it would be nice if when the user provides a font family name we don't recommend file operations.
So I have this example on the bottom where the user asks a query for something to do with a font
family name, and we want to recommend create font and methods like that.
In order to do this, in order for this to be useful, we also need this to be automatic for any
subtype of string. That is, we can't hard code what a path is, what a font family name is, what an
icon name is into the program. We need to be able to discover those automatically, and any other subtypes we might not have thought of yet.
In order to do this, I'm going to use an algorithm that's based on the Lackwit paper, which did a similar analysis for [inaudible] C. It's just a simple type inference algorithm where every method
return type and parameter that involves a string gets a type variable assigned to it. We use basic
dataflow and unification-based type inference in order to determine a set of subtypes of strings.
For example, there's these two methods, get temp file and get file name, and they both get used as the first argument to change extension. Therefore, my algorithm can know that they have the
same type.
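(A minimal sketch of the unification machinery this kind of analysis rests on: a union-find over "string positions", where observing a value flow between two positions merges their inferred string subtypes. The position keys used here are just an illustrative encoding, not the actual representation:)

    using System.Collections.Generic;

    // Positions are keys like "Path.GetTempFileName:return" or
    // "Path.ChangeExtension:arg0". Two positions end up in the same set when
    // dataflow shows a value moving from one to the other; each resulting set
    // is an inferred subtype of string.
    public sealed class StringTypeUnifier
    {
        private readonly Dictionary<string, string> parent =
            new Dictionary<string, string>();

        private string Find(string x)
        {
            if (!parent.ContainsKey(x)) parent[x] = x;
            if (parent[x] != x) parent[x] = Find(parent[x]);   // path compression
            return parent[x];
        }

        // Record that a value flows from one string position to another, e.g.
        // GetTempFileName's return value used as ChangeExtension's first argument.
        public void RecordFlow(string from, string to)
        {
            var a = Find(from);
            var b = Find(to);
            if (a != b) parent[a] = b;
        }

        public bool SameInferredType(string x, string y) => Find(x) == Find(y);
    }
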
The problems with this method tend to derive from accidentally merging subtypes that are not
actually the same, which means that we might present file methods when presented with a palette
name, but we still can avoid showing, say, font family or icon name related methods. So it's not
so bad.
Also I'd like to note that this takes some time but can be done in the background, so it doesn't
need to be executed on every query.
So I don't expect you to read this slide. This is just showing that we have some score that these
metrics are combined into and we compute this score for each result on the result set and then
sort the results, and that's the ranking mechanism.
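(In the same spirit, a hedged sketch of how the individual heuristics might be folded into one score and sorted; the fields and weights here are invented for illustration, since the actual formula on the slide is not reproduced in the transcript:)

    using System.Collections.Generic;
    using System.Linq;

    public sealed class CandidateResult
    {
        public string Rendering;          // the completed expression shown to the user
        public int TotalTypeDistance;     // summed type distance over all arguments
        public int PathExpressionSteps;   // number of appended field/method hops
        public bool CrossesStringSubtype; // a string argument used against a different inferred subtype
    }

    public static class Ranking
    {
        // Lower scores rank higher; the weights below are made up for illustration.
        public static double Score(CandidateResult r) =>
            1.0 * r.TotalTypeDistance +
            2.0 * r.PathExpressionSteps +
            (r.CrossesStringSubtype ? 5.0 : 0.0);

        public static List<CandidateResult> RankAll(IEnumerable<CandidateResult> results) =>
            results.OrderBy(Score)
                   .ThenBy(r => r.Rendering)   // ties fall back to alphabetical order
                   .ToList();
    }

The alphabetical tie-break here matches the clumps-of-ties behavior mentioned later in the Q&A.
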
Now, I'm going to talk about the experiment that I've performed. I have not yet implemented a
UI for this. So the experiment is completely quantitative: results from simulating using this algorithm on existing code.
So I use -- did a binary analysis using the Phoenix compiler framework to read through several
mature C# projects. For each call with at least two arguments, it took the call and produced a set
of queries from it.
For example, for this call to resize document, it would take each argument and make a query
using each argument and then make a set of queries using each pair of arguments and a set of
queries using each triplet of arguments, and it would evaluate how well each of these queries did
at guessing the call, which is in this case resize document.
The evaluation would be we have some ranked result list and we want to know what position
number resize document shows up at.
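(A simplified sketch of that evaluation loop, with the ranked search abstracted behind a delegate; the Phoenix-based binary analysis that extracts the real calls and argument types is not shown, and the helper names are made up:)

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Reflection;

    public static class Evaluation
    {
        // For one observed call, build queries from every subset of up to three of
        // its argument types and record the best (lowest) rank at which the actual
        // method shows up in the ranked results; int.MaxValue means never found.
        public static int BestRank(MethodInfo actualCall,
                                   Type[] argTypes,
                                   Func<Type[], IReadOnlyList<MethodInfo>> rankedSearch)
        {
            int best = int.MaxValue;
            foreach (var subset in Subsets(argTypes).Where(s => s.Length <= 3))
            {
                var results = rankedSearch(subset);
                int index = results.ToList().IndexOf(actualCall);   // -1 if absent
                if (index >= 0) best = Math.Min(best, index + 1);   // 1-based rank
            }
            return best;
        }

        // All non-empty subsets of the argument types.
        private static IEnumerable<Type[]> Subsets(Type[] items)
        {
            for (int mask = 1; mask < (1 << items.Length); mask++)
                yield return Enumerable.Range(0, items.Length)
                                       .Where(i => (mask & (1 << i)) != 0)
                                       .Select(i => items[i])
                                       .ToArray();
        }
    }
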
These are the projects that I used. They're all mature C# projects. It's a combination of both user facing and library code as well as a combination of closed source and open source projects.
So here's the evaluation. On this graph the rank along the bottom is what position number the
correct result showed up in. That is, the actual call we found in the code. And the rank -- so
rank 1 means that using one of those queries -- that is knowing just the types of up to three of the
arguments -- the actual call in the code -- the actual method call in the code showed up at the top
of the list. That happened about 35 percent of the time.
Moving across here up to 20, the correct result showed up in the top 20 about 80 percent of the time. In other words, we can predict a method call from its arguments up to a set -- into a set of about 20 practical method calls about 80 percent of the time.
I'm going to overlay on this graph a breakdown of separating instance calls only and static calls
only. At the top we can see that instance calls do slightly better and it does somewhat notably
worse on static calls. I'm not entirely sure why this is, but I suspect that it's because searching through all of the static methods is simply a larger search space.
You have a question?
>>: This is an average over all of the different possible patterns you generated off the
[inaudible]?
>> Daniel Perelman: No, I'm taking the best.
>>: [inaudible]
>> Daniel Perelman: I'm taking whichever one of those did the best, which is admittedly
somewhat unfair.
>>: Did you find that certain patterns always did [inaudible] best or a small number of patterns
were the best?
>> Daniel Perelman: Well, in general we're going to expect that the pattern with the most -- that
takes the most arguments is going to be the best, but I'm going to get into that more specifically
in a moment.
>>: It's a great question, but if some method takes a Foo and two ints, you want your query to involve the Foo and not the two ints.
>>: Yeah, yeah, yeah.
>>: But for this experiment you tried all subsets including --
>> Daniel Perelman: The concept of this is that most likely one or two arguments the user is
going to think of when thinking of what method they want to call are going to be ones that are
going to be successful at finding the method. Unfortunately I can't automatically figure out
which arguments the user would think of.
But on the next slide I'm going to focus in on the results at just this 20 line where -- so first this is just a graph showing the calls that I -- how many calls I looked at of each number of arguments, noting that I'm combining the receiver into the argument set. So this 3 here means -- is the number of static method calls that took three arguments plus the number of instance method calls that took a receiver and two other arguments.
So now here this is looking at only queries that had one argument. So the type of one of the
arguments was able to predict a three-argument method call 54 percent of the time. So that's
requiring only one of the three arguments. If I then add in queries that have a second argument -- that is, two of the types of the arguments -- I'm able to get up to getting the correct answer
[inaudible] within the top 20 86 percent of the time.
Now, if I throw in the third argument, so for our three-argument calls looking at the types of all
of the arguments, I get practically no difference. So this is sort of supporting what I was saying
before that few of the arguments are generally enough to decide which call we want to make.
And that I wasn't necessarily gaining all that much by throwing in all three arguments.
Now -- yes?
>>: Can you go back one slide. What does the inverse of this look like in the way that no matter
how many arguments you had [inaudible]?
>> Daniel Perelman: So I don't go above three arguments because when I added four arguments
to the graph it wasn't visible. And obviously once I have all three arguments, that's the best I can
do. So there's this space that didn't end up in the top 20 at all, which is the remaining 12
percent. I'm not sure what more I can say about that. Could you clarify your question?
>>: If you look at top 100 [inaudible]?
>> Daniel Perelman: Ah. Yeah. Although, if I go back to this slide, you'll see that after 20 it
was notably leveling off. So it's going to be in there eventually since obviously if you look in the top 10,000, they're all going to be in there somewhere. But you're not getting significant gains going all that -- going much further, which is more or less why I cut off the graph here.
Okay. So this doesn't necessarily tell you much about how useful the tool would be since I
started this talk saying I'm going to make Intellisense better, but I haven't yet given any argument that this is actually better than Intellisense.
Naturally any comparison I make to Intellisense has to be an apples-to-oranges comparison
because they take different inputs.
In particular, especially for static methods, Intellisense has to be told what the receiver is. This is
a significant bonus on static methods, but it's still a bonus on instance methods if I'm specifying
multiple arguments in my query and my algorithm has no information on which one is going to
be the receiver.
On the other hand, obviously knowing multiple arguments to the method is useful in filtering out
which method you want to call. So Intellisense doesn't use that information.
So I'm going to present the comparison of the simulation of typing in the receiver, hitting dot, and looking at how many entries down in the Intellisense list you have to go, and taking the difference of that from how far down in the list my algorithm gives you have to go. Yeah?
>>: Well, then you give preferential treatment to the receiver. I mean, it seems like if the
programmer is not sure how to use the API, if they have a receiver, that's something that they
already have in their program. So they already kind of have a handle on it. Seems like you should treat that as special and not treat it so equally with all the other [inaudible].
>> Daniel Perelman: Oh. You're saying that I should allow the algorithm. Basically, it seemed
to work pretty well when I didn't give preferential treatment to the receiver, so I didn't see any
reason to change it. It might make sense in an actual implementation to say, yes, whichever one the [inaudible] is probably the receiver and try to put its methods first.
Is there another question?
>>: So the list -- the Intellisense list is alphabetical?
>> Daniel Perelman: Yes.
>>: Okay. So have you considered the possibility of just reorganizing it by frequency of use,
which would make it, you know, [inaudible]?
>> Daniel Perelman: No, I have not. There actually has been previous work that did that, but I
haven't compared against that.
>>: It's also very interesting for API discovery, do you want the frequently used ones or the
infrequently used ones?
>>: Most don't want equal.
>> Daniel Perelman: I actually had someone tell me previously that I should try taking all the
methods frequently used and not presenting them, because obviously the programmer already
knows them. I'm not sure which is actually the right choice.
So to give an intuition for how this is going to be compared to -- sorry?
>>: [inaudible] just a comment, in a way Intellisense is there so that we have to type less versus
[inaudible] you actually trying to discover something. So it's actually solving a completely
different problem.
>> Daniel Perelman: Yes. So sort of the way I'm looking at this is Intellisense gets used for API
discovery, but it wasn't really designed for it. So, yes, we're solving a different problem.
So to give some intuition on how this is going to compare to Intellisense, our algorithm is going
to do notably better on instances where the arguments have interesting types. In particular here
I'm saying that the argument here is an error type enum. If you have an enum, there's
probably going to be relatively few methods in your library that deal with that specific enum, so
that's going to help our algorithm a lot, but give no benefit to Intellisense.
The two are going to be about the same when you're dealing with, for instance, methods of
system.object because there just aren't that many things to deal with an object, and neither
algorithm is going to have many options, so the difference can't be that big.
My algorithm is going to do significantly worse, first, going back to that, if the user does want to call dot equals but has two enums, then my algorithm is probably going to do poorly. Or if they just have two of some richer type than object, it's probably going to do poorly because it's going to be
preferring methods that take more precise types and the user is actually calling a method that
takes object.
So of course since we're in alphabetical order, Intellisense does significantly better finding
methods whose names start with the letter A.
So here's the comparison. The difference here, here the minus 50 means that it showed up on my
algorithm 50 spaces higher than on Intellisense. So that happens about 5 percent of the time.
We go over to 20. We get 20 spaces higher about 20 percent of the time. And at the other side,
Intellisense does 20 spaces higher about 15 percent of the time.
And there's this big middle space where they really do about the same, plus or minus 10, not a
huge difference.
Yes?
>>: [inaudible] using ranking as a proxy for the [inaudible] involved in looking at that list and
figuring out which one's the right answer?
>> Daniel Perelman: Yes. Which is notably nonlinear, but yes.
>>: For the alphabetical one, if you kind of -- you can scan the list faster when it's alphabetical
[inaudible]?
>> Daniel Perelman: A reasonable argument.
>>: Because in yours it's almost random --
>> Daniel Perelman: Yes.
>>: Which means that the reader --
>> Daniel Perelman: You're going to have to read through every single one.
>>: Actually slow you down significantly just to read.
>> Daniel Perelman: Yes.
>>: If it's not -- what you're looking for is not in that top 10.
>>: But you can also take the top-k and organize by assembly or by -- you know, by package. I
mean, you're right. It would be additional work [inaudible] type for some k [inaudible].
>> Daniel Perelman: I should note that my heuristic ranking algorithm actually tends to get
clumps of ties which end up being alphabetically ordered. But still, you're right. It's going to end up looking significantly more random and therefore take more time to read through than the equal-size [inaudible] Intellisense.
So on top of this, we could imagine including in our query the return type we want, in which
case that would significantly reduce the options in our list. But, once again, Intellisense doesn't
support this. So this line jumps up about 10 percentage points.
So now we can get about 30 spaces higher about 25 -- about, sorry, 15, 20 percent of the time or
20 spaces higher more like 20, 25 percent of the time. And, on the other hand, now we end up
only 10 or 20 spaces worse than Intellisense only around 5 percent of the time.
Also notice here that this is -- that there's another 5 percent jump when we're talking about
static methods, once again because my algorithm has to consider a larger search space.
Yes?
>>: So I guess it really doesn't increase [inaudible] there is a model, you have to have some kind of user model of what they would recognize as the correct answer. So they might know the name, for example, they might recognize the name, or they might not have any clue what the name is. So what's your model?
>> Daniel Perelman: So this is just telling you what position is going down and up in the list.
So basically I'm asserting that the user does not know the name but will recognize it. This, once
again, might not be a reasonable model. Yes?
>>: So this alphabetical ordering, this bothers me. It's almost like completely unnecessary that
in a way the Intellisense list, right, sure, it's alphabetical, whenever I hit Intellisense, I start
somewhere in the middle of this list based on the last time I used it, whatever, so you think it
should always say the distance is half the length of the list, right, any Intellisense. This whole
alphabetical thing seems to be just -- I don't know, just puts some random noise in this thing. It
doesn't seem relevant, right? I have to search for my Intellisense list, I will basically on average
have to search [inaudible], right?
>> Daniel Perelman: Well --
[multiple people speaking at once]
>>: It goes back to the question: Does the person know what alphabetically they're looking for?
>>: No, he's trying to -- I mean, he's right that you don't know the name, but you have to browse
through the list [inaudible] one by one. So the assumption is that it will be half the list that you
have [inaudible].
>>: In Intellisense you're reading one word on every line. This one you're reading quite a long
expression, right?
>>: No, Intellisense also gives you a tool tip and you want to [inaudible].
>>: [inaudible] tool tip.
>>: Yeah. You need to read all the arguments because [inaudible] all the parameters, because
the assumption here we use the method, the parameters to parse [inaudible].
>>: [inaudible]
>> Daniel Perelman: This will only show one [inaudible]. Yes?
>>: What is the -- is there an assumption that you're making that if the user -- while the user
doesn't know the method they're calling or the sequence of events they want to call, they do
know the types? Like why assume nothing, know anything semantically about what they're
trying to do?
[laughter]
>> Daniel Perelman: Well, if the programmer has no idea what they want to do, I can't help
them.
>>: Like what if they were [inaudible]?
[multiple people speaking at once]
>> Daniel Perelman: So if they wrote size but they meant rectangle, then for one thing there's
likely to be a conversion from size to rectangle, in which case my algorithm can say, oh, maybe
you want to convert to rectangle and here's these methods that actually take rectangle that get
ranked a bit lower. So that could be helpful.
And I'm saying that at some point in writing their code the programmer probably has some objects around that they want to work on for the next step. So that's sort of the base
assumption. Yes?
>>: Now, this is based on analyzing calls and things that are in their program, right? Other
information there, or do you just analyze all the references to symbols?
>> Daniel Perelman: I'm sorry, I'm not sure I understand the question.
>>: When you pull an [inaudible] do you analyze all these sort of methods that are hanging
around to figure out which one is relevant?
>> Daniel Perelman: Yeah. So I look at each call in the program individually. I look at that call and apply the heuristics, and the heuristics are completely local.
>>: So if they are starting out with a blank project, they have nothing.
>> Daniel Perelman: If they're starting out with a blank project, that makes essentially no
difference to this algorithm.
>>: No, no, but what's a [inaudible]? What are you searching through?
[multiple people speaking at once]
>> Daniel Perelman: Okay. Okay. So yes. The libraries it's searching through are whatever libraries are being referenced by the project.
>>: What if the user didn't know they needed a reference library?
>>: I just mean if we're in the scenario of the user discovering an API, they're going to be
starting out theoretically with this new API with a blank project that has nothing but a void Main method in it and there are no calls in their program from which to figure out things that are useful and --
>> Daniel Perelman: This isn't reading other calls in the program. But it does need to have an
input of which libraries to search through, which is currently based off just whichever libraries
are referenced by the binary I was analyzing.
>>: So it can only analyze patterns that are used in the library itself of accomplishing objectives.
>> Daniel Perelman: It is not looking at existing code to decide which library to use. It's
purely -- it is only reading through libraries to find out what method signatures exist, and then it
is ranking which method signatures.
>>: Only need metadata.
>>: You only need metadata.
>>: You only need metadata.
>>: Well, there's a query.
>>: And a query.
>>: [inaudible] I think he's just misleading, because to do his experiments he was using existing
code to start out with a different call [inaudible].
>>: [inaudible] this self typing [inaudible].
[multiple people speaking at once]
>> Daniel Perelman: It's not local, but you could most likely get most of the information by
reading through the binary of the library without having any code written by the user.
>>: So you have to analyze the implementation of a library.
>> Daniel Perelman: In order to get that information, yeah. Although that could be done ahead
of time.
>>: If you need metadata for things, I mean, if you point at the Microsoft Symbol Server, you'll
get a lot of libraries that exist and maybe your search space is really, really large, but at least
you'll then not miss a library [inaudible].
>>: [inaudible] I didn't find anything in the referenced libraries. Do you want me to continue?
>>: Yeah, but it's an interesting -- yeah, it's an interesting question however the things scale to
really huge numbers of [inaudible].
>> Daniel Perelman: Yes. I can't really comment on that, because that would depend on the
mechanism you had for reading that metadata. I suspect that it wouldn't have too much
difficulty. Just needs to be able to know the type hierarchy and know what method signatures
are available.
>>: Okay.
>> Daniel Perelman: Other questions?
>>: So one other question. So this seems to have a potential to really work well [inaudible] if
you have a good type system. So what about -- do you have any experience or results about why
there is only [inaudible] like integer, because there can be methods that always get integers,
creating character, whatever, right? And if you don't give me specific types, then this is not
going to be able to search properly.
>> Daniel Perelman: Yes. This is going to be less useful with those types. As I had been talking about doing the subtyping thing for strings, you could certainly imagine trying to do that on other primitive types. But, yeah, it's not really the problem this is directed at solving, and I don't necessarily have a good idea of how to address that in a similar manner.
>>: How does this work in a world where people are using a lot of type inference, where the
kind of receiver -- not the receiver type, but if I have a variable that I'm assigning into, I don't
really know what the type is until I get the method that then returns that thing.
>> Daniel Perelman: That's why I showed this graph first, because I expect in the real world you
don't know what type you want. So this graph is more realistic than this one. This one looks
nice, but this probably represents the real world better, because you're usually going to be using
the auto keyword or whatever and not know what type you actually want.
>>: I think this brings it back to the other question I have of what are you trying to do? Are you
trying to save typing? Are you trying to actually make the user discover stuff? Because, again,
that's somewhere Intellisense seems to be the wrong thing here. For example, if I [inaudible]
when I'm stuck, it's because I don't know how to get a text writer [inaudible].
>>: No, no, were you here when he said it's an apples-and-oranges comparison?
>>: Yeah [inaudible]
[multiple people speaking at once]
>> Daniel Perelman: I'm comparing to Intellisense because as far as I know Intellisense is -- reading through the Intellisense list is how programmers tend to find methods. So I'm not saying that it's necessarily very good for that, so it's not necessarily that difficult to beat, but --
>>: I guess what I'm trying to say, like my argument is that the next graph is actually relevant because often I know what I want but I don't know how to get it, so I need the text writer but I just can't [inaudible] text writer is abstract, just like this complicated thing I have to go through to get a text writer. So this would help, right? Because I can say here's the string and a text writer --
>>: As long as you're willing to write text writer.
[multiple people speaking at once]
>>: Just making the argument that comparing yourself to Intellisense alone is not really showing
what it can do.
>> Daniel Perelman: Yes. I compared -- I talked about the Jungloids before which is addressing
the specific problem is I have one type, I want to get this other type, there's some random path in
the middle that's completely undiscoverable. So that I would think would have to be -- is
definitely something that belongs in the IDE and is definitely part of an implementation of this.
So my last graph is running time. This is the time -- this is the cumulative time to execute one
query. This is saying that 95 percent of the time I can finish -- I can get these ranked results in
method query is under one second.
I'd like to note once again this is not including the time to do the type inference algorithm which can be run in the background ahead of time for --
>>: What was the size of the background library you searched to give us one second?
>> Daniel Perelman: I don't know exactly. It was reading several dependent libraries whose binary sizes probably totaled up to a couple megabytes. I don't know. But including the .NET core library and a few other libraries.
>>: Now, when you say .NET [inaudible] library, what is that?
>> Daniel Perelman: Whichever assemblies were referenced by the code that it was analyzing.
So not all of them, but whichever ones were actually referenced. So system.core.DLL and probably a couple other .DLLs that were around. These were usually on the order of around a dozen external .DLLs, but I don't have --
>>: So these were mature projects --
>> Daniel Perelman: Yes.
>>: So we can expect that that's something -- somewhat reasonable.
>> Daniel Perelman: Yes. Unfortunately that's not necessarily comparable to what's going to be
visible from the Visual Studio context. This is what was visible from the binary context, which
is hopefully comparable, but not obviously comparable.
>>: [inaudible] is it the ranking methods this time or is it the -- I assume you already have all
this stuff cached, right, so you're just kind of [inaudible].
>> Daniel Perelman: Yeah. And ranking each result takes time. Just running the heuristics on
each one and looking up all the data in the data structures. And of course this information isn't
necessarily stored in the most efficient way. It has the whole heavy weight of Phoenix loaded in
the background.
>>: So should you take this as an upper bound?
>> Daniel Perelman: Yes. I expect an actual implementation to be able to run faster.
>>: Well, an upper bound of one second for each query?
>> Daniel Perelman: Yes. Which I sort of drew the line at one second, because I feel like that's
a reasonable time to expect a user to wait for their dialog window to pop up, and much longer than that they're going to get annoyed at the UI being too slow.
>>: That's faster than going out to Bing. I mean, if you're comparing somewhere between
Intellisense and somewhere between searching online, it's still one second [inaudible].
>> Daniel Perelman: Waiting a second or two is --
>>: [inaudible] might argue that no, no, you got to get it down to a quarter second, which we
probably could, but, you know, we're talking along the order of human interactive speeds. So
you care about things somewhere between a tenth of a second and a second and a half. That's
where you have to play.
>>: And because a person is exploring this API and doesn't exactly know, they're already in the
sort of thinking moment. It's not like they're in the process of typing really fast, they don't know
[inaudible].
>>: [inaudible]
>> Daniel Perelman: Sorry.
>>: Have you looked at incorporating like chunky frameworks, like WPF or Silverlight?
>> Daniel Perelman: I have not looked at those and I am not familiar with those APIs, so I'm not
sure how that would relate to this.
>>: Just thinking that a typical developer is probably working inside a framework like that, and
so, you know, which is why Intellisense is really useful because they're big and there are many
different policies.
>> Daniel Perelman: I expect that for a GUI framework this should be useful, because GUI is
definitely an area where you have relatively specific types.
>>: Are you saying, then, you should prefer -- if you're working within a framework, then those
should get higher ranking?
>>: Well, that's a possibility. I think we're -- the question I have is about scale, if you have
something like that and you throw that into the mix, does it make it much slower.
>>: It's just noise. And, you know, you said you can't analyze based on what was available from the binary. I think the C# compiler strips out unused references. But if someone has a
project, they may really have references to WPF and Silverlight and Windows and a very large
number of libraries.
>> Daniel Perelman: So what my analysis is doing is if there's any reference to a library, it has
to load the DLL for that library and read the metadata from that. So hopefully it will -- so
hopefully it's actually getting all of the methods in that library, not just the ones that were
actually called from the project.
>>: But I think there's -- I mean, there is a very solid point [inaudible] it's a future work here,
which is, you know, if you put this in a development environment, you probably have a couple
orders of magnitude more sort of methods nearby and noise and all this sort of stuff and does the
performance scale, does the ranking still work. These are all the questions you'd want to answer.
>> Daniel Perelman: The ranking algorithm should be, well, essentially linear in the number of
methods you have to look at. Because you have to look at each method, decide if you can assign the arguments to the method's arguments, and if you can, assign a ranking score. Of course then you have a log n factor of actually doing the sorting. But --
>>: You can create a fancy new thing. You can make this look so much -- you can make it look
highly [inaudible].
[multiple people speaking at once]
>>: So you haven't gone there at all.
>> Daniel Perelman: No. I mean, there's obviously ways to improve on that.
>>: I mean, you could probably take all the DLLs in the .NET universe and make it pretty efficient
on that.
>> Daniel Perelman: Yeah.
>>: Right? I mean, you would have to follow on types, so people typing [inaudible] you can
probably do [inaudible].
>>: Think of Web search engines.
[multiple people speaking at once]
>>: [inaudible] to do the ranking, not to find the hits. So the problem Bing or Google have now is they have to rank things right, not to find --
>> Daniel Perelman: Yeah. I was mostly worried about Phoenix not eating up all of my
memory including the DLLs. So I wasn't going to add to my memory problems by creating a
fancy index.
>>: [inaudible] the reference as simple, I guess.
>> Daniel Perelman: So my future work here is to consider other types of partial expressions.
I've just really been talking about partial expressions where I have a method call and arguments
and perhaps some of the arguments have that dot question mark on it saying that it's not exact,
and I want some path expression starting there.
Next, I'd like to determine what would be a good user interface for this. I had some ideas scattered
throughout the talk, but I'm not sure they're exactly what makes sense. And then to actually
develop a Visual Studio plug-in for this, and once I have that, to perform a user study and find
out how useful this actually is so I can have some more concrete data of how good an API
discovery tool this is.
So conclusion. Type-based extensions to Intellisense can likely be useful. And this partial
expressions language is sort of the key of how to talk about it formally.
>> Tom Ball: Thank you very much.
[applause]
>> Tom Ball: Wow. Just a little under an hour. Any more questions?
>>: I recommend that you also include common type conversions as sort of pseudo dot question
mark operators.
>> Daniel Perelman: Yeah. My sort of --
>>: [inaudible] would create too much noise.
>> Daniel Perelman: I mean, yes, my concept for the dot question mark basically is to generally just plug in the entire Jungloid result there and do any sort of conversions from one type to another; that was the concept.
So, yes, if there's common type conversions that are not actually [inaudible] as an instance
method, then that would make sense to put them there. Jungloids actually works by reading
through all the existing code and trying to find type conversions that way. Yes?
>>: So I'd suggest I think you're going to get the most bang for your buck by designing a
number of different UIs, for instance, around what the scenarios are that you think your users are
going to be using this for. One thing about the Intellisense scenario is that if you type and you
hit dot, you know, you're not going to wait ten seconds for something to happen; you're going to
think the computer crashed. Whereas if you have a search box on a Web page, you're willing
to wait a couple seconds because you know it's the Web.
So a particular UI can give the user hints about the expectations of how long it will take, which
means you don't have to speed things up unless you have a UI that convinces the user it's
supposed to be faster. So by exploring different UIs, you can totally know the user as to the real
performance of your system.
>>: Yeah, I think -- I think you're absolutely right. Another thing that's sort of totally
orthogonal to that is the quantitative data so far looked at like every method in some file. But in
practice, you know, we need a better sense of which of those methods would actually be queried
about, right, to make sure it's useful. And that's just completely 90 degrees off from how you do
the query.
>> Daniel Perelman: That's definitely a problem, that my results are looking at all of the calls. I have no reason to believe that any particular segment of the calls are ones the programmer would need to do an API search for. Sorry. Yes?
>>: So I think another [inaudible] that might be like the question that I was asking about library
always has integers [inaudible] I think this can be complemented by like X similar to searching
like [inaudible] even the -- I think there was a question about can we use context, for example
[inaudible] because if somebody's putting a dot, there's something before that. So the name that
they gave to it might be useful to filter out some results and stuff like that.
>> Daniel Perelman: Yes. I agree that could be useful. There's sort of the synonyms problem of
the programmer is going to use one name; the API developer might use a different name. There
has been some prior work in trying to like create thesauruses of synonyms.
And I was basically ignoring this because I can't do anything about that automatically. But it's
definitely worth considering for an actual tool.
>>: Just for clarity, I'd consider kind of playing around with the title a little bit. You say that your goal is not necessarily the [inaudible] Intellisense exceeds [inaudible] right now but the API discovery scenario.
>> Daniel Perelman: Yes.
>>: And the title, by saying type-based Intellisense, seems to sort of imply that this is a wholesale change to Intellisense, whereas this is really a supplemental feature that's focused on one scenario. It -- it -- just to sort of message correctly what you're going after, it'd be -- I think it'd be better to say that as opposed to type-based Intellisense, this is more an alternative to Intellisense or similar to it.
>> Daniel Perelman: Yeah. Thank you.
>>: [inaudible] Intellisense is very -- when you think of Intellisense, it's very [inaudible] API
discovery, that the idea -- the query expression for this API discovery is interesting, interesting in
concept, but don't tie yourself necessarily to Intellisense. Think about like object browser and
navigate to and all the ways you really want to do API discovery [inaudible] natural flow of
Intellisense [inaudible].
>> Daniel Perelman: Okay.
>>: But don't necessarily be too tied to the UI, because the piece that's really interesting is the
API discovery piece.
>>: He can always use [inaudible].
>> Daniel Perelman: I guess I was sort of thinking this could be presented to the user as they type, in the same sense as Intellisense, which is sort of why I was thinking that. But, I agree, it's --
>>: Context is important.
>> Daniel Perelman: Yeah. I agree it's significantly different than Intellisense.
>>: I think Intellisense does -- it does have an API discovery piece, so that's interesting how it
plays into it, but there are some things that you talk about that, you know -- in terms of how long
does it take for a search or query. People will pay for it in a different form. So that's the
interesting piece [inaudible].
>>: [inaudible] couple seconds [inaudible] keystroke and bring up another window to do that.
>>: And Tom's point [inaudible].
[multiple people speaking at once]
>>: [inaudible] then we've got the cursor, we know --
>> Daniel Perelman: And -- okay. And also the prior work of Jungloid [inaudible] had built
upon the -- well, code completion [inaudible].
>>: Don't tie yourself -- at this point don't tie yourself to a specific kind of UI. Play around with
a lot of them. But, for instance, in [inaudible] now our code snippet experience, you know, you
can type a question mark and you can hit tab and that sort of brings up an inline sort of somewhat
search list or a completion list. It's not that you can't do it inline and you have to do a dialogue
box or something else. Just be careful about getting into that workflow of when the user gets
[inaudible]. There are other alternatives that you can sort of [inaudible].
>> Daniel Perelman: Okay. Thank you. Were there other people with hands up?
>> Tom Ball: Great. Thanks again.
[applause]