Document 17836936


>> Judith Bishop: Great. So we're nearly at the beginning of our talk, of our first talk. And I just want to give you a heads up. I hope all of you have got the program in front of you, but the twin goals of this workshop, just to focus on them here, are to expose the state of the platform and data, that is, the platform and data of Code Hunt and its underlying software, Pex, which you're going to hear about in the very first talk, and to collectively decide on development work. So, really, Microsoft Research is not going to be able to take this project further single-handedly, or even as a team, without help.

We need the help of the academic community in two ways. First, your expertise in teaching: what is needed, what is valuable, and what you think should be prioritized in this kind of platform, because we're one step removed from what goes on in the teaching community. As you know, and as you will hear today, we've been concentrating on contests, which are not really a teaching activity. Teaching is something we would like to move into, but we need your help with that. And second, of course, how you can get involved in coding on the platform, and we'll be talking about that as well.

So, topics of discussion for this morning's program: Nikolai here is going to give us a deep dive into Pex, and then we're going to have a contest experience, so you're all going to actually get onto it, which will be fun even for the students. That doesn't mean you're going to solve puzzles. It means you're actually going to create puzzles, but don't worry, we're going to make it really, really easy for you, because we've got the puzzles and you've just got to fix them a little bit. Okay. So that happens up until lunch, and lunch will be at the back here, in a very pleasant area we've got behind. As far as practical arrangements are concerned, there is coffee and free drinks through that door, also in the same area; you can get there through that door or through the one on that side of the restaurant. And feel free to get up and move around and go out at any time.

So now, it's my great pleasure to introduce Nikolai. He knows who he is, Nikolai Tillman, the owner and instigator of this great project. Nikolai?

>> Nikolai Tillman: So as I said earlier -- which microphone? This one? As I said earlier, I recently moved to our Product Group, and a little advertisement: we are hiring. Not only is Microsoft Research hiring, but we are also hiring in the Product Group. We are looking for great interns and full-time developers, so if you're interested or you know someone who's interested, let me know. We're hiring PhDs, and we are hiring interns who are graduate or master's students. We're looking for all kinds of people. Okay.

So what I will talk about is, first of all, Pex itself, the engine behind the Code Hunt system. I will also tell you a little bit about the history, how things became the way they are, and then I will look at Pex from the point of view of a Code Hunt puzzle developer. When you create puzzles in Code Hunt, there are certain things to observe, and I will go into the details there. I want this to be very interactive, so if you have any questions, ask me immediately. I have a bit of a challenge in the sense that I suspect many of you are already very, very familiar with some aspects of Pex, symbolic execution and Code Hunt. I don't want to bore you, but I suspect there are also a few people who haven't heard the full story yet. So I want to adjust what I'm telling you depending on what you feel you need to know.

So if I am rushing over something, let me know if you want more details. Okay. So let's start with Pex itself and what it does: dynamic symbolic execution. The basic idea is that we actually run the code that we want to test or analyze, and while running the code concretely, on the side we collect constraints. When we run the code, we are typically looking at a function that takes parameters, and we have to pick particular values to run the function with; otherwise, we couldn't do a concrete execution. So for a particular input, we run the code, and on the side we monitor all the instructions the code is executing. Treating the inputs as symbolic rather than concrete, we build up a symbolic representation of the program path that is being taken; the conditions checked along that path together form the path condition. After having seen one program path, we now have an abstract representation of the equivalence class of all the inputs that would take that particular path. We remember that, building up the union of all those equivalence classes, and then we ask a constraint solver for another input that is not in any of these equivalence classes and will therefore go along a different execution path. By construction we should then see a new program behavior, a new execution path, and we keep going. There are some interesting aspects to this general scheme of exploring programs. One is that the way we choose the next input really determines how we explore the different areas of the program. We could make a very poor choice here and always explore the same loop by unfolding the same branch, or we could try to be clever and spread the search over the whole program, and that is what Pex tries to do. Pex has various search strategies built in that try to make this choice fair under different coverage criteria. One search strategy in the mix was actually written by Tao Xie when he was visiting us; it uses a fitness function and the idea of getting closer to flipping a particular branch.
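To make the exploration loop described above concrete, here is a minimal Python sketch of dynamic symbolic execution. It is a toy under stated assumptions: the program under test is a hypothetical two-input function that records each branch condition it evaluates as a Python predicate (Pex instead instruments .NET code), and the constraint solver is replaced by a brute-force search over a small integer range (Pex uses the Z3 SMT solver). The structure follows the talk: run concretely, collect the path condition, negate one branch, solve for a new input, and repeat.

```python
from itertools import product

# Hypothetical program under test. Besides computing a result, it records every
# branch it evaluates as (label, predicate over the inputs, outcome taken).
def program(x, y, trace):
    c1 = x > 10
    trace.append(("x > 10", lambda x, y: x > 10, c1))
    if c1:
        c2 = x + y == 15
        trace.append(("x + y == 15", lambda x, y: x + y == 15, c2))
        if c2:
            return "bug"
        return "big"
    return "small"

# Assumption: a tiny finite domain searched by brute force stands in for Z3.
DOMAIN = range(-20, 21)

def solve(constraints):
    """Return the first (x, y) that satisfies every (predicate, expected) pair."""
    for x, y in product(DOMAIN, DOMAIN):
        if all(pred(x, y) == expected for pred, expected in constraints):
            return x, y
    return None

def explore(max_tests=10):
    seen_paths, worklist, tests = set(), [(0, 0)], []  # start with the simplest input
    while worklist and len(tests) < max_tests:
        inputs = worklist.pop()
        trace = []
        result = program(*inputs, trace)            # concrete run, symbolic trace
        path = tuple((label, taken) for label, _, taken in trace)
        if path in seen_paths:
            continue                                # equivalence class already covered
        seen_paths.add(path)
        tests.append((inputs, result))
        # For each branch i on the path: keep the conditions before it, negate branch i.
        for i, (_, pred, taken) in enumerate(trace):
            prefix = [(p, t) for _, p, t in trace[:i]]
            candidate = solve(prefix + [(pred, not taken)])
            if candidate is not None:
                worklist.append(candidate)
    return tests

print(explore())  # prints: [((0, 0), 'small'), ((11, -20), 'big'), ((11, 4), 'bug')]
```

Each new test lands in a fresh equivalence class of inputs, so three runs cover all three paths; the `max_tests` bound plays the role of the exploration limits that keep the loop from running forever on programs with loops.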

So the other interesting aspect is that, in general, if you have any kind of interesting program, this loop won't terminate, because you can unfold loops further and further. In particular, that means this choice becomes very, very important, because you could just get stuck unfolding the same loop, and so Pex tries to be clever here, using fair search heuristics. Some of you might have seen this particular example before, if you ever saw a talk about Pex from me. So here is a particular example.
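The fitness-function idea mentioned above can be illustrated with the classic "branch distance" from search-based testing. This is a hedged, generic sketch, not the strategy Tao Xie wrote for Pex (which works inside the symbolic exploration and may differ considerably): for a condition such as `x == 1000`, the distance `|x - 1000|` measures how close an input is to flipping the branch, and a simple hill climber follows it. The function names and the neighbourhood of power-of-two steps are assumptions made for illustration.

```python
def branch_distance(x, target):
    """Branch distance for the condition `x == target`: 0 means the branch is taken."""
    return abs(x - target)

def fitness_guided_search(target, start=0, max_steps=100):
    """Hill-climb toward an input that makes `x == target` true.

    Returns (input, steps) on success, or None if the search gets stuck
    or runs out of its step budget.
    """
    x = start
    for step in range(max_steps):
        if branch_distance(x, target) == 0:
            return x, step
        # Neighbours at distances 1, 2, 4, ..., 2048 in both directions.
        neighbours = [x + sign * 2 ** k for k in range(12) for sign in (1, -1)]
        best = min(neighbours, key=lambda c: branch_distance(c, target))
        if branch_distance(best, target) >= branch_distance(x, target):
            return None  # no neighbour is closer: stuck in a local optimum
        x = best
    return None

print(fitness_guided_search(target=1000))  # prints: (1000, 3)
```

Starting from the simplest input 0, the search moves to 1024, then 1008, then 1000, so the fitness value steers it toward the flip instead of enumerating inputs blindly.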

Let's say we want to cover this code. It takes an array and does some checks. How are we going to explore it and generate test cases while we are doing it? We have to start somewhere, and what Pex does is always start with the simplest possible inputs. There are other approaches that start with random inputs; we have found that not to be very effective, and sometimes it's also not very well defined. What is a random array? So we always start with the simplest possible inputs. We run the code. In this case, it will return at the first if statement, and then we record the abstract condition under which the program returned, ignoring the concrete value, so that is: A is null. On the side, we build up an execution tree representing all the different program paths we have seen, and then what we want to do next is something we haven't seen before: pick an execution path that's new. There are two cases here and we have seen only one, so next time around we negate the condition: if we can solve the constraint A not equal null, we should get a test input that will take us along a new path. We use a pretty good constraint solver, which can deal with a representation of the heap and many other aspects that are relevant to, in this case, .NET programs. In particular, what we need here is an array that is not null, and there are many such arrays; any array would do. What Pex tries hard to do is minimize the inputs, so in this case you will get the empty array and not some randomly sized array. Then we run the code again. This time around, we get to the next if statement, and for our empty array it's not the case that the length is greater than zero. That's why the program returns, and we now have this new condition recorded. Next time around, I want to see a new path, and the only path we haven't seen yet goes here in the tree. So we build the corresponding path condition, use our constraint solver, and so on.
So in this very systematic way, after a few iterations, we actually cover all of the different branches and statements in the code. We find this nested exception in here. This is a very, very simple example, so we are done after a few iterations, and we are done because there is no loop and no recursion. This is not a terribly interesting program, but it serves well to illustrate the basic idea. Any questions? No? Yes.

>> Judith Bishop: Is there a limit on the length of the array that Pex is going to reasonably be able to handle?

>> Nikolai Tillman: There are various limits: limits on how long Pex will explore the code, how long an execution path can be, how many branches can be along a path, and how many recursions Pex will follow. And there are also some limits on the array size, because realistically speaking, if you were to allocate the maximum array in a search process, it would run out of memory; there is no way we could actually test that. I'm not sure what the hardcoded limit is. It might be around 16,000 elements. So there's a practical bound, but it's pretty big. And typically, well, let's do a little demo. I'm going to switch to the browser, and if you're interested, you can go to that address to try it out yourself. The important part is really pex4fun.com. I will start out with a few examples that are actually in Pex4Fun, and I will later switch to Code Hunt. So here is basically the program we had before. I can run it, and then we can explore it. It sends the program to the cloud (actually, somewhere in this building is a server that handles all the Pex4Fun requests), and then we get exactly the test cases that were predicted in the table before. Judith was asking about arrays.

Now, let's say I do something else here, and I need a bigger array. How much longer will that take? The answer is that it doesn't take any longer; most of the time was spent on compiling. In fact, the result is too big here for us to see all of it, but I can click on it, and on the other one here. One thing we produce is actually test code, which you could take and run in Visual Studio or any .NET system. And this code is actually very compact: we get this array, and the only position that actually matters is this one, which will hit the issue. So Pex doesn't really care how big the array is. What matters is how many constraints there were, or how many particular positions in memory have to be configured to hit a particular issue. So the sheer size of the data isn't necessarily an issue.
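Why the array size barely matters can be sketched as follows. The model format here is an assumption for illustration, not Pex's internal representation: the solver's answer only pins down the array length and the few positions the path actually read, and every other element keeps its default value, so a large array is no more expensive to describe than a small one.

```python
def materialize_array(length, constrained):
    """Build a concrete int array from a sparse solver model.

    length: the array length chosen by the solver.
    constrained: {index: value} for the only positions the path condition mentions.
    All other positions get the default value 0, as in a freshly allocated .NET int[].
    """
    arr = [0] * length
    for index, value in constrained.items():
        arr[index] = value
    return arr

# Hypothetical model for a path condition such as:
#   a != null && a.Length > 1000 && a[1000] == 42
model = {"length": 1001, "elements": {1000: 42}}
a = materialize_array(model["length"], model["elements"])
print(len(a), a[1000], sum(1 for v in a if v != 0))  # prints: 1001 42 1
```

The description of the input is one length plus one position, which matches the compact generated test code shown in the demo.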

You also see very nicely how Pex tries to minimize the input. Any questions around that? Yes.

>>: How about like a non-primitive argument?

>> Nikolai Tillman: Non-primitive arguments? In fact, there is already a non-primitive argument here: it's an array of integers. In general, if you use Pex4Fun and Code Hunt, there are limitations on what kind of parameters you can use. I will talk about that more later. Yes, any other questions?

No? Okay. So this is a fun website where you can type in any code and let Pex explore it. If you go there -- or maybe I can ask, who is already familiar with Pex4Fun? Most but not everyone. So I just want to give one little hint here. There is one button called New. If you don't want to play a game, and you just want to get test cases but don't have Visual Studio and Pex installed, then go to Pex4Fun and click on the New button. Now we are in a mode where all that Pex is going to do when you ask it is explore this one program, so that's really nice for doing some experiments, to see how the symbolic execution engine works without playing a game, which comes with some more rules and constraints.

Okay. New.

Let's go back here. So behind the scenes, the symbolic execution framework is pretty powerful. It understands pretty much all .NET instructions, and we faithfully model the behavior of a .NET program, which includes the heap: not just primitive types, but arrays, which are actually indirect references, objects in general, and aliasing constraints. There is even some support, if you use Visual Studio proper, for unsafe code, which uses pointers. And a lot of effort went into the system to make it scale to interesting programs. So the bottom line is that it's a pretty sophisticated system that can deal with pretty much any .NET program you throw at it. Even here, you might see that this is the whole program. I am defining a class. I can have more than one class; that's totally fine. I can have different methods, and they can call each other. They can be recursive. So it's not a toy. What we then do is translate the formulas, or the constraints that we gather during execution, into the theories of an SMT solver.

Who is familiar with Satisfiability Modulo Theories solvers? Most people. Okay. Originally, they were really designed as theorem provers to support static verification tools, but it turns out what they really do is find counterexamples: if they cannot find one, your theorem is correct. In the case of Pex, what we are interested in are counterexamples. We want to get particular test cases, and it turns out SMT solvers work really, really well for that. They have all the relevant decision procedures for the programming constructs that we see, with a few exceptions. We are not using a decision procedure for floating-point constraints; there is a different, search-based approach implemented in Pex. We similarly have a search-based approach for strings, so it's not complete for strings. We use Z3 as the constraint solver, which is the world-leading SMT solver from Microsoft Research. It has been winning a lot of competitions. In case somebody is very, very familiar with it: Z3 actually has a decision procedure for floating points, but it cannot be used in combination with all the other decision procedures, so we are not using it. Any questions on this aspect of Pex, the encoding and constraint solving?

>> Judith Bishop: So the implication is that, in Code Hunt and in Pex, we don't use floating point.

>> Nikolai Tillman: Right, so avoid floating points, or be very aware of what you are doing with them. Constraint solving over them is not complete, and there is also the issue of precision: if the inputs and outputs of a puzzle are floating-point values, then Pex requires identical results, and that might not be achieved, depending on exactly how the algorithm is written.
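A quick way to see the precision issue: two implementations that are equal in real arithmetic can differ in the last bit with IEEE-754 doubles, so an exact-equality comparison between teacher and player code fails. The two functions below are hypothetical; Python floats are IEEE-754 doubles, like .NET's `double`.

```python
import math

def teacher_sum(a, b, c):
    return (a + b) + c   # hypothetical secret implementation

def player_sum(a, b, c):
    return a + (b + c)   # same in real arithmetic, different rounding

t = teacher_sum(0.1, 0.2, 0.3)
p = player_sum(0.1, 0.2, 0.3)
print(t, p, t == p)        # prints: 0.6000000000000001 0.6 False
print(math.isclose(t, p))  # prints: True
```

A tolerance-based comparison would accept both, but an exact behavioral comparison treats this as a mismatch, which is why puzzles are better off avoiding floating-point inputs and outputs.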

>>: Why are strings different? I don't see anything special about strings.

>> Nikolai Tillman: What's special about strings is that strings have a length, and then you have characters in them. When Pex tries to find a solution for constraints involving strings, it guesses the length. It is not using a sound and complete decision procedure to determine the length of the string. It's a heuristic, which is actually different from what happens when you use arrays.

For strings, there are special functions such as substring, equality, deep equality, and so on, which do not exist for arrays. For arrays, Pex can directly encode all of the constraints into theories for which Z3 has decision procedures, but for strings, where you have operations like substring, concatenation and deep equality, there is no corresponding decision procedure, and in the end it all comes down to Pex guessing a string length. Then, for that particular length, we can encode the problem again, but we might not guess the right length, so it's complicated.
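The "guess a length, then solve" heuristic can be sketched like this. It is only an illustration under loud assumptions: lengths are tried in increasing order up to a bound, and for each fixed length the string is treated as an array of characters, which is then searched by brute force over a two-letter alphabet instead of being encoded for Z3 as Pex would do.

```python
from itertools import product

ALPHABET = "ab"  # assumption: tiny alphabet so brute force stays cheap

def solve_string(constraint, max_length=6):
    """Guess the length first, then solve over a fixed-size character array.

    Returns the first string satisfying `constraint`, or None when every
    guessed length up to `max_length` fails (the heuristic gives up).
    """
    for length in range(max_length + 1):                # bounded, iterative length guessing
        for chars in product(ALPHABET, repeat=length):  # fixed length: an array-like problem
            s = "".join(chars)
            if constraint(s):
                return s
    return None

# Hypothetical path condition mixing concatenation and substring operations.
cond = lambda s: (s + "a").endswith("ba") and s[:2] == "ab"
print(solve_string(cond))  # prints: ab
```

If the right length lies beyond the bound, the search gives up even though a solution exists, which is exactly the incompleteness for strings described in the talk.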

>>: You use strings in the puzzles. It's not a problem, right?

>> Nikolai Tillman: It depends. Strings actually work pretty well. In particular, there is one problem that exists only for floating points, not for strings, and that is equality and the precision of the results. A string is a string: either you get it right or it's wrong. So what remains is just the not-so-optimal decision procedure. Yes?

>>: What if a player puts a lot of floating point computations in the student's solution, to try to more like fraud, make a fraud to --

>> Nikolai Tillman: So it's very, very hard to cheat the system. I will later show you how it works in more detail and how we typically address this problem. So in general, whatever the student puts in their code will not make it easy for them to win, because there's always -- I haven't really talked about how the game works yet, so some people might be a bit confused. I don't know. So there are always two pieces of code involved, the secret teacher code and the student code. And by leveraging the secret teacher code, we can always generate test cases that the student code has to fulfill. There was, yes, William?

>>: So I'm assuming when you guess the length and you guess wrong, there's some iteration, right?

>> Nikolai Tillman: Yes. We don't just guess it once. There is an iterative search, but, again, there is a bound on how often we do that, and then we give up. We have different approaches. There's a special approach for regular expressions, with a dedicated engine for that called Rex, and then there's a different approach that we use for strings coming from operations such as concatenation, equality and substring, and there are two very nice papers that talk in detail about how that works. Okay. So Z3 is available; you can look at the source code. It's a really good solver. I have said that the symbolic execution engine in Pex is pretty awesome, so I want to substantiate that a bit. There is, again, a link, so if you are interested, now or later, here are some more examples showing how the symbolic execution engine works, and I already opened this link here. In Pex4Fun, if you go to a page that has some more content, there are embedded links you can click on to open a particular piece of code on the right, which you can then explore or edit. We already did CoverMe. Let's look at another one. Let me simplify this a bit.

So question for you, how many test cases would you expect from the engine, and what are they? Do we have a pen, yes.

>>: I'm guessing two.

>> Nikolai Tillman: You are guessing two. What are they?

>>: A equals null and A equals the empty array.

>> Nikolai Tillman: Just two?

>>: At least more than --

>>: One more.

>>: Wasn't it crashed on the empty array?

>> Nikolai Tillman: Oh, wait, okay. So you said null and empty array, two test cases. And what is I?

>>: Zero.

>> Nikolai Tillman: Always?

>>: Oh, there I don't know.

>>: Minus one index.

>> Nikolai Tillman: Minus one with what array?

>>: I, anything.

>>: No, with not empty array.

>> Nikolai Tillman: Not empty.

>>: Not null, not null array.

>> Nikolai Tillman: So then here, something like this? Okay, so this is getting pretty close. Let's first of all see what actually happens, and then we can take it apart. It turns out it's four test cases, and most of them exhibit exceptional behaviors. Just like explicit branches, Pex also understands all the implicit control-flow behaviors of instructions. That's why, first of all, there is a dereference here, and there are two cases: it can be null, and you get an exception, or not. So even if you don't make all of those branches explicit, Pex knows about these aspects of the .NET runtime system and it will go through the different cases. Then we get the next interesting input, which has an empty array and zero, and now we get the index out of range. It turns out that minus one isn't needed to produce that error. Then we get an array that has an element, execution gets further down, Pex solves that constraint, and we get an exception. So the bottom line is that all implicit branching behaviors and exceptional behaviors are monitored by Pex. Here's another interesting program. It does something which you should not do later when you write a coding duel; we will talk about that. What it does here is mutate the input. As part of symbolic execution, we track all state mutations and encode them. You get more interesting constraint systems, but it's not conceptually an issue. So how many test cases would you expect now, and what are they? Let's just [indiscernible].
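To see why a single indexing expression yields several test cases, it helps to write the implicit checks out by hand. This Python sketch is an illustration only: `checked_index` and `outcome` are hypothetical, and the two exception classes merely mimic the names of .NET's `NullReferenceException` and `IndexOutOfRangeException`. The explicit `i < 0` test is needed because Python, unlike .NET, accepts negative indices.

```python
class NullReferenceException(Exception):
    """Mimics the .NET exception of the same name."""

class IndexOutOfRangeException(Exception):
    """Mimics the .NET exception of the same name."""

def checked_index(a, i):
    """Spell out the implicit branches hidden in the .NET expression a[i]."""
    if a is None:                 # implicit branch 1: null dereference
        raise NullReferenceException()
    if i < 0 or i >= len(a):      # implicit branch 2: bounds check
        raise IndexOutOfRangeException()
    return a[i]

def outcome(a, i):
    """Run one test input and report either the value or the exception name."""
    try:
        return checked_index(a, i)
    except (NullReferenceException, IndexOutOfRangeException) as e:
        return type(e).__name__

# One input per implicit path, similar in spirit to the demo's generated tests.
for a, i in [(None, 0), ([], 0), ([7], 0)]:
    print(a, i, "->", outcome(a, i))
```

Each raise is a branch the engine can target, which is why the generated tests include a null array and an empty array with index zero even though the source code contains no explicit `if` for either.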

>> Judith Bishop: But it gets better.

>> Nikolai Tillman: Yes, we will. So in this case, we get similar behavior, only now our values are slightly different, because we can no longer influence the ultimate position, zero. So, finally: I said that the intention behind Pex and us doing dynamic symbolic execution is to generate test cases. Now, what good is a test case if you don't know whether it passes or fails, if there is no test oracle? The message here is: if you embed asserts in the code, then you have a test oracle, and Pex will actively try to break them, to find assertion violations. In this case, first of all, you see that one function can call another function, and that is not a problem. And this is literally how an assert is implemented in any programming language: there's a branch. If the condition is [indiscernible], blow up. That's why we actually get two test cases here, because Pex will try to break asserts. No magic, so please do write asserts when you want to use the system for actual testing, to get a test oracle. Does that make sense? Any questions? When we go out and advertise Pex as a test engine, we sometimes get feedback like: oh, yeah, it's great, I get a lot of test cases and high code coverage, but it didn't find any deep bugs. Typically, when we looked at their code, it didn't have any assertions whatsoever, and then we can't find any deeper, interesting bugs. Okay? Yes?
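The point that an assert is just a branch can be sketched as follows. The names (`pex_assert`, `helper_abs`, `find_violation`) are hypothetical, the bug is planted on purpose, and a brute-force loop stands in for Pex's constraint solving; the sketch only shows that an assertion adds a branch whose failing side a test generator can aim for.

```python
def pex_assert(condition):
    """An assertion is just a branch whose failing side raises."""
    if not condition:
        raise AssertionError("assertion violated")

def helper_abs(x):
    # Hypothetical helper with a deliberate bug for one input.
    if x == -7:
        return x
    return -x if x < 0 else x

def code_under_test(x):
    result = helper_abs(x)        # one function calling another is fine
    pex_assert(result >= 0)       # the oracle: two branches, pass or raise
    return result

def find_violation(func, candidates):
    """Brute-force stand-in for the solver: first input that reaches the raise."""
    for x in candidates:
        try:
            func(x)
        except AssertionError:
            return x
    return None

print(find_violation(code_under_test, range(-10, 11)))  # prints: -7
```

Without the `pex_assert` line, input -7 would simply return -7 and look like a passing test; the assertion is what turns it into a detected bug.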

>>: So what test cases did you select to display on the table?

>> Nikolai Tillman: Oh, good question. Pex would eventually explore all execution paths. In practice, it won't, because we have finite time, so Pex tries to make a clever selection. But you typically don't get shown test cases for all execution paths here, because their number grows exponentially in the program size, and you typically don't want an exponential test suite. So we only emit a test case if it increases branch coverage.
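The selection rule "emit a test only if it increases branch coverage" can be sketched in a few lines. The branch labels and the list of runs below are hypothetical, and Pex's real bookkeeping is certainly richer; the sketch only shows why the kept suite stays roughly linear in the number of branches even when the number of paths explodes.

```python
def select_tests(executions):
    """Keep a test only if it covers at least one branch not covered before.

    executions: iterable of (test_input, set_of_branch_ids) pairs, in the order
    the explorer produced them. Branch ids are hypothetical labels and may
    include implicit branches such as null checks.
    """
    covered, kept = set(), []
    for test_input, branches in executions:
        new = branches - covered
        if new:
            kept.append(test_input)
            covered |= new
    return kept, covered

runs = [
    ("a=null", {"a==null:T"}),
    ("a=[]", {"a==null:F", "len>0:F"}),
    ("a=[0]", {"a==null:F", "len>0:T", "a[0]==42:F"}),
    ("a=[1]", {"a==null:F", "len>0:T", "a[0]==42:F"}),  # same branches again: dropped
    ("a=[42]", {"a==null:F", "len>0:T", "a[0]==42:T"}),
]
kept, covered = select_tests(runs)
print(kept)  # prints: ['a=null', 'a=[]', 'a=[0]', 'a=[42]']
```

Each kept test adds at least one new branch outcome, so the number of kept tests can never exceed the number of branch outcomes.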

>>: Including implicit branch that you --

>> Nikolai Tillman: Including implicit branches and exceptional behaviors. So the number of tests you get is basically linear in the size of the program. You can configure all of that in Visual Studio if you use the full Pex system. On the websites, there are preconfigured defaults, which you cannot change, especially when we come to the gameplay. They are pretty reasonable defaults. Any other questions? Okay. Talked about that. So in practice, if you use Pex to test real code, you will run into one issue, which is the issue of the unknown environment. There's a little program on the right that goes to the file system and does something, and our symbolic execution engine has no clue why the file system behaves as it does. Using dynamic symbolic execution, you can actually run this code and analyze it to some extent, but we do not get meaningful constraints from the file system, so it doesn't work so well. First of all, the good news is that for the gameplay we are going to look at, this doesn't matter, because our game has to be self-contained; it's not allowed to reach out to the environment. But if you're wondering, in Pex we actually have a way to help you, the developer, isolate your code from the environment. We call that Moles. It's basically a mocking framework. So now you know how Pex works and how we deal with different aspects. I want to give you a brief overview of the history here.

So, actually, I think the first line of Pex ever was written maybe nine years ago, in 2006, and then in 2008 we released the first version, and it got wildly popular as research projects go: we immediately had 30,000 downloads, so we were pretty happy with that. Then it turned out that our approach to isolating code from the environment, our mocking framework, became very popular by itself, because whether you use Pex or not, whenever you do unit testing in the true sense of the word, you need to deal with that problem. A unit test is not supposed to depend on the system, on the file system. So we released our test isolation framework, called Moles, as a separate download, and later we had another version of Pex called Code Digger, a simplified version for the latest Visual Studio edition, also very, very popular. And interleaved with that, we launched our Pex4Fun website a couple of years ago. It was basically born out of the desire to make it easy for other people to try out Pex without having the right version of Windows, Visual Studio and Pex: just go to our website. That was pretty successful, and then, thanks to Tao Xie, the idea was born: why not turn it into a game?

If you go to Pex4Fun, the whole game experience is a bit confusing. Originally, the website wasn't designed to host a game; it was designed to showcase our test generation. That's why Code Hunt was born: a dedicated website for the gameplay aspect, hosting contests and, hopefully, also providing a good teaching experience. Everything here is a research project, and we are really proud of our download and user numbers, but they're research projects, so we are even happier that a couple of years ago, the first of them actually shipped as part of Visual Studio: Moles. So it's in the box. If you got Visual Studio, for example 2012 Ultimate, it was part of the toolset that professional developers were handed. And the latest news is that Pex itself is shipping in the box with Visual Studio 2015 Ultimate.

They rebranded it: they renamed Moles to Fakes, and they rebranded Pex as Smart Unit Tests, but it's basically the same experience. So we are really, really happy. That is what we as researchers ultimately want, that our ideas actually make it into the hands of millions of developers. So you now know that behind all of the Code Hunt fun is actually an industrial-strength testing system that is used to test real code. If you want, you can already play with the new Smart Unit Tests experience: there's a Visual Studio 2015 preview out there that you can download and try out. Any questions? Yes.

>>: So with Smart Unit Tests, that's pretty interesting. Do you generate tests that you display to the user, or is it like --

>> Nikolai Tillman: Yes. So on the website, I showed you already that you can click on a line, and then you actually see the code behind the unit test, and sometimes the code can be much more interesting than this. In Visual Studio, you have a similar experience with the table. You can see the test code, and then you can also save it as unit tests in your unit test project, so you can persist them, you can pretend you wrote them by hand, and you can rerun them for future regression purposes without having to do dynamic symbolic execution again, if you want. Also, it's great for debugging that you actually get a concrete test case that you can step through. Yes?

>>: So on there, you need to express it in Smart Unit Tests?

>> Nikolai Tillman: They're in there. All right.

>>: There's another question.

>>: Smart Unit Tests, if you show the test to users, how do you solve the naming problem?

>> Nikolai Tillman: Oh, it's funny: if you take your research ideas into practice, some issues pop up that were not really the core research issues you thought would matter. This is something that hasn't been ultimately solved yet. By default, we name the unit tests one, two, three, four, and there is an extension mechanism where you can write your own naming scheme. And when you save the unit tests, you can also rename them in the editor. It becomes part of your unit test code, and then that name will stick.

>>: Yes, it was [indiscernible] names.

>> Nikolai Tillman: There are the names. So unit tests -- let's go back to one that had an array. So the unit tests might indeed contain variables that we have to further initialize, and those names are chosen based on an obvious naming scheme. I don't think that has been much of an issue. People don't care so much about the variable names in a function, but they do get picky about the name of the test case. We could easily change that, but didn't have a --

>>: [Indiscernible].

>> Nikolai Tillman: The oracle, again: ideally, you have assertions in your code, because otherwise you get things like null reference exceptions, index out of range, overflow and so on, but nothing deep. There is no magic for the oracle problem. It's based on the developer writing assertions, either in product code or in what we call parameterized unit test code. The idea is that you can take a regular test case and add some parameters whose values shouldn't matter. If you're testing a stack and you want to make sure that what you push you can later pop, it doesn't matter what value you push and pop; in fact, it should work for all values, otherwise we have a problem. More interesting is a hash table, where the value actually matters for the indexing behavior. So you take a test case, pull out a value that shouldn't matter, and you get a parameterized test case. Since it's a unit test, you had better have an assertion in there, and then Pex will try to break it. Yes?
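The stack example can be written as a parameterized unit test. The Python below only sketches the idea: `Stack` and `push_pop_test` are hypothetical, and a hand-picked list of values stands in for the inputs Pex would generate while trying to break the assertions.

```python
class Stack:
    """Minimal stack standing in for the class under test."""
    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)

def push_pop_test(value):
    """Parameterized unit test: the property must hold for every value."""
    s = Stack()
    s.push(value)
    assert s.pop() == value   # oracle: what you push is what you pop
    assert len(s) == 0        # and the stack is empty again

# A test generator (Pex in the talk, a fixed list here) supplies the values.
for v in [0, -1, 2**31 - 1, 42]:
    push_pop_test(v)
print("all push/pop checks passed")
```

The parameter turns one concrete test into a statement about all values, and the two assertions give the generator something to try to violate.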

>>: Does Pex or Smart Unit Test benefit from Code Contracts that is --

>> Nikolai Tillman: Yes. Code Contracts is actually developed by our colleagues down the hall in this building, and Pex and Code Contracts work very well together. Basically, Code Contracts, whether preconditions, postconditions or invariants, are all assertions at runtime, in a way, and Pex will try to break them. Okay? Okay.

So let's talk a bit more about what all of this means for Code Hunt in particular. First question: who is already familiar with the basic Code Hunt game experience? Who is not familiar with the basic Code Hunt game experience? Okay. You will learn it on the go. Since most people know it, I will skip the introduction, and you will see it from here on.

So the basic idea of the game is that there are two programs involved. Tao might recognize this slide. One program is the secret program, written by the teacher or someone behind the scenes, and then there is another program, which sits in front of the player and which the player can modify. They have the same signature, the same parameters and the same result type, and what we literally do is synthesize a driver that calls them both and checks whether they behave the same: for the same inputs, you expect the same outputs; otherwise, there is a mismatch. This is how the game works, and this is literally how we decide whether the player's program works the same as the secret program. If there is a mismatch, you get a table, as I showed you in Pex4Fun, which shows for what values you got it right and for what values you got it wrong. The idea is that the player iteratively changes his or her program to make it more like the secret one, and if there is no mismatch, you win. That's the basic game idea. What it really means is that if you want to enter a new puzzle into the system, all you have to do is give a reference implementation of some algorithm.
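The synthesized driver can be pictured as below. This is a minimal sketch under assumptions: `secret` and `player` are hypothetical puzzle implementations, and a fixed list of inputs stands in for the ones Pex would derive by exploring both programs together. A real driver would also need to compare exceptions, which this sketch omits.

```python
def secret(x):
    # Hypothetical teacher implementation, hidden from the player.
    return 2 * x + 1

def player(x):
    # Hypothetical player attempt: correct only for non-negative x.
    return 2 * abs(x) + 1

def duel_driver(inputs):
    """Call both implementations on the same inputs and tabulate the results."""
    return [(x, secret(x), player(x), secret(x) == player(x)) for x in inputs]

# In Code Hunt the inputs come from Pex exploring both programs together;
# here a fixed, hand-picked list stands in for them.
table = duel_driver([0, 1, -1])
for row in table:
    print(row)
print("win" if all(row[3] for row in table) else "mismatch")  # prints: mismatch
```

The rows play the role of the table the player sees: matching inputs show where the attempt already agrees, and the mismatching row for -1 hints at what to change next.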

That's the basic idea, right? You don't have to give a set of test cases. In fact, you cannot; you have to give a reference implementation. And the only thing that matters here is the behavior of the code. It doesn't matter whether you use a loop or recursion, or which syntactic constructs you use, because under the hood, Pex translates all of the code, all of the instructions, into a formula for our SMT solver, which does a deep behavioral analysis. So it's interesting that, in fact, most of the solutions we get are all slightly different.

So, a bit more about the technicalities of how it works. While Pex4Fun, our first iteration, runs on a single server somewhere here in this building, Code Hunt is a cloud-based system. There's a back end, api.codehunt.com, which I will talk more about later. It is implemented in Windows Azure, running in the cloud in some datacenter and scaling up depending on the current load. It can handle an arbitrary number of people, and we did some contests with thousands of people; I think Judith will talk more about that. Behind the scenes, Pex analyzes the code of the user together with the secret code, and it does that in a sandbox, which I will also talk about. For practical purposes, we limit the submission size. In principle, you can submit a pretty sophisticated program, with many classes and many functions in it, but the overall size limit is 32 KB after deflate compression. That's a little technicality just in case you try to, I don't know, submit your entire product. It won't work; it's meant for game-size programs. A limitation of Pex is that it only deals with single-threaded code, so you're not allowed to use threads or tasks.

There are many defaults preconfigured that cannot be changed. So if you test something out and at some point you think it's a limitation of Pex, it might just be a limitation of these default configurations, and you wouldn't run into it when testing your product code in Visual Studio. In particular, we have fixed the default Pex search strategy, which is pretty clever and tries to be fair, and there are certain timeouts in place: each individual query, basically trying to decide a path constraint, is limited to two seconds, and there is an overall 30-second timeout. That becomes relevant as soon as there is a loop, because then the analysis will most likely take the full 30 seconds, since the number of paths might be unbounded. Without a loop, you get very, very quick results. There are other limits in place, too. You might think about trying to trick the system by writing an infinite loop or by creating really, really hard constraints, so one particular limit caps the number of instructions along any discovered execution path. The idea is to detect likely non-termination, and then the player who submitted that code loses. If Pex cannot find a counterexample under all of these constraints, the player wins. I will also show you some special things you can do in the secret program which the player is not allowed to do in theirs. In particular, in the secret program, you can restrict the valid inputs with assumptions, which the player is not allowed to do. I will talk more about that later.

So we had the question of which data types you can use in puzzle signatures, and here is the answer in a bit more detail. You can use simple data types such as bytes, Booleans, characters, ints, technically even doubles, and of course strings. There are many more that are just different bit sizes and some more details. And you can use arrays of such data types. As we talked about, you should avoid floating-point numbers. Strings are typically okay.
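The instruction limit can be pictured as a step budget. In this hypothetical Java sketch the loop calls `tick()` explicitly, whereas Pex counts instructions through instrumentation; the class name and the budget of 1,000 steps are illustrative assumptions, not Code Hunt's actual setting.

```java
// Hypothetical step budget: aborts an execution path once it has used more
// steps than allowed, approximating "likely non-termination".
final class StepBudget {

    static final class BudgetExceeded extends RuntimeException {
        BudgetExceeded(long limit) {
            super("exceeded " + limit + " steps");
        }
    }

    private final long limit;
    private long used;

    StepBudget(long limit) {
        this.limit = limit;
    }

    // Called once per counted step; throws when the budget is spent.
    void tick() {
        if (++used > limit) throw new BudgetExceeded(limit);
    }

    // Example player code: counts down to zero. For n < 0 it runs roughly four
    // billion iterations until int wrap-around brings n back to 0, so in
    // practice it looks non-terminating.
    static int countDown(int n, StepBudget budget) {
        int steps = 0;
        while (n != 0) {
            budget.tick();
            n--;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(countDown(5, new StepBudget(1_000))); // prints 5
        try {
            countDown(-1, new StepBudget(1_000));
        } catch (BudgetExceeded e) {
            System.out.println("likely non-termination: " + e.getMessage());
        }
    }
}
```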

>>: And objects?

>> Nikolai Tillman: Objects. Well, technically, an array or a string is an object, and the Pex engine underneath, if you use it in Visual Studio, supports any object. However, today, in Code Hunt, we do not allow you to use custom objects or custom types in the signature. That's basically an implementation limitation, which we could work around if there's a lot of interest. The basic problem is that we need a way to compare results, and for simple data types and arrays, that's very simple: we basically do a deep equality check. If you had a custom data type, we would need a custom equality function, and today we don't support that.
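A Java analogue of the result comparison, offered as a sketch only (Code Hunt itself compares .NET values). `Objects.deepEquals` compares arrays element by element, including nested arrays, and falls back to `equals` otherwise. The hypothetical `Point` type shows why a custom type needs its own equality function: without an `equals` override, two structurally identical results still compare as different.

```java
import java.util.Objects;

final class ResultEquality {

    // Hypothetical custom result type WITHOUT an equals/hashCode override.
    static final class Point {
        final int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Deep equality: element-wise for (nested) arrays, equals() for everything else.
    static boolean sameResult(Object secret, Object player) {
        return Objects.deepEquals(secret, player);
    }

    public static void main(String[] args) {
        System.out.println(sameResult(new int[] {1, 2}, new int[] {1, 2}));               // prints true
        System.out.println(sameResult(new String[][] {{"a"}}, new String[][] {{"a"}}));  // prints true
        System.out.println(sameResult(new Point(1, 2), new Point(1, 2)));                 // prints false
    }
}
```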

>>: I would suggest, right, that if that is the key conceptual issue, then you could support any type that has deep equality, right? Or value types, right? Something that moves away from what I would consider the historical overemphasis on arrays in introductory programming, right?

>> Nikolai Tillman: So it gets a bit more complicated. It's just a technical issue; there is no deep conceptual issue. Basically, what we would need is yet another piece of code which defines data types that are accessible to both. Today, the secret program and the player program get compiled separately, and it all gets a bit more complicated. We can add it if that's a top request.

Yes?

>>: You can [run] that by allowing types like tuple, which are in the standard library but more complicated.

>> Nikolai Tillman: That probably would be a very good first -- I think the truth is, when we originally designed Pex4Fun, tuples didn't exist yet, so it's not there.

>>: Well, they existed, but not in .NET.

>> Judith Bishop: Would structs be all right?

>> Nikolai Tillman: Technically, we can support anything. The problem with any custom data type is that we need yet another piece of code that gets compiled separately and is made available to both, so it's a limitation of development manpower. Okay. As I mentioned before, we literally generate such a naive driver program that compares both. In practice, it's slightly more complicated because of exception handling: if either program throws an exception, we make sure that both throw the same exception type. Otherwise, you lose; the player loses. So we can deal with exceptions, too. There is an important further detail here. Pex always starts by exploring the secret program, simply because of the sequential semantics of .NET, which means we get good coverage of the secret program, and only then does Pex start exploring the player's program. So even if the player's program tries to do something naughty or tries to confuse Pex, we first get good coverage of the secret program, and that way, we will find certain issues in the player's program. In the big picture, assuming the player doesn't do anything funny, Pex will eventually try to explore all combinations of execution paths of the secret program and the player's program, basically the cross product of everything that can happen. So it's a very, very thorough analysis, as it turns out.
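Extending a naive driver with the exception rule might look like this Java sketch (names hypothetical). Each run is reduced to an outcome, either the returned value or the class of the thrown exception, and the player passes on an input only when both outcomes are equal. Comparing only the exception type, not the message, follows the rule as stated in the talk.

```java
import java.util.Objects;
import java.util.function.IntUnaryOperator;

final class OutcomeDriver {

    // Outcome of one call: the boxed return value, or the class of the
    // unchecked exception that escaped. The exception message is ignored.
    static Object outcome(IntUnaryOperator program, int x) {
        try {
            return program.applyAsInt(x);
        } catch (RuntimeException e) {
            return e.getClass();
        }
    }

    // The player passes on input x only if both programs return the same value
    // or both throw an exception of exactly the same type.
    static boolean agree(IntUnaryOperator secret, IntUnaryOperator player, int x) {
        return Objects.equals(outcome(secret, x), outcome(player, x));
    }

    public static void main(String[] args) {
        IntUnaryOperator secret = x -> 100 / x; // throws ArithmeticException for x == 0
        IntUnaryOperator player = x -> {
            if (x == 0) throw new IllegalArgumentException("zero");
            return 100 / x;
        };
        System.out.println(agree(secret, player, 4)); // prints true: both return 25
        System.out.println(agree(secret, player, 0)); // prints false: different exception types
    }
}
```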

>>: Sanity checks for the secret program, then?

>> Nikolai Tillman: Sanity checks, like what?

>>: It should not go into an infinite loop?

>> Nikolai Tillman: For the secret program, the same applies as for the player's program. Any infinite loop or exceeding this instruction limit will make the player fail. So as a teacher, you can easily create a puzzle that is not winnable, if you want. That's part of what I'm going to talk about next: all the things you should watch out for so as not to create puzzles that are unwinnable or really, really hard to win, even if you didn't intend to. One takeaway here is that you should always test a puzzle you create, both whether it's actually winnable and whether it's a good experience, because if you don't, you might accidentally have a non-termination issue, and then it's just a bad student experience. Yes.

>>: So we cannot [mutate] the input variables in the secret code, right?

>> Nikolai Tillman: Exactly. This is literally what we do, and for performance reasons, if you have an array, we don't clone the array, because that would basically mean one more level of indirection for Pex. So if the input is an array, your secret program had better not mutate it, because otherwise, things get very confusing. They're still well defined on some level, but very, very confusing. Yes?
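A small Java sketch of why this matters, assuming (as described in the talk) that the driver passes the same, uncloned array to both programs and runs the secret first; the method names are hypothetical. A secret that sorts its input in place changes what the player's program sees, so a wrong player answer can look right.

```java
import java.util.Arrays;

final class SharedArrayPitfall {

    // BAD secret: returns the minimum, but sorts the caller's array in place.
    static int secretMin(int[] a) {
        Arrays.sort(a);
        return a[0];
    }

    // Player: returns the first element, which is wrong in general.
    static int playerFirst(int[] a) {
        return a[0];
    }

    public static void main(String[] args) {
        int[] input = {3, 1, 2};
        int s = secretMin(input);   // input is now {1, 2, 3}
        int p = playerFirst(input); // the player sees the sorted array
        System.out.println(s + " " + p); // prints "1 1": the wrong player looks correct
    }
}
```

With a fresh copy of `{3, 1, 2}`, `playerFirst` returns 3 and the mismatch would be reported.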

>>: So in the previous slide, you mentioned that the secret code could include assumptions. How do we interpret that behavior when we talk about equivalence of the user code and the secret code?

>> Nikolai Tillman: Yes, yes, I will get there very soon, I think. Yes, assumptions. So far, I haven't talked about assumptions at all, so let's do that now. Let's say I have a program, and there's a branch. I like throwing exceptions. So now we get two test cases, because there is a branch in it that can go two ways. What we can do, and this is only allowed in the secret program, is use assumptions to shape the input space. So I can have an assumption that says i must be greater than 15. What I effectively did is cut off everything outside of that range, and instead of two test cases, you only get one.

This program always fails, by definition. Under the hood, what actually happens is that Pex explores both cases, because there is a branch in the code, but the case that fails the assumption is simply ignored. It's never shown to the user and does not enter the gameplay, so it's an effective way to restrict the input space. And you can only do that in the secret program. It's not allowed in the player's program, because otherwise, you could easily get rid of all failures ever. Here in Pex4Fun, in the mode where I'm just exploring programs, it's not rejected. This is not gameplay right now; I'm just exploring a single program, and with that I get literally no test cases anymore, so beware. Yes?
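A Java sketch of the filtering mechanism, under stated assumptions: `assume()` and `AssumptionViolation` are hypothetical stand-ins (Pex itself provides `PexAssume` for this in C#), and the candidate inputs are hand-picked rather than generated. An input that violates an assumption in the secret program is dropped silently instead of being reported as a mismatch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

final class AssumeFilter {

    static final class AssumptionViolation extends RuntimeException {}

    // Hypothetical stand-in for a Pex assumption: abandons the current input.
    static void assume(boolean condition) {
        if (!condition) throw new AssumptionViolation();
    }

    // Secret in the spirit of the talk: only inputs i > 15 are valid.
    static int secret(int i) {
        assume(i > 15);
        return i - 15;
    }

    // Keeps the inputs that survive the secret's assumptions; any other
    // exception still propagates and would count as real behavior.
    static List<Integer> validInputs(IntUnaryOperator secret, int[] candidates) {
        List<Integer> kept = new ArrayList<>();
        for (int x : candidates) {
            try {
                secret.applyAsInt(x);
                kept.add(x);
            } catch (AssumptionViolation e) {
                // silently dropped: never shown to the player
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(validInputs(AssumeFilter::secret, new int[] {0, 15, 16, 100})); // prints [16, 100]
    }
}
```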

>>: So for preconditions, you [indiscernible] a lot, [shaping] the power of the program behavior that you want users to add missing preconditions, if my secret code has Code Contracts preconditions?

>> Nikolai Tillman: I think for Code Hunt, we do not encourage using Code Contracts.

>>: We don't run the runtime rewriter, so my own line is: do not use Code Contracts in your duels. Don't run the rewriter.

>>: I used Code Contracts in Pex4Fun exercises, and that was allowed, but in Code Hunt it wouldn't --

>> Nikolai Tillman: And there are some other differences between Pex4Fun and Code Hunt; good that you asked. In Pex4Fun, we supported C#, Visual Basic and F#, all .NET languages that are easy for us to support, since the Pex engine only cares about the compiled .NET instructions. In Code Hunt, we support C# and Java, because it turns out that nobody ever used Visual Basic or F# in Pex4Fun, but we get a lot of requests for Java, and also for C, C++ and Python, but that's a different story. That's another difference between Pex4Fun and Code Hunt. Okay, so what's so great about assumptions? Why would you actually want to use them? Here's a very practical example. Pex is very good at modeling the meaning of .NET instructions, but sometimes that can result in confusing behaviors. Those might ultimately be real bugs, so there is a reason why Pex does it the right way, but for the gameplay, it might be unintended, unless you actually want to teach someone about overflow behavior. So let's say I have this code. What would you expect? I kind of gave it away.

So we have 32-bit integers, and by default, the behavior in C# is that addition overflows silently, with no exception, and Pex knows that, so you get this test case. Now, that might be confusing if you just want to teach addition to someone who is new to all of this. So this could be a great place to use a Pex assumption where I say that I don't want integers bigger than something like this. And now, if I got that right, this ugly test case is gone, filtered out. Pex still generates it internally, but it's filtered out as an assumption violation. Does that make sense?
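The same situation in Java, where `int` addition also wraps silently. This is a hypothetical sketch: the `assume()` helper and the bound of 1,000 are illustrative choices, not Pex's API or a recommended value. The assumption keeps the wrapped case out of an addition puzzle for beginners.

```java
// Hypothetical "add two numbers" puzzle with a range assumption.
final class OverflowPuzzle {

    static final class AssumptionViolation extends RuntimeException {}

    // Illustrative stand-in for a Pex assumption: abandons the current input.
    static void assume(boolean condition) {
        if (!condition) throw new AssumptionViolation();
    }

    // Secret program: only small operands are valid, so the sum cannot wrap.
    static int secretAdd(int a, int b) {
        assume(a > -1000 && a < 1000);
        assume(b > -1000 && b < 1000);
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1); // prints -2147483648: silent wrap-around
        System.out.println(secretAdd(2, 3));       // prints 5
        try {
            secretAdd(Integer.MAX_VALUE, 1);
        } catch (AssumptionViolation e) {
            System.out.println("filtered out as an assumption violation");
        }
    }
}
```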

>>: You will also need to think about underflow, right? Similar things there, for some other operations?

>> Nikolai Tillman: Yes, so overflow, underflow, division by zero, there are many behaviors which fall into this category.

>>: Can you use checked?

>> Nikolai Tillman: Okay, I can do that. So this is the default behavior, and I suspect it's the same in Java. I'm not a Java expert, but Nigel nodded. In C#, there is a way to turn the silent overflow into an explicit exception, and there are different ways to write that; one is using the checked keyword in C#. It actually produces a different addition instruction in the compiled code, which I guess Java doesn't have. With this subtle change, you're getting an overflow exception here instead. Pex knows about all of this, and it's ultimately up to you what you want to achieve with the puzzle, whether you want to showcase something like that or keep it under the covers. And there are some really surprising behaviors; I suspect that most of you do not know everything that can go wrong here.
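For comparison, a Java sketch (class name hypothetical). Java bytecode indeed has no overflow-checking add instruction, unlike the `add.ovf` instruction that C# emits inside `checked`; the closest standard-library equivalent is `Math.addExact`, which throws `ArithmeticException` rather than .NET's `OverflowException`.

```java
// Plain + wraps silently; Math.addExact (Java 8+) throws on int overflow.
final class CheckedAdd {

    static int wrapping(int a, int b) {
        return a + b;
    }

    static int checked(int a, int b) {
        return Math.addExact(a, b); // throws ArithmeticException if the sum does not fit in an int
    }

    public static void main(String[] args) {
        System.out.println(wrapping(Integer.MAX_VALUE, 1)); // prints -2147483648
        try {
            checked(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```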

What are the exceptions that can be thrown here?

>>: Division by zero.

>> Nikolai Tillman: Division by zero, that one, yes. Relatively my [indiscernible] exception. And I think this is actually slightly different in Java. So indeed, an overflow exception: when you try to divide the minimum value by minus one, instead of the overflow being silently ignored, as is the case for addition, it actually throws an overflow exception. It wasn't my idea; we are just modeling what it is. So again, you might want to limit the range if you use division, to avoid that case.
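The hunch about Java is right. In Java, integer division by zero throws `ArithmeticException`, but `Integer.MIN_VALUE / -1` overflows silently and yields `Integer.MIN_VALUE` (the Java Language Specification defines it that way), whereas, as described above, .NET throws an overflow exception. The `isEdgeCase` helper, which a secret program could use in an assumption, is a hypothetical sketch.

```java
final class DivisionEdges {

    static int divide(int a, int b) {
        return a / b;
    }

    // True for the two surprising inputs: b == 0 (exception in Java and .NET)
    // and MIN_VALUE / -1 (silent wrap in Java, overflow exception in .NET).
    static boolean isEdgeCase(int a, int b) {
        return b == 0 || (a == Integer.MIN_VALUE && b == -1);
    }

    public static void main(String[] args) {
        System.out.println(divide(Integer.MIN_VALUE, -1)); // prints -2147483648, no exception
        try {
            divide(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("division by zero");
        }
        System.out.println(isEdgeCase(Integer.MIN_VALUE, -1)); // prints true
    }
}
```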

>>: Have you tried -- since this is about pedagogy, right? There's a reason to ignore overflow, and it's very dangerous to ignore overflow. So have you thought about just supporting big integers and encouraging those for a certain set of puzzles?

>> Nikolai Tillman: Good question. Conceptually, we could support big integers, but it would be a lot of work, and we don't have the engineering power to make that happen. That's the truth. I can see two different ways of approaching them: either by actually analyzing their behavior as it is defined in .NET, probably with some backing array and iterations in loops, with huge, huge overhead; or we could model them as God's integers, true mathematical integers, when we go to Z3. But this is a whole new data type, and again, think of the engineering.

>>: Well, that's fascinating. You would definitely recommend the latter, right? Trust the .NET influence on God's integers, correctly telling Z3 these are God's integers, and then certain things get easier, like addition and subtraction, and certain things get much harder, like division.

>> Nikolai Tillman: Yes, but for many practical purposes, filtering out these overflow cases is just good enough. If you want to teach addition or some basic constructs, maybe to someone with the intention not to teach programming, per se, but just arithmetic, then just filter out those values and everything is good.
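For readers following the big-integer suggestion: Java's standard `java.math.BigInteger` is the kind of arbitrary-precision type being discussed. According to the talk it is not supported in Code Hunt puzzles, so this sketch (class name hypothetical) only illustrates the language-level behavior, where addition grows instead of wrapping.

```java
import java.math.BigInteger;

final class BigAdd {

    // Arbitrary-precision addition: the result grows instead of wrapping.
    static BigInteger add(BigInteger a, BigInteger b) {
        return a.add(b);
    }

    public static void main(String[] args) {
        BigInteger max = BigInteger.valueOf(Integer.MAX_VALUE);
        System.out.println(add(max, BigInteger.ONE)); // prints 2147483648
        System.out.println(Integer.MAX_VALUE + 1);    // prints -2147483648
    }
}
```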

>>: I mean, in practice, we put assumption ranges that are much smaller, to force smaller numbers. It's pretty hard to reason when you have eight-digit numbers; it makes the game really hard.

>> Nikolai Tillman: So I'm going a bit over time. I hope that's okay.

>> Judith Bishop: No, so you've got until 10:30?

>> Nikolai Tillman: Oh, I do? Oh, okay.

>> Judith Bishop: We can start the next session at 11:00.

>> Nikolai Tillman: Okay, very good. Perfect. So let's talk a bit more about path explosion, also in connection with assumptions. Pex explores all execution paths eventually, except that we stop after 30 seconds. So it's very easy to run into the path explosion problem, where Pex indeed has to analyze an exponential number of execution paths, which typically happens when you have a loop with branches in it. In fact, you don't even need a loop: a plain sequence of branches can give rise to an exponential number of paths. That is, in general, not a good thing. Often, it's okay, though, because Pex has a pretty good internal search strategy that tries not to get stuck anywhere in particular, but we have found that you typically run into this issue when you want to filter out certain values. Let's say you have a puzzle that takes an array, and for some reason, you really want the elements of the array to be positive, maybe, again, to avoid some kind of overflow. The naive way of doing it is to write such a loop using the somewhat standard double ampersand, which has short-circuit semantics; if you look at the code that is actually generated, it's a conditional branch. So that's another thing to be conscious of: how what you write in the surface syntax, like C# or Java, translates to the bytecode in the end, which ultimately is what Pex analyzes.

With a double ampersand here, Pex has a really hard time finding satisfying inputs, because the number of paths explodes. Maybe I can demo this. I'm sure that Nigel has had this experience a couple of times as a professional puzzle writer. Let's start with the double ampersand, the short-circuit behavior that results in a conditional branch. You see that Pex is not able to come up with an input that satisfies these conditions, and it took a moment. If I now turn this into a single ampersand, you immediately get something that works. So it's not a problem for Z3 to deal with the 100 constraints that come out of here, but it is a problem for Pex if it has to explore an exponential number of paths to ever get there. So, concerning Boolean operators: sometimes you have to use a double ampersand. If you get an array, you typically first have to make sure that it's not null, and only then can you look at the length. You cannot do that without short-circuit semantics, but whenever you can avoid it, please avoid it. So here we really have to use a double ampersand. Similarly, you could write something else -- no, what do I want to do here? Let's say I want to fix this to 42, and then I want to fix that to 21. This results in two branches, two conditions that are checked, so it's more efficient to pull them together. I don't know if I need more parentheses now. Again, this is more efficient, because if you look at the .NET code in the end, we need one check fewer. Okay. So that is the story of how to avoid path explosion when possible, and it's especially important in the secret program, which Pex looks at first. If that already explodes, there is not much time left to look at the user code.
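A Java sketch of the two ways to write the "all elements positive" check (names hypothetical). Both return the same answers; the difference lies in the compiled code. One caveat, as an assumption to keep in mind: the branch-free property of `&` is a statement about the .NET IL that Pex analyzes, where C# compiles `a[i] > 0` to a comparison instruction and `&` to a bitwise `and`. javac, by contrast, emits conditional jumps even for a comparison stored in a boolean, so the benefit for Java sources depends on Code Hunt translating Java to C# first, which the API description later in this talk suggests.

```java
final class AllPositive {

    // Short-circuit &&: compiled to a conditional jump, so every element adds
    // a branch that a path-exploring tool has to cover.
    static boolean shortCircuit(int[] a) {
        boolean ok = true;
        for (int i = 0; i < a.length; i++) {
            ok = ok && a[i] > 0;
        }
        return ok;
    }

    // Non-short-circuit &: both operands are always evaluated. In the
    // equivalent C#, this compiles to a comparison plus "and" with no jump.
    static boolean eager(int[] a) {
        boolean ok = true;
        for (int i = 0; i < a.length; i++) {
            ok = ok & a[i] > 0;
        }
        return ok;
    }

    // The null check is where && is required: a.length would throw on null.
    static boolean validInput(int[] a) {
        return a != null && eager(a);
    }

    public static void main(String[] args) {
        System.out.println(eager(new int[] {1, 2, 3}));  // prints true
        System.out.println(eager(new int[] {1, -2, 3})); // prints false
        System.out.println(validInput(null));            // prints false
    }
}
```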

>>: Did you ever think of putting in a hack for amper-amper, to notice that, okay, these are separate paths, but it doesn't matter?

>> Nikolai Tillman: Right. In fact, there is some logic built into Pex where we take conditional control flow and turn it into a closed formula when possible. That might kick in, but it's a heuristic; for arbitrary code, it's difficult in general. I think the heuristic is that if you have some check in a static function, and it can be expressed as a closed formula, then Pex will do that, but you have to factor that code out into a separate function, and even then, it's a very conservative heuristic. In some cases, I mean, you have to do it in an iterative way, and then the heuristic might not kick in. So do not rely on that heuristic; I would encourage you to do that reasoning yourself. That's what it is now. Okay. I might have briefly mentioned before that Pex actually runs your code, so we also run all submissions to Code Hunt and Pex4Fun, and that has certain implications. You could try to do something naughty and delete the server disk, right? People have tried that, of course, unsuccessfully.

So how do we deal with that? .NET comes with some sandboxing concepts, which unfortunately we cannot use for technical reasons. The way Pex works is that we instrument the code to monitor what it's doing. In parallel to the concrete execution, we build symbolic path constraints, and we do that by instrumenting the code, basically inserting a callback for every single instruction. This instrumentation is extremely intrusive, and it makes the code inherently unsafe, in .NET terminology. For that reason, we cannot use the standard .NET sandboxing concepts, because as soon as something is unsafe, they just say: either you trust me or you don't. What we do instead is a whitelisting approach: whenever you use an API, it has to be on our whitelist, and the basic intuition is that the whitelist only contains purely functional things, obviously no I/O. You cannot tinker with security settings. You cannot create threads. What else can you not do? Oh, yes: .NET has a mechanism, P/Invoke, to call into random [indiscernible]. Not allowed. Actually, I think that was a hole we had for a while, and a nice person reported it to us. Fixed.
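A minimal sketch of the whitelisting idea in Java. Everything here is hypothetical: the entries on the list are illustrative, not Code Hunt's actual whitelist, and the referenced types are passed in as a list, whereas a real checker would extract them from the compiled code.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class ApiWhitelist {

    // Illustrative allow-list of side-effect-free APIs; no I/O, threads or native calls.
    static final Set<String> ALLOWED = Set.of(
            "java.lang.Math", "java.lang.String", "java.lang.StringBuilder", "java.util.Arrays");

    // Returns the referenced types that are not on the allow-list;
    // an empty result means the submission may run.
    static List<String> violations(List<String> referencedTypes) {
        return referencedTypes.stream()
                .filter(t -> !ALLOWED.contains(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(violations(List.of("java.lang.Math", "java.io.File"))); // prints [java.io.File]
    }
}
```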

>>: Massive.

>> Nikolai Tillman: Yes. So if you do it anyway, because you got interested, then both in Code Hunt and Pex4Fun, what you get is a message --

>>: Is that a code [indiscernible].

>>: Typos.

>>: The first set, second line.

>> Nikolai Tillman: Oh, that string. Right. So it compiled successfully, but then we did the analysis of all the APIs and concluded: not allowed. So don't be confused by that message; it means you are using something that's not on our whitelist. We have refined it so that it should make sense. We actually run your code, and a nice side effect is that during concrete execution, you can look into the symbolic world that is being constructed in parallel. To illustrate that, let's say we have some checks here, 42, and maybe we [indiscernible]; say we also change the array. Let me try to make this a bit more interesting. For every execution path that gets executed, I want to do some logging, basically, and one way to do that even in the presence of exceptions is to use a finally block. I'm going to show the whole program in a moment. It's not beautifully arranged, but it should be correct. Does anyone see a problem? No. Let's run this. So if you're sometimes wondering what happens under the hood, or if you don't understand something, or maybe you just want a path condition out of some complicated piece of code for research purposes, you can use this mechanism and write to the console, and that shows up as part of the output. I can click on a line here to see the pretty-printed representation of the path condition, which looks like C#. Under the hood, everything Pex does is actually at the level of MSIL instructions. Sometimes there isn't really a good mapping, but typically you get C# code that would actually compile and that, in the end, returns a Boolean which represents the entire path condition. You will see that certain aspects have been normalized, but this is what Pex actually reasons about and passes to Z3. Yes?

>>: When you are retrieving the path condition, can you get it in Z3 syntax, so that I can copy-paste everything, the conditions --

>> Nikolai Tillman: If you use Visual Studio, yes. We can actually emit Z3 files using SMT-LIB syntax, but here, that is not exposed. This PexSymbolicValue class has a number of interesting things to offer.

>>: The raw path condition?

>> Nikolai Tillman: Right. There's something called the raw path condition that gives you an S-expression with all of the low-level operators in it, but there is no SMT-LIB syntax that exactly matches that; it's at the level of the .NET instruction set. There's another mapping layer to Z3, which we don't expose here. I think Judith indicates that I should be faster. Okay. What else? Whitelisted APIs. You're not allowed to call anything that results in random values; in particular, System.Random is out. There are still ways: I think you can look at the hash code of an object, which unfortunately is random, but you should avoid that. Similarly, avoid using static fields, unless you know exactly what you're doing and you're just doing some lazy initialization, because Pex does not reset them. When Pex runs the code with different inputs, it does not reset static fields, so don't use them; it basically results in nondeterministic behavior. That's not a deep conceptual issue, but it's probably not what you want in a puzzle for a student. If a student does it, he just makes his own life miserable.
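A Java sketch of the static-field pitfall (names hypothetical). Because nothing resets the field between runs in the same process, the same input can produce different outputs, which is exactly the nondeterminism described above.

```java
final class StaticStatePitfall {

    private static int calls = 0; // shared across all runs in the same process

    // Looks like a function of x, but the result depends on how often it ran before.
    static int leaky(int x) {
        calls++;
        return x + calls;
    }

    // Deterministic alternative: the result depends only on the input.
    static int pure(int x) {
        return x + 1;
    }

    public static void main(String[] args) {
        System.out.println(leaky(10) + " " + leaky(10)); // prints "11 12" on a fresh run
        System.out.println(pure(10) + " " + pure(10));   // prints "11 11"
    }
}
```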

Okay. Oh, and we talked about this earlier, but we probably cannot say it often enough: do not mutate arrays in your secret code, because that's also visible to the user code. Another useful hint: Pex is going to produce roughly as many test cases as there are branches in your code, right? So if your secret algorithm is a closed formula with no branches in it, you get one test case, and that's not a lot. A way to get more test cases is to write dummy branches in the code. That is a technique players can use to learn more about the input-output behavior, but it might also be interesting for you, to force multiple test cases for the user to see from the beginning. We've heard different opinions here. On one hand, it's part of the fun in Code Hunt that the user has to discover what the problem is; it's an iterative process. On the other hand, it can sometimes be a bit frustrating, and if you don't want to teach how to write if statements, you can give some values right away using this technique. So I'm almost at the end. The last thing I want to advertise is our back end itself. It happens to be a publicly accessible service at api.codehunt.com, and if you go there, there's a nice help page that tells you all about what you can do. At the top: it's running in the cloud, and it's dynamically scaling up. Right now, it doesn't have much to do, but if a classroom were hammering on it, it would scale up the capacity, which is basically the number of concurrent Pex runs it can do. Remember that every invocation can take up to 30 seconds, basically using up one core in the cloud, so that's pretty substantial, and that's why we scale up, so that we can handle basically any number of users.
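A Java sketch of the dummy-branch technique (names hypothetical; the thresholds are arbitrary). Both methods compute the same closed formula, but the second splits the input space into four paths, so a path-based test generator like Pex would be expected to report one test case per path instead of just one.

```java
final class DummyBranches {

    // Closed formula: a single execution path, hence typically one test case.
    static int plain(int x) {
        return 3 * x + 1;
    }

    // Same result everywhere, but four paths: each branch returns the same formula.
    static int withDummyBranches(int x) {
        if (x < 0) return 3 * x + 1;
        if (x == 0) return 3 * x + 1;
        if (x > 100) return 3 * x + 1;
        return 3 * x + 1;
    }

    public static void main(String[] args) {
        System.out.println(plain(5) + " " + withDummyBranches(5)); // prints "16 16"
    }
}
```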

One word of caution: there are some statistics here which represent the view of the world from our back end. So if you write a bot and hammer the back end, trying either to reverse engineer some secret programs or to do something else, it will show up here as programs submitted and explorations done. You also have to create users; everything goes under users. So these are not the numbers of real human beings sitting in front of www.codehunt.com, but users from the point of view of the back end. I should clarify that. Talking a bit more about the API, there's an authentication scheme where you can either get an anonymous user account, which is very easy, or a persistent, real user account. Everything you do is then exposed in terms of REST APIs. Who's familiar with REST APIs in general? A few people? Okay. These are very clearly defined Internet endpoints. You can submit individual programs. You can also trigger our translation engine in isolation, if you want to translate Java to C#. And you can kick off an exploration, which can either generate test cases for a single program, like Pex4Fun, or generate test cases comparing two programs. It can take up to 30 seconds to complete, and then you get the outputs back as test cases. The data types are defined here in TypeScript syntax. So, TypeScript: who is familiar with TypeScript?

Oh, not so many people. It's basically a typed layer on top of JavaScript, and in particular, it allows you to define interfaces with annotated data types. You get JSON objects back, and by looking at the interface, you know the structure of each JSON object. Let's see. For users that you own or create, you can also get the history of everything you have ever submitted; we do keep that around. If you indeed hammer our cloud with requests, there are certain limits in place where we will try to throttle you. If you are trying to do something interesting and serious, let me know, and then we can lift your limits, so that we don't think it's a denial-of-service attack. The limits should be plenty, but in the end, somebody has to pay for the cloud, so there are limits.

>>: So it's better if you ask us for a special user, which we can filter in some ways, than to create thousands of anonymous users.

>> Nikolai Tillman: Yes. That's why we have so many of those. User settings. Code Hunt has a leaderboard which you can query, and in the next session, you will learn more about how the games are structured. There is a concept of levels: multiple levels are put together in a world, and this API allows you to interact with that structure. There's a bit of a naming mismatch. It usually went like this: I first implement something in the back end, and then the front-end team looks at it and says, oh, great feature, we want to expose it, but we would like to call it differently. That's why you will find a mismatch. What's called a world here is called a universe.

>> Judith Bishop: Or a zone.

>> Nikolai Tillman: Or a zone.

>>: Or a game.

>>: And we can --

>>: Real contest.

>>: A universe is a zone. A sector is a world.

>> Nikolai Tillman: Oh, right, right. Yes.

>>: Do you have a mapping table of that?

>> Nikolai Tillman: Yes. We should document that somewhere. Okay. Yes, that's it. This API is publicly available for anyone to use. It's basically a testing service in the cloud, and a game service in the cloud. And with that, I am a bit over time, but I am done. So Code Hunt is a serious game, and it's powered by an industrial-strength test-generation engine. There are various resources you can leverage to look behind the scenes, such as api.codehunt.com, which I just showed you.

>> Judith Bishop: That's a new one now, remember.

>>: /codehunt.

>> Judith Bishop: /codehunt.

>>: Oh, you have the new website?

>> Judith Bishop: We have a new website.

>> Nikolai Tillman: Oh, right, I see what you mean.

>> Judith Bishop: It's Friday. We've got a new website.

>> Nikolai Tillman: So it's so new that it wasn't on here yet.

>> Judith Bishop: So this is another website to write down.

>> Nikolai Tillman: Yes. On our Pex research website, you can find all of our papers that talk more about the engine, and on the Code Hunt website, you find dedicated information about Code Hunt, what we have already analyzed and different views on it. I think Judith will give a particular overview later on. Okay, any more questions?

>>: What is the license behind the Pex?

>> Nikolai Tillman: The license behind Pex and Code Hunt and everything? So Pex comes as a -- maybe I go back to the family slide. How do you do that? Okay, so here are all of our products. Pex, Moles and Code Digger are Microsoft Research downloads, and they come with licenses, which by now I think do not --

>>: [Indiscernible].

>> Nikolai Tillman: Still?

>>: Yes.

>> Nikolai Tillman: Okay, so they come with licenses that basically say not for commercial use, and we give no support whatsoever, officially. It's closed source. As for Pex4Fun and Code Hunt: if you submit code, you basically grant Microsoft the rights to do whatever's necessary for the experience, and we also store it and reserve the rights to do further research on it, research by Microsoft Research, but also research in collaboration with external partners, like potentially you. That's part of the license agreement, and everyone has to implicitly or explicitly agree to it before they submit to those websites.

Now, the shipping versions of what's now called Fakes and Smart Unit Tests, they come with full-blown commercial use licenses as part of -- like everything else in Visual Studio. So if you actually want to use it to make money yourself, you should buy Visual Studio Ultimate, and then you can.

>> Judith Bishop: So I think the background to what you're saying is to what extent would this one day be open source, and that's probably something we'll talk about later down the line in the next two days, for sure. Thanks very much, Nikolai.
