>> Judith Bishop: Thank you very much. Thank you for all your feedback and suggestions. I think that
after this talk on Analytics you’re actually going to have some more, because I’m going to switch a little
bit from the focus on single puzzles to the focus on lots of puzzles. What happens when you have got
lots? How can you use all these puzzles and make a big experience out there for everybody and then for
yourself once you get the data back?
Those are the two things, you’ve got the puzzles. You want to actually get them out there to the world.
You’ve seen a small experience of that. But now how do you get the data back and use that data
perhaps in a feedback loop or perhaps just for your own research?
To summarize, what Code Hunt and other gaming platforms like it do is enable you to actually work at
problems you otherwise wouldn’t work at. It’s this old thing about learning how to play the piano. You
need ten thousand hours, they tell you. Well, if it’s fun and people make it fun, you actually put the work
in. Learning to program is one of those things. If you can make it more fun, and discovery is a powerful
driver for that, then gaming, which is in Code Hunt, joins these both up.
Now one of the biggest gaming platforms is the ACM Programming Contest which is highly regarded.
Quite respectable, it’s been going for a very long time. But Code Hunt is completely different to the ACM
Programming Contest. The reason being the ACM Programming Contest tells you what to do. It says
here’s a specification, solve the problem. Now you might say well it’s fun solving problems. But that is
not the way that Code Hunt works. Code Hunt says there’s a problem out there. What is the problem?
It’s a little bit more like detective work.
As you were told earlier it’s all built on top of Pex for fun. Now when we started this whole thing we
realized that in fact we had different audiences. I think this is important when you’ve got a research
project. You can’t be having one size fits all on everything. Obviously it is one size, I mean there is only
one project, but different parts of it talk to different people.
We’ve got the coders and students out there. When I talk about coders these are people like
developers. We have a lot of developers who use this. It’s not just students. We don’t know who’s
actually using it. But we know they’re not just students. We have educators. We’re not currently
servicing educators in a big way. We hope that at the end of this workshop that we will be on a path to
have a plan for educators.
Recruiters, now within Microsoft you might not be totally aware of this. We have people who recruit
students and they’re using this platform now, ever since September. They’re watching what’s going on
in this platform and see who are the top coders. They treat these people a little more specially when
they go out to the universities. This is not unique. All industrial companies do this. In fact Microsoft’s
probably the last one to do it, as usual, sorry.
[laughter]
Yeah with this particular thing. Researchers also, and this is the main thing that we want to do, mine
extensive data in Azure. We then can evaluate how people code and learn.
To give you some idea of the data we’ve had several hundreds of thousands of users since March.
March is when we really kicked off this platform. It was about March the third. We can take statistics at
any time over any period. It’s all worked, and here is a picture which was over a period from September
to October. It was about a one month period. What it’s showing is the loyalty of users. This is one thing
on the website, one thing that the Visual Studio website gives you.
The blue figures are new users and the purple ones are returning users. If you’ve got about fifty
percent, which is what this one is showing you, then that’s considered to be pretty good. You’ve got a
lot of people coming in. You’ve got existing users staying on the platform and this is a good thing.
Whoops, here’s some figures about languages and also about the global interest on Code Hunt. Now
this changes from time to time. It all depends where the contests are. But as you can see Polish at this
time was a huge interest because we were running contests in Poland at that time. It’s dropped a little
bit. Also Brazil which is Portuguese up there, they seem to have picked it up all on their own without us
doing anything. There’s a big interest from Brazil. We’ll have another bigger interest from China when
we start the Chinese competitions in April. Globally you can see the effect here. Europe is bigger at the
moment or at that time when we were taking the data than North America was.
That is Telemetry data which is on the website. We’re just reflecting what’s going on, on the website.
We also have other data which we mine which is at any time people can answer a ten question survey.
You’re welcome to answer it. It’s a little feedback button on the right hand side of the website. Some of
the questions in there are interesting.
I wrote the questions and I wrote them in my particular style. They might not pass a test for the best
way of writing questions. But I said: how much did the puzzle aspect of Code
Hunt keep you interested in reaching a solution? As you can see a lot and quite a bit, if you add those
two numbers up come to eighty-three percent. The puzzle aspect kept people interested.
Then the next one I was interested in, out of the ten questions, was: were your final solutions well
structured code, in your opinion, right? The answer came back seventy-seven percent felt their code
was good. Not only did they feel it was good but I put the little piece in there. I kept tidying my code
up. They’re not submitting rubbish just to get the answers. That was what I was trying to get out of
people, right. But at least fifteen percent said there were extra statements in there which they just left.
These are interesting. But these particular statistics were not really relevant to contests that we’re
doing.
For contests, and this is where the rest of the talk will move to, what we want to do is identify top
coders and make these competitions fun. Now the little green and orange block
here is what motivated us to move into this area. Our Chinese outreach colleagues were running a
competition. They asked us to help them with this competition. Would we use Code Hunt alongside the
ACM Programming Competition? We did. What happened was that of the two thousand players in the
competition we saw that there was an average of forty-one tries per player, across all puzzles, across all
players. This was just a raw figure that we could extract.
But if we took the top three hundred and fifty players, and that was just a place where we drew a line,
the number of tries was just over seven. It seemed as if in this big competition the number of tries, for
whatever reason (and we have been discussing reasons as to why tries might or might not be a measure),
was an indicator of top coders. There were four sessions in this competition, so it went on a while.
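The comparison described here, average tries over everyone versus average tries over the top-ranked players, can be sketched roughly as follows. This is only an illustration: the record layout, the player names, the scores, and the helper names `average_tries` and `top_players` are all hypothetical, and the numbers are invented rather than taken from the contest.

```python
from statistics import mean

# Hypothetical records: (player, puzzle, tries). Values are invented for illustration.
attempts = [
    ("ann", "p1", 3), ("ann", "p2", 5),
    ("bob", "p1", 20), ("bob", "p2", 40),
    ("cat", "p1", 8), ("cat", "p2", 12),
]

# Hypothetical final scores, used only to rank the players.
scores = {"ann": 90, "bob": 30, "cat": 60}


def top_players(scores, n):
    """Return the set of the n highest-scoring players."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def average_tries(records, players=None):
    """Mean tries per (player, puzzle) record, optionally restricted to some players."""
    selected = [tries for (player, _, tries) in records
                if players is None or player in players]
    return mean(selected)


overall = average_tries(attempts)                      # all players
top = average_tries(attempts, top_players(scores, 1))  # top-ranked player only
print(overall, top)
```

Where the cut-off for "top" is drawn (three hundred and fifty in the talk) is a free parameter; the sketch simply passes it in as `n`.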
We decided to stick with this measure. Since then we’ve had fifteen competitions and I’ll tell you a bit
more about them. Some of them had only about twelve people in them. Some of them have had
thirteen thousand people in them, a bit crazy. We’ve got regional contests around the world. We’ve got
Imagine Cup contests. Imagine Cup is a big event that Microsoft runs and Code Hunt is part of that on a
monthly basis.
Now, how do we create new contests? Let’s give you some insight into this whole process. It would be
wonderful if puzzles fell from the sky.
[laughter]
Unfortunately they don’t, any more than exam questions fall from the sky. I think for those of you who
are educators it is exactly the same kind of exercise you have to go through. You have to think of a
problem and you have to think of a new problem. You have to think of an appropriate problem. You
have to be good at thinking of problems. We didn’t force that on you today. We gave you problems just
to amend.
But we’ve got a couple of people who think of new problems. Then we’ve got to put them in a bank and
although it says two fifty we’re up to three hundred now. For each puzzle we have properties.
Tomorrow we’ll be showing you live what the bank looks like on a live tool that we’ve been constructing.
On the right hand side you’ll see the kind of things that we record. We record a puzzle number. We
have five categories, very, very simple, numbers, arrays, strings, bools, binary. That’s it and that’s the
kind of puzzle it is. Then we have the description. That’s highly secret. For the first time today you’ve
actually seen some of those descriptions. But before today we’ve never let them out because that
would spoil the game if anybody got to see them. Please don’t tell people about those.
Then there’s a key that says who wrote the puzzle. Then a number two, ah what’s that? That says how
difficult we think the puzzle is on a scale of one to five.
>>: [indiscernible]
>> Judith Bishop: Right, so that is a thumb suck. It is the person who’s doing it and has been doing
puzzles for a long time says this is two, right. This is going to be the crunch of what I’m going to tell you
in a minute. Then we create a lot of contests. Now because I’m the one who uploads these contests in
the same way you’ve just uploaded one, I get to see lots and lots of contests. These are a lot of
contests that I can see. You can see the Poland one. You can see a lot of Imagine Cup contests there
and various other ones. I can actually go in there and check the results of those contests. Once I’m in
any one of those, as you know, I can also check the leader board which is public and the dashboard which
is private to the owner of the thing.
Now when a contest is over we update in the bank the fact that the
puzzle was used in that contest. Now that’s important because in certain cases we don’t want to reuse
a puzzle. We don’t want to reuse it immediately. Maybe we’ll reuse it in six months’ time or something
like that. There’s various structures. It’s just like an exam question. Then we modify the difficulty.
Great question there Mark, the subjective difficulty needs modification. How do we do that? Well,
we’ve got two options. The one is well we could modify it based on how the user scored on that puzzle.
Or we could modify it based on the number of tries. We decided to go for the tries because it depends
on the mix of students who enter. It also depends unfortunately on the internet speed so that’s a
disadvantage. It also depends on whether students use an IDE so they go offline and then they come in.
They might only have two tries but they’ve actually worked on the puzzle a lot. It’s a bit of a
disadvantage. But for the sheer numbers that we have we think we’re winning on this metric.
Now you’re going to have all the writing. This is the formula we use that we publish. It’s not secret. It’s
the perceived difficulty. What happens is that we calculate this. We multiply it by the number of people
in the contest. We replace the difficulty by this new difficulty across all contests at that time. The
difficulty is updated after each contest. I’ll show you how spectacular that can be or how incremental.
Yeah?
>>: Why tries and not time to solution, which, you know, from what we’ve heard seems like a better
measure?
>> Judith Bishop: Which?
>>: Time to solution, how [indiscernible].
>> Judith Bishop: Well, now well you see we specifically never use the time to solution because people
can go to the bathroom. It can be night time where they are. You know it can be midnight they go to
bed, they come back. We specifically never ever use time.
>>: [indiscernible] the kind of things that would affect time are the kind of things that if you have a large
number of users they would average out. Versus with tries it seems that it’s easier to get systematic
biases based on you know one group of people who’s just accustomed to pressing Ctrl S every time for
example, every time they type something.
>> Judith Bishop: We’ve had this discussion a lot and time has been one of the things we really didn’t
want to get into. Because a game is something you sit with and you walk away from, and you sit with,
and you walk away from. Do you want to add something there Pelli?
>>: We don’t know when they stop playing if they are still working on it or whether they just don’t.
>> Judith Bishop: Yeah, Alexi?
>>: I’ve commented sometimes, so time works pretty well if the competition is time constrained, like it’s
one hour, two hours, four hours. You can assume that no one goes to the toilet maybe.
[laughter]
[indiscernible] it lasts for two days and people can go sleep and so on. It’s really a bad metric in this case.
>> Judith Bishop: Yeah, most of these contests are two days which actually I’m not very keen on. But
that’s the way they’ve been setup so, Nigel?
>>: There’s also the complication that the person can give up on the puzzle for quite a while and then
come back to it again later. How do we add that time effect?
>> Judith Bishop: We really did think about it. You know these time lapses are awkward to try and nail
together after the event, yeah?
>>: You do know the time stamps for each try, right? I mean, so you can go back to the data and
[indiscernible]…
>>: We could be smarter. We could definitely try to put that together easily.
>>: Of course everything is time stamped, it’s recorded.
>>: Right, so we got this great suggestion earlier, times, right. That counting more than one try within
thirty or sixty seconds is not useful.
>> Judith Bishop: Yeah, so I think what’s coming out of this is that there is scope for other ways of
calculating this. I mean I’m going to tell you how much data we’ve got and all the data’s there. It can be
recalculated all over again. It’s not lost at all. Yeah, we can do the figures again and see what happens.
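The suggestion raised in this exchange, that more than one try within thirty or sixty seconds should not be counted separately, could be applied to the time-stamped data along these lines. This is a minimal sketch under assumptions: the function name `count_distinct_tries`, the use of plain seconds for timestamps, and the choice to measure the gap from the last counted try are all illustrative, not something the speakers specified.

```python
def count_distinct_tries(timestamps, window_seconds=60):
    """Count tries, skipping any try made less than `window_seconds`
    after the most recently counted try.

    `timestamps` is an iterable of try times in seconds (hypothetical format).
    """
    count = 0
    last_counted = None
    for t in sorted(timestamps):
        if last_counted is None or t - last_counted >= window_seconds:
            count += 1
            last_counted = t
    return count


# Invented example: a burst of three quick tries, then two closer tries, then one more.
tries = [0, 10, 20, 100, 130, 200]
print(count_distinct_tries(tries, window_seconds=60))
print(count_distinct_tries(tries, window_seconds=30))
```

Measuring from the last counted try means a long burst of rapid saves collapses into one try per window. Measuring from the immediately preceding try instead would collapse the whole burst into a single try; which is more appropriate is exactly the kind of question the recorded data could be used to test.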
Here we’ve got the formula which actually just looks at two things. The first one is the B one, tries, which
rates the difficulty of a puzzle progressively higher as more attempts are used to solve it. But the C
one determines whether the players are improving their solving skills as they progress. Distance means
how far are you in the actual game and you multiply by that. Also the set of players at that stage is
getting progressively smaller as we’ll see. That is taken account of there.
Here’s a nice table. We’ve got two different real contests. The first one was the China contests here
where we had four rounds, qualification, preliminary, preliminary, and semi-final. The subjective
difficulty was there and the perceived difficulty was very, very similar, alright. Players who started, as
you can see there were a lot of people in the game.
Now we then had another one where the two communities here, students and teachers did exactly the
same contest. The formula produced somewhat different results and also fairly higher results. But the
problem here was that numbers were very much smaller. You probably can’t trust those results as
much as you can trust the other ones.
>>: What was the end of this one, do you know?
>> Judith Bishop: Huh?
>>: What was the end of the second one, the number of people?
>> Judith Bishop: They are out on the far right there. Oh, so we had sixty-one and fourteen, very small
numbers in those contests.
>>: When you say perceived you mean your calculation right?
>> Judith Bishop: Yeah, yep the perceived difficulty. I just want to show you one more which I put up
here. Here is another one. This is a contest that we ran. These are the kind of results that come out at
the end of a contest. These are the puzzles down here. This is the one that you were in Alexi and
Jorgen, you tried this one. These were the original difficulties. This is what came out of it.
You can see that down here there were two puzzles that had an average of nine or eighteen. Whereas
we thought they were difficulty two, okay. That will get folded back into the difficulties. But we
didn’t have a lot of people in this contest, so there’s not going to be a huge impact from it, not as much
as there would be from some other contest where we had eight hundred people and there was really
quite a bump in the thing.
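Why a small contest barely moves a puzzle's difficulty while a large one can bump it can be seen with a participant-weighted average. The transcript does not reproduce the published formula, so the sketch below covers only the weighting step as described in the talk (multiply each contest's perceived difficulty by its number of players, then combine across contests). The function name and the sample numbers are assumptions.

```python
def updated_difficulty(observations):
    """Combine per-contest perceived difficulties into one value,
    weighting each by the number of players in that contest.

    `observations` is a list of (perceived_difficulty, num_players) pairs.
    """
    total_players = sum(n for _, n in observations)
    if total_players == 0:
        raise ValueError("no players recorded for this puzzle")
    return sum(d * n for d, n in observations) / total_players


# Invented numbers: a puzzle rated about 2 in a contest with 800 players,
# then perceived as 4 in a contest with only 14 players.
print(updated_difficulty([(2.0, 800), (4.0, 14)]))   # stays close to 2
print(updated_difficulty([(2.0, 100), (4.0, 100)]))  # equal weights: midway
```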
Now we have another piece of data. As you know, probably the most well known part of Code Hunt
is the default zone, otherwise known among ourselves as the APCS zone because it teaches computer
science. It’s got a hundred and twenty-nine puzzles and it goes through all the details of computing.
Actually forty-five thousand, we’re up to about seventy-five thousand users have started this zone now.
You get a very steady drop off of people on it. They start and they drop off. The drop off will be very
similar. I’m just waiting for the data to come out of the computer Daniel, right. Then I’ll be able to draw
the new graph.
Now what we did and this is a pretty difficult graph to understand but let’s see if we can do it. That blue
line is the same as what I’ve just shown you, alright. The blue line is that line. I’m indicating at a
somewhat bigger scale the drop off rate here. Then I’m indicating with the orange bars the percentage
of a thousand winners who dropped off, right, because of the previous, who changed between two
levels, right. The steep drop offs, here’s a good one. You can see this one. You see that steep drop off?
That’s how much it was.
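One way to compute the kind of level-to-level drop off shown in the graph is to compare how many players reach each consecutive level. The sketch below assumes a simple list of player counts per level; the function name and the counts are invented for illustration and are not the data on the slide.

```python
def drop_off_rates(players_reaching_level):
    """For each consecutive pair of levels, return the fraction of players
    who reached the earlier level but not the later one."""
    rates = []
    for prev, cur in zip(players_reaching_level, players_reaching_level[1:]):
        rates.append((prev - cur) / prev if prev else 0.0)
    return rates


# Invented counts of players reaching levels 1, 2, 3 and 4.
counts = [1000, 900, 450, 440]
for level, rate in enumerate(drop_off_rates(counts), start=2):
    print(f"level {level}: {rate:.1%} dropped off")
```

A steep bar in this output, such as the jump between the second and third levels in the invented data, is the signal that would prompt a closer look at what the puzzle at that level contains.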
We then tried to find out well what was actually in that puzzle that caused that steep drop off? Did it
turn people off because that was just an unhelpful puzzle? We discovered commonalities among these
puzzles that caused these drop offs. It was quite startling. I’m not saying it’s definitive but it was
startling. The yellow ones consistently and these are right at the beginning, so these were beginners.
These were large numbers of people. The yellow ones, that’s this one, this one, this one, and this one all
involved puzzles with division. Simple puzzles with division, X over ten, ten over X, that kind of thing.
>>: [indiscernible]
>> Judith Bishop: Huh?
>>: [indiscernible]
>> Judith Bishop: Yep.
>>: [indiscernible]
>> Judith Bishop: Any kind of division. It wasn’t, you know, modulo or something like that. It was just
plain division, so they didn’t do it. The blue ones were operators such as binary operators. Those were
these ones here. Again, people didn’t think of those patterns and therefore couldn’t solve the problem.
The green ones were sector changes. When you
switch from one sector to the next sector sometimes people just gave up at that sector and said well I
finished a sector now I’m not coming back.
>>: Often a sector introduces a new concept [indiscernible].
>> Judith Bishop: Yes, correct, so you know they might be moving onto loops and they might not know
about loops. The sector says loops and they say oh what’s this all about? They don’t go on, right.
In order to keep people on the game we do need to have results and rewards. As we’ve discussed
among ourselves already, a ranking on the leader board, the public ranking, and so on, and the way in
which the rules are written for a contest. It’s always based on score. It’s not based on time or attempts
except in so far as if you’re the first there that breaks a tie.
Now the score is based on how many puzzles solved, how well solved, and when solved. That’s the
order in which the ratings are. We’re very keen that rewards should be given. Automatically for
example in the Imagine Cup contests there’s a prize. In fact Imagine Cup gives a thousand dollars and
then they send out certificates.
We think that actually there should be more of this. What we’re proposing for the new Code Hunt Zone,
which will be coming out soon, is that there will be more certificates that go out for achievement during
the playing of the game. Maybe if you get past two sectors and then you get past another two sectors
or another one sector you automatically get a certificate. We’re interested to hear what you think
about that as well. We think achievement is quite a good thing.
This is the kind of data that we extract out at the end of a contest. This was the September contest; the
names of the people would be here. As you can see this is the score and this is the number of levels that
have been solved. Then we record everything across for every puzzle. It’s an enormous spreadsheet
that we deal with. That then goes over to our recruiters. They get the names of the people, the
universities they’re at, and they do something with it if they’d like to.
What’s the size of this data? Well it’s a lot bigger than we thought. At the moment we’ve got all these
contests that have run. At a minimum we’ve got seven hundred twenty-five thousand or more people
who have begun the APCS. For each of those what happens is we’ve got a number of puzzles. For each
puzzle there’s an approximate number of tries. Okay, not everybody solves the problem so you get this
jagged array, like this for every contest.
It’s about half of a matrix. But for example the Imagine Cup September contest, which we hope to give you
tomorrow, had two hundred and fifty-seven users and twenty-four puzzles. If we multiply it out at
approximately ten tries it’s thirteen thousand programs. We know that. There it is, this is it here: an
average try count across all of the puzzles, maximum try count, and total solved users, which was fifteen
hundred and eighty-one. It’s big numbers just for a single contest. We’re running them all the time.
We want to give you that data. We will give it to you tomorrow. What it will be is just a whole bunch of
programs. But they will be indexed by user so that you can go in across a particular user and find out
how that person is progressing in programming. If that is the kind of thing you want to do.
Lastly, we always look at gender. It’s an important issue in anything we do here. I’m sure for you as
well. Does this game contest appeal to men and women alike? We asked the question on that
survey whether the people were male or female. They were free to answer that question. I don’t think
that six hundred and eighty-two is correct. I think it’s meant to be; yeah it is six hundred and eighty-two,
sorry, so converted to percentage, eighty percent of the people were male, twelve percent female, and
then there were some who didn’t choose.
What’s interesting is that almost equally they played it for their own enjoyment, right. Or they
played for a course in Java or a course in C#. Those were other questions. But this is the one that I
really liked is that equally they were playing for their own enjoyment. More particularly this twelve
percent female tallies in the US at least with the current percentage of women in computer science
courses. Now I know that’s going up and I hope it will continue to go up but when we took the survey
that was more or less the same.
Now we’re going to go towards the course experience and we’ll be doing that tomorrow. One of the
places where we can do that is in Office Mix. Hopefully there will be time to have more discussions
about Office Mix. Office Mix is part of Office. If you’ve got the latest Office you can just download it as
a plugin. Code Hunt is there, we heard this morning that there’s loads more Code Hunt courses.
This is a simple little one that one of our interns did on Strings, Alisha. She was a great kid. You do a
whole PowerPoint and right in the middle you can pop in a Code Hunt slide. You actually work on Code Hunt
and then carry on with the slides. Then you come back to Code Hunt and carry on with the slides.
In summary we’ve got a very powerful and versatile program. We’ve got large numbers. We need to
really test more hypotheses on these numbers. We don’t believe we’ve got all the questions or answers
correct, we’ve only got some of them. We’d really like to talk to you about what we could do to hold up
some of these students or players, what’s going on in there. Finally, the course experience is probably
our most pressing next step.
As an advertisement Imagine Cup are running the next contest for a thousand dollars starting on
February the twenty-first, four p.m. in the USA, Pacific daylight time, midnight UTC. If you want to start
with that alert your students. Thank you.
[applause]
Yeah?
>>: During your presentation I whispered a question to Pelli and now I wanted to share it with you and
the whole crowd because I think it’s important. The summary slide fits it. I think as you try to move
Code Hunt more into the course experience, the education experience, all this sort of stuff. There’s a
big unanswered question. To be clear I’m not actually skeptical but others will be. It’s a deep question
which is does being good at Code Hunt mean you’re good at programming? Does getting better at Code
Hunt mean you’re getting better at programming? I don’t think there’s any data in this corpus of data
that helps answer those questions.
>> Judith Bishop: Yes, actually there is. Daniel’s got some evidence.
[laughter]
I think so, I remember a slide, so there is some evidence that progress is made when people are moving
through the APCS course.
>>: You mean there’s evidence that playing Code Hunt makes you better at Code Hunt.
>>: Exactly, yeah.
>>: That’s not the same as evidence that playing Code Hunt makes you better at programming.
>> Judith Bishop: Alright.
>>: Or better at passing the AP test which is arguably a different thing. But nonetheless still an open
question.
>> Judith Bishop: Well that would be a very difficult link to make.
>>: Yes.
>>: I remember, and tomorrow, yeah in my talk I will share some experience I’m using in my course so
that, I mean not that huge amount of data but close up to see what’s going on. I think hopefully we
could have some more discussion then.
>>: Yeah.
>> Judith Bishop: Maybe you’ve got some as well, Alfred?
>>: I’m, I hope to have some. I don’t have enough, I haven’t accumulated enough data yet but it’s
definitely something I’m very interested in and it’s a big focus.
>> Judith Bishop: Alright, sure.
>>: [indiscernible]
>>: [indiscernible] contest in China that, so people could use either Code Hunt or other ways to get into
the final, right?
>>: Oh, yeah.
>>: In fact the Code Hunt people did very well in the final which was not Code Hunt.
>> Judith Bishop: Oh, yes, actually they won the contest. There were ten teams invited to the
finals. The one that won the finals was actually a team composed of students who came in
through the Code Hunt route.
>>: Other people came in from more traditional ways of programming contests. But the final was a
more traditional [indiscernible] contest, not Code Hunt?
>> Judith Bishop: No, so but my full answer to all of this is that there are many aspects to programming.
This is your serious, serious coding aspect. One of the things I think that, yeah.
>>: Wait so this is why I’m pushing back, sorry. Is that Code Hunt is sort of unique in this. It’s your
second bullet, right. It’s unique in trying to solve, in trying to guess what the secret program is, right?
It’s very engaging. It’s very fun. It gets students to think in this very logical manner but I think there’s a
larger gap between that and what programming traditionally involves than the other pedagogic
techniques that have been tried. I think there will be more skepticism in does this help do the
traditional thing because it’s more different.
>> Judith Bishop: Yeah, right, yeah. One of the criticisms that we’ve been hearing in education
conferences which I’ve been attending has been that because students have been concentrating on
team work and these other aspects of programming such as GUIs and softer areas if one might put it
that way. They can’t reverse a linked list. They’ve lost the ability to do hard coding.
The pendulum had swung a little bit too far to the one side. You often hear this from professors who
have second year students. What did they learn in first year they can’t code, right? I think it really is a
mix and there’s room for a lot of aspects.
>>: Yeah, but putting it that way I think the interesting study which I’m not signing up to do but
someone should do it.
[laughter]
Is in comparing a well chosen set of Code Hunt puzzles versus sit down and write a program to reverse a
linked list.
>> Judith Bishop: Right.
>>: I’d love someone to do that, yeah.
[laughter]
>>: Right, that’s the comparison, right.
>>: If you have students that [indiscernible] studies…
>> Judith Bishop: Then what would your metric be? Would your metric be time, speed?
>>: The metric would be the exam a month later.
>>: Yeah, it would be the exam at the end of the year.
>> Judith Bishop: Okay, well that’s for the educational research community. We’ll have to get them
excited at the next [indiscernible] Pelli.
>>: Right, yeah I mean should you give, should you keep it fuzzy and make it a game or should you give
the specification and…
>>: Yeah in, this is a particular game, right. This guess this program you can’t see, right. We’re arguing
that that particular game has educational value, right. That’s, you know there’s some leaps of faith
there.
>>: This is a particular game, yes, there is.
>>: I want to just chip in one thing [indiscernible]. The [indiscernible] phone could be very powerful in
terms of giving you [indiscernible], right. You should just disclose whatever specification, even the code
that you get into it. You put it in the course that I taught but it’s a mixture of both, right. I mean for
some probably I just directly tell them what’s the requirement. They just use you as a feedback engine
to them where they went wrong, right. But for some other you have brandings I’m guessing to have a
little bit more fun aspect of that. There’s a mixture there.
>>: True.
>>: So in your initial user problem you can put a massive comment with the complete specification of the
problem. That’s not something we do in the regular game play. You could think that as a teacher you
could just do that and then go into a more traditional way where you know exactly what to build.
>>: Yeah, yeah, yeah, I think I made it to that point. There’s this aspect that when we talk about a Code
Hunt puzzle we talk about it as if it were some uniform thing. But in practice there could be huge variety
between you know exactly what those puzzles look like. How carefully they were crafted with what goals
they had in mind. You could imagine running a study and getting completely different results depending
on which body of puzzles you picked.
>>: Yes, right, yes, that’s completely true.
>>: I mean the competitions are, the puzzles are very scrutinized and it takes a lot of time, ask Nigel
how long it takes to get all these puzzles, but yes.
>>: It’s tricky.
>>: Yeah, I really like all these points. But particularly Todd’s point that if I were trying to sell this to an
Intro CS Instructor I might first sell it as a great UI for Pex for fun, right. Give them some practice
problems. When they get it wrong they’ll be shown a test case that doesn’t work. They can, you know
give them the specs. But if you want more engagement, if you want to setup fun competitions there’s
kind of game mode, right. Again, this is not how I would sell it to the masses. But it’s how; if I were
trying to get it into a particular classroom it’s much closer to the sub-traditional model, right.
>>: The feedback I got from the [indiscernible] teachers who teach APCS is it’s more down to earth. They
have zero time to invest. They want their Chapter one point two. There’s a website that gives them one
point two. That’s where you go. They use Practice It because of that, because it maps one to one to the
book they’re using, saves them time. They don’t really care about the interface or anything like that.
That’s just a time, and educational a lot like that teachers make decisions knowing if it’s the best thing
because they have no time to spend with it. It might be overfitting also much more best.
>> Judith Bishop: Okay, so it gives me a lot of pleasure at the moment now to introduce Willem Visser.
He’s come all the way from South Africa via South America. He took the longest journey to Seattle. He’s
going to tell us about the slider bar you want.
>> Willem Visser: Okay, thank you very much. Yeah, my little joke about being from a few miles south I
wasn’t quite sure it hit home earlier. I had to put South Africa there at the bottom. It’s the very south
of South Africa as well which is probably even further away.
Okay, so what I’m going to talk about is actually something very simple. I’m not quite sure how hard it is
to actually do. It might be quite hard. But the idea is the following. What you have at the moment, you
have Code Hunt and it’s built on symbolic execution. What you’re interested in really is just, is
something SAT or UNSAT. Then you need some solutions because we need those solutions to be able to
tell you or to guide the student, whoever to see what hidden specification they’re trying to find.
Now what I’m going to talk about is something called Model Counting. In Model Counting what you’re
after is the number of satisfying solutions. I’ll show you in a second kind of what that means. But the
idea is can you use this to add a Progress Bar? That’s the one slide summary.
Just for these examples, we’re going to go to a perfect world which by the way is roughly the world we
live in anyway in Code Hunt, because it’s usually linear integer constraints. It’s the kind of stuff that
symbolic execution is pretty good at. It turns out that if symbolic execution is good at it so is Model
Counting by [indiscernible].
>>: [indiscernible]
>>: Yeah.
>> Willem Visser: Yeah, [indiscernible], that’s always good. We’re going to work with uniform
distributions, which is kind of important for these calculations. Again, in these examples uniform
distributions are actually what you want anyway. It’s going to work well.
I’m going to quickly, in about one or two slides, just recap symbolic execution. Simply because this kind
of symbolic execution that I’m describing here is actually the opposite of the kind of symbolic execution
you saw earlier with Pex. Because this is what’s, I don’t know if it has a real name but usually people call
it like symbolic execution or classic symbolic execution because it’s not dynamic. Dynamic symbolic
execution is when you execute the program with concrete inputs and you collect some constraints as
you go. This stuff you just start with the symbolic input.
I have a little program here. I start here with two symbolic inputs x and y and my path condition or the
constraint under which I reached that point in the code is currently true. If I start executing if I want to
reach this S zero statement, never mind what it is. Then Y which was the input y needs to be equal to x
times ten, the input x times ten. From S zero I can reach S two. Under this constraint which is a
conjunction of the one you had plus this thing here and similarly you could have got to S three, and on
the other side S one, and S two, and S three.
Basically for all these locations you execute the path through the code and you have a constraint.
Basically this is what we’re going to be working with. These constraints are the constraints on the input to
reach that point in the code. Now if you’re going to be doing this symbolic execution you can now solve
these constraints. In this example they’re all satisfiable so that’s not very important. But you can get
these inputs like one and ten, zero and one, and four and eleven. This for instance will cover all these
statements. That’s kind of your vanilla symbolic execution.
Now what we’re going to be after is a slightly different view of the world. Where we start off with this
big volume of values and they’re going to be flowing down the program. Here’s an example of what I
mean, so this is the same program except now I have suddenly a range here of values. I say x and y will
be hundred values from zero to ninety-nine. Then there’s a barrier here in the beginning that’s y equals
ten x. Some of the values split this way where it’s true. Some of the values split this way where it’s false
and similarly for this other if here.
Now the thing is though this is not quite accurate, right, because these rivers of values or flows of values
they’re not equally distributed in this way, right. A question would be if you had these four paths which
of these are actually the most likely paths? Okay, so when I first looked at this just blindly I thought this
one here looks like it should be the widest path. However, it turns out this is roughly what it looks like.
Actually the most likely path is this one. These two are almost equally likely. But it’s very small sliver
compared to all the values that go that way.
Now question is how did we come up with these metric widths? We used Model Counting. There’s a
Model Counter called LattE that we’re using. It counts the solutions for a conjunction of linear
inequalities. It works really very well. There’s another one called Barvinok that Diego there is using. If I
can get it to compile I will also use it. But it’s still a mission but LattE works really well for what we want
to do.
Now with this we can actually calculate these values precisely. For my little example here,
nine thousand nine hundred and ninety go this way, and so forth, right. This is the basic technology
that we’ll be using. We have these constraints, these path conditions that come straight from symbolic
execution, so they’re not mysterious. What we’re trying to figure out is simply how many solutions are
there in each one of these.
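
The counting described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: brute-force enumeration stands in for a real model counter such as LattE, the helper name count_models is made up for this sketch, and only the first branch of the slide program (y equals ten x, with x and y each ranging over zero to ninety-nine) is modeled, because the second branch is not spelled out in the talk.

```python
from itertools import product

def count_models(path_condition, lo=0, hi=99):
    """Count the (x, y) pairs in [lo, hi] x [lo, hi] that satisfy a path condition.

    Brute force is used here purely for illustration; a model counter
    computes the same number without enumerating every input.
    """
    return sum(
        1
        for x, y in product(range(lo, hi + 1), repeat=2)
        if path_condition(x, y)
    )

# Path conditions for the two sides of the first branch.
true_branch = count_models(lambda x, y: y == 10 * x)
false_branch = count_models(lambda x, y: y != 10 * x)
total = 100 * 100

print(true_branch, false_branch)  # 10 9990
print(false_branch / total)       # 0.999
```

Dividing a count by the size of the input domain gives the probability of taking that path under a uniform distribution, which is where the "three nines" style of reliability statement comes from.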
Now what can you use it for? Well program understanding for one thing, to kind of give you an idea
of what happens in your program. You can of course add probabilities to this because you know what
the number of values are. You know they say almost eighty-five percent chance of your program
coming down this way. Once you have probabilities you could use it for reliability. If you know
something bad happens here then you know this program is three nines reliable.
This is nothing to do with what we’re going to talk about now. What we want to do is this. Can you add
a progress bar? How wrong is the program? That’s kind of the same question. Which correct program
is better? Now you heard already that the “which correct program is better” spec at the moment is the number
of instructions. I’m going to kind of show a slight alternative to that, as well. But that’s actually not the
most important part. Most important part is how do we rank incorrect programs?
The example I’m going to be using, this example I used before I heard of Code Hunt. Then since I’ve
heard about Code Hunt I’ve tried to also include some examples from Code Hunt. But the Triangle
Classification is a classic thing in the testing community. You have three sides of a triangle and you need
to classify what type of triangle it is. I have four versions of this program, two correct versions, two
buggy versions. The two buggy versions, the one I actually or one of my colleagues accidentally created
with a small typo. The other one was from a coding exercise we gave our students at one point. This
other buggy version comes from there.
Okay, so here’s the correct version. It’s kind of like a bit of an elaborate one. You really don’t need to
know what it does. Except that the incorrect version just swaps around right here the greater and less
or equal, right. That’s only little one character mistake that is in this program.
Here’s another correct version. You can see it’s a lot more succinct than the first one. The second buggy
version is almost exactly the same logic as in this program except that this big thing here actually misses
a case. There’s a condition that’s been missed in this example. But really what these programs do is
somewhat irrelevant to the presentation.
Okay, so now the first question you might have is if you look at these two programs which one do you
think is better? Okay, I know the Code Hunt spec will say this one is better because it has a lot less
instructions, right. Another way to look at it is to look at all the constraints of the different paths
through the code. There are these, I think thirteen or something paths here.
The color coding is not really very important except that it signifies the different types of outputs. If you
look here there’s like four outputs: return, well, one, two, three, four are the values you can
output. This is the color coding for those ones. For example there’s ninety-nine here; it’s the one where all
the sides of the triangle are equal. The yellow ones are the cases in which it’s not a triangle. Okay, now if
you look, for the first correct version these are the partitions and this is for the second
one.
The second one has less paths and more importantly it has less paths that are dealing with the case of
the illegal triangle. These yellow ones here are less. Now it’s not rocket science. Lots of people have
already mentioned this. But of course you can therefore use the number of paths for the code as an
actual metric for which one of the programs are better.
But that’s kind of like somewhat uninteresting portion of the talk hopefully. Maybe this is going to be a
bit more interesting. The, you saw this before basically. This is the little harness that you use also in
Code Hunt. This is my version of it. For this specific program it takes the three inputs, you run the
buggy version, you run the oracle, and you assert they’re the same. Buggy version or whatever, the
student version, against the oracle. Then you check if they’re the same.
All I’m suggesting is to do the following. You record the path conditions for which this assertion fails and
then you see how big those partitions are. How many values are actually going down this stuff? That
simply is going to be the metric for how buggy your program is.
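
The harness and the metric can be sketched as follows. Everything concrete in this sketch is an assumption made for illustration: the talk’s actual triangle programs are not reproduced in the transcript, so the oracle, the flipped comparison in student, the mapping of return codes, and the bound of one to twenty on each side are all hypothetical. Enumeration again stands in for recording the failing path conditions and counting their solutions.

```python
from itertools import product

def oracle(a, b, c):
    """Hypothetical reference: 1 scalene, 2 isosceles, 3 equilateral, 4 not a triangle."""
    if a + b <= c or a + c <= b or b + c <= a:
        return 4
    if a == b == c:
        return 3
    if a == b or b == c or a == c:
        return 2
    return 1

def student(a, b, c):
    """Same logic with one hypothetical typo: the first '<=' became '>'."""
    if a + b > c or a + c <= b or b + c <= a:
        return 4
    if a == b == c:
        return 3
    if a == b or b == c or a == c:
        return 2
    return 1

def fraction_wrong(candidate, reference, domain):
    """Share of inputs on which the harness assertion candidate == reference fails."""
    inputs = list(domain)
    failures = sum(1 for args in inputs if candidate(*args) != reference(*args))
    return failures / len(inputs)

def sides():
    # Assumed input bounds: each side is an integer from 1 to 20.
    return product(range(1, 21), repeat=3)

print(f"{fraction_wrong(student, oracle, sides()):.1%} of inputs misclassified")
```

The percentage printed depends entirely on the assumed programs and bounds, so it should not be read as a reproduction of the figures quoted in the talk.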
Now if we look at the two programs we have. The correct version, now of course here it doesn’t matter
even which of the correct versions you have because obviously all the correct versions will have the
same paths, right, so for the correct stuff. Therefore, if you look at the correct version versus the first
buggy program and correct versus the second buggy program. It turns out the first one has about sixty,
what sixty-two percent of the input values are actually incorrect, or incorrectly classified. Whereas the
second program has a lot less, like fourteen percent roughly, right.
When we looked at them syntactically the one literally had one character mistake and the other one
missed the whole condition. But the one with the one character mistake is actually remarkably wrong,
right. This is somewhat unintuitive. I mean you would have thought since there’s such a small syntactic
difference that it also would be a small difference in the number of behaviors. But in fact it’s not the
case. Buggy two is in fact closer to being correct than buggy one.
Yeah, so this is just saying the same thing again. I mean the edit distance between the one correct
program and buggy one is very close. However behaviorally it’s a big difference. Of course you can just
almost prove by contradiction think that there is no way you can use edit distance because you could
have written a lot of different versions of the correct program. How can you compare against that
because you might be comparing against the wrong one.
The small issue of this is this is all good and well. But intuitively though if you’re a user of this system
and you see this progress bar and you literally have a one character mistake in your program. You might
not appreciate the fact that the thing’s going to tell you that you’re way off with your current program.
This is a potential thing one should consider.
Now as a side track I was kind of shocked when I saw that result. I couldn’t believe that that small
change in the program actually had such a big effect on the correctness of the program. I thought to
myself well this is a classic mutation operator. There’s this whole area in software testing where people,
mutation testing is supposed to be something that people use to kind of evaluate the test with. But what
actually what ninety-nine percent of people use it for is as a metric to see how good your new yada,
yada bug finding technique is. Because you use it to just mutate programs and then you run your
technique on this and try and see if it can find the bugs.
I was interested, or I’m still interested, in seeing how much of an effect these operators make in terms of
mutating the program. Because if you have one mutation and it suddenly has a huge impact on the
program, of course your whatever bug finding technique is going to kill that mutant. Because half the
behaviors of the program change.
As a little example here we have a one line program. It just is i less or equal to j, this is what you’re
trying to test. Here you have four mutations of that thing. The one is greater equal. That one is just
missing the equal. This is equal and that one just returns true. A question, which of these do you think
is the most wrong, anybody?
>>: In terms of what?
>> Willem Visser: Okay, I think the third one [indiscernible] F one for the third, let’s see. That one’s only
one percent wrong, right, because it’s only misclassifying the equality here. That’s pretty correct. Now
this one is way off, right. That’s with the greater equal actually the only thing it classifies correctly is the
equality one.
Okay, but now here are the two tricky cases. Which one of these two do think is the most wrong or
correct?
>>: That’s wrong fifty percent of the time.
>> Willem Visser: Yeah, almost fifty percent of the time. But the funny thing is these two are actually
the same.
[laughter]
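
The four mutants can be checked by direct counting. This sketch assumes i and j each range over zero to ninety-nine, which is consistent with the "one percent wrong" figure quoted for the mutant that drops the equality; the bounds themselves are not stated in the talk.

```python
from itertools import product

# Assumed bounds: i and j each range over 0..99.
DOMAIN = list(product(range(100), repeat=2))

def original(i, j):
    return i <= j

mutants = {
    "i >= j": lambda i, j: i >= j,
    "i < j": lambda i, j: i < j,
    "i == j": lambda i, j: i == j,
    "true": lambda i, j: True,
}

def wrong_count(mutant):
    """Number of inputs on which the mutant disagrees with the original."""
    return sum(1 for i, j in DOMAIN if mutant(i, j) != original(i, j))

for name, mutant in mutants.items():
    wrong = wrong_count(mutant)
    share = 100 * wrong / len(DOMAIN)
    print(f"{name:7s} wrong on {wrong:5d} of {len(DOMAIN)} inputs ({share:.1f}%)")
```

Under these bounds the counts are 9900, 100, 4950 and 4950: the equality mutant is wrong exactly where i is less than j, and the always-true mutant is wrong exactly where i is greater than j, which is why the last two come out the same.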
It’s like this is wholly unintuitive in my humble opinion. I think this could be actually quite interesting if
you look at the, but now let’s play some Code Hunt. Now this is one of the very early examples in the
Code Hunt, I think it might actually even be the first one. The program, and I’m going to take a lot of liberty
here with my syntax so the stuff fits on the slide. But this one takes x input and I give it the range from
zero to nine, so ten values. You’re supposed to return x times two.
My way of playing Code Hunt, maybe I shouldn’t give too many hints away here because it’s probably
very stupid, but this is how I do it. I look at the results that I get from the kind of hint window so to
speak. Then I just cut and paste it into my program until I can see a pattern. When I cut and paste this
in it says x equals two returns four and x equals six returns twelve, so I stick that in there. As you might
notice I stick it in just like that. I don’t even change the return.
It turns out seven of those values will be incorrect for that program, right; basically seventy
percent of my program is incorrect because ten is the number that flows in there. Okay, so then I look at
what the thing gives me again when I give this program. It adds one more, so it adds two if x is one and
return two. At that point actually notice that there’s a comment that says this returns zero is not, that’s
not usually the correct way to do it. That’s what Code Hunt tells me.
Then I try a next version slightly incorrect but I decide maybe x plus three is the right thing to do here. I
expected this to be a better solution than the one I had before. But unfortunately it turns out it’s the
same number of incorrectness. What I failed to see was if I return x plus three here I do cover the case
where x is three but I actually lose the case where x was zero because that was with this one. I didn’t
actually improve my program by doing that.
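
The three attempts at this puzzle can be replayed by counting. The sketch below is a Python rendering of what is described on the slides (the original puzzle is not Python), with x ranging over zero to nine and the hidden specification x times two; the function names are made up for the illustration.

```python
def target(x):
    return 2 * x

def attempt1(x):
    # Two pasted-in hints, with the default "return 0" left untouched.
    if x == 2:
        return 4
    if x == 6:
        return 12
    return 0

def attempt2(x):
    # One more pasted-in hint.
    if x == 1:
        return 2
    return attempt1(x)

def attempt3(x):
    # Same hints, but the fallback is now a guess at the pattern.
    if x == 1:
        return 2
    if x == 2:
        return 4
    if x == 6:
        return 12
    return x + 3

def wrong_inputs(attempt, domain=range(10)):
    """Inputs in the bounded domain on which the attempt disagrees with the target."""
    return [x for x in domain if attempt(x) != target(x)]

for attempt in (attempt1, attempt2, attempt3):
    bad = wrong_inputs(attempt)
    print(attempt.__name__, len(bad), bad)
```

The counts are seven, six and six. The third attempt gains x equals three but loses x equals zero, which the default return of zero had been getting right by accident.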
Okay, so another example and again full disclosure, this actually happened. What you’re going to see
next, let’s stop laughing. Notice I don’t tell you what the correct version is. We’ll get to that in a second.
This is what the system output for me the first time around. It turns out there’s a hundred thousand
possibilities here and about ten thousand, sorry ten thousand possibilities, x and y were each a range of
a hundred values. I was essentially completely wrong except for the three that was actually there.
Those three were correct because that’s what the tool told me I should do. Then when I ran that it gave
me these two at the bottom here so I added those. Suddenly another two of them fell away. I’m like
almost completely incorrect.
At that point I thought I saw the pattern, right. If you look at these things maybe the pattern is x minus
y, okay. Actually full disclosure I did notice these ones won’t comply with that spec so I wrote a little
extra code to kind of cover those cases but still never mind. When I did this I noticed that, okay, so now
I’m about a hundred values better off. I’m slightly better. My program is slightly better. Then I kind of
tried this one and amazingly enough I went backward, right. Because here I decide let’s just take all this
stuff out and just replace it here with the x minus y is probably a good approximation. Unfortunately I
lost two of my correct answers doing that. I moved backwards.
Then kind of like what if we just fix the code so that if x is greater or equal to y we do x minus y,
otherwise y minus x. Then I chopped off another like ninety-nine values or so. Point is I’m still
completely off, right. This is just, I’m completely wrong with this program because it turns out that
the actual correct program was x plus y. But cunningly the tool always gave me a zero in one of these
things. I thought like in the beginning it’s the difference between these two values. But actually wasn’t
the difference it was x plus y.
I’m hoping to see here that if you have this kind of scenario you should see that you’re so far away from
the answer that you must change your approach, right. Clearly you’re heading in the wrong direction
with your code here.
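
The last two guesses in this example can be counted directly. The sketch assumes x and y each range over zero to ninety-nine, as stated in the talk, and leaves out the earlier attempts because the pasted-in sample points are not given in the transcript.

```python
from itertools import product

# x and y each range over 0..99, giving ten thousand inputs.
DOMAIN = list(product(range(100), repeat=2))

def target(x, y):
    return x + y

def guess_difference(x, y):
    return x - y

def guess_absolute_difference(x, y):
    return x - y if x >= y else y - x

def wrong_count(guess):
    """Number of inputs on which the guess disagrees with the hidden x + y."""
    return sum(1 for x, y in DOMAIN if guess(x, y) != target(x, y))

print(wrong_count(guess_difference))           # 9900
print(wrong_count(guess_absolute_difference))  # 9801
```

The plain difference is only right when y is zero, and the absolute difference is only right when x or y is zero, so the second guess recovers ninety-nine more inputs and is still wrong on about ninety-eight percent of them.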
Okay, so one last example, oh, yeah, sorry a small interlude, yes?
>>: Can you go back? But you could argue that you were very close to the solution though, right?
Because I mean the deal was right you just had to reverse the sign.
>> Willem Visser: Yeah, sure but.
>>: It’s hard to say you know when you were fighting when you’re closer. It’s whatever count leads you
towards the…
>> Willem Visser: It’s the same issue that we had in the beginning with the one character was wrong in
the other program. Syntactically you’re close to the answer but unfortunately semantically you’re a
million miles away from it.
>>: Yeah, I guess it depends on what helps you more if it’s being closer syntactically or semantically.
[laughter]
>> Willem Visser: I like to argue semantically with you.
>>: I would just like to add a particular point I mean as I mentioned earlier we tackle these problems
with simple solution and it’s very interesting to later on when we compare to see like when did the
simple solution work. One way to use these is to use the behavior [indiscernible] metric to see whether
[indiscernible] fitting, right. You adjust and I just see these whatever table [indiscernible] at the end,
you see incremental like an improvement rather big jump, right. You want to see big jump because
that’s really solve the new things. It’s another way to use the data is what you described.
>> Willem Visser: Actually the next example hopefully will show exactly that. Okay so what happened
at this point is I had to start to play Code Hunt because as was pointed out you can’t skip ahead, right.
You have to solve all the puzzles until you get to the interesting examples. Time passed and I played
Code Hunt until the cows came home. Except I was also in Brazil so I did a few touristy things. But I
needed to get to the interesting examples and it really took me awhile.
But at last here is one that I think is somewhat more interesting. This is the correct version: if x is greater
or equal to fifty it returns false, otherwise it returns true. The reason why this is zero to one nine nine is
I usually look at the first answers the tool gave me. If it gave me answers above say a hundred then I
went to the next multiple, so that’s why it’s two hundred. The tool, the Pex result gave like a hundred
and thirty-three or something like that so I went for two hundred.
Okay, so here’s the, my first attempt. Again it was a cut and paste from the original. It turned out this
was wrong on forty-nine values out of two hundred. It’s about twenty-five percent. Here I went for this
version because again this is what, this is again a cut and paste just from what Pex gives you. This was
wrong on forty-six answers. Okay, so basically just because I added three more, actually there were a
few steps in between here where it gave other ones. Like forty-eight and forty-seven was really here as
well. So forty-six is wrong so this is incremental, this doesn’t seem to be really helping. It’s just going a
little better.
Now when it gave me this it gave me sixty-four as a false. I thought oh well maybe there’s something
here. Because at this point I thought modulo two, this seems like it might be that. But when it gave me
the sixty-four I realized it’s not modulo two. I decided there must be some different way of doing it. I
decided to go with if x is greater equal sixty-four then you return false. Unfortunately that didn’t really
help. Although that was a step in, potentially a step in the right direction it was the same number of
wrong answers.
But when it gave me this answer it also gave me like one of the smaller numbers or something as true
like two or something. I thought oh well maybe it’s as simple as that: if x is greater equal sixty-four return false, else
return true rather than just enumerating. That gave the big leap forward because suddenly only
fourteen values were wrong. Actually, now of course from that fourteen you can almost deduce what
the real answer should be at this point because fourteen from that is giving you the fifty that you were
actually looking for. Here you made a big step forward and that should have been a, like hopefully a big
green flag that you’re moving in the right direction.
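
The big leap in this example can be reproduced by counting over the range zero to one hundred ninety-nine. Only the final threshold guess is modeled here; the earlier enumerating attempts depend on sample inputs that the transcript does not give.

```python
def target(x):
    # Hidden specification: false from fifty upward, true below.
    return not (x >= 50)

def guess(x):
    # Threshold guessed from the hint that sixty-four is false.
    return not (x >= 64)

wrong = [x for x in range(200) if guess(x) != target(x)]

print(len(wrong))           # 14
print(wrong[0], wrong[-1])  # 50 63
```

The wrong inputs form one contiguous block just below the guessed threshold, which is why subtracting the fourteen from sixty-four points straight at fifty.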
Okay, so this the last example, so just to conclude. The bounds of course now are quite important
because if I don’t have bounds then I can’t calculate these values. By the way all these values are
calculated with a tool. Okay admittedly I had to cut and paste the programs into it because it’s not
connected to Code Hunt obviously. But we have tools that do all this stuff. It’s not a manually going,
counting all the solutions. But the bounds are important. If I don’t know what the bounds are I can’t
really do this. But as I pointed out earlier, even when I played all these games sometimes I would
accidentally stumble upon the bounds even playing Code Hunt. But it’s even worse in this case because
somehow you must expose the bounds completely.
The other question is should we give like these raw scores like the numbers I just gave? Like literally tell
you that you are ninety-four point three percent wrong or something like that. Or should we give like
more kind of like the descriptive things like you’re way off, you’re a little better, now you’re getting
there. You know something like that rather than these precise numbers. Because the precise numbers
might put people off a little. It might not be so good. Of course a progress bar that moves forward and
stuff might be another alternative to this exactly.
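
The two presentation options raised here, descriptive labels and a progress bar, can be sketched as below. The cut-off values, the wording of the labels beyond the phrases used in the talk, and the function names are all hypothetical; the talk leaves open how the score should actually be shown.

```python
def describe(fraction_wrong):
    """Map a raw wrongness score to a coarse message (thresholds are hypothetical)."""
    if fraction_wrong == 0:
        return "solved"
    if fraction_wrong > 0.9:
        return "you're way off"
    if fraction_wrong > 0.3:
        return "you're a little better"
    return "now you're getting there"

def progress_bar(fraction_wrong, width=20):
    """Render the share of correctly handled inputs as a text bar."""
    filled = int((1 - fraction_wrong) * width)  # rounds down, so full means solved
    return "[" + "#" * filled + "-" * (width - filled) + "]"

for score in (0.943, 0.46, 0.07, 0.0):
    print(progress_bar(score), describe(score))
```

Rounding down keeps the bar from looking complete while any input is still wrong, which is one possible design choice rather than anything proposed in the talk.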
The last question is can we actually do Model Counting for all the different domains that is currently
supported by Code Hunt? For most of them we can. Strings is kind of complicated but there’s been a
recent paper on how to do Model Counting for strings. Hopefully this can also be combined. But one
will still have to see if all the domains can be properly supported.
That’s it.
>> Judith Bishop: Great.
[applause]
>>: Earlier you mentioned briefly the bounds. I mean because I [indiscernible] code induce you don’t
have the nice small enough bound for the input domain. That would limit, I mean the…
>> Willem Visser: Actually the, I mean bounds is not that, it depends exactly what output you want to
give, right. In all these examples I had a concrete bound there and I gave concrete values and stuff. But
you’re not going to give concrete values you don’t need to, then the bounds are slightly less important.
The bounds must just be big enough. As long as the bounds is big enough then you’ll be okay. But the
point is this technique it cannot even start if you don’t have bounds. You absolutely must have bounds
because otherwise you can’t count the number of solutions.
>>: Right, so I mean the techniques that we try basically are very simple. One is the random sampling,
second is I take the reference program, we just explore, use Pex, explore all the paths. I mean within
the bound, right. Then I mean the constraint, bound the constraint. Then you basically count how many
paths actually fail on the [indiscernible] solution, right. The third one just mix both like the
[indiscernible]. I just wondering like do you have any [indiscernible] these techniques would fix the
simple techniques.
>> Willem Visser: Yeah, so what you just described is exactly, this is taking, like in your first example you
said just the paths. What this is also doing is taking the width of the paths into account, right. That kind
of plays an important role. Instead of sampling to see how many times you go down a certain path this
one just calculated. Yes and yes, it’ll, this in theory should do better if you can for sure apply the
technique on that domain, right. That’s the only real caveat. As long as you’re in, like, integer linear
arithmetic you’re home and dry, it all works perfectly fine. Floating point for example is a different story
but luckily you don’t support floating point here either.
Strings is an issue but like I said there is ongoing work on that. Maybe if you introduce a feature like this
maybe from the get go it won’t work on all examples. Like only some examples will have a progress bar
not all of them but yeah.
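
The difference between random sampling and exact counting discussed in this exchange can be sketched as follows. The puzzle reused here is the x plus y example from earlier in the talk with the absolute-difference guess; the sample size, the seed and the function names are assumptions for the illustration, and enumeration again stands in for a model counter.

```python
import random
from itertools import product

def target(x, y):
    return x + y

def guess(x, y):
    return abs(x - y)

def exact_fraction_wrong(lo=0, hi=99):
    """Exact share of wrong inputs, obtained by counting every input in the bounds."""
    inputs = list(product(range(lo, hi + 1), repeat=2))
    wrong = sum(1 for x, y in inputs if guess(x, y) != target(x, y))
    return wrong / len(inputs)

def sampled_fraction_wrong(samples=2000, seed=0, lo=0, hi=99):
    """Estimate of the same share from uniformly random inputs."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(samples):
        x, y = rng.randint(lo, hi), rng.randint(lo, hi)
        if guess(x, y) != target(x, y):
            wrong += 1
    return wrong / samples

print("exact  :", exact_fraction_wrong())
print("sampled:", sampled_fraction_wrong())
```

Sampling gives an estimate whose accuracy depends on the number of samples, while counting gives the width of each failing path exactly; both need the input bounds to be known.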
>> Judith Bishop: Willem just to be clear we do envisage that we might be able to have this little
progress bar on the main page of Code Hunt. There is a possibility that the tools could integrate.
>> Willem Visser: Well I mean Pex produces path conditions. This thing just takes a path condition
and counts the number of solutions. The actual technical integration is relatively straightforward I think.
>> Judith Bishop: Right.
>> Willem Visser: I don’t think that’s, I think there are a few other issues we need to consider. Like for
example this discussion we had about this syntactic versus semantic because it could be that it actually
is not a helpful thing for people when they play the game. Well one will have to see if it’s something
useful.
>>: I mean based on that experience this could be a mixture depending on what kind of a game or what
kind of [indiscernible] you get. For example if [indiscernible] solution would be typically uniform, right,
not way off, right. I mean then syntactic, I mean distance would be okay, but if like you could write a
loop or you could write recursion to accomplish the same [indiscernible] syntactic distances would
improve it well. Then behavior, I mean metrics, or it’s got dynamic metrics could help better in the
sense.
>> Willem Visser: Yeah it does that, yeah.
>>: Yeah, so a couple of things. One is the syntactic versus semantic thing, so you said that you guys
looked at the, I guess it’s a question for everybody whether it’s if you want a big jump. But is there any
evidence that it’s, if you look at the way people get, that the good players get to the solution. Is that,
you know what kind of phenomenon do you observe in terms of getting closer?
>>: We did not really distinguish like a good player but data’s there, right. We have this data, I mean
metric, we have here metric using simpler techniques, right.
>>: Right, it would be good to kind of to try that with your with the counting and see whether you know
what’s the evolution. If you see basically there’s really this slope perfectly then it means [indiscernible].
>>: Yeah, I agree.
>> Judith Bishop: Well do you think you would be able to work this out from the data we’re going to be
giving you?
>>: We already I mean have reads out on the existing; the orange and the paper bound data. With the
new data definitely we could try it. Then we have some discussion and see like what kind of pattern,
what kind of question we could learn by looking at the data that we are going to get, definitely.
>> Judith Bishop: Yeah, sure.
>>: Because in a sense if you keep putting there the constraints, right, you know that’s what you get.
You know you get slope that goes towards the high solution, right. But that’s not what you want.
>> Willem Visser: Yeah, but I mean the issue would be like if you follow my approach to it you add like
five of these and cut and paste like first five say. Then hopefully you see the pattern. But in some of the
examples I didn’t see the pattern for example.
>>: I think sometimes it really depends on your strategy or check, right. I mean like for exactly for the
example [indiscernible] show. For me I just, we download, click one time button, I just write a
[indiscernible] from zero up to one hundred, right. Immediately a big table, right, then you look for
pattern there. I will just one time click, right. That does not mean, I mean I’m better it’s just that I
mastered the strategy, right. I mean I know how Pex work, what kind of data would be produced.
>> Willem Visser: Yeah.
>> Judith Bishop: [indiscernible] and then Armando.
>>: We were looking at more like syntactic distances and actually based on [indiscernible] with
[indiscernible] and students. Mainly what happens is students will do everything right. It’s just a
common case where they get something off by one. There’s off by one [indiscernible] adding indexing.
All the test cases fail and I think that’s where she, really you wanted the feedback that you’re really
close. It’s just that small mistake in your program even though all of the test cases are failing you’re
getting zero percent. I think in some times maybe it’s a combination between syntactic distances and
semantic distances is something what we need and good feedback.
>> Judith Bishop: Armando is next.
>>: Yeah, so [indiscernible] we’ve seen is this is very, very problem dependent. That on the one hand
you have some problems where the common error is that you don’t realize there’s a corner case you’re
trying to handle, right. This is the kind of case where these kind of Model Counting can be very useful
because it can tell you look for most inputs you’re doing okay. But there’s this corner cases that you’re
missing. On the other hands those are also the kind of cases where the counter [indiscernible]
themselves are giving you a lot of the information that you need. Because they’re going to tell you, you
missed this corner case, you missed this corner case, you missed this corner case. In some sense this is
not quite, one of the questions is, is this giving you information that is above what you can already tell
by looking at the examples that you’re getting?
>> Willem Visser: Yeah, yes, well, just on that one example, because I ran into that all the time when I
played the game. Of course what happens now is you see, say, three examples, you know, red
crosses for three examples. But what you do not know is that those are the only three that you’re still
missing, right. In this case it’ll actually give you that information. It’ll say you’re missing,
say, one percent of the cases. Then you might be able to see, oh wow, I can just add
these and I’m done. But sometimes you add those and it brings you down from ninety-nine percent
wrong to ninety-eight percent wrong. It’s not really getting you anywhere.
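The difference between seeing three failing examples and knowing what fraction of inputs is still wrong can be sketched as follows. This is only an illustration: it replaces symbolic execution and Model Counting with brute-force enumeration over a small bounded domain, and the puzzle, the function names, and the domain are all assumptions made up for the example.

```python
# Bounded input domain standing in for the real (much larger) input space.
DOMAIN = range(1, 1001)

def reference(x):
    """Hypothetical secret program: multiples of 100 are a corner case."""
    return -1 if x % 100 == 0 else x

def first_attempt(x):
    """Player's guess that misses the corner case entirely."""
    return x

def patched_attempt(x):
    """Player hard-codes only the three counterexamples that were shown."""
    return -1 if x in (100, 200, 300) else x

def mismatches(candidate):
    """All inputs in the domain where the candidate disagrees with the reference."""
    return [x for x in DOMAIN if candidate(x) != reference(x)]

def report(candidate, shown=3):
    """Counterexamples a player would see, plus the count they would not."""
    wrong = mismatches(candidate)
    return {"shown": wrong[:shown], "wrong": len(wrong), "total": len(DOMAIN)}

before = report(first_attempt)    # three examples shown, 10 of 1000 inputs wrong
after = report(patched_attempt)   # three examples shown again, 7 of 1000 still wrong

print(before)
print(after)
```

In both reports the player sees exactly three red crosses, but the count shows that patching the displayed inputs one by one barely moves the total, which hints that the corner case needs to be generalized instead.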
>>: Yeah, especially the case where you could generalize the corner cases, with the corner cases being not just one
input but the familiar examples, where the system would help him see, did you make a jump or not after
adding those more general corner cases.
>> Judith Bishop: [indiscernible]
>>: [indiscernible] scenario where we have a problem is the lab example scenario where it’s turned into
[indiscernible] standard program within [indiscernible] they try to make a lot of changes [indiscernible]
stops compiling. At the same time the time’s over. Okay, so we get a lot of programs we cannot even
compile. But probably they’re very close to the correct solution. Some semicolon is missing or
some comment has been left open, that type of thing. I think in that case the syntactic thing will help,
and even there we have to find some way to repair the program first.
>> Willem Visser: Yeah, sure, yeah, that’s true. [indiscernible] all day.
>>: [indiscernible]
>>: One of the things I think is interesting about this is I don’t think it’s actually either syntax or
semantics that you’re picking up on specifically. It’s just a number of paths.
>> Willem Visser: Well [indiscernible] paths. Yes.
>>: Right, sure, right, sorry. I think there are a couple of ways that you can present that
information that I think wouldn’t interfere with a person’s flow. One of them is to actually specifically
say this many of this many paths are correct. Another one would be to pair it exactly with
the output that you’re already getting and say, you know, this many are left in this current category.
Because it actually does sort of fit with the [indiscernible], so anyway, it’s something to consider as a way of presenting it.
Because it’s actually consistent with the way you’re doing it already.
>> Willem Visser: Yeah, that’s a good way to do it.
>> Judith Bishop: I’m going to have the last question. That is have you got a feel for the size of problem
that this might be applied to?
>> Willem Visser: Our experience so far has been that whatever we can symbolically execute we can count.
But I do have my favorite next line here. This is just our little framework for speeding things up, and it’s
really not much more than splitting up the problem into sub-problems and storing the results
persistently. If you ever run the same thing again the result’s already there, and in a context like Code
Hunt, because thousands of people are playing the same game all the time, you gain a lot by caching the
results, and you’re probably already caching results. But at least these Model Counting results you can also
cache. Therefore I would think for the size of things you’re talking about here you’ll probably be fine
with Model Counting.
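The idea of splitting a counting problem into sub-problems and storing the results persistently can be sketched roughly as below. This is not the framework mentioned in the talk: the class name, the JSON file store, the brute-force counter, and the assumption that the sub-problems constrain disjoint variables (so their counts multiply) are all hypothetical choices made for the illustration.

```python
import json
import os
import tempfile

class PersistentCountCache:
    """Stores model counts on disk, keyed by a canonical constraint string."""

    def __init__(self, path):
        self.path = path
        self.misses = 0
        self.store = {}
        if os.path.exists(path):
            with open(path) as f:
                self.store = json.load(f)

    def count(self, constraint, predicate, domain):
        # The key includes the domain bounds, since the count depends on them.
        key = f"{constraint} @ [{domain.start},{domain.stop})"
        if key not in self.store:
            self.misses += 1
            # Stand-in for a real model counter: enumerate the bounded domain.
            self.store[key] = sum(1 for v in domain if predicate(v))
            with open(self.path, "w") as f:
                json.dump(self.store, f)
        return self.store[key]

def count_models(cache, subproblems, domain):
    """Multiply the counts of sub-problems assumed to be over disjoint variables."""
    total = 1
    for constraint, predicate in subproblems:
        total *= cache.count(constraint, predicate, domain)
    return total

# Two independent sub-problems of a hypothetical path condition.
subproblems = [
    ("x % 3 == 0", lambda x: x % 3 == 0),
    ("y > 89", lambda y: y > 89),
]
domain = range(100)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "counts.json")
    first = PersistentCountCache(path)    # first player: every sub-problem is computed
    total_first = count_models(first, subproblems, domain)
    second = PersistentCountCache(path)   # later run: results are loaded from disk
    total_second = count_models(second, subproblems, domain)
    first_misses, second_misses = first.misses, second.misses

print(total_first, total_second, first_misses, second_misses)
```

With many players submitting programs that share path conditions, the second and later runs would mostly hit the stored counts, which is the saving described above.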
>> Judith Bishop: Great, well excellent.
[applause]