>> Nikolai Tillmann: So welcome everyone. This is the last session of the workshop and I
would like to start by saying a big thank you to Judith who organized the workshop.
>> Judith Bishop: Oh wow thank you.
[Applause]
>> Nikolai Tillmann: It’s not quite over yet, but I really enjoyed the community here, everyone
is working on a related subject and I think that’s really great to bring together this set of
researchers to see how we can improve education and learning how to program.
>> Judith Bishop: Thank you Nikolai. I would never have got here without your inspiration of
course. Let’s leave that there. So our first speaker here is Alex.
>> Alex Orso: Okay, yea thanks also from me to Judith for inviting me and for organizing this.
It was very interesting. So what I did was to ask people around the room; I was trying to
summarize what the major themes, issues and so on are. So several people responded and
thank you to all the people who responded. What I am going to try to do is just kind of
summarize what the information was and use that to classify 3 different things. So, one is
themes that emerge from the workshop. The other one is open issues and open issues is the one
with the biggest number of entries and then opportunities.
So in terms of themes the main one that seems to be the overarching theme from most of the
talks and discussion is feedback. So it’s all about basically the kind of feedback that you want to
give to the students, so to whoever is using the system. For example one particular theme or sub
theme in this context was closeness. So this idea: How do you tell the students or whoever is
playing the game that they are close to the solution? How do you indicate how close they are?
How do you direct them towards the right solution? Some people talked about providing the
exact delta with respect to the solution or possibly an abstracted version of that. We learned this
idea of doing it based on the number of counted inputs for the right or wrong solutions. There
were many, many ways of doing this that were discussed and I think that’s a really interesting
theme.
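Editor's sketch: one way to picture the counting idea mentioned above is the minimal Python example below. It assumes closeness is measured as the fraction of a fixed list of inputs on which the player's attempt agrees with the secret solution. The names `closeness`, `secret`, and `attempt` are hypothetical and are not taken from Code Hunt; the talks also discussed other measures, such as a delta against the solution.

```python
from typing import Callable, Iterable


def closeness(secret: Callable[[int], int],
              attempt: Callable[[int], int],
              inputs: Iterable[int]) -> float:
    """Return the fraction of inputs on which attempt matches secret.

    Hypothetical metric for illustration only. An attempt that raises
    an exception on an input is counted as a mismatch for that input.
    """
    inputs = list(inputs)
    if not inputs:
        raise ValueError("need at least one input")
    matches = 0
    for x in inputs:
        try:
            if attempt(x) == secret(x):
                matches += 1
        except Exception:
            pass  # a crash counts as a wrong answer
    return matches / len(inputs)


def secret(x: int) -> int:
    # Stand-in for the hidden puzzle solution.
    return 2 * x + 1


def attempt(x: int) -> int:
    # Stand-in for a player's guess that is wrong only for negative inputs.
    return 2 * x if x < 0 else 2 * x + 1
```

With the inputs `[-2, -1, 0, 1]` this attempt agrees on two of four inputs, so a player could be told they are halfway there without being shown the solution itself.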
In terms of open issues, and this is probably of interest, especially for you guys or developers,
the minds behind Code Hunt, so one is kind of this overarching question, like the elephant in the
room, which is the approach [indiscernible] good? So how do we know that people actually
learn by doing Code Hunt more than by traditional means? And we discussed a possible study
which would be very interesting to try and see if you had a kind of student population, you split
it and half of them are learning through Code Hunt or similar means and half of them are
learning through traditional methods. What will be the difference in their learning?
>>: Do you have a study on the traditional method?
>> Alex Orso: Well that’s an interesting point. I teach online classes and now there are all these
discussion of: How do we know that online classes are working? And you think: Well how do
you know about the traditional classes? But, in the traditional classes everybody is assuming that
they work, right. So it’s a very valid point and in fact I think it would be good to just test all
these methods and see what works and what doesn’t work including also the traditional one, but
it’s a very, very valid point. Nevertheless I think a study of that kind would be very interesting,
especially if the results are good. It would give you a lot of fuel for these kinds of approaches.
Another open issue that was mentioned by several people was how to make puzzle writing easier
and I think it probably resonates with you as well. In particular: How do you make it easy to
provide good guidance to the player, which also goes back to the idea of feedback? So how do
you give them something that is useful so that they will learn how to get to the same resolution?
>>: Do you mean [indiscernible] or do you mean test cases?
>> Alex Orso: Well it’s however you do it, whether you do it by applying the [indiscernible] or
by providing hints. What’s the best way to tell the students they are getting there? It also really
goes with feedback of like you’re getting closer, but you might want to look at this or you use the
right input. It might be easier to find the patterns and so on. So that’s definitely an issue on this
side. Now you guys showed us how to do it by basically tricking the system like for example
putting in their specific conditions if you want something to be considered. It would be nice if
there were more of an abstract and general way of doing that and also maybe a more understood
way. Sometimes I think when [indiscernible] and I were trying to improve the puzzles it was
hard to figure out what would help and what would not. So you think you did a great job, you
try it and it’s like, “Oh no, that’s not really good.” So I think that’s an open issue that would be
worth investigating.
The other thing that I heard from a couple of people is that this fact that the knowledge of
symbolic execution seems to be a little too necessary for whoever is developing the problems.
So basically you have to know what the machinery is underneath in order to do the right thing. It
would be nice if that could be kind of abstracted away a little bit so that you don’t have to
understand symbolic execution. This is not a problem for many of the people in this room, but
might be a problem if you want to kind of generalize the approach. So sometimes the results
might be surprising because you don’t know how the symbolic executor works. For example the
fact that you have different tests every time. It is like, “How come you have different tests each
time?” Well it’s because it depends really on how the paths are explored. So that’s kind of
trying to decouple the kind of underlying technology from the puzzle creating. Then of course
anybody should feel free, I’m trying to summarize what I heard and what my impressions are
as well, but feel free to add and jump in.
Another issue is the problem creation: So how do you choose interesting problems and what
feedback systems do you use? This is really what is good and what is not good. There are tons
of studies on whether [indiscernible] good examples to use, but it’s not really clear what will
work and what will not in terms of teaching the students. One of the points that were made here
is that there seems to be good problems for different purposes. So some problems that are good
for a given purpose might not be good for a different one. So for example there might be
something that is great for teaching a specific topic to the students, but if instead what you want
to do is engage new students and get them interested in development maybe you have to use
something completely different where they don’t learn anything, but have a lot of fun. It’s not
clear that these two things necessarily go together. So it will be good to have some sort of
classification of what problems are good for learning? What problems are good for engaging
and maybe other categories as well? Now we know that everything seems to be seen together as
a whole.
So let’s see, primitives, oh yea that’s something that was really interesting for me. The
primitives for the designer that for example are kind of providing this higher level language in
which you can kind of add some of the information to the example filters. It’s like now we have
to do this assume to avoid the null inputs. It would be very nice if you could just say, “I don’t
want to see any null input.” At that point you just remove the generation of those. I don’t want
to see any overflow. So, you don’t have to worry about how that kind of happens. In the
generation you have to use the assume. You just specify that they are at a higher level.
Something else is like specific inputs and you might say, “Oh, I really want to see 25 as an input
because I think it reveals some interesting aspect of the problem.” And again you can do it by
adding a branch, but that requires that you understand how the system works. So it would be
nice if you could specify that or a set of inputs.
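Editor's sketch: the kind of higher-level directive described above could look roughly like the Python example below. It is an assumption-laden illustration, not Code Hunt's mechanism: `InputSpec` and its fields are hypothetical names, and the sketch filters inputs after they have been generated rather than steering a symbolic executor with assume statements or extra branches.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class InputSpec:
    """Hypothetical designer-facing directives for the inputs shown to players."""
    allow_none: bool = True
    must_include: List[Any] = field(default_factory=list)
    reject: List[Callable[[Any], bool]] = field(default_factory=list)

    def apply(self, candidates: List[Any]) -> List[Any]:
        """Filter generated candidates, then append any required inputs."""
        kept = []
        for value in candidates:
            # "I don't want to see any null input."
            if value is None and not self.allow_none:
                continue
            # Designer-supplied rejections, e.g. values that would overflow.
            # Note: predicates see None too when allow_none is True.
            if any(pred(value) for pred in self.reject):
                continue
            kept.append(value)
        # "I really want to see 25 as an input."
        for value in self.must_include:
            if value not in kept:
                kept.append(value)
        return kept


spec = InputSpec(
    allow_none=False,
    must_include=[25],
    reject=[lambda v: abs(v) > 2**31 - 1],  # outside 32-bit signed range
)
shown = spec.apply([None, 0, 3, 2**40])
```

Here `shown` is `[0, 3, 25]`: the null and the out-of-range value are dropped and the requested input is added, without the designer needing to know how the generator explores paths.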
Something that was brought up by William, which I think is very relevant is non-determinism.
So it would be nice to be able to make some non-deterministic choices in the specs. Enforcing
varieties is something that actually I found very difficult. In many cases when you develop the
problems you end up getting the same values for the inputs and that is not very interesting. It
doesn’t provide much information to the student.
So we ended up kind of putting a lot of assumes on differences between different elements of the
array and it gets very complicated. It would be nice if you could just kind of give some
directive. I want to see some sort of variety in the inputs that are generated and maybe explore
why they are part of the domain of the solution. It might reveal more interesting facts to the
students. Also hints may be a possible way of saying, “If this happens or if you follow this
specific path,” I definitely want to mention this to the students. It’s something that in Richard’s
work was done automatically. Maybe you also want to provide the developers or designers a
way of kind of providing specific [indiscernible], because you know that it would help the
students.
Another issue that was brought up and discussed quite extensively is, and just tell me if I’m
talking too much.
>> Judith Bishop: Not at all.
>> Alex Orso: What is the best metric to track the problem difficulty? So how do you decide
whether a problem is difficult or not? Is it the number of tries or is it time? And I agree with
you that time is not a good metric because you don’t know what the students are really doing
unless it’s a controlled environment. On the other hand it’s the number of tries. So we listened
to different people that have different strategies. In some cases people just liked the feedback so
they are like, “I will submit 300 slightly different solutions because that’s going to help me get a
very fast way to the solution.” So there should be a way, maybe a smart way, of figuring out
what is the amount of effort put into the solution? So something that was mentioned I think was
for example the fact that small deltas maybe [indiscernible]. Frequent deltas shouldn’t be
[indiscernible] exploratory mode. So maybe there is a way of quantifying that in a more rigorous
way.
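Editor's sketch: one possible way to quantify effort so that many tiny resubmissions do not count like many rewrites is shown below. It assumes effort is the sum of the textual change between consecutive attempts, measured with Python's standard `difflib.SequenceMatcher`. The function name `effort` and the weighting are hypothetical; nothing in the discussion says this is the metric anyone adopted.

```python
import difflib
from typing import List


def effort(attempts: List[str]) -> float:
    """Sum of change sizes between consecutive attempts.

    Each attempt contributes 1 - similarity to the previous one, so a
    near-identical resubmission adds close to 0 and a full rewrite adds
    close to 1. The first attempt counts as 1.
    """
    if not attempts:
        return 0.0
    total = 1.0
    for prev, curr in zip(attempts, attempts[1:]):
        similarity = difflib.SequenceMatcher(None, prev, curr).ratio()
        total += 1.0 - similarity
    return total
```

Under this assumption a player who submits 300 slightly different solutions accumulates far less than 300 units of effort, which is closer to the intuition that frequent small deltas indicate an exploratory mode.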
Some issues that were not mentioned by anybody, but we listed them yesterday so I extracted a
couple that were not covered by the rest. One I think received kind of a lot of interest and it was
the idea of preserving tests including the order. That seems to be really a necessary feature
because as you mentioned some people write down the tests that they see. I ended up doing the
same when I was trying to solve the puzzle. So it would be very nice if you could say, “Okay
freeze this test. This is one I want to see again.” It seems like something that should also be
relatively easy. I mean everything is easy to do right given time.
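Editor's sketch: the freeze request above could be pictured as merging a pinned list of tests with whatever the generator produces on each run. The Python below is a hypothetical illustration; `merge_tests` and the tuple representation of a test case are assumptions, not part of Code Hunt.

```python
from typing import List, Sequence, Tuple

# A test case is represented here as a tuple of integer arguments.
TestCase = Tuple[int, ...]


def merge_tests(frozen: Sequence[TestCase],
                generated: Sequence[TestCase]) -> List[TestCase]:
    """Show frozen tests first, in their pinned order, then new ones."""
    result = list(frozen)
    seen = set(frozen)
    for case in generated:
        if case not in seen:
            seen.add(case)
            result.append(case)
    return result


table = merge_tests([(25,), (0,)], [(0,), (7,), (25,), (7,)])
```

In this example `table` is `[(25,), (0,), (7,)]`: the two frozen tests keep their positions from one run to the next, and only genuinely new generated tests are appended.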
>>: Yea, maybe to just jump in there, but I want to listen to the entire list of entries.
>> Alex Orso: Yea, I’m almost done.
>>: I think in general one can distinguish two kinds of feedback. There are those that I would
consider controversial in the sense that sure you would like to see more values, more diverse
values in an ordered way immediately, but I think it would to a large extent destroy the fun of the
game. So there are some things that I think are controversial and maybe we need to study to see
what the impact is on the education aspect, versus how quickly people solve it versus whether it’s
fun and engaging. There are some other aspects which I think everyone agrees with. It’s an
interesting challenge how to come up with good hints or how to design puzzles. Those are
definitely hard problems and we don’t have any answer. So just in general when looking at all
this feedback something to keep in mind is there are some things where it might go either way
and then there are clearly research deficiencies or big opportunities where we don’t know any
answer. But keep going, keep going.
>> Alex Orso: All right. In general it’s clear why it’s not there, but just to mention it again is
user friendliness of the environment. But, as you say it’s for internal use so it’s perfectly fine,
but if you want to have a broader adoption of course people will want more features like
completion and better feedback when you write the code and so on and so forth.
>>: So educators will tell you that they don’t want completion.
>> Alex Orso: Oh really.
>>: Right because they want their students to actually learn.
>> Alex Orso: Oh, no I was thinking for whoever is designing the problem.
>>: For the teacher.
>> Alex Orso: Not for the students, for the teacher.
>>: In fact some educators say, “I love the fact that there is no completion.”
>> Alex Orso: Yea, but for whoever is designing the problem it will save you time if you don’t
have to remember the exact syntax of the language.
>> Judith Bishop: But we have discussed whether we would be able to get the [indiscernible]
editor into Code Hunt. That would give more.
>>: Yea, it’s a possibility, but what [indiscernible] said it’s not clear that we really want to.
>> Judith Bishop: Right.
>>: I mean again it’s like having more values in an ordered way you want it, but it would harm
the value.
>>: So that one is probably more doable than having a complete IDE experience with completion
right now, because we would have to compile the code in the browser and things and we don’t
have a compiler for C#.
>>: Well in fact they have one. In [indiscernible] we do have completion for C#.
>>: We do have completion, but we haven’t migrated it.
>>: Yea and for a reason.
>> Alex Orso: Because you can also think of it in terms of the pedagogical aspect. If you are
doing very simple code in assignment maybe you don’t want completion. If you get to a point
where the key thing is really figuring out the algorithm then at that point you want to have
completion because you don’t want them to stumble on the syntax.
>> Judith Bishop: So we’ve got hung up on one of the smallest things out here.
>>: In fact out of all of these challenges, whether an IDE with code completion or not is better
for education I wonder if somebody already studied that, because you don’t need Code Hunt for
that. Just split up your classroom into two and give some programming assignment. We should
do our homework and look that up.
>> Alex Orso: Okay.
>>: Is it happening or not?
>> Judith Bishop: Well what did the students think? I know you had some thoughts about that
didn’t you at one point?
>>: On auto completion?
>> Judith Bishop: Yea, whether it was a good thing or not a good thing.
>>: I think the purpose of Code Hunt is not to know the syntax or the API of C# or Java,
whatever you are doing. So auto completion would definitely –.
>>: But you are under time pressure. You have to produce something that’s syntactically correct,
compiled and –.
>> Judith Bishop: So he’s saying it would be a good thing to have it.
>>: What I’m thinking is that it’s the purpose of learning those things so why not support it in
this IDE?
>>: It’s tricky. I mean coloring is easy.
>>: We have done it so it’s quite possible. We had another interesting experience when we first
rolled out Code Hunt in a contest that was held in China. So at least until recently there was no
Windows Azure Data Center in mainland China, but maybe that has changed by now. The
closest one would have been in Hong Kong, but it still has to go through the grid fiber. So we
had feedback basically that there is a huge latency for people being in China and if a lot of data is
being sent around in the browser it’s a bad experience. So that’s another thing, it might even get
unfair depending on whether you sit behind a slow connection or a fast connection. So that’s
another thing to consider, just throwing it in.
>>: But code completion would be probably inside JavaScript right.
>>: So that means you have to download the JavaScript. When the editor loads too slowly
we actually use a text area and we don’t use any kind of fancy editor. That means all these
players would be unfairly at a disadvantage because they don’t have access to colors and latency
is a concern.
>>: Well it doesn’t all happen in the browser, but then you have a round trip every time. I mean
we would need a C# compiler or a Java compiler that runs in your browser. I’m not sure if that
even exists.
>>: So what I’m thinking and I just did some right now, I know I’m not a C# developer, I’m
usually Java or Ruby, but always [indiscernible] upper case for the properties.
>>: We will retrain you.
>>: Yea, apparently, but that was kind of annoying because control S what’s going on? Oh, yea
and right upper case.
>>: The Java is lower case.
>>: Right, but I used C# now because of obvious reasons.
>>: Did you have more?
>> Alex Orso: Yea, let’s see that’s the last one I have for issues. So now: Opportunities, which
is what it’s all about. So what is next in terms of opportunities? So I think several people,
including myself, I’m very interested in what we can mine from the set of solutions, which can
be seen from many different perspectives. So one thing could be how do people get to the
solution? Where do they make a mistake? How can you learn typical mistakes? How can you
use the feedback to then make the assignments better and to teach better to the students? What if
you have like a student that submits solutions that are correct, but bad and then they get better?
How do they do that process? Can you learn how to improve it? Maybe you can then speed that
part up.
So there’s really partial solutions versus complete solutions. There’s really a lot that I think can
be mined and I don’t think we have a clear understanding of all the opportunities there. So the
fact that you guys are making the data available will be great. Including for program synthesis,
for people working on program synthesis can you then use that to sort of simulate the way in
which a human being synthesizes code and maybe have a better program synthesis approach that
is more similar to the one that humans use. So that in my opinion is really the best opportunity
here. We have this unprecedented set of data and we can use it to learn how people learn. So
that’s pretty good.
Something else that I think was mentioned and definitely seems to be interesting is AB testing.
So it will be interesting, you can do a lot of AB testing, because you can have different hint
systems, different input generation system techniques and just if you get enough players you
would be able to really explore what works, what doesn’t work and what works for what kind of
population.
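Editor's sketch: for A/B testing of this kind, players need to be split into groups in a stable and roughly even way. The Python below shows one common approach, hashing the player identifier together with the experiment name. The function `assign_variant` and the identifiers used are hypothetical and not part of any Code Hunt interface.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment: str,
                   variants: Sequence[str] = ("A", "B")) -> str:
    """Deterministically map a user to one variant of an experiment.

    The same user always lands in the same variant for a given
    experiment, and different experiments split users independently.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

A front end could call something like `assign_variant(player_id, "hint-system")` to decide which hint system a player sees, and outcomes could then be compared per variant.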
>>: So what I was mentioning is that you can do it even in your own classroom where you
actually add a comment that gives the specification for exact sizes. Then you can look at your
dashboard and see what’s happening between classes.
>> Alex Orso: Yea, in fact I think this is something that is true in general for anything that is
online and with enough participants. You can just try different things for different parts as long
as you can separate in a fair way the students assuming that it’s a class that’s taking full credit.
>>: I think Billy’s point is that while it’s not entirely trivial to plug in a new hint system. What
basically anyone who goes home after this event can do is upload two different zones, which
differ in maybe descriptions or other ways and compare how students do in a pretty nice
automated fashion.
>> Alex Orso: Yea or even just what we were doing to improve the different problems where
you can say you can try different strategies and see how that works. The other thing that’s
probably more for me than for anybody else, but I don’t teach introductory classes. I normally
teach software engineering classes, which means that we are dealing with more complex code
and I would be really interested to see whether you can kind of take this to that level. And of
course it’s not going to be exactly the same because one of the beauties of Code Hunt is that it’s
self contained, it’s relatively small, you see the whole program, but maybe there is a way in
which you can kind of push in the constant so you can have design, development and you can
kind of use this same gaming approach for the high level abstraction for larger system.
>>: I think you’re going to challenge your best students with this. I’m sure the students will tell
you it can get really hard.
>> Alex Orso: Yea, it’s difficult, I know.
>>: I don’t think the range is limited to intro. In this structure that’s a couple of arrays and then
you can start having enough pointers that it gets really tricky.
>> Alex Orso: No, no I understand, but the point is like if you are teaching a software
engineering class and you have these kinds of issues people will jump on you, because they will
say, “Well I’m not here to learn programming. I want to learn how to design a system.”
>>: So Tao Kyle had some interesting ideas on how to leverage the system to teach design
patterns. I mean we start by showing you that you fill in this one function, but really you can
write 64K of code with classes. So in other words you can look at the challenge of how can we
use this framework to teach balance? Tao had some ideas, but I think [indiscernible] and there is
not much more to discover.
>> Judith Bishop: And patterns are only one part of software engineering as well.
>>: But you’re right that it has a scope.
>> Alex Orso: Yea, it has a scope.
>>: It has a scope.
>> Alex Orso: But, it would be interesting to see if maybe there is a variation of this and you can
have maybe a broader scope. And maybe it can be put together, because then you can leverage
some of the –. The problem with these kinds of assignments is that, for example in one of my
assignments that I just brought a couple of days ago, is that you develop a small Android app that
does something and there is not a correct solution and there is not an easy test that you can run.
So how can you use a system like this in that context? I don’t know, is it possible?
>>: No.
>> Alex Orso: Ah, I don’t know. I think the clustering thing is something you can still do. Like
say you get all the solutions and then you cluster them to identify patterns of development that
might be good or bad and then you classify them and give feedback to the students. So it’s not
going to be the same thing where you submit your app and it tells you good/bad and then you
submit another one, but maybe if you kind of abstract the feedback on other aspects of the code.
>>: With your Android app that’s something really hard in the core of it. Then you have the
whole [indiscernible] about doing Android and getting all the SDKs and stuff. But, at the heart
of it there is a hard nut to crack. You could think that your students would just pop to window
and start to test it. We could get a better feedback than just trying to write them and write a test
[indiscernible] themselves. And we found sometimes [indiscernible] where we would just pop
the browser and we needed a little parser and we [indiscernible]. So it was completely correct
and it copied back into [indiscernible].
>> Alex Orso: Yea, that might be also a way to do it.
>>: So I will draw the line where the question is: Is there a correct solution that you can
characterize or is there some subjective aspect to like an Android app? So how it behaves, how
it looks and that’s clearly outside of the scope of what this is all about. So what we have heard
when we run contests or from recruiting and from teachers is that the appealing aspect is really
this fully automated system which doesn’t require someone to assess the quality of solutions. So
if one keeps that in mind that should help to figure out what problems can fit and which don’t.
So there are certainly some aspects of design patterns where the beauty is in the eye of the
beholder, but then if it’s about having a visitor for a tree either the visitor produces the right
[indiscernible] or it doesn’t. So that’s how I would characterize the problems and then see if it
fits or not.
>> Alex Orso: Also for mobile apps for example you might think about constraining a little bit
and you have to pick your app in a suitable way. For example there is a display for the app and
you have to have these input fields and so on. At that point you could characterize that and that
becomes your input for the system. And there you could really do something along those lines.
So it has to be a specialized system, but maybe if there is enough interest in a specific domain
you could have a customized version of Code Hunt for mobile apps. Who knows, it’s just
something that might be worth considering. That’s actually all I had on my list.
>> Judith Bishop: Well I can immediately fill in on that opportunities list. We didn’t mention it,
but one of the aspects within the sandbox that Nikolai has already been talking about, or just on
the surface or edges of it, has been the idea that we should be able to have levels that build on
levels. So this is a very typical game experience that if you pass one level what you built in that
level enables you to unlock and get onto the next level, but also you use that. So what that would
mean in the context of Code Hunt is if you built a procedure in one function in one level you
would be able to call that function in the next level. So you’ve actually built something and you
can move on with that. So that’s a sense of achievement that you are not currently getting. It
would be another fun aspect you know, build your X, get your elixir and then move on and kill
the dragon sort of thing, right.
>>: Yea, actually I’m thinking whether you could break down even very complex systems?
>> Judith Bishop: Yes so then the second level of that would be what you are actually doing is
building a class. So since we don’t yet have objects maybe the idea would be that you could
structure Code Hunt so that what happens is that within a sector you can build a class and then in
the next section you can instantiate your class. So that’s a thought. We haven’t thought it
through completely and it would obviously be something that would require a lot of building, but
I think it would fit within the model nicely.
>>: I think that’s a really great idea and I think that actually ties back to some of the questions
about difficulty too, because you want to structure those levels in a way that that difficulty
progression makes sense. So really having a good grasp of what difficulty is and being able to
measure it for an individual is really essential I think to have that experience be a good one for
learning. So I am really interested in that.
>>: And remember like the last summer we had some discussion on that.
>> Judith Bishop: Oh exactly, no we did, it was on our list and it’s still on our list.
>>: Okay I just wanted to check the progress.
>> Judith Bishop: Well so this kind of brings us to implementation of action to some extent. The
bare fact is that Microsoft will eventually, and probably sooner rather than later, stop
development on the project itself, and for many good reasons, not for bad reasons, but simply
because it makes sense actually to involve other people’s good ideas and get other people
working on it. So, each of these suggestions is a packet of work that somebody could work on
should they wish to. That assumes there is a decent interface that enables them to come in and
we would have to ensure that is available. The most decent interface is that you send an intern
and the intern works on it. Unfortunately, that is also not possible all the time, because we might
not have slots or we might be busy with other interns at the time. So it’s not a silver bullet, the
intern option.
>>: I mean I’ve already had this discussion with the current team, but basically assuming that for
example my group [indiscernible], how to really have them hook it to be live, at least in the small
scale so that we have feedback, data to see how things go. I think from the outside, for example,
how do we get the Cloud resources? I mean I think the easy way is like we host the service on our
server or our Cloud, but it’s difficult for a university to get that kind of a budget to really have
hosting, but maybe [indiscernible].
>>: So Microsoft Research has [indiscernible] research and you could apply for that and get
VMs and get resources on Azure.
>> Judith Bishop: Are you talking about hosting the service or hosting the data?
>>: The services. But, of course we need to have some way to get the data. I mean if the service
will be used the data would come in. So naturally the data would be, at least part of the data,
would be in the service.
>>: So I don’t know if it became clear to everyone how the system works, but there is a back end
that does all of the groundwork of doing the [indiscernible] explorations. It can scale up to use a
lot of CPU power and this potentially costs a lot of money to us. Then there’s a front end aspect
that optimizes web sites and guides you through the sectors. So those are basically two different
aspects of the system and some of the things we talked about are really front end related. So if
you want to do AB testing where some users see certain things that other users don’t that’s
mainly a front end issue. Even if you want to generate some kinds of hints then if that’s your
own hint generation engine then it’s again about showing something to some users and maybe
not to others if you want to do AB testing.
So there’s a back end that possibly costs a lot of money to run and then there’s a front end which
is all the light weight. The one option we were thinking about is to make that front end open
source, which would allow other people to pick it up, make some changes and then you could
either deploy that locally, which would be you can do that on your laptop and have your students
party on it or if you did something really great we could actually deploy that on our servers since
it’s open source and we all work together, assuming that the basic idea of the distinction between
front end and back end is reasonably clear.
How does that sound? Does anyone have any thoughts of what kind of experiments would be
possible in that setting, or what kind of addition you would want to add? Does anyone feel that
what you really want to do is change something about the back end you are interested in
generating different kinds of values or any of that? I am sure some of you have some ideas of a
research question you want to answer or some feature you would like to add. Does anyone have
an opinion of what an open source front end would help or if that’s not really a help?
>>: Can I ask a little bit of a clarification question?
>>: Yea.
>>: So how hard would it be with the infrastructure you are proposing to access things like a set
of samples of code attempts rather than data about numbered tries and things like that?
>> Judith Bishop: Well you probably weren’t here this morning, but we are releasing that.
>>: Okay so that’s easy and then the back end, the only thing you are talking about with access
to the back end here is to change the way that back end works, rather than just [inaudible].
>> Judith Bishop: Sure.
>>: No, I like the idea of being able to use that front end and getting access to the data
simultaneously.
>> Judith Bishop: So the plan is to regularly release chunks of data. The first one would come
out within a couple of days just making sure it’s right and then every few months we will bring
out another lot with different data.
>>: One idea would be that if you upload your universe you get access to the data.
>>: [inaudible].
>>: Then if you upload the universe and you own it you own all the programs that are entered
through that universe.
>> Judith Bishop: Yea, that’s already the case, but –.
>>: No that’s not the case.
>>: Not accessible to the teacher.
>> Judith Bishop: Ah.
>>: It’s there on the server, right?
>>: Right.
>> Judith Bishop: Okay, I think that’s one of the changes we asked –.
>>: [inaudible]. If you want to do studies then you would be able to query the Cloud and
download on demand.
>> Judith Bishop: Yea.
>>: Ideally you would like to have something like Code Hunt as a service where you can access
the different aspects. Say for example I want to use the back end, but I want to use my own front
end and I can just provide you with whatever the proposed solution is and you give me back the
results, or I can use the different pieces. But, I don’t know how modular you can make
that.
>>: That already exists.
>>: But where does it break down? Where are the individual modules that you can use?
>>: The big distinction is front end/back end. So the back end manages the data and performs
the test case generation and it is driven by the front end, which does all the UI basically. If we
make the front end open source then you could go in and change the [indiscernible] and talk to
the back end. So instead of using [indiscernible] to generate test cases you could call your
own test generation to produce that.
>>: That’s the REST interface that you were talking about.
>>: Yes, exactly.
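The front end/back end split and the idea of calling your own test generation can be pictured with a small sketch. Everything in it is hypothetical and is not the Code Hunt API: `TestGenerator`, `evaluate`, `Verdict` and both generators are illustrative names, and the programs are plain Python functions standing in for submitted code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Inputs = Tuple[int, ...]
Program = Callable[..., int]
# Hypothetical extension point: anything that proposes inputs for a program.
TestGenerator = Callable[[Program], Sequence[Inputs]]


@dataclass
class Verdict:
    inputs: Inputs
    expected: int
    actual: int

    @property
    def passed(self) -> bool:
        return self.expected == self.actual


def fixed_inputs_generator(program: Program) -> Sequence[Inputs]:
    """Stand-in for a built-in generator: a fixed list of probe values."""
    return [(-1,), (0,), (1,), (10,)]


def evaluate(secret: Program, attempt: Program,
             generate: TestGenerator = fixed_inputs_generator) -> List[Verdict]:
    """Back-end side: run both programs on whatever inputs the generator proposes."""
    return [Verdict(args, secret(*args), attempt(*args)) for args in generate(attempt)]


def even_only_generator(program: Program) -> Sequence[Inputs]:
    """A user-supplied generator, plugged in without touching evaluate()."""
    return [(n,) for n in range(0, 10, 2)]


secret = lambda x: x * 2      # made-up secret program
attempt = lambda x: x * x     # made-up player attempt

default_verdicts = evaluate(secret, attempt)
custom_verdicts = evaluate(secret, attempt, even_only_generator)
```

The point of the sketch is only that the evaluation loop does not need to change when the generator does, which is the property the speakers describe for the front end talking to the back end.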
>>: We had an intern this summer that built Code Hunt in TouchDevelop.
>>: Yea, that’s an example.
>>: So he wrote a plug-in in TouchDevelop and he basically rebuilt a complete front end based on
the TouchDevelop language with a cross compiler to C#.
>>: And that was done without any need to modify the back end?
>>: No modification.
>>: That’s good, because I think that’s the kind of thing that would make it very –.
>>: With respect to what Nikolai proposed, which is making the front end open source, that’s very
important if you wanted to [indiscernible] on the UI. But, for some of the work, for example
hint generation or test data generation and selection, there are some well known important
features. If you could expose that as more of a callback service that we configure through
your website, it would just use the one that we hooked in.
>>: So Tao, that’s exactly why we want to go open source. You would define these hooks and
you build your system and then we would take the pull request and see where we can implement it
on our side.
>>: Because you can do this with the same interface, the REST interface. You could just submit
that you want to set this parameter this way and this parameter that way. If I understand what
you mean. Are you saying, “Define the parameter for example like how you do symbolic
execution?”
>>: Right.
>>: So many such extensions you can probably do just by tweaking the front end. For the hint
generation it turns out that actually the hint generation was pretty much a complete standalone
project. There are very few exceptions. It needs access to the secret solution and attempts from
other people, which you would actually learn over time, but if you plug it in somewhere in the
middle it also would like to have access to the existing database. So the tricky part here is to
figure out: What do you need to do a particular extension that you might be interested in? Then,
like what [indiscernible] said, once you identify that, you can add the right hooks to make that
happen. So again if anyone at this point has some particular research ideas that you are very
interested in then it would be interesting to know and we can discuss what particular extension
points you would actually need to turn that into a reality.
>>: So path conditions.
>>: Path conditions, very good, we have those. So the user enters the program and then when
they click the button you would want all the path conditions that [indiscernible] discovered to be
sent to your server to be analyzed?
>>: Yea, the ones that the oracle and the test differ on.
>>: Oh, the mismatches. That’s a very interesting concrete ask, okay. I can see that works.
It’s interesting that it doesn’t even reveal the secret program in all detail. It’s kind of an
abstraction and that’s interesting.
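As a rough illustration of this request, the sketch below groups mismatching inputs by the path condition of the player's attempt, so only paths and inputs are reported and the secret program's source is never shown. It is a toy under stated assumptions: the path condition is recorded by hand-instrumenting the attempt with a `trace` list rather than by symbolic execution, and `secret`, `attempt` and `mismatching_paths` are made-up names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def secret(x: int) -> int:
    """Made-up secret program; only its outputs are ever observed."""
    return abs(x)


def attempt(x: int, trace: List[str]) -> int:
    """Made-up player attempt, instrumented by hand to record the branch taken."""
    if x > 0:
        trace.append("x > 0")
        return x
    trace.append("not (x > 0)")
    return 0


def mismatching_paths(inputs: Iterable[int]) -> Dict[str, List[int]]:
    """Map each path condition of the attempt to the inputs where it disagrees with the secret."""
    by_path: Dict[str, List[int]] = defaultdict(list)
    for x in inputs:
        trace: List[str] = []
        actual = attempt(x, trace)
        if actual != secret(x):
            # The conjunction of recorded branch predicates is the path condition.
            by_path[" and ".join(trace)].append(x)
    return dict(by_path)


report = mismatching_paths(range(-2, 3))
print(report)
```

Here the attempt is wrong only on the `not (x > 0)` path, and only for negative inputs, so the report narrows down where the player's program disagrees without exposing how the secret is written.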
>>: If you put the path conditions together with the pairs you would reveal it to some extent.
>>: Yea, I guess if the user program was completely trivial then it’s going to mismatch pretty
much everywhere and that will reveal the entire program. Anyways, path conditions, that’s
interesting and any other thoughts? I mean we heard about the C programming system and the
extension mechanism in there and you said you already thought about other extensions. Do you
have some experience in the surface area, the API, the interface, that is in between an external
extension and your system or do you still think that you will tweak it as new extensions come
up?
>>: Yes, we can tweak it. [indiscernible]. I did not get much time to design these things.
[indiscernible].
>>: Yea we are kind of in a similar situation for Code Hunt when it comes to extensions.
>>: Right, right.
>>: So this is kind of a logistic question, but since we are getting the data from you, are there any
specific, somewhat synergetic activities that all these researchers could do? I mean I could
imagine some competition on a particular kind of feature. I see a lot more results would be
produced by these more loosely coordinated kind of [indiscernible].
>> Judith Bishop: Well I think we were thinking of doing some analysis of the data. Is that still
in the chapter or did we dump it?
>>: For now it’s still in the chapter.
>> Judith Bishop: Okay, so you mentioned that you had some tools that can analyze programs.
>>: It’s just [indiscernible].
>> Judith Bishop: Uh huh.
>>: It’s just some [indiscernible] tools that give some hint about code quality.
>> Judith Bishop: So you would be looking at the grades, whether they are going up or going
down and I think you were also doing that? So I am not sure whether we would be having
competitions, but we could be having paper collaborations once we start analyzing this data,
because we would be analyzing it in different ways.
>>: So I think another thing would be private [indiscernible] or open source. So for example in
the past all these extensions that my students [indiscernible] and other people could leverage that
attempt at whatever usage they could. So it seems we are really working on similar data sets or
the same data sets and building some features. I think that some of them could be data analysis.
Some of the tools could be just building some feature: hint generation, test data selection,
progress indication, whatever. I think it really makes sense to have kind of a sharing in terms of
the progress and the infrastructures.
>>: Yea, I mean right now we give you the data, but it doesn’t exactly come with a written
analysis framework, so just sharing that kind of infrastructure would help. Daniel has some infrastructure
based on [indiscernible] that is able to parse code and that’s a good first step for anything. And I
don’t know what kind of state it is in right now, but that’s definitely something.
>>: Uh sort of the problem is that my parsing tools are all intermeshed with my synthesis code.
So they are not really particularly portable to other users.
>>: Yea, so you should spend some time to factor that out possibly.
>> Judith Bishop: Yea.
>>: In any case, having a shared Git repository that –.
>>: [indiscernible].
>> Judith Bishop: So here’s a practical question: What’s the best way for the community to keep
in touch and share ideas? And eventually e-mail just gets out of hand.
>>: A forum seems to be a good way.
>> Judith Bishop: Okay, which forums do people actually visit?
>>: Well, maybe we should release the data on GitHub.
>> Judith Bishop: Yea, we will release the data on GitHub and then we can set a forum up there.
>>: Right.
>> Judith Bishop: Okay.
>>: I think if you keep everything in GitHub that would be nice.
>> Judith Bishop: So the data release will spring that off then. Thank you.
>>: So what to do with the data? One big problem area that we discussed and Daniel presented
is hint generation. How can we help people to actually move forward? How can we identify that
people are stuck? There is another dimension that I think we very briefly mentioned, but then
didn’t discuss in more detail and that’s quality assessment of the code. So right now what we do
is we count [indiscernible] instructions, bytecode instructions. That’s a proxy for how compact
your algorithm is and it seems to motivate people to keep going, but we don’t really know if that’s
the most engaging way of keeping them going. Sometimes it’s confusing, and other possible quality
metrics are complexity or some other beauty metrics. So given the data one could certainly have
different quality assessment metrics and run that over the code base.
>> Judith Bishop: So in the data the winning solutions are identified exactly. There is a
progression of attempts and then plunk, the winning solution. So one could just take winning
solutions and then apply different objective criteria and see how they stack up against each other.
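A minimal sketch of that idea follows, assuming a record layout that is not confirmed by the discussion: each attempt is a dictionary with hypothetical `player`, `won` and `source` fields. The two metrics (non-blank line count and a crude token count) are only stand-ins for whatever objective criteria one wants to compare.

```python
import re
from typing import Callable, Dict, List


def line_count(source: str) -> int:
    """Number of non-blank source lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def token_count(source: str) -> int:
    """Crude token count: identifiers, integer literals, and single symbols."""
    return len(re.findall(r"[A-Za-z_]\w*|\d+|\S", source))


def score_winners(attempts: List[dict],
                  metrics: Dict[str, Callable[[str], int]]) -> List[dict]:
    """Apply every metric to each winning attempt and skip the rest."""
    rows = []
    for a in attempts:
        if a["won"]:
            row = {"player": a["player"]}
            row.update({name: fn(a["source"]) for name, fn in metrics.items()})
            rows.append(row)
    return rows


# Made-up sample records; the real released data may be laid out differently.
attempts = [
    {"player": "p1", "won": False, "source": "return x;"},
    {"player": "p1", "won": True, "source": "return x + 1;"},
    {"player": "p2", "won": True, "source": "int y = x;\ny = y + 1;\nreturn y;"},
]

scores = score_winners(attempts, {"lines": line_count, "tokens": token_count})
for row in scores:
    print(row)
```

Adding another criterion is a matter of passing one more function in the `metrics` dictionary, which makes it easy to see how the same winning solutions stack up under different measures.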
>>: And of course there is no information about the players, right?
>> Judith Bishop: No, except that we do know it’s player 1.
>>: You know it’s player 1, but there are no demographics and nothing.
>> Judith Bishop: We could actually add to it the 3 point thing that is self added by players,
which is whether they select themselves as novice, intermediate or expert.
>>: That would probably be good. Anything that can be added to the data I think will be good if
you can do some clustering.
>>: So you can analyze it on the data, but what does it really tell us? That is something we can
also turn into. So generating real hints is difficult, but another competition we could create is
let’s say you come up with your own new assessment then we could [indiscernible] it out in the
cloud and then give different players different assessments from different groups and see which
one creates the most stickiness that people actually stay in the system. So for that you don’t even
need to know the secret solution, you just have to analyze individual program snippets. So that
would be another idea.
>> Judith Bishop: So we’ve also got time stamps on the data. So if you want to evaluate
winning by time stamps, etc, how long it takes, you can go and do that computation if you want
to.
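The time stamp computation could look roughly like the sketch below. The field names (`player`, `level`, `timestamp`, `won`) and the ISO 8601 time format are assumptions made for illustration, and the sample records are invented.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def seconds_to_win(attempts: List[dict]) -> Dict[Tuple[str, str], float]:
    """Seconds between a player's first attempt on a level and their first win."""
    first_seen: Dict[Tuple[str, str], datetime] = {}
    result: Dict[Tuple[str, str], float] = {}
    ordered = sorted(attempts, key=lambda a: datetime.fromisoformat(a["timestamp"]))
    for a in ordered:
        when = datetime.fromisoformat(a["timestamp"])
        key = (a["player"], a["level"])
        first_seen.setdefault(key, when)
        if a["won"] and key not in result:
            result[key] = (when - first_seen[key]).total_seconds()
    return result


# Invented sample records; dates and identifiers are placeholders.
sample = [
    {"player": "p1", "level": "L1", "timestamp": "2000-01-01T10:00:00", "won": False},
    {"player": "p1", "level": "L1", "timestamp": "2000-01-01T10:02:30", "won": True},
    {"player": "p2", "level": "L1", "timestamp": "2000-01-01T11:00:00", "won": True},
]

durations = seconds_to_win(sample)
print(durations)
```

A player who wins on the first recorded attempt gets a duration of zero, so any real analysis would need to decide whether that reflects skill or simply missing earlier attempts.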
>>: I just want to say for me the criterion for the best solution is the one with the fewest paths. But, you
actually are sitting on that information because you have to have it already when you do the
symbolic execution.
>>: Well we sample it. It’s typically not exhaustive if you have anything that’s truly interesting
with loops, but there is interesting information that we should make available. So the data we
give you contains the users’ submissions and what you can do easily with some script that again
should be in that GitHub repository, you can fire that off to the Cloud to get the actual test cases
out. So I think the data that we will distribute doesn’t actually have the test cases in it that are
generated, but [indiscernible] in the Cloud and is available behind the API that I showed. And
again today that wouldn’t give you the path, but a projection of that to the test cases and we
could give you more.
>> Judith Bishop: Good feedback, yea. I mean with version 2 with the data we could do that
kind of a thing. Okay, so I think we should have our last question so that we don’t overrun,
Tao?
>>: I mean you could also run [indiscernible].
>>: No execution paths beyond a second limit, yes.
>>: So you probably can’t provide this, but –.
>>: In the Mining Software Repositories conference community [indiscernible]. Basically every
year they just pick one or two open source projects, we [indiscernible]. Then in the end they
have this small competition, where they accept papers and present the papers at the
conference [indiscernible]. And that could be combined with the workshop idea that we
discussed.
>> Judith Bishop: Yea, very much so. I think data is a key that opens many doors.
>>: You had a question?
>>: No I had a comment that we could have some kind of advanced tool where we give a
program and we ask people to shorten it.
>>: Yea.
>>: May I challenge that idea that shorter cycles are better in a program? I mean I come from a
software evolution background and usually the programs that have the shorter cycles might not be
the ones that, from a comprehension point of view, are better. So what are we trying to teach the
students? Are we trying to teach the most performant way?
>>: I mean, I think they are truly different dimensions and all of them make sense. A compact
program has advantages, and in some settings you want the best complexity, and in another setting
–.
>>: It’s also an exercise in program understanding. You have to be able to understand the
program very deeply in order to be able to shorten it. [indiscernible].
>> Judith Bishop: It’s the same as when you are learning a language, you learn [indiscernible],
you learn summary and you learn comprehension.
>>: So I think we should expose all of these different dimensions to the player, because I might
be interested and they can look at all of these axes, but what we also can do since we have a
human being involved is try to figure out what is most fun to optimize?
>> Judith Bishop: Yea.
>>: So that is a component and this is all great for teaching, but what we have found is that the
fun aspect is really what’s unique and what keeps people going. That is something that we
should also study to keep this going by itself. We don’t want to have to have a teacher sitting
behind you to do something.
>>: Right, from a [indiscernible] perspective I remember in high school I tried to do the shortest
solutions and see. I tried to do everything on one line and as I went onto programming for actual
companies, from a self [indiscernible] or maintenance perspective, this doesn’t make sense at all.
>>: Well, not the one-line part, yes.
>> Judith Bishop: Daniel, you wanted to add something.
>>: I am just going to say that also because we are measuring [indiscernible] instructions and not
[indiscernible] it’s actually often quite unintuitive what’s the shortest. In fact when I was
playing around and trying to minimize stuff in code [indiscernible] I actually [indiscernible] so I
could figure out why some program is actually not shorter, to figure out what the compiler is
actually doing and otherwise minimizing [inaudible].
>> Judith Bishop: A master code hunter. Okay, on that note I think we should close our
workshop, thank our panelists very much and go forward to do great things.
[Applause]