>> Anoop Gupta: So let's start. And apologies that we're starting ten minutes late, some
technical difficulties. So good morning, everyone. I'm Anoop Gupta from Microsoft
Research, Redmond, and welcome to the talk on synthesis for education by Sumit
Gulwani. Sumit is a senior researcher at Microsoft Research, Redmond. He got his
Bachelor's from IIT Kanpur and then his PhD from UC Berkeley in 2005. His current
research interests are in the cross-disciplinary application areas of end-user
programming, program synthesis and intelligent tutoring systems. Sumit's recent
work on program synthesis shipped as part of our latest Excel 2013, something called
Flash Fill that works like magic to simplify programming for end-users.
In this talk, Sumit is going to talk about some surprising applications of similar
technology to the area of intelligent tutoring systems; in particular how, you know, you
might generate new kinds of problems to offer to students, the solutions to these
problems as practice problems and automatic grading.
As many of us are aware, online courses, and these massive open online courses,
MOOC's, have become very popular during the last year and truly hold the potential for
enabling quality education for everyone everywhere. The kind of research that Sumit is
presenting today, I think will be an important component for us to truly realize the
potential of such online education. So without further ado, I'm excited to have Sumit
Gulwani come and present.
This talk is being given as a part of the monthly ExCAPE Webinar Series and is being
watched by many people around the world. So welcome all of you. Sumit will give you
some more details and Professor Rajeev Alur from UPenn, if he can join us -- They
were, again, some technical difficulties -- will introduce a little bit more about how these
are being recorded. Thank you. Sumit, welcome.
>> Sumit Gulwani: Thanks, Anoop. Rajeev, are you there? Do you want to add
something? Okay. I'm afraid we probably cannot hear Rajeev because he was having
some technical difficulties. So this talk is going to be recorded and the link will be
available in a couple of days, so you can either contact Liz or I will also try to put this link
on my webpage.
So thanks once again, Anoop and Rajeev for hosting me for this talk. I'm going to talk
about intelligent tutoring systems. This is a cross-disciplinary research area and a quite
diverse one. My own technical inspiration for this research area derives from my recent
work on program synthesis. So let me start out by giving a small background on these
technologies.
So the traditional goal of program synthesis is to synthesize computer programs or
computational artifacts from specifications that are usually available in the form of logical
relations between the inputs and outputs of the program. So this is essentially aimed for
helping software developers write code, and I myself worked in this area for several
years and played around with a variety of search techniques for searching the program
from the logical specification that the user had provided.
And these search techniques have a lot of inter-disciplinary flavor, so they were inspired
by my own background in formal methods. But then, we also learned a lot from the
traditional work done in AI and machine learning.
But recently we started applying these techniques to helping end-users. So these are
people who have access to computational devices but are not expert programmers. So
we want to help them synthesize small [inaudible] for automating repetitive tasks in their
lives. But these people are not going to be able to specify their intent using logic, but
they're going to specify their intent using examples. They're going to specify their intent
using natural language. And it turns out that the techniques that we developed for
end-user programming are also very applicable to the domain of intelligent tutoring systems.
And this is what I'm going to talk about in this talk today.
So you can also find a commentary on some of this material in a recent paper that I
wrote based on a recent keynote which appears in SYNASC 2012.
So there are quite deep connections between end-user programming and intelligent
tutoring systems. So in end-user programming the end-user teaches the computer by
means of giving examples or demonstrations, and the computer applies that knowledge
to automate the repetitive task for the user. And now the parallel in the education world is
that the teacher is going to teach the computer and the computer is going to apply that
knowledge to teach the students instead of automating the repetitive task.
Also it turns out that the way end-users interact with the computer is usually by means of
specifications which are often ambiguous. And hence some interactivity is required to
facilitate the interaction between the end-user and the computer. And a similar
interactivity can be used to facilitate the interaction between computer and students to
resolve the students' confusion.
Now what this means is that the research in end-user programming, which is a research
area in which it is relatively easy to make money, can fuel research in intelligent tutoring
systems, which is a topic which has huge societal implications and vice-versa.
So in fact we have developed some techniques in intelligent tutoring systems which are
now bringing back [inaudible] end-user programming. So without further ado let me
now directly jump into the topic of intelligent tutoring systems.
So there are several aspects in intelligent tutoring systems, and I'm going to talk about
four of these aspects in this talk, four different parts. So I'm going to focus on problem
generation, automatically generating solutions to these problems, automated grading
and how to enter content inside the computer.
And you will notice that most of these techniques are going to be a little bit more
specialized for the domains for which they have been developed. And these techniques
can be applied to a variety of domains ranging from mathematics and science but also to
programming subjects, teaching automata theory, logic, even to board games. And
some of these ideas and inspirations also carry over to the field of language learning.
So at this point let me see if someone wants to ask a question.
Okay. I don't see any question yet. Okay, so let me start out by problem generation.
>>: You need to focus [inaudible].
>> Sumit Gulwani: So the motivation for automatically generating problems is plenty.
Often times you might want to generate problems that are similar to a given problem,
and this is often helpful in avoiding copyright issues. So if you see some problem in a
textbook, you cannot simply copy it and put it online for the students. So we like to be
able to generate problems that are similar to the problem that was given in a textbook.
We might use this to prevent plagiarism in several settings. So in MOOC's where you
have a massive number of students trying to take an exam, you ideally would like to
provide a different problem of the same difficulty level to each different student.
And in fact one of the problems that happens in MOOC's is the issue of unsynchronized
instruction. So if you offer an exam and you publish your solutions then some other
student wants to take the exam at a later point of time, you want to provide them with a
fresh set of problems.
Often times you might also want to generate problems in an absolute way; so, you want
to generate problems of a given difficulty level and problems that require exercising
certain concepts. And this can be useful in various ways. So the moment you start
developing notions of difficulty levels of problem, you can compare different
progressions in different textbooks; you can evaluate the quality of different textbooks.
You can also generate personalized workflows for the students. So if a student tries to solve
a problem and fails, you want to present a problem which is simpler than that problem.
Or if a student solves a problem correctly, you want to present them with a more difficult
problem. So just as how it is done in exams like GRE, but now you can use this idea in
a more constructive fashion to aid learning in classrooms.
So I'm going to talk about key ideas which we have been using in our work on being able to
automatically generate problems. And the first idea is what I call Guess and Verify. And
this technique essentially works when you have an extremely fast capability to check the
quality of a problem that you have generated or the correctness of a problem that you
have guessed.
So let me start out with the domain of algebraic identities or algebraic proofs. So
consider this red problem that I have taken from quite a famous textbook. Now I cannot
put this problem simply online for practice for the student because it is a creative
concept, and I would be violating copyright issues. But now we have developed a tool
that can take this problem and automatically generate all the green problems that you
see. So this is problem generation by example.
All these green problems have a very similar syntactic structure to the original problem,
and it also turns out that they have similar difficulty to the original problem. But in this
particular case, the system also ends up generating these two--. Okay, so the way that
this works is that the red problem is generalized into an orange problem. And then, we
try to find the [inaudible] of the orange problem which are well defined problems. But it
also happens that we end up generating the last two problems that you see here which
are much easier because they are of the form: alpha plus beta times alpha minus beta
equals alpha squared minus beta squared which is something that is true for all alpha's
and beta's.
But one way to avoid such problems is to strengthen the orange constraint by adding the
constraint that the term T should not be equal to T five. So T was a term that was a
generalization of the first trigonometric operator, a secant of X, and it stands for any kind
of trigonometric operator that can be put in its place and so on.
And this is in fact the generalization that [inaudible] are producing in the first place, and
once this is there then these last two problems which are relatively simpler, kind of
disappear. So the point that I want to stress forward here is that this is a tool that can
either be used in a completely automatic manner by a teacher or for a super-user, the
user can also interact with this tool by controlling how rich or how restricted the orange
query is.
So this is quite a general purpose tool which works for several algebraic domains. So
here is...
>>: So do you do combinatorial search to [inaudible] and rule out the ones that are not
valid?
>> Sumit Gulwani: Yes. So great question. So the question is how do we do this? So do
we do combinatorial search to try out all possibilities? And this is exactly what we do. So
the technique, the general principle in this here is to do combinatorial search and quickly
verify whether a problem is correct or not. So thanks for bringing up that question.
So if you look at this [inaudible] here, so each of the operators is replaced by a hole
which can be plugged in by an operator of the same type signature. And in this particular
case we have six holes, and then there's also a binary hole for a plus or a minus. So
we're going to try out all possibilities. So each trigonometric hole can have one of six
trigonometric operators. So we see that [inaudible] is actually quite big. But then, once
we put in a nondeterministic choice inside these holes, we need a method to test
whether the problem is correct or not because if you arbitrarily or randomly plug in these
holes with trigonometric operators you might not get a correct problem.
We actually have a very efficient test to test for the correctness of these problems. And
this test is an extension of so-called Polynomial Identity Testing which states that if you
check the correctness of these problems on a few random inputs, and if the left-hand side
and the right-hand side turn out to be the same when you evaluate these terms on those
random inputs then with very high probability over the choices of the random inputs, this
problem was indeed a correct problem. And this is the fast verification check that we
use to essentially identify quickly and filter whether a problem is correct or not.
>>: So the [inaudible] is stored somewhere in the database so you can see whether
something is exactly the same or not?
>> Sumit Gulwani: So...
>>: [inaudible].
>> Sumit Gulwani: So the groundwork here is, I choose a random value of x and I
evaluate a guessed problem with respect to that random value. And I see whether the
left-hand side and right-hand side were evaluating to the same value or not. So this is a
general concept, works for a variety of algebraic operators for [inaudible] randomized
identity testing holes.
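[Editor's illustration] The guess-and-verify loop described above can be sketched roughly as follows. This is a minimal sketch, not the speaker's tool: the template T1(x) * T2(x) == T3(x), the sampling interval, the tolerance and the function names are all assumptions chosen for illustration, since the actual problems shown on the slides are not in the transcript.

```python
import itertools
import math
import random

# Six trigonometric operators that may fill a hole (the hole type is an assumption).
TRIG = {
    "sin": math.sin,
    "cos": math.cos,
    "tan": math.tan,
    "csc": lambda x: 1.0 / math.sin(x),
    "sec": lambda x: 1.0 / math.cos(x),
    "cot": lambda x: 1.0 / math.tan(x),
}

def is_identity(t1, t2, t3, rng, trials=5):
    """Randomized check of the hypothetical template T1(x) * T2(x) == T3(x)."""
    for _ in range(trials):
        x = rng.uniform(0.1, 1.4)  # stay away from the poles at 0 and pi/2
        lhs = TRIG[t1](x) * TRIG[t2](x)
        rhs = TRIG[t3](x)
        if not math.isclose(lhs, rhs, rel_tol=1e-9):
            return False           # one disagreeing input refutes the guess
    return True                    # agreed on every random input

def generate_problems(seed=0):
    """Brute-force every way of filling the three holes and keep the valid ones."""
    rng = random.Random(seed)
    return [c for c in itertools.product(TRIG, repeat=3) if is_identity(*c, rng)]

problems = generate_problems()
```

Here all 216 fillings of the three holes are tried, and only those that agree on every sampled input survive, for example ("sin", "sec", "tan"). A richer template with more holes would follow the same pattern, with a larger search space.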
So for example if you give our tool this red limit problem, it will end up generating these
green problems which have a similar structure and also turn out to have the same
difficulty level. And here is the orange term that we generalize the red problem to. And
here's an integration problem. You give this red problem; our tool generates these green
problems which you can go and put on your websites.
Here's a determinant problem, and these are the other very similar problems that you
automatically generated. So now before I move on to another application, are there any
questions that the audience has? Okay, so let's move on. So let me show you another
quick example of the same methodology. And this time in the context of generating
problems for natural deduction which is a topic that is taught in introductory logic
courses.
So [inaudible] idea was that all of this work that I'm presenting today is joint with several
collaborators, and I'm putting down the name of the people who are involved when I talk
about the specific projects. Oops
So in this particular case we are talking about natural deduction problems which have
the form given some premises -- In this case, Premise 1 and Premise 2 -- prove that the
conclusion is true. And one way to generate similar problems might be for the teacher to
have a tool where the teacher can generate some replacements for either these
premises or these conclusions. So in this case if the teacher says, "I want replacements
for Premise 1 --"
[recording sound]
>> Sumit Gulwani: So if the teacher wants to generate replacements of Premise 1 then
this is what our tool gives them. And, again, the way that this technique works is that we
have an extremely fast check for checking the correctness or the well defined-ness of a
given problem. So what we do is we enumerate all possible formulas. But the number of
formulas is actually huge, so what we do is we enumerate all possible truth tables that
correspond to formulas that can be represented in a small space. And the number of
such truth tables that correspond to a small formula is actually much less than 2 to the
power of 2 to the power N, where N is the number of Boolean variables. So if N is 5, I think
this number is around 20,000.
Then, for each of these things we quickly check whether the problem is correct or not by
representing the truth table by bit [inaudible]. So all that we need is like 6-bit [inaudible]
checks to check whether one of the problems that we have guessed is correct or not.
Right? Very similar to the principle that I was using in algebraic identities that will try out
all possibilities by brute force search and have a very fast verification check to figure out
whether the problem is correct or not. And the same principle we use here as well.
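[Editor's illustration] The bitvector check mentioned above can be sketched as follows, under the assumption that a truth table over N variables is stored as an integer with one bit per row. The two-variable setup, the modus ponens example and the helper names are illustrative choices, not details taken from the talk.

```python
N_VARS = 2                  # Boolean variables p and q (illustrative choice)
ROWS = 1 << N_VARS          # number of truth-table rows
FULL = (1 << ROWS) - 1      # bitvector with every row set

def var_table(i):
    """Truth table of variable i: bit r is set when variable i is true in row r."""
    return sum(1 << r for r in range(ROWS) if (r >> i) & 1)

def implies(a, b):
    """Truth table of a -> b, computed with bitwise operations."""
    return (~a | b) & FULL

def entails(premises, conclusion):
    """Valid iff no row satisfies every premise while falsifying the conclusion."""
    conj = FULL
    for table in premises:
        conj &= table
    return (conj & ~conclusion & FULL) == 0

P, Q = var_table(0), var_table(1)

# Guess and verify: every truth table that can replace Premise 1 in
# "Premise 1, p -> q, therefore q" while keeping the problem valid.
replacements = [t for t in range(FULL + 1) if entails([t, implies(P, Q)], Q)]
```

Each validity check is a handful of bitwise operations, which is what makes enumerating every candidate truth table affordable. A real generator would presumably also filter out trivial candidates, such as a premise that is always false.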
Okay, so now let me talk about another key idea that is often helpful in problem
generation which is that you think of it as a reverse process of solution generation. And
here you might want to use some symbolic methods or symbolic techniques.
So I won't go into any more details but I will just show you an application of it. And this
time in the context of generating good practice problems for playing board games. So
imagine you want to design a new board game. So let's say we're talking about -- Okay,
so I see that someone probably has a question here. So I'm getting confused because
people are -- So are people able to hear me?
>>: [inaudible]....
>> Sumit Gulwani: It might be [inaudible]. Okay.
>>: I think you've [inaudible]...
>> Sumit Gulwani: Okay.
>>: Can you tell what time [inaudible]?
>> Sumit Gulwani: Yeah, let me just see if I have some --. Okay. So --. So can someone
confirm that they can hear us? Okay. Oh, people can hear us. Okay, good. Thanks, Liz.
Okay, so now suppose you want to design a new board game, Tic Tac Toe on 4-cross-4.
Right? And you want to have a rule where you would only allow matches of three but
only along rows and columns, not along diagonals. And another reason why you would
want to generate new interesting starting configurations for such games is that if you
play on this new game and if you start from the starting configuration, the first player is
almost always going to win. It is trivial for the first player unlike in 3-cross-3. But if
you start from an initial configuration which is not the empty board then this game
becomes interesting. Also, even if the game was already interesting, if you start from the
default state, people can memorize moves. But if you start from a given initial
configuration, you can actually train students to be able to play certain strategies. So
what we have is a system where you write down the rules of the game, and you specify
the difficulty level for the opponent and for the player, and out comes some good,
interesting starting play configurations that you can play with.
And the way that this technique works is, it's kind of like solution generation in reverse.
So I won't have more time to go into details but let me actually move on.
So now I'm going to talk about a third technique for problem generation which is inspired
by techniques developed in the software engineering community on test input
generation. So these kinds of techniques work for procedural problems that are
especially common in middle school mathematics but you'll also see them in advanced
courses as well.
So this is a progression of problems for practicing addition of numbers that I've taken
from middle school curriculum. So how can you figure out what is the quality of these
problems and how can you generate more problems of their particular kind? So the key
idea is that you first write down the procedure that the student is supposed to execute in
order to be able to add these numbers. So let me see if someone has a comment here.
Okay. Okay, so most people are able to hear me so I'll ignore that. Okay. So now once
you write down the procedure and label interesting statements in this procedure with
tags which denote which branch was a duplicate or --.
[filtered echo sounds]
Then you can distinguish between these problems...
[filtered echo sounds]
Then we can distinguish these problems by observing their traces.
[filtered echo sounds]
I can hear some echo. I was wondering if someone has their microphone turned on...?
[filtered echo sounds]
Okay. So one way to distinguish between all these different problems that we see in the
progression in a textbook, is to observe the trace that was executed when this problem
was done or the procedure that the student was supposed to execute in order to solve
this problem. And then you can see that these problems turn out to be quite different, but
maybe the second and third one are quite similar because of their trace characteristics.
And they are supposed to be because they look quite symmetrical; they are symmetrical.
Now you can use these traces to define the difficulty level of a problem. The longer the
trace, the more number of loop iterations that it goes through, the more number of
exceptional conditions that it executes in this procedure, the more complicated our trace
might be.
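[Editor's illustration] One way to picture the trace idea is to instrument the column-by-column addition procedure so that it records a tag at each step. The tag names and the difficulty score below are assumptions made for this sketch; the talk does not spell out its exact labels or scoring.

```python
def add_with_trace(a, b):
    """Add two non-negative integers column by column, recording a trace.

    Tags (hypothetical labels): 'N' = column with no carry out,
    'C' = column that produced a carry, 'X' = extra leading digit
    written because of a final carry.
    """
    xs = [int(d) for d in str(a)][::-1]   # least significant digit first
    ys = [int(d) for d in str(b)][::-1]
    carry, digits, trace = 0, [], []
    for i in range(max(len(xs), len(ys))):
        s = (xs[i] if i < len(xs) else 0) + (ys[i] if i < len(ys) else 0) + carry
        digits.append(s % 10)
        carry = s // 10
        trace.append("C" if carry else "N")
    if carry:
        digits.append(carry)
        trace.append("X")
    total = int("".join(map(str, reversed(digits))))
    return total, "".join(trace)

def difficulty(trace):
    """Toy score (an assumption): trace length plus one per carry or extra digit."""
    return len(trace) + trace.count("C") + trace.count("X")
```

For instance, 12 + 34 yields the trace "NN", while 58 + 67 yields "CCX", so the second problem scores as harder. Problems with identical traces exercise the same steps of the procedure, which is the sense in which they can be called similar.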
So we use this concept to actually compare two different progressions that we got from
two different math textbooks. So this is about integer comparison. You're given two
integers and you have to figure out which integer is larger. So we have a green
progression taken from a middle school mathematics textbook called Jump Math, and
we have a blue progression taken from a textbook called Skill Sharpeners.
So what we did was, we again wrote down the procedure that the student is supposed to
execute in order to be able to solve this problem. And then, we drew the graph of the
different problems for these progressions. So the blue progression is one and the green
is the other one. So now you will notice from this graph that the green progression
actually moves into more involved problems that require comparing numbers with more
digits in which the first several digits are actually equal. And in fact it moves quickly into
more involved problems.
But it actually ignores an entire class of levels, namely the ones corresponding to H and
L. So what are H and L? H and L are two specific cases where two integers can be
compared easily, not by virtue of comparing their digits one by one from left to right but
by simply counting the number of digits in those integers. So if an integer has
fewer digits then it's going to be smaller than the other integer.
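[Editor's illustration] The comparison procedure and its H and L cases might look like the sketch below. Reading H as "the first integer has more digits" and L as "it has fewer" is an assumption, as are the 'E' and 'D' tags for the digit-by-digit part.

```python
def compare_with_trace(a, b):
    """Compare two non-negative integers, returning (result, trace).

    result is '>', '<' or '='. Tags (hypothetical labels):
    'H' / 'L' = decided purely by digit count,
    'E' = equal leading digit skipped, 'D' = the deciding digit.
    """
    sa, sb = str(a), str(b)
    if len(sa) > len(sb):
        return ">", "H"
    if len(sa) < len(sb):
        return "<", "L"
    trace = []
    for da, db in zip(sa, sb):          # same length: scan left to right
        if da == db:
            trace.append("E")
        else:
            trace.append("D")
            return (">" if da > db else "<"), "".join(trace)
    return "=", "".join(trace)
```

A progression that never produces an "H" or "L" trace would be one that skips the digit-count cases entirely, which is the kind of gap the graph comparison is meant to expose.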
So now let's move on to the second part of the talk which is about being able to
automatically solve problems. So people often ask me, "Why do you want to develop
capabilities to automatically solve problems? Do you want students to cheat on exams?"
Well, not really. So there's a lot of motivations why we want to be able to solve problems.
So one big motivation is that it is an important ingredient to be used in the problem
generation phase itself when you want to generate problems that are not only similar to
a given problem but problems that involve solutions that have certain characteristics. Or,
when you want to generate sample solutions to the new problems that you have
generated using your problem generation tool.
Another reason for solution generation is to be able to generate customized solutions.
So oftentimes problems have multiple possible solutions, like programming problems or
proof problems, and one solution might appeal more to one student than to another
student who might be familiar with a different set of concepts.
It can be used to complete unfinished solutions from students. So instead of trying to
present a sample solution to the student which is very different from how the student
tried to solve the problem, it is better to complete the solution along the lines the student
was trying to think.
And there are two key ideas that we have used in terms of approaching this
problem. So one idea comes from our work on automatically synthesizing programs from
logical specifications and from specifications in the form of examples.
So let me explain this in the context of geometry constructions, ruler-compass based
geometry constructions. Suppose you are given this red triangle, and the goal is to
construct a green circle that passes through the three vertices of this red triangle. Now one
way to solve this problem is to construct a perpendicular bisector of XY then construct a
perpendicular bisector of XZ. These two perpendicular bisectors will intersect at a point
N and then, you draw a circle centered at that point whose radius is equal to the distance
between that point and one of the points on the triangle.
Now what I just said was actually a program. It's a straight-line program in which the
operations are ruler and compass operations. So it is a program synthesis problem. You
want to synthesize a straight-line program made up of ruler-compass operators which
can solve this particular problem.
The first thing that we have to do is to be able to formally specify the English description
that was there in this problem. In fact, we have also developed natural language processing
techniques which can take such a natural language description and automatically
generate a logical specification of the problem.
So in this case the problem has a precondition on points XYZ; namely, that XYZ are
three points which are not on a straight line. And there's a postcondition which relates
how C relates to XYZ. So C is a circle that passes through X. It also passes through Y. It
also passes through Z. So this is what the logical specification of the problem says, and
we have developed technologies which can automatically take the English description of
the problem and generate these logical descriptions.
Now once you have a logical description, we want to generate a program -- We want to
synthesize a program -- which, in this case, is a straight-line composition of geometry
methods. So there are types, like points, lines and circles, and there are geometry methods
like ruler, compass and intersect operator.
So if you go back to this problem, constructing the perpendicular bisector requires these
four different steps, constructing the other perpendicular bisector requires the other four
steps and then, you find the point of intersection. And then, you draw the circle between
them.
So this is the kind of program that we can automatically produce using our technique.
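[Editor's illustration] The circumcircle construction can be written as a straight-line program in coordinates. This sketch collapses each perpendicular bisector into one helper call instead of the four ruler-and-compass steps mentioned in the talk; the point and line representations and the function names are assumptions.

```python
import math

def perpendicular_bisector(p, q):
    """Bisector of segment pq, returned as (point on the line, direction vector)."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    direction = (-(q[1] - p[1]), q[0] - p[0])   # pq rotated by 90 degrees
    return mid, direction

def intersect(line1, line2):
    """Intersection of two non-parallel lines given as (point, direction)."""
    (x1, y1), (dx1, dy1) = line1
    (x2, y2), (dx2, dy2) = line2
    det = dx1 * dy2 - dy1 * dx2
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / det
    return (x1 + t * dx1, y1 + t * dy1)

def circumcircle(x, y, z):
    """Straight-line program: two bisectors, one intersection, one circle."""
    l1 = perpendicular_bisector(x, y)
    l2 = perpendicular_bisector(x, z)
    n = intersect(l1, l2)
    radius = math.dist(n, x)
    return n, radius

center, radius = circumcircle((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
```

For the right triangle used here the center lies at the midpoint of the hypotenuse, which gives a quick sanity check on the result. There are no loops or branches in `circumcircle`, which is what makes it a straight-line program.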
Now let me give you a little bit of [inaudible] of how it works. And before I describe how to
automatically produce the solution, let me address a simpler problem of how to
automatically check a given solution from the student. So let's say P is a geometry
program that a student has given us, and it's supposed to translate inputs I into outputs
O that have this precondition and postcondition that you see in blue.
So if the input I satisfies the precondition Pre then the output O should be such that it
satisfies the postcondition Post. So this is the verification problem. And the synthesis
problem would be that we will be only given the logical specification pre- and post- which
describes how the output should be related to the inputs, and the goal is to figure out a
program P.
Now let me first talk about a verification problem where the solution is given to us, and
we just want to check whether the solution is correct or not. In fact this problem is
decidable; there are known existing decision procedures, but they're extremely complex.
But I will present to you a new decision procedure which is very fast and it will enable us
to extend this technology to be able to synthesize the solutions in the first place.
So the efficient approach is actually random testing. It is exactly the same idea that we
used for checking the correctness of our guesses for algebraic identities which is
something that I covered during problem generation. And the idea is that if you test the
correctness of the student's construction on a random input, and if it succeeds then with
very high probability over the choices of the random inputs, the student's solution is
actually correct. And the correctness of this remark that I just made again follows from
an extension of Polynomial Identity testing which is a classical theorem in computer
science theory. And the same theorem applies to algebraic operators and geometric
operators which form the basis of a solution for automatically generating algebra
problems and automatically solving geometry problems.
So the synthesis algorithm works like this: so we take the logical specification and we
convert it into a random input-output example that satisfies that logical specification. And
we find this random input-output example using numerical methods. And then, we try to
solve or generate a program which can take the randomly chosen input to the
corresponding output.
And this is not very different from how students would solve these problems, right? They
will sketch out something using pen and paper and then try to work backwards and solve
them. And this is exactly what we do. So then, we just do brute force search. We start
from the input objects and then, we try to apply geometric operators randomly to them
until we are able to hit the desired output objects. And then, there are again some details
in how this technique scales but this is pretty much the basic broad idea.
And let me again give you an [inaudible] of why this approach actually works. So the
error probability of this algorithm is extremely low. So if I look at the same problem of
constructing a circumcircle, now if the input triangle is an equilateral triangle and if I did a
wrong construction like that of an incircle versus a circumcircle, I will notice that their
centers actually coincide. And I will be forced into believing that the construction that
works for the incenter is the correct construction for finding the circumcenter, but it is not
the case.
But when will this problem arise? It will arise only if when I choose the input triangle XYZ
randomly, I choose it to be an equilateral triangle. So what are the chances of me
choosing a triangle to be an equilateral triangle if I'm choosing it randomly? It's actually
very small. Right?
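[Editor's illustration] The argument can be checked numerically with a small sketch. It uses closed-form coordinate formulas for the circumcenter and incenter as stand-ins for the two constructions, and the tolerance, coordinate range and helper names are assumptions.

```python
import math
import random

def circumcenter(a, b, c):
    """Intersection of the perpendicular bisectors (closed-form coordinates)."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    sa, sb, sc = a[0]**2 + a[1]**2, b[0]**2 + b[1]**2, c[0]**2 + c[1]**2
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return ux, uy

def incenter(a, b, c):
    """Intersection of the angle bisectors (the 'wrong' construction here)."""
    la, lb, lc = math.dist(b, c), math.dist(c, a), math.dist(a, b)
    s = la + lb + lc
    return ((la * a[0] + lb * b[0] + lc * c[0]) / s,
            (la * a[1] + lb * b[1] + lc * c[1]) / s)

def meets_spec(p, a, b, c, tol=1e-6):
    """Postcondition: p is equidistant from all three vertices."""
    da, db, dc = math.dist(p, a), math.dist(p, b), math.dist(p, c)
    return math.isclose(da, db, rel_tol=tol) and math.isclose(da, dc, rel_tol=tol)

def random_test(construction, rng, trials=3):
    """Accept a construction only if it meets the spec on random triangles."""
    for _ in range(trials):
        a, b, c = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(3)]
        if not meets_spec(construction(a, b, c), a, b, c):
            return False
    return True

equilateral = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
scalene = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
```

On the equilateral triangle both constructions satisfy the specification, so that single input cannot tell them apart. On the scalene triangle, or on a randomly drawn one, only the perpendicular-bisector construction passes.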
So I will not be fooled into believing that angular bisectors will intersect at the same point
as the perpendicular bisectors because the chances of me choosing a random triangle
to be an equilateral one are going to be very small. So I see someone wants to ask a
question here. It's a good time to probably stop as well.
So, Liz, do you know who raised their hands and if someone wants to ask a question? If
someone wants to ask a question, please go ahead.
>>: [filtered inaudible]
>> Sumit Gulwani: Okay. Probably not. So let's move on. So let me present you -- Yeah,
go ahead.
>>: Is it a problem that you -- So I can accept that you would not create programs that
would generate the wrong solution. But is it a problem that you would create programs
that contain unnecessary or superfluous steps that do not actually contribute?
>> Sumit Gulwani: Yes. So the question is: can we create geometric programs in which
there are some unnecessary steps that are not needed? Or maybe I generate a longer
solution. So that's a good question. So the way we solve this problem is by pruning out
statements which are dead statements, so we only keep statements that contribute to
live variables. And we bias our search space by trying to figure out shorter solutions first
before we search for longer solutions. So both of them are valid problems, and we do
address that. Yeah.
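[Editor's illustration] Pruning dead statements from a straight-line program can be done with one backward pass over the statements. The tuple representation of statements and the operation names below are assumptions for this sketch.

```python
def prune_dead(program, outputs):
    """Keep only statements whose results feed, directly or not, into outputs.

    program: list of (target, op, args) in execution order.
    outputs: names of the objects the construction must produce.
    """
    live = set(outputs)
    kept = []
    for target, op, args in reversed(program):
        if target in live:
            kept.append((target, op, args))
            live.discard(target)    # defined here, so not needed from earlier
            live.update(args)       # its inputs become live
    kept.reverse()
    return kept

program = [
    ("L1", "perp_bisector", ("X", "Y")),
    ("L2", "perp_bisector", ("X", "Z")),
    ("L3", "perp_bisector", ("Y", "Z")),   # never used by the final circle
    ("N", "intersect", ("L1", "L2")),
    ("C", "circle", ("N", "X")),
]
pruned = prune_dead(program, ["C"])
```

Here the third bisector is dropped because nothing that leads to the circle C depends on it. The other measure mentioned in the answer, searching for shorter solutions first, is a property of the search order and is not shown in this sketch.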
>>: I have a question here. I assume that the purpose of doing solutions is to provide
feedback to the student so he can go through the steps to learn how to do that. But
that's [inaudible]. Like the example that you showed early on, you put a lot of steps
which are just for the programming purpose rather than just getting an original solution to
doing this.
>> Sumit Gulwani: Okay. So good question. So the question is that the program that I
showed you for geometry had lots of steps because drawing a perpendicular bisector
itself takes four operations if you look at ruler and compass operators. And giving
feedback to the students in the form of these larger programs might not be as
meaningful as giving them higher level concepts. And we do, again, cater for this. So by
trying to replace instructions which correspond to some known concepts like, "Draw
perpendicular bisector," so we take those four instructions and we replace them by one
instruction which says, "Clear the perpendicular bisector." So we do try to present
solutions in terms of higher level concepts.
Okay. So now let me show you another application of a very similar technology. In this
case it is about solving problems in an automata theory course. So one common
problem that you see in automata theory course is you are given a language and you
have to figure out whether the language is regular or not. If the language is regular, you
have to construct an automata. If the language is not regular then you have to give a
proof of non-regularity.
So what do you think? Is this language that you see in red, is this regular or non-regular,
language of all the strings that have the same number of occurrences of the substring
AB as the substring BA?
So it might appear that it is a non-regular language, but it is regular. It's a trick question.
And if you give this to our tool, our tool outputs the automata for this language. But if I
change this language slightly, same number of occurrences of A as occurrences of B,
then it is really counting and this is non-regular and our tool ends up generating a proof
of non-regularity.
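One way to see why the first language is regular: assuming the alphabet is just {a, b}, occurrences of "ab" and "ba" must alternate, so the two counts are equal exactly when the string is empty or starts and ends with the same letter. The Python sketch below hand-writes a five-state automaton on that basis and checks it by brute force against the counting definition. The state names and the two-letter alphabet are assumptions made for illustration; this is not the output of the tool.

```python
from itertools import product

# Each state records the first letter and the most recent letter seen.
TRANSITIONS = {
    ("start", "a"): "a..a", ("start", "b"): "b..b",
    ("a..a", "a"): "a..a", ("a..a", "b"): "a..b",
    ("a..b", "a"): "a..a", ("a..b", "b"): "a..b",
    ("b..b", "a"): "b..a", ("b..b", "b"): "b..b",
    ("b..a", "a"): "b..a", ("b..a", "b"): "b..b",
}
ACCEPTING = {"start", "a..a", "b..b"}


def accepts(word):
    """Run the automaton on a word over {a, b}."""
    state = "start"
    for letter in word:
        state = TRANSITIONS[(state, letter)]
    return state in ACCEPTING


def same_ab_ba(word):
    """Reference definition: equally many 'ab' and 'ba' substrings."""
    return word.count("ab") == word.count("ba")


def agree_up_to(max_len):
    """Compare automaton and definition on every word up to max_len."""
    return all(
        accepts("".join(w)) == same_ab_ba("".join(w))
        for n in range(max_len + 1)
        for w in product("ab", repeat=n)
    )


print(agree_up_to(10))
```

The second language, equal numbers of a's and b's, admits no such finite summary, because the automaton would have to remember an unbounded count.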
So again what is the methodology in terms of automating these things? We need to be
able to describe the problem in a formal language. And we have defined a user-friendly
logic for doing this, and the idea of designing these logics is that it should enable easier
translation from natural language using standard [inaudible] natural language processing
tools, and we have this come as part of our tool as well.
Then you need to have a language for describing the solutions. So what are the kinds of
solutions that we want? Well we want to produce an automata if the language is
regular, and automata is a concept that is well-defined so no need to formalize it any
further. But we want to formalize non-regularity proofs. So the proofs that we look at are
so-called Myhill-Nerode Theorem proofs. So I don't want you to get bogged down into
the Greek symbols here. But the observation is that the proof of non-regularity can be
expressed in terms of two functions, F and G, which we'll call congruence functions and
witness functions. And what we observe is that in real life, these F and G functions take
values from a simple language that you see here.
And then, what we want to do is to be able to develop an algorithm that can convert the
problem description to either an automata if the language is regular or it can convert the
problem description to a non-regularity proof, which in this case I just showed you comes
from a small, simple language. And for both of these we use standard results for --
Actually for automata construction we leveraged some standard results on automata
learning by Angluin; it's called the L-star algorithm, a very classical algorithm. And then
for generating non-regularity proofs, we again get back to our guess methodology where
we do an intelligent guessing of a solution. Then we test it and if it passes the testing
then it's verified. But again the details are a bit too much for me to cover in the talk.
But, again, the point that I want to stress forward is that most of these are search
problems. And one of the common methodologies that we use for the search problems is
brute force search, come up with an intelligent scheme to guess for solutions or a limited
[inaudible] space and then, use a very fast verification check to see whether the solution
is correct or not. So let me see if someone has a comment here.
Okay. So [inaudible] is asking, "How do you choose an input?" And so, [inaudible] was
this -- Am I too late in seeing your question? So is this question in the context of
automata or non-regularity proofs?
Okay. So let me then try to integrate more [inaudible] question. So the question is, how
do we choose an input? So in non-regularity proofs we have to construct these functions
F and G, that should satisfy some property for all I's and all J's; so I and J's are integers.
So what we do is we try to choose small values of I and J's and then, try to find an F and
G that works for those small values of I and J, then test it for even more values for I and
J and then, verify for all values of I and J.
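The guess-and-test loop for F and G can be sketched in Python for the language with equally many a's and b's. The candidate sets below are a small hypothetical stand-in for the "simple language" of congruence and witness functions, and the final step only tests on a larger range. The verification for all I and J described in the talk would need a symbolic argument that this sketch does not attempt.

```python
from itertools import product


def in_language(word):
    """Target language: equally many a's and b's."""
    return word.count("a") == word.count("b")


# Hypothetical candidate spaces for the two functions.
F_CANDIDATES = {
    "a^i": lambda i: "a" * i,
    "b^i": lambda i: "b" * i,
    "(ab)^i": lambda i: "ab" * i,
}
G_CANDIDATES = {
    "a^i": lambda i, j: "a" * i,
    "b^i": lambda i, j: "b" * i,
    "a^j": lambda i, j: "a" * j,
    "b^j": lambda i, j: "b" * j,
}


def distinguishes(f, g, bound):
    """Check that g(i, j) separates f(i) from f(j) for all i != j below bound."""
    for i, j in product(range(bound), repeat=2):
        if i == j:
            continue
        suffix = g(i, j)
        if in_language(f(i) + suffix) == in_language(f(j) + suffix):
            return False
    return True


def guess_proof(small=3, large=12):
    """Guess on small values first, then test the survivor on more values."""
    for (f_name, f), (g_name, g) in product(
        F_CANDIDATES.items(), G_CANDIDATES.items()
    ):
        if distinguishes(f, g, small) and distinguishes(f, g, large):
            return f_name, g_name
    return None


print(guess_proof())
```

With these candidates the search settles on F(i) = a^i and G(i, j) = b^i: appending b^i balances a^i but never a^j when j differs from i, so the strings a^0, a^1, a^2, and so on are pairwise distinguishable.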
So [inaudible], is your question about geometric constructions? How to choose inputs
there?
Okay, so [inaudible]'s mic does not work. But in case of geometry, we try to solve for a
random solution using some of the existing off-the-shelf numerical solvers. So they might
not give us perfected [inaudible] solutions, but it seems to be good enough for all the
practical experimentation that we did.
So let's move on. Yeah? So now I'm going to show you another technique for solution
generation. In this case it is from demonstrations, and this works for procedural
problems. It's a very different take at the problem. So consider this snapshot from a
middle school mathematics textbook in which the student is being shown how to
compute the LCM of three numbers. You can see on the top right.
Now if I do not just tell the answer to the student which is 1440, but if I walk the student
through all the different steps that are required in solving this problem, we are setting up
ourselves for a formalism which is very similar to programming by demonstrations in
which an end-user shows me how to automate tasks in spreadsheets through a
sequence of steps. And it is a very similar technology that we use here to be able to
automatically be able to generate solutions that can solve such procedural problems.
And, again, one thing that you'll observe is that the beauty of generating such solutions
is that once I have a procedure that can solve an LCM problem then I can use that
procedure not only to produce sample solutions for all the practice problems, I can use
that procedure to in fact generate practice problems in the first place.
If you recall, that was one of our techniques of problem generation in which the goal was
to get a procedure that can solve a given middle school mathematics problem and then,
use test case [inaudible] tools to exercise all paths in the procedure in order to generate
a good progression of problems for the students.
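To make the idea of a procedure that yields worked solutions concrete, here is a hand-written Python sketch of the division-ladder method for the LCM that records every intermediate row. In the approach described above such a procedure would be learned from the teacher's demonstrations. Writing it by hand and the choice of input numbers are assumptions made only for illustration.

```python
def lcm_ladder(numbers):
    """Return (lcm, steps) for positive integers using the division ladder.

    Each step is (divisor, row after dividing every divisible entry).
    """
    row = list(numbers)
    steps = []
    lcm = 1
    divisor = 2
    while any(n > 1 for n in row):
        if any(n % divisor == 0 for n in row):
            # Divide every entry the divisor goes into; copy the rest down.
            row = [n // divisor if n % divisor == 0 else n for n in row]
            steps.append((divisor, tuple(row)))
            lcm *= divisor
        else:
            divisor += 1
    return lcm, steps


answer, steps = lcm_ladder([12, 18, 30])
for divisor, row in steps:
    print(divisor, row)
print("LCM =", answer)
```

Because the same procedure produces both the answer and the step trace, it can serve as a sample solution for any practice problem of this kind.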
So I think this is a great example of the connections with end-user programming where
first the end-user teaches the computer and then the computer is able to help the end-user.
So in this case the teacher is going to first teach the computer by means of some
examples as they would teach a student. But once the computer becomes smart enough
to understand how to follow a second concept then the computer can start teaching that
to the students.
So now let me go on to the third part of the talk which is about automated grading, and
this is probably the one that probably does not need much motivation. This is something
that everyone feels, you know, is probably one of the most important topics. Because
once you have an automated grading technology, you can help enable answer-scripts be
graded immediately and provide immediate feedback to students. But on the other hand,
it can also enable a rich tutoring experience which normally one-on-one tutors would not
have the patience to do.
For example, generate good hints for a student when the student gives you a wrong
solution or provide the student some simpler practice problems to act on, to work on if
they are not able to solve a particular problem based upon the kind of errors that they
actually make.
And there is one key area that I want to enforce, you know, in this topic which is that
there are often a variety of feedback metrics that need to be provided, a variety of
feedbacks that need to be provided to different students depending upon the kinds of
mistakes that they are making.
So one feedback metric might work better for one student; another feedback metric
might work better for another student. And one of the feedback metrics that is often used
in computational courses, like automata or programming, is showing the [inaudible] on
which the solution is not correct. And this is a great metric. In fact this is the state of the
art that is used in grading programming problems. So let me just show you an example
of it.
So we have a very nice tool [inaudible] called PexForFun.com where you can see a
problem, submit a solution to a programming problem, and the tool will give you
feedback in the form of counter-examples. It will show you an input on which your
program does not compute the right answer. So this is a technology that we have
deployed here at MSR. And this is, by the way, very useful technology. But now what I'm
going to show you is one of the [inaudible] points that a user went through to emphasize
the need for giving a different kind of feedback.
So this is a solution that someone tries to submit for an array reverse problem. So the
goal is reverse an array, and this solution is buggy. And this is a solution that a student
submits. And when the student submits a solution, the student immediately gets back a
counter-example. And the student stares at it, you know, "Why is my program not correct
on this input?" And then the student tries to change this. So once the student gets the
feedback, the student tries to change it. And the student changes it like this which is still
incorrect.
And the student gets another counter-example. And then the student submits the same
attempt again, you know, as if we had an [inaudible] which will change its response. But,
no. So clearly some sign of frustration here. And the student makes further changes.
Now this is the same as the initial attempt except that the student put a print statement in
the middle of the loop, you know, going back to the original mistake. The student gets
the same counter-example again and then, submits the same attempt again after looking
at the counter-example. The student gets back to the reverse attempt. This is, again,
same as the initial attempt. This is also the same. Ah-ha! And now the student is slowly
moving towards the correct solution. The only error is that instead of A of I in the loop
body, the student should have A of I minus 1. And the student gets another counter-example.
Now the problem with counter-examples is that the counter-example is not telling the
student how close the student is to the correct solution. Okay? And see what happens
after this. Now the student tries to make fixes in the wrong direction. And now the
student is back to the bigger error. The student submits the same attempt again, gets
the counter-example, submits the same attempt again, and now the student is too
frustrated and gives up. Right?
So what I'm trying to emphasize is the need to provide different kinds of feedback for
different students. And in this case, one of the feedback that you can provide is to take
the student's solution and try to make small edits to it so that the student's solution
becomes a correct solution.
Now note that you cannot simply do syntactic edits to compare it with the sample
solution because the number of correct solutions, at least in programming, are infinite.
Right? You cannot simply try to morph the student's solution to a correct sample
solution; you have to modify it to the nearest correct solution.
And one of the techniques that we use is this very nice work on program sketching that
was originally started out with by Armando Solar-Lezama and [inaudible] and now
Rishabh Singh is [inaudible] at MIT who is working on this.
So the way we do this is that the teacher writes down an error model describing the
standard kinds of mistakes that the students often make. So for example the students
are often wrong by the plus-minus one in the array indexing. They can often be wrong in
understanding the semantics of increments, should it be V++ or V--, and so on.
And so the teacher writes down and prepares this error model based upon their
experience. And then the key idea is that we try to solve the problem of making the
smallest set of changes to the student's solution using this error model, using these
rewrite rules, that will make the student's solution into a correct one. And this can be
phrased as a Sketch problem, Sketch being an off-the-shelf tool for program synthesis.
And once you do that then these are two different [inaudible] for array reverse. We give it
to our tool and our tool will simply point out the errors in these solutions immediately.
And if you want, you can play with this tool on these links that I've provided here.
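The search for the smallest correcting edit can be illustrated without a synthesis engine. The Python sketch below replaces the Sketch-based encoding with plain enumeration: the student's array reverse is written with two hypothetical "holes" that the error model may adjust by plus or minus one, and candidates are tried in order of how many holes they change. The buggy program, the error model and the test inputs are all assumptions chosen for illustration.

```python
from itertools import product


def student_reverse(a, index_fix=0, bound_fix=0):
    """Student attempt with two adjustable holes (both 0 = as submitted)."""
    n = len(a)
    out = [None] * n
    for i in range(n + bound_fix):
        out[i] = a[n - i + index_fix]  # the student wrote a[n - i]
    return out


# Error model: each hole may stay as written or be off by one.
ERROR_MODEL = {"index_fix": (0, -1, 1), "bound_fix": (0, -1, 1)}
TESTS = [[], [7], [1, 2], [1, 2, 3], [4, 4, 5, 6]]


def passes(fixes):
    """Check a candidate repair against the reference behaviour on all tests."""
    for case in TESTS:
        try:
            if student_reverse(list(case), **fixes) != case[::-1]:
                return False
        except IndexError:
            return False
    return True


def smallest_repair():
    """Return the repair that changes the fewest holes, or None."""
    names = list(ERROR_MODEL)
    candidates = [
        dict(zip(names, values)) for values in product(*ERROR_MODEL.values())
    ]
    candidates.sort(key=lambda c: sum(v != 0 for v in c.values()))
    for candidate in candidates:
        if passes(candidate):
            return {k: v for k, v in candidate.items() if v != 0}
    return None


print(smallest_repair())
```

Enumeration only scales to tiny error models, and passing a handful of tests is weaker than the correctness check a synthesis tool performs; the point is only the shape of the problem, namely minimising the number of applied rewrite rules.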
So let me see if someone has a question here. Okay. Okay, so let's move on. So now let
me show you another technique, another way of providing feedback using mutation to
generate feedback. Another kind of feedback that you can provide is that if you look at
the student's solution try to figure out the problem for which the student has given a
solution. Now you try to reverse synthesize the problem for which the student's solution
is correct, and you say that instead of solving this problem, you try to solve a slightly
different problem.
So these are different attempts to an automata problem. The problem was draw an
automata that accepts strings with an even number of the letter a's. And this is the
correct solution. Our tool can automatically grade it. But this is an incorrect solution that
one of the students provided. So what does our tool do? Well, one of the feedbacks that
is useful to give here is that your solution is almost correct except that it does not work
well on the empty string. And we give four out of five to the student. And this is the same
as input-based feedback which is also used in programming exercises.
But here's another attempt that we saw a student might have given. And in this case we
try to do more of the kind of mutation-based bug localization where you try to find
minimal syntactic changes that you can do to the automata to make it into a correct one.
But this is another one where you would actually require two different changes, so
instead of deducting, you know, four points on the student's solution with some other
model, you can actually figure out that the student was actually quite close to the correct
solution. So the student solved the wrong problem. Instead of solving an automata that
accepts strings with even number of a's, the student solved the automata that accepts
strings with odd number of a's. And, better feedback might be more useful to the student
and this is what we call problem-based feedback.
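The difference between input-based and problem-based feedback can be sketched in Python with a product-automaton search. The DFA encoding, the state names and the feedback strings below are assumptions for illustration, and only one alternative problem, the complement, is checked. The tool described in the talk considers a richer space of nearby problems and edits.

```python
from collections import deque

SHARED = {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"}
# A DFA is (start state, accepting states, transition table).
EVEN_A = ("e", {"e"}, SHARED)
ODD_A = ("e", {"o"}, SHARED)
# Hypothetical attempt that is wrong only on the empty string.
NEAR_MISS = ("s", {"e"}, {**SHARED, ("s", "a"): "o", ("s", "b"): "e"})


def shortest_difference(d1, d2, alphabet="ab"):
    """Shortest word accepted by exactly one DFA, or None if equivalent."""
    (s1, acc1, t1), (s2, acc2, t2) = d1, d2
    queue = deque([(s1, s2, "")])
    seen = {(s1, s2)}
    while queue:
        p, q, word = queue.popleft()
        if (p in acc1) != (q in acc2):
            return word
        for letter in alphabet:
            nxt = (t1[(p, letter)], t2[(q, letter)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt[0], nxt[1], word + letter))
    return None


def complement(dfa):
    """Swap accepting and rejecting states of a complete DFA."""
    start, accepting, trans = dfa
    states = {state for state, _ in trans} | set(trans.values())
    return (start, states - accepting, trans)


def feedback(correct, attempt):
    witness = shortest_difference(correct, attempt)
    if witness is None:
        return "correct"
    if shortest_difference(complement(correct), attempt) is None:
        return "solves the complement problem"
    return "wrong on input %r" % witness


print(feedback(EVEN_A, ODD_A))
print(feedback(EVEN_A, NEAR_MISS))
```

Here the odd-number-of-a's attempt is recognised as a correct answer to a neighbouring problem, while the near miss gets a concrete counter-example.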
Okay. So let me quickly move on to the last part of my talk -- And try to finish in five
minutes here -- which is the problem of content entry. So we've been talking about all
these technologies to provide feedback to students which require the students to enter
all these things inside the computer in the first place. And today we do not have great
editors to be able to enter structured content inside the computer except for programs,
so we have programming [inaudible] Visual Studio which make it easy to enter
programs.
But what about subjects such as math and physics which require the student to enter
lots of equations and drawings? So the key ideas that we tried to use in trying to develop
editors which make it easy to enter such structured content is one: we want to allow for
multi-modal input such as the use of ink, speech, touch, and this is becoming especially
significant with the presence of new devices with new form factors. Then, what we want
to do is we also want to do some error correction because when you use soft forms of
intent like speech --.
So when you use some soft forms of intent like speech and ink, you're likely to make
some errors. So it will be good to have an error correction technology. And for the
[inaudible] we would like to have a prediction as there is in IntelliSense.
So let me illustrate these concepts to you for the domain of mathematical equations. So
if you use existing editors like LaTeX or Microsoft Word, they're going to be very painful
for you to write equations. One of the solutions that we have investigated is that of
developing an intelligent predictive editor, and the reason why such an editor is possible
is because most man-made stuff has very low entropy. Most structured concepts have
very low entropy and hence amenable to prediction. So there are two kinds of prediction
that we can do.
So one kind of prediction is what I call Structure Prediction in which we can predict
parentheses in a mathematical term. Right? So this is a very specific thing, but I'm just
giving examples of what is possible in this space. And this can enable entering of
mathematical text by speech. So for example, suppose you want to enter this term that
you see on the slide, how would you say it if I asked you to, you know, speak this term?
You would very likely say, "Square root 1 plus cos divided by 1 minus cos." Right?
Ideally what you should have said was, "Square root of, open parentheses, open
parentheses, 1 plus cos, open parentheses, A, close parentheses, close parentheses,
divide by..." and so on, right, which is becoming quite [inaudible] to speak.
But now we have a technology which can automatically take the term without
parentheses and give you likely suggestions on where parentheses could have been in
the first place. So it uses the heuristic that most mathematical equations will have
parentheses inserted in a way that the term becomes highly symmetrical. And this
heuristic seems to work well for lots of experiments that we've tried.
>>: [inaudible] I mean it could be...
>> Sumit Gulwani: Oh, absolutely...
>>: ...square root of the top...
>> Sumit Gulwani: Absolutely, right. But the point is that people are not trying to write
arbitrary text. And this text has a very low entropy, and this is what [inaudible]. So you're
absolutely right. If every possible parenthesization was equally likely, this technique is
not going to work. Right? But the point is that every parenthesization is not likely. People
are not going to put parentheses around 1. People are not going to say, "Square root of
1," and so on. And people want symmetrical terms, and that's why this technique works.
So another technology that we have here is very much similar to IntelliSense, Visual
Studio IntelliSense, where I can predict what the user is wanting to type next. So all that
you see in color is something that has been predicted automatically. Once I see the
black terms, I can predict the colored terms because there's some type of pattern. And
these patterns normally occur when you try to enter symbolic matrices like this.
We can also predict sometimes when you're trying to solve problems if you apply a given
step, we can then apply that kind of reasoning to the rest of the problem to make it easy
for teachers to enter sample solutions if they wanted to.
Okay. So the last part in my talk is about trying to create drawings, which is another
important concept in education; you need drawings for geometry, physics diagrams,
chemistry diagrams and so on. And one challenge in drawings is that they often require
extreme precision; you have to make sure that the line is a tangent to a circle, the angle
at the center of the circle has 90 degrees and so on.
And the idea is to allow users to do pen-based sketching and infer constraints using
machine learning and then, use constraint solving technology in [inaudible] community to
beautify these drawings. And the other challenge is that sometimes drawing is very
repetitive. It can be very tedious to draw sometimes, like trying to draw a resistor in an
electrical circuit and lots of repetitive edges and so on, or trying to draw a ladder.
And the problem with copy and paste is that copy and paste does not help you with
positioning of copied objects and transformations on copied objects. And our solution is
to use synthesis technology to predict the repetitive features that are there in that
drawing. So let me just give you an example of how it works: so let's say you want to
draw this perfect ladder. You sketch it out using pen and then, we can beautify it for you
but we can also predict what is the repetitive aspect here. Similarly if you're trying to
draw this circle with lots of spokes, we beautify this for you and then, we are able to
predict what you were trying to draw.
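A very small Python sketch of predicting repetition: given a few already-beautified shapes, infer the common translation between consecutive ones and extrapolate. It assumes each shape is a tuple of integer points and that the repetition is a pure translation. The synthesis technology described above handles more general transformations, so this is only an illustration of the idea.

```python
def infer_translation(shapes):
    """Return the common (dx, dy) between consecutive shapes, or None."""
    offsets = set()
    for prev, cur in zip(shapes, shapes[1:]):
        deltas = {(q[0] - p[0], q[1] - p[1]) for p, q in zip(prev, cur)}
        if len(deltas) != 1:
            return None  # the shape was deformed, not just moved
        offsets |= deltas
    return offsets.pop() if len(offsets) == 1 else None


def predict_next(shapes, count):
    """Extrapolate `count` further copies of the last shape."""
    step = infer_translation(shapes)
    if step is None:
        return []
    dx, dy = step
    last = shapes[-1]
    return [
        tuple((x + k * dx, y + k * dy) for x, y in last)
        for k in range(1, count + 1)
    ]


# Three rungs of a ladder, each a segment given by its two endpoints.
rungs = [((0, 0), (4, 0)), ((0, 2), (4, 2)), ((0, 4), (4, 4))]
print(predict_next(rungs, 2))
```

When the drawn shapes do not share a single offset, the sketch makes no prediction rather than guessing.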
Okay, so now let me conclude. So in this talk I talked about different kinds of aspects in
an intelligent tutoring system, namely solution generation, problem generation,
automated grading, entry of structured content. And we also saw that some of these
techniques were specific to their domain for which they were applied. But I personally
think that this is not a problem because you need to develop these technologies once
and for all. The standard curriculum does not change that often. We have been reading
the same concepts for several decades. So once you develop these specialized
systems for these domains, they are there to stay.
The other thing that I want to stress is that this area can benefit from cross-disciplinary
research. So you must have already seen some inspiration that I drew from my
background in formal methods and my expertise in program synthesis and search
techniques. There was a lot of HCI aspect that I just talked about in the last part of the
talk. Things that I did not get to cover are natural language processing for dealing with
word problems because often times problems are not [inaudible] in English. And also
using machine learning for analytics. So maybe that would be a good material for a talk
some other day.
Now what is the value proposition that we are seeking here? So in case of short term,
the goal is to actually improve education by making it more interactive, by allowing
students to explore more, by making it into a social activity because if all these things
are digitized, different students are trying to solve the same problem at the same time
they can be brought together. It can turn into a very social activity as well. It can really
enhance learning. But I am also personally interested in some of the long term benefits
of investing into this technology. And this goes back to the connections that I was talking
about with end-user programming and intelligent tutoring systems because once the
computer becomes smart enough to learn concepts of math, programming, automata,
physics, chemistry, maybe even undergraduate concepts -- I’m not talking about
graduate level concepts, right, even undergraduate concepts. We are now talking about
having a very ultra-intelligent computer.
Also if we succeed in this endeavor of trying to use technology to improve learning, we
would likely also have developed the model of human mind. And I will conclude with this
last long term application that I have so that this is, I think, at least the thing that'll help
you remember this talk.
So how can this technology help in inter-stellar travel? So one of the theories for inter-stellar
travel is where you want to travel to far off galaxies is to freeze a human embryo
and put it in a space ship and then, time does not matter. The space ship will take its
own time to reach a far enough galaxy. And when you find a planet that you can inhabit,
you bring the human embryo to life, and now you have a baby without a mother and the
baby needs to be taught. And the computer can teach the baby now.
So that's it. I would like to stop here. If there are any questions, I would be more than
happy to take it. One thing that I would like to stress is that if there are students watching
this talk and they would like to participate in this kind of research which has huge
societal implications, please do send me an e-mail. And there are lots of exciting topics
that you can work on which will definitely fit your area of expertise or interest. Okay,
thanks everyone for your attention.
[applause]
>>: [inaudible]
>> Sumit Gulwani: Okay, folks, I'm un-muting. It looks like I have to un-mute each and
every attendee one by one. But if anyone wants to ask a question, please do. I'll be able
to hear you.
>>: [inaudible] get feedback.
>>: So is this [inaudible].
>>: I don't know. It's not [inaudible] webinars.
>>: Oh, I see.
>>: [inaudible].
>> Sumit Gulwani: So if someone has any questions, please put your name in the chat
and then, I will let you speak.
>> Anoop Gupta: Sumit, while we're waiting: so, you know, even if you're, let's say, sixth
to eighth grade math, how far are we from covering 90 percent of the topics that get
covered, you know, through the combination of the solution technologies?
>> Sumit Gulwani: So Anoop is asking a question which is that grade six to grade eight
math is quite an important topic. And how far are we from developing such intelligent
technologies for that curriculum?
So if we leave out the proofs part of things, which is something that we are still working
on and has a long way to go because being able to generate machine-readable proofs
for mathematical problems is easy; being able to generate human-readable proofs or
proofs that a student is supposed to do for mathematical problems is not that easy. So
assuming grade six to grade eight does not have proof problems, I think it is probably
the most ripe area to go forward with these kinds of technologies and deployment of
these technologies.
Zoran Popovich with whom I am engaging with on this particular topic, he has a plan for
deployment of some of the technologies that I have worked on and some of the work
that he has already done in this area. One of the things that he is focusing on is how to
get immediate feedback as to what the students are understanding, what they're not
understanding. So he has developed some good theories of getting the data quickly,
analyzing it and then using it to improve the instruction in the classroom. And these
deployments are supposed to happen very soon.
In terms of the editor, I'm not so sure about what would be a good model there, but that
is something that needs development. In terms of the formal [inaudible] technology I
think we have nailed it down. In terms of the analytics and machine learning, I think
Zoran has probably a very fine story there. He has also worked on designing games
that hide these mathematical concepts and engage the audience.
But I personally think that this is a very important area that one can go after, and we
are most prepared in this particular area. But the problem of [inaudible] needs to be solved.
>>: [inaudible].
>> Sumit Gulwani: Okay. So, [inaudible], do you want to speak?
>>: Yes.
>> Sumit Gulwani: Yeah, go ahead please.
>>: Oh, sure. I'm a natural language processing person, and I'm more interested in the
natural language processing part of your talk which is very exciting to me. For example,
about the automatic inputting of LaTeX equations. So is there some language model
involved in, you know, knowing which formula is more likely than other formulas? And
also, you must have parallel corpora of LaTeX and natural language descriptions of
a lot of equations so that you can train it on the correspondence between them. So
could you elaborate on that project?
>> Sumit Gulwani: Yeah, so great question, [inaudible]. So the question is how do we --
So more background or more information about natural language processing and what
kind of training data are we using to develop these translators which can take natural
language into logical descriptions.
So one of the things that help us solve this problem: as opposed to the general natural
language translation problem is that if we look at the textbook-style problems, these
problems are actually very structured. They do not have much ambiguity unlike arbitrary
English text that you would normally speak with each other.
So there are two kinds of things that you might want to consider in the education
process, right? One thing is how a teacher formally states a given problem and the other
is the language that the students would be using to describe their solutions and so on.
So the student's language will be a little bit more ambiguous and less structured, but at
least the teacher's language will be more structured. So you are absolutely right that
most of these techniques that we have do involve training on some benchmark data and
then using them. Unfortunately we do not have a great corpus. What happens in these
domains is like -- I'll give an example of automata theory. We were able to collect around
less than 200 problems. And for each of the English descriptions, we wrote down the
expressed logical translation by hand. For geometry, I think we could have opted in more
problems, but the experiments that we did were kind of roughly similar.
So what is needed is techniques that exploit the fact that we are dealing with a
structured English, and we have a very well defined logical language which is a domain-specific
language and is a limited restricted language we are doing the translation to.
And both of these facts have to be exploited in order to develop techniques that can
learn from a small amount of training data. I think this is one big thing that is different
from general natural language processing where normally the state of the art has gone
more into trying to develop semi-supervised or unsupervised techniques that can
leverage the presence of a huge amount of data.
In contrast, we are working in a setting which will not have huge amounts of data to start
with, which will have little amount of data. But then, you want to develop more
supervised techniques to be able to achieve this translation. This is what we're doing on
a short run. But having said that, it's like [inaudible] in a problem: once you start
deploying these technologies, you're going to start collecting more data. And when more
data starts coming in, you can then try to go more towards unsupervised techniques for
doing the translation.
>>: Can you...
>>: Thank you.
>>: ...comment a little bit on what your expectations are for the level of effort that will be
required by the teachers, like the middle school teachers, to use Anoop's example, to
create the systems, to deploy these in their classrooms?
>> Sumit Gulwani: So the question is how much effort is required on the part of the
teacher to be able to use the technologies that we have been working on for middle school
curriculum?
So as you might observe, especially the example that I talked about, the one on
developing technologies which can see how problems have to be solved which is already
done in the textbooks today, right, and from there automatically develop models of the
procedure that the teacher wants the student to employ -- We have a technology for that.
Once you take that procedure, you can use existing test case [inaudible] tools to develop
progressions for the students.
I did not talk about automatic grading in this domain, but that's also possible. So we are
pretty much talking about a domain where, at least for middle school mathematics, it will
mostly be a push button technology for the teacher, not much interactivity required.
That's why it's such a ripe domain. It's already something that can have an impact and is
the most that is closest to almost complete automation.
>>: I have a question here. Microsoft [inaudible] has a product of -- Microsoft
mathematics. It has a lot of similar things you were talking about, for example, you know,
the [inaudible] project [inaudible]. So what's the connection between that product and
some of the ideas we have here?
>> Sumit Gulwani: So the question is that Microsoft has a technology that goes by the
name Microsoft Math Problem Generation, and how are the concepts there different
from the ones that I talked about for generating problems for mathematics?
So the way new problems are generated in Microsoft Math Generator is the teacher has
to encode a lot of domain knowledge, and the freedom is with respect to choosing some
constants, only constants. So in my case of algebra problem generation, you see we are
trying to generate problems which are not only different with having fresh constants but
they have a completely different operator set as well.
So Microsoft Math Generator cannot generate new proof problems which are an act of
creativity. Right? On the other hand, we are also investing into automatic generation of
simpler problems like problems for addition, problems for multiplication and so on. And
this is something that Microsoft Math Generator does. But we are, again, trying to get
into a fully automatic mode where the teacher will not have to enter domain knowledge
such as, you know, the important concepts, whether there's a single carry, double carry,
triple carry, multiple single carries, no carry and so on.
All these concepts are automatically discoverable by observing the different paths in
the procedure that corresponds to that particular concept. So I think we are talking about
much more improvement in the state of the art on top of what Microsoft Math Generator
software already does today.
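The claim that concepts such as "single carry" or "no carry" correspond to paths through the addition procedure can be illustrated in Python. In the sketch below, the path of a problem is taken to be the sequence of carries the column-by-column procedure produces, and one representative problem is picked for each distinct path. Exhaustive enumeration stands in for a real test-case generation tool, and both the path abstraction and the function names are assumptions made for illustration.

```python
def carry_pattern(x, y):
    """Carries produced column by column, least significant digit first."""
    carries = []
    carry = 0
    while x or y:
        carry = (x % 10 + y % 10 + carry) // 10
        carries.append(carry)
        x, y = x // 10, y // 10
    return tuple(carries)


def one_problem_per_path(digits=2):
    """Map each carry pattern to the first addition problem that exhibits it."""
    low, high = 10 ** (digits - 1), 10 ** digits
    found = {}
    for x in range(low, high):
        for y in range(low, high):
            found.setdefault(carry_pattern(x, y), (x, y))
    return found


for pattern, (x, y) in sorted(one_problem_per_path(2).items()):
    print(pattern, f"{x} + {y}")
```

For two-digit operands this yields four problems (no carry, a carry out of the units column only, a carry out of the tens column only, and both), without the teacher having to name those categories in advance.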
>>: Sumit, maybe we should wrap up.
>> Sumit Gulwani: Okay. So if there are no more questions then let's conclude today.
So thanks everyone for attending. And as I said, students if you want to work in this area
please do send me an e-mail. There are a lot of exciting topics to work on, and we need
your help in order to make a big impact in this topic which has huge societal implications.
Okay, thank you everyone for your attention. Bye.