>> Anoop Gupta: So let's start. And apologies that we're starting ten minutes late; we had some technical difficulties. So good morning, everyone. I'm Anoop Gupta from Microsoft Research, Redmond, and welcome to the talk on synthesis for education by Sumit Gulwani. Sumit is a senior researcher at Microsoft Research, Redmond. He got his Bachelor's from IIT Kanpur and then his PhD from UC Berkeley in 2005. His current research interests are in the cross-disciplinary application areas of end-user programming, program synthesis and intelligent tutoring systems. Sumit's recent work on program synthesis shipped as part of our latest Excel 2013, something called Flash Fill that works like magic to simplify programming for end-users. In this talk, Sumit is going to talk about some surprising applications of similar technology to the area of intelligent tutoring systems; in particular how, you know, you might generate new kinds of problems to offer to students, the solutions to these problems as practice problems and automatic grading. As many of us are aware, these massive open online courses, MOOCs, have become very popular during the last year and truly hold the potential for enabling quality education for everyone everywhere. The kind of research that Sumit is presenting today, I think, will be an important component for us to truly realize the potential of such online education. So without further ado, I'm excited to have Sumit Gulwani come and present. This talk is being given as a part of the monthly ExCAPE Webinar Series and is being watched by many people around the world. So welcome all of you. Sumit will give you some more details and Professor Rajeev Alur from UPenn, if he can join us -- There were, again, some technical difficulties -- will introduce a little bit more about how these are being recorded. Thank you. Sumit, welcome. >> Sumit Gulwani: Thanks, Anoop. Rajeev, are you there? Do you want to add something? Okay. 
I'm afraid we probably cannot hear Rajeev because he was having some technical difficulties. So this talk is going to be recorded and the link will be available in a couple of days, so you can either contact Liz or I will also try to put this link on my webpage. So thanks once again, Anoop and Rajeev, for hosting me for this talk. I'm going to talk about intelligent tutoring systems. This is a cross-disciplinary research area and a quite diverse one. My own technical inspiration for this research area derives from my recent work on program synthesis. So let me start out by giving a small background on these technologies. So the traditional goal of program synthesis is to synthesize computer programs or computational artifacts from specifications that are usually available in the form of logical relations between the inputs and outputs of the program. So this is essentially aimed at helping software developers write code, and I myself worked in this area for several years and played around with a variety of search techniques for searching for the program from the logical specification that the user had provided. And these search techniques have a lot of inter-disciplinary flavor, so they were inspired by my own background in formal methods. But then, we also learned a lot from the traditional work done in AI and machine learning. But recently we started applying these techniques to helping end-users. So these are people who have access to computational devices but are not expert programmers. So we want to help them synthesize small [inaudible] for automating repetitive tasks in their lives. But these people are not going to be able to specify their intent using logic; they're going to specify their intent using examples. They're going to specify their intent using natural language. And it turns out that the techniques that we developed for end-user programming are also very applicable to the domain of intelligent tutoring systems. 
And this is what I'm going to talk about in this talk today. So you can also find a commentary on some of this material in a recent paper that I wrote based on a recent keynote which appears in SYNASC 2012. So there are quite deep connections between end-user programming and intelligent tutoring systems. So in end-user programming the end-user teaches the computer by means of giving examples or demonstrations, and the computer applies that knowledge to automate the repetitive task for the user. And now the parallel in the education world is that the teacher is going to teach the computer and the computer is going to apply that knowledge to teach the students instead of automating the repetitive task. Also it turns out that the way end-users interact with the computer is usually by means of specifications which are often ambiguous. And hence some interactivity is required to facilitate the interaction between the end-user and the computer. And a similar interactivity can be used to facilitate the interaction between computer and students to resolve the students' confusion. Now what this means is that the research in end-user programming, which is a research area in which it is relatively easy to make money, can fuel research in intelligent tutoring systems, which is a topic which has huge societal implications, and vice-versa. So in fact we have developed some techniques in intelligent tutoring systems which are now bringing back [inaudible] end-user programming. So without further ado let me now directly jump into the topic of intelligent tutoring systems. So there are several aspects in intelligent tutoring systems, and I'm going to talk about four of these aspects in this talk, four different parts. So I'm going to focus on problem generation, automatically generating solutions to these problems, automated grading and how to enter content inside the computer. 
And you will notice that most of these techniques are going to be a little bit more specialized for the domains for which they have been developed. And these techniques can be applied to a variety of domains ranging from mathematics and science but also to programming subjects, teaching automata theory, logic, even to board games. And some of these ideas and inspirations also carry over to the field of language learning. So at this point let me see if someone wants to ask a question. Okay. I don't see any question yet. Okay, so let me start out with problem generation. >>: You need to focus [inaudible]. >> Sumit Gulwani: So there are plenty of motivations for automatically generating problems. Oftentimes you might want to generate problems that are similar to a given problem, and this is often helpful in avoiding copyright issues. So if you see some problem in a textbook, you cannot simply copy it and put it online for the students. So we would like to be able to generate problems that are similar to the problem that was given in a textbook. We might use this to prevent plagiarism in several settings. So in MOOCs where you have a massive number of students trying to take an exam, you ideally would like to provide a different problem of the same difficulty level to each different student. And in fact one of the problems that happens in MOOCs is the issue of unsynchronized instruction. So if you offer an exam and you publish your solutions and then some other student wants to take the exam at a later point of time, you want to provide them with a fresh set of problems. Oftentimes you might also want to generate problems in an absolute way; so, you want to generate problems of a given difficulty level and problems that require exercising certain concepts. And this can be useful in various ways. So the moment you start developing notions of difficulty levels of problems, you can compare different progressions in different textbooks; you can evaluate the quality of different textbooks. 
You can also generate personalized workflows for the students. So if a student tries to solve a problem and fails, you want to present a problem which is simpler than that problem. Or if a student solves a problem correctly, you want to present them with a more difficult problem. So just as how it is done in exams like GRE, but now you can use this idea in a more constructive fashion to aid learning in classrooms. So I'm going to talk about key ideas which we have been using in our work on being able to automatically generate problems. And the first idea is what I call Guess and Verify. And this technique essentially works when you have an extremely fast capability to check the quality of a problem that you have generated or the correctness of a problem that you have guessed. So let me start out with the domain of algebraic identities or algebraic proofs. So consider this red problem that I have taken from quite a famous textbook. Now I cannot simply put this problem online for practice for the student because it is creative content, and I would be violating copyright. But now we have developed a tool that can take this problem and automatically generate all the green problems that you see. So this is problem generation by example. All these green problems have a very similar syntactic structure to the original problem, and it also turns out that they have similar difficulty to the original problem. But in this particular case, the system also ends up generating these two--. Okay, so the way that this works is that the red problem is generalized into an orange problem. And then, we try to find the [inaudible] of the orange problem which are well defined problems. But it also happens that we end up generating the last two problems that you see here which are much easier because they are of the form: alpha plus beta times alpha minus beta equals alpha squared minus beta squared, which is something that is true for all alphas and betas. 
But one way to avoid such problems is to strengthen the orange constraint by adding the constraint that the term T should not be equal to T five. So T was a term that was a generalization of the first trigonometric operator, a secant of X, and it stands for any kind of trigonometric operator that can be put in its place and so on. And this is in fact the generalization that [inaudible] are producing in the first place, and once this is there then these last two problems which are relatively simpler, kind of disappear. So the point that I want to stress forward here is that this is a tool that can either be used in a completely automatic manner by a teacher or for a super-user, the user can also interact with this tool by controlling how rich or how restricted the orange query is. So this is quite a general purpose tool which works for several algebraic domains. So here is... >>: So do you do combinatorial search to [inaudible] and rule out the ones that are not valid? >> Sumit Gulwani: Yes. So great question. So the question is how do we do this? So do we do combinatorial search to try out all possibilities? And this is exactly what we do. So the technique, the general principle in this here is to do combinatorial search and quickly verify whether a problem is correct or not. So thanks for bringing up that question. So if you look at this [inaudible] here, so each of the operators is replaced by a hole which can be plugged in by an operator of the same type signature. And in this particular case we have six holes, and then there's also a binary hole for a plus or a minus. So we're going to try out all possibilities. So each trigonometric hole can have one of six trigonometric operators. So we see that [inaudible] is actually quite big. 
But then, once we put in a nondeterministic choice inside these holes, we need a method to test whether the problem is correct or not, because if you arbitrarily or randomly plug in these holes with trigonometric operators you might not get a correct problem. We actually have a very efficient test for the correctness of these problems. And this test is an extension of so-called Polynomial Identity Testing, which states that if you check the correctness of these problems on a few random inputs, and if the left-hand side and the right-hand side turn out to be the same when you evaluate these terms on those random inputs, then with very high probability over the choice of the random inputs, this problem was indeed a correct problem. And this is the fast verification check that we use to essentially identify quickly and filter whether a problem is correct or not. >>: So the [inaudible] is stored somewhere in the database so you can see whether something is exactly the same or not? >> Sumit Gulwani: So... >>: [inaudible]. >> Sumit Gulwani: So the groundwork here is, I choose a random value of x and I evaluate a guessed problem with respect to that random value. And I see whether the left-hand side and right-hand side were evaluating to the same value or not. So this is a general concept; it works for a variety of algebraic operators for which [inaudible] randomized identity testing holds. So for example if you give our tool this red limit problem, it will end up generating these green problems which have a similar structure and also turn out to have the same difficulty level. And here is the orange term that we generalize the red problem to. And here's an integration problem. You give this red problem; our tool generates these green problems which you can go and put on your websites. Here's a determinant problem, and these are the other very similar problems that were automatically generated. 
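A minimal Python sketch of the guess-and-verify loop described above. The template `T1(x)^2 ± T2(x)^2 = 1` is a hypothetical stand-in for the "orange" generalized problem (the actual template from the slides is not in the transcript), and the names `TRIG`, `holds_on_random_inputs` and `generate_identities` are illustrative only. Each hole is filled with every one of the six trigonometric operators, and each guess is kept only if both sides agree on a few random inputs.

```python
import itertools
import math
import random

# The six trigonometric operators that can fill a hole.
TRIG = {
    "sin": math.sin,
    "cos": math.cos,
    "tan": math.tan,
    "cot": lambda x: 1.0 / math.tan(x),
    "sec": lambda x: 1.0 / math.cos(x),
    "csc": lambda x: 1.0 / math.sin(x),
}
# The binary hole: plus or minus.
SIGNS = {"+": 1.0, "-": -1.0}


def holds_on_random_inputs(t1, t2, sign, rng, trials=5):
    """Fast check: evaluate both sides of the guess on a few random inputs."""
    for _ in range(trials):
        x = rng.uniform(0.2, 1.3)  # stay away from the poles of tan/cot/sec/csc
        lhs = TRIG[t1](x) ** 2 + SIGNS[sign] * TRIG[t2](x) ** 2
        if not math.isclose(lhs, 1.0, rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True


def generate_identities(seed=0):
    """Brute-force every filling of the holes and keep the ones that pass."""
    rng = random.Random(seed)
    found = []
    for t1, t2, sign in itertools.product(TRIG, TRIG, SIGNS):
        if holds_on_random_inputs(t1, t2, sign, rng):
            found.append(f"{t1}(x)^2 {sign} {t2}(x)^2 = 1")
    return found


if __name__ == "__main__":
    for identity in generate_identities():
        print(identity)
```

The search space here is only 6 × 6 × 2 guesses; the point of the sketch is that the verification step is cheap enough to apply to every guess.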
So now before I move on to another application, are there any questions that the audience has? Okay, so let's move on. So let me show you another quick example of the same methodology. And this time in the context of generating problems for natural deduction which is a topic that is taught in introductory logic courses. So [inaudible] idea was that all of this work that I'm presenting today is joint with several collaborators, and I'm putting down the name of the people who are involved when I talk about the specific projects. Oops So in this particular case we are talking about natural deduction problems which have the form given some premises -- In this case, Premise 1 and Premise 2 -- prove that the conclusion is true. And one way to generate similar problems might be for the teacher to have a tool where the teacher can generate some replacements for either these premises or these conclusions. So in this case if the teacher says, "I want replacements for Premise 1 --" [recording sound] >> Sumit Gulwani: So if the teacher wants to generate replacements of Premise 1 then this is what our tool gives them. And, again, the way that this technique works is that we have extremely fast check for checking the correctness or the well defined-ness of a given problem. So what we do is we enumerate all possible formulas. But the number of formulas is actually huge, so what we do is we enumerate all possible truth tables that correspond to formulas that can be represented in a small space. And the number of such truth tables that correspond to a small formula is actually much less than 2 to the power, 2 to the power N where N is the number of Boolean variables. So if N is 5, I think this number is around 20,000. Then, for each of these things we quickly check whether the problem is correct or not by representing the truth table by bit [inaudible]. So all that we need is like 6-bit [inaudible] checks to check whether one of the problems that we have guessed is correct or not. 
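The bit-vector check described above can be sketched roughly as follows in Python. This is an illustration under assumptions, not the tool's implementation: it uses three variables named `p`, `q`, `r`, stores a truth table as one integer with one bit per row, and enumerates only a tiny hypothetical family of candidate premises. A problem "premises entail conclusion" is valid exactly when no row makes all premises true and the conclusion false, which is a couple of bitwise operations.

```python
from itertools import permutations

VARS = ["p", "q", "r"]       # assumed variable names
ROWS = 1 << len(VARS)        # number of truth-table rows
MASK = (1 << ROWS) - 1       # bit vector with every row set


def var_table(i):
    """Truth table of the i-th variable: bit `row` holds its value in that row."""
    bits = 0
    for row in range(ROWS):
        if (row >> i) & 1:
            bits |= 1 << row
    return bits


def implies(a, b):
    """Truth table of (a -> b)."""
    return (~a | b) & MASK


def entails(premises, conclusion):
    """Valid iff no row satisfies every premise while falsifying the conclusion."""
    conj = MASK
    for table in premises:
        conj &= table
    return (conj & ~conclusion & MASK) == 0


def candidate_premises():
    """A small, hypothetical family of candidate formulas with their truth tables."""
    tables = {name: var_table(i) for i, name in enumerate(VARS)}
    out = {}
    for a, b in permutations(VARS, 2):
        out[f"{a} -> {b}"] = implies(tables[a], tables[b])
        out[f"{a} & {b}"] = tables[a] & tables[b]
        out[f"{a} | {b}"] = tables[a] | tables[b]
    return out


def replacements_for_premise1(premise2, conclusion):
    """Candidates that, together with premise 2, give a well-defined problem."""
    good = []
    for name, table in candidate_premises().items():
        consistent = (table & premise2) != 0          # premises not contradictory
        needed = not entails([premise2], conclusion)  # premise 1 is actually used
        if consistent and needed and entails([table, premise2], conclusion):
            good.append(name)
    return good


if __name__ == "__main__":
    p, q, r = (var_table(i) for i in range(3))
    print(replacements_for_premise1(implies(q, r), implies(p, r)))
```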
Right? Very similar to the principle that I was using in algebraic identities that will try out all possibilities by brute force search and have a very fast verification check to figure out whether the problem is correct or not. And the same principle we use here as well. Okay, so now let me talk about another key idea that is often helpful in problem generation which is that you think of it as a reverse process of solution generation. And here you might want to use some symbolic methods or symbolic techniques. So I won't go into any more details but I will just show you an application of it. And this time in the context of generating good practice problem for playing board games. So imagine you want to design a new board game. So let's say we're talking about -- Okay, so I see that someone probably has a question here. So I'm getting confused because people are -- So are people able to hear me? >>: [inaudible].... >> Sumit Gulwani: It might be [inaudible]. Okay. >>: I think you've [inaudible]... >> Sumit Gulwani: Okay. >>: Can you tell what time [inaudible]? >> Sumit Gulwani: Yeah, let me just see if I have some --. Okay. So --. So can someone confirm that they can hear us? Okay. Oh, people can hear us. Okay, good. Thanks, Liz. Okay, so now suppose you want to design a new board game, Tic Tac Toe on 4-cross-4. Right? And you want to have a rule where you would only allow matches of three but only along rows and columns, not along diagonals. And another reason why you would want to generate new interesting starting configurations for such games is that if you play on this new game and if you start from the starting configuration, the first player is almost always going to win. It is triggered for the first player unlike in 3-cross-3. But if you start from an initial configuration which is not the empty board then this game becomes interesting. Also, even if the game was already interesting, if you start from the default state, people can memorize moves. 
But if you start from a given initial configuration, you can actually train students to be able to play certain strategies. So what we have is a system where you write down the rules of the game, and you specify the difficulty level for the opponent and for the player, and out come some good, interesting starting play configurations that you can play with. And the way that this technique works is, it's kind of like solution generation in reverse. So I won't have more time to go into details but let me actually move on. So now I'm going to talk about a third technique for problem generation which is inspired by techniques developed in the software engineering community on test input generation. So these kinds of techniques work for procedural problems that are especially common in middle school mathematics but you'll also see them in advanced courses as well. So this is a progression of problems for practicing addition of numbers that I've taken from a middle school curriculum. So how can you figure out what is the quality of these problems and how can you generate more problems of their particular kind? So the key idea is that you first write down the procedure that the student is supposed to execute in order to be able to add these numbers. So let me see if someone has a comment here. Okay. Okay, so most people are able to hear me so I'll ignore that. Okay. So now once you write down the procedure and label interesting statements in this procedure with tags which denote which branch was a duplicate or --. [filtered echo sounds] Then you can distinguish between these problems... [filtered echo sounds] Then we can distinguish these problems by observing their traces. [filtered echo sounds] I can hear some echo. I was wondering if someone has their microphone turned on...? [filtered echo sounds] Okay. 
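As a rough Python sketch of the idea of tagging a procedure and observing its trace: the column-wise addition procedure below records one tag per interesting step. The tag names (`D` for a plain column, `C` for a column that produces a carry, `O` for a final overflow digit) are assumptions made for this illustration; the actual labels used in the talk are not in the transcript.

```python
def add_with_trace(a, b):
    """Add two non-negative integers column by column, recording a tag per step."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # least significant digit first
    carry = 0
    digits = []
    trace = []
    for i in range(max(len(xs), len(ys))):
        dx = int(xs[i]) if i < len(xs) else 0
        dy = int(ys[i]) if i < len(ys) else 0
        total = dx + dy + carry
        trace.append("C" if total >= 10 else "D")  # carry vs. plain column
        digits.append(total % 10)
        carry = total // 10
    if carry:
        trace.append("O")  # an extra leading digit was created
        digits.append(carry)
    result = int("".join(str(d) for d in reversed(digits)))
    return result, "".join(trace)


if __name__ == "__main__":
    for a, b in [(12, 34), (19, 23), (95, 7)]:
        print(a, b, add_with_trace(a, b))
```

Two problems with the same trace exercise the same steps of the procedure, so a trace can serve as a signature for grouping problems or for asking a generator for new inputs that produce a given trace.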
So one way to distinguish between all these different problems that we see in the progression in a textbook, is to observe the trace that was executed when this problem was done or the procedure that the student was supposed to execute in order to solve this problem. And then you can see that these problems turn out to be quite different, but maybe the second and third one are quite similar because of their trace characteristics. And they are supposed to be because they look quite symmetrical; they are symmetrical. Now you can use these traces to define the difficulty level of a problem. The longer the trace, the more number of loop iterations that it goes through, the more number of exceptional conditions that it executes in this procedure, the more complicated our trace might be. So we use this concept to actually compare two different progressions that we got from two different math textbooks. So this is about integer comparison. You're given two integers and you have to figure out which integer is larger. So we have a green progression taken from a middle school mathematics textbook called Jump Math, and we have a blue progression taken from a textbook called Skill Sharpeners. So what we did was, we again wrote down the procedure that the student is supposed to execute in order to be able to solve this problem. And then, we drew the graph of the different problems for these progressions. So the blue progression is one and the green is the other one. So now you will notice from this graph that the green progression actually moves into more involved problems that require comparing numbers with more digits in which the first several digits are actually equal. And in fact it moves quickly into more involved problems. But it actually ignores an entire class of levels, namely the ones corresponding to H and L. So what are H and L? 
H and L are two specific cases where two integers can be compared easily, not by virtue of comparing their digits one by one from left to right but by simply counting the number of digits in those integers. So if an integer has fewer digits then it's going to be smaller than the other integer. So now let's move on to the second part of the talk which is about being able to automatically solve problems. So people often ask me, "Why do you want to develop capabilities to automatically solve problems? Do you want students to cheat on exams?" Well, not really. So there are a lot of motivations for why we want to be able to solve problems. So one big motivation is that it is an important ingredient to be used in the problem generation phase itself when you want to generate problems that are not only similar to a given problem but problems that involve solutions that have certain characteristics. Or, when you want to generate sample solutions to the new problems that you have generated using your problem generation tool. Another reason for solution generation is to be able to generate customized solutions. So oftentimes problems have multiple possible solutions, like programming problems or proof problems, and one solution might appeal more to one student than to another student who might be familiar with a different set of concepts. It can be used to complete unfinished solutions from students. So instead of trying to present a sample solution to the student which is very different from how the student tried to solve the problem, it is better to complete the solution along the lines the student was trying to think. And there are two key ideas that we have used in terms of approaching this problem. So one idea comes from our work on automatically synthesizing programs from logical specifications and from specifications in the form of examples. So let me explain this in the context of geometry constructions, ruler-compass based geometry constructions. 
Suppose you are given this red triangle, and the goal is to construct a green circle that passes through the three vertices of this red triangle. Now one way to solve this problem is to construct a perpendicular bisector of XY, then construct a perpendicular bisector of XZ. These two perpendicular bisectors will intersect at a point N and then, you draw a circle centered at N whose radius is equal to the distance between that point and one of the vertices of the triangle. Now what I just said was actually a program. It's a straight-line program in which the operations are ruler and compass operations. So it is a program synthesis problem. You want to synthesize a straight-line program made up of ruler-compass operators which can solve this particular problem. The first thing that we have to do is to be able to formally specify the English description that was there in this problem. In fact, we have also developed natural language processing techniques which can take such a natural language description and automatically generate a logical specification of the problem. So in this case the problem has a precondition on points XYZ; namely, that XYZ are three points which are not on a straight line. And there's a postcondition which relates how C relates to XYZ. So C is a circle that passes through X. It also passes through Y. It also passes through Z. So this is what the logical specification of the problem says, and we have developed technologies which can automatically take the English description of the problem and generate these logical descriptions. Now once you have a logical description, we want to generate a program -- We want to synthesize a program -- which, in this case, is a straight-line composition of geometry methods. So there are types, like points, lines and circles, and there are geometry methods like ruler, compass and intersect operators. 
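To make the "construction as a straight-line program" view concrete, here is a minimal Python sketch. The primitive names (`ruler`, `compass`, `intersect_circles`, `intersect_lines`) and the coordinate representation are assumptions for illustration, not the tool's actual API. The circumcircle construction is a fixed sequence of these primitives, and the postcondition (the circle passes through X, Y and Z) is checked on randomly chosen triangles.

```python
import math
import random


def ruler(p, q):
    """Line through points p and q, represented by the two points."""
    return (p, q)


def compass(center, through):
    """Circle with the given center passing through the given point."""
    return (center, math.dist(center, through))


def intersect_circles(c1, c2):
    """Both intersection points of two intersecting circles."""
    (x1, y1), r1 = c1
    (x2, y2), r2 = c2
    d = math.hypot(x2 - x1, y2 - y1)
    a = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    h = math.sqrt(max(r1 ** 2 - a ** 2, 0.0))
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    ox, oy = h * (y2 - y1) / d, h * (x2 - x1) / d
    return (mx + ox, my - oy), (mx - ox, my + oy)


def intersect_lines(l1, l2):
    """Intersection point of two non-parallel lines."""
    (x1, y1), (x2, y2) = l1
    (x3, y3), (x4, y4) = l2
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / den,
            (a * (y3 - y4) - (y1 - y2) * b) / den)


def perpendicular_bisector(p, q):
    """Four primitive steps: two circles, their intersections, one line."""
    c1 = compass(p, q)
    c2 = compass(q, p)
    a, b = intersect_circles(c1, c2)
    return ruler(a, b)


def circumcircle(x, y, z):
    """The straight-line program described in the talk."""
    l1 = perpendicular_bisector(x, y)
    l2 = perpendicular_bisector(x, z)
    n = intersect_lines(l1, l2)
    return compass(n, x)


def satisfies_post(construct, x, y, z, tol=1e-6):
    """Postcondition: the constructed circle passes through x, y and z."""
    center, radius = construct(x, y, z)
    return all(math.isclose(math.dist(center, p), radius, rel_tol=tol, abs_tol=tol)
               for p in (x, y, z))


def random_test(construct, trials=20, seed=0):
    """Check the postcondition on randomly chosen triangles."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(3)]
        if not satisfies_post(construct, x, y, z):
            return False
    return True
```

A wrong construction, for example a circle centered at the midpoint of XY, passes through X and Y but fails this check on a generic triangle because Z is not on it.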
So if you go back to this problem, constructing the perpendicular bisector requires these four different steps, constructing the other perpendicular bisector requires the other four steps and then, you find the point of intersection. And then, you draw the circle between them. So this is the kind of program that we can automatically produce using our technique. Now let me give you a little bit of [inaudible] of how it works. And before I describe how to automatically produce the solution, let me address a simpler problem of how to automatically check a given solution from the student. So let's say P is a geometry program that a student has given us, and it's supposed to translate inputs I into outputs O that have this precondition and postcondition that you see in blue. So if the input I satisfies the precondition Pre then the output O should be such that it satisfies the postcondition Post. So this is the verification problem. And the synthesis problem would be that we will be only given the logical specification, Pre and Post, which describes how the output should be related to the inputs, and the goal is to figure out a program P. Now let me first talk about the verification problem where the solution is given to us, and we just want to check whether the solution is correct or not. In fact this problem is decidable; there are known existing decision procedures, but they're extremely complex. But I will present to you a new decision procedure which is very fast and it will enable us to extend this technology to be able to synthesize the solutions in the first place. So the efficient approach is actually random testing. It is exactly the same idea that we used for checking the correctness of our guesses for algebraic identities which is something that I covered during problem generation. 
And the idea is that if you test the correctness of the student's construction on a random input, and if it succeeds, then with very high probability over the choice of the random inputs, the student's solution is actually correct. And the correctness of this remark that I just made again follows from an extension of Polynomial Identity Testing, which is a classical theorem in computer science theory. And the same theorem applies to algebraic operators and geometric operators, which forms the basis of a solution for automatically generating algebra problems and automatically solving geometry problems. So the synthesis algorithm works like this: so we take the logical specification and we convert it into a random input-output example that satisfies that logical specification. And we find this random input-output example using numerical methods. And then, we try to solve or generate a program which can take the randomly chosen input to the corresponding output. And this is not very different from how students would solve these problems, right? They will sketch out something using pen and paper and then try to work backwards and solve them. And this is exactly what we do. So then, we just do brute force search. We start from the input objects and then, we try to apply geometric operators randomly to them until we are able to hit the desired output objects. And then, there are again some details in how this technique scales but this is pretty much the basic broad idea. And let me again give you an [inaudible] of why this approach actually works. So the error probability of this algorithm is extremely low. So if I look at the same problem of constructing a circumcircle, now if the input triangle is an equilateral triangle and if I did a wrong construction like that of an incircle versus a circumcircle, I will notice that their centers actually coincide. 
And I will be forced into believing that the construction that works for the incenter is the correct construction for finding the circumcenter, but it is not the case. But when will this problem arise? It will arise only if, when I choose the input triangle XYZ randomly, I choose it to be an equilateral triangle. So what are the chances of me choosing a triangle to be an equilateral triangle if I'm choosing it randomly? It's actually very small. Right? So I will not be fooled into believing that angular bisectors will intersect at the same point as the perpendicular bisectors because the chances of me choosing a random triangle to be an equilateral one are going to be very small. So I see someone wants to ask a question here. It's a good time to probably stop as well. So, Liz, do you know who raised their hands and if someone wants to ask a question? If someone wants to ask a question, please go ahead. >>: [filtered inaudible] >> Sumit Gulwani: Okay. Probably not. So let's move on. So let me present you -- Yeah, go ahead. >>: Is it a problem that you -- So I can accept that you would not create programs that would generate the wrong solution. But is it a problem that you would create programs that contain unnecessary or superfluous steps that do not actually contribute? >> Sumit Gulwani: Yes. So the question is: can we create geometric programs in which there are some unnecessary steps that are not needed? Or maybe I generate a longer solution. So that's a good question. So the way we solve this problem is by pruning out statements which are dead statements, so we only keep statements that contribute to live variables. And we bias our search space by trying to figure out shorter solutions first before we search for longer solutions. So both of them are valid problems, and we do address that. Yeah. >>: I have a question here. I assume that the purpose of doing solutions is to provide feedback to the student so he can go through the steps to learn how to do that. 
But that's [inaudible]. Like the example that you showed early on, you put a lot of steps which are just for the programming purpose rather than just getting an original solution to doing this. >> Sumit Gulwani: Okay. So good question. So the question is that the program that I showed you for geometry had lots of steps because drawing a perpendicular bisector itself takes four operations if you look at ruler and compass operators. And giving feedback to the students in the form of these larger programs might not be as meaningful as giving them higher level concepts. And we do, again, cater for this. So by trying to replace instructions which correspond to some known concepts like, "Draw perpendicular bisector," so we take those four instructions and we replace them by one instruction which says, "Construct the perpendicular bisector." So we do try to present solutions in terms of higher level concepts. Okay. So now let me show you another application of a very similar technology. In this case it is about solving problems in an automata theory course. So one common problem that you see in an automata theory course is you are given a language and you have to figure out whether the language is regular or not. If the language is regular, you have to construct an automaton. If the language is not regular then you have to give a proof of non-regularity. So what do you think? Is this language that you see in red regular or non-regular: the language of all the strings that have the same number of occurrences of the substring AB as the substring BA? So it might appear that it is a non-regular language, but it is regular. It's a trick question. And if you give this to our tool, our tool outputs the automaton for this language. But if I change this language slightly, same number of occurrences of A as occurrences of B, then it is really counting and this is non-regular and our tool ends up generating a proof of non-regularity. 
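The trick question above can be checked with a short Python sketch. Over the alphabet {a, b}, a string has as many occurrences of "ab" as of "ba" exactly when it is empty or its first and last characters agree, so a five-state automaton suffices. The automaton below is hand-written for this illustration (the one the tool outputs is not shown in the transcript), and it is compared against the counting specification by brute force on all short strings.

```python
from itertools import product


def in_language(s):
    """Specification: equal number of occurrences of 'ab' and 'ba'."""
    return s.count("ab") == s.count("ba")


# States remember the first character seen and the most recent one.
TRANSITIONS = {
    ("start", "a"): "a..a", ("start", "b"): "b..b",
    ("a..a", "a"): "a..a", ("a..a", "b"): "a..b",
    ("a..b", "a"): "a..a", ("a..b", "b"): "a..b",
    ("b..b", "b"): "b..b", ("b..b", "a"): "b..a",
    ("b..a", "b"): "b..b", ("b..a", "a"): "b..a",
}
ACCEPTING = {"start", "a..a", "b..b"}


def dfa_accepts(s):
    """Run the automaton on s and report whether it ends in an accepting state."""
    state = "start"
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


def agrees_up_to(max_len):
    """Brute-force comparison of automaton and specification on short strings."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if dfa_accepts(s) != in_language(s):
                return False
    return True


if __name__ == "__main__":
    print(agrees_up_to(12))
```

Agreement on all strings up to a bound is evidence, not a proof; it is the same test-then-verify pattern used elsewhere in the talk.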
So again what is the methodology in terms of automating these things? We need to be able to describe the problem in a formal language. And we have defined a user-friendly logic for doing this, and the idea of designing these logics is that it should enable easier translation from natural language using standard [inaudible] natural language processing tools, and we have this come as part of our tool as well. Then you need to have a language for describing the solutions. So what are the kinds of solutions that we want? Well, we want to produce an automaton if the language is regular, and an automaton is a concept that is well-defined so no need to formalize it any further. But we want to formalize non-regularity proofs. So the proofs that we look at are so-called Myhill-Nerode Theorem proofs. So I don't want you to get bogged down into the Greek symbols here. But the observation is that the proof of non-regularity can be expressed in terms of two functions, F and G, which we'll call congruence functions and witness functions. And what we observe is that in real life, these F and G functions take values from a simple language that you see here. And then, what we want to do is to be able to develop an algorithm that can convert the problem description to either an automaton if the language is regular or it can convert the problem description to a non-regularity proof, which in this case I just showed you comes from a small, simple language. And for both of these we use standard results for -- Actually for automata construction we leveraged some standard results on automata learning by Angluin; it's called the L-star algorithm, a very classical algorithm. And then for generating non-regularity proofs, we again get back to our guess methodology where we do an intelligent guessing of a solution. Then we test it and if it passes the testing then it's verified. But again the details are a bit too much for me to cover in the talk. 
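The guess-and-test step for non-regularity proofs can be illustrated with a minimal sketch for the language "same number of a's as b's". It assumes one common reading of a Myhill-Nerode proof: F(i) names infinitely many strings and G(i, j) is a suffix that separates F(i) from F(j) whenever i differs from j. The candidate pools below are hypothetical, and the final symbolic verification for all i and j is not shown.

```python
def in_language(w: str) -> bool:
    """Target language: equal number of a's and b's."""
    return w.count("a") == w.count("b")

# Small hypothetical pools of candidate functions to guess from.
CANDIDATE_F = {
    "(ab)^i": lambda i: "ab" * i,
    "a^i": lambda i: "a" * i,
}
CANDIDATE_G = {
    "empty": lambda i, j: "",
    "a^j": lambda i, j: "a" * j,
    "b^i": lambda i, j: "b" * i,
}

def distinguishes(f, g, bound: int) -> bool:
    """Test: for all i != j below bound, exactly one of
    f(i)+g(i,j) and f(j)+g(i,j) is in the language."""
    for i in range(bound):
        for j in range(bound):
            if i == j:
                continue
            w = g(i, j)
            if in_language(f(i) + w) == in_language(f(j) + w):
                return False
    return True

def guess_proof(small: int = 4, large: int = 30):
    """Guess on small indices, then re-test on a larger range."""
    for f_name, f in CANDIDATE_F.items():
        for g_name, g in CANDIDATE_G.items():
            if distinguishes(f, g, small) and distinguishes(f, g, large):
                return f_name, g_name
    return None

if __name__ == "__main__":
    print(guess_proof())
```

The small bound prunes wrong guesses cheaply, and only survivors pay for the larger test. A surviving pair is still only a conjecture until it is verified for all indices.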
But, again, the point that I want to stress forward is that most of these are search problems. And one of the common methodologies that we use for the search problems is brute force search, come up with an intelligent scheme to guess for solutions or a limited [inaudible] space and then, use a very fast verification check to see whether the solution is correct or not. So let me see if someone has a comment here. Okay. So [inaudible] is asking, "How do you choose an input?" And so, [inaudible] was this -- Am I too late in seeing your question? So is this question in the context of automata or non-regularity proofs? Okay. So let me then try to integrate more [inaudible] question. So the question is, how do we choose an input? So in non-regularity proofs we have to construct these functions F and G, that should satisfy some property for all I's and all J's; so I and J's are integers. So what we do is we try to choose small values of I and J's and then, try to find an F and G that works for those small values of I and J, then test it for even more values for I and J and then, verify for all values of I and J. So [inaudible], is your question about geometric constructions? How to choose inputs there? Okay, so [inaudible]'s mic does not work. But in case of geometry, we try to solve for a random solution using some of the existing off-the-shelf numerical solvers. So they might not give us perfected [inaudible] solutions, but it seems to be good enough for all the practical experimentation that we did. So let's move on. Yeah? So now I'm going to show you another technique for solution generation. In this case it is from demonstrations, and this works for procedural problems. It's a very different take on the problem. So consider this snapshot from a middle school mathematics textbook in which the student is being shown how to compute the LCM of three numbers. You can see on the top right. 
Now if I do not just tell the answer to the student which is 1440, but if I walk the student through all the different steps that are required in solving this problem, we are setting ourselves up for a formalism which is very similar to programming by demonstrations in which an end-user shows me how to automate tasks in spreadsheets through a sequence of steps. And it is a very similar technology that we use here to automatically generate solutions that can solve such procedural problems. And, again, one thing that you'll observe is that the beauty of generating such solutions is that once I have a procedure that can solve an LCM problem then I can use that procedure not only to produce sample solutions for all the practice problems, I can use that procedure to in fact generate practice problems in the first place. If you recall, that was one of our techniques of problem generation in which the goal was to get a procedure that can solve a given middle school mathematics problem and then, use test case [inaudible] tools to exercise all paths in the procedure in order to generate a good progression of problems for the students. So I think this is a great example of the connections with end-user programming where first the end-user teaches the computer and then the computer is able to help the end-user. So in this case the teacher is going to first teach the computer by means of some examples as they would teach a student. But once the computer becomes smart enough to understand how to follow a certain concept then the computer can start teaching that to the students. So now let me go on to the third part of the talk which is about automated grading, and this is probably the one that does not need much motivation. This is something that everyone feels, you know, is probably one of the most important topics. 
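To make the idea of a step-by-step procedure concrete, here is a minimal sketch of one way to compute an LCM while recording every step, using the common division-ladder method. This is an assumption for illustration: the talk does not say which method the textbook uses, the inputs below are hypothetical rather than the three numbers on the slide, and all inputs are assumed to be positive integers. The sequence of divisors doubles as a crude "path" through the procedure, which is the kind of signal a problem generator could vary.

```python
def lcm_ladder(numbers):
    """Return (lcm, steps); each step is (divisor, row_before, row_after)."""
    row = list(numbers)
    steps = []
    lcm = 1
    divisor = 2
    while any(n > 1 for n in row):
        if any(n % divisor == 0 for n in row):
            # Divide every number the divisor goes into; copy the rest down.
            new_row = [n // divisor if n % divisor == 0 else n for n in row]
            steps.append((divisor, row, new_row))
            row = new_row
            lcm *= divisor
        else:
            divisor += 1
    return lcm, steps

def path_signature(numbers):
    """The divisors used, in order: a rough stand-in for the path taken."""
    _, steps = lcm_ladder(numbers)
    return tuple(divisor for divisor, _, _ in steps)

if __name__ == "__main__":
    answer, trace = lcm_ladder([12, 18, 30])
    for divisor, before, after in trace:
        print(f"divide by {divisor}: {before} -> {after}")
    print("LCM =", answer)
```

Two problems with different signatures exercise the procedure differently, so a progression of practice problems could be ordered by signature length or by which divisors appear.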
Because once you have an automated grading technology, you can help enable answer-scripts be graded immediately and provide immediate feedback to students. But on the other hand, it can also enable a rich tutoring experience which normally one-on-one tutors would not have the patience to do. For example, generate good hints for a student when the student gives you a wrong solution or provide the student some simpler practice problems to act on, to work on if they are not able to solve a particular problem based upon the kind of errors that they actually make. And there is one key area that I want to enforce, you know, in this topic which is that there are often a variety of feedback metrics that need to be provided, a variety of feedbacks that need to be provided to different students depending upon the kinds of mistakes that they are making. So one feedback metric might work better for one student; another feedback metric might work better for another student. And one of the feedback metrics that is often used in computational courses, like automata or programming, is showing the [inaudible] on which the solution is not correct. And this is a great metric. In fact this is the state of the art that is used in grading programming problems. So let me just show you an example of it. So we have a very nice tool [inaudible] called PexForFun.com where you can see a problem, submit a solution to a programming problem, and the tool will give you feedback in the form of counter-examples. It will show you an input on which your program does not compute the right answer. So this is a technology that we have deployed here at MSR. And this is, by the way, very useful technology. But now what I'm going to show you is one of the [inaudible] points that a user went through to emphasize the need for giving a different kind of feedback. So this is a solution that someone tries to submit for an array reverse problem. So the goal is to reverse an array, and this solution is buggy. 
And this is a solution that a student submits. And when the student submits a solution, the student immediately gets back a counter-example. And the student stares at it, you know, "Why is my program not correct on this input?" And then the student tries to change this. So once the student gets the feedback, the student tries to change it. And the student changes it like this which is still incorrect. And the student gets another counter-example. And then the student submits the same attempt again, you know, as if we had an [inaudible] which will change its response. But, no. So clearly some sign of frustration here. And the student makes further changes. Now this is the same as the initial attempt except that the student put a print statement in the middle of the loop, you know, going back to the original mistake. The student gets the same counter-example again and then, submits the same attempt again after looking at the counter-example. The student gets back to the reverse attempt. This is, again, same as the initial attempt. This is also the same. Ah-ha! And now the student is slowly moving towards the correct solution. The only error is that instead of A of I in the loop body, the student should have A of I minus 1. And the student gets another counterexample. Now the problem with counter-examples is that the counter-examples is not telling the student how close the student is to the correct solution. Okay? And see what happens after this. Now the student tries to make fixes in the wrong direction. And now the student is back to the bigger error. The student submits the same attempt again, gets the counter-example, submits the same attempt again, and now the student is too frustrated and gives up. Right? So what I'm trying to emphasize is the need to provide different kinds of feedback for different students. 
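Counter-example feedback of the kind described above can be sketched in a few lines. This is a minimal sketch that swaps in plain brute-force enumeration of small inputs; it is not how the deployed tool finds its inputs. The buggy `student_reverse` is a hypothetical submission invented for illustration, not the student's actual code from the slides.

```python
from itertools import product

def reference_reverse(a):
    """Teacher's reference solution."""
    return a[::-1]

def student_reverse(a):
    """Hypothetical buggy attempt: indexes from the wrong end."""
    n = len(a)
    b = [0] * n
    for i in range(n):
        b[i] = a[-i]  # should be a[n - 1 - i]
    return b

def find_counterexample(candidate, reference, max_len=4, values=(0, 1, 2)):
    """Return the first small input where the two programs disagree."""
    for n in range(max_len + 1):
        for items in product(values, repeat=n):
            a = list(items)
            try:
                result = candidate(list(a))
            except Exception:
                return a  # a crash also counts as a wrong answer
            if result != reference(list(a)):
                return a
    return None

if __name__ == "__main__":
    print(find_counterexample(student_reverse, reference_reverse))
```

As in the session replayed above, the returned input says that the program is wrong but nothing about how far it is from a correct one.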
And in this case, one of the feedback that you can provide is to take the student's solution and try to make small edits to it so that the student's solution becomes a correct solution. Now note that you cannot simply do syntactic edits to compare it with the sample solution because the number of correct solutions, at least in programming, are infinite. Right? You cannot simply try to morph the student's solution to a correct sample solution; you have to modify it to the nearest correct solution. And one of the techniques that we use is this very nice work on program sketching that was originally started out with by Armando Solar-Lezama and [inaudible] and now Rishabh Singh is [inaudible] at MIT who is working on this. So the way we do this is that the teacher writes down an error model describing the standard kinds of mistakes that the students often make. So for example the students are often wrong by the plus-minus one in the array indexing. They can often be wrong in understanding the semantics of increments should it be V++ or V- - and so on. And so the teacher writes down and prepares this error model based upon their experience. And then the key idea is that we try to solve the problem of making the smallest set of changes to the student's solution using this error model, using these rewrite rules that will make the student's solution into a correct one. And this can be phrased as a sketch problem which is an off-the-shelf tool for program synthesis. And once you do that then these are two different [inaudible] for array reverse. We give it to our tool and our tool will simply point out the errors in these solutions immediately. And if you want, you can play with this tool on these links that I've provided here. So let me see if someone has a question here. Okay. Okay, so let's move on. So now let me show you another technique, another way of providing feedback using mutation to generate feedback. 
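The "smallest set of changes under an error model" idea can be sketched without a synthesis tool by enumerating repairs in order of size. This is a minimal sketch under stated assumptions: the talk phrases the search as a Sketch problem, whereas this uses explicit enumeration; the student program is reduced to a hypothetical two-parameter template (loop start and index offset); and correctness is only checked by bounded testing.

```python
from itertools import combinations, product

# Hypothetical error model: the values each "hole" may be rewritten to.
ERROR_MODEL = {"start": (0, 1), "offset": (-1, 0, 1)}

def run_candidate(params, a):
    """Template: for i in range(start, n): b[i] = a[n - i + offset]."""
    n = len(a)
    b = [None] * n
    for i in range(params["start"], n):
        b[i] = a[n - i + params["offset"]]
    return b

def is_correct(params, max_len=4):
    """Bounded test against the expected reversal; a crash means wrong."""
    for n in range(max_len + 1):
        a = list(range(n))
        try:
            if run_candidate(params, a) != a[::-1]:
                return False
        except IndexError:
            return False
    return True

def smallest_repair(student):
    """Try 0 changes, then 1, then 2; return {hole: (old, new)} or None."""
    names = list(ERROR_MODEL)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            choices = [[v for v in ERROR_MODEL[name] if v != student[name]]
                       for name in subset]
            for new_values in product(*choices):
                candidate = dict(student)
                candidate.update(zip(subset, new_values))
                if is_correct(candidate):
                    return {name: (student[name], candidate[name])
                            for name in subset}
    return None

if __name__ == "__main__":
    print(smallest_repair({"start": 0, "offset": 0}))
```

The size of the returned repair gives a natural closeness measure: one change suggests a near miss, while two changes suggest a larger misunderstanding.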
Another kind of feedback that you can provide is that if you look at the student's solution try to figure out the problem for which the student has given a solution. Now you try to reverse synthesize the problem for which the student's solution is correct, and you say that instead of solving this problem, you try to solve a slightly different problem. So these are different attempts to automate a problem. The problem was draw an automata that accepts strings with an even number of the letter a's. And this is the correct solution. Our tool can automatically grade it. But this is an incorrect solution that one of the students provided. So what does our tool do? Well, one of the feedbacks that is useful to give here is that your solution is almost correct except that it does not work well on the empty string. And we give four out of five to the student. And this is the same as input-based feedback which is also used in programming exercises. But here's another attempt that we saw a student might have given. And in this case we try to do more of the kind of mutation-based bug localization where you try to find minimal syntactic changes that you can do to the automata to make it into a correct one. But this is another one where you would actually require two different changes, so instead of deducting, you know, four points on the student's solution with some other model, you can actually figure out that the student was actually quite close to the correct solution. So the student solved the wrong problem. Instead of solving an automata that accepts strings with even number of a's, the student solved the automata that accepts strings with odd number of a's. And, better feedback might be more useful to the student and this is what we call problem-based feedback. Okay. So let me quickly move on to the last part of my talk -- And try to finish in five minutes here -- which is the problem of content entry. 
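Input-based and problem-based feedback for automata can both be built on one equivalence check. The following is a minimal sketch, assuming complete DFAs over {a, b} encoded as a hypothetical tuple (start state, accepting states, transition table). It searches the product automaton breadth-first for a shortest distinguishing string, then asks whether the student's automaton exactly solves some nearby problem.

```python
from collections import deque

TRANSITIONS = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
EVEN_A = (0, {0}, TRANSITIONS)  # intended: even number of a's
ODD_A = (0, {1}, TRANSITIONS)   # student's: odd number of a's

def shortest_difference(d1, d2, alphabet="ab"):
    """Shortest string accepted by exactly one DFA, or None if equivalent."""
    start = (d1[0], d2[0])
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (p, q), word = queue.popleft()
        if (p in d1[1]) != (q in d2[1]):
            return word
        for ch in alphabet:
            nxt = (d1[2][(p, ch)], d2[2][(q, ch)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + ch))
    return None

def feedback(student, intended, nearby_problems):
    """Prefer problem-based feedback; fall back to a counter-example."""
    witness = shortest_difference(student, intended)
    if witness is None:
        return "correct"
    for description, dfa in nearby_problems.items():
        if shortest_difference(student, dfa) is None:
            return "your automaton accepts " + description
    return "wrong on input " + repr(witness)

if __name__ == "__main__":
    nearby = {"strings with an odd number of a's": ODD_A}
    print(feedback(ODD_A, EVEN_A, nearby))
```

The dictionary of nearby problems is an assumption here. The talk describes synthesizing the problem the student actually solved, which is more general than looking it up in a fixed list.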
So we've been talking about all these technologies to provide feedback to students which require the students to enter all these things inside the computer in the first place. And today we do not have great editors to be able to enter structured content inside the computer except for programs, so we have programming [inaudible] Visual Studio which make it easy to enter programs. But what about subjects such as math and physics which require the student to enter lots of equations and drawings? So the key ideas that we tried to use in trying to develop editors which make it easy to enter such structured content is one: we want to allow for multi-modal input such as the use of ink, speech, touch, and this is becoming especially significant with the presence of new devices with new form factors. Then, what we want to do is we also want to do some error correction because when you use some soft forms of intent like speech and ink, you're likely to make some errors. So it will be good to have an error correction technology. And for the [inaudible] we would like to have a prediction as there is in IntelliSense. So let me illustrate these concepts to you for the domain of mathematical equations. So if you use existing editors like LaTeX or Microsoft Word, they're going to be very painful for you to write equations. One of the solutions that we have investigated is that of developing an intelligent predictive editor, and the reason why such an editor is possible is because most man-made stuff has very low entropy. Most structured concepts have very low entropy and hence are amenable to prediction. So there are two kinds of prediction that we can do. So one kind of prediction is what I call Structure Prediction in which we can predict parentheses in a mathematical term. Right? So this is a very specific thing, but I'm just giving examples of what is possible in this space. And this can enable entering of mathematical text by speech. 
So for example, suppose you want to enter this term that you see on the slide, how would you say it if I asked you to, you know, speak this term? You would very likely say, "Square root 1 plus cos divided by 1 minus cos." Right? Ideally what you should have said was, "Square root of, open parentheses, open parentheses, 1 plus cos, open parentheses, A, close parentheses, close parentheses, divide by..." and so on, right, which is becoming quite [inaudible] to speak. But now we have a technology which can automatically take the term without parentheses and give you likely suggestions on where parentheses could have been in the first place. So it uses the heuristic that most mathematical equations will have parentheses inserted in a way that the term becomes highly symmetrical. And this heuristic seems to work well for lots of experiments that we've tried. >>: [inaudible] I mean it could be... >> Sumit Gulwani: Oh, absolutely... >>: ...square root of the top... >> Sumit Gulwani: Absolutely, right. But the point is that people are not trying to write arbitrary text. And this text has a very low entropy, and this is what [inaudible]. So you're absolutely right. If every possible parenthesization was equally likely, this technique is not going to work. Right? But the point is that every parenthesization is not likely. People are not going to put parentheses around 1. People are not going to say, "Square root of 1," and so on. And people want symmetrical terms, and that's why this technique works. So another technology that we have here is very much similar to IntelliSense, Visual Studio IntelliSense, where I can predict what the user is wanting to type next. So all that you see in color is something that has been predicted automatically. Once I see the black terms, I can predict that colored terms because there's some type of pattern. And these patterns normally occur when you try to enter symbolic matrices like this. 
We can also predict sometimes when you're trying to solve problems if you apply a given step, we can then apply that kind of reasoning to the rest of the problem so as to make it easy for teachers to enter sample solutions if they wanted to. Okay. So the last part of my talk is about trying to create drawings, which is another important concept in education; you need drawings for geometry, physics diagrams, chemistry diagrams and so on. And one challenge in drawings is that they often require extreme precision; you have to make sure that the line is a tangent to a circle, the angle at the center of the circle is 90 degrees and so on. And the idea is to allow users to do pen-based sketching and infer constraints using machine learning and then, use constraint solving technology in [inaudible] community to beautify these drawings. And the other challenge is that sometimes drawing is very repetitive. It can be very tedious to draw sometimes, like trying to draw a resistor in an electrical circuit and lots of repetitive edges and so on, or trying to draw a ladder. And the problem with copy and paste is that copy and paste does not help you with positioning of copied objects and transformations on copied objects. And our solution is to use synthesis technology to predict the repetitive features that are there in that drawing. So let me just give you an example of how it works: so let's say you want to draw this perfect ladder. You sketch it out using pen and then, we can beautify it for you but we can also predict what is the repetitive aspect here. Similarly if you're trying to draw this circle with lots of spokes, we beautify this for you and then, we are able to predict what you were trying to draw. Okay, so now let me conclude. So in this talk I talked about different kinds of aspects in an intelligent tutoring system, namely solution generation, problem generation, automated grading, entry of structured content. 
And we also saw that some of these techniques were specific to the domain for which they were applied. But I personally think that this is not a problem because you need to develop these technologies once and for all. The standard curriculum does not change that often. We have been teaching the same concepts for several decades. So once you develop these specialized systems for these domains, they are there to stay. The other thing that I want to stress is that this area can benefit from cross-disciplinary research. So you must have already seen some inspiration that I drew from my background in formal methods and my expertise in program synthesis and search techniques. There was a lot of HCI aspect that I just talked about in the last part of the talk. Things that I did not get to cover are natural language processing for dealing with word problems because often times problems are not [inaudible] in English. And also using machine learning for analytics. So maybe that would be a good material for a talk some other day. Now what is the value proposition that we are seeking here? So in case of short term, the goal is to actually improve education by making it more interactive, by allowing students to explore more, by making it into a social activity because if all these things are digitized, different students who are trying to solve the same problem at the same time can be brought together. It can turn into a very social activity as well. It can really enhance learning. But I am also personally interested in some of the long term benefits of investing into this technology. And this goes back to the connections that I was talking about with end-user programming and intelligent tutoring systems because once the computer becomes smart enough to learn concepts of math, programming, automata, physics, chemistry, maybe even undergraduate concepts -- I'm not talking about graduate level concepts, right, even undergraduate concepts. 
We are now talking about having a very ultra-intelligent computer. Also if we succeed in this endeavor of trying to use technology to improve learning, we would likely also have developed the model of human mind. And I will conclude with this last long term application that I have so that this is, I think, at least the thing that'll help you remember this talk. So how can this technology help in inter-stellar travel? So one of the theories for interstellar travel is where you want to travel to far off galaxies is to freeze a human embryo and put it in a space ship and then, time does not matter. The space ship will take its own time to reach a far enough galaxy. And when you find a planet that you can inhabit, you bring the human embryo to life, and now you have a baby without a mother and the baby needs to be taught. And the computer can teach the baby now. So that's it. I would like to stop here. If there are any questions, I would be more than happy to take it. One thing that I would like to stress is that if there are students watching this talk and they would like to participate in this kind of research which has huge societal implications, please do send me an e-mail. And there are lots of exciting topics that you can work on which will definitely fit your area of expertise or interest. Okay, thanks everyone for your attention. [applause] >>: [inaudible] >> Sumit Gulwani: Okay, folks, I'm un-muting. It looks like I have to un-mute each and every attendee one by one. But if anyone wants to ask a question, please do. I'll be able to hear you. >>: [inaudible] get feedback. >>: So is this [inaudible]. >>: I don't know. It's not [inaudible] webinars. >>: Oh, I see. >>: [inaudible]. >> Sumit Gulwani: So if someone has any questions, please put your name in the chat and then, I will let you speak. 
>> Anoop Gupta: Sumit, while we're waiting: so, you know, even if you're, let's say, sixth to eighth grade math, how far are we from covering 90 percent of the topics that get covered, you know, through the combination of the solution technologies? >> Sumit Gulwani: So Anoop is asking a question which is that grade six to grade eight math is quite an important topic. And how far are we from developing such intelligent technologies for that curriculum? So if we leave out the proofs part of things, which is something that we are still working on and has a long way to go because being able to generate machine-readable proofs for mathematical theorems is easy; being able to generate human-readable proofs or proofs that a student is supposed to do for a mathematical problem is not that easy. So assuming grade six to grade eight does not have proof problems, I think it is probably the most ripe area to go forward with these kinds of technologies and deployment of these technologies. Zoran Popovich, with whom I am engaging on this particular topic, has a plan for deployment of some of the technologies that I have worked on and some of the work that he has already done in this area. One of the things that he is focusing on is how to get immediate feedback as to what the students are understanding, what they're not understanding. So he has developed some good theories of getting the data quickly, analyzing it and then using it to improve the instruction in the classroom. And these deployments are supposed to happen very soon. In terms of the editor, I'm not so sure about what would be a good model there, but that is something that needs development. In terms of the formal [inaudible] technology I think we have nailed it down. In terms of the analytics and machine learning, I think Zoran has probably a very fine story there. He has also worked on designing games that hide these mathematical concepts and engage the audience. 
But I personally think that this is a very important area that one can go after, and we are most prepared in this particular area. But the problem of [inaudible] needs to be solved. >>: [inaudible]. >> Sumit Gulwani: Okay. So, [inaudible], do you want to speak? >>: Yes. >> Sumit Gulwani: Yeah, go ahead please. >>: Oh, sure. I'm a natural language processing person, and I'm more interested in the natural language processing part of your talk which is very exciting to me. For example, about the automatic inputting of LaTeX equations. So is there some language model involved in, you know, knowing which formula is more likely than other formulas? And also, you must have parallel corpora of LaTeX and natural language descriptions of a lot of equations so that you can train it on the correspondence between them. So could you elaborate on that project? >> Sumit Gulwani: Yeah, so great question, [inaudible]. So the question is how do we -- So more background or more information about natural language processing and what kind of training data are we using to develop these translators which can take natural language into logical descriptions. So one of the things that helps us solve this problem, as opposed to the general natural language translation problem, is that if we look at the textbook-style problems, these problems are actually very structured. They do not have much ambiguity unlike arbitrary English text that you would normally speak with each other. So there are two kinds of things that you might want to consider in the education process, right? One thing is how a teacher formally states a given problem and the other is the language that the students would be using to describe their solutions and so on. So the student's language will be a little bit more ambiguous and less structured, but at least the teacher's language will be more structured. 
So you are absolutely right that most of these techniques that we have do involve training on some benchmark data and then using them. Unfortunately we do not have a great corpus. What happens in these domains is like -- I'll give an example of automata theory. We were able to collect fewer than 200 problems. And for each of the English descriptions, we wrote down the expressed logical translation by hand. For geometry, I think we could have opted in more problems, but the experiments that we did were kind of roughly similar. So what is needed is techniques that exploit the fact that we are dealing with a structured English, and we have a very well defined logical language which is a domain-specific language and is a limited restricted language that we are doing the translation to. And both of these facts have to be exploited in order to develop techniques that can learn from a small amount of training data. I think this is one big thing that is different from general natural language processing where normally the state of the art has gone more into trying to develop semi-supervised or unsupervised techniques that can leverage the presence of a huge amount of data. In contrast, we are working in a setting which will not have huge amounts of data to start with, which will have little amount of data. But then, you want to develop more supervised techniques to be able to achieve this translation. This is what we're doing on a short run. But having said that, it's like [inaudible] in a problem: once you start deploying these technologies, you're going to start collecting more data. And when more data starts coming in, you can then try to go more towards unsupervised techniques for doing the translation. >>: Can you... >>: Thank you. >>: ...comment a little bit on what your expectations are for the level of effort that will be required by the teachers, like the middle school teachers, to use Anoop's example, to create the systems, to deploy these in their classrooms? 
>> Sumit Gulwani: So the question is how much effort is required on the part of the teacher to be able to use the technologies that we have been working on for middle school curriculum? So as you might observe, especially the example that I talked about, the one on developing technologies which can see how problems have to be solved which is already done in the textbooks today, right, and from there automatically develop models of the procedure that the teacher wants the student to employ -- We have a technology for that. Once you take that procedure, you can use existing test case [inaudible] tools to develop progressions for the students. I did not talk about automatic grading in this domain, but that's also possible. So we are pretty much talking about a domain where, at least for middle school mathematics, it will mostly be a push button technology for the teacher, not much interactivity required. That's why it's such a live domain. It's already something that can have an impact and is the one that is closest to almost complete automation. >>: I have a question here. Microsoft [inaudible] has a product of -- Microsoft mathematics. It has a lot of similar things you were talking about, for example, you know, the [inaudible] project [inaudible]. So what's the connection between that product and some of the ideas we have here? >> Sumit Gulwani: So the question is that Microsoft has a technology that goes by the name Microsoft Math Problem Generation, and how are the concepts there different from the ones that I talked about for generating problems for mathematics? So the way new problems are generated in Microsoft Math Generator is the teacher has to encode a lot of domain knowledge, and the freedom is with respect to choosing some constants, only constants. So in my case of algebra problem generation, you see we are trying to generate problems which are not only different with having fresh constants but they have a completely different operator set as well. 
So Microsoft Math Generator cannot generate new proof problems which are an act of creativity. Right? On the other hand, we are also investing into automatic generation of simpler problems like problems for addition, problems for multiplication and so on. And this is something that Microsoft Math Generator does. But we are, again, trying to get into a fully automatic mode where the teacher will not have to enter domain knowledge such as, you know, the important concepts, whether there's a single carry, double carry, triple carry, multiple single carries, no carry and so on. All these concepts are automatically discoverable by observing their different paths in the procedure that corresponds to that particular concept. So I think we are talking about much more improvement in the state of the art on top of what Microsoft Math Generator software already does today. >>: Sumit, maybe we should wrap up. >> Sumit Gulwani: Okay. So if there are no more questions then let's conclude today. So thanks everyone for attending. And as I said, students if you want to work in this area please do send me an e-mail. There are a lot of exciting topics to work on, and we need your help in order to make a big impact in this topic which has huge societal implications. Okay, thank you everyone for your attention. Bye.