>>: It's my pleasure to introduce Rebecca Fiebrink from Goldsmiths, University of London. Rebecca has done some really interesting work at the intersection of HCI, machine learning and signal processing, which is what she is going to be talking to us about today.

>> Rebecca Fiebrink: Hi everyone. It's nice to be here. I'm going to talk about machine learning in creative practice. I'm going to touch on a lot of projects that I've done over the last seven years or so. I will take questions at any point, so feel free to interrupt. We are a small group. One of the things that I'm really excited about right now is the fact that we have all of these sensors that are cheap. They are easy for people to use. They're exciting for students and hackers in many cases, and often ubiquitous, like the sensors in your smartphone. My goal in a lot of my work is to make it easier for people to create interesting new real-time interactions with these sensors. By real-time interaction I just mean, really broadly, that you've got data coming in from somewhere. It could be a sensor. It could be a Twitter feed. It could be a game controller or any number of things, and you want to do something with it. You want to control a game, or you want to build a new musical instrument where you are controlling sound synthesis as you move, or you want to give somebody feedback about the way that they're moving and maybe guide them on how to move in a better way. I want to make this easier and faster for both professional software developers as well as end users, students, teachers, musicians and so on. Most of the work that I'm going to talk about falls into one or more of these application areas, and I'll talk in more detail about some of the projects as we go. I mentioned I want to make this easier, but I also want to make it more accessible to people. The key to doing that in my work is to use machine learning: use machine learning to make sense of this real-time sensor data, but also to rethink, in that scope, what machine learning is really good for, why it might allow us to build these kinds of systems, what kinds of systems we can build, and what kinds of design processes we can support. And also to rethink what the user interfaces to machine learning should be to make all of this possible. Lots of systems, like I already mentioned, new musical instruments or systems for biofeedback or for data sonification and visualization, have these three components. First of all you have to get your data from somewhere. You have to have a sensor or an API that gives you data coming in. Then you have to make sense of that data. You have to interpret it, do some kind of decision-making to figure out: if my data looks like this, this is what I want my computer to do. This is what I want my game avatar to do, or this is the sound that I want my instrument to make. And then you've actually got to do that. There's got to be some piece of software or hardware that takes those instructions and does the thing. For a lot of the applications in my work, the data acquisition piece has become really easy, thanks largely to all of these off-the-shelf sensors and to people getting proficient with things like Arduino and Raspberry Pi, plugging sensors into them and making stuff. And this other piece, producing the response, is often really easy too. People who are professional musicians, for instance, are proficient in using digital music software: you can send it MIDI or other control information and it will make a sound.
They know how to use that. Or game designers, they know how to program the game engine. That's part of what they do. But this interpretation or mapping piece can be really difficult, annoying and time-consuming for lots of reasons. These sensors might be giving you noisy data. They might be giving you high-dimensional data. There might be a rather complicated relationship between the data coming in and the thing you actually care about. Unsurprisingly, this is where machine learning can come in and make things easier. I'm going to unpack this a little bit. There are lots of different ways that we might interpret or map data. The two that I have focused on the most are classification and regression. When I talk about classification here, the easy type of application is doing something like gesture recognition. So if I have a webcam turned on and I want to do hand gestures in front of it, I might say here's something that I could do, and I want this to basically be an action classifier. I want it to say that's a closed fist. Once I know that, then it's easy for me to send a message to my music program and have it play a sequence of notes, or send a message to my game engine and have my avatar do something. And then if I do a different type of gesture, I get a different label and I can produce a different response. For most applications you do want this classifier to be really accurate. You don't want it to give you the wrong label for a certain gesture, and people really care about that. But there are a number of other priorities that start coming into play. You might also want the gestures it classifies to be comfortable for someone to perform, easy for people to remember, easy for people to learn. And these come up as part of the design process that I'm going to talk about later. On the other end of the spectrum, we might have somebody saying not that they want to trigger a bunch of different actions, but that they want to control something continuously. Or maybe they want to control a dozen things or a hundred things continuously. In music, which is the domain that I'm coming from primarily, there are all sorts of really compelling applications where you want to be controlling pitch and volume and tone color and location in a space, and these map onto dozens or hundreds of control parameters in your software. Obviously, there are other application areas where you also want to control many different continuous things simultaneously. So I'm going to be talking about this type of problem as a mapping problem. You're literally constructing a function which maps from some n-dimensional input space into some m-dimensional output space. This guy is one of the most well-known early designers of new musical controllers, and he has these sensing systems on his hands that he used to control sound as a performer onstage. Here the challenge is slightly different. You don't just want something that is going to give you an accurate set of labels. You want to create an efficient, effective high-dimensional controller. You want appropriate control for the task at hand. And this may mean that you want an accurate reproduction of the function that you have in mind. It might also mean that you want this controller to be expressive or comfortable or intuitive, whatever that means. And, again, this is specific to the application and to the specific person.
We've got these three different stages. For the sensing, of course, we need some hardware and possibly some software to get data. The interpretation is this classification or mapping. And producing the response we can think of as just taking the outputs from our classifier or mapping function and sending them on to the appropriate piece of software. In most systems out there that do this kind of real-time control or interaction, the interpretation is a piece of software. It's a piece of program code that somebody had to sit down and write. I've written this probably hundreds of times, as many of you have, and there are problems with this, again, especially when your data is noisy or high-dimensional, or you have a complicated relationship between what you are sensing and what you actually want the computer to do. So what I'm doing in my work is getting rid of this, getting rid of as much of this code as we possibly can, and instead getting the person building the system to build it through examples of data. And there are two main types of data that are really easy to get from a person building a new system. One is examples of inputs, examples of the data streams that they expect to see in the future in this real-time system. Another is examples of outputs: here are the sorts of things I want the computer to ultimately do. One approach, obviously, if we are going to use machine learning, is to use supervised learning, where we actually ask people to pair these two things together, and that has been the majority of the work that I've done in this space, though not all of it. So if you're not a machine learning person, here's my supervised-learning-in-a-nutshell slide. We have some algorithm, and the algorithm builds the model, so we don't have a person building the model anymore, but it's still just a function. The algorithm builds this function from a set of training examples, and each training example has a set of example inputs, for instance, hand gestures, and each of these hand gestures is labeled with the output that I want my model to produce for that input. So we train our model, and if everything goes well we can show it new input, new hand gestures, and the model will produce an appropriate output, in this case, sound one. In many applications we even want this to be robust to small changes in the input, so even if my hand doesn't look exactly the same as when I made my training set, it should give a reasonable classification. I will come back to when this gets to be really interesting. Machine learning algorithms are designed, mathematically, to do this really well, and just that principle alone starts to make this a really useful tool for interaction design. One of the first benefits we get is that we can produce models that are much better at generalizing to new inputs, that are robust to these changes in the input data. This means that if we are going to use machine learning to build a gesture classifier, we can often build a gesture classifier that's more accurate in general and more robust to changes than even if we had the best programmer in the world sitting down and trying to write a function from scratch. But there are additional interesting aspects of machine learning that make it a good tool for design.
Second of all, we often really do have these complex relationships between inputs and outputs in the data, where it might not be possible for any programmer to sit down and build us something that works. Machine learning can do that. But beyond that, by using this process we have circumvented the need for somebody somewhere to sit down and write some program code. First of all, this makes it possible for somebody who is not a programmer to go through this process. I have worked with kids as young as seven years old and they can do this process. It's not that hard. It's easier than learning how to code. It also means that people who might be programmers can often do this process much more quickly. And that starts to drive some really substantial changes in the design process. If building something is easy, that actually changes the sorts of things that you build and the approach that you take to building them. I'm going to talk a little bit about the software that I started making back in 2008, 2009 when I was a PhD student at Princeton. This software is called the Wekinator. Many of you here probably know Weka. It's a pretty nice off-the-shelf machine learning toolkit and it's good for lots of different problems. It's also pretty easy for people who aren't machine learning experts to get and use. They might have to read a textbook, but they can do something useful fairly easily. I wanted to make something that was like Weka but for real-time applications. So like Weka, the Wekinator is a software toolkit. It's a standalone piece of software. It runs in a graphical user interface. It doesn't require you to be a programmer. You never need to touch code. It's also fairly general purpose. It gives you a set of algorithms for classification and regression and temporal modeling that work for a lot of different problems. Furthermore, it's compatible with pretty much any type of sensor or game input or computer vision or audio analysis system on the input side, and you can connect it up to code that's written in any programming language on the output side. It works nicely with music synthesis engines and game engines and animation environments and so on. It's very much in the spirit of Weka in those senses; however, it runs in real time, which Weka doesn't, at least in the GUI version. More interestingly, we found out very quickly when we started this project that if we were going to build something useful, we had to address some of the differences between what machine learning means and how you do it well in an environment like Weka, where you've got offline data sets and you want to do a certain type of analysis and model building, versus in these contexts where you want to build a new gestural control system or a new musical instrument. I'm going to make the case throughout this talk that using machine learning to build these interactive systems is not exactly the same as building something with a tool like Weka. The first big difference that you run into is that you don't have training data out there. You need to collect the training data. If I want to make my hand gesture classifier, I can't go out on the internet and download the ground-truth training set for these hand gestures. It probably doesn't exist. But I can make that data set. I can just give it a bunch of examples, and maybe collect examples from other people doing the same set of gestures, but I get to choose the gestures.
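To make that collect-your-own-data, supervised-learning loop concrete, here is a minimal sketch using scikit-learn rather than Wekinator itself. The feature values, the labels and the choice of a nearest-neighbor classifier are all illustrative assumptions, not what Wekinator does internally.

    # Minimal sketch of supervised learning for gesture classification.
    # Feature vectors and labels are made up for illustration; in practice
    # they would come from a sensor (e.g., fingertip positions).
    from sklearn.neighbors import KNeighborsClassifier

    # Each training example: a small feature vector labeled with the output
    # we want the model to produce for that input.
    training_inputs = [
        [0.1, 0.9, 0.2],   # examples of an "open hand" pose
        [0.2, 0.8, 0.1],
        [0.9, 0.1, 0.7],   # examples of a "closed fist" pose
        [0.8, 0.2, 0.8],
    ]
    training_labels = ["open_hand", "open_hand", "fist", "fist"]

    # Train the model: the algorithm builds the input-to-output function for us.
    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(training_inputs, training_labels)

    # Run it on a new input that doesn't exactly match any training example;
    # a reasonable model is robust to these small changes.
    new_input = [0.85, 0.15, 0.75]
    print(model.predict([new_input])[0])   # -> "fist"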
The second difference is that if I'm building something like a hand gesture classifier, I'm often proficient enough in the application area that I am qualified to take that model, make some new hand gestures, and see what it does. I'm not limited to running cross-validation on this data set and using that as a metric of model quality. I can actually just take it and say, what does it do when I do this? What does it do when I do this? I get a much different approach to evaluating models, one that's going to give me different types of information. And then third, I can start taking that information about what my model does well and what it doesn't do well, and I can use it to make informed changes to the training data. So in the simple case maybe I say my hand gesture classifier does really well on these two classes. Let's make the problem more interesting. Let's add a third class. Why not? It's going to be more fun. Or I could say, if it doesn't recognize this gesture when I tilt my hand over to the side, why don't I just give it more training examples with my hand tilted to the side, and I have a reasonable expectation that I might improve the model. I know there are a bunch of people here who do interactive machine learning. Different people mean different things when they talk about interactive machine learning. I'm using the term the way that Fails and Olsen used it in their paper about 10 years ago. When I say interactive machine learning, I mean these types of things, specifically, in this context. Before I give you a demo I want to give you a good idea of what's going to be happening in the software that I show you. Once you have built a model with Wekinator you can run it in real time: you get a stream of feature vectors coming in, and these can be from sensors, from audio, from wherever you want, and it's going to output a stream of output vectors. You can take these and send them to an animation program. You can send them to Processing or Unity or sound synthesis environments as well. It doesn't matter. All of this communication is done through a very nice, simple communication protocol called Open Sound Control. If you're doing real-time stuff and you haven't used Open Sound Control, I recommend it. It's a very nice glue layer. It makes things easy. I mentioned that we might have more than one model at a time. For usability and debugging purposes, from the perspective of the end user, we're making one model per output. Here, if I am building a music system, I want to control volume and pitch and some filter coefficients and so on. I'm going to separate each of these out into a different model that is completely independent from the others, so that means I can tune each one independently if I want. In each of these models I can use one or more of the available features. These models can be regression models, which I might use for something like volume. They could be classification models, which I might use for something like discrete pitches. They can be models that do segmentation. I'm not going to talk too much about that in my talk, but we could talk about that later. Or if you are doing classification, you can get posterior probability distributions rather than just a single most likely label. And you can send this on to whatever type of environment you want to control. If you're interested, these are the algorithms that are in the current version of Wekinator. They are all pretty standard. I didn't invent them.
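To give a sense of how that Open Sound Control glue fits around the model, here is a minimal sketch using the python-osc library. The port numbers and message addresses follow what I understand to be Wekinator's usual defaults, but treat them as assumptions and check your own configuration.

    # Minimal sketch of the OSC "glue" between a feature extractor and Wekinator.
    # Port numbers and OSC addresses are assumed defaults; verify them in your setup.
    import random

    from pythonosc.udp_client import SimpleUDPClient
    from pythonosc.dispatcher import Dispatcher
    from pythonosc.osc_server import BlockingOSCUDPServer

    # Send a feature vector to Wekinator (assumed to listen on port 6448).
    client = SimpleUDPClient("127.0.0.1", 6448)
    features = [random.random() for _ in range(15)]   # stand-in for real sensor features
    client.send_message("/wek/inputs", features)

    # Receive the model's output vector (assumed to arrive on port 12000)
    # and forward it to whatever makes sound, draws graphics, and so on.
    def on_outputs(address, *values):
        print("model outputs:", values)   # e.g., nine synthesis parameters

    dispatcher = Dispatcher()
    dispatcher.map("/wek/outputs", on_outputs)
    server = BlockingOSCUDPServer(("127.0.0.1", 12000), dispatcher)
    server.handle_request()   # handle one incoming message; loop this in a real app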
These algorithms are not specifically designed for interaction, although you could say that dynamic time warping is one that doesn't get used a lot in other contexts. But they are pretty standard. Let's do some demos. I'm going to open up Wekinator here, and first of all I'm going to do one of my oldest demos for you. Dan Morris has seen this before, I think. I'm going to make a classifier. I'm going to give it 100 really, really bad computer vision inputs here. I'm taking a webcam input from my computer, chopping it into a 10 by 10 grid of cells, taking the average brightness value in each of these cells, and sending that as a 100-dimensional feature vector. This is kind of a silly feature to use if you know anything about computer vision, but it's a lot like features that people use in practice all over the place. It's kind of a first pass: I don't know anything about my data, I don't know anything about signal processing, I want to try to use machine learning to build something anyway. I'm going to send it 100 features and I want to control one classifier with four classes. I'm going to start a really simple drum machine, and this drum machine is going to play different sounds when I give it different values. I'm going to train it, give it a few examples of me standing here. I recorded 15 snapshots of me here, and I'm going to record some examples of me not standing there. Now I have 30 examples total. I'm going to train it and run it. And it's actually learned a pretty good "me" classifier. And I can start making it more complicated. I might say, here is my hand. It's still working pretty well. Let's see if I can make it make a mistake. No. All right. It's a little bit fuzzy in here. It's confused with my hand right here, so I'll give it more hand examples in this space, retrain it and run it, and now we see that that's better. That's a really simple classifier with bad features, with examples that I have just given it on the fly. Okay. Let's do a regression demo. For this one I've got a Leap Motion sensor here, and I've got a much better feature set. In this case I'm using the hand skeleton data that I'm getting from the Leap, and I'm just taking the x, y and z positions of each fingertip, so 15 features total. To do this, again, I'll say listen for these 15 inputs, and let's control a physical modeling synthesis algorithm. The algorithm that I'm going to show you is a pretty high-dimensional one. It takes in something like 11 different control inputs and makes very different sounds depending on which inputs you give it. I'm just going to cycle through some of the sounds for you to give you a sampling of this 11-dimensional sound space, not with the drum in the background. That's just some pseudorandom location in the sound space. If I'm a sound designer or I'm a composer, my job is to find not just locations in this space that sound good or useful for an application, but trajectories through this space. I want the sound to change over time in a way that makes sense, which is musically expressive or fits the scene that I'm trying to design the sound for, and so on. And I want to control this with the Leap. In this case I'm just using the nine most interesting parameters, so I'm constructing a 15-dimensional to nine-dimensional function. This is a pretty complicated mapping function, and I'm going to use Wekinator's default, which is just off-the-shelf neural networks, to do this.
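For a rough idea of what such a mapping looks like when it is learned rather than hand-coded, here is a minimal sketch of a 15-input, nine-output regression with an off-the-shelf neural network. It uses scikit-learn rather than Wekinator's own implementation, and the hand poses and synthesis parameters are made up for illustration.

    # Minimal sketch of a 15-dimensional to nine-dimensional mapping learned by
    # a small neural network, in the spirit of the Leap-to-synthesis demo.
    # Not Wekinator's implementation; training data is invented for illustration.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)

    # A handful of training examples: each pairs a hand pose (15 fingertip
    # coordinates) with the nine synthesis parameters wanted for that pose.
    hand_poses = rng.uniform(-1.0, 1.0, size=(6, 15))    # 6 demonstrated poses
    synth_params = rng.uniform(0.0, 1.0, size=(6, 9))    # 9 control values each

    model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
    model.fit(hand_poses, synth_params)

    # At run time, every new frame of fingertip data is mapped to a full set of
    # synthesis parameters, including poses that never appeared in the training set.
    new_pose = rng.uniform(-1.0, 1.0, size=(1, 15))
    print(model.predict(new_pose)[0])    # nine continuous control values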
What I can do is start out with a sound that I like, more or less, as a starting point. So I can start with that sound. I can say I want that sound to be here, and then I might change the sound a little bit and say maybe I want a higher sound up here. I'll give it some examples of that and train it. Am I getting data? Okay. That did not work for some reason. Let's try this again. Here it is. Somebody's unhappy here. We are not getting data. Let me restart this. I'm not sure what's going on. If that doesn't work we will do a different demo. There we go. That should work. Okay. So now I've taught it something about the fact that the height of my hand corresponds to pitch, and we get a nice little slider. Not the most interesting thing in the world, but it's kind of a nice Leap slide whistle. I can start making it a lot more interesting if I give it different sounds. So with this, you see, suddenly I really exploded the space of sounds that I can access. I still have some predictability in it. I know that I can make one sound over here and another sound over here, but I can also explore the space and start finding things that don't sound anything like what was in my training data. I can iterate on this process and say I like what it's doing over here, but I don't like what it's doing over there, so let me put a different sound into the space over there. I can iteratively make this more complicated, giving it different sounds, giving it sounds that are more tailored to my aesthetic preferences, and so on. So that, in a nutshell, is what a lot of composers do when they are using this system to make a new musical instrument. Does anyone have any questions about the demo before I move on? Okay. As I mentioned, I built the first version of Wekinator back in 2008, 2009. I've been using it with a lot of different people in different contexts since then, and also building new types of interfaces which are not Wekinator specifically, but similar interactive machine learning interfaces for different applications. Some of the first people I worked with in this space are really gifted computer music composers, and I have a demo of one piece which is much better music than what I just showed you. This is an example of somebody who is a professional composer who worked with this over a period of several months and built a piece which you are going to see here. The sensors here are these Gametrak Real World Golf controllers, which are supposed to be used to measure your golf swing, but you can pick them up and use them to measure the 3D position of your hands in space. What she really wanted with this piece was to have people doing something that was like the yoga Sun Salutation, and she had a particular sound space that she wanted people to move through, and move through smoothly as they were moving, and also to have slight differences between different performers. She had quite a clear conceptual idea of what the piece was going to do in the sound space, and she used Wekinator to turn this into something that felt like the right instrument for her for the piece. I'll give you some video. [video begins]. That's one video. The next clip I'll show you is another composer who was an early user of Wekinator, who was walking down the street one day and found a piece of tree bark and, as you do, said I want to turn that piece of tree bark into an instrument.
And she put a bunch of light sensors in it and connected it to Wekinator, and then to the same music synthesis software that I showed you earlier. This is her talking about the piece; you see a little bit of the instrument and hear it in the background.

>>: Basically we are taking the data, and it's comparing it to examples that I've given it in the past of relationships between certain data and a certain gesture and a certain sound. So if I train the machine learning software that when I wrap my arms around the instrument and the sensors register less light, it makes a particular kind of sound, and then in full light it makes a different kind of sound. So I give it all these examples in the training process and then I run it and see what happens. It takes the data that is coming in and says, that looks just like the data, or that is similar to the data, that she has when she wants this sound, so it sends that message to my sound processor and the computer outputs the sound. It's just another way, basically, of mapping gesture to sound.

>> Rebecca Fiebrink: I like that video because you can see the instrument. You can see the way of playing that she's developed for this instrument, but you also hear from her how she is thinking about machine learning, her understanding of machine learning as somebody who is a composer. She is not a machine learning person in any sense. I've done quite a bit of work, and still do, with professional composers, but I'm also working in a lot of other application contexts. I've used Wekinator quite a bit in teaching, teaching kids as young as seven as well as up through PhD level, both teaching them about sensors and how you use sensors, but also teaching them about interaction design. It's a great way to get people started playing with new ways of interacting with computers without first having to get them proficient in programming. And they can learn a lot by saying, what happens if I connect this to this thing, and what might I build? I've had some projects recently building musical instruments for and with people with different types of disabilities. Some of them look kind of like the instruments that you just saw, where it's really sort of experimental, weird sounds. Some of them look and sound much more conventional. I've done some work on building recognizers for existing vocabularies. So instead of just saying I want this thing to do something interesting, people come in and say I have a pretty clear idea of what it is that I want the system to learn. For instance, this is a cellist who had a sensor bow that she used with her cello, and she wanted to teach the computer to recognize when she was playing legato and staccato articulations. It's not a trivial learning problem, but if you can get the computer to recognize that, then you can build better computer accompaniment systems, for instance. I've done a little bit of work on gesture recognition for rehabilitation, and even research on human motor learning. Right now one of my main projects is working with developers at different startups, and working with hackers and makers at things like hack days, building better prototyping tools for them. For the rest of my talk I'm going to give you a high-level tour of what I think are some of the most interesting findings of this research, but I'm happy to answer questions about any of these specific projects later if you have them. New perspectives on what machine learning is good for as a design tool: how does it work well? How does it not work well?
What's hard about it? The high-level finding here is, maybe unsurprisingly: yeah, this kind of works. It works well in enough contexts that I am still doing this work six years later. With the composers that I worked with, like the ones that you saw in the videos, it became obvious right away, when I started doing participatory design processes with them to build the first version of Wekinator, that this was going to be useful. The first thing it does is make the time to build a new instrument much, much shorter, even for people who are expert programmers. And then secondly, people started talking about how the type of instrument that they were building was very different from the type of instrument that they were building when they wrote program code. I'll come back to this and talk about why I think that is. Also, we were able, in very early work in this area, to see that somebody who doesn't know anything about machine learning, but has some sensors with good feature extractors and knows how to perform a gesture set accurately, can build state-of-the-art quality classifiers. That's why this cellist was able to build a set of articulation classifiers that matched or beat the state of the art in published research on this topic. And she could do that because this process, I think, is pretty easy to understand and engage with even if you're not a machine learning person. The next thing that I want to highlight here is that when I observed people using Wekinator and logged the things they were doing with the software, it became clear that it's very rare that somebody says, I am going to plug in my sensor, I'm going to give it some data, I'm going to train a model, and then I'm done and I walk away with it. There's a lot of iteration, a lot of people saying, I'm going to try it out. I like this. I don't like that. Let me change it, build a new model, try it out, and so on. And usually this is happening dozens of times in the simple cases. It might happen hundreds or even more times for people building professional-quality, robust systems. So people are continually iterating, building new models, trying them out, modifying them. In contrast to what people usually do with a tool like Weka, when people are using Wekinator to build new interactions, what they're changing is usually not the learning algorithm or the algorithm parameters or the features. It's the training data. It's saying, I don't like this; I'm going to give it more examples of what I really do want it to do for this type of input. And I think it's constructive to think about the training data as actually a type of user interface. Instead of writing code, people are giving examples of what kinds of inputs they want to give to the model and what kinds of outputs they want the model to produce. This is the primary way they communicate their goals for what the system should ultimately do. This is also often the way that people fix model mistakes, by saying it didn't do what I wanted here, so I'll give it more examples here. Again, data, real-time streams of data, is the primary way that people evaluate whether they like a model or not. If you think about it, as I was moving my hand around with this trained model here, I was learning quite a lot: what sounds does it make where? Do I like it? Do I not like it? What else might I want it to do instead? This is true for both these continuous mappings and for classifiers. Yeah?

>>: You said that you could fix models. You could add more data.
Do you ever allow them to [indiscernible]

>> Rebecca Fiebrink: Absolutely. One of the most obviously useful things that people requested and that I added was the equivalent of an undo button, to say, I just added a bunch of examples, it screwed everything up, let me remove those. In artistic contexts people have been really interested in saying, can I have it gradually remove the old data, so I can actually impose a sort of concept drift on my model. That might be interesting in non-artistic contexts too, but it's been useful for some people. So you can remove examples, certainly. Yeah?

>>: So with the Leap not so much, but with the camera, I wonder about something that would be non-obvious to users: how much the context matters, like something that totally works at their desk, and then at the performance venue, oh my God, everything changes. Do you see that, and how do you help people account for these types of things?

>> Rebecca Fiebrink: Up to this point I haven't tried to build any tools that explicitly help people with that process. Certainly it helps to have something that is really easy to just turn on and try out: at the very least you want somebody, at their soundcheck in a new space, to be able to say this lighting is destroying everything, and hopefully it's easy enough that they can recover by adding more examples in that space. So in practice, that's what people have done. But yeah, there's a lot more you could do there, especially since with certain types of sensors you have sensor drift or sensitivity to environmental conditions. There are a number of cool ways that you might address that. Good. Two points I want to make before I move on. First of all, thinking about the training data as an interface for doing these things makes sense. You're often going to be able to do these things more efficiently by changing the data than by changing the learning algorithm or changing your SVM kernel and so on. So there is a pretty direct interface that people understand. I have asterisks here next to goals because I do want to make the point that it's not that somebody comes to the table, usually, and says I want to build a classifier that does exactly this. Or maybe they do, but often those goals change slightly over time. I'll come back to this. But if you are able to easily change the data, then that's okay. You don't have to do anything complicated algorithmically. You just allow people to have a very lightweight, low-overhead way to say, my idea for what I'm building has changed now, and that's okay. Which brings me to my next point: when people have used Wekinator for really serious projects, I ask, why in the world would you use this? It's a research piece of software. It's a little bit weird looking. And I think this is key to a lot of the success that people have had: it allows them to very easily instantiate a new working system, even if it's not perfect. It allows them to prototype ideas quickly. The time from nothing to having something that does something is very short. It could be 10 seconds, as you saw in my demo. Whereas, doing that with programming, you might be talking about minutes, hours, days or weeks. It allows them to say, I think I have an idea. I'm not sure if this is a good idea. Let me try it out. And when you allow people to do that with lots of different ideas, to say, I'm not sure what gesture set I want to use, for instance, you are not stuck with the first one you try.
You haven't sunk two weeks into it by the time you find out whether you're on the right track. You can explore lots of different ideas in parallel. And for some applications it's also important to be able to discover behaviors that you didn't necessarily plan for. On these first two points, people have written about this. Ben Shneiderman has written about this; he talks about creativity support tools. Bill Buxton has written about this. And we can talk about the importance of these activities in the context of wicked problems. If you haven't come across the idea of wicked problems, it's something that has been useful in shaping the way I look at this work. For problems in engineering and design and music, all sorts of things that people might want to do with sensors, you don't necessarily know exactly what the specifications are until you actually have built the thing. You don't necessarily know what exactly is a really good gesture classifier for controlling this videogame until you build it and you try it out. And probably you build it and you try it out and you say, that almost works, but there is this thing that I didn't consider which is screwing me up and I need to fix that. So your understanding of the problem goals and the problem constraints changes over time, and it's by instantiating different designs that you actually learn and are able to get to your final design not just more quickly but, as Bill Buxton says, you don't just get the design right, you don't just implement your specifications; you get the right design. You are making sure that you are building the right thing to begin with. And when you are able to build something really quickly and try it out, that makes this process easier. So when people say that this allows them to build a better interface than programming would, I think this is a lot of what is behind that. As I mentioned, sometimes, especially people building new creative systems, new musical instruments, they want to do more than this. They also want to not be constrained by their own imaginations. If I'm building a new Leap Motion sound exploration interface because I'm a sound designer and I want to find a good sound for a particular scene in a game, or a sound effect, I might not already have the best sound in my imagination. I want to be able to really efficiently explore lots of sounds, and here is something that might surprise you: this is very hard to do if you start by writing code. If you are going to write a 15-dimensional to nine-dimensional mapping function, the easiest thing to do is to make a linear function with some fairly simple translations and transformations, and then you are kind of stuck with it. Whereas, using this example-driven paradigm, you can put very different sounds into your training set and get very different outputs. So this is also something that people have talked about as being important to them in their choice to use this versus programming. I've been talking about a specific set of ways in which people and machines are really co-adapting in this process. This is, I think, a really important point that took me a long time to realize. If you think about machine learning from the conventional perspective, you think about it as, I'm going to try to build the very best model for this data set. You assume that your goals are embedded in that data set to an extent, and you want to build the best thing. That's not often how the real world works.
As I mentioned here, we are not typically starting with ground-truth data that has already been collected. Even if we are, we often are able to go get more data to either test the model or improve the model. There are lots of different concepts that you might teach an algorithm that are potentially useful. Earlier today we were sitting down and talking about building a shake detector for the micro:bit, for instance. There you could say this is pretty simple, it's a pretty clear-cut problem: either you are shaking it or you are not. Well, yes and no. You could imagine, are you going to enforce that everybody has to shake it with, you know, the LEDs facing up, and they have to shake it back and forth, left and right? Or are we also going to allow people to shake it up and down? Are we going to allow them to hold it any way they want and shake it? Those are all different variations of the same is-someone-shaking-it problem. And they are all going to have different implications for how hard it is to build a shake classifier. And there are going to be different implications for how easy that shake classifier is going to be to use. So we can think about this design space as presenting lots of different potential trade-offs between the usefulness of the end model and the feasibility of making it. What you see, unsurprisingly, is that people navigate this space. They have a limited amount of time to build something. They have a limited amount of effort that they are going to put into building it, and at some point they are going to make a judgment call and say, this is good enough; let me move on with my life. Obviously, when we are building tools we want to make it as easy as possible for people to build things that are as complex as possible. But at the same time I think it's helpful to think about this larger context. For instance, I had a paper at AVI a couple of years ago where we tried to build a better tool to help people understand these trade-offs. This is a tool for recognizing beatboxing, but also other types of vocalizations and sounds. If you want to train a three-class classifier, we actually show you some information about the examples that you have recorded and how they might overlap in the feature space. So this is one choice of three classes. This is another, slightly different choice of three classes. Now, we don't show people exactly this, because we are not working in a two-dimensional feature space, but we show people something like this. If you are a user, knowing this can help you understand the trade-offs and say, I could either just work with this one, because these classes are more easily separated, or I could work with this one, but I have to redefine class B by changing the way that I perform it. Or maybe I have to be more careful in the type of training data that I give it and give it better training examples with less noise. Or I have to come up with a better feature representation. So there's not one answer that's the best. It's going to depend on the person and the context.

>>: [indiscernible] useful during the exploratory process with Wekinator? Because you can see someone getting into [indiscernible] example that takes the model somewhere they didn't expect, and they have no way to inquire about that. Have you folded back this kind of feedback to help people with that process?

>> Rebecca Fiebrink: Not yet. That's something that I would really like to do.
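Returning briefly to the shake-detector example above, here is a minimal sketch of one point in that design space: a classifier over summary statistics of a short accelerometer window, assuming scikit-learn and made-up training data. The window length, the feature choices and the two-class framing are all illustrative assumptions; each alternative definition of shaking would call for different training data and would land differently on the usefulness-versus-feasibility trade-off.

    # Minimal sketch of one point in the shake-detector design space: classify a
    # short accelerometer window as "shaking" (1) or "not shaking" (0) from simple
    # summary features. Window length, features and data are illustrative only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def window_features(xyz_window):
        """Summarize a (n_samples, 3) accelerometer window: per-axis standard
        deviation plus overall magnitude range. Shaking shows up as high variance."""
        xyz_window = np.asarray(xyz_window)
        magnitude = np.linalg.norm(xyz_window, axis=1)
        return np.concatenate([xyz_window.std(axis=0),
                               [magnitude.max() - magnitude.min()]])

    rng = np.random.default_rng(1)
    still = [rng.normal(0.0, 0.02, size=(50, 3)) for _ in range(20)]    # resting
    shaking = [rng.normal(0.0, 0.8, size=(50, 3)) for _ in range(20)]   # vigorous

    X = np.array([window_features(w) for w in still + shaking])
    y = [0] * len(still) + [1] * len(shaking)

    model = LogisticRegression().fit(X, y)

    # A new window of sensor data: 1 means "shake detected".
    new_window = rng.normal(0.0, 0.6, size=(50, 3))
    print(model.predict([window_features(new_window)])[0])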
A couple of more points that I want to make before wrapping up. Another underappreciated benefit of using machine learning to make interactive systems is that it allows people to communicate very directly: this is an embodied action that I want to take; here's my embodied understanding of how what I'm doing relates to what the computer is doing. If you are building a tree-bark musical instrument, for instance, it's going to be really hard for you to operationalize the relationship between the sensors and the sounds as a mathematical function. It's really easy for you to say, here's what I want to be doing when I want the sound quiet, and this is something that's louder. You can demonstrate that. And I think there are all sorts of other application domains in which people have tacit or embodied knowledge that they can provide much more easily than by writing program code. So this is another factor that I think has made people want to use this approach. Interactive machine learning is different from conventional machine learning applications in a few ways that I think might be interesting for the machine learning folks in the room. First, the most obvious thing that comes up is that the examples people provide when they are building a classifier in this way are not IID. This is actually a good thing. It means that we can learn really efficiently from small training sets. Imagine this is a conventional machine learning application: these are two classes and we want to fit a decision boundary to them. You've all seen diagrams like this before. If someone has in their mind this decision boundary, what they often start doing is giving canonical examples of each class, and then they train the model and see where that boundary ends up. And when they start testing it, they test the canonical ones as well as things that might be closer to the boundary. And they're going to notice right away that there are a few examples that appear on the wrong side of that boundary, and they're going to feed those back into the training set and immediately get a much better classifier. But they didn't have to go through the process of giving all these other examples that are actually not that informative to the ultimate model. This makes things a little bit hairy, though, because when you don't have IID data, then things like cross-validation accuracy start to become problematic. In fact, in the cellist study that I mentioned, we looked at the relationship between the cross-validation accuracy of the models that she was making and her own satisfaction with the models. In an ideal world you would want those to be positively correlated. We found that they were negatively correlated. And we can talk about why that is, but it kind of makes sense. The last point I'm going to make may be controversial, but I'm going to claim that gesture recognition, gesture classification, is often the first thing that comes to mind for people who want to build a new system with sensors. It's, I want to wave my hand and turn my TV on, or I want to do this and my drone is going to turn right. And that is cool. But a lot of times this raises problems. This is a finite gesture set. It makes you behave in a sort of rigid, prescriptive way. There's not a lot of room for error. You've got to memorize the gestures. You feel like you're making mistakes when things go wrong. And what I always ask people is, is there a good reason why you are not doing this with a button? Because buttons are really good for certain things. If there's a good reason, then fine.
Go build yourself a gesture classifier. But in a lot of other cases, building something that might be more like a cello, where you have continuous multidimensional control, where you can explore, where you can form an understanding of what the interface allows you to do and learn how to play it in a way that might be idiosyncratic to you, is often much more satisfying. In our CHI paper from last year, we looked at this a little bit. We compared end-user training of classifiers for people with disabilities with pre-training really high-dimensional continuous control spaces that feel kind of like this Leap thing here. We would build an interface that makes a sound no matter what you do with it, and as you move a little bit the sound changes, and that's it. We gave it to people with very different types of physical constraints and actually observed that people ended up coming up with discrete gesture sets on their own. Everybody had an idiosyncratic way of playing it, and they would come up with these sorts of physical riffs that would result in sonic riffs. So in the end everybody had a bespoke computer music instrument, but everybody was able to do something that was very comfortable, and because they were exploring this space they were able to build up a gesture set for themselves that didn't require them to sit back and memorize it. So that was an interesting outcome of that. At the end of the day, anybody can use this. I mentioned I used it with seven-year-olds. It helps experts as well, but I think we're on our way to making this much more effective. We don't have too much time. I'm going to leave it there and open it up for questions. So many people. I'm not going to take your question because we are going to talk, but I will come back to you.

>>: Do you know of anything that looks sort of like this that has been commercially deployed? Like maybe somebody like Leap, for example, a super [indiscernible] put it in the hands of developers who have never seen it before, and how did that go?

>> Rebecca Fiebrink: Yeah. That is the third slide from here, something I was going to mention. Not a lot of people have been commercially deploying this, but I'm working with four startups right now around Europe who are trying to put this into products. We are actually studying their process of doing this and trying to figure out how best to support them.

>>: In the model I was talking about, they weren't really end users. The end users are software developers.

>> Rebecca Fiebrink: Yes and no. We are actually looking at both in this context where, for instance, oh, maybe I have them on here. Sorry, five startups; the one that I left off is making an app for sound designers where the end user will be doing the customizing. So ask me in a year. Yeah?

>>: Especially from my background [indiscernible] it seems like feature extraction is a big part of this. How have you tackled that in the past? You take a seven-year-old and they want to do an assessment of something sonic. How does this solve the problem of extracting the features?

>> Rebecca Fiebrink: My first pass at that, and this also came up when we were working with PLUX [inaudible], who make a sort of Arduino-like platform for biosignal acquisition, the first thing we did was just say, let's wrap everything up in a GUI and give people visualization and give people a sort of drop-down ability to add filtered features and look at peak detection and that kind of thing.
It's better than nothing, but it's not something that we have had a chance to really rigorously explore. And I think, as we have been talking about over the last few days, there's so much stuff we can do to make that easier. Yeah, I would love to do more of that.

>>: On that same thread, if you think about it, it's like a guitarist where you kind of get to know like [indiscernible] outputs. Can you imagine people learning in the space of, like, I need something speechy and this package gives me very [indiscernible] features or something, and seeing that become a part of the vocabulary of libraries of features that they need to use to do certain kinds of things?

>> Rebecca Fiebrink: Yeah. Yeah?

>>: You mentioned how you train these models and, rather than using a metric, sometimes it's just how it feels to you, and how important it is that it feels right to you. How much of that transfers across users? Is this my instrument, I mean in particular for me, or is that [indiscernible]

>> Rebecca Fiebrink: Yeah. I think that's a great point, and certainly once you have something that is meant to translate across users, for certain applications it's okay to have the developer say, here's my gesture set. And to some extent, if I wanted to be recognizing these hand signals, I'm going to train it the best I can and assume that other people are going to adapt to make those gestures the same way, and they are going to learn how to control the thing accurately. Obviously, that breaks down at some point where you want to give people a better ability to test it out on data from people who aren't themselves, and to notice that my sensor really doesn't work well on people with different hand sizes. Again, that is something that we haven't explicitly started working on, but I think there's a lot that you could do there, either to give people better understandings of how deployment is likely to work, or to use something like transfer learning to allow end users to further adapt something that has been pre-trained. Yeah?

>>: Have you played with gestures that are more temporal in nature?

>> Rebecca Fiebrink: Yeah, so I skipped that part of the talk, but one of the things that I've been doing over the last year or so is looking at different, basically, path recognition algorithms. The easiest way to do that, and the way that is built into this version of Wekinator, is dynamic time warping. And I've got some specially configured dynamic time warping methods that work really well for a lot of different sensors. There is a postdoc who just finished with us at Goldsmiths who was doing some other techniques based on a simplified Markov model, where you don't need a lot of training data to set the transition probabilities, and that furthermore gives you the ability to have an idea of where in the sequence you are at any given time. I think that's super useful as well. Ofer, do you want to ask your question?

>> Ofer Dekel: Yeah. When you were describing one of your demos you said you were just using the default regression algorithm, which was a neural network, and it seemed like you were kind of brushing that off as an obvious default. If the goal of the people is to explore the space, say I use two different algorithms, the nearest neighbor algorithm versus the neural network. Maybe they will both do a good job learning my gestures, but they will interpolate differently, and they will extrapolate to faraway points much, much differently.
If you are regularizing your neural network parameters, if you regularize very aggressively, maybe you get a very simple interpolation. If you let the thing go wild and start from some random point, you could move through many, many different states from one to another. But that would imply that you need to expose something about the algorithm or the regularization parameter or some of the machine learning [indiscernible] to the artist.

>> Rebecca Fiebrink: Yes and no. Yes and no. I think one of the first things that I found when building the first version of Wekinator is that people get really kind of turned off by having to explicitly make decisions about what algorithm to use or what parameterization to use. So one of the things I spent a lot of time on was figuring out what's a good default algorithm for classification or regression, and what's a good default network architecture for the kinds of sensors and applications that people are using. And then you don't ever see the words neural network on the screen when you load up the program and train one. So you can happily coast along without ever doing that. I think that's not optimal, and certainly people are missing out on opportunities to get better performance if they're never changing the algorithm. So, one of the things I forgot to mention: I'm teaching a MOOC starting in a few weeks about machine learning for artists and musicians. One of the things I'm exploring in that MOOC is how to get people to have a good intuitive understanding of how their choices of algorithms and parameters are going to affect the models. So without having to know calculus or take a machine learning course, you can still, with some human training, make better decisions about these things. That's one side of it. At the same time, I think there is a lot that could be done without having to train people, by allowing people to instantiate multiple alternatives. Just say, here is my classification training data set, hit train, and now I want to get three or five or 10 models out, and I don't necessarily need to know which one is which, but I can try the first one, and if I don't like it I can try the second one. That's just another option that is there in addition to changing the training data. Especially at some point, when people are really happy with their training data set, you kind of converge to something that needs to feel a little bit more like Weka: you are happy with the data; now it's time to explore the space of algorithm configurations. Any other questions?

>>: Let's thank the speaker.

>> Rebecca Fiebrink: Thanks a lot, guys. [applause].