>>: It's my pleasure to introduce Rebecca Fiebrink from Goldsmiths University of London.
Rebecca has done some really interesting work at the intersection of HCI, machine learning and
signal processing, which is what she is going to be talking to us about today.
>> Rebecca Fiebrink: Hi everyone. It's nice to be here. I'm going to talk about machine
learning in creative practice. I'm going to touch on a lot of projects that I've done over the last
seven years or so. I will take questions at any point, so feel free to interrupt. We are a small
group. One of the things that I'm really excited about right now is the fact that we have all of
these sensors that are cheap. They are easy for people to use. They're exciting for students
and hackers in many cases and often ubiquitous, like the sensors that are in your smart phone.
My goal, and a lot of my work is to make it easier for people to create interesting new real-time
interactions with these sensors. By real-time interaction I just mean really broadly, you've got
data coming in from somewhere. It could be a sensor. It could be a Twitter feed. It could be a
game controller or any number of things and you want to do something with it. You want to
actually control a game or you want to build a new musical instrument where you are
controlling sound synthesis as you move, or you want to give somebody feedback about the
way that they're moving and maybe guide them on how to move in a better way. I want to
make this easier and faster for both professional software developers as well as end-users,
students, teachers, musicians and so on. And so most of my work that I'm going to talk about
falls into one or more of these application areas and I'll talk more in detail about some of the
projects as we go. I mentioned I want to make this easier but I also want to make it more
accessible to people. The key to doing that in my work is to use machine learning. Use
machine learning to make sense of this real time sensor data, but also to rethink in that scope
what machine learning is really good for, why it might allow us to build these kinds of systems,
what kinds of systems we can build, what kinds of design processes we can support. And also
rethinking what the user interfaces to machine learning should be to make all of this possible.
Lots of systems, like I already mentioned, new musical instruments or systems for biofeedback
or for data sonification and visualization, they kind of have these three components. First of all
you have to get your data from somewhere. You have to get a sensor or an API that gives you
data coming in. Then you have to make sense of that data. You have to interpret it, do some
kind of decision-making to figure out if my data looks like this, this is what I want my computer
to do. This is what I want my game avatar to do or this is the sound that I want my instrument
to make. And then you've actually got to do that. There's got to be some piece of software or
hardware that takes those instructions and does the thing. In a lot of applications for my work,
the data acquisition piece has become really easy, in large part due to all of these off-the-shelf
sensors and to people getting proficient with things like Arduino and Raspberry Pi and plugging
sensors into them and making stuff. And this other piece is often really easy. People who are
professional musicians, for instance, they're proficient in using digital music software where
you can send it MIDI information, or rather control information, that will make a sound. They
know how to use that. Or game designers, they know how to program the game engine.
That's part of what they do. But this interpretation or this mapping stuff can be really difficult,
annoying, time-consuming for lots of reasons. These sensors might be giving you noisy data.
They might be giving you high dimensional data. They might have a rather complicated
relationship between the data coming in and the thing you actually care about. Unsurprisingly,
this is where machine learning can come in and make things easier. I'm going to unpack this a
little bit. There are lots of different ways that we might interpret or map data. The two that I
focused on the most are classification and regression. When you talk about classification here,
the easy type of application is doing something like gesture recognition. So if I have a webcam
turned on and I want to do hand gestures in front of it I might say here's something that I could
do and I want this to basically be an action classifier. I want it to say that's a closed fist. Once I
know that then it's easy for me to say send a message to my music program and have it play a
sequence of notes or send a message to my game engine and have my avatar do something.
And then if I do a different type of gesture, I get a different label and I can produce some
different responses. For most applications you do want this classifier to be really accurate. You
don't want it to give you the wrong label for a certain gesture and people really care about that.
There are a number of other priorities that start coming into play. You might also want it to
give you classifications for gestures that are comfortable for someone to use, that are easy for
people to remember or easy for people to learn. And these come up as part of the design
process that I'm going to talk about later. On the other end of the spectrum we might have
somebody not saying I want to trigger a bunch of different actions, but I want to control
something continuously. Or maybe I want to control a dozen things or a hundred things
continuously. In music, which is the domain that I'm coming from primarily, there are all sorts
of really compelling applications where you want to be controlling pitch and volume and tone
color and location in a space and these map onto dozens or hundreds of control parameters in
your software. Obviously, there are other application areas where you also want to control
many different continuous things simultaneously. So I'm going to be talking about this type of
problem as a mapping problem. You're literally constructing a function which maps from some
n-dimensional input space into some m-dimensional output space. This guy is one of the
most well-known early designers of new musical controllers and he has these sensing
systems on his hands that he used to control sound as a performer onstage. Here the challenge
is slightly different. You don't just want something that is going to give you an accurate set of
labels. You want to create an efficient, effective high dimensional controller. You want
appropriate control for the task at hand. And this may mean that you want some accurate
reproduction of the function that you have in mind. It might also mean that you want this
controller to be expressive or comfortable or intuitive, whatever that means. And, again, this is
specific to the application and to the specific person. We've got these three different
stages and the sensing, of course, in order to do that sensing we need some hardware and
possibly some software to get data. The interpretation is this classification or mapping. And to
produce the response, we can think of this as just taking those outputs from our classifier or
mapping function and sending them on to the appropriate piece of software. And usually, in
most systems out there that do this real-time control or interaction, this interpretation is a piece
of software. It's a piece of program code that somebody had to sit down and write. I've
written this probably hundreds of times, as many of you have, and there are problems with this,
again, especially when your data is noisy or high dimensional
or you have a complicated relationship between what you are sensing and what you actually
want the computer to do. So what I'm doing in my work is getting rid of us, getting rid of as
much of this as we possibly can, and instead, getting the person building the system to build a
system through examples of data. And there are two main types of data that are really easy to
get from a person building a new system. One is examples of inputs, examples of data streams
that they expect to see in the future in this real-time system. Another is examples of outputs.
Here are the sorts of things I want the computer to ultimately do. One approach, obviously, if
we are going to use machine learning, one approach is to use supervised learning, where we
actually ask people to pair these two things together, and that has been the majority of the
work that I've done in this space, though not all of it. So if you're not a machine learning person
here's my supervised learning in a nutshell slide. We have some algorithm and the algorithm
builds the model, so we don't have a person building the model anymore, but it's still just a
function. The algorithm builds this function from a set of training examples and each training
example has a set of example inputs, for instance, hand gestures and each of these hand
gestures is labeled with the output that I want my model to make for that input. So we train
our model and if everything goes well we can show it new input, new hand gestures and the
model will produce an appropriate output, in this case, sound one. In many applications we
even want this to be robust to small changes in the input, so even if my hand doesn't look
exactly the same as when I made my training set, it should give a reasonable classification. I
will come back to when this gets to be really interesting. It turns out machine learning
algorithms are designed mathematically to do this really well, and just that principle alone
starts to make this a really useful tool for interaction design.
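To make that concrete, here is a minimal sketch of the supervised learning loop she describes, using scikit-learn as a stand-in for Wekinator's internals; the feature values, gesture poses and sound labels are invented for illustration.

```python
# Minimal supervised-learning sketch (scikit-learn stand-in, not Wekinator itself).
# Each training example pairs an input feature vector with the desired output label.
from sklearn.neighbors import KNeighborsClassifier

training_inputs = [
    [0.9, 0.1, 0.2],   # hypothetical features for a "closed fist" pose
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.7],   # hypothetical features for an "open hand" pose
    [0.2, 0.8, 0.8],
]
training_labels = ["sound_1", "sound_1", "sound_2", "sound_2"]

model = KNeighborsClassifier(n_neighbors=1).fit(training_inputs, training_labels)

# A new, slightly different input should still get a reasonable label.
print(model.predict([[0.85, 0.15, 0.25]]))   # -> ['sound_1']
```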
One of the first benefits we get is that we can produce models that are much better at
generalizing to new inputs, being robust to these changes in the input data. This means that if
we are going to use machine learning to build a gesture classifier we can often build a gesture
classifier that's more accurate in general. It's more robust to changes compared to even if we
had the best programmer in the world sitting down and trying to write a function from scratch.
But there are additional interesting aspects of machine learning that make it a good tool for
design. Second of all, we often really do have these complex relationships between inputs and
outputs in the data where it might not be possible for any programmers to sit down and build
us something that works. Machine learning can do that. But beyond that, by using this process
we have circumvented the need for somebody somewhere to sit down and write some
program code. First of all, this makes it possible for somebody who is not a programmer to go
through this process. I have worked with kids as young as seven years old and they can do this
process. It's not that hard. It's easier than learning how to code. It also means that people
who might be programmers can often do this process much more quickly. And that starts to
drive some really substantial changes in the design process. If building something is easy that
actually changes the sorts of things that you build and the approach that you take to building
them. I'm going to talk a little bit about the software that I started making back in 2008, 2009
when I was a PhD student at Princeton. And this software is called the Wekinator. Many of you
here probably know Weka. It's a pretty nice off-the-shelf machine learning toolkit and it's good
for lots of different problems. It's also pretty easy for people who aren't machine learning
experts to get and use. They might have to read a textbook, but they can do something useful
fairly easily. I wanted to make something that was like Weka but for real-time applications. So
like Weka, the Wekinator is a software toolkit. It's a standalone piece of software. It runs in a
user interface, graphical user interface. It doesn't require you to be a programmer. You never
need to touch code. It's also fairly general purpose. It gives you a set of algorithms for
classification and regression and temporal modeling that work for a lot of different problems.
Furthermore, it's compatible with pretty much any type of sensor or game input or computer
vision or audio analysis system on the input side and you can connect it up to code that's
written in any programming language on the output side. It works nicely with music synthesis
engines and game engines and animation environments and so on. It's very much in the spirit
of Weka in those senses; however, it runs in real time, which Weka doesn't in the GUI
version. More interestingly, we found out very quickly when we started this project that if
we were going to build something useful we had to address some of the differences between
what machine learning means and how you do it well in an environment like Weka, where you've
got offline data sets and you want to do a certain type of analysis and model building, versus in
these contexts where you want to build a new gestural control system or a new musical instrument.
I'm going to make the case throughout this talk that using machine learning to build these
interactive systems is not exactly the same as building something with a tool like Weka. The
first big difference that you run into is that you don't have training data out there. You need to
collect the training data. If I want to make my hand gesture classifier I can't go out on the
internet and download the ground truth training set for these hand gestures. It probably
doesn't exist. But I can make that data set. I can just give it a bunch of examples and maybe
collect examples from other people doing the same set of gestures, but I get to choose the
gestures. The second difference is that if I'm building something like a hand gesture classifier,
I'm often proficient enough in the application area under consideration that I am qualified to
take that model and say I'm going to make some new hand gestures and I'm going to see what
it does. I'm not limited to saying I'm going to run cross validation on this data set and use that
as a metric of model quality. I can actually just take it and say what does it do when I do this.
What does it do when I do this? I can get a much different approach to evaluating models
that's going to give me different types of information. And then third, I can start taking that
information about what my model does well and what it doesn't do well and I can use it to
make changes, informed changes to the training data. So in the simple case maybe I say my
hand gesture classifier does really well in these two classes. Let's make the problem more
interesting. Let's add a third class. Why not? It's going to be more fun. Or I could say if it
doesn't do this gesture when I tilt my hand over to the side, why don't I just give it more
training examples with my hand tilting to the side, and I have a reasonable expectation that I
might improve the model. I know there are a bunch of people here who do interactive machine
learning. Different people mean different things when they talk about interactive machine
learning. I'm using the term the way that Fails and Olsen used it in their paper about 10 years
ago. When I say interactive machine learning I mean these types of things, specifically, in this
context. Before I give you a demo I want to give you a good idea of what's going to be
happening in the software that I show you. Once you have built a model with Wekinator you
can run it in real-time and you just get a stream of feature vectors coming in and these can be
from sensors, from audio, from wherever you want and it's going to output a stream of output
vectors. You can take these and send them to an animation program. You can send them to
Processing or Unity or sound synthesis environments as well. It doesn't matter. All of this
communication is done through a very nice, simple communication protocol called Open Sound
Control. If you're doing real-time stuff and you haven't used Open Sound Control, I recommend
it. It's like a very nice glue. It makes things easy. I mentioned that we might have more
than one model at a time. For usability and debugging purposes, from the perspective of the
end user, we're making one model per output. Here, if I am building a music system, I want to
control volume and pitch and some filter coefficients and so on. I'm going to separate each of
these out into a different model that is completely independent from the others, so that means
I can tune it independently if I want. In each of these models I can use one or more of the
available features. These models can be regression models, so I might use them for something
like volume. They could be classification models which I might use for something like discrete
pitches. They can be models that do segmentation. I'm not going to talk too much about that
in my talk but we could talk about that later. Or if you are doing classification, you can get
posterior probability distributions rather than just a single most likely label. And you can send
this on to whatever type of environment you want to control. If you're interested, these are
the algorithms that are in the current version of Wekinator. They are all pretty standard. I
didn't invent them. They are not specifically designed for interaction, although you could say
that dynamic time warping is something that doesn't get used a lot in other contexts. But they
are pretty standard.
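As a rough illustration of the input and output plumbing described above, here is a small Python sketch using the python-osc library; the /wek/inputs and /wek/outputs addresses and the 6448 and 12000 ports are the usual Wekinator defaults, but treat them as assumptions and check your own configuration.

```python
# Hedged sketch of talking to a Wekinator-style learner over Open Sound Control.
from pythonosc.udp_client import SimpleUDPClient
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

# Send one feature vector (e.g. three sensor readings) to the learner.
client = SimpleUDPClient("127.0.0.1", 6448)           # assumed default input port
client.send_message("/wek/inputs", [0.42, 0.13, 0.88])

# Receive the model's output vector and forward it to a synth, game engine, etc.
def on_outputs(address, *values):
    print(address, values)                             # e.g. ('/wek/outputs', 0.7, 0.2)

dispatcher = Dispatcher()
dispatcher.map("/wek/outputs", on_outputs)
BlockingOSCUDPServer(("127.0.0.1", 12000), dispatcher).serve_forever()
```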
Let's do some demos. I'm going to open up Wekinator here and, first of all, I'm going to do one
of my oldest demos for you. Dan Morris has seen this before, I think. I'm going to make a
classifier. I'm going to get 100 really, really bad computer vision inputs here. I'm taking a
webcam input from my computer and I'm just chopping it into a 10 by 10 color matrix, taking
the average brightness value in each of these cells, and sending that as a 100-dimensional
feature vector. This is kind of a silly feature to use if you know anything about computer vision,
but this is a lot like features that people use in practice all over the place. It's kind of a first
pass: I don't know anything about my data, I don't know anything about signal processing, and
I want to try to use machine learning to build something anyway.
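A hedged sketch of that feature extraction, using OpenCV rather than her actual demo code: grab a frame, downsample it to a 10 by 10 grid, and treat the average brightness of each cell as a 100-dimensional feature vector.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                                   # default webcam
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Area interpolation averages each block, giving one brightness value per cell.
    grid = cv2.resize(gray, (10, 10), interpolation=cv2.INTER_AREA)
    features = (grid.astype(np.float32) / 255.0).flatten()  # 100 values in [0, 1]
    print(len(features), features[:5])
cap.release()
```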
I'm going to send it 100 features and I want to control one classifier with four classes. I'm going
to start a really simple drum machine, and this drum machine is going to play different sounds
when I give it different values. I'm going to train it: give it a few examples of me standing here.
I recorded 15 snapshots of me here, and I'm going to record some examples of me not standing
there. Now I have 30 examples total. I'm going to train it and run it. And it's actually learned a
pretty good "me" classifier. And I can start making it
more complicated. You might say here is my hand. It's still working pretty well. Let's see if I
can make it make a mistake. No. All right. Here it's a little bit fuzzy in here. It's confused with
my hand right here, so I'll give it more hand examples in this space, and I'll retrain it and run
it, and now we see that that's better. That's a really simple classifier with bad features, with
examples that I have just given it on the fly. Okay. Let's do a regression demo. For this one I've
got a Leap Motion sensor here and I've got a much better feature set. In this case I'm using
the hand skeleton data that I'm getting from the Leap, and I'm just using the x, y and z of each
fingertip, so I've got 15 features total. To do this, again, I'll say listen for these 15 inputs and
let's control a physical modeling algorithm. This algorithm that I'm going to show you is a
pretty high dimensional algorithm. It takes in something like 11 different control inputs and
makes very different sounds depending on which input you give it. I'm just going to cycle
through some of the sounds for you to give you a sampling of this 11 dimensional sound space,
not with the drum in the background. That's just some pseudorandom location in the sound
space. If I'm a sound designer or I'm a composer, my job is to find not just locations in this
space that sound good or useful for an application, but trajectories through this space. I want
sound to change over time in a way that makes sense, that is musically expressive or fits the
sound scene that I'm trying to design the sound for, and so on. And I want to control this with
the Leap. In this case I'm just using the nine most interesting parameters, so I'm actually
constructing a 15-dimensional to nine-dimensional function. This is a pretty complicated
mapping function, and I'm going to use Wekinator's default, which is just off-the-shelf neural
networks, to do this.
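Here is a minimal sketch of that kind of mapping, assuming scikit-learn's MLPRegressor as a stand-in for the off-the-shelf neural networks and following the one-model-per-output setup described earlier; the training data is random placeholder data standing in for recorded hand poses and the sounds chosen for them.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((20, 15))        # stand-in for recorded fingertip x/y/z features
Y = rng.random((20, 9))         # stand-in for the synth parameters chosen for those poses

# One small neural network per output parameter, trained independently.
models = [MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000).fit(X, Y[:, i])
          for i in range(Y.shape[1])]

new_pose = rng.random((1, 15))  # a hand position the model has never seen
params = [float(m.predict(new_pose)[0]) for m in models]
print(params)                    # nine continuous control values to send to the synth
```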
What I can do is start out with a sound that I like, more or less as a starting point, so I can start
with that sound. I can say I want that sound to be here, and then I might change the sound a
little bit and say maybe I want a higher sound up here. I'll give it some examples of that and
train it. Am I getting data? Okay. That did not work for some
reason. Let's try this again. Here it is. Somebody's unhappy here. We are not getting data. Let
me restart this. I'm not sure what's going on. If that doesn't work we will do a different demo.
There we go. That should work. Okay. So now I've taught it something about the fact that
the height of my hand corresponds to pitch and we get a nice little slider. Not the most
interesting thing in the world but it's kind of a nice Leap slide whistle. I can start making it a lot
more interesting if I give it different sounds. So with this you see suddenly I really exploded the
space of sounds that I can access. I still have some predictability in it. I know that I can make a
certain sound over here and another sound over here, but I also can explore the space and start
finding things that don't sound anything like what was in my training data. I can iterate
on this process and say I like what it's doing over here, but I don't like what it's doing over
there, so let me put a different sound into the space over here. I can iteratively make this
more complicated giving it different sounds, giving it sounds that are more tailored to my
aesthetic preferences and so on. So that in a nutshell is what a lot of composers do when they
are using this system to make a new musical instrument. Does anyone have any questions
about the demo before I move on? Okay. As I mentioned, I built the first version of Wekinator
back in 2008, 2009. I've been using it with a lot of different people in different contexts since
then and also building new types of interfaces which are not Wekinator specifically, but similar
interactive machine learning interfaces for different applications. Some of the first people I've
worked with in this space are really gifted computer music composers and I have a demo of
one piece, which is much better music than what I just showed you. This is an example of
somebody who is a professional composer who worked with this over the period of several
months and built a piece which you are going to see here. The sensors here are these Gametrak
Real World Golf controllers, which are supposed to be used to measure your golf swing, but
you can pick them up and use them to measure the 3-D position of your
hands in space. What she really wanted with this piece was to have people
doing something that was like the yoga Sun Salutation and she had a particular sound space
that she wanted people to move smoothly through as they were moving, and
also have slight differences between different performers. She had quite a clear conceptual
idea of what the piece was going to do in the sound space and she used Wekinator to turn this
into something that felt like it was the right instrument for her for the piece. I'll give you some
video. [video begins]. That's one video. The next clip I'll show you is another composer who is
an early user of Wekinator who was walking down the street one day and found a piece of tree
bark and as you do, said I want to turn that piece of tree bark into an instrument. And she put a
bunch of light sensors in it and connected it to Wekinator and then to the same music synthesis
software that I showed you earlier. This is her talking about the piece and you see a little bit of
the instrument and hear it in the background.
>>: Basically we are taking the data and it's comparing it to examples that I've given it in the
past of relationships between certain data and a certain gesture and a certain sound. So if I
train the machine learning software that when I wrap my arms around the instrument and the
sensors register less light, it makes a particular kind of sound, and then in full light it makes a
different kind of sound. So I give it all these examples in the training process and then I run it
and see what happens. It takes the data that is coming in and says that looks just like the data,
or that is similar to the data that she wants, that she has when she wants this sound, so it sends
that message to my sound processor and the computer outputs the sound. It's just another
way, basically, of mapping gesture to sound.
>> Rebecca Fiebrink: I like that video because you can see the instrument. You can see the way
that she's developed to play this instrument, but you also hear from her how she is thinking
about machine learning and her understanding of machine learning as somebody who is a
composer. She is not a machine learning person in any sense. I've done quite a bit of work and
still do quite a bit of work with professional composers, but I'm also working in a lot of other
application contexts. I've used Wekinator quite a bit in teaching, teaching kids as young as
seven as well as up through PhD level, both teaching them about sensors and how you use
sensors, but also teaching them about interaction design. It's a great way to get people started
playing with new ways of interacting with computers without first having to get them proficient in
programming. And they can learn a lot by saying what happens if I connect this to this thing
and what might I build? I've had some projects recently building musical instruments for and
with people with different types of disabilities. Some of them look kind of like the instruments
that you just saw where it's really sort of experimental weird sounds. Some of them look and
sound much more conventional. I've done some work on building recognizers for existing
vocabularies. So instead of just saying I want this thing to do something interesting, people
come in and say I have a pretty clear idea of what it is that I want the system to learn. For
instance, this is a cellist who had a sensor bow that she used with her cello and she wanted to
teach the computer to recognize when she was doing legato and staccato
articulations. It's not a trivial learning problem, but if you can get the computer to recognize
that then you can build better computer accompaniment systems, for instance. I've done a
little bit of work on gesture recognition for rehabilitation and even research on human motor
learning. Right now one of my main projects is working with developers at different startups
and working with hackers and makers with things like hack days and building better prototyping
tools for them. The rest of my talk I'm going to give you a high-level tour of what I think are
some of the most interesting findings of this research, but I'm happy to answer questions about
any of these specific projects later if you have one. So, new perspectives on what machine
learning is good for as a design tool: how does it work? When doesn't it work well? What's hard
about it? The high-level finding here is, maybe unsurprisingly, yeah, this kind of works. It works
well for enough concepts that I am still doing this work six years later. The composers that I worked with, like
the ones that you saw in the videos right away when I started doing participatory design
processes with them to build the first version of Wekinator, it became obvious that this is going
to be useful. The first thing it does is it makes the time to build a new instrument much, much
faster even for people who are expert programmers. And then secondly, people started talking
about how the type of instruments that they were building was very different from the type of
instrument that they were building when they wrote programming code. So I'll come back to
this and talk about why I think that is. Also, we were able in very early work in this area to see
that somebody that doesn't know anything about machine learning but has some sensors, has
some good feature extractors in that sensor and knows how to make a gesture set accurately
can build state-of-the-art quality classifiers. That's why this cellist was able to build a set of
articulation classifiers that matched or beat the state-of-the-art in published research on this
topic. And she could do that because this process I think is pretty easy to understand and
engage with even if you're not a machine learning person. The next thing that I want to
highlight here is that when I observe people using Wekinator and logged the things that people
are doing with the software, it becomes clear that it's very rare that somebody says I am going
to plug in my sensor. I'm going to give it some data. I'm going to train a model and then I'm
done and I walk away with it. There's a lot of iteration, a lot of people saying I'm going to try it
out. I like this. I don't like that. Let me change it, build a new model, try it out and so on. And
usually this is happening dozens of times in the simple cases. It might happen hundreds or even
more times, people building professional quality robust systems. So people are continually
iterating, building new models, trying them out, modifying them. In contrast to what people
usually do with a tool like Weka. When people are using Wekinator to build new interactions
it's usually not changing learning algorithms or changing algorithm parameters or changing
features. It's often changing the training data. It's saying I don't like this. I'm going to give it
more examples of what I really do want it to do for this type of input. And I think it's
constructive to think about the training data as actually a type of user interface. Instead of
writing code, people are giving examples of what kinds of inputs they want to give to the
model, what kinds of outputs they want the model to have. And this is the primary way they
communicate their goals for what the system should ultimately do. This is also often the way
that people fix model mistakes, by saying it didn't do what I wanted here. I'll give it more
examples here. Again, data, real-time streams of data is primarily the way that people evaluate
whether they like a model or not. If you think about it, as I was moving my hand around with this
trained model here, I'm learning quite a lot about what sounds it makes where. Do I like
it? Do I not like it? What else might I want it to do instead? This is true for both these
continuous mappings and for classifiers. Yeah?
>>: You said that you could fix models. You could add more data. Do you ever allow them to
[indiscernible]
>> Rebecca Fiebrink: Absolutely. One of the most obvious useful things that people requested
and that I added was the equivalent of an undo button, to say I just added a bunch of examples.
It screwed everything up. Let me remove those. In artistic contexts people have been really
interested in saying can I have it gradually remove the old data so I can actually impose sort of a
concept drift on my model. That might be interesting in non-artistic context, but it's been
useful for some people. So you can remove examples, certainly. Yeah?
>>: So with the Leap not so much, but with the camera, I wonder about something that would be
non-obvious to users, you know, how much the context matters: they do something at their desk
and it totally works, and then the performance venue totally changes and, my God, what's
happening. Do you see that and how do you help people account for these types of things?
>> Rebecca Fiebrink: Up to this point I haven't tried to build any tools that explicitly help
people with that process. Certainly being able to have something that is really easy to just turn
on and try out, at the very least when somebody does their soundcheck in a new space
and says this lighting is destroying everything, you hopefully make it easy enough that they can
recover by adding more examples in that space. So in practice, that's what people have done.
But yeah, there's a lot more you could do there especially with certain types of sensors you
have sensor drift or sensitivity to environmental conditions. There's a number of cool ways that
you might address that. Good. One point, two points I want to make before I move on. First of
all, thinking about the training data as an interface for doing these things makes sense. You're
going to be able to do these more efficiently often by changing the data than you are by
changing the learning algorithm or changing your SVM kernel or so on. So there is a pretty
direct interface that people understand. I have asterisks here next to goals because I do want
to make a point that it's not that somebody comes to the table, usually, and says I want to build
a classifier that does exactly this. Or maybe they do, but often those goals change slightly over
time. I'll come back to this. But if you are able to easily change the data then that's okay. You
don't have to do anything complicated algorithmically. You just allow people to have a very
lightweight, low-overhead way to say my idea for what I'm building has changed now and it's
okay. Which brings me to my next point which is that when people have used Wekinator for
really serious projects, and I ask why in the world would you use this? It's a research piece of
software. It's a little bit weird looking. And I think this is key to a lot of the success that people
have had which is allowing them to very easily instantiate a new working system even if it's not
perfect. It allows them to prototype ideas quickly. The time from nothing to having something
that does something is very short. It could be 10 seconds as you saw in my demo. Whereas,
doing that with programming you might be talking about minutes, hours, days or weeks. It
allows them to say I think I have an idea. I'm not sure if this is a good idea. Let me try it out.
And when you allow people to do that with lots of different ideas, to say I'm not sure what
gesture set I want to use, for instance, you are not stuck with the first one you try. You
haven't sunk two weeks into it by the time you find out that maybe you're not on the right track.
You can explore lots of different ideas in parallel. And for some applications it's also important
to be able to discover behaviors that you didn't necessarily plan for. So these first two points,
people have written about this. Ben Shneiderman has written about this. He talks about
creativity support tools. Bill Buxton has written about this. And we can talk about the
importance of these activities in the context of wicked problems. If you guys haven't come
across the idea of wicked problems, this is something that has been useful in shaping the way I
look at this work. With problems in engineering and design and music, all sorts of
things that people might want to do with sensors, well, you don't necessarily know exactly what
the specifications are until you actually have built the thing. You don't necessarily know what
exactly is a really good gesture classifier for controlling this videogame until you build it and you
try it out. And probably you build it and you try it out and you say that almost works, but there
is this thing that I didn't consider, which is screwing me up and I need to fix that. So your
understanding of the problem goals and the problem constraints changes over time, and it's by
instantiating different designs that you actually learn and are able to get to your final design not
just more quickly but, as Bill Buxton says, to not just get the design right, where you just
implement your specifications, but to get the right design. You are making sure that
you are building the right thing to begin with. And when you are able to build something really
quickly and try it out that makes this process easier. So I think when people talked about that
this allows me to build just a better interface than by programming, I think this is a lot of what
is behind that. As I mentioned, sometimes especially people building new creative systems,
new musical instruments, they want to do more than this. They also want to not be
constrained by their own imaginations. If I'm building a new Leap motion sound exploration
interface because I'm a sound designer and I want to find a good sound for a particular scene in
a game or a sound effect, I might not have in my imagination the best sound already. I want to
be able to really efficiently explore lots of sounds and hear something that might surprise me.
This is very hard to do if you start by writing code. If you say you are going to write a 15 to nine
dimensional mapping function, the easiest thing to do is to make a linear function with some
fairly simple translations and transformations and then you are kind of stuck with it. Whereas,
using this sort of example driven paradigm, you can put very different sounds into your training
set and get very different outputs. So this is also something that people have talked about as
being important to them in their choice to use this versus programming. I've been talking
about a specific set of ways in which people and machines are really co-adapting in this process.
This is I think a really important point that took me a long time to realize. You think about
machine learning from the conventional perspective and you think about it as I'm going to try
to build the very best model for a given data set. You assume that your goals are embedded in
that data set to an extent and you want to build the best thing. That's not often how the real
world works. As I mentioned here, we are not typically starting with ground truth data that has
already been collected. Even if we are, we often are able to go get more data to either test the
model or improve the model. There are lots of different concepts that you might teach an
algorithm that are potentially useful. Earlier today we were sitting down and talking about
building a shake detector for the micro:bit, for instance. There you could say this is pretty
simple. It's a pretty clear-cut problem. Either you are shaking it or you are not. Yes and no.
You could imagine are you going to enforce the fact that everybody has to shake it with, you
know, sort of the LEDs facing up and they have to shake it back and forth left and right? Or are
we also going to allow people to shake it up and down? Are we going to allow them to hold it
any way they want and shake it? Those are all different variations of the same is someone
shaking it problem. And they are all going to have different implications for how hard it is to
build a shake classifier. And there are going to be different implications for how easy to use
that shake classifier is going to be. And so we can think about this design space as presenting
lots of different potential trade-offs between the usefulness of the end model and the
feasibility of making it. What you see unsurprisingly is that people navigate this space. They
have a limited amount of time to build something. They have a limited amount of effort that
they are going to put into building it, and at some point they are going to make a
judgment call and say this is good enough. Let me move on with my life. Obviously, when we
are building tools we want to make this as easy as possible for people to build things that are as
complex as possible. But at the same time I think it's helpful to think about this larger context.
For instance, I had a paper at AVI a couple of years ago where we tried to build a better tool to
help people understand these trade-offs. This is a tool for recognizing beat boxing but also
other types of vocalizations and sounds. If you want to train a three class classifier, we actually
show you some information about the examples that you have recorded and how they might
overlap in the feature space. So this is one choice of three classes. This is another slightly
different choice of three classes. Now, we don't show people this one, because we
are not working in a two-dimensional feature space, but we show people this. If you are a user,
knowing this can help you understand the trade-offs and say I could either just work with this
one because these are more easily able to be classified, or I could work with this but I have to
redefine class B by changing the way that I perform it. Or maybe I have to be more careful in
the type of training data that I give it and give it better training examples with less noise. Or I
have to come up with a better feature representation, so there's not one answer that's the
best. It's going to depend on the person and the context.
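Not the AVI tool itself, but a small sketch of the underlying idea: given the examples recorded for each candidate class, report a rough separability score so a user can see which class choices are likely to be easy or hard to learn. The feature values and class names here are invented.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def class_overlap_report(examples_by_class):
    X, y = [], []
    for label, examples in examples_by_class.items():
        X.extend(examples)
        y.extend([label] * len(examples))
    # Ranges from -1 (heavily overlapping classes) to 1 (well separated).
    return silhouette_score(np.array(X), np.array(y))

recorded = {
    "class_A": [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]],   # hypothetical audio features
    "class_B": [[0.8, 0.2], [0.9, 0.1], [0.85, 0.25]],
    "class_C": [[0.5, 0.5], [0.55, 0.45], [0.6, 0.5]],
}
print(class_overlap_report(recorded))
```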
>>: [indiscernible] useful during the exploratory process with Wekinator? Because you can see
someone getting into [indiscernible] example that takes the model somewhere where they
didn't expect and have no way to inquire about that. Have you folded back this kind of
feedback to help people with that process?
>> Rebecca Fiebrink: Not yet. That's something that I would really like to do. A couple of more
points that I want to make before wrapping up. Another underappreciated benefit of using
machine learning to make interactive systems is that it allows people to communicate very
directly: this is an embodied action that I want to take; here's my embodied understanding of
how what I'm doing relates to what the computer is doing. If you are building a tree musical
instrument, for instance, it's going to be really hard for you to operationalize the relationship
between the sensors and the sounds in a mathematical function. It's really easy for you to say
here's what I want to be doing when I want the sound quiet and this is something that's louder.
You can demonstrate that. And I think there are all sorts of other application domains in which
people have tacit or embodied knowledge that they can provide much more easily than by
writing program code. So this is another factor that I think has made people want to use this
experience. Interactive machine learning is different from conventional machine learning
applications in a few ways that I think might be interesting for people who are machine learning
folks in the room. First, most obvious thing that comes up is the examples that people provide
when they are building a classifier in this way, they are not IID. This is actually a good thing. It
means that we can learn really efficiently from small training sets. Imagine this is sort of a
conventional machine learning application: these are two classes and we want to fit a decision
boundary to them. You've all seen diagrams like this before. If someone has in their mind this decision
boundary, what they often start doing is giving canonical examples of each class and then they
train the model and say where does that boundary end up? And when they start testing it they
start testing the canonical ones as well as things that might be closer to the boundary. And
they're going to notice right away that there are a few examples that appear on the wrong side
of that boundary, and they're going to feed those back into the training set and immediately get
a much better classifier. But they didn't have to go through the process of giving all these other
examples that actually are not that informative to the ultimate model.
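A hedged sketch of that loop, with invented two-dimensional data: train on a few canonical examples, notice that a near-boundary input comes out wrong, add the correction to the training set, and retrain.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0.1, 0.1], [0.2, 0.2], [0.9, 0.9], [0.8, 0.8]]   # canonical examples
y = [0, 0, 1, 1]
model = KNeighborsClassifier(n_neighbors=1).fit(X, y)

tricky = [0.45, 0.5]                        # user tries something near the boundary
if model.predict([tricky])[0] != 1:         # user actually wanted class 1 here
    X.append(tricky)                        # feed the correction back in...
    y.append(1)
    model = KNeighborsClassifier(n_neighbors=1).fit(X, y)   # ...and retrain

print(model.predict([tricky]))              # -> [1], as the user intended
```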
This makes things a little bit hairy, though, because when you don't have IID data then things
like cross-validation accuracy start to become problematic. In fact, in the cellist study that I
mentioned, we looked at the relationship between cross-validation accuracy of the models that
she was making and her own satisfaction with the models. In an ideal world you would want those to be positively
correlated. We found that they were negatively correlated. And we can talk about why that is,
but it kind of makes sense. The last point I'm going to make may be controversial, but I'm going
to claim that gesture recognition, gesture classification is often the first thing that comes to
mind for people who want to build a new system with sensors. It's I want to wave my hand and
turn my TV on or I want to do this and my drone is going to turn right. And that is cool. But a
lot of times this raises problems. This is a finite gesture set. It makes you behave in a sort of
rigid, prescriptive way. There's not a lot of room for error. You've got to memorize the
gestures. You feel like you're making mistakes when things go wrong and what I always ask
people is: is there a good reason why you are not doing this with a button? Because buttons are
really good for certain things. If there's a good reason, then fine. Go build yourself a gesture
classifier, but in a lot of other cases, building something that might be more like a cello where
you have continuous multidimensional control that allows you to explore where you can form
an understanding of what the interface allows you to do and learn how to play it in a way that
might be idiosyncratic to you is often much more satisfying. In our CHI paper from last year, we
looked at this a little bit. We compared using end-user training of classifiers for people with
disabilities with pretraining really high dimensional continuous control spaces that feel kind of
like this Leap thing here. We would build an interface that makes a sound no matter what you
do with it and as you move a little bit the sound changes and that's it. We gave it to people
with very different types of physical constraints and actually observed that people ended up
coming up with discrete gesture sets on their own. Everybody had an idiosyncratic way of
playing it, and they would come up with these sorts of riffs, really, that would result in sonic
riffs. So in the end everybody had a bespoke computer music instrument, but everybody was
able to do something that was very comfortable and because they were exploring this space
they were able to build up a gesture set for themselves that didn't require them to sit back and
memorize it. So that was an interesting outcome of that. At the end of the day anybody can
use this. I mentioned I used it with seven-year-olds. It helps experts as well but I think we're on
our way to making this much more effective. We don't have too much time. I'm going to leave
it there and open it up for questions. So many people. I'm not going to take your questions
because we are going to talk. But I will come back to you.
>>: Do you know of anything that looks sort of like this that has been commercially deployed?
Like maybe somebody like Leap, for example, a super [indiscernible] put it in the developer's
hands who have never seen it before and how did that go?
>> Rebecca Fiebrink: Yeah. That is the third next slide that I was going to mention. Not a lot of
people have been commercially deploying this, but I'm working with four startups right now
around Europe who are trying to put this into products. We are actually studying their process
of doing this and trying to figure out how to best support them.
>>: In the model I was talking about, they weren't really end-users. The end-users are software
developers.
>> Rebecca Fiebrink: Yes and no. We are actually looking at those in this context where, for
instance, oh I maybe have them on there. Sorry, five startups, the one that I left off is making
an app for sound designers where the end user will be customizing. So ask me in a year. Yeah?
>>: Especially from my background [indiscernible] it seems like feature extraction is a big part
of this. How have you tackled that in the past? You take a seven-year-old and they want to do
assessment of something phonic. How does this solve the problem of extracting the features?
>> Rebecca Fiebrink: My first pass at that, and this also came up when we were working with
Plucks [inaudible], who do a sort of Arduino-like platform for bio-signal acquisition, the first
thing we did was just say let's wrap everything up in a GUI and give people visualization and
give people the sort of drop-down ability to add filtered features and look at peak detection
and that kind of thing. It's better than nothing, but it's not something that we have had a
chance to really rigorously explore. And I think we have been talking about this the last few
days. I think there's so much stuff we can do to make that easier. Yeah, I would love to do
more of that.
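For concreteness, here is the kind of wrapped-up feature extraction that answer points at, sketched with NumPy; the window size, smoothing width and the specific derived features are all assumptions, not her tool's actual design.

```python
import numpy as np

def window_features(samples, smooth=5):
    x = np.asarray(samples, dtype=float)
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(x, kernel, mode="valid")      # simple low-pass filter
    rising = np.diff(smoothed) > 0
    peaks = int(np.sum(rising[:-1] & ~rising[1:]))       # count of local maxima
    return [float(x.mean()), float(x.std()), float(np.ptp(x)), float(peaks)]

raw_window = [0.1, 0.4, 0.9, 0.3, 0.2, 0.8, 1.0, 0.5, 0.2, 0.1]  # one bio-signal window
print(window_features(raw_window))   # mean, std, range, rough peak count
```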
>>: On that same thread if you think about it it's like a guitarist where you kind of get to know
like [indiscernible] outputs. Can you imagine people learning in the space of like I need
something speechy and like this package gives me like very [indiscernible] features or
something and seeing that become a part of the vocabulary of libraries of features that they
need to use to do certain kinds of things?
>> Rebecca Fiebrink: Yeah. Yeah?
>>: You mentioned how you trained these models, and rather than using a metric, sometimes
it's just how it feels to you, and how important it is that it feels right to you. How much of that
transfers across users? Is this my instrument, meant in particular for me, or is that
[indiscernible]
>> Rebecca Fiebrink: Yeah. I think that's a great point and certainly once you have something
that is meant to translate across users, for certain applications, it's okay to have the developer
say here's my gesture set. And to some extent if I wanted to be recognizing these hand signals
I'm going to train it the best I can and assume that other people are going to adapt to make
those gestures the same way and they are going to learn how to control the thing accurately.
Obviously, that breaks down at some point where you want to give people better ability to
test it out on data from people who aren't themselves and to notice that my sensor really
doesn't work well on people with hand sizes that are different. Again, that is something that
we haven't explicitly started working with, but I think there's a lot that you could do there
either to give people better understandings of how deployment is likely to work, or to use
something like transfer learning to allow end-users to further adapt something that has been
pre-trained. Yeah?
>>: Have you played with gestures that are more temporal in nature?
>> Rebecca Fiebrink: Yeah, so I skipped that part of the talk, but one of the things that I've been
doing over the last year or so is looking at different, basically, path recognition algorithms. The
easiest way to do that, and the way that is built into this version of Wekinator, is dynamic time
warping. And I've got some sort of specially configured dynamic time warping methods that
work really well for a lot of different sensors.
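A minimal dynamic time warping sketch, not Wekinator's specially configured version: it compares an incoming gesture (a sequence of feature vectors) against recorded templates while tolerating differences in speed, and the smallest distance picks the match. The templates and live data are invented.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    a, b = np.asarray(seq_a, dtype=float), np.asarray(seq_b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])      # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],         # skip a live frame
                                 cost[i, j - 1],         # skip a template frame
                                 cost[i - 1, j - 1])     # match the frames
    return cost[n, m]

templates = {"circle": [[0, 1], [1, 0], [0, -1], [-1, 0]],
             "swipe":  [[0, 0], [1, 0], [2, 0], [3, 0]]}
live = [[0, 0], [1, 0], [1.9, 0.1], [3, 0], [4, 0]]       # a slightly slower swipe
print(min(templates, key=lambda name: dtw_distance(live, templates[name])))  # -> swipe
```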
There is a postdoc who just finished with us at Goldsmiths who was doing some other
techniques based on a simplified Markov model, where you don't need a lot of training data to
set the transition probabilities, and that furthermore gives you the ability to have an idea of
where in the sequence you are at any given time. I think that's super useful as well. Ofer, do
you want to ask your question?
>> Ofer Dekel: Yeah. When you were describing one of your demos you said you are just using
the default regression algorithm, which was a neural network, and it seemed like you were kind
of brushing that off as an obvious default. If the goal of the people is to explore the space, say I
use two different algorithms: I have the nearest neighbor algorithm versus the neural network.
Maybe they will both do a good job learning my gestures, but they will interpolate differently,
and really they will extrapolate to faraway points much, much differently. If you are regularizing
your neural network parameters, if you regularize very aggressively maybe you could get a very
simple interpolation. If you let the thing go wild and start from some random point you could
move through many, many different states and from one to another. But that would imply that
you need to expose something about the algorithm or the regularization parameter or some of
the machine learning [indiscernible] to the artist.
>> Rebecca Fiebrink: Yes and no. Yes and no. I think one of the first things that I found when
building the first version of Wekinator is that people get really kind of turned off by having to
explicitly make decisions about what algorithm to use or what parameterization to use. So one
of the things I spent a lot of time on was saying what's a good default algorithm for
classification or regression. What's a good default network architecture for the kinds of sensors
and applications that people are using? And then you don't ever see the word neural network on
the screen when you load up the program and train one. So you can happily coast along
without ever doing that. I think that's not optimal and certainly people are missing out on
opportunities to get better performance if they're never changing the algorithm. So one of the
things I forgot to mention, I'm teaching a MOOC starting in a few weeks about machine learning
for artists and musicians. One of the things I'm exploring in that MOOC is how to get people to
have a good intuitive understanding of how their choices of algorithms and parameters are
going to affect the models. So without having to know calculus or take a machine learning
course you can still with some human training make better decisions about things. That's one
side of it. At the same time, I think there is a lot that could be done without having to train
people by allowing people to instantiate multiple alternatives. Just say here is my classification
training data set, hit train and now I want to get three or five or 10 models out and I don't
necessarily need to know which one is which, but I can try the first one and if I don't like it I can
try the second one, and that's just another option that is there in addition to just changing the
training data. Especially at some point when people are really happy with their
training data set, you kind of converge to something that needs to feel a little bit more like
Weka, where you are happy with the data. Now it's time to explore the space of algorithm
configurations. Any other questions?
>>: Let's thank the speaker.
>> Rebecca Fiebrink: Thanks a lot, guys. [applause].