Real-time Human-Computer Interaction
with Supervised Learning Algorithms for
Music Composition and Performance
Rebecca Fiebrink
Perry Cook, Advisor
Pre-FPO, 6/14/2010
2
[Image. Source: googleisagiantrobot.com]
3
4
function [x flag hist dt] = pagerank(A,optionsu)
% PAGERANK Compute the PageRank vector of a (sparse) adjacency matrix A,
% dispatching to one of several solver implementations via options.alg.
[m n] = size(A);
if (m ~= n)
    error('pagerank:invalidParameter', 'the matrix A must be square');
end;
options = struct('tol', 1e-7, 'maxiter', 500, 'v', ones(n,1)./n, ...
    'c', 0.85, 'verbose', 0, 'alg', 'arnoldi', ...
    'linsys_solver', @(f,v,tol,its) bicgstab(f,v,tol,its), ...
    'arnoldi_k', 8, 'approx_bp', 1e-3, 'approx_boundary', inf, ...
    'approx_subiter', 5);
if (nargin > 1)
    options = merge_structs(optionsu, options);
end;
if (size(options.v) ~= size(A,1))
    error('pagerank:invalidParameter', ...
        'the vector v must have the same size as A');
end;
if (~issparse(A))
    A = sparse(A);
end;
% normalize the matrix
P = normout(A);
switch (options.alg)
    case 'dense'
        [x flag hist dt] = pagerank_dense(P, options);
    case 'linsys'
        [x flag hist dt] = pagerank_linsys(P, options);
    case 'gs'
        [x flag hist dt] = pagerank_gs(P, options);
    case 'power'
        [x flag hist dt] = pagerank_power(P, options);
    case 'arnoldi'
        [x flag hist dt] = pagerank_arnoldi(P, options);
    case 'approx'
        [x flag hist dt] = pagerank_approx(P, options);
    case 'eval'
        [x flag hist dt] = pagerank_eval(P, options);
end
7
Effective
Efficient
Satisfying
9
Machine learning algorithms
Outline
• Overview of computer music and machine learning
• The Wekinator: A new interface for using machine
learning algorithms
• Live demo + video
• Completed studies
• Findings
• Further work for FPO and beyond
• Wrap-up
10
computer music
Interactive computer music
sensed action → interpretation → response (music, visuals, etc.)
12
Computer as instrument
sensed action → interpretation → sound generation
13
Computer as instrument
sensed action → mapping (interpretation) → sound generation
14
Computer as collaborator
human + control interface (microphone and/or sensors) → sensed action → interpretation (model → meaning) → sound generation
15
A composed system
sensed action → mapping/model/interpretation → response
16
Supervised learning
Supervised learning: Training
training data (inputs paired with outputs) → algorithm → model
18
Supervised learning: Training
inputs + labels (“C Major”, “F minor”, “G7”) → training data → algorithm → model
19
Supervised learning: Running
new input → model → output (“F minor”)
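To make training and running concrete, here is a minimal sketch using the Weka API (which the Wekinator builds on). The feature values and the choice of IBk (k-nearest neighbor) are illustrative assumptions, not the Wekinator's own code; class names follow Weka 3.7+.

import java.util.ArrayList;
import java.util.Arrays;
import weka.classifiers.lazy.IBk;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class ChordClassifierSketch {
    public static void main(String[] args) throws Exception {
        // Inputs: two continuous features; output: a nominal chord label.
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("feature1"));
        attrs.add(new Attribute("feature2"));
        attrs.add(new Attribute("chord",
                new ArrayList<>(Arrays.asList("C Major", "F minor", "G7"))));
        Instances training = new Instances("chords", attrs, 0);
        training.setClassIndex(2);

        // Training: each example pairs an input vector with a labeled output
        // (the final value indexes into the chord labels above).
        training.add(new DenseInstance(1.0, new double[] {0.10, 0.90, 0})); // "C Major"
        training.add(new DenseInstance(1.0, new double[] {0.80, 0.20, 1})); // "F minor"
        training.add(new DenseInstance(1.0, new double[] {0.50, 0.55, 2})); // "G7"
        IBk model = new IBk(1);           // algorithm + training data -> model
        model.buildClassifier(training);

        // Running: the model generalizes to a new, unseen input.
        DenseInstance unseen = new DenseInstance(1.0, new double[] {0.78, 0.25, 0});
        unseen.setDataset(training);
        unseen.setClassMissing();
        int predicted = (int) model.classifyInstance(unseen);
        System.out.println(training.classAttribute().value(predicted)); // "F minor"
    }
}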
Supervised learning is useful
• Models capture complex relationships from the data and generalize to new inputs. (accurate)
• Supervised learning circumvents the need to explicitly define mapping functions or models. (efficient)
So why isn’t it used more often?
20
A lack of usable tools for making music
What's needed: (1) general-purpose support for many algorithms & applications; (2) runs on real-time signals; (3) an appropriate user interface and interaction support.
Weka (Witten and Frank, 2005): many standard algorithms that apply to any dataset; graphical interface + API; >10,000 citations (Google Scholar). General-purpose ✓, but real-time ✗.
Existing computer music tools: built by engineer-musicians for specific applications. Real-time ✓, but general-purpose ✗.
??? No existing tool satisfies all three criteria.
21
Outline
• Overview of computer music and machine learning
• The Wekinator: A new interface for using machine
learning algorithms
• Live demo + video
• Completed studies
• Findings
• Further work for FPO and beyond
• Wrap-up
22
The Wekinator
• A general-purpose, real-time
tool with appropriate interfaces
for using and constructing supervised
learning systems.
• Built on Weka APIs
• Downloadable at
http://code.google.com/p/wekinator/
(Fiebrink, Cook, and Trueman 2009; Fiebrink, Trueman, and Cook 2009; Fiebrink et al. 2010)
23
A tool for running models in real-time
Feature extractor(s) → feature vector (e.g., .01, .59, .03, …) → model(s) → parameter vector (e.g., 5, 5, .01, 22.7, …) → parameterizable process
24
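A sketch of that run-time flow in code; FeatureExtractor, Model, and Synth here are hypothetical stand-ins for illustration, not Wekinator classes:

interface FeatureExtractor { double[] poll(); }          // e.g., .01, .59, .03, ...
interface Model { double map(double[] features); }       // one model per output parameter
interface Synth { void setParameters(double[] params); } // the parameterizable process

class RunLoopSketch {
    static void run(FeatureExtractor fx, Model[] models, Synth synth) {
        while (true) {
            double[] features = fx.poll();               // sensed action, this frame
            double[] params = new double[models.length];
            for (int i = 0; i < models.length; i++) {
                params[i] = models[i].map(features);     // e.g., 5, 5, .01, 22.7, ...
            }
            synth.setParameters(params);                 // drive sound, visuals, ...
        }
    }
}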
A tool for real-time, interactive design
Wekinator supports user interaction with all stages of the
model creation process.
ADD & MODIFY DATA → TRAIN → RUN & EVALUATE → (repeat)
25
Under the hood
Features: Feature1 (joystick_x), Feature2 (joystick_y), Feature3 (webcam_1), …, FeatureN
Models: Model1, Model2, …, ModelM
Parameters: Parameter1, Parameter2, …, ParameterM (e.g., volume, pitch; outputs such as “Class24” or 3.3098)
Learning algorithms:
– Classification: AdaBoost.M1, J48 decision tree, support vector machine, k-nearest neighbor
– Regression: MultilayerPerceptron
26
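In code, the algorithms named above map onto standard Weka classes; the small factory below is an illustrative sketch of how one classifier or regressor might be instantiated per output parameter (the wiring is an assumption, not Wekinator source):

import weka.classifiers.Classifier;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.classifiers.functions.SMO;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.J48;

public class AlgorithmMenuSketch {
    // Classification: discrete outputs such as "Class24".
    static Classifier makeClassifier(String name) {
        switch (name) {
            case "adaboost": return new AdaBoostM1();  // boosting meta-learner
            case "j48":      return new J48();         // decision tree
            case "svm":      return new SMO();         // support vector machine
            case "knn":      return new IBk(1);        // k-nearest neighbor
            default: throw new IllegalArgumentException(name);
        }
    }

    // Regression: continuous outputs such as 3.3098 (volume, pitch, ...).
    static Classifier makeRegressor() {
        return new MultilayerPerceptron(); // neural network; accepts numeric classes
    }
}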
Tailored but not limited to music
• Built-in feature extractors for music & gesture
• ChucK API for feature extractor and synthesis classes
• Communicates via Open Sound Control (UDP): other feature extraction modules → The Wekinator → control messages → other modules for sound synthesis, animation, …?
27
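As a concrete sketch of that transport, the snippet below packs a feature vector into an OSC message by hand and sends it over UDP. The /features address and port 6448 are assumptions for illustration; the byte layout follows the OSC 1.0 specification.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;

public class OscFeatureSender {
    // Pad a string with NULs to a multiple of 4 bytes, per the OSC 1.0 spec
    // (at least one NUL terminator is always included).
    static byte[] padded(String s) {
        byte[] raw = s.getBytes();
        byte[] out = new byte[(raw.length / 4 + 1) * 4];
        System.arraycopy(raw, 0, out, 0, raw.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        float[] features = {0.01f, 0.59f, 0.03f};

        // OSC message = padded address + padded type-tag string + big-endian args.
        ByteArrayOutputStream msg = new ByteArrayOutputStream();
        msg.write(padded("/features"));
        StringBuilder tags = new StringBuilder(",");
        for (int i = 0; i < features.length; i++) tags.append('f');
        msg.write(padded(tags.toString()));
        for (float f : features) {
            msg.write(ByteBuffer.allocate(4).putFloat(f).array()); // big-endian
        }

        byte[] bytes = msg.toByteArray();
        DatagramSocket socket = new DatagramSocket();
        socket.send(new DatagramPacket(bytes, bytes.length,
                InetAddress.getLoopbackAddress(), 6448));
        socket.close();
    }
}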
Outline
• Overview of computer music and machine learning
• The Wekinator: A new interface for using machine
learning algorithms
• Live demo + video
• Completed studies
• Findings
• Further work for FPO and beyond
• Wrap-up
28
Wekinator in performance
29
Recap: what’s new?
• General-purpose, and runs on real-time signals
• A single interface for building and running models
• Comprehensive support for interactions appropriate
to computer music tasks
30
Outline
• Overview of computer music and machine learning
• The Wekinator: A new interface for using machine
learning algorithms
• Live demo + video
• Completed studies
• Findings
• Further work for FPO and beyond
• Wrap-up
31
Study 1: Participatory design process with 7 composers
• Fall semester 2009
• 10 weeks, 3 hours / week
• Group discussion, experimentation, and evaluation
• Iterative design
• Final questionnaire
(Fiebrink et al., 2010)
32
Study 2: Teaching interactive systems building in
PLOrk
• COS/MUS 314 Spring 2010
• Focus on interactive music
systems building
• Wekinator midterm
assignment
– Master the process of building continuous and discrete gestural control systems, and use them in a performance
• Logging + questionnaire
• Final projects
33
Study 3: Bow gesture recognition
• Winter 2010
• Work with a composer/cellist to build gesture
recognizer for a commercial sensor bow
• Classify standard bowing gestures
– e.g., up/down, legato/marcato/spiccato
(Fiebrink, Schedel, and Threw, 2010)
• Outcomes: classifiers, improved software, written
notes on engineering process
34
Study 4: Composition/composer case studies
• Completed: Winter 2010 to present
– CMMV (Dan Trueman, faculty)
– Martlet (v 1.0) (Michelle Nagai, graduate student)
– G (Raymond Weitekamp, undergraduate)
– Blinky; nets0 (Rebecca Fiebrink)
• Interviews completed with Michelle and Raymond
35
Outline
• Overview of computer music and machine learning
• The Wekinator: A new interface for using machine
learning algorithms
• Live demo + video
• Completed studies
• Findings
• Further work for FPO and beyond
• Wrap-up
36
Findings to date
1. Interacting with supervised learning
2. Training the user
3. Supervised learning in a creative context
4. Usability summary
37
Interactively training
ADD & MODIFY DATA → TRAIN → RUN & EVALUATE → (repeat)
• Primary means of control: iteratively edit the dataset, retrain, and re-evaluate
• A straightforward way of affecting the model
– Add data to make a model more complex
– Add or delete data to correct errors
38
Exercising control via the dataset
Average number of actions per task, PLOrk students (N=21)
[Bar chart, scale 0–30: added examples, edited examples, deleted all examples, deleted some examples, changed classifier, changed algorithm parameters]
Students re-trained an average of 4.64 times per task (std. dev. 4.91)
39
The interface to the training data is important
• Real-time example recording and a single interface
improve efficiency
• Supports embodiment and higher-level thinking
– Several composers used play-along learning as the dominant method
• Supports different granularities of control
– K-Bow: visual label editing interface
– Spreadsheet editor is still used
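A rough sketch of play-along recording, under stated assumptions (the score iterator, feature extractor, and synth interfaces are hypothetical): the system steps through target parameter settings, the user performs along, and every frame becomes one training pair.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical play-along recording: while the system demonstrates a parameter
// score and the user performs along, each frame pairs the current features
// with the current target parameters as one training example.
class PlayAlongSketch {
    interface FeatureExtractor { double[] poll(); }
    interface Synth { void setParameters(double[] params); }

    static List<double[]> record(Iterator<double[]> score,
                                 FeatureExtractor fx, Synth synth) {
        List<double[]> examples = new ArrayList<>();
        while (score.hasNext()) {
            double[] target = score.next();   // parameters being demonstrated
            synth.setParameters(target);      // user hears the target sound...
            double[] features = fx.poll();    // ...and gestures along to it
            double[] example = new double[features.length + target.length];
            System.arraycopy(features, 0, example, 0, features.length);
            System.arraycopy(target, 0, example, features.length, target.length);
            examples.add(example);            // inputs + outputs, ready for training
        }
        return examples;
    }
}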
40
Interactive evaluation
• Evaluation of models is also an interactive process in
Wekinator
41
“Traditional” evaluation (e.g., Weka)
available data → split into training set + evaluation set; train model on the training set; evaluate on the held-out evaluation set
42
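For reference, this is roughly what the traditional procedure looks like with Weka's standard Evaluation class (a generic sketch; the 80/20 split, J48, and fold count are arbitrary choices, not anything prescribed by the slide):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class TraditionalEvalSketch {
    // Hold out part of the available data, train on the rest, score on the held-out set.
    static void holdOut(Instances data) throws Exception {
        data.randomize(new Random(1));
        int trainSize = (int) (data.numInstances() * 0.8);
        Instances train = new Instances(data, 0, trainSize);
        Instances test = new Instances(data, trainSize, data.numInstances() - trainSize);

        J48 tree = new J48();
        tree.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.println(eval.toSummaryString());
    }

    // Or: 10-fold cross-validation over all available data.
    static void crossValidate(Instances data) throws Exception {
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.printf("CV accuracy: %.1f%%%n", eval.pctCorrect());
    }
}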
Evaluation in Wekinator
training set → train model → evaluate
43
Interactive evaluation
• Running models is primary mode of evaluation
– In PLOrk study:
• Model run & used: 5.3 times (std. dev. 5.3) per task;
• On average, 4.0 minutes (out of 19 minutes) running
• CV computed: 1.4 times (std dev. 2.6) per task
• Traditional metrics also useful
– Compare different classifiers quickly (K-Bow)
– Validation (of the user’s model-building ability)
44
When is this interaction feasible?
1. Appropriate and possible for the human to provide and/or modify the data
– User has knowledge of (and possibly control over) the future input space
2. Training process is fast
– Training time in PLOrk: median .80 seconds; 71% of trainings under 5 seconds
– PLOrk # training examples in final round: mean 692, std. dev. 610
45
Related approaches to interactive learning
• Building models of the user
– Standard in speech recognition systems
• Use human experts to improve a model of other
phenomena
– Vision: Fails and Olsen, 2003
– Document classification: Baker, Bhandari, and
Thotakura, 2009
– Web images: Amershi et al., 2010
• Novel in music, novel for a general-purpose tool
46
Findings to date
1. Interacting with supervised learning
2. Training the user
3. Supervised learning in a creative context
4. Usability summary
47
Interaction is two-way
control → machine learning algorithms → feedback (via running & evaluation)
48
Training the user to provide better training examples
1. Minimize noise and choose easily differentiable classes
49
PLOrk students learned:
“In collecting data, it is crucial, especially in
Motion Sensor, that the positions recorded are
exaggerated (i.e. tilt all the way, as opposed to
only halfway.) Usually this will do the trick…”
“I tried to use very clear examples of contrast in
[input features]... If the examples I recorded had
values that were not as satisfactory, I deleted
them and rerecorded… until the model
understood the difference…”
50
Training the user to provide better training examples (cont.)
2. Minimize risk and balance class representations
51
PLOrk students learned:
“I tried to assess which [sounds] I would use more
often and correlate them with [features] that were
easier to obtain on the [input device]…”
“Each extreme of a parameter should be trained with
roughly the same number of examples...”
52
Other desirable user training
• Train the user to:
– change algorithms, their parameters, or features
– change compositional goals
– or not use machine learning at all
• Compositional goals changed frequently
• Insufficient feedback leads to frustration
53
Findings to date
1. Interacting with supervised
learning
2. Training the user
3. Supervised learning in a creative
context
4. Usability summary
54
Exploration and prototyping
• Rapid exploration of alternatives is an important creative task (Shneiderman 2000)
• Iterative re-training allows quick exploration of alternative designs
• Rapid prototyping is easy… even useful for problems where machine learning isn’t necessary
55
Access to surprise and complexity
• Neural networks are a black box for
complexity and surprise
– Acoustic instruments have complex,
non-linear “mappings”
– Gesturally exploring the output
space of a new mapping can lead
to serendipitous discovery
56
“There is simply no way I would be able to
manually create the mappings that the Wekinator
comes up with; being able to playfully explore a
space that I've roughly mapped out, but that the
Wekinator has provided the detail for, is inspiring.”
57
Access to surprise and complexity (cont.)
• Training data creation strategies enforced constraints & boundaries
58
Findings to date
1. Interacting with supervised learning
2. Training the user
3. Supervised learning in a creative context
4. Usability summary
59
Usability: According to PLOrk
60
Statement: 5-pt. Likert mean (std. dev.)
“I can reliably predict what sound my model will make for a given input gesture.”: 4.3 (.9)
“Wekinator eventually learned what I wanted it to.”: 4.9 (.2)
“My model provides reliable gesture classifications” (discrete task): 4.5 (.7)
“My model is musically expressive” (continuous task): 4.1 (.7)
Usability: According to composers
Statement: 5-pt. Likert mean (std. dev.)
“The Wekinator allows me to create more expressive mappings than other techniques.”: 4.5 (.8)
“The Wekinator allows me to create mappings more easily than other techniques.”: 4.7 (.5)
61
“Well, I had basically lost interest in the whole process of
digital controller-based instrument building, so the
Wekinator's very existence has enabled and inspired me
to get back into the game... The Wekinator enables you to
focus on what your primary sonic and physical concerns
are, and takes away the need to address so many details,
and it does so in such a way that even if you DID spend all
the time on building the mappings manually, you would
*never* come up with what the Wekinator comes up
with. So, the process becomes more focused, more
musical, more creative, more playful. I actually *want* to
do it.” (Composer)
“I love Wekinator!” (PLOrk)
62
Outline
• Overview of computer music and machine learning
• The Wekinator: A new interface for using machine
learning algorithms
• Live demo + video
• Completed studies
• Findings
• Further work for FPO and beyond
• Wrap-up
63
Further work to do
• RED = Propose to complete before FPO
• GREY = Propose to complete after FPO
64
Further work: Software
• Fix known bugs
• Add top-priority design features
– E.g., algorithms, feature handling, GUI modifications
• More reliable, light-weight logging
• Up-to-date instructions & help
• Repository of example feature extractors, synthesis
modules, and full pieces
65
Further work: Bow gesture analysis
• Package existing gesture recognizer (with logging) for
wider distribution to K-Bow users
• Leverage existing collaboration to obtain more data
– Construct ground-truth dataset of cello bow gestures in
performance
– Use it for quantitative evaluation of our interactively-built
models
• Extend to multiple cellists (if possible)
• Lots of potential future work on cross-performer gesture
variability and interactive model personalization
66
Further work: Continuing work with composers
• Ongoing composition projects whose timelines may fit into
the dissertation:
– Magnetic resonator piano (Andrew McPherson, postdoc @
Drexel)
– Other misc. projects with unknown futures / timelines (Michelle
Nagai, Jeff Snyder, Laetitia Sonami, …)
• Continue to informally study composers using the Wekinator
and support them with software improvements when feasible
• Distribute logging versions of Wekinator to all composers
using the system
67
Further work: Underscoring importance of
interaction in ML with controlled user studies
• Show effects of play-along example generation,
interactive training set creation, interactive/hands-on
evaluation on model accuracy, user satisfaction,
model creation time, etc.
• Requires study-specific interface implementation &
infrastructure
68
Further work: Studying ML as a creative tool
• Implement new algorithms and interfaces to
specifically support creative work
• Delineate different types of creative tasks in music,
and associated differences in interface & interaction
requirements
69
Further work: Wekinator
• Make Wekinator the tool of choice for anyone doing
machine learning in music (or other real-time
domains)
• Make it a practical foundation for HCI and music
researchers (me) to test out different machine
learning interfaces, algorithms, and interaction flows
70
Outline
• Overview of computer music and machine learning
• The Wekinator: A new interface for using machine
learning algorithms
• Live demo + video
• Completed studies
• Findings
• Further work for FPO and beyond
• Wrap-up
71
Larger goals going forward
• Expanding breadth of HCI research in computer
music to focus on composition
• Understanding supervised learning as a creativity
support tool
• Understanding the possible roles of human-computer interaction in applied supervised learning
72
Contributions of my dissertation work
1. A software tool for interacting with supervised learning
in new ways and applying supervised learning to real-time
domains
2. Insight into the role of human-computer interaction in
computer music composition and instrument design
3. Greater knowledge of the scope of interactive and
creative possibilities in applied machine learning
4. New musical works
5. Evidence of the importance of interfaces and interaction
in applying algorithms in the real world
73
Applied supervised learning is an HCI
problem.
(And an especially interesting one in real-time, creative
domains!)
Thanks!
• Perry Cook
• Dan Trueman
• Dan Morris
• Ken Steiglitz
• Adam Finkelstein
• Cameron Britt
• Michelle Nagai
• Konrad Kaczmarek
• Michael Early
• MR Daniel
• Anne Hege
• Raymond Weitekamp
• Andrew McPherson
• All the PLOrk students
• Xiaojuan Ma
• Sonya Nikolova
• Ge Wang
• Matt Hoffmann
• Merrie Morris
• Sumit Basu
• Ichiro Fujinaga
• Jeff Snyder
• National Science Foundation GRFP
• MacArthur Foundation
75
Related publications
• Fiebrink, R. 2006. An exploration of feature selection as an optimization tool for musical genre classification. Master’s thesis, McGill University.
• Fiebrink, R., P. R. Cook, and D. Trueman. 2009. “Play-along mapping of musical controllers.” Proc. International Computer Music Conference.
• Fiebrink, R., M. Schedel, and B. Threw. 2010. “Constructing a personalizable gesture-recognizer infrastructure for the K-Bow.” International Conference on Music and Gesture (MG3).
• Fiebrink, R., D. Trueman, C. Britt, M. Nagai, K. Kaczmarek, M. Early, M. R. Daniel, A. Hege, and P. R. Cook. 2010. “Toward understanding human-computer interactions in composing the instrument.” Proc. International Computer Music Conference.
• Fiebrink, R., D. Trueman, and P. R. Cook. 2009. “A meta-instrument for interactive, on-the-fly learning.” Proc. New Interfaces for Musical Expression.
• Fiebrink, R., G. Wang, and P. R. Cook. 2007. “Don't forget the laptop: Using native input capabilities for expressive musical control.” Proc. International Conference on New Interfaces for Musical Expression.
• Fiebrink, R., G. Wang, and P. R. Cook. 2008. “Support for MIR prototyping and real-time applications in the ChucK programming language.” Proc. International Conference on Music Information Retrieval.
• Wang, G., R. Fiebrink, and P. R. Cook. 2007. “Combining analysis and synthesis in the ChucK programming language.” Proc. International Computer Music Conference.
76
References
• Amershi, S., J. Fogarty, A. Kapoor, and D. Tan. 2010. “Examining multiple potential models in end-user interactive concept learning.” Proc. CHI 2010.
• Baker, K., A. Bhandari, and R. Thotakura. 2009. “Designing an interactive automatic document classification system.” Proc. HCIR 2009, pp. 30–33.
• Fails, J., and D. Olsen. 2003. “Interactive machine learning.” Proc. IUI, pp. 39–45.
• Fels, S. S., and G. E. Hinton. 1993. “Glove-Talk: A neural network interface between a data-glove and a speech synthesizer.” IEEE Trans. on Neural Networks, vol. 4.
• Lee, M., A. Freed, and D. Wessel. 1992. “Neural networks for simultaneous classification and parameter estimation in musical instrument control.” Adaptive and Learning Systems, vol. 1706, pp. 244–255.
• Raphael, C. 2001. “A probabilistic expert system for automatic musical accompaniment.” Journal of Computational and Graphical Statistics, vol. 10, no. 3, pp. 487–512.
• Shneiderman, B. 2000. “Creating creativity: User interfaces for supporting innovation.” ACM Trans. CHI, vol. 7, no. 1, pp. 114–138.
• Shneiderman, B. 2007. “Creativity support tools: Accelerating discovery and innovation.” Comm. ACM, vol. 50, no. 12, pp. 20–32.
• Witten, I., and E. Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Francisco: Morgan Kaufmann.
77
The role of accuracy
• Do we care?
– Yes!
• Role affected by task and interaction paradigm
• Training vs. generalization accuracy
78