Real-time Human-Computer Interaction with Supervised Learning Algorithms for Music Composition and Performance
Rebecca Fiebrink
Perry Cook, Advisor
Pre-FPO, 6/14/2010

[Image; source: googleisagiantrobot.com]

function [x flag hist dt] = pagerank(A,optionsu)
[m n] = size(A);
if (m ~= n)
    error('pagerank:invalidParameter', 'the matrix A must be square');
end;
options = struct('tol', 1e-7, 'maxiter', 500, 'v', ones(n,1)./n, ...
    'c', 0.85, 'verbose', 0, 'alg', 'arnoldi', ...
    'linsys_solver', @(f,v,tol,its) bicgstab(f,v,tol,its), ...
    'arnoldi_k', 8, 'approx_bp', 1e-3, 'approx_boundary', inf, ...
    'approx_subiter', 5);
if (nargin > 1)
    options = merge_structs(optionsu, options);
end;
if (size(options.v) ~= size(A,1))
    error('pagerank:invalidParameter', ...
        'the vector v must have the same size as A');
end;
if (~issparse(A))
    A = sparse(A);
end;
% normalize the matrix
P = normout(A);
switch (options.alg)
    case 'dense'
        [x flag hist dt] = pagerank_dense(P, options);
    case 'linsys'
        [x flag hist dt] = pagerank_linsys(P, options);
    case 'gs'
        [x flag hist dt] = pagerank_gs(P, options);
    case 'power'
        [x flag hist dt] = pagerank_power(P, options);
    case 'arnoldi'
        [x flag hist dt] = pagerank_arnoldi(P, options);
    case 'approx'
        [x flag hist dt] = pagerank_approx(P, options);
    case 'eval'
        [x flag hist dt] = pagerank_eval(P, options);

Effective / Efficient / Satisfying

Machine learning algorithms

Outline
• Overview of computer music and machine learning
• The Wekinator: A new interface for using machine learning algorithms
• Live demo + video
• Completed studies
• Findings
• Further work for FPO and beyond
• Wrap-up

Computer music

Interactive computer music (diagram): sensed action → interpretation (computer) → response (music, visuals, etc.)

Computer as instrument (diagram): sensed action → interpretation → sound generation (computer)

Computer as instrument (diagram): sensed action → interpretation / mapping → sound generation (computer)

Computer as collaborator (diagram): human + control interface (microphone and/or sensors) → sensed action → interpretation / model / meaning → sound generation (computer)

A composed system (diagram): sensed action → mapping/model/interpretation → response (computer)

Supervised learning

Supervised learning (diagram)
• Training: inputs and outputs → training data → algorithm → model

Supervised learning (diagram)
• Training: inputs labeled with outputs ("C Major", "F minor", "G7") → training data → algorithm → model
• Running: new inputs → model → outputs (e.g., "F minor")

Supervised learning is useful
• Models capture complex relationships from the data and generalize to new inputs. (accurate)
• Supervised learning circumvents the need to explicitly define mapping functions or models. (efficient)
So why isn't it used more often?

A lack of usable tools for making music
What is needed:
1. General-purpose: many algorithms & applications
2. Runs on real-time signals
3. Appropriate user interface and interaction support
Existing computer music tools: built by engineer-musicians for specific applications.
Weka (Witten and Frank, 2005): many standard algorithms; applies to any dataset; graphical interface + API; > 10,000 citations (Google Scholar).
Neither meets all three requirements.

(Outline repeated; next: The Wekinator)

The Wekinator
• A general-purpose, real-time tool with appropriate interfaces for using and constructing supervised learning systems
• Built on Weka APIs
• Downloadable at http://code.google.com/p/wekinator/
(Fiebrink, Cook, and Trueman 2009; Fiebrink, Trueman, and Cook 2009; Fiebrink et al. 2010)
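To make the training/running pipeline above concrete, here is a minimal, hypothetical Java sketch using the Weka API that the Wekinator builds on (a recent Weka version is assumed). The feature values, the ChordModelSketch and addExample names, and the choice of k-nearest neighbor are illustrative only; this is not the Wekinator's own code. The chord labels come from the supervised-learning slide above.

import java.util.ArrayList;
import weka.classifiers.Classifier;
import weka.classifiers.lazy.IBk;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

public class ChordModelSketch {
    public static void main(String[] args) throws Exception {
        // Two numeric input features and one nominal output label (the chord).
        ArrayList<Attribute> attrs = new ArrayList<Attribute>();
        attrs.add(new Attribute("feature1"));
        attrs.add(new Attribute("feature2"));
        ArrayList<String> chords = new ArrayList<String>();
        chords.add("C Major");
        chords.add("F minor");
        chords.add("G7");
        attrs.add(new Attribute("chord", chords));

        // Training: labeled examples -> learning algorithm -> model.
        Instances train = new Instances("chord-examples", attrs, 0);
        train.setClassIndex(train.numAttributes() - 1);
        addExample(train, 0.10, 0.72, "C Major");
        addExample(train, 0.85, 0.20, "F minor");
        addExample(train, 0.45, 0.95, "G7");

        Classifier model = new IBk(1);   // k-nearest neighbor; one of several possible algorithms
        model.buildClassifier(train);

        // Running: a new, unlabeled input -> trained model -> output label.
        Instance input = new DenseInstance(train.numAttributes());
        input.setDataset(train);
        input.setValue(0, 0.12);
        input.setValue(1, 0.70);
        int predicted = (int) model.classifyInstance(input);
        System.out.println("Predicted chord: " + train.classAttribute().value(predicted));
    }

    // Helper that appends one labeled training example.
    private static void addExample(Instances data, double f1, double f2, String chord) {
        Instance inst = new DenseInstance(data.numAttributes());
        inst.setDataset(data);
        inst.setValue(0, f1);
        inst.setValue(1, f2);
        inst.setValue(2, chord);
        data.add(inst);
    }
}

Training produces a model from labeled examples; running applies that model to a new, unlabeled input, exactly as in the diagram above.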
A tool for running models in real time (diagram): feature extractor(s) → feature vectors (e.g., .01, .59, .03, …) → model(s) → output parameters (e.g., 5, .01, 22.7, …) → parameterizable process

A tool for real-time, interactive design
• Wekinator supports user interaction with all stages of the model creation process: ADD & MODIFY DATA → TRAIN → RUN & EVALUATE (and around again)

Under the hood (diagram)
• Input features: joystick_x, joystick_y, webcam_1, … (Feature1, Feature2, Feature3, … FeatureN)
• Learning algorithms
– Classification: AdaBoost.M1, J48 decision tree, support vector machine, k-nearest neighbor
– Regression: MultilayerPerceptron
• Models: Model1, Model2, … ModelM
• Output parameters: Parameter1, Parameter2, … ParameterM (e.g., volume, pitch)
• Tailored but not limited to music (a code sketch of this feature-to-parameter flow follows below)

The Wekinator
• Built-in feature extractors for music & gesture
• ChucK API for feature extractors and synthesis classes
• Communicates over Open Sound Control (UDP): other feature extraction modules send features in; control messages go out to other modules for sound synthesis, animation, …
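The following is a hypothetical sketch of the per-frame flow in the "Under the hood" diagram: one incoming feature vector is pushed through a bank of trained Weka models, one per output parameter (regression models such as MultilayerPerceptron for continuous parameters like volume or pitch, classifiers such as J48 or k-nearest neighbor for discrete classes). The class and method names are invented for illustration; this is not the Wekinator's actual implementation.

import weka.classifiers.Classifier;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

// Illustrative only: one feature frame in, one value out per model.
// Each model is assumed to have been trained on an Instances "header"
// whose last attribute is the output (numeric for regression models such
// as MultilayerPerceptron, nominal for classifiers such as J48 or IBk).
public class RealTimeMapperSketch {
    private final Classifier[] models;   // one model per output parameter (e.g., volume, pitch, class)
    private final Instances[] headers;   // dataset structure each model was trained with

    public RealTimeMapperSketch(Classifier[] models, Instances[] headers) {
        this.models = models;
        this.headers = headers;
    }

    // Called once per incoming feature frame (e.g., joystick_x, joystick_y, webcam_1, ...).
    public double[] map(double[] features) throws Exception {
        double[] outputs = new double[models.length];
        for (int m = 0; m < models.length; m++) {
            Instance in = new DenseInstance(headers[m].numAttributes());
            in.setDataset(headers[m]);
            for (int f = 0; f < features.length; f++) {
                in.setValue(f, features[f]);
            }
            // For a numeric class this is the regression output (a continuous parameter);
            // for a nominal class it is the index of the predicted discrete class.
            outputs[m] = models[m].classifyInstance(in);
        }
        return outputs;
    }
}

The resulting values would then be passed on to the synthesis or animation module, for example as Open Sound Control messages, as in the architecture slide above.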
(Outline repeated; next: Live demo + video)

Wekinator in performance

Recap: what's new?
• Runs on real-time signals and is general-purpose
• A single interface for building and running models
• Comprehensive support for interactions appropriate to computer music tasks

(Outline repeated; next: Completed studies)

Study 1: Participatory design process with 7 composers
• Fall semester 2009
• 10 weeks, 3 hours / week
• Group discussion, experimentation, and evaluation
• Iterative design
• Final questionnaire
(Fiebrink et al., 2010)

Study 2: Teaching interactive systems building in PLOrk
• COS/MUS 314, Spring 2010
• Focus on interactive music systems building
• Wekinator midterm assignment
– Master the process of building continuous and discrete gestural control systems, and use them in a performance
• Logging + questionnaire
• Final projects

Study 3: Bow gesture recognition
• Winter 2010
• Work with a composer/cellist to build a gesture recognizer for a commercial sensor bow
• Classify standard bowing gestures
– e.g., up/down, legato/marcato/spiccato
• Outcomes: classifiers, improved software, written notes on the engineering process
(Fiebrink, Schedel, and Threw, 2010)

Study 4: Composition/composer case studies
• Completed: Winter 2010 to present
– CMMV (Dan Trueman, faculty)
– Martlet (v 1.0) (Michelle Nagai, graduate student)
– G (Raymond Weitekamp, undergraduate)
– Blinky; nets0 (Rebecca Fiebrink)
• Interviews completed with Michelle and Raymond

(Outline repeated; next: Findings)

Findings to date
1. Interacting with supervised learning
2. Training the user
3. Supervised learning in a creative context
4. Usability summary

Interactively training
• ADD & MODIFY DATA → TRAIN → RUN & EVALUATE
• Primary means of control: iteratively edit the dataset, retrain, and re-evaluate
• A straightforward way of affecting the model
– Add data to make a model more complex
– Add or delete data to correct errors

Exercising control via the dataset
[Bar chart: average number of actions per task, PLOrk students; actions: added examples, edited examples, deleted all examples, deleted some examples, changed classifier, changed algorithm parameters; scale 0–30]
N=21; students re-trained an average of 4.64 times per task (std. dev. 4.91)

The interface to the training data is important
• Real-time example recording and a single interface improve efficiency
• Supports embodiment and higher-level thinking
– Several composers used play-along learning as the dominant method
• Supports different granularities of control
– K-Bow: visual label-editing interface
– Spreadsheet editor is still used

Interactive evaluation
• Evaluation of models is also an interactive process in Wekinator

"Traditional" evaluation (e.g., Weka) (diagram): available data is split into a training set and an evaluation set; train the model on the training set; evaluate it on the evaluation set

Evaluation in Wekinator (diagram): training set → train model → evaluate

Interactive evaluation
• Running models is the primary mode of evaluation
– In the PLOrk study:
• Model run & used: 5.3 times per task (std. dev. 5.3)
• On average, 4.0 minutes (out of 19 minutes) spent running
• Cross-validation computed: 1.4 times per task (std. dev. 2.6)
• Traditional metrics are also useful
– Compare different classifiers quickly (K-Bow)
– Validation (of the user's model-building ability)
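For contrast with the interactive, run-and-listen evaluation just described, here is a minimal, hypothetical Java sketch of how a "traditional" Weka-style number, such as the cross-validation accuracy students occasionally computed, might be obtained, followed by the Wekinator-style alternative of training on all collected examples and then simply running the model. The file name and class name are illustrative.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EvaluationSketch {
    public static void main(String[] args) throws Exception {
        // Load a labeled set of recorded examples (file name is illustrative).
        Instances data = DataSource.read("gesture_examples.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // "Traditional" evaluation: estimate generalization accuracy with
        // 10-fold cross-validation over the collected examples.
        Classifier classifier = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(classifier, data, 10, new Random(1));
        System.out.println("Cross-validation accuracy: " + eval.pctCorrect() + "%");

        // Wekinator-style evaluation: train on all collected examples,
        // then run the model on live input and judge the results directly
        // (by playing, listening, and watching) instead of relying on a held-out set.
        classifier.buildClassifier(data);
        // ... feed real-time feature vectors to classifier.classifyInstance(...) while performing.
    }
}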
When is this interaction feasible?
1. It is appropriate and possible for a human to provide and/or modify the data
– The user has knowledge of, and possibly control over, the future input space
2. The training process is fast
– Training time in PLOrk: median 0.80 seconds; 71% of trainings under 5 seconds
– PLOrk number of training examples in the final round: mean 692, std. dev. 610

Related approaches to interactive learning
• Building models of the user
– Standard in speech recognition systems
• Using human experts to improve a model of other phenomena
– Vision: Fails and Olsen, 2003
– Document classification: Baker, Bhandari, and Thotakura, 2009
– Web images: Amershi et al., 2010
• Novel in music, and novel for a general-purpose tool

(Findings to date, continued: 2. Training the user)

Interaction is two-way (diagram): control flows from the user to the machine learning algorithms; feedback flows back to the user through running & evaluation

Training the user to provide better training examples
1. Minimize noise and choose easily differentiable classes

PLOrk students learned:
"In collecting data, it is crucial, especially in Motion Sensor, that the positions recorded are exaggerated (i.e. tilt all the way, as opposed to only halfway.) Usually this will do the trick…"
"I tried to use very clear examples of contrast in [input features]... If the examples I recorded had values that were not as satisfactory, I deleted them and rerecorded… until the model understood the difference…"

Training the user to provide better training examples (continued)
2. Minimize risk and balance class representations (a small class-balance check is sketched at the end of this findings section)

PLOrk students learned:
"I tried to assess which [sounds] I would use more often and correlate them with [features] that were easier to obtain on the [input device]…"
"Each extreme of a parameter should be trained with roughly the same number of examples..."

Other desirable user training
• Train the user to:
– change algorithms, their parameters, or features
– change compositional goals
– or not use machine learning at all
• Compositional goals changed frequently
• Insufficient feedback leads to frustration

(Findings to date, continued: 3. Supervised learning in a creative context)

Exploration and prototyping
• Rapid exploration of alternatives is an important creative task (Shneiderman 2000)
• Iterative re-training allows quick exploration of alternative designs
• Rapid prototyping is easy… and even useful for problems where machine learning isn't necessary

Access to surprise and complexity
• Neural networks are a black box for complexity and surprise
– Acoustic instruments have complex, non-linear "mappings"
– Gesturally exploring the output space of a new mapping can lead to serendipitous discovery

"There is simply no way I would be able to manually create the mappings that the Wekinator comes up with; being able to playfully explore a space that I've roughly mapped out, but that the Wekinator has provided the detail for, is inspiring."

Access to surprise and complexity (continued)
• Training data creation strategies enforced constraints & boundaries

(Findings to date, continued: 4. Usability summary)

Usability: According to PLOrk
Statement (5-pt. Likert mean, std. dev.):
• "I can reliably predict what sound my model will make for a given input gesture." 4.3 (0.9)
• "Wekinator eventually learned what I wanted it to." 4.9 (0.2)
• "My model provides reliable gesture classifications" (discrete task) 4.5 (0.7)
• "My model is musically expressive" (continuous task) 4.1 (0.7)

Usability: According to composers
Statement (5-pt. Likert mean, std. dev.):
• "The Wekinator allows me to create more expressive mappings than other techniques." 4.5 (0.8)
• "The Wekinator allows me to create mappings more easily than other techniques." 4.7 (0.5)

"Well, I had basically lost interest in the whole process of digital controller-based instrument building, so the Wekinator's very existence has enabled and inspired me to get back into the game... The Wekinator enables you to focus on what your primary sonic and physical concerns are, and takes away the need to address so many details, and it does so in such a way that even if you DID spend all the time on building the mappings manually, you would *never* come up with what the Wekinator comes up with. So, the process becomes more focused, more musical, more creative, more playful. I actually *want* to do it." (Composer)

"I love Wekinator!" (PLOrk student)
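Returning to the "training the user" findings above (balance class representations): a check like the following, hypothetical Weka sketch could report how many recorded examples each class has. The file name and class name are illustrative, and this is not a feature of the Wekinator itself.

import weka.core.AttributeStats;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassBalanceSketch {
    public static void main(String[] args) throws Exception {
        // Any labeled training set with a nominal class will do (file name is illustrative).
        Instances data = DataSource.read("training_examples.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Count the recorded examples per class; a large imbalance suggests
        // recording more examples of the under-represented classes.
        AttributeStats stats = data.attributeStats(data.classIndex());
        for (int c = 0; c < data.classAttribute().numValues(); c++) {
            System.out.println(data.classAttribute().value(c) + ": "
                    + stats.nominalCounts[c] + " examples");
        }
    }
}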
(Outline repeated; next: Further work for FPO and beyond)

Further work to do
• RED = propose to complete before FPO
• GREY = propose to complete after FPO

Further work: Software
• Fix known bugs
• Add top-priority design features
– e.g., algorithms, feature handling, GUI modifications
• More reliable, lightweight logging
• Up-to-date instructions & help
• Repository of example feature extractors, synthesis modules, and full pieces

Further work: Bow gesture analysis
• Package the existing gesture recognizer (with logging) for wider distribution to K-Bow users
• Leverage the existing collaboration to obtain more data
– Construct a ground-truth dataset of cello bow gestures in performance
– Use it for quantitative evaluation of our interactively built models
• Extend to multiple cellists (if possible)
• Lots of potential future work on cross-performer gesture variability and interactive model personalization

Further work: Continuing work with composers
• Ongoing composition projects whose timelines may fit into the dissertation:
– Magnetic resonator piano (Andrew McPherson, postdoc @ Drexel)
– Other misc. projects with unknown futures / timelines (Michelle Nagai, Jeff Snyder, Laetitia Sonami, …)
• Continue to informally study composers using the Wekinator and support them with software improvements when feasible
• Distribute logging versions of the Wekinator to all composers using the system

Further work: Underscoring the importance of interaction in ML with controlled user studies
• Show the effects of play-along example generation, interactive training-set creation, and interactive/hands-on evaluation on model accuracy, user satisfaction, model creation time, etc.
• Requires study-specific interface implementation & infrastructure

Further work: Studying ML as a creative tool
• Implement new algorithms and interfaces to specifically support creative work
• Delineate different types of creative tasks in music, and the associated differences in interface & interaction requirements

Further work: Wekinator
• Make the Wekinator the tool of choice for anyone doing machine learning in music (or other real-time domains)
• Make it a practical foundation for HCI and music researchers (me) to test out different machine learning interfaces, algorithms, and interaction flows

(Outline repeated; next: Wrap-up)

Larger goals going forward
• Expanding the breadth of HCI research in computer music to focus on composition
• Understanding supervised learning as a creativity support tool
• Understanding the possible roles of human-computer interaction in applied supervised learning

Contributions of my dissertation work
1. A software tool for interacting with supervised learning in new ways and applying supervised learning to real-time domains
2. Insight into the role of human-computer interaction in computer music composition and instrument design
3. Greater knowledge of the scope of interactive and creative possibilities in applied machine learning
4. New musical works
5. Evidence of the importance of interfaces and interaction in applying algorithms in the real world

Applied supervised learning is an HCI problem. (And an especially interesting one in real-time, creative domains!)
Thanks!
• Perry Cook
• Dan Trueman
• Dan Morris
• Ken Steiglitz
• Adam Finkelstein
• Cameron Britt
• Michelle Nagai
• Konrad Kaczmarek
• Michael Early
• MR Daniel
• Anne Hege
• Raymond Weitekamp
• Andrew McPherson
• All the PLOrk students
• Xiaojuan Ma
• Sonya Nikolova
• Ge Wang
• Matt Hoffmann
• Merrie Morris
• Sumit Basu
• Ichiro Fujinaga
• Jeff Snyder
• National Science Foundation GRFP
• MacArthur Foundation

Related publications
• Fiebrink, R. 2006. An exploration of feature selection as an optimization tool for musical genre classification. Master's thesis, McGill University.
• Fiebrink, R., P. R. Cook, and D. Trueman. 2009. "Play-along mapping of musical controllers." Proc. International Computer Music Conference.
• Fiebrink, R., M. Schedel, and B. Threw. 2010. "Constructing a personalizable gesture-recognizer infrastructure for the K-Bow." International Conference on Music and Gesture (MG3).
• Fiebrink, R., D. Trueman, C. Britt, M. Nagai, K. Kaczmarek, M. Early, M. R. Daniel, A. Hege, and P. R. Cook. 2010. "Toward understanding human-computer interactions in composing the instrument." Proc. International Computer Music Conference.
• Fiebrink, R., D. Trueman, and P. R. Cook. 2009. "A meta-instrument for interactive, on-the-fly learning." Proc. New Interfaces for Musical Expression.
• Fiebrink, R., G. Wang, and P. R. Cook. 2007. "Don't forget the laptop: Using native input capabilities for expressive musical control." Proc. International Conference on New Interfaces for Musical Expression.
• Fiebrink, R., G. Wang, and P. R. Cook. 2008. "Support for MIR prototyping and real-time applications in the ChucK programming language." Proc. International Conference on Music Information Retrieval.
• Wang, G., R. Fiebrink, and P. R. Cook. 2007. "Combining analysis and synthesis in the ChucK programming language." Proc. International Computer Music Conference.

References
• Amershi, S., J. Fogarty, A. Kapoor, and D. Tan. 2010. "Examining multiple potential models in end-user interactive concept learning." Proc. CHI 2010.
• Baker, K., A. Bhandari, and R. Thotakura. 2009. "Designing an interactive automatic document classification system." Proc. HCIR 2009, pp. 30–33.
• Fails, J., and D. Olsen. 2003. "Interactive machine learning." Proc. IUI, pp. 39–45.
• Fels, S. S., and G. E. Hinton. 1993. "Glove-Talk: A neural network interface between a data-glove and a speech synthesizer." IEEE Transactions on Neural Networks, vol. 4.
• Lee, M., A. Freed, and D. Wessel. 1992. "Neural networks for simultaneous classification and parameter estimation in musical instrument control." Adaptive and Learning Systems, vol. 1706, pp. 244–255.
• Raphael, C. 2001. "A probabilistic expert system for automatic musical accompaniment." Journal of Computational and Graphical Statistics, vol. 10, no. 3, pp. 487–512.
• Shneiderman, B. 2000. "Creating creativity: User interfaces for supporting innovation." ACM Transactions on Computer-Human Interaction, vol. 7, no. 1, pp. 114–138.
• Shneiderman, B. 2007. "Creativity support tools: Accelerating discovery and innovation." Communications of the ACM, vol. 50, no. 12, pp. 20–32.
• Witten, I., and E. Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Francisco: Morgan Kaufmann.

The role of accuracy
• Do we care? Yes!
• Role affected by task and interaction paradigm
• Training vs. generalization accuracy