Songsmith: Using Machine Learning to Help People Make Music
  Dan Morris, Ian Simon, Sumit Basu, and the MSR Advanced Development Team
  Microsoft Research

Computational User Experiences (CUE) group
  In general: HCI + (sensors, devices, machine learning, health, physiology)
  Using physiological signals for input
  Health and wellness
  Creativity support tools

What is Songsmith?
  [Diagram: You, singing... / Songsmith / Your music]
  “Automatic accompaniment generation for vocal melodies”

High-Risk Live Demo
  (What other kind of live demo is there?)

Who is Songsmith for?

Today’s Talk
  Overview and demo
  How Songsmith works
  Exposing machine learning parameters
  What are people doing with Songsmith?
  Creativity support tools @ Microsoft Research

Songsmith: 5000’ Overview
  [Figure: chord labels G, Amin, C, Daug over a pitch axis C, C#, D, D#, E, F, F#, G, G#, A, A#, B]

Chords from Melody
  Songsmith’s core: Hidden Markov Model
  [Diagram: Song start, Chord 1, Chord 2, Chord 3, Chord 4, Song end as hidden states (chords), with observations (note vectors)]

HMMs in 5 Minutes or Less
  What does an HMM do for me?
  What does an HMM want from me?
  [Diagram labels: States, Observations]

HMMs in 5 Minutes or Less
  Possible states:
    C major
    C minor
    C diminished
    …
  Transition probabilities:
    P(C Major | A Minor)?
    P(C Major | D Major)?
    …
    P(F# Major | F# Major)?
  Observation probabilities:
    P(notes | A Minor)?
    P(notes | D Major)?
    …
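To make the three ingredients above concrete, here is a minimal Python sketch of a toy chord HMM. Everything in it is an illustrative assumption rather than anything taken from Songsmith: the two-chord state set, all probability values, the uniform start distribution, and the helper name `sequence_log_prob` are made up, and each observation is simplified to a single sung pitch class instead of a note vector.

```python
import math

# Hypothetical toy model: two chords as hidden states (not Songsmith's real state set).
STATES = ["C major", "A minor"]

# Probability of each chord at the start of the song (assumed uniform).
START = {"C major": 0.5, "A minor": 0.5}

# Transition probabilities P(next chord | current chord); made-up values, rows sum to 1.
TRANSITION = {
    "C major": {"C major": 0.7, "A minor": 0.3},
    "A minor": {"C major": 0.4, "A minor": 0.6},
}

# Observation probabilities P(sung pitch class | chord); made-up values, rows sum to 1.
OBSERVATION = {
    "C major": {"C": 0.4, "E": 0.3, "G": 0.2, "A": 0.1},
    "A minor": {"C": 0.3, "E": 0.3, "G": 0.05, "A": 0.35},
}


def sequence_log_prob(chords, notes):
    """Log joint probability of a chord sequence and the notes sung over it."""
    if not chords or len(chords) != len(notes):
        raise ValueError("need one note per chord and at least one chord")
    # First position: start probability times observation probability.
    logp = math.log(START[chords[0]]) + math.log(OBSERVATION[chords[0]][notes[0]])
    # Later positions: one transition term and one observation term each.
    for prev, cur, note in zip(chords, chords[1:], notes[1:]):
        logp += math.log(TRANSITION[prev][cur]) + math.log(OBSERVATION[cur][note])
    return logp
```

With these made-up numbers, singing G then A scores higher under "C major, A minor" than under "A minor, A minor", which is the kind of comparison the HMM makes when it picks chords for a melody.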
Building our HMM
  Things my HMM wants from me:
    Possible states
    Transition probabilities
    Observation probabilities
    Observations

Finding the Probabilities
  Data-driven
  Not heuristic-driven

Training Data
  ~300 lead sheets (vocal melodies with chords)

Processing the Database
  Convert all chords to five basic triads
  Transpose every song into the same key
  Count the number of transitions from every chord to every other chord (transition probabilities):

                C Major  C Minor  C# Major  C# Minor  D Major  D Minor  …
    C Major     472      22       …         …         …        …
    C Minor     9        314      …         …         …        …
    C# Major    …        …        …         …         …        …
    C# Minor    …        …        …         …         …        …
    D Major     …        …        …         …         …        …
    D Minor     …        …        …         …         …        …
    Other counts visible on the slide: 35, 50, 76, 189, 0, 44, 39, 71

  Count the total duration of each melody note occurring while each chord is playing (observation probabilities):
    [Figure: histogram “Notes played over C Major:”, pitch classes C, C#, D, D#, E, F, F#, G, G#, A, A#, B on the horizontal axis, values from 0 to 0.25]

Building our HMM
  Things my HMM wants from me:
    Possible states
    Transition probabilities
    Observation probabilities
    Observations

Observations: what did the user sing?
  Input: Hard!

Building a Pitch Histogram
  FFT

Building our HMM
  Things my HMM wants from me:
    Possible states
    Transition probabilities
    Observation probabilities
    Observations
  Run Viterbi algorithm, get “best” sequence of chords… thank you, HMM!
  One hitch: key determination…

Today’s Talk
  Overview and demo
  How Songsmith works
  Exposing machine learning parameters
  What are people doing with Songsmith?
  Creativity support tools @ Microsoft Research

So we can choose optimal chords... so what?
  My demo took 10 seconds… is that fun?
  What would a songwriter do next?
  How can we build creative exploration into a learning-driven system?

UI: Exposing Learning Parameters
  There are always hard-coded “magic numbers” in machine learning
  Machine learning also uses lots of learned parameters
  Can we let users control those numbers?

A Bad User Interface
  Songsmith: A fun way to make music (if you have a PhD in math and/or computer science)
  Transition matrix (edit me!)
          C     C#    D     D#    E
    C     0.3   0.5   1.0   0.0   0.7
    C#    0.9   0.3   0.1   0.8   0.0
    D     0.4   0.8   0.5   0.6   0.1
    D#    1.0   0.9   0.2   0.7   0.2
    E     0.8   0.4   0.4   0.2   1.0
    F     0.9   0.1   0.6   0.6   0.4
  Expected pitch histograms (edit me!)
    C     0.2   0.9   0.1   0.2   0.4
    C#    0.1   0.6   0.4   0.6   0.2
    D     0.5   0.6   0.4   0.7   0.5
    D#    0.1   0.3   0.9   0.1   0.7
    E     0.4   0.1   0.1   0.3   0.3
  Observation weight:
  Conjugate prior:
  Frequency smoothing:

UI: Exposing Learning Parameters
  Can we let users intuitively control those numbers?

Exposing Model Parameters in Songsmith

The “Happy Factor”

Happy Factor: Implementation
  Partition database into two databases (major and minor songs) using clustering
  Build separate transition probability matrix for each database
  When we actually run our HMM, blend the two transition matrices together according to user input…

The “Happy Factor”
  $$0 \le \alpha \le 1$$
  $$\log P(c_i \mid c_{i-1}) = \alpha \log P_{maj}(c_i \mid c_{i-1}) + (1 - \alpha) \log P_{min}(c_i \mid c_{i-1})$$
  Bonus question: what’s wrong with this equation?

Another “Happy Factor” Example
  Happy
  Sad

The “Jazz Factor”

Jazz Factor: Implementation
  When running our HMM, we need to make chords match the voice and each other
  Computing how well each chord fits at a given position:
    k • log( P(this chord | what the user sang) ) + (1-k) • log( P(this chord | the previous chord) )
  Just put k on a slider!

Chord Locking
  Global sliders are very coarse
  Chords can be “locked” by the user
  $$P(c_i \mid c_{i-1}) = \begin{cases} 1 & c_i = C_{lock} \\ 0 & c_i \ne C_{lock} \end{cases}$$

“Suggested Chords”
  Songwriters will often explore “chord substitutions”…
  …but we’re assuming our audience doesn’t know that much music theory…
  Expose suboptimal marginal probabilities at each node as “suggestions”
  $$L_i = k \log P(x_i \mid c_i) + (1 - k) \log P(c_i \mid c_{i-1}) + (1 - k) \log P(c_{i+1} \mid c_i)$$

Interactive Machine Learning
  Songsmith is one example of IML
  Roughly: moving what used to be in the domain of ML experts into users’ hands
  Related work: image classification
    Fogarty et al., CHI 2008: CueFlik
    Fails and Olsen, IUI 2003: Crayons

Why?
  Harness end-user knowledge
  Use ML as a tool for data exploration
  Use ML as a tool for creative expression

Today’s Talk
  Overview and demo
  How Songsmith works
  Exposing machine learning parameters
  What are people doing with Songsmith?
  Creativity support tools @ Microsoft Research

Dynamic Mapping of Physical Controls for Tabletop Groupware (Fiebrink, CHI 2009)
  One project, two problems:
    1. Direct-touch, tabletop input is great for collaboration… but suffers from serious precision issues.
    2. Working on music alone is boring.
  Incorporate high-precision controllers into a tabletop environment
  Evaluate in a collaborative audio-editing app

Data-Driven Exploration of Musical Chord Sequences (Nichols, IUI 2009)
  Problem: let people create and explore music by moving around in a reduced-dimensionality space
  Genres, artists make for intuitive labels
  Why isn’t this easy?
  Solution(s): Divergence-maximizing clustering, PCA

Future work?
  Make Songsmith freakin’ amazing, and port it to a bajillion platforms, and make an amazing community Web site, and build it into audio hosts… etc…
  Other applications of machine learning in creativity support tools…
    Writing? Painting? Web Design? CAD? Music?

Songsmith
  research.microsoft.com/songsmith
  dan@microsoft.com
  Live/Google: songsmith
  Dan Morris, Ian Simon, Sumit Basu, and the MSR Advanced Development Team
  Microsoft Research
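As a rough illustration of the ideas in the “How Songsmith works” and “Exposing machine learning parameters” parts of the talk, the Python sketch below combines Viterbi decoding with a blended transition matrix (the “Happy Factor”), an observation weight k (the “Jazz Factor”), and chord locking. It is a sketch under stated assumptions, not Songsmith’s implementation: the function names, the argument layout, and the use of an observation log-likelihood per chord are all hypothetical, and every log-probability is assumed finite (that is, the matrices have been smoothed so no entry is zero).

```python
NEG_INF = float("-inf")


def blend_log_transitions(log_maj, log_min, happy):
    """Blend two log transition matrices: happy*log_maj + (1-happy)*log_min.

    The result is used as a score table; it is not renormalized.
    """
    n = len(log_maj)
    return [[happy * log_maj[i][j] + (1.0 - happy) * log_min[i][j]
             for j in range(n)] for i in range(n)]


def viterbi(log_obs, log_trans, k=0.5, locked=None):
    """Return the highest-scoring chord index sequence.

    log_obs[t][s]   : log-likelihood of what was sung at position t under chord s
    log_trans[p][s] : log transition score from chord p to chord s
    k               : weight on the observation term; (1-k) goes to transitions
    locked          : optional dict {position: chord index} fixed by the user
    """
    locked = locked or {}
    n_steps, n_states = len(log_obs), len(log_obs[0])

    def allowed(t, s):
        # A locked position only admits the locked chord.
        return t not in locked or locked[t] == s

    score = [k * log_obs[0][s] if allowed(0, s) else NEG_INF
             for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of chord s at position t
    for t in range(1, n_steps):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + (1.0 - k) * log_trans[p][s])
            total = (score[best_prev] + (1.0 - k) * log_trans[best_prev][s]
                     + k * log_obs[t][s])
            new_score.append(total if allowed(t, s) else NEG_INF)
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final chord.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

In this sketch, moving k toward 1 makes the chords follow the voice more closely, moving it toward 0 makes them follow typical chord progressions, and a locked chord simply removes every other candidate at that position before decoding.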