Polyphonic Music Transcription Using A Dynamic Graphical Model
Barry Rafkind
E6820 Speech and Audio Signal Processing
Wednesday, March 9th, 2005

Presentation Outline
• Motivating Music Transcription
• My Project Proposal
• Project Timeline

Motivating Music Transcription
• Given a musical recording, we wish to obtain a MIDI score for:
  – Performance (convert MIDI to a music score)
  – Analysis (evaluate intonation or number of missed or incorrect notes - useful for music education)
  – Comparison with other music (copyright infringement / search)
  – Replay on MIDI synthesizers (use different instruments / change settings / overlay tracks, etc.)

Recent Previous Work
• Multi-Instrument Musical Transcription Using A Dynamic Graphical Model, Michael Jordan, 2004
• Automatic Transcription of Piano Music, Christopher Raphael, 2002, Univ. of Massachusetts, Amherst
• Polyphonic Pitch Extraction, Graham Poliner, E6820 Speech & Audio Signal Processing, Spring 2004
• Many, many, many more… Try searching Google for PDF documents with the keywords: music transcription

My Project Proposal
• Jordan presents a multi-instrument transcription system capable of listening to a recording in which two or more instruments are playing, and identifying both the notes that were played and the instruments that played them. The system models two musical instruments, each capable of playing at most one note at a time.
• My Goal: implement and improve upon Jordan’s Dynamic Graphical Model (DGM) approach.
• Whereas he made assumptions about how to model each instrument, I want to let the system learn what to look for by starting with a general model.
• Jordan uses a reduced set of states and parameters for efficiency. Try to use a larger model if possible.

My Project Proposal
• Dynamic Graphical Model (DGM) - what is it?
Hidden state variables correspond to a discrete set of allowable intensity and pitch values.

My Project Proposal
• Key Points in Jordan’s Approach
  – Use of a note-event timbre model that includes both a spectral model (in frequency) and a dynamic intensity-versus-time model (or a “time envelope model”).
  – We will perform inference (using the Viterbi algorithm) on the DGM to compute the path of maximum posterior probability, which gives explicit note-on events (note locations).

My Project Proposal
Intensity Transition Model for Violin

My Project Proposal
Intensity Transition Model for Piano

My Project Proposal
General Intensity Transition Model

My Project Proposal
Pitch Transition Model
• Build a pitch-state conditional probability distribution as a function of both the previous pitch state and the previous intensity state.
• Transition probabilities are also based on Shepard's pitch helix, which defines a psychoacoustic distance between pitches.

My Project Proposal
Observation Model - explains the sound
• Model the spectrum of a harmonic musical signal as a series of narrow bump functions that are harmonically spaced.
• That is, conditional on the fundamental frequency Pitch(t) of the musical signal, we model the spectrum as consisting of a series of bump functions located at integer multiples of Pitch(t).
• Each bump function is given a scale parameter alpha(n) that can depend on Pitch(t).
• The motivation for this is that the relative spectral content of an instrument can depend on what pitch is being played.
• The intensity envelope at time t scales all of the harmonics.

My Project Proposal
Evaluation Metrics
• Note Error Rate (based on the “minimum edit distance” used in speech recognition) = 100 x (Insertions + Substitutions + Deletions) / Total Number of Notes in Score. We want to minimize this.
• Dixon Success Score = 100 x Correct Notes / (Correct Notes + False Positives + Deletions). We want to maximize this.

Project Timeline
Seven Weeks Left
3/14 - Collect MIDI Data + Convert to WAV Audio, Understand DGM
3/21 - Start building / understanding graphical models
3/28 - Continue building / understanding graphical models
4/04 - Finish building / understanding graphical models
4/11 - Evaluate Results / Fix Bugs
4/18 - Try new data / Fix bugs. Begin Preparing Final Presentation.
4/25 - Finish Preparing Final Presentation
4/27 - Final Presentation in Class
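
Note on the Viterbi inference step
The Key Points slide says inference on the DGM uses the Viterbi algorithm to find the path of maximum posterior probability. The sketch below shows that idea on a generic discrete hidden-state chain. It is not Jordan's model: the two states (0 = silent, 1 = sounding), every probability value, and the function name `viterbi` are made-up illustration values, whereas the real DGM has a joint intensity-and-pitch state space.

```python
import math

def viterbi(log_init, log_trans, log_obs):
    """Most probable state path for a discrete hidden Markov chain.

    log_init[s]     : log P(state_0 = s)
    log_trans[r][s] : log P(state_t = s | state_{t-1} = r)
    log_obs[t][s]   : log P(observation_t | state_t = s)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Toy example (all numbers hypothetical): state 0 = silent, 1 = sounding.
init = [0.9, 0.1]
trans = [[0.8, 0.2],
         [0.3, 0.7]]
obs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]

path = viterbi([math.log(p) for p in init],
               [[math.log(p) for p in row] for row in trans],
               [[math.log(p) for p in row] for row in obs])

# A note-on event is a frame where the path switches from silent to sounding.
note_ons = [t for t in range(1, len(path)) if path[t - 1] == 0 and path[t] == 1]
print(path, note_ons)
```

Working in log probabilities avoids numerical underflow on long recordings, which is why the sketch adds scores instead of multiplying probabilities.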
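
Note on the observation model
The Observation Model slide describes the spectrum as bump functions at integer multiples of Pitch(t), each scaled by alpha(n), with the intensity envelope scaling all harmonics. A minimal sketch of that description follows. The slides do not give the bump shape, so the Gaussian bump, the `width_hz` parameter, and the function name `harmonic_spectrum` are assumptions made only for illustration.

```python
import math

def harmonic_spectrum(freqs_hz, pitch_hz, intensity, alphas, width_hz=10.0):
    """Model magnitude spectrum evaluated at each frequency in freqs_hz.

    alphas[n-1] is the scale alpha(n) of the n-th harmonic; the caller may
    pick a different alphas list per pitch. intensity scales all harmonics.
    The Gaussian bump shape and width_hz are assumptions, not from the slides.
    """
    spectrum = []
    for f in freqs_hz:
        total = 0.0
        for n, alpha in enumerate(alphas, start=1):
            center = n * pitch_hz  # bump at an integer multiple of the pitch
            total += alpha * math.exp(-0.5 * ((f - center) / width_hz) ** 2)
        spectrum.append(intensity * total)
    return spectrum

# Example: 220 Hz fundamental, three harmonics of decreasing strength,
# evaluated on a 5 Hz frequency grid from 0 to 1000 Hz.
freqs = [float(f) for f in range(0, 1001, 5)]
spec = harmonic_spectrum(freqs, pitch_hz=220.0, intensity=2.0,
                         alphas=[1.0, 0.5, 0.25])
peak_freq = freqs[spec.index(max(spec))]
print(peak_freq)
```

In a full system this function would supply the expected spectrum for each hidden (pitch, intensity) state, and the observation likelihood would compare it with the measured spectrum of each frame; how that comparison is done is not specified on the slides.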
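
Note on the evaluation metrics
The two formulas on the Evaluation Metrics slide can be computed as sketched below. This is a simplified illustration: it treats the score and the transcription as plain sequences of MIDI note numbers and ignores note timing, and the function names (`edit_counts`, `note_error_rate`, `dixon_score`) are hypothetical.

```python
def edit_counts(reference, transcribed):
    """Return (substitutions, deletions, insertions) of a minimum-cost
    alignment between two note sequences (standard edit distance)."""
    n, m = len(reference), len(transcribed)
    # cost[i][j] = (total, subs, dels, ins) for reference[:i] vs transcribed[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)      # delete every reference note
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)      # insert every transcribed note
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            if reference[i - 1] == transcribed[j - 1]:
                diag = (t, s, d, k)            # match, no cost
            else:
                diag = (t + 1, s + 1, d, k)    # substitution
            t, s, d, k = cost[i - 1][j]
            dele = (t + 1, s, d + 1, k)        # deletion (missed note)
            t, s, d, k = cost[i][j - 1]
            ins = (t + 1, s, d, k + 1)         # insertion (extra note)
            cost[i][j] = min(diag, dele, ins)  # lowest total cost wins
    _, subs, dels, inss = cost[n][m]
    return subs, dels, inss

def note_error_rate(reference, transcribed):
    """100 x (Insertions + Substitutions + Deletions) / notes in score."""
    subs, dels, inss = edit_counts(reference, transcribed)
    return 100.0 * (subs + dels + inss) / len(reference)

def dixon_score(correct, false_positives, deletions):
    """100 x Correct / (Correct + False Positives + Deletions)."""
    return 100.0 * correct / (correct + false_positives + deletions)

# Example with made-up notes: one wrong note (64 -> 63) and one extra note (69).
score_notes = [60, 62, 64, 65, 67]
transcribed_notes = [60, 62, 63, 65, 67, 69]
ner = note_error_rate(score_notes, transcribed_notes)
print(edit_counts(score_notes, transcribed_notes), ner)
```

Note that the Note Error Rate can exceed 100 when the transcription contains many inserted notes, while the Dixon Success Score always stays between 0 and 100.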