Polyphonic Music Transcription
Using A Dynamic Graphical Model
Barry Rafkind
E6820 Speech and Audio Signal Processing
Wednesday, March 9th, 2005
Presentation Outline
• Motivating Music Transcription
• My Project Proposal
• Project Timeline
Motivating Music Transcription
• Given a musical recording, we wish to obtain a MIDI
score for:
– Performance (convert MIDI to a music score)
– Analysis (evaluate intonation or number of missed
or incorrect notes - useful for music education)
– Comparison with other music (copyright
infringement / search)
– Replay on MIDI synthesizers (use different
instruments / change settings / overlay tracks,
etc...)
Recent Previous Work
• Multi-Instrument Musical Transcription Using A
Dynamic Graphical Model, Michael Jordan, 2004
• Automatic Transcription of Piano Music, Christopher
Raphael, 2002, Univ. of Massachusetts, Amherst
• Polyphonic Pitch Extraction, Graham Poliner, E6820
Speech & Audio Signal Processing, Spring 2004
• Many more. Try searching Google for PDF documents
with the keywords “music transcription”.
My Project Proposal
• Jordan presents a multi-instrument transcription system
capable of listening to a recording in which two or more
instruments are playing, and identifying both the notes that
were played and the instruments that played them. The
system models two musical instruments, each capable of
playing at most one note at a time.
• My Goal: implement and improve upon Jordan’s Dynamic
Graphical Model (DGM) approach.
• Whereas he made assumptions about how to model each
instrument, I want to let the system learn what to look for by
starting with a general model.
• Jordan uses a reduced set of states and parameters for
efficiency. I will try to use a larger model if it is computationally feasible.
My Project Proposal
• Dynamic Graphical Model (DGM) - what is it?
Figure: hidden state variables correspond to a discrete set of allowable intensity and pitch values.
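One way to write the joint distribution such a DGM describes is sketched below. The notation is mine, not Jordan's: $\mathrm{Int}_t$ is the intensity state, $\mathrm{Pitch}_t$ the pitch state and $Y_t$ the observed spectrum at frame $t$. The pitch and observation dependencies follow the models described in these slides; the first-order Markov intensity chain is an assumption.

```latex
\Pr(\mathrm{Int}_{1:T}, \mathrm{Pitch}_{1:T}, Y_{1:T})
  = \prod_{t=1}^{T}
    \Pr(\mathrm{Int}_t \mid \mathrm{Int}_{t-1})\,
    \Pr(\mathrm{Pitch}_t \mid \mathrm{Pitch}_{t-1}, \mathrm{Int}_{t-1})\,
    \Pr(Y_t \mid \mathrm{Pitch}_t, \mathrm{Int}_t)
```

Here the $t = 1$ factors are read as initial-state distributions. Inference then searches for the hidden intensity and pitch sequences that maximise this product for the observed $Y_{1:T}$.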
My Project Proposal
• Key Points in Jordan’s Approach
– Use of a note-event timbre model that
includes both a spectral model (in
frequency) and a dynamic intensity versus
time model (or a “time envelope model”).
– We will perform inference on the DGM with the
Viterbi algorithm, computing the path of maximum
posterior probability; this path gives explicit
note-on events (note locations).
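A minimal log-domain Viterbi sketch in Python. It assumes the (intensity, pitch) pair has been flattened into a single discrete state index and that the initial, transition and observation probabilities are already available as log tables. The names `viterbi` and `note_onsets`, the onset rule, and the two-state toy example are illustrative only, not taken from Jordan's system.

```python
import math
from typing import Collection, List, Sequence


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_obs: Sequence[Sequence[float]]) -> List[int]:
    """Return the maximum a posteriori hidden-state path.

    log_init[s]     : log Pr(state_0 = s)
    log_trans[r][s] : log Pr(state_t = s | state_{t-1} = r)
    log_obs[t][s]   : log Pr(observation_t | state_t = s)
    """
    n_states = len(log_init)
    # delta[s]: best log-probability of any path that ends in state s at the current frame
    delta = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptrs: List[List[int]] = []
    for t in range(1, len(log_obs)):
        new_delta, ptrs = [], []
        for s in range(n_states):
            # Best predecessor of state s, given the scores at frame t - 1
            best_r = max(range(n_states), key=lambda r: delta[r] + log_trans[r][s])
            new_delta.append(delta[best_r] + log_trans[best_r][s] + log_obs[t][s])
            ptrs.append(best_r)
        delta = new_delta
        backptrs.append(ptrs)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


def note_onsets(path: Sequence[int], onset_states: Collection[int]) -> List[int]:
    """Hypothetical rule: a note-on is any frame where the path enters an onset
    state from a different state (or starts in one)."""
    return [t for t, s in enumerate(path)
            if s in onset_states and (t == 0 or path[t - 1] != s)]


if __name__ == "__main__":
    log = math.log
    # Toy model: state 0 = silence, state 1 = note sounding
    path = viterbi([log(0.9), log(0.1)],
                   [[log(0.8), log(0.2)], [log(0.2), log(0.8)]],
                   [[log(0.9), log(0.1)], [log(0.1), log(0.9)],
                    [log(0.1), log(0.9)], [log(0.9), log(0.1)]])
    print(path, note_onsets(path, {1}))  # [0, 1, 1, 0] [1]
```

Working in log probabilities avoids numerical underflow on long recordings. The cost is O(T·S²) for T frames and S joint states, which is one reason a reduced state set is attractive.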
My Project Proposal
Intensity Transition Model for Violin
My Project Proposal
Intensity Transition Model for Piano
My Project Proposal
General Intensity Transition Model
My Project Proposal
Pitch Transition Model
• Build a pitch state conditional probability
distribution as a function of both the previous
pitch state and the previous intensity state.
• Transition probabilities are also based on
Shepard's pitch helix, which defines a psychoacoustic distance between pitches.
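A sketch of a pitch transition distribution of this kind. It rests on several assumptions. Pitch states are MIDI note numbers. The helix has unit radius and rises one unit per octave. The previous intensity state is reduced to a single boolean indicating whether the previous note is still sounding. The exponential weighting and the `stay_prob` and `temperature` parameters are hypothetical choices, not Jordan's.

```python
import math
from typing import Dict, Iterable, Tuple


def helix_coords(midi_pitch: int, radius: float = 1.0,
                 rise_per_octave: float = 1.0) -> Tuple[float, float, float]:
    """Point on a Shepard-style pitch helix: chroma sets the angle, pitch height the rise."""
    angle = 2.0 * math.pi * (midi_pitch % 12) / 12.0
    return (radius * math.cos(angle), radius * math.sin(angle),
            rise_per_octave * midi_pitch / 12.0)


def helix_distance(p: int, q: int) -> float:
    """Euclidean distance on the helix; octaves end up closer than, e.g., tritones."""
    return math.dist(helix_coords(p), helix_coords(q))


def pitch_transition(pitches: Iterable[int], prev_pitch: int, prev_sounding: bool,
                     stay_prob: float = 0.9, temperature: float = 0.5) -> Dict[int, float]:
    """Pr(pitch_t | pitch_{t-1}, intensity_{t-1}) under a hypothetical parameterisation."""
    # Helix-proximity weights for every candidate other than the previous pitch
    others = {p: math.exp(-helix_distance(prev_pitch, p) / temperature)
              for p in pitches if p != prev_pitch}
    if not others:
        return {prev_pitch: 1.0}
    if prev_sounding:
        # A sounding note most likely keeps its pitch; the rest is spread by proximity
        z = sum(others.values())
        probs = {p: (1.0 - stay_prob) * w / z for p, w in others.items()}
        probs[prev_pitch] = stay_prob
    else:
        # After silence or decay any pitch may start, favouring nearby ones
        weights = dict(others)
        weights[prev_pitch] = 1.0  # exp(-0 / temperature)
        z = sum(weights.values())
        probs = {p: w / z for p, w in weights.items()}
    return probs
```

For example, `pitch_transition(range(48, 85), 60, prev_sounding=False)` gives the octave jump 60 → 72 a higher probability than the tritone 60 → 66. This is the kind of psychoacoustic proximity the helix is meant to encode.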
My Project Proposal
Observation Model - explains the sound
• Model the spectrum of a harmonic musical signal as a series of
narrow bump functions that are harmonically spaced.
• That is, conditional on the fundamental frequency Pitch(t) of the
musical signal, we model the spectrum as consisting of a series
of bump functions located at integer multiples of Pitch(t).
• Each bump function is given a scale parameter alpha(n) that
can depend on Pitch(t).
• The motivation for this is that the relative spectral content of an
instrument can depend on what pitch is being played.
• The intensity envelope at time t scales all of the harmonics.
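A minimal sketch of the harmonic template described above. It assumes Gaussian bumps of a fixed width in Hz and a fixed number of harmonics. The `example_alpha` roll-off is a made-up stand-in for the learned, pitch-dependent alpha(n) weights.

```python
import math
from typing import Callable, List, Sequence


def harmonic_template(freqs: Sequence[float], f0: float, intensity: float,
                      alpha: Callable[[int, float], float],
                      n_harmonics: int = 10, bump_width_hz: float = 5.0) -> List[float]:
    """Model spectrum: intensity * sum_n alpha(n, f0) * bump(f - n * f0)."""
    spectrum = []
    for f in freqs:
        total = 0.0
        for n in range(1, n_harmonics + 1):
            # Assumed Gaussian bump centred on the n-th harmonic, n * f0
            total += alpha(n, f0) * math.exp(-0.5 * ((f - n * f0) / bump_width_hz) ** 2)
        # The intensity envelope scales every harmonic equally
        spectrum.append(intensity * total)
    return spectrum


def example_alpha(n: int, f0: float) -> float:
    """Hypothetical pitch-dependent weights: higher notes lose upper harmonics faster."""
    return n ** -(1.0 + f0 / 440.0)
```

With a 1 Hz grid, `harmonic_template([float(f) for f in range(2001)], 220.0, 2.0, example_alpha)` has peaks at 220, 440, 660, ... Hz. The first two peak heights are about 2.0 and 0.71, and the template is close to zero between harmonics.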
My Project Proposal
Evaluation Metrics
• Note Error Rate (based on “minimum edit
distance” in speech recognition) = 100 x (Insertions +
Substitutions + Deletions) / Total Number of
Notes in Score. We want to minimize this.
• Dixon Success Score = 100 x Correct Notes
/ (Correct + False Positives + Deletions).
We want to maximize this.
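Both metrics could be computed as sketched below. This assumes the transcription and the score have already been reduced to sequences of comparable note labels (here MIDI note numbers). It also assumes the correct, false-positive and deletion counts for the Dixon score come from a separate note-matching step. The function names are illustrative.

```python
from typing import Sequence, Tuple


def edit_counts(reference: Sequence[int], hypothesis: Sequence[int]) -> Tuple[int, int, int]:
    """Minimum-edit-distance alignment; returns (insertions, substitutions, deletions)."""
    R, H = len(reference), len(hypothesis)
    # cell[i][j] = (total edits, insertions, substitutions, deletions)
    # for aligning reference[:i] with hypothesis[:j]
    cell = [[(0, 0, 0, 0)] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        cell[i][0] = (i, 0, 0, i)  # every reference note was missed
    for j in range(1, H + 1):
        cell[0][j] = (j, j, 0, 0)  # every transcribed note is extra
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            c, ins, sub, dele = cell[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diag = (c, ins, sub, dele)
            else:
                diag = (c + 1, ins, sub + 1, dele)
            c, ins, sub, dele = cell[i][j - 1]
            left = (c + 1, ins + 1, sub, dele)  # extra note in the transcription
            c, ins, sub, dele = cell[i - 1][j]
            up = (c + 1, ins, sub, dele + 1)    # reference note missed
            cell[i][j] = min(diag, left, up)    # fewest total edits wins
    _, ins, sub, dele = cell[R][H]
    return ins, sub, dele


def note_error_rate(reference: Sequence[int], hypothesis: Sequence[int]) -> float:
    """100 x (I + S + D) / number of notes in the (non-empty) reference score."""
    return 100.0 * sum(edit_counts(reference, hypothesis)) / len(reference)


def dixon_score(correct: int, false_positives: int, deletions: int) -> float:
    """100 x correct / (correct + false positives + deletions)."""
    return 100.0 * correct / (correct + false_positives + deletions)
```

For a score `[60, 62, 64, 65, 67]` transcribed as `[60, 62, 63, 65, 67, 69]`, the alignment finds one substitution and one insertion, so the note error rate is 40.0.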
Project Timeline
Seven Weeks Left
3/14 - Collect MIDI Data + Convert to WAV Audio, Understand DGM
3/21 - Start building / understanding graphical models
3/28 - Continue building / understanding graphical models
4/04 - Finish building / understanding graphical models
4/11 - Evaluate Results / Fix Bugs
4/18 - Try new data / Fix bugs. Begin Preparing Final Presentation.
4/25 - Finish Preparing Final Presentation
4/27 - Final Presentation in Class