Song-level Multi-pitch Tracking by Heavily Constrained Clustering
Zhiyao Duan, Jinyu Han and Bryan Pardo
EECS Dept., Northwestern Univ.
Interactive Audio Lab, http://music.cs.northwestern.edu
For presentation at ICASSP 2010, Dallas, Texas, USA.

Multi-pitch Estimation & Tracking Task
• Given: polyphonic music played by several monophonic harmonic instruments (the number of instruments is known)
• Estimate: a pitch trajectory for each instrument

Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

Potential Applications
• Automatic music transcription
• Harmonic source separation
• Other applications
 – Melody-based music search
 – Chord recognition
 – Source localization
 – Music education
 – …

The 2-stage Standard Approach
• Stage 1: Multi-pitch Estimation (MPE): estimate the pitches in each individual time frame
 – Z. Duan, B. Pardo and C. Zhang, “Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-peak Regions”, IEEE Trans. Audio Speech Language Process., in press.
• Stage 2: Multi-pitch Tracking (MPT): connect pitch estimates across frames into pitch trajectories
[Figure: pitch estimates plotted as frequency vs. time]

State of the Art of MPT
• What existing MPT methods do
 – Form short pitch trajectories within a note (note-level), according to local time-frequency proximity of pitch estimates
• Our contribution
 – Form long pitch trajectories that span multiple notes (song-level), using a new constrained clustering algorithm

Try Clustering by Timbre?
• Each trajectory is a cluster of pitch estimates
• One cluster per instrument
• Clustering principle: maintain timbre consistency in each cluster

Timbre Feature of Pitch Estimates
• Harmonic structure: relative amplitudes of the first 50 harmonics
[Figure: harmonic structure of a pitch estimate, amplitude (dB) vs. harmonic number, shown next to a frequency vs. time plot]

Minimize This Objective Function
$$f(\Pi) = \sum_{k=1}^{K} \sum_{i \in T_k} \lVert x_i - c_k \rVert^2$$
• $\Pi$: a partition of the pitch estimates into $K$ clusters
• $K$: number of clusters
• $T_k$: the set of all pitch estimates in the k-th cluster
• $c_k$: center of the k-th cluster
• $x_i$: the 50-d harmonic structure of the i-th pitch estimate

Objective Function Is Not Enough

Add Pitch-locality Constraints
• Must-link: pitch estimates close in both time and frequency should be in the same cluster
• Cannot-link: simultaneous pitch estimates should not be in the same cluster (holds only for monophonic instruments)
[Figure: must-links and cannot-links among pitch estimates, frequency vs. time]

Properties of Our Problem
• Objective: timbre consistency
• Constraints: pitch locality
• Previous constrained clustering algorithms do not apply, due to the following properties:
 – Inconsistent constraints: pitch estimates are sometimes erroneous, which can make the constraints unsatisfiable
 – Heavily constrained: nearly every pitch estimate is involved in at least one constraint

The Proposed Clustering Algorithm
$\Pi_n$: clustering in the n-th iteration; $C_n$: {all constraints satisfied by $\Pi_n$}
1. Start from an initial clustering $\Pi_0$, which satisfies $C_0$, a subset of all constraints; set n = 1.
2. Find a new clustering $\Pi_n$ that decreases the objective f and also satisfies $C_{n-1}$.
3. Set $C_n$ = {all constraints satisfied by $\Pi_n$}.
4. Increment n and repeat steps 2-3 until the objective can (nearly) no longer be decreased.
$C_0 \subseteq C_1 \subseteq \cdots \subseteq C$
$f(\Pi_0) > f(\Pi_1) > \cdots > f(\Pi)$
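
The iteration above can be illustrated with a small sketch. This is not the authors' implementation: all function names are hypothetical, the features are 2-d toy vectors rather than 50-d harmonic structures, and the candidate move (swapping the labels of a group of points in two clusters that are connected by currently satisfied constraints) is one plausible way to find a new clustering that keeps every satisfied constraint satisfied.

```python
from collections import defaultdict

def objective(points, labels):
    """Sum of squared distances from each point to its cluster center."""
    groups = defaultdict(list)
    for x, k in zip(points, labels):
        groups[k].append(x)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        center = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - center[d]) ** 2 for m in members for d in range(dim))
    return total

def satisfied(labels, must, cannot):
    """Return the must-links and cannot-links satisfied by the labeling."""
    ok_must = {(i, j) for i, j in must if labels[i] == labels[j]}
    ok_cannot = {(i, j) for i, j in cannot if labels[i] != labels[j]}
    return ok_must, ok_cannot

def swap_component(seed, a, b, labels, ok_must, ok_cannot):
    """Points of clusters a and b reachable from seed via satisfied constraints."""
    adj = defaultdict(set)
    for i, j in ok_must | ok_cannot:
        if labels[i] in (a, b) and labels[j] in (a, b):
            adj[i].add(j)
            adj[j].add(i)
    seen, stack = {seed}, [seed]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return seen

def constrained_cluster(points, labels, must, cannot, tol=1e-9):
    """Lower the objective step by step without breaking satisfied constraints."""
    labels = list(labels)
    clusters = sorted(set(labels))
    best = objective(points, labels)
    improved = True
    while improved:
        improved = False
        # Recompute the satisfied set; it can only grow between iterations.
        ok_must, ok_cannot = satisfied(labels, must, cannot)
        for seed in range(len(points)):
            a = labels[seed]
            for b in clusters:
                if b == a:
                    continue
                comp = swap_component(seed, a, b, labels, ok_must, ok_cannot)
                trial = list(labels)
                for i in comp:  # exchange labels a and b inside the component
                    trial[i] = b if labels[i] == a else a
                f = objective(points, trial)
                if f < best - tol:  # accept the first strictly better clustering
                    labels, best, improved = trial, f, True
                    break
            if improved:
                break
    return labels, best

# Toy data (assumed): three frames, two instruments with distinct "timbres".
points = [(0, 0), (10, 10), (0, 1), (10, 11), (0, 0.5), (10, 10.5)]
cannot = [(0, 1), (2, 3), (4, 5)]   # simultaneous estimates in each frame
must = [(0, 2), (1, 3)]             # estimates close in time and frequency
initial = [0, 1, 0, 1, 1, 0]        # labels cross over in the last frame
labels, best = constrained_cluster(points, initial, must, cannot)
```

In this toy run the initial labeling satisfies every constraint but mixes the two timbres in the last frame; the search regroups the points by timbre while all must-links and cannot-links stay satisfied. Swapping a whole connected group is what keeps satisfied constraints intact: linked points move together, and points joined by a cannot-link exchange labels rather than colliding.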

Initial Clustering
• Trivial one
 – $\Pi_0$: a random partition
 – $C_0$: the constraints satisfied by $\Pi_0$; may be empty
• A more informative one for MPT
 – $\Pi_0$: label pitch estimates according to pitch order in each frame: highest, second-highest, third-highest, fourth-highest, …
 – $C_0$: will contain all cannot-links
[Figure: pitch-order labeling of pitch estimates, frequency vs. time]

Find A New Clustering
• 1. Satisfy the current constraints: $C_0 \subseteq C_1 \subseteq \cdots \subseteq C$
• 2. Decrease the objective function: $f(\Pi_0) > f(\Pi_1) > \cdots > f(\Pi)$
[Figure: graphs over numbered pitch estimates (1 to 8) grouped into clusters; the legend distinguishes satisfied and unsatisfied must-links and cannot-links]
• Swap set: a connected subgraph between two clusters
• Traverse all swap sets until finding a new clustering that decreases the objective function

Algorithm Review
• $\Pi_k$: partition of points into clusters
• $S_k$: feasible solution space under constraints $C_k$

Experiments
• Data set
 – 10 J.S. Bach chorales (quartets, played by violin, clarinet, saxophone and bassoon)
 – Each instrument is recorded individually, then mixed
• Ground-truth pitch trajectories
 – Use YIN on the monophonic tracks before mixing
• Input pitch estimates
 – Our previous work in [1]
 – Input accuracy: 70.0 ± 3.1%
[1] Zhiyao Duan, Bryan Pardo and Changshui Zhang, “Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-peak Regions”, IEEE Trans. Audio Speech Language Process., in press.

Overall Multi-pitch Tracking Results
[Figure: mean % of correct pitch estimates]

Among Correctly Estimated Pitches

An Example
[Figure: ground-truth pitch trajectories, pitch (MIDI number) vs. time (seconds)]

An Example
[Figure: our results, pitch (MIDI number) vs. time (seconds)]

Conclusion
• Formulate the song-level multi-pitch tracking problem as a constrained clustering problem
 – Objective: timbre consistency
 – Constraints: pitch locality
• Existing constrained clustering algorithms do not apply due to problem properties
• Propose a new constrained clustering algorithm
• Experimental results are promising

Thank you!