DTW for QBSH
J.-S. Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University

Dynamic Time Warping (DTW)
- Goal: Allow comparison with high tolerance to tempo variation
- Characteristics:
  - Robust to irregular tempo variations
  - Trial and error for dealing with key transposition
  - Computationally expensive
  - Does not satisfy the triangle inequality
  - Some indexing algorithms do exist

Dynamic Time Warping: Type 1
- t: input pitch vector (8 sec)
- r: reference pitch vector
- Local paths: 27-45-63 degrees
- [Figure: DTW table with t(i) along the horizontal axis (index i) and r(j) along the vertical axis (index j)]
- 3-step formula for DTW:
  1. D(i, j): DTW distance between t(1:i) and r(1:j)
  2. Recurrent formula for D(i, j):
     $$D(i,j) = |t(i) - r(j)| + \min\{D(i-1,j-2),\ D(i-1,j-1),\ D(i-2,j-1)\}$$
     with the initial condition
     $$D(1,1) = |t(1) - r(1)|$$
  3. Answer (m = length of t, n = length of r):
     $$\min_j D(m,j)$$ for anchored beginning
     $$D(m,n)$$ for anchored beginning and anchored end

Dynamic Time Warping: Type 2
- t: input pitch vector (8 sec)
- r: reference pitch vector
- Local paths: 0-45-90 degrees
- [Figure: DTW table with t(i) along the horizontal axis (index i) and r(j) along the vertical axis (index j)]
- DTW recurrence:
  $$D(i,j) = |t(i) - r(j)| + \min\{D(i,j-1),\ D(i-1,j-1),\ D(i-1,j)\}$$
- Min distance:
  $$\min_j D(m,j)$$

Local Path Constraints
- Type 1: 27-45-63 local paths
  - D(i, j) is reached from D(i-1, j-2), D(i-1, j-1), or D(i-2, j-1)
  $$D(i,j) = |t(i) - r(j)| + \min\{D(i-1,j-2),\ D(i-1,j-1),\ D(i-2,j-1)\}$$
- Type 2: 0-45-90 local paths
  - D(i, j) is reached from D(i, j-1), D(i-1, j-1), or D(i-1, j)
  $$D(i,j) = |t(i) - r(j)| + \min\{D(i,j-1),\ D(i-1,j-1),\ D(i-1,j)\}$$

Path Penalty
- Small or no penalty for the 45-degree path
- Large penalty for paths that deviate from 45 degrees
  $$D(i,j) = |t(i) - r(j)| + \min\{D(i-1,j-2) + \eta_1,\ D(i-1,j-1) + 0,\ D(i-2,j-1) + \eta_2\}$$
  where \eta_1, \eta_2 \ge 0 denote the penalties on the two off-diagonal paths and the 45-degree path carries penalty 0.

Weighted DTW Distance
- Observations:
  - At the start of a note, the user's pitch is unstable.
  - In the latter part of a note, the user's pitch is more stable and approaches the pitch of the note.
- Weighted DTW distance:
  - At the start of a note, the weighting function w(j) is smaller.
  - In the latter part of a note, the weighting function w(j) is larger.
  $$D(i,j) = w(j)\,|t(i) - r(j)| + \min\{D(i-1,j-2),\ D(i-1,j-1),\ D(i-2,j-1)\}$$

DTW Paths of “Anchored Beginning”
- Anchored beginning: the end position is free to move.
- Assumption: The speed of a
  user’s acoustic input falls within 1/2 and 2 times that of the intended song.
- DTW table size for an 8-sec query = 250 x 375
  - 250 = 31.25*8 (31.25 pitch frames per second)
  - 375 = 250*1.5
- [Figure: DTW paths in the table, with i on the horizontal axis and j on the vertical axis]

DTW Paths of “Anchored Anywhere”
- Anchored anywhere: both ends are free to move.
- DTW table size for an 8-sec query against a 3-min song = 250 x 5625
  - 250 = 31.25*8
  - 5625 = 31.25*180
- [Figure: DTW paths in the table, with i on the horizontal axis and j on the vertical axis]

[Figures: numeric examples of filled DTW tables]

Implementation Issues
- To save memory
  - Use a 2-column table for type-1 DTW
  - Use a 1-column table for type-2 DTW
- To avoid too many if-then statements
  - Pad type-1 DTW with two-layer padding
  - Pad type-2 DTW with one-layer padding
- To find a suitable path
  - Minimizing total distance
  - Minimizing average distance

Other Variants
- Local constraints
- Flexible start/ending positions

DTW Path of “Match Beginning”

DTW Path of “Match Anywhere”

Key Transposition (1/2)
- Goal: Allow users to sing or hum in a different key
- Method 1: Mean shift and heuristic modification
  - [Figure: candidate pitch shifts around the mean: t-2, t, t+2, then t'-1, (t'), t'+1]
  - DTW is computed for each candidate shift when the query is compared to each song

Key Transposition (2/2)
- Method 2: Fixed-point iteration
  - Step 1: DTW alignment
  - Step 2: Stop if the mapping path is fixed
  - Step 3: Shift to the same mean based on the alignment
  - Step 4: Go back to step 1.
- Characteristics
  - The DTW distance is monotonically nonincreasing, which guarantees convergence

Type-3 DTW: Frame to Note Alignment
- DP-based method for filling the table
  - [Figure: table with the notes (65 62 65 64 67) on one axis and the frame-level pitch vector on the other]
- Recurrent formula:
  $$D(i,j) = |t(i) - r(j)| + \min\{D(i-1,j),\ D(i-1,j-1)\}$$
- Local constraint: D(i, j) is reached from D(i-1, j) or D(i-1, j-1)

Type-3 DTW Characteristics
- Mapping path
- Frame-based query input vs.
  note-based music database
- Note duration unused
- More efficient, less effective
- Heuristics for key transposition

Type-3 DTW: Effects of Key Transposition
- Rough key transposition
- Fine key transposition
- Please refer to the online tutorial page for playback.
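
The type-1 (27-45-63) and type-2 (0-45-90) recurrences from the earlier slides can be sketched in a few lines of Python. This is a minimal illustration, not the MIR Lab implementation: the names `dtw`, `TYPE1_STEPS`, `TYPE2_STEPS`, and `anchored_end` are hypothetical, the full table is kept for readability (the memory-saving column tables are not used), and the padding with infinity follows the "two-layer / one-layer padding" idea from the Implementation Issues slide.

```python
import math

# Predecessor offsets (di, dj): D(i, j) may be reached from D(i - di, j - dj).
TYPE1_STEPS = ((1, 2), (1, 1), (2, 1))   # 27-45-63 degree local paths
TYPE2_STEPS = ((0, 1), (1, 1), (1, 0))   # 0-45-90 degree local paths


def dtw(t, r, steps=TYPE1_STEPS, anchored_end=False):
    """DTW distance between query t and reference r with an anchored beginning.

    anchored_end=False returns min over j of D(m, j) (free end);
    anchored_end=True returns D(m, n).
    """
    m, n = len(t), len(r)
    # Padding depth: 2 layers for type 1, 1 layer for type 2.
    pad = max(max(di, dj) for di, dj in steps)
    # Every cell starts at infinity, so out-of-range predecessors never win
    # the min and no boundary if-then checks are needed in the inner loop.
    D = [[math.inf] * (n + pad) for _ in range(m + pad)]
    for i in range(m):
        for j in range(n):
            cost = abs(t[i] - r[j])
            if i == 0 and j == 0:
                D[pad][pad] = cost            # D(1,1) = |t(1) - r(1)|
                continue
            best = min(D[i + pad - di][j + pad - dj] for di, dj in steps)
            D[i + pad][j + pad] = cost + best
    last_row = D[m - 1 + pad][pad:]           # D(m, 1..n)
    return last_row[n - 1] if anchored_end else min(last_row)
```

For example, `dtw([60, 62, 64], [60, 62, 64, 65])` evaluates the type-1 table with a free end, while passing `steps=TYPE2_STEPS` switches to the 0-45-90 local paths. Note that type-1 paths cannot stretch the query by more than a factor of 2, so some end cells remain at infinity.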
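
The fixed-point iteration for key transposition (DTW alignment, stop when the mapping path is fixed, shift to the same mean) can be sketched on top of the type-3 frame-to-note recurrence D(i, j) = |t(i) - r(j)| + min{D(i-1, j), D(i-1, j-1)}. Several choices below are assumptions made for illustration only: combining this iteration with type-3 DTW, anchoring both ends of the path, using the mean of the aligned note pitches as the shift target, looping back to the alignment step after each shift, and the names `type3_align`, `transpose_fixed_point`, and `max_iter`.

```python
import math


def type3_align(t, notes):
    """Align frame-level pitch t to a note sequence.

    Returns (distance, path), where path[i] is the index of the note
    assigned to frame i. Both ends are anchored (assumption of this sketch).
    """
    m, n = len(t), len(notes)
    D = [[math.inf] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]   # note index used by the previous frame
    D[0][0] = abs(t[0] - notes[0])
    for i in range(1, m):
        for j in range(n):
            stay = D[i - 1][j]                              # D(i-1, j)
            move = D[i - 1][j - 1] if j > 0 else math.inf   # D(i-1, j-1)
            if stay <= move:
                best, back[i][j] = stay, j
            else:
                best, back[i][j] = move, j - 1
            D[i][j] = abs(t[i] - notes[j]) + best
    # Backtrack from the anchored end to recover the mapping path.
    path = [n - 1]
    for i in range(m - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return D[m - 1][n - 1], path


def transpose_fixed_point(t, notes, max_iter=20):
    """Shift the query pitch until the type-3 mapping path stops changing."""
    shifted = list(t)
    dist, path = type3_align(shifted, notes)            # DTW alignment
    for _ in range(max_iter):
        # Shift the query so its mean equals the mean of the aligned notes.
        target = sum(notes[j] for j in path) / len(path)
        offset = target - sum(shifted) / len(shifted)
        shifted = [x + offset for x in shifted]
        dist, new_path = type3_align(shifted, notes)    # re-align
        if new_path == path:                            # path fixed: stop
            break
        path = new_path
    return dist, shifted
```

As a small usage example, a query sung one semitone sharp, `transpose_fixed_point([61, 61, 73, 73], [60, 72])`, is shifted back onto the notes after a single iteration. The sketch assumes the query has at least as many frames as the reference has notes.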