DTW for QBSH

J.-S Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan University
Dynamic Time Warping (DTW)
Goal:
Compare sequences with high tolerance to tempo variation
Characteristics:
Robust to irregular tempo variations
Key transposition must be handled by trial and error
Computationally expensive
Does not satisfy the triangle inequality
Some indexing algorithms do exist
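A tiny counterexample makes the triangle-inequality point concrete. This Python sketch assumes the type-2 recurrence (0-45-90 local paths, |t(i) - r(j)| cost, both ends anchored); the three sequences are toy values chosen only for illustration.

```python
import math

def dtw_type2(t, r):
    """Type-2 DTW (0-45-90 local paths), anchored at both ends, absolute-difference cost."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 are padding
    D[0][0] = 0.0  # so that D(1, 1) = |t(1) - r(1)|
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(t[i - 1] - r[j - 1])
            D[i][j] = cost + min(D[i][j - 1], D[i - 1][j - 1], D[i - 1][j])
    return D[m][n]

x, y, z = [0], [1], [1, 1, 1]
d_xy, d_yz, d_xz = dtw_type2(x, y), dtw_type2(y, z), dtw_type2(x, z)
print(d_xy, d_yz, d_xz)      # 1.0 0.0 3.0
print(d_xz <= d_xy + d_yz)   # False: the triangle inequality is violated
```

Because the single frame of x must be matched against every frame of z, DTW(x, z) = 3, while the detour through y costs only 1 + 0.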
Dynamic Time Warping: Type 1
t: input pitch vector (8 sec)
r: reference pitch vector
Local paths: 27-45-63 degrees
3-step formula for DTW:
1. D(i, j): DTW distance between t(1:i) and r(1:j)
2. Recurrent formula for D(i, j):
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j-2),\ D(i-1, j-1),\ D(i-2, j-1)\}$$
$$D(1, 1) = |t(1) - r(1)|$$
3. Answer = $\min_j D(m, j)$ for anchored beginning; $D(m, n)$ for anchored beginning and anchored end (m and n are the lengths of t and r)
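The three steps above translate almost line by line into code. A minimal Python sketch, assuming 0-based lists for t and r and using math.inf for unreachable cells; the function name dtw_type1 is hypothetical. It returns both answers from step 3.

```python
import math

def dtw_type1(t, r):
    """Type-1 DTW (27-45-63 degree local paths) with |t(i) - r(j)| cost.

    Returns (min_j D(m, j), D(m, n)): the anchored-beginning answer and the
    anchored-beginning-and-end answer. Two layers of padding (indices 0 and 1)
    hold inf, so D(i-2, .) and D(., j-2) never fall outside the table.
    """
    m, n = len(t), len(r)
    P = 2  # two-layer padding
    D = [[math.inf] * (n + P) for _ in range(m + P)]
    for i in range(m):
        for j in range(n):
            cost = abs(t[i] - r[j])
            I, J = i + P, j + P
            if i == 0 and j == 0:
                D[I][J] = cost  # D(1, 1) = |t(1) - r(1)|
                continue
            D[I][J] = cost + min(
                D[I - 1][J - 2],  # D(i-1, j-2)
                D[I - 1][J - 1],  # D(i-1, j-1)
                D[I - 2][J - 1],  # D(i-2, j-1)
            )
    last_row = D[m + P - 1][P:]  # D(m, 1..n)
    return min(last_row), last_row[-1]
```

Because the local paths have slopes 1/2 and 2, D(m, n) is infinite whenever the query is more than twice as slow or fast as the reference. For example, dtw_type1([1, 1, 2, 2, 3], [1, 2, 3]) returns (0, 0), while a six-frame query against the same three-frame reference has no feasible path.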
Dynamic Time Warping: Type 2
t: input pitch vector (8 sec)
r: reference pitch vector
Local paths: 0-45-90 degrees
DTW recurrence:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i, j-1),\ D(i-1, j-1),\ D(i-1, j)\}$$
Min distance = $\min_j D(m, j)$
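For comparison, a minimal Python sketch of the type-2 recurrence, here anchored at both ends and with a backtracking step that recovers the mapping path. The name dtw_type2_path and the tie-breaking order during backtracking are assumptions of this sketch.

```python
import math

def dtw_type2_path(t, r):
    """Type-2 DTW (0-45-90 local paths) with |t(i) - r(j)| cost, anchored at both ends.

    Returns D(m, n) and the optimal mapping path as 0-based (i, j) index pairs.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 are padding
    D[0][0] = 0.0  # makes D(1, 1) = |t(1) - r(1)|
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(t[i - 1] - r[j - 1])
            D[i][j] = cost + min(D[i][j - 1], D[i - 1][j - 1], D[i - 1][j])
    # Backtrack from (m, n), always moving to the cheapest predecessor.
    path, i, j = [], m, n
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i][j - 1], i, j - 1),
                      (D[i - 1][j], i - 1, j))
    path.reverse()
    return D[m][n], path

dist, path = dtw_type2_path([1, 2, 3], [1, 2, 2, 3])
print(dist, path)  # 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```

The horizontal step D(i, j-1) is what lets the second query frame cover both reference frames with pitch 2.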
Local Path Constraints
Type 1: 27-45-63 local paths
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j-2),\ D(i-1, j-1),\ D(i-2, j-1)\}$$
Type 2: 0-45-90 local paths
$$D(i, j) = |t(i) - r(j)| + \min\{D(i, j-1),\ D(i-1, j-1),\ D(i-1, j)\}$$
Path Penalty
Path penalty
Small or no penalty for the 45-degree path
Large penalty for paths that deviate from 45 degrees
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j-2) + \eta,\ D(i-1, j-1),\ D(i-2, j-1) + \eta\}, \quad \eta > 0$$
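A minimal sketch of the penalized recurrence, assuming a single constant eta (hypothetical name and default value) added to both off-diagonal local paths, with both ends anchored.

```python
import math

def dtw_type1_penalty(t, r, eta=1.0):
    """Type-1 DTW where the two non-diagonal local paths pay an extra penalty eta.

    eta is a hypothetical tuning constant; eta = 0 recovers plain type-1 DTW.
    Returns D(m, n) (anchored beginning and end).
    """
    m, n = len(t), len(r)
    P = 2  # two-layer padding filled with inf
    D = [[math.inf] * (n + P) for _ in range(m + P)]
    for i in range(m):
        for j in range(n):
            cost = abs(t[i] - r[j])
            I, J = i + P, j + P
            if i == 0 and j == 0:
                D[I][J] = cost  # D(1, 1) = |t(1) - r(1)|
                continue
            D[I][J] = cost + min(
                D[I - 1][J - 2] + eta,  # D(i-1, j-2) + eta
                D[I - 1][J - 1],        # D(i-1, j-1): 45-degree path, no penalty
                D[I - 2][J - 1] + eta,  # D(i-2, j-1) + eta
            )
    return D[m + P - 1][n + P - 1]
```

With t = [1, 1, 2, 2, 3] and r = [1, 2, 3], the only feasible path uses the D(i-2, j-1) step twice, so the distance rises from 0 (eta = 0) to 2 (eta = 1) even though every local cost is zero.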
Weighted DTW Distance
Observations:
At the onset of a note, the user's pitch is unstable
In the second half of a note, the user's pitch is more stable and closer to the note's pitch
Weighted DTW Distance
At the onset of a note, the weight function w(j) is small
In the second half of a note, the weight function w(j) is large
$$D(i, j) = w(j)\,|t(i) - r(j)| + \min\{D(i-1, j-2) + \eta,\ D(i-1, j-1),\ D(i-2, j-1) + \eta\}$$
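The slide does not specify w(j), so the sketch below assumes a hypothetical linear ramp: within each note, w rises from w_min at the onset to 1.0 at the note's midpoint and stays there. Notes are given as (pitch, number of frames) pairs, which is also an assumption; the DTW part follows the weighted type-1 recurrence with penalty eta.

```python
import math

def notes_to_reference(notes, w_min=0.3):
    """Expand (pitch, num_frames) notes into a frame-level reference r and weights w.

    Hypothetical weighting: w ramps linearly from w_min at the note onset to 1.0
    at the middle of the note and stays at 1.0 for the rest of the note.
    """
    r, w = [], []
    for pitch, length in notes:
        half = max(1, length // 2)
        for k in range(length):
            r.append(float(pitch))
            w.append(w_min + (1.0 - w_min) * min(1.0, k / half))
    return r, w

def weighted_dtw_type1(t, r, w, eta=1.0):
    """Type-1 DTW with local cost w(j) * |t(i) - r(j)| and penalty eta on off-diagonal paths.

    Anchored at both ends: returns D(m, n).
    """
    m, n = len(t), len(r)
    P = 2  # two layers of inf padding
    D = [[math.inf] * (n + P) for _ in range(m + P)]
    for i in range(m):
        for j in range(n):
            cost = w[j] * abs(t[i] - r[j])
            I, J = i + P, j + P
            if i == 0 and j == 0:
                D[I][J] = cost
            else:
                D[I][J] = cost + min(D[I - 1][J - 2] + eta, D[I - 1][J - 1], D[I - 2][J - 1] + eta)
    return D[m + P - 1][n + P - 1]

r, w = notes_to_reference([(60, 4)], w_min=0.25)  # w = [0.25, 0.625, 1.0, 1.0]
print(weighted_dtw_type1([58, 60, 60, 60], r, w))        # 0.5
print(weighted_dtw_type1([58, 60, 60, 60], r, [1.0] * 4))  # 2.0
```

The query misses the note only at its onset, so the weighted distance (0.5) is a quarter of the unweighted one (2.0).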
DTW Paths of “Anchored Beginning”
Anchored beginning: the end position is free to move
Assumption: the tempo of the user's acoustic input falls between 1/2 and 2 times that of the intended song
DTW table size for an 8-sec query = 250 x 375
250 = 31.25*8 (31.25 frames per second)
375 = 250*1.5
DTW Paths of “Anchored Anywhere”
Anchored anywhere: both ends are free to move
DTW table size for an 8-sec query against a 3-min song = 250 x 5625
250 = 31.25*8
5625 = 31.25*180
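Anchored anywhere amounts to letting the path start at any reference frame and end at any reference frame. A minimal sketch using the type-2 local paths for brevity; it keeps only two rows of the table and returns the best distance together with the 0-based reference index where the best match ends. The function name is hypothetical.

```python
import math

def dtw_anywhere(t, r):
    """Type-2 DTW with both ends free on the reference (query t matched anywhere in r).

    Returns (distance, end_j), where end_j is the 0-based reference index at which
    the best match ends.
    """
    m, n = len(t), len(r)
    prev = [abs(t[0] - r[j]) for j in range(n)]  # D(1, j): a match may start at any j
    for i in range(1, m):
        cur = [math.inf] * n
        for j in range(n):
            best = prev[j]  # D(i-1, j)
            if j > 0:
                best = min(best, prev[j - 1], cur[j - 1])  # D(i-1, j-1), D(i, j-1)
            cur[j] = abs(t[i] - r[j]) + best
        prev = cur
    end_j = min(range(n), key=lambda j: prev[j])  # free end: best over the last row
    return prev[end_j], end_j

print(dtw_anywhere([5, 6, 7], [0, 0, 5, 6, 7, 0]))  # (0, 4)
```

The table is m x n, so for the 250 x 5625 case above this costs about 1.4 million cell updates per song.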
Implementation Issues
To save memory
Use a 2-column table for type-1 DTW
Use a 1-column table for type-2 DTW
To avoid excessive if-then boundary checks
Pad type-1 DTW with two layers of padding
Pad type-2 DTW with one layer of padding
To find a suitable path
Minimize the total distance
Minimize the average distance
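A sketch of the memory-saving idea for type-2 DTW: because D(i, j) needs only column i-1 and the cell just computed in column i, one array of length n+1 suffices, and index 0 acts as the single layer of padding so the inner loop has no boundary checks. The function name and return convention are illustrative assumptions.

```python
import math

def dtw_type2_one_column(t, r):
    """Type-2 DTW using a single column of length n + 1 (O(n) memory).

    Returns (D(m, n), min_j D(m, j)), i.e. the anchored-end and free-end answers.
    """
    n = len(r)
    col = [0.0] + [math.inf] * n  # state for i = 0: only D(0, 0) = 0 is reachable
    for ti in t:
        diag = col[0]        # D(i-1, 0), saved before it is overwritten
        col[0] = math.inf    # D(i, 0) is padding
        for j in range(1, n + 1):
            up = col[j]      # still holds D(i-1, j)
            # col[j-1] already holds D(i, j-1); diag holds D(i-1, j-1)
            col[j] = abs(ti - r[j - 1]) + min(col[j - 1], diag, up)
            diag = up        # becomes D(i-1, j-1) for the next j
    return col[n], min(col[1:])

print(dtw_type2_one_column([1, 2, 3], [1, 2, 2, 3]))  # (0.0, 0.0)
print(dtw_type2_one_column([0], [1, 1, 1]))           # (3.0, 1.0)
```

The price of the single column is that the mapping path can no longer be backtracked; only the distance survives.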
Other Variants
Local constraints
Flexible start/end positions
DTW Path of “Match Beginning”
DTW Path of “Match Anywhere”
Key Transposition (1/2)
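Goal:
Allow users to sing or hum in a key different from that of the reference
Method 1:
Mean shift followed by a heuristic search
Shift t so that its mean pitch matches the reference, then compare t, t+2 and t-2 (semitones); taking the best of these as t', compare t'-1 and t'+1
This takes 5 DTW computations when compared to each song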
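One reading of the figure, sketched in Python: shift t so its mean pitch matches that of r, try offsets 0, +2 and -2 semitones, then refine by 1 semitone on either side of the best one, for 5 DTW computations per song. The function names, the use of the reference's overall mean, and the choice of type-2 DTW are assumptions of this sketch.

```python
import math
from statistics import mean

def dtw(t, r):
    """Type-2 DTW (0-45-90 local paths), anchored at both ends, |t(i) - r(j)| cost."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = abs(t[i - 1] - r[j - 1]) + min(D[i][j - 1], D[i - 1][j - 1], D[i - 1][j])
    return D[m][n]

def transposed_dtw(t, r):
    """Mean shift plus a coarse-to-fine search over key offsets (in semitones).

    Returns (best distance, total shift applied to t).
    """
    base = mean(r) - mean(t)  # mean shift: align the average pitch of t with that of r
    def score(offset):
        return dtw([x + base + offset for x in t], r)
    best = min((score(k), k) for k in (0, 2, -2))  # 3 DTW computations
    k0 = best[1]
    for k in (k0 - 1, k0 + 1):                     # 2 more DTW computations
        best = min(best, (score(k), k))
    return best[0], base + best[1]

print(transposed_dtw([48, 50, 52, 50], [60, 62, 64, 62]))
```

Here the query is exactly one octave low, so the mean shift alone recovers it: the call returns a distance of 0 with a total shift of 12 semitones.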
Key Transposition (2/2)
Method 2: Fixed-point iteration
Step 1: DTW alignment
Step 2: Stop if the mapping path is unchanged
Step 3: Shift t to the same mean as r based on the alignment
Step 4: Go back to step 1
Characteristics
The DTW distance is monotonically nonincreasing, which guarantees convergence
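A sketch of the fixed-point iteration, with one deliberate substitution: since the local cost here is |t(i) - r(j)|, the shift that minimizes the path cost for a fixed alignment is the median of r(j) - t(i) over the aligned pairs, not the mean (the mean is the minimizer for a squared cost). Using the exact minimizer in step 3, and the exact DTW optimum in step 1, is what makes the distance nonincreasing. max_iter is a hypothetical safety cap.

```python
import math
from statistics import median

def dtw_with_path(t, r):
    """Type-2 DTW anchored at both ends; returns (D(m, n), 0-based mapping path)."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = abs(t[i - 1] - r[j - 1]) + min(D[i][j - 1], D[i - 1][j - 1], D[i - 1][j])
    path, i, j = [], m, n
    while (i, j) != (0, 0):  # backtrack along the cheapest predecessors
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1), (D[i][j - 1], i, j - 1), (D[i - 1][j], i - 1, j))
    return D[m][n], path[::-1]

def fixed_point_transposition(t, r, max_iter=20):
    """Alternate DTW alignment and key shifting until the mapping path stops changing.

    Returns (distance, shift), where shift is added to every element of t.
    """
    shift = 0.0
    dist, path = dtw_with_path(t, r)                           # Step 1: DTW alignment
    for _ in range(max_iter):
        shift = median(r[j] - t[i] for i, j in path)           # Step 3: shift based on the alignment
        new_dist, new_path = dtw_with_path([x + shift for x in t], r)  # back to Step 1
        if new_path == path:                                   # Step 2: path fixed, stop
            return new_dist, shift
        dist, path = new_dist, new_path
    return dist, shift

print(fixed_point_transposition([48, 50, 52, 50], [60, 62, 64, 62]))  # (0.0, 12.0)
```

Unlike Method 1, the number of DTW computations per song is not fixed in advance; it depends on how quickly the path settles.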
Type-3 DTW:
Frame to Note Alignment
DP-based method for filling the table (frame-level pitch vector t(i) along i, notes r(j) along j, e.g. notes 65, 62, 65, 64, 67):
Recurrent formula:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j),\ D(i-1, j-1)\}$$
Local constraint: D(i, j) is reached from D(i-1, j) or D(i-1, j-1)
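A minimal sketch of the type-3 recurrence, assuming both ends are anchored (answer D(m, n)), which the slide does not state. Each frame either stays on the current note or advances to the next one, so no note can be skipped and the query needs at least as many frames as there are notes. The function name is hypothetical.

```python
import math

def dtw_type3(t, notes):
    """Frame-to-note DTW: t is a frame-level pitch vector, notes a note-level pitch sequence.

    D(i, j) = |t(i) - r(j)| + min(D(i-1, j), D(i-1, j-1)); note durations are ignored.
    Returns D(m, n), which is inf when len(t) < len(notes).
    """
    m, n = len(t), len(notes)
    prev = [math.inf] * (n + 1)  # one layer of padding at index 0
    prev[0] = 0.0                # D(0, 0) = 0 so every path starts at D(1, 1)
    for i in range(1, m + 1):
        cur = [math.inf] * (n + 1)
        for j in range(1, n + 1):
            cur[j] = abs(t[i - 1] - notes[j - 1]) + min(prev[j], prev[j - 1])
        prev = cur
    return prev[n]

notes = [65, 62, 65, 64, 67]
print(dtw_type3([65, 65, 62, 62, 62, 65, 64, 64, 67], notes))  # 0.0
```

The table is m x (number of notes) rather than m x (number of reference frames), which is where the efficiency gain comes from.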
Type-3 DTW
Characteristics
Mapping path: frame-based query input vs. note-based music database
Note durations are not used
More efficient, but less effective
Heuristics are needed for key transposition
Type-3 DTW:
Effects of Key Transposition
(Figure panels: rough key transposition; fine key transposition.)
Please refer to the online tutorial page for playback.