Proc. International Symposium on Music Information Retrieval

advertisement
Music Retrieval and
Analysis
Part I: Music Retrieval
Arbee L.P. Chen
National Tsing Hua University
ISMIR’03 Tutorial III
Outline

Technologies






Architecture for Music Retrieval
Music Representations
Music query processing
Music indexing
Similarity measures
Systems and evaluation

Existing systems







Meldex
Themefinder
SEMEX
PROMS
OMRAS
System evaluation
Future research directions
Architecture for Music Retrieval
Users
Music
Query
Interface
Music
Player
Query result
Result music objects
Music
Storage
manager
Result music objects
Music
Feature
Extractor
Music objects
Music
Database
Music query
Music
Query
Processor
Music features
Music
Index
Music Representations
Media
Music_Info
key
beat
tempo
Music
info
acoustical
thematic
Acoustical
loudness
pitch
duration
brightness
bandwidth
Thematic
theme*
rhythm
melody
chord
IS-A relationship
composition relationship
Music _Wave Music _MIDI Music_AU
*
multi-valued attribute
Styles of Music Composition

Monophony


Homophony


Monophonic music has at most one note playing at any given
time; before a new note starts the previous note must have
ended
Homophonic music has at most one set of notes playing at the
same time. For any set of notes that start at the same time, no
new note or notes may begin until every note in that set has
ended
Polyphony

Polyphonic music has no such restrictions. Any note or set of
notes may begin before any previous note or set of notes has
ended
Monophony Representations

Absolute measure

Absolute pitch


Absolute duration


1 1 1 1 1 0.5 0.5 1 1
Absolute pitch and duration


C5 C5 D5 A5 G5 G5 G5 F5 G5
(C5,1)(C5,1)(D5,1)(A5,1)(G5,1)(G5,0.5)(G5,0.5)(F5,1)(G5,1)
Relative measure

Contour (in semitones)


IOI (Inter onset interval) ratio


0 +2 +7 -2 0 0 -2 +2
1 1 1 1 0.5 1 2 1
Contour and IOI ratio

(0,1)(+2,1)(+7,1)(-2,1)(0,0.5)(0,1)(-2,2)(+2,1)
Polyphony Representations

All information preservation

Keep all information of absolute pitch and duration (start_time, pitch,
duration)


Relative note representation

Record difference of start times and contour (ignore duration)


(1,0)(1,+2)(0,+7)(1,-4)…
Monophonic reduction

Select one note at every time step (main melody selection)


(1,C5,1)(2,C5,1)(3,D5,1)(3,A5,1)(4,F5,4)(5,C6,1)(6,G5,0.5)(6.5,G5,0.5)…
(C5,1)(C5,1)(A5,1)(F5,1)(C6,1)...
Homophonic reduction (chord reduction)

Select every note at every time step

(C5)(C5)(D5,A5)(F5)(C6)(G5)(G5)…
Music Representation - Theme

Theme


A small part of a musical work


Efficient retrieval
A highly semantic representation


A short tune that is repeated or developed in a piece of music
Effective retrieval
Automatic theme extraction


Exact repeating patterns
Approximate repeating patterns
Music Representation – Markov Models

Capture global information for a music
piece
 Repeating
patterns
 Sequential patterns
A lossy representation
 Good for music classification

Markov Model Representation
[Pickens and Crawford, CIKM‘02]
 Homophonic reduction
 For each chord, compute its distance with
the 24 lexical chords
 Capture statistical properties by Markov
models

 The
representation of each song is reduced
into a matrix
Markov Model Representation
(Cont.)
Chord
Markov model representation
Lexical chords
Music Query Processing

On-line methods (string matching algorithms)
 Exact string matching
 Brute-force method
 KMP algorithm
 Boyer-Moore algorithm
 Shift-Or algorithm
 Partial string matching
 Shift-Or algorithm
 Approximate string matching
 Dynamic programming
Brute-Force Method



T: A5 B5 A5 C5 A5 B5 A5 B5
P: A5 B5 A5 B5
Time complexity O(mn)
A5
B5
A5
C5
A5
B5
A5
B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
KMP Algorithm



Left-to-right scan
Failure function shift rule
O(m+n)
A5
B5
A5
C5
A5
B5
A5
B5
A5 B5 A5 B5
Failure function f(i)
0
0
1
2
A5 B5 A5 B5
Skip this step
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
Boyer-Moore Algorithm





If the pattern P is relatively long and the alphabet is
reasonably large, this algorithm is likely to be the most
efficient string matching algorithm
Right-to-left scan
Bad character shift rule
Good suffix shift rule
O(m+n)
Bad Character Shift Rule
bad character
A5
B5
A5
C5
A5
B5
A5
B5
Right to left scan
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
skip these steps
A5 B5 A5 B5
A5 B5 A5 B5
Good Suffix Shift Rule
Good suffix
A5
C5
A5
B5
A5
B5
C5
B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
Skip this step
Shift-Or Algorithm
An example of the shift-or algorithm for p=aab and s=abcaaab
T
a
a
b
a
b
c
0
0
1
1
1
0
1
1
1
E S(E) T[a] E S(E) T[b] E S(E) T[c] E S(E) T[a] E S(E) T[a] E S(E) T[a] E S(E) T[b]
a 1 0 0
a 1 1 0
b 1 1 1
0 0 1 1 0 1
1 0 1 1 1 1
1 1 0 1 1 1
E
1 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 1 0 0 0 0 0 0 0 1 1
1 1 1 1 1 1 1 0 1 1 0 0 0
Shift-Or Algorithm for Partial Matching
[Lemstrom and Perttu, ISMIR’00]
An example of the shift-or algorithm for p=aab and s=(ab)(ca)(aab)
T
a
a
b
a
b
c
0
0
1
1
1
0
1
1
1
E S(E) T[a]^T[b]
a 1 0
a 1 1
b 1 1
0
0
0
E S(E) T[c]^T[a]
E S(E) T[a]^T[a]^T[b]
0 0
1 0
1 1
0 0
0 0
1 0
0
0
1
0
0
0
E
0
0
0
Approximate Matching



In practical pattern matching applications, exact
matching is not always suitable
In the field of MIR, approximation is measured
mainly by the edit distance: the minimal number
of local edit operations needed to transform one
music object into another
Dynamic programming method serves this
purpose
Edit Distance

Unit cost edit distance
 W(ab)=1,
ab (Replacement)
 W(a   )=W( b)=1 (Deletion and Insertion)

Non-unit cost edit distance (Content-sensitive)
 The
costs of replacement, deletion and insertion can
be any values which depend on the cost function
 E.g., W(mn)=0.2 and W(a  )=0.8
Dynamic Programming Method



Given any two strings
S1=abac, S2=aaccb
The edit distance
evaluated by DP
The edit distance is 3
abaccb
aaccb
(1 deletion, 2 insertions)
a
b
a
c
0
1
2
3
4
a
1
0
1
2
3
a
2
1
1
1
2
c
c
3
4
2
3
2
3
2
3
1
2
b
5
4
3
4
3
Music Indexing
Tree-based index (Suffix tree)
 List-based index (1D-list)
 N-gram index
 Indexing Markov models?

Tree-Based Index


[Chen, et al., ICME‘00]
Music objects are coded as strings of music
segments
 Four
segment types to model music contour
 Pitch and duration are considered

Index structures
 Augmented

suffix tree
Both incipit/partial and exact/approximate
matching can be handled
Tree-Based Index (Cont.)
Four segment types
type A
note number
67
type B
(A, 1, +1)
65
type C
64
62
type D
60
(C, 1, +1)
(C, 1, +2)
(B, 3, -3)
(D, 3, -3)
(C, 1, +2)
(B, 1, -2)
beat
Tree-Based Index (Cont.)
root
root
A
B
C
A
A<1,1>
3
C
2
B
C
1
$
4
A
$
5
N1
A
C
A
B
B
$
C<7,8>
C<1,3>
N2
A
A<3,4>
A<7,8>
3
$
B
2
(a)
(b)
$
1
The suffix tree of the string S=“ABCAB”
(a) An example suffix tree
(b) A 1-D augmented suffix tree
List-Based Index


[Liu, Hsu and Chen, ICMCS‘99]
Music objects are coded as melody strings
 “so-mi-mi-fa-re-re-do-re-mi-fa-so-so-so”


Melody strings are organized as linked lists
Both incipit/partial and exact/approximate
matching can be handled
 Exact
link
link, insertion link, dropout link, transposition
List-Based Index (Cont.)
do
re
mi
1:7
1:5
1:2
1:3
2:1
1:6
1:3
1:8
1:9
2:9
1:8
1:9
1:13
2:5
2:2
2:5
2:2
2:6
2:3
2:10
2:6
2:10
2:6
2:12
2:4
2:11
2:12
2:11
2:12
do
re
mi
1:7
1:5
1:2
1:11
2:1
1:6
2:7
1:12
2:9
2:8
do
re
mi
fa
so la
1:7
1:5
1:2
1:4
1:1
2:1
1:6
1:3
1:10
2:9
1:8
1:9
2:5
2:2
2:10
2:11
(a)
si
start
(b)
end
start
(c)
end
N-Gram Index





A widely used technique in music databases
Target strings are cut into index terms by a
sliding window with length N
Index can be implemented by various
methods, e.g., inverted file
Queries are also cut into index terms with
length N
Searching and joining are then performed
N-Gram Index [Doraisamy and
Ruger, ISMIR’02]
Query=bbca
Cut into 2-grams
S=aabbcaab
bb, ca
2-Gram
Position
aa
1,6
ab
2,7
bb
3
bc
4
ca
5
Position: 3
Position: 5
Join
Inverted file
The substring is found from position 3 to position 6
Similarity Measures

The effectiveness of MIR depends on the
similarity measure
 Edit
distance (Suitable for short queries)
 Difference between two probability matrices
 Note shift distance [Typke, et al., ISMIR‘03]
Probability Matrix Distance
S1: CCCAABCB
A
B
C
A
0.5
0.5
0
B
0
0
1
C
0.25
0.25
0.5
A
B
C
A
0.7
0.3
0
B
0
0
1
C
0.3
0.3
0.3
S2: CCAAABCB
qi ( x)
D(q || d )   (  qi ( x) log
)
di
(
x
)
qiq , did xX
Kullback-Liebler (KL) divergence:
The value is 0 when two matrices are the same
q: Query probability matrix
d: Data probability matrix
i: row
x: column
D(S1||S2)=0.1092
Probability Matrix Distance (Cont.)

Ineffective for MIR with short queries
 0-entries
in the query model mean unknown
values?
 0-entries in the corpus model means facts?

Performance comparison with string
matching needed
Note Shift Distance

Sum of the two dimensional distance
between the notes of the query and the
notes of the answer
0
0
0
0
0
Music Retrieval Systems
Music Representations
 Music Query processing
 Special features

Meldex
[McNab, et al., D-Lib Magazine‘97]
 "melodic contour" or "pitch profile“

 113531


(R:Repeat, U:Up, D:Down)
Approximate string matching
 Dynamic

RUUDD
programming
Query by humming
Themefinder
[Kornstadt, Computing in Musicology‘98]
 Select themes manually
 Allow different query types

 Pitch
 Interval
 Contour

Provide exact matching only
SEMEX



[Lemstrom and Perttu, ISMIR’00]
The pattern is monophonic; the musical source
is polyphonic
Finding all positions of S (source) that have an
occurrence of p
 p=bca
 S=<a,


b, c><a, b><b, c><a, b>
Shift-or algorithm
No similarity function
PROMS




[Clausen, et al., ISMIR‘00]
Representation by pitch and onset time (ignore
duration)
Index by inverted file
Fault-tolerant music search
 Allow
missing notes
 Allow fuzzy notes

Query=(b, (d or c), a, b)
OMRAS

[Dovey, ISMIR‘01]
Searching in a “piano roll” model
Gaps based dynamic programming

Example (gap = 2):



Data


Piano roll
T0=<64,72,76>, T1=<60>, T2=<59,67,79>, T3=<55,63>,
T4=<55,67,79>
Query

S0=<59,67>, S1=<55,67>
T0 T1 T2 T3 T4
S0
0
0
2
1
0
S1
0
0
0
0
2
System Evaluation

Traditional measures of effectiveness are
precision and recall
precision 
number of retrieved references that are relevant
number of references that are retrieved
number of retrieved references that are relevant
recall 
number of relevant references
The Recall-Precision Curve
A Platform for Evaluating MIR
Systems

Evaluation of various music retrieval approaches
 Efficiency
 response time
 Effectiveness
 recall-precision curve

The Ultima project builds such a platform [Hsu,
Chen and Chen, CIKM’01]
 Same
data set and query set for various approaches
 Compare recall-precision curves
The Ultima Project





Data store
Query generation
module
Query processing
module
Result
summarization
module
Report module
Mediator
1D-List
APS
APM
Query Processing Module
Report Module
Summarization Module
Query Generation Module
Table
SMF
Data Store (MS Access)
Mediator

to the Internet
Future Research Directions





Music Retrieval based on music structure
Music retrieval based on user’s perceptiveness
Similarity measure for polyphonic music
A novel index structure for polyphony
Fair evaluation method
Music Retrieval and
Analysis
Part II: Music Analysis
Outline




Music Segmentation and Structure Analysis
 Local Boundary Detection
 Repeating Pattern Discovery
 Phrase Extraction
Music Classification
Music Recommendation Systems
Future Research Directions
Local Boundary Detection



[Cambouropoulos, ICMC’01]
Segment music by local discontinuities between notes
Calculate boundary strength values for each interval of a
melodic surface, i.e., pitch, IOI, and rest, according to
the strength of local discontinuities
Local Boundary Detection
(Cont.)

A music object m has a parametric profile Pk, which is represented
as a sequence of n intervals
 Pk = [x1, x2, …, xn] where k{pitch, IOI, rest}




Pitch interval measured in semitones
IOI and rest intervals measured in milliseconds or numerical
duration values
IOI (Inter onset interval)
 The amount of time between the onset of one note and the
onset of the next note
Rest
 The amount of time between the offset of one note and the
onset of the next note
IOI
pitch
rest
Local Boundary Detection
(Cont.)

The degree of change r between two successive interval
values xi and xi+1 is:
| xi  xi 1 |
 r
i ,i 1 
xi  xi 1
 ri ,i 1  0
iff xi  xi 1  0 and xi , xi 1  0
iff xi  xi 1  0

The strength of the boundary si for interval xi is:
 si  xi  ( ri 1,i  ri ,i 1 )

Overall local boundary strength based on the three
intervals
 wp*si(p)+wd*si(d)+wr*si(r)
Local Boundary Detection
(Cont.)
Repeating Pattern Discovery



A repeating pattern in music data is defined as a
sequence of notes which appears more than once in a
music object
The themes or motives are typical kinds of repeating
patterns
Exact repeating patterns [Hsu, Liu and Chen, TMM’01]


By the string-join operator
Approximate repeating patterns [Lartillot, ISMIR’03]

By detecting when their successive notes are sufficiently close
and their borders contrast sufficiently with the outer environment
Exact Repeating Pattern
Discovery
{ 12(2), 23(3), 34(4), 45(3), 56(2) }
123(2)
234(3)
345(3)
{ 1234(2), 2345(2),
456(2)
3456(2) }
Nontrivial repeating patterns
S = 1234A2345B3456C123456
{ }
Approximate Repeating Pattern
Discovery

Stpe1
 Find
all note pairs (n1, n2) which satisfy the pitch
distance constraint

Step 2
 Group the similar note pairs
 D((n1, n2), (n1’, n2’))  

Step 3
 Merge
the adjacent note pairs for approximate
repeating patterns


(n1, n2)(n2,n3) : (n1,n2,n3)
(n1’,n2’)(n2’,n3’) : (n1’,n2’,n3’)
Approximate Repeating Pattern
Discovery (Cont.)
n1, n2
p1, p2
o1, o2
 p = p2 – p1
 o = o2 – o1
n1’, n2’
 p’ = p’2 – p’1
p1’, p2’
o1’, o2’  o’ = o’2 – o’1
pi is the pitch value of ni
oi is the onset time of ni
D((n1, n2), (n1’, n2’)) = ( | p -  p’| + 1 ) * (max { o/  o’,  o’/  o })
0.7
Phrase Extraction

Two features used for phrase extraction
 Duration

and rest
Melodic Shapes [Huron, Computing in
Musicology’95]
 Statistics



Information in Western Folksongs
The most common length of a phrase is 8 notes
Half of all phrases are between 7 and 9 notes in length
Three-quarters of all phrases are between 6 and 10 notes in
length
Phrase Extraction (Cont.)
A: the pitch value of the first note in the target phrase
B: the pitch value of the last note in the target phrase
C: the average pitch value of the remaining notes in the target
phrase
Contour
Type
Number of
Phrases
Percentage
Convex
13926
38.6%
A<C  B<C
Descending
10376
28.8%
A>C>B
Ascending
6983
19.4%
ACB
Concave
3496
9.7%
A>C  B>C
Others
1294
3.5%
Arch Shape
Definition
Phrase Extraction (Cont.)



Identify the positions of all the terminative notes
Extract the music pieces according to the terminative
notes
Select the candidate music pieces for decomposition
based on the length information
 If the length  12, the music piece is marked as a
phrase
 If the length > 12, decompose the music piece into
phrases
 convex > descending > ascending > concave
Phrase Extraction (Cont.)
x
y
z
64 62 60 57 55 67 67 69 67 69 64 62
64 62 60 57 55 67 67 69 67 69 64 62 64 62 60 57 55 67 64 64 62 60 62 64 62 62 67 67
Convex?
No
Order
The Length of the
Prefix Fragment
The Pitch of
the First Note
The Pitches of the Remaining Notes
The Pitch of the
Last Note
1
6
64
62, 60, 57, 55
67
2
7
64
62, 60, 57, 55, 67
67
3
8
64
62, 60, 57, 55, 67, 67
69
4
9
64
62, 60, 57, 55, 67, 67, 69
67
62, 60, 57, 55, 67, 67, 69, 67
69
62, 60, 57, 55, 67, 67, 69, 67, 69
64
62, 60, 57, 55, 67, 67, 69, 67, 69, 64
62
Descending?5
10
64
Length = 12, 6A = 64, B =1162, C = 63.7 64
7
12
64
Music Classification



After music segmentation, different kinds of
music units can be extracted from music objects,
such as repeating patterns and phrases
Different kinds of music units may have different
semantics in musicology
These extracted music units can be used in
music classification, retrieval, and analysis
Music Recommendation Systems




[Chen and Chen, CIKM’01]
The results of music classification can be used
for music-related services
By analyzing the user access histories, we can
discover which music classes the users may be
interested in and which users belonging to the
same group
By using different kinds of recommendation
mechanisms, we can recommend the users the
music objects
Architecture
Users
Track
Selector
Interface
Music
Object
representative track
Feature
Extractor
Recommendation
Module
CB Method
Profile
Manager
feature point
Classifier
COL Method
STA Method
Database
Music
Objects
Feature
Points
Access
Histories
Music
Groups
User
Groups
A polyphonic music object
• one melody track
• other accompaniment
tracks
Recommendation Mechanisms



Content-based filtering approach
 Similarity between music objects and user profiles
 Recommend the music objects that belong to the music groups
the user is recently interested in
Collaborative filtering approach
 Similarity between user profiles
 Provide unexpected recommendations to the users in the same
user group
Statistical approach
 Recommend “hot” music objects
Future Research Directions




Polyphonic Music Segmentation
Efficient Approximate Repeating Pattern Discovery
Representation and Similarity Measure for Musical
Structures
Musical Style/Form Detection
References





[Camb01] Cambouropoulos, E., “The Local Boundary Detection Model
(LBDM) and its Application in the Study of Expressive Timing,” Proc.
International Computer Music Conference, 2001.
[Chen00] Chen, A.L.P., M. Chang, J. Chen, J.L. Hsu, C.H. Hsu, and S.Y.S.
Hua, "Query by Music Segments: An Efficient Approach for Song Retrieval,"
Proc. IEEE International Conference on Multimedia and Expo, 2000.
[Chen01] Chen, H.C. and A.L.P. Chen, "A Music Recommendation System
Based on Music Data Grouping and User Interests grouping and user
interests," Proc. ACM International Conference on Information and
Knowledge Management, 2001
[Clau00] Clausen, M., R. Engelbrecht, D. Meyer, and J. Schmitz, “PROMS:
A Web-based Tool for Searching in Polyphonic Music,” Proc. International
Symposium on Music Information Retrieval, 2000.
[Dove01] Dovey, M.J., “A Technique for Regular Expression Style Searching
in Polyphonic Music,” Proc. International Symposium on Music Information
Retrieval, 2001.
References (Cont.)





[Korn98] Kornstadt, A., “Themefinder: A Web-based Melodic Search Tool,”
Computing in Musicology, 11:231-236, 1998.
[Dora02] Doraisamy, S. and S.M. Ruger, “A comparative and fault-tolerance
study of the use of n-grams with polyphonic music,” Proc. International
Symposium on Music Information Retrieval, 2002.
[Hsu01] Hsu, J.L., C.C. Liu and A.L.P. Chen, "Discovering Nontrivial
Repeating Patterns in Music Data," IEEE Transactions on Multimedia, Vol. 3,
No. 3, 2001.
[Hsu02] Hsu, J.L., A.L.P. Chen and H.C. Chen, "The Effectiveness Study of
Various Music Information Retrieval Approaches," Proc. ACM International
Conference on Information and Knowledge Management, 2002.
[Huro95] Huron, D., “The Melodic Arch in Western Folksongs,” Computing in
Musicology, Volume 10, 1995.
References (Cont.)





[Lart03] Lartillot, O., “Discovering musical patterns through perceptive
heuristics,” Proc. International Symposium on Music Information Retrieval,
2003.
[Lems00]Lemstrom, K. and Sami Perttu, “SEMEX-An Efficient Music
Retrieval Prototype,” Proc. International Symposium on Music Information
Retrieval, 2000.
[Liu99] Liu, C.C., J.L. Hsu, and A.L.P. Chen, "An Approximate String
Matching Algorithm for Content-Based Music Data Retrieval," Proc. IEEE
International Conference on Multimedia Computing and Systems, 1999.
[Mcna97] McNab, R.J., L.A. Smith, D. Bainbridge, and I.H. Witten, “The
New Zealand Digital Library: MELody inDEX,” D-Lib Magazine, 1997.
[Pick02] Pickens, J., and T. Crawford, “Harmonic Models for Polyphonic
Music Retrieval,” Proc. ACM Conference on Information and Knowledge
Management, 2002.
Download