Similarity Algorithms for Music Plagiarism

advertisement
Similarity Algorithms
for Music Plagiarism
Daniel Müllensiefen
Department of Psychology
Goldsmiths
University of London
Structure
1
2
3
4
5
Introduction
Method and Examples
Algorithms
Empirical Study
Summary – Future Research
Introduction: The Background
 Huge
Public Interest:
Importance for the pop industry, interesting
emphasis on tunes (melodic plagiarism)
 Ambigity surrounding UK law in defining a
substantail part


Lack of Research, Exceptions:
 Timothy English: Sounds Like Teen Spirit, (2008)
 Stan Soocher: They Fought The Law, (1999)
 Charles Cronin: Concepts of Melodic Similarity in
Music-Copyright, (1998)
Introduction:
No hard and fast rules!



The 2-second rule
The 1-bar rule
The 5-notes rule
It is more complicated than that!
What has been taken needs to be at least a
substantial part of the copyright work
(Lord Millet, Designers Guild v. Williams [2000] 1 WLR 2426)
Introduction: Müllensiefen &
Pendzich (2009)

The first quantitative study on music plagiarism

Questions:
Can computer algorithms predict plagiarism
incidents from the melodic similarity of tunes?
 What is the predictive power of these algorithms
when superimposed on documented case law?



What is the frame of reference
(directionality of comparisons)?
How can prior musical knowledge be taken
into account?
Method: Outline
 Selection
of 20 cases* from 1970 to 2005 – focus
on melodic aspects of copyright infringement
 Transcription to monophonic MIDI files
 Analysis of judges' written opinions
 Reduction of court decisions to two categories


“pro plaintiff” = melodic plagiarism
“contra plaintiff” = no infringement
 Use
of context: Corpus of 14,063 pop songs
representing pop music history from 1950-2006
*from UCLA/Columbia copyright infringement database: http://cip.law.ucla.edu/
Example 1: Bright Tunes v.
Harrisongs (1976, 420 F. Supp)
 The

Chiffons He‘s So Fine, 1963
No. 1 in US, UK highest position 11
 George
Harrison, My Sweet Lord
Published in 1971

No.-1-Hit in US, UK & (West-)Germany
Example 2: Selle v. Gibb (1983, 567
F.Supp 1173)
 Ronald
 Bee
Selle, “Let It End” Unreleased (1975)
Gees, “How Deep Is Your Love” (1977)
Algorithms
 Statistically
Informed Similarity Algorithms:
Importance of frequency of melodic
elements for similarity assessment
 Inspired from computational linguistics
(Baayen, 2001) and text processing (Manning &
Schütze, 1999; Jurafsky & Martin, 2000)
 Conceptual


Components:
m-types (aka n-grams) as melodic elements
Frequency counts: Type frequency (TF) and
Inverted Document Frequency (IDF)
Melodic elements: m-types
Word Type t
Frequency f(t),
Melodic Type τ (pitch
interval, length 2)
Frequency f(τ),
Twinkle
2
0, +7
1
little
1
+7, 0
1
star
1
0, +2
1
How
1
+2, 0
1
I
1
0, -2
3
wonder
1
-2, -2
1
what
1
-2, 0
2
you
1
0, -1
1
are
1
-1, 0
1
Type- and Inverted
Document Frequencies
C Corpus of melodies
m melody
τ
TF(m, " ) =
Melodic type
T # different melodic types
f m (" )
#
$ f (" )
m
i=1
|m:τ ∈ m| # melodies containing τ
Melodic Type τ
(pitch interval,
length 2)
Frequency f(τ)
!
0, +7
1
+7, 0
1
0, +2
1
+2, 0
1
0, -2
3
-2, -2
1
-2, 0
2
0, -1
1
-1, 0
1
!
i
(
IDFC (" ) = log
C
m:" # m
)
Type- and Inverted
Document Frequencies
C Corpus of melodies
m melody
TF(m, " ) =
τ
Melodic type
τ
T # different melodic types
f m (" )
#
$ f (" )
m
i=1
|m:τ ∈ m| # melodies containing τ
Melodic Type τ
(pitch interval,
length 2)
Frequency f(τ)
TF(m, !
τ)
0, +7
1
0.083
+7, 0
1
0.083
0, +2
1
0.083
+2, 0
1
0.083
0, -2
3
0.25
-2, -2
1
0.083
-2, 0
2
0.167
0, -1
1
0.083
-1, 0
1
0.083
!
i
(
IDFC (" ) = log
C
m:" # m
)
Type- and Inverted
Document Frequencies
C Corpus of melodies
m melody
TF(m, " ) =
τ
Melodic type
τ
T # different melodic types
f m (" )
#
$ f (" )
m
(
IDFC (" ) = log
i
i=1
|m:τ ∈ m| # melodies containing τ
Melodic Type τ
(pitch interval,
length 2)
Frequency f(τ)
TF(m, !
τ)
IDFC(τ)
0, +7
1
0.083
1.57
+7, 0
1
0.083
1.36
0, +2
1
0.083
0.23
+2, 0
1
0.083
0.28
0, -2
3
0.25
0.16
-2, -2
1
0.083
0.19
-2, 0
2
0.167
0.22
0, -1
1
0.083
0.51
-1, 0
1
0.083
0.74
!
C
m:" # m
)
Type- and Inverted
Document Frequencies
C Corpus of melodies
m melody
TF(m, " ) =
τ
Melodic type
τ
T # different melodic types
f m (" )
#
$ f (" )
m
(
IDFC (" ) = log
C
m:" # m
i
i=1
|m:τ ∈ m| # melodies containing τ
Melodic Type τ
(pitch interval,
length 2)
Frequency f(τ)
TF(m, !
τ)
IDFC(τ)
TFIDFm,C(τ)
0, +7
1
0.083
1.57
0.13031
+7, 0
1
0.083
1.36
0.11288
0, +2
1
0.083
0.23
0.01909
+2, 0
1
0.083
0.28
0.02324
0, -2
3
0.25
0.16
0.04
-2, -2
1
0.083
0.19
0.01577
-2, 0
2
0.167
0.22
0.03674
0, -1
1
0.083
0.51
0.04233
-1, 0
1
0.083
0.74
0.06142
!
)
TF-IDF Correlation
" C (s,t) =
' TFIDF (# )$TFIDF (# )
' (TFIDF (# )) $ ' (TFIDF
t, C
2
# % sn &t n
!
s, C
# % sn &t n
s, C
# % sn &t n
t, C (# ))
2
Feature-based Similarity
Ratio Model (Tversky, 1977): Similarity σ(s,t) related to


# features in s and t have common
salience of features f()
" (s,t) =
!
f (sn # tn )
,$ ,% & 0
f (sn # tn )+ $f (sn \ tn )+ %f (tn \ sn )

features => m-types
salience => IDF and TF
different values of α, β to change frame of reference

Variable m-type lengths


(n=1,…,4),
entropy-weighted average
Feature-based Similarity
Tversky.equal measure (with α = β = 1)
" (s,t) =
&
IDF (# ) + &
# $ sn %t n
&
# $ sn %t n
C
# $ sn \ t n
IDFC (# )
IDFC (# ) + &# $ t
n \ sn
IDFC (# )
Tversky.plaintiff.only
measure (with α = 1, β = 0)
!
" plaintiff.only (s,t) =
&
IDFC (# )
# $ sn %t n
&
# $ sn %tn
IDFC (# ) + &# $ s
n
\ tn
IDFC (# )
Tversky.defendant.only measure (with α = 0, β = 1)
!
" defendant .only (t,s) =
&
# $ sn %t n
&
# $ sn %t n
!
Tversky.weighted measure with
IDFC (# )
IDFC (# ) + &# $ t
n
\ sn
IDFC (# )
&
"=
&
# $ sn %t n
# $ sn
!
TFs(# )
and
TFs(# )
&
"=
&
# $ sn %t n
# $ tn
!
TFt (# )
TFt (# )
Evaluation

Ground Truth:
20 cases with pro/contra decision (7/13)

Evaluation metrics


Accuracy (% correct at optimal cut-off on
similarity scale)
AUC (Area Under receiver operating
characteristic Curve)
Evaluation
1.0
0.9
0.8
0.7
0.6
0.5
0.4
Accuracy
AUC
0.3
0.2
0.1
0.0
t.
.
f
m nen rel. ual ntif ant
ed
t
o
i
r
q
h
C
d
it
ko
ig
co v.e .pla fen
k
d
m
e
T
E Su
U
F
Tv .de v.w
I- D
T
Tv
TF
s
Di
Evaluation: ROC Curves
A Qualitative Look
Ronald Selle, “Let It End”
Bee Gees, “How Deep Is Your Love”
Observations:

Decision sometimes based on ‘characteristic motives’ (incl. rhythm)

High-level form can be important (e.g. call-and-response structure)

Frame of reference can be different
Summary
Court decisions can be related closely to
melodic similarity
 Plaintiff’s song is often the frame of
reference when determining the question of
substantial part (“It depends upon its importance to the

copyright work. It does not depend upon its importance to the
defendants” per Lord Millet)
Statistical information about commonness
of melodic elements is important

Future Research
 Conduct
analogous studies with cases from
the UK and Germany and utilize additional
US cases
 Include rhythm in m-types
 Use higher level features of melodies in
addition to m-types
 Conduct listening experiment to determine
cognitive validity
Similarity algorithms
for music plagiarism
Daniel Müllensiefen
Department of Psychology
Goldsmiths,
University of London
Download