Computational Modelling of Music Cognition
Geraint A. Wiggins
Centre for Cognition, Computation and Culture
Goldsmiths, University of London
Overview
• What has cognition got to do with music?
• What is computational modelling of cognition?
• How does one do it?
• What are its limitations?
• What can it tell us about music? (Some examples)
Music as a cognitive phenomenon
• Music, as an artefact, is made up of many things:
  • art
  • culture
  • emotion
  • creativity
  • craft/skill
  • beauty
  • etc
• Primarily, however, it is a psychological construct
• Music doesn’t happen unless there is a human mind involved
Feeling the beat
• This demonstration shows that human listeners tend to hear rhythmic structure in sound...
  ...even when it isn’t there
• When we know it isn’t there, we can manipulate our own perception, to hear either twos or threes (a minimal sound-generation sketch follows)
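To make the demonstration concrete, here is a minimal sketch (my own assumption, not the original demo material) that writes a WAV file of perfectly identical, evenly spaced clicks; any grouping into twos or threes a listener hears in it is supplied by perception, not by the signal.

# Minimal sketch (not the original demonstration): a WAV file of identical,
# evenly spaced clicks.  Every click and every inter-onset interval is the
# same, so the signal itself contains no "strong" or "weak" beats.
import wave
import numpy as np

SAMPLE_RATE = 44100   # samples per second
CLICK_HZ = 1000       # pitch of each click
CLICK_LEN = 0.02      # seconds of sound per click
PERIOD = 0.4          # seconds between click onsets, constant throughout
N_CLICKS = 40

t = np.arange(int(SAMPLE_RATE * CLICK_LEN)) / SAMPLE_RATE
click = 0.8 * np.sin(2 * np.pi * CLICK_HZ * t) * np.hanning(len(t))  # short faded sine burst

signal = np.zeros(int(SAMPLE_RATE * PERIOD * N_CLICKS))
for i in range(N_CLICKS):
    start = int(i * PERIOD * SAMPLE_RATE)
    signal[start:start + len(click)] += click   # every click is the same

with wave.open("isochronous.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)                            # 16-bit samples
    f.setframerate(SAMPLE_RATE)
    f.writeframes((signal * 32767).astype(np.int16).tobytes())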
The Necker Cube
Name that tune
• What is a “melody”?
  (7–8 semitones)
Music as a cognitive phenomenon
• Milton Babbitt (1965) proposed three different ways of looking at music: the Auditory, the Acoustic and the Graphemic domains
  [Diagram: MUSIC sits at the centre of the three domains — Auditory, Acoustic, Graphemic — linked by listening, performing, recording, playback, score-reading and score-(re)writing]
What is computational modelling of cognition?
• It is difficult to study minds
  • you can’t see them
  • you can’t stick electrodes in them
  • their relationship with brains is completely unclear
  • it is unethical to “mess about” with them
  • etc
• Before the advent of computers, psychologists had two means of study:
  • look at what happened when things went wrong
  • make predictions from theory about what would happen in certain precise circumstances (hypotheses), and test them (experiments)
• This is very time-consuming (decades, not hours), error-prone, and (in the first case) dependent on chance
What is computational modelling of cognition?
• With computers, however, new things become possible
  • We can write computer programs which embody theories and then test them to destruction (ethically!)
  • We can also make predictions by computer which can then be tested in experiments with humans
• This can be much faster than the human-driven approach
• It is more objective than the human-driven approach (so long as the program is written objectively)
What’s the point?
• This is really the only (ethical) way to understand how a cognitive phenomenon actually works
  • duplicate it in an artificial system and test that to destruction
  • if it matches human behaviour in all circumstances, it is a good model
• If you can write a program which embodies your theory, then your theory is fully worked through (a Good Thing)
How do we build a cognitive model?
• Apply reductionist methodology!
  • accept that most phenomena are too complex to understand all at once
  • identify part(s) of the phenomenon that are (as) separable (as possible)
  • be careful to use stimuli (music) that do not go beyond these boundaries
  • remember that the resulting model is probably an oversimplification
  • when you have understood the parts of the phenomenon, put them together, study the interactions between them, and test them in concert
• This is quite different from the holistic view usually taken in the humanities, but it is not incompatible
• Human (musical) behaviour is at the start and the end of this process:
  • theories behind the models come from observation of musical behaviour
  • results from models are tested against musical behaviour
What are the limitations of cognitive modelling?
• A model is only as good as
  • the theory it embodies
  • the computational implementation
  • the input data
  • the input and output data representation
• We must always question and test (and re-test) results because of these potential sources of error
What are the limitations of cognitive modelling?
• We can only take one small step at a time
  • this science is in its infancy: we must not rush ahead and make mistakes
• Therefore, we have to be satisfied with small, focused, isolated results
  • we look at how a given aspect of something changes, given that everything else stays the same – an artificial situation
• The results are only ever approximations
  • we continue to refine models as our understanding improves
What about music?
• If music is (at least originally) a psychological phenomenon, then there are probably interesting things to learn about it by treating it as such
  • not least: WHY is it the way it is?
What are the requirements of a cognitive model?
• We must be careful to make the right abstraction of our data
  • A representation based on a 12-note octave will not be able to model phenomena related to microtonal music
  • A representation based on a 12-note octave will not be able to model phenomena related to conventional tonal tuning (eg playing into the key)
• There is a very good abstraction of Western Common Practice music: the score
  • models categorical pitch and time perception (and tonality if need be)
  • evolved over about 1,000 years to do this well
  • not good for everything (eg no means of representing instrumental timbre)
  • but very good at quite a lot!
• Many cognitive models of music use (an equivalent of) score notation (a minimal sketch of such a representation follows)
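Purely as an illustration (my own sketch, not a data structure used in the systems described later), a score-level abstraction can be as simple as a list of note events carrying categorical pitch and time:

# Minimal sketch of a score-level representation: categorical pitch and time,
# no timbre, no tuning detail.  Field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int       # chromatic pitch, e.g. a MIDI number (60 = middle C)
    onset: float     # onset time in beats (categorical, not milliseconds)
    duration: float  # duration in beats

# The opening of "Twinkle, Twinkle" as score-level data:
melody = [
    Note(60, 0, 1), Note(60, 1, 1), Note(67, 2, 1), Note(67, 3, 1),
    Note(69, 4, 1), Note(69, 5, 1), Note(67, 6, 2),
]

# Microtonal inflection or expressive intonation cannot be expressed here:
# that is the abstraction (and the limitation) the slide describes.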
Two kinds of cognitive model (of music)
• Some models are descriptive (Wiggins, 2007)
  • they say what happens when stimuli are applied in each circumstance
  • they predict results in terms only of the application of rules
  • these rules may be complicated
  • these models do not explain WHY a cognitive effect is the way it is
  • they do explain WHAT the cognitive effect is, at the same level of abstraction as the representation they use
• Some models are explanatory (Wiggins, 2007)
  • they give a general underlying mechanism by which a phenomenon occurs
  • they predict results using this mechanism
  • they explain WHY a cognitive effect is the way it is (at some level of abstraction different from the representation)
Example 1: GTTM
• Generative Theory of Tonal Music (Lerdahl & Jackendoff, 1983)
  • “complete” theory of tonal music (actually not – still being updated)
  • has 4 components, each being a set of rules, written in English
    ‣ grouping
    ‣ metre
    ‣ time-span reduction
    ‣ prolongation
  • within each, there are two kinds of rule
    ‣ fixed rules
    ‣ preference rules
  • “preference” rules mean that GTTM is not a computerisable theory (see the sketch after this slide)
  • therefore, it is not a rigorously objective model
  • it is only a descriptive model, because there is no mechanism
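To make that objection concrete, here is a hedged sketch — not Lerdahl & Jackendoff's own formulation — of what operationalising a grouping preference rule forces on the implementer: the theory states the preferences but not how strongly each one counts, so the weights and thresholds below are arbitrary assumptions, and that arbitrariness is exactly where objectivity is lost.

# Illustrative sketch only -- NOT GTTM's actual rule set.
# A "preference rule" says a group boundary is *preferred* after a long note
# or a large pitch leap, but it does not say how the preferences trade off.
# The weights and thresholds below are arbitrary assumptions; a different
# modeller could choose others and obtain different analyses.

LEAP_WEIGHT = 1.0       # arbitrary
DURATION_WEIGHT = 1.5   # arbitrary

def boundary_preference(prev_interval: int, prev_duration: float,
                        mean_duration: float) -> float:
    """Return an (arbitrary) preference score for placing a boundary here."""
    score = 0.0
    if abs(prev_interval) >= 5:                 # "large leap" -- threshold assumed
        score += LEAP_WEIGHT
    if prev_duration > 1.5 * mean_duration:     # "long note" -- threshold assumed
        score += DURATION_WEIGHT
    return score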
Example 2: IDyOM
• Information Dynamics of Music (Pearce & Wiggins, 2006)
  • explanatory model because it is based on an independent statistical process (also found to model human speech understanding)
  • representation is (equivalent to) simple score
  • “learning” model (a toy sketch of the idea follows this slide)
    ‣ the system is told NO rules
    ‣ it “hears” lots of music (973 tonal folk melodies)
    ‣ it “learns” the musical structure and generalises from common occurrences
  • predicts human expectation of tonal-melodic pitch (explains up to 91% of variance in human studies)
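The toy sketch below illustrates the learning idea only; it is a plain bigram model of my own, not the IDyOM implementation, which uses variable-order n-gram models over several musical features with backoff and smoothing. The point is simply that the system is given melodies, not rules, and derives its expectations from counts.

# Toy sketch of the learning idea behind IDyOM -- NOT the real system.
from collections import Counter, defaultdict

class BigramPitchModel:
    def __init__(self):
        self.counts = defaultdict(Counter)   # context pitch -> next-pitch counts

    def train(self, melodies):
        """Learn from sequences of (MIDI-like) pitch numbers; no rules given."""
        for melody in melodies:
            for prev, nxt in zip(melody, melody[1:]):
                self.counts[prev][nxt] += 1

    def distribution(self, prev):
        """Probability distribution over the next pitch, given the previous one."""
        c = self.counts[prev]
        total = sum(c.values())
        return {p: n / total for p, n in c.items()} if total else {}

# Usage: train on a small corpus, then ask how expected each continuation is.
model = BigramPitchModel()
model.train([[60, 62, 64, 65, 67], [60, 64, 67, 72], [67, 65, 64, 62, 60]])
print(model.distribution(64))   # distribution over pitches seen to follow 64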
Example 2: IDyOM
• Short-term memory (STM): n-gram (arbitrary n) model
  • complex backoff/smoothing strategy
  • dynamic weighting of features used for prediction, according to information content
• Long-term memory (LTM): as STM
  • trained with a database of >900 tonal melodies
[Diagram: note data is passed to both the STM model (this piece) and the LTM model (all pieces); their combined predictive distribution yields entropy (“uncertainty”) and information content (“unexpectedness”)]
Example 2: IDyOM
• We have extended the model, using a further statistical technique
  • to predict melodic segmentation in tonal music (Müllensiefen et al., 2007)
  • to predict structure in minimalist music (Potter et al., 2007)
• This works by asking how certain the model is of its pitch prediction
• In information-theoretic terms (Shannon, 1948):
  • Uncertainty ≈ entropy
  • Unexpectedness ≈ high information content
• Our speculation/empirical evidence:
  • Closure (Narmour, 1990): drop in information content and entropy
  • Increase in information content and entropy ≈ beginning of new section (a simplified detection sketch follows this slide)
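As a hedged sketch of how that speculation might be turned into a detector (my own simplification, not the published method of Müllensiefen et al., 2007): given per-note information content from a model like the one sketched earlier, mark a boundary wherever the value rises sharply above its recent average.

# Simplified sketch (assumption, not the published method): flag a boundary
# at notes whose information content rises well above the recent average.
def boundaries_from_ic(ic, window=5, threshold=1.5):
    """Return indices of notes that begin a new perceived group."""
    marks = []
    for i in range(window, len(ic)):
        recent = sum(ic[i - window:i]) / window
        if ic[i] > threshold * recent:       # sharp rise in unexpectedness
            marks.append(i)
    return marks

# Usage with made-up information-content values (bits per event):
print(boundaries_from_ic([1.0, 1.1, 0.9, 1.0, 1.2, 3.4, 1.1, 1.0, 0.9, 1.0, 2.9]))
# -> [5, 10]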
Example 2: IDyOM
• Two Pages (Glass, 1969)
  • Strictly systematic piece
• Useful because pitch is the only true dimension:
  • monodic
  • isochronous
  • monotimbral
[Score excerpt: the opening modules of Two Pages, numbered 1–8, each repeated many times (e.g. ×36, ×15, ×22, ×26), etc.]
Example 2: IDyOM
[Figure: STM entropy and STM information content (bits per event) plotted against time (quavers, 0–7000) for Two Pages, with the boundaries of Parts I–V marked; Part IV is marked both where Glass places it and where York/the model place it]
Discussion: Part IV?
[Score excerpt: (a) the opening of Part IV as placed by Glass/Potter; (b) the opening of Part IV as placed by York and by the model]
• The score shows that the first section of Glass’ Part IV (a) is in fact exactly congruent with the preceding section—that is, it sounds like part of that preceding section
• York (1981) analysed Two Pages by transcribing a performance, and places the boundary of Part IV at the same place as our system (b)
Study 2: Gradus (Philip Glass)
• Not a strictly systematic piece
[Score excerpt: the opening of Gradus (q = 132)]
Example 2: IDyOM
[Figure: STM entropy and STM information content (bits per event) plotted against time (quavers, 0–3500) for Gradus, Parts I–II, annotated with bracketed locations from the expert analysis (e.g. [4], [21–23], [42–46], [69–75], [83–85], [94]) at changes in the curves]
Example 2: IDyOM
• There are clear correspondences between the expert music analysis and (changes in direction in) the curves output by our model
• Future issues to resolve:
  • which statistical properties of monodic music most reliably predict perceived boundaries and human reactions?
  • will the minimalist music results generalise?
  • (how) does this correspond with what brains do?
  • how do these interact with other dimensions of music (eg rhythmic, metrical, harmonic structure) in influencing perceived grouping structure?
Acknowledgements
• This work is funded by UK Engineering and Physical Sciences Research Council grants
  • GR/S82220/01: “Techniques and Algorithms for Understanding the Information Dynamics of Music”
  • EP/D038855/01: “Modelling Musical Memory and the Perception of Melodic Similarity”
References
Lerdahl, F. & Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge, MA: MIT Press
Müllensiefen, D., Pearce, M. T., Wiggins, G. A. & Frieler, K. (2007) Segmenting Pop Melodies: A Model Comparison Approach. Proceedings of SMPC’07, Montreal, Canada
Narmour, E. (1990) The Analysis and Cognition of Basic Melodic Structures: The Implication-Realisation Model. Chicago: University of Chicago Press
Pearce, M. T. & Wiggins, G. A. (2006) Expectancy in melody: The influence of context and learning. Music Perception, 23(5), 377–405
Potter, K., Wiggins, G. A. & Pearce, M. T. (2007) Towards Greater Objectivity in Music Theory: Information-Dynamic Analysis of Minimalist Music. Musicae Scientiae, 11(2), 295–324
Shannon, C. (1948) A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656
Wiggins, G. A. (2007) Models of Musical Similarity. Musicae Scientiae, Discussion Forum 4a, 315–338