Computational Modelling of Music Cognition
Geraint A. Wiggins
Centre for Cognition, Computation and Culture, Goldsmiths, University of London

Overview
• What has cognition got to do with music?
• What is computational modelling of cognition?
• How does one do it?
• What are its limitations?
• What can it tell us about music? (Some examples)

Music as a cognitive phenomenon
• Music, as an artefact, is made up of many things
  • art
  • culture
  • emotion
  • creativity
  • craft/skill
  • beauty
  • etc.
• Primarily, however, it is a psychological construct
• Music doesn’t happen unless there is a human mind involved

Feeling the beat
[Audio demonstration, built up over several slides]
• This demonstration shows that human listeners tend to hear rhythmic structure in sound...
  ...even when it isn’t there
• When we know it isn’t there, we can manipulate our own perception, to hear either twos or threes

The Necker Cube
[Figure: the Necker cube]

Name that tune
• What is a “melody”?
[Music example, built up over several slides; annotated “7–8 semitones”]
Music as a cognitive phenomenon
• Milton Babbitt (1965) proposed three different ways of looking at music: the auditory, the acoustic, and the graphemic
[Figure, built up over several slides: a triangle with MUSIC at its centre and vertices Auditory, Acoustic, and Graphemic, linked by listening, performing, recording, playback, score-writing, and score-reading]

What is computational modelling of cognition?
• It is difficult to study minds
  • you can’t see them
  • you can’t stick electrodes in them
  • their relationship with brains is completely unclear
  • it is unethical to “mess about” with them
  • etc.
• Before the advent of computers, psychologists had two means of study:
  • look at what happened when things went wrong
  • make predictions from theory about what would happen in certain precise circumstances (hypotheses), and test them (experiments)
• This is very time-consuming (decades, not hours), error-prone, and (in the first case) dependent on chance

What is computational modelling of cognition?
• With computers, however, new things become possible
  • We can write computer programs which embody theories and then test them to destruction (ethically!)
  • We can also make predictions by computer which can then be tested in experiments with humans
• This can be much faster than the human-driven approach
• It is more objective than the human-driven approach (so long as the program is written objectively)

What’s the point?
• This is really the only (ethical) way to understand how a cognitive phenomenon actually works
  • duplicate it in an artificial system and test that to destruction
  • if it matches human behaviour in all circumstances, it is a good model
• If you can write a program which embodies your theory, then your theory is fully worked through (a Good Thing)

How do we build a cognitive model?
• Apply reductionist methodology!
  • accept that most phenomena are too complex to understand all at once
  • identify part(s) of the phenomenon that are (as) separable (as possible)
  • be careful to use stimuli (music) that do not go beyond these boundaries
  • remember that the resulting model is probably an oversimplification
  • when you have understood the parts of the phenomenon, put them together, study the interactions between them, and test them in concert
• This is quite different from the holistic view usually taken in the humanities, but it is not incompatible
• Human (musical) behaviour is at the start and the end of this process:
  • theories behind the models come from observation of musical behaviour
  • results from models are tested against musical behaviour

What are the limitations of cognitive modelling?
• A model is only as good as
  • the theory it embodies
  • the computational implementation
  • the input data
  • the input and output data representation
• We must always question and test (and re-test) results because of these potential sources of error
• We can only take one small step at a time
  • this science is in its infancy: we must not rush ahead and make mistakes
• Therefore, we have to be satisfied with small, focused, isolated results
  • we look at how a given aspect of something changes, given that everything else stays the same – an artificial situation
• The results are only ever approximations
  • we continue to refine models as our understanding improves

What about music?
• If music is (at least originally) a psychological phenomenon, then there are probably interesting things to learn about it by treating it as such
  • not least: WHY is it the way it is?

What are the requirements of a cognitive model?
• We must be careful to make the right abstraction of our data
  • A representation based on a 12-note octave will not be able to model phenomena related to microtonal music
  • A representation based on a 12-note octave will not be able to model phenomena related to conventional tonal tuning (e.g. playing into the key)
• There is a very good abstraction of Western Common Practice music: the score
  • models categorical pitch and time perception (and tonality if need be)
  • evolved over about 1,000 years to do this well
  • not good for everything (e.g. no means of representing instrumental timbre)
  • but very good at quite a lot!
• Many cognitive models of music use (an equivalent of) score notation

Two kinds of cognitive model (of music)
• Some models are descriptive (Wiggins, 2007)
  • they say what happens when stimuli are applied in each circumstance
  • they predict results in terms only of the application of rules
  • these rules may be complicated
  • these models do not explain WHY a cognitive effect is the way it is
  • they do explain WHAT the cognitive effect is, at the same level of abstraction as the representation they use
• Some models are explanatory (Wiggins, 2007)
  • they give a general underlying mechanism by which a phenomenon occurs
  • they predict results using this mechanism
  • they explain WHY a cognitive effect is the way it is (at some level of abstraction different from the representation)

Example 1: GTTM
• Generative Theory of Tonal Music (Lerdahl & Jackendoff, 1983)
• “complete” theory of tonal music (actually not – still being updated)
• has 4 components, each being a set of rules, written in English
  ‣ grouping
  ‣ metre
  ‣ time-span reduction
  ‣ prolongation
• within each, there are two kinds of rule
  ‣ fixed rules
  ‣ preference rules
• “preference” rules mean that GTTM is not a computerisable theory
  • therefore, it is not a rigorously objective model
• it is only a descriptive model, because there is no mechanism

Example 2: IDyOM
• Information Dynamics of Music (Pearce & Wiggins, 2006)
• explanatory model because it is based on an independent statistical process (also found to model human speech understanding)
• representation is (equivalent to) simple score
• “learning” model
  ‣ the system is told NO rules
  ‣ it “hears” lots of music (973 tonal folk melodies)
  ‣ it “learns” the musical structure and generalises from common occurrences
• predicts human expectation of tonal-melodic pitch (explains up to 91% of variance in human studies)

Example 2: IDyOM
• Short-term memory (STM): n-gram (arbitrary n) model
  • complex backoff/smoothing strategy
  • dynamic weighting of features used for prediction, according to information content
• Long-term memory (LTM): as STM, but trained with a database of >900 tonal melodies
[Figure: note data feeds the STM (this piece) and LTM (all pieces) models; their combined distribution yields entropy (“uncertainty”) and information content (“unexpectedness”)]

Example 2: IDyOM
• We have extended the model, using a further statistical technique
  • to predict melodic segmentation in tonal music (Müllensiefen et al., 2007)
  • to predict structure in minimalist music (Potter et al., 2007)
• This works by asking how certain the model is of its pitch prediction
• In information-theoretic terms (Shannon, 1948):
  • Uncertainty ≈ entropy
  • Unexpectedness ≈ high information content
• Our speculation/empirical evidence:
  • Closure (Narmour, 1990): drop in information content and entropy
  • Increase in information content and entropy ≈ beginning of new section

Example 2: IDyOM
• Two Pages (Glass, 1969)
  • Strictly systematic piece
• Useful because pitch is the only true dimension:
  • monodic
  • isochronous
  • monotimbral
[Music example: the opening figures of Two Pages, units numbered 1–8, each repeated many times (x36, x15, x14, x22, x16, x26, x14, x26, etc.)]
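The core computation sketched above — an n-gram model over pitches, from which entropy (“uncertainty”) and information content (“unexpectedness”) are derived — can be illustrated with a toy model. This is a minimal sketch only, not IDyOM’s actual implementation: IDyOM combines STM and LTM models over multiple features with a sophisticated backoff/smoothing strategy, whereas here a single fixed-order n-gram model with add-one smoothing stands in for the whole machinery, and all names are illustrative.

```python
import math
from collections import defaultdict

def train_ngram(pitches, n=3):
    """Count n-gram contexts in a pitch sequence (the 'learning' step)."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(pitches) - n + 1):
        context = tuple(pitches[i:i + n - 1])
        counts[context][pitches[i + n - 1]] += 1
    return counts

def predict(counts, context, alphabet):
    """Predictive distribution over the pitch alphabet; add-one smoothing
    stands in for IDyOM's much more complex backoff strategy."""
    seen = counts.get(tuple(context), {})
    total = sum(seen.values()) + len(alphabet)
    return {p: (seen.get(p, 0) + 1) / total for p in alphabet}

def entropy(dist):
    """'Uncertainty' before the next event, in bits (Shannon, 1948)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def information_content(dist, event):
    """'Unexpectedness' of the event that actually occurred, in bits."""
    return -math.log2(dist[event])

# A rigidly alternating toy melody (MIDI pitch numbers): the model becomes
# confident, so entropy is low and the expected continuation has low IC.
melody = [60, 62] * 10
model = train_ngram(melody, n=3)
dist = predict(model, (60, 62), alphabet=[60, 62])
print(entropy(dist))                  # low uncertainty
print(information_content(dist, 60))  # expected note: low IC
print(information_content(dist, 62))  # surprising note: high IC
```

Note-by-note rises in information content, computed in this fashion, are what the segmentation studies above treat as candidate section boundaries.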
Example 2: IDyOM
[Figure: STM entropy and information content (bits per event) over Two Pages, 0–7000 quavers, with Parts I–V marked; the boundary of Part IV is shown both where Glass places it and where York/the model place it]

Discussion: Part IV?
[Music examples: (a) the Glass/Potter boundary of Part IV; (b) the York/model boundary]
• The score shows that the first section of Glass’ Part IV (a) is in fact exactly congruent with the preceding section — that is, it sounds like part of that preceding section
• York (1981) analysed Two Pages by transcribing a performance, and places the boundary of Part IV at the same place as our system (b)

Study 2: Gradus (Philip Glass)
• Not a strictly systematic piece
[Music example, shown over several slides: the opening of Gradus, q = 132]

Example 2: IDyOM
[Figure: STM entropy and information content (bits per event) over Gradus, 0–3500 quavers, Parts I–II, with the bar numbers of analytically salient points marked along the curves]

Example 2: IDyOM
• There are clear correspondences between the expert music analysis and (changes in direction in) the curves output by our model
• Future issues to resolve:
  • which statistical properties of monodic music most reliably predict perceived boundaries and human reactions?
  • will the minimalist music results generalise?
  • (how) does this correspond with what brains do?
  • how do these interact with other dimensions of music (e.g. rhythmic, metrical, harmonic structure) in influencing perceived grouping structure?

Acknowledgements
• This work is funded by UK Engineering and Physical Sciences Research Council grants
  • GR/S82220/01: “Techniques and Algorithms for Understanding the Information Dynamics of Music”
  • EP/D038855/01: “Modelling Musical Memory and the Perception of Melodic Similarity”

References
Lerdahl, F. & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Müllensiefen, D., Pearce, M. T., Wiggins, G. A. & Frieler, K. (2007). Segmenting Pop Melodies: A Model Comparison Approach. Proceedings of SMPC’07, Montreal, Canada.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press.
Pearce, M. T. & Wiggins, G. A. (2006). Expectancy in melody: The influence of context and learning. Music Perception, 23(5), 377–405.
Potter, K., Wiggins, G. A. & Pearce, M. T. (2007). Towards Greater Objectivity in Music Theory: Information-Dynamic Analysis of Minimalist Music. Musicae Scientiae, 11(2), 295–324.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656.
Wiggins, G. A. (2007). Models of Musical Similarity. Musicae Scientiae, Discussion Forum 4a, 315–338.