Audio Scene Analysis and Music
Cognitive Elements of Music Listening
Kevin D. Donohue
Databeam Professor, Electrical and Computer Engineering
University of Kentucky

What is Music?
1 a: the science or art of ordering tones or sounds in succession, in combination, and in temporal relationships to produce a composition having unity and continuity
b: vocal, instrumental, or mechanical sounds having rhythm, melody, or harmony
(Merriam-Webster Online Dictionary: http://www.m-w.com/dictionary/music)

Auditory Scene: Input
The sensory organs (ears) separate acoustic energy into frequency bands and convert the band energy into neural firings. The auditory cortex receives these neural responses and abstracts an auditory scene.
[Figure (axes: Time, Frequency); source: http://hyperphysics.phy-astr.gsu.edu/hbase/sound/hearcon.html]

Auditory Scene: Perception
Perception derives a useful representation of reality from sensory input. An auditory stream is a perceptual unit associated with a single happening (A.S. Bregman, 1990).
Processing chain: acoustic-to-neural conversion, then organization into auditory streams, then a representation of reality.

Auditory Stream Experiment (Bregman & Campbell, 1971)
Streams tend to form by grouping notes that are close in time and frequency (similarity and proximity). Click on the spectrograms to play the tone sequences, and identify how the tone grouping changes with separation in time and frequency.
http://www.psych.mcgill.ca/labs/auditory/demo3.html
http://www.psych.mcgill.ca/labs/auditory/demo2.html
Note the change in grouping/phrasing caused by inserting a pair of closely spaced tones around the lower tone.

Circularity in Pitch Judgement
Shepard's Scale (1964) (Auditory Demonstrations CD, Acoustical Society of America)

Perceptual Organization
Organization properties:
Belongingness: a sensory element belongs to the organization (or stream) of which it is a part.
Exclusive allocation: a sensory element cannot belong to more than one organization at a time.
Bregman & Rudnicky (1975): click on the spectrogram to listen to the tone sequence. In the first case, the later tonal group sounds like one stream because of its proximity in time. In the second case, flanking the lower tones with a sequence at the same frequency separates the lower tones from the upper tones, creating two separate streams.

Perceptual Organization
Organization properties:
Closure: perceived continuity; a tendency to close strong perceptual forms; a response to missing evidence.
Click on the time-waveform plots to listen. In the first case a low-level tone plays and then stops, but the gap is covered by a white-noise mask. Most listeners will hear the tone continue through the mask.
[Figure panels: "Tone pattern" (first spectrogram); "White noise only, used in masking"]

Sequential and Spectral Integration
Sequential integration: grouping sensory elements over time, i.e., treating events at different times as coming from the same source (melody, rhythm).
Spectral integration: fusing simultaneous sensory elements across frequency into one percept (timbre, harmony).

Timbre and Spectral Integration
The harmonic structure (spectral envelope) and the time envelope give rise to the timbre of a sound. Click on the spectra to hear each sound, and note the impact of the spectral and time envelopes.
[Figure: three sounds, each shown as a spectrum (dB from -60 to 0, versus 0 to 12000 Hz) and a time waveform (amplitude versus 0 to 1 s)]

Timbre and Spectral Integration
Simultaneous tones grouped by timbre: click on the spectrograms to play the sounds. Note that the different spectral bands do not sound like different streams; just one stream is heard.
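The timbre slides state that a sound's harmonic structure (spectral envelope) and time envelope together determine its timbre. The following is a minimal Python sketch of that idea, assuming NumPy is available: it builds a tone from harmonics of a fundamental, weights them with a spectral envelope, and shapes the result with a time envelope. The helper names `harmonic_tone` and `spectral_centroid`, the 440 Hz fundamental, the harmonic weights, and the attack/exponential-decay envelope are illustrative assumptions, not the stimuli used in the slides.

```python
import numpy as np

def harmonic_tone(f0, harmonic_amps, duration, fs=24000, attack=0.01, decay=3.0):
    """Hypothetical helper: sum of harmonics of f0 shaped by a time envelope.

    harmonic_amps[k-1] is the spectral-envelope weight of harmonic k.
    The envelope (linear attack, then exponential decay) is an assumption.
    """
    t = np.arange(int(duration * fs)) / fs
    tone = np.zeros_like(t)
    for k, amp in enumerate(harmonic_amps, start=1):
        if k * f0 < fs / 2:  # skip harmonics at or above Nyquist to avoid aliasing
            tone += amp * np.sin(2 * np.pi * k * f0 * t)
    # Time envelope: rise linearly over `attack` seconds, then decay exponentially.
    env = np.minimum(t / attack, 1.0) * np.exp(-decay * t)
    tone *= env
    return tone / np.max(np.abs(tone))  # normalize the peak amplitude to 1

def spectral_centroid(x, fs):
    """Magnitude-weighted mean frequency: a rough numeric proxy for 'brightness'."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return np.sum(freqs * mag) / np.sum(mag)

fs = 24000
# Same pitch (440 Hz) and same time envelope, different spectral envelopes.
bright = harmonic_tone(440.0, [1.0 / k for k in range(1, 21)], 1.0, fs=fs)  # 20 harmonics
dull = harmonic_tone(440.0, [1.0, 0.3, 0.05], 1.0, fs=fs)  # 3 weak upper harmonics
print(spectral_centroid(bright, fs), spectral_centroid(dull, fs))
```

Both tones share a pitch and a time envelope, so any audible difference comes from the spectral envelope; changing `attack` or `decay` instead varies only the time envelope.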
[Figure: spectrograms (0 to 5000 Hz, 0 to 0.4 s) in two panels, "Same Note (A)" and "2 Notes (F and A)"]

Auditory Scene Organization
Primitive stream segregation: inherent constraints in auditory scene analysis (perceptual organization demonstrated by infants/children).
Music: organization of musical sensory units.
Schema-based segregation: learned constraints in auditory scene analysis (differences in perceptual organization resulting from training and culture).
Music: differences between musicians and non-musicians.
Music: differences resulting from acculturation.
(A.S. Bregman, Auditory Scene Analysis, MIT Press, 1990, pp. 1-45)

Music Related Terms
Pitch: perceived frequency/fundamental tone (range 20 Hz to 20 kHz)
Melody: pattern of tones identified by the intervals between consecutive pitches
Contour: shape of the melody without regard to the exact intervals
Loudness: perceived intensity of sound (0 dB to 120 dB)
Timbre: nature of a sound, defined mostly by its harmonic structure and time envelope
Rhythm: repeated pattern of strong and weak sounds
Tempo: rate of the rhythm

Melody Invariance
A melody can typically be recognized despite changes in pitch, loudness, timbre, tempo, spatial location, and reverberation. For unfamiliar tunes, contours are typically recalled better than the actual melodies (intervals) (Massaro, Kallman, and Kelly, 1980).
(Daniel J. Levitin, Memory for Musical Attributes, in Music Cognition and Computerized Sound, ed. P.R. Cook, MIT Press, 1999, pp. 209-227)

Primitive Musical Perception
Distinguish between cognitive components present at an early age and those resulting from acculturation.
Infant: grasp of musical structures.
Adult: develops cognitive strategies for applying musical structures.
(W. Jay Dowling, The Development of Music Perception and Cognition, in The Psychology of Music, Academic Press, 1999, pp. 603-625)

Summary
Innate perceptual organization separates sounds from different sources.
Grouping by pitch, contour, rhythm (phrasing), and timbre is exhibited by infants.
Acculturation refines melodic distinctions and their relationships to harmonies and rhythms, based on cultural scales and patterns.
Melodic memory is enhanced for melodies whose notes follow a known scale.
Auditory scene analysis operations apply broadly to all sounds (speech, noise, music).
Why some auditory streams become pleasurable, stimulating, or interesting (music) while others are simply used to form a perception of reality is still not clear.

How many streams are there?
[Figure: "Tell Me Ma" spectrogram in dB (color scale 0 to 120 dB); frequency 0 to 8000 Hz versus time in seconds]

Interesting Websites
Mind, Music, and Machine: http://www.nici.kun.nl/mmm/
Auditory Scene Analysis: http://www.psych.mcgill.ca/labs/auditory/introASA.html
Joe Wolfe's Web Page: http://www.phys.unsw.edu.au/~jw/Joe.html
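The "How many streams are there?" slide shows the "Tell Me Ma" recording as a spectrogram in dB, that is, the short-time Fourier transform magnitude on a logarithmic scale. Below is a minimal NumPy sketch of how such a display can be computed, assuming a mono signal `x` at sample rate `fs`. The helper name `spectrogram_db`, the Hann window, the 1024-sample frame, and the 256-sample hop are illustrative assumptions, not the settings used for the slide. A synthetic signal (a 440 Hz tone followed by a 1000 Hz tone) stands in for the recording, which is not reproduced here.

```python
import numpy as np

def spectrogram_db(x, fs, n_fft=1024, hop=256):
    """Hypothetical helper: STFT magnitude in dB (requires len(x) >= n_fft).

    Returns (freqs_hz, times_s, s_db), where s_db has shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Each column is one windowed frame of the signal.
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    mag = np.abs(np.fft.rfft(frames, axis=0))
    s_db = 20 * np.log10(np.maximum(mag, 1e-12))  # small floor avoids log10(0)
    freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    times = (np.arange(n_frames) * hop + n_fft / 2) / fs  # frame centers in seconds
    return freqs, times, s_db

fs = 16000
t = np.arange(fs) / fs
# Stand-in signal: one second at 440 Hz, then one second at 1000 Hz.
x = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1000 * t)])
freqs, times, s_db = spectrogram_db(x, fs)
# Frequency of the strongest bin in the first and last frames.
first_peak = freqs[np.argmax(s_db[:, 0])]
last_peak = freqs[np.argmax(s_db[:, -1])]
print(first_peak, last_peak)
```

Plotting `s_db` with time on the horizontal axis and frequency on the vertical axis (for example with matplotlib's `pcolormesh`) gives a display of the same kind as the slide. A longer frame sharpens frequency resolution at the cost of time resolution.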