Audio Scene Analysis and Music
Cognitive Elements of Music Listening
Kevin D. Donohue
Databeam Professor
Electrical and Computer Engineering
University of Kentucky
What is Music?
1 a : the science or art of ordering tones or
sounds in succession, in combination, and
in temporal relationships to produce a
composition having unity and continuity b :
vocal, instrumental, or mechanical sounds
having rhythm, melody, or harmony
Merriam-Webster Online Dictionary:
http://www.m-w.com/dictionary/music
Auditory Scene: Input
• Sensory organs (ears) separate acoustic energy into frequency bands and convert band energy into neural firings.
• The auditory cortex receives the neural responses and abstracts an auditory scene.
[Figure: plot with axes Time and Frequency]
http://hyperphysics.phy-astr.gsu.edu/hbase/sound/hearcon.html
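The band-splitting step described on this slide can be approximated in software. The following is a minimal sketch, not a model of the cochlea: it assumes fixed-length, non-overlapping frames and uses the Goertzel recurrence to measure signal power at a few hand-picked frequencies. The function names, frame length, and band frequencies are illustrative assumptions and do not come from the slides.

```python
import math

def goertzel_power(frame, sample_rate, freq):
    """Squared DFT magnitude of `frame` at the bin nearest `freq` (Hz)."""
    n = len(frame)
    k = round(n * freq / sample_rate)            # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2         # Goertzel recurrence
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def band_energies(signal, sample_rate, band_freqs, frame_len):
    """Per-frame power at each frequency in `band_freqs` (a crude time-frequency map)."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [[goertzel_power(signal[i:i + frame_len], sample_rate, fc)
             for fc in band_freqs]
            for i in starts]

# Assumed test signal: a 1000 Hz sine sampled at 8000 Hz for 0.2 s.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1600)]
energies = band_energies(tone, sample_rate, [500, 1000, 2000], frame_len=400)
for row in energies:
    print(["%.1f" % e for e in row])
```

Each printed row is one time frame and each column one frequency band, so the 1000 Hz column dominates every row for this test signal.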
Auditory Scene: Perception
• Perception derives a useful representation of reality from sensory input.
• An auditory stream is a perceptual unit associated with a single happening (A.S. Bregman, 1990).
[Diagram: Acoustic to Neural Conversion -> Organize into Auditory Streams -> Representation of Reality]
Auditory Stream Experiment
Bregman & Campbell (1971)
• Streams tend to form by grouping notes close in time and frequency (similarity and proximity).
• Click on the spectrograms to play each tone sequence. Identify changes in tone grouping based on separation in time and frequency.
http://www.psych.mcgill.ca/labs/auditory/demo3.html
http://www.psych.mcgill.ca/labs/auditory/demo2.html
Note the change in grouping/phrasing from inserting a pair of closely spaced tones around the lower tone.
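The trade-off between time and frequency proximity can be illustrated with a toy calculation. This is a sketch under loud assumptions, not a perceptual model and not the procedure used in the cited experiment: the distance measure, the `time_scale` and `pitch_scale` constants, and all function names are hypothetical. Each tone is compared with its immediate predecessor and with the tone two steps back; if the earlier tone is always "closer", the alternating sequence is predicted to split into a high stream and a low stream.

```python
import math

def semitones(f1, f2):
    """Signed pitch distance between two frequencies, in semitones."""
    return 12.0 * math.log2(f1 / f2)

def proximity(a, b, time_scale=0.1, pitch_scale=3.0):
    """Toy distance between tones a and b, each an (onset_seconds, frequency_hz) pair.
    The two scale constants are arbitrary assumptions chosen for illustration."""
    dt = (a[0] - b[0]) / time_scale
    dp = semitones(a[1], b[1]) / pitch_scale
    return math.hypot(dt, dp)

def alternating_sequence(low_hz, high_hz, n_tones, onset_step):
    """High, low, high, low, ... tones with a fixed onset spacing."""
    return [(i * onset_step, high_hz if i % 2 == 0 else low_hz)
            for i in range(n_tones)]

def predicts_two_streams(tones):
    """True if every tone is nearer to the tone two steps back than to its predecessor."""
    return all(proximity(tones[i], tones[i - 2]) < proximity(tones[i], tones[i - 1])
               for i in range(2, len(tones)))

fast_wide = alternating_sequence(400.0, 1600.0, 8, onset_step=0.1)
fast_close = alternating_sequence(400.0, 450.0, 8, onset_step=0.1)
slow_wide = alternating_sequence(400.0, 1600.0, 8, onset_step=1.0)

for name, seq in [("fast/wide", fast_wide), ("fast/close", fast_close), ("slow/wide", slow_wide)]:
    print(name, "-> two streams" if predicts_two_streams(seq) else "-> one stream")
```

Under these assumed scales, only the fast, widely separated sequence is predicted to split; slowing the sequence or narrowing the frequency gap keeps it as a single stream.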
Circularity in Pitch Judgement
• Shepard’s Scale (1964)
(Auditory Demonstrations CD, from the Acoustical Society of America)
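A Shepard tone is commonly built from sinusoids spaced one octave apart, weighted by a fixed bell-shaped envelope over log frequency, so that moving up a full octave returns exactly the same set of components. The sketch below assumes a raised-cosine envelope, a base frequency of 27.5 Hz, and eight octaves; these parameter values and the function names are illustrative assumptions, not details taken from the demonstration CD.

```python
import math

def shepard_components(step, steps_per_octave=12, base_hz=27.5, n_octaves=8):
    """Return (frequency_hz, amplitude) pairs for one step of a Shepard scale."""
    frac = (step % steps_per_octave) / steps_per_octave
    components = []
    for octave in range(n_octaves):
        pos = octave + frac                      # position in octaves above base_hz
        freq = base_hz * 2.0 ** pos
        # Raised-cosine weight: zero at both ends of the range, one in the middle.
        amp = 0.5 - 0.5 * math.cos(2.0 * math.pi * pos / n_octaves)
        components.append((freq, amp))
    return components

def synthesize(components, duration, sample_rate=16000):
    """Sum the sinusoidal components into a list of samples."""
    n_total = round(duration * sample_rate)
    return [sum(a * math.sin(2.0 * math.pi * f * n / sample_rate)
                for f, a in components)
            for n in range(n_total)]

# Twelve ascending steps return to the starting set of components.
scale = [shepard_components(step) for step in range(13)]
print(scale[0] == scale[12])
samples = synthesize(scale[0], duration=0.01)
```

Because step 12 has the same components as step 0, a repeating ascending scale has no highest note, which is the circularity named on the slide.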
Perceptual Organization
Organization properties:
• Belongingness – a sensory element belongs to an organization (or stream) of which it is a part.
• Exclusive allocation – a sensory element cannot belong to more than one organization at a time.
• Bregman & Rudnicky (1975)
• Click on each spectrogram to listen to the tone sequence. In the first case, the later tonal group sounds like one stream due to time proximity. In the second case, flanking the lower tones with a sequence at the same frequency separates the lower tones from the upper tones, creating 2 separate streams.
Perceptual Organization
Organization properties:
• Closure – perceived continuity: a tendency to close strong perceptual forms, as a response to missing evidence.
• Click on the time waveform plots to listen. In the first case, a low-level tone plays and then stops, but the gap is covered by a white noise mask. Most listeners will hear the tone playing through the mask.
Tone pattern first spectrogram
White noise only, used in masking
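A stimulus of the kind described on this slide can be sketched as follows: a quiet tone with a silent gap, and a second version in which the gap is filled with louder white noise. The amplitudes, gap timing, sample rate, and function names are assumptions made for illustration; the original demonstration files may differ.

```python
import math
import random

def tone_with_gap(freq, duration, gap_start, gap_end, sample_rate=8000, amp=0.2):
    """Low-level sine tone that is silent between gap_start and gap_end (seconds)."""
    n_total = round(duration * sample_rate)
    g0, g1 = round(gap_start * sample_rate), round(gap_end * sample_rate)
    return [0.0 if g0 <= n < g1
            else amp * math.sin(2.0 * math.pi * freq * n / sample_rate)
            for n in range(n_total)]

def fill_gap_with_noise(signal, gap_start, gap_end, sample_rate=8000,
                        noise_amp=0.8, seed=0):
    """Copy of `signal` with the gap replaced by uniform white noise."""
    rng = random.Random(seed)                    # seeded so the result is repeatable
    out = list(signal)
    g0 = round(gap_start * sample_rate)
    g1 = min(round(gap_end * sample_rate), len(out))
    for n in range(g0, g1):
        out[n] = rng.uniform(-noise_amp, noise_amp)
    return out

gapped = tone_with_gap(440.0, duration=1.0, gap_start=0.4, gap_end=0.6)
masked = fill_gap_with_noise(gapped, gap_start=0.4, gap_end=0.6)
```

In `gapped` the interruption is plainly audible as silence; in `masked` the tone is physically absent during the noise, yet the slide reports that most listeners hear it continue.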
Sequential and Spectral Integration
Sequential Integration
• Grouping sensory elements or events that occur at different times and treating them as coming from the same source.
Melody, rhythm
Spectral Integration
• Fusing simultaneous sensory elements across frequency into one stream.
Timbre, harmony
Timbre and Spectral Integration
• The harmonic structure (spectral envelope) and the time envelope give rise to the timbre of the sound.
• Click on the spectra to hear each sound. Note the impact of the spectral and time envelopes.
[Figure: three sounds, each shown as a spectrum (dB vs. Hertz, 0 to 12000 Hz) and a time waveform (Amplitude vs. Seconds, 0 to 1 s)]
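The two ingredients of timbre named on this slide, harmonic structure and time envelope, can be varied independently in a simple additive synthesizer. The sketch below is an illustration only: the harmonic weights, the envelope shapes, and the function names are assumptions, and they are not the sounds plotted in the figure.

```python
import math

def synthesize_note(f0, harmonic_amps, envelope, duration=1.0, sample_rate=8000):
    """Sum harmonics of f0 (weights in harmonic_amps) and shape them with envelope(t)."""
    n_total = round(duration * sample_rate)
    samples = []
    for n in range(n_total):
        t = n / sample_rate
        partials = sum(a * math.sin(2.0 * math.pi * (k + 1) * f0 * t)
                       for k, a in enumerate(harmonic_amps))
        samples.append(envelope(t) * partials)
    return samples

def decaying(t):
    """Sharp attack followed by exponential decay (plucked or struck character)."""
    return math.exp(-4.0 * t)

def swelling(t):
    """Slow 0.2 s attack followed by a sustained level (bowed or blown character)."""
    return min(1.0, t / 0.2)

# Same pitch (220 Hz), different spectral and time envelopes.
bright_pluck = synthesize_note(220.0, [1.0, 0.8, 0.6, 0.5, 0.4], decaying)
mellow_sustain = synthesize_note(220.0, [1.0, 0.3, 0.1], swelling)
```

Both notes share the same fundamental, so any difference heard between them comes from the harmonic weights and the envelope.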
Timbre and Spectral Integration
• Simultaneous tones grouped by timbre
• Click on the spectrograms to play the sounds. Note that different spectral bands do not sound like different streams. Just one stream is heard.
[Figure: two spectrograms (Hertz, 0 to 5000, vs. Seconds, 0 to 0.4), titled "2 Notes (F and A)" and "Same Note (A)"]
Auditory Scene Organization
• Primitive Stream Segregation
  • Inherent constraints in auditory scene analysis (perceptual organization demonstrated by infants/children)
  • Music: Organization of musical sensory units
• Schema-based segregation
  • Learned constraints in auditory scene analysis (differences in perceptual organization resulting from training and culture)
  • Music: Differences between musicians and non-musicians
  • Music: Differences resulting from acculturation
(A.S. Bregman, Auditory Scene Analysis, MIT Press, 1990, pp. 1-45)
Music Related Terms
• Pitch – Perceived frequency/fundamental tone (20 Hz to 20 kHz range)
• Melody – Pattern of tones identified by the intervals between consecutive pitches
• Contour – Shape of the melody without regard to intervals
• Loudness – Perceived intensity of sound (0 dB to 120 dB)
• Timbre – Nature of a sound defined mostly by its harmonic structure and time envelope
• Rhythm – Repeated pattern of strong and weak sounds
• Tempo – Rate of the rhythm
Melody Invariance
• A melody can typically be recognized over changes in pitch, loudness, timbre, tempo, spatial location, and reverberation.
• Contours are typically recalled better than actual melodies (intervals) for unfamiliar tunes (Massaro, Kallman, and Kelly, 1980).
(Daniel J. Levitin, Memory for Musical Attributes, in Music Cognition and Computerized Sound, ed. P.R. Cook, MIT Press, 1999, pp. 209-227)
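The distinction between a melody's intervals and its contour, and the invariance of intervals under a change of pitch level, can be made concrete with a few lines of code. This sketch assumes pitches are written as MIDI note numbers (60 = middle C); the example note lists and function names are illustrative and do not come from the cited studies.

```python
def intervals(pitches):
    """Semitone steps between consecutive pitches (MIDI note numbers)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def contour(pitches):
    """Direction of each step: +1 up, -1 down, 0 repeated note."""
    return [(d > 0) - (d < 0) for d in intervals(pitches)]

def transpose(pitches, semitones):
    """Shift every pitch by the same number of semitones."""
    return [p + semitones for p in pitches]

tune = [60, 60, 67, 67, 69, 69, 67]       # C C G G A A G
shifted = transpose(tune, 5)              # same tune starting on F
variant = [60, 60, 64, 64, 65, 65, 62]    # same shape, different step sizes

print(intervals(tune) == intervals(shifted))   # transposition keeps the intervals
print(contour(tune) == contour(variant))       # same contour
print(intervals(tune) == intervals(variant))   # but different intervals
```

The transposed tune matches on intervals, which is one way to express melody recognition across pitch changes. The variant matches only on contour, the coarser description that the slide reports is recalled better for unfamiliar tunes.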
Primitive Musical Perception
• Distinguish between cognitive components present at an early age and those resulting from acculturation.
• Infant: Grasp of musical structures
• Adult: Develop cognitive strategies for applying musical structures
(W. Jay Dowling, The Development of Music Perception and Cognition, in The Psychology of Music, Academic Press, 1999, pp. 603-625)
Summary
• Innate perceptual organization separates sounds from different sources. Grouping by pitch, contour, rhythm (phrasing), and timbre is exhibited by infants.
• Acculturation refines melody distinctions and their relationship to harmonies and rhythms based on cultural scales and patterns.
• Melodic memory is enhanced for melodies that follow the notes of a known scale.
• Auditory scene analysis operations apply broadly to all sounds (speech, noise, music). Why some auditory streams become pleasurable/stimulating/interesting (music), while others are simply used to form a perception of reality, is still not clear.
How many streams are there?
[Figure: "Tell Me Ma - Spectrogram in dB"; axes Hertz (0 to 8000) vs. Seconds (0 to 15), color scale 0 to 120 dB]
Interesting Websites
• Mind, Music, and Machine
http://www.nici.kun.nl/mmm/
• Auditory Scene Analysis
http://www.psych.mcgill.ca/labs/auditory/introASA.html
• Joe Wolfe’s Web Page
http://www.phys.unsw.edu.au/~jw/Joe.html