CASA - University of Kentucky

Computational Auditory Scene Analysis
Kevin D. Donohue
Databeam Professor
Electrical and Computer Engineering
University of Kentucky
Describe What You Hear
Scene 1
Scene 2
Scene 3
Sounds downloaded from
http://www.prankcallsunlimited.com/
Auditory Scene Analysis
• Auditory Scene Analysis (ASA) is a cognitive process that organizes sounds into perceptual objects.
• Computational Auditory Scene Analysis (CASA) uses computational models to study ASA.
Auditory Scene: Input
• Sensory organs (ears) separate acoustic energy into frequency bands and convert band energy into neural firings.
• The auditory cortex receives the neural responses and abstracts an auditory scene.
[Figure: conversion of sound into neural responses by the ear, shown on frequency vs. time axes. Source: http://hyperphysics.phy-astr.gsu.edu/hbase/sound/hearcon.html]
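The input stage can be approximated, very roughly, by splitting the signal into frequency bands and measuring the energy in each. The sketch below is a minimal illustration under stated assumptions: it frames the signal with a Hann window, takes a plain DFT, and sums the power into log-spaced bands. The function name `band_energies`, the log spacing, and all default parameters are hypothetical choices; CASA front ends typically use gammatone filters on an ERB scale instead, and this sketch stops at band energy without modeling neural firing.

```python
import cmath
import math


def band_energies(signal, fs, frame_len=256, hop=128, n_bands=8, f_lo=100.0):
    """Return one list of band energies per frame.

    Assumption: log-spaced bands from f_lo to fs/2 stand in for cochlear
    frequency analysis (real models use gammatone/ERB filterbanks).
    """
    f_hi = fs / 2
    # Hypothetical band edges, spaced logarithmically
    edges = [f_lo * (f_hi / f_lo) ** (i / n_bands) for i in range(n_bands + 1)]
    # Periodic Hann window to reduce spectral leakage
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        x = [signal[start + n] * window[n] for n in range(frame_len)]
        bands = [0.0] * n_bands
        for k in range(frame_len // 2 + 1):
            # Plain DFT bin k (slow, but dependency-free)
            X = sum(x[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            f = k * fs / frame_len
            # Add this bin's power to the band that contains its frequency
            for b in range(n_bands):
                if edges[b] <= f < edges[b + 1]:
                    bands[b] += abs(X) ** 2
                    break
        frames.append(bands)
    return frames
```

For a 500 Hz tone sampled at 8 kHz, the largest energy in every frame falls in band 3 (roughly 399 to 632 Hz), which is the kind of place coding the slide describes.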
Auditory Scene: Perception
• Perception derives a useful representation of reality from sensory input.
• Auditory Stream refers to a perceptual unit associated with a single happening (A.S. Bregman, 1990).
[Figure: block diagram with the stages Acoustic to Neural Conversion, Primitive/Bottom-up Processes, Schema-driven/Top-down Processes, Organize into Auditory Streams, Representation of Reality, and High-Level Cognition]
Auditory Stream Experiment
Bregman & Campbell (1971)
• Streams tend to form by grouping notes close in time and frequency (similarity and proximity).
http://www.psych.mcgill.ca/labs/auditory/demo3.html
http://www.psych.mcgill.ca/labs/auditory/demo2.html
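The proximity principle can be illustrated with a toy grouping rule. In this sketch, a tone joins the recently active stream whose last tone is nearest in log frequency, provided that distance is small enough; otherwise it starts a new stream. The function `assign_streams` and both thresholds are hypothetical and are not Bregman & Campbell's procedure; the rule is far simpler than any perceptual model. Because each tone is placed in exactly one stream, it also respects exclusive allocation.

```python
import math


def assign_streams(tones, max_semitones=4.0, max_gap=0.3):
    """Group (onset_s, freq_hz) tones into streams by time/frequency proximity.

    Assumption (toy rule): a stream can only be extended if its last tone is
    within max_semitones in pitch and started at most max_gap seconds earlier.
    """
    streams = []  # each stream is a list of (onset, freq)
    for onset, freq in sorted(tones):
        best, best_dist = None, None
        for stream in streams:
            last_onset, last_freq = stream[-1]
            semis = abs(12 * math.log2(freq / last_freq))  # pitch distance
            if semis <= max_semitones and onset - last_onset <= max_gap:
                if best_dist is None or semis < best_dist:
                    best, best_dist = stream, semis
        if best is None:
            streams.append([(onset, freq)])  # nothing close enough: new stream
        else:
            best.append((onset, freq))  # exclusive allocation: one stream only
    return streams
```

With this rule, alternating 400 Hz and 1600 Hz tones (two octaves apart) split into two streams, while alternating 400 Hz and 450 Hz tones stay in one.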
Circularity in Pitch Judgement
• Shepard’s Scale (1964)
(Auditory Demonstrations CD, from the Acoustical Society of America)
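Shepard tones can be synthesized by summing partials spaced exactly one octave apart, weighted by a fixed bell-shaped envelope over log frequency. Stepping up a semitone shifts every partial while the envelope stays put, so after 12 steps the spectrum is identical to the starting one, which is what makes the pitch judgement circular. The sketch assumes a Gaussian envelope and illustrative parameter values; `shepard_partials` and `shepard_tone` are hypothetical names.

```python
import math


def shepard_partials(step, f_min=20.0, n_octaves=9, center_oct=4.5, width_oct=1.5):
    """(frequency_hz, amplitude) pairs for the Shepard tone at chromatic step.

    Assumption: a Gaussian envelope over log2 frequency, centered center_oct
    octaves above f_min.
    """
    frac = (step % 12) / 12.0  # only the pitch class matters, so step 12 == step 0
    partials = []
    for k in range(n_octaves):
        pos = k + frac  # position in octaves above f_min
        amp = math.exp(-0.5 * ((pos - center_oct) / width_oct) ** 2)
        partials.append((f_min * 2 ** pos, amp))
    return partials


def shepard_tone(step, fs=16000, dur=0.25):
    """Synthesize one Shepard tone as a list of samples."""
    n = round(fs * dur)
    out = [0.0] * n
    for f, amp in shepard_partials(step):
        if f >= fs / 2:
            continue  # drop partials at or above Nyquist
        w = 2 * math.pi * f / fs
        for i in range(n):
            out[i] += amp * math.sin(w * i)
    return out
```

Playing `shepard_tone(s)` for s = 0, 1, ..., 11 and then repeating gives a scale that seems to keep rising.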
Perceptual Organization
Organization properties:
• Belongingness – a sensory element belongs to the organization (or stream) of which it is a part.
• Exclusive allocation – a sensory element cannot belong to more than one organization at a time.
• Bregman & Rudnicky (1975)
Perceptual Organization
Organization properties:
• Closure – perceived continuity: a tendency to close (complete) strong perceptual forms in response to missing evidence.
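Closure can be demonstrated with the continuity illusion: a tone interrupted by silent gaps is heard as broken, but when the gaps are filled with noise loud enough to have masked the tone, listeners typically hear the tone continue through the noise. The sketch below only generates the stimulus samples and does not model the perception; the function name `continuity_stimulus` and all parameter values are hypothetical.

```python
import math
import random


def continuity_stimulus(fill_gaps_with_noise, fs=16000, f=1000.0, tone_s=0.3,
                        gap_s=0.1, cycles=3, noise_amp=1.0, seed=0):
    """Alternate tone segments with gaps; optionally fill each gap with noise.

    Assumption: uniform white noise, louder than the 0.3-amplitude tone,
    stands in for the masking noise bursts.
    """
    rng = random.Random(seed)
    n_tone, n_gap = round(fs * tone_s), round(fs * gap_s)
    out = []
    t = 0  # running sample index keeps the tone phase-continuous across gaps
    for _ in range(cycles):
        for _ in range(n_tone):
            out.append(0.3 * math.sin(2 * math.pi * f * t / fs))
            t += 1
        for _ in range(n_gap):
            out.append(rng.uniform(-noise_amp, noise_amp) if fill_gaps_with_noise else 0.0)
            t += 1
    return out
```

Comparing `continuity_stimulus(False)` with `continuity_stimulus(True)` contrasts the silent-gap and noise-filled versions; the tone segments are identical in both.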
Sequential and Spectral Integration in Forming Streams
Sequential Integration
• Grouping sensory elements over time, i.e., treating events at different times as coming from the same source/object.
Spectral Integration
• Fusing simultaneous sensory elements across frequency into a single object.
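One cue for spectral integration is harmonicity: simultaneous partials that are integer multiples of a common fundamental tend to fuse. The minimal "harmonic sieve" sketch below assumes the candidate fundamental `f0` is already known and uses an arbitrary 3% tolerance (both assumptions; `harmonic_group` is a hypothetical name). It separates the partials that fit `f0` from those that do not.

```python
def harmonic_group(partials_hz, f0, tol=0.03):
    """Split simultaneous partials into those fused with f0 and the rest.

    A partial is accepted if it lies within tol (relative, an assumed value)
    of the nearest integer multiple of f0.
    """
    fused, other = [], []
    for f in partials_hz:
        h = round(f / f0)  # nearest harmonic number
        if h >= 1 and abs(f - h * f0) <= tol * h * f0:
            fused.append(f)
        else:
            other.append(f)
    return fused, other
```

With partials from two interleaved series (fundamentals 200 Hz and 310 Hz), sieving at 200 Hz keeps 200, 400, 600 and 800 Hz; sieving the remainder at 310 Hz recovers 310, 620 and 930 Hz.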
Timbre and Spectral Integration
• The time envelope and harmonic structure give rise to the timbre of the sound.
[Figure: time waveforms (Amplitude vs. Seconds, 0 to 1 s) and magnitude spectra (dB vs. Hertz, 0 to 12000 Hz) of three example sounds]
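To hear how the time envelope and harmonic structure shape timbre, one can synthesize tones with the same fundamental but different harmonic amplitudes and envelopes. This sketch assumes a linear attack followed by an exponential decay; `harmonic_tone` and the example settings are hypothetical, chosen only to contrast a fast-attack, harmonically rich tone with a slow-attack, odd-harmonic one.

```python
import math


def harmonic_tone(f0, harmonic_amps, attack_s, decay_s, fs=16000, dur=1.0):
    """Synthesize a tone from a harmonic amplitude profile and a time envelope.

    harmonic_amps[k] is the amplitude of harmonic k+1.
    Assumption: linear attack, then exponential decay (a simplified envelope).
    """
    n = round(fs * dur)
    n_att = max(1, round(fs * attack_s))
    t_att = n_att / fs
    # Keep only harmonics below Nyquist
    partials = [((k + 1) * f0, a) for k, a in enumerate(harmonic_amps)
                if (k + 1) * f0 < fs / 2]
    out = []
    for i in range(n):
        t = i / fs
        if i < n_att:
            env = i / n_att  # linear attack from 0 to 1
        else:
            env = math.exp(-(t - t_att) / decay_s)  # exponential decay from 1
        s = sum(a * math.sin(2 * math.pi * f * t) for f, a in partials)
        out.append(env * s)
    return out


# Same pitch (440 Hz), different timbres
plucked = harmonic_tone(440.0, [1 / k for k in range(1, 11)], attack_s=0.005, decay_s=0.15)
bowed = harmonic_tone(440.0, [1.0, 0.0, 0.5, 0.0, 0.3], attack_s=0.2, decay_s=0.5)
```

Both tones share the fundamental, so any perceived difference between them comes from the envelope and the harmonic profile.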
Timbre and Spectral Integration
Simultaneous tones grouped by timbre
[Figure: two spectrograms (Hertz, 0 to 5000, vs. Seconds, 0 to 0.4) with panels "2 Notes (F and A)" and "Same Note (A)"]
Auditory Scene Organization
• Primitive Stream Segregation
  • Inherent constraints in auditory scene analysis (perceptual organization demonstrated by infants/children)
• Schema-based segregation
  • Learned constraints in auditory scene analysis (differences in perceptual organization resulting from training and culture)
(A.S. Bregman, Auditory Scene Analysis, MIT Press 1990, pp. 1-45)
Cues Used for Grouping
From "The Auditory Organization of Speech and Other Sources in Listeners and Computational Models," M. Cooke and D. P. W. Ellis, 1999