Sequential Organization from an Ecological Perspective

Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
dpwe@ee.columbia.edu
http://labrosa.ee.columbia.edu/
1. What is Sequential Organization / Streaming?
2. Why does Streaming Exist?
3. What are the Computational Implications?
Sequential Organization - Dan Ellis
2009-04-24 - 1 /10
1. Auditory Scene Analysis
Bregman ’90; Darwin & Carlyon ’95
• How do people analyze sound mixtures?
break mixture into elements (time-freq atoms)
elements are grouped into sources using cues
sources have aggregate attributes
• Grouping rules
cues: common onset/modulation, harmonicity, ...
[Figure (after Darwin 1996): block diagram of auditory scene analysis. Labels: Sound; Frequency analysis; Onset map; Harmonicity map; Spatial map; Atoms; Simultaneous grouping; Events; Event properties; Sequential grouping; Streams; Source/stream properties]
Auditory Streaming
Miller & Heise ’50
• “Discovered” by musicians,
beloved by psychologists...
• Ambiguity, buildup
Bregman & Campbell ’71; van Noorden ’75
[Figure: tone sequence, frequency vs. time. Labels: 1 kHz; Δf: –2 octaves; TRT: 60-150 ms]
Kashino et al. ’07
[Figure: frequency separation in semitones (0 to 15) vs. tone repetition time (TRT) in ms (0 to 200). Labels: temporal coherence boundary; ambiguous region; fission boundary]
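The two-tone streaming stimulus described above can be approximated with a short script. This is a minimal sketch, not the stimulus used in any of the cited studies: the function name, the 40 ms tone duration, the raised-cosine envelope, and the 16 kHz sample rate are all assumptions made for illustration. Only the 1 kHz base frequency, the semitone frequency separation, and the TRT parameter come from the slide.

```python
import math

def alternating_tones(f_low=1000.0, df_semitones=7.0, trt_ms=100.0,
                      tone_ms=40.0, n_tones=20, sr=16000):
    """Return (samples, f_high) for an alternating low/high tone sequence.

    trt_ms is the tone repetition time (onset-to-onset interval).
    df_semitones is the frequency separation between the two tones.
    """
    # 12 semitones per octave, so the upper tone is f_low * 2^(df/12).
    f_high = f_low * 2.0 ** (df_semitones / 12.0)
    hop = round(sr * trt_ms / 1000.0)      # samples between tone onsets
    n_tone = round(sr * tone_ms / 1000.0)  # samples per tone
    if n_tone < 2 or n_tone > hop:
        raise ValueError("tone_ms must be at least 2 samples and at most trt_ms")
    out = [0.0] * (hop * n_tones)
    for i in range(n_tones):
        f = f_low if i % 2 == 0 else f_high
        for n in range(n_tone):
            # Raised-cosine envelope avoids clicks at tone onset and offset.
            env = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_tone - 1))
            out[i * hop + n] = env * math.sin(2.0 * math.pi * f * n / sr)
    return out, f_high
```

Sweeping `df_semitones` and `trt_ms` over the ranges shown on the plot axes (0 to 15 semitones, 0 to 200 ms) gives stimuli on either side of the fission and temporal coherence boundaries; where those boundaries fall for a given listener has to be measured, not computed from this script.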
Relevance
• Does two-tone streaming tell us about the
real world?
Bregman ’90
Ecological Streaming
• Streaming in the real world
[Figure: two time-frequency plots (Frequency, 0 to 4000, vs. Time), one labeled "Miriam Makeba"]
Speech Streaming
• Task: Coordinate Response Measure
Brungart et al. ’02
“Ready Baron go to green eight now”
256 variants, 16 speakers
correct = color and number for “Baron”
crm-11737+16515.wav
• Accuracy as a function of spatial separation:
A, B same speaker
o Range effect
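The phrase count on this slide follows from the sentence frame: 8 call signs, 4 colors, and 8 numbers give 256 variants. The sketch below enumerates them and scores a response. The word lists are recalled from the published description of the CRM corpus and do not appear on the slide, so treat them as an assumption; the function names are hypothetical.

```python
from itertools import product

# Assumed vocabulary of the CRM corpus (not listed on the slide).
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle",
              "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = ["one", "two", "three", "four", "five", "six", "seven", "eight"]

def crm_phrases():
    """Enumerate every 'Ready <call sign> go to <color> <number> now' phrase."""
    return [f"Ready {call} go to {color} {number} now"
            for call, color, number in product(CALL_SIGNS, COLORS, NUMBERS)]

def is_correct(target, response):
    """A response counts only if both color and number match the target talker's.

    target and response are (color, number) pairs.
    """
    return tuple(target) == tuple(response)
```

For the example phrase on the slide, the target is `("green", "eight")`; a listener who reports the masker's color with the target's number is scored as incorrect.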
2. Why Does Streaming Exist?
• The effect of streaming
“fission” of auditory percept into separate streams
interferes with judgments between streams
• But for perception, context is critical
to understand a sound event’s meaning, you need
to know what comes before and after
... and not be confused by random co-occurrences
• Need to integrate disparate evidence
streams as the common hook for glimpses
• So what cues can lead to streaming?
should be anything that can distinguish sources
Time, Scale, & Context
• Footsteps fall in the “ambiguous region”?
[Figure: three time-frequency panels (Frequency, 0 to 4000, vs. Time), with time axes spanning 0 to 0.5 and 0 to 5]
3. Computational Implications
• “World model”
hypotheses
state evolution
“hook” for evidence
[Figure (Ellis ’96): block diagram. Labels: input mixture; Front end; signal features; Compare & reconcile; prediction errors; Hypothesis management; Noise components; Periodic components; Predict & combine; predicted features]
• Just maximizing P(observation | explanation)
• Bottom up vs. top down
unifies with simultaneous organization
events depend on streams
same old problems of forming parts & organizing them
[Figure: time-frequency plot, frequency / kHz (0 to 2) vs. time / s (0.0 to 1.2)]
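The idea of streams as hypotheses that predict the next event, with each event assigned to whichever explanation maximizes P(observation | explanation), can be illustrated with a toy example. This is a deliberately simplified sketch and not the Ellis ’96 system: it tracks a single greedy explanation, uses only pitch (ignoring timing), and assumes a Gaussian state-evolution model. The names `assign_streams`, `sigma_semitones`, and `new_stream_logp` and their default values are hypothetical.

```python
import math

def log_gauss(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def assign_streams(events, sigma_semitones=2.0, new_stream_logp=-6.0):
    """Greedily label each (time_s, freq_hz) event with a stream index.

    Each stream hypothesis predicts that its next event repeats its last
    pitch, with Gaussian uncertainty. An event joins the stream that gives
    it the highest log-likelihood, unless starting a new stream
    (fixed score new_stream_logp) explains it better.
    """
    last_pitch = []  # one entry per stream: last pitch in semitones re 1 Hz
    labels = []
    for _, freq in events:
        pitch = 12.0 * math.log2(freq)
        scores = [log_gauss(pitch, p, sigma_semitones) for p in last_pitch]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < new_stream_logp:
            last_pitch.append(pitch)       # spawn a new stream hypothesis
            labels.append(len(last_pitch) - 1)
        else:
            last_pitch[best] = pitch       # state evolution: update the stream
            labels.append(best)
    return labels
```

With these defaults, tones alternating an octave apart split into two streams, while tones one semitone apart stay in a single stream. A fuller treatment would keep several competing explanations alive and let the tone repetition time influence the prediction.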
Summary
• Objects may sound intermittently
makes a stream of relevant sound events / glimpses
• Meaning relies on the full ‘history’ of sound
events from a particular source
between-event relations are useful
• Streaming is critical
forming streams is scene analysis
➡ If we are to correctly perceive
something in the world,
we must be able to make a stream out of it