Sequential Organization from an Ecological Perspective

Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
dpwe@ee.columbia.edu
http://labrosa.ee.columbia.edu/

1. What is Sequential Organization / Streaming?
2. Why does Streaming Exist?
3. What are the Computational Implications?

Sequential Organization - Dan Ellis 2009-04-24 - 1 /10


1. Auditory Scene Analysis (Bregman '90; Darwin & Carlyon '95)

• How do people analyze sound mixtures?
  - break the mixture into elements (time-frequency atoms)
  - elements are grouped into sources using cues
  - sources have aggregate attributes
• Grouping rules
  - cues: common onset/modulation, harmonicity, ...

[Figure (after Darwin 1996): block diagram with stages Sound, Frequency analysis, Onset map, Harmonicity map, Spatial map, Atoms, Simultaneous grouping, Events, Event properties, Sequential grouping, Streams, Source/stream properties]


Auditory Streaming

[Figure (Miller & Heise '50): alternating tones around 1 kHz separated by Δf, frequency vs. time]
[Figure (Kashino et al. '07): TRT 60–150 ms]

• "Discovered" by musicians, beloved by psychologists...
• Ambiguity, buildup (Bregman & Campbell '71)

[Figure (van Noorden '75): frequency separation in semitones (0–15) vs. tone repetition time (TRT) in ms (0–200), showing the temporal coherence boundary, the fission boundary, and the ambiguous region between them]


Relevance

• Does two-tone streaming tell us about the real world? (Bregman '90)


Ecological Streaming

• Streaming in the real world

[Figure: two spectrograms, frequency (0–4000) vs. time, one from a Miriam Makeba recording]


Speech Streaming

• Task: Coordinate Response Measure (Brungart et al. '02)
  - "Ready Baron go to green eight now"
  - 256 variants, 16 speakers
  - correct = color and number for "Baron"
  - [audio example: crm-11737+16515.wav]
• Accuracy as a function of spatial separation
  - A, B same speaker
  - Range effect


2. Why Does Streaming Exist?

• The effect of streaming
  - "fission" of the auditory percept into separate streams
  - interferes with judgments between streams
• But for perception, context is critical
  - to understand a sound event's meaning, you need to know what comes before and after
  - ... and not be confused by random co-occurrences
• Need to integrate disparate evidence
  - streams as the common hook for glimpses
• So what cues can lead to streaming?
  - should be anything that can distinguish sources


Time, Scale, & Context

[Figure: three spectrograms, frequency (0–4000) vs. time (0–5), one labeled Footsteps]

• Footsteps fall in the "ambiguous region"?


3. Computational Implications

• "World model" hypotheses
  - state evolution
  - "hook" for evidence

[Figure (Ellis '96): input mixture → Front end → signal features; Hypothesis management of noise and periodic components; Predict & combine → predicted features; Compare & reconcile → prediction errors]

• Just maximizing P(observation | explanation)
• Bottom up vs. top down
  - unifies with simultaneous organization
  - events depend on streams
  - same old problems of forming parts & organizing them

[Figure: spectrogram, frequency / kHz (0–2) vs. time / s (0–1.2)]


Summary

• Objects may sound intermittently
  - makes a stream of relevant sound events / glimpses
• Meaning relies on the full 'history' of sound events from a particular source
  - between-event relations are useful
• Streaming is critical
  - forming streams is scene analysis
➡ If we are to correctly perceive something in the world, we must be able to make a stream out of it
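The alternating-tone stimulus on the Auditory Streaming slide can be synthesized in a few lines. The Python below is a minimal sketch, assuming pure sinusoids with no onset/offset ramps, an A tone at 1 kHz, and TRT measured onset to onset; the function `alternating_tones` and all of its parameter defaults are hypothetical illustrations, not the stimuli used by Miller & Heise or van Noorden.

```python
import math


def alternating_tones(f_a=1000.0, df_semitones=6.0, trt_ms=100.0,
                      tone_ms=50.0, n_tones=8, sr=16000):
    # Hypothetical helper: A and B pure tones alternate (A B A B ...).
    f_b = f_a * 2.0 ** (df_semitones / 12.0)  # B sits df_semitones above A
    hop = int(round(sr * trt_ms / 1000.0))     # onset-to-onset spacing (TRT) in samples
    dur = int(round(sr * tone_ms / 1000.0))    # tone length in samples
    if dur > hop:
        raise ValueError("tone_ms must not exceed trt_ms")
    out = [0.0] * (hop * n_tones)              # zeros leave silence between tones
    for k in range(n_tones):
        f = f_a if k % 2 == 0 else f_b         # even slots get A, odd slots get B
        start = k * hop
        for n in range(dur):
            out[start + n] = math.sin(2.0 * math.pi * f * n / sr)
    return out, f_b


if __name__ == "__main__":
    samples, f_b = alternating_tones(df_semitones=12.0, trt_ms=120.0)
    print(len(samples), round(f_b, 1))  # prints: 15360 2000.0
```

Sweeping `df_semitones` and `trt_ms` over the axis ranges of the van Noorden plot gives a grid of sequences to listen to (keep `tone_ms` no longer than `trt_ms`); the sketch does not compute where the coherence and fission boundaries lie.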
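The Computational Implications slide reduces hypothesis selection to maximizing P(observation | explanation). A minimal sketch of that criterion follows, under loudly simplifying assumptions: each hypothesis is a list of component predictions (for example a periodic and a noise component) that are summed into predicted features, and prediction errors are independent Gaussians with one shared sigma. The names `combine`, `log_likelihood`, and `best_explanation` are hypothetical, and this is not the scoring used in Ellis '96.

```python
import math
from typing import Dict, List


def combine(components: List[List[float]]) -> List[float]:
    # "Predict & combine": sum the component predictions feature by feature
    # (assumes contributions simply add, a deliberate simplification).
    return [sum(vals) for vals in zip(*components)]


def log_likelihood(observed: List[float], predicted: List[float],
                   sigma: float = 1.0) -> float:
    # "Compare & reconcile": log P(observation | explanation), assuming
    # independent Gaussian prediction errors with a shared sigma.
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted lengths differ")
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((o - p) / sigma) ** 2 - norm
               for o, p in zip(observed, predicted))


def best_explanation(observed: List[float],
                     hypotheses: Dict[str, List[List[float]]],
                     sigma: float = 1.0) -> str:
    # Pick the hypothesis whose combined prediction makes the observation most likely.
    return max(hypotheses,
               key=lambda name: log_likelihood(observed, combine(hypotheses[name]), sigma))


if __name__ == "__main__":
    observed = [1.0, 2.5, 3.0]
    hypotheses = {
        "periodic only": [[1.0, 2.0, 3.0]],
        "periodic + noise": [[1.0, 2.0, 3.0], [0.0, 0.5, 0.0]],
    }
    print(best_explanation(observed, hypotheses))  # prints: periodic + noise
```

With a pure likelihood criterion like this, an extra component wins only when it lowers the prediction error; a prior over explanations would be needed to penalize needless components.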