Computational Auditory Scene Analysis
Kevin D. Donohue
Databeam Professor, Electrical and Computer Engineering
University of Kentucky

Describe What You Hear
• Scene 1
• Scene 2
• Scene 3
Sounds downloaded from http://www.prankcallsunlimited.com/

Auditory Scene Analysis
• Auditory Scene Analysis (ASA) is a cognitive process that organizes sounds into perceptual objects.
• Computational Auditory Scene Analysis (CASA) uses computational models to study ASA.

Auditory Scene: Input
• Sensory organs (ears) separate acoustic energy into frequency bands and convert band energy into neural firings.
• The auditory cortex receives the neural responses and abstracts an auditory scene.
[Figure: numbered diagram and plot with axes Time and Frequency. Source: http://hyperphysics.phy-astr.gsu.edu/hbase/sound/hearcon.html]

Auditory Scene: Perception
• Perception derives a useful representation of reality from sensory input.
• An auditory stream is a perceptual unit associated with a single happening (A.S. Bregman, 1990).
[Diagram labels: Acoustic to Neural Conversion; Organize into Auditory Streams; Primitive/Bottom-up Processes; Schema-driven/Top-down Processes; Representation of Reality; High-Level Cognition]

Auditory Stream Experiment
Bregman & Campbell (1971)
• Streams tend to form by grouping notes that are close in time and frequency (similarity and proximity).
http://www.psych.mcgill.ca/labs/auditory/demo3.html
http://www.psych.mcgill.ca/labs/auditory/demo2.html

Circularity in Pitch Judgement
Shepard's Scale (1964)
(Auditory Demonstrations CD, from the Acoustical Society of America)

Perceptual Organization
Organization properties:
• Belongingness: a sensory element belongs to an organization (or stream) of which it is a part.
• Exclusive allocation: a sensory element cannot belong to more than one organization at a time.
Bregman & Rudnicky (1975)

Perceptual Organization
Organization properties:
• Closure: perceived continuity; a tendency to close strong perceptual forms in response to missing evidence.
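
The idea that streams form from notes close in time and frequency can be illustrated with a toy grouping rule. This is only a minimal sketch, not a model from the cited experiments: the function name group_into_streams, the thresholds max_semitones and max_gap, and the tone frequencies are all assumptions chosen for illustration. Each note joins exactly one stream (exclusive allocation), namely the stream whose latest note is nearest in log-frequency.

```python
import math

def group_into_streams(notes, max_semitones=12.0, max_gap=0.5):
    """Greedily assign (onset_seconds, frequency_hz) notes to streams.

    A note joins the stream whose most recent note is nearest in
    log-frequency, provided that note is within max_semitones and
    max_gap seconds. Otherwise the note starts a new stream.
    Both thresholds are hypothetical illustration values.
    """
    streams = []
    for onset, freq in sorted(notes):
        best, best_dist = None, None
        for stream in streams:
            last_onset, last_freq = stream[-1]
            dist = abs(12.0 * math.log2(freq / last_freq))  # semitones
            close_in_time = (onset - last_onset) <= max_gap
            if close_in_time and dist <= max_semitones:
                if best is None or dist < best_dist:
                    best, best_dist = stream, dist
        if best is None:
            streams.append([(onset, freq)])  # no stream is close enough
        else:
            best.append((onset, freq))       # exclusive allocation
    return streams

# Illustrative sequence: high and low tones alternating every 100 ms.
alternating = [(0.0, 2500.0), (0.1, 350.0), (0.2, 2000.0),
               (0.3, 430.0), (0.4, 1600.0), (0.5, 550.0)]

for i, stream in enumerate(group_into_streams(alternating), start=1):
    print("stream", i, [freq for _, freq in stream])
```

With these assumed thresholds the alternating sequence splits into a high stream and a low stream, while tones confined to a narrow frequency range would stay in a single stream. The rule ignores tempo effects on streaming, so it should be read as a demonstration of proximity grouping only.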

Sequential and Spectral Integration in Forming Streams
• Sequential Integration: grouping sensory elements over time, or events at different times, that are considered to be from the same source/object.
• Spectral Integration: fusing simultaneous sensory elements over frequency into one.

Timbre and Spectral Integration
• The time envelope and harmonic structure give rise to the timbre of the sound.
[Figure: three sounds, each shown as a spectrum (dB vs. Hertz, 0 to 12000 Hz) and a waveform (Amplitude vs. Seconds, 0 to 1 s)]

Timbre and Spectral Integration
• Simultaneous tones grouped by timbre
[Figure: two spectrograms (Hertz vs. Seconds, 0 to 5000 Hz, 0 to 0.4 s), titled "Same Note (A)" and "2 Notes (F and A)"]

Auditory Scene Organization
• Primitive stream segregation: inherent constraints in auditory scene analysis (perceptual organization demonstrated by infants/children).
• Schema-based segregation: learned constraints in auditory scene analysis (differences in perceptual organization resulting from training and culture).
(A.S. Bregman, Auditory Scene Analysis, MIT Press, 1990, pp. 1-45)

Cues Used for Grouping
From "The Auditory Organization of Speech and Other Sources in Listeners and Computational Models," M. Cooke and D. P. W. Ellis, 1999
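
Spectral integration, as in the two-note (F and A) example, can be sketched as fusing simultaneous partials that share a fundamental. This is a hedged illustration rather than a method taken from the cited sources: the function name group_partials, the relative tolerance, the note frequencies and the stray 1000 Hz component are all assumptions, and the candidate fundamentals are supplied by hand instead of being estimated.

```python
def group_partials(partials_hz, candidate_f0s, tolerance=0.003):
    """Assign each partial to the candidate fundamental it best fits.

    A partial fits a fundamental when it lies within a relative
    tolerance of an integer multiple of that fundamental. Partials
    that fit no candidate are returned separately. The tolerance is
    a hypothetical illustration value.
    """
    groups = {f0: [] for f0 in candidate_f0s}
    unassigned = []
    for p in partials_hz:
        best_f0, best_err = None, tolerance
        for f0 in candidate_f0s:
            n = max(1, round(p / f0))          # nearest harmonic number
            err = abs(p - n * f0) / (n * f0)   # relative mistuning
            if err <= best_err:
                best_f0, best_err = f0, err
        if best_f0 is None:
            unassigned.append(p)
        else:
            groups[best_f0].append(p)
    return groups, unassigned

# Assumed fundamentals for the two notes (approximate F4 and A4).
F4, A4 = 349.23, 440.0

# Mixture: first four harmonics of each note plus one stray component.
mixture = sorted([k * F4 for k in range(1, 5)] +
                 [k * A4 for k in range(1, 5)] + [1000.0])

groups, unassigned = group_partials(mixture, [F4, A4])
print("F partials:", groups[F4])
print("A partials:", groups[A4])
print("unassigned:", unassigned)
```

Harmonicity is only one possible grouping cue. A fuller sketch would also have to handle partials that fit both fundamentals and combine harmonicity with timing cues such as common onset.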