Enhancing the Intelligibility of Speech in Speech Noise 1. 2.

Enhancing the Intelligibility of Speech in Speech Noise Dan Ellis (Columbia) Pierre Divenyi (EBIRE) Alain de Cheveigné (ENS Paris) Te-Won Lee (UCSD) Barbara Shinn-Cunningham (BU) DeLiang Wang (OSU) 1. Speech in Speech Noise Problem 2. Approaches and Goals 3. The Team Speech Separation – Ellis, Divenyi et al. 2005-11-11 - 1 /8 1. Speech in Speech Noise • Cocktail-party problem: Unintelligible voices                        • Scenarios:            real-world speech noise - crowds, reverb, etc. improve intelligibility • Applications hearing prostheses, communication devices audio archive review, surveillance Speech Separation – Ellis, Divenyi et al. 2005-11-11 - 2 /8 2. Acoustic Source Separation • ICA (Bell & Sejnowski ’95 et seq.): Input: waveform (or STFT) Output: waveform (or STFT) Engine: cancellation Control: statistical independence of outputs - or energy minimization for beamforming Speech Separation – Ellis, Divenyi et al. 2005-11-11 - 3 /8 2. Acoustic Source Separation • ICA (Bell & Sejnowski ’95 et seq.): • CASA (e.g. Brown ’92): Input: Periodicity, continuity, onset “maps” Output: Waveform (or mask) Engine: Time-frequency masking Control: “Grouping cues” from input - or: spatial features (Roman, ...) Speech Separation – Ellis, Divenyi et al. 2005-11-11 - 4 /8 2. Acoustic Source Separation • ICA (Bell & Sejnowski ’95 et seq.): • CASA (e.g. Brown ’92): • Human Listeners: Input: excitation patterns ... Output: percepts ... Engine: ? Control: find a plausible explanation Speech Separation – Ellis, Divenyi et al. 2005-11-11 - 5 /8 Separation Outputs • What is the output of a separation system? waveform with identified target energy abstract description of content reconstruction optimized for intelligibility... Frequency Frequency • e.g. time-frequency masking 0 dB SNR 8000 -12 dB SNR -24 dB SNR 6000 Noise mix 4000 2000 8000 6000 Mask resynth 4000 2000 0 0 1 2 Time 3 0 1 2 Time Speech Separation – Ellis, Divenyi et al. 3 0 1 2 Time 3 2005-11-11 - 6 /8 Project Goals • Developing source separation techniques single/multi channel auditory/blind/model-based combinations • Collection and simulation of data real-world scenarios and replicas .. for parametric testing .. for systematic evaluation • Connecting with perception intelligibility impact of different artifacts add “proxy noise” to leverage restoration Speech Separation – Ellis, Divenyi et al. 2005-11-11 - 7 /8 3. A Multidisciplinary Project • • • • • • Emphasis: M = machine, H = human Dan Ellis (Director), Organizer of Curriculum - M machine learning, machine separation, natural scenes Pierre Divenyi (Co-Director), Coordination of Project Components - H auditory scene analysis, psychoacoustics, testing methods Alain de Cheveigné - H,M auditory models of separation, pitch, multichannel analysis Te-Won Lee - M blind signal separation methods Barbara Shinn-Cunningham - H spatial auditory scene analysis, spatial acoustics DeLiang Wang - M computational auditory scene analysis, sequential/spatial grouping Speech Separation – Ellis, Divenyi et al. 2005-11-11 - 8 /8

Enhancing the Intelligibility of Speech in Speech Noise 1. 2.

Related documents

Products

Support

Enhancing the Intelligibility of Speech in Speech Noise 1. 2.

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib