Audio and Speech Processing Lecture 6 : Single Channel Speech

Digital Audio Signal Processing
Lecture-3
Noise Reduction
Marc Moonen
Dept. E.E./ESAT-STADIUS, KU Leuven
marc.moonen@esat.kuleuven.be
homes.esat.kuleuven.be/~moonen/
Overview
• Spectral subtraction for single-microphone noise reduction
– Single-microphone noise reduction problem
– Spectral subtraction basics (=spectral filtering)
– Features: gain functions, implementation, musical noise, …
– Iterative Wiener filter based on speech signal model
• Multi-channel Wiener filter for multi-microphone noise reduction
– Multi-microphone noise reduction problem
– Multi-channel Wiener filter (=spectral+spatial filtering)
• Kalman filter based noise reduction
– Kalman filters
– Kalman filters for noise reduction
Digital Audio Signal Processing
Version 2015-2016
Lecture-3: Noise Reduction
p. 2
Single-Microphone Noise Reduction Problem
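• Microphone signal is
$$y[k] = s[k] + n[k]$$
where s[k] is the desired signal contribution and n[k] is the noise contribution.
[Figure: the desired signal s[k] and the noise signal(s) n[k] add up to the microphone signal y[k], from which a desired signal estimate $\hat{s}[k]$ is to be computed]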
• Goal: Estimate s[k] based on y[k]
• Applications:
Speech enhancement in conferencing, handsfree telephony, hearing aids, …
Digital audio restoration
• Will consider speech applications: s[k] = speech signal
p. 3
Spectral Subtraction Methods: Basics
$$y[k] = s[k] + n[k]$$
• Signal is chopped into `frames' (e.g. 10..20 msec); for each frame a frequency-domain representation is
$$Y_i(\omega) = S_i(\omega) + N_i(\omega) \quad (i\text{-th frame})$$
• However, the speech signal is an on/off signal, hence some frames have speech + noise, i.e.
$$Y_i(\omega) = S_i(\omega) + N_i(\omega), \quad \text{frame } i \in \{\text{`speech + noise' frames}\}$$
and some frames have noise only, i.e.
$$Y_i(\omega) = 0 + N_i(\omega), \quad \text{frame } i \in \{\text{`noise-only' frames}\}$$
• A speech detection algorithm is needed to distinguish between these 2 types of frames (based on energy/dynamic range/statistical properties, …)
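The framing and speech detection steps above can be sketched as follows. This is a minimal illustration, not the detector used in the lecture: the function names, the 6 dB threshold and the assumption that the noise energy is known in advance are all hypothetical choices made for the example.

```python
import numpy as np

def split_into_frames(y, frame_len, hop):
    """Chop signal y into frames of frame_len samples, hop samples apart."""
    n_frames = 1 + (len(y) - frame_len) // hop
    return np.stack([y[i * hop : i * hop + frame_len] for i in range(n_frames)])

def energy_vad(frames, noise_energy, threshold_db=6.0):
    """Flag a frame as 'speech + noise' when its mean energy exceeds the
    (assumed known) noise energy by more than threshold_db."""
    energy = np.mean(frames ** 2, axis=1)
    return 10.0 * np.log10(energy / noise_energy + 1e-12) > threshold_db

# Toy signal: 1 s of weak noise at 8 kHz, with a louder tone in the second half.
rng = np.random.default_rng(0)
fs = 8000
y = 0.01 * rng.standard_normal(fs)
y[fs // 2 :] += 0.5 * np.sin(2 * np.pi * 200 * np.arange(fs // 2) / fs)

frames = split_into_frames(y, frame_len=160, hop=80)   # 20 ms frames, 50% overlap
is_speech = energy_vad(frames, noise_energy=0.01 ** 2)
```

An energy threshold is only one of the criteria listed on the slide; it tends to fail at low SNR, where dynamic range or statistical tests are more robust.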
p. 4
Spectral Subtraction Methods: Basics
• Definition: μ(ω) = average amplitude of noise spectrum
$$\mu(\omega) = E\{|N_i(\omega)|\}$$
• Assumption: noise characteristics change slowly, hence estimate μ(ω) by (long-time) averaging over (M) noise-only frames
$$\hat{\mu}(\omega) = \frac{1}{M} \sum_{M \text{ noise-only frames}} |Y_i(\omega)|$$
• Estimate clean speech spectrum $S_i(\omega)$ (for each frame), using corrupted speech spectrum $Y_i(\omega)$ (for each frame, i.e. short-time estimate) + estimated μ(ω):
$$\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$$
based on `gain function' $G_i(\omega) = f(|Y_i(\omega)|, \hat{\mu}(\omega))$
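A minimal sketch of the noise amplitude estimate and of applying a gain per frame and frequency bin. The array layout (frames along the first axis), the function names and the all-true `noise_only` mask are assumptions made for illustration; in practice the mask would come from a speech detector.

```python
import numpy as np

def estimate_noise_amplitude(Y, noise_only):
    """mu_hat(omega): average of |Y_i(omega)| over the noise-only frames.

    Y          : complex array (n_frames, n_bins), one spectrum per frame
    noise_only : boolean array (n_frames,), True for noise-only frames
    """
    return np.mean(np.abs(Y[noise_only]), axis=0)

def apply_gain(Y, G):
    """S_hat_i(omega) = G_i(omega) * Y_i(omega), element-wise."""
    return G * Y

rng = np.random.default_rng(1)
noise_frames = rng.standard_normal((200, 64))       # unit-variance white noise
Y = np.fft.rfft(noise_frames, axis=1)               # spectra, shape (200, 33)
noise_only = np.ones(200, dtype=bool)               # hypothetical detector output
mu_hat = estimate_noise_amplitude(Y, noise_only)
```

Because the gain is real-valued, the phase of $Y_i(\omega)$ is carried over unchanged into the estimate.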
p. 5
Spectral Subtraction: Gain Functions
• Example 1: Ephraim-Malah Suppression Rule (EMSR)
$$G_i(\omega) = \frac{\sqrt{\pi}}{2} \sqrt{\left(\frac{1}{SNR_{post}}\right)\left(\frac{SNR_{prio}}{1+SNR_{prio}}\right)} \cdot M\left[SNR_{post}\left(\frac{SNR_{prio}}{1+SNR_{prio}}\right)\right]$$
with:
• $M[\theta] = e^{-\theta/2}\left[(1+\theta)\, I_0\!\left(\frac{\theta}{2}\right) + \theta\, I_1\!\left(\frac{\theta}{2}\right)\right]$, where $I_0$, $I_1$ are modified Bessel functions
• $SNR_{post}(\omega) = \dfrac{|Y_i(\omega)|^2}{\hat{\mu}(\omega)^2}$
• $SNR_{prio}(\omega) = (1-\alpha)\max(SNR_{post}-1,\, 0) + \alpha\, \dfrac{|G_{i-1}(\omega)\, Y_{i-1}(\omega)|^2}{\hat{\mu}(\omega)^2}$
This corresponds to an MMSE (*) estimation of the speech spectral amplitude $|S_i(\omega)|$ based on observation $Y_i(\omega)$ (estimate equal to $E\{ |S_i(\omega)| \,|\, Y_i(\omega) \}$), assuming Gaussian a priori distributions for $S_i(\omega)$ and $N_i(\omega)$ [Ephraim & Malah 1984].
Similar formula for MMSE log-spectral amplitude estimation [Ephraim & Malah 1985].
(*) minimum mean squared error
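The EMSR gain can be evaluated numerically as sketched here. To stay NumPy-only, the exponentially scaled Bessel functions are computed from the integral representation $I_n(x) = \frac{1}{\pi}\int_0^\pi e^{x\cos t}\cos(nt)\,dt$ by simple quadrature; that choice, the function names and the smoothing value `alpha=0.98` are assumptions of this sketch rather than part of the lecture.

```python
import numpy as np

_T = (np.arange(2000) + 0.5) * np.pi / 2000      # midpoint grid on [0, pi]

def scaled_bessel_i(order, x):
    """exp(-x) * I_order(x) for x >= 0, by quadrature of the integral form."""
    x = np.asarray(x, dtype=float)[..., None]
    return np.mean(np.exp(x * (np.cos(_T) - 1.0)) * np.cos(order * _T), axis=-1)

def emsr_M(theta):
    """M[theta] = exp(-theta/2) [(1 + theta) I0(theta/2) + theta I1(theta/2)]."""
    theta = np.asarray(theta, dtype=float)
    return ((1.0 + theta) * scaled_bessel_i(0, theta / 2.0)
            + theta * scaled_bessel_i(1, theta / 2.0))

def emsr_gain(snr_post, snr_prio):
    """Ephraim-Malah suppression rule for given a posteriori / a priori SNR."""
    ratio = snr_prio / (1.0 + snr_prio)
    return 0.5 * np.sqrt(np.pi) * np.sqrt(ratio / snr_post) * emsr_M(snr_post * ratio)

def snr_prio_update(snr_post, prev_clean_power, mu_hat, alpha=0.98):
    """A priori SNR; prev_clean_power is |G_{i-1} Y_{i-1}|^2 from the previous frame.
    alpha = 0.98 is an assumed smoothing constant, not a value from the slides."""
    return ((1.0 - alpha) * np.maximum(snr_post - 1.0, 0.0)
            + alpha * prev_clean_power / mu_hat ** 2)

g_high_snr = emsr_gain(100.0, 100.0)   # close to the Wiener-like value prio/(1+prio)
```

At high SNR the gain approaches $SNR_{prio}/(1+SNR_{prio})$, which is a convenient sanity check for an implementation.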
p. 7
Spectral Subtraction: Gain Functions
• Example 2: Magnitude Subtraction
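– Signal model:
$$Y_i(\omega) = S_i(\omega) + N_i(\omega) = |Y_i(\omega)|\, e^{j\theta_{y,i}(\omega)}$$
– Estimation of clean speech spectrum:
$$\hat{S}_i(\omega) = \left[\,|Y_i(\omega)| - \hat{\mu}(\omega)\right] e^{j\theta_{y,i}(\omega)} = \underbrace{\left[1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|}\right]}_{G_i(\omega)} Y_i(\omega)$$
– PS: half-wave rectification
$$G_i(\omega) \leftarrow \max(0,\, G_i(\omega))$$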
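As a small worked example, the magnitude subtraction gain with half-wave rectification can be written as follows. The `floor` argument is an added safeguard against division by zero and is not part of the slide; the two example bins are made up.

```python
import numpy as np

def magnitude_subtraction_gain(Y_mag, mu_hat, floor=1e-12):
    """G_i = max(0, 1 - mu_hat / |Y_i|); floor only guards against |Y_i| = 0."""
    return np.maximum(0.0, 1.0 - mu_hat / np.maximum(Y_mag, floor))

# Two example bins: one well above the noise level, one below it.
Y = np.array([4.0 * np.exp(1j * 0.3), 0.5 * np.exp(-1j * 1.2)])
mu_hat = np.array([1.0, 1.0])

G = magnitude_subtraction_gain(np.abs(Y), mu_hat)
S_hat = G * Y          # amplitude reduced by mu_hat, noisy phase kept
```

The first bin keeps its phase and has its amplitude reduced from 4 to 3; the second bin, whose amplitude is below the noise estimate, is set to zero by the rectification.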
p. 8
Spectral Subtraction: Gain Functions
• Example 3: Wiener Estimation
– Linear MMSE estimation:
find linear filter $G_i(\omega)$ to minimize the MSE
$$E\Big\{ \big| S_i(\omega) - \underbrace{G_i(\omega)\, Y_i(\omega)}_{\hat{S}_i(\omega)} \big|^2 \Big\}$$
– Solution:
$$G_i(\omega) = \frac{E\{S_i(\omega)\, Y_i^*(\omega)\}}{E\{Y_i(\omega)\, Y_i^*(\omega)\}} = \frac{P_{sy,i}(\omega)}{P_{yy,i}(\omega)}$$
(numerator: cross-correlation in i-th frame; denominator: auto-correlation in i-th frame)
Assume speech s[k] and noise n[k] are uncorrelated, then...
$$G_i(\omega) = \frac{P_{ss,i}(\omega)}{P_{yy,i}(\omega)} = \frac{P_{yy,i}(\omega) - P_{nn,i}(\omega)}{P_{yy,i}(\omega)} \approx \frac{|Y_i(\omega)|^2 - \hat{\mu}(\omega)^2}{|Y_i(\omega)|^2} = 1 - \frac{\hat{\mu}(\omega)^2}{|Y_i(\omega)|^2}$$
– PS: half-wave rectification
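A short sketch of the Wiener gain, together with a Monte Carlo check that the ratio of cross-correlation to auto-correlation equals $P_{ss}/(P_{ss}+P_{nn})$ when speech and noise are uncorrelated. The variances (4 and 1) and the `floor` safeguard are arbitrary choices for the example.

```python
import numpy as np

def wiener_gain(Y_mag, mu_hat, floor=1e-12):
    """G_i = max(0, 1 - mu_hat^2 / |Y_i|^2), i.e. the Wiener rule with
    half-wave rectification; floor only guards against |Y_i| = 0."""
    return np.maximum(0.0, 1.0 - mu_hat ** 2 / np.maximum(Y_mag ** 2, floor))

G = wiener_gain(np.array([2.0, 0.5]), 1.0)

# Monte Carlo check of G = E{s y} / E{y y} for uncorrelated s and n.
rng = np.random.default_rng(2)
s = 2.0 * rng.standard_normal(100_000)     # "speech" with variance 4
n = rng.standard_normal(100_000)           # noise with variance 1
y = s + n
G_empirical = np.mean(s * y) / np.mean(y * y)   # expected near 4 / (4 + 1)
```

For the same inputs the Wiener gain is larger than the magnitude subtraction gain (0.75 versus 0.5 at $|Y_i| = 2\hat{\mu}$), so it attenuates less.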
p. 9
Spectral Subtraction: Implementation
[Figure: block diagram: y[k] → short-time analysis → Y[n,i] → gain functions (using $\hat{\mu}[n,i]$) → $\hat{S}[n,i]$ → short-time synthesis → $\hat{s}[k]$]
• Short-time Fourier Transform (=uniform DFT-modulated analysis filter bank)
$$Y[n,i] = \sum_{k=0}^{K-1} h[k]\, y[iD - k]\, e^{-j 2\pi k n / N}$$
= estimate for $Y(\omega_n)$ at time i (i-th frame)
N = number of frequency bins (channels), n = 0..N-1
D = downsampling factor
K = frame length
h[k] = length-K analysis window (=prototype filter)
• frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N=2D..3D)
• subband processing: $\hat{S}[n,i] = G[n,i] \cdot Y[n,i]$
• synthesis bank: matched to analysis bank (see DSP-CIS)
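The analysis, subband processing and synthesis chain can be sketched with a block-based STFT and overlap-add. This sketch assumes N = K (DFT size equal to the frame length), 50% overlap (K = 2D) and a square-root Hann window used for both analysis and synthesis; these are illustrative choices, and the frame indexing is block-oriented rather than the time-reversed filter-bank indexing of the formula above.

```python
import numpy as np

def stft(y, K=256, D=128):
    """Windowed frames of length K, hop D, one rfft per frame."""
    h = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(K) / K))   # sqrt-Hann
    n_frames = 1 + (len(y) - K) // D
    frames = np.stack([y[i * D : i * D + K] * h for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1), h

def istft(S, h, D=128):
    """Matched synthesis: inverse rfft, window again, overlap-add."""
    K = len(h)
    frames = np.fft.irfft(S, n=K, axis=1) * h
    out = np.zeros(D * (len(frames) - 1) + K)
    for i, frame in enumerate(frames):
        out[i * D : i * D + K] += frame
    return out

rng = np.random.default_rng(3)
y = rng.standard_normal(4096)
Y, h = stft(y)
G = np.ones(Y.shape)        # placeholder gains; a real system computes G[n, i]
y_rec = istft(G * Y, h)
```

With unit gains the interior of the signal is reconstructed exactly, because the squared window sums to one across overlapping frames; only the first and last half-frame are attenuated.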
p. 10
Spectral Subtraction: Musical Noise
• Audio demo: car noise, y[k] → magnitude subtraction → $\hat{s}[k]$
• Artifact: musical noise
What?
Short-time estimates of $|Y_i(\omega)|$ fluctuate randomly in noise-only frames, resulting in random gains $G_i(\omega)$
→ statistical analysis shows that broadband noise is transformed into a signal composed of short-lived tones with randomly distributed frequencies (=musical noise)
p. 11
Spectral Subtraction: Musical Noise
Solutions?
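- Magnitude averaging: replace $|Y_i(\omega)|$ in the calculation of $G_i(\omega)$ by a local average over frames
$$\hat{S}_i(\omega) = G(\omega)\, Y_i(\omega)$$
(gain G computed from the averaged magnitude, applied to the instantaneous spectrum)
- EMSR (p.7)
- Augment $G_i(\omega)$ with soft-decision VAD:
$$G_i(\omega) \leftarrow P(H_1 \,|\, Y_i(\omega)) \cdot G_i(\omega)$$
where $P(H_1 \,|\, Y_i(\omega))$ is the probability that speech is present, given the observation
- …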
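Magnitude averaging can be illustrated on noise-only data. The first-order recursive average and its constant `lam=0.8` are one possible reading of "local average over frames" and are assumptions of this sketch; the gain used is magnitude subtraction.

```python
import numpy as np

def smooth_magnitudes(Y_mag, lam=0.8):
    """Recursive average over frames: Ybar_i = lam * Ybar_{i-1} + (1 - lam) * |Y_i|."""
    Y_bar = np.empty_like(Y_mag)
    Y_bar[0] = Y_mag[0]
    for i in range(1, len(Y_mag)):
        Y_bar[i] = lam * Y_bar[i - 1] + (1.0 - lam) * Y_mag[i]
    return Y_bar

# Noise-only spectra: 500 frames, 8 bins.
rng = np.random.default_rng(4)
Y_mag = np.abs(rng.standard_normal((500, 8)) + 1j * rng.standard_normal((500, 8)))
mu_hat = Y_mag.mean(axis=0)

G_inst = np.maximum(0.0, 1.0 - mu_hat / Y_mag)                      # instantaneous
G_avg = np.maximum(0.0, 1.0 - mu_hat / smooth_magnitudes(Y_mag))    # averaged
```

The averaged gains fluctuate less from frame to frame, which is the mechanism that reduces the short-lived random tones; the price is slower tracking of speech onsets.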
p. 12
Spectral Subtraction: Iterative Wiener Filter
Example of signal model-based spectral subtraction…
• Basic: Wiener filtering based spectral subtraction (p.9), with (improved) spectra estimation based on parametric models
• Procedure:
1. Estimate parameters of a speech model from noisy signal y[k]
2. Using estimated speech parameters, perform noise reduction (e.g. Wiener estimation, p. 9)
3. Re-estimate parameters of speech model from the speech signal estimate
4. Iterate 2 & 3
p. 13
Spectral Subtraction: Iterative Wiener Filter
[Figure: speech production model: an excitation u[k] (pulse train with a given pitch period for voiced speech, white noise generator for unvoiced speech) is scaled by the gain $g_s$ and drives the all-pole filter $1 / \left(1 - \sum_{m=1}^{M} \alpha_m e^{-j\omega m}\right)$ to produce the speech signal]
frequency domain:
$$S(\omega) = \frac{g_s}{1 - \sum_{m=1}^{M} \alpha_m e^{-j\omega m}}\, U(\omega)$$
time domain:
$$s[k] = \sum_{m=1}^{M} \alpha_m\, s[k-m] + g_s\, u[k]$$
$\boldsymbol{\alpha} = [\alpha_1 \;\cdots\; \alpha_M]^T$ = linear prediction parameters
p. 14
Spectral Subtraction: Iterative Wiener Filter
For each frame (vector) y[m] (i = iteration nr.), repeat until some error criterion is satisfied:
1. Estimate $g_{s,i}$ and $\boldsymbol{\alpha}_i = [\alpha_{1,i} \;\cdots\; \alpha_{M,i}]^T$
2. Construct Wiener Filter (p.9)
$$G(\omega) = \ldots = \frac{P_{ss}(\omega)}{P_{ss}(\omega) + P_{nn}(\omega)}$$
with:
• $P_{nn}(\omega)$ estimated during noise-only periods
• $P_{ss}(\omega) = \dfrac{g_{s,i}^2}{\left| 1 - \sum_{m=1}^{M} \alpha_{m,i}\, e^{-j\omega m} \right|^2}$ (for a unit-power excitation u[k])
3. Filter speech frame y[m] to obtain $\hat{s}_i[m]$
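The iterative procedure can be sketched for a single frame as follows. This is a simplified illustration under stated assumptions: the speech model is fitted with the autocorrelation method, a fixed number of iterations replaces the error criterion, the noise PSD `P_nn` is taken as known, and filtering is done by multiplying the frame's DFT with the gain. The names `lpc`, `ar_psd` and `iterative_wiener` are hypothetical.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method linear prediction. Returns (alpha, g2) with
    x[k] ~ sum_m alpha[m-1] * x[k-m] and g2 the prediction-error variance."""
    K = len(x)
    r = np.array([np.dot(x[: K - m], x[m:]) / K for m in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    alpha = np.linalg.solve(R + 1e-9 * r[0] * np.eye(order), r[1:])
    g2 = max(r[0] - np.dot(alpha, r[1:]), 1e-12)
    return alpha, g2

def ar_psd(alpha, g2, n_fft):
    """P_ss(omega) = g2 / |1 - sum_m alpha_m exp(-j omega m)|^2 on the rfft grid."""
    A = np.fft.rfft(np.concatenate(([1.0], -alpha)), n=n_fft)
    return g2 / np.abs(A) ** 2

def iterative_wiener(y, P_nn, order=10, n_iter=3):
    """Estimate model -> Wiener filter the noisy frame -> re-estimate, n_iter times."""
    Y = np.fft.rfft(y)
    s_hat = y.copy()                        # first model is fitted to the noisy frame
    for _ in range(n_iter):
        alpha, g2 = lpc(s_hat, order)
        P_ss = ar_psd(alpha, g2, len(y))
        G = P_ss / (P_ss + P_nn)
        s_hat = np.fft.irfft(G * Y, n=len(y))
    return s_hat

# Synthetic test: AR(2) "speech" in white noise of variance 4.
rng = np.random.default_rng(6)
T = 2048
u = rng.standard_normal(T)
s = np.zeros(T)
for k in range(2, T):
    s[k] = 1.6 * s[k - 1] - 0.8 * s[k - 2] + u[k]
noise_var = 4.0
y = s + np.sqrt(noise_var) * rng.standard_normal(T)

s_hat = iterative_wiener(y, P_nn=noise_var)
mse_noisy = np.mean((y - s) ** 2)
mse_enhanced = np.mean((s_hat - s) ** 2)
```

Running many unconstrained iterations tends to over-sharpen the spectral peaks, so a small fixed iteration count is used here.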
p. 15
Overview
• Spectral subtraction for single-microphone noise reduction
– Single-microphone noise reduction problem
– Spectral subtraction basics (=spectral filtering)
– Features: gain functions, implementation, musical noise, …
– Iterative Wiener filter based on speech signal model
• Multi-channel Wiener filter for multi-microphone noise reduction
– Multi-microphone noise reduction problem
– Multi-channel Wiener filter (=spectral+spatial filtering)
• Kalman filter based noise reduction
– Kalman filters
– Kalman filters for noise reduction
p. 16
Multi-Microphone Noise Reduction Problem
[Figure: a speech source s[k] and noise source(s) n[k] are picked up by M microphones; the microphone signals are processed to obtain (some) speech estimate $\hat{s}[k]$]
microphone signals:
$$y_m[k] = s_m[k] + n_m[k], \quad m = 1 \ldots M$$
($s_m[k]$ = speech part, $n_m[k]$ = noise part)
M = number of microphones
p. 17
Multi-Microphone Noise Reduction Problem
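Will estimate the speech part in microphone 1 (*) (**)
[Figure: speech source s[k] and noise n[k] are picked up by the microphones; the microphone signals are processed to obtain the estimate $\hat{s}_1[k]$]
$$y_m[k] = s_m[k] + n_m[k], \quad m = 1 \ldots M$$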
(*) Estimating s[k] is more difficult, would include dereverberation...
(**) This is similar to single-microphone model (p.3), where additional microphones
(m=2..M) help to get a better estimate
p. 18
Multi-Microphone Noise Reduction Problem
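• Data model:
$$\mathbf{Y}(\omega) = \mathbf{S}(\omega) + \mathbf{N}(\omega) = \mathbf{d}(\omega)\, S(\omega) + \mathbf{N}(\omega)$$
$$\begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \begin{bmatrix} H_1(\omega) \\ H_2(\omega) \\ \vdots \\ H_M(\omega) \end{bmatrix} S(\omega) + \begin{bmatrix} N_1(\omega) \\ N_2(\omega) \\ \vdots \\ N_M(\omega) \end{bmatrix}$$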
See Lecture-2 on multi-path propagation, with θ left out for conciseness.
$H_m(\omega)$ is the complete transfer function from the speech source position to the m-th microphone
p. 19
Multi-Channel Wiener Filter (MWF)
• Data model:
$$\mathbf{Y}(\omega) = \mathbf{d}(\omega)\, S(\omega) + \mathbf{N}(\omega)$$
• Will use linear filters to obtain speech estimate (as in Lecture-2)
$$\hat{S}_1(\omega) = \sum_{m=1}^{M} F_m^*(\omega)\, Y_m(\omega) = \mathbf{F}^H(\omega)\, \mathbf{Y}(\omega)$$
• Wiener filter (=linear MMSE approach)
$$\min_{\mathbf{F}(\omega)} E\{ | S_1(\omega) - \mathbf{F}^H(\omega)\, \mathbf{Y}(\omega) |^2 \}$$
Note that (unlike in DSP-CIS) the `desired response' signal $S_1(\omega)$ is unknown here (!), hence the solution will be `unusual'…
p. 20
Multi-Channel Wiener Filter (MWF)
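• Wiener filter solution is (see DSP-CIS)
$$\mathbf{F}(\omega) = \underbrace{E\{\mathbf{Y}(\omega)\, \mathbf{Y}^H(\omega)\}^{-1}}_{\text{autocorrelation}} \cdot \underbrace{E\{\mathbf{Y}(\omega)\, S_1^*(\omega)\}}_{\text{crosscorrelation}} = \ldots \quad (\text{with } E\{\mathbf{S}(\omega)\, N_1^*(\omega)\} = 0)$$
$$= E\{\mathbf{Y}(\omega)\, \mathbf{Y}^H(\omega)\}^{-1} \cdot \left( E\{\mathbf{Y}(\omega)\, Y_1^*(\omega)\} - E\{\mathbf{N}(\omega)\, N_1^*(\omega)\} \right)$$
where $E\{\mathbf{Y}\mathbf{Y}^H\}$ and $E\{\mathbf{Y} Y_1^*\}$ are computed during speech+noise periods, and $E\{\mathbf{N} N_1^*\}$ is computed during noise-only periods
– All quantities can be computed!
– Special case of this is the single-channel Wiener filter formula on p.9
– In practice, use an alternative to the `subtraction' operation (see literature)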
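The MWF formula for one frequency bin can be sketched from sample correlation matrices. The simulated steering vector `d`, the sample sizes and the helper names are hypothetical; the plain subtraction is used here even though the slide recommends alternatives in practice.

```python
import numpy as np

def mwf_filter(Y_speech_noise, N_noise_only):
    """F = E{Y Y^H}^{-1} (E{Y Y_1^*} - E{N N_1^*}) for one frequency bin.

    Y_speech_noise : complex (M, T1), microphone spectra from speech+noise frames
    N_noise_only   : complex (M, T2), microphone spectra from noise-only frames
    """
    R_yy = Y_speech_noise @ Y_speech_noise.conj().T / Y_speech_noise.shape[1]
    R_nn = N_noise_only @ N_noise_only.conj().T / N_noise_only.shape[1]
    # First columns hold E{Y Y_1^*} and E{N N_1^*}.
    return np.linalg.solve(R_yy, R_yy[:, 0] - R_nn[:, 0])

rng = np.random.default_rng(7)

def complex_noise(shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

M, T = 4, 20000
d = np.array([1.0, 0.8j, -0.6, 0.5 + 0.5j])     # made-up transfer functions H_m
S = complex_noise(T)                            # speech source spectrum samples
Y = np.outer(d, S) + complex_noise((M, T))      # speech+noise frames
N = complex_noise((M, T))                       # separate noise-only frames

F = mwf_filter(Y, N)
S1 = d[0] * S                                   # speech part in microphone 1
S1_hat = F.conj() @ Y                           # F^H Y for every frame
mse_in = np.mean(np.abs(Y[0] - S1) ** 2)        # error of using microphone 1 as is
mse_out = np.mean(np.abs(S1_hat - S1) ** 2)
```

Note that the filter is computed without access to `d` or `S`; only the split into speech+noise and noise-only frames is used.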
p. 21
Multi-Channel Wiener Filter (MWF)
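• Note that…
MWF combines spatial filtering (as in Lecture-2) with single-channel spectral filtering (as in single-channel noise reduction)
if
$$\mathbf{Y}(\omega) = \begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \underbrace{\mathbf{d}(\omega)}_{\text{steering vector}} S(\omega) + \underbrace{\mathbf{N}(\omega)}_{\text{noise}}, \qquad E\{\mathbf{N}(\omega)\, \mathbf{N}^H(\omega)\} = \Phi_{NN}(\omega)$$
then…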
p. 22
Multi-Channel Wiener Filter (MWF)
…then it can be shown that
$$\mathbf{F}(\omega) = \underbrace{\alpha(\omega)}_{\text{scalar}} \cdot\, \Phi_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$$
① $\Phi_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$ represents a spatial filtering (*)
Compare to superdirective & delay-and-sum beamforming (Lecture-2)
– Delay-and-sum beamforming maximizes array gain in white noise field
– Superdirective beamforming maximizes array gain in diffuse noise field
– MWF maximizes array gain in unknown (!) noise field.
MWF is operated without invoking any prior knowledge (steering vector/noise field)! (the secret is in the voice activity detection… (explain))
(*) Note that spatial filtering can improve SNR, spectral filtering never improves SNR (at one frequency)
p. 23
Multi-Channel Wiener Filter (MWF)
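…then it can be shown that
$$\mathbf{F}(\omega) = \underbrace{\alpha(\omega)}_{\text{scalar}} \cdot\, \Phi_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$$
① $\Phi_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)$ represents a spatial filtering (*)
② $\alpha(\omega)$ represents an additional `spectral post-filter', i.e. single-channel Wiener estimate (p.9), applied to the output signal of the spatial filter
$$\alpha(\omega) = \ldots = \frac{|S(\omega)|^2\, H_1^*(\omega)}{\mathbf{d}^H(\omega)\, \Phi_{NN}^{-1}(\omega)\, \mathbf{d}(\omega) \cdot |S(\omega)|^2 + 1}$$
(prove it!)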
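Short of a proof, the factorization can be checked numerically. The sketch builds the direct Wiener solution from $\Phi_{YY} = P_s\, \mathbf{d}\mathbf{d}^H + \Phi_{NN}$ and $E\{\mathbf{Y} S_1^*\} = P_s\, \mathbf{d}\, H_1^*$ and compares it with the scalar-times-spatial-filter form; $P_s$ stands for the speech power written $|S(\omega)|^2$ on the slide, and all numerical values are random or made up.

```python
import numpy as np

rng = np.random.default_rng(5)
M = 4
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # steering vector, d[0] = H_1
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_NN = B @ B.conj().T + np.eye(M)                         # Hermitian, positive definite
P_s = 2.5                                                   # speech power (made-up value)

# Direct Wiener solution: F = E{Y Y^H}^{-1} E{Y S_1^*}, with S_1 = H_1 S.
Phi_YY = P_s * np.outer(d, d.conj()) + Phi_NN
F_direct = np.linalg.solve(Phi_YY, P_s * d * np.conj(d[0]))

# Factored form: spatial filter Phi_NN^{-1} d, scaled by the scalar post-filter.
spatial = np.linalg.solve(Phi_NN, d)
scalar = P_s * np.conj(d[0]) / (np.real(d.conj() @ spatial) * P_s + 1.0)
F_factored = scalar * spatial
```

The agreement follows from the matrix inversion lemma applied to the rank-one update $P_s\, \mathbf{d}\mathbf{d}^H$ of $\Phi_{NN}$, which is also the key step of the analytical proof.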
p. 24
Multi-Channel Wiener Filter: Implementation
• Implementation with short-time Fourier transform: see p.10
• Implementation with time-domain linear filtering:
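[Figure: speech source s[k] and noise n[k] are picked up by the microphones $y_1[k] \ldots y_M[k]$; each microphone signal is passed through a filter and the filter outputs are summed to give $\hat{s}_1[k]$]
$$y_m[k] = s_m[k] + n_m[k]$$
$$\mathbf{y}_m^T[k] = \begin{bmatrix} y_m[k] & y_m[k-1] & \ldots & y_m[k-L+1] \end{bmatrix}$$
$$\hat{s}_1[k] = \sum_{m=1}^{M} \mathbf{y}_m^T[k]\, \mathbf{f}_m$$
($\mathbf{f}_m$ = filter coefficients)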
p. 25
Multi-Channel Wiener Filter: Implementation
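• Implementation with time-domain linear filtering:
$$\min_{\mathbf{f}} E\{ | s_1[k] - \mathbf{y}^T[k]\, \mathbf{f} |^2 \}$$
$$\mathbf{f} = \begin{bmatrix} \mathbf{f}_1^T & \mathbf{f}_2^T & \ldots & \mathbf{f}_M^T \end{bmatrix}^T, \qquad \mathbf{y}[k] = \begin{bmatrix} \mathbf{y}_1^T[k] & \mathbf{y}_2^T[k] & \ldots & \mathbf{y}_M^T[k] \end{bmatrix}^T$$
Solution is…
$$\mathbf{f} = E\{\mathbf{y}[k]\, \mathbf{y}^T[k]\}^{-1} \cdot E\{\mathbf{y}[k]\, s_1[k]\} = E\{\mathbf{y}[k]\, \mathbf{y}^T[k]\}^{-1} \cdot \left( E\{\mathbf{y}[k]\, y_1[k]\} - E\{\mathbf{n}[k]\, n_1[k]\} \right)$$
where $E\{\mathbf{y}\mathbf{y}^T\}$ and $E\{\mathbf{y}\, y_1\}$ are computed during speech+noise periods, and $E\{\mathbf{n}\, n_1\}$ is computed during noise-only periods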
p. 26
Overview
• Spectral subtraction for single-microphone noise reduction
– Single-microphone noise reduction problem
– Spectral subtraction basics (=spectral filtering)
– Features: gain functions, implementation, musical noise, …
– Iterative Wiener filter based on speech signal model
• Multi-channel Wiener filter for multi-microphone noise reduction
– Multi-microphone noise reduction problem
– Multi-channel Wiener filter (=spectral+spatial filtering)
• Kalman filter based noise reduction
– Kalman filters: see Lecture-6
– Kalman filters for noise reduction
p. 27
Kalman filter for Speech Enhancement
• Assume AR model of speech and noise
$$s[k] = \sum_{n=1}^{N_s} \alpha_n\, s[k-n] + g_s\, u[k]$$
$$n[k] = \sum_{n=1}^{N_n} \beta_n\, n[k-n] + g_n\, w[k]$$
u[k], w[k] = zero mean, unit variance, white noise
• Equivalent state-space model is…
$$\mathbf{x}[k+1] = \mathbf{A}\, \mathbf{x}[k] + \mathbf{v}[k]$$
$$y[k] = \mathbf{c}^T \mathbf{x}[k]$$
y = microphone signal
p. 28
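Kalman filter for Speech Enhancement
with:
$$\mathbf{v}[k] = \mathbf{G} \begin{bmatrix} u[k] & w[k] \end{bmatrix}^T, \qquad \mathbf{Q} = \mathbf{G}\, \mathbf{G}^T$$
$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_n \end{bmatrix}; \quad \mathbf{c}^T = \big[ \underbrace{0 \;\cdots\; 0 \;\; 1}_{N} \;\; \underbrace{0 \;\cdots\; 0 \;\; 1}_{M} \big]; \quad \mathbf{G} = \begin{bmatrix} \mathbf{g}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{g}_n \end{bmatrix}$$
$$\mathbf{g}_s^T = \begin{bmatrix} 0 & \cdots & 0 & g_s \end{bmatrix}; \quad \mathbf{g}_n^T = \begin{bmatrix} 0 & \cdots & 0 & g_n \end{bmatrix}$$
s[k] and n[k] are included in the state vector, hence can be estimated by a Kalman Filter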
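The state-space model and a Kalman filter run can be sketched for the case where the AR parameters are known. The state ordering (oldest sample first, s[k] and n[k] last in their blocks, matching the ones in $\mathbf{c}$), the companion-matrix form of $\mathbf{A}_s$, $\mathbf{A}_n$ and all function names are assumptions of this sketch; parameter estimation, which the lecture treats separately, is left out.

```python
import numpy as np

def companion(coeffs):
    """AR companion matrix for the state [x[k-N+1], ..., x[k]]^T."""
    coeffs = np.asarray(coeffs, dtype=float)
    N = len(coeffs)
    A = np.zeros((N, N))
    A[:-1, 1:] = np.eye(N - 1)      # shift the delay line
    A[-1, :] = coeffs[::-1]         # newest sample = sum_n coeffs[n-1] * x[k-n]
    return A

def build_model(alpha, g_s, beta, g_n):
    """Block-diagonal A, input matrix G and observation vector c."""
    Ns, Nn = len(alpha), len(beta)
    A = np.zeros((Ns + Nn, Ns + Nn))
    A[:Ns, :Ns] = companion(alpha)
    A[Ns:, Ns:] = companion(beta)
    G = np.zeros((Ns + Nn, 2))
    G[Ns - 1, 0] = g_s
    G[-1, 1] = g_n
    c = np.zeros(Ns + Nn)
    c[Ns - 1] = 1.0                 # picks s[k]
    c[-1] = 1.0                     # picks n[k]
    return A, G, c

def kalman_enhance(y, A, G, c, s_index):
    """Filtered estimates s_hat[k|k]; y[k] = c^T x[k] has no extra measurement noise."""
    Q = G @ G.T
    x = np.zeros(len(c))
    P = np.eye(len(c))
    s_hat = np.empty(len(y))
    for k, yk in enumerate(y):
        gain = P @ c / (c @ P @ c)              # measurement update
        x = x + gain * (yk - c @ x)
        P = P - np.outer(gain, c @ P)
        s_hat[k] = x[s_index]
        x = A @ x                               # time update
        P = A @ P @ A.T + Q
    return s_hat

# Synthetic test: AR(2) speech model, white noise modelled as AR(1) with beta = 0.
rng = np.random.default_rng(8)
T = 2000
alpha, g_s, beta, g_n = [1.6, -0.8], 1.0, [0.0], 2.0
s = np.zeros(T)
u = rng.standard_normal(T)
for k in range(2, T):
    s[k] = alpha[0] * s[k - 1] + alpha[1] * s[k - 2] + g_s * u[k]
y = s + g_n * rng.standard_normal(T)

A, G, c = build_model(alpha, g_s, beta, g_n)
s_hat = kalman_enhance(y, A, G, c, s_index=len(alpha) - 1)
mse_noisy = np.mean((y - s) ** 2)
mse_kalman = np.mean((s_hat - s) ** 2)
```

Because the noise is part of the state, the measurement equation is exact, and the same run also yields the noise estimate as the last state element.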
p. 29
Kalman filter for Speech Enhancement
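Iterative algorithm (details omitted)
[Figure: block diagram: y[k] → split signal in frames → iterations between `estimate parameters' ($\hat{g}_{s,i}$, $\hat{\alpha}_{n,i}$, $\hat{g}_{n,i}$, $\hat{\beta}_{n,i}$) and `Kalman Filter' ($\hat{s}_i[m]$, $\hat{n}_i[m]$) → reconstruct signal → $\hat{s}[k]$]
Disadvantages of the iterative approach:
• complexity
• delay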
p. 30
Kalman filter for Speech Enhancement
Sequential algorithm (details omitted)
[Figure: block diagram with time index k as iteration index (no iterations): a State Estimator (Kalman Filter) takes y[k], the delayed state estimates $\hat{s}[k-1|k-1]$, $\hat{n}[k-1|k-1]$ and the delayed parameter estimates $\hat{\alpha}_n[k-1|k-1]$, $\hat{\beta}_n[k-1|k-1]$, $\hat{g}_s[k-1|k-1]$, $\hat{g}_n[k-1|k-1]$, and outputs $\hat{s}[k|k]$, $\hat{n}[k|k]$; a Parameters Estimator (Kalman Filter) uses $\hat{s}$, $\hat{n}$ to produce $\hat{\alpha}_n[k|k]$, $\hat{\beta}_n[k|k]$, $\hat{g}_s[k|k]$, $\hat{g}_n[k|k]$; both outputs are fed back through delay elements Δ]
p. 31
CONCLUSIONS
• Single-channel noise reduction
– Basic system is spectral subtraction
– Only spectral filtering, hence can only exploit differences in spectra
between noise and speech signal:
• noise reduction at expense of speech distortion
• achievable noise reduction may be limited
• Multi-channel noise reduction
– Basic system is MWF
– Provides spectral + spatial filtering (links with beamforming!)
• Iterative Wiener filter & Kalman filtering
– Signal model based approach
p. 32