Multi-Channel Wiener Filter - Home pages of ESAT

advertisement
Digital Audio Signal Processing
Lecture-4: Noise Reduction
Marc Moonen/Alexander Bertrand
Dept. E.E./ESAT-STADIUS, KU Leuven
marc.moonen@esat.kuleuven.be
homes.esat.kuleuven.be/~moonen/
Overview
• Spectral subtraction for single-micr. noise reduction
–
–
–
–
Single-microphone noise reduction problem
Spectral subtraction basics (=spectral filtering)
Features: gain functions, implementation, musical noise,…
Iterative Wiener filter based on speech signal model
• Multi-channel Wiener filter for multi-micr. noise red.
– Multi-microphone noise reduction problem
– Multi-channel Wiener filter (=spectral+spatial filtering)
• Kalman filter based noise reduction
– Kalman filters
– Kalman filters for noise reduction
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 2
Single-Microphone Noise Reduction Problem
• Microphone signal is
desired signal s[k]
y[k ]  s[k ]  n[k ]
desired signal
noise
contribution contribution
y[k]
desired signal
estimate
?

s [k ]
noise signal(s)
• Goal: Estimate s[k] based on y[k]
• Applications:
Speech enhancement in conferencing, handsfree telephony, hearing aids, …
Digital audio restoration
• Will consider speech applications: s[k] = speech signal
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 3
Spectral Subtraction Methods: Basics
y[k ]  s[k ]  n[k ]
•
Signal chopped into `frames’ (e.g. 10..20msec), for each frame a
frequency domain representation is
(i-th frame)
Yi ( )  Si ( )  Ni ( )
•
However, speech signal is an on/off signal, hence some frames have
speech +noise, i.e.
Yi (w ) = Si (w )+ Ni (w )
framei Î {`speech + noise' frames}
some frames have noise only, i.e.
Yi ( ) 
•
0  Ni ( )
frame i {`noise - only' frames}
A speech detection algorithm is needed to distinguish between
these 2 types of frames (based on energy/dynamic range/statistical
properties,…)
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 4
Spectral Subtraction Methods: Basics
•
Definition: () = average amplitude of noise spectrum
 ( )  E{ Ni ( ) }
•
Assumption: noise characteristics change slowly, hence estimate ()
by (long-time) averaging over (M) noise-only frames
ˆ ( ) 
•
1
M
 Y ( )
i
M noise-only frames
Estimate clean speech spectrum Si() (for each frame), using
corrupted speech spectrum Yi() (for each frame, i.e. short-time
estimate) + estimated ():
Sˆi ()  Gi ()Yi ()
based on `gain function’
Digital Audio Signal Processing
Version 2014-2015
Gi ( )  f (Yi ( ), ˆ ( ))
Lecture-4: Noise Reduction
p. 5
Spectral Subtraction: Gain Functions
Magnitude Subtraction
Spectral Subtraction
Wiener Estimation
Maximum Likelihood
Non-linear Estimation
Ephraim-Malah
= most frequently used in practice
Digital Audio Signal Processing
Version 2014-2015
see next slide
Lecture-4: Noise Reduction
p. 6
Spectral Subtraction: Gain Functions
• Example 1: Ephraim-Malah Suppression Rule (EMSR)
Gi ( ) 
with:
•
•

2
M [ ]

SNR post ( )

SNR prio ( )


1

 SNR
post

 SNR prio

 1  SNR
prio



 SNR prio
 .M SNR post 

 1  SNR

prio









 

(
1


)
I
(
)


I
(
)
0
1

2
2 
2
Yi ( )
modified Bessel functions
ˆ ( ) 2
2
Gi 1 ( )Yi 1 ( )
(1 -  )max(SNR post - 1,0)  

 ( ) 2
e

2
This corresponds to a MMSE (*) estimation of the speech spectral amplitude
|Si()| based on observation Yi() ( estimate equal to E{ |Si()| | Yi() } )
assuming Gaussian a priori distributions for Si() and Ni() [Ephraim & Malah 1984].
Similar formula for MMSE log-spectral amplitude estimation [Ephraim & Malah 1985].
(*) minimum mean squared error
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 7
Spectral Subtraction: Gain Functions
• Example 2: Magnitude Subtraction
– Signal model:
Yi ( ) 

Si ( )  N i ( )
Yi ( ) e
j y ,i ( )
– Estimation of clean speech spectrum:
Sˆi ( )

Y ( )  ˆ ( )e 


ˆ ( ) 
1 
Yi ( )
Yi ( ) 



j
y ,i
( )
i
Gi ( )
– PS: half-wave rectification
Gi ( )  max( 0, Gi ( ))
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 8
Spectral Subtraction: Gain Functions
• Example 3: Wiener Estimation
2


ˆ
Si ( )
– Linear MMSE estimation:

 

find linear filter Gi() to minimize MSE  E  Si ( )  Gi ( ).Yi ( ) 






– Solution:
Gi ( ) 
ES i ( ).Yi ( ) Psy,i ( )

EYi ( ).Yi ( ) Pyy ,i ( )
<- cross-correlation in i-th frame
<- auto-correlation in i-th frame
Assume speech s[k] and noise n[k] are uncorrelated, then...
Gi ( ) 
Pss,i ( )
Pyy ,i ( )

Pyy ,i ( )  Pnn,i ( )
Pyy ,i ( )
Yi ( )  ˆ ( ) 2
2

Yi ( )
2
ˆ ( ) 2
 1
2
Yi ( )
– PS: half-wave rectification
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 9
Spectral Subtraction: Implementation
Y[n,i]
y[k]
Short-time
analysis
ˆ [ n, i ]
Gain
functions
Sˆ [n, i ]
Short-time
synthesis
sˆ[ k ]
 Short-time Fourier Transform (=uniform DFT-modulated analysis filter bank)
K 1
Y [n, i ]   h[k ] y[i.M  k ]e  j 2 kn / N = estimate for Y(n ) at time i (i-th frame)
k 0
N=number of frequency bins (channels) n=0..N-1
M=downsampling factor
K=frame length
h[k] = length-K analysis window (=prototype filter)
 frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N=2M..3M)
 subband processing: Sˆ[ n, i ]  G[ n, i ].Y [ n, i ]
 synthesis bank: matched to analysis bank (see DSP-CIS)
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 10
Spectral Subtraction: Musical Noise
• Audio demo: car noise
y[k ]
magnitude subtraction
sˆ[k ]
• Artifact: musical noise
What?
Short-time estimates of |Yi()| fluctuate randomly in noise-only frames,
resulting in random gains Gi()
 statistical analysis shows that broadband noise is transformed into
signal composed of short-lived tones with randomly distributed
frequencies (=musical noise)
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 11
Spectral Subtraction: Musical Noise
• Solutions?
- Magnitude averaging: replace Yi() in calculation of
Gi() by a local average over frames
Sˆi ()  G()Yi ()
average
instantaneous
- EMSR (p7)
- augment Gi() with soft-decision VAD:
Gi()  P(H1 | Yi()). Gi()
…
probability that speech is present, given observation
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 12
Spectral Subtraction: Iterative Wiener Filter
Example of signal model-based spectral subtraction…
• Basic:
Wiener filtering based spectral subtraction (p.9), with
(improved) spectra estimation based on parametric
models
•
Procedure:
1. Estimate parameters of a speech model from noisy signal y[k]
2. Using estimated speech parameters, perform noise reduction
(e.g. Wiener estimation, p. 9)
3. Re-estimate parameters of speech model from the speech
signal estimate
4. Iterate 2 & 3
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 13
Spectral Subtraction: Iterative Wiener Filter
pulse train
voiced
…
gs
…
pitch period
unvoiced
u[k]
all-pole filter
1
x
M
1    m e  j m
white noise
generator
speech
signal
m 1
frequency domain:
S ( ) 
gs
M
1   m e  j m
U ( )
m 1
M
time domain: s[k ]   s[k  m]  g u[k ]
 m
s
m 1
α  1   M 
T
Digital Audio Signal Processing
= linear prediction parameters
Version 2014-2015
Lecture-4: Noise Reduction
p. 14
Spectral Subtraction: Iterative Wiener Filter
For each frame (vector) y[m]
(i=iteration nr.)
1. Estimate g s ,i and αi  1,i   M ,i T
2. Construct Wiener Filter (p.9)
Repeat
until
some error
criterion is
satisfied
Pss ( )
G ( )  .. 
Pss ( )  Pnn ( )
with:
• Pnn ( ) estimated during noise-only periods
g s ,i
• P ( ) 
ss
2
M
1    m , i e  j m
m 1
3. Filter speech frame y[m]
Digital Audio Signal Processing
Version 2014-2015
sˆ i [m]
Lecture-4: Noise Reduction
p. 15
Overview
• Spectral subtraction for single-micr. noise reduction
–
–
–
–
Single-microphone noise reduction problem
Spectral subtraction basics (=spectral filtering)
Features: gain functions, implementation, musical noise,…
Iterative Wiener filter based on speech signal model
• Multi-channel Wiener filter for multi-micr. noise red.
– Multi-microphone noise reduction problem
– Multi-channel Wiener filter (=spectral+spatial filtering)
• Kalman filter based noise reduction
– Kalman filters
– Kalman filters for noise reduction
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 16
Multi-Microphone Noise Reduction Problem
s[k ] speech source
?
n[k ]
noise source(s)
(some) speech estimate
microphone signals
yi [k ]  si [k ]  ni [k ], i  1...M
speech part
Digital Audio Signal Processing

s [k ]
Version 2014-2015
noise part
Lecture-4: Noise Reduction
p. 17
Multi-Microphone Noise Reduction Problem
Will estimate speech
part in microphone 1
(*) (**)
s[k ]
?

s1[k ]
n[k ]
yi [k ]  si [k ]  ni [k ], i  1...M
(*) Estimating s[k] is more difficult, would include dereverberation...
(**) This is similar to single-microphone model (p.3), where additional microphones
(m=2..M) help to get a better estimate
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 18
Multi-Microphone Noise Reduction Problem
• Data model:
Y( )  S( )  N( )
 d( ).S ( )  N( )
 Y1 ( )   H1 ( ) 
 N1 ( ) 
 Y ( )   H ( ) 
 N ( ) 
 2
 2
.S ( )   2

     
  

 



Y
(

)
H
(

)
N
(

)
 M
  M

 M

See Lecture-2 on multi-path propagation, with q left out for
conciseness.
Hm(ω) is complete transfer function from speech source position to
m-the microphone
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 19
Multi-Channel Wiener Filter (MWF)
• Data model:
Y( )  d( ).S ( )  N( )
• Will use linear filters to obtain speech estimate (as in Lecture-2)
M
Sˆ1 ( )   Fm* ( ).Ym ( )  F H ( ).Y( )
m 1
• Wiener filter (=linear MMSE approach)
2
min F ( ) E{ S1 ( )  F ( ).Y( ) }
H
Note that (unlike in DSP-CIS) `desired response’ signal S1(w) is unknown here (!),
hence solution will be `unusual’…
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 20
Multi-Channel Wiener Filter (MWF)
• Wiener filter solution is (see DSP-CIS)
1
F( )  E{Y( ).Y ( )} E{Y( ).S1* ( )}.
 
H
autocorrelation
 ...
crosscorrelation
*
(with E {S ( ). N 1 ( )}  0 )


 E{Y( ).Y H ( )}1. E{Y( ).Y1* ( )}  E{N( ).N1* ( )}
compute during speech+noise periods
compute during noise-only periods
– All quantities can be computed !
– Special case of this is single-channel Wiener filter formula on p.9
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 21
Multi-Channel Wiener Filter (MWF)
•
MWF combines spatial filtering (as in Lecture-2) with
single-channel spectral filtering (as in single-channel noise reduction)
if
Y( )



 Y1 ( ) 
 Y ( ) 
 2
  d( ) .S ( )  N( )




   steering
vector
noise


YM ( )
E{N ( ).N H ( )}  Φ NN ( )
then…
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 22
Multi-Channel Wiener Filter (MWF)
•
…then it can be shown that
1
F( )  
(

)
.
Φ
 NN ( ).d( )
scalar
•
1
Φ NN
( ).d ( ) represents a spatial filtering (*)
Compare to superdirective & delay-and-sum beamforming (Lecture-2)
–
Delay-and-sum beamf. maximizes array gain in white noise field
–
Superdirective beamf. maximizes array gain in diffuse noise field
–
MWF maximizes array gain in unknown (!) noise field.
MWF is operated without invoking any prior knowledge (steering
vector/noise field) ! (the secret is in the voice activity detection… (explain))
(*) Note that spatial filtering can improve SNR, spectral filtering never improves SNR
(at one frequency)
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 23
Multi-Channel Wiener Filter (MWF)
•
…then it can be shown that
(continued)
1
F( )  
(

)
.
Φ
 NN ( ).d( )
scalar
•
 ( )
represents an additional `spectral post-filter’
i.e. single-channel Wiener estimate (p.9), applied to output signal
of spatial filter
S ( ) .H1* ( )
2
 ( )  ... 
d
H

1
( ).Φ NN
( ).d( ) . S ( )  1
2
(prove it!)
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 24
Multi-Channel Wiener Filter: Implementation
• Implementation with short-time Fourier transform: see p.10
• Implementation with time-domain linear filtering:
s[k ]
y1[k ]

s1[k ]
y i [k ]  s i [k ]  n i [k ]
y Ti [k ]   yi [k ]
n[k ]
M

s1[k ]   y Ti [k ].fi
i 1
Digital Audio Signal Processing
yi [k  1] ...
Version 2014-2015
yi [k  L  1]
filter
coefficients
Lecture-4: Noise Reduction
p. 25
Multi-Channel Wiener Filter: Implementation
• Implementation with time-domain linear filtering:
2
min f E{ s1[k ]  y [k ].f }
T

f  f1T

f 2T
... f MT

T

y[k ]  y1T [k ] y T2 [k ] ... y TM [k ]
T
Solution is…


 E{y[k ].y[k ] } .E{y[k ]. y [k ]}  E{n[k ].n [k ]}
T
1
T
1
f  E{y[k ].y[k ] } .E{y[k ].s1[k ]}
1
1
compute during speech+noise periods
compute during noise-only periods
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 26
Overview
• Spectral subtraction for single-micr. noise reduction
–
–
–
–
Single-microphone noise reduction problem
Spectral subtraction basics (=spectral filtering)
Features: gain functions, implementation, musical noise,…
Iterative Wiener filter based on speech signal model
• Multi-channel Wiener filter for multi-micr. noise red.
– Multi-microphone noise reduction problem
– Multi-channel Wiener filter (=spectral+spatial filtering)
• Kalman filter based noise reduction
– Kalman filters
– Kalman filters for noise reduction
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 27
Kalman Filter 1/12
State space model of a time-varying discrete-time system
ì
ï x[k +1] =
í
=
ï
î y[k]
process noise
A[k].x[k]+ B[k].u[k]+ v[k]
C[k].x[k]+ D[k].u[k]+ w[k]
measurement
noise
with v[k] and w[k]: mutually uncorrelated, zero mean, white noises
é v[k]] ù
ú. éê v[k]H
E{ê
êë w[k] úû ë
w[k]H
é
0 ù
ù ê V[k]
ú
úû} =
W[k] úû
êë 0
1
2
T
2
V[k] = V[k] .V[k]
= Cholesky/square-root factorization
Then:
given A[k], B[k], C[k], D[k], V[k], W[k] and
input/output-observations u[k],y[k], k=0,1,2,... then
Kalman filter produces MMSE estimates of internal states x[k], k=0,1,...
Digital Audio Signal Processing
PS: will use shorthand notation here, i.e.
Version 2014-2015
Lecture-4:
Reduction
xk, yk ,.. instead of
x[k], Noise
y[k],..
p. 28
Kalman Filter 2/12
Definition:
x̂ k|l
• `FILTERING’
= MMSE-estimate of xk using all
available data up until time l
= estimate
x̂ k|k
• `PREDICTION’ = estimate
x̂ k|k-n , n > 0
• `SMOOTHING’ = estimate
x̂ k|k+n, n > 0
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 29
Kalman Filter 3/12
Initalization:
=error covariance matrix
‘Conventional Kalman Filter’ operation @ time k (k=0,1,2,..)
Given x̂ k|k-1 and corresponding error covariance matrix P k|k-1 ,
Compute x̂ k|k , P k|k and x̂k+1|k , Pk+1|k using uk, yk :
Step 1: Measurement Update
(produces ‘filtered’ estimate)
(compare to standard RLS!)
Step 2: Time Update
(produces ‘1-step prediction’)
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 30
Kalman Filter 4/12
PS: ‘Standard RLS’ is a special case of ‘Conventional KF’

Digital Audio Signal Processing
Version 2014-2015
Internal state vector is FIR filter
coefficients vector, which is
assumed to be time-invariant
Lecture-4: Noise Reduction
p. 31
Kalman Filter 5/12
State estimation @ time k corresponds to overdetermined set of linear
equations, where vector of unknowns contains all previous state vectors :
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 32
Kalman Filter 6/12
Technical detail…
State estimation @ time k corresponds to overdetermined set of linear
equations, where vector of unknowns contains all previous state vectors :
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 33
Kalman Filter 7/12
State estimation @ time k corresponds to overdetermined set of linear
equations, where vector of unknowns contains all previous state vectors :
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 34
Kalman Filter 8/12
Propagated from
time k-1 to time k


‘Square-Root’ Kalman Filter
..hence requires only lower-right/lower part
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 35
Kalman Filter 9/12
Digital Audio Signal Processing
Version 2014-2015
Propagated from
time k-1 to time k


‘Square-Root’ Kalman Filter
Lecture-4: Noise Reduction
p. 36
Kalman Filter 10/12
Digital Audio Signal Processing



Propagated from
time k to time k+1
Propagated from
time k-1 to time k

‘Square-Root’ Kalman Filter
Version 2014-2015
Lecture-4: Noise Reduction
p. 37
Kalman Filter 11/12
QRD-
=QRDDigital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 38
Kalman Filter 12/12
can be derived from square-root KF equations
is
.
: can be worked into measurement update eq.
: can be worked into state update eq.
[details omitted]
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 39
Kalman filter for Speech Enhancement
• Assume AR model of speech and noise
s[k ] 
n[k ] 
M

m 1
N

m 1
m
m
s[k  m]  g s u[k ]
n[k  m]  g n w[k ]
u[k], w[k] = zero mean, unit
variance,white noise
• Equivalent state-space model is…
x[k  1]  Ax[k ]  v[k ]

T
 c x[k ]
 y[k ]
Digital Audio Signal Processing
Version 2014-2015
y=microphone signal
Lecture-4: Noise Reduction
p. 40
Kalman filter for Speech Enhancement
with:
xT [k ]  s[k  M  1]  s[k ] n[k  N  1]  n[k ]
v[k ]  G.u[k ] w[k ]
T
A s
A
0
 0
 
As  
 0

 M
Q  G.GT
N
M
0  T   
g s
; C  0  0 1 0  0 1; G  
A n 


0
1
0

 M 1

0
0
 
 0 
; An  
0
0 1


 1 
 N

gTs  0  0
Digital Audio Signal Processing


0
 0 
0 1

 1 

1
0

 N 1
g s ; gTn  0  0
Version 2014-2015
0
g n 

gn ;
Lecture-4: Noise Reduction
p. 41
Kalman filter for Speech Enhancement
Iterative algorithm
y[k]
split
signal
in frames
iterations
estimate
parameters
gˆ s ,i ; ˆ m,i
gˆ n ,i ; ˆm ,i
Kalman Filter
reconstruct
signal
sˆ i [m],
nˆ i [m]
sˆ[ k ]
Disadvantages iterative approach:
• complexity
• delay
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 42
Kalman filter for Speech Enhancement
Sequential algorithm
iteration index
sˆ[k  1 | k  1],
time index
(no iterations)
D
nˆ[k  1 | k  1]
y[k ]
ˆ m [k  1 | k  1],
ˆm [k 1 | k 1],
gˆ s [k  1 | k  1],
gˆ n [k  1 | k  1]
Digital Audio Signal Processing
sˆ[k | k ], nˆ[k | k ]
State Estimator:
Kalman Filter
sˆ, nˆ
Parameters Estimator
(Kalman Filter)
D
Version 2014-2015
ˆ m [k | k ],
ˆm [k | k ],
ˆ m , ˆm ,
gˆ s , gˆ n
gˆ s [k | k ],
gˆ n [k | k ]
Lecture-4: Noise Reduction
p. 43
CONCLUSIONS
• Single-channel noise reduction
– Basic system is spectral subtraction
– Only spectral filtering, hence can only exploit differences in spectra
between noise and speech signal:
• noise reduction at expense of speech distortion
• achievable noise reduction may be limited
• Multi-channel noise reduction
– Basic system is MWF,
– Provides spectral + spatial filtering (links with beamforming!)
• Iterative Wiener filter & Kalman filtering
– Signal model based approach
Digital Audio Signal Processing
Version 2014-2015
Lecture-4: Noise Reduction
p. 44
Download