SignalSubspaceSE

Signal Subspace Speech Enhancement
Page 0 of 43
Presentation Outline
• Introduction
• Principles
• Orthogonal Transforms (KLT Overview)
• Papers Review
Page 1 of 47
Introduction
• Two major classes of speech enhancement
– By modeling the noise/speech, e.g. with an HMM
▪ Highly dependent on the speech signal syntax and the noise characteristics
– Based on a transformation, e.g. Spectral Subtraction
▪ Suffers from musical noise
• Signal Subspace belongs to the second class (nonparametric)
Page 2 of 47
Schematic Diagram
[Figure: block diagram. Noisy signal (time domain) → Orthogonal Transform → Modifying Coefficients → Inverse Transform → Estimated Clean Signal]
Page 3 of 47
Schematic Diagram
[Figure: block diagram. Noisy signal (time domain) → Framing (overlapping) → Orthogonal Transform → Estimating Dimensions of Subspaces, producing two orthogonal subspaces: the Signal+Noise subspace (block Gs, estimating the clean signal from the Signal+Noise subspace) and the Noise subspace (block Gn) → Inverse Transform → Clean Signal]
Page 4 of 47
Principles
• Procedure
– Estimate the dimension of the signal+noise (S+N) subspace in each frame
– Estimate the clean signal from the S+N subspace under some criteria (main part)
▪ energy of the residual noise
▪ energy of the signal distortion
– Null the coefficients related to the noise subspace
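The three steps of the procedure can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the method of any reviewed paper: the function name `enhance_frames`, the threshold factor `margin`, the synthetic two-sinusoid signal and the known noise variance are all made up for the example, and the S+N coefficients are kept unchanged (gain G = I) instead of being weighted by an optimal estimator.

```python
import numpy as np

def enhance_frames(Z, noise_var, margin=1.5):
    """Z: (T, K) array of noisy, real-valued K-dimensional frame vectors."""
    T, K = Z.shape
    Rz = Z.T @ Z / T                       # sample covariance of the noisy vectors
    eigvals, U = np.linalg.eigh(Rz)        # orthonormal eigenvectors in the columns of U
    keep = eigvals > margin * noise_var    # step 1: estimate the S+N subspace
    U1 = U[:, keep]
    coeffs = Z @ U1                        # transform coefficients in the S+N subspace
    # steps 2-3: keep the S+N coefficients as they are (G = I, an assumption)
    # and null everything in the noise subspace, then transform back
    return coeffs @ U1.T, int(keep.sum())

rng = np.random.default_rng(0)
K, T, noise_var = 20, 2000, 0.5
n = np.arange(T + K - 1)
clean = np.sin(0.3 * n) + 0.5 * np.sin(1.1 * n)
noisy = clean + rng.normal(scale=np.sqrt(noise_var), size=clean.shape)

# Overlapping frames: every length-K window of the signal is one vector
Y = np.lib.stride_tricks.sliding_window_view(clean, K)
Z = np.lib.stride_tricks.sliding_window_view(noisy, K)

Y_hat, M = enhance_frames(Z, noise_var)
err_before = np.mean((Z - Y) ** 2)
err_after = np.mean((Y_hat - Y) ** 2)
print(M, err_before, err_after)
```

Comparing `err_after` with `err_before` shows how much noise is removed just by nulling the noise subspace, before any gain is applied inside the S+N subspace.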
Page 5 of 47
Principles
• Assumptions
– Noise and speech are uncorrelated
– Noise is additive and white (or has been whitened)
– The covariance matrix of the noise in each frame is positive definite and close to a Toeplitz matrix
– The speech signal is more statistically structured than the noise process
Page 6 of 47
Principles
• Key factor in the Signal Subspace method
– Covariance matrices of the clean signal have some zero eigenvalues.
▪ The improvement in SNR is proportional to the number of those zeros.
Nulling the coefficients of the noise subspace corresponds to nulling the weak spectral components in spectral subtraction.
Page 7 of 47
Orthogonal Transforms
 Signal Subspace decomposition can be achieved by
applying:
– KLT
 via Eigenvalue Decomposition (ED) of signal covariance
matrix
 via Singular Value Decomposition (SVD) of data matrix
 SVD approximation by recursive methods
– DCT as a good approximation to the KLT
– Walsh, Haar, Sine, Fourier,…
Page 8 of 47
Orthogonal Transforms:
Karhunen-Loeve Transform (KLT)
 Also known as “Hotelling”, “Principal Component” or
“Eigenvector" Transform
 Decorrelates the input vector perfectly
– Processing of one component has no effect on the
others
 Applications
– Compression, Pattern Recognition, Classification,
Image Restoration, Speech Recognition, Speaker
Recognition,…
Page 9 of 47
KLT Overview
Let $R$ be the $N \times N$ correlation matrix of a random complex sequence $x = (x_1, x_2, \ldots, x_N)^T$. Then
$$R = E\{x x^H\} = E\left\{ \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \begin{bmatrix} x_1^* & x_2^* & \cdots & x_N^* \end{bmatrix} \right\}$$
where $E$ is the expectation operator and $R$ is a Hermitian matrix.
Page 10 of 47
KLT Overview
Let $\Phi$ be an $N \times N$ unitary matrix ($\Phi^{-1} = \Phi^H$) which diagonalizes $R$:
$$\Phi^H R \Phi = \Lambda = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$$
$\lambda_i$, $i = 1, 2, \ldots, N$ are the eigenvalues of $R$.
$\Phi^H$ is called the KLT matrix.
Page 11 of 47
KLT Overview
Property of $\Phi^H$:
• Consider the following transform:
$$y = \Phi^H x$$
The sequence $y$ is uncorrelated because:
$$E\{y y^H\} = E\{\Phi^H x x^H \Phi\} = \Phi^H E\{x x^H\} \Phi = \Phi^H R \Phi = \Lambda$$
⇒ $y$ has no cross-correlation
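The decorrelation property can be checked numerically. The sketch below is an illustration only: the mixing matrix `A`, the sizes and the random data are made up, and the expectation is replaced by a sample average, so `R` is an estimate of the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 50000

# Correlated complex sequence: a made-up mixing matrix applied to white noise
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
x = A @ (rng.normal(size=(N, T)) + 1j * rng.normal(size=(N, T)))

R = x @ x.conj().T / T            # sample estimate of E{x x^H}
lam, Phi = np.linalg.eigh(R)      # R Phi = Phi diag(lam), Phi unitary

y = Phi.conj().T @ x              # KLT: y = Phi^H x
Ry = y @ y.conj().T / T           # sample correlation matrix of y

off_diag = Ry - np.diag(np.diag(Ry))
print(np.max(np.abs(off_diag)))   # cross-correlations, zero up to rounding
print(np.diag(Ry).real, lam)      # auto-correlations match the eigenvalues
```

Because the transform is built from the same sample matrix it is applied to, the off-diagonal terms vanish up to floating-point rounding; with a KLT estimated on other data they would only be approximately zero.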
Page 12 of 47
KLT Overview
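What is $\Phi$?
$$\Phi^H R \Phi = \Lambda \;\Rightarrow\; \Phi \Phi^H R \Phi = \Phi \Lambda \;\Rightarrow\; R \Phi = \Phi \Lambda$$
where $\Phi = [\phi_1\ \phi_2\ \cdots\ \phi_N]$ and $\phi_i$ is the $i$th column of $\Phi$
$$\Rightarrow\; R \phi_i = \lambda_i \phi_i, \quad i = 1, 2, \ldots, N$$
Thus the $\phi_i$'s are the eigenvectors corresponding to the $\lambda_i$'s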
Page 13 of 47
KLT Overview
 Comments
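– The auto-correlations of y are arranged in the same order as the $\lambda_i$'s
– The KLT can also be based on the covariance matrix
– The sequence can be reconstructed with negligible error from the components belonging to the largest eigenvalues
– The KLT is optimal: among orthogonal transforms it gives the smallest mean-square error when only a subset of the coefficients is kept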
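The comment on reconstruction from the largest eigenvalues can be illustrated with a short NumPy sketch. The component scales, sizes and random data are assumptions chosen for the example; the point is that the mean-square reconstruction error equals the sum of the discarded eigenvalues of the sample correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, m = 8, 10000, 3

# Made-up data: three strong components and five weak ones, randomly rotated
scales = np.array([5.0, 3.0, 2.0, 0.1, 0.1, 0.05, 0.05, 0.01])
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))
x = Q @ (scales[:, None] * rng.normal(size=(N, T)))

R = x @ x.T / T                    # sample correlation matrix
lam, Phi = np.linalg.eigh(R)       # eigenvalues in ascending order

Phi_m = Phi[:, -m:]                # eigenvectors of the m largest eigenvalues
x_hat = Phi_m @ (Phi_m.T @ x)      # reconstruction from m KLT coefficients

mse = np.mean(np.sum((x - x_hat) ** 2, axis=0))
discarded = lam[:-m].sum()         # sum of the N - m smallest eigenvalues
print(mse, discarded, discarded / lam.sum())
```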
Page 14 of 47
KLT Overview
 Difficulties
– Computational Complexity (no fast algorithm)
– Dependency on the statistics of the current frame
– Makes the coefficients uncorrelated, but not necessarily independent
• Use the KLT as a benchmark when evaluating the performance of the other transforms.
Page 15 of 47
Papers Review
1. A Signal Subspace Approach for S.E. [Ephraim 95]
2. On S.E. Algorithms based on Signal Subspace Methods [Hansen]
3. Extension of the Signal Subspace S.E. Approach to Colored Noise
[Ephraim]
4. An Adaptive KLT Approach for S.E. [Gazor]
5. Incorporating the Human Hearing Properties in Signal Subspace
Approach for S.E. [Jabloun]
6. An Energy-Constrained Signal Subspace Method for S.E. [Huang]
7. S.E. Based on the Subspace Method [Asano]
Page 16 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
• Principle
– Decompose the input vector of the noisy signal into a signal+noise subspace and a noise subspace by applying the KLT
• Enhancement Procedure
– Remove the noise subspace
– Estimate the clean signal from the S+N subspace
– Two linear estimators are obtained by considering:
▪ Signal distortion
▪ Residual noise energy
Page 17 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
 Notes
– The residual noise is kept below some threshold to avoid producing musical noise
– Since the DFT and the KLT are related, spectral subtraction (SS) is a particular case of this method
– If a vector is a linear combination of fewer basis vectors than its dimension, then its correlation matrix has some zero eigenvalues
Page 18 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
 Basics
– Noisy speech signal: $z = y + w$, $K$-dimensional ($y$: clean signal, $w$: noise)
– $y = \sum_{m=1}^{M} s_m V_m$, $M \le K$, where $s_1, \ldots, s_M$ are zero-mean complex variables
– $\Rightarrow\; y = Vs$
– If $M = K$, the representation is always possible.
– Otherwise the “damped complex sinusoid model” can be used.
– Span($V$): produces all vectors $y$
Page 19 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
• When $M < K$, all vectors $y$ lie in a subspace of $R^K$ spanned by the columns of $V$
⇒ SIGNAL+NOISE SUBSPACE
• Covariance matrix of the clean signal $y = Vs$:
$$R_y = E\{y y^{\#}\} = V R_s V^{\#}$$
where $V$ is $K \times M$, $R_s$ is $M \times M$ and $V^{\#}$ is $M \times K$
$$\mathrm{Rank}(R_y) = M$$
⇒ $R_y$ has $K - M$ zero eigenvalues
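The rank argument can be verified numerically. In the sketch below the basis `V` and the covariance `Rs` of the coefficients are random, made-up matrices; only the structure $R_y = V R_s V^{\#}$ with $M < K$ is taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(3)
K, M = 8, 3

# Hypothetical K x M basis and positive-definite M x M covariance of s
V = rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))
B = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
Rs = B @ B.conj().T + np.eye(M)

Ry = V @ Rs @ V.conj().T                 # covariance of the clean signal y = V s
eigvals = np.linalg.eigvalsh(Ry)         # real eigenvalues of the Hermitian matrix
rank = np.linalg.matrix_rank(Ry)

# Count eigenvalues that are zero up to floating-point rounding
num_zero = int(np.sum(np.abs(eigvals) < 1e-9 * eigvals.max()))
print(rank, num_zero)
```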
Page 20 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
• Covariance matrix of the noise $w$ ($K$-dimensional):
$$R_w = E\{w w^{\#}\} = \sigma_w^2 I, \qquad \mathrm{Rank}(R_w) = K$$
[Figure: the space $R^K$ containing the subspace S, with noise n present throughout]
– White noise vectors fill the entire Euclidean space $R^K$
– Thus the noise exists in both the S+N subspace and the complementary subspace
⇒ NOISE SUBSPACE
Page 21 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
• The discussion indicates that the Euclidean space of the noisy signal is composed of a signal subspace and a complementary noise subspace
• This decomposition can be performed by applying the KLT to the noisy signal:
• Let $z = Vs + w$
• The covariance matrix of $z$ is:
$$R_z = E\{z z^{\#}\} = V R_s V^{\#} + R_w$$
Page 22 of 47
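A Signal Subspace Approach for S.E.
[Ephraim 95]
• Noise is additive ⇒ $R_z = R_y + R_w$
• Let $R_z = U \Lambda_z U^{\#}$ be the eigendecomposition of $R_z$
• where $U = [u_1, \ldots, u_K]$ contains the eigenvectors of $R_z$ and $\Lambda_z = \mathrm{diag}(\lambda_z(1), \ldots, \lambda_z(K))$
• The eigenvalues of $R_w$ are all equal to $\sigma_w^2$, so
$$\lambda_z(k) = \begin{cases} \lambda_y(k) + \sigma_w^2 & \text{if } k = 1, \ldots, M \\ \sigma_w^2 & \text{if } k = M+1, \ldots, K \end{cases}$$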
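The eigenvalue structure of $R_z$ suggests a simple way to estimate $M$: count the eigenvalues that exceed the noise level. The sketch below uses an exactly known covariance built from a made-up basis `V` (with $R_s = I$ assumed) and a known noise variance; with covariances estimated from data the threshold would need a safety margin.

```python
import numpy as np

rng = np.random.default_rng(4)
K, M, noise_var = 8, 3, 0.2

V = rng.normal(size=(K, M))               # hypothetical basis of the clean signal
Ry = V @ V.T                              # rank-M clean covariance (R_s = I assumed)
Rz = Ry + noise_var * np.eye(K)           # R_z = R_y + sigma_w^2 I

lam_y = np.sort(np.linalg.eigvalsh(Ry))[::-1]   # descending order
lam_z = np.sort(np.linalg.eigvalsh(Rz))[::-1]

# Every eigenvalue is shifted up by the noise variance
shift_ok = np.allclose(lam_z, lam_y + noise_var)

# Eigenvalues strictly above the noise floor belong to the S+N subspace
M_est = int(np.sum(lam_z > noise_var * (1 + 1e-6)))
print(shift_ok, M_est)
```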
Page 23 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
• Estimating the dimension $M$ of the signal subspace
Let $U = [u_1, \ldots, u_K]$ be partitioned as $U = [U_1, U_2]$, where
$$U_1 = \{u_k : \lambda_z(k) > \sigma_w^2\}$$
contains the principal eigenvectors
• Because $\mathrm{span}(U_1) = \mathrm{span}(V)$, $U_1 U_1^{\#}$ is the orthogonal projector onto the S+N subspace
Page 24 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
• Thus a vector $z$ of the noisy signal can be decomposed as
$$U U^{\#} = I \;\Rightarrow\; U_1 U_1^{\#} + U_2 U_2^{\#} = I$$
$$z = U_1 U_1^{\#} z + U_2 U_2^{\#} z$$
• $U^{\#}$ is the Karhunen-Loeve Transform matrix.
• The vector $U_2 U_2^{\#} z$ does not contain signal information and can be nulled when estimating the clean signal.
• However, $M$ (the dimension of the S+N subspace) must be calculated precisely
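A small numerical check of this decomposition is sketched below. The basis `V`, the noise level and the test vectors are made up, and the covariance is exact rather than estimated, so the noise-subspace projection of a clean vector is zero up to rounding.

```python
import numpy as np

rng = np.random.default_rng(5)
K, M = 6, 2

V = rng.normal(size=(K, M))                   # hypothetical basis of the clean signal
Rz = V @ V.T + 0.1 * np.eye(K)                # noisy covariance with white noise

lam, U = np.linalg.eigh(Rz)
U = U[:, np.argsort(lam)[::-1]]               # eigenvectors by descending eigenvalue
U1, U2 = U[:, :M], U[:, M:]                   # S+N subspace / noise subspace

P1, P2 = U1 @ U1.T, U2 @ U2.T                 # orthogonal projectors

y = V @ rng.normal(size=M)                    # clean vector, lies in span(V)
w = rng.normal(size=K)                        # noise vector
z = y + w

print(np.allclose(P1 @ z + P2 @ z, z))        # z is the sum of its two projections
print(np.linalg.norm(P2 @ y))                 # clean signal has no noise-subspace part
```

Since `P2 @ z` equals `P2 @ w`, nulling it removes noise without touching the clean signal, provided `M` is chosen correctly.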
Page 25 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
 Linear Estimation of the clean signal
– Time Domain Constrained Estimator
 Minimize signal distortion while constraining the energy of
residual noise in every frame below a given threshold
– Spectral Domain Constrained Estimator
 Minimize signal distortion while constraining the energy of
residual noise in each spectral component below a given
threshold
Page 26 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
 Time Domain Constrained Estimator
– Given $z = y + w$, let $\hat{y} = Hz$ be a linear estimator of $y$, where $H$ is a $K \times K$ matrix
– The residual signal is
$$r = \hat{y} - y = (H - I)y + Hw = r_y + r_w$$
▪ $r_y$ and $r_w$ represent the signal distortion and the residual noise, respectively
Page 27 of 47
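A Signal Subspace Approach for S.E.
[Ephraim 95]
• Defining the criterion
– Signal distortion $r_y = (H - I)y$, with energy $\varepsilon_y^2 = \mathrm{tr}\,E\{r_y r_y^{\#}\}$
– Residual noise $r_w = Hw$, with energy $\varepsilon_w^2 = \mathrm{tr}\,E\{r_w r_w^{\#}\}$
• Solving:
$$\min_H \varepsilon_y^2 \quad \text{subject to: } \frac{1}{K}\varepsilon_w^2 \le \alpha \sigma_w^2, \qquad 0 \le \alpha \le \frac{M}{K}$$
Minimize the signal distortion while constraining the energy of the residual noise in the entire frame below a given threshold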
Page 28 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
• Solving the constrained minimization with the Kuhn-Tucker necessary conditions, we obtain
$$H_{TDC} = R_y (R_y + \mu \sigma_w^2 I)^{-1}$$
where $\mu$ is the Lagrange multiplier, which must satisfy
$$\alpha = \frac{1}{K} \mathrm{tr}\{R_y^2 (R_y + \mu \sigma_w^2 I)^{-2}\}$$
• Eigendecomposition of $H_{TDC}$:
$$H_{TDC} = U \begin{bmatrix} G & 0 \\ 0 & 0 \end{bmatrix} U^{\#}$$
Page 29 of 47
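A Signal Subspace Approach for S.E.
[Ephraim 95]
• In order to null the noisy components:
$$H_{TDC} = U \begin{bmatrix} G & 0 \\ 0 & 0 \end{bmatrix} U^{\#}, \qquad G = \Lambda_y (\Lambda_y + \mu \sigma_w^2 I)^{-1}$$
$$H_{TDC} = U_1 G U_1^{\#}$$
• If $\alpha$ takes its maximum value ($\alpha = M/K$), then $G = I$ and $H_{TDC}$ passes the S+N subspace unchanged, which means minimum distortion and maximum residual noise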
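The time domain constrained estimator can be sketched as below. This is an illustration under assumptions, not the paper's implementation: the function name `tdc_estimator` is hypothetical, the noise variance and the dimension M are taken as known, the signal eigenvalues are obtained as $\lambda_y(k) = \lambda_z(k) - \sigma_w^2$, and the Lagrange multiplier `mu` is passed in directly instead of being solved for from a target $\alpha$.

```python
import numpy as np

def tdc_estimator(Rz, noise_var, M, mu):
    """Return (H_TDC, alpha) for a Hermitian noisy covariance Rz.

    Assumes white noise of known variance, a known subspace dimension M,
    strictly positive signal eigenvalues and mu >= 0.
    """
    lam_z, U = np.linalg.eigh(Rz)
    order = np.argsort(lam_z)[::-1]            # descending eigenvalues
    lam_z, U = lam_z[order], U[:, order]
    U1 = U[:, :M]                              # principal eigenvectors (S+N subspace)
    lam_y = lam_z[:M] - noise_var              # lambda_y(k) = lambda_z(k) - sigma_w^2
    g = lam_y / (lam_y + mu * noise_var)       # diagonal of G
    H = U1 @ np.diag(g) @ U1.conj().T          # H_TDC = U1 G U1^#
    alpha = np.sum(g ** 2) / Rz.shape[0]       # residual-noise level (1/K) tr{G^2}
    return H, alpha

rng = np.random.default_rng(6)
K, M, noise_var = 6, 2, 0.1
V = rng.normal(size=(K, M))                    # made-up signal basis
Ry = V @ V.T
Rz = Ry + noise_var * np.eye(K)

H0, alpha0 = tdc_estimator(Rz, noise_var, M, mu=0.0)
H2, alpha2 = tdc_estimator(Rz, noise_var, M, mu=2.0)

# Direct form R_y (R_y + mu sigma_w^2 I)^-1 for comparison
H2_direct = Ry @ np.linalg.inv(Ry + 2.0 * noise_var * np.eye(K))
print(alpha0, alpha2, np.allclose(H2, H2_direct))
```

With `mu = 0` the gain is the identity inside the S+N subspace and `alpha0` equals M/K; increasing `mu` lowers the residual noise level at the cost of more signal distortion.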
Page 30 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
 Spectral Domain Constrained Estimator
– Minimize the signal distortion while constraining the energy of the residual noise in each spectral component below a given threshold.
• Results:
$$H = U Q U^{\#}, \qquad Q = \mathrm{diag}(q_{11}, \ldots, q_{KK})$$
$$q_{kk} = \begin{cases} \alpha_k^{1/2} & k = 1, \ldots, M \\ 0 & k = M+1, \ldots, K \end{cases}$$
$$\alpha_k = \exp\{-\nu \sigma_w^2 / \lambda_y(k)\}$$
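A minimal sketch of the spectral domain constrained gains follows. It assumes the exponent is negative, $\alpha_k = \exp\{-\nu \sigma_w^2 / \lambda_y(k)\}$, so that every gain is at most 1; the function name `sdc_estimator`, the eigenvalues and the parameter values are made up for illustration.

```python
import numpy as np

def sdc_estimator(U, lam_y, noise_var, nu, M):
    """Build H = U Q U^# with q_kk = alpha_k**0.5 for the first M components.

    lam_y must be sorted in descending order with lam_y[:M] > 0.
    """
    K = U.shape[0]
    q = np.zeros(K)                                   # noise-subspace gains stay 0
    alpha = np.exp(-nu * noise_var / lam_y[:M])       # assumed negative exponent
    q[:M] = np.sqrt(alpha)
    return U @ np.diag(q) @ U.conj().T, q

lam_y = np.array([4.0, 1.0, 0.25, 0.0, 0.0])          # made-up signal eigenvalues
U = np.eye(5)                                         # trivial eigenvector matrix
H, q = sdc_estimator(U, lam_y, noise_var=0.5, nu=2.0, M=3)
print(q)
```

Components with a small signal eigenvalue relative to the noise variance are attenuated strongly, while strong components are left almost untouched.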
Page 31 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
 Notes
– Most of the computational cost lies in the eigendecomposition of the estimated covariance matrix.
– The eigendecomposition of the Toeplitz covariance matrix of the noisy vector is used as an approximation to the KLT
– There is a trade-off between a large T for estimating $R_z$ and a large K to satisfy $M < K$, while KT cannot be too large
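One common way to obtain a Toeplitz covariance estimate is to fill the matrix with the biased sample autocorrelation of the noisy samples. The sketch below shows that construction as an assumption; the slides do not state which estimator was actually used, and the function name `toeplitz_covariance` is hypothetical.

```python
import numpy as np

def toeplitz_covariance(z, K):
    """K x K Toeplitz covariance estimate from the biased autocorrelation of z."""
    T = len(z)
    # r[k] = (1/T) * sum_t z[t] z[t+k], for lags k = 0 .. K-1
    r = np.array([np.dot(z[:T - k], z[k:]) / T for k in range(K)])
    lags = np.abs(np.arange(K)[:, None] - np.arange(K)[None, :])
    return r[lags]                      # entry (i, j) is r[|i - j|]

rng = np.random.default_rng(8)
z = rng.normal(size=500)                # made-up noisy samples
R = toeplitz_covariance(z, K=5)

eigvals, U = np.linalg.eigh(R)          # U^T then approximates the KLT
print(R[0, 0], eigvals.min())
```

The biased (divide by T) autocorrelation keeps the estimate positive semidefinite, which is why it is used here instead of the unbiased version.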
Page 32 of 47
A Signal Subspace Approach for S.E.
[Ephraim 95]
 Implementation Results
– The improvement in SNR is proportional to K/M
– The SDC estimator is more powerful than the TDC estimator
– The SNR improvements of Signal Subspace and spectral subtraction (SS) are similar
– Subjective test
▪ 83.9% preferred Signal Subspace over the noisy signal
▪ 98.2% preferred Signal Subspace over SS
Page 33 of 47
On S.E. Algorithms based on Signal
Subspace Methods [Hansen]
• The dimension of the signal subspace is chosen at a point with almost equal singular values
• Gain matrices for different estimators
– SDC and TDC: less sensitive to errors in the noise estimation
– MV
▪ Lowest residual noise
▪ Musical noise
– LS ⇒ G = I
▪ Lowest signal distortion and highest residual noise, $\frac{M}{K}\sigma_{noise}^2$
▪ K/M improvement in SNR
• SDC improves the SNR in the range 0-20 dB
Page 34 of 47
Extension of the Signal Subspace S.E.
Approach to Colored Noise [Ephraim]
• The whitening approach is not desirable for the SDC estimator.
• Obtaining the gain matrix $H$ for the SDC estimator:
$$\min_H \varepsilon_d^2$$
subject to: the energy of the $i$th component of the residual noise is bounded by $\alpha_i$, $i = 1, \ldots, m$
$$H = R_w^{1/2} U \tilde{H} U' R_w^{-1/2}$$
• $\tilde{H}$ is not diagonal when the input noise is colored
• Whitening → orthogonal transformation $U'$ → modify the components by $\tilde{H}$
Page 35 of 47
An Adaptive KLT Approach for S.E.
[Gazor]
 Goal
– Enhancement of speech degraded by additive colored
noise
 Novelty
– Adaptive tracking based algorithm for obtaining KLT
components
– A VAD based on the principal eigenvalues
Page 36 of 47
An Adaptive KLT Approach for S.E.
[Gazor]
 Objective
– Minimize the distortion when the residual noise power is limited to a specific level
• Type of colored noise
– Has a diagonal covariance matrix $\Lambda_n$ in the KLT domain
$$G = \Lambda_y (\Lambda_y + \mu \sigma_w^2 I)^{-1}$$
is replaced by
$$G = \Lambda_y (\Lambda_y + \mu \Lambda_n)^{-1}$$
Page 37 of 47
An Adaptive KLT Approach for S.E.
[Gazor]
 Adaptive KLT tracking algorithm
– named “projection approximation subspace tracking”
– reducing computational time
– Eigendecomposition is considered as a constrained
optimization problem
– Solving the problem considering quasi-stationarity of
speech
– Then a recursive algorithm is planned to find a close
approximation of eigenvectors of the noisy signal
Page 38 of 47
An Adaptive KLT Approach for S.E.
[Gazor]
 Voice activity detector
– When the current principal components' energy is above 1/12 its past minimum and maximum
• Implementation Results
SNR (dB)   Non-Processed   Ephraim's   Noise Type
10         85%             55%         white
5          75%             69%         white
0          64%             89%         white
10         75%             73%         office
5          85%             79%         office
0          68%             89%         office
Page 39 of 47
Incorporating the Human Hearing Properties in
the Signal Subspace Approach for S.E. [Jabloun]
 Goal
– Keep the residual noise as much as possible, in order
to minimize signal distortion
 Novelty
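– Transformation from the frequency domain to the eigendomain for modeling the masking threshold.
[Figure: block diagram with the blocks IFET, Masking and FET, going from the eigendomain back to the eigendomain]
Many masking models were introduced in the frequency domain, e.g. on the Bark scale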
Page 40 of 47
Incorporating the Human Hearing Properties in
the Signal Subspace Approach for S.E. [Jabloun]
 Use noise prewhitening to handle the colored noise
 Implementation results
Input SNR   Compared with noisy signal   Compared with Signal Subspace
20 dB       92%                          71%
10 dB       85%                          78%
5 dB        85%                          92%
Page 41 of 47
An Energy-Constrained Signal
Subspace Method for S.E. [Huang]
 Novelty
– The colored noise is modelled by an AR process.
– Estimating energy of clean signal to adjust the speech
enhancement
 Prewhitening filter is constructed based on the estimated
AR parameters.
– The optimal AR coefficients are given by [Key 98]
Page 42 of 47
An Energy-Constrained Signal
Subspace Method for S.E. [Huang]
 Implementation Results
Word Recognition Accuracy for noisy digits
Input SNR   0 dB    5 dB    10 dB   20 dB
Baseline    40%     70%     90%     100%
ECSS        90%     100%    100%    100%
SNR improvement for isolated noisy digits
Input SNR     0 dB   5 dB   10 dB   20 dB
Improvement   7.6    6.4    5.2     2.9
Page 43 of 47
S.E. Based on the Subspace Method
[Asano]—Microphone Array
• The input spectrum observed at the $m$th microphone:
$$X_m(k) = \sum_{d=1}^{D} A_{m,d}(k) S_d(k) + N_m(k)$$
• Vector notation for all microphones:
$$x_k = A_k s_k + n_k$$
where $A_k s_k$ represents the directional sources and $n_k$ the ambient noise
• The (spatial) correlation matrix of $x_k$ is
$$R_k = E[x_k x_k^H]$$
• Then the eigenvalue decomposition is applied to $R_k$
[Figure: microphone array]
Page 44 of 47
S.E. Based on the Subspace Method
[Asano]—Microphone Array
 Procedure
– Weighting the eigenvalues of the spatial correlation matrix
▪ The energy of the D directional sources is concentrated in the D largest eigenvalues
▪ Ambient noise is reduced by weighting the eigenvalues of the noise-dominant subspace
▪ The M-D smallest eigenvalues are discarded when the direct-to-ambient ratio is high
– Using an MV beamformer to extract the directional component from the modified spatial correlation matrix
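The concentration of the directional energy in the D largest eigenvalues can be illustrated with synthetic data. In the sketch below the transfer functions `A`, the source and noise levels and the array size are all made up, the data stand for a single frequency bin, and the spatial correlation matrix is replaced by a sample average; the MV beamformer step is not shown.

```python
import numpy as np

rng = np.random.default_rng(7)
n_mics, D, T = 6, 2, 5000

def complex_normal(shape):
    """Zero-mean complex Gaussian samples (helper for this example only)."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

A = complex_normal((n_mics, D))          # made-up transfer functions, one frequency bin
s = 3.0 * complex_normal((D, T))         # D strong directional sources
n = 0.3 * complex_normal((n_mics, T))    # weak ambient noise at every microphone
x = A @ s + n                            # microphone observations

R = x @ x.conj().T / T                   # sample spatial correlation matrix
eigvals = np.linalg.eigvalsh(R)[::-1]    # descending order

energy_in_top_D = eigvals[:D].sum() / eigvals.sum()
print(eigvals, energy_in_top_D)
```

The gap between the D-th and (D+1)-th eigenvalue is what makes it possible to weight or discard the remaining eigenvalues of the noise-dominant subspace.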
Page 45 of 47
S.E. Based on the Subspace Method
[Asano]—Microphone Array
 Implementation results
– Two directional speech signals + Ambient noise
Recognition Rate:
         MV                MV-NSR
SNR      A        B1       A        B1
5 dB     66.9%    71.5%    72.3%    78%
10 dB    81.1%    86.6%    81.5%    87.2%
Page 46 of 47
Thanks For Your Attention
The End
Page 47 of 47