IN357: ADAPTIVE FILTERS

David Gesbert
Signal and Image Processing Group (DSB)
Department of Informatics, University of Oslo
http://www.ifi.uio.no/~gesbert
March 2003

Course book: Chap. 9, Statistical Digital Signal Processing and Modeling,
M. Hayes, 1996 (also builds on Chap. 7.2).
Outline

• Motivations for adaptive filtering
• The adaptive FIR filter
• Steepest descent and optimization theory
• Steepest descent in adaptive filtering
• The LMS algorithm
• Performance of LMS
• The RLS algorithm
• Performance of RLS
• Example: Adaptive beamforming in mobile networks
Motivations for adaptive filtering
[Figure: block diagram of the adaptive filter. The p observations
x0(n), x1(n), ..., xp−1(n) (observed random processes, possibly
non-stationary) feed the filter Wn, which outputs the estimated signal
d̂(n). Subtracting d̂(n) from the desired signal d(n) (an unobserved,
possibly non-stationary random process) gives the error signal e(n).
The filter W must be adjusted over time n.]
Goal: “Extend optimum (e.g. Wiener) filters to the case where the data
is not stationary or the underlying system is time-varying.”
Cases of non-stationarity

The filter W must be adjusted over time and is denoted W(n) in order to
track the non-stationarity:

• Example 1: Find the Wiener solution to the linear prediction of a
speech signal. The speech signal is non-stationary beyond approximately
20 ms of observation: d(n) and {xi(n)} are non-stationary.

• Example 2: Find the adaptive beamformer that tracks the location of a
mobile user in a wireless network. Here d(n) is stationary (a sequence of
modulation symbols), but {xi(n)} are not, because the channel is changing.
Approaches to the problem

Two solutions to track the filter W(n):

• (Block filtering) Split time into short intervals over which the data
is approximately stationary, and re-compute the Wiener solution for
every block.

• (Adaptive filtering) Use a long training signal for d(n) and adjust
W(n) continuously so as to minimize the power of e(n).
Vector Formulation (time-varying filter)

    W(n) = [w0(n), w1(n), ..., wp−1(n)]^T
    X(n) = [x0(n), x1(n), ..., xp−1(n)]^T

    d̂(n) = W(n)^T X(n)

where T is the transpose operator.
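As a concrete illustration, here is a minimal sketch of this vector formulation in Python/NumPy. The dimensions, random data and variable names are invented for the example; only the relation d̂(n) = W(n)^T X(n) comes from the slides.

```python
import numpy as np

# Minimal sketch of the vector formulation: d_hat(n) = W(n)^T X(n).
p = 4                                # number of filter taps / observations
rng = np.random.default_rng(0)

W = rng.standard_normal(p)           # current filter coefficients w0(n)..wp-1(n)
X = rng.standard_normal(p)           # current observations x0(n)..xp-1(n)

d_hat = W.T @ X                      # estimated signal, a scalar
d = rng.standard_normal()            # desired sample (would come from training data)
e = d - d_hat                        # error signal e(n)
print(d_hat, e)
```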
Time varying optimum linear filtering

    e(n) = d(n) − d̂(n)
    J(n) = E|e(n)|^2    (varies with n due to non-stationarity)

where E(·) is the expectation.

Find W(n) such that J(n) is minimum at time n. W(n) is then the optimum
linear filter in the Wiener sense at time n.
Finding the solution

The solution W(n) is given by the time-varying Wiener-Hopf equations:

    Rx(n) W(n) = rdx(n)                  (1)

where

    Rx(n)  = E( X(n)* X(n)^T )           (2)
    rdx(n) = E( d(n) X(n)* )             (3)
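For a block of approximately stationary data, the expectations in (2)-(3) can be replaced by sample averages and (1) solved directly. The sketch below illustrates this under that assumption; the signal model, noise level and names (W_true, Xmat, etc.) are invented for the example and are not part of the slides.

```python
import numpy as np

# Block (sample-average) estimate of Rx and rdx, then solve Rx W = rdx (eq. 1).
rng = np.random.default_rng(1)
p, N = 4, 5000

W_true = np.array([0.5, -0.3, 0.2, 0.1])        # unknown system, for simulation only
X = rng.standard_normal((N, p))                  # rows are the vectors X(n)^T
d = X @ W_true + 0.01 * rng.standard_normal(N)   # desired signal with a little noise

Rx  = (X.conj().T @ X) / N                       # estimate of E( X(n)* X(n)^T )
rdx = (X.conj().T @ d) / N                       # estimate of E( d(n) X(n)* )

W = np.linalg.solve(Rx, rdx)                     # block Wiener solution
print(np.round(W, 3))                            # close to W_true
```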
Adaptive Algorithms

The time-varying statistics used in (1) are unknown but can be
estimated. Adaptive algorithms aim at estimating and tracking the
solution W(n) given the observations {xi(n)}, i = 0..p−1, and a training
sequence for d(n).

Two key approaches:

• Steepest descent (also called gradient search) algorithms.
• The Recursive Least Squares (RLS) algorithm.

Tracking is formulated by:

    W(n + 1) = W(n) + ∆W(n)              (4)

where ∆W(n) is the correction applied to the filter at time n.
Steepest descent in optimization theory

Assumptions: stationary case.

Idea: “Local extrema of a cost function J(W) can be found by following the
path of steepest gradient (derivative) on the surface of J(W).”

• W(0) is an arbitrary initial point
• W(n + 1) = W(n) − µ (δJ/δW*) |_{W = W(n)}

where µ is a small step size (µ << 1).

Because J(·) is quadratic here, there is only one local minimum, toward
which W(n) will converge.
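A minimal sketch of this idea on a generic quadratic cost; the cost function, matrix A, minimizer Wo and step size below are made up purely for illustration.

```python
import numpy as np

# Steepest descent on a quadratic cost J(W) = (W - Wo)^T A (W - Wo):
# follow the negative gradient from an arbitrary starting point.
A  = np.array([[2.0, 0.5],
               [0.5, 1.0]])                 # positive definite -> single minimum
Wo = np.array([1.0, -2.0])                  # the minimizer
mu = 0.05                                   # small step size

W = np.zeros(2)                             # W(0): arbitrary initial point
for _ in range(500):
    grad = 2 * A @ (W - Wo)                 # dJ/dW
    W = W - mu * grad                       # W(n+1) = W(n) - mu * gradient
print(np.round(W, 4))                       # converges to Wo
```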
The steepest descent Wiener algorithm

Derivation of the gradient expression:

    J(W) = E( e(n) e(n)* )   where   e(n) = d(n) − d̂(n) = d(n) − W^T X(n)

    δJ/δW* = E( (δe(n)/δW*) e(n)* + e(n) (δe(n)*/δW*) )
           = E( 0 + e(n) (δe(n)*/δW*) )
           = −E( e(n) X(n)* )
Algorithm:

• W(0) is an arbitrary initial point
• W(n + 1) = W(n) + µ E( e(n) X(n)* )

W(n) will converge to Wo = Rx^{−1} rdx (the Wiener solution) if
0 < µ < 2/λmax, where λmax is the largest eigenvalue of Rx (see p. 501
for the proof).

Problem: E( e(n) X(n)* ) is unknown!
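Before replacing the expectation, here is a sketch of the iteration above in the idealized case where Rx and rdx are assumed known, so that E(e(n)X(n)*) = rdx − Rx W(n) can be used exactly. The statistics and sizes below are invented for illustration.

```python
import numpy as np

# Steepest descent toward the Wiener solution, assuming Rx and rdx are known:
# W(n+1) = W(n) + mu * E(e(n) X(n)*), with E(e(n) X(n)*) = rdx - Rx W(n).
Rx  = np.array([[1.0, 0.4, 0.1],
                [0.4, 1.0, 0.4],
                [0.1, 0.4, 1.0]])
rdx = np.array([0.6, 0.3, -0.2])

mu = 0.1                                   # must satisfy 0 < mu < 2 / lambda_max
assert mu < 2 / np.linalg.eigvalsh(Rx).max()

W = np.zeros(3)                            # W(0): arbitrary initial point
for _ in range(1000):
    W = W + mu * (rdx - Rx @ W)            # exact-gradient update
print(np.round(W, 4))                      # approaches the Wiener solution
print(np.round(np.linalg.solve(Rx, rdx), 4))
```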
The Least Mean Square (LMS) Algorithm

Idea: E( e(n) X(n)* ) is replaced by its instantaneous value.

• W(0) is an arbitrary initial point
• W(n + 1) = W(n) + µ e(n) X(n)*          (5)
• Repeat for n + 1, n + 2, ...

• The algorithm is derived under the assumption of stationarity, but it
can be used in a non-stationary environment as a tracking method.
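A minimal LMS sketch in Python/NumPy. Only the update W(n+1) = W(n) + µ e(n) X(n)* comes from the slides; the single-channel tapped-delay-line interpretation of X(n), the toy system-identification setup and all names (lms, W_true, etc.) are assumptions made for the example.

```python
import numpy as np

def lms(x, d, p, mu):
    """LMS adaptation of a p-tap FIR filter driven by x, trained on d."""
    W = np.zeros(p)                        # W(0): arbitrary initial point
    e = np.zeros(len(x))
    for n in range(p - 1, len(x)):
        X = x[n - p + 1:n + 1][::-1]       # X(n) = [x(n), x(n-1), ..., x(n-p+1)]
        d_hat = W @ X                      # d_hat(n) = W(n)^T X(n)
        e[n] = d[n] - d_hat                # error e(n)
        W = W + mu * e[n] * np.conj(X)     # W(n+1) = W(n) + mu e(n) X(n)*
    return W, e

# Toy system-identification example (invented data).
rng = np.random.default_rng(2)
x = rng.standard_normal(20000)
W_true = np.array([0.8, -0.4, 0.2, 0.1])
d = np.convolve(x, W_true)[:len(x)]        # desired signal = unknown FIR output
W, e = lms(x, d, p=4, mu=0.01)
print(np.round(W, 3))                      # close to W_true
```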
Performance of LMS

Lemma: W(n) will converge in the mean toward Wo = Rx^{−1} rdx if
0 < µ < 2/λmax (see p. 507), i.e.:

    E( W(n) ) → Rx^{−1} rdx   when n → ∞

Important remarks:

• The variance of W(n) around its mean is a function of µ.
• µ allows a trade-off between speed of convergence and accuracy of the
estimate.
• A small µ results in higher accuracy but slower convergence.
A faster-converging algorithm

Idea: build a running estimate of the statistics Rx(n), rdx(n), and solve
the Wiener-Hopf equation at each time:

    Rx(n) W(n) = rdx(n)                          (6)

where

    Rx(n)  = Σ_{k=0}^{n} λ^{n−k} X(k)* X(k)^T    (7)
    rdx(n) = Σ_{k=0}^{n} λ^{n−k} d(k) X(k)*      (8)

and λ is the forgetting factor (λ < 1, close to 1).
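A sketch of this "estimate and solve" approach, accumulating the exponentially weighted sums (7)-(8) recursively and re-solving the Wiener-Hopf system at every step. The toy data, the small regularization of Rx and the variable names are assumptions for the example; the point is that this direct version costs a matrix solve per sample, which RLS avoids below.

```python
import numpy as np

# Exponentially weighted estimates of Rx(n), rdx(n) (eqs. 7-8), with the
# Wiener-Hopf system re-solved at every step (O(p^3) per sample).
rng = np.random.default_rng(3)
p, N, lam = 4, 3000, 0.99

W_true = np.array([0.5, -0.3, 0.2, 0.1])           # unknown system (simulation only)
Xmat = rng.standard_normal((N, p))                 # rows are X(n)^T
d = Xmat @ W_true + 0.05 * rng.standard_normal(N)

Rx  = 1e-3 * np.eye(p)          # tiny regularization so Rx is invertible early on
rdx = np.zeros(p)
for n in range(N):
    X = Xmat[n]
    Rx  = lam * Rx + np.outer(np.conj(X), X)       # Rx(n)  = lam Rx(n-1)  + X(n)* X(n)^T
    rdx = lam * rdx + d[n] * np.conj(X)            # rdx(n) = lam rdx(n-1) + d(n) X(n)*
    W = np.linalg.solve(Rx, rdx)                   # solve Rx(n) W(n) = rdx(n)
print(np.round(W, 3))                              # close to W_true
```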
Recursive least-squares (RLS)

To avoid inverting a matrix at each step, one finds a recursive solution
for W(n):

    Rx(n)  = λ Rx(n − 1) + X(n)* X(n)^T          (9)
    rdx(n) = λ rdx(n − 1) + d(n) X(n)*           (10)
    W(n)   = W(n − 1) + ∆W(n − 1)                (11)

Question: How to determine the right correction ∆W(n − 1)?

Answer: Using the matrix inversion lemma (Woodbury’s identity).
Matrix inversion lemma

    (A + u v^H)^{−1} = A^{−1} − (A^{−1} u v^H A^{−1}) / (1 + v^H A^{−1} u)    (12)

We apply it to Rx(n)^{−1} = (λ Rx(n − 1) + X(n)* X(n)^T)^{−1}.

We define P(n) = Rx(n)^{−1}. The M.I.L. is used to update P(n − 1) to P(n)
directly:

    Rx(n)^{−1} = λ^{−1} [ Rx(n − 1)^{−1}
                 − ( λ^{−1} Rx(n − 1)^{−1} X(n)* X(n)^T Rx(n − 1)^{−1} )
                   / ( 1 + λ^{−1} X(n)^T Rx(n − 1)^{−1} X(n)* ) ]             (13)-(14)
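A quick numerical check of the rank-one identity (12); the random test matrix and vectors are invented purely for verification.

```python
import numpy as np

# Verify the rank-one matrix inversion lemma on random data:
# (A + u v^H)^{-1} = A^{-1} - (A^{-1} u v^H A^{-1}) / (1 + v^H A^{-1} u)
rng = np.random.default_rng(4)
p = 5
A = rng.standard_normal((p, p)) + p * np.eye(p)    # well-conditioned test matrix
u = rng.standard_normal(p)
v = rng.standard_normal(p)

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + np.outer(u, v.conj()))
rhs = Ainv - np.outer(Ainv @ u, v.conj() @ Ainv) / (1 + v.conj() @ Ainv @ u)
print(np.allclose(lhs, rhs))                       # True
```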
The RLS algorithm

Initialization:

    W(0) = 0                                     (15)
    P(0) = δ^{−1} I                              (16)

where δ << 1 is a small arbitrary initialization parameter.

For each n = 1, 2, ...:

    Z(n) = P(n − 1) X(n)*                        (17)
    G(n) = Z(n) / ( λ + X(n)^T Z(n) )            (18)
    α(n) = d(n) − W(n − 1)^T X(n)                (19)
    W(n) = W(n − 1) + α(n) G(n)                  (20)
    P(n) = ( P(n − 1) − G(n) Z(n)^H ) / λ        (21)
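A direct transcription of (15)-(21) into Python/NumPy as a sketch. The recursions themselves follow the slides; the toy identification setup, the chosen λ and δ values, and names such as rls and W_true are assumptions for the example.

```python
import numpy as np

def rls(Xmat, d, lam=0.99, delta=1e-3):
    """RLS following eqs. (15)-(21); rows of Xmat are the vectors X(n)^T."""
    N, p = Xmat.shape
    W = np.zeros(p)                               # (15) W(0) = 0
    P = np.eye(p) / delta                         # (16) P(0) = delta^{-1} I
    for n in range(N):
        X = Xmat[n]
        Z = P @ np.conj(X)                        # (17) Z(n) = P(n-1) X(n)*
        G = Z / (lam + X @ Z)                     # (18) gain vector
        alpha = d[n] - W @ X                      # (19) a-priori error
        W = W + alpha * G                         # (20) filter update
        P = (P - np.outer(G, np.conj(Z))) / lam   # (21) inverse-correlation update
    return W

# Toy example: identify a 4-tap system from noisy observations.
rng = np.random.default_rng(5)
p, N = 4, 2000
W_true = np.array([0.5, -0.3, 0.2, 0.1])
Xmat = rng.standard_normal((N, p))
d = Xmat @ W_true + 0.05 * rng.standard_normal(N)
print(np.round(rls(Xmat, d), 3))                  # close to W_true
```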
RLS vs. LMS

Complexity: RLS is more complex because of the matrix multiplications;
LMS is simpler to implement.

Convergence speed: LMS is slower because it depends on the amplitude of
the gradient and on the eigenvalue spread of the correlation matrix. RLS
is faster because it always points at the right solution (it solves the
problem exactly at each step).

Accuracy: In LMS the accuracy is controlled via the step size µ, in RLS
via the forgetting factor λ. In both cases, very high accuracy in the
stationary regime can be obtained at the cost of convergence speed.
Application

The LMS applied to the problem of adaptive beamforming...

To be developed in class.