Extension of GKB-FP algorithm to large-scale general

advertisement
Extension of GKB-FP algorithm to large-scale
general-form Tikhonov regularization
Fermı́n S. Viloche Bazán ∗, Maria C. C. Cunha and Leonardo S. Borges
†
Department of Mathematics, Federal University of Santa Catarina,
88040-900, Florianópolis SC, Brazil
e-mail: fermin@mtm.ufsc.br
Department of Applied Mathematics, IMECC-UNICAMP,
University of Campinas, CP 6065, 13081-970, Campinas SP, Brazil
e-mail:lsbplsb@yahoo.com
Abstract
In a recent paper [24] an algorithm for large-scale Tikhonov regularization in
standard form called GKB-FP was proposed and numerically illustrated. In this
paper, further insight into the convergence properties of this method is provided
and extensions to general-form Tikhonov regularization are introduced. In addition,
as alternative to Tikhonov regularization, a preconditioned LSQR method coupled
with an automatic stopping rule is proposed. Preconditioning seeks to incorporate smoothing properties of the regularization matrix into the computed solution.
Numerical results are reported to illustrate the methods on large-scale problems.
KEY WORDS: Tikhonov regularization, Large-scale problems, Ill-posed problems
1
Introduction
We are concerned with the computation of stable solutions to large-scale discrete illposed problems of the form
x = argminkb − Axk22
(1)
x∈Rn
where the matrix A ∈ Rm×n , m ≥ n, is severely ill-conditioned. Matrices of this type
arise, for example, when discretizing first kind integral equations with smooth kernel,
e.g., in signal processing and image restoration. In applications the vector b represents
the data and is assumed to be of the form b = bexact + e, where e denotes a noise vector
due to measurement or approximation errors, bexact denotes the unknown error-free data,
and xexact = A† bexact (where A† denotes the Moore-Penrose Pseudo Inverse of A) is the
noise-free solution of (1). In these cases, due to the noise in the data, the least squares
solution xLS = A† b is dominated by the noise and regularization methods are required to
∗
†
The work of this author is supported by CNPq, Brazil, grants 308709/2011-0, 477093/2011-6.
This research is supported by FAPESP, Brazil, grant 2009/52193-1.
1
construct stable approximations to xexact . In Tikhonov regularization [1], we replace xLS
by the regularized solution
xλ = argmin kb − Axk22 + λ2 kLxk22 ,
(2)
x∈Rn
where λ > 0 is the regularization parameter and L ∈ Rp×n is referred to as the regularization matrix. When L = In , the n × n identity matrix, the Tikhonov regularization
problem is said to be in standard form, otherwise, it is in general form. Solving (2) is
equivalent to solve the so-called regularized normal equations
(AT A + λ2 LT L)x = AT b,
(3)
whose solution xλ is unique when N (A) ∩ N (L) = {0}. In this paper N (·) denotes
the null space of (·) and R(·) denotes its column space. When A and L are small, the
regularized solution can readily be computed using the GSVD of the pair (A, L), and
good approximate solutions can be obtained provided that λ and L are properly chosen.
There are many Tikhonov parameter choice methods. These include the discrepancy
principle (DP) of Morozov [2], which depends on a priori knowledge of the norm of the
error e, and the so called noise-level free or heuristic parameter choice rules such as the
L-Curve criterion of Hansen and O’Leary [3], the Generalized Cross-Validation (GCV) of
Heath, Golub and Wahba [4] and the algorithmic realization of Regińska’s rule [5] via the
fixed-point (FP) method by Bazán [6], among others.
It is well-known that the regularization parameter selected by DP depends on the norm
of the error kek2 and that the error in the corresponding regularized solution converges
to zero as kek2 → 0 (under certain conditions), see, e.g., [7]. Unfortunately, this is not
the case with heuristic rules for which the regularization parameter is not a function of
the noise level. As a result, heuristic rules are not without difficulties and cannot be
successful in all problems. This is in accordance with a famous result of Bakushinski [8],
which asserts that parameter choice rules that do not depend explicitly on the noise level
should fail, at least for some problems. Nevertheless, noise-level free rules are widely used
in real applications, as for example the L-curve method. For discussions and analyses of
heuristic rules the reader is referred to [7, 9]. A numerical comparison of many heuristic
parameter choice rules can be found in [10], and, recently in [11].
From the practical point of view, all of the above and many other methods from the
literature can be readily implemented using the GSVD when A and L are small or of
moderate size. However, for large-scale problems the GSVD is computationally demanding, and one can use instead iterative methods such as LSQR or CGLS to approximate
xλ provided that the parameter λ is known a priori. Another way to proceed is to use
projection methods based on Lanczos/Arnoldi iterations; these include CGLS/LSQR and
the family of minimum residual methods [12, 13, 14, 15, 16, 17, 18, 19], where the number
of iterations k plays the role of the regularization parameter and where the choice of the
”best” k can be done using the discrepancy principle or L-curve [15, 20]. However, it is
well known that DP is likely to fail if the error norm kek is poorly estimated, and that
the L-curve criterion works well except possibly when the curve is not L-shaped, see e.g.,
Morigi [21] or when the curvature of the L-curve has more than a one local maximum,
see, e.g., Hansen et al. [20] or [22].
The difficulty in determining reliable stopping criteria for iterative methods can be
alleviated by combining them with an inner regularization method, which gives rise to
2
Hybrid methods. For the case L = In , two successful hybrid methods that do not require
knowledge of kek2 and combine projection over the Krylov subspace generated by the
Golub-Kahan bidiagonalization (GKB) method and standard Tikhonov regularization at
each iteration, are W-GCV [23] and GKB-FP [24]. Here the difference is on the projected
problem: while W-GCV uses weighted GCV as parameter choice rule, GKB-FP uses the
FP method. The case L 6= In has received less attention and there are few efficient
methods for large-scale regularization. An interesting approach that uses a joint bidiagonalization (JBD) procedure applied to the pair (A, L) and minimizes the Tikhonov
functional over the generated Krylov subspace is proposed by Kilmer et al. [15]. This
method may be infeasible for large-scale problems because the JBD procedure is computationally demanding. More recently, J. Lampe et. al [25], and Reichel et. al [26],
propose approximate solutions by minimizing the Tikhonov functional over generalized
Krylov subspaces; in both cases the regularization parameter for the projected problem is
determined by the discrepancy principle. A related method which also uses the discrepancy principle as parameter choice rule can be found in [27].
In this paper we assume that no estimate of kek2 is available and concentrate on
extensions of the GKB-FP algorithm to large-scale general-form Tikhonov regularization.
Two approaches are considered. The first approach relies on the observation that the
general-form Tikhonov regularization can be transformed into a Tikhonov problem in
standard form which can be handled efficiently by GKB-FP; therefore our first approach
is to apply GKB-FP to the transformed Tikhonov problem. As for the second approach,
it constructs regularized solutions by minimizing the general-form Tikhonov problem over
the Krylov subspace generated by the GKB algorithm applied to matrix A, as suggested
in [27]. In all cases, the underlying philosophy of GKB-FP is preserved, i.e., the fixed-point
method is always used on the projected problem. Also, to overcome possible difficulties
associated with the Tikhonov regularization parameter selection problem at each iteration,
as done by GKB-FP or W-GCV, we propose an iterative regularization algorithm based
on LSQR applied to the least squares problem min kb̄ − Āx̄k2 , where b̄ and Ā arise from
the transformed Tikhonov problem, coupled with a stopping rule that does not require
knowledge of the norm of the noise kek2 and that can be regarded as a discrete counterpart
of Regińska’s rule. Theoretically, we follow a paper by Hanke and Hansen [13], see also [28],
where it is shown how to construct regularized solutions to (2) via CGLS/LSQR in such
a way that the smoothing properties of L are incorporated into the iterative process, and
hence into the computed solution.
The paper is organized as follows. Section 2 presents an analysis of GKB-FP within
the framework of projection methods which provides valuable insight into the convergence
properties of the algorithm. In Section 3 the extensions of GKB-FP to general-form
Tikhonov regularization are presented and discussed. Our alternative to general-form
Tikhonov regularization comes in Section 4, and Section 5 contains numerical examples
devoted to illustrate the potential of the methods. Conclusions are in Section 6.
2
GKB-FP as a projection method
The purpose of this section is to provide further insight into the convergence properties
of GKB-FP. Recall that after k < n steps, the GKB algorithm applied to A with initial
vector b/kbk2 yields two matrices Uk+1 = [u1 , . . . , uk+1 ] ∈ Rm×(k+1) and Vk = [v1 , . . . , vk ] ∈
3
Rn×k with orthonormal columns, and a lower bidiagonal matrix Bk ∈ R(k+1)×k ,

such that
α1
 β2


Bk = 




α2
β3
..
.
..
.
αk
βk+1



,



β1 Uk+1 e1 = b = β1 u1 ,
AVk = Uk+1 Bk ,
AT Uk+1 = Vk BkT + αk+1 vk+1 eTk+1 ,
(4)
(5)
(6)
(7)
where ei denotes the i-th unit vector in Rk+1 [29]. The columns of Vk provide an orthonormal basis for the generated Krylov subspace Kk (AT A, AT b), which is an excellent
choice for use when solving discrete ill-posed problems [30]. GKB-FP combines the GKB
algorithm with standard form Tikhonov regularization in Kk (AT A, AT b), choosing the regularization parameter on the projected problem via the FP method [6]. The FP method
relies on an earlier work of Regińska [5], where the regularization parameter is chosen as
a minimizer of the function
Ψ(λ) = krλ k22 kxλ k2µ
2 , µ > 0,
(8)
where rλ = b − Axλ . Regińska proved that if the curvature of the L-curve is maximized
at λ = λ∗ , and if the slope of the L-curve corresponding to λ = λ∗ is −1/µ, then Ψ(λ) is
minimized at λ = λ∗ . However, no method to compute the minimum was done afterward.
More recently, Bazán [6] investigated the issue and, with a proper choice of µ, showed that
the minimum can be calculated via fixed-point iterations, giving rise to the FP method.
The key idea behind this parameter choice rule can be explained as follows. Let
Rλ = (AT A + λ2 In )−1 AT . Then the error in the regularized solution xλ can be expressed
as xexact − xλ = xexact − Rλ bexact − Rλ e, and therefore
kxexact − xλ k2 ≤ kxexact − Rλ bexact k2 + kRλ ek2 ≡ E1 (λ) + E2 (λ).
(9)
The term E1 (λ) represents the regularization error and increases with λ. The second
term denotes the error caused by the noise and decreases with λ, see, e.g., [31]. Thus,
in order to obtain a small error we should balance the terms as in this event the bound
is approximately minimized. However, neither E1 (λ) nor E2 (λ) is available, hence, an
alternative is to minimize a model for the bound. The motivation for using Ψ(λ) as a
model is that minimizing log(Ψ(λ)) we minimize the sum of competing terms, since kxλ k2
decreases with λ while krλ k2 increases, as illustrated in Figure 1.
The FP method can be summarized as follows:
• Given an initial guess, FP takes µ = 1 as a default value and proceeds by computing
the iterates
√
λj+1 = φ(λj ), j ≥ 0, where φ(λ) = µkrλ k2 /kxλ k2 , λ > 0,
(10)
until the largest convex fixed-point of φ is reached.
4
0.79
10
E (λ)
1
log(Ψ(λ))
1
10
E (λ)
2
E (λ)+E (λ)
1
2
0.78
10
0
10
0.77
10
0
0.01
0.02
0.03
0
0.01
0.02
0.03
Figure 1: Bound (9) and Ψ(λ) for shaw problem from [32], n = 512 and b = bexact +e where
e is white noise with kek2 = 0.005kbexact k2 . The small circle points out the location of
λoptimal = argmin{E1 (λ) + E2 (λ)} = 0.0102 and of λFP = argmin{Ψ(λ), µ = 1} = 0.0116.
The relative errors in xλoptimal and xλFP are 5.34% and 5.36% respectively.
• If µ = 1 does not work, µ is adjusted and the iterations restart; see [6, 33] for
details.
The GKB-FP algorithm computes approximations to the sought fixed-point by com√
(k)
(k)
puting a finite sequence of fixed-points of functions φ(k) (λ) = µkrλ k2 /kxλ k2 , where
(k)
xλ solves the constrained problem
(k)
xλ =
(k)
argmin
{kAx − bk22 + λ2 kxk22 }
(11)
x∈Kk (AT A,AT b)
(k)
and rλ = b − Axλ . Based on (5)-(7), it is straightforward to prove that
(k)
(k)
xλ = V k yλ ,
(k)
with yλ = argmin{kBk y − β1 e1 k22 + λ2 kyk22 },
(12)
y∈Rk
and the regularized solution norm and the corresponding residual norm satisfy
(k)
(k)
kxλ k2 = kyλ k2 ,
(k)
(k)
and krλ k2 = kBk yλ − β1 e1 k2 .
(13)
The evaluation of φ(k) (λ) for each λ can be done approximately in O(k) arithmetic
operations, which corresponds to the cost of solving the projected problem (12). Numerical
examples on large-scale problems reported in [24] show that the largest convex fixed-point
of φ(λ) is quickly captured. For a detailed description of GKB-FP, see [24] again; here
we summarize the main steps of GKB-FP, since, as we will see later, they will be also
followed by the methods proposed in this work.
GKB-FP Algorithm:
1. Perform p0 > 1 GKB steps applied to A with initial vector b/kbk2 .
2. For k ≥ p0 compute the fixed-point λ(k)
of φ(k) (λ) until a termination criterion is
FP
satisfied.
(k)
3. Once convergence is achieved compute xλ using (12).
5
The value of p0 can be freely chosen by the user. In our numerical experiments we take:
p0 = 10 for small problems, p0 = 15 for relatively large problems, and p0 = 20 for larger
problems. As for the initial guess of λ when computing the first fixed point at step k = p0 ,
it is set equal to 10−4 . For further details about the initial guess, see [6, 24].
To provide further insight into the convergence properties of GKB-FP, we will concentrate on the question of how well xλ can be approximated by “projecting” the large-scale
problem onto a subspace of small dimension. More specifically, assume that {Vk }k≥1 is a
family of k dimensional subspaces such that V1 ⊂ V2 ⊂ · · · ⊂ Vk ⊂ Vk+1 ⊂ · · · , and for
(k)
k > 1 consider approximations xλ defined by
(k)
xλ = argmin{kAx − bk22 + λ2 kLxk22 }.
(14)
x∈Vk
(k)
Then our goal is to bound the error kxλ − xλ k2 for the case where the regularization
parameter is selected by the FP method. To this end we first introduce some notation
that will be used later. Let the reduced singular value decomposition of A be given by
A = UΣVT ,
(15)
where U = [u1 , . . . , un ] ∈ Rm×n and V = [v1 , . . . , vn ] ∈ Rn×n have orthonormal columns
and Σ = diag(σ1 , . . . , σn ) is a diagonal matrix with σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0. In addition,
let Uk , Vk and Sk be defined by
Uk = [u1 , . . . , uk ],
Sk = span{v1 , . . . , vk },
Vk = [v1 , . . . , vk ],
and notice that if L = In , then the regularized Tikhonov solution xλ can be written as
n
X
σi (uTi b)
vi .
xλ =
σ 2 + λ2
i=1 i
(16)
We start with the following technical result.
Lemma 2.1. Let Pk denote the orthogonal projection on Vk . Define γk = kA − APk k2
and δk = kL − LPk k2 . Then ∀λ > 0 there holds,
r
γ2
(k)
kLxλ − Lxλ k2 ≤ δk2 + k2 k(In − Pk )xλ k2 .
(17)
λ
Proof : Since N (A) ∩ N (L) = {0}, it follows that the function < u, v ># =< Au, Av >
+λ2 < Lu, Lv >, where < ·, · > denotes the usual inner product in Rn , defines an inner
product in Rn . Note that (3) implies < Axλ − b, Ax > +λ2 < Lxλ , Lx >= 0, ∀x ∈ Rn and
(k)
that < Axλ − b, Aϕ > +λ2 < Lxλ , Lϕ >= 0, ∀ϕ ∈ Vk . Combining these we have
(k)
(k)
< xλ − xλ , ϕ ># = 0,
∀ϕ ∈ Vk .
(18)
Therefore, xλ = Pk xλ , where Pk denotes the orthogonal projector onto the subspace Vk
with respect to the inner product < ·, · ># and induced norm k · k# . Then
(k)
kxλ − xλ k2# =
≤
=
≤
=
kxλ − Pk xλ k2#
kxλ − Pk xλ k2#
.
kA(In − Pk )xλ )k22 + λ2 kL(In − Pk )xλ k22
2
2
2
2
2
kA(In − Pk )k2 k(In − Pk )xλ k2 + λ kL(In − Pk )k2 k(In − Pk )xλ k2
(γk2 + λ2 δk2 )k(In − Pk )xλ k22 .
6
(k)
(k)
Now it suffices to use the inequality λ2 kL(xλ − xλ )k22 ≤ kxλ − xλ k2# and the proof is
complete.
We are in a position to state the main result of the section.
Theorem 2.1. Let Pk and Pk be, respectively, the orthogonal projector onto Vk and Sk ,
and let Ωk be the subspace angle between Vk and Sk . Then for arbitrary λ > 0 there holds
k(In − Pk )xλ k ≤ sin(Ωk )kxλ k2 + γk
kb̃k k2
,
λ2
(19)
where b̃k = b − Uk UTk b. As a consequence, whenever L = In and the regularization parameter is selected by the fixed-point method, which is denoted by λFP , there holds
s
"
#
(k)
2
kxλFP − xλ k2
γ
k
b̃
k
γ
k 2
k
FP
≤ 1 + 2k sin(Ωk ) +
,
(20)
kxλFP k2
λFP
λFP krλFP k2
Proof : The regularized solution xλ can be written as xλ = Pk xλ +(In −Pk )xλ . Multiplying
on both sides of this equality by (In − Pk ) and taking norms
k(In − Pk )xλ k ≤ k(In − Pk )Pk xλ k2 + k(I − Pk )(In − Pk )xλ k2
(21)
The first term can be bounded as
k(In − Pk )Pk xλ k2 ≤ sin(Ωk )kxλ k2 ,
(22)
where we used the fact that k(I − Pk )Pk k2 = sin(Ωk ), see [34, Theorem 2.6.1, p. 76]. To
bound the second term, observe that from (16)
(In − Pk )xλ =
n
X
σi (uTi b)
v = (AT A + λ2 In )−1 AT b̃k = AT (AAT + λ2 Im )−1 b̃k ,
2
2 i
σ
+
λ
i=k+1 i
where we used the SVD of A to obtain (AT A + λ2 In )−1 AT = AT (AAT + λ2 Im )−1 . Thus,
k(In − Pk )(In − Pk )xλ k2 = k(In − Pk )AT (AAT + λ2 Im )−1 b̃k k2 ≤ γk
kb̃k k2
.
λ2
(23)
The inequality (19) follows by replacing (22) and (23) in (21).
On the other hand, if the regularization parameter λ is selected by the fixed-point
method, which means, λFP = krλFP k2 /kxλFP k2 , then (19) becomes
!
k(In − Pk )xλFP k
γk kb̃k k2
,
(24)
≤ sin(Ωk ) +
kxλFP k2
λFP krλFP k2
The second part of the proof concludes using this inequality in (17).
Theorem 2.1 shows that if the subspace Vk approximates Sk well, in which case sin(Ωk )
(k)
is small, then the solution of the projected problem (14), xλ , will be a good approximaFP
tion to xλFP provided that the ratio γk /λFP is sufficiently small. We shall illustrate this
by computing the relative error k(In − Pℓ )xλFP k2 /kxλFP k2 and its bound (24) for the case
7
where Vℓ coincides with Sℓ , as well as the bound for the case where we take approximations
to Sℓ instead. To this end, we consider two test problems from [32] and for fixed k, take
e k = Vk v, where
Seℓ , ℓ = 1, . . . , k, to be the subspace spanned by the first ℓ columns of V
k×k
Vk is from (6) and v ∈ R
is the matrix of right singular vectors of Bk also introduced
in (6). It is known that for the first values of ℓ the subspaces Seℓ approximate Sℓ well
and that the quality of the approximation deteriorates as ℓ approaches k, see, e.g., [30,
Chapter 6]. Computed quantities for k = 80 are displayed in Figure 2.
5
5
10
10
LHS
GKB
SVD
LHS
GKB
SVD
0
0
10
10
20
40
60
80
20
40
60
80
Figure 2: Relative error and estimates (24) for phillips (left) and heat (right) test problems
from [32], for n = 512 and b = bexact + e where e is white noise such that kek2 =
0.005kbexact k2 . Here, LHS stands for left-hand side of (24).
We now turn to describe the quantity γk as a function of k in connection with the GKB
process. To this end we shall assume exact arithmetic and that GKB runs to completion,
i.e., we are able to run n GKB steps without interruption, which is reasonable for discrete
ill-posed problems since the singular values decay gradually to zero without any gap.
Theorem 2.2. Assume that the GKB algorithm applied to A runs to completion, and let
bk be the bidiagonal matrices defined respectively by
Gk and G

αk+1
 βk+2


Gk = 


αk+2
..
.
Then for k = 1, . . . , n − 1 we have γk =

..
.
βn
αn
βn+1
kGk k2 ,
bk k2 ,
kG
 
bk
G

=

βn+1 eTn−k

(25)
if m > n,
, and γk+1 ≤ γk .
if m = n,
Proof : Assume that m > n. If the GKB algorithm runs to completion, from (6) it
follows that A = Un+1 Bn Vn , where Un+1 ∈ Rm×(n+1) and Vn ∈ Rn×n have orthonormal
columns, and Bn ∈ R(n+1)×n is lower bidiagonal. But matrix A can be rewritten as
B
V T,
(26)
A=U
0
where U = [Un+1 U ⊥ ] ∈ Rm×m with U ⊥ chosen so that U is orthogonal, V = Vn , B = Bn ,
and 0 ∈ R(m−n−1)×n is a zero matrix. Then Pk = Vk VkT , APk = AVk VkT = Uk+1 Bk VkT by
(5), and A − APk = U (B − U T Uk+1 Bk VkT V )V T . Hence
kA − APk k2 = kB − U T Uk+1 Bk VkT V k2 = kGk k2 ,
8
which proves the first statement of the theorem.
bn V,
Now if m = n and the GKB algorithm runs n steps, then we have that A = U B
n×n
n×n
bn ∈ R
where U, V ∈ R
have orthonormal columns, and B
defined by

α1
 β2
bn = 
B


α2
..
.
..

.
βn
αn


,

(27)
and the proof proceeds in the same way as above.
bk+1 ) is a
Finally, the second statement is a consequence of the fact that Gk+1 (resp. G
bk ), which completes the proof.
submatrix of Gk (resp. G
Remark 1: The number γk can be approximated using the 2-norm of the first column of
Gk , that is, if e1 denotes the unit vector in Rn−k , then
q
2
2
γk ' ||Gk e1 k2 = αk+1
.
(28)
+ βk+2
Remark 2: Provided that rank(APk ) = k, we have
γk ≥ σk+1 ,
(29)
the equality being attained when Vk is spanned by the first k right singular vectors of A.
In this case APk is the matrix of rank k that is closest to A in the 2-norm sense, see, e.g.,
[34, Thm. 2.5.3]. In practice, q
because Vk carries relevant information on the first k right
singular vectors, both γk and
2
2
αk+1
+ βk+2
approximate σk+1 , with γk decreasing as fast
as σk+1 , see Figure 3. As a result, whenever the fixed point of φ(k) (λ), λ(k)
, is close to
FP
λFP , the bound (17) becomes small because γk decreases quickly near zero. This explains
why GKB-FP often converges rapidly, as reported in [24].
2
2
10
10
σk+1
γk
0
10
γk
0
10
||Gke1||2
−2
||Gke1||2
−2
10
10
−4
−4
10
0
σk+1
10
20
40
60
80
0
20
40
60
80
Figure 3: Estimates (28) and (29) for phillips (left) and heat (right) test problems from
[32], for n = 512 and b = bexact + e where e is white noise such that kek2 = 0.005kbexact k2 .
3
Extensions of GKB-FP
We shall now consider extensions of GKB-FP to general-form Tikhonov regularization, concentrating, in particular, on problems that involve rank-deficient matrices L with
9
much more rows than columns, as often seen, e.g., in deblurring problems. Two welldistinguished approaches shall be considered. One approach is based on the fact that we
can apply GKB-FP to a related transformed problem. The second approach constructs
regularized solutions by solving the constrained problem (14).
3.1
Extensions of GKB-FP via standard-form transformation
Theoretically, we can always transform the general-form Tikhonov problem (2) into a
problem in standard form
x̄λ = argmin kb̄ − Āx̄k22 + λ2 kx̄k22 .
(30)
x̄∈Rp
If L is invertible we use y = Lx ⇔ x = L−1 y, b̄ = b, and Ā = AL−1 . When L is not
invertible the transformation takes the form
xλ = L†A x̄λ + xN ,
Ā = AL†A ,
b̄ = b − AxN ,
(31)
where xN lies in the null space of L and
† †
†
†
LA = I n − A I n − L L A L ,
is the A-weighted generalized inverse of L. If we know a full rank matrix W whose columns
span the null space N (L), then xN = W(AW )† b, and the A-weighted generalized inverse
of L reduces to L†A = In − W (AW )† A L† [28]. Thus, for the GKB-FP algorithm to be
successfully applied to (30), the matrix Ā must not be explicitly formed and the matrixvector products with Ā and ĀT must be performed as efficiently as possible. Is is clear
†
that these matrix-vector products depend on the way the products with L† and LT are
performed, and that computation of the product L† u requires determining the minimum
2-norm least squares solution of the problem
tLS = argmin kLt − uk2
(32)
†
as quickly and efficiently as possible; the same observations holds for LT v. Searching for
this efficiency, one can try iterative methods such as CGLS or LSQR and its subspace
preconditioned version [16]. Unfortunately, our computational experience with these techniques when kLxk2 is a Sobolev norm has been unsatisfactory due to the slow convergence of the iterates; however, for applications involving sparse regularization matrices
e.g., banded regularization matrices with small bandwidth, among others, the following
approaches can be considered:
3.1.1
LU-based approach
If L is assumed to be rank deficient with rank(L) = q < min{p, n}, we can determine
permutation matrices PL ∈ Rp×p and QL ∈ Rn×n such that
PL LQL = L̂Û
10
(33)
where L̂ ∈ Rp×q is “unit lower triangular”, Û ∈ Rq×n is “upper triangular”, and both have
rank q. Thus the linear squares problem (32) requires the minimum 2-norm least squares
solutions of the full-rank subproblems:
yLS = argmin kL̂y − PL uk2 ,
and tLS = argmin kÛ t − yLS k2 .
(34)
Once tLS is determined, we then get t = L† u = QL tLS . The approach becomes attractive
specially when L is banded with small bandwidth; in this case the two above subproblems
can be handled efficiently [28, 30], an important condition for GKB-FP to work well.
3.1.2
LD -based approach
In this case we consider regularization matrices L of the form
L=
Iñ ⊗ Ld1
Ld2 ⊗ Im̃
with Ld1 ∈ R(ñ−d1 )×ñ , and Ld2 ∈ R(m̃−d2 )×m̃ . The key idea is that if we use the SVDs of
the small matrices Ld1 and Ld2 : Ld1 = Ud1 Σd1 VdT1 , Ld2 = Ud2 Σd2 VdT2 , then the smoothing
seminorm satisfies the property
kLxk2 = kLD xk2 ,
with LD = D(Vd1 ⊗ Vd2 )T ,
(35)
where D is a nonnegative diagonal such that [14]
D2 = ΣTd1 Σd1 ⊗ Im + In ⊗ ΣTd2 Σd2 .
(36)
The null space of LD , N (LD ), is spanned by the columns of the matrix Vd1 ⊗Vd2 associated
with the zero entries of D, see [14] again. As a consequence, we can now solve the standard
Tikhonov problem (30) associated with
(37)
xλ = argmin kb − Axk22 + λ2 kLD xk22 ,
x∈Rn
taking advantage of the fact that Vd1 ⊗ Vd2 is orthogonal; in this case the products L†D u
T
and L†D v required in (32) can be performed efficiently as
L†D u = (Vd1 ⊗ Vd2 )D† u,
T
L†D v = D† (Vd1 ⊗ Vd2 )T v.
(38)
The main advantage here is that the cost of the subproblem (32) reduces essentially to a
matrix-vector product involving the Kronecker product Vd1 ⊗ Vd2 , which is important for
the efficiency of GKB-FP. Further savings are possible when the matrix A is a Kronecker
product of the form A1 ⊗ A2 , as seen in image processing and numerical analysis. To see
how this can be done, recall that Vd1 ⊗ Vd2 is orthogonal and consider the transformation
x̌ = (Vd1 ⊗ Vd2 )T x.
(39)
Then (37) reduces to
x̌λ = argmin kb − Ǎx̌k22 + λ2 kDx̌k22 ,
x̌∈Rn
with Ǎ = (A1 Vd1 ) ⊗ (A2 Vd2 ),
(40)
where only the matrices A1 Vd1 and A2 Vd2 need to be stored. Now it is evident that
the Tikhonov problem (40) can be solved much more efficiently than the one in (37).
Numerical results that illustrate the efficiency of this approach on deblurring problems
are postponed to the next section.
11
3.2
PROJ-L: GKB-FP free of standard-form transformation
We now consider regularized solutions obtained by minimizing the general-form Tikhonov
functional over the subspace Kk (AT A, AT b), as suggested in [27]. Therefore the approximate solution is now determined as
(k)
(k)
xλ = V k yλ ,
(k)
yλ = argmin{kAVk y − bk22 + λ2 kLVk yk22 }.
(41)
y∈Rk
Note that if we use the QR factorization of the product LVk , LVk = Qk Rk , using (5)-(6)
the least squares problem in (41) reduces to
(k)
yλ = argmin{kBk y − β1 e1 k22 + λ2 kRk yk22 },
(42)
y∈Rk
which can be computed efficiently in several ways, e.g., by a direct method or by first
transforming the stacked matrix [BkT λRkT ]T to upper triangular form, as done when
implementing GKB-FP [24]. Then it follows that
(k)
(k)
kLxλ k = kRk yλ k2 ,
(k)
(k)
krλ k2 = kBk yλ − β1 e1 k,
(43)
and the function φ(k) (λ) associated with the projected problem is
(k)
φ(k) (λ) =
√ kBk yλ − β1 e1 k2
.
µ
(k)
kRk yλ k2
(44)
Unlike the approach in [27] where the Tikhonov regularization parameter is determined
by the discrepancy principle, our proposal denoted by PROJ-L is to follow the same ideas
as GKB-FP. That is, for chosen p0 > 1 and k ≥ p0 , we determine the largest convex
fixed-point λ(k)
of φ(k) (λ), repeating the process until a stopping criterion is satisfied.
FP
Numerical examples have shown that the largest fixed-point of φ(λ) associated with the
large-scale problem is captured in a relatively small number of GKB steps.
To make our proposal computationally feasible, the following aspects must be considered
• the initial guess of the fixed-point method on the projected problem at step k + 1
is taken to be the fixed-point λ(k)
and
FP
• the QR factorization LVk = Qk Rk is calculated only once at step k = p0 and is
updated in subsequent steps.
Algorithms for updating the QR factorization can be found in [34, Chapter 12].
4
Alternative to Tikhonov regularization: smoothed
preconditioned LSQR
We have seen that GKB-FP requires the projected problem to be solved repeatedly
in order to calculate solution and residual norms for distinct values of the regularization
12
parameter λ. This may be expensive in connection with large-scale problems. An alternative is to use iterative regularization in which the number of iterations plays the role of
the regularization parameter. Methods in this class include CGLS/LSQR, GMRES, MINRES, and Landweber iterations, among many others, see., e.g., [14, 15, 16, 18, 19, 30, 35].
While the regularizing properties of these methods are reasonably well understood, less
clear is the situation when the task is to determine a proper number of iterations without
a priori knowledge about the norm of the noise. The purpose of this section is to show how
to construct regularized solutions via a version of LSQR which incorporates the smoothing properties of the regularization matrix L into the iterative process and automatically
stops the iterations without requiring the norm of the noise as input data.
4.1
Stopping rule for LSQR
Recall that LSQR constructs a sequence of approximations for the solution of (1) by
minimizing the residual over the Krylov subspace Kk (AT A, AT b). That is, the LSQR
approximations are defined by
xk =
argmin
x∈Kk (AT A,AT b)
kAx − bk2 ,
(45)
and can be expressed as
xk = V k yk ,
yk = argmin kBk y − β1 e1 k2 ,
(46)
y∈Rk
where Bk and Vk are from (4)-(6) [17]. To describe our stopping rule, we first consider
a truncation criterion for the method of truncated SVD (TSVD) solutions. Using the
notation introduced in (15) the TSVD solution xk is defined as
xk =
A†k b
=
k
X
uTj b
j=1
σj
vj ,
(47)
P
where Ak = kj=1 σj uj vjT and k is the truncation parameter. We shall assume that the
noise free data bexact satisfies the discrete Picard condition (DPC) [36], i.e., the coefficients
|uTj bexact |, on the average, decay to zero faster than the singular values, and that the noise
is Gaussian zero mean. These assumptions imply that there exists a integer k ∗ such that
|uTj b| = |uTj bexact + uTj e| ≈ |uTj e| ≈ constant, for j > k ∗ .
(48)
We now note that an error estimate for xk can be expressed as
kxk − xexact k ≤ kA†k bexact − xexact k2 + kA†k ek2 ≡ E1 (k) + E2 (k),
with
E1 (k) =
n
X
|uTj bexact |2
σj2
j=k+1
!1/2
,
E2 (k) =
k
X
|uTj e|2
j=1
σj2
!1/2
.
(49)
(50)
The first term, called the regularization error, decreases with k and can be small when
k is large. The second term measures the noise magnification error; it increases with k
13
and can be large for σj ≈ 0. Thus the choice of the truncation parameter requires a
balance between these two errors in order to make the overall error small. A closer look
at the two types of errors reveals that for k > k ∗ , the error E2 (k) increases dramatically
while E1 (k) remains under control due to the DPC, and the overall error should not be
minimized for k > k ∗ . Conversely, for k < k ∗ , E1 (k) increases regularly with k while E2 (k)
remains under control since the singular values σj dominate the coefficients |uTj e|; again
the overall error is not minimized. Therefore the error estimate should be minimized at
k = k ∗ . However, neither E1 (k) nor E2 (k) is available and k ∗ must be estimated by other
means. We propose to do this by minimizing the product
Ψk = krk k2 kxk k2 k ≥ 1,
(51)
P
where rk = b−Axk = nj=k+1 (uTj b)uj +b⊥ with b⊥ = (Im −UUT )b; this can be explained as
follows. Note from (48) that if k > k ∗ , the residual norm decreases approximately linearly
while the solution norm increases dramatically as 0 ≈ σk < |uTk e|. Thus Ψk cannot be
minimized for k > k ∗ . On the other hand, if k < k ∗ , the residual norm gets relatively
large since the coefficients |uTk b| are large, while the solution norm increases slowly with k.
Hence Ψk cannot be minimized for k < k ∗ . Thus, the sequence Ψk should be minimized
at k = k ∗ . We can thus conclude that a good choice of the truncation parameter for
TSVD is the minimizer of Ψk . Our experience is that this truncation criterion produces
a regularization parameter that very often coincides with the optimal parameter, the
minimizer of the relative error REk = kxexact − xk k2 /kxexact k2 , as seen in Fig. 4.
2
Ψk
E1(k)+E2(k)
10
REk
2
10
0
10
0
10
0
0
5
10
10
0
−2
5
10
10
0
5
10
Figure 4: Errors in xk , Ψk and relative errors for shaw test problem using the same data
as in with Fig. 1. The optimal relative error is 4.98% and reached at k ∗ = 7 = argmin Ψk .
The reader should compare this result with that of Tikhonov regularization, see Fig. 1
However, the SVD is infeasible for large-scale problems and the TSVD method may
not be of practical utility. As an alternative to TSVD for large-scale problems, we propose
to use LSQR coupled with a stopping rule defined similarly as the truncation criterion for
TSVD above. Therefore, our stopping rule for LSQR selects as stopping index the first
integer b
k satisfying :
b
k = argmin Ψk ,
Ψk = kb − Axk k2 kxk k2 .
(52)
The choice of the stopping rule can be supported as follows. First, the residual and
solution LSQR norms behave monotonically like the residual and solution TSVD norms,
i.e., while kb − Axk k2 decreases with k, kxk k2 increases, and second, the LSQR iterate xk
lives in the Krylov subspace Kk (AT A, AT b) which very often carries relevant information
14
on the dominant k right singular vectors of A [30], in which case xk approximates the k-th
TSVD solution well, and a similar observation applies to the sequences Ψk for TSVD and
LSQR, respectively, as we see in Figure 5.
Corner at k=7
4
10
Ψ (SVD)
L−Curve (LSQR)
Ψ (LSQR)
3
10
||xk||
RE (SVD)
k
k
2
10
RE (LSQR)
k
k
2
10
0
2
10
10
1
10
0
0
10
10
0
−2
5
10
10
0
5
10
||b−Axk||
Figure 5: L-curve, sequences Ψk associated with TSVD and LSQR for shaw test problem
with the same data of the previous figure and error histories of a few iterates.
From the practical point of view, our proposal for determining the stopping index b
k is
to evaluate the finite forward differences
∇Ψk = Ψk+1 − Ψk ,
k ≥ 1,
(53)
and then select the first index k for which ∇Ψk changes sign. More specifically, our
proposal is to select the first k such that ∇Ψk−1 ≤ 0 and ∇Ψk ≥ 0. The main feature of
this stopping rule is that only b
k+1 GKB steps are required to construct the approximation
xbk to the noise free solution of (1). All our numerical results to be presented in the sequel
are obtained with this stopping rule.
The following result bounds the error in the LSQR iterate xk as an approximation to
the k-th TSVD solution.
Theorem 4.1. Assume that the GKB algorithm runs to completion. Then the relative
distance between the LSQR and the TSVD solution can be bounded as
1
kxk − xk k2
≤
(Φk + γk ) ,
kxk k2
σk
(54)
where Φk = krk k2 /kxk k2 , and γk defined in the previous section.
For the proof of Theorem 4.1 we require a technical result.
Lemma 4.1. Let Ωk be the angle between the subspaces spanned by the k first right singular
vectors of A and the Krylov subspace Kk . Then sin Ωk ≤ γk /σk .
Proof: As in the proof of Theorem 2.2, we first assume m > n. In this case, after n GKB
steps matrix A can be decomposed as
A = Un+1 Bn+1 VnT ,
15
(55)
in which, for convenience of the proof, we write Un+1 = [Uk+1 U ⊥ ], Vn = [Vk V ⊥ ], and
k
k n −
Bk Ck
k+1
with Bk defined in (4). The other matrices, Ck , Dk ,
Bn+1 =
m−k−1
Dk F k
Ck
⊥
⊥
.
and Fk are clear from the context. Now by (55) we have AV = [Uk+1 U ]
Fk
Multiplying this equality with UTk and using the fact that UTk A = Σk VkT , where Uk
and Vk have k columns and
A, we get
of Σk contains the k largest singular values
Ck
T
⊥
Ck /σk (A),
.
Taking
norms
we
obtain
sin
Ω
≤
VkT V ⊥ = Σ−1
U
[U
U
]
k
k+1
k
k
Fk Fk
2
T ⊥
where we have used that kVk V k2 = sin Ωk , see, e.g., [34]. Thus, (4.1) follows because
the norm in this expression coincides with γk .
bn V T , with B
bn defined in (27), and the proof follows analogously.
If m = n then A = Un B
n
Proof of Theorem 4.1: Note that, based on (5)-(6), the LSQR iterate xk = Vk yk and
the associate residual rk = b − Axk satisfy
T
xk = Vk Bk† Uk+1
b,
T
Bk† Uk+1
rk = 0.
(56)
Then the error in xk with respect to xk can be written as
T
xk − xk = A†k b − Vk Bk† Uk+1
b
T
= (A†k − Vk Bk† Uk+1
)(rk + Axk )
T
= A†k rk + (A†k A − Vk Bk† Uk+1
A)xk .
But
A†k A = A†k Ak ,
T
T
Vk Bk† Uk+1
Axk = Vk Bk† Uk+1
AVk VkT xk = Vk Bk† Bk VkT xk = Vk VkT xk = xk ,
therefore
xk − xk = A†k rk − (In − A†k Ak )xk = A†k rk − (In − Pk )Pk xk .
Hence
kxk − xk k ≤ kA†k k2 krk k2 + k(In − Pk )Pk k2 kxk k2 ≤ kA†k k2 krk k2 + sin Ωk kxk k2 ,
where we have used the fact that k(In − Pk )Pk k2 = sin Ωk (see, Golub, [34, Theorem 2.6.1,
p. 76]). The assertion of the theorem follows on using Lemma (4.1).
Theorem 4.1 states that if the information contained in Kk (AT A, AT b) is rich enough
so that γk ≪ σk and if the sum γk + Φk is small compared to σk , then the LSQR iterate
xk will be close to the k-th TSVD solution. Theorem 4.1 can also be used to bound
the error in xk with respect to xexact by using the triangular inequality and bounds on
kxk − xexact k which require smoothness conditions on xexact . This is quite involved and is
not considered in the present paper.
We now turn to the stopping rule (52). Note that it is nothing more than a discrete
counterpart of the parameter choice rule (8) for µ = 1 which looks for a corner of the
16
continuous Tikhonov L-curve; hence, it should come as no surprise to see the minimizer
of Ψk closely related to a point on the discrete L-curve
(log kb − Axk k2 , log kxk k2 ),
k = 1, . . . , q,
(57)
located near the corner of the L-shaped region, see Fig. 5. Thus, the decision to stop
LSQR in connection with discrete ill-posed problems can, in principle, be also managed by
locating the corner of discrete L-curves for which several algorithms exist. Corner finding
methods include a method based on spline curve fitting by Hansen and O’Leary [3], the
triangle method by Castellanos et al. [37], a method by Rodriguez and Theis [38], and
an adaptive algorithm referred to as the “pruning algorithm” by Hansen et al. [20]. The
pruning algorithm advocates that to overcome difficulties of its predecessors, the corner
must be determined by evaluating the overall behavior of the L-curve. The effectiveness
of this approach and its capability to determine the “best” corner of discrete L-curves is
illustrated numerically in [20]. However, finding the corner using a limited sequence of
points is not an easy task, and the existing algorithms are not without difficulties, see,
e.g., Hansen [30, p. 190] for discussions on shortcomings, Hansen et al. [20] for a multiple
corner case, and Salehi Ravesh et al. [22] for an application of the L-curve criterion to
quantification of pulmonary microcirculation.
To illustrate the performance of the pruning algorithm in connection with LSQR
(with full reorthogonalization), we use gravity, heat, foxgood and shaw test problems
from [32], moler (with α = 0.5), lotkin, prolate and hilbert test problems from Matlab’s
“matrix gallery”. In all cases we consider coefficient matrices of size 1024 × 1024 and
distinct right-hand sides defined by b = Axexact + e, where e is generated by the Matlab
function randn with the state value set to 0 and with three distinct noise levels (NL)
NL = kek2 /kAxexact k2 = 10−4 , 10−3 , 10−2 . To ensure that the overall behavior of the
L-curve is contained in the data, we take q = 120 points. The corner of the L-curve,
denoted by kLC , is determined by using the Matlab code corner from [32], and the relative
error in xkLC is denoted by ELC . For comparison, the stopping index b
k determined by
minimizing Ψk , the relative error in xbk denoted by EΨ , optimal parameters and optimal
errors, denoted by kopt and Eopt , respectively, are also computed.
Average relative errors of 20 realizations as well as the minimum/maximum k determined by the pruning algorithm, the stopping rule (52) and the optimal one along the
realizations, are all displayed in Table 1.
Note that, except for the fact that the pruning algorithm failed solving moler and lotkin
test problems for NL = 10−4 and NL = 10−2 , respectively, the quality of the approximate
solutions produced by this algorithm and the stopping rule (52) remains comparable in
the other cases and other test problems. In particular, for the noise level NL = 10−4 , we
note that while the pruning algorithm produced approximate solutions of poor quality,
the solutions produced by the stopping rule (52) remained within tolerable bounds. The
pruning algorithm failed constructing reasonable approximate solutions because the corner
of the L-curve was not correctly identified several times, see Fig. 6 (left). Fig. 6 (center)
shows the discrete L-curve of the first realization with the corner determined by the
pruning algorithm (corresponding to kLC = 107) marked by (in red) and with the “true”
corner located at k = 19 marked by ◦ (in red). A similar observation applies for lotkin
test problem.
To learn more about the properties of the pruning algorithm, we investigated the
17
gravity
heat
foxgood
shaw
moler
lotkin
prolate
hilbert
gravity
heat
foxgood
shaw
moler
lotkin
prolate
hilbert
gravity
heat
foxgood
shaw
moler
lotkin
prolate
hilbert
kLC
13(19)
46(52)
4(5)
9(9)
18(108)
7(7)
20(22)
9(10)
kLC
11(15)
27(31)
3(4)
7(8)
9(10)
3(5)
17(17)
7(8)
kLC
7(11)
15(17)
2(2)
5(5)
4(4)
3(18)
10(13)
6(6)
b
k
11(14)
28(42)
5(5)
9(9)
19(21)
7(7)
10(12)
9(9)
NL = 10−4
kopt
ELC
11(13)
0.0522
28(31)
0.0902
4(4)
0.0096
9(9)
0.0325
9(11)
8.7458
7(9)
0.4384
9(13)
0.0223
10(12)
0.4358
EΨ
0.0109
0.0175
0.0119
0.0325
0.1283
0.4384
0.0002
0.4382
Eopt
0.0043
0.0125
0.0028
0.0325
0.0107
0.4317
0.0002
0.4258
b
k
10(11)
28(29)
3(4)
7(8)
9(10)
5(5)
12(16)
7(8)
NL = 10−3
kopt
ELC
9(11)
0.0368
20(22)
0.0774
3(3)
0.0201
7(9)
0.0514
7(8)
0.0496
5(7)
0.4478
6(11)
0.0260
7(9)
0.4396
EΨ
0.0224
0.0691
0.0201
0.0515
0.0654
0.4475
0.0145
0.4396
Eopt
0.0118
0.0222
0.0074
0.0439
0.0220
0.4445
0.0008
0.4391
b
k
7(8)
16(16)
2(2)
5(6)
4(4)
3(3)
7(12)
6(6)
NL = 10−2
kopt
ELC
6(8)
0.0454
14(16)
0.0674
2(3)
0.0311
6(8)
0.1094
5(6)
0.1885
3(5)
1.9 × 105
1(6)
0.0173
6(7)
0.4400
EΨ
0.0356
0.0674
0.0311
0.0660
0.1885
0.4522
0.0150
0.4400
Eopt
0.0266
0.0629
0.0217
0.0534
0.0788
0.4505
0.0071
0.4400
Table 1: Regularization parameters and relative errors of regularized solutions determined
by the pruning algorithm, the stopping rule (52) and the optimal one. The exact solution
for moler, lotkin, prolate and hilbert test problems were taken to be the solution of shaw
test problem from [32].
behavior of kLC as a function of the number of data points q being used. The results
for moler test problem with NL = 10−4 are displayed in Fig. 6 (right). We note that for
several values of q the corner index kLC stagnates at k = 21, which coincides with the
maximum minimizer of Ψk , and that for larger values of q this corner index falls far away
from k = 21. Corner indexes for heat and gravity test problems did not behave this way
and they are not reported here. The results suggest that the corner index kLC depends on
the number of points q, and that unfortunate choices of q might lead to wrong corners.
We shall return to this point later in connection with a deblurring problem.
18
Corner at k=107
150
kLC
150
kLC(q)
L−Curve
3
10
100
||xk||
100
2
50
10
50
0
0
0
10
0
20
10
0
50
100
||b−Axk||
Figure 6: Corner index determined by the pruning algorithm of 20 realizations for moler
test problem in connection with LSQR (left). L-curve of first realization (center); in this
case, the corner determined by pruning algorithm is marked by while the “true” corner
is marked by ◦. Corner index determined by pruning algorithm using q points with q
ranging from 30 until 120 (right).
4.2
P-LSQR
As commented earlier, our intention is to construct approximate solutions using a
version of LSQR that incorporates smoothing properties of the regularization matrix L
into the computed solution. We shall do this by applying LSQR to the least squares
problem min kĀx − b̄k with Ā and b̄ from (30), as suggested in [13], using (52) as stopping
rule. Our iterative regularization algorithm, which we denote by P-LSQR, can thus be
summarized as follows
• apply LSQR to kĀx− b̄k2 and compute the iterate x̄k using the stopping rule defined
in (52).
• once the stopping index is determined, take as approximate solution
xk = L†A x̄k + xN ,
with xN and L†A as in (31).
Obviously for P-LSQR to be computationally feasible, the dimension of N (L) must be
†
small and the products with L† and LT must be performed as efficiently as possible, i.e.,
all possibilities must be explored in order to reduce the computational cost, either through
fast matrix-vector products or through the use of preconditioners. However, the latter is
not considered in this paper.
5
Numerical examples
We give some examples to illustrate our methods. Two examples involve deblurring
problems and a third one involves a Super-resolution problem; as before, the data vector
is of the form b = Axexact + e, where e is generated by the Matlab code randn with the
state value set to 0, and NL = kek2 /kAxexact k2 is referred to as the the noise level. All
19
computations were carried out in Matlab. To simplify the notation, the extension of GKBFP based on the LU factorization of L is denoted by FP-LU and the LD -based approach
is denoted by FP-LD . Average values of regularization parameters, time and relative error
(k)
in xλ , are denoted by λ̄, t̄ and Ē, respectively, while the minimum and maximum number
of steps required by the algorithms are denoted by km and kM , respectively.
5.1
Deblurring problems
The goal in this case is to recover an image stored in a vector xexact ∈ RM N from a
blurred and noisy image b = bexact + e so that Axexact = bexact , where A plays the role of
blurring operator (often referred to as the PSF matrix), and bexact represents the blurred
image. For simplicity, we consider N × N images and use N 2 × N 2 PSF matrices defined
by A = (2πσ 2 )−1 T ⊗ T . In this case, σ controls the width of the Gaussian point spread
function and T is an N × N symmetric banded Toeplitz matrix with half-bandwidth
band [15]; in what follows we use σ = 2 and band = 16. The regularization matrix is
chosen as
L=
5.1.1
IN ⊗ L1
L1 ⊗ IN
,



L1 = 

1
-1
1
-1
..
.

..
.
1
-1


 ∈ R(N −1)×N .

(58)
Rice test problem
Rice 64: The purpose here is two-fold: to illustrate that the methods proposed in this
work are less expensive than the joint bidiagonalization (JBD) algorithm by Kilmer et
al. [15] and, to learn more about the capabilities of the pruning algorithm to identify
the corner of discrete L-curves. We start with the observation that at the k-th step, the
JBD-based approach determines approximate solutions given by
(k)
xλ = argmin kb − Axk22 + λ2 kLxk22 ,
(59)
x∈Zk
where Zk is a Krylov subspace generated by the JBD algorithm applied to QA and QL ,
with QA ∈ Rm×n , QL ∈ Rp×n and Q = [QTA QTL ]T , where Q is from the QR factorization
b = [AT LT ]T . Each step of the JBD approach requires two matrix vector products of
of A
the form vb = QQT v, but the QR factorization is never computed; in practice one takes
b ,
vb = Au
LS
b 2.
with uLS = argmin kv − Auk
(60)
u∈Rn
This not only explains why the JBD-based approach is expensive but also shows that its
efficiency depends on the way uLS is computed. Two distinct ways were considered in this
paper: one uses LSQR as reported in [39], and the other one uses LSQR with subspace
preconditioning, as done in [12, 16]. We will report results obtained through the former,
which turned out to be the most efficient. To achieve our first goal we will use the JBD
algorithm with the FP method as parameter choice rule, which we call JBD-FP. JBDFP proceeds like PROJ-L in that for chosen p0 > 1 and for k ≥ p0 , the largest convex
fixed-point of φ(k) (λ) is computed and the process is repeated until a stopping criterion is
20
satisfied. Two distinct implementations of JBD-FP were considered: one implementation
denoted by JBD-FPL , which deals with problem (2) using the matrix L, and other denoted
by JBD-FPD , which deals with the equivalent problem (40). In this example we consider
a 64 × 64 subimage of the image rice which was used in [15] to illustrate a JBD-based
algorithm. Thus N = 64, A ∈ R4096×4096 and L ∈ R8064×4096 ; similarly as in [15], we
use data with 1% of white noise. Both GKB and JBD were implemented with complete
reorthogonalization; computation of fixed-points started with p0 = 10, and the iterations
stopped using a tolerance parameter ǫ = 10−6 .
For future comparison, regularization parameters determined by L-curve, a fixed-point
method, the stopping rule (52), optimal regularization parameters, and relative errors, are
all reported in Table 2. Regularization parameters for L-curve and Fixed-Point methods
were computed using the GSVD of the pair (A, R), where R ∈ R4095×4096 is from the
QR factorization of L. The optimal Tikhonov regularization parameter, defined as the
minimizer of kxλ − xexact k2 /kxexact k2 , is computed via exhaustive search. P-LSQR uses
Ā and b̄ from the explicit transformation approach implemented in std− form in [32]. As
it is apparent, see Figure 7, in this case the L-Curve has a well defined corner and φ(λ)
(with µ = 1) has a unique fixed-point that minimizes Ψ(λ). The true image, the blurred
and noisy image, and the images determined by L-Curve and fixed-point methods are all
displayed in Fig. 8.
λ
E
L-Curve
0.0783
8.03%
FP
0.0956
8.19%
optimal
0.0409
7.75%
P-LSQR
k = 48
8.31%
optimal
k = 83
7.82%
Table 2: Regularization parameters and relative errors.
Continuous L−Curve
Discrete L−Curve
1
10
φ(λ)
kLC(150)
10
10
10
log(Ψk)
3
0
10
−1
5
10
10
2
10
−2
2
10
10
−2
10
0
2
10
10
0
50
100
150
Figure 7: L-Curve, φ(λ), discrete L-curve and log(Ψk ) for rice test problem with N = 64.
The corner of the L-curve, the fixed-point of φ(λ), the “true” corner of the discrete L-curve,
and the minimizer of Ψk , are all marked by ◦. The corner determined by the pruning
algorithm of the first of 20 realizations using q = 150 points is marked by .
We now turn to the results obtained through JBD-FP and the methods proposed in
this paper. Average time of 20 realizations are Table 3. It becomes apparent that the
JBD-FP approaches are in fact more expensive than the methods proposed in this work,
and that the fastest one is P-LSQR followed by PROJ-L and FP-LD . Note that FP-LU
can be a good option. As far accuracy is concerned, all methods produced solutions with
relative error of approximately 8.2%; this is in accordance with the results obtained using
the GSVD of the pair (A, R) shown in Table 2.
21
Figure 8: True image (top left), blurred and noisy image (top right), reconstructed image
by LC (bottom left) and reconstructed image by FP (bottom right).
t̄
JBD-FPL
5.2206
JBD-FPD
4.2688
FP-LU
2.0084
FP-LD
0.8863
PROJ-L
0.2576
P-LSQR
0.2809
Table 3: Average time (in seconds) of 20 realizations.
Concerning the capability of the pruning algorithm to find the corner of discrete Lcurves, we arrive at the same conclusion as before: the corner index kLC can vary with
the number of points q of the discrete L-curve. Table 4 shows the corner index k and the
relative error in xk of the first realization for several values of q. The starting value of q
was chosen relatively close to the minimizer of Ψk , k = 48, to evaluate how the corner
index behaves in these cases. A false corner corresponding to q = 150 is displayed in
Fig. 7.
q
kLC (q)
Error
60
3
20.65%
80
3
20.65%
100
43
8.46%
120
43
8.46%
140
3
20.65%
160
3
20.65%
180
46
8.38%
200
43
8.46%
Table 4: Corner index kLC (q) selected by the pruning algorithm as a function of the
number of points q of the discrete L-curve (log kr̄k k2 , log kĀx − b̄k2 ), where r̄k and x̄k are
LSQR iterates of min kĀx − b̄k2 for rice test problem.
The conclusion we can drawn from the numerical experiments so far is that LSQR
coupled with the proposed stopping rule is cheaper than the pruning algorithm, that
the corner determined by the pruning algorithm depends on the number of points of the
discrete L-curve, and that for the tested problems our approach performs similarly as the
pruning algorithm when the latter works well. These conclusions explain why the pruning
algorithm will not be used in the following examples.
Rice 256: We now consider the 256 × 256 entire rice image. The PSF matrix A ∈
R65536×65536 , it has singular values decaying gradually to zero without any particular gap
(not shown here) and condition number κ(A) ≈ 3.40 × 1016 . The regularization matrix
22
L ∈ R130560×65336 . Only FP-LD , P-LSQR and PROJ-L are used. Average results of 20
realizations each with NL = 0.01 with and without complete reorthogonalization (labeled
as reorth = 1 and reorth = 0, respectively) are displayed in Table 5. Again, all methods
produced solutions with approximately the same quality regardless of whether reorthogonalization is used or not. However, in terms of speed, PROJ-L is superior. In this
example µ = 1 did not work in all runs and adjustments were needed. For details on such
an adjustment the reader is referred to [6].
λ̄
Ē
t̄
km (kM )
µ̄
FP-LD
0.0706
0.0820
7.9603
433 (512)
0.6519
P-LSQR
0.0851
4.3365
242 (268)
reorth = 0
PROJ-L
0.0874
0.0831
1.0652
28 (30)
1
FP-LD
0.0746
0.0812
54.6983
280 (290)
0.6635
P-LSQR
0.0845
15.9995
140 (141)
reorth = 1
PROJ-L
0.0874
0.0831
1.8808
28 (30)
1
Table 5: Results for entire rice test problem for NL = 0.01, p0 = 15 and ǫ = 10−6 .
5.1.2
Pirate test problem
In this example we consider a large image of size 512 × 512 called pirate, see Fig. 9.
Thus N = 512, the PSF matrix A ∈ R262144×262144 and the regularization matrix L ∈
R523264×262144 . As in the previous example, we report average results of 20 realizations
using FP-LD , P-LSQR and PROJ-L with and without complete reorthogonalization. Numerical results for the noise level NL = 0.01 are shown in Table 6. In this case, the
relative error in the computed solutions is approximately 14.7% and once more the fastest
algorithm is PROJ-L. Visual results of this experiment can be seen in figure 9. For this
test problem, the choice µ = 1 works satisfactorily in all runs.
λ̄
Ē
t̄
km (kM )
FP-LD
0.1563
0.1479
64.5863
516 (572)
P-LSQR
0.1483
36.4999
309 (335)
reorth = 0
PROJ-L
0.1491
0.1463
12.5072
39 (39)
FP-LD
0.1577
0.1472
426.0727
282 (282)
P-LSQR
0.1477
156.7809
163 (163)
reorth = 1
PROJ-L
0.1492
0.1462
17.6866
39 (39)
Table 6: Results for Pirate test problem with NL = 0.01, p0 = 20 and ǫ = 10−6 .
5.2
Super-resolution
High-resolution (HR) images are important in a number of areas such as medical
imaging and video surveillance. However, due to hardware limitations and cost of image
acquisition systems, Low-resolution (LR) images are often available. We consider the
problem of estimating an HR image form observed multiple LR images. Let the original
HR image of size M = M1 × M2 in vector form be denoted by x ∈ RM , and let the k-th
LR image of size N = N1 × N2 in vector form be denoted by bk ∈ RN , k = 1, 2, . . . , q,
23
Figure 9: True image, LR noisy image, and restored image by PROJ-L.
with M1 = N1 × D1 , M2 = N2 × D2 , where D1 and D2 represent down-sampling factors
for the horizontal and vertical directions, respectively. Assuming that the acquisition
process of the LR sequence involves blurring, motion, subsampling and additive noise, an
observation model that relates x to bk is written as [40]
b k = Ak x + ǫ k
(61)
where Ak is N × M , and ǫk stands for noise. The goal is to estimate the HR image x from
all LR images bk . In this case Tikhonov regularization takes the form
xλ = arg min {kb − Axk22 + λ2 kLxk22 }
(62)
x∈RM
where b = [bT1 . . . bTq ]T , A = [AT1 . . . ATq ]T , and L is a discrete 2D differential operator.
In this example we estimate the 96 × 96 image tree from a sequence of five noisy LR
images with D1 = D2 = 2. Therefore A ∈ R11520×9216 and L ∈ R18240×9216. As neither A nor the regularization matrix is too large, the JBD-FP approaches are used again, all implemented with complete reorthogonalization. Average results of 20 realizations are
shown in Table 7. We note again that PROJ-L is superior. For this example the choice
µ = 1 worked satisfactorily in all cases. The original HR image and two noisy LR images
are depicted in the first row of Fig. 10. One of the restored images determined by PROJ-L
is depicted in the second row of the same figure.
  Method    λ̄        Ē        t̄         km (kM )
  FP-LU     0.0309   0.0496   16.4869   281 (284)
  FP-LD     0.0309   0.0496    9.1734   281 (284)
  P-LSQR      --     0.0502    3.2403   157 (160)
  PROJ-L    0.0300   0.0535    0.8753    47 (57)
  JBD-FP    0.0309   0.0497   21.6653   112 (114)
  JBD-FP    0.0309   0.0497   25.1221   112 (114)

Table 7: Results for super-resolution problem for NL = 0.01, ǫ = 10−6, and p0 = 15.
Also, to illustrate the performance of the algorithms using distinct regularization matrices, we consider the cases L = I and

L2,2D = \begin{bmatrix} I_N \otimes L_2 \\ L_2 \otimes I_N \end{bmatrix}, \qquad L_2 = \begin{bmatrix} -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \end{bmatrix} \in R^{(N-2) \times N}.
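The construction of L2,2D lends itself to a compact sparse implementation. Analogously to the first-difference sketch given earlier, the following Python/SciPy fragment (assuming column-wise vectorization of the N × N image; all names are illustrative) builds L2 and L2,2D via Kronecker products.

import scipy.sparse as sp

def second_difference(N):
    # (N-2) x N second-difference operator with stencil [-1, 2, -1]
    return sp.diags([-1, 2, -1], offsets=[0, 1, 2], shape=(N - 2, N), format="csr")

def L2_2D(N):
    # L_{2,2D} = [ I_N (x) L2 ; L2 (x) I_N ], acting on an N x N image
    # stored column-wise as a vector of length N^2
    L2 = second_difference(N)
    I = sp.identity(N, format="csr")
    return sp.vstack([sp.kron(I, L2), sp.kron(L2, I)], format="csr")

L = L2_2D(96)
print(L.shape)  # (18048, 9216) for the 96 x 96 HR image of this example

For N = 96 this yields an operator with 18048 rows, slightly smaller than the 18240 × 9216 matrix L used earlier in this example.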
For these two choices of L we obtained solutions with relative errors of approximately 28.80% and 4.95%, respectively. That is, while the quality of the solutions for the case L = I deteriorates significantly, the quality of the solutions using L2,2D remains practically the same as that obtained using L from (58). Visual results are depicted in Fig. 10.

Figure 10: First row: HR image (left) and two LR noisy images. Second row: restored image with L = I determined by GKB-FP (left), restored image with L from (58) determined by PROJ-L (center), and restored image with L2,2D determined by PROJ-L (right).
We end the numerical results section with the observation that we also performed numerical experiments involving the three deblurring test problems above using data with distinct noise levels. The results showed that the extensions of GKB-FP perform similarly to the original version of the algorithm; these results are not shown here.
6 Conclusions
We reviewed the GKB-FP algorithm and showed how to extend it to large-scale general-form Tikhonov regularization. As a result, three distinct approaches that do not require estimates of the noise level were proposed and numerically illustrated on large-scale deblurring problems. Numerical results for representative test problems using data with realistic noise levels (e.g., 0.1% and 0.01%) showed that, in terms of accuracy and efficiency, the extended versions of GKB-FP perform satisfactorily, in much the same way as the original version of the algorithm, and are therefore competitive and attractive for large-scale general-form Tikhonov regularization. In addition, to overcome possible difficulties
in GKB-FP when selecting the Tikhonov regularization parameter at each iteration, we
proposed a stopping rule for LSQR, and showed numerically that the smoothed preconditioned LSQR algorithm (P-LSQR) coupled with the new rule can be a good alternative
to large-scale general-form Tikhonov regularization.
Acknowledgments
The authors are grateful to the reviewers for their valuable comments which have
significantly improved the presentation of this work.
References
[1] Tikhonov AN. Solution of incorrectly formulated problems and the regularization
method. Soviet Mathematics Doklady 1963; 4:1035-1038.
[2] Morozov VA. Regularization Methods for Solving Incorrectly Posed Problems.
Springer: New York, 1984.
[3] Hansen PC, O’Leary DP. The use of the L-curve in the regularization of discrete
ill-posed problems. SIAM Journal on Scientific Computing 1993; 14:1487-1503.
[4] Golub GH, Heath M, Wahba G. Generalized cross-validation as a method for choosing
a good ridge parameter. Technometrics 1979; 21:215-222.
[5] Regińska T. A regularization parameter in discrete ill-posed problems. SIAM Journal
on Scientific Computing 1996; 3:740-749.
[6] Bazán FSV. Fixed-point iterations in determining the Tikhonov regularization parameter. Inverse Problems 2008; 24: DOI:10.1088/0266-5611/24/3/035001.
[7] Engl HW, Hanke M, Neubauer A. Regularization of Inverse Problems. Kluwer: Dordrecht, 1996.
[8] Bakushinski AB. Remarks on choosing a regularization parameter using quasioptimality and ratio criterion. USSR Computational Mathematics and Mathematical
Physics 1984; 24:181-182.
[9] Kindermann S. Convergence analysis of minimization-based noise level-free parameter choice rules for linear ill-posed problems. Electronic Transactions on Numerical
Analysis 2011; 38:233-257.
[10] Bauer F, Lukas MA. Comparing parameter choice methods for regularization of illposed problems. Mathematics and Computers in Simulation 2011; 81:1795-1841.
[11] Reichel L, Rodriguez G. Old and new parameter choice rules for discrete ill-posed
problems. Numerical Algorithms 2012; DOI 10.1007/s11075-012-9612-8.
[12] Bunse-Gerstner A, Guerra-Ones V, Madrid de la Vega H. An improved preconditioned LSQR for discrete ill-posed problems. Mathematics and Computers in Simulation 2006; 73:65-75.
[13] Hanke M, Hansen PC. Regularization methods for large-scale problems. Surveys on
Mathematics for Industry 1993; 3:253-315.
[14] Hansen PC, Jensen TK. Smoothing-Norm Preconditioning for Regularizing
Minimum-Residual Methods. SIAM Journal on Matrix Analysis and Applications
2006; 29:1-14.
[15] Kilmer ME, Hansen PC, Español MI. A projection-based approach to general-form
Tikhonov regularization. SIAM Journal on Scientific Computing 2007; 29:315-330.
[16] Jacobsen M, Hansen PC, Saunders MA. Subspace preconditioned LSQR for discrete
ill-posed problems. BIT 2003; 43:975-989.
[17] Paige CC, Saunders MA. LSQR: An algorithm for sparse linear equations and sparse
least squares. ACM Transactions on Mathematical Software 1982; 8:43-71.
[18] Saad Y, Schultz MH. GMRES: A generalized minimal residual algorithm for solving
nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing
1986; 7:856-869.
[19] Neuman A, Reichel L, Sadok H. Implementations of range restricted iterative methods for linear discrete ill-posed problems. Linear Algebra and its Applications 2012;
436:3974-3990.
[20] Hansen PC, Jensen TK, Rodriguez G. An adaptive pruning algorithm for the discrete
L-curve criterion. Journal of Computational and Applied Mathematics 2007; 198:483-492.
[21] Morigi S, Reichel L, Sgallari F, Zama F. Iterative methods for ill-posed problems and
semiconvergent sequences. Journal of Computational and Applied Mathematics 2006;
193:157-167.
[22] Salehi Ravesh M, Brix G, Laun FB, Kuder TA, Puderbach M, Ley-Zaporozhan J,
Ley S, Fieselmann A, Herrmann MF, Schranz W, Semmler W, Risse F. Quantification of pulmonary microcirculation by dynamic contrast-enhanced magnetic resonance imaging: Comparison of four regularization methods. Magnetic Resonance in
Medicine 2012; DOI: 10.1002/mrm.24220.
[23] Chung J, Nagy JG, O’Leary DP. A Weighted-GCV Method for Lanczos-Hybrid Regularization. Electronic Transaction on Numerical Analysis 2008; 28:149-167.
[24] Bazán FSV, Borges LS. GKB-FP: an algorithm for large-scale discrete ill-posed problems. BIT 2010; 50:481-507.
[25] Lampe J, Reichel L, Voss H. Large-Scale Tikhonov Regularization via Reduction by
Orthogonal Projection. Linear Algebra and its Applications 2012; 436:2845-2865.
[26] Reichel L, Sgallari F, Ye Q. Tikhonov regularization based on generalized Krylov
subspace methods. Applied Numerical Mathematics 2012; 62:1215-1228.
[27] Hochstenbach ME, Reichel L. An Iterative Method For Tikhonov Regularization
With a General Linear Regularization Operator. Journal of Integral Equations and
Applications 2010; 22:463-480.
[28] Eldén L. A weighted pseudoinverse, generalized singular values, and constrained least
square problems. BIT 1982; 22:487-502.
[29] Golub GH, Kahan W. Calculating the singular values and pseudo-inverse of a matrix.
Journal of the Society for Industrial and Applied Mathematics Series B Numerical
Analysis 1965; 2:205-224.
[30] Hansen PC. Rank-deficient and discrete ill-posed problems. SIAM: Philadelphia, 1998.
[31] Kirsch A. An Introduction to the Mathematical Theory of Inverse Problems. Springer:
New York, 1996.
[32] Hansen PC. Regularization Tools: A MATLAB package for analysis and solution of
discrete ill-posed problems. Numerical Algorithms 1994; 6:1-35.
[33] Bazán FSV, Francisco JB. An improved Fixed-point algorithm for determining the
Tikhonov regularization parameter. Inverse Problems 2009; 25: DOI:10.1088/0266-5611/25/4/045007.
[34] Golub GH, Van Loan CF. Matrix Computations. The Johns Hopkins University Press:
Baltimore, 1996.
[35] Paige CC, Saunders MA. Solution of sparse indefinite systems of linear equations.
SIAM Journal on Numerical Analysis 1975; 12:617-629.
[36] Hansen PC. The Discrete Picard Condition For Discrete Ill-Posed Problems. BIT
1990; 30:658-672.
[37] Castellanos JJ, Gómez S, Guerra V. The triangle method for finding the corner of
the L-curve. Applied Numerical Mathematics 2002; 43:359-373.
[38] Rodriguez G, Theis D. An algorithm for estimating the optimal regularization parameter by the L-curve. Rendiconti di Matematica 2005; 25:69-84.
[39] Saunders MA. Computing projections with LSQR. BIT 1997; 37:96-104.
[40] Park SC, Park MK, Kang MG. Super-resolution image reconstruction: a technical
overview. IEEE Signal Processing Magazine 2003; 20:21-36.