ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS
WITH APPLICATIONS
JAMES V. BURKE AND TIM HOHEISEL
Abstract. A new class of matrix support functionals is presented which establishes a connection between optimal value functions for quadratic optimization problems, the matrix-fractional function, the pseudo matrix-fractional function, and the nuclear norm. The support function is based on the graph of the product of a matrix with its transpose. Closed form expressions for the support functional and its subdifferential are derived. In particular, the support functional is shown to be continuously differentiable on the interior of its domain, and a formula for the derivative is given there.
1 Introduction
One of the first topics studied in an elementary course on optimization is that of
optimizing a quadratic function over an affine set:
    min_{u ∈ Rn}  (1/2) u^T V u − x^T u   s.t.   Au = b,                                      (1)
where (x, V, A, b) ∈ Rn × Sn × Rp×n × Rp , with Sn the linear space of real symmetric
n × n matrices. A complete characterization of the set of solutions and the optimal
value are easily obtained (see Theorem 3.2 below) without the aid of calculus. Less
well studied are the properties of the optimal value function
    v(x, V) := inf_{u ∈ Rn} { (1/2) u^T V u − x^T u | Au = b },                               (2)
for given (A, b) ∈ Rp×n × Rp with b ∈ rge A. The variational properties of v, and its
extensions, are easily derived as a consequence of the more general results of this
paper. In particular, we show that v is a concave function, and, more specifically, it
is the negative of a convex support functional on Rn × Sn . In addition, we show −v
equals the (pseudo) matrix-fractional function when (A, b) = (0, 0) [2, 5, 7, 13, 14]
(see Section 5.2).
The central object of study is the support functional σD(A,B) : E → R̄ := R ∪ {±∞} given by

    σD(A,B)(X, V) := sup { ⟨(X, V), (U, W)⟩ | (U, W) ∈ D(A, B) },                             (3)

where E := Rn×m × Sn, rge B ⊂ rge A, and

    D(A, B) := { (Y, −(1/2) Y Y^T) ∈ E | Y ∈ Rn×m : AY = B }.                                 (4)
2010 Mathematics Subject Classification. 46N10, 49J52, 49J53, 90C20, 90C46.
Key words and phrases. Support function, quadratic programming, value function, matrix-fractional function, nuclear norm, subdifferential, normal cone.
This research was partially supported by the DFG (Deutsche Forschungsgemeinschaft) under
grant HO 4739/2-1.
Here, we use the Frobenius inner product for matrix spaces (see Section 2.2). The
set D(A, B) is the graph of the mapping Y ↦ −(1/2) Y Y^T over the affine manifold
{Y | AY = B }. This mapping is a common transformation in numerous applications to statistics and semidefinite programming (e.g., see [1, 2]). In Section 4, we
obtain a closed form expression for σD(A,B) in terms of X, V, A, and B, a description of its domain and its interior, a characterization of the closed convex hull of
the set D(A, B), and a characterization of the convex subdifferential ∂σD(A,B) . In
particular, we show that σD(A,B) is differentiable on the interior of its domain and
provide a formula for the derivative. By taking m = 1, we obtain the representation
v = −σD(A,b) ,
where v is the optimal value function defined in (2) (see Theorem 5.1). In addition,
we find that σD(0,0) is precisely the pseudo matrix-fractional function studied in
[2, 5, 7, 13, 14]. The pseudo matrix-fractional function coincides with the matrix-fractional function when V is positive definite (see Theorem 5.3).
Our derivation of the subdifferential ∂σD(A,B) makes use of techniques from nonconvex, nonsmooth variational analysis. For this reason, our study begins in Section
2 by recalling the necessary background material from convex and nonsmooth variational analysis [3, 16] as well as matrix analysis [12]. In Section 3, we review
the elementary results concerning the problem (1), reformulating them in a manner
more suitable for our study. These results are the key to our analysis. The core
results of the paper are presented in Section 4 where we develop the properties of
the support functional σD(A,B) . In Section 5 we present three applications of the
results of Section 4. The first of these is the representation of the optimal value
function v described above. The second is the application to the matrix-fractional
function and its generalizations. The final application establishes an interesting
relationship between σD(A,B) and the nuclear norm. In particular, we recover the
relationship discussed in [13] which is used to obtain smooth approximations to the
nuclear norm in optimization.
Notation: The support of a vector d ∈ Rp is the index set
    supp d := { i ∈ {1, . . . , p} | d_i ≠ 0 }.
The unit simplex in Rp is given by

    Δ_p := { λ ∈ Rp | λ_i ≥ 0 (i = 1, . . . , p), Σ_{i=1}^p λ_i = 1 },
while the unit sphere in Rp is the set
    S^{p−1} := { x ∈ Rp | ‖x‖_2 = 1 }.
For a set B ⊂ Rp, its orthogonal complement is defined by

    B^⊥ := { x ∈ Rp | x^T b = 0 ∀b ∈ B }.
Its linear hull or span is the set

    span B := { x ∈ Rp | ∃n ∈ N, b_i ∈ B, α_i ∈ R (i = 1, . . . , n) : x = Σ_{i=1}^n α_i b_i }.
For two sets A, B ⊂ Rn, their Minkowski sum is the set
A + B := {a + b | a ∈ A, b ∈ B } .
Moreover, if A = {a} is a singleton, we loosely write a + B := {a} + B.
2 Preliminaries
2.1 Tools for variational analysis in Euclidean space
Let (E, ⟨·, ·⟩) be a Euclidean space, i.e., a finite-dimensional real inner product space. We denote the norm on E induced by the inner product ⟨·, ·⟩ by ‖·‖, i.e., ‖x‖ := √⟨x, x⟩ for all x ∈ E. In particular, we equip E = Rn with the standard scalar product, giving ‖·‖ = ‖·‖_2.
For an extended real-valued function f : E → R̄, its epigraph is the set

    epi f := { (x, α) ∈ E × R | f(x) ≤ α },

and its domain is the set

    dom f := { x ∈ E | f(x) < +∞ }.
The notion of the epigraph allows for very handy definitions of a number of properties of extended real-valued functions. A function f : E → R̄ is called lower semicontinuous (lsc) (or closed) if epi f is a closed set, and upper semicontinuous (usc) if −f is lsc. The lower semicontinuous hull of f, denoted cl f, is the function whose epigraph is given by cl (epi f). The function f is said to be convex if epi f is a convex set, and concave if −f is convex. Moreover, f is said to be proper if dom f ≠ ∅ and f > −∞.
The (convex) indicator function of a set C ⊂ E, δC : E → R ∪ {+∞}, is given by

    δC(x) := 0   if x ∈ C,   and   δC(x) := +∞   if x ∉ C.
The indicator function δC is convex if and only if C is convex, and δC is lsc if and
only if C is closed. The set C is a cone if λC ⊂ C for all λ ≥ 0. It is a convex cone
if, in addition, C + C ⊂ C.
The (convex) subdifferential of a function f : E → R̄ at a point x̄ ∈ dom f is the set

    ∂f(x̄) := { v ∈ E | f(x) ≥ f(x̄) + ⟨v, x − x̄⟩ ∀x ∈ E },

with dom ∂f := { x̄ | ∂f(x̄) ≠ ∅ }. The (convex) conjugate f* : E → R̄ of f is defined by

    f*(y) := sup_{x ∈ dom f} { ⟨x, y⟩ − f(x) }.
We call a function f : E → R̄ positively homogeneous if

    0 ∈ dom f   and   f(λv) = λ f(v)   ∀v ∈ E, λ > 0.

If, in addition,

    f(v + w) ≤ f(v) + f(w)   ∀v, w ∈ E,

we say that f is sublinear. Note that f is positively homogeneous if and only if epi f is a cone, in which case either f(0) = 0 or f(0) = −∞. Moreover, f is sublinear if and only if it is convex and positively homogeneous, which holds if and only if epi f is a convex cone [16, Ex. 3.19].
For C ⊂ E, its convex hull is defined to be the set

    conv C := ∩ { D ⊂ E | C ⊂ D, D convex }.
By Carathéodory's Theorem, e.g., see [3, Sec. 2.3, Ex. 5], we can represent conv C as

    conv C = { x ∈ E | ∃λ ∈ Δ_{κ+1}, c_1, . . . , c_{κ+1} ∈ C : x = Σ_{i=1}^{κ+1} λ_i c_i },

where κ := dim E. Furthermore, the closed convex hull of C is the closure of the convex hull of C, i.e.,

    cl conv C := cl (conv C).
The function

    σC(v) := (δC)*(v) = sup_{w ∈ C} ⟨v, w⟩

is called the support functional for C; it is always sublinear (hence convex), lsc, and proper when C is nonempty, cf., e.g., [3, Sec. 4.2, Ex. 9], and the bi-conjugacy theorem [3, Th. 4.2.1] gives

    (σC)* = δ_cl conv C.                                                                      (5)
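For readers who like to experiment, the following small numerical sketch (an informal aid, not part of the paper; it assumes NumPy is available) evaluates the support functional of a finite set C ⊂ R2 and spot-checks the sublinearity properties stated above, namely positive homogeneity and subadditivity.

# Informal sketch: for a finite set C in R^2, sigma_C(v) = max_{w in C} <v, w>.
# We spot-check positive homogeneity and subadditivity.  Assumes NumPy.
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((5, 2))          # five points in R^2

def sigma_C(v):
    return max(C @ v)                    # support functional of the finite set C

for _ in range(100):
    v, w = rng.standard_normal(2), rng.standard_normal(2)
    lam = rng.uniform(0.1, 10.0)
    assert np.isclose(sigma_C(lam * v), lam * sigma_C(v))        # positive homogeneity
    assert sigma_C(v + w) <= sigma_C(v) + sigma_C(w) + 1e-12     # subadditivity
print("sigma_C is positively homogeneous and subadditive on the samples")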
Given a set C ⊂ E, the regular normal cone of C at x̄ ∈ C is the set

    N̂C(x̄) := { d ∈ E | lim sup_{x →_C x̄} ⟨d, x − x̄⟩ / ‖x − x̄‖ ≤ 0 },

and the limiting normal cone of C at x̄ ∈ C is the set

    NC(x̄) := { v ∈ E | ∃ {x^k ∈ C} → x̄, v^k ∈ N̂C(x^k) : v^k → v },

i.e., the limiting normal cone is the outer limit of the regular normal cone in the sense of Painlevé-Kuratowski, cf. [16, Def. 5.4].
We call a closed set C ⊂ E (Clarke) regular at a point x̄ ∈ C if NC(x̄) = N̂C(x̄). If this holds true for every point x̄ ∈ C, we simply say C is (Clarke) regular. In particular, every closed convex set D ⊂ E is regular and

    ND(x̄) = { v ∈ E | ⟨v, x − x̄⟩ ≤ 0 ∀x ∈ D } = ∂δD(x̄)   ∀x̄ ∈ D.
2.2 Tools from matrix analysis
A general reference for the material in this section is [12]. For a matrix A ∈ Rp×m we
denote the ith column vector (i = 1, . . . , m) by ai ∈ Rp , i.e., A = [a1 , . . . , am ]. The
range of A is the set rge A := {Ax ∈ Rp | x ∈ Rm } = span {a1 , . . . , am }, and the
kernel or null space is ker A := {x ∈ Rm | Ax = 0 } . The following basic subspace
relationships are elementary and used freely throughout:
    ker A = (rge A^T)^⊥   and   rge A = (ker A^T)^⊥.
The trace of a square matrix M ∈ Rn×n is the sum of all diagonal elements, and we write tr (M) = Σ_{i=1}^n m_ii. If λ_1(M), . . . , λ_n(M) are the eigenvalues of M, counting multiplicities, then tr (M) = Σ_{i=1}^n λ_i(M).
The singular value decomposition (SVD) of A ∈ Rp×m is a factorization of the form
A = U Σ̂QT , where U ∈ Rp×p and Q ∈ Rm×m are orthogonal and the only nonzero
entries of the matrix Σ̂ ∈ Rp×m are Σ̂ii > 0 for i = 1, 2, . . . , k := rank A. These
nonzero entries, σ1 (A) ≥ σ2 (A) ≥ · · · ≥ σk (A), are called the singular values of A,
and they equal the square roots of the nonzero eigenvalues of AT A (or, equivalently,
AA^T). The reduced SVD is the factorization A = U1 Σ Q1^T, where Σ ∈ Rk×k is the nonsingular diagonal matrix of singular values, U1 ∈ Rp×k consists of the first k columns of U, and Q1 ∈ Rm×k consists of the first k columns of Q. The
remaining columns of both U and Q can be given any desired but commensurate
left-right ordering. The singular values can be used to define a family of norms on
Rp×m called the Schatten norms. Of particular interest to us is the nuclear norm
(or Schatten 1-norm) defined by ‖A‖_n = tr Σ.
For A ∈ Rp×n, the matrix S is called the Moore-Penrose pseudoinverse of A if and only if

    SA = (SA)^T,   AS = (AS)^T,   ASA = A,   and   SAS = S.                                   (6)

We denote the Moore-Penrose pseudoinverse by A† (cf. [11, p. 139] or [15, Ch. 2]). It is straightforward to show that A† = Q1 Σ^{-1} U1^T, where A = U1 Σ Q1^T is the reduced SVD for A. In particular, A†A and AA† are the orthogonal projections onto rge A^T and rge A, respectively. Consequently, given b ∈ rge A, we have

    { x ∈ Rn | Ax = b } = A†b + ker A,

which is of particular importance in our study.
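These facts are easy to confirm numerically. The sketch below (illustrative only, not part of the paper; it assumes NumPy is available) builds the pseudoinverse of a rank-deficient matrix from its reduced SVD, checks the four identities in (6), the nuclear norm ‖A‖_n = tr Σ, and the solution-set formula above.

# Informal sketch (assumes NumPy): reduced SVD, pseudoinverse, nuclear norm.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 6))   # a rank-3, 4x6 matrix
U, s, Qt = np.linalg.svd(A, full_matrices=False)                # A = U1 Sigma Q1^T (reduced)
k = np.linalg.matrix_rank(A)
U1, Sigma, Q1 = U[:, :k], np.diag(s[:k]), Qt[:k, :].T

Adag = Q1 @ np.linalg.inv(Sigma) @ U1.T                         # A^dagger = Q1 Sigma^{-1} U1^T
assert np.allclose(Adag, np.linalg.pinv(A))

# The four Moore-Penrose identities (6).
assert np.allclose(Adag @ A, (Adag @ A).T)
assert np.allclose(A @ Adag, (A @ Adag).T)
assert np.allclose(A @ Adag @ A, A)
assert np.allclose(Adag @ A @ Adag, Adag)

# Nuclear norm equals tr(Sigma); {x | Ax=b} = A^dagger b + ker A for b in rge A.
assert np.isclose(np.trace(Sigma), np.linalg.norm(A, "nuc"))
b = A @ rng.standard_normal(6)                                  # ensures b lies in rge A
z = rng.standard_normal(6)
x = Adag @ b + (z - Adag @ A @ z)                               # particular solution + element of ker A
assert np.allclose(A @ x, b)
print("Moore-Penrose identities, nuclear norm, and solution-set formula verified")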
For two matrices A ∈ Rr×s and B ∈ Rp×q, we define their Kronecker product by

    A ⊗ B := [ a_11 B  ···  a_1s B ]
             [   ⋮      ⋱     ⋮   ]
             [ a_r1 B  ···  a_rs B ].
The space of real symmetric matrices of dimension p × p is denoted by Sp . The
subset of symmetric positive semidefinite matrices is denoted by Sp+ , and the subset
of symmetric positive definite matrices is Sp++ . In addition, we use the following
notation:
    S ⪰ T  :⇐⇒  S − T ∈ Sp+,   and   S ≻ T  :⇐⇒  S − T ∈ Sp++.
Whenever we consider a matrix (sub-)space of the form E ⊂ Rr×s, we equip it with the inner product ⟨·, ·⟩ : E × E → R defined by

    ⟨A, B⟩ := tr (A^T B),

which makes E a Euclidean space. The induced norm is called the Frobenius norm. This procedure is also adopted for matrix product-spaces. In particular, the space E := Rn×m × Sn is equipped with the inner product ⟨·, ·⟩ : E × E → R defined by

    ⟨(X, V), (Y, W)⟩ := tr (X^T Y) + tr (V W),

yielding a Euclidean space (of dimension κ := mn + n(n+1)/2) with induced norm ‖·‖ : E → R given by

    ‖(X, V)‖ = √( tr (X^T X) + tr (V^2) ) = √( ‖X‖_F^2 + ‖V‖_F^2 ),

where ‖·‖_F denotes the Frobenius norm on the respective space.
3 Optimization of quadratic functions
We now recall the elementary properties of the optimization problem (1) and its
associated optimal value function (2).
Lemma 3.1 (Solvability of equality-constrained QPs). For (x, V, A, b) ∈ Rn × Sn × Rp×n × Rp, consider the quadratic optimization problem (1). Then (1) has a solution if and only if the following three conditions hold:
i) b ∈ rge A (i.e., (1) is feasible).
ii) x ∈ rge [V A^T].
iii) u^T V u ≥ 0 ∀u ∈ ker A.
If, however, b ∈ rge A but ii) or iii) is violated, then v(x, V) = −∞.
Proof. This is a standard result; see, for example, [4, Sec. 4.3].
In the sequel, we use the following notation:
    V ⪰_ker A 0  :⇐⇒  u^T V u ≥ 0   ∀u ∈ ker A,
and
    V ≻_ker A 0  :⇐⇒  u^T V u > 0   ∀u ∈ ker A \ {0}.
A complete description of the optimal value function v defined in (2) is given in
our next result. This is the foundation on which our discussion of the support
functional σD(A,B) is based.
Theorem 3.2. At (x, V), the optimal value function v in (2) is given by

    v(x, V) =  −(1/2) [x; b]^T [V, A^T; A, 0]† [x; b]   if [x; b] ∈ rge [V, A^T; A, 0] and V ⪰_ker A 0,
               −∞                                       if b ∈ rge A and (x ∉ rge [V A^T] or V ⪰_ker A 0 fails),
               +∞                                       if b ∉ rge A.

Here [x; b] ∈ Rn+p denotes the stacked vector and [V, A^T; A, 0] ∈ Sn+p the correspondingly partitioned bordered matrix; this block notation is used throughout.
Proof. First, if b ∉ rge A, then (1) is infeasible; hence, by convention, v(x, V) = +∞. Second, if b ∈ rge A, Lemma 3.1 tells us that if x ∉ rge [V A^T] or V is not positive semidefinite on ker A, we have v(x, V) = −∞. Hence, we need only establish the expression for v when [x; b] ∈ rge [V, A^T; A, 0] (i.e., b ∈ rge A and x ∈ rge [V A^T]) and V is positive semidefinite on ker A. Again, Lemma 3.1 tells us that, in this case, a solution to (1) exists. The first-order necessary optimality conditions at

    ū ∈ argmin_{u ∈ Rn} { (1/2) u^T V u − x^T u  s.t.  Au = b } ≠ ∅

are that there exists ȳ ∈ Rp for which

    [V, A^T; A, 0] [ū; ȳ] = [x; b],

or, equivalently,

    [ū; ȳ] ∈ [V, A^T; A, 0]† [x; b] + ker [V, A^T; A, 0].

Plugging such a pair [ū; ȳ] into the objective function yields

    (1/2) ū^T V ū − x^T ū = (1/2) ū^T (V ū − x) − (1/2) x^T ū          (using V ū − x = −A^T ȳ)
                          = −(1/2) ū^T A^T ȳ − (1/2) x^T ū             (using ū^T A^T = (Aū)^T = b^T)
                          = −(1/2) [x; b]^T [ū; ȳ]
                          = −(1/2) [x; b]^T [V, A^T; A, 0]† [x; b],

where the last equation is due to the fact that

    [x; b] ∈ rge [V, A^T; A, 0] = ( ker [V, A^T; A, 0] )^⊥.

Since all such points yield the same optimal value, this concludes the proof.
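As an informal numerical check of this closed form (not part of the paper; it assumes NumPy is available), the sketch below solves the KKT system of the equality-constrained QP (1) for a random feasible instance with V positive definite and compares the resulting optimal value with −(1/2)[x; b]^T M(V)†[x; b].

# Informal check of the closed-form value function in Theorem 3.2 (assumes NumPy).
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2
A = rng.standard_normal((p, n))
b = A @ rng.standard_normal(n)                    # b in rge A, so (1) is feasible
x = rng.standard_normal(n)
G = rng.standard_normal((n, n))
V = G @ G.T + np.eye(n)                           # V > 0, hence positive definite on ker A

# Bordered matrix M(V) and the closed form  v(x,V) = -(1/2) [x;b]^T M(V)^+ [x;b].
M = np.block([[V, A.T], [A, np.zeros((p, p))]])
xb = np.concatenate([x, b])
v_formula = -0.5 * xb @ np.linalg.pinv(M) @ xb

# Solve the KKT system M(V) [u;y] = [x;b] directly and evaluate the objective.
uy = np.linalg.solve(M, xb)                       # M(V) is invertible here (rank A = p, V > 0)
u = uy[:n]
v_kkt = 0.5 * u @ V @ u - x @ u
assert np.isclose(v_formula, v_kkt)
print("closed form matches the KKT-based optimal value:", v_formula)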
The matrix

    M(V) := [V, A^T; A, 0]                                                                    (7)

appearing in the foregoing theorem plays a pivotal role in our analysis. The next result sheds some light on the properties of M(V), which is often referred to as a bordered matrix in the literature [6, 8, 15].
Proposition 3.3 (Finsler's Lemma and bordered matrices). Let A ∈ Rp×n and V ∈ Sn. Then

    V ≻_ker A 0   ⇐⇒   ∃ε > 0 : V + εA^T A ≻ 0.

Moreover, the bordered matrix M(V) is invertible if and only if rank A = p and V ≻_ker A 0, in which case its inverse M(V)^{-1} is given by

    [ P (P^T V P)^{-1} P^T                       ( I − P (P^T V P)^{-1} P^T V ) A†              ]
    [ (A†)^T ( I − V P (P^T V P)^{-1} P^T )      (A†)^T ( V P (P^T V P)^{-1} P^T V − V ) A†     ],

where P ∈ Rn×(n−p) is any matrix whose columns form an orthonormal basis of ker A.
Proof. For the first statement see, e.g., [6, Th. 2], and for the characterization of the invertibility of M(V) see, e.g., [6, Th. 7]. The inversion formula can be found in [8, Th. 1] and is easily verified by direct matrix multiplication.
Remark 3.4. A general formula for the pseudoinverse M (V )† can be found in [15,
p. 58].
The first statement in Proposition 3.3 is referred to in the literature as Finsler’s
Lemma and originally goes back to [9].
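The inversion formula is easy to sanity-check numerically. The sketch below (an illustrative aid, not from the paper; it assumes NumPy) builds an indefinite V that is positive definite on ker A, forms M(V), and confirms that the block expression above is its inverse.

# Informal numerical check of the inversion formula in Proposition 3.3 (assumes NumPy).
import numpy as np

rng = np.random.default_rng(3)
n, p = 5, 2
A = rng.standard_normal((p, n))                    # full row rank p (generically)
# Orthonormal basis P of ker A and Z of rge A^T, taken from the full SVD of A.
_, _, Vt = np.linalg.svd(A)
Z, P = Vt[:p, :].T, Vt[p:, :].T                    # columns of P span ker A

C = rng.standard_normal((n - p, n - p)); C = C @ C.T + np.eye(n - p)
V = P @ C @ P.T - Z @ Z.T                          # indefinite, but positive definite on ker A

Adag = np.linalg.pinv(A)
M = np.block([[V, A.T], [A, np.zeros((p, p))]])
W = np.linalg.inv(P.T @ V @ P)                     # (P^T V P)^{-1}
Minv_formula = np.block(
    [[P @ W @ P.T,                             (np.eye(n) - P @ W @ P.T @ V) @ Adag],
     [Adag.T @ (np.eye(n) - V @ P @ W @ P.T),  Adag.T @ (V @ P @ W @ P.T @ V - V) @ Adag]])
assert np.allclose(Minv_formula @ M, np.eye(n + p), atol=1e-8)
print("block formula reproduces M(V)^{-1}")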
Corollary 3.5. Let A ∈ Rp×n and V ∈ Sn. Then the following hold:
a) V ≻_ker A 0 =⇒ rge [V A^T] = Rn.
b) V ≻_ker A 0 ⇐⇒ [ V ⪰_ker A 0 ] ∧ [ ∃ U ⊂ Rp : ker M(V) = {0}^n × U ],
in which case U = ker A^T.
Proof. a) Let x ∈ Rn. Due to Finsler's Lemma, there exist ε > 0 and r ∈ Rn such that (V + εA^T A) r = x. Putting s := εAr gives [V A^T] [r; s] = V r + εA^T A r = x, which proves a).
b) '⇒:' Let [x; y] ∈ ker M(V), i.e.,

    V x + A^T y = 0   and   Ax = 0.

Multiplying the first condition by x^T yields x^T V x = 0. By assumption, as x ∈ ker A, we get x = 0. Hence, y ∈ ker A^T, and thus ker M(V) = {0}^n × ker A^T. The condition V ⪰_ker A 0 holds trivially.
'⇐:' Let x ∈ ker A \ {0}. By assumption, ker M(V) = {0}^n × U, and so [x; y] ∉ ker M(V) for all y ∈ Rp. Thus, since Ax = 0, we must have V x + A^T y ≠ 0 for all y ∈ Rp or, equivalently, V x ∉ rge A^T = (ker A)^⊥. Hence, there exists x̂ ∈ ker A such that 0 < x̂^T V x (replacing x̂ by −x̂ if necessary). We now compute that, for all ε > 0,

    0 ≤ (εx̂ − x)^T V (εx̂ − x) = ε^2 x̂^T V x̂ − 2ε x̂^T V x + x^T V x,

as εx̂ − x ∈ ker A and V ⪰_ker A 0 by assumption. Rearranging the terms and dividing by ε > 0 yields

    0 < 2 x̂^T V x − ε x̂^T V x̂ ≤ x^T V x / ε

for all ε sufficiently small. Consequently, x^T V x > 0.
4 The class of matrix support functionals σD(A,B)
We are finally positioned to develop the properties of the support functional σD(A,B)
defined in (3), where A ∈ Rp×n and B ∈ Rp×m are such that rge B ⊂ rge A. We
begin by establishing a closed form expression for σD(A,B) . The key fact used in
this derivation is Theorem 3.2.
Theorem 4.1. Let A ∈ Rp×n and B ∈ Rp×m be such that rge B ⊂ rge A, and let D(A, B) be given by (4). Then

    σD(A,B)(X, V) =  (1/2) tr ( [X; B]^T M(V)† [X; B] )   if rge [X; B] ⊂ rge M(V) and V ⪰_ker A 0,
                     +∞                                    else.

In particular,

    dom σD(A,B) = dom ∂σD(A,B) = { (X, V) ∈ E | rge [X; B] ⊂ rge M(V), V ⪰_ker A 0 },

with this set being closed. Moreover, we have

    int (dom σD(A,B)) = { (X, V) ∈ E | V ≻_ker A 0 }.
Proof. By direct computation, writing u^i, x^i, b^i for the i-th columns of U, X, B,

    σD(A,B)(X, V) = sup_{(U,W) ∈ D(A,B)} ⟨(X, V), (U, W)⟩
                  = sup_{U : AU = B} { tr (X^T U) − (1/2) tr (U U^T V) }
                  = sup_{U : AU = B} { tr (X^T U) − (1/2) tr ( ( Σ_{i=1}^m u^i (u^i)^T ) V ) }
                  = sup_{U : AU = B} Σ_{i=1}^m { (x^i)^T u^i − (1/2) (u^i)^T V u^i }
                  = − Σ_{i=1}^m inf_{u : Au = b^i} { (1/2) u^T V u − (x^i)^T u }                                        (8)
                  =  Σ_{i=1}^m (1/2) [x^i; b^i]^T M(V)† [x^i; b^i]   if [x^i; b^i] ∈ rge M(V) (i = 1, . . . , m) and V ⪰_ker A 0, and +∞ else
                  =  (1/2) tr ( [X; B]^T M(V)† [X; B] )   if rge [X; B] ⊂ rge M(V) and V ⪰_ker A 0, and +∞ else.

Here, the sixth equation exploits Theorem 3.2. This establishes the representation for σD(A,B) as well as for its domain. In addition, since

    ∂σD(A,B)(X, V) = argmax { ⟨(X, V), (U, W)⟩ | (U, W) ∈ D(A, B) },

Theorem 3.2 also yields the equivalence dom σD(A,B) = dom ∂σD(A,B).
In order to see that dom σD(A,B) is closed, let {(X_k, V_k) ∈ dom σD(A,B)} → (X, V). In particular, rge X_k ⊂ rge [V_k A^T] and V_k ⪰_ker A 0. These properties are preserved by passing to the limit; hence (X, V) ∈ dom σD(A,B), i.e., dom σD(A,B) is closed.
It remains to prove the expression for int (dom σD(A,B)). First we show that O := { (X, V) ∈ E | V ≻_ker A 0 } is open. For this purpose, let (X, V) ∈ O. Put q := rank A and let P ∈ Rn×(n−q) be such that its columns form a basis of ker A, so that P^T V P is positive definite. Using the fact that Sn++ = int Sn+ (e.g., see [3, Ch. 1, Ex. 1]), there exists an ε > 0 such that

    ‖W − V‖ ≤ ε   ⇒   P^T W P ≻ 0   ∀W ∈ Sn.

Hence, B_ε(X, V) ⊂ O, so that O is open.
Next, we show that O ⊂ dom σD(A,B). Again let (X, V) ∈ O, so that V ≻_ker A 0. By Corollary 3.5, rge [V A^T] = Rn and so rge X ⊂ rge [V A^T]. Hence (X, V) ∈ dom σD(A,B), yielding O ⊂ dom σD(A,B). All in all, since O is open and O ⊂ dom σD(A,B), we have O ⊂ int (dom σD(A,B)).
We now show the reverse inclusion. Let (X, V) ∈ dom σD(A,B) \ O. Then V is positive semidefinite, but not positive definite, on ker A. Therefore, there exists x ∈ ker A \ {0} such that x^T V x = 0. Define

    V_k := V − (1/k) I ∈ Sn   (k ∈ N).

Then, for all k ∈ N, x^T V_k x = −‖x‖^2 / k < 0, so that (X, V_k) ∉ dom σD(A,B) for all k ∈ N while (X, V_k) → (X, V). Hence (X, V) ∈ bd (dom σD(A,B)), that is, (X, V) ∉ int (dom σD(A,B)). Consequently, O ⊃ int (dom σD(A,B)).
Our next goal is to compute the subdifferential of σD(A,B). For this purpose, we make use of an alternative representation of the closed convex hull of D(A, B), which we now develop. Set

    F(A, B) := { (d, Z) ∈ R^{κ+1} × Rn×m(κ+1) | d ≥ 0, ‖d‖ = 1, AZ_i = d_i B (i = 1, . . . , κ + 1) },                  (9)

where, for (d, Z) ∈ F(A, B), we interpret Z as a block matrix of the form

    Z = [Z_1, . . . , Z_{κ+1}],   Z_i ∈ Rn×m (i = 1, . . . , κ + 1).
Lemma 4.2. For the set D(A, B) defined in (4), we have

    conv D(A, B) = { ( Z(d ⊗ I_m), −(1/2) Z Z^T ) | (d, Z) ∈ F(A, B) : Z_i = 0 (i ∉ supp d) }.                          (10)

Proof. Recall that dim E = κ; hence, by Carathéodory's Theorem (see Section 2.1), every element of conv D(A, B) can be represented as a convex combination of no more than κ + 1 elements of D(A, B).
In order to see the ⊂-inclusion, let (Y, −(1/2) W) ∈ conv D(A, B). Then there exist Y_i ∈ Rn×m with AY_i = B (i = 1, . . . , κ + 1) and λ ∈ Δ_{κ+1} such that

    Y = Σ_{i=1}^{κ+1} λ_i Y_i   and   W = Σ_{i=1}^{κ+1} λ_i Y_i Y_i^T.

For i = 1, . . . , κ + 1, define d_i := √λ_i and Z_i := √λ_i Y_i. With Z = [Z_1, Z_2, . . . , Z_{κ+1}] and d = (d_1, . . . , d_{κ+1})^T, we have

    ‖d‖^2 = Σ_{i=1}^{κ+1} λ_i = 1   and   AZ_i = d_i B   (i = 1, . . . , κ + 1).

That is,

    (d, Z) ∈ F(A, B)   and   Z_i = 0   (i ∉ supp d).

Moreover, it follows that

    Z(d ⊗ I_m) = Σ_{i=1}^{κ+1} d_i Z_i = Σ_{i=1}^{κ+1} λ_i Y_i = Y

and

    Z Z^T = Σ_{i=1}^{κ+1} Z_i Z_i^T = Σ_{i=1}^{κ+1} λ_i Y_i Y_i^T = W.

Hence, conv D(A, B) is contained in the set on the right-hand side of (10).
To show the reverse inclusion, suppose that (Z(d ⊗ I_m), −(1/2) Z Z^T) is an element of the set on the right-hand side of (10), and define

    λ_i := d_i^2   and   Y_i := (1/d_i) Z_i  if i ∈ supp d,   Y_i := A†B  else   (i = 1, . . . , κ + 1).

Then, in particular, λ ∈ Δ_{κ+1} and AY_i = B for all i = 1, . . . , κ + 1. Finally, we have

    Σ_{i=1}^{κ+1} λ_i Y_i = Σ_{i ∈ supp d} d_i Z_i = Σ_{i=1}^{κ+1} d_i Z_i = Z(d ⊗ I_m)

and

    Σ_{i=1}^{κ+1} λ_i Y_i Y_i^T = Σ_{i ∈ supp d} Z_i Z_i^T = Σ_{i=1}^{κ+1} Z_i Z_i^T,

where we exploit the fact that Z_i = 0 if i ∉ supp d. Hence

    ( Z(d ⊗ I_m), −(1/2) Z Z^T ) ∈ conv D(A, B),

which concludes the proof.
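The change of variables in this proof is easy to check numerically. The sketch below (illustrative only; it assumes NumPy) draws feasible matrices Y_i with AY_i = B, forms a random convex combination, and confirms that the pair (d, Z) with d_i = √λ_i and Z_i = √λ_i Y_i reproduces the point (Σ λ_i Y_i, −(1/2) Σ λ_i Y_i Y_i^T) through Z(d ⊗ I_m) and Z Z^T.

# Informal check of the parametrization used in Lemma 4.2 (assumes NumPy).
import numpy as np

rng = np.random.default_rng(5)
n, m, p, N = 4, 2, 2, 5                     # N plays the role of kappa+1 here
A = rng.standard_normal((p, n))
B = A @ rng.standard_normal((n, m))         # rge B subset of rge A
Adag = np.linalg.pinv(A)

# Feasible matrices Y_i with A Y_i = B: particular solution plus an element of ker A.
Ys = [Adag @ B + (np.eye(n) - Adag @ A) @ rng.standard_normal((n, m)) for _ in range(N)]
lam = rng.dirichlet(np.ones(N))             # a point of the unit simplex

d = np.sqrt(lam)
Z = np.hstack([np.sqrt(l) * Y for l, Y in zip(lam, Ys)])    # Z = [Z_1, ..., Z_N]

Y_bar = sum(l * Y for l, Y in zip(lam, Ys))
W_bar = sum(l * Y @ Y.T for l, Y in zip(lam, Ys))
assert np.allclose(Z @ np.kron(d[:, None], np.eye(m)), Y_bar)   # Z (d x I_m) = sum lam_i Y_i
assert np.allclose(Z @ Z.T, W_bar)                              # Z Z^T = sum lam_i Y_i Y_i^T
assert all(np.allclose(A @ Z[:, i*m:(i+1)*m], d[i] * B) for i in range(N))
print("Lemma 4.2 parametrization verified on a random instance")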
Next, we compute the closed convex hull of D(A, B).
Proposition 4.3. For the set D(A, B) defined in (4), we have

    cl conv D(A, B) = { ( Z(d ⊗ I_m), −(1/2) Z Z^T ) | (d, Z) ∈ F(A, B) }.                                              (11)
Proof. The ⊂-inclusion follows as soon as we establish that the set on the right-hand side is closed (as it obviously contains conv D(A, B)). For this purpose, suppose (W, −(1/2) H) ∈ E is such that there is a sequence {(Z_k (d^k ⊗ I_m), −(1/2) Z_k Z_k^T)} → (W, −(1/2) H) with (d^k, Z_k) ∈ F(A, B) for all k ∈ N. Let U_k Σ_k Q_k^T = Z_k be the singular value decomposition of Z_k with U_k^T U_k = I_n and Q_k^T Q_k = I_{m(κ+1)} for all k ∈ N. For each k ∈ N, define Σ̂_k ∈ Rn×n to be the non-negative diagonal matrix comprised of the first n columns of Σ_k. Note that the diagonal of Σ̂_k contains all of the nonzero singular values of Z_k. Since U_k Σ̂_k^2 U_k^T = Z_k Z_k^T → H with U_k orthogonal, it must be the case that {Σ_k} is bounded. Hence, with no loss in generality, there exist (U, Q, Σ, d) ∈ Rn×n × R^{m(κ+1)×m(κ+1)} × R_+^{n×m(κ+1)} × R_+^{κ+1}, with U and Q orthogonal, Σ diagonal, and ‖d‖_2 = 1, such that (U_k, Q_k, Σ_k, d^k) → (U, Q, Σ, d). In particular, it holds that Z_k → Z := U Σ Q^T, Z Z^T = H, W = Z(d ⊗ I_m) and, clearly, AZ_i = d_i B for all i = 1, . . . , κ + 1. This shows that (W, −(1/2) H) is an element of the set on the right-hand side of (11), and so this set is closed.
To prove the reverse inclusion, let (d, Z) ∈ F(A, B). We are done once we find a sequence {(d^k, Z^k) ∈ F(A, B)} → (d, Z) such that Z_i^k = 0 if d_i^k = 0, since then (Z^k(d^k ⊗ I_m), −(1/2) Z^k (Z^k)^T) ∈ conv D(A, B) by Lemma 4.2. To this end, define the index set

    Q := { i ∈ {1, . . . , κ + 1} | d_i = 0, Z_i ≠ 0 },

and put q := |Q|. If q = 0 (i.e., Q = ∅), the definitions d^k := d and Z^k := Z for all k ∈ N will suffice. Otherwise, choose i_0 ∈ supp d (which is nonempty, as ‖d‖ = 1). Then define

    d_i^k :=  0                                  if d_i = 0, Z_i = 0,
              ( √(2k−1) / (√q k) ) d_{i_0}       if i ∈ Q,
              ( (k−1)/k ) d_{i_0}                if i = i_0,
              d_i                                if i ∈ supp d \ {i_0},

and

    Z_i^k :=  Z_i                                if d_i = 0, Z_i = 0,
              d_i^k A†B + Z_i                    if i ∈ Q,
              ( d_{i_0}^k / d_{i_0} ) Z_{i_0}    if i = i_0,
              Z_i                                if i ∈ supp d \ {i_0}.

Then d^k → d and Z^k → Z. Also, we have AZ_i^k = d_i^k B for all i = 1, . . . , κ + 1, and Z_i^k = Z_i = 0 if d_i^k = 0. In addition,

    ‖d^k‖^2 = Σ_{i ∈ Q} ( (2k−1)/(q k^2) ) d_{i_0}^2 + ( (k−1)^2/k^2 ) d_{i_0}^2 + Σ_{i ∈ supp d \ {i_0}} d_i^2
            = ( (2k−1)/k^2 ) d_{i_0}^2 + ( (k−1)^2/k^2 ) d_{i_0}^2 + Σ_{i ∈ supp d \ {i_0}} d_i^2
            = d_{i_0}^2 + Σ_{i ∈ supp d \ {i_0}} d_i^2
            = Σ_{i ∈ supp d} d_i^2
            = ‖d‖^2
            = 1.

Hence {(d^k, Z^k)} ⊂ F(A, B) with the additional property required by Lemma 4.2 and (d^k, Z^k) → (d, Z), which shows that (Z(d ⊗ I_m), −(1/2) Z Z^T) ∈ cl conv D(A, B) and concludes the proof.
As a simple consequence of the bi-conjugacy relationship in (5), we obtain the following corollary.
Corollary 4.4. For A ∈ Rp×n and B ∈ Rp×m with rge B ⊂ rge A, the conjugate of the support functional σD(A,B) for the set D(A, B) from (4) is given by

    (σD(A,B))* = δ_cl conv D(A,B),

where cl conv D(A, B) is provided by Proposition 4.3.
To describe the subdifferential of σD(A,B) , we first compute the normal cone of the
set F(A, B) defined in (9). Our approach uses the following preparatory lemmas.
Lemma 4.5. Let F := F(0, 0) be defined through (9) and let (d, Z) ∈ F. Then

    N_F(d, Z) = ( span d + N_1 × · · · × N_{κ+1} ) × {0},   where N_i := {0} if d_i > 0 and N_i := R_− if d_i = 0,

and F is regular.
Proof. Let (d, Z) ∈ F = ( S^κ ∩ R^{κ+1}_+ ) × Rn×m(κ+1). Due to [16, Prop. 6.41], we have

    N_F(d, Z) = N_{S^κ ∩ R^{κ+1}_+}(d) × N_{Rn×m(κ+1)}(Z).

Clearly, N_{Rn×m(κ+1)}(Z) = {0}, and by invoking [16, Ex. 6.8], we find that S^κ is regular with

    N_{S^κ}(d) = span d.

Again by [16, Prop. 6.41] in conjunction with [16, Ex. 6.10], we have

    N_{R^{κ+1}_+}(d) = N_{[0,∞)}(d_1) × · · · × N_{[0,∞)}(d_{κ+1}) = N_1 × · · · × N_{κ+1},   with N_i as above.

In particular, we have the following implication:

    [ v ∈ N_{S^κ}(d),  w ∈ N_{R^{κ+1}_+}(d),  v + w = 0 ]   =⇒   v = w = 0.

The desired expression for N_{S^κ ∩ R^{κ+1}_+}(d) now follows from [16, Th. 6.42], since S^κ is regular (being a smooth manifold) and so is R^{κ+1}_+ (as a convex set). This result also tells us that S^κ ∩ R^{κ+1}_+ is regular; hence F is also regular (cf. [16, Prop. 6.41]).

Lemma 4.6. For A ∈ Rp×n and B ∈ Rp×m, consider the linear operator J : R^{κ+1} × Rn×m(κ+1) → Rp×m(κ+1) defined by

    J(d, Z) := ( AZ_1 − d_1 B, . . . , AZ_{κ+1} − d_{κ+1} B ).

The adjoint J* : Rp×m(κ+1) → R^{κ+1} × Rn×m(κ+1) (w.r.t. the inner products introduced on matrix product-spaces in Section 2.2) is given by

    J*(W) = ( −( tr (B^T W_1), . . . , tr (B^T W_{κ+1}) )^T,  [A^T W_1, . . . , A^T W_{κ+1}] ).

Proof. Let (d, Z) and W be arbitrary elements of R^{κ+1} × Rn×m(κ+1) and Rp×m(κ+1), respectively, having conformal decompositions d ∈ R^{κ+1}, Z = [Z_1, . . . , Z_{κ+1}] with Z_i ∈ Rn×m (i = 1, . . . , κ + 1), and W = [W_1, . . . , W_{κ+1}] with W_i ∈ Rp×m (i = 1, . . . , κ + 1). Then

    ⟨W, J(d, Z)⟩ = Σ_{i=1}^{κ+1} tr ( W_i^T (AZ_i − d_i B) )
                 = Σ_{i=1}^{κ+1} tr ( (A^T W_i)^T Z_i ) − Σ_{i=1}^{κ+1} d_i tr (B^T W_i)
                 = ⟨ ( −( ⟨B, W_1⟩, . . . , ⟨B, W_{κ+1}⟩ )^T,  [A^T W_1, . . . , A^T W_{κ+1}] ),  (d, Z) ⟩,

which proves the result.
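The adjoint formula can be confirmed numerically by checking the defining identity ⟨W, J(d, Z)⟩ = ⟨J*(W), (d, Z)⟩ on random data. The sketch below (illustrative only; it assumes NumPy) does this for a small instance, using the product-space inner products of Section 2.2.

# Informal check of the adjoint formula in Lemma 4.6 (assumes NumPy).
import numpy as np

rng = np.random.default_rng(6)
n, m, p, N = 3, 2, 2, 4                      # N plays the role of kappa+1
A = rng.standard_normal((p, n))
B = rng.standard_normal((p, m))

def J(d, Z):
    return [A @ Z[i] - d[i] * B for i in range(N)]

def J_adjoint(W):
    return (-np.array([np.trace(B.T @ W[i]) for i in range(N)]),
            [A.T @ W[i] for i in range(N)])

d = rng.standard_normal(N)
Z = [rng.standard_normal((n, m)) for _ in range(N)]
W = [rng.standard_normal((p, m)) for _ in range(N)]

JdZ = J(d, Z)
lhs = sum(np.trace(W[i].T @ JdZ[i]) for i in range(N))           # <W, J(d,Z)>
s, S = J_adjoint(W)
rhs = s @ d + sum(np.trace(S[i].T @ Z[i]) for i in range(N))     # <J*(W), (d,Z)>
assert np.isclose(lhs, rhs)
print("adjoint identity <W, J(d,Z)> = <J*(W), (d,Z)> verified")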
We now establish a normal cone formula for the set F(A, B).
Proposition 4.7. Let A ∈ Rp×n and B ∈ Rp×m be such that rge B ⊂ rge A. Then

    N_{F(A,B)}(d, Z) = N_F(d, Z) + rge J*   ∀(d, Z) ∈ F(A, B),

with F and J given by Lemmas 4.5 and 4.6, respectively.
Proof. We first note that

    F(A, B) = F ∩ ker J.

Moreover, F is regular by Lemma 4.5, and so is ker J as a subspace. The assertion will follow from [16, Th. 6.42] as soon as we establish the implication

    (r, R) + (s, S) = 0   ⇒   (r, R) = (s, S) = 0                                                                       (12)

for all (r, R) ∈ N_F(d, Z) and (s, S) ∈ N_{ker J}(d, Z). For these purposes, first note that N_{ker J}(d, Z) = rge J*; see [16, Ex. 6.8]. Now, let (r, R) ∈ N_F(d, Z) and (s, S) ∈ N_{ker J}(d, Z) be such that (r, R) + (s, S) = 0. By the representation of J* from Lemma 4.6, we know that there exists W ∈ Rp×m(κ+1) such that

    s = −( tr (B^T W_1), . . . , tr (B^T W_{κ+1}) )^T   and   S = [A^T W_1, . . . , A^T W_{κ+1}].

Moreover, from the representation of N_F(d, Z) in Lemma 4.5, we know that R = 0, hence S = 0, i.e., A^T W_i = 0 for all i = 1, . . . , κ + 1. This, in turn, implies that

    rge W_i ⊂ ker A^T ⊂ ker B^T   ∀i = 1, . . . , κ + 1,

and thus B^T W_i = 0 for all i = 1, . . . , κ + 1. This immediately implies s = 0, hence r = 0, which proves the implication in (12) and concludes the proof.
A formula for the subdifferential ∂σD(A,B) follows.
Theorem 4.8. Let A ∈ Rp×n and B ∈ Rp×m be such that rge B ⊂ rge A. Then, for all (X, V) ∈ dom σD(A,B), the subdifferential of σD(A,B) at (X, V) is given by

    ∂σD(A,B)(X, V) = { ( Σ_{i=1}^{κ+1} d̄_i Z̄_i,  −(1/2) Σ_{i=1}^{κ+1} Z̄_i Z̄_i^T ) |
                       [Z̄_i; W̄_i] = d̄_i M(V)† [X; B] + [N_{1i}; N_{2i}]  (i = 1, 2, . . . , κ + 1)
                       for some W̄, N_2 ∈ Rp×m(κ+1) and N_1 ∈ Rn×m(κ+1)
                       with (d̄, Z̄) ∈ F(A, B) and rge [N_1; N_2] ⊂ ker M(V) }.

Moreover, σD(A,B) is continuously differentiable on int (dom σD(A,B)).
Proof. Let (X, V) ∈ dom σD(A,B), i.e., rge [X; B] ⊂ rge M(V) and V is positive semidefinite on ker A. Then

    (Ȳ, W̄) ∈ ∂σD(A,B)(X, V)
    ⇔ (Ȳ, W̄) ∈ cl conv D(A, B)   and   (X, V) ∈ N_{cl conv D(A,B)}(Ȳ, W̄)
    ⇔ (Ȳ, W̄) ∈ cl conv D(A, B)   and   0 ≥ ⟨(X, V), (Y, W) − (Ȳ, W̄)⟩   ∀(Y, W) ∈ cl conv D(A, B)
    ⇔ ∃(d̄, Z̄) ∈ F(A, B) :  Ȳ = Σ_{i=1}^{κ+1} d̄_i Z̄_i,  W̄ = −(1/2) Σ_{i=1}^{κ+1} Z̄_i Z̄_i^T,  and
        0 ≥ Σ_{i=1}^{κ+1} tr ( X^T (d_i Z_i − d̄_i Z̄_i) ) − (1/2) Σ_{i=1}^{κ+1} tr ( V (Z_i Z_i^T − Z̄_i Z̄_i^T) )   ∀(d, Z) ∈ F(A, B)
    ⇔ ∃(d̄, Z̄) ∈ argmin_{(d,Z) ∈ F(A,B)} f(d, Z) :  Ȳ = Σ_{i=1}^{κ+1} d̄_i Z̄_i,  W̄ = −(1/2) Σ_{i=1}^{κ+1} Z̄_i Z̄_i^T,

where f : R^{κ+1} × Rn×m(κ+1) → R is defined by

    f(d, Z) := (1/2) Σ_{i=1}^{κ+1} tr (Z_i^T V Z_i) − Σ_{i=1}^{κ+1} d_i tr (X^T Z_i).

Here, the first equivalence uses [16, Cor. 8.25], and the third one exploits the description of cl conv D(A, B) from Proposition 4.3.
The necessary optimality conditions for (d̄, Z̄) to be a minimizer of f over F(A, B) are

    0 ∈ f'(d̄, Z̄) + N_{F(A,B)}(d̄, Z̄)   with (d̄, Z̄) ∈ F(A, B),                                                           (13)

which, invoking Proposition 4.7 (as well as Lemma 4.5 and Lemma 4.6), reads: there exist μ̄ ∈ R and W̄ ∈ Rp×m(κ+1) such that

    tr ( [X; B]^T [Z̄_i; W̄_i] ) = μ̄ d̄_i        (d̄_i > 0),
    tr ( [X; B]^T [Z̄_i; W̄_i] ) ≤ 0            (d̄_i = 0),
    M(V) [Z̄_i; W̄_i] = d̄_i [X; B]              (i = 1, . . . , κ + 1),                                                   (14)
    d̄ ∈ S^κ ∩ R^{κ+1}_+.

The third condition gives

    [Z̄_i; W̄_i] = d̄_i M(V)† [X; B] + [N_{1i}; N_{2i}]   with   rge [N_{1i}; N_{2i}] ⊂ ker M(V),                          (15)

for some N_{1i} ∈ Rn×m and N_{2i} ∈ Rp×m (i = 1, . . . , κ + 1). In particular, as ker M(V) ⊂ ker [X; B]^T, we get

    tr ( [X; B]^T [Z̄_i; W̄_i] ) = d̄_i tr ( [X; B]^T M(V)† [X; B] )   (i = 1, . . . , κ + 1),

which tells us that the first condition in (14) is generically fulfilled with

    μ̄ := tr ( [X; B]^T M(V)† [X; B] ).

Moreover, if d̄_i = 0, we get from the third condition in (14) that

    rge [Z̄_i; W̄_i] ⊂ ker M(V) ⊂ ker [X; B]^T = ( rge [X; B] )^⊥,

and hence

    tr ( [X; B]^T [Z̄_i; W̄_i] ) = 0   ∀i : d̄_i = 0,

i.e., the second condition in (14) is also generically fulfilled. This means that the set of all critical points of f over F(A, B) is given by

    C := { (d̄, Z̄) ∈ F(A, B) | ∃ W̄ ∈ Rp×m(κ+1) : rge ( [Z̄_i; W̄_i] − d̄_i M(V)† [X; B] ) ⊂ ker M(V) (i = 1, . . . , κ + 1) }.

Now, we pick an arbitrary critical point (d̄, Z̄) ∈ C and plug it into the objective function f. This gives

    f(d̄, Z̄) = (1/2) Σ_{i=1}^{κ+1} tr ( Z̄_i^T (V Z̄_i − d̄_i X) ) − (1/2) Σ_{i=1}^{κ+1} d̄_i tr (X^T Z̄_i)        (using V Z̄_i − d̄_i X = −A^T W̄_i)
             = −(1/2) Σ_{i=1}^{κ+1} tr ( (A Z̄_i)^T W̄_i ) − (1/2) Σ_{i=1}^{κ+1} d̄_i tr (X^T Z̄_i)               (using A Z̄_i = d̄_i B)
             = −(1/2) Σ_{i=1}^{κ+1} d̄_i ( tr (B^T W̄_i) + tr (Z̄_i^T X) )
             = −(1/2) Σ_{i=1}^{κ+1} d̄_i tr ( [X; B]^T [Z̄_i; W̄_i] )
             = −(1/2) Σ_{i=1}^{κ+1} d̄_i^2 tr ( [X; B]^T M(V)† [X; B] )
             = −(1/2) tr ( [X; B]^T M(V)† [X; B] ).

Hence, every critical point has the same function value. Theorem 4.1 tells us that ∂σD(A,B)(X, V) ≠ ∅ for (X, V) ∈ dom σD(A,B). In view of the chain of equivalences at the beginning of this proof, we know that there is a minimizer of f over F(A, B), which is in particular a critical point. Therefore, all critical points are minimizers of f over F(A, B), and the expression for the subdifferential follows.
In order to prove the smoothness assertion, we first note that, clearly, the maximal possible set of continuous differentiability of σD(A,B) is int (dom σD(A,B)). Moreover, we recall the well-known fact that a proper, lsc, convex function is continuously differentiable at a point in the interior of its domain if and only if the subdifferential there is a singleton, in which case it consists of the gradient (cf., e.g., [16, Th. 9.18]). Furthermore, we have the following chain of equivalences:

    ∂σD(A,B)(X, V) is a singleton   ⇐⇒   ∃ U ⊂ Rp : ker M(V) = {0}^n × U
                                    ⇐⇒   V ≻_ker A 0
                                    ⇐⇒   (X, V) ∈ int (dom σD(A,B)).

Here, the first equivalence is obtained from the representation of ∂σD(A,B) proven above. The second equivalence is due to Corollary 3.5 b), bearing in mind that (X, V) ∈ dom σD(A,B), hence V ⪰_ker A 0. The third equivalence simply uses the representation of int (dom σD(A,B)) given in Theorem 4.1.
Corollary 4.9. Let A ∈ Rp×n and B ∈ Rp×m be such that rge B ⊂ rge A and rank A = p. Then σD(A,B) is continuously differentiable on int (dom σD(A,B)) with

    σD(A,B)'(X, V) = ( Y(X, V),  −(1/2) Y(X, V) Y(X, V)^T )   ∀(X, V) ∈ int (dom σD(A,B)),

where

    Y(X, V) := A†B + P (P^T V P)^{-1} P^T [X − V A†B]

and P ∈ Rn×(n−p) is such that its columns form an orthonormal basis of ker A.
Proof. Apply the inversion formula from Proposition 3.3 to the subdifferential of σD(A,B) from Theorem 4.8, bearing in mind that Σ_{i=1}^{κ+1} d̄_i^2 = 1 for (d̄, Z̄) ∈ F(A, B).
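The derivative formula lends itself to a finite-difference sanity check. The sketch below (illustrative only; it assumes NumPy) evaluates σD(A,B) through the closed form of Theorem 4.1 and compares the directional derivative predicted by Corollary 4.9 with a central difference at a point with V ≻ 0.

# Informal finite-difference check of the derivative formula in Corollary 4.9 (assumes NumPy).
import numpy as np

rng = np.random.default_rng(7)
n, m, p = 5, 3, 2
A = rng.standard_normal((p, n))
B = A @ rng.standard_normal((n, m))
X = rng.standard_normal((n, m))
G = rng.standard_normal((n, n)); V = G @ G.T + np.eye(n)        # V > 0, so (X,V) is interior

def sigma(X, V):
    M = np.block([[V, A.T], [A, np.zeros((p, p))]])
    XB = np.vstack([X, B])
    return 0.5 * np.trace(XB.T @ np.linalg.pinv(M) @ XB)

# Derivative predicted by Corollary 4.9.
_, _, Vt = np.linalg.svd(A)
P = Vt[p:, :].T                                                 # orthonormal basis of ker A
Adag = np.linalg.pinv(A)
Y = Adag @ B + P @ np.linalg.inv(P.T @ V @ P) @ P.T @ (X - V @ Adag @ B)
grad_X, grad_V = Y, -0.5 * Y @ Y.T

# Central finite difference along a random direction (dX, dV) with dV symmetric.
dX = rng.standard_normal((n, m))
dV = rng.standard_normal((n, n)); dV = (dV + dV.T) / 2
t = 1e-6
fd = (sigma(X + t * dX, V + t * dV) - sigma(X - t * dX, V - t * dV)) / (2 * t)
pred = np.trace(grad_X.T @ dX) + np.trace(grad_V @ dV)
assert np.isclose(fd, pred, rtol=1e-4)
print("finite-difference derivative matches Corollary 4.9:", pred)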
5 Applications
5.1 The optimal value function v
Equation (8) in the proof of Theorem 4.1 tells us that the optimal value function
v defined in (2) satisfies v = −σD(A,b) . As a consequence we have the following
theorem.
Theorem 5.1. Let v : Rn × Sn → R̄ be the optimal value function for the constrained quadratic optimization problem

    v(x, V) := inf_{u ∈ Rn} { (1/2) u^T V u − x^T u | Au = b },

where A ∈ Rp×n and b ∈ Rp are such that b ∈ rge A. Then

    v(x, V) = −σD(A,b)(x, V)
            = −(1/2) [x; b]^T M(V)† [x; b]   if [x; b] ∈ rge M(V) and V ⪰_ker A 0,   and −∞ else.                       (16)

That is, v is concave and upper semicontinuous. In addition,

    dom v = dom ∂v = { (x, V) ∈ Rn × Sn | [x; b] ∈ rge M(V) and V ⪰_ker A 0 },

where ∂v is the concave subdifferential of v. The function v is continuously differentiable on

    int (dom v) = { (x, V) ∈ Rn × Sn | [x; b] ∈ rge M(V) and V ≻_ker A 0 },

and, if rank A = p, then, for all (x, V) ∈ int (dom v), v'(x, V) = (−y, (1/2) y y^T) with y = A†b + P (P^T V P)^{-1} P^T (x − V A†b), where P ∈ Rn×(n−p) is any matrix whose columns form an orthonormal basis for ker A.
Remark 5.2. Theorem 5.1 refers to the concave subdifferential of a concave function φ, which is given by ∂φ(ū) := { g | φ(u) ≤ φ(ū) + ⟨g, u − ū⟩ ∀u } = −∂(−φ)(ū), where ∂(−φ) is the convex subdifferential of the convex function −φ.
Proof. The equivalence (16) is an immediate consequence of (8). Consequently, the remaining statements in the theorem follow from Theorem 4.1 and Corollary 4.9.
5.2 The matrix-fractional function and its generalization
Consider the function γ : E → R ∪ {+∞} given by
    γ(X, V) :=  (1/2) tr (X^T V† X)   if V ⪰ 0 and rge X ⊂ rge V,
                +∞                     else.                                                                            (17)
For the case m = 1 (recall that E = Rn×m ×Sn ), this function has been investigated
under the moniker pseudo matrix-fractional function [7, Ex. 3.5.0.0.2]. We refer
to γ as the generalized matrix-fractional function when m > 1. It is related to the
general fractional-quadratic matrix inequality introduced in [1, p. 155].
Theorem 4.1 tells us that γ is the support functional σD(0,0) . This yields a series
of immediate consequences.
Theorem 5.3. The function γ defined in (17) is the support functional of the set D := D(0, 0) defined by (4), i.e., γ = σD. In particular, γ is sublinear (hence convex) and lsc. Moreover,

    dom γ = dom ∂γ = { (X, V) ∈ E | V ⪰ 0, rge X ⊂ rge V },

which is closed, with

    int (dom γ) = { (X, V) ∈ E | V ≻ 0 }.
In addition, for (X, V) ∈ dom γ, we have

    ∂γ(X, V) = { ( Σ_{i=1}^{κ+1} d̄_i Z̄_i,  −(1/2) Σ_{i=1}^{κ+1} Z̄_i Z̄_i^T ) |
                 (d̄, Z̄) ∈ F, N ∈ Rn×m(κ+1) with rge N ⊂ ker V such that Z̄_i = d̄_i V† X + N_i },

and γ is continuously differentiable exactly on int (dom γ), with

    γ'(X, V) = ( V^{-1} X,  −(1/2) V^{-1} X X^T V^{-1} )   ∀(X, V) ∈ int (dom γ).

Proof. Set A = 0 and B = 0 in Theorem 4.1 and Theorem 4.8, respectively, to get all assertions other than the explicit formula for the derivative. In order to prove the latter, let (X, V) ∈ int (dom γ). In particular, this implies that ker V = {0} and V† = V^{-1}. Hence, for d̄ ∈ S^κ ∩ R^{κ+1}_+ and Z̄_i = d̄_i V^{-1} X, we have

    Σ_{i=1}^{κ+1} d̄_i ( d̄_i V^{-1} X ) = V^{-1} X

and

    −(1/2) Σ_{i=1}^{κ+1} ( d̄_i V^{-1} X )( d̄_i V^{-1} X )^T = −(1/2) V^{-1} X X^T V^{-1}.

Therefore,

    ∂γ(X, V) = { ( V^{-1} X,  −(1/2) V^{-1} X X^T V^{-1} ) } = { γ'(X, V) },

which concludes the proof.
Next, we consider the function φ : E → R ∪ {+∞} defined by

    φ(X, V) :=  (1/2) tr (X^T V^{-1} X)   if V ≻ 0,
                +∞                         else.                                                                        (18)
Clearly, this function is closely related to the function γ defined above with dom φ =
Rn×m × Sn++ = int (dom γ) and γ|dom φ = φ|dom φ . In the case where m = 1, φ is
called the matrix-fractional function, e.g., see [2, Ex. 3.4] or [7, Ex. 3.5.0.0.4]. In
the remainder of this section, we show that γ = cl φ. We begin with the following
well-known result which can be found for example in [2, App. A.5.5] or [10, Th.
16.1] with the latter containing a proof.
Lemma 5.4 (Schur complement). Let S ∈ Sn, T ∈ Sm, and R ∈ Rn×m. Then

    [S, R; R^T, T] ⪰ 0   ⇐⇒   [ S ⪰ 0,  rge R ⊂ rge S,  T − R^T S† R ⪰ 0 ].
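A quick numerical illustration of this equivalence (informal; it assumes NumPy) is to build a block matrix satisfying the three conditions on the right and confirm that it is positive semidefinite, and to observe that pushing T below R^T S† R destroys positive semidefiniteness.

# Informal illustration of the Schur complement characterization in Lemma 5.4 (assumes NumPy).
import numpy as np

rng = np.random.default_rng(8)
n, m = 4, 3
F = rng.standard_normal((n, 2))
S = F @ F.T                                   # S >= 0 with rank 2, so S is singular
R = S @ rng.standard_normal((n, m))           # guarantees rge R subset of rge S
Sdag = np.linalg.pinv(S)
T = R.T @ Sdag @ R + np.eye(m)                # Schur complement T - R^T S^dagger R = I >= 0

block = np.block([[S, R], [R.T, T]])
assert np.linalg.eigvalsh(block).min() >= -1e-10     # block matrix is positive semidefinite

T_bad = R.T @ Sdag @ R - 0.1 * np.eye(m)      # Schur complement now has negative eigenvalues
block_bad = np.block([[S, R], [R.T, T_bad]])
assert np.linalg.eigvalsh(block_bad).min() < 0
print("Schur complement conditions correctly certify (in)definiteness")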
Proposition 5.5. For the functions γ from (17) and φ from (18) we have:
a) epi γ = { (X, V, α) | ∃ Y ∈ Sm : [V, X; X^T, Y] ⪰ 0, (1/2) tr (Y) ≤ α }.
b) epi φ = { (X, V, α) | ∃ Y ∈ Sm : [V, X; X^T, Y] ⪰ 0, V ≻ 0, (1/2) tr (Y) ≤ α }.
Moreover, epi γ = cl (epi φ), or equivalently, γ = cl φ. Therefore, epi φ is a convex cone, i.e., φ is sublinear.
Remark 5.6. An immediate consequence of part a) of this proposition is that epi γ
is semidefinite representable, e.g., see [1, p. 144].
Proof. a) First note that

    Y ⪰ X^T V† X   ⇒   tr (Y) ≥ tr (X^T V† X).                                                                          (19)

We have

    (X, V, α) ∈ epi γ ⇔ V ⪰ 0, rge X ⊂ rge V, (1/2) tr (X^T V† X) ≤ α
                      ⇔ ∃ Y ∈ Sm : V ⪰ 0, rge X ⊂ rge V, Y ⪰ X^T V† X, (1/2) tr (Y) ≤ α
                      ⇔ ∃ Y ∈ Sm : [V, X; X^T, Y] ⪰ 0, (1/2) tr (Y) ≤ α,

where the second equivalence follows from (19) and the fact that we may take Y = X^T V† X, and the final equivalence follows from Lemma 5.4.
b) Analogous to a), with the additional condition V ≻ 0 in each step.
In order to show that cl (epi φ) = epi γ, we first note that, as γ is lsc, epi γ is closed. Moreover, since obviously epi φ ⊂ epi γ, the inclusion cl (epi φ) ⊂ epi γ follows immediately.
In order to see the converse inclusion, let (X, V, α) ∈ epi γ. Now, set V_k := V + (1/k) I_n ≻ 0 and put α_k := α + γ(X, V_k) − γ(X, V) for all k ∈ N. Then, by definition, we have

    α_k = α + γ(X, V_k) − γ(X, V) ≥ γ(X, V_k) = φ(X, V_k),

i.e., (X, V_k, α_k) ∈ epi φ for all k ∈ N. Clearly, V_k → V with V_k nonsingular, and so V_k^{-1} X → V† X (e.g., see [15, p. 153]). Consequently, φ(X, V_k) = γ(X, V_k) → γ(X, V), so that α_k → α. Therefore, (X, V_k, α_k) → (X, V, α), and so epi γ = cl (epi φ), i.e., γ = cl φ.
Also, since γ is a support functional (by Theorem 5.3), hence in particular sublinear and lsc, epi γ is a closed convex cone. Moreover, as

    epi φ = epi γ ∩ K,

with K := Rn×m × Sn++ × R being a convex cone, epi φ is also a convex cone. Hence, φ is also sublinear.
5.3 The relationship between σD(A,B) and the nuclear norm
There is a fascinating relationship between σD(A,B) and the nuclear norm, which was first suggested to us by results in [13] for the case (A, B) = (0, 0), where it is used to smooth the nuclear norm. The key idea is to consider optimization problems of the form

    min_{V ∈ Sn}  ⟨V, Ŵ⟩ + σD(A,B)(X, V),                                                                               (20)

where X ∈ Rn×m and Ŵ ∈ Sn are given and fixed. In this section we consider one possible choice for the matrix Ŵ (described below).
To begin with, let X ∈ Rn×m, A ∈ Rp×n, and B ∈ Rp×m be given with rge B ⊂ rge A and m ≥ n. Set k := dim (ker A), and define P ∈ Rn×k to be any matrix whose columns form an orthonormal basis for ker A, so that P P^T is the orthogonal projection onto ker A. Let P P^T X have the singular value decomposition

    P P^T X = [U_1, U_2, U_3] [ Σ            0_{t×(k−t)}      0_{t×(n−k)}      0_{t×(m−n)}     ]
                              [ 0_{(k−t)×t}  0_{(k−t)×(k−t)}  0_{(k−t)×(n−k)}  0_{(k−t)×(m−n)} ] [Q_1, Q_2, Q_3, Q_4]^T
                              [ 0_{(n−k)×t}  0_{(n−k)×(k−t)}  0_{(n−k)×(n−k)}  0_{(n−k)×(m−n)} ]
             = U_1 Σ Q_1^T,                                                                                             (21)

where t := rank (P P^T X), Σ := diag (σ_1, σ_2, . . . , σ_t) with σ_1, σ_2, . . . , σ_t the singular values of P P^T X, U := [U_1, U_2, U_3] and Q := [Q_1, Q_2, Q_3, Q_4] are orthogonal, and Û := [U_1, U_2] is chosen so that its columns form an orthonormal basis for the kernel of A. Set P̂ := Û Û^T, so that

    P P^T = P̂ P̂^T = U_1 U_1^T + U_2 U_2^T.                                                                              (22)

Finally, define

    Ŵ := (1/2) (A†B + Û Q̂^T)(A†B + Û Q̂^T)^T,                                                                            (23)

where Q̂ := [Q_1, Q_2]. The matrix Û Q̂^T establishes an isometry between the k-dimensional subspaces ker A and rge Q̂, where ker A = rge Û. We have the following result.
Theorem 5.7. Let X, A, B, P, Σ, U, Q, and Ŵ be as above with m ≥ n. Then the optimal value in (20) is

    ⟨X, A†B⟩ + ‖P P^T X‖_n,                                                                                             (24)

and is attained at the matrix V̂ := U_1 Σ U_1^T. In particular, if rge X ⊂ ker A, then the optimal value is ‖X‖_n; while, on the other hand, if rge X ⊂ rge A^T = (ker A)^⊥, then the optimal value is ⟨X, A†B⟩.
Proof. We begin by showing that the matrix V̂ := U_1 Σ U_1^T solves (20). For this it is helpful to keep in mind certain key relationships between the matrices defined above. First, recall that AA† is the orthogonal projection onto rge A and A†A is the orthogonal projection onto rge A^T, i.e., A†A = I − P P^T. Moreover, rge A† = (ker A)^⊥, which, in particular, implies that P^T A† = 0, U_1^T A† = 0, and U_2^T A† = 0. In addition, we use the fact that V̂† = U_1 Σ^{-1} U_1^T (see the discussion following (6)), as well as the relations V̂ = P P^T V̂ P P^T and V̂† = P P^T V̂† P P^T, which follow from the identity U_1 = P P^T U_1.
We claim that

    M(V̂)† = K := [V̂†, A†; (A†)^T, 0].

By (6), we need to show that

    M(V̂) K M(V̂) = M(V̂)   and   K M(V̂) K = K                                                                            (25)

and

    (M(V̂) K)^T = M(V̂) K   and   (K M(V̂))^T = K M(V̂).                                                                   (26)

First observe that

    M(V̂) K = [ U_1ΣU_1^T U_1Σ^{-1}U_1^T + A^T(A†)^T,  U_1ΣU_1^T A† ;  A U_1Σ^{-1}U_1^T,  A A† ]
            = [ U_1U_1^T + A†A,  0 ;  0,  A A† ]
            = [ I − U_2U_2^T,  0 ;  0,  A A† ],

and

    K M(V̂) = [ U_1Σ^{-1}U_1^T U_1ΣU_1^T + A†A,  U_1Σ^{-1}U_1^T A^T ;  (A†)^T U_1ΣU_1^T,  (A†)^T A^T ]
            = [ U_1U_1^T + A†A,  0 ;  0,  A A† ]
            = [ I − U_2U_2^T,  0 ;  0,  A A† ].

This establishes (26). To see (25), write

    K M(V̂) K = [ U_1Σ^{-1}U_1^T (I − U_2U_2^T),  A† A A† ;  (A†)^T (I − U_2U_2^T),  0 ]
             = [ V̂†,  A† ;  (A†)^T,  0 ]
             = K

and

    M(V̂) K M(V̂) = [ (I − U_2U_2^T) V̂,  (I − U_2U_2^T) A^T ;  A A† A,  0 ]
                 = [ V̂,  A^T ;  A,  0 ]
                 = M(V̂).

Since the problem (20) is convex, one need only show that V̂ satisfies the first-order necessary condition for optimality, i.e., 0 ∈ Ŵ + ∂_V σD(A,B)(X, V̂). By Theorem 4.8 and [16, Cor. 10.11], these conditions read:

    ∃ W̄ ∈ Rp×m(κ+1), (d̄, Z̄) ∈ F(A, B), and (N̄_1, N̄_2) ∈ Rn×m(κ+1) × Rp×m(κ+1) with rge [N̄_1; N̄_2] ⊂ ker M(V̂)          (27)

such that

    [Z̄_i; W̄_i] = d̄_i M(V̂)† [X; B] + [N̄_{1i}; N̄_{2i}]   (i = 1, 2, . . . , κ + 1)                                        (28)

and

    (A†B + Û Q̂^T)(A†B + Û Q̂^T)^T = Σ_{i=1}^{κ+1} Z̄_i Z̄_i^T.                                                             (29)

We now construct a suitable W̄, (d̄, Z̄), and [N̄_1; N̄_2]. Set

    d̄_1 := 1,   d̄_i := 0,   [N̄_{11}; N̄_{21}] := [U_2 Q_2^T; 0],   [N̄_{1i}; N̄_{2i}] := [0; 0]   (i = 2, 3, . . . , κ + 1),

and define Z̄ and W̄ using (28). This gives

    [Z̄_1; W̄_1] = M(V̂)† [X; B] + [U_2 Q_2^T; 0] = [ V̂†X + A†B + U_2Q_2^T ; (A†)^T X ] = [ A†B + Û Q̂^T ; (A†)^T X ]

and

    [Z̄_i; W̄_i] = [0; 0]   (i = 2, 3, . . . , κ + 1).

In particular, this implies that (29) holds. Clearly, 0 ≤ d̄ and ‖d̄‖ = 1, and A Z̄_1 = A A† B = B, so that (d̄, Z̄) ∈ F(A, B) and (27) is satisfied. Hence V̂ is indeed a solution to (20).
We now compute the optimal value in (20), which is given by

    ⟨Ŵ, V̂⟩ + (1/2) tr ( [X; B]^T M(V̂)† [X; B] ).

We have

    ⟨Ŵ, V̂⟩ = (1/2) tr ( (A†B + Û Q̂^T)(A†B + Û Q̂^T)^T U_1 Σ U_1^T )
            = (1/2) tr ( (A†B + Û Q̂^T)( B^T (A†)^T U_1 Σ U_1^T + Q_1 Σ U_1^T ) )
            = (1/2) tr ( (A†B + Û Q̂^T) Q_1 Σ U_1^T )
            = (1/2) tr ( A†B Q_1 Σ U_1^T ) + (1/2) tr ( U_1 Σ U_1^T )
            = (1/2) tr ( U_1^T A†B Q_1 Σ ) + (1/2) tr (Σ)
            = (1/2) ‖P P^T X‖_n

and

    tr ( [X; B]^T M(V̂)† [X; B] ) = tr ( [X; B]^T [ V̂†X + A†B ; (A†)^T X ] )
                                 = tr ( X^T V̂† X ) + 2 tr ( X^T A†B )
                                 = tr ( X^T P P^T V̂† P P^T X ) + 2 tr ( X^T A†B )
                                 = tr ( Q_1 Σ Q_1^T ) + 2 tr ( X^T A†B )
                                 = ‖P P^T X‖_n + 2 tr ( X^T A†B ).

Hence the optimal value is given by (24). Finally, note that if rge X ⊂ ker A, then P P^T X = X, so ⟨X, A†B⟩ = ⟨P P^T X, A†B⟩ = ⟨X, P P^T A†B⟩ = 0.
By applying Theorem 5.7 to γ = σD(0,0) with Ŵ = (1/2) I, we obtain the following result.
Corollary 5.8 ([13, Lemma 1]). Let X ∈ Rn×m with m ≥ n, and let γ : E → R ∪ {+∞} be defined in (17). Then

    ‖X‖_n = min_{V ∈ Sn}  (1/2) tr (V) + γ(X, V),

with the minimum value attained at V̂ := U_1 Σ U_1^T, where X = U_1 Σ Q_1^T is the reduced SVD for X.
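This representation is straightforward to test numerically. The sketch below (illustrative only; it assumes NumPy) compares ‖X‖_n with the value of (1/2) tr(V̂) + γ(X, V̂) at the stated minimizer V̂ = U_1 Σ U_1^T, and checks that a few random positive definite V give a value at least as large.

# Informal check of the nuclear-norm representation in Corollary 5.8 (assumes NumPy).
import numpy as np

rng = np.random.default_rng(9)
n, m = 4, 6                                    # m >= n as required
X = rng.standard_normal((n, m))

def gamma(X, V):
    # Generalized matrix-fractional function (17), here used with V >= 0 and rge X in rge V.
    return 0.5 * np.trace(X.T @ np.linalg.pinv(V) @ X)

U, s, Qt = np.linalg.svd(X, full_matrices=False)
V_hat = U @ np.diag(s) @ U.T                   # the claimed minimizer U1 Sigma U1^T
val_hat = 0.5 * np.trace(V_hat) + gamma(X, V_hat)
assert np.isclose(val_hat, np.linalg.norm(X, "nuc"))

for _ in range(20):                            # other feasible V should not do better
    G = rng.standard_normal((n, n))
    V = G @ G.T + 1e-3 * np.eye(n)
    assert 0.5 * np.trace(V) + gamma(X, V) >= val_hat - 1e-9
print("nuclear norm representation of Corollary 5.8 verified")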
This representation of the nuclear norm is used in [13] as a means to introduce
a smoothing term for optimization problems involving a nuclear norm regularizer.
Indeed, Theorem 5.3 shows that, on the interior of its domain, γ is a smooth function
with an easily computed derivative. Using the more general formulation developed
here, it is possible to derive smoothers for more general types of regularizers. We
illustrate one such possibility in the following corollary.
Corollary 5.9. Let X ∈ Rp×p and let P ∈ Rp×k be any matrix having orthonormal columns. Define A := I − P P^T and B := λ(I − P P^T)X. With this specification, the matrix Ŵ in (23) is given by Ŵ = (1/2)(B + Û Q̂^T)(B + Û Q̂^T)^T, and the optimal value in (20) is

    λ ‖(I − P P^T) X‖_F^2 + ‖P P^T X‖_n.

Proof. Simply observe that A† = A and A†B = λ(I − P P^T)X = B, and apply Theorem 5.7 with n = p = m.
6 Final Remarks
We have introduced a new class of support functionals σD(A,B), where

    D(A, B) = { ( Y, −(1/2) Y Y^T ) | AY = B }.
It is shown that these support functionals have a natural connection to the optimal
value function for quadratics over affine manifolds. Connections are also made to the
matrix-fractional function and the pseudo matrix-fractional function as well as their
generalizations. It is remarkable that all of these objects can be represented by the
support functional of D(A, B) for an appropriate choice of A and B. These support
functional representations had not previously been observed. In addition, we have
computed the closed convex hull of D(A, B), which allowed us to compute the
subdifferential of σD(A,B) . This computation made use of nontrivial techniques
from nonconvex variational analysis. Finally, we obtained a general representation
theorem that can be used to give alternative representations for the nuclear norm and
related regularizers in matrix optimization. These representations are particularly
useful in that they can be used to develop smoothers for the regularization terms.
References
[1] A. Ben-Tal and A. Nemirovski: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization, SIAM, Philadelphia, 2001.
[2] S. Boyd and L. Vandenberghe: Convex Optimization. Cambridge University Press, 2004.
[3] J.M. Borwein and A.S. Lewis: Convex Analysis and Nonlinear Optimization. Theory and Examples. CMS Books in Mathematics, Springer-Verlag, New York, 2000.
[4] J.V. Burke: Nonlinear Optimization. Lecture Notes, Math 408, University of Washington, Spring 2014.
[5] J.V. Burke and A.Y. Aravkin: Smoothing dynamic systems with state-dependent covariance matrices. To appear in the Conference on Decision and Control (CDC), 2014 (6 pages).
[6] Y. Chabrillac and J.-P. Crouzeix: Definiteness and semidefiniteness of quadratic forms. Linear Algebra and its Applications 63, 1984, pp. 283–292.
[7] J. Dattorro: Convex Optimization & Euclidean Distance Geometry. Mεβoo Publishing USA, Version 2014.04.08, 2005.
[8] M. Faliva and M.G. Zoia: On a partitioned inversion formula having useful applications in econometrics. Econometric Theory 18, 2002, pp. 525–530.
[9] P. Finsler: Über das Vorkommen definiter und semidefiniter Formen und Scharen quadratischer Formen. Commentarii Mathematici Helvetici 9, 1937, pp. 188–192.
[10] J. Gallier: Geometric Methods and Applications: For Computer Science and Engineering. Texts in Applied Mathematics, Springer, New York, 2011.
[11] G.H. Golub and C.F. Van Loan: Matrix Computations. Johns Hopkins Series in the Mathematical Sciences, The Johns Hopkins University Press, Baltimore, Maryland, 1983.
[12] R.A. Horn and C.R. Johnson: Matrix Analysis. Cambridge University Press, New York, NY, 1985.
[13] C.-J. Hsieh and P. Olsen: Nuclear Norm Minimization via Active Subspace Selection. JMLR W&CP 32(1), 2014, pp. 575–583.
[14] S.-J. Kim and S. Boyd: A Minimax Theorem with Applications to Machine Learning, Signal Processing, and Finance. SIAM Journal on Optimization 19(3), 2008, pp. 1344–1367.
[15] J.R. Magnus and H. Neudecker: Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons, New York, NY, 1988.
[16] R.T. Rockafellar and R.J.-B. Wets: Variational Analysis. A Series of Comprehensive Studies in Mathematics, Vol. 317, Springer, Berlin, Heidelberg, 1998.
University of Washington, Department of Mathematics, Box 354350, Seattle, Washington 98195-4350
E-mail address: burke@math.washington.edu
University of Würzburg, Institute of Mathematics, Campus Hubland Nord, Emil-Fischer-Straße 30, 97074 Würzburg, Germany
E-mail address: hoheisel@mathematik.uni-wuerzburg.de