ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS WITH APPLICATIONS

JAMES V. BURKE AND TIM HOHEISEL

Abstract. A new class of matrix support functionals is presented which establishes a connection between optimal value functions for quadratic optimization problems, the matrix-fractional function, the pseudo matrix-fractional function, and the nuclear norm. The support functional is based on the graph of the product of a matrix with its transpose. Closed form expressions for the support functional and its subdifferential are derived. In particular, the support functional is shown to be continuously differentiable on the interior of its domain, and a formula for the derivative is given when it exists.

2010 Mathematics Subject Classification. 46N10, 49J52, 49J53, 90C20, 90C46.
Key words and phrases. Support function, quadratic programming, value function, matrix-fractional function, nuclear norm, subdifferential, normal cone.
This research was partially supported by the DFG (Deutsche Forschungsgemeinschaft) under grant HO 4739/2-1.

1 Introduction

One of the first topics studied in an elementary course on optimization is that of optimizing a quadratic function over an affine set:
\[
  \min_{u \in \mathbb{R}^n} \ \tfrac12 u^T V u - x^T u \quad \text{s.t.} \quad Au = b, \tag{1}
\]
where (x, V, A, b) ∈ R^n × S^n × R^{p×n} × R^p, with S^n the linear space of real symmetric n × n matrices. A complete characterization of the set of solutions and of the optimal value is easily obtained (see Theorem 3.2 below) without the aid of calculus. Less well studied are the properties of the optimal value function
\[
  v(x, V) := \inf_{u \in \mathbb{R}^n} \left\{ \tfrac12 u^T V u - x^T u \,\middle|\, Au = b \right\} \tag{2}
\]
for given (A, b) ∈ R^{p×n} × R^p with b ∈ rge A. The variational properties of v, and of its extensions, are easily derived as a consequence of the more general results of this paper. In particular, we show that v is a concave function and, more specifically, that it is the negative of a convex support functional on R^n × S^n. In addition, we show that −v equals the (pseudo) matrix-fractional function when (A, b) = (0, 0) [2, 5, 7, 13, 14] (see Section 5.2).

The central object of study is the support functional σD(A,B) : E → R̄ := R ∪ {±∞} given by
\[
  \sigma_{D(A,B)}(X, V) := \sup \left\{ \langle (X, V), (U, W) \rangle \,\middle|\, (U, W) \in D(A, B) \right\}, \tag{3}
\]
where E := R^{n×m} × S^n, rge B ⊂ rge A, and
\[
  D(A, B) := \left\{ \left( Y, -\tfrac12 Y Y^T \right) \in E \,\middle|\, Y \in \mathbb{R}^{n \times m} : AY = B \right\}. \tag{4}
\]
Here, we use the Frobenius inner product for matrix spaces (see Section 2.2). The set D(A, B) is the graph of the mapping Y ↦ −½ Y Y^T over the affine manifold {Y | AY = B}. This mapping is a common transformation in numerous applications in statistics and semidefinite programming (e.g., see [1, 2]).

In Section 4, we obtain a closed form expression for σD(A,B) in terms of X, V, A, and B, a description of its domain and its interior, a characterization of the closed convex hull of the set D(A, B), and a characterization of the convex subdifferential ∂σD(A,B). In particular, we show that σD(A,B) is differentiable on the interior of its domain and provide a formula for the derivative. By taking m = 1, we obtain the representation v = −σD(A,b), where v is the optimal value function defined in (2) (see Theorem 5.1). In addition, we find that σD(0,0) is precisely the pseudo matrix-fractional function studied in [2, 5, 7, 13, 14]. The pseudo matrix-fractional function coincides with the matrix-fractional function when V is positive definite (see Theorem 5.3). Our derivation of the subdifferential ∂σD(A,B) makes use of techniques from nonconvex, nonsmooth variational analysis. For this reason, our study begins in Section 2 by recalling the necessary background material from convex and nonsmooth variational analysis [3, 16] as well as matrix analysis [12]. In Section 3, we review the elementary results concerning the problem (1), reformulating them in a manner more suitable for our study. These results are the key to our analysis. The core results of the paper are presented in Section 4, where we develop the properties of the support functional σD(A,B). In Section 5 we present three applications of the results of Section 4. The first of these is the representation of the optimal value function v described above. The second is the application to the matrix-fractional function and its generalizations. The final application establishes an interesting relationship between σD(A,B) and the nuclear norm. In particular, we recover the relationship discussed in [13] which is used to obtain smooth approximations to the nuclear norm in optimization.
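To make the objects in (1)–(4) concrete, the following small NumPy sketch (added by us; the helper name, random data, and tolerances are ours and not part of the paper) evaluates the value function v(x, V) from (2) by solving the first-order (KKT) system for (1), and checks the result against a direct minimization over the feasible affine set {u | Au = b} = u0 + ker A.

```python
import numpy as np

def qp_value(x, V, A, b):
    """Optimal value of min 0.5 u'Vu - x'u s.t. Au = b,
    assuming a minimizer exists (here V is positive definite)."""
    n, p = V.shape[0], A.shape[0]
    # KKT system: [V A'; A 0] [u; y] = [x; b]
    M = np.block([[V, A.T], [A, np.zeros((p, p))]])
    z = np.linalg.pinv(M) @ np.concatenate([x, b])
    u = z[:n]
    return 0.5 * u @ V @ u - x @ u, u

rng = np.random.default_rng(1)
n, p = 6, 2
A = rng.standard_normal((p, n))
b = A @ rng.standard_normal(n)            # ensures b lies in rge A
G = rng.standard_normal((n, n))
V = G @ G.T + np.eye(n)                   # positive definite, so a minimizer exists
x = rng.standard_normal(n)

val, u = qp_value(x, V, A, b)

# Brute-force check: parametrize the feasible set as u0 + N c with the
# columns of N spanning ker A, and minimize the unconstrained quadratic in c.
u0 = np.linalg.pinv(A) @ b
_, _, Vh = np.linalg.svd(A)
N = Vh[p:, :].T                           # columns span ker A (A has full row rank here)
c = np.linalg.solve(N.T @ V @ N, N.T @ (x - V @ u0))
u_check = u0 + N @ c
print(np.isclose(val, 0.5 * u_check @ V @ u_check - x @ u_check))
```

The bordered coefficient matrix of the KKT system is exactly the matrix M(V) introduced in (7) below, and Theorem 3.2 gives the closed form behind this computation.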
Notation: The support of a vector d ∈ R^p is the index set supp d := {i ∈ {1, ..., p} | d_i ≠ 0}. The unit simplex in R^p is given by
\[
  \Delta_p := \left\{ \lambda \in \mathbb{R}^p \,\middle|\, \lambda_i \ge 0 \ (i = 1, \dots, p),\ \sum_{i=1}^p \lambda_i = 1 \right\},
\]
while the unit sphere in R^p is the set S^{p−1} := {x ∈ R^p | ‖x‖_2 = 1}. For a set B ⊂ R^p its orthogonal complement is defined by B^⊥ := {x ∈ R^p | x^T b = 0 ∀ b ∈ B}. Its linear hull or span is the set
\[
  \operatorname{span} B := \left\{ x \in \mathbb{R}^p \,\middle|\, \exists\, n \in \mathbb{N},\ b_i \in B,\ \alpha_i \in \mathbb{R}\ (i = 1, \dots, n) : x = \sum_{i=1}^n \alpha_i b_i \right\}.
\]
For two sets A, B ⊂ R^n their Minkowski sum is the set A + B := {a + b | a ∈ A, b ∈ B}. Moreover, if A = {a} is a singleton, we loosely write a + B := {a} + B.

2 Preliminaries

2.1 Tools for variational analysis in Euclidean space

Let (E, ⟨·, ·⟩) be a Euclidean space, i.e., a finite dimensional real inner product space. We denote the norm on E induced by the inner product ⟨·, ·⟩ by ‖·‖, i.e., ‖x‖ := √⟨x, x⟩ for all x ∈ E. In particular, we equip E = R^n with the standard scalar product, giving ‖·‖ = ‖·‖_2.

For an extended real-valued function f : E → R̄ its epigraph is the set epi f := {(x, α) ∈ E × R | f(x) ≤ α}, and its domain is the set dom f := {x ∈ E | f(x) < +∞}. The notion of the epigraph allows for very handy definitions of a number of properties of extended real-valued functions. A function f : E → R̄ is called lower semicontinuous (lsc) (or closed) if epi f is a closed set, and upper semicontinuous (usc) if −f is lsc. The lower semicontinuous hull of f, denoted cl f, is the function whose epigraph is cl(epi f). f is said to be convex if epi f is a convex set, and concave if −f is convex. Moreover, f is said to be proper if dom f ≠ ∅ and f > −∞.

The (convex) indicator function of a set C ⊂ E, δ_C : E → R ∪ {+∞}, is given by
\[
  \delta_C(x) := \begin{cases} 0 & \text{if } x \in C, \\ +\infty & \text{if } x \notin C. \end{cases}
\]
The indicator function δ_C is convex if and only if C is convex, and δ_C is lsc if and only if C is closed. The set C is a cone if λC ⊂ C for all λ ≥ 0. It is a convex cone if, in addition, C + C ⊂ C.

The (convex) subdifferential of a function f : E → R̄ at a point x̄ ∈ dom f is the set
\[
  \partial f(\bar x) := \{ v \in E \mid f(x) \ge f(\bar x) + \langle v, x - \bar x \rangle \ \forall x \in E \},
\]
with dom ∂f := {x̄ | ∂f(x̄) ≠ ∅}. The (convex) conjugate f* : E → R̄ of f is defined by
\[
  f^*(y) := \sup_{x \in \operatorname{dom} f} \{ \langle x, y \rangle - f(x) \}.
\]
We call a function f : E → R̄ positively homogeneous if 0 ∈ dom f and f(λv) = λ f(v) for all v ∈ E and λ > 0. If, in addition, f(v + w) ≤ f(v) + f(w) for all v, w ∈ E, we say that f is sublinear. Note that f is positively homogeneous if and only if epi f is a cone, in which case either f(0) = 0 or f(0) = −∞. Moreover, f is sublinear if and only if it is convex and positively homogeneous, which holds if and only if epi f is a convex cone [16, Ex. 3.19].
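As a simple illustration of these definitions (an added example, not taken from the original text): for the closed unit ball C := {x ∈ E | ‖x‖ ≤ 1}, the conjugate of its indicator is
\[
  \delta_C^*(y) = \sup_{\|x\| \le 1} \langle x, y \rangle = \|y\|,
\]
which is proper, lsc, and sublinear, while for a linear subspace L ⊂ E one obtains
\[
  \delta_L^*(y) = \sup_{x \in L} \langle x, y \rangle = \begin{cases} 0 & \text{if } y \in L^{\perp}, \\ +\infty & \text{else,} \end{cases}
  \qquad \text{i.e., } \delta_L^* = \delta_{L^{\perp}}.
\]
Conjugates of indicator functions of this kind are precisely the support functionals recalled next.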
For C ⊂ E, its convex hull is defined to be the set
\[
  \operatorname{conv} C := \bigcap \{ D \subset E \mid C \subset D,\ D \text{ convex} \}.
\]
By Carathéodory's Theorem, e.g., see [3, Sec. 2.3, Ex. 5], we can represent conv C as
\[
  \operatorname{conv} C = \left\{ x \in E \,\middle|\, \exists\, \lambda \in \Delta_{\kappa+1},\ c_1, \dots, c_{\kappa+1} \in C : x = \sum_{i=1}^{\kappa+1} \lambda_i c_i \right\},
\]
where κ := dim E. Furthermore, the closed convex hull of C is the closure of the convex hull of C, i.e., \overline{\operatorname{conv}}\, C := cl(conv C). The function
\[
  \sigma_C(v) := \delta_C^*(v) = \sup_{w \in C} \langle v, w \rangle
\]
is called the support functional for C; it is always sublinear (hence convex), and it is lsc and proper when C is nonempty, cf., e.g., [3, Sec. 4.2, Ex. 9]. The bi-conjugacy theorem [3, Th. 4.2.1] gives
\[
  \sigma_C^* = \delta_{\overline{\operatorname{conv}}\, C}. \tag{5}
\]
Given a set C ⊂ E, the regular normal cone of C at x̄ ∈ C is the set
\[
  \hat N_C(\bar x) := \left\{ d \in E \,\middle|\, \limsup_{\substack{x \to \bar x,\ x \in C,\ x \ne \bar x}} \frac{\langle d, x - \bar x \rangle}{\| x - \bar x \|} \le 0 \right\},
\]
and the limiting normal cone of C at x̄ ∈ C is the set
\[
  N_C(\bar x) := \left\{ v \in E \,\middle|\, \exists\, \{x^k \in C\} \to \bar x,\ v^k \in \hat N_C(x^k) : v^k \to v \right\},
\]
i.e., the limiting normal cone is the outer limit of the regular normal cone in the sense of Painlevé–Kuratowski, cf. [16, Def. 5.4]. We call a closed set C ⊂ E (Clarke) regular at a point x̄ ∈ C if N_C(x̄) = N̂_C(x̄). If this holds true for every point x̄ ∈ C, we simply say C is (Clarke) regular. In particular, every closed convex set D ⊂ E is regular and
\[
  N_D(\bar x) = \{ v \in E \mid \langle v, x - \bar x \rangle \le 0 \ \forall x \in D \} = \partial \delta_D(\bar x) \quad \forall \bar x \in D.
\]

2.2 Tools from matrix analysis

A general reference for the material in this section is [12]. For a matrix A ∈ R^{p×m} we denote the ith column vector (i = 1, ..., m) by a_i ∈ R^p, i.e., A = [a_1, ..., a_m]. The range of A is the set rge A := {Ax ∈ R^p | x ∈ R^m} = span{a_1, ..., a_m}, and the kernel or null space is ker A := {x ∈ R^m | Ax = 0}. The following basic subspace relationships are elementary and used freely throughout:
\[
  \ker A = (\operatorname{rge} A^T)^{\perp} \quad \text{and} \quad \operatorname{rge} A = (\ker A^T)^{\perp}.
\]
The trace of a square matrix M ∈ R^{n×n} is the sum of its diagonal elements, and we write tr(M) = Σ_{i=1}^n m_{ii}. If λ_1(M), ..., λ_n(M) are the eigenvalues of M, counting multiplicities, then tr(M) = Σ_{i=1}^n λ_i(M).

The singular value decomposition (SVD) of A ∈ R^{p×m} is a factorization of the form A = U Σ̂ Q^T, where U ∈ R^{p×p} and Q ∈ R^{m×m} are orthogonal and the only nonzero entries of the matrix Σ̂ ∈ R^{p×m} are Σ̂_{ii} > 0 for i = 1, 2, ..., k := rank A. These nonzero entries, σ_1(A) ≥ σ_2(A) ≥ ··· ≥ σ_k(A), are called the singular values of A, and they equal the square roots of the nonzero eigenvalues of A^T A (or, equivalently, A A^T). The reduced SVD is the factorization A = U_1 Σ Q_1^T, where Σ ∈ R^{k×k} is the nonsingular square diagonal matrix of singular values, U_1 ∈ R^{p×k} consists of the first k columns of U, and Q_1 ∈ R^{m×k} consists of the first k columns of Q. The remaining columns of both U and Q can be given any desired but commensurate left-right ordering. The singular values can be used to define a family of norms on R^{p×m} called the Schatten norms. Of particular interest to us is the nuclear norm (or Schatten 1-norm) defined by ‖A‖_n = tr Σ.

For A ∈ R^{p×n}, the matrix S is called the Moore–Penrose pseudoinverse of A if and only if
\[
  SA = (SA)^T, \quad AS = (AS)^T, \quad ASA = A, \quad \text{and} \quad SAS = S. \tag{6}
\]
We denote the Moore–Penrose pseudoinverse of A by A† (cf. [11, p. 139] or [15, Ch. 2]).
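As a numerical companion to (6), the following NumPy sketch (ours, not part of the paper; the data and tolerances are arbitrary) builds the candidate S = Q1 Σ^{-1} U1^T from a reduced SVD — anticipating the closed-form expression for A† recorded in the next paragraph — checks the four Moore–Penrose conditions, compares S with a library pseudoinverse, and confirms ‖A‖_n = tr Σ.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k = 5, 4, 2                      # build a rank-deficient A in R^{p x n}
A = rng.standard_normal((p, k)) @ rng.standard_normal((k, n))

# Reduced SVD: keep only the r = rank(A) positive singular values.
U, s, Qt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-12 * s[0])
U1, Sigma, Q1 = U[:, :r], np.diag(s[:r]), Qt[:r, :].T

S = Q1 @ np.linalg.inv(Sigma) @ U1.T   # candidate pseudoinverse of A

# The four Moore-Penrose conditions (6).
checks = [
    np.allclose(S @ A, (S @ A).T),     # SA symmetric
    np.allclose(A @ S, (A @ S).T),     # AS symmetric
    np.allclose(A @ S @ A, A),         # ASA = A
    np.allclose(S @ A @ S, S),         # SAS = S
]
print(checks, np.allclose(S, np.linalg.pinv(A)))

# Nuclear norm = sum of singular values = trace of Sigma.
print(np.isclose(np.trace(Sigma), np.linalg.norm(A, "nuc")))
```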
It is straightforward to show that A† = Q1 Σ−1 U1T , where A = U1 ΣQT1 is the reduced SVD for A. In particular, A† A and AA† are the orthogonal projections onto rge AT and rge A, respectively. Consequently, given b ∈ rge A, we have {x ∈ Rn | Ax = b } = A† b + ker A, which is of particular importance in our study. For two matrices A ∈ Rr×s , B ∈ Rp×q , we define their Kronecker product by a11 B · · · a1s B .. .. . A ⊗ B := ... . . ar1 B · · · ars B The space of real symmetric matrices of dimension p × p is denoted by Sp . The subset of symmetric positive semidefinite matrices is denoted by Sp+ , and the subset of symmetric positive definite matrices is Sp++ . In addition, we use the following notation: S T :⇐⇒ S − T ∈ Sp+ , and S T :⇐⇒ S − T ∈ Sp++ . Whenever we consider a matrix (sub-)space of the form E ⊂ Rr×s we equip it with the inner product h·, ·i : E → R defined by hA, Bi := tr (AT B), which makes E a Euclidean space. The induced norm is called the Frobenius norm. This procedure is also adopted for matrix product-spaces. In particular, the space E := Rn×m × Sn is equipped with the inner product h·, ·i : E → R, defined by h(X, V ), (Y, W )i := tr (X T Y ) + tr (V W ) yielding a Euclidean space (of dimension κ := mn + n(n+1) ) with induced norm 2 k · k : E → R given by q q k(X, V )k = tr (X T X) + tr (V 2 ) = kXk2F + kV k2F , where k · kF denotes the Frobenius norm on the respective space. 3 Optimization of quadratic functions We now recall the elementary properties of the optimization problem (1) and its associated optimal value function (2). Lemma 3.1 (Solvabilty of equality constrained QPs). For (x, V, A, b) ∈ Rn × Sn × Rp×n ×Rp consider the quadratic optimization problem (1). Then (1) has a solution if and only if the following three conditions hold: 6 JAMES V. BURKE AND TIM HOHEISEL i) b ∈ rge A (i.e. (1) is feasible). ii) x ∈ rge V AT . iii) uT V u ≥ 0 ∀u ∈ ker A. If, however, b ∈ rge A, but ii) or iii) are violated, we have v(x, V, A, b) = −∞. Proof. This is a standard result, see, for example, [4, Sec. 4.3]. In the sequel, we use the following notation: V ker A 0 uT V u ≥ 0 :⇐⇒ ∀u ∈ ker A and V ker A 0 uT V u > 0 :⇐⇒ ∀u ∈ ker A \ {0}. A complete description of the optimal value function v defined in (2) is given in our next result. This is the foundation on which our discussion of the support functional σD(A,B) is based. Theorem 3.2. At (x, V ), the optimal value-function v in (2) is given by T x V − 21 b A −∞ +∞ AT 0 † x b x V AT , V ker A 0, ∈ rge b A 0 b ∈ rge A ∧ (x ∈ / rge [V AT ] ∨ V 6 ker A 0), b∈ / rge A. if if if Proof. First, if b ∈ / rge A, then (1) is infeasible, hence, by convention v(V, x, A, b) = +∞. Second, if b ∈ rge A, Lemma 3.1 tells us that if x ∈ / rge V AT or V is not positive semidefinite on ker A, wehavev(V, x, A, b) = −∞. Hence, we need only x V AT show the expression for v when ∈ rge (i.e. b ∈ rge A and b A 0 x ∈ rge V AT ) and V is positive semidefinite on ker A. Again, Lemma 3.1 tells us that, in this case, a solution to (1) exists. The first-order necessary optimality conditions at ū ∈ argmin u∈Rn 1 T u V u − xT u s.t. 2 Au = b 6= ∅ are that there exists ȳ ∈ Rp for which V A AT 0 ū x = , ȳ b or equivalently, ū V ∈ A ȳ AT 0 † x V + ker A b AT 0 . 
ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS Plugging such a pair ū ȳ 7 into the objective function yields 1 T ū V ū − xT ū = 2 1 T 1 ū (V ū − x) − xT ū | {z } 2 2 =−AT ȳ 1 1 = − |ūT{z AT} ȳ − xT ū 2 2 =bT T 1 x ū = − 2 b ȳ T 1 x V = − A 2 b AT 0 where the last equation is due to the fact that x V AT V ∈ rge = ker A 0 A b † T x , b AT 0 ⊥ . Since all such points yield the same optimal value, this concludes the proof. The matrix V AT M (V ) := (7) A 0 in the foregoing theorem plays a pivotal role in our analysis. The next result sheds some light on properties of M (V ) which is often referred to as a bordered matrix in the literature [6, 8, 15]. Proposition 3.3 (Finsler’s Lemma and bordered matrices). Let A ∈ Rp×n and V ∈ Sn . Then V ker A 0 ⇐⇒ ∃ε > 0 : V + εAT A 0. Moreover, the bordered matrix M (V ) is invertible if and only if rank A = p and V ker A 0 in which case its inverse M (V )−1 is given by P (P T V P )−1 P T I − P (P T V P )−1 P T V A† , (A† )T I − V P (P T V P )−1 P T (A† )T V P (P T V P )−1 P T V − V A† where P ∈ Rn×(n−p) is any matrix whose columns form an orthonormal basis of ker A. Proof. For the first statement see, e.g., [6, Th. 2] and for the characterization of the invertibility of M (V ) see, e.g., [6, Th. 7]. The inversion formula can be found in [8, Th. 1] and is easily verified by direct matrix multiplication. Remark 3.4. A general formula for the pseudoinverse M (V )† can be found in [15, p. 58]. The first statement in Proposition 3.3 is referred to in the literature as Finsler’s Lemma and originally goes back to [9]. Corollary 3.5. Let A ∈ Rp×n and V ∈ Sn . Then the following hold: a) V ker A 0 =⇒ rge [V AT ] = Rn . b) V ker A 0 ⇐⇒ [V ker A 0] ∧ [∃ U ⊂ Rp : ker M (V ) = {0}n × U ], in which case U = ker AT . 8 JAMES V. BURKE AND TIM HOHEISEL Proof. a) Let x ∈ Rn . Due to Finsler’s Lemma, there existsε > 0 and r ∈ Rn such that (V + εAT A)r = x. Putting s := εAr gives [V AT ] rs = V r + εAT Ar = x, which proves a). b) ‘⇒:’ Let xy ∈ ker M (V ), i.e. V x + AT y = 0 and Ax = 0. Multiplying the first condition by xT yields xT V x = 0. By assumption, as x ∈ ker A, we get x = 0. Hence, y ∈ ker AT , and thus ker M (V ) = {0}n × ker AT . The condition V ker A 0 holds trivially. ‘⇐:’ Let x ∈ ker A \ {0}. By assumption ker M (V ) = {0}n × U , and so xy ∈ / ker M (V ) for all y ∈ Rp . Thus, since Ax = 0, we must have V x + AT y 6= 0 for all y ∈ Rp or, equivalently, V x ∈ / rge AT = (ker A)⊥ . Hence, there exists x̂ ∈ ker A such that 0 < x̂T V x. We now compute that for all ε > 0, we have 0 ≤ (εx̂ − x)T V (εx̂ − x) = ε2 x̂T V x̂ − 2εx̂T V x + xT V x, as εx̂ − x ∈ ker A and V ker A 0 by assumption. Rearranging the terms and dividing by ε > 0 yields 0 < 2x̂T V x − εx̂T V x̂ ≤ xT V x ε for all ε sufficiently small. Consequently, xT V x > 0. 4 The class of matrix support functionals σD(A,B) We are finally positioned to develop the properties of the support functional σD(A,B) defined in (3), where A ∈ Rp×n and B ∈ Rp×m are such that rge B ⊂ rge A. We begin by establishing a closed form expression for σD(A,B) . The key fact used in this derivation is Theorem 3.2. Theorem 4.1. Let A ∈ Rp×n and B ∈ Rp×m such that rge B ⊂ rge A and let D(A, B) be given by (4). Then ( σD(A,B) (X, V ) = 1 2 tr X T M (V )† X B B if rge else. +∞ X B ⊂ rge M (V ), V ker A 0, In particular, dom σD(A,B) = dom ∂σD(A,B) = (X, V ) ∈ E rge X ⊂ rge M (V ), V ker A 0 , B with this set being closed. Moreover, we have int (dom σD(A,B) ) = {(X, V ) ∈ E | V ker A 0 } . 
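Before turning to the proof, a small numerical sanity check of the closed form in Theorem 4.1 may be helpful (a NumPy sketch of ours, not part of the paper; we take V positive definite and A of full row rank so that the domain conditions hold automatically, and all helper names and data are ours). The supremum defining σD(A,B)(X, V) splits over the columns of Y, and each column problem is an attained concave maximization over the affine set {y | Ay = b_i}, which can be solved directly and compared with the trace formula.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 5, 3, 2
A = rng.standard_normal((p, n))                  # full row rank (generically)
B = A @ rng.standard_normal((n, m))              # guarantees rge B inside rge A
G = rng.standard_normal((n, n))
V = G @ G.T + np.eye(n)                          # V positive definite, so M(V) is invertible
X = rng.standard_normal((n, m))

# Closed form from Theorem 4.1: 0.5 * tr([X;B]' M(V)^+ [X;B]).
M = np.block([[V, A.T], [A, np.zeros((p, p))]])
XB = np.vstack([X, B])
closed_form = 0.5 * np.trace(XB.T @ np.linalg.pinv(M) @ XB)

# Direct evaluation of the supremum over D(A,B): for each column solve
# max_y x'y - 0.5 y'Vy subject to Ay = b, via y = y0 + N c with N spanning ker A.
_, _, Vh = np.linalg.svd(A)
N = Vh[p:, :].T                                  # columns span ker A
direct = 0.0
for i in range(m):
    x, b = X[:, i], B[:, i]
    y0 = np.linalg.pinv(A) @ b
    c = np.linalg.solve(N.T @ V @ N, N.T @ (x - V @ y0))
    y = y0 + N @ c
    direct += x @ y - 0.5 * y @ V @ y

print(np.isclose(closed_form, direct))
```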
ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS 9 Proof. By direct computation, σD(A,B) (X, V ) = h(X, V ), (U, W )i sup (U,W )∈D(A,B) 1 tr (X T U ) − tr (U U T V ) 2 U : AU =B !) ( m X 1 i i T T u (u ) V = sup tr (X U ) − tr 2 U : AU =B i=1 (m ) X 1 = sup (xi )T ui − (ui )T V ui 2 U : AU =B i=1 m X 1 T i T = − u V u − (x ) u inf (8) u: Au=bi 2 i=1 ( m i T i X xi 1 x M (V )† xbi if i 2 b bi ∈ rge M (V ), V ker A 0, = +∞ else i=1 T ! X X X 1 tr M (V )† if rge ⊂ rge M (V ), V ker A 0, B B B = 2 +∞ else. = sup Here, the sixth equation exploits Theorem 3.2. This establishes the representation for σD(A,B) as well as for its domain. In addition, since ∂σD(A,B) (X, V ) = arg max {h(X, V ), (U, W )i | (U, W ) ∈ D(A, B) } , Theorem 3.2 also yields the equivalence dom σD(A,B) = dom ∂σD(A,B) . In order to see that dom σD(A,B) is closed, let {(Xk , Vk ) ∈ dom σD(A,B) } → (V, X). In particular, rge Xk ⊂ rge [Vk AT ] and Vk ker A 0. These properties are preserved by passing to the limit, hence (X, V ) ∈ dom σD(A,B) , i.e., dom σD(A,B) is closed. It remains to prove the expression for int (dom σD(A,B) ). First we show O := {(X, V ) ∈ E | V ker A 0 } is open. For this purpose, let (X, V ) ∈ O. Suppose q := rank A and let P ∈ Rn×(n−q) be such that its columns form a basis of ker A so that P T V P is positive definite. Using the fact that Sn++ = int Sn+ (e.g., see [3, Ch. 1, Ex. 1]), there exists an ε > 0 such that kW − V k ≤ ε ⇒ PTWP 0 ∀W ∈ Sn . Hence, Bε (X, V ) ⊂ O so that O is open. Next, we show that O ⊂ dom σD(A,B) . Again let (X, V ) ∈ O, so that V ker A 0. By Corollary 3.5, rge [V AT ] = Rn and so rge X ⊂ rge [V AT ]. Hence (X, V ) ∈ dom σD(A,B) yielding O ⊂ dom σD(A,B) . All in all, since O is open and O ⊂ dom σD(A,B) , we have O ⊂ int dom σD(A,B) . We now show the reverse inclusion. Let (X, V ) ∈ dom σD(A,B) \ O. Hence V is positive semidefinite, but not positive definite on ker A. Therefore, there exists x ∈ ker A \ {0} such that xT V x = 0 . Define Vk := V − 1 I ∈ Sn k (k ∈ N). 10 JAMES V. BURKE AND TIM HOHEISEL 2 Then, for all k ∈ N, xT Vk x = − kxk < 0 , so that (X, Vk ) ∈ / dom σD(A,B) for k all k ∈ N while (X, Vk ) → (X, V ). Hence (X, V ) ∈ bd dom σ D(A,B) , that is, (X, V ) ∈ / int (dom σD(A,B) ). Consequently, O ⊃ int dom σD(A,B) . Our next goal is to compute the subdifferential of σD(A,B) . For this purpose, we make use of an alternative representation of the closed convex hull of D(A, B) which we now develop. Set ) ( d ≥ 0, kdk = 1, κ+1 n×m(κ+1) , (9) F(A, B) := (d, Z) ∈ R ×R AZi = di B (i = 1, . . . , κ + 1) where, for (d, Z) ∈ F(A, B), we interprete Z as a block matrix of the form Z = [Z1 , . . . , Zκ+1 ], Zi ∈ Rn×m (i = 1, . . . , κ + 1). Lemma 4.2. For the set D(A, B) defined in (4), we have 1 convD(A, B) = Z(d ⊗ Im ), − ZZ T | (d, Z) ∈ F(A, B) : Zi = 0 (i ∈ / supp d) . 2 (10) Proof. Recall that dim E = κ, hence, by Carathéodory’s Theorem (see Section 2.1), every element of conv D(A, B) can be represented as a convex combination of no more than κ + 1 elements of D(A, B). In order to see the ⊂ −inclusion, let (Y, − 21 W ) ∈ conv D(A, B). Then there exist Yi ∈ Rn×m with AYi = B (i = 1, . . . , κ + 1) and λ ∈ ∆κ+1 such that Y = κ+1 X i=1 λi Yi and W = √ For i = 1, . . . , κ + 1 define di := λi and Zi := and d = (d1 , . . . , dκ+1 )T , we have kdk2 = κ+1 X λi = 1 √ κ+1 X λi Yi YiT . i=1 λi Yi . With Z = [Z1 , Z2 , . . . , Zκ+1 ] and AZi = di B (i = 1, . . . , κ + 1). i=1 That is, (d, Z) ∈ F(A, B) and Zi = 0 (i ∈ / supp d). 
Moreover, it follows that Z(d ⊗ Im ) = κ+1 X di Zi = i=1 κ+1 X λi Yi = Y i=1 and ZZ T = κ+1 X i=1 Zi ZiT = κ+1 X λi Yi YiT = W. i=1 Hence, conv D(A, B) is contained in the set on the right-hand side of (10). To show the reverse inclusion, suppose that (Z(d ⊗ Im ), − 21 ZZ T ) is an element of the set on the right-hand side of (10), and define 1 if i ∈ supp d, di Zi λi := d2i and Yi := (i = 1, . . . , κ + 1). A† B else ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS 11 Then, in particular, λ ∈ ∆κ+1 and AYi = B for all i = 1, . . . , κ + 1. Finally, we have κ+1 κ+1 X X X λi Yi = di Zi = di Zi = Z(d ⊗ Im ) i=1 i=1 i∈supp d and κ+1 X λi Yi YiT = i=1 X i∈supp d Zi ZiT = κ+1 X Zi ZiT , i=1 where we exploit the fact that Zi = 0 if i ∈ / supp d. Hence 1 T ∈ conv D(A, B) Z(d ⊗ Im ), − ZZ 2 which concludes the proof. Next, we compute the closed convex hull of D(A, B). Proposition 4.3. For the set D(A, B) defined in (4), we have 1 conv D(A, B) = (Z(d ⊗ Im ), − ZZ T ) | (d, Z) ∈ F(A, B) . 2 (11) Proof. The ⊂ −inclusion follows as soon as we establish that the set on the righthand side is closed (as it obviously contains conv D(A, B)): For this purpose, suppose (W, − 21 H) ∈ E is such that there is a sequence {(Zk (dk ⊗ Im ), − 21 Zk ZkT )} → (W, − 12 H) with (dk , Zk ) ∈ F(A, B) for all k ∈ N. Let Uk Σk QTk = Zk be the singular value decomposition of Zk with UkT Uk = In and QTk Qk = Im(κ+1) for all k ∈ N. For each k ∈ N, define Σ̂k ∈ Rn×n to be the non-negative diagonal matrix comprised of the first n columns of Σk . Note that the diagonal of Σ̂k contains all of the nonzero singular values of Zk . Since Uk Σ̂2k UkT = Zk ZkT → H with Uk orthogonal, it must be the case that {Σk } is bounded. Hence, with no loss in generality, there exist (κ+1) (U, Q, Σ, d) ∈ Rn×n × Rm(κ+1)×m(κ+1) × Rn×m × R+ with U and Q orthogonal, + k Σ diagonal, and kdk2 = 1 such that (Uk , Qk , Σk , d ) → (U, Q, Σ, d). In particular, it holds that Zk → Z := U ΣQT , ZZ T = H, W = Z(d ⊗ Im ) and, clearly, AZi = di B for all i = 1, . . . , κ + 1. This shows that (W, − 12 H) is an element of the set on the right-hand side of (11) and so this set is closed. To prove the reverse inclusion, let (d, Z) ∈ F(A, B). We are done once we find a sequence {(dk , Zk ) ∈ F(A, B)} → (d, Z) such that Zik = 0 if dki = 0. To this end, define the index set Q := {i ∈ {1, . . . , κ + 1} | di = 0, Zi 6= 0} , and put q := |Q|. If q = 0 (i.e. Q = ∅), the definitions dk := d and Z k := Z for all k ∈ N will suffice. Otherwise, choose i0 ∈ supp d (which is nonempty, as kdk = 1). Then define 0 if di = 0, Zi = 0, q 2k−1 di0 if i ∈ Q, q k dki := k−1 if i = i0 , k di0 di if i ∈ supp d \ {i0 } 12 JAMES V. BURKE AND TIM HOHEISEL and Zik := Zi dki A† B + Zi if if di = 0, Zi = 0, i ∈ Q, Z i0 Zi if if i = i0 , i ∈ supp d \ {i0 }. dk i0 di0 Then dk → d and Z k → Z. Also, we have AZik = dki B for all i = 1, . . . , κ + 1, and Zik = Zi = 0 if dki = 0. In addition, X 2k − 1 X (k − 1)2 2 2 kdk k2 = d + d + d2i i i 0 0 qk 2 k2 i∈Q i∈supp d\{i0 } X (k − 1)2 2 2k − 1 2 + d2i d di0 + i0 2 2 k k i∈supp d\{i0 } X 2 2 di0 + di = = i∈supp d\{i0 } X = d2i i∈supp d kdk2 = = k 1. k This shows that {(d , Z ) ∈ conv D(A, B)} → (d, Z), which gives the desired inclusion concluding the proof. As a simple consequence of the bi-conjugacy relationship in (5), we obtain the following corollary. Corollary 4.4. 
For A ∈ Rp×n and B ∈ Rp×m with rge B ∈ rge A, the conjugate of the support functional σD(A,B) for the set D(A, B) from (4) is given by ∗ σD(A,B) = δconv D(A,B) , where conv D(A, B) is provided by Proposition 4.3. To describe the subdifferential of σD(A,B) , we first compute the normal cone of the set F(A, B) defined in (9). Our approach uses the following preparatory lemmas. Lemma 4.5. Let F := F(0, 0) be defined through (9) and let (d, Z) ∈ F. Then κ+1 {0} if di > 0, NF (d, Z) = span d + X × {0}, R− . if di = 0, i=1 and F is regular. Proof. Let (d, Z) ∈ F = Sκ ∩ Rκ+1 × Rn×m(κ+1) . Due to [16, Prop. 6.41], we + have NF (d, Z) = NSκ ∩Rκ+1 (d) × NRn×m(κ+1) (Z). + Clearly, NRn×m(κ+1) (Z) = {0}, and by invoking [16, Ex. 6.8], we find that Sκ is regular with NSκ (d) = span d. Again by [16, Prop. 6.41] in conjunction with [16, Ex. 6.10], we have κ+1 κ+1 {0} if di > 0, NRκ+1 (d) = X N[0,∞) (di ) = X + R− , if di = 0. i=1 i=1 ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS 13 In particular, we have the following implication: [v ∈ NSκ (d), w ∈ NRκ+1 (d) : v + w = 0] =⇒ v = w = 0. + The desired expression for NSκ ∩Rκ+1 now follows from [16, Th. 6.42], since Sκ is + regular (being a smooth manifold) and so is Rκ+1 (as a convex set). This result also + κ+1 tells us that Sκ ∩ R+ is regular; hence F is also regular (cf. [16, Prop. 6.41]). Lemma 4.6. For A ∈ Rp×n and B ∈ Rp×m consider the linear operator J : Rκ+1 × Rn×m(κ+1) → Rp×m(κ+1) defined by J(d, Z) := (AZ1 − d1 B, . . . , AZκ+1 − dκ+1 B) . The adjoint J ∗ : Rp×m(κ+1) → Rκ+1 × Rn×m(κ+1) (w.r.t. the inner products introduced on matrix product-spaces in Section 2.2) is given by tr (B T W1 ) T .. T J ∗ (W ) = − , A W1 , . . . , A Wκ+1 . . tr (B T Wκ+1 ) Proof. Let (d, Z) and W be arbitrary elements of Rκ+1 ×Rn×m(κ+1) and Rp×m(κ+1) , respectively, having conformal decompositions d ∈ Rκ+1 , Z = [Z1 , . . . , Zκ+1 ] with Zi ∈ Rn×m , i = 1, . . . , κ + 1, and W = [W1 , . . . , Wκ+1 ] with Wi ∈ Rp×m , i = 1, . . . , κ + 1. Then hW, J(d, Z)i = κ+1 X tr WiT (AZi − di B) i=1 = κ+1 X X κ+1 di tr (B T Wi ) tr (AT Wi )Zi − i=1 = i=1 [−(hB, W1 i , . . . , hB, Wκ+1 i)T , [AT W1 , . . . , AT Wκ+1 ], (d, Z) , which proves the result. We now establish a normal cone formula for the set F(A, B). Proposition 4.7. Let A ∈ Rp×n and B ∈ Rp×m such that rge B ⊂ rge A. Then NF (A,B) (d, Z) = NF (d, Z) + rge J ∗ ∀(d, Z) ∈ F(A, B), with F and J given by Lemmas 4.5 and 4.6, respectively. Proof. We first note that F(A, B) = F ∩ ker J. Moreover, F is regular by Lemma 4.5 and so is ker J as a subspace. The assertion will follow from [16, Th. 6.42] as soon as we establish the implication (r, R) + (s, S) = 0 ⇒ (r, R) = (s, S) = 0, (12) for all (r, R) ∈ NF (d, Z) and (s, S) ∈ Nker T (d, Z). For these purposes, first note that Nker J (d, Z) = rge J ∗ , see [16, Ex. 6.8]. Now, let (r, R) ∈ NF (d, Z) and 14 JAMES V. BURKE AND TIM HOHEISEL (s, S) ∈ Nker T (d, Z) such that (r, R) + (s, S) = 0: By the representation of J ∗ from Lemma 4.6, we know that there exists W ∈ Rp×m(κ+1) such that tr (B T W1 ) T .. T s = − and S = A W1 , . . . , A Wκ+1 . . tr (B T Wκ+1 ) Moreover, from the representation of NF (d, Z) from Lemma 4.5 we know that R = 0, hence S = 0, i.e. AT Wi = 0 for all i = 1, . . . , κ + 1. This, in turn, implies that rge Wi ⊂ ker AT ⊂ ker B T ∀i = 1, · · · κ + 1, thus B T Wi = 0 for all i = 1, . . . , κ + 1 . This immediately implies s = 0, hence r = 0, which proves the implication in (12). This concludes the proof. A formula for the the subdifferential ∂σD(A,B) follows. Theorem 4.8. 
Let A ∈ Rp×n and B ∈ Rp×m such that rge B ⊂ rge A. Then, for all (X, V ) ∈ dom σD(A,B) , the subdifferential of σD(A,B) at (X, V ) is given by Z̄i N1i † X ¯ = di M (V ) + , i = 1, 2, . . . , κ + 1 ! W̄i B N2i κ+1 κ+1 X X 1 T p×m(κ+1) n×m(κ+1) d¯i Z̄i , − . Z̄i Z̄i for some W̄ , N2 ∈ R , N1 ∈ R , 2 i=1 i=1 with (d, ¯ Z̄) ∈ F(A, B) and rge N1 ⊂ ker M (V ) N2 Moreover, σD(A,B) is continuously differentiable on int (dom σD(A,B) ). Proof. Let (X, V ) ∈ dom σD(A,B) , i.e., rge X B ⊂ rge M (V ) and V is positive semidefinite on ker A. Then (Ȳ , W̄ ) ∈ ∂σD(A,B) (X, V ) ⇔ (Ȳ , W̄ ) ∈ conv D(A, B) and ⇔ (Ȳ , W̄ ) ∈ conv D(A, B) and (X, V ) ∈ Nconv D(A,B) (Ȳ , W̄ ) 0 ≥ h(X, V ), (Y, W ) − (Ȳ , W̄ )i ¯ Z̄) ∈ F(A, B) : Ȳ = ⇔ ∃(d, 0≥ κ+1 X 1 d¯i Z̄i , W̄ = − 2 i=1 1 tr X T (di Zi − d¯i Z̄i ) − 2 i=1 ¯ Z̄) ∈ ⇔ ∃(d, argmin where f : R ×R κ+1 X n×m(κ+1) f (d, V ) := κ+1 X Z̄i Z̄iT , and i=1 tr V (Zi ZiT − Z̄i Z̄iT ) ∀ (d, Z) ∈ F(A, B) i=1 f (d, Z) : Ȳ = (d,Z)∈F (A,B) κ+1 ∀(Y, W ) ∈ conv D(A, B) κ+1 X κ+1 X 1 d¯i Z̄i , W̄ = − 2 i=1 κ+1 X Z̄i Z̄iT , i=1 → R is defined by κ+1 κ+1 X 1X tr (ZiT V Zi ) − di tr (X T Zi ). 2 i=1 i=1 Here, the first equivalence uses [16, Cor. 8.25], and the third one exploits the description of conv D(A, B) from Proposition 4.3. ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS 15 ¯ Z̄) to be a minimizer of f over The necessary optimality conditions for (d, F(A, B) are ¯ Z̄) + NF (A,B) (d, ¯ Z̄) with (d, ¯ Z̄) ∈ F(A, B), 0 ∈ f 0 (d, (13) which, invoking Proposition 4.7 (as well as Lemma 4.5 and Lemma 4.6), reads ∃µ̄ ∈ R, W̄ ∈ Rp×m(κ+1) : d¯i tr tr X T Z̄i B W̄i X T Z̄i B W̄i M (V ) Z̄i W̄i = d¯i = µ̄d¯i (d¯i > 0), (d¯i = 0), ≤0 X (14) (i = 1, . . . , κ + 1), B d ∈ Sκ ∩ Rκ+1 . + The third condition gives Z̄i W̄i N1i † X ¯ = di M (V ) + B N2i with N1i rge N2i ⊂ ker M (V ), (15) for some N1i ∈ Rn×m and N2i ∈ Rp×m (i = 1, . . . , κ + 1). In particular, as T ker M (V ) ⊂ ker X , we get B tr T ! X Z̄i = d¯i tr B W̄i T ! X † X M (V ) (i = 1, . . . , κ + 1), B B which tells us that the first condition in (14) is generically fulfilled with µ̄ := tr ! T X † X M (V ) . B B Moreover, if d¯i = 0, we get from the third condition in (14) that Z̄i rge W̄i X ⊂ ker M (V ) ⊂ ker = B T !⊥ X rge , B and hence tr T ! X Z̄i = 0 ∀i : d¯i = 0, B W̄i i.e., the second condition in (14) is also generically fulfilled. This means that the set of all critical points of f over F(A, B) is given by Z̄i p×m(κ+1) † X ¯ ¯ C := (d, Z̄) ∈ F(A, B) ∃ W̄ ∈ R : rge ⊂ di M (V ) +ker M (V ) . W̄i B 16 JAMES V. BURKE AND TIM HOHEISEL ¯ Z̄) ∈ C and plug it into the objective Now, we pick an arbitrary critical point (d, function f . This gives ¯ Z̄) f (d, = κ+1 κ+1 1X 1X¯ tr (Z̄i (V Z̄i − d¯i X )) − di tr X T Z̄i {z } | 2 i=1 2 i=1 =−AT W̄i = − 1 2 κ+1 X κ+1 1X¯ tr (( AZ̄i )T W̄i ) − di tr X T Z̄i |{z} 2 i=1 i=1 =d¯i B = − 1 2 κ+1 X d¯i tr (B T W̄i ) + tr (Z̄iT X) i=1 T ! X Z̄i tr B W̄i i=1 ! κ+1 T X 1 X ¯2 † X = − d tr M (V ) B B 2 i=1 i ! T 1 X X = − tr M (V )† . 2 B B 1 = − 2 κ+1 X Hence, every critical point has the same function value. Theorem 4.1 tells us that ∂σD(A,B) (X, V ) 6= ∅ for (X, V ) ∈ dom σD(A,B) . In view of the chain of equivalences at the beginning of this proof, we know that there is a minimizer of f over F(A, B), which is in particular a critical point. Therefore, all critical points are minimizers of f over F(A, B), and the expression for the subdifferential follows. 
In order to prove the smoothness assertion, we first note that, clearly, the maximally possible set of continuous differentiability of σD(A,B) is int (dom σD(A,B) ). Moreover, we recall the well-known fact, that a proper, lsc, convex function is continuously differentiable at a point in the interior of its domain if and only if the subdifferential is a singleton, which hence comprises its gradient (cf., e.g., [16, Th. 9.18]). Furthermore, we have the following chain of equivalences: ∂σD(A,B) (V, X) is a singleton ⇐⇒ ∃ U ⊂ Rp : ker M (V ) = {0}n × U ⇐⇒ V ker A 0 ⇐⇒ (X, V ) ∈ int dom σD(A,B) . Here, the first equivalence is obtained from the representation of ∂σD(A,B) that was proven above. The second equivalence is due to Corollary 3.5 b), bearing in mind that (X, V ) ∈ dom σD(A,B) , hence V ker A 0. The third equivalence simply uses the representation of int (dom σD(A,B) ) given in Theorem 4.1. Corollary 4.9. Let A ∈ Rp×n and B ∈ Rp×m such that rge B ⊂ rge A and rank A = p. Then σD(A,B) is continuously differentiable on int (dom σD(A,B) ) with 1 0 σD(A,B) (X, V ) = Y (X, V ), − Y (X, V )Y (X, V )T ∀ (X, V ) ∈ int (dom σD(A,B) ), 2 where Y (X, V ) := A† B + P (P T V P )−1 P T [X − V A† B] and P ∈ Rn×(n−p) is such that its columns form an orthonormal basis of ker A. ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS 17 Proof. Apply the inversion formula in Proposition 3.3 to the subdifferential of Pκ+1 ¯2 ¯ σD(A,B) from Theorem 4.8, bearing in mind, that i=1 di = 1 for (d, Z̄) ∈ F(A, B). 5 Applications 5.1 The optimal value function v Equation (8) in the proof of Theorem 4.1 tells us that the optimal value function v defined in (2) satisfies v = −σD(A,b) . As a consequence we have the following theorem. Theorem 5.1. Let v : Rn × Sn → R̄ be the optimal value function for the constrained quadratic optimization problem 1 T T u V u − x u | Au = b , v(x, V ) := infn u∈R 2 where A ∈ Rp×n and b ∈ Rp are such that b ∈ rge A. Then v(x, V ) = = −σD(A,b) (x, V ) ( T − 12 xb M (V )† xb −∞ (16) if V ker A 0, else. x b ∈ rge M (V ), That is, v is concave and upper semicontinuous. In addition, x dom v = dom ∂v = (x, V ) ∈ Rn × Sn ∈ rge M (V ) and V ker A 0 , b where ∂v is the concave subdifferential entiable on int (dom v) = (x, V ) ∈ Rn × Sn of v. The function v is continuously differ x b ∈ rge M (V ) and V ker A 0 , and, if rank A = p, then, for all (x, V ) ∈ int (dom v), v 0 (x, V ) = (−y, 21 yy T ) with y = A† b + P (P T V P )−1 (x − V A† b) and P ∈ Rn×(n−p) is any matrix whose columns form an orthonormal basis for ker A. Remark 5.2. Theorem 5.1 refers to the concave subdifferential of a concave function φ which is given by ∂φ(ū) := {g | φ(u) ≤ φ(ū) + hg, u − ūi ∀u } = −∂(−φ)(ū) where ∂(−φ) is the convex subdifferential of the convex function −φ. Proof. The equivalence (16) is an immediate consequence of (8). Consequently, the remaining statements in the theorem follow from Theorem 4.1 and Corollary 4.9. 5.2 The matrix-fractional function and its generalization Consider the function γ : E → R ∪ {+∞} given by 1 T † if V 0, rge X ⊂ rge V, 2 tr X V X γ(X, V ) := +∞ else. (17) For the case m = 1 (recall that E = Rn×m ×Sn ), this function has been investigated under the moniker pseudo matrix-fractional function [7, Ex. 3.5.0.0.2]. We refer to γ as the generalized matrix-fractional function when m > 1. It is related to the general fractional-quadratic matrix inequality introduced in [1, p. 155]. 18 JAMES V. BURKE AND TIM HOHEISEL Theorem 4.1 tells us that γ is the support functional σD(0,0) . 
This yields a series of immediate consequences. Theorem 5.3. The function γ defined in (17) is the support functional of the set D := D(0, 0) defined by (4), i.e, γ = σD . In particular, γ is sublinear (hence convex) and lsc. Moreover, dom γ = dom ∂γ = {(X, V ) ∈ E | V 0, rge X ⊂ rge V } , which is closed with int (dom γ) = {(X, V ) ∈ E | V 0 } . In addition, for (X, V ) ∈ dom γ, we have ! κ+1 X X 1 ∂γ(X, V ) = d¯i Z̄i , − Z̄i Z̄iT 2 i=1 i=1 ¯ (d, Z̄) ∈ F, N ∈ Rn×m(κ+1) with rge N ⊂ ker V , such that Z̄ = d¯ V † X + N i i i and γ is continuously differentiable exactly on int (dom γ) with 1 γ 0 (X, V ) = V −1 X, − V −1 XX T V −1 ∀(X, V ) ∈ int (dom γ). 2 Proof. Set A = 0 and B = 0 in Theorem 4.1 and Theorem 4.8, respectively, to get all assertions other than the explicit formula for the derivative. In order to prove the latter, let (X, V ) ∈ int (dom γ). In particular, this implies that ker V = {0} and Z̄i = d¯i V −1 X, we have and V † = V −1 . Hence, for d¯ ∈ Sκ ∩ Rκ+1 + κ+1 X d¯i (d¯i V −1 X) = V −1 X i=1 and − κ+1 1 1 X ¯ −1 (di V X)(d¯i V −1 X)T = − V −1 XX T V −1 . 2 i=1 2 Therefore, 1 −1 T −1 −1 = {γ 0 (X, V )}, ∂γ(X, V ) = V X, − V XX V 2 which concludes the proof. Next we consider the function φ : E → R ∪ {+∞} defined by 1 T −1 X) if V 0, 2 tr (X V φ(X, V ) := +∞ else. (18) Clearly, this function is closely related to the function γ defined above with dom φ = Rn×m × Sn++ = int (dom γ) and γ|dom φ = φ|dom φ . In the case where m = 1, φ is called the matrix-fractional function, e.g., see [2, Ex. 3.4] or [7, Ex. 3.5.0.0.4]. In the remainder of this section, we show that γ = cl φ. We begin with the following well-known result which can be found for example in [2, App. A.5.5] or [10, Th. 16.1] with the latter containing a proof. Lemma 5.4 (Schur complement). Let S ∈ Sn , T ∈ Sm , R ∈ Rn×m . Then S R 0 ⇐⇒ [S 0, rge R ⊂ rge S, T − RT S † R 0] RT T ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS 19 Proposition 5.5. For the functions γ from (17) and φ from (18) we have: V X 1 a) epi γ = (X, V, α)| ∃ Y ∈ Sm : 0, tr (Y ) ≤ α . 2 XT Y V X 1 m b) epi φ = (X, V, α)| ∃ Y ∈ S : 0, V 0, 2 tr (Y ) ≤ α . XT Y Moreover, epi γ = cl (epi φ), or equivalently, γ = cl φ. Therefore, epi φ is a convex cone, i.e., φ is sublinear. Remark 5.6. An immediate consequence of part a) of this proposition is that epi γ is semidefinite representable, e.g., see [1, p. 144]. Proof. a) First note that Y X T V † X ⇒ tr (Y ) ≥ tr (X T V † X). (19) We have (X, V, α) ∈ epi γ ⇔ V 0, rge X ⊂ rge V, 1 tr (X T V † X) ≤ α 2 1 ⇔ ∃ Y ∈ Sm : V 0, rge X ⊂ rge V, Y X T V † X, tr (Y ) ≤ α 2 1 V X m ⇔ ∃Y ∈ S : 0, tr (Y ) ≤ α, XT Y 2 where the second equivalence follows from (19) and the fact that we may take Y = X T V † X, and the final equivalence follows from Lemma 5.4. b) Analogous to a) with the additional condition that V 0 in each step. In order to show that cl (epi φ) = epi γ, we first note that, as γ is lsc, epi γ is closed. Moreover, since obviously epi φ ⊂ epi γ, the inclusion cl (epi φ) ⊂ epi γ follows immediately. In order to see the converse inclusion, let (X, V, α) ∈ epi γ. Now, set Vk := V + k1 In 0 and put αk := α + γ(X, Vk ) − γ(X, V ) for all k ∈ N. Then, by definition, we have αk = α + γ(X, Vk ) − γ(X, V ) ≥ γ(X, Vk ) = φ(X, Vk ), i.e., (X, Vk , αk ) ∈ epi φ for all k ∈ N. Clearly, Vk → V with Vk nonsingular, and so Vk−1 → V + (e.g., see [15, p. 153]). Consequently, φ(X, Vk ) = γ(X, Vk ) → φ(X, Vk ) so that αk → α. Therefore, (X, Vk , αk ) → (X, V, α), and so epi γ = cl (epi φ), i.e., γ = cl φ. 
Also, since γ is a support functional (by Theorem 5.3), hence, in particular, sublinear and lsc, epi γ is a closed, convex cone. Moreover, as epi φ = epi γ ∩ K, with K := Rn×m × Sn++ × R being a convex cone, also epi φ is a convex cone. Hence, φ is also sublinear. 5.3 The relationship between σD(A,B) and the nuclear norm There is a fascinating relationship between σD(A,B) and the nuclear norm which was first suggested to us by results in [13] for the case (A, B) = (0, 0) where it is used 20 JAMES V. BURKE AND TIM HOHEISEL to smooth the nuclear norm. The key idea is to consider optimization problems of the form D E c + σD(A,B) (X, V ), minn V, W (20) V ∈S c ∈ Sn are given and fixed. In this section we consider one where X ∈ Rn×m and W c (described below). possible choice for the matrix W To begin with, let X ∈ Rn×m , A ∈ Rp×n and B ∈ Rp×m be given with rge B ⊂ rge A and m ≥ n. Set k := dim (ker A), and define P ∈ Rn×k to be any matrix whose columns form an orthonormal basis for ker A so that P P T is the orthogonal projection onto ker A. Let P P T X have singular value decomposition Σ [U1 , U2 , U3 ] 0(k−t)×t 0(n−k)×t 0t×(k−t) 0t×(n−k) 0(k−t)×(k−t) 0(n−k)×(k−t) 0(k−t)×(n−k) 0(n−k)×(n−k) QT1 QT2 T 0(k−t)×(m−n) QT3 = U1 ΣQ1 , 0(n−k)×(m−n) QT4 (21) 0t×(m−n) where t := rank (P P T X), Σ := diag (σ1 , σ2 , . . . , σt ) with σ1 , σ2 , . . . , σt the singular values of P P T X, U := [U1 , U2 , U3 ] and Q := [Q1 , Q2 , Q3 , Q4 ] are orthogonal, and b := [U1 , U2 ] is chosen so that its columns form an orthonormal basis for the kernel U bU b T , so that of A. Set Pb := U P P T = PbPbT = U1 U1T + U2 U2T . (22) Finally, define bQ b T )(A† B + U bQ b T )T , c := 1 (A† B + U W (23) 2 b := [Q1 , Q2 ]. The matrix U bQ b T establishes an isometry between the kwhere Q b b = rge P . We have the dimensional subspaces ker A and rge Q, where ker A = rge U following result. c be as above with m ≥ n. Then Theorem 5.7. Let X, A, B, P, Σ, U, Q, and W the optimal value in (20) is (24) X, A† B + P P T X n , and is attained at the matrix Vb := U1 ΣU1 . In particular, if rge X ⊂ ker A, then T ⊥ the optimal value is kXkn ; while, on the other hand, if rge X ⊂ rge A = (ker A) , † then the optimal value is X, A B . Proof. We begin by showing that the matrix Vb := U1 ΣU1T solves (20). For this it is helpful to keep in mind certain key relationships between the matrices defined above. First, recall that AA† is the orthogonal projection onto rge A and A† A is the orthogonal projection onto rge AT , i.e., A† A = I − P P T . Moreover, rge A† = (ker A)⊥ which, in particular, implies that P T A† = 0, U1T A† = 0, and U2T A† = 0. In addition, we use the fact that Vb † = U1 Σ−1 U1T (see discussion following (6)), as well as the relations Vb = P P T Vb P P T and Vb † = P P T Vb † P P T which follow from the equivalence U1 = P P T U1 . We claim that Vb † A† . M (Vb )† = K := (A† )T 0 By (6), we need to show that M (Vb )KM (Vb ) = M (Vb ) and KM (Vb )K = K (25) ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS 21 and (M (Vb )K)T = M (Vb )K and (KM (Vb ))T = KM (Vb ). (26) First observe that U ΣU T U Σ−1 U1T + AT (A† )T b M (V )K = 1 1 1 AU1 ΣU1T U1 U1T + A† A 0 = 0 AA† I − U2 U2T 0 = , 0 AA† U1 ΣU1T A† AA† and U Σ−1 U1T U1 ΣU1T + A† A b KM (V ) = 1 (A† )T U1 Σ−1 U1T U1 U1T + A† A 0 = 0 AA† I − U2 U2T 0 = . 0 AA† U1 Σ−1 U1T A† (A† )T AT This establishes (26). To see (25), write U Σ−1 U T (I − U2 U2T ) A† AA† KM (Vb )K = 1 † T 1 (A ) (I − U2 U2T ) 0 † † Vb A = (A† )T 0 =K and (I − U2 U2T )Vb b b M (V )KM (V ) = AA† A Vb AT = A 0 = M (Vb ). 
(I − U2 U2T )AT 0 Since the problem (20) is convex, one need only show that Vb satisfies the first-order c + ∂V σD(A,B) (X, Vb ). By Theorem necessary condition for optimality, i.e., 0 ∈ W 4.8 and [16, Cor. 10.11], these conditions read ¯ Z̄) ∈ F(A, B), ∃ W̄ ∈ Rp×m(κ+1) , (d, and N̄1 N̄1 n×m(κ+1) p×m(κ+1) ∈R ×R with rge ⊂ ker M (V ) N̄2 N̄2 (27) such that Z̄i W̄i N̄1i † X ¯ = di M (V ) + (i = 1, 2, . . . , κ + 1) B N̄2i (28) and bQ b T )(A† B + U bQ b T )T = (A† B + U κ+1 X i=1 Z̄i Z̄iT . (29) 22 JAMES V. BURKE AND TIM HOHEISEL ¯ Z̄), and N̄1 . Set We now construct a suitable W̄ , (d, N̄2 N11 U2 QT2 N1i 0 ¯ ¯ d1 = 1, di = 0, := , := (i = 2, 3, . . . , κ + 1), N21 0 N2i 0 Z̄ and define W̄ using (28). This gives b† † bQ bT Z̄1 X U2 QT2 V X + A† B + U2 QT2 A B+U = = M (V )† + = W̄1 B 0 (A† )T X (A† )T X and 0 (i = 2, 3, . . . , κ + 1). 0 In particular, this implies that (29) holds. Clearly, 0 ≤ d¯ and d¯ = 1 and AZ̄1 = ¯ Z̄) ∈ F(A, B) and (27) is satisfied. Hence Vb is indeed a AA† B = B, so that (d, solution to (20). We now compute the optimal value in (20). This is given by T ! E 1 E D D X † X b c b b M (V ) . Y , X + W , V + tr 2 B B Z̄i W̄i = We have D c , Vb W E = = = = = = 1 † bQ b T )(A† B + U bQ b T )T U1 ΣU1T tr (A B + U 2 1 † bQ b T )(B T (A† )T U1 ΣU1T + Q1 ΣU1T ) tr (A B + U 2 1 † bQ b T )Q1 ΣU1T tr (A B + U 2 1 tr A† BQ1 ΣU1T + tr U1 ΣU1T 2 1 1 tr U1T A† BQ1 Σ + tr (Σ) 2 2 1 P P T X n 2 and tr T ! X † X b M (V ) = tr B B T b † ! X V X + A† B B (A† )T X = tr (X T Vb † X) + 2tr (X T A† B) = tr (X T P P T Vb † P P T X) + 2tr (X T A† B) = tr (Q1 ΣQT1 ) + 2tr (X T A† B) = P P T X n + 2tr (X T (I − P P T )A† B) = P P T X n + 2tr (X T A† B) . b Hence the optimal value is given by (24) whenT Y := †0. Finally, note that if rge X ⊂ T † ker A, then P P X = X so X, A B = P P X, A B = X, P P T A† B = 0. c = By applying Theorem 5.7 to γ = σD(0,0) with W result. 1 2 I, we obtain the following ON A NEW CLASS OF MATRIX SUPPORT FUNCTIONALS 23 Corollary 5.8. [13, Lemma 1] Let X ∈ Rn×m with m ≥ n, and let γ : E → R ∪ {+∞} be defined in (17). Then 1 kXkn = minn tr (V ) + γ(X, V ), V ∈S 2 with the minimum value attained at V̂ := U1 ΣU1T , where X = U1 ΣQT1 is the reduced SVD for X. This representation of the nuclear norm is used in [13] as a means to introduce a smoothing term for optimization problems involving a nuclear norm regularizer. Indeed, Theorem 5.3 shows that, on the interior of its domain, γ is a smooth function with an easily computed derivative. Using the more general formulation developed here, it is possible to derive smoothers for more general types of regularizers. We illustrate one such possibility in the following corollary. Corollary 5.9. Let X ∈ Rp×p and let P ∈ Rp×k be any matrix having orthonormal columns. Define A := I − P P T and B := λ(I − P P T )X. With this specification, c in (23) is given by Ŵ = 1 (I + B)(I + B)T and the optimal value in the matrix W 2 (20) with Yb := 0 is 2 λ (I − P P T )X + P P T X . F † n † Proof. Simply observe that A = A and A B = λB, and apply Theorem 5.7 with n = p = m. 6 Final Remarks We have introduced a new class of support functionals σD(A,B) , where 1 T D(A, B) = Y, − Y Y | AY = B . 2 It is shown that these support functionals have a natural connection to the optimal value function for quadratics over affine manifolds. Connections are also made to the matrix-fractional function and the pseudo matrix-fractional function as well as their generalizations. 
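These connections are easy to test numerically; for instance, the following NumPy sketch (ours, not part of the paper; the random data and helper names are illustrative only) checks the representation of the nuclear norm in Corollary 5.8: the objective ½ tr(V) + γ(X, V) evaluated at V̂ = U1 Σ U1^T equals ‖X‖_n, and sampled feasible choices of V do no better.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 6                                   # m >= n as in Corollary 5.8
X = rng.standard_normal((n, m))

def gamma(X, V):
    """gamma(X, V) = 0.5 tr(X' V^+ X) for V psd with rge X inside rge V."""
    return 0.5 * np.trace(X.T @ np.linalg.pinv(V) @ X)

def objective(X, V):
    return 0.5 * np.trace(V) + gamma(X, V)

# Claimed minimizer: V_hat = U1 Sigma U1' from the reduced SVD X = U1 Sigma Q1'.
U1, s, _ = np.linalg.svd(X, full_matrices=False)
V_hat = U1 @ np.diag(s) @ U1.T

nuc = np.linalg.norm(X, "nuc")
print(np.isclose(objective(X, V_hat), nuc))   # value at V_hat equals the nuclear norm

# No sampled feasible V attains a smaller value.
for _ in range(100):
    G = rng.standard_normal((n, n))
    V = G @ G.T + 1e-3 * np.eye(n)            # random positive definite (hence feasible) V
    assert objective(X, V) >= nuc - 1e-8
```

In this instance the minimum value and the claimed minimizer agree with Corollary 5.8.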
It is remarkable that all of these objects can be represented by the support functional of D(A, B) for an appropriate choice of A and B. These support functional representations had not previously been observed. In addition, we have computed the closed convex hull of D(A, B), which allowed us to compute the subdifferential of σD(A,B). This computation made use of nontrivial techniques from nonconvex variational analysis. Finally, we obtained a general representation theorem that can be used to give alternative representations of the nuclear norm and related regularizers in matrix optimization. These representations are particularly useful in that they can be used to develop smoothers for the regularization terms.

References

[1] A. Ben-Tal and A. Nemirovski: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization, SIAM, Philadelphia, 2001.
[2] S. Boyd and L. Vandenberghe: Convex Optimization. Cambridge University Press, 2004.
[3] J.M. Borwein and A.S. Lewis: Convex Analysis and Nonlinear Optimization. Theory and Examples. CMS Books in Mathematics, Springer-Verlag, New York, 2000.
[4] J.V. Burke: Nonlinear Optimization. Lecture Notes, Math 408, University of Washington, Spring 2014.
[5] J.V. Burke and A.Y. Aravkin: Smoothing dynamic systems with state-dependent covariance matrices. To appear in the Conference on Decision and Control (CDC), 2014 (6 pages).
[6] Y. Chabrillac and J.-P. Crouzeix: Definiteness and semidefiniteness of quadratic forms. Linear Algebra and its Applications 63, 1984, pp. 283–292.
[7] J. Dattorro: Convex Optimization & Euclidean Distance Geometry. Mεβoo Publishing USA, Version 2014.04.08, 2005.
[8] M. Faliva and M.G. Zoia: On a partitioned inversion formula having useful applications in econometrics. Econometric Theory 18, 2002, pp. 525–530.
[9] P. Finsler: Über das Vorkommen definiter und semidefiniter Formen in Scharen quadratischer Formen. Commentarii Mathematici Helvetici 9, 1937, pp. 188–192.
[10] J. Gallier: Geometric Methods and Applications: For Computer Science and Engineering. Texts in Applied Mathematics, Springer, New York, Dordrecht, London, Heidelberg, 2011.
[11] G.H. Golub and C.F. Van Loan: Matrix Computations. Johns Hopkins Series in the Mathematical Sciences, The Johns Hopkins University Press, Baltimore, Maryland, 1983.
[12] R.A. Horn and C.R. Johnson: Matrix Analysis. Cambridge University Press, New York, N.Y., 1985.
[13] C.-J. Hsieh and P. Olsen: Nuclear norm minimization via active subspace selection. JMLR W&CP 32 (1), 2014, pp. 575–583.
[14] S.-J. Kim and S. Boyd: A minimax theorem with applications to machine learning, signal processing, and finance. SIAM Journal on Optimization 19(3), 2008, pp. 1344–1367.
[15] J.R. Magnus and H. Neudecker: Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons, New York, NY, 1988.
[16] R.T. Rockafellar and R.J.-B. Wets: Variational Analysis. A Series of Comprehensive Studies in Mathematics, Vol. 317, Springer, Berlin, Heidelberg, 1998.

University of Washington, Department of Mathematics, Box 354350, Seattle, Washington 98195-4350
E-mail address: burke@math.washington.edu

University of Würzburg, Institute of Mathematics, Campus Hubland Nord, Emil-Fischer-Straße 30, 97074 Würzburg, Germany
E-mail address: hoheisel@mathematik.uni-wuerzburg.de