SIAM J. OPTIM. Vol. 13, No. 1, pp. 228–239. © 2002 Society for Industrial and Applied Mathematics.

NONLINEARLY CONSTRAINED BEST APPROXIMATION IN HILBERT SPACES: THE STRONG CHIP AND THE BASIC CONSTRAINT QUALIFICATION*

CHONG LI† AND XIAO-QING JIN‡

Abstract. We study best approximation problems with nonlinear constraints in Hilbert spaces. The strong "conical hull intersection property" (CHIP) and the "basic constraint qualification" (BCQ) condition are discussed. Best approximations with differentiable constraints and convex constraints are characterized. The analysis generalizes some linearly constrained results of recent works [F. Deutsch, W. Li, and J. Ward, J. Approx. Theory, 90 (1997), pp. 385–444; F. Deutsch, W. Li, and J. D. Ward, SIAM J. Optim., 10 (1999), pp. 252–268].

Key words. best approximation, strong CHIP, BCQ condition, differentiable constraint, convex constraint

AMS subject classifications. 41A65, 41A29

PII. S1052623401385600

*Received by the editors February 26, 2001; accepted for publication (in revised form) February 11, 2002; published electronically July 16, 2002. http://www.siam.org/journals/siopt/13-1/38560.html
†Department of Mathematics, Zhejiang University, Hangzhou 310027, P. R. China (cli@seu.edu.cn). The research of this author is supported by the National (grant 19971013) and Jiangsu Provincial (grant BK99001) Natural Science Foundations of China.
‡Faculty of Science and Technology, University of Macau, Macau, P. R. China (xqjin@umac.mo). The research of this author is supported by research grants RG010/99-00S/JXQ/FST and RG026/00-01S/JXQ/FST from the University of Macau.

1. Introduction. In recent years, a lot of attention has been focused on constrained best approximation problems in Hilbert spaces; see, e.g., [5, 6, 9, 10, 11, 16, 17]. These problems find applications (cf. [2]) in statistics, mathematical modeling, curve fitting, and surface fitting. The setting is as follows. Let $X$ be a Hilbert space, $C$ a nonempty closed convex subset of $X$, and $A$ a bounded linear operator from $X$ to a finite-dimensional Hilbert space $Y$. Given "data" $b \in Y$, the problem consists of finding the best approximation $P_K(x)$ to any $x \in X$ from the set
$$K := C \cap A^{-1}(b) = C \cap \{x \in X : Ax = b\}.$$
Generally, it is easier to compute the best approximation from $C$ than from $K$. Therefore, the interest of several papers [5, 6, 9, 11, 16, 17] was centered on the following problem: for any $x \in X$, does there exist a $y \in Y$ such that $P_K(x) = P_C(x + A^*y)$? It was proved in [9] that a necessary and sufficient condition for an affirmative answer to this question is that the pair $\{C, A^{-1}(b)\}$ satisfies the strong "conical hull intersection property" (CHIP). Very recently, Deutsch, Li, and Ward in [10] considered the more general problem of finding the best approximation $P_K(x)$ to any $x \in X$ from the set
$$(1.1) \qquad K = C \cap \{x \in X : Ax \le b\}$$
and established a result similar to that of [9]. More precisely, they proved the following theorem (see Theorem 3.2 and Lemma 3.1 in [10]).

Theorem DLW. Let $A$ be defined on $X$ by
$$Ax := (\langle x, h_1\rangle, \langle x, h_2\rangle, \ldots, \langle x, h_m\rangle)$$
for some $h_i \in X \setminus \{0\}$, $i = 1, 2, \ldots, m$. Let $b \in \mathbb{R}^m$ and $x^* \in K = C \cap \{x \in X : Ax \le b\}$. Then the following two statements are equivalent:
(i) For any $x \in X$, $x^* = P_K(x) \iff x^* = P_C\bigl(x - \sum_{i=1}^m \lambda_i h_i\bigr)$ for some $\lambda_i \ge 0$ with $\lambda_i(\langle x^*, h_i\rangle - b_i) = 0$ for all $i$.
(ii) $\{C, H_1, \ldots, H_m\}$ has the strong CHIP at $x^*$, where $H_i := \{x \in X : \langle x, h_i\rangle \le b_i\}$ for all $i$.
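Before turning to the nonlinear case, the following small numerical sketch (our own illustration, not from [10]; the set $C$, the functional $h$, the test point, and the grid are arbitrary choices) shows statement (i) of Theorem DLW in action with $X = \mathbb{R}^2$, $C$ the closed unit ball, and a single constraint $\langle x, h\rangle \le 0$.

```python
# Sketch of Theorem DLW(i) on a toy instance (assumes X = R^2).
# C = closed unit ball, K = C ∩ {x : <x, h> <= 0} with h = (1, 0), b = 0.
import numpy as np

def proj_C(y):                       # projection onto the unit ball
    n = np.linalg.norm(y)
    return y if n <= 1 else y / n

h, b = np.array([1.0, 0.0]), 0.0
x = np.array([1.0, 1.0])

# Brute-force projection of x onto K (dense sample of the half-disk).
g1, g2 = np.meshgrid(np.linspace(-1, 0, 401), np.linspace(-1, 1, 801))
pts = np.column_stack([g1.ravel(), g2.ravel()])
K = pts[np.linalg.norm(pts, axis=1) <= 1]
x_star = K[np.linalg.norm(K - x, axis=1).argmin()]       # ~ (0, 1)

# Perturbation form: search for lambda >= 0 with P_C(x - lambda*h) = x_star;
# complementary slackness lambda*(<x_star, h> - b) = 0 holds since <x_star, h> = 0 = b.
for lam in np.linspace(0, 3, 3001):
    if np.linalg.norm(proj_C(x - lam * h) - x_star) < 1e-6:
        print("lambda =", round(lam, 3), " P_C(x - lambda*h) =", proj_C(x - lam * h))
        break
```

Both computations return the same point $(0, 1)$: the constrained projection onto $K$ coincides with an unconstrained projection onto $C$ of a suitably perturbed data point, which is exactly the reformulation the theorem provides.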
Theorem DLW gives an unconstrained reformulation for the linearly constrained system, for which a complete theory has been established. The importance of such a theory was described in detail in [10, 11]. One natural problem is: can one extend such a theory to a nonlinearly constrained system? Admittedly, this problem for a general nonlinearly constrained system is quite difficult. In this paper, we relax the linearity assumption on the operator $A$ in the constraint (1.1) in two ways. First, we study the case in which $A$ is Fréchet differentiable, and second, we examine the case in which $A$ is convex (i.e., each component is convex). In the Fréchet differentiable case, we give a theorem (Theorem 4.1) similar to Theorem DLW, in which $h_i$ is replaced by the Fréchet derivative $A_i'(x^*)$ of $A_i$ at $x^*$, $i = 1, 2, \ldots, m$. Note that, when $A$ is nonlinear, the approximating set $K$ is, in general, nonconvex (see Example 4.1). Thus Theorem DLW does not apply in this case, since $K$ cannot be re-expressed as the intersection of $C$ and a polyhedron. In addition, the nonconvexity of the set $K$ makes the original problem very complicated; indeed, there is no general way to characterize best approximations from nonconvex sets. The merit of the present results lies in converting a nonconvex constrained problem into a convex unconstrained one. In the convex case, the sets $H_i$, $i = 1, 2, \ldots, m$, may not be well defined, although $K$ is convex, and, in general, Theorem DLW does not apply either (see Example 5.1). To establish a similar unconstrained reformulation result, we introduce the concept of the "basic constraint qualification" (BCQ) relative to $C$, which is a generalization of the BCQ considered in [12, 13]. We prove that the BCQ relative to $C$ is a necessary and sufficient condition for the following "perturbation property": for any $x \in X$, $P_K(x) = x^*$ if and only if $P_C\bigl(x - \sum_{i=1}^m \lambda_i h_i\bigr) = x^*$ for some $h_i \in \partial A_i(x^*)$ and $\lambda_i \ge 0$ with $\lambda_i(A_i(x^*) - b_i) = 0$. Clearly, in either case, the present results generalize the main results in [10].

The paper is organized as follows. We describe some notation and a useful proposition in section 2. To deal with differentiable constraints, we linearize the constraints in section 3. Unconstrained reformulation results for differentiable constraints and convex constraints are established in sections 4 and 5, respectively. Finally, a concluding remark is given in section 6.

2. Preliminaries. Let $X$ be a Hilbert space. For a nonempty subset $S$ of $X$, the convex hull (resp., conical hull) of $S$, denoted by $\operatorname{conv} S$ (resp., $\operatorname{cone} S$), is the intersection of all convex sets (resp., convex cones) containing $S$, while the dual cone $S^\circ$ of $S$ is defined by
$$S^\circ = \{x \in X : \langle x, y\rangle \le 0 \ \forall y \in S\}.$$
The normal cone of $S$ at $x$ is then defined by $N_S(x) = (S - x)^\circ$. The closure (resp., interior, relative interior) of any set $S$ is denoted by $\overline{S}$ (resp., $\operatorname{int} S$, $\operatorname{ri} S$). For a function $f$ from $X$ to $\mathbb{R}$, the subdifferential of $f$ at $x \in X$, denoted by $\partial f(x)$, is defined by
$$\partial f(x) = \{z \in X : f(x) + \langle z, y - x\rangle \le f(y) \ \forall y \in X\}.$$
It is well known that $\partial f(x) \ne \emptyset$ for all $x \in X$ if $f$ is a continuous convex function.

Let $G$ be a nonempty closed convex set in $X$. Then for any $x \in X$, there exists a unique best approximation $P_G(x)$ to $x$ from $G$. For $x \ne 0$, define
$$\tau(x, y) = \lim_{t \to 0^+} \frac{\|x + ty\| - \|x\|}{t}.$$
Since $x/\|x\|$ is the unique supporting functional of $x$, we have
$$\tau(x, y) = \frac{\langle x, y\rangle}{\|x\|}.$$
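As a quick sanity check of this directional-derivative formula (our own aside, not part of the original exposition), the following sketch compares the difference quotient with $\langle x, y\rangle/\|x\|$ for a randomly chosen pair in $\mathbb{R}^5$; the dimension and the test vectors are arbitrary.

```python
# Sketch: finite-difference check of tau(x, y) = <x, y> / ||x||  (assumes X = R^n).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)              # any nonzero x
y = rng.normal(size=5)

predicted = x @ y / np.linalg.norm(x)
for t in [1e-1, 1e-3, 1e-5]:
    quotient = (np.linalg.norm(x + t * y) - np.linalg.norm(x)) / t
    print(t, quotient, predicted)   # quotient -> predicted as t -> 0+
```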
The following well-known characterization of the best approximation is useful; see [9, 10].

Proposition 2.1. Let $G$ be a convex subset of $X$, $x \in X$, and $g_0 \in G$. Then
$$P_G(x) = g_0 \iff \langle x - g_0, g_0 - g\rangle \ge 0 \ \text{for all } g \in G \iff x - g_0 \in (G - g_0)^\circ.$$

3. Linearization of the constraints. In the remainder of the paper, we always assume that $C \ne \emptyset$ is a closed convex subset of $X$. Suppose that $A(\cdot) = (A_1(\cdot), \ldots, A_m(\cdot))$ is Fréchet differentiable from $X$ to $\mathbb{R}^m$ and $b = (b_1, \ldots, b_m) \in \mathbb{R}^m$. Let $m_e \in \{0, 1, \ldots, m\}$ be a fixed integer. Define
$$K_0 = \{x \in X : A_i(x) = b_i, \ i \in E\} \cap \{x \in X : A_i(x) \le b_i, \ i \in I\}$$
and $K = C \cap K_0$, where $E = \{1, 2, \ldots, m_e\}$ and $I = \{m_e + 1, \ldots, m\}$. Furthermore, let
$$I(x^*) = \{i \in I : A_i(x^*) = b_i\} \quad \forall x^* \in K.$$
The following concepts can easily be found in any book on constrained optimization; see, e.g., [14, 20].

Definition 3.1. Let $x^* \in K$. A vector $d \ne 0$ is called a feasible direction of $K$ at $x^*$ if there exists $\delta > 0$ such that $x^* + td \in K$ for all $t \in [0, \delta]$. The set of all feasible directions of $K$ at $x^*$ is denoted by $\operatorname{FD}(x^*)$.

Definition 3.2. Let $x^* \in K$. A vector $d$ is called a linearized feasible direction of $K$ at $x^*$ if
$$\langle d, A_i'(x^*)\rangle = 0 \ \forall i \in E \quad\text{and}\quad \langle d, A_i'(x^*)\rangle \le 0 \ \forall i \in I(x^*),$$
where $A_i'(x^*)$ is the Fréchet derivative of $A_i$ at $x^*$. The set of all linearized feasible directions of $K$ at $x^*$ is denoted by $\operatorname{LFD}(x^*)$.

Definition 3.3. Let $x^* \in K$. A vector $d$ is called a sequentially feasible direction of $K$ at $x^*$ if there exist a sequence $\{d_k\} \subset X$ and a sequence $\{\delta_k\}$ of positive real numbers such that
$$d_k \to d, \quad \delta_k \to 0, \quad x^* + \delta_k d_k \in K, \quad k = 1, 2, \ldots.$$
The set of all sequentially feasible directions of $K$ at $x^*$ is denoted by $\operatorname{SFD}(x^*)$.

Obviously, we have the following inclusion relationships among these sets of feasible directions.

Proposition 3.1. Let $x^* \in K$. Then $\operatorname{FD}(x^*) \subseteq \operatorname{SFD}(x^*) \subseteq \operatorname{LFD}(x^*)$.

For convenience, let
$$K_S(x^*) = \overline{\operatorname{conv}}\,(x^* + \operatorname{SFD}(x^*)) \cap C \quad\text{and}\quad K_L(x^*) = (x^* + \operatorname{LFD}(x^*)) \cap C.$$
Then $K_S(x^*)$ and $K_L(x^*)$ are closed convex sets. The following two theorems describe the equivalence of the best approximation from $K$ and from $K_S(x^*)$, which plays an important role in our study.

Theorem 3.1. Let $x^* \in K$. Then, for any $x \in X$, if $x^* \in P_K(x)$, we have $P_{K_S(x^*)}(x) = x^*$.

Proof. We may assume $x \ne x^*$, the case $x = x^*$ being trivial. For any $\bar{x} \in x^* + \operatorname{SFD}(x^*)$, set $d = \bar{x} - x^* \in \operatorname{SFD}(x^*)$; then there exist $d_k \in X$ with $d_k \to d$ and $\delta_k > 0$ with $\delta_k \to 0$ such that $x^* + \delta_k d_k \in K$. It follows from $x^* \in P_K(x)$ that
$$\|x - x^*\| \le \|x - x^* - \delta_k d_k\|, \quad k = 1, 2, \ldots.$$
Since
$$\tau(x - x^*, x^* - \bar{x}) = \lim_k \frac{\|x - x^* - \delta_k d\| - \|x - x^*\|}{\delta_k} \ge \lim_k \frac{\|x - x^* - \delta_k d\| - \|x - x^* - \delta_k d_k\|}{\delta_k}$$
and
$$\left|\frac{\|x - x^* - \delta_k d\| - \|x - x^* - \delta_k d_k\|}{\delta_k}\right| \le \|d_k - d\| \to 0,$$
it follows that $\tau(x - x^*, x^* - \bar{x}) \ge 0$ and hence
$$\langle x - x^*, x^* - \bar{x}\rangle \ge 0 \quad \forall \bar{x} \in x^* + \operatorname{SFD}(x^*).$$
Since $K_S(x^*) \subseteq \overline{\operatorname{conv}}\,(x^* + \operatorname{SFD}(x^*))$, we have
$$\langle x - x^*, x^* - \bar{x}\rangle \ge 0 \quad \forall \bar{x} \in K_S(x^*).$$
This, with Proposition 2.1, implies that $P_{K_S(x^*)}(x) = x^*$, and the theorem follows.

Theorem 3.2. Let $x^* \in K$. Then the following two statements are equivalent:
(i) $K \subseteq K_S(x^*)$.
(ii) For any $x \in X$, $P_{K_S(x^*)}(x) = x^* \implies P_K(x) = x^*$.

Proof. It suffices to prove that (ii) $\implies$ (i). Let $G = \overline{\operatorname{conv}}\,(x^* + \operatorname{SFD}(x^*))$. Suppose on the contrary that $K \not\subseteq K_S(x^*)$. Then $K \not\subseteq G$, so that there is $\bar{x} \in K$ with $\bar{x} \notin G$. Let $g_0 = P_G(\bar{x})$ and $x = \bar{x} - g_0 + x^*$. Then $P_G(x) = x^*$. In fact, since $G = x^* + \overline{\operatorname{conv}}\,(\operatorname{SFD}(x^*))$, for any $g \in G$ there exist $\bar{g}_0, \bar{g} \in \overline{\operatorname{conv}}\,(\operatorname{SFD}(x^*))$ such that $g_0 = x^* + \bar{g}_0$ and $g = x^* + \bar{g}$. Note that $G$ is a cone with vertex $x^*$. It follows that $g + g_0 - x^* = x^* + \bar{g} + \bar{g}_0 \in G$, which, by Proposition 2.1, implies that
$$\langle \bar{x} - g_0, g_0 - (g + g_0 - x^*)\rangle \ge 0,$$
since $g_0 = P_G(\bar{x})$.
Thus we have
$$\langle x - x^*, x^* - g\rangle = \langle \bar{x} - g_0, g_0 - (g + g_0 - x^*)\rangle \ge 0,$$
which proves that $P_G(x) = x^*$. Now define $x_t = x^* + t(x - x^*)$ for all $t > 0$. From
$$\|x_t - x^*\| = t\,\|x - x^*\| \le t\left\|x - \Bigl(1 - \tfrac{1}{t}\Bigr)x^* - \tfrac{1}{t}\,g\right\| = \|x_t - g\| \quad \forall g \in G, \ t > 1,$$
it follows that $P_G(x_t) = x^*$ for $t > 1$. Therefore, from (ii) and $K_S(x^*) \subseteq G$, we have $P_K(x_t) = x^*$ for $t > 1$. On the other hand, for $t > 1$ we obtain
$$\|x_t - \bar{x}\|^2 = \|x^* + t(\bar{x} - g_0) - \bar{x}\|^2 = \|x^* - g_0 + (t-1)(\bar{x} - g_0)\|^2 = (t-1)^2\|\bar{x} - g_0\|^2 + 2(t-1)\langle \bar{x} - g_0, x^* - g_0\rangle + \|x^* - g_0\|^2.$$
Since $g_0 = P_G(\bar{x})$, it follows from Proposition 2.1 that $\langle \bar{x} - g_0, x^* - g_0\rangle \le 0$, and hence
$$\|x_t - \bar{x}\|^2 \le (t-1)^2\|\bar{x} - g_0\|^2 + \|x^* - g_0\|^2 = t^2\|\bar{x} - g_0\|^2 - 2t\|\bar{x} - g_0\|^2 + \|\bar{x} - g_0\|^2 + \|x^* - g_0\|^2 < t^2\|\bar{x} - g_0\|^2 = \|x_t - x^*\|^2$$
for all $t > 1$ large enough. This means that $x^* \notin P_K(x_t)$, which is a contradiction. The proof is complete.

Similarly, we have the following result for $K_L(x^*)$.

Theorem 3.3. Let $x^* \in K$. Then the following statements are equivalent:
(i) $K \subseteq K_L(x^*)$.
(ii) For any $x \in X$, $P_{K_L(x^*)}(x) = x^* \implies x^* \in P_K(x)$.

Corollary 3.1. Let $x^* \in K$. Consider the following statements:
(i) $K \subseteq K_L(x^*)$ and $K_S(x^*) = K_L(x^*)$.
(ii) For any $x \in X$, $x^* \in P_K(x) \iff P_{K_L(x^*)}(x) = x^*$.
(iii) For any $x \in X$, $x^* \in P_K(x) \implies P_{K_L(x^*)}(x) = x^*$.
Then (i) $\implies$ (ii) $\implies$ (iii). Furthermore, if $K \subseteq K_S(x^*)$, then (i) $\iff$ (ii) $\iff$ (iii).

Proof. If (i) holds, then by Theorems 3.1 and 3.2 we have, for any $x \in X$,
$$x^* \in P_K(x) \iff P_{K_S(x^*)}(x) = x^* \iff P_{K_L(x^*)}(x) = x^*.$$
Therefore (ii) holds. The implication (ii) $\implies$ (iii) is trivial. Now assume that $K \subseteq K_S(x^*)$. If (iii) holds, then, for any $x \in X$,
$$P_{K_S(x^*)}(x) = x^* \implies x^* \in P_K(x) \implies P_{K_L(x^*)}(x) = x^*.$$
Thus, with almost the same arguments as in the proof of Theorem 3.2, we have $K_L(x^*) \subseteq K_S(x^*)$. By Proposition 3.1, $K_L(x^*) = K_S(x^*)$, and so $K \subseteq K_L(x^*)$; i.e., (i) holds.

It should be noted that if $K$ is convex (e.g., if $A_1, \ldots, A_{m_e}$ are linear and $A_{m_e+1}, \ldots, A_m$ are convex), then $K \subseteq K_S(x^*)$ holds. We therefore have the following corollary.

Corollary 3.2. Let $x^* \in K$. If $K$ is convex, then the following statements are equivalent:
(i) $K_S(x^*) = K_L(x^*)$.
(ii) For any $x \in X$, $P_K(x) = x^* \iff P_{K_L(x^*)}(x) = x^*$.
(iii) For any $x \in X$, $P_K(x) = x^* \implies P_{K_L(x^*)}(x) = x^*$.
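To make the role of the constraint qualification $K_S(x^*) = K_L(x^*)$ concrete, here is a small numerical illustration of our own (it is not taken from the paper): with $X = C = \mathbb{R}^2$, the single constraint $A_1(x) = x_1^2 \le 0$ (so $m_e = 0$, $b_1 = 0$, $K = \{x : x_1 = 0\}$) and $x^* = (0, 0)$, one has $A_1'(x^*) = (0, 0)$, hence $\operatorname{LFD}(x^*) = \mathbb{R}^2$ while $\operatorname{SFD}(x^*) = \{d : d_1 = 0\}$, and statement (iii) of Corollary 3.2 fails.

```python
# Illustration (not from the paper): K_S(x*) != K_L(x*) and Corollary 3.2(iii) fails.
# X = C = R^2, single constraint A1(x) = x1^2 <= 0, so K = {x : x1 = 0}, x* = (0, 0).
import numpy as np

x_star = np.array([0.0, 0.0])
grad_A1 = np.array([0.0, 0.0])          # A1'(x) = (2*x1, 0), which vanishes at x1 = 0

# d = (1, 0) satisfies the linearized condition <d, A1'(x*)> <= 0 ...
d = np.array([1.0, 0.0])
print("linearized condition:", d @ grad_A1 <= 0)
# ... but x* + delta*d leaves K for every delta > 0, so d is not sequentially feasible.
print("feasible for small delta:",
      [(x_star + delta * d)[0] ** 2 <= 0 for delta in (1e-1, 1e-3, 1e-6)])

# Corollary 3.2(iii) fails at x = (1, 0): P_K(x) = x* (K is the x2-axis),
# yet K_L(x*) = R^2 gives P_{K_L(x*)}(x) = x != x*.
x = np.array([1.0, 0.0])
P_K = np.array([0.0, x[1]])             # projection onto the x2-axis
P_KL = x                                # projection onto R^2 is the identity
print("P_K(x) =", P_K, " P_KL(x) =", P_KL)
```

In this instance $K$ is convex and $K \subseteq K_L(x^*)$, but $K_S(x^*) \subsetneq K_L(x^*)$; this is precisely the situation excluded by the hypothesis $K_S(x^*) = K_L(x^*)$ in Theorem 4.1 below.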
4. Reformulations of differentiable constraints. The following notion of the strong CHIP, taken from [9, 10], plays an important role in optimization theory; see, e.g., [7, 8, 12, 18].

Definition 4.1. Let $\{C_0, \ldots, C_m\}$ be a collection of closed convex sets and $x \in \cap_{j=0}^m C_j$. Then $\{C_0, \ldots, C_m\}$ is said to have the strong CHIP at $x$ if
$$\Bigl(\bigcap_{j=0}^m C_j - x\Bigr)^{\!\circ} = \sum_{j=0}^m (C_j - x)^\circ.$$

Now, for convenience, we write
$$A_{i+m}'(x^*) = -A_i'(x^*), \quad i = 1, 2, \ldots, m_e$$
(together with $A_{i+m} := -A_i$ and $b_{i+m} := -b_i$ for $i = 1, 2, \ldots, m_e$),
$$\bar{b}_i = b_i - A_i(x^*) + \langle x^*, A_i'(x^*)\rangle, \quad i = 1, 2, \ldots, m + m_e,$$
$$H_i = \{d \in X : \langle d, A_i'(x^*)\rangle \le \bar{b}_i\}, \quad i = 1, 2, \ldots, m + m_e,$$
and
$$E_0 = E \cup I(x^*) \cup \{m+1, \ldots, m+m_e\}, \qquad E_1 = I \setminus I(x^*).$$
We define the bounded linear mapping $A'(x^*)|$ from $X$ to $\mathbb{R}^{m_e}$ by
$$A'(x^*)|x = (\langle x, A_1'(x^*)\rangle, \ldots, \langle x, A_{m_e}'(x^*)\rangle) \in \mathbb{R}^{m_e} \quad \forall x \in X.$$
The inverse of $A'(x^*)|$, which is in general a set-valued mapping, is denoted by $A'(x^*)|^{-1}$. Let $\bar{b} = (\bar{b}_1, \ldots, \bar{b}_{m_e})$. Then we are ready to give the main result of this section.

Theorem 4.1. Let $x^* \in K$. Suppose that $K \subseteq K_L(x^*)$ and $K_S(x^*) = K_L(x^*)$. Then the following statements are equivalent:
(i) $\{C, A'(x^*)|^{-1}(\bar{b}), H_i, i \in I(x^*)\}$ has the strong CHIP at $x^*$.
(ii) $\{C, A'(x^*)|^{-1}(\bar{b}), H_i, i \in I\}$ has the strong CHIP at $x^*$.
(iii) For any $x \in X$,
$$x^* \in P_K(x) \iff P_C\Bigl(x - \sum_{i=1}^m \lambda_i A_i'(x^*)\Bigr) = x^*$$
for some $\lambda_i$, $i = 1, \ldots, m$, with $\lambda_i \ge 0$ for all $i \in I$ and $\lambda_i = 0$ for all $i \notin E \cup I(x^*)$.

Proof. We first assume that (i) holds. Since $x^* \in \operatorname{int} \cap_{i \in E_1} H_i$, it follows from Proposition 2.3 of [10] that $\{C \cap (\cap_{i \in E_0} H_i), H_i, i \in E_1\}$ has the strong CHIP at $x^*$. Thus (i) implies that $\{C, A'(x^*)|^{-1}(\bar{b}), H_i, i = 1, \ldots, m\}$ has the strong CHIP at $x^*$. Therefore, (ii) holds.

Now suppose that (ii) holds. By Corollary 3.1, we have that, for any $x \in X$, $x^* \in P_K(x) \iff P_{K_L(x^*)}(x) = x^*$. We will show that $P_{K_L(x^*)}(x) = x^*$ if and only if $P_{K_L^0(x^*)}(x) = x^*$, where $K_L^0(x^*) = C \cap (\cap_{i=1}^{m+m_e} H_i)$. In fact, it is clear that $P_{K_L(x^*)}(x) = x^*$ implies $P_{K_L^0(x^*)}(x) = x^*$. Conversely, assume that $P_{K_L^0(x^*)}(x) = x^*$. Since $K_L(x^*) \cap U(x^*, r) \subseteq K_L^0(x^*)$ for some $r > 0$, where $U(x^*, r)$ denotes the open ball with center $x^*$ and radius $r > 0$, $x^*$ is a best approximation to $x$ from $K_L(x^*) \cap U(x^*, r)$; that is, $x^*$ is a local best approximation to $x$ from $K_L(x^*)$, and hence $P_{K_L(x^*)}(x) = x^*$ by [3]. Note that any finite collection of half-spaces has the strong CHIP [9]. It follows that $\{C, A'(x^*)|^{-1}(\bar{b}), H_i : i \in I\}$ has the strong CHIP at $x^*$ if and only if $\{C, H_i : i = 1, 2, \ldots, m+m_e\}$ has the strong CHIP at $x^*$. Thus, using Theorem DLW, we have
$$P_{K_L^0(x^*)}(x) = x^* \iff P_C\Bigl(x - \sum_{i=1}^{m+m_e} \lambda_i A_i'(x^*)\Bigr) = x^*$$
for some $\lambda_i \ge 0$, $i = 1, \ldots, m+m_e$, with $\lambda_i(\langle x^*, A_i'(x^*)\rangle - \bar{b}_i) = 0$. Consequently, (iii) holds.

Finally, if (iii) holds, it follows from Corollary 3.1 that, for any $x \in X$,
$$P_{K_L(x^*)}(x) = x^* \iff x^* \in P_K(x) \iff P_C\Bigl(x - \sum_{i=1}^m \lambda_i A_i'(x^*)\Bigr) = x^*$$
for some $\lambda_i$, $i = 1, \ldots, m$, with $\lambda_i \ge 0$ for all $i \in I$ and $\lambda_i = 0$ for all $i \notin E \cup I(x^*)$. Consequently,
$$P_{K_L(x^*)}(x) = x^* \iff P_C\Bigl(x - \sum_{i \in E \cup I(x^*)} \lambda_i A_i'(x^*)\Bigr) = x^*$$
for some $\lambda_i$, $i \in E \cup I(x^*)$, with $\lambda_i \ge 0$ for all $i \in I(x^*)$, or, equivalently,
$$P_{K_L(x^*)}(x) = x^* \iff P_C\Bigl(x - \sum_{i \in E_0} \lambda_i A_i'(x^*)\Bigr) = x^*$$
for some $\lambda_i \ge 0$, $i \in E_0$. Thus, using Theorem DLW again, we know that $\{C, H_i, i \in E_0\}$ has the strong CHIP at $x^*$, and so does $\{C, A'(x^*)|^{-1}(\bar{b}), H_i, i \in I(x^*)\}$; i.e., (i) holds. The proof of the theorem is complete.

Remark 4.1. Recall that the constraint qualification condition on $\operatorname{span}(C - x^*)$ is satisfied at $x^*$ if
$$\operatorname{SFD}(x^*) \cap \operatorname{span}(C - x^*) = \operatorname{LFD}(x^*) \cap \operatorname{span}(C - x^*),$$
which plays an important role in nonlinear optimization theory; see [1, 14]. Clearly, if the constraint qualification condition is satisfied at $x^*$ (as it is, for instance, if each $A_i$, $i \in I(x^*)$, is linear, or if the Mangasarian–Fromovitz constraint qualification on $\operatorname{span}(C - x^*)$ (see [15]) is satisfied and $x^* \in \operatorname{ri} C$), then $K_S(x^*) = K_L(x^*)$.

The following proposition shows that the conditions $K \subseteq K_L(x^*)$ and $K_S(x^*) = K_L(x^*)$ are "almost" necessary.

Proposition 4.1. Suppose that the conclusion of Theorem 4.1 is valid and that, in addition, one of the conditions (i)–(iii) holds. Then $K \subseteq K_L(x^*)$. Moreover, if $K \subseteq K_S(x^*)$, in particular if $K$ is convex, then $K_S(x^*) = K_L(x^*)$.

Proof. Under the assumptions of Proposition 4.1, we have that, for any $x \in X$, $x^* \in P_K(x) \iff P_{K_L(x^*)}(x) = x^*$. Thus, by Theorem 3.3, $K \subseteq K_L(x^*)$. Moreover, if $K \subseteq K_S(x^*)$, we obtain $K_S(x^*) = K_L(x^*)$ from Corollary 3.1.

Now we give an example to illustrate the main theorem of this section.

Example 4.1. Let $X = \mathbb{R}^2$, $C = \{(x_1, x_2) : (x_1 - 4)^2 + x_2^2 \le 16\}$, and
$$A_1(x) = x_2 - \sin x_1, \quad A_2(x) = -x_1 - x_2 \quad \forall x = (x_1, x_2) \in X.$$
Let $m_e = 0$ and $b = (0, 0)$, so that $K = C \cap \{x : A_1(x) \le 0\} \cap \{x : A_2(x) \le 0\}$. For $x^* = (0, 0)$ we have $A_1'(x^*) = (-1, 1)$, $A_2'(x^*) = (-1, -1)$, and
$$K_L(x^*) = K_S(x^*) = \{(x_1, x_2) : x_2 \le x_1, \ -x_1 \le x_2\} \cap C.$$
Clearly, $K \subseteq K_L(x^*)$. Here $\bar{b}_1 = \bar{b}_2 = 0$, so $H_1 = \{x : x_2 \le x_1\}$ and $H_2 = \{x : -x_1 \le x_2\}$. Since $\operatorname{int} C \cap H_1 \cap H_2 \ne \emptyset$, it follows from Proposition 2.3 of [10] that $\{C, H_1, H_2\}$ has the strong CHIP. Then, by Theorem 4.1, for any $x = (x_1, x_2) \in X$, $x^* \in P_K(x)$ if and only if there exist $\lambda_1, \lambda_2 \ge 0$ such that
$$P_C(x - \lambda_1(-1, 1) - \lambda_2(-1, -1)) = x^*.$$
Observe that, for any $y = (y_1, y_2)$, $P_C(y) = x^*$ if and only if $y_1 \le 0$ and $y_2 = 0$. It follows that $x^* \in P_K(x)$ if and only if $x = (x_1, x_2)$ satisfies $x_1 + x_2 \le 0$ and $x_1 - x_2 \le 0$. We remark that this result cannot be deduced from Theorem DLW.
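The characterization in Example 4.1 can be checked numerically by brute force. The sketch below is our own illustration (the grid resolution and the two test points are arbitrary choices): it samples $K$ on a fine grid and compares the nearest sampled point of $K$ with $x^* = (0, 0)$ for one $x$ satisfying $x_1 + x_2 \le 0$, $x_1 - x_2 \le 0$ and one violating it.

```python
# Brute-force check of Example 4.1 (illustrative only; K is sampled on a grid).
import numpy as np

# K = C ∩ {A1 <= 0} ∩ {A2 <= 0} with C the disk of radius 4 centered at (4, 0).
g1, g2 = np.meshgrid(np.linspace(0, 8, 801), np.linspace(-4, 4, 801))
pts = np.column_stack([g1.ravel(), g2.ravel()])
in_K = (((pts[:, 0] - 4) ** 2 + pts[:, 1] ** 2 <= 16)
        & (pts[:, 1] - np.sin(pts[:, 0]) <= 0)
        & (-pts[:, 0] - pts[:, 1] <= 0))
K = pts[in_K]

for x in [np.array([-1.0, 0.5]),   # satisfies x1 + x2 <= 0 and x1 - x2 <= 0
          np.array([1.0, 0.0])]:   # violates the condition
    d = np.linalg.norm(K - x, axis=1)
    print(x, "nearest point of K ~", K[d.argmin()],
          " dist", d.min().round(4), " dist to (0,0)", np.linalg.norm(x).round(4))
# Expected: for (-1, 0.5) the nearest sampled point is (0, 0) (up to grid error);
# for (1, 0) the nearest point of K is strictly closer than (0, 0).
```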
5. Reformulations of convex constraints. Throughout this section, we always assume that $A_i$, $i = 1, \ldots, m$, are continuous convex functions. Without loss of generality, let
$$C_i = \{x \in X : A_i(x) \le 0\}, \quad i = 1, \ldots, m, \qquad\text{and}\qquad K = C \cap \bigcap_{i=1}^m C_i,$$
and, for $x \in K$, write $I(x) = \{i : A_i(x) = 0\}$. We first introduce the concept of the BCQ relative to $C$. For convenience, in what follows, $\operatorname{cone}\{\partial A_i(x) : A_i(x) = 0\}$ is understood to be $\{0\}$ when $A_i(x) < 0$ for all $i$.

Definition 5.1. Let $x \in K$. The system of convex inequalities
$$(5.1) \qquad A_1(x) \le 0, \ \ldots, \ A_m(x) \le 0$$
is said to satisfy the BCQ relative to $C$ at $x$ if
$$N_K(x) = N_C(x) + \operatorname{cone}\{\partial A_i(x) : A_i(x) = 0\}.$$
The system of convex inequalities (5.1) is said to satisfy the BCQ relative to $C$ if it satisfies the BCQ relative to $C$ at every $x \in K$.

Remark 5.1. When $C = X$, the BCQ relative to $C$ at $x$ is just the BCQ at $x$ considered in [12, 13]. Note that if $x \in K$ and $A_i(x) = 0$, then $\operatorname{cone}(\partial A_i(x)) \subseteq N_{C_i}(x)$, and equality holds if $x$ is not a minimizer of $A_i$; see [4, Corollary 1, p. 50].

As for the general BCQ, we have the following properties of the BCQ relative to $C$.

Proposition 5.1. Let $x \in K$. The system (5.1) satisfies the BCQ relative to $C$ at $x$ if and only if
$$N_K(x) \subseteq N_C(x) + \operatorname{cone}\{\partial A_i(x) : A_i(x) = 0\}.$$
Proof. Note that
$$N_C(x) + \operatorname{cone}\{\partial A_i(x) : i \in I(x)\} \subseteq N_C(x) + \sum_{i \in I(x)} N_{C_i}(x) \subseteq N_C(x) + \sum_{i=1}^m N_{C_i}(x) \subseteq N_K(x).$$
The result follows.

Proposition 5.2. Let $x \in K$. Suppose that the system (5.1) satisfies the BCQ relative to $C$ at $x$. Then $\{C, C_1, \ldots, C_m\}$ has the strong CHIP at $x$.

Definition 5.2. The system (5.1) is said to satisfy the weak Slater condition on $C$ if there exists some $\bar{x} \in (\operatorname{ri} C) \cap K$, called a weak Slater point, such that, for each $i$, either $A_i$ is affine or $A_i(\bar{x}) < 0$.

Remark 5.2. When $C = X$, the weak Slater condition on $C$ is just the weak Slater condition studied in [12, 13].

The following proposition is a generalization of Corollary 7 of [12].

Proposition 5.3. Suppose that the system (5.1) satisfies the weak Slater condition on $C$. Then it satisfies the BCQ relative to $C$.

Proof. Let $I_0 = \{i : A_i \text{ is affine}\}$, $H_0 = \cap_{i \notin I_0} C_i$, and $H = \cap_{i \in I_0} C_i$. From Theorem 5.1 of [10] and Proposition 2.3 of [10], it follows that $\{C, H\}$ and $\{C \cap H, H_0\}$ have the strong CHIP. Thus, for any $x \in K$, we have
$$N_K(x) = N_{C \cap H}(x) + N_{H_0}(x) = N_C(x) + N_H(x) + N_{H_0}(x).$$
Observe that the system (5.1) satisfies the weak Slater condition of [12], and hence it satisfies the BCQ of [12, 13] (cf. Remark 5.1). Since $\{H, H_0\}$ has the strong CHIP by Proposition 2.3 of [10], this gives
$$N_H(x) + N_{H_0}(x) = N_{H \cap H_0}(x) = \operatorname{cone}\{\partial A_i(x) : i \in I(x)\}.$$
Therefore, the system (5.1) satisfies the BCQ relative to $C$. The proof is complete.

The following lemma isolates a condition that does not depend on the BCQ but still allows the computation of $P_K(x)$ via a perturbation technique.
Lemma 5.1. Let $x^* = P_C\bigl(x - \sum_{i=1}^m \lambda_i h_i\bigr) \in K$ for some $h_i \in \partial A_i(x^*)$ and $\lambda_i \ge 0$ with $\lambda_i = 0$ for $i \notin I(x^*)$. Then $x^* = P_K(x)$.

Proof. Since $\lambda_i = 0$ for all $i \notin I(x^*)$ and $x^* = P_C\bigl(x - \sum_{i \in I(x^*)} \lambda_i h_i\bigr)$, it follows from Proposition 2.1 that
$$x - \sum_{i \in I(x^*)} \lambda_i h_i - x^* \in (C - x^*)^\circ.$$
Hence
$$x - x^* \in (C - x^*)^\circ + \sum_{i \in I(x^*)} \lambda_i h_i \subseteq (C - x^*)^\circ + \operatorname{cone}\{\partial A_i(x^*) : i \in I(x^*)\} \subseteq (K - x^*)^\circ.$$
Using Proposition 2.1 again, we have $x^* = P_K(x)$.

The main theorem of this section is stated as follows.

Theorem 5.1. Let $x^* \in K$. Then the following two statements are equivalent:
(i) The system (5.1) satisfies the BCQ relative to $C$ at $x^*$.
(ii) For any $x \in X$,
$$P_K(x) = x^* \iff x^* = P_C\Bigl(x - \sum_{i=1}^m \lambda_i h_i\Bigr)$$
for some $h_i \in \partial A_i(x^*)$ and $\lambda_i \ge 0$ with $\lambda_i = 0$ for $i \notin I(x^*)$.

Proof. Assume that (i) holds. To show (ii), by Lemma 5.1 we need only prove that, for any $x \in X$, $P_K(x) = x^*$ implies $x^* = P_C\bigl(x - \sum_{i=1}^m \lambda_i h_i\bigr)$ for some $h_i \in \partial A_i(x^*)$ and $\lambda_i \ge 0$ with $\lambda_i = 0$ for $i \notin I(x^*)$. From Proposition 2.1 and (i), we have
$$x - x^* \in (K - x^*)^\circ \subseteq (C - x^*)^\circ + \operatorname{cone}\{\partial A_i(x^*) : i \in I(x^*)\}.$$
Therefore, there exist $h_i \in \partial A_i(x^*)$ and $\lambda_i \ge 0$ for $i \in I(x^*)$ such that
$$x - x^* \in (C - x^*)^\circ + \sum_{i \in I(x^*)} \lambda_i h_i.$$
That is,
$$x - \sum_{i \in I(x^*)} \lambda_i h_i - x^* \in (C - x^*)^\circ.$$
It follows from Proposition 2.1 that $x^* = P_C\bigl(x - \sum_{i=1}^m \lambda_i h_i\bigr)$, and (ii) holds.

Conversely, assume that (ii) holds. For $z \in (K - x^*)^\circ$, let $x = z + x^*$. Observe that $x - x^* \in (K - x^*)^\circ$ implies that $P_K(x) = x^*$. It follows from (ii) that $x^* = P_C\bigl(x - \sum_{i=1}^m \lambda_i h_i\bigr)$ for some $h_i \in \partial A_i(x^*)$ and $\lambda_i \ge 0$ with $\lambda_i = 0$ for $i \notin I(x^*)$. Using Proposition 2.1, we have
$$z = x - x^* = \Bigl(x - \sum_{i=1}^m \lambda_i h_i - x^*\Bigr) + \sum_{i=1}^m \lambda_i h_i \in (C - x^*)^\circ + \operatorname{cone}\{\partial A_i(x^*) : i \in I(x^*)\}.$$
Hence $(K - x^*)^\circ \subseteq (C - x^*)^\circ + \operatorname{cone}\{\partial A_i(x^*) : i \in I(x^*)\}$. From Proposition 5.1, (i) holds. The proof is complete.

Corollary 5.1. The following two statements are equivalent:
(i) The system of convex inequalities (5.1) satisfies the BCQ relative to $C$.
(ii) For any $x \in X$ and $x^* \in K$, $P_K(x) = x^* \iff x^* = P_C\bigl(x - \sum_{i=1}^m \lambda_i h_i\bigr)$ for some $h_i \in \partial A_i(x^*)$ and $\lambda_i \ge 0$ with $\lambda_i = 0$ for $i \notin I(x^*)$.

The following corollary describes the relationship between the BCQ and the strong CHIP.

Corollary 5.2. Let $x^* \in K$. Suppose, in addition, that $A_i$, $i = 1, \ldots, m$, are differentiable at $x^*$. Let $K_S(x^*)$, $K_L(x^*)$, and $H_i$, $i = 1, \ldots, m$, be defined as in the previous sections. Then the following statements are equivalent:
(i) The system of convex inequalities (5.1) satisfies the BCQ relative to $C$ at $x^*$.
(ii) $\{C, H_1, H_2, \ldots, H_m\}$ has the strong CHIP at $x^*$ and $K_S(x^*) = K_L(x^*)$.
(iii) For any $x \in X$, $P_K(x) = x^* \iff x^* = P_C\bigl(x - \sum_{i=1}^m \lambda_i A_i'(x^*)\bigr)$ for some $\lambda_i \ge 0$ with $\lambda_i = 0$ for all $i \notin I(x^*)$.

Proof. The equivalence of (i) and (iii) is a direct consequence of Theorem 5.1; hence we need only prove that (ii) is equivalent to (iii). Since $K$ is convex, Theorem 4.1 gives the implication (ii) $\implies$ (iii). Conversely, assume that (iii) holds. By Lemma 3.1 of [10], we then have that $P_K(x) = x^* \implies P_{K_L(x^*)}(x) = x^*$. From Corollary 3.2 it follows that $K_S(x^*) = K_L(x^*)$. Again using Theorem 4.1, we conclude that $\{C, H_1, H_2, \ldots, H_m\}$ has the strong CHIP at $x^*$. The proof is complete.
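To see the perturbation property of Theorem 5.1 at work on a nondifferentiable instance, here is a small finite-dimensional sketch of our own (the data are arbitrary choices, not taken from the paper): $X = \mathbb{R}^2$, $C = \{x : x_1 \le 0\}$, a single constraint $A(x) = |x_1| + |x_2| - 1 \le 0$, and $x^* = (0, 1)$, where $\partial A(x^*) = \{(s, 1) : s \in [-1, 1]\}$ and the weak Slater condition on $C$ holds (take $\bar{x} = (-1/2, 0)$).

```python
# Sketch of Theorem 5.1 (illustrative data, not from the paper):
# C = {x1 <= 0}, A(x) = |x1| + |x2| - 1 <= 0, x* = (0, 1) in K = C ∩ {A <= 0}.
import numpy as np

def proj_C(y):                     # projection onto the half-space {x1 <= 0}
    return np.array([min(y[0], 0.0), y[1]])

x_star = np.array([0.0, 1.0])
x = np.array([1.0, 2.0])

# Perturbation side: x* = P_C(x - lam*h) with h = (s, 1) a subgradient of A at x*;
# lam and s are chosen so that (x - lam*h)_2 = 1 and (x - lam*h)_1 >= 0.
lam, s = 1.0, 1.0
h = np.array([s, 1.0])
print("P_C(x - lam*h) =", proj_C(x - lam * h))        # -> [0. 1.] = x*

# Direct side: brute-force projection of x onto K agrees with x*.
g1, g2 = np.meshgrid(np.linspace(-1, 0, 401), np.linspace(-1, 1, 801))
pts = np.column_stack([g1.ravel(), g2.ravel()])
K = pts[np.abs(pts[:, 0]) + np.abs(pts[:, 1]) <= 1]
print("argmin over K ~", K[np.linalg.norm(K - x, axis=1).argmin()])  # ~ [0. 1.]
```

Both computations return $x^* = (0, 1)$, matching the equivalence in Theorem 5.1; the multiplier $\lambda = 1 > 0$ is consistent with the constraint being active at $x^*$.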
Finally, we give an example with nondifferentiable convex constraints.

Example 5.1. Let $X = \ell^2$ and let $C$ be the half-space defined by
$$C = \{x = (x_1, x_2, \ldots) \in \ell^2 : x_1 \le 0\}.$$
Define
$$A(x) = \sum_{k=1}^\infty |x_k| - 1 \quad \forall x = (x_1, x_2, \ldots) \in \ell^2,$$
and take $x^* = (x_k^*) \in K$, where
$$x_k^* = \begin{cases} 0, & k = 2n+1, \ n = 0, 1, 2, \ldots, \\ \dfrac{1}{2^n}, & k = 2n, \ n = 1, 2, \ldots. \end{cases}$$
Then $x^* \in K$, $A(x^*) = 0$, and
$$\partial A(x^*) = \{z = (z_1, z_2, \ldots) : z_{2n} = 1 \ (n = 1, 2, \ldots), \ z_{2n+1} \in [-1, 1] \ (n = 0, 1, 2, \ldots)\}.$$
Since the system of convex inequalities $A(x) \le 0$ satisfies the weak Slater condition on $C$, it satisfies the BCQ relative to $C$. Thus, using Theorem 5.1, we obtain that, for any $x = (x_1, x_2, \ldots) \in \ell^2$, $P_K(x) = x^*$ if and only if there exists $b \ge 0$ such that
$$x_1 \ge -b, \qquad x_{2n} = \frac{1}{2^n} + b, \qquad x_{2n+1} \in [-b, b], \quad n = 1, 2, \ldots.$$
In fact, for any $x \in \ell^2$, $P_C(x) = x^*$ if and only if $x_1 \ge 0$ and $x_k = x_k^*$ for all $k > 1$. By Theorem 5.1, $P_K(x) = x^*$ if and only if there exist $\lambda \ge 0$ and $t_{2n+1} \in [-1, 1]$ such that $P_C(x - \lambda(t_1, 1, t_3, 1, \ldots)) = x^*$. From this we can deduce the desired result.

6. Concluding remark. Nonlinear best approximation problems in Hilbert spaces have been studied in this paper. As in the case of linear constraints, the strong CHIP is used to characterize the "perturbation property" of best approximations in the case of differentiable constraints; for convex constraints, the "perturbation property" is characterized, for the first time, by means of the generalized BCQ. Our main results are Theorems 4.1 and 5.1. In particular, for both differentiable and convex constraints, we have obtained the equivalence of the generalized BCQ, the "perturbation property," and the strong CHIP together with the constraint qualification condition $K_L(x^*) = K_S(x^*)$. Moreover, examples with nonlinear constraints have been given to show that our main results genuinely generalize recent work [9, 10] on best approximation with linear constraints.

Acknowledgments. We wish to thank the referees for their valuable comments and suggestions. We also wish to express our gratitude to Dr. K. F. Ng and Dr. W. Li for their careful reading of drafts of the present paper and for their helpful remarks.

REFERENCES

[1] M. Bazaraa, J. Goode, and C. Shetty, Constraint qualifications revisited, Management Sci., 18 (1972), pp. 567–573.
[2] C. De Boor, On "best" interpolation, J. Approx. Theory, 16 (1976), pp. 28–48.
[3] B. Brosowski and F. Deutsch, On some geometric properties of suns, J. Approx. Theory, 10 (1974), pp. 245–267.
[4] F. Clarke, Optimization and Nonsmooth Analysis, John Wiley & Sons, New York, 1983.
[5] C. Chui, F. Deutsch, and J. Ward, Constrained best approximation in Hilbert space, Constr. Approx., 6 (1990), pp. 35–64.
[6] C. Chui, F. Deutsch, and J. Ward, Constrained best approximation in Hilbert space II, J. Approx. Theory, 71 (1992), pp. 231–238.
[7] F. Deutsch, The role of the strong conical hull intersection property in convex optimization and approximation, in Approximation Theory IX, Vol. I: Theoretical Aspects, C. Chui and L. Schumaker, eds., Vanderbilt University Press, Nashville, TN, 1998, pp. 105–112.
[8] F. Deutsch, W. Li, and J. Swetits, Fenchel duality and the strong conical hull intersection property, J. Optim. Theory Appl., 102 (1997), pp. 681–695.
[9] F. Deutsch, W. Li, and J. Ward, A dual approach to constrained interpolation from a convex subset of Hilbert space, J. Approx. Theory, 90 (1997), pp. 385–444.
[10] F. Deutsch, W. Li, and J. D. Ward, Best approximation from the intersection of a closed convex set and a polyhedron in Hilbert space, weak Slater conditions, and the strong conical hull intersection property, SIAM J. Optim., 10 (1999), pp. 252–268.
[11] F. Deutsch, V. Ubhaya, J. Ward, and Y. Xu, Constrained best approximation in Hilbert space III: Application to n-convex functions, Constr. Approx., 12 (1996), pp. 361–384.
[12] H. Bauschke, J. Borwein, and W. Li, Strong conical hull intersection property, bounded linear regularity, Jameson's property (G), and error bounds in convex optimization, Math. Program., 86 (1999), pp. 135–160.
[13] J. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I, Grundlehren Math. Wiss. 305, Springer, New York, 1993.
[14] O. Mangasarian, Nonlinear Programming, McGraw–Hill, New York, 1969.
[15] O. L. Mangasarian and S. Fromovitz, The Fritz John necessary optimality conditions in the presence of equality constraints, J. Math. Anal. Appl., 17 (1967), pp. 37–47.
[16] C. Micchelli, P. Smith, J. Swetits, and J. Ward, Constrained $L_p$-approximation, Constr. Approx., 1 (1985), pp. 93–102.
[17] C. A. Micchelli and F. I. Utreras, Smoothing and interpolation in a convex subset of a Hilbert space, SIAM J. Sci. Statist. Comput., 9 (1988), pp. 728–746.
[18] I. Singer, Duality for optimization and best approximation over finite intersection, Numer. Funct. Anal. Optim., 19 (1998), pp. 903–915.
[19] R. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[20] Y. Yuan and W. Sun, Optimization Theory and Methods, Science Press, Beijing, 1997 (in Chinese).