Course Notes for Greedy Approximations
Math 663-601
Th. Schlumprecht
March 10, 2015

Contents

1 The Threshold Algorithm  5
  1.1 Greedy and Quasi Greedy Bases  5
  1.2 The Haar basis is greedy in Lp[0,1] and Lp(R)  15
  1.3 Quasi greedy but not unconditional  17
2 Greedy Algorithms in Hilbert Space  27
  2.1 Introduction  27
  2.2 Convergence  30
  2.3 Convergence Rates  42
3 Greedy Algorithms in general Banach Spaces  51
  3.1 Introduction  51
  3.2 Convergence of the Weak Dual Chebyshev Greedy Algorithm  55
  3.3 Weak Dual Greedy Algorithm with Relaxation  60
  3.4 Convergence Theorem for the Weak Dual Algorithm  65
4 Open Problems  71
  4.1 Greedy Bases  71
  4.2 Greedy Algorithms  73
5 Appendix A: Bases in Banach spaces  75
  5.1 Schauder bases  75
  5.2 Markushevich bases  79
6 Appendix B: Some facts about Lp[0,1] and Lp(R)  87
  6.1 The Haar basis and Wavelets  87
  6.2 Khintchine's inequality and Applications  96

Introduction

Signals or images are often modeled as elements of some Banach space consisting of functions, for example C(D), Lp(D), or more generally Sobolev spaces W^{r,p}(D), for a domain D ⊂ R^d. These functions need to be "processed": approximated, converted into an object which is storable, such as a sequence of numbers, and then reconstructed.
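The processing pipeline just described (approximate, store finitely many numbers, reconstruct) can be illustrated with a toy sketch: expand a "signal" into coordinates, keep only the n coordinates of largest modulus, and measure the reconstruction error. The function name and the decaying coefficient sequence below are illustrative choices, not part of the notes.

```python
import math

def n_term_compress(coeffs, n):
    """Keep the n coordinates of largest absolute value; zero out the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    kept = set(order[:n])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

# A decaying coefficient sequence, standing in for a smooth signal.
coeffs = [(-1) ** k / (k + 1) ** 2 for k in range(100)]

# The reconstruction error (here in the l2 norm) decreases as n grows.
errors = []
for n in (5, 10, 20):
    approx = n_term_compress(coeffs, n)
    err = math.sqrt(sum((c - a) ** 2 for c, a in zip(coeffs, approx)))
    errors.append(err)

assert errors[0] > errors[1] > errors[2]
```

The budget question raised below is exactly which n coordinates to keep so that the error is as small as possible.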
This means finding an appropriate basis of the Banach space, or more generally a dictionary, and computing as many coordinates of the given function with respect to this basis as are necessary to satisfy the given error estimates. The question one then needs to solve is which coordinates to use, given a restriction on the budget.

Definition 0.0.1. Let X (always) be a separable real Banach space. We call D ⊂ S_X a dictionary of X if span(D) is dense in X and x ∈ D implies that −x ∈ D. An approximation algorithm is a map

  G : X → span(D)^N,  x ↦ G(x) = (G_n(x)),

with the property that for every n ∈ N and x ∈ X there is a set Λ_{(n,x)} ⊂ D of cardinality at most n so that G_n(x) ∈ span(Λ_{(n,x)}). For n ∈ N we call G_n(x) the n-term approximation of x. Usually G_n(x) is computed inductively by maximizing a certain value; therefore these algorithms are often called greedy algorithms.

Remark. If X has a basis (e_n) with biorthogonal functionals (e_n*), then G = (P_n), where P_n is the n-th canonical projection, is an example of an approximation algorithm. Nevertheless, the point is to be able to adapt the set Λ_n to the vector x, and not let it be independent of x.

The main questions are:

1) Does (G_n(x)) converge to x?
2) If so, how fast does it converge? How fast does it converge for certain x?
3) How does ||x − G_n(x)|| compare to the best n-term approximation error, defined by

  σ_n(x) = σ_n(x, D) = inf_{Λ⊂D, #Λ=n} inf_{z∈span(Λ)} ||z − x|| ?

Chapter 1: The Threshold Algorithm

1.1 Greedy and Quasi Greedy Bases

We start with the Threshold Algorithm.

Definition 1.1.1. Let X be a separable Banach space with a normalized M-basis ((e_i, e_i*) : i ∈ N); by normalized we mean that ||e_i|| = 1 for i ∈ N. For n ∈ N and x ∈ X let Λ_n ⊂ N be a set of cardinality n so that

  min_{i∈Λ_n} |e_i*(x)| ≥ max_{i∈N\Λ_n} |e_i*(x)|,

i.e. we reorder (e_i*(x)) into (e_{σ_i}*(x)) so that |e_{σ_1}*(x)| ≥ |e_{σ_2}*(x)| ≥ |e_{σ_3}*(x)| ≥ ..., and put, for n ∈ N, Λ_n = {σ_1, σ_2, ..., σ_n}. Then define for n ∈ N

  G_n^T(x) = Σ_{i∈Λ_n} e_i*(x) e_i.
(G_n^T) is called the Threshold Algorithm.

Definition 1.1.2. A normalized M-basis (e_i) is called quasi-greedy if for all x ∈ X

  x = lim_{n→∞} G_n^T(x).   (QG)

A basis is called greedy if there is a constant C so that for all x ∈ X and n ∈ N

  ||x − G_n^T(x)|| ≤ C σ_n(x),   (G)

where we define

  σ_n(x) = σ_n(x, (e_j)) = inf_{Λ⊂N, #Λ=n} inf_{z∈span(e_j : j∈Λ)} ||z − x||.

In that case we say that (e_i) is C-greedy. We call the smallest constant C for which (G) holds the greedy constant of (e_n) and denote it by C_g.

Remarks. Let ((e_i, e_i*) : i ∈ N) be a normalized M-basis.

1. Since (e_n) is fundamental we obtain that σ_n(x) → 0 as n → ∞, for every x ∈ X; it follows therefore that every greedy basis is quasi-greedy.

2. If (e_j) is an unconditional basis of X and x = Σ_{i=1}^∞ a_i e_i ∈ X, then

  x = lim_{n→∞} Σ_{j=1}^n a_{π(j)} e_{π(j)}

for any permutation π : N → N, and thus in particular also for a greedy permutation, i.e. a permutation for which |a_{π(1)}| ≥ |a_{π(2)}| ≥ |a_{π(3)}| ≥ .... Thus, an unconditional basis is always quasi-greedy.

3. Schauder bases have a special order and might be reordered so that they cease to be a basis. But
  • unconditional bases,
  • M-bases,
  • quasi-greedy M-bases,
  • greedy bases
keep their properties under any permutation, and can therefore be indexed by any countable set.

4. In order to obtain a quasi-greedy M-basis which is not a Schauder basis, one could take a quasi-greedy Schauder basis which is not unconditional (its existence will be shown later), but which admits a suitable reordering under which it is no longer a Schauder basis. Nevertheless, by the observations in (3), it will still be a quasi-greedy M-basis. It seems to be unknown whether or not there is a quasi-greedy M-basis which cannot be reordered into a Schauder basis.

Examples 1.1.3.
1. If 1 ≤ p < ∞, then the unit vector basis (e_i) of ℓ_p is 1-greedy.
2. The unit vector basis (e_i) of c_0 is 1-greedy.
3. The summing basis (s_n) of c_0 (s_n = Σ_{j=1}^n e_j) is not quasi-greedy.
4. The unit vector basis of (ℓ_p ⊕ ℓ_q)_1 is not greedy (but it is 1-unconditional and thus quasi-greedy).

Proof. To prove (1) let x = Σ_{j=1}^∞ x_j e_j ∈ ℓ_p, and let Λ_n ⊂ N be of cardinality n so that

  min{|x_j| : j ∈ Λ_n} ≥ max{|x_j| : j ∈ N\Λ_n}.

Let Λ ⊂ N be any subset of cardinality n and z = Σ z_i e_i ∈ ℓ_p with supp(z) = {i ∈ N : z_i ≠ 0} ⊂ Λ. Then

  ||x − z||_p^p = Σ_{j∈Λ} |x_j − z_j|^p + Σ_{j∈N\Λ} |x_j|^p ≥ Σ_{j∈N\Λ} |x_j|^p ≥ Σ_{j∈N\Λ_n} |x_j|^p = ||G_n^T(x) − x||_p^p.

Thus σ_n(x) = inf{||z − x||_p : #supp(z) ≤ n} = ||G_n^T(x) − x||_p.

(2) can be shown in the same way as (1).

In order to show (3) we choose sequences (ε_j) ⊂ (0,1) and (n_j) ⊂ N as follows:

  ε_{2j} = 2^{−j} and ε_{2j−1} = 2^{−j}(1 + 1/j^3), for j ∈ N,

and

  n_j = j 2^j and N_j = Σ_{i=1}^j n_i, for j ∈ N_0 (so N_0 = 0).

Note that the series

  x = Σ_{j=1}^∞ Σ_{i=N_{j−1}+1}^{N_j} (ε_{2j−1} s_{2i−1} − ε_{2j} s_{2i})
    = Σ_{j=1}^∞ Σ_{i=N_{j−1}+1}^{N_j} ((ε_{2j−1} − ε_{2j}) s_{2i−1} − ε_{2j} e_{2i})

converges, because

  Σ_{j=1}^∞ Σ_{i=N_{j−1}+1}^{N_j} ε_{2j} e_{2i} ∈ c_0

and

  || Σ_{j=1}^∞ Σ_{i=N_{j−1}+1}^{N_j} (ε_{2j−1} − ε_{2j}) s_{2i−1} || ≤ Σ_{j=1}^∞ n_j (ε_{2j−1} − ε_{2j}) = Σ_{j=1}^∞ 1/j^2 < ∞.

Now we compute for l ∈ N_0 the vector x − G^T_{2N_l + n_{l+1}}(x):

  x − G^T_{2N_l + n_{l+1}}(x) = − Σ_{i=N_l+1}^{N_{l+1}} ε_{2l+2} s_{2i} + Σ_{j=l+2}^∞ Σ_{i=N_{j−1}+1}^{N_j} (ε_{2j−1} s_{2i−1} − ε_{2j} s_{2i}).

From the monotonicity of (s_i) we deduce that

  || Σ_{j=l+2}^∞ Σ_{i=N_{j−1}+1}^{N_j} (ε_{2j−1} s_{2i−1} − ε_{2j} s_{2i}) || ≤ ||x||.

However,

  || Σ_{i=N_l+1}^{N_{l+1}} ε_{2l+2} s_{2i} || = Σ_{i=N_l+1}^{N_{l+1}} ε_{2l+2} = n_{l+1} ε_{2l+2} = l + 1 → ∞ as l → ∞,

which implies that (G_n^T(x)) does not converge.

To show (4) assume w.l.o.g. that p < q, denote the unit vector basis of ℓ_p by (e_i) and that of ℓ_q by (f_j), and for n ∈ N put

  x(n) = Σ_{j=1}^n (1/2) e_j + Σ_{j=1}^n f_j.

Then G_n^T(x(n)) = Σ_{j=1}^n f_j, and thus ||G_n^T(x(n)) − x(n)|| = (1/2) n^{1/p}. Nevertheless,

  || x(n) − Σ_{j=1}^n (1/2) e_j || = n^{1/q},

and since (1/2) n^{1/p} / n^{1/q} ↗ ∞ as n ↗ ∞, the basis {e_j : j ∈ N} ∪ {f_j : j ∈ N} cannot be greedy. □
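A minimal numerical sketch of Example 1.1.3 (1): in ℓ_p the error of the threshold algorithm is the ℓ_p norm of the discarded (smallest) coefficients, and it coincides with the best n-term error σ_n, which we brute-force over all supports of size n. The helper names are ours, not part of the notes.

```python
from itertools import combinations

def threshold_residual_p(x, n, p):
    """l_p norm of x - G_n^T(x): drop the n largest-in-modulus coordinates."""
    tail = sorted((abs(c) for c in x), reverse=True)[n:]
    return sum(c ** p for c in tail) ** (1.0 / p)

def sigma_n_p(x, n, p):
    """Best n-term error in l_p, by brute force over all supports of size n.
    On a fixed support the optimal z simply copies the coordinates of x,
    so the error is the l_p norm of x off the support."""
    best = float("inf")
    for supp in combinations(range(len(x)), n):
        err = sum(abs(c) ** p for i, c in enumerate(x) if i not in supp) ** (1.0 / p)
        best = min(best, err)
    return best

x = [0.3, -1.2, 0.05, 2.0, -0.7, 0.4]
for n in range(len(x) + 1):
    assert abs(threshold_residual_p(x, n, 2) - sigma_n_p(x, n, 2)) < 1e-12
```

The same two functions make the failure in Example (4) visible: there the greedy set picks the ℓ_q coordinates while the best n-term approximation should remove the ℓ_p coordinates, and the ratio of the two errors grows like n^{1/p − 1/q}.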
Remarks. With the arguments used in Examples 1.1.3 (4) one can show that the usual bases of (⊕_{n=1}^∞ ℓ_q^n)_{ℓ_p} and ℓ_p(ℓ_q) = (⊕_{n=1}^∞ ℓ_q)_{ℓ_p} are also not greedy, but of course unconditional. Now in [BCLT] it was shown that ℓ_p ⊕ ℓ_q has, up to permutation and up to isomorphic equivalence, a unique unconditional basis, namely the one indicated above. Since, as will be shown later, every greedy basis must be unconditional, this space does not have any greedy basis.

Due to a result in [DFOS], however, (⊕_{n=1}^∞ ℓ_q^n)_{ℓ_p} has a greedy basis if 1 < p, q < ∞. More precisely, the following was shown. Let 1 ≤ p, q ≤ ∞.
a) If 1 < q < ∞, then the Banach space (⊕_{n=1}^∞ ℓ_p^n)_{ℓ_q} has a greedy basis.
b) If q = 1 or q = ∞, and p ≠ q, then (⊕_{n=1}^∞ ℓ_p^n)_{ℓ_q} does not have a greedy basis. Here we take the c_0-sum if q = ∞.

The question whether or not ℓ_p(ℓ_q) has a greedy basis is open, and quite an interesting question.

The following result by Wojtaszczyk can be seen as the analogue, for quasi-greedy bases, of the characterization of Schauder bases by the uniform boundedness of the canonical projections.

Theorem 1.1.4. [Wo2] A bounded M-basis (e_i, e_i*), with ||e_i|| = 1, i ∈ N, of a Banach space X is quasi-greedy if and only if there is a constant C so that for any x ∈ X and any m ∈ N it follows that

  (1.1) ||G_m^T(x)|| ≤ C ||x||.

We call the smallest constant for which (1.1) is satisfied the greedy projection constant.

Remark. Theorem 1.1.4 is basically a uniform boundedness result. Nevertheless, since the G_m^T are nonlinear projections, we need a direct proof.

We first need the following lemma.

Lemma 1.1.5. Assume there is no positive number C so that ||G_m^T(x)|| ≤ C ||x|| for all x ∈ X and all m ∈ N. Then the following holds: for every finite A ⊂ N and every K > 0 there is a finite B ⊂ N, disjoint from A, and a vector x with x = Σ_{j∈B} x_j e_j, such that ||x|| = 1 and ||G_m^T(x)|| ≥ K for some m ∈ N.

Proof. For a finite set F ⊂ N, define P_F to be the coordinate projection onto span(e_i : i ∈ F), generated by the (e_i*), i.e.
X PF : X → span(ei : i ∈ F ), x 7→ PF (x) = e∗j (x)ej . j∈F 10 CHAPTER 1. THE THRESHOLD ALGORITHM Since there are only finitely many subsets of A we can put X X M = max kPF k = max sup e∗j (x)ej ≤ ke∗j k · kej k < ∞. F ⊂A F ⊂A x∈BX j∈F j∈A Let K1 > 1 so that (K1 −M )/(M +1) > K, and choose x1 ∈ SX ∩span(ej : j ∈ N) and k ∈ N so that so that kGTk (x1 )k ≥ K1 . We assume without loss of generality (after suitable small perturbation) that all the non zero numbers |e∗n (x1 )| are different from each other. Then let x2 = x1 − PA (x1 ), and note that kx2 k ≤ M + 1 and GTk (x1 ) = GTm (x2 ) + PF (x1 ) for some m ≤ k and F ⊂ A. Thus kGTm (x2 )k ≥ K1 − M , and if we define x3 = x2 /kx2 k, we have kGTk (x3 )k ≥ (K1 − M )/(M + 1) > K. It follows that the support B of x = x3 is disjoint from A and that kGTm (x)k > K. Proof of Theorem 1.1.4. Let b = supi ke∗i k. “⇒” Assume there is no positive number C so that kGTm (x)k ≤ Ckxk for all x ∈ X and all m ∈ N. Applying Lemma 1.1.5 we can choose recursively vectors y1 , y2 , . . . in SX ∩ span(ej : j ∈ N) and numbers mn ∈ N, so that the supportsP of the yn , which we denote by Bn , are pairwise disjoint, (Recall that for z = ∞ i=1 zi ei , we call ∗ supp(z) = {i ∈ N : ei (z) 6= 0}, the support of z) and so that kGTmn (yn )k (1.2) n n ≥2 b n−1 Y ε−1 j , j=1 where εj = min 2−j , min{|e∗i (yj )| : i ∈ Bj } . Then we let ∞ n−1 X Y x= (εj /b) yn , n=1 j=1 (which clearly converges) and write x as x= ∞ X xj ej . j=1 Since |e∗i (yj )| ≤ b, for i, j ∈ N n n n n o n−1 n o [ Y εj Y [ εj min |xi | : i ∈ Bj ≥ εn = b ≥ max |xi | : i ∈ N \ Bj . b b j=1 j=1 j=1 j=1 1.1. GREEDY AND QUASI GREEDY BASES 11 We may assume w.l.o.g. that mn ≤ #Bn , for n ∈N. Letting kj = mj + Pj−1 i=1 #Bi , it follows that GTkj (x) = j−1 Y i−1 X i=1 (εs /b) yi + GTmj ! j−1 Y (εi /b) yj . s=1 i=1 and thus by (1.2) ! j−1 i−1 XY j−1 Y T T G (x) ≥ Gm − (ε /b) kyi k ≥ 2j b, (ε /b) y s i j kj j i=1 i=1 s=1 which implies that GTkj does not converge. 
"⇐" Let C > 0 be such that ||G_m^T(x)|| ≤ C ||x|| for all m ∈ N and all x ∈ X. Let x ∈ X and assume w.l.o.g. that supp(x) is infinite. For ε > 0 choose x' with finite support A so that ||x − x'|| < ε. Using small perturbations we can assume that A ⊂ supp(x) and that A ⊂ supp(x − x'). We can therefore choose m ∈ N large enough so that G_m^T(x) and G_m^T(x − x') are of the form

  G_m^T(x) = Σ_{i∈B} e_i*(x) e_i and G_m^T(x − x') = Σ_{i∈B} e_i*(x − x') e_i

with B ⊂ N such that A ⊂ B. It follows therefore that

  ||x − G_m^T(x)|| ≤ ||x − x'|| + ||x' − G_m^T(x)|| = ||x − x'|| + ||G_m^T(x' − x)|| ≤ (1 + C) ε,

which implies our claim, by choosing ε > 0 arbitrarily small. □

Definition 1.1.6. An M-basis (e_j, e_j*) is called unconditional for constant coefficients if there is a positive constant C so that for all finite sets A ⊂ N and all signs (σ_n : n ∈ A) ⊂ {±1} we have

  (1/C) || Σ_{n∈A} e_n || ≤ || Σ_{n∈A} σ_n e_n || ≤ C || Σ_{n∈A} e_n ||.

Proposition 1.1.7. A quasi-greedy M-basis (e_n, e_n*) is unconditional for constant coefficients. Actually, the constant in Definition 1.1.6 can be chosen to be equal to twice the projection constant in Theorem 1.1.4.

Remark. We will show later that there are quasi-greedy bases which are not unconditional. Actually, there are Banach spaces which do not contain any unconditional basic sequence, but in which every normalized weakly null sequence contains a quasi-greedy subsequence.

Proof of Proposition 1.1.7. Let A ⊂ N be finite and (σ_n : n ∈ A) ⊂ {±1}. Let δ ∈ (0,1) and put m = #{j ∈ A : σ_j = +1}. Then we obtain

  || Σ_{n∈A, σ_n=+1} e_n || = || G_m^T( Σ_{n∈A, σ_n=+1} e_n + Σ_{n∈A, σ_n=−1} (1 − δ) e_n ) ||
    ≤ C || Σ_{n∈A, σ_n=+1} e_n + Σ_{n∈A, σ_n=−1} (1 − δ) e_n ||.

By taking δ > 0 arbitrarily small, we obtain that

  || Σ_{n∈A, σ_n=+1} e_n || ≤ C || Σ_{n∈A} e_n ||.

Similarly we have

  || Σ_{n∈A, σ_n=−1} e_n || ≤ C || Σ_{n∈A} e_n ||,

and thus

  || Σ_{n∈A} σ_n e_n || ≤ 2C || Σ_{n∈A} e_n ||. □

We now present a characterization of greedy bases obtained by Konyagin and Temlyakov. We need the following notation.

Definition 1.1.8.
We call a normalized basic sequence (e_i) democratic if there is a constant C so that for all finite E, F ⊂ N with #E = #F it follows that

  (1.3) || Σ_{j∈E} e_j || ≤ C || Σ_{j∈F} e_j ||.

In that case we call the smallest constant for which (1.3) holds the constant of democracy of (e_i) and denote it by C_d.

Theorem 1.1.9. [KT1] A normalized basis (e_n) is greedy if and only if it is unconditional and democratic. In this case

  (1.4) max(C_s, C_d) ≤ C_g ≤ C_d C_s C_u^2 + C_s,

where C_u is the unconditional constant and C_s is the suppression constant.

Remark. The proof will show that the first inequality is sharp. Recently it was shown in [DOSZ1] that the second inequality is also sharp.

Proof of Theorem 1.1.9. "⇐" Let x = Σ e_i*(x) e_i ∈ X, n ∈ N, and let η > 0. Choose x̃ = Σ_{i∈Λ_n*} a_i e_i with #Λ_n* = n which is, up to η, the best n-term approximation of x (since we allow a_i to be 0, we can assume that #Λ_n* is exactly n), i.e.

  (1.5) ||x − x̃|| ≤ σ_n(x) + η.

Let Λ_n be a set of n coordinates for which

  b := min_{i∈Λ_n} |e_i*(x)| ≥ max_{i∈N\Λ_n} |e_i*(x)| and G_n^T(x) = Σ_{i∈Λ_n} e_i*(x) e_i.

We need to show that ||x − G_n^T(x)|| ≤ (C_d C_s C_u^2 + C_s)(σ_n(x) + η). Now

  x − G_n^T(x) = Σ_{i∈N\Λ_n} e_i*(x) e_i = Σ_{i∈Λ_n*\Λ_n} e_i*(x) e_i + Σ_{i∈N\(Λ_n*∪Λ_n)} e_i*(x) e_i.

But we also have

  (1.6) || Σ_{i∈Λ_n*\Λ_n} e_i*(x) e_i ||
      ≤ b C_u || Σ_{i∈Λ_n*\Λ_n} e_i ||  (by Proposition 5.1.11)
      ≤ b C_u C_d || Σ_{i∈Λ_n\Λ_n*} e_i ||  [note that #(Λ_n\Λ_n*) = #(Λ_n*\Λ_n)]
      ≤ C_u^2 C_d || Σ_{i∈Λ_n\Λ_n*} e_i*(x) e_i ||  [note that |e_i*(x)| ≥ b if i ∈ Λ_n\Λ_n*]
      ≤ C_s C_u^2 C_d || Σ_{i∈Λ_n*} (e_i*(x) − a_i) e_i + Σ_{i∈N\Λ_n*} e_i*(x) e_i ||
      = C_s C_u^2 C_d ||x − x̃|| ≤ C_s C_u^2 C_d (σ_n(x) + η),

and

  (1.7) || Σ_{i∈N\(Λ_n*∪Λ_n)} e_i*(x) e_i || ≤ C_s || Σ_{i∈Λ_n*} (e_i*(x) − a_i) e_i + Σ_{i∈N\Λ_n*} e_i*(x) e_i ||
      = C_s ||x − x̃|| ≤ C_s (σ_n(x) + η).

This shows that (e_i) is greedy and, since η > 0 is arbitrary, we deduce that C_g ≤ C_s C_u^2 C_d + C_s.

"⇒" Assume that (e_i) is greedy. In order to show that (e_i) is democratic, let Λ_1, Λ_2 ⊂ N with #Λ_1 = #Λ_2.
Let η > 0, put m = #(Λ_2 \ Λ_1), and set

  x = Σ_{i∈Λ_1} e_i + (1 + η) Σ_{i∈Λ_2\Λ_1} e_i.

Then it follows that

  || Σ_{i∈Λ_1} e_i || = ||x − G_m^T(x)||
    ≤ C_g σ_m(x)  (since (e_i) is C_g-greedy)
    ≤ C_g || x − Σ_{i∈Λ_1\Λ_2} e_i || = C_g || Σ_{i∈Λ_1∩Λ_2} e_i + (1 + η) Σ_{i∈Λ_2\Λ_1} e_i ||.

Since η > 0 can be taken arbitrarily small, we deduce that

  || Σ_{i∈Λ_1} e_i || ≤ C_g || Σ_{i∈Λ_2} e_i ||.

Thus it follows that (e_i) is democratic with C_d ≤ C_g.

In order to show that (e_i) is unconditional, let x = Σ e_i*(x) e_i ∈ X have finite support S. Let Λ ⊂ S and put

  y = Σ_{i∈Λ} e_i*(x) e_i + b Σ_{i∈S\Λ} e_i,

with b > max_{i∈S} |e_i*(x)|. For n = #(S\Λ) it follows that

  G_n^T(y) = b Σ_{i∈S\Λ} e_i,

and since (e_i) is greedy we deduce (note that #supp(y − x) = n) that

  || Σ_{i∈Λ} e_i*(x) e_i || = ||y − G_n^T(y)|| ≤ C_g σ_n(y) ≤ C_g ||y − (y − x)|| = C_g ||x||,

which implies that (e_i) is unconditional with C_s ≤ C_g. □

1.2 The Haar basis is greedy in L_p[0,1] and L_p(R)

Theorem 1.2.1. For 1 < p < ∞ there are two constants c_p ≤ C_p, depending only on p, so that for all n ∈ N and all A ⊂ T with #A = n

  c_p n^{1/p} ≤ || Σ_{t∈A} h_t^{(p)} || ≤ C_p n^{1/p}.

In particular, (h_t^{(p)})_{t∈T} is democratic in L_p[0,1].

With Theorem 1.1.9 and Theorem 6.1.1 we deduce:

Corollary 1.2.2. The Haar basis of L_p[0,1], 1 < p < ∞, is greedy.

The proof will follow from the following three lemmas.

Lemma 1.2.3. For any 0 < q < ∞ there is a d_q > 0 so that the following holds. Let n_1 < n_2 < ... < n_k be integers and let E_j ⊂ [0,1] be measurable for j = 1, ..., k. Then we have

  ∫_0^1 ( Σ_{j=1}^k 2^{n_j/q} 1_{E_j}(x) )^q dx ≤ d_q Σ_{j=1}^k 2^{n_j} m(E_j).

Proof. Define

  f(x) = Σ_{j=1}^k 2^{n_j/q} 1_{E_j}(x).

For j = 1, ..., k write E_j' = E_j \ ∪_{i=j+1}^k E_i. It follows that for x ∈ E_j'

  f(x) ≤ Σ_{i=1}^j 2^{n_i/q} ≤ Σ_{i=1}^{n_j} 2^{i/q} = (2^{(n_j+1)/q} − 1)/(2^{1/q} − 1) ≤ (2^{1/q}/(2^{1/q} − 1)) 2^{n_j/q} =: d_q^{1/q} 2^{n_j/q}.

Thus

  ∫_0^1 f(x)^q dx ≤ d_q Σ_{i=1}^k 2^{n_i} m(E_i') ≤ d_q Σ_{j=1}^k 2^{n_j} m(E_j),

which finishes the proof. □

Lemma 1.2.4.
For 1 < p < ∞ there is a Cp > 0 so that for all n ∈ N, A ⊂ T with #A = n, and (εt ) ⊂ {−1, 1} it follows that X (p) εt ht ≤ Cp n1/p . t∈A p 16 CHAPTER 1. THE THRESHOLD ALGORITHM (p) Proof. Abbreviate ht = ht for t ∈ T . Let n1 < n2 < . . . < nk be all the integers ni for which there is a t ∈ A so that m(supp(ht )) = 2−ni . For j = 1, . . . k put [ supp(h(i,nj ) ). Ej = i∈{0,1,...2nj −1},(nj ,i)∈A Since m(Ej ) = 2−nj #{i ∈ {0, 1, . . . 2nj − 1}, (nj , i) ∈ A} and thus #{i ∈ {0, 1, . . . 2nj − 1}, (nj , i) ∈ A} = 2nj m(Ej ). It follows therefore that (P Pk k nj nj − 1}, (n , i) ∈ A} = j j=1 2 m(Ej ) if 0 6∈ A j=1 #{i ∈ {0, 1, . . . 2 n= Pk if 0 ∈ A. 1 + j=1 2nj m(Ej ) Assume without loss of generality that 0 6∈ A. It follows that "Z #1/p " k #1/p k X ip 1hX X 1/p εt ht = 2nj /p 1Ej dx ≤ d1/p = d1/p . 2nj m(Ej ) p p n t∈A p 0 j=1 j=1 [dp as in Lemma 1.2.3] Lemma 1.2.5. For 1 < p < ∞ there is a cp > 0 so that for all n ∈ N, A ⊂ T with #A = n, and (εt ) ⊂ {−1, 1} it follows that X (p) 1/p ε h . t t ≥ cp n t∈A Proof. Note that for 1 < p, q < ∞ with p 1 p + 1 q and s, t ∈ T it follows that (p) hht , h(q) s i = δ(t, s), (p) thus the claim follows from the fact that the ht ’s are normalized in Lp [0, 1] and by Lemma 1.2.4 using the duality between Lp [0, 1] and Lq [0, 1]. Indeed, * + P (q) X X εt ht (p) (p) t∈A εt h t ≥ εt h t , P (q) ε h t∈A t∈A t∈A t t n n1/p ≥ = , P (q) cq t∈A εt ht where cq is chosen like in Lemma 1.2.5. Our claim follows therefore bu letting Cp = 1/cq . 1.3. QUASI GREEDY BUT NOT UNCONDITIONAL 1.3 17 A quasi greedy basis of Lp [0, 1] which is not unconditional In this section we make the general assumption on a separable Banach space X, that X has a normalized basis (en ) which is Besselian meaning that for some constant CB ∞ ∞ X 1/2 1 X ∗ ∗ (1.8) kxk = ej (x)ej ≥ |ej (x)|2 for all x ∈ X. 
CB j=1 j=1 where (e∗j ) denote the coordinate functionals for (ej ) We secondly assume that (ej ) has a subsequence (emj : j ∈ N) which is Hilbertian which means that for some constant CH ∞ ∞ X X 1/2 ∗ (1.9) emj (x)emj ≤ CH for all x ∈ span(emj : j ∈ N). |e∗mj (x)|2 j=1 j=1 Example 1.3.1. An example for such a basic sequence are the trigonometrically polynomial (tn : n ∈ Z) in Lp [0, 1] with p > 2. Indeed, for (an : |n| ≤ N ) ⊂ C it follows from Hölder’s (or Jensen’s) inequality that !1/p !1/2 Z 1 X Z 1 X N N N p 2 X 1/2 inξ/2π inξ/2π aj e ≥ aj e = |aj |2 . dξ dξ 0 0 n=−N n=−N n=−N Secondly it follows from the complex version of Khintchine’ s inequality (Theorem 6.2.4) that the subsequence (t2n : n ∈ N) of the trigonometric polynomials is equivalent to the `2 -unit vector basis. (n) We recall the 2n by 2n matrices A(n) = (a(i,j) : 1 ≤ i, j ≤ 2n ), for n ∈ N, which were introduced in Section 5.2. Let us recall the following two properties which we will need here: n (1.10) A(n) is unitary operator on `22 , and (1.11) a(j,1) = 2−n/2 . (n) (k) k k) For k ∈ N we put nk = 22 and B (k) = (bi,j : 1 ≤ i, j ≤ nk ) = A(2 `n2 k ), for k ∈ N which implies that nk+1 = n2k . We let (acting on (hj : j ∈ N) = (emj : j ∈ N) and (fi : i ∈ N) = (es : s ∈ N \ {mi : i ∈ N}), so that if fi = es and fj = et then then i < j if and only if s < t. For k ∈ N we (k) define a family (gj : j = 1, 2, . . . nk ) as follows (k) (k) g1 = fk and gi = hSk−1 +i−1 , for i = 2, 3, . . . nk , 18 CHAPTER 1. THE THRESHOLD ALGORITHM (k) where S0 = 0, and, inductively, Sj = Sj−1 + nj − 1. If we order (gj 1, 2, . . . nk ) lexicographically we note that the sequence (1) (1) (2) (3) g1 , g2 , . . . gn(1) , g1 , . . . , gn(2) , g1 , , . . . 1 2 : k ∈ N, j = is equal to the sequence f1 , h1 , h2 , . . . hn1 −1 , f2 , hn1 , . . . hn2 −2 , f3 , . . . . (k) Then we define for k ∈ N a new system of elements (ψj : j = 1, 2 . . . nk ), by (k) (k) ψ1 g (k) 1(k) ψ2 g . = B (k) ◦ 2. . . . . 
(1.12) (k) (k) ψnk gnk or, in other words, (k) ψi = nk X (k) (k) b(i,j) gj for i = 1, 2 . . . nk . j=1 Our goal is now to prove the following result (k) Theorem 1.3.2. Ordered lexicographically, the system (ψj is a quasi-greedy basis of X. (k) Proposition 1.3.3. Ordered lexicographically, (gj Besselian basis of X. (k) Proof. Given that (gj : k ∈ N, j = 1, 2 . . . nk ) : k ∈ N, j = 1, 2 . . . nk ) is a : k ∈ N, j = 1, 2 . . . nk ) is a reordering of (ej ), which was (k) assumed to be a Besselian basis of X, we only need to show that (gj : k ∈ N, j = 1, 2 . . . nk ) is a basic sequence. To do so we need to show that there is a constant C ≥ 1 so that for all N ∈ N, (k) all M ∈ {1, 2 . . . nM } and all (cj : k ∈ N, j = 1, 2 . . . nk ), with (k) #{(k, j) : k ∈ N, j = 1, 2 . . . nk , cj 6= 0} being finite, it follows (1.13) nk nk −1 X M ∞ X NX X X (k) (k) (N ) (N ) (k) (k) cj gj + cj gj ≤ C cj gj . k=1 j=1 j=1 k=1 j=1 1.3. QUASI GREEDY BUT NOT UNCONDITIONAL (k) Since the gi 19 are a reordering of the original basis (ej ) we can write x= nk ∞ X X (k) (k) cj gj as x = k=1 j=1 (k) ∞ X ci ei , i=1 (k) where ci = cj if ei = gj (and for each i ∈ N there is exactly one such choice of k and j ∈ {1, 2 . . . nk }). From (1.8) and (1.9) we deduce that nk nk −1 X −1 X M M NX NX 1/2 X X (k) (k) (k) (N ) (N ) (N ) (1.14) cj gj + |cj |2 + cj gj ≤ CH |cj |2 k=1 j=2 k=1 j=2 j=2 ≤ CH ∞ X j=2 |cj |2 1/2 ≤ CH CB kxk. j=1 (k) Since gj = fj = esj , where (sj ) which consists of the elements of N \ {mj : j ∈ N}, ordered increasingly it follows that we can write N X (k) (k) c1 g1 = sN X j=1 k=1 X cj ej − X ci ei = i∈{1,2,...sN }\{sj :j≤N } (k) (k) cj gj , (k,j)∈A for some set A ⊂ {(k, j) : k ∈ N, j = 2, 3 . . . nk }. If Ce is the basis constant of (ej ) we deduce therefore that sN X cj ej ≤ Ce kxk, j=1 and thus, using (1.14), sN N X X X (k) (k) (k) (k) c1 g1 ≤ cj ej + cj gj ≤ (Ce + CB CH )kxk. 
j=1 k=1 (k,j)∈A This implies that nk −1 X M NX X (k) (k) (N ) (N ) cj gj + cj gj k=1 j=1 j=1 nk N −1 X M X NX X (k) (k) (k) (k) (N ) (N ) ≤ c1 g1 + cj gj + cj gj k=1 k=1 j=2 j=2 ≤ (Ce + CB CH )kxk + CB CH kxk which implies our claim with C = Ce + 2CB CH . (k) Proposition 1.3.4. Under the lexicographical order, (ψj Besselian basis of X with the same constant CB . : j = 1, 2 . . . nk ) is a 20 CHAPTER 1. THE THRESHOLD ALGORITHM Proof. We first note that for k ∈ N (k) Xk = span(ψj (k) : j = 1, 2 . . . nk ) = span(gj : j = 1, 2 . . . nk ) (k) (k) and thus it follows that (ψj : k ∈ N, j = 1.2, . . . nk ) spans as (gj 1.2, . . . nk ) a dense subspace of X. Secondly we observe that if (k) (dj : k ∈ N, j = : k ∈ N, j = 1, 2 . . . nk ) ⊂ K with (k) #{(k, j) : k ∈ N, j = 1, 2 . . . nk , dj and let x= nk ∞ X X (k) 6= 0} (k) dj Ψj k=1 j=1 or in g-coordinates: x= nk ∞ X X (k) (k) cj gj . k=1 j=1 We write x = P∞ k=1 xk with xk = nk X (k) (N ) dj ψj = j=1 nk X (k) (k) cj gj . j=1 Since xk = nk X (k) (k) di ψi i=1 = nk X (k) di i=1 nk X (k) (k) b(i,j) gj j=1 = nk X (k) gj j=1 nk X (k) (k) b(i,j) di = i=1 nk X (k) (k) cj gj j=1 this means that (k) (ci (k) : j = 1, 2 . . . nk ) = (B (k) )−1 (di : j = 1, 2 . . . nk ) or (k) (di (k) : j = 1, 2 . . . nk ) = (B (k) )(ci : j = 1, 2 . . . nk ). If we project x to its first, say L, coordinates in the lexicographical order of P −1 (k) (Ψj : k ∈ N, j = 1, . . . kn ), for N ∈ N and M ≤ nN , so that L = N k=1 kn + M , this projected vector equals to: nk N −1 X X k=1 j=1 (k) (k) dj ψj + M X j=1 (N ) (N ) dj ψj = nk N −1 X X k=1 j=2 (k) (k) cj gj + M X j=1 (N ) dj (N ) ψj . 1.3. QUASI GREEDY BUT NOT UNCONDITIONAL 21 Therefore we only need to show that there is a constant C ≥ 1 so that for all k and all M ≤ nk nk M X X (k) (k) (k) (k) dj ψj dj ψj ≤ C (1.15) j=1 j=1 (k) and that (ψj ) is Besselian. 
It follows from the assumption that the matrices B (k) are unitary and Proposition 1.3.4 that ∞ ∞ nk ∞ nk X 1/2 1/2 1 XX 1 XX (k) (k) kxk = xk ≥ |cj |2 |dj |2 = , CB CB k=1 j=1 k=1 k=1 j=1 (k) which proves that (ψj ) is Besselian. Secondly we note that (1.11) yields nk M M X X X (k) (k) (k) (k) (k) d ψ = d b g i i i (i,j) j i=1 i=1 ≤ M X j=1 (k) −1/2 |di |nk nk M X X (k) (k) (k) (k) kg1 k + di b(i,j) gj i=1 ≤ M X i=1 (k) 1/2 (k) 1/2 (k) 1/2 |di |2 i=1 ≤ M X j=2 |di |2 ≤ i=1 nk X M X (k) (k) 2 1/2 + CH di b(i,j) i=1 nk X j=2 |di |2 j=2 nk M X X (k) (k) (k) + gj di b(i,j) + CH nk X i=1 i=1 (k) |di |2 1/2 (By (1.10)) i=1 Therefore applying (1.10) and then (1.8) it follows that nk M X X 1/2 (k) (k) (k) di ψi ≤ (1 + CH ) |di |2 i=1 i=1 = (1 + CH ) nk X (k) |ci |2 1/2 ≤ (1 + CH )CB kxk k i=1 which proves our claim. Our last step of proving Theorem 1.3.2 is the following 22 CHAPTER 1. THE THRESHOLD ALGORITHM (k) Proposition 1.3.5. (ψj : j = 1, 2 . . . nk ) is quasi-greedy. Proof. Let x= nk ∞ X X (k) (k) di ψi ∈ X, k=1 i=1 with kxk = 1 and suppose that the m-th greedy approximate is given by X X (k) (k) GTm (x, Ψ) = di ψi , k∈J i∈Ik P where m = k∈J #Ik . We need to show that there is a constant C ≥ 1 (of course not dependent on x and m) so that kGTm (x, Ψ)k ≤ Ckxk (1.16) We write GTm (x, Ψ) as GTm (x, Ψ) = XX (k) (k) di (ψi (k) − b(i,1) fk ) + k∈J i∈Ik | XX k∈J i∈Ik {z } Σ1 | (k) {z Σ2 (k) (recall that g1 = fk ). From the definition of the ψj Σ1 = XX (k) di k∈J i∈Ik nk X (k) (k) di b(i,1) fk . (k) (k) b(i,j) gj j=2 = nk XX we get that (k) gj } X k∈J j=2 (k) (k) di b(i,j) , i∈Ik (k) which yields by the choice of the gj , properties (1.9), and (1.10) that (1.17) Σ1 ≤ CH nk X XX (k) (k) 2 d b i (i,j) !1/2 k∈J j=2 i∈Ik !1/2 = CH X B (k) −1 ◦ (d(k) : i ∈ Ik )2 i 2 k∈J !1/2 = CH X (k) (d : i ∈ Ik )2 i 2 (By (1.10)) k∈J ≤ CH CB kxk (By Proposition (1.3.4)). 
In order to estimate Σ2 we split Ik , k ∈ N into the following subsets: (1) (k) Ik = i ∈ Ik : |di | ≤ n−1 k } (2) (k) −1/2 Ik = i ∈ Ik : |di | ≥ nk 1.3. QUASI GREEDY BUT NOT UNCONDITIONAL 23 (3) (k) −1/2 Ik = i ∈ Ik : n−1 k < |di | < nk and let (s) Σ2 = (k) (k) X X di b(i,1) fk for s = 1, 2, 3. k∈J i∈I (s) k (1) From the definition of Ik and 1.11 it follows that X (k) (k) −1/2 d b . i (i,1) ≤ nk (1) i∈Ik and thus (1) X −1/2 Σ ≤ nk ≤ 1. 2 (1.18) k∈J (2) (2) In order to estimate Σ2 we we first note that the definition of Ik yields that (2) (#Ik )n−1 k ≤ X (k) |di |2 ≤ nk X (k) |di |2 , i=1 (2) i∈Ik and, thus, (2) X X (k) (k) (1.19) Σ2 = di b(i,1) fk k∈J i∈I (2) k ≤ X −1/2 nk k∈J ≤ X (k) |di | (By (1.11)) (2) i∈Ik X −1/2 nk (2) (#Ik )1/2 X k∈J ≤ (k) |di |2 1/2 (By Hölder’s inequality) (2) i∈Ik X X (k) |di |2 k∈J i∈I (2) k 2 2 ≤ CB kxk2 = CB (By Proposition (1.3.4)). (3) Finally we have to estimate Σ2 . Before that let us make some observations: We first note that in the estimation of kΣ1 k we did not use specific properties of the sets Ik . Replacing in the estimation of kΣ1 k the sets Ik by any set Ik0 ⊂ {1, 2 . . . nk } in (1.17) and J by any set J 0 ⊂ N we obtain (1.20) nk XX X (k) (k) (k) di b(i,j) gj ≤ CH CB kxk k∈J 0 i∈Ik0 j=2 24 CHAPTER 1. THE THRESHOLD ALGORITHM Taking Ik0 to be all of {1, 2 . . . nk } and J 0 = [1, K] for some K ∈ N we deduce from Proposition 1.3.5 that (1.21) nk K X X (k) (k) di b(i,1) fk k=1 i=1 nk nk K X K X X X (k) (k) (k) (k) (k) di (ψi −b(i,1) fk ) di ψi + ≤ k=1 i=1 k=1 i=1 ≤ CΨ kxk + CH CB kxk = CΨ + CH CB , (k) where CΨ denotes the basis constant of (ψj ; k ∈ N, j = 1, 2 . . . nk ). Secondly we (1) note that in the estimation of kΣk k we could replace the set J by any subset (1) J 0 ⊂ N and Ik by any subset (1) (k) Ik0 ⊂ Ik = i ≤ nk : |di | ≤ n−1 k } , to obtain XX (k) (k) di b(i,1) fk ≤ 1. 
(1.22) k∈J 0 i∈Ik0 (2) Thirdly we note that in the estimation of Σ2 in (1.19) we could have also (2) replaced J by any subset of N, and for k ∈ N the set Ik by any subset (2) (k) −1/2 Ik0 ⊂ Ik = i ∈ Ik : |di | ≥ nk to obtain XX (k) (k) 2 di b(i,1) fk ≤ CB . (1.23) k∈J 0 i∈Ik0 (3) In order to estimate the kΣ2 k we define (3) K = max{k ∈ J : Ik 6= ∅}, (3) (k) −1/2 which means that for some i ∈ Ik it follows that |di | < nK and note that for any k ∈ [1, K − 1] either k ∈ J or (here we use the first time that we are dealing with the threshold algorithm) k 6∈ J, which implies (1.24) (k) −1/2 |di | < nK ≤ n−1 k for all i ∈ {1, 2, . . . nk } (3) (here we are using that nk+1 = n2k ) and thus for such a k the sets Ik are empty. (2) and Ik 1.3. QUASI GREEDY BUT NOT UNCONDITIONAL We compute now X (K) (K) (3) di b(i,1) fK + Σ2 = X (k) (k) X di b(i,1) fk k∈J,j<K i∈I (3) (3) i∈IK = X 25 k (K) (K) b(i,1) fK di + nj K−1 XX (k) (k) di b(i,1) fk k=1 i=1 (3) i∈IK − K−1 X X (k) (k) X di b(i,1) fk − k=1 i∈I (1) k X (k) (k) di b(i,1) fk k∈J,k<K i∈I (2) k The first term we estimate, using Hölder’s inequality: nK NK X X X (K) (K) 2 1/2 (K) (K) −1/2 d ≤ n−1/2 n1/2 d di b(i,1) fK ≤ nK ≤ CB . i i K K (3) i∈IK j=1 j=1 It follows therefore from (1.21), (1.22) and (1.23) (3) 2 Σ ≤ CB + CΨ + CH CB + 1 + CB 2 2. which implies our claim letting C = CB + CΨ + CH CB + 1 + CB Corollary 1.3.6. Apply Theorem 1.3.2 to the trigonometrical polynomials (tn )= (e−i2πn(·) : n ∈ Z) which are a basis of Lp [0, 1] and satisfy by Example 1.3.1 the assumptions if p > 2. This leads to a quasi greedy basis (Ψn : n ∈ N) of Lp [0, 1]. Secondly note since (tn ) is absolutely bounded by 1(in L∞ [0, 1]), and since the matrices B (k) , which where used in the construction of the basis (Ψn ) are uniformly bounded as linear operators on `n∞k , it follows that also (Ψn : n ∈ N) is bounded L∞ [0, 1]. This implies by Corollary 6.2.7 that (Ψn ) cannot be unconditional. 26 CHAPTER 1. 
Chapter 2: Greedy Algorithms in Hilbert Space

2.1 Introduction

We will now replace, in our greedy algorithms, bases by more general and possibly redundant systems. Let H (always) be a separable real Hilbert space. Recall that D ⊂ S_H is a dictionary of H if span(D) is dense and x ∈ D implies that −x ∈ D. An n-term approximation algorithm is a map G : H → span(D)^N, x ↦ G(x) = (G_n(x)), with the property that for n ∈ N and x ∈ H there is a set Λ_n ⊂ D of cardinality at most n so that G_n(x) ∈ span(Λ_n); G_n(x) is then called an n-term approximation of x.

Perhaps the first example was considered by Schmidt [Schm]:

Example 2.1.1. [Schm] Let f ∈ L_2([0,1]^2), i.e. f is a square integrable function in two variables. By the Stone–Weierstrass theorem we know that the set

  { Σ_{j=1}^n u_j ⊗ v_j : n ∈ N, u_j, v_j ∈ C[0,1] }

is dense in C([0,1]^2). Here we denote, for two functions f, g : [0,1] → K,

  f ⊗ g : [0,1]^2 → K, (x,y) ↦ f(x) g(y).

Since C([0,1]^2) is dense in L_2([0,1]^2), it follows that

  D = { Σ_{j=1}^n u_j ⊗ v_j : n ∈ N, u_j, v_j ∈ L_2[0,1] }

is dense in L_2([0,1]^2). The question is now how to find a good approximation to f from D. E. Schmidt considered the following procedure and showed that it works. Let f ∈ L_2([0,1]^2) and define f_0 = f. Then choose u_1, v_1 ∈ L_2[0,1] so that

  ||f_0 − u_1 ⊗ v_1||_2 = inf{ ||f_0 − u ⊗ v||_2 : u, v ∈ L_2[0,1] }.

Since this infimum might be hard to achieve, he also considered a weaker condition: he fixed some weakness factor t ∈ (0,1) and chose u_1, v_1 ∈ L_2[0,1] so that

  ||f_0 − u_1 ⊗ v_1||_2 ≤ (1/t) inf{ ||f_0 − u ⊗ v||_2 : u, v ∈ L_2[0,1] }.

Then he let f_1 = f_0 − u_1 ⊗ v_1. After n steps he has obtained u_1, v_1, u_2, v_2, ..., u_n, v_n ∈ L_2[0,1], lets

  f_n = f − Σ_{j=1}^n u_j ⊗ v_j,

and chooses u_{n+1} and v_{n+1} in L_2[0,1] so that

  ||f_n − u_{n+1} ⊗ v_{n+1}||_2 = ||f_0 − Σ_{j=1}^{n+1} u_j ⊗ v_j||_2 ≤ (1/t) inf{ ||f_n − u ⊗ v||_2 : u, v ∈ L_2[0,1] }.
t P Finally he proved that fn converges in L2 [0, 1]2 to 0 and thus Gn (f ) = n+1 j=1 uj ⊗ vj converges to f . He asked whether there is some general principle behind, and how and whether this generalizes.. (PGA) The Pure Greedy Algorithm. 1) For x ∈ H we define Gn = Gn (x), for each n ∈ N0 , by induction. G0 = 0 and assuming that G0 , G1 . . . Gn−1 , have been defined for some n ∈ N we proceed as follows: Choose zn ∈ D and an ∈ R so that kx − Gn−1 − an zn k = 2) Put Gn = Gn−1 + an zn . Note that for any x ∈ H it follows that (2.1) inf kx − azk2 z∈D,a∈R inf z∈D,a∈R kx − Gn−1 − azk. 2.1. INTRODUCTION 29 inf kxk2 − 2ahx, zi + a2 kzk2 ] z∈D,a∈R = inf kxk2 − hx, zi2 ] = z∈D [a 7→ kxk2 −2ahx, zi+a2 kzk2 is minimal for a = hx, zi] = kxk2 − suphx, zi2 . z∈D So condition (1) in (PGA) can be replaced by the following condition (1’) 1’) Choose zn ∈ D so that hx − Gn−1 , zn i = suphx − Gn−1 , zi z∈D 2’) and (2) by Put Gn = Gn−1 + hx − Gn−1 , zn izn . As already noted in Example 2.1.1, the “sup” in (1’) (PGA), respectively the “inf” in (1) might not be attained or might be hard to attain. In this case we might consider the following modification. (WPGA) The Weak Pure Greedy Algorithm. We are given a sequence τ = (tn ) ⊂ (0, 1). For x ∈ X we define Gn = Gn (x), for each n ∈ N0 , by induction. G0 = 0 and assuming that G0 , G1 . . . Gn−1 , have been defined for some n ∈ N we proceed as follows: 1) Choose zn ∈ D, so that hx − Gn−1 , zn i ≥ tn suphx − Gn−1 , zi z∈D 2) Put Gn = Gn−1 + hx − Gn−1 , zn izn . For WPGA we call the sequence (tn ) the weakness factors. A possibly faster (but computational more laborious) algorithm is the following Orthogonal Greedy Algorithm. (OGA) 1) The Orthogonal Greedy Algorithm. For x ∈ H we define Gon = Gon (x), for each n ∈ N0 , by induction. Go0 = 0 and assuming that Go0 , Go1 . . . Gon−1 , and vectors z1 . . . 
$z_{n-1}$ have been defined for some $n\in\mathbb N$, we proceed as follows:
1) Choose $z_n\in\mathcal D$ so that
$$\langle x-G^o_{n-1},z_n\rangle=\sup_{z\in\mathcal D}\langle x-G^o_{n-1},z\rangle.$$
2) Define $Z_n=\operatorname{span}(z_1,z_2,\dots,z_n)$ and let $G^o_n$ be the best approximation of $x$ in $Z_n$, i.e.
$$\|G^o_n-x\|=\inf\{\|z-x\| : z\in Z_n\},$$
which means that $G^o_n=P_{Z_n}(x)$, where $P_{Z_n}$ denotes the orthogonal projection of $H$ onto $Z_n$.

(GAR) The Greedy Algorithm with free Relaxation.
For $x\in H$ we define $G^r_n=G^r_n(x)$, for each $n\in\mathbb N_0$, by induction. $G^r_0=0$, and assuming that $G^r_0,G^r_1,\dots,G^r_{n-1}$ have been defined for some $n\in\mathbb N$, we proceed as follows:
1) Choose $z_n\in\mathcal D$ so that
$$\langle x-G^r_{n-1},z_n\rangle=\sup_{z\in\mathcal D}\langle x-G^r_{n-1},z\rangle.$$
2) Put $G^r_n=a_nG^r_{n-1}+b_nz_n$, where $a_n,b_n\in\mathbb R$ are chosen so that $G^r_n$ is the best approximation of $x$ by an element of the two dimensional space $\operatorname{span}(G^r_{n-1},z_n)$.

(GAFR) The Greedy Algorithm with fixed Relaxation.
Let $c>0$. For $x\in H$ we define $G^f_n=G^f_n(x)$, for each $n\in\mathbb N_0$, by induction. $G^f_0=0$, and assuming that $G^f_0,G^f_1,\dots,G^f_{n-1}$ have been defined for some $n\in\mathbb N$, we proceed as follows:
1) Choose $z_n\in\mathcal D$ so that
$$\langle x-G^f_{n-1},z_n\rangle=\sup_{z\in\mathcal D}\langle x-G^f_{n-1},z\rangle.$$
2) Put $G^f_n=\big(1-\tfrac1n\big)G^f_{n-1}+\tfrac{c}{n}\,z_n$.

Similar to the weak pure greedy algorithm, there are also weak versions of the orthogonal greedy algorithm and of the greedy algorithms with relaxation; we denote them by (WOGA), (WGAR) and (WGAFR).

2.2 Convergence

Proposition 2.2.1. Assume that we consider the (WPGA), (WOGA) or (WGAR), and assume that the weakness factors $(t_n)$ satisfy
$$\sum_{k\in\mathbb N}t_k^2=\infty.\qquad(2.2)$$
For $x$ we let $x_n=x-G_n(x)$, $x_n=x-G^o_n(x)$ or $x_n=x-G^r_n(x)$, respectively. If the sequence $(x_n)$ converges, then it converges to $0$.

Proof. Assume that $(x_n)$ converges to some $u\in H$ with $u\neq 0$.
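As a computational aside, the (PGA) and (OGA) recursions defined above can be sketched for a finite dictionary in $\mathbb R^d$. (A hypothetical illustration, not part of the notes; the dictionary and the vector $x$ are made up, and the best approximation in $Z_n$ is computed by least squares.)

```python
import numpy as np

def pga(x, dictionary, steps):
    """Pure Greedy Algorithm: repeatedly pick the atom whose inner product
    with the residual is largest in absolute value (taking |.| covers the
    symmetric atoms -z without storing them) and peel that component off."""
    residual = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        inner = dictionary @ residual              # <x - G_{n-1}, z> for all z
        k = int(np.argmax(np.abs(inner)))
        residual = residual - inner[k] * dictionary[k]
    return x - residual, residual                  # (G_n, x - G_n)

def oga(x, dictionary, steps):
    """Orthogonal Greedy Algorithm: same selection rule, but the approximant
    is the orthogonal projection of x onto Z_n = span(z_1, ..., z_n)."""
    residual = np.asarray(x, dtype=float).copy()
    atoms = []
    for _ in range(steps):
        inner = dictionary @ residual
        atoms.append(dictionary[int(np.argmax(np.abs(inner)))])
        Z = np.column_stack(atoms)
        coeffs, *_ = np.linalg.lstsq(Z, x, rcond=None)  # best approximation in Z_n
        residual = x - Z @ coeffs
    return x - residual, residual

# a small made-up dictionary of unit vectors in R^3
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
D = D / np.linalg.norm(D, axis=1, keepdims=True)
x = np.array([3.0, 2.0, 1.0])
G3, r3 = oga(x, D, 3)   # dictionary spans R^3, so three OGA steps recover x
Gp, rp = pga(x, D, 5)
```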
Then, since D is a dictionary, there is a d ∈ D so that δ = hd, ui > 0 and thus we find a large enough N ∈ N so that hd, xn i ≥ δ/2, for all n > N In the case that we consider WPGA we obtain for n ≥ N 2 kxn+1 k2 = xn − hzn+1 , xn izn+1 = kxn k2 − hzn+1 , xn i2 ≤ kxn k2 − t2n+1 δ 2 /4. and thus for k = 1, 2, 3 . . . 2 2 kxN k − kxN +k k = NX +k−1 2 2 kxj k − kxj+1 k ≥ j=N N +k X t2j+1 δ 2 /4 →N →∞ ∞. j=N But this is a contradiction. In the case of the WOGA we similarly have for n ≥ N n+1 n X kxn+1 k2 = min x − aj zj : a1 , a2 , . . . an+1 ∈ R j=1 2 ≤ xn − hzn+1 , xn izn+1 = kxn k2 − hzn+1 , xn i2 ≤ t2n+1 δ 2 /4 and we obtain a contradiction as in the WPGA case. Similarly in the case we consider the WGAR we estimate: n kxn+1 k2 = min x − aGrn (x) − bzn+1 : a, b ∈ R 2 ≤ x − Grn (x) − hzn+1 , xn izn+1 = kxn k2 − hzn+1 , xn i2 ≤ kxn k2 − t2n+1 δ 2 /4. Theorem 2.2.2. Assume that condition (2.2) of Proposition 2.2.1 holds. Then (Gon (x) : n ∈ N) (as defined in WOGA) converges for all x ∈ H to x. Proof. Let x ∈ H. For n ∈ N let Zn be the space defined in WOGA. Gon (x) = o PZn (x). S Since Z1 ⊂ Z2 ⊂ Z3 . . . it follows that Gn (x) converges to PZ (x), where Z = n∈N Zn . Thus the claim follows from Proposition 2.2.1 Theorem 2.2.3. Assume that the sequence (tk ) ⊂ (0, 1) satisfies (2.3) X tk k∈N k = ∞. For x ∈ X consider the WPGA (Gn (x)) with weakness factors (tn ). Then (Gn (x)) converges. 32 CHAPTER 2. GREEDY ALGORITHMS IN HILBERT SPACE Remark. Since by Hölder’s inequality ∞ ∞ X 1/2 X 1 1/2 2 , ≤ tk k k ∞ X tk k=1 k=1 P condition 2.3 implies that 2 k∈N tk k=1 = ∞. We will need the following Lemma first Lemma 2.2.4. Assume y = (yj ) ∈ `2 and (tk ) ⊂ (0, 1) satisfies (2.3) . Then n lim inf n→∞ |yn | X |yj | = 0. tn j=1 Proof. (an alternate, and shorter proof due to Sheng Zhang will be given below) We will prove the following claim: Claim. If f ∈ L2 [0, ∞] and we define Z x F (x) = |f (t)| dt 0 then ∞ Z (2.4) 0 F 2 (x) dx ≤ 4 x2 Z n=1 n |yj | i2 ≤ ∞ h n−1 X 1X f 2 (x) dx. 
0 If we apply the claim to the function f (·) = n ∞ h X X 1 ∞ P∞ j=1 1(j−1,j] |yj |, it follows that ∞ i2 X |yj | + |yj |2 n n=1 j=1 " #2 Z ∞ Z x ∞ X 1 ≤ f (t) dt dx + |yj |2 x 0 0 n=1 Z ∞ ∞ ∞ X X ≤4 f 2 (t) dt + |yj |2 = 5 |yj |2 n=1 j=1 0 n=1 n=1 It follows therefore from the Cauchy Schwarz inequality that " ∞ #1/2 " ∞ # ∞ n ∞ n n i2 1/2 X X X Xh1 X tn |yn | X 1X <∞ |yj | ≤ |yn | |yj | ≤ |yj |2 |yj | n tn n n n=1,tn 6=0 since j=1 n=1 n=1 j=1 ∞ X tn n=1 n =∞ n=1 j=1 2.2. CONVERGENCE 33 it follows that n |yn | X |yj | = 0. tn lim inf n→∞ j=1 In order to prove the claim we can assume that f (x) is a positive function, we note first that by Hölder’s inequality, Z x Z x 1/2 f 2 (t) dt, F (x) = f (t) dt ≤ x 0 0 and thus F (x) ≤ x1/2 (2.5) x Z f 2 (t) dt →x→0 0 0 For a fixed x0 > 0 we also deduce from Hölder’s inequality for x > x0 that Z x Z x Z ∞ F (x) − F (x0 ) = f (t) dt ≤ (x − x0 )1/2 f 2 (t) dt ≤ x1/2 f 2 (t) dt, x0 x0 and thus F (x0 ) F (x) ≤ 1/2 + 1/2 x x Z ∞ f 2 (t) dt. x0 By choosing for a given ε > 0 x0 far enough out so that −1/2 then x1 > x0 so that x1 x0 R∞ x0 f 2 (t) dt < ε/2 and F (x0 ) < ε/2, it follows that F (x) < ε whenever x > x1 , x1/2 and thus (2.6) F (x) ≤ x1/2 Z x f 2 (t) dt →x→∞ 0. 0 Using integration by parts, it follows for any 0 < a < b < ∞ that Z a b Z b F 2 (x) b F 2 (x) dx = − +2 F (x)f (x)x−1 dx x2 x x=a a " #2 " #2 !1/2 Z b 2 F (a) F (b) F (x) ≤ + +2 x2 a1/2 b1/2 a [By Hölder’s inequality] Z !1/2 b 2 f (x) dx a 34 CHAPTER 2. GREEDY ALGORITHMS IN HILBERT SPACE and thus, in case that a is chosen small enough and b large enough so that F (x) does not a.e. vanish on [a, b], we have Z b a F 2 (x) x2 !1/2 " h F (a) i2 h F (b) i2 + 1/2 ≤ a1/2 b # Z b a F 2 (x) x2 !−1/2 Z +2 !1/2 b f 2 (x) dx a Our claim follows now by letting a → 0 and b → ∞ Proof by Sheng Zhang. Suppose, to the contrary that n δ = lim inf n→∞ |yn | X |y + j| > 0, tn j=1 and, thus for some n0 ∈ N n |yn | X |y + j| > δ/2 whenever n ≥ n0 . 
tn j=1 For n ≥ n0 , we deduce fro Hölders’s inequality n n δ 1 X 1 X < |yn ||yj | ≤ |yj |2 n|yn |2 , 2 tn tn j=1 j=1 and thus |yn | tn n ≥ δ 1 Pn , 2 j=1 |yj |2 which yields lim inf n→∞ |yn | tn n ≥ δ 1 P∞ 2 =: ε 2 j=1 yj 2 Thus there is an n> ≥ n0 , so that for all n ≥ nP 1 , |yn | ≥ εtn /2n. But this ∞ contradicts the assumption that y = (yn ) ∈ `2 and n=1 tn /n = ∞. Proof of Theorem 2.2.3. Let x ∈ H and put for n ∈ N, Gn = Gn (x) with Gn (x) = n X hx − Gj−1 , zj izj , j=1 where zn ∈ D satisfies (2.7) hzn , x − Gn−1 i = hzn , xn−1 i ≥ tn suphz, xn−1 i. z∈D . 2.2. CONVERGENCE 35 Define (2.8) n X xn = x − Gn (x) = x − hzj , x − Gj−1 izj = xn−1 − hzn , x − Gn−1 izn , j=1 By induction we show that for every n ∈ N kxn k2 = kxk2 − (2.9) n X hzj , xj−1 i2 . j=1 Indeed, for n = 1 the claim is clear and assuming that (2.9) is true for n ∈ N we compute kxn+1 k2 = kxn − hxn , zn+1 izn+1 k2 = kxn k2 − hxn , zn+1 ikzn+1 k2 = kxk2 − n+1 X hzj , xj−1 i2 . j=1 It follows therefore from (2.9) that ∞ X (2.10) hzj , xj−1 i2 ≤ kxk2 j=1 For m < n we compute (2.11) kxn − xm k2 = kxm k2 − kxn k2 − 2hxm − xn , xn i, and n X hxm − xn , xn i = hxj−1 − xj , xn i ≤ = j=m+1 n X hxj−1 − xj , xn i j=m+1 n X hzj , xn i · hzj , xj−1 i j=m+1 n hzn+1 , xn i X hzj , xj−1 i ≤ tn+1 j=m+1 h i hzj , xn i ≤ max hd, xn i ≤ t−1 hzn+1 , xn i n+1 d∈D hzn+1 , xn i n+1 X hzj , xj−1 i. ≤ tn+1 j=1 36 CHAPTER 2. GREEDY ALGORITHMS IN HILBERT SPACE We can therefore apply Lemma 2.2.4 to (tn ) and yn = hzn+1 , xn i, for n ∈ N, and deduce that lim inf max hxm − xn , xn i = 0. n→∞ m<n Together with the fact that kxn k is decreasing and (2.11) this implies that there is subsequence (xnk ) which converges to some x ∈ H. We claim that the whole sequence (xn ) converges to that x, which, together with Proposition 2.2.1, would finish the proof. 
Note that for any n ∈ N and any k ∈ N so that nk > n we have kxn − xk ≤ kxn − xnk k + kxnk − xk 1/2 = kxn k2 − kxnk k2 − 2hxn − xnk , xnk i + kxnk − xk 1/2 1/2 ≤ kxn k2 − kxnk k2 + 2 max hxm − xnk , xnk i + kxnk − xk. m≤nk 1/2 So, given ε > 0 we can choose n0 large enough so that kxn k2 −kxnk k2 < ε/3, for all n ≥ n0 and k with nk > n. Then we choose k0 so that for all k > k0 , 1/2 2 maxm≤nk hxm − xnk , xnk i < ε/3 and kxnk − xk < ε/3. For any n ≥ n0 , we can therefore choose k ≥ k0 so that also nk > n, and from above inequalities we deduce that kxn − xk < ε. The next Theorem proves that at least among the decreasing weakness factors τ the condition 2.3 is optimal in order to imply convergence of the WPGA. Theorem 2.2.5. In the class of monotone decreasing sequences τ = (tk ), the condition (2.3) is necessary for the WPGA to converge. In other words, if (tn ) is a decreasing sequence for which (2.12) ∞ X tn <∞ n n∈N then there is a dictionary D of H, an x ∈ H and sequences (Gn ) ⊂ H and (zn ) ⊂ D , with G0 = 0 so that for xn = x − Gn the following is satisfied: (2.13) xn = xn−1 − hxn−1 , zn izn (2.14) hxn−1 , zn i ≥ tn max0 hxn−1 , zi, z∈D but so that xn does not converge to 0. We will need the following notation: Definition 2.2.6. Assume D0 ⊂ SH has the property that z ∈ D0 implies that −z ∈ D0 and assume that τ = (tn )N n=1 ⊂ (0, 1], with N ∈ N ∪ {∞} isa finite N 0 sequence of positive numbers. A pair of sequences (xn )N n=0 ⊂ H and (zn )n=0 ⊂ D 2.2. CONVERGENCE 37 and are called a pair of WPGA-sequences with weakness factor τ and dictionary D if x0 ∈ span(D0 ) and for all n = 1, 2 . . . N (2.15) xn = xn−1 − hxn−1 , zn zn i (2.16) hxn−1 , zn i ≥ tn max0 hxn−1 , zi. z∈D Remark. To given sequence τ = (tn )∞ n=1 ⊂ (0, 1], satisfying (2.12) we will choose elements of a dictionary D as well as the elements xn and zn of a pair of WPGAsequences with weakness factor τ and dictionary D recursively. 
To achieve that we will choose inductively elements xn , n ≥ 0 and zn , n ≥ 1, so that for all n ∈ N (2.17) xn = xn−1 − hxn−1 , izn (2.18) hxn−1 , zn i ≥ tn max sup |hxn−1 , ei i|, i∈N (2.19) max j=1,2,...n−1 |hxn−1 , zj | hxj , zj+1 i ≥ tj hxj , zn i, for all j = 0, 1, 2 . . . n − 1. Here (ej ) denotes an orthonormal basis of H. ∞ We deduce then that (xn )∞ n=0 and (zn )n=0 is a pair of WPGA-sequences with weakness factor τ and dictionary D = {±ej , ±zj : j ∈ N}. Proof of Theorem 2.2.5. The following procedure is the key observation towards inductively producing our example. We let (ej ) be an orthonormal basis of H. For given t ∈ (0, 1/3] and i 6= j in N. We define elements xn ∈ span(ei , ej ), n ≥ 0 and zn ∈ (ei , ej ), kzn k = 1, n ≥, and αn ∈ [0, 1] recursively until we stop at some n = N , when some criterium is satisfied, as follows: We put x0 = ei , Now assume that for some n ∈ N, we defined xs = as ei + bs ej and αs ∈ (0, 1) and zs ∈ SH for all 1 ≤ s ≤ n − 1 so that for all 1 ≤ s < n we have (2.20) hxs−1 , zs i = t, as long as s ≤ N − 1, (2.21) (2.22) zs = αs ei − (1 − αs2 )1/2 ej , √ as , bs ≥ 0, and as − bs ≥ 2, as long as s ≤ N , (2.23) xs = xs−1 − hxs−1 , zs izs . (Conditions (2.20) and (2.22) become vacuous once we defined N for s = N ) Then we first define z̃n as z̃n = α̃n ei − (1 − α̃n2 )1/2 ej where α̃n is defined so that hx̃n−1 , z̃n i = t. Secondly define x̃n to be x̃n = xn−1 − hxn−1 , zn izn 38 CHAPTER 2. GREEDY ALGORITHMS IN HILBERT SPACE and write x̃n as x̃n = ãn ei + b̃n ej . √ Case 1. ãn − b̃n ≥ 2t In that case we choose αn = α̃n and zn = αn ei − (1 − αn2 )1/2 ej . Thus (2.20) and (2.21) are satisfied for s = n. Then we let xn = x̃n , and have therefore satisfied (2.22) and (2.23). √ √ Case 2. ãn −√ b̃n < 2t. Then we let N = Nt = n and put αN = 1/ 2, zN = (ei − ej )/ 2 and xN = xN −1 − hxN −1 , zN izN . Then (2.21),and (2.23) are satisfied while (2.20) is vacuous. 
From the definition of XN in Case 2, we observe that (2.24) hxN −1 , zN i = 2−1/2 aN −1 − bN −1 ≥ t, and it follows therefore that 1 (2.25) aN = bN = (aN −1 + bN −1 ). 2 In particular also (2.22) is satisfied for n = N , assuming that N is finite, which we will see later (here the second part of (2.22) is vacuous). Once the second case happens we finish the definition of our sequences. We still will have to show that eventual Case 2 will happens and that N is finite; for the moment we think of N being an element of N ∪ {∞} We make the following observations. From (2.20) and (2.21) we deduce that (2.26) 2 an+1 = an − tαn+1 and bn+1 = bn + t(1 − αn+1 )1/2 , if n < N − 1 which implies that (2.27) 2 an+1 − bn+1 = an − bn − t αn+1 + (1 − αn+1 )1/2 ( √ ≥ an − bn − 2t if n < N − 1. ≤ an − bn − t This yields 1 = a0 − b0 ≥ N −2 X (as − bs ) − (as+1 − bs+1 ) ≥ (N − 1)t s=0 and therefore we showed that N is finite. Since by definition of N and ãN and b̃N √ 2t > ãN − b̃N √ 1/2 2 = aN −1 − bN −1 − t α̃N + (1 − α̃N ) ≥ aN −1 − bN −1 − t 2 it follows that (2.28) −1/2 hxN −1 , zN i = 2 ( ≤ 2t aN −1 − bN −1 ≥t . 2.2. CONVERGENCE 39 It follows therefore from (2.27),(2.24) and (2.25) that ( N −1 X ≥ tN √ 1 = a0 − b0 = (as − bs ) − (as+1 − bs+1 ) ≤ 2tN s=0 and thus (2.29) 1 1 √ ≤N ≤ . t 2t From the definition of xN and (2.28) we deduce that kxN k2 = kxN −1 k2 − hxN −1 , zN i2 (By (2.28)) ≥ kxN −1 k2 − 4t2 = kx0 k2 + N −1 X kxs k2 − kxs−1 k2 − 4t2 s=1 2 = kx0 k − (N − 1)t2 − 4t2 ≥ kx0 k2 − t − 3t2 (By 2.29) and thus, since t ≤ 1/3, (2.30) kxN k2 ≥ kx0 k2 − 2t Finally note that the sequence (xn )N n=0 is a WAGD sequence for the Dictionary t D0 = {zn : n = 1, 2 . . . Nt } ∪ {ei } with the weakness factor t. We call (xn )N n=0 Nt together with the sequence (zn )n=0 the WAGD sequence generated by t and the pair (ei , ej ). P∞ Now we assume that (tn )∞ n=1 is a sequence in (0, 1], so that j=1 tn < ∞. We P∞ tn 3 first require the additional assumption that j=1 n < ∞ < ε = 16 . 
It follows that ∞ X t2s = t1 + t2 + t4 + t8 . . . s=0 1 1 1 ≤ t1 + t2 + (t3 + t4 ) + (t5 + t6 + t7 + t8 ) + (t9 + t10 + . . . t16 ) + . . . 2 4 8 h i t2 1 1 1 ≤ 2 t1 + + (t3 + t4 ) + (t5 + t6 + t7 + t8 ) + (t9 + t10 + . . . t16 ) + . . . 2 4 8 16 ∞ X tn ≤ 2ε. ≤2 n n=1 We will construct recursively sequences (xn : n = 0, 1, 2, . . .) and (zn : n = 1, 2 . . .) so that x0 = e1 , xn = xn−1 − hxn−1 , zn izn , so that for every n ∈ N (2.31) hxn−1 , zn i ≥ tn max hxn−1 , zj i and hxn−1 , zn i ≥ tn suphxn−1 , ej i j=1,...n−1 j∈N 40 CHAPTER 2. GREEDY ALGORITHMS IN HILBERT SPACE and (2.32) tj hxj , zn i ≤ hxj , zj+1 i for all j = 0, 1, 2, . . . n − 1. As noted in the remark before the proof, these two conditions will ensure that that for each n the vector is of the form xn = x−Gn (x), where (Gn (x) : n ∈ N0 ) is the result of a WPGA with weakness factors (tn ) and dictionary D = {zn , en : n ∈ N}. (1) (1) We start with x = x0 = e1 , and let (xn : n = 0, 1, 2, . . . Nt1 ) and (zn : n = 1, 2, . . . Nt1 ) be the WAGD sequence generated by t and the pair (e1 , e2 ), then (1) (1) we put xn = xn and zn = zn for n = 1, 2, 3 . . . Nt1 . Note that we satisfied so far our required conditions (2.31) and (2.32) since by construction hxn−1 , zn i = t1 ≥ tn ≥ tn kxn−1 k for all n = 1, 2 . . . , N1 = Nt1 . By (2.25) xN1 is of the form xN1 = c1 (e1 − e2 ), and we deduce from (2.30) and the fact that kxN1 k ≤ 1 that c21 ≤ 1/2, N1 ≥ 1 and kxN1 k2 ≥ 1 − 2t1 . (2,1) (2,1) Then we consider let (xn : n = 0, 1, . . . Nt2 ) and (zn : n = 1, 2 . . . Nt2 ) be (2,2) the WAGD sequence generated by t2 and the pair (e1 , e3 ), and (xn : n = (2,2) 0, 1, . . . Nt2 ) and (zn : n = 1, 2 . . . Nt2 ) be the WAGD sequence generated by t2 and the pair (e2 , e4 ). We put N2 = Nt2 and for n = 1, 2, . . . N2 we define xN1 +n = c1 x(2,1) + c1 e2 and zN1 +n = zn(2,1) n (2,1) xN1 +N2 +n = c1 xN1 + c1 x(2,2) and zN1 +N2 +n = zn(2,2) . n We observe that for n = 1, 2 . . . 
N2 hxN1 +n−1 , zN1 +n i = c1 t2 ≥ t2 maxhxN1 +n−1 , es i and s∈N hxN1 +n−1 , zN1 +n i = c1 t2 ≥ t2 max hxN1 +n−1 , zs i s=1,2,...N1 (the first inequality follows from the fact that the coordinates of xN1 +n , n = 1, 2 . . . N2 are absolutely, not larger than c1 , the second inequality follows from (2.21) and the fact moreover the coordinates of xN1 +n , n = 1, 2 . . . N2 are not negative while zs , s = 1, 2 . . . N1 , has a positive and negative coordinate). Secondly we note that for j = 1, 2, . . . , N1 and n = 1, 2, . . . , N2 − 1, it follows from (2.20) and (2.24) that tj hxj , zN1 +n i ≤ t1 ≤ hxj , zj+1 i. This implies that the conditions (2.31) and (2.32) hold for all N1 ≤ n ≤ N1 + N2 . Similarly we can show that they also hold for all N1 + N2 ≤ n ≤ N1 + 2N2 . Finally (2.30) implies that XN1 +2N2 is of the form XN1 +2N2 = c2 (e1 + e2 + e3 + e4 ) 2.2. CONVERGENCE 41 with c22 ≤ 1/4 and kxN1 +2N2 k2 ≥ kxN1 k2 − c21 2t2 − c21 2t2 ≥ 1 − 2t1 − 2t2 . Now assume that for some r ∈ N we have chosen (xn : n = 1, 2, . . . Mr ), with Mr = r X 2j−1 Nj and Nj = Nt2j−1 , j = 1, 2 . . . r, j=1 and (zn : n = 1, 2, . . . Mr ) so that (2.31) and (2.32) hold for all n ≤ Mr , and so that r xMr = cr 2 X ei i=1 for some cr with c2r ≤ 2−r , and so that kxMr k2 ≥ 1 − 2t1 − 2t2 − 2t4 − . . . − 2t2r−1 (r+1,j) (r+1,j) then we let for j = 1, 2 . . . 2r (xn : n = 0, . . . Nr+1 ) and (zn : n = 0, . . . Nr+1 ),with Nr+1 = Nt2r , be the WPGA sequences generated by by t2r and the pair (ej , e2r +j ), and finally put for i = 1, 2 . . . 2r and n = 1, 2, . . . Nr xMr +(i−1)Nr+1 +n = i−1 X r (r+1,s) cr xNr + cr x(r+1,i) n + cr s=1 2 X es , and s=i+1 zMr +(i−1)Nr+1 +n = zn(r+1,i) . We deduce as in the case r = 1 that the conditions (2.31) and (2.32) hold for P P2r+1 s−1 N = M all n ≤ Mr + 2r Nr+1 = r+1 s r+1 , that xMr+1 = cr+1 s=1 2 s=1 es , for some cr+1 ≤ 2−r−1 , and that kxMr+1 k ≥ 1 − 2t1 − 2t2 , . . . 2tr . This finishes the choice of the the xn and zn . 
Since $\|x_{M_r}\|^2\ge 1-2\sum_{s=1}^{r}t_{2^{s-1}}> 1-4\varepsilon=1-\tfrac{12}{16}=\tfrac14$, it follows that $(x_n)$ does not converge to $0$. We therefore proved our claim under the additional assumption that $\sum_{n=1}^\infty (t_n/n)<3/16$. In the general case we proceed as follows. We first find an $n_0$ so that
$$\sum_{s=n_0}^{\infty}t_{2^s}<\frac{3}{16},$$
and let
$$x=\sum_{j=1}^{2^{n_0}}e_j.$$
Then we choose $z_i=e_i$, for $i=1,2,\dots,2^{n_0}-1$, and thus $x_0=x$ and recursively
$$x_n=x_{n-1}-\langle x_{n-1},z_n\rangle z_n=\sum_{j=n+1}^{2^{n_0}}e_j,\qquad\text{for } n=1,2,\dots,2^{n_0}-1.$$
In particular $x_{2^{n_0}-1}=e_{2^{n_0}}$. From then on we choose $x_{2^{n_0}-1+n}=\tilde x_n$, $n=1,2,\dots$, and $z_{2^{n_0}-1+n}=\tilde z_n$, where the $\tilde x_n$ and $\tilde z_n$ are chosen like the $x_n$ and the $z_n$ in the special case, but in the Hilbert space $\tilde H=\overline{\operatorname{span}}(e_j : j\ge 2^{n_0})$.

2.3 Convergence Rates

Note that without any special conditions on the starting point, we cannot expect to be able to estimate the convergence rate of the Pure Greedy Algorithm (or of the others). Indeed, let $(\xi_n)$ be any sequence of positive numbers which decreases to $0$, let $\mathcal D=\{\pm e_n : n\in\mathbb N\}$, where $(e_n)$ is an orthonormal basis of our Hilbert space $H$, and take
$$x=\sum_{j=1}^\infty\sqrt{\xi_j-\xi_{j+1}}\,e_j.$$
Then it follows for the $n$-th approximant $G_n=G_n(x)$, defined as in (PGA), that
$$G_n=\sum_{j=1}^{n}\sqrt{\xi_j-\xi_{j+1}}\,e_j,$$
and thus
$$\|x-G_n\|^2=\sum_{j=n+1}^\infty(\xi_j-\xi_{j+1})=\xi_{n+1}.$$
Thus, no matter how slowly $(\xi_n)$ converges to $0$, there is an $x$ so that $G_n(x)$ converges at least as slowly as $(\xi_n)$.

In order to state our first result we introduce, for a dictionary $\mathcal D$ of $H$, the following linear subspace:
$$A_1=A_1(\mathcal D)=\Big\{\sum_{z\in\mathcal D}c_zz : (c_z)\subset\mathbb K \text{ and } \sum_{z\in\mathcal D}|c_z|<\infty\Big\}.\qquad(2.33)$$
For $x\in A_1$ we put
$$\|x\|_{A_1}=\inf\Big\{\sum_{z\in\mathcal D}|c_z| : (c_z)\subset\mathbb K \text{ and } x=\sum_{z\in\mathcal D}c_zz\Big\}.\qquad(2.34)$$

Theorem 2.3.1. [DT] Assume $\mathcal D$ is a dictionary of a separable Hilbert space $H$. Let $x\in A_1(\mathcal D)$, assume that $(G_n)=(G_n(x))$ is defined as in (PGA), and let $x_n=x-G_n$, for $n\in\mathbb N$. Then
$$\|x_n\|\le\|x\|_{A_1}\,n^{-1/6}\qquad\text{for } n\in\mathbb N.\qquad(2.35)$$

For the proof of Theorem 2.3.1 we need the following observation.

Lemma 2.3.2.
Assuming that (ξm ) is a sequence of positive numbers so that for some number A > 0 (2.36) ξ1 ≤ A and ξm+1 ≤ ξm (1 − ξm /A), for m ≥ 1. Then ξm ≤ (2.37) A , for all m ∈ N. m Proof. We assume A = 1 (pass to ξ˜m = ξm /A) We prove the claim by induction for each m ∈ N. For m = 1 (2.37) follows from the assumption. Assume that the 1 1 claim is true for m ∈ N. If ξm ≤ m+1 then also ξm+1 ≤ m+1 since from (2.36) it 1 1 follows that the sequence (ξi ) is decreasing. If m+1 < ξm ≤ m we deduce that ξm+1 ≤ ξm (1 − ξm ) ≤ 1 1 1 m 1 1− = = , m m+1 mm+1 m+1 which implies the claim for m + 1 and finishes the induction step. Proof of Theorem 2.3.1. For x ∈ H we put hx, zi . z∈D kxk ρ(x) = sup P + Note that if x ∈ A , η > 0 and (c ) ⊂ R is such that x = 1 z z∈D 0 z∈D cz z and P cz ≤ η + kxkA1 it follows that D X E X kxk2 = x, cz z ≤ cz suphz, xi ≤ kxkA1 + η kxkρ(x), z∈D z∈D z∈D and, thus, since η > 0 was arbitrary, (2.38) ρ(x) ≥ kxk . kxkA1 P Let x ∈ A1 and P let us assume that there is a representation x = z∈D cz z so that kxkA1 = z∈D cz (otherwise we use arbitrary approximations). Let (zm ) 44 CHAPTER 2. GREEDY ALGORITHMS IN HILBERT SPACE and (Gm ) be defined as in (PGA) and xn = x − Gn , for n ∈ N. We note that for m ∈ N0 kxm+1 k2 = kxm − hxm , zm+1 izm+1 k2 (2.39) = kxm k2 − hxm , zm+1 i2 = kxm k2 (1 − ρ2 (xm )). Putting am = kxm k2 , b0 = kx0 kA1 = kxkA1 and, assuming that bm has been 1/2 defined, we let bm+1 = bm + ρ(xm )kxm k = bm + ρ(xm )am . First we observe that kxm kA1 ≤ bm . (2.40) Indeed, for m = 0 this simply follows from the definition of b0 , and assuming (2.40) holds for m ∈ N0 it follows that kxm+1 kA1 = kxm − hxm , zm+1 izm+1 kA1 ≤ kxm kA1 + |hxm , zm+1 i| = kxm kA1 + ρ(xm )kxm k = bm+1 . Secondly we compute using (2.39), (2.38) and (2.40) kxm k2 am am+1 = kxm+1 k2 = am (1 − ρ2 (xm )) ≤ am 1 − ≤ a 1 − m b2m kxm k2A1 and thus, since bm+1 ≥ bm am+1 am am am+1 ≤ 1 − . ≤ b2m b2m b2m b2m+1 Note that ξn = an−1 b2n−1 (2.41) a0 b20 = kxk2 kxk2A ≤ 1. 
We therefore apply Lemma 2.3.2 to sequence ξn with 1 and deduce that am b−2 m ≤ 1 , whenever m ∈ N. m Since by the recursive definition of (bj ), (2.40) and (2.38) we get −1 −1 1/2 2 bm+1 = bm 1 + ρ(xm )a1/2 m bm ≤ bm 1 + ρ(xm )am kxm kA1 ≤ bm 1 + ρ (xm ) , we obtain together with (2.39) am+1 bm+1 ≤ am bm (1 − ρ2 (xm ))(1 + ρ2 (xm )) ≤ am bm . (am bm ) is therefore decreasing and am bm ≤ a0 b0 = kxk2 · kxkA1 . Multiplying both sides of (2.41) by a2m b2m we obtain therfore a3m ≤ kxk4 · kxk2A1 a2m b2m ≤ , m m which implies our claim after taking on both sides the sixth root. 2.3. CONVERGENCE RATES 45 The next Example due to DeVore and Temlyakov gives a lower bound for the convergence rate of (PGA) Example 2.3.3. [DT] Let H be a separable Hilbertspace and (hj ) an orthonormal basis of H. We will define a dictionary D ⊂ H, a vector x ∈ H for which kxkA1 (D) = 2, and so that c kxm k = kx − Gm k ≥ √ , for m ∈ N. m Define a= 23 11 1/2 and A = and z = A(h1 + h2 ) + aA ∞ X 33 89 1/2 k(k + 1) −1/2 hk . k=3 Note that ∞ X 1 1 kzk = 2A + a A kk+1 2 2 2 2 k=3 ∞ X 1 1 2 2 2 = 2A + a A − k k+1 k=3 1 33 23 = 2A2 + a2 A2 = 2+ = 1. 3 89 33 Put D = {±g} ∪ {±hj : j ∈ N} and let x = h1 + h2 and we apply (PGA) to f Claim: In Step 1 of (PGA) we have z1 = z and x1 = x − hx, ziz = (1 − 2A2 )(h1 + h2 ) − 2aA2 ∞ X k(k + 1) k=3 Indeed, hx, zi = 2A > 1, hx, h1 i = hx, h2 i = A, and hx, hj i = 0, if, j > 2. Thus z1 = z and x1 = x − hx, ziz ∞ X −1/2 = h1 + h2 − 2A A(h1 + h2 ) + aA k(k + 1) hk k=3 −1/2 hk . 46 CHAPTER 2. GREEDY ALGORITHMS IN HILBERT SPACE = (1 − 2A2 )(h1 + h2 ) − 2aA2 ∞ X k(k + 1) −1/2 hk . k=3 Claim: In Step 2 and Step 3 of (PGA) we have z2 = h1 and z3 = h2 or vice versa. Indeed, hx1 , zi = 0 (by construction of x1 ) 23 hx1 , h1 i = hx1 , h2 i = (1 − 2A2 ) = and 89 −1/2 1 2 hx1 , hj i = 2aA2 j(j + 1) ≤ aA < (1 − 2A2 ) if j > 2. 6 So take W.l.o.g z2 = h1 and thus 2 2 x2 = x1 − hx1 , h1 ih1 = (1 − 2A )h2 − 2aA ∞ X k(k + 1) −1/2 hk . 
k=3 Then we observe that hx2 , zi = hx1 , zi + hx2 − x1 , zi = −(1 − 2A2 )hh1 , zi = −A(1 − 2A2 ) hx2 , h1 i = 0, hx2 , h2 i = (1 − 2A2 ) > |hx2 , zi| −1/2 1 2 ≤ aA < (1 − 2A2 ) if j > 2. hx2 , hj i = 2aA2 j(j + 1) 6 which implies that z3 = h2 and that x3 = x2 − hx2 , h2 ih2 = −2aA 2 ∞ X k(k + 1) −1/2 hk . k=3 From now on we prove by induction that 2 zm = hm−1 and xm = −2aA ∞ X k(k + 1) −1/2 hk whenever m ≥ 3. k=m Indeed, for m = 3 this was already shown. Assuming that our claim is true for some m ≥ 3 we compute that hxm , zi = −2a2 A3 ∞ X k=m 2a2 A3 k(k + 1) = − m(m + 1) hxm , h` i = 0 for ` < m, and hxm , h` i = −2a2 A3 `(` + 1) −1/2 for ` ≥ m. 2.3. CONVERGENCE RATES 47 Thus zm+1 = hm and xm = −2aA2 induction step. Finally note that for m ≥ 3 P∞ kxm k2 = 4a2 A4 k=m+1 ∞ X k=m −1/2 k(k + 1) hk , which finishes the 1 4a2 A4 = . k(k + 1) m Thus kxkA1 = 2 and (kxm k) is of the order m−1/2 . Remark. In [KT2] the rate of convergence in Theorem 2.3.1 was slightly improved to Cn−11/62 where C is some universal constant. And in [LT] a dictionary D of H was constructed for which there is an x ∈ A1 (D) for which kx − Gn (x)k ≥ Cn−.27 , whenever n ∈ N. It is conjectured that the rate of convergences should be of the order of n−1/4 . Theorem 2.3.4. [Jo] Consider the Greedy Greedy Algorithm (Gfm (x)), x ∈ H with fixed relaxation, and let x ∈ A1 (D) with kxkA1 ≤ c, where c is the constant in (GAFR) then (2.42) 2c kx − Grm kA1 ≤ √ , for all m ∈ N. m We first need the following elementary Lemma. Lemma 2.3.5. Let (am ) be a sequence of non-negative numbers, for which there is an A > so that A 2 am−1 + 2 if m ≥ 2. (2.43) a1 ≤ A and am ≤ 1 − m m Then (2.44) am ≤ A for all m ∈ N. m Proof. We prove (2.44) by induction. For m = 1 (2.44) is part of our assumption, and assuming (2.47) is true for m − 1 we deduce from the second part of our assumption that 2 A am ≤ 1 − am−1 + 2 m m 2 A A ≤ 1− + 2 m m−1 m 1 m+1 =A − 2 m − 1 m (m − 1) m2 − m − 1 =A m2 (m − 1) 48 CHAPTER 2. 
GREEDY ALGORITHMS IN HILBERT SPACE = A m2 − m − 1 A < , m m(m − 1) m which finishes the induction step and the proof of our claim. Proof of Theorem (2.3.4). W.l.o.g. we can assume that c = 1. Let Gfn = Gfn (x) and zn be as in GAFR and let xn = x − Gfn , for n ∈ N we compute: 2 (2.45) kxn k2 = x − Gfn 1 1 f 2 Gn−1 − zn = x − 1 − n n 2 1 2 2 x − Gfn−1 , Gfn−1 − zn + 2 Gfn−1 − zn = x − Gfn−1 + n n 2 2 4 f f f ≤ x − Gn−1 + x − Gn−1 , Gn−1 − zn + 2 n n (for the inequality notice that if kGfn−1 k ≤ kGfn−1 kA1 ≤ 1) and x − Gfn−1 , Gfn−1 − zn ≤ inf x − Gfn−1 , Gfn−1 − z z∈D = inf x − Gfn−1 , Gfn−1 − z z∈A1 (D),kzkA1 ≤1 = hx − Gfn−1 , Gfn−1 i − sup z∈A1 (D),kzkA1 ≤1 f Gn−1 , Gfn−1 − z ≤ x − Gfn−1 , Gfn−1 − x = −kx − Gfn−1 k2 . Inserting this inequality into (2.45) yields 2 x − Gf 2 + 4 = 1 − 2 kxn−1 k|2 + 4 , kxn k2 ≤ 1 − n−1 n n2 n n2 which together with Lemma 2.3.5 yields our claim. Theorem 2.3.6. If we consider the orthogonal greedy algorithm (OGA), then for each x ∈ H with kxkA1 = kxkA1 (D) < ∞, it follows (2.46) kx − Gon (x)k ≤ kxkA1 (D) √ . n Proof. Assume that kxkA1 = 1, and Gon = Gon (x), zn are given as in (OGA), i.e. hx − Gon1 , zn i = suphx − Gn−1 , zi z∈D 2.3. CONVERGENCE RATES 49 and Gon is the orthogonal projection PZ⊥n (x) of x onto Zn = span(z1 , z2 , . . . zn ). As usual we put xn = x − Gon . Since G0n is the best approximation of x by elements of Zn , it follows that (2.47) kxn k2 ≤ kxn−1 − hxn−1 , zn izn k2 = kxn−1 k2 − hxn−1 , zn i2 = kxn−1 k2 1 − hx n−1 , zn i 2 kxn−1 k (W.l.o.g. xn−1 6= 0, otherwise we would be done Write x as x = P cz = 1, cz ≥ 0, for z ∈ D. Then P kxn−1 k2 = hx − Gon−1 , x − Gon−1 i = hx − Gon−1 , xi (Since x − Gon−1 ⊥ Gon−1 ) D E X = x − Gon−1 , cz z z∈D D E X ≤ x − Gon−1 , cz zn z∈D = hxn−1 , zn−1 i = kxn−1 k hxn−1 , zn i kxn−1 k and thus by (2.47) kxn k2 ≤ kxn−1 k2 (1 − kxn−1 k2 ). Our claim follows therefore again from Lemma 2.3.5. ! z∈D cz z, with 50 CHAPTER 2. 
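To close the chapter, Lemma 2.3.5, which drove the last two convergence-rate proofs, can be sanity-checked numerically in the extremal case where its recursive hypothesis holds with equality. (A hypothetical check, not part of the notes; only the constants of the lemma are used.)

```python
def lemma_2_3_5_extremal(A=1.0, n_max=5000):
    """Iterate the extremal case of Lemma 2.3.5, a_1 = A and
    a_m = (1 - 2/m) * a_{m-1} + A / m**2, and record the worst ratio
    a_m / (A/m); the lemma asserts this ratio never exceeds 1."""
    a = A
    worst = a / A            # ratio at m = 1
    for m in range(2, n_max + 1):
        a = (1.0 - 2.0 / m) * a + A / m**2
        worst = max(worst, a / (A / m))
    return worst

worst_ratio = lemma_2_3_5_extremal()
```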
Chapter 3

Greedy Algorithms in general Banach Spaces

3.1 Introduction

The algorithms from Chapter 2 can be generalized to separable Banach spaces $X$. Again let $\mathcal D\subset S_X$ be a dictionary of $X$, i.e. $X=\overline{\operatorname{span}}(\mathcal D)$, and with $z\in\mathcal D$ it also follows that $-z\in\mathcal D$.

(XGA) The X-Greedy Algorithm.
For $x\in X$ we define $G_n=G_n(x)$, for each $n\in\mathbb N_0$, by induction. $G_0=0$, and assuming that $G_0,G_1,\dots,G_{n-1}$ have been defined for some $n\in\mathbb N$, we proceed as follows:
1) Choose $z_n\in\mathcal D$ and $a_n\in\mathbb R$ so that
$$\|x-G_{n-1}-a_nz_n\|=\inf_{z\in\mathcal D,\,a\in\mathbb R}\|x-G_{n-1}-az\|.$$
2) Put $G_n=G_{n-1}+a_nz_n$.

As in the Hilbert space case, the "inf" in the above defined algorithm (XGA) may not be attained. In this case we can consider the following modification.

(WXGA) The Weak X-Greedy Algorithm.
We are given a sequence $\tau=(t_n)\subset(0,1)$. For $x\in X$ we define $G_n=G_n(x)$, for each $n\in\mathbb N_0$, by induction. $G_0=0$, and assuming that $G_0,G_1,\dots,G_{n-1}$ have been defined for some $n\in\mathbb N$, we proceed as follows:
1) Choose $z_n\in\mathcal D$ and $a_n\in\mathbb R$ so that
$$t_n\|x-G_{n-1}-a_nz_n\|\le\inf_{z\in\mathcal D,\,a\in\mathbb R}\|x-G_{n-1}-az\|.$$
2) Put $G_n=G_{n-1}+a_nz_n$.

The following example shows that without any further conditions one can "get stuck" pretty easily:

Example 3.1.1. On $\mathbb R^2$ consider the $\ell_\infty^2$ norm $\|(x,y)\|_\infty=\max(|x|,|y|)$, for $x,y\in\mathbb R$, and let $\mathcal D=\{\pm e_1,\pm e_2\}$ be the dictionary. For the vector $x=(1,1)$ we have
$$\inf_{a\in\mathbb R,\,z\in\mathcal D}\|x-az\|_\infty=1=\|x\|_\infty.$$

In order to avoid cases like Example 3.1.1 we will assume that our space $X$ is at least smooth:

Definition 3.1.2. A Banach space $X$ is called smooth if for every $x\in X\setminus\{0\}$ there is a unique support functional $f_x\in X^*$, i.e. with $\|f_x\|=1$ and $f_x(x)=\|x\|$.

Remark. As shown for example in [Schl, Theorem 4.1.3], the assumption that $X$ is smooth is equivalent to the condition that the norm is Gateaux differentiable on $X\setminus\{0\}$.
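As a concrete instance of Definition 3.1.2: in $\ell_p^d$ with $1<p<\infty$ the support functional of $x\neq 0$ is given by $f_x(y)=\|x\|_p^{1-p}\sum_i \operatorname{sign}(x_i)|x_i|^{p-1}y_i$. The following sketch (a hypothetical illustration, not part of the notes) checks the two defining properties, $\|f_x\|=1$ and $f_x(x)=\|x\|_p$, numerically:

```python
import numpy as np

def support_functional(x, p):
    """Coefficient vector of the norming functional f_x on l_p^d, 1 < p < inf:
    f_x(y) = sum_i sign(x_i) |x_i|**(p-1) y_i / ||x||_p**(p-1).
    Its l_q norm (1/p + 1/q = 1) equals 1, and f_x(x) = ||x||_p."""
    nx = np.linalg.norm(x, ord=p)
    return np.sign(x) * np.abs(x) ** (p - 1) / nx ** (p - 1)

p = 3.0
q = p / (p - 1.0)                       # conjugate exponent
x = np.array([2.0, -1.0, 0.5, 0.0])
fx = support_functional(x, p)
norm_fx = np.linalg.norm(fx, ord=q)     # should equal 1, i.e. ||f_x|| = 1
value = fx @ x                          # should equal ||x||_p
```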
In that case it follows for x₀ ∈ X \ {0} that

f_{x₀}(y) = (1/‖x₀‖) ∂/∂λ ‖x₀ + λy‖ |_{λ=0} = (1/‖x₀‖) lim_{h→0} (‖x₀ + hy‖ − ‖x₀‖)/h, for all y ∈ S_X.

This implies that the X-greedy algorithm and the weak X-greedy algorithm cannot become stationary at a point x₀ ≠ 0. Indeed, if ‖x₀ − λz‖ ≥ ‖x₀‖ for all z ∈ D and all λ, it follows that for all z ∈ D

f_{x₀}(z) = (1/‖x₀‖) lim_{h→0} (‖x₀ + hz‖ − ‖x₀‖)/h = 0.

Since span(D) is dense, this would mean that f_{x₀} = 0, which is a contradiction.

In the Hilbert space case, minimizing ‖x − az‖ over all z ∈ D and all a ∈ R is equivalent to maximizing ⟨x, z⟩ over all z ∈ D. Generalizing this to Banach spaces will lead to a different algorithm, i.e. to an algorithm which does not coincide with (XGA).

(DGA) The Dual Greedy Algorithm. For x ∈ X we define G^d_n = G^d_n(x), for each n ∈ N₀, by induction as follows. G^d_0 = 0, and assuming G^d_0, G^d_1, …, G^d_{n−1} have been defined:

1) Choose z_n ∈ D so that

f_{x−G^d_{n−1}}(z_n) = sup_{z∈D} f_{x−G^d_{n−1}}(z),

2) and then a_n so that

‖(x − G^d_{n−1}) − a_n z_n‖ = min_{a∈R} ‖(x − G^d_{n−1}) − a z_n‖.

Then put G^d_n = G^d_{n−1} + a_n z_n.

Similarly to (XGA) we can also define the weak version of (DGA) and denote it by (WDGA).

(XGDAR) The X-Greedy Dual Algorithm with Relaxation. We are given a sequence ρ = (r_n) ⊂ [0, 1). For x ∈ X we define G^r_n(ρ), for each n ∈ N₀, by induction. G^r_0(ρ) = 0, and assuming that G^r_0(ρ), G^r_1(ρ), …, G^r_{n−1}(ρ) have been defined for some n ∈ N, we proceed as follows:

1) Choose z_n ∈ D and a_n ∈ R so that

‖x − (1 − r_n) G^r_{n−1}(ρ) − a_n z_n‖ ≤ inf_{z∈D, a∈R} ‖x − (1 − r_n) G^r_{n−1}(ρ) − a z‖.

2) Put G^r_n(ρ) = (1 − r_n) G^r_{n−1}(ρ) + a_n z_n.

The following algorithm is a generalization of the Orthogonal Greedy Algorithm for Hilbert spaces.

(CDGA) The Chebyshev Dual Greedy Algorithm. For x ∈ X we define G^C_n, for each n ∈ N₀, by induction. G^C_0 = 0, and assuming that G^C_0, G^C_1, …, G^C_{n−1} and vectors z₁, …, z_{n−1} have been defined for some n ∈ N, we proceed as follows:

1) Choose z_n ∈ D so that

f_{x−G^C_{n−1}}(z_n) ≥ sup_{z∈D} f_{x−G^C_{n−1}}(z).

2) Define Z_n = span(z₁, z₂, …, z_n) and let G^C_n be the (or a) best approximation of x in Z_n.

Similarly to (WXGA) there are weak versions of (XGDAR) and (CDGA), which we denote by (WXGDAR) and (WCDGA), respectively.

The following algorithm is of a different nature than the previous ones; there we will assume that (e_i) is a semi-normalized basis of X. Before discussing these algorithms and their convergence properties we will need to introduce the following strengthening of smoothness of Banach spaces.

Definition 3.1.3. For a Banach space X define the modulus of uniform smoothness by

(3.1) ρ(u) = ρ_X(u) = sup_{x,y∈S_X} [ (‖x + uy‖ + ‖x − uy‖)/2 − 1 ].

We say that X is uniformly smooth if

(3.2) lim_{u→0} ρ(u)/u = 0.

Remark. We will use the modulus of uniform smoothness as follows. For some x, y ∈ X \ {0} (not necessarily in S_X) we would like to find an upper estimate for ‖x − y‖, and write

‖x − y‖ = ‖x‖ · ‖ x/‖x‖ − (‖y‖/‖x‖) · y/‖y‖ ‖ ≤ 2‖x‖ ( 1 + ρ(‖y‖/‖x‖) ) − ‖x + y‖.

Example 3.1.4. As was shown in [Schl], the spaces L_p[0, 1] are uniformly smooth if 1 < p < ∞; more precisely, for X = L_p[0, 1] and u ≥ 0,

(3.3) ρ_X(u) ≤ c_p u^p if 1 < p ≤ 2, and ρ_X(u) ≤ (p − 1) u²/2 if 2 ≤ p < ∞.

Proposition 3.1.5. For any Banach space X, ρ is a convex and even function, and max(0, u − 1) ≤ ρ(u) ≤ u, for u ≥ 0.

Lemma 3.1.6. Let x ≠ 0. Then for u ∈ R and y ∈ X

(3.4) 0 ≤ ‖x + uy‖ − ‖x‖ − u f_x(y) ≤ 2‖x‖ ρ( u‖y‖/‖x‖ ).

Proof. First we note that ‖x + uy‖ ≥ f_x(x + uy) = ‖x‖ + u f_x(y), which implies the first inequality in (3.4). Secondly, from the definition of ρ, and assuming w.l.o.g. that y ≠ 0, it follows that

‖x + uy‖ + ‖x − uy‖ = ‖x‖ ( ‖ x/‖x‖ + (u‖y‖/‖x‖) · y/‖y‖ ‖ + ‖ x/‖x‖ − (u‖y‖/‖x‖) · y/‖y‖ ‖ ) ≤ 2‖x‖ ( 1 + ρ( u‖y‖/‖x‖ ) ),

and

‖x − uy‖ ≥ f_x(x − uy) = ‖x‖ − u f_x(y),

and thus

‖x + uy‖ ≤ 2‖x‖ ( 1 + ρ( u‖y‖/‖x‖ ) ) − ‖x − uy‖ ≤ ‖x‖ + 2‖x‖ ρ( u‖y‖/‖x‖ ) + u f_x(y),

which implies our claim.

Corollary 3.1.7.
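Definition 3.1.3 can be checked numerically in the simplest case. In a Hilbert space the parallelogram law gives ρ(u) = √(1 + u²) − 1 (the supremum is attained when y ⟂ x), and the bounds of Proposition 3.1.5 hold; the following sketch (not part of the notes; sample sizes are illustrative) estimates the supremum in (3.1) by sampling:

```python
import math, random

def rho_hilbert_sample(u, trials=2000, dim=3, seed=0):
    """Estimate the modulus of smoothness of Euclidean R^dim at u by
    sampling x, y on the unit sphere (a lower bound for the true sup)."""
    rng = random.Random(seed)
    def unit():
        v = [rng.gauss(0, 1) for _ in range(dim)]
        n = math.sqrt(sum(c * c for c in v))
        return [c / n for c in v]
    best = 0.0
    for _ in range(trials):
        x, y = unit(), unit()
        plus = math.sqrt(sum((a + u * b) ** 2 for a, b in zip(x, y)))
        minus = math.sqrt(sum((a - u * b) ** 2 for a, b in zip(x, y)))
        best = max(best, (plus + minus) / 2 - 1)
    return best

u = 0.5
est = rho_hilbert_sample(u)
exact = math.sqrt(1 + u * u) - 1          # sup attained for y orthogonal to x
assert est <= exact + 1e-12               # sampling never exceeds the sup
assert max(0.0, u - 1) <= exact <= u      # bounds from Proposition 3.1.5
```

Since ρ(u)/u = (√(1 + u²) − 1)/u → 0 as u → 0, this also illustrates why Hilbert spaces are uniformly smooth.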
If X is a uniformly smooth Banach space and x ∈ X \ {0}, then

(3.5) f_x(y) = d/du ‖x + uy‖ |_{u=0} = lim_{u→0} (‖x + uy‖ − ‖x‖)/u,

and this convergence is uniform in x, y ∈ S_X. The norm is therefore Fréchet differentiable.

Proof. Note that by (3.4)

| (‖x + uy‖ − ‖x‖)/u − f_x(y) | ≤ 2 (‖x‖/u) ρ( u‖y‖/‖x‖ ) →_{u→0} 0.

If ‖x‖ = ‖y‖ = 1 it follows therefore that

| (‖x + uy‖ − ‖x‖)/u − f_x(y) | ≤ 2 ρ(u)/u →_{u→0} 0,

which implies the claimed uniform convergence.

3.2 Convergence of the Weak Dual Chebyshev Greedy Algorithm

Recall the Weak Dual Chebyshev Greedy Algorithm: We are given a sequence of weakness factors τ = (t_n) ⊂ (0, 1] and a dictionary D ⊂ S_X. For x ∈ X we choose (z_n) ⊂ D and G^c_n = G^c_n(x) as follows. G^c_0 = 0, and assuming G^c_j, j = 1, 2, …, n−1, and z_j, j = 1, 2, …, n−1, have been chosen, we let z_n ∈ D be such that

f_{x−G^c_{n−1}}(z_n) ≥ t_n sup_{z∈D} f_{x−G^c_{n−1}}(z).

Then let Z_n = span(z₁, z₂, …, z_n) and let G^c_n be the (or a) best approximation of x inside Z_n.

The main goal of this section is to prove two results of Temlyakov. We will need the following technical definition first.

Definition 3.2.1. Let ρ be an even convex function on [−2, 2] with lim_{u→0} ρ(u)/u = 0 and ρ(2) ≥ 1, let τ = (t_n) be a sequence of numbers in (0, 1], and let Θ ∈ (0, 1/2]. For m ∈ N let ξ_m = ξ_m(ρ, τ, Θ) > 0 be the (by the Intermediate Value Theorem uniquely existing) number for which

(3.6) ρ(ξ_m) = Θ t_m ξ_m.

Theorem 3.2.2. Let X be a uniformly smooth Banach space with modulus of uniform smoothness ρ, and let τ = (t_n) be a sequence of numbers in (0, 1]. Assume that for any Θ ∈ (0, 1/2] we have

Σ_{m=1}^∞ t_m ξ_m(ρ, τ, Θ) = ∞.

We consider the Weak Chebyshev Dual Greedy Algorithm (WCDGA). This means that for x ∈ X, G^c_n = G^c_n(x), and z_n ∈ D is such that

f_{x−G^c_{n−1}}(z_n) ≥ t_n sup_{z∈D} f_{x−G^c_{n−1}}(z),

and G^c_n is the best approximation of x in Z_n = span(z₁, z₂, …, z_n). Let x_n = x − G^c_n, for n ∈ N. Then lim_{n→∞} ‖x_n‖ = 0.

Theorem 3.2.3.
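For concreteness (a sketch, not from the notes; γ, q, Θ and t below are illustrative values): when ρ(u) = γu^q, the defining equation (3.6) can be solved in closed form, ξ = (Θt/γ)^{1/(q−1)}, and a simple bisection solver reproduces this, using that s(u) = ρ(u)/u is increasing:

```python
# Solving rho(xi) = Theta * t * xi from Definition 3.2.1 by bisection,
# for the model modulus rho(u) = gamma * u**q.

def xi(rho, theta, t, lo=1e-12, hi=2.0, iters=200):
    # s(u) = rho(u)/u is increasing, so rho(u) - theta*t*u changes sign once.
    f = lambda u: rho(u) - theta * t * u
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

gamma, q, theta, t = 1.0, 1.5, 0.25, 0.5
rho = lambda u: gamma * u ** q
closed_form = (theta * t / gamma) ** (1 / (q - 1))
assert abs(xi(rho, theta, t) - closed_form) < 1e-9
```

This is exactly the ξ_m that enters the divergence condition Σ t_m ξ_m = ∞ of Theorem 3.2.2.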
Let X be a uniformly smooth Banach space and and assume that its modulus of uniform smoothness ρ satisfies ρ(u) ≤ γuq for some q ∈ (1, 2] and γ ≥ 1. Let τ = (tn ) be sequence of numbers in (0, 1]. For x ∈ X, assume that x ∈ A1 (D) and let Gcn = Gcn (x)be defined as in Theorem 3.2.2. Then there is constant C(q, γ), only dependent on qand γ so that for all n ∈ N (3.7) m X x − Gcn ≤ C(q, γ)kxkA (D) 1 + tpk 1 !−1/p k=1 where p = q/(q − 1). We will first need some Lemmas Lemma 3.2.4. Let X be a uniformly smooth Banach space and let Z ⊂ X be a finite dimensional subspace. If y is the best approximate of some x ∈ X \ Z from Z then fx−y (z) = 0, for all z ∈ Z. 3.2. CONVERGENCE OF THE WEAK DUAL CHEBYSHEV GREEDY ALGORITHM57 Proof. Assume to the contrary that there is a z ∈ Z, kzk = 1, so that β = fx−y (z) > 0. By the definition of ρ(u) it follows for any λ and z ∈ SZ that ! λ (3.8) kx − y − λzk + kx − y + λzk ≤ 2kx − yk 1 + ρ kx − yk and secondly kx − y + λzk ≥ fx−y (x − y + λz) = kx − yk + λβ. (3.9) It follows therefore from (3.8) and (3.9) that ! λ − kx − y + λzk kx − yk λ ≤ kx − yk + ρ − λβ kx − yk ρ(λ/kx − yk) . = kx − yk − λ β − λ/kx − yk kx − y − λzk ≤ 2kx − yk 1 + ρ Since ρ(u)/u →u→0 , it follows therefore that kx − y + λzk < kx − yk, which is a contradiction. Lemma 3.2.5. For any x∗ ∈ X ∗ we have sup x∗ (z) = (3.10) z∈D Proof. for x = that P z∈D cz z, x∗ (x). sup x∈A1 (D),kxkA1 ≤1 with cz ≥ 0, for z ∈ D and x∗ (x) = X P z∈D cz ≤ 1, it follows cz x∗ (z) ≤ sup x∗ (z), z∈D z∈D and thus sup x∗ (z) ≥ z∈D sup x∗ (x). x∈A1 (D),kxkA1 ≤1 The reverse inequality is trivial. Lemma 3.2.6. Let X be a uniformly smooth Banach space and ρ its modulus of uniform smoothness, and let τ = (tn ) be a sequence of numbers in (0, 1]. For x ∈ X let Gcn = Gcn (x) be defined as in (WCDGA). Assume that x ∈ X and that for some ε ≥ 0 there is a xε ∈ X so that kx − xε k ≤ ε and xε ∈ A(D). Then it follows for all n ∈ N " # λ kx − Gcn k λtn ε 1− (3.11) ≤ inf 1 − + 2ρ . 
kx − Gcn−1 k λ≥0 kxkA1 kx − Gcn−1 k kxn−1 k 58CHAPTER 3. GREEDY ALGORITHMS IN GENERAL BANACH SPACES Proof. Abbreviate A = kxε kA1 . Let zn ∈ D, be chosen as in (WCDGA) and put xn = x − Gcn , for n ∈ N. From the definition of ρ, it follows for every λ ≥ 0 that λ (3.12) kxn−1 − λzn k + kxn−1 + λzn k ≤ 2kxn−1 k 1 + ρ . kxn−1 k and by the definition of (WCDGA) and Lemma 3.2.5 it follows that fxn−1 (zn ) ≥ tn sup fxn−1 (z) = tn z∈D sup z∈A1 (D),kzkA1 ≤1 fxn−1 (z) ≥ tn fx (xε ). A n−1 From Lemma 3.2.4 we deduce that fxn−1 (xε ) = fxn−1 (x + xε − x) ≥ fxn−1 (x) − ε = fxn−1 (xn−1 ) − ε = kxn−1 k − ε, and thus kxn−1 + λzn k ≥ kxn−1 k + λfxn−1 (zn ) λtn λtn λtn fxn−1 (xε ) ≥ kxn−1 k 1 + − ε. ≥ kxn−1 k + A A A Finally (3.12) yields kxn k ≤ inf kxn−1 − λzn k λ≥0 ≤ 2kxn−1 k 1 + ρ λ − kxn−1 + λzn k kxn−1 k ! λ λt λtn n ≤ kxn−1 k 1 + 2ρ − + ε kxn−1 k A Akxn−1 k ! λ λt ε n − 1− = kxn−1 k 1 + 2ρ kxn−1 k A xn−1 k which proves our assertion. Proof of Theorem 3.2.2. Let xn = x − Gcn , for n ∈ N. By construction, the sequence (kxn k : n ∈ N) is decreasing and thus, α = limn→∞ kxn k exists and we have to show that α = 0. We assume that α > 0 and will deduce a contradiction. Let ε = α/2. Since span(D) is dense in X we find an xε ∈ span(D), so that kx − xε k < ε and denote A = kxε kA1 . From Lemma 3.2.6 we deduce that # h λ λt ε n kxn k ≤ kxn−1 k inf 1 + 2ρ − 1− λ≥0 kxn−1 k A kxn−1 k 3.2. CONVERGENCE OF THE WEAK DUAL CHEBYSHEV GREEDY ALGORITHM59 h λ λt i n ≤ kxn−1 k inf 1 + 2ρ − . λ≥0 α 2A We let Θ = α/8A and take λ = αξn (ρ, τ, Θ) (recall that this means that ρ(ξn ) = Θtn ξn ) and obtain that h h i tn i kxn k ≤ kxn−1 k 1 + 2Θtn ξn − = kxn−1 k 1 − 2Θtn ξn . 2A and thus kxn k = kxk + n X kxj k − kxj−1 k ≤ kxk − j=1 n X kxj−1 k2Θtj ξj ≤ kxk − α j=1 n X k2Θtj ξj . j=1 But this contradicts our assumption that Σtn ξn = ∞. Before proving Theorem 3.2.3 we need one more Lemma. Lemma 3.2.7. 
Assume that (an ) and (sn ) are sequences of positive numbers satisfying for some A > 0 the following assumption (3.13) a1 < A sn and an ≤ an−1 1 − an−1 . 1 + s1 A Then it follows for all n ∈ N (3.14) n −1 X an ≤ A 1 + sj j=1 Proof. We prove(3.14) by induction. For n = 1 this is the assumption. Assuming our claim is true for n − 1 we obtain sn an ≤ an−1 1 − an−1 A ! n−1 n −1 X −1 X sn ≤A 1+ 1− sj ≤A 1+ sj Pn−1 1 + j=1 sj j=1 j=1 (last inequality follows from cross multiplication). Proof of Theorem 3.2.3. W.l.o.g. we assume kxkA1 = 1. Let the sequences (zn ) and (Gcn ) = (Gcn (x)) be given as in (WCDGA) and let xn = x − Gcn , for n ∈ N. By Lemma 3.2.6 with ε = 0 we obtain " # λ q (3.15) kxn k ≤ kxn−1 k inf 1 − λtn + 2γ λ≥0 kxn−1 k 60CHAPTER 3. GREEDY ALGORITHMS IN GENERAL BANACH SPACES We choose λ such that λ q 1 , λtm = 2γ 2 kxn−1 k or λ = kxn−1 kq/(q−1) (4γ)1/(q−1) tn1/(q−1) . Abbreviating Aq = 2(4γ)1/(q−1) and p = q/(q − 1), and inserting the choice of λ into (3.15), we obtain 1 kxn k ≤ kxn−1 k 1 − λtn = kxn−1 k 1 − kxn−1 kp tpn /Aq . 2 Taking the pth power on each side and using the fact that x ≥ xp if 0 < x ≤ 1, we obtain kxn kp ≤ kxn−1 kp 1 − kxn−1 kp tpn /Aq . Since γ ≥ 1 and thus Aq > 2, and since kxk ≤ kxkA1 = 1 we can apply Lemma 3.2.7 to an = kxn kp and sn = tpn , and A = Aq , and deduce that for all n ∈ N p kxn k ≤ Aq 1 + n X tpn −1 , j=1 which implies the claim of our Theorem. 3.3 Weak Dual Greedy Algorithm with Relaxation For the Weak Chebyshev Dual Greedy Algorithm we need to approximate x by an element of the n dimensional space Zn , which might be computationally complicated. The following algorithm is a compromise between the Chebyshev Dual Greedy Algorithm and Dual Greedy Algorithm. Here we only have to find a good approximation to a two dimensional subspace: (WDGAFR) The weak dual greedy algorithm with free relaxation As usual we are given a sequence of weakness factors τ = (tn ) ⊂ (0, 1] and a dictionary D ⊂ SX . 
For x ∈ X we define Grn = Grn (x) , n ∈ N, as follows: Gr0 = 0 and assuming Grn−1 has been defined we choose zn ∈ D so that fx−Grn−1 (zn ) ≥ tn sup fx−Grn−1 (z), z∈D and then we let wn and λn so that x − (1 − wn )Grn−1 − λn zn ≤ inf x − (1 − w)Grn−1 − λzn ., λ,w and define Grn q = (1 − wn )Grn−1 + λzn . 3.3. WEAK DUAL GREEDY ALGORITHM WITH RELAXATION 61 Proposition 3.3.1. For all x ∈ X kx − Grn (x)k is decreasing in n ∈ N. We will need the following analog to Lemma 3.2.6 Lemma 3.3.2. Assume that X is a uniformly smooth Banach space and denote its modulus of uniform smoothness by ρ. Let x ∈ X, kxk ≥ ε ≥ 0 and xε ∈ A1 , so that kx − xε k < ε. For n ∈ N put xn = x − Gfn Then ! 5λ λtn ε kxn k 1− . (3.16) ≤ inf 1 − + 2ρ kxn−1 k λ≥0 kxε kA1 kxn−1 k kxn−1 k Proof. From the definition of ρ we deduce for w ∈ R and λ ≥ 0 that (3.17) dr kxn−1 + wGdr n−1 − λzn k + kxn−1 − wGn−1 + λzn k kwGdr − λz k n n−1 ≤ 2kxn−1 k 1 + ρ . kxn−1 k For all w ∈ R and λ ≥ 0 we estimate (3.18) kxn−1 −wGdr n−1 + λzn k ≥ fxn−1 (xn−1 − wGdr n−1 + λzn ) ≥ kxn−1 k − fxn−1 (wGdr n−1 ) + λtn sup fxn−1 (z) z∈D = kxn−1 k − fxn−1 (wGdr n−1 ) + λtn sup fxn−1 (z) z∈A1 ,kzkA1 ≤1 (By Lemma 3.2.5) λtn fx (xε ) kxε kA1 n−1 λtn ε λtn fxn−1 (x) − . = kxn−1 k − fxn−1 (wGdr n−1 ) + kxε kA1 kxε kA1 ≥ kxn−1 k − fxn−1 (wGdr n−1 ) + Letting w∗ = λtn /kxε kA1 we deduce that (3.19) kxn−1 −w∗ Gdr n−1 + λzn k λtn ε λtn ≥ kxn−1 k + fxn−1 (x − Gdr n−1 ) − kxε kA1 kxε kA1 λtn λtn ε ≥ kxn−1 k + kxn−1 k − . kxε kA1 kxε kA1 Thus we obtain kxn k = inf λ≥0,w∈R kxn−1 + wGdr n − 1 − λzn k 62CHAPTER 3. GREEDY ALGORITHMS IN GENERAL BANACH SPACES kwGdr − λz k i n n−1 2kxn−1 k 1 + ρ − kxn−1 − wGdr n−1 + λzn k λ≥0,w∈R kxn−1 h kw∗ Gdr − λz k n n−1 ≤ inf 2kxn−1 k 1 + ρ λ≥0 kxn−1 k λtn ε i λtn kxn−1 k − − kxn−1 k + kxε kA1 kxε kA1 " # kw∗ Gdr − λz k λtn ε n n−1 = kxn−1 k inf 1 − 1− + 2ρ λ≥0 kxε kA1 kxn−1 k kxn−1 k ≤ inf h In order to achieve (3.16) we need to estimate kw∗ Gdr n−1 − λzn k. 
First we note that kGdr n−1 k = kx − xn−1 k ≤ 2kxk ≤ 2kxε kA1 + 2ε ≤ 4kxε kA1 and thus ∗ kw∗ Gdr n−1 − λzn k ≤ 4w kxε kA1 + λ ≤ 5λ, which implies our claim since ρ(·) is increasing on [0, ∞). Remark. Before stating the next Theorem let as note that if ρ is an even and convex function on R, with ρ(0) = 0, limu→0 ρ(u)/u = 0, then the function s : u 7→ ρ(u)/u is increasing on [0, ∞), thus has a inverse function s−1 which is also increasing and s−1 (0) = 0. Theorem 3.3.3. Assume that X is a separable and uniformly smooth Banach space, and denote its modulus of uniform smoothness by ρ. Let s−1 (·) be the inverse function of s : u 7→ ρ(u)/u, u ≥ 0. We consider the (WDGAFR) with a sequence of weakness factors τ = (tn ) ⊂ (0, 1] satisfying (3.20) ∞ X tn s−1 (Θtn ) = ∞ for all Θ > 0 n=1 Then for any x ∈ X (3.21) lim kx − Gdr n (x)k = 0. n→∞ dr dr Proof. Let Gdr n = Gn (x) and xn = x − Gn , for n ∈ N. Since kxn k decreases β = lim kxn k n→∞ exists and we need to show that β = 0. We assume that β > 0 and will derive a contradiction. We set ε = β/2 and choose xε ∈ A1 (D) with kx − xε k < ε. Note that kxk ≥ βε. By Lemma 3.3.2 we have 5λ λtn kxn k ≤ kxn−1 k inf 1 − + 2ρ . λ≥0 2A β 3.3. WEAK DUAL GREEDY ALGORITHM WITH RELAXATION 63 Putting Θ = β/40A and λ = βs−1 (Θtm )/5, we obtain βtn s−1 (Θtn ) kxn k ≤ kxn−1 k 1 − + 2ρ s−1 (Θtn ) 10A ρ s−1 (Θtn ) βtn s−1 (Θtn ) −1 = kxn−1 k 1 − + 2s (Θtn ) −1 10A s (Θtn ) −1 ρ s−1 (Θtn ) βtn s (Θtn ) −1 = kxn−1 k 1 − + 2s (Θtn ) −1 10A s (Θtn ) −1 −1 = kxn−1 k 1 − 4Θtn s (Θtn ) + 2s (Θtn )Θtn = kxn−1 k 1 − 2Θtn s−1 (Θtn ) . We can iterate this inequality and obtain kxn k ≤ kxn−1 − kxn−1 k2Θtn s−1 (Θtn ) ≤ kxn−1 k − β2Θtn s−1 (Θtn ) ≤ kxn−2 k − β2Θtn−1 s−1 (Θtn−1 ) − β2Θtn s−1 (Θtn ) .. . n X ≤ kxk − β2Θtn−1 tj s−1 (Θtj ) j=1 and thus letting n → ∞ β ≤ kxk − β2Θtn−1 ∞ X tj s−1 (Θtj ), j=1 which is the contradiction since we assumed that Pn j=1 tj s −1 (Θt ) j = ∞. 
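Looking back at Lemma 3.2.7, which drives the convergence rates both for the Chebyshev algorithm and for the relaxed algorithms, the arithmetic can be checked numerically. Reading (3.13) as a₁ ≤ A/(1 + s₁) and a_n ≤ a_{n−1}(1 − (s_n/A) a_{n−1}), the claimed bound a_n ≤ A (1 + Σ_{j≤n} s_j)^{−1} holds along the recursion; the values of A and (s_n) below are illustrative, not from the notes:

```python
# Lemma 3.2.7: if a_1 <= A/(1+s_1) and a_n <= a_{n-1}(1 - (s_n/A) a_{n-1}),
# then a_n <= A / (1 + s_1 + ... + s_n).  We iterate the recursion with
# equality and compare with the claimed bound at every step.

A = 2.0
s = [0.3 + 0.01 * k for k in range(50)]   # illustrative positive weights

a = A / (1 + s[0])                        # extremal starting value
partial = s[0]
for n in range(1, len(s)):
    a = a * (1 - (s[n] / A) * a)
    partial += s[n]
    assert a <= A / (1 + partial) + 1e-12, (n, a, A / (1 + partial))
```

With s_n = t_n^p and a_n = ‖x_n‖^p this is precisely the step that yields the rate (1 + Σ t_j^p)^{−1/p} in Theorem 3.2.3.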
There is also a result on the rate of convergence in Theorem 3.3.3. Since the proof is similar to the proof of the corresponding result for the Chebyshev Greedy Algorithm, we omit it.

Theorem 3.3.4. Let X be a uniformly smooth Banach space with modulus of uniform smoothness ρ, which satisfies for some q ∈ (1, 2] and γ ≥ 1

(3.22) ρ(u) ≤ γ u^q.

Then there is a number C, only depending on q and γ, so that the following holds. If x ∈ X, ε > 0 and xε ∈ A₁ with ‖x − xε‖ < ε, and if τ = (t_n) ⊂ (0, 1], then it follows for the (WDGAFR) (G^dr_n) = (G^dr_n(x)) with weakness factors (t_n) that

(3.23) ‖x − G^dr_n‖ ≤ max( 2ε, C (‖xε‖_{A₁} + ε) ( 1 + Σ_{j=1}^n t_j^p )^{−1/p} ),

where p = q/(q − 1).

Remark. We would like to point out something about the proof of Theorem 3.3.4 which will be useful when we consider the next greedy algorithm. In the proof of Theorem 3.3.4 it was only used that the sequence (‖x_n‖ : n ∈ N) is decreasing and that inequality (3.16) holds. Of course, in order to prove this inequality we needed the specific choice of z_n, namely in the second "≥" of (3.18). Thus, if (G_n(x)) is any algorithm for which ‖x − G_n(x)‖ is decreasing and which satisfies inequality (3.16), then G_n(x) converges to x.

Keeping that remark in mind we now turn to the X-Greedy Algorithm with Free Relaxation.

(XGAFR) The X-Greedy Algorithm with Free Relaxation. As usual we are given a dictionary D ⊂ S_X. For x ∈ X we define G^r_n = G^r_n(x), n ∈ N, as follows: G^r_0 = 0, and assuming G^r_{n−1} has been defined, we choose λ_n ≥ 0, w_n ∈ R and z_n ∈ D so that

‖x − (1 − w_n) G^r_{n−1} − λ_n z_n‖ = inf_{z∈D, λ≥0, w∈R} ‖x − (1 − w) G^r_{n−1} − λ z‖,

and then define G^r_n = (1 − w_n) G^r_{n−1} + λ_n z_n.

We notice that at each step the value of ‖x − G^r_n(x)‖ / ‖x − G^r_{n−1}(x)‖ is at most as large as the value we would have obtained if we had computed G^dr_n(x) from x_{n−1}. We deduce therefore that the conclusion of Lemma 3.3.2 still holds, and that if 0 ≤ ε ≤ ‖x‖ and if xε ∈ A₁ with ‖x − xε‖ ≤ ε, it follows that

(3.24) ‖x − G^r_n(x)‖ / ‖x − G^r_{n−1}(x)‖ ≤ inf_{λ≥0} [ 1 − (λ t_n/‖xε‖_{A₁}) (1 − ε/‖x_{n−1}‖) + 2 ρ( 5λ/‖x_{n−1}‖ ) ].

From the previously made remark we deduce therefore the following convergence result.

Theorem 3.3.5. Assume that X is a separable and uniformly smooth Banach space. We consider the (XGAFR) (G^r_n(x) : n ∈ N), for x ∈ X. Then for any x ∈ X

(3.25) lim_{n→∞} ‖x − G^r_n(x)‖ = 0.

3.4 Convergence Theorem for the Weak Dual Algorithm

For a Banach space X assume that f_(·) : X \ {0} → S_{X*} is a support map, i.e. for every x ∈ X \ {0} we have f_x(x) = ‖x‖. Recall from the remark after Definition 3.1.2 that every x ∈ X \ {0} has a unique support functional f_x if and only if the norm is Gateaux differentiable, in which case

f_{x₀}(y) = (1/‖x₀‖) ∂/∂λ ‖x₀ + λy‖ |_{λ=0} = (1/‖x₀‖) lim_{h→0} (‖x₀ + hy‖ − ‖x₀‖)/h, for all y ∈ S_X.

Let D ⊂ S_X be a dictionary for X and put, for x ∈ X \ {0},

ρ_D(x) = sup_{z∈D} f_x(z).

We consider the Weak Dual Greedy Algorithm with fixed weakness factor as in Section 3.1, but slightly reformulated:

(WDGA) Fix c ∈ (0, 1). For x ∈ X we choose sequences (x_n)_{n≥0} ⊂ X, (z_n)_{n≥1} ⊂ D and (t_n) ⊂ [0, ∞) recursively as follows. x₀ = x, and assuming that x_{n−1} ∈ X has been chosen, we choose z_n ∈ D arbitrary and t_n = 0 if x_{n−1} = 0; otherwise we choose z_n ∈ D and t_n ≥ 0 so that

a) f_{x_{n−1}}(z_n) ≥ c ρ_D(x_{n−1}) = c sup_{z∈D} f_{x_{n−1}}(z),
b) ‖x_{n−1} − t_n z_n‖ = min_{t≥0} ‖x_{n−1} − t z_n‖,

and in both cases we finally let

c) G_n = G_{n−1} + t_n z_n and x_n = x − G_n = x_{n−1} − t_n z_n.

We say that the weak dual greedy algorithm converges for D if for all x ∈ X

lim_{n→∞} x_n = 0 or, equivalently, lim_{n→∞} Σ_{i=1}^n t_i z_i = x.

Lemma 3.4.1. Let X have Gateaux differentiable norm, let 0 < c < 1, and assume that for x ∈ X the sequences (x_n), (t_n) and (z_n) satisfy (a) and (c) of (WDGA), but instead of (b) the following condition:

d) (‖x_{n−1}‖ − ‖x_n‖)/t_n ≥ c ρ_D(x_{n−1}) for all n ∈ N.

Then, if Σ_{n=1}^∞ t_n = ∞, we have x = Σ_{n=1}^∞ t_n z_n.

Proof.
Define s_n = Σ_{i=1}^n t_i, for n ∈ N. Then

exp( Σ_{j=2}^n ln((s_j − t_j)/s_j) ) = Π_{j=2}^n s_{j−1}/s_j = s₁/s_n →_{n→∞} 0,

and thus

lim_{n→∞} Σ_{j=2}^n ln((s_j − t_j)/s_j) = −∞.

It follows that

∞ = − Σ_{j=2}^∞ ln((s_j − t_j)/s_j) = − Σ_{j=2}^∞ ln(1 − t_j/s_j) ≤ Σ_{j=2}^∞ ( t_j/s_j + t_j²/s_j² ),

and thus, since t_j/s_j ≤ 1, Σ_{j=2}^∞ t_j/s_j = ∞.

We note that if (a_n) and (b_n) are two positive sequences with Σ a_n < ∞ while Σ b_n = ∞, then there is a subsequence (n_k) of N so that lim_{k→∞} a_{n_k}/b_{n_k} = 0. Indeed, for every k ∈ N the set N_k = {n ∈ N : k a_n < b_n} must be infinite, and we can therefore choose n₁ < n₂ < n₃ < …, with n_k ∈ N_k for k ∈ N. Applying this to a_n = ‖x_{n−1}‖ − ‖x_n‖ and b_n = t_n/s_n, we can find n₁ < n₂ < n₃ < … so that

lim_{k→∞} s_{n_k+1} (‖x_{n_k}‖ − ‖x_{n_k+1}‖)/t_{n_k+1} = 0.

It follows that

0 ≤ s_{n_k} ρ_D(x_{n_k}) ≤ (1/c) s_{n_k+1} (‖x_{n_k}‖ − ‖x_{n_k+1}‖)/t_{n_k+1} →_{k→∞} 0,

and thus, in particular,

(3.26) lim_{k→∞} ρ_D(x_{n_k}) = 0.

For 1 ≤ l ≤ n_k − 1 we have

(3.27) ‖x_{n_k}‖ − f_{x_{n_k}}(x_l) = f_{x_{n_k}}( − Σ_{j=l+1}^{n_k} t_j z_j ) ≤ Σ_{j=l+1}^{n_k} t_j ρ_D(x_{n_k}) ≤ s_{n_k} ρ_D(x_{n_k}) →_{k→∞} 0

(for the first inequality note that −z_j ∈ D). Now assume that x* is a w*-cluster point of the sequence (f_{x_{n_k}}) and let L = lim_{n→∞} ‖x_n‖. Then, by (3.27), it follows that x*(x_l) = L for all l ∈ N. We now claim that this implies that L = 0. Indeed, otherwise x* ≠ 0, and thus, since span(D) is dense in X and D = −D, θ = sup_{z∈D} x*(z) > 0, and hence

lim sup_{k→∞} sup_{z∈D} f_{x_{n_k}}(z) ≥ θ,

which contradicts (3.26).

Lemma 3.4.2. Suppose 1 < p < ∞. Then there is a C_p > 0 such that for any a, b ∈ R

(3.28) b |a+b|^{p−1} sign(a+b) − b |a|^{p−1} sign(a) ≤ C_p ( |a+b|^p − p b |a|^{p−1} sign(a) − |a|^p ).

Proof. First note that replacing a and b simultaneously by −a and −b does not change the inequality. We can therefore assume that a ≥ 0. Also note that for b = 0 both sides vanish. Thus we can assume that a > 0 and, since both sides of (3.28) are p-homogeneous, we can assume that a = 1, and also that b ≠ 0.
We need therefore to show that φ(b) = b |1 + b|p−1 sign(1 + b) − 1) , |1 + b|p − pb − 1 b 6= 0, has an upper bound. We note that b(1 + b)p−1 − b b→0 (1 + b)p − pb − 1 (1 + b)p−1 + (p − 1)b(1 + b)p−2 = lim b→0 p(1 + b)p−1 − p (p − 1)(1 + b)p−2 + (p − 1)(1 + b)p−2 + (p − 1)(p − 2)b(1 + b)p−3 2 = lim = p−2 b→0 p(p − 1)(1 + b) p lim φ(b) = lim b→0 and lim φ(b) = 1, b→±∞ which implies the claim since φ is continuous. Definition 3.4.3. A Banach space X with Gateaux differentiable norm is said to have property Γ if there is a constant 0 < γ ≤ 1 so that for x, y ∈ X for which fx (y) = 0 it follows that kx + yk ≥ kxk + γfx+y (y). Remark. For x, y ∈ Lp [0, 1], fx (y) = 0 means that Z (3.29) 0 1 sign(x(t))|x(t)|p−1 y(t) dt = Z 1 sign(x(t))|x(t)|p/q y(t) dt = 0. 0 Proposition 3.4.4. If 1 < p < ∞, every quotient of Lp [0, 1] has property Γ. 68CHAPTER 3. GREEDY ALGORITHMS IN GENERAL BANACH SPACES Proof. We first show that Lp [0, 1] itself has property Γ. So let x, y ∈ Lp [0, 1] with fx (y) = 0. We can assume that y 6= 0, and, after dividing x and y by kyk, that kyk = 1 By Lemma 3.4.2, y(s)|x(s) + y(s)|p−1 sign(x(s) + y(s)) ≤ Cp |x(s) + y(s)|p − |x(s)|p + (1 − pCp )y(s)|x(s)|p−1 sign(x(s)). Integrating both sides and using (3.29), yields Z 1 y(s)|x(s) + y(s)|p−1 sign(x(s) + y(s)) ds ≤ Cp kx + ykpp − kxkpp . 0 d It follows from the fact that fx (y) is a positive multiple of dt kx + tykp , and the convexity of the function t 7→ kx + tyk that kxkp ≤ kx + ykp . Moreover we have p−1 ∈ S , for z ∈ L [0, 1], and thus fz = kzk1−p p sign(z(·))|z(·)| p Lq kx + ykp−1 p fx+y (y) Z 1 y(s)|x(s) + y(s)|p−1 sign(x(s) + y(s)) ds ≤ 0 ≤ Cp kx + ykpp − kxkpp d = Cp kx + yk − kxk kx + tykpp dt t=t0 (By Taylor’s Theorem for some t0 ∈ (0, 1)) d kx + tyk = Cp kx + yk − kxk pkx + ykp−1 p p dt t=t0 p−1 ≤ Cp kx + yk − kxk pkx + ykp d 0 ≤ kx + tykp ≤ 1, since kxkp ≤ kx + ykp and since kykp = 0 , dt t=t0 = kx + ykpp−1 which proves our claim if we let γ = 1/pCp . 
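The boundedness of φ in the proof of Lemma 3.4.2 is easy to probe numerically. Reading φ(b) = b(|1+b|^{p−1} sign(1+b) − 1)/(|1+b|^p − pb − 1), with the limits 2/p at 0 and 1 at ±∞ computed above, the following sketch samples φ for p = 3/2 (the grid and the bound 3 are illustrative choices, not constants from the notes):

```python
import math

# phi(b) from the proof of Lemma 3.4.2 (with a = 1 and p = 3/2):
# phi(b) = b(|1+b|^{p-1} sign(1+b) - 1) / (|1+b|^p - p*b - 1).
p = 1.5

def phi(b):
    s = math.copysign(1.0, 1 + b)
    num = b * (abs(1 + b) ** (p - 1) * s - 1.0)
    den = abs(1 + b) ** p - p * b - 1.0   # > 0 for b != 0 by strict convexity
    return num / den

# phi is continuous away from b = 0 (where it tends to 2/p) and bounded.
grid = [k / 100.0 for k in range(-500, 501) if abs(k) >= 5]
values = [phi(b) for b in grid]
assert max(values) <= 3.0                 # numerically the sup is about 2.2 here
assert abs(phi(1e-4) - 2 / p) < 1e-3      # matches the limit at b -> 0
```

So for this p any constant C_p above the observed supremum works in (3.28), consistent with the continuity argument in the proof.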
From the following more general proposition it will follow that every quotient of L_p[0, 1] has property Γ.

Proposition 3.4.5. A quotient of a reflexive space X with property Γ and Gateaux differentiable norm also has property Γ (with respect to the same constant γ).

Proof. Assume that Y = X/Z, where X is a reflexive space with property Γ and Z ⊂ X is a closed subspace of X. For x, y ∈ X let x̄ = x + Z and ȳ = y + Z be the images under the quotient map Q : X → Y. Since X is reflexive we can assume that ‖x̄‖_{X/Z} = ‖x‖_X, and we can find an element w ∈ X with Q(w) = x̄ + ȳ so that ‖x̄ + ȳ‖_{X/Z} = ‖w‖_X. Note that f_{x̄} ∘ Q = f_x (since f_{x̄}(Q(x)) = f_{x̄}(x̄) = ‖x̄‖_{X/Z} = ‖x‖_X) and f_{x̄+ȳ} ∘ Q = f_w. Hence, if f_{x̄}(ȳ) = 0, it follows that f_x(w − x) = 0, and thus

‖x̄ + ȳ‖ = ‖w‖ = ‖x + (w − x)‖ ≥ ‖x‖ + γ f_{x+(w−x)}(w − x) = ‖x̄‖ + γ f_{x̄+ȳ}(ȳ).

Now we are ready to show the final result, which implies in particular that the (WDGA) converges in L_p[0, 1] for any dictionary D.

Theorem 3.4.6. Suppose X is a Banach space with property Γ and Fréchet differentiable norm. If D is a dictionary of X and 0 < c ≤ 1, then the (WDGA) converges.

Proof. By Proposition 4.2.2 (Class Notes in Functional Analysis), the map x ↦ f_x is a norm-continuous map between X \ {0} and S_{X*}. Let x = x₀ be in X and let (x_n), (z_n) and (t_n) be as in (WDGA). If t_n = 0 for some n ∈ N, then ρ_D(x_{n−1}) = 0, and since D is total it follows that x_{n−1} = 0, and thus x_k = 0 for all k ≥ n. So we can assume without loss of generality that t_n > 0 for all n ∈ N. By condition (b) it follows that

d/dt ‖x_{n−1} − t z_n‖ |_{t=t_n} = 0,

and thus f_{x_n}(z_n) = 0, which yields, using property Γ, that

‖x_{n−1}‖ = ‖x_n + t_n z_n‖ ≥ ‖x_n‖ + γ t_n f_{x_{n−1}}(z_n),

and thus

(‖x_{n−1}‖ − ‖x_n‖)/t_n ≥ γ f_{x_{n−1}}(z_n) ≥ cγ ρ_D(x_{n−1}).

Using Lemma 3.4.1 with cγ instead of c, we only need to show that lim_{n→∞} ‖x_n‖ = 0 if Σ_{n=1}^∞ t_n < ∞. If Σ_{n=1}^∞ t_n < ∞, then (x_n) converges to some x_∞ ∈ X.
To deduce a contradiction, assume that x_∞ ≠ 0, which implies, by the norm continuity of the support map, that lim_{n→∞} ‖f_{x_n} − f_{x_∞}‖ = 0. Now since, as observed previously, f_{x_n}(z_n) = 0, we have lim_{n→∞} f_{x_{n−1}}(z_n) = 0, and thus by (WDGA)(a) lim_{n→∞} ρ_D(x_{n−1}) = 0. Hence for any z ∈ D

f_{x_∞}(z) = lim_{n→∞} f_{x_{n−1}}(z) ≤ lim_{n→∞} ρ_D(x_{n−1}) = 0,

and similarly, since D = −D,

−f_{x_∞}(z) = f_{x_∞}(−z) = lim_{n→∞} f_{x_{n−1}}(−z) ≤ lim_{n→∞} ρ_D(x_{n−1}) = 0,

which implies that f_{x_∞}(z) = 0 for all z ∈ D, and thus x_∞ = 0, which is a contradiction and proves our claim.

Remark. Assume that the norm of the Banach space X is Gateaux differentiable and let x ∈ X. As in (WXGA), let (x_n) ⊂ X, (z_n) ⊂ D and (t_n) ⊂ [0, ∞) be chosen so that x₀ = x, x_n = x_{n−1} − t_n z_n for n ∈ N, and so that for some c ∈ (0, 1]

‖x_{n−1}‖ − ‖x_n‖ ≥ c sup_{g∈D} sup_{t≥0} ( ‖x_{n−1}‖ − ‖x_{n−1} − t g‖ ).

We note that if (x_n) has a convergent subsequence (for example if Σ_n t_n < ∞), then (x_n) has to converge to 0. Indeed, assume that x_∞ = lim_{k→∞} x_{n_k} exists for some subsequence (n_k). We claim that x_∞ = 0, and this would imply that (x_n) converges to 0, since (‖x_n‖) is decreasing. Assume that x_∞ ≠ 0. Then, since D is total in X, it follows that sup_{g∈D} f_{x_∞}(g) > 0, and thus there exist g ∈ D and t > 0 so that ε = ‖x_∞‖ − ‖x_∞ − t g‖ > 0. But this would imply that for some k₀ ∈ N

‖x_{n_k}‖ − ‖x_{n_k+1}‖ ≥ c ( ‖x_{n_k}‖ − ‖x_{n_k} − t g‖ ) ≥ cε/2, whenever k ≥ k₀.

But this is a contradiction, since (‖x_n‖ − ‖x_{n+1}‖) is a nonnegative and summable sequence.

Chapter 4

Open Problems

4.1 Greedy Bases

Problem 4.1.1. Does every infinite dimensional Banach space contain a quasi greedy basis?

Comments on Problem 4.1.1: First of all, there are separable Banach spaces which have a basis but do not have a quasi greedy basis (for the whole space). Indeed, in [DKK] it was shown that an L∞-space (for example any C(K), K compact) which is not isomorphic to c₀ does not have a quasi greedy basis.
Secondly, Gowers and Maurey solved the unconditional basis problem and showed that not every separable Banach space contains an unconditional basic sequence (a property which is stronger than being quasi greedy). Nevertheless, in [DKK], Dilworth, Kalton and Kutzarova proved that all known counterexamples to the unconditional basis problem actually contain quasi greedy basic sequences. They showed the following:

Theorem 4.1.2. Let (x_n) be a semi-normalized weakly null sequence in a Banach space X with spreading model (e_n), and suppose that (e_n) has the property that

‖ Σ_{i=1}^n e_i ‖ → ∞, as n → ∞.

Then (x_n) has a subsequence which is quasi greedy and whose quasi greedy constant does not exceed 3 + ε (for given ε > 0).

Here the spreading model of a semi-normalized sequence (x_n) ⊂ X is defined as follows: assume that for all k ∈ N and all scalars (a_j)_{j=1}^k the limit

||| Σ_{j=1}^k a_j e_j ||| = lim_{n₁→∞} lim_{n₂→∞} … lim_{n_k→∞} ‖ Σ_{j=1}^k a_j x_{n_j} ‖

exists. It is clear that |||·||| is a semi-norm on c₀₀. Using Ramsey's Theorem (a kind of generalized pigeonhole principle), one can prove that every semi-normalized sequence has a subsequence so that the above limit exists for all (a_j) ∈ c₀₀, and that if (x_n) is weakly null or basic, then |||·||| is a norm on c₀₀ and (e_n) is a basis of the completion of c₀₀ with respect to |||·|||. In this case we call this completion, together with its basis (e_n), the spreading model of (x_n). For more comments on the problem and its relation to the problem whether or not the Elton number has a universal upper bound, see also [DOSZ2].

Problem 4.1.3. Does ℓ_p(ℓ_q), 1 < p, q < ∞ and p ≠ q, have a greedy basis?

Comments on Problem 4.1.3: Besov spaces are function spaces defined on the real line or on some subset of it, and are of importance, for example, in Partial Differential Equations and Approximation Theory.
It can be shown that Besov spaces of functions defined on R are isomorphic to ℓ_p(ℓ_q), 1 < p ≠ q < ∞, where

ℓ_p(ℓ_q) = { (x_n) : x_n ∈ ℓ_q, for n ∈ N, and Σ_n ‖x_n‖_q^p < ∞ },

with the norm

‖(x_n)‖ = ( Σ_n ‖x_n‖_q^p )^{1/p}, for (x_n) ∈ ℓ_p(ℓ_q).

It was shown in [EW] that for the space ℓ_p ⊕ ℓ_q, p ≠ q, every unconditional basis (x_n) of ℓ_p ⊕ ℓ_q splits into a basis of ℓ_p and a basis of ℓ_q, i.e. there is a partition of N into N₁ and N₂ so that (x_n : n ∈ N₁) is a basis of ℓ_p and (x_n : n ∈ N₂) is a basis of ℓ_q. From that result it is easy to see that ℓ_p ⊕ ℓ_q cannot have a greedy basis. On the other hand, the space (⊕_{n=1}^∞ ℓ_q^{m_n})_p, with 1 < p < ∞, 1 ≤ q ≤ ∞, and m_n → ∞ as n → ∞, which is isomorphic to Besov spaces of functions defined on the torus (or any closed bounded interval in R), has a greedy basis [DFOS]. In the cases p = 1 and p = ∞ it was shown in [BCLT] that (⊕_{n=1}^∞ ℓ_2^{m_n})_p has a unique unconditional basis up to permutation, and thus this space cannot have a greedy basis (the usual one is clearly not democratic). From the proof in [BCLT] we can also deduce that for general q ∈ [1, ∞] the spaces (⊕_{n=1}^∞ ℓ_q^{m_n})_{ℓ₁} and (⊕_{n=1}^∞ ℓ_q^{m_n})_{c₀} have no greedy bases, unless, of course, in the trivial case that p = q = ∞ or p = q = 1.

Problem 4.1.4. Given any ε > 0, can a Banach space with a normalized greedy basis (e_n) be renormed so that (e_n) is (1 + ε)-greedy?

Comments on Problem 4.1.4: First, Albiac and Wojtaszczyk [AW] asked whether or not every Banach space with a normalized greedy basis (e_n) can be renormed so that (e_n) is 1-greedy. This was solved negatively (recall that by Theorem 1.1.9 every 1-greedy basis must be 1-democratic) in [DOSZ2], where the following was shown:

Proposition 4.1.5. Assume that X is a Banach space with a normalized suppression 1-unconditional basis (e_i) and that there is a sequence (ρ_n) ⊂ (0, 1] with ρ = inf_{n∈N} ρ_n > 0 so that

‖ Σ_{i∈E} e_i ‖ = ρ_n n, whenever n ∈ N and E ⊂ N with #E = n.

Then (e_i) is ρ²-equivalent to the unit vector basis of ℓ₁.

Corollary 4.1.6. 1. The Hardy space H₁ cannot be renormed so that the Haar basis in H₁ (which is greedy) is 1-greedy. 2. Tsirelson space T cannot be renormed so that it has a 1-greedy basis.

Problem 4.1.7. Can L_p[0, 1], 1 < p < ∞, be renormed so that the Haar basis becomes 1-greedy, or at least (1 + ε)-greedy?

Comments on Problem 4.1.7: In Corollary 1.2.2 it was shown that the Haar basis in L_p[0, 1] is greedy, and in [DOSZ2] it was shown that one can renorm L_p[0, 1] so that the Haar basis is 1-democratic and 1-unconditional. But the fact that a basis is 1-democratic and suppression 1-unconditional only implies that it is 2-greedy, not that it is 1-greedy. Indeed, from Theorem 1.1.9 it follows that every suppression 1-unconditional and 1-democratic basis is 2-greedy, and in [DOSZ2] it was shown that this constant is optimal: for any ε > 0 there is a basic sequence which is 1-democratic and suppression 1-unconditional (even 1-unconditional) but not (2 − ε)-greedy.

4.2 Greedy Algorithms

For Hilbert space it is still open to find better convergence rates for the pure greedy algorithm, but for general Banach spaces the questions about the convergence of the X-Greedy Algorithm are quite basic and wide open.

Problem 4.2.1. Find one infinite dimensional Banach space X, other than Hilbert space, on which (X-PGA) converges for any dictionary.

Problem 4.2.2. Does (X-PGA) converge on ℓ_p, 1 < p < ∞, p ≠ 2, for any dictionary?

Comments on Problem 4.2.2: In [DKSTW] at least the weak convergence of the Pure Greedy Algorithm was shown:

Theorem 4.2.3. [DKSTW, Theorem 3.2] Suppose that, for n ∈ N, X_n is a finite dimensional space whose norm is Gateaux differentiable. Then the weak pure greedy algorithm with fixed weakness factor converges weakly.

Problem 4.2.4. Does (X-PGA) converge in L_p[0, 1], 1 < p < ∞, p ≠ 2, at least if one takes the Haar basis as dictionary?
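For orientation on these problems, the Hilbert space case, where the pure greedy algorithm does converge for every dictionary, can be sketched in a few lines (the dictionary below is an illustrative one, not from the notes):

```python
import math

# Pure Greedy Algorithm in the Hilbert space R^3: at each step pick the
# dictionary element with the largest inner product with the residual and
# subtract the corresponding 1-dimensional projection.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)

D = [normalize(v) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]]
D += [tuple(-c for c in g) for g in D]     # keep the dictionary symmetric

x = (2.0, -1.0, 0.5)
residual = x
for _ in range(200):
    g = max(D, key=lambda z: dot(residual, z))
    t = dot(residual, g)                   # optimal step: exact line search
    residual = tuple(r - t * c for r, c in zip(residual, g))

assert math.sqrt(dot(residual, residual)) < 1e-6
```

Since this dictionary contains the orthonormal vectors ±e_i, each step removes at least a fixed fraction of the squared residual norm, so the residual tends to 0; the open questions above ask what survives of this outside Hilbert space.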
Comments on Problem: 4.2.3: The following finite dimensional version to Problem 4.2.4 was shown in [DOSZ3] Theorem 4.2.5. Let 1 < p < ∞ and let hj : j ∈ N be Haar basis of Lp (ordered consistently with the usual partial order). For each m there is number N = (N (p, m)) so that X- PGA terminates after N steps, assuming that the starting point was chosen in span(xj : j ≤ m). Problem 4.2.6. Are there examples of separable and uniform smooth Banach spaces X with dictionaries for which the (weak) dual greedy algorithm does not converge? Chapter 5 Appendix A: Bases in Banach spaces 5.1 Schauder bases In this section we recall some of the notions and results presented in the course on Functional Analysis in Fall 2012 [Schl]. Like every vector space a Banach space X admits an algebraic or Hamel basis, i.e. a subset B ⊃ X, so that every x ∈ X is in a unique way the (finite) linear combination of elements in B. This definition does not take into account that we can take infinite sums in Banach spaces and that we might want to represent elements in X as converging series. Hamel bases are also not very useful for Banach spaces, since (see Exercise 1), the coordinate functionals might not be continuous. Definition 5.1.1. [Schauder bases of Banach Spaces] Let X be an infinite dimensional Banach space. A sequence (en ) ⊂ X is called Schauder basis of X, or simply a basis of X, if for every x ∈ X, there is a unique sequence of scalars (an ) ⊂ K so that x= ∞ X an en . n=1 Examples 5.1.2. For n ∈ N let en = ( 0, . . . 0 , 1, 0, . . .) ∈ KN | {z } n−1 times Then (en ) is a basis of `p , 1 ≤ p < ∞ and c0 . We call (en ) the unit vector of `p and c0 , respectively. Remarks. Assume that X is a Banach space and (en ) a basis of X. Then 75 76 CHAPTER 5. APPENDIX A: BASES IN BANACH SPACES a) (en ) is linear independent. b) span(en : n ∈ N) is dense in X, in particular X is separable. c) Every P∞ element x is uniquely determined by the sequence (an ) so Nthat x = j=1 an en . 
So we can identify X with a space of sequences in K^N.

Proposition 5.1.3. Let (e_n) be a Schauder basis of a Banach space X. For n ∈ N and x ∈ X define e*_n(x) ∈ K to be the unique element of K so that

x = Σ_{n=1}^∞ e*_n(x) e_n.

Then e*_n : X → K is linear. For n ∈ N let

P_n : X → span(e_j : j ≤ n), x ↦ Σ_{j=1}^n e*_j(x) e_j.

Then the P_n : X → X are linear projections onto span(e_j : j ≤ n) and the following properties hold:

a) dim(P_n(X)) = n,

b) P_n ∘ P_m = P_m ∘ P_n = P_{min(m,n)}, for m, n ∈ N,

c) lim_{n→∞} P_n(x) = x, for every x ∈ X.

The P_n, n ∈ N, are called the canonical projections for (e_n), and (e*_n) the coordinate functionals, or biorthogonals, for (e_n).

Theorem 5.1.4. Let X be a Banach space with a basis (e_n), and let (e*_n) be the corresponding coordinate functionals and (P_n) the canonical projections. Then P_n is bounded for every n ∈ N and

b = sup_{n∈N} ‖P_n‖_{L(X,X)} < ∞,

and thus e*_n ∈ X* and

‖e*_n‖_{X*} = ‖P_n − P_{n−1}‖ / ‖e_n‖ ≤ 2b / ‖e_n‖.

We call b the basis constant of (e_j). If b = 1 we say that (e_i) is a monotone basis. Furthermore,

|||·||| : X → R⁺₀, Σ_{i=1}^∞ a_i e_i ↦ |||Σ_{i=1}^∞ a_i e_i||| = sup_{n∈N} ‖Σ_{i=1}^n a_i e_i‖,

is an equivalent norm under which (e_i) is a monotone basis.

Definition 5.1.5. [Basic Sequences] Let X be a Banach space. A sequence (x_n) ⊂ X \ {0} is called a basic sequence if it is a basis of span(x_n : n ∈ N). If (e_j) and (f_j) are two basic sequences (in possibly different Banach spaces X and Y), we say that (e_j) and (f_j) are isomorphically equivalent if the map

T : span(e_j : j ∈ N) → span(f_j : j ∈ N), Σ_{j=1}^n a_j e_j ↦ Σ_{j=1}^n a_j f_j,

extends to an isomorphism between span(e_j : j ∈ N) and span(f_j : j ∈ N). Note that this is equivalent to saying that there are constants 0 < c ≤ C so that for any n ∈ N and any sequence of scalars (λ_j)_{j=1}^n it follows that

c ‖Σ_{j=1}^n λ_j e_j‖ ≤ ‖Σ_{j=1}^n λ_j f_j‖ ≤ C ‖Σ_{j=1}^n λ_j e_j‖.

Proposition 5.1.6. Let X be a Banach space and (x_n : n ∈ N) ⊂ X \ {0}.
Then (x_n) is a basic sequence if and only if there is a constant K ≥ 1 so that for all m < n and all scalars (a_j)_{j=1}^n ⊂ K we have

(5.1) ‖Σ_{i=1}^m a_i x_i‖ ≤ K ‖Σ_{i=1}^n a_i x_i‖.

In that case the basis constant is the smallest of all K ≥ 1 for which (5.1) holds.

Theorem 5.1.7. [The Small Perturbation Lemma] Let (x_n) be a basic sequence in a Banach space X, let (x*_n) be the coordinate functionals (they are elements of span(x_j : j ∈ N)*), and assume that (y_n) is a sequence in X such that

(5.2) c = Σ_{n=1}^∞ ‖x_n − y_n‖ · ‖x*_n‖ < 1.

Then:

a) (y_n) is also basic in X and isomorphically equivalent to (x_n); more precisely,

(1 − c) ‖Σ_{n=1}^∞ a_n x_n‖ ≤ ‖Σ_{n=1}^∞ a_n y_n‖ ≤ (1 + c) ‖Σ_{n=1}^∞ a_n x_n‖,

for all series x = Σ_{n∈N} a_n x_n converging in X.

b) If span(x_j : j ∈ N) is complemented in X, then so is span(y_j : j ∈ N).

c) If (x_n) is a Schauder basis of all of X, then (y_n) is also a Schauder basis of X, and for the coordinate functionals (y*_n) of (y_n) it follows that y*_n ∈ span(x*_j : j ∈ N), for n ∈ N.

Now we recall the notion of an unconditional basis. First the following proposition.

Proposition 5.1.8. For a sequence (x_n) in a Banach space X the following statements are equivalent.

a) For any reordering (also called permutation) σ of N (i.e. σ : N → N is bijective) the series Σ_{n∈N} x_{σ(n)} converges.

b) For any ε > 0 there is an n ∈ N so that whenever M ⊂ N is finite with min(M) > n, then ‖Σ_{m∈M} x_m‖ < ε.

c) For any subsequence (n_j) the series Σ_{j∈N} x_{n_j} converges.

d) For any sequence (ε_j) ⊂ {±1} the series Σ_{j=1}^∞ ε_j x_j converges.

In the case that the above conditions hold, we say that the series Σ x_n converges unconditionally.

Definition 5.1.9. A basis (e_j) of a Banach space X is called unconditional if for every x ∈ X the expansion x = Σ_j ⟨e*_j, x⟩ e_j converges unconditionally, where (e*_j) are the coordinate functionals of (e_j). A sequence (x_n) ⊂ X is called an unconditional basic sequence if (x_n) is an unconditional basis of span(x_j : j ∈ N).
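For the unit vector basis of ℓ_p both of the criteria just discussed can be checked numerically: the initial-segment estimate (5.1) holds with K = 1 (a monotone basis), and dropping coordinates never increases the norm (suppression unconditionality with constant 1). The following sketch (a toy verification with made-up coefficients, not part of the notes) tests both for a given coefficient vector.

```python
import itertools

def lp_norm(a, p):
    """l_p norm of a finite coefficient vector."""
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

def check_unit_vector_basis(a, p):
    """For the unit vector basis of l_p, check:
    (i) initial segments: ||sum_{i<=m} a_i e_i|| <= ||sum_{i<=n} a_i e_i|| for m <= n,
    (ii) coordinate subsets: ||sum_{i in A} a_i e_i|| <= ||sum_{i in B} a_i e_i|| for A subset of B."""
    n = len(a)
    full = lp_norm(a, p)
    # (i) basis criterion (5.1) with K = 1 (monotone basis)
    ok_initial = all(lp_norm(a[:m], p) <= full + 1e-12 for m in range(n + 1))
    # (ii) suppression unconditionality with constant 1: all subsets of {0, ..., n-1}
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    ok_subsets = all(lp_norm([a[i] for i in A], p) <= full + 1e-12 for A in subsets)
    return ok_initial and ok_subsets
```

Both checks succeed because removing terms can only decrease the sum Σ|a_i|^p; for a general basic sequence (i) may hold with a large K while (ii) fails entirely.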
Proposition 5.1.10. For a sequence of non-zero elements (x_j) in a Banach space X the following are equivalent.

a) (x_j) is an unconditional basic sequence.

b) There is a constant C so that for all finite B ⊂ N, all scalars (a_j)_{j∈B} ⊂ K, and all A ⊂ B,

(5.3) ‖Σ_{j∈A} a_j x_j‖ ≤ C ‖Σ_{j∈B} a_j x_j‖.

c) There is a constant C′ so that for all finite sets B ⊂ N, all scalars (a_j)_{j∈B} ⊂ K, and all (ε_j)_{j∈B} ⊂ {±1}, if K = R, or (ε_j)_{j∈B} ⊂ {z ∈ C : |z| = 1}, if K = C,

(5.4) ‖Σ_{j∈B} ε_j a_j x_j‖ ≤ C′ ‖Σ_{j∈B} a_j x_j‖.

In this case we call the smallest constant C = C_s which satisfies (5.3) for all n, all A ⊂ {1, 2, …, n} and all scalars (a_j)_{j=1}^n ⊂ K the suppression-unconditional constant of (x_n), and we call the smallest constant C′ = C_u so that (5.4) holds for all n, all (ε_j)_{j=1}^n ⊂ {±1} (respectively (ε_j)_{j=1}^n ⊂ {z ∈ C : |z| = 1}) and all scalars (a_j)_{j=1}^n ⊂ K the unconditional constant of (x_n). Moreover, it follows that

(5.5) C_s ≤ C_u ≤ 2C_s.

Proposition 5.1.11. Let (x_n) be an unconditional basic sequence. Then

(5.6) C_u = sup{ ‖Σ_{i=1}^∞ a_i b_i x_i‖ : x = Σ_{i=1}^∞ a_i x_i ∈ B_X and |b_i| ≤ 1 }.

Remark. While for Schauder bases it is in general important how we order them, the ordering is not relevant for unconditional bases. We can therefore index unconditional bases by any countable set.

5.2 Markushevich bases

Not every separable Banach space has a Schauder basis [En]. But every separable Banach space has at least a bounded and norming Markushevich basis, according to a result of Ovsepian and Pełczyński [OP]. We want to present this result in this section.

Definition 5.2.1. A countable family (e_n, e*_n)_{n∈N} ⊂ X × X* is called

• biorthogonal, if e*_n(e_m) = δ_{(m,n)}, for all m, n ∈ N,

• fundamental, or complete, if span(e_n : n ∈ N) is dense in X,

• total, if for any x ∈ X with e*_n(x) = 0, for all n ∈ N, it follows that x = 0,

• norming, if for some constant c > 0, sup |x*(x)| ≥ c‖x‖, for all x ∈ X,
where the supremum is taken over x* ∈ span(e*_n : n ∈ N) ∩ B_{X*}; in that case we also say that (e_n, e*_n)_{n∈N} is c-norming,

• shrinking, if span(e*_n : n ∈ N) is norm dense in X*, and

• bounded, or uniformly minimal, if C = sup_{n∈N} ‖e_n‖ · ‖e*_n‖ < ∞; in that case we say that (e_n, e*_n)_{n∈N} is C-bounded and call C the bound of (e_n, e*_n)_{n∈N}.

A biorthogonal, fundamental and total sequence (e_n, e*_n)_{n∈N} is called a Markushevich basis, or simply an M-basis.

Remark. Assume (e_n, e*_n) is an M-basis. It follows from totality that span(e*_n : n ∈ N) is w*-dense in X*. Thus in every reflexive space M-bases are shrinking, and shrinking M-bases are 1-norming.

Our goal is to prove the following theorem.

Theorem 5.2.2. [OP] Every separable Banach space X admits a bounded, norming M-basis, which can be chosen to be shrinking if X* is (norm) separable. Moreover, the bound of that M-basis can be chosen arbitrarily close to 4(1 + √2)².

Remark. Pełczyński [Pe] later improved the above result and showed that for every separable Banach space and every ε > 0 there exists a bounded M-basis whose bound does not exceed 1 + ε. It is an open question whether or not every separable Banach space has a 1-bounded M-basis. But it is not hard to show that a space X with a bounded and norming M-basis can be renormed so that this basis becomes 1-bounded and 1-norming.

Remark. It might be nice to know that every separable Banach space has a bounded and norming Markushevich basis (e_i, e*_i). Nevertheless, given z ∈ X, we do not have any procedure (let alone a good one) to approximate z by finite linear combinations of the e_i; we only know that such an approximation exists. This is precisely the difference to Schauder bases, for which we know that the canonical projections converge pointwise.

Lemma 5.2.3. [LT, Lemma 1.a.6] Assume that X is an infinite dimensional Banach space and that F ⊂ X and G* ⊂ X* are finite dimensional subspaces of X and X*, respectively. Let ε > 0.
Then there is an x ∈ X, kxk = 1 and an x∗ ∈ X ∗ so that kx∗ k ≤ 2 + ε, x∗ (x) = 1, z ∗ (x) = 0, for all z ∗ ∈ G∗ , and x∗ (z) = 0 for all z ∈ F. Proof. Let (yi∗ )m i=1 ⊂ SX ∗ be finite and 1/(1 + ε) norming the space F . Pick x ∈ ⊥ {yj∗ : j = 1, 2, . . . m} ∪ G∗ = z ∈ X : yj∗ (z) = 0, j = 1, 2 . . . m, and z ∗ (z) = 0, z ∗ ∈ G∗ , with kxk = 1. It follows for all λ ∈ R and all y ∈ F that kyk ||y + λxk ≥ max yj∗ (y + λx) = max yj∗ (y) ≥ . 1+ε Then define u∗ : span(F ∪ {x}) → R, y + λx 7→ λ. We claim that ku∗ k ≤ 2 + ε. Indeed, let y ∈ F , y 6= 0, and λ ∈ R. Then ! |λ| ∗ λx + y u = kλx + yk kλx + yk 5.2. MARKUSHEVICH BASES ≤ 81 kyk 2 kλx+yk ≤ 2(1 + ε) 2kyk 2kyk−kyk =2 if |λ| ≤ 2kyk, if |λ| > 2kyk. Letting now x∗ be a Hahn Banach extension of u∗ onto all of X, our claim is proved. Lemma 5.2.4. ([Ma], see also [HMVZ, Lemma 1.21]) Let X be an infinite dimensional Banach space. Suppose that (zn ) ⊂ X and (zn∗ ) ⊂ X ∗ are sequences so that span(zn : n ∈ N) and (zn∗ : n ∈ N) are both infinite dimensional and so that (M1) (zn ) separates points of span(zn∗ : n ∈ N), (M2) (zn∗ ) separates points of span(zn : n ∈ N). Let N ⊂ N be co-infinite, and ε > 0. Then we can choose a biorthogonal system (xn , x∗n ) ⊂ span(zn : n ∈ N) × span(zn∗ : n ∈ N) with (5.7) (5.8) span(zn : n ∈ N) ⊂ span(xn : n ∈ N) and span(zn∗ : n ∈ N) ⊂ span(x∗n : n ∈ N) sup kx∗n k · kx∗n k < 2 + ε. n∈N Remark. Note that (xn , x∗n ) is an M -basis if span(zn : n ∈ N) is dense in X, ∗ ∩ span(z ∗ ) is norming X. and if (zn∗ ) separates points of X. It is norming if BX n Proof. Choose s1 = min{n ∈ N : zn 6= 0}. x1 = zs1 /kzs1 k. If 1 ∈ N choose x∗1 ∈ SX ∗ with x∗1 (x1 ) = 1. Otherwise choose x∗1 = zt∗1 with t1 = min{m ∈ N : ∗ (x ) 6= 0} (which exists by (M1)). zm 1 Write N \ N = {k1 , k2 , . . .} and proceed by induction to choose x1 , x2 , . . . xn and x∗1 , . . . x∗n as follows. Assume that x1 , x2 , . . . xn and x∗1 , . . . x∗n have been been chosen. Case 1: n + 1 ∈ N . 
Then let F = span(xi : i ≤ n) and G∗ = span(x∗i : i ≤ n); and choose xn+1 and x∗n+1 by Lemma 5.2.3. Case2: n + 1 = k2j−1 ∈ N \ N . Then let s2j−1 = min s : zs 6∈ span(xi : i ≤ n) and define xn+1 = zs2j−1 − n X i=1 x∗i (zs2j−1 )xi . 82 CHAPTER 5. APPENDIX A: BASES IN BANACH SPACES This implies that x∗i (xn+1 ) = 0 for i = 1, 2, . . . n. Next choose (using (M2)) t2j−1 = min{t : zt∗ (xn+1 ) 6= 0} and let x∗n+1 = zt∗2j−1 − Pn ∗ ∗ i=1 zt2j−1 (xi )xi zt∗2j−1 (xn+1 ) , which yields that x∗n+1 (xi ) = 0, for i = 1, 2 . . . n, and x∗n+1 (xn+1 ) = 0. Case 3: n + 1 = k2j ∈ N \ N . Then we choose t2j = min s : zs∗ 6∈ span(x∗i : i ≤ n) . Let x∗n+1 = zs∗2j − n X zs2j−1 (xi )x∗i , i=1 and hence x∗n+1 (xi ) = 0, for i = 1, 2 . . . n, and then let (using (M1) s2j = min{s : x∗n+1 (xs ) 6= 0}, and xn+1 = zs2j − Pn ∗ i=1 xi (zs2j ))xi x∗n+1 (zs2j ) , which implies that x∗i (xn+1 ) = 0, for i = 1, 2 . . . n and x∗n+1 (xn+1 ) = 1. We insured by this choice that (xi , x∗i ) : i ∈ N is a biorthogonal sequence in X × X ∗ which also satisfies (5.8) and, since for any m ∈ N we have span(zi : i ≤ m) ⊂ span(xk2j−1 : j ≤ m) and span(zi∗ : i ≤ m) ⊂ span(x∗k2j : j ≤ m), (xi , x∗i ) : i ∈ N it also satisfies (5.7). n For n ∈ N we consider on `22 the discrete Haar basis {h0 } ∪ {h(r,s) , r = 0, 1, . . . , n − 1, and s = 0, 1, . . . 2r−1 − 1}, with h0 = 2−n/2 χ{1,2,3...,2n } χ{2s2n−r−1 +1,2s2n−r−1+2,...(2s+1)2n−r−1 } −χ{(2s+1)2n−r−1 +1,(2s+1)2n−r−1+2,...(2s+2)2n−r−1 } h(r,s) = 2(n−r)/2 if r = 0, 1, 2, . . . n − 1 and s = 0, 1, 2 . . . 2r − 1. n The unit vector basis (ei )2i=1 as well as the Haar basis {h0 } ∪ {h(r,s) , r = 0, 1, . . . , n − 1, s = 0, 1, . . . 2r−1 − 1} 5.2. MARKUSHEVICH BASES 83 n are orthonormal bases in `22 . Thus the matrix A = A(n) with the property that A(e1 ) = h0 and A(e2r +s+1 ) = h(r,s) is a unitary matrix. If we write (n) A(n) = (a(i,j) : 0 ≤ i, j ≤ 2n − 1) (n) then it follows for k = 0, 1, 2 . . . 2n − 1 that a(k,0) = h0 (k) = 2−n/2 , and if r = 0, 1, 2 . . . 
n − 1 and s = 0, 1, …, 2^r − 1 that

a^{(n)}_{(k,2^r+s)} = h_{(r,s)}(k) =
  2^{−(n−r)/2},  if k ∈ {2s·2^{n−r−1} + 1, 2s·2^{n−r−1} + 2, …, (2s+1)·2^{n−r−1}},
  −2^{−(n−r)/2}, if k ∈ {(2s+1)·2^{n−r−1} + 1, (2s+1)·2^{n−r−1} + 2, …, (2s+2)·2^{n−r−1}},
  0,             if k ≤ 2s·2^{n−r−1} or k > (2s+2)·2^{n−r−1}.

Thus A consists of the constant column (2^{−n/2}, …, 2^{−n/2})^t, followed by the columns h_{(r,s)}: in every row, for each r ∈ {0, 1, …, n−1}, exactly one entry of absolute value 2^{−(n−r)/2} appears, and all other entries of that row coming from level r vanish.

It follows therefore that for all k = 0, 1, …, 2^n − 1 we have

(5.9) Σ_{j=1}^{2^n−1} |a^{(n)}_{(k,j)}| = Σ_{r=0}^{n−1} 2^{−(n−r)/2} = Σ_{i=1}^{n} (1/√2)^i ≤ 1/(√2 − 1) = 1 + √2,

because, leaving out the first column, in each row and for each r ∈ {0, 1, 2, …, n−1} the value 2^{−(n−r)/2} is attained in absolute value exactly once. This implies the following:

Corollary 5.2.5. If ((x_i, x*_i) : i = 0, 1, …, 2^n − 1) is a biorthogonal sequence of length 2^n in a Banach space X and we let, for k = 0, 1, …, 2^n − 1,

(5.10) e_k = Σ_{j=0}^{2^n−1} a^{(n)}_{(k,j)} x_j and

(5.11) e*_k = Σ_{j=0}^{2^n−1} a^{(n)}_{(k,j)} x*_j,

then

(5.12) max_{0≤k<2^n} ‖e_k‖ < (1 + √2) max_{1≤k<2^n} ‖x_k‖ + 2^{−n/2} ‖x_0‖,

(5.13) max_{0≤k<2^n} ‖e*_k‖ < (1 + √2) max_{1≤k<2^n} ‖x*_k‖ + 2^{−n/2} ‖x*_0‖,

(5.14) ((e_j, e*_j) : j = 0, 1, …, 2^n − 1) is biorthogonal,

(5.15) span(e_j : 0 ≤ j < 2^n) = span(x_j : 0 ≤ j < 2^n), and

(5.16) span(e*_j : 0 ≤ j < 2^n) = span(x*_j : 0 ≤ j < 2^n).

Proof of Theorem 5.2.2. Let δ > 0 and put M = 2 + δ. We start with a fundamental sequence (z_i) ⊂ X and a w*-dense sequence (z*_i) ⊂ B_{X*} ((B_{X*}, w*) is separable if X is norm separable), which we choose norm dense if X* is norm separable.
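The matrix A = A(n) and the estimate (5.9) lend themselves to a direct numerical check. The sketch below (an illustration only, not part of the notes) builds the columns h₀ and h_(r,s) of A(n), so that one can verify that they form an orthonormal basis of R^(2^n) and that each row's absolute sum over the Haar columns equals Σ_{r<n} 2^(−(n−r)/2) ≤ 1 + √2.

```python
def haar_matrix(n):
    """Columns of A(n): the discrete Haar orthonormal basis of R^(2^n).
    Column 0 is the constant vector h_0; column 2^r + s is h_(r,s)."""
    N = 2 ** n
    cols = [[2.0 ** (-n / 2)] * N]            # h_0
    for r in range(n):
        half = 2 ** (n - r - 1)               # half the support length of h_(r,s)
        val = 2.0 ** (-(n - r) / 2)
        for s in range(2 ** r):
            h = [0.0] * N
            for k in range(2 * s * half, (2 * s + 1) * half):
                h[k] = val                    # positive half of the support
            for k in range((2 * s + 1) * half, (2 * s + 2) * half):
                h[k] = -val                   # negative half of the support
            cols.append(h)
    return cols

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

For n = 3 every row's absolute sum over the Haar columns is ≈ 1.56, comfortably below 1 + √2 ≈ 2.41, as (5.9) predicts.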
Then we use Lemma 5.2.4 to choose a norming (reps. shrinking) M basis (xn , x∗n ) : n ∈ N of X which satisfies for N being the odd numbers the conditions (5.7) and (5.8). Without loss of generality we assume that kxn k = 1, ∗ for n ∈ N. Now we will define a reordering (x̃n , x̃n ) : n ∈ N of (xn , x∗n ) : n ∈ N as follows: P By induction we choose for ` ∈ N a number m` ∈ N and define q` = `j=1 2mj and q0 = 0, and choose x̃q`−1 , x̃∗q`−1 , x̃q`−1 +1 , x̃∗q`−1 +1 , . . . , x̃q` −1 , x̃∗q` −1 as follows Assume that for all 0 ≤ r < `, ` ≥ 1, mr , and (x̃0 , x̃∗0 ), (x̃1 , x̃∗1 ), . . . (x̃qr −1 , x̃∗qr −1 ) have been chosen. Put s` = min s : (x2s , x∗2s ) 6∈ {(x̃t , x̃∗t ) : t ≤ q`−1 − 1} (recall that ((x2s , x∗2s ) : s ∈ N) are the elements of ((xs , x∗s ) : s ∈ N for which we do not control the norm) and choose m` ∈ N, so that √ √ √ (1 + 2) + 2−m` /2 · (1 + 2)M + kx∗2s` k2−m` /2 < (1 + 2)2 M + δ. Then let (x̃q`−1 , x̃∗q`−1 ) = (x2s` , x∗2s` ) while (x̃q`−1 +1 , x̃∗q`−1 +1 ) . . . (x̃q` −1 , x̃∗q` −1 ) con sist of the elements of (x2t−1 , x∗2t−1 ) : t ∈ N ) which are not in the set {(x̃t , x̃∗t ) : t ≤ q`−1 − 1} and have the lowest 2m` − 1 indices. 5.2. MARKUSHEVICH BASES 85 By that choice we made sure that allelements of (xt , x∗t ) : t ∈ N appear exactly once in the sequence x̃t , x̃∗t : t ∈ N . Then we apply Corollary 5.2.5 and define for k = 0, 1, 2, . . . 2m` −1 eq`−1 +k = ` −1 2m X j=0 (m` ) a(k,j) x̃q`−1 +j and e∗q`−1 +k = ` −1 2m X (m ) ` a(k,j) x̃∗q`−1 +j . j=0 It follows then from (5.12) and (5.13) that for k = 0, 1, 2, . . . 2m` −1 √ keq`−1 +k k · ke∗q`−1 +k k ≤ (1 + 2)2 M + δ. √ 2 √ 2 Choosing δ > 0 small enough we can ensure that (1+ 2) M +δ < 2(1+ 2) +ε. Since (xn , x∗n ) : n ∈ N is a norming M -basis, it follows from (5.14), (5.15) and (5.16) that (en , e∗n ) : n ∈ N is a norming M basis which is shrinking if (xn , x∗n ) : n ∈ N is shrinking. 86 CHAPTER 5. 
Chapter 6

Appendix B: Some facts about L_p[0,1] and L_p(R)

6.1 The Haar basis and Wavelets

We recall the definition of the Haar basis of L_p[0,1]. Let

T = {(n, j) : n ∈ N₀, j = 0, 1, …, 2^n − 1} ∪ {0}.

Let 1 ≤ p < ∞ be fixed. We define the Haar basis (h_t)_{t∈T} and the normalized Haar basis (h_t^{(p)})_{t∈T} in L_p[0,1] as follows. h_0 = h_0^{(p)} ≡ 1 on [0,1], and for n ∈ N₀ and j = 0, 1, 2, …, 2^n − 1 we put

h_{(n,j)} = 1_{[j2^{−n}, (j+½)2^{−n})} − 1_{[(j+½)2^{−n}, (j+1)2^{−n})},

and we let

Δ_{(n,j)} = supp(h_{(n,j)}) = [j2^{−n}, (j+1)2^{−n}),
Δ⁺_{(n,j)} = [j2^{−n}, (j+½)2^{−n}),
Δ⁻_{(n,j)} = [(j+½)2^{−n}, (j+1)2^{−n}).

We let h^{(∞)}_{(n,j)} = h_{(n,j)}, and for 1 ≤ p < ∞

h^{(p)}_{(n,j)} = h_{(n,j)} / ‖h_{(n,j)}‖_p = 2^{n/p} (1_{[j2^{−n}, (j+½)2^{−n})} − 1_{[(j+½)2^{−n}, (j+1)2^{−n})}).

Theorem 6.1.1. [Schl, Theorems 3.2.2, 5.5.1] We order (h_t^{(p)} : t ∈ T) into a sequence (h_n^{(p)} : n ∈ N), with h_1^{(p)} = h_0^{(p)}, and with the property that if 2 ≤ m < n, then either supp(h_n) ⊂ supp(h_m) or supp(h_m) ∩ supp(h_n) = ∅. Then (h_n) is a monotone basis of L_p[0,1].

For 1 < p < ∞, (h_t^{(p)} : t ∈ T) is an unconditional basis of L_p[0,1], but it is not unconditional for p = 1. In fact, L_1[0,1] does not embed into a Banach space with an unconditional basis.

Define h = 1_{[0,1/2]} − 1_{(1/2,1]}. Then we can write, for n ∈ N₀ and j = 0, 1, …, 2^n − 1,

h_{(n,j)}(t) = h(2^n t − j), for t ∈ [0,1], and h^{(p)}_{(n,j)}(t) = 2^{n/p} h(2^n t − j), for t ∈ [0,1].

We define now for all n ∈ Z and all j ∈ Z a function h_{(n,j)} as follows:

(6.1) h_{(n,j)}(t) = h(2^n t − j), for t ∈ R, and

(6.2) h^{(p)}_{(n,j)}(t) = 2^{n/p} h(2^n t − j), for t ∈ R.

For all n, j ∈ Z we have

supp(h_{(n,j)}) := {t : h_{(n,j)}(t) ≠ 0} = {t : 0 ≤ 2^n t − j ≤ 1} = [j2^{−n}, (j+1)2^{−n}) =: Δ_{(n,j)},

and

{h_{(n,j)} > 0} = [j2^{−n}, (j+½)2^{−n}) =: Δ⁺_{(n,j)},
{h_{(n,j)} < 0} = [(j+½)2^{−n}, (j+1)2^{−n}) =: Δ⁻_{(n,j)}.

We note for (m,i) and (n,j) in Z × Z that either Δ_{(m,i)} ⊂ Δ_{(n,j)}, or Δ_{(n,j)} ⊂ Δ_{(m,i)}, or Δ_{(m,i)} ∩ Δ_{(n,j)} = ∅.
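Since every h^{(p)}_{(n,j)} is constant on dyadic intervals, its L_p-norm can be computed exactly on a fine dyadic grid, and the support trichotomy above reduces to integer-interval comparisons. A small numerical sketch (an illustration only, with arbitrarily chosen parameters):

```python
def haar_values(n, j, p, m):
    """Values of h^(p)_(n,j) on the grid of 2^m dyadic cells of [0, 1), for m > n."""
    cells = 2 ** m
    vals = [0.0] * cells
    lo = j * 2 ** (m - n)                     # left endpoint of Delta_(n,j), in cells
    mid = lo + 2 ** (m - n - 1)
    hi = lo + 2 ** (m - n)
    for k in range(lo, mid):
        vals[k] = 2.0 ** (n / p)              # positive half Delta+_(n,j)
    for k in range(mid, hi):
        vals[k] = -2.0 ** (n / p)             # negative half Delta-_(n,j)
    return vals

def lp_norm_on_grid(vals, p):
    """Exact L_p[0,1] norm of a function constant on the grid cells."""
    cells = len(vals)
    return (sum(abs(v) ** p for v in vals) / cells) ** (1.0 / p)
```

On the grid of 2^m cells the Riemann sum is exact because the integrand is constant on each cell; the computed norms come out as 1 for every admissible n, j and p, as the normalization factor 2^{n/p} predicts.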
(6.3) Theorem 6.1.2. Let 1 ≤ p < ∞. (p) 1. {h(n,j) : n ∈ N0 , j ∈ Z} ∪ {1[j,j+1) : j ∈ Z}, when appropriately ordered, is a monotone basis for Lp (R), which is unconditional if 1 < p < ∞. (p) 2. {h(n,j) : n ∈ Z, j ∈ Z} is an unconditional basis for Lp (R) if 1 < p < ∞. Remark. Note that the second part of Theorem 6.1.2 is wrong for p = 1. Indeed, the integral functional Z ∞ I : L1 (R) → R, f 7→ f (x) dx, −∞ is a bounded linear functional on L1 (R) which is not identical to the zero(1) functional, but for all n, j ∈ Z h(n,j) is in the kernel of I, and thus the span (1) of the h(n,j) cannot be dense in L1 (R). The same argumentation is invalid for 1 < p < ∞ since I is unbounded on Lp (R), if p > 1. 6.1. THE HAAR BASIS AND WAVELETS 89 Proof. First note that Lp (R) is isometrically isomorphic to (⊕i∈Z Lp [i, i + 1]) via the map Lp (R) → (⊕i∈Z Lp [i, i + 1]), f 7→ (f |[i,i+1] : i ∈ Z), and that for i ∈ Z, by Theorem 6.1.1, the shifted Haar basis (p,i) h(n,j) :n ∈ N0 , j = 0, 1, . . . , 2n −1 ) ∪ {1[i,i+1) } (p) = h(n,j) : n ∈ N0 , j = 2n i, 2n i+1, 2n i+2 . . . 2n (i+1) − 1 ∪ {1[i,i+1) } is a monotone basis for Lp [i, i+1], if ordered appropriately, which is unconditional if 1 < p < ∞. Since Lp (R) is the 1-unconditional sum of the spaces Lp [i, i + 1), (p,i) i ∈ Z the union of h(n,j) : n ∈ N0 , j = 0, 1, . . . , 2n −1 ) ∪ {1[i,i+1) over all i ∈ Z is a monotone basis of Lp [0, 1], if ordered appropriately, which is unconditional if p > 1 In order to show (2) we assume B ⊂ Z × Z is finite and A ⊂ B and then verify condition (5.3). Since B is finite, there is n1 , ∈ N so that ∆(n,j) ⊂ ∆(−n1 ,0) = [0, 2n1 ) for all (n, j) ∈ B + = {(m, i) ∈ Z × N0 } ∩ B, and there is n2 , ∈ N so that ∆(n,j) ⊂ ∆(−n2 ,−1) = [−2n2 , 0) for all (n, j) ∈ B − = {(m, i) ∈ Z × (−N)} ∩ B, Since the Lp -nrm is shift invariant, it is enough to assume that B − = ∅ and B = B+. 
Consider the rescaling map

φ : Δ_{(0,0)} = [0, 1] → Δ_{(−n₁,0)}, t ↦ 2^{n₁} t,

and the map

T : L_p(Δ_{(−n₁,0)}) → L_p[0, 1], f ↦ 2^{n₁/p} f ∘ φ,

which is an isometry between L_p(Δ_{(−n₁,0)}) and L_p[0, 1], mapping the family

{h^{(p)}_{(n,j)} : Δ_{(n,j)} ⊂ Δ_{(−n₁,0)}} ∪ {1_{Δ_{(−n₁,0)}}}

onto the Haar basis of L_p[0, 1],

{h^{(p)}_{(n,j)} : Δ_{(n,j)} ⊂ Δ_{(0,0)}} ∪ {1_{[0,1]}}.

This proves, together with Theorem 6.1.1, that {h^{(p)}_{(n,j)} : Δ_{(n,j)} ⊂ Δ_{(−n₁,0)}} ∪ {1_{Δ_{(−n₁,0)}}} is unconditional, and therefore (5.3) is satisfied.

It is left to show that the closed linear span of {h^{(p)}_{(n,j)} : n ∈ Z, j ∈ Z} is all of L_p(R). By part (1) and using shifts, it is enough to show that 1_{[0,1]} is in the closed linear span of {h^{(p)}_{(n,j)} : n ∈ Z, j ∈ Z}. Notice that for N ∈ N we have

Σ_{n=0}^N 2^{−(n+1)} (1_{[0,2^n)} − 1_{[2^n,2^{n+1})})
= 1_{[0,1)} Σ_{n=0}^N 2^{−(n+1)}
+ 1_{[1,2)} (Σ_{n=1}^N 2^{−(n+1)} − 2^{−1})
+ 1_{[2,4)} (Σ_{n=2}^N 2^{−(n+1)} − 2^{−2})
⋮
+ 1_{[2^{N−1},2^N)} (2^{−(N+1)} − 2^{−N})
− 1_{[2^N,2^{N+1})} 2^{−(N+1)}
= 1_{[0,1)} (1 − 2^{−(N+1)}) − 1_{[1,2^N)} 2^{−(N+1)} − 1_{[2^N,2^{N+1})} 2^{−(N+1)}.

For the last equality note that for k = 1, 2, …, N

Σ_{n=k}^N 2^{−n−1} − 2^{−k} = (2^{−k} − 2^{−(N+1)}) − 2^{−k} = −2^{−(N+1)}.

Since we assumed that p > 1, it follows that

L_p-lim_{N→∞} Σ_{n=0}^N 2^{−(n+1)} (1_{[0,2^n)} − 1_{[2^n,2^{n+1})}) = 1_{[0,1)},

which finishes the proof of our claim.

Definition 6.1.3. A function Ψ ∈ L₂(R) is called a wavelet if the family (Ψ_{(n,j)} : n, j ∈ Z), defined by

Ψ_{(n,j)}(t) = 2^{n/2} Ψ(2^n t − j), for t ∈ R and n, j ∈ Z,

is an orthonormal basis of L₂(R).

Definition 6.1.4. A Multi Resolution Analysis (MRA) of L₂(R) is a sequence of closed subspaces (V_n : n ∈ Z) of L₂(R) such that

(MRA1) … ⊂ V_{−2} ⊂ V_{−1} ⊂ V₀ ⊂ V₁ ⊂ V₂ ⊂ …,

(MRA2) ∪_{n∈Z} V_n is dense in L₂(R),

(MRA3) ∩_{n∈Z} V_n = {0},

(MRA4) V_n = {f(2^n (·)) : f ∈ V₀}, for n ∈ Z,

(MRA5) there is a compactly supported function Φ ∈ V₀ so that (Φ((·) − m) : m ∈ Z) is an orthonormal basis of V₀.

In this case we call Φ a scaling function of the MRA (V_n : n ∈ Z).
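For the Haar MRA of Example 6.1.5 below, with scaling function Φ = 1_{[0,1)}, the orthonormality in (MRA5) and the scaling coefficients p_k = 2 ∫ Φ(x) Φ(2x − k) dx (which reappear in Theorem 6.1.8 below) reduce to lengths of interval intersections. A quick sketch (an illustration only):

```python
def overlap(a, b, c, d):
    """Length of the intersection of the intervals [a, b) and [c, d)."""
    return max(0.0, min(b, d) - max(a, c))

# Haar scaling function Phi = 1_[0,1): inner products are interval overlaps.
def shift_inner(j, k):
    """<Phi((.) - j), Phi((.) - k)> in L_2(R)."""
    return overlap(j, j + 1, k, k + 1)

def scaling_coeff(k):
    """p_k = 2 * int Phi(x) Phi(2x - k) dx; Phi(2x - k) = 1 on [k/2, (k+1)/2)."""
    return 2.0 * overlap(0.0, 1.0, k / 2.0, (k + 1) / 2.0)
```

One finds p₀ = p₁ = 1 and p_k = 0 otherwise, so Σ_{k even} p_k = Σ_{k odd} p_k = 1, and the associated wavelet Σ_k (−1)^k p_{1−k} Φ(2(·) − k) = Φ(2(·)) − Φ(2(·) − 1) is exactly the Haar wavelet of Example 6.1.7.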
Note that (MRA5) implies

(MRA6) V₀ (and thus every V_n) is invariant under integer shifts, i.e. f ∈ V₀ ⟺ f((·) − j) ∈ V₀, for all j ∈ Z.

For h ∈ R we put

T_h : L₂(R) → L₂(R), f ↦ f((·) − h) (shift to the right by h units),
J_h : L₂(R) → L₂(R), f ↦ 2^{h/2} f(2^h (·)) (scaling).

Remark. For h ∈ R the operators T_h and J_h are isometries on L₂(R) and

(6.4) T_h^{−1} = T_{−h} and J_h^{−1} = J_{−h}.

We can rephrase (MRA4) and (MRA6) equivalently as follows:

(MRA4') V_n = J_n(V₀), for n ∈ Z, and

(MRA6') V₀ = T_n(V₀), for all n ∈ Z.

Finally note that (MRA4), (MRA5) and (MRA6) imply that for j ∈ Z

(MRA5') {2^{j/2} Φ(2^j (·) − k) : k ∈ Z} is an orthonormal basis of V_j.

Example 6.1.5. Take Φ = 1_{[0,1)}, and for n ∈ Z put

V_n = span(J_n ∘ T_j(Φ) : j ∈ Z) = span(1_{[j2^{−n}, (j+1)2^{−n})} : j ∈ Z).

(Note that 1_{[j2^{−n}, (j+1)2^{−n})}(2^{−n}(·)) = 1_{[j,j+1)}(·) ∈ V₀.) Then (V_n) is an MRA.

We now discuss how to produce a wavelet Ψ starting from an MRA (V_n : n ∈ Z) with scaling function Φ. We denote the orthogonal complement of V_j inside V_{j+1} by W_j; this means that any f ∈ V_{j+1} can be written as f = g + h with g ∈ V_j and h ∈ W_j and
Thus, choosing k ∈ N we can arbitrarily approximate g by an element h in the orthogonal complement of V−k inside Vn . But since Vn = Wn−1 ⊕ Vn−1 = Wn−1 ⊕ Wn−2 ⊕ Vn−2 = . . . (Wn−1 ⊕ Wn−2 ⊕ W−k ) ⊕ V−k , it follows that h ∈ Wn−1 ⊕ Wn−2 ⊕ . . . W−k . As a consequence we deduce that L2 (R) is the orthonormal sum of the Wj , j ∈ Z. Together with our observations (6.5) and (6.6) this yields the following result. Proposition 6.1.6. Every Ψ ∈ W0 for which {Ψ((·) − j) : j ∈ Z} is an orthonormal basis of W0 is a wavelet. We say in that case that Ψ is the wavelet associated to the MRA (Vn ). Example 6.1.7. We consider the Example 6.1.5. Then Z j+1 n o W0 = {f ∈ V1 : ∀g ∈ V0 hg, f i = 0} = f ∈ V1 : f (t) dt = 0 . j Thus we could take Ψ = 1[0,1/2) − 1[1/2,1) , as a wavelet associated to (Vn ). 6.1. THE HAAR BASIS AND WAVELETS 93 We would like to explain how to construct the wavelet Ψ associated to an MRA (Vn : n ∈ Z) with scaling function Φ. Theorem 6.1.8. Suppose that (Vn : n ∈ Z) is an MRA with scaling function Φ which is integrable and Z ∞ Φ(t) dt 6= 0. −∞ 1. The the following Scaling Relation holds: (6.7) Φ= X Z ∞ pk Φ(2(·) − k) with pk = 2 Φ(x)Φ(2x − k)dx. −∞ k∈Z More generally (6.8) Φ(2j−1 (·) − l) = X pk−2l Φ(2j (·) − k) for all j, l ∈ Z. k∈Z 2. The sequence (pk : k ∈ Z) satisfies X (6.9) pk−2l pk = 2δ0,l for all l ∈ Z k∈Z X (6.10) pk = k∈2Z X pk = 1. k∈2Z+1 3. The function Ψ defined by (6.11) Ψ= X (−1)k p1−k Φ(2(·) − k) k∈Z is a wavelet associated to (Vn : n ∈ Z). Proof. Since Φ ∈ V0 ⊂ V1 and since (21/2 Φ(2(·) − k) : k ∈ Z) is an orthonormal basis of V1 , we can write Φ as in (6.7). For j, l ∈ Z it follows therefore that X Φ(2j−1 (·) − l) = pk Φ 2(2j−1 (·) − l) − k k∈Z = X k∈Z X pk Φ 2j (·) − 2l − k = pk−2l Φ(2j (·) − k). k∈Z Since (Φ((·) − l) : l ∈ Z) is an orthonormal sequence we obtain from (6.7) and (6.8) (with j =) that δ(0,l) = hΦ((·) − l), Φi X = pm−2l pk hΦ(2(·) − m), Φ(2(·) − k)i m,k∈Z 94CHAPTER 6. 
APPENDIX B: SOME FACTS ABOUT LP [0, 1] AND LP (R) = X Z pk−2l pk k∈Z ∞ |Φ(2t)|2 dt = −∞ 1X pk−2l pk , 2 k∈Z which proves (6.9) and implies (replacing l by −l) that X 2=2 δ(0,−l) l∈Z = X pk+2l pk l,k∈Z = XX = X = p2k+2l p2k + l∈Z k∈Z p2k+1+2l p2k+1 l∈Z k∈Z p2k X k∈Z l∈Z X p2k X k∈Z XX p2k+2l + X p2k+1 X k∈Z p2l + X l∈Z p2k+1 X k∈Z l∈Z p2k+1+2l p2l+1 l∈Z X 2 X 2 = p2k + p2k+1 =: A2 + B 2 . k∈Z k∈Z Moreover if we integrate the scaling relation (6.7) we obtain Z ∞ Z ∞ 1X (6.12) Φ(x)dx = pk Φ(x)dx. 2 −∞ −∞ k∈Z By our assumption on the integral of Φ, we can cancel the integral on both sides of (6.12) and obtain that X A+B = pk = 2. k∈Z The only solution of A2 + B 2 = 2 and A + B = 2 is A = B = 1 which proves (6.10) (draw a picture!). In order to prove the (3) we first note that Ψ ∈ V1 . Secondly, we note that the sequence (Ψ((·) − k) : k ∈ Z) is orthonormal. Indeed, it follows from (6.8) and (6.9) for l, m ∈ Z that hΨ((·) − l),Ψ((·) − m)i DX E X = (−1)k p1−k Φ(2((·) − l) − k), (−1)k p1−k Φ(2((·) − m) − k) k=1 k=1 DX E X = (−1)k p1−k+2l Φ(2(·) − k), (−1)k p1−k+2m Φ(2(·) − k) k=1 1X = p1−k+2l p1−k+2m 2 k∈Z k=1 6.1. THE HAAR BASIS AND WAVELETS = 95 1X p1−k+2l−2m p1−k = δ(l,m) . 2 k∈Z Thirdly, it follows from 6.8 for any l, m ∈ Z that hΦ((·) − l),Ψ((·) − m))i DX E X = pk−2l Φ((·) − k), (−1)k p1−k Φ(2((·) − m) − k) k∈Z = DX k=1 E X k pk−2l Φ((·) − k), (−1) p1−k+2m Φ((·) − k) k∈Z = = = = k=1 1X (−1)k pk−2l p1−k+2m 2 k∈Z 1X (−1)k pk p1−k+r with r = 2m − 2l 2 k∈Z 1X 1X p2k p1−2k+r − p2k+1 p−2k+r 2 2 k∈Z k∈Z 1X 1X p2k p1−2k+r − p1−2l+r p2l = 0. 2 2 k∈Z l∈Z (substitute 2l = r − 2k) Finally, in order to show that (Ψ((·) − k) : k ∈ Z) and (Φ((·) − k) : k ∈ Z) span all of V1 we need to show that for given j ∈ Z the projection of Φ(2(·) − j) onto the space spanned by (Ψ((·) − k) : k ∈ Z) and (Φ((·) − k) : k ∈ Z) has the same norm (namely 1/2) as Φ(2(·) − j). Let us denote the projected vector by Φ̃j . 
By the orthonormality of (Ψ((·) − k) : k ∈ Z) and (Φ((·) − k) : k ∈ Z) shown above, we can write

Φ̃_j = Σ_{k∈Z} (a_k Φ((·) − k) + b_k Ψ((·) − k)),

with

a_k = ⟨Φ(2(·) − j), Φ((·) − k)⟩ = ⟨Φ(2(·) + 2k − j), Φ⟩ = ½ p_{j−2k},

and

b_k = ⟨Φ(2(·) − j), Ψ((·) − k)⟩ = Σ_{l∈Z} (−1)^l ⟨Φ(2(·) − j), p_{1−l} Φ(2(·) − l − 2k)⟩ = ½ (−1)^j p_{1−j+2k},

for k ∈ Z. It follows therefore that

‖Φ̃_j‖² = ¼ Σ_{k∈Z} |p_{j−2k}|² + ¼ Σ_{k∈Z} |p_{1−j+2k}|² = ¼ Σ_{k∈Z} |p_k|² = ½.

The proof of the following result goes beyond the scope of these notes, since it requires several tools from harmonic analysis.

Theorem 6.1.9. [Wo1, Theorem 8.13] Assume that Ψ is a wavelet for L₂(R) satisfying the following two conditions for some constant C > 0:

(6.13) |Ψ(x)| ≤ C(1 + |x|)^{−2}, for all x ∈ R,

(6.14) Ψ is differentiable and |Ψ′(x)| ≤ C(1 + |x|)^{−2}, for all x ∈ R.

Then for every 1 < p < ∞ the family (Ψ^{(p)}_{j,k}) = (2^{j/p} Ψ(2^j (·) − k) : j, k ∈ Z) is a basis of L_p(R) which is isomorphically equivalent to (h^{(p)}_{j,k} : j, k ∈ Z).

We finally want to present, without proof, another basis of L_p[0,1]. Recall that (e^{2πin(·)} : n ∈ Z) is an orthonormal basis of L₂[0,1]. A deep theorem of M. Riesz states the following.

Theorem 6.1.10. (cf. [Ka, Chapters II and III]) The sequence of trigonometric polynomials (t_n : n ∈ Z), with t_n(ξ) = e^{2πinξ}, for ξ ∈ [0,1], is a Schauder basis of L_p[0,1], 1 < p < ∞, when ordered as (t₀, t₁, t₋₁, t₂, t₋₂, …).

In the next section we prove that for p ≠ 2, (t_n) cannot be unconditional.

6.2 Khintchine's inequality and Applications

Theorem 6.2.1. [Khintchine's Theorem, see Theorem 5.3.1 in [Schl]] L_p[0,1], 1 ≤ p ≤ ∞, contains subspaces isomorphic to ℓ₂. If 1 < p < ∞, L_p[0,1] contains a complemented subspace isomorphic to ℓ₂.

Definition 6.2.2. The Rademacher functions are the functions

r_n : [0,1] → R, t ↦ sign(sin(2^n πt)), for n ∈ N.

Lemma 6.2.3.
[Khintchine inequality], see Lemma 5.3.3 in [Schl] For every p ∈ [1, ∞) there are numbers 0 < Ap ≤ 1 ≤ Bp so that for any m ∈ N and any scalars (aj )m j=1 , (6.15) Ap m X j=1 |aj |2 1/2 m X ≤ aj rj j=1 Lp ≤ Bp m X |aj |2 1/2 . j=1 There is a complex version of the Rademacher functions namely the sequence (gn : n ∈ N), with n gn (t) = ei2 πt for t ∈ [0, 1]. 6.2. KHINTCHINE’S INEQUALITY AND APPLICATIONS 97 Theorem 6.2.4. [Complex Version of Khintchine’s Theorem] For every p ∈ [1, ∞) there are numbers 0 < A0p ≤ 1 ≤ Bp0 so that for any m ∈ N and any scalars (aj )m j=1 , (6.16) A0p m X 2 |aj | 1/2 m X ≤ aj gj j=1 Lp j=1 ≤ Bp0 m X |aj |2 1/2 . j=1 Moreover, if 1 < p < ∞ then (gj ) generates a copy of `2 inside Lp [0, 1] which is complemented. Theorem 6.2.5 (The square-function norm). Let 1 ≤ p < ∞ and let (fn ) be a λ-unconditional basic sequence in Lp [0, 1] for some λ ≥ 1. Then there is a constant C = C(p, λ) ≥ 1, depending only on the unconditionality constant of (fi ) and the constants Ap and Bp in Khintchine’s Inequality P (Lemma 6.2.3), so that for any g = ∞ i=1 ai fi ∈ span(fi : i ∈ N) it follows that ∞ ∞ X 1/2 1/2 1 X 2 2 2 2 |ai | |fi | |ai | |fi | , ≤ kgkp ≤ C C p p i=1 i=1 which means that k · kp is on span(fi : i ∈ N) equivalent to the norm ∞ ∞ X X 1/2 1/2 2 2 |||f ||| = |ai |2 |fi |2 = |a | |f | . i i p i=1 i=1 p/2 Proof. For two positive numbers A and B and c > 0 we write: A ∼c B if 1 c A ≤ B ≤ cA. Let Kp be the Khintchine constant for Lp , i.e the smallest number so that for the Rademacher sequence (rn ) ∞ ∞ X X 1/2 ai ri ∼Kp |ai |2 for (ai ) ⊂ K, p i=1 i=1 and let Cu be the unconditionality constant of (fi ), i.e. ∞ ∞ X X ai fi for (ai ) ⊂ K and (σi ) ⊂ {±1}. σi ai fi ∼Cu i=1 p i=1 p We consider Lp [0, 1] in a natural way as subspace of Lp [0, 1]2 , with f˜(s, t) := f (s) for f ∈ Lp [0, 1]. Then let rn (t) = rn (s, t) be the nth Rademacher function action on the second coordinate, i.e rn (s, t) = sign(sin(2n πt)), (s, t) ∈ [0, 1]2 . 
It follows from the C_u-unconditionality that for any (a_j)_{j=1}^m ⊂ K and every t ∈ [0,1],
\[
\Big\|\sum_{j=1}^m a_j f_j(\cdot)\Big\|_p^p
\sim_{C_u^p} \Big\|\sum_{j=1}^m a_j f_j(\cdot)\, r_j(t)\Big\|_p^p
= \int_0^1 \Big|\sum_{j=1}^m a_j f_j(s) r_j(t)\Big|^p\,ds,
\]
and integrating over all t ∈ [0,1] implies
\[
\Big\|\sum_{j=1}^m a_j f_j\Big\|_p^p
\sim_{C_u^p} \int_0^1\!\!\int_0^1 \Big|\sum_{j=1}^m a_j f_j(s) r_j(t)\Big|^p\,ds\,dt
= \int_0^1\!\!\int_0^1 \Big|\sum_{j=1}^m a_j f_j(s) r_j(t)\Big|^p\,dt\,ds
\quad\text{(by the Theorem of Fubini)}
\]
\[
= \int_0^1 \Big\|\sum_{j=1}^m a_j f_j(s) r_j(\cdot)\Big\|_p^p\,ds
\sim_{K_p^p} \int_0^1 \Big(\sum_{j=1}^m |a_j f_j(s)|^2\Big)^{p/2}ds
= \Big\|\Big(\sum_{j=1}^m |a_j f_j|^2\Big)^{1/2}\Big\|_p^p,
\]
which proves our claim with C = K_p C_u.

Theorem 6.2.6. Assume that 1 < p < ∞ and that (f_j) is a normalized λ-unconditional sequence in L_p[0,1] for some λ ≥ 1. Let C = C(p, λ) be as in Theorem 6.2.5. Then for all scalars (a_j)_{j=1}^n ⊂ K,
(6.17)
\[
\frac1C\Big(\sum_{j=1}^n |a_j|^2\Big)^{1/2}
\le \Big\|\sum_{j=1}^n a_j f_j\Big\|_p
\le C\Big(\sum_{j=1}^n |a_j|^p\Big)^{1/p}
\quad\text{if } 1 \le p \le 2, \text{ and}
\]
(6.18)
\[
\frac1C\Big(\sum_{j=1}^n |a_j|^p\Big)^{1/p}
\le \Big\|\sum_{j=1}^n a_j f_j\Big\|_p
\le C\Big(\sum_{j=1}^n |a_j|^2\Big)^{1/2}
\quad\text{if } 2 \le p < \infty.
\]

Proof. We will show the inequalities (6.17) and (6.18), but with their middle terms replaced by the square-function norm; having done that, the claim will follow from Theorem 6.2.5. Let (a_i)_{i=1}^n ⊂ K. First we note that, since for 1 ≤ s < t ≤ ∞ the ℓ_s-norm of a vector (b_j)_{j≤n} is at least its ℓ_t-norm, we deduce
\[
\Big\|\Big(\sum_{j=1}^n |a_j|^2|f_j|^2\Big)^{1/2}\Big\|_p^p
= \int_0^1 \Big(\sum_{j=1}^n |a_j|^2|f_j(t)|^2\Big)^{p/2}dt
\ \begin{cases}
\le \int_0^1 \sum_{j=1}^n |a_j|^p|f_j(t)|^p\,dt & \text{if } 1 \le p \le 2,\\[1ex]
\ge \int_0^1 \sum_{j=1}^n |a_j|^p|f_j(t)|^p\,dt & \text{if } 2 \le p < \infty,
\end{cases}
\]
and, since the f_j are normalized,
\[
\int_0^1 \sum_{j=1}^n |a_j|^p|f_j(t)|^p\,dt = \sum_{j=1}^n |a_j|^p,
\]
which implies the second inequality of (6.17) and the first of (6.18).

In order to verify the other two inequalities we observe that
\[
\Big\|\Big(\sum_{j=1}^n |a_j|^2|f_j|^2\Big)^{1/2}\Big\|_p
= \Big(\sum_{i=1}^n |a_i|^2\Big)^{1/2}
\Big\|\Big(\sum_{j=1}^n \frac{|a_j|^2}{\sum_{i=1}^n |a_i|^2}\,|f_j|^2\Big)^{1/2}\Big\|_p
= \Big(\sum_{i=1}^n |a_i|^2\Big)^{1/2}
\Big(\int_0^1\Big(\sum_{j=1}^n \frac{|a_j|^2}{\sum_{i=1}^n |a_i|^2}\,|f_j(t)|^2\Big)^{p/2}dt\Big)^{1/p}.
\]
Since ξ ↦ ξ^{p/2} is convex if p > 2 and concave if 1 ≤ p < 2, Jensen's inequality applied to the probability weights |a_j|^2/\sum_{i=1}^n |a_i|^2 yields
\[
\Big(\int_0^1\Big(\sum_{j=1}^n \frac{|a_j|^2}{\sum_{i=1}^n |a_i|^2}\,|f_j(t)|^2\Big)^{p/2}dt\Big)^{1/p}
\begin{cases}
\ge \Big(\int_0^1 \sum_{j=1}^n \frac{|a_j|^2}{\sum_{i=1}^n |a_i|^2}\,|f_j(t)|^p\,dt\Big)^{1/p} = 1 & \text{if } 1 \le p \le 2,\\[1ex]
\le \Big(\int_0^1 \sum_{j=1}^n \frac{|a_j|^2}{\sum_{i=1}^n |a_i|^2}\,|f_j(t)|^p\,dt\Big)^{1/p} = 1 & \text{if } 2 \le p < \infty
\end{cases}
\]
(the integrals equal 1 because the f_j are normalized), which implies the first inequality of (6.17) and the second of (6.18).

Corollary 6.2.7. Let 1 ≤ p < ∞. Every normalized unconditional sequence (f_j) ⊂ L_p[0,1] which consists of uniformly bounded functions is equivalent to the ℓ_2-unit vector basis. In particular, if p ≠ 2, then (f_j) cannot span all of L_p[0,1].

We will need Jensen's inequality.

Theorem 6.2.8 [Jensen's inequality]. If f : R → R is convex and g : [0,1] → R is such that g and f ∘ g are integrable, then
\[
f\Big(\int_0^1 g(\xi)\,d\xi\Big) \le \int_0^1 f(g(\xi))\,d\xi.
\]
(Here [0,1], together with the Lebesgue measure, can be replaced by any probability space.)

Proof of Corollary 6.2.7. Assume that (f_j) is uniformly bounded, normalized and λ-unconditional, and let C = sup_{j∈N} ‖f_j‖_{L_∞}. If 1 ≤ p ≤ 2 we deduce for (a_j) ∈ c_00, from the proof of Theorem 6.2.6 and from the pointwise bound |f_i| ≤ C, that
\[
\Big(\sum_{i=1}^\infty |a_i|^2\Big)^{1/2}
\le \Big\|\Big(\sum_{i=1}^\infty |a_i f_i|^2\Big)^{1/2}\Big\|_{L_p}
\le C\Big(\sum_{i=1}^\infty |a_i|^2\Big)^{1/2}.
\]
Thus our claim follows in the case that 1 ≤ p ≤ 2.

If p ≥ 2 we obtain for (a_j) ∈ c_00 first that
\[
\Big\|\Big(\sum_{i=1}^\infty |a_i f_i|^2\Big)^{1/2}\Big\|_{L_p}
\le C\Big(\sum_{i=1}^\infty |a_i|^2\Big)^{1/2}.
\]
Since ‖f_i‖_{L_p} = 1 and ‖f_i‖_∞ ≤ C for i ∈ N, we deduce that
\[
1 = \|f_i\|_p^p = \int_0^1 |f_i(t)|^p\,dt
\le C^p\, m(\{|f_i| \ge 1/(2C)\}) + \frac{1}{(2C)^p}\, m(\{|f_i| < 1/(2C)\})
\le C^p\, m(\{|f_i| \ge 1/(2C)\}) + \frac12,
\]
and thus
\[
m(\{|f_i| \ge 1/(2C)\}) \ge \frac{1}{2C^p}.
\]
We deduce that
\[
\Big\|\Big(\sum_{i=1}^\infty |a_i f_i|^2\Big)^{1/2}\Big\|_{L_p}
= \Big(\int_0^1\Big(\sum_{j=1}^\infty |a_j f_j(t)|^2\Big)^{p/2}dt\Big)^{1/p}
\ge \frac{1}{2C}\Big(\int_0^1\Big(\sum_{j=1}^\infty |a_j|^2\, 1_{\{|f_j| \ge 1/(2C)\}}(t)\Big)^{p/2}dt\Big)^{1/p}
\]
\[
\ge \frac{1}{2C}\Big(\int_0^1 \sum_{j=1}^\infty |a_j|^2\, 1_{\{|f_j| \ge 1/(2C)\}}(t)\,dt\Big)^{1/2}
\quad\text{(by Jensen's inequality)}
\quad\ge \frac{1}{2C}\Big(\frac{1}{2C^p}\Big)^{1/2}\Big(\sum_{j=1}^\infty |a_j|^2\Big)^{1/2}.
\]
Our claim follows therefore from the equivalence between the L_p-norm and the square-function norm in L_p (Theorem 6.2.5).

Bibliography

[AW] F. Albiac and P. Wojtaszczyk, Characterization of 1-greedy bases. J. Approx. Theory 138 (2006), no. 1, 65–86.

[EW] I. S. Edelstein and P. Wojtaszczyk, On projections and unconditional bases in direct sums of Banach spaces. Studia Math. 56 (1976), no. 3, 263–276.

[An] A. D. Andrew, On subsequences of the Haar system in C(Δ). Israel J. Math. 31 (1978), no. 1, 85–90.

[BCLT] J. Bourgain, P. G. Casazza, J. Lindenstrauss, and L. Tzafriri, Banach spaces with a unique unconditional basis, up to permutation. Mem. Amer. Math. Soc. No. 322, 1985.

[DT] R. A. DeVore and V. N. Temlyakov, Some remarks on greedy algorithms. Adv. Comput. Math. 5 (1996), 173–187.

[DFOS] S. J. Dilworth, D. Freeman, E. Odell, and Th. Schlumprecht, Greedy bases for Besov spaces. Constr. Approx. 34 (2011), no. 2, 281–296.

[DKSTW] S. Dilworth, D. Kutzarova, K. Shuman, V. Temlyakov, and P. Wojtaszczyk, Weak convergence of greedy algorithms in Banach spaces. J. Fourier Anal. Appl. 14 (2008), no. 5-6, 609–628.

[DOSZ1] S. Dilworth, E. Odell, Th. Schlumprecht, and A. Zsák, Renormings and symmetry properties of one-greedy bases. J. Approx. Theory 163 (2011), no. 9, 1049–1075.

[DOSZ2] S. Dilworth, E. Odell, Th. Schlumprecht, and A. Zsák, Partial unconditionality.

[DOSZ3] S. Dilworth, E. Odell, Th. Schlumprecht, and A. Zsák, On the convergence of greedy algorithms for initial segments of the Haar basis. Math. Proc. Cambridge Philos. Soc. 148 (2010), no. 3, 519–529.

[1] S. Dilworth, D. Kutzarova, E. Odell, Th. Schlumprecht, and P. Wojtaszczyk, Weak thresholding greedy algorithms in Banach spaces. J. Funct. Anal. 263 (2012), no. 12, 3900–3921.

[2] S. Dilworth, D. Kutzarova, E. Odell, Th. Schlumprecht, and A. Zsák, Renorming spaces with greedy bases. J. Approx. Theory 188 (2014), 39–56.

[DKK] S. J. Dilworth, N. J. Kalton, and D. Kutzarova, On the existence of almost greedy bases in Banach spaces. Dedicated to Professor Aleksander Pełczyński on the occasion of his 70th birthday. Studia Math. 159 (2003), no. 1, 67–101.

[En] P. Enflo, A counterexample to the approximation problem in Banach spaces. Acta Math. 130 (1973), 309–317.

[GK] M. Ganichev and N. J. Kalton, Convergence of the weak dual greedy algorithm in L_p-spaces. J. Approx. Theory 124 (2003), 89–95.

[HMVZ] P. Hájek, V. Montesinos Santalucía, J. Vanderwerff, and V. Zizler, Biorthogonal systems in Banach spaces. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, 26. Springer, New York, 2008. xviii+339 pp.

[Jo] L. K. Jones, A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. Ann. Statist. 20 (1992), 608–613.

[Ka] Y. Katznelson, An Introduction to Harmonic Analysis.

[KT1] S. V. Konyagin and V. N. Temlyakov, A remark on greedy approximation in Banach spaces. East J. Approx. 5 (1999), no. 3, 365–379.

[KT2] S. V. Konyagin and V. N. Temlyakov, Rate of convergence of pure greedy algorithm. East J. Approx. 5 (1999), no. 4, 493–499.

[LT] J. Lindenstrauss and L. Tzafriri, Classical Banach Spaces I: Sequence Spaces. Springer-Verlag, Berlin, 1979.

[Ma] A. I. Markushevich, On a basis in the wide sense for linear spaces. Dokl. Akad. Nauk 41 (1943), 241–244.

[OP] R. I. Ovsepian and A. Pełczyński, On the existence of a fundamental total and bounded biorthogonal sequence in every separable Banach space, and related constructions of uniformly bounded orthonormal systems in L^2. Studia Math. 54 (1975), no. 2, 149–159.

[Pe] A. Pełczyński, All separable Banach spaces admit for every ε > 0 fundamental total and bounded by 1 + ε biorthogonal sequences. Studia Math. 55 (1976), no. 3, 295–304.
[Schl] Th. Schlumprecht, Course Notes for Functional Analysis, Fall 2012, http://www.math.tamu.edu/~schlump/course notes FA2012.pdf.

[Sch2] Th. Schlumprecht, Embedding theorems of Banach spaces into Banach spaces with bases. Adv. Math. 274 (2015), 833–880.

[Schm] E. Schmidt, Zur Theorie der linearen und nichtlinearen Integralgleichungen. I. Math. Ann. 63 (1906), 433–476.

[T1] V. Temlyakov, Nonlinear methods of approximation. Found. Comput. Math. 3 (2003), no. 1, 33–107.

[T2] V. Temlyakov, Relaxation in greedy approximation, preprint.

[T3] V. Temlyakov, Greedy algorithms in Banach spaces. Adv. Comput. Math. 14 (2001), 277–292.

[Wo1] P. Wojtaszczyk, A Mathematical Introduction to Wavelets. London Mathematical Society Student Texts 37 (1997).

[Wo2] P. Wojtaszczyk, Greedy algorithm for general biorthogonal systems. J. Approx. Theory 107 (2000), no. 2, 293–314.