i / 4ô- To appear in Proc. of the 22’ Annual Asilomar Conference on Signals Systems and Computers, Nov. 1-3, 1993/ Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition Y. C. PATI Information Systems Laboratory Dept. of Electrical Engineering Stanford University, Stanford, CA 94305 R. Abstract In this paper we describe a recursive algorithm to compute representations of functions with respect to nonorthogonal and possibly overcomplete dictionaries of elementary building blocks e.g. aiEne (wa.velet) frames. We propoeea modification to the Matching Pursuit algorithm of Mallat and Zhang (1992) that maintains full backward orthogonality of the residual (error) at every step and thereby leads to improved convergence. We refer to this modified algorithm as Orthogonal Matching Pursuit (OMP). It is shown that all additional computation required for the OMP al gorithm may be performed recursively. REZAIIFAR AND P. S. KRISHNAPRASAD Institute for Systems Research Dept. of Electrical Engineering University of Maryland, College Park, MD 20742 where fk is the current approximation, and Rkf the current residual (error). Using initial values of R f= 0 1, fo = 0, and k = 1, the MP algorithm is comprised of the following steps, .,.41) Compute the inner-products {(Rkf,z)}. (H) Find flki such that +,)l 1 I(R*f,1:n where 0 < a < 1. (III) Set, fk+1 1 Introduction and Background Given a collection of vectors D space 1i, let us define V = Span{z}, and W = = {z} in a Hubert 1 (in E). V We shall refer to D as a dictionary, and will aasume the vectors z,,, are normalized (jlxII = 1). In [3] Mal lat and Zhang proposed an iterative algorithm that they termed Matching Pursuit (MP) to construct rep resentations of the form Pyf=Eanzn, (1) ft where P , is the orthogonal projection operator onto 1 V. Each iteration of the MP algorithm results in an intermediate representation of the form f=Ea:n+R*ffk+Pf, 1 asupl(Rkf,z,)I, = = 1k + (Rkf, z+ ) Zflk+j 1 Rkf— (Rkf,zflk+l)zflk+, (IV) Increment k, (k +— k + 1), and repeat steps (I)— (IV) until some convergence criterion has been satisfied. The proof of convergence [3] of MP relies essentially on the fact that (Rk+lf,zflk+l) = 0. This orthogonality of the residual to the last vector selected leads..±o the following “energy conservation” equation. 2 lIfII fjI + l(kf,Zn4+1)I2. 1 jj’,’+ 2 (2) It has been noted that the MP algorithm may be d& rived as a special case of a technique known as Pro jection Pursuit (c.f. [2]) in the statisti literature. A shortcoming of the Matching Pursuit algorithm in its originally proposed form is that although asymp t.otic convergence is guaranteed, the resulting approxi mation after any finite number of iterations will in gen eral be suboptimal in the following sense. Let N < co be the number of MP iterations performed. Thus we have fN = (f,zflk+I) z,. We shall refer Define VN = Span{z,,. to fN as an optima! N-term apprnximation if fri i.e. fri is the best approximation we can ‘VNf’ ,. .,zflN} of 1 construct using the selected subset {z the dictionary D. (Note that this notion of optimal ity does not involve the problem of selecting an “op timal” N-element subset of the dictionary.) In this sense, fri is an optimal N-term approximation, if and only if RN! E V. As MP only guarantees that RNf L x,,, fri as generated by MP will in general be suboptimal. The difficulty with such suboptimality . Let 2 is easily illustrated by a simple example in JR 2 as 2 and take f € i 2 be two vectors in JR z and x , 1 shown in Figure 1(a). Figure 1(b) is a plot of IIRsfll 2 . ./ gives the optimal approximation with respect to the selected subset of the dictionary. This is achieved by ensuring full backward orthogonality of the error i.e. at each iteration Rkf E V. For the example in Fig ure 1, OMP ensures convergence in exactly two itera tions. It is also shown that the additional computation required for OMP, takes a simple recursive form. We demonstrate the utility of OMP by example of applications to representing functions with respect to time-frequency localized affine wavelet dictionaries. We also compare the performance of OMP with that of MP on two numerical examples. 2 Orthogonal Matching Pursuit Assume we have the following kthorder model for f € Ii, f=azn+Rkf, with (Rkf,s) =0, ..--+--.-r n 1,...k. (3) The superscript k, in the coefficients a, show the de pendence of these coefficients on the model-order. We model to a model would like to update this of order k+1, (7 (a) k+1 + Rk+lf, with (Rk+lf, z,) = 0, f= n=1,...k+1. ,i=1 (4) Since elements of the dictionary D are not required to be orthogonal, to perform such an update, we also require an auxiliary model for the dependence of zk+1 on the previous x’s (n = 1,.. .k). Let, (b) Ill — a ——— a : (a) Dic 2 Figure 1: Matching pursuit example in 1R 2 ,z and a vector f €1R 1 {z } tionary D = 2 versus k. Hence although asymptotic convergence is guaranteed, after any finite number of steps, the error may still be quite large. ‘ In this paper we propose a refinement of the Match ing Pursuit (MP) algorithm that we refer to as Or thogonal Matching Pursuit (OMP). For nonorthogo nal dictionaries, OMP will in general converge faster than MP. For any finite size dictionary of N elements, OMP converges to the projection onto the span of the dictionary elements in no more than N steps. Fur• thermore after any finite number of iterations, OMP with <7k,Zn)O, n=1,...k. (5) Thus = Pvk+1, = Pvbzk+a, and is the component of xk1 which is unexplained by . ..,z}. 11 {z Using the auxiliary model (5), it may be shown that the correct update from the ktorder model to the model of order k + 1, is given by an5+1 and at ‘A aimlai difficulty with the Projection Pursuit algorithm noted by Donoho eLaL [11 who suggested that bec.kfitting may be used to improve the convergence o( PPR Although the tedinique is not fully de.ibed in (iJ it appears that it is in the same spirit a. the tednique we present here. where a — k_ a b 3 —1 ,..., k (5 = = , Zk+i) (R f 5 (7s,zsi) (Rkf, Zk+j) = , R f 5 Ilzt+ i112 — 117511 2 Xt+i) 1 b (z,,, Z54.j) E 2 Theorem 2.1 For f € ?t, let gerzeruted by OMP. Then It also follows that the residual Rk+jf satisfies, Rkf = Rk+lf + akj’k, and 2 IIRkfH 2.1 = ifil 2 IIR+ + (1) (7) 117k II foO, Zo=O, f, R f 0 a=O, (I) Compute {(Rkf, x,) ; z Xllb+ ED \ Dk The key difference between MP and OMP lies in Prop erty (ii/) of Theorem 2.1. Property (iif) implies that at the N step we have the best approximation we can get using the N vectors we have selected from the dictionary. Therefore in the case of finite dictio naries of size M, OMP converges in no more than M iterations to the projection of f onto the span of the dictionary elements. As mentioned earlier, Matching Pursuit does not possess this property. {) D = 0 k=O D \ Dk). such that kf,xi)I, 1(R*f,znh+jIasupl(R J (IV) O<’a 1. (Rkf,xnh+,)I <, ( > 0) then stop. 2.3 the dictionary D, by applying the per k + 1 4-k k+1• on mutati Reorder 1 bz +7k = E and (7k,z)=0, n=1,...,k. cz =ak = (f,z) = (f-fk,zf) = (f,x)—a (x,z). +l), k11 Rkf,Zk 2 (117 (8) OMP, for d The only additional computation require arises in determining the b ‘a of the auxiliary model (5). To compute the b’s we rewrite the normal equa tions associated with (5) as a system of k linear equa tions, (9) Vk = Akbk a—akb, n=1,...,k, a = 1 and update the model, fk+1 = R*+af = Some Computational Details As in the case of MP, the inner products {(Rkf, x,)) may be computed recursively. For OMP we may express these recursions implicitly in the for mula , such that, 1 (V) Compute {b} (VI) Set, k+1 a1z,* where = DkU{zk+1). T = VII) Set k 2.2 — N=O,1,2 Remarks: 1itiali.ation (III) If urn IIRkf—.PvfIIO. Proof: The proof of convergence parallels the proof of Theorem 1 in [3]. The proof of the the second prop erty follows immediately from the orthogonality con ditions of Equation (3). The results of the previous section may be used to construct the following algorithm that we will refer to as Orthogonal Matching Pursuit (OMP). (II) Find be the residuals k-.oo (ii)fN=PV,,,f, The OMP Algorithm Rkf k + 1, and repeat (I)—(V1I). [(Zk+1,Z1),(Xk+j,Z2). . = and Some Properties of OMP As in the case of MP, convergence of OMP relies on an ener’ conservation equation that now takes the form (7). The following theorem summarizes the convergence properties of OMP. 3 ,x 1 (z ) z 1 , 2 (z ) . . . (x,z) ) 2 (:i,x x2) 21 (x . . . (z,z) (X1,Xk) (z2,Zk) ..: Oir.4 Sp w.d 0.W Amcn Note that the positive constant used in Step (111) of OMP guarantees nonsingulazity of the matrix Ak, hence we may write bk = A’vk. (a) (10) However, since Ak+1 may be written as r Ak I. k V i 1 10’ (11) I, 10•’ (where * denotes conjugate transpose) it may be shown using the block matrix inversion formula that k+11r — 1 + bkb A —,3bk —flb 1 (a) 10 (12) 1 0 — Examples 700 I In the following examples we consider represen tations with repect to an a.ffine wavelet frame con structed from dilates and translates of the second derivate of a Gaussian, i.e. D = {m,,,, m, n E Z} where, = 2m/2,(2m1 14 40 / \1/2 (:2 — 1) .40 • a 40 to to Figure 3: Distribution of coefficients obtained by ap plying OMP in Example 1. Shading is proportional to squared magnitude of the coefficients 0 k, with dark colors indicating large magnitudes. and the analyzing wavetet 0 is given by, 4 (i.;;:;) 40 Tr.i W4 — = too 4 Figure 2: Example I: (a) Original signal f, with OMP 2 norm of approximation superimposed, (b) Squared L residual Rkf versus iteration number k, for both OMP (solid lini) and MP (dashed line). , and therefore 1 where fi = 1/(1 vZbk). Hence A , and 1 using A recursively computed be may bk÷j, previous step. .bk from the 3 o e_’’2 to 8 times more expensive to achieve a prespecified er ror using OMP even though OMP converges in fewer iterations. On the other hand for the signal shown in Figure 5, which lies in the span of three dictionary vectors, it is approximately 20 times more expensive to apply MP. In this case OMP converges in exactly three iterations. Note that for wavelet dictionaries, the initial set of in ner products {(f, Om,n)}, are readily computed by one convolution followed by sampling at each dilation level m. The dictionary used in these examples consists of a total of 351 vectors. In our first example, both OMP and MP were ap plied to the signal shown in Figure 2(a). We see from Figure 2(b) that OMP clearly converges in far fewer iterations than MP. The squared magnitude of the co efficients 0 k, of the resulting representation is shown in Figure 3. We could also compare the two algorithms on the basis of required computational effort to com pute representations of signals to within a prespecified error. However such a comparison can only be made for a given signal and dictionary, as the number of it erations required for each algorithm depends on both the signal and the dictionary. For example, for the signal of Example I, we see from Figure 4 that it is 3 4 Summary and Conclusions In this paper we have described a recursive al gorithm, which we refer to as Orthogonal Matching Pursuit (OMP), to compute representations of signals with respect to arbitrary dictionaries of elementary functions. The algorithm we have described is a mod ification of the Matching Pursuit (MP) algorithm of Mallat and Zhang [3] that improves convergence us4 at each step. 7 to Acknowledgements 5_ t0• This research of Y.C.P. was supported in part by NASA Headquarters, Center for Aeronautics and Space Informa tion Sciences (CASIS) under Grant NAGW419,S6, and in part by the Advanced Research Projects Agency of the De partment of Defense monitored by the Air Force Office of Scientific Research under Contract F49620-93-I-Oog5, This research of R.R. and P.S.K was supported in part by the Air Force Office of Scientific Research under con tract F49620-92-J-0500, the AFOSR University Research Initiative Program under Grant AFOSR-90-0105, by the Army Research Office under Smart Structures URI Con tract no. DAALO3-92-G-0121, and by the National Sci ence Foundation’s Engineering Research Centers Program, NSFD CDR 8803012. 10 .15 -2 ta — z.d 1.2 ,w Figure 4: Computational cost (FLOPS) versus ap proximation error for both OMP (solid line) and MP (dashed line) applied to the signal in Example 1. (a) 02 0 - -150 .100 - 100 50 0 150 References ZO [1) D. Donoho, I. Johnstone, P. Rousseeuw, and W. Stahel. The Annals of Statistics, 13(2):496— 500, 1985. Discussion following article by P. Hu ber. (b) 0 10 40 00 70 [2) P. .1. Huber. Projection pursuit. The Annals of Statistics, 13(2) :435—475, 1985. [3] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. Preprint. Submitted to IEEE Trzrnsactions on Signal Processing, 1992. Figure 5: Example II: (a) Original signal f, (b) 2 norm of residual R.f versus iteration Squared L number k, for both OMP (solid line) and MP (dashed line). ing an additional orthogonalization step. The main benefit of OMP over MP is the fact that it is guar anteed to converge in a finite number of steps for a finite dictionary. We also demonstrated that all addi tional computation that is required for OMP may be performed recursively. The two algorithms, MP and OMP, were compared on two simple examples of decomposition with respect to a wavelet dictionary. It was noted that although OMP converges in fewer iterations than MP, the com putational effort required for each algorithm depends on both the class of signals and choice of dictionary. Although we do not provide a rigorous argument here, it seems reasonable to conjecture that OMP will be computationally cheaper than MP for very redundant dictionaries, as knowledge of the redundancy is ex ploited in OMP to reduce the error as much as poible 5