Spectral Analysis

Jesus Fernandez-Villaverde
University of Pennsylvania

Why Spectral Analysis?

We want to develop a theory to obtain the business cycle properties of the data.

Burns and Mitchell (1946).

General problem of signal extraction.

We will use some basic results in spectral (or harmonic) analysis.

Then, we will develop the idea of a filter.

Riesz-Fischer Theorem

First we will show that there is an intimate link between $L^2[-\pi,\pi]$ and $l^2(-\infty,\infty)$.

Why? Because we want to represent a time series in two different ways.

Riesz-Fischer Theorem: Let $\{c_n\}_{n=-\infty}^{\infty} \in l^2(-\infty,\infty)$. Then, there exists a function $f(\omega) \in L^2[-\pi,\pi]$ such that:

$$f(\omega) = \sum_{j=-\infty}^{\infty} c_j e^{-i\omega j}$$

This function is called the Fourier Transform of the series.

Properties

Its finite approximations converge to the infinite series in the mean square norm:

$$\lim_{n\to\infty} \int_{-\pi}^{\pi} \Big| \sum_{j=-n}^{n} c_j e^{-i\omega j} - f(\omega) \Big|^2 d\omega = 0$$

It satisfies Parseval's Relation (written with the same $1/2\pi$ normalization as the inversion formula):

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\omega)|^2 \, d\omega = \sum_{n=-\infty}^{\infty} |c_n|^2$$

Inversion formula:

$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\omega)\, e^{i\omega k} \, d\omega$$

Harmonics

The functions $\{e^{-i\omega j}\}_{j=-\infty}^{\infty}$ are called harmonics and constitute an orthonormal basis in $L^2[-\pi,\pi]$ (with the inner product normalized by $1/2\pi$).

The orthonormality follows directly from:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-i\omega j} e^{i\omega k} \, d\omega = 0 \quad \text{if } j \neq k$$

and

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-i\omega j} e^{i\omega j} \, d\omega = 1$$

The fact that they constitute a basis is given by a second theorem that goes in the opposite direction to Riesz-Fischer: given any function in $L^2[-\pi,\pi]$ we can find an associated sequence in $l^2(-\infty,\infty)$.

Converse Riesz-Fischer Theorem

Let $f(\omega) \in L^2[-\pi,\pi]$. Then there exists a sequence $\{c_n\}_{n=-\infty}^{\infty} \in l^2(-\infty,\infty)$ such that:

$$f(\omega) = \sum_{j=-\infty}^{\infty} c_j e^{-i\omega j}$$

where:

$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\omega)\, e^{i\omega k} \, d\omega$$

and finite approximations converge to the infinite series in the mean square norm:

$$\lim_{n\to\infty} \int_{-\pi}^{\pi} \Big| \sum_{j=-n}^{n} c_j e^{-i\omega j} - f(\omega) \Big|^2 d\omega = 0$$

Remarks I

The Riesz-Fischer theorem and its converse then assure that the Fourier transform is a bijection from $l^2(-\infty,\infty)$ onto $L^2[-\pi,\pi]$.
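A quick numerical check of Parseval's relation and the inversion formula may help fix ideas. The sketch below assumes a hypothetical finite sequence `c` (zero outside $j = -2, \ldots, 2$) and the $1/2\pi$ normalization of the inversion formula; the grid size and variable names are illustrative choices, not part of the lecture.

```python
import numpy as np

# Hypothetical finite sequence c_j for j = -2..2 (zero elsewhere), so it is in l2.
js = np.arange(-2, 3)
c = np.array([0.5, -1.0, 2.0, 0.25, 1.5])

# Uniform grid on [-pi, pi). For low-degree trigonometric polynomials the
# rectangle rule on such a grid is exact up to rounding error.
n_grid = 4096
omega = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid
d_omega = 2 * np.pi / n_grid

# Fourier transform f(omega) = sum_j c_j exp(-i omega j)
f = (c[None, :] * np.exp(-1j * omega[:, None] * js[None, :])).sum(axis=1)

# Parseval: (1/2pi) * integral of |f|^2 equals sum of |c_j|^2
parseval_lhs = (np.abs(f) ** 2).sum() * d_omega / (2 * np.pi)
parseval_rhs = (np.abs(c) ** 2).sum()

# Inversion: c_k = (1/2pi) * integral of f(omega) exp(i omega k)
c_recovered = np.array(
    [(f * np.exp(1j * omega * k)).sum() * d_omega / (2 * np.pi) for k in js]
)
```

Under these assumptions `parseval_lhs` and `parseval_rhs` agree, and `c_recovered` reproduces `c` with a negligible imaginary part, which is the norm-preserving, invertible behavior the two theorems describe.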
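The mapping is an isometric isomorphism since it preserves linearity and distance: for any two series $\{x_n\}_{n=-\infty}^{\infty}$, $\{y_n\}_{n=-\infty}^{\infty} \in l^2(-\infty,\infty)$ with Fourier transforms $x(\omega)$ and $y(\omega)$ we have:

$$x(\omega) + y(\omega) = \sum_{j=-\infty}^{\infty} (x_j + y_j)\, e^{-i\omega j}$$

$$\alpha\, x(\omega) = \sum_{j=-\infty}^{\infty} \alpha\, x_j\, e^{-i\omega j}$$

$$\left( \frac{1}{2\pi}\int_{-\pi}^{\pi} |x(\omega) - y(\omega)|^2 \, d\omega \right)^{1/2} = \left( \sum_{k=-\infty}^{\infty} |x_k - y_k|^2 \right)^{1/2}$$

Remarks II

There is a limit in the amount of information in a series.

We either have concentration in the original series or concentration in the Fourier transform.

This limit is given by:

$$\left( \sum_{j=-\infty}^{\infty} j^2 |c_j|^2 \right) \left( \int_{-\infty}^{\infty} \omega^2 |f(\omega)|^2 \, d\omega \right) \geq \frac{1}{4} \lVert c_j \rVert^4$$

This inequality is called the Robertson-Schrödinger relation.

Stochastic Process

$X \equiv \{X_t : \Omega \to \mathbb{R}^m,\ m \in \mathbb{N},\ t = 1, 2, \ldots\}$ is a stochastic process defined on a complete probability space $(\Omega, \Im, P)$ where:

1. $\Omega = \mathbb{R}^{m \times \infty} \equiv \lim_{T\to\infty} \otimes_{t=0}^{T} \mathbb{R}^m$

2. $\Im \equiv \lim_{T\to\infty} \Im_T \equiv \lim_{T\to\infty} \otimes_{t=0}^{T} \mathcal{B}(\mathbb{R}^m) \equiv \mathcal{B}(\mathbb{R}^{m \times \infty})$ is just the Borel $\sigma$-algebra generated by the measurable finite-dimensional product cylinders.

3. $P^T(B) \equiv P(B)|_{\Im_T} \equiv P(X^T \in B)$, $\forall B \in \Im_T$.

Define a $T$-segment as $X^T \equiv (X_1', \ldots, X_T')'$ with $X^0 = \{\emptyset\}$ and a realization of that segment as $x^T \equiv (x_1', \ldots, x_T')'$.

Moments

Moments, if they exist, can be defined in the usual way:

$$\mu_t = E x_t = \int_{\mathbb{R}^m} x_t \, dP^T$$

$$\gamma_{tj} = E (x_t - \mu_t)(x_{t-j} - \mu_{t-j})' = \int_{\mathbb{R}^{m \times (j+1)}} (x_t - \mu_t)(x_{t-j} - \mu_{t-j})' \, dP^T$$

If both $\mu_t$ and $\gamma_{tj}$ are independent of $t$ for all $j$, $X$ is a covariance-stationary or weakly stationary process.

Now, let us deal with $X$: a covariance-stationary process with zero mean (the process can always be renormalized to satisfy this requirement).

z Transform

If $\sum_{j=-\infty}^{\infty} |\gamma_j| < \infty$, define the operator $g_x : \mathbb{C} \to \mathbb{C}$:

$$g_x(z) = \sum_{j=-\infty}^{\infty} \gamma_j z^j$$

where $\mathbb{C}$ is the complex set.

This mapping is known as the autocovariance generating function.

Dividing this function by $2\pi$ and evaluating it at $e^{-i\omega}$ (where $\omega$ is a real scalar), we get the spectrum (or power spectrum) of the process $x_t$:

$$s_x(\omega) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \gamma_j e^{-i\omega j}$$

Spectrum

The spectrum is the Fourier Transform of the covariances of the process, divided by $2\pi$ so that the integral of $s_x(\cdot)$ over $[-\pi,\pi]$ equals the variance $\gamma_0$.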
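As an illustration of the definition, the spectrum can be evaluated directly from a finite list of autocovariances. The sketch below assumes a hypothetical MA(1) process $x_t = e_t + \theta e_{t-1}$ with $\theta = 0.5$ and unit innovation variance, for which $\gamma_0 = 1 + \theta^2$, $\gamma_1 = \theta$ and all higher autocovariances are zero; the function name `spectrum_from_autocov` is an illustrative choice.

```python
import numpy as np

def spectrum_from_autocov(gamma, omega):
    """s_x(omega) = (1/2pi) * sum_{j=-J}^{J} gamma_j exp(-i omega j).

    gamma holds gamma_0, ..., gamma_J; the symmetry gamma_{-j} = gamma_j is imposed.
    omega must be a one-dimensional array of frequencies.
    """
    omega = np.asarray(omega, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    J = len(gamma) - 1
    js = np.arange(-J, J + 1)
    g = gamma[np.abs(js)]
    terms = g[None, :] * np.exp(-1j * omega[:, None] * js[None, :])
    return terms.sum(axis=1) / (2 * np.pi)

# Hypothetical MA(1): x_t = e_t + theta * e_{t-1}, Var(e_t) = sigma2
theta, sigma2 = 0.5, 1.0
gamma = [(1 + theta**2) * sigma2, theta * sigma2]

n_grid = 2048
omega = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid
s = spectrum_from_autocov(gamma, omega)

max_imag = np.abs(s.imag).max()                     # imaginary parts cancel
total_variance = s.real.sum() * 2 * np.pi / n_grid  # integral over [-pi, pi)
```

In this example the imaginary parts cancel because the autocovariances are symmetric, and the numerical integral of the spectrum over $[-\pi,\pi]$ recovers $\gamma_0 = 1.25$.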
The spectrum and the autocovariances are equivalent: there is no information in one that is not present in the other.

That does not mean that having two representations of the same information is useless: some characteristics of the series, such as its serial correlation, are easier to grasp with the autocovariances, while others, such as its unobserved components (the different fluctuations that compose the series), are much easier to understand in the spectrum.

Example

General ARMA model: $\phi(L)\, x_t = \theta(L)\, \varepsilon_t$.

Since $x_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$, an $MA(\infty)$ representation of $x_t$ with lag operator $\psi(L)$ and variance $\sigma^2$ satisfies:

$$g_x(L) = |\psi(L)|^2 \sigma^2 = \psi(L)\, \psi(L^{-1})\, \sigma^2$$

Wold's theorem assures us that we can write any stationary ARMA as an $MA(\infty)$ and then, since we can always make:

$$\psi(L) = \frac{\theta(L)}{\phi(L)}$$

we will have

$$g_x(L) = \frac{\theta(L)\, \theta(L^{-1})}{\phi(L)\, \phi(L^{-1})}\, \sigma^2$$

and then the spectrum is given by:

$$s_x(\omega) = \frac{\sigma^2}{2\pi}\, \frac{\theta(e^{-i\omega})\, \theta(e^{i\omega})}{\phi(e^{-i\omega})\, \phi(e^{i\omega})} = \frac{\sigma^2}{2\pi} \left| \frac{\theta(e^{-i\omega})}{\phi(e^{-i\omega})} \right|^2$$

Working on the Spectrum

Since $\gamma_j = \gamma_{-j}$:

$$s_x(\omega) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \gamma_j e^{-i\omega j} = \frac{1}{2\pi} \left( \gamma_0 + \sum_{j=1}^{\infty} \gamma_j \left[ e^{i\omega j} + e^{-i\omega j} \right] \right) = \frac{1}{2\pi} \left( \gamma_0 + 2 \sum_{j=1}^{\infty} \gamma_j \cos(\omega j) \right)$$

by Euler's Formula ($\frac{e^{iz} + e^{-iz}}{2} = \cos z$).

Hence:

1. The spectrum is always real-valued.

2. It is the sum of an infinite number of cosines.

3. Since $\cos(\omega) = \cos(-\omega) = \cos(\omega + 2\pi k)$, $k = 0, \pm 1, \pm 2, \ldots$, the spectrum is symmetric around 0 and all the relevant information is concentrated in the interval $[0, \pi]$.

Spectral Density Function

Sometimes the autocovariance generating function is replaced by the autocorrelation generating function (where every term is divided by $\gamma_0$).

Then, we get the spectral density function:

$$s'_x(\omega) = \frac{1}{2\pi} \left( 1 + 2 \sum_{j=1}^{\infty} \rho_j \cos(\omega j) \right)$$

where $\rho_j = \gamma_j / \gamma_0$.

Properties of Periodic Functions

Take the modified cosine function:

$$y_j = A \cos(\omega j - \theta)$$

$\omega$ (measured in radians): angular frequency (or harmonic or simply the frequency).

$2\pi/\omega$: period or whole cycle of the function.

$A$: amplitude or range of the function.

$\theta$: phase or how much the function is translated along the horizontal axis.

Different Periods

$\omega = 0$: the period of fluctuation is infinite, i.e. the frequency associated with a trend (stochastic or deterministic).

$\omega = \pi$ (the Nyquist frequency): the period is 2 units of time, the shortest fluctuation that can be observed.

Business cycle fluctuations: usually defined as fluctuations between 6 and 32 quarters. Hence, the frequencies of interest are those between $\pi/16$ and $\pi/3$ (since $2\pi/32 = \pi/16$ and $2\pi/6 = \pi/3$).

We can have cycles with frequency higher than $\pi$. When sampling is not continuous these higher frequencies will be imputed to the frequencies between 0 and $\pi$. This phenomenon is known as aliasing.

Inversion

Using the inversion formula, we can find all the covariances from the spectrum:

$$\int_{-\pi}^{\pi} s_x(\omega)\, e^{i\omega j} \, d\omega = \int_{-\pi}^{\pi} s_x(\omega) \cos(\omega j) \, d\omega = \gamma_j$$

When $j = 0$, the variance of the series is:

$$\int_{-\pi}^{\pi} s_x(\omega) \, d\omega = \gamma_0$$

Alternative interpretation of the spectrum: its integral over $[-\pi, \pi]$ is the variance of the series.

In general, for some $\omega_1 \geq -\pi$ and $\omega_2 \leq \pi$:

$$\int_{\omega_1}^{\omega_2} s_x(\omega) \, d\omega$$

is the variance associated with frequencies in $[\omega_1, \omega_2]$.

Intuitive interpretation: decomposition of the variance of the process.

Spectral Representation Theorem

Any stationary process can be written in its Cramér representation:

$$x_t = \int_0^{\pi} u(\omega) \cos \omega t \, d\omega + \int_0^{\pi} v(\omega) \sin \omega t \, d\omega$$

A more general representation is given by:

$$x_t = \int_{-\pi}^{\pi} \xi(\omega) \, d\omega$$

where $\xi(\cdot)$ is any function that satisfies an orthogonality condition:

$$\int \xi(\omega_1)\, \xi(\omega_2)' \, d\omega_1 \, d\omega_2 = 0 \quad \text{for } \omega_1 \neq \omega_2$$

Relating Two Time Series

Take a zero mean, covariance stationary random process $Y$ with realization $y_t$ and project it against the process $X$, with realization $x_t$:

$$y_t = B(L)\, x_t + \varepsilon_t$$

where

$$B(L) = \sum_{j=-\infty}^{\infty} b_j L^j$$

and $E x_t \varepsilon_{t-j} = 0$ for all $j$.
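To make the two-sided lag polynomial concrete, here is a minimal sketch of how the deterministic part $B(L)\, x_t$ acts on a finite sample when only the weights $b_{-k}, \ldots, b_k$ are nonzero. The function name `apply_two_sided_filter` and the sample values are hypothetical, and dropping the $k$ observations at each end is one possible treatment of the sample edges, not the only one.

```python
import numpy as np

def apply_two_sided_filter(x, b):
    """Return y_t = sum_{j=-k}^{k} b_j x_{t-j} for every t where all terms exist.

    b is ordered (b_{-k}, ..., b_0, ..., b_k). The k observations at each end
    of the sample are dropped because the sum would need unobserved data.
    """
    x = np.asarray(x, dtype=float)
    b = np.asarray(b, dtype=float)
    # np.convolve computes sum_m b[m] * x[n - m]; with b stored from index -k,
    # mode="valid" returns exactly the interior points t = k, ..., T - k - 1.
    return np.convolve(x, b, mode="valid")

x = [1.0, 2.0, 4.0, 8.0, 16.0]

# Symmetric weights (b_{-1}, b_0, b_1) = (0.25, 0.5, 0.25)
y_smooth = apply_two_sided_filter(x, [0.25, 0.5, 0.25])

# Pure one-period lag: only b_1 = 1, so y_t = x_{t-1}
y_lag = apply_two_sided_filter(x, [0.0, 0.0, 1.0])
```

With these inputs `y_smooth` is `[2.25, 4.5, 9.0]` and `y_lag` is `[1.0, 2.0, 4.0]`, which confirms that positive $j$ refers to lags and negative $j$ to leads.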
Adapting our previous notation of covariances to distinguish between different stationary processes we define:

$$\gamma^l_j = E\, l_t\, l_{t-j}$$

Thus:

$$y_t y_{t-j} = \left( \sum_{s=-\infty}^{\infty} b_s x_{t-s} \right) \left( \sum_{r=-\infty}^{\infty} b_r x_{t-j-r} \right) + \left( \sum_{s=-\infty}^{\infty} b_s x_{t-s} \right) \varepsilon_{t-j} + \left( \sum_{r=-\infty}^{\infty} b_r x_{t-j-r} \right) \varepsilon_t + \varepsilon_t \varepsilon_{t-j}$$

Taking expected values of both sides, the orthogonality principle implies that:

$$\gamma^y_j = \sum_{s=-\infty}^{\infty} \sum_{r=-\infty}^{\infty} b_s b_r \gamma^x_{j+r-s} + \gamma^\varepsilon_j$$

With these covariances, the computation of the spectrum of $y$ is direct:

$$s_y(\omega) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \gamma^y_j e^{-i\omega j} = \frac{1}{2\pi} \left( \sum_{j,s,r=-\infty}^{\infty} b_r b_s \gamma^x_{j+r-s}\, e^{-i\omega j} + \sum_{j=-\infty}^{\infty} \gamma^\varepsilon_j e^{-i\omega j} \right)$$

If we define $h = j + r - s$:

$$e^{-i\omega j} = e^{-i\omega(h - r + s)} = e^{-i\omega h}\, e^{-i\omega s}\, e^{i\omega r}$$

we get:

$$s_y(\omega) = \sum_{r=-\infty}^{\infty} b_r e^{i\omega r} \sum_{s=-\infty}^{\infty} b_s e^{-i\omega s}\, \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma^x_h e^{-i\omega h} + \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \gamma^\varepsilon_j e^{-i\omega j} = B(e^{i\omega})\, B(e^{-i\omega})\, s_x(\omega) + s_\varepsilon(\omega)$$

Using the symmetry of the spectrum:

$$s_y(\omega) = \left| B(e^{-i\omega}) \right|^2 s_x(\omega) + s_\varepsilon(\omega)$$

LTI Filters

Given a process $X$, a linear time-invariant filter (LTI-filter) is an operator from the space of sequences into itself that generates a new process $Y$ of the form:

$$y_t = B(L)\, x_t$$

Since the transformation is deterministic, $s_\varepsilon(\omega) = 0$ and we get:

$$s_y(\omega) = \left| B(e^{-i\omega}) \right|^2 s_x(\omega)$$

$B(e^{-i\omega})$: the frequency transform or frequency response function, is the Fourier transform of the coefficients of the lag operator.

$G(\omega) = \left| B(e^{-i\omega}) \right|$ is the gain (its modulus).

$\left| B(e^{-i\omega}) \right|^2$ is the power transfer function (since $\left| B(e^{-i\omega}) \right|^2$ is a quadratic form, it is a real function). It indicates how much the spectrum of the series is changed at each particular frequency.

Gain and Phase

The definition of gain implies that the filtered series has zero variance at $\omega = 0$ if and only if $B(e^{-i0}) = \sum_{j=-\infty}^{\infty} b_j e^{-i0} = \sum_{j=-\infty}^{\infty} b_j = 0$.

Since the frequency response is a complex mapping, we can write:

$$B(e^{-i\omega}) = B^a(\omega) + i B^b(\omega)$$

where $B^a(\omega)$ and $B^b(\omega)$ are both real functions.

Then we define the phase of the filter:

$$\phi(\omega) = \tan^{-1} \left( \frac{B^b(\omega)}{B^a(\omega)} \right)$$

The phase measures how much the series changes its position with respect to time when the filter is applied. For a given $\phi(\omega)$, the filter shifts the series by $\phi(\omega)/\omega$ time units.

Symmetric Filters I

A filter is symmetric if $b_j = b_{-j}$:

$$B(L) = b_0 + \sum_{j=1}^{\infty} b_j \left( L^j + L^{-j} \right)$$

Symmetric filters are important for two reasons:

1. They do not induce a phase shift since the Fourier transform of $B(L)$ will always be a real function.

2. Corners of a $T$-segment of the series are difficult to deal with since the lag operator can only be applied to one side.

Symmetric Filters II

If $\sum_{j=-\infty}^{\infty} b_j = 0$, symmetric filters can remove trend frequency components, either deterministic or stochastic, up to second order (a quadratic deterministic trend or a double unit root):

$$\sum_{j=-\infty}^{\infty} b_j L^j = \sum_{j=1}^{\infty} b_j \left( L^j + L^{-j} - 2 \right) = -\sum_{j=1}^{\infty} b_j \left( 1 - L^j \right) \left( 1 - L^{-j} \right)$$

$$= -(1 - L)(1 - L^{-1}) \sum_{j=1}^{\infty} b_j \sum_{h=-j+1}^{j-1} (j - |h|)\, L^h = -(1 - L)(1 - L^{-1})\, B'(L)$$

where we used $(1 - L^j) = (1 - L)(1 + L + \ldots + L^{j-1})$ and

$$\left( 1 + L + \ldots + L^{j-1} \right) \left( 1 + L^{-1} + \ldots + L^{-(j-1)} \right) = \sum_{h=-j+1}^{j-1} (j - |h|)\, L^h$$

if the sum is well defined.

Ideal Filters

An ideal filter is an operator $B(L)$ such that the new process $Y$ only has positive spectrum in some specified part of the domain.

Example: for a band-pass filter on the interval $\{(a, b) \cup (-b, -a)\} \subset (-\pi, \pi)$ with $0 < a \leq b \leq \pi$, we need to choose $B(e^{-i\omega})$ such that:

$$B(e^{-i\omega}) = \begin{cases} 1, & \text{for } \omega \in (a, b) \cup (-b, -a) \\ 0, & \text{otherwise} \end{cases}$$

Since $a > 0$, this definition implies that $B(0) = 0$.

Thus, a band-pass filter shuts off all the frequencies outside the region $(a, b)$ or $(-b, -a)$ and leaves a new process that only fluctuates in the area of interest.

How Do We Build an Ideal Filter?

Since $X$ is a zero mean, covariance stationary process, $\sum_{n=-\infty}^{\infty} |x_n|^2 < \infty$, the Riesz-Fischer Theorem holds.

If we make $f(\omega) = B(e^{-i\omega})$, we can use the inversion formula to set the $b_j$:

$$b_j = \frac{1}{2\pi} \int_{-\pi}^{\pi} B(e^{-i\omega})\, e^{i\omega j} \, d\omega$$

Substituting $B(e^{-i\omega})$ by its value:

$$b_j = \frac{1}{2\pi} \int_{-\pi}^{\pi} B(e^{-i\omega})\, e^{i\omega j} \, d\omega = \frac{1}{2\pi} \left( \int_{-b}^{-a} e^{i\omega j} \, d\omega + \int_{a}^{b} e^{i\omega j} \, d\omega \right) = \frac{1}{2\pi} \int_{a}^{b} \left( e^{i\omega j} + e^{-i\omega j} \right) d\omega$$

where the second step just follows from $\int_{-b}^{-a} e^{i\omega j} \, d\omega = \int_{a}^{b} e^{-i\omega j} \, d\omega$.

Building an Ideal Filter

Now, using again Euler's Formula,

$$b_j = \frac{1}{2\pi} \int_{a}^{b} 2 \cos \omega j \, d\omega = \frac{1}{\pi j} \sin \omega j \, \Big|_{a}^{b} = \frac{\sin jb - \sin ja}{\pi j} \quad \forall j \in \mathbb{N} \setminus \{0\}$$

Since $\sin z = -\sin(-z)$,

$$b_{-j} = \frac{\sin(-jb) - \sin(-ja)}{-\pi j} = \frac{-\sin jb + \sin ja}{-\pi j} = \frac{\sin jb - \sin ja}{\pi j} = b_j$$

Also:

$$b_0 = \frac{1}{2\pi} \int_{a}^{b} 2 \cos(\omega 0) \, d\omega = \frac{b - a}{\pi}$$

and we have all the coefficients to write:

$$y_t = \left( \frac{b - a}{\pi} + \sum_{j=1}^{\infty} \frac{\sin jb - \sin ja}{\pi j} \left( L^j + L^{-j} \right) \right) x_t$$

The coefficients $b_j$ converge to zero at a sufficient rate to make the sum well defined in the reals.

Building a Low-Pass Filter

Analogously, a low-pass filter allows only the low frequencies (those in $(-b, b)$), implying a choice of $B(e^{-i\omega})$ such that:

$$B(e^{-i\omega}) = \begin{cases} 1, & \text{for } \omega \in (-b, b) \\ 0, & \text{otherwise} \end{cases}$$

Just set $a = 0$:

$$b_j = b_{-j} = \frac{\sin jb}{\pi j} \quad \forall j \in \mathbb{N} \setminus \{0\}$$

$$b_0 = \frac{b}{\pi}$$

$$y_t = \left( \frac{b}{\pi} + \sum_{j=1}^{\infty} \frac{\sin jb}{\pi j} \left( L^j + L^{-j} \right) \right) x_t$$

Building a High-Pass Filter

Finally, a high-pass filter allows only the high frequencies:

$$B(e^{-i\omega}) = \begin{cases} 1, & \text{for } \omega \in (b, \pi) \cup (-\pi, -b) \\ 0, & \text{otherwise} \end{cases}$$

This is just the complement of a low-pass $(-b, b)$:

$$y_t = \left( 1 - \frac{b}{\pi} - \sum_{j=1}^{\infty} \frac{\sin jb}{\pi j} \left( L^j + L^{-j} \right) \right) x_t$$

Finite Sample Approximations

With real, finite data, it is not possible to apply any of the previous formulae since they require an infinite number of observations.

Finite sample approximations are thus required.

We will study two approximations:

1. the Hodrick-Prescott filter

2. the Baxter-King filters.

We will be concerned with minimizing several problems:

1. Leakage: the filter passes frequencies that it was designed to eliminate.

2. Compression: the filter is less than one at the desired frequencies.

3. Exacerbation: the filter is more than one at the desired frequencies.

HP Filter

Hodrick-Prescott (HP) filter:

1. It is a coarse and relatively uncontroversial procedure to represent the behavior of macroeconomic variables and their comovements.

2. It provides a benchmark of regularities to evaluate the comparative performance of different models.

Suppose that we have $T$ observations of the stochastic process $X$, $\{x_t\}_{t=1}^{T}$.

HP decomposes the observations into the sum of a trend component, $x_t^t$, and a cyclical component, $x_t^c$:

$$x_t = x_t^t + x_t^c$$

Minimization Problem

How?
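Solve:

$$\min_{x_t^t} \sum_{t=1}^{T} \left( x_t - x_t^t \right)^2 + \lambda \sum_{t=2}^{T-1} \left[ \left( x_{t+1}^t - x_t^t \right) - \left( x_t^t - x_{t-1}^t \right) \right]^2$$

Intuition.

Meaning of $\lambda$:

1. $\lambda = 0 \Rightarrow$ trivial solution ($x_t^t = x_t$).

2. $\lambda = \infty \Rightarrow$ linear trend.

Matrix Notation

To compute the HP filter it is easier to use matrix notation, and rewrite the minimization problem as:

$$\min_{x^t} \left( x - x^t \right)' \left( x - x^t \right) + \lambda \left( A x^t \right)' \left( A x^t \right)$$

where $x = (x_1, \ldots, x_T)'$, $x^t = (x_1^t, \ldots, x_T^t)'$ and:

$$A = \begin{pmatrix} 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -2 & 1 \end{pmatrix}_{(T-2) \times T}$$

Solution

First order condition:

$$x^t - x + \lambda A'A x^t = 0$$

or

$$x^t = \left( I + \lambda A'A \right)^{-1} x$$

$I + \lambda A'A$ is a sparse (pentadiagonal) matrix, with density factor $(5T - 6)/T^2$. We can exploit this property.

Properties I

To study the properties of the HP filter it is, however, more convenient to stick with the original notation.

We write the first-order condition of the minimization problem as:

$$0 = -2 \left( x_t - x_t^t \right) + 2\lambda \left[ \left( x_t^t - x_{t-1}^t \right) - \left( x_{t-1}^t - x_{t-2}^t \right) \right] - 4\lambda \left[ \left( x_{t+1}^t - x_t^t \right) - \left( x_t^t - x_{t-1}^t \right) \right] + 2\lambda \left[ \left( x_{t+2}^t - x_{t+1}^t \right) - \left( x_{t+1}^t - x_t^t \right) \right]$$

Now, grouping coefficients and using the lag operator $L$:

$$x_t = x_t^t + \lambda \left[ L^2 - 4L + 6 - 4L^{-1} + L^{-2} \right] x_t^t$$

$$= \left[ \lambda L^{-2} - 4\lambda L^{-1} + (6\lambda + 1) - 4\lambda L + \lambda L^2 \right] x_t^t$$

$$= \left[ 1 + \lambda (1 - L)^2 \left( 1 - L^{-1} \right)^2 \right] x_t^t = F(L)\, x_t^t$$

Properties II

Define the operator $C(L)$ as:

$$C(L) = \left( F(L) - 1 \right) F(L)^{-1} = \frac{\lambda (1 - L)^2 \left( 1 - L^{-1} \right)^2}{1 + \lambda (1 - L)^2 \left( 1 - L^{-1} \right)^2}$$

Now, if we let, for convenience, $T = \infty$, and since

$$x_t^t = B(L)\, x_t$$

$$x_t^c = \left( 1 - B(L) \right) x_t$$

we can see that

$$B(L) = F(L)^{-1}$$

$$1 - B(L) = 1 - F(L)^{-1} = \left( F(L) - 1 \right) F(L)^{-1} = C(L)$$

i.e., the cyclical component $x_t^c$ is equal to:

$$x_t^c = \frac{\lambda (1 - L)^2 \left( 1 - L^{-1} \right)^2}{1 + \lambda (1 - L)^2 \left( 1 - L^{-1} \right)^2}\, x_t$$

a stationary process if $x_t$ is integrated up to fourth order.

Properties III

Remember that:

$$s_{x^c}(\omega) = \left| 1 - B(e^{-i\omega}) \right|^2 s_x(\omega) = \left| C(e^{-i\omega}) \right|^2 s_x(\omega)$$

Evaluating $C(L)$ at $e^{-i\omega}$ and taking the modulus, the gain of the cyclical component is:

$$G_c(\omega) = \left| \frac{\lambda \left( 1 - e^{-i\omega} \right)^2 \left( 1 - e^{i\omega} \right)^2}{1 + \lambda \left( 1 - e^{-i\omega} \right)^2 \left( 1 - e^{i\omega} \right)^2} \right| = \frac{4\lambda \left( 1 - \cos(\omega) \right)^2}{1 + 4\lambda \left( 1 - \cos(\omega) \right)^2}$$

where we use the identity $\left( 1 - e^{-i\omega} \right) \left( 1 - e^{i\omega} \right) = 2 \left( 1 - \cos(\omega) \right)$.

This function gives zero weight to the zero frequency and a weight close to unity to high frequencies, with increases in $\lambda$ moving the curve to the left.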
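The matrix solution and the gain formula above can be sketched in a few lines. The function names `hp_filter` and `hp_cycle_gain` are illustrative, the default $\lambda = 1600$ is the conventional choice for quarterly data rather than something fixed by the derivation, and the dense solve is an assumption made for brevity: for long samples a banded or sparse solver would exploit the pentadiagonal structure.

```python
import numpy as np

def hp_filter(x, lam=1600.0):
    """Split x into (trend, cycle) by solving (I + lam * A'A) trend = x."""
    x = np.asarray(x, dtype=float)
    T = x.size
    # A is the (T-2) x T second-difference matrix with rows (1, -2, 1).
    A = np.zeros((T - 2, T))
    for i in range(T - 2):
        A[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Dense solve for clarity only; the system matrix is pentadiagonal.
    trend = np.linalg.solve(np.eye(T) + lam * A.T @ A, x)
    return trend, x - trend

def hp_cycle_gain(omega, lam=1600.0):
    """Gain of the cyclical component: q / (1 + q), q = 4 lam (1 - cos w)^2."""
    q = 4.0 * lam * (1.0 - np.cos(omega)) ** 2
    return q / (1.0 + q)

# A purely linear series has zero second differences, so it is all trend.
line = 2.0 + 0.5 * np.arange(40.0)
trend, cycle = hp_filter(line)

# Gain over the business-cycle band (periods of 32 and 6 quarters).
band_gain = hp_cycle_gain(np.array([2 * np.pi / 32, 2 * np.pi / 6]))
```

In this sketch the linear series is returned entirely as trend, the gain is exactly zero at $\omega = 0$, and it rises monotonically toward one as $\omega$ approaches $\pi$.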
Finally, note that since the gain is real, $\phi(\omega) = 0$ and the series is not translated in time.

Butterworth Filters

Filters with gain:

$$G(\omega) = \left[ 1 + \left( \frac{\sin\left( \frac{\omega}{2} \right)}{\sin\left( \frac{\omega_0}{2} \right)} \right)^{2d} \right]^{-1}$$

that depends on two parameters: $d$, a positive integer, and $\omega_0$, which sets the frequency for which the gain is 0.5.

Higher values of $d$ make the slope of the band more vertical while smaller values of $\omega_0$ narrow the width of the filter band.

Butterworth Filters and the HP Filter

Using the fact that $1 - \cos(\omega) = 2 \sin^2\left( \frac{\omega}{2} \right)$, the HP gain is:

$$G_c(\omega) = 1 - \frac{1}{1 + 16\lambda \sin^4\left( \frac{\omega}{2} \right)}$$

Since $B(L) = 1 - C(L)$, the gain of the trend component is just:

$$G_{x^t}(\omega) = 1 - G_c(\omega) = \frac{1}{1 + 16\lambda \sin^4\left( \frac{\omega}{2} \right)}$$

a particular case of a Butterworth filter of the sine version with $d = 2$ and $\omega_0 = 2 \arcsin\left( \frac{1}{2\lambda^{0.25}} \right)$.

Ideal Filter Revisited

Recall that in our discussion about the ideal filter $y_t = \sum_{j=-\infty}^{\infty} b_j L^j x_t$ we derived the formulae:

$$b_j = b_{-j} = \frac{\sin jb - \sin ja}{\pi j} \quad \forall j \in \mathbb{N}$$

$$b_0 = \frac{b - a}{\pi}$$

for the frequencies $a = \frac{2\pi}{p_u}$ and $b = \frac{2\pi}{p_l}$, where $2 \leq p_l < p_u < \infty$ are the lengths of the fluctuations of interest.

Although these expressions cannot be evaluated for all integers, they suggest the feasibility of a more direct approach to band-pass filtering using some form of these coefficients.

Baxter and King (1999)

Take the first $k$ coefficients of the above expression.

Then, since we want a gain that implies zero variance at $\omega = 0$, we normalize each coefficient as:

$$\tilde{b}_j = b_j - \frac{b_0 + 2 \sum_{h=1}^{k} b_h}{2k + 1}$$

to get that $\sum_{j=-k}^{k} \tilde{b}_j = 0$, as originally proposed by Craddock (1957).

This strategy is motivated by the remarkable result that the solution to the problem of minimizing

$$Q = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| B(e^{-i\omega}) - B_k(e^{-i\omega}) \right|^2 d\omega$$

where $B(\cdot)$ is the ideal band-pass filter and $B_k(\cdot)$ is its $k$-approximation, is simply to take the first $k$ coefficients $b_j$ from the inversion formula and to make all the higher order coefficients zero.

To see that, write the minimization problem s.t. $\sum_{j=-k}^{k} \tilde{b}_j = 0$ as:

$$\mathcal{L} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Psi(\omega, \tilde{b})\, \Psi(\omega, \tilde{b})' \, d\omega - \lambda \sum_{j=-k}^{k} \tilde{b}_j$$

where:

$$\Psi(\omega, \tilde{b}) = \left[ B(e^{-i\omega}) - \sum_{j=-k}^{k} \tilde{b}_j e^{-i\omega j} \right]$$

First order conditions:

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ e^{-i\omega j}\, \Psi(\omega, \tilde{b})' + \Psi(\omega, \tilde{b})\, e^{i\omega j} \right] d\omega + \lambda = 0$$

$$0 = \sum_{j=-k}^{k} \tilde{b}_j$$

for $j = 1, \ldots, k$.

Since the harmonics $e^{-i\omega j}$ are orthonormal:

$$0 = 2 \left( b_j - \tilde{b}_j \right) + \lambda$$

$$\lambda = -\frac{2 \sum_{j=-k}^{k} b_j}{2k + 1}$$

for $j = 1, \ldots, k$, which provides the desired result.

Notice that, since the ideal filter is a step function, this approach suffers from the same Gibbs phenomenon that arises in Fourier analysis (Koopmans, 1974).
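The truncation and normalization steps can be sketched as follows. The function name `baxter_king_weights` and the choice of periods 6 to 32 quarters with $k = 12$ are illustrative assumptions; the weights follow the ideal band-pass formulae for $b_0$ and $b_j$ and the adjustment that forces the $2k + 1$ weights to sum to zero.

```python
import numpy as np

def baxter_king_weights(p_low, p_high, k):
    """Truncated, re-normalized band-pass weights (b_{-k}, ..., b_0, ..., b_k).

    p_low < p_high are the shortest and longest periods to keep, so the
    pass band is (a, b) = (2*pi/p_high, 2*pi/p_low).
    """
    a, b = 2 * np.pi / p_high, 2 * np.pi / p_low
    j = np.arange(1, k + 1)
    ideal = np.empty(k + 1)
    ideal[0] = (b - a) / np.pi
    ideal[1:] = (np.sin(j * b) - np.sin(j * a)) / (np.pi * j)
    # Subtract the average weight so that the 2k+1 weights sum to zero.
    adjustment = (ideal[0] + 2 * ideal[1:].sum()) / (2 * k + 1)
    half = ideal - adjustment
    # Mirror the weights for j = 1..k to obtain the symmetric two-sided filter.
    return np.concatenate([half[:0:-1], half])

def frequency_response(weights, omega):
    """B(e^{-i omega}) for symmetric weights ordered from -k to k (real-valued)."""
    k = (len(weights) - 1) // 2
    j = np.arange(-k, k + 1)
    return float(np.sum(weights * np.cos(omega * j)))

weights = baxter_king_weights(6, 32, 12)
gain_at_zero = frequency_response(weights, 0.0)
```

By construction the weights are symmetric and sum to zero, so `gain_at_zero` vanishes up to rounding error. Evaluating `frequency_response` on a grid of frequencies is one way to inspect the leakage, compression and exacerbation produced by the truncation.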