Forecastable Component Analysis

Georg M. Goerg (gmg@stat.cmu.edu)
Carnegie Mellon University, Department of Statistics, Pittsburgh, PA 15213

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).

Abstract

I introduce Forecastable Component Analysis (ForeCA), a novel dimension reduction technique for temporally dependent signals. Based on a new forecastability measure, ForeCA finds an optimal transformation to separate a multivariate time series into a forecastable and an orthogonal white noise space. I present a converging algorithm with a fast eigenvector solution. Applications to financial and macro-economic time series show that ForeCA can successfully discover informative structure, which can be used for forecasting as well as classification.

The R package ForeCA accompanies this work and is publicly available on CRAN.

1. Introduction

With the rise of high-dimensional datasets it has become important to perform dimension reduction (DR) to a lower dimensional representation of the data. For simplicity we consider linear transformations $W \in \mathbb{R}^{k \times n}$, which map an $n$-dimensional $X$ to a $k \leq n$ dimensional $S = WX$. Typically, the transformed data should be somewhat "interesting"; there is no point in transforming $X$ to an arbitrary $S$ that is less useful, meaningful, etc. Let $\iota(S)$ measure the "interestingness" of $S$. DR can then be set up as an optimization problem

$$\widehat{w}_j = \arg\max_{w \in \mathbb{R}^{n \times 1}} \iota\left(w^\top X\right), \quad j = 1, \ldots, k, \tag{1}$$

$$\text{subject to} \quad w_j^\top X \perp \{w_1^\top X, \ldots, w_{j-1}^\top X\}, \tag{2}$$

where (2) is a common DR constraint, which makes $S_j = w_j^\top X$ orthogonal (uncorrelated) to previously obtained signals. For example, principal component analysis (PCA) keeps large variance signals (Jolliffe, 2002), that is, $\iota(X) = E(X - EX)^2$ in (1); independent component analysis (ICA) recovers statistically independent signals (Hyvärinen and Oja, 2000); slow feature analysis (SFA) (Wiskott and Sejnowski, 2002) finds "slow" signals and is equivalent to maximizing the lag 1 autocorrelation coefficient.

DR techniques are often applied to multivariate time series $X_t$, hoping that forecasting on the lower-dimensional space $S_t$ is more accurate, simpler, more efficient, etc. Standard DR techniques such as PCA or ICA, however, do not explicitly address forecastability of the sources. For example, just because a signal has high variance does not mean it is easy to forecast.

Thus let's define interesting as being predictable. Forecasting is not only good for its own sake (finance, economics), but even when future values are not immediately interesting, signals that do have predictive power exhibit non-trivial structure by definition and are thus easier to interpret. For example, the time series in Fig. 1 are ordered from least (S&P 500 daily returns) to most forecastable (monthly temperature in Nottingham) according to the ForeCA forecastability measure $\Omega(x_t)$ I propose in Definition 3.1 below. And indeed, moving from left to right they exhibit more structure.

The main contributions of this work are i) a model-free, comparable measure of forecastability for (stationary) time series (Section 3), ii) a novel data-driven DR technique, ForeCA, that finds forecastable signals, iii) an iterative algorithm that provably converges to (local) optima using fast eigenvector solutions (Section 4), and iv) applications showing that ForeCA outperforms traditional DR techniques in finding low-dimensional, forecastable subspaces, and that it can also be used for time series classification (Section 5).

Related work will be reviewed in Section 6. All computations and simulations were done in R (R Development Core Team, 2010).

2. Time Series Preliminaries

Let $y_t$ be a univariate, second-order stationary time series with mean $E y_t = \mu_y < \infty$, variance $V y_t = \sigma_y^2$, and autocovariance function (ACVF)

$$\gamma_y(k) = E\,(y_t - \mu_y)(y_{t-k} - \mu_y), \quad k \in \mathbb{Z}. \tag{3}$$

Figure 1. Observations (top); sample ACF $\widehat{\rho}(k)$ (middle); smoothed WOSA spectral density estimate (bottom). From left to right: i) S&P 500 daily returns; ii) Mount Campito tree ring series; iii) monthly mean temperatures in Nottingham. Panel annotations, left to right: $\widehat{\Omega} = 1.25\%$, $14.99\%$, $34.37\%$. Data publicly available in R packages: SP500 in MASS; camp in tseries; nottem in datasets.

The ACVF for univariate processes is symmetric in $k$, $\gamma_y(k) = \gamma_y(-k)$. Let $\rho(k) = \gamma(k)/\gamma(0)$ be the autocorrelation function (ACF). A large $\rho(k)$ means that the process $k$ time steps ago is highly correlated with the present $y_t$. The sample ACFs $\widehat{\rho}(k)$ in Fig. 1 show that, e.g., S&P 500 daily returns are uncorrelated with their own past (stock market efficiency); yearly tree ring growth is highly correlated over time with significant lags even for $k \geq 100$ years; and intuitively temperature in month $t$ is highly correlated with the temperature $k = 6$ (cold ↔ warm) and $k = 12$ (cold → cold; warm → warm) months ago (or in the future).

The building block of time series models is white noise $\varepsilon_t$, which has zero mean, finite variance, and is uncorrelated over time: $\varepsilon_t \sim WN(0, \sigma_\varepsilon^2)$ iff¹ i) $E\varepsilon_t = 0$, ii) $V\varepsilon_t = \gamma_\varepsilon(0) = \sigma_\varepsilon^2$, and iii) $\gamma_\varepsilon(k) = 0$ if $k \neq 0$.
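As a small numerical illustration of the ACF defined above, the following sketch computes a sample ACF with the common $1/T$ normalization. This normalization is an assumption, since the text does not say which sample estimator is used. The function name `sample_acf` and the synthetic 12-step cosine series are hypothetical and are not part of the accompanying R package.

```python
import math


def sample_acf(y, max_lag):
    """Sample autocorrelation rho_hat(k) for k = 0, ..., max_lag (1/T normalization)."""
    T = len(y)
    mean = sum(y) / T
    centered = [v - mean for v in y]

    def acvf(k):
        # Sample autocovariance at lag k: averaged products of values k steps apart.
        return sum(centered[t] * centered[t - k] for t in range(k, T)) / T

    gamma0 = acvf(0)
    return [acvf(k) / gamma0 for k in range(max_lag + 1)]


# Synthetic "monthly" series with a 12-step cycle (a stand-in for seasonal data).
seasonal = [math.cos(2 * math.pi * t / 12) for t in range(120)]
rho = sample_acf(seasonal, 12)
# rho[6] is strongly negative (half a cycle apart) and rho[12] strongly
# positive (a full cycle apart), mirroring the k = 6 and k = 12 discussion.
```

For a series that behaves like white noise, all entries of `rho` beyond lag 0 would be expected to stay close to zero.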
Only if $\varepsilon_t$ is a Gaussian process is it also independent.

For multivariate second-order stationary $X_t$ with mean² $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma_X$ the ACVF

$$\Gamma_X(k) = E\,(X_t - \mu)(X_{t-k} - \mu)^\top \in \mathbb{R}^{n \times n}, \tag{4}$$

is a matrix-valued function of $k \in \mathbb{Z}$. In particular, $\Gamma_X(0) = \Sigma_X$. The diagonal of $\Gamma_X(k)$ contains the ACVF of each $X_i(t)$; the off-diagonal element $\Gamma_X(k)_{(i,j)}$ is the cross-covariance between the $i$th and $j$th series at lag $k$:

$$\gamma_{ij}(k) = E\,(X_{i,t} - \mu_i)(X_{j,t-k} - \mu_j) \in \mathbb{R}. \tag{5}$$

Contrary to $\gamma_y(k)$, $\Gamma_X(k)$ is not symmetric, but

$$\Gamma_X(k) = \Gamma_X(-k)^\top. \tag{6}$$

¹ Iff will be used as an abbreviation for if and only if.
² Without loss of generality (WLOG) assume $\mu = 0$.

2.1. Spectrum and Spectral Density

The spectrum of a univariate stationary process can be defined as the Fourier transform of its ACVF,

$$S_y(\lambda) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \gamma_y(j)\, e^{ij\lambda}, \quad \lambda \in [-\pi, \pi], \tag{7}$$

where $i = \sqrt{-1}$ is the imaginary unit. Since $\gamma_y(k)$ is symmetric, the spectrum is a real-valued, non-negative function, $S_y : [-\pi, \pi] \to \mathbb{R}^+$. For white noise $\varepsilon_t$ all $\gamma_\varepsilon(k) = 0$ if $k \neq 0$, thus $S_\varepsilon(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}$ is constant for all $\lambda \in [-\pi, \pi]$. When $\gamma(k) > 0$ for $k \neq 0$ the spectrum has peaks at the corresponding frequencies. For example, the spectral density of the monthly temperature series (right in Fig. 1) has large peaks at $\lambda \approx \pi/6$ and $\pi/12$, which represent the half- and one-year cycle.³

³ Frequencies $\lambda$ are often scaled by $\pi$, $\tilde{\lambda} = \lambda/\pi$. This does not change results qualitatively, but simplifies interpretation since the corresponding cycle length equals $\tilde{\lambda}^{-1}$.

Vice versa, the ACVF can be recovered from the spectrum using the inverse Fourier transform

$$\gamma_y(k) = \int_{-\pi}^{\pi} S_y(\lambda)\, e^{-ik\lambda}\, d\lambda, \quad k \in \mathbb{Z}. \tag{8}$$

In particular, $\int_{-\pi}^{\pi} S_y(\lambda)\, d\lambda = \sigma_y^2$ for $k = 0$. Let

$$f_y(\lambda) = \frac{S_y(\lambda)}{\sigma_y^2} = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \rho_y(j)\, e^{ij\lambda}, \tag{9}$$

be the spectral density of $y_t$. As $f_y(\lambda) \geq 0$ and $\int_{-\pi}^{\pi} f_y(\lambda)\, d\lambda = 1$, the spectral density can be interpreted as a probability density function (pdf) of an (unobserved) random variable (RV) $\Lambda$ that "lives" on the unit circle. For white noise $f_\varepsilon(\lambda) = \frac{1}{2\pi}$, which represents the uniform distribution $U(-\pi, \pi)$.

Remark 2.1 (Spectrum and spectral density). In the time series literature "spectrum" and "spectral density" are often used interchangeably. Here I reserve "spectral density" for $f_y(\lambda)$ in (9), as it integrates to one like a standard probability density function.

3. Measuring Forecastability

Forecasting is inherently tied to the time domain. Yet, since Eqs. (7) and (8) provide a one-to-one mapping between the time and frequency domain, we can use frequency domain properties to measure forecastability.

The intuition for the proposed measure of forecastability is as follows. Consider

$$y_t = \sqrt{2} \cos(2\pi Y t + \theta), \quad \theta \sim U(-\pi, \pi), \quad Y \sim p_y(y) \text{ independent of } \theta. \tag{10}$$

One can show that $S_y(\lambda) = p_y(\lambda)$ (Gibson, 1994). If we have to predict the future of $y_t$, then uncertainty about $y_{t+h}$, $h > 0$, is only manifested in uncertainty about $Y$, since $\cos(2\pi Y t + \theta)$ is a deterministic function of $t$: less uncertainty about $Y$ means less uncertainty about $y_{t+h}$. We can measure this uncertainty using the Shannon entropy of $p_y(y)$ (Shannon, 1948).

It is thus natural to measure uncertainty about the future as the (differential) entropy of $f_y(\lambda)$,

$$H_{s,a}(y_t) := -\int_{-\pi}^{\pi} f_y(\lambda) \log_a f_y(\lambda)\, d\lambda, \tag{11}$$

where $a > 0$ is the logarithm base.

On a finite support $[b, c]$ the maximum entropy occurs for the uniform distribution $U(b, c)$; thus a flat spectrum should indicate the least predictable sequence. And indeed, a flat spectrum corresponds to white noise, which is unpredictable by definition (using linear predictors). Consequently, for any stationary $y_t$

$$H_{s,a}(y_t) \leq H_{s,a}(\text{white noise}) = -\int_{-\pi}^{\pi} \frac{1}{2\pi} \log_a \frac{1}{2\pi}\, d\lambda = \log_a 2\pi,$$

with equality iff $y_t$ is white noise.

Definition 3.1 (Forecastability of a stationary process). For a second-order stationary process $y_t$, let $\Omega : y_t \mapsto [0, \infty]$,

$$\Omega(y_t) = 1 - \frac{H_{s,a}(y_t)}{\log_a(2\pi)} = 1 - H_{s,2\pi}(y_t), \tag{12}$$

be the forecastability of $y_t$.

Contrary to other measures in the signal processing and time series literature, $\Omega(y_t)$ does not require actual forecasts, but is a characteristic of the process $y_t$. It is therefore not biased toward a particular, perhaps suboptimal, model, forecast horizon, or loss function, as used in, e.g., Box and Tiao (1977); Stone (2001).

Properties 3.2. $\Omega(y_t)$ satisfies:

a) $\Omega(y_t) = 0$ iff $y_t$ is white noise.
b) invariant to scaling and shifting: $\Omega(a y_t + b) = \Omega(y_t)$ for $a, b \in \mathbb{R}$, $a \neq 0$.
c) max sub-additivity for uncorrelated processes:

$$\Omega\left(\alpha x_t + \sqrt{1 - \alpha^2}\, y_t\right) \leq \max\{\Omega(x_t), \Omega(y_t)\}, \tag{13}$$

if $E x_t y_s = 0$ for all $s, t \in \mathbb{Z}$; equality iff $\alpha \in \{0, 1\}$.

The three series in Fig. 1 are ordered (left to right) by increasing forecastability, and indeed larger $\widehat{\Omega}$ correspond to intuitively more predictable real-world events: stock returns are in general not predictable; average monthly temperature is. We can thus use (12) to guide the search for optimal $w$ that make $y_t = w^\top X_t$ as forecastable as possible.

3.1. Plug-in Estimator for Ω

To estimate $\Omega(y_t)$, we first estimate $S_y(\lambda)$, normalize it, and then plug it into (11). An (asymptotically) unbiased estimator of $S_y(\lambda)$ is the periodogram

$$I_{T, y_1^T}(\omega_j) = \left| \frac{1}{\sqrt{T}} \sum_{t=0}^{T-1} y_t\, e^{-2\pi i \omega_j t} \right|^2, \tag{14}$$

where $\omega_j = j/T$, $j = 0, 1, \ldots, T-1$ are the (scaled) Fourier frequencies, and $y_1^T = \{y_1, \ldots, y_T\}$ is a sample of $y_t$. It is well known that (14) is not a good estimate (e.g., periodograms are not consistent). In the numerical examples we therefore use weighted overlapping segment averaging (WOSA) (Nuttal and Carter, 1982) $\widehat{S}_y(\omega_j)$ from the R package sapa: SDF(y, "wosa").

The bottom row of Figure 1 shows the normalized $\widehat{f}_{j,y} = \widehat{S}_y(\omega_j) \big/ \sum_{j=0}^{T-1} \widehat{S}_y(\omega_j)$ along with the plug-in estimate

$$\widehat{\Omega}(y_1^T) = 1 + \sum_{j=0}^{T-1} \widehat{f}_{j,y} \cdot \log_{a=T} \widehat{f}_{j,y}. \tag{15}$$

Remark 3.3. Typically, to estimate $E g(X)$ for $X \sim p(x)$ (here: $g(X) = \log p(X)$) the sample average is solely over $g(x_j)$ without multiplicative $p(x_j)$ terms.
This, however, assumes that each $x_j$ is sampled from $p(x)$ (and thus $\frac{1}{n} \sum_{i=1}^{n} g(x_i) \to E_p\, g(X) = \int g(x) p(x)\, dx$ by the strong law of large numbers). While this is true in a standard sampling framework, here the "data" are the Fourier frequencies $\omega_j$, and the fast Fourier transform (FFT) samples them uniformly (and deterministically) from $[-\pi, \pi]$ and not according to the "true" spectral density $f(\lambda)$.⁴

Eq. (15) can be improved by better spectral density estimation (Fryzlewicz, Nason, and von Sachs, 2008; Lees and Park, 1995; Trobs and Heinzel, 2006) and better entropy estimation (Paninski, 2003). Future research can also address direct estimation of (11), as is common for classic entropy estimates (Sricharan, Raich, and Hero, 2011; Stowell and Plumbley, 2009). However, since neither spectrum nor entropy estimation is the primary focus of this work, we use standard estimators for $S_y(\lambda)$ and then the plug-in estimator of (15).

It must be noted though that $\widehat{\Omega}(y_1^T)$ in (15) is based on discrete rather than differential entropy. It still has the intuitive property that white noise has zero estimated forecastability, but now $\widehat{\Omega}(y_1^T) \in [0, 1]$; $\widehat{\Omega}(y_1^T) = 1$ iff the sample is a perfect sinusoid. Applications show that (15) yields reasonable estimates and we do not expect the results to change qualitatively for other estimators. We leave differential entropy estimates of $\Omega$ to future work.

Notice that $\Omega(y_t)$ relies on Gaussianity, as only then does $f_y(\lambda)$ capture all the temporal dependence structure of $y_t$. While time series are often non-Gaussian, $\Omega(\cdot)$ is a computationally and algebraically manageable forecastability measure, similar to the role of variance in PCA for iid data, even though such data are rarely Gaussian.

4. ForeCA: Maximizing Forecastability

Recall from Eq. (1) that we want to find a linear combination of a multivariate $X_t$ that makes $y_t = w^\top X_t$ as forecastable as possible.
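To make the forecastability measure concrete before it is optimized, the plug-in estimate (15) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of the ForeCA R package: it uses the raw periodogram (14) instead of the WOSA estimate used in the paper, computes the Fourier transform by a plain $O(T^2)$ sum to stay dependency-free, and the names `raw_periodogram` and `forecastability` are hypothetical.

```python
import cmath
import math
import random


def raw_periodogram(y):
    """Periodogram of the centered series at omega_j = j/T, j = 0, ..., T-1."""
    T = len(y)
    mean = sum(y) / T
    centered = [v - mean for v in y]
    return [abs(sum(centered[t] * cmath.exp(-2j * math.pi * j * t / T)
                    for t in range(T))) ** 2 / T
            for j in range(T)]


def forecastability(y):
    """Plug-in estimate 1 + sum_j f_j * log_T(f_j) with f_j the normalized spectrum."""
    T = len(y)
    spec = raw_periodogram(y)
    total = sum(spec)
    f_hat = [s / total for s in spec]
    # Terms with f_j = 0 contribute nothing (0 * log 0 is taken as 0).
    return 1.0 + sum(p * math.log(p, T) for p in f_hat if p > 0.0)


T = 256
sinusoid = [math.cos(2 * math.pi * 8 * t / T) for t in range(T)]
random.seed(0)  # fixed seed so the simulated white noise is reproducible
white_noise = [random.gauss(0.0, 1.0) for _ in range(T)]

omega_sine = forecastability(sinusoid)
omega_noise = forecastability(white_noise)
```

In this two-sided sketch a real sinusoid at a Fourier frequency puts half of its mass on each of two mirrored frequencies, so `omega_sine` equals $1 - \log_T 2$ (0.875 for $T = 256$) rather than exactly one. The raw periodogram of simulated white noise is not flat, so `omega_noise` is small but not zero; a smoothed estimate such as WOSA would be expected to bring it closer to zero.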
Based on the forecastability measure in Section 3, we can now formally define the ForeCA optimization problem:

$$\max_w \Omega(w^\top X_t) = \max_w \left( 1 + \frac{\int_{-\pi}^{\pi} f_y(\lambda) \log_a f_y(\lambda)\, d\lambda}{\log_a(2\pi)} \right), \tag{16}$$

$$\text{subject to} \quad w^\top \Sigma_X w = 1, \tag{17}$$

where (17) must hold since (11) uses the spectral density of $y_t$, i.e., we need $V y_t = w^\top \Sigma_X w = 1$.

Property 3.2c seems to let (16) only have a trivial boundary solution. However, it is intuitively clear that combining uncorrelated series makes forecasting (in general) more difficult, e.g., signal + noise. But if $E x_t y_s \neq 0$ for some $s, t \in \mathbb{Z}$, then combining them can make it simpler: for some $\alpha \in (0, 1)$ it holds that $\Omega(\alpha x_t + \sqrt{1 - \alpha^2}\, y_t) > \max\{\Omega(x_t), \Omega(y_t)\}$.

To optimize the right hand side of (16) we need to evaluate $f_y(\lambda) = f_{w^\top X_t}(\lambda)$ for various $w$ and do this efficiently. We now show how to obtain $f_y(\lambda)$ by simple matrix-vector multiplication from $f_X(\lambda)$.

4.1. Spectrum of Multivariate Time Series and Their Linear Combinations

For multivariate $X_t$ the spectrum equals

$$S_X(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \Gamma_X(k)\, e^{ik\lambda}, \quad \lambda \in [-\pi, \pi]. \tag{18}$$

Contrary to the univariate case, (18) is in general complex-valued. Yet, since $\Gamma_X(k) = \Gamma_X(-k)^\top$, $S_X(\lambda) \in \mathbb{C}^{n \times n}$ is Hermitian for every $\lambda$, $S_X(\lambda) = \overline{S_X(\lambda)}^\top$, where $\bar{z} = a - ib$ is the complex conjugate of $z = a + ib \in \mathbb{C}$ (Brockwell and Davis, 1991, p. 436).

For dimension reduction we consider linear combinations $y_t = w^\top X_t$, $w \in \mathbb{R}^n$. By assumption $E y_t = w^\top E X_t = 0$ and $\gamma_y(k) = E y_t y_{t-k} = w^\top \Gamma_X(k) w$. In particular, $\gamma_y(0) = \sigma_y^2 = w^\top \Sigma_X w$. The spectrum of $w^\top X_t$ can be quickly computed via $S_y(\lambda) = w^\top S_X(\lambda) w$ and consequently

$$f_y(\lambda) = \frac{w^\top S_X(\lambda) w}{w^\top \Sigma_X w}, \quad \lambda \in [-\pi, \pi]. \tag{19}$$

Since $f_y(\lambda) \geq 0$ for every $y_t$, $w^\top S_X(\lambda) w \geq 0$ for all $w \in \mathbb{R}^n$; thus $S_X(\lambda)$ is positive semi-definite.

⁴ Advances in "compressed sensing" (Jacques and Vandergheynst, 2010) might improve estimates; see also "nonuniform FFT" (Fessler and Sutton, 2003).

4.2. Solving the Optimization Problem

Since $\Omega$ is invariant to shift and scale (Property 3.2b), we shall not only assume zero mean, but also contemporaneously uncorrelated observed signals with unit variance in each component. WLOG consider $U_t = \Sigma_X^{-1/2} X_t$; thus $E U_t U_t^\top = I_n$. Given $\widehat{W}_U$ for $U_t$, the transformation for $X_t$ becomes $\widehat{W}_X = \widehat{W}_U \widehat{\Sigma}_X^{-1/2}$.

Problem (16) is then equivalent to

$$w^* = \arg\min_{w,\, \|w\|_2 = 1} h(w), \tag{20}$$

where

$$h(w) = -\int_{-\pi}^{\pi} w^\top S_U(\lambda) w \cdot \ell(w; \lambda)\, d\lambda, \tag{21}$$

is the spectral entropy (Eq. (11)) of $w^\top U_t$ as a function of $w$. We use $\ell(w; \lambda) := \log w^\top S_U(\lambda) w = \log f_{w^\top U}(\lambda)$ for better readability.

In practice we approximate (21) with $\widehat{S}_U(\omega_j) \in \mathbb{C}^{n \times n}$ and thus obtain⁵

$$w^* = \arg\min_{w,\, \|w\|_2 = 1} \widehat{h}_T(w). \tag{22}$$

Here

$$\widehat{h}_T(w) = -\frac{1}{T} \sum_{j=1}^{T-1} w^\top \widehat{S}_U(\omega_j) w \cdot \widehat{\ell}(w; \omega_j) \tag{23}$$

is the discretized version of (21), where $\widehat{\ell}(w; \omega_j) = \log w^\top \widehat{S}_U(\omega_j) w$. Notice that $\widehat{S}_U(\omega_j) \in \mathbb{C}^{n \times n}$ varies with $\omega_j$ while $w \in \mathbb{R}^n$ is fixed over all frequencies, which makes it difficult to obtain an analytic, closed-form solution. However, (22) can be solved iteratively, borrowing ideas from the expectation maximization (EM) algorithm (Dempster, Laird, and Rubin, 1977).

⁵ We use "wosa" estimates (sapa R package). However, any other estimate of $S_U(\lambda)$ can be used.

4.2.1. A Convergent EM-like Algorithm

For every $w \in \mathbb{R}^n$, $\|w\|_2 = 1$, $h(w)$ has the form of a mixture model with weights $\widehat{\pi}(j \mid w) := w^\top \widehat{S}_U(\omega_j) w \geq 0$ and "log-likelihood" $\ell(w; \omega_j)$. Since $\int_{-\pi}^{\pi} f_{w^\top U}(\lambda)\, d\lambda = 1$, $\widehat{\pi}(j \mid w)$ is indeed a discrete probability distribution over $\{\omega_j \mid j = 0, \ldots, T-1\}$.

Just as in an EM algorithm, the objective $h(w)$ can be optimized iteratively by first fixing $w \leftarrow w^{(i)}$ in $\widehat{\ell}(w; \omega_j)$, and then minimizing the quadratic form

$$w_{i+1} = \arg\min_{w,\, \|w\|_2 = 1} w^\top \widehat{S}_U^{(i)} w, \tag{24}$$

where $\widehat{S}_U^{(i)} = -\frac{1}{T} \sum_{j=0}^{T-1} \widehat{S}_U(\omega_j) \cdot \widehat{\ell}(w_i; \omega_j)$.

Proposition 4.1. $\widehat{S}_U^{(i)}$ is positive semi-definite.

Thus (24) can be solved analytically by the last eigenvector of $\widehat{S}_U^{(i)}$, automatically guaranteeing $\|w\|_2 = 1$. The procedure iterates until $\|w_{i+1} - w_i\| < \mathrm{tol}$ for some tolerance level tol.
For initialization we sample $w_0$ from an $n$-dimensional uniform hyper-cube, $U_n(-1, 1)$, and normalize to $w_0 = w_0 \big/ \sqrt{\sum_{j=1}^{n} w_{j,0}^2}$.

Theorem 4.2 (Convergence). The sequence $\{w_i\}_{i \geq 0}$ obtained via (24) converges to a local minimum $\widehat{h}_T(w^*) = \lambda_{\min}^{(*)} \geq 0$, where $\lim_{i \to \infty} w_i = w^*$ and $\lambda_{\min}^{(*)}$ is the smallest eigenvalue of $\widehat{S}_U^{(*)}$.

Corollary 4.3. The transformed data $y_1^{T,(*)} = w^{(*)\top} X_1^T$ satisfies

$$\widehat{\Omega}\left(y_1^{T,(*)}\right) = 1 - \lambda_{\min}^{*}. \tag{25}$$

Proof of Theorem 4.2. The entropy of a RV taking values in a finite alphabet $\{\omega_0, \ldots, \omega_{T-1}\}$ is bounded: $0 \leq \widehat{h}_T(w) \leq \log_a T$ for all $w \in \mathbb{R}^n$. For convergence it remains to be shown that $\widehat{h}_T(w_i) \geq \widehat{h}_T(w_{i+1})$ with equality iff $w_{i+1} = w_i = w^*$. First,

$$\widehat{h}_T(w_i) = -\frac{1}{T} \sum_{j=1}^{T-1} w_i^\top \widehat{S}_U(\omega_j) w_i \cdot \widehat{\ell}(w_i; \omega_j) = w_i^\top \widehat{S}_U^{(i)} w_i \geq w_{i+1}^\top \widehat{S}_U^{(i)} w_{i+1}, \tag{26}$$

since $w_{i+1}$ is the last eigenvector of $\widehat{S}_U^{(i)}$. Second,

$$w_{i+1}^\top \widehat{S}_U^{(i)} w_{i+1} = -\frac{1}{T} \sum_{j=1}^{T-1} w_{i+1}^\top \widehat{S}_U(\omega_j) w_{i+1} \cdot \widehat{\ell}(w_i; \omega_j) \geq -\frac{1}{T} \sum_{j=1}^{T-1} w_{i+1}^\top \widehat{S}_U(\omega_j) w_{i+1} \cdot \widehat{\ell}(w_{i+1}; \omega_j) = \widehat{h}_T(w_{i+1}), \tag{27}$$

where (27) holds as $E_p[-\log q] = -\sum_{j=1}^{n} p_j \log q_j \geq -\sum_{j=1}^{n} p_j \log p_j = E_p[-\log p]$ for any $q \neq p$.

To lower the chance of landing in local optima we repeat (24) for several random starting positions $w_0$ and then select the best solution.
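The iteration (24) and the monotone decrease shown in the proof can be sketched for the bivariate case ($n = 2$), where the smallest eigenvector has a closed form. This is an illustrative sketch under several assumptions, not the ForeCA R package implementation: it uses raw cross-periodograms instead of WOSA estimates, whitens with a Cholesky factor instead of $\Sigma_X^{-1/2}$, scales the spectral matrices so that the weights sum to one (which absorbs the $1/T$ factor), starts from a fixed $w_0$ instead of a random one, and all function names are hypothetical.

```python
import cmath
import math
import random

TINY = 1e-300  # floor that keeps log() finite when a spectral weight is numerically zero


def dft(x):
    """Plain O(T^2) discrete Fourier transform (keeps the sketch dependency-free)."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * j * t / T) for t in range(T))
            for j in range(T)]


def whiten(x1, x2):
    """Center two series and transform them to identity sample covariance."""
    T = len(x1)
    m1, m2 = sum(x1) / T, sum(x2) / T
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    s11 = sum(p * p for p in a) / T
    s12 = sum(p * q for p, q in zip(a, b)) / T
    s22 = sum(q * q for q in b) / T
    # Cholesky factor of the 2x2 covariance (assumed stand-in for Sigma^{-1/2}).
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    u1 = [p / l11 for p in a]
    u2 = [(q - l21 * p) / l22 for p, q in zip(u1, b)]
    return u1, u2


def spectral_matrices(u1, u2):
    """Real parts of raw cross-periodograms at omega_j, j = 1..T-1, as (s11, s12, s22)."""
    T = len(u1)
    d1, d2 = dft(u1), dft(u2)
    # Scaling by T^2 makes the matrices sum to the identity for whitened input.
    return [(abs(d1[j]) ** 2 / T ** 2,
             (d1[j] * d2[j].conjugate()).real / T ** 2,
             abs(d2[j]) ** 2 / T ** 2) for j in range(1, T)]


def quad(w, s):
    """Quadratic form w' S w for a symmetric 2x2 matrix stored as (s11, s12, s22)."""
    return w[0] * w[0] * s[0] + 2.0 * w[0] * w[1] * s[1] + w[1] * w[1] * s[2]


def entropy(w, S):
    """Discrete spectral entropy of w'U (natural log)."""
    return -sum(quad(w, s) * math.log(max(quad(w, s), TINY)) for s in S)


def foreca_2d(u1, u2, w0, tol=1e-8, max_iter=200):
    """EM-like iteration: fix the log term at w_i, then take the smallest eigenvector."""
    S = spectral_matrices(u1, u2)
    w = w0
    history = [entropy(w, S)]
    for _ in range(max_iter):
        a = b = c = 0.0
        for s in S:
            weight = -math.log(max(quad(w, s), TINY))
            a, b, c = a + weight * s[0], b + weight * s[1], c + weight * s[2]
        # theta is the direction of the largest eigenvalue of [[a, b], [b, c]];
        # the orthogonal direction belongs to the smallest eigenvalue.
        theta = 0.5 * math.atan2(2.0 * b, a - c)
        w_new = (-math.sin(theta), math.cos(theta))
        if w_new[0] * w[0] + w_new[1] * w[1] < 0.0:  # resolve the sign ambiguity
            w_new = (-w_new[0], -w_new[1])
        step = math.hypot(w_new[0] - w[0], w_new[1] - w[1])
        w = w_new
        history.append(entropy(w, S))
        if step < tol:
            break
    return w, history


random.seed(1)  # fixed seed so the simulated noise is reproducible
T = 128
signal = [math.sqrt(2.0) * math.sin(2 * math.pi * 5 * t / T) for t in range(T)]
noise = [random.gauss(0.0, 1.0) for _ in range(T)]
x1 = [s + e for s, e in zip(signal, noise)]
x2 = [s - e for s, e in zip(signal, noise)]
u1, u2 = whiten(x1, x2)
w_hat, history = foreca_2d(u1, u2, w0=(1.0, 0.0))
omega_hat = 1.0 - history[-1] / math.log(T)  # estimated forecastability, log base T
```

The list `history` records the spectral entropy after every step and should be non-increasing, in line with (26) and (27). Because only a local optimum is guaranteed, a fuller version would rerun `foreca_2d` from several random unit vectors and keep the solution with the lowest final entropy.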
[Figure: (a) daily returns in %; (b) biplots of ForeCA (top) and PCA (bottom); (c) scree-plot of $\widehat{\Omega}(\cdot)$ (in %) by component for orig, PCA, SFA, and ForeCA. Further panels are titled ForeC 1 through ForeC 8. Series labels appearing in the plots: WATER, CHINA, LATAM, GOLD, MINING, ENERGY, EASTEU, INDIA. Numeric point labels and axis ticks omitted.]
1236 107 882 187 985 553 969 95 338 576 40 562 944 743 733 78 551 995 1144 462 1010 91 903 1196 52 223 579 505 769 1245 683 893 1370 1185 1321 1003 606 75 61 253 1335 616 416 342 331 185 1327 509 973 1212 1073 445 127 251 1334 401 658 612 471 203 287 862 716 904 643 1344 42 1511146 617 1347 1145 417 260 106 787 230 58 542 546 578 179 85 39 990 1336 1281 538 349 205 216 406 279 1137 43 278 195 307 800 1210 1108 265 1350 152 1152 450 488 176 735 1126 148 448 169 26 280 472473 315 245 63 1158 430 526 114 184 111 644 160 249 22 640 685 1343 1275 172 144 102 8274 124 442 492 171 MINING 158 1079 333 180 242 154 5801157 137 56 122 145 101 619 GOLD 325 69 0 400 -20 0 40 ForeC4 0.00 1311 -60 ForeC2 0.00 -0.10 -60 -0.10 -4 1000 Time 40 -0.10 4 -4 -2 0 0 4 -15 0 4-6 0 5 15-4 INDIA -10 0 400 0 1350 1142 MINING 1395 622 993 1188 991 13094EASTEU 145 1100 1071 616 146 1164 981 1193 1174 678 1318 189 1135 1310 1222 1062 1273 409 92 68 1179 1073 91 959 589 1320 3 133 550 609 1330 1085 1182 75 553 1048 739 1125 128 1313 408 1241 1208 1139 1147 1116 13 747 954 976 17 680 922 982 167 416 1266 467 1323 374 345 1160 1358 986 946 1020 401 112 459 1228 1262 201 539 1059 1168 821 1108 452 1079 1080 1346155 1090 1024 525 683 1398 997 1411 70 686 506 644 1008 224 570 574 1027 661 1055 221 172 584 228 1356 1220 1328 331 1004 1029 202 581 1355 1053 1280 259 1303 1413 883 1198 927 966 62 500 1050 113 1171 907 162 1201 137 621 238 965 1283 74 1191 1294 848 843 435 267 1123 208 1216 1260 1114 461 817 502 985 1295 691 442 554 728 59 1035 1245 934 1237 1097 852 186 281 369 1110 559 462 1165 293 1010 190 1376 395 456 973 778 1001 1204 139 386 1377 1306 716 212 1338 928 286 200 328 1132 515 181 939 1337 1247 1322 742 210 799 1315 1103 1369 1286 429 545 770 670 1003 125 390 1316 746 1254 709 831 1353 1212 1238 656 26 279 1258 451131 1163 903 1077 478 810 1156 786 22 349 536 103 562 578 734 968969 971 1275 505 998 1068 72 791 117 766 776 565 1231 744 1018 763 1240 108 474 1067 958 510 365 379 543 825 
58 1109 168 693 1361 177 227 947 136 1340 1154 109 730 1105 800 1113 1276 1332 805 360 937 1013 6 919 313 29 667 270 832 923 1335 142143 737 178 479 534 1151 260 593 264 468 381 819 1388 445 1089 23 488 414 11 292 183 1365 801 795 917 116 784 187 263 131 1251 192 16 1359 CHINA 302 115 234 894 84 522 1394 1291 290 882 375 724 880 718 941 203 1249 1406 1099 1400 384 1307 987 1007 1180 243 775 945 1176 707 1159 1345 636 410 915 833 1138 367 27 235 1297 1288 1134 251 625 289 566 1381 225 1205 422 1372 807 1039 874 196 623 1408 513 5627 14 1269 199 1093 653 165 782783 174 657 1263 8 5 406 310 1221 296 1380 1150 359 458 1057 1112 272 749 335 1211 914 920 37 485 869 1083 298 311 498 9 1244 705 751 233 1217 99 463 604 1284 171 1 214 1207 446 380 1371 95 364 648 1354 432 863 547 1402 449 649 393 1126 1074 652 1185 655 69 1243 1403 592 464 912 913 1349 230 1042 1005 806 1017 277 1088 694 32 INDIA 5 10 140 31 1014 48 979 676 897 503 448 389 854 347 38 901 216 635 118 1343 396 1034 495 645 908 684 144 856 700 773 400 90 394 834 1363 355 288 996 1405 483 858 1033 1287 217 860 7 556 857 397 890 731 415 930 44 723 752 665 845 211 161 326 761 711 579 1304 759 1299 46 8 524 830 892 871 557 612 663 1019 720 569 1265 480 1386 1195 268 526 236 241 497 179 321 861 529 788789 312 413 1285 849 49 1374 226 706 98 950 517 41 1218 823824 940 1148 816 275 1329 780 1082 925 21 423 695 1036 1246 601 1199 1397 1301 1043 1278 231 215 15 411 1177 1327 166 451 1196 1234 1047 424 674 1061 933 307 815 494 223 93 803 886 428 519 696 357 847 765 889 630 1375 1348 78 685 412 340 465 1382 303 111 1203 496 1041 1095 595 632 12 658 39 437 436 385 1393 781 702 637 3 30 33 80 120 134 962 188 647 123 576 611 1314 100 356 643 47 804 421 261 71 701 1385 628 511 141 354 1130 1391 688 618 392 664 585 184 314 453 1127 455 198 868 745 156 1032 794 246 650 1364 382 771 538 1373 1076 53 1045 910 785 14 721 64 558 50 523 844 96 1040 518 714 853 838 820 1293 1410 444 774 521 60 399 1341 207 851 1252 339 105 713 450 
257 719 836 454 599 358 329 1305 924 337 362 855 333 417 466 811 764 974 703 613 669 1300 1119 1257 341 594 750 818 875 885 83 1225 1277 34 240 88 262 1215 52 54 151 1149 1379 754 440 673 575 130 271 361 1298 1325 438 1031 535 944 282 254 662 642 1133 1334 1038 163 253 671 977 138 209 841 542 624 28 867 1259 1044 586 881 1224 1181 587 195 299 614 107 1366 888 325 931 760 426 660 741 250 158 287 990 792 610 391 1172 489 332 1189 276 338 548 1155 1137 879 12321233 1086 551 304305 1121 433434 877 336 430 607 1194 477 301 951 152 176 1022 197 269 427 582 284 13671368 1200 1122 1279 371 431 425 617 182 249 160 2204 1157 258 481 736 564 687 1037 1384 659 955 398 245 1344 66 528 634 896 893 86 418 1290 712 420 169 905 904 1390 1025 441 300 699 708 30 102 5 83 447 222 1175 902 677 55 185 318 546 790 935 387 350 1255 975 31167 51 323 899 842 1239 740 255 520 530 129 1271 865 24 753 822 170 94 42 40 320 404 953 870 872 516 921 1101 327 873 943 895 690 948 248 67 294 1011 1308 469 1052 1214 278 963 812 846 348 476 1075 1009 633 1002 813 828 132 1118 698 956 605 191 509 475 76 1336 891 768 1339 1352 1136 1030 507 555 697 884 372 484 978 911 1021 675 1342 900 1202 1104 194 6 39 353 772 317 1253 531 590 101 104 826 777 758 672 403 859 541 126 306 577 808 725 65 793 600 1226 571 352 972 135 35 1120 346 1015 402 8182 366 193 491 471 377 280 180 295 388 572 1289 376 878 1268 1 064 443 1000 537 1098 980 308 756 175 1401 1264 1146 1006 124 334 315 1111 407 1302 1383 319 1186 1115 715 840 970 499 173 936 797 682 493 638 689 487 419 1106 591 929 309 1213 540 563 1129 1407 769 439 568 220 1056 114 1063 LATAM 383 97 960 580 738 835 470 87 651 1210 252 368 827 363 796 370 646 802 1166 285 964 73 11831184 961 19 983 1261 787 735 809 504 110 722 274 762 918 1173 932 512 460 206 457 779 666 106 602 1223 482 1066 1070 1392 864 89 1248 343 681 704 218 668 588 866 1081 63 1 54 43 501 1128 1312 492 237 405 1236 1370 373 629 733 1321 533 1360 620 153 239 898 757 938 596 297 256 1333 862 473 205 
[Figure 2, panel (d): sample ACF $\hat{\rho}(k)$ of ForeCs ($\hat{\rho}(0) = 1$ omitted).]

Figure 2. Equity fund returns analyzed with PCA, SFA, and ForeCA. (Dataset equityFunds in R package fEcofin.)

4.3. Obtaining a K-dimensional Subspace

To obtain all $K$ loadings $W_{1,\ldots,K} = [w_1, \ldots, w_K]$ that give uncorrelated series $y_{j,t}$, we iterate, starting at $k = 1$: i) compute $w_k$; ii) project $U$ onto the null space of $W_{1,\ldots,k}$, which gives $U^{(k)} = W_{1,\ldots,k}^{\perp} U \in \mathbb{R}^{K-k}$; iii) apply the EM-type algorithm to $U^{(k)}$ to obtain $\tilde{w}_{k+1}$; and finally iv) transform $\tilde{w}_{k+1}$ back to loadings $w_{k+1}$ of $U$. Doing this for $k = 1, \ldots, K$ gives $K$ loadings $\widehat{W}_U$. Loadings for $X_t$ are given by $\widehat{W}_X = \widehat{W}_U \widehat{\Sigma}_X^{-1/2}$.

5. Applications

Here we demonstrate the usefulness of ForeCA for finding informative, forecastable signals, and also as a tool for time series classification.

5.1. Improving Portfolio Forecasts

Figure 2a shows daily returns of eight equity funds from 2002/01/01 to 2007/05/31 (T = 1413). In the financial context, finding forecastable series is an important goal in itself, not just a means of structure discovery.
In particular, we can interpret a linear combination $w$ as a portfolio of stocks. The $w^*$ with the highest $\Omega$ gives the most forecastable portfolio.

Figure 2b shows a bi-plot for PCA and ForeCA for $(w_1, w_2)$ and $(w_3, w_4)$. As PC 1 weighs all funds almost equally, it represents the average market movement; the second component contrasts Gold & Mining with the rest, and we can therefore label PC 2 as the “commodity” index. The third and fourth PC indicate energy/infrastructure and geographic regions. However, even though PC 1 is also the most predictable PC, it has only a slightly larger $\hat{\Omega}$ than the most forecastable fund, India (Fig. 2c). On the other hand, combining Water (weight $w_{\text{water},1} = 0.72$) with Energy (0.58) is almost twice as forecastable as India (weights are from ForeC 1 in Fig. 2b). ForeC 2 also has high forecastability by selling Energy & Water (−0.53 & −0.47) and buying Mining & Eastern Europe (0.55 & 0.38). The third and fourth ForeCs seem to be hedging strategies (ForeC 3: Water vs. Energy; ForeC 4: Latin America & Gold vs. China & Mining).

As financial data has only very small autocorrelation (usually at lag 1, if any), SFA and ForeCA yield very similar results overall, except for a “wrong” ranking by SFA (Fig. 2c): SF 8 is the fastest feature (large, but negative lag 1 autocorrelation), yet it is the second most forecastable component. While it is true that white noise is slower than an auto-regressive process of order 1 (AR(1)) with negative autocorrelation, the latter is still more forecastable. Since we want to reveal intertemporal structure, white noise must be ranked lowest; and ForeCA indeed does so (Fig. 2d). ForeC 5 and 8 detect the 20 day lag (one trading month), but correlations are too low to achieve much higher forecastability than the simpler and faster SFA.
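The ranking argument for white noise versus an AR(1) with negative autocorrelation can be checked numerically. The following is a minimal sketch, not the ForeCA package implementation. It assumes the forecastability measure has the form $\Omega = 1 - H_s / \log(2\pi)$, where $H_s$ is the differential entropy of the spectral density normalized to integrate to one over $[-\pi, \pi]$ (my reading of Definition 3.1), and it uses the textbook AR(1) spectrum. The function names `ar1_spectrum` and `forecastability` are hypothetical.

```python
import math


def ar1_spectrum(phi, n_freq=2048):
    """Unnormalized AR(1) spectral density on a uniform grid over [-pi, pi)."""
    dlam = 2.0 * math.pi / n_freq
    lams = [-math.pi + j * dlam for j in range(n_freq)]
    dens = [1.0 / (1.0 - 2.0 * phi * math.cos(lam) + phi * phi) for lam in lams]
    return dens, dlam


def forecastability(dens, dlam):
    """Omega = 1 - H_s / log(2*pi), with H_s approximated by a Riemann sum.

    ASSUMPTION: this normalization is taken to match Definition 3.1.
    """
    mass = sum(dens) * dlam                # rescale so the density integrates to 1
    f = [d / mass for d in dens]
    h_s = -sum(v * math.log(v) for v in f) * dlam
    return 1.0 - h_s / math.log(2.0 * math.pi)


# For an AR(1), the lag 1 autocorrelation equals phi ("slowness" in SFA terms).
processes = {"AR(1), phi=+0.5": 0.5, "white noise": 0.0, "AR(1), phi=-0.5": -0.5}
omega = {name: forecastability(*ar1_spectrum(phi)) for name, phi in processes.items()}

by_slowness = sorted(processes, key=lambda name: processes[name], reverse=True)
by_omega = sorted(processes, key=lambda name: omega[name], reverse=True)

print("ranked by lag 1 autocorrelation:", by_slowness)
print("ranked by forecastability:      ", by_omega)
for name in processes:
    print(f"{name:>16}: Omega = {omega[name]:.4f}")
```

Under these assumptions, ranking by lag 1 autocorrelation places the negatively correlated AR(1) below white noise, whereas the spectral measure assigns the same value to $\pm\phi$ and puts white noise last.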
In the next example I study quarterly income data, where ForeCA can leverage its nonparametric power and detect important dependencies at various frequencies automatically from the data.

5.2. Classification of US State Economies

I consider quarterly per-capita income growth rates of the “lower 48” from 1982/1 to 2011/4 (last 30 years), $g_{j,t} = r_{j,t} - r_{US,t}$, $j \in \{AL, \ldots, WY\}$, where $r_{j,t}$ is the annual growth rate of region $j$.[6] Since we are interested in finding similar state economies within the US, we subtract the US baseline. Clustering states with similar economic dynamics can help to decide where to provide support when facing difficult economic times. For example, if certain states do not show any important dynamics on a 7-8 year scale, also known as the “business cycle” (Hughes Hallett and Richter, 2008), then it might be better to support states that are affected by these global economy swings.

[6] Publicly available at www.bea.gov/itable.

[Figure 3 panels: (a) average $\hat{\mu}(g_t)$; (b) standard deviation $\hat{\sigma}(g_t)$, ND omitted ($\hat{\sigma}_{ND} = 2.98$); (c) lag $k = 1$ autocorrelation $\hat{\rho}(1)$; (d) lag $k = 4$ autocorrelation $\hat{\rho}(4)$; (e) forecastability $\hat{\Omega}(g_t)$; (f) spectra $\hat{f}_g(\lambda)$ on a log2-scale, with Nevada, Nebraska, and white noise marked; (g) absolute value of $\hat{\rho}(1)$; (h) absolute value of $\hat{\rho}(4)$.]

Figure 3. Summary statistics of quarterly income growth rates (in %) from 1982/1 to 2011/4 with respect to the US baseline: $\hat{\mu}(r_{US,t}) = 1.32\%$, $\hat{\sigma}(r_{US,t}) = 0.92\%$ per quarter; $\hat{\Omega}(r_{US,t}) = 4.86\%$, $\hat{\rho}_1(r_{US,t}) = 0.42$, $\hat{\rho}_4(r_{US,t}) = 0.13$.

The first row of Fig. 3 displays basic summary statistics: sample average, standard deviation, and first and fourth order autocorrelation. The second row gives statistics related to forecastability: Fig. 3e shows $\hat{\Omega}$ based on the spectra in Fig. 3f; Fig. 3g shows the absolute lag 1 correlation (analogously for lag 4 in Fig. 3h), since two AR(1)s with a $\pm\phi$ lag 1 coefficient are equivalent in terms of forecasting (compare to the SFA ranking in the portfolio example). The spectral densities of Nevada and Nebraska illustrate the intuitive derivation of $\Omega(x_t)$ from Eq. (10): for Nebraska all frequencies are equally important, and it is thus difficult to forecast any better than the sample mean; in contrast, Nevada’s income growth rates are mainly driven by a yearly cycle ($\omega_j \approx 0.25$) and low frequencies, so Nevada is much easier to forecast.

A similar dataset (but annual and for different years) has been analyzed in Dhiral, Kalpakis, Gada, and Puttagunta (2001), who fit AR(1) models to the non-adjusted growth rates $r_{j,t}$ for 25 pre-selected states, and then cluster them in the model space. Although they obtain interpretable results, it is unlikely that US state economies differ only in their lag 1 coefficient. In particular, simple AR(1) models cannot capture the business cycle, which is clearly visible in Fig. 3f (even for the adjusted rates). Similarly, as SFA maximizes lag 1 correlation, it misses the quarterly cycle.

ForeCA does not face this model selection bias, but can find forecastability across all frequencies. In particular, only ForeC 4 detects interesting high frequency signals (Fig. 4b). The most forecastable PCs are PC 5, 4, and 1; interestingly, PC 3 is least important for forecasting among all 48 PCs. Also note that ForeCs are more interpretable than SFs or PCs (Figs. 4b - 4d). In particular, ForeC 1 shows a clear period of about 25 years (generation cycle), whereas PC 1 looks somewhat arbitrary. Yet, the associated loadings in Fig. 4a are quite similar.

6. Related Work

Using predictability to separate signals is not new. In the classic time series literature, Box and Tiao (1977) introduced canonical analysis and measured predictive power by the residual variance of fitting vector auto-regression (VAR) models.
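To make the model-based notion of predictability concrete, the sketch below computes one minus the ratio of residual to total variance for a least-squares AR(1) fit. This is a univariate simplification chosen purely for illustration, not Box and Tiao's canonical analysis, which works with VAR fits on multivariate series. The simulated series, the seeds, and the function names `simulate_ar1` and `ar1_predictability` are assumptions.

```python
import random


def simulate_ar1(phi, n, seed=0, burn_in=200):
    """Simulate a Gaussian AR(1) series of length n (burn-in samples are discarded)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def ar1_predictability(x):
    """Return (phi_hat, 1 - residual variance / total variance) for an AR(1) fit."""
    mean = sum(x) / len(x)
    z = [v - mean for v in x]
    # Least-squares slope of z_t on z_{t-1}.
    phi_hat = sum(a * b for a, b in zip(z[1:], z[:-1])) / sum(b * b for b in z[:-1])
    resid = [a - phi_hat * b for a, b in zip(z[1:], z[:-1])]
    var_resid = sum(r * r for r in resid) / len(resid)
    var_total = sum(v * v for v in z) / len(z)
    return phi_hat, 1.0 - var_resid / var_total


phi_hat, predictability = ar1_predictability(simulate_ar1(phi=0.7, n=5000))
_, predictability_noise = ar1_predictability(simulate_ar1(phi=0.0, n=5000, seed=1))

print(f"AR(1), phi = 0.7: phi_hat = {phi_hat:.3f}, predictability = {predictability:.3f}")
print(f"white noise:      predictability = {predictability_noise:.3f}")
```

For a correctly specified AR(1), this quantity approaches $\phi^2$ in large samples, so it is close to zero for white noise. Unlike a spectrum-based measure, however, its value depends on the model class and order chosen for the fit.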
Recently, Matteson and Tsay (2011) proposed another DR technique that blends PCA and ICA by separating signals to the extent of fourth moments (but not higher). Stone (2001) uses predictability as a contrast function for blind source separation (BSS). While this approach is similar to ours, it relies on subjective measures of “short” and “long” term moving averages, which are then used to produce actual forecasts.

Much work in BSS (Gomez-Herrero, Rutanen, and Egiazarian, 2010; Li and Adali, 2010), especially ICA, focuses on minimizing entropy rate. The entropy rate $H(y_t) = \lim_{t \to \infty} H(y_t \mid y_{t-1}, y_{t-2}, \ldots)$ of a Gaussian process is related to the spectrum via (Cover and Thomas, 1991, p. 417)

$$H(y_t) = \frac{1}{2} \log 2\pi e + \frac{1}{4\pi} \int_{-\pi}^{\pi} \log S_y(\lambda) \, d\lambda. \qquad (28)$$

However, these approaches require VAR model fits and/or numerical optimization. On the contrary, the ForeCA measure $\Omega(y_t)$ is based on information-theoretic uncertainty and is an inherent property of the stochastic process $y_t$. We believe that this makes $\Omega(y_t)$ a more principled measure of forecastability than model-dependent measures. Furthermore, it can be estimated quickly using data-driven, nonparametric techniques.

It is important to point out that spectral entropy, i.e., differential entropy of (11), is neither equal nor proportional to the entropy rate in (28). For particular processes they coincide (e.g., for an AR(1); Gibson (1994)), but in general they do not: they measure different properties of the signal. Thus ICA algorithms based on entropy rate minimization do not yield the same results as ForeCA. In fact, the ForeCA measure can be used to rank ICs by decreasing forecastability.

Cardoso (2004) gives an excellent account of the intertwined relations between Gaussianity, autocorrelation, and dependence in multivariate time series and their effect on objective functions for BSS. Exactly because of this tangle, we only consider frequency properties of the signal and not entropy rate, since for forecasting the distribution itself is of minor importance compared to the temporal dependence.

[Figure 4 panels: (a) first 4 loadings of ForeCA, SFA, and PCA; (b) ForeCs; (c) SFs; (d) PCs; (e) scree plot of $\hat{\Omega}(\cdot)$ (in %) by component for the original series, PCA, SFA, and ForeCA.]

Figure 4. PCA, SFA, and ForeCA on US income data.

7. Discussion

I introduce Forecastable Component Analysis (ForeCA), a new dimension reduction technique for multivariate time series. Contrary to other popular methods such as PCA or ICA, ForeCA takes temporal dependence into account and actively searches for the most forecastable subspace. ForeCA minimizes the entropy of the spectral density: lower entropy implies a more forecastable signal. The optimization problem has an iterative, yet fast analytic solution, and provably leads to a (local) optimum.

While SFA is a good approximation (maximizing lag 1 correlation), real world signals often have more complex correlation structure. The ForeCA method proposed here can automatically detect arbitrary autocorrelation structure using nonparametric estimators. Applications to financial and macro-economic data demonstrate that ForeCA is better than PCA and SFA at finding the most predictable signals, and can also be used for time series classification.

References

Box, G. E. P. and G. C. Tiao (1977). A canonical analysis of multiple time series. Biometrika 64 (2), 355–365.
Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods (2 ed.). New York, NY: Springer Series in Statistics.
Cardoso, J.-F. (2004). Dependence, correlation and Gaussianity in independent component analysis. J. Mach. Learn. Res. 4 (7-8), 1177–1203.
Cover, T. M. and J. Thomas (1991). Elements of Information Theory. Wiley.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B Methodological 39 (1), 1–38.
Dhiral, K. K., K. Kalpakis, D. Gada, and V. Puttagunta (2001). Distance Measures for Effective Clustering of ARIMA Time-Series. In Proceedings of the 2001 IEEE International Conference on Data Mining, pp. 273–280.
Fessler, J. A. and B. P. Sutton (2003). Nonuniform fast Fourier transforms using min-max interpolation. IEEE Trans. Signal Process 51, 560–574.
Fryzlewicz, P., G. P. Nason, and R. von Sachs (2008). A wavelet-Fisz approach to spectrum estimation. Journal of Time Series Analysis 29 (5), 868–880.
Gibson, J. (1994). What is the interpretation of spectral entropy? In Proceedings of IEEE International Symposium on Information Theory, 1994, pp. 440.
Gibson, J., S. Stanners, and S. McClellan (1993). Spectral entropy and coefficient rate for speech coding. In Signals, Systems and Computers, 1993. 1993 Conference Record of The Twenty-Seventh Asilomar Conference on, pp. 925–929, vol. 2.
Gomez-Herrero, G., K. Rutanen, and K. Egiazarian (2010). Blind source separation by entropy rate minimization. Signal Processing Letters, IEEE 17 (2), 153–156.
Hughes Hallett, A. and C. Richter (2008). Have the Eurozone economies converged on a common European cycle? International Economics and Economic Policy 5, 71–101.
Hyvärinen, A. and E. Oja (2000). Independent Component Analysis: Algorithms and Applications. Neural Networks 13, 411–430.
Jacques, L. and P. Vandergheynst (2010). Compressed Sensing: “When sparsity meets sampling”, Chapter 23, pp. 507–528. Wiley-Blackwell.
Jolliffe, I. T. (2002). Principal Component Analysis (2 ed.). New York, NY: Springer.
Lees, J. M. and J. Park (1995). Multiple-Taper Spectral-Analysis - A Stand-Alone C-Subroutine. Computers & Geosciences 21 (2), 199–236.
Li, X.-L. and T. Adali (2010). Blind spatiotemporal separation of second and/or higher-order correlated sources by entropy rate minimization. In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 1934–1937.
Matteson, D. S. and R. S. Tsay (2011). Dynamic orthogonal components for multivariate time series. Journal of the American Statistical Association 106 (496), 1450–1463.
Nuttal, A. H. and G. C. Carter (1982). Spectral Estimation and Lag Using Combined Time Weighting. In Proceedings of IEEE, Volume 70, pp. 1111–1125.
Paninski, L. (2003). Estimation of entropy and mutual information. Neural Comput. 15 (6), 1191–1253.
R Development Core Team (2010). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal 27, 379–423, 623–656.
Sricharan, K., R. Raich, and A. Hero (2011). K-nearest neighbor estimation of entropies with confidence. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pp. 1205–1209.
Stone, J. V. (2001). Blind source separation using temporal predictability. Neural Comput. 13 (7), 1559–1574.
Stowell, D. and M. D. Plumbley (2009). Fast Multidimensional Entropy Estimation by k-d Partitioning. IEEE Signal Processing Letters 16, 537–540.
Trobs, M. and G. Heinzel (2006). Improved spectrum estimation from digitized time series on a logarithmic frequency axis. Measurement 39 (2), 120–129.
Wiskott, L. and T. J. Sejnowski (2002). Slow Feature Analysis: Unsupervised Learning of Invariances. Neural Computation 14 (4), 715–770.