Forecastable Component Analysis
Georg M. Goerg
Carnegie Mellon University, Department of Statistics, Pittsburgh, PA 15213
gmg@stat.cmu.edu

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).

Abstract
I introduce Forecastable Component Analysis (ForeCA), a novel dimension reduction technique for temporally dependent signals. Based on a new forecastability measure, ForeCA finds an optimal transformation to separate a multivariate time series into a forecastable and an orthogonal white noise space. I present a converging algorithm with a fast eigenvector solution. Applications to financial and macro-economic time series show that ForeCA can successfully discover informative structure, which can be used for forecasting as well as classification.

The R package ForeCA accompanies this work and is publicly available on CRAN.

1. Introduction
With the rise of high-dimensional datasets it has become important to perform dimension reduction (DR) to a lower dimensional representation of the data. For simplicity we consider linear transformations $W \in \mathbb{R}^{k \times n}$, which map an n-dimensional $X$ to a $k \le n$ dimensional $S = WX$. Typically, the transformed data should be somewhat “interesting”; there is no point in transforming $X$ to an arbitrary $S$ that is less useful, meaningful, etc. Let $\iota(S)$ measure “interestingness” of $S$. DR can then be set up as an optimization problem
$$ \hat{w}_j = \arg\max_{w \in \mathbb{R}^{n \times 1}} \iota\left(w^\top X\right), \quad j = 1, \ldots, k, \tag{1} $$
subject to
$$ w_j^\top X \perp \{ w_1^\top X, \ldots, w_{j-1}^\top X \}, \tag{2} $$
where (2) is a common DR constraint, which makes $S_j = w_j^\top X$ orthogonal (uncorrelated) to previously obtained signals.

For example, principal component analysis (PCA) keeps large variance signals (Jolliffe, 2002), $\iota(X) = E(X - EX)^2$ in (1); independent component analysis (ICA) recovers statistically independent signals (Hyvärinen and Oja, 2000); slow feature analysis (SFA) (Wiskott and Sejnowski, 2002) finds “slow” signals and is equivalent to maximizing the lag 1 autocorrelation coefficient.

DR techniques are often applied to multivariate time series $X_t$, hoping that forecasting on the lower-dimensional space $S_t$ is more accurate, simpler, more efficient, etc. Standard DR techniques such as PCA or ICA, however, do not explicitly address forecastability of the sources. For example, just because a signal has high variance does not mean it is easy to forecast.
Thus let’s define interesting as being predictable. Forecasting is not only good for its own sake (finance, economics), but even when future values are not immediately interesting, signals that do have predictive power exhibit non-trivial structure by definition, and are thus easier to interpret. For example, the time series in Fig. 1 are ordered from least (S&P 500 daily returns) to most forecastable (monthly temperature in Nottingham) according to the ForeCA forecastability measure $\Omega(x_t)$ I propose in Definition 3.1 below. And indeed, moving from left to right they exhibit more structure.
The main contributions of this work are i) a model-free, comparable measure of forecastability for (stationary) time series (Section 3), ii) a novel data-driven DR technique, ForeCA, that finds forecastable signals, iii) an iterative algorithm that provably converges to (local) optima using fast eigenvector solutions (Section 4), and iv) applications showing that ForeCA outperforms traditional DR techniques in finding low-dimensional, forecastable subspaces, and that it can also be used for time series classification (Section 5).
Related work will be reviewed in Section 6.
All computations and simulations were done in R (R
Development Core Team, 2010).
2. Time Series Preliminaries
Figure 1. Observations (top); sample ACF $\hat\rho(k)$ (middle); smoothed WOSA spectral density estimate $\hat f(\omega_j)$ on log-scale (bottom). From left to right: i) S&P 500 daily returns (in %, by day), $\hat\Omega = 1.25\%$; ii) Mount Campito tree ring series (width in mm, years 3435BC to 1969AD), $\hat\Omega = 14.99\%$; iii) monthly mean temperatures in Nottingham (in Fahrenheit, 1920 to 1940), $\hat\Omega = 34.37\%$. Data publicly available in R packages: SP500 in MASS; camp in tseries; nottem in datasets.

Let $y_t$ be a univariate, second-order stationary time series with mean $E y_t = \mu_y < \infty$, variance $V y_t = \sigma_y^2$, and autocovariance function (ACVF)
$$ \gamma_y(k) = E (y_t - \mu_y)(y_{t-k} - \mu_y), \quad k \in \mathbb{Z}. \tag{3} $$
The ACVF for univariate processes is symmetric in k,
γy (k) = γy (−k). Let ρ(k) = γ(k)/γ(0) be the autocorrelation function (ACF). A large ρ(k) means that
the process k time steps ago is highly correlated with
the present yt . The sample ACFs ρb(k) in Fig. 1 show
that, e.g., S&P 500 daily returns are uncorrelated with
their own past (stock market efficiency); yearly tree
ring growth is highly correlated over time with significant lags even for k ≥ 100 years; and intuitively
temperature in month t is highly correlated with the
temperature k = 6 (cold ↔ warm) and k = 12 (cold
→ cold; warm → warm) months ago (or in the future).
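The sample ACF described above can be sketched in a few lines. This is a minimal illustration, not the estimator used for Fig. 1: the function name `sample_acf`, the divide-by-T autocovariance convention, and the synthetic 12-step cycle standing in for a monthly series are all assumptions made for the example.

```python
import math

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag) of a sequence y."""
    T = len(y)
    mean = sum(y) / T
    d = [v - mean for v in y]

    # Autocovariance estimate at lag k, using the common divide-by-T convention.
    def gamma(k):
        return sum(d[t] * d[t - k] for t in range(k, T)) / T

    g0 = gamma(0)
    return [gamma(k) / g0 for k in range(max_lag + 1)]

# Hypothetical stand-in for a monthly series: a pure 12-step cycle over 10 "years".
y = [math.cos(2 * math.pi * t / 12) for t in range(120)]
rho = sample_acf(y, 12)
print(round(rho[6], 2), round(rho[12], 2))
```

For this synthetic cycle the lag-6 autocorrelation is strongly negative and the lag-12 autocorrelation strongly positive, mirroring the cold/warm pattern described for the temperature series.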
The building block of time series models is white noise $\varepsilon_t$, which has zero mean, finite variance, and is uncorrelated over time: $\varepsilon_t \sim WN(0, \sigma_\varepsilon^2)$ iff¹ i) $E\varepsilon_t = 0$, ii) $V\varepsilon_t = \gamma_\varepsilon(0) = \sigma_\varepsilon^2$, and iii) $\gamma_\varepsilon(k) = 0$ if $k \neq 0$. If $\varepsilon_t$ is a Gaussian process, then it is also independent; in general, being uncorrelated does not imply independence.
For multivariate second-order stationary $X_t$ with mean² $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma_X$ the ACVF
$$ \mathbb{R}^{n \times n} \ni \Gamma_X(k) = E (X_t - \mu)(X_{t-k} - \mu)^\top, \tag{4} $$
is a matrix-valued function of $k \in \mathbb{Z}$. In particular, $\Gamma_X(0) = \Sigma_X$. The diagonal of $\Gamma_X(k)$ contains the ACVF of each $X_i(t)$; the off-diagonal element $\Gamma_X(k)_{(i,j)}$ is the cross-covariance between the ith and jth series at lag k:
$$ \gamma_{ij}(k) = E (X_{i,t} - \mu_i)(X_{j,t-k} - \mu_j) \in \mathbb{R}. \tag{5} $$
Contrary to $\gamma_y(k)$, $\Gamma_X(k)$ is not symmetric, but
$$ \Gamma_X(k) = \Gamma_X(-k)^\top. \tag{6} $$

¹ Iff will be used as an abbreviation for if and only if.
² Without loss of generality (WLOG) assume µ = 0.
2.1. Spectrum and Spectral Density
The spectrum of a univariate stationary process can be defined as the Fourier transform of its ACVF,
$$ S_y(\lambda) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \gamma_y(j) e^{ij\lambda}, \quad \lambda \in [-\pi, \pi], \tag{7} $$
where $i = \sqrt{-1}$ is the imaginary unit. Since $\gamma_y(k)$ is symmetric, the spectrum is a real-valued, non-negative function, $S_y : [-\pi, \pi] \to \mathbb{R}^+$. For white noise $\varepsilon_t$ all $\gamma_\varepsilon(k) = 0$ if $k \neq 0$, thus $S_\varepsilon(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}$ is constant for all $\lambda \in [-\pi, \pi]$. When $\gamma(k) > 0$ for $k \neq 0$ the spectrum has peaks at the corresponding frequencies. For example, the spectral density of monthly temperature series (right in Fig. 1) has large peaks at λ ≈ π/6 and π/12, which represent the half- and one-year cycle.³ Vice versa, the ACVF can be recovered from the spectrum using the inverse Fourier transform
$$ \gamma_y(k) = \int_{-\pi}^{\pi} S_y(\lambda) e^{-ik\lambda} \, d\lambda, \quad k \in \mathbb{Z}. \tag{8} $$
In particular, $\int_{-\pi}^{\pi} S_y(\lambda) \, d\lambda = \sigma_y^2$ for $k = 0$. Let
$$ f_y(\lambda) = \frac{S_y(\lambda)}{\sigma_y^2} = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \rho_y(j) e^{ij\lambda}, \tag{9} $$
be the spectral density of $y_t$. As $f_y(\lambda) \ge 0$ and $\int_{-\pi}^{\pi} f_y(\lambda) \, d\lambda = 1$, the spectral density can be interpreted as a probability density function (pdf) of an (unobserved) random variable (RV) Λ that “lives” on the unit circle. For white noise $f_\varepsilon(\lambda) = \frac{1}{2\pi}$, which represents the uniform distribution $U(-\pi, \pi)$.

³ Frequencies λ are often scaled by π, $\tilde\lambda = \lambda/\pi$. This does not change results qualitatively, but simplifies interpretation since the corresponding cycle length equals $\tilde\lambda^{-1}$.

Remark 2.1 (Spectrum and spectral density). In the time series literature “spectrum” and “spectral density” are often used interchangeably. Here I reserve “spectral density” for $f_y(\lambda)$ in (9), as it integrates to one like a standard probability density function.

3. Measuring Forecastability
Forecasting is inherently tied to the time domain. Yet, since Eqs. (7) & (8) provide a one-to-one mapping between the time and frequency domain, we can use frequency domain properties to measure forecastability.
The intuition for the proposed measure of forecastability is as follows. Consider
$$ y_t = \sqrt{2} \cos(2\pi Y t + \theta), \tag{10} $$
$\theta \sim U(-\pi, \pi)$, $Y \sim p_y(y)$ independent of θ.
One can show that $S_y(\lambda) = p_y(\lambda)$ (Gibson, 1994).
If we have to predict the future of $y_t$, then uncertainty about $y_{t+h}$, $h > 0$, is only manifested in uncertainty about $Y$, since $\cos(2\pi Y t + \theta)$ is a deterministic function of t: less uncertainty about $Y$ means less uncertainty about $y_{t+h}$. We can measure this uncertainty using the Shannon entropy of $p_y(y)$ (Shannon, 1948).
It is thus natural to measure uncertainty about the future as (differential) entropy of $f_y(\lambda)$,
$$ H_{s,a}(y_t) := -\int_{-\pi}^{\pi} f_y(\lambda) \log_a f_y(\lambda) \, d\lambda, \tag{11} $$
where $a > 0$ is the logarithm base.
On a finite support [b, c] the maximum entropy occurs for the uniform distribution U(b, c); thus a flat spectrum should indicate the least predictable sequence. And indeed, a flat spectrum corresponds to white noise, which is unpredictable by definition (using linear predictors). Consequently, for any stationary $y_t$
$$ H_{s,a}(y_t) \le H_{s,a}(\text{white noise}) = -\int_{-\pi}^{\pi} \frac{1}{2\pi} \log_a \frac{1}{2\pi} \, d\lambda = \log_a 2\pi, $$
with equality iff $y_t$ is white noise.
Definition 3.1 (Forecastability of a stationary process). For a second-order stationary process $y_t$, let
$$ \Omega : y_t \mapsto [0, \infty], \quad \Omega(y_t) = 1 - \frac{H_{s,a}(y_t)}{\log_a(2\pi)} = 1 - H_{s,2\pi}(y_t), \tag{12} $$
be the forecastability of $y_t$.
Contrary to other measures in the signal processing and time series literature, $\Omega(y_t)$ does not require actual forecasts, but is a characteristic of the process $y_t$. It is therefore not biased to a particular (perhaps suboptimal) model, forecast horizon, or loss function, as used in, e.g., Box and Tiao (1977); Stone (2001).
Properties 3.2. $\Omega(y_t)$ satisfies:
a) $\Omega(y_t) = 0$ iff $y_t$ is white noise.
b) invariant to scaling and shifting: $\Omega(a y_t + b) = \Omega(y_t)$ for $a, b \in \mathbb{R}$, $a \neq 0$.
c) max sub-additivity for uncorrelated processes:
$$ \Omega\left(\alpha x_t + \sqrt{1 - \alpha^2} \, y_t\right) \le \max\{\Omega(x_t), \Omega(y_t)\}, \tag{13} $$
if $E x_t y_s = 0$ for all $s, t \in \mathbb{Z}$; equality iff $\alpha \in \{0, 1\}$.
The three series in Fig. 1 are ordered (left to right) by increasing forecastability, and indeed larger $\hat\Omega$ correspond to intuitively more predictable real-world events: stock returns are in general not predictable; average monthly temperature is.
We can thus use (12) to guide the search for optimal $w$ that make $y_t = w^\top X_t$ as forecastable as possible.
3.1. Plug-in Estimator for Ω
To estimate Ω(yt ), we first estimate Sy (λ), normalize
it, and then plug it in (11).
An asymptotically unbiased estimator of $S_y(\lambda)$ is the periodogram
$$ I_{T, y_1^T}(\omega_j) = \left| \frac{1}{\sqrt{T}} \sum_{t=0}^{T-1} y_t e^{-2\pi i \omega_j t} \right|^2, \tag{14} $$
where $\omega_j = j/T$, $j = 0, 1, \ldots, T-1$ are the (scaled) Fourier frequencies, and $y_1^T = \{y_1, \ldots, y_T\}$ is a sample of $y_t$. It is well known that (14) is not a good estimate (e.g., periodograms are not consistent). In the numerical examples we therefore use weighted overlapping segment averaging (WOSA) (Nuttal and Carter, 1982) $\hat S_y(\omega_j)$ from the R package sapa: SDF(y, "wosa").
The bottom row of Figure 1 shows the normalized
$$ \hat f_{j,y} = \frac{\hat S_y(\omega_j)}{\sum_{l=0}^{T-1} \hat S_y(\omega_l)} $$
along with the plug-in estimate
$$ \hat\Omega(y_1^T) = 1 + \sum_{j=0}^{T-1} \hat f_{j,y} \cdot \log_{a=T} \hat f_{j,y}. \tag{15} $$
Remark 3.3. Typically, to estimate $E g(X)$ for $X \sim p(x)$ (here: $g(X) = \log p(X)$) the sample average is solely over $g(x_j)$ without multiplicative $p(x_j)$ terms. This however assumes that each $x_j$ is sampled from $p(x)$ (and thus $\frac{1}{n} \sum_{i=1}^{n} g(x_i) \to E_p g(X) = \int g(x) p(x) \, dx$ by the strong law of large numbers). While this is true in a standard sampling framework, here the “data” are the Fourier frequencies $\omega_j$ and the fast Fourier transform (FFT) samples them uniformly (and deterministically) from $[-\pi, \pi]$ and not according to the “true” spectral density $f(\lambda)$.⁴
Eq. (15) can be improved by a better spectral density (Fryzlewicz, Nason, and von Sachs, 2008; Lees and
Park, 1995; Trobs and Heinzel, 2006) and entropy estimation (Paninski, 2003). Future research can also
address direct estimation of (11) – as is common for
classic entropy estimates (Sricharan, Raich, and Hero,
2011; Stowell and Plumbley, 2009). However, since
neither spectrum nor entropy estimation are the primary focus of this work, we use standard estimators
for Sy (λ) and then the plug-in estimator of (15).
It must be noted though that $\hat\Omega(y_1^T)$ in (15) is based on discrete rather than differential entropy. It still has the intuitive property that white noise has zero estimated forecastability, but now $\hat\Omega(y_1^T) \in [0, 1]$; $\hat\Omega(y_1^T) = 1$ iff the sample is a perfect sinusoid. Applications show that (15) yields reasonable estimates and we do not expect the results to change qualitatively for other estimators. We leave differential entropy estimates of Ω to future work.
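The plug-in recipe (estimate the spectrum, normalize it, plug it into the entropy) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: it uses the raw periodogram of (14) instead of the WOSA estimate, computes the DFT directly instead of via an FFT, and the function names `periodogram` and `omega_hat` are hypothetical. Inputs are taken as already centered.

```python
import cmath
import math

def periodogram(y):
    """Raw periodogram |T^(-1/2) * sum_t y_t exp(-2 pi i j t / T)|^2 for j = 0, ..., T-1."""
    T = len(y)
    out = []
    for j in range(T):
        d = sum(y[t] * cmath.exp(-2j * math.pi * j * t / T) for t in range(T))
        out.append(abs(d) ** 2 / T)
    return out

def omega_hat(spec):
    """Plug-in forecastability 1 + sum_j f_j * log_T(f_j), with f_j = spec_j / sum(spec)."""
    T = len(spec)
    total = sum(spec)
    f = [s / total for s in spec]
    # Convention 0 * log 0 = 0: ordinates that are exactly zero are skipped.
    return 1.0 + sum(p * math.log(p, T) for p in f if p > 0.0)

T = 64
impulse = [1.0] + [0.0] * (T - 1)   # its periodogram is exactly flat
sine = [math.cos(2 * math.pi * 4 * t / T) for t in range(T)]

om_flat = omega_hat(periodogram(impulse))
om_sine = omega_hat(periodogram(sine))
print(om_flat, om_sine)
```

A flat spectrum yields an estimate of zero, as expected for white noise. Because this sketch sums over all T Fourier frequencies, a real-valued sinusoid contributes two mirror-image peaks, so its value here is $1 - \log_T 2$ rather than exactly 1; folding the spectrum onto $\omega_j \in [0, 0.5]$ first (and using the number of retained frequencies as the log base) would remove that effect.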
Notice that Ω(yt ) relies on Gaussianity as only then
fy (λ) captures all the temporal dependence structure
of yt . While time series are often non-Gaussian, Ω(·) is
a computationally and algebraically manageable forecastability measure – similarly to the importance of
variance in PCA for iid data, even though they are
rarely Gaussian.
4. ForeCA: Maximizing Forecastability
Recall from Eq. (1) that we want to find a linear combination of a multivariate $X_t$ that makes $y_t = w^\top X_t$ as forecastable as possible. Based on the forecastability measure in Section 3, we can now formally define the ForeCA optimization problem:
$$ \max_w \Omega(w^\top X_t) = \max_w \left( 1 + \frac{\int_{-\pi}^{\pi} f_y(\lambda) \log_a f_y(\lambda) \, d\lambda}{\log_a(2\pi)} \right), \tag{16} $$
$$ \text{subject to } w^\top \Sigma_X w = 1, \tag{17} $$
where (17) must hold since (11) uses the spectral density of $y_t$, i.e., we need $V y_t = w^\top \Sigma_X w = 1$.
Property 3.2c seems to let (16) only have a trivial boundary solution. However, it is intuitively clear that combining uncorrelated series makes forecasting (in general) more difficult, e.g., signal + noise. But if $E x_t y_s \neq 0$ for some $s, t \in \mathbb{Z}$ then combining them can make it simpler: for some $\alpha \in (0, 1)$ it holds $\Omega\left(\alpha x_t + \sqrt{1 - \alpha^2} \, y_t\right) > \max\{\Omega(x_t), \Omega(y_t)\}$.
To optimize the right hand side of (16) we need to evaluate $f_y(\lambda) = f_{w^\top X_t}(\lambda)$ for various $w$ and do this efficiently. We now show how to obtain $f_y(\lambda)$ by simple matrix-vector multiplication from $f_X(\lambda)$.
4.1. Spectrum of Multivariate Time Series and
Their Linear Combinations
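For multivariate $X_t$ the spectrum equals
$$ S_X(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \Gamma_X(k) e^{2\pi i k \lambda}, \quad \lambda \in [-\pi, \pi]. \tag{18} $$
Contrary to the univariate case, (18) is in general complex-valued. Yet, since $\Gamma_X(k) = \Gamma_X(-k)^\top$, $S_X(\lambda) \in \mathbb{C}^{n \times n}$ is Hermitian for every λ, $S_X(\lambda) = \overline{S_X(\lambda)}^\top$, where $\bar z = a - ib$ is the complex conjugate of $z = a + ib \in \mathbb{C}$ (Brockwell and Davis, 1991, p. 436).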
For dimension reduction we consider linear combinations $y_t = w^\top X_t$, $w \in \mathbb{R}^n$. By assumption $E y_t = w^\top E X_t = 0$ and $\gamma_y(k) = E y_t y_{t-k} = w^\top \Gamma_X(k) w$. In particular, $\gamma_y(0) = \sigma_y^2 = w^\top \Sigma_X w$. The spectrum of $w^\top X_t$ can be quickly computed via $S_y(\lambda) = w^\top S_X(\lambda) w$ and consequently
$$ f_y(\lambda) = \frac{w^\top S_X(\lambda) w}{w^\top \Sigma_X w}, \quad \lambda \in [-\pi, \pi]. \tag{19} $$

⁴ Advances in “compressed sensing” (Jacques and Vandergheynst, 2010) might improve estimates; see also “nonuniform FFT” (Fessler and Sutton, 2003).
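As a small numeric illustration of (19), the quadratic form can be evaluated at a single frequency. The matrices `S_X` and `Sigma_X`, the weight vector `w`, and the helper `quad_form` are made-up values and names chosen only for this sketch; they are not taken from any data set in the paper.

```python
def quad_form(S, w):
    """Return w^T S w for a square matrix S (nested lists) and a real vector w."""
    n = len(w)
    return sum(w[a] * S[a][b] * w[b] for a in range(n) for b in range(n))

# Hypothetical Hermitian spectral matrix S_X(lambda) at one frequency lambda.
S_X = [[2.0 + 0.0j, 0.5 + 0.3j],
       [0.5 - 0.3j, 1.0 + 0.0j]]
# Hypothetical covariance matrix Sigma_X.
Sigma_X = [[4.0, 1.0],
           [1.0, 2.0]]
w = [0.6, 0.8]

s_y = quad_form(S_X, w)        # complex type, but the imaginary parts cancel
var_y = quad_form(Sigma_X, w)  # w^T Sigma_X w
f_y = s_y.real / var_y         # Eq. (19) evaluated at this single frequency
print(s_y, var_y, f_y)
```

Because `S_X` is Hermitian and `w` is real, the off-diagonal imaginary parts cancel and the quadratic form is real, which is why a real-valued spectral density results for every linear combination.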
Since $f_y(\lambda) \ge 0$ for every $y_t$, $w^\top S_X(\lambda) w \ge 0$ for all $w \in \mathbb{R}^n$; thus $S_X(\lambda)$ is positive semi-definite.
4.2. Solving the Optimization Problem
Since Ω is invariant to shift and scale (Property 3.2b), we shall not only assume zero mean, but also contemporaneously uncorrelated observed signals with unit variance in each component. WLOG consider $U_t = \Sigma_X^{-1/2} X_t$; thus $E U_t U_t^\top = I_n$. Given $\hat W_U$ for $U_t$, the transformation for $X_t$ becomes $\hat W_X = \hat W_U \hat\Sigma_X^{-1/2}$.
Problem (16) is then equivalent to
$$ w^* = \arg\min_{w, \|w\|_2 = 1} h(w) \tag{20} $$
where
$$ h(w) = -\int_{-\pi}^{\pi} w^\top S_U(\lambda) w \cdot \ell(w; \lambda) \, d\lambda, \tag{21} $$
is the spectral entropy (Eq. (11)) of $w^\top U_t$ as a function of $w$. We use $\ell(w; \lambda) := \log w^\top S_U(\lambda) w = \log f_{w^\top U}(\lambda)$ for better readability.
In practice we approximate (21) with $\hat S_U(\omega_j) \in \mathbb{C}^{n \times n}$ and thus obtain⁵
$$ w^* = \arg\min_{w, \|w\|_2 = 1} \hat h_T(w). \tag{22} $$
Here
$$ \hat h_T(w) = -\frac{1}{T} \sum_{j=0}^{T-1} w^\top \hat S_U(\omega_j) w \cdot \hat\ell(w; \omega_j) \tag{23} $$
is the discretized version of (21), where $\hat\ell(w; \omega_j) = \log w^\top \hat S_U(\omega_j) w$. Notice that $\hat S_U(\omega_j) \in \mathbb{C}^{n \times n}$ varies with $\omega_j$ while $w \in \mathbb{R}^n$ is fixed over all frequencies, which makes it difficult to obtain an analytic, closed-form solution. However, (22) can be solved iteratively borrowing ideas from the expectation maximization (EM) algorithm (Dempster, Laird, and Rubin, 1977).
4.2.1. A Convergent EM-like Algorithm
For every $w \in \mathbb{R}^n$, $\|w\|_2 = 1$, $h(w)$ has the form of a mixture model with weights $\hat\pi(j \mid w) := w^\top \hat S_U(\omega_j) w \ge 0$ and “log-likelihood” $\hat\ell(w; \omega_j)$. Since $\int_{-\pi}^{\pi} f_{w^\top U}(\lambda) \, d\lambda = 1$, $\hat\pi(j \mid w)$ is indeed a discrete probability distribution over $\{\omega_j \mid j = 0, 1, \ldots, T-1\}$.
Just as in an EM algorithm, the objective $h(w)$ can be optimized iteratively by first fixing $w \leftarrow w^{(i)}$ in $\hat\ell(w; \omega_j)$, and then minimizing the quadratic form
$$ w_{i+1} = \arg\min_{w, \|w\|_2 = 1} w^\top \hat S_U^{(i)} w, \tag{24} $$
where $\hat S_U^{(i)} = -\frac{1}{T} \sum_{j=0}^{T-1} \hat S_U(\omega_j) \cdot \hat\ell(w_i; \omega_j)$.
Proposition 4.1. $\hat S_U^{(i)}$ is positive semi-definite.
Thus (24) can be solved analytically by the last eigenvector of $\hat S_U^{(i)}$, automatically guaranteeing $\|w\|_2 = 1$. The procedure iterates until $\|w_{i+1} - w_i\| < tol$ for some tolerance level tol. For initialization we sample $w_0$ from an n-dimensional uniform hyper-cube, $U_n(-1, 1)$, and normalize to $w_0 = w_0 / \sqrt{\sum_{j=1}^{n} w_{j,0}^2}$.

⁵ We use "wosa" estimates (sapa R package). However, any other estimate of $S_U(\lambda)$ can be used.
Theorem 4.2 (Convergence). The sequence $\{w_i\}_{i \ge 0}$ obtained via (24) converges to a local minimum $\hat h_T(w^*) = \lambda_{\min}^{(*)} \ge 0$, where $\lim_{i \to \infty} w_i = w^*$ and $\lambda_{\min}^{(*)}$ is the smallest eigenvalue of $\hat S_U^{(*)}$.
Corollary 4.3. The transformed data $y_1^{T,(*)} = w^{(*)\top} X_1^T$ satisfies
$$ \hat\Omega\left(y_1^{T,(*)}\right) = 1 - \lambda_{\min}^{*}. \tag{25} $$
Proof of Theorem 4.2. The entropy of a RV taking values in a finite alphabet $\{\omega_0, \ldots, \omega_{T-1}\}$ is bounded: $0 \le \hat h_T(w) \le \log_a T$ for all $w \in \mathbb{R}^n$. For convergence it remains to be shown that $\hat h_T(w_i) \ge \hat h_T(w_{i+1})$ with equality iff $w_{i+1} = w_i = w^*$. First,
$$ \hat h_T(w_i) = -\frac{1}{T} \sum_{j=0}^{T-1} w_i^\top \hat S_U(\omega_j) w_i \cdot \hat\ell(w_i; \omega_j) = w_i^\top \hat S_U^{(i)} w_i \ge w_{i+1}^\top \hat S_U^{(i)} w_{i+1} \tag{26} $$
since $w_{i+1}$ is the last eigenvector of $\hat S_U^{(i)}$. Second,
$$ w_{i+1}^\top \hat S_U^{(i)} w_{i+1} = -\frac{1}{T} \sum_{j=0}^{T-1} w_{i+1}^\top \hat S_U(\omega_j) w_{i+1} \cdot \hat\ell(w_i; \omega_j) \ge -\frac{1}{T} \sum_{j=0}^{T-1} w_{i+1}^\top \hat S_U(\omega_j) w_{i+1} \cdot \hat\ell(w_{i+1}; \omega_j) = \hat h_T(w_{i+1}), \tag{27} $$
where (27) holds as $E_p[-\log q] = -\sum_{j=1}^{n} p_j \log q_j \ge -\sum_{j=1}^{n} p_j \log p_j = E_p[-\log p]$ for any $q \neq p$.
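The inequality used in the last step of the proof (often called Gibbs' inequality) can be checked numerically. The two distributions `p` and `q` below are arbitrary made-up values, and `cross_entropy` is a hypothetical helper name for this sketch.

```python
import math

def cross_entropy(p, q):
    """E_p[-log q] = -sum_j p_j log q_j; terms with p_j = 0 contribute nothing."""
    return -sum(pj * math.log(qj) for pj, qj in zip(p, q) if pj > 0)

p = [0.5, 0.3, 0.2]   # arbitrary example distribution
q = [0.2, 0.5, 0.3]   # a different distribution on the same alphabet

h_p = cross_entropy(p, p)    # entropy of p
h_pq = cross_entropy(p, q)   # cross-entropy of q relative to p
print(h_p, h_pq)
```

The cross-entropy exceeds the entropy whenever `q` differs from `p`, which is exactly what makes the objective non-increasing from one iteration to the next.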
To lower the chance of landing in local optima we repeat (24) for several random starting positions w0 and
then select the best solution.
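The iteration of Section 4.2.1 can be sketched for the bivariate case (n = 2). Everything here is an assumption made for the sake of a short, self-contained example: the raw periodogram matrix replaces the WOSA estimate; whitening uses a Cholesky factor rather than $\Sigma_X^{-1/2}$; frequency $\omega_0$ is dropped because the centered data have no mass there; the weights are scaled to sum to one, so no extra 1/T factor appears; natural logarithms are used; the starting vector is fixed rather than random; and the last eigenvector of a 2×2 matrix is obtained in closed form. All function names and the simulated data are hypothetical.

```python
import cmath
import math
import random

def whiten2(x1, x2):
    """Center two series, then decorrelate them to unit variance (Cholesky whitening)."""
    T = len(x1)
    m1, m2 = sum(x1) / T, sum(x2) / T
    x1, x2 = [v - m1 for v in x1], [v - m2 for v in x2]
    s11 = sum(a * a for a in x1) / T
    s12 = sum(a * b for a, b in zip(x1, x2)) / T
    s22 = sum(b * b for b in x2) / T
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    u1 = [a / l11 for a in x1]
    u2 = [(b - l21 * a) / l22 for a, b in zip(u1, x2)]
    return u1, u2

def dft(u):
    """Unitary DFT coefficients at Fourier frequencies j = 1, ..., T-1."""
    T = len(u)
    return [sum(u[t] * cmath.exp(-2j * math.pi * j * t / T) for t in range(T)) / math.sqrt(T)
            for j in range(1, T)]

def spectral_matrices(u1, u2):
    """Real parts (a_j, b_j, c_j) of the 2x2 periodogram matrices, scaled to sum to I."""
    T = len(u1)
    return [(abs(p) ** 2 / T, (p * q.conjugate()).real / T, abs(q) ** 2 / T)
            for p, q in zip(dft(u1), dft(u2))]

def weights(S, w):
    """pi_hat(j | w) = w^T S_j w for real w (only the real part of S_j matters)."""
    return [a * w[0] ** 2 + 2 * b * w[0] * w[1] + c * w[1] ** 2 for a, b, c in S]

def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)

def update(S, w):
    """One step of (24): fix the logs at w, return the eigenvector of the smallest eigenvalue."""
    logs = [math.log(max(v, 1e-300)) for v in weights(S, w)]
    a = -sum(l * s[0] for l, s in zip(logs, S))
    b = -sum(l * s[1] for l, s in zip(logs, S))
    c = -sum(l * s[2] for l, s in zip(logs, S))
    theta = 0.5 * math.atan2(2 * b, a - c)      # direction of the largest eigenvalue
    return [-math.sin(theta), math.cos(theta)]  # orthogonal direction: smallest eigenvalue

random.seed(0)
T = 128
s = [math.sin(2 * math.pi * 5 * t / T) for t in range(T)]
n1 = [random.gauss(0, 1) for _ in range(T)]
n2 = [random.gauss(0, 1) for _ in range(T)]
x1 = [a + b for a, b in zip(s, n1)]                           # signal plus noise
x2 = [0.5 * a - b + 0.2 * c for a, b, c in zip(s, n1, n2)]    # shares the same noise

S = spectral_matrices(*whiten2(x1, x2))
w = [1.0, 0.0]
history = [entropy(weights(S, w))]
for _ in range(50):
    w = update(S, w)
    history.append(entropy(weights(S, w)))
    if history[-2] - history[-1] < 1e-10:
        break
omega = 1.0 - history[-1] / math.log(len(S))
print(w, omega)
```

The recorded entropies in `history` never increase, in line with Theorem 4.2. The stopping rule here monitors the decrease in entropy instead of $\|w_{i+1} - w_i\|$, which sidesteps the sign ambiguity of eigenvectors in this simplified setting.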
[Figure (caption not recovered). Panels: (a) daily returns in %; (b) biplots of ForeCA (top) and PCA (bottom); (c) scree-plot of $\hat\Omega(\cdot)$ (in %) by component for orig, PCA, SFA, and ForeCA. Series labels: CHINA, EASTEU, ENERGY, GOLD, INDIA, LATAM, MINING, WATER. Additional panels: ForeC 1 to ForeC 8.]
1054
363
948
569
103
512
1378
1107
93
1064
1379
491
1114
326
876
345
850
614
807
300
1099
1231
1053
573
1390
864
851
1341
550
477
419
669
358
1377
55
1043
798
1069
1290
654
1050
1211
1046
1123
284
784
191
1364
1368
1406
1320
1188
516
517
872
14
348
657
1356
29
141
1022
256
1394
212
725
848
1213
1026
1230
829
1382
94
74
1392
565
558
753
1036
591
263
238
1113
1387
962
656
663
593
482
796
825
200
684
595
1365
698
659
822
1371
935
461
164
1030
490
87
213
343
475
340
11491154
908
601
681
1396
706
679
1057
748
1
604
447
1266
682
594
388
1300
1119
297
1305
826
257
1301
339
269
258
835
1410
48
1040
518
844
523
64
1284
636
321
816
780
21
831
1258
262
1189
1228
57
1130
1243
1081
667
688
439
1021
570
192
373
704
1375
1044
440
630
104
959
572
724
351
721
1405
1060
1018
650
779
740
149
368
880
977
134
1318
310
1408
736
874875
224
598
521
1117
215
1374
296
837
719
699
293
919
1273
1104
916
1106
286
853
23
1324
941
752
355
964
305
301
711
501
219
1222
648
498
912
979
1065
1125
1195
884
1391
59
1029
660
783
1322
686
1260
885
815
268
866
1265
1041
105
210
812
961
1283
1068
1080
1051
963
1019
1325
1206
1118
830
391
189
1092
556
418
1179
1045
1191
945
456
1263
1122
690
1373
1052
135
1175
36
241
1288
548
845
18
359
270
248
646
917
100
199
680
525
833
303
709
568
390
444
909
927
1025
1088
1247
723
789
628
1264
227
1397
777
758
991
888
126
694
1407
33
1351
427
407
394
1096
96
317
1345
1115
1357
675
677
726
84
717
770
943
1340
817
755
119
1252
1112
1249
828
886
183
653
6
674
1197
353
823
366
380
24
423
819
728
511
231
510
196
911
1337
1269
8
87
739
1095
937
1148
952
25
820
839
290
438
382
308
411
1221
1313
940
714
1237
992
306
478
1
7
707
1250
1155
1103
16
449
384
921
846
1049
361
1174 1328
575
799
672
98
645
1331
1323
436
1412
984
1193
1136
ENERGY
255
626
1024
1143615
469
335
1169
813
277
1280
900
1346
571
928
998
1186
557
792
404
1238
1042
821
534
1008
1389
82
352
421
934
832
627
1116
970
1240
599
1395
972
1028
841
950
67
1066
453
1208
838
1176
732
1182
899
309
623
1349
661
371
1016
51
859
1358
976
225
1271
759
790
7118
332
1150
913
639
298
754
433
536
73
610
918
901
734
560
1253
1077
467
987
250
292
1310
730
655
712
545
949
629
713
1067
955
1161
1279
221
999
502
806
405
773
958
1162
860
791
993
890
27
20
566
600
1102
140
32
1089
1167
574
1352
1383
938
1055
354
120
79
136
1056
71
138
1038
1181
861
493
1013
794
1218
729
188
642
11
441
1151
162
1171
892
211
1093
408
174
1062
83
1160
968
374
1105
673
497
360
703
797
1184
1289
1097
605
951
745
836
1015
1134
814
1129
190
9
97
537
1138
1367
198
1359
967
99
722
80
788
12
403
1039
1121
1059
869
1199
1170
986
385
483
341
750
1312
155
609
294
818
489
662
1027
855
60
631
1257
530
996
131
989
1234
983
1311
9633
214
259
485
1296
689
584
1141
466
1291
90
715
1399
299
1236
107
882
187
985
553
969
95
338
576
40
562
944
743
733
78
551
995
1144
462
1010
91
903
1196
52
223
579
505
769
1245
683
893
1370
1185
1321
1003
606
75
61
253
1335
616
416
342
331
185
1327
509
973
1212
1073
445
127
251
1334
401
658
612
471
203
287
862
716
904
643
1344
42
1511146
617
1347 1145
417
260
106
787
230
58
542
546
578
179
85
39
990
1336
1281
538
349
205
216
406
279
1137
43
278
195
307
800
1210
1108
265
1350
152
1152
450
488
176
735
1126
148
448
169
26
280
472473
315
245
63
1158
430
526
114
184
111
644
160
249
22
640
685
1343
1275
172
144
102
8274
124
442
492
171
MINING
158
1079
333
180
242
154 5801157
137
56
122
145
101
619
GOLD
325
69
0 400
-20
0 40
ForeC4
0.00
1311
-60
ForeC2
0.00
-0.10
-60
-0.10
-4
1000
Time
40
-0.10
4 -4
-2
0
0
4 -15
0
4-6
0
5 15-4
INDIA
-10
0 400
0
1350
1142 MINING
1395 622
993 1188
991 13094EASTEU
145
1100
1071
616
146
1164
981
1193 1174
678
1318
189
1135
1310
1222
1062
1273
409
92
68
1179
1073
91
959
589
1320
3
133
550
609
1330
1085
1182
75
553
1048
739
1125
128
1313
408
1241
1208
1139
1147
1116
13
747
954
976
17
680
922
982
167
416
1266
467
1323
374
345
1160
1358
986
946
1020
401
112
459
1228
1262
201
539
1059
1168
821
1108
452
1079
1080
1346155
1090
1024
525
683
1398
997
1411
70
686
506
644
1008
224
570
574
1027
661
1055
221
172
584
228
1356
1220
1328
331
1004
1029
202
581
1355
1053
1280
259
1303
1413
883
1198
927
966
62
500
1050
113
1171
907
162
1201
137
621
238
965
1283
74
1191
1294
848
843
435
267
1123
208
1216
1260
1114
461
817
502
985
1295
691
442
554
728
59
1035
1245
934
1237
1097
852
186
281
369
1110
559
462
1165
293
1010
190
1376
395
456
973
778
1001
1204
139
386
1377
1306
716
212
1338
928
286
200
328
1132
515
181
939
1337
1247
1322
742
210
799
1315
1103
1369
1286
429
545
770
670
1003
125
390
1316
746
1254
709
831
1353
1212
1238
656
26
279
1258
451131 1163
903
1077
478
810
1156 786
22
349
536
103
562
578
734
968969
971
1275
505
998
1068
72
791
117
766
776
565
1231
744
1018
763
1240
108
474
1067
958
510
365
379
543
825
58
1109
168
693
1361
177
227
947
136
1340
1154
109
730
1105
800
1113
1276
1332
805
360
937
1013
6
919
313
29
667
270
832
923
1335
142143
737
178
479
534
1151
260
593
264
468
381
819
1388
445
1089
23
488
414
11
292
183
1365
801
795
917
116
784
187
263
131
1251
192
16
1359
CHINA
302
115
234
894
84
522
1394
1291
290
882
375
724
880
718
941
203
1249
1406
1099
1400
384
1307
987
1007
1180
243
775
945
1176
707
1159
1345
636
410
915
833
1138
367
27
235
1297
1288
1134
251
625
289
566
1381
225
1205
422
1372
807
1039
874
196
623
1408
513
5627
14
1269
199
1093
653
165
782783
174
657
1263
8
5
406
310
1221
296
1380
1150
359
458
1057
1112
272
749
335
1211
914
920
37
485
869
1083
298
311
498
9
1244
705
751
233
1217
99
463
604
1284
171
1
214
1207
446
380
1371
95
364
648
1354
432
863
547
1402
449
649
393
1126
1074
652
1185
655
69
1243
1403
592
464
912
913
1349
230
1042
1005
806
1017
277
1088
694
32
INDIA
5
10
140
31
1014
48
979
676
897
503
448
389
854
347
38
901
216
635
118
1343
396
1034
495
645
908
684
144
856
700
773
400
90
394
834
1363
355
288
996
1405
483
858
1033
1287
217
860
7
556
857
397
890
731
415
930
44
723
752
665
845
211
161
326
761
711
579
1304
759
1299
46
8
524
830
892
871
557
612
663
1019
720
569
1265
480
1386
1195
268
526
236
241
497
179
321
861
529
788789
312
413
1285
849
49
1374
226
706
98
950
517
41
1218
823824
940
1148
816
275
1329
780
1082
925
21
423
695
1036
1246
601
1199
1397
1301
1043
1278
231
215
15
411
1177
1327
166
451
1196
1234
1047
424
674
1061
933
307
815
494
223
93
803
886
428
519
696
357
847
765
889
630
1375
1348
78
685
412
340
465
1382
303
111
1203
496
1041
1095
595
632
12
658
39
437
436
385
1393
781
702
637
3
30
33
80
120
134
962
188
647
123
576
611
1314
100
356
643
47
804
421
261
71
701
1385
628
511
141
354
1130
1391
688
618
392
664
585
184
314
453
1127
455
198
868
745
156
1032
794
246
650
1364
382
771
538
1373
1076
53
1045
910
785
14
721
64
558
50
523
844
96
1040
518
714
853
838
820
1293
1410
444
774
521
60
399
1341
207
851
1252
339
105
713
450
257
719
836
454
599
358
329
1305
924
337
362
855
333
417
466
811
764
974
703
613
669
1300
1119
1257
341
594
750
818
875
885
83
1225
1277
34
240
88
262
1215
52
54
151 1149
1379
754
440
673
575
130
271
361
1298
1325
438
1031
535
944
282
254
662
642
1133
1334
1038
163
253
671
977
138
209
841
542
624
28
867
1259
1044
586
881
1224
1181
587
195
299
614
107
1366
888
325
931
760
426
660
741
250
158
287
990
792
610
391
1172
489
332
1189
276
338
548
1155
1137
879
12321233
1086
551
304305
1121
433434
877
336
430
607
1194
477
301
951
152
176
1022
197
269
427
582
284
13671368
1200
1122
1279
371
431
425
617
182
249
160
2204
1157
258
481
736
564
687
1037
1384
659
955
398
245
1344
66
528
634
896
893
86
418
1290
712
420
169
905
904
1390
1025
441
300
699
708
30
102
5
83
447
222
1175
902
677
55
185
318
546
790
935
387
350
1255
975
31167
51
323
899
842
1239
740
255
520
530
129
1271
865
24
753
822
170
94
42
40
320
404
953
870
872
516
921
1101
327
873
943
895
690
948
248
67
294
1011
1308
469
1052
1214
278
963
812
846
348
476
1075
1009
633
1002
813
828
132
1118
698
956
605
191
509
475
76
1336
891
768
1339
1352
1136
1030
507
555
697
884
372
484
978
911
1021
675
1342
900
1202
1104
194
6
39
353
772
317
1253
531
590
101
104
826
777
758
672
403
859
541
126
306
577
808
725
65
793
600
1226
571
352
972
135
35
1120
346
1015
402
8182
366
193
491
471
377
280
180
295
388
572
1289
376
878
1268
1
064
443
1000
537
1098
980
308
756
175
1401
1264
1146
1006
124
334
315
1111
407
1302
1383
319
1186
1115
715
840
970
499
173
936
797
682
493
638
689
487
419
1106
591
929
309
1213
540
563
1129
1407
769
439
568
220
1056
114
1063
LATAM
383
97
960
580
738
835
470
87
651
1210
252
368
827
363
796
370
646
802
1166
285
964
73
11831184
961
19
983
1261
787
735
809
504
110
722
274
762
918
1173
932
512
460
206
457
779
666
106
602
1223
482
1066
1070
1392
864
89
1248
343
681
704
218
668
588
866
1081
63
1
54
43
501
1128
1312
492
237
405
1236
1370
373
629
733
1321
533
1360
620
153
239
898
757
938
596
297
256
1333
862
473
205
1362
567
122
36
1227
952
1250
837
150
710
266
573
1065
729
679
229
213
490
942
244
743
283
1058
232
561
1023
1026
598
949
1292
486
322
814
1051
56
273
887
549
798
79
748
839
1170
18
909
20
1230
1178
219
560
532
1229
1144
344
552
242
1197
1387
316
717
291
1256
265
727
57
1069
1084
1282
995
726
1102
378
640
1092
GOLD
148
641
692
1389
61
906
767
988
324
119
1272
1012
1409
1094
732
1117
342
1219
999
603
967
544
25
994
121
159
1274
926
1281
1140
1192
1209
1267
755
508
51
527
626
147
1141
247
1124
631
1162
1296
1096
1091
1054
876
1270
606
829
1331
1324
1152
1319
1049
472
771046157
1078
1187
1060
1206
1153
164
1317
1072
1158
1326
1145
1235
1028
654
984
597
1107
1242
916
1378
619
1016
127
1169
1161
1412
1399
1357
615
1396
1347
1087
850
957
608
989
1190
149
1404
1143 992
WATER
ENERGY
-0.2
MINING ENERGY
GOLD
-60
4
LATAM EASTEU
CHINA
Forecastable Component Analysis
20
30
40
(d) sample ACF ρb(k) of
ForeCs (b
ρ(0) = 1 omitted)
Figure 2. Equity fund returns analyzed with PCA, SFA, and ForeCA. (Dataset equityFunds in R package fEcofin.)
4.3. Obtaining a K-dimensional Subspace
To obtain all K loadings $W_{1,\ldots,K} = [w_1, \ldots, w_K]$ that give uncorrelated series $y_{j,t}$, we iteratively (starting at k = 1) i) compute $w_k$, ii) project U onto the null space of $W_{1,\ldots,k}$ → $U^{(k)} = W_{1,\ldots,k}^{\perp} U \in \mathbb{R}^{K-k}$, iii) apply the EM-type algorithm on $U^{(k)}$ to obtain $\tilde{w}_{k+1}$, and finally iv) transform $\tilde{w}_{k+1}$ back to loadings $w^{(k)}$ of U. Doing this for k = 1, . . . , K gives K loadings $\widehat{W}_U$. Loadings for $X_t$ are given by $\widehat{W}_X = \widehat{W}_U \widehat{\Sigma}_X^{-1/2}$.
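The iterative projection scheme above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names (`whiten`, `null_space_basis`, `stand_in_loading`, `deflation_loadings`) are hypothetical, and the EM-type step is replaced by a simple stand-in (leading eigenvector of the symmetrised lag 1 autocovariance), which is not the ForeCA criterion. Only the project, solve, and map-back bookkeeping is meant to mirror steps i) to iv).

```python
import numpy as np

def whiten(X):
    """Centre X (K x T); return U = Sigma^{-1/2} (X - mean) and Sigma^{-1/2}."""
    Xc = X - X.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(np.cov(Xc))
    sigma_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return sigma_inv_sqrt @ Xc, sigma_inv_sqrt

def null_space_basis(W, K):
    """Orthonormal rows spanning the orthogonal complement of the rows of W."""
    if W.shape[0] == 0:
        return np.eye(K)
    _, _, Vt = np.linalg.svd(W, full_matrices=True)
    return Vt[W.shape[0]:]

def stand_in_loading(U):
    """HYPOTHETICAL placeholder for the EM-type step (not ForeCA's criterion):
    leading eigenvector of the symmetrised lag-1 autocovariance of U."""
    C1 = U[:, 1:] @ U[:, :-1].T / (U.shape[1] - 1)
    vals, vecs = np.linalg.eigh((C1 + C1.T) / 2)
    return vecs[:, np.argmax(np.abs(vals))]

def deflation_loadings(U, find_loading=stand_in_loading):
    """Find K orthonormal loadings of the whitened series U, one at a time."""
    K = U.shape[0]
    W = np.zeros((0, K))
    for _ in range(K):
        N = null_space_basis(W, K)       # basis of the remaining subspace
        w_tilde = find_loading(N @ U)    # loading in the reduced space
        w = N.T @ w_tilde                # map back to a loading of U
        W = np.vstack([W, w / np.linalg.norm(w)])
    return W

# Toy data: four AR(1) sources with different coefficients, randomly mixed.
rng = np.random.default_rng(0)
E = rng.standard_normal((4, 500))
S = np.zeros_like(E)
for t in range(1, 500):
    S[:, t] = np.array([0.9, 0.5, 0.0, -0.7]) * S[:, t - 1] + E[:, t]
X = rng.standard_normal((4, 4)) @ S

U, sigma_inv_sqrt = whiten(X)
W_U = deflation_loadings(U)
W_X = W_U @ sigma_inv_sqrt               # loadings for the original series
Y = W_X @ (X - X.mean(axis=1, keepdims=True))
```

Because each new loading lives in the orthogonal complement of the previous ones and U is whitened, the rows of `W_U` are orthonormal and the components in `Y` are uncorrelated by construction, whatever criterion `find_loading` uses.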
5. Applications
Here we demonstrate the usefulness of ForeCA both for finding informative, forecastable signals and as a tool for time series classification.
5.1. Improving Portfolio Forecasts
Figure 2a shows daily returns of eight equity funds
from 2002/01/01 to 2007/05/31 (T = 1413). In the
financial context finding forecastable series is an important goal by itself, not just for structure discovery.
In particular, we can interpret a linear combination w
as a portfolio of stocks. The w∗ with the highest Ω
gives the most forecastable portfolio.
Figure 2b shows a bi-plot for PCA and ForeCA for
(w1 , w2 ) and (w3 , w4 ). As PC 1 weighs all funds almost equally, it represents the average market movement; the second component contrasts Gold & Mining
with the rest and we can therefore label PC 2 as the
“commodity” index. The third and fourth PC indicate
energy/infrastructure and geographic regions.
However, even though PC 1 is also the most predictable PC, it has only a slightly larger $\hat{\Omega}$ than the most forecastable fund, India (Fig. 2c). On the other
hand, combining Water (weight wwater,1 = 0.72) with
Energy (0.58) is almost twice as forecastable as India
(weights are from ForeC 1 in Fig. 2b). ForeC 2 also has
high forecastability by selling Energy & Water (−0.53
& −0.47) and buying Mining & Eastern Europe (0.55
& 0.38). The third and fourth ForeCs seem to be hedging strategies (ForeC 3: Water vs. Energy; ForeC 4:
Latin America & Gold vs. China & Mining).
As financial data has only very small autocorrelation (usually at lag 1, if any), SFA and ForeCA yield
overall very similar results, except for a “wrong” ranking by SFA (Fig. 2c): SF 8 is the fastest feature (large,
but negative lag 1 autocorrelation), yet it is the second most forecastable component. While it is true that
white noise is slower than an auto-regressive process of
order 1 (AR(1)) with negative autocorrelation, the latter is still more forecastable. Since we want to reveal
intertemporal structure, white noise must be ranked
lowest; and ForeCA indeed does so (Fig. 2d).
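The ranking argument can be checked numerically. The sketch below assumes that Ω is one minus the differential entropy of the normalised spectral density divided by log 2π (so that white noise gets Ω = 0), and it uses the textbook AR(1) spectral density; the function names are hypothetical and the integral is approximated by a midpoint rule.

```python
import math

def ar1_spectral_density(lam, phi):
    """Normalised AR(1) spectral density on [-pi, pi]; integrates to 1."""
    return (1 - phi ** 2) / (2 * math.pi * (1 - 2 * phi * math.cos(lam) + phi ** 2))

def omega_ar1(phi, n=4096):
    """Assumed definition: Omega = 1 - H_s / log(2*pi), where H_s is the
    differential entropy of the normalised spectral density (midpoint rule)."""
    step = 2 * math.pi / n
    entropy = 0.0
    for j in range(n):
        f = ar1_spectral_density(-math.pi + (j + 0.5) * step, phi)
        entropy -= f * math.log(f) * step
    return 1 - entropy / math.log(2 * math.pi)

omega_white = omega_ar1(0.0)   # white noise: flat spectrum
omega_pos = omega_ar1(0.5)     # "slow" AR(1), lag-1 autocorrelation +0.5
omega_neg = omega_ar1(-0.5)    # "fast" AR(1), lag-1 autocorrelation -0.5
print(omega_white, omega_pos, omega_neg)
```

Under these assumptions the two AR(1) processes obtain the same Ω, since flipping the sign of φ only shifts the spectrum by π, and both rank above white noise, whereas a slowness criterion would place the negatively correlated process last.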
ForeC 5 and 8 detect the 20 day lag (one trading month), but correlations are too low to achieve much higher forecastability than the simpler and faster SFA.
In the next example I study quarterly income data,
where ForeCA can leverage its nonparametric power
and detect important dependencies at various frequencies automatically from the data.
5.2. Classification of US State Economies
I consider quarterly per-capita income growth rates of the “lower 48” from 1982/1 to 2011/4 (last 30 years)
$$g_{j,t} = r_{j,t} - r_{US,t}, \qquad j \in \{\mathrm{AL}, \ldots, \mathrm{WY}\},$$
where $r_{j,t}$ is the annual growth rate of region j (data publicly available at www.bea.gov/itable). Interested in finding similar state economies within the US, we subtract the US baseline. Clustering states with similar economic dynamics can help to decide where to provide support when facing difficult economic times. For example, if certain states do not show any important dynamics on a 7-8 year scale, also known as the “business cycle” (Hughes Hallett and Richter, 2008), then it might be better to support states that are affected by these global economy swings.
[Figure 3 panels: (a) Average $\hat{\mu}(g_t)$; (b) Standard deviation $\hat{\sigma}(g_t)$, ND omitted ($\hat{\sigma}_{ND} = 2.98$); (c) Lag k = 1 autocorrelation $\hat{\rho}(1)$; (d) Lag k = 4 autocorrelation $\hat{\rho}(4)$; (e) Forecastability $\hat{\Omega}(g_t)$; (f) Spectra $\hat{f}_g(\lambda)$ of Nevada and Nebraska against white noise ($\log_2$ scale); (g) Absolute value of $\hat{\rho}(1)$; (h) Absolute value of $\hat{\rho}(4)$.]
Figure 3. Summary statistics of quarterly income growth rates (in %) from 1982/1 – 2011/4 with respect to US baseline: $\hat{\mu}(r_{US,t}) = 1.32\%$, $\hat{\sigma}(r_{US,t}) = 0.92\%$ per quarter; $\hat{\Omega}(r_{US,t}) = 4.86\%$, $\hat{\rho}_1(r_{US,t}) = 0.42$, $\hat{\rho}_4(r_{US,t}) = 0.13$.
The first row of Fig. 3 displays basic summary statistics: sample average, standard deviation, and first and fourth order autocorrelation. The second row gives statistics related to forecastability: Fig. 3e shows $\hat{\Omega}$ based on the spectra in Fig. 3f; Fig. 3g shows the absolute lag 1 correlation (analogously for lag 4 in Fig. 3h), since two AR(1)s with a ±φ lag 1 coefficient are equivalent in terms of forecasting (compare to SFA ranking in the portfolio example).
The spectral densities of Nevada and Nebraska illustrate the intuitive derivation of Ω(xt ) from Eq. (10):
for Nebraska all frequencies are equally important and
it is thus difficult to forecast any better than the sample mean; in contrast, Nevada’s income growth rates are
mainly driven by a yearly cycle (ωj ≈ 0.25) and low
frequencies, thus Nevada is much easier to forecast.
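The flat-versus-peaked contrast can be reproduced on simulated data. The sketch below is a rough plug-in estimate and not the estimator used in the paper: it assumes a raw (unsmoothed) periodogram and a discretised entropy normalised by the log of the number of frequencies; the function names and the simulated series are hypothetical. A plain O(N²) DFT keeps it dependency-free.

```python
import cmath
import math
import random

def periodogram(x):
    """Raw periodogram at Fourier frequencies k/N, k = 1, ..., N-1
    (frequency zero is dropped, so the sample mean is ignored)."""
    n = len(x)
    out = []
    for k in range(1, n):
        dft = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(dft) ** 2 / n)
    return out

def omega_hat(x):
    """Assumed plug-in estimate: 1 - entropy(normalised periodogram) / log(#frequencies)."""
    spec = periodogram(x)
    total = sum(spec)
    probs = [s / total for s in spec]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1 - entropy / math.log(len(probs))

random.seed(1)
n = 120  # 30 years of quarterly observations
# "Nebraska-like": white noise, all frequencies equally important.
flat = [random.gauss(0, 1) for _ in range(n)]
# "Nevada-like": a yearly cycle (frequency 0.25 per quarter) plus noise.
cyclic = [2 * math.cos(2 * math.pi * 0.25 * t) + random.gauss(0, 1) for t in range(n)]

omega_flat = omega_hat(flat)
omega_cyclic = omega_hat(cyclic)
print(omega_flat, omega_cyclic)
```

The raw periodogram is noisy, so even white noise receives a small positive value here; a smoothed spectral estimate would pull it closer to zero. The ordering of the two series is what the example is meant to show.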
A similar dataset (but annually and for different years)
has been analyzed in Dhiral, Kalpakis, Gada, and Puttagunta (2001), who fit AR(1) models to the nonadjusted growth rates rj,t for 25 pre-selected states,
and then cluster them in the model space. Although
they obtain interpretable results, it is unlikely that US
state economies only differ in their lag 1 coefficient. In
particular, simple AR(1) models cannot capture the
business cycle, which is clearly visible in Fig. 3f (even
for the adjusted rates).
Similarly, as SFA maximizes lag 1 correlation, it misses
the quarterly cycle. ForeCA does not face this model
selection bias, but can find forecastability across all
frequencies. In particular, only ForeC 4 detects interesting high frequency signals (Fig. 4b). The most
forecastable PCs are PC 5, 4, and 1; interestingly PC
3 is least important for forecasting among all 48 PCs.
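Why a lag 1 criterion can miss a cycle of four quarters is easy to see on an idealised series. The sketch below uses a noise-free period-4 pattern (a hypothetical toy series, not the income data) and a standard sample autocorrelation; `sample_acf` is an illustrative helper.

```python
def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Idealised cycle with a period of four quarters, 120 observations.
yearly_cycle = [1.0, 0.0, -1.0, 0.0] * 30

rho1 = sample_acf(yearly_cycle, 1)   # what a lag-1 criterion sees
rho4 = sample_acf(yearly_cycle, 4)   # the dependence it misses
print(rho1, rho4)
```

The series is perfectly predictable from its value four quarters earlier, yet its lag 1 autocorrelation vanishes, so a method that only looks at lag 1 treats it like white noise.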
Also note that ForeCs are more interpretable than SFs
or PCs (Figs. 4b - 4d). Particularly, ForeC 1 shows a
clear ≈ 25 year period (generation cycle), whereas PC
1 looks somewhat arbitrary. Yet, the associated loadings in Fig. 4a are quite similar.
6. Related Work
Using predictability to separate signals is not new.
In the classic time series literature, Box and Tiao (1977) introduced canonical analysis and measured predictive power by the residual variance of fitting vector auto-regression (VAR) models. Recently, Matteson and Tsay (2011) proposed another DR technique that
blends PCA and ICA by separating signals to the extent of fourth moments (but not higher).
Stone (2001) uses predictability as a contrast function for blind source separation (BSS). While this approach is similar to ours, it relies on subjective measures of “short” and “long” term moving averages,
which are then used to produce actual forecasts.
Much work in BSS (Gomez-Herrero, Rutanen, and Egiazarian, 2010; Li and Adali, 2010), especially ICA, focuses on minimizing entropy rate. The entropy rate $H(y_t) = \lim_{t \to \infty} H(y_t \mid y_{t-1}, y_{t-2}, \ldots)$ of a Gaussian process is related to the spectrum via (Cover and Thomas, 1991, p. 417)
$$H(y_t) = \frac{1}{2} \log 2 \pi e + \frac{1}{4 \pi} \int_{-\pi}^{\pi} \log S_y(\lambda) \, d\lambda. \qquad (28)$$
However, these approaches require VAR model fits and/or numerical optimization.
In contrast, the ForeCA measure $\Omega(y_t)$ is based on information-theoretic uncertainty and is an inherent property of the stochastic process $y_t$. We believe that this makes $\Omega(y_t)$ a more principled measure of forecastability than model-dependent measures. Furthermore, it can be estimated quickly using data-driven, nonparametric techniques.
It is important to point out that spectral entropy, i.e., differential entropy of (11), is neither equal nor proportional to the entropy rate in (28). For particular processes they coincide (e.g., for an AR(1); Gibson (1994)), but in general they do not. They measure different properties of the signal. Thus ICA algorithms based on entropy rate minimization do not yield the same results as ForeCA. In fact, the ForeCA measure can be used to rank ICs by decreasing forecastability.
Cardoso (2004) gives an excellent account of the intertwined relations between Gaussianity, autocorrelation, and dependence in multivariate time series and their effect on objective functions for BSS. Exactly because of this tangle, we only consider frequency properties of the signal and not entropy rate, since for forecasting the distribution itself is of minor importance compared to the temporal dependence.
[Figure 4 panels: (a) First 4 loadings; (b) ForeCs; (c) SFs; (d) PCs; (e) scree-plot of $\hat{\Omega}(\cdot)$ (in %) by component for orig, PCA, SFA, and ForeCA.]
Figure 4. PCA, SFA, and ForeCA on US income data.
7. Discussion
I introduce Forecastable Component Analysis (ForeCA), a new dimension reduction technique for multivariate time series. Contrary to other popular methods such as PCA or ICA, ForeCA takes temporal dependence into account and actively searches for the most forecastable subspace. ForeCA minimizes the entropy of the spectral density: lower entropy implies a more forecastable signal. The optimization problem has an iterative, yet fast analytic solution, and provably leads to a (local) optimum.
While SFA is a good approximation (maximizing lag 1 correlation), real world signals often have more complex correlation structure. The proposed ForeCA can automatically detect arbitrary autocorrelation structure using nonparametric estimators. Applications to financial and macro-economic data demonstrate that ForeCA is better than PCA and SFA at finding the most predictable signals, and can also be used for time series classification.
References
Box, G. E. P. and G. C. Tiao (1977). A canonical
analysis of multiple time series. Biometrika 64 (2),
355–365.
Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods (2 ed.). New York, NY:
Springer Series in Statistics.
Cardoso, J.-F. (2004). Dependence, correlation and
gaussianity in independent component analysis. J.
Mach. Learn. Res. 4 (7-8), 1177–1203.
Cover, T. M. and J. Thomas (1991). Elements of Information Theory. Wiley.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977).
Maximum likelihood from incomplete data via the
EM algorithm. Journal of the Royal Statistical Society Series B Methodological 39 (1), 1–38.
Dhiral, K. K., K. Kalpakis, D. Gada, and V. Puttagunta (2001). Distance Measures for Effective Clustering of ARIMA Time-Series. In Proceedings of the
2001 IEEE International Conference on Data Mining, pp. 273–280.
Fessler, J. A. and B. P. Sutton (2003). Nonuniform
fast fourier transforms using min-max interpolation.
IEEE Trans. Signal Process 51, 560–574.
Fryzlewicz, P., G. P. Nason, and R. von Sachs (2008).
A wavelet-Fisz approach to spectrum estimation.
Journal of Time Series Analysis 29 (5), 868–880.
Gibson, J. (1994). What is the interpretation of spectral entropy? In Proceedings of IEEE International
Symposium on Information Theory, 1994, pp. 440.
Gibson, J., S. Stanners, and S. McClellan (1993).
Spectral entropy and coefficient rate for speech coding. In Signals, Systems and Computers, 1993. 1993
Conference Record of The Twenty-Seventh Asilomar
Conference on, pp. 925 –929 vol.2.
Gomez-Herrero, G., K. Rutanen, and K. Egiazarian
(2010). Blind source separation by entropy rate minimization. Signal Processing Letters, IEEE 17 (2),
153 –156.
Hughes Hallett, A. and C. Richter (2008). Have the
Eurozone economies converged on a common European cycle? International Economics and Economic
Policy 5, 71–101.
Hyvärinen, A. and E. Oja (2000). Independent
Component Analysis: Algorithms and Applications.
Neural Networks 13, 411–430.
Jacques, L. and P. Vandergheynst (2010). Compressed
Sensing: “When sparsity meets sampling”, Chapter 23, pp. 507–528. Wiley-Blackwell.
Jolliffe, I. T. (2002). Principal Component Analysis (2
ed.). New York, NY: Springer.
Lees, J. M. and J. Park (1995). Multiple-Taper
Spectral-Analysis - A Stand-Alone C-Subroutine.
Computers & Geosciences 21 (2), 199–236.
Li, X.-L. and T. Adali (2010). Blind spatiotemporal
separation of second and/or higher-order correlated
sources by entropy rate minimization. In Acoustics
Speech and Signal Processing (ICASSP), 2010 IEEE
International Conference on, pp. 1934 –1937.
Matteson, D. S. and R. S. Tsay (2011). Dynamic
orthogonal components for multivariate time series. Journal of the American Statistical Association 106 (496), 1450–1463.
Nuttall, A. H. and G. C. Carter (1982). Spectral Estimation Using Combined Time and Lag Weighting.
In Proceedings of IEEE, Volume 70, pp. 1111–1125.
Paninski, L. (2003). Estimation of entropy and mutual
information. Neural Comput. 15 (6), 1191–1253.
R Development Core Team (2010). R: A Language
and Environment for Statistical Computing. Vienna,
Austria: R Foundation for Statistical Computing.
ISBN 3-900051-07-0.
Shannon, C. E. (1948). A Mathematical Theory of
Communication. Bell System Technical Journal 27,
379–423, 623–656.
Sricharan, K., R. Raich, and A. Hero (2011). K-nearest
neighbor estimation of entropies with confidence. In
Information Theory Proceedings (ISIT), 2011 IEEE
International Symposium on, pp. 1205–1209.
Stone, J. V. (2001). Blind source separation using temporal predictability. Neural Comput. 13 (7), 1559–
1574.
Stowell, D. and M. D. Plumbley (2009). Fast Multidimensional Entropy Estimation by k-d Partitioning.
IEEE Signal Processing Letters 16, 537–540.
Tröbs, M. and G. Heinzel (2006). Improved spectrum
estimation from digitized time series on a logarithmic frequency axis. Measurement 39 (2), 120–129.
Wiskott, L. and T. J. Sejnowski (2002). Slow Feature Analysis: Unsupervised Learning of Invariances. Neural computation 14 (4), 715–770.