Statistical methods in climate prediction

Empirical Climate Prediction:
Some Features of EOF ANALYSIS
when used on its own or
for Regression or CCA
Slides taken from Willem Landman, Simon Mason,
and Tony Barnston
A long-standing, simple method to do climate prediction:
Analogs
Finding cases in the past that are similar to the current climate
state, and predicting what happened, on average, in those past cases.
Variations in analog forecasting:
--Number of analogs
--Weighting: equal vs. by degree of similarity
--Using EOFs to define the “climate state vector” (Barnett & Preisendorfer 1978)
--Inclusion of cases that are opposite (mult. by –1; assumes linearity)
Advantage of analogs: Nonlinearity is taken into account
Constructed analog: Building an analog for the present climate state
from ALL available cases, using a multiple regression overfit (Van den
Dool 1992). This makes analog forecasting more like regression.
A related empirical method
to do climate prediction:
Composites Based on a (Believed)
Relevant Dimension, Such as the ENSO state
Advantage: Nonlinearity taken into account
Example: GUAM ANNUAL RAIN
[Figure: Guam annual rainfall, 1950-2000, with post-El Nino years shown in red. We can make a composite of these.]
[Figure: Guam monthly rainfall for individual El Nino events (1969-70, 1972-73, 1976-77, 1982-83, 1991-92, 1997-98), spanning the El Nino year and the year after El Nino, with the normal annual cycle shown for reference.]
[Figure: Guam rainfall as percent of normal, composited for the El Nino year and the year after El Nino, relative to normal = 100.]
[Figure: Guam rainfall, average El Nino composite vs. the mean annual cycle, for the El Nino year and the year after El Nino.]
Probabilistic Composites Based on El Nino (Mason and Goddard 2001),
for Oct-Nov-Dec precipitation
http://iri.columbia.edu/climate/forecast//enso/index.html
Correlation between two variables
Pearson product-moment correlation
Correlation is a systematic relationship between x and y: when one goes up,
the other tends to go up as well (positive correlation) or tends to go down
(negative correlation). Corresponding pairs of x, y cases are needed.
“Perfect” positive correlation is +1
“Perfect” negative correlation is –1
No correlation (x and y completely unrelated) is 0
Correlation can be anywhere between –1 and +1.
A relationship between x and y may or may not be causal –
if not, x and y may be under control of some third variable.
Correlation can be estimated visually by looking at a
scatterplot of dots on an x vs. y graph.
[Scatterplot of Y vs. X: correlation = 0.8]
[Scatterplot: correlation = 0.55]
[Scatterplot: correlation = 0.0]
[Scatterplot: correlation = 1.0 (points fall exactly on a straight line with positive slope)]
[Scatterplot: correlation = 1.0 (another set of points falling exactly on a positively sloped line)]
[Scatterplot: correlation = -1.0 (points fall exactly on a straight line with negative slope)]
[Scatterplot: correlation undefined because the SD of X is zero (X doesn't change)]
[Scatterplot: correlation undefined because the SD of Y is zero (Y doesn't change)]
[Scatterplot: points with correlation 1 (marked x) mixed with an equal number of points with correlation 0 (marked o); correlation for all 18 points = 0.707, correlation squared = 0.5]
When points having a perfect correlation are mixed with an equal number of
points having no correlation, and the two sets have the same mean and
variance for X and Y, the correlation is 0.707. The correlation squared
(“amount of variance accounted for”) is 0.5.
[Scatterplot: most points clustered near the origin, one outlier in the upper right; correlation = 0.87 (due to the one outlier in the upper right)]
If domination by one case is not desired, the Spearman rank correlation can
be used (correlation among ranks instead of actual values).
[Scatterplot of just two points: correlation = 1.0. The correlation between two points is always 1 or -1, unless x is the same for both or y is the same for both, in which case the correlation is undefined.]
[Scatterplot: correlation = 0, but there is a strong nonlinear relationship (y depends on x in a curved, non-monotonic way)]
The Pearson correlation only detects linear relationships
[Scatterplot: correlation = 0.9, but there is an exact nonlinear (curved) relationship between y and x]
How is linear correlation measured?
First it is necessary to have data for the two variables
whose correlation is to be computed. For each case
of an x value, there is an associated y value.
            X     Y
Case 1     30    18
Case 2     28    22
Case 3     11     9
Case 4      3     7
 ...       ...   ...
Case n     50    27
Covariance and correlation between two variables

Standard deviation (x itself):

   SD_x = \sqrt{ \frac{ \sum (x - \bar{x})^2 }{ n - 1 } }

Variance (x itself):

   Variance = (SD_x)^2 = \frac{ \sum (x - \bar{x})^2 }{ n - 1 }

Covariance (x vs. y):

   Covariance = \frac{ \sum (x - \bar{x})(y - \bar{y}) }{ n }

Correlation (z_x vs. z_y):

   Correlation = \frac{1}{n} \sum_{i=1}^{n} \frac{ (x_i - \bar{x}) }{ \sigma_x } \, \frac{ (y_i - \bar{y}) }{ \sigma_y }
               = \frac{1}{n} \sum_{i=1}^{n} z(x_i) \, z(y_i)

The above formula defines the Pearson product-moment correlation.
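As a rough illustration (not part of the original slides), the Pearson correlation can be computed as the mean product of standardized anomalies. The short Python sketch below uses the population (n) denominator throughout, which reproduces the standard Pearson r exactly; the sample data are made up.

import numpy as np

def pearson_correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    zx = (x - x.mean()) / x.std()        # standardized anomalies (n in the denominator)
    zy = (y - y.mean()) / y.std()
    return np.mean(zx * zy)              # average product of the z-scores

x = np.array([30, 28, 11, 3, 50], float)   # hypothetical paired cases
y = np.array([18, 22, 9, 7, 27], float)
print(pearson_correlation(x, y))            # same value as...
print(np.corrcoef(x, y)[0, 1])              # ...NumPy's built-in Pearson r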
Correlation between X and itself = 1

   \frac{1}{n} \sum_{i=1}^{n} \frac{ (x_i - \bar{x}) }{ \sigma_x } \, \frac{ (x_i - \bar{x}) }{ \sigma_x }
   = \frac{1}{n} \sum_{i=1}^{n} z(x_i) \, z(x_i)
   = \frac{1}{n} \sum_{i=1}^{n} z(x_i)^2 = 1
So the expected (or mean) value of the square of z is 1
That means that if x and y are equal, their correlation is 1.
If they are proportional, their correlation also is 1.
If they are the negative of one another, correlation is –1.
If they are negatively proportional, correlation also is –1.
If the graph of y vs. x is any straight line*, their correlation
is either 1 or -1, depending on the sign of the slope. This
implies perfect predictability of y from x.
*Exception: If line is exactly horizontal or exactly vertical,
then either x or y has SD=0, and correlation is undefined.
Approximate* Standard Error of a Zero Correlation Coefficient
(for example, if X and Y are random data)

   \sigma_{cor} \approx \frac{1}{\sqrt{n-1}}

Examples of \sigma_{cor} and critical values for 2-sided significance
at the 0.05 level, for various sample sizes n:

   n      \sigma_{cor}     1.96 x \sigma_{cor} = cor critical (0.025)
   10       0.33              0.65
   20       0.23              0.45
   50       0.14              0.28
   100      0.10              0.20
   400      0.05              0.10

*For small n, true values of \sigma_{cor} are slightly smaller.

Note: For significance of a correlation, the z-distribution is used
rather than the t-distribution, for any sample size.
Confidence intervals for a nonzero correlation are smaller
than those for zero correlation, and are asymmetric such
that the interval toward lower absolute values is larger.
(Uses the Fisher R-to-Z transformation)
Example: for n=100 and sample correlation = 0.35,
95% confidence interval is 0.17 to 0.51. That is 0.35
minus 0.18, but 0.35 plus 0.16. (For zero correlation,
it is zero plus 0.20 and zero minus 0.20.)
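To reproduce the interval quoted above, here is a small Python sketch (an assumed implementation, not from the slides) of the Fisher r-to-z confidence interval, using the usual large-sample standard error 1/sqrt(n-3); it returns roughly (0.17, 0.51) for r = 0.35 and n = 100.

import numpy as np

def correlation_ci(r, n, z_crit=1.96):
    z = np.arctanh(r)                    # Fisher r-to-z transform
    se = 1.0 / np.sqrt(n - 3)            # standard error of z
    lo, hi = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return lo, hi

print(correlation_ci(0.35, 100))         # approximately (0.17, 0.51), as in the example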
A line in the x vs. y coordinate system has the form
y = a + bx
a is y-intercept
b is slope
Regression line is defined such that the sum of squares
of the errors (predicted y vs. true y) is minimized.
Such a line predicts y from x such that:
   \hat{z}_y = cor_{xy} \, z_x

For example, if cor_{xy} = 0.5, then y will be predicted
to be half as many SDs away from its mean as x.
When correlation between y and x is zero, the mean of
y will always be predicted, no matter what x is. When
we have no predictive info, the mean is the best guess
for minimizing the sum of squared errors.
Simple regression prediction:   \hat{z}_y = cor_{xy} \, z_x

Now we incorporate the actual units of x and y rather than the
standardized (z) version in SD units. This is the “raw numbers”
form of the same equation:

   \hat{y} - \bar{y} = cor_{xy} \, \frac{SD_y}{SD_x} \, (x - \bar{x})

The above equation “dresses up” the basic z relationship by adjusting
for (1) the ratio of the SD of y to the SD of x, and (2) the difference
between the mean of y and the mean of x.

   b = cor_{xy} \, \frac{SD_y}{SD_x}   is the slope of the regression line

   a = \bar{y} - b\bar{x}   is the y-intercept
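As a hedged illustration of the raw-units equation above, the sketch below derives the slope and intercept from the correlation, means, and SDs alone; the numbers plugged in are hypothetical.

def simple_regression(cor_xy, mean_x, sd_x, mean_y, sd_y):
    b = cor_xy * sd_y / sd_x             # slope of the regression line
    a = mean_y - b * mean_x              # y-intercept
    return a, b

# Hypothetical summary statistics for illustration only:
a, b = simple_regression(cor_xy=0.5, mean_x=25.0, sd_x=10.0, mean_y=100.0, sd_y=20.0)
print(f"y = {a:.1f} + {b:.2f} x")        # prediction equation: y = a + b*x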
Standard error of estimate of regression forecasts
....is the standard deviation of the error distribution,
where the errors are y_{predicted} - y_{actual}.

St. Error of Estimate (of standardized y data, or z_y) = \sqrt{ 1 - cor_{xy}^2 }

St. Error of Estimate (of actual y data) = SD_y \sqrt{ 1 - cor_{xy}^2 }

When cor = 0, the St. Error of Estimate is the same as the SD of y.
When cor = 1, the St. Error of Estimate is 0 (all errors are zero).
Standard Error of Estimate vs. Correlation

   Correlation     Standard Error of Estimate = \sqrt{ 1 - cor_{xy}^2 }
                   (as a fraction of the SD of the predictand y)
   1.00            0.00
   0.90            0.44
   0.80            0.60
   0.70            0.71
   0.60            0.80
   0.50            0.87
   0.40            0.92
   0.30            0.95
   0.20            0.98
   0.10            0.99
   0.00            1.00

We need a very high correlation to get a low standard error of estimate:
we need cor = 0.866 to get an error SD of half the SD of the predicted
variable (y). This has implications for the probability of the middle tercile!
Tercile probabilities for various correlation skills and predictor signal
strengths (in SDs). Assumes a Gaussian probability distribution.
Forecast (F) signal = (Predictor Signal) x (Correlation Skill).
Each cell shows the F signal, then the below / near / above normal probabilities (%).

Correl   Signal 0.0      Signal +0.5     Signal +1.0     Signal +1.5     Signal +2.0
Skill
0.00     0.00            0.00            0.00            0.00            0.00
         33 / 33 / 33    33 / 33 / 33    33 / 33 / 33    33 / 33 / 33    33 / 33 / 33
0.20     0.00            0.10            0.20            0.30            0.40
         33 / 34 / 33    29 / 34 / 37    26 / 33 / 41    23 / 33 / 45    20 / 31 / 49
0.30     0.00            0.15            0.30            0.45            0.60
         33 / 35 / 33    27 / 34 / 38    22 / 33 / 45    17 / 31 / 51    14 / 29 / 57
0.40     0.00            0.20            0.40            0.60            0.80
         32 / 36 / 32    25 / 35 / 40    18 / 33 / 49    13 / 30 / 57     9 / 25 / 65
0.50     0.00            0.25            0.50            0.75            1.00
         31 / 38 / 31    22 / 37 / 42    14 / 33 / 53     9 / 27 / 64     5 / 21 / 74
0.60     0.00            0.30            0.60            0.90            1.20
         30 / 41 / 30    18 / 38 / 44    10 / 32 / 58     5 / 23 / 72     2 / 15 / 83
0.70     0.00            0.35            0.70            1.05            1.40
         27 / 45 / 27    13 / 41 / 46     6 / 30 / 65     2 / 17 / 81     1 /  8 / 91
0.80     0.00            0.40            0.80            1.20            1.60
         24 / 53 / 24     8 / 44 / 48     2 / 25 / 73    0* / 10 / 90   0** /  3 / 97

*0.3    **0.04
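The table can be reproduced under the stated Gaussian assumptions. The Python sketch below (an assumed implementation, not the authors' code) sets the forecast mean to (correlation skill) x (predictor signal) and the forecast spread to the standard error of estimate sqrt(1 - cor^2), then integrates the normal distribution between the climatological tercile boundaries.

from scipy.stats import norm

def tercile_probs(cor_skill, predictor_signal):
    f_signal = cor_skill * predictor_signal          # forecast signal, in SDs of y
    spread = (1.0 - cor_skill**2) ** 0.5             # standard error of estimate
    lower, upper = norm.ppf(1/3), norm.ppf(2/3)      # climatological tercile boundaries
    p_below = norm.cdf((lower - f_signal) / spread)
    p_above = 1.0 - norm.cdf((upper - f_signal) / spread)
    return p_below, 1.0 - p_below - p_above, p_above

print([round(100 * p) for p in tercile_probs(0.5, 1.0)])   # ~[14, 33, 53], as in the table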
Spearman rank correlation
Rank correlation is simply the correlation between the
ranks of X vs. the ranks of Y, treating ranks as numbers.
Rank correlation defuses outliers by not honoring
original intervals between the numbers corresponding
to adjacent ranks. Adjacent ranks only differ by 1.
When there are outliers, or when the X and/or Y data are
very much non-normal, the Spearman rank correlation
should be computed in addition to the standard correlation.
Example of conversion to ranks for X or for Y:
Original numbers:
2 9 189 3 21 7
Corresponding ranks: 6 3 1 5 2 4
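A small sketch (assumed) of the rank conversion above and the resulting Spearman correlation, using SciPy. SciPy ranks from smallest = 1, whereas the slide ranks from largest = 1, but the rank correlation itself is unaffected by that choice. The second variable y below is hypothetical, chosen so that one outlier pair dominates the Pearson value but not the Spearman value.

from scipy.stats import spearmanr, pearsonr, rankdata

x = [2, 9, 189, 3, 21, 7]                 # the example values above
print(rankdata([-v for v in x]))          # ranks from largest: [6. 3. 1. 5. 2. 4.]

y = [7, 3, 250, 9, 2, 20]                 # hypothetical second variable with a matching outlier
print(pearsonr(x, y)[0])                  # Pearson r is dominated by the single outlier pair
print(spearmanr(x, y)[0])                 # Spearman r is much smaller; outlier is defused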
Multiple Linear Regression
uses 2 or more predictors

General form:
   \hat{z}_y = b_1 z_{x1} + b_2 z_{x2} + b_3 z_{x3} + .... + b_n z_{xn}

Let us take the simplest multiple regression case--two predictors:
   \hat{z}_y = b_1 z_{x1} + b_2 z_{x2}

Here, the b's are not simply cor_{x1,y} and cor_{x2,y}, unless x1 and x2 have
zero correlation with one another. Any correlation between x1 and x2 makes
determining the b's less simple.

The b's are related to the partial correlation, in which the value of the
other predictor(s) is held constant. Holding the other predictors constant
eliminates the part of the correlation due to the other predictors and not
just to the predictor at hand.

Notation: the partial correlation of y with x1, with x2 held constant,
is written cor_{y,x1.x2}
   \hat{z}_y = b_1 z_{x1} + b_2 z_{x2}

For 2 (or any n) predictors, there are 2 (or any n) equations in
2 (or any n) unknowns to be solved simultaneously. When n > 3 or so,
determinant operations are necessary.

For the case of 2 predictors, and using z values (variables standardized
by subtracting their mean and then dividing by the standard deviation)
for simplicity, the solution can be done by hand. The two equations to be
solved simultaneously are:

   b_{1.2}                 + b_{2.1} (cor_{x1,x2})  = cor_{y,x1}
   b_{1.2} (cor_{x1,x2})   + b_{2.1}                = cor_{y,x2}

The goal is to find the two b coefficients, b_{1.2} and b_{2.1}.
Example prediction
X1: Polar North Atlantic 500 millibar height
X2: North tropical Pacific sea level pressure
Y : Seasonal number of hurricanes in the North Atlantic

   cor(Atlantic 500 mb, hurricanes)    = 0.20   (x1, y)
   cor(Pacific SLP, hurricanes)        = 0.40   (x2, y)
   cor(Atlantic 500 mb, Pacific SLP)   = 0.30   (x1, x2)   one predictor vs. the other

Simultaneous equations to be solved:
   b_{1.2}          + (0.30) b_{2.1}   = 0.20
   (0.30) b_{1.2}   + b_{2.1}          = 0.40

Solution: Multiply the 1st equation by 3.333, then subtract the second
equation from the first. This gives
   (3.033) b_{1.2} + 0 = 0.267
So b_{1.2} = 0.088, and using this we find that b_{2.1} = 0.374.

The regression equation is  \hat{z}_y = (0.088) z_{x1} + (0.374) z_{x2}
Multiple correlation coefficient R = the correlation between the predicted y
and the actual y using the multiple regression. The b coefficients here are
for standardized (z) X1, X2, and Y.

   R = \sqrt{ b_{1.2} \, cor_{x1,y} + b_{2.1} \, cor_{x2,y} }

In the example above,
   R = \sqrt{ (0.088)(0.20) + (0.373)(0.40) } = 0.408
Note this is only very slightly better than using the second
predictor alone in simple regression. This is not surprising,
since the first predictor’s total correlation with y is only
0.2, and it is correlated 0.3 with the second predictor, so
that the second predictor already accounts for some of what
the first predictor has to offer. A decision would probably
be made concerning whether it is worth the effort to include
the first predictor for such a small gain. Note: the multiple
correlation can never decrease when more predictors are added.
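A minimal sketch (assumed, not the authors' code) of the calculation above: NumPy solves the pair of simultaneous equations and then forms the multiple correlation R.

import numpy as np

r_x1x2, r_yx1, r_yx2 = 0.30, 0.20, 0.40          # the hurricane example above

A = np.array([[1.0, r_x1x2],                     # b1 + r12*b2 = r_y,x1
              [r_x1x2, 1.0]])                    # r12*b1 + b2 = r_y,x2
b1, b2 = np.linalg.solve(A, [r_yx1, r_yx2])
R = np.sqrt(b1 * r_yx1 + b2 * r_yx2)             # multiple correlation

print(round(b1, 3), round(b2, 3), round(R, 3))   # ~0.088, 0.374, 0.41, matching the hand solution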
Multiple R is usually inflated somewhat compared with
the true relationship, since additional predictors fit
the accidental variations found in the test sample.
Adjustment (decrease) of R for the existence of multiple
predictors gives a less biased estimate of R:
   Adjusted R = \sqrt{ \frac{ R^2 (n - 1) - k }{ n - k - 1 } }

   n = sample size
   k = number of predictors
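A one-function sketch (assumed) of the adjustment above; the max(..., 0) guard is an added convenience for cases where the adjusted estimate would go negative.

import numpy as np

def adjusted_R(R, n, k):
    # n = sample size, k = number of predictors
    adj_R2 = (R**2 * (n - 1) - k) / (n - k - 1)
    return np.sqrt(max(adj_R2, 0.0))             # clip at 0 if the estimate goes negative

print(round(adjusted_R(0.408, n=50, k=2), 3))    # smaller than the raw R of 0.408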
Sampling variability of a simple (x, y) correlation coefficient around zero,
when the population correlation is zero, is approximately

   StError(zero correl) \approx \frac{1}{\sqrt{n - 1}}
In multiple regression the same approximate relationship
holds except that n must be further decreased by the
number of predictors additional to the first one.
If the number of predictors (x’s) is denoted by k, then
the sampling variability of R around zero, when there is
no true relationship with any of the predictors, is given by
   StError(zero correl) \approx \frac{1}{\sqrt{n - k}}
It is easier to get a given multiple correlation by chance as
the number of predictors increases.
Partial correlation is the correlation between y and x1 when a variable x2
is not allowed to vary. Example: in an elementary school, reading ability (y)
is highly correlated with the child's weight (x1). But both y and x1 are
really driven by something else: the child's age (call it x2). What would the
correlation be between weight and reading ability if age were held constant?
(Would it drop down to zero?)
   r_{y,x1.x2} = \frac{ r_{y,x1} - (r_{y,x2})(r_{x1,x2}) }{ \sqrt{ (1 - r_{y,x2}^2)(1 - r_{x1,x2}^2) } }

   b_1 = r_{y,x1.x2} \, \frac{ StErrorEst_{y,x2} }{ StErrorEst_{x1,x2} }

A similar set of equations exists for the second predictor.
Suppose the three correlations are:
   reading vs. weight:   r_{y,x1}  = 0.66
   reading vs. age:      r_{y,x2}  = 0.82
   weight vs. age:       r_{x1,x2} = 0.71

The two partial correlations come out to be:
   r_{y,x1.x2} = 0.193
   r_{y,x2.x1} = 0.664

Finally, the two regression weights (for z's) turn out to be:
   b_1 = 0.157      b_2 = 0.709      R = 0.827

Weight is seen to be a minor factor compared with age.
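The reading/weight/age numbers above can be reproduced from the three pairwise correlations alone; a sketch (assumed implementation):

import numpy as np

r_y1, r_y2, r_12 = 0.66, 0.82, 0.71      # reading-weight, reading-age, weight-age

def partial_r(r_ab, r_ac, r_bc):
    # correlation of a with b, holding c constant
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

print(round(partial_r(r_y1, r_y2, r_12), 3))     # ~0.193 (weight, with age held constant)
print(round(partial_r(r_y2, r_y1, r_12), 3))     # ~0.664 (age, with weight held constant)

# standardized regression weights and the multiple R
b1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
b2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)
R = np.sqrt(b1 * r_y1 + b2 * r_y2)
print(round(b1, 3), round(b2, 3), round(R, 3))   # ~0.157, 0.709, 0.827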
The means and the standard deviations of three data sets (y, x1, x2) are
y: Jul-Aug-Sep Sahel rainfall (mm): mean 230 mm, SD 88 mm
x1: Tropical Atlantic/Indian ocean SST: mean 28.3 degr C, SD 1.7 C
x2: Deforestation (percent of initial): mean 34%, SD 22%
Suppose that Cor(x1,y)= -0.52
Cor(x2,y)= -0.37
Cor(x1,x2)=0.50
If the regression equation in SD units is  \hat{z}_y = -0.447 z_{x1} - 0.147 z_{x2} ,
then in raw units it becomes:

   \frac{ \hat{y} - \bar{y} }{ SD_y } = -0.447 \, \frac{ x_1 - \bar{x}_1 }{ SD_{x1} } - 0.147 \, \frac{ x_2 - \bar{x}_2 }{ SD_{x2} }

   \frac{ \hat{y} - 230 }{ 88 } = -0.447 \, \frac{ x_1 - 28.3 }{ 1.7 } - 0.147 \, \frac{ x_2 - 34 }{ 22 }
After simplification, the final form will be:
   \hat{y} = b_1 x_1 + b_2 x_2 + constant      (here, both coefficients are < 0)
We now compute the multiple correlation R, and the
standard error of estimate for the multiple regression.
Using the two individual correlations and the b terms:
Cor(x1,y)= -0.52 Cor(x2,y)= -0.37 Cor(x1,x2)=0.50
Regression equation is Zy = -0.447 zx1 -0.147 zx2
   R = \sqrt{ b_{1.2} \, cor_{x1,y} + b_{2.1} \, cor_{x2,y} }
   R = \sqrt{ (-0.447)(-0.52) + (-0.147)(-0.37) } = 0.535

The deforestation factor helps the prediction accuracy only
slightly. If there were less correlation between the two
predictors, then the second predictor would be more valuable.

Standard Error of Estimate = \sqrt{ 1 - R^2_{y,(x1 x2)} } = 0.845
In physical units it is (0.845)(88 mm) = 74.3 mm.
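A sketch (assumed) that reproduces the Sahel numbers above (the standardized weights, the multiple R, and the standard error of estimate) from the three correlations and the SD of rainfall:

import numpy as np

r_y1, r_y2, r_12 = -0.52, -0.37, 0.50            # SST-rain, deforestation-rain, SST-deforestation
sd_y = 88.0                                      # SD of Jul-Aug-Sep Sahel rainfall (mm)

b1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)        # ~ -0.447
b2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)        # ~ -0.147
R = np.sqrt(b1 * r_y1 + b2 * r_y2)               # ~ 0.535
see_z = np.sqrt(1 - R**2)                        # ~ 0.845 (in SDs of y)
print(round(b1, 3), round(b2, 3), round(R, 3), round(see_z, 3), round(see_z * sd_y, 1))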
Let us evaluate the significance of the multiple correlation
of 0.535. How likely could it have arisen by chance alone?
First we find the standard error of samples of 50 drawn from
a population having no correlations at all, using 2 predictors:
   StError(zero correl) \approx \frac{1}{\sqrt{n - k}}

For n = 50 and k = 2 we get  \frac{1}{\sqrt{50 - 2}} = 0.145

For a 2-sided z test at the 0.05 level, we need 1.96 (0.145) = 0.28.
This is easily exceeded, suggesting that the combination of the two
predictors (SST and deforestation) does have an impact on Sahel summer
rainfall. (Using SST alone in simple regression, with cor = 0.52,
would have given nearly the same level of significance.)
Example problem using this regression equation:
Suppose that a climate change model predicts that in year
2050, the SST in the tropical Atlantic and Indian oceans
will be 2.4 standard deviations above the means given for
the 50-year period of the preceding problem. (It is now
about 1.6 standard deviations above that mean.) Assume
that land use practices (percentage deforestation) will be
the same as they are now, which is 1.3 standard deviations
above the mean. Under this scenario, using the multiple
regression relationship above, how many standard deviations
away from the mean will Jul-Aug-Sep Sahel rainfall be,
and what seasonal total rainfall does that correspond to?
The problem can be solved either in physical units or in standard
deviation units, and then the answer can be expressed in either (or
both) kinds of units afterward.
If solved in physical units, the values of the two predictions in SD
units (2.4 and 1.3) can be converted to raw units using the means
and standard deviations of the variables provided previously,
and the raw units form of the regression equation would be used.
If solved in SD units, the simpler equation can be used:
\hat{z}_y = -0.447 z_{x1} - 0.147 z_{x2}. The z's of the two predictors,
according to the scenario given, will be 2.4 and 1.3, respectively.
Then \hat{z}_y = -0.447(2.4) - 0.147(1.3) = -1.264. This is how many
SDs away from the mean the rainfall would be. Since the rainfall
mean and SD are 230 and 88 mm, respectively, the actual amount
predicted is 230 - 1.264(88) = 230 - 111.2 = 118.8 mm.
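The scenario calculation above, as a short sketch (assumed), in SD units and in mm:

z_x1, z_x2 = 2.4, 1.3                        # SST and deforestation, in SDs above their means
z_y = -0.447 * z_x1 - 0.147 * z_x2           # predicted rainfall anomaly in SDs (~ -1.264)
rain_mm = 230.0 + z_y * 88.0                 # convert using the mean (230 mm) and SD (88 mm)
print(round(z_y, 3), round(rain_mm, 1))      # ~ -1.264 SD, ~118.8 mm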
A problem in multiple regression: Collinearity
When the predictors are highly correlated with one another in multiple
regression, a condition of collinearity exists. When this happens, the
coefficients of two highly correlated predictors may have opposing signs,
even when each of them has the same sign of simple correlation with the
predictand. (Such opposing signed coefficients minimize the squared errors.)
Issues with this are (1) it is counterintuitive, and (2) the coefficients
are very unstable, such that if one more sample is added to the data, they
may change drastically.
When collinearity exists, the multiple regression formula will often still
provide useful and accurate predictions. To eliminate collinearity,
predictors that are highly correlated can be combined into a single predictor.
EOF Analysis,
and its use in Regression
or in CCA
Empirical Orthogonal Functions (EOFs)
(closely related to Principal Components,
and Factor Analysis)
Identifying preferred patterns within many variables

Suppose we have a long time record of data for a field variable,
such as temperatures at many locations.

Examples in climate science:
Average temperature data for a 40-year period across much of the globe,
at grid points 5 degrees of latitude and 5 degrees of longitude apart
(over 2,000 grid points).
Sea surface temperature data for a 40-year period over much of the globe's
oceans, on a 4-degree grid (again, roughly 2,000 grid points).
The data form a matrix with 40 rows (times) and 2,000 columns (grid points):

   Time 1:    grid1  grid2 .............. grid2000
   Time 2:    grid1  grid2 .............. grid2000
   ......     .....  ..... ..............
   Time 40:   grid1  grid2 .............. grid2000
Climate sciences often deal with data that have high dimensionality,
such as collections of spatially distributed time series like the
temperature or SST observations above. Because such observations are
not entirely random and are often related to each other, the information
contained in such datasets can often be compressed down to a few spatial
patterns that cluster stations or grid points that are strongly related.
EOF analysis is an exploratory technique designed to perform such a
compression in an objective way, without any prior knowledge of the
relationships linking the observations or the underlying physical
processes. It expresses the data in a smaller set of new variables defined
through a linear combination of the original ones. The desired result is a
limited collection of patterns, called EOF modes, that are sufficient to
reconstruct a good approximation of the original data and are also easy to
visualize and recognize. Although such modes sometimes represent known
physical phenomena, they are not designed to isolate physical mechanisms.
EOF analysis should always be thought of only as an efficient statistical
compression tool.
How EOF modes are defined from a dataset.
First, a complete intercorrelation matrix is computed (grid point vs. grid point):

             1       2     ..............   2000
   1       1.00    0.81    ..............  -0.13
   2       0.81    1.00    ..............   0.07
   ...
   2000   -0.13    0.07    ..............   1.00
Then, using the cross-correlation matrix, a procedure is
used to identify which grid points best form a coherent
cluster—points that vary similarly or oppositely from one
another. This information leads to the formation of a linear
combination of all the grid points. In this combination, each
gridded value will be assigned a weight (positive or negative),
something like the weights assigned to the predictors in multiple regression. The pattern of these weights often shows up,
visually, as a coherent (non-random) pattern in the spatial
domain. Such a pattern of weights is an EOF loading pattern
(technically, it is called an eigenvector).
By multiplying the values at the grid points for one particular
time by their loading weights, and adding them all up, we get
the amplitude (or temporal “score”) for that time. Times whose
original data assume that pattern have high (+ or -) scores.
EOF analysis is performed by inputting the correlation
matrix to a procedure called eigenvalue/eigenvector
analysis. It involves solving a large set of linear equations.
Grid points having high correlations with the most other
grid points (+ or -) participate most strongly.
Each EOF pattern that emerges explains a certain percentage of the total variance of all the grid points over time.
This percentage of variance explained is maximized.
The first EOF mode gathers the most variance, and then
the second EOF mode works on what remains after all
the variability associated with the first mode is removed.
Often, after 2 to 6 modes have been defined, the coherent
portion of the total variability is exhausted, and further
modes just work on the remaining incoherent “noise”.
When this happens, the loading patterns start looking
random and physically meaningless, and the amounts of
additional variance explained become small.
What EOF analysis provides:
1. A set of EOF loading patterns (eigenvectors)
2. A set of corresponding amplitudes (temporal scores)
3. A set of corresponding variances accounted for
(from the eigenvalues)
Often, the EOF analysis allows a set of hundreds
or thousands of variables (like grid points) to be
compacted into just 3 to 6 EOF variables (modes)
that account for two-thirds or more of the original
variance. These modes capture coherent variations.
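A minimal EOF sketch in Python (an assumed implementation, not the authors' code): eigen-decompose the correlation matrix of a (time x grid point) data matrix to obtain loading patterns (eigenvectors), variance fractions (from the eigenvalues), and amplitude time series. The data here are random and purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((40, 200))                 # hypothetical: 40 times x 200 grid points

z = (data - data.mean(axis=0)) / data.std(axis=0)     # standardize each grid point
corr = (z.T @ z) / len(z)                             # correlation matrix (grid x grid)

eigvals, eigvecs = np.linalg.eigh(corr)               # eigen-decomposition of the symmetric matrix
order = np.argsort(eigvals)[::-1]                     # sort modes by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

var_fraction = eigvals / eigvals.sum()                # fraction of variance per EOF mode
amplitudes = z @ eigvecs                              # temporal scores (amplitudes) for each mode
print(var_fraction[:5])                               # leading modes explain the most variance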
More than one field can be input to EOF analysis.
[Figure: EOF mode 1 of global 500 mb height, JFM 1950-2004: loading pattern and temporal scores (amplitude) for mode 1]
[Figure: EOF mode 2 of global 500 mb height, JFM 1950-2004: loading pattern and temporal scores (amplitude) for mode 2]
[Figure: EOF mode 3 of global 500 mb height, JFM 1950-2004: loading pattern and temporal scores (amplitude) for mode 3]
[Figure: EOF of mean SSTs for JFM, 1950-2002, using the correlation matrix]
Since all grid point data are standardized, only coherent relationships
matter, and differing grid point variances do not affect the pattern.

[Figure: EOF of mean SSTs for JFM, 1950-2002, using the covariance matrix]
Differing variances count, and grid points in the extratropics (having
high variance) receive more weight.
[Figure: EOF analysis of JFM SST history using the correlation matrix: amount of variance explained by EOF modes 1 to 12, plotted against EOF mode number]
Principal component regression
EOF amplitude time series, instead of raw (original)
variables, can be used as predictors for a multiple
regression.
The EOF time series represent scores with respect to
the EOF loading patterns, which contain large portions
of the variance of the original variables. The EOF time
series therefore may be an efficient way to include
the information of many raw predictors at once. This
depends, however, on whether or not the pattern is
relevant to what is being predicted. (Sometimes, grid
points that matter may not be part of the main EOF
patterns, such as SST points right along a coastline.)
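A sketch (assumed) of principal component regression following the description above: compute EOF amplitudes of a hypothetical field and regress a hypothetical predictand on the leading few modes, using ordinary least squares.

import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal((40, 200))               # hypothetical standardized (time x grid) field
y = rng.standard_normal(40)                      # hypothetical predictand time series

corr = (z.T @ z) / len(z)                        # correlation matrix of the field
eigvals, eigvecs = np.linalg.eigh(corr)
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]  # EOFs ordered by variance explained

X = z @ eigvecs[:, :4]                           # predictors: amplitudes of the 4 leading modes
X1 = np.column_stack([np.ones(len(X)), X])       # add an intercept column
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)  # ordinary least squares on the EOF scores
print(np.corrcoef(y, X1 @ coeffs)[0, 1])         # multiple correlation R of the fit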
Interpreting EOFs
EOFs are sometimes difficult to interpret physically.
The weights are defined to maximize the variance, which
may not necessarily maximize the physical interpretability.
With spatial data (including climate data) the interpretation
becomes even more difficult because there are geometric
controls on the correlations between the data points.
Buell patterns
Imagine a rectangular domain in which all the points are
strongly correlated with their neighbors.
Buell patterns
The points in the middle of the domain will have the
strongest average correlations with all other points, simply
because their average distance to all other grids is the
smallest.
The strong correlations between neighboring grids will be represented by
EOF 1, with the central points dominating the pattern.
Buell patterns
The points in the corners of the domain will have the
weakest average correlations with all other points, simply
because their average distance to all other grids is the
greatest.
Mode 2 will represent points with weak correlations between distant grids,
because their variance has not yet been explained. A dipole pattern appears.
The axis of the dipole is determined by the domain shape: it is along the
long dimension.
Buell patterns
Are these real, or are they related to the domain shape?
(They may be both together.)
[Figure: First EOF of Indian Ocean SST during Oct-Nov-Dec, for several decades]
Buell patterns
Domain shape dependency can create these influences:
1. the first EOF frequently indicates positive loadings with
strongest values in the center of the domain;
2. the second PC frequently indicates negative loadings on
one side and positive loadings on the other side, with
axis along the longest dimension of the domain.
Similar kinds of problems can occur when using:
1. gridded data with converging longitudes, or simply with
longitude spacing different from latitude spacing;
2. station data (“middle” stations vs. “edge” stations).
EOF input can be correlation matrix or covariance
matrix. Covariance matrix should NOT be used...
(1) When variances differ greatly. For example:
Eq. Indian Ocean SST variance <<
Eq. Pacific Ocean SST variance. Indian Ocean
will have only very small influence on results.
(2) When units are different. For example:
SSTs combined with 200 hPa geopotential heights (SSTs
would have very low weight, almost no influence).
How many modes should be used when doing EOF analysis?
Ways to determine the answer:
• Proportion of variance
• SCREE test (shape of the eigenvalue curve)
• Average eigenvalue (Guttman-Kaiser)
• Sensitivity tests (used in statistical modeling)
• Monte Carlo tests: determine when results become only random
• Visual inspection of mode patterns
The SCREE test
[Figure: scree plot (eigenvalue curve) for the data, with a Monte Carlo curve from random data for comparison]
Monte Carlo exercise: find where the red and blue curves intersect.
Extended EOFs
Can capture time evolution. Example:
AMJ–JAS–OND–JFM tropical SSTs are
combined as predictor fields for EOFs.
They are “temporally stacked” SST data.
Resulting modes will show evolutionary or
steady-state features of SSTs
Rotation
The weights are redefined according to an alternative criterion.
Varimax rotation maximizes the variance of the loading weights across the
domain. The result is that there are some very high weights and a large
number of weights close to zero. Consequently, the patterns are more
localized.
The variances of the first few principal components are
reduced after rotation, and the curve of variance explained
is flatter than the curve for the original (unrotated) modes.
Rotation of EOFs:
Finding different directions of axes
• Rotation can be helpful when Buell patterns exist (Richman 1986)
• Rotation provides a simpler, more localized structure; e.g., the AO pattern becomes the NAO pattern
• The same total amount of variance is explained after rotation, for a given truncation choice
• Two types of rotation:
  • Orthogonal (varimax)
  • Oblique
Prediction of any element of Y individually, using multiple regression

   Intercorrelation matrix for the X elements (x1 x2 x3 x4 x5 x6 x7 x8 x9)
                           ↓
                       EOFs of X

Predictors can be the elements of X:
   \hat{y}_1 = b_1 x_1 + b_2 x_2 + b_3 x_3 + ....
   \hat{y}_2 = b_1 x_1 + b_2 x_2 + b_3 x_3 + ....
   \hat{y}_3 = ....

OR

Predictors can be EOFs of X (principal component regression):
   \hat{y}_1 = b_1 XEOF_1 + b_2 XEOF_2 + ....
   \hat{y}_2 = b_1 XEOF_1 + b_2 XEOF_2 + ....
   \hat{y}_3 = ....
Introduction to CCA
CCA is like EOF analysis, except that there are TWO data sets (X and Y,
or predictor and predictand), and the input matrix (correlation or
covariance) contains cross-dataset coefficients only (an X element with
a Y element; no X-to-X or Y-to-Y). Both X and Y can be time-extended
or contain multiple fields.
Analyses of the correlation matrix of the X and Y fields: 9 elements of X and of Y

                  x1 ... x9        y1 ... y9
   x1 ... x9    [ X vs. X ]      [ X vs. Y ]
   y1 ... y9    [ Y vs. X ]      [ Y vs. Y ]

EOFs of X use the X-vs-X block; EOFs of Y use the Y-vs-Y block.
CCA of X vs. Y uses the cross-dataset (X-vs-Y) block.
Joint EOFs of X and Y use the full matrix.
[Figure: CCA mode 1. X: tropical Pacific SST, SON season. Y: Indian Ocean SST, DJF season.]
[Figure: CCA mode 2. X: tropical Pacific SST, SON season. Y: Indian Ocean SST, DJF season.]

Another CCA example, using tropical Pacific SST for two separated months:
July (X) and December (Y).
[Figure: CCA of July (X) and December (Y) 1950-1999 sea-surface temperatures.]
Buell Patterns affect CCA too, not just EOFs
Two possible predictor designs in CCA

1. Observational predictor design
   X is an earlier-observed predictor field, such as the governing SST field.
   Y is the rainfall pattern prediction for a region of interest.

2. Model MOS design
   X is a dynamical model prediction of the rainfall pattern around a region of interest.

1. is a purely statistical forecast system.
2. is a dynamical forecast corrected by a statistical adjustment.
Some simple matrix algebra:

If  X = \begin{pmatrix} a & d \\ b & e \\ c & f \end{pmatrix}
(rows are samples in time, columns are variables)

and a + b + c = 0, and d + e + f = 0, then

   variance (of column 1) = (a^2 + b^2 + c^2) / 3
   covariance (of columns 1 and 2) = (ad + be + cf) / 3

(3 is the number of cases; for an unbiased estimate, 2 would be used.)

In general, matrix multiplication gives:

   X^T X = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}
           \begin{pmatrix} a & d \\ b & e \\ c & f \end{pmatrix}
         = \begin{pmatrix} a^2 + b^2 + c^2 & ad + be + cf \\ da + eb + fc & d^2 + e^2 + f^2 \end{pmatrix}

So, if a + b + c = 0 and d + e + f = 0, then:

   \frac{1}{n} X^T X = variance-covariance matrix

If X contains data expressed as anomalies, then
   \frac{1}{n} X^T X = variance-covariance matrix

If X contains data expressed as standardized anomalies, then
   \frac{1}{n} X^T X = correlation matrix