Empirical Climate Prediction: Some Features of EOF Analysis when used on its own or for Regression or CCA

Slides taken from Willem Landman, Simon Mason, and Tony Barnston

A long-standing, simple method for climate prediction: Analogs

Find cases in the past that are similar to the current climate state, and predict what happened, on average, in those past cases.

Variations in analog forecasting:
--Number of analogs
--Weighting: equal vs. by degree of similarity
--Using EOFs for the "climate state vector" (Barnett & Preisendorfer 1978)
--Inclusion of cases that are opposite (multiplied by -1; assumes linearity)

Advantage of analogs: nonlinearity is taken into account.

Constructed analog: building an analog for the present climate state from ALL available cases, using a multiple regression overfit (Van den Dool 1992). This makes analog forecasting more like regression.
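A minimal sketch of the basic analog idea just described, under simple assumptions (equal weighting of the analogs, Euclidean similarity of state vectors); the function and data here are illustrative, not the authors' code:

```python
# Analog forecasting sketch: find the k past states most similar to the
# current state, and predict the average of what followed them.
import numpy as np

def analog_forecast(past_states, outcomes, current_state, k=5):
    """past_states: (n_cases, n_vars); outcomes: (n_cases,) what followed each
    state; current_state: (n_vars,). Returns mean outcome of k nearest analogs."""
    distances = np.linalg.norm(past_states - current_state, axis=1)
    nearest = np.argsort(distances)[:k]        # the k most similar past cases
    return outcomes[nearest].mean()            # equal weighting of the analogs

# Toy usage: random numbers stand in for real gridded anomaly fields.
rng = np.random.default_rng(0)
states = rng.standard_normal((40, 100))        # 40 past years, 100 grid points
followed = rng.standard_normal(40)             # e.g., next-season rainfall index
print(analog_forecast(states, followed, rng.standard_normal(100)))
```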
A related empirical method for climate prediction: Composites

Composites are based on a (believed) relevant dimension, such as the ENSO state. Advantage: nonlinearity is taken into account.

Example: Guam rainfall and El Nino.
[Figure: GUAM ANNUAL RAIN, 1950-2000, with post-El Nino years shown in red. We can make a composite of these years.]
[Figure: Guam monthly rainfall through the El Nino year and the year after (months 1-25, JAN to JAN), for the 1969-70, 1972-73, 1976-77, 1982-83, 1991-92, and 1997-98 events, compared with the normal annual cycle.]
[Figure: Guam rainfall as percent of normal through the El Nino year and the year after.]
[Figure: GUAM, average El Nino composite vs. the mean annual cycle, through the El Nino year and the year after.]

Probabilistic Composites Based on El Nino
Mason and Goddard (2001), for Oct-Nov-Dec precipitation:
http://iri.columbia.edu/climate/forecast//enso/index.html

Correlation between two variables: the Pearson product-moment correlation

Correlation is a systematic relationship between x and y: when one goes up, the other tends to go up also, or may tend to go down. Corresponding pairs of cases of x and y are needed. A "perfect" positive correlation is +1, a "perfect" negative correlation is -1, and no correlation (x and y completely unrelated) is 0; correlation can be anywhere between -1 and +1. A relationship between x and y may or may not be causal -- if not, x and y may be under the control of some third variable. Correlation can be estimated visually by looking at a scatterplot of dots on an x vs. y graph. Some examples (the original scatterplots are summarized here):

- A fairly tight upward-sloping cloud of points: correlation = 0.8.
- A looser upward-sloping cloud: correlation = 0.55.
- A shapeless cloud: correlation = 0.0.
- Points lying exactly on an upward-sloping line: correlation = 1.0, regardless of the steepness of the slope.
- Points lying exactly on a downward-sloping line: correlation = -1.0.
- A vertical line of points: correlation undefined, because the SD of X is zero (X does not change).
- A horizontal line of points: correlation undefined, because the SD of Y is zero (Y does not change).
- Nine points lying exactly on a line (cor = 1) mixed with nine points showing no relationship (cor = 0): the correlation for all 18 points = 0.707, and the correlation squared = 0.5. When points having a perfect correlation are mixed with an equal number of points having no correlation, and the two sets have the same mean and variance for X and Y, the correlation is 0.707. The correlation squared ("amount of variance accounted for") is 0.5.
- A compact cluster plus one point far away in the upper right: correlation = 0.87, due to the one outlier. If domination by one case is not desired, the Spearman rank correlation (the correlation among ranks instead of actual values) can be used.
- Two points only: correlation = 1.0. (The correlation between two points is always 1 or -1, unless x is the same for both or y is the same for both, in which case the correlation is undefined.)
- An arch-shaped (parabolic) pattern: correlation = 0, even though there is a strong nonlinear relationship. The Pearson correlation only detects linear relationships.
- A curved, monotonically increasing pattern: correlation = 0.9, even though there is an exact, deterministic (but nonlinear) relationship between y and x.

How is linear correlation measured? First it is necessary to have data for the two variables whose correlation is to be computed. For each case of an x value, there is an associated y value:

          X    Y
Case 1   30   18
Case 2   28   22
Case 3   11    9
Case 4    3    7
...      ..   ..
Case n   50   27

Covariance and correlation between two variables:

Standard deviation (x with itself): $SD_x = \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 / (n-1)}$
Variance (x with itself): $SD_x^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 / (n-1)$
Covariance (x vs. y): $cov_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) / (n-1)$
Correlation ($z_x$ vs. $z_y$): $r_{xy} = \frac{1}{n}\sum_{i=1}^{n}\frac{(x_i - \bar{x})(y_i - \bar{y})}{SD_x\, SD_y} = \frac{1}{n}\sum_{i=1}^{n} z(x_i)\, z(y_i)$

The last formula defines the Pearson product-moment correlation.

Correlation between X and itself = 1:
$\frac{1}{n}\sum_{i=1}^{n}\frac{(x_i - \bar{x})(x_i - \bar{x})}{SD_x\, SD_x} = \frac{1}{n}\sum_{i=1}^{n} z(x_i)\, z(x_i) = \frac{1}{n}\sum_{i=1}^{n} z(x_i)^2 = 1$
So the expected (or mean) value of the square of z is 1.
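The z-score form of the correlation formula translates directly into code; a minimal sketch using the small X/Y table above as toy data:

```python
# Pearson correlation as the mean product of z-scores:
# r = (1/n) * sum of z(x_i) * z(y_i).
import numpy as np

def pearson_r(x, y):
    zx = (x - x.mean()) / x.std()   # z-scores (divisor n, matching the 1/n form)
    zy = (y - y.mean()) / y.std()
    return np.mean(zx * zy)

x = np.array([30.0, 28.0, 11.0, 3.0, 50.0])   # the five listed cases
y = np.array([18.0, 22.0, 9.0, 7.0, 27.0])
print(pearson_r(x, y))                         # same as np.corrcoef(x, y)[0, 1]
```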
That means that if x and y are equal, their correlation is 1. If they are proportional, their correlation is also 1. If they are the negative of one another, their correlation is -1. If they are negatively proportional, their correlation is also -1. If the graph of y vs. x is any straight line*, their correlation is either 1 or -1, depending on the sign of the slope. This implies perfect predictability of y from x.
*Exception: if the line is exactly horizontal or exactly vertical, then either y or x has SD = 0, and the correlation is undefined.

Approximate* standard error of a zero correlation coefficient (for example, if X and Y are random data):
$SE(r) \approx \frac{1}{\sqrt{n-1}}$
*For small n, the true values are slightly smaller.

Examples of the standard error and the critical value for 2-sided significance at the 0.05 level (1.96 x SE), for various sample sizes n:

n      SE of r    critical r (0.025, 2-sided)
10     0.33       0.65
20     0.23       0.45
50     0.14       0.28
100    0.10       0.20
400    0.05       0.10

Note: for the significance of a correlation, the z-distribution is used rather than the t-distribution, for any sample size.

Confidence intervals for a nonzero correlation are smaller than those for a zero correlation, and are asymmetric such that the interval toward lower absolute values is larger. (This uses the Fisher R-to-Z transformation.) Example: for n = 100 and a sample correlation of 0.35, the 95% confidence interval is 0.17 to 0.51. That is 0.35 minus 0.18, but 0.35 plus 0.16. (For a zero correlation, it is zero plus 0.20 and zero minus 0.20.)

A line in the x vs. y coordinate system has the form y = a + bx, where a is the y-intercept and b is the slope. The regression line is defined such that the sum of squares of the errors (predicted y vs. true y) is minimized. Such a line predicts y from x such that:
$z_y = r_{xy}\, z_x$
For example, if $r_{xy} = 0.5$, then y will be predicted to be half as many SDs away from its mean as x is. When the correlation between y and x is zero, the mean of y will always be predicted, no matter what x is: when we have no predictive information, the mean is the best guess for minimizing the sum of squared errors.

Simple regression prediction: now we incorporate the actual units of x and y rather than the standardized (z) version in SD units. This is the "raw numbers" form of the same equation:
$y = \bar{y} + r_{xy}\,\frac{SD_y}{SD_x}\,(x - \bar{x})$
The above equation "dresses up" the basic z relationship by adjusting for (1) the ratio of the SD of y to the SD of x, and (2) the difference between the mean of y and the mean of x. Here $b = r_{xy}\frac{SD_y}{SD_x}$ is the slope of the regression line, and $a = \bar{y} - b\bar{x}$ is the y-intercept.

Standard error of estimate of regression forecasts: the standard deviation of the error distribution, where the errors are $y_{predicted} - y_{actual}$.
St. error of estimate (of standardized y data, or $z_y$) = $\sqrt{1 - r_{xy}^2}$
St. error of estimate (of actual y data) = $SD_y \sqrt{1 - r_{xy}^2}$
When r = 0, the standard error of estimate is the same as the SD of y. When r = 1, the standard error of estimate is 0 (all errors are zero).

Standard error of estimate vs. correlation (as a fraction of the SD of the predictand, y):

Correlation    St. error of estimate
1.00           0.00
0.90           0.44
0.80           0.60
0.70           0.71
0.60           0.80
0.50           0.87
0.40           0.92
0.30           0.95
0.20           0.98
0.10           0.99
0.00           1.00

We need a very high correlation to get a low standard error of estimate: a correlation of 0.866 is needed to get an error SD of half the SD of the predicted variable (y). This has implications for the probability of the middle tercile!
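A short sketch of these formulas in code -- the raw-units slope and intercept, and the standard error of estimate -- reusing the toy X/Y data from the correlation example:

```python
# Simple regression from the formulas above: b = r*SDy/SDx, a = ybar - b*xbar,
# and standard error of estimate = SDy * sqrt(1 - r^2).
import numpy as np

def simple_regression(x, y):
    r = np.corrcoef(x, y)[0, 1]
    b = r * y.std() / x.std()               # slope of the regression line
    a = y.mean() - b * x.mean()             # y-intercept
    se_est = y.std() * np.sqrt(1 - r**2)    # SD of the forecast errors
    return a, b, se_est

x = np.array([30.0, 28.0, 11.0, 3.0, 50.0])
y = np.array([18.0, 22.0, 9.0, 7.0, 27.0])
a, b, se = simple_regression(x, y)
print(f"y_hat = {a:.2f} + {b:.3f}x, SE of estimate = {se:.2f}")
```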
Tercile probabilities for various correlation skills and predictor signal strengths (in SDs), assuming a Gaussian probability distribution. The forecast (F) signal = (predictor signal) x (correlation skill). Each cell shows the F signal, then the percent probabilities of the below / near / above normal terciles.

Skill   Signal 0.0         Signal +0.5        Signal +1.0        Signal +1.5        Signal +2.0
0.00    0.00  33/33/33     0.00  33/33/33     0.00  33/33/33     0.00  33/33/33     0.00  33/33/33
0.20    0.00  33/34/33     0.10  29/34/37     0.20  26/33/41     0.30  23/33/45     0.40  20/31/49
0.30    0.00  33/35/33     0.15  27/34/38     0.30  22/33/45     0.45  17/31/51     0.60  14/29/57
0.40    0.00  32/36/32     0.20  25/35/40     0.40  18/33/49     0.60  13/30/57     0.80   9/25/65
0.50    0.00  31/38/31     0.25  22/37/42     0.50  14/33/53     0.75   9/27/64     1.00   5/21/74
0.60    0.00  30/41/30     0.30  18/38/44     0.60  10/32/58     0.90   5/23/72     1.20   2/15/83
0.70    0.00  27/45/27     0.35  13/41/46     0.70   6/30/65     1.05   2/17/81     1.40   1/8/91
0.80    0.00  24/53/24     0.40   8/44/48     0.80   2/25/73     1.20   0*/10/90    1.60   0**/3/97
(*0.3   **0.04)
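The table entries can be reproduced from the Gaussian assumption stated above: the forecast mean is F = skill x signal, the forecast spread is the standard error of estimate sqrt(1 - skill^2), and the tercile boundaries of a standard normal sit at about +/-0.431 SD. A sketch (assumes scipy is available):

```python
# Tercile probabilities for a Gaussian forecast distribution.
from scipy.stats import norm

def tercile_probs(skill, signal):
    f = skill * signal                    # forecast signal, in SDs
    spread = (1 - skill**2) ** 0.5        # standard error of estimate
    t = norm.ppf(2 / 3)                   # upper tercile boundary, ~0.431
    below = norm.cdf((-t - f) / spread)
    above = 1 - norm.cdf((t - f) / spread)
    return below, 1 - below - above, above

b, n, a = tercile_probs(0.5, 1.0)
print(f"{100*b:.0f} / {100*n:.0f} / {100*a:.0f}")   # 14 / 33 / 53, as in the table
```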
Spearman rank correlation

Rank correlation is simply the correlation between the ranks of X and the ranks of Y, treating the ranks as numbers. Rank correlation defuses outliers by not honoring the original intervals between the numbers corresponding to adjacent ranks: adjacent ranks differ only by 1. When there are outliers, or when the X and/or Y data are very much non-normal, the Spearman rank correlation should be computed in addition to the standard correlation.

Example of conversion to ranks for X or for Y:
Original numbers:     2  9  189  3  21  7
Corresponding ranks:  6  3   1   5   2  4

Multiple Linear Regression uses 2 or more predictors. The general form is:
$z_y = b_1 z_{x_1} + b_2 z_{x_2} + b_3 z_{x_3} + \dots + b_k z_{x_k}$

Let us take the simplest multiple regression case, with two predictors:
$z_y = b_1 z_{x_1} + b_2 z_{x_2}$
Here, the b's are not simply $r_{x_1,y}$ and $r_{x_2,y}$, unless x1 and x2 have zero correlation with one another. Any correlation between x1 and x2 makes determining the b's less simple. The b's are related to the partial correlation, in which the value of the other predictor(s) is held constant. Holding the other predictors constant eliminates the part of the correlation that is due to the other predictors and not just to the predictor at hand. Notation: the partial correlation of y with x1, with x2 held constant, is written $r_{y,x_1 \cdot x_2}$.

For 2 (or any k) predictors, there are 2 (or any k) equations in 2 (or any k) unknowns to be solved simultaneously. When k > 3 or so, determinant operations are necessary. For the case of 2 predictors, and using z values (variables standardized by subtracting their mean and then dividing by the standard deviation) for simplicity, the solution can be done by hand. The two equations to be solved simultaneously are:
$b_{1.2} + b_{2.1}\, r_{x_1,x_2} = r_{y,x_1}$
$b_{1.2}\, r_{x_1,x_2} + b_{2.1} = r_{y,x_2}$
The goal is to find the two b coefficients, $b_{1.2}$ and $b_{2.1}$.

Example:
X1: polar North Atlantic 500-millibar height
X2: north tropical Pacific sea level pressure
Y: seasonal number of hurricanes in the North Atlantic
$r_{x_1,y}$ (Atlantic 500 mb vs. hurricanes) = 0.20
$r_{x_2,y}$ (Pacific SLP vs. hurricanes) = 0.40
$r_{x_1,x_2}$ (one predictor vs. the other) = 0.30

The simultaneous equations to be solved are:
$b_{1.2} + 0.30\, b_{2.1} = 0.20$
$0.30\, b_{1.2} + b_{2.1} = 0.40$
Solution: multiply the first equation by 3.333, then subtract the second equation from the first. This gives $3.033\, b_{1.2} = 0.267$, so $b_{1.2} = 0.088$; substituting back gives $b_{2.1} = 0.374$. The regression equation is
$z_y = 0.088\, z_{x_1} + 0.374\, z_{x_2}$

The multiple correlation coefficient R is the correlation between the predicted y and the actual y using the multiple regression. With the b coefficients for standardized (z) X1, X2, and Y:
$R = \sqrt{b_{1.2}\, r_{x_1,y} + b_{2.1}\, r_{x_2,y}}$
In the example above, $R = \sqrt{(0.088)(0.20) + (0.374)(0.40)} \approx 0.41$. Note this is only very slightly better than using the second predictor alone in simple regression (r = 0.40). This is not surprising, since the first predictor's total correlation with y is only 0.20, and it is correlated 0.30 with the second predictor, so the second predictor already accounts for some of what the first predictor has to offer. A decision would probably be made concerning whether it is worth the effort to include the first predictor for such a small gain.

Note: the multiple correlation can never decrease when more predictors are added. Multiple R is usually inflated somewhat compared with the true relationship, since additional predictors fit the accidental variations found in the sample at hand. Adjusting (decreasing) R for the existence of multiple predictors gives a less biased estimate:
$R_{adj} = \sqrt{\frac{R^2 (n-1) - k}{n - k - 1}}$
where n = sample size and k = number of predictors.

The sampling variability of a simple (x, y) correlation coefficient around zero, when the population correlation is zero, is approximately
$SE(r = 0) \approx \frac{1}{\sqrt{n-1}}$
In multiple regression the same approximate relationship holds, except that n must be further decreased by the number of predictors additional to the first one. If the number of predictors (x's) is denoted by k, then the sampling variability of R around zero, when there is no true relationship with any of the predictors, is given by
$SE(R = 0) \approx \frac{1}{\sqrt{n-k}}$
It is easier to get a given multiple correlation by chance as the number of predictors increases.

Partial correlation is the correlation between y and x1 when a variable x2 is not allowed to vary. Example: in an elementary school, reading ability (y) is highly correlated with the child's weight (x1). But both y and x1 are really caused by something else: the child's age (call it x2). What would the correlation be between weight and reading ability if age were held constant? (Would it drop down to zero?)
$r_{y,x_1 \cdot x_2} = \frac{r_{y,x_1} - r_{y,x_2}\, r_{x_1,x_2}}{\sqrt{(1 - r_{y,x_2}^2)(1 - r_{x_1,x_2}^2)}}$
and the standardized regression weight is related to it by
$b_1 = r_{y,x_1 \cdot x_2}\, \frac{StErrEst_{y,x_2}}{StErrEst_{x_1,x_2}}$
A similar pair of equations exists for the second predictor.

Suppose the three correlations are:
reading vs. weight: $r_{y,x_1} = 0.66$
reading vs. age: $r_{y,x_2} = 0.82$
weight vs. age: $r_{x_1,x_2} = 0.71$
The two partial correlations come out to be $r_{y,x_1 \cdot x_2} = 0.193$ and $r_{y,x_2 \cdot x_1} = 0.664$. Finally, the two regression weights (for z's) turn out to be $b_1 = 0.157$ and $b_2 = 0.709$, with $R = 0.827$. Weight is seen to be a minor factor compared with age.
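These numbers are easy to verify: solve the two normal equations for the b's, then apply the multiple-R and partial-correlation formulas. A sketch with numpy (the same few lines of algebra also reproduce the hurricane example above):

```python
# Verify the reading(y) / weight(x1) / age(x2) example.
import numpy as np

ry1, ry2, r12 = 0.66, 0.82, 0.71   # reading~weight, reading~age, weight~age
# Normal equations: b1 + r12*b2 = ry1 ;  r12*b1 + b2 = ry2
b1, b2 = np.linalg.solve([[1.0, r12], [r12, 1.0]], [ry1, ry2])
R = np.sqrt(b1 * ry1 + b2 * ry2)                         # multiple correlation
partial = (ry1 - ry2 * r12) / np.sqrt((1 - ry2**2) * (1 - r12**2))
print(b1, b2, R, partial)          # ~0.157, ~0.709, ~0.827, ~0.193
```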
Another worked multiple regression example. The means and standard deviations of the three data sets (y, x1, x2) are:
y: Jul-Aug-Sep Sahel rainfall (mm): mean 230 mm, SD 88 mm
x1: tropical Atlantic/Indian Ocean SST: mean 28.3 deg C, SD 1.7 deg C
x2: deforestation (percent of initial): mean 34%, SD 22%
Suppose that Cor(x1,y) = -0.52, Cor(x2,y) = -0.37, and Cor(x1,x2) = 0.50. The regression equation in SD units is then
$z_y = -0.447\, z_{x_1} - 0.147\, z_{x_2}$
In raw units this is
$\frac{y - 230}{88} = -0.447\, \frac{x_1 - 28.3}{1.7} - 0.147\, \frac{x_2 - 34}{22}$
After simplification, the final form will be y = (coeff) x1 + (coeff) x2 + constant, where here both coefficients are negative.

We now compute the multiple correlation R and the standard error of estimate for the multiple regression, using the two individual correlations and the b terms:
$R = \sqrt{b_{1.2}\, r_{x_1,y} + b_{2.1}\, r_{x_2,y}} = \sqrt{(-0.447)(-0.52) + (-0.147)(-0.37)} = 0.535$
The deforestation factor helps the prediction accuracy only slightly. If there were less correlation between the two predictors, the second predictor would be more valuable.
Standard error of estimate = $\sqrt{1 - R^2} = 0.845$ (in SD units of y). In physical units it is (0.845)(88 mm) = 74.3 mm.

Let us evaluate the significance of the multiple correlation of 0.535. How likely is it to have arisen by chance alone? First we find the standard error for samples of 50 drawn from a population having no correlations at all, using 2 predictors:
$SE(R = 0) \approx \frac{1}{\sqrt{n-k}} = \frac{1}{\sqrt{50-2}} = 0.145$
For a 2-sided z test at the 0.05 level, we need 1.96 x 0.145 = 0.28. This is easily exceeded, suggesting that the combination of the two predictors (SST and deforestation) does have an impact on Sahel summer rainfall. (Using SST alone in simple regression, with r = 0.52, would have given nearly the same level of significance.)

Example problem using this regression equation: suppose a climate change model predicts that in the year 2050, the SST in the tropical Atlantic and Indian Oceans will be 2.4 standard deviations above the means given for the 50-year period of the preceding problem. (It is now about 1.6 standard deviations above that mean.) Assume that land use practices (percentage deforestation) will be the same as they are now, which is 1.3 standard deviations above the mean. Under this scenario, using the multiple regression relationship above, how many standard deviations away from the mean will Jul-Aug-Sep Sahel rainfall be, and what seasonal total rainfall does that correspond to?

The problem can be solved either in physical units or in standard deviation units, and the answer can then be expressed in either (or both) kinds of units. If solved in physical units, the values of the two predictors in SD units (2.4 and 1.3) are first converted to raw units using the means and standard deviations provided above, and the raw-units form of the regression equation is used. If solved in SD units, the simpler equation can be used:
$z_y = -0.447\, z_{x_1} - 0.147\, z_{x_2}$
The z's of the two predictors, according to the given scenario, will be 2.4 and 1.3, respectively. Then $z_y = -0.447(2.4) - 0.147(1.3) = -1.264$. This is how many SDs away from the mean the rainfall would be. Since the rainfall mean and SD are 230 and 88 mm, respectively, the actual amount predicted is 230 - 1.264(88) = 230 - 111.2 = 118.8 mm.
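The same scenario in a few lines of code, using only the values given in the problem statement:

```python
# Sahel scenario: predictors in SD units -> rainfall in SD units and in mm.
b1, b2 = -0.447, -0.147       # standardized regression weights from above
zx1, zx2 = 2.4, 1.3           # SST and deforestation scenario values, in SDs
zy = b1 * zx1 + b2 * zx2      # = -1.264 SDs of rainfall
rain_mm = 230 + zy * 88       # convert with mean 230 mm and SD 88 mm
print(zy, rain_mm)            # -1.264, about 118.8 mm
```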
A problem in multiple regression: Collinearity

When the predictors are highly correlated with one another in multiple regression, a condition of collinearity exists. When this happens, the coefficients of two highly correlated predictors may have opposing signs, even when each of them has the same sign of simple correlation with the predictand. (Such opposing signed coefficients minimize the squared errors.) The problems with this are that (1) it is counterintuitive, and (2) the coefficients are very unstable, such that if one more sample is added to the data, they may change drastically. When collinearity exists, the multiple regression formula will often still provide useful and accurate predictions. To eliminate collinearity, predictors that are highly correlated can be combined into a single predictor.

EOF Analysis, and its use in Regression or in CCA

Empirical Orthogonal Functions (EOFs), closely related to principal components and factor analysis, identify preferred patterns within many variables.

Suppose we have a long time record of data for a field variable, such as temperatures at many locations. Examples in climate science: average temperature data for a 40-year period across much of the globe, at grid points 5 degrees of latitude and 5 degrees of longitude apart (over 2,000 grid points); or sea surface temperature data for a 40-year period over much of the globe's oceans, on a 4-degree grid (again, roughly 2,000 grid points). The data matrix looks like:

           grid1  grid2  ..........  grid2000
Time 1     .....  .....  ..........  .....
Time 2     .....  .....  ..........  .....
......     .....  .....  ..........  .....
Time 40    .....  .....  ..........  .....

The climate sciences often deal with data of high dimensionality, such as collections of spatially distributed time series like the temperature or SST observations above. Because such observations are not entirely random and are often related to each other, the information contained in such datasets can often be compressed down to a few spatial patterns that cluster the stations or grid points that are strongly related. EOF analysis is an exploratory technique designed to perform such a compression in an objective way, without any prior knowledge of the relationships linking the observations or of the underlying physical processes. It expresses the data in a smaller set of new variables, defined through linear combinations of the original ones. The desired result is a limited collection of patterns, called EOF modes, that are sufficient to reconstruct a good approximation of the original data and are also easy to visualize and recognize. Although such modes sometimes represent known physical phenomena, they are not designed to isolate only physical mechanisms. EOF analysis should always be thought of as an efficient statistical compression tool.

How EOF modes are defined from a dataset: first, a complete intercorrelation matrix is computed.

         1      2     ..........   2000
1      1.00   0.81    ..........  -0.13
2      0.81   1.00    ..........   0.07
...
2000  -0.13   0.07    ..........   1.00

Then, using this cross-correlation matrix, a procedure is used to identify which grid points best form a coherent cluster -- points that vary similarly to, or oppositely from, one another. This information leads to the formation of a linear combination of all the grid points. In this combination, each gridded value is assigned a weight (positive or negative), something like the weights assigned to the predictors in multiple regression. The pattern of these weights often shows up, visually, as a coherent (non-random) pattern in the spatial domain. Such a pattern of weights is an EOF loading pattern (technically, it is called an eigenvector).
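A toy sketch of this first step -- computing the grid-point intercorrelation matrix from a (times x grid points) data matrix. Random numbers stand in for a real field, and the sizes mirror the example above:

```python
# Build the intercorrelation matrix that EOF analysis starts from.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2000))         # 40 years x 2000 grid points
Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each grid point
C = (Z.T @ Z) / len(Z)                      # 2000 x 2000 correlation matrix
print(C.shape, C[0, 0])                     # diagonal entries are 1.0
```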
By multiplying the values at the grid points for one particular time by their loading weights, and adding them all up, we get the amplitude (or temporal "score") for that time. Times whose original data assume that pattern have high (+ or -) scores.

EOF analysis is performed by inputting the correlation matrix to a procedure called eigenvalue/eigenvector analysis, which involves solving a large set of linear equations. Grid points having high correlations (+ or -) with the most other grid points participate most strongly. Each EOF pattern that emerges explains a certain percentage of the total variance of all the grid points over time, and this percentage of variance explained is maximized. The first EOF mode gathers the most variance; the second EOF mode then works on what remains after all the variability associated with the first mode is removed. Often, after 2 to 6 modes have been defined, the coherent portion of the total variability is exhausted, and further modes just work on the remaining incoherent "noise". When this happens, the loading patterns start looking random and physically meaningless, and the amounts of additional variance explained become small.

What EOF analysis provides:
1. A set of EOF loading patterns (eigenvectors)
2. A set of corresponding amplitudes (temporal scores)
3. A set of corresponding variances accounted for (from the eigenvalues)

Often, EOF analysis allows a set of hundreds or thousands of variables (like grid points) to be compacted into just 3 to 6 EOF variables (modes) that account for two-thirds or more of the original variance. These modes capture the coherent variations. More than one field can be input to an EOF analysis.

[Figures: loading patterns and temporal scores (amplitudes) for EOF modes 1, 2, and 3 of global 500 mb height, JFM 1950-2004.]

Using the correlation matrix (mean SSTs for JFM, 1950-2002): since all grid point data are standardized, only coherent relationships matter, and differing grid point variances do not affect the pattern. Using the covariance matrix (same data): differing variances count, and grid points in the extratropics (having high variance) receive more weight.

[Figure: EOF analysis of the JFM SST history using the correlation matrix -- amount of variance explained by EOF modes 1 to 12.]

Principal component regression: the EOF amplitude time series, instead of the raw (original) variables, can be used as the predictors in a multiple regression. The EOF time series represent scores with respect to the EOF loading patterns, which contain large portions of the variance of the original variables. The EOF time series may therefore be an efficient way to include the information of many raw predictors at once. This depends, however, on whether or not the pattern is relevant to what is being predicted. (Sometimes, grid points that matter may not be part of the main EOF patterns, such as SST points right along a coastline.) A toy end-to-end EOF computation is sketched below.
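Pulling the steps above together: a toy EOF analysis via eigenanalysis of the correlation matrix. The eigenvectors are the loading patterns, the projections of the data onto them are the temporal scores (the predictors used in principal component regression), and the eigenvalues give the variance fractions. Random data stand in for a real field:

```python
# EOF analysis as an eigenvalue/eigenvector problem.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 500))            # 40 times x 500 grid points
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardized anomalies
C = (Z.T @ Z) / len(Z)                        # correlation matrix
evals, evecs = np.linalg.eigh(C)              # eigenanalysis (ascending order)
order = np.argsort(evals)[::-1]               # re-sort by variance explained
loadings = evecs[:, order]                    # columns = EOF loading patterns
scores = Z @ loadings                         # temporal amplitudes (scores)
var_explained = evals[order] / evals.sum()    # fraction of variance per mode
print(var_explained[:3])                      # the leading modes' shares
```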
Interpreting EOFs

EOFs are sometimes difficult to interpret physically. The weights are defined to maximize the variance explained, which does not necessarily maximize the physical interpretability. With spatial data (including climate data) the interpretation becomes even more difficult, because there are geometric controls on the correlations between the data points.

Buell patterns

Imagine a rectangular domain in which all the points are strongly correlated with their neighbors. The points in the middle of the domain will have the strongest average correlations with all other points, simply because their average distance to all the other grids is the smallest. The strong correlations between neighboring grids will be represented by EOF 1, with the central points dominating the pattern. The points in the corners of the domain will have the weakest average correlations with all other points, simply because their average distance to all other grids is the greatest. Mode 2 will represent the weaker correlations between distant grids, because their variance has not yet been explained: a dipole pattern appears, and the axis of the dipole is determined by the domain shape, lying along the long dimension.

Are these patterns real, or are they artifacts of the domain shape? (They may be both together.)

[Figure: first EOF of Indian Ocean SST during Oct-Nov-Dec, for several decades.]

Domain shape dependency can create these influences:
1. The first EOF frequently indicates positive loadings everywhere, with the strongest values in the center of the domain.
2. The second EOF frequently indicates negative loadings on one side and positive loadings on the other side, with the axis along the longest dimension of the domain.
Similar kinds of problems can occur when using (1) gridded data with converging longitudes, or simply with longitude spacing different from latitude spacing, and (2) station data ("middle" stations vs. "edge" stations).

The EOF input can be a correlation matrix or a covariance matrix. The covariance matrix should NOT be used:
(1) when variances differ greatly -- for example, equatorial Indian Ocean SST variance << equatorial Pacific Ocean SST variance, so the Indian Ocean would have only a very small influence on the results;
(2) when units are different -- for example, SSTs combined with 200 hPa geopotential heights (the SSTs would have very low weight, almost no influence).

How many modes should be used when doing EOF analysis? Ways to determine the answer:
• Proportion of variance
• SCREE test (shape of the eigenvalue curve)
• Average eigenvalue (Guttman-Kaiser)
• Sensitivity tests (used in statistical modeling)
• Monte Carlo tests -- determine when the results become only random
• Visual inspection of the mode patterns

[Figure: the SCREE test and Monte Carlo exercises -- find where the red (data) and blue (random) eigenvalue curves intersect.]

Extended EOFs can capture time evolution. Example: AMJ, JAS, OND, and JFM tropical SSTs are combined ("temporally stacked") as predictor fields for the EOFs. The resulting modes will show evolutionary or steady-state features of the SSTs.

Rotation: the weights are redefined according to an alternative criterion. Varimax rotation maximizes the variance of the loading weights across the domain. The way this occurs, there end up being some very high weights and a large number of weights close to zero; consequently, the patterns are more localized. The variances of the first few principal components are reduced after rotation, and the curve of variance explained is flatter than the curve for the original (unrotated) modes.
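A compact sketch of varimax rotation (the standard iterative algorithm in its usual published form; this is illustrative, not the exact routine behind the slides' figures):

```python
# Varimax: rotate the retained loading columns to maximize the variance of
# the squared loadings, concentrating weight on few points per pattern.
import numpy as np

def varimax(loadings, tol=1e-6, max_iter=100):
    p, k = loadings.shape
    R = np.eye(k)                  # rotation matrix, refined iteratively
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - L * (np.sum(L**2, axis=0) / p)))
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):  # stop when the criterion stops improving
            break
        d = d_new
    return loadings @ R

rotated = varimax(np.random.default_rng(0).standard_normal((100, 3)))
print(rotated.shape)               # same shape; weights now more localized
```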
Rotation of EOFs: finding different directions for the axes
• Rotation can be helpful when Buell patterns exist (Richman 1986)
• Rotation provides a simpler, more localized structure; for example, the AO pattern becomes the NAO pattern
• The same total amount of variance is explained after rotation, for a given truncation choice
• Two types of rotation: orthogonal (e.g., varimax) and oblique

Prediction of any element of Y individually, using multiple regression: given the X elements x1 ... x9, their intercorrelation matrix, and the EOFs of X, the predictors can be the elements of X themselves:
$y_1 = b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$
$y_2 = b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots$
$y_3 = \dots$
OR the predictors can be the EOFs of X (principal component regression):
$y_1 = b_1\, XEOF_1 + b_2\, XEOF_2 + \dots$
$y_2 = b_1\, XEOF_1 + b_2\, XEOF_2 + \dots$
$y_3 = \dots$
(Each y gets its own set of b coefficients.)

Introduction to CCA

CCA is like EOF analysis, except that there are TWO data sets (X and Y, or predictor and predictand), and the input matrix (correlation or covariance) contains only cross-dataset coefficients: an X element with a Y element, with no X-to-X or Y-to-Y. Both X and Y can be time-extended or can contain multiple fields.

Analyses of the correlation matrix of the X and Y fields (9 elements each): the X-with-X block yields the EOFs of X; the Y-with-Y block yields the EOFs of Y; the X-with-Y (cross) block yields the CCA of X vs. Y; and using the entire matrix yields joint EOFs of X and Y.

[Figures: CCA modes 1 and 2 between X = tropical Pacific SST in the SON season and Y = Indian Ocean SST in the DJF season.]
[Figure: another CCA example, using tropical Pacific SSTs for two separated months, X = July and Y = December, 1950-1999.]

Buell patterns affect CCA too, not just EOFs.

Two possible predictor designs in CCA:
1. Observational predictor design: X is a field of observed earlier predictors, such as the governing SST field; Y is the rainfall pattern to be predicted for a region of interest. This is a purely statistical forecast system.
2. Model MOS design: X is a dynamical model's prediction of the rainfall pattern around the region of interest. This is a dynamical forecast corrected by a statistical adjustment.

Some simple matrix algebra. Let

X = | a  d |
    | b  e |
    | c  f |

where the rows are samples (times) and the columns are variables. If a + b + c = 0 and d + e + f = 0 (the columns are anomalies), then
variance = (a^2 + b^2 + c^2)/3 and covariance = (ad + be + cf)/3.
(Here 3 is the number of cases; for an unbiased estimate, 2 would be used.)
In general, matrix multiplication gives:

X^T X = | a  b  c | | a  d |  =  | a^2+b^2+c^2   ad+be+cf    |
        | d  e  f | | b  e |     | da+eb+fc     d^2+e^2+f^2  |
                    | c  f |

So, if a + b + c = 0 and d + e + f = 0:
If X contains data expressed as anomalies, then (1/n) X^T X is the variance-covariance matrix.
If X contains data expressed as standardized anomalies, then (1/n) X^T X is the correlation matrix.
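These closing identities are easy to verify numerically; a minimal check with random data:

```python
# For anomalies, (1/n) X^T X is the variance-covariance matrix; for
# standardized anomalies it is the correlation matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))              # 30 cases x 4 variables
A = X - X.mean(axis=0)                        # anomalies: columns sum to zero
cov = (A.T @ A) / len(A)
Z = A / A.std(axis=0)                         # standardized anomalies
corr = (Z.T @ Z) / len(Z)
print(np.allclose(cov, np.cov(X.T, bias=True)))   # True
print(np.allclose(corr, np.corrcoef(X.T)))        # True
```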