FROM SPATIO-TEMPORAL DATA TO A WEIGHTED AND
LAGGED NETWORK BETWEEN FUNCTIONAL DOMAINS:
APPLICATIONS IN CLIMATE AND NEUROSCIENCE
A Thesis
Presented to
The Academic Faculty
by
Ilias Fountalis
In Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy in the
School of Computer Science
Georgia Institute of Technology
May 2016
Copyright © 2016 by Ilias Fountalis
FROM SPATIO-TEMPORAL DATA TO A WEIGHTED AND
LAGGED NETWORK BETWEEN FUNCTIONAL DOMAINS:
APPLICATIONS IN CLIMATE AND NEUROSCIENCE
Approved by:
Professor Constantine Dovrolis, Advisor
School of Computer Science
Georgia Tech
Professor Annalisa Bracco
School of Earth and Atmospheric
Sciences
Georgia Tech
Professor Mostafa H. Ammar
School of Computer Science
Georgia Tech
Professor Athanasios Nenes
School of Earth and Atmospheric
Sciences
Georgia Tech
Assistant Professor Bistra Dilkina
School of Computational Science and
Engineering
Georgia Tech
Associate Professor Shella Keilholz
Wallace H. Coulter Department of
Biomedical Engineering
Georgia Tech
Date Approved: 30 March 2016
To my family.
To my friends.
To Saamer.
ACKNOWLEDGEMENTS
I joined Georgia Tech at the age of twenty-five and, after five and a half years, I am at a
point where an interesting new chapter in my life begins. I was lucky enough to have
Constantine’s support and guidance throughout this time. His knack for finding interesting
research problems made this thesis possible. His attention to detail and his continuous
encouragement for me to understand in depth every single aspect of my research changed
me as a scientist and as a person. I am proud to consider Constantine my mentor and
a valuable friend.
I would also like to thank Annalisa Bracco, whom I consider a co-advisor on this interesting research journey. Annalisa helped me understand concepts that (being a computer
scientist at heart) I was totally unfamiliar with. Special thanks go to Athanasios Nenes, with
whom I exchanged interesting research ideas and whose advice in difficult times helped
me stay on the right track. I would also like to thank Bistra Dilkina and Shella Keilholz.
The last part of my thesis would not have been possible without their contribution.
I would also like to thank the NTG faculty for their continuous support and all the
people at the NTG lab for their company and friendship. Special thanks go to Demetris
Antoniades and Anirudh Ramachandran. I would like to thank my mother Anna and my
father Dimos for their unconditional love and for supporting me in all my choices. Last
but not least, I would like to thank my Atlanta friends. I would never have managed to go
through this journey without their help and support.
Contents
DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
SUMMARY
I INTRODUCTION
1.1 Dimensionality reduction methods for spatio-temporal data
1.2 A framework for the analysis of spatio-temporal systems
1.2.1 geo-Cluster
1.2.2 δ-MAPS
1.3 Organization
II RELATED WORK
2.1 Synthetic Data Generation
2.2 Network Based Methods
2.3 Dimensionality Reduction methods
2.3.1 Principal Component Analysis
2.3.2 Independent Component Analysis
2.3.3 Clustering based methods
2.3.4 Community Detection Methods
III GEO-CLUSTER: SPATIO-TEMPORAL NETWORK ANALYSIS FOR STUDYING CLIMATE PATTERNS
3.1 Introduction
3.2 Data sets
3.3 Climate network construction
3.3.1 Cell-level network
3.3.2 Identification of climate areas
3.3.3 Links between areas
3.4 Network metrics
3.5 Robustness analysis
3.5.1 Robustness to additive white Gaussian noise
3.5.2 Robustness to the resolution of the input data set
3.5.3 Robustness to the selection of τ
3.5.4 Robustness to the selection of the correlation metric
3.6 Applications
3.6.1 Comparison of SST networks
3.6.2 Network changes over time
3.6.3 Comparison of precipitation networks
3.6.4 Regression between networks
3.6.5 CMIP5 SST networks
3.7 Discussion and Conclusions
3.8 Selection of threshold τ
3.9 Pseudocode of area identification algorithm
IV ENSO IN CMIP5 SIMULATIONS: NETWORK CONNECTIVITY FROM THE RECENT PAST TO THE TWENTY-THIRD CENTURY
4.1 Introduction
4.2 Climate Network Inference
4.3 Results
4.3.1 CMIP5 Models and Observational Datasets
4.3.2 The Historical Experiments: 1956-2005
4.3.3 The RCP8.5 Experiments: 2051-2100
4.3.4 The ECP8.5 Experiments: 2101-2300
4.4 Discussion
4.5 Supplementary strength and link maps
4.6 Advantages of using a complete weighted cell-level network
V δ-MAPS: FROM SPATIO-TEMPORAL DATA TO A WEIGHTED AND LAGGED NETWORK BETWEEN FUNCTIONAL DOMAINS
5.1 Introduction
5.2 Related Work
5.3 δ-MAPS
5.3.1 Functional domains
5.3.2 The domain network
5.4 Illustration - Comparisons
5.5 Application in Climate Science
5.6 Application in fMRI data
5.7 Discussion
5.8 Identifying the largest domain is NP-complete
5.9 Heuristic for the selection of δ
5.10 δ-MAPS pseudocode
VI CONCLUSIONS & FUTURE WORK
6.1 Conclusions
6.2 Future Work
REFERENCES
vii
List of Tables
1 Synthetic domain generation parameters.
2 D_sd and ARI from HadISST (1979-2005) to reanalyses, GISS-E2H and HadCM3, and corresponding noise-to-signal ratios γ
3 List of models analyzed and global mean trends in sea surface temperature and rainfall over 1956-2005 and 2051-2100. The number of ensemble members considered during the historical period (1956-2005) is indicated for each model. In parentheses, the number of members with projections to 2100 under the RCP8.5 scenario. X indicates that the model has one member continuing to 2300. Boreal winter (December to February) global mean trends are averaged over all ensemble members (± denotes the maximum deviation between ensemble members)
4 Projected global mean trends in sea surface temperature and rainfall from 2101 to 2300. Trends are calculated over 50-year long consecutive intervals for the models with one member extending to 2300 and for boreal winter (December to February). Precipitation trends are in parentheses
List of Figures
1 A: The five ground-truth domains. Adjacent domains have different colors, overlapping regions are shown in black, and the core of each domain is in blue. The three constructed edges are shown as gray lines. B: The homogeneity field r̂_K(i) at each cell. The identified seeds are shown in blue. C: The inferred domains: adjacent domains have different colors and overlaps are shown in black. D: The inferred domain-level network: the color map refers to the edge correlation. The lag associated with each edge is also shown. E,F,G: The first three EOF (PCA) components. The variance explained by each component is shown at the top of each figure. H,I: The two ICA components. J,K: K-means clustering. L: The second hierarchical level of community structure as identified by OSLOM: each community has a distinct color and overlaps are shown in black.
2 Empirical Cumulative Distribution Functions (CDF) of correlations for the HadISST reanalysis during the 1950-1976 and 1979-2005 periods, and for ERSST-V3 and NCEP data during the 1979-2005 period
3 An example of the area identification algorithm. (a) 12-cell synthetic grid. (b) The correlation matrix between cells (given as input). (c) The area expansion process for a given τ = 0.4. Cells shown in red are selected to join the area (denoted by A_k). Cells 1, 4, 9 and 12 will not join A_k since they do not satisfy the τ constraint in Eq. 2
4 Identified areas in the HadISST 1979-2005 data set (τ = 0.496). (a) The 176 areas identified by Part-1 of the area identification algorithm. (b) The 74 “merged” areas after the execution of Part-2. (c) The CDF of area sizes (in number of cells) before and after the merging process
5 The relation between area size and standard deviation of the area’s cumulative anomaly (R² = 0.88) for the HadISST reanalysis during the 1979-2005 period; τ = 0.496
6 CDF of the absolute correlation between area cumulative anomalies for the HadISST reanalysis during the 1950-1976 and 1979-2005 periods, and for ERSST-V3 and NCEP during the 1979-2005 period
7 Link maps for two areas related to (a) ENSO and (b) the equatorial Indian Ocean in the HadISST 1979-2005 network (τ = 0.496). The color scale represents the weight of the link between the area shown in black and every other area in this SST network
8 Strength maps for two different time periods using the HadISST data set. (a) 1950-1976 network, strength of ENSO area: 20.1 × 10⁴; (b) 1979-2005 network, strength of ENSO area: 18.8 × 10⁴
9 Color maps depicting the top-5 order cores for the (a) HadISST 1950-1976 and (b) HadISST 1979-2005 networks
10 (a) Distribution of ranked area strengths for two networks constructed using the HadISST data set over the periods 1950-1976 and 1979-2005, respectively. (b) Distance D_sd(N, N_γ) and ARI(N, N_γ) between the HadISST 1979-2005 network and networks constructed after the addition of white Gaussian noise to the same data set
11 Strength maps for two perturbations of the HadISST 1979-2005 data set using white Gaussian noise. (a) γ = 0.05, strength of ENSO area: 18.0 × 10⁴. (b) γ = 0.10, strength of ENSO area: 19.1 × 10⁴
12 Strength maps for the HadISST 1979-2005 network at three different resolutions. (a) Low resolution network (4° lat × 4° lon), strength of ENSO area: 18.2 × 10⁴. (b) Default resolution network (2° lat × 2.5° lon), strength of ENSO area: 18.8 × 10⁴. (c) High resolution network (1° lat × 2° lon), strength of ENSO area: 18.2 × 10⁴
13 (a) Distance D_sd and (b) ARI from the original HadISST 1979-2005 network (marked with an asterisk on the x-axis, τ = 0.496) to networks constructed with different values of τ. The black horizontal lines correspond to the distance D_sd(N, N_γ) and ARI(N, N_γ)
14 Strength maps for the HadISST 1979-2005 network using two values of the parameter τ. The “default” value is τ = 0.496, corresponding to α = 0.1% (see Section 3.8). (a) τ = 0.45, strength of ENSO area: 18.7 × 10⁴. (b) τ = 0.55, strength of ENSO area: 18.6 × 10⁴
15 Strength map for the HadISST 1979-2005 network using Spearman’s correlation; strength of ENSO area: 18.5 × 10⁴
16 Pearson correlation maps between the SST anomaly time series in all pairs of three reanalysis data sets over the 1979-2005 period in boreal winter (DJF). Correlations between (a) HadISST and ERSST-V3; (b) HadISST and NCEP; (c) NCEP and ERSST-V3
17 Strength maps for networks constructed based on (a) HadISST (ENSO area strength 18.8 × 10⁴); (b) ERSST-V3 (ENSO area strength 17.6 × 10⁴); (c) NCEP (ENSO area strength 21.0 × 10⁴). In all networks the period considered is 1979-2005
18 Top-5 order cores in (a) HadISST; (b) ERSST-V3; (c) NCEP. The period considered is 1979-2005 in all cases
19 Links between the ENSO-like area shown in black and all other areas in the three reanalyses. (a) HadISST, (b) ERSST-V3 and (c) NCEP networks
20 Links for the HadISST network over 1950-1976 from (a) the ENSO-related area and (b) the equatorial Indian Ocean area (in black in the two panels)
21 Precipitation networks. Area strength map in (a) CMAP (equatorial Pacific area strength 49.4 × 10⁴), and (b) ERA-Interim (equatorial area strength 41.0 × 10⁴)
22 Top-5 order cores in (a) CMAP, and (b) ERA-Interim
23 Link maps from the strongest area (in black) for the two precipitation reanalysis data sets. (a) CMAP; (b) ERA-Interim
24 Link maps from the ENSO-like area in the HadISST data set to all areas in the CMAP data set, considering the 1979-2005 period. Values greater than |1 × 10⁴| are saturated
25 Strength maps for two members of the GISS-E2H and HadCM3 “historical” ensembles. (a) GISS-E2H run 1 (ENSO area strength 9.8 × 10⁴); (b) GISS-E2H run 2 (ENSO area strength 10.0 × 10⁴); (c) HadCM3 run 1 (ENSO area strength 23.3 × 10⁴) and (d) HadCM3 run 2 (ENSO area strength 16.9 × 10⁴)
26 Top-5 order cores identified in the SST anomaly networks for (a-b) two GISS-E2H ensemble members and (c-d) two HadCM3 integrations
27 Link maps from the ENSO-like area in the (a-b) GISS-E2H and (c-d) HadCM3 models
30 Trend anomaly maps for boreal winter in the recent past and near future. Anomalies are computed by removing from each grid cell the global mean trend calculated over the months of December to February and indicated in Table 3. + and • indicate agreement in more than 90% and 70% of models, respectively, in the sign of the trend anomaly slope. (a) HadISST. (b) ERA40+Interim. (c) Sea surface temperature (SST) averaged across models in the historical period (1956-2005). (d) As in (c) but for rainfall. The units are °C/year for SST and (mm/day)/year for precipitation
31 Metric D versus ARI for climate networks during the historical period 1956-2005. (a) Sea surface temperature; reference network HadISST. (b) Precipitation; reference network ERA40+Interim. Three levels of noise-to-signal ratios γ are also indicated
32 Strength maps of sea surface temperature for HadISST and three sample models (top rows), and of precipitation for ERA40+Interim and the same three models (bottom rows) during the historical period 1956-2005. Models shown: MIROC5, GFDL CM3 and MRI. For clarity, the strength of the ENSO-related area is saturated when exceeding the color scale and its value is indicated at the top of each panel, together with D and ARI from HadISST or ERA40+Interim for each of the model networks
33 Sea surface temperature link maps from the ENSO-related area in black for HadISST and the three sample models during the historical period 1956-2005. Models shown: MIROC5, GFDL CM3 and MRI
34 Trend anomaly maps for boreal winter in the second half of the 21st century. Anomalies are computed by removing from each grid cell the global mean trend calculated over the months of December to February and indicated in Table 3. + and • indicate agreement in more than 90% and 70% of models, respectively, in the sign of the trend anomaly slope. (a) SST averaged across models over 2051-2100. (b) As in (a) but for rainfall. The units are °C/year for SST and (mm/day)/year for precipitation
35 Metric D versus ARI for climate model networks during the period 2051-2100. (a) Sea surface temperature. (b) Precipitation. All networks are referenced to the corresponding integration over the historical period. Three levels of noise-to-signal ratios γ are also indicated. D and ARI between HadISST and other sea surface temperature proxies, and between ERA40+Interim and other precipitation reanalyses, are repeated to provide context
36 Sea surface temperature strength maps for two members of the CanESM2 model in the historical period (1956-2005) on top, and in the 21st century (2051-2100) at the bottom. For clarity, the strength of the ENSO-related area is saturated when exceeding the color scale and indicated in each panel. In the future projections, D and ARI from the corresponding historical member are also specified
37 Trend anomaly maps for boreal winter in the 22nd and 23rd centuries. Anomalies are computed by removing from each grid cell the global mean trend calculated over the months of December to February and indicated in Table 4. + and • indicate agreement in more than 90% and 70% of models, respectively, in the sign of the trend anomaly slope. (a) Sea surface temperature (SST) averaged across models over 2101-2150. (b) Rainfall averaged across models over 2101-2150. (c) As in (a) but for 2151-2200. (d) As in (b) but for 2151-2200. (e) As in (a) but for 2201-2250. (f) As in (b) but for 2201-2250. (g) As in (a) but for 2251-2300. (h) As in (b) but for 2251-2300. The units are °C/year for SST and (mm/day)/year for precipitation
38 Metric D versus ARI for seven climate model networks from 2051 to 2300 over five consecutive 50-year periods, numbered 1 to 5. (a) Sea surface temperature. (b) Precipitation. All networks are referenced to the corresponding integration over the historical period. Three levels of noise-to-signal ratios γ are also indicated
39 Sea surface temperature (a-d) and precipitation (e-h) strength maps for two models (left column CCSM4, right column MPI) in the historical period (1956-2005) and in the future (2251-2300). For each variable the first row corresponds to the historical experiments. For clarity, the strength of the ENSO-related area is saturated when exceeding the color scale and indicated at the top of each panel. D and ARI metrics of the future projections from the corresponding historical member are also included
40 Link maps for sea surface temperature (a-b) and precipitation (c-d) from the ENSO-related area in black for two models for which the projected ENSO strength evolves in opposite ways. CCSM4 is shown in the left column and MPI in the right. Maps are calculated over the 2251-2300 period
41 Variance of the cumulative anomalies of the ENSO area in DJF in the models and HadISST over 1956-2005 in red, and in the models over 2251-2300 in blue. For HadISST the time series is highly correlated (coefficient 0.94) with the Niño3.4 index, defined as the average of SST anomalies from 5°S to 5°N and from 120° to 170°W. Error bars around the mean variance over 50 years are determined using a 20-year sliding window, and provide a measure of the decadal modulation of ENSO in the models over the periods considered.
42 Maps of area strength of sea surface temperature networks in boreal winter (December to February) in the historical period 1956-2005 for models and reanalyses. Only one ensemble member per model is shown. The strength of the area corresponding to ENSO is indicated in the panel captions and saturated in black if colorbar limits are exceeded
43 Link maps from the ENSO-related area (in black) for sea surface temperature networks in boreal winter (December to February) in the historical period 1956-2005 for models and reanalyses. Only one ensemble member per model is shown
44 Maps of area strength for precipitation networks in boreal winter (December to February) in the historical period 1956-2005 for models and reanalyses. Only one ensemble member per model is shown. The strength of the area corresponding to ENSO is indicated in the panel captions and saturated in black if colorbar limits are exceeded
45 Link maps from the ENSO-related area (in black) for precipitation networks in boreal winter (December to February) in the historical period 1956-2005 for models and reanalyses. Only one ensemble member per model is shown
46 Maps of area strength for the sea surface temperature networks in boreal winter (December to February) in the RCP8.5 projections (period 2051-2100). For each model, the ensemble member shown projects into the future the historical counterpart in Fig. 42. The strength of the area corresponding to ENSO is indicated in the panel captions and saturated in black if colorbar limits are exceeded
47 Link maps from the ENSO-related area (in black) for sea surface temperature networks in boreal winter (December to February) in the RCP8.5 projections (period 2051-2100) for the ensemble members in Fig. 46
48 Maps of area strength for the precipitation networks in boreal winter (December to February) in the RCP8.5 projections (period 2051-2100). For each model, the ensemble member shown is the projection into the future of the historical counterpart in Fig. 44. The strength of the area corresponding to ENSO is indicated in the panel captions and saturated in black if colorbar limits are exceeded
49 Link maps from the ENSO-related area (pictured in black) for the precipitation networks in boreal winter (December to February) in the RCP8.5 projections (period 2051-2100) for the ensemble members in Fig. 48
50 Maps of area strength for the sea surface temperature networks in boreal winter (December to February) in the ECP8.5 projections (periods 2101-2150 and 2251-2300). For each model, the ensemble member shown projects into the future the historical counterpart in Fig. 42. The strength of the area corresponding to ENSO is indicated in the panel captions and saturated in black if colorbar limits are exceeded
51 Link maps from the ENSO-related area (in black) for sea surface temperature networks in boreal winter (December to February) in the ECP8.5 projections (periods 2101-2150 and 2251-2300) for the ensemble members in Fig. 50
52 Maps of area strength for the precipitation networks in boreal winter (December to February) in the ECP8.5 projections (periods 2101-2150 and 2251-2300). For each model, the ensemble member shown projects into the future the historical counterpart in Fig. 44. The strength of the area corresponding to ENSO is indicated in the panel captions and saturated in black if colorbar limits are exceeded
53 Link maps from the ENSO-related area (in black) for the precipitation networks in boreal winter (December to February) in the ECP8.5 projections (periods 2101-2150 and 2251-2300) for the ensemble members in Fig. 52
54 Areas identified using three different cell-level networks. α was set to 1 × 10⁻³. Data set: HadISST 1956-2005
55 ARI between a reference network constructed using α = 1 × 10⁻³ and networks constructed using different α values
56 Correlogram between two climate time series for a lag range of ±12 months. Significant correlations for a false discovery rate q = 10⁻³ are shown in red. The error bars correspond to ± one standard deviation, as estimated by Eq. (15).
57 (A) The identified domains. The color of each domain corresponds to the connected component it belongs to (the blue and green nodes belong to two different poles of the same component). (B) Color map for domain strength. The strength of ENSO (domain E) is shown at the top. (C) Edges to and from ENSO (shown in black). (D) The climate network. The color of each edge represents the corresponding cross-correlation. (E) The lag range associated with each edge. (F) Examples of lag-consistent triangles.
58 (A),(B) The first two components of EOF analysis. (C) Communities identified by OSLOM. Each community has a unique number and color. (D) Areas identified by spatial clustering.
59 Three domain-level network communities for each scan. The first corresponds to the default-mode network, the second to the occipital network, and the third to the motor/somatosensory network.
60 The domains of the backbone network for each hemisphere and scan. The color of each domain is randomly assigned (overlaps are shown in black).
SUMMARY
Spatio-temporal data have become increasingly prevalent and important in many
scientific fields (e.g., climate, systems neuroscience, seismology) and enterprises (e.g., geo-tagged tweets). Such data are typically embedded in an arbitrary grid. The grid cells, however, do not correspond to functionally distinct units. One major task is to identify the
distinct semi-autonomous functional components of the system and to infer their interconnections.
Common computational analysis methods for such data include standard time series
analysis, clustering, community detection, and multivariate statistical methods (e.g., PCA/ICA).
However, as we also demonstrate using synthetic data, each of these classes of methods
has important limitations in terms of accuracy and flexibility.
In this thesis, we propose two methods that first identify the functional components
of a spatio-temporal system as spatially contiguous sets of grid cells that are homogeneous
with respect to the underlying field. In a second step, an edge inference process identifies the possibly lagged
and weighted connections between the system’s components, applying a multiple-testing
procedure that controls the rate of false positives. The inferred network is modeled as a
weighted and directed graph. The weight of an edge accounts for the magnitude of the
interaction between two components; the direction (and lag) associated with each edge
accounts for the temporal ordering of the interactions between the system’s components.
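The edge inference step summarized above can be illustrated with a small sketch. It assumes that each component has already been reduced to a single time series, tests every pair of components at every lag in a window of ±max_lag, converts each correlation into a p-value with Fisher's z-transform (which treats samples as independent and so ignores autocorrelation), and controls the false discovery rate with the Benjamini-Hochberg procedure. The function names, the choice of Benjamini-Hochberg, and the rule of reporting the significant lag with the largest |r| are illustrative assumptions, not the exact procedure used by geo-Cluster or δ-MAPS.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(x, y, lag):
    """Correlation of x(t) with y(t + lag) and the overlap length.

    A positive lag means that x leads y.
    """
    n = len(x) - abs(lag)
    if lag >= 0:
        return pearson(x[:n], y[lag:]), n
    return pearson(x[-lag:], y[:n]), n


def p_value(r, n):
    """Two-sided p-value from Fisher's z-transform.

    Assumption: the n samples are independent (autocorrelation is ignored).
    """
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvals, q):
    """Indices of the hypotheses rejected at false discovery rate q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value passes its threshold
    return set(order[:k_max])


def infer_edges(signals, max_lag, q):
    """Return {(i, j): (lag, r)} for pairs with a significant lagged correlation."""
    tests = []  # one hypothesis per (pair, lag)
    for i in range(len(signals)):
        for j in range(i + 1, len(signals)):
            for lag in range(-max_lag, max_lag + 1):
                r, n = lagged_corr(signals[i], signals[j], lag)
                tests.append((i, j, lag, r, p_value(r, n)))
    rejected = benjamini_hochberg([t[4] for t in tests], q)
    edges = {}
    for idx in sorted(rejected):
        i, j, lag, r, _ = tests[idx]
        # Illustrative rule: keep the significant lag with the largest |r|.
        if (i, j) not in edges or abs(r) > abs(edges[(i, j)][1]):
            edges[(i, j)] = (lag, r)
    return edges


# Toy example: b follows a with a 2-step delay; c is unrelated noise.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [(a[t - 2] if t >= 2 else 0.0) + 0.1 * rng.gauss(0, 1) for t in range(200)]
c = [rng.gauss(0, 1) for _ in range(200)]
edges = infer_edges([a, b, c], max_lag=3, q=1e-3)
print(edges[(0, 1)])  # expected: lag 2 with r close to 1
```

The positive lag on the (0, 1) edge encodes direction: the first component leads the second, which is how a lagged edge can be read as a directed one.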
The first method, geo-Cluster, infers the spatial components as “areas”. An area is a
spatially contiguous, non-overlapping set of grid cells that satisfies a homogeneity constraint
on the average pair-wise cross-correlation of its cells. However, in real physical systems the
underlying physical components might not have crisp boundaries (i.e., they might overlap).
To account for this, we also propose δ-MAPS, a method that first identifies the epicenters
of activity of the functional components of the system and then creates domains: spatially contiguous, possibly overlapping sets of grid cells that satisfy the same homogeneity
constraint.
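The homogeneity constraint can be made concrete with a minimal sketch of greedy area growth. Starting from a seed cell, the hypothetical function grow_area repeatedly adds the adjacent cell that yields the highest average pair-wise correlation, as long as that average stays at or above τ. This is a simplified illustration under stated assumptions (a precomputed symmetric correlation matrix and an adjacency list describing the grid); it omits seed selection, the merging of areas, and the overlap that δ-MAPS domains allow.

```python
from itertools import combinations


def avg_pairwise_corr(cells, corr):
    """Average correlation over all unordered pairs of cells in a set."""
    pairs = list(combinations(sorted(cells), 2))
    if not pairs:
        return 1.0  # a single cell is trivially homogeneous
    return sum(corr[i][j] for i, j in pairs) / len(pairs)


def grow_area(seed, corr, neighbors, tau):
    """Greedily grow a spatially contiguous area from `seed` (illustrative only).

    At each step, among cells adjacent to the current area, add the one that
    gives the highest average pair-wise correlation, provided the enlarged
    area still satisfies the homogeneity constraint (average >= tau).
    """
    area = {seed}
    while True:
        frontier = {n for c in area for n in neighbors[c]} - area
        best, best_score = None, None
        for cand in sorted(frontier):  # sorted for a deterministic tie-break
            score = avg_pairwise_corr(area | {cand}, corr)
            if score >= tau and (best_score is None or score > best_score):
                best, best_score = cand, score
        if best is None:
            return area
        area.add(best)


# Toy 1-D grid 0-1-2-3: cells 0-2 co-vary strongly, cell 3 does not.
corr = [
    [1.0, 0.8, 0.8, 0.1],
    [0.8, 1.0, 0.8, 0.1],
    [0.8, 0.8, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sorted(grow_area(0, corr, neighbors, tau=0.5)))  # [0, 1, 2]
```

Adding cell 3 would lower the average pair-wise correlation to 0.45, below τ = 0.5, so the area stops at cells 0 to 2.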
The proposed framework is applied in climate science and neuroscience. In the context
of climate, we show how such methods can be used to infer climate shifts, evaluate cutting-edge
climate models and identify lagged relationships between different climate regions. In
the context of neuroscience, the method is applied to resting-state fMRI data and successfully identifies well-known “resting-state networks” as well as a few areas that are strongly
interconnected with each other, forming the backbone of the functional cortical network.
Chapter I
INTRODUCTION
Many real-world systems are modeled as an ensemble of distinct components that are associated via a complex set of connections. In some systems both the elements and their
connections are obvious (e.g., Internet routers as nodes, cables between routers as edges).
In others, the underlying mechanisms for remote connections are unknown a priori (e.g.,
social networks) and it is non-trivial to identify the distinct functional components of the
system (e.g., functional regions in the human brain). This is usually the case with systems
embedded in a spatio-temporal field.
In recent years, spatio-temporal data have become increasingly prevalent and important
in many scientific fields (e.g., climate, systems neuroscience, seismology) and enterprises (e.g., geo-tagged tweets). Such data are typically embedded in an arbitrary grid. The
grid cells, however, do not correspond to functionally distinct units. One major task is to
identify the distinct semi-autonomous components of the system. A second is to infer the
strength of their (potentially lagged) interconnections.
A common approach to studying spatio-temporal systems is to model them as networks.
Typically, the grid cells are the nodes of the network and the edges correspond to statistically significant linear [153] or non-linear [53] relationships between the
grid cell time series. These networks are modeled either as binary [165] or weighted graphs
[68]. Such methods have been successfully employed to forecast El Niño events [105], uncover interesting global-scale patterns responsible for the transfer of energy throughout the
oceans [52], investigate changes in the network structure due to neurobiological disorders
[138] and many more. The main drawback of such an approach is that the size and number
of the nodes (i.e., grid cells) are arbitrarily determined by the measurement technique and
1
2
do not correspond to functionally distinct units.
1.1 Dimensionality reduction methods for spatio-temporal data
To uncover the functional components of a spatio-temporal system, it is necessary to identify the dimensionality in the spatial domain. This can be accomplished through the use of
spatial dimensionality reduction techniques.
A common approach to reduce the dimensionality of a spatio-temporal system is through
multivariate statistical methods. Examples of such methods include Principal Component
Analysis (PCA) [92], also known as Empirical Orthogonal Function (EOF) analysis [167],
and Independent Component Analysis (ICA) [89]. PCA (standard or rotated) aims to decompose the observed data into orthogonal vectors (i.e., the principal components) of high
energy content in terms of the variance of the signal. Known drawbacks of PCA include the
fact that lower variance components are masked by higher variance ones, and so the analysis is typically limited to the first one or two principal components, as long as they explain
most of the variance. Further, the orthogonality between PCA components complicates the
interpretation of the results, making it difficult to identify the distinct functional components and separate their effects [50]. ICA separates a mixed signal into independent, non-Gaussian components. In contrast to PCA, there is no orthogonality constraint imposed on
the identified components. However, one cannot determine the variance, sign, or the correct
ordering of the independent components. In other words, ICA does not provide a relative
significance for each component and the number of independent components should be
chosen based on some additional information about the underlying system. Finally, an independent/principal component does not represent a distinct functional component; it is the
mixture of many functional components.
Another broad family of spatio-temporal dimensionality reduction methods is based on
clustering [60, 90]. Examples of clustering algorithms include region growing [104], partitioning [139], hierarchical [24], spectral [160] and probabilistic [83] methods. The functionality and scope of each method differs, but they share some common characteristics.
For instance, every grid cell needs to belong to a cluster while the actual number of clusters
is often required as an input parameter. Further, the identified clusters are non-overlapping
and might not be spatially contiguous. In particular, the lack of spatial contiguity makes it
hard to distinguish correlations due to spatial diffusion (or dispersion) phenomena
from correlations due to remote interactions between clusters. Relevant to clustering are community detection techniques [8, 145], which are applied on the cell-level
network directly. In contrast to clusters, communities can be overlapping [4, 116]; however,
there is no spatial contiguity constraint. Further, community detection methods do not decouple the identification of the functional components from the connections that these have
with each other. Two components in the same community might have different connectivity
patterns to the rest of the network.
1.2 A framework for the analysis of spatio-temporal systems
In this thesis, we propose a framework that first identifies the distinct semi-autonomous
components of a spatio-temporal system as spatially contiguous clusters of grid cells. In a
second step, the (possibly lagged) interactions between them are inferred and their magnitude is assessed.
1.2.1 geo-Cluster
In detail, we first propose geo-Cluster, a method that infers the spatial components of
the underlying system as “areas”. An area is a spatially contiguous, non-overlapping set of
(two or more) grid cells that satisfy a homogeneity constraint based on their average pairwise cross-correlation. For parsimony, the proposed method aims to maximize the
size of the identified areas. The method requires a single parameter, which determines the
minimum degree of homogeneity of the grid cells in each area. Next, a complete weighted
network between the identified areas is inferred, modeling the functional relationships between them. The weight of an edge corresponds to the covariance between the area time
series, accounting for the power of the signal of each area as well as the correlation between
the area time series.
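To make these two quantities concrete, here is a minimal Python sketch under stated assumptions. Each grid cell contributes an anomaly time series, area homogeneity is the average pairwise Pearson correlation of its cells compared against a threshold `tau`, and an area's time series is taken to be the sum of its cells' series (an illustrative assumption, not a detail given here). The function names are hypothetical, so this is not geo-Cluster's implementation.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(a, b):
    """Zero-lag Pearson correlation of two equal-length series."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))


def avg_pairwise_corr(cells):
    """Average correlation over all pairs of cell time series in an area."""
    pairs = list(combinations(cells, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def is_homogeneous(cells, tau):
    """Hypothetical check: an area needs >= 2 cells and mean correlation above tau."""
    return len(cells) >= 2 and avg_pairwise_corr(cells) > tau


def area_series(cells):
    """Assumed area time series: element-wise sum of the member cells' series."""
    return [sum(values) for values in zip(*cells)]


def edge_weight(area_a, area_b):
    """Covariance = correlation * std(A) * std(B), so the weight reflects both
    the signal power of each area and how correlated the two areas are."""
    xa, xb = area_series(area_a), area_series(area_b)
    return pearson(xa, xb) * pstdev(xa) * pstdev(xb)
```

Using covariance rather than correlation means that two weakly varying areas get a small edge weight even when their time series are strongly correlated.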
The proposed method has been shown to be robust to noise, the resolution of the underlying grid, the parameter that determines the minimum degree of homogeneity in an area,
and the metric used to quantify the similarity between the grid cell time series. geo-Cluster
has been extensively applied to climate data to investigate climate shifts and to construct
interdependent networks [71] between different climate domains. Further, the method is
applied to evaluate cutting-edge climate models, assessing their ability to reproduce the past climate and investigating the model trajectories under a future climate warming
scenario.
1.2.2 δ-MAPS
In real physical systems the underlying spatial components might not have crisp boundaries
[63] and their interactions might not be instantaneous. To this end, we propose δ-MAPS, a
method that identifies spatially contiguous and possibly overlapping components referred
to as “domains”, and identifies the lagged functional relationships between them. Informally, a domain is a spatially contiguous region that somehow participates in the same
dynamic effect or function. The latter will result in highly correlated temporal activity between grid cells of the same domain. Thus, δ-MAPS first identifies the epicenters of activity
of a domain. Next, it identifies a domain as the maximum possible set of spatially contiguous grid cells that include the detected epicenters and satisfy a homogeneity constraint
(based again on the average pair-wise correlation of the grid cells in the domain’s scope).
After identifying the domains, δ-MAPS infers a functional network between them. The
proposed network inference method examines the statistical significance of each lagged
correlation between two domains, applies a multiple-testing process to control the rate of
false positives, infers a range of potential lag values for each edge, and assigns a weight
to each edge reflecting the magnitude of interaction between two domains. δ-MAPS is
related to clustering, multivariate statistical techniques and network community detection.
However, as we discuss and also show with synthetic data, it is also significantly different,
avoiding many of the known limitations of these methods.
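The text does not name the multiple-testing procedure. As one common option, assumed here purely for illustration, the Benjamini–Hochberg step-up procedure controls the false discovery rate over all candidate edges, given one p-value per lagged domain-pair correlation. How those p-values are computed is left open in this sketch, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the (sorted) indices of hypotheses rejected at false discovery rate q.

    Sort the m p-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k smallest p-values (step-up procedure).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank  # keep the largest rank that satisfies the condition
    return sorted(order[:k])
```

Each returned index would correspond to a candidate edge that survives the correction.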
We illustrate the application of δ-MAPS on data from two domains: climate science
and neuroscience. First, the sea-surface temperature (SST) climate network identifies some
well-known teleconnections (such as the lagged connection between the El Niño Southern
Oscillation and the Indian Ocean). Second, the analysis of resting state fMRI cortical data
confirms the presence of known functional resting state networks (default mode, occipital, motor/somatosensory and auditory), and shows that the cortical network includes a
backbone of relatively few regions that are densely interconnected.
1.3 Organization
The thesis is organized as follows. In Chapter II we present related work on network inference and dimensionality reduction techniques for spatio-temporal data. Using a synthetic
data set, in which the functional components and their interconnections are known, we contrast such methods to the proposed framework and identify key differences and limitations
for each method. In Chapter III we propose geo-Cluster [66], provide robustness results
and show example applications in the field of climate science. In Chapter IV we provide an
extensive application of geo-Cluster to evaluate cutting-edge climate models [67]. In Chapter V, we propose δ-MAPS [65]. We compare δ-MAPS to the most common dimensionality
reduction methods and show its application in the fields of climate and neuroscience. Finally, Chapter VI provides the main conclusions from this thesis and an outlook for future
work.
Chapter II
RELATED WORK
There exist several methods to analyze spatio-temporal data, from simple time series analysis, to dimensionality reduction methods (e.g., PCA/clustering), to network based methods.
The proposed framework combines ideas from the latter two approaches. In the following,
we first introduce a synthetic data set in which both the functional components of the system
and their interactions are a priori known. We shall use this data set to illustrate limitations
of existing dimensionality reduction approaches. Next, we provide a brief overview of
the network based approach. We conclude by presenting various dimensionality reduction
techniques, contrasting them to the proposed approach, in terms of their ability to uncover
the functional components of a spatio-temporal system and their interactions.
2.1 Synthetic Data Generation
To better highlight the limitations and differences of various dimensionality reduction
methods, we first introduce an example in which we know both the dimensionality of the
spatio-temporal system as well as the interactions between the different components of the
system.
We construct five domains (modeled as spatially contiguous regions) on a 50×70 spatial
grid. Each domain $i$ is associated with a “mother” time series $y_i(t)$, $i = 1, \ldots, 5$. To make the
experiment more realistic in terms of autocorrelation structure and marginal distribution,
each $y_i(t)$ is a real fMRI time series with length $T = 1200$ (see Section 5.6). The five mother
time series $y_i(t)$ are uncorrelated (absolute cross-correlation < 0.05 at all lags), and they
are normalized to zero-mean, unit-variance. To create correlations between domains (i.e.,
domain-level edges), we construct five new time series $x_i(t)$ based on linear combinations
of two or more mother time series. For instance, if we set $x_i(t) = (1 - \alpha)\,y_i(t) + \alpha\,y_j(t + \tau)$
with $0 < \alpha < 1$ and $x_j(t) = y_j(t)$, domains $i$ and $j$ become positively correlated at a lag
$\tau$; the correlation increases with $\alpha$. The time series $x_i$ are again normalized to zero-mean,
unit-variance. We then scale the time series of domain $i$ by a factor $\sqrt{s_i}$ to control the
variance of each domain ($\mathrm{Var}[x_i(t)] = s_i$).
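The construction of a lagged edge can be sketched as follows. This is a minimal illustration, not the thesis code: white Gaussian noise stands in for the fMRI mother series, and the helper names (`normalize`, `mix`, `scale`) are hypothetical. For independent unit-variance mothers, the expected correlation between $x_i(t)$ and $x_j(t+\tau)$ is $\alpha/\sqrt{(1-\alpha)^2 + \alpha^2}$, about 0.45 for $\alpha = 1/3$, and it grows with $\alpha$ as stated above.

```python
import math
import random
from statistics import mean, pstdev


def pearson(a, b):
    """Zero-lag Pearson correlation of two equal-length series."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return num / (len(a) * pstdev(a) * pstdev(b))


def normalize(x):
    """Return a zero-mean, unit-variance copy of x."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


def mix(y_i, y_j, alpha, tau):
    """x_i(t) = (1 - alpha) y_i(t) + alpha y_j(t + tau), for t = 0 .. T - tau - 1."""
    return [(1 - alpha) * y_i[t] + alpha * y_j[t + tau] for t in range(len(y_i) - tau)]


def scale(x, s):
    """Normalize x, then multiply by sqrt(s) so that Var[x] = s."""
    return [math.sqrt(s) * v for v in normalize(x)]


# White-noise stand-ins for the fMRI mother series (an assumption of this sketch).
rng = random.Random(0)
T, tau = 1200, 15
y_i = normalize([rng.gauss(0, 1) for _ in range(T)])
y_j = normalize([rng.gauss(0, 1) for _ in range(T)])

x_i = scale(mix(y_i, y_j, alpha=1 / 3, tau=tau), s=16)
# Correlation of x_i(t) with x_j(t + tau) = y_j(t + tau): clearly positive at lag tau ...
r_lag = pearson(x_i, y_j[tau:])
# ... and close to zero at lag 0.
r_zero = pearson(x_i, y_j[: len(x_i)])
```

A negative coefficient on the lagged mother series, as for domain 1 in Table 1, produces an anti-correlated edge instead.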
For simplicity, each domain is a circle with radius $r_p$. A domain has a “core region”
with the same center and radius $r_c < r_p$; the core is supposed to be the epicenter of that
domain. Every point in the core has the same signal $x_i(t)$ (before we add random noise).
Outside the core, the signal attenuates at a distance $d$ from the center of the domain as
follows:

$$x_i(t) = \sqrt{f(d)}\,x_i(t), \qquad f(d) = \frac{r_p - d}{r_p - r_c}, \qquad r_c \le d \le r_p. \tag{1}$$
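Equation (1) can be sketched as a small function. Treating $f(d) = 1$ inside the core follows from the statement that every core point carries the same signal; returning 0 beyond $r_p$ is an assumption of this sketch, and the function names are hypothetical. Because the signal is multiplied by $\sqrt{f(d)}$, the variance at distance $d$ is $f(d)$ times the variance in the core.

```python
import math


def attenuation(d, r_c, r_p):
    """f(d) from Eq. (1): 1 inside the core, linear decay to 0 at the domain edge."""
    if d <= r_c:
        return 1.0  # core: full signal
    if d >= r_p:
        return 0.0  # outside the domain: no contribution (assumption)
    return (r_p - d) / (r_p - r_c)


def cell_signal(x, d, r_c, r_p):
    """Noise-free signal at a cell at distance d from the domain center.

    Scaling by sqrt(f(d)) makes the cell's variance equal to f(d) * Var[x].
    """
    w = math.sqrt(attenuation(d, r_c, r_p))
    return [w * v for v in x]
```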
The parameters of the five synthetic domains are shown in Table 1. The domains differ
in terms of size and power (variance). The spatial extent of the domains is shown in Fig. 1-A; domains 1 and 3 overlap with domain 2, while domains 4 and 5 also overlap to a smaller
extent. Further, there is a strong and lagged anti-correlation between domains 1 and 3,
a weaker positive correlation at zero-lag between domains 4 and 5, and an even weaker
positive correlation at zero-lag between domains 3 and 5. The edges of the domain-level
network are also shown in Fig.1-A.
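Table 1: Synthetic domain generation parameters.

| ID | $r_c$ | $r_p$ | $s_i$ | $x_i(t)$ |
|----|-------|-------|-------|----------|
| 1 | 2 | 10 | 16 | $x_1(t) = \tfrac{2}{3}\,y_1(t) - \tfrac{1}{3}\,y_3(t + 15)$ |
| 2 | 4 | 14 | 11 | $x_2(t) = y_2(t)$ |
| 3 | 2 | 10 | 16 | $x_3(t) = y_3(t)$ |
| 4 | 0.5 | 5 | 9 | $x_4(t) = \tfrac{3}{4}\,y_4(t) + \tfrac{1}{4}\,y_5(t)$ |
| 5 | 1 | 7 | 6 | $x_5(t) = \tfrac{4}{5}\,y_5(t) + \tfrac{1}{5}\,y_3(t)$ |

2.2 Network Based Methods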
Spatio-temporal data are usually embedded in a two- or three-dimensional grid; each grid
cell contains time series of measurements for a given variable. Such a data set can be naturally modeled as a network. The grid cells play the role of the nodes in the network. The
edges of the network are inferred based on statistically significant relationships between
the grid cell time series [52, 106, 117]. The network can be modeled either as a binary
[153] or weighted [68] graph.
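As a minimal sketch of this cell-level construction, assume zero-lag Pearson correlation and the naive fixed-threshold rule discussed below. The threshold value 0.5 and the function name are hypothetical; a real analysis would choose the significance test carefully and might consider lags.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(a, b):
    """Zero-lag Pearson correlation of two equal-length series."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return num / (len(a) * pstdev(a) * pstdev(b))


def correlation_network(series, threshold=0.5, weighted=False):
    """Cell-level network from a dict {cell: time series}.

    Nodes are grid cells; an edge (i, j) is kept when |r_ij| >= threshold
    (a fixed, hypothetical threshold). Returns {(i, j): weight}, where the
    weight is 1 for a binary graph or the correlation r_ij for a weighted one.
    """
    edges = {}
    for i, j in combinations(sorted(series), 2):
        r = pearson(series[i], series[j])
        if abs(r) >= threshold:
            edges[(i, j)] = r if weighted else 1
    return edges
```

The number of candidate edges grows quadratically with the number of grid cells, which is one reason why more expensive significance tests may not scale to fine-resolution data.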
All these methods require a statistical test to distinguish between significant and non-significant edges. Naive approaches to the problem include using a fixed threshold [153] or
requiring the network to have a fixed density [142]. More sophisticated approaches such as
using surrogate time series also exist (see e.g., [100, 107, 127]). However, methods based
on surrogate time series are computationally expensive (compared to the naive approach)
and might not scale for finer-scale resolution data.
The network based approach has been successful in many fields. For example, complex
network approaches have been used to forecast El Niño events [105], map brain regions that
are most likely to be affected by pathological changes [45], identify structures responsible
for the transfer of energy in oceans [52], show how some diseases (such as Alzheimer’s)
can affect the functional structure of the brain network [147] and many more. The main
drawback of such an approach is that the size (and number) of the nodes are arbitrarily
determined by the measurement technique and do not correspond to functionally distinct
units. To counter this problem there has been an effort to identify modular networks. From
the network perspective, a module (or community) corresponds to a set of grid cells highly
interconnected to each other and less connected to the rest of the world. The identification
of communities in networks is a concept similar to clustering and will be discussed further
in Section 2.3.4.
2.3 Dimensionality Reduction methods
In this section we provide an overview of dimensionality reduction methods used to analyze
spatio-temporal data. For each family of methods, we show its limitations when the objective is to identify the functional components and their interconnections in our synthetic
data set.
Figure 1: A: The five ground-truth domains. Adjacent domains have different colors,
overlapping regions shown in black, and the core of each domain is in blue. The three
constructed edges are shown in gray lines. B: The homogeneity field $\hat{r}_K(i)$ at each cell.
The identified seeds are shown in blue. C: The inferred domains: adjacent domains have
different colors and overlaps are shown in black. D: The inferred domain-level network:
the color map refers to the edge correlation. The lag associated with each edge is also
shown. E,F,G: The first three EOF (PCA) components. The variance explained by each
component is shown at the top of each figure. H,I: The two ICA components. J,K:
K-means clustering. L: The second hierarchical level of community structure as identified
by OSLOM: each community has a distinct color and overlaps are shown in black.
2.3.1 Principal Component Analysis
Principal Component Analysis (PCA), also known as Empirical Orthogonal Function (EOF)
analysis, is one of the oldest techniques used to analyze spatio-temporal data. PCA aims to
decompose the original set of variables into a new set of (principal) components that capture most of the observed variance of the data through a linear combination of the original
variables. The identified components are orthogonal to each other and each component is
assigned a value equal to the total variance explained by it.
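For illustration, the leading EOF can be obtained as the dominant eigenvector of the spatial covariance matrix. The pure-Python sketch below uses power iteration rather than Matlab's toolbox (in practice one would use an SVD from a numerical library); the data layout, with rows as time steps and columns as grid cells, and all function names are assumptions. Further components could be obtained by deflation, subtracting $\lambda v v^\top$ from the covariance and iterating again.

```python
import random


def covariance_matrix(data):
    """Sample covariance (N x N) of a field given as T rows of N grid-cell values."""
    T, N = len(data), len(data[0])
    means = [sum(row[k] for row in data) / T for k in range(N)]
    X = [[row[k] - means[k] for k in range(N)] for row in data]  # center each cell
    return [[sum(X[t][a] * X[t][b] for t in range(T)) / (T - 1) for b in range(N)]
            for a in range(N)]


def leading_eof(C, iters=500, seed=0):
    """Dominant eigenvector (spatial pattern) and eigenvalue of C by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in C]  # random positive starting vector
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in C]  # w = C v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue (variance along v).
    lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(len(v))) for a in range(len(v)))
    return v, lam


def explained_variance(C, lam):
    """Fraction of total variance (trace of C) captured by one component."""
    return lam / sum(C[k][k] for k in range(len(C)))
```

In a field where two cells share a strong signal and a third carries a weak one, the leading component loads almost entirely on the strong pair and explains nearly all the variance, which is the masking effect described below for domains 4 and 5.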
PCA assumes that the dominant patterns are orthogonal in space and time (which is
not necessarily true, see [132] for a case relevant to climate). To overcome this problem
alternative methods (e.g., rotated PCA) exist [164] but require more user defined parameters
and sometimes split a single pattern into two different ones [167]. An interesting analysis
on how PCA results can be misleading is presented in [50].
We apply PCA using Matlab’s PCA toolbox. Fig. 1-E,F,G show the first three principal
components, which collectively account for about 90% of the total variance. A first observation is that domains 4 and 5 are not even visible in these components – they only appear
in the next two components, which account for about 5% of the variance each. This is because domains 4 and 5 are smaller and have lower variance. This is a general limitation of
PCA: the variance of the analyzed field can be dominated by a small number of “modes of
variability”, completely masking smaller/weaker regions of interest and their connections.
Second, the first three components do not provide consistent evidence that domains 1
and 3 are strongly anti-correlated; this is due to their lagged correlation, which is missed
by PCA. Third, the first component, which accounts for 40% of the total variance, can be
misinterpreted to imply that domain 2 is somehow positively correlated with domains 1 and
3, even though it is actually generated by an uncorrelated signal. This is due to the overlap
of domain 2 with domains 1 and 3.
2.3.2 Independent Component Analysis
A method typically used by the neuroscience community is Independent Component Analysis (ICA) [89]. ICA defines a model for the observed data; in the model the data are
assumed to be linear mixtures of some unknown latent variables (the independent components). The mixing system is also unknown. The unknown latent variables are assumed
to be non-Gaussian and mutually independent. In general, given the observed signals x,
the goal is to identify the mixing matrix A and the independent components s such that
x = As. In contrast to PCA there is no orthogonality constraint on the independent components but there is no way to determine the number, the variance, the sign, or the correct
ordering of the independent components. Thus, one should rely on empirical knowledge to
identify independent components that correspond to specific functional modules.
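The sign, scale and ordering ambiguity can be checked directly. In the sketch below (hypothetical matrices, plain Python rather than FastICA), flipping and rescaling the sources with a diagonal matrix $D$, compensating with $D^{-1}$ in the mixing matrix, and reordering with a permutation $P$ yields a different factorization that reproduces exactly the same observations $x = As$. A method that only sees $x$ therefore cannot recover the original variances, signs or order.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1.0, 0.5], [0.3, 2.0]]       # hypothetical mixing matrix
S = [[1.0, -2.0, 0.5, 3.0],        # source 1 (one row per component, one column per time step)
     [0.2, 0.7, -1.0, 0.4]]        # source 2
X = matmul(A, S)                   # observed mixtures: x = A s

# Flip the sign and change the scale of each source with a diagonal D,
# compensate in the mixing matrix with D^-1, then swap the components with P.
D = [[-2.0, 0.0], [0.0, 0.5]]
D_inv = [[-0.5, 0.0], [0.0, 2.0]]
P = [[0.0, 1.0], [1.0, 0.0]]       # permutation; here P = P^T = P^-1
A_alt = matmul(matmul(A, D_inv), P)
S_alt = matmul(P, matmul(D, S))
X_alt = matmul(A_alt, S_alt)       # equals X: (A D^-1 P)(P D S) = A S
```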
We apply ICA on the synthetic data using Matlab’s FastICA toolbox. To help ICA
perform better, we actually specified the right number of independent components, which
is two (domains 1,3,4,5 are indirectly correlated – domain 2 is not correlated with any
other). The two independent components are shown in Fig. 1-H,I. Note that only a rough
“shadow” of each domain is visible. Domains 1 and 3 appear in different colors, providing
a hint that they are anti-correlated, while domains 3 and 5 appear in the same color because
they are positively correlated. Overall, however, the components are quite noisy and it
would be hard in practice to discover the functional structure of the underlying system if
we did not know the ground-truth. The results are even harder to interpret when we request
a larger number of components.
2.3.3 Clustering based methods
A broad family of dimensionality reduction methods is based on clustering [13, 139, 152, 160]. None
of these methods, though, guarantees that the identified clusters are spatially contiguous.
As we show next, spatial contiguity is an attractive property since it will enable us to
differentiate network nodes from large scale networks of nodes [17].
We apply the most well-known clustering method, k-means, on our synthetic data. As
commonly done with correlation-based clustering, the distance between two cells $i$ and $j$ is
determined by the maximum absolute correlation across all considered lags, as $1 - |r^*_{i,j}|$.¹
Fig. 1-J,K shows the resulting clusters for k=5 (the number of synthetic domains) and
6, respectively. For k=5, domains 1 and 3 form a single cluster because of their strong
anti-correlation; the same happens with domains 4 and 5. Further, two of the five clusters
(green and brown) cover just noise. The situation changes completely when we request
k=6 clusters. In that case, the overlapping regions in domain 2 form a single cluster, while
domains 1 and 3 are separated in different clusters.
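The k-means distance $1 - |r^*_{i,j}|$ can be computed directly from the footnote's definition of the lagged correlation. In this minimal sketch, negative lags are handled through the identity $r_{i,j}(-\tau) = r_{j,i}(\tau)$, an assumption of the sketch since the footnote writes the sum only for non-negative shifts; the function names are hypothetical.

```python
from statistics import mean, pstdev


def lagged_corr(xi, xj, tau):
    """r_ij(tau) following the footnote: the sum over t = 1..T - tau of
    (x_i(t) - mu_i)(x_j(t + tau) - mu_j), divided by T * sigma_i * sigma_j.
    Means and standard deviations use the full series."""
    if tau < 0:
        return lagged_corr(xj, xi, -tau)  # assumed: r_ij(-tau) = r_ji(tau)
    T = len(xi)
    mi, mj = mean(xi), mean(xj)
    si, sj = pstdev(xi), pstdev(xj)
    s = sum((xi[t] - mi) * (xj[t + tau] - mj) for t in range(T - tau))
    return s / (T * si * sj)


def max_abs_corr(xi, xj, tau_max):
    """|r*_ij|: the largest |r_ij(tau)| for tau in {-tau_max, ..., tau_max}."""
    return max(abs(lagged_corr(xi, xj, tau)) for tau in range(-tau_max, tau_max + 1))


def corr_distance(xi, xj, tau_max):
    """Distance between two cells used for k-means: 1 - |r*_ij|."""
    return 1.0 - max_abs_corr(xi, xj, tau_max)
```

Because the absolute value is taken, a perfectly anti-correlated pair at some lag gets distance 0, exactly like a perfectly correlated pair. This is why k-means merges domains 1 and 3 into a single cluster.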
More similar to the proposed framework is the notion of spatially contiguous clustering.
Identifying spatially contiguous clusters is a problem that arises in many fields, from image
processing [69], to geographical sciences [55], to studying the Earth’s climate [66] and
the human brain [44]. In general there exist two approaches to find spatial clusters. The
first approach is a semi-supervised approach where after the initial clusters are identified
subsequent (supervised) changes are made to merge them into spatially contiguous regions
[55]. Our focus here is on unsupervised approaches, where the spatial contiguity criterion is
incorporated into the clustering algorithm.
A typical approach to identify spatially contiguous clusters is agglomerative hierarchical clustering. In such an approach each grid cell forms its own cluster and clusters
are iteratively joined, according to some distance measure, if they are spatially adjacent.
Many distance measures have been proposed in the literature. For example, in [79] the authors evaluate a family of three different distance measures (single, average and complete
linkage) to cluster U.S. presidential election data into different regions. Essentially, this
hierarchical clustering process constructs a dendrogram; by “cutting” the dendrogram horizontally at a level of our choice, we obtain the resulting clusters. In the case of the human
brain, hierarchical clustering has been applied to provide a full brain parcellation [111] and
to test clusters (rather than voxels) for activation during task-based experiments [81].

¹ In detail, the Pearson correlation between grid cells $i$, $j$ at lag $\tau$ is given by $r_{i,j}(\tau) = \frac{\sum_{t=1}^{T-\tau} (x_i(t) - \tilde{\mu}_i)(x_j(t+\tau) - \tilde{\mu}_j)}{T\,\tilde{\sigma}_i\,\tilde{\sigma}_j}$, $T$ being the time series length and $\tilde{\mu}_i$, $\tilde{\sigma}_i$ the empirical mean and standard deviation, respectively. Then $|r^*_{i,j}| = \max_{\tau \in \{-\tau_{max}, \ldots, \tau_{max}\}} |r_{i,j}(\tau)|$, with $\tau_{max}$ being the maximum lag.
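A minimal sketch of spatially constrained agglomerative clustering follows. Each cell starts as its own cluster, and at every step the two closest clusters that are spatially adjacent (on a 4-neighbor grid) are merged. Average linkage on the zero-lag correlation distance $1 - r$ is one of the choices mentioned above, and stopping once $k$ clusters remain stands in for cutting the dendrogram. The names and the adjacency rule are assumptions, and the quadratic search favors clarity over speed.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(a, b):
    """Zero-lag Pearson correlation of two equal-length series."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return num / (len(a) * pstdev(a) * pstdev(b))


def adjacent(c1, c2):
    """True if some cell (row, col) of c1 is a 4-neighbor of some cell of c2."""
    return any(abs(r1 - r2) + abs(q1 - q2) == 1
               for r1, q1 in c1 for r2, q2 in c2)


def average_linkage(c1, c2, series):
    """Mean correlation distance 1 - r over all cross-cluster cell pairs."""
    return mean(1 - pearson(series[a], series[b]) for a in c1 for b in c2)


def contiguous_agglomerative(series, k):
    """Merge the closest pair of spatially adjacent clusters until k remain."""
    clusters = [frozenset([cell]) for cell in series]
    while len(clusters) > k:
        candidates = [(average_linkage(a, b, series), i, j)
                      for (i, a), (j, b) in combinations(enumerate(clusters), 2)
                      if adjacent(a, b)]
        if not candidates:
            break  # no adjacent pairs left to merge
        _, i, j = min(candidates)
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters
```

Since only adjacent clusters may merge, every resulting cluster is spatially contiguous. Note that the number of clusters $k$ is still an input.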
Another popular clustering method with applications in fMRI is NCUT [44, 131].
NCUT is a graph based clustering method; edges are removed iteratively until a prespecified number of clusters is reached. The edges are removed so as to maximize the
similarity between the elements in the same cluster while maximizing the dissimilarity between elements in different clusters. Finding the optimal edge to remove is NP-Complete
and the method relies on an approximate solution (found by solving a generalized eigenvalue problem). One of the drawbacks of NCUT is that it is biased to identify clusters of
similar size (for a detailed description of the limitations of spectral clustering methods we
refer the reader to [112]). In [150] the authors compare NCUT to an agglomerative hierarchical clustering that uses Ward’s distance [169]. They show that the latter performs better
both in terms of reproducibility (i.e., sensitivity to noise) as well as in terms of accuracy.
Another group of clustering methods is based on the concept of region growing [2].
Typically, region growing methods start with a number of pre-specified seed regions (a
seed region contains only one grid cell). These regions grow by including grid cells similar
to them, until a homogeneity criterion is reached. Similar to our approach, region growing
methods are based on the intuition that neighboring grid cells should have similar values.
In contrast to the proposed method, there is no merging of regions while these are growing.
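Seeded region growing can be sketched in the same setting. In this hypothetical version, a region starts from a single seed cell and repeatedly adds the 4-neighbor that keeps the region's average pairwise correlation highest, stopping when no neighbor keeps that homogeneity at or above a threshold `delta`. The seed, the neighborhood and the threshold are assumptions; the published methods cited here differ in these details.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(a, b):
    """Zero-lag Pearson correlation of two equal-length series."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return num / (len(a) * pstdev(a) * pstdev(b))


def avg_corr(cells, series):
    """Homogeneity of a region: mean pairwise correlation of its cells."""
    if len(cells) < 2:
        return 1.0  # a single cell is trivially homogeneous
    return mean(pearson(series[a], series[b]) for a, b in combinations(cells, 2))


def grow_region(seed, series, delta):
    """Grow a region from `seed` on a 4-neighbor grid of cells (row, col).

    At each step, add the neighboring cell that keeps the region's average
    pairwise correlation highest; stop when no candidate keeps it >= delta.
    """
    region = {seed}
    while True:
        neighbors = {(r + dr, c + dc) for r, c in region
                     for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))}
        candidates = [cell for cell in neighbors - region if cell in series]
        scored = [(avg_corr(region | {cell}, series), cell) for cell in candidates]
        scored = [(h, cell) for h, cell in scored if h >= delta]
        if not scored:
            return region
        _, best = max(scored)
        region.add(best)
```

Growing regions independently, as in this sketch, involves no merging step, which matches the contrast drawn above.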
Selecting the location of seed regions will affect the outcome of the clustering algorithm
and a couple of different approaches exist. For example, in [104] the authors propose that
all grid cells form a seed region. Having identified a seed region for each cell in the grid
they iteratively remove (and keep) the largest regions up to the point that only regions of
size less than an arbitrary threshold remain. In [24] the authors use the concept of stability
maps to select the seed regions and, in a second step, use hierarchical agglomerative
clustering to identify functional regions in the cortex.
In contrast to geo-Cluster, all these clustering methods need as an input the number
of clusters to identify. Moreover, such clustering methods identify spatially contiguous
regions even if the underlying field is composed of noise. Further, many of these methods
(e.g. NCUT) can be applied only when the distances between grid cells are positive. If
the distance between the grid cell time series is captured by a measure which also takes
negative values (e.g., Pearson correlation, which is the norm), then the similarity matrix has
to be thresholded to remove them. Similarly to geo-Cluster, the borders of the clusters are
crisp and no overlaps are allowed.
An alternative to clustering are edge detection or border detection methods. Border
detection techniques are based on the idea that “pixels” or grid cells representing the same
object should have a similar value yet distinct from pixels belonging to another area. In
[15] for example, the authors use a graph based border detection technique to extract homogeneous regions from raster data. In [38, 76, 170] the idea of border detection is applied
to fMRI data where the authors try to delineate the borders of functional areas. Border
detection techniques are known to be sensitive to localized patches of noise in the data.
2.3.4 Community Detection Methods
All of these clustering methods suppose that the identified clusters are independent in the
spatial domain with their boundaries well defined. In reality, the borders of the clusters
might not be clearly demarcated. Some grid cells belonging to one cluster can belong to
other clusters as well. For example, when we study the Earth’s climate we are interested
in identifying functional domains (e.g., the El Niño Southern Oscillation). Such domains have
identifiable effects in the temperature anomaly field. One could not claim that strict borders exist in the gradients of the temperature so as to allow the definition of “crisp” clusters. In
the field of neuroscience there is further evidence of the existence of overlapping clusters.
Quoting Fornito et al.: “A model of neuronal architecture that allows for overlapping modules offers a more realistic brain-network organization (for instance, cortical association
areas are known to have a role in multiple networks)” [63]. Further, other studies suggest
that cognitive functions are organized into segregated and overlapping networks [54, 82].
To our knowledge, the only method that allows for overlapping partitions is community
detection. A community is a set of nodes that are highly interconnected to each other, while
having fewer connections to the rest of the network. There are numerous methods to identify communities; for a review we refer the reader to [64]. There are also many applications
of community detection in the fields of climate science and neuroscience. For example in
[145] the authors use community detection techniques to evaluate climate models while
in [143] the authors propose the use of communities as informative predictors in lieu of
climate indices. In [118] the authors apply a wide variety of community detection methods
in fMRI data. At a second step, using a map of task-based activations, they map these
communities to specific cognitive functions. Many authors [21, 176] have also suggested
that the community structure of the human brain deteriorates (i.e. becomes less modular)
as a person gets older. For a comprehensive review of applications of community detection
methods in neuroscience we refer the reader to [137].
Overlapping communities are a “natural” extension to the classic definition of a community. The main premise is that an individual can belong to more than one community
(e.g. work, family etc.). There exist several approaches to identify overlapping communities (e.g. [4, 59, 116]). One of the first applications of overlapping community detection
in spatio-temporal data (and more specifically in resting state fMRI data) can be found in
[175]. The authors test the capability of the proposed methodology to uncover overlapping
communities in the resting state network. In [171] the authors investigate the overlapping
community structure in the structural brain network and show that the identified communities can be mapped to well-known brain systems. To the best of our knowledge, none
of these methods guarantees that the identified communities form spatially contiguous regions.
We apply a state-of-the-art overlapping community detection method, referred to as
OSLOM [103], with the default parameter values. The input to OSLOM is a positively
weighted graph: each vertex is a grid cell and an edge between vertices $i$ and $j$ corresponds
to the maximum absolute cross-correlation $|r^*_{i,j}|$ across all lags of interest. Absolute correlations less than 30% are considered insignificant and the corresponding edges are pruned.²
As most community detection methods, OSLOM does not distinguish between positive and
negative correlations. OSLOM provides a hierarchy of communities. When applied to our
synthetic data, the first level of hierarchy (not shown) simply groups together domains
1,2,3 in one community (even though domain 2 is uncorrelated with domains 1 and 3), and
domains 4,5 in another community. The connection between domains 3 and 5 is missed.
The second level of hierarchy is shown in Fig. 1-L. Overall, OSLOM does a better job
than PCA/ICA/clustering in detecting the spatial extent of each domain. A small overlap
between domains (1,2) and (2,3) is discovered but to a smaller extent than δ-MAPS (the
results of δ-MAPS are discussed in more detail in Section 5.4). However, a community in
OSLOM is not constrained to be spatially contiguous. This is the reason we see some black
dots in regions 4 and 5; these are non-contiguous overlaps between the communities that
correspond to these two domains. Thus, a community may group together two regions that
are, first, not spatially contiguous, and second, different in terms of how they are connected
to other regions.
² We have experimented with other pruning thresholds between 20% and 50%, and the results are very similar
at the first two hierarchy levels.
Chapter III
GEO-CLUSTER: SPATIO-TEMPORAL NETWORK ANALYSIS
FOR STUDYING CLIMATE PATTERNS
3.1 Introduction
Network analysis refers to a set of metrics, modeling tools and algorithms commonly used
in the study of complex systems. It merges ideas from graph theory, statistical physics,
sociology and computer science, and its main premise is that the underlying topology or
network structure of a system has a strong impact on its dynamics and evolution [114]. As
such it constitutes a powerful tool to investigate local and non-local statistical interactions.
The progress made in this field has led to its broad application; many real world systems
are modeled as an ensemble of distinct elements that are associated via a complex set of
connections. In some systems, referred to as structural networks, the underlying network
structure is obvious (e.g. Internet routers as nodes, cables between routers as edges). In
others, the underlying mechanisms for remote connections between different subsystems
are unknown a priori (e.g. social networks, or the climate system); still, their effects can be
mapped into a functional network. An extensive bibliography for applications of network
analysis can be found in [113].
By quantifying statistical interactions, network analysis provides a powerful framework
to validate climate models and investigate teleconnections, assessing their strength, range,
and impact on the climate system. The intention is to uncover relations in the climate system that are not (or not fully) captured by more traditional methodologies used in climate
science [49, 43, 1, 73, 72, 62, 10, 11], and to explain known climate phenomena in terms
of the underlying network’s structure and metrics.
Introductions to the application of network analysis in climate science are presented in [142] and [156]. We can classify the prior work in this area into three distinct approaches. A
first approach assigns known climate indices as the nodes of the network [154, 148, 168].
By studying the collective behavior of these nodes, it has been possible to investigate their
relative role over time and to interpret climate shifts in terms of changes in their relative
strength. This approach is obviously sensitive to the initial selection of network nodes, and
it cannot be used to discover new climate phenomena involving other regions.
A second, and more common, approach represents the nodes of the climate network
by grid cells in the given climate field. Specifically, each grid cell is represented by a
node, and edges between nodes correspond to statistically significant relations based on
linear or nonlinear correlation metrics [153, 53]. In this approach, it is common to prune
edges whose statistical significance is below a certain threshold, and to assume that all
remaining edges are equally “strong”, resulting in an unweighted network [157, 53, 142].
This approach has been used to study teleconnections, uncover interesting global-scale
patterns responsible for the transfer of energy throughout the oceans, and analyze relations
between different variables in the atmosphere [157, 155, 173, 52, 51]. A limitation of
this approach is that it results in a very large number of network nodes (all cells in a grid map), and these nodes cannot parsimoniously describe any identified climate phenomena.
The third approach focuses on the community structure of the underlying network
[115]. A community is a collection of nodes that are highly interconnected while having far fewer connections to the rest of the network. Communities can serve as informative predictors in lieu of climate indices [158, 143, 117], and their evolution and stability have also received some attention [142, 144]. Clustering techniques have also been proposed to discover significant geographical regions in a given climate field (again, in lieu of climate indices) [139], and to identify dipoles (i.e., two regions whose anomalies are anti-correlated) and evaluate their significance [95, 94]. These community-based or clustering techniques, however, do not infer a network of teleconnections between different communities (clusters), and they do not quantify the intensity of teleconnections between geographically separated regions within the same community (cluster).
In this work, we propose a new method to apply network analysis to climate science.
We first apply a novel network-based clustering method to group the initial set of grid cells into “areas”, i.e., geographical regions that are highly homogeneous in terms of the underlying climate variable. These areas represent the nodes of the inferred network. Links
between areas (i.e., the edges of the network) represent non-local dependencies between
different regions over a certain time period. These inter-area links are weighted, and their
magnitude depends on both the cumulative anomaly of each area and the cross-correlation
between the two cumulative anomalies. The similarity of our method to previous community/clustering techniques is that nodes are endogenously determined during the data analysis process. The main differences are that each node corresponds to a distinct geographical
region, and these nodes form a weighted network based on the connection intensity that is
inferred for each pair of nodes. In other words, the proposed method decouples the identification of the geographical boundary of each network node from the estimation of the
connection intensity between different regions.
The proposed method requires a single parameter τ , which determines the minimum
degree of homogeneity between cells of the same area. The method is robust to additive noise, changes in the resolution of the given data set, the selection of the correlation
metric, and variations in τ . The resulting climate network can be applied, regionally or
globally, to identify and quantify relationships between climate areas (or teleconnections)
and their representation in models, and to investigate climate variability and shifts. Finally,
the proposed method can be extended to investigate interactions between different climate
variables.
The rest of this chapter is organized as follows: In Section 3.2 we introduce the data
sets analyzed in this work. We describe the climate network construction algorithm and the
network analysis metrics in Sections 3.3 and 3.4, respectively. The robustness of the climate network inference process is examined in Section 3.5. Applications of the proposed method
to a suite of reanalyses and model data sets are presented in Section 3.6. A discussion of
the main outcome of this work concludes the chapter.
3.2 Data sets
In this section we briefly describe the data sets that are used in the rest of this chapter.
For sea surface temperatures (SSTs), we construct and compare networks based on the
HadISST [121], the ERSST-V3 [134] and the NCEP/NCAR [93] reanalyses. For precipitation, we rely on CMAP merged data [172] and ERA-Interim reanalysis [46]. We also
analyze the SST fields generated by two coupled general circulation models chosen from
the CMIP5 archive: the NASA GISS-E2H [80] and the Hadley Center HadCM3 [75]. We
randomly select two runs of each model from the “historical run” ensembles [149].
Because the quality of the measurements contributing to the SST reanalyses deteriorates at higher latitudes, we only consider the latitudinal range [60°N, 60°S], avoiding sea-ice covered regions. We mostly focus on the period 1979-2005; in the case of the HadISST reanalysis, we also compare with the network characteristics during the 1950-1976 interval. Due to space constraints, results are shown only for the boreal winter season (December to February, DJF). Unless specified otherwise, all SST data are interpolated (using bilinear interpolation) to the minimum common spatial resolution across all data sets (2° × 2.5°); for precipitation the resolution is 2.5° × 2.5°.
All climate networks are constructed from detrended anomalies derived from monthly
averages of the corresponding climate field. The detrending is done using linear regression
and the anomalies are computed after removing the annual cycle.
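A minimal NumPy sketch of this preprocessing, under stated assumptions: the input is a (T, n_cells) array of consecutive monthly means, the annual cycle is removed by subtracting each calendar month's long-term mean, and a least-squares linear trend is then removed from every cell. The function name `monthly_anomalies`, the `start_month` parameter, and the order of the two steps are illustrative choices, not taken from the original processing code.

```python
import numpy as np


def monthly_anomalies(field, start_month=0):
    """Detrended anomalies of a (T, n_cells) array of monthly means.

    Assumptions (illustrative, not from the thesis): rows are consecutive
    months and row 0 is calendar month `start_month` (0 = January).
    """
    field = np.asarray(field, dtype=float)
    T = field.shape[0]
    months = (np.arange(T) + start_month) % 12

    # Remove the annual cycle: subtract each calendar month's mean.
    anom = field.copy()
    for m in range(12):
        rows = months == m
        if rows.any():
            anom[rows] -= anom[rows].mean(axis=0)

    # Remove a linear trend per cell by least-squares regression on time.
    t = np.arange(T, dtype=float)
    design = np.column_stack([np.ones(T), t])
    coef, *_ = np.linalg.lstsq(design, anom, rcond=None)
    return anom - design @ coef
```

Because the regression includes an intercept, every returned series has zero mean and no residual linear trend.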
3.3 Climate network construction
The network construction process consists of three steps. First, we compute the “cell-level
network” from the detrended anomaly time series of each cell in the spatial grid. Second, we apply a novel area identification algorithm to the cell-level network to identify the
nodes of the final “area-level network”; an area here represents a geographic region that is
highly homogeneous in terms of the given climate field. Third, we compute the weight of
the edges between areas, roughly corresponding to teleconnections, based on the covariance of the cumulative anomalies of the two corresponding areas. The following network
construction method requires a single parameter, τ , which determines the minimum degree
of homogeneity between cells of the same area. In the following we describe each step in
more detail.
3.3.1 Cell-level network
Consider a climate field $x(t)$ defined on a finite number of cells in a given spatial grid. The $i$-th component of the climate field is a time series $x_i(t)$ of detrended anomalies in cell $i$. The length of each time series is denoted by $T$. We first compute Pearson's cross-correlation¹ $r(x_i, x_j)$ between the time series $x_i(t)$ and $x_j(t)$ for every pair of cells $i$ and $j$. We calculate the correlations at zero lag, assuming that the physical processes linking different cells result from atmospheric wave dynamics and are fast compared to the one-month averaging time scale of the input time series. Considering time-lagged correlations is beyond the scope of this chapter. Instead of Pearson's correlation, other correlation metrics could be adopted; in Section 3.5.4 we examine the differences in the resulting network using a rank-based correlation metric.
Most prior work on climate network analysis applies a cutoff threshold to the correlations $r(x_i, x_j)$ to prune insignificant ones and construct a binary (i.e., unweighted) network between cells; for a recent review see [141]. Fig. 2 shows correlation distributions for four SST reanalysis data sets; note that there is no natural cutoff point to separate significant correlations from noise. We have experimented with methods that first prune insignificant correlations and then construct unweighted networks, and observed that the final area-level network is sensitive to the significance level at which correlations are pruned. Such sensitivity complicates any attempt to make quantitative comparisons between networks constructed from different data sets (for example, networks from observations versus models).
¹ Unless specified otherwise, the term “correlation” denotes Pearson's cross-correlation metric between two time series.
For this reason, in the following we present a method that considers all pairwise cell correlations, without any pruning. Thus, the cell-level network is a complete weighted graph: every pair of cells is connected by an edge whose weight is their correlation, a value between -1 and 1. This cell-level network is the input to the area identification algorithm, described next.
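In code, the complete weighted cell-level network can be held simply as the n_cells × n_cells matrix of zero-lag Pearson correlations. A sketch, assuming NumPy and anomalies stored as a (T, n_cells) array; the helper name `cell_level_network` is hypothetical:

```python
import numpy as np


def cell_level_network(anomalies):
    """Complete weighted cell-level network as a correlation matrix.

    anomalies: (T, n_cells) array of detrended anomalies x_i(t).
    Returns R with R[i, j] = r(x_i, x_j) at zero lag; no edge is pruned.
    """
    x = np.asarray(anomalies, dtype=float)
    # np.corrcoef treats rows as variables, so pass the transpose.
    return np.corrcoef(x.T)
```

Cells with a constant (zero-variance) series would produce undefined correlations and should be masked beforehand.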
Figure 2: Empirical Cumulative Distribution Functions (CDF) of correlations for the
HadISST reanalysis during the 1950-1976 and 1979-2005 periods, and for ERSST-V3 and
NCEP data during the 1979-2005 period
3.3.2 Identification of climate areas
A central concept in the proposed method is that of a climate area, or simply area. Informally, an area A represents a geographic region that is highly homogeneous in terms of the
climate field x(t).
In more detail, we define as neighbors of a grid cell i the four adjacent cells of i, and as
path a sequence of cells such that each pair of successive cells are neighbors. An area A is
a set of cells satisfying three conditions:
1. A includes at least two cells.
2. The cells in A form a connected geographic region, i.e., there is a path within A connecting each cell of A to every other cell of that area.
3. The average correlation between all cells in A is greater than a given threshold τ,
$$\frac{\sum_{i \neq j \in A} r(x_i, x_j)}{|A| \times (|A| - 1)} > \tau \qquad (2)$$
where |A| denotes the number of cells in area A.
The parameter τ determines the minimum degree of homogeneity that is required within
an area. A heuristic for the selection of τ is presented in Section 3.8; we use that heuristic
in the rest of this chapter.
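The three conditions can be checked directly for a candidate set of cells. The sketch below assumes cells are numbered row-major on an n_rows × n_cols grid with no longitudinal wrap-around (a simplification; a global grid would wrap in longitude), and that R is the cell-level correlation matrix. All function names are hypothetical.

```python
from collections import deque

import numpy as np


def grid_neighbors(cell, n_rows, n_cols):
    """The (up to) four edge-adjacent cells of a row-major cell index."""
    r, c = divmod(cell, n_cols)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n_rows and 0 <= cc < n_cols:
            yield rr * n_cols + cc


def mean_pairwise_corr(cells, R):
    """Left-hand side of Eq. 2: average r(x_i, x_j) over ordered pairs i != j."""
    idx = np.array(sorted(cells))
    sub = R[np.ix_(idx, idx)]
    k = len(idx)
    return (sub.sum() - np.trace(sub)) / (k * (k - 1))


def is_area(cells, R, tau, n_rows, n_cols):
    cells = set(cells)
    if len(cells) < 2:                                  # condition 1
        return False
    start = next(iter(cells))                           # condition 2 (BFS)
    seen, queue = {start}, deque([start])
    while queue:
        for nb in grid_neighbors(queue.popleft(), n_rows, n_cols):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    if seen != cells:
        return False
    return bool(mean_pairwise_corr(cells, R) > tau)     # condition 3 (Eq. 2)
```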
For the climate network to convey information in the most parsimonious way, the number of identified climate areas should be minimized. To this end, an area is defined as a maximum-cardinality set of cells that are spatially contiguous and whose average pairwise correlation is larger than the threshold τ. In Sec. 5.8 we show that this computational problem is NP-complete, meaning that no polynomial-time algorithm is known for it (and none exists unless P = NP). Consequently, we have designed an algorithm that aims to minimize the number of areas heuristically, based on a so-called “greedy” approach [41]. The algorithm consists of two parts: first, it identifies a set of areas; second, it merges some of those areas as long as the merged areas satisfy the three area conditions above. Pseudocode describing the algorithm is given in Section 3.9, while the actual software is available at http://www.cc.gatech.edu/~dovrolis/ClimateNets/. An example of the area identification process applied to a synthetic grid is illustrated in Fig. 3.
Figure 3: An example of the area identification algorithm. (a) 12-cell synthetic grid. (b) The correlation matrix between cells (given as input). (c) The area expansion process for a given τ = 0.4. Cells shown in red are selected to join the area (denoted by $A_k$). Cells 1, 4, 9 and 12 will not join $A_k$ since they do not satisfy the τ constraint in Eq. 2.
The identification part of the algorithm produces geographically connected areas by always expanding an area through neighboring cells. Additionally, the algorithm attempts to identify the largest (in terms of number of cells) area in each iteration by selecting, in every expansion step, the neighboring cell that has the highest average correlation with the existing cells of that area. The expectation is that this greedy approach allows the area to expand to as many cells as possible, subject to the constraint that the average correlation in the area should be more than τ. It is easy to show that an identified area satisfies the condition given by Eq. 2.
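The identification part can be sketched as follows. This is not the pseudocode of Section 3.9: the seeding rule (the most correlated pair of adjacent unassigned cells) and the stopping rule (stop as soon as the best candidate would bring the average correlation of Eq. 2 down to τ) are hypothetical choices made only to produce a runnable example. The grid conventions are the same row-major, no-wrap assumptions as above.

```python
import numpy as np


def grid_neighbors(cell, n_rows, n_cols):
    """The (up to) four edge-adjacent cells of a row-major cell index."""
    r, c = divmod(cell, n_cols)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n_rows and 0 <= cc < n_cols:
            yield rr * n_cols + cc


def grow_areas(R, tau, n_rows, n_cols):
    """Greedy area identification (hypothetical sketch of Part-1)."""
    unassigned = set(range(n_rows * n_cols))
    areas = []
    while True:
        # Seed: most correlated pair of adjacent unassigned cells.
        seed, seed_r = None, -np.inf
        for i in unassigned:
            for j in grid_neighbors(i, n_rows, n_cols):
                if i < j and j in unassigned and R[i, j] > seed_r:
                    seed, seed_r = (i, j), R[i, j]
        if seed is None or seed_r <= tau:
            break                      # no adjacent pair can start an area
        area = list(seed)
        unassigned -= set(area)
        pair_sum = 2.0 * seed_r        # sum of r over ordered pairs in the area
        while True:
            frontier = {nb for c in area
                        for nb in grid_neighbors(c, n_rows, n_cols)
                        if nb in unassigned}
            if not frontier:
                break
            # Neighbor with the highest average correlation to current members.
            cand = max(frontier, key=lambda c: R[c, area].mean())
            k = len(area)
            new_sum = pair_sum + 2.0 * R[cand, area].sum()
            if new_sum / ((k + 1) * k) <= tau:
                break                  # adding it would violate Eq. 2
            area.append(cand)
            unassigned.discard(cand)
            pair_sum = new_sum
        areas.append(sorted(area))
    return areas
```

Tracking the running sum of pairwise correlations makes each Eq. 2 check cost O(|A|) instead of O(|A|²).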
Within the set of areas V identified by the first part of the algorithm, it is possible to
find some areas that can be merged further, and still satisfy the previous three constraints.
Specifically, we say that two areas $A_i$ and $A_j$ can be merged into a new area $A_k = A_i \cup A_j$ if $A_i$ and $A_j$ have at least one pair of geographically adjacent cells and the average correlation of cells in $A_k$ is greater than τ. The second part of the algorithm, therefore, attempts to merge as many areas as possible (see Section 3.9).
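A possible sketch of the merging part, assuming the same row-major cell numbering. The policy of always merging the adjacent pair whose union has the highest average correlation is a hypothetical choice; the actual order of merges in Section 3.9 may differ. A union of two adjacent connected areas is itself connected and has at least two cells, so only Eq. 2 needs to be rechecked.

```python
import numpy as np


def mean_pairwise_corr(cells, R):
    """Left-hand side of Eq. 2 for a set of cell indices."""
    idx = np.array(sorted(cells))
    sub = R[np.ix_(idx, idx)]
    k = len(idx)
    return (sub.sum() - np.trace(sub)) / (k * (k - 1))


def touches(a, b, n_cols):
    """True if some cell of a is edge-adjacent to some cell of b (row-major ids)."""
    for cell in a:
        c = cell % n_cols
        candidates = [cell - n_cols, cell + n_cols]
        if c > 0:
            candidates.append(cell - 1)
        if c < n_cols - 1:
            candidates.append(cell + 1)
        if any(nb in b for nb in candidates):
            return True
    return False


def merge_areas(areas, R, tau, n_cols):
    """Greedy merging of adjacent areas (hypothetical sketch of Part-2)."""
    areas = [set(a) for a in areas]
    while True:
        best, best_val = None, tau
        for i in range(len(areas)):
            for j in range(i + 1, len(areas)):
                if touches(areas[i], areas[j], n_cols):
                    val = mean_pairwise_corr(areas[i] | areas[j], R)
                    if val > best_val:
                        best, best_val = (i, j), val
        if best is None:
            return [sorted(a) for a in areas]
        i, j = best
        areas[i] |= areas.pop(j)       # j > i, so index i stays valid
```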
Fig. 4 shows the identified areas before merging (i.e., after Part-1 in Section 3.9) and
after merging (i.e., after Part-2 in Section 3.9) for the HadISST reanalysis. Fig. 4c shows the distribution of area sizes (in number of cells) before and after merging. Area merging substantially decreases the number of small areas (in this example, the percentage of areas with fewer than 10 cells drops from 46% to 10%).
The identified areas represent the nodes of the inferred climate network. We refer to this
network as “area-level network” to distinguish it from the underlying cell-level network.
3.3.3 Links between areas
Links (or edges) between areas identify non-local relations and can be considered a proxy for climate teleconnections. To quantify the weight of these links, we first compute for each area $A_k$ the cumulative anomaly $X_k(t)$ of the cells in that area,
$$X_k(t) = \sum_{i \in A_k} x_i(t) \cos(\phi_i) \qquad (3)$$
The anomaly time series of a cell $i$ is weighted by the cosine of the cell's latitude ($\phi_i$) to account for the cell's relative size. As a sum of zero-mean processes, a cumulative anomaly is also zero-mean.
Figure 4: Identified areas in the HadISST 1979-2005 data set (τ = 0.496). (a) The 176 areas identified by Part-1 of the area identification algorithm. (b) The 74 “merged” areas after the execution of Part-2. (c) The CDF of area sizes (in number of cells) before and after the merging process
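Eq. 3 translates into a latitude-weighted sum over each area's columns. A sketch, assuming NumPy, anomalies as a (T, n_cells) array, areas as lists of cell indices, and a hypothetical `lat_deg` array holding each cell's center latitude in degrees:

```python
import numpy as np


def cumulative_anomalies(x, areas, lat_deg):
    """Eq. 3: X_k(t) = sum over i in A_k of x_i(t) cos(phi_i).

    Returns a (T, n_areas) array whose column k is X_k(t).
    """
    x = np.asarray(x, dtype=float)
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    # x[:, a] is (T, |A_k|); the product with the weights sums over cells.
    return np.column_stack([x[:, a] @ w[a] for a in areas])
```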
Fig. 5 quantifies the relation between the size of the areas identified earlier in the HadISST data set, measured as $\sum_{i \in A_k} \cos(\phi_i)$, and the standard deviation of their cumulative anomaly. The relation is almost linear, at least when the largest 3-4 areas are excluded. Near-linear scaling would be expected if all cells had the same size, their anomalies had the same variance $\sigma^2$, and every pair of cells in the same area had the same correlation $\rho > 0$: the standard deviation of the sum over $n$ such cells is $\sigma\sqrt{n + n(n-1)\rho}$, which approaches $\sigma n \sqrt{\rho}$ as $n$ grows. Even though these conditions do not hold in practice, it is interesting that the standard deviation of an area's cumulative anomaly is roughly proportional to its size.²
² When comparing data sets with different spatial resolution, the anomaly of a cell should be normalized by the size of the cell in that resolution.
Figure 5: The relation between area size and standard deviation of the area's cumulative anomaly (R² = 0.88) for the HadISST reanalysis during the 1979-2005 period; τ = 0.496
The strength, or weight, of the link between two areas $A_i$ and $A_j$ is captured by the covariance of the corresponding cumulative anomalies $X_i(t)$ and $X_j(t)$. Specifically, every pair of areas $A_i$ and $A_j$ in the constructed network is connected with a link of weight $w(A_i, A_j)$,
$$w(A_i, A_j) \triangleq w(X_i, X_j) = \mathrm{cov}(X_i, X_j) = s(X_i)\, s(X_j)\, r(X_i, X_j) \qquad (4)$$
where $s(X_i)$ is the standard deviation of the cumulative anomaly $X_i(t)$, while $\mathrm{cov}(X_i, X_j)$ and $r(X_i, X_j)$ are the covariance and correlation, respectively, of the cumulative anomalies $X_i(t)$ and $X_j(t)$ that correspond to areas $A_i$ and $A_j$. Note that the weight of the link between two areas depends not only on their (normalized) correlation $r(X_i, X_j)$, but also on the “power” of the two areas, as captured by the standard deviations of the corresponding cumulative anomalies. Recall also from the previous paragraph that this standard deviation is roughly proportional to the area's size, implying that larger areas tend to have stronger connections. The link between two areas can be positive or negative, depending on the sign of the correlation term. Fig. 6 presents the cumulative distribution function (CDF) of the absolute correlation between the cumulative anomalies of areas for four SST networks. As with the correlations of the cell-level network, there is no clear cutoff³ separating significant correlations from noise. For this reason we prefer not to prune the weaker links between areas. Instead, every pair of areas $A_i$ and $A_j$ is connected through a weighted link and the resulting graph is complete.
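Given the cumulative anomalies, the complete weighted area-level network of Eq. 4 is their covariance matrix. A minimal sketch, assuming NumPy's `np.cov` with its default sample normalization (ddof = 1) and zeroing the diagonal because self-links are not part of the network; the function name is hypothetical:

```python
import numpy as np


def link_weights(X):
    """Eq. 4: w(A_i, A_j) = cov(X_i, X_j) = s(X_i) s(X_j) r(X_i, X_j).

    X: (T, n_areas) cumulative anomalies. Returns the symmetric matrix of
    link weights with a zero diagonal; no link is pruned.
    """
    X = np.asarray(X, dtype=float)
    weights = np.atleast_2d(np.cov(X, rowvar=False))
    np.fill_diagonal(weights, 0.0)
    return weights
```

The test of the identity in Eq. 4 only holds if the standard deviations use the same normalization as the covariance, which is why `ddof=1` matters when comparing.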
Figure 6: CDF of the absolute correlation between area cumulative anomalies for the
HadISST reanalysis during the 1950-1976 and 1979-2005 periods, and for ERSST-V3 and
NCEP during the 1979-2005 period
3.4 Network metrics
We now proceed to define a few network metrics that are used throughout the chapter. A
climate network N is defined by a set V of areas A1 , . . . , A|V | , representing the nodes of
the network, and a set of link weights, given by Eq. 4. Because the network is a complete
weighted graph, basic graph theoretic metrics that do not account for link weights (such
as average degree, average path length, or clustering coefficient) are not relevant in this
context.
A first representation of the network can be obtained through link maps. The link map of an area $A_k$ shows the weight of the links between $A_k$ and every other area in the network. Link maps provide a direct visualization of the correlations, positive and negative, between a given area and others in the system, often related to atmospheric teleconnection patterns. For instance, Fig. 7 shows link maps for the two largest areas identified in the HadISST network in the 1979-2005 period. The first area has a clear correspondence to the El Niño Southern Oscillation (ENSO); indeed, the cumulative anomaly over that area is highly correlated with the most common indices that describe ENSO variability (the correlation reaches 0.94 for the Niño-3.4 index). The links of this “ENSO” area depict known teleconnections and their strength. The second largest area covers most of the tropical Indian Ocean and represents the region that is most responsive to interannual variability in the Pacific. It corresponds, broadly, to the region where significant warming is observed during peak El Niño conditions [32].
³ Imposing a threshold on the actual strength of the link (computed as the covariance between the cumulative anomalies of two areas) would be incorrect. For example, multiplying low correlations by large standard deviations can produce links of significant weight.
Figure 7: Link maps for two areas related to (a) ENSO and (b) the equatorial Indian Ocean
in the HadISST 1979-2005 network (τ =0.496). The color scale represents the weight of
the link between the area shown in black and every other area in this SST network
Another metric is the strength of an area (also known as weighted degree), defined as the sum of the absolute link weights of that area,
$$W(A_i) = \sum_{j=1, j \neq i}^{|V|} |w(A_i, A_j)| = s(X_i) \sum_{j=1, j \neq i}^{|V|} s(X_j)\, |r(X_i, X_j)| \qquad (5)$$
Note that anti-correlations (negative weights) also contribute to an area's strength. Fig. 8 shows, for example, the strength maps for two HadISST networks covering the 1950-1976 and 1979-2005 periods, respectively. Both the geographical extent of areas and their strength display differences in the two time intervals, particularly in the North Pacific sector and in the tropical Atlantic [110, 125].
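Eq. 5 is a row sum of absolute link weights that excludes the diagonal. A short sketch on a link-weight matrix such as the one produced by Eq. 4; the name `area_strengths` is hypothetical:

```python
import numpy as np


def area_strengths(W):
    """Eq. 5: W(A_i) = sum over j != i of |w(A_i, A_j)|.

    W: symmetric link-weight matrix; any diagonal entries are ignored.
    """
    A = np.abs(np.asarray(W, dtype=float))
    return A.sum(axis=1) - np.diag(A)
```

Taking absolute values is what lets anti-correlated links add to, rather than cancel, an area's strength.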
Figure 8: Strength maps for two different time periods using the HadISST data set. (a) 1950-1976 network, strength of ENSO area: 20.1 × 10⁴; (b) 1979-2005 network, strength of ENSO area: 18.8 × 10⁴
It is often useful to “peel” the nodes of a network in successive layers of increasing
network significance. For weighted networks, we can do so through an iterative process
referred to as s-core decomposition [161]. The areas of the network are first ordered in
terms of their strength. In iteration-1 of the algorithm, the area with the minimum strength,
say Wmin , is removed. Then we recompute the (reduced) strength of the remaining areas,
and if there is an area with lower strength than Wmin , it is removed as well. Iteration-1
continues in this manner until there is no area with strength less than Wmin . The areas
removed in this first iteration are placed in the same layer. The algorithm then proceeds
similarly with iteration-2, forming the second layer of areas. The algorithm terminates
when we have removed all areas, say after K iterations. Finally, the K layers are re-labeled as “cores” in inverse order, so that the first-order core consists of the areas removed in the
last iteration (the strongest network layer), while the Kth order core consists of the areas
removed in the first iteration (the weakest layer). Fig. 9 shows the top five cores for two
HadISST networks, covering 1950-1976 and 1979-2005, respectively. Again, changes in
the relative role of areas are apparent in the North Pacific and in the tropical Atlantic.
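The peeling procedure above can be sketched directly. Following the description, each iteration removes the weakest remaining area, with strength W_min, and then keeps removing areas whose reduced strength falls below W_min. Removing all such areas at once is equivalent to removing them one by one, because deleting areas can only lower the strengths of the others. The function name and tie-breaking are illustrative assumptions.

```python
import numpy as np


def s_core_layers(W):
    """s-core decomposition of a weighted network.

    W: symmetric link-weight matrix (diagonal ignored). Returns a list of
    layers; layers[0] is the first-order core (strongest, removed last).
    """
    A = np.abs(np.asarray(W, dtype=float))
    np.fill_diagonal(A, 0.0)
    remaining = set(range(A.shape[0]))
    layers = []

    def strength(i):
        # Reduced strength: only links to areas still in the network count.
        return sum(A[i, j] for j in remaining if j != i)

    while remaining:
        w_min = min(strength(i) for i in remaining)
        first = min(remaining, key=strength)
        remaining.remove(first)
        layer = [first]
        while True:
            weaker = [i for i in remaining if strength(i) < w_min]
            if not weaker:
                break
            for i in weaker:
                remaining.remove(i)
            layer.extend(weaker)
        layers.append(sorted(layer))
    return layers[::-1]
```

In the test network, three strongly linked areas form the first-order core and the weakly attached area forms the outer layer.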
Figure 9: Color maps depicting the top-5 order cores for the (a) HadISST 1950-1976, and
(b) HadISST 1979-2005 networks
Visual network comparisons provide insight, but quantitative metrics that summarize the distance between two networks in a single number are also needed. A challenge is
that the climate networks under comparison may have a different set of areas, and it is not
always possible to associate an area of one network with a unique area of another network.
We rely on two quantitative metrics: the Adjusted Rand Index (ARI), which focuses
on the similarity of two networks in terms of the identified areas, and the Area Strength
Distribution Distance, or simply Distance metric, which considers the magnitude of link
weights and thus area strengths.
The (non-adjusted) Rand Index is a metric that quantifies the similarity of two partitions of the same set of elements into non-overlapping subsets or “clusters” [120]. It is the fraction of element pairs on which the two partitions agree: every pair of elements that belongs to the same cluster in both partitions, or to different clusters in both partitions, counts as an agreement, while every pair that belongs to the same cluster in one partition but to different clusters in the other counts as a disagreement and lowers the index. The metric varies from 0 (complete disagreement between the two partitions) to 1 (complete agreement). A problem with the
Rand Index is that two random partitions would probably give a positive value because
some agreement between the two partitions may result by chance. The Adjusted Rand Index (ARI) [86, 140] ensures that the expected value of ARI in the case of random partitions
is 0, while the maximum value is still 1. We refer the reader to the previous references for
the ARI mathematical formula.
In the context of our method, the common set of elements is the set of grid cells, while
a partition represents how cells are classified into areas (i.e., each area is a cluster of cells).
Cells that do not belong to any area are assigned to an artificial cluster that we create
just for computing the ARI metric. We use the ARI metric to evaluate the similarity of
two networks in terms of the identified areas. This metric, however, does not consider
cell anomalies and cell sizes, so it cannot capture similarities or differences between two networks in terms of link weights and area strengths. Two networks may have some
differences in the number or spatial extent of their areas, but they can still be similar if
those “ambiguously clustered” cells do not have a significant anomaly compared to their
area’s anomaly. Also, two networks can have similar areas but the magnitude of their area
anomalies can differ significantly, causing significant differences in link weights and thus
area strengths. Further, the ARI metric cannot be used to compare data sets with different
resolution because the underlying set of cells in that case would be different between the
two networks.
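The ARI between two area partitions can be computed from the contingency table of the two cell labelings using the standard Hubert-Arabie formula. The sketch below writes the formula out with NumPy to avoid depending on a particular library (scikit-learn's `adjusted_rand_score` computes the same quantity). Cells that belong to no area share one artificial label (-1), as described above. Both function names are hypothetical, and the degenerate case where both labelings are trivial (zero denominator) is not handled.

```python
from math import comb

import numpy as np


def labels_from_areas(areas, n_cells):
    """Cluster label per cell; cells in no area share the artificial label -1."""
    labels = np.full(n_cells, -1)
    for k, area in enumerate(areas):
        labels[list(area)] = k
    return labels


def adjusted_rand_index(a, b):
    """Hubert-Arabie ARI from the contingency table of two labelings."""
    a, b = np.asarray(a), np.asarray(b)
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)
    sum_ij = sum(comb(int(n), 2) for n in table.ravel())
    sum_a = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_b = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

ARI is invariant to how clusters are numbered, so two identical partitions with permuted labels score 1.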
For these reasons, together with the ARI, we rely on a distance metric that is based on the area strength distribution of the two networks. The strength of an area, in effect, summarizes the combined effect of the area's spatial scope (which cells participate in that area) and of the anomaly and size of those cells.
Given two networks N and N′ with V and V′ ≤ V areas, respectively, we first add V − V′ “virtual” areas of zero strength to network N′ so that the two networks have the same number of nodes. Then, we rank the areas of each network in terms of strength, with $A_i$ ($A'_i$) being the i-th highest-strength area in network N (N′). Fig. 10a shows the ranked area strength distributions for the HadISST networks covering the 1950-1976 and 1979-2005 periods. The distance $d_{sd}(N, N')$ quantifies the similarity between two networks in terms of their ranked area strength distributions,
$$d_{sd}(N, N') = \sum_{i=1}^{V} |W(A_i) - W(A'_i)| \qquad (6)$$
To normalize the previous metric, we introduce the relative distance $D_{sd}(N, N')$. Specifically, we construct an ensemble of randomized networks $N_r$ with the same number of areas and link weight distribution as network N, but with random assignment of links to areas. The random variable $d_{sd}(N, N_r)$ represents the distance between N and a random network $N_r$, while $\overline{d_{sd}}(N, N_r)$ denotes the sample average of this distance across 100,000 such random networks. The relative distance $D_{sd}(N, N')$ is then defined as
$$D_{sd}(N, N') = \frac{d_{sd}(N, N')}{\overline{d_{sd}}(N, N_r)} \qquad (7)$$
Note that $D_{sd}(N, N')$ represents an ordered relation, from network N to N′. A relative distance close to 0 implies that N′ is similar to N in terms of the allocation of link weights to areas. As the relative distance approaches 1, N′ may have a link weight distribution similar to that of N, but the two networks differ significantly in the assignment of links to areas. The relative distance can be larger than 1 when the link weight distribution of N′ is significantly different from that of N.
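A sketch of Eqs. 6 and 7, assuming link-weight matrices as input. Randomized networks are generated by permuting the upper-triangular link weights among area pairs, which is our reading of "random assignment of links to areas"; the default number of random networks is smaller than the 100,000 used in the text, purely for speed. All names are hypothetical.

```python
import numpy as np


def strengths(W):
    """Eq. 5 applied to every area of a link-weight matrix."""
    A = np.abs(np.asarray(W, dtype=float))
    return A.sum(axis=1) - np.diag(A)


def d_sd(s1, s2):
    """Eq. 6: L1 distance between ranked strength distributions.

    The shorter vector is padded with zero-strength virtual areas.
    """
    n = max(len(s1), len(s2))
    a = np.sort(np.pad(np.asarray(s1, dtype=float), (0, n - len(s1))))[::-1]
    b = np.sort(np.pad(np.asarray(s2, dtype=float), (0, n - len(s2))))[::-1]
    return np.abs(a - b).sum()


def random_network(W, rng):
    """Same areas and link-weight distribution as W, links shuffled among pairs."""
    n = W.shape[0]
    iu = np.triu_indices(n, k=1)
    Wr = np.zeros_like(W, dtype=float)
    Wr[iu] = rng.permutation(W[iu])
    return Wr + Wr.T


def relative_distance(W, W_prime, n_random=1000, seed=0):
    """Eq. 7: d_sd(N, N') divided by the mean of d_sd(N, N_r)."""
    rng = np.random.default_rng(seed)
    s = strengths(W)
    baseline = np.mean([d_sd(s, strengths(random_network(W, rng)))
                        for _ in range(n_random)])
    return d_sd(s, strengths(W_prime)) / baseline
```

Because the areas are ranked by strength before comparison, Eq. 6 does not require any correspondence between the areas of N and N′.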
Two networks may be similar in terms of the identified areas (high ARI) but with large
distance (high Dsd ) if the strength of at least some areas is significantly different across
the two networks (perhaps due to the magnitude of the underlying cell anomalies). In
principle, two networks could also have similar ranked area strength distributions (low $D_{sd}$) but significant differences in the number or spatial extent of the identified areas.
Consequently, the joint consideration of both metrics allows us to not only evaluate or rank
pairs of networks in terms of their similarity, but also to understand which aspects of those
pairs of networks are similar or different.
Figure 10: (a) Distribution of ranked area strengths for two networks constructed using the
HadISST data set over the periods 1950-1976 and 1979-2005, respectively. (b) Distance
Dsd (N, Nγ ) and ARI(N, Nγ ) between the HadISST 1979-2005 network and networks
constructed after the addition of white Gaussian noise in the same data set
We can also map a distance $D_{sd}(N, N')$ to an amount of white Gaussian noise (WGN) that, if added to the climate field that produced N, would result in a network at equal distance from N. In more detail, let $s^2(x_i)$ be the sample variance of the anomaly time series $x_i(t)$ in the climate field under consideration. We construct a perturbed climate field by adding WGN with variance $\gamma\, s^2(x_i)$ to every $x_i(t)$, where γ is referred to as the noise-to-signal ratio. Then, we construct the corresponding network $N_\gamma$, and $D_{sd}(N, N_\gamma)$ is its distance from N. A given distance $D_{sd}(N, N')$ can be mapped to the noise-to-signal ratio γ for which $D_{sd}(N, N') = D_{sd}(N, N_\gamma)$. Similarly, a given value $\mathrm{ARI}(N, N')$ can be mapped to the noise-to-signal ratio γ such that $\mathrm{ARI}(N, N') = \mathrm{ARI}(N, N_\gamma)$. Fig. 10b shows how γ affects $D_{sd}(N, N_\gamma)$ and $\mathrm{ARI}(N, N_\gamma)$ when the network N corresponds to the HadISST 1979-2005 reanalysis. As a reference point, note that a low noise magnitude, say γ = 0.1, corresponds to distance $D_{sd} \approx 0.12$ and ARI ≈ 0.68.
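The perturbation itself is simple to reproduce. A sketch, assuming NumPy and a (T, n_cells) anomaly array; `add_white_noise` and its `seed` argument are hypothetical names. Each cell receives independent Gaussian noise with variance γ s²(x_i), so the noise-to-signal ratio is the same for every cell regardless of its own variance.

```python
import numpy as np


def add_white_noise(x, gamma, seed=None):
    """Add WGN of variance gamma * s^2(x_i) to every anomaly series x_i(t)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    s = x.std(axis=0, ddof=1)           # sample standard deviation per cell
    return x + rng.normal(size=x.shape) * np.sqrt(gamma) * s
```

The perturbed field would then go through the same network construction to obtain $N_\gamma$.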
Finally, we emphasize that the ARI and Dsd metrics focus on the global scale. Even if
two networks are quite similar according to these two metrics, meaningful differences at
the local scale of individual areas may still exist. The study of regional climate effects may
require an adaptation of these metrics.
3.5 Robustness analysis
Analyzing climate data poses many challenges: measurements provide only partial geographical and temporal coverage, while the collected data are subject to instrumental biases
and errors both random and systematic. Greater uncertainties exist in general circulation
model outputs: climate simulations are dependent on modeling assumptions, complex parameterizations and implementation errors. An important question for any method that
identifies topological properties of climate fields is whether it is robust to small perturbations in the input data, the method parameters, or in the assumptions the method is based on.
If so, the method can provide useful information on the climate system despite uncertainties of various types. In this section, we examine the sensitivity of the inferred networks
to deviations in the input data, the parameter τ, and certain methodological choices. In all cases we quantify sensitivity by computing the $D_{sd}$ and ARI metrics from the original network to each of the perturbed networks.
3.5.1 Robustness to additive white Gaussian noise
As described in Section 3.4, a simple way to perturb the input data is to add white Gaussian
noise to the original climate field time series. The magnitude of the noise is controlled by
the noise-to-signal ratio γ. The distance Dsd and ARI from the original network N to the
“noisy” networks Nγ are shown in Fig. 10b for the HadISST reanalysis over 1979-2005.
To visually illustrate how noise affects the identified areas, and in particular their strength,
Fig. 11 presents strength maps for two values of γ; the area strengths should be compared
with Fig. 8b. Although some differences exist, the ENSO area strength is comparable to
that of the original network, and the hierarchy (in terms of strength) in the three basins is preserved.
Figure 11: Strength maps for two perturbations of the HadISST 1979-2005 data set using white Gaussian noise. (a) γ = 0.05, strength of ENSO area: 18.0 × 10⁴. (b) γ = 0.10, strength of ENSO area: 19.1 × 10⁴
3.5.2 Robustness to the resolution of the input data set
All data sets compared in this chapter have been spatially interpolated to the lowest common resolution. Here we investigate the robustness of the identified network to the resolution of the input data set. To do so, consider the HadISST reanalysis over the 1979-2005
period and compare the network discussed so far, constructed using data interpolated on
a 2° lat × 2.5° lon grid, with two networks based on a lower (4° lat × 4° lon) and a higher (1° lat × 2° lon) resolution realization of the same reanalysis. Fig. 12 shows strength maps
for the two new networks. As we lower the resolution the total number of areas decreases,
and the areas immediately surrounding the ENSO-related area get weaker. Nonetheless,
the hierarchy of area strengths in the three basins is preserved, and differences are small,
as quantified by the distance metric. The distance from the default to the high resolution
network is Dsd (N, N ′ )=0.10 (γ=0.07). The distance from the default to the low resolution
network is Dsd (N, N ′ )=0.11 (γ=0.10). As previously mentioned, the ARI cannot be used
to compare data sets with different spatial resolution.
3.5.3 Robustness to the selection of τ
Recall that the parameter τ is the threshold on the minimum average pair-wise
correlation between cells of the same area. Although we provide a data-set-dependent heuristic for selecting τ (see Section 3.8), it is important to know
whether small deviations in τ have a major effect on the constructed networks.
Considering again the HadISST 1979-2005 reanalysis, Fig. 13 presents the relative distance and ARI from the original network N, constructed using τ=0.496 (corresponding to a
significance level α=0.1%), to networks Nτ constructed using different τ values. We vary
τ by ±10%, in the range 0.45–0.55. This corresponds to a large change, roughly an order
of magnitude, in the underlying significance level α.
Fig. 14 visualizes strength maps for the two extreme values of τ in this range.
While some noticeable differences exist, the overall area structure appears robust to the
choice of τ. Increasing τ raises the required degree of homogeneity within an
area. The resulting network is therefore more fragmented, with more areas of
smaller size and lower strength. Decreasing τ has the opposite effect.
3.5.4 Robustness to the selection of the correlation metric
The input to the network construction process is a matrix of correlation values between
all pairs of cells. So far, we have relied on Pearson's correlation coefficient, which measures
linear dependence between two random variables. Other correlation metrics
could be used instead. To verify that the properties of the resulting network do not depend
strongly on the selected correlation metric, we use here the non-parametric Spearman's
rank correlation coefficient to compute cell-level correlations.
Fig. 15 shows the strength map for the HadISST 1979-2005 network using Spearman's
correlation. Again, while small changes are apparent, the size and shape of the major
areas and their relative strength are unaltered: Dsd(N, N′)=0.08 and ARI(N, N′)=0.76,
where N is the network shown in Fig. 8b. Both values correspond to a noise-to-signal ratio γ=0.05.
We have performed similar robustness tests using precipitation data and obtained comparable results.
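As a minimal sketch, assuming the anomalies are stored as a NumPy array of shape (T, n_cells), the two cell-level correlation matrices can be computed as follows. Spearman's coefficient is Pearson's coefficient applied to the ranks of each time series. The function names are hypothetical, and the ranking shortcut assumes there are no tied values, which is reasonable for continuous anomalies.

```python
import numpy as np

def pearson_matrix(x: np.ndarray) -> np.ndarray:
    """(n_cells, n_cells) Pearson correlations; columns of x are cells."""
    return np.corrcoef(x, rowvar=False)

def spearman_matrix(x: np.ndarray) -> np.ndarray:
    """(n_cells, n_cells) Spearman correlations: Pearson on per-cell ranks.

    ASSUMPTION: no ties within a cell's time series (ties would need average ranks).
    """
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)
```

Because Spearman's coefficient depends only on ranks, any monotonic but nonlinear relation between two cells yields a coefficient of 1, whereas Pearson's coefficient is lower.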
Figure 12: Strength maps for the HadISST 1979-2005 network at three different resolutions. (a) Low resolution network (4° lat × 4° lon), strength of ENSO area: 18.2 × 10^4. (b) Default resolution network (2° lat × 2.5° lon), strength of ENSO area: 18.8 × 10^4. (c) High resolution network (1° lat × 2° lon), strength of ENSO area: 18.2 × 10^4.
Figure 13: (a) Distance Dsd and (b) ARI from the original HadISST 1979-2005 network (marked with an asterisk on the x-axis, τ=0.496) to networks constructed with different values of τ. The black horizontal lines correspond to the distance Dsd(N, Nγ) and ARI(N, Nγ)
Figure 14: Strength maps for the HadISST 1979-2005 network using two values of the parameter τ. The “default” value is τ=0.496, corresponding to α=0.1% (see Section 3.8). (a) τ=0.45, strength of ENSO area: 18.7 × 10^4. (b) τ=0.55, strength of ENSO area: 18.6 × 10^4.
Figure 15: Strength map for the HadISST 1979-2005 network using Spearman's correlation; strength of ENSO area: 18.5 × 10^4.
3.6 Applications
We now apply the proposed method to the climate data sets described in Section 3.2 to
illustrate how network analysis can be used to compare data sets and to validate model representations of major climate areas and their connections. We begin by
constructing networks for three different SST reanalyses and two precipitation data sets.
We then examine the relation between two different climate fields (SST and precipitation),
introducing a technique for regression between networks. Finally, we analyze the network structure
of the SST fields from two models participating in CMIP5.
3.6.1 Comparison of SST networks
Here we investigate the network properties and metrics for three SST reanalyses, focusing
on the 1979-2005 period. Two of them, HadISST and ERSST-V3, use statistical methods
to fill in sparse SST observations. HadISST implements a reduced space optimal interpolation (RSOI) technique, while ERSST-V3 adopts a method based on empirical orthogonal
function (EOF) projections. NCEP/NCAR uses the Global Sea Ice and Sea Surface Temperatures (GISST2.2) data set from the U.K. Meteorological Office until late 1981 and the NCEP
Optimal Interpolation (OI) SST analysis from November 1981 onward. GISST2.2 is
based on EOF reconstructions [87]. The OI SST analysis
technique combines in situ and satellite-derived SST data [123]. To minimize the possibility of artificial trends and the bias introduced by merging different data sets, the GISST data
are modified to include an EOF expansion based on the OI analysis from January 1982 to
December 1993.
In Fig. 16, we quantify the differences between the three reanalyses by showing correlation
maps between the detrended DJF SST anomaly time series for HadISST and ERSST-V3,
HadISST and NCEP, and ERSST-V3 and NCEP. The patterns that emerge in all three correlation maps are similar. Correlations are generally higher than 0.9 in the equatorial Pacific.
This is due to the almost cloud-free sky and to the in situ coverage provided since the mid-1980s,
first by the Tropical Ocean Global Atmosphere (TOGA) program and then by the Tropical
Atmosphere Ocean (TAO)/Triangle Trans-Ocean Buoy Network (TAO/TRITON) program
[166]. Good agreement between the reanalyses is also found in the north-east Pacific, in the
tropical Atlantic, and in the Indian and Pacific Oceans between 10°S and 30°S. Correlations decrease to approximately 0.7 in the equatorial Indian Ocean and around Indonesia,
where cloud coverage limits satellite retrievals. They reach values as small as 0.2-0.3 in the
Labrador Sea, close to the Bering Strait, and south of 40°S, particularly in the Atlantic and
Indian sectors, due to persistent clouds and poor availability of in situ data. North of 60°N
and south of 60°S, inadequately sampled sea ice and intense cloud coverage
reduce the correlations even further, to values that are non-significant almost everywhere.
At those latitudes any comparison between these reanalyses and their resulting networks is
meaningless, because it would not be possible to identify a reference data set.
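The following is a minimal sketch of how point-wise correlation maps like those in Fig. 16 could be computed. It assumes that both reanalyses are on the same grid and stored as (T, n_cells) arrays of DJF anomalies. It also assumes that "detrended" means removing a least-squares linear trend from each cell. Both function names are hypothetical.

```python
import numpy as np

def detrend(x: np.ndarray) -> np.ndarray:
    """Remove a least-squares linear trend (slope + intercept) from each column."""
    t = np.arange(x.shape[0], dtype=float)
    design = np.column_stack([t, np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef

def correlation_map(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-cell Pearson correlation between two (T, n_cells) data sets."""
    a = detrend(a)
    b = detrend(b)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return num / den
```

The result has one value per grid cell and can be plotted directly as a map. Cells where two data sets agree only up to a linear trend still show a correlation close to 1.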
The strength maps constructed using these data sets (Fig. 17) show differences in all basins. They
suggest that the network analysis captures more subtle properties
than correlation maps. Starting with the strongest area, corresponding to ENSO,
we notice that it has a similar shape in HadISST and NCEP, but extends further
west in ERSST-V3. Its strength is about 12% higher in NCEP than in HadISST and about 19% higher than in ERSST-V3. In HadISST, the equatorial Indian Ocean appears as the second strongest area,
followed by areas surrounding the ENSO region in the tropical Pacific and by the tropical
Atlantic. In ERSST-V3 the area comprising the equatorial Indian Ocean has a shape and
size analogous to HadISST, but is 30% weaker. It is closer in strength to the area covering the warm-pool in the western tropical Pacific. The areas comprising the tropical
Atlantic are also slightly weaker than in the other two data sets. HadISST and ERSST-V3 display a similar strength hierarchy, with the Pacific Ocean being the basin with the strongest
(ENSO-like) area, followed by the Indian and finally by the Atlantic Ocean. In NCEP all
tropical areas (except the area corresponding to the ENSO region) have similar strength and
the hierarchy between the Indian and Atlantic Oceans is inverted. Also, the equatorial Indian
Ocean appears subdivided into several small areas.
Figure 16: Pearson correlation maps between the SST anomaly time series in all pairs of the three reanalysis data sets over the 1979-2005 period in boreal winter (DJF). Correlations between (a) HadISST and ERSST-V3; (b) HadISST and NCEP; (c) NCEP and ERSST-V3.
Differences in strength maps are also reflected in the s-core decomposition (Fig. 18) and
in the links between the ENSO-related areas and other areas in the network (Fig. 19). In
HadISST and ERSST-V3, the first order core is located in the tropical and equatorial Pacific
and Indian Ocean, while in NCEP it is limited to the Pacific. As a consequence, the link
between the ENSO-related area and the Indian Ocean is much stronger in the first
two reanalyses than in NCEP. In HadISST, the ENSO-related and Indian Ocean areas are
separated by regions of higher order in the western Pacific, organized in the characteristic
“horse-shoe” pattern. In the other two reanalyses the first order core extends along the
whole equatorial Pacific band and includes the horse-shoe areas. Correspondingly, the
links between the ENSO-like and the western Pacific areas are, in absolute value, weaker
than the link between ENSO and the Indian Ocean in HadISST, but comparable to the corresponding link in ERSST-V3. NCEP shows significantly weaker links overall, but the highest link weights are found
between ENSO and the western Pacific.
Figure 17: Strength maps for networks constructed based on (a) HadISST (ENSO area strength 18.8 × 10^4); (b) ERSST-V3 (ENSO area strength 17.6 × 10^4); (c) NCEP (ENSO area strength 21.0 × 10^4). In all networks the period considered is 1979-2005.
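The s-core decomposition is defined earlier in the chapter. As a hedged illustration, the sketch below implements the standard s-core pruning for weighted graphs, the weighted analogue of k-core decomposition. It repeatedly removes nodes whose strength within the surviving subgraph does not exceed the current level. Two points are assumptions: strength is computed from absolute link weights, and the innermost core is labelled "order 1" to mirror the text's "first order core". All function names are hypothetical.

```python
import numpy as np

def s_core_shells(weights) -> np.ndarray:
    """For each node, return the largest s such that the node belongs to the s-core.

    The s-core is the maximal subgraph in which every node has strength >= s,
    with strength computed only over links to nodes inside the subgraph.
    ASSUMPTION: strength uses absolute link weights, so anti-correlated links
    count as strongly as positively correlated ones.
    """
    W = np.abs(np.asarray(weights, dtype=float))
    np.fill_diagonal(W, 0.0)                    # ignore self-links
    alive = np.ones(W.shape[0], dtype=bool)
    shell = np.zeros(W.shape[0])
    level = 0.0
    while alive.any():
        strength = W[:, alive].sum(axis=1)      # strength within the surviving subgraph
        level = max(level, strength[alive].min())
        drop = alive & (strength <= level)      # nodes that cannot enter a higher core
        shell[drop] = level
        alive &= ~drop
    return shell

def cores_by_order(shell: np.ndarray) -> dict:
    """HYPOTHETICAL labelling: order 1 = innermost core, order 2 adds the next shell, ..."""
    levels = np.unique(shell)[::-1]
    return {k + 1: np.flatnonzero(shell >= lvl) for k, lvl in enumerate(levels)}
```

In a small graph with a tightly linked triangle and one weakly attached node, the triangle forms order 1 and the weak node only joins at order 2.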
Figure 18: Top-5 order cores in (a) HadISST; (b) ERSST-V3; (c) NCEP. The period considered is 1979-2005 in all cases.
To conclude the comparison of different SST reanalyses, we measure the distance and
ARI values from HadISST to the other two networks. The distance from HadISST to
ERSST-V3 is small, Dsd(N, N′)=0.16, mapped to a noise-to-signal ratio γ=0.15. The
strongest areas indeed show a good correspondence in strength and size in the two data sets,
even if the shapes of the ENSO-related areas differ. The distance from HadISST to NCEP,
Dsd(N, N′)=0.29 with γ=0.35, is greater. This is expected from the previous figures:
all areas except the ENSO-related one appear significantly weaker, while the ENSO area
is stronger than in HadISST. NCEP is also penalized because of the differences, compared
to HadISST, in the strength (and size) of areas over the Indian Ocean and in the horse-shoe
pattern. Recall that Dsd compares areas based on their strength ranking, independently of
their geographical location. In this respect, the two strongest areas, represented by ENSO
and the Indian Ocean in HadISST, are replaced by ENSO and the North Pacific extension of
the horse-shoe region in NCEP. The ARI metric, on the other hand, ranks NCEP closer to
HadISST than ERSST-V3 (ARI=0.59 for NCEP and ARI=0.54 for ERSST-V3, mapped to
γ=0.35 and 0.45, respectively). The shapes of the ENSO-related area and of the areas in the
tropical Atlantic and south of 30°S are indeed in better agreement between HadISST and
NCEP, despite their different strengths.
Figure 19: Links between the ENSO-like area, shown in black, and all other areas in the three reanalyses: (a) HadISST, (b) ERSST-V3 and (c) NCEP networks.
The previous discussion illustrates that Dsd and ARI should be considered jointly, as
they provide complementary information about the similarity and differences between two
networks.
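The ARI used throughout this section compares two partitions of the same set of grid cells into areas. The sketch below computes the standard adjusted Rand index from the contingency table of the two labelings. It assumes that every grid cell carries exactly one area label. How cells outside all areas are handled in this chapter is not reproduced here, and the function name is hypothetical.

```python
from math import comb

import numpy as np

def adjusted_rand_index(labels_a, labels_b) -> float:
    """Standard ARI between two labelings of the same grid cells."""
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    if a.shape != b.shape:
        raise ValueError("both partitions must cover the same grid cells")
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    table = np.zeros((ia.max() + 1, ib.max() + 1), dtype=np.int64)
    np.add.at(table, (ia, ib), 1)               # contingency table n_ij
    pairs_both = sum(comb(int(v), 2) for v in table.ravel())
    pairs_a = sum(comb(int(v), 2) for v in table.sum(axis=1))
    pairs_b = sum(comb(int(v), 2) for v in table.sum(axis=0))
    expected = pairs_a * pairs_b / comb(a.size, 2)
    max_index = 0.5 * (pairs_a + pairs_b)
    if max_index == expected:                   # degenerate case: identical trivial partitions
        return 1.0
    return (pairs_both - expected) / (max_index - expected)
```

The index is invariant to how areas are numbered. It also cannot be evaluated between grids of different resolution, because the two label arrays must describe the same cells. In practice, scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same quantity.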
3.6.2 Network changes over time
Network analysis can also be a powerful tool to detect and quantify climate shifts. The
insights that network analysis can offer, compared to more traditional time series analysis
methods, are related to the detection of changes in network metrics that are associated
with specific climate modes of variability, regional or global. Topological changes may
include addition or removal of areas, significant fluctuations in the weight of existing links
(strengthening and weakening of teleconnections), or variations in the relative significance
of different areas, quantified by the area strength distribution. For instance, Tsonis and
co-authors have built a network of four interacting nodes using the major climate indices,
the North Atlantic Oscillation (NAO), ENSO, the North Pacific Oscillation (NPO) and the
Pacific Decadal Oscillation (PDO), and suggested that those climate modes of variability
tend to synchronize with a certain coupling strength [154]. Climate shifts, including the
one recorded in the north Pacific around 1977 [110], could result from changes in such
coupling strength.
Here we compare the climate networks constructed on the HadISST data set over the
periods 1950-1976 and 1979-2005 to illustrate that the proposed methodology may also
provide insights into the detection of climate shifts. Instead of simply comparing different
periods, it is possible to use a sliding window in the network inference process to detect
significant changes or shifts without prior knowledge; we will explore this possibility in
future work.
Strength maps for the two networks were shown in Fig. 8, while the top-5 order cores
were shown in Fig. 9. The links from the ENSO-related area and from the equatorial Indian Ocean during the 1950-1976 period are presented in Fig. 20, and they can be compared
with Fig. 7. Comparing the 1979-2005 period with the earlier one, we note a substantial strength decrease for the area covering the south tropical Atlantic and a significantly
weaker link between this area and ENSO. This suggests an alteration in the Pacific-Atlantic
connection, which has indeed been recently pointed out by [125] and may be linked to
Atlantic warming [102]. Additionally, the sign of the link weight between the ENSO area and the area off the coast of Alaska in the north Pacific changes. This change is
related to the change in sign of the PDO in 1976-1977 [110, 78].
Despite those differences, the distance from the 1979-2005 HadISST network to the
1950-1976 network is smaller than the distance from the former to any of the other reanalyses
investigated earlier: Dsd(N, N′)=0.13 with noise γ=0.10. The ARI, on the other hand, is
0.55 (γ=0.40). The ARI value predominantly reflects the changes in shape and size of the
ENSO-related areas and of the areas over the North Atlantic and North Pacific.
3.6.3 Comparison of precipitation networks
One of the advantages of the proposed methodology is its applicability, without modifications, to any climate variable. As an example, we now focus on precipitation,
chosen because its intermittency gives it statistical characteristics very different from those of SST.
We investigate the network structure of the CPC Merged Analysis of Precipitation (CMAP)
[172] and of the ERA-Interim reanalysis [46]. Both data sets are available from 1979 onward.
CMAP provides gridded, monthly averaged precipitation rates obtained from satellite estimates. ERA-Interim is the output of a state-of-the-art data assimilation system that assimilates a broad set of observations, including satellite data, every 12 hours. As in the case
of SSTs, we present the precipitation networks for boreal winter (December to
February), based on detrended anomalies from 1979 to 2005. Fig. 21 shows the map of area
strengths for both data sets, Fig. 22 presents the top-5 order cores, and Fig. 23 depicts
links from the strongest area in the two networks.
Figure 20: Links for the HadISST network over 1950-1976 from (a) the ENSO-related area and (b) the equatorial Indian Ocean area (in black in the two panels).
The precipitation network is, not surprisingly, characterized by smaller areas than the SST networks. Precipitation time series are highly intermittent, resulting in weaker correlations between grid cells. The areas with the highest strength are concentrated in the
tropics, where deep convection takes place. The strongest area is located in the equatorial Pacific, in correspondence with the center of action of ENSO. In CMAP, this area is
linked with strong negative correlation to the area covering the warm-pool region, and together they form the first order core of this network. The second order core covers
the eastern part of the Indian Ocean and the eastern portion of the South Pacific Convergence
Zone (SPCZ). Both regions are strongly affected by the shift in convection associated with ENSO events. In ERA-Interim, the warm-pool area extends predominantly into
the northern hemisphere, and its strength and size, as well as the weight of its link with
the ENSO-related area, are reduced. Additionally, the Indian Ocean is subdivided into small
areas, all of negligible strength, similar to what was seen for the NCEP SSTs. This indicates that the
atmospheric teleconnection between ENSO and the eastern Indian Ocean that causes a shift
in convective activity over the Indian basin (see e.g. [98, 27]) is not correctly captured by
ERA-Interim. In the s-core decomposition of ERA-Interim, the second order core does not include any
area in the Indian Ocean. It is limited to two areas to the north and to the south of the
ENSO-related one.
Figure 21: Precipitation networks. Area strength map in (a) CMAP (equatorial Pacific area strength 49.4 × 10^4), and (b) ERA-Interim (equatorial Pacific area strength 41.0 × 10^4).
Figure 22: Top-5 order cores in (a) CMAP, and (b) ERA-Interim.
The distance from the CMAP network to the ERA-Interim network is Dsd(N, N′)=0.21,
with γ=0.25, while the ARI value is 0.49, with γ=0.45. These values reflect larger differences than those between the SST networks presented earlier. However, precipitation is known to be
one of the most difficult fields to model, even when assimilating all available data, due to
biases associated with cloud formation and convective parameterization schemes [3].
In particular, Dsd is affected by the significant differences in the strength and size of the area
over the warm-pool and of the area between the ENSO-related area and the warm-pool.
The ARI is affected by the differences in the partitions over the warm-pool and most
of the Atlantic basin.
3.6.4 Regression between networks
So far we have shown applications of network analysis considering one climate variable at
a time. In climate science it is often useful to visualize the relations between two or more
variables to understand, for example, how changes in sea surface temperatures may impact
rainfall. Regression analysis is a simple statistical tool that highlights such relations.
Here we apply a similar approach using climate networks.
Consider two climate networks Nx and Ny, constructed using variables x(t) and y(t),
respectively. The relation between an area of Nx and the areas of Ny can be quantified
based on the cumulative anomaly of each area, using the link weight definition given earlier (see
Eq. 4). Similarly, a link map for an area Ai ∈ Vx can be constructed from the link
weights between Ai and all areas Aj ∈ Vy.
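Eq. 4 is not reproduced in this section. The sketch below assumes, hypothetically, that the weight between two areas is the covariance of their cumulative anomaly series. The cumulative anomaly series of an area is the sum of the anomaly time series of its cells. The two fields must share the same time axis, but they can live on different grids. All names are illustrative.

```python
import numpy as np

def cumulative_anomaly(field: np.ndarray, area_cells) -> np.ndarray:
    """Sum of the anomaly time series of the cells in one area; field is (T, n_cells)."""
    return field[:, list(area_cells)].sum(axis=1)

def cross_link_map(x_field, source_area, y_field, y_areas) -> np.ndarray:
    """Link weights from one area of N_x to every area of N_y.

    ASSUMPTION (hypothetical form of Eq. 4): link weight = sample covariance
    of the two cumulative anomaly series.
    """
    X = cumulative_anomaly(x_field, source_area)
    weights = []
    for cells in y_areas:
        Y = cumulative_anomaly(y_field, cells)
        weights.append(np.cov(X, Y)[0, 1])
    return np.array(weights)
```

Mapping each weight back onto the cells of the corresponding Ny area yields a link map such as the one described next for SST and precipitation.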
Figure 23: Link maps from the strongest area (in black) for the two precipitation data sets: (a) CMAP; (b) ERA-Interim.
For instance, we construct a network linking the area that corresponds to ENSO in the
HadISST reanalysis to the areas of the CMAP precipitation network for the period 1979-2005 in boreal winter. Both networks are dominated by the ENSO area, and we expect
this exercise to portray the ENSO teleconnection patterns. Results are shown in
Fig. 24. The regression of the rainfall network onto the ENSO-related area in the SST
reanalysis visualizes the well-known shift of convective activity from the warm-pool into
the central and eastern equatorial Pacific during El Niño. For positive ENSO episodes,
negative precipitation anomalies concentrate in the warm-pool and extend to the SPCZ
and the eastern Indian Ocean. Weak positive correlations between SST anomalies in the
equatorial Pacific and precipitation are seen over the western Indian Ocean and east Africa,
part of China, the Gulf of Alaska and the north-east USA. This approach is only moderately
useful for reanalysis or observational data, where known indices can be used to perform
regressions without the need to construct a network. Its extension to model outputs,
however, is advantageous compared to traditional methods. It does not require any
ad-hoc index definition, but relies on areas objectively identified by the proposed network
algorithm.
Figure 24: Link maps from the ENSO-like area in the HadISST data set to all areas in the
CMAP data set, considering the 1979-2005 period. Values with magnitude greater than 1 × 10^4 are
saturated.
3.6.5 CMIP5 SST networks
We now compare the HadISST network with networks constructed using SST anomalies
from two coupled models participating in CMIP5. Our goal is to exemplify the information that our methodology can provide when applied to model outputs. We do not aim to
provide an exhaustive evaluation of model performance, which would be beyond the
scope of this chapter. We analyze the SST fields of two members of the CMIP5 historical
ensemble from each of the GISS-E2H and HadCM3 models over the period 1979-2005. Historical
runs aim to reproduce the observed climate from 1850 to 2005, including all forcings. We
show strength maps (Fig. 25), top-5 order cores (Fig. 26), and link maps for the area
related to ENSO (Fig. 27).
In all model integrations the ENSO-like area extends too far west into the warm-pool
region and is too narrow, in agreement with the recent analysis by
[178]. The warm-pool is therefore not represented as an independent area anticorrelated with
the ENSO-like one. In the GISS-E2H model the strength of the ENSO area is underestimated compared to the reanalyses (see Fig. 17a), but the overall size of the area is larger
than observed. Both the extent and strength of the Indian Ocean area around the equator and of the areas forming the horse-shoe pattern are reduced with respect to HadISST.
Links in GISS-E2H are overall weaker than in the reanalysis (see Fig. 19a), and the role of the
Atlantic is slightly overestimated. The high negative correlations between the ENSO
region and the areas forming the horse-shoe pattern are not captured. In HadCM3, on
the other hand, the strength of the ENSO area is comparable to or greater than in the observations. In this model, areas are more numerous and fragmented than in the reanalysis,
and in several cases confined within narrow latitudinal bands. This bias may result from
meridional currents and/or trade winds that are too weak across all latitudes, as suggested by
[179]. HadCM3 also shows erroneously strong links between the modeled ENSO area and
the Southern Ocean, particularly in the Pacific and Indian sectors, as evident in the s-core
decomposition and link maps. The link strengths in HadCM3 are closer to the observed ones,
but some areas in the southern hemisphere unrealistically play a key role.
To conclude this comparison we present the distance from the HadISST reanalysis to
the two models, and the corresponding ARI values. Table 2 summarizes this comparison.
Dsd(N, N′) from HadISST to the two GISS-E2H integrations is 0.29 and 0.37, with γ=0.35
and γ=0.45, respectively. Dsd(N, N′) from HadISST to the two HadCM3 runs is 0.56 and
0.35, with γ=0.70 and γ=0.40. One of the GISS member networks displays a significantly
smaller distance from HadISST than both networks built on the HadCM3 runs. This is because, in all networks considered, the ENSO-like area overpowers all others in terms
of strength. Furthermore, there exist a few other strong areas (areas that are weaker than
the ENSO-related one by less than one order of magnitude). Focusing on the extent of the
areas in the GISS member with smaller Dsd, we observe striking differences relative to the
base HadISST network: the GISS model is unable to reproduce the horse-shoe pattern, and
it splits the tropical Indian Ocean into two areas. However, it reproduces quite well the overall
size of most areas and the strength of the two largest ones in the tropics, despite inverting the
relative strengths of the Indian Ocean and of the south tropical Atlantic. The south tropical
Atlantic area in GISS and the Indian Ocean one in HadISST have comparable size and
strength, and Dsd cannot account for their different locations. The HadCM3 networks, on
the other hand, are too fragmented and are characterized by unrealistically strong areas in
the Southern Ocean. They are penalized by Dsd for not properly capturing the size of the
strongest areas. The ARI values are 0.46 and 0.48 for the two GISS members, and 0.43 and
0.45 for the two HadCM3 integrations. GISS again outperforms HadCM3, owing to its better
representation of the shape of most areas.
Figure 25: Strength maps for two members of the GISS-E2H and HadCM3 “historical” ensembles. (a) GISS-E2H run 1 (ENSO area strength 9.8 × 10^4); (b) GISS-E2H run 2 (ENSO area strength 10.0 × 10^4); (c) HadCM3 run 1 (ENSO area strength 23.3 × 10^4); and (d) HadCM3 run 2 (ENSO area strength 16.9 × 10^4).
Figure 26: Top-5 order cores identified in the SST anomaly networks for (a-b) two GISS-E2H ensemble members and (c-d) two HadCM3 integrations.
Figure 27: Link maps from the ENSO-like area in the (a-b) GISS-E2H and (c-d) HadCM3 models.
As already mentioned, the relative distance and adjusted Rand index metrics are individually
unable to quantify all the differences and similarities between networks. They can, however, be used
successfully together to rank several networks with respect to a common reference. Two
networks are similar if ARI is large and Dsd is small. Given
the analysis above, the first constraint can be translated into ARI ≥ 0.5 and the second into Dsd ≤ 0.25. If either
of these two conditions is not met, an analysis of the other metrics introduced can provide
useful information on the topological differences between the data sets under consideration.

Table 2: Dsd and ARI from HadISST (1979-2005) to the other reanalyses, GISS-E2H and HadCM3, and the corresponding noise-to-signal ratios γ

Data set | Dsd | γ | ARI | γ
HadISST 1950-1976 | 0.13 | 0.10 | 0.55 | 0.40
ERSST-V3 | 0.16 | 0.15 | 0.54 | 0.45
NCEP | 0.29 | 0.35 | 0.59 | 0.35
GISS run 1 | 0.29 | 0.35 | 0.46 | 0.60
GISS run 2 | 0.37 | 0.45 | 0.48 | 0.55
HadCM3 run 1 | 0.56 | 0.70 | 0.43 | 0.70
HadCM3 run 2 | 0.35 | 0.40 | 0.45 | 0.60
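The similarity criterion stated above (ARI ≥ 0.5 and Dsd ≤ 0.25) can be applied mechanically to the values in Table 2. The following is a minimal sketch. The dictionary simply copies the (Dsd, ARI) pairs from the table, and the function name `is_similar` is hypothetical.

```python
# (Dsd, ARI) from the HadISST 1979-2005 network, copied from Table 2
TABLE_2 = {
    "HadISST 1950-1976": (0.13, 0.55),
    "ERSST-V3": (0.16, 0.54),
    "NCEP": (0.29, 0.59),
    "GISS run 1": (0.29, 0.46),
    "GISS run 2": (0.37, 0.48),
    "HadCM3 run 1": (0.56, 0.43),
    "HadCM3 run 2": (0.35, 0.45),
}

def is_similar(dsd: float, ari: float, max_dsd: float = 0.25, min_ari: float = 0.5) -> bool:
    """Both conditions must hold: small strength-based distance AND large ARI."""
    return dsd <= max_dsd and ari >= min_ari

similar = [name for name, (dsd, ari) in TABLE_2.items() if is_similar(dsd, ari)]
print(similar)  # ['HadISST 1950-1976', 'ERSST-V3']
```

Only the HadISST 1950-1976 and ERSST-V3 networks pass both thresholds. NCEP fails only the Dsd condition, and all model runs fail the ARI condition.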
3.7 Discussion and Conclusions
We developed a novel method to analyze climate variables using complex network analysis.
The nodes of the network, or areas, are formed by clusters of grid cells that are highly
homogeneous with respect to the underlying climate variable. These areas can often be mapped onto
well-known patterns of climate variability.
The network inference algorithm relies on a single parameter τ that determines the
degree of homogeneity between cells in an area. The method requires only one parameter
and imposes no link pruning on the underlying cell-level network.
Together, these properties add robustness to the network structure and make the comparison of different networks
more reliable.
The constructed climate networks are complete weighted graphs. As a result, our framework allows for investigating and visualizing the relative strength of node interactions, which can be associated with teleconnection patterns. The inferred networks
are robust to random perturbations (additive noise in the anomaly time series of the
climate variable under investigation), to small changes in the selection of τ, to the choice of
the correlation metric used in the inference algorithm, and to the spatial resolution of the
input field.
In this chapter we constructed networks for a suite of SST and precipitation data sets,
and we analyzed them with a set of weighted metrics such as link maps, area strength and
s-core decomposition. Link maps enable us to visualize all statistical relationships between
areas, while strength maps highlight the relative importance of those relationships, identifying major climate patterns. The s-core decomposition, on the other hand, identifies the
backbone structure of a network, clustering areas into layers of increasing significance. Finally, we quantified the degree of similarity between different networks using the Adjusted
Rand Index and a newly introduced “distance metric” based on the area strength
distribution.
After analyzing three SST reanalyses and two precipitation data sets, we investigated
the network structure of two CMIP5 models, GISS-E2H and HadCM3, focusing on SST
anomalies. We visualized model biases in the underlying network topology and in the
spatial expression of patterns, and we quantified the distance between model outputs and
reanalyses. We found significant differences between model and observational data sets in
the shape and relative strength of areas. The most striking biases common to both models
are the excessive longitudinal extension of the area corresponding to ENSO and the inability to represent the horse-shoe pattern in the western tropical Pacific. Links are generally
weaker than observed in the GISS-E2H model, but the relative strength, shape and size of
the main areas are in reasonable agreement with the reanalyses. The HadCM3 network,
on the other hand, is closer to observations in the absolute strength of its areas. However, its areas are too numerous in the tropics, and unrealistically strong nodes are found in the South
Pacific. In future work, we aim to provide the climate community with a comprehensive comparison of CMIP5
outputs by extending our analysis to a much larger number of
models.
In this work we limited our analysis to linear and zero-lag correlations. The methodology presented, however, could be generalized to include the analysis of nonlinear phenomena and non-instantaneous links, by introducing nonlinear correlation metrics, such as
mutual information or the maximal information coefficient [122], and time-lags. Additionally, the set of metrics proposed can be enhanced to capture more complex relationships in
the underlying network.
3.8 Selection of threshold τ
The threshold τ is the only parameter of the proposed network construction method. It
represents the minimum average pair-wise correlation between cells of the same area, as
shown in Eq. 2. Intuitively, τ controls the minimum degree of homogeneity that the climate field should have within each area. The higher the threshold, the higher the required
homogeneity, and therefore the smaller the identified areas.
Throughout this chapter, we select τ based on the following heuristic. First, we apply
the one-sided t-test for Pearson correlations at level α with T − 2 degrees of freedom
(recall that T is the length of the anomaly time series) to calculate the minimum correlation value rα that is significant at that level [126]. Inverting the t statistic gives rα = tα / √(T − 2 + tα²), where tα is the critical t value. For example, with α=0.1% and T=81
(corresponding to 27 years of monthly DJF SST averages, three per year), we get rα=0.34.
Instead of pruning any correlations r(xi, xj) that are below rα, we estimate the expected
value of only those correlations that are larger than rα:
$$\bar{r}_\alpha \triangleq \mathbb{E}\left[\, r(x_i, x_j) \;\middle|\; r(x_i, x_j) > r_\alpha \,\right] \qquad \text{(A1)}$$
For a set of k randomly chosen cells that have statistically significant correlations (at level
α) between them, r̄α is approximately equal, for large k, to their average pair-wise correlation. A climate area, however, is not a set of randomly chosen cells but a geographically
connected region. We therefore require that the average pair-wise correlation of cells that belong
to the same area be higher than r̄α, i.e.,
$$\tau = \bar{r}_\alpha \qquad \text{(A2)}$$
Note that τ is independent of the size of an area, but it depends both on α and on the
distribution of pair-wise correlations r(xi, xj).
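The two steps of the heuristic can be sketched as follows. The critical correlation comes from inverting the t statistic t = r√(T − 2)/√(1 − r²). `scipy.stats.t.ppf` supplies the one-sided critical value, and τ is then the mean of the correlations that exceed it, as in Eqs. (A1) and (A2). The sketch assumes that the cell-level correlations are given as a symmetric matrix and that each pair of cells is counted once. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import t as student_t

def r_alpha(alpha: float, T: int) -> float:
    """One-sided critical Pearson correlation at level alpha with T - 2 dof."""
    df = T - 2
    t_crit = student_t.ppf(1.0 - alpha, df)
    return float(t_crit / np.sqrt(df + t_crit ** 2))

def select_tau(corr: np.ndarray, alpha: float, T: int) -> float:
    """tau = mean of the pair-wise correlations above r_alpha (Eqs. A1-A2)."""
    r_a = r_alpha(alpha, T)
    iu = np.triu_indices_from(corr, k=1)        # each unordered pair of cells once
    pairs = corr[iu]
    significant = pairs[pairs > r_a]
    if significant.size == 0:
        raise ValueError("no correlation exceeds r_alpha at this significance level")
    return float(significant.mean())
```

With α=0.1% and T=81, `r_alpha` evaluates to approximately 0.34. With α=1% it gives about 0.26.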
3.9 Pseudocode of area identification algorithm
Below we present the pseudocode for the area identification algorithm used in this chapter.
Chapter IV
ENSO IN CMIP5 SIMULATIONS: NETWORK CONNECTIVITY
FROM THE RECENT PAST TO THE TWENTY-THIRD CENTURY
4.1 Introduction
Understanding how major modes of natural variability will respond to gradual mean-state changes associated with anthropogenic warming is crucial to climate science [43]. Coupled general circulation models (CGCMs) are the most powerful and widely used tool to address this problem. There is therefore increasing interest in new approaches that systematically evaluate CGCM performance and sensitivity to increased greenhouse gas (GHG) emissions. Here we present the first extensive application of a new methodology, built upon complex network analysis, to assess how well models reproduce the recent past and how their network topology changes in future projections under varying GHG forcing.
In recent years complex network analysis [7] has been widely applied to the investigation of complex dynamical systems, ranging from the Internet and its evolution [5] to
the human connectome [28]. Many of the complex systems studied by network analysis
are embedded in space [16] while their elements interact with each other forming complex
functional relationships. Climate is another complex system that can be represented as a
spatial network. Climate networks were first introduced by Tsonis and Roebber [153], who
applied ideas from graph theory to study the behavior of global geopotential height fields.
Since then, climate network analysis has contributed to the discovery of new dynamical
transitions and teleconnections in the climate system [94, 154, 173, 77], to the investigation of the monsoon [106, 25] and to the prediction of El Niño episodes [105]. Network
approaches have been used to identify high-energy oceanic flows representing the backbone of the climate system [52], to evaluate the collective behavior of different climate
variables [51] (see also Section 3.6.4) and structural changes as climate evolves through
time [20, 142, 144, 119]. Attempts to investigate causal dependencies have been made in
[84] and [57].
Recently, climate networks have also been employed to evaluate and compare climate
models. A community detection algorithm was used to rank the performance of several
CGCMs [145], and complex network analysis was adopted to evaluate the Statistical Analogue Resampling Scheme (STARS) model against a dynamical model (COSMO-CLM) in
representing the climate of South America [61].
Here we analyze the network properties of model outputs from the Coupled Model Intercomparison Project Phase 5 (CMIP5) spanning the 1956-2100 or 1956-2300 intervals, using the network methodology proposed in Section 3.3 of this thesis. First we identify areas (i.e., geographically connected regions that are homogeneous with respect to the underlying climate variable). These areas are the nodes of the network and roughly correspond to major climate modes. Then we visualize, validate and compare those areas and their links, or connections. Links represent non-local dependencies between different areas. Therefore, in contrast to more commonly used community detection techniques, our method decouples the identification of climate nodes from the connections that those nodes have with each other.
The methodology adopted yields several desirable properties compared to more traditional time series analysis [10, 1, 11, 43, 49, 62, 73]. It allows evaluating model performances at both local and global scales, uncovers relations in the climate system that are not
fully captured by traditional methodologies, explains known climate phenomena in terms
of the underlying network’s structure and metrics, and is not locked into a particular set of
climate indices from the outset. Its scope is similar to empirical orthogonal functions (EOF)
in that it identifies the major modes of climate variability for a given variable. In contrast
with EOFs, however, our method does not impose any orthogonality constraints and does
not mask patterns (or climate modes) of weaker variance. Regional or global changes can
be quantified in terms of the addition or removal of areas, fluctuations in the weight of existing links, or variations in the relative significance of different areas, providing sensitivity information. The proposed framework is fast and scalable, and has been developed to ensure robust comparisons. Furthermore, it allows estimates of model trajectories over time and of intra-ensemble variability. The latter can be objectively compared to contributions from different forcings or mean states.
In this work we focus on global quantities and analyze fifty-year time intervals. The dominant mode of variability at those temporal and spatial scales is the El Niño Southern Oscillation (ENSO). First we assess how CMIP5 models represent the network topology of ENSO and its teleconnections in sea surface temperature and precipitation, comparing them to various reanalyses. Then we focus on model projections and on the stability of ENSO and its links in the near and far future.
4.2 Climate Network Inference
The network inference is a three-step process. First, we construct a “cell-level network”; second, we apply a clustering algorithm to identify the nodes, or areas; third, we compute weighted links between areas to quantify their connections. All networks are inferred from monthly averages of detrended seasonal anomalies of sea surface temperature (SST) and precipitation, but the procedure can be applied to any variable of interest. Trends are calculated with the Theil-Sen estimator [6] to reduce sensitivity to outliers and to at least partially account for ENSO variability [135]. All datasets are interpolated to a minimum common resolution (2° lat × 2.5° lon for SST and 2.5° lat × 2.5° lon for precipitation). Only the band between 60°S and 60°N is considered, because reconstructions, reanalyses and models differ widely at higher latitudes. We focus on boreal winter (December to February, DJF), when ENSO is strongest. Calculations have been repeated for summer (June to August), confirming all major outcomes.
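The preprocessing for a single grid cell can be sketched as follows. This is an illustration under stated assumptions, not the thesis pipeline:

- The monthly series is assumed to start in January.
- The seasonal cycle is removed by subtracting each calendar month's mean.
- A Theil-Sen trend (the median of all pairwise slopes) is then fitted and subtracted.
- Only December, January and February values are kept.

The function names are hypothetical.

```python
from statistics import fmean, median


def theil_sen(t, y):
    """Theil-Sen fit: slope = median of all pairwise slopes, robust to outliers."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i in range(len(t)) for j in range(i + 1, len(t))
              if t[j] != t[i]]
    slope = median(slopes)
    intercept = median([yi - slope * ti for ti, yi in zip(t, y)])
    return slope, intercept


def djf_detrended_anomalies(monthly):
    """Seasonal anomalies of one cell, Theil-Sen detrended, DJF months only.

    Assumes `monthly` starts in January (an assumption of this sketch).
    """
    months = [k % 12 + 1 for k in range(len(monthly))]  # 1 = Jan ... 12 = Dec
    # Climatology: mean value of each calendar month over all years.
    clim = {m: fmean([v for v, mm in zip(monthly, months) if mm == m])
            for m in set(months)}
    anomalies = [v - clim[m] for v, m in zip(monthly, months)]
    steps = list(range(len(anomalies)))
    slope, intercept = theil_sen(steps, anomalies)
    detrended = [a - (slope * k + intercept) for k, a in zip(steps, anomalies)]
    return [d for d, m in zip(detrended, months) if m in (12, 1, 2)]
```

The median-based fit is what makes the trend insensitive to a few extreme values. A single spike in an otherwise linear series leaves the Theil-Sen slope unchanged, whereas it would bias a least-squares fit.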
The cell-level network is constructed by computing the Pearson cross-correlation r(xi(t), xj(t)) between the anomaly time series xi(t), xj(t) for all cell pairs i, j. All pair correlations are retained, and the resulting cell-level network is a complete weighted graph (i.e., a link exists between every pair of grid cells). This differentiates the method from most prior work on climate network analysis, which applies a threshold to prune non-significant correlations [53, 174, 143, 155]. Retaining all correlations also guarantees robust comparisons between networks constructed on different datasets. The cell-level network is the input to the clustering algorithm, which relies on a single parameter τ controlling the homogeneity of areas with respect to the underlying climate variable. Formally, an area Ak is a geographically connected cluster of two or more cells satisfying
$$\frac{\sum_{i, j \in A_k,\; i \neq j} r\big(x_i(t), x_j(t)\big)}{|A_k|\,\big(|A_k| - 1\big)} > \tau \tag{8}$$
where |Ak | denotes the number of cells in the area. τ represents the minimum average
pair-wise correlation between cells of an area at a given significance level α (here α = 1%)
and is determined following the heuristic presented in Section 3.8. τ depends on α and on
the distribution of pair-wise correlations r(xi (t), xj (t)) in any given dataset.
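The following minimal sketch builds the cell-level network and applies the homogeneity test of Eq. 8 using plain Python lists. The helper names `pearson`, `correlation_matrix` and `is_homogeneous` are hypothetical, and the diagonal of the matrix is set to 1 by convention. Because Eq. 8 sums over ordered pairs i ≠ j, the sketch does the same.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """Complete weighted cell-level network: correlations for all cell pairs."""
    n = len(series)
    return [[1.0 if i == j else pearson(series[i], series[j]) for j in range(n)]
            for i in range(n)]


def is_homogeneous(area, corr, tau):
    """Eq. 8: mean correlation over ordered pairs i != j in `area` exceeds tau."""
    k = len(area)
    if k < 2:  # an area needs at least two cells
        return False
    total = sum(corr[i][j] for i, j in permutations(area, 2))
    return total / (k * (k - 1)) > tau
```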
The clustering algorithm also aims to minimize the number of identified areas. This problem is NP-complete, so the algorithm relies on greedy heuristics. It identifies areas iteratively, starting from the pair of geographically adjacent grid cells with the maximal r(xi(t), xj(t)). An area is then expanded by adding the adjacent grid cell that maximizes the average cross-correlation with the cells already in the area. The expansion stops either when Eq. 8 would be violated or when all neighboring grid cells belong to other areas¹. Because the algorithm is greedy, its solution is not guaranteed to be optimal. Areas that share at least one pair of geographically adjacent cells, and whose union satisfies Eq. 8, are therefore merged together. The methodology ensures the robustness of the area-level structure for a wide range of significance levels, as extensively tested (see Section 4.6 and Section 3.5.3).
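The greedy procedure just described can be sketched as below. This is a reconstruction from the prose, not the thesis implementation. It rests on these assumptions:

- The correlation matrix is symmetric.
- An adjacency map `neighbors` gives the geographically adjacent cells of each cell.
- A seed pair must itself satisfy Eq. 8 (i.e., r > τ).
- Ties are broken deterministically.

All function names are hypothetical.

```python
from itertools import combinations, permutations


def mean_pairwise(cells, corr):
    """Left-hand side of Eq. 8 for a candidate set of cells."""
    k = len(cells)
    return sum(corr[i][j] for i, j in permutations(cells, 2)) / (k * (k - 1))


def identify_areas(corr, neighbors, tau):
    """Greedy area identification followed by merging of adjacent areas."""
    unassigned = set(range(len(corr)))
    areas = []
    while True:
        # Seed: the adjacent, unassigned pair with the largest correlation above tau.
        seeds = [(corr[i][j], i, j) for i in unassigned for j in neighbors[i]
                 if j in unassigned and i < j and corr[i][j] > tau]
        if not seeds:
            break
        _, i, j = max(seeds)
        area = [i, j]
        unassigned -= {i, j}
        while True:
            frontier = {c for a in area for c in neighbors[a] if c in unassigned}
            if not frontier:
                break  # all neighbors already belong to other areas
            # Candidate with the largest average correlation to the current area.
            best = max(sorted(frontier),
                       key=lambda c: sum(corr[c][a] for a in area) / len(area))
            if mean_pairwise(area + [best], corr) <= tau:
                break  # adding it would violate Eq. 8
            area.append(best)
            unassigned.discard(best)
        areas.append(area)
    # Greedy results may be suboptimal: merge adjacent areas whose union satisfies Eq. 8.
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(areas)), 2):
            adjacent = any(c in neighbors[x] for x in areas[a] for c in areas[b])
            if adjacent and mean_pairwise(areas[a] + areas[b], corr) > tau:
                areas[a] = areas[a] + areas[b]
                del areas[b]
                merged = True
                break
    return areas
```

Consider a chain of six cells in which cells 0-2 and cells 3-5 form two strongly correlated blocks that are uncorrelated with each other. With τ = 0.5, the sketch returns the two blocks {0, 1, 2} and {3, 4, 5} as areas. Cells left in `unassigned` correspond to the white cells mentioned in the footnote.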
¹ At the end of the area identification, some grid cells may not belong to any area (if they violate the τ criterion for each candidate area). Such grid cells are shown in white in all maps.

Finally, links are computed from the area cumulative anomalies. For a given area Ak, the cumulative anomaly is

$$X_k(t) = \sum_{i \in A_k} x_i(t)\,\cos(\phi_i),$$

with ϕi being the latitude of cell i (the anomaly time series of any given cell i are therefore weighted by the cell size).
The weighted link w(Ak, Am) between two areas Ak and Am is equal to the covariance between the corresponding cumulative anomalies. Links can be positive or negative, and are computed for all pairs of areas to obtain a complete weighted graph. Link maps allow the visualization of the (weighted) connections between any given area and all others in the network. Areas are also characterized by their weighted degree, or strength, defined as the sum of the absolute link weights

$$W(A_k) = \sum_{m = 1,\; m \neq k}^{V} \big|w(A_k, A_m)\big|,$$

where V is the number of inferred areas A1, …, AV. The strongest areas correspond to major modes of climate variability.
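The link and strength definitions translate directly into code. The following sketch makes these assumptions:

- Anomalies are given per cell as equal-length lists.
- Latitudes are in degrees.
- Areas are lists of cell indices.
- Links use the sample covariance (denominator T − 1); the normalization is an assumption, since the text does not state it.

All names are hypothetical.

```python
import math


def cumulative_anomaly(area, anomalies, lat_deg):
    """X_k(t) = sum over cells i in area of x_i(t) * cos(latitude_i)."""
    length = len(anomalies[area[0]])
    weights = {i: math.cos(math.radians(lat_deg[i])) for i in area}
    return [sum(anomalies[i][t] * weights[i] for i in area) for t in range(length)]


def covariance(x, y):
    """Sample covariance (denominator T - 1; the normalization is an assumption)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def link_weights(areas, anomalies, lat_deg):
    """Complete weighted area-level graph: w[k][m] = cov(X_k, X_m)."""
    X = [cumulative_anomaly(a, anomalies, lat_deg) for a in areas]
    return [[covariance(xk, xm) for xm in X] for xk in X]


def strengths(w):
    """W(A_k) = sum over m != k of |w(A_k, A_m)|."""
    return [sum(abs(v) for m, v in enumerate(row) if m != k)
            for k, row in enumerate(w)]
```

Negative links (anti-correlated areas) contribute to strength through their absolute value, as in the definition of W(Ak).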
Similarities and differences between two networks N and N′, each of size n grid cells, are quantified by two metrics: the Adjusted Rand Index (ARI) and a newly defined network distance D. The ARI measures the spatial likeness of the areas in two networks [86, 140]. Any pair of cells that belong to the same area in both N and N′, or to different areas in both networks, contributes positively to the ARI. Conversely, any pair of cells that belong to the same area in one partition but to different areas in the other contributes negatively. The ARI takes values up to 1, with 1 denoting identical partitions. It is corrected for chance, so its expected value for two random partitions is zero (slightly negative values are possible). The ARI, however, does not consider cell anomalies or (actual) cell size.
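For reference, the following is a minimal implementation of the ARI computed from pair counts of the contingency table (the Hubert-Arabie form). The text does not specify how cells that belong to no area are labeled. This sketch leaves that choice to the caller (for instance, a single shared "unassigned" label), which is an assumption. `adjusted_rand_index` is a hypothetical name.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two partitions of the same n cells (one label per cell)."""
    n = len(labels_a)
    # Pair counts: pairs of cells grouped together in both / in each partition.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = same_a * same_b / comb(n, 2)  # chance agreement
    max_index = (same_a + same_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (same_both - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1])` evaluates to 4/7 ≈ 0.571. Two labelings that differ only by a renaming of areas give exactly 1.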
To capture similarities or differences at the network level (i.e. in terms of link weights
and area strengths) we also define a distance D between two networks. For the calculation
of D we assign each grid cell a weight that is equal to the strength of the area the cell
belongs to. The distance D between two networks N and N ′ is then defined as
$$D(N, N') = \frac{\sum_{i=1}^{n} \big|W_N(i) - W_{N'}(i)\big|}{\sum_{i=1}^{n} \big|W_{\hat{N}}(i) - W_{\hat{N}'}(i)\big|} \tag{9}$$
Here n is the number of grid cells, including cells that do not belong to any area. WN(i) is the weight assigned to grid cell i in network N; similarly for WN′(i). The network N̂ is a randomized instance of N in which the cells of N are randomly permuted in the underlying grid, each keeping the weight it was assigned in N. N̂′ is obtained from N′ in the same way. The numerator of D increases whenever a cell belongs to different areas in the two networks or, if the two areas are identical, whenever they have different strengths. The denominator of D is expected to be higher than the numerator because it is very unlikely that the same grid cell of the two randomized networks belongs to the same area. Consider the pathological case in which two networks differ significantly in their areas but all grid cells have roughly the same weight. The distance D will still be high (close to one), because the numerator and denominator will be approximately equal. Note that D is different from the distance metric introduced in Section 3.4. The metric of Eq. 9 considers not only the strength of each area but also its spatial extent, whereas the distance metric of Section 3.4 only considers the area strength distribution.
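Eq. 9 can be sketched as follows, with each cell's weight set to the strength of its area. Two choices here are assumptions not stated in the text:

- Cells outside every area get weight 0.
- The denominator is averaged over several independent random permutations (with a fixed seed) rather than taken from a single randomized instance.

The names are hypothetical.

```python
import random


def cell_weights(labels, strength):
    """Weight of each grid cell = strength of its area (None = no area -> 0.0)."""
    return [0.0 if lab is None else strength[lab] for lab in labels]


def network_distance(w_n, w_m, n_perm=100, seed=0):
    """Eq. 9: observed cell-wise weight difference over that of randomized networks."""
    rng = random.Random(seed)
    numerator = sum(abs(a - b) for a, b in zip(w_n, w_m))
    denominator = 0.0
    for _ in range(n_perm):
        p, q = list(w_n), list(w_m)
        rng.shuffle(p)  # randomized instance of N (cells permuted, weights kept)
        rng.shuffle(q)  # randomized instance of N'
        denominator += sum(abs(a - b) for a, b in zip(p, q))
    denominator /= n_perm
    if denominator == 0.0:
        return float("nan")  # all cells carry the same weight: D is undefined
    return numerator / denominator
```

Identical networks give D = 0. Networks whose strong and weak cells are swapped give D above 1, because their disagreement exceeds what random rearrangement would produce.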
Considering ARI and D jointly offers more information than either metric alone. ARI focuses on the spatial extent of each area (the set of cells that belong to an area) but ignores the area strengths. The distance D depends on both the spatial extent and the strength of each area but does not separate the two. Suppose, for instance, that two pairs of networks both have D ∼ 0 but one pair has a higher ARI. We can then conclude that the networks in that pair are more similar than those in the other pair, mostly because of the spatial extent of their areas.
Finally, given two networks N and N′ and their respective D(N, N′) and ARI(N, N′), both metrics can be mapped to an amount of white Gaussian noise (WGN). Adding that noise to the original climate field produces a network N″ such that D(N, N′) ≈ D(N, N″) and ARI(N, N′) ≈ ARI(N, N″). Specifically, the anomaly time series x(t) of the original climate field can be perturbed by adding WGN with a variance equal to γ times the variance of x(t). γ therefore quantifies the noise-to-signal ratio between N and N′.
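The perturbation used to calibrate γ can be sketched for one anomaly series as follows. Reading "γ-times the variance" as a noise variance of γ·Var[x] is our interpretation, and `add_wgn` is a hypothetical name. Applying the function to every cell and re-running the network inference would give the perturbed network N″.

```python
import random
from statistics import pvariance


def add_wgn(x, gamma, rng):
    """Add white Gaussian noise whose variance is gamma times the variance of x.

    rng: a random.Random instance, so perturbations are reproducible.
    """
    sigma = (gamma * pvariance(x)) ** 0.5
    return [v + rng.gauss(0.0, sigma) for v in x]
```

For example, `add_wgn(x, 0.5, random.Random(1))` perturbs `x` with noise whose variance is about half that of the signal. With γ = 0 the series is returned unchanged.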
Several different approaches have been proposed in the literature to represent the Earth’s
climate as a network. A common element in most of them is that the network nodes are grid
cells and edge pruning is performed to remove non-significant pairwise correlations. Our
methodology differs substantially, and in the following, we contrast it with the two most
relevant climate network methods developed to assess climate model outputs [61, 145]. In
[61] the authors evaluate the performance of two regional models representing the South
American climate. Their method represents the climate network as a binary graph. Nodes
correspond to grid cells, weighted proportionally to their geographical size. Non-significant
links are removed by enforcing a fixed graph density and only positive correlations are considered. In [145] the authors evaluate the performance of an ensemble of CMIP3 models.
The climate network is again represented as a binary graph. In contrast to [61], both positive and negative correlations between nodes are taken into account. Network nodes are
unweighted and non-significant edges between them are removed using a fixed threshold
approach. The climate network is then used as input to a community detection algorithm. A
community is a subset of nodes that are densely interconnected relative to their connections
with the rest of the network. The identified communities are groups of grid cells forming,
possibly disjoint, geographical regions. Model differences are captured using the ARI metric, measuring the spatial similarity between the identified communities in each network.
In summary, in [61] models are evaluated based on their actual network structure, while
in [145] model outputs are evaluated based on their community structure.
Instead, the proposed methodology compares climate models based both on network structure (distance metric) and on the spatial representation (ARI metric) of different climate modes of variability. Furthermore, the combination of the ARI and D metrics also allows the quantification of intra-ensemble variability. Modeling the climate network as a weighted graph enables the evaluation of the magnitude and relative importance of specific teleconnections. By considering both positive and negative link weights, different functional relationships between the elements of the climate system are captured. As in community detection, grid cells are clustered into areas, which reduces the dimensionality of the problem. However, communities may consist of geographically disjoint areas, so they do not explicitly show the teleconnections between these regions. In contrast, we decouple the identification of areas from the connections they have with each other.
4.3 Results
4.3.1 CMIP5 Models and Observational Datasets
The network analysis is performed on realizations from twelve models of the CMIP5 catalog (Table 3). These models were chosen among those with ensembles of at least three members in the historical period and with one or more members continuing to 2100, and possibly to 2300. The continuation runs use the scenario with the highest Representative and Extended Concentration Pathways (RCP8.5 and ECP8.5) relative to preindustrial levels [109]. The projections are forced with emissions such that the radiative forcing induced by GHGs reaches 8.5 W m⁻² in 2100 [124].
This choice of scenario is dictated by the larger availability of modeling centers extending
their integrations to 2300. To evaluate the realism of CMIP5 CGCMs in simulating the
recent past, we consider historical ensembles over the period 1956 - 2005 [149], and we
contrast SST and precipitation model networks with the ones from the Hadley Center SST
reconstruction over the same period (HadISST) [121], and from the European Centre for
Medium-range Weather Forecasts Re-Analysis (ERA40+Interim). ERA40+Interim combines ERA-40 [159], available from 1958, with ERA-Interim [46] after 1979. Furthermore,
networks constructed from the Extended Reconstructed Sea Surface Temperature version 3
(ERSST-V3) [134], surface temperatures provided by National Centers for Environmental
Prediction (NCEP) [93], and two SST realizations of the Simple Ocean Data Assimilation reanalysis (SODA version 2.1.6 available from 1958 to 2005, and 2.2.8, covering the
1956-2005 interval) [31], are compared to HadISST to quantify the range of uncertainties
and spread in the SST observational proxies. For precipitation, networks from the NCEP
reanalysis, the CPC Merged Analysis of Precipitation (CMAP) [172] and ERA-Interim,
the last two available from 1979, are also compared to ERA40+Interim. We verified that
NCEP rainfall networks over the 1958-2005 or 1956-2005 periods are indistinguishable.
Networks are then computed for the model future projections, and for all integrations, past
and future networks are compared to quantify projected changes in climate modes (areas)
and their connections (links). Similarities and differences between HadISST and CMIP5
historical SST networks, ERA and modeled precipitation networks, and between historical
and projected networks for the same model member, are summarized using the Adjusted
Rand Index and the distance D.
Climate networks are constructed using detrended time series of SST and precipitation. One question we wish to answer is whether the uncertainty in the representation of major tropical teleconnections in CMIP5 models results in greater or lesser regional impacts than the uncertainty in temperature and rainfall trends. Therefore, each subsection begins, prior to the network analysis, with a brief comparison of observed and modeled trends for the historical period and a description of future trends during the projected intervals.
4.3.2 The Historical Experiments: 1956-2005
Historical global mean trends in winter are summarized in Table 3 for models and observational proxies. Several models overestimate the observed SST trend over the historical
period, due to their inability to simulate the ’pause’ or ’hiatus’ observed since 1998 [70].
In the majority of integrations, SSTs are characterized by cooling (or lesser warming) south of 50°S, resulting from heat uptake by the deep ocean [97]. They also show the greatest warming over the Atlantic and Indian Oceans between 40°S and 50°S, in agreement with the observational proxies (Fig. 30, left panels). Most models warm above the global mean in the Equatorial Pacific and show negative trend anomalies in the East China Sea and along the coasts of Japan. Conversely, the observational proxies display a cooling trend along the equatorial Pacific, with the most intense cooling in the central North Pacific [99] and in the subpolar gyre of the North Atlantic. Global mean precipitation trends are extremely small for models and uncertain in the reanalyses (Fig. 30, right panels). At a regional level, however, NCEP and ERA (and CMAP over the available period) have slopes much steeper than any CMIP5 output. They also show a complex spatial patchiness that varies greatly with the period considered, indicating large interannual fluctuations not represented in the CGCMs. In the tropics, all models but CSIRO and MIROC5 underestimate the local trends two- to threefold. In the extratropics none of the runs captures the observed variability, underestimating it by a factor of five or more. Seventy percent of the models show an increase in rainfall in the tropical Pacific centered around 5°S, in partial agreement with ERA.
Table 3: List of models analyzed and global mean trends in sea surface temperature and rainfall over 1956-2005 and 2051-2100. The number of ensemble members considered during the historical period (1956-2005) is indicated for each model, followed in parentheses by the number of members with projections to 2100 under the RCP8.5 scenario. X indicates that the model has one member continuing to 2300. Boreal winter (December to February) global mean trends are averaged over all ensemble members (± denotes the maximum deviation between ensemble members).

Model | Ensemble # | SST trend 1956-2005 (°C/year ×10⁻²) | PREC trend 1956-2005 ((mm/day)/year ×10⁻⁴) | SST trend 2051-2100 (°C/year ×10⁻²) | PREC trend 2051-2100 ((mm/day)/year ×10⁻⁴)
BCC-CSM1.1 | 3(1) X | 1.2 ± 0.3 | 9.5 ± 2.7 | 3.2 | 26.7
CanESM2 | 4(4) | 1.2 ± 0.2 | 8.4 ± 2.2 | 4.0 ± 0.9 | 21.0 ± 2.8
CCSM4 | 4(4) X | 1.4 ± 0.1 | 9.6 ± 2.6 | 3.5 ± 0.2 | 23.0 ± 3.4
CNRM-CM5 | 4(4) X | 0.7 ± 0.4 | 1.1 ± 2.4 | 3.5 ± 0.1 | 25.0 ± 3.8
CSIRO-Mk3.6.0 | 4(4) | 1.2 ± 0.1 | -2.0 ± 2.6 | 4.1 ± 0.1 | 34.0 ± 3.7
GFDL CM3 | 4(1) | 0.8 ± 0.2 | 2.2 ± 2.6 | 4.2 | 31.0
GISS-E2-H | 4(4) X | 0.6 ± 0.3 | 2.6 ± 3.7 | 2.4 | 14.0
HadGEM2-ES | 4(4) X | 0.5 ± 0.3 | 1.4 ± 1.6 | 4.0 ± 0.3 | 21.0 ± 2.8
IPSL-CM5a-LR | 4(4) X | 1.4 ± 0.1 | 14.0 ± 1.6 | 4.4 ± 1.3 | 39.0 ± 7.5
MIROC5 | 4(3) | 0.7 ± 0.1 | 2.6 ± 2.1 | 2.9 ± 0.1 | 17.0 ± 1.5
MPI-ESM-LR | 3(3) X | 1.0 ± 0.1 | 11.0 ± 1.0 | 3.3 ± 0.1 | 25.0 ± 3.2
MRI-CGCM3 | 4(1) | 0.6 ± 0.2 | -0.1 ± 3.3 | 3.1 | 31.0
REANALYSIS | | | | |
HadISST (1956-2005) | | 0.7 | | |
ERA (1958-2005) | | | 53.1 | |
NCEP (1958-2005) | | | -3.1 | |
Comparing the observational proxy networks in terms of their global metrics, the NCEP surface temperature reanalysis is further apart from HadISST than any other observational dataset, with γ > 1 (Fig. 31a). This was expected, since we are comparing SST with surface air temperature (masked over the ocean). The NCEP-HadISST separation can therefore be used as a benchmark for the model comparisons. In particular, the NCEP reanalysis overestimates the strength of the areas covering the tropical Indian Ocean and misrepresents the so-called horse-shoe pattern in the Pacific (see strength maps in Section 4.5, Fig. 42).

Figure 30: Trend anomaly maps for boreal winter in the recent past and near future. Anomalies are computed by removing from each grid cell the global mean trend calculated over the months of December to February and indicated in Table 3. + and • indicate agreement in the sign of the trend anomaly slope in more than 90% and 70% of models, respectively. (a) HadISST. (b) ERA40+Interim. (c) Sea surface temperature (SST) averaged across models in the historical period (1956-2005). (d) As in (c) but for rainfall. The units are °C/year for SST and (mm/day)/year for precipitation.
Among the models, seven have at least one realization contained within the uncertainty cloud of the reanalyses. One example is MIROC5 (Fig. 32b), whose network is slightly stronger than, but overall very similar to, the observed one. Of the remaining models, BCC, GISS-E2H and MRI underperform in both metrics. They underestimate the size and strength of most areas (see Fig. 42d for MRI and Fig. 42 for one sample map from each modeling center) and have very weak connectivity between nodes (e.g., Fig. 33d and Fig. 43). Furthermore, in these models the area corresponding to ENSO develops too narrowly around the Equator, extends into the Warm Pool region, and has low strength, which is directly associated with a very low ENSO variance. HadGEM2 shows the same extension of the ENSO node into the west Pacific, but there the strength and connectivity of major areas compare well to observations. GFDL CM3 (Fig. 32c) and IPSL display a strong, broad area in the Southern Ocean (SO) south of 45°S, extending from the Atlantic to the Pacific. BCC, CanESM2 and CNRM display an analogous node, but of lesser strength. In IPSL the SO area has comparable or greater strength than ENSO. The correlation between the SO node and ENSO is zero or moderately positive in IPSL, and generally very high and positive in GFDL CM3 (Fig. 33c and Figs. 42 and 43 in Section 4.5).
The spread in ARI and D for members of the same ensemble is indicative of the model's intrinsic variability. In general, large intra-model differences are noticeable for CGCMs further apart from HadISST and are related to the strength of the areas. D is strongly affected by the connections that the ENSO-related node reproduces. Models with weak ENSO areas (e.g., BCC, GISS-E2H, MRI), or for which ENSO is not the strongest mode of variability (IPSL), are subject to greater spreads in their distance. One member may yield nodes that are too weak while representing their relative strengths and links correctly, and another member may develop implausible relations between areas other than ENSO. Some coupled models reproduce the strength of the ENSO node well, and in them this node dominates the network. These models cluster their members closely together, even more so than different observational proxies.
For precipitation, the two strongest areas observed in the tropical Pacific correspond to ENSO and the Warm Pool (bottom rows in Fig. 32, and Fig. 44). In all reanalyses and CMAP, the node associated with ENSO extends along the equator from about 180°W to the coast of the American continent. The spread in D between different rainfall reanalyses is far larger than for SSTs, with NCEP displaying the least agreement with ERA40+Interim (Fig. 31b). Additionally, the ARI is always smaller than 0.5, indicating profound differences in node shapes and distributions even between datasets representing the observational truth. Precipitation is by nature an intermittent field in space and time. The inferred networks therefore have many more nodes than their SST counterparts, reducing the chances of spatial and strength likeness. The quantification of the noise-to-signal ratio accounts for this inherent difference between SST and rainfall metrics. The comparison of Fig. 31a and 31b reveals that models whose SST networks have very large D and small ARI also perform poorly in representing rainfall. BCC, GISS-E2H, IPSL and MRI underestimate, once more, the strength of most tropical areas (see, for example, Fig. 32h).
Additionally, the strongest node in BCC and IPSL occupies the center of the Pacific Ocean and does not penetrate eastward. In MRI it extends westward from 180°W, reaching New Guinea, and in two of the GISS-E2H members it fills the whole equatorial Pacific. A reliable representation of SST variability, however, does not guarantee a realistic simulation of the precipitation distribution and its interannual modulation. CSIRO and MPI, in particular, are penalized in the metrics because their rainfall ENSO-related area is shifted westward, over the center of the Pacific basin. Furthermore, CSIRO underestimates the size of major nodes. CNRM outperforms all other models (and partially NCEP) with both the smallest D and the largest ARI, followed by MIROC5 (Fig. 32f). Both models reproduce well the patterns associated with ENSO and the Warm Pool in the equatorial Pacific, in terms of shape and even more so of strength. They are also capable of simulating major connections between nodes (Fig. 45). In all other CGCMs the area corresponding to the Warm Pool is either absent, as in HadGEM2, or shifted westward into the Indian Ocean, as in MPI. Finally, different ensemble members appear clustered together more tightly than the reanalyses and CMAP, independently of their ability to represent the observations. As a result, the wide range of strengths found in SSTs for models with a weak ENSO node is not mirrored in precipitation.
4.3.3 The RCP8.5 Experiments: 2051-2100
Near-future trends are projected to be similar in pattern to, but stronger in amplitude than, those found during the 1956-2005 period in both SST and rainfall (Fig. 34). Globally, CGCMs warm by 3.5 × 10⁻² °C/year on average and get wetter (Table 3). They largely agree on the areas subject to above-average warming (North Pacific subpolar gyre, equatorial Pacific, Arabian Sea, the band between 40°S and 50°S) and to cooling (south of 50°S, the eastern side of the South Pacific gyre, the eastern side of the North Atlantic). They also agree on the areas subject to more intense rainfall (north equatorial Indian Ocean, south equatorial Pacific around 5°S, North Pacific gyre) and to weaker precipitation (regions to the north and south of the Pacific interconvergence zone).

Figure 31: Metric D versus ARI for climate networks during the historical period 1956-2005. (a) Sea surface temperature; reference network HadISST. (b) Precipitation; reference network ERA40+Interim. Three levels of noise-to-signal ratio γ are also indicated.
Figure 35 summarizes how SST and precipitation networks over 2051-2100 differ from their historical counterparts. Focusing on SST, all models with more than one integration available (i.e., all but BCC, MRI and GFDL CM3) have at least one member whose projected areas in the 21st century closely resemble those found in the historical period. An example from the CanESM2 model is shown in the left panels of Fig. 36 (see also Fig. 46 for a sample map for each modeling center). For those projections, D and ARI from the corresponding 20th-century realization are contained within the spread of the reanalyses (i.e., D ≤ 10⁻⁴ and ARI ≥ 0.5). The changes in topological properties and connectivity (Fig. 47) are therefore insignificant. The response to increased GHGs is simply the superposition of the trends, with their regional patterns, onto the historical modes of variability. Among the remaining models, GFDL CM3 displays a significant change in spatial likeness due to the disappearance of the Southern Ocean node. In eight out of twelve models, the members that differ in D show the strength of the ENSO area and its connectivity decreasing by a third or more of the historical value. The member of CanESM2 shown to the right in Fig. 36 is one example. The same eight models show a more prominent tendency for eastward propagation of positive ENSO events, associated with a weakening of the equatorial upper ocean currents, as noted by [129]. The exceptions are MIROC5, MPI and MRI, where the ENSO node strengthens, and IPSL, where the ENSO area weakens only slightly and the Southern Ocean node becomes stronger and dominant. In those four models, [129] found moderate or no changes in propagation asymmetry, but the MIROC5 version analyzed there differs from ours.

Figure 32: Strength maps of sea surface temperature for HadISST and three sample models (top rows), and of precipitation for ERA40+Interim and the same three models (bottom rows), during the historical period 1956-2005. Models shown: MIROC5, GFDL CM3 and MRI. For clarity, the strength of the ENSO-related area is saturated when it exceeds the colorscale. Its value is indicated at the top of each panel, together with D and ARI from HadISST or ERA40+Interim for each of the model networks.

Figure 33: Sea surface temperature link maps from the ENSO-related area (in black) for HadISST and the three sample models during the historical period 1956-2005. Models shown: MIROC5, GFDL CM3 and MRI.
In both ARI and D, precipitation networks for the RCP8.5 scenario differ from their historical counterparts no more than the reanalyses and CMAP differ from each other over the historical period. Only MRI and one member of GISS-E2H stand out, due to a large increase in the strength and connectivity of the node associated with the ENSO anomalies in the equatorial Pacific (Figs. 48 and 49). In MRI the ENSO-related area is five times stronger than in the historical period, pointing to a considerable sensitivity of the model's convective scheme to SST changes. In the GISS-E2H member it is almost three times stronger, reaching a value close to the reanalyses over the historical period.
80
Figure 34: Trend anomaly maps for boreal winter in the second half of the 21st century.
Anomalies are computed by subtracting from each grid cell the global mean trend calculated over the months
of December to February and indicated in Table 3. + and • indicate sign agreement of the trend anomaly slope
in more than 90% and 70% of the models, respectively.
(a) SST averaged across models over 2051-2100. (b) As in (a) but for rainfall. The units
are °C/year for SST and (mm/day)/year for precipitation
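The trend-anomaly construction and the sign-agreement markers described in the caption can be sketched as follows. This is a minimal illustration, not the code used to produce the figure. It assumes that per-cell trend slopes are already available, uses cos(latitude) as a stand-in for grid-cell area when forming the global mean, and measures agreement against the sign of the multi-model mean anomaly. All function names are hypothetical.

```python
import numpy as np

def trend_anomaly(cell_slopes, lat):
    """Remove the area-weighted global-mean trend from each grid cell.

    cell_slopes: (n_lat, n_lon) linear-trend slopes, one per grid cell.
    lat: (n_lat,) latitudes in degrees; cos(lat) weighting stands in for
    grid-cell area (an assumption of this sketch).
    """
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(cell_slopes)
    global_mean = np.sum(weights * cell_slopes) / np.sum(weights)
    return cell_slopes - global_mean

def sign_agreement(anomalies):
    """Fraction of models whose anomaly sign matches the multi-model mean sign.

    anomalies: (n_models, n_lat, n_lon) trend anomalies, one map per model.
    """
    mean_sign = np.sign(anomalies.mean(axis=0))
    return (np.sign(anomalies) == mean_sign).mean(axis=0)

def agreement_markers(fraction):
    """'+' above 90% agreement, '•' above 70%, '' otherwise."""
    return np.where(fraction > 0.9, "+", np.where(fraction > 0.7, "•", ""))
```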
4.3.4 The ECP8.5 Experiments: 2101-2300
For models simulating the climate system evolution under the highest of the Extended Concentration Pathways (ECP8.5) [109], the network analysis is extended to 2300. According
to the ECP, the aggregated GHG emissions rise until 2100, remain constant until 2150,
drop linearly to current levels by 2250 and continue as such to 2300. Correspondingly,
the warming trend decreases with time, especially in tropical regions (Figure 37 and Table
4). Seven of the twelve models have one member continuing to 2300. The networks are
constructed on four consecutive fifty-year windows.
Figure 38 presents D and ARI for SST and precipitation, again evaluated against their
historical counterparts. For clarity, the distance for the corresponding ensemble member
during 2051-2100 is repeated. The SST networks for BCC, CCSM4, CNRM, GISS-E2H,
and HadGEM2 depart significantly from the historical period. They are characterized
by increasingly greater distances, exceeding the γ = 1.5 level by 2150 or 2200. The large distances
are due to a decrease in the strength of the ENSO-related area and its links to half or a quarter
of their original value (Section 4.5, Fig. 50 and 51). None of these models recovers ENSO
and its teleconnections once emissions are reduced. In fact, the ENSO area in CCSM4 first expands west into the Warm Pool region while retaining its strength and major links (2051-2150), and then weakens dramatically and suddenly after 2150 (Fig. 39a,c). In HadGEM2 it loses its strength after 2200, and in GISS-E2H it is no longer the dominant mode of variability past 2250. IPSL follows a different trajectory, with a reduction in strength in the tropics that culminates in 2200 and is partially recovered by 2300. Throughout the integration, IPSL produces a network with a strong Southern Ocean (SO) area, which at times (from 2050 to 2200) is stronger than the ENSO node. Finally, MPI responds to the warming in the 21st century by strengthening the ENSO area and its links, particularly over the Indian Ocean and the tropical Atlantic. In the following periods it oscillates between a network stronger than, or comparable to, its historical counterpart. After 2200, the differences between its historical and projected networks are negligible in strength (less than or equal to the differences between observational proxies) and minor in area likeness (Fig. 39b,d), with the ENSO node no longer extending into the Warm Pool. The differences in the evolution of the strength of the ENSO area are reflected in its connectivity: links from the ENSO node are dramatically reduced in most models, while they remain comparable to the recent past in MPI (Fig. 40a,b).
Figure 35: Metric D versus ARI for climate model networks during the period 2051-2100.
(a) Sea surface temperature. (b) Precipitation. All networks are referenced to the
corresponding integration over the historical period. Three levels of noise-to-signal ratios
γ are also indicated. D and ARI between HadISST and other sea surface temperature
proxies, and between ERA40+Interim and other precipitation reanalyses, are repeated to provide
context
Figure 36: Sea surface temperature strength maps for two members of the CanESM2
model in the historical period (1956-2005) on top, and in the 21st century (2051-2100) at
the bottom. For clarity, the strength of the ENSO-related area is saturated when exceeding
the colorscale and indicated in each panel. In the future projections, D and ARI from the
corresponding historical member are also specified
The time series of the cumulative anomaly over an area quantifies the evolution of its variance, and hence of its strength. For ENSO, the DJF variances in the historical period and during 2251-2300 are shown in Fig. 41, with HadISST plotted as reference. All models but MPI display a systematic, gradual reduction in mean variance, ranging from -33% in GISS-E2H to -75% in CCSM4 by 2300. Large changes in ENSO variance (±50%) have also been found in millennial unforced simulations [48, 37], but without a preferred sign, while fossil corals suggest that an ENSO weaker than today's dominated the last 10,000 years [37, 30]. The ENSO variance in the historical period varies with the twenty-year window used for the calculation, as indicated in Fig. 41. This variability is twice as strong as observed in MPI and about half as strong as observed in BCC and GISS-E2H. It is consistent with the reanalysis in the remaining four models.
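The following is a rough sketch of how such variance statistics can be obtained. It sums the anomalies of the cells in an area into a cumulative anomaly, computes the variance in every 20-year window, and expresses a change as a percentage (a drop from 4 to 1 is -75%). Three choices here are assumptions: the unweighted sum, taking the mean of the windowed variances as the "mean variance", and using their min-max range as the error bar. The function names are hypothetical.

```python
import numpy as np

def cumulative_anomaly(anomalies, area_cells):
    """Sum the anomaly time series of the cells that form an area.

    anomalies: (n_years, n_cells) seasonal (e.g. DJF) anomalies.
    area_cells: indices of the cells in the area (unweighted sum assumed).
    """
    return anomalies[:, area_cells].sum(axis=1)

def sliding_variances(series, window=20):
    """Variance of the series in every window of `window` consecutive years."""
    series = np.asarray(series, dtype=float)
    return np.array([series[s:s + window].var()
                     for s in range(len(series) - window + 1)])

def variance_summary(series, window=20):
    """Mean windowed variance plus its (min, max) range as an error bar."""
    v = sliding_variances(series, window)
    return v.mean(), (v.min(), v.max())

def percent_change(reference, projected):
    """Relative change of `projected` with respect to `reference`, in percent."""
    return 100.0 * (projected - reference) / reference
```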
In the case of precipitation, the networks for BCC, CNRM, and HadGEM2 are unaltered in the projections, except for a mild weakening of most tropical areas in the first two models (Fig. 52). GISS-E2H, after an initial strengthening, returns to conditions close to the historical period by 2300. CCSM4 exhibits a fivefold decrease in the strength of the ENSO area over two hundred years, while the nodes covering the South Pacific Convergence Zone and the southern equatorial Indian Ocean become stronger (Fig. 39e,g). Those areas eventually lose their connectivity with ENSO and evolve independently of it (Fig. 53). In IPSL, the strength of the ENSO node fluctuates, decreasing slowly at first and regaining power in the last 50 years. The area also shifts eastward, occupying first the western and central Pacific, then the central portion, and finally developing to the east of 180°W after 2250. Finally, MPI strengthens most nodes almost threefold in the 21st century and shifts the strongest one eastward. It then maintains the new strengths and intensifies the links between ENSO and all major areas, from the Warm Pool to the Indian Ocean and the North and South Pacific (Fig. 39f,h and Fig. 40d). By 2300, the rainfall network is much stronger and more complex than during the historical period, even though the SST network resembles the 20th-century one.
Table 4: Projected global mean trends in sea surface temperature and rainfall from 2101 to
2300. Trends are calculated over consecutive 50-year intervals for the models with
one member extending to 2300 and for boreal winter (December to February).
SST trends are in °C/year ×10⁻²; precipitation trends, in brackets, are in (mm/day)/year ×10⁻⁴

Model           2101-2150    2151-2200    2201-2250    2251-2300
BCC-CSM1.1      2.6 [21.0]   2.5 [19.0]   1.3 [9.1]    0.7 [5.6]
CCSM4           3.0 [16.0]   2.5 [19.0]   1.4 [14.0]   0.8 [8.6]
CNRM-CM5        3.5 [23.0]   2.9 [20.0]   1.9 [12.0]   0.7 [9.9]
GISS-E2-H       1.6 [8.8]    1.1 [7.2]    0.7 [5.6]    0.4 [3.4]
HadGEM2-ES      3.8 [17.0]   3.1 [16.0]   2.0 [9.0]    0.4 [1.6]
IPSL-CM5A-LR    3.8 [24.0]   3.3 [27.0]   2.6 [20.0]   1.3 [13.0]
MPI-ESM-LR      3.6 [27.0]   2.8 [16.0]   1.5 [8.3]    0.8 [11.0]
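The DJF trends in Table 4 rest on two simple operations. The first is forming boreal-winter means, where December is paired with the following January and February. The second is fitting a least-squares line within each 50-year window. A minimal sketch follows. It assumes a global mean monthly series is already available, and it also assumes equal weights for the three months and a winter label given by the January year; none of these details is stated in the text. The function names are hypothetical.

```python
import numpy as np

def djf_means(monthly, start_year):
    """DJF means from a monthly series that starts in January of start_year.

    December of year y is grouped with January and February of year y + 1,
    and each winter is labeled by its January year, so the incomplete first
    winter is dropped. Months are weighted equally (an assumption).
    """
    monthly = np.asarray(monthly, dtype=float)
    n_years = len(monthly) // 12
    years, means = [], []
    for y in range(1, n_years):
        dec = monthly[12 * (y - 1) + 11]
        jan, feb = monthly[12 * y], monthly[12 * y + 1]
        years.append(start_year + y)
        means.append((dec + jan + feb) / 3.0)
    return np.array(years), np.array(means)

def window_trends(years, values, windows):
    """Least-squares linear slope (units per year) within each window."""
    trends = {}
    for first, last in windows:
        sel = (years >= first) & (years <= last)
        trends[(first, last)] = np.polyfit(years[sel], values[sel], 1)[0]
    return trends

# The four consecutive 50-year windows of Table 4.
WINDOWS = [(2101, 2150), (2151, 2200), (2201, 2250), (2251, 2300)]
```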
Figure 37: Trend anomaly maps for boreal winter in the 22nd and 23rd centuries.
Anomalies are computed by subtracting from each grid cell the global mean trend calculated over the months
of December to February and indicated in Table 4. + and • indicate sign agreement of the trend anomaly slope
in more than 90% and 70% of the models, respectively.
(a) Sea surface temperature (SST) averaged across models over 2101-2150. (b) Rainfall
averaged across models over 2101-2150. (c) As in (a) but for 2151-2200. (d) As in (b) but
for 2151-2200. (e) As in (a) but for 2201-2250. (f) As in (b) but for 2201-2250. (g) As in
(a) but for 2251-2300. (h) As in (b) but for 2251-2300. The units are °C/year for SST and
(mm/day)/year for precipitation
Figure 38: Metric D versus ARI for seven climate model networks from 2051 to 2300
over five consecutive 50-year periods, from 1 to 5. (a) Sea surface temperature. (b)
Precipitation. All networks are referenced to the corresponding integration over the
historical period. Three levels of noise-to-signal ratios γ are also indicated
Figure 39: Sea surface temperature (a-d) and precipitation (e-h) strength maps for two
models (left column CCSM4, right column MPI) in the historical period (1956-2005) and
in the future (2251-2300). For each variable the first row corresponds to the historical
experiments. For clarity, the strength of the ENSO-related area is saturated when
exceeding the colorscale and indicated at the top of each panel. D and ARI metrics of the
future projections from the corresponding historical member are also included
Figure 40: Link maps for sea surface temperature (a-b) and precipitation (c-d) from the
ENSO-related area in black for two models for which the ENSO projected strength
evolves in opposite ways. CCSM4 is shown on the left column and MPI on the right.
Maps are calculated over the 2251-2300 period
Figure 41: Variance of the cumulative anomalies of the ENSO area in DJF in the models
and HadISST over 1956-2005 (red), and in the models over 2251-2300 (blue). For
HadISST the time series is highly correlated (coefficient 0.94) with the Niño3.4 index,
defined as the average of SST anomalies from 5°S to 5°N and from 170°W to 120°W.
Error bars around the mean variance over 50 years are determined using a 20-year sliding
window, and provide a measure of the decadal modulation of ENSO in the models over
the periods considered.
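For reference, the Niño3.4 index mentioned in the caption can be computed as a box average. This sketch assumes a regular latitude-longitude grid with longitudes in degrees east on a 0-360 range, so that 170°W-120°W becomes 190°E-240°E. It also assumes cos(latitude) weights. The function names are hypothetical.

```python
import numpy as np

def nino34(sst_anom, lat, lon):
    """Nino3.4 index: area-weighted mean SST anomaly over 5S-5N, 170W-120W.

    sst_anom: (T, n_lat, n_lon); lat in degrees north; lon in degrees east
    on a 0-360 grid, where 170W-120W corresponds to 190E-240E.
    cos(lat) area weighting is an assumption of this sketch.
    """
    lat_sel = (lat >= -5) & (lat <= 5)
    lon_sel = (lon >= 190) & (lon <= 240)
    box = sst_anom[:, lat_sel][:, :, lon_sel]
    w = np.cos(np.deg2rad(lat[lat_sel]))[:, None] * np.ones(lon_sel.sum())
    return (box * w).sum(axis=(1, 2)) / w.sum()

def correlation(a, b):
    """Pearson correlation of two time series, e.g. an index and an area signal."""
    return np.corrcoef(a, b)[0, 1]
```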
4.4 Discussion
In this work we have assessed the stability of the SST and precipitation networks for
twelve model ensembles in the CMIP5 catalog using a novel framework based on complex
network analysis. This fast, scalable and robust method offers considerable advantages
over more traditional approaches (e.g., predefined climate indices or EOFs) for comparing
climate fields. The areas identified reduce the dimensionality of the climate
field and provide a compact and spatially embedded representation of the major modes of
climate variability. Their interdependencies are quantified using weighted links, which enables
us not only to detect their existence but also to estimate their magnitude. With two metrics,
the Adjusted Rand Index and a network distance metric, the output of climate models can be
compactly validated against observations, intra-ensemble variability can be assessed, and
networks obtained from model outputs under different forcing conditions can be contrasted.
The method is general and applies to any spatio-temporal data analysis whose goal is to
discover unknown functional components and their interdependencies.
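The Adjusted Rand Index used above compares two partitions of the same grid cells and corrects the raw pair-counting agreement for chance. It equals 1 for identical partitions (up to relabeling) and is close to 0 for unrelated ones. The following stdlib-only sketch implements the standard contingency-table formula. It assumes each cell carries exactly one area label. Cells outside all areas would therefore need a dedicated label of their own, and that choice affects the score.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same grid cells.

    Each entry is the area label of one cell; every cell must carry exactly
    one label (e.g. a shared "no area" label for unassigned cells).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two cells are needed")
    # Pairs of cells grouped together in both labelings, in each, and by chance.
    sum_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both put all cells in one area
        return 1.0
    return (sum_joint - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` gives 4/7, because splitting one area in two is penalized only partially.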
An important distinction between earlier network-based approaches and our method is
that we construct networks that are complete and weighted graphs between homogeneous
spatial areas. The clustering of grid cells into areas, the lack of edge pruning, and
the way in which we calculate the weights of the links between areas make the proposed
network inference method more robust with respect to the underlying threshold than
approaches based on (typically pruned) cell-level networks (as shown in Section
4.6).
The CMIP5 models have been validated against reanalyses over the second half of the
20th century, and compared for their projected responses under high GHG concentrations.
We focused on global quantities and analyzed fifty-year time intervals. The dominant mode
of variability at those temporal and spatial scales is ENSO, which induces the most severe global
impacts on surface temperatures and precipitation, among other variables. Despite decades
of research, ENSO sensitivity to changes in GHG concentrations remains undetermined in
the latest generation of climate models [18, 39, 129].
The results of our analysis can be summarized as follows:
• Within the CMIP5 inventory, several models closely reproduce the observed SST network over the historical period (1956-2005), providing an accurate representation of
major modes of climate variability and their links, despite biases in the climatologies.
The spread in ARI and D between SST networks from members of the same ensemble is broadly consistent with the spread between different observational datasets
or reanalyses. Precipitation networks, unsurprisingly, indicate that spatial likeness
and strength are still challenging for modelers. However, the limited agreement between reanalysis products and the evaluation of the noise-to-signal ratio suggest
that the spatial and temporal intermittency of precipitation intrinsically limits the reproducibility of its topology. For rainfall, the intra-model spread in network metrics
is generally very small; additionally, the slope and patchiness of regional trends are decidedly underestimated by models. Together, these outcomes suggest that CGCMs
cannot yet capture the observed natural variability of rainfall. Models characterized
by large D and small ARI in their SST fields are also inaccurate in their representation
of precipitation, but the model performing closest to the reanalysis differs between
the two fields.
• Changes in the network properties between the second half of the 20th and 21st
centuries are generally modest and contained within the spread between different
observational proxies in the historical period, despite substantial trends. This is especially true for the models that accurately reproduce the recent past. For those
models, uncertainties are greater in the projected trends than in the response of their
modes of variability. Differences are slightly more probable in strength than in the
spatial distribution of areas. Changes in distance D greater than 30% around the
historical value signal the model's tendency towards strengthening or weakening of
major climate modes, and of ENSO, its variance, and its connectivity, even when
limited to one ensemble member. Eight of the twelve models analyzed display substantially weaker tropical areas and connections in one or more members, implying
a decrease in ENSO strength and in potential predictability at seasonal and longer
scales in the future [47]. The weakening of the links from the ENSO area for the
majority of models analyzed is opposite to the conclusion presented in [29], which was obtained by characterizing El Niño activity using equatorial indices. Only two models, MPI
and MIROC5, display a clear trend towards intensifying the strength of the ENSO
area and its links.
• After 2100, models forced by the concentration pathway of the scenario with the
highest greenhouse gas concentrations in CMIP5 reveal discernible changes in the
strength of all major areas. Five out of seven follow an irreversible trajectory towards
dramatically reducing the strength and size of the ENSO node and weakening all ENSO links over the 23rd century. This behavior is mirrored, to a lesser extent, in precipitation.
IPSL weakens as well, but partially recovers by 2300 in both SST
and rainfall. By the end of the integration, MPI has a virtually unaltered network in
SST, while the strength and links of the ENSO area increase substantially for precipitation. No obvious relation has been found between the ENSO behavior in the networks [40]
and either the trend patterns in the equatorial Pacific, indicative of mean-state changes, or the global warming/wettening
trends. Likewise, no obvious relation has been found between the response patterns
of clouds and precipitation to uniform warming in an aquaplanet configuration, available for
three of the atmospheric components of the models analyzed [146], and the tropical
rainfall response of those models in a coupled setup. On the other hand, an increased tendency for
eastward propagation of SST anomalies during positive ENSO events [129] in the
21st century may, counterintuitively, be symptomatic of an irreversible weakening of
ENSO in the next century and of a loss of potential predictability in the atmosphere.
Considering the global impacts of tropical teleconnections and the changes in temperature and precipitation associated with El Niño and La Niña events (e.g., [9, 85]), we
conclude the following. For models performing well under current conditions, the uncertainty in the projected connectivity of the climate system after 2100
exceeds, in many regions, the uncertainty associated with the equilibrium temperature change.
For the question of robustness versus sensitivity of climate patterns under different forcing scenarios, the lack of consistency between models highlights, once more, the complexity associated with multiple, nonlinear, coupled processes. A perturbation-based approach can focus on models whose networks compare
well to observations in the recent past but diverge substantially in the future. This makes it possible to target efforts more effectively at understanding the physical mechanisms and model parameterizations that
cause such divergences.
4.5 Supplementary strength and link maps
Strength maps for boreal winter SST and precipitation for one member of each model
considered are displayed below for the 1956-2005 historical period, the 2051-2100 RCP8.5
interval, and the ECP8.5 extension. Additionally, the corresponding link maps of the ENSO
area are provided for both fields.
Figure 42: Maps of area strength of sea surface temperature networks in boreal winter
(December to February) in the historical period 1956-2005 for models and reanalyses.
Only one ensemble member per model is shown. The strength of the area corresponding
to ENSO is indicated in the panel captions and saturated in black if colorbar limits are
exceeded
Figure 43: Link maps from the ENSO-related area (in black) for sea surface temperature
networks in boreal winter (December to February) in the historical period 1956-2005 for
models and reanalyses. Only one ensemble member per model is shown
Figure 44: Maps of area strength for precipitation networks in boreal winter (December to
February) in the historical period 1956-2005 for models and reanalyses. Only one
ensemble member per model is shown. The strength of the area corresponding to ENSO is
indicated in the panel captions and saturated in black if colorbar limits are exceeded
Figure 45: Link maps from the ENSO-related area (in black) for precipitation networks in
boreal winter (December to February) in the historical period 1956-2005 for models and
reanalyses. Only one ensemble member per model is shown
Figure 46: Maps of area strength for the sea surface temperature networks in boreal winter
(December to February) in the RCP8.5 projections (period 2051-2100). For each model,
the ensemble member shown is the projection into the future of the historical counterpart in Fig. 42.
The strength of the area corresponding to ENSO is indicated in the panel captions and
saturated in black if colorbar limits are exceeded
Figure 47: Link maps from the ENSO-related area (in black) for sea surface temperature
networks in boreal winter (December to February) in the RCP8.5 projections (period
2051-2100) for the ensemble members in Fig. 46
Figure 48: Maps of area strength for the precipitation networks in boreal winter
(December to February) in the RCP8.5 projections (period 2051-2100). For each model,
the ensemble member shown is the projection into the future of the historical counterpart
in Fig. 44. The strength of the area corresponding to ENSO is indicated in the panel
captions and saturated in black if colorbar limits are exceeded
Figure 49: Link maps from the ENSO-related area (in black) for the precipitation
networks in boreal winter (December to February) in the RCP8.5 projections (period
2051-2100) for the ensemble members in Fig. 48
Figure 50: Maps of area strength for the sea surface temperature networks in boreal winter
(December to February) in the ECP8.5 projections (periods 2101-2150 and 2251-2300).
For each model, the ensemble member shown is the projection into the future of the historical
counterpart in Fig. 42. The strength of the area corresponding to ENSO is indicated in the
panel captions and saturated in black if colorbar limits are exceeded
Figure 51: Link maps from the ENSO-related area (in black) for sea surface temperature
networks in boreal winter (December to February) in the ECP8.5 projections (periods
2101-2150 and 2251-2300) for the ensemble members in Fig. 50
Figure 52: Maps of area strength for the precipitation networks in boreal winter
(December to February) in the ECP8.5 projections (periods 2101-2150 and 2251-2300).
For each model, the ensemble member shown is the projection into the future of the historical
counterpart in Fig. 44. The strength of the area corresponding to ENSO is indicated in the
panel captions and saturated in black if colorbar limits are exceeded
Figure 53: Link maps from the ENSO-related area (in black) for the precipitation
networks in boreal winter (December to February) in the ECP8.5 projections (periods
2101-2150 and 2251-2300) for the ensemble members in Fig. 52
4.6 Advantages of using a complete weighted cell-level network
The network methodology adopted in this work differs from several others proposed in the
literature [143, 153], most importantly because no pruning is done to remove edges. Here,
both the cell-level network and the area-level networks are modeled as complete weighted
graphs. In contrast, most earlier climate network inference methods construct unweighted graphs
in which two cells are considered connected only if their cross-correlation exceeds a threshold.
Our approach offers two substantial advantages. First, by modeling the climate network
as a weighted graph, we can leverage information about the actual magnitude of the cell-level
correlations. These weights can provide insight into the
strength of specific teleconnections between different nodes of the network (e.g., between
ENSO and the areas forming the horseshoe pattern). Second, the proposed method is more
robust than methods that perform pruning. Robustness is an important property,
especially for the objectives of this chapter, since we compare different climate models and
the properties of the climate system they simulate over time.
In this section, we substantiate those points by showing that link pruning makes the
network inference process less robust, based on two comparisons. Our proposed inference
method relies on a single parameter, the level of significance α. The parameter τ , which
is used in the area detection algorithm, is calculated based on α, as described in Section
3.8. Let rα denote the minimum significant correlation for a given level of significance
α. In the first comparison the input to the area identification algorithm is an unweighted
network. All pairwise grid cell correlations that are non-significant for the given level α
are set to 0. Correlations larger than rα are set to 1 and correlations lower than −rα are set
to −1. We refer to the corresponding cell-level network as the unweighted pruned network
(such networks have been studied in [61, 145]). The second comparison is a more relaxed
version of the first; we simply remove all pairwise cell correlations that are non-significant
for a given level of significance α but maintain the actual magnitude of the significant
correlations. We refer to this type of cell-level network as the weighted pruned network
(such networks have been studied in [143]). In both cases, the τ threshold is computed as
in the weighted complete network. Consequently, the area identification algorithm and the
threshold τ are the same for all three networks; the only difference is the input to the area
identification algorithm (i.e. the cell-level network).
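The three cell-level networks can be summarized in a few lines. The sketch below is only illustrative. It approximates rα with a two-sided Fisher z-test that ignores serial correlation, so it may not match the significance test of Section 3.8. The names `critical_correlation` and `cell_networks` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def critical_correlation(alpha, T):
    """Approximate r_alpha, the smallest |r| significant at level alpha.

    Two-sided test via the Fisher transform: atanh(r) is roughly normal with
    standard deviation 1/sqrt(T - 3) when the two series are independent.
    Serial correlation is ignored, which is an assumption of this sketch.
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return float(np.tanh(z_crit / np.sqrt(T - 3)))

def cell_networks(series, alpha):
    """Build the three cell-level networks compared in this section.

    series: (T, n_cells) anomaly time series.
    Returns (complete weighted, weighted pruned, unweighted pruned) matrices.
    """
    T = series.shape[0]
    corr = np.corrcoef(series, rowvar=False)
    np.fill_diagonal(corr, 0.0)                     # no self-links
    significant = np.abs(corr) > critical_correlation(alpha, T)
    complete = corr                                 # every pair keeps its weight
    weighted_pruned = np.where(significant, corr, 0.0)
    unweighted_pruned = np.where(significant, np.sign(corr), 0.0)  # +1 / -1 / 0
    return complete, weighted_pruned, unweighted_pruned
```

With T = 50 and α = 1 × 10⁻³, this approximation gives rα of roughly 0.45. The same area identification algorithm would then be run on each of the three matrices.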
In all comparisons, our “reference network” is constructed using the HadISST 1956-2005 (DJF) anomaly time series for α = 1 × 10⁻³. The identified areas are presented in
Fig. 54. When the input is the unweighted pruned network, the resulting areas cannot be
easily interpreted in a climate context. For example, the area corresponding to the Indian
Ocean extends to the North Pacific Ocean, while the ENSO-related area includes ample
extratropical regions in both hemispheres. The areas identified using a weighted pruned
network are closer to those identified using a complete weighted graph but, as we show next,
the former is less robust. We cannot be sure that a given set of areas is the
“right” set, since no ground truth exists for such an evaluation. At the very least, however, any network methodology used for model intercomparison should be robust to its input parameter (α
in our case), that is, insensitive or only weakly sensitive to changes in α. To
evaluate the robustness of the various methodologies, we vary α around its standard value
and quantify the network changes in terms of the ARI metric (Fig. 55). The network inference process is most robust when the cell-level network is modeled as a complete and
weighted graph. If we prune some edges (and keep the weight of the remaining links), the
robustness of the method decreases, resulting in lower ARI values. Finally, the least robust
option is to model the cell-level network as an unweighted pruned network.
Figure 54: Areas identified using three different cell-level networks. α was set to
1 × 10⁻³. Data set: HadISST 1956-2005
Figure 55: ARI between a reference network constructed using α = 1 × 10⁻³ and
networks constructed using different α values
Chapter V
δ-MAPS: FROM SPATIO-TEMPORAL DATA TO A WEIGHTED
AND LAGGED NETWORK BETWEEN FUNCTIONAL DOMAINS
5.1 Introduction
Spatio-temporal data are becoming increasingly prevalent and important for both science (e.g.,
climate, systems neuroscience, seismology) and enterprises (e.g., the analysis of geotagged
social media activity). The spatial resolution of the available data is often set by an
arbitrary grid whose number of cells is typically much larger than the true dimensionality of the underlying system. One major task is to identify the distinct semi-autonomous components of this system
and to infer their (potentially lagged and weighted) interconnections from the available
spatio-temporal data. Traditional dimensionality reduction methods, such as PCA, ICA or
clustering, have been used successfully for many years, but they have known limitations
when the objective is to infer the functional network between all spatial components of the
system.
We propose δ-MAPS, an inference method that first identifies these spatial components, referred to as “domains”, and then the connections between them (§5.3). Informally,
a functional domain (or simply domain) is a spatially contiguous region that
participates in the same dynamic effect or function. The exact mechanism that creates
this effect or function varies across application domains. The key idea, however, is that the
functional relation between the grid cells of a domain results in highly correlated temporal
activity. If we accept this premise, it follows that we should be able to identify the “epicenter” or core of a domain as a point (or subregion) at which the local homogeneity is
maximum across the entire domain. Instead of searching for the discrete boundary of a
domain, which may not exist in reality, we compute a domain as the maximal set
of spatially contiguous cells that includes the detected core and satisfies a homogeneity
constraint, expressed in terms of the average pairwise cross-correlation across all cells in
the domain. Domains may be spatially overlapping. Also, some cells may not belong to
any domain.
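The domain construction described above can be illustrated with a greedy region-growing sketch. Starting from a core, the sketch repeatedly adds the adjacent cell that keeps the average pairwise correlation highest, and it stops once no addition keeps that average above a threshold. This illustration rests on several assumptions and is not the δ-MAPS algorithm as specified in §5.3. The threshold name `delta`, the greedy rule, and the function names are hypothetical. Each call grows one domain independently, so domains grown from different cores may overlap, and cells never reached belong to no domain.

```python
import numpy as np

def set_homogeneity(corr, cells):
    """Average correlation over all ordered pairs of distinct cells in `cells`."""
    idx = np.array(sorted(cells))
    sub = corr[np.ix_(idx, idx)]
    n = len(idx)
    return (sub.sum() - np.trace(sub)) / (n * (n - 1))

def grow_domain(corr, neighbors, core, delta):
    """Greedily enlarge a contiguous domain around `core` while homogeneity > delta.

    corr: (n, n) cell correlation matrix; neighbors: dict cell -> adjacent cells;
    core: at least two cells (e.g. the K-neighborhood of a core cell);
    delta: homogeneity threshold (hypothetical parameter name).
    Only cells adjacent to the current domain are candidates, so the domain
    stays spatially contiguous.
    """
    domain = set(core)
    while True:
        frontier = {j for i in domain for j in neighbors[i]} - domain
        best, best_h = None, delta
        for j in sorted(frontier):
            h = set_homogeneity(corr, domain | {j})
            if h > best_h:
                best, best_h = j, h
        if best is None:          # no neighbor keeps the domain homogeneous enough
            return domain
        domain.add(best)
```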
After we identify all domains, δ-MAPS infers a functional network between them. Different domains may have correlated activity, potentially at a lag, because of direct or indirect interactions. The proposed edge inference method examines the statistical significance
of each lagged cross-correlation between two domains, applies a multiple-testing process
to control the rate of false positives, infers a range of potential lag values for each edge, and
assigns a weight to each edge based on the covariance of the corresponding two domains.
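A simplified sketch of these edge-inference ingredients follows. It assumes that each domain is summarized by a single signal, such as the sum of its cells' time series. The sketch scans lags up to a maximum `max_lag` in either direction, keeps the lag with the largest absolute correlation, and uses the covariance σ_A σ_B r at that lag as the weight. For the multiple-testing step it shows Benjamini-Hochberg, one common false-discovery-rate procedure. Several parts are not reproduced: how the p-values are obtained, how a range of lags (rather than a single lag) is inferred, and the exact procedure used by δ-MAPS. All names are hypothetical.

```python
import numpy as np

def lagged_correlation(a, b, lag):
    """Pearson correlation between a(t) and b(t + lag) on the overlapping samples."""
    if lag >= 0:
        x, y = a[:len(a) - lag], b[lag:]
    else:
        x, y = a[-lag:], b[:len(b) + lag]
    return float(np.corrcoef(x, y)[0, 1])

def candidate_edge(signal_a, signal_b, max_lag):
    """Return (lag, r, weight) at the lag maximizing |r| within +-max_lag.

    The weight is the covariance sigma_a * sigma_b * r at that lag.
    """
    lags = list(range(-max_lag, max_lag + 1))
    corrs = [lagged_correlation(signal_a, signal_b, lag) for lag in lags]
    best = int(np.argmax(np.abs(corrs)))
    r = corrs[best]
    return lags[best], r, float(np.std(signal_a) * np.std(signal_b) * r)

def benjamini_hochberg(p_values, q):
    """Boolean mask of hypotheses rejected at false discovery rate q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()   # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject
```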
δ-MAPS is related to clustering, parcellation (or regionalization), network community
detection, multivariate statistical methods for dimensionality reduction such as PCA and
ICA, as well as functional network and lag inference methods. However, as we discuss
in §5.2 and show with synthetic data experiments in §5.4, δ-MAPS also differs significantly
from all of these methods. δ-MAPS does not require the number of domains as an
input parameter, the resulting domains are spatially contiguous and potentially overlapping,
and the inferred connections between domains can be lagged and positively or negatively
weighted. Further, δ-MAPS distinguishes between grid cells that are correlated within the same
domain and grid cells that are correlated across two distinct domains. This allows it to
separate local diffusion (or dispersion) phenomena from remote interactions that
may be due to underlying structural connections (e.g., a white-matter fiber between two
brain regions).
We illustrate the application of δ-MAPS on data from two fields: climate science
(§5.5) and neuroscience (§5.6). First, the sea surface temperature (SST) climate network
identifies some well-known climate teleconnections (such as the lagged connection between the El Niño Southern Oscillation and the Indian Ocean). Second, the analysis of
resting-state fMRI cortical data confirms the presence of three well-known functional brain
“networks” (default-mode, occipital, and motor/somatosensory), and shows that the cortical network includes a backbone of relatively few regions that are densely interconnected.
5.2 Related Work
A common approach to reduce the dimensionality of spatio-temporal data is to apply PCA
(standard or rotated) or ICA techniques. For instance, in climate science, PCA (also known
as Empirical Orthogonal Function (EOF) analysis) has been used to identify teleconnections between distinct climate regions [167]. The orthogonality between PCA components
complicates the interpretation of the results, making it difficult to identify the distinct underlying modes of variability and to separate their effects, as clearly discussed in [50]. ICA
is more common in the neuroscience literature, where the aim is to identify independent
rather than orthogonal components [88]. However, ICA does not rank components by relative significance, and the number of independent components must be chosen
based on additional information about the underlying system.
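To make the orthogonality constraint concrete, the following sketch computes EOFs through the singular value decomposition of the anomaly matrix. By construction, the spatial patterns are orthonormal and the principal components are mutually uncorrelated, whether or not the underlying physical modes are. The function name and return layout are illustrative choices.

```python
import numpy as np

def eofs(data, n_modes):
    """EOF/PCA decomposition of a (T, n_cells) field via the SVD.

    Returns spatial patterns (n_modes, n_cells), principal-component time
    series (T, n_modes), and the fraction of variance each mode explains.
    """
    anomalies = data - data.mean(axis=0)          # remove each cell's time mean
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[:n_modes], u[:, :n_modes] * s[:n_modes], explained[:n_modes]
```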
Another broad family of spatio-temporal dimensionality reduction methods is based
on clustering [22, 60, 139, 177]. These algorithms can be grouped into region-growing
methods (e.g., [23, 104]), spectral (e.g., the NCUT method often applied in fMRI analysis
[44, 160] – but also see a discussion of their limitations [14]), hierarchical (e.g., [24, 150]),
and probabilistic (e.g., [14, 83]). These groups of algorithms are quite different but they
share some common characteristics: the resulting clusters may not be spatially contiguous,
they are typically non-overlapping, every grid cell needs to belong to a cluster (potentially
excluding only outliers), and the number of clusters is often required as an input parameter.
In particular, the lack of spatial contiguity makes it hard to distinguish correlations
due to spatial diffusion (or dispersion) phenomena from correlations due to remote
(structural) interactions between distinct effects.
An approach of increasing popularity is to first construct a correlation-based network
between individual grid cells, after pruning cross-correlations that are not statistically significant (see [100]). Then, some of these methods analyze the (binary or weighted) cell-level network directly based on various centrality metrics, k-core decomposition, spectral
analysis, etc. (e.g., [52, 161]) or they first apply a community detection algorithm (potentially able to detect overlapping communities, e.g., [4, 103, 116]) on the cell-level network
and then analyze the resulting communities in terms of size, density, location, overlap, etc.
(e.g., [108, 118, 142, 143]). A community however may group together two regions that
are, first, not spatially contiguous, and second, different in terms of how they are connected
to other regions; an instance of this issue is illustrated in Fig. 58-C in the context of climate
data analysis.
5.3 δ-MAPS
The input data is generated from a spatial field X(t) sampled on an arbitrary grid G. This
grid can be modeled as a planar graph G(V, E), where each vertex in V is a grid cell
and each edge in E represents the spatial adjacency between two neighboring cells. A
set of cells A ⊆ V is spatially contiguous, denoted by $I_G(A) = 1$, if it forms a connected
component in G.
The K-neighborhood of a cell i, denoted by $\Gamma_K(i)$, includes i and the set of K nearest
neighbors to i according to an appropriate spatial distance metric (e.g., geodesic distance
for climate data, Euclidean distance for fMRI data). The K-neighborhood of a cell is
always spatially contiguous.
Each grid cell i is associated with a time series $x_i(t)$ of length T ($t \in \{1, \dots, T\}$). We
assume that $x_i(t)$ is sampled from a stationary signal and denote by $\tilde{\mu}_i$ and $\tilde{\sigma}_i^2$ its sample
mean and variance, respectively. The similarity between the activity of two cells i and j is
measured with Pearson's cross-correlation at zero lag,
$$ r_{i,j} = \frac{\sum_{t=1}^{T} (x_i(t) - \tilde{\mu}_i)(x_j(t) - \tilde{\mu}_j)}{T\, \tilde{\sigma}_i \tilde{\sigma}_j} . \tag{10} $$
Other similarity metrics could be used instead.
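Eq. (10) is the standard Pearson coefficient computed with 1/T-normalized moments, so it can be evaluated for all pairs of cells at once. The NumPy sketch below is illustrative only; the function name `zero_lag_correlation` and the array layout (one row per grid cell) are our own assumptions, not part of δ-MAPS.

```python
import numpy as np


def zero_lag_correlation(X: np.ndarray) -> np.ndarray:
    """Pairwise zero-lag Pearson cross-correlation r_{i,j} of Eq. (10).

    X has shape (n_cells, T); row i holds the time series x_i(t).
    """
    T = X.shape[1]
    mu = X.mean(axis=1, keepdims=True)       # sample means mu_i
    sigma = X.std(axis=1, keepdims=True)     # sample std sigma_i (1/T normalization)
    Z = (X - mu) / sigma                     # standardized series
    return (Z @ Z.T) / T                     # sum_t z_i(t) z_j(t) / T


# Usage on a toy field of 5 cells and T = 200 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
R = zero_lag_correlation(X)
```

Because the 1/T factors cancel between numerator and denominator, the result coincides with `np.corrcoef(X)`.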
The local homogeneity at cell i is defined as the average pairwise cross-correlation
between the K + 1 cells in $\Gamma_K(i)$,
$$ \hat{r}_K(i) = \frac{\sum_{m \neq n \in \Gamma_K(i)} r_{m,n}}{K(K+1)} . \tag{11} $$
Similarly, we define the homogeneity of a set of cells A as the average pairwise cross-correlation between all distinct cells in A,
$$ \hat{r}(A) = \frac{\sum_{m \neq n \in A} r_{m,n}}{|A|(|A|-1)} . \tag{12} $$
5.3.1 Functional domains
Intuitively, a domain A is a spatially contiguous set of cells that somehow participate in the
same dynamic effect or function. The exact mechanism that creates this effect or function
varies across application domains; however, the key premise is that the functional relation
between the cells of domain A results in highly correlated temporal activity (at zero-lag),
and thus high values of the homogeneity metric r̂(A). A homogeneity threshold δ
determines whether the homogeneity of A is sufficiently high, i.e., a domain A must satisfy r̂(A) > δ
(the selection of δ is discussed later in this section).
If we accept this premise, it follows that we should be able to identify the “epicenter”
or core of a domain A as a cell i ∈ A at which the local homogeneity r̂K (i) is maximum
across all cells in A (and certainly larger than δ). In general, the core of a domain may not
be a unique cell.
More formally now, suppose that we know that cell c is in the core of a domain. The
domain A rooted at c has to satisfy the following three properties: it should include cell c,
be spatially contiguous, and have higher homogeneity than δ:
$$ c \in A, \qquad I_G(A) = 1, \qquad \hat{r}(A) > \delta . \tag{13} $$
A domain may not have sharp spatial boundaries; instead, it may gradually “fade” into
other domains or regions dominated by noise. So, instead of searching for the discrete
boundary of a domain, it is more reasonable to compute a domain as the largest possible
set of cells that satisfies the previous three constraints.
Domain identification problem: Given the field X(t) on the spatial grid G, a core cell
c, and the threshold δ, the domain A(c) is a maximum-sized set of cells that satisfies
the three constraints of (13). In Section 5.8 we prove that the decision version of this
problem is NP-Hard.
A given spatial field X(t) may include several domains. The number of identified domains, denoted by N , depends on the threshold δ. Domains may be spatially overlapping;
this is the case when the cells of a region are significantly correlated with two or more
distinct domain cores. Also, some cells of the grid may not belong to any domain, meaning that their signal can be thought of as mostly noise (at least for the given value of δ).
Decreasing δ will typically result in a larger number of detected domain cores. Further,
as δ decreases, the spatial extent of each domain will typically increase, resulting in larger
overlaps between nearby domains.
δ can simply be a user-specified parameter for the minimum required average cross-correlation within a domain. Alternatively, δ can be calculated from a statistical test for
the significance of the observed zero-lag cross-correlations. A summary of this method is
given next (described in more detail in Section 5.9). We start with a random sample of pairs
of grid cells. We then apply the statistical test described in §5.3.2 (see Equations 15 and
16) to examine if the zero-lag cross-correlation between each of these pairs passes a given
significance level α (set to $10^{-2}$ unless specified otherwise). δ is then set to the average
of the statistically significant cross-correlations in that sample. The rationale is that the
average pairwise cross-correlation among cells that belong to the same domain should be
higher than a sample average of statistically significant cross-correlations between cells
that can be anywhere on the grid.
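The calibration of δ summarized above can be sketched as follows. This is a hedged illustration rather than the procedure of Section 5.9: it assumes a two-sided test, applies Bartlett's formula (Eq. 15) at τ = 0 using biased sample autocorrelations over all lags, and averages the signed significant correlations. The function names and the sample size `n_pairs` are hypothetical.

```python
import math

import numpy as np


def autocorrelation(x):
    """Biased sample autocorrelation r_xx(k) for k = -(T-1), ..., T-1."""
    T = len(x)
    z = (x - x.mean()) / x.std()
    return np.correlate(z, z, mode="full") / T


def zero_lag_pvalue(x, y):
    """Zero-lag correlation and its two-sided p-value under Bartlett's variance (Eqs. 15-16, tau = 0)."""
    T = len(x)
    r = np.corrcoef(x, y)[0, 1]
    var = np.sum(autocorrelation(x) * autocorrelation(y)) / T
    z = r / math.sqrt(var)
    return r, math.erfc(abs(z) / math.sqrt(2.0))   # P(|N(0,1)| > |z|)


def calibrate_delta(X, n_pairs=1000, alpha=1e-2, seed=0):
    """delta = mean of the significant zero-lag correlations in a random sample of cell pairs."""
    rng = np.random.default_rng(seed)
    n_cells = X.shape[0]
    significant = []
    for _ in range(n_pairs):
        i, j = rng.choice(n_cells, size=2, replace=False)
        r, p = zero_lag_pvalue(X[i], X[j])
        if p < alpha:
            significant.append(r)
    return float(np.mean(significant)) if significant else float("nan")


# Toy field: 20 cells; the first 10 share a common signal, the rest are independent noise.
rng = np.random.default_rng(1)
T = 500
shared = rng.standard_normal(T)
X = rng.standard_normal((20, T))
X[:10] = shared + 0.5 * rng.standard_normal((10, T))
delta = calibrate_delta(X, n_pairs=500, alpha=1e-2, seed=2)
```

In this toy field the within-group correlation is about 0.8, so δ comes out close to that value; the few false positives among noise pairs barely move the average.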
5.3.1.1 Algorithm for domain identification
Given the NP-Hardness of the previous problem, we propose a greedy algorithm that runs
in two phases. In the first phase, we identify a set of cells, referred to as seeds; each seed
is a candidate core for a domain. In the second phase, each seed is initially considered as
a distinct domain. Then, an iterative and greedy algorithm attempts to identify the largest
possible domains that satisfy the three constraints of (13) through a sequence of expansion
and merging operations. The two phases are described next, while the complete pseudocode
is presented in Section 5.10. The source code (including supporting documentation) will
be available on GitHub.
Seed selection Recall that the core of a domain is a cell of maximum local homogeneity
across all cells of that domain. So, one way to detect potential core cells, while the domains
are still unknown, is to identify points at which the homogeneity field r̂K (i) is locally
maximum. Specifically, cell i is a seed if r̂K (i) > δ and r̂K (i) ≥ r̂K (j) ∀j ∈ ΓK (i). Let S
be the set of all identified seeds.
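A minimal sketch of seed selection on a regular 2D grid with K = 4 (up-down-left-right) neighborhoods follows. Assumptions that are ours rather than the text's: cells on the grid border simply use the neighbors they have, and the field is stored as a NumPy array of shape (rows, cols, T). Function names are hypothetical.

```python
import numpy as np


def grid_neighbors(i, j, n_rows, n_cols):
    """4-neighborhood (up, down, left, right) of cell (i, j), clipped at the grid border."""
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in cand if 0 <= a < n_rows and 0 <= b < n_cols]


def local_homogeneity(field):
    """r_hat_K(i) of Eq. (11): mean pairwise correlation among the cells of Gamma_K(i)."""
    n_rows, n_cols, T = field.shape
    Z = (field - field.mean(axis=2, keepdims=True)) / field.std(axis=2, keepdims=True)
    r_hat = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            cells = [(i, j)] + grid_neighbors(i, j, n_rows, n_cols)
            S = np.array([Z[a, b] for a, b in cells])   # (k+1, T) standardized series
            C = S @ S.T / T                             # pairwise correlations
            k1 = len(cells)
            r_hat[i, j] = (C.sum() - np.trace(C)) / (k1 * (k1 - 1))
    return r_hat


def find_seeds(r_hat, delta):
    """Cells with r_hat > delta that are local maxima of r_hat over their neighborhood."""
    n_rows, n_cols = r_hat.shape
    seeds = []
    for i in range(n_rows):
        for j in range(n_cols):
            nbrs = grid_neighbors(i, j, n_rows, n_cols)
            if r_hat[i, j] > delta and all(r_hat[i, j] >= r_hat[a, b] for a, b in nbrs):
                seeds.append((i, j))
    return seeds


# Toy field: a 4x4 block of correlated cells embedded in a 10x10 grid of noise.
rng = np.random.default_rng(3)
T = 300
shared = rng.standard_normal(T)
field = rng.standard_normal((10, 10, T))
field[2:6, 2:6] = shared + 0.5 * rng.standard_normal((4, 4, T))
r_hat = local_homogeneity(field)
seeds = find_seeds(r_hat, delta=0.5)
```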
In general, a single domain may produce more than one seed because the local homogeneity field can be noisy and may therefore include multiple local maxima greater than δ.
Further, additional seeds can appear in regions where domains overlap. Consequently, it is
necessary to include a merging operation in which two or more seeds are eventually merged
into the same domain.
Note that as K decreases, the local homogeneity field becomes noisier and so we
may detect more seeds in the same domain. On the other hand, larger values of the neighborhood size K can oversmooth the homogeneity field, removing seeds and potentially
hiding entire domains. The latter is more likely if the spatial extent of a domain is smaller
than K+1 cells. This observation implies that the spatial resolution of the given grid sets a
lower bound on the size of the functional domains that can be detected.
Domain-merging operation
Two candidate domains A and B can be merged if they
are spatially contiguous (i.e., their union is a connected set of cells) and if the homogeneity of their union is sufficiently high, i.e.,
r̂(A ∪ B) > δ. Whenever there is more than one pair of domains that can be merged, we
greedily choose the pair with the maximum union homogeneity; this greedy choice makes
the merged domain more likely to expand further.
The merging operation is performed initially on the set of seeds S. It is also performed
after each domain-expansion operation, whenever it is possible to do so.
Domain-expansion operation
A domain A is expanded by considering all cells that are
adjacent to A, and selecting the cell i that maximizes r̂(A ∪ {i}); again, this greedy choice
makes the expanded domain more likely to expand further.
The expansion operation is repeated in rounds. At the start of each round, domains are
sorted in decreasing order of homogeneity. Then, each domain is expanded by one cell at
a time, as previously described, in that order. After every expansion operation, we check
whether one or more merging operations are possible. A round is complete when we have
attempted to expand each domain once.
A domain can no longer expand if that would violate the homogeneity constraint δ
or if there are no other adjacent cells that can be added into the domain. The domain
identification algorithm terminates when no further expansion or merging operations are
possible.
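The expansion and merging phase can be illustrated with the following greedy sketch. It is not the pseudocode of Section 5.10: here each seed is assumed to start as its own K-neighborhood (so that its initial homogeneity, the seed's r̂K, is already above δ), the grid is given as an adjacency list over flat cell indices, and all names (`grow_domains`, `try_merges`, etc.) are hypothetical.

```python
import itertools

import numpy as np


def homogeneity(R0, cells):
    """r_hat(A) of Eq. (12); R0 is the correlation matrix with its diagonal set to zero."""
    idx = list(cells)
    n = len(idx)
    if n < 2:
        return float("-inf")
    return R0[np.ix_(idx, idx)].sum() / (n * (n - 1))


def touches(A, B, adj):
    """True if A and B overlap or a cell of A is adjacent to a cell of B (union is contiguous)."""
    return bool(A & B) or any(nb in B for c in A for nb in adj[c])


def try_merges(domains, R0, adj, delta):
    """Repeatedly merge the contiguous pair whose union has the highest homogeneity above delta."""
    while True:
        best = None
        for a, b in itertools.combinations(list(domains), 2):
            if touches(domains[a], domains[b], adj):
                h = homogeneity(R0, domains[a] | domains[b])
                if h > delta and (best is None or h > best[0]):
                    best = (h, a, b)
        if best is None:
            return
        _, a, b = best
        domains[a] |= domains.pop(b)


def grow_domains(R, adj, seeds, delta):
    """Greedy expansion in rounds, checking for merges after every single-cell expansion."""
    R0 = R.copy()
    np.fill_diagonal(R0, 0.0)
    domains = {k: {s, *adj[s]} for k, s in enumerate(seeds)}   # assumed initialization
    try_merges(domains, R0, adj, delta)
    expanded = True
    while expanded:                                   # one iteration = one round
        expanded = False
        order = sorted(domains, key=lambda k: homogeneity(R0, domains[k]), reverse=True)
        for k in order:
            if k not in domains:                      # absorbed by a merge earlier in this round
                continue
            A = domains[k]
            frontier = {nb for c in A for nb in adj[c]} - A
            if not frontier:
                continue
            best = max(frontier, key=lambda i: homogeneity(R0, A | {i}))
            if homogeneity(R0, A | {best}) > delta:
                A.add(best)
                expanded = True
                try_merges(domains, R0, adj, delta)
    return list(domains.values())


# Toy 8x8 grid: a correlated 4x4 block (rows 1-4, cols 1-4) in noise, with two seeds inside it.
n_rows = n_cols = 8


def cell(i, j):
    return i * n_cols + j


adj = {cell(i, j): [cell(a, b) for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < n_rows and 0 <= b < n_cols]
       for i in range(n_rows) for j in range(n_cols)}
rng = np.random.default_rng(4)
T = 400
X = rng.standard_normal((n_rows * n_cols, T))
block = {cell(i, j) for i in range(1, 5) for j in range(1, 5)}
shared = rng.standard_normal(T)
for c in block:
    X[c] = shared + 0.5 * rng.standard_normal(T)
domains = grow_domains(np.corrcoef(X), adj, seeds=[cell(2, 2), cell(3, 3)], delta=0.5)
```

The two seeds merge immediately and the resulting domain absorbs the whole block. It may also take in a few peripheral noise cells, because the average homogeneity of Eq. (12) "fades" gradually rather than dropping sharply.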
5.3.2
The domain network
Given the N identified domains Vδ = {A1 , . . . AN }, the next step is to construct a network
Gδ (Vδ , Eδ ) between domains. Different domains may have correlated activity because of
direct or indirect interactions. We refer to Gδ as a functional network to emphasize that
the edges between domains are based on functional activity and correlations instead of
structural or physical connections (“structural network”) or causal interactions (“effective
network”).
We associate a domain-level signal XA (t) with each domain A. The definition of this
signal depends on the specific application field. For instance, when we analyze climate
anomaly time series, the domain-level signal is defined as the cumulative anomaly across
all cells of that domain, where the anomaly of each cell is weighted by the cell's relative
area (which depends on the cell's latitude). For fMRI data, the domain-level signal
is defined as the average BOLD signal across the cells of that domain.
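For illustration, the two domain-level signals can be written as below. The cos(latitude) area weight is our assumption for a regular latitude-longitude grid; the text only states that the weight depends on latitude. Function names are hypothetical.

```python
import numpy as np


def climate_domain_signal(anomalies, lats_deg, domain):
    """Area-weighted cumulative anomaly X_A(t); the cos(latitude) weights are an assumption."""
    idx = np.array(sorted(domain))
    weights = np.cos(np.deg2rad(lats_deg[idx]))   # relative cell area on a regular lat-lon grid
    return weights @ anomalies[idx]               # sum_i w_i x_i(t)


def fmri_domain_signal(bold, domain):
    """Average BOLD signal X_A(t) across the cells of domain A."""
    idx = np.array(sorted(domain))
    return bold[idx].mean(axis=0)


# Usage: three cells at latitudes 0, 60 and 30 degrees; the domain contains cells 0 and 1.
anomalies = np.ones((3, 4))
lats = np.array([0.0, 60.0, 30.0])
x_climate = climate_domain_signal(anomalies, lats, {0, 1})   # 1 + cos(60 deg) = 1.5 at every t
x_fmri = fmri_domain_signal(anomalies, {0, 1})
```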
Two different domains may be located at some distance, and so they may be correlated
at a non-zero lag τ . For this reason, we examine if there is a significant cross-correlation
between different domains over a range of lags ($-\tau_{max} \le \tau \le \tau_{max}$). The sample cross-correlation between domains A and B at a lag τ can be estimated as:
$$ r_{A,B}(\tau) = \frac{\sum_{t=1}^{T-\tau} (X_A(t) - \tilde{\mu}_A)(X_B(t+\tau) - \tilde{\mu}_B)}{T\, \tilde{\sigma}_A \tilde{\sigma}_B} , \tag{14} $$
where $\tilde{\mu}_A$ and $\tilde{\sigma}_A$ denote sample mean and standard deviation estimates, respectively. The
value of $\tau_{max}$ should be large enough to include the typical signal propagation delays in
the underlying system, but at the same time it should be much smaller than T. The $2\tau_{max} + 1$
cross-correlations for a pair of domains can be represented with a correlogram; an example
based on climate sea-surface temperature data (see §5.5) is shown in Fig. 56.
Figure 56: Correlogram between two climate time series for a lag range of ±12 months.
The significant correlations for a false discovery rate $q = 10^{-3}$ are shown in red. The
error bars correspond to ± one standard deviation, as estimated by Eq. (15).
The next step is to examine the statistical significance of the measured cross-correlation
between two domains A and B. Two uncorrelated signals can still produce a considerable
sample cross-correlation if they have a strong auto-correlation structure. This is captured by
Bartlett’s formula [26], which is an estimator for the variance of rA,B (τ ) (for a fixed value
of τ). Under the null hypothesis that the domain-level signals of A and B are uncorrelated,
$$ \mathrm{Var}[r_{A,B}(\tau)] = \frac{1}{T - \tau} \sum_{\tau_k = -T}^{T} r_{A,A}(\tau_k)\, r_{B,B}(\tau_k) , \tag{15} $$
where $r_{A,A}(\tau_k)$ is the autocorrelation of the time series of domain A at lag $\tau_k$.
Under the previous null hypothesis, the expected value of $r_{A,B}(\tau)$ is zero and the following statistic approximately follows the standard normal distribution N(0, 1):
$$ z_{A,B}(\tau) = \frac{r_{A,B}(\tau)}{\sqrt{\mathrm{Var}[r_{A,B}(\tau)]}} . \tag{16} $$
The approximation is due to the fact that $r_{A,B}(\tau)$ is bounded in [−1, 1]. So, we
can now perform hypothesis testing for every pair of domains, computing a corresponding
p-value based on z.
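The test of Eqs. (14)-(16) for one pair of domains can be sketched as follows. Assumptions: negative lags are handled through the symmetry $r_{A,B}(-\tau) = r_{B,A}(\tau)$, the autocorrelations in Eq. (15) are the biased sample estimates over all lags $|\tau_k| < T$, and p-values are two-sided. The function names are illustrative.

```python
import math

import numpy as np


def cross_correlation(xa, xb, tau):
    """r_{A,B}(tau) of Eq. (14); negative lags use r_{A,B}(-tau) = r_{B,A}(tau)."""
    if tau < 0:
        return cross_correlation(xb, xa, -tau)
    T = len(xa)
    za = xa - xa.mean()
    zb = xb - xb.mean()
    return float(np.sum(za[: T - tau] * zb[tau:]) / (T * xa.std() * xb.std()))


def autocorrelations(x):
    """Biased sample autocorrelations for lags -(T-1), ..., T-1."""
    T = len(x)
    z = (x - x.mean()) / x.std()
    return np.correlate(z, z, mode="full") / T


def lagged_tests(xa, xb, tau_max):
    """Map each lag tau to (r_{A,B}(tau), two-sided p-value) using Eqs. (15)-(16)."""
    T = len(xa)
    ac_sum = float(np.sum(autocorrelations(xa) * autocorrelations(xb)))
    out = {}
    for tau in range(-tau_max, tau_max + 1):
        r = cross_correlation(xa, xb, tau)
        var = ac_sum / (T - abs(tau))                 # Eq. (15)
        z = r / math.sqrt(var)                        # Eq. (16)
        out[tau] = (r, math.erfc(abs(z) / math.sqrt(2.0)))
    return out


# Usage: x_B(t + 3) = x_A(t), i.e., A leads B by 3 samples.
rng = np.random.default_rng(5)
T = 500
base = rng.standard_normal(T + 3)
xa = base[3:]
xb = base[:-3]
tests = lagged_tests(xa, xb, tau_max=6)
```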
Given that there may be several domains in Gδ, we need to control the number of false
positive edges that may result from the multiple testing problem. We do so using the False
Discovery Rate (FDR) method of Benjamini and Hochberg [19]. Specifically, given N
domains, we need to perform
$$ M = \frac{N(N-1)}{2}\,(2\tau_{max} + 1) $$
tests (for each potential edge and for each possible lag value), and compute the p-value for each test, based on (16). Given a
False Discovery Rate q (the expected fraction of false positives among the tests declared significant),
the Benjamini-Hochberg procedure ranks the M p-values ($p_i$ is the i'th lowest p-value) and keeps only the first m tests (edges), where m is the largest index such
that $p_m < q\, m / M$.
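The Benjamini-Hochberg step is short enough to sketch directly. This version follows the strict inequality used in the text (textbook statements use ≤), and `benjamini_hochberg` is a hypothetical name.

```python
def benjamini_hochberg(pvalues, q):
    """Indices of the tests kept by the Benjamini-Hochberg procedure at FDR level q."""
    M = len(pvalues)
    order = sorted(range(M), key=lambda i: pvalues[i])   # ranks 1..M by increasing p-value
    m = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] < q * rank / M:
            m = rank                                     # largest rank passing the bound
    return sorted(order[:m])


kept = benjamini_hochberg([0.01, 0.06, 0.07, 0.09], q=0.1)
```

In this example the second-smallest p-value (0.06) fails its own bound (0.05), yet all four tests are kept because the largest passing rank is 4. This step-up behavior is what distinguishes the procedure from simply thresholding each p-value.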
Lag inference and edge directionality We infer the domain-level network Gδ as follows.
Two domains A, B ∈ Vδ are connected if there is at least one lag value at which the cross-correlation $r_{A,B}(\tau)$ has passed the FDR test. The standard approach in lag inference is to
consider the lag value τ* that maximizes the absolute cross-correlation,
$$ \tau^*_{A,B} = \arg\max_{\tau = -\tau_{max}, \dots, \tau_{max}} |r_{A,B}(\tau)| . \tag{17} $$
The corresponding correlation is denoted as $r^*_{A,B}$. There are two problems with this approach. First, it is harder to examine the statistical significance of $|r^*_{A,B}|$ because it is the
maximum of a set of random variables.1 Second, there is often a range
of lag values that produce “almost maximum” cross-correlations, say within one standard
deviation of each other. Focusing on $\tau^*_{A,B}$ and ignoring the rest of the statistically significant
and almost equal cross-correlations is not well justified.
Instead, we follow a more robust approach in which an edge of the domain-level network Gδ may be associated with a range of lag values.2 The lag range that we associate
with the edge between A and B, denoted as Rτ(A, B), is defined as the range of lags
that produce significant cross-correlations within one standard deviation of $|r^*_{A,B}|$. If
Rτ(A, B) includes τ = 0, the edge is represented as undirected. If Rτ(A, B) includes only
positive lags, the edge is directed from A to B, meaning that A's signal precedes B's by the
given lag range; otherwise, we associate the opposite direction with that edge. We emphasize that the directionality of the edges does not imply causality; it only refers to temporal
ordering.
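Given per-lag results from the previous steps, the lag range and edge direction can be derived as sketched below. Assumptions: τ* is searched only among the FDR-significant lags, the "one standard deviation" is the Bartlett standard deviation at τ*, and the range is reported by its endpoints (see footnote 2). The function name is hypothetical.

```python
def lag_range_and_direction(corr, std, significant):
    """Lag range R_tau(A, B) and edge direction from per-lag results.

    corr[tau] = r_{A,B}(tau); std[tau] = sqrt(Var[r_{A,B}(tau)]);
    significant = set of lags that passed the FDR test.
    """
    if not significant:
        return None, None                                     # no edge between A and B
    tau_star = max(significant, key=lambda t: abs(corr[t]))   # Eq. (17), significant lags only
    threshold = abs(corr[tau_star]) - std[tau_star]           # within one std of |r*_{A,B}|
    lags = sorted(t for t in significant if abs(corr[t]) >= threshold)
    lo, hi = lags[0], lags[-1]
    if lo <= 0 <= hi:
        direction = "undirected"
    elif lo > 0:
        direction = "A->B"                                    # A's signal precedes B's
    else:
        direction = "B->A"
    return (lo, hi), direction


corr = {-2: 0.1, -1: 0.3, 0: 0.2, 1: 0.56, 2: 0.6, 3: 0.52, 4: 0.2}
std = {t: 0.05 for t in corr}
result = lag_range_and_direction(corr, std, significant={-1, 0, 1, 2, 3})
```

Here lag 0 is significant but far from the maximum, so it is excluded from the range and the edge is directed from A to B over lags [1, 2].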
Edge weight and domain strength How should a weight be assigned to each domain-level edge
in Gδ? A common approach is to use the (signed) cross-correlation $r^*_{A,B}$. This is reasonable if all domain signals have approximately the same signal power. In
addition, we propose a new edge weight that is based on the covariance of the two domains:
$$ w(A,B) = \mathrm{cov}[X_A(t), X_B(t + \tau^*_{A,B})] = \tilde{\sigma}_A \tilde{\sigma}_B\, r^*_{A,B} . \tag{18} $$
1
An analytic approach based on extreme-value statistics was proposed in [100] but it relies on several
approximations. Numerical approaches based on frequency-domain bootstrapping, on the other hand, are
computationally expensive [100, 107, 127].
2
In principle, it may be a set of lag values. In practice though, significant correlations result for a continuous range of lag values.
The cross-correlation is computed at lag $\tau^*_{A,B}$, but we could use the average of all cross-correlations in Rτ(A, B) instead. The weight of an edge can be positive or negative depending on the sign of the corresponding cross-correlation.
Finally, the strength of a network node (domain) is defined as the sum of the absolute
weights of all edges of that node (ignoring edge directionality).
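Eq. (18) and the node strength reduce to a few lines. The sketch below assumes edges are given as (A, B, w) triples and uses hypothetical names and toy values.

```python
from collections import defaultdict


def edge_weight(sigma_a, sigma_b, r_star):
    """Covariance-based edge weight of Eq. (18): sigma_A * sigma_B * r*_{A,B}."""
    return sigma_a * sigma_b * r_star


def node_strengths(edges):
    """Strength of each domain: sum of |w| over its incident edges, ignoring direction."""
    strength = defaultdict(float)
    for a, b, w in edges:
        strength[a] += abs(w)
        strength[b] += abs(w)
    return dict(strength)


w_xy = edge_weight(2.0, 0.5, -0.5)      # a negative correlation gives a negative weight
strengths = node_strengths([("X", "Y", w_xy), ("X", "Z", 0.25), ("Y", "Z", 0.125)])
```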
5.4 Illustration - Comparisons
In this section we validate δ-MAPS using the synthetic data set presented in section 2.1.
The parameters of δ-MAPS are set as follows: K=4 cells (up-down-left-right), and δ=0.55
(corresponding to a significance level of $10^{-2}$). In the edge inference step, the FDR threshold is
q=10% and τmax = 20.
Fig. 1-B shows the local homogeneity field r̂K(i) as well as the identified seeds (blue
dots), while Fig.1-C shows the five discovered domains. As expected, we often identify
more than one seed in the core of each domain due to noise; those seeds are eventually
merged into the same domain. The local homogeneity field is weaker in domains 4 and 5
(due to their lower variance) but a seed is still detected in those domains. Seeds also appear
at the two overlapping regions between (1,2) and (2,3) but those seeds gradually merge
with one of the domains in which they appear.
Each inferred domain covers only a subset of the corresponding ground-truth domain. The reason is that some cells
close to the periphery of each domain have a very low signal-to-noise ratio (recall that the
signal decays to zero at the periphery, and so the average correlation between those cells
and the rest of their domain does not exceed the δ threshold). More quantitatively, the
inferred domains include about 80%-90% of the ground-truth cells in each domain. In
non-overlapping regions this fraction is higher (85%-95% of the cells), while in overlapping regions it drops to 45%-80%. The extent of overlapping regions is harder to identify correctly, especially when a domain (e.g., domain 2) overlaps with a stronger domain
(e.g., domains 1 or 3); the stronger domain effectively masks the signal of the weaker domain. The average pairwise cross-correlation of the cells in each domain varies between
55%-70% in the ground-truth data, while the inferred domains have slightly higher average
cross-correlation (65%-75%) due to their smaller expanse.
Finally, Fig. 1-C shows the inferred domain-level network. δ-MAPS correctly identifies
the three edges and their polarity (positive versus negative correlations). The lag ranges
always include the correct value (e.g., the edge between domains 1 and 3 has a lag range
[14,15]). Also, the three edges are correctly ordered in terms of absolute cross-correlation
magnitude: (1,3) followed by (4,5), followed by (3,5).
5.5 Application in Climate Science
We first apply δ-MAPS in the context of climate science. Climate scientists are interested in
teleconnections between different regions, and they often rely on EOF analysis to uncover
them [167]. Here, we analyze the monthly Sea-Surface Temperature (SST) field from the
HadISST dataset [121], covering 50 years (1956-2005) at a spatial resolution of 2.0° × 2.5°,
and we focus on the latitudinal range of [60°S, 60°N] to avoid sea-ice-covered regions.
Following standard practice, we pre-process the time series to form anomalies, i.e., remove
the seasonal cycle, remove any long-term trend at each grid-point (using the Theil-Sen
estimator), and transform the signal to zero-mean at each grid point.
δ-MAPS is applied as follows. First, we set the local neighborhood to the K=4 nearest
cells so that we can identify the smallest possible domains at the given spatial resolution.
Second, the homogeneity threshold δ is set to 0.37 (corresponding to a significance level of
$10^{-2}$). In the edge inference stage, the lag range is τmax=12 months (a reasonable value for
large-scale changes in atmospheric wave patterns), and the FDR threshold is set to q=3%
(we identify about 30 edges and so we expect no more than one false positive).
Fig. 57-A shows the identified domains (the color code will be explained shortly). The
spatial dimensionality has been reduced from about 6000 grid cells to 18 domains. 65%
of the sea-covered cells belong to at least one domain; the overlapping regions are shown
in black and they cover 2% of the grid cells that belong to a domain. The largest domain
(domain E) corresponds to the El Niño Southern Oscillation (ENSO), which is also the
most important in terms of node strength (see Fig. 57-B). Other strong nodes are domain F
(part of the “horseshoe-pattern” surrounding ENSO), domain J (Indian ocean) and domain
Q (sub-tropical Atlantic). The strengths of the edges associated with ENSO are shown in
Fig. 57-C. These observations are consistent with known facts in climate science regarding
ENSO and its positive correlation with the Indian ocean and north tropical Atlantic, and
negative correlations with the regions that surround it in the Pacific (horseshoe-pattern)
[98].
Fig. 57-D shows the inferred domain-level network. The color code represents the
(signed) cross-correlation for each edge. The lag range associated with each edge is shown
in Fig. 57-E; recall that some edges are not directed because their lag range includes τ =0.
The network consists of five weakly-connected components. If we analyze the largest
component (which includes ENSO) as a signed network (i.e., some edges are positive and
some negative) we see that it is structurally balanced [56]. A graph is structurally balanced
if it does not contain cycles with an odd number of negative edges.3 A structurally balanced
network can be partitioned into a “dipole”, so that positive edges only appear within each pole
and negative edges appear only between the two poles. In Fig. 57-A, the nodes of these
two poles are colored as blue and green (the smaller disconnected components are shown
in other colors).
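Structural balance can be checked, and the dipole recovered, with a breadth-first two-coloring in which a positive edge keeps the pole and a negative edge flips it. A conflict reveals a cycle with an odd number of negative edges. The following is a minimal sketch with hypothetical names and toy node labels, not the analysis code used for Fig. 57.

```python
from collections import deque


def balanced_dipole(nodes, signed_edges):
    """Return {node: pole in {0, 1}} if the signed graph is structurally balanced, else None."""
    adj = {v: [] for v in nodes}
    for a, b, sign in signed_edges:
        adj[a].append((b, sign))
        adj[b].append((a, sign))
    pole = {}
    for start in nodes:                      # one BFS per connected component
        if start in pole:
            continue
        pole[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adj[u]:
                expected = pole[u] if sign > 0 else 1 - pole[u]
                if v not in pole:
                    pole[v] = expected
                    queue.append(v)
                elif pole[v] != expected:
                    return None              # a cycle with an odd number of negative edges
    return pole


poles = balanced_dipole(["A", "B", "C"], [("A", "B", +1), ("B", "C", -1), ("A", "C", -1)])
```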
Focusing on the lag range of each edge, domain Q seems to play a unique role, as it
temporally precedes all other domains in the inferred network. Specifically, its activity
precedes that of domains D, E and F by about 5-10 months. The lead of south tropical
Atlantic SSTs (domain Q) on ENSO has recently received significant attention in climate
science [125]. Our results suggest that SST anomalies in domain Q may impact a large
portion of the climate system.
3
For instance, if two friends are both enemies with a third person, they form a balanced social triangle.
Switching to lag inference, we say that a triangle is lag-consistent if there is at least
one value in the lag range associated with each edge that would place the three nodes in
a consistent temporal distance with respect to each other. For instance, in the case of the
first triangle of Fig. 57-F, the triangle is lag-consistent if the edge from Q to F has a lag
of 8 months and the edge between E and F has lag -2 months (meaning that the direction
would be from F to E); several other values would make this triangle lag-consistent. We
have examined every triangle in the climate network and found all of them to be lag-consistent, with one exception:
the triangle between domains (C, D, G), shown at the bottom of Fig. 57-F. However,
the large lag in the edge from C to G can be explained with the triangle between domains
(C, E, G), which is lag-consistent. We emphasize that the temporal ordering that results
from these lag relations should not be misinterpreted as causality; we expect that several
of the edges we identify are only due to indirect correlations, not associated with a causal
interaction between the corresponding two nodes.
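Assume each lag range is an integer interval (lo, hi), and a positive lag on an edge X→Y means that X precedes Y. Under that convention, a triangle A, B, C is lag-consistent exactly when the sum of the ranges of A→B and B→C intersects the range of A→C. The sketch below uses hypothetical names; `reverse_range` converts a range stored for Y→X into the X→Y convention.

```python
def reverse_range(r):
    """Lag range of X->Y given the range (lo, hi) stored for Y->X."""
    return (-r[1], -r[0])


def lag_consistent(range_ab, range_bc, range_ac):
    """True if some tau_AB + tau_BC (each within its range) falls inside range_ac."""
    lo = range_ab[0] + range_bc[0]           # sums of integer intervals cover [lo, hi]
    hi = range_ab[1] + range_bc[1]
    return max(lo, range_ac[0]) <= min(hi, range_ac[1])


ok = lag_consistent((5, 8), (-1, 2), (9, 12))        # e.g., 8 + 1 = 9 lies in [9, 12]
not_ok = lag_consistent((1, 2), (0, 1), (8, 10))     # sums span only [1, 3]
```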
Figure 57: (A) The identified domains. The color of each domain corresponds to the
connected component it belongs to (the blue and green nodes belong to two different poles
of the same component). (B) Color map for domain strength. The strength of ENSO
(domain E) is shown at the top. (C) Edges to and from ENSO (shown in black). (D) The
climate network. The color of each edge represents the corresponding cross-correlation.
(E) The lag range associated with each edge. (F) Examples of lag-consistent triangles.
For comparison purposes, Fig. 58 shows the results of EOF analysis, community detection, and spatial clustering on the same dataset. The first EOF explains only about 19%
of the variance, implying that the SST field is too complex to be understood with only
one spatial component. On the other hand, the joint interpretation of multiple EOF components is problematic due to their orthogonal relation [50]. The anti-correlation between
ENSO and the horseshoe-pattern regions is well captured in the first component but several
other important connections, such as the negative and lagged relation between the south
subtropical Atlantic and ENSO (domains Q and E, respectively), are missed.
Figure 58: (A),(B) The first two components of EOF analysis. (C) Communities identified
by OSLOM. Each community has a unique number and color. (D) Areas identified by
spatial clustering.
Fig. 58-C shows the results of the overlapping community detection method OSLOM.
Following [142], the input to OSLOM is a correlation-based cell-level network. Correlations less than 30% are ignored. The weight of each edge is set to the maximum absolute
correlation between the corresponding two cells, across all considered lags. OSLOM identifies 22 communities. Community 6 is not spatially contiguous; it covers ENSO, the Indian
ocean, a region in the north tropical Atlantic, and a region in south Pacific. This is a general
problem with community detection methods: they cannot distinguish high correlations due
to a remote connection from correlations due to spatial proximity. In the context of climate,
the former may be due to atmospheric waves or large-scale currents while the latter may be
due to local circulations.
Finally, Fig. 58-D shows the results of a spatial clustering method [66], with the same
homogeneity threshold δ we use in δ-MAPS. That method ensures that every cluster (referred to as “area”) is spatially contiguous but it also requires that there is no overlap between areas and it attempts to assign each grid cell to an area. Consequently, it results in
more areas (compared to the number of domains), some of which are just artifacts of the
spatial parcellation process. Further, the spatial expanse of an area constrains the computation of subsequent areas because no overlaps are allowed.
5.6 Application in fMRI data
Functional magnetic resonance imaging (fMRI) measures fluctuations of the blood oxygenation level dependent (BOLD) signal in the brain. The dynamics of the BOLD signal in
gray matter are generally correlated with the level of neural activity. The resulting spatiotemporal field is often analyzed using ICA, clustering or network-based methods to infer
brain functional networks [136].
Here, we illustrate δ-MAPS on cortical resting-state fMRI data from a single subject
(healthy young male adult, subject-ID: 122620) from the WU-Minn Human Connectome
Project (HCP) [163]. The data acquisition parameters are described in [133]. The spatial
resolution is 2mm in each voxel dimension. The pre-processing of fMRI data requires
several steps; we use the “fix-extended” HCP minimal processing pipeline that includes
head motion correction, registration to a structural image, masking of non-brain voxels,
etc.; see [74]. MELODIC ICA and FIX are used to remove non-neuronal artifacts
(e.g., physiological noise due to cardiac and respiratory cycles). We also perform bandpass
filtering in the range 0.01-0.08 Hz, as commonly done in resting-state fMRI.
In this chapter, we analyze two scanning runs of the same subject (“scan-1” and “scan-2”). Each scan lasts about 14 minutes and results in a time series of length T=1200 (repetition time TR=720 ms). We emphasize that major differences across different scanning
sessions of the same subject are common in fMRI; studies of functional brain networks
often only report group-level averages. The entire cortical volume is projected to a surface
mesh (Conte69 32K) resulting in about 65K gray-ordinate points (as opposed to volumetric
voxels) [162]. Each point of this mesh is adjacent to six other points; for this reason we set
K=6. The homogeneity threshold is set to δ=0.37 (corresponding to a significance level of $10^{-2}$).
The maximum lag is set to τmax = 3 samples (i.e., a lag range of about ±2.2 seconds), and the FDR threshold is set to
$q = 10^{-4}$ (i.e., we expect one out of 10K edges to be a false positive). The signal of a domain
is defined as the average across all voxels in that domain.
The application of δ-MAPS results in a network with about 850 domains in scan-1
(1120 domains in scan-2). 80% of the domains are smaller than 30-40 voxels (depending
on the scan) and 5% of the domains are larger than 250 voxels. The number of edges is
4285 in scan-1 (4200 in scan-2). The absolute value of the cross-correlation associated
with each edge is typically larger than 0.5. The fraction of negative edge correlations is
about 5% in scan-1 and 20% in scan-2 suggesting that the polarity of some network edges
may be time-varying. The lag τ ∗ that corresponds to the maximum cross-correlation is 0 in
70% of the edges and ±1 in almost all other cases. 13% of the edges are directed, meaning
that lag-0 does not produce a significant correlation for that pair of domains. There is a
positive correlation between the degree of a domain and its physical size (the correlation
coefficient between degree and log10 (size) is 0.70 for scan-1 and 0.66 for scan-2). Further,
the network is assortative meaning that domains tend to connect to other domains of similar
degree (assortativity coefficient about 0.7 in both scans).
An important question is whether the δ-MAPS networks are consistent with what neuroscientists currently know about resting-state activity in the brain. During rest, certain
cortical regions that are collectively referred to as the Default-Mode Network (or DMN) are
persistently active across age and gender [176]. Other known resting-state networks are the
occipital (part of the visual system) and the motor/somatosensory (associated with planning
and execution of voluntary body motion). In the terminology of network theory, the previous “networks” would be referred to as communities within the larger functional brain
network. To identify communities in the δ-MAPS network, we applied OSLOM [103].
OSLOM identifies two hierarchical levels in both scans. The first level consists of highly
overlapping communities that cover almost the entire cortex. The second hierarchical level
is more interesting, resulting in eight communities for scan-1 (nine for scan-2). Fig. 59
shows the three communities (C.1, C.2, C.3) for each scan that have the highest resemblance to the three previously mentioned resting-state networks: C.1 corresponds to the
DMN, C.2 corresponds to the occipital resting-state network, and C.3 corresponds to the
motor/somatosensory network. C.1 is quite similar across the two scanning sessions and it
clearly captures the DMN. In C.2, the extent of the network is smaller in scan-2, which is
not too surprising given the known inter-scan variability of resting-state fMRI. C.3 is also
quite similar across the two scans and consistent with the motor/somatosensory network.
Figure 59: Three domain-level network communities for each scan. The first corresponds
to the default-mode network, the second to the occipital network, and the third to the
motor/somatosensory network.
To further investigate the structure of those higher degree (and typically larger) domains, we perform k-core decomposition.4 The density of the remaining network, after
the extraction of k=14 cores from the scan-1 network (k=16 cores in scan-2) shows a
sudden increase by a factor of two. This suggests that the network includes a densely
inter-connected backbone, also known as “rich-club”. The size of this backbone is small
relative to the entire network: 130 domains in scan-1 (90 in scan-2). Similar observations
about the resting-state brain, but using voxel-level network analysis methods, have been
previously reported [161]. Fig. 60 shows the location of the backbone domains for each
hemisphere and for each scan. The regions that are usually associated with the DMN dominate the backbone of both sessions. Interestingly though, the backbone of scan-1 includes the regions of
the motor/somatosensory network, while the backbone of scan-2 is missing those regions.
One possible explanation for this discrepancy is that the subject was more relaxed during
scan-2, not exerting the mental effort to stay still.
4
A process that starts with the original network (k=0) and iteratively removes all nodes of degree k or
less in each round, so that after the extraction of the k'th core all remaining nodes have degree larger than k.
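The peeling process of footnote 4, together with the density tracked across k, can be sketched as follows. The names are hypothetical and a toy graph stands in for the fMRI network.

```python
def peel(adj, k):
    """Remove, round by round, all nodes of degree <= k; return the surviving subgraph."""
    g = {u: set(vs) for u, vs in adj.items()}
    while True:
        low = [u for u, vs in g.items() if len(vs) <= k]
        if not low:
            return g
        for u in low:                        # one round of removals
            for v in g.pop(u):
                if v in g:
                    g[v].discard(u)


def density(g):
    """Edge density 2|E| / (n (n - 1)) of an undirected graph given as adjacency sets."""
    n = len(g)
    if n < 2:
        return 0.0
    n_edges = sum(len(vs) for vs in g.values()) // 2
    return 2 * n_edges / (n * (n - 1))


# Toy graph: a 4-clique {0, 1, 2, 3} with a tail 3-4-5.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4], 4: [3, 5], 5: [4]}
densities = {k: density(peel(adj, k)) for k in range(4)}
```

The jump in density from k=0 (about 0.53) to k=1 (1.0) is the toy analogue of the sudden density increase that reveals the backbone in the scans.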
Figure 60: The domains of the backbone network for each hemisphere and scan. The
color of each domain is randomly assigned (overlaps are shown in black).
5.7 Discussion
δ-MAPS results in a correlation-based functional network. A next step could be to infer a causal, or effective, network, leveraging the framework of probabilistic graphical models. Instead of attempting to learn the graph structure from raw data, one could use the δ-MAPS network as the underlying structure and then apply conditional independence tests to remove non-causal edges (e.g., [58]). Another direction could be to combine the inferred functional network with a structural network that shows the physical connectivity between the identified domains. This is straightforward in the case of communication networks, and it is also becoming feasible for brain networks using diffusion-weighted MRI. Projecting the observed dynamics onto the underlying structure can help characterize the actual function and delay of each system component.
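As an illustration of such pruning, the sketch below uses one common conditional independence test for roughly Gaussian data: the partial correlation of two domain signals given a conditioning set, tested with the Fisher z transform. This is an assumption made for illustration; [58] and other causal-discovery frameworks may use different tests. The Fisher z test also assumes independent samples, which autocorrelated time series violate, so an effective sample size would be needed in practice. The function names are hypothetical.

```python
import math

import numpy as np


def partial_corr(x, y, Z):
    """Correlation of x and y after regressing both on the columns of Z (plus an intercept)."""
    Z1 = np.column_stack([np.ones(len(x)), Z])
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


def keep_edge(x, y, Z, alpha=0.01):
    """Fisher z test of H0: partial corr(x, y | Z) = 0; keep the edge only if H0 is rejected."""
    n, k = len(x), Z.shape[1]
    r = partial_corr(x, y, Z)
    z = math.atanh(r) * math.sqrt(n - k - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return p < alpha


# Toy example: domain a drives domains b and c; b and c are linked only through a.
rng = np.random.default_rng(0)
a = rng.standard_normal(2000)
b = a + 0.5 * rng.standard_normal(2000)
c = a + 0.5 * rng.standard_normal(2000)
print(keep_edge(a, b, c[:, None]))  # a--b dependence persists after conditioning on c
print(keep_edge(b, c, a[:, None]))  # b and c are conditionally independent given a by construction
```

In a graphical-model pipeline this test would be applied to each edge of the δ-MAPS network with growing conditioning sets drawn from the edge's neighbours, removing the edge as soon as one test fails to reject independence.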
5.8 Identifying the largest domain is NP-complete
We are given a spatio-temporal field X(t) on a grid G, a similarity metric between pairs of grid cells, and a threshold δ. Starting from a grid cell c, the goal is to find the largest subset of grid cells that forms a single spatially connected component and whose average pairwise similarity exceeds the threshold δ. The spatial grid can be represented as a planar graph G(V, E) where each grid cell is a node and edges connect adjacent grid cells. Formally, we have the following graph optimization problem:
Definition 1. Rooted Largest Connected δ-Dense Subgraph Problem (rooted LCδDS). Given a regular (grid) graph G(V, E), a symmetric weight function $w : V \times V \to \mathbb{R}$ with $w(v, v) = 0$, a threshold δ, and a node $c \in V$, find a maximum cardinality set of nodes $A \subseteq V$ such that $c \in A$, the induced subgraph is connected ($I_G(A) = 1$), and
$$\hat{r}(A) = \frac{\sum_{v,u \in A} w(v,u)}{|A|(|A|-1)} > \delta.$$
To show that rooted LCδDS is NP-hard, we first consider an unrooted variant of the problem, in which the set A only has to satisfy two conditions: its induced subgraph has to be connected in G, and the average weight of the edges in A has to exceed δ. More formally:
Definition 2. Largest Connected δ-Dense Subgraph Problem (LCδDS). Given a regular (grid) graph G(V, E), a symmetric weight function $w : V \times V \to \mathbb{R}$ with $w(v, v) = 0$, and a threshold δ, find a maximum cardinality set of nodes $A \subseteq V$ such that $I_G(A) = 1$ and $\hat{r}(A) > \delta$.
To show that LCδDS is NP-hard, we use a reduction from the densest connected k-subgraph problem.
Definition 3. Densest Connected k-Subgraph Problem (DCkS). Decision version:
Given a graph G(V, E), and positive integers k and j, does there exist an induced subgraph on k vertices such that this subgraph has at least j edges and is connected?
DCkS (also referred to as the connected h-clustering problem) has been shown to be NP-complete on general graphs [42], as well as on planar graphs [96]. DCkS is polynomial-time solvable for subclasses of planar graphs of bounded tree width [12]. Grid graphs, which are the type of graphs that arise in our application domains, are planar bipartite graphs whose tree width is not bounded (it grows with the grid dimensions), and no positive results are known for this subclass of planar graphs. Work on approximating densest/heaviest connected k-subgraphs is very limited (see the recent theoretical result in [36]). The DCkS problem can be easily reduced to an instance of the decision version of the LCδDS problem, and hence the latter is also NP-complete, even on planar graphs.
LEMMA 1. The decision version of the LCδDS problem is NP-complete on planar graphs.
PROOF. This can be shown via a reduction from DCkS. We reduce an instance $\langle G, k, j \rangle$ of DCkS to an LCδDS instance by using the same graph G, setting $w(u, v) = \mathbb{1}[(u, v) \in E]$ (i.e., $w(u, v) = 1$ if and only if the pair of nodes is connected by an edge), and $\delta = j / (k(k-1))$.
Now it is easy to show that rooted LCδDS is also NP-hard. If a polynomial-time algorithm existed for rooted LCδDS, then by calling it |V| times, once with each node of the graph as the root c, and keeping the largest set returned, we would obtain in polynomial time a solution to the NP-hard LCδDS.
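For intuition, both problems can be solved exactly on very small graphs by exhaustive search, which also makes the |V|-calls argument concrete. The sketch below is exponential in |V| and only illustrates Definitions 1 and 2; it is not a practical algorithm, which is precisely the point of the hardness result. It assumes a symmetric weight function and computes r̂(A) as the mean weight over pairs of distinct nodes (equal to the definition's ordered-pair sum divided by |A|(|A|−1)). The function names and the toy 2 × 3 grid are hypothetical.

```python
from itertools import combinations


def grid_graph(rows, cols):
    """4-neighbour grid graph as {(r, c): set of adjacent cells}."""
    adj = {(r, c): set() for r in range(rows) for c in range(cols)}
    for r, c in adj:
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in adj:
                adj[(r, c)].add(nb)
                adj[nb].add((r, c))
    return adj


def r_hat(A, w):
    """Mean of w(u, v) over pairs of distinct nodes in A (requires |A| >= 2, w symmetric)."""
    pairs = list(combinations(A, 2))
    return sum(w(u, v) for u, v in pairs) / len(pairs)


def is_connected(A, adj):
    """I_G(A) = 1: the subgraph induced by A is connected (depth-first search inside A)."""
    A = set(A)
    start = next(iter(A))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in A and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == A


def rooted_lcdds(adj, w, delta, c):
    """Exhaustive rooted LCδDS: a largest connected A with c in A and r_hat(A) > delta."""
    others = [v for v in adj if v != c]
    for size in range(len(others), 0, -1):  # largest candidate sets first
        for rest in combinations(others, size):
            A = (c,) + rest
            if is_connected(A, adj) and r_hat(A, w) > delta:
                return set(A)
    return None  # no set of two or more nodes qualifies


def lcdds(adj, w, delta):
    """Unrooted LCδDS via |V| calls to the rooted solver, keeping the largest answer."""
    best = None
    for c in adj:
        A = rooted_lcdds(adj, w, delta, c)
        if A is not None and (best is None or len(A) > len(best)):
            best = A
    return best


# Toy field on a 2 x 3 grid: the two left columns behave alike, the right column differs.
adj = grid_graph(2, 3)
group = {v: 0 if v[1] < 2 else 1 for v in adj}


def w(u, v):
    """Hypothetical similarity: 1 if both cells belong to the same group, else 0."""
    return 1.0 if group[u] == group[v] else 0.0


print(sorted(lcdds(adj, w, 0.5)))
```

On the toy grid the whole field fails the threshold (average 7/15 < 0.5), and every largest feasible set keeps the four left cells plus one cell of the right column (average 0.6).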
5.9 Heuristic for the selection of δ
The threshold δ intuitively determines the minimum degree of homogeneity that the underlying field must have within each domain. The higher the threshold, the higher the required homogeneity and, therefore, the smaller the identified domains.
To select δ, we propose the following heuristic. We start with a random sample of pairs of grid cells, and for each pair (i, j) we compute the Pearson correlation $r_{i,j}$ at zero lag. To assess the significance of each correlation we use Bartlett's formula [26]. Under the null hypothesis of no coupling, $r_{i,j}$ should have zero mean, and a reasonable estimate of its variance is given by
$$\mathrm{Var}[r_{i,j}] = \frac{1}{T} \sum_{\tau_k=-T}^{T} r_{i,i}(\tau_k)\, r_{j,j}(\tau_k), \qquad (19)$$
where $r_{i,i}(\tau_k)$ is the autocorrelation of the time series of grid cell i at lag $\tau_k$. The scaled values
$$z_{i,j} = \frac{r_{i,j}}{\sqrt{\mathrm{Var}[r_{i,j}]}}$$
should approximately follow a standard normal distribution. To assess the significance of each correlation we perform a one-sided z-test for a given level of significance α.
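A minimal sketch of this test in Python with NumPy follows. Two choices are assumptions not fixed by the text above: the normalizing T in Eq. (19) is taken to be the series length, and the lag sum is truncated at a user-chosen `max_lag`, since summing up to lag T would rely on very poorly estimated autocorrelations. The test is one-sided, for positive correlations. The function names are hypothetical.

```python
import math

import numpy as np


def autocorr(x, max_lag):
    """Sample autocorrelation r(tau) for tau = 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - t], x[t:]) / denom for t in range(max_lag + 1)])


def bartlett_var(xi, xj, max_lag):
    """Eq. (19): (1/T) * sum over tau in [-max_lag, max_lag] of r_ii(tau) * r_jj(tau)."""
    prod = autocorr(xi, max_lag) * autocorr(xj, max_lag)
    # Autocorrelations are symmetric in tau, so the negative lags double the positive ones.
    return (prod[0] + 2.0 * prod[1:].sum()) / len(xi)


def z_test(xi, xj, alpha=0.05, max_lag=20):
    """Zero-lag Pearson r_ij, its z-score, and the one-sided significance decision."""
    r = float(np.corrcoef(xi, xj)[0, 1])
    z = r / math.sqrt(bartlett_var(xi, xj, max_lag))
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal Z
    return r, z, p < alpha


rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = x + 0.1 * rng.standard_normal(500)
print(z_test(x, y))
```

For positively autocorrelated series the product terms at nonzero lags inflate the variance, so a larger correlation is needed to reach significance than for white noise, where the variance reduces to roughly 1/T.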
The threshold δ is set as the average of all significant correlations. Since a domain is a set of spatially contiguous grid cells, we require the mean pairwise correlation of the cells belonging to the same domain to be higher than the mean correlation of significantly correlated, randomly picked pairs of grid cells. δ depends on the choice of the significance level α, on the autocorrelation structure of the underlying time series, and on the correlation distribution of the field.
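The selection rule can be sketched as follows, assuming the zero-lag correlations and their Bartlett-scaled z-scores (Eq. (19)) have already been computed for the sampled pairs. The four input values below are made-up numbers chosen only to show how the choice of α shifts δ; the function names are hypothetical.

```python
import random
from statistics import NormalDist


def sample_pairs(n_cells, n_pairs, seed=0):
    """Random pairs (i, j) of distinct grid-cell indices."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(n_cells), 2)) for _ in range(n_pairs)]


def select_delta(r, z, alpha):
    """delta = mean of the sampled correlations whose one-sided z-test is significant."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    significant = [ri for ri, zi in zip(r, z) if zi > z_crit]
    if not significant:
        raise ValueError("no significant correlations in the sample")
    return sum(significant) / len(significant)


# Hypothetical correlations and z-scores for four sampled pairs.
r = [0.10, 0.50, 0.60, 0.05]
z = [1.0, 4.0, 5.0, 0.5]
print(select_delta(r, z, alpha=0.01))
print(select_delta(r, z, alpha=0.5))
```

A stricter α admits only the strongest correlations and therefore raises δ (0.55 versus 0.3125 in this toy sample), which in turn yields smaller, more homogeneous domains.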
5.10 δ-MAPS pseudocode
Chapter VI
CONCLUSIONS & FUTURE WORK
6.1 Conclusions
In this thesis we propose a framework for the analysis of spatio-temporal systems based on complex network analysis. The proposed framework consists of two methods, geo-Cluster and δ-MAPS, whose goal is to uncover the semi-autonomous functional components of a spatio-temporal system and to infer their interactions.
The first method, geo-Cluster, identifies the functional components of the system, referred to as “areas”, and models their interconnections as a complete and weighted network. An area is a spatially contiguous, non-overlapping set of grid cells that conforms to a homogeneity constraint. This constraint requires the average pairwise correlation between the grid cells in an area to be larger than a predefined threshold, which is the only parameter of the proposed algorithm. The requirement of a single parameter, combined with the fact that no link pruning is imposed on the underlying cell-level network, makes the network structure more robust and the comparison of different networks more reliable. In a second step, we infer a network between these areas. The network is modeled as a complete and weighted graph. The weight of an edge, measured as the covariance between the time series of the two corresponding areas, captures the magnitude of the interaction between the functional components of the system.
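The two ingredients of geo-Cluster summarized above, the homogeneity test for an area and the covariance-weighted complete network, can be sketched with NumPy. This is a minimal sketch, not the geo-Cluster implementation: it stores grid-cell anomaly time series as the columns of an array and, as an assumption not stated in this chapter, represents each area's time series as the sum of its cells' anomalies. The greedy area-growing step is omitted, and the names are hypothetical.

```python
import numpy as np


def mean_pairwise_corr(X, cells):
    """Average Pearson correlation over pairs of distinct cells; X has shape (T, n_cells)."""
    C = np.corrcoef(X[:, cells], rowvar=False)
    k = len(cells)
    return (C.sum() - k) / (k * (k - 1))  # drop the k ones on the diagonal


def is_homogeneous(X, cells, tau):
    """Homogeneity constraint: the area's average pairwise correlation exceeds tau."""
    return mean_pairwise_corr(X, cells) > tau


def area_network(X, areas):
    """Complete weighted network: entry (a, b) is the covariance of the two area series."""
    S = np.column_stack([X[:, cells].sum(axis=1) for cells in areas])
    return np.cov(S, rowvar=False)


rng = np.random.default_rng(2)
s = rng.standard_normal(1000)
noise = 0.3 * rng.standard_normal((1000, 4))
X = np.column_stack([s, s, -s, -s]) + noise  # cells 0-1 and 2-3 form anti-correlated areas
W = area_network(X, [[0, 1], [2, 3]])
print(is_homogeneous(X, [0, 1], 0.5), is_homogeneous(X, [0, 2], 0.5), W[0, 1])
```

The diagonal of `W` holds each area's variance, and the off-diagonal entry is negative here because the two synthetic areas vary in opposition.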
The proposed method is robust to noise, to the resolution of the spatio-temporal data set, to the measure that quantifies similarities between the grid-cell time series, and to perturbations of the homogeneity parameter.
The second method, δ-MAPS, allows the functional components of the system (referred to as “domains”) to overlap and accounts for non-instantaneous interactions between them. δ-MAPS is based on the premise that the functional relation between the grid cells of a domain results in highly correlated temporal activity. To this end, it first identifies the “epicenter” or “core” of a domain as a point (or set of points) where the local homogeneity is maximum across the entire domain. Instead of searching for the discrete boundary of a domain, which may not exist in reality, we compute a domain as the maximum possible set of spatially contiguous cells that includes the detected core and satisfies a homogeneity constraint, expressed in terms of the average pairwise cross-correlation across the domain. In a second step, δ-MAPS infers a functional network. Different domains may have correlated activity, potentially at a lag, because of direct or indirect interactions. The proposed edge inference method examines the statistical significance of each lagged cross-correlation between two domains, applies a multiple-testing process to control the rate of false positives, infers a range of potential lag values for each edge, and assigns a weight to each edge based on the covariance of the corresponding two domains.
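The core-then-grow idea can be illustrated with a deliberately simplified sketch. It is not the δ-MAPS pseudocode of Section 5.10: here the core is assumed to be the single cell whose average correlation with its four grid neighbours is highest, together with those neighbours, and the domain grows greedily by adding the adjacent cell that keeps the domain's average pairwise correlation highest, stopping once no addition keeps it above δ. Because each domain would be grown independently from its own core, domains found this way may overlap. All names are hypothetical.

```python
import numpy as np


def neighbours(r, c, shape):
    """4-neighbourhood of grid cell (r, c) inside a grid of the given shape."""
    for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= rr < shape[0] and 0 <= cc < shape[1]:
            yield rr, cc


def mean_corr(C, idx):
    """Average pairwise correlation among the flattened cell indices in idx (len >= 2)."""
    sub = C[np.ix_(idx, idx)]
    k = len(idx)
    return (sub.sum() - k) / (k * (k - 1))


def find_core(C, shape):
    """Hypothetical core: the cell most correlated with its neighbours, plus those neighbours."""
    cols = shape[1]
    best, best_h = None, -np.inf
    for r in range(shape[0]):
        for c in range(cols):
            nb = [rr * cols + cc for rr, cc in neighbours(r, c, shape)]
            h = C[r * cols + c, nb].mean()
            if h > best_h:
                best, best_h = (r, c), h
    r, c = best
    return [r * cols + c] + [rr * cols + cc for rr, cc in neighbours(r, c, shape)]


def grow_domain(C, shape, core, delta):
    """Greedily add the adjacent cell that keeps the average correlation highest, while > delta."""
    cols = shape[1]
    domain = list(core)
    if mean_corr(C, domain) <= delta:
        return []  # the core itself is not homogeneous enough
    while True:
        members = set(domain)
        frontier = {rr * cols + cc for i in domain
                    for rr, cc in neighbours(*divmod(i, cols), shape)} - members
        best, best_score = None, delta
        for j in frontier:
            score = mean_corr(C, domain + [j])
            if score > best_score:
                best, best_score = j, score
        if best is None:
            return sorted(domain)
        domain.append(best)


rng = np.random.default_rng(3)
T, shape = 2000, (6, 6)
X = rng.standard_normal((T, shape[0] * shape[1]))
signal = rng.standard_normal(T)
block = [r * 6 + c for r in range(3) for c in range(3)]  # a 3 x 3 corner "domain"
X[:, block] = signal[:, None] + 0.2 * X[:, block]
C = np.corrcoef(X, rowvar=False)
print(grow_domain(C, shape, find_core(C, shape), delta=0.8))
```

On this synthetic field the grown domain should be the 3 × 3 block of cells that share a common signal; the surrounding independent cells lower the average correlation too much to be added.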
δ-MAPS does not require the number of domains as an input parameter, the resulting domains are spatially contiguous and potentially overlapping, and the inferred connections between domains can be lagged and positively or negatively weighted. Further, the distinction between grid cells that are correlated within the same domain and grid cells that are correlated across two distinct domains allows δ-MAPS to distinguish local diffusion (or dispersion) phenomena from remote interactions that may be due to underlying structural connections (e.g., a white-matter fiber between two brain regions).
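The edge-inference step can be sketched as follows. The sketch makes several simplifying assumptions that differ from, or are not specified by, the text above: the null variance of each lagged correlation is taken as 1/n (white noise) rather than from Bartlett's formula, the multiple-testing step is the Benjamini–Hochberg false discovery rate procedure [19], the lag range is the span of significant lags, and the weight is the covariance at the lag with the strongest correlation. All function names are hypothetical.

```python
import math

import numpy as np


def align(x, y, tau):
    """Pairs x(t) with y(t + tau) over the overlapping samples."""
    if tau >= 0:
        return x[: len(x) - tau], y[tau:]
    return x[-tau:], y[: len(y) + tau]


def benjamini_hochberg(pvals, q):
    """Boolean mask of the hypotheses rejected at false discovery rate q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    passed = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[: passed[-1] + 1]] = True  # every hypothesis up to the largest passing rank
    return reject


def infer_edges(signals, max_lag, q=0.05):
    """signals: {name: 1-D domain time series}. Returns {(a, b): ((min_lag, max_lag), weight)}."""
    names = sorted(signals)
    tests = []  # (a, b, tau, r, p-value)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for tau in range(-max_lag, max_lag + 1):
                xa, yb = align(signals[a], signals[b], tau)
                r = float(np.corrcoef(xa, yb)[0, 1])
                z = r * math.sqrt(len(xa))  # assumes Var[r] = 1/n under the null
                tests.append((a, b, tau, r, math.erfc(abs(z) / math.sqrt(2))))
    reject = benjamini_hochberg([t[4] for t in tests], q)
    hits = {}
    for (a, b, tau, r, _), sig in zip(tests, reject):
        if sig:
            hits.setdefault((a, b), []).append((tau, r))
    edges = {}
    for (a, b), lagged in hits.items():
        lags = [tau for tau, _ in lagged]
        best_tau = max(lagged, key=lambda h: abs(h[1]))[0]
        xa, yb = align(signals[a], signals[b], best_tau)
        edges[(a, b)] = ((min(lags), max(lags)), float(np.cov(xa, yb)[0, 1]))
    return edges


rng = np.random.default_rng(4)
x = rng.standard_normal(1000)
y = -np.roll(x, 3) + 0.5 * rng.standard_normal(1000)  # y(t) ~ -x(t - 3): lagged, negative link
w = rng.standard_normal(1000)  # unrelated domain
print(infer_edges({"x": x, "y": y, "w": w}, max_lag=5))
```

The toy example should recover a negatively weighted x–y edge whose lag range contains τ = 3, illustrating a lagged, signed connection of the kind described above.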
δ-MAPS is more than a generalization of geo-Cluster that allows for overlapping functional domains and accounts for lagged interactions between them. The greedy heuristics of geo-Cluster force each grid cell to belong to an area. Further, once a grid cell is assigned to an area it cannot belong to any other area, potentially limiting the scope of subsequent areas. This leads to a stronger path dependency (and thus lower robustness) compared to the approach taken by the δ-MAPS algorithm.
The proposed framework has been applied in the fields of climate science and neuroscience. In the context of climate, we present applications of geo-Cluster to identify well-known climate shifts and to construct networks between different climate fields. Using geo-Cluster, we performed an extensive study analyzing twelve state-of-the-art climate model ensembles from the CMIP5 output. Using two distance metrics, a network distance D and the adjusted Rand index (ARI), we are able to rank the models in terms of their ability to reproduce the climate of the past, as well as to quantify the variability between different members of the same model ensemble. When investigating the model trajectories in the future, under a global warming scenario, we found that the uncertainty in the model trajectories is larger than the uncertainty in the superimposed trends.
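For reference, the adjusted Rand index between two partitions of the same grid cells (for example, the areas identified in a model run and in a reanalysis data set) can be computed from the contingency table of the two labelings, following the pair-counting formula of Hubert and Arabie [86]. This is a minimal sketch; returning 1.0 when both partitions are trivial (so the index is undefined) is our assumed convention.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI of two partitions given as equal-length label sequences over the same cells."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labelings
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_ij - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabeled
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7, about 0.571
```

Unlike the raw Rand index, the adjustment subtracts the agreement expected by chance, so unrelated partitions score near 0 regardless of the number of areas.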
Using δ-MAPS, we analyzed the temporal relationships between different functional components of the climate system in the sea surface temperature field. We found that the proposed method successfully uncovered many well-known climate teleconnections and the lags associated with them. In the context of neuroscience, we performed a single-subject analysis focusing on resting-state fMRI data. We found that the proposed method was able to uncover many of the well-known resting-state networks. We also show how the method identifies a small number of strongly interconnected areas forming the backbone of the resting-state network. Using synthetic data, we also show how δ-MAPS overcomes limitations of traditional dimensionality reduction techniques such as PCA/ICA, clustering and community detection.
6.2 Future Work
Climate Networks Over Time. The proposed method can be naturally extended to construct networks over time (e.g., using a sliding window approach). To this end, we are interested in observing the trajectories of the functional components of the system, as expressed by their strength and size. Specifically, in the context of climate, we are interested in focusing on the dynamics of ENSO and identifying “tipping” points at which its dynamics change. Such tipping points can have a global impact on the climate system.
Climate Models and Controlled Perturbation Experiments. In the context of climate, we are interested in using the network framework to evaluate how perturbations imposed on a model's parameters propagate to the climate scale. We would first like to identify the regions that are the most (or the least) affected by the perturbations, the time scale of the propagation, and the implications for teleconnections.
Effective Connectivity. In this thesis we have limited our analysis to functional networks. A next step could be to infer a causal, or effective [130], network, leveraging the framework of probabilistic graphical models [57, 101, 128]. Instead of attempting to learn the graph structure from raw data, one could use the identified spatial components as the underlying structure and then apply conditional independence tests to remove non-causal edges.
Dynamic Networks Using Contextual Time Series Detection. A problem that arises with fMRI measurements is that they are sampled over an extended recording period. Most fMRI studies require the subject to remain at rest throughout this period, which cannot be guaranteed in practice. When we measure cross-correlations over the whole measurement period, abrupt changes in the signal may be averaged out [151]. One proposed solution, which makes such dynamic changes identifiable, is to construct temporal networks using a sliding window approach [33]. However, the results are highly dependent on the chosen window length. An alternative direction would be to automatically detect changes between two time series [34, 35]. Such changes can be tracked at the voxel level. However, due to the high amount of noise in fMRI data, a better approach would be to track such changes at the functional domain level.
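A sliding-window estimate of the coupling between two domain-level signals is easy to sketch, and it makes the dependence on the window length explicit. This is a generic illustration with arbitrarily chosen window and step values, not the method of [33] or the change-detection approaches of [34, 35]; the function name is hypothetical.

```python
import numpy as np


def sliding_window_corr(x, y, window, step):
    """Pearson correlation of x and y in windows of `window` samples, advanced by `step`."""
    starts = range(0, len(x) - window + 1, step)
    return np.array([np.corrcoef(x[s:s + window], y[s:s + window])[0, 1] for s in starts])


rng = np.random.default_rng(5)
x = rng.standard_normal(600)
y = rng.standard_normal(600)
y[:300] = x[:300] + 0.3 * rng.standard_normal(300)  # coupled only in the first half
for window in (50, 100, 200):
    print(window, np.round(sliding_window_corr(x, y, window, step=50), 2))
```

Short windows localize the change in coupling more sharply but give noisier estimates, while long windows smooth across it; this trade-off is the window-length sensitivity mentioned above.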
Structural-Functional Networks. Another direction could be to combine the inferred functional network with a structural network that shows the physical connectivity between the identified domains. This is straightforward in the case of communication networks, and it is also becoming feasible for brain networks using diffusion-weighted MRI. Projecting the observed dynamics onto the underlying structure can help characterize the actual function and delay of each system component.
Extensions to Other Spatio-temporal Data. The applications of the proposed framework are not limited to climate science and neuroscience. For example, we propose to apply the framework to data describing species migration patterns (see e.g., [91]). By understanding the processes that drive such patterns we can mitigate risks to populations due to climate change, urban expansion and many other factors. In such a context, the functional components that we identify would correspond to migratory regions, and the edges between the identified regions could uncover pathways of population movement.
REFERENCES
[1] Abramov, R. V. and Majda, A. J., “A new algorithm for low-frequency climate response,” Journal of the Atmospheric Sciences, vol. 66, no. 2, pp. 286–309, 2009.
[2] Adams, R. and Bischof, L., “Seeded region growing,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 16, no. 6, pp. 641–647, 1994.
[3] Ahlgrimm, M. and Forbes, R., “The impact of low clouds on surface shortwave radiation in the ECMWF model,” Monthly Weather Review, vol. 140, no. 11, pp. 3783–3794, 2012.
[4] Ahn, Y.-Y., Bagrow, J. P., and Lehmann, S., “Link communities reveal multiscale complexity in networks,” Nature, vol. 466, no. 7307, pp. 761–764, 2010.
[5] Akhshabi, S. and Dovrolis, C., “The evolution of layered protocol stacks leads to an hourglass-shaped architecture,” in Dynamics On and Of Complex Networks, Volume 2, pp. 55–88, Springer, 2013.
[6] Akritas, M. G., Murphy, S. A., and LaValley, M. P., “The Theil-Sen estimator with doubly censored data and applications to astronomy,” Journal of the American Statistical Association, vol. 90, no. 429, pp. 170–177, 1995.
[7] Albert, R. and Barabási, A.-L., “Statistical mechanics of complex networks,” Reviews of modern physics, vol. 74, no. 1, p. 47, 2002.
[8] Alexander-Bloch, A., Lambiotte, R., Roberts, B., Giedd, J., Gogtay, N., and Bullmore, E., “The discovery of population differences in network community structure: new methods and applications to brain functional networks in schizophrenia,” Neuroimage, vol. 59, no. 4, pp. 3889–3900, 2012.
[9] Allan, R., Lindesay, J., Parker, D., and others, El Niño southern oscillation & climatic variability. CSIRO publishing, 1996.
[10] Allen, M. R. and Smith, L. A., “Investigating the origins and significance of low-frequency modes of climate variability,” Geophysical Research Letters, vol. 21, no. 10, pp. 883–886, 1994.
[11] Andronova, N. G. and Schlesinger, M. E., “Objective estimation of the probability density function for climate sensitivity,” Journal of Geophysical Research: Atmospheres, vol. 106, no. D19, pp. 22605–22611, 2001.
[12] Arnborg, S., Lagergren, J., and Seese, D., “Easy problems for tree-decomposable graphs,” Journal of Algorithms, vol. 12, no. 2, pp. 308–340, 1991.
[13] Arthur, D. and Vassilvitskii, S., “k-means++: The advantages of careful seeding,” in Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 1027–1035, Society for Industrial and Applied Mathematics, 2007.
[14] Baldassano, C., Beck, D. M., and Fei-Fei, L., “Parcellating connectivity in spatial maps,” PeerJ, vol. 3, p. e784, 2015.
[15] Bandukwala, F., “Extracting spatially and spectrally coherent regions from multispectral images,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 82–87, IEEE, 2011.
[16] Barthélemy, M., “Spatial networks,” Physics Reports, vol. 499, no. 1, pp. 1–101, 2011.
[17] Bellec, P., Perlbarg, V., Jbabdi, S., Pélégrini-Issac, M., Anton, J.-L., Doyon, J., and Benali, H., “Identification of large-scale networks in the brain using fMRI,” Neuroimage, vol. 29, no. 4, pp. 1231–1243, 2006.
[18] Bellenger, H., Guilyardi, É., Leloup, J., Lengaigne, M., and Vialard, J., “ENSO representation in climate models: from CMIP3 to CMIP5,” Climate Dynamics, vol. 42, no. 7-8, pp. 1999–2018, 2014.
[19] Benjamini, Y. and Hochberg, Y., “Controlling the false discovery rate: A practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 289–300, 1995.
[20] Berezin, Y., Gozolchiani, A., Guez, O., and Havlin, S., “Stability of climate networks with time,” Scientific reports, vol. 2, 2012.
[21] Betzel, R. F., Byrge, L., He, Y., Goñi, J., Zuo, X.-N., and Sporns, O., “Changes in structural and functional connectivity among resting-state networks across the human lifespan,” NeuroImage, vol. 102, pp. 345–357, 2014.
[22] Birant, D. and Kut, A., “ST-DBSCAN: An algorithm for clustering spatial–temporal data,” Data & Knowledge Engineering, vol. 60, no. 1, pp. 208–221, 2007.
[23] Blumensath, T., Behrens, T. E., and Smith, S. M., “Resting-state fMRI single subject cortical parcellation based on region growing,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2012, pp. 188–195, Springer, 2012.
[24] Blumensath, T., Jbabdi, S., Glasser, M. F., Van Essen, D. C., Ugurbil, K., Behrens, T. E., and Smith, S. M., “Spatially constrained hierarchical parcellation of the brain with resting-state fMRI,” Neuroimage, vol. 76, pp. 313–324, 2013.
[25] Boers, N., Bookhagen, B., Marwan, N., Kurths, J., and Marengo, J., “Complex networks identify spatial patterns of extreme rainfall events of the South American monsoon system,” Geophysical Research Letters, vol. 40, no. 16, pp. 4386–4392, 2013.
[26] Box, G. E., Jenkins, G. M., and Reinsel, G. C., Time series analysis: forecasting and control, vol. 734. John Wiley & Sons, 2011.
[27] Bracco, A., Kucharski, F., Molteni, F., Hazeleger, W., and Severijns, C., “Internal and forced modes of variability in the Indian Ocean,” Geophysical research letters, vol. 32, no. 12, 2005.
[28] Bullmore, E. and Sporns, O., “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature Reviews Neuroscience, vol. 10, no. 3, pp. 186–198, 2009.
[29] Cai, W., Borlace, S., Lengaigne, M., van Rensch, P., Collins, M., Vecchi, G., Timmermann, A., Santoso, A., McPhaden, M. J., Wu, L., and others, “Increasing frequency of extreme El Niño events due to greenhouse warming,” Nature Climate Change, vol. 4, no. 2, pp. 111–116, 2014.
[30] Carré, M., Sachs, J. P., Purca, S., Schauer, A. J., Braconnot, P., Falcón, R. A., Julien, M., and Lavallée, D., “Holocene history of ENSO variance and asymmetry in the eastern tropical Pacific,” Science, vol. 345, no. 6200, pp. 1045–1048, 2014.
[31] Carton, J. A. and Giese, B. S., “A reanalysis of ocean climate using simple ocean data assimilation (SODA),” Monthly Weather Review, vol. 136, no. 8, pp. 2999–3017, 2008.
[32] Chambers, D., Tapley, B., and Stewart, R., “Anomalous warming in the Indian Ocean coincident with El Niño,” Journal of Geophysical Research: Oceans, vol. 104, no. C2, pp. 3035–3047, 1999.
[33] Chang, C. and Glover, G. H., “Time–frequency dynamics of resting-state brain connectivity measured with fMRI,” Neuroimage, vol. 50, no. 1, pp. 81–98, 2010.
[34] Chen, X. C., Mueen, A., Narayanan, V. K., Karampatziakis, N., Bansal, G., and Kumar, V., “Online discovery of group level events in time series,” in SDM, pp. 632–640, SIAM, 2014.
[35] Chen, X. C., Steinhaeuser, K., Boriah, S., Chatterjee, S., and Kumar, V., “Contextual time series change detection,” in SDM, pp. 503–511, SIAM, 2013.
[36] Chen, X., Hu, X., and Wang, C., “Finding connected dense k-subgraphs,” in Theory and Applications of Models of Computation, pp. 248–259, Springer, 2015.
[37] Cobb, K. M., Westphal, N., Sayani, H. R., Watson, J. T., Di Lorenzo, E., Cheng, H., Edwards, R., and Charles, C. D., “Highly variable El Niño–Southern Oscillation throughout the Holocene,” Science, vol. 339, no. 6115, pp. 67–70, 2013.
[38] Cohen, A. L., Fair, D. A., Dosenbach, N. U., Miezin, F. M., Dierker, D., Van Essen, D. C., Schlaggar, B. L., and Petersen, S. E., “Defining functional areas in individual human brains using resting functional connectivity MRI,” Neuroimage, vol. 41, no. 1, pp. 45–57, 2008.
[39] Collins, M., An, S.-I., Cai, W., Ganachaud, A., Guilyardi, E., Jin, F.-F., Jochum, M., Lengaigne, M., Power, S., Timmermann, A., and others, “The impact of global warming on the tropical Pacific Ocean and El Niño,” Nature Geoscience, vol. 3, no. 6, pp. 391–397, 2010.
[40] Collins, M. and others, “El Niño- or La Niña-like climate change?,” Climate Dynamics, vol. 24, no. 1, pp. 89–104, 2005.
[41] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C., Introduction to algorithms, vol. 6. MIT press Cambridge, 2001.
[42] Corneil, D. G. and Perl, Y., “Clustering and domination in perfect graphs,” Discrete Applied Mathematics, vol. 9, no. 1, pp. 27–39, 1984.
[43] Corti, S., Giannini, A., Tibaldi, S., and Molteni, F., “Patterns of low-frequency variability in a three-level quasi-geostrophic model,” Climate Dynamics, vol. 13, no. 12, pp. 883–904, 1997.
[44] Craddock, R. C., James, G. A., Holtzheimer, P. E., Hu, X. P., and Mayberg, H. S., “A whole brain fMRI atlas generated via spatially constrained spectral clustering,” Human brain mapping, vol. 33, no. 8, pp. 1914–1928, 2012.
[45] Crossley, N. A., Mechelli, A., Scott, J., Carletti, F., Fox, P. T., McGuire, P., and Bullmore, E. T., “The hubs of the human connectome are generally implicated in the anatomy of brain disorders,” Brain, vol. 137, no. 8, pp. 2382–2395, 2014.
[46] Dee, D., Uppala, S., Simmons, A., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M., Balsamo, G., Bauer, P., and others, “The ERA-Interim reanalysis: Configuration and performance of the data assimilation system,” Quarterly Journal of the Royal Meteorological Society, vol. 137, no. 656, pp. 553–597, 2011.
[47] Deng, Y. and Ebert-Uphoff, I., “Weakening of atmospheric information flow in a warming climate in the Community Climate System Model,” Geophysical Research Letters, vol. 41, no. 1, pp. 193–200, 2014.
[48] Deser, C., Phillips, A. S., Tomas, R. A., Okumura, Y. M., Alexander, M. A., Capotondi, A., Scott, J. D., Kwon, Y.-O., and Ohba, M., “ENSO and Pacific decadal variability in the Community Climate System Model version 4,” Journal of Climate, vol. 25, no. 8, pp. 2622–2651, 2012.
[49] Dijkstra, H. A., Nonlinear physical oceanography: a dynamical systems approach to the large scale ocean circulation and El Niño, vol. 28. Springer Science & Business Media, 2005.
[50] Dommenget, D. and Latif, M., “A cautionary note on the interpretation of EOFs,” Journal of Climate, vol. 15, no. 2, pp. 216–225, 2002.
[51] Donges, J. F., Schultz, H. C., Marwan, N., Zou, Y., and Kurths, J., “Investigating the topology of interacting networks,” The European Physical Journal B, vol. 84, no. 4, pp. 635–651, 2011.
[52] Donges, J. F., Zou, Y., Marwan, N., and Kurths, J., “The backbone of the climate network,” EPL (Europhysics Letters), vol. 87, no. 4, p. 48007, 2009.
[53] Donges, J. F., Zou, Y., Marwan, N., and Kurths, J., “Complex networks in climate dynamics,” The European Physical Journal Special Topics, vol. 174, no. 1, pp. 157–179, 2009.
[54] Downar, J., Crawley, A. P., Mikulis, D. J., and Davis, K. D., “A multimodal cortical network for the detection of changes in the sensory environment,” Nature neuroscience, vol. 3, no. 3, pp. 277–283, 2000.
[55] Duque, J. C., Ramos, R., and Suriñach, J., “Supervised regionalization methods: A survey,” International Regional Science Review, vol. 30, no. 3, pp. 195–220, 2007.
[56] Easley, D. and Kleinberg, J., Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press, 2010.
[57] Ebert-Uphoff, I. and Deng, Y., “Causal discovery for climate research using graphical models,” Journal of Climate, vol. 25, no. 17, pp. 5648–5665, 2012.
[58] Ebert-Uphoff, I. and Deng, Y., “Causal discovery from spatio-temporal data with applications to climate science,” in Machine Learning and Applications (ICMLA), 2014 13th International Conference on, pp. 606–613, IEEE, 2014.
[59] Evans, T. and Lambiotte, R., “Line graphs, link partitions, and overlapping communities,” Physical Review E, vol. 80, no. 1, p. 016105, 2009.
[60] Faghmous, J. H. and Kumar, V., “Spatio-temporal data mining for climate data: Advances, challenges, and opportunities,” in Data Mining and Knowledge Discovery for Big Data, pp. 83–116, Springer, 2014.
[61] Feldhoff, J. H., Lange, S., Volkholz, J., Donges, J. F., Kurths, J., and Gerstengarbe, F.-W., “Complex networks for climate model evaluation with application to statistical versus dynamical modeling of South American climate,” Climate Dynamics, vol. 44, no. 5-6, pp. 1567–1581, 2015.
[62] Forest, C. E., Stone, P. H., Sokolov, A. P., Allen, M. R., and Webster, M. D., “Quantifying uncertainties in climate system properties with the use of recent climate observations,” Science, vol. 295, no. 5552, pp. 113–117, 2002.
[63] Fornito, A., Zalesky, A., and Breakspear, M., “The connectomics of brain disorders,” Nature Reviews Neuroscience, vol. 16, no. 3, pp. 159–172, 2015.
[64] Fortunato, S., “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
[65] Fountalis, I., Bracco, A., Dilkina, B., Dovrolis, C., and Keilholz, S., “δ-MAPS: From spatio-temporal data to a weighted and lagged network between functional domains,” arXiv preprint arXiv:1602.07249, 2016.
[66] Fountalis, I., Bracco, A., and Dovrolis, C., “Spatio-temporal network analysis for studying climate patterns,” Climate dynamics, vol. 42, no. 3-4, pp. 879–899, 2014.
[67] Fountalis, I., Bracco, A., and Dovrolis, C., “ENSO in CMIP5 simulations: network connectivity from the recent past to the twenty-third century,” Climate Dynamics, vol. 45, no. 1-2, pp. 511–538, 2015.
[68] Fox, M. D., Snyder, A. Z., Vincent, J. L., Corbetta, M., Van Essen, D. C., and Raichle, M. E., “The human brain is intrinsically organized into dynamic, anticorrelated functional networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 27, pp. 9673–9678, 2005.
[69] Fu, K.-S. and Mui, J., “A survey on image segmentation,” Pattern recognition, vol. 13, no. 1, pp. 3–16, 1981.
[70] Fyfe, J. C., Gillett, N. P., and Zwiers, F. W., “Overestimated global warming over the past 20 years,” Nature Climate Change, vol. 3, no. 9, pp. 767–769, 2013.
[71] Gao, J., Buldyrev, S. V., Stanley, H. E., and Havlin, S., “Networks formed from interdependent networks,” Nature physics, vol. 8, no. 1, pp. 40–48, 2012.
[72] Ghil, M. and Vautard, R., “Interdecadal oscillations and the warming trend in global temperature time series,” Nature, vol. 350, pp. 324–327, 1991.
[73] Ghil, M., Allen, M., Dettinger, M., Ide, K., Kondrashov, D., Mann, M., Robertson, A. W., Saunders, A., Tian, Y., Varadi, F., and others, “Advanced spectral methods for climatic time series,” Reviews of geophysics, vol. 40, no. 1, 2002.
[74] Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., and others, “The minimal preprocessing pipelines for the Human Connectome Project,” Neuroimage, vol. 80, pp. 105–124, 2013.
[75] G ORDON , C., C OOPER , C., S ENIOR , C. A., BANKS , H., G REGORY, J. M., J OHNS ,
T. C., M ITCHELL , J. F., and W OOD , R. A., “The simulation of sst, sea ice extents
and ocean heat transports in a version of the hadley centre coupled model without
flux adjustments,” Climate Dynamics, vol. 16, no. 2-3, pp. 147–168, 2000.
[76] G ORDON , E. M., L AUMANN , T. O., A DEYEMO , B., H UCKINS , J. F., K ELLEY,
W. M., and P ETERSEN , S. E., “Generation and evaluation of a cortical area parcellation from resting-state correlations,” Cerebral Cortex, p. bhu239, 2014.
[77] G OZOLCHIANI , A., H AVLIN , S., and YAMASAKI , K., “Emergence of el niño as an
autonomous component in the climate network,” Physical review letters, vol. 107,
no. 14, p. 148501, 2011.
[78] G RAHAM , N., “Decadal-scale climate variability in the tropical and north pacific
during the 1970s and 1980s: Observations and model results,” Climate Dynamics,
vol. 10, no. 3, pp. 135–162, 1994.
[79] G UO , D., “Regionalization with dynamically constrained agglomerative clustering
and partitioning (redcap),” International Journal of Geographical Information Science, vol. 22, no. 7, pp. 801–823, 2008.
[80] H ANSEN , J., S ATO , M., NAZARENKO , L., RUEDY, R., L ACIS , A., KOCH , D.,
T EGEN , I., H ALL , T., S HINDELL , D., S ANTER , B., and OTHERS, “Climate forcings
in goddard institute for space studies si2000 simulations,” Journal of Geophysical
Research: Atmospheres, vol. 107, no. D18, 2002.
[81] H ELLER , R., S TANLEY, D., Y EKUTIELI , D., RUBIN , N., and B ENJAMINI , Y.,
“Cluster-based analysis of fmri data,” NeuroImage, vol. 33, no. 2, pp. 599–608,
2006.
[82] H IKOSAKA , K., I WAI , E., S AITO , H., and TANAKA , K., “Polysensory properties of
neurons in the anterior bank of the caudal superior temporal sulcus of the macaque
monkey,” Journal of neurophysiology, vol. 60, no. 5, pp. 1615–1637, 1988.
[83] H INNE , M., E KMAN , M., JANSSEN , R. J., H ESKES , T., and VAN G ERVEN ,
M. A., “Probabilistic clustering of the human connectome identifies communities
and hubs,” PloS one, vol. 10, no. 1, p. e0117179, 2015.
[84] Hlinka, J., Hartman, D., Vejmelka, M., Runge, J., Marwan, N., Kurths, J., and Paluš, M., “Reliability of inference of directed climate networks using conditional mutual information,” Entropy, vol. 15, no. 6, pp. 2023–2045, 2013.
[85] Holton, J. R., Dmowska, R., and Philander, S. G., El Niño, La Niña, and the Southern Oscillation, vol. 46. Academic Press, 1989.
[86] Hubert, L. and Arabie, P., “Comparing partitions,” Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.
[87] Hurrell, J. W. and Trenberth, K. E., “Global sea surface temperature analyses: multiple problems and their implications for climate analysis, modeling, and reanalysis,” Bulletin of the American Meteorological Society, vol. 80, no. 12, pp. 2661–2678, 1999.
[88] Hyvärinen, A., “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 626–634, 1999.
[89] Hyvärinen, A., Karhunen, J., and Oja, E., Independent Component Analysis, vol. 46. John Wiley & Sons, 2004.
[90] Jain, A. K., Murty, M. N., and Flynn, P. J., “Data clustering: a review,” ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.
[91] Jain, N. and Dilkina, B., “Coarse models for bird migrations using clustering and non-stationary Markov chains,” in Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[92] Jolliffe, I., Principal Component Analysis. Wiley Online Library, 2002.
[93] Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L., Iredell, M., Saha, S., White, G., Woollen, J., and others, “The NCEP/NCAR 40-year reanalysis project,” Bulletin of the American Meteorological Society, vol. 77, no. 3, pp. 437–471, 1996.
[94] Kawale, J., Chatterjee, S., Ormsby, D., Steinhaeuser, K., Liess, S., and Kumar, V., “Testing the significance of spatio-temporal teleconnection patterns,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 642–650, ACM, 2012.
[95] Kawale, J., Liess, S., Kumar, A., Steinbach, M., Ganguly, A. R., Samatova, N. F., Semazzi, F. H., Snyder, P. K., and Kumar, V., “Data guided discovery of dynamic climate dipoles,” in CIDU, pp. 30–44, 2011.
[96] Keil, J. M. and Brecht, T. B., “The complexity of clustering in planar graphs,” J. Combinatorial Mathematics and Combinatorial Computing, vol. 9, pp. 155–159, 1991.
[97] Kirkman IV, C. H. and Bitz, C. M., “The effect of the sea ice freshwater flux on Southern Ocean temperatures in CCSM3: Deep-ocean warming and delayed surface warming,” Journal of Climate, vol. 24, no. 9, pp. 2224–2237, 2011.
[98] Klein, S. A., Soden, B. J., and Lau, N.-C., “Remote sea surface temperature variations during ENSO: Evidence for a tropical atmospheric bridge,” Journal of Climate, vol. 12, no. 4, pp. 917–932, 1999.
[99] Kosaka, Y. and Xie, S.-P., “Recent global-warming hiatus tied to equatorial Pacific surface cooling,” Nature, vol. 501, no. 7467, pp. 403–407, 2013.
[100] Kramer, M. A., Eden, U. T., Cash, S. S., and Kolaczyk, E. D., “Network inference with confidence from multivariate time series,” Physical Review E, vol. 79, no. 6, p. 061916, 2009.
[101] Kretschmer, M., Coumou, D., Donges, J. F., and Runge, J., “Using causal effect networks to analyze different Arctic drivers of mid-latitude winter circulation,” Journal of Climate, no. 2016, 2016.
[102] Kucharski, F., Kang, I.-S., Farneti, R., and Feudale, L., “Tropical Pacific response to 20th century Atlantic warming,” Geophysical Research Letters, vol. 38, no. 3, 2011.
[103] Lancichinetti, A., Radicchi, F., Ramasco, J. J., Fortunato, S., and others, “Finding statistically significant communities in networks,” PLoS ONE, vol. 6, no. 4, p. e18961, 2011.
[104] Lu, Y., Jiang, T., and Zang, Y., “Region growing method for the analysis of functional MRI data,” NeuroImage, vol. 20, no. 1, pp. 455–465, 2003.
[105] Ludescher, J., Gozolchiani, A., Bogachev, M. I., Bunde, A., Havlin, S., and Schellnhuber, H. J., “Improved El Niño forecasting by cooperativity detection,” Proceedings of the National Academy of Sciences, vol. 110, no. 29, pp. 11742–11745, 2013.
[106] Malik, N., Bookhagen, B., Marwan, N., and Kurths, J., “Analysis of spatial and temporal extreme monsoonal rainfall over South Asia using complex networks,” Climate Dynamics, vol. 39, no. 3-4, pp. 971–987, 2012.
[107] Martin, E. and Davidsen, J., “Estimating time delays for constructing dynamical networks,” Nonlinear Processes in Geophysics, vol. 21, no. 5, pp. 929–937, 2014.
[108] McGuire, M. P. and Nguyen, N. P., “Community structure analysis in big climate data,” in 2014 IEEE International Conference on Big Data (Big Data), pp. 38–46, IEEE, 2014.
[109] Meinshausen, M., Smith, S. J., Calvin, K., Daniel, J. S., Kainuma, M., Lamarque, J., Matsumoto, K., Montzka, S., Raper, S., Riahi, K., and others, “The RCP greenhouse gas concentrations and their extensions from 1765 to 2300,” Climatic Change, vol. 109, no. 1-2, pp. 213–241, 2011.
[110] Miller, A. J., Cayan, D. R., Barnett, T. P., Graham, N. E., and Oberhuber, J. M., “The 1976–77 climate shift of the Pacific Ocean,” Oceanography, vol. 7, no. 1, pp. 21–26, 1994.
[111] Moreno-Dominguez, D., Anwander, A., and Knösche, T. R., “A hierarchical method for whole-brain connectivity-based parcellation,” Human Brain Mapping, vol. 35, no. 10, pp. 5000–5025, 2014.
[112] Nadler, B. and Galun, M., “Fundamental limitations of spectral clustering,” in Advances in Neural Information Processing Systems, pp. 1017–1024, 2006.
[113] Newman, M., Networks: An Introduction. OUP Oxford, 2010.
[114] Newman, M., Barabási, A.-L., and Watts, D. J., The Structure and Dynamics of Networks. Princeton University Press, 2006.
[115] Newman, M. E. and Girvan, M., “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, p. 026113, 2004.
[116] Palla, G., Derényi, I., Farkas, I., and Vicsek, T., “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, vol. 435, no. 7043, pp. 814–818, 2005.
[117] Pelan, A., Steinhaeuser, K., Chawla, N. V., de Alwis Pitts, D. A., and Ganguly, A. R., “Empirical comparison of correlation measures and pruning levels in complex networks representing the global climate system,” in 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 239–245, IEEE, 2011.
[118] Power, J. D., Cohen, A. L., Nelson, S. M., Wig, G. S., Barnes, K. A., Church, J. A., Vogel, A. C., Laumann, T. O., Miezin, F. M., Schlaggar, B. L., and others, “Functional network organization of the human brain,” Neuron, vol. 72, no. 4, pp. 665–678, 2011.
[119] Radebach, A., Donner, R. V., Runge, J., Donges, J. F., and Kurths, J., “Disentangling different types of El Niño episodes by evolving climate network analysis,” Physical Review E, vol. 88, no. 5, p. 052807, 2013.
[120] Rand, W. M., “Objective criteria for the evaluation of clustering methods,” Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.
[121] Rayner, N., Parker, D. E., Horton, E., Folland, C., Alexander, L., Rowell, D., Kent, E., and Kaplan, A., “Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century,” Journal of Geophysical Research: Atmospheres (1984–2012), vol. 108, no. D14, 2003.
[122] Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M., and Sabeti, P. C., “Detecting novel associations in large data sets,” Science, vol. 334, no. 6062, pp. 1518–1524, 2011.
[123] Reynolds, R. W. and Smith, T. M., “Improved global sea surface temperature analyses using optimum interpolation,” Journal of Climate, vol. 7, no. 6, pp. 929–948, 1994.
[124] Riahi, K., Rao, S., Krey, V., Cho, C., Chirkov, V., Fischer, G., Kindermann, G., Nakicenovic, N., and Rafaj, P., “RCP 8.5: A scenario of comparatively high greenhouse gas emissions,” Climatic Change, vol. 109, no. 1-2, pp. 33–57, 2011.
[125] Rodríguez-Fonseca, B., Polo, I., García-Serrano, J., Losada, T., Mohino, E., Mechoso, C. R., and Kucharski, F., “Are Atlantic Niños enhancing Pacific ENSO events in recent decades?,” Geophysical Research Letters, vol. 36, no. 20, 2009.
[126] Rogers, G. S., “A course in theoretical statistics,” Technometrics, vol. 11, no. 4, pp. 840–841, 1969.
[127] Rummel, C., Müller, M., Baier, G., Amor, F., and Schindler, K., “Analyzing spatio-temporal patterns of genuine cross-correlations,” Journal of Neuroscience Methods, vol. 191, no. 1, pp. 94–100, 2010.
[128] Runge, J., Petoukhov, V., Donges, J. F., Hlinka, J., Jajcay, N., Vejmelka, M., Hartman, D., Marwan, N., Paluš, M., and Kurths, J., “Identifying causal gateways and mediators in complex spatio-temporal systems,” Nature Communications, vol. 6, 2015.
[129] Santoso, A., McGregor, S., Jin, F.-F., Cai, W., England, M. H., An, S.-I., McPhaden, M. J., and Guilyardi, E., “Late-twentieth-century emergence of the El Niño propagation asymmetry and future projections,” Nature, vol. 504, no. 7478, pp. 126–130, 2013.
[130] Schlösser, R., Gesierich, T., Kaufmann, B., Vucurevic, G., Hunsche, S., Gawehn, J., and Stoeter, P., “Altered effective connectivity during working memory performance in schizophrenia: a study with fMRI and structural equation modeling,” NeuroImage, vol. 19, no. 3, pp. 751–763, 2003.
[131] Shen, X., Tokoglu, F., Papademetris, X., and Constable, R. T., “Groupwise whole-brain parcellation from resting-state fMRI data for network node identification,” NeuroImage, vol. 82, pp. 403–415, 2013.
[132] Simmons, A., Wallace, J., and Branstator, G., “Barotropic wave propagation and instability, and atmospheric teleconnection patterns,” Journal of the Atmospheric Sciences, vol. 40, no. 6, pp. 1363–1392, 1983.
[133] Smith, S. M., Beckmann, C. F., Andersson, J., Auerbach, E. J., Bijsterbosch, J., Douaud, G., Duff, E., Feinberg, D. A., Griffanti, L., Harms, M. P., and others, “Resting-state fMRI in the Human Connectome Project,” NeuroImage, vol. 80, pp. 144–168, 2013.
[134] Smith, T. M., Reynolds, R. W., Peterson, T. C., and Lawrimore, J., “Improvements to NOAA’s historical merged land-ocean surface temperature analysis (1880–2006),” Journal of Climate, vol. 21, no. 10, pp. 2283–2296, 2008.
[135] Solomon, A. and Newman, M., “Reconciling disparate twentieth-century Indo-Pacific ocean temperature trends in the instrumental record,” Nature Climate Change, vol. 2, no. 9, pp. 691–699, 2012.
[136] Sporns, O., Networks of the Brain. MIT Press, 2011.
[137] Sporns, O. and Betzel, R. F., “Modular brain networks,” Annual Review of Psychology, vol. 67, no. 1, 2015.
[138] Stam, C. J., “Modern network science of neurological disorders,” Nature Reviews Neuroscience, vol. 15, no. 10, pp. 683–695, 2014.
[139] Steinbach, M., Tan, P.-N., Kumar, V., Klooster, S., and Potter, C., “Discovery of climate indices using clustering,” in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 446–455, ACM, 2003.
[140] Steinhaeuser, K. and Chawla, N. V., “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[141] Steinhaeuser, K., Chawla, N. V., and Ganguly, A. R., “Complex networks in climate science: Progress, opportunities and challenges,” in CIDU, pp. 16–26, 2010.
[142] Steinhaeuser, K., Chawla, N. V., and Ganguly, A. R., “An exploration of climate data using complex networks,” ACM SIGKDD Explorations Newsletter, vol. 12, no. 1, pp. 25–32, 2010.
[143] Steinhaeuser, K., Chawla, N. V., and Ganguly, A. R., “Complex networks as a unified framework for descriptive analysis and predictive modeling in climate science,” Statistical Analysis and Data Mining, vol. 4, no. 5, pp. 497–511, 2011.
[144] Steinhaeuser, K., Ganguly, A. R., and Chawla, N. V., “Multivariate and multiscale dependence in the global climate system revealed through complex networks,” Climate Dynamics, vol. 39, no. 3-4, pp. 889–895, 2012.
[145] Steinhaeuser, K. and Tsonis, A. A., “A climate model intercomparison at the dynamics level,” Climate Dynamics, vol. 42, no. 5-6, pp. 1665–1670, 2014.
[146] Stevens, B., Bony, S., and others, “What are climate models missing,” Science, vol. 340, no. 6136, pp. 1053–1054, 2013.
[147] Supekar, K., Menon, V., Rubin, D., Musen, M., and Greicius, M. D., “Network analysis of intrinsic functional brain connectivity in Alzheimer’s disease,” PLoS Comput Biol, vol. 4, no. 6, p. e1000100, 2008.
[148] Swanson, K. L. and Tsonis, A. A., “Has the climate recently shifted?,” Geophysical Research Letters, vol. 36, no. 6, 2009.
[149] Taylor, K. E., Stouffer, R. J., and Meehl, G. A., “An overview of CMIP5 and the experiment design,” Bulletin of the American Meteorological Society, vol. 93, no. 4, pp. 485–498, 2012.
[150] Thirion, B., Varoquaux, G., Dohmatob, E., and Poline, J.-B., “Which fMRI clustering gives good brain parcellations?,” Frontiers in Neuroscience, vol. 8, 2014.
[151] Thompson, G. J., Merritt, M. D., Pan, W.-J., Magnuson, M. E., Grooms, J. K., Jaeger, D., and Keilholz, S. D., “Neural correlates of time-varying functional connectivity in the rat,” NeuroImage, vol. 83, pp. 826–836, 2013.
[152] Tononi, G., McIntosh, A. R., Russell, D. P., and Edelman, G. M., “Functional clustering: identifying strongly interactive brain regions in neuroimaging data,” NeuroImage, vol. 7, no. 2, pp. 133–149, 1998.
[153] Tsonis, A. A. and Roebber, P. J., “The architecture of the climate network,” Physica A: Statistical Mechanics and its Applications, vol. 333, pp. 497–504, 2004.
[154] Tsonis, A. A., Swanson, K., and Kravtsov, S., “A new dynamical mechanism for major climate shifts,” Geophysical Research Letters, vol. 34, no. 13, 2007.
[155] Tsonis, A. A. and Swanson, K. L., “Topology and predictability of El Niño and La Niña networks,” Physical Review Letters, vol. 100, no. 22, p. 228502, 2008.
[156] Tsonis, A. A., Swanson, K. L., and Roebber, P. J., “What do networks have to do with climate?,” Bulletin of the American Meteorological Society, vol. 87, no. 5, p. 585, 2006.
[157] Tsonis, A. A., Swanson, K. L., and Wang, G., “On the role of atmospheric teleconnections in climate,” Journal of Climate, vol. 21, no. 12, pp. 2990–3001, 2008.
[158] Tsonis, A. A., Wang, G., Swanson, K. L., Rodrigues, F. A., and da Fontoura Costa, L., “Community structure and dynamics in climate networks,” Climate Dynamics, vol. 37, no. 5-6, pp. 933–940, 2011.
[159] Uppala, S. M., Kållberg, P., Simmons, A., Andrae, U., Bechtold, V. D., Fiorino, M., Gibson, J., Haseler, J., Hernandez, A., Kelly, G., and others, “The ERA-40 re-analysis,” Quarterly Journal of the Royal Meteorological Society, vol. 131, no. 612, pp. 2961–3012, 2005.
[160] van den Heuvel, M., Mandl, R., and Pol, H. H., “Normalized cut group clustering of resting-state fMRI data,” PLoS ONE, vol. 3, no. 4, p. e2001, 2008.
[161] van den Heuvel, M. P. and Sporns, O., “Rich-club organization of the human connectome,” The Journal of Neuroscience, vol. 31, no. 44, pp. 15775–15786, 2011.
[162] Van Essen, D. C., Glasser, M. F., Dierker, D. L., Harwell, J., and Coalson, T., “Parcellations and hemispheric asymmetries of human cerebral cortex analyzed on surface-based atlases,” Cerebral Cortex, vol. 22, no. 10, pp. 2241–2262, 2012.
[163] Van Essen, D. C., Smith, S. M., Barch, D. M., Behrens, T. E., Yacoub, E., Ugurbil, K., Consortium, W.-M. H., and others, “The WU-Minn Human Connectome Project: an overview,” NeuroImage, vol. 80, pp. 62–79, 2013.
[164] Vejmelka, M., Pokorná, L., Hlinka, J., Hartman, D., Jajcay, N., and Paluš, M., “Non-random correlation structures and dimensionality reduction in multivariate climate data,” Climate Dynamics, vol. 44, no. 9-10, pp. 2663–2682, 2014.
[165] Vértes, P. E., Alexander-Bloch, A. F., Gogtay, N., Giedd, J. N., Rapoport, J. L., and Bullmore, E. T., “Simple models of human brain functional networks,” Proceedings of the National Academy of Sciences, vol. 109, no. 15, pp. 5868–5873, 2012.
[166] Vidard, A., Anderson, D. L., and Balmaseda, M., “Impact of ocean observation systems on ocean analysis and seasonal forecasts,” Monthly Weather Review, vol. 135, no. 2, pp. 409–429, 2007.
[167] von Storch, H. and Zwiers, F. W., Statistical Analysis in Climate Research. Cambridge University Press, 2001.
[168] Wang, G., Swanson, K. L., and Tsonis, A. A., “The pacemaker of major climate shifts,” Geophysical Research Letters, vol. 36, no. 7, 2009.
[169] Ward Jr, J. H., “Hierarchical grouping to optimize an objective function,” Journal of the American Statistical Association, vol. 58, no. 301, pp. 236–244, 1963.
[170] Wig, G. S., Laumann, T. O., and Petersen, S. E., “An approach for parcellating human cortical areas using resting-state correlations,” NeuroImage, vol. 93, pp. 276–291, 2014.
[171] Wu, K., Taki, Y., Sato, K., Sassa, Y., Inoue, K., Goto, R., Okada, K., Kawashima, R., He, Y., Evans, A. C., and others, “The overlapping community structure of structural brain network in young healthy individuals,” PLoS ONE, vol. 6, no. 5, p. e19608, 2011.
[172] Xie, P. and Arkin, P. A., “Global precipitation: A 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs,” Bulletin of the American Meteorological Society, vol. 78, no. 11, pp. 2539–2558, 1997.
[173] Yamasaki, K., Gozolchiani, A., and Havlin, S., “Climate networks around the globe are significantly affected by El Niño,” Physical Review Letters, vol. 100, no. 22, p. 228501, 2008.
[174] Yamasaki, K., Gozolchiani, A., and Havlin, S., “Climate networks based on phase synchronization analysis track El Niño,” Progress of Theoretical Physics Supplement, vol. 179, pp. 178–188, 2009.
[175] Yan, X., Kelley, S., Goldberg, M., and Biswal, B. B., “Detecting overlapped functional clusters in resting state fMRI with connected iterative scan: a graph theory based clustering algorithm,” Journal of Neuroscience Methods, vol. 199, no. 1, pp. 108–118, 2011.
[176] Yeo, B. T., Krienen, F. M., Sepulcre, J., Sabuncu, M. R., Lashkari, D., Hollinshead, M., Roffman, J. L., Smoller, J. W., Zöllei, L., Polimeni, J. R., and others, “The organization of the human cerebral cortex estimated by intrinsic functional connectivity,” Journal of Neurophysiology, vol. 106, no. 3, pp. 1125–1165, 2011.
[177] Zhang, P., Huang, Y., Shekhar, S., and Kumar, V., “Correlation analysis of spatial time series datasets: A filter-and-refine approach,” in Advances in Knowledge Discovery and Data Mining, pp. 532–544, Springer, 2003.
[178] Zhang, W. and Jin, F.-F., “Improvements in the CMIP5 simulations of ENSO-SSTA meridional width,” Geophysical Research Letters, vol. 39, no. 23, 2012.
[179] Zhang, W., Jin, F.-F., Zhao, J.-X., and Li, J., “On the bias in simulated ENSO SSTA meridional widths of CMIP3 models,” Journal of Climate, vol. 26, no. 10, pp. 3173–3186, 2013.