Functional Median Polish, with Climate Applications

advertisement
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
Functional Median Polish, with Climate
Applications
Marc G. Genton
Department of Statistics, Texas A&M University
Program in Spatial Statistics
(stat.tamu.edu/pss)
Institute for Applied Mathematics and Computational Sciences
(iamcs.tamu.edu)
Based on joint work with Ying Sun
May 11, 2012
Marc G. Genton
Functional Median Polish, with Climate Applications
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
Functional Median Polish
1
Motivation
2
Univariate ANOVA
3
Functional ANOVA
4
Simulation Studies
5
Applications
6
Discussion
Marc G. Genton
Functional Median Polish, with Climate Applications
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
Observations and Climate Models
Observations:
provide a corroborating source of information about physical
processes being modeled.
have methodological and practical issues due to uncertainties.
Climate Models:
numerically solve systems of differential equations representing
physical relationships in the climate system.
have huge uncertainties and biases.
Scientific Questions:
How do we compare sources of variability in observations or
climate model outputs? i.e. quantification of uncertainties?
Marc G. Genton
Functional Median Polish, with Climate Applications
Spatio-Temporal Precipitation Data
Spatio-temporal precipitation data: annual total precipitation
data for U.S. from 1895 to 1997 at 11,918 weather stations.
Nine climatic regions for precipitation defined by National
Climatic Data Center.
Several areas of outliers detected by Sun and Genton (2011,
2012).
+ +++
+++ +
++
+
++
+
+
+++
+
++
+
+ NW + ++
WNC
ENC
+
+
++++
+
+ +
+
+++
+
NE
+
++++
+++ +
++
+ ++++
+
++
+ +++++
++
C
+
+
W ++ ++++++
SW
+
+
++
++
+++++
+
+
S
SE
+
+
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
Analysis of Variance
Analysis of Variance (ANOVA):
An important technique for analyzing the effect of categorical
factors on a response.
It decomposes the variability in the response variable among
the different factors.
A two-way additive model: for i = 1, . . . , r , j = 1, . . . , c,
yij = µ + αi + βj + ij .
The ANOVA model can be fitted by arithmetic means (no
outliers), or medians (robust).
Marc G. Genton
Functional Median Polish, with Climate Applications
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
ANOVA Model Fitting
Fitted by means:
µ̂ = ȳ (grand effect),
α̂i = ȳi· − ȳ (row effect),
β̂j = ȳ·j − ȳ (column effect).
Fitted by medians:
Median polish (Tukey, 1970, 1977).
An iterative technique for extracting row and column effects in
a two-way table using medians rather than means.
It stops when no more changes occur in the row and column
effects, or changes are sufficiently small.
Marc G. Genton
Functional Median Polish, with Climate Applications
Median Polish Example
1
Original table: find row medians.
2
1st iteration: subtract row medians, find column medians. Grand median
in red, row effects in blue, column effects in green.
3
2nd iteration: subtract column medians, find row medians,
6
3
9
3
2
0
0
6
0
3 →
9
0
0
11
4
0
−3
−1
0
−1
5
1
0
1
6
0
0
3
→
0
9
3
0
−2
0
1
−1
4
0
−1
1
0
0
1
0
3
0
→
−3
3
4
subtract new row medians, add their medians to the grand median, find
column medians.
5
Polished table: new row and column medians are zero after two iterations.
0
0
8
0
0
−2
0
0
0
−1
4
0
−2
0
1
3
0
0
0
−3 →
8
0
0
3+0
−2
0
0
−1
4
0
−2
1
3
0
−3
3
Functional Median Polish
Observe functional data at each combination of two
categorical factors.
Examine their effects: functional row or column effects.
yijk (x) = µ(x) + αi (x) + βj (x) + ijk (x), where
i = 1, . . . , r , j = 1, . . . , c, k = 1, . . . , mij .
Constraints: mediani {αi (x)} = 0, medianj {βj (x)} = 0 and
mediani {ijk (x)} = medianj {ijk (x)} = 0 for all k.
x can be time for curves or spatial index for surfaces/images.
Iterative procedure sweeping out column and row medians.
One-way functional ANOVA can be done in a similar way.
Need to order functional data.
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
Multivariate Ordering
Basic ideas of depth in functional context
provides a method to order sample curves according to
decreasing depth values,
y[1] : the deepest (most central or median) curve,
y[n] : the most outlying (least representative) curve,
y[1] , . . . , y[n] : start from the center outwards.
Usual order statistics: ordered from the smallest sample value
to the largest.
Marc G. Genton
Functional Median Polish, with Climate Applications
Band Depth for Functional Data
López-Pintado and Romo (2009) introduced the band depth
(BD) concept through a graph-based approach.
Grey area: band determined by two curves, y1 and y3 .
2
3
4
Contains the curve y2 , but does not contain y4 .
y2
1
y1
0
y(t)
y3
−3
−2
−1
y4
0.0
0.2
0.4
0.6
t
0.8
1.0
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
Band Depth for Functional Data
Population version of BD (2) :
BD (2) (y , P) = P{G (y ) ⊂ B(Y1 , Y2 )}.
G (y ): graph of the curve y ,
B(Y1 , Y2 ): band delimited by 2 random curves.
The band could be delimited by more than 2 random curves,
BDJ (y , P) =
J
X
BD (j) (y , P).
j=2
Marc G. Genton
Functional Median Polish, with Climate Applications
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
Sample Band Depth
Population level: BD (j) (y , P) is a probability.
Sample version of BD (j) (y , P)
(j)
BDn (y )
−1
n
=
j
X
I {G (y ) ⊆ B(yi1 , . . . , yij )},
1≤i1 <i2 <...<ij ≤n
I {·}: the indicator function,
fraction of the bands completely containing the curve y .
Sample BD: BDn,J (y ) =
(j)
j=2 BDn (y ).
PJ
Marc G. Genton
Functional Median Polish, with Climate Applications
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
Modified Band Depth
López-Pintado and Romo (2009) also proposed a more flexible
definition, the modified band depth (MBD).
−1
X
n
=
I {G (y ) ⊆ B(yi1 , . . . , yij )},
j
1≤i1 <i2 <...<ij ≤n
−1
X
n
(j)
MBDn (y ) =
λr {A(y ; yi1 , . . . , yij )}.
j
(j)
BDn (y )
1≤i1 <i2 <...<ij ≤n
λr {A(y ; yi1 , . . . , yij )} measures the proportion of time that a
curve y is in the band.
Marc G. Genton
Functional Median Polish, with Climate Applications
30
Functional Boxplots (Sun and Genton, 2011, 2012)
30
1998
28
26
6
Month
8
10
26
24
1982
12
20
4
1997
18
2
1983
22
Sea Surface Temperature
24
22
20
18
Sea Surface Temperature
28
Functional Boxplot
2
4
6
8
10
12
Surface Boxplot
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
True Model
Generate data from a true model with r = 2, c = 3, and
m = 100 curves in each cell at p = 50 time points.
True Column Effect
−4
−1.0
−2
0
Column Effect
0.0
Row Effect
−0.5
2
1
0
Grand Effect
0.5
3
2
1.0
4
True Row Effect
4
True Grand Effect
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
x
0.6
x
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
x
Introduce outliers through a Gaussian process ijk (t).
Replications: 1,000.
Marc G. Genton
Functional Median Polish, with Climate Applications
Outlier Models
Model 1: ijk (t) = eijk (t), where eijk (t) ∼ GP(0, γ) with
γ(t1 , t2 ) = exp{−|t2 − t1 |}.
Model 2: ijk (t) = eijk (t) + cijk K , where cijk is 1 with prob qij
and 0 with prob 1 − qij , qij is different for each cell.
Model 3: ijk (t) = eijk (t) + cijk K , if t ≥ Tijk and
ijk (t) = eijk (t), if t < Tijk , where Tijk ∼ U(0, 1).
0.0
0.2
0.4
0.6
t
0.8
1.0
25
20
y
−5
0
5
10
15
20
−5
0
5
10
y
15
20
15
10
5
0
−5
y
Model 3
25
Model 2
25
Model 1
0.0
0.2
0.4
0.6
t
0.8
1.0
0.0
0.2
0.4
0.6
t
0.8
1.0
Simulations: Model 1
0.6
0.8
1.0
3
2
1
−1
−2
−3
0.4
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
t
t
t
Model 1 (Mean)
Model 1 (Mean)
Model 1 (Mean)
0.8
1.0
0.8
1.0
0.0
0.2
0.4
0.6
t
0.8
1.0
−3
−3
−2
−2
−1
0
Row Effect 2
0
−1
Row Effect 1
4
2
0
−2
Grand Effect
1
6
1
2
3
2
0.2
8
0.0
0
Row Effect 2
1
0
−2
−1
Row Effect 1
6
4
2
−2
0
Grand Effect
Model 1 (Median)
2
Model 1 (Median)
8
Model 1 (Median)
0.0
0.2
0.4
0.6
t
0.8
1.0
0.0
0.2
0.4
0.6
t
Simulations: Model 1
6
0.6
0.8
1.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
t
Model 1 (Mean)
Model 1 (Mean)
Model 1 (Mean)
6
0.8
1.0
0.8
1.0
6
4
Column Effect 3
2
0
−2
−6
0.6
t
1.0
8
4
2
0
Column Effect 2
−4
−2
0
−2
−4
−6
0.4
0.8
10
t
−8
0.2
4
Column Effect 3
0.0
t
−10
0.0
2
0
−2
−6
0.4
6
8
4
2
0
Column Effect 2
−4
−2
0
−2
−4
Column Effect 1
−6
−8
−10
0.2
2
0.0
Column Effect 1
Model 1 (Median)
10
Model 1 (Median)
2
Model 1 (Median)
0.0
0.2
0.4
0.6
t
0.8
1.0
0.0
0.2
0.4
0.6
t
Simulations: Model 2
0.6
0.8
1.0
3
2
1
−1
−2
−3
0.4
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
t
t
t
Model 2 (Mean)
Model 2 (Mean)
Model 2 (Mean)
0.8
1.0
0.8
1.0
0.0
0.2
0.4
0.6
t
0.8
1.0
−3
−3
−2
−2
−1
0
Row Effect 2
0
−1
Row Effect 1
4
2
0
−2
Grand Effect
1
6
1
2
3
2
0.2
8
0.0
0
Row Effect 2
1
0
−2
−1
Row Effect 1
6
4
2
−2
0
Grand Effect
Model 2 (Median)
2
Model 2 (Median)
8
Model 2 (Median)
0.0
0.2
0.4
0.6
t
0.8
1.0
0.0
0.2
0.4
0.6
t
Simulations: Model 2
6
0.6
0.8
1.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
t
Model 2 (Mean)
Model 2 (Mean)
Model 2 (Mean)
6
0.8
1.0
0.8
1.0
6
4
Column Effect 3
2
0
−2
−6
0.6
t
1.0
8
4
2
0
Column Effect 2
−4
−2
0
−2
−4
−6
0.4
0.8
10
t
−8
0.2
4
Column Effect 3
0.0
t
−10
0.0
2
0
−2
−6
0.4
6
8
4
2
0
Column Effect 2
−4
−2
0
−2
−4
Column Effect 1
−6
−8
−10
0.2
2
0.0
Column Effect 1
Model 2 (Median)
10
Model 2 (Median)
2
Model 2 (Median)
0.0
0.2
0.4
0.6
t
0.8
1.0
0.0
0.2
0.4
0.6
t
Simulations: Model 3
0.6
0.8
1.0
3
2
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.4
0.6
Model 3 (Mean)
Model 3 (Mean)
Model 3 (Mean)
0.8
1.0
0.8
1.0
2
1
−1
−2
−3
0.6
t
1.0
0
Row Effect 2
1
0
−2
−1
Row Effect 1
6
4
2
0.4
0.8
3
t
2
t
0
0.2
0.2
t
−2
0.0
1
−1
−2
−3
0.4
0
Row Effect 2
1
0
−2
−1
Row Effect 1
6
4
2
Grand Effect
0
−2
0.2
8
0.0
Grand Effect
Model 3 (Median)
2
Model 3 (Median)
8
Model 3 (Median)
0.0
0.2
0.4
0.6
t
0.8
1.0
0.0
0.2
0.4
0.6
t
Simulations: Model 3
6
0.6
0.8
1.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
t
Model 3 (Mean)
Model 3 (Mean)
Model 3 (Mean)
6
0.8
1.0
0.8
1.0
6
4
Column Effect 3
2
0
−2
−6
0.6
t
1.0
8
4
2
0
Column Effect 2
−4
−2
0
−2
−4
−6
0.4
0.8
10
t
−8
0.2
4
Column Effect 3
0.0
t
−10
0.0
2
0
−2
−6
0.4
6
8
4
2
0
Column Effect 2
−4
−2
0
−2
−4
Column Effect 1
−6
−8
−10
0.2
2
0.0
Column Effect 1
Model 3 (Median)
10
Model 3 (Median)
2
Model 3 (Median)
0.0
0.2
0.4
0.6
t
0.8
1.0
0.0
0.2
0.4
0.6
t
Spatio-Temporal Precipitation Data
Spatio-temporal precipitation data: annual total precipitation
data for U.S. from 1895 to 1997 at 11,918 weather stations.
Nine climatic regions for precipitation defined by National
Climatic Data Center.
Several areas of outliers detected by Sun and Genton (2011,
2012).
+ +++
+++ +
++
+
++
+
+
+++
+
++
+
+ NW + ++
WNC
ENC
+
+
++++
+
+ +
+
+++
+
NE
+
++++
+++ +
++
+ ++++
+
++
+ +++++
++
C
+
+
W ++ ++++++
SW
+
+
++
++
+++++
+
+
S
SE
+
+
Functional Boxplots for Nine Climatic Regions
1920
1945
1970
4000
3000
0
1000
2000
3000
0
1000
2000
3000
2000
Precipitation (mm/year)
1000
0
1895
1995
1895
1945
1970
1995
1920
1945
1970
4000
1945
1970
1995
1895
1945
1970
1995
4000
0
1000
2000
3000
4000
2000
1000
1995
1920
West (4.04% Outliers)
0
1970
1995
2000
1920
3000
4000
3000
2000
1000
1945
Year
1970
1000
1895
North West (2.13% Outliers)
0
1920
1945
0
1995
South West (3.03% Outliers)
1895
1920
3000
4000
3000
1000
0
1895
1895
South (0% Outliers)
2000
3000
2000
1000
0
Precipitation (mm/year)
1920
West North Central (2.04% Outliers)
4000
South East (2.52% Outliers)
Precipitation (mm/year)
Central (0.25% Outliers)
4000
East North Central (0% Outliers)
4000
North East (0.21% Outliers)
1895
1920
1945
Year
1970
1995
1895
1920
1945
Year
1970
1995
Climatic Region Effects
850
Grand Effect
750
800
850
800
750
Grand Effect
900
Weather Station (Mean)
900
Weather Station (Median)
1895
1920
1945
1970
1995
1895
1920
1945
Year
Weather Station (Median)
North West
West
West North Central
South
South West
0
20
40
60
Year
80
100
0
−500
−500
0
Region Effect
500
1000
East North Central
Central
South East
North East
North West
West
West North Central
South
South West
−1000
500
1000
1995
Weather Station (Mean)
East North Central
Central
South East
North East
−1000
Region Effect
1970
Year
0
20
40
60
Year
80
100
Weather Station and GCM Effects
900
800
Grand Effect
600
1920
1945
1970
1995
1895
1920
1945
1970
Year
Weather Station and GCM (Median)
Weather Station and GCM (Mean)
1995
1945
Year
1970
1995
100
0
Data Source Effect
1920
−100
0
−100
−200
Weather Station
GCM
1895
−200
100
200
Year
200
1895
Data Source Effect
700
800
700
600
Grand Effect
900
1000
Weather Station and GCM (Mean)
1000
Weather Station and GCM (Median)
Weather Station
GCM
1895
1920
1945
Year
1970
1995
UK and Ireland Temperatures
Regional Climate Model (RCM): has higher resolution, covers
a limited area of the globe.
The boundaries of RCM are driven by variables output from a
global climate model (GCM).
Question: how much variability in the model output is from
RCM and how much is due to the boundary conditions from
GCM.
Functional ANOVA: Kaufman and Sain (2010) proposed a
mean-based Bayesian framework for spatial data.
Data: PRUDENCE project (Christensen, Carter, and Giorgi
2002), consists of control runs (1961-1990).
Two factors: RCM (HIRHAM and RCAO), GCM (ECHAM4
and HadAm3H).
Grand Effect and Residuals
Control Total (Median)
Control Total (Mean)
58
17
58
17
56
16
Lat
14
15
14
54
15
54
Lat
56
16
52
13
52
13
12
50
50
12
11
−10
−8
−6
−4
−2
0
2
11
−10
−8
−6
−4
−2
Lon
Lon
Residuals (Median)
Residuals (Mean)
0
2
58
0.4
58
0.4
−10
−8
−6
−4
Lon
−2
0
2
Lat
−0.2
52
54
0.0
−0.4
50
52
54
0.0
50
Lat
56
0.2
56
0.2
−0.2
−0.4
−10
−8
−6
−4
Lon
−2
0
2
GCM and RCM Effects
0.0
Lat
58
1.0
56
0.5
0.5
0.0
54
58
1.0
56
Control GCM: HadAM3H (Median)
54
Lat
Control GCM: ECHAM4 (Median)
52
−0.5
52
−0.5
50
−1.0
50
−1.0
−10
−8
−6
−4
−2
0
2
−10
−8
−6
−4
−2
0
Lon
Lon
Control RCM: HIRHAM (Median)
Control RCM: RCAO (Median)
0.6
58
58
0.6
2
56
0.4
Lat
0.2
0.0
−0.2
−0.2
−0.4
−0.4
52
52
54
0.0
54
0.2
−0.6
50
−0.6
50
Lat
56
0.4
−10
−8
−6
−4
Lon
−2
0
2
−10
−8
−6
−4
Lon
−2
0
2
Functional Median Polish
Motivation
Univariate ANOVA
Functional ANOVA
Simulation Studies
Applications
Discussion
Discussion
Functional Median Polish: robust functional ANOVA fitted by
functional median.
Band depth: graph-based nonparametric ordering for
functional/image data (e.g. median image).
The functional median polish algorithm does not guarantee to
yield the least L1 -norm residuals.
Fink (1988) proposed a rather complex modification of the
classical procedure that converges to the least L1 -norm
residuals.
Marc G. Genton
Functional Median Polish, with Climate Applications
Download