Functional Median Polish Motivation Univariate ANOVA Functional ANOVA Simulation Studies Applications Discussion

Functional Median Polish, with Climate Applications
Marc G. Genton
Department of Statistics, Texas A&M University
Program in Spatial Statistics (stat.tamu.edu/pss)
Institute for Applied Mathematics and Computational Sciences (iamcs.tamu.edu)
Based on joint work with Ying Sun
May 11, 2012

Marc G. Genton Functional Median Polish, with Climate Applications

Functional Median Polish
1 Motivation
2 Univariate ANOVA
3 Functional ANOVA
4 Simulation Studies
5 Applications
6 Discussion

Observations and Climate Models
Observations:
  provide a corroborating source of information about the physical processes being modeled;
  have methodological and practical issues due to uncertainties.
Climate models:
  numerically solve systems of differential equations representing physical relationships in the climate system;
  have huge uncertainties and biases.
Scientific question: how do we compare sources of variability in observations or climate model outputs, i.e., how do we quantify uncertainties?

Spatio-Temporal Precipitation Data
Spatio-temporal precipitation data: annual total precipitation for the U.S. from 1895 to 1997 at 11,918 weather stations.
Nine climatic regions for precipitation, defined by the National Climatic Data Center.
Several areas of outliers detected by Sun and Genton (2011, 2012).
[Figure: map of U.S. weather stations with the nine climatic regions labeled NW, WNC, ENC, NE, C, W, SW, S, SE.]

Analysis of Variance
Analysis of variance (ANOVA) is an important technique for analyzing the effect of categorical factors on a response.
It decomposes the variability in the response variable among the different factors.
A two-way additive model: for $i = 1, \ldots, r$ and $j = 1, \ldots, c$,
$$y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}.$$
The ANOVA model can be fitted by arithmetic means (no outliers) or by medians (robust).

ANOVA Model Fitting
Fitted by means: $\hat{\mu} = \bar{y}$ (grand effect), $\hat{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}$ (row effect), $\hat{\beta}_j = \bar{y}_{\cdot j} - \bar{y}$ (column effect).
Fitted by medians: median polish (Tukey, 1970, 1977), an iterative technique for extracting row and column effects in a two-way table using medians rather than means.
It stops when no more changes occur in the row and column effects, or when the changes are sufficiently small.

Median Polish Example
1 Original table: find row medians.
2 1st iteration: subtract row medians, find column medians.
3 2nd iteration: subtract column medians, find row medians.
4 Subtract new row medians, add their median to the grand median, find column medians.
5 Polished table: new row and column medians are zero after two iterations.
In the slide's tables, the grand median is shown in red, row effects in blue, and column effects in green.
[Tables of the worked example: the numeric entries are not recoverable from the extracted text.]
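The sweeping steps listed in the example can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the slides: the function name `median_polish`, the order of the sweeps (rows first), the stopping tolerance, and the small additive table in the usage lines are all assumptions made for the example.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Two-way median polish: returns (grand, row_effects, col_effects, residuals)."""
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    grand = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep the row medians out of the residuals and into the row effects.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        # Move the median of the row effects into the grand effect.
        shift = median(row_eff)
        grand += shift
        row_eff = [a - shift for a in row_eff]
        # Sweep the column medians out of the residuals and into the column effects.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [resid[i][j] - col_med[j] for j in range(n_cols)]
        col_eff = [b + d for b, d in zip(col_eff, col_med)]
        # Move the median of the column effects into the grand effect.
        shift = median(col_eff)
        grand += shift
        col_eff = [b - shift for b in col_eff]
        # Stop once a full sweep no longer changes anything (up to tol).
        if max(abs(v) for v in row_med + col_med) <= tol:
            break
    return grand, row_eff, col_eff, resid

# Hypothetical additive table (not the table from the slide).
table = [[7, 10, 12], [8, 11, 13], [9, 12, 14]]
grand, row_eff, col_eff, resid = median_polish(table)
```

At every step the identity table[i][j] = grand + row_eff[i] + col_eff[j] + resid[i][j] is preserved, so the loop only redistributes the table among the four components.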
Functional Median Polish
Observe functional data at each combination of two categorical factors.
Examine their effects: functional row or column effects.
$$y_{ijk}(x) = \mu(x) + \alpha_i(x) + \beta_j(x) + \epsilon_{ijk}(x),$$
where $i = 1, \ldots, r$, $j = 1, \ldots, c$, $k = 1, \ldots, m_{ij}$.
Constraints: $\mathrm{median}_i\{\alpha_i(x)\} = 0$, $\mathrm{median}_j\{\beta_j(x)\} = 0$, and $\mathrm{median}_i\{\epsilon_{ijk}(x)\} = \mathrm{median}_j\{\epsilon_{ijk}(x)\} = 0$ for all $k$.
$x$ can be time for curves or a spatial index for surfaces/images.
Iterative procedure sweeping out column and row medians.
One-way functional ANOVA can be done in a similar way.
Need to order functional data.

Multivariate Ordering
The basic idea of depth in the functional context is to provide a method for ordering sample curves according to decreasing depth values:
$y_{[1]}$: the deepest (most central or median) curve,
$y_{[n]}$: the most outlying (least representative) curve,
$y_{[1]}, \ldots, y_{[n]}$: ordered from the center outwards.
Usual order statistics: ordered from the smallest sample value to the largest.

Band Depth for Functional Data
López-Pintado and Romo (2009) introduced the band depth (BD) concept through a graph-based approach.
Grey area: band determined by two curves, $y_1$ and $y_3$.
It contains the curve $y_2$, but does not contain $y_4$.
[Figure: four curves $y_1, \ldots, y_4$ plotted as $y(t)$ against $t \in [0, 1]$, with the band between $y_1$ and $y_3$ shaded grey.]

Band Depth for Functional Data
Population version of $BD^{(2)}$: $BD^{(2)}(y, P) = P\{G(y) \subset B(Y_1, Y_2)\}$.
$G(y)$: graph of the curve $y$; $B(Y_1, Y_2)$: band delimited by 2 random curves.
The band could be delimited by more than 2 random curves:
$$BD_J(y, P) = \sum_{j=2}^{J} BD^{(j)}(y, P).$$
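For intuition, the probability in $BD^{(2)}$ can be approximated from a finite set of curves by the fraction of pairs whose band contains the whole graph of $y$. The sketch below assumes curves stored as equal-length lists of values on a common grid; the name `band_depth_2` and the four example curves (chosen to mimic the $y_1, \ldots, y_4$ picture) are hypothetical.

```python
from itertools import combinations

def band_depth_2(y, sample):
    """Fraction of pairs of sample curves whose band contains the entire curve y."""
    pairs = list(combinations(sample, 2))
    contained = 0
    for a, b in pairs:
        # The band at each grid point is the interval between the two curves.
        if all(min(u, v) <= w <= max(u, v) for u, v, w in zip(a, b, y)):
            contained += 1
    return contained / len(pairs)

# Hypothetical curves on an 11-point grid over [0, 1].
t = [k / 10 for k in range(11)]
y1 = [2.0] * len(t)              # upper boundary of the band
y2 = [1.0] * len(t)              # lies inside the band of y1 and y3
y3 = [0.0] * len(t)              # lower boundary of the band
y4 = [-2.0 + 5.0 * s for s in t] # crosses the band, so it is not contained
sample = [y1, y2, y3, y4]
depth_y2 = band_depth_2(y2, sample)
depth_y4 = band_depth_2(y4, sample)
```

A curve that is itself one of the two delimiting curves counts as contained, which is why even the crossing curve `y4` receives a nonzero depth here.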
Sample Band Depth
Population level: $BD^{(j)}(y, P)$ is a probability.
Sample version of $BD^{(j)}(y, P)$:
$$BD_n^{(j)}(y) = \binom{n}{j}^{-1} \sum_{1 \le i_1 < i_2 < \cdots < i_j \le n} I\{G(y) \subseteq B(y_{i_1}, \ldots, y_{i_j})\},$$
where $I\{\cdot\}$ is the indicator function; this is the fraction of the bands completely containing the curve $y$.
Sample BD: $BD_{n,J}(y) = \sum_{j=2}^{J} BD_n^{(j)}(y)$.

Modified Band Depth
López-Pintado and Romo (2009) also proposed a more flexible definition, the modified band depth (MBD):
$$BD_n^{(j)}(y) = \binom{n}{j}^{-1} \sum_{1 \le i_1 < i_2 < \cdots < i_j \le n} I\{G(y) \subseteq B(y_{i_1}, \ldots, y_{i_j})\},$$
$$MBD_n^{(j)}(y) = \binom{n}{j}^{-1} \sum_{1 \le i_1 < i_2 < \cdots < i_j \le n} \lambda_r\{A(y; y_{i_1}, \ldots, y_{i_j})\}.$$
$\lambda_r\{A(y; y_{i_1}, \ldots, y_{i_j})\}$ measures the proportion of time that a curve $y$ is in the band.

Functional Boxplots (Sun and Genton, 2011, 2012)
[Figure: functional boxplot of sea surface temperature against month, with curves labeled 1982, 1983, 1997, and 1998, shown next to a surface boxplot.]

True Model
Generate data from a true model with $r = 2$, $c = 3$, and $m = 100$ curves in each cell at $p = 50$ time points.
[Figure: three panels showing the true grand effect, the true row effect, and the true column effect as functions of $x \in [0, 1]$.]
Introduce outliers through a Gaussian process $\epsilon_{ijk}(t)$.
Replications: 1,000.
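For a two-way layout of curves like the one just described, the modified band depth with $j = 2$ and the median sweeps can be combined as in the following sketch. It is an illustration under stated assumptions rather than the authors' implementation: the names `mbd`, `functional_median`, and `functional_median_polish` are hypothetical, data are assumed to be stored as `Y[i][j][k]` (a list of values on a common grid), the functional median is taken as the deepest curve with ties averaged, rows are swept before columns, and the loop stops when a full sweep changes nothing beyond a tolerance. The depth uses a pointwise count: at each grid point, a pair of curves fails to cover $y$ only if both lie strictly below or both strictly above it.

```python
def mbd(sample):
    """Modified band depth (j = 2) of every curve in a list of equal-length curves."""
    n, p = len(sample), len(sample[0])
    pairs = n * (n - 1) / 2
    depths = []
    for y in sample:
        covered = 0.0
        for t in range(p):
            below = sum(z[t] < y[t] for z in sample)
            above = sum(z[t] > y[t] for z in sample)
            # Pairs covering y(t): all pairs minus those entirely below or above.
            covered += pairs - below * (below - 1) / 2 - above * (above - 1) / 2
        depths.append(covered / (p * pairs))
    return depths

def functional_median(sample):
    """Deepest curve under MBD; ties are averaged (an assumed convention)."""
    depths = mbd(sample)
    top = max(depths)
    deepest = [y for y, d in zip(sample, depths) if top - d < 1e-12]
    return [sum(vals) / len(deepest) for vals in zip(*deepest)]

def add(y, z):
    return [a + b for a, b in zip(y, z)]

def sub(y, z):
    return [a - b for a, b in zip(y, z)]

def functional_median_polish(Y, max_iter=10, tol=1e-8):
    """Return (grand, row_effects, col_effects, residuals) for curves Y[i][j][k]."""
    r, c, p = len(Y), len(Y[0]), len(Y[0][0][0])
    resid = [[[[float(v) for v in y] for y in cell] for cell in row] for row in Y]
    grand = [0.0] * p
    row_eff = [[0.0] * p for _ in range(r)]
    col_eff = [[0.0] * p for _ in range(c)]
    for _ in range(max_iter):
        # Row sweep: functional median of all curves pooled within each row.
        row_med = [functional_median([y for cell in resid[i] for y in cell]) for i in range(r)]
        for i in range(r):
            resid[i] = [[sub(y, row_med[i]) for y in cell] for cell in resid[i]]
            row_eff[i] = add(row_eff[i], row_med[i])
        shift = functional_median(row_eff)
        grand = add(grand, shift)
        row_eff = [sub(a, shift) for a in row_eff]
        # Column sweep: functional median of all curves pooled within each column.
        col_med = [functional_median([y for i in range(r) for y in resid[i][j]]) for j in range(c)]
        for i in range(r):
            for j in range(c):
                resid[i][j] = [sub(y, col_med[j]) for y in resid[i][j]]
        col_eff = [add(b, d) for b, d in zip(col_eff, col_med)]
        shift = functional_median(col_eff)
        grand = add(grand, shift)
        col_eff = [sub(b, shift) for b in col_eff]
        if max(abs(v) for med in row_med + col_med for v in med) <= tol:
            break
    return grand, row_eff, col_eff, resid

# Hypothetical toy layout: r = 2 rows, c = 3 columns, 3 curves per cell, 4 grid points.
Y = [[[[5 + a + b + d + 0.5 * t for t in range(4)] for d in (-0.1, 0.0, 0.1)]
      for b in (-2, 0, 2)] for a in (-1, 1)]
grand, row_eff, col_eff, resid = functional_median_polish(Y)
```

Because the functional median is a sample curve (or an average of tied sample curves), the sweeps need not reach exactly zero medians on real data, which is why the sketch also caps the number of iterations.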
Outlier Models
Model 1: $\epsilon_{ijk}(t) = e_{ijk}(t)$, where $e_{ijk}(t) \sim GP(0, \gamma)$ with $\gamma(t_1, t_2) = \exp\{-|t_2 - t_1|\}$.
Model 2: $\epsilon_{ijk}(t) = e_{ijk}(t) + c_{ijk} K$, where $c_{ijk}$ is 1 with probability $q_{ij}$ and 0 with probability $1 - q_{ij}$; $q_{ij}$ is different for each cell.
Model 3: $\epsilon_{ijk}(t) = e_{ijk}(t) + c_{ijk} K$ if $t \ge T_{ijk}$, and $\epsilon_{ijk}(t) = e_{ijk}(t)$ if $t < T_{ijk}$, where $T_{ijk} \sim U(0, 1)$.
[Figure: sample curves $y$ against $t \in [0, 1]$ generated under Model 1, Model 2, and Model 3.]

Simulations: Model 1
[Figure: grand effect, row effect 1, and row effect 2 against $t$; panels for Model 1 (Median) and Model 1 (Mean).]

Simulations: Model 1
[Figure: column effects 1, 2, and 3 against $t$; panels for Model 1 (Median) and Model 1 (Mean).]

Simulations: Model 2
[Figure: grand effect, row effect 1, and row effect 2 against $t$; panels for Model 2 (Median) and Model 2 (Mean).]

Simulations: Model 2
[Figure: column effects 1, 2, and 3 against $t$; panels for Model 2 (Median) and Model 2 (Mean).]
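The error processes of the three outlier models from the Outlier Models slide can be simulated along the following lines. This is a sketch under assumptions: the function name `simulate_errors`, the default values of `K` and `q`, and the equally spaced grid on $[0, 1]$ are hypothetical choices (the slide text does not state $K$ or the $q_{ij}$), and one call generates the curves of a single cell. Instead of drawing from a full multivariate normal, the sketch uses the fact that a Gaussian process with covariance $\exp\{-|t_2 - t_1|\}$ on an equally spaced grid is a stationary AR(1) sequence with coefficient $\exp(-\Delta t)$.

```python
import math
import random

def simulate_errors(model, m, p, K=8.0, q=0.1, seed=None):
    """Simulate m error curves on p equally spaced points in [0, 1] for one cell."""
    if model not in (1, 2, 3):
        raise ValueError("model must be 1, 2, or 3")
    rng = random.Random(seed)
    t = [k / (p - 1) for k in range(p)]
    rho = math.exp(-(t[1] - t[0]))   # correlation between neighboring grid points
    innov_sd = math.sqrt(1.0 - rho ** 2)
    curves = []
    for _ in range(m):
        # e(t): zero-mean Gaussian process with covariance exp(-|t2 - t1|).
        e = [rng.gauss(0.0, 1.0)]
        for _ in range(p - 1):
            e.append(rho * e[-1] + innov_sd * rng.gauss(0.0, 1.0))
        contaminated = rng.random() < q   # c = 1 with probability q
        onset = rng.random()              # T ~ U(0, 1), used by Model 3 only
        if model == 2 and contaminated:
            e = [v + K for v in e]        # whole curve shifted by K
        elif model == 3 and contaminated:
            e = [v + K if s >= onset else v for v, s in zip(e, t)]  # shift from T on
        curves.append(e)
    return t, curves

# Hypothetical usage: one cell with 100 curves at 50 time points under Model 3.
t, eps = simulate_errors(3, m=100, p=50, K=8.0, q=0.1, seed=2012)
```

The contamination indicator and onset time are drawn for every curve regardless of the model, so that the same seed yields the same underlying Gaussian curves under all three models and only the contamination differs.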
Simulations: Model 3
[Figure: grand effect, row effect 1, and row effect 2 against $t$; panels for Model 3 (Median) and Model 3 (Mean).]

Simulations: Model 3
[Figure: column effects 1, 2, and 3 against $t$; panels for Model 3 (Median) and Model 3 (Mean).]

Spatio-Temporal Precipitation Data
Spatio-temporal precipitation data: annual total precipitation for the U.S. from 1895 to 1997 at 11,918 weather stations.
Nine climatic regions for precipitation, defined by the National Climatic Data Center.
Several areas of outliers detected by Sun and Genton (2011, 2012).
[Figure: map of U.S. weather stations with the nine climatic regions labeled NW, WNC, ENC, NE, C, W, SW, S, SE.]

Functional Boxplots for Nine Climatic Regions
[Figure: functional boxplots of annual precipitation (mm/year) against year, 1895 to 1995, one panel per climatic region.]
Percentage of outliers reported in the panel titles:
  North East: 0.21%
  East North Central: 0%
  Central: 0.25%
  South East: 2.52%
  West North Central: 2.04%
  South: 0%
  South West: 3.03%
  North West: 2.13%
  West: 4.04%

Climatic Region Effects
[Figure: grand effect and region effects against year for the weather station data, in panels titled Weather Station (Median) and Weather Station (Mean); the region-effect panels show one curve per region: East North Central, Central, South East, North East, North West, West, West North Central, South, South West.]

Weather Station and GCM Effects
[Figure: grand effect and data source effect (Weather Station, GCM) against year, 1895 to 1995, in panels titled Weather Station and GCM (Median) and Weather Station and GCM (Mean).]

UK and Ireland Temperatures
Regional climate model (RCM): has higher resolution and covers a limited area of the globe.
The boundaries of an RCM are driven by variables output from a global climate model (GCM).
Question: how much of the variability in the model output comes from the RCM, and how much is due to the boundary conditions from the GCM?
Functional ANOVA: Kaufman and Sain (2010) proposed a mean-based Bayesian framework for spatial data.
Data: PRUDENCE project (Christensen, Carter, and Giorgi 2002), consisting of control runs (1961-1990).
Two factors: RCM (HIRHAM and RCAO) and GCM (ECHAM4 and HadAM3H).

Grand Effect and Residuals
[Figure: maps over longitude and latitude in four panels: Control Total (Median), Control Total (Mean), Residuals (Median), Residuals (Mean).]

GCM and RCM Effects
[Figure: maps over longitude and latitude in four panels: Control GCM: ECHAM4 (Median), Control GCM: HadAM3H (Median), Control RCM: HIRHAM (Median), Control RCM: RCAO (Median).]

Discussion
Functional median polish: robust functional ANOVA fitted by functional medians.
Band depth: graph-based nonparametric ordering for functional/image data (e.g., median image).
The functional median polish algorithm is not guaranteed to yield the least $L_1$-norm residuals.
Fink (1988) proposed a rather complex modification of the classical procedure that converges to the least $L_1$-norm residuals.