Vector Generalized Additive Models and applications to extreme value analysis
Olivier Mestre (1,2)
(1) Météo-France, Ecole Nationale de la Météorologie, Toulouse, France
(2) Université Paul Sabatier, LSP, Toulouse, France
Based on previous studies carried out in collaboration with: Stéphane Hallegatte (CIRED, Météo-France) and Sébastien Denvil (LMD)

SMOOTHER
"Smoother = tool for summarizing the trend of a response measurement Y as a function of predictors" (Hastie & Tibshirani)
It produces an estimate of the trend that is less variable than Y itself.
Smoothing matrix S: $Y^* = S Y$
The equivalent degrees of freedom (df) of the smoother are given by the trace of S. This allows comparison with parametric models.
Pointwise standard error bands: $\mathrm{Cov}(Y^*) = V = S S^{\top} \sigma^2$
Given an estimate of $\sigma^2$, this allows approximate confidence intervals (fitted values ± 2 × the square root of the diagonal of V).

SCATTERPLOT SMOOTHING EXAMPLE
Data: wind farm production vs numerical windspeed forecasts

SMOOTHING
Problems raised by smoothers
How to average the response values in each neighborhood?
How large to take the neighborhoods?
Tradeoff between bias and variance of Y*

SMOOTHING: POLYNOMIAL (parametric)
Linear and cubic parametric least squares fits: MODEL DRIVEN APPROACHES

SMOOTHING: BIN SMOOTHER
In this example, optimum intervals are determined by means of a regression tree.

SMOOTHING: RUNNING LINE

KERNEL SMOOTHER
Nadaraya-Watson

SMOOTHING: LOESS
The smooth at the target point is the value of a locally weighted linear fit (tricube weights).

CUBIC SMOOTHING SPLINES
This smoother is the solution of the following optimization problem: among all functions f(x) with two continuous derivatives, choose the one that minimizes the penalized sum of squares
$\sum_{i=1}^{n} \{Y_i - f(X_i)\}^2 + \lambda \int_a^b \{f''(x)\}^2 \, dx$
The first term measures closeness to the data; the second term penalizes the curvature of f.
It can be shown that the unique solution to this problem is a natural cubic spline with knots at the unique values of $X_i$.
The parameter $\lambda$ can be set by means of cross-validation.

CUBIC SMOOTHING SPLINES
Cubic smoothing splines with equivalent df=5 and 10

Additive models
Gaussian linear model: $E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Gaussian additive model: $E[Y] = S_1(X_1) + S_2(X_2)$
S1, S2: smooth functions of the predictors X1, X2, usually LOESS or splines
Estimation of S1, S2: "backfitting algorithm"

PRINCIPLE OF THE BACKFITTING ALGORITHM
Y = S1(X1) + e → estimate S1*
Y - S1*(X1) = S2(X2) + e → estimate S2*
Y - S2*(X2) = S1(X1) + e → estimate S1**
Y - S1**(X1) = S2(X2) + e → estimate S2**
Y - S2**(X2) = S1(X1) + e → estimate S1***
Etc… until convergence

Additive models
One efficient way to perform non-linear regression, but…
Crucial point: ADAPTED WHEN ONLY FEW PREDICTORS (2 or 3 predictors at most)

Additive models
Philosophy
DATA DRIVEN APPROACHES RATHER THAN MODEL DRIVEN APPROACH
USEFUL AS EXPLORATORY TOOLS
Approximate inference tests are possible, but full inferences are better assessed by means of parametric models.

Generalized Additive models (GAM)
Extension to non-normal dependent variables
Generalized additive models: additive modelling of the natural parameter of exponential family laws (Poisson, Binomial, Gamma, Gaussian…).
$g[\mu] = \eta = S_1(X_1) + S_2(X_2)$

Vector Generalized Additive Models (VGAM): one step beyond…

Example 1
Annual number and maximum integrated intensity (PDI) of hurricane tracks over the North Atlantic

Number of Hurricanes
Number of hurricanes in the North Atlantic ~ Poisson distribution (mean $\lambda$)

Factors influencing the number of hurricanes
GAM applied to the number of hurricanes (YEAR, SST, SOI, NAO)
GAM model: $\log(\lambda) = \beta_0 + S_1(\mathrm{SST}) + S_2(\mathrm{SOI})$
PARAMETRIC model: "broken stick model" (with continuity constraint) in SOI, revealed by the GAM analysis
$\log(\lambda) = \beta_0 + \beta_{SOI}^{(1)} \, \mathrm{SOI} + \beta_{SST} \, \mathrm{SST}$ for SOI < K
$\log(\lambda) = \beta_0 + \beta_{SOI}^{(1)} \, \mathrm{SOI} + \beta_{SOI}^{(2)} (\mathrm{SOI} - K) + \beta_{SST} \, \mathrm{SST}$ for SOI ≥ K
The best fit is obtained for the SOI value K=1: log-likelihood=-316.16, to be compared with -318.71 (linearity).
A standard deviance test rejects linearity (p-value=0.02).
The expected number of hurricanes is then straightforwardly computed as a function of SOI and SST.

EXPECTATION OF HURRICANE NUMBERS
OBSERVED vs EXPECTED: r=0.6
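
The broken stick model and the deviance test can be illustrated with a minimal Python sketch. It is not the code used for the study. The function names and the coefficient values in the usage line are hypothetical; only K=1 and the two log-likelihoods come from the slides. The test assumes one degree of freedom (the extra slope, with K treated as fixed), which is an assumption made here rather than something stated in the slides.

```python
import math


def broken_stick_log_rate(soi, sst, b0, b_soi1, b_soi2, b_sst, k=1.0):
    """Linear predictor log(lambda) of the continuous broken stick model.

    The hinge term max(SOI - K, 0) is zero below the break point, so the
    two branches meet at SOI = K (continuity constraint).
    """
    hinge = max(soi - k, 0.0)
    return b0 + b_soi1 * soi + b_soi2 * hinge + b_sst * sst


def expected_count(soi, sst, b0, b_soi1, b_soi2, b_sst, k=1.0):
    """Expected Poisson count, lambda = exp(linear predictor)."""
    return math.exp(broken_stick_log_rate(soi, sst, b0, b_soi1, b_soi2, b_sst, k))


def deviance_test(loglik_full, loglik_reduced):
    """Deviance (likelihood ratio) test, ASSUMING 1 degree of freedom.

    For a chi-square variable with 1 df, P(X > d) = erfc(sqrt(d / 2)).
    """
    deviance = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(deviance / 2.0))
    return deviance, p_value


# Log-likelihoods quoted in the slides: broken stick vs linear in SOI.
deviance, p_value = deviance_test(-316.16, -318.71)
print(f"deviance = {deviance:.2f}, p-value = {p_value:.3f}")

# Hypothetical coefficients, for illustration only (not fitted values).
print(expected_count(soi=1.5, sst=0.2, b0=1.0, b_soi1=0.1, b_soi2=0.3, b_sst=0.5))
```

With the quoted log-likelihoods, the deviance is 5.10 and the p-value is about 0.02, in line with the value reported above.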