part 1

Vector Generalized Additive Models
and applications to extreme value
analysis
Olivier Mestre (1,2)
(1) Météo-France, Ecole Nationale de la Météorologie, Toulouse, France
(2) Université Paul Sabatier, LSP, Toulouse, France
Based on previous studies carried out in collaboration with:
Stéphane Hallegatte (CIRED, Météo-France)
Sébastien Denvil (LMD)
SMOOTHER
« Smoother=tool for summarizing the trend of a response measurement Y
as a function of predictors » (Hastie & Tibshirani)
estimate of the trend that is less variable than Y itself
 Smoothing matrix S
Y*=SY
The equivalent degrees of freedom (df) of the smoother S is the trace of S.
This allows comparison with parametric models.
 Pointwise standard error bands
Cov(Y*) = V = S Sᵀ σ². Given an estimate of σ², this allows approximate
confidence intervals (Y* ± 2 × the square root of the diagonal of V)
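As a minimal sketch of these quantities, the snippet below builds the smoothing matrix S of a running-mean smoother, then computes Y* = SY, the equivalent degrees of freedom trace(S), and the ±2 standard error bands. The running mean is chosen only because its matrix is easy to write down; the data, the window half-width and the value of σ² are made-up assumptions, not values from the study.

```python
import math

def running_mean_matrix(n, half_width):
    """Smoother matrix S of a running mean: row i averages the points
    within `half_width` positions of i (window truncated at the ends)."""
    S = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        w = 1.0 / (hi - lo)
        S.append([w if lo <= j < hi else 0.0 for j in range(n)])
    return S

def smooth_with_bands(S, y, sigma2):
    """Return Y* = S Y, df = trace(S), and the +/- 2 standard error bands."""
    n = len(y)
    fit = [sum(S[i][j] * y[j] for j in range(n)) for i in range(n)]
    df = sum(S[i][i] for i in range(n))
    # Diagonal of V = S S^T sigma^2: variance of each fitted value
    var = [sigma2 * sum(s * s for s in S[i]) for i in range(n)]
    lower = [f - 2.0 * math.sqrt(v) for f, v in zip(fit, var)]
    upper = [f + 2.0 * math.sqrt(v) for f, v in zip(fit, var)]
    return fit, df, lower, upper

# Made-up illustrative data and noise variance (assumptions)
y = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5]
S = running_mean_matrix(len(y), half_width=1)
fit, df, lower, upper = smooth_with_bands(S, y, sigma2=0.25)
print(f"equivalent df = {df:.2f}")
```

A wider window lowers trace(S), that is, fewer equivalent degrees of freedom and a smoother, less variable fit.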
SCATTERPLOT SMOOTHING EXAMPLE
 Data: wind farm production vs numerical windspeed forecasts
SMOOTHING
 Problems raised by smoothers
How to average the response values in
each neighborhood?
How large to take the neighborhoods?

Tradeoff between bias and variance of Y*
SMOOTHING: POLYNOMIAL (parametric)
 Linear and cubic parametric least squares fits: MODEL DRIVEN
APPROACHES
SMOOTHING: BIN SMOOTHER
 In this example, optimum intervals are determined by means of a
regression tree
SMOOTHING: RUNNING LINE
 Running line
KERNEL SMOOTHER
 Nadaraya-Watson
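A kernel smoother of this type answers the two questions raised earlier: the response values are averaged with kernel weights, and the neighbourhood size is set by a bandwidth. The sketch below assumes a Gaussian kernel; the function name, the data and the bandwidth values are illustrative assumptions.

```python
import math

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys at the target point x0 (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Made-up data: y = x^2 on a small grid
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]

# Small bandwidth: follows the data closely (low bias, high variance)
narrow = nadaraya_watson(2.0, xs, ys, bandwidth=0.1)
# Large bandwidth: tends to the global mean (high bias, low variance)
wide = nadaraya_watson(2.0, xs, ys, bandwidth=100.0)
print(narrow, wide)
```

The two calls show the bias-variance tradeoff at its extremes: a very small bandwidth nearly interpolates the observation at the target point, while a very large one returns almost the overall mean of the responses.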
SMOOTHING: LOESS
 The smooth at the target point is the value of a locally weighted linear fit
(tricube weights)
CUBIC SMOOTHING SPLINES
 This smoother is the solution of the following optimization problem:
among all functions f(x) with two continuous derivatives, choose the
one that minimizes the penalized sum of squares
$$\sum_{i=1}^{n} \bigl(Y_i - f(X_i)\bigr)^2 \;+\; \lambda \int_a^b \bigl(f''(x)\bigr)^2 \, dx$$
First term: closeness to the data
Second term: penalization of the curvature of f
It can be shown that the unique solution to this problem is a natural cubic
spline with knots at the unique values xi
Parameter λ can be set by means of cross-validation
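For reference, one common form of the cross-validation criterion is leave-one-out: the smoothing parameter (written λ here) is chosen to minimize the average squared prediction error at each point when that point is left out of the fit. The notation below is an assumption of this note: the fit with point i removed carries the superscript (-i), and S_ii(λ) is the i-th diagonal element of the smoothing matrix. For smoothing splines the second expression avoids refitting n times.

```latex
\mathrm{CV}(\lambda)
  = \frac{1}{n} \sum_{i=1}^{n} \Bigl( Y_i - \hat f_\lambda^{(-i)}(X_i) \Bigr)^2
  = \frac{1}{n} \sum_{i=1}^{n}
    \left( \frac{Y_i - \hat f_\lambda(X_i)}{1 - S_{ii}(\lambda)} \right)^2
```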
CUBIC SMOOTHING SPLINES
 Cubic smoothing splines with equivalent df=5 and 10
Additive models
 Gaussian Linear Model: E[Y] = β0 + β1 X1 + β2 X2
 Gaussian Additive Model: E[Y] = S1(X1) + S2(X2)
S1, S2 smooth functions of predictors X1, X2, usually LOESS, SPLINE
Estimation of S1, S2 : « Backfitting Algorithm »
 PRINCIPLE OF THE BACKFITTING ALGORITHM
Y = S1(X1) + e → estimation S1*
Y - S1*(X1) = S2(X2) + e → estimation S2*
Y - S2*(X2) = S1(X1) + e → estimation S1**
Y - S1**(X1) = S2(X2) + e → estimation S2**
Y - S2**(X2) = S1(X1) + e → estimation S1***
Etc… until convergence
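The alternation above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the study: a Gaussian kernel smoother stands in for the LOESS or spline smoothers, an intercept equal to the mean of Y is added and each function is centred so that the decomposition is identifiable, and the data are synthetic.

```python
import math
import random

def kernel_smooth(x, r, bandwidth):
    """Smooth the partial residuals r against x (Gaussian kernel smoother)."""
    out = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        out.append(sum(wi * ri for wi, ri in zip(w, r)) / sum(w))
    return out

def backfit(x1, x2, y, bandwidth=0.3, max_iter=50, tol=1e-8):
    """Alternately re-estimate S1 and S2 on the partial residuals."""
    n = len(y)
    alpha = sum(y) / n                      # intercept: mean of the response
    s1, s2 = [0.0] * n, [0.0] * n
    for _ in range(max_iter):
        old = s1 + s2                       # previous estimates, concatenated
        r1 = [y[i] - alpha - s2[i] for i in range(n)]
        s1 = kernel_smooth(x1, r1, bandwidth)
        m1 = sum(s1) / n
        s1 = [v - m1 for v in s1]           # centre S1
        r2 = [y[i] - alpha - s1[i] for i in range(n)]
        s2 = kernel_smooth(x2, r2, bandwidth)
        m2 = sum(s2) / n
        s2 = [v - m2 for v in s2]           # centre S2
        if max(abs(a - b) for a, b in zip(s1 + s2, old)) < tol:
            break                           # estimates have stopped changing
    return alpha, s1, s2

# Synthetic illustration (assumed data, not the data of the talk)
rng = random.Random(0)
x1 = [rng.uniform(-2.0, 2.0) for _ in range(60)]
x2 = [rng.uniform(-1.0, 1.0) for _ in range(60)]
y = [math.sin(a) + b * b for a, b in zip(x1, x2)]
alpha, s1, s2 = backfit(x1, x2, y)
fitted = [alpha + u + v for u, v in zip(s1, s2)]
```

Each pass smooths what the other component leaves unexplained, which is why the procedure only needs a univariate smoother, whatever the number of predictors.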
Additive models
 Additive models
One efficient way to perform non-linear regression, but…
 Crucial point
SUITABLE WHEN THERE ARE ONLY A FEW PREDICTORS
2 or 3 predictors at most
Additive models
 Philosophy
DATA-DRIVEN APPROACHES RATHER THAN
MODEL-DRIVEN APPROACHES
USEFUL AS EXPLORATORY TOOLS
 Approximate inference tests are possible, but full inferences are better
assessed by means of parametric models
Generalized Additive models (GAM)
 Extension to non-normal dependent variables
 Generalized additive models: additive modelling of the natural
parameter of exponential-family distributions (Poisson, binomial, gamma,
Gaussian…).
g[µ] = η = S1(X1) + S2(X2)
 Vector Generalized Additive Models (VGAM): one step beyond…
Example 1
Annual number and maximum integrated
intensity (PDI) of hurricane tracks
over the North Atlantic
Number of Hurricanes
 Number of Hurricanes in North Atlantic ~ Poisson distribution
Factors influencing the number of hurricanes
 GAM applied to number of hurricanes (YEAR,SST,SOI,NAO)
GAM model
 log(λ) = β0 + S1(SST) + S2(SOI)
PARAMETRIC model
 “broken stick model” (with continuity constraint) in SOI, revealed by
GAM analysis
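 $$\log(\lambda) = \begin{cases} \beta_0 + \beta_{SOI}^{(1)}\,SOI + \beta_{SST}\,SST & \text{if } SOI < K \\ \beta_0 + \beta_{SOI}^{(1)}\,SOI + \beta_{SOI}^{(2)}\,(SOI - K) + \beta_{SST}\,SST & \text{if } SOI \ge K \end{cases}$$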
 The best fit is obtained for the SOI breakpoint K=1:
log-likelihood=-316.16, compared with -318.71 under linearity;
a standard deviance test rejects linearity (p-value=0.02)
 The expectation λ of the hurricane number is then straightforwardly
computed as a function of SOI and SST
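The two computations on this slide can be sketched as follows. The deviance test uses the two log-likelihoods quoted above and assumes one degree of freedom (the single extra slope, with K treated as fixed). The expectation uses the broken-stick predictor with HYPOTHETICAL coefficients, since the fitted values are not given here; the function and argument names are also illustrative.

```python
import math

def expected_hurricanes(soi, sst, b0, b_soi1, b_soi2, b_sst, k):
    """Poisson mean from the broken-stick log-linear predictor.
    max(soi - k, 0) adds the extra slope only above the breakpoint,
    so the curve stays continuous at soi == k."""
    eta = b0 + b_soi1 * soi + b_soi2 * max(soi - k, 0.0) + b_sst * sst
    return math.exp(eta)

def deviance_test(loglik_full, loglik_reduced):
    """Deviance (likelihood-ratio) test against a chi-square with 1 df.
    For 1 df, P(chi2 > d) = erfc(sqrt(d / 2))."""
    deviance = 2.0 * (loglik_full - loglik_reduced)
    return deviance, math.erfc(math.sqrt(deviance / 2.0))

# Log-likelihoods from the slide; 1 df is an assumption (one extra slope).
deviance, p_value = deviance_test(-316.16, -318.71)
print(f"deviance = {deviance:.2f}, p-value = {p_value:.3f}")

# HYPOTHETICAL coefficients, for illustration only (not fitted values).
coef = dict(b0=1.5, b_soi1=0.1, b_soi2=0.3, b_sst=0.5, k=1.0)
lam = expected_hurricanes(soi=1.5, sst=0.2, **coef)
print(f"expected number of hurricanes = {lam:.2f}")
```

Writing the second slope on max(SOI - K, 0) is one simple way to impose the continuity constraint: both branches of the model give the same value at SOI = K.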
EXPECTATION OF HURRICANE NUMBERS
OBSERVED vs EXPECTED: r=0.6