Predicting the intraday-load curve Structure in high dimensional models Dominique Picard

advertisement
Predicting the intraday-load curve
Structure in high dimensional models
Dominique Picard
LPMA- Université Paris-Diderot-Paris 7
joint : Mathilde Mougeot UPD, Vincent Lefieux RTE, Laurence Maillard RTE
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
1/50
Forecasting electricity : black-out
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
2/50
Predicting the intraday load curve
with a statistical learning point of
view
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
3/50
Intraday load curve forecasting
-here 48h-
4
5.6
20100602−3−−20100603−4
x 10
5.4
5.2
5
4.8
4.6
4.4
4.2
4
Y
Yapx
tpred
3.8
3.6
0
10
20
30
40
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
50
4/50
Forecasting pipeline
1
Construction of a ’ smart encyclopedia’ of past scenarios out
of a data basis using learning algorithms.
2
Build an ensemble of prediction experts consulting the
encyclopedia.
3
Build an ensemble prediction
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
5/50
Data basis
The past data basis
Electrical consumption of the past
Meteorological input
Endogenous variables : calendar data, functional bases
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
6/50
Electrical consumption of the past
Recorded every half hour from January 1st , 2003 to August
31th , 2010.
For this period of time, the global consumption signal is split
into N = 2800 sub signals (Y1 , . . . , Yt , . . . , YN ). Yt ∈ R n ,
defines the intra day load curve for the t th day of size n = 48.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
7/50
Load curve during a week : a function
4
9
x 10
8.5
8
7.5
7
6.5
6
0
20
40
60
80
100
120
140
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
160
180
8/50
Functional aspect : dictionary
0.4
0.3
0.2
0.1
0
−0.1
−0.2
−0.3
−0.4
0
10
20
30
40
50
60
70
Figure : Functions of the dictionary. Constant (black-solid line), cosine
(blue-dotted line), sine (blue-dashdot line), Haar (red-dashed line) and
temperature (green-solid line with points) functions.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
9/50
Calendar and functional effects
4
4
6
x 10
8.4
x 10
8.2
8
5.5
7.8
7.6
5
7.4
4.5
7.2
4
6.8
7
6.6
3.5
0
5
10
15
20
25
6.4
4
7.8
0
5
10
15
20
25
5
10
15
20
25
4
x 10
4.4
7.6
x 10
4.2
7.4
4
7.2
7
3.8
6.8
3.6
6.6
3.4
6.4
3.2
6.2
6
0
5
10
15
20
Figure :
25
3
0
autumn, winter, spring and summer
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
10/50
Meteorological inputs
A total of 371 (=2x39+293) meteorological variables recorded
each day half-hourly over the 2800 days of the same period of time.
Temperature :
T k for k = 1, . . . , 39 measured in 39 weather stations scattered all
over the French territory.
Cloud Cover :
N k for k = 1, . . . , 39 measured in the same 39 weather stations.
Wind :
′
W k for k ′ = 1, . . . , 293 available at 293 network points scattered
all over the territory.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
11/50
Weather stations
Figure :
Temperature and Cloud covering measurement stations. Wind stations
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
12/50
Main issues
1
Large dimension
2
Prediction requires to explain with a small number of
predictable parameters
3
Most of the potentially explanatory variables (load curve,
meteo, functions of the dictionary) are highly correlated
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
13/50
Reduced set of explanatory variables
For each t index of the day of interest, we register the daily
electrical consumption signal Yt and
Zt = [[CF ]t [T ]t [N]t [W ]t ]
is the concatenation of the ”calendar and functional” variables and
”climate variables”.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
14/50
Sparse approximation
Sparse Approximation of each consumption day on a learning set of
days (2003-2010), using the set of potentially explanatory variables.
For each day t of the learning set, we build an approximation
Ŷt of the (observed) signal Yt with the help of the set of
explanatory variables (Zt ) :
Ŷt = Gt (Zt )
Gt (Zt ) = Zt βˆt , sparse
(∗ ) Sparse Approximation and Knowledge Extraction for Electrical Consumption Signals, M. Mougeot, D. P., K.
Tribouley & V. Lefieux, L. Teyssier-Maillard, Adv. Adapt. Data Anal. 5 (2013), no. 4
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
15/50
Forecasting procedure
Forecasting using the encyclopedia
Construction of an ensemble of forecasting experts.
Aggregation of the experts.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
16/50
Expert associated to the strategy M
Forecasting experts
Strategy : M a function, data dependent or not, from N to N
such that for any d ∈ N, M(d) < d (purely non anticipative).
Plug-in To the strategy M we associate the expert ỸtM : the
prediction of the signal of day t using forecasting strategy M,
ỸtM = GM(t) (Zt ) = Zt β̂M(t)
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
17/50
Examples of strategies : time focused
experts
tm1 : Refer to the day before : (The coefficients used for prediction
are those calculated the previous day)
M(d) = d − 1
Ỹttm1 = Zt β̂t−1
tm7 : Refer to one week before :
M(d) = d − 7
Ỹttm7 = Zt β̂t−7
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
18/50
Meteo-scenario focused experts
T : Find the day having the closest temperature indicators,
regarding the sup distance (over the days, and over the
indicators) :
M(d) = ArgMint supk∈{1,...,6}, i ∈{1,...,48} |Tdk (i ) − Ttk (i )|
Tm : Find the day having the closest median temperature with
the sup distance (over the days) :
M(d) = ArgMint supi ∈{1,...,48} |Td3 (i ) − Tt3 (i )|
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
19/50
Prediction evaluation
M
Naive
Apx
tm1
tm7
T
Tm
T/N
Tm/N
T/G
T/d
T/c
Ns/G
N/d
N/c
mean
0.0634
0.0129
0.0340
0.0327
0.0306
0.0329
0.0347
0.0358
0.0323
0.0351
0.0340
0.0322
0.0305
0.0307
med
0.0415
0.0104
0.0281
0.0258
0.0263
0.0275
0.0293
0.0300
0.0271
0.0278
0.0259
0.0251
0.0239
0.0237
min
0.0046
0.0023
0.0063
0.0054
0.0058
0.0047
0.0056
0.0054
0.0050
0.0053
0.0053
0.0049
0.0042
0.0042
max
0.1982
0.0786
0.1490
0.2297
0.1085
0.2020
0.1916
0.2156
0.1916
0.1916
0.1937
0.2078
0.1449
0.1990
Table : Mape errors for the different strategies for intraday forecasting
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
20/50
Aggregation of predictors
(Exponential weights inspired by various theoretical results -see
Lecue, Rigollet, Stolz, Tsybakov,...-)
Ỹdwgt ∗
with
wdM = exp(−
=
PM
m m
m=1 wd Ỹd
m
m=1 wd
PM
T
1 X M
|Ỹd (i ) − Yd (i )|2 )
Tθ
i =1
θ is a parameter, (often called temperature in physic applications,
see the discussion below) T = Tpred.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
21/50
Forecasting
Figure 4 shows the forecasting signal for a winter day
(mape=0.7%).
4
5.6
20100602−3−−20100603−4
x 10
5.4
5.2
5
4.8
4.6
4.4
4.2
4
Y
Yapx
tpred
3.8
3.6
0
10
20
30
40
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
50
22/50
Winter forecast
4
9
x 10
8.5
8
7.5
7
6.5
6
5.5
5
0
20
40
60
80
100
120
140
160
180
Figure :
Forecast (solid blue line) and observed (dashed dark line) electrical consumption for a winter week
from Monday February 1st to Sunday January 7th 2010.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
23/50
Spring forecast
4
6
x 10
5.5
5
4.5
4
3.5
3
2.5
0
20
40
60
80
100
120
140
160
180
Figure :
Forecast (solid blue line) and observed (dashed dark line) electrical consumption for a spring week
from Monday June 14th to Sunday June 21th 2010.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
24/50
Sparse structured methods
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
25/50
High dimensional linear Models
Y = Xβ + ǫ
β ∈ IR k is the unknown parameter (to be estimated)
ǫ = (ǫ1 , . . . , ǫn )∗ is a (non observed) vector of random errors.
It is assumed to be variables i.i.d. N(0, σ 2 )
X is a known matrix n × k.
High dimension : k >> nt
M. Mougeot, D. P., K. Tribouley, 2012, JRSS B vol 74
Mougeot M. , P. D., Tribouley K., 2013, JSPI vol 143
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
26/50
Usual sparsity conditions
SMALL NUMBER OF BIG COEFFICIENTS
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
27/50
Structuring
Y = X β + ǫ, X : N × k
We decide to re-arrange the k predictors into p (p ≤ k) groups of
variables
X = [XG1 , . . . , XGp ]
where G1 , . . . , Gp is a partition of {1, . . . , k}.
Xℓ = X(j,t) , XGj = [X(j,1) , . . . , X(j,|Gj |) ]
j ∈ {1, . . . , p} is the index of the group Gj
t is the altitude (height) of ℓ inside the group Gj .
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
28/50
How to choose the structure
Frequent observations
Lasso chooses one variable among correlated ones
but is unstable in presence of correlation : a weakness for
prediction
Diverses ways for structuring
Pre-screening : search for a regression on unions of spaces
with dimension reduction : PCA, PLS, Kernel methods,
Graph-Laplacian, spectral screening...
Adapt the groups to improve the procedure
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
29/50
Structured Sparsity
• Structured sparsity ideally less stringent than ordinary one
• Means we require a small number of ’energetic’ groups
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
30/50
Methods for sparsity
Many penalizations introduced historically in the regression
framework (identification constraints on β)
penalization : E (β, λ) = ||Y − X β||2 + λ pen(β)
Greedy methods
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
31/50
Greedy methods : 2-step thresholding
Y = Xβ + ǫ
Y (n × 1), X (n × k)
steps
Step 1=pre-selection
Least squares
Step 2=denoising
compute
size
Find b elected
b < n << k
Xb
(n, b)
on elected
β̃ = (Xb∗ Xb )−1 Xb∗ Y
(1, b)
the coefficients
β̂
(1, Ŝ)
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
32/50
GR-greedy - step1
The columns of X are normalized
N
1X 2
Xi (j,t) = 1, ∀ (j, t).
n
i =1
Search for energetic groups thresholding :
N
1X
Xi (j,t)Yi |
∀ (j, t), 1 ≤ j ≤ p, 1 ≤ t ≤ T
n
i =1
X
2
ρ2j =
K(j,t)
, energy of group j
K(j,t) = |
t=1,...,T
T = max |Gj |
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
33/50
GR-greedy - step1
λ(1) tuning parameter
B = j = 1, . . . , p, ρ2j ≥ λ(1)2
(1)
GB = ∪j∈B Gj .
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
34/50
GR-greedy - step2
OLS on all the elected people by considering the new
pseudo-linear model
Y = XGB βGB + error .
β̂GB = β̂(B) ( hence
β̂GBc = 0)
where
β̂(B) = [XGt B XGB ]−1 XGB Y .
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
35/50
GR-greedy - Block-Thresholding
λ(2) another tuning parameter.
We apply the second threshold on the estimated coefficients
∀ℓ = (j, t) ∈ {1, . . . , k},
kβ̂k2Gj ,2 :=
β̂ℓ∗ = β̂ℓ I{ kβ̂kGj ,2 ≥ λ(2) }
X
2
β̂(j,t)
.
0≤t≤T
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
36/50
Concentration results
Loss Function
d(β̂ ∗ , β)2 =
k
X
(βbl − βl )2
l=1
Assumptions
Sparsity :
p
X
j=1
kβkqGj ,2 =
p X
T
X
|β(j,t) |2 ]q/2 ≤ (M)q .
[
j=1 t=1
(β ∈ B2,q (M))
Dimension : p ≤ exp(c ′ n),
(c ′ constant)
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
37/50
Concentration results
r
sup Ed(β̂ ∗ , β)2 ≤ D[
B2,q (M)
T ∨ log p
∨ τ (G)](2−q)
n
for some positive constant D
What is τ (G) ?
Mougeot M. , P. D., Tribouley K. (2013) Grouping Strategies and Thresholding for High Dimensional Linear
Models. Journal of Statistical Planning and Inference with discussion, 143, p 1457-1465.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
38/50
Coherence.
Let M be the k × k Gram matrix :
M :=
1 ∗
X X.
n
and the Coherence
N
τn = sup |Mℓm | = sup |
ℓ6=m
ℓ6=m
1X
Xi ℓ Xim |
n
i =1
N
1X
sup |
=
sup |M(j,t)(j ′ ,t ′ ) | =
Xi (j,t)Xi (j ′ ,t ′ ) |
(j,t)6=(j ′ ,t ′ )
(j,t)6=(j ′ ,t ′ ) n
i =1
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
39/50
Splitting the coherence
Multitask inspiration
We split the coherence τn into γBG and γBA where
γBG := sup sup M(j,t)(j ′ ,t) .
t
j6=j ′
between groups-given altitude, sup over altitude
γBA := sup sup M(j,t)(j ′ ,t ′ ) (small)
j,j ′ t6=t ′
different altitudes, no matter which groups
τ (G) = T γBA + γBG
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
40/50
Example
ST coefficients, all equal to γ.
γBA = 0, γBG = γ ≥
greedy
RATES
ST [γ 2 +
log k
n ]
r
log k
n
GRgreedy (opt)
S[γ 2 +
T
n
+
log k/T
n
GRgreedy (worse)
]
ST [γ 2 +
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
T
n
+
log k/T
]
n
41/50
Strategies for grouping
Y = X β + ǫ Y (N × 1), X (N × k)
Question : how to group to obtain better rates, when possible ?
P
P
2 q/2 ]
[ pj=1 [ T
t=1 |β(j,t) | ]
q
p
∨ {T γBA + γBG }]2−q
[ T ∨log
n
↓
↓
GATHERING
WORKING on T γBA + γBG
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
42/50
Boosting the rates
Y = Xβ + ǫ
Y (N × 1), X (N × k)
p groups
Calculate the (internal) correlations of the columns of the
matrix X as well as their (external) correlation with the target
Y.
Divide the columns of X into two sets : S1 : highly correlated,
S2 : weakly correlated.
Put S1 as ’group beginners’ or ’delegates’ (each of them has
smallest altitude in its group)
Fill the groups
with affinity with the delegate in terms of
P
X
Kl = | n1 N
i =1 il Yi | : Realize a small number of energetic
groups
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
43/50
Boosting the rates
Divide the columns of X into two sets : S1 : highly correlated,
S2 : weakly correlated.
Put S1 as ’group beginners’ or ’delegates’ (each of them has
smallest altitude in its group) −→ γBA << γBG = γmax
Realize a ’good cut’ off S1 and S2 , ensuring :
T γBA ≤ γBG ,
2
T /n ≤ γBG
Fill the groups
with affinity with thePdelegate
PT in terms 2ofq
p
1 PN
Kl = | n i =1 Xil Yi | : indication of j=1 [ t=1 |β(j,t) | ] as
small as possible
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
44/50
Back to electrical consumption
4
5
x 10
4.8
4.6
4.4
4.2
4
3.8
3.6
3.4
3.2
0
10
20
30
40
50
60
70
time
Figure : French consumption
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
45/50
Temperatures
Figure : French temperature spots
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
46/50
Dictionary
0.4
0.3
0.2
0.1
0
−0.1
−0.2
−0.3
−0.4
0
10
20
30
40
50
60
70
Figure : Functions of the dictionary. Constant (black-solid line), cosine
(blue-dotted line), sine (blue-dashdot line), Haar (red-dashed line) and
temperature (green-solid line with points) functions.
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
47/50
Delegates
6000
1
3
5000
E
C
S
H
T
4000
3000
2000
1000
0
11
3
0
Figure :
20
40
60
3
80
100
Correlations between the consumption signal and the various dictionary functions.
Group : E = 0.75%, 24 regressors / 7 groups
THS-THH-THH-TCS-TCST-HHTC-STSH.
meaningful functions (8 : T ) ; (3 : C ) ; (5 : S) ; (8 : H).
Individual, E = 1.86% 24 regressors :
T-T-C-T-H-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-S-C
and are meaningful functions (20 : T ), (2 : C ), (1 : S), (1 : H) .
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
48/50
Approximation
4
5
x 10
4.8
4.6
4.4
4.2
4
3.8
3.6
sig
grol
lol
3.4
3.2
0
10
20
30
40
50
60
70
Figure : Model of the consumption signal (black-solid line) using
GRgreed (red-dahsed line) and I-greed (blue dot dashed line).
Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models
49/50
Download