Predicting the intraday-load curve Structure in high dimensional models Dominique Picard LPMA- Université Paris-Diderot-Paris 7 joint : Mathilde Mougeot UPD, Vincent Lefieux RTE, Laurence Maillard RTE Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 1/50 Forecasting electricity : black-out Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 2/50 Predicting the intraday load curve with a statistical learning point of view Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 3/50 Intraday load curve forecasting -here 48h- 4 5.6 20100602−3−−20100603−4 x 10 5.4 5.2 5 4.8 4.6 4.4 4.2 4 Y Yapx tpred 3.8 3.6 0 10 20 30 40 Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 50 4/50 Forecasting pipeline 1 Construction of a ’ smart encyclopedia’ of past scenarios out of a data basis using learning algorithms. 2 Build an ensemble of prediction experts consulting the encyclopedia. 3 Build an ensemble prediction Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 5/50 Data basis The past data basis Electrical consumption of the past Meteorological input Endogenous variables : calendar data, functional bases Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 6/50 Electrical consumption of the past Recorded every half hour from January 1st , 2003 to August 31th , 2010. For this period of time, the global consumption signal is split into N = 2800 sub signals (Y1 , . . . , Yt , . . . , YN ). Yt ∈ R n , defines the intra day load curve for the t th day of size n = 48. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 7/50 Load curve during a week : a function 4 9 x 10 8.5 8 7.5 7 6.5 6 0 20 40 60 80 100 120 140 Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 160 180 8/50 Functional aspect : dictionary 0.4 0.3 0.2 0.1 0 −0.1 −0.2 −0.3 −0.4 0 10 20 30 40 50 60 70 Figure : Functions of the dictionary. Constant (black-solid line), cosine (blue-dotted line), sine (blue-dashdot line), Haar (red-dashed line) and temperature (green-solid line with points) functions. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 9/50 Calendar and functional effects 4 4 6 x 10 8.4 x 10 8.2 8 5.5 7.8 7.6 5 7.4 4.5 7.2 4 6.8 7 6.6 3.5 0 5 10 15 20 25 6.4 4 7.8 0 5 10 15 20 25 5 10 15 20 25 4 x 10 4.4 7.6 x 10 4.2 7.4 4 7.2 7 3.8 6.8 3.6 6.6 3.4 6.4 3.2 6.2 6 0 5 10 15 20 Figure : 25 3 0 autumn, winter, spring and summer Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 10/50 Meteorological inputs A total of 371 (=2x39+293) meteorological variables recorded each day half-hourly over the 2800 days of the same period of time. Temperature : T k for k = 1, . . . , 39 measured in 39 weather stations scattered all over the French territory. Cloud Cover : N k for k = 1, . . . , 39 measured in the same 39 weather stations. Wind : ′ W k for k ′ = 1, . . . , 293 available at 293 network points scattered all over the territory. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 11/50 Weather stations Figure : Temperature and Cloud covering measurement stations. Wind stations Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 12/50 Main issues 1 Large dimension 2 Prediction requires to explain with a small number of predictable parameters 3 Most of the potentially explanatory variables (load curve, meteo, functions of the dictionary) are highly correlated Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 13/50 Reduced set of explanatory variables For each t index of the day of interest, we register the daily electrical consumption signal Yt and Zt = [[CF ]t [T ]t [N]t [W ]t ] is the concatenation of the ”calendar and functional” variables and ”climate variables”. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 14/50 Sparse approximation Sparse Approximation of each consumption day on a learning set of days (2003-2010), using the set of potentially explanatory variables. For each day t of the learning set, we build an approximation Ŷt of the (observed) signal Yt with the help of the set of explanatory variables (Zt ) : Ŷt = Gt (Zt ) Gt (Zt ) = Zt βˆt , sparse (∗ ) Sparse Approximation and Knowledge Extraction for Electrical Consumption Signals, M. Mougeot, D. P., K. Tribouley & V. Lefieux, L. Teyssier-Maillard, Adv. Adapt. Data Anal. 5 (2013), no. 4 Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 15/50 Forecasting procedure Forecasting using the encyclopedia Construction of an ensemble of forecasting experts. Aggregation of the experts. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 16/50 Expert associated to the strategy M Forecasting experts Strategy : M a function, data dependent or not, from N to N such that for any d ∈ N, M(d) < d (purely non anticipative). Plug-in To the strategy M we associate the expert ỸtM : the prediction of the signal of day t using forecasting strategy M, ỸtM = GM(t) (Zt ) = Zt β̂M(t) Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 17/50 Examples of strategies : time focused experts tm1 : Refer to the day before : (The coefficients used for prediction are those calculated the previous day) M(d) = d − 1 Ỹttm1 = Zt β̂t−1 tm7 : Refer to one week before : M(d) = d − 7 Ỹttm7 = Zt β̂t−7 Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 18/50 Meteo-scenario focused experts T : Find the day having the closest temperature indicators, regarding the sup distance (over the days, and over the indicators) : M(d) = ArgMint supk∈{1,...,6}, i ∈{1,...,48} |Tdk (i ) − Ttk (i )| Tm : Find the day having the closest median temperature with the sup distance (over the days) : M(d) = ArgMint supi ∈{1,...,48} |Td3 (i ) − Tt3 (i )| Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 19/50 Prediction evaluation M Naive Apx tm1 tm7 T Tm T/N Tm/N T/G T/d T/c Ns/G N/d N/c mean 0.0634 0.0129 0.0340 0.0327 0.0306 0.0329 0.0347 0.0358 0.0323 0.0351 0.0340 0.0322 0.0305 0.0307 med 0.0415 0.0104 0.0281 0.0258 0.0263 0.0275 0.0293 0.0300 0.0271 0.0278 0.0259 0.0251 0.0239 0.0237 min 0.0046 0.0023 0.0063 0.0054 0.0058 0.0047 0.0056 0.0054 0.0050 0.0053 0.0053 0.0049 0.0042 0.0042 max 0.1982 0.0786 0.1490 0.2297 0.1085 0.2020 0.1916 0.2156 0.1916 0.1916 0.1937 0.2078 0.1449 0.1990 Table : Mape errors for the different strategies for intraday forecasting Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 20/50 Aggregation of predictors (Exponential weights inspired by various theoretical results -see Lecue, Rigollet, Stolz, Tsybakov,...-) Ỹdwgt ∗ with wdM = exp(− = PM m m m=1 wd Ỹd m m=1 wd PM T 1 X M |Ỹd (i ) − Yd (i )|2 ) Tθ i =1 θ is a parameter, (often called temperature in physic applications, see the discussion below) T = Tpred. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 21/50 Forecasting Figure 4 shows the forecasting signal for a winter day (mape=0.7%). 4 5.6 20100602−3−−20100603−4 x 10 5.4 5.2 5 4.8 4.6 4.4 4.2 4 Y Yapx tpred 3.8 3.6 0 10 20 30 40 Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 50 22/50 Winter forecast 4 9 x 10 8.5 8 7.5 7 6.5 6 5.5 5 0 20 40 60 80 100 120 140 160 180 Figure : Forecast (solid blue line) and observed (dashed dark line) electrical consumption for a winter week from Monday February 1st to Sunday January 7th 2010. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 23/50 Spring forecast 4 6 x 10 5.5 5 4.5 4 3.5 3 2.5 0 20 40 60 80 100 120 140 160 180 Figure : Forecast (solid blue line) and observed (dashed dark line) electrical consumption for a spring week from Monday June 14th to Sunday June 21th 2010. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 24/50 Sparse structured methods Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 25/50 High dimensional linear Models Y = Xβ + ǫ β ∈ IR k is the unknown parameter (to be estimated) ǫ = (ǫ1 , . . . , ǫn )∗ is a (non observed) vector of random errors. It is assumed to be variables i.i.d. N(0, σ 2 ) X is a known matrix n × k. High dimension : k >> nt M. Mougeot, D. P., K. Tribouley, 2012, JRSS B vol 74 Mougeot M. , P. D., Tribouley K., 2013, JSPI vol 143 Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 26/50 Usual sparsity conditions SMALL NUMBER OF BIG COEFFICIENTS Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 27/50 Structuring Y = X β + ǫ, X : N × k We decide to re-arrange the k predictors into p (p ≤ k) groups of variables X = [XG1 , . . . , XGp ] where G1 , . . . , Gp is a partition of {1, . . . , k}. Xℓ = X(j,t) , XGj = [X(j,1) , . . . , X(j,|Gj |) ] j ∈ {1, . . . , p} is the index of the group Gj t is the altitude (height) of ℓ inside the group Gj . Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 28/50 How to choose the structure Frequent observations Lasso chooses one variable among correlated ones but is unstable in presence of correlation : a weakness for prediction Diverses ways for structuring Pre-screening : search for a regression on unions of spaces with dimension reduction : PCA, PLS, Kernel methods, Graph-Laplacian, spectral screening... Adapt the groups to improve the procedure Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 29/50 Structured Sparsity • Structured sparsity ideally less stringent than ordinary one • Means we require a small number of ’energetic’ groups Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 30/50 Methods for sparsity Many penalizations introduced historically in the regression framework (identification constraints on β) penalization : E (β, λ) = ||Y − X β||2 + λ pen(β) Greedy methods Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 31/50 Greedy methods : 2-step thresholding Y = Xβ + ǫ Y (n × 1), X (n × k) steps Step 1=pre-selection Least squares Step 2=denoising compute size Find b elected b < n << k Xb (n, b) on elected β̃ = (Xb∗ Xb )−1 Xb∗ Y (1, b) the coefficients β̂ (1, Ŝ) Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 32/50 GR-greedy - step1 The columns of X are normalized N 1X 2 Xi (j,t) = 1, ∀ (j, t). n i =1 Search for energetic groups thresholding : N 1X Xi (j,t)Yi | ∀ (j, t), 1 ≤ j ≤ p, 1 ≤ t ≤ T n i =1 X 2 ρ2j = K(j,t) , energy of group j K(j,t) = | t=1,...,T T = max |Gj | Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 33/50 GR-greedy - step1 λ(1) tuning parameter B = j = 1, . . . , p, ρ2j ≥ λ(1)2 (1) GB = ∪j∈B Gj . Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 34/50 GR-greedy - step2 OLS on all the elected people by considering the new pseudo-linear model Y = XGB βGB + error . β̂GB = β̂(B) ( hence β̂GBc = 0) where β̂(B) = [XGt B XGB ]−1 XGB Y . Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 35/50 GR-greedy - Block-Thresholding λ(2) another tuning parameter. We apply the second threshold on the estimated coefficients ∀ℓ = (j, t) ∈ {1, . . . , k}, kβ̂k2Gj ,2 := β̂ℓ∗ = β̂ℓ I{ kβ̂kGj ,2 ≥ λ(2) } X 2 β̂(j,t) . 0≤t≤T Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 36/50 Concentration results Loss Function d(β̂ ∗ , β)2 = k X (βbl − βl )2 l=1 Assumptions Sparsity : p X j=1 kβkqGj ,2 = p X T X |β(j,t) |2 ]q/2 ≤ (M)q . [ j=1 t=1 (β ∈ B2,q (M)) Dimension : p ≤ exp(c ′ n), (c ′ constant) Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 37/50 Concentration results r sup Ed(β̂ ∗ , β)2 ≤ D[ B2,q (M) T ∨ log p ∨ τ (G)](2−q) n for some positive constant D What is τ (G) ? Mougeot M. , P. D., Tribouley K. (2013) Grouping Strategies and Thresholding for High Dimensional Linear Models. Journal of Statistical Planning and Inference with discussion, 143, p 1457-1465. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 38/50 Coherence. Let M be the k × k Gram matrix : M := 1 ∗ X X. n and the Coherence N τn = sup |Mℓm | = sup | ℓ6=m ℓ6=m 1X Xi ℓ Xim | n i =1 N 1X sup | = sup |M(j,t)(j ′ ,t ′ ) | = Xi (j,t)Xi (j ′ ,t ′ ) | (j,t)6=(j ′ ,t ′ ) (j,t)6=(j ′ ,t ′ ) n i =1 Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 39/50 Splitting the coherence Multitask inspiration We split the coherence τn into γBG and γBA where γBG := sup sup M(j,t)(j ′ ,t) . t j6=j ′ between groups-given altitude, sup over altitude γBA := sup sup M(j,t)(j ′ ,t ′ ) (small) j,j ′ t6=t ′ different altitudes, no matter which groups τ (G) = T γBA + γBG Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 40/50 Example ST coefficients, all equal to γ. γBA = 0, γBG = γ ≥ greedy RATES ST [γ 2 + log k n ] r log k n GRgreedy (opt) S[γ 2 + T n + log k/T n GRgreedy (worse) ] ST [γ 2 + Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models T n + log k/T ] n 41/50 Strategies for grouping Y = X β + ǫ Y (N × 1), X (N × k) Question : how to group to obtain better rates, when possible ? P P 2 q/2 ] [ pj=1 [ T t=1 |β(j,t) | ] q p ∨ {T γBA + γBG }]2−q [ T ∨log n ↓ ↓ GATHERING WORKING on T γBA + γBG Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 42/50 Boosting the rates Y = Xβ + ǫ Y (N × 1), X (N × k) p groups Calculate the (internal) correlations of the columns of the matrix X as well as their (external) correlation with the target Y. Divide the columns of X into two sets : S1 : highly correlated, S2 : weakly correlated. Put S1 as ’group beginners’ or ’delegates’ (each of them has smallest altitude in its group) Fill the groups with affinity with the delegate in terms of P X Kl = | n1 N i =1 il Yi | : Realize a small number of energetic groups Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 43/50 Boosting the rates Divide the columns of X into two sets : S1 : highly correlated, S2 : weakly correlated. Put S1 as ’group beginners’ or ’delegates’ (each of them has smallest altitude in its group) −→ γBA << γBG = γmax Realize a ’good cut’ off S1 and S2 , ensuring : T γBA ≤ γBG , 2 T /n ≤ γBG Fill the groups with affinity with thePdelegate PT in terms 2ofq p 1 PN Kl = | n i =1 Xil Yi | : indication of j=1 [ t=1 |β(j,t) | ] as small as possible Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 44/50 Back to electrical consumption 4 5 x 10 4.8 4.6 4.4 4.2 4 3.8 3.6 3.4 3.2 0 10 20 30 40 50 60 70 time Figure : French consumption Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 45/50 Temperatures Figure : French temperature spots Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 46/50 Dictionary 0.4 0.3 0.2 0.1 0 −0.1 −0.2 −0.3 −0.4 0 10 20 30 40 50 60 70 Figure : Functions of the dictionary. Constant (black-solid line), cosine (blue-dotted line), sine (blue-dashdot line), Haar (red-dashed line) and temperature (green-solid line with points) functions. Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 47/50 Delegates 6000 1 3 5000 E C S H T 4000 3000 2000 1000 0 11 3 0 Figure : 20 40 60 3 80 100 Correlations between the consumption signal and the various dictionary functions. Group : E = 0.75%, 24 regressors / 7 groups THS-THH-THH-TCS-TCST-HHTC-STSH. meaningful functions (8 : T ) ; (3 : C ) ; (5 : S) ; (8 : H). Individual, E = 1.86% 24 regressors : T-T-C-T-H-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-S-C and are meaningful functions (20 : T ), (2 : C ), (1 : S), (1 : H) . Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 48/50 Approximation 4 5 x 10 4.8 4.6 4.4 4.2 4 3.8 3.6 sig grol lol 3.4 3.2 0 10 20 30 40 50 60 70 Figure : Model of the consumption signal (black-solid line) using GRgreed (red-dahsed line) and I-greed (blue dot dashed line). Dominique Picard — Predicting the intraday-load curve Structure in high dimensional models 49/50