ENE-C3002_forecasting_VB

Forecasting
V Bandi and R Lahdelma
Forecasting?
• Decision-making deals with future problems
- Thus data describing the future is needed
• A forecast is a representation of what will occur in the future
[Figure: operative, tactical and strategic decisions placed on a time axis running from hours, days, weeks and months to 1 year, 5 years and 20-50 years]
Time horizons of forecasts
• Depending on the purpose, the time horizon may
differ
- Operational planning
• Day – week level
- Tactical planning
• Week – month – year
- Strategic planning
• Year – 10 years – 50 years
Requirements of a forecasting model
• Sufficient accuracy
- Depends on the purpose of the forecast
• Operative decisions require a high degree of accuracy
• Availability of the necessary input data
- Getting access to real data is always a challenge
• The model must be easy to update and maintain
- When the system changes
- Not overly complex or specialized
Different approaches to forecasting
• Theory-oriented
- The laws of physics determine how the system behaves;
therefore the model is formed based on theoretical laws
• Example: Heat is transferred through radiation, conduction and
convection...
• Data-oriented
- Historical data is analyzed to find dependencies
• Requires applied mathematical techniques
Different approaches to forecasting
• In practice it is wise to use both (theoretical and data-oriented) approaches together
- The forecast model structure is planned based on theory, but the
parameters are estimated from historical data
- Sometimes observing the data can reveal dependencies that
are otherwise missed in theoretical analyses
- Understanding the laws of physics allows making the model
more generic and accurate
Let us try some simple forecast models
Forecasting demand for cars
• The demand for Toyota cars over the first six months in
the Helsinki region is summarized in the following table.
Forecast the demand for cars in the next six months.
Month   Number of units
Jan     46
Feb     56
Mar     43
Apr     43
May     60
Jun     72
Forecast demand for cars
• Simple modeling techniques
- Based on averages and weighted averages
• In the example, the dependency between the month and the
number of units sold is hard to identify
- It is very difficult to forecast accurately
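As a rough illustration of the average-based techniques named above, the sketch below applies a moving average and a weighted average to the six monthly values from the table. The window length of 3 and the weights (1, 2, 3) are assumptions chosen for the example, not values from the lecture.

```python
# Monthly demand from the table (Jan..Jun).
demand = [46, 56, 43, 43, 60, 72]


def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def weighted_average_forecast(history, weights):
    """Forecast the next value as a weighted mean of the latest observations.

    `weights` are listed oldest to newest and are normalised by their sum.
    """
    recent = history[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


# Assumed settings: 3-month window, weights 1:2:3 favouring recent months.
ma3 = moving_average_forecast(demand, window=3)     # (43 + 60 + 72) / 3
wa3 = weighted_average_forecast(demand, [1, 2, 3])  # (43 + 2*60 + 3*72) / 6

print(f"3-month moving average forecast for Jul: {ma3:.1f}")
print(f"Weighted-average forecast for Jul:       {wa3:.1f}")
```

Both methods return a single level for the next month, so repeating them for all six coming months gives a flat forecast. This matches the point that such simple models struggle with this data.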
Forecasting applications in Engineering
• Planning and optimization
- example: coordination of cogeneration
• Simulation
- Planning new systems
- Improving existing systems
- To understand the behavior of systems
Forecasting methods
• Based on averages
- Moving averages
• Smoothing techniques
• Regression
- Linear regression
• In simplest form: Y = aX+b
• Y dependent variable, X independent variable
- Non-linear regression
- Dynamic regression
• Neural networks and many more
Regression analysis
• A regression analysis forecasts one variable
from another
- We must decide which variable is the independent variable X
and which is the dependent variable Y
- This choice is usually motivated by a theory or hypothesis of
causality
• The alleged “cause” is X and the alleged “effect” is Y
What regression does
• What regression does
- A regression analysis produces a straight line that estimates
the average value of Y at any specific value of X
- Example: heat demand forecast over a year based on outdoor
temperature, $y_t = a_0 + a_1 x_t$
• $a_0 = 261$ MW
• $a_1 = -11.3$ MW/°C
- The line fits badly at high temperatures
• Therefore it is also misaligned at cold temperatures
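The fitted line can be evaluated directly with the two coefficients quoted above (261 MW and -11.3 MW/°C). The function name and the sample temperatures are illustrative choices, not part of the lecture.

```python
A0_MW = 261.0        # intercept: forecast demand at 0 °C
A1_MW_PER_C = -11.3  # slope: change in forecast demand per °C


def heat_demand_mw(outdoor_temp_c):
    """Evaluate the straight-line forecast y = a0 + a1 * x."""
    return A0_MW + A1_MW_PER_C * outdoor_temp_c


# Sample temperatures picked for illustration only.
for temp_c in (-20, 0, 10, 25):
    print(f"{temp_c:>4} °C -> {heat_demand_mw(temp_c):7.1f} MW")
```

At 25 °C the line gives a negative demand, which is physically impossible. This is one visible symptom of a straight line fitting badly at high temperatures.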
Forecast regression model
• The model aims to explain the behavior of the
unknown quantity y in terms of known quantities x,
parameters a and random noise e
- $y = f(x, a) + e$
• The structure of the model (the shape of the function f) can
be determined based on theory, on intuition or
by exploring historical data
- The parameters a are estimated from historical data so that
the noise e is minimized
• When the model has a good structure, e is white noise
• Forecasting models can be classified according to
the shape of the function f
Linear Regression model based on one
dependent and one independent variable
• A model where a single dependent variable y is
explained by a single independent variable x is fitted
to historical data
- $y_t = a_0 + a_1 x_t$, where $t = 1, \ldots, T$
• This is a linear equation system with T equations and two unknowns
- The system can be solved in the least squares sense (2-norm)
- To solve it we augment it with an error variable $e_t$
Linear Regression model, Determining
parameters
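• We seek the parameter values a that minimize the
sum of squares of the error variables
$$\min \; e_1^2 + e_2^2 + \cdots + e_T^2 \quad \text{s.t.} \quad e_t = a_0 + a_1 x_t - y_t, \quad t = 1, \ldots, T$$
• If we introduce the vector/matrix notations
$$\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}, \quad
\mathbf{e} = \begin{bmatrix} e_1 \\ \vdots \\ e_T \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_T \end{bmatrix}$$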
Linear Regression model, Determining
parameters
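• The problem in vector/matrix format
$$\min \; \mathbf{e}^\top \mathbf{e} \quad \text{s.t.} \quad \mathbf{e} = \mathbf{X}\mathbf{a} - \mathbf{y}$$
• Substituting e into the objective function yields an
unconstrained optimization problem
$$\min \; (\mathbf{X}\mathbf{a} - \mathbf{y})^\top (\mathbf{X}\mathbf{a} - \mathbf{y}) = \mathbf{a}^\top \mathbf{X}^\top \mathbf{X} \mathbf{a} - 2 \mathbf{a}^\top \mathbf{X}^\top \mathbf{y} + \mathbf{y}^\top \mathbf{y}$$
• Setting the derivative with respect to a to zero gives the solution
$$2 \mathbf{X}^\top \mathbf{X} \mathbf{a} - 2 \mathbf{X}^\top \mathbf{y} = 0$$
$$\mathbf{a} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$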
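For the single-variable model the normal equations are only 2×2, so the closed-form solution can be written out without a matrix library. The sketch below does that in plain Python. The temperature and demand values are made up for illustration and are not the lecture's data. In practice a library routine such as NumPy's `numpy.linalg.lstsq` would normally be used instead.

```python
def fit_line(x, y):
    """Return (a0, a1) minimising sum((a0 + a1*x_t - y_t)**2).

    Solves the normal equations (X^T X) a = X^T y explicitly, where
    X^T X = [[T, Sx], [Sx, Sxx]] and X^T y = [Sy, Sxy].
    """
    T = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = T * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("x values must not all be equal")
    a0 = (sxx * sy - sx * sxy) / det
    a1 = (T * sxy - sx * sy) / det
    return a0, a1


# Made-up sample: outdoor temperature (°C) and heat demand (MW).
temps = [-10.0, 0.0, 10.0, 20.0]
loads = [370.0, 265.0, 150.0, 40.0]

a0, a1 = fit_line(temps, loads)
print(f"a0 = {a0:.2f} MW, a1 = {a1:.2f} MW/°C")
```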
Generalizations of linear regression
Multiple independent (explaining)
variables
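• Linear regression model with multiple independent variables $x_i$
$$y_t = a_0 + a_1 x_{1,t} + a_2 x_{2,t} + \cdots + a_n x_{n,t}, \quad t = 1, \ldots, T$$
• Now there are more unknown parameters $a_i$ and the
X-matrix becomes wider
$$\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}, \quad
\mathbf{e} = \begin{bmatrix} e_1 \\ \vdots \\ e_T \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & x_{1,1} & \cdots & x_{n,1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1,T} & \cdots & x_{n,T} \end{bmatrix}$$
• The matrix formulas and solution remain the same
$$\min \; \mathbf{e}^\top \mathbf{e} \quad \text{s.t.} \quad \mathbf{e} = \mathbf{X}\mathbf{a} - \mathbf{y}$$
$$\mathbf{a} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$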
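The multi-variable case can be sketched the same way: build $\mathbf{X}^\top \mathbf{X}$ and $\mathbf{X}^\top \mathbf{y}$, then solve the resulting linear system. The code below is a minimal plain-Python version that uses Gaussian elimination rather than an explicit matrix inverse. The helper names and the small data set are hypothetical. A numerical library would be the usual choice for real data.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: columns of X are dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def fit_multiple(rows, y):
    """rows[t] = [x_1t, ..., x_nt]; returns [a0, a1, ..., an]."""
    X = [[1.0] + list(r) for r in rows]  # prepend the column of ones
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (no noise).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
coeffs = fit_multiple(rows, y)
print([round(c, 6) for c in coeffs])
```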
Heat Demand Forecast
• Heat demand depends on
- Weather
• Outside temperature, wind, solar radiation, seasons
- Building properties
- Residents' behavior
• Forecasting requires identifying the independent
variables
Heat Demand Forecast
• Accurate heat demand forecast
- Weather, residents' behavior and building properties can be
considered as independent variables
- A forecast model with all independent variables requires
data for each of them
• Obtaining data is challenging
• According to previous studies, outside temperature
has the most influence on heat demand
Heat demand forecast using Regression
based on outside temperature
• Dependent variable
- Heat consumption (historical data)
• Independent variable
- Outside temperature (historical data)
• Forecasting model: $y_t = a_0 + a_1 x_t$
• The curve fits badly at high temperatures, therefore it
is misaligned also for cold temperatures
Standard Deviation (SD) or RMSE (Root-Mean-Squared-Error)
• The square root of the mean of the squared
errors
- SD or RMSE is very commonly used and makes an
excellent general-purpose error criterion for forecasts
• $\mathrm{stdev}(\mathbf{e}) = \sqrt{\mathbf{e}^\top \mathbf{e} / T}$
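The error measure above translates into a few lines of code. The sketch also includes a relative variant that divides by the average actual demand, on the assumption that this is how a percentage figure such as "RMSE = 20% of average demand" is obtained. The function names and sample numbers are illustrative.

```python
import math


def rmse(forecast, actual):
    """Root-mean-squared error: sqrt(e^T e / T) with e = forecast - actual."""
    errors = [f - a for f, a in zip(forecast, actual)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_rmse(forecast, actual):
    """RMSE as a fraction of the average actual value (assumed definition)."""
    return rmse(forecast, actual) / (sum(actual) / len(actual))


# Illustrative numbers only.
forecast = [12.0, 8.0]
actual = [10.0, 10.0]
print(rmse(forecast, actual))           # errors are +2 and -2
print(relative_rmse(forecast, actual))  # RMSE divided by mean demand of 10
```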
Forecast based on outdoor temperature
[Figure: forecast vs. actual heat demand for a sample week]
The forecast is good on average but not quite satisfactory: the
RMSE (Root-Mean-Squared-Error), or standard deviation, of the annual
forecast is 20%
Forecast based on outdoor temperature
[Figure: forecast vs. actual heat demand for a sample week]
• RMSE = 20% (of average demand)
- Not a good forecast
• Reason for low accuracy
- Outside temperature alone cannot explain heat consumption
completely. This is reflected in the correlation coefficient
between outside temperature and heat consumption
Correlation coefficient
• The correlation coefficient r is a number between -1 and 1 that indicates
the strength of the linear relationship between two variables
- r ≈ 1: very strong positive linear relationship between X and Y
- r ≈ 0: no linear relationship between X and Y. Y does not tend to
increase or decrease as X increases
- r ≈ -1: very strong negative linear relationship between X and Y. Y
decreases as X increases
• The sign of r (+ or -) indicates the direction of the relationship between
X and Y. The magnitude of r (how far away from zero it is) indicates the
strength of the relationship.
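Assuming the coefficient meant here is the usual Pearson correlation coefficient, it can be computed as the covariance of X and Y divided by the product of their standard deviations. The sketch below shows the three limiting cases described above on small made-up series.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up series illustrating r close to +1, -1 and 0.
print(correlation([1, 2, 3], [2, 4, 6]))         # Y rises with X
print(correlation([1, 2, 3], [6, 4, 2]))         # Y falls as X rises
print(correlation([1, 2, 3, 4], [1, -1, -1, 1])) # no linear trend
```

Note that the last series has a clear pattern (a U shape) but no linear trend, so r only measures linear dependence.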
Correlation between outside temperature and
heat consumption for a single building
• Correlation coefficient for a building: r = -0.956
- Strong negative relationship
- The model would be more accurate if r were closer to -1
Residents' behavior in a building
• People's behavior usually has a rhythm (a strong,
regularly repeated pattern)
• Let us hypothesize that residents' behavior follows a regular
rhythm on weekdays (Monday to Friday) and on
weekends (Saturday and Sunday)
• Let us modify the forecast model using these weekly
rhythms
Modified Forecast Model
• Original forecast model
$y_t = a_0 + a_1 x_t$
• The y-intercept $a_0$ also reduces accuracy, because it
contributes the same constant to every forecast
• Modified forecast model
$y_t = a_{h(t)} + a_1 x_t$
where $a_{h(t)}$ is a social component based on the weekly rhythm and $h(t)$ is the hour of the week of time step t (168 = 7 × 24)
$a_h = a_0 - \mathrm{average}(e_t \mid h(t) = h), \quad h = 1, 2, \ldots, 168$
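One way to read the modified model: for each hour of the week, average the errors of the original model and shift the intercept by that amount. The sketch below follows that reading with the error sign convention e = forecast - actual. The function names, the Monday-based hour numbering and the tiny data set are assumptions made for illustration.

```python
from datetime import datetime


def hour_of_week(ts):
    """Map a timestamp to h = 1..168, assuming the week starts Monday 00:00."""
    return ts.weekday() * 24 + ts.hour + 1


def fit_weekly_intercepts(a0, a1, temps, demand, hours):
    """Return {h: a_h} with a_h = a0 - average(e_t | h(t) = h).

    Uses e_t = a0 + a1*x_t - y_t (forecast minus actual).
    """
    error_sum, count = {}, {}
    for x_t, y_t, h in zip(temps, demand, hours):
        e_t = a0 + a1 * x_t - y_t
        error_sum[h] = error_sum.get(h, 0.0) + e_t
        count[h] = count.get(h, 0) + 1
    return {h: a0 - error_sum[h] / count[h] for h in error_sum}


def forecast(intercepts, a0, a1, x_t, h):
    """Forecast y_t = a_h(t) + a1*x_t, falling back to a0 for unseen hours."""
    return intercepts.get(h, a0) + a1 * x_t


# Tiny synthetic data set (not from the lecture): two hours, two weeks.
temps = [0.0, 0.0, 10.0, 10.0]
demand = [110.0, 90.0, 90.0, 70.0]
hours = [1, 2, 1, 2]

intercepts = fit_weekly_intercepts(100.0, -2.0, temps, demand, hours)
print(intercepts)
print(forecast(intercepts, 100.0, -2.0, 5.0, 1))
print(hour_of_week(datetime(2024, 1, 1, 0)))  # a Monday at midnight
```

In this toy data the original model under-forecasts hour 1 by 10 and over-forecasts hour 2 by 10, so the hourly intercepts move up and down by that amount.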
Forecast using weekly rhythm
The RMSE (Root-Mean-Squared-Error), or standard deviation, of the annual forecast is 13%
Improving accuracy of the model
• The weekly rhythm model does not consider that
some weeks and days are different
- E.g. during holiday seasons, religious holidays, etc. the
demand differs from that of a normal weekday
• The days can be classified, e.g. as working day,
Saturday, holiday
• It is possible to include more independent variables
- Solar radiation, wind speed and direction, cloudiness, ...
• In general these improve the precision only a little
• Historical data from multiple years
- Weighted regression: recent history can be given more weight
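Weighted regression can be sketched for the single-variable model by minimising the weighted sum of squared errors, which changes the normal equations to $(\mathbf{X}^\top \mathbf{W} \mathbf{X}) \mathbf{a} = \mathbf{X}^\top \mathbf{W} \mathbf{y}$ with a diagonal weight matrix $\mathbf{W}$. The exponential decay used to build the weights below is an assumption. The lecture only says that recent history can be given more weight.

```python
def recency_weights(ages_in_years, decay=0.5):
    """Assumed weighting scheme: weight = decay ** age (newest data has age 0)."""
    return [decay ** age for age in ages_in_years]


def fit_line_weighted(x, y, w):
    """Return (a0, a1) minimising sum(w_t * (a0 + a1*x_t - y_t)**2)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx  # determinant of X^T W X
    if det == 0:
        raise ValueError("weighted x values do not determine a line")
    a0 = (swxx * swy - swx * swxy) / det
    a1 = (sw * swxy - swx * swy) / det
    return a0, a1


# Made-up data: the last point is an outlier that receives zero weight.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 100.0]
a0, a1 = fit_line_weighted(x, y, [1.0, 1.0, 1.0, 0.0])
print(a0, a1)
print(recency_weights([0, 1, 2]))
```

With all weights equal, the result is the same as ordinary least squares.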