Datamining Methods for Demand Forecasting at NGT

Datamining Methods for
Demand Forecasting
at National Grid Transco
David Esp
A presentation to the Royal Statistical Society
local meeting of 24 February 2005
at the University of Reading, UK.
UK Transmission
Contents

Introduction
 National Grid Transco
– The Company
– Gas Demand & Forecasting
 Datamining
– Especially Adaptive Logic Networks

Datamining for Gas Demand Forecasting






Framing the Problem
Data Cleaning
Model Inputs
Model Production
Scope for Improvement
Conclusions
Introduction to:
National Grid Transco
National Grid Transco (NGT)



Part of the NGT Group (www.ngtgroup.com)
NGT Group has interests around the globe, particularly the US
NGT-UK consists of:
 National Grid (NG): Electricity transmission (not generation or distribution)
 Transco (T): Gas transmission
Introduction to:
Gas Demand and its Forecasting
at National Grid Transco
Breakdown of Demand

National Transmission System (NTS)
 Many Large industrials
– Large industrials
– Gas-fired power stations
 13 Local Distribution Zones (LDZs)
– Mostly domestic
– The presentation will focus on models for this level only.
Forecasting Horizons



Within day - at five different times
Day Ahead
Up to one week ahead
Gas Demand Daily Profiles
[Figure: National Daily Demand Profile. X axis: Time (Gas Day), hourly points labelled Demand0700 to Demand0600. Y axis: Volume (mcm), 0 to 16. Series: 2002 Summer Peak, 2002 Summer Norm, 2002 Winter Norm, 2002 Winter Peak.]
What Factors Drive Gas Demand?

Weather





Thermostats
Heat leakage from buildings
Heat distribution in buildings (hot air rises)
Gas-powered plant efficiencies
Consumer Behaviour
 Season (e.g. stay indoors when dark)
 Holidays

Weather-Influenced Consumer Behaviour
 Perception of weather (actual or forecast)
 Adjustment of thermostats
Weather




Temperature (1ºC = 5 to 6% of demand)
Wind (above 10 knots, 1 knot = 0.5% of demand)
Cooling Power - wind-chill (a function of wind and temperature)
(+ Straight, delayed and moving-average derivations of all the above.)
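The "delayed and moving average derivations" can be sketched as follows. This is a minimal illustration, not the NGT implementation: the function names, the lag of one step, the three-step window and the temperature values are all assumptions made for the example.

```python
import numpy as np

def lagged(series, lag):
    """Return the series delayed by `lag` steps; the first `lag` values are NaN."""
    series = np.asarray(series, dtype=float)
    if lag == 0:
        return series.copy()
    out = np.full(series.shape, np.nan)
    out[lag:] = series[:-lag]
    return out

def moving_average(series, window):
    """Trailing moving average over `window` steps; NaN until the window is full."""
    series = np.asarray(series, dtype=float)
    out = np.full(series.shape, np.nan)
    csum = np.cumsum(np.insert(series, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

# Made-up temperature readings (deg C), purely for illustration.
temperature = np.array([4.0, 6.0, 5.0, 3.0, 2.0, 7.0])

features = {
    "temp": temperature,                         # straight
    "temp_lag1": lagged(temperature, 1),         # delayed
    "temp_ma3": moving_average(temperature, 3),  # moving average
}
```

The same two helpers would apply unchanged to wind speed or cooling power series.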
Demand Temperature Relationships
[Figure: Temperature Effects. Demands (axis 4 to 20) plotted against Temperatures (axis 0 to 25). Annotated sensitivities on different parts of the curve: 1ºC = 3%, 1ºC = 6%, 1ºC = 2%.]
Seasonal Temperature Sensitivity of Gas Demand
[Figure: Change in demand for 1 deg C, 1998-2001; LDZ WS. X axis: month, Jan to Dec. Axes: Percentage Change In Demand (0% to 12%) and Millions Cubic Meters (mcm), 0.00 to 0.60. Series: as % of demand, mcm.]
Consumer Behaviour



Seasonal Transitions (Autumn and Spring)
Bank Holidays (Typically -5 to -20% variation)
Adjust thermostats & timers in (delayed) response to weather.
 e.g. protracted or extreme cold spells


Weather Forecast Effects
Special Events
Introduction to Datamining:
What & Why
Datamining

A generally accepted definition:
 “The non-trivial extraction of implicit, previously unknown and potentially
useful information from data”
[Frawley, Piatetsky-Shapiro & Matheus]

In practice:
 The use of novel computational tools (algorithms & machine power).
 “Information” may include models, such as neural networks.

A higher-level concept, of which Datamining forms a (key) part:
 Knowledge Discovery from Databases (KDD)
 Relationship: Knowledge > Information > Data
Datamining Techniques

What are they?
 Relatively novel computer-based data analysis & modelling algorithms.
 Examples: neural nets, genetic algorithms, rule induction, clustering.
 In existence since the 1960s, popular since 1995.

What advantages do they have over traditional methods?
 More automatic
– Less reliance on forecasting expertise.
– Fewer man-hours (more computer-hours)
 Potentially more accurate
– New kinds of model, more accurate than existing ones
– Greater accuracy overall, when used in combination with existing models
– Knowledge discovery might lead to improvements in existing models.
Core Methods & Tools

Data Cleaning
 Self-Organizing Map
 Used to highlight atypical demand profiles and cluster typical ones

Adaptable (Nonlinear Nonparametric) Modelling
 Adaptive Logic Network (ALN)
– Automatically produces models from data.
– “Better than a Neural Network”

Input Selection
 Genetic Algorithm (GA)
– Selects best combination of input variables for model
– Also optimizes an ALN training parameter - learning rate
Experience



1995-1999: Financial, electrical & chemical problems.
1999: Diagnosis of Oil-Filled Equipment (e.g. supergrid
transformers) by Kohonen SOM.
2000: Electricity Demand Forecasting
 Encouraging results
 Business need disappeared




2001-2: EUNITE Datamining competitions
2003: Gas Demand Forecasting Experiments
2004: Gas Demand Forecasting models in service
2005: More gas models, also focusing on wind power.
Introduction to Datamining:
Nonlinear Nonparametric Models
The core datamining method applied to gas demand
forecasting.
Some Types of Problem
Linear - e.g. y = mx + c
[Figure: plot of y = 2x + 10 for x from -20 to 20.]
Non-Linear and Smooth
 Monotonic - e.g. y = x^3
 Non-Monotonic - e.g. y = x^2
[Figure: plots of x^3 and x^2 for x from -20 to 20.]
Discontinuous (in gradient) - e.g. y = max(0, x)
[Figure: plot of max(2x + 10, -5) for x from -20 to 20.]
We might not know the type of function in advance.
Parametric Modelling
[Figure: two panels, each fitting data generated by y = min(x^3, 2000) for x from -20 to 20.]
Linear (1st Order Polynomial) Fit: Trendline y = 128.19x - 83.929
3rd Order Polynomial Fit: Trendline y = 0.441x^3 - 3.4641x^2 + 59.287x + 168.14
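The two parametric fits can be sketched with ordinary least-squares polynomial fitting. This is an illustration only: the sample points are an assumption (81 evenly spaced values of x), so the coefficients it produces need not match the trendlines quoted on the slide.

```python
import numpy as np

# Target from the slide: y = min(x^3, 2000). The sampling grid is assumed.
x = np.linspace(-20.0, 20.0, 81)
y = np.minimum(x ** 3, 2000.0)

def poly_rmse(order):
    """Fit a polynomial of the given order and return its RMS error on the data."""
    coeffs = np.polyfit(x, y, order)
    fitted = np.polyval(coeffs, x)
    return float(np.sqrt(np.mean((fitted - y) ** 2)))

rmse_linear = poly_rmse(1)
rmse_cubic = poly_rmse(3)
print(f"linear rmse = {rmse_linear:.1f}, cubic rmse = {rmse_cubic:.1f}")
```

The cubic cannot do worse than the line on the fitting data, but neither form can reproduce the flat section above 2000, which is the limitation of a fixed formula that the slide illustrates.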
Non-Parametric Modelling
[Figure: two panels fitting y = min(x^3, 2000) for x from -20 to 20: One Linear Segment; Two Linear Segments.]
• Linear Segmentation is not the only non-parameterised technique.
• The key feature is growth - hence no constraint on degrees of freedom.
Non-Parametric Modelling
[Figure: two panels fitting y = min(x^3, 2000) for x from -20 to 20: Three Linear Segments; Four Linear Segments.]
• No need for prior knowledge of the nature of the underlying function.
• The underlying function does not have to be smooth, monotonic etc.
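Growth in the number of linear segments can be sketched as below. This is a simplified stand-in, not an ALN: the knots are fixed and evenly spaced (an assumption), whereas an ALN positions its linear pieces itself during training. It only shows how the error falls as segments are added.

```python
import numpy as np

# Same target as the slides: y = min(x^3, 2000). The sampling grid is assumed.
x = np.linspace(-20.0, 20.0, 161)
y = np.minimum(x ** 3, 2000.0)

def segment_fit_rmse(n_segments):
    """Least-squares fit of a continuous piecewise-linear curve.

    Uses hinge terms max(0, x - knot) at evenly spaced interior knots,
    then returns the RMS error of the fitted curve.
    """
    knots = np.linspace(x.min(), x.max(), n_segments + 1)[1:-1]
    columns = [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
    design = np.column_stack(columns)
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sqrt(np.mean((design @ weights - y) ** 2)))

# 1, 2, 4 and 8 segments give nested knot sets, so the error cannot increase.
rmse_by_segments = {n: segment_fit_rmse(n) for n in (1, 2, 4, 8)}
for n, err in rmse_by_segments.items():
    print(f"{n} segment(s): rmse = {err:.1f}")
```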
Parametric Modelling Method
 A formula is known or at least assumed
– Typically a polynomial (e.g. linear).
– May be any kind of formula.
– Can be discontinuous.
 Model complexity is constrained
– Tends to make the training process robust and
data-thrifty.
– A model of complexity exactly as required by the
problem should be slightly more accurate than a
non parametric model, which can only
approximate this degree of complexity.
 Specialist regression tools can be applied for
different classes of function
– linear (or linearizable), smooth, discontinuous...
Parametric Modelling Method:
e.g. Multiple Linear Regression
 Advantages:
– Extremely fast both to ‘train’ and use
– If well-tailored to the problem, should give optimal
results.
 Disadvantages:
– Requires uncorrelated inputs
– Assumptions about data distributions
Non Parametric Modelling: Benefits

Advance knowledge of the problem is not required
 Domain-specific knowledge, though helpful, is not vital.
 No assumptions about population density or independence of inputs.

Model complexity is unconstrained
 Advantage: Model may capture unimagined subtleties.
 Disadvantages
– Training demands greater time, data volume and quality.
– Model may grow to become over-complex, e.g. fitting every data point.

Additional possibilities:
 Feasibility Study
– Determine if any model is possible at all.
 Knowledge Discovery
– Analyze the model to determine an equivalent parametric model.
Non-Parametric Modelling: Issues

Might not be completely flexible; learning algorithm may have limitations.
 We may need to partition the problem manually.
 The model might not generalize to the extent theoretically possible.



Much greater need for training data.
Can over-fit (resulting in errors): Extra measures needed to prevent this.
Longer training time (may not be an issue).
Introduction to Datamining:
Nonlinear Nonparametric Models:
Under, Optimal and Over Fitting
This section applies to many nonlinear nonparametric
modelling methods, not just neural networks.
Example: Underlying (2-D) Function
A privileged view - we would not normally know what the function looked like...
z = 1000 sin(0.125 x) cos(4 /(0.2 y +1))
Undertrained Model
ALN model with 24 segments i.e. planes. Too angular (from privileged knowledge)
Optimally Trained Model
ALN model with 300 planes. Looks very similar to our defined function.
Overtrained Model
An ALN with 1500 planes “joins the dots” of the data instead of generalising.
Determining Optimality of Fit

The function is not known in advance
 Might be smooth, might be wrinkly - we don’t know.

What are our requirements on the model?
 What degree of accuracy is needed?
 Any constraints on shape or rates-of-change?

How do we assess the model’s quality?
 Test against a held-back set of data
 Analyze the model’s characteristics
– Assumes we know what to require or expect.
– e.g. Sensitivity to inputs (at various parts of the data space)
– e.g. Cross-sections (of each variable, for different set-points of the other
variables)
Traditional Cross-Validation
Validate on data that is randomly or systematically
selected from the same period as the training data.
Train on the training data (grey) until error is least on the cross-validation data (blue).
Actual use will be in the future (green), on data which is not yet available.
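[Figure: timeline of training data (grey) interleaved with cross-validation data (blue), followed by future data (green, unavailable).]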
Back-Validation
Validate on data that, relative to the training data, is as old as the future is new.
Train on the training data (grey) until error is least on the back-validation data (blue).
Reason: like the future data, the back-val. data is an edge.
[Figure: timeline of back-validation data (blue, oldest), then training (regression) data (grey), then future data (green, unavailable).]
This method has been proven by experiment to be superior to
traditional cross validation for both gas and electricity problems.
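A back-validation split, and the choice of stopping epoch, can be sketched as follows. The function names, the four years of daily dates and the error values are assumptions made for illustration; the presentation does not give code.

```python
from datetime import date, timedelta

def back_validation_split(dates, validation_days):
    """Hold out the OLDEST `validation_days` days; train on everything newer."""
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    first_day = dates[order[0]]
    val = [i for i in order if (dates[i] - first_day).days < validation_days]
    train = [i for i in order if (dates[i] - first_day).days >= validation_days]
    return train, val

def best_epoch(validation_errors):
    """Index of the epoch with the lowest back-validation error."""
    return min(range(len(validation_errors)), key=validation_errors.__getitem__)

# Hypothetical four years of daily records; the oldest year is held back.
dates = [date(2000, 1, 1) + timedelta(days=d) for d in range(1461)]
train, val = back_validation_split(dates, validation_days=365)

# Made-up per-epoch validation errors; training would stop at the minimum.
stop_at = best_epoch([0.09, 0.05, 0.04, 0.06])
```

Unlike a random hold-out, every validation record here lies outside the time range of the training data, just as the future data will.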
Optimal and Over Training
This is deliberate over-training. The optimum point is where the (purple) Back-Validation (Backval) error
curve is at a minimum, namely Epoch 30. This agrees well with that of the Holdback (pseudo future) data.
[Figure: Example ALN Training Trends. X axis: Epoch, 1 to 73. Axes: Error (rmse), 0 to 0.1; No. of LF's, 0 to 2500. Series: Train, Backval, Holdback, LFs.]
Introduction to Datamining:
Nonlinear Nonparametric Models:
Example Algorithms
Machine Learning / Natural Computing /
Basis Function Techniques



Derive models more from data (examples) than from knowledge.
Roots in nature and philosophy
e.g. artificial intelligence & robotics.
but converging with traditional maths & stats.
Many types of algorithm.







Evolutionary / Genetic Algorithms
Neural Network (e.g. MLP-BP or RBF) - popular
Support Vector Machine - fashionable
Adaptive Logic Network - experience
Regression Tree
Rule Induction
Instance (Case) and Cluster Based
Introduction to Datamining:
Nonlinear Nonparametric Models:
Example Algorithms:
Neural Networks (ANNs)
Focussing on the Multi Layer Perceptron (MLP)
Neural Networks - Brief Overview (1)
But how many neurons or layers? Repeatedly experiment (grow, prune…)
Neural Networks - Brief Overview (2)



Inspired by nature (and used to test it).
Output is sum of many (basis-) functions, typically S-shaped.
Each function is offset and scaled by a different amount.
 Very broadly analogous to Fourier etc.

Given data, produce its underlying model.
Neural Networks - Brief Overview (3)
Introduction to Datamining:
Nonlinear Nonparametric Models:
Example Algorithms:
Adaptive Logic Networks (ALNs)
Main Advantages over ANNs

Theoretical
 No need to define anything like a number of neurons or layers
– ALNs automatically grow to the required extent.
– No need for ‘outer loop’ of experimentation (e.g. pruning)
 Basis functions are more independent, hence:
– easier and faster learning
– greater accuracy
– faster execution.
 Less “black-box” - can be understood.
 Function inversion - can run backwards.
Main Advantages over ANNs

Observed
 Better accuracy: sharper detail.
 Better training: faster, more reliable and more controllable.
Adaptive Logic Networks:
How they Work:
ALN Structure
What is an ALN?

A proprietary technique developed by William Armstrong, formerly of the University of Alberta and founder of Dendronic Decisions Limited in Canada.
– WWW.DENDRONIC.COM

A combined set of Linear Forms (LFs)
 An LF: y=offset+a1x1+a2x2+...
 An ALN initially has one LF - making it the same as
normal linear regression
 After optimizing its own fit, each LF can divide into
independent LFs.

ALNs are generated in a descriptive form that can
be translated into various programming languages
(e.g. VBA, C or Matlab).
Minimum (Min) & Maximum (Max) Operators in ALNs
y = Min(a,b,c) - lines cut down
y = Max(a,b,c,d) - lines cut up
[Figure: two plots of y against x, one showing lines a, b, c combined by Min and one showing lines a, b, c, d combined by Max. Beneath each, a network diagram: Inputs x1, x2, x3, ..., xn feed Linear Forms (regressions, each a weighted sum of w_i x_i), which feed a Min or Max node giving the Output y.]
Min & Max Combined
LeftHump = Min(a,b,c)
RightHump = Min(d,e,f,g)
y = Max(LeftHump,RightHump)
[Figure: plot of y against x in which lines a to g form two humps. Network diagram: Inputs x1, x2, x3, ..., xn feed Linear Forms (weighted sums of w_i x_i), which feed two Min nodes (LeftHump and RightHump), which feed a Max node giving the Output y.]
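The two-hump structure can be written out as a small tree of linear forms. This is a sketch only: the helper names and every coefficient are made up for illustration, and a single input is used so the shape is easy to check by hand.

```python
def linear_form(offset, weights):
    """Return a function computing offset + sum(w_i * x_i)."""
    def evaluate(x):
        return offset + sum(w * xi for w, xi in zip(weights, x))
    return evaluate

def aln_min(*children):
    """Min node: the lowest child output wins (lines cut down)."""
    return lambda x: min(child(x) for child in children)

def aln_max(*children):
    """Max node: the highest child output wins (lines cut up)."""
    return lambda x: max(child(x) for child in children)

# Hypothetical one-input linear forms a..g (coefficients are illustrative).
a = linear_form(8.0, [2.0])    # rising side of the left hump
b = linear_form(2.0, [0.0])    # flat top of the left hump
c = linear_form(-2.0, [-2.0])  # falling side of the left hump
d = linear_form(-2.0, [2.0])   # rising side of the right hump
e = linear_form(1.0, [1.0])
f = linear_form(4.0, [0.0])    # flat top of the right hump
g = linear_form(12.0, [-2.0])  # falling side of the right hump

left_hump = aln_min(a, b, c)
right_hump = aln_min(d, e, f, g)
y = aln_max(left_hump, right_hump)

print(y([-2.5]), y([0.0]), y([3.5]))
```

With more inputs each linear form becomes a plane or hyperplane, but the Min and Max nodes are unchanged.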
ALNs are Trees of Linear Forms

More Complex “Trees” are Possible
 Can grow to any number of layers, any number of linear forms.
 During training, each “leaf” - linear form - can split into a min or max
branch.
 Later in training, leaves can be recombined as necessary.

Tree complexity can be limited by
 Tolerance - a sufficiently accurate leaf won’t split any further.
– Can be fixed or varying across the data space
 Direct constraint - e.g. max. depth = 5.
 Indirectly, by stopping training at minimum validation error
Introduction to Datamining:
Nonlinear Nonparametric Models:
Example Algorithms:
ALNs vs. MLPs: Simple Demo
Demonstration of ALN benefits through a trivial example.
Artificial Problem:
With smooth regions and a sharp point
Neural Net - 4 Hidden Neurons
Handicapped ALN:
Tolerance = 0.6, giving 4 Linear Forms
Neural Net - Further Training
Unhandicapped ALN:
Offset is simply for clarity of presentation
Adaptive Logic Networks:
How they Work:
Further Details
A Snapshot of Training
y = Max(LF1,LF2,LF3)
[Figure: three linear forms LF1, LF2 and LF3 with data points.]
A data point is presented. It pulls the linear form it influences towards itself (by “learning factor” proportion).
Side-effect: Orange points no longer influence that LF, but will now pull up the other two LFs.
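One training step of this kind might look like the sketch below. The actual Dendronic learning rule is not given in the presentation, so the update used here (a normalised least-mean-squares step applied only to the linear form that is active at the data point) is an assumption, as are the function name and all the numbers.

```python
import numpy as np

def train_step(lfs, x, target, learning_factor):
    """One update of y = Max(LFs): only the LF that is active at x is adjusted.

    lfs has shape (n_lfs, n_inputs + 1); column 0 holds each LF's offset.
    Returns the index of the LF that was adjusted.
    """
    x_aug = np.concatenate(([1.0], x))
    outputs = lfs @ x_aug
    active = int(np.argmax(outputs))      # under Max, the highest LF is the output
    error = target - outputs[active]
    # Normalised step: the active LF's output at x moves the fraction
    # `learning_factor` of the way towards the target (assumed rule).
    lfs[active] += learning_factor * error * x_aug / (x_aug @ x_aug)
    return active

# Three hypothetical one-input LFs: y = x, y = 2, y = 6 - x.
lfs = np.array([[0.0, 1.0], [2.0, 0.0], [6.0, -1.0]])
active = train_step(lfs, x=np.array([1.0]), target=3.0, learning_factor=0.25)
new_output = float(lfs[active] @ np.array([1.0, 1.0]))
```

Here the third LF is active at x = 1 with output 5, so a learning factor of 0.25 moves its output a quarter of the way towards the target of 3.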
ALN Learning: LF Splitting
If repeated adjustments of a given LF fail to reduce error below Tolerance, the LF splits into two and the process is repeated for each one independently. Due to random elements of training, they “wander apart” to cover different portions of the data space.
[Figure: linear forms and data points plotted with Input axis against Output axis.]
Recap: ALN Structure
During training ALNs can grow into complex trees.
 Branches are Max and Min operators.
 Leaves are Linear Forms.
Trees can be of any depth. The one shown here is just a simple example.
Transformation may be possible into a more efficient form where initial branches are if..then rules.
[Figure: tree in which Inputs x1, x2, x3, ..., xn feed Linear Forms (weighted sums of w_i x_i), which feed MAX nodes, which feed a MIN node giving the Output y.]
ALNs can be “Compiled” into DTRs
Example: For x in this interval only pieces 4 and 5 play a role.
[Figure: six linear pieces numbered 1 to 6 plotted along the Input axis x, which is divided into intervals labelled Min(1,2), Min(2,3,4), Min(4,5) and Min(5,6).]
Bagging - Averaging Several ALNs


A very simple way to improve accuracy
Applicable to any set of diverse models having same goal
– For example standard MLP neural nets


For ALNs, diversity arises through random number generator affecting
the training process e.g. the order in which data are presented.
“BestMean” is a proven refinement
 e.g. reject results outside 2 * stdev then compute the new mean
Model Development
How datamining methods were brought to bear on our gas demand
forecasting problem.
Stages of Model Production


Framing the Problem
Data Preparation
 Data Cleaning
 Derived Variables, Partitioning.



Input Selection
ALN Training
Implementation in Code
 Conversion of the ALN to a convenient programming language.

Quality Assessment
 User-testing in the target environment.
Model Development:
Framing the Problem
How should we frame the problem?
We are ‘in a vacuum’ here, so we need to guess or preferably experiment.

Hourly or daily?
 The main requirement is for daily total demand
 Summing hourly demands tends to give greater accuracy.

Absolute or relative?
 But d(Demand)/d(Temperature) varies with Temperature



One big model for all LDZs, all-year round?
Separate models for each LDZ?
Split the year into parts or just flag or normalize each part?
 What parts? GMT/BST Seasons? Christmas? Easter?
– Try clustering, make a model for each cluster
– Also try experiments based on intuition & guesswork
Traditional framing of the problem




Daily totals
Linear relationships
Only model standard days - employ normalization (adjustment
factors) for special days such as bank holidays.
Compute the change in demand
New framing of the problem
Based on experience & intuition




Hourly totals (daily = sum of hourlies)
Nonlinear relationships
Model all days - no need for normalization (adjustment factors).
Absolute demand
Experience: Clustering of Electricity Profiles
Kohonen SOM - as implemented in Eudaptics’ Viscovery SOM-Mine
Coloured areas are clusters, each with a distinctive daily demand profile.
Red text is our interpretation.
Clustering of Gas Profiles
…not such a detailed picture as for electricity...
[Figure: SOM cluster map with regions labelled: Jan & Dec; Jan, Feb; Mar & Nov; Apr, May & Oct; June, July, Aug & Sept.]
Yellow-ish areas indicate similar profiles,
Red-ish areas indicate more varying profiles.
Find the Best Structure for the Model
By experiment...
Experiments (on one typical LDZ):




One model for the whole year
Separate models for each of four clusters
Separate models for the GMT, BST and ‘Xmas & New Year’
periods
Separate models for GMT and BST, experimenting with various
types of indicator for the Xmas-NY period
 straight flags & fuzzy flags
 THIS PRODUCED THE BEST RESULTS
Final Structure for the Model

Produce separate models for each season of each LDZ.
 Two seasons: GMT & BST
– The Easter and Xmas-NY periods are indicated by separate fuzzy flags.
 13 LDZs

Each model will contain a Bag of 10 ALNs
 Bag returns BestMean of the 10 ALNs
 Bestmean rejects results outside 2 * stdev

Thus 260 ALNs need to be produced.
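The BestMean rule and the model count can be sketched as below. The presentation states only that results outside 2 * stdev are rejected before re-averaging; the use of the population standard deviation, the function name and the example values are assumptions.

```python
from statistics import mean, pstdev

def best_mean(predictions, k=2.0):
    """Mean of the predictions after rejecting any further than k stdevs from the mean."""
    centre = mean(predictions)
    spread = pstdev(predictions)  # population stdev is an assumption
    kept = [p for p in predictions if abs(p - centre) <= k * spread]
    return mean(kept) if kept else centre

# Ten hypothetical ALN outputs for one hour; one of them disagrees strongly.
bag_outputs = [10.0] * 9 + [20.0]
forecast = best_mean(bag_outputs)

# Model count from the slide: 13 LDZs x 2 seasons x 10 ALNs per bag.
n_alns = 13 * 2 * 10
```

In this example the plain mean would be 11.0, while BestMean discards the stray 20.0 and returns 10.0.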
Model Development:
Data Preparation:
Data Cleaning
Data Cleaning

Data Problems
 Some “actual” demands are unrealistic.
 Atypical demands are not useful for training.

Detection Method
 Viscovery - commercial Kohonen / SOM tool
– Was used to highlight unusual profiles.
 Also manually checked & plotted ranges and profiles in Excel.
Greater Requirement for Data Quality

Our models may be more demanding than traditional ones in terms of
data quality.
 Since our models are non parametric, they may be more susceptible to
glitches in the data (may try to model them).

It is possible that the available data will not meet our quality
requirements.
 The existing data is clean in respect of daily totals, but hourly figures are
traditionally less important.
Bad Profile Detection
Once again, making use of Eudaptics’ Viscovery SOM-Mine





Arguably the best possible two-dimensional representation of an
n-dimensional problem.
The aspect ratio is based on 1st two principal components. It
shows the main ‘shape’ of the problem.
Outlier profiles (possible errors) show up as red “blemishes”
Yellow-ish areas are groups of similar profiles
Red-ish areas indicate abnormalities.
Bad Profile - Positive Glitch
[Figure: daily demand profile, x axis 1 to 23, y axis 0 to 1, showing a positive glitch.]
Bad Profile - Negative Glitch
[Figure: daily demand profile, x axis 1 to 23, y axis 0 to 1.8, showing a negative glitch.]
Bad Profile - “Wobble”
[Figure: daily demand profile, x axis 1 to 23, y axis 0 to 1.2, showing a wobble.]
Bad Profile - Clock-change Artefact
[Figure: daily demand profile, x axis 1 to 23, y axis 0 to 1.2, showing a clock-change artefact.]
Model Development:
Data Preparation:
Model Inputs
Data Preparation:
Derive additional variables as possible inputs



Think up as many candidate inputs as possible
Anthropomorphize: “Think like an ALN”
Sine and Cosine of Day and of Year.
 Represent and maintain cyclic nature of diurnal and annual cycles.
 Annual gas cycle is approximately a sine wave (obvious knowledge).




Moving-average of Temperature
Cooling Power (wind chill)
“Days Since 1 April 1990” (basis for spotting long term trends)
Fuzzy-Flags (special periods)
 These merely highlight the incidences of special days
 They do not indicate demand effects
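Some of these derived inputs can be sketched as below. The function names are hypothetical, and the shape of the fuzzy flag in particular is an assumption: the presentation does not say how its fuzzy flags ramp, so a linear ramp either side of the special period is used purely for illustration.

```python
import math
from datetime import date, datetime

def cyclic_inputs(when):
    """Sine/cosine of time of day and of year, plus a long-term trend counter."""
    day_frac = (when.hour + when.minute / 60.0) / 24.0
    year_frac = (when.timetuple().tm_yday - 1) / 365.25
    return {
        "sin_day": math.sin(2 * math.pi * day_frac),
        "cos_day": math.cos(2 * math.pi * day_frac),
        "sin_year": math.sin(2 * math.pi * year_frac),
        "cos_year": math.cos(2 * math.pi * year_frac),
        "days_since_1990_04_01": (when.date() - date(1990, 4, 1)).days,
    }

def fuzzy_flag(day, start, end, ramp_days=4):
    """Assumed fuzzy flag: 1 inside [start, end], falling linearly to 0
    at `ramp_days` days before the start or after the end."""
    if start <= day <= end:
        return 1.0
    gap = (start - day).days if day < start else (day - end).days
    return max(0.0, 1.0 - gap / ramp_days)

inputs = cyclic_inputs(datetime(1990, 4, 2, 6, 0))
xmas_start, xmas_end = date(2004, 12, 24), date(2005, 1, 3)  # illustrative dates
flag_inside = fuzzy_flag(date(2004, 12, 25), xmas_start, xmas_end)
flag_near = fuzzy_flag(date(2004, 12, 22), xmas_start, xmas_end)
```

Using a sine and cosine pair keeps 23:00 close to 00:00 and 31 December close to 1 January, which a raw hour or day number would not.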
Input Selection (1)

Around 60 potential inputs
 Implies 2^60 (roughly 10^18) possible choices.
 Too many for exhaustive search.
 Systematic search may be infeasible
– The search-space may be rough.
– Inputs may ‘interact’, especially in an unknown nonlinear model.
 In previous projects, standard methods such as correlation-based input
selection or adding or pruning inputs one at a time have failed to find the
optimum selection.

The chosen selection method
 Genetic Algorithm
– Proven ‘jack of all trades’ discrete optimization method
– Fitness function based on training and testing ‘disposable’ ALNs.
Input Selection (2)
No simple consistent method - too many interactions and nonlinearities - use a genetic algorithm.
Unsurprisingly, inputs having greatest correlation to the output were chosen by the GA. However, below a
certain threshold of correlation, the correspondence is less: the GA chose some inputs having tiny correlation
instead of other inputs of greater correlation.
[Figure: Relative Importance of Each Input, According to Different Views. X axis: Inputs, 1 to 32. Y axis: Measure of Importance, 0 to 0.2. Series: Rel. Correl, Rel. Imp, Chosen by GA.]
Only 32 choices in this example. The small black stumps indicate inputs chosen by the GA.
Input Selection (3):
Genetic Algorithm (GA)
Inspired by Darwin’s Theory of Evolution

Our GA:
 Around 100 generations of 50 individuals, initially random.
– An individual is a specific choice of inputs.

Reproduction
 Crossover (mating)
– Make a new individual by combining randomly selected features from some
of the fittest existing individuals.
 Mutation (small random changes)
– Invert one or more decisions as to which inputs to use.

Survival of the Fittest
 The fitness of an individual is assessed by training an ALN with the
given input selection, then testing it on separate test data.
– Actually we train and average the results of a few ALNs.
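The loop described above can be sketched as follows. This is not the NGT tool: ALNs are proprietary, so the fitness function here is a stand-in (hold-out error of a least-squares linear model), the data are synthetic, and the population size, generation count, survivor count and mutation rate are illustrative assumptions smaller than the figures quoted on the slide.

```python
import random
import numpy as np

def holdout_error(mask, x_train, y_train, x_test, y_test):
    """Stand-in fitness: test RMSE of a linear model using only the selected inputs."""
    cols = [i for i, used in enumerate(mask) if used]
    if not cols:
        return float("inf")
    a_train = np.column_stack([np.ones(len(x_train)), x_train[:, cols]])
    a_test = np.column_stack([np.ones(len(x_test)), x_test[:, cols]])
    w, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return float(np.sqrt(np.mean((a_test @ w - y_test) ** 2)))

def select_inputs(fitness, n_inputs, generations=30, population=20,
                  n_survivors=5, mutation_rate=0.05, seed=0):
    """Return (best input mask, best error per generation); lower fitness is better."""
    rng = random.Random(seed)
    pop = [tuple(rng.random() < 0.5 for _ in range(n_inputs))
           for _ in range(population)]
    history = []
    for _ in range(generations):
        survivors = sorted(pop, key=fitness)[:n_survivors]  # survival of the fittest
        history.append(fitness(survivors[0]))
        children = []
        while len(survivors) + len(children) < population:
            mum, dad = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(mum, dad)]        # crossover
            child = [(not gene) if rng.random() < mutation_rate else gene
                     for gene in child]                                 # mutation
            children.append(tuple(child))
        pop = survivors + children
    return min(pop, key=fitness), history

# Synthetic data: only inputs 0 and 3 actually drive the output.
data_rng = np.random.default_rng(0)
x_all = data_rng.normal(size=(200, 8))
y_all = 3.0 * x_all[:, 0] - 2.0 * x_all[:, 3] + 0.1 * data_rng.normal(size=200)

def fitness(mask):
    return holdout_error(mask, x_all[:150], y_all[:150], x_all[150:], y_all[150:])

best_mask, history = select_inputs(fitness, n_inputs=8)
```

Because the survivors are carried into the next generation unchanged, the best error recorded in `history` never gets worse from one generation to the next.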
Input Selection (4a):
Genetic Algorithm: The Principles:
Survival of the Fittest
Survivors plus their offspring (produced by crossover & mutation)
Input Selection (4b):
Genetic Algorithm: The Principles:
Crossover
Input Selection (4c):
Genetic Algorithm: The Principles:
Mutation
Input Selection (4d):
Genetic Algorithm: The Principles:
Overall Loop
Model Development:
Model Production
ALN Training

Tool: AlnFit-NGT
 Source code adapted from Dendronic Decisions Limited.
 Underlying Dendronic Learning Engine (a standard DLL).

Method: Back-Validation
 Oldest year of data used for validation.
 Most recent years of data used for training.
 Train to the point (epoch) of least error on validation data.
Implementation in Code


Automatically translate descriptive form to VBA
Ultimately implement as a set of ActiveX DLLs
 Topmost: a ‘Wrapper’ DLL
– Provides a standard interface to the user-program.
– Generates derived inputs
– Decides which model to run (based on LDZ & time of year).
 ALNs DLLs (one for GMT, one for BST)
 Contain LDZ-specific models as Classes
 Type Definitions DLL
Scope for Improvement
Remaining Technical Issues - 1

Knowledge Refinement
 Find the best way to use recent demand or demand error

Improved Weather Inputs
 Wind direction
 >1 weather station in same LDZ

Refinement of our Methods and Tools:
 Automatic data error detection
 Genetic Algorithm - make it more robust and efficient (e.g.
distributed)
 ALN training improvements
Remaining Technical Issues - 2



Metrics
 Needed for model optimization and quality assessment
 Different metrics targeted at model developer and user?
Kinds of Metrics
 Traditional MAPE and Max. Abs. Error
 Propose Median Abs. Error and Ave of top-10% Abs. Errors
 For comparability, normalize by St.Dev ?
Data Sampling and Input Selection
 Is there a better way? WAID?
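The traditional and proposed error metrics listed on this slide can be computed as in the sketch below. The function name and the example figures are made up, and "top-10%" is interpreted here as the largest tenth of the absolute errors, rounded up to at least one value.

```python
import numpy as np

def forecast_metrics(actual, forecast):
    """MAPE, maximum, median and mean-of-largest-10% absolute errors."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    abs_err = np.abs(forecast - actual)
    n_top = max(1, int(np.ceil(0.1 * len(abs_err))))
    return {
        "mape_percent": float(100.0 * np.mean(abs_err / np.abs(actual))),
        "max_abs_error": float(abs_err.max()),
        "median_abs_error": float(np.median(abs_err)),
        "mean_top10pct_abs_error": float(np.sort(abs_err)[-n_top:].mean()),
    }

# Made-up demands and forecasts (mcm), for illustration only.
actual = [10.0] * 10
forecast = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 10.0, 10.0, 10.0, 15.0]
metrics = forecast_metrics(actual, forecast)
```

In this example the median absolute error is 0 while the maximum is 5, which shows why the two kinds of metric can tell the developer and the user different things.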
Future Development

Refinements:
 Within-Day Fixer (part-developed).
 Arbitrary-Horizon Fixer.
 Kalman Filter (on-line adaption).

Future Problems:
 National gas demand
 Windpower

Wish:
 ‘Hands off’ Model Development Server
Conclusions
Conclusions

Regarding NGT:
 NGT have made effective use of datamining methods for electricity and gas
demand forecasting.
– Quick & dirty ‘feasibility’ models
– Longer development high-accuracy production models
 When run in combination with existing models, the overall accuracy is
improved
– With financial benefits !

More General Lessons:
 ALNs are great!
 For such problems, back-validation is better than cross-validation.
- End -
Any Questions?
Datamining-Based
Gas Demand Forecasting Models


Phase-I Models in service since July 2004
Phase-II Models
 GMT Models in service since January 2005
 BST Models currently under development (for March 05)

Phase II Enhancements:
 More intensive Genetic Algorithm (GA) runs
– Greater number of generations
– Greater mutation probability
– Greater choice of inputs
 Individual GA runs for each LDZ
(hence potentially different input variables)
 Methodology verified by experiment