CS 476: Networks of Neural Computation
WK4 – Radial Basis Function Networks
Dr. Stathis Kasderidis
Dept. of Computer Science
University of Crete
Spring Semester, 2009
Contents
•Introduction to Time Series Analysis
•Prediction Problem
•Predicting Time Series with Neural Networks
•Radial Basis Function Network
•Conclusions
Introduction to Time Series Analysis
•There are two major classes of statistical problems:
  •Classification problems (given an input x, find to which of a set of K known classes it belongs);
  •Regression problems (try to build a functional relationship between independent and regressed variables; the former are the causes, while the latter are the effects).
•Regression problems arise due to the need for:
  •Explanation
  •Prediction
  •Control
Introduction to Time Series Analysis II
•In a regression problem, there are two high-level issues to determine:
  •The nature of the mechanism that generates the data (stochastic or deterministic); this affects which class of models we will use;
  •A modelling procedure.
Introduction to Time Series Analysis III
•A modelling procedure usually includes the following steps:
1. Specification of a model:
  •If it describes a function or a probability distribution;
  •If it is linear or non-linear;
  •If it is parametric or non-parametric;
  •If it is a mixture or a single function;
  •If it includes time explicitly or not;
  •If it includes memory or not.
Introduction to Time Series Analysis IV
2. Preparation of the data:
  •Noise reduction;
  •Scaling;
  •Appropriate representation for the target problem;
  •Transformations;
  •De-correlation (cleaning up spatial or temporal correlation structure);
  •Feature extraction;
  •Handling missing values.
Introduction to Time Series Analysis V
3. An estimation procedure (i.e. a framework to estimate the model parameters):
  •Maximum Likelihood estimation;
  •Bayesian estimation;
  •(Ordinary) Least Squares;
  •Numerical techniques used in the estimation framework are:
    •Optimisation;
    •Integration;
    •Graph-theoretic methods;
    •etc.
Introduction to Time Series Analysis VI
•Availability of data:
  •Enough in number;
  •Quality;
  •Resolution.
•Resulting estimators created by the framework must be:
  •Un-biased (i.e. they do not systematically differ from the true model in a statistical sense);
  •Consistent (i.e. as the number of data grows the estimator approaches the true model with probability 1).
Introduction to Time Series Analysis VII
4. A model selection procedure (i.e. to select the best model). Factors include:
  •Goodness of fit (i.e. how well the model fits the given data);
  •Generalisation (i.e. how well the model approaches the underlying data generation mechanism);
  •Confidence intervals.
Introduction to Time Series Analysis VIII
5. Testing a model:
  •Test the model on out-of-sample data;
  •Re-iterate the modelling procedure until we produce a model with which we are satisfied;
  •Compare different classes of models in order to find the best one;
  •Usually we select the simplest class which describes the data well;
  •A comparison framework among different classes of models is not always available.
•Neural networks are semi-parametric, non-linear statistical modelling techniques.
The Prediction Problem
•Def: A time series, {Xt}, is a family of real-valued random variables indexed by t. The index t can take values in ℝ or ℤ.
•When a family of variables is defined at all points in time it is called continuous; otherwise it is called discrete.
•In practice we always have a discrete series, due to discrete sampling times of a continuous series or due to digitization.
•The length of a series is the time elapsed between the recorded start and finish of the series.
The Prediction Problem II
•Def: A time series, {Xt}, is called (strictly) stationary if, for any t_1, t_2, …, t_n ∈ I, any k ∈ I and n = 1, 2, …

P_{X_{t_1}, …, X_{t_n}}(x_{t_1}, x_{t_2}, …, x_{t_n}) = P_{X_{t_1+k}, …, X_{t_n+k}}(x_{t_1}, x_{t_2}, …, x_{t_n})

where P denotes the joint distribution function of the set of random variables which appear as suffices and I is an appropriate indexing set.
•Broadly speaking, a time series is stationary if there is no systematic change in mean, no systematic change in variance, and strictly periodic variations have been removed.
The Prediction Problem III
•In classical time series analysis we decompose a time series into the following components:
  •A trend (a long-term movement);
  •Fluctuations about the trend of greater or lesser regularity;
  •A seasonal component;
  •A residual (irregular or random effect).
•Typically the probability theory of time series examines stationary series and investigates residuals for further structure. However, in other cases we may be interested in capturing the trend (i.e. function approximation).
The Prediction Problem IV
•It is assumed that if the residuals do not contain any further structure, then they behave like an IID (independent and identically distributed) process, which is usually assumed to be normal. Such a stochastic process cannot be modelled further, so the analysis of the time series terminates.
•If on the other hand the series contains more structure, we re-iterate the analysis until the residuals do not contain any structure.
•Tests to use for checking the normality of the residuals are (a minimal sketch follows the list):
  •Kolmogorov-Smirnov test;
  •BDS test, etc.
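As an illustration, a minimal sketch of the Kolmogorov-Smirnov check in Python; NumPy/SciPy and the synthetic `residuals` array are assumptions for the sketch, not part of the slides.

```python
# Minimal sketch: Kolmogorov-Smirnov normality check on model residuals.
# Assumes NumPy and SciPy; `residuals` is a synthetic stand-in array.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(size=200)                 # replace with real residuals

# Standardise, then compare against the standard normal distribution.
z = (residuals - residuals.mean()) / residuals.std(ddof=1)
statistic, p_value = stats.kstest(z, "norm")

# A small p-value suggests the residuals still contain structure and the
# analysis should be re-iterated; a large one lets the analysis terminate.
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3f}")
```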
The Prediction Problem V
•If the structure of the series is linear then we fit a linear model such as ARMA, or, if the series is non-stationary, we fit an ARIMA model.
•On the other hand, for non-linear structure we use ARCH, GARCH and neural network models. Typically we fit the linear component first with a linear model and then fit the residuals with a non-linear model.
The Prediction Problem VI
•Usually a time series does not have all the desirable statistical properties, so we transform it before starting the analysis in order to achieve better results. Typical transforms include:
  •Stabilising the variance;
  •Making seasonal effects additive;
  •Making the data normally distributed;
  •Filtering (FFT, moving averages, exponential smoothing, low- and high-pass filters, etc.);
  •Differencing (the preferred method for de-trending; we apply differencing until the time series becomes stationary — see the sketch below).
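A minimal differencing sketch, assuming NumPy; the trending series `x` is synthetic:

```python
# Minimal sketch: de-trending by differencing. Assumes NumPy;
# `x` is a synthetic series with a stochastic trend.
import numpy as np

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(loc=0.5, size=300))   # random walk with drift

# First difference: y_t = x_t - x_{t-1}. If the result still looks
# non-stationary, difference again (np.diff(x, n=2), and so on).
y = np.diff(x, n=1)

# The trend shows up as a drifting mean; differencing removes it.
print("half-sample means before:", x[:150].mean(), x[150:].mean())
print("half-sample means after: ", y[:150].mean(), y[150:].mean())
```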
The Prediction Problem VII
•Restating the prediction problem:
"We want to construct a model with an appropriate technique which, when estimated, can give 'good' forecasts on new data. The new data are commonly some future values of the series. We want the model to predict the future values of the time series as accurately as possible, given as input some previous values of the series."
The Prediction Problem VIII
•There are three main approaches which are used to model the series prediction problem:
  •A. Assume a functional relationship as a generating mechanism, e.g. X_{t+1} = F(X_t), where X_t is an appropriate vector of past values and F is the generating mechanism (a lagged-embedding sketch follows the list);
  •B. Assume that the map F has multiple branches. The returned output then represents the probability of obtaining X_{t+1} in any one of the branches of F;
  •C. Divide the input into a set of classes and try to learn the map from input to classes, i.e. a classification problem.
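A minimal sketch of constructing the vector of past values X_t for approach A, assuming NumPy; the series and the lag depth `d` are hypothetical choices:

```python
# Minimal sketch: build (X_t, X_{t+1}) training pairs via a lagged embedding.
# Assumes NumPy; `series` and the lag depth `d` are hypothetical choices.
import numpy as np

def lagged_embedding(series, d):
    """Rows of X are [x_{t-d+1}, ..., x_t]; targets are y = x_{t+1}."""
    X = np.array([series[t - d + 1 : t + 1]
                  for t in range(d - 1, len(series) - 1)])
    y = series[d:]
    return X, y

series = np.sin(np.linspace(0.0, 20.0, 200))   # stand-in series
X, y = lagged_embedding(series, d=3)
print(X.shape, y.shape)                         # (197, 3) (197,)
```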
Time Series Prediction using Neural Networks
•To apply a neural network model in time series prediction we have to make choices on the following issues:
•Preparing the data:
  •Transforming the data (see above);
  •Handling missing values;
  •Smoothing the data (if needed);
  •Scaling the data (almost always a good idea!);
  •Dimensionality reduction (principal component analysis, factor analysis);
  •De-correlating the data;
  •Extracting features (i.e. combinations of variables).
Time Series Prediction using Neural Networks II
•Representing variables:
  •Continuous or discrete;
  •Semantics of variables (i.e. probabilities, categories, data points, etc.);
  •Distributed or atomic representation;
  •Variables with little information content can be harmful to generalisation;
  •In Bayesian estimation the method of Automatic Relevance Determination can be used for selecting variables;
  •Selecting features;
  •Capturing causal relations.
Time Series Prediction using Neural Networks III
•Discovering 'memory' in the generating process (an auto-correlation sketch follows the list):
  •Trial and error;
  •Partial + auto-correlation functions (linear);
  •Mutual information function (non-linear);
  •Methods from dynamical systems theory;
  •Determination of past values by fitting a model (e.g. linear) and eliminating past values with small contribution based on sensitivity.
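A minimal sketch of the sample auto-correlation function, the linear probe mentioned above; NumPy and the choice of `max_lag` are assumptions:

```python
# Minimal sketch: sample auto-correlation function, a linear probe for
# 'memory' in a series. Assumes NumPy; `max_lag` is a hypothetical choice.
import numpy as np

def acf(x, max_lag):
    """Sample auto-correlation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

series = np.sin(np.linspace(0.0, 20.0, 200))
# Lags with large |ACF| mark past values worth feeding to the network.
print(acf(series, max_lag=5))
```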
Time Series Prediction using Neural Networks IV
•Selecting an architecture:
  •Type of training;
  •Family of models;
  •Transfer function;
  •Memory;
  •Network topology;
  •Other parameters in the network specification.
•Model selection:
  •See discussion in WK3.
Time Series Prediction using Neural Networks V
•Determination of confidence intervals (a percentile-interval sketch follows the list):
  •Jacknife method (a linear approximation of the Bootstrap);
  •Bootstrap;
  •Bootstrap t-interval;
  •Bootstrap percentile interval;
  •Moving blocks Bootstrap;
  •Bias-corrected and accelerated Bootstrap.
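A minimal sketch of one method from the list, the bootstrap percentile interval; NumPy, the synthetic `errors` array and the 95% level are assumptions:

```python
# Minimal sketch: bootstrap percentile interval for the mean forecast error.
# Assumes NumPy; the synthetic `errors` array and 95% level are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
errors = rng.normal(size=100)          # stand-in out-of-sample forecast errors

B = 2000                               # number of bootstrap resamples
means = np.array([rng.choice(errors, size=errors.size, replace=True).mean()
                  for _ in range(B)])

lo, hi = np.percentile(means, [2.5, 97.5])
print(f"95% percentile interval for the mean error: [{lo:.3f}, {hi:.3f}]")
```

For serially correlated errors, resampling contiguous blocks (the moving blocks Bootstrap above) is the more appropriate choice.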
Time Series Prediction using Neural Networks VI
•Additional Literature:
1. Masters T. (1995). Neural, Novel & Hybrid Algorithms for Time Series Prediction. Wiley.
2. Pawitan Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press.
3. Chatfield C. (1989). The Analysis of Time Series: An Introduction, 4th Ed. Chapman & Hall.
4. Harvey A. (1993). Time Series Models. Harvester Wheatsheaf.
5. Efron B., Tibshirani R. (1993). An Introduction to the Bootstrap. Chapman and Hall.
Radial Basis Function Model
•There are only three layers: input, hidden and output. There is only one hidden layer.
Radial Basis Function Model II
•The hidden layer provides a non-linear transformation of the input space to the hidden space, which is usually assumed to be of high enough dimension.
•The output layer combines the activations of the hidden layer in a linear way.
•Note: The RBF model owes its development to ideas of fitting hyper-surfaces to data points in a high-dimensional space.
•In numerical analysis, radial-basis functions were introduced for the solution of real multivariate interpolation problems.
Radial Basis Function Model III
•In the RBF model the hidden units provide a set of "functions" that constitute an arbitrary "basis" for the input patterns when they are expanded into the hidden space.
•The inspiration for the RBF model is Cover's theorem (1965) on the separability of patterns:
"A complex pattern-classification problem cast in a high-dimensional space non-linearly is more likely to be linearly separable than in a low-dimensional space."
•This leads us to consider the multivariable interpolation problem in high-dimensional space:
Radial Basis Function Model IV
Given a set of N different points {x_i ∈ ℝ^{m_0} | i = 1, 2, …, N} and a corresponding set of N real numbers {d_i ∈ ℝ¹ | i = 1, 2, …, N}, find a function F: ℝ^{m_0} → ℝ¹ that satisfies the interpolation condition:

F(x_i) = d_i,  i = 1, 2, …, N

For strict interpolation the interpolating surface, i.e. F, is constrained to pass through all the data points.
•The radial-basis function (RBF) technique consists of choosing a function F that has the following form:

F(x) = Σ_{i=1}^{N} w_i φ(||x − x_i||)
Radial Basis Function Model V
where {φ(||x − x_i||) | i = 1, 2, …, N} is a set of N arbitrary functions, known as radial-basis functions, and ||·|| denotes a norm, which is usually the Euclidean. The data points x_i ∈ ℝ^{m_0} are taken to be the centers of the radial-basis functions.
•Assume that d denotes the desired response vector and w the linear weight vector; N is the size of the training set. Let Φ denote the N × N matrix with elements:

Φ_{ji} = φ(||x_j − x_i||),  j, i = 1, 2, …, N

Φ is called the interpolation matrix.
Radial Basis Function Model VI
•Thus, according to the above, we can write:

Φw = d

The solution for the weight vector is:

w = Φ⁻¹d

assuming that Φ is non-singular. Micchelli's theorem provides assurances for a set of functions that create a non-singular matrix Φ:

Let {x_i}_{i=1}^{N} be a set of distinct points in ℝ^{m_0}. Then the N × N interpolation matrix Φ is non-singular.
Radial Basis Function Model VII
•Functions that are covered by Micchelli's theorem include:
  •Multiquadrics: φ(r) = (r² + c²)^{1/2}, c > 0, r ∈ ℝ
  •Inverse multiquadrics: φ(r) = 1/(r² + c²)^{1/2}, c > 0, r ∈ ℝ
  •Gaussian functions: φ(r) = exp(−r²/(2σ²)), σ > 0, r ∈ ℝ
•All that is required for a non-singular Φ is that the points x_i be distinct (a strict-interpolation sketch follows).
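A minimal sketch of strict interpolation with a Gaussian radial-basis function; NumPy, the sample points, the width σ and the query point are assumptions:

```python
# Minimal sketch: strict RBF interpolation — build the interpolation matrix
# Phi and solve Phi w = d. Assumes NumPy; the sample points, the Gaussian
# width sigma and the query point are hypothetical choices.
import numpy as np

def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2.0 * sigma**2))

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))                   # N distinct points in R^2
d = np.sin(X[:, 0]) + np.cos(X[:, 1])          # desired responses

# Phi_{ji} = phi(||x_j - x_i||); distinct centers keep Phi non-singular.
r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
w = np.linalg.solve(gaussian(r), d)            # w = Phi^{-1} d

# F(x) = sum_i w_i phi(||x - x_i||) passes through every data point.
x_new = np.array([0.1, -0.2])
print(w @ gaussian(np.linalg.norm(X - x_new, axis=1)))
```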
Radial Basis Function Model VIII
•Universal Approximation Theorem for RBF Networks:
For any continuous input-output mapping function f(x) there is an RBF network with a set of centers {t_i}_{i=1}^{m_1} and a common width σ > 0 such that the input-output mapping function F(x) realized by the RBF network is close to f(x) in the L_p norm, p ∈ [1, ∞].
The RBF network consists of functions F: ℝ^{m_0} → ℝ represented by:

F(x) = Σ_{i=1}^{m_1} w_i G((x − t_i)/σ)
Radial Basis Function Model IX
•Results on sample complexity, computational complexity and generalisation performance for RBF networks:
  •The generalisation error converges to zero only if the number of hidden units, m_1, increases more slowly than the size N of the training sample;
  •For a given size N of the training sample, the optimum number of hidden units, m_1*, behaves as:

m_1* ∝ N^{1/3}

  •The RBF network exhibits a rate of approximation O(1/m_1) that is similar to that of an MLP with sigmoid activation functions.
Radial Basis Function Model X
•Comparison of MLP and RBF networks:
1. An RBF network has a single hidden layer. An MLP has one or more hidden layers;
2. Typically the nodes of an MLP share the same neuronal model in the hidden and output layers. On the other hand, the nodes of an RBF network's hidden layer play a different role from those in the output layer;
3. The hidden layer of an RBF network is non-linear and the output layer is linear. Typically in an MLP both layers are non-linear;
Radial Basis Function Model XI
4. An RBF network computes, as the argument of its activation function, the Euclidean distance between the input vector and the center of the unit. In MLP networks the activation function computes the inner product of the input vector and the weight vector of the node;
5. MLPs are global approximators; RBFs are local approximators, due to the localised, decaying Gaussian (or other) function.
Learning Law for Radial Basis Networks
•To develop a learning law for RBF networks we assume that the error function has the following form:

E = (1/2) Σ_{j=1}^{N} e_j²

where N is the size of the training sample used to do the learning, and e_j is the error signal defined by:

e_j = d_j − F(x_j) = d_j − Σ_{i=1}^{M} w_i G(||x_j − t_i||_{C_i})
Learning Law for Radial Basis Networks II
•We need to find the free parameters w_i, t_i and Σ_i⁻¹ so as to minimise E. C_i is a norm-weighting matrix, i.e.:

||x||²_C = (Cx)^T(Cx) = x^T C^T C x

•We use a weighted norm matrix when the individual elements of x belong to different classes.
•To calculate the update equations we use gradient descent on the instantaneous error function E. We get the following update rules for the free parameters:
Learning Law for Radial Basis Networks III
1. Linear weights (output layer):

∂E(n)/∂w_i(n) = Σ_{j=1}^{N} e_j(n) G(||x_j − t_i(n)||_{C_i})

w_i(n+1) = w_i(n) − η_1 ∂E(n)/∂w_i(n),  i = 1, 2, …, m_1

2. Positions of centers (hidden layer):

∂E(n)/∂t_i(n) = 2 w_i(n) Σ_{j=1}^{N} e_j(n) G′(||x_j − t_i(n)||_{C_i}) Σ_i⁻¹ [x_j − t_i(n)]

t_i(n+1) = t_i(n) − η_2 ∂E(n)/∂t_i(n),  i = 1, 2, …, m_1
Learning Law for Radial Basis Networks IV
3. Spreads of centers (hidden layer):

∂E(n)/∂Σ_i⁻¹(n) = −w_i(n) Σ_{j=1}^{N} e_j(n) G′(||x_j − t_i(n)||_{C_i}) Q_{ji}(n)

Q_{ji}(n) = [x_j − t_i(n)][x_j − t_i(n)]^T

Σ_i⁻¹(n+1) = Σ_i⁻¹(n) − η_3 ∂E(n)/∂Σ_i⁻¹(n)

•Note that three different learning rates η_1, η_2, η_3 are used in the gradient descent equations (a training sketch follows).
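To make the three update rules concrete, here is a minimal training sketch, simplified to one scalar spread σ_i per unit (so Σ_i⁻¹ reduces to I/σ_i² and the weighted norm to the Euclidean one); NumPy, the synthetic data, the unit count and the learning rates are all assumptions, not the slides' prescription.

```python
# Minimal sketch of the gradient-descent learning law, simplified to one
# scalar spread sigma_i per unit (Sigma_i^{-1} = I / sigma_i^2, Euclidean
# norm). Assumes NumPy; data, unit count and learning rates are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-3.0, 3.0, size=(200, 1))      # training inputs x_j
d = np.sin(X[:, 0])                            # desired responses d_j

m1 = 10                                        # hidden units
t = rng.uniform(-3.0, 3.0, size=(m1, 1))       # centers t_i
w = np.zeros(m1)                               # linear weights w_i
sigma = np.ones(m1)                            # spreads sigma_i
eta1, eta2, eta3 = 0.005, 0.002, 0.001         # three separate learning rates

for n in range(1000):
    diff = X[:, None, :] - t[None, :, :]       # x_j - t_i
    r2 = (diff**2).sum(-1)                     # ||x_j - t_i||^2
    G = np.exp(-r2 / (2.0 * sigma**2))         # Gaussian activations G_ji
    e = d - G @ w                              # errors e_j = d_j - F(x_j)
    # Gradients of E = 1/2 sum_j e_j^2 w.r.t. w_i, t_i and sigma_i:
    grad_w = -G.T @ e
    grad_t = -w[:, None] * np.einsum('j,ji,jik->ik',
                                     e, G / sigma[None, :]**2, diff)
    grad_sigma = -w * (e[:, None] * G * r2).sum(0) / sigma**3
    # Each parameter group descends with its own learning rate:
    w -= eta1 * grad_w
    t -= eta2 * grad_t
    sigma -= eta3 * grad_sigma

print("MSE after training:", np.mean(e**2))
```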
Conclusions
•In time series modelling we seek to extract the maximum possible structure we can find in the series.
•We terminate the analysis of a series when the residuals do not contain any more structure, i.e. they have an IID structure.
•NNs can be used as models in time series prediction.
•RBF networks are a second paradigm of multi-layer networks, besides MLPs.
•They are inspired by interpolation theory (numerical analysis).
•They can be trained with the gradient descent method, the same as in the MLP case.