ENTROPY AS A MEASUREMENT FOR
THE QUALITY OF DEMAND FORECASTING
Bernd Scholz-Reiter, Jan Topi Tervo, Uwe Hinrichs
Department of Planning and Control of Production Systems, University of Bremen
bsr, ter, hin@biba.uni-bremen.de
Production planning and control is a highly complex process influenced by
many factors. An important part of this broad task is demand forecasting, for
which many methods have already been developed. But due to the dynamics
present in the underlying data, the prediction may differ strongly from the
optimum, and thus errors leading to rising costs are inevitable. In this paper we
propose entropy as a measure of the quality of demand forecasting or,
equivalently, as a relative estimate of the forecasting error. In general, entropy
is a measure of disorder and thus also of information content. Since a lack
of information leads to inaccurate forecasts, entropy can be identified
with the quality of demand prediction. First results on the basis of time-series
obtained from mathematical functions, from discrete-event simulations of a
production network scenario and from a real shop-floor system show the
successful transfer of this method.
1. INTRODUCTION
Nowadays, production planning and control is a challenging task due to changing
market conditions and increasing dynamics in global network organizations. Its
primary objective is to schedule and realize the ongoing production plan efficiently
(Eversheim et al., 1996). To do so, the production capacities and the required amount
of resources have to be considered.
While the number of machines is constant in general, the demand for any kind of
resource or material has to be forecasted and ordered with respect to the planned
output in a defined time-period. Incorrect or invalid forecasts can have severe
consequences: ordering too much material results in higher stocks with rising
costs for stock-holding and materials, while ordering less than the needed amount
creates the risk of production downtimes. Therefore, accurate and reliable methods for the
important process of demand forecasting are needed.
This becomes clearer when looking at the several factors or sources of
information which influence the planned demand for materials in a defined period of
time. First of all, exact numbers from the sales market are needed, which determine
the production plan. Here, seasonal fluctuations can occur depending on the kind of
produced goods. Furthermore, increasing dynamics in present markets have been
observed and nonlinear effects in production systems or production networks have
been verified (Scholz-Reiter et al., 2003), (Wiendahl et al., 2000). Regarding all
these factors and their possible economic consequences, it is obvious that adaptive and
trustworthy methods have to be used when forecasting the demand.
Until now, several approaches to demand forecasting based on statistical and
mathematical techniques have been used. The future demand is forecasted using a
time-series of former values (Granger, 1989). Although these methods are well
tested and show strong reliability, there are still no means to measure the quality of
the calculated result. But since demand forecasting techniques fundamentally depend on
preceding information, a measure of the prediction quality should be based on
the available information content. Hence, we propose entropy to characterize the
quality of demand forecasting or, equivalently, as a relative estimate of the forecasting
error, since entropy is a measure of disorder and, thus, also of information content.
In the following, several forecasting methods as well as entropy in general
are presented. These techniques are then applied to different time-series of
demand values, and the measured forecasting error is compared with the
calculated entropy. For this purpose, time-series taken from different mathematical
functions, from discrete-event simulations of a production network scenario and from a real
shop-floor system were used.
2. FORECASTING METHODS
In recent publications many forecasting methods have been proposed for a broad
variety of settings (Makridakis et al., 1998). But since the focus of this
paper is on the quality of the forecast rather than on the method itself, we
concentrate on two basic techniques, which are presented briefly here.
The first is the moving average approach, which is best suited for simple time-series
with identifiable fluctuations around a mean value and without cycles.
Not all available data from a time-series is used, but only the last n values.
The demand in the future period i+1 is then determined by the averaged demand
$\bar{\lambda}_i(n)$ of the considered past period (Granger, 1989):
$$\lambda_{i+1} = \bar{\lambda}_i(n) = \frac{1}{n} \sum_{j=i-n}^{i} \lambda_j . \qquad (1)$$
The number of values given by n allows looking at a limited time segment and
thereby offers high flexibility. At the same time, n governs the reaction to changes: a
large value of n suppresses rapid changes, while a smaller value of n follows fast
dynamics.
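Equation (1) can be sketched in a few lines of Python; the function name and the input handling are our own illustrative choices:

```python
import numpy as np

def moving_average_forecast(series, n):
    """Forecast the next demand value as the mean of the last n
    observations, following Eq. (1)."""
    series = np.asarray(series, dtype=float)
    if len(series) < n:
        raise ValueError("need at least n past values")
    return float(series[-n:].mean())
```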
The other applied method is exponential smoothing, or the exponentially weighted
moving average (Granger, 1989). Here the future demand $\bar{\lambda}_{i+1}(\alpha)$ is calculated from the
weighted average of the measured demand $\lambda_i$ and the forecasted demand $\bar{\lambda}_i(\alpha)$ of the
past period:
$$\bar{\lambda}_{i+1}(\alpha) = \alpha \lambda_i + (1-\alpha)\, \bar{\lambda}_i(\alpha) \quad \text{with } 0 < \alpha < 1 . \qquad (2)$$
When including the n past values, this leads to the weighted average of the data:
$$\bar{\lambda}_{i+1}(\alpha) = \alpha \sum_{j=i-n}^{i} (1-\alpha)^{\,i-j}\, \lambda_j . \qquad (3)$$
The factor $(1-\alpha)^{i-j}$ causes an exponential decrease of the influence of the past
values on the average. If $\alpha$ is near one, the decay is strong, i.e., the effect of the past
values is weak. In contrast, when $\alpha$ is near zero, the decay is weaker and past values
are taken into account more strongly. The challenge is to find a suitable value for $\alpha$;
until now there is no objective way to define this factor.
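Exponential smoothing admits an equally short sketch; Eq. (2) leaves the initial forecast open, so initialising with the first observation is our assumption:

```python
def exponential_smoothing_forecast(series, alpha):
    """One-step forecast via Eq. (2): each step blends the observed
    demand with the previous forecast, so the influence of older
    values decays with the factor (1 - alpha)."""
    assert 0.0 < alpha < 1.0
    forecast = float(series[0])  # initialisation choice, not fixed by Eq. (2)
    for value in series:
        forecast = alpha * float(value) + (1.0 - alpha) * forecast
    return forecast
```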
3. ENTROPY
Commonly, the word entropy is associated with disorder, uncertainty or ignorance.
It originates from two different domains of science, namely physics and information
theory. Both derivations have similarities, but each requires knowledge of its respective domain.
Entropy as a measure with physical meaning was introduced by Clausius (1865)
and later made precise by Boltzmann (1880). In thermodynamics, a macroscopic state is
described by the microscopic behaviour of all its N particles. These are defined by
their positions and momenta, which span a 6N-dimensional phase space.
Entropy then measures the number of different possible microstates
of that thermodynamical system, or equivalently the volume of phase space it occupies. In
other words, it describes the internal disorder within a system. Since entropy in
statistical physics gives a probabilistic treatment to a system's thermal fluctuations,
higher entropy also means a greater lack of information on the exact configuration
of the system. Hence, it has many similarities with entropy derived in information
theory. This definition is principally based on Shannon (Shannon, 1948) and in this
sense it is a measure for the amount of randomness hidden in an information
sequence. This means that a sequence with redundancies or statistical regularities
exhibits small values of entropy and in contrast, a uniform distribution of sequence
symbols, e.g., white noise, leads to the highest entropy value. As a consequence,
history and future of that sequence are completely uncorrelated. Since this paper is
focussed on time-series analysis, the information theoretical definition of entropy
will be considered.
3.1 Symbolic dynamics
In order to calculate a value for entropy, a sequence of symbols is needed. In
time-series with discrete values, e.g., buffer levels, this condition is already fulfilled. But for a
continuous variable, the values have to be transformed into an adequate sequence.
Nevertheless, the number of discrete values may also be reduced by such a
transformation. In physics, this method is well known as ‘symbolic dynamics’ (Hao,
1988).
Obviously, when transforming a time-series into a symbol sequence, a large
amount of detailed information is lost, but some invariant and robust characteristics
such as periodicity, symmetry and chaos can be conserved. This, however, depends
strongly on the choice of the transformation. Due to the reduction of detail, the
analyses of symbol sequences are less vulnerable to noise (Daw et al., 2003) and,
consequently, conclusions drawn from these sequences are more reliable.
Before calculating the entropy, the first choice to make is the alphabet size |A|=l,
i.e., the number of symbols used to transform the original time-series into a
sequence of symbols. This variable determines how much of the original
information is conserved. The simplest case is a binary alphabet with l=2 and
A={0,1}. The next step is to decide on the transformation itself. There are two
fundamentally different approaches: static and dynamic transformations (for illustration see
Figure 1). The static transformation is realized by choosing one (in the binary case)
or more thresholds and the different symbols are then assigned to the intervals
between them. There are diverse rules to calculate those thresholds, e.g., data mean
or median value (Daw et al., 2003). The dynamic transformation is preferred when
the dynamics are more important than the absolute values. Thereby, step-to-step
differences in the sequence are taken into account and, in the binary case for
example, a positive difference is mapped to one symbol and a negative difference to the
other. Of course, a badly chosen transformation can destroy all, or at least the
relevant, information.
Figure 1 - Illustration of symbol sequence generation. In (a) a binary static and in (b) a binary dynamical transformation is shown (demand plotted over time samples; resulting symbol sequences: (a) 11000011110001111000, (b) 1101100011010000111).
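Both transformations can be sketched as follows; the handling of values lying exactly on the threshold, and of zero differences, is our own convention:

```python
import numpy as np

def static_symbols(series, threshold=None):
    """Binary static transformation: symbol 1 above the threshold,
    symbol 0 otherwise; the data mean serves as default threshold."""
    series = np.asarray(series, dtype=float)
    t = series.mean() if threshold is None else threshold
    return (series > t).astype(int)

def dynamic_symbols(series):
    """Binary dynamic transformation: symbol 1 for a positive
    step-to-step difference, symbol 0 otherwise."""
    return (np.diff(np.asarray(series, dtype=float)) > 0).astype(int)
```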
3.2 Calculation of entropy
Figure 2: The code sequence (lower row) is produced by a window of length L=3 being slid over the symbol sequence (upper row): 1101100011010000111 → 6 5 3 6 4 0 1 3 6 5 2 4 0 0 1 3 7.

In order to calculate the Shannon entropy, statistics over the symbol sequence have to
be performed. More precisely, a histogram of repeating subsequences of length L has to be
obtained. Therefore, L consecutive symbols s are combined into a word
$s^L$, and every word is uniquely coded as a decimal number (see Figure 2) to avoid the
handling of long symbol sequences. Figuratively, one can think of a window of
size L being slid from the beginning to the end of the sequence; at every position
a word of length L is found. Then, a histogram of the relative word frequencies $p(s^L)$
can be obtained, and with
$$H_S = -\sum_{s^L \in A^L} p(s^L)\, \log_l p(s^L) \qquad (4)$$
the Shannon entropy can be calculated as the sum over all possible words of length
L. Since the value of the entropy depends strongly on the word length L, a
normalisation with the maximum possible entropy is required:
$$H = \frac{H_S}{H_{\max}} \in [0, 1] \qquad (5)$$
This maximum value is obtained for a uniform distribution of the word frequencies
$$p(s^L) = \frac{1}{l^L} \quad \forall\, s^L \qquad (6)$$

and thus $H_{\max} = L$. This leads to zero entropy for a constant sequence of symbols
and to H=1 for a completely random symbol sequence.
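Equations (4)-(5) translate directly into a sliding-window word count; the following sketch estimates p(s^L) from the word histogram and normalises by H_max = L:

```python
from collections import Counter
import numpy as np

def normalized_entropy(symbols, L, l=2):
    """Normalised Shannon entropy H = H_S / H_max of a symbol sequence.

    A window of length L is slid over the sequence, the relative word
    frequencies p(s^L) are estimated from the resulting histogram, and
    H_S is accumulated with logarithms to base l (the alphabet size),
    so that H_max = L and H lies in [0, 1]."""
    words = [tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1)]
    counts = Counter(words)
    total = len(words)
    h_s = -sum((c / total) * np.log(c / total) / np.log(l)
               for c in counts.values())
    return h_s / L
```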
4. MEASUREMENT FOR QUALITY OF DEMAND FORECASTING
The suitability of entropy as a measure of demand forecasting quality is evaluated
by comparing the forecasted demand with the real demand value of the next time
step; the forecasting error is then related to the calculated entropy. But first, the
parameters l (alphabet size), L (word length), n (time horizon) and α (smoothing
factor) in Equations (1)-(5) have to be determined.
The maximum possible word length strongly depends on the alphabet size and
the length of the time-series: the larger the alphabet and the shorter the time-series,
the smaller the possible word length (and vice versa) (Daw et al., 2003). Since real
time-series are in general rather short, we use a binary alphabet to be able to get a
word length up to L=5. The data mean was used as a threshold for a static
transformation, except for the real shop-floor system where additionally a dynamical
transformation was applied. Also, for the shop-floor system it was unfortunately
only possible to calculate the entropy up to a word length L=4 due to the shortness
of the time-series of only 360 samples. To be as realistic as possible, the time
horizon is chosen to be half a year, with one time sample per day; this leads to
n=150 time samples. Additionally, the smoothing parameter was found to
produce on average the best results for α=0.76.
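The evaluation itself can be sketched as a simple walk-forward loop; the loop structure and the relative-error definition below are our reading of the procedure, not code from the paper:

```python
def one_step_relative_errors(series, forecaster, start):
    """Walk forward through the series: at each step, forecast the next
    value from the history so far and record the relative error
    against the demand actually realised."""
    errors = []
    for i in range(start, len(series) - 1):
        predicted = forecaster(series[: i + 1])
        realised = float(series[i + 1])
        errors.append(abs(predicted - realised) / abs(realised))
    return errors

# e.g., with the moving average sketch of Section 2 and n = 150:
# errors = one_step_relative_errors(demand, lambda h: moving_average_forecast(h, 150), 150)
```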
4.1 Simple examples
Figure 3: Extract from the constant, sinusoidal and random function, respectively the generated time-series (denoted as points), used for entropy calculation and forecasting (demand plotted over time samples).

To depict the properties of the entropy, its value for three simple time-series
generated from a constant, a sine and a uniformly distributed random function (see
Figure 3) is used. The calculated values are listed in Table 1. As stated above, a
constant function leads to a single peak in the distribution of generated words, and
thus zero entropy follows. The computational calculation for different word lengths
confirms this result. As a consequence,
forecasting without error is possible. In contrast, a time-series of random values
leads to a uniform distribution of generated words and hence to the maximum entropy
of one; exact forecasting is impossible. The sine function produces decreasing
entropy values for increasing word lengths (see Table 1), with a mean of
approximately 0.5, because a longer word carries more information and hence better
predictability.
The forecasting of a constant time-series is trivial: both methods (moving
average and exponential smoothing) deliver exact results for the future demand
without any error. This coincides with the entropy value of zero. Similarly, the
forecasting error for the random function corresponds to the calculated entropy.
Here, the choice of forecasting method has no significance, since the past values do not
correlate at all with future values. This is shown by the forecasted values (calculated
with the moving average method) varying only a little around 50, while their error
against the real demand fluctuates between 0 and 100. Table 3 lists these values for
three randomly picked points in time of the time-series. The prediction of the demand
for the sine function is on average better with exponential smoothing than with the
moving average method.

Table 1: Entropy values of different word lengths for the three time-series generated by a constant, sine and random function.

    Word length    constant    sine    random
    3              0.00        0.56    1.00
    4              0.00        0.50    1.00
    5              0.00        0.47    1.00
For the forecasting error it is of major importance at which time step a prediction
is made: good forecasts can be obtained around the minima and maxima of the
function, while rather large errors occur where the slope is large. This reflects the
entropy value of about 0.5 reported in Table 1.
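These qualitative results can be reproduced by combining the sketches above; the series length, amplitudes and sine period below are illustrative choices, not the paper's exact data:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1000)
series = {
    "constant": np.full(1000, 50.0),
    "sine":     50.0 + 25.0 * np.sin(2.0 * np.pi * t / 20.0),
    "random":   rng.uniform(0.0, 100.0, 1000),
}
for name, s in series.items():
    h = normalized_entropy(static_symbols(s), L=4)
    # expected: ~0.0 for constant, intermediate for sine, ~1.0 for random
    print(f"{name}: H = {h:.2f}")
```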
4.2 Simulation time-series
To generate more application-oriented time-series, a discrete-event simulation model
of a supply chain of four enterprises driven by external customer demand (Scholz-Reiter
et al., 2005) was used. The customer demand was realized by a discrete sinusoidal
and by a uniformly distributed random function (see Figure 4). The entropy values
calculated for both time-series are comparable to those obtained in Section 4.1.
The random demand leads to random fluctuations in the time-series, and so an
entropy value of one follows. On the other hand, the sinusoidal demand causes a
deterministic structure similar to the sine function, and accordingly an entropy value
of about 0.57 is calculated (see Table 2). These values correspond to the forecasting
errors. As shown in Table 3, the sine function can be forecasted well, because the
used forecasting methods deliver their best results when only marginal dynamics are
present. Again, a comparison for randomly picked points in time of the time-series
was done (see Table 3).

Table 2: Entropy values of different word lengths for the two time-series generated by the DES model with a sinusoidal and a random customer demand, and for the time-series of the real shop-floor system created with static and dynamical transformation.

    Word length    Simulation data       Real data
                   Sine      Random      Dynamical    Static
    2              -         -           0.95         0.90
    3              0.61      1.00        0.93         0.90
    4              0.56      1.00        0.92         0.90
    5              0.52      1.00        -            -
4.3 Real data
The last time-series to be analysed is taken from the demand of a real shop-floor
system (see Figure 5). Here, the entropy is calculated with both static and dynamical
transformations to illustrate the differences between them for this time-series. As
shown in Table 2, the values differ only slightly, with 0.90 and about 0.93,
respectively. Again, the entropy value corresponds to the predictability of the
time-series: it shows an almost random behaviour with only little determinism. This is
confirmed by comparing the calculated forecasting values with the real demand (see
Table 3).
Figure 4: Extract from two different time-series generated by the DES model with a sinusoidal and a random customer demand.

Figure 5: Extract from the demand time-series of a real shop-floor system.
5. SUMMARY
The entropy can be calculated quickly and easily for rather short word lengths (up to
L=8) and realistic demand time-series of at most 10000 time steps. Since it
is a measure of uncertainty, it corresponds to the predictability of the time-series.
Therefore, no absolute forecasting error can be obtained, but a graduation between 0
(perfectly predictable) and 1 (not predictable at all) is very well possible.
The presented results show that this property of the entropy can be successfully
transferred to a relative measure of the reliability of demand forecasting. But for a
promising application within order forecasting methods, further research is needed,
which will in particular deal with the evaluation of the several parameters and with
concrete recommendations for action.
Table 3: Forecasted and real demand values for all mentioned time-series at randomly picked points in time, and their relative error.

    Time-series          Forecasted value    Real value    Relative error
    sine function        29.1                18.7          55.6%
                         84.9                92.4          8.1%
                         1.6                 1.4           14.3%
    random function      53.9                60.9          11.5%
                         53.3                77.9          31.6%
                         53.1                18.5          187.0%
    sinusoidal demand    15.1                12.59         19.9%
                         15.03               14.99         0.3%
                         10.88               10.31         5.5%
    random demand        13.33               18.38         27.5%
                         13.55               10.85         24.9%
                         13.41               11.35         15.4%
    real data            3.01                3             24.0%
                         3.05                1             205.0%
                         3.42                11            68.9%
6. ACKNOWLEDGEMENTS
This work is funded by the German Research Foundation (DFG) under the reference
number Scho 540/13-1 "Synchronisation of the nodes in production and logistics
networks" and the Volkswagen Foundation under the reference number I/78 217
“Modelling and Analysis of Production and Logistics Networks Using Methods of
Nonlinear Dynamics”.
7. REFERENCES
1. Daw, C. S., Finney, C. E. A., Tracy, E. R. A review of symbolic analysis of experimental data. Review of Scientific Instruments 74, 2003, pp. 915-930.
2. Eversheim, W., Schuh, G. Produktion und Management Bd. 2. Springer, 1996.
3. Granger, C. W. J. Forecasting in Business and Economics. Academic Press, London, 1989.
4. Hao, B.-L. Elementary Symbolic Dynamics and Chaos in Dissipative Systems. World Scientific, 1988.
5. Makridakis, S., Wheelwright, S. C., Hyndman, R. J. Forecasting. Wiley and Sons, 1998.
6. Scholz-Reiter, B., Freitag, M. On the Dynamics of Manufacturing Systems - A State Space Perspective. Proceedings of the 36th CIRP International Seminar on Manufacturing Systems, 2003, pp. 455-462.
7. Scholz-Reiter, B., Hinrichs, U., Delhoum, S. Analyse auftretender Instabilitäten in dynamischen Produktions- und Logistiknetzwerken [Analysis of instabilities occurring in dynamic production and logistics networks]. Industrie Management 21 (2005) 5, pp. 25-28.
8. Shannon, C. E. A mathematical theory of communication. The Bell System Technical Journal 27, 1948, pp. 379-423 and pp. 623-656.
9. Wiendahl, H.-P., Worbs, J. Simulation based analysis of complex production systems with methods of nonlinear dynamics. IMCC'2000 International Manufacturing Conference in China, 2000.