REFRIGERANT LEAK PREDICTION IN SUPERMARKETS USING EVOLVED
NEURAL NETWORKS
Dan W. Taylor, David W. Corne,
University of Reading, UK
d.taylor@logicalgenetics.com, d.w.corne@reading.ac.uk
ABSTRACT
The loss of refrigerant gas from commercial refrigeration systems is a major maintenance cost for most
supermarket chains. Gas leaks can also have a detrimental
effect on the environment. Monitoring systems maintain a
constant watch for faults such as this, but often fail to detect
them until major damage has been caused. This paper
describes a system which uses data received at a central
alarm monitoring centre to predict the occurrence of gas
leaks. Evolutionary algorithms are used to breed neural
networks which achieve high accuracies given limited
training data.
1. INTRODUCTION
In recent years, large supermarket chains in the UK have
become increasingly aware of refrigeration systems in their
stores. This has happened for a number of reasons, most notably because refrigeration is one of the largest costs in setting up and running a store, and there are a number of ways in which the associated systems can be optimised to save money. Now, with the added pressures placed upon operators of commercial refrigeration systems by environmental legislation, such as the Kyoto and Montreal protocols, and by ever-increasing energy costs, the optimisation of refrigeration systems is more important than ever. See [1] for a more detailed review.
Individual cabinets and cold-rooms within a typical UK
supermarket are part of a complex system of interdependent
items of machinery, electronic and computer control units
and many hundreds of metres of pipe-work and cabling.
Unlike the small, sealed refrigerators found in most of our homes, the refrigeration systems in supermarkets are fed with refrigerant via a network of piping which runs under the floor of the store.
Refrigerant is allowed to evaporate within cabinets to
absorb heat, and the resulting hot gas is then pumped to
condensers outside the store. Large electrically powered
compressors, situated away from the shop floor, are used to
facilitate this.
As might be expected, the presence of refrigerant gas in
this large, complex mechanical system inevitably leads to
the occasional leak. A larger supermarket will have around
100 individual cooled cases and the associated refrigeration
system can hold around 800 kg of refrigerant. Refrigerant
costs around £15 per kilogram and can have detrimental
effects if leaked into the atmosphere. It is therefore
imperative that leaks from refrigeration systems be
minimised, both from financial and environmental points of
view.
JTL Systems Ltd (www.jtl.co.uk) manufacture advanced
electronic controllers which control and co-ordinate
refrigeration systems in supermarkets. These systems, as
well as controlling cabinet temperature, gather data on
various parameters of the store-wide refrigeration system.
This data is used to optimise the operation of machinery and
schedule defrosts, whilst also being used to generate alarms.
Alarms are essentially warnings of adverse conditions in equipment, and are transmitted, via modem link, to a central monitoring centre. There, trained
operators watch for serious events and call the appropriate
store staff or maintenance personnel to avert situations
where stock may be endangered.
Gas losses have been highlighted by JTL and their
customers (major supermarket chains) as a very important
area in which to concentrate resources.
There are essentially two types of gas leak:

- Fast: Equivalent to a burst tyre on a car: a large crack or hole in piping or machinery causes gas to be lost quickly and the refrigeration system to be immediately impaired. Fast leaks can be detected immediately at the JTL monitoring centre and the appropriate action taken.
- Slow: Equivalent to a slow puncture: gas slowly leaks from the system, causing a seemingly unrelated series of alarms and alterations in the performance of the system. This type of leak is more frequent and can be much harder to detect. JTL's customers tend to lose more money through slow leaks than through fast leaks.
This paper details work undertaken to develop systems
which use alarm data, gathered from refrigeration systems in
supermarkets, to predict/detect the occurrence of slow gas
leaks. There is a clear commercial requirement for such a
system as it will allow pre-emptive maintenance to be
scheduled, thus minimising the amount of gas allowed to
leak from the system.
The prediction/classification technique used in this paper
is an extension of that presented in [2]. Neural networks are
trained using a combination of evolutionary algorithms and
traditional back propagation learning. This training scheme
has been shown to be marginally more effective than
evolved rule-sets and back propagation used in isolation.
A description of the data available for prediction systems
and the various pre-processing operations performed upon it
can be found in section 2. Section 3 goes on to describe the
EA/BP hybrid training system in more detail. In section 4
we outline the various experiments performed and their
results and finally, a concluding discussion and some
suggested areas for further exploration can be found in
section 5.
2. ENGINEERING AND ALARM DATA
There are two important data sets which must be
combined in order to create training data suitable for
training classifiers to solve the task in hand. These are
outlined in this section, along with details of how they were
combined and the pre-processing operations used to create
valid training data.
2.1. ALARM DATA
As previously mentioned, when adverse conditions are
detected by on-site monitoring and control hardware they
are brought to the attention of operators at JTL’s dedicated
monitoring centre. The Network Controller, which is the principal component of the in-store refrigeration control system, uses its in-built modem to send a small package of data to the monitoring centre via the telecommunications network. This data package is known as an Alarm and contains useful information, including:
- The ID number of the unit which raised the alarm
- The nature of the alarm conditions and any related information, such as temperature or pressure readings
- The time at which the alarm was first raised
Information from alarms is copied to a large relational
database called “Site Central”. Alarm data has been
archived here for almost three years and well over two
million individual alarm records are stored. These alarms
correspond to 40,000 control/monitoring units at 400 stores
monitored by JTL for its main customer.
A few human experts can diagnose problems with refrigeration systems using this alarm data. Some types of fault, gas loss in particular, have a well defined, but often quite subtle, pattern of events that can be detected by those in the know. Due to training and resource issues at the monitoring centre, however, staff have neither the time nor the expertise required to watch for these patterns.

As the receipt of an alarm is a discrete event, without any duration, it was necessary to present our prediction system with a series of categorised alarm totals. This gives us a list of small-valued integers. We create a vector of n samples, each of length t, covering a period of n * t hours. For each sample period we create a three-tuple of values corresponding to the sum totals of alarms occurring within that sample period, in each of three categories:
- Plant alarms (compressors/machinery)
- Coldroom alarms (large cooled storerooms)
- Cabinet alarms (refrigerated cabinets on the shop floor)

Thus, for a vector where n = 3 and t = 8, we have nine values, corresponding to plant, coldroom and cabinet alarm totals for each of our three sample periods, spanning 24 hours altogether.
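By way of illustration, the following Python sketch assembles such a vector of categorised alarm totals. The (timestamp, category) record layout and all of the names used are our own illustrative assumptions, not the actual Site Central schema:

    from datetime import datetime, timedelta

    # Illustrative alarm record: (timestamp, category) tuples.
    CATEGORIES = ("plant", "coldroom", "cabinet")

    def alarm_vector(alarms, end, n=3, t=8):
        """Build an n * 3 element vector of categorised alarm totals,
        covering the n * t hours ending at `end`, split into n
        consecutive samples of t hours each, oldest sample first."""
        vector = []
        for i in range(n, 0, -1):
            sample_start = end - timedelta(hours=i * t)
            sample_end = end - timedelta(hours=(i - 1) * t)
            counts = {c: 0 for c in CATEGORIES}
            for when, category in alarms:
                if sample_start <= when < sample_end:
                    counts[category] += 1
            vector.extend(counts[c] for c in CATEGORIES)
        return vector

    # Three 8-hour samples ending at midnight: nine values in all.
    alarms = [(datetime(2002, 4, 4, 15, 30), "cabinet"),
              (datetime(2002, 4, 4, 23, 10), "plant")]
    print(alarm_vector(alarms, datetime(2002, 4, 5)))
    # -> [0, 0, 0, 0, 0, 1, 1, 0, 0]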
2.2. ENGINEERING DATA
In order to train prediction systems to recognise alarm
patterns associated with a gas loss it is important to have a
large set of training data, containing a number of examples
of alarm patterns corresponding to previous gas loss events.
The record of gas leaks for the period between 1st Jan 2000 and 5th April 2002 was obtained from JTL's main customer's maintenance logging system. This data records the date on which an engineer attended a site and what action was taken during their visit. SQL was written to pinpoint and record the dates of 240 engineering visits corresponding to gas losses over this period.
Sadly, the engineering logs do not record an exact time for the gas loss event, only the date on which the engineer visited. This means that choosing an input vector for our classifier which immediately precedes the gas loss event is not possible. As a compromise, the input vector's last sample ends at 00:00 on the day the engineer visited, so the gas loss could have occurred anywhere between one second and twenty-four hours after the end of our input vector.
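A minimal sketch of this alignment, assuming only that the engineering logs supply a calendar date:

    from datetime import date, datetime

    def vector_end_for_visit(visit_date):
        # The logs record only a date, so the input vector's last sample
        # ends at 00:00 on the day of the visit; the leak itself may fall
        # anywhere in the following twenty-four hours.
        return datetime(visit_date.year, visit_date.month, visit_date.day)

    print(vector_end_for_visit(date(2002, 4, 5)))  # 2002-04-05 00:00:00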
Our inability to select an input vector which immediately precedes a gas loss event is compounded by the fact that slow gas leaks take place over a period which varies in length from hours to days. Our system must therefore behave more like a classifier than a prediction system, deciding whether a gas loss is currently occurring rather than predicting that a gas loss will occur at a given time.
It is also worth noting that when generating training data
patterns we were unable to distinguish between fast and
slow gas losses because this data is not recorded in the
engineering logs.
2.3. GENERATION OF TRAINING DATA
Training data was generated using the engineering and alarm data sets. This training data corresponds to all recorded occurrences of gas loss at monitored stores over the period examined: 240 patterns in all. Our classifier, in order to be correctly trained, also needs a set of training patterns corresponding to sites which are operating normally (or have non-gas-loss-related problems).
This data was generated in a similar way, using the alarm
data. Identically structured vectors of n samples were
created for randomly selected sites. These vectors end at
randomly selected dates and times. The dates and times
used for these training patterns were generated according to
two important constraints:
- The date/time selected must be within the period chosen for examination
- The corresponding alarm totals vector must not overlap any recorded gas loss event at the site
Using this scheme we generated an additional 256
training data patterns which we expect not to correspond to
gas leaks in stores. This gives us a total of 496 training data
patterns.
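The selection scheme can be sketched as follows; the site identifiers and the site-to-gas-loss-times index are illustrative assumptions about the underlying database:

    import random
    from datetime import datetime, timedelta

    PERIOD_START = datetime(2000, 1, 1)
    PERIOD_END = datetime(2002, 4, 5)

    def random_negative_sample(sites, gas_loss_times, n=3, t=8):
        """Pick a (site, window_end) pair satisfying both constraints."""
        window = timedelta(hours=n * t)
        span = (PERIOD_END - PERIOD_START - window).total_seconds()
        while True:
            site = random.choice(sites)
            # Constraint 1: the whole window lies within the chosen period.
            end = PERIOD_START + window + timedelta(seconds=random.uniform(0, span))
            start = end - window
            # Constraint 2: the window must not overlap a recorded gas loss.
            if not any(start <= loss < end for loss in gas_loss_times.get(site, [])):
                return site, end  # alarm totals are then built as in section 2.1

    print(random_negative_sample(["store_42"], {"store_42": [datetime(2001, 6, 1)]}))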
The output of the neural network is a single Boolean
value where:
    1 = Gas loss
    0 = No gas loss
Thus we have 240 training patterns for which we expect
an output of 1 and 256 patterns for which we expect an
output of 0.
To make the input vectors more neural-network friendly, we multiplied each vector by a scaling factor of 0.1, so an input value corresponding to 10 alarms is presented to our classifier as 1.
Due to the extremely small quantity of training data available to us, it was necessary to generate test and training data partitions using the 0.632 bootstrap method [3]. We sample an n-length dataset n times at random with replacement to create the training data set, and use the remaining, unselected patterns for test data. The training set therefore contains, on average, 63.2% of the distinct patterns in the original data set. Accuracies on training data are quite optimistic while, conversely, test data accuracies are rather pessimistic. To counteract this we calculate the overall error value as follows:

    E = (0.368 * Etest) + (0.632 * Etrain)
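A sketch of the partitioning and the weighted combination follows; the error values in the final line are simply those implied by the mean accuracies in table 3:

    import random

    def bootstrap_split(patterns):
        """0.632 bootstrap: sample n patterns with replacement to form the
        training set; the never-selected patterns form the test set."""
        n = len(patterns)
        chosen = [random.randrange(n) for _ in range(n)]
        chosen_set = set(chosen)
        train = [patterns[i] for i in chosen]
        test = [patterns[i] for i in range(n) if i not in chosen_set]
        return train, test

    def overall_error(e_train, e_test):
        # Weighted combination balancing the optimism of the training
        # error against the pessimism of the test error.
        return 0.368 * e_test + 0.632 * e_train

    train, test = bootstrap_split(list(range(496)))
    print(len(train), len(test))        # the test set holds ~36.8% of patterns
    print(overall_error(0.176, 0.384))  # -> 0.2525, i.e. 74.75% accuracy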
To compensate for any atypical results that may be
generated due to a particular partitioning, we generated 5
differently partitioned sets of training and test data from our
original data set. Training runs are then performed on these
data sets for a specified number of generations and the mean
error rate calculated (see section 4).
3. EVOLVING NEURAL NETWORKS

The system used to evolve neural networks is quite similar to EP-Net [4][5]. Unlike EP-Net, the system developed here does not evolve network structure, but it does use genetic operators to find a set of weights for the network.

3.1. NETWORK REPRESENTATION

The neural network representation used is based around two data structures: the connection matrix and the weight vector. These two simple structures are capable of representing networks with high levels of complexity (including recurrent and partially recurrent networks, though these are not investigated here). The simple network shown in figure 1 is used as an example.

[Figure 1: A simple neural network model with 6 neurons (N0 to N5) and 8 weights (W0 to W7); neuron types: input, sigmoid/hidden, output and bias]
We use four different types of neuron in our model. Input neurons are a simple placeholder for values input to the network. Output neurons are a similar placeholder and have no activation function; an output neuron can receive only one incoming connection, the weight value of which is set to 1. Bias neurons have a constant output value of 1 and cannot accept incoming connections; their outputs are used to bias neurons of the other types, as in [6]. Finally, sigmoid (or hidden) neurons are standard neurons with a sigmoid activation function [7].
Table 1 shows the connection matrix for the network in figure 1. The connection matrix, for a network with n neurons, is an n x n, sparsely populated matrix representing the connections between neurons. Each element is either a rogue "no connection" value or a non-negative integer which indexes the corresponding real-valued weight in the weight vector. An element Mij (column i, row j) represents the connection from neuron i to neuron j; if such a connection exists, Mij stores the index of the weight vector element which holds the weight value for this connection.
         0   1   2   3   4   5
    0    x   x   x   x   x   x
    1    x   x   x   x   x   x
    2    x   x   x   x   x   x
    3    0   1   2   x   x   x
    4    3   4   5   6   x   x
    5    x   x   x   x   7   x

Table 1: The connection matrix contains indices into the weight vector ("x" denotes no connection)
Table 2 shows the weight vector for our example network.
The weight vector is a simple list of double precision
floating point values. These values are the weight values of
the connections between neurons in our network.
    Index    0    1    2    3    4    5    6    7
    Weight   W0   W1   W2   W3   W4   W5   W6   W7

Table 2: The weight vector stores the real-valued weights connecting neurons
Networks used in the experiments here all have one layer of hidden nodes, one layer of inputs (one input for each of our training data elements) and a single output, which is our gas loss prediction.
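To make the representation concrete, the following sketch evaluates the figure 1 network directly from the connection matrix of table 1. The weight values are invented, and the assignment of neuron types to N0-N5 is our own reading of the figure:

    import math

    NO_CONN = -1  # rogue "no connection" value

    # M[j][i] holds the weight-vector index of the connection from
    # neuron i to neuron j, exactly as in table 1.
    M = [
        [NO_CONN] * 6,                                    # N0: input
        [NO_CONN] * 6,                                    # N1: input
        [NO_CONN] * 6,                                    # N2: bias
        [0, 1, 2, NO_CONN, NO_CONN, NO_CONN],             # N3: hidden
        [3, 4, 5, 6, NO_CONN, NO_CONN],                   # N4: hidden
        [NO_CONN, NO_CONN, NO_CONN, NO_CONN, 7, NO_CONN], # N5: output
    ]
    W = [0.5, -1.2, 0.3, 2.0, -0.7, 1.1, 0.9, 1.0]  # W7 fixed at 1 (output)

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def forward(x0, x1):
        act = [x0, x1, 1.0, 0.0, 0.0, 0.0]  # the bias neuron always outputs 1
        for j in (3, 4):  # hidden neurons in feed-forward order (N4 uses N3)
            total = sum(act[i] * W[M[j][i]]
                        for i in range(6) if M[j][i] != NO_CONN)
            act[j] = sigmoid(total)
        act[5] = act[4] * W[M[5][4]]  # output: one connection, no activation
        return act[5]

    print(forward(0.3, 1.0))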
3.2. EVOLUTIONARY OPERATORS
We are not attempting to alter the structure of the neural network in any way during training, so the connection matrix is largely unimportant here. However, because of the way in which the connection matrix and weight vector interact, the weights corresponding to the inputs of a given neuron are found close together (adjacent, in fact) in the weight vector. This makes the weight vector an ideal candidate for our gene.
The proximity of symbols associated with similar
functions within our gene encourages the preservation of
schemata (i.e. individual behavioural traits) as genetic
operators are applied. This has been shown to heighten the
effectiveness of the evolutionary algorithm as a whole [8].
Our evolutionary training scheme is a simple hybrid of
standard genetic operations and back propagation learning.
We start with a population of randomly initialised
individuals. These are sorted into order of fitness (based on
the sum squared error on our training data set [9]). Each
generation we kill some of the least fit individuals. We then
breed replacements using crossover and a possible mutation
and insert them into the population.
Two parents are selected for random multi-point
crossover [10], based on a tournament between a pair of
individuals chosen at random from the surviving population.
Crossover is performed using a number of points between 0
and m, where m is a predefined maximum dependent on the
size of the networks used.
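A sketch of this operator acting on two weight-vector genes:

    import random

    def multi_point_crossover(parent_a, parent_b, max_points):
        """Random multi-point crossover: between 0 and max_points cut
        points are chosen, and the child copies runs of weights from
        alternating parents, switching source at each cut."""
        g = len(parent_a)
        cuts = sorted(random.sample(range(1, g), random.randint(0, max_points)))
        parents, which, start, child = (parent_a, parent_b), 0, 0, []
        for cut in cuts + [g]:
            child.extend(parents[which][start:cut])
            which, start = 1 - which, cut
        return child

    a, b = [0.0] * 8, [1.0] * 8
    print(multi_point_crossover(a, b, max_points=3))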
Mutation is performed numerically on weight values,
rather than on their binary representations. This simplifies
the mutation process by removing problems experienced
when alterations are made to individual bits within the
sign/mantissa/exponent representation of real numbers. A
mutation value, selected using a zero centred probability
distribution with exponential decay, is added to the weight
value chosen for mutation. The number of weights mutated
varies between 1 and g, where g is the number of real
numbers in the gene.
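The mutation operator might be sketched as follows; the scale of the decaying distribution is our own assumption:

    import random

    def mutate(gene, scale=0.1):
        """Numeric mutation: between 1 and g weights each receive an
        additive value drawn from a zero-centred distribution with
        exponentially decaying tails (built from random.expovariate)."""
        g = len(gene)
        child = list(gene)
        for _ in range(random.randint(1, g)):
            i = random.randrange(g)
            delta = random.expovariate(1.0 / scale)  # magnitude, exponential decay
            child[i] += delta if random.random() < 0.5 else -delta
        return child

    print(mutate([0.5, -1.2, 0.3, 2.0, -0.7, 1.1, 0.9, 1.0]))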
Finally, the fittest 10% of our population are allowed a
number of epochs of standard back propagation. This has
been found to help with local optimisation within the
problem search space.
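Putting the pieces together, one generation of the hybrid scheme might look like the sketch below; the kill fraction and the stand-in operators in the demonstration are our own illustrative choices:

    import random

    def evolve_generation(population, fitness, breed, backprop, kill_fraction=0.2):
        """One generation of the hybrid EA/BP scheme: fitness returns the
        sum squared error on the training set, breed applies crossover and
        mutation, and backprop runs a few epochs of back propagation."""
        population.sort(key=fitness)              # fittest (lowest error) first
        keep = int(len(population) * (1 - kill_fraction))
        survivors = population[:keep]             # kill the least fit
        while len(survivors) < len(population):   # breed replacements
            parents = []
            for _ in range(2):                    # each parent wins a 2-way tournament
                a, b = random.sample(population[:keep], 2)
                parents.append(min(a, b, key=fitness))
            survivors.append(breed(parents[0], parents[1]))
        for net in survivors[:max(1, len(survivors) // 10)]:
            backprop(net)                         # local optimisation of the fittest 10%
        return survivors

    # Tiny demonstration with stand-in operators:
    pop = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(20)]
    error = lambda net: sum(w * w for w in net)
    breed = lambda p, q: [(x + y) / 2 for x, y in zip(p, q)]
    print(len(evolve_generation(pop, error, breed, backprop=lambda net: None)))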
4. EXPERIMENTS AND RESULTS
Three different network topologies were trained using each of the five differently partitioned training and test data sets, giving fifteen discrete results. We expected, before carrying out the experiments, that networks with fewer hidden nodes (25 in this case, see table 3) would perform better on training data but be less good at generalisation, with lower accuracies on test data. Conversely, we expected that larger networks (60 hidden nodes, see table 5) would be better at generalisation but give lower training data accuracies. These expectations were borne out, as shown below.
We also expected that networks with a medium number of hidden nodes (45, see table 4) would provide a "best of both worlds" solution, with acceptable training and test data accuracies. In the event, all three network topologies achieved almost identical bootstrapped accuracy levels, but the 45 hidden node networks had the worst overall performance: their failure either to generalise as well as the larger networks or to match the training accuracies of the smaller ones left them with a disappointingly low performance overall.
In all experiments we used networks with 21 inputs,
based on 3 alarm total categories for each of 7 periods of 24
hours.
    Set    Training Data    Test Data    Bootstrapped
           Accuracy         Accuracy     Accuracy
    0      83%              64%          76.01%
    1      77%              66%          72.95%
    2      85%              59%          75.43%
    3      83%              60%          74.54%
    4      84%              59%          74.80%
    M:     82.40%           61.60%       74.75%

Table 3: Accuracies for the five training/test data sets with no offset and 25 hidden nodes, where M is the mean value over the five runs
    Set    Training Data    Test Data    Bootstrapped
           Accuracy         Accuracy     Accuracy
    0      80%              64%          74.11%
    1      77%              67%          73.32%
    2      82%              62%          74.64%
    3      81%              65%          75.11%
    4      84%              54%          72.96%
    M:     80.80%           62.40%       74.03%

Table 4: Accuracies for the five training/test data sets with no offset and 45 hidden nodes, where M is the mean value over the five runs
    Set    Training Data    Test Data    Bootstrapped
           Accuracy         Accuracy     Accuracy
    0      80%              66%          74.85%
    1      77%              68%          73.69%
    2      82%              64%          75.38%
    3      81%              61%          73.64%
    4      82%              61%          74.27%
    M:     80.40%           64.00%       74.37%

Table 5: Accuracies for the five training/test data sets with no offset and 60 hidden nodes, where M is the mean value over the five runs
5. CONCLUSION
Prediction systems developed as a result of this work are to
be installed at JTL’s alarm monitoring centre, where they
will be used to alert trained staff to the possibility of gas
losses. Their role will be largely that of an early warning
system, advising staff that further attention may need to be
paid to systems at the store in question. Because we are expecting to have a "human in the loop" at all times, and because we cannot expect miracles from our rather sketchy training data set, lower accuracy levels can be permitted.
Before work began on these systems, the authors, along with staff at the monitoring centre, agreed upon a target accuracy of 75%. Although, strictly speaking, this was not achieved, it has been agreed by all involved that the results obtained are adequate for our purposes. Due to the small amount of data available to train the prediction systems, a high premium has been placed on the ability of the system to generalise. With this in mind, we have decided to implement our first test system in the real world using the larger (60 hidden node) networks.

Further work will be carried out in the coming months to increase the system's overall accuracy and ability to generalise. Work will also be carried out to highlight other faults and problem areas which may be suitable for prediction using the methods highlighted here.

ACKNOWLEDGEMENTS
We acknowledge the support of the Teaching Company
Directorate (via the DTI and EPSRC) and JTL Systems Ltd.
for funding this project. The authors also thank Evosolve
Ltd for partial financial support.
BIBLIOGRAPHY
[1] R. Gluckman, "Current Legislation Affecting Refrigeration", Proceedings, 9th Annual Conference of the Institute of Refrigeration.
[2] D. W. Taylor, D. W. Corne et al., "Predicting Alarms in Supermarket Refrigeration Systems Using Evolutionary Techniques", Proceedings, World Congress on Computational Intelligence (WCCI-2002).
[3] B. Efron, "Bootstrap Methods: Another Look at the Jackknife", Annals of Statistics 7:1-26.
[4] X. Yao, Y. Liu, "A New Evolutionary System for Evolving Neural Networks", IEEE Transactions on Neural Networks 8(3).
[5] X. Yao, Y. Liu, "A Population Based Learning Algorithm Which Learns Both Architectures and Weights of Neural Networks", Chinese Journal of Advanced Software Research 3(1).
[6] J. D. Knowles, D. W. Corne, "Evolving Neural Networks for Cancer Radiotherapy", in Practical Handbook of Genetic Algorithms: Applications, 2nd Edition, Chapman Hall/CRC Press, pp. 443-488.
[7] D. E. Rumelhart et al., "Learning Representations by Back Propagation of Errors", in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, Chapter 8, MIT Press.
[8] !!! Schemata and schema theory !!!
[9] !!! Sum Square Error !!!
[10] !!! Random Multi-point crossover !!!