REFRIGERANT LEAK PREDICTION IN SUPERMARKETS USING EVOLVED NEURAL NETWORKS

Dan W. Taylor, David W. Corne, University of Reading, UK
d.taylor@logicalgenetics.com, d.w.corne@reading.ac.uk

ABSTRACT

The loss of refrigerant gas from commercial refrigeration systems is a major maintenance cost for most supermarket chains. Gas leaks can also have a detrimental effect on the environment. Monitoring systems maintain a constant watch for faults such as this, but often fail to detect them until major damage has been caused. This paper describes a system which uses data received at a central alarm monitoring centre to predict the occurrence of gas leaks. Evolutionary algorithms are used to breed neural networks which achieve high accuracies given limited training data.

1. INTRODUCTION

In recent years, large supermarket chains in the UK have become increasingly aware of the refrigeration systems in their stores. This has happened for a number of reasons, most notably because refrigeration is one of the largest costs in setting up and running a store, and there are a number of ways in which the associated systems can be optimised to save money. Now, with the added pressures placed upon those who operate commercial refrigeration systems by environmental legislation, such as the Kyoto and Montreal protocols, and ever increasing energy costs, the optimisation of refrigeration systems is more important than ever. See [1] for a more detailed review.

Individual cabinets and cold-rooms within a typical UK supermarket are part of a complex system of interdependent items of machinery, electronic and computer control units and many hundreds of metres of pipe-work and cabling. Unlike the small, sealed refrigerators found in most of our homes, the refrigeration systems found in supermarkets are fed with refrigerant via a network of piping which runs under the floor of the store.
Refrigerant is allowed to evaporate within cabinets to absorb heat, and the resulting hot gas is then pumped to condensers outside the store. Large electrically powered compressors, situated away from the shop floor, are used to facilitate this. As might be expected, the presence of refrigerant gas in this large, complex mechanical system inevitably leads to the occasional leak. A larger supermarket will have around 100 individual cooled cases and the associated refrigeration system can hold around 800 kg of refrigerant. Refrigerant costs around £15 per kilogram and can have detrimental effects if leaked into the atmosphere. It is therefore imperative that leaks from refrigeration systems be minimised, from both financial and environmental points of view.

JTL Systems Ltd (www.jtl.co.uk) manufacture advanced electronic controllers which control and co-ordinate refrigeration systems in supermarkets. These systems, as well as controlling cabinet temperature, gather data on various parameters of the store-wide refrigeration system. This data is used to optimise the operation of machinery and schedule defrosts, whilst also being used to generate alarms. Alarms are essentially warnings of adverse conditions in equipment. Alarms are transmitted, via modem link, to a central monitoring centre, where trained operators watch for serious events and call the appropriate store staff or maintenance personnel to avert situations where stock may be endangered.

Gas losses have been highlighted by JTL and their customers (major supermarket chains) as a very important area in which to concentrate resources. There are essentially two types of gas leak:

Fast: Equivalent to a burst tyre on a car: a large crack or hole in piping or machinery causes gas to be lost quickly and the refrigeration system to be immediately impaired. Fast leaks can be detected immediately at the JTL monitoring centre and the appropriate action taken.
Slow: Equivalent to a slow puncture: gas slowly leaks from the system, causing a seemingly unrelated series of alarms and alterations in the performance of the system. This type of leak is more frequent and can be much harder to detect. JTL's customers tend to lose more money through slow leaks than through fast leaks.

This paper details work undertaken to develop systems which use alarm data, gathered from refrigeration systems in supermarkets, to predict/detect the occurrence of slow gas leaks. There is a clear commercial requirement for such a system, as it will allow pre-emptive maintenance to be scheduled, thus minimising the amount of gas allowed to leak from the system. The prediction/classification technique used in this paper is an extension of that presented in [2]. Neural networks are trained using a combination of evolutionary algorithms and traditional back propagation learning. This training scheme has been shown to be marginally more effective than evolved rule-sets and back propagation used in isolation.

A description of the data available for prediction systems and the various pre-processing operations performed upon it can be found in section 2. Section 3 goes on to describe the EA/BP hybrid training system in more detail. In section 4 we outline the various experiments performed and their results, and finally a concluding discussion and some suggested areas for further exploration can be found in section 5.

2. ENGINEERING AND ALARM DATA

There are two important data sets which must be combined in order to create training data suitable for training classifiers to solve the task in hand. These are outlined in this section, along with details of how they were combined and the pre-processing operations used to create valid training data.

2.1. ALARM DATA

As previously mentioned, when adverse conditions are detected by on-site monitoring and control hardware they are brought to the attention of operators at JTL's dedicated monitoring centre. The Network Controller, which is the principal component of the in-store refrigeration control system, uses its in-built modem to send a small package of data to the monitoring centre via the telecommunications network. This data package is known as an Alarm and contains useful information, including:

The id number of the unit which raised the alarm
The nature of the alarm conditions and any related information, such as temperature or pressure readings
The time at which the alarm was first raised

Information from alarms is copied to a large relational database called "Site Central". Alarm data has been archived here for almost three years and well over two million individual alarm records are stored. These alarms correspond to 40,000 control/monitoring units at 400 stores monitored by JTL for its main customer.

A few human experts can diagnose problems with refrigeration systems using this alarm data. Some types of fault, gas loss in particular, have a well defined, but often quite subtle, pattern of events that can be detected by those in the know. Due to training and resource issues at the monitoring centre, staff have neither the time nor the expertise required to watch for these patterns.

As the receipt of an alarm is a discrete event, without any duration, it was necessary to present our prediction system with a series of categorised alarm totals. This gives us a list of small valued integers. We create a vector of n samples, each of length t, covering a period of n * t hours. For each sample period we create a three-tuple of values corresponding to the sum totals of alarms occurring within that sample period, in each of three categories:

Plant alarms (compressors/machinery)
Coldroom alarms (large cooled storerooms)
Cabinet alarms (refrigerated cabinets on the shop floor)

Thus for a vector where n=3, t=8 we have nine values, corresponding to plant, coldroom and cabinet alarm totals for each of our three sample periods, spanning 24 hours altogether.

2.2. ENGINEERING DATA

In order to train prediction systems to recognise alarm patterns associated with a gas loss it is important to have a large set of training data, containing a number of examples of alarm patterns corresponding to previous gas loss events. The record of gas leaks for the period between 1st Jan 2000 and 5th April 2002 was obtained from JTL's main customer's maintenance logging system. This data records the date on which an engineer attended a site and what action was taken during their visit. SQL was written to pinpoint and record the dates of 240 engineering visits corresponding to gas losses over the two year period.

Sadly the engineering logs do not record an exact time for the gas loss event, only the date on which the engineer visited. This means that choosing an input vector for our classifier which immediately precedes the gas loss event is not possible. As a compromise, the input vector's last sample ends at 00:00 on the day the engineer visited, so the gas loss could have occurred between one second and twenty-four hours after the end of our input vector. Our inability to select an input vector which immediately precedes a gas loss event is compounded by the fact that slow gas leaks take place over a period which varies in length from hours to days. Our system must therefore behave more like a classifier than a prediction system; deciding whether a gas loss is currently occurring or not, rather than predicting that a gas loss will occur at a given time. It is also worth noting that when generating training data patterns we were unable to distinguish between fast and slow gas losses, because this data is not recorded in the engineering logs.

2.3. GENERATION OF TRAINING DATA
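As a concrete illustration of the alarm-totals input described in section 2.1, the following sketch builds one such vector. The (timestamp, category) record format and the function names are illustrative assumptions, not the actual Site Central schema.

```python
from datetime import datetime, timedelta

# Assumed alarm record format: (timestamp, category), where category
# is one of the three totals categories used in the paper.
CATEGORIES = ("plant", "coldroom", "cabinet")

def alarm_vector(alarms, end, n=3, t=8):
    """Build an n*3 input vector of per-category alarm totals.

    Covers the n*t hours ending at `end`: for each of n sample
    periods of t hours, count the alarms in each of the 3 categories.
    """
    start = end - timedelta(hours=n * t)
    vector = []
    for i in range(n):
        lo = start + timedelta(hours=i * t)
        hi = lo + timedelta(hours=t)
        counts = {c: 0 for c in CATEGORIES}
        for ts, cat in alarms:
            if lo <= ts < hi:
                counts[cat] += 1
        vector.extend(counts[c] for c in CATEGORIES)
    return vector

# Example: one plant alarm 20 hours ago, two cabinet alarms recently
end = datetime(2002, 4, 5)
alarms = [(end - timedelta(hours=2), "cabinet"),
          (end - timedelta(hours=3), "cabinet"),
          (end - timedelta(hours=20), "plant")]
print(alarm_vector(alarms, end))  # -> [1, 0, 0, 0, 0, 0, 0, 0, 2]
```

With n=7 and t=24 the same routine yields the 21-element vectors used for the experiments in section 4.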
Training data was generated using the engineering and alarm data sets. This training data corresponds to all recorded occurrences of gas loss at monitored stores for a two year period: 240 patterns in all. Our classifier, in order to be correctly trained, also needs a set of training patterns corresponding to sites which are operating normally (or have non-gas-loss related problems). This data was generated in a similar way, using the alarm data. Identically structured vectors of n samples were created for randomly selected sites. These vectors end at randomly selected dates and times, generated according to two important constraints:

The date/time selected must be within the period chosen for examination
The corresponding alarm totals vector must not overlap any recorded gas loss event at the site

Using this scheme we generated an additional 256 training data patterns which we expect not to correspond to gas leaks in stores. This gives us a total of 496 training data patterns.

The output of the neural network is a single Boolean value where:

1 Gas loss
0 No gas loss

Thus we have 240 training patterns for which we expect an output of 1 and 256 patterns for which we expect an output of 0. To make the input vectors more neural network friendly we multiplied each vector by a scaling factor of 0.1, so an input value corresponding to 10 alarms is presented to our classifier as 1.

Due to the extremely small quantity of training data available to us it was necessary to generate test and training data partitions using the 0.632 bootstrap method [3]. We sample an n length dataset n times at random with replacement to create the training data set and use the remaining, unselected patterns for test data. This gives us a training data set which is, on average, 63.2% the size of the original data set. Accuracies on training data are quite optimistic while, conversely, test data accuracies are rather pessimistic. To counteract this we calculate the overall error value as:

E = (0.368 * Etest) + (0.632 * Etrain)

To compensate for any atypical results that may be generated due to a particular partitioning, we generated 5 differently partitioned sets of training and test data from our original data set. Training runs are then performed on these data sets for a specified number of generations and the mean error rate calculated (see section 4).

3. EVOLVING NEURAL NETWORKS

The system used to evolve neural networks is quite similar to EP-Net [4] [5]. Unlike EP-Net, the system developed does not evolve network structure, but does allow genetic operators to be used to find a set of weights for the network.

3.1. NETWORK REPRESENTATION

The neural network representation used is based around two data structures: the connection matrix and the weight vector. These two simple structures are capable of representing networks with high levels of complexity (including recurrent and partially recurrent networks, though these are not investigated here). The simple network shown in figure 1 is used as an example.

Figure 1: A simple neural network model with 6 neurons (N0 to N5) and 8 weights (W0 to W7)

We use four different types of neuron in our model. Input neurons are a simple placeholder for values to be inputted to the network. Output neurons are a similar placeholder and have no activation function; an output neuron can receive only one incoming connection, the weight value of which is set to 1. Bias neurons have a constant output value of 1 and cannot accept incoming connections; outputs from these are used to bias neurons of other types, as in [6]. Finally, sigmoid (or hidden) neurons are standard neurons with a sigmoid activation function [7]. Table 1 shows the connection matrix for the network in figure 1.
The connection matrix, for a network with n neurons, is an n x n, sparsely populated matrix representing the connections between neurons. Elements in the connection matrix can be either a rogue "no connection" value, or a positive integer value which indexes the corresponding, real valued weight in the weight vector. An element Mij (column i, row j) represents the connection from neuron i to neuron j. If such a connection exists then Mij will store an index to the weight vector element which holds the weight value for this connection.

      0  1  2  3  4  5
  0   x  x  x  x  x  x
  1   x  x  x  x  x  x
  2   x  x  x  x  x  x
  3   0  1  2  x  x  x
  4   3  4  5  6  x  x
  5   x  x  x  x  7  x

Table 1: The connection matrix contains indices into the weight vector

Table 2 shows the weight vector for our example network. The weight vector is a simple list of double precision floating point values. These values are the weight values of the connections between neurons in our network.

   0   1   2   3   4   5   6   7
  W0  W1  W2  W3  W4  W5  W6  W7

Table 2: The weight vector stores the real valued weights connecting neurons

Networks used in the experiments here all have one layer of hidden nodes, one layer of inputs (one for each of our training data elements) and a single output, which is our gas loss prediction.

3.2. EVOLUTIONARY OPERATORS

We are not attempting to alter the structure of the neural network in any way during training, so the connection matrix is largely unimportant. However, because of the way in which the connection matrix and weight vector interact, we find that weights corresponding to the inputs of a given neuron can be found close together (adjacent, in fact) in the weight vector. This makes the weight vector an ideal candidate for our gene. The proximity of symbols associated with similar functions within our gene encourages the preservation of schemata (i.e. individual behavioural traits) as genetic operators are applied. This has been shown to heighten the effectiveness of the evolutionary algorithm as a whole [8].
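A minimal sketch of a forward pass over this representation follows. The NO_CONN sentinel, the neuron ordering (inputs, then hidden, then outputs) and the function name are assumptions for illustration; recurrent connections are not handled.

```python
import math

NO_CONN = -1  # assumed rogue "no connection" value

def forward(conn, weights, inputs, n_inputs, n_outputs, bias_ids=()):
    """Evaluate a feed-forward network stored as a connection matrix
    plus weight vector. conn[j][i] indexes `weights` for the
    connection neuron i -> neuron j, or holds NO_CONN."""
    n = len(conn)
    out = [0.0] * n
    for i in range(n_inputs):          # input neurons: placeholders
        out[i] = inputs[i]
    for b in bias_ids:                 # bias neurons: constant output 1
        out[b] = 1.0
    for j in range(n_inputs, n):
        if j in bias_ids:
            continue
        total = sum(weights[conn[j][i]] * out[i]
                    for i in range(n) if conn[j][i] != NO_CONN)
        if j >= n - n_outputs:
            out[j] = total                            # output: no activation
        else:
            out[j] = 1.0 / (1.0 + math.exp(-total))  # sigmoid/hidden
    return out[n - n_outputs:]

# The network of figure 1 / table 1: N0, N1 inputs, N2 bias,
# N3, N4 hidden, N5 output.
x = NO_CONN
conn = [[x, x, x, x, x, x],
        [x, x, x, x, x, x],
        [x, x, x, x, x, x],
        [0, 1, 2, x, x, x],
        [3, 4, 5, 6, x, x],
        [x, x, x, x, 7, x]]
weights = [0.0] * 7 + [2.0]
print(forward(conn, weights, [0.3, -0.2], 2, 1, bias_ids=(2,)))  # -> [1.0]
```

With all hidden-layer weights at zero, both sigmoid neurons output 0.5, so the single output is simply W7 * 0.5.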
Our evolutionary training scheme is a simple hybrid of standard genetic operations and back propagation learning. We start with a population of randomly initialised individuals. These are sorted into order of fitness (based on the sum squared error on our training data set [9]). Each generation we kill some of the least fit individuals, then breed replacements using crossover and a possible mutation and insert them into the population. Two parents are selected for random multi-point crossover [10], based on a tournament between a pair of individuals chosen at random from the surviving population. Crossover is performed using a number of points between 0 and m, where m is a predefined maximum dependent on the size of the networks used.

Mutation is performed numerically on weight values, rather than on their binary representations. This simplifies the mutation process by removing the problems experienced when alterations are made to individual bits within the sign/mantissa/exponent representation of real numbers. A mutation value, selected using a zero centred probability distribution with exponential decay, is added to the weight value chosen for mutation. The number of weights mutated varies between 1 and g, where g is the number of real numbers in the gene.

Finally, the fittest 10% of our population are allowed a number of epochs of standard back propagation. This has been found to help with local optimisation within the problem search space.

4. EXPERIMENTS AND RESULTS

Three different network topologies were trained using each of the five differently partitioned training and test data sets. This gives us fifteen discrete results. We expected, before carrying out the experiments, that networks with lower numbers of hidden nodes (25 in this case, see table 3) would perform better on training data but be less good at generalisation, with lower accuracies on test data.
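The crossover and mutation operators described in section 3.2 can be sketched as follows. The parameter names, defaults, and the Laplace-style sampling (a random sign times an exponential draw, giving a zero-centred distribution with exponential decay) are assumptions, not the authors' exact implementation.

```python
import random

def mutate(gene, n_mut=1, scale=0.5, rng=random):
    """Numeric mutation: add a zero-centred perturbation with
    exponential decay (random sign times an Exp(1/scale) draw)
    to n_mut randomly chosen weights."""
    child = list(gene)
    for _ in range(n_mut):
        i = rng.randrange(len(child))
        child[i] += rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale)
    return child

def multipoint_crossover(a, b, max_points, rng=random):
    """Random multi-point crossover: pick between 0 and max_points
    cut points and alternate segments from the two parent genes."""
    k = rng.randint(0, max_points)
    points = sorted(rng.sample(range(1, len(a)), k))
    child, src, prev = [], 0, 0
    for p in points + [len(a)]:
        child.extend((a if src == 0 else b)[prev:p])
        src ^= 1
        prev = p
    return child

rng = random.Random(42)
a, b = [0.0] * 8, [1.0] * 8
child = multipoint_crossover(a, b, max_points=3, rng=rng)
print(len(child))  # 8: the child gene has the same length as the parents
```

Because the cut points partition the gene, the child always has the same length as its parents, and adjacent weights feeding the same neuron tend to travel together, preserving the schemata discussed in section 3.2.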
Conversely, we expected that larger networks (60 hidden nodes, see table 5) would be better at generalisation but with lower training data accuracies. These hypotheses were borne out, as shown below. We also expected that networks with a medium number of hidden nodes (45, see table 4) would provide us with a "best of both worlds" solution, having acceptable training and test data accuracies. In fact, all three of the network topologies had almost identical bootstrapped accuracy levels, though the 45 hidden node networks had the worst overall performance: the inability of the medium sized networks to generalise and give good training data accuracies left them with a disappointingly low performance overall. In all experiments we used networks with 21 inputs, based on 3 alarm total categories for each of 7 periods of 24 hours.

Set   Training Data Accuracy   Test Data Accuracy   Bootstrapped Accuracy
0     83%                      64%                  76.01%
1     77%                      66%                  72.95%
2     85%                      59%                  75.43%
3     83%                      60%                  74.54%
4     84%                      59%                  74.80%
M:    82.40%                   61.60%               74.75%

Table 3: Accuracies for the five training/test data sets with no offset and 25 hidden nodes, where M is the mean value over the five runs

Set   Training Data Accuracy   Test Data Accuracy   Bootstrapped Accuracy
0     80%                      64%                  74.11%
1     77%                      67%                  73.32%
2     82%                      62%                  74.64%
3     81%                      65%                  75.11%
4     84%                      54%                  72.96%
M:    80.80%                   62.40%               74.03%

Table 4: Accuracies for the five training/test data sets with no offset and 45 hidden nodes, where M is the mean value over the five runs

Set   Training Data Accuracy   Test Data Accuracy   Bootstrapped Accuracy
0     80%                      66%                  74.85%
1     77%                      68%                  73.69%
2     82%                      64%                  75.38%
3     81%                      61%                  73.64%
4     82%                      61%                  74.27%
M:    80.40%                   64.00%               74.37%

Table 5: Accuracies for the five training/test data sets with no offset and 60 hidden nodes, where M is the mean value over the five runs

5. CONCLUSION

Prediction systems developed as a result of this work are to be installed at JTL's alarm monitoring centre, where they will be used to alert trained staff to the possibility of gas losses. Their role will be largely that of an early warning system, advising staff that further attention may need to be paid to systems at the store in question. Because we are expecting to have a "human in the loop" at all times, and because we cannot expect miracles from our rather sketchy training data set, lower accuracy levels can be permitted. Before work began on these systems the authors, along with staff at the monitoring centre, agreed upon a target accuracy of 75%. Although, strictly speaking, this was not achieved, it has been agreed by all involved that the results obtained are adequate for our purposes.

Due to the small amount of data available to train the prediction systems, a high premium has been placed on the ability of the system to generalise. With this in mind we have decided to implement our first test system in the real world using the larger (60 hidden node) networks. Further work will be carried out in the coming months to increase the system's overall accuracy and ability to generalise. Work will also be carried out to highlight other faults and problem areas which may be suitable for prediction using the methods highlighted here.

ACKNOWLEDGEMENTS

We acknowledge the support of the Teaching Company Directorate (via the DTI and EPSRC) and JTL Systems Ltd for funding this project. The authors also thank Evosolve Ltd for partial financial support.

BIBLIOGRAPHY

[1] R Gluckman, "Current Legislation Affecting Refrigeration", Proceedings, 9th Annual Conference of the Institute of Refrigeration
[2] D W Taylor, D W Corne et al, "Predicting Alarms in Supermarket Refrigeration Systems Using Evolutionary Techniques", Proceedings, World Congress on Computational Intelligence (WCCI-2002)
[3] B Efron, "Bootstrap Methods: Another Look at the Jackknife", Annals of Statistics 7:1-26
[4] X Yao, Y Liu, "A New Evolutionary System For Evolving Neural Networks", IEEE Transactions on Neural Networks 8(3)
[5] X Yao, Y Liu, "A Population Based Learning Algorithm Which Learns Both Architectures and Weights of Neural Networks", Chinese Journal of Advanced Software Research 3(1)
[6] J D Knowles, D W Corne, "Evolving Neural Networks for Cancer Radiotherapy", Practical Handbook of Genetic Algorithms: Applications, 2nd Edition, Chapman Hall/CRC Press, pp 443-488
[7] D E Rumelhart et al, "Learning Representations By Back Propagation of Errors", Parallel Distributed Processing: Exploration of the Microstructure of Cognition, Vol 1, Chapter 8, MIT Press
[8] !!! Schemata and schema theory !!!
[9] !!! Sum Square Error !!!
[10] !!! Random Multi-point crossover !!!