MODELLING FAILURE PATTERN OF A MINING TRUCK WITH A
DECISION TREE ALGORITHM
HUI HU and TAD S. GOLOSINSKI
Mining Engineering, University of Missouri-Rolla
Rolla, Missouri, 65409-0450, USA
This paper reports on development of the failure pattern recognition model for a mining truck. The
model inputs, VIMS data collected in a mine, were processed using one of the Decision Tree
algorithms, a module of the Intelligent Miner For Data software of IBM. The results indicate that the
Decision Tree allows for identification and quantification of relations between the various types of
VIMS data. As such it can be used for development of a model that would allow prognosticating
truck condition and performance. Full development of this capacity requires further research.
1. Introduction
Modern mining equipment is fitted with numerous sensors that monitor its condition and
performance. Data collected by these sensors is used to alert the operator to the existence of
abnormal operating conditions and to perform emergency shutdown if the pre-set values
of the monitoring parameters are exceeded. This data is also used for post-failure
diagnostics and for reporting and analysis of equipment performance.
It is believed that availability of this voluminous data, together with availability of
sophisticated data processing methods and tools, may allow for extraction of additional
information contained in the data. One method that may permit this is data mining 1,2.
The research presented in this paper investigates the use of data collected from
various sensors installed on a mining truck for construction of a truck model that may
allow for reliable projection of both the truck performance and its condition into the
future. The subject of the research was data collected by a variety of sensors installed on an
off-highway mining truck that together constitute the Caterpillar VIMS (Vital Information
Management System) 3. The data mining tool was the IBM Intelligent Miner for
Data 4.
2. Data Description
The data used in this research consists of 81,911 snapshot (event recorder) and datalogger
records, each containing values of 70 truck parameters measured over a period of time.
The data was collected from a Caterpillar 789C truck during its normal operation in a
surface mine.
The snapshot stores a segment of truck history that contains values of all 70
monitored parameters recorded over a period of six minutes, with each parameter value
recorded once per second. The snapshot recording is triggered by one of a set of
predefined events, usually the occurrence of an abnormal situation in which a specific
parameter reaches a critical value. A snapshot record describes truck conditions from five
minutes before the event to one minute after the event 3. In this paper every snapshot
record is called an “event” for simplicity.
Unlike the snapshot, the datalogger records values of all truck parameters that are
monitored by VIMS over varying periods of time, also at one-second intervals 4. The
recording is started and stopped manually, with individual records covering periods of
up to 30 minutes of truck operation. Datalogger records do not have to be associated with
any events.
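The snapshot timing described above can be illustrated with a minimal Python sketch. The five-minute and one-minute spans and the one-second sampling come from the text; the function names and the use of plain second counts as timestamps are assumptions made only for this illustration.

```python
# Illustrative sketch of the snapshot window; names are hypothetical.
SAMPLE_PERIOD_S = 1      # one reading per second (from the text)
PRE_EVENT_S = 5 * 60     # five minutes before the triggering event
POST_EVENT_S = 1 * 60    # one minute after the triggering event


def snapshot_window(trigger_time_s):
    """Return (start, end) of the six-minute snapshot, in seconds."""
    return trigger_time_s - PRE_EVENT_S, trigger_time_s + POST_EVENT_S


def samples_per_parameter():
    """Number of one-second readings stored for each parameter."""
    return (PRE_EVENT_S + POST_EVENT_S) // SAMPLE_PERIOD_S


start, end = snapshot_window(1000)
```

Under these assumptions each snapshot holds 360 readings for every one of the 70 monitored parameters.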
Of the 70 truck parameters used in this research, values of 26 were recorded as
categorical and the remaining 44 as numeric values. The examples of basic statistical
description of both the categorical and numerical parameter values are presented in Table
1 and Table 2.
Table 1. Example of categorical parameter values
Parameter Name      Modal Value    Modal Frequency (%)
ACTUAL_GEAR_352     Neutral        41.55
AFTRCLR_LVL_137     OK             98.95
BODY_LVR_727        NotMoving      95.4
BODY_POS_726        Down           93.7
Table 2. Example of numerical parameter values
Name                Minimum Value    Maximum Value    Mean Value    Standard Deviation
AFTRCLR_TEMP_110    0                95               41.851        12.8003
AMB_AIR_TEMP_791    0                38.5             21.9324       7.01852
ATMOS_PRES_790      0                93               89.4499       9.19852
BOOST_PRES_105      0                164              31.098        50.1644
3. Experimental Design
3.1. Objective of experiments
Experiments were designed to evaluate and quantify the pattern of changes in parameter
values as associated with various events. As the sensors installed on the truck activate the
snapshot recorder when the predefined limit of a parameter is reached, the objective was
to identify any patterns in parameter values that may allow for early failure recognition.
These patterns were then used for prediction of future events by building a decision tree
classification model of a truck. The model was to predict an occurrence of a selected
event based on the pattern of changes in values of other parameters.
The events recorded most frequently in the available VIMS data set were selected as
the main targets of analysis. These were Engine Speed, event no. 767, and Engine
Coolant Flow, event no. 949. In addition, records collected during normal truck operation
and events classified by the truck system as “Other” were selected for analysis as well. All
these classes are identified in Table 3, which also shows the percentage of data that was
associated with each event class.
Table 3. Event class description
Class Number    Class Description    Size %
949             Engine Cool Flow     14.49
767             Engine Speed         13.17
0               Normal Operation     18.10
Other           Other Event          54.24
The Engine Speed is defined as the actual rotational speed of the crankshaft. For the
modeled truck this event is activated when the engine speed reaches 2250 rpm and
deactivated when the speed drops to 1900 rpm. The Engine Cool Flow is defined as the
coolant flow status in the engine cooling system. During normal operation, the coolant
flow switch is closed. The switch opens when coolant flow is less than specified; its
opening triggers the event.
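The activation and deactivation thresholds of the Engine Speed event form a simple hysteresis rule, which can be sketched in Python as follows. The 2250 rpm and 1900 rpm values are taken from the text; the function name, the inclusive comparisons, and the sample readings are assumptions for illustration and do not describe the actual VIMS firmware logic.

```python
def engine_speed_event_flags(rpm_series, on_rpm=2250, off_rpm=1900):
    """Flag each reading as inside (True) or outside (False) an event.

    The event switches on when the speed reaches on_rpm and stays on
    until the speed drops to off_rpm (hysteresis).
    """
    active = False
    flags = []
    for rpm in rpm_series:
        if not active and rpm >= on_rpm:
            active = True       # threshold reached: event activated
        elif active and rpm <= off_rpm:
            active = False      # speed dropped back: event deactivated
        flags.append(active)
    return flags


# Hypothetical one-second engine speed readings (rpm).
readings = [1800, 2250, 2100, 1950, 1900, 2000]
flags = engine_speed_event_flags(readings)
```

Because the two thresholds differ, readings between 1900 and 2250 rpm keep whatever state preceded them, so the event does not toggle on small speed fluctuations.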
3.2. Data mining tools
The IBM Intelligent Miner software package was used as the data mining tool. The basic
algorithm used was SPRINT, a modified CART (Classification and Regression Tree) algorithm. It
was chosen in preference to the neural network classification algorithm because the Decision
Tree approach is easier for engineers to interpret and understand, which facilitates
analysis of the truck failure pattern 5.
The workings of SPRINT are similar to those of the most popular decision tree
algorithms, such as C4.5 (see Quinlan 6); the major distinction is that SPRINT induces
strictly binary trees and uses re-sampling techniques for error estimation and tree
pruning, while C4.5 partitions according to attribute values 7. The gini index is used by
the SPRINT algorithm to measure the quality of each candidate split point. For a data set S
containing examples from n classes, gini(S) is defined as shown in Eq.(1), where p_j is
the relative frequency of class j in S. If a split divides S into two subsets S1 and S2
that contain n1 and n2 records respectively, the index of the divided data, gini_split(S),
is given by Eq.(2), in which n = n1 + n2 denotes the number of records in S. The advantage
of this approach is that the index calculation requires only the knowledge of distribution
of the class values in each of the partitions 8.
$$ gini(S) = 1 - \sum_{j} p_j^2 \tag{1} $$
$$ gini_{split}(S) = \frac{n_1}{n}\, gini(S_1) + \frac{n_2}{n}\, gini(S_2) \tag{2} $$
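As a numerical illustration of Eq.(1) and Eq.(2), the following Python sketch computes the gini index of a list of class labels and the index of a binary split. It is not the Intelligent Miner implementation; the function names and the toy labels are assumptions chosen for the example.

```python
from collections import Counter


def gini(labels):
    """Eq.(1): one minus the sum of squared relative class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_split(left, right):
    """Eq.(2): size-weighted gini index of the two subsets of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Toy data: two classes, evenly mixed before the split.
mixed = ["Normal", "Normal", "Eng-Spd", "Eng-Spd"]
pure_split = gini_split(["Normal", "Normal"], ["Eng-Spd", "Eng-Spd"])
poor_split = gini_split(["Normal", "Eng-Spd"], ["Normal", "Eng-Spd"])
```

Here gini(mixed) is 0.5, a split that separates the classes perfectly gives 0.0, and a split that leaves both subsets evenly mixed stays at 0.5. Lower values therefore indicate better splits.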
The tree accuracy is estimated by testing the classifier on subsequent cases whose
correct classification has been observed 6. The v-fold cross-validation technique is used to
estimate the tree error rate. This estimate of the error rate is used to prune the tree and
choose the best classifier. More detail about this algorithm can be found elsewhere 9.
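The v-fold cross-validation idea can be sketched generically in Python: the records are divided into v folds, and each fold is held out once while the remaining folds are used for training. How Intelligent Miner assigns records to folds is not stated in this paper, so the interleaved assignment below, like the function names, is an assumption.

```python
def v_fold_indices(n_records, v):
    """Yield (train_indices, test_indices) for each of the v folds."""
    if not 2 <= v <= n_records:
        raise ValueError("v must be between 2 and n_records")
    # Assumed assignment: record i goes to fold i % v.
    folds = [list(range(k, n_records, v)) for k in range(v)]
    for held_out in range(v):
        train = [i for k in range(v) if k != held_out for i in folds[k]]
        yield train, folds[held_out]


def cross_validated_error(n_records, v, fold_error):
    """Average a caller-supplied fold_error(train, test) over all folds."""
    errors = [fold_error(train, test)
              for train, test in v_fold_indices(n_records, v)]
    return sum(errors) / len(errors)


splits = list(v_fold_indices(10, 5))
```

The caller supplies `fold_error`, a hypothetical function that trains a tree on the training indices and returns its error rate on the held-out indices.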
3.3. Experimental procedures
The two main procedures of data mining are training, also called model construction, and
testing, also called model validation. In training mode, the function builds a model based
on the selected input data. This model is later used as a classifier. In test mode, the
function uses a set of data to verify that the model created in the training mode produces
results with satisfactory precision.
In this work all available data was split into two parts. The bulk of the data, 90%, was
used for model training. The remaining 10% of the available data was used for model
testing. The data recorded by the Engine Speed sensor and the Engine Cool Flow sensor were
not used for the failure pattern recognition, as their values were to be predicted.
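A 90%/10% partition of this kind can be sketched in Python as shown below. The paper does not state how records were assigned to the two parts, so the seeded random shuffle, the function name, and the default values are assumptions for illustration only.

```python
import random


def split_records(records, train_fraction=0.9, seed=0):
    """Shuffle the records reproducibly and cut them into two parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # assumed: random assignment
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical record identifiers standing in for VIMS records.
train, test = split_records(range(100))
```

Fixing the seed makes the partition repeatable, so that training and testing runs can be compared on identical data.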
Apart from testing the model on 10% of the VIMS data available for the modeled
truck, it was also tested on a separate set of VIMS data collected on another truck of the
same make and model working in the same surface mine. The purpose of these runs
was to determine the performance and the range of applicability of the model. Specifically,
the model error rate was calculated and used for evaluating the performance of the training
and the testing processes.
4. Results and Discussions
A number of experiments were conducted, with models yielding a high error rate for
prediction of Engine Cool Flow and Engine Speed events. This would indicate that
the Decision Tree based model is not the best tool to classify these events. Representative
model output is shown in figure 1, a confusion matrix for the pruned tree that shows the
distribution of the misclassifications. The data set contained 10,692 records related to the
Engine Cool Flow event, of which only 9,478 were classified correctly, yielding an error
rate of about 11%. For the Engine Speed event the error was much larger, with about 82% of
the records misclassified.
Errors = 11274 (21.74%)

Actual class (error rate) | Predicted class -->
                          | EngCoolFlow | OtherEvent | HiEngSpd | Normal | Total
---------------------------------------------------------------------------------
EngCoolFlow (11.4%)       |        9478 |        325 |      824 |     65 | 10692
OtherEvent (8.9%)         |        1296 |      16469 |      325 |      4 | 18094
HiEngSpd (81.6%)          |        7536 |        325 |     1793 |     66 |  9720
Normal (3.8%)             |         425 |          0 |       83 |  12846 | 13354
Total                     |       18735 |      17119 |     3025 |  12981 | 51860
Fig. 1. Confusion matrix of training dataset (90% of available data) with four classes
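The error rates quoted with the confusion matrix follow directly from its cell counts. The Python sketch below recomputes them from the Fig. 1 values, with rows as actual classes and columns as predicted classes; the function and variable names are assumptions introduced for the illustration.

```python
def error_rates(matrix, labels):
    """Return per-class error rates and the overall error rate.

    matrix[i][j] counts records of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    per_class = {
        label: 1.0 - matrix[i][i] / sum(matrix[i])
        for i, label in enumerate(labels)
    }
    return per_class, (total - correct) / total


labels = ["EngCoolFlow", "OtherEvent", "HiEngSpd", "Normal"]
fig1 = [
    [9478, 325, 824, 65],     # actual EngCoolFlow
    [1296, 16469, 325, 4],    # actual OtherEvent
    [7536, 325, 1793, 66],    # actual HiEngSpd
    [425, 0, 83, 12846],      # actual Normal
]
per_class, overall = error_rates(fig1, labels)
```

The overall rate is 11274/51860, or about 21.74%, and the HiEngSpd rate is 7927/9720, or about 81.6%, in agreement with the figure. Most of the misclassified HiEngSpd records (7536) were assigned to the EngCoolFlow class.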
Analysis of the VIMS data set used in the evaluations revealed the underlying
problem: VIMS event records often contained several events, and therefore these
records were not independent.
Errors = 2545 (6.182%)

Actual class (error rate) | Predicted class -->
                          | OtherEvent | Eng-Spd | Normal | Total
-----------------------------------------------------------------
OtherEvent (9.0%)         |      16474 |    1620 |      0 | 18094
Eng-Spd (3.4%)            |        326 |    9387 |      7 |  9720
Normal (4.4%)             |          0 |     592 |  12762 | 13354
Total                     |      16800 |   11599 |  12769 | 41168
Fig. 2. Confusion matrix of training dataset (90% of available data)
Errors = 282 (6.165%)

Actual class (error rate) | Predicted class -->
                          | OtherEvent | Eng-Spd | Normal | Total
-----------------------------------------------------------------
OtherEvent (5.6%)         |       1897 |     113 |      0 |  2010
Eng-Spd (9.5%)            |        103 |     977 |      0 |  1080
Normal (4.4%)             |          0 |      66 |   1418 |  1484
Total                     |       2000 |    1156 |   1418 |  4574
Fig. 3. Confusion matrix of tested dataset (10% of available data)
To ensure that the analyzed event records are independent, all records related to the
Engine Cool Flow event were removed from the analyzed data set. Models based on the new
data set yielded a much lower, satisfactory error rate. The related confusion matrices are
shown in figure 2 and figure 3. The error rates obtained for the data used for training and
that used for testing were 6.182 % and 6.165 % respectively. This confirms
that the model as described constitutes a reasonably accurate reflection of the truck
behavior and can be used for truck condition predictions.
Fig. 4. Decision tree model: graphic interpretation
The model may be presented as a binary decision tree (figure 4). Each interior node
of the binary decision tree tests an attribute of a record. If the attribute value satisfies the
test, the record is sent down the left branch of the node. If the attribute value does not
meet the requirements, the record is sent down the right branch of the node 3. Three
classes are marked with different colors in the upper left corner. They are reflected in the tree
map as solid squares. The solid circles are the decision nodes. The binary decision tree
consists of the root node on top, followed by non-leaf nodes and leaf nodes. Branches
connect a node to two other nodes. Root and non-leaf nodes are represented as pie charts.
Leaf nodes are represented as rectangles.
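The way a record travels through such a tree can be sketched in Python as follows. The parameter names are taken from Tables 1 and 2, but the tests, thresholds, and resulting classes are hypothetical and are not those of the tree in figure 4.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str                      # class assigned at this leaf


@dataclass
class Node:
    test: Callable[[dict], bool]    # attribute test applied to a record
    left: Union["Node", Leaf]       # branch taken when the test is satisfied
    right: Union["Node", Leaf]      # branch taken otherwise


def classify(node, record):
    """Send a record down the tree until a leaf is reached."""
    while isinstance(node, Node):
        node = node.left if node.test(record) else node.right
    return node.label


# Hypothetical two-level tree; the thresholds are invented for illustration.
tree = Node(
    test=lambda r: r["BOOST_PRES_105"] < 100,
    left=Leaf("Normal"),
    right=Node(
        test=lambda r: r["ACTUAL_GEAR_352"] == "Neutral",
        left=Leaf("OtherEvent"),
        right=Leaf("Eng-Spd"),
    ),
)

result = classify(tree, {"BOOST_PRES_105": 150, "ACTUAL_GEAR_352": "Neutral"})
```

In this sketch the sample record fails the root test, follows the right branch, satisfies the second test, and is therefore classified as OtherEvent.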
In this model, only 24 of the 70 truck parameters were found to be of
importance in classifying the Engine Speed, Normal, and Other Event classes. The other
parameters do not contribute significantly to event pattern recognition and may be
disregarded.
The model was also used to predict occurrence of events on another truck for which
VIMS data was available. However, the related confusion matrix, shown in figure 5,
indicates a much higher error rate. This rate is particularly high for Other Event records;
evidently quite different events were recorded on the other truck. The Normal event
prediction has an error rate of 66.7%, most likely a result of different operating
conditions. It is, therefore, concluded that the model is specific to the truck for which it
was developed and cannot be used for prediction of the condition of different trucks.
Errors = 20931 (67.68%)

Actual class (error rate) | Predicted class -->
                          | OtherEvent | Eng-Spd | Normal | Total
-----------------------------------------------------------------
OtherEvent (81.1%)        |       2840 |    8997 |   3202 | 15039
Eng-Spd (29.7%)           |          0 |    3541 |   1499 |  5040
Normal (66.7%)            |        228 |    7005 |   3615 | 10848
Total                     |       3068 |   19543 |   8316 | 30927
Fig. 5. Confusion matrix of testing dataset (7EK00388)
The Engine Speed events have the highest prediction accuracy in this case, which
suggests that High Engine Speed events of many trucks might be predicted
with a model developed for one truck only. Further work is needed to confirm
this speculation.
5. Conclusions
If real-time data on truck condition is available, a predictive model can be built to
project the truck condition into the future. Such a model may be built using the classification
tree algorithm as described in this paper.
If VIMS snapshot data is used in model construction and in modeling, attention has
to be paid to the way this data is acquired. If several events take place during the snapshot
data recording, only the primary event that triggered the recording can be used in
evaluations.
The truck condition model as described in this paper cannot be freely used for condition
predictions of other trucks. It appears that only some of the wide variety of events can be
predicted in this situation. Definition of the specific events that can be modeled, and the
reliability of the related predictions, need further investigation.
Acknowledgements
Financial support of the investigations reported on in this paper by Caterpillar, Inc.
of Peoria, Illinois, is gratefully acknowledged.
References
1. T. S. Golosinski, Data Mining Uses in Mining. Proceedings, Computer Applications
in the Minerals Industries (APCOM), Beijing, China, 2001, pp. 763-766.
2. T. S. Golosinski, H. Hu, and R. Elias, Data Mining VIMS for Information on Truck
Condition. Proceedings, Computer Applications in the Minerals Industries
(APCOM), Beijing, China, 2001, pp. 397-402.
3. Caterpillar, Inc., Vital Information Management System (VIMS): System Operation
Testing and Adjusting (1999), Company publication.
4. IBM (International Business Machines Corporation), Manual: “Using the Intelligent
Miner for Data” (2000), Company publication.
5. IBM (International Business Machines Corporation), Intelligent Miner for Data:
Enhance Your Business Intelligence (1999), Company publication.
6. J. R. Quinlan, C4.5: Programs for Machine Learning (1993), Morgan Kaufmann
Publishers, Inc.
7. J. Jang, C. Sun, Neuro-Fuzzy and Soft Computing (1997), Prentice-Hall, Inc.
8. L. Breiman, J. Friedman, Classification and Regression Tree (1984), Wadsworth
International Group.
9. J. Shafer, SPRINT: A Scalable Parallel Classifier for Data Mining, in Proceedings of
the 22nd VLDB Conference, Mumbai (Bombay), India, 1996.