Mine Planning and Equipment Selection 2002
MODELING CONDITION AND PERFORMANCE OF MINING EQUIPMENT
Tad S. Golosinski and Hui Hu
Department of Mining Engineering, University of Missouri-Rolla, Rolla, MO 65409-0450, USA
ABSTRACT: The paper follows on the earlier MPES publication that suggested the use of data mining techniques to model
the operation of mining equipment. It reports on new developments that concentrated on modeling the performance and
condition of mining trucks based on the analysis of digital data collected in the field by the truck vital sign information and
management system. The models developed as the result of this work allow for projection of truck condition and
performance into the future with reasonably high accuracy. As such they allow for better control of mining operations and are
expected to find numerous applications in mines worldwide.
1 INTRODUCTION
Modern mining equipment is fitted with
numerous sensors that monitor its condition and
performance. Data collected by these sensors is used
to alert the operator to the existence of abnormal
operating conditions and to perform an emergency
shutdown if the pre-set upper or lower limits of the
monitored parameters are reached. This data is also
used for post-failure diagnostics and for reporting
and analysis of equipment performance. Availability
of this voluminous data, together with availability of
sophisticated data processing methods and tools,
allows for extraction of additional information
contained in the data. One method that may permit
this is data mining (Golosinski, 2001 and
Golosinski and Hu, 2001).
The research presented in this paper investigates
the use of data collected from various sensors
installed on a mining truck for construction of a
truck model that allows for reliable prediction of
both the truck performance and its condition into the
future. The research used data collected by a
variety of sensors installed on off-highway mining
trucks that together constitute the Caterpillar VIMS
(Vital Information Management System)
(Caterpillar 1999). The data mining tool
was the IBM Intelligent Miner for Data (IBM 2000).
2 DATA DESCRIPTION
The data used in this research consists of
snapshot (event recorder) and datalogger records,
each containing values of 70 truck parameters
measured over a period of time. The data was
collected from 6 Caterpillar 789B trucks during their
operation in a surface mine.
The snapshot stores a segment of truck history
that contains values of all 70 monitored parameters
recorded during a period of six minutes. Each
parameter value is recorded once per second. The
snapshot recording is triggered by one of a set of
predefined events, usually the occurrence of an
abnormal situation indicated by a critical value of a
monitored parameter. A snapshot record describes
truck conditions from five minutes before the event
to one minute after the event (Caterpillar 1999). In
this paper, every snapshot record is called an “event”
for simplicity.
Unlike the snapshot, the datalogger records values of all truck
parameters that are monitored by VIMS over varying periods
of time, also at one-second intervals (Caterpillar 1999). The
recording is started and stopped manually, with individual
records covering periods of up to 30 minutes of truck
operation. Datalogger records do not have to be associated
with any events.
Of the 70 truck parameters monitored in the field, values
of 26 were recorded as categorical and the remaining 44 as
numeric values. Examples of the basic statistical description
of both the categorical and numerical parameter values are
presented in Tables 1 and 2. Previous research was confined to
analysis of numerical data only (Hu and Golosinski, 2002).
The approach presented in this paper analyzes both types of data.
The values analyzed are statistical parameters of the recorded
values, defined for one- to three-minute time intervals. The
statistical parameters include:
Minimum
Maximum
Range
Average
Standard Deviation
Variance
Regression Intercept
Regression Slope
Regression Sum of Squares
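The aggregation of per-second readings into the statistics listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of the sample (n - 1) variance, the regression of the reading against a time index in seconds, and the reading of "regression sum of squares" as the total sum of squares of the readings (suggested by the SYY suffix used later in the paper) are all assumptions.

```python
import math

def window_statistics(values):
    """Summarize one window of per-second readings (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings")
    times = range(n)                      # assumption: time index 0..n-1 seconds
    mean_y = sum(values) / n
    mean_t = (n - 1) / 2
    # Sums of squares and cross-products for the least-squares line y = a + b*t.
    syy = sum((y - mean_y) ** 2 for y in values)
    stt = sum((t - mean_t) ** 2 for t in times)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sty / stt
    variance = syy / (n - 1)              # assumption: sample variance
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "average": mean_y,
        "std_dev": math.sqrt(variance),
        "variance": variance,
        "regr_intercept": mean_y - slope * mean_t,
        "regr_slope": slope,
        "regr_syy": syy,                  # assumption: total sum of squares of y
    }

# Illustrative (made-up) engine speed readings in rpm.
stats = window_statistics([1800, 1850, 1900, 1950, 2000])
```

In practice each one-, two-, or three-minute window would hold 60, 120, or 180 readings per parameter, and one such record would be produced per parameter per window.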
Table 1. Example of categorical parameter values
Parameter Name      Modal Value    Modal Frequency (%)
ACTUAL_GEAR_352     Neutral        41.55
AFTRCLR_LVL_137     OK             98.95
BODY_LVR_727        Not Moving     95.4
BODY_POS_726        Down           93.7
Table 2. Example of numerical parameter values
Parameter Name      Minimum Value   Maximum Value   Mean Value   Standard Deviation
AFTRCLR_TEMP_110    0               95              41.8         12.8
AMB_AIR_TEMP_791    0               38.5            21.9         7.0
ATMOS_PRES_790      0               93              89.4         9.1
BOOST_PRES_105      0               164             31.0         50.1
Figure 1 illustrates prediction of one VIMS event, “high
engine speed”. To predict occurrence of this event, statistical
data is defined for each three-minute interval of VIMS
records. If one set of three-minute data has statistical
characteristics similar to those of the first three minutes of a
“high engine speed” snapshot, it is probable that high engine
speed will be reported after another two minutes of
truck operation.
[Figure 1 (image not reproduced): timeline titled “VIMS Event Prediction” showing normal engine speed, a high engine speed snapshot, and normal engine speed again. Row labels: VIMS Data, High Eng Event_ID, Predicted Label. Label values shown: Other, 767_1, 767_2, Eng_1, Eng_2.]
Figure 1. VIMS event prediction model
As shown in Figure 1, the one-minute model can
predict events that will occur within the next four
minutes of truck operation. Similarly, the two-minute
model can predict events that will
occur within the following three minutes. The three-minute
model can only predict events that will occur
within the following two minutes.
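The lead times quoted above follow from the snapshot layout, which starts five minutes before the event: a model that consumes the first w minutes of the snapshot leaves 5 - w minutes before the event fires. A small sketch of this arithmetic, with hypothetical names:

```python
# From the snapshot definition: recording starts five minutes before the event.
SNAPSHOT_MINUTES_BEFORE_EVENT = 5

def lead_time_minutes(window_minutes):
    """Minutes of warning left after a window at the start of the snapshot."""
    if not 1 <= window_minutes < SNAPSHOT_MINUTES_BEFORE_EVENT:
        raise ValueError("window must be shorter than the pre-event period")
    return SNAPSHOT_MINUTES_BEFORE_EVENT - window_minutes

lead_times = {w: lead_time_minutes(w) for w in (1, 2, 3)}
```

This makes the tradeoff explicit: longer windows summarize more data per record but leave less warning time.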
3 MODEL DESIGN
3.1. Objective
Modeling was intended to evaluate and quantify
the pattern of changes in parameter values as
associated with various events. As the sensors
installed on the truck activate the snapshot recorder
when the predefined limit of a parameter is reached,
the objective was to identify any patterns in
parameter values that may allow for early failure
recognition. These patterns were then used for
prediction of future events by building a decision
tree classification model of the truck. The model
was to predict an occurrence of a selected event
based on the pattern of changes in values of other
parameters.
The “high engine speed” events, which were the most
numerous in the analyzed data, were chosen to be the
main targets of analysis. In addition, data collected
during normal operation was selected for
comparative analysis and assigned the name “other”.
The “engine speed” is defined as the actual
rotational speed of the crankshaft. For the modeled
truck the “high engine speed” event is activated when the engine speed
reaches 2250 rpm and deactivated when the speed
drops to 1900 rpm.
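The activation and deactivation thresholds describe a hysteresis band. The sketch below illustrates that behavior on a series of per-second readings. It is not VIMS logic: the function name is hypothetical, and treating "reaches" as >= and "drops to" as <= is an assumption.

```python
ACTIVATE_RPM = 2250    # from the text: event activated at this engine speed
DEACTIVATE_RPM = 1900  # from the text: event deactivated at this engine speed

def high_engine_speed_flags(rpm_series):
    """Return one boolean per reading: True while the event is active."""
    active = False
    flags = []
    for rpm in rpm_series:
        if not active and rpm >= ACTIVATE_RPM:      # assumption: inclusive limit
            active = True
        elif active and rpm <= DEACTIVATE_RPM:      # assumption: inclusive limit
            active = False
        flags.append(active)
    return flags

# Made-up readings: the event stays active at 2000 rpm until speed falls to 1900.
flags = high_engine_speed_flags([2100, 2250, 2000, 1900, 2100])
```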
3.2. Data Mining Tools
The IBM Intelligent Miner software package was
used as the data mining tool. The basic algorithm
used was SPRINT, a modified CART (Classification
and Regression Tree) algorithm. It was chosen in
preference to the neural network classification
algorithm because its output is easier to interpret and understand,
which facilitates analysis of the truck failure
pattern (IBM 1999).
The workings of SPRINT are similar to those of
most popular decision tree algorithms, such as C4.5
(Quinlan, 1993); the major distinction is that SPRINT induces
strictly binary trees and uses re-sampling techniques for error
estimation and tree pruning, while C4.5 partitions according to
attribute values (Jang and Sun, 1997). The GINI index is used
by the SPRINT algorithm to measure the misclassification at a
split point. For a data set S containing examples from n
classes, gini(S) is defined as shown in Eq. (1), where p_j is
the relative frequency of class j in S. If a split divides S into
two subsets S_1 and S_2 containing n_1 and n_2 examples
respectively, the index of the divided data, gini_split(S), is
given by Eq. (2); in Eq. (2), n denotes the total number of
examples in S, n = n_1 + n_2, not the number of classes. The
advantage of this approach is that the
index calculation requires only the knowledge of the distribution
of the class values in each of the partitions (Breiman and
Friedman, 1984).
$$\mathrm{gini}(S) = 1 - \sum_{j} p_j^{2} \qquad (1)$$
$$\mathrm{gini}_{split}(S) = \frac{n_1}{n}\,\mathrm{gini}(S_1) + \frac{n_2}{n}\,\mathrm{gini}(S_2) \qquad (2)$$
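The two index definitions can be checked numerically with a short sketch. It only evaluates Eq. (1) and Eq. (2) on lists of class labels; the function names and the example labels are illustrative, and nothing here reproduces the SPRINT split search itself.

```python
from collections import Counter

def gini(labels):
    """Eq. (1): one minus the sum of squared relative class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(left, right):
    """Eq. (2): size-weighted gini index of the two subsets of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Made-up labels: an evenly mixed set, a perfect split, and a poor split.
mixed = ["Other"] * 4 + ["Eng2"] * 4
perfect = gini_split(["Other"] * 4, ["Eng2"] * 4)
poor = gini_split(["Other"] * 3 + ["Eng2"], ["Other"] + ["Eng2"] * 3)
```

A lower split index means purer subsets, so a tree builder using this criterion would prefer the split with the smaller value.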
The tree accuracy is estimated by testing the classifier on
subsequent cases whose correct classification has been
observed (Quinlan, 1993). The v-fold cross-validation
technique is used to estimate the tree error rate. This estimate of the error
rate is used to prune the tree and choose the best classifier.
More details about this algorithm can be found elsewhere
(Shafer, 1996).
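For readers unfamiliar with v-fold cross-validation, the sketch below shows the general idea: the data is divided into v folds, each fold is held out once, and the misclassifications on the held-out folds are pooled. This is a generic illustration, not the Intelligent Miner procedure; the paper does not state the value of v, the folds here are contiguous rather than shuffled, and the majority-class classifier is a hypothetical stand-in for the decision tree.

```python
from collections import Counter

def v_fold_indices(n_rows, v):
    """Yield (train_indices, test_indices) for v contiguous, near-equal folds."""
    if not 2 <= v <= n_rows:
        raise ValueError("v must be between 2 and the number of rows")
    start = 0
    for fold in range(v):
        size = n_rows // v + (1 if fold < n_rows % v else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_rows))
        yield train, test
        start += size

def cross_validated_error(rows, labels, v, fit, predict):
    """Pooled misclassification rate over all held-out folds."""
    errors = 0
    for train, test in v_fold_indices(len(rows), v):
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        errors += sum(predict(model, rows[i]) != labels[i] for i in test)
    return errors / len(rows)

# Stand-in classifier (hypothetical): always predicts the majority training class.
def fit_majority(train_rows, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, row):
    return model

rows = list(range(10))                      # made-up feature rows
labels = ["Other"] * 8 + ["Eng1"] * 2       # made-up class labels
cv_error = cross_validated_error(rows, labels, 5, fit_majority, predict_majority)
```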
3.3. Modeling Procedures
The two main procedures of data mining are training,
also called model construction, and testing, also called model
validation. In training mode, the function builds a model based
on the selected input data. This model is later used as a
classifier. In test mode, the function uses a set of data to verify
that the model created in the training mode produces results
with satisfactory precision.
In this work all available data was split into two parts.
The bulk of the data, 86.4%, was used for model training. The
remainder, 13.6% of the available data, was used for model
testing. The test data includes dataset #1 (random selection)
and dataset #2 (whole snapshot and datalogger records).
After three models were built based on the one-, two-, and
three-minute statistical data sets, the error rate was defined and
used for evaluating the performance of the training and the
testing processes.
4 RESULTS AND DISCUSSION
The model built on three-minute statistical data has a
training error rate below 5%, a 19% error rate on test #1, and a 14%
error rate on test #2. It performs better on
prediction of unseen VIMS events than the one- and two-minute
models (Table 3). However, the tradeoff is that this model can
only provide prediction two minutes ahead, with three classes:
“Eng1”, “Eng2” and “Other”.
Representative three-minute model output is shown as
confusion matrices in Figures 2 to 4 for the training data set,
test #1 data and test #2 data respectively. A confusion matrix for the
pruned tree shows the distribution of the misclassifications. In
every matrix, the numbers on the diagonal are counts of correct
classifications; the others are counts of misclassifications.
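As a worked check of how the total error rate follows from a confusion matrix, the sketch below uses the training counts reported in Figure 2. The function name is illustrative.

```python
def confusion_errors(matrix):
    """Return (misclassified count, error rate) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal entries
    errors = total - correct
    return errors, errors / total

# Counts from Figure 2 (training), class order: OTHER, ENG1, ENG2.
training = [
    [411, 23, 4],
    [1, 65, 0],
    [0, 0, 70],
]
errors, rate = confusion_errors(training)
```

The result, 28 errors out of 574 records (about 4.878%), agrees with the total reported in Figure 2.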
Table 3. Model performance comparison (error rate, %)
Model           Training    Test #1    Test #2
One-minute      6.7         24         17.9
Two-minute      5.7         29.8       15.3
Three-minute    4.9         19         14

Total Errors = 28 (4.878%)
Predicted Class --> | OTHER | ENG1 | ENG2 |
---------------------------------------------------
OTHER               |   411 |   23 |    4 | total = 438
ENG1                |     1 |   65 |    0 | total = 66
ENG2                |     0 |    0 |   70 | total = 70
---------------------------------------------------
                    |   412 |   88 |   74 | total = 574
Figure 2. Training: three-minute statistical model

Total Errors = 12 (19.05%)
Predicted Class --> | OTHER | ENG1 | ENG2 |
---------------------------------------------------
OTHER               |    42 |    9 |    0 | total = 51
ENG1                |     3 |    5 |    0 | total = 8
ENG2                |     0 |    0 |    4 | total = 4
---------------------------------------------------
                    |    45 |   14 |    4 | total = 63
Figure 3. Test #1: three-minute statistical model

Total Errors = 9 (14.06%)
Predicted Class --> | OTHER | ENG1 | ENG2 |
---------------------------------------------------
OTHER               |    47 |    5 |    0 | total = 52
ENG1                |     4 |    2 |    0 | total = 6
ENG2                |     0 |    0 |    6 | total = 6
---------------------------------------------------
                    |    51 |    7 |    6 | total = 64
Figure 4. Test #2: three-minute statistical model

[Figure 5 (chart not reproduced): average error rate of the one-minute, two-minute and three-minute models for training and test data.]
Figure 5. Comparison of average error rate

[Figure 6 (chart not reproduced): standard deviation of the error rate of the one-minute, two-minute and three-minute models for training and test data.]
Figure 6. Error rate: comparison of standard deviations

4.1. Model Performance Analysis
After the VIMS data was aggregated into
statistical data at three-minute intervals, the
number of rows was dramatically reduced and the data
mining process became much faster than mining the one-second
data. Error rates on test data (unseen events) are
reduced to 19% and 14% (Figures 3 and 4) for test #1 and
test #2 respectively, which means the model is more
robust than those using statistical data at one- and
two-minute intervals.
In addition, the error rate is calculated for every class,
together with the mean and standard deviation of the
per-class error rates (Table 4). As per Figures 5 and 6, the
three-minute model presents the best prediction
performance, with average error rates of 3% and 21%
and error rate standard deviations of 3% and 26% for
the training and test data sets respectively.

Table 4. Error rate statistics (error rates expressed as fractions)
                                   Training                      Total Test
Class                              Correct  Total  Error Rate    Correct  Total  Error Rate
Other                              411      438    0.06          89       103    0.14
Eng1                               65       66     0.02          7        14     0.50
Eng2                               70       70     0.00          10       10     0.00
Average (Error Rate)                               0.03                          0.21
Standard Deviation (Error Rate)                    0.03                          0.26
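The summary rows of Table 4 can be reproduced from the per-class counts. The sketch below is a check of the arithmetic, not the authors' procedure; the use of the sample standard deviation (n - 1 divisor) is an assumption, chosen because it matches the reported values.

```python
from statistics import mean, stdev

# (correct, total) counts per class, taken from Table 4.
table4 = {
    "training": {"Other": (411, 438), "Eng1": (65, 66), "Eng2": (70, 70)},
    "test": {"Other": (89, 103), "Eng1": (7, 14), "Eng2": (10, 10)},
}

def class_error_rates(counts):
    """Per-class error rate as a fraction: 1 - correct / total."""
    return {cls: 1 - correct / total for cls, (correct, total) in counts.items()}

summary = {}
for name, counts in table4.items():
    rates = list(class_error_rates(counts).values())
    # Assumption: sample standard deviation (n - 1 divisor).
    summary[name] = (round(mean(rates), 2), round(stdev(rates), 2))
```

With these counts the sketch yields an average of 0.03 and a standard deviation of 0.03 for training, and 0.21 and 0.26 for the test data, matching Table 4.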
While the three-minute model has a low average prediction
error rate, the high standard deviation of the error rate for the test data
sets makes the prediction unstable. As an example, in the three-minute
model the 14% error rate of “Other” event prediction
(Table 4) implies a 14% probability of a false alarm
indicating too high engine speed. The class “Eng1” as defined
by the two-minute model with 50% error rate (Table 4) might
imply a 50% probability of a false high engine speed alarm.
Thus the related model is rather unreliable and cannot be used
for prediction of events taking place.
4.2. Three-Minute Decision Tree Classification Model
This approach resulted in development of a more
knowledgeable (i.e. simpler) decision tree that
can be presented as a binary decision tree (Figure 7).
Each interior node of the binary decision tree tests
an attribute of a record. If the attribute value
satisfies the test, the record is sent down the left
branch of the node. If the attribute value does not
meet the requirements, the record is sent down the
right branch of the node. The three classes are marked
with different colors at the upper left corner. The solid
circles are the decision nodes. The binary decision
tree consists of the root node on top, followed by
non-leaf nodes and leaf nodes. Branches connect a
node to two other nodes. Root and non-leaf nodes
are represented as pie charts. Leaf nodes are
represented as rectangles.
Figure 7. Decision Tree Structure for Three-Minute
Statistical Data
In this decision tree (Figure 7) the root node,
named “ENG_SPD_MAX”, classifies nearly all
“Eng2” events into the right leaf (69 out of 70). These
are displayed as the yellow rectangle. This reflects
the fact that the “high engine speed” event is
activated and recorded when the engine speed
reaches the predefined limit during the second three-minute
interval of the snapshot. The rule for this
classification is:
If (ENG_SPD_MAX>=2184.25)
then class=ENG2
The activation value of the “high engine speed”
event defined by VIMS is 2250 rpm. This differs
from the value determined by the decision tree
model, 2184.25 rpm; as a result, some records are
misclassified as “Eng2” events.
The rest of events are further classified using
more complex rules. One of these rules classifies the
“Eng1” event as follows (the circled leaf in figure 7):
If (ENG_SPD_MAX<2184.25)
and (TRBO_IN_PRES_MIN<87.75)
and (ENG_SPD_REGR_SYY<44991960)
and (RTR_LTR_SUSPCYL_REGR_INTERCEPT<-1688.9)
and (GEAR_SELECT_RANGE>=100.5)
then class=ENG1
Of the 15 “Eng1” events that were analyzed, this rule
classified 14 correctly, with only one misclassification.
As such it allows for prediction of the event in question two
minutes before it occurs.
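The two rules quoted in this section can be written as a small function over one record of three-minute statistics. This is a partial sketch: only the root rule, which the text associates with “Eng2” events, and the single “Eng1” rule are given in the paper, so the function returns None for records that fall into branches of the tree that are not reproduced here. The function name and the example record values are hypothetical; the attribute names and thresholds come from the rules above.

```python
def classify_with_published_rules(stats):
    """Apply the two published rules; None means 'rule not given in the text'."""
    # Root node rule: high maximum engine speed in the window.
    if stats["ENG_SPD_MAX"] >= 2184.25:
        return "ENG2"
    # The one "Eng1" rule listed in the paper (all conditions must hold).
    if (stats["TRBO_IN_PRES_MIN"] < 87.75
            and stats["ENG_SPD_REGR_SYY"] < 44991960
            and stats["RTR_LTR_SUSPCYL_REGR_INTERCEPT"] < -1688.9
            and stats["GEAR_SELECT_RANGE"] >= 100.5):
        return "ENG1"
    return None  # remaining branches of the tree are not listed in the paper

# Made-up record for illustration only.
record = {
    "ENG_SPD_MAX": 2100.0,
    "TRBO_IN_PRES_MIN": 80.0,
    "ENG_SPD_REGR_SYY": 1.0e6,
    "RTR_LTR_SUSPCYL_REGR_INTERCEPT": -2000.0,
    "GEAR_SELECT_RANGE": 101.0,
}
label = classify_with_published_rules(record)
```

Expressing the rules this way shows why the decision tree was preferred for interpretability: each prediction can be traced to a handful of threshold tests.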
5 CONCLUSIONS
This approach compresses the information into a statistical
table and provides predictions with a certain accuracy. It also
makes it possible to predict an event two minutes before it
occurs. However, the prediction accuracy needs further
improvement and the results need to be verified with more test data. A
possible approach to improving the prediction accuracy is to add
more statistical parameters and use more VIMS data.
REFERENCES
Breiman, L. and Friedman, J. 1984, Classification and
regression trees. Wadsworth International Group.
Caterpillar, Inc. 1999, Vital Information Management System
(VIMS): system operation testing and adjusting. Company
publication.
Golosinski, T. S. 2001, Data mining uses in mining.
Proceedings, Computer Applications in the Minerals
Industries (APCOM), Beijing, China, 2001, pp. 763-766.
Golosinski, T. S. and Hu, H. 2001, Data mining of mine
equipment databases. Proceedings of the Artificial Neural
Networks in Engineering Conference (ANNIE 2001), St.
Louis, Missouri, U.S.A.
Hu, H. and Golosinski, T. S. 2002, Failure pattern recognition
of a mining truck with a decision tree algorithm. Mineral
Resources Engineering (in print).
IBM (International Business Machines Corporation) 2000,
Manual: “Using the Intelligent Miner for Data”. Company
publication.
IBM (International Business Machines Corporation) 1999,
Intelligent Miner for Data: enhance your business
intelligence. Company publication.
Jang, J. and Sun, C. 1997, Neuro-fuzzy and soft computing.
Prentice-Hall, Inc.
Quinlan, J. R. 1993, C4.5: Programs for machine learning.
Morgan Kaufmann Publishers, Inc.
Shafer, J. 1996, SPRINT: a scalable parallel classifier for
data mining. Proceedings of the 22nd VLDB Conference,
Mumbai (Bombay), India, 1996.