Data Mining Mine Equipment Databases

DATA MINING OF MINE EQUIPMENT DATABASES
TAD S. GOLOSINSKI
University of Missouri-Rolla,
MO 65409-0450, USA
HUI HU
University of Missouri-Rolla,
MO 65409-0450, USA
ABSTRACT
The paper presents research into the use of data mining methods for knowledge
discovery in mining databases. The data was collected using the Caterpillar VIMS
system installed on several trucks operating in a surface mine. It was mined
with IBM Intelligent Miner for Data. Data mining was found to allow for
identification and quantification of relations between the various types of
VIMS data. As such it offers the potential for development of a truck model
that can be used for prognosticating truck condition and performance.
Development of this capability requires further research.
INTRODUCTION
Modern mining equipment is fitted with numerous sensors that monitor its condition
and performance. The data collected by these sensors is used to alert the operator to
the existence of abnormal operating conditions and to perform emergency shut-down if the
pre-set values of the monitoring parameters are exceeded. This data is also used for
post-failure diagnostics and for reporting and analysis of equipment performance.
It is believed that availability of this voluminous data, together with availability of
sophisticated data processing methods and tools, may allow for extraction of a variety of
additional information contained in the data. One method that may be of value is data
mining (Golosinski, 2001).
The research presented in this paper analyzes data collected from various sensors
installed on several mining trucks with the purpose to develop a model of truck operation
that may facilitate reliable projection of truck performance and its condition into the
future. Data was collected from Caterpillar 789B trucks equipped with VIMS (Vital
Information Management System) during the period of January to October 2000. IBM
Intelligent Miner for Data was used to conduct data mining.
VIMS OPERATION
Caterpillar's Vital Information Management System (VIMS) is installed on selected
CAT mining equipment. It is intended to assist with machine management by informing
operators, service personnel and supervisors of the status of selected machine functions
and by providing information on equipment production and performance. VIMS monitors
and records parameters of numerous sensors that are integrated into the vehicle design. It
has the capacity to alert the operator if these parameters exceed the pre-set critical values.
In addition it can conduct emergency equipment shut-down if so programmed
(Caterpillar, 2000).
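The alert and shut-down behaviour described above can be illustrated with a minimal sketch. The `Limit` class, the `check_reading` function and the coolant temperature limits are hypothetical and chosen only for illustration; they are not taken from VIMS documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limit:
    """Hypothetical pre-set limits for one monitored parameter."""
    warn_above: float                       # alert the operator above this value
    shutdown_above: Optional[float] = None  # shut down above this value, if programmed


def check_reading(value: float, limit: Limit) -> str:
    """Return the action implied by one sensor reading."""
    if limit.shutdown_above is not None and value > limit.shutdown_above:
        return "shutdown"
    if value > limit.warn_above:
        return "alert"
    return "ok"


# Illustrative limits only (degrees C); real values are set per machine.
coolant = Limit(warn_above=105.0, shutdown_above=115.0)
for reading in (90.0, 108.0, 120.0):
    print(reading, check_reading(reading, coolant))
```

Leaving `shutdown_above` unset models a parameter for which emergency shut-down has not been programmed.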
The on-board VIMS unit records the collected data as well as the occurrence of certain
VIMS events. The recorded data can be downloaded into a notebook computer.
Alternatively it can be sent to the central control unit via radio (VIMS Wireless).
VIMS DATA
VIMS records data in seven different formats. These are:
Event Summary List (ESL). A VIMS event is recorded when the measured value of a
monitored parameter exceeds that considered acceptable. Event List is a record of events
that are occurring on the machine. It is limited to the last 500 events, listed in a
chronological order.
Snapshot. Snapshot stores a segment of machine history that consists of values of all
monitored parameters recorded at one-second interval. The snapshot is triggered by
VIMS event and as such it is related to abnormal condition or emergency situation of the
machine.
Data Logger. Data Logger records values of all the machine parameters that are
monitored by VIMS and sampled at one-second intervals. The logger is started and
stopped by the operator command and can record data for up to 30 minutes.
Trends. Trends record the minimums, maximums and averages of the selected
machine condition parameters for a pre-selected period of time.
Cumulative. Cumulative records the number of occurrences of specific events over a
pre-set period of time. An example of cumulative information can be the engine
revolutions or fuel consumption over the life of the machine, or its component.
Histogram. Histogram records the performance history of a selected parameter since
the last reset. For example a histogram of the engine speed would indicate the percentages of
time that the engine operated within pre-specified speed ranges.
Payload. Payload carried by the machine can be recorded if so specified and
providing that the machine is equipped with an appropriate sensor.
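The Trends and Histogram formats can be sketched as simple summaries of one-second samples. The function names, the engine speed samples and the speed ranges below are assumptions made for illustration and do not reproduce the VIMS implementation.

```python
from statistics import mean


def trend(samples):
    """Minimum, maximum and average of a parameter over a period."""
    return {"min": min(samples), "max": max(samples), "avg": mean(samples)}


def histogram(samples, edges):
    """Percentage of samples falling in each range [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for i in range(len(edges) - 1):
            if edges[i] <= s < edges[i + 1]:
                counts[i] += 1
                break
    return [100.0 * c / len(samples) for c in counts]


# Made-up engine speed samples (RPM), one per second.
rpm = [700, 720, 1500, 1800, 1850, 1900, 1200, 710]
print(trend(rpm))
print(histogram(rpm, edges=[0, 1000, 1600, 2200]))
```

With one-second sampling, the percentage of samples in a range equals the percentage of time spent in that range.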
Four different data types are recorded. These are:
Sensed Data. This data contains values of sensor parameters and position of
switches installed on the machine.
Internal Data. This data is generated internally within VIMS main module. It
includes records of date and time.
Communicated Data. This data is acquired through the data links to various
machine components, including non-CAT components. For example the engine speed
may be monitored and recorded through the data link to the electronic engine control
system.
Calculated Data. This data is calculated by the VIMS main module as a function of
other data that is being collected. As an example event duration may be calculated based
on internal data and stored in the event list.
INTELLIGENT MINER
A variety of data mining software is available from numerous vendors. It includes
Intelligent Miner of International Business Machines Corporation, MineSet of Silicon
Graphics Inc., Clementine of Integral Solutions Limited of U.K. and others (Westphal and
Blaxton, 1998). The IBM Intelligent Miner (IM) version 6.1 was used for data mining
reported in this paper (IBM, 2000). It offers a choice of algorithms, is easy to use, and
has proven itself useful in many commercial applications.
The following mining and statistical functions are included in Intelligent Miner:
1. Mining functions: associations, demographic and neural clustering, sequential
patterns and similar sequences, tree and neural classification, and neural and RBF (Radial
Basis Function) prediction.
2. Statistics functions: bivariate statistics, linear regression, principal component
analysis, univariate curve fitting and factor analysis.
The IM allows modeling of events and processes that can be either usual or unusual.
Usual events describe the situation that is considered normal and for which the relations
between different attributes are sought. For example, relations between truck operating
and mechanical attributes can be defined such as a relation between engine load and truck
payload. Definition and quantification of these relations may be of help in improving
efficiency of truck operation or help with operator training.
The unusual events are failures of the monitored machine or its component. Data
mining of these events may allow for definition of algorithms that would facilitate
modeling of truck operation to help with planning of its maintenance and reduction of
downtime.
To facilitate data mining of VIMS databases with IM the data format has to be
adapted to that acceptable to the IM. The original VIMS data, downloaded from an
on-board VIMS unit, can be easily merged into an MS Access 97 database using the VIMS
PC99 software. However, IM does not accept the Access data format and to facilitate its use
the data has to be transferred to a DB2 database that is compatible with version 6.1 of
Intelligent Miner.
DATA MINING METHODS
Of the various data mining methods used by IM the following were used in this
investigation: Major Factor Analysis, Clustering, Classification, and Sequential Pattern.
Data mining was done on VIMS database that consisted of 300 MB of records
collected on several Caterpillar model 789B trucks that operated in a surface mine
between February and October 2000. Data collected by VIMS data logger consisted of
105 data sets, each set covering a period of up to 30 minutes of truck operation. Overall
85 parameters of truck condition and performance were monitored with their values
recorded each second by the on-board VIMS.
The original data was transferred to DB2 and pre-processed. This included data
clean-up, and identification and extraction of data that is of interest to the problem at
hand.
INVESTIGATIONS
Relationship between Truck Parameters in VIMS Data Logger Data
Not all truck parameters are independent and a variety of relations exist between
them. The preliminary research done by Ataman (2001) defined significant correlations
to exist between various parameters. Two VIMS parameters, engine speed and fuel flow,
were found to show strong correlation with many other parameters. Also confirmed was a
relatively strong relation between engine oil pressure and engine speed, indicated in the
VIMS manual. The relation between engine coolant temperature and aftercooler
temperature was another expected result. No other significant relations were identified.
This work confirms that the linear regression method of IM can be used to define
and quantify the relations that may exist between various parameters describing truck
performance and condition. It is believed that these relations, in turn, can be used for
truck operation. In relation to VIMS data the major problem is data format
incompatibility with that of IM. An interface between VIMS and IM needs to be
developed that would allow for easy data transfer and manipulation.
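As an illustration of how a relation between two parameters can be defined and quantified, the sketch below computes a Pearson correlation coefficient and an ordinary least-squares line in plain Python. The engine speed and oil pressure values are made up for the example, and the code is a generic textbook calculation, not the IM linear regression function.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return mean of x, mean of y, and the centered sums Sxy, Sxx, Syy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Correlation coefficient between two parameters."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept of y as a linear function of x."""
    mx, my, sxy, sxx, _ = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


# Illustrative values only: engine speed (RPM) and engine oil pressure (kPa).
engine_speed = [700, 1000, 1300, 1600, 1900]
oil_pressure = [250, 310, 365, 425, 480]

r = pearson_r(engine_speed, oil_pressure)
slope, intercept = fit_line(engine_speed, oil_pressure)
print(f"r = {r:.4f}, pressure = {slope:.4f} * speed + {intercept:.1f}")
```

A correlation coefficient close to +1 or -1 indicates a strong linear relation, while the fitted slope quantifies it.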
Major Factor Analysis (MFA) of VIMS Data Logger Data
In statistical terms, all parameters constitute variables. The relationship between two
variables is defined by the correlation coefficient. For the purpose of modeling truck
condition and performance high correlation between any two variables indicates
redundancy. MFA eliminates this redundancy by combining correlated variables into
factors. Lower number of factors simplifies further analysis.
In the described research all monitored truck parameters constituted inputs into
MFA. The analysis was performed using varimax rotation that maximizes the variance of
the factor loadings for each input variable. The rotated factors have a high correlation
with one set of input variables and little or no correlation with another set of input
variables. The varimax rotational strategy can give a clearer interpretation of the
results by classifying variables into new independent factors.
Figure 1 presents the factor loadings that quantify strength of relationships between
variables in the investigated databases. Their value reflects the linear relationship
between the input variables and the corresponding factors, and varies between –1
and +1. If the factor loading is +1 there is a perfect positive relationship between the
variable and the factor. Factor loading of –1 denotes a perfect negative relationship. If the
factor loading is 0, there is no relationship between the input variable and the factor.
Figure 1. The Factor Loading View
In the factor loading window, the vertical axis represents one of the factors while the
horizontal one represents another. The dots depict the factor loadings. The labels next to
the dots show the numbers of the input variables; the name of each variable can be found in
the label list on the right side of figure 1. If a dot has a high coordinate value on one of the
axes and lies in close proximity to it, there is a distinct relationship between one of the two
factors and this variable (IBM, 2000).
The results of this analysis identified 19 statistically independent factors that
represent the original 85 truck parameters. The variables that are included in the same
factor are highly correlated. Table 1 summarizes the results of Major Factor Analysis.
The first factor accounts for 29% of variables, or 24 truck parameters. These are
highly correlated with each other as well as with the first factor. All the 24 parameters
define temperature and pressure, including atmospheric temperature, engine coolant
temperature, turbocharger inlet air pressure, etc. Therefore this class of parameters
represents temperature/pressure indicators of the truck.
The second factor accounts for 12% of the parameters. It groups engine load
indicators and includes such variables as engine speed, throttle position, boost pressure,
and so on. Interestingly the ECM (Electronic Control Module) calculates engine load as a
function of engine speed, throttle switch position, throttle position, boost pressure, and
atmospheric pressure. The third factor can be thought of as the payload indicator, and the
fourth factor is the fluid level indicator. A physical interpretation cannot be provided
for all factors at present.
The MFA output results also include factor scores, the actual values of individual
observations for the factors. These factor scores are particularly useful when further
analysis of factors is to be performed.
In conclusion, the Major Factor Analysis can be used to reduce the number of truck
performance and condition parameters that one needs to be concerned with, thus simplifying
further analysis. A lower number of variables in the input to clustering and classification
saves evaluation time and minimizes problems created by missing variable values.
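The effect of varimax rotation can be illustrated with a small sketch. It uses the classical raw varimax procedure of pairwise planar rotations with a closed-form angle, without Kaiser normalization; whether IM uses this exact procedure is not stated in the source, so this is an assumption. The loadings are made-up values, not results of this study.

```python
from math import atan2, cos, sin, radians


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p, k = len(loadings), len(loadings[0])
    total = 0.0
    for j in range(k):
        sq = [row[j] ** 2 for row in loadings]
        total += sum(s * s for s in sq) / p - (sum(sq) / p) ** 2
    return total


def varimax(loadings, sweeps=20, tol=1e-10):
    """Rotate factor pairs until no pair needs a rotation larger than tol."""
    L = [list(row) for row in loadings]
    p, k = len(L), len(L[0])
    for _ in range(sweeps):
        largest = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [row[a] ** 2 - row[b] ** 2 for row in L]
                v = [2 * row[a] * row[b] for row in L]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion for this pair of factors.
                phi = atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
                largest = max(largest, abs(phi))
                c, s = cos(phi), sin(phi)
                for row in L:
                    x, y = row[a], row[b]
                    row[a], row[b] = x * c + y * s, -x * s + y * c
        if largest < tol:
            break
    return L


# Made-up loadings with a clean two-factor structure.
simple = [[0.9, 0.0], [0.8, 0.0], [0.85, 0.0], [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]]
# Mix the factors by 30 degrees to mimic an unrotated solution.
t = radians(30)
mixed = [[x * cos(t) - y * sin(t), x * sin(t) + y * cos(t)] for x, y in simple]
rotated = varimax(mixed)
for row in rotated:
    print([round(value, 3) for value in row])
```

After rotation each variable loads on one factor only, which is the "high correlation with one set of input variables and little or no correlation with another" pattern described above.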
Table 1. Machine Parameter Indicators (Factors)
No.  Factor (percentage)  Indicator            Parameters (Variables)
1    Factor 1 (29%)       Temperature          Atmospheric Temperature, engine coolant temperature, turbocharger inlet air pressure, etc.
2    Factor 2 (12%)       Engine Load          Engine Speed, throttle Position, Boost Pressure, etc.
3    Factor 3 (5%)        Payload              Payload, Suspension Cylinder Pressures, Payload Status, Machine Pitch.
4    Factor 4 (6%)        Fluid Level          Engine Oil Level, Low Steering Pressure, Engine Oil Pressure, etc.
5    Factor 6 (2.9%)      Road Condition       RTR-LTR and RTF-LTF Suspension Level, Machine Rack
6    Factor 8 (2.7%)      Transmission Switch  Torque Converter Screen, Transmission Charge Filter, etc.
7    Factor 13 (2%)       Auto Lube            Auto Lube Datalink, Auto Lube
8    Factor 14 (2.28%)    Body Level           Body Level, Body Position
9    Factor 16 (3.2%)     Fan Speed            High or Low Speed Fan, Ground Speed, etc.
10   Factor 17 (2.4%)     Engine Fuel Rate     Engine Fuel Rate
Clustering of VIMS Data Logger Data
Clustering searches for characteristics that most frequently occur in common and
groups the related data into clusters. The number of detected clusters and the properties
of each cluster are the results. In addition distribution of characteristics within the clusters
is quantified.
The Demographic Clustering provides fast and natural clustering of very large
databases. It automatically determines the number of clusters to be generated.
Similarities between records are determined by comparing their field values. The clusters
are then defined so that Condorcet’s criterion is maximized (IBM, 2000).
Following the Major Factor Analysis the remaining data set was data mined using the IM
demographic clustering. As a result the data set was segmented into 9 clusters as shown in
fig. 2 (Golosinski, Hu, and Elias, 2001). The three largest clusters each account for 14%
of the whole data set.
Figure 2. Demographic Clustering - IM Output
Figs. 3 and 4 show zoomed views of the clusters related to haul distance and to truck
payload. The haul distance cluster, shown in fig. 3, indicates that the haul distance is one
of the main determinants of fuel consumption rate. Interestingly, the percentage of 6 to 10
mile long hauls in this cluster is approximately 40%, while the same percentage for the
whole population is only 5%. One possible explanation is that on the long hauls the truck
fuel consumption rate is larger since the truck spends more time running at full load. On
short hauls more time is spent on loading / dumping / maneuvering / waiting activities
during which fuel consumption is low.
Figure 3. Demographic Clustering: Haul Distance Cluster (Horizontal Scale: Haul Distance in Miles)
Fig. 4 shows the payload cluster. It indicates that all trucks in this cluster were empty
(100% of the cluster), while the percentage of empty trucks in the whole database is only
around 50%. All the trucks in the payload cluster were traveling in 4th gear at a speed of
25 to 35 MPH and the fuel consumption rate was average.
Figure 4. Demographic Clustering: Payload Cluster (Horizontal Scale: Payload in Tons)
The other clusters identified in this work are presented in fig. 2. These contain a
variety of information related to truck performance.
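The idea behind clustering by field-value comparison can be shown with a greatly simplified, one-pass sketch. It assumes one common reading of Condorcet's criterion, namely that records placed in the same cluster should agree on more fields than they disagree on. The function names and records are hypothetical, and the code is not the IM demographic clustering algorithm.

```python
def pair_score(a, b):
    """Number of fields that agree minus number of fields that disagree."""
    agree = sum(1 for x, y in zip(a, b) if x == y)
    return agree - (len(a) - agree)


def demographic_clustering(records):
    """Assign each record to the cluster it agrees with most, or open a new one.

    The number of clusters is not fixed in advance; a new cluster is created
    whenever no existing cluster gives a positive total score.
    """
    clusters = []
    for rec in records:
        best, best_score = None, 0
        for cluster in clusters:
            score = sum(pair_score(rec, member) for member in cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([rec])
        else:
            best.append(rec)
    return clusters


# Made-up categorical records: (haul distance band, load state, gear).
records = [
    ("long", "loaded", "3"),
    ("long", "loaded", "2"),
    ("short", "empty", "4"),
    ("short", "empty", "4"),
    ("long", "loaded", "3"),
    ("short", "empty", "5"),
]
clusters = demographic_clustering(records)
for i, cluster in enumerate(clusters, start=1):
    print(i, cluster)
```

Continuous parameters such as payload or haul distance would first have to be binned into ranges before records can be compared field by field in this way.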
Classification of VIMS Data Logger Data
Classification is used to segregate database records into pre-defined classes based on
specific criteria. Thus this technique can be used to define what truck operating or
condition parameters define fuel consumption rate, what parameters define its cycle time
and the like.
The tree-classification mining function builds a classification model as a binary
decision tree. Each interior node of the binary decision tree tests an attribute of a record.
If the attribute value satisfies the test, the record is sent down the left branch of the node.
If the attribute value does not meet the requirements, the record is sent down the right
branch of the node. The 4 classes are marked with different colors at the upper left corner.
They are reflected in the tree map as solid squares. The solid circles are the decision
nodes. The binary decision tree consists of the root node on top, followed by non-leaf
nodes and leaf nodes. Branches connect a node to 2 other nodes. Root and non-leaf nodes
are represented as pie charts. Leaf nodes are represented as rectangles.
Each node can display its characteristics in the window shown at the bottom of fig. 5.
This information includes:
Label: The predominant class label of the selected node.
Test: The split criterion for this node. This applies only to non-leaf nodes and
specifies a simple selection.
Records: The number of records contained in each of the sub-nodes of the selected node.
Distribution: The number of records corresponding to each of the possible class
labels. The classification is most meaningful if all records belong to one leaf node only.
However, by pruning the binary decision tree, records of other nodes can be assigned to
the selected node.
Purity: The percentage of correctly classified records assigned to a node.
Figure 5. Classification-Tree
The Tree Classification was done for the fuel consumption rate, leading to definition
of four classes of parameters that indicate high fuel consumption. These can be identified
by tracking the thick black line with the arrow that links the nodes and continues on to
the rectangles at the foot of figure 5. Use of color in computer generated figure 5 makes
the tracking easy.
Selected observations that can be made in this case indicate that:
1. When ground speed is in the range from 12.25 MPH to 15.5 MPH and the
payload is over 126.85 t, 96.8% of the 283 analyzed records indicate high engine
fuel consumption rate.
2. When ground speed is more than 31.5 MPH and actual gear is higher than 5, all
146 records show low engine fuel consumption rate.
3. The ground speed has more impact on the engine fuel consumption rate than do
other parameters.
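The way records travel down a binary decision tree, and how the purity of a leaf is obtained, can be sketched as follows. The split thresholds are those quoted in observation 1; the records, field names and helper functions are hypothetical and serve only to illustrate the mechanics.

```python
from collections import Counter


def split(records, test):
    """Send records that satisfy the test left, all others right."""
    left = [r for r in records if test(r)]
    right = [r for r in records if not test(r)]
    return left, right


def purity(records, label_key="fuel_rate_class"):
    """Predominant class label of a node and its share of the records (%)."""
    counts = Counter(r[label_key] for r in records)
    label, n = counts.most_common(1)[0]
    return label, 100.0 * n / len(records)


# Made-up records for illustration.
records = [
    {"ground_speed_mph": 13.0, "payload_t": 140.0, "fuel_rate_class": "high"},
    {"ground_speed_mph": 14.5, "payload_t": 150.0, "fuel_rate_class": "high"},
    {"ground_speed_mph": 15.0, "payload_t": 130.0, "fuel_rate_class": "medium"},
    {"ground_speed_mph": 13.5, "payload_t": 0.0, "fuel_rate_class": "low"},
    {"ground_speed_mph": 33.0, "payload_t": 0.0, "fuel_rate_class": "low"},
    {"ground_speed_mph": 14.0, "payload_t": 145.0, "fuel_rate_class": "high"},
]

# Two consecutive node tests, one attribute per node.
in_speed_range, _ = split(records, lambda r: 12.25 <= r["ground_speed_mph"] <= 15.5)
leaf, _ = split(in_speed_range, lambda r: r["payload_t"] > 126.85)
label, pct = purity(leaf)
print(len(leaf), label, pct)
```

In the same terms, the leaf reported in observation 1 holds 283 records with a purity of 96.8% for the high fuel consumption class, which corresponds to about 274 records.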
Sequential Pattern Recognition in VIMS Event Logs
Thousands of events are recorded during a life of an average mining truck. These are
stored in the Event data log of the VIMS database. All VIMS events are classified into
two categories, data event and maintenance event. The data event is related to the
machine operating status, such as low engine oil level. The maintenance event is related
to the machine control system, a problem of the VIMS itself, such as severed sensor wire.
VIMS and related tools can only list and tabulate events. Other tools are needed to
discover additional knowledge that may exist in VIMS database. One such tool is the
Sequential Pattern, one of the data mining methods of Intelligent Miner. It can be used to
discover similar sequential data patterns in VIMS data bases.
The inputs to sequential pattern analysis were: Serial Number, System Measurement
Unit (SMU) and Event Identifier, a combination of event description and event level. The
minimum support level was set at 80%. A database of 42,514 events recorded on 12 trucks
was data mined. These included 69 event types. As a result 77,327 sequential patterns
were identified. The sequential patterns were identified to exist in all the 12 machines at
some point of time. The events Engine Oil Level, TC Out Temperature and VIMS
Snapshot show a particularly strong relationship with each other.
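The notion of support for a sequential pattern can be illustrated with a short sketch. It assumes that support is the percentage of trucks whose SMU-ordered event history contains the pattern as an ordered, not necessarily contiguous, subsequence. The serial numbers, SMU readings and function names are hypothetical, and the code does not reproduce the IM Sequential Pattern function.

```python
from collections import defaultdict


def sequences_by_truck(events):
    """Group (serial_number, smu, event_id) rows into SMU-ordered sequences."""
    grouped = defaultdict(list)
    for serial, smu, event_id in events:
        grouped[serial].append((smu, event_id))
    return {serial: [e for _, e in sorted(rows)] for serial, rows in grouped.items()}


def contains(sequence, pattern):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(step in remaining for step in pattern)


def support(sequences, pattern):
    """Percentage of trucks whose event sequence contains the pattern."""
    hits = sum(1 for seq in sequences.values() if contains(seq, pattern))
    return 100.0 * hits / len(sequences)


EOL, TC, SNAP = "Engine Oil Level", "TC Out Temperature", "VIMS Snapshot"
# Made-up event log: (serial number, SMU reading, event identifier).
events = [
    ("TRK1", 100, EOL), ("TRK1", 120, TC),
    ("TRK2", 50, TC), ("TRK2", 60, EOL), ("TRK2", 90, TC),
    ("TRK3", 10, EOL), ("TRK3", 15, SNAP), ("TRK3", 30, TC),
    ("TRK4", 70, TC), ("TRK4", 80, EOL),
    ("TRK5", 5, EOL), ("TRK5", 6, TC),
]
sequences = sequences_by_truck(events)
pattern_support = support(sequences, (EOL, TC))
print(pattern_support, pattern_support >= 80.0)
```

A pattern is retained only if its support reaches the chosen minimum, which was 80% in this investigation.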
Engine Oil Level monitors engine oil level and informs engine ECM when it drops
below acceptable minimum. This is an on/off switch type signal with switch open when
the oil level is low. TC Out Temperature monitors oil temperature on the outlet side of
the torque converter. Sensor signal pulse width changes as the torque converter oil
temperature changes. VIMS then determines the temperature based on the width of the sensor
signal pulse. VIMS Snapshot is a percentage of memory space that is left available for
storing of the VIMS Snapshot data.
The physical interpretation of the relation between the first two of the above is clear,
while the last one is rather unusual. Since VIMS Snapshot is triggered by full VIMS Snapshot
memory it may happen when the data is not downloaded in a timely manner. On the other hand
more detailed analysis indicates that the first two occupy a large part of VIMS Snapshot
memory, especially when the events take place frequently, or when the operator
repeatedly ignores these two events, leading to overfilling of the memory.
Information on the relation between the engine oil level event and the torque
converter temperature event allows prediction of an increase of torque converter temperature
from the reading of engine oil level and vice versa. As such it is important to the task of
this work.
Overall the sequential patterns identified to exist in VIMS Event database with high
confidence level constitute very low percentage of all events. This does not disqualify
Sequential Pattern as a data mining method useful for mining VIMS databases. However,
it indicates the need to revise the approach used in future investigations. Possible changes
are: use of larger databases that include data collected from a variety of trucks and from
different mines, inclusion into analyzed databases of other data, external to VIMS, and
increasing the size of VIMS data sets so that these cover extended periods of truck
operation.
DISCUSSION
Work presented in this paper indicates that data mining of VIMS-generated
databases allows for discovery of knowledge contained in these databases. In particular
relations that exist between various VIMS-collected data can be identified, described and
quantified. This can be achieved through use of two IM statistical functions: linear
regression and factor analysis. While both these functions can serve this purpose, Factor
Analysis also significantly reduces the number of variables that need to be considered
and groups all related parameters into factors.
The clustering can be used to segment a VIMS database through grouping of data that
have similar characteristics. This allows identification of the parameters that are of key
importance to truck performance. Thus, as an example, the parameters that influence
truck fuel consumption rate can be defined. Further work is needed to fully define
applicability of clustering to the problem at hand. It appears that clear definition of
relations or goals sought may be needed to realize the full potential of this method.
The classification was applied to interpret clustering results. It allowed for
quantification of the impact that various VIMS parameters have on truck fuel consumption. It
was further proven that classification can be used to build a model that describes the
behaviour of VIMS parameters that are of interest. The research indicates that
classification alone does not yield meaningful results. It does, however, yield these
results if used in conjunction with clustering.
Sequential patterns are usually used to find predictable patterns of behavior for a
given phenomenon over a period of time. In relation to VIMS parameters the intent was
to be able to predict occurrence of a specific event based on occurrence of a similar event
in the past. However, this approach was unsuccessful. While a multitude of similar
patterns was identified to exist in the data, their variability was too large to permit drawing
of valid conclusions on their repeatability. It is believed that a more complete VIMS
database, collected over an extended period of time and in a variety of truck operating
conditions, may overcome this problem and permit development of predictive models of
truck behaviour.
The ultimate goal of this work is to construct a model of truck operation that would
allow projection of truck condition and performance into the future. The relations discovered
to exist between various VIMS parameters lay a foundation for this modeling work. Of
particular importance is use of Factor Analysis that reduces the number of parameters to
a manageable level. Classification and clustering allow for analysis of truck performance
and indirectly its optimization. Use of sequential patterns needs further study.
CONCLUSIONS
The investigations presented above prove that data mining can be used to analyze
performance of mining equipment. In particular the relations between its various
operating, condition and performance parameters can be defined, and quantified using
regression and factor analysis methods.
VIMS provides a variety of data that quantify truck condition and performance. To
maximize use of this data it needs to be collected continuously over extended periods of
time and under a variety of operating and climatic conditions.
Intelligent Miner contains a variety of data mining tools, many of which can be used
to successfully data mine VIMS databases. However, the input data format of IM is not
compatible with that of VIMS generated data. Therefore an interface between the two is
needed that facilitates fast data reformatting.
More investigations are needed to fully define the applicability of data mining to
knowledge discovery in VIMS databases. Use of databases collected over extended
period of time and containing data external to VIMS that define truck operating condition
may be the key factor in determining this applicability.
ACKNOWLEDGEMENT
Financial support of investigations presented in this paper by the University of Missouri
Research Board is gratefully acknowledged.
REFERENCES
Berson, A. and Smith, S.J. 1997. “Data Warehousing, Data Mining and OLAP”. McGraw-Hill.
Westphal, C. and Blaxton, T. 1998. “Data Mining Solutions”. John Wiley and Sons, Inc.
Caterpillar, Inc. 1999. “Vital Information Management System (VIMS): System Operation Testing
and Adjusting”. Company publication.
IBM (International Business Machines Corporation). 2000. Manual: “Using the Intelligent Miner for
Data”. Company publication.
Golosinski, T.S. 2001. “Data Mining Uses in Mining”. Proceedings, APCOM 2001, Beijing, China.
Golosinski, T.S., Hu, Hui and Elias, R. 2001. “Data Mining VIMS for Information on Truck
Condition”. APCOM 2001, Beijing, China.
Ataman, K. 2001. M.S. Thesis: “Data Mining for Prediction of Condition and Performance of Mine
Machinery”. University of Missouri-Rolla publication.
Madiba E. 2001. M.S. Thesis: “Application of IBM DB2 Intelligent Miner for data to mine Vital
Information Management System (VIMS) data”. University of Missouri-Rolla publication.