A Comprehensive Review for Industrial Applicability of Artificial Neural Networks

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 50, NO. 3, JUNE 2003
Magali R. G. Meireles, Paulo E. M. Almeida, Student Member, IEEE, and
Marcelo Godoy Simões, Senior Member, IEEE
Abstract—This paper presents a comprehensive review of the
industrial applications of artificial neural networks (ANNs), in
the last 12 years. Common questions that arise for practitioners and control engineers while deciding how to use NNs for specific industrial tasks are answered. Workable issues regarding implementation details, training, and performance evaluation of such algorithms are also discussed, based on a judicious chronological organization of topologies and training methods effectively
used in the past years. The most popular ANN topologies and
training methods are listed and briefly discussed, as a reference
to the application engineer. Finally, ANN industrial applications
are grouped and tabulated by their main functions and what they actually performed in the referenced papers. The authors
prepared this paper bearing in mind that an organized and
normalized review would be suitable to help industrial managerial
and operational personnel decide which kind of ANN topology and
training method would be adequate for their specific problems.
Index Terms—Architecture, industrial control, neural network
(NN) applications, training.
I. INTRODUCTION
In engineering and physics domains, algebraic and differential equations are used to describe the behavior and functioning properties of real systems and to create mathematical
models to represent them. Such approaches require accurate
knowledge of the system dynamics and the use of estimation
techniques and numerical calculations to emulate the system
operation. The complexity of the problem itself may introduce
uncertainties, which can make the modeling nonrealistic or inaccurate. Therefore, in practice, approximate analysis is used
and linearity assumptions are usually made. Artificial neural
networks (ANNs) implement algorithms that attempt to achieve
a neurological related performance, such as learning from experience, making generalizations from similar situations and
judging states where poor results were achieved in the past.
ANN history begins in the early 1940s. However, only in the
mid-1980s did these algorithms become scientifically sound and capable of application. Since the late 1980s, ANNs have been utilized in a plethora of industrial applications.

Manuscript received October 23, 2001; revised September 20, 2002. Abstract published on the Internet March 4, 2003. This work was supported by the National Science Foundation under Grant ECS 0134130.
M. R. G. Meireles was with the Colorado School of Mines, Golden, CO 80401 USA. She is now with the Mathematics and Statistics Department, Pontifical Catholic University of Minas Gerais, 30.535-610 Belo Horizonte, Brazil (e-mail: magali@pucminas.br).
P. E. M. Almeida was with the Colorado School of Mines, Golden, CO 80401 USA. He is now with the Federal Center for Technological Education of Minas Gerais, 30.510-000 Belo Horizonte, Brazil (e-mail: paulo@dppg.cefetmg.br).
M. G. Simões is with the Colorado School of Mines, Golden, CO 80401 USA (e-mail: msimoes@mines.edu).
Digital Object Identifier 10.1109/TIE.2003.812470
Nowadays, ANNs are being applied to many real-world industrial problems, from functional prediction and system
modeling (where physical processes are not well understood
or are highly complex), to pattern recognition engines and
robust classifiers, with the ability to generalize while making
decisions about imprecise input data. The ability of ANN to
learn and approximate relationships between input and output
is decoupled from the size and complexity of the problem
[49]. Actually, as relationships based on inputs and outputs
are enriched, the approximation capability improves. ANNs offer ideal solutions for speech, character, and signal recognition.
There are many different types of ANN. Some of the more
popular include multilayer perceptron (MLP) (which is generally
trained with the back-propagation of error algorithm), learning
vector quantization, radial basis function (RBF), Hopfield and
Kohonen, to name a few. Some ANNs are classified as feedforward while others are recurrent (i.e., implement feedback), depending on how data is processed through the network.
Another way of classifying ANN types is by their learning
method (or training), as some ANN employ supervised training,
while others are referred to as unsupervised or self-organizing.
This paper concentrates on industrial applications of neural
networks (NNs). It was found that training methodology is more
conveniently associated with a classification of how a certain
NN paradigm is supposed to be used for a particular industrial problem. There are some important questions to answer,
in order to adopt an ANN solution for achieving accurate, consistent and robust modeling. What is required to use an NN?
How are NNs superior to conventional methods? What kind of
problem functional characteristics should be considered for an
ANN paradigm? What kind of structure and implementation
should be used in accordance to an application in mind? This
article will follow a procedure that will bring such managerial questions together and into a framework that can be used
to evaluate where and how such technology fits for industrial
applications, by laying out a classification scheme by means
of clustered concepts and distinctive characteristics of ANN
engineering.
II. NN ENGINEERING
Before laying out the foundations for choosing the best ANN
topology, learning method and data handling for classes of
industrial problems, it is important to understand how artificial
intelligence (AI) evolved with required computational resources.
Artificial intelligence applications moved away from laboratory experiments to real-world implementations. Therefore, software complexity also became an issue, since conventional Von
Neumann machines are not suitable for symbolic processing,
nondeterministic computations, dynamic execution, parallel,
distributed processing, and management of extensive knowledge bases [118].
In many AI applications, the knowledge needed to solve
a problem may be incomplete, because the source of the
knowledge is unknown at the time the solution is devised, or
the environment may be changing and cannot be anticipated
at design time. AI systems should be designed with an open
concept that allows continuous refinement and acquisition of
new knowledge.
There exist engineering problems for which finding the perfect solution requires a practically impossible amount of resources and for which an acceptable solution is sufficient. NNs can give
good solutions for such classes of problems. Tackling the best
ANN topology, learning method and data handling themselves
become engineering approaches. The success of using ANN for
any application depends highly on the data processing (i.e., data
handling before or during network operation). Once variables
have been identified and data has been collected and is ready to use, one can process it in several ways, to squeeze more information out of it and to filter it.
A common technique for coping with nonnormal data is to
perform a nonlinear transform to the data. To apply a transform,
one simply takes some function of the original variable and uses
the functional transform as a new input to the model. Commonly
used nonlinear transforms include powers, roots, inverses, exponentials, and logarithms [107].
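As a minimal illustration of such preprocessing (not from the original paper; the variable and its data are hypothetical), the following Python sketch derives several transformed inputs from one raw variable:

```python
import numpy as np

# Raw process variable (hypothetical data), kept strictly positive so
# that roots, inverses, and logarithms are well defined.
flow_rate = np.array([1.2, 3.4, 10.5, 55.0, 210.0])

# Candidate nonlinear transforms, each used as an extra model input.
inputs = np.column_stack([
    flow_rate,            # original variable
    flow_rate ** 2,       # power
    np.sqrt(flow_rate),   # root
    1.0 / flow_rate,      # inverse
    np.log(flow_rate),    # logarithm
])
print(inputs.shape)  # (5 samples, 5 derived inputs)
```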
A. Assessment of NN Performance
An ANN must be used in problems exhibiting knottiness,
nonlinearity, and uncertainties that justify its utilization [45].
They present the following features to cope with such complexities:
• learning from training data, used for system identification; finding a set of connection strengths that will allow the network to carry out the desired computation [96];
• generalization from inputs not previously presented during the training phase; accepting an input and producing a plausible response determined by the internal ANN connection structure makes such a system robust against noisy data, a feature exploited in industrial applications [59];
• mapping of nonlinearities, making them suitable for identification in process control applications [90];
• parallel processing capability, allowing fast processing for large-scale dynamical systems;
• applicability to multivariable systems, as they naturally process many inputs and have many outputs;
• usability as a black-box approach (with no prior knowledge about a system), implementable on compact processors for space- and power-constrained applications.
In order to select a good NN configuration, there are several
factors to take into consideration. The major points of interest
regarding the ANN topology selection are related to network
design, training, and practical considerations [25].
B. Training Considerations
Considerations, such as determining the input and output
variables, choosing the size of the training data set, initializing
network weights, choosing training parameter values (such
as learning rate and momentum rate), and selecting training
stopping criteria, are important for several network topologies.
There is no generic formula that can be used to choose the
parameter values. Some guidelines can be followed as an initial
trial. After a few trials, the network designer should have
enough experience to set appropriate criteria that suit a given
problem.
The initial weights of an NN play a significant role in the convergence of the training method. Without a priori information
about the final weights, it is a common practice to initialize all
weights randomly with small absolute values. In learning vector quantization and derived techniques, it is usually required to renormalize the weights at every training epoch. A critical parameter is the speed of convergence, which is determined by
the learning coefficient. In general, it is desirable to have fast
learning, but not so fast as to cause instability of learning iterations. Starting with a large learning coefficient and reducing
it as the learning process proceeds, results in both fast learning
and stable iterations. The momentum coefficients are, usually,
set according to a schedule similar to the one for the learning
coefficients [128].
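The schedule described above can be sketched in Python as follows; the initial coefficient, decay constant, and weight range are illustrative assumptions, not values from the paper:

```python
import numpy as np

def learning_rate(epoch, eta0=0.5, decay=0.01):
    # Start with a large coefficient and shrink it as training proceeds,
    # trading early speed for late stability; the momentum coefficient
    # could follow a similar decaying schedule.
    return eta0 / (1.0 + decay * epoch)

rng = np.random.default_rng(0)
# Without prior information, initialize all weights randomly with
# small absolute values, as the text recommends.
weights = rng.uniform(-0.1, 0.1, size=(4, 3))

for epoch in [0, 10, 100, 1000]:
    print(epoch, round(learning_rate(epoch), 4))
```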
Selection of training data plays a vital role in the performance
of a supervised NN. The number of training examples used to
train an NN is sometimes critical to the success of the training
process. If the number of training examples is not sufficient,
then the network cannot correctly learn the actual input–output
relation of the system. If the number of training examples is too
large, then the network training time will be longer. For some
applications, such as real-time adaptive neural control, training
time is a critical variable. For others, such as training the network to perform fault detection, the training can be performed
off-line, and more training data are preferred over insufficient training data, to achieve greater network accuracy. Generally, rather than focusing on volume, it is better to concentrate
on the quality and representational nature of the data set. A good
training set should contain routine, unusual and boundary-condition cases [8].
Popular criteria used to stop network training are a small mean-square training error and small changes in network weights. The definition of how small is usually up to the network designer and is based on the desired accuracy level of the NN. In a motor bearing fault diagnosis process, for example, a learning rate of 0.01 and a momentum of 0.8 were used; the training was stopped when the root mean-square error of the training set or the change in network weights was sufficiently small for that application (less than 0.005) [72]. In general, if any prior information about the relationship between inputs and outputs is available and used correctly, the network structure and training time can be reduced and the network accuracy can be significantly improved.
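A minimal sketch of such a stopping rule, reusing the 0.005 tolerance quoted above for the bearing-fault example (the end-of-epoch statistics are hypothetical):

```python
def should_stop(rmse, weight_change, tol=0.005):
    # Stop when either the root mean-square training error or the
    # change in network weights is sufficiently small; 0.005 is the
    # tolerance quoted for the bearing-fault example above.
    return rmse < tol or weight_change < tol

# Hypothetical end-of-epoch statistics:
print(should_stop(rmse=0.0049, weight_change=0.02))  # True
print(should_stop(rmse=0.10, weight_change=0.01))    # False
```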
TABLE I
ORGANIZATION OF NNS BASED ON THEIR FUNCTIONAL CHARACTERISTICS
C. Network Design
Some of the design considerations include determining the
number of input and output nodes to be used, the number of
hidden layers in the network and the number of hidden nodes
used in each hidden layer. The number of input nodes is typically taken to be the same as the number of state variables. The
number of output nodes is typically the number that identifies
the general category of the state of the system. Each node constitutes a processing element and it is connected through various weights to other elements. In the past, there was a general
practice of increasing the number of hidden layers, to improve
training performance. Keeping the number of layers at three and adjusting the number of processing elements in the hidden layer can achieve the same goal. A trial-and-error approach is usually
used to determine the number of hidden layer processing elements, starting with a low number of hidden units and increasing
this number as learning problems occur. Even though choosing these parameters is still a trial-and-error process, there are some guidelines that can be used (e.g., testing the network's performance). It is a common practice to choose a set of training data
and a set of testing data that are statistically significant and representative of the system under consideration. The training data
set is used to train the NN, while the testing data is used to test
the network performance, after the training phase finishes.
D. Practical Considerations
Practical considerations regarding the network accuracy, robustness, and implementation issues must be addressed for real-world implementation. For ANN applications, estimation performance is usually considered good when pattern recognition achieves more than 95% accuracy in overall and comprehensive data recalls [25]. Selection and implementation
of the network configuration needs to be carefully studied since
it is desirable to use the smallest possible number of nodes
while maintaining a suitable level of conciseness. Pruning algorithms try to make NNs smaller by trimming unnecessary
links or units, so the cost of runtime, memory and hardware
implementation can be minimized and generalization is improved. Depending on the application, some system functional
characteristics are important in deciding which ANN topology
should be used [81]. Table I summarizes the most common ANN
structures used for pattern recognition, associative memory,
optimization, function approximation, modeling and control,
image processing, and classification purposes.
III. EVOLUTION OF UNDERLYING FUNDAMENTALS THAT
PRECEDED INDUSTRIAL APPLICATIONS
While there are several tutorials and reviews discussing the
full range of NNs topologies, learning methods and algorithms,
the authors in this paper intend to cover what has actually been applied to industrial problems. An initial historical perspective is important to get the picture of the age of industrial applications, which started in 1988, just after the release of [123]
by Widrow.
It is well known that the concept of NNs came into existence around the Second World War. In 1943, McCulloch
and Pitts proposed the idea that a mind-like machine could be
manufactured by interconnecting models based on behavior
of biological neurons, laying out the concept of neurological
networks [77]. Wiener gave this new field the popular name
cybernetics, whose principle is the interdisciplinary relationship among engineering, biology, control systems, and brain
functions [125]. At that time, computer architecture was not
fully defined and the research led to what is today defined
as the Von Neumann-type computer. With the progress in research on the brain and computers, the objective changed from
the mind-like machine to manufacturing a learning machine,
for which Hebb’s learning model was initially proposed [53].
In 1958, Rosenblatt from the Cornell Aeronautical Laboratory put together a learning machine, called the “perceptron.”
That was the predecessor of current NNs. He gave specific
design guidelines used by the early 1960s [91]. Widrow and
Hoff proposed the “ADALINE” (ADAptive LINear Element),
a variation on the perceptron, based on a supervised learning rule (the "error correction rule") which could learn in a faster and more accurate way: synaptic strengths were changed in proportion to the error (the difference between what the output is and what it should
have been) multiplied by the input. Such a scheme was successfully used for echo cancellation in telephone lines and is
considered to be the first industrial application of NNs [124].
During the 1960s, the forerunner for current associative memory
systems was the work of Steinbuch with his “Learning Matrix,” which was a binary matrix accepting a binary vector
as input, producing a binary vector as output and capable of
forming associations between pairs with a Boolean Hebbian
learning procedure [108]. The perceptron generated considerable excitement when it was first introduced, because of its
conceptual simplicity. The ADALINE is a weighted sum of the
inputs, together with a least-mean-square (LMS) algorithm to
adjust the weights and to minimize the difference between the
desired signal and the actual output. Because of the rigorous
mathematical foundation of the LMS algorithm, ADALINE
has become a powerful tool for adaptive signal processing and
adaptive control, leading to work on competitive learning and
self-organization. However, Minsky and Papert proved mathematically that the Perceptron could not be used for a class
of problems defined as nonseparable logic functions [80].
Very few investigators conducted research on NNs during the
1970s. Albus developed his adaptive “Cerebellar Model Articulation Controller” (CMAC), which is a distributed table-lookup
system based on his view of models of human memory [1]. In
1974, Werbos originally developed the backpropagation algorithm. Its first practical application was to estimate a dynamic
model, to predict nationalism and social communications [120].
However, his work remained almost unknown in the scientific community for more than ten years. In the early 1980s,
Hopfield introduced a recurrent-type NN, which was based
on Hebbian learning law. The model consisted of a set of
first-order (nonlinear) differential equations that minimize a
certain energy function [55]. In the mid-1980s, backpropagation was rediscovered by two independent groups led by Parker
and Rumelhart et al., as the learning algorithm of feedforward
NNs [88], [95]. Grossberg and Carpenter made significant contributions with the “Adaptive Resonance Theory” (ART) in
the mid-1980s, based on the idea that the brain spontaneously
organizes itself into recognition codes and neurons organize
themselves to tune various and specific patterns defined as
self-organizing maps [20]. The dynamics of the network were
modeled by first-order differential equations based on implementations of pattern clustering algorithms. Furthermore,
Kosko extended some of the ideas of Grossberg and Hopfield
to develop his adaptive “Bi-directional Associative Memory”
(BAM) [67]. Hinton, Sejnowski, and Ackley developed the
“Boltzmann Machine,” which is a kind of Hopfield net that
settles into solutions by a simulated annealing process as a
stochastic technique [54]. Broomhead and Lowe first introduced “RBF networks” in 1988 [15]. Although the basic idea
of RBF had been developed before, under the name of the potential function method, their work opened another NN frontier. Chen proposed functional-link networks (FLNs), where a nonlinear functional transform of the network inputs aimed at lower computational effort and fast convergence [22].
The 1988 DARPA NN study listed various NN applications
supporting the importance of such technology for commercial
and industrial applications and triggering a lot of interest in
the scientific community, leading eventually to applications
in industrial problems. Since then, the application of ANN to
sophisticated systems has skyrocketed. NNs found widespread
relevance for several different fields. Our literature review
showed that practical industrial applications were reported in
peer-reviewed engineering journals from as early as 1988.
Extensive use has been reported in pattern recognition and classification for image and speech recognition; optimization in planning of actions, motions, and tasks; and modeling, identification, and control.
Fig. 1 shows some industrial applications of NNs reported in
the last 12 years. The main purpose here is to give an idea of the
most used ANN topologies and training algorithms and to relate
them to common fields in the industrial area. For each entry,
the type of application, the ANN topology used, the training algorithm implemented, and the main authors are presented. The collected data give a good picture of what has actually migrated
from academic research to practical industrial fields and shows
some of the authors and groups responsible for this migration.

Fig. 1. Selected industrial applications reported since 1989.
IV. DILEMMATIC PIGEONHOLE OF NEURAL STRUCTURES
Choosing an ANN solution for an immediate application is
a situation that requires a choice between options that are (or
seem) equally unfavorable or mutually exclusive. Several issues must be considered from the problem's point of view.
The main features of an ANN can be classified as follows [81].
• Topology of the networks: multilayered, single-layered, or recurrent. The network is multilayered if it has
distinct layers such as input, hidden and output. There
are no connections among the neurons within the same
layer. If each neuron can be connected with every other
neuron in the network through directed edges, except
the output node, this network is called single layered
(i.e., there is no hidden layer). A recurrent network
distinguishes itself from the feedforward topologies in
that it has at least one feedback loop [49].
• Data flow: recurrent or nonrecurrent. In a nonrecurrent or feedforward model, the outputs always propagate from left to right in the diagrams. The
outputs from the input layer neurons propagate to the
right, becoming inputs to the hidden layer neurons and
then, outputs from the hidden layer neurons propagate
to the right, becoming inputs to the output layer
neurons. An NN, in which the outputs can propagate
in both directions, forward and backward, is called a
recurrent model.
• Types of input values: binary, bipolar or continuous. Neurons in an artificial network can be defined to
process different kinds of signals. The most common types are binary (restricted to either 0 or 1), bipolar (either −1 or +1), and continuous (continuous real numbers in a certain range).
• Forms of activation: linear, step, or sigmoid. Activation functions define the way neurons will behave inside the network structure and, therefore, the kind of relationship that will occur between input and output signals; minimal examples are sketched below.
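The input types and activation forms listed above can be made concrete with a short Python sketch (illustrative only; the threshold value and the choice of tanh as the bipolar activation are assumptions):

```python
import numpy as np

def linear(s):
    return s

def step(s, threshold=0.0):
    # Binary output: 1 above the threshold, 0 otherwise.
    return np.where(s > threshold, 1, 0)

def sigmoid(s):
    # Continuous output in (0, 1).
    return 1.0 / (1.0 + np.exp(-s))

def bipolar_sigmoid(s):
    # Continuous output in (-1, +1); tanh is one common bipolar choice.
    return np.tanh(s)

s = np.array([-2.0, 0.0, 2.0])  # summed inputs to a neuron
print(linear(s), step(s), sigmoid(s).round(2), bipolar_sigmoid(s).round(2))
```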
A common classification of ANNs is based on the way in
which their elements are interconnected. There is a study that
presents the approximate percentage of network utilization as:
MLP, 81.2%; Hopfield, 5.4%; Kohonen, 8.3%; and the others,
5.1% [49]. This section will cover the main types of networks
that have been used in industrial applications, in a reasonable
number of reports and applications. A comprehensive listing of
all available ANN structures and topologies is out of the scope
of this discussion.
A. MLPs
In this structure, each neuron output is connected to every neuron of the subsequent layer, with layers connected in cascade and no connections between neurons in the same layer. A typical diagram of this structure is detailed in Fig. 2.

Fig. 2. MLP basic topology.
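A minimal sketch of a forward pass through such a topology, assuming sigmoid activations and an arbitrary 3-4-2 layer arrangement (not a specific network from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_forward(x, weights, biases):
    # Propagate the input through the layers in cascade; sigmoid
    # activations everywhere, no connections within a layer.
    a = x
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))
    return a

# Hypothetical 3-4-2 network: 3 inputs, 4 hidden nodes, 2 outputs.
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(mlp_forward(np.array([0.5, -1.0, 2.0]), weights, biases))
```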
MLP has been reported in several applications. Some examples are speed control of dc motors [94], [117], diagnostics
of induction motor faults [24], [25], [41], [42], induction
motor control [17], [18], [56], [127], and current regulator for
pulsewidth-modulation (PWM) rectifiers [31]. Maintenance
and sensor failure detection was reported in [82], check valves
operating in a nuclear power plant [57], [114], and vibration
monitoring in rolling element bearings [2]. It was widely
applied in feedback control [19], [40], [52], [59], [87], [89],
[109], [110] and fault diagnosis of robotic systems [116].
This structure was also used in a temperature control system
[63], [64], monitoring feed water flow rate and component
thermal performance of pressurized water reactors [61], and
fault diagnosis in a heat exchanger continuous stirred tank
reactor system [102]. It was used in a controller for turbo
generators [117], digital current regulation of inverter drivers
[16], and welding process modeling and control [4], [32]. The
MLP was used in modeling chemical process systems [12], to
produce quantitative estimation of concentration of chemical
components [74], and to select powder metallurgy materials
and process parameters [23]. Optimization of the gas industry
was reported by [121], as well as prediction of daily natural gas
consumption needed by gas utilities [65]. The MLP is indeed
the most used structure and spread out across several disciplines
like identification and defect detection on woven fabrics [99],
prediction of paper cure in the papermaking industry [39],
a controller for steering a backup truck [85], and modeling of plate
rolling processes [46].
B. Recurrent NNs (RNNs)
A network is called recurrent when the inputs to the neurons
come from external input, as well as from the internal neurons,
consisting of both feedforward and feedback connections between layers and neurons. Fig. 3 shows such a structure; it was demonstrated that recurrent networks could be effectively used for modeling, identification, and control of nonlinear dynamical systems [83].

Fig. 3. Typical recurrent network structure.
The trajectory-tracking problem, of controlling the nonlinear
dynamic model of a robot, was evaluated using an RNN; this
network was used to estimate the dynamics of the system and its
inverse dynamic model [97]. It was also used to control robotic
manipulators, facilitating the rapid execution of the adaptation
process [60]. A recurrent network was used to approximate a
trajectory tracking to a very high degree of accuracy [27]. It
was applied to estimate the spectral content of noisy periodic
waveforms that are common in engineering processes [36]. The
Hopfield network model is the most popular type of recurrent
NN. It can be used as associative memory and can also be applied to optimization problems. The basic idea of the Hopfield
network is that it can store a set of exemplar patterns as multiple
stable states. Given a new input pattern, which may be partial
or noisy, the network can converge to one of the exemplar patterns nearest to the input pattern. As shown in Fig. 4, a Hopfield
network consists of a single layer of neurons. The network is
recurrent and fully interconnected (every neuron in the network
is connected to every other neuron). Each input/output takes a discrete bipolar value of either −1 or +1 [81].

Fig. 4. Hopfield network structure.
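A minimal sketch of this store-and-recall behavior, assuming Hebbian outer-product storage and synchronous sign updates (a simplification; the exemplar patterns are hypothetical):

```python
import numpy as np

def hopfield_store(patterns):
    # Hebbian outer-product storage of bipolar exemplars; zero diagonal.
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0.0)
    return W

def hopfield_recall(W, x, steps=10):
    # Iterate toward the stored exemplar nearest to the input.
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1
    return x

exemplars = np.array([[1, -1, 1, -1, 1, -1],
                      [1, 1, 1, -1, -1, -1]])
W = hopfield_store(exemplars)
noisy = np.array([1, -1, 1, -1, 1, 1])   # first exemplar, one bit flipped
print(hopfield_recall(W, noisy))         # recovers [ 1 -1  1 -1  1 -1]
```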
A Hopfield network was applied to the problem of linear system identification, minimizing the least-square error of the estimates of the state variables [29]. A modified Hopfield structure was used to determine imperfections by the degree of orthogonality between the automatically extracted feature, from the send-through image, and the class feature of early good samples. The performance measure used for such an automatic feature extraction is based on a certain mini-max cost function useful for image classification [112]. Simulation results illustrated the Hopfield network's use, showing that this technique can be used to identify the frequency transfer functions of dynamic plants [28]. An approach to detect and isolate faults in linear dynamic systems was proposed, with system parameters estimated by a Hopfield network [105].
C. Nonrecurrent Unsupervised Kohonen Networks
A Kohonen network is a structure of interconnected processing units that compete for the signal. It uses unsupervised
learning and consists of a single layer of computational nodes
(and an input layer). This type of network uses lateral feedback,
which is a form of feedback whose magnitude is dependent on
the lateral distance from the point of application. Fig. 5 shows
the architecture with two layers. The first is the input layer
and the second is the output layer, called the Kohonen layer.
Fig. 5. Kohonen network structure.
Every input neuron is connected to every output neuron with its
associated weight. The network is nonrecurrent; input information propagates only from left to right. Continuous (rather than binary or bipolar) input values representing patterns are presented sequentially in time through the input layer, without specifying the desired output. The output neurons can be arranged in one or two dimensions. A neighborhood parameter, or radius, $r$, can be defined to indicate the neighborhood of a
specific neuron. It has been used as a self-organization map for
classification [98] and pattern recognition purposes in general.
D. CMAC
The input mapping of the CMAC algorithm can be seen as
a set of multidimensional interlaced receptive fields, each one
with finite and sharp borders. Any input vector to the network
excites some of these fields, while the majority of the receptive
fields remain unexcited, not contributing to the corresponding
output. On the other hand, the weighted average of the excited
receptive fields will form the network output. Fig. 6 shows a
schematic diagram of this structure. This figure depicts the nonlinear input mapping in the Albus approach and a hashing operation that can be performed to decrease the amount of memory
needed to implement the receptive fields.

Fig. 6. CMAC network structure.
CMAC networks are considered local algorithms because, for
a given input vector, only a few receptive fields will be active
and contribute to the corresponding network output [3]. In the
same way, the training algorithm for a CMAC network should
affect only the weights corresponding to active fields, excluding
the majority of inactive fields in the network. This increases the
efficiency of the training process, minimizing the computational
efforts needed to perform adaptation in the whole network.
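The locality of CMAC training can be sketched as follows; the 1-D input, tiling parameters, and hashing scheme are illustrative assumptions in the spirit of the Albus approach, not the original implementation:

```python
import numpy as np

class TinyCMAC:
    # Sketch of a 1-D CMAC: overlapping receptive fields (tilings) hashed
    # into a fixed weight table.
    def __init__(self, n_tilings=8, width=0.5, table_size=512, seed=0):
        self.n = n_tilings
        self.width = width
        self.w = np.zeros(table_size)
        self.offsets = np.random.default_rng(seed).uniform(0, width, n_tilings)

    def active(self, x):
        # One excited field per tiling; hashing shrinks the memory needed.
        return [hash((t, int((x + self.offsets[t]) // self.width))) % len(self.w)
                for t in range(self.n)]

    def predict(self, x):
        return self.w[self.active(x)].sum()

    def train(self, x, target, beta=0.5):
        idx = self.active(x)
        error = target - self.w[idx].sum()
        self.w[idx] += beta * error / self.n   # only active fields change

net = TinyCMAC()
for _ in range(200):
    for x in np.linspace(0.0, 3.0, 31):
        net.train(x, np.sin(x))
print(round(net.predict(1.5), 2))  # close to sin(1.5), about 1.0
```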
CMAC was primarily applied to complex robotic systems
involving multiple feedback sensors and multiple command
variables. Common experiments involved control of position
and orientation of an object using a video camera mounted
at the end of a robot arm and moving objects with arbitrary
orientation relative to the robot [79]. This network was also
used for air-to-fuel ratio control of automotive fuel-injection
systems. Experimental results showed that the CMAC is very
effective in learning the engine nonlinearities and in dealing
with the significant time delays inherent in engine sensors [76].
E. Adaptive Resonance Theory (ART) (Recurrent,
Unsupervised)
The main feature of ART, when compared to other similar
structures, is its ability to not forget after learning. Usually,
NNs are not able to learn new information without damaging
what was previously ascertained. This is caused by the fact
that when a new pattern is presented to an NN in the phase
of learning, the network tries to modify the weights at node
inputs, which only represent what was previously learned. The
ART network is recurrent and self-organizing. Its structure is
shown in Fig. 7. It has two basic layers and no hidden layers.
The input layer is also called “comparing” while the output
layer is called “recognizing.” This network is composed of two
completely interconnected layers in both directions [58]. It was
successfully used for sensor pattern interpretation problems
[122], among others.

Fig. 7. Adaptive resonance theory network.
Fig. 8. RBF network structure.
F. RBF Networks
Fig. 8 shows the basic structure of an RBF network. The input
nodes pass the input values to the connecting arcs and the first
layer connections are not weighted. Thus, each hidden node
receives each input value unaltered. The hidden nodes are the
RBF units. The second layer of connections is weighted and the
output nodes are simple summations. This network does not extend to more layers.
For applications such as fault diagnosis, RBF networks offer
advantages over MLP. It is faster to train because training of
the two layers is decoupled [70]. This network was used to
improve the quantity and the quality of the galvanneale sheet
produced on galvanizing lines, integrating various approaches,
including quality monitoring, diagnosis, control, and optimization methods [13]. RBF was trained to evaluate and compare
the different grasping alternatives by a robotic hand, according
to geometrical and technological aspects of object surfaces,
as well as the specific task to be accomplished. The adoption
of RBF topology was justified by two reasons [34].
• In most cases, it presents higher training speed
when compared with ANN based on back-propagation
training methods.
• It allows an easier optimization of performance,
since the only parameter that can be used to modify its
structure is the number of neurons in the hidden layer.
Results using RBF networks were presented to illustrate that it is
possible to successfully control a generator system [43]. Power
electronic drive controllers have also been implemented with
these networks, in digital signal processors (DSPs), to attain robust properties [38].
G. Probabilistic NNs (PNNs)
PNNs are somewhat similar in structure to MLPs. Some
basic differences among them are the use of activation by
exponential functions and the connection patterns between
neurons. As a matter of fact, the neurons at the internal layers
are not fully connected, depending on the application at hand.
Fig. 9 depicts this structure, showing its basic differences from
ordinary MLP structure. PNN training is normally easy and
instantaneous, because of the smaller number of connections.
A practical advantage is that, unlike other networks, it operates
completely in parallel and the signal flows in a unique direction,
without a need for feedback from individual neurons to the
inputs. It can be used for mapping, classification and associative
memory, or to directly estimate a posteriori probabilities [103],
[104]. Probabilistic NNs were used to assist operators while
identifying transients in nuclear power plants, such as plant
accident scenario, equipment failure or an external disturbance
to the system, at the earliest stages of their developments [6].

Fig. 9. Probabilistic ANN structure.
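A minimal sketch of PNN classification, assuming Gaussian pattern units and equal class priors (the training data and smoothing parameter are hypothetical):

```python
import numpy as np

def pnn_classify(x, train_X, train_y, sigma=0.3):
    # One exponential (Gaussian) pattern unit per training example; the
    # class score is proportional to an a posteriori probability under
    # equal priors, and the largest score wins.
    scores = {}
    for c in np.unique(train_y):
        Xc = train_X[train_y == c]
        d2 = ((Xc - x) ** 2).sum(axis=1)
        scores[c] = np.exp(-d2 / (2 * sigma ** 2)).mean()
    return max(scores, key=scores.get)

train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
print(pnn_classify(np.array([0.2, 0.1]), train_X, train_y))  # -> 0
```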
H. Polynomial Networks
Fig. 10 depicts a polynomial network. It has its topology
formed during the training process. Due to this feature, it is
defined as a plastic network. The neuron activation function is
based on elementary polynomials of arbitrary order. In this example, the network has seven inputs, although the network uses
only five of them. This is due to the automatic input selection
capability of the training algorithm. Automatic feature selection
is very useful in control applications when the plant model order
is unknown. Each neuron output can be expressed by a second-order polynomial function $y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + a_4 x_1^2 + a_5 x_2^2$, where $x_1$ and $x_2$ are inputs, $a_0, \ldots, a_5$ are polynomial coefficients which are equivalent to the network weights, and $y$ is the neuron output. The Group Method
of Data Handling (GMDH) is a statistics-based training method
largely used in modeling economic, ecological, environmental
and medical problems. The GMDH training algorithm can be
used to adjust the polynomial coefficients and to find the network structure. This algorithm employs two sets of data: one for estimating the network weights and the other for testing which neurons should survive during the training process. A new form
of implementation of a filter was proposed using a combination
of recurrent NN and polynomial NN [101].

Fig. 10. Polynomial ANN structure.
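A minimal sketch of fitting one such second-order neuron by least squares (the target surface is hypothetical); a full GMDH run would additionally use the second data set to decide which neurons survive:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_gmdh_neuron(x1, x2, y):
    # Least-squares fit of a0..a5 in
    # y = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1**2 + a5*x2**2.
    A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

x1, x2 = rng.normal(size=200), rng.normal(size=200)
y = 1.0 + 2.0 * x1 * x2 - 0.5 * x2 ** 2    # hypothetical target surface
print(np.round(fit_gmdh_neuron(x1, x2, y), 2))  # ~ [1. 0. 0. 2. 0. -0.5]
```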
I. FLNs
Since NNs are used for adaptive identification and control,
the learning capabilities of the networks can have significant
effects on the performance of closed-loop systems. If the information content of data input to a network can be modified online, then it will more easily extract salient features of the data.
The functional link acts on an element of an input vector, or on
all the input vectors, by generating a set of linearly independent
functions, then evaluating these functions with the pattern as the
argument. Thus, both the training time and training error of the
network can be improved [113]. Fig. 11 shows a functional link
NN, which can be considered a one-layer feedforward network
with an input preprocessing element. Only the weights in the
output layer are adjusted [66].

Fig. 11. Functional link ANN structure.
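A minimal sketch of a functional-link expansion followed by an LMS update of the single output layer; the particular basis functions are an illustrative assumption, not the expansion used in the cited works:

```python
import numpy as np

def functional_link_expand(x):
    # Generate linearly independent functions of the input pattern; a
    # small polynomial/trigonometric basis is used here for illustration.
    return np.concatenate([x, x ** 2, np.sin(np.pi * x), np.cos(np.pi * x)])

x = np.array([0.5, -0.25])
z = functional_link_expand(x)

# Only the single output layer of weights is adjusted, e.g., by LMS.
w = np.zeros_like(z)
eta, target = 0.1, 1.0
for _ in range(100):
    w += eta * (target - w @ z) * z
print(round(float(w @ z), 3))  # output approaches the target
```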
The application of an FLN was presented for heating, ventilating, and air conditioning (HVAC) thermal dynamic system
identification and control. The use of an NN provided a means of
adapting a controller online, in an effort to minimize a given cost
index. The identification networks demonstrated the capacity to
learn changes in the plant dynamics and to accurately predict future plant behavior [113]. A robust ANN controller for the motion control of rigid-link electrically driven robots using an FLN has also been presented; the method did not require the robot dynamics to be exactly known [69]. Multilayer feedforward and
FLN forecasters were used to model the complex relationship
between weather parameters and previous gas intake with future
consumption [65]. The FLN was used to improve performance
in the face of unknown nonlinear characteristics by adding nonlinear effects to the linear optimal controller of robotic systems
[66].
J. Functional Polynomial Networks (FPNs)
This network structure merges both models of functional
link and polynomial network resulting in a very powerful ANN
model due to the automatic input selection capability of the
polynomial networks. The FPN presents advantages such as fast
convergence, no local minima problem, structure automatically
defined by the training process, and no adjustment of learning
parameters. It has been tested for speed control with a dc motor
and the results have been compared with the ones provided by
an indirect adaptive control scheme based on MLPs trained by
backpropagation [101].
K. Cellular NNs (CNNs)
The most general definition for such networks is that they
are arrays of identical dynamical systems, called cells, which
are only locally connected. Only adjacent cells interact directly
with each other [78]. In the simplest case, a CNN is an array
of simple, identical, nonlinear, dynamical circuits placed on a
two-dimensional (2-D) geometric grid, as shown in Fig. 12. If these grids are duplicated in a three-dimensional (3-D) form,
a multilayer CNN can be constructed [30]. It is an efficient architecture for performing image processing and pattern recognition [51]. This kind of network has been applied to problems
of image classification for quality control. Guglielmi et al. [47]
described a fluorescent magnetic particle inspection, which is
a nondestructive method for quality control of ferromagnetic
materials.

Fig. 12. Common grids of CNNs.
V. TRAINING METHODS
There are basically two main groups of training (or learning)
algorithms: supervised learning (which includes reinforcement
learning) and unsupervised learning. Once the structure of an
NN has been selected, a training algorithm must be attached,
to minimize the prediction error made by the network (for
supervised learning) or to compress the information from the
inputs (for unsupervised learning). In supervised learning, the
correct results (target values, desired outputs) are known and
are given to the ANN during training so that the ANN can
adjust its weights to try to match its outputs to the target values.
After training, an ANN is tested as follows. One gives it only
input values, not target values and sees how close the network
comes to outputting the correct target values. Unsupervised
learning involves no target values; it tries to auto-associate
information from the inputs with an intrinsic reduction of data
dimension, similar to extracting principal components in linear
systems. This is the role of the training algorithms (i.e., fitting the
model represented by the network to the training data available).
The error of a particular configuration of the network can be determined by running all the training cases through the network and comparing the actual output generated with the desired or target outputs or clusters. When considering learning algorithms, one is interested in whether a particular algorithm converges, its speed of convergence, and the computational complexity of the algorithm.

In supervised learning, a set of inputs and correct outputs is used to train the network. Before the learning algorithms are applied to update the weights, all the weights are initialized randomly [84]. The network, using this set of inputs, produces its own outputs. These outputs are compared with the correct outputs and the differences are used to modify the weights, as shown in Fig. 13. A special case of supervised learning is reinforcement learning, shown in Fig. 14, where there is no set of inputs and correct outputs. Training is commanded only by signals indicating if the produced output is bad or good, according to a defined criterion.

Fig. 13. Supervised learning scheme.

Fig. 14. Reinforcement learning scheme.

Fig. 15 shows the principles of unsupervised learning, also known as self-organized learning, where a network develops its classification rules by extracting information from the inputs presented to the network. In other words, by using the correlation of the input vectors, the learning rule changes the network weights to group the input vectors into clusters. By doing so, similar input vectors will produce similar network outputs, since they belong to the same cluster.

Fig. 15. Unsupervised learning scheme.

A. Early Supervised Learning Algorithms

Early learning algorithms were designed for single-layer NNs. They are generally more limited in their applicability, but their importance in history is remarkable.

Perceptron Learning: A single-layer perceptron is trained as follows.
1) Randomly initialize all the network weights.
2) Apply the inputs and calculate the sum of each unit, $s_j = \sum_i w_{ij}\,x_i$.
3) The outputs from each unit are

$$y_j = \begin{cases} 1, & s_j > \text{threshold} \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

4) Compute the error $e_j = t_j - y_j$, where $t_j$ is the known desired output value.
5) Update each weight as $w_{ij}(k+1) = w_{ij}(k) + \eta\,e_j\,x_i$.
6) Repeat steps 2)–4) until the errors reach a satisfactory level.

LMS or Widrow–Hoff Learning: The LMS algorithm is quite similar to the perceptron learning algorithm. The differences are as follows.
1) The error is based on the sum of inputs to the unit, $s_j$, rather than the binary-valued output of the unit, $y_j$. Therefore,

$$e_j = t_j - s_j \qquad (2)$$

2) The linear sum of the inputs is passed through bipolar sigmoid functions, which produce an output of +1 or −1, depending on the polarity of the sum.

This algorithm can be used in structures such as RBF networks and was successfully applied in [43] and [70]. Some CMAC approaches can also use this algorithm to adapt a complex robotic system involving multiple feedback sensors and multiple command variables [79].

Grossberg Learning: Sometimes known as in-star and out-star training, this algorithm updates the weights as follows:

$$w_i(k+1) = w_i(k) + \eta\,[y_d - w_i(k)] \qquad (3)$$

where $y_d$ could be the desired input values (in-star training) or the desired output values (out-star training), depending on the network structure.

B. First-Order Gradient Methods

Backpropagation: Backpropagation is a generalization of the LMS algorithm. In this algorithm, an error function is defined as the mean-square difference between the desired output and the actual output of the feedforward network [45]. It is based on steepest-descent techniques extended to each of the layers in the network by the chain rule. Hence, the algorithm computes the partial derivative $\partial E/\partial w$ of the error function with respect to the weights. The error function is defined as $E = \frac{1}{2}\sum (t - y)^2$, where $t$ is the desired output and $y$ is the network output. The objective is to minimize the error function $E$ by taking the error gradient with respect to the parameters or weight vector, for example, $w$, that is to be adapted. The weights are then updated by using

$$w(k+1) = w(k) + \Delta w(k) \qquad (4)$$

where $\eta$ is the learning rate and

$$\Delta w(k) = -\eta\,\frac{\partial E}{\partial w} \qquad (5)$$

This algorithm is simple to implement and computationally less complex than other modified forms. Despite some disadvantages, it is popularly used and there are numerous extensions to improve it. Some of these techniques will be presented.

Backpropagation With Momentum (BPM): The basic improvement to the backpropagation algorithm is to introduce a momentum term in the weight-updating equation

$$\Delta w(k+1) = -\eta\,\frac{\partial E}{\partial w} + \alpha\,\Delta w(k) \qquad (6)$$

where the momentum factor $\alpha$ is commonly selected inside [0,1]. Adding the momentum term improves the convergence speed and helps keep the network from being trapped in a local minimum.
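A minimal sketch of (4)–(6) on a one-layer sigmoid network, with hypothetical data and illustrative values for the learning rate and momentum factor:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical two-class data for a one-layer sigmoid network;
# E = 0.5 * sum((t - y)**2) as in the text.
X = rng.normal(size=(50, 3))
t = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

w = rng.uniform(-0.1, 0.1, size=3)   # small random initial weights
dw = np.zeros_like(w)
eta, alpha = 0.1, 0.8                # illustrative rate and momentum

for epoch in range(200):
    y = 1.0 / (1.0 + np.exp(-(X @ w)))
    grad = -((t - y) * y * (1 - y)) @ X / len(X)  # dE/dw via chain rule
    dw = -eta * grad + alpha * dw                 # momentum term, as in (6)
    w += dw

print(round(float(0.5 * np.sum((t - y) ** 2)), 3))  # error has decreased
```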
A modification to (6) was proposed in 1990, inserting an additional constant defined by the user [84], yielding the update rule (7). The idea was to reduce the possibility of the network being trapped in the local minimum.

Delta–Bar–Delta (DBD): The DBD learning rules use adaptive learning rates to speed up the convergence. The adaptive learning rate adopted is based on a local optimization method. This technique uses gradient descent for the search direction and then applies individual step sizes for each weight. This means the actual direction taken in weight space is not necessarily along the line of the steepest gradient. If the weight updates between consecutive iterations are in opposite directions, the step size is decreased; otherwise, it is increased. This is prompted by the idea that if the weight changes are oscillating, the minimum is between the oscillations and a smaller step size might find that minimum. The step size may be increased again once the error has stopped oscillating.

If $\eta_{ij}(k)$ denotes the learning rate for the weight $w_{ij}$, then

$$\Delta w_{ij}(k) = -\eta_{ij}(k)\,\frac{\partial E}{\partial w_{ij}} \qquad (8)$$

and the learning-rate update is as follows:

$$\Delta \eta_{ij}(k) = \begin{cases} \kappa, & \bar{\delta}(k-1)\,\delta(k) > 0 \\ -\varphi\,\eta_{ij}(k), & \bar{\delta}(k-1)\,\delta(k) < 0 \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

where

$$\delta(k) = \frac{\partial E}{\partial w_{ij}(k)} \qquad (10)$$

$$\bar{\delta}(k) = (1-\theta)\,\delta(k) + \theta\,\bar{\delta}(k-1) \qquad (11)$$

The positive constant $\kappa$ and the parameters ($\varphi$, $\theta$) are specified by the user. The quantity $\bar{\delta}(k)$ is basically an exponentially decaying trace of gradient values. When $\kappa$ and $\varphi$ are set to zero, the learning rates assume a constant value as in the standard backpropagation algorithm.

Using momentum along with the DBD algorithm can enhance performance considerably. However, it can make the search diverge wildly, especially if $\kappa$ is moderately large. The reason is that momentum magnifies learning-rate increments and quickly leads to inordinately large learning steps. One possible solution is to keep the factor $\kappa$ very small, but this can easily lead to a slow increase in the learning rate and little speedup [84].
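A minimal sketch of the DBD update (9)–(11); the constants and the gradient history are illustrative assumptions:

```python
def dbd_step(eta, delta_bar, delta, kappa=0.01, phi=0.1, theta=0.7):
    # Grow the step size additively while successive gradients agree;
    # shrink it multiplicatively when they disagree (oscillation).
    if delta_bar * delta > 0:
        eta += kappa
    elif delta_bar * delta < 0:
        eta -= phi * eta
    delta_bar = (1 - theta) * delta + theta * delta_bar  # decaying trace
    return eta, delta_bar

eta, delta_bar = 0.1, 0.0
for delta in [1.0, 0.8, 0.9, -0.7, 0.6]:   # hypothetical gradient history
    eta, delta_bar = dbd_step(eta, delta_bar, delta)
    print(round(eta, 4))
```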
C. Second-Order Gradient Methods

These methods use the Hessian matrix $H$, which is the matrix of second derivatives of $E$ with respect to the weights $w$. This matrix contains information about how the gradient changes in different directions in weight space:

$$H_{ij} = \frac{\partial^2 E}{\partial w_i\,\partial w_j} \qquad (12)$$
Newton Method: The Newton method weight update is processed as follows:

$$w(k+1) = w(k) - H^{-1}\,\frac{\partial E}{\partial w} \qquad (13)$$

However, the Newton method is not commonly used because computing the Hessian matrix is computationally expensive. Furthermore, the Hessian matrix may not be positive definite at every point in the error surface. To overcome this problem, several methods have been proposed to approximate the Hessian matrix [84].
Gauss–Newton Method: The Gauss–Newton method produces an $n \times n$ matrix that is an approximation to the Hessian matrix, having elements represented by

$$H_{ij} = \sum \frac{\partial e}{\partial w_i}\,\frac{\partial e}{\partial w_j} \qquad (14)$$

where $e = t - y$ is the output error. However, the Gauss–Newton method may still have ill conditioning if $H$ is close to or is singular [75].
Levenberg–Marquardt (LM) Method: The LM method overcomes this difficulty by including an additional term, which is added to the Gauss–Newton approximation of the Hessian, giving

$$H_{LM} = H + \mu I \qquad (15)$$

where $\mu$ is a small positive value and $I$ is the identity matrix. The value of $\mu$ could also be made adaptive by having

$$\mu(k+1) = \begin{cases} \mu(k)\,\beta, & \text{if } E(k+1) \geq E(k) \\ \mu(k)/\beta, & \text{if } E(k+1) < E(k) \end{cases} \qquad (16)$$

where $\beta$ is a value defined by the user. It is important to notice that, when $\mu$ is large, the algorithm becomes backpropagation with learning rate $1/\mu$ and, when $\mu$ is small, the algorithm becomes Gauss–Newton. An NN trained by using this algorithm can be found in the diagnosis of various motor bearing faults through appropriate measurement and interpretation of motor bearing vibration signals [72].
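A minimal sketch of the LM damping adaptation (15)–(16) on a scalar quadratic error surface, where the Hessian is a known constant (an illustrative simplification):

```python
# Scalar illustration: E(w) = 0.5 * (w - 3)**2, so the true Hessian is
# the constant H = 1 and the minimum sits at w = 3.
def E(w):
    return 0.5 * (w - 3.0) ** 2

def grad(w):
    return w - 3.0

w, H, mu, beta = 0.0, 1.0, 1.0, 10.0
for _ in range(20):
    step = -grad(w) / (H + mu)   # (H + mu*I)^-1 times the gradient
    if E(w + step) < E(w):
        w += step
        mu /= beta               # success: behave more like Gauss-Newton
    else:
        mu *= beta               # failure: behave more like backpropagation
print(round(w, 3))  # ~ 3.0
```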
D. Reinforcement Learning Algorithms
Reinforcement learning has one of its roots in psychology,
from the idea of rewards and penalties used to teach animals to
do simple tasks with small amounts of feedback information.
Barto and others proposed the adaptive critic learning algorithm to solve discrete-domain decision-making problems in the 1980s. Their approach was later generalized to the NN field by Sutton, who used CMAC and other ANN structures to learn paths for mobile robots in maze-like environments [111].
Linear Reward–Penalty Learning: When the reinforcement signal is positive (+1), the learning rule is

$$p_i(k+1) = p_i(k) + a\,[1 - p_i(k)] \qquad (17)$$

$$p_j(k+1) = (1 - a)\,p_j(k), \quad \text{for } j \neq i \qquad (18)$$

If the reinforcement signal is negative (−1), the learning rule is

$$p_i(k+1) = (1 - b)\,p_i(k) \qquad (19)$$

$$p_j(k+1) = \frac{b}{r-1} + (1 - b)\,p_j(k), \quad \text{for } j \neq i \qquad (20)$$

where $a$ and $b$ are learning rates, $p_i(k)$ denotes the probability of choosing action $i$ at iteration $k$, and $r$ is the number of actions taken. For positive reinforcement, the probability of the current action is increased with a relative decrease in the probabilities of the other actions. The adjustment is reversed in the case of negative reinforcement.
Associative Search Learning: In this algorithm, the weights are updated as follows [9]:

$$w_i(k+1) = w_i(k) + \eta\,r(k)\,e_i(k) \qquad (21)$$

where $r(k)$ is the reinforcement signal and $e_i(k)$ is the eligibility. Positive $r$ indicates the occurrence of a rewarding event and negative $r$ indicates the occurrence of a punishing event. It can be regarded as a measure of the change in the value of a performance criterion. Eligibility reflects the extent of activity in the pathway or connection link. Exponentially decaying eligibility traces can be generated using the following linear difference equation:

$$e_i(k+1) = \delta\,e_i(k) + (1 - \delta)\,y(k)\,x_i(k) \qquad (22)$$

where $\delta$ determines the trace decay rate, $x_i$ is the input, and $y$ is the output.

Adaptive Critic Learning: The weights update in a critic network is as follows [9]:

$$w_i(k+1) = w_i(k) + \eta\,\big[r(k) + \gamma\,p(k) - p(k-1)\big]\,\bar{x}_i(k) \qquad (23)$$

where $\eta$ is the learning rate, $\gamma$ is a constant discount factor, $r(k)$ is the reinforcement signal from the environment, and $\bar{x}_i(k)$ is the trace of the input variable $x_i$. The prediction at time $k$ of eventual reinforcement, $p(k)$, can be described as a linear function $p(k) = \sum_i w_i(k)\,x_i(k)$. The adaptive critic network output $\hat{r}(k)$, the improved or internal reinforcement signal, is computed from the predictions as follows:

$$\hat{r}(k) = r(k) + \gamma\,p(k) - p(k-1) \qquad (24)$$

E. Unsupervised Learning
Hebbian Learning: Weights are updated as follows:

$$w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}(k) \qquad (25)$$

$$\Delta w_{ij}(k) = \eta\,x_i(k)\,y_j(k) \qquad (26)$$

where $w_{ij}(k)$ is the weight from the $i$th unit to the $j$th unit at time step $k$, $x_i$ is the excitation level of the source unit or $i$th unit, and $y_j$ is the excitation level of the destination unit or the $j$th output unit. In this system, learning is a purely local phenomenon, involving only two units and a synapse. No global feedback system is required for the neural pattern to develop. A special case of Hebbian learning is correlation learning, which uses binary activation for the function $x_i$, with $y_j$ defined as the desired excitation level for the destination unit. While Hebbian learning is performed in unsupervised environments, correlation learning is supervised [128].
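A minimal sketch of the local update (25)–(26), with hypothetical binary excitation data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical binary excitations: the destination unit copies unit 0,
# so the weight from unit 0 should grow fastest.
X = rng.choice([0.0, 1.0], size=(100, 4))   # source excitations x_i
Y = X[:, [0]]                                # destination excitation y_j

eta = 0.01
W = np.zeros((4, 1))                         # w_ij: unit i -> output j
for x, y in zip(X, Y):
    W += eta * np.outer(x, y)                # purely local update (26)
print(W.ravel())
```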
Boltzmann Machine Learning: The Boltzmann Machine training algorithm uses a kind of stochastic technique known as simulated annealing, to avoid being trapped in local minima of the network energy function. The algorithm is as follows.
1) Initialize weights.
2) Calculate activation as follows.
a) Select an initial temperature.
b) Until thermal equilibrium is reached, repeatedly calculate the probability that unit $i$ is active by (27).
c) Exit when the lowest temperature is reached. Otherwise, reduce the temperature by a certain annealing schedule and repeat step 2):

$$P(s_i = 1) = \frac{1}{1 + e^{-\mathrm{net}_i/T}} \qquad (27)$$

Above, $T$ is the temperature, $\mathrm{net}_i$ is the total input received by the $i$th unit, and the activation level $s_i$ of unit $i$ is set according to this probability.
Kohonen Self-Organizing Learning: The network is trained according to the following algorithm, frequently called the "winner-takes-all" rule.
1) Apply an input vector $x$.
2) Calculate the distance $d_j$ (in $n$-dimensional space) between $x$ and the weight vector $w_j$ of each unit. In Euclidean space, this is calculated as follows:

$$d_j = \lVert x - w_j \rVert = \sqrt{\sum_i (x_i - w_{ij})^2} \qquad (28)$$

3) The unit that has the weight vector closest to $x$ is declared the winner unit. This weight vector, called $w_c$, becomes the center of a group of weight vectors that lie within a distance $d$ from $w_c$.
4) For all weight vectors within a distance $d$ of $w_c$, train this group of nearby weight vectors according to the formula that follows:

$$w_j(k+1) = w_j(k) + \eta\,[x - w_j(k)] \qquad (29)$$

5) Perform steps 1)–4), cycling through each input vector until convergence.
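A minimal sketch of steps 1)–5), assuming a 1-D output layer of five units, a fixed neighborhood radius, and hypothetical two-cluster data:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical inputs drawn from two clusters; five output units in 1-D.
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                  rng.normal(1.0, 0.1, (50, 2))])
W = rng.uniform(0, 1, size=(5, 2))

eta, radius = 0.2, 1                         # neighborhood of +/- 1 unit
for epoch in range(20):
    for x in rng.permutation(data):
        d = np.linalg.norm(x - W, axis=1)    # Euclidean distances, (28)
        c = int(np.argmin(d))                # winner unit
        lo, hi = max(0, c - radius), min(len(W), c + radius + 1)
        W[lo:hi] += eta * (x - W[lo:hi])     # update the neighborhood, (29)
print(np.round(W, 2))  # weight vectors settle near the two cluster centers
```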
F. Practical Considerations
NNs are unsurpassed at identifying patterns or trends in data and are well suited for prediction or forecasting needs, including sales and customer research, data validation, risk management,
and industrial process control. One of the fascinating aspects of the practical implementation of NNs in industrial applications is the ability to manage data interaction between electrical and mechanical behavior, and often other disciplines as well. The
majority of the reported applications involve fault diagnosis and
detection, quality control, pattern recognition, and adaptive control [14], [44], [74], [115]. Supervised NNs can mimic the behavior of human control systems, as long as data corresponding
to the human operator and the control input are supplied [7],
[126]. Most of the existing, successful applications in control
use supervised learning, or any form of a reinforcement learning
approach that is also supervised. Unsupervised learning is not suitable, particularly for online control, due to the slow adaptation and the time required for the network to settle into stable conditions. Unsupervised learning schemes are used mostly for pattern recognition, by grouping patterns into a number of clusters or classes.
There are some advantages to NNs over multiple data regression. There is no need to select the most important independent variables in the data set. The synapses associated with
irrelevant variables readily show negligible weight values; in
their turn, relevant variables present significant synapse weight
values. There is also no need to propose a model function
as required in multiple regressions. The learning capability of
NNs allows them to discover more complex and subtle interactions between the independent variables, contributing to
the development of a model with maximum precision. NNs
are intrinsically robust showing more immunity to noise, an
important factor in modeling industrial processes. NNs have
been applied within industrial domains, to address the inherent
complexity of interacting processes under the lack of robust
analytical models of real industrial processes. In many cases,
network topologies and training parameters are systematically
varied until satisfactory convergence is achieved. Currently,
the most widely used algorithm for training MLPs is the backpropagation algorithm. It minimizes the mean square error
between the desired and the actual output of the network. The
optimization is carried out with a gradient-descent technique.
There are two critical issues in network learning: estimation
error and training time. These issues may be affected by the
network architecture and the training set. The network architecture includes the number of hidden nodes, number of hidden
layers and values of learning parameters. The training set is
related to the number of training patterns, inaccuracies of input
data and preprocessing of data. The backpropagation algorithm
does not always find the global minimum, but may stop at a
local minimum. In practice, the type of minimum has little
importance, as long as the desired mapping or classification
is reached with a desired accuracy. The optimization criterion
of the backpropagation algorithm is not very good, from the
pattern recognition point of view. The algorithm minimizes
the square error between the actual and the desired output, not
the number of faulty classifications, which is the main goal
in pattern recognition. The algorithm is too slow for practical applications, especially if many hidden layers are used.
In addition, a backpropagation net has poor memory: when the net learns something new, it forgets the old. Despite its shortcomings, backpropagation is broadly used. Although the backpropagation algorithm has been a significant milestone, many attempts have been made to speed up its convergence, and significant improvements are observed by using various second-order approaches, namely, Newton's method, conjugate gradients, or the LM optimization technique [5], [10], [21], [48]. The issues to be dealt with are [84], [102] as follows:
1) slow convergence speed;
2) sensitivity to initial conditions;
3) trapping in local minima;
4) instability if learning rate is too large.
One of the alternatives for the problem of being trapped in a local
minimum is adding the momentum term using the BPM, which
also improves the convergence speed. Another alternative used
when a backpropagation algorithm is difficult to implement,
as in analog hardware, is the random weight change (RWC).
This algorithm has been shown to be immune to offset sensitivity and nonlinearity errors. It is a stochastic learning method that ensures the error function decreases on average, even though it may go up or down at any one time. It is often called simulated
annealing because of its operational similarity to annealing
processes [73]. Second-order gradient methods use the matrix of second derivatives of the error function with respect to the weights (the Hessian matrix). However, computing this matrix is computationally expensive, and the methods presented tried to approximate it to make the algorithms more accessible.
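The two alternatives above can be sketched as follows in Python; the momentum coefficient mu and the perturbation size delta are illustrative assumptions, and the accept-if-improved rule is a simplified variant of the RWC scheme described in [73].

    import numpy as np

    def bpm_update(w, grad, velocity, lr=0.01, mu=0.9):
        # Backpropagation with a momentum term: part of the previous step is
        # reused, which speeds convergence and helps escape shallow minima.
        velocity = mu * velocity - lr * grad
        return w + velocity, velocity

    def rwc_update(w, error_fn, delta=0.01, rng=np.random.default_rng()):
        # Simplified random weight change: perturb every weight by +/- delta
        # and keep the change only if the error decreased, so the error
        # function falls on average even though single steps may go up.
        step = delta * rng.choice([-1.0, 1.0], size=w.shape)
        return w + step if error_fn(w + step) < error_fn(w) else w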
Linear reward–penalty, associative search, and adaptive critic
algorithms are characterized as a special case of supervised
learning called reinforcement learning. They do not need to
explicitly compute derivatives. Computation of derivatives
usually introduces a lot of high frequency noise in the control
loop. Therefore, they are very suitable for some complex
systems, where basic training algorithms may fail or produce
suboptimal results. On the other hand, those methods present
slower learning processes and, because of this, are adopted especially in cases where only a single bit of information (for example, whether the output is right or wrong) is available.
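As a minimal sketch of learning from such a single bit of feedback, the following assumes a two-action linear reward-penalty scheme with illustrative gains a and b; it is not a reproduction of any particular referenced controller.

    def linear_reward_penalty(p, action, success, a=0.1, b=0.01):
        # Two-action linear reward-penalty update; p is the probability of
        # choosing action 0, and a, b are illustrative learning gains.
        if success:                         # reward: reinforce the chosen action
            p += a * (1.0 - p) if action == 0 else -a * p
        else:                               # penalty: shift probability away
            p -= b * p if action == 0 else -b * (1.0 - p)
        return p                            # only a right/wrong bit was needed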
VI. TAXONOMY OF NN APPLICATIONS
From the viewpoint of industrial applications, ANN applications can be divided into four main categories.
A. Modeling and Identification
Modeling and identification are techniques to describe the
physical principles existing between the input and the output
of a system. The ANN can approximate these relationships
independently of the size and complexity of the problem. It has been found to be an effective system for learning discriminants for patterns from a body of examples. The MLP is used as the basic structure for a variety of applications [4], [12], [17], [18],
[32], [46], [56], [85], [94], [119], [127]. The Hopfield network
can be used to identify problems of linear time-varying or
time-invariant systems [28]. Recurrent network topology [36],
[83] has received considerable attention for the identification
of nonlinear dynamical systems. A functional-link NN approach (FLN) was used to perform thermal dynamical system
identification [113].
B. Optimization and Classification
Optimization is often required for design, planning of actions, motions, and tasks. However, as is known from the Traveling Salesman Problem, many parameters can make the amount of calculation tremendous, so ordinary methods cannot be applied. An effective approach is to find the optimal solution
by defining an energy function and using the NN with parallel
processing, learning and self-organizing capabilities to operate
in such a way that the energy is reduced. It is shown that application of the optimal approach makes effective use of ANN
sensing, recognizing, and forecasting capabilities, in the control
of robotic manipulators with impact taken into account [45].
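To make the energy-function idea concrete, the sketch below performs asynchronous updates of a binary Hopfield-style network; the symmetric, zero-diagonal weight matrix W and bias b are assumed to encode the problem constraints, which is an illustrative formulation rather than a reproduction of [45].

    import numpy as np

    def hopfield_energy(x, W, b):
        # Energy E = -0.5 x^T W x - b^T x; W is assumed symmetric with a
        # zero diagonal so that asynchronous updates never increase E.
        return -0.5 * x @ W @ x - b @ x

    def hopfield_descend(x, W, b, n_sweeps=50, rng=np.random.default_rng(0)):
        # Asynchronous binary updates drive the state toward an energy
        # minimum, which encodes a (possibly local) optimum of the problem.
        x = x.copy()
        for _ in range(n_sweeps):
            for i in rng.permutation(len(x)):
                x[i] = 1.0 if W[i] @ x + b[i] >= 0.0 else 0.0
        return x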
Classification using an ANN can also be viewed as an
optimization problem, provided that the existing rules to distinguish the various classes of events/materials/objects can be
described in functional form. In such cases, the networks will
decide if a particular input belongs to one of the defined classes
by optimizing the functional rules and a posteriori evaluating
the achieved results. Different authors [34], [70] have proposed
RBF approaches. For applications such as fault diagnosis, RBF
networks offer clear advantages over MLPs. They are faster
to train, because layer training is decoupled [70]. Cellular
networks [47], ART networks [122], and Hopfield networks
[105], [112] can be used as methods to detect, to isolate faults,
and to promote industrial quality control. MLPs are also widely
used for these purposes [23], [25], [35], [102], [114], [121].
This structure can be found in induction motor [41], [42] and
bearing [72] fault diagnosis, for nondestructive evaluation of
check valve performance and degradation [2], [57], in defect
detection on woven fabrics [99], and in robotic systems [116].
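The decoupled layer training noted above for RBF networks can be sketched as follows in Python: with the centers fixed beforehand (for example, by clustering), fitting the output layer reduces to a linear least-squares problem. The Gaussian basis function and the shared width are illustrative assumptions.

    import numpy as np

    def fit_rbf_output_layer(X, targets, centers, width=1.0):
        # Decoupled RBF training: centers are chosen first, then the output
        # weights are obtained directly by linear least squares.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        Phi = np.exp(-(dists / width) ** 2)          # hidden-layer activations
        w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
        return w                                     # linear output weights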
Finally, it is important to mention clustering applications,
which are special cases of classification where there is no
supervision during the training phase. The relationships between elements of the existing classes, and even the classes themselves, have to be found from data in the training phase, without supervision.

C. Process Control
NNs make use of nonlinearity, learning, parallel processing, and generalization capabilities for application to advanced intelligent control. Control applications can be classified into some
major methods, such as supervised control, inverse control,
neural adaptive control, back-propagation of utility (which is
an extended method of a back-propagation through time) and
adaptive critics (which is an extended method of reinforcement
learning algorithm) [2]. MLP structures were used for digital
current regulation of inverter drives [16], to predict trajectories
in robotic environments [19], [40], [52], [73], [79], [87], [89],
[110], to control turbo generators [117], to monitor feed water
flow rate and component thermal performance of pressurized
water reactors [61], to regulate temperature [64], and to predict
natural gas consumption [65]. Dynamical versions of MLP
networks were used to control a nonlinear dynamic model of
a robot [60], [97], to control manufacturing cells [92], and to
implement a programmable cascaded low-pass filter [101]. A
dynamic MLP is a classical MLP structure where the outputs
are fed back to the inputs by means of time delay elements.
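A minimal sketch of this feedback arrangement is given below in Python; the single-step delay line and the tanh hidden layer are illustrative assumptions.

    import numpy as np

    def dynamic_mlp_step(u, y_prev, W1, W2):
        # One step of a dynamic MLP: the previous output y_prev is fed back
        # through a unit time delay and concatenated with the external input u.
        x = np.concatenate([u, y_prev])     # delayed outputs join the inputs
        h = np.tanh(W1 @ x)                 # classical MLP hidden layer (assumed tanh)
        return W2 @ h                       # new output, fed back at the next step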
Other structures can be found as functional link networks to
control robots [69] and RBF networks to predict, from operating
conditions and from features of a steel sheet, the thermal energy required to correct alloying [13]. RBF networks can be observed, as well, in predictive controllers for drive systems [38].
Hopfield structures were used for torque minimization control
of redundant manipulators [33]. FPNs can be used for function
approximation inside specific control schemes [100]. CMAC
networks were implemented in research automobiles [76] and
to control robots [79].
D. Pattern Recognition
Some specific ANN structures, such as Kohonen and probabilistic networks, are studied and applied mainly for image and
voice recognition. Research in image recognition includes initial vision (stereo vision of both eyes, outline extraction, etc.)
close to the biological (particularly brain) function, handwritten character recognition at the practical level, and cell recognition for mammalian cell cultivation by using
NNs [45]. Kohonen networks were used for image inspection
and for disease identification from mammographic images [98].
Probabilistic networks were used for transient detection to enhance nuclear reactors' operational safety [6]. As in the other categories, the MLP is widely used as well. The papermaking
industry [39] is one such example.
VII. CONCLUSION
This paper has described theoretical aspects of NNs related to
their relevance for industrial applications. Common questions
that an engineer would ask when choosing an NN for a particular application were answered. Characteristics of industrial
processes, which would justify the ANN utilization, were discussed and some areas of importance were proposed. Important
structures and training methods, with relevant references that illustrated the utilization of those concepts, were presented.
This survey observed that, although ANNs have a history of more than 50 years, most industrial applications were launched in the last ten years, in which investigators justified them as either an alternative or a complement to other classical techniques. Those ANN applications demonstrated adaptability features integrated with the industrial problem, thus becoming part of the industrial processes. The authors firmly believe that the intricate field of NNs is just starting to permeate a broad range of interdisciplinary problem-solving streams. The potential of NNs will be integrated into a still larger and all-encompassing field of intelligent systems and will soon be taught to students and engineers as an ordinary mathematical tool.
REFERENCES
[1] J. S. Albus, “A new approach to manipulator control: The cerebellar
model articulation controller,” Trans. ASME, J. Dyn. Syst., Meas. Control, vol. 97, pp. 220–227, Sept. 1975.
[2] I. E. Alguíndigue and R. E. Uhrig, “Automatic fault recognition in mechanical components using coupled artificial neural networks,” in Proc.
IEEE World Congr. Computational Intelligence, June–July 1994, pp.
3312–3317.
[3] P. E. M. Almeida and M. G. Simões, “Fundamentals of a fast convergence parametric CMAC network,” in Proc. IJCNN’01, vol. 3, 2001,
pp. 3015–3020.
[4] K. Andersen, G. E. Cook, G. Karsai, and K. Ramaswamy, “Artificial
neural networks applied to arc welding process modeling and control,”
IEEE Trans. Ind. Applicat., vol. 26, pp. 824–830, Sept./Oct. 1990.
[5] T. J. Andersen and B. M. Wilamowski, “A modified regression algorithm
for fast one layer neural network training,” in Proc. World Congr. Neural
Networks, vol. 1, Washington DC, July 17–21, 1995, pp. 687–690.
[6] I. K. Attieh, A. V. Gribok, J. W. Hines, and R. E. Uhrig, “Pattern recognition techniques for transient detection to enhance nuclear reactors’ operational safety,” in Proc. 25th CNS/CNA Annu. Student Conf., Knoxville,
TN, Mar. 2000.
[7] S. M. Ayala, G. Botura Jr., and O. A. Maldonado, “AI automates substation control,” IEEE Comput. Applicat. Power, vol. 15, pp. 41–46, Jan.
2002.
[8] D. L. Bailey and D. M. Thompson, “Developing neural-network applications,” AI Expert, vol. 5, no. 9, pp. 34–41, 1990.
[9] A. G. Barto, R. S. Sutton, and C. W. Anderson, “Neuronlike elements
that can solve difficult control problems,” IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 834–846, Sept./Oct. 1983.
[10] R. Battiti, “First- and second-order methods for learning: Between
steepest descent and Newton’s method,” Neural Computation, vol. 4,
no. 2, pp. 141–166, 1992.
[11] B. Bavarian, “Introduction to neural networks for intelligent control,”
IEEE Contr. Syst. Mag., vol. 8, pp. 3–7, Apr. 1988.
[12] N. V. Bhat, P. A. Minderman, T. McAvoy, and N. S. Wang, “Modeling
chemical process systems via neural computation,” IEEE Contr. Syst.
Mag., vol. 10, pp. 24–30, Apr. 1990.
[13] G. Bloch, F. Sirou, V. Eustache, and P. Fatrez, “Neural intelligent control
for a steel plant,” IEEE Trans. Neural Networks, vol. 8, pp. 910–918,
July 1997.
[14] Z. Boger, “Experience in developing models of industrial plants by large
scale artificial neural networks,” in Proc. Second New Zealand International Two-Stream Conf. Artificial Neural Networks and Expert Systems,
1995, pp. 326–329.
[15] D. S. Broomhead and D. Lowe, “Multivariable functional interpolation
and adaptive network,” Complex Syst., vol. 2, pp. 321–355, 1988.
[16] M. Buhl and R. D. Lorenz, “Design and implementation of neural networks for digital current regulation of inverter drives,” in Conf. Rec.
IEEE-IAS Annu. Meeting, 1991, pp. 415–423.
[17] B. Burton and R. G. Harley, “Reducing the computational demands of
continually online-trained artificial neural networks for system identification and control of fast processes,” IEEE Trans. Ind. Applicat., vol.
34, pp. 589–596, May/June 1998.
[18] B. Burton, F. Kamran, R. G. Harley, T. G. Habetler, M. Brooke, and R.
Poddar, “Identification and control of induction motor stator currents
using fast on-line random training of a neural network,” in Conf. Rec.
IEEE-IAS Annu. Meeting, 1995, pp. 1781–1787.
[19] R. Carelli, E. F. Camacho, and D. Patiño, “A neural network based feed
forward adaptive controller for robots,” IEEE Trans. Syst., Man, Cybern., vol. 25, pp. 1281–1288, Sept. 1995.
[20] G. A. Carpenter and S. Grossberg, “Associative learning, adaptive
pattern recognition and cooperative- competitive decision making,” in
Optical and Hybrid Computing, H. Szu, Ed. Bellingham, WA: SPIE,
1987, vol. 634, pp. 218–247.
[21] C. Charalambous, “Conjugate gradient algorithm for efficient training
of artificial neural networks,” Proc. Inst. Elect. Eng., vol. 139, no. 3, pp.
301–310, 1992.
[22] S. Chen and S. A. Billings, “Neural networks for nonlinear dynamic
system modeling and identification,” Int. J. Control, vol. 56, no. 2, pp.
319–346, 1992.
[23] R. P. Cherian, L. N. Smith, and P. S. Midha, “A neural network approach
for selection of powder metallurgy materials and process parameters,”
Artif. Intell. Eng., vol. 14, pp. 39–44, 2000.
[24] M. Y. Chow, P. M. Mangum, and S. O. Yee, “A neural network approach
to real-time condition monitoring of induction motors,” IEEE Trans. Ind.
Electron., vol. 38, pp. 448–453, Dec. 1991.
[25] M. Y. Chow, R. N. Sharpe, and J. C. Hung, “On the application and
design of artificial neural networks for motor fault detection—Part II,”
IEEE Trans. Ind. Electron., vol. 40, pp. 189–196, Apr. 1993.
[26] ——, “On the application and design of artificial neural networks for motor fault detection—Part I,” IEEE Trans. Ind. Electron., vol. 40, pp. 181–188, Apr. 1993.
[27] T. W. S. Chow and Y. Fang, “A recurrent neural-network based real-time
learning control strategy applying to nonlinear systems with unknown
dynamics,” IEEE Trans. Ind. Electron., vol. 45, pp. 151–161, Feb. 1998.
[28] S. R. Chu and R. Shoureshi, “Applications of neural networks in learning
of dynamical systems,” IEEE Trans. Syst., Man, Cybern., vol. 22, pp.
160–164, Jan./Feb. 1992.
[29] S. R. Chu, R. Shoureshi, and M. Tenorio, “Neural networks for system
identification,” IEEE Contr. Syst. Mag., vol. 10, pp. 31–35, Apr. 1990.
[30] L. O. Chua, T. Roska, T. Kozek, and Á. Zarándy, “The CNN Paradigm—A short tutorial,” in Cellular Neural Networks, T. Roska and J.
Vandewalle, Eds. New York: Wiley, 1993, pp. 1–14.
[31] M. Cichowlas, D. Sobczuk, M. P. Kazmierkowski, and M. Malinowski,
“Novel artificial neural network based current controller for PWM rectifiers,” in Proc. 9th Int. Conf. Power Electronics and Motion Control,
2000, pp. 41–46.
[32] G. E. Cook, R. J. Barnett, K. Andersen, and A. M. Strauss, “Weld modeling and control using artificial neural network,” IEEE Trans. Ind. Applicat., vol. 31, pp. 1484–1491, Nov./Dec. 1995.
[33] H. Ding and S. K. Tso, “A fully neural-network-based planning scheme
for torque minimization of redundant manipulators,” IEEE Trans. Ind.
Electron., vol. 46, pp. 199–206, Feb. 1999.
[34] G. Dini and F. Failli, “Planning grasps for industrial robotized applications using neural networks,” Robot. Comput. Integr. Manuf., vol. 16,
pp. 451–463, Dec. 2000.
[35] M. Dolen and R. D. Lorenz, “General methodologies for neural network programming,” in Proc. IEEE Applied Neural Networks Conf.,
Nov. 1999, pp. 337–342.
[36] ——, “Recurrent neural network topologies for spectral state estimation and differentiation,” in Proc. ANNIE Conf., St. Louis, MO, Nov. 2000.
[37] M. Dolen, P. Y. Chung, E. Kayikci, and R. D. Lorenz, “Disturbance
force estimation for CNC machine tool feed drives by structured neural
network topologies,” in Proc. ANNIE Conference, St. Louis, MO, Nov.
2000.
[38] Y. Dote, M. Strefezza, and A. Suyitno, “Neuro fuzzy robust controllers
for drive systems,” in Proc. IEEE Int. Symp. Industrial Electronics,
1993, pp. 229–242.
[39] P. J. Edwards, A. F. Murray, G. Papadopoulos, A. R. Wallace, J. Barnard,
and G. Smith, “The application of neural networks to the papermaking
industry,” IEEE Trans. Neural Networks, vol. 10, pp. 1456–1464, Nov.
1999.
[40] M. J. Er and K. C. Liew, “Control of adept one SCARA robot using
neural networks,” IEEE Trans. Ind. Electron., vol. 44, pp. 762–768, Dec.
1997.
[41] F. Filippetti, G. Franceschini, and C. Tassoni, “Neural networks aided
on-line diagnostics of induction motor rotor faults,” IEEE Trans. Ind.
Applicat., vol. 31, pp. 892–899, July/Aug. 1995.
[42] F. Filippetti, G. Franceschini, C. Tassoni, and P. Vas, “Recent developments of induction motor drives fault diagnosis using AI techniques,”
IEEE Trans. Ind. Electron., vol. 47, pp. 994–1004, Oct. 2000.
[43] D. Flynn, S. McLoone, G. W. Irwin, M. D. Brown, E. Swidenbank, and
B. W. Hogg, “Neural control of turbogenerator systems,” Automatica,
vol. 33, no. 11, pp. 1961–1973, 1997.
[44] D. B. Fogel, “Selecting an optimal neural network industrial electronics
society,” in Proc. IEEE IECON’90, vol. 2, 1990, pp. 1211–1214.
[45] T. Fukuda and T. Shibata, “Theory and applications of neural networks
for industrial control systems,” IEEE Trans. Ind. Applicat., vol. 39, pp.
472–489, Nov./Dec. 1992.
[46] A. A. Gorni, “The application of neural networks in the modeling of
plate rolling processes,” JOM-e, vol. 49, no. 4, electronic document, Apr.
1997.
[47] N. Guglielmi, R. Guerrieri, and G. Baccarani, “Highly constrained
neural networks for industrial quality control,” IEEE Trans. Neural
Networks, vol. 7, pp. 206–213, Jan. 1996.
[48] M. T. Hagan and M. Menhaj, “Training feedforward networks with
the Marquardt algorithm,” IEEE Trans. Neural Networks, vol. 5, pp.
989–993, Nov. 1994.
[49] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd
ed. New York: Prentice-Hall, 1995.
[50] S. M. Halpin and R. F. Burch, “Applicability of neural networks to industrial and commercial power systems: A tutorial overview,” IEEE Trans.
Ind. Applicat., vol. 33, pp. 1355–1361, Sept./Oct. 1997.
[51] H. Harrer and J. Nossek, “Discrete-time cellular neural networks,” in
Cellular Neural Networks, T. Roska and J. Vandewalle, Eds. New
York: Wiley, 1993, pp. 15–29.
[52] H. Hashimoto, T. Kubota, M. Sato, and F. Harashima, “Visual control of
robotic manipulator based on neural networks,” IEEE Trans. Ind. Electron., vol. 39, pp. 490–496, Dec. 1992.
[53] D. O. Hebb, The Organization of Behavior. New York: Wiley, 1949.
[54] G. E. Hinton and T. J. Sejnowski, “Learning and relearning in Boltzmann
machines,” in The PDP Research Group, D. Rumelhart and J. McClelland, Eds. Cambridge, MA: MIT Press, 1986.
[55] J. J. Hopfield, “Neural networks and physical systems with emergent
collective computational abilities,” in Proc. Nat. Acad. Sci., vol. 79, Apr.
1982, pp. 2554–2558.
[56] C. Y. Huang, T. C. Chen, and C. L. Huang, “Robust control of induction
motor with a neural-network load torque estimator and a neural-network
identification,” IEEE Trans. Ind. Electron., vol. 46, pp. 990–998, Oct.
1999.
[57] A. Ikonomopoulos, R. E. Uhrig, and L. H. Tsoukalas, “Use of neural
networks to monitor power plant components,” in Proc. American Power
Conf., vol. 54-II, Apr. 1992, pp. 1132–1137.
[58] M. Jelínek. (1999) Everything you wanted to know about ART
neural networks, but were afraid to ask. [Online]. Available:
http://cs.felk.cvut.cz/~xjeline1/semestralky/nan
[59] S. Jung and T. C. Hsia, “Neural network impedance force control of
robot manipulator,” IEEE Trans. Ind. Electron., vol. 45, pp. 451–461,
June 1998.
[60] A. Karakasoglu and M. K. Sundareshan, “A recurrent neural networkbased adaptive variable structure model-following control of robotic manipulators,” Automatica, vol. 31, no. 10, pp. 1495–1507, 1995.
[61] K. Kavaklioglu and B. R. Upadhyaya, “Monitoring feedwater flow
rate and component thermal performance of pressurized water reactors
by means of artificial neural networks,” Nucl. Technol., vol. 107, pp.
112–123, July 1994.
[62] M. Kawato, Y. Uno, M. Isobe, and R. Suzuki, “Hierarchical neural network model for voluntary movement with application to robotics,” IEEE
Contr. Syst. Mag., vol. 8, pp. 8–15, Apr. 1988.
[63] M. Khalid and S. Omatu, “A neural network controller for a temperature
control system,” IEEE Contr. Syst. Mag., vol. 12, pp. 58–64, June 1992.
[64] M. Khalid, S. Omatu, and R. Yusof, “Temperature regulation with neural
networks and alternative control schemes,” IEEE Trans. Neural Networks, vol. 6, pp. 572–582, May 1995.
[65] A. Khotanzad, H. Elragal, and T. L. Lu, “Combination of artificial
neural-network forecasters for prediction of natural gas consumption,”
IEEE Trans. Neural Networks, vol. 11, pp. 464–473, Mar. 2000.
[66] Y. H. Kim, F. L. Lewis, and D. M. Dawson, “Intelligent optimal control
of robotic manipulators using neural network,” Automatica, vol. 36, no.
9, pp. 1355–1364, 2000.
[67] B. Kosko, “Adaptive bi-directional associative memories,” Appl. Opt.,
vol. 26, pp. 4947–4960, 1987.
[68] S. Y. Kung and J. N. Hwang, “Neural network architectures for robotic
applications,” IEEE Trans. Robot. Automat., vol. 5, pp. 641–657, Oct.
1989.
[69] C. Kwan, F. L. Lewis, and D. M. Dawson, “Robust neural-network control of rigid-link electrically driven robots,” IEEE Trans. Neural Networks, vol. 9, pp. 581–588, July 1998.
[70] J. A. Leonard and M. A. Kramer, “Radial basis function networks for
classifying process faults,” IEEE Contr. Syst. Mag., vol. 11, pp. 31–38,
Apr. 1991.
[71] F. L. Lewis, A. Yesildirek, and K. Liu, “Multilayer neural-net robot controller with guaranteed tracking performance,” IEEE Trans. Neural Networks, vol. 7, pp. 388–399, Mar. 1996.
[72] B. Li, M. Y. Chow, Y. Tipsuwan, and J. C. Hung, “Neural-network based
motor rolling bearing fault diagnosis,” IEEE Trans. Ind. Electron., vol.
47, pp. 1060–1069, Oct. 2000.
[73] J. Liu, B. Burton, F. Kamran, M. A. Brooke, R. G. Harley, and T. G.
Habetler, “High speed on-line neural network of an induction motor immune to analog circuit nonidealities,” in Proc. IEEE Int. Symp. Circuits
and Systems, June 1997, pp. 633–636.
[74] Y. Liu, B. R. Upadhyaya, and M. Naghedolfeizi, “Chemometric data
analysis using artificial neural networks,” Appl. Spectrosc., vol. 47, no.
1, pp. 12–23, 1993.
[75] L. Ljung, System Identification: Theory for the User. New York: Prentice-Hall, 1987.
[76] M. Majors, J. Stori, and D. Cho, “Neural network control of automotive
fuel-injection systems,” IEEE Contr. Syst. Mag., vol. 14, pp. 31–36, June
1994.
[77] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent
in nervous activity,” Bull. Math. Biophys., vol. 5, pp. 115–133, 1943.
[78] M. Milanova, P. E. M. Almeida, J. Okamoto Jr., and M. G. Simões, “Applications of cellular neural networks for shape from shading problem,”
in Proc. Int. Workshop Machine Learning and Data Mining in Pattern
Recognition, Lecture Notes in Artificial Intelligence, P. Perner and M.
Petrou, Eds., Leipzig, Germany, September 1999, pp. 52–63.
[79] W. T. Miller III, “Real-time application of neural networks for sensorbased control of robots with vision,” IEEE Trans. Syst., Man, Cybern.,
vol. 19, pp. 825–831, July/Aug. 1989.
[80] M. L. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press, 1969.
[81] T. Munakata, Fundamentals of the New Artificial Intelligence—Beyond
Traditional Paradigms. Berlin, Germany: Springer-Verlag, 1998.
[82] S. R. Naidu, E. Zafiriou, and T. J. McAvoy, “Use of neural networks for
sensor failure detection in a control system,” IEEE Contr. Syst. Mag.,
vol. 10, pp. 49–55, Apr. 1990.
[83] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Trans. Neural Networks,
vol. 1, pp. 4–27, Mar. 1990.
[84] G. W. Ng, Application of Neural Networks to Adaptive Control of Nonlinear Systems. London, U.K.: Research Studies Press, 1997.
[85] D. H. Nguyen and B. Widrow, “Neural networks for self-learning control
systems,” IEEE Contr. Syst. Mag., vol. 10, pp. 18–23, Apr. 1990.
[86] J. R. Noriega and H. Wang, “A direct adaptive neural-network control
for unknown nonlinear systems and its application,” IEEE Trans. Neural
Networks, vol. 9, pp. 27–34, Jan. 1998.
[87] T. Ozaki, T. Suzuki, T. Furuhashi, S. Okuma, and Y. Uchikawa, “Trajectory control of robotic manipulators using neural networks,” IEEE
Trans. Ind. Electron., vol. 38, June 1991.
[88] D. B. Parker, “A comparison of algorithms for neuron-like cells,” in
Neural Networks for Computing, J. S. Denker, Ed. New York: American Institute of Physics, 1986, pp. 327–332.
[89] P. Payeur, H. Le-Huy, and C. M. Gosselin, “Trajectory prediction for
moving objects using artificial neural networks,” IEEE Trans. Ind. Electron., vol. 42, pp. 147–158, Apr. 1995.
[90] M. H. Rahman, R. Fazlur, R. Devanathan, and Z. Kuanyi, “Neural network approach for linearizing control of nonlinear process plants,” IEEE
Trans. Ind. Electron., vol. 47, pp. 470–477, Apr. 2000.
[91] F. Rosenblatt, “The perceptron: A probabilistic model for information
storage and organization in the brain,” Psych. Rev., vol. 65, pp. 386–408,
1958.
[92] G. A. Rovithakis, V. I. Gaganis, S. E. Perrakis, and M. A. Christodoulou,
“Real-time control of manufacturing cells using dynamic neural networks,” Automatica, vol. 35, no. 1, pp. 139–149, 1999.
[93] A. Rubaai and M. D. Kankam, “Adaptive real-time tracking controller
for induction motor drives using neural designs,” in Conf. Rec. IEEE-IAS
Annu. Meeting, vol. 3, Oct. 1996, pp. 1709–1717.
[94] A. Rubaai and R. Kotaru, “Online identification and control of a DC
motor using learning adaptation of neural networks,” IEEE Trans. Ind.
Applicat., vol. 36, pp. 935–942, May/June 2000.
[95] D. D. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536,
1986.
[96] D. E. Rumelhart, B. Widrow, and M. A. Lehr, “The basic ideas in neural
networks,” Commun. ACM, vol. 37, no. 3, pp. 87–92, Mar. 1994.
[97] M. Saad, P. Bigras, L. A. Dessaint, and K. A. Haddad, “Adaptive robot
control using neural networks,” IEEE Trans. Ind. Electron., vol. 41, pp.
173–181, Apr. 1994.
[98] S. Sardy and L. Ibrahim, “Experimental medical and industrial applications of neural networks to image inspection using an inexpensive personal computer,” Opt. Eng., vol. 35, no. 8, pp. 2182–2187, Aug. 1996.
[99] S. Sardy, L. Ibrahim, and Y. Yasuda, “An application of vision system
for the identification and defect detection on woven fabrics by using
artificial neural networks,” in Proc. Int. Joint Conf. Neural Networks,
1993, pp. 2141–2144.
[100] A. P. A. Silva, P. C. Nascimento, G. L. Torres, and L. E. B. Silva, “An
alternative approach for adaptive real-time control using a nonparametric neural network,” in Conf. Rec. IEEE-IAS Annu. Meeting, 1995,
pp. 1788–1794.
[101] L. E. B. Silva, B. K. Bose, and J. O. P. Pinto, “Recurrent-neural-network-based implementation of a programmable cascaded low-pass filter
used in stator flux synthesis of vector-controlled induction motor drive,”
IEEE Trans. Ind. Electron., vol. 46, pp. 662–665, June 1999.
[102] T. Sorsa, H. N. Koivo, and H. Koivisto, “Neural networks in process
fault diagnosis,” IEEE Trans. Syst., Man. Cybern., vol. 21, pp. 815–825,
July/Aug. 1991.
[103] D. F. Specht, “Probabilistic neural networks for classification, mapping,
or associative memory,” in Proc. IEEE Int. Conf. Neural Networks, July
1988, pp. 525–532.
[104] ——, “Probabilistic neural networks,” Neural Networks, vol. 3, pp. 109–118, 1990.
[105] A. Srinivasan and C. Batur, “Hopfield/ART-1 neural network-based
fault detection and isolation,” IEEE Trans. Neural Networks, vol. 5, pp.
890–899, Nov. 1994.
[106] W. E. Staib and R. B. Staib, “The intelligence arc furnace controller: A
neural network electrode position optimization system for the electric
arc furnace,” presented at the IEEE Int. Joint Conf. Neural Networks,
New York, NY, 1992.
[107] R. Steim, “Preprocessing data for neural networks,” AI Expert, pp.
32–37, Mar. 1993.
[108] K. Steinbuch and U. A. W. Piske, “Learning matrices and their applications,” IEEE Trans. Electron. Comput., vol. EC-12, pp. 846–862, Dec.
1963.
[109] F. Sun, Z. Sun, and P. Y. Woo, “Neural network-based adaptive controller
design of robotic manipulators with an observer,” IEEE Trans. Neural
Networks, vol. 12, pp. 54–67, Jan. 2001.
[110] M. K. Sundareshan and C. Askew, “Neural network-assisted variable
structure control scheme for control of a flexible manipulator arm,” Automatica, vol. 33, no. 9, pp. 1699–1710, 1997.
[111] R. S. Sutton, “Generalization in reinforcement learning: Successful examples using sparse coarse coding,” in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1996, vol. 8,
pp. 1038–1044.
[112] H. H. Szu, “Automatic fault recognition by image correlation neural network techniques,” IEEE Trans. Ind. Electron., vol. 40, pp. 197–208, Apr.
1993.
[113] J. Teeter and M. Y. Chow, “Application of functional link neural network to HVAC thermal dynamic system identification,” IEEE Trans.
Ind. Electron., vol. 45, pp. 170–176, Feb. 1998.
[114] L. Tsoukalas and J. Reyes-Jimenez, “Hybrid expert system-neural network methodology for nuclear plant monitoring and diagnostics,” in
Proc. SPIE Applications of Artificial Intelligence VIII, vol. 1293, Apr.
1990, pp. 1024–1030.
[115] R. E. Uhrig, “Application of artificial neural networks in industrial
technology,” in Proc. IEEE Int. Conf. Industrial Technology, 1994, pp.
73–77.
[116] A. T. Vemuri and M. M. Polycarpou, “Neural-network-based robust fault
diagnosis in robotic systems,” IEEE Trans. Neural Networks, vol. 8, pp.
1410–1420, Nov. 1997.
[117] G. K. Venayagamoorthy and R. G. Harley, “Experimental studies with a
continually online-trained artificial neural network controller for a turbo
generator,” in Proc. Int. Joint Conf. Neural Networks, vol. 3, Washington, DC, July 1999, pp. 2158–2163.
[118] B. W. Wah and G. J. Li, “A survey on the design of multiprocessing
systems for artificial intelligence applications,” IEEE Trans. Syst., Man,
Cybern., vol. 19, pp. 667–692, July/Aug. 1989.
[119] S. Weerasooriya and M. A. El-Sharkawi, “Identification and control of
a DC motor using back-propagation neural networks,” IEEE Trans. Energy Conversion, vol. 6, pp. 663–669, Dec. 1991.
[120] P. J. Werbos, “Beyond regression: New tools for prediction and analysis in the behavioral sciences,” Ph.D. dissertation, Harvard Univ., Cambridge, MA, 1974.
[121] ——, “Maximizing long-term gas industry profits in two minutes in Lotus using neural network methods,” IEEE Trans. Syst., Man, Cybern., vol. 19, pp. 315–333, Mar./Apr. 1989.
[122] J. R. Whiteley, J. F. Davis, A. Mehrotra, and S. C. Ahalt, “Observations
and problems applying ART2 for dynamic sensor pattern interpretation,”
IEEE Trans. Syst., Man, Cybern. A, vol. 26, pp. 423–437, July 1996.
[123] B. Widrow, DARPA Neural Network Study. Fairfax, VA: Armed Forces
Communications and Electronics Assoc. Int. Press, 1988.
[124] B. Widrow and M. E. Hoff Jr., “Adaptive switching circuits,” 1960 IRE
Western Electric Show Conv. Rec., pt. 4, pp. 96–104, Aug. 1960.
[125] N. Wiener, Cybernetics. Cambridge, MA: MIT Press, 1961.
[126] M. J. Willis, G. A. Montague, D. C. Massimo, A. J. Morris, and M.
T. Tham, “Artificial neural networks and their application in process
engineering,” in IEE Colloq. Neural Networks for Systems: Principles
and Applications, 1991, pp. 71–74.
[127] M. Wishart and R. G. Harley, “Identification and control of induction
machines using artificial neural networks,” IEEE Trans. Ind. Applicat.,
vol. 31, pp. 612–619, May/June 1995.
[128] J. M. Zurada, Introduction to Artificial Neural Networks. Boston, MA:
PWS–Kent, 1995.
Magali R. G. Meireles received the B.E. degree
from the Federal University of Minas Gerais, Belo
Horizonte, Brazil, in 1986, and the M.Sc. degree
from the Federal Center for Technological Education,
Belo Horizonte, Brazil, in 1998, both in electrical
engineering.
She is an Associate Professor in the Mathematics
and Statistics Department, Pontific Catholic University of Minas Gerais, Belo Horizonte, Brazil.
Her research interests include applied artificial
intelligence and engineering education. In 2001, she
was a Research Assistant in the Division of Engineering, Colorado School of
Mines, Golden, where she conducted research in the Mechatronics Laboratory.
Paulo E. M. Almeida (S’00) received the B.E. and
M.Sc. degrees from the Federal University of Minas
Gerais, Belo Horizonte, Brazil, in 1992 and 1996,
respectively, both in electrical engineering, and the
Dr.E. degree from São Paulo University, São Paulo,
Brazil.
He is an Assistant Professor at the Federal Center
for Technological Education of Minas Gerais, Belo
Horizonte, Brazil. His research interests are applied
artificial intelligence, intelligent control systems, and
industrial automation. In 2000–2001, he was a Visiting Scholar in the Division of Engineering, Colorado School of Mines, Golden,
where he conducted research in the Mechatronics Laboratory.
Dr. Almeida is a member of the Brazilian Automatic Control Society. He received a Student Award and a Best Presentation Award from the IEEE Industrial
Electronics Society at the 2001 IEEE IECON, held in Denver, CO.
Marcelo Godoy Simões (S’89–M’95–SM’98)
received the B.S. and M.Sc. degrees in electrical
engineering from the University of São Paulo, São
Paulo, Brazil, in 1985 and 1990, respectively, the
Ph.D. degree in electrical engineering from the
University of Tennessee, Knoxville, in 1995, and
the Livre-Docencia (D.Sc.) degree in mechanical
engineering from the University of São Paulo, in
1998.
He is currently an Associate Professor at the Colorado School of Mines, Golden, where he is working
to establish several research and education activities. His interests are in the
research and development of intelligent applications, fuzzy logic and neural
networks applications to industrial systems, power electronics, drives, machine
control, and distributed generation systems.
Dr. Simões is a recipient of a National Science Foundation (NSF)—Faculty
Early Career Development (CAREER) Award, which is the NSF’s most prestigious award for new faculty members, recognizing activities of teacher/scholars
who are considered most likely to become the academic leaders of the 21st
century.