IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 50, NO. 3, JUNE 2003

A Comprehensive Review for Industrial Applicability of Artificial Neural Networks

Magali R. G. Meireles, Paulo E. M. Almeida, Student Member, IEEE, and Marcelo Godoy Simões, Senior Member, IEEE

Abstract—This paper presents a comprehensive review of the industrial applications of artificial neural networks (ANNs) in the last 12 years. Common questions that arise to practitioners and control engineers while deciding how to use NNs for specific industrial tasks are answered. Workable issues regarding implementation details, training, and performance evaluation of such algorithms are also discussed, based on a chronological organization of the topologies and training methods effectively used in the past years. The most popular ANN topologies and training methods are listed and briefly discussed, as a reference for the application engineer. Finally, ANN industrial applications are grouped and tabulated by their main functions and by what they actually performed in the referenced papers. The authors prepared this paper bearing in mind that an organized and normalized review would help industrial managerial and operational personnel decide which kind of ANN topology and training method would be adequate for their specific problems.

Index Terms—Architecture, industrial control, neural network (NN) applications, training.

Manuscript received October 23, 2001; revised September 20, 2002. Abstract published on the Internet March 4, 2003. This work was supported by the National Science Foundation under Grant ECS 0134130.
M. R. G. Meireles was with the Colorado School of Mines, Golden, CO 80401 USA. She is now with the Mathematics and Statistics Department, Pontifical Catholic University of Minas Gerais, 30.535-610 Belo Horizonte, Brazil (e-mail: magali@pucminas.br).
P. E. M. Almeida was with the Colorado School of Mines, Golden, CO 80401 USA. He is now with the Federal Center for Technological Education of Minas Gerais, 30.510-000 Belo Horizonte, Brazil (e-mail: paulo@dppg.cefetmg.br).
M. G. Simões is with the Colorado School of Mines, Golden, CO 80401 USA (e-mail: msimoes@mines.edu).
Digital Object Identifier 10.1109/TIE.2003.812470

I. INTRODUCTION

In engineering and physics domains, algebraic and differential equations are used to describe the behavior and functioning properties of real systems and to create mathematical models to represent them. Such approaches require accurate knowledge of the system dynamics and the use of estimation techniques and numerical calculations to emulate the system operation. The complexity of the problem itself may introduce uncertainties, which can make the modeling unrealistic or inaccurate. Therefore, in practice, approximate analysis is used and linearity assumptions are usually made.

Artificial neural networks (ANNs) implement algorithms that attempt to achieve a neurologically related performance, such as learning from experience, making generalizations from similar situations, and judging states where poor results were achieved in the past. ANN history begins in the early 1940s. However, only in the mid-1980s did these algorithms become scientifically sound and capable of application. Since the late 1980s, ANNs have been utilized in a plethora of industrial applications.
Nowadays, ANNs are being applied to many real-world industrial problems, from functional prediction and system modeling (where physical processes are not well understood or are highly complex) to pattern recognition engines and robust classifiers, with the ability to generalize while making decisions about imprecise input data. The ability of ANNs to learn and approximate relationships between inputs and outputs is decoupled from the size and complexity of the problem [49]. Actually, as the relationships based on inputs and outputs are enriched, the approximation capability improves. ANNs offer ideal solutions for speech, character, and signal recognition. There are many different types of ANNs. Some of the more popular include the multilayer perceptron (MLP) (which is generally trained with the backpropagation-of-error algorithm), learning vector quantization, radial basis function (RBF), Hopfield, and Kohonen networks, to name a few. Some ANNs are classified as feedforward while others are recurrent (i.e., implement feedback), depending on how data are processed through the network. Another way of classifying ANN types is by their learning method (or training), as some ANNs employ supervised training, while others are referred to as unsupervised or self-organizing.

This paper concentrates on industrial applications of neural networks (NNs). It was found that the training methodology is more conveniently associated with a classification of how a certain NN paradigm is supposed to be used for a particular industrial problem. There are some important questions to answer in order to adopt an ANN solution for achieving accurate, consistent, and robust modeling. What is required to use an NN? How are NNs superior to conventional methods? What kind of problem functional characteristics should be considered for an ANN paradigm? What kind of structure and implementation should be used in accordance with the application in mind? This article brings such managerial questions together into a framework that can be used to evaluate where and how such technology fits industrial applications, by laying out a classification scheme by means of clustered concepts and distinctive characteristics of ANN engineering.

II. NN ENGINEERING

Before laying out the foundations for choosing the best ANN topology, learning method, and data handling for classes of industrial problems, it is important to understand how artificial intelligence (AI) evolved with the required computational resources. AI applications moved away from laboratory experiments to real-world implementations. Therefore, software complexity also became an issue, since conventional Von Neumann machines are not suitable for symbolic processing, nondeterministic computations, dynamic execution, parallel distributed processing, and management of extensive knowledge bases [118]. In many AI applications, the knowledge needed to solve a problem may be incomplete, because the source of the knowledge is unknown at the time the solution is devised, or because the environment may be changing and cannot be anticipated at design time. AI systems should be designed with an open concept that allows continuous refinement and acquisition of new knowledge. There are engineering problems for which finding the perfect solution requires a practically impossible amount of resources, while an acceptable solution would suffice. NNs can give good solutions to such classes of problems.
Choosing the best ANN topology, learning method, and data-handling scheme is itself an engineering task. The success of using an ANN for any application depends highly on the data processing (i.e., data handling before or during network operation). Once variables have been identified and data have been collected and are ready to use, one can process them in several ways, to squeeze more information out of them and to filter them. A common technique for coping with nonnormal data is to apply a nonlinear transform to the data. To apply a transform, one simply takes some function of the original variable and uses the functional transform as a new input to the model. Commonly used nonlinear transforms include powers, roots, inverses, exponentials, and logarithms [107].

A. Assessment of NN Performance

An ANN must be used in problems exhibiting knottiness, nonlinearity, and uncertainties that justify its utilization [45]. ANNs present the following features to cope with such complexities:
• learning from training data used for system identification: finding a set of connection strengths that will allow the network to carry out the desired computation [96];
• generalization from inputs not previously presented during the training phase: accepting an input and producing a plausible response determined by the internal ANN connection structure makes such a system robust against noisy data, a feature exploited in industrial applications [59];
• mapping of nonlinearities, making them suitable for identification in process control applications [90];
• parallel processing capability, allowing fast processing for large-scale dynamical systems;
• applicability to multivariable systems: they naturally process many inputs and have many outputs;
• usability as a black-box approach (with no prior knowledge about a system) and implementability on compact processors for space- and power-constrained applications.
In order to select a good NN configuration, there are several factors to take into consideration. The major points of interest regarding the ANN topology selection are related to network design, training, and practical considerations [25].

B. Training Considerations

Considerations such as determining the input and output variables, choosing the size of the training data set, initializing the network weights, choosing the training parameter values (such as learning rate and momentum rate), and selecting the training stopping criteria are important for several network topologies. There is no generic formula that can be used to choose the parameter values. Some guidelines can be followed as an initial trial. After a few trials, the network designer should have enough experience to set appropriate criteria that suit a given problem. The initial weights of an NN play a significant role in the convergence of the training method. Without a priori information about the final weights, it is common practice to initialize all weights randomly with small absolute values. In learning vector quantization and derived techniques, it is usually required to renormalize the weights at every training epoch. A critical parameter is the speed of convergence, which is determined by the learning coefficient. In general, it is desirable to have fast learning, but not so fast as to cause instability of the learning iterations. Starting with a large learning coefficient and reducing it as the learning process proceeds results in both fast learning and stable iterations.
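As a minimal sketch of these two guidelines, the fragment below initializes weights with small random values and decays the learning coefficient over epochs. The particular ranges and decay law are illustrative assumptions, not values prescribed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small random initial weights, as suggested when no a priori
# information about the final weights is available.
weights = rng.uniform(-0.1, 0.1, size=(8, 4))

def learning_rate(epoch, eta0=0.5, decay=0.05):
    """Start with a large learning coefficient and reduce it as
    the learning process proceeds."""
    return eta0 / (1.0 + decay * epoch)

for epoch in range(5):
    print(f"epoch {epoch}: eta = {learning_rate(epoch):.3f}")
```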
The momentum coefficients are usually set according to a schedule similar to the one used for the learning coefficients [128].

Selection of training data plays a vital role in the performance of a supervised NN. The number of training examples used to train an NN is sometimes critical to the success of the training process. If the number of training examples is not sufficient, the network cannot correctly learn the actual input–output relation of the system. If the number of training examples is too large, the network training time will be longer. For some applications, such as real-time adaptive neural control, training time is a critical variable. For others, such as training a network to perform fault detection, the training can be performed offline, and more training data are preferred over insufficient training data, to achieve greater network accuracy. Generally, rather than focusing on volume, it is better to concentrate on the quality and representational nature of the data set. A good training set should contain routine, unusual, and boundary-condition cases [8].

Popular criteria used to stop network training are a small mean-square training error and small changes in the network weights. The definition of how small is usually up to the network designer and is based on the desired accuracy level of the NN. In a motor bearing fault diagnosis process, for example, the authors used a learning rate of 0.01 and a momentum of 0.8; training was stopped when the root-mean-square error of the training set or the change in network weights was sufficiently small for that application (less than 0.005) [72]. Moreover, if any prior information about the relationship between inputs and outputs is available and used correctly, the network structure and training time can be reduced and the network accuracy can be significantly improved.

C. Network Design

Some of the design considerations include determining the number of input and output nodes to be used, the number of hidden layers in the network, and the number of hidden nodes used in each hidden layer. The number of input nodes is typically taken to be the same as the number of state variables. The number of output nodes is typically the number that identifies the general category of the state of the system. Each node constitutes a processing element and is connected through various weights to other elements. In the past, there was a general practice of increasing the number of hidden layers to improve training performance. Keeping the number of layers at three and adjusting the number of processing elements in the hidden layer can achieve the same goal. A trial-and-error approach is usually used to determine the number of hidden-layer processing elements, starting with a low number of hidden units and increasing this number as learning problems occur. Even though choosing these parameters is still a trial-and-error process, there are some guidelines that can be used (i.e., testing the network's performance). It is common practice to choose a set of training data and a set of testing data that are statistically significant and representative of the system under consideration. The training data set is used to train the NN, while the testing data set is used to test the network performance after the training phase finishes.
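A minimal sketch of such stopping criteria follows, using the error and weight-change thresholds cited above. Here `network_step` and `fake_step` are hypothetical placeholders standing in for one epoch of any supervised training algorithm; they are not part of the referenced work.

```python
import numpy as np

def rms_error(targets, outputs):
    return float(np.sqrt(np.mean((targets - outputs) ** 2)))

def train(network_step, targets, max_epochs=10000, tol=0.005):
    """Stop when the RMS training error or the change in network
    weights becomes sufficiently small, mirroring the stopping
    rule cited above."""
    previous_weights = None
    for epoch in range(max_epochs):
        weights, outputs = network_step()
        if rms_error(targets, outputs) < tol:
            return epoch  # error criterion met
        if previous_weights is not None and \
                np.max(np.abs(weights - previous_weights)) < tol:
            return epoch  # weight-change criterion met
        previous_weights = weights.copy()
    return max_epochs

targets = np.ones(4)
state = {"w": np.array([1.0]), "k": 0}

def fake_step():
    # Hypothetical stand-in for a real training epoch: the outputs
    # converge geometrically to the targets and the weights settle.
    state["k"] += 1
    state["w"] = state["w"] + 0.5 ** state["k"]
    outputs = targets * (1 - 0.5 ** state["k"])
    return state["w"], outputs

print(train(fake_step, targets))  # stops after a few epochs
```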
D. Practical Considerations

Practical considerations regarding network accuracy, robustness, and implementation issues must be addressed for real-world implementation. For ANN applications, estimation performance is usually considered good when pattern recognition achieves more than 95% accuracy in overall and comprehensive data recalls [25]. Selection and implementation of the network configuration need to be carefully studied, since it is desirable to use the smallest possible number of nodes while maintaining a suitable level of conciseness. Pruning algorithms try to make NNs smaller by trimming unnecessary links or units, so that the cost of runtime, memory, and hardware implementation can be minimized and generalization is improved. Depending on the application, some system functional characteristics are important in deciding which ANN topology should be used [81]. Table I summarizes the most common ANN structures used for pattern recognition, associative memory, optimization, function approximation, modeling and control, image processing, and classification purposes.

[Table I: Organization of NNs based on their functional characteristics.]

III. EVOLUTION OF UNDERLYING FUNDAMENTALS THAT PRECEDED INDUSTRIAL APPLICATIONS

While there are several tutorials and reviews discussing the full range of NN topologies, learning methods, and algorithms, the authors intend here to cover what has actually been applied in industry. An initial historical perspective is important to get a picture of the age of industrial applications, which started in 1988, just after the release of [123] by Widrow.

It is well known that the concept of NNs came into existence around the Second World War. In 1943, McCulloch and Pitts proposed the idea that a mind-like machine could be manufactured by interconnecting models based on the behavior of biological neurons, laying out the concept of neurological networks [77]. Wiener gave this new field the popular name cybernetics, whose principle is the interdisciplinary relationship among engineering, biology, control systems, and brain functions [125]. At that time, computer architecture was not fully defined and the research led to what is today defined as the Von Neumann-type computer. With the progress in research on the brain and computers, the objective changed from the mind-like machine to manufacturing a learning machine, for which Hebb's learning model was initially proposed [53]. In 1958, Rosenblatt, from the Cornell Aeronautical Laboratory, put together a learning machine, called the "perceptron." That was the predecessor of current NNs, and he gave specific design guidelines used by the early 1960s [91]. Widrow and Hoff proposed the "ADALINE" (ADAptive LINear Element), a variation on the perceptron, based on a supervised learning rule (the "error-correction rule") which could learn in a faster and more accurate way: synaptic strengths were changed in proportion to the error (the difference between what the output was and what it should have been) multiplied by the input. Such a scheme was successfully used for echo cancellation in telephone lines and is considered to be the first industrial application of NNs [124]. During the 1960s, the forerunner of current associative memory systems was the work of Steinbuch with his "Learning Matrix," which was a binary matrix accepting a binary vector as input, producing a binary vector as output, and capable of forming associations between pairs with a Boolean Hebbian learning procedure [108].
The perceptron received considerable excitement when it was first introduced, because of its conceptual simplicity. The ADALINE is a weighted sum of the inputs, together with a least-mean-square (LMS) algorithm to adjust the weights and to minimize the difference between the desired signal and the actual output. Because of the rigorous mathematical foundation of the LMS algorithm, the ADALINE has become a powerful tool for adaptive signal processing and adaptive control, leading to work on competitive learning and self-organization. However, Minsky and Papert proved mathematically that the perceptron could not be used for the class of problems defined as nonseparable logic functions [80]. Very few investigators conducted research on NNs during the 1970s. Albus developed his adaptive "Cerebellar Model Articulation Controller" (CMAC), which is a distributed table-lookup system based on his view of models of human memory [1].

In 1974, Werbos originally developed the backpropagation algorithm. Its first practical application was to estimate a dynamic model, to predict nationalism and social communications [120]. However, his work remained almost unknown in the scientific community for more than ten years. In the early 1980s, Hopfield introduced a recurrent-type NN, which was based on the Hebbian learning law. The model consisted of a set of first-order (nonlinear) differential equations that minimize a certain energy function [55]. In the mid-1980s, backpropagation was rediscovered by two independent groups, led by Parker and by Rumelhart et al., as the learning algorithm of feedforward NNs [88], [95]. Grossberg and Carpenter made significant contributions with the "Adaptive Resonance Theory" (ART) in the mid-1980s, based on the idea that the brain spontaneously organizes itself into recognition codes and that neurons organize themselves to tune to various specific patterns, defined as self-organizing maps [20]. The dynamics of the network were modeled by first-order differential equations based on implementations of pattern-clustering algorithms. Furthermore, Kosko extended some of the ideas of Grossberg and Hopfield to develop his adaptive "Bidirectional Associative Memory" (BAM) [67]. Hinton, Sejnowski, and Ackley developed the "Boltzmann Machine," which is a kind of Hopfield net that settles into solutions by a simulated annealing process as a stochastic technique [54].

Broomhead and Lowe first introduced "RBF networks" in 1988 [15]. Although the basic idea of RBF had been developed before, under the name method of potential functions, their work opened another NN frontier. Chen proposed functional-link networks (FLNs), where a nonlinear functional transform of the network inputs aimed at lower computational effort and fast convergence [22]. The 1988 DARPA NN study listed various NN applications, supporting the importance of such technology for commercial and industrial applications and triggering a lot of interest in the scientific community, eventually leading to applications to industrial problems. Since then, the application of ANNs to sophisticated systems has skyrocketed. NNs found widespread relevance in several different fields. Our literature review showed that practical industrial applications were reported in peer-reviewed engineering journals from as early as 1988.
Extensive use has been reported in pattern recognition and classification for image and speech recognition; optimization in the planning of actions, motions, and tasks; and modeling, identification, and control. Fig. 1 shows some industrial applications of NNs reported in the last 12 years. The main purpose here is to give an idea of the most used ANN topologies and training algorithms and to relate them to common fields in the industrial area. For each entry, the type of application, the ANN topology used, the training algorithm implemented, and the main authors are presented. The collected data give a good picture of what has actually migrated from academic research to practical industrial fields and show some of the authors and groups responsible for this migration.

IV. DILEMMATIC PIGEONHOLE OF NEURAL STRUCTURES

Choosing an ANN solution for an immediate application is a situation that requires a choice between options that are (or seem) equally unfavorable or mutually exclusive. Several issues must be considered from the problem's point of view. The main features of an ANN can be classified as follows [81].
• Topology of the network: multilayered, single-layered, or recurrent. The network is multilayered if it has distinct layers such as input, hidden, and output; there are no connections among the neurons within the same layer. If each neuron can be connected with every other neuron in the network through directed edges, except the output node, the network is called single-layered (i.e., there is no hidden layer). A recurrent network distinguishes itself from the feedforward topologies in that it has at least one feedback loop [49].
• Data flow: recurrent or nonrecurrent. In a nonrecurrent or feedforward model, the outputs always propagate from left to right in the diagrams: the outputs from the input-layer neurons propagate to the right, becoming inputs to the hidden-layer neurons, and then the outputs from the hidden-layer neurons propagate to the right, becoming inputs to the output-layer neurons. An NN in which the outputs can propagate in both directions, forward and backward, is called a recurrent model.
• Types of input values: binary, bipolar, or continuous. Neurons in an artificial network can be defined to process different kinds of signals. The most common types are binary (restricted to either 0 or 1), bipolar (either −1 or +1), and continuous (real numbers in a certain range).
• Forms of activation: linear, step, or sigmoid. Activation functions define the way neurons behave inside the network structure and, therefore, the kind of relationship that will occur between input and output signals.
A common classification of ANNs is based on the way in which their elements are interconnected. One study presents the approximate percentages of network utilization as: MLP, 81.2%; Hopfield, 5.4%; Kohonen, 8.3%; and others, 5.1% [49]. This section will cover the main types of networks that have been used in industrial applications in a reasonable number of reports. A comprehensive listing of all available ANN structures and topologies is out of the scope of this discussion.

A. MLPs

In this structure, each neuron output is connected to every neuron in the subsequent layer, the layers being connected in cascade with no connections between neurons in the same layer. A typical diagram of this structure is detailed in Fig. 2.

[Fig. 2: MLP basic topology.]
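A minimal sketch of the layered feedforward computation just described follows, assuming one hidden layer, sigmoid activations, and arbitrary random weights; the layer sizes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, weights, biases):
    """Forward pass of a fully connected MLP: each layer's output
    feeds the next layer in cascade, with no connections between
    neurons in the same layer."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(1)
# One hidden layer: 3 inputs -> 5 hidden units -> 2 outputs.
weights = [rng.standard_normal((5, 3)), rng.standard_normal((2, 5))]
biases = [np.zeros(5), np.zeros(2)]
print(mlp_forward(np.array([0.2, -0.7, 1.0]), weights, biases))
```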
MLP has been reported in several applications. Some examples are speed control of dc motors [94], [117]; diagnostics of induction motor faults [24], [25], [41], [42]; induction motor control [17], [18], [56], [127]; and current regulators for pulsewidth-modulation (PWM) rectifiers [31]. Maintenance and sensor failure detection was reported in [82], check valves operating in a nuclear power plant in [57], [114], and vibration monitoring in rolling-element bearings in [2]. It was widely applied in feedback control [19], [40], [52], [59], [87], [89], [109], [110] and in fault diagnosis of robotic systems [116]. This structure was also used in a temperature control system [63], [64], in monitoring feed water flow rate and component thermal performance of pressurized water reactors [61], and in fault diagnosis in a heat exchanger continuous stirred tank reactor system [102]. It was used in controllers for turbo generators [117], in digital current regulation of inverter drives [16], and in welding process modeling and control [4], [32]. The MLP was used in modeling chemical process systems [12], to produce quantitative estimations of the concentration of chemical components [74], and to select powder metallurgy materials and process parameters [23]. Optimization in the gas industry was reported in [121], as well as prediction of the daily natural gas consumption needed by gas utilities [65]. The MLP is indeed the most used structure and has spread out across several disciplines, such as identification and defect detection on woven fabrics [99], prediction of paper cure in the papermaking industry [39], steering control for backing up a truck [85], and modeling of plate rolling processes [46].

[Fig. 1: Selected industrial applications reported since 1989.]

B. Recurrent NNs (RNNs)

A network is called recurrent when the inputs to the neurons come from external inputs as well as from internal neurons, consisting of both feedforward and feedback connections between layers and neurons. Fig. 3 shows such a structure; it was demonstrated that recurrent networks can be effectively used for modeling, identification, and control of nonlinear dynamical systems [83].

[Fig. 3: Typical recurrent network structure.]

The trajectory-tracking problem of controlling the nonlinear dynamic model of a robot was evaluated using an RNN; this network was used to estimate the dynamics of the system and its inverse dynamic model [97]. It was also used to control robotic manipulators, facilitating the rapid execution of the adaptation process [60]. A recurrent network was used to approximate trajectory tracking to a very high degree of accuracy [27]. It was applied to estimate the spectral content of noisy periodic waveforms that are common in engineering processes [36].

The Hopfield network model is the most popular type of recurrent NN. It can be used as an associative memory and can also be applied to optimization problems. The basic idea of the Hopfield network is that it can store a set of exemplar patterns as multiple stable states. Given a new input pattern, which may be partial or noisy, the network can converge to the exemplar pattern nearest to the input pattern. As shown in Fig. 4, a Hopfield network consists of a single layer of neurons. The network is recurrent and fully interconnected (every neuron in the network is connected to every other neuron). Each input/output takes a discrete bipolar value of either −1 or +1 [81].

[Fig. 4: Hopfield network structure.]
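A minimal sketch of Hopfield storage and recall follows, under the usual assumptions: Hebbian outer-product storage with a zero diagonal and asynchronous threshold updates. The six-unit patterns are illustrative only.

```python
import numpy as np

def train_hopfield(patterns):
    """Store bipolar (+/-1) exemplar patterns in a symmetric weight
    matrix with a Hebbian outer-product rule (zero diagonal)."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)
    return W / n

def recall(W, state, steps=10):
    """Asynchronously update units until the state settles on a
    stored exemplar (or a spurious fixed point)."""
    state = state.copy()
    rng = np.random.default_rng(2)
    for _ in range(steps):
        for i in rng.permutation(len(state)):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = train_hopfield(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # corrupted copy of pattern 0
print(recall(W, noisy))                  # converges back to pattern 0
```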
A Hopfield network was used to show how to apply it to the problem of linear system identification, minimizing the least square of the error rates of the estimates of state variables [29]. A modified Hopfield structure was used to determine imperfections by the degree of orthogonality between the automatically extracted feature from the sent-through image and the class feature of early good samples; the performance measure used for such automatic feature extraction is based on a certain mini-max cost function useful for image classification [112]. Simulation results illustrated the Hopfield network's use, showing that this technique can be used to identify the frequency transfer functions of dynamic plants [28]. An approach to detect and isolate faults in linear dynamic systems was proposed in which the system parameters were estimated by a Hopfield network [105].

C. Nonrecurrent Unsupervised Kohonen Networks

A Kohonen network is a structure of interconnected processing units that compete for the signal. It uses unsupervised learning and consists of a single layer of computational nodes (and an input layer). This type of network uses lateral feedback, a form of feedback whose magnitude depends on the lateral distance from the point of application. Fig. 5 shows the architecture, with two layers. The first is the input layer and the second is the output layer, called the Kohonen layer.

[Fig. 5: Kohonen network structure.]

Every input neuron is connected to every output neuron with an associated weight. The network is nonrecurrent: input information propagates only from left to right. Continuous (rather than binary or bipolar) input values representing patterns are presented sequentially in time through the input layer, without specifying the desired output. The output neurons can be arranged in one or two dimensions. A neighborhood parameter, or radius, $r$, can be defined to indicate the neighborhood of a specific neuron. The Kohonen network has been used as a self-organizing map for classification [98] and for pattern recognition purposes in general.

D. CMAC

The input mapping of the CMAC algorithm can be seen as a set of multidimensional interlaced receptive fields, each one with finite and sharp borders. Any input vector to the network excites some of these fields, while the majority of the receptive fields remain unexcited and do not contribute to the corresponding output. The network output is formed by the weighted average of the excited receptive fields. Fig. 6 shows a schematic diagram of this structure; it depicts the nonlinear input mapping in the Albus approach and a hashing operation that can be performed to decrease the amount of memory needed to implement the receptive fields.

[Fig. 6: CMAC network structure.]

CMAC networks are considered local algorithms because, for a given input vector, only a few receptive fields will be active and contribute to the corresponding network output [3]. In the same way, the training algorithm for a CMAC network should affect only the weights corresponding to active fields, excluding the majority of inactive fields in the network. This increases the efficiency of the training process, minimizing the computational effort needed to perform adaptation in the whole network.
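A speculative one-dimensional sketch of this local computation follows: several offset tilings play the role of interlaced receptive fields, and training touches only the few active fields. The tiling scheme, resolution, and LMS-style update are assumptions; real CMAC implementations add the hashing step noted above.

```python
import numpy as np

class TinyCMAC:
    """1-D CMAC sketch: overlapping tilings act as receptive fields;
    only the few fields excited by an input are read or updated."""
    def __init__(self, n_tilings=8, n_tiles=32, lo=0.0, hi=1.0):
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.lo, self.width = lo, (hi - lo) / (n_tiles - 1)
        self.w = np.zeros((n_tilings, n_tiles))

    def _active(self, x):
        # Each tiling is offset by a fraction of a tile width.
        for t in range(self.n_tilings):
            offset = t / self.n_tilings * self.width
            yield t, int((x - self.lo + offset) / self.width) % self.n_tiles

    def predict(self, x):
        return sum(self.w[t, i] for t, i in self._active(x))

    def update(self, x, target, beta=0.3):
        # Local training: spread the error only over the active fields.
        err = target - self.predict(x)
        for t, i in self._active(x):
            self.w[t, i] += beta * err / self.n_tilings

net = TinyCMAC()
rng = np.random.default_rng(3)
for _ in range(200):
    x = rng.random()
    net.update(x, np.sin(2 * np.pi * x))
print(round(net.predict(0.25), 2))  # roughly approaches sin(pi/2) = 1.0
```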
CMAC was primarily applied to complex robotic systems involving multiple feedback sensors and multiple command variables. Common experiments involved control of the position and orientation of an object using a video camera mounted at the end of a robot arm, with objects moving at arbitrary orientations relative to the robot [79]. This network was also used for air-to-fuel ratio control of automotive fuel-injection systems; experimental results showed that the CMAC is very effective in learning the engine nonlinearities and in dealing with the significant time delays inherent in engine sensors [76].

E. Adaptive Resonance Theory (ART) (Recurrent, Unsupervised)

The main feature of ART, when compared to other similar structures, is its ability not to forget after learning. Usually, NNs are not able to learn new information without damaging what was previously learned. This is caused by the fact that, when a new pattern is presented to an NN in the learning phase, the network modifies the weights at the node inputs, which represent only what was previously learned. The ART network is recurrent and self-organizing. Its structure is shown in Fig. 7. It has two basic layers and no hidden layers. The input layer is also called the "comparing" layer, while the output layer is called the "recognizing" layer. The network is composed of two layers completely interconnected in both directions [58]. It was successfully used for sensor pattern interpretation problems [122], among others.

[Fig. 7: Adaptive resonance theory network.]

F. RBF Networks

Fig. 8 shows the basic structure of an RBF network. The input nodes pass the input values to the connecting arcs, and the first-layer connections are not weighted; thus, each hidden node receives each input value unaltered. The hidden nodes are the RBF units. The second layer of connections is weighted and the output nodes are simple summations. This network does not extend to more layers.

[Fig. 8: RBF network structure.]

For applications such as fault diagnosis, RBF networks offer advantages over the MLP: they are faster to train, because training of the two layers is decoupled [70]. This network was used to improve the quantity and quality of the galvannealed sheet produced on galvanizing lines, integrating various approaches, including quality monitoring, diagnosis, control, and optimization methods [13]. An RBF network was trained to evaluate and compare different grasping alternatives for a robotic hand, according to geometrical and technological aspects of the object surfaces as well as the specific task to be accomplished. The adoption of the RBF topology was justified by two reasons [34].
• In most cases, it presents higher training speed when compared with ANNs based on backpropagation training methods.
• It allows an easier optimization of performance, since the only parameter that can be used to modify its structure is the number of neurons in the hidden layer.
Results using RBF networks were presented to illustrate that it is possible to successfully control a generator system [43]. Power electronic drive controllers have also been implemented with these networks, in digital signal processors (DSPs), to attain robust properties [38].
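A minimal sketch of the decoupled two-stage training mentioned above: the hidden Gaussian units are fixed first (here, centers are simply drawn from the data and the width is an assumed constant), and only the weighted output layer is then solved, by linear least squares. The target function is illustrative.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Hidden layer: unweighted inputs hit Gaussian RBF units."""
    d = X[:, None, :] - centers[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=2) / (2 * width ** 2))

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])

# Stage 1: fix the hidden layer (centers drawn from the data).
centers = X[rng.choice(len(X), size=15, replace=False)]
H = rbf_design(X, centers, width=0.3)

# Stage 2: solve the weighted output layer by linear least squares;
# the two stages are decoupled, which makes training fast.
w, *_ = np.linalg.lstsq(H, y, rcond=None)

X_test = np.array([[0.5]])
print(rbf_design(X_test, centers, 0.3) @ w, np.sin(1.5))
```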
G. Probabilistic NNs (PNNs)

PNNs are somewhat similar in structure to MLPs. Some basic differences between them are the use of activation by exponential functions and the connection patterns between neurons: the neurons at the internal layers are not fully connected, depending on the application at hand. Fig. 9 depicts this structure, showing its basic differences from the ordinary MLP structure. PNN training is normally easy and instantaneous, because of the smaller number of connections.

[Fig. 9: Probabilistic ANN structure.]

A practical advantage is that, unlike other networks, a PNN operates completely in parallel and the signal flows in a unique direction, without a need for feedback from individual neurons to the inputs. It can be used for mapping, classification, and associative memory, or to directly estimate a posteriori probabilities [103], [104]. Probabilistic NNs were used to assist operators while identifying transients in nuclear power plants, such as a plant accident scenario, an equipment failure, or an external disturbance to the system, at the earliest stages of their development [6].

H. Polynomial Networks

Fig. 10 depicts a polynomial network. It has its topology formed during the training process; due to this feature, it is defined as a plastic network. The neuron activation function is based on elementary polynomials of arbitrary order. In this example, the network has seven inputs, although it uses only five of them. This is due to the automatic input selection capability of the training algorithm. Automatic feature selection is very useful in control applications when the plant model order is unknown. Each neuron output can be expressed by a second-order polynomial function $y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2$, where $x_1$ and $x_2$ are the inputs, the $w_i$ are polynomial coefficients equivalent to the network weights, and $y$ is the neuron output.

[Fig. 10: Polynomial ANN structure.]

The Group Method of Data Handling (GMDH) is a statistics-based training method largely used in modeling economic, ecological, environmental, and medical problems. The GMDH training algorithm can be used to adjust the polynomial coefficients and to find the network structure. This algorithm employs two sets of data: one for estimating the network weights and the other for testing which neurons should survive the training process. A new form of implementation of a filter was proposed using a combination of a recurrent NN and a polynomial NN [101].

I. FLNs

Since NNs are used for adaptive identification and control, the learning capabilities of the networks can have significant effects on the performance of closed-loop systems. If the information content of the data input to a network can be modified online, the network will more easily extract salient features of the data. The functional link acts on an element of an input vector, or on the entire input vector, by generating a set of linearly independent functions and then evaluating these functions with the pattern as the argument. Thus, both the training time and the training error of the network can be improved [113]. Fig. 11 shows a functional-link NN, which can be considered a one-layer feedforward network with an input preprocessing element; only the weights in the output layer are adjusted [66].

[Fig. 11: Functional-link ANN structure.]

The application of an FLN was presented for heating, ventilating, and air conditioning (HVAC) thermal dynamic system identification and control. The use of an NN provided a means of adapting a controller online, in an effort to minimize a given cost index. The identification networks demonstrated the capacity to learn changes in the plant dynamics and to accurately predict future plant behavior [113].
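A minimal sketch of a functional-link expansion follows, assuming one particular set of linearly independent functions (powers and sinusoids) and an LMS update of the single trained layer; the target function is illustrative.

```python
import numpy as np

def functional_link(x):
    """Fixed, linearly independent expansions of the input pattern;
    only the output weights acting on them are trained."""
    return np.concatenate([x, x ** 2, np.sin(np.pi * x), np.cos(np.pi * x)])

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2   # nonlinear target

w = np.zeros(functional_link(X[0]).size)
for epoch in range(50):
    for xi, ti in zip(X, y):
        h = functional_link(xi)
        w += 0.05 * (ti - w @ h) * h          # LMS update of one layer

x_test = np.array([0.5, -0.4])
print(w @ functional_link(x_test), np.sin(np.pi * 0.5) + 0.16)
```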
A robust ANN controller for the motion control of rigid-link electrically driven robots using an FLN has been presented; the method did not require the robot dynamics to be exactly known [69]. Multilayer feedforward and FLN forecasters were used to model the complex relationship between weather parameters and previous gas intake with future consumption [65]. The FLN was used to improve performance in the face of unknown nonlinear characteristics by adding nonlinear effects to the linear optimal controller of robotic systems [66].

J. Functional Polynomial Networks (FPNs)

This network structure merges the functional-link and polynomial network models, resulting in a very powerful ANN model due to the automatic input selection capability of the polynomial networks. The FPN presents advantages such as fast convergence, no local-minima problem, a structure automatically defined by the training process, and no adjustment of learning parameters. It has been tested for speed control of a dc motor and the results have been compared with the ones provided by an indirect adaptive control scheme based on MLPs trained by backpropagation [101].

K. Cellular NNs (CNNs)

The most general definition of such networks is that they are arrays of identical dynamical systems, called cells, which are only locally connected; only adjacent cells interact directly with each other [78]. In the simplest case, a CNN is an array of simple, identical, nonlinear, dynamical circuits placed on a two-dimensional (2-D) geometric grid, as shown in Fig. 12. If these grids are duplicated in a three-dimensional (3-D) form, a multilayer CNN can be constructed [30].

[Fig. 12: Common grids of CNNs.]

It is an efficient architecture for performing image processing and pattern recognition [51]. This kind of network has been applied to problems of image classification for quality control. Guglielmi et al. [47] described fluorescent magnetic particle inspection, which is a nondestructive method for quality control of ferromagnetic materials.

V. TRAINING METHODS

There are basically two main groups of training (or learning) algorithms: supervised learning (which includes reinforcement learning) and unsupervised learning. Once the structure of an NN has been selected, a training algorithm must be attached, to minimize the prediction error made by the network (for supervised learning) or to compress the information from the inputs (for unsupervised learning). In supervised learning, the correct results (target values, desired outputs) are known and are given to the ANN during training, so that the ANN can adjust its weights to try to match its outputs to the target values. After training, an ANN is tested by giving it only input values, not target values, and seeing how close the network comes to outputting the correct target values. Unsupervised learning involves no target values; it tries to auto-associate information from the inputs with an intrinsic reduction of the data dimensionality, similar to extracting principal components in linear systems. This is the role of the training algorithms (i.e., fitting the model represented by the network to the training data available). The error of a particular configuration of the network can be determined by running all the training cases through the network and comparing the actual outputs generated with the desired or target outputs or clusters.
[Fig. 13: Supervised learning scheme.]

In considering learning algorithms, one is interested in whether a particular algorithm converges, in its speed of convergence, and in the computational complexity of the algorithm. In supervised learning, a set of inputs and correct outputs is used to train the network. Before the learning algorithms are applied to update the weights, all the weights are initialized randomly [84]. The network, using this set of inputs, produces its own outputs. These outputs are compared with the correct outputs and the differences are used to modify the weights, as shown in Fig. 13. A special case of supervised learning is reinforcement learning, shown in Fig. 14, where there is no set of inputs and correct outputs: training is commanded only by signals indicating if the produced output is bad or good, according to a defined criterion.

[Fig. 14: Reinforcement learning scheme.]

Fig. 15 shows the principles of unsupervised learning, also known as self-organized learning, where a network develops its classification rules by extracting information from the inputs presented to it. In other words, by using the correlation of the input vectors, the learning rule changes the network weights to group the input vectors into clusters. By doing so, similar input vectors will produce similar network outputs, since they belong to the same cluster.

[Fig. 15: Unsupervised learning scheme.]

A. Early Supervised Learning Algorithms

Early learning algorithms were designed for single-layer NNs. They are generally more limited in their applicability, but their importance in history is remarkable.

Perceptron Learning: A single-layer perceptron is trained as follows.
1) Randomly initialize all the network weights.
2) Apply the inputs and calculate the sum of each unit, $net_j = \sum_i w_{ij} x_i$.
3) The outputs from each unit are

$$o_j = \begin{cases} 1, & \text{if } net_j > \text{threshold} \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

4) Compute the error $e_j = t_j - o_j$, where $t_j$ is the known desired output value.
5) Update each weight as $w_{ij}(k+1) = w_{ij}(k) + \eta\, e_j\, x_i$.
6) Repeat steps 2)–5) until the errors reach a satisfactory level.

LMS or Widrow–Hoff Learning: The LMS algorithm is quite similar to the perceptron learning algorithm. The differences are as follows.
1) The error is based on the sum of inputs to the unit, $net_j$, rather than the binary-valued output of the unit, $o_j$. Therefore,

$$e_j = t_j - net_j. \qquad (2)$$

2) The linear sum of the inputs is passed through a bipolar sigmoid function, which produces the output +1 or −1, depending on the polarity of the sum.
This algorithm can be used in structures such as RBF networks, where it was successfully applied [43], [70]. Some CMAC approaches can also use this algorithm to adapt a complex robotic system involving multiple feedback sensors and multiple command variables [79].

Grossberg Learning: Sometimes known as in-star and out-star training, this algorithm is updated as follows:

$$w(k+1) = w(k) + \eta\,[d(k) - w(k)] \qquad (3)$$

where $d(k)$ could be the desired input values (in-star training) or the desired output values (out-star training), depending on the network structure.
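As a minimal sketch of steps 1)–6), the following trains a single-layer perceptron on a hypothetical linearly separable task (an AND gate); the learning rate and the threshold at zero are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])            # AND gate: linearly separable

w = rng.uniform(-0.1, 0.1, size=2)    # step 1: random initial weights
b = rng.uniform(-0.1, 0.1)
eta = 0.2

for epoch in range(20):
    errors = 0
    for xi, ti in zip(X, t):
        net = w @ xi + b              # step 2: weighted sum
        o = 1 if net > 0 else 0       # step 3: threshold output, (1)
        e = ti - o                    # step 4: error
        w += eta * e * xi             # step 5: weight update
        b += eta * e
        errors += abs(e)
    if errors == 0:                   # step 6: stop when errors settle
        break

print([1 if w @ xi + b > 0 else 0 for xi in X])  # [0, 0, 0, 1]
```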
B. First-Order Gradient Methods

Backpropagation: Backpropagation is a generalization of the LMS algorithm. In this algorithm, an error function is defined as the mean-square difference between the desired output and the actual output of the feedforward network [45]. It is based on steepest-descent techniques extended to each of the layers in the network by the chain rule. Hence, the algorithm computes the partial derivatives $\partial E / \partial w$ of the error function with respect to the weights. The error function is defined as $E = \frac{1}{2}\sum_j (d_j - y_j)^2$, where $d_j$ is the desired output and $y_j$ is the network output. The objective is to minimize the error function by taking the error gradient with respect to the parameter or weight vector $w$ that is to be adapted. The weights are then updated by using

$$w(k+1) = w(k) - \eta\,\frac{\partial E}{\partial w(k)} \qquad (4)$$

where $\eta$ is the learning rate and

$$\frac{\partial E}{\partial w(k)} = \nabla E(k) \qquad (5)$$

is the error gradient. This algorithm is simple to implement and computationally less complex than other modified forms. Despite some disadvantages, it is popularly used and there are numerous extensions to improve it. Some of these techniques will be presented.

Backpropagation With Momentum (BPM): The basic improvement to the backpropagation algorithm is to introduce a momentum term in the weight-updating equation

$$\Delta w(k+1) = -\eta\,\frac{\partial E}{\partial w(k)} + \alpha\,\Delta w(k) \qquad (6)$$

where the momentum factor $\alpha$ is commonly selected inside [0, 1]. Adding the momentum term improves the convergence speed and helps keep the network from being trapped in a local minimum.

A modification to (6) was proposed in 1990, inserting an additional term weighted by a constant $\lambda$ defined by the user [84]

$$\Delta w(k+1) = -\eta\,\frac{\partial E}{\partial w(k)} + \alpha\,\Delta w(k) + \lambda\,\Delta w(k-1). \qquad (7)$$

The idea was to reduce the possibility of the network being trapped in a local minimum.

Delta–Bar–Delta (DBD): The DBD learning rules use adaptive learning rates to speed up the convergence. The adaptive learning rate adopted is based on a local optimization method. This technique uses gradient descent for the search direction and then applies individual step sizes for each weight. This means the actual direction taken in weight space is not necessarily along the line of the steepest gradient. If the weight updates between consecutive iterations are in opposite directions, the step size is decreased; otherwise, it is increased. This is prompted by the idea that, if the weight changes are oscillating, the minimum is between the oscillations and a smaller step size might find that minimum. The step size may be increased again once the error has stopped oscillating. If $\eta_{ij}(k)$ denotes the learning rate for the weight $w_{ij}$, then

$$w_{ij}(k+1) = w_{ij}(k) - \eta_{ij}(k+1)\,\frac{\partial E}{\partial w_{ij}(k)} \qquad (8)$$

and the learning-rate increment $\Delta\eta_{ij}(k)$ is as follows:

$$\Delta\eta_{ij}(k) = \begin{cases} \kappa, & \bar{\delta}_{ij}(k-1)\,\delta_{ij}(k) > 0 \\ -\varphi\,\eta_{ij}(k), & \bar{\delta}_{ij}(k-1)\,\delta_{ij}(k) < 0 \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

where

$$\delta_{ij}(k) = \frac{\partial E}{\partial w_{ij}(k)} \qquad (10)$$

$$\bar{\delta}_{ij}(k) = (1-\theta)\,\delta_{ij}(k) + \theta\,\bar{\delta}_{ij}(k-1). \qquad (11)$$

The positive constant parameters ($\kappa$, $\varphi$, $\theta$) are specified by the user. The quantity $\bar{\delta}_{ij}(k)$ is basically an exponentially decaying trace of gradient values. When $\kappa$ and $\varphi$ are set to zero, the learning rates assume a constant value as in the standard backpropagation algorithm. Using momentum along with the DBD algorithm can enhance performance considerably. However, it can make the search diverge wildly, especially if $\kappa$ is moderately large. The reason is that momentum magnifies learning-rate increments and quickly leads to inordinately large learning steps. One possible solution is to keep the factor $\kappa$ very small, but this can easily lead to a slow increase in $\eta$ and little speedup [84].
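To make update (6) concrete, the sketch below applies gradient descent with momentum to a linear least-squares toy problem; the data, rates, and loop count are illustrative assumptions, not values from the text.

```python
import numpy as np

def grad_E(w, X, d):
    """Gradient of the mean-square error E = 0.5 * ||d - X w||^2."""
    return -X.T @ (d - X @ w)

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 3))
d = X @ np.array([1.0, -2.0, 0.5])    # known target weights

w = np.zeros(3)
delta_w = np.zeros(3)
eta, alpha = 0.005, 0.9               # learning rate; momentum in [0, 1]

for k in range(200):
    # Update (6): new step = gradient step plus a fraction of the old step.
    delta_w = -eta * grad_E(w, X, d) + alpha * delta_w
    w += delta_w

print(np.round(w, 3))                 # approaches [1.0, -2.0, 0.5]
```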
C. Second-Order Gradient Methods

These methods use the Hessian matrix $H$, which is the matrix of second derivatives of $E$ with respect to the weights $w$. This matrix contains information about how the gradient changes in different directions in weight space

$$H = \frac{\partial^2 E}{\partial w^2}. \qquad (12)$$

Newton Method: The Newton method weight update is processed as follows:

$$w(k+1) = w(k) - H^{-1}(k)\,\frac{\partial E}{\partial w(k)}. \qquad (13)$$

However, the Newton method is not commonly used because computing the Hessian matrix is computationally expensive. Furthermore, the Hessian matrix may not be positive definite at every point in the error surface. To overcome this problem, several methods have been proposed to approximate the Hessian matrix [84].

Gauss–Newton Method: The Gauss–Newton method produces an $n \times n$ matrix that is an approximation to the Hessian matrix, represented by

$$H \approx J^T(k)\,J(k) \qquad (14)$$

where $J$ is the Jacobian matrix of the network errors with respect to the weights. However, the Gauss–Newton method may still have ill conditioning if $J^T J$ is close to singular or is singular [75].

Levenberg–Marquardt (LM) Method: The LM method overcomes this difficulty by including an additional term, which is added to the Gauss–Newton approximation of the Hessian, giving

$$H \approx J^T(k)\,J(k) + \mu I \qquad (15)$$

where $\mu$ is a small positive value and $I$ is the identity matrix. The value of $\mu$ could also be made adaptive by having

$$\mu(k+1) = \begin{cases} \mu(k)\,\beta, & \text{if } E(k+1) > E(k) \\ \mu(k)/\beta, & \text{if } E(k+1) < E(k) \end{cases} \qquad (16)$$

where $\beta$ is a value defined by the user. It is important to notice that, when $\mu$ is large, the algorithm becomes backpropagation with learning rate $1/\mu$ and, when $\mu$ is small, the algorithm becomes Gauss–Newton. An NN trained by using this algorithm can be found in the diagnosis of various motor bearing faults through appropriate measurement and interpretation of motor bearing vibration signals [72].

D. Reinforcement Learning Algorithms

Reinforcement learning has one of its roots in psychology, from the idea of rewards and penalties used to teach animals to do simple tasks with small amounts of feedback information. Barto and others proposed the adaptive critic learning algorithm to solve discrete-domain decision-making problems in the 1980s. Their approach was later generalized to the NN field by Sutton, who used CMAC and other ANN structures to learn paths for mobile robots in maze-like environments [111].

Linear Reward–Penalty Learning: When the reinforcement signal is positive (+1), the learning rule is

$$p_i(k+1) = p_i(k) + \alpha\,[1 - p_i(k)] \qquad (17)$$

$$p_j(k+1) = (1-\alpha)\,p_j(k), \quad j \neq i. \qquad (18)$$

If the reinforcement signal is negative (−1), the learning rule is

$$p_i(k+1) = (1-\beta)\,p_i(k) \qquad (19)$$

$$p_j(k+1) = \frac{\beta}{n-1} + (1-\beta)\,p_j(k), \quad j \neq i \qquad (20)$$

where $\alpha$ and $\beta$ are learning rates, $p_i(k)$ denotes the probability of action $i$ at iteration $k$, and $n$ is the number of actions taken. For positive reinforcement, the probability of the current action is increased with a relative decrease in the probabilities of the other actions. The adjustment is reversed in the case of negative reinforcement.

Associative Search Learning: In this algorithm, the weights are updated as follows [9]:

$$\Delta w_i(k) = \eta\, r(k)\, e_i(k) \qquad (21)$$

where $r(k)$ is the reinforcement signal and $e_i(k)$ is the eligibility. Positive $r$ indicates the occurrence of a rewarding event and negative $r$ indicates the occurrence of a punishing event. It can be regarded as a measure of the change in the value of a performance criterion. Eligibility reflects the extent of activity in the pathway or connection link. Exponentially decaying eligibility traces can be generated using the following linear difference equation:

$$e_i(k+1) = \delta\, e_i(k) + (1-\delta)\, y(k)\, x_i(k) \qquad (22)$$

where $\delta$ determines the trace decay rate, $x_i(k)$ is the input, and $y(k)$ is the output.

Adaptive Critic Learning: The weights update in a critic network is as follows [9]:

$$\Delta v_i(k) = \eta\,[r(k) + \gamma\,p(k) - p(k-1)]\,\bar{x}_i(k) \qquad (23)$$

where $\eta$ is the learning rate, $\gamma$ is a constant discount factor, $r(k)$ is the reinforcement signal from the environment, $\bar{x}_i(k)$ is the trace of the input variable $x_i$, and $p(k)$ is the prediction at time $k$ of eventual reinforcement, which can be described as a linear function $p(k) = \sum_i v_i(k)\,x_i(k)$. The adaptive critic network output $\hat{r}(k)$, the improved or internal reinforcement signal, is computed from the predictions as follows:

$$\hat{r}(k) = r(k) + \gamma\,p(k) - p(k-1). \qquad (24)$$

E. Unsupervised Learning

Hebbian Learning: Weights are updated as follows:

$$\Delta w_{ij}(k) = \eta\, x_i(k)\, y_j(k) \qquad (25)$$

$$w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}(k) \qquad (26)$$

where $w_{ij}$ is the weight from the $i$th unit to the $j$th unit at time step $k$, $x_i$ is the excitation level of the source unit or $i$th input unit, and $y_j$ is the excitation level of the destination unit or $j$th output unit. In this system, learning is a purely local phenomenon, involving only two units and a synapse. No global feedback system is required for the neural pattern to develop. A special case of Hebbian learning is correlation learning, which uses binary activations and in which $y_j$ is defined as the desired excitation level for the destination unit. While Hebbian learning is performed in unsupervised environments, correlation learning is supervised [128].
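A minimal sketch of the local updates (25)–(26) follows; the toy source/destination activities are assumptions, and no weight normalization is included, so the weights grow without bound, as plain Hebbian learning does.

```python
import numpy as np

def hebbian_step(W, x, y, eta=0.1):
    """Updates (25)-(26): strengthen w_ij in proportion to the joint
    activity of source unit i and destination unit j. Purely local:
    each weight change involves only its own two units."""
    return W + eta * np.outer(x, y)

rng = np.random.default_rng(8)
W = np.zeros((4, 2))
for _ in range(100):
    x = (rng.random(4) > 0.5).astype(float)
    y = np.array([x[0] * x[1], x[2] * x[3]])   # correlated destination units
    W = hebbian_step(W, x, y)
print(np.round(W, 1))   # weights grow where source and destination co-fire
```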
Boltzmann Machine Learning: The Boltzmann machine training algorithm uses a kind of stochastic technique known as simulated annealing, to avoid being trapped in the local minima of the network energy function. The algorithm is as follows.
1) Initialize the weights.
2) Calculate the activations as follows.
a) Select an initial temperature.
b) Until thermal equilibrium is reached, repeatedly calculate the probability that the $i$th unit is active by (27).
c) Exit when the lowest temperature is reached. Otherwise, reduce the temperature by a certain annealing schedule and repeat step 2)

$$P_i = \frac{1}{1 + e^{-net_i/T}}. \qquad (27)$$

Above, $T$ is the temperature, $net_i$ is the total input received by the $i$th unit, and the activation level of the unit is set according to this probability.

Kohonen Self-Organizing Learning: The network is trained according to the following algorithm, frequently called the "winner-takes-all" rule.
1) Apply an input vector $x$.
2) Calculate the distance $d_j$ (in $n$-dimensional space) between $x$ and the weight vector $w_j$ of each unit. In Euclidean space, this is calculated as follows:

$$d_j = \| x - w_j \| = \sqrt{\sum_i (x_i - w_{ij})^2}. \qquad (28)$$

3) The unit that has the weight vector closest to $x$ is declared the winner unit. This weight vector, called $w_c$, becomes the center of a group of weight vectors that lie within a distance $d$ from $w_c$.
4) For all weight vectors within a distance $d$ of $w_c$, train this group of nearby weight vectors according to the formula that follows:

$$w_j(k+1) = w_j(k) + \eta(k)\,[x - w_j(k)]. \qquad (29)$$

5) Perform steps 1)–4), cycling through each input vector until convergence.
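A minimal sketch of the winner-takes-all steps 1)–5) follows, assuming a ten-unit one-dimensional map, a decaying learning rate, and a shrinking integer neighborhood; none of these schedules are prescribed in the text.

```python
import numpy as np

rng = np.random.default_rng(9)
data = rng.random((500, 2))            # input vectors in the unit square
W = rng.random((10, 2))                # weight vector of each map unit

for k, x in enumerate(data):
    d = np.linalg.norm(x - W, axis=1)          # step 2: distances (28)
    c = int(np.argmin(d))                      # step 3: winner unit
    eta = 0.5 * (1 - k / len(data))            # decaying learning rate
    radius = max(1, int(3 * (1 - k / len(data))))
    lo, hi = max(0, c - radius), min(len(W), c + radius + 1)
    # Step 4: move the winner and its neighbors toward the input (29).
    W[lo:hi] += eta * (x - W[lo:hi])

print(np.round(W, 2))  # weight vectors spread out to cover the inputs
```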
F. Practical Considerations

NNs are unsurpassed at identifying patterns or trends in data and are well suited for prediction or forecasting needs, including sales and customer research, data validation, risk management, and industrial process control. One of the fascinating aspects of the practical implementation of NNs in industrial applications is the ability to manage data interaction between electrical and mechanical behavior and, often, other disciplines as well. The majority of the reported applications involve fault diagnosis and detection, quality control, pattern recognition, and adaptive control [14], [44], [74], [115]. Supervised NNs can mimic the behavior of human control systems, as long as data corresponding to the human operator and the control input are supplied [7], [126]. Most of the existing successful applications in control use supervised learning, or some form of reinforcement learning that is also supervised. Unsupervised learning is not suitable, particularly for online control, due to the slow adaptation and the time required for the network to settle into stable conditions. Unsupervised learning schemes are used mostly for pattern recognition, by grouping patterns into a number of clusters or classes.

There are some advantages of NNs over multiple data regression. There is no need to select the most important independent variables in the data set: the synapses associated with irrelevant variables readily show negligible weight values, while relevant variables present significant synapse weight values. There is also no need to propose a model function, as required in multiple regression. The learning capability of NNs allows them to discover more complex and subtle interactions between the independent variables, contributing to the development of a model with maximum precision. NNs are intrinsically robust, showing more immunity to noise, an important factor in modeling industrial processes.

NNs have been applied within industrial domains to address the inherent complexity of interacting processes in the absence of robust analytical models of real industrial processes. In many cases, network topologies and training parameters are systematically varied until satisfactory convergence is achieved. Currently, the most widely used algorithm for training MLPs is the backpropagation algorithm. It minimizes the mean-square error between the desired and the actual output of the network. The optimization is carried out with a gradient-descent technique. There are two critical issues in network learning: estimation error and training time. These issues may be affected by the network architecture and the training set. The network architecture includes the number of hidden nodes, the number of hidden layers, and the values of the learning parameters. The training set is related to the number of training patterns, inaccuracies of the input data, and preprocessing of the data.

The backpropagation algorithm does not always find the global minimum, but may stop at a local minimum. In practice, the type of minimum has little importance, as long as the desired mapping or classification is reached with the desired accuracy. The optimization criterion of the backpropagation algorithm is not very good from the pattern recognition point of view: the algorithm minimizes the square error between the actual and the desired output, not the number of faulty classifications, which is the main goal in pattern recognition. The algorithm is too slow for practical applications, especially if many hidden layers are used. In addition, a backpropagation net has poor memory: when the net learns something new, it forgets the old. Despite its shortcomings, backpropagation is broadly used. Although the backpropagation algorithm has been a significant milestone, many attempts have been made to speed up its convergence, and significant improvements are observed by using various second-order approaches, namely, Newton's method, conjugate gradients, or the LM optimization technique [5], [10], [21], [48]. The issues to be dealt with are [84], [102] as follows:
1) slow convergence speed;
2) sensitivity to initial conditions;
3) trapping in local minima;
4) instability if the learning rate is too large.

One of the alternatives for the problem of being trapped in a local minimum is adding the momentum term using the BPM, which also improves the convergence speed. Another alternative, used when a backpropagation algorithm is difficult to implement, as in analog hardware, is the random weight change (RWC). This algorithm has been shown to be immune to offset sensitivity and nonlinearity errors. It is a stochastic learning method that makes sure that the error function decreases on average, even though it may be going up or down at any one time. It is often called simulated annealing because of its operational similarity to annealing processes [73].

Second-order gradient methods use the matrix of second derivatives of $E$ with respect to the weights $w$. However, computing this matrix is computationally expensive, and the methods presented try to approximate it to make the algorithms more accessible. Linear reward–penalty, associative search, and adaptive critic algorithms are characterized as a special case of supervised learning called reinforcement learning. They do not need to explicitly compute derivatives, and computation of derivatives usually introduces a lot of high-frequency noise in the control loop.
Linear reward–penalty, associative search, and adaptive critic algorithms are characterized as a special case of supervised learning called reinforcement learning. They do not need to explicitly compute derivatives; computation of derivatives usually introduces a lot of high-frequency noise in the control loop. Therefore, they are very suitable for some complex systems, where basic training algorithms may fail or produce suboptimal results. On the other hand, those methods present slower learning processes and, because of this, they are adopted especially in cases where only a single bit of information (for example, whether the output is right or wrong) is available.

VI. TAXONOMY OF NN APPLICATIONS

From the viewpoint of industrial applications, ANN applications can be divided into four main categories.

A. Modeling and Identification

Modeling and identification are techniques to describe the physical principles existing between the input and the output of a system. The ANN can approximate these relationships independently of the size and complexity of the problem. It has been found to be effective at learning discriminant patterns from a body of examples. The MLP is used as the basic structure for a wide range of applications [4], [12], [17], [18], [32], [46], [56], [85], [94], [119], [127]. The Hopfield network can be used in identification problems of linear time-varying or time-invariant systems [28]. Recurrent network topologies [36], [83] have received considerable attention for the identification of nonlinear dynamical systems. A functional-link NN (FLN) approach was used to perform thermal dynamical system identification [113].

B. Optimization and Classification

Optimization is often required for design, planning of actions, motions, and tasks. However, as in the well-known Traveling Salesman Problem, many parameters can make the amount of calculation tremendous, so that ordinary methods cannot be applied. An effective approach is to find the optimal solution by defining an energy function and using the NN, with its parallel processing, learning, and self-organizing capabilities, to operate in such a way that the energy is reduced. It is shown that application of this optimal approach makes effective use of ANN sensing, recognizing, and forecasting capabilities in the control of robotic manipulators with impact taken into account [45]. Classification using an ANN can also be viewed as an optimization problem, provided that the existing rules to distinguish the various classes of events/materials/objects can be described in functional form. In such cases, the networks decide whether a particular input belongs to one of the defined classes by optimizing the functional rules and evaluating the achieved results a posteriori. Different authors [34], [70] have proposed RBF approaches. For applications such as fault diagnosis, RBF networks offer clear advantages over MLPs: they are faster to train, because layer training is decoupled [70]. Cellular networks [47], ART networks [122], and Hopfield networks [105], [112] can be used as methods to detect and isolate faults and to promote industrial quality control. MLPs are also widely used for these purposes [23], [25], [35], [102], [114], [121]. This structure can be found in induction motor [41], [42] and bearing [72] fault diagnosis, in nondestructive evaluation of check valve performance and degradation [2], [57], in defect detection on woven fabrics [99], and in robotic systems [116]. Finally, it is important to mention clustering applications, which are special cases of classification where there is no supervision during the training phase. The relationships between elements of the existing classes, and even the classes themselves, have to be found from the data in the training phase, without supervision.
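The decoupled training that makes RBF networks fast for tasks such as fault diagnosis can be sketched in a few lines of Python: the hidden-layer centers are placed first by an unsupervised clustering pass, and the linear output weights are then obtained in a single least-squares solve. The synthetic data, the number of centers, and the basis width below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
X = rng.random((300, 4))                   # stand-in process measurements
y = (X.sum(axis=1) > 2.0).astype(float)    # stand-in fault labels

# 1) Unsupervised stage: place k centers with a few k-means iterations.
k, sigma = 8, 0.5
C = X[rng.choice(len(X), k, replace=False)]
for _ in range(10):
    labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
    for j in range(k):
        if np.any(labels == j):
            C[j] = X[labels == j].mean(axis=0)

# 2) Supervised stage: a single linear least-squares fit of the output
#    weights against the Gaussian basis activations.
Phi = np.exp(-((X[:, None] - C[None]) ** 2).sum(-1) / (2 * sigma ** 2))
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w > 0.5                      # classify: fault / no fault

Because only the output layer is fitted to the targets, and that fit is linear, no iterative gradient descent over the whole network is needed, which is the sense in which layer training is decoupled [70].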
C. Process Control

The NN makes use of nonlinearity, learning, parallel processing, and generalization capabilities for application to advanced intelligent control. Such controllers can be classified into a few major methods: supervised control, inverse control, neural adaptive control, back-propagation of utility (an extended method of back-propagation through time), and adaptive critics (an extended method of the reinforcement learning algorithm) [2]. MLP structures were used for digital current regulation of inverter drives [16], to predict trajectories in robotic environments [19], [40], [52], [73], [79], [87], [89], [110], to control turbo generators [117], to monitor feedwater flow rate and component thermal performance of pressurized water reactors [61], to regulate temperature [64], and to predict natural gas consumption [65]. Dynamical versions of MLP networks were used to control a nonlinear dynamic model of a robot [60], [97], to control manufacturing cells [92], and to implement a programmable cascaded low-pass filter [101]. A dynamic MLP is a classical MLP structure in which the outputs are fed back to the inputs by means of time-delay elements. Other structures can be found, such as functional-link networks to control robots [69] and RBF networks to predict, from operating conditions and from features of a steel sheet, the thermal energy required to correct alloying [13]. RBF networks can be observed, as well, in predictive controllers for drive systems [38]. Hopfield structures were used for torque minimization control of redundant manipulators [33]. FPNs can be used for function approximation inside specific control schemes [100]. CMAC networks were implemented in research automobiles [76] and to control robots [79].

D. Pattern Recognition

Some specific ANN structures, such as Kohonen and probabilistic networks, are studied and applied mainly for image and voice recognition. Research in image recognition includes early vision (stereo vision with both eyes, outline extraction, etc.) close to the biological (particularly brain) function, handwritten character recognition at the practical level, and cell recognition for mammalian cell cultivation using NNs [45]. Kohonen networks were used for image inspection and for disease identification from mammographic images [98]. Probabilistic networks were used for transient detection to enhance nuclear reactors' operational safety [6]. As in the other categories, the MLP is widely used as well; the papermaking industry [39] is one such example.

VII. CONCLUSION

This paper has described theoretical aspects of NNs related to their relevance for industrial applications. Common questions that an engineer would ask when choosing an NN for a particular application were answered. Characteristics of industrial processes that would justify ANN utilization were discussed, and some areas of importance were proposed. Important structures and training methods, with relevant references illustrating the utilization of those concepts, were presented. This survey observed that, although ANNs have a history of more than 50 years, most industrial applications have been launched in the last ten years, in which the investigators presented them as either an alternative or a complement to other classical techniques.
Those ANN applications demonstrated adaptability features integrated with the industrial problem, thus becoming part of the industrial processes. The authors firmly believe that this intricate field of NNs is just starting to permeate a broad range of interdisciplinary problem-solving streams. The potential of NNs will be integrated into a still larger and all-encompassing field of intelligent systems and will soon be taught to students and engineers as an ordinary mathematical tool.

REFERENCES

[1] J. S. Albus, “A new approach to manipulator control: The cerebellar model articulation controller,” Trans. ASME, J. Dyn. Syst., Meas. Control, vol. 97, pp. 220–227, Sept. 1975. [2] I. E. Alguíndigue and R. E. Uhrig, “Automatic fault recognition in mechanical components using coupled artificial neural networks,” in Proc. IEEE World Congr. Computational Intelligence, June–July 1994, pp. 3312–3317. [3] P. E. M. Almeida and M. G. Simões, “Fundamentals of a fast convergence parametric CMAC network,” in Proc. IJCNN’01, vol. 3, 2001, pp. 3015–3020. [4] K. Andersen, G. E. Cook, G. Karsai, and K. Ramaswamy, “Artificial neural networks applied to arc welding process modeling and control,” IEEE Trans. Ind. Applicat., vol. 26, pp. 824–830, Sept./Oct. 1990. [5] T. J. Andersen and B. M. Wilamowski, “A modified regression algorithm for fast one layer neural network training,” in Proc. World Congr. Neural Networks, vol. 1, Washington, DC, July 17–21, 1995, pp. 687–690. [6] I. K. Attieh, A. V. Gribok, J. W. Hines, and R. E. Uhrig, “Pattern recognition techniques for transient detection to enhance nuclear reactors’ operational safety,” in Proc. 25th CNS/CNA Annu. Student Conf., Knoxville, TN, Mar. 2000. [7] S. M. Ayala, G. Botura Jr., and O. A. Maldonado, “AI automates substation control,” IEEE Comput. Applicat. Power, vol. 15, pp. 41–46, Jan. 2002. [8] D. L. Bailey and D. M. Thompson, “Developing neural-network applications,” AI Expert, vol. 5, no. 9, pp. 34–41, 1990. [9] A. G. Barto, R. S. Sutton, and C. W. Anderson, “Neuronlike elements that can solve difficult control problems,” IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 834–846, Sept./Oct. 1983. [10] R. Battiti, “First- and second-order methods for learning: Between steepest descent and Newton’s method,” Neural Computation, vol. 4, no. 2, pp. 141–166, 1992. [11] B. Bavarian, “Introduction to neural networks for intelligent control,” IEEE Contr. Syst. Mag., vol. 8, pp. 3–7, Apr. 1988. [12] N. V. Bhat, P. A. Minderman, T. McAvoy, and N. S. Wang, “Modeling chemical process systems via neural computation,” IEEE Contr. Syst. Mag., vol. 10, pp. 24–30, Apr. 1990. [13] G. Bloch, F. Sirou, V. Eustache, and P. Fatrez, “Neural intelligent control for a steel plant,” IEEE Trans. Neural Networks, vol. 8, pp. 910–918, July 1997. [14] Z. Boger, “Experience in developing models of industrial plants by large scale artificial neural networks,” in Proc. Second New Zealand International Two-Stream Conf. Artificial Neural Networks and Expert Systems, 1995, pp. 326–329. [15] D. S. Broomhead and D. Lowe, “Multivariable functional interpolation and adaptive network,” Complex Syst., vol. 2, pp. 321–355, 1988. [16] M. Buhl and R. D. Lorenz, “Design and implementation of neural networks for digital current regulation of inverter drives,” in Conf. Rec. IEEE-IAS Annu. Meeting, 1991, pp. 415–423. [17] B. Burton and R. G.
Harley, “Reducing the computational demands of continually online-trained artificial neural networks for system identification and control of fast processes,” IEEE Trans. Ind. Applicat., vol. 34, pp. 589–596, May/June 1998. [18] B. Burton, F. Kamran, R. G. Harley, T. G. Habetler, M. Brooke, and R. Poddar, “Identification and control of induction motor stator currents using fast on-line random training of a neural network,” in Conf. Rec. IEEE-IAS Annu. Meeting, 1995, pp. 1781–1787. [19] R. Carelli, E. F. Camacho, and D. Patiño, “A neural network based feed forward adaptive controller for robots,” IEEE Trans. Syst., Man, Cybern., vol. 25, pp. 1281–1288, Sept. 1995. [20] G. A. Carpenter and S. Grossberg, “Associative learning, adaptive pattern recognition and cooperative-competitive decision making,” in Optical and Hybrid Computing, H. Szu, Ed. Bellingham, WA: SPIE, 1987, vol. 634, pp. 218–247. [21] C. Charalambous, “Conjugate gradient algorithm for efficient training of artificial neural networks,” Proc. Inst. Elect. Eng., vol. 139, no. 3, pp. 301–310, 1992. [22] S. Chen and S. A. Billings, “Neural networks for nonlinear dynamic system modeling and identification,” Int. J. Control, vol. 56, no. 2, pp. 319–346, 1992. [23] R. P. Cherian, L. N. Smith, and P. S. Midha, “A neural network approach for selection of powder metallurgy materials and process parameters,” Artif. Intell. Eng., vol. 14, pp. 39–44, 2000. [24] M. Y. Chow, P. M. Mangum, and S. O. Yee, “A neural network approach to real-time condition monitoring of induction motors,” IEEE Trans. Ind. Electron., vol. 38, pp. 448–453, Dec. 1991. [25] M. Y. Chow, R. N. Sharpe, and J. C. Hung, “On the application and design of artificial neural networks for motor fault detection—Part II,” IEEE Trans. Ind. Electron., vol. 40, pp. 189–196, Apr. 1993. [26] ——, “On the application and design of artificial neural networks for motor fault detection—Part I,” IEEE Trans. Ind. Electron., vol. 40, pp. 181–188, Apr. 1993. [27] T. W. S. Chow and Y. Fang, “A recurrent neural-network based real-time learning control strategy applying to nonlinear systems with unknown dynamics,” IEEE Trans. Ind. Electron., vol. 45, pp. 151–161, Feb. 1998. [28] S. R. Chu and R. Shoureshi, “Applications of neural networks in learning of dynamical systems,” IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 160–164, Jan./Feb. 1992. [29] S. R. Chu, R. Shoureshi, and M. Tenorio, “Neural networks for system identification,” IEEE Contr. Syst. Mag., vol. 10, pp. 31–35, Apr. 1990. [30] L. O. Chua, T. Roska, T. Kozek, and Á. Zarándy, “The CNN Paradigm—A short tutorial,” in Cellular Neural Networks, T. Roska and J. Vandewalle, Eds. New York: Wiley, 1993, pp. 1–14. [31] M. Cichowlas, D. Sobczuk, M. P. Kazmierkowski, and M. Malinowski, “Novel artificial neural network based current controller for PWM rectifiers,” in Proc. 9th Int. Conf. Power Electronics and Motion Control, 2000, pp. 41–46. [32] G. E. Cook, R. J. Barnett, K. Andersen, and A. M. Strauss, “Weld modeling and control using artificial neural network,” IEEE Trans. Ind. Applicat., vol. 31, pp. 1484–1491, Nov./Dec. 1995. [33] H. Ding and S. K. Tso, “A fully neural-network-based planning scheme for torque minimization of redundant manipulators,” IEEE Trans. Ind. Electron., vol. 46, pp. 199–206, Feb. 1999. [34] G. Dini and F. Failli, “Planning grasps for industrial robotized applications using neural networks,” Robot. Comput. Integr. Manuf., vol. 16, pp. 451–463, Dec. 2000. [35] M. Dolen and R. D.
Lorenz, “General methodologies for neural network programming,” in Proc. IEEE Applied Neural Networks Conf., Nov. 1999, pp. 337–342. [36] ——, “Recurrent neural network topologies for spectral state estimation and differentiation,” in Proc. ANNIE Conf., St. Louis, MO, Nov. 2000. [37] M. Dolen, P. Y. Chung, E. Kayikci, and R. D. Lorenz, “Disturbance force estimation for CNC machine tool feed drives by structured neural network topologies,” in Proc. ANNIE Conf., St. Louis, MO, Nov. 2000. [38] Y. Dote, M. Strefezza, and A. Suyitno, “Neuro fuzzy robust controllers for drive systems,” in Proc. IEEE Int. Symp. Industrial Electronics, 1993, pp. 229–242. [39] P. J. Edwards, A. F. Murray, G. Papadopoulos, A. R. Wallace, J. Barnard, and G. Smith, “The application of neural networks to the papermaking industry,” IEEE Trans. Neural Networks, vol. 10, pp. 1456–1464, Nov. 1999. [40] M. J. Er and K. C. Liew, “Control of adept one SCARA robot using neural networks,” IEEE Trans. Ind. Electron., vol. 44, pp. 762–768, Dec. 1997. [41] F. Filippetti, G. Franceschini, and C. Tassoni, “Neural networks aided on-line diagnostics of induction motor rotor faults,” IEEE Trans. Ind. Applicat., vol. 31, pp. 892–899, July/Aug. 1995. [42] F. Filippetti, G. Franceschini, C. Tassoni, and P. Vas, “Recent developments of induction motor drives fault diagnosis using AI techniques,” IEEE Trans. Ind. Electron., vol. 47, pp. 994–1004, Oct. 2000. [43] D. Flynn, S. McLoone, G. W. Irwin, M. D. Brown, E. Swidenbank, and B. W. Hogg, “Neural control of turbogenerator systems,” Automatica, vol. 33, no. 11, pp. 1961–1973, 1997. [44] D. B. Fogel, “Selecting an optimal neural network,” in Proc. IEEE IECON’90, vol. 2, 1990, pp. 1211–1214. [45] T. Fukuda and T. Shibata, “Theory and applications of neural networks for industrial control systems,” IEEE Trans. Ind. Applicat., vol. 39, pp. 472–489, Nov./Dec. 1992. [46] A. A. Gorni, “The application of neural networks in the modeling of plate rolling processes,” JOM-e, vol. 49, no. 4, electronic document, Apr. 1997. [47] N. Guglielmi, R. Guerrieri, and G. Baccarani, “Highly constrained neural networks for industrial quality control,” IEEE Trans. Neural Networks, vol. 7, pp. 206–213, Jan. 1996. [48] M. T. Hagan and M. Menhaj, “Training feedforward networks with the Marquardt algorithm,” IEEE Trans. Neural Networks, vol. 5, pp. 989–993, Nov. 1994. [49] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. New York: Prentice-Hall, 1995. [50] S. M. Halpin and R. F. Burch, “Applicability of neural networks to industrial and commercial power systems: A tutorial overview,” IEEE Trans. Ind. Applicat., vol. 33, pp. 1355–1361, Sept./Oct. 1997. [51] H. Harrer and J. Nossek, “Discrete-time cellular neural networks,” in Cellular Neural Networks, T. Roska and J. Vandewalle, Eds. New York: Wiley, 1993, pp. 15–29. [52] H. Hashimoto, T. Kubota, M. Sato, and F. Harashima, “Visual control of robotic manipulator based on neural networks,” IEEE Trans. Ind. Electron., vol. 39, pp. 490–496, Dec. 1992. [53] D. O. Hebb, The Organization of Behavior. New York: Wiley, 1949. [54] G. E. Hinton and T. J. Sejnowski, “Learning and relearning in Boltzmann machines,” in The PDP Research Group, D. Rumelhart and J. McClelland, Eds. Cambridge, MA: MIT Press, 1986. [55] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” in Proc. Nat. Acad. Sci., vol. 79, Apr. 1982, pp. 2554–2558. [56] C. Y. Huang, T. C. Chen, and C. L.
Huang, “Robust control of induction motor with a neural-network load torque estimator and a neural-network identification,” IEEE Trans. Ind. Electron., vol. 46, pp. 990–998, Oct. 1999. [57] A. Ikonomopoulos, R. E. Uhrig, and L. H. Tsoukalas, “Use of neural networks to monitor power plant components,” in Proc. American Power Conf., vol. 54-II, Apr. 1992, pp. 1132–1137. [58] M. Jelínek. (1999) Everything you wanted to know about ART neural networks, but were afraid to ask. [Online]. Available: http://cs.felk.cvut.cz/~xjeline1/semestralky/nan [59] S. Jung and T. C. Hsia, “Neural network impedance force control of robot manipulator,” IEEE Trans. Ind. Electron., vol. 45, pp. 451–461, June 1998. [60] A. Karakasoglu and M. K. Sundareshan, “A recurrent neural network-based adaptive variable structure model-following control of robotic manipulators,” Automatica, vol. 31, no. 10, pp. 1495–1507, 1995. [61] K. Kavaklioglu and B. R. Upadhyaya, “Monitoring feedwater flow rate and component thermal performance of pressurized water reactors by means of artificial neural networks,” Nucl. Technol., vol. 107, pp. 112–123, July 1994. [62] M. Kawato, Y. Uno, M. Isobe, and R. Suzuki, “Hierarchical neural network model for voluntary movement with application to robotics,” IEEE Contr. Syst. Mag., vol. 8, pp. 8–15, Apr. 1988. [63] M. Khalid and S. Omatu, “A neural network controller for a temperature control system,” IEEE Contr. Syst. Mag., vol. 12, pp. 58–64, June 1992. [64] M. Khalid, S. Omatu, and R. Yusof, “Temperature regulation with neural networks and alternative control schemes,” IEEE Trans. Neural Networks, vol. 6, pp. 572–582, May 1995. [65] A. Khotanzad, H. Elragal, and T. L. Lu, “Combination of artificial neural-network forecasters for prediction of natural gas consumption,” IEEE Trans. Neural Networks, vol. 11, pp. 464–473, Mar. 2000. [66] Y. H. Kim, F. L. Lewis, and D. M. Dawson, “Intelligent optimal control of robotic manipulators using neural network,” Automatica, vol. 36, no. 9, pp. 1355–1364, 2000. [67] B. Kosko, “Adaptive bi-directional associative memories,” Appl. Opt., vol. 26, pp. 4947–4960, 1987. [68] S. Y. Kung and J. N. Hwang, “Neural network architectures for robotic applications,” IEEE Trans. Robot. Automat., vol. 5, pp. 641–657, Oct. 1989. [69] C. Kwan, F. L. Lewis, and D. M. Dawson, “Robust neural-network control of rigid-link electrically driven robots,” IEEE Trans. Neural Networks, vol. 9, pp. 581–588, July 1998. [70] J. A. Leonard and M. A. Kramer, “Radial basis function networks for classifying process faults,” IEEE Contr. Syst. Mag., vol. 11, pp. 31–38, Apr. 1991. [71] F. L. Lewis, A. Yesildirek, and K. Liu, “Multilayer neural-net robot controller with guaranteed tracking performance,” IEEE Trans. Neural Networks, vol. 7, pp. 388–399, Mar. 1996. [72] B. Li, M. Y. Chow, Y. Tipsuwan, and J. C. Hung, “Neural-network based motor rolling bearing fault diagnosis,” IEEE Trans. Ind. Electron., vol. 47, pp. 1060–1069, Oct. 2000. [73] J. Liu, B. Burton, F. Kamran, M. A. Brooke, R. G. Harley, and T. G. Habetler, “High speed on-line neural network of an induction motor immune to analog circuit nonidealities,” in Proc. IEEE Int. Symp. Circuits and Systems, June 1997, pp. 633–636. [74] Y. Liu, B. R. Upadhyaya, and M. Naghedolfeizi, “Chemometric data analysis using artificial neural networks,” Appl. Spectrosc., vol. 47, no. 1, pp. 12–23, 1993. [75] L. Ljung, System Identification: Theory for the User.
New York: Prentice-Hall, 1987. [76] M. Majors, J. Stori, and D. Cho, “Neural network control of automotive fuel-injection systems,” IEEE Contr. Syst. Mag., vol. 14, pp. 31–36, June 1994. [77] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bull. Math. Biophys., vol. 5, pp. 115–133, 1943. [78] M. Milanova, P. E. M. Almeida, J. Okamoto Jr., and M. G. Simões, “Applications of cellular neural networks for shape from shading problem,” in Proc. Int. Workshop Machine Learning and Data Mining in Pattern Recognition, Lecture Notes in Artificial Intelligence, P. Perner and M. Petrou, Eds., Leipzig, Germany, Sept. 1999, pp. 52–63. [79] W. T. Miller III, “Real-time application of neural networks for sensor-based control of robots with vision,” IEEE Trans. Syst., Man, Cybern., vol. 19, pp. 825–831, July/Aug. 1989. [80] M. L. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press, 1969. [81] T. Munakata, Fundamentals of the New Artificial Intelligence—Beyond Traditional Paradigms. Berlin, Germany: Springer-Verlag, 1998. [82] S. R. Naidu, E. Zafiriou, and T. J. McAvoy, “Use of neural networks for sensor failure detection in a control system,” IEEE Contr. Syst. Mag., vol. 10, pp. 49–55, Apr. 1990. [83] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Trans. Neural Networks, vol. 1, pp. 4–27, Mar. 1990. [84] G. W. Ng, Application of Neural Networks to Adaptive Control of Nonlinear Systems. London, U.K.: Research Studies Press, 1997. [85] D. H. Nguyen and B. Widrow, “Neural networks for self-learning control systems,” IEEE Contr. Syst. Mag., vol. 10, pp. 18–23, Apr. 1990. [86] J. R. Noriega and H. Wang, “A direct adaptive neural-network control for unknown nonlinear systems and its application,” IEEE Trans. Neural Networks, vol. 9, pp. 27–34, Jan. 1998. [87] T. Ozaki, T. Suzuki, T. Furuhashi, S. Okuma, and Y. Uchikawa, “Trajectory control of robotic manipulators using neural networks,” IEEE Trans. Ind. Electron., vol. 38, June 1991. [88] D. B. Parker, “A comparison of algorithms for neuron-like cells,” in Neural Networks for Computing, J. S. Denker, Ed. New York: American Institute of Physics, 1986, pp. 327–332. [89] P. Payeur, H. Le-Huy, and C. M. Gosselin, “Trajectory prediction for moving objects using artificial neural networks,” IEEE Trans. Ind. Electron., vol. 42, pp. 147–158, Apr. 1995. [90] M. H. Rahman, R. Fazlur, R. Devanathan, and Z. Kuanyi, “Neural network approach for linearizing control of nonlinear process plants,” IEEE Trans. Ind. Electron., vol. 47, pp. 470–477, Apr. 2000. [91] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psych. Rev., vol. 65, pp. 386–408, 1958. [92] G. A. Rovithakis, V. I. Gaganis, S. E. Perrakis, and M. A. Christodoulou, “Real-time control of manufacturing cells using dynamic neural networks,” Automatica, vol. 35, no. 1, pp. 139–149, 1999. [93] A. Rubaai and M. D. Kankam, “Adaptive real-time tracking controller for induction motor drives using neural designs,” in Conf. Rec. IEEE-IAS Annu. Meeting, vol. 3, Oct. 1996, pp. 1709–1717. [94] A. Rubaai and R. Kotaru, “Online identification and control of a DC motor using learning adaptation of neural networks,” IEEE Trans. Ind. Applicat., vol. 36, pp. 935–942, May/June 2000. [95] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp.
533–536, 1986. [96] D. E. Rumelhart, B. Widrow, and M. A. Lehr, “The basic ideas in neural networks,” Commun. ACM, vol. 37, no. 3, pp. 87–92, Mar. 1994. [97] M. Saad, P. Bigras, L. A. Dessaint, and K. A. Haddad, “Adaptive robot control using neural networks,” IEEE Trans. Ind. Electron., vol. 41, pp. 173–181, Apr. 1994. [98] S. Sardy and L. Ibrahim, “Experimental medical and industrial applications of neural networks to image inspection using an inexpensive personal computer,” Opt. Eng., vol. 35, no. 8, pp. 2182–2187, Aug. 1996. [99] S. Sardy, L. Ibrahim, and Y. Yasuda, “An application of vision system for the identification and defect detection on woven fabrics by using artificial neural networks,” in Proc. Int. Joint Conf. Neural Networks, 1993, pp. 2141–2144. [100] A. P. A. Silva, P. C. Nascimento, G. L. Torres, and L. E. B. Silva, “An alternative approach for adaptive real-time control using a nonparametric neural network,” in Conf. Rec. IEEE-IAS Annu. Meeting, 1995, pp. 1788–1794. [101] L. E. B. Silva, B. K. Bose, and J. O. P. Pinto, “Recurrent-neural-network-based implementation of a programmable cascaded low-pass filter used in stator flux synthesis of vector-controlled induction motor drive,” IEEE Trans. Ind. Electron., vol. 46, pp. 662–665, June 1999. [102] T. Sorsa, H. N. Koivo, and H. Koivisto, “Neural networks in process fault diagnosis,” IEEE Trans. Syst., Man, Cybern., vol. 21, pp. 815–825, July/Aug. 1991. [103] D. F. Specht, “Probabilistic neural networks for classification, mapping, or associative memory,” in Proc. IEEE Int. Conf. Neural Networks, July 1988, pp. 525–532. [104] ——, “Probabilistic neural networks,” Neural Networks, vol. 3, pp. 109–118, 1990. [105] A. Srinivasan and C. Batur, “Hopfield/ART-1 neural network-based fault detection and isolation,” IEEE Trans. Neural Networks, vol. 5, pp. 890–899, Nov. 1994. [106] W. E. Staib and R. B. Staib, “The intelligent arc furnace controller: A neural network electrode position optimization system for the electric arc furnace,” presented at the IEEE Int. Joint Conf. Neural Networks, New York, NY, 1992. [107] R. Steim, “Preprocessing data for neural networks,” AI Expert, pp. 32–37, Mar. 1993. [108] K. Steinbuch and U. A. W. Piske, “Learning matrices and their applications,” IEEE Trans. Electron. Comput., vol. EC-12, pp. 846–862, Dec. 1963. [109] F. Sun, Z. Sun, and P. Y. Woo, “Neural network-based adaptive controller design of robotic manipulators with an observer,” IEEE Trans. Neural Networks, vol. 12, pp. 54–67, Jan. 2001. [110] M. K. Sundareshan and C. Askew, “Neural network-assisted variable structure control scheme for control of a flexible manipulator arm,” Automatica, vol. 33, no. 9, pp. 1699–1710, 1997. [111] R. S. Sutton, “Generalization in reinforcement learning: Successful examples using sparse coarse coding,” in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1996, vol. 8, pp. 1038–1044. [112] H. H. Szu, “Automatic fault recognition by image correlation neural network techniques,” IEEE Trans. Ind. Electron., vol. 40, pp. 197–208, Apr. 1993. [113] J. Teeter and M. Y. Chow, “Application of functional link neural network to HVAC thermal dynamic system identification,” IEEE Trans. Ind. Electron., vol. 45, pp. 170–176, Feb. 1998. [114] L. Tsoukalas and J. Reyes-Jimenez, “Hybrid expert system-neural network methodology for nuclear plant monitoring and diagnostics,” in Proc. SPIE Applications of Artificial Intelligence VIII, vol. 1293, Apr.
1990, pp. 1024–1030. [115] R. E. Uhrig, “Application of artificial neural networks in industrial technology,” in Proc. IEEE Int. Conf. Industrial Technology, 1994, pp. 73–77. [116] A. T. Vemuri and M. M. Polycarpou, “Neural-network-based robust fault diagnosis in robotic systems,” IEEE Trans. Neural Networks, vol. 8, pp. 1410–1420, Nov. 1997. [117] G. K. Venayagamoorthy and R. G. Harley, “Experimental studies with a continually online-trained artificial neural network controller for a turbo generator,” in Proc. Int. Joint Conf. Neural Networks, vol. 3, Washington, DC, July 1999, pp. 2158–2163. [118] B. W. Wah and G. J. Li, “A survey on the design of multiprocessing systems for artificial intelligence applications,” IEEE Trans. Syst., Man, Cybern., vol. 19, pp. 667–692, July/Aug. 1989. [119] S. Weerasooriya and M. A. El-Sharkawi, “Identification and control of a DC motor using back-propagation neural networks,” IEEE Trans. Energy Conversion, vol. 6, pp. 663–669, Dec. 1991. [120] P. J. Werbos, “Beyond regression: New tools for prediction and analysis in the behavioral sciences,” Ph.D. dissertation, Harvard Univ., Cambridge, MA, 1974. [121] ——, “Maximizing long-term gas industry profits in two minutes in lotus using neural network methods,” IEEE Trans. Syst., Man, Cybern., vol. 19, pp. 315–333, Mar./Apr. 1989. [122] J. R. Whiteley, J. F. Davis, A. Mehrotra, and S. C. Ahalt, “Observations and problems applying ART2 for dynamic sensor pattern interpretation,” IEEE Trans. Syst., Man, Cybern. A, vol. 26, pp. 423–437, July 1996. [123] B. Widrow, DARPA Neural Network Study. Fairfax, VA: Armed Forces Communications and Electronics Assoc. Int. Press, 1988. [124] B. Widrow and M. E. Hoff Jr., “Adaptive switching circuits,” in 1960 IRE WESCON Conv. Rec., pt. 4, pp. 96–104, Aug. 1960. [125] N. Wiener, Cybernetics. Cambridge, MA: MIT Press, 1961. [126] M. J. Willis, G. A. Montague, D. C. Massimo, A. J. Morris, and M. T. Tham, “Artificial neural networks and their application in process engineering,” in IEE Colloq. Neural Networks for Systems: Principles and Applications, 1991, pp. 71–74. [127] M. Wishart and R. G. Harley, “Identification and control of induction machines using artificial neural networks,” IEEE Trans. Ind. Applicat., vol. 31, pp. 612–619, May/June 1995. [128] J. M. Zurada, Introduction to Artificial Neural Networks. Boston, MA: PWS–Kent, 1995.

Magali R. G. Meireles received the B.E. degree from the Federal University of Minas Gerais, Belo Horizonte, Brazil, in 1986, and the M.Sc. degree from the Federal Center for Technological Education, Belo Horizonte, Brazil, in 1998, both in electrical engineering. She is an Associate Professor in the Mathematics and Statistics Department, Pontific Catholic University of Minas Gerais, Belo Horizonte, Brazil. Her research interests include applied artificial intelligence and engineering education. In 2001, she was a Research Assistant in the Division of Engineering, Colorado School of Mines, Golden, where she conducted research in the Mechatronics Laboratory.

Paulo E. M. Almeida (S’00) received the B.E. and M.Sc. degrees from the Federal University of Minas Gerais, Belo Horizonte, Brazil, in 1992 and 1996, respectively, both in electrical engineering, and the Dr.E. degree from São Paulo University, São Paulo, Brazil. He is an Assistant Professor at the Federal Center for Technological Education of Minas Gerais, Belo Horizonte, Brazil.
His research interests are applied artificial intelligence, intelligent control systems, and industrial automation. In 2000–2001, he was a Visiting Scholar in the Division of Engineering, Colorado School of Mines, Golden, where he conducted research in the Mechatronics Laboratory. Dr. Almeida is a member of the Brazilian Automatic Control Society. He received a Student Award and a Best Presentation Award from the IEEE Industrial Electronics Society at the 2001 IEEE IECON, held in Denver, CO. Marcelo Godoy Simões (S’89–M’95–SM’98) received the B.S. and M.Sc. degrees in electrical engineering from the University of São Paulo, São Paulo, Brazil, in 1985 and 1990, respectively, the Ph.D. degree in electrical engineering from the University of Tennessee, Knoxville, in 1995, and the Livre-Docencia (D.Sc.) degree in mechanical engineering from the University of São Paulo, in 1998. He is currently an Associate Professor at the Colorado School of Mines, Golden, where he is working to establish several research and education activities. His interests are in the research and development of intelligent applications, fuzzy logic and neural networks applications to industrial systems, power electronics, drives, machine control, and distributed generation systems. Dr. Simões is a recipient of a National Science Foundation (NSF)—Faculty Early Career Development (CAREER) Award, which is the NSF’s most prestigious award for new faculty members, recognizing activities of teacher/scholars who are considered most likely to become the academic leaders of the 21st century.