Guided Neural Network Learning Using a Fuzzy Controller, with Applications to Textile Spinning

PEITSANG WU, SHU-CHERNG FANG, HENRY L. W. NUTTLE, JAMES R. WILSON, and RUSSELL E. KING
North Carolina State University, USA

We apply neural networks to build a "metamodel" of the relation between key input parameters and output performance measures of a simulated textile spinning plant. We investigate two different neural network estimation algorithms, namely back-propagation and an algorithm incorporating a fuzzy controller for the learning rate. According to our experience, both algorithms are capable of providing high-quality predictions. In addition, results obtained using a fuzzy controller for the learning rate suggest a significant potential for speeding up the training process.

Key words: neural network, fuzzy control, textile manufacturing, spinning operations

This research is sponsored by National Textile Center Grant #S92C4.

INTRODUCTION

The textile industry is extremely competitive internationally. Due to the low cost of foreign labor, the U.S. has been rapidly losing its market share to overseas competitors. Because the industry is labor intensive, many jobs in the U.S. are threatened. The American textile industry currently employs 1.8 million people through 26,000 companies, representing 10% of the entire American manufacturing workforce. During the past 12 years, 500,000 U.S. jobs have been lost due to textile imports; and if this trend continues, one million more jobs will be eliminated by the year 2002. Compared to the American automotive, petroleum, and primary metals industries, the textile industry contributes more to the U.S. gross domestic product (Moncarz, 1992).

The textile pipeline modeling project at North Carolina State University (NCSU) (Hunter et al., 1993) has continued and expanded the effort to help the U.S. textile industry that was begun under the auspices of CRAFTM, an industry-university consortium also formed at NCSU. The objectives of the current project center on understanding and describing how information and material must flow through the apparel-supply pipeline in response to consumer demand, and how firms in the various manufacturing segments should react to this demand as it percolates back through the pipeline. Regardless of the conditions under which the pipeline operates in the future, whether QR (quick response), traditional, or otherwise, decision making will still be in the hands of the management of the individual firms. Answering high-level management questions about the cost/benefit consequences of policy changes (such as installing QR, broadening the product mix, reducing the minimum order size, and adopting new quality procedures) is of special importance. Two basic goals of this project are to understand the operation of each firm and then to understand the interactions between the objectives of the various firms. This is necessary in order to understand the trade-offs required to make the U.S. complex responsive, flexible, and productive and, thus, viable and competitive. Building on its earlier work and now operating under the auspices of the National Textile Center, the expanded research team has been specifying and developing an integrated set of simulation models of the firms in the apparel pipeline, from spinning through cut-and-sew. Fig. 1 provides a schematic overview of the system.
The system is sized to produce/process about 25 million pounds of yarn per year, about half of which will be consumed by 4-5 pipeline apparel manufacturers; and the rest will be sold to outside customers for yarn and fabric. A testbed set of garments (including basic, seasonal, and fashion garments) to be assembled in the apparel plants, in turn, specifies the variety of colors, fabrics, and yarns to be produced in the other plants. These simulation models provide a vehicle for: (a) understanding the interactions between decisions taken within different firms; (b) analyzing and developing operational practices within individual firms and across subsets of firms; (c) developing high-level management information systems; and (d) training personnel in making operational decisions. Object-oriented versions of these simulation models will provide the capability to easily tailor the models to match a specific company's plant(s) and to integrate all of the separate models into a consolidated model of the entire fiber-textile-apparel-retail pipeline.

Fig. 1. Overview of textile-apparel-retail pipeline model.

To date we have built several computer simulation models of the individual plants in the pipeline, along with a testbed of garments and a master schedule generator (Hunter et al., 1993). In this paper, we focus on the spinning operation. By using the spinning plant simulation model (Clarke and King, 1993; Powell, 1992), we study the impact of key input parameters (such as number of yarns and target inventory levels) on selected output performance measures (such as order response times and inventory levels). As a first attempt to create an interactive high-level management information system, we have exploited neural network techniques to develop a decision surface model relating simulation-generated performance measures to selected input parameters of the spinning simulation.

THE SPINNING OPERATION

Description of the Spinning Process

As depicted in Fig. 2, the process of spinning cotton into yarn involves opening and cleaning, carding, drawing, roving, and spinning. First, bales of cotton are opened and blended to ensure homogeneity. The cotton fibers are then cleaned to remove any dirt and foreign objects that remain from agricultural processing. The opening operation yields small fluffy clumps of fibers called fleece which are then ready for carding. Carding machines further divide the fleece and separate it into individual fibers. The carding machines provide further cleaning and partially align the fibers. The resulting rough, rope-like strand is called card sliver. Several strands of card sliver are put together and elongated on a drawing machine. This process improves the uniformity of the sliver by further aligning and blending the fibers. The sliver is usually drawn twice. With ring spinning, the drawn sliver is passed onto the roving machines where it is further drawn into a smaller rope form and slightly twisted to make it easier to handle. Spinning creates the desired twist and thickness (count) of the yarn and winds it onto bobbins. Winding machines then combine bobbins onto cones to be used at the weaving or knitting machines.

Fig. 2. The ring spinning process.
The final products from the spinning plant are yarns of given counts and twists. A given product can be used to create a variety of apparel items. For further details, see Lord (1974) and Rohlena (1975).

A Simulation Model of Spinning Operations

The spinning model is coded in the SIMAN simulation language (Pegden, Shannon, and Sadowski, 1995), supported by many user-supplied discrete-event subroutines coded in FORTRAN. For this simulation model, most operational parameters (number and types of frames, cycle times, yields, etc.) are user-supplied inputs and may be easily changed to represent a wide variety of operational scenarios. The spinning simulation model is also constructed so that it is possible to replace, for example, the scheduling procedure. The model is designed to simulate the basic activity of a ring-spinning operation capable of producing around 25 million pounds of cotton yarn per year. The simulated activity includes spinning frame scheduling, schedule execution, changeovers, coning (winding) and inspection, and shipping.

At the present time, spinning is modeled as a make-to-stock operation. (As the overall pipeline is currently designed, about 50 percent of the annual demand for yarn will come from customers who are not in the pipeline and are not included in the master schedule.) Customer orders are either call-outs against blanket orders, randomly generated on a weekly basis, or spot orders, randomly generated on a daily basis. Orders vary as to count and quantity. The count mix and average weekly volume can be varied throughout the year to reflect seasonality in the use of different yarn weights. The strategy for controlling the inventory levels of finished yarns is a "target-level" system. Production is incrementally raised or lowered periodically in order to try to maintain a specified inventory level. The spinning frames are scheduled reactively based upon deviation of yarn inventory from the target. The plant operates 24 hours per day, six days per week, for a user-specified number of weeks per year. Cycle times between frame doffs are random to reflect frame efficiency, and yield per doff is random to reflect the results of inspection. After doffing, the yarn is delayed for a random time representing coning, inspection, and packaging before it is classified as available for shipping. Orders are shipped daily, five days a week, limited by shipping capacity. Priority is given to blanket orders. Measures of performance include production levels, order response times (shipping lead times), inventory levels, frame utilization, and margin. For more details on the spinning simulation model, see Clarke and King (1993) and Powell (1992).

DECISION SURFACE MODELING

The objective of decision surface modeling is to develop an interactive information system that captures the essential features of each pipeline plant model (or integrated collection of such plant models) in mathematical relationships between plant performance (inventory, order lead times, etc.) and key decision parameters (product mix, number of machines, etc.). These "metamodels" are intended to provide high-level management with a rapid, easy-to-use capability to predict the impact on system performance of various "what-if" scenarios such as: What are the consequences of broadening the product mix? What are the costs/benefits of reducing order lead times? What are the costs/benefits of introducing new equipment?
This section describes an attempt to build a "metamodel" of the relationship between key input parameters and output performance measures for a spinning operation using neural networks. The rest of the paper will describe this effort and results to date.

Neural Network Architecture

A neural network consists of several layers of computational units called neurons and a set of data connections which join neurons in one layer to those in another. The network takes inputs and produces outputs through the work of trained neurons. Neurons usually calculate their outputs using a signal-activation function of their inputs, where the signal-activation function has a sigmoid shape. Throughout this study, we employ the logistic signal-activation function,

    f(x) = 1 / (1 + e^{-x})   for -∞ < x < ∞.     (1)

The functional form (1) is used primarily for computational convenience. Using some known results (for example, input-output pairs observed in the target system), we assign a weight to each connection to determine how an activation signal that travels along that connection influences the receiving neuron. The connections and their weights are the most important parameters in a neural network model since they determine the outputs of the network. Because the weights of the connections can be adjusted by experiencing and incorporating more known results, the relationship of the network's outputs to its inputs changes with exposure to additional responses from the target system. Hence we say that a neural network has the ability to learn. The process of repeatedly exposing the network to known responses from the target system to estimate appropriate connection weights is called training. More details on neural networks can be found in Freeman and Skapura (1991), Gallant (1993), Smith (1993), and Zurada (1992).

There are two kinds of training methods, namely "supervised learning" and "unsupervised learning". In supervised learning, we assume that on each occasion when an input is applied, a corresponding target response of the system is also provided as a supervising signal. This supervising signal, or target response, is called the "teacher". The teacher provides a basis for estimating and reducing errors between the neural network's output and the target response. To reduce this error, we use a negative gradient direction for the sum of squared estimation errors as the basis for better weight assignments. On the other hand, in unsupervised learning, the target response is not provided and hence no error information is available. The learning must somehow be accomplished purely based on system inputs and outputs about which we have little or no knowledge. Unsupervised learning is sometimes called "learning without a teacher" (Zurada, 1992). Throughout our study, the outputs from the spinning simulation model are used as target responses in a supervised learning environment.

The number of layers in a neural network and its connection structure greatly affects its performance. Usually a fully connected multilayer network produces more accurate outputs but with a higher computational burden. In our study, since the spinning operation is eventually going to be linked with other pipeline operations, we selected a fully connected three-layer neural network, consisting of input, hidden, and output layers, to balance accuracy and computational requirements. A schematic diagram of such a network is illustrated in Fig. 3. As mentioned earlier, a neural network can learn.
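Before turning to how the weights are learned, the following sketch makes the three-layer architecture concrete. It is our illustration only, not code from the original study; the function names, the random initial weights, and the use of NumPy are assumptions. The layer sizes match the spinning metamodel described later (seven inputs, nine hidden neurons, four outputs).

```python
import numpy as np

def logistic(x):
    """Logistic signal-activation function f(x) = 1 / (1 + exp(-x)) of (1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, V, W):
    """Forward pass through a fully connected three-layer network.

    x : (I,)   input vector
    V : (J, I) connection weights from the I input nodes to the J hidden neurons
    W : (K, J) connection weights from the J hidden neurons to the K output neurons
    Returns the hidden activations y and the network outputs o.
    """
    y = logistic(V @ x)   # hidden-layer activations
    o = logistic(W @ y)   # output-layer activations (the network's predictions)
    return y, o

# Illustrative dimensions: 7 input nodes, 9 hidden neurons, 4 output nodes.
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(9, 7))
W = rng.normal(scale=0.1, size=(4, 9))
y, o = forward(rng.uniform(size=7), V, W)
```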
A commonly used learning rule is the so-called "delta learning rule". Assume that some initial weights have been suitably assigned before each learning experiment starts.

Fig. 3. Neural network architecture.

As shown in Fig. 4 for neuron k in the output layer (k = 1, ..., K), the J × 1 vector

    W_k = [W_{k1}, ..., W_{kJ}]^T

represents the connection weights to neuron k in the output layer from the J neurons in the hidden layer. (Throughout this paper, the roman superscript T denotes the transpose of a vector or matrix.) The corresponding J × 1 vector

    Y = [Y_1, ..., Y_J]^T

represents the inputs to each neuron in the output layer from the J neurons in the hidden layer. The learning signal r is in general a function of W_k, Y, and the supervising signal d_k. In our study, d_k is the kth performance measure generated by the simulation model for k = 1, ..., K. For neuron k in the output layer, the dependence of the learning signal on W_k, Y, and d_k is expressed as

    r = r(W_k, Y, d_k).

As explained later in this subsection, we take the learning signal to have the specific functional form

    r(W_k, Y, d_k) = [d_k - f(W_k^T Y)] f'(W_k^T Y),     (2)

where f'(W_k^T Y) is the derivative of the signal-activation function (1) evaluated at W_k^T Y. If the delta learning rule is applied on successive training steps indexed by t (where t = 1, 2, ...), then at step t the increment in the current weight vector W_k(t) required to go to the next step is given by

    ΔW_k(t) = η r[W_k(t), Y(t), d_k] Y(t),

where η is a positive number called the learning constant. The rate of learning is determined by η. The new weight vector at training step (epoch) t + 1 is adapted from the weight vector at step t according to the difference equation

    W_k(t + 1) = W_k(t) + η r[W_k(t), Y(t), d_k] Y(t).     (3)

For a continuous-time learning process, the analogue of (3) is the differential equation

    dW_k(t)/dt = η r[W_k(t), Y(t), d_k] Y(t).

Fig. 4. The delta learning rule.

In the following justification of the form of the learning rule (2), we suppress the dependence of all quantities on the training step t for notational simplicity. We define the output error as the difference between the output value o_k = f(W_k^T Y) and the supervising signal d_k. The learning rule (2) can be readily derived by applying the estimation principle of least squares to the output error, where the squared error is defined as

    E = (1/2)(d_k - o_k)^2 = (1/2)[d_k - f(W_k^T Y)]^2.     (4)

By calculating the gradient vector of E with respect to W_k, we obtain an "error gradient vector",

    ∇E = -(d_k - o_k) f'(W_k^T Y) Y.

The components of the gradient vector are

    ∂E/∂W_{kj} = -(d_k - o_k) f'(W_k^T Y) y_j   for j = 1, 2, ..., J.

The delta learning rule is used to minimize the sum of squared errors of the form (4) taken over all training patterns; thus at training step t, the required increment ΔW_k(t) of the weight vector W_k(t) must be in the negative gradient direction, so that we take

    ΔW_k(t) = -η ∇E(t).

Expressing the right-hand side of this last equation explicitly in terms of the signal-activation function (1) and the weight vector W_k(t), we have

    -η ∇E(t) = η {d_k - f([W_k(t)]^T Y)} f'([W_k(t)]^T Y) Y.     (5)

In order to accelerate convergence to an optimal weight assignment, we choose the method of "error back-propagation with momentum" (Zurada, 1992), which modifies ΔW_k(t) by incorporating a "momentum" term applied to the previous increment ΔW_k(t - 1), so that we take

    ΔW_k(t) = -η ∇E(t) + α ΔW_k(t - 1),

where: the arguments t and t - 1 indicate the current and the most recent previous training steps, respectively; -η ∇E(t) is given by (5); and α is a user-selected positive momentum constant (Zurada, 1992).
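For concreteness, the sketch below carries out one delta-rule update with momentum for a single output neuron, combining (2), (3), and (5). This is an illustration under assumed names and step sizes, not the code used in the study.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def delta_rule_step(W_k, Y, d_k, prev_delta, eta=0.1, alpha=0.5):
    """One delta-rule update with momentum for output neuron k.

    W_k        : (J,) weights from the J hidden neurons to output neuron k
    Y          : (J,) hidden-layer outputs feeding neuron k
    d_k        : supervising (target) signal for neuron k
    prev_delta : (J,) previous weight increment Delta W_k(t-1), the momentum term
    eta, alpha : learning constant and momentum constant (illustrative values)
    """
    o_k = logistic(W_k @ Y)                   # output o_k = f(W_k^T Y)
    f_prime = o_k * (1.0 - o_k)               # derivative of the logistic function at W_k^T Y
    r = (d_k - o_k) * f_prime                 # learning signal r of (2)
    delta = eta * r * Y + alpha * prev_delta  # -eta * grad E, as in (5), plus momentum
    return W_k + delta, delta                 # updated weights and the increment for step t+1
```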
Input Parameter Design

The key outputs from the simulation model used in this study were the average inventory level, the percentage of spot orders shipped within 5 days, the percentage of blanket orders shipped within 5 days, and the percentage of all orders shipped within 5 days. Based on earlier simulation experiments with the spinning model (Powell, 1992), the key input parameters which affect these performance measures are the number of yarns in the product line, the number of frames in the plant, the frame utilization level required to meet the demand, the number of days required for coning and inspection, the percentage of customer orders that are blanket orders, the target number of days of inventory used for control purposes, and the average size of a customer order. Therefore, for our three-layer neural network, there are seven input nodes and four output nodes. In addition, for computational purposes, nine neurons are used in the hidden layer.

In the spirit of a two-level factorial experiment (Montgomery, 1991), we fixed each input parameter either at the lowest value of interest or at the highest value of interest to create patterns (design points) for the experiment. Table 1 shows the extreme values used for the input parameters. The selected extreme values of the input parameters yield 2^7 = 128 training patterns for the neural network. For each sample, we replicated the spinning simulation four times in order to estimate the mean and variance of the response at each pattern (design point). For testing purposes, an additional 99 patterns were generated by the simulation model. First, we fixed all seven input parameters in the middle of the range of interest to provide one "center" pattern. Next we fixed six input parameters at their midrange values and set the remaining parameter at its highest and lowest values to generate 2 × 7 = 14 patterns. Then we fixed five input parameters at their middle values and set the remaining two parameters at their highest and lowest values to generate 4 × 21 = 84 patterns.
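The layout of these 128 training patterns and 99 test patterns can be written down directly, as in the following sketch. This is our illustration only; the parameter names and the low/high levels follow Table 1, but the code itself is an assumption rather than part of the original study.

```python
from itertools import combinations, product

# Low and high levels of the seven input parameters (Table 1).
levels = {
    "yarns":           (6, 18),
    "frames":          (15, 45),
    "utilization":     (0.82, 0.88),
    "coning_days":     (2, 4),
    "blanket_pct":     (0.40, 1.00),
    "target_inv_days": (6, 12),
    "order_size_lbs":  (5000, 15000),
}
lo = [v[0] for v in levels.values()]
hi = [v[1] for v in levels.values()]
mid = [(a + b) / 2.0 for a, b in zip(lo, hi)]

# 2^7 = 128 two-level factorial training patterns.
training = [list(p) for p in product(*levels.values())]

# Test patterns: 1 center pattern, 2*7 = 14 one-parameter excursions,
# and 4*21 = 84 two-parameter excursions, for 99 patterns in all.
test = [mid[:]]
for i in range(7):
    for v in (lo[i], hi[i]):
        p = mid[:]
        p[i] = v
        test.append(p)
for i, j in combinations(range(7), 2):
    for vi, vj in product((lo[i], hi[i]), (lo[j], hi[j])):
        p = mid[:]
        p[i], p[j] = vi, vj
        test.append(p)

print(len(training), len(test))   # 128 99
```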
Table 1. Selected levels of key input parameters

                          Levels
Parameter              Low           High
# of Yarns             6 yarns       18 yarns
# of Frames            15 frames     45 frames
Utilization            82%           88%
Coning/Inspection      2 days        4 days
Blanket Orders         40%           100%
Target Inventory       6 days        12 days
Order Size             5000 lbs      15000 lbs

Test Runs and Validation

As mentioned earlier, a classic back-propagation neural network with momentum (BPNET) was used in our study. In order to find the optimal weight assignment for the network, we first trained the neural network using the 128 patterns described above. As shown by the solid curve in Fig. 5, after 500 training steps, the total mean square error (mse) is reduced to about 0.30. However, when the resulting network was applied to the additional 99 patterns for validation purposes, the results (shown by the dotted line in Fig. 5) were not very satisfactory. These results indicate that a linear model is not sufficient and that we need additional training patterns to expose the potential curvature of the response surface.

Fig. 5. Validation pilot run for 128 patterns using back-propagation.

In order to assess the curvature of the response surface, the "center" pattern was reclassified as a training pattern instead of a validation sample. In other words, we used 129 training patterns and 98 validation samples for test runs. The result is shown in Fig. 6. The figure indicates that the mse curve for the validation samples reached its minimum early. (The data for Fig. 6 showed that the minimum was reached at training step t = 19,813.) Beyond the training step yielding the minimum mse, the neural network becomes "overtrained". At this point, the normalized mean square error for training attains a value of 0.03078 while the normalized mean square error for validation is 0.19299. For the training result, more detailed test information is given in Table 2.

Fig. 6. Validation pilot run for 129 patterns using back-propagation.

Table 2. Results with BPNET and 129 training patterns

                                            Target response d_k
                                   Average      Overall      Blanket      Spot
Statistic                          inventory    leadtime     orders       orders
Average relative error             -0.7457      5.2911       3.1547       12.6674
Variance of relative errors        16.7433      1542.7400    770.0309     1766.2286
Maximum relative error             9.1327       338.2040     242.5470     354.4183
Minimum relative error             -11.2743     -20.7460     -35.6387     -24.3477
97.5th percentile of rel. errors   -0.0395      12.06924     7.9434       19.9198
2.5th percentile of rel. errors    -1.4518      -1.4870      -1.6340      5.4150

The first row of numbers in the table indicates that over all 129 training patterns, the average relative error

    100 (o_k - d_k) / d_k  %

of the neural network output (prediction) o_k versus the simulation output (target response) d_k has the respective values -0.75%, 5.29%, 3.15%, and 12.67% for the average inventory, the overall leadtime, the blanket order leadtime, and the spot order leadtime, where a negative value indicates underprediction. Note that the combined average relative error of all four predictions is 5.09%; the worst performance is seen in the prediction for the spot order leadtime. Hence the percentage of spot orders filled within 5 days was the most difficult to predict. Rows 2-4 of Table 2 respectively display the variance, the maximum, and the minimum of the relative errors for each target response, computed over all 129 training patterns; the last two rows give the 97.5th and 2.5th percentiles.
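The summary statistics reported in Table 2 (and in the later tables of this form) can be computed from the predictions and target responses as in this short sketch; the function name and the use of NumPy are our assumptions, not part of the original study.

```python
import numpy as np

def relative_error_summary(o, d):
    """Relative errors 100*(o_k - d_k)/d_k (in %) and their summary statistics."""
    rel = 100.0 * (np.asarray(o, float) - np.asarray(d, float)) / np.asarray(d, float)
    return {
        "average":  rel.mean(),
        "variance": rel.var(ddof=1),
        "maximum":  rel.max(),
        "minimum":  rel.min(),
        "p97.5":    np.percentile(rel, 97.5),
        "p2.5":     np.percentile(rel, 2.5),
    }
```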
Again the figures suggest that prediction of the spot order leadtime performance is more difficult than the other performance measures. For the 98 validation samples, the detailed test information is shown in Table 3.

Table 3. Results obtained with BPNET and 98 validation samples

                                            Target response d_k
                                   Average      Overall      Blanket      Spot
Statistic                          inventory    leadtime     orders       orders
Average relative error             -0.8874      3.5010       2.9610       -1.6343
Variance of relative errors        119.8984     27.6934      28.8958      66.7292
Maximum relative error             69.2617      24.7048      28.7201      22.3840
Minimum relative error             -40.0680     -9.7861      -3.5308      -33.5616
97.5th percentile of rel. errors   1.2805       4.5429       4.0253       -0.0169
2.5th percentile of rel. errors    -3.0553      2.4591       1.8967       -3.2516

In this case the combined average relative error of all four predictions is only 0.99%. Notice that the predictions for the validation samples actually are more accurate than those for the training patterns. In particular, the average relative error for spot order leadtime performance has much smaller magnitude in the validation patterns than in the training patterns. Although the spot order performance was the most variable output measure in the actual simulation runs, the neural network predictions are still relatively accurate. This qualifies the neural network model as a good tool for the prediction of performance in textile spinning operations.

Output Decision Surface

Using the optimal weights obtained through the validation samples, we see that the neural network metamodel can predict the output measures, based on the given inputs, very effectively. However, while we can represent the four-dimensional output surface in terms of the seven-dimensional input parameters, we only provide three-dimensional plots. Ten groups of three-dimensional surface plots can be found in Wu et al. (1994).

A NEURAL NETWORK WITH FUZZY CONTROL

Basic Ideas

After constructing the BPNET model for the textile spinning model, we began to study its "learning capability" for performance enhancement. The learning capability of a neural network is a complicated issue. It depends on the structure of connections, the activation function, the number of hidden layers and hidden neurons, as well as the learning rate. Each of these parameters typically has a significant impact on the performance of a neural network model. For example, excessively large learning rates or too few hidden neurons may cause the network to diverge or converge to a high error level (Hertz and Hu, 1992; Tveter, 1991). On the other hand, excessively small learning rates or too many hidden neurons may result in lengthy computation time. Unfortunately, at this time, there are very few quantitative rules to guide the selection of these control parameters.

In this section we focus on controlling the learning rate, which was a fixed constant in the BPNET model described in the previous section. In general, we know that the learning rate must be a small number to ensure convergence; and as the magnitude of the prediction error decreases during the training process, the learning rate should decrease correspondingly. But the specific value and the frequency of reduction of the learning rate still lie within the realm of guesswork. It is only "vaguely" understood. This "vagueness" led us to study "fuzzy control" of the learning rate of our neural network model (Hertz and Hu, 1992). The basic idea is to apply a fuzzy-neuron controller to modulate the learning rate of each input neuron throughout the training process.
Roughly speaking, when the magnitude of the output error is "large" (positively or negatively), we would like to set a "relatively large" learning rate to accelerate the learning process; when the magnitude of the output error is "medium" (positively or negatively), we slow down the learning rate to a "medium" value; and finally, when the magnitude of the output error is "small" (positively or negatively), we would like to adopt a "relatively small" learning rate to ensure convergence. Hence a simple fuzzy-neuron control rule is established according to Table 4.

Table 4. Fuzzy control rule

Characteristic                 Matching categories
output error, o_k - d_k    NL   NM   NS   ZE   PS   PM   PL
learning rate, η(t)        L    M    S    ZE   S    M    L

In Table 4, NL means "Negatively Large", NM means "Negatively Medium", NS means "Negatively Small", ZE means "Zero Equivalence", PS means "Positively Small", PM means "Positively Medium", PL means "Positively Large", S means "Small", M means "Medium", and L means "Large" (Kosko, 1992, and Zurada, 1992). Note that this rule could be generic for all neural networks using back-propagation learning algorithms.

To deal with the vagueness of "Positively (Negatively) Large", "Positively (Negatively) Medium", and "Positively (Negatively) Small", the output error is further defined by a fuzzy membership function (Kosko, 1992, and Smith, 1993). A fuzzy membership function μ_Ã : Z → [0, 1] defines a membership value of the fuzzy set Ã between 0 and 1 for every element z in the universe of discourse Z. This number μ_Ã(z) indicates the degree to which the input variable z belongs to the fuzzy set Ã. The membership function can take many different shapes depending on the characteristics of the underlying problem. In practice, triangular or trapezoidal shapes are often chosen for simplified computation. In this study, triangular membership functions of the output error were chosen. These membership functions are displayed in Fig. 7 and Fig. 8.

With the fuzzy control rule and fuzzy membership function described above and for a given value of output error, our objective is to determine a corresponding learning rate η. One way to find the learning rate is to use the well-known "extension principle" (Zimmermann, 1988). Let μ_ZE, μ_PS, μ_PM, μ_PL, μ_NS, μ_NM, and μ_NL be the membership functions of an output error that respectively correspond to the fuzzy sets ZE (zero), PS (positively small), PM (positively medium), PL (positively large), NS (negatively small), NM (negatively medium), and NL (negatively large). In other words, for a given output error v, the quantity μ_Ã(v) indicates the degree to which this output error is classified in category A, where A ∈ {NS, NM, NL, ZE, PS, PM, PL}. We use arg max_Ã μ_Ã(v) to indicate the category in which v has the highest value of the membership function. In case there is a tie in the degree of membership, we prefer to classify the output error in a category with "smaller error magnitude"; i.e., priority is given to categories in the following order: (i) ZE; (ii) PS or NS; (iii) PM or NM; and finally (iv) PL or NL.

In order to determine the learning rate η(t) for a particular training step t, we use the n most recent observations of output error, {v_{t-n}, v_{t-n+1}, ..., v_{t-1}}, where n is user-specified. At the current training step t, the learning rate is determined by the following procedure. For t = 1, η(t) is arbitrarily assigned. For 2 ≤ t ≤ n, assign

    category = arg max_Ã [ min_{1 ≤ i ≤ t-1} μ_Ã(v_i) ].     (6)

If category is PS or NS, then η(t) = S (small).
If category is PM or NM, then η(t) = M (medium). If category is PL or NL, then η(t) = L (large). If category is ZE, then η(t) = ZE (zero). For t > n, assign

    category = arg max_Ã [ min_{t-n ≤ i ≤ t-1} μ_Ã(v_i) ].     (7)

If category is PS or NS, then η(t) = S (small). If category is PM or NM, then η(t) = M (medium). If category is PL or NL, then η(t) = L (large). If category is ZE, then η(t) = ZE (zero). Notice that in (6) and (7), the quantity v_i is the output error at training step i. The numerical values of S, M, L, and ZE are specified by the user.

We illustrate (6) and (7) with a small example in which we choose n = 3, L = 1.0, M = 0.7, S = 0.3, and ZE = 0. The membership functions for the output error are defined as in Fig. 7. Table 5 shows how the learning rate η(t) is assigned in the first 10 training steps.

Fig. 7. The membership functions of output error for the ten-step example.

Table 5. Illustration of fuzzy control for the neural network

Step   Learning          μ_Ã(v_t) for each category Ã              Output error
  t    rate η(t)    NS    NM    NL    ZE    PS    PM      PL           v_t
  1      1.0        0.0   0.0   0.0   0.0   0.0   0.000   1.000        0.700
  2      1.0        0.0   0.0   0.0   0.0   0.0   0.000   1.000        0.650
  3      1.0        0.0   0.0   0.0   0.0   0.0   0.000   1.000        0.600
  4      1.0        0.0   0.0   0.0   0.0   0.0   0.000   0.700        0.555
  5      1.0        0.0   0.0   0.0   0.0   0.0   0.000   0.667        0.550
  6      1.0        0.0   0.0   0.0   0.0   0.0   0.300   0.367        0.505
  7      1.0        0.0   0.0   0.0   0.0   0.0   0.333   0.333        0.500
  8      1.0        0.0   0.0   0.0   0.0   0.0   0.633   0.033        0.455
  9      0.7        0.0   0.0   0.0   0.0   0.0   1.000   0.000        0.400
 10      0.7        0.0   0.0   0.0   0.0   0.0   1.000   0.000        0.380

In training step 1, we select an arbitrary learning rate, say 1.0. The resulting output error of step 1 is 0.7, which is in the category PL with membership 1.0. By the above rule (6), we select an L (large) learning rate for step 2, i.e., 1.0. The resulting output error in step 2 is 0.65, which again is in the category PL with membership 1.0. In order to find the learning rate of step 3, we use the previous output errors from steps 1 and 2, which are both in the category PL. By formula (6), the category with the largest membership is PL. Therefore, we assign an L (large) learning rate to step 3, which is again 1.0. The resulting output error for step 3 is 0.60, which is also in the category PL with membership 1.0. For the learning rate of step 4, we use the n = 3 output errors from steps 1, 2, and 3. Looking at their output errors, we find that all are in the category PL. Hence the category with the largest membership is PL and we assign an L (large) learning rate to step 4. The resulting output error of step 4 is 0.555, which is in the category PL with membership 0.7 and in the other categories with membership 0.

Continuing in this manner, in order to find the learning rate of step 10, we use the output errors of training steps 7, 8, and 9. The output error of epoch 7 is 0.5, which is in the category PL with membership 0.333 and in the category PM with membership 0.333. The output error of epoch 8 is 0.455, which is in the category PL with membership 0.033 and in the category PM with membership 0.633. The output error of epoch 9 is 0.4, which is in the category PM with membership 1.0. The smallest membership across all three epochs for PL is 0.0 (at epoch 9), and the smallest membership across the three epochs for PM is 0.333 (at epoch 7). Notice that PM with membership 0.333 is the largest membership among the categories. Therefore, we select PM as our output error category, and by the rule an M (medium) learning rate, 0.7, is selected for epoch 10.
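The complete rate-selection procedure of (6) and (7) is summarized in the sketch below. This is our illustration only: the triangular membership centers (0, ±0.2, ±0.4, ±0.6) and half-width (0.15) are inferred from the membership values in Table 5, the outermost categories are assumed to saturate at 1 beyond their centers, and the rate values are those of the ten-step example (L = 1.0, M = 0.7, S = 0.3, ZE = 0).

```python
def membership(v, center, half_width=0.15, saturate=0):
    """Triangular membership peaking at `center`; the outermost categories
    (PL, NL) saturate at 1 beyond their centers (saturate = +1 or -1)."""
    if saturate > 0 and v >= center:
        return 1.0
    if saturate < 0 and v <= center:
        return 1.0
    return max(0.0, 1.0 - abs(v - center) / half_width)

# Categories listed in tie-breaking priority order: ZE, then PS/NS, PM/NM, PL/NL.
CATEGORIES = [("ZE", 0.0, 0), ("PS", 0.2, 0), ("NS", -0.2, 0),
              ("PM", 0.4, 0), ("NM", -0.4, 0), ("PL", 0.6, +1), ("NL", -0.6, -1)]
RATE_OF = {"ZE": 0.0, "PS": 0.3, "NS": 0.3, "PM": 0.7, "NM": 0.7, "PL": 1.0, "NL": 1.0}

def learning_rate(errors, n=3):
    """Fuzzy learning rate eta(t) chosen from the last n output errors via (6)-(7)."""
    recent = errors[-n:]                  # v_{t-n}, ..., v_{t-1}
    best_cat, best_val = "ZE", -1.0
    for name, center, sat in CATEGORIES:
        val = min(membership(v, center, saturate=sat) for v in recent)
        if val > best_val:                # strict '>' keeps the higher-priority category on ties
            best_cat, best_val = name, val
    return RATE_OF[best_cat]

# Reproducing the choice for step 10 of Table 5 from the errors of steps 7, 8, and 9:
print(learning_rate([0.500, 0.455, 0.400]))   # 0.7, the medium rate, as in the example
```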
In the spinning operation study, the membership functions of the output error were defined as in Fig. 8. For computational purposes, we used 0.1 as the zero-equivalent learning rate ZE, 0.2 as the small learning rate S, 0.3 as the medium learning rate M, and 0.4 as the large learning rate L. Also, the maximum number n of observations used to classify the current output error was arbitrarily set to n = 10. Further studies on the fuzzy membership assignment and the optimal number of observations are in progress.

Input Parameter Design

In order to compare the results of the fuzzy-neuron control network with the classic back-propagation network, the input parameter design was kept the same as that used with the back-propagation network.

Test Runs and Validation

The fuzzy learning rate controller described above was used in conjunction with the classic back-propagation neural network for the textile spinning operation. We call this new model a "fuzzy control neural network" (FCNET). In order to find the optimal weight assignment of the network, we applied the previously defined 129 training patterns and 98 validation samples for test runs. The results are shown in Fig. 9. From the figure, we see that the curve for the validation samples reaches its minimum very quickly (at training step t = 6,925). At this point, the normalized mean square error for training is 0.03439 while the normalized mean square error for the validation samples is 0.19961. For the training results, detailed test information is given in Table 6, while this information for the validation samples is given in Table 7.

Fig. 8. The membership functions of output error for the spinning simulation.

Table 6. Results using FCNET and 129 training patterns

                                            Target response d_k
                                   Average      Overall      Blanket      Spot
Statistic                          inventory    leadtime     orders       orders
Average relative error             -0.5566      5.3466       3.9529       13.7162
Variance of relative errors        20.1848      1693.3670    1067.8162    1204.2605
Maximum relative error             8.7217       346.1830     284.4080     233.7359
Minimum relative error             -15.0727     -23.9870     -38.9305     -33.6958
97.5th percentile of rel. errors   0.2187       12.4479      9.5920       19.7047
2.5th percentile of rel. errors    -1.3319      -1.7547      -1.6862      7.7277

The combined average relative error of all four predictions is 5.61% for the training results, while the prediction of the spot order leadtime is again the worst. All these results are consistent with those of the BPNET model but are obtained with far fewer epochs. For the validation samples, the combined average relative error of all four predictions is -0.10%. Notice that the error for spot order leadtime is -1.59%, which is far smaller in magnitude than for the training patterns. Again the validation sample results are consistent with those of the BPNET model.

Table 7. Results using FCNET and 98 validation samples

                                            Target response d_k
                                   Average      Overall      Blanket      Spot
Statistic                          inventory    leadtime     orders       orders
Average relative error             -4.2366      3.0473       2.3809       -1.5900
Variance of relative errors        111.4241     31.9166      24.8861      104.1227
Maximum relative error             60.6293      24.3160      27.3872      35.2720
Minimum relative error             -42.9737     -10.7228     -4.6670      -38.1166
97.5th percentile of rel. errors   -2.1467      4.1658       3.3686       0.4303
2.5th percentile of rel. errors    -6.3266      1.9288       1.3932       -3.6105

Fig. 9. Validation pilot run for 129 patterns using fuzzy controller.
Output Analysis

Plots of the response surfaces estimated by the FCNET algorithm are displayed in Wu et al. (1994). Several observations can be made here.

1. From Tables 6 and 7, we see that the overall average relative error is about 5.61% for the training patterns and -0.10% for the validation samples. This indicates better prediction on the validation samples than on the training patterns.

2. The prediction of the FCNET metamodel is slightly worse than that of the BPNET metamodel for the training patterns. This is due to the fact that the BPNET model fixes its learning rate at a small number while the FCNET model varies its fuzzy learning rates, which may cause slight degradation in the accuracy of predictions based on FCNET. However, for the validation samples, the prediction of the FCNET model is better than that of the BPNET model. This implies that the FCNET model's prediction capability is comparable to the BPNET model in practice.

3. The FCNET metamodel attains its optimal weight assignment at training step t = 6,925 while the BPNET metamodel attains its optimal weight assignment at training step t = 19,813. This shows the potential computational savings of the FCNET model. A comparison of the learning performance is shown in Fig. 10.

Fig. 10. Estimation errors for back-propagation vs. the fuzzy controller.

CONCLUDING REMARKS

In this study, we have investigated two different metamodel estimation algorithms based on neural networks, namely BPNET and FCNET. We have applied these algorithms to the prediction of certain performance measures of a textile spinning operation based on the main input parameters for that operation. According to our experience, both metamodel estimation algorithms are capable of providing good predictions. The results of using a fuzzy controller for the learning rate suggest a significant potential for speeding up training. For future studies, we will consider incorporating fuzzy controllers for other parameters, including the learning rate of the hidden layer, the rate of the activation function, and the rate of the momentum method. We also will investigate the possibility of using a separate neural network to specify the fuzzy membership of each control parameter.

BIBLIOGRAPHY

Clarke, L. A. M. & King, R. E. (1993). PD-based inventory control for a textile spinning plant. Technical Report #93-13, Department of Industrial Engineering, North Carolina State University, Raleigh, NC.

Freeman, J. A. & Skapura, D. M. (1991). Neural Networks: Algorithms, Applications, and Programming Techniques. Reading, MA: Addison-Wesley.

Gallant, S. I. (1993). Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press.

Hertz, D. B. & Hu, Q. (1992). Fuzzy-neuron controller for backpropagation networks. In Proceedings of the Third Workshop on Neural Networks (pp. 474-478).

Hunter, N. A., King, R. E., Nuttle, H. L. W., & Wilson, J. R. (1993). North Carolina apparel pipeline modeling project. International Journal of Clothing Science and Technology, Vol. 5, No. 3/4, pp. 19-24.

Kosko, B. (1992). Neural Networks and Fuzzy Systems. Englewood Cliffs, NJ: Prentice Hall.

Lord, P. R. (1974). Spinning conversion of fiber to yarn. Unpublished Master's thesis, School of Textiles, North Carolina State University, Raleigh, NC.

Moncarz, H. T. (1992). Information Technology Vision for the U.S.
Fiber/Textile/Apparel Industry. Internal Report NIST-IR 4986, National Institute of Standards and Technology, U.S. Department of Commerce, Washington, D.C.

Montgomery, D. C. (1991). Design and Analysis of Experiments, Second Edition. New York: John Wiley & Sons.

Pegden, C. D., Shannon, R. E., & Sadowski, R. P. (1995). Introduction to Simulation Using SIMAN, Second Edition. New York: McGraw-Hill.

Powell, K. A. (1992). Interactive decision support modeling for the textile spinning industry. Unpublished Master's thesis, Department of Industrial Engineering, North Carolina State University, Raleigh, NC.

Rohlena, V. (1975). Open-End Spinning. Amsterdam, NY: Elsevier Science Ltd.

Smith, M. (1993). Neural Networks for Statistical Modeling. New York: Van Nostrand Reinhold.

Tveter, D. (1991). Getting a fast break with backprop. AI Expert, Vol. 6, pp. 36-43.

Wu, P., Fang, S. C., Nuttle, H. L. W., King, R. E., & Wilson, J. R. (1994). Decision surface modeling of textile spinning operations using neural network technology. Technical Report #94-1, Department of Industrial Engineering, North Carolina State University, Raleigh, NC.

Zimmermann, H. J. (1988). Fuzzy Set Theory and Its Applications. Norwell, MA: Kluwer-Nijhoff Publishing.

Zurada, J. M. (1992). Introduction to Artificial Neural Systems. St. Paul, MN: West Publishing Company.