Guided Neural Network Learning Using a Fuzzy Controller,
with Applications to Textile Spinning
PEITSANG WU, SHU-CHERNG FANG,
HENRY L. W. NUTTLE, JAMES R. WILSON, and RUSSELL E. KING
North Carolina State University, USA
We apply neural networks to build a "metamodel" of the relation between key input parameters and output performance measures of a simulated textile spinning plant. We investigate two different neural network estimation algorithms, namely back-propagation and an algorithm incorporating a fuzzy controller for the learning rate. According to our experience, both algorithms are capable of providing high-quality predictions. In addition, results obtained using a fuzzy controller for the learning rate suggest a significant potential for speeding up the training process.
Key words: neural network, fuzzy control, textile manufacturing, spinning operations
This research is sponsored by National Textile Center Grant #S92C4.
INTRODUCTION
The textile industry is extremely competitive internationally. Due to the low cost of foreign labor,
the U.S. has been rapidly losing its market share to overseas competitors. Because the industry
is labor intensive, many jobs in the U.S. are threatened. The American textile industry currently
employs 1.8 million people through 26,000 companies, representing 10% of the entire American
manufacturing workforce. During the past 12 years, 500,000 U.S. jobs have been lost due to textile
imports; and if this trend continues, by the year 2002, one million more jobs will be eliminated.
Compared to the American automotive, petroleum, and primary metals industries, the textile
industry contributes more to the U.S. gross domestic product (Moncarz, 1992).
The textile pipeline modeling project at North Carolina State University (NCSU) (Hunter et al., 1993) has continued and expanded the effort in helping the U.S. textile industry that was begun under the auspices of CRAFTM, an industry-university consortium also formed at NCSU. The objectives of the current project center on the understanding and description of how information and material must flow through the apparel-supply pipeline in response to consumer demand and of how firms in the various manufacturing segments should react to this demand as it percolates back through the pipeline. Regardless of the conditions under which the pipeline operates in the future, whether QR (quick response), traditional, or otherwise, decision making will still be in the hands of the management of the individual firms. Answering high-level management questions about the cost/benefit consequences of policy changes (such as installing QR, broadening the product mix, reducing the minimum order size, and adopting new quality procedures) is of special importance. Two basic goals of this project are to understand the operation of each firm and then to understand the interactions between the objectives of the various firms. This is necessary in order to understand the trade-offs required to make the U.S. complex responsive, flexible, and productive and, thus, viable and competitive.
Building on its earlier work and now operating under the auspices of the National Textile Center, the expanded research team has been specifying and developing an integrated set of simulation models of the firms in the apparel pipeline, from spinning through cut-and-sew. Fig. 1 provides a schematic overview of the system.

The system is sized to produce/process about 25 million pounds of yarn per year, about half of which will be consumed by 4-5 pipeline apparel manufacturers; and the rest will be sold to outside customers for yarn and fabric. A testbed set of garments (including basic, seasonal, and fashion garments) to be assembled in the apparel plants, in turn, specifies the variety of colors, fabrics, and yarns to be produced in the other plants.
Fig. 1. Overview of textile-apparel-retail pipeline model. [Figure: a master schedule drives the spinning, yarn dyeing, knitting, weaving, and dyeing-and-finishing operations, which feed the apparel plants (knit shirts, slacks, shirts, blouses); external sales occur at several stages of the pipeline.]

These simulation models provide a vehicle for: (a) understanding the interactions between decisions taken within different firms; (b) analyzing and developing operational practices within individual firms and across subsets of firms; (c) developing high-level management information systems; and (d) training personnel in making operational decisions. Object-oriented versions of these simulation models will provide the capability to easily tailor the models to match a specific company's plant(s) and to integrate all of the separate models into a consolidated model of the entire fiber-textile-apparel-retail pipeline.
To date we have built several computer simulation models of the individual plants in the
pipeline, along with a testbed of garments and a master schedule generator (Hunter et al., 1993).
In this paper, we focus on the spinning operation. By using the spinning plant simulation model
(Clarke and King, 1993; Powell, 1992), we study the impact of key input parameters (such as
number of yarns and target inventory levels) on selected output performance measures (such as
order response times and inventory levels). As a first attempt to create an interactive high-level
management information system, we have exploited neural network techniques to develop a decision
surface model relating simulation-generated performance measures to selected input parameters of
the spinning simulation.
THE SPINNING OPERATION
Description of the Spinning Process
As depicted in Fig. 2, the process of spinning cotton into yarn involves opening and cleaning, carding, drawing, roving, and spinning. First, bales of cotton are opened and blended to ensure homogeneity. The cotton fibers are then cleaned to remove any dirt and foreign objects that remain from agricultural processing. The opening operation yields small fluffy clumps of fibers called fleece which are then ready for carding. Carding machines further divide the fleece and separate it into individual fibers. The carding machines provide further cleaning and partially align the fibers. The resulting rough, rope-like strand is called card sliver. Several strands of card sliver are put together and elongated on a drawing machine. This process improves the uniformity of the sliver by further aligning and blending the fibers. The sliver is usually drawn twice. With ring spinning, the drawn sliver is passed onto the roving machines where it is further drawn into a smaller rope form and slightly twisted to make it easier to handle. Spinning creates the desired twist and thickness (count) of the yarn and winds it onto bobbins. Winding machines then combine bobbins onto cones to be used at the weaving or knitting machines.
Fig. 2. The ring spinning process. [Figure: fiber inventory, opening, carding, drawing, roving, spinning, coning, packaging/QC, finished inventory, and shipping stages.]
The final products from the spinning plant are yarns of given counts and twists. A given product can be used to create a variety of apparel items. For further details, see Lord (1974) and Rohlena (1975).
A Simulation Model of Spinning Operations
The spinning model is coded in the SIMAN simulation language (Pegden, Shannon, and Sadowski,
1995), supported by many user-supplied discrete-event subroutines coded in FORTRAN. For this
simulation model, most operational parameters (number and types of frames, cycle times, yields,
etc.) are user-supplied inputs and may be easily changed to represent a wide variety of operational
scenarios. The spinning simulation model is also constructed so that it is possible to replace, for
example, the scheduling procedure.
The model is designed to simulate the basic activity of a ring-spinning operation capable of
producing around 25 million pounds of cotton yarn per year. The simulated activity includes
spinning frame scheduling, schedule execution, changeovers, coning (winding) and inspection, and
shipping.
At the present time, spinning is modeled as a make-to-stock operation. (As the overall pipeline
is currently designed, about 50 percent of the annual demand for yarn will come from customers
who are not in the pipeline and are not included in the master schedule.) Customer orders are either call-outs against blanket orders, randomly generated on a weekly basis, or spot orders, randomly generated on a daily basis. Orders vary as to count and quantity. The count mix and average weekly volume can be varied throughout the year to reflect seasonality in the use of different yarn weights.

The strategy for controlling the inventory levels of finished yarns is a "target-level" system. Production is incrementally raised or lowered periodically in order to try to maintain a specified inventory level. The spinning frames are scheduled reactively based upon the deviation of yarn inventory from the target.
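As a rough illustration of such a target-level rule, production might be raised or lowered in proportion to the deviation of inventory from its target, sketched below (our simplification only; Clarke and King (1993) describe the actual PD-based controller used in the model, and the function name and gain are hypothetical).

```python
def next_production(base_rate, inventory, target_inventory, gain=0.5):
    """Illustrative target-level rule: raise (lower) production when
    inventory is below (above) target; gain is a hypothetical tuning knob."""
    deviation = target_inventory - inventory
    return max(0.0, base_rate + gain * deviation)

# Example: inventory 10 units below target nudges production up by 5 units.
print(next_production(100.0, 40.0, 50.0))  # 105.0
```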
The plant operates 24 hours per day, six days per week for a user-specified number of weeks per year. Cycle times between frame doffs are random to reflect frame efficiency, and yield per doff is random to reflect the results of inspection. After doffing, the yarn is delayed for a random time representing coning, inspection, and packaging before it is classified as available for shipping. Orders are shipped daily, five days a week, limited by shipping capacity. Priority is given to blanket orders. Measures of performance include production levels, order response times (shipping lead times), inventory levels, frame utilization, and margin. For more details on the spinning simulation model, see Clarke and King (1993) and Powell (1992).
DECISION SURFACE MODELING
The objective of decision surface modeling is to develop an interactive information system that captures the essential features of each pipeline plant model (or integrated collection of such plant models) in mathematical relationships between plant performance (inventory, order lead times, etc.) and key decision parameters (product mix, number of machines, etc.). These "metamodels" are intended to provide high-level management with a rapid, easy-to-use capability to predict the impact on system performance of various "what-if" scenarios such as:

What are the consequences of broadening the product mix?
What are the cost/benefits of reducing order lead times?
What are the cost/benefits of introducing new equipment?

This section describes an attempt to build a "metamodel" of the relationship between key input parameters and output performance measures for a spinning operation using neural networks. The rest of the paper describes this effort and the results to date.
Neural Network Architecture
A neural network consists of several layers of computational units called neurons and a set of data connections which join neurons in one layer to those in another. The network takes inputs and produces outputs through the work of trained neurons. Neurons usually calculate their outputs using a signal-activation function of their inputs, where the signal-activation function has a sigmoid shape. Throughout this study, we employ the logistic signal-activation function

f(x) = \frac{1}{1 + e^{-x}}, \quad -\infty < x < \infty.    (1)

The functional form (1) is used primarily for computational convenience. Using some known results (for example, input-output pairs observed in the target system), we assign a weight to each connection to determine how an activation signal that travels along that connection influences the receiving neuron. The connections and their weights are the most important parameters in a neural network model since they determine the outputs of the network. Because the weights of the connections can be adjusted by experiencing and incorporating more known results, the relationship of the network's outputs to its inputs changes with exposure to additional responses from the target system. Hence we say that a neural network has the ability to learn. The process of repeatedly exposing the network to known responses from the target system to estimate appropriate connection weights is called training. More details on neural networks can be found in Freeman and Skapura (1991), Gallant (1993), Smith (1993), and Zurada (1992).
There are two kinds of training methods, namely "supervised learning" and "unsupervised learning". In supervised learning, we assume that on each occasion when an input is applied, a corresponding target response of the system is also provided as a supervising signal. This supervising signal, or target response, is called the "teacher". The teacher provides a basis for estimating and reducing errors between the neural network's output and the target response. To reduce this error, we use a negative gradient direction for the sum of squared estimation errors as the basis for better weight assignments. On the other hand, in unsupervised learning, the target response is not provided and hence no error information is available. The learning must somehow be accomplished purely based on system inputs and outputs about which we have little or no knowledge. Unsupervised learning is sometimes called "learning without a teacher" (Zurada, 1992). Throughout our study, the outputs from the spinning simulation model are used as target responses in a supervised learning environment.
The number of layers in a neural network and its connection structure greatly affect its performance. Usually a fully connected multilayer network produces more accurate outputs but with a higher computational burden. In our study, since the spinning operation is eventually going to be linked with other pipeline operations, we selected a fully connected three-layer neural network, consisting of input, hidden, and output layers, to balance accuracy and computational requirements. A schematic diagram of such a network is illustrated in Fig. 3.
As mentioned earlier, a neural network can learn. A commonly used learning rule is the so-called "delta learning rule". Assume that some initial weights have been suitably assigned before each learning experiment starts. As shown in Fig. 4 for neuron k in the output layer (k = 1, \ldots, K), the J \times 1 vector

W_k = [W_{k1}, \ldots, W_{kJ}]^T

represents the connection weights to neuron k in the output layer from the J neurons in the hidden layer. (Throughout this paper, the roman superscript T denotes the transpose of a vector or matrix.) The corresponding J \times 1 vector

Y = [Y_1, \ldots, Y_J]^T

represents the inputs to each neuron in the output layer from the J neurons in the hidden layer.

Fig. 3. Neural network architecture. [Figure: a fully connected three-layer network with input neurons x_1, \ldots, x_I, hidden neurons y_1, \ldots, y_J reached through weights V_{ji}, and output neurons o_1, \ldots, o_K reached through weights W_{kj}.]

The learning signal r is in general a function of W_k, Y, and the supervising signal d_k. In our study, d_k is the kth performance measure generated by the simulation model for k = 1, \ldots, K. For neuron k in the output layer, the dependence of the learning signal on W_k, Y, and d_k is expressed as

r = r(W_k, Y, d_k).

As explained later in this subsection, we take the learning signal to have the specific functional form

r(W_k, Y, d_k) = [d_k - f(W_k^T Y)] f'(W_k^T Y),    (2)

where f'(W_k^T Y) is the derivative of the signal-activation function (1) evaluated at W_k^T Y.

If the delta learning rule is applied on successive training steps indexed by t (where t = 1, 2, \ldots), then at step t the increment in the current weight vector W_k(t) required to go to the next step is given by

\Delta W_k(t) = \eta \, r[W_k(t), Y(t), d_k] \, Y(t),

where \eta is a positive number called the learning constant. The rate of learning is determined by \eta. The new weight vector at training step (epoch) t + 1 is adapted from the weight vector at step t
according to the difference equation

W_k(t + 1) = W_k(t) + \eta \, r[W_k(t), Y(t), d_k] \, Y(t).    (3)

For a continuous-time learning process, the analogue of (3) is the differential equation

\frac{dW_k(t)}{dt} = \eta \, r[W_k(t), Y(t), d_k] \, Y(t).

Fig. 4. The delta learning rule. [Figure: neuron k forms o_k = f(W_k^T Y) from the hidden-layer outputs y_1, \ldots, y_J through weights W_{k1}, \ldots, W_{kJ}; the difference d_k - o_k and the factor f'(W_k^T Y) produce the learning signal r, which drives the weight increment \Delta W_k.]
In the following justification of the form of the learning rule (2), we suppress the dependence of all quantities on the training step t for notational simplicity. We define the output error as the difference between the output value o_k = f(W_k^T Y) and the supervising signal d_k. The learning rule (2) can be readily derived by applying the estimation principle of least squares to the output error, where the squared error is defined as

E = \frac{1}{2} (d_k - o_k)^2 = \frac{1}{2} \left[ d_k - f(W_k^T Y) \right]^2.    (4)
By calculating the gradient vector of E with respect to W_k, we obtain an "error gradient vector",

\nabla E = -(d_k - o_k) f'(W_k^T Y) \, Y.

The components of the gradient vector are

\frac{\partial E}{\partial W_{kj}} = -(d_k - o_k) f'(W_k^T Y) \, Y_j \quad \text{for } j = 1, 2, \ldots, J.
The delta learning rule is used to minimize the sum of squared errors of the form (4) taken over all training patterns; and thus at training step t, the required increment \Delta W_k(t) of the weight vector W_k(t) must be in the negative gradient direction, so that we take

\Delta W_k(t) = -\eta \, \nabla E(t).

Expressing the right-hand side of this last equation explicitly in terms of the signal-activation function (1) and the weight vector W_k(t), we have

-\eta \, \nabla E(t) = \eta \left\{ d_k - f([W_k(t)]^T Y) \right\} f'([W_k(t)]^T Y) \, Y.    (5)
In order to accelerate the convergence to an optimal weight assignment, we choose the method of "error back-propagation with momentum" (Zurada, 1992), which modifies \Delta W_k(t) by incorporating a "momentum" term based on the previous increment \Delta W_k(t - 1), so that we take

\Delta W_k(t) = -\eta \, \nabla E(t) + \alpha \, \Delta W_k(t - 1),

where the arguments t and t - 1 indicate the current and the most recent previous training steps, respectively; -\eta \, \nabla E(t) is given by (5); and \alpha is a user-selected positive momentum constant (Zurada, 1992).
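To make the weight update concrete, here is a minimal Python sketch (our own illustration, not the study's code) of the output-layer delta rule with momentum, combining the activation (1), the learning signal (2), and the momentum update above. The names W, Y, d, eta, and alpha mirror the symbols in the text; the layer sizes match the 9-hidden, 4-output configuration used later in the paper, and the identity f'(u) = f(u)[1 - f(u)] for the logistic function is used to avoid recomputing f.

```python
import numpy as np

def logistic(x):
    # Signal-activation function (1): f(x) = 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + np.exp(-x))

def delta_rule_step(W, Y, d, dW_prev, eta=0.1, alpha=0.5):
    """One delta-rule update with momentum for the K output neurons.

    W: (K, J) output-layer weights, Y: (J,) hidden-layer outputs,
    d: (K,) supervising signals, dW_prev: (K, J) previous increment.
    """
    o = logistic(W @ Y)                 # outputs o_k = f(W_k^T Y)
    fprime = o * (1.0 - o)              # f'(u) = f(u)[1 - f(u)] for the logistic f
    r = (d - o) * fprime                # learning signal (2), one entry per neuron
    dW = eta * np.outer(r, Y) + alpha * dW_prev  # -eta*gradE from (5) plus momentum
    return W + dW, dW                   # difference equation (3)

# Tiny usage example with arbitrary numbers (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 9))  # e.g., 4 outputs, 9 hidden neurons
dW = np.zeros_like(W)
Y = rng.uniform(size=9)
d = np.array([0.2, 0.5, 0.7, 0.4])
for _ in range(100):
    W, dW = delta_rule_step(W, Y, d, dW)
```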
Input Parameter Design
The key outputs from the simulation model used in this study were the average inventory level, the percentage of spot orders shipped within 5 days, the percentage of blanket orders shipped within 5 days, and the percentage of all orders shipped within 5 days. Based on earlier simulation experiments with the spinning model (Powell, 1992), the key input parameters which affect these performance measures are the number of yarns in the product line, the number of frames in the plant, the frame utilization level required to meet the demand, the number of days required for coning and inspection, the percentage of customer orders that are blanket orders, the target number of days of inventory used for control purposes, and the average size of a customer order. Therefore, for our three-layer neural network, there are seven input nodes and four output nodes. In addition, for computational purposes, nine neurons are used in the hidden layer.
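Once trained, evaluating such a 7-9-4 metamodel is just two matrix-vector products through the logistic activation (1); the sketch below is our illustration only (bias terms and input scaling are omitted, and the names V, W, and predict are assumptions, not identifiers from the study).

```python
import numpy as np

def predict(V, W, x):
    """Forward pass of the fully connected three-layer metamodel.

    V: (9, 7) input-to-hidden weights, W: (4, 9) hidden-to-output weights,
    x: (7,) scaled input parameters; returns (4,) predicted performance measures.
    """
    y = 1.0 / (1.0 + np.exp(-(V @ x)))     # hidden-layer outputs, logistic f
    return 1.0 / (1.0 + np.exp(-(W @ y)))  # output-layer predictions
```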
In the spirit of a two-level factorial experiment (Montgomery, 1991), we fixed each input parameter either at the lowest value of interest or at the highest value of interest to create patterns (design points) for the experiment. Table 1 shows the extreme values used for the input parameters. The selected extreme values of the input parameters yield 2^7 = 128 training patterns for the neural network. For each sample, we replicated the spinning simulation four times in order to estimate the mean and variance of the response at each pattern (design point).

For testing purposes, an additional 99 patterns were generated by the simulation model. First, we fixed all seven input parameters in the middle of the range of interest to provide one "center" pattern. Next we fixed six input parameters at their midrange values and set the remaining parameter at its highest and lowest values to generate 2 x 7 = 14 patterns. Then we fixed five input parameters at their middle values and set the remaining two parameters at their highest and lowest values to generate 4 x 21 = 84 patterns (21 being the number of pairs of the seven parameters, each with four high/low combinations).
Table 1. Selected levels of key input parameters

Parameter            Low         High
# of Yarns           6 yarns     18 yarns
# of Frames          15 frames   45 frames
Utilization          82%         88%
Coning/Inspection    2 days      4 days
Blanket Orders       40%         100%
Target Inventory     6 days      12 days
Order Size           5000 lbs    15000 lbs
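For concreteness, both designs can be enumerated mechanically; the sketch below (our illustration, with assumed parameter names) generates the 2^7 = 128 factorial points from the Table 1 levels and the 1 + 14 + 84 = 99 midpoint-based test patterns described above.

```python
from itertools import product, combinations

# Low/high levels from Table 1 (parameter names are our own shorthand).
levels = {
    "yarns": (6, 18), "frames": (15, 45), "utilization": (0.82, 0.88),
    "coning_days": (2, 4), "blanket_pct": (0.40, 1.00),
    "target_inv_days": (6, 12), "order_size_lbs": (5000, 15000),
}
names = list(levels)
mid = {k: (lo + hi) / 2.0 for k, (lo, hi) in levels.items()}

# 2^7 = 128 two-level factorial training patterns.
training = [dict(zip(names, pt)) for pt in product(*levels.values())]

# 99 test patterns: 1 center + 2*7 one-at-a-time + 4*21 two-at-a-time.
test = [dict(mid)]
for k in names:                      # 14 one-at-a-time patterns
    for v in levels[k]:
        test.append({**mid, k: v})
for a, b in combinations(names, 2):  # 84 two-at-a-time patterns
    for va in levels[a]:
        for vb in levels[b]:
            test.append({**mid, a: va, b: vb})

assert len(training) == 128 and len(test) == 99
```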
Test Runs and Validation
As mentioned earlier, a classic back-propagation neural network with momentum (BPNET) was used in our study. In order to find the optimal weight assignment for the network, we first trained the neural network using the 128 patterns described above. As shown by the solid curve in Fig. 5, after 500 training steps, the total mean square error (mse) is reduced to about 0.30. However, when the resulting network was applied to the additional 99 patterns for validation purposes, the results (shown by the dotted line in Fig. 5) were not very satisfactory. These results indicate that a linear model is not sufficient and we need additional training patterns to expose the potential curvature of the response surface.
Fig. 5. Validation pilot run for 128 patterns using back-propagation. [Figure: mse versus epoch (0 to 1000) for the training and validation curves.]
In order to assess the curvature of the response surface, the "center" pattern was reclassified as a training pattern instead of a validation sample. In other words, we used 129 training patterns and 98 validation samples for test runs. The result is shown in Fig. 6. The figure indicates that the mse curve for the validation samples reached its minimum early. (The data for Fig. 6 showed that the minimum was reached at training step t = 19,813.) Beyond the training step yielding the minimum mse, the neural network becomes "overtrained". At this point, the normalized mean square error for training attains a value of 0.03078 while the normalized mean square error for validation is 0.19299. For the training result, more detailed test information is given in Table 2.
Fig. 6. Validation pilot run for 129 patterns using back-propagation. [Figure: mse versus epoch (0 to 100,000) for the training and validation curves.]
Table 2. Results with BPNET and 129 training patterns

                                            Target response d_k
Statistic                              Average     Overall     Blanket     Spot
                                       inventory   leadtime    orders      orders
Average relative error                 -0.7457     5.2911      3.1547      12.6674
Variance of relative errors            16.7433     1542.7400   770.0309    1766.2286
Maximum relative error                 9.1327      338.2040    242.5470    354.4183
Minimum relative error                 -11.2743    -20.7460    -35.6387    -24.3477
97.5th percentile of relative errors   -0.0395     12.06924    7.9434      19.9198
2.5th percentile of relative errors    -1.4518     -1.4870     -1.6340     5.4150
The first row of numbers in the table indicates that over all 129 training patterns, the

Average relative error = 100 \, (o_k - d_k)/d_k \ \%

of the neural network output (prediction) o_k vs. the simulation output (target response) d_k has the respective values -0.75%, 5.29%, 3.15%, and 12.67% for the average inventory, the overall leadtime, the blanket order leadtime, and the spot order leadtime, where a negative value indicates underprediction. Note that the combined average relative error of all four predictions is 5.09%; and the worst performance is seen in the prediction for the spot order leadtime. Hence the percentage of spot orders filled within 5 days was most difficult to predict. Rows 2-4 of Table 2 respectively display the variance of the relative errors, the maximum relative errors, and the minimum relative errors for each target response computed over all 129 training patterns. Again the figures suggest that prediction of the spot order leadtime performance is more difficult than the other performance measures.
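The table statistics follow directly from this definition of relative error; the following is a minimal sketch (our illustration, using NumPy's default percentile interpolation and the sample variance, since the paper does not state which estimators it used) of how they can be computed from vectors of predictions and targets.

```python
import numpy as np

def relative_error_stats(o, d):
    """o, d: arrays of predictions and target responses for one measure.
    Returns the six statistics reported in Tables 2 and 3, in percent."""
    rel = 100.0 * (o - d) / d
    return {
        "average": rel.mean(),
        "variance": rel.var(ddof=1),      # sample variance (assumed estimator)
        "maximum": rel.max(),
        "minimum": rel.min(),
        "pct_97_5": np.percentile(rel, 97.5),
        "pct_2_5": np.percentile(rel, 2.5),
    }

# Example with toy numbers (illustrative only):
stats = relative_error_stats(np.array([102.0, 95.0]), np.array([100.0, 100.0]))
```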
For the 98 validation samples, the detailed test information is shown in Table 3.
Table 3. Results obtained with BPNET and 98 validation samples

                                            Target response d_k
Statistic                              Average     Overall     Blanket     Spot
                                       inventory   leadtime    orders      orders
Average relative error                 -0.8874     3.5010      2.9610      -1.6343
Variance of relative error             119.8984    27.6934     28.8958     66.7292
Maximum relative error                 69.2617     24.7048     28.7201     22.3840
Minimum relative error                 -40.0680    -9.7861     -3.5308     -33.5616
97.5th percentile of relative errors   1.2805      4.5429      4.0253      -0.0169
2.5th percentile of relative errors    -3.0553     2.4591      1.8967      -3.2516
In this case the combined average relative error of all four predictions is only 0.99%. Notice that the predictions for the validation samples actually are more accurate than those for the training patterns. In particular the average relative error for spot order leadtime performance has much smaller magnitude in the validation patterns than in the training patterns. Although the spot order performance was the most variable output measure in the actual simulation runs, the neural network predictions are still relatively accurate. This qualifies the neural network model as a good tool for the prediction of performance in textile spinning operations.
Output Decision Surface
Using the optimal weights obtained through the validation samples, we see that the neural network metamodel can predict the output measures, based on the given inputs, very effectively. However, while we can represent the four-dimensional output surface in terms of the seven-dimensional input parameters, we only provide three-dimensional plots. Ten groups of three-dimensional surface plots can be found in Wu et al. (1994).
A NEURAL NETWORK WITH FUZZY CONTROL
Basic Ideas
After constructing the BPNET model for the textile spinning model, we began to study its "learning capability" for performance enhancement. The learning capability of a neural network is a complicated issue. It depends on the structure of connections, the activation function, the number of hidden layers and hidden neurons, as well as the learning rate. Each of these parameters typically has a significant impact on the performance of a neural network model. For example, excessively large learning rates or too few hidden neurons may cause the network to diverge or converge to a high error level (Hertz and Hu, 1992; Tveter, 1991). On the other hand, excessively small learning rates or too many hidden neurons may result in lengthy computation time. Unfortunately, at this time, there are very few quantitative rules to guide selection of these control parameters.
In this section we focus on controlling the learning rate, which was a fixed constant in the BPNET model described in the previous section. In general, we know that the learning rate must be a small number to ensure convergence; and as the magnitude of the prediction error decreases during the training process, the learning rate should decrease correspondingly. But the specific value and the frequency of reduction of the learning rate still lie within the realm of guesswork. It is only "vaguely" understood. This "vagueness" led us to study "fuzzy control" of the learning rate of our neural network model (Hertz and Hu, 1992).

The basic idea is to apply a fuzzy-neuron controller to modulate the learning rate of each input neuron throughout the training process. Roughly speaking, when the magnitude of the output error is "large" (positively or negatively), we would like to set a "relatively large" learning rate to accelerate the learning process; when the magnitude of the output error is "medium" (positively or negatively), we slow down the learning rate to a "medium" value; and finally when the magnitude of the output error is "small" (positively or negatively), we would like to adopt a "relatively small" learning rate to ensure convergence. Hence a simple fuzzy-neuron control rule is established according to Table 4.
Table 4. Fuzzy control rule

Characteristic                     Matching categories
output error, o_k - d_k            NL   NM   NS   ZE   PS   PM   PL
learning rate, \eta(t)             L    M    S    ZE   S    M    L
In Table 4, NL means "Negatively Large", NM means "Negatively Medium", NS means "Negatively Small", ZE means "Zero Equivalence", PS means "Positively Small", PM means "Positively Medium", PL means "Positively Large", S means "Small", M means "Medium", and L means "Large" (Kosko, 1992, and Zurada, 1992).
Note that this rule could be generic for all neural networks using back-propagation learning algorithms. To deal with the vagueness of "Positively (Negatively) Large", "Positively (Negatively) Medium" and "Positively (Negatively) Small", the output error is further defined by a fuzzy membership function (Kosko, 1992, and Smith, 1993). A fuzzy membership function \mu_{\tilde A} : Z \to [0, 1] defines a membership value of the fuzzy set \tilde A between 0 and 1 for every element z in the universe of discourse Z. This number \mu_{\tilde A}(z) indicates the degree to which the input variable z belongs to the fuzzy set \tilde A. The membership function can take many different shapes depending on the characteristics of the underlying problem. In practice, triangular or trapezoidal shapes are often chosen for simplified computation. In this study, triangular membership functions of output error were chosen. These membership functions are displayed in Fig. 7 and Fig. 8.
With the fuzzy control rule and fuzzy membership function described above and for a given value of output error, our objective is to determine a corresponding learning rate, \eta. One way to find the learning rate is to use the well-known "extension principle" (Zimmermann, 1988). Let \mu_{ZE}, \mu_{PS}, \mu_{PM}, \mu_{PL}, \mu_{NS}, \mu_{NM}, and \mu_{NL} be the membership functions of an output error that respectively correspond to each of the fuzzy sets ZE (zero), PS (positively small), PM (positively medium), PL (positively large), NS (negatively small), NM (negatively medium), and NL (negatively large). In other words, for a given output error v, the quantity \mu_A(v) indicates the degree to which this output error is classified in category A, where A \in \{NS, NM, NL, ZE, PS, PM, PL\}. We use \arg\max_A \mu_A(v) to indicate the category in which v has the highest value of the membership function. In case there is a tie in the degree of membership, we prefer to classify the output error in a category with "smaller error magnitude", i.e., the priority is given to categories in the following order: (i) ZE; (ii) PS or NS; (iii) PM or NM; and finally (iv) PL or NL.
In order to determine the learning rate, \eta(t), for a particular training step t, we use the n most recent observations of output error, \{v_{t-n}, v_{t-n+1}, \ldots, v_{t-1}\}, where n is user-specified. At the current training step t, the learning rate is determined by the following procedure.

For t = 1, \eta(t) is arbitrarily assigned.

For 2 \le t \le n, assign

category = \arg\max_A \left[ \min_{1 \le i \le t-1} \mu_A(v_i) \right].    (6)

If category is PS or NS, then \eta(t) = S (small).
If category is PM or NM, then \eta(t) = M (medium).
If category is PL or NL, then \eta(t) = L (large).
If category is ZE, then \eta(t) = ZE (zero).

For t > n, assign

category = \arg\max_A \left[ \min_{t-n \le i \le t-1} \mu_A(v_i) \right].    (7)

If category is PS or NS, then \eta(t) = S (small).
If category is PM or NM, then \eta(t) = M (medium).
If category is PL or NL, then \eta(t) = L (large).
If category is ZE, then \eta(t) = ZE (zero).

Notice that in (6) and (7), the quantity v_i is the output error at training step i. The numerical values of S, M, L, and ZE are specified by the user.
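As a sketch of how rules (6) and (7) can be implemented, the fragment below is our illustration only: it reuses the tri helper from the previous sketch, the membership breakpoints are assumptions patterned after Fig. 7 rather than the study's exact sets, and the rate values are the ZE = 0.1, S = 0.2, M = 0.3, L = 0.4 adopted later for the spinning study.

```python
# Reuses tri() from the membership-function sketch above.
# Triangular (left, peak, right) breakpoints per category; these are
# illustrative assumptions, not the study's exact sets.
CATEGORIES = {
    "NL": (-0.9, -0.6, -0.3), "NM": (-0.6, -0.4, -0.2), "NS": (-0.3, -0.15, 0.0),
    "ZE": (-0.1, 0.0, 0.1),
    "PS": (0.0, 0.15, 0.3), "PM": (0.2, 0.4, 0.6), "PL": (0.3, 0.6, 0.9),
}
# Learning-rate values from the spinning study: ZE=0.1, S=0.2, M=0.3, L=0.4.
RATE = {"ZE": 0.1, "PS": 0.2, "NS": 0.2, "PM": 0.3, "NM": 0.3, "PL": 0.4, "NL": 0.4}
# Tie-breaking priority: smaller error magnitude first.
PRIORITY = ["ZE", "PS", "NS", "PM", "NM", "PL", "NL"]

def learning_rate(errors, n=10):
    """Rules (6)/(7): over the most recent output errors, score each category
    by its minimum membership, pick the max-scoring category (ties resolved
    by PRIORITY), and return the corresponding eta(t)."""
    window = errors[-n:]  # all previous errors when t <= n, else the last n
    score = {A: min(tri(v, *CATEGORIES[A]) for v in window) for A in CATEGORIES}
    best = max(PRIORITY, key=lambda A: (score[A], -PRIORITY.index(A)))
    return RATE[best]

# Example: the step 7-9 errors of Table 5 also classify as PM (medium) here.
print(learning_rate([0.500, 0.455, 0.400], n=3))  # 0.3
```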
We illustrate (6) and (7) with a small example in which we choose n = 3, L = 1.0, M = 0.7, S = 0.3, and ZE = 0. The membership functions for output error are defined as in Fig. 7. Table 5 shows how the learning rate \eta(t) is assigned in the first 10 training steps.
Fig. 7. The membership functions of output error for the ten-step example. [Figure: triangular membership functions for NL, NM, NS, ZE, PS, PM, and PL over output errors from -0.6 to 0.6.]
Table 5. Illustration of fuzzy control for the neural network

Step   Learning rate          \mu_A(v_i) for each category A          Output error
t      \eta(t)        NS    NM    NL    ZE    PS    PM      PL        v_i
1      1.0            0.0   0.0   0.0   0.0   0.0   0.000   1.000     0.700
2      1.0            0.0   0.0   0.0   0.0   0.0   0.000   1.000     0.650
3      1.0            0.0   0.0   0.0   0.0   0.0   0.000   1.000     0.600
4      1.0            0.0   0.0   0.0   0.0   0.0   0.000   0.700     0.555
5      1.0            0.0   0.0   0.0   0.0   0.0   0.000   0.667     0.550
6      1.0            0.0   0.0   0.0   0.0   0.0   0.300   0.367     0.505
7      1.0            0.0   0.0   0.0   0.0   0.0   0.333   0.333     0.500
8      1.0            0.0   0.0   0.0   0.0   0.0   0.633   0.033     0.455
9      0.7            0.0   0.0   0.0   0.0   0.0   1.000   0.000     0.400
10     0.7            0.0   0.0   0.0   0.0   0.0   1.000   0.000     0.380
In training step 1, we select an arbitrary learning rate, say 1.0. The resulting output error of step 1 is 0.7, which is in the category PL with membership 1.0. By the above rule (6), we select an L (large) learning rate for step 2, i.e., 1.0. The resulting output error in step 2 is 0.65, which again is in the category PL with membership 1.0. In order to find the learning rate of step 3, we use the previous output errors from steps 1 and 2, which are both in the category PL. By formula (6), the category with the largest membership function is PL. Therefore, we assign an L (large) learning rate to step 3, which is again 1.0. The resulting output error for step 3 is 0.60, which is also in the category PL with membership 1.0. For the learning rate of step 4, we use the n = 3 output errors from steps 1, 2, and 3. Looking at their output errors, we find that all are in the category PL. Hence the category with the largest membership function is PL and we assign an L (large) learning rate to step 4. The resulting output error of step 4 is 0.555, which is in the category PL with membership 0.7, and in the others with membership 0.

Continuing in this manner, in order to find the learning rate of step 10, we use the output errors of training steps 7, 8, and 9. The output error of epoch 7 is 0.5, which is in the category PL with membership 0.333, and in the category PM with membership 0.333. The output error of epoch 8 is 0.455, which is in the category PL with membership 0.033, and in the category PM with membership 0.633. The output error of epoch 9 is 0.4, which is in the category PM with membership 1.0. The smallest membership across all three epochs for PL is 0.0 at epoch 9, and the smallest membership across the three epochs for PM is 0.333 at epoch 7. Notice that PM with membership 0.333 is the largest membership among the categories. Therefore, we select PM as our output error category. By the rule, an M (medium) learning rate is selected for epoch 10, which is 0.7.
In the spinning operation study, the membership functions of the output error were defined as in Fig. 8. For computational purposes, we used 0.1 as the zero-equivalent learning rate ZE, 0.2 as the small learning rate S, 0.3 as the medium learning rate M, and 0.4 as the large learning rate L. Also, the maximum number n of observations used to classify the current output error was arbitrarily set to n = 10. Further studies on the fuzzy membership assignment and optimal number of observations are in progress.

Fig. 8. The membership functions of output error for the spinning simulation. [Figure: triangular membership functions for NL, NM, NS, ZE, PS, PM, and PL over output errors from -0.4 to 0.4.]
Input Parameter Design
In order to compare the results of the fuzzy neuron control network with the classic back-propagation
network, the input parameter design was kept the same as that used with the back-propagation
network.
Test Runs and Validation
The fuzzy learning rate controller described above was used in conjunction with the classic back-propagation neural network for the textile spinning operation. We call this new model a "fuzzy control neural network" (FCNET). In order to find the optimal weight assignment of the network, we applied the previously defined 129 training patterns and 98 validation samples for test runs. The results are shown in Fig. 9. From the figure, we see that the curve for the validation samples reaches its minimum very quickly (at training step t = 6,925). At this point, the normalized mean square error for training is 0.03439 while the normalized mean square error of the validation sample is 0.19961. For the training results, detailed test information is given in Table 6, while this information for the validation samples is given in Table 7.
Table 6. Results using FCNET and 129 training patterns

                                            Target response d_k
Statistic                              Average     Overall     Blanket     Spot
                                       inventory   leadtime    orders      orders
Average relative error                 -0.5566     5.3466      3.9529      13.7162
Variance of relative error             20.1848     1693.3670   1067.8162   1204.2605
Maximum relative error                 8.7217      346.1830    284.4080    233.7359
Minimum relative error                 -15.0727    -23.9870    -38.9305    -33.6958
97.5th percentile of relative errors   0.2187      12.4479     9.5920      19.7047
2.5th percentile of relative errors    -1.3319     -1.7547     -1.6862     7.7277
The combined average relative error of all four predictions is 5.61% for the training results, while the performance of the prediction relating to the spot order leadtime is again the worst. All these results are consistent with those of the BPNET model but are obtained with far fewer epochs.

For the validation samples, the combined average relative error of all four predictions is -0.10%. Notice that the error for spot order leadtime is -1.59%, which is far smaller in magnitude than for the training patterns. Again the validation sample results are consistent with those of the BPNET model.
Table 7. Results using FCNET and 98 validation samples

                                            Target response d_k
Statistic                              Average     Overall     Blanket     Spot
                                       inventory   leadtime    orders      orders
Average relative error                 -4.2366     3.0473      2.3809      -1.5900
Variance of relative error             111.4241    31.9166     24.8861     104.1227
Maximum relative error                 60.6293     24.3160     27.3872     35.2720
Minimum relative error                 -42.9737    -10.7228    -4.6670     -38.1166
97.5th percentile of relative errors   -2.1467     4.1658      3.3686      0.4303
2.5th percentile of relative errors    -6.3266     1.9288      1.3932      -3.6105
Fig. 9. Validation pilot run for 129 patterns using fuzzy controller. [Figure: mse versus epoch (0 to 100,000) for the training and validation curves.]
Output Analysis
Plots of the response surfaces estimated by the FCNET algorithm are displayed in Wu et al. (1994). Several observations can be made here.

1. From Tables 6 and 7, we see that the overall average relative error is about 5.61% for the training patterns and -0.10% for the validation samples. This indicates better prediction on the validation samples than on the training patterns.

2. The prediction of the FCNET metamodel is slightly worse than that of the BPNET metamodel for the training patterns. This is due to the fact that the BPNET model fixes its learning rate at a small number while the FCNET model varies its fuzzy learning rates, which may cause slight degradation in the accuracy of predictions based on FCNET. However, for the validation samples, the prediction of the FCNET model is better than that of the BPNET model. This implies that the FCNET model's prediction capability is comparable to that of the BPNET model in practice.

3. The FCNET metamodel attains its optimal weight assignment at training step t = 6,925 while the BPNET metamodel attains its optimal weight assignment at training step t = 19,813. This shows the potential computational savings of the FCNET model. A comparison of the learning performance is shown in Fig. 10.
Fig. 10. Estimation errors for back-propagation vs. the fuzzy controller. [Figure: mse versus epoch (0 to 100,000) for the FCNET and BPNET runs.]
CONCLUDING REMARKS
In this study, we have investigated two different metamodel estimation algorithms based on neural networks, namely BPNET and FCNET. We have applied these algorithms to the prediction of certain performance measures of a textile spinning operation based on the main input parameters for that operation. According to our experience, both metamodel estimation algorithms are capable of providing good predictions.

The results of using a fuzzy controller for the learning rate suggest a significant potential for speeding up training. For future studies, we will consider incorporating fuzzy controllers for other parameters including the learning rate of the hidden layer, the rate of the activation function, and the rate of the momentum method. We also will investigate the possibility of using a separate neural network to specify the fuzzy membership of each control parameter.
BIBLIOGRAPHY
Clarke, L. A. M. & King, R. E. (1993). PD-based inventory control for a textile spinning plant. Technical Report #93-13, Department of Industrial Engineering, North Carolina State University, Raleigh, NC.
Freeman, J. A. & Skapura, D. M. (1991). Neural Networks: Algorithms, Applications, and Programming Techniques. Reading, MA: Addison-Wesley.
Gallant, S. I. (1993). Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press.
Hertz, D. B. & Hu, Q. (1992). Fuzzy-neuron controller for backpropagation networks. In Proceedings of the Third Workshop on Neural Networks (pp. 474-478).
Hunter, N. A., King, R. E., Nuttle, H. L. W., & Wilson, J. R. (1993). North Carolina apparel pipeline modeling project. International Journal of Clothing Science and Technology, Vol. 5, No. 3/4, pp. 19-24.
Kosko, B. (1992). Neural Networks and Fuzzy Systems. Englewood Cliffs, NJ: Prentice Hall.
Lord, P. R. (1974). Spinning conversion of fiber to yarn. Unpublished Master's thesis, School of Textiles, North Carolina State University, Raleigh, NC.
Moncarz, H. T. (1992). Information Technology Vision for the U.S. Fiber/Textile/Apparel Industry. Internal Report NIST-IR 4986, National Institute of Standards and Technology, U.S. Department of Commerce, Washington, D.C.
Montgomery, D. C. (1991). Design and Analysis of Experiments, Second Edition. New York: John Wiley & Sons.
Pegden, C. D., Shannon, R. E., & Sadowski, R. P. (1995). Introduction to Simulation Using SIMAN, Second Edition. New York: McGraw-Hill.
Powell, K. A. (1992). Interactive decision support modeling for the textile spinning industry. Unpublished Master's thesis, Department of Industrial Engineering, North Carolina State University, Raleigh, NC.
Rohlena, V. (1975). Open-End Spinning. Amsterdam, NY: Elsevier Science Ltd.
Smith, M. (1993). Neural Networks for Statistical Modeling. New York: Van Nostrand Reinhold.
Tveter, D. (1991). Getting a fast break with backprop. AI Expert, Vol. 6, pp. 36-43.
Wu, P., Fang, S. C., Nuttle, H. L. W., King, R. E., & Wilson, J. R. (1994). Decision surface modeling of textile spinning operations using neural network technology. Technical Report #94-1, Department of Industrial Engineering, North Carolina State University, Raleigh, NC.
Zimmermann, H. J. (1988). Fuzzy Set Theory and Its Applications. Norwell, MA: Kluwer-Nijhoff Publishing.
Zurada, J. M. (1992). Introduction to Artificial Neural Systems. St. Paul, MN: West Publishing Company.