A STUDY TO IMPROVE A LEARNING ALGORITHM
OF NEURAL NETWORKS
Cong Huu Nguyen*1, Thanh Nga Thi Nguyen2, Ngoc Van Dong3
1Thai Nguyen University, 2College of Technology – TNU, 3Ha Noi Vocational College of Electrical Mechanical
ABSTRACT
Since the middle of the twentieth century, the study of optimization algorithms, especially with the development of digital computers, has increasingly become an important branch of mathematics. Nowadays, these mathematical tools are applied in practice to neural network training. In the search for an optimal algorithm that minimizes the convergence time of the solution or avoids weak minima and local minima, the starting point is the study of the characteristics of the error surface. For a complex error surface such as a cleft error surface, whose contours are stretched and bent to form a cleft and a cleft axis, the old algorithms cannot cope. This paper proposes an algorithm to improve the convergence of the solution and the ability to escape from undesired areas on the error surface.
Keywords: neural networks, special error surface, local minima, optimization, algorithms
BACKGROUND*
In the search for an optimal algorithm that minimizes the convergence time of the solution or avoids weak minima and local minima, the starting point is the study of the characteristics of the error surface, which is then used as the basis for improving or proposing a new training algorithm. When discussing neural networks, the quality of the trained network is usually assessed (supervised learning). This leads to a quality function and, in turn, to the concept of the network quality surface. The quality surface is also known by other names: the error surface or the performance surface. Figure 1 shows an error surface. Several special features of this surface should be noted, for example that its slope changes drastically over the parameter space. For this reason, it is difficult to choose an appropriate learning step for algorithms such as steepest descent or conjugate gradient. Some areas of the error surface are very flat and allow a large learning rate, while other regions have steep slopes and require a small learning rate. Other methods, such as momentum rules or the adaptive learning rate VLBP (Variable Learning Rate Backpropagation) algorithm, are not effective for this problem [5].

* Tel: 0913 589758, Email: conghn@tnu.edu.vn
Thus, complex quality surfaces make the search for the optimal weights more difficult, and if the quality surface has a cleft shape, the search can still be blocked at the cleft axis before reaching the minimum point. A possible strategy to solve this problem is the following: after reaching the neighbourhood of the cleft axis by a gradient method with a step calculated by line minimization (or with specified learning steps), we move along the bottom of the narrow cleft by following its asymptotic geometry, which is assumed to be a straight line or approximately a quadratic curve.
The objective of this paper is to study and apply the cleft-overstep algorithm to calculate the learning step used to find the optimal weights of a neural network for solving the control problem.
CLEFT-OVERSTEP ALGORITHM FOR NEURAL NETWORK TRAINING
Cleft-overstep principle
Examine the unconstrained minimization problem:

J(u) → min, u ∈ E^n    (1)
where u is the minimizing vector in the n-dimensional Euclidean space and J(u) is the target function, which satisfies

lim_{u→∞} J(u) = b    (2)
The optimization algorithm for problem (1) has the following iteration equation:

u^{k+1} = u^k + α_k s^k,  k = 0, 1, …    (3)

where u^k and u^{k+1} are the starting point and the ending point of the k-th iteration step, s^k is the vector indicating the direction of change of the variables in the n-dimensional space, and α_k is the step length. α_k is determined according to the cleft-overstep principle and is called the "cleft-overstep" step; equation (3) is then called the cleft-overstep algorithm.
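As an illustration only (not the authors' code), the following Python sketch shows iteration (3) with the steepest-descent direction s^k = −∇J(u^k); the routine cleft_overstep_step, which returns the step length α_k, is assumed here and is sketched later in the part on determining the cleft-overstep step.

```python
import numpy as np

def cleft_overstep_train(J, grad_J, u0, cleft_overstep_step,
                         max_iter=1000, tol=1e-6):
    """Minimal sketch of the cleft-overstep iteration u^{k+1} = u^k + alpha_k * s^k.

    J                   -- objective (error) function, J(u) -> float
    grad_J              -- gradient of J, grad_J(u) -> ndarray
    u0                  -- initial weight vector
    cleft_overstep_step -- assumed routine returning the overstep length alpha_k
                           for the point u and direction s
    """
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        g = grad_J(u)
        if np.linalg.norm(g) < tol:                   # stop when the gradient is small
            break
        s = -g                                        # search direction: steepest descent
        alpha = cleft_overstep_step(J, grad_J, u, s)  # cleft-overstep step length
        u = u + alpha * s                             # iteration (3)
    return u
```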
The basic difference between the cleft-overstep method and other methods lies in the principle of step adjustment. According to this principle, the step length of the searching point at each iteration is not smaller than the smallest step length at which the target function reaches its (local) minimum value in the moving direction of that iteration. The searching trajectory of the cleft-overstep principle thus creates a geometric picture in which the searching point "oversteps" the cleft bottom at each iteration step. To specify the cleft-overstep principle, we examine a function of one variable at each iteration step [4]:
h(α) = J(u^k + α·s^k)    (4)
Suppose that s^k is the descent direction of the target function at the point u^k. According to condition (2), there is a smallest value α* > 0 at which h(α) reaches its minimum:

α* = arg min_{α>0} h(α)    (5)
If J(u^k), and hence h(α), is continuously differentiable, we can define the cleft-overstep step as follows:

h′(α)|_{α=α_v} > 0,  h(α_v) ≤ h(0)    (6)

(α_v is the overstep step, meaning that it oversteps the cleft.)
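For intuition, a small worked example of conditions (5) and (6) (ours, not taken from the paper): along one direction let

h(α) = (1 − 2α)^2, so h′(α) = 8α − 4 and α* = arg min_{α>0} h(α) = 0.5.

Any α_v in the interval (0.5, 1] satisfies condition (6), since h′(α_v) > 0 and h(α_v) ≤ h(0) = 1; such a step lands on the far slope of the one-dimensional valley instead of stopping at its bottom.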
The variation of the function h(α) as the optimization trajectory moves from the starting point u^k to the ending point u^{k+1} is illustrated in figure 2. We can see that as the value of α increases from 0, passes through the minimum point α* of h(α) and reaches the value α_v, the corresponding optimization trajectory moves parallel to s^k according to relation (3) and takes a step of length α = α_v ≥ α*. The graph also shows that, along the moving direction, the target function decreases from the point u^k, but at the point u^{k+1} it has switched to increasing.
If we used moving steps according to condition (5), we might be trapped at the cleft axis, and the corresponding optimization algorithm would also be trapped at that point. If, instead, the optimization process follows condition (6), the searching point is not allowed to settle at the cleft bottom before the optimal solution is obtained and, at the same time, it always traces a trajectory that oversteps the cleft bottom. In order to obtain an effective and stable iteration process, condition (6) is replaced by condition (7):

α_v ≥ α* = arg min_{α>0} h(α),  h(α_v) − h* ≤ λ·[h^0 − h*]    (7)

where 0 < λ < 1 is called the overstep coefficient, h* = h(α*) and h^0 = h(α^0).
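As a minimal sketch (our naming, not the paper's code), condition (7) can be checked once h(α^0), h(α*) and h(α_v) are known:

```python
def satisfies_condition_7(alpha_v, alpha_star, h_v, h_star, h_0, lam=0.1):
    """Cleft-overstep acceptance condition (7).

    alpha_v          -- candidate overstep step
    alpha_star       -- minimizer of h(alpha) along the search direction
    h_v, h_star, h_0 -- h(alpha_v), h(alpha*), h(alpha^0)
    lam              -- overstep coefficient lambda, 0 < lambda < 1 (value assumed here)
    """
    return alpha_v >= alpha_star and (h_v - h_star) <= lam * (h_0 - h_star)
```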
Figure 1: Cleft-similar error surface

Determining the cleft-overstep step
Figure 2: Determining the cleft-overstep step
Choosing the length of the learning step in the cleft problem is of great importance. If this length is too short, the computing time will be long. If it is too long, there may be difficulties in the searching process, because it becomes difficult to follow the curvature of the cleft. Therefore, an adaptive learning step for the cleft problem is essential in the search for the optimal solution. In this section, we propose a simple yet effective way to find the cleft-overstep step. Suppose that J(u) is continuous and satisfies the condition J(u) → ∞ when u → ∞, and that at iteration k the point u^{k−1} and the moving vector s^{k−1} have been determined. We need to determine the step length α_k which satisfies condition (7).
If, instead of h* in (7), we use an estimate that is approximately equal to h* but slightly larger than it, we still obtain a cleft-overstep step by definition. Therefore, to simplify the programming, we take the smallest value of h computed in each iteration, without determining h* accurately. This also significantly reduces the number of evaluations of the objective function. We arrive at the following algorithm, in which overstepping the cleft bottom is detected by the sign condition

h′_k(α_k)·h′_k(0) = [J′(u^{k−1})]^T s^{k−1} · [J′(u^k)]^T s^{k−1} < 0    (8)

that is, the directional derivative of J along s^{k−1} changes sign between u^{k−1} and u^k.
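The following Python sketch illustrates one possible implementation of this step search. The initial step, the expansion and contraction factors and the control flow are our assumptions and only approximate the procedure of Figure 3, but the accept test follows conditions (6) and (8): the step is accepted when the directional derivative has become positive while the function value is still no larger than at the starting point.

```python
import numpy as np

def cleft_overstep_step(J, grad_J, u, s, alpha0=0.5, grow=1.5, shrink=0.5,
                        max_iter=60):
    """Hedged sketch of a cleft-overstep step search (cf. conditions (6) and (8))."""
    h  = lambda a: J(u + a * s)                          # h(alpha) = J(u + alpha*s)
    dh = lambda a: float(np.dot(grad_J(u + a * s), s))   # h'(alpha)
    h0 = h(0.0)
    alpha = alpha0
    for _ in range(max_iter):
        if dh(alpha) > 0.0:          # past the one-dimensional minimum ...
            if h(alpha) <= h0:       # ... and not above the starting value
                return alpha         # -> valid overstep step alpha_v
            alpha *= shrink          # overshot too far: pull the step back
        else:
            alpha *= grow            # still descending: push the step further
    return alpha                     # fallback after max_iter trials
```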
Figure 3: Diagram of the algorithm that determines the cleft-overstep learning step

PROGRAM AND RESULTS
To illustrate the above remarks, we present a neural network trained with the backpropagation procedure in which the learning step is calculated according to the cleft-overstep principle. In the example, for a given input vector the network has to answer which character it represents. The program builds a network of 35 input neurons, 5 middle (hidden) layer neurons and 10 output layer neurons. The sigmoid activation function is used; a characteristic of this function is that it easily produces a cleft error surface [1].
The network training algorithm follows the backpropagation procedure combined with the learning step calculated by the cleft-overstep principle. The cleft-overstep algorithm has been presented in section 2.
Figure 4: Structure of the neural network for recognition

Thus, to use the steepest descent method to update the weights of the network, we need the partial derivative of the error function with respect to each weight, that is, we must determine the update formulas for the weights of the hidden layer and of the output layer. For a given sample set, the derivative of the error function is calculated by summing the derivatives over the samples of that set. The analysis of the derivatives is based on the chain rule. The slope of the tangent to the error curve in the cross-section along the w axis is the partial derivative of the error function J with respect to that weight, denoted ∂J/∂w; using the chain rule we have:

∂J/∂w = (∂J/∂w_1)·(∂w_1/∂w_2) ⋯ (∂w_n/∂w)

Adjust the weights of the output layer:
Define: b: a weight of the output layer; z: the output of the output layer; t: the desired target value; y_j: the output of neuron j in the hidden layer.
The net input (weighted sum) of the output neuron is

v = Σ_{j=0}^{M−1} b_j y_j, so ∂v/∂b_j = y_j

(ignoring the index of the neuron in the output layer).
We use J = 0.5·(z − t)^2, so ∂J/∂z = (z − t). The activation function of the output-layer neuron is the sigmoid z = g(v), with ∂z/∂v = z·(1 − z).
We have:

∂J/∂b = (∂J/∂z)·(∂z/∂v)·(∂v/∂b) = (z − t)·z·(1 − z)·y    (9)

From this, the update formula for the output-layer weights follows (ignoring the indices):

Δb = α·(z − t)·z·(1 − z)·y    (10)
We use formula (10) [4] in the procedure DIEUCHINHTRONGSO() to adjust the weights of the output layer; the learning rate α is calculated according to the cleft-overstep principle.
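As a hedged sketch (our vectorised notation, not the authors' DIEUCHINHTRONGSO() code), the output-layer correction (10) for one training sample can be written as:

```python
import numpy as np

def update_output_weights(b, y, z, t, alpha):
    """Output-layer update based on formula (10): delta_b = alpha*(z - t)*z*(1 - z)*y.

    b     -- output-layer weight matrix, shape (n_hidden, n_output)
    y     -- hidden-layer output vector, shape (n_hidden,)
    z, t  -- network outputs and desired targets, shape (n_output,)
    alpha -- learning step (the cleft-overstep step)
    """
    delta = (z - t) * z * (1.0 - z)           # error term of each output neuron
    return b - alpha * np.outer(y, delta)     # subtract the correction to descend J
```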
Adjust the weights of the hidden layer:
The derivative of the objective function with respect to a weight of the hidden layer is calculated by the chain rule:

∂J/∂a = (∂J/∂y)·(∂y/∂u)·(∂u/∂a)

Define: a: a weight of the hidden layer; y: the output of the neuron in the hidden layer; x_i: the components of the input vector of the input layer; u: the net input (weighted sum) of the hidden neuron,

u = Σ_{i=0}^{N−1} a_i x_i, so ∂u/∂a_i = x_i;

k: the index of a neuron in the output layer.
The derivative of the objective function with respect to a weight of the hidden layer is then:

∂J/∂a_i = Σ_{k=0}^{K−1} (z_k − t_k)·z_k·(1 − z_k)·b_k·y·(1 − y)·x_i    (11)

From this, the update formula for the hidden-layer weights follows:

Δa_i = α·Σ_{k=0}^{K−1} (z_k − t_k)·z_k·(1 − z_k)·b_k·y·(1 − y)·x_i    (12)
In these formulas, the index i denotes the i-th neuron of the input layer and the index k denotes the k-th neuron of the output layer.
We use formula (12) [4] in the procedure DIEUCHINHTRONGSO() to adjust the weights of the hidden layer; the learning rate α is calculated according to the cleft-overstep principle.
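Similarly, a minimal sketch (ours) of the hidden-layer update (12) for one sample and one hidden neuron:

```python
import numpy as np

def update_hidden_weights(a, b_row, x, y, z, t, alpha):
    """Hidden-layer update based on formula (12).

    a     -- input weights of one hidden neuron, shape (n_input,)
    b_row -- weights from this hidden neuron to all outputs, shape (n_output,)
    x     -- input vector, shape (n_input,)
    y     -- output of this hidden neuron (scalar)
    z, t  -- network outputs and desired targets, shape (n_output,)
    alpha -- learning step (the cleft-overstep step)
    """
    back = np.sum((z - t) * z * (1.0 - z) * b_row)   # sum over output neurons k
    grad = back * y * (1.0 - y) * x                  # formula (11): dJ/da_i for all i
    return a - alpha * grad                          # subtract the correction (12)
```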
The network structure
The sigmoid activation function is used; it tends to produce a narrow cleft in the network quality surface:

f(x) = 1 / (1 + exp(−x))    (13)
Example
The characters to be recognized are the digits 0, 1, …, 9 [1]. We compare the convergence of three learning-step methods: the cleft-overstep principle, a fixed step and a gradually descending step.
We use a 5 × 7 = 35 matrix to encode each character. Corresponding to each character, the input vector x has size 35 × 1, with components taking the value 0 or 1. Thus, the input layer has 35 inputs. To distinguish the ten characters, the output layer has 10 neurons. For the hidden layer, five neurons are selected, so:
Hidden layer weight matrix W1,1, size 35 × 5
Output layer weight matrix W2,1, size 5 × 10
Input vector x, size 35 × 1
Hidden layer output vector y, size 5 × 1
Output layer output vector z, size 10 × 1
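A small shape check of this structure (our sketch; the random initialisation and variable names are assumptions, not the authors' program):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))     # activation function (13)

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, size=(35, 5))    # hidden-layer weight matrix, 35 x 5
W2 = rng.uniform(-0.5, 0.5, size=(5, 10))    # output-layer weight matrix, 5 x 10

x = np.zeros(35)            # one encoded character: 0/1 pixels of the 5 x 7 grid
y = sigmoid(x @ W1)         # hidden-layer output vector, size 5
z = sigmoid(y @ W2)         # output vector, size 10 (one neuron per digit 0..9)
print(y.shape, z.shape)     # (5,) (10,)
```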
After compiling with Visual C++ and running the program, the network was trained in turn by the three methods: fixed learning step, gradually descending step and cleft-overstep. Each method was run 20 times. The results are given in the following table:
Table 1: Input sample file: records {0 1 2 3 4 5 6 7 8 9}. Entries are the number of training iterations to convergence ("Fail" = no convergence).

No      | Fixed step 0.2                        | Gradually descending step (from 1)   | Cleft-overstep
--------|---------------------------------------|--------------------------------------|-----------------------------------
 1      | Fail                                  | Fail                                 | Fail
 2      | 7902                                  | 3634                                 | 23
 3      | 7213                                  | 2416                                 | 50
 4      | Fail                                  | 2908                                 | 34
 5      | 12570                                 | 2748                                 | 31
 6      | 9709                                  | 3169                                 | 42
 7      | Fail                                  | 2315                                 | 43
 8      | 9173                                  | 2375                                 | 33
 9      | Fail                                  | Fail                                 | 34
10      | 8410                                  | 2820                                 | 33
11      | 10333                                 | 2618                                 | 32
12      | 12467                                 | 2327                                 | 39
13      | Fail                                  | 3238                                 | 44
14      | 9631                                  | 2653                                 | Fail
15      | 12930                                 | 2652                                 | 31
16      | 10607                                 | Fail                                 | 53
17      | Fail                                  | 2792                                 | 31
18      | 7965                                  | 2322                                 | 42
19      | 11139                                 | 2913                                 | 42
20      | Fail                                  | 2689                                 | 33
Summary | Average: 10003 iterations, 7 Fail/20  | Average: 2740 iterations, 3 Fail/20  | Average: 35 iterations, 2 Fail/20
Comments:
We have trained the network by the three different methods and found that learning by the cleft-overstep principle has a much higher convergence speed, and the number of failures is also reduced.
One drawback of the cleft-overstep principle is that the computation time of each iteration is long; this is because the constant FD = 1e-4 that we defined is small. However, the total network training time is still more favourable.
CONCLUSION
In this paper, the authors have successfully proposed the use of the "cleft-overstep" algorithm to improve the training of neural networks having a special error surface, and have illustrated it concretely through an application to handwriting recognition. The results obtained through research and experimentation show that, for a neural network structure whose error surface has the shape of a deep cleft, using the backpropagation algorithm together with the "cleft-overstep" principle to train the network gives higher accuracy and faster convergence than the gradient method.
The "cleft-overstep" algorithm can be applied to train any neural network structure that has such a special error surface. Thus, the results of this study can be applied to many other problems in the fields of telecommunications, control and information technology.
Further research is needed on the identification of the search direction vector in the "cleft-overstep" algorithm and on changing the assessment criterion of the quality function in order to reduce the complexity of the computation [6]. However, the results of this study have initially confirmed the correctness of the proposed algorithm and revealed possibilities for practical applications.
REFERENCES
[1]. Cong Huu Nguyen, Thanh Nga Thi Nguyen, Phuong Huy Nguyen (2011), "Research on the application of genetic algorithm combined with the 'cleft-overstep' algorithm for improving learning process of MLP neural network with special error surface", Natural Computation (ICNC), 2011 Seventh International Conference on, 26-28 July 2011, pp. 222-227.
[2]. Maciej Lawrynczuk (2010), "Training of neural models for predictive control", Institute of Control and Computation Engineering, Faculty of Electronics and Information Technology, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland, Neurocomputing 73.
[3]. Thuc Nguyen Dinh & Hai Hoang Duc, Artificial Intelligence – Neural Network, Method and Application, Educational Publisher, Ha Noi.
[4]. Nguyen Van Manh and Bui Minh Tri, "Method of 'cleft-overstep' by perpendicular direction for solving the unconstrained nonlinear optimization problem", Acta Mathematica Vietnamica, vol. 15, no. 2, 1990.
[5]. Hagan, M.T., H.B. Demuth and M.H. Beale, Neural Networks Design, PWS Publishing Company, Boston, 1996.
[6]. R.K. Al Seyab, Y. Cao (2007), "Nonlinear system identification for predictive control using continuous time recurrent neural networks and automatic differentiation", School of Engineering, Cranfield University, College Road, Cranfield, Bedford MK43 0AL, UK, Science Direct.
SUMMARY
A STUDY TO IMPROVE THE LEARNING ALGORITHM OF NEURAL NETWORKS
Nguyễn Hữu Công*1, Nguyễn Thị Thanh Nga2, Đồng Văn Ngọc3
1Đại học Thái Nguyên, 2Trường Đại học Kỹ thuật Công nghiệp – ĐH Thái Nguyên, 3Trường Cao đẳng nghề Cơ điện Hà Nội
Since the middle of the twentieth century, research on optimization algorithms, especially with the development of digital computers, has increasingly become an important field of mathematics. Nowadays, these mathematical tools are applied to train neural networks. The search for an optimal algorithm that minimizes the convergence time or avoids weak minima and local minima starts from studying the characteristics of the error surface. For complex error surfaces such as the cleft error surface, whose contours are stretched and bent to form a cleft and a cleft axis, the old algorithms cannot cope. This paper proposes an algorithm to improve the convergence and the ability to escape from undesired areas on special error surfaces.
Keywords: neural networks, special error surface, local minima, optimization, learning algorithms
Received: ; reviewed: ; accepted for publication: