Journal of Mathematical Research with Applications
Jan., 2014, Vol. 34, No. 1, pp. 114–126
DOI:10.3770/j.issn:2095-2651.2014.01.012
Http://jmre.dlut.edu.cn

A Modified Gradient-Based Neuro-Fuzzy Learning Algorithm for Pi-Sigma Network Based on First-Order Takagi-Sugeno System

Yan LIU1,2, Jie YANG1, Dakun YANG1, Wei WU1,*

1. School of Mathematical Sciences, Dalian University of Technology, Liaoning 116024, P. R. China;
2. School of Information Science and Engineering, Dalian Polytechnic University, Liaoning 116034, P. R. China

Received July 19, 2012; Accepted November 25, 2012
Supported by the Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China (Grant No. 11171367) and the Youth Foundation of Dalian Polytechnic University (Grant No. QNJJ 201308).
* Corresponding author. E-mail address: wuweiw@dlut.edu.cn (Wei WU)

Abstract  This paper presents a Pi-Sigma network that identifies a first-order Takagi-Sugeno (T-S) fuzzy inference system and proposes a simplified gradient-based neuro-fuzzy learning algorithm. A comprehensive study of the weak and strong convergence of the learning method is given, showing that the sequence of error function values converges to a fixed value and that the gradient of the error function tends to zero, respectively.

Keywords  first-order Takagi-Sugeno inference system; Pi-Sigma network; convergence.

MR(2010) Subject Classification  47S40; 93C42; 82C32

1. Introduction

The combination of neural networks and fuzzy set theory has shown special promise and has become an active research topic in recent years. The Takagi-Sugeno (T-S) fuzzy model, proposed by Takagi and Sugeno [1], is characterized by a set of IF-THEN rules. The Pi-Sigma network (PSN) [2] is a class of high-order feedforward network and is known to provide inherently more powerful mapping abilities than traditional feedforward neural networks. A hybrid Pi-Sigma network was introduced by Jin [3], which is capable of dealing with nonlinear systems more efficiently. Combining the benefits of high-order networks and the Takagi-Sugeno inference system gives the Pi-Sigma network a simple structure, fewer training epochs and fast computational speed [4].

Despite numerous works on the identification and stability analysis of T-S systems, fewer studies have addressed the convergence of the learning process. In this paper, a comprehensive study of the convergence of the Pi-Sigma network based on the first-order T-S system is presented. In particular, the monotonicity of the error function during the learning iteration is proven. Both weak and strong convergence results are obtained, indicating that the gradient of the error function goes to zero and the weight sequence goes to a fixed point, respectively.

The remainder of this paper is organized as follows. A brief introduction to the first-order Takagi-Sugeno system and the Pi-Sigma neural network is given in the next section. Section 3 describes our modified neuro-fuzzy learning algorithm based on the gradient method. The main convergence results are provided in Section 4. Section 5 presents the proof of the convergence theorem. Some brief conclusions are drawn in Section 6.

2. First-order Takagi-Sugeno inference system and high-order network

2.1. First-order Takagi-Sugeno inference system
A general fuzzy system of T-S type is comprised of a set of IF-THEN fuzzy rules of the following form:

$$R_i:\ \text{If } x_1 \text{ is } A_{1i} \text{ and } x_2 \text{ is } A_{2i} \text{ and } \ldots \text{ and } x_m \text{ is } A_{mi}, \text{ then } y_i = f_i(\cdot), \qquad (1)$$

where $R_i$ ($i = 1, 2, \ldots, n$) denotes the $i$-th implication, $n$ is the number of fuzzy implications of the fuzzy model, $x_1, \ldots, x_m$ are the premise variables, $f_i(\cdot)$ is the consequence of the $i$-th implication, which is a nonlinear or linear function of the premises, and $A_{li}$ is the fuzzy subset whose membership function is a continuous piecewise-polynomial function. In the first-order Takagi-Sugeno system (T-S1), the output function $f_i(\cdot)$ is a first-order polynomial of the input variables $x_1, \ldots, x_m$, and the corresponding output $y_i$ is determined by [5, 6]

$$y_i = p_{0i} + p_{1i}x_1 + \cdots + p_{mi}x_m. \qquad (2)$$

Given an input $x = (x_1, x_2, \ldots, x_m)$, the final output of the fuzzy model is expressed by

$$y = \sum_{i=1}^{n} h_i y_i, \qquad (3)$$

where $h_i$ is the overall truth value of the premises of the $i$-th implication, calculated as

$$h_i = A_{1i}(x_1)A_{2i}(x_2)\cdots A_{mi}(x_m) = \prod_{l=1}^{m} A_{li}(x_l). \qquad (4)$$

We mention that there is another form of (3), namely [7],

$$y = \Big(\sum_{i=1}^{n} h_i y_i\Big)\Big/\Big(\sum_{i=1}^{n} h_i\Big). \qquad (5)$$

For simplicity of learning, a common strategy is to obtain the fuzzy consequence without computing the center of gravity [8]. Therefore, we adopt the form (3) throughout our discussion.

2.2. Pi-Sigma neural network

The conventional feedforward neural network has only summation nodes, which makes it difficult to identify some complex problems. A hybrid Pi-Sigma neural network is shown in Figure 1. In this high-order network, $\Sigma$ denotes the summing neurons and $\Pi$ denotes the product neurons.

[Figure 1  Topological structure of the first-order Takagi-Sugeno inference system]

The output of the Pi-Sigma neural network is

$$y = \sum_{i=1}^{n} h_i y_i = \sum_{i=1}^{n}\Big(\prod_{l=1}^{m}\mu_{A_{li}}(x_l)\Big)\big(p_{0i} + p_{1i}x_1 + \cdots + p_{mi}x_m\big) = hPx, \qquad (6)$$

where "$\cdot$" denotes the usual inner product, $p_i = (p_{0i}, p_{1i}, \ldots, p_{mi})^T$, $i = 1, 2, \ldots, n$, $h = (h_1, h_2, \ldots, h_n)$, $P = (p_1^T, p_2^T, \ldots, p_n^T)^T$, and $x = (1, x_1, x_2, \ldots, x_m)^T$.

From (6), it can be seen that this Pi-Sigma network is one form of T-S type fuzzy system. For a fuzzy system implemented by this type of network, the degrees of membership and the parameters can be updated indirectly. For more efficient identification of nonlinear systems, a Gaussian membership function is commonly used for the fuzzy judgment "$x_l$ is $A_{li}$", defined by

$$A_{li}(x_l) = \exp\big(-(x_l - a_{li})^2/\sigma_{li}^2\big) = \exp\big(-(x_l - a_{li})^2 b_{li}^2\big), \qquad (7)$$

where $a_{li}$ is the center of $A_{li}(x_l)$, $\sigma_{li}$ is the width of $A_{li}(x_l)$, and $b_{li}$ is the reciprocal of $\sigma_{li}$, $i = 1, 2, \ldots, n$, $l = 1, 2, \ldots, m$. What we need is to preset some initial weights, which are then updated toward their optimal values when a learning algorithm is implemented.
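To make the computations in (2), (4), (6) and (7) concrete, the following is a minimal NumPy sketch of the forward pass. This illustration is ours, not the paper's; the array shapes, function name and toy numbers are assumptions made only for the example.

```python
import numpy as np

def ts1_pi_sigma_output(x, p, a, b):
    """Output (6) of the Pi-Sigma realization of a first-order T-S system.

    x : (m,)      input vector
    p : (n, m+1)  consequent parameters, row i = (p_0i, p_1i, ..., p_mi)
    a : (n, m)    centers of the Gaussian memberships in (7)
    b : (n, m)    reciprocals of the widths of the Gaussian memberships in (7)
    """
    x_aug = np.concatenate(([1.0], x))                  # x = (1, x_1, ..., x_m)^T
    h = np.exp(-np.sum((x - a) ** 2 * b ** 2, axis=1))  # truth values h_i, eqs. (4), (7)
    y_rules = p @ x_aug                                 # rule outputs y_i, eq. (2)
    return float(h @ y_rules)                           # y = h P x, eqs. (3), (6)

# toy usage: m = 2 inputs, n = 3 rules, arbitrary parameter values
rng = np.random.default_rng(0)
p = rng.normal(size=(3, 3))
a = rng.normal(size=(3, 2))
b = np.ones((3, 2))
print(ts1_pi_sigma_output(np.array([0.5, -1.0]), p, a, b))
```

Note that the product of Gaussian memberships collapses into a single exponential of a sum, which is exactly the identity used later in (13).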
3. Modified gradient-based neuro-fuzzy learning algorithm

Let us introduce an operator "$\odot$" for the description of the learning method.

Definition 3.1 ([8])  Let $u = (u_1, u_2, \ldots, u_n)^T \in \mathbb{R}^n$ and $v = (v_1, v_2, \ldots, v_n)^T \in \mathbb{R}^n$. Define the operator "$\odot$" by $u \odot v = (u_1v_1, u_2v_2, \ldots, u_nv_n)^T \in \mathbb{R}^n$.

It is easy to verify the following properties of the operator "$\odot$":
1) $\|u \odot v\| \le \|u\|\|v\|$;
2) $(u \odot v)\cdot(x \odot y) = (u \odot v \odot x)\cdot y$;
3) $(u + v) \odot x = u \odot x + v \odot x$,
where $u, v, x, y \in \mathbb{R}^n$, and "$\cdot$" and "$\|\cdot\|$" represent the usual inner product and the Euclidean norm, respectively.

Suppose that the training sample set is $\{x^j, O^j\}_{j=1}^{J} \subset \mathbb{R}^m \times \mathbb{R}$ for the Pi-Sigma network, where $x^j$ and $O^j$ are the input and the corresponding ideal output of the $j$-th sample, respectively. The fuzzy rules are provided by (1). We denote the weight vector connecting the input layer and the $\Sigma$ layer by $p_i = (p_{0i}, p_{1i}, \ldots, p_{mi})^T$ ($i = 1, 2, \ldots, n$) (see Figure 1). Similarly, we denote the centers and the reciprocals of the widths of the corresponding Gaussian membership functions by

$$a_i = (a_{1i}, a_{2i}, \ldots, a_{mi})^T, \quad b_i = (b_{1i}, b_{2i}, \ldots, b_{mi})^T = \Big(\frac{1}{\sigma_{1i}}, \frac{1}{\sigma_{2i}}, \ldots, \frac{1}{\sigma_{mi}}\Big)^T, \quad 1 \le i \le n, \qquad (8)$$

respectively, and take them as the weight vectors connecting the input layer and the membership layer. For simplicity, all parameters are incorporated into a weight vector

$$W = \big(p_1^T, p_2^T, \ldots, p_n^T, a_1^T, \ldots, a_n^T, b_1^T, \ldots, b_n^T\big)^T. \qquad (9)$$

The error function is defined as

$$E(W) = \frac{1}{2}\sum_{j=1}^{J}(y^j - O^j)^2 = \sum_{j=1}^{J} g_j\Big(\sum_{i=1}^{n} h_i^j (p_i \cdot x^j)\Big) = \sum_{j=1}^{J} g_j(h^j P x^j), \qquad (10)$$

where $O^j$ is the desired output for the $j$-th training pattern $x^j$, $y^j$ is the corresponding fuzzy reasoning result, $J$ is the number of training patterns, and

$$h^j = (h_1^j, h_2^j, \ldots, h_n^j) = h(x^j), \quad g_j(t) = \frac{1}{2}(t - O^j)^2, \quad t \in \mathbb{R}, \ 1 \le j \le J. \qquad (11)$$

The purpose of the network learning is to find $W^*$ such that

$$E(W^*) = \min E(W). \qquad (12)$$

The gradient descent method is often used to solve this optimization problem.

Remark 3.1 ([8])  Due to the simplification of adopting (3) rather than (5), the differentiation with respect to the denominator is avoided, and the cost of calculating the gradient of the error function is reduced.

Let us describe our modified gradient-based neuro-fuzzy learning algorithm. By (7) and (8), we have

$$h_q^j = \prod_{l=1}^{m} A_{lq}(x_l^j) = \prod_{l=1}^{m}\exp\big(-(x_l^j - a_{lq})^2 b_{lq}^2\big) = \exp\Big(\sum_{l=1}^{m}\big(-(x_l^j - a_{lq})^2 b_{lq}^2\big)\Big). \qquad (13)$$

The gradient of the error function $E(W)$ with respect to $p_i$ is given by

$$\frac{\partial E(W)}{\partial p_i} = \sum_{j=1}^{J} g_j'(h^j P x^j)\, h_i^j x^j. \qquad (14)$$

To compute the partial gradient $\partial E(W)/\partial a_i$, we note that, for all $1 \le i \le n$ and $1 \le q \le n$,

$$\frac{\partial h_q^j}{\partial a_i} = \frac{\partial \exp\big(\sum_{l=1}^{m}(-(x_l^j - a_{lq})^2 b_{lq}^2)\big)}{\partial a_i} = \begin{cases}\dfrac{\partial \exp\big(\sum_{l=1}^{m}(-(x_l^j - a_{li})^2 b_{li}^2)\big)}{\partial a_i}, & q = i,\\[2mm] 0, & q \ne i,\end{cases} \qquad (15)$$

and

$$\frac{\partial \exp\big(\sum_{l=1}^{m}(-(x_l^j - a_{li})^2 b_{li}^2)\big)}{\partial a_i} = \big(2h_i^j(x_1^j - a_{1i})b_{1i}^2, \ldots, 2h_i^j(x_m^j - a_{mi})b_{mi}^2\big)^T = 2h_i^j\big((x^j - a_i)\odot b_i \odot b_i\big). \qquad (16)$$

It follows from (10), (15) and (16) that, for $1 \le i \le n$, the partial gradient of the error function $E(W)$ with respect to $a_i$ is

$$\frac{\partial E(W)}{\partial a_i} = \sum_{j=1}^{J} g_j'(h^j P x^j)\Big(\sum_{q=1}^{n}(p_q \cdot x^j)\frac{\partial h_q^j}{\partial a_i}\Big) = 2\sum_{j=1}^{J} g_j'(h^j P x^j)(p_i \cdot x^j)h_i^j\big((x^j - a_i)\odot b_i \odot b_i\big). \qquad (17)$$

Similarly, for $1 \le i \le n$, the partial gradient of the error function $E(W)$ with respect to $b_i$ is

$$\frac{\partial E(W)}{\partial b_i} = \sum_{j=1}^{J} g_j'(h^j P x^j)\Big(\sum_{q=1}^{n}(p_q \cdot x^j)\frac{\partial h_q^j}{\partial b_i}\Big) = -2\sum_{j=1}^{J} g_j'(h^j P x^j)(p_i \cdot x^j)h_i^j\big((x^j - a_i)\odot(x^j - a_i)\odot b_i\big). \qquad (18)$$

Combining (14), (17) and (18), the gradient of the error function $E(W)$ with respect to $W$ is constructed as

$$\frac{\partial E(W)}{\partial W} = \Big(\Big(\frac{\partial E(W)}{\partial p_1}\Big)^T, \ldots, \Big(\frac{\partial E(W)}{\partial p_n}\Big)^T, \Big(\frac{\partial E(W)}{\partial a_1}\Big)^T, \ldots, \Big(\frac{\partial E(W)}{\partial a_n}\Big)^T, \Big(\frac{\partial E(W)}{\partial b_1}\Big)^T, \ldots, \Big(\frac{\partial E(W)}{\partial b_n}\Big)^T\Big)^T. \qquad (19)$$
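As a concrete reading of (10), (14), (17) and (18), here is a hedged NumPy sketch that accumulates the batch error and its partial gradients over the training set. The function name, array layout and loop structure are our own assumptions, not the paper's.

```python
import numpy as np

def ts1_forward_and_grads(X, O, p, a, b):
    """Batch error E(W) of (10) and its partial gradients (14), (17), (18).

    X : (J, m) inputs;  O : (J,) desired outputs
    p : (n, m+1), a : (n, m), b : (n, m) as in the forward-pass sketch
    Returns E and (dE/dp, dE/da, dE/db) with the same shapes as p, a, b.
    """
    dp, da, db = np.zeros_like(p), np.zeros_like(a), np.zeros_like(b)
    E = 0.0
    for x, o in zip(X, O):
        x_aug = np.concatenate(([1.0], x))
        h = np.exp(-np.sum((x - a) ** 2 * b ** 2, axis=1))    # h_i^j, eq. (13)
        px = p @ x_aug                                        # p_i . x^j
        y = h @ px                                            # h^j P x^j
        g = y - o                                             # g_j'(h^j P x^j)
        E += 0.5 * g ** 2                                     # contribution to (10)
        dp += g * h[:, None] * x_aug[None, :]                 # eq. (14)
        da += 2.0 * g * (px * h)[:, None] * (x - a) * b ** 2  # eq. (17)
        db += -2.0 * g * (px * h)[:, None] * (x - a) ** 2 * b # eq. (18)
    return E, dp, da, db
```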
Presetting an arbitrary initial value $W^0$, the weights are updated in the following fashion by the modified neuro-fuzzy learning algorithm:

$$W^{k+1} = W^k + \Delta W^k, \quad k = 0, 1, 2, \ldots, \qquad (20)$$

where

$$\Delta W^k = \big((\Delta p_1^k)^T, \ldots, (\Delta p_n^k)^T, (\Delta a_1^k)^T, \ldots, (\Delta a_n^k)^T, (\Delta b_1^k)^T, \ldots, (\Delta b_n^k)^T\big)^T,$$

and

$$\Delta p_i^k = -\eta\frac{\partial E(W^k)}{\partial p_i}, \quad \Delta a_i^k = -\eta\frac{\partial E(W^k)}{\partial a_i}, \quad \Delta b_i^k = -\eta\frac{\partial E(W^k)}{\partial b_i}, \quad 1 \le i \le n, \qquad (21)$$

where $\eta > 0$ is a constant learning rate. Equivalently, (20) can be written as

$$W^{k+1} = W^k - \eta\frac{\partial E(W^k)}{\partial W}, \quad k = 0, 1, 2, \ldots. \qquad (22)$$

4. Convergence theorem

To analyze the convergence of the algorithm, we need the following assumption:

(A) There exists a constant $C_0 > 0$ such that $\|p_i^k\| \le C_0$, $\|a_i^k\| \le C_0$, $\|b_i^k\| \le C_0$ for all $i = 1, 2, \ldots, n$ and $k = 1, 2, \ldots$.

Let us specify some constants to be used in our convergence analysis as follows:

$$M = \max_{1 \le j \le J}\{\|x^j\|, |O^j|\}, \quad C_1 = \max\{C_0 + M, (C_0 + M)C_0\},$$
$$C_2 = 2JC_0C_1(nC_0C_1 + C_1)\max\{C_0^2, C_1^2\} + JC_0^2C_1^2(nC_0C_1 + C_1) + 2JC_0C_1^2(C_0 + C_1)(nC_0C_1 + C_1),$$
$$C_3 = J(nC_0C_1 + C_1)\max\{\tfrac{1}{2}, 4C_0^2C_1^2, 4C_1^4\}, \quad C_4 = 4nJC_1^2\max\{C_0^2C_1^2, C_1^2, 1\}, \quad C_5 = C_2 + C_3 + C_4,$$

where $x^j$ is the $j$-th given training pattern, $O^j$ is the corresponding desired output, and $n$ and $J$ are the numbers of fuzzy rules and training patterns, respectively.

Theorem  Suppose that Assumption (A) is valid, that the error function $E(W)$ is defined by (10), and that the learning rate $\eta$ is chosen such that $0 < \eta < \frac{1}{C_5}$. Starting from an arbitrary initial value $W^0$, let the learning sequence $\{W^k\}$ be generated by (22) and (19). Then we have
(i) $E(W^{k+1}) \le E(W^k)$, $k = 0, 1, 2, \ldots$, and there exists $E^* \ge 0$ such that $\lim_{k\to\infty} E(W^k) = E^*$;
(ii) $\lim_{k\to\infty} \big\|\frac{\partial E(W^k)}{\partial W}\big\| = 0$.
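For illustration only, a minimal sketch of the batch update (20)–(22) is given below. It assumes the `ts1_forward_and_grads` helper sketched after (19) is in scope; the toy data, the target function and the concrete value of $\eta$ are our own choices, since the constant $C_5$ of the theorem is not computed here and the learning rate is simply taken small.

```python
import numpy as np
# assumes ts1_forward_and_grads from the sketch following (19) is in scope

rng = np.random.default_rng(1)
J, m, n = 20, 2, 3
X = rng.uniform(-1.0, 1.0, size=(J, m))
O = np.sin(X[:, 0]) + 0.5 * X[:, 1]            # toy target values (our choice)
p = rng.normal(scale=0.1, size=(n, m + 1))     # initial consequent parameters
a = rng.normal(scale=0.5, size=(n, m))         # initial membership centers
b = np.ones((n, m))                            # initial reciprocal widths
eta = 0.01                                     # stands in for 0 < eta < 1/C5

for k in range(200):
    E, dp, da, db = ts1_forward_and_grads(X, O, p, a, b)
    p, a, b = p - eta * dp, a - eta * da, b - eta * db   # update (22)
    if k % 50 == 0:
        print(k, E)   # E(W^k) should be non-increasing, as in statement (i)
```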
5. Proof of the convergence theorem

The proof is divided into two parts dealing with Statements (i) and (ii), respectively.

Proof of Statement (i)  For any $1 \le j \le J$, $1 \le i \le n$ and $k = 0, 1, 2, \ldots$, we define the following notations:

$$\Phi_0^{k,j} = h^{k,j}P^k x^j, \quad \Psi^{k,j} = h^{k+1,j} - h^{k,j}, \quad \xi_i^{k,j} = x^j - a_i^k, \quad \Phi_i^{k,j} = \xi_i^{k,j}\odot b_i^k. \qquad (23)$$

Using the error function (10) and applying the Taylor mean value theorem with Lagrange remainder, we have

$$E(W^{k+1}) - E(W^k) = \sum_{j=1}^{J}\big(g_j(\Phi_0^{k+1,j}) - g_j(\Phi_0^{k,j})\big)$$
$$= \sum_{j=1}^{J}\Big[g_j'(\Phi_0^{k,j})\big(h^{k+1,j}P^{k+1}x^j - h^{k,j}P^k x^j\big) + \frac{1}{2}g_j''(s^{k,j})\big(\Phi_0^{k+1,j} - \Phi_0^{k,j}\big)^2\Big]$$
$$= \sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\big(h^{k,j}\Delta P^k x^j + \Psi^{k,j}P^k x^j + \Psi^{k,j}\Delta P^k x^j\big) + \frac{1}{2}\sum_{j=1}^{J} g_j''(s^{k,j})\big(\Phi_0^{k+1,j} - \Phi_0^{k,j}\big)^2,$$

where $s^{k,j} \in \mathbb{R}$ lies between $\Phi_0^{k,j}$ and $\Phi_0^{k+1,j}$.

Employing (14), we have

$$\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\big(h^{k,j}\Delta P^k x^j\big) = \sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\sum_{i=1}^{n} h_i^{k,j}(\Delta p_i^k)^T x^j = \sum_{i=1}^{n}(\Delta p_i^k)^T\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})h_i^{k,j}x^j$$
$$= \sum_{i=1}^{n}\Big(-\eta\frac{\partial E(W^k)}{\partial p_i}\Big)\cdot\frac{\partial E(W^k)}{\partial p_i} = -\eta\sum_{i=1}^{n}\Big\|\frac{\partial E(W^k)}{\partial p_i}\Big\|^2.$$

Using the Taylor expansion and noticing $h_i^{t,j} = \exp(-\Phi_i^{t,j}\cdot\Phi_i^{t,j})$, partly similarly to the proof of Lemma 2 in [16], we have

$$\Psi^{k,j}P^k x^j = \sum_{i=1}^{n}\big(h_i^{k+1,j} - h_i^{k,j}\big)(p_i^k\cdot x^j) = \sum_{i=1}^{n}(p_i^k\cdot x^j)\big(\exp(-\Phi_i^{k+1,j}\cdot\Phi_i^{k+1,j}) - \exp(-\Phi_i^{k,j}\cdot\Phi_i^{k,j})\big)$$
$$= \sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(-(\Phi_i^{k+1,j}\cdot\Phi_i^{k+1,j} - \Phi_i^{k,j}\cdot\Phi_i^{k,j})\big) + \frac{1}{2}\sum_{i=1}^{n}(p_i^k\cdot x^j)\exp(\tilde t_i^{k,j})\big(\Phi_i^{k+1,j}\cdot\Phi_i^{k+1,j} - \Phi_i^{k,j}\cdot\Phi_i^{k,j}\big)^2,$$

where $\tilde t_i^{k,j}$ lies between $-\Phi_i^{k+1,j}\cdot\Phi_i^{k+1,j}$ and $-\Phi_i^{k,j}\cdot\Phi_i^{k,j}$.

Employing property 2) of the operator "$\odot$" in Definition 3.1, we deduce

$$\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(-(\Phi_i^{k+1,j}\cdot\Phi_i^{k+1,j} - \Phi_i^{k,j}\cdot\Phi_i^{k,j})\big)$$
$$= \sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\Big[-\Big(2\Phi_i^{k,j}\cdot\big(\Phi_i^{k+1,j} - \Phi_i^{k,j}\big) + \big(\Phi_i^{k+1,j} - \Phi_i^{k,j}\big)\cdot\big(\Phi_i^{k+1,j} - \Phi_i^{k,j}\big)\Big)\Big]$$
$$= -2\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\,\Phi_i^{k,j}\cdot\big(\Phi_i^{k+1,j} - \Phi_i^{k,j}\big) - \sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big\|\Phi_i^{k+1,j} - \Phi_i^{k,j}\big\|^2$$
$$= -2\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(\xi_i^{k,j}\odot b_i^k\big)\cdot\big((-\Delta a_i^k)\odot b_i^{k+1} + \xi_i^{k,j}\odot\Delta b_i^k\big) - \sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big\|\Phi_i^{k+1,j} - \Phi_i^{k,j}\big\|^2$$
$$= 2\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(\xi_i^{k,j}\odot b_i^k\big)\cdot\big(\Delta a_i^k\odot b_i^{k+1}\big) - 2\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(\xi_i^{k,j}\odot b_i^k\big)\cdot\big(\xi_i^{k,j}\odot\Delta b_i^k\big) - \sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big\|\Phi_i^{k+1,j} - \Phi_i^{k,j}\big\|^2$$
$$= 2\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(\xi_i^{k,j}\odot b_i^k\odot b_i^k\big)\cdot\Delta a_i^k + 2\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(\xi_i^{k,j}\odot b_i^k\odot\Delta b_i^k\big)\cdot\Delta a_i^k$$
$$\quad - 2\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(\xi_i^{k,j}\odot\xi_i^{k,j}\odot b_i^k\big)\cdot\Delta b_i^k - \sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big\|\Phi_i^{k+1,j} - \Phi_i^{k,j}\big\|^2. \qquad (24)$$

So we have

$$\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(-(\Phi_i^{k+1,j}\cdot\Phi_i^{k+1,j} - \Phi_i^{k,j}\cdot\Phi_i^{k,j})\big)$$
$$= \sum_{i=1}^{n}\frac{\partial E(W^k)}{\partial a_i}\cdot\Delta a_i^k + 2\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(\xi_i^{k,j}\odot b_i^k\odot\Delta b_i^k\big)\cdot\Delta a_i^k$$
$$\quad + \sum_{i=1}^{n}\frac{\partial E(W^k)}{\partial b_i}\cdot\Delta b_i^k - \sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big\|\Phi_i^{k+1,j} - \Phi_i^{k,j}\big\|^2. \qquad (25)$$

We can easily get $|\Phi_0^{k,j}| = \big|\sum_{i=1}^{n} h_i^{k,j}(p_i^k\cdot x^j)\big| \le \sum_{i=1}^{n}|p_i^k\cdot x^j| \le \sum_{i=1}^{n}\|p_i^k\|\|x^j\| \le nMC_0$ and $\|\xi_i^{k,j}\| = \|x^j - a_i^k\| \le M + C_0$. By the definition of $g_j(t)$ in (11), $g_j'(t) = t - O^j$, so $|g_j'(\Phi_0^{k,j})| \le nMC_0 + M \le nC_0C_1 + C_1$. Together with Assumption (A) and property 1) of the operator "$\odot$", we get

$$2\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(\xi_i^{k,j}\odot b_i^k\odot\Delta b_i^k\big)\cdot\Delta a_i^k \le 2JC_0^2C_1^2(nC_0C_1 + C_1)\sum_{i=1}^{n}\|\Delta b_i^k\|\|\Delta a_i^k\|$$
$$\le JC_0^2C_1^2(nC_0C_1 + C_1)\sum_{i=1}^{n}\big(\|\Delta a_i^k\|^2 + \|\Delta b_i^k\|^2\big), \qquad (26)$$

and

$$-\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big\|\Phi_i^{k+1,j} - \Phi_i^{k,j}\big\|^2 \le C_0C_1(nC_0C_1 + C_1)\sum_{j=1}^{J}\sum_{i=1}^{n}\big\|\Phi_i^{k+1,j} - \Phi_i^{k,j}\big\|^2$$
$$= C_0C_1(nC_0C_1 + C_1)\sum_{j=1}^{J}\sum_{i=1}^{n}\big\|(-\Delta a_i^k)\odot b_i^{k+1} + \xi_i^{k,j}\odot\Delta b_i^k\big\|^2$$
$$\le 2JC_0C_1(nC_0C_1 + C_1)\sum_{i=1}^{n}\big(C_0^2\|\Delta a_i^k\|^2 + C_1^2\|\Delta b_i^k\|^2\big) \le C_{21}\sum_{i=1}^{n}\big(\|\Delta a_i^k\|^2 + \|\Delta b_i^k\|^2\big), \qquad (27)$$

where $C_{21} = 2JC_0C_1(nC_0C_1 + C_1)\max\{C_0^2, C_1^2\}$. The combination of (25)–(27) leads to

$$\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\sum_{i=1}^{n}(p_i^k\cdot x^j)h_i^{k,j}\big(-(\Phi_i^{k+1,j}\cdot\Phi_i^{k+1,j} - \Phi_i^{k,j}\cdot\Phi_i^{k,j})\big)$$
$$\le \sum_{i=1}^{n}\frac{\partial E(W^k)}{\partial a_i}\cdot\Delta a_i^k + \sum_{i=1}^{n}\frac{\partial E(W^k)}{\partial b_i}\cdot\Delta b_i^k + C_{22}\sum_{i=1}^{n}\big(\|\Delta a_i^k\|^2 + \|\Delta b_i^k\|^2\big), \qquad (28)$$

where $C_{22} = C_{21} + JC_0^2C_1^2(nC_0C_1 + C_1)$. Furthermore,

$$\frac{1}{2}\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\sum_{i=1}^{n}(p_i^k\cdot x^j)\exp(\tilde t_i^{k,j})\big(\Phi_i^{k+1,j}\cdot\Phi_i^{k+1,j} - \Phi_i^{k,j}\cdot\Phi_i^{k,j}\big)^2$$
$$\le \frac{C_0C_1(nC_0C_1 + C_1)}{2}\sum_{j=1}^{J}\sum_{i=1}^{n}\big(\Phi_i^{k+1,j}\cdot\Phi_i^{k+1,j} - \Phi_i^{k,j}\cdot\Phi_i^{k,j}\big)^2$$
$$= \frac{C_0C_1(nC_0C_1 + C_1)}{2}\sum_{j=1}^{J}\sum_{i=1}^{n}\big[(\Phi_i^{k+1,j} + \Phi_i^{k,j})\cdot(\Phi_i^{k+1,j} - \Phi_i^{k,j})\big]^2$$
$$\le 2C_0C_1^2(nC_0C_1 + C_1)\sum_{j=1}^{J}\sum_{i=1}^{n}\big\|\Phi_i^{k+1,j} - \Phi_i^{k,j}\big\|^2 \le C_{23}\sum_{i=1}^{n}\big(\|\Delta a_i^k\|^2 + \|\Delta b_i^k\|^2\big), \qquad (29)$$

where $C_{23} = 2JC_0C_1^2(C_0 + C_1)(nC_0C_1 + C_1)$.
Combining the expansion of $\Psi^{k,j}P^k x^j$ above with (28) and (29) leads to

$$\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\big(\Psi^{k,j}P^k x^j\big) \le \sum_{i=1}^{n}\frac{\partial E(W^k)}{\partial a_i}\cdot\Delta a_i^k + \sum_{i=1}^{n}\frac{\partial E(W^k)}{\partial b_i}\cdot\Delta b_i^k + C_2\sum_{i=1}^{n}\big(\|\Delta a_i^k\|^2 + \|\Delta b_i^k\|^2\big)$$
$$= -\eta\sum_{i=1}^{n}\Big(\Big\|\frac{\partial E(W^k)}{\partial a_i}\Big\|^2 + \Big\|\frac{\partial E(W^k)}{\partial b_i}\Big\|^2\Big) + C_2\eta^2\sum_{i=1}^{n}\Big(\Big\|\frac{\partial E(W^k)}{\partial a_i}\Big\|^2 + \Big\|\frac{\partial E(W^k)}{\partial b_i}\Big\|^2\Big),$$

where $C_2 = C_{22} + C_{23}$.

Notice that $\|\Phi_i^{t,j}\| = \|\xi_i^{t,j}\odot b_i^t\| \le C_1$, $\|b_i^t\| \le C_0$ and $\|\xi_i^{t,j}\| \le C_0 + M \le C_1$. By the Taylor mean value theorem with Lagrange remainder and the properties of the Euclidean norm, we get

$$\|\Psi^{t,j}\|^2 = \big\|\big(h_1^{t+1,j} - h_1^{t,j},\ h_2^{t+1,j} - h_2^{t,j},\ \ldots,\ h_n^{t+1,j} - h_n^{t,j}\big)^T\big\|^2 = \sum_{i=1}^{n}\big(h_i^{t+1,j} - h_i^{t,j}\big)^2$$
$$= \sum_{i=1}^{n}\big(\exp(-\Phi_i^{t+1,j}\cdot\Phi_i^{t+1,j}) - \exp(-\Phi_i^{t,j}\cdot\Phi_i^{t,j})\big)^2 = \sum_{i=1}^{n}\Big(\exp(\tilde s_i^{t,j})\big((\Phi_i^{t+1,j} + \Phi_i^{t,j})\cdot(\Phi_i^{t+1,j} - \Phi_i^{t,j})\big)\Big)^2$$
$$\le \sum_{i=1}^{n}\big(2C_1\|\xi_i^{t+1,j}\odot b_i^{t+1} - \xi_i^{t,j}\odot b_i^t\|\big)^2 = \sum_{i=1}^{n}\big(2C_1\|(\xi_i^{t+1,j} - \xi_i^{t,j})\odot b_i^{t+1} + \xi_i^{t,j}\odot(b_i^{t+1} - b_i^t)\|\big)^2$$
$$\le \sum_{i=1}^{n}\big(2C_1C_0\|\xi_i^{t+1,j} - \xi_i^{t,j}\| + 2C_1^2\|b_i^{t+1} - b_i^t\|\big)^2 \le \sum_{i=1}^{n}\big(C_{31}(\|\Delta a_i^t\| + \|\Delta b_i^t\|)\big)^2 \le 2C_{31}^2\sum_{i=1}^{n}\big(\|\Delta a_i^t\|^2 + \|\Delta b_i^t\|^2\big), \qquad (30)$$

where $C_{31} = 2C_1\max\{C_0, C_1\}$ and $\tilde s_i^{t,j}$ lies between $-\Phi_i^{t+1,j}\cdot\Phi_i^{t+1,j}$ and $-\Phi_i^{t,j}\cdot\Phi_i^{t,j}$.

A combination of the Cauchy-Schwarz inequality and (30) gives

$$\sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\big(\Psi^{k,j}\Delta P^k x^j\big) = \sum_{j=1}^{J} g_j'(\Phi_0^{k,j})\sum_{i=1}^{n}\Psi_i^{k,j}(\Delta p_i^k)^T x^j = \sum_{j=1}^{J}\sum_{i=1}^{n} g_j'(\Phi_0^{k,j})\Psi_i^{k,j}(\Delta p_i^k)^T x^j$$
$$\le C_1(nC_0C_1 + C_1)\sum_{j=1}^{J}\sum_{i=1}^{n}\big|\Psi_i^{k,j}\big|\|\Delta p_i^k\| \le \frac{C_1(nC_0C_1 + C_1)}{2}\sum_{j=1}^{J}\sum_{i=1}^{n}\big(|\Psi_i^{k,j}|^2 + \|\Delta p_i^k\|^2\big)$$
$$\le J(nC_0C_1 + C_1)C_{31}^2\sum_{i=1}^{n}\big(\|\Delta a_i^k\|^2 + \|\Delta b_i^k\|^2\big) + \frac{J(nC_0C_1 + C_1)}{2}\sum_{i=1}^{n}\|\Delta p_i^k\|^2$$
$$\le C_3\Big(\sum_{i=1}^{n}\|\Delta p_i^k\|^2 + \sum_{i=1}^{n}\|\Delta a_i^k\|^2 + \sum_{i=1}^{n}\|\Delta b_i^k\|^2\Big) = C_3\eta^2\Big\|\frac{\partial E(W^k)}{\partial W}\Big\|^2,$$

where $C_3 = J(nC_0C_1 + C_1)\max\{\frac{1}{2}, 4C_0^2C_1^2, 4C_1^4\}$.

Since $g_j''(t) = 1$ is easily deduced from the definition of $g_j(t)$ in (11), we get

$$\frac{1}{2}\sum_{j=1}^{J} g_j''(s^{k,j})\big(\Phi_0^{k+1,j} - \Phi_0^{k,j}\big)^2 = \frac{1}{2}\sum_{j=1}^{J}\big(\Phi_0^{k+1,j} - \Phi_0^{k,j}\big)^2 = \frac{1}{2}\sum_{j=1}^{J}\big(h^{k+1,j}P^{k+1}x^j - h^{k,j}P^k x^j\big)^2$$
$$= \frac{1}{2}\sum_{j=1}^{J}\Big(\sum_{i=1}^{n}\big(h_i^{k+1,j} - h_i^{k,j}\big)(p_i^{k+1}\cdot x^j) + \sum_{i=1}^{n} h_i^{k,j}(\Delta p_i^k\cdot x^j)\Big)^2$$
$$\le n\sum_{j=1}^{J}\sum_{i=1}^{n}\Big(\big(h_i^{k+1,j} - h_i^{k,j}\big)^2(p_i^{k+1}\cdot x^j)^2 + \big(h_i^{k,j}\big)^2(\Delta p_i^k\cdot x^j)^2\Big)$$
$$\le C_4\eta^2\sum_{i=1}^{n}\Big(\Big\|\frac{\partial E(W^k)}{\partial p_i}\Big\|^2 + \Big\|\frac{\partial E(W^k)}{\partial a_i}\Big\|^2 + \Big\|\frac{\partial E(W^k)}{\partial b_i}\Big\|^2\Big) = C_4\eta^2\Big\|\frac{\partial E(W^k)}{\partial W}\Big\|^2,$$

where $C_4 = 4nJC_1^2\max\{C_0^2C_1^2, C_1^2, 1\}$.

Combining the above estimates, for $k = 0, 1, 2, \ldots$ we get

$$E(W^{k+1}) - E(W^k) \le -\eta\sum_{i=1}^{n}\Big\|\frac{\partial E(W^k)}{\partial p_i}\Big\|^2 - \eta\sum_{i=1}^{n}\Big(\Big\|\frac{\partial E(W^k)}{\partial a_i}\Big\|^2 + \Big\|\frac{\partial E(W^k)}{\partial b_i}\Big\|^2\Big)$$
$$\quad + C_3\eta^2\Big(\sum_{i=1}^{n}\Big\|\frac{\partial E(W^k)}{\partial p_i}\Big\|^2 + \sum_{i=1}^{n}\Big\|\frac{\partial E(W^k)}{\partial a_i}\Big\|^2 + \sum_{i=1}^{n}\Big\|\frac{\partial E(W^k)}{\partial b_i}\Big\|^2\Big) + (C_2 + C_4)\eta^2\Big\|\frac{\partial E(W^k)}{\partial W}\Big\|^2$$
$$\le -(\eta - C_5\eta^2)\Big\|\frac{\partial E(W^k)}{\partial W}\Big\|^2,$$

where $C_5 = C_2 + C_3 + C_4$. Write $\beta = \eta - C_5\eta^2$; then

$$E(W^{k+1}) \le E(W^k) - \beta\Big\|\frac{\partial E(W^k)}{\partial W}\Big\|^2. \qquad (31)$$

Suppose the learning rate $\eta$ satisfies

$$0 < \eta < \frac{1}{C_5}. \qquad (32)$$

Then $\beta > 0$ and there holds

$$E(W^{k+1}) \le E(W^k), \quad k = 0, 1, 2, \ldots. \qquad (33)$$
From (33), we know that the nonnegative sequence $\{E(W^k)\}$ is monotone decreasing; since it is also bounded below, there exists $E^* \ge 0$ such that $\lim_{k\to\infty} E(W^k) = E^*$. Statement (i) is proved.

Proof of Statement (ii)  Using (31), we have

$$E(W^{k+1}) \le E(W^k) - \beta\Big\|\frac{\partial E(W^k)}{\partial W}\Big\|^2 \le \cdots \le E(W^0) - \beta\sum_{t=0}^{k}\Big\|\frac{\partial E(W^t)}{\partial W}\Big\|^2.$$

Since $E(W^{k+1}) \ge 0$, we get

$$\beta\sum_{t=0}^{k}\Big\|\frac{\partial E(W^t)}{\partial W}\Big\|^2 \le E(W^0).$$

Letting $k \to \infty$, we obtain

$$\sum_{t=0}^{\infty}\Big\|\frac{\partial E(W^t)}{\partial W}\Big\|^2 \le \frac{1}{\beta}E(W^0) < \infty.$$

This immediately gives

$$\lim_{k\to\infty}\Big\|\frac{\partial E(W^k)}{\partial W}\Big\| = 0. \qquad (34)$$

Statement (ii) is proved, and this completes the proof of the Theorem.

6. Conclusion

The first-order Takagi-Sugeno (T-S) system has recently become a powerful practical engineering tool for the modeling and control of complex systems. Ref. [7] showed that the Pi-Sigma network is capable of dealing with nonlinear systems more efficiently and that it is a good model for first-order T-S system identification. We note that the convergence property of Pi-Sigma neural network learning is an interesting research topic which offers an effective guarantee in real applications. To further enhance the potential of the Pi-Sigma network, a modified gradient-based algorithm based on the first-order T-S inference system has been proposed to reduce the computational cost of learning. Our contribution is to provide a rigorous convergence analysis for this learning method: the convergence results show that the gradient of the error function goes to zero and the fuzzy parameter sequence goes to a local minimum of the error function, respectively.

References

[1] T. TAKAGI, M. SUGENO. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst., Man, Cybern., 1985, SMC-15(1): 116–132.
[2] Y. SHIN, J. GHOSH. The Pi-Sigma networks: an efficient higher-order neural network for pattern classification and function approximation. In: Proceedings of the International Joint Conference on Neural Networks, 1991, 1: 13–18.
[3] Yaochu JIN, Jingping JIANG, Jing ZHU. Neural network based fuzzy identification and its application to modeling and control of complex systems. IEEE Transactions on Systems, Man, and Cybernetics, 1995, 25(6): 990–997.
[4] Weixin YU, M. Q. LI, Jiao LUO, et al. Prediction of the mechanical properties of the post-forged Ti–6Al–4V alloy using fuzzy neural network. Materials and Design, 2012, 31: 3282–3288.
[5] D. M. LIU, G. NAADIMUTHU, E. S. LEE. Trajectory tracking in aircraft landing operations management using the adaptive neural fuzzy inference system. Comput. Math. Appl., 2008, 56(5): 1322–1327.
[6] I. TURKMEN, K. GUNEY. Genetic tracker with adaptive neuro-fuzzy inference system for multiple target tracking. Expert Systems with Applications, 2008, 35(4): 1657–1667.
[7] K. CHATURVEDI, M. PANDIT, L. SRIVASTAVA. Modified neo-fuzzy neuron-based approach for economic and environmental optimal power dispatch. Applied Soft Computing, 2008, 8(4): 1428–1438.
[8] Wei WU, Long LI, Jie YANG, et al. A modified gradient-based neuro-fuzzy learning algorithm and its convergence. Information Sciences, 2010, 180(9): 1630–1642.
[9] E. CZOGALA, W. PEDRYCZ. On identification in fuzzy systems and its applications in control problems. Fuzzy Sets and Systems, 1981, 6(1): 73–83.
[10] W. PEDRYCZ. An identification algorithm in fuzzy relational systems. Fuzzy Sets and Systems, 1984, 13(2): 153–167.
[11] Yongyan CAO, P. M. FRANK. Analysis and synthesis of nonlinear time-delay systems via fuzzy control approach. IEEE Transactions on Fuzzy Systems, 2000, 8(2): 200–211.
[12] Yuju CHEN, Tsungchuan HUANG, Reychue HWANG. An effective learning of neural network by using RFBP learning algorithm. Inform. Sci., 2004, 167(1-4): 77–86.
[13] Chao ZHANG, Wei WU, Yan XIONG. Convergence analysis of batch gradient algorithm for three classes of sigma-pi neural networks. Neural Processing Letters, 2007, 26(3): 177–189.
[14] Chao ZHANG, Wei WU, Xianhua CHEN, et al. Convergence of BP algorithm for product unit neural networks with exponential weights. Neurocomputing, 2008, 72(1-3): 513–520.
[15] J. S. WANG, C. S. G. LEE. Self-adaptive neuro-fuzzy inference systems for classification applications. IEEE Transactions on Fuzzy Systems, 2002, 10(6): 790–802.
[16] H. ICHIHASHI, I. B. TURKSEN. A neuro-fuzzy approach to data analysis of pairwise comparisons. Internat. J. Approx. Reason., 1993, 9(3): 227–248.