Chapter 4: Multilayer Perceptrons
Instructor: Chuan-Yu Chang, Ph.D.
E-mail: chuanyu@yuntech.edu.tw  Tel: (05)5342601 ext. 4337  Office: ES709
Medical Image Processing Lab., Graduate School of Computer Science & Information Engineering

Introduction
A multilayer perceptron (MLP) consists of a set of sensory units that constitute the input layer, one or more hidden layers of computation nodes, and an output layer of computation nodes.
The network is trained by supervised learning, using the error back-propagation algorithm, which is based on error-correction learning.
Training proceeds in two passes:
Forward pass: an activity pattern is applied to the sensory nodes, and its effect propagates through the network layer by layer.
Backward pass: the synaptic weights are all adjusted in accordance with an error-correction rule; the error signal is propagated backward through the network.

Introduction (cont.)
A multilayer perceptron has three distinctive characteristics:
The model of each neuron in the network includes a nonlinear activation function, typically a sigmoidal nonlinearity.
The network contains one or more layers of hidden neurons, which enable the network to learn complex tasks by extracting progressively more meaningful features from the input patterns.
The network exhibits a high degree of connectivity, determined by the synapses of the network.

Some preliminaries
[Figure: architectural graph of an MLP with two hidden layers and an output layer.]

Some preliminaries (cont.)
Two kinds of signals are identified:
Function signals: come in at the input end of the network, propagate forward through the network, and emerge at the output end of the network as output signals.
Error signals: originate at an output neuron of the network and propagate backward through the network, layer by layer.

Some preliminaries (cont.)
Each hidden or output neuron of a multilayer perceptron is designed to perform two computations:
The computation of the function signal appearing at its output, obtained by applying a nonlinear transformation to the weighted sum of its input signals.
The computation of an estimate of the gradient vector, which is needed for the backward pass through the network.

Signal-flow graph highlighting the details of output neuron j
[Figure: signal-flow graph of output neuron j. Inputs $y_i(n)$ are weighted by $w_{ji}(n)$, with $y_0 = +1$ and $w_{j0}(n) = b_j(n)$ the bias; the induced local field $v_j(n)$ passes through the activation $\varphi(\cdot)$ to give the output $y_j(n)$, which is compared with the desired response $d_j(n)$ to form the error $e_j(n)$.]
$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$,  $y_j(n) = \varphi_j(v_j(n))$,  $e_j(n) = d_j(n) - y_j(n)$
Chain rule:
$\dfrac{\partial E(n)}{\partial w_{ji}(n)} = \dfrac{\partial E(n)}{\partial e_j(n)} \dfrac{\partial e_j(n)}{\partial y_j(n)} \dfrac{\partial y_j(n)}{\partial v_j(n)} \dfrac{\partial v_j(n)}{\partial w_{ji}(n)}$

Back-propagation algorithm
The error signal at the output of neuron j at iteration n is defined by
$e_j(n) = d_j(n) - y_j(n)$    (4.1)
The instantaneous value of the total error energy is defined as
$E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$    (4.2)
where C is the set of output-layer neurons.
The average squared error energy is obtained by summing E(n) over all n and then normalizing with respect to the set size N, the number of training samples:
$E_{av} = \frac{1}{N} \sum_{n=1}^{N} E(n)$    (4.3)

Back-propagation algorithm (cont.)
$E_{av}$ represents the cost function as a measure of learning performance.
The objective of the learning process is to adjust the free parameters of the network so as to minimize $E_{av}$.
The weights are updated on a pattern-by-pattern basis until one epoch, that is, one complete presentation of the entire training set, has been dealt with.
The adjustments to the weights are made in accordance with the respective errors computed for each pattern presented to the network.

Back-propagation algorithm (cont.)
The induced local field $v_j(n)$ is defined as
$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$    (4.4)
The function signal $y_j(n)$ appearing at the output of neuron j at iteration n is
$y_j(n) = \varphi_j(v_j(n))$    (4.5)
In accordance with the LMS algorithm and the credit-assignment problem, back-propagation applies the chain rule to compute the correction to each synaptic weight. The sensitivity factor is
$\dfrac{\partial E(n)}{\partial w_{ji}(n)} = \dfrac{\partial E(n)}{\partial e_j(n)} \dfrac{\partial e_j(n)}{\partial y_j(n)} \dfrac{\partial y_j(n)}{\partial v_j(n)} \dfrac{\partial v_j(n)}{\partial w_{ji}(n)}$    (4.6)

Back-propagation algorithm (cont.)
Differentiating both sides of Eq. (4.2) with respect to $e_j(n)$:
$\dfrac{\partial E(n)}{\partial e_j(n)} = e_j(n)$    (4.7)
Differentiating both sides of Eq. (4.1) with respect to $y_j(n)$:
$\dfrac{\partial e_j(n)}{\partial y_j(n)} = -1$    (4.8)
Differentiating Eq. (4.5) with respect to $v_j(n)$:
$\dfrac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'(v_j(n))$    (4.9)
Differentiating Eq. (4.4) with respect to $w_{ji}(n)$:
$\dfrac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n)$    (4.10)

Back-propagation algorithm (cont.)
Combining Eqs. (4.7) through (4.10):
$\dfrac{\partial E(n)}{\partial w_{ji}(n)} = -e_j(n)\, \varphi_j'(v_j(n))\, y_i(n)$    (4.11)
According to the delta rule, the weight correction is
$\Delta w_{ji}(n) = -\eta\, \dfrac{\partial E(n)}{\partial w_{ji}(n)}$    (4.12)
Substituting Eq. (4.11) into Eq. (4.12) yields
$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$    (4.13)
where the local gradient $\delta_j(n)$ is defined as
$\delta_j(n) = -\dfrac{\partial E(n)}{\partial v_j(n)} = -\dfrac{\partial E(n)}{\partial e_j(n)} \dfrac{\partial e_j(n)}{\partial y_j(n)} \dfrac{\partial y_j(n)}{\partial v_j(n)} = e_j(n)\, \varphi_j'(v_j(n))$    (4.14)

[Figure: signal-flow graph of output neuron j, annotated with the quantities used in the chain rule: $E(n) = \frac{1}{2}\sum_{j \in C} e_j^2(n)$, $e_j(n) = d_j(n) - y_j(n)$, $y_j(n) = \varphi_j(v_j(n))$, $v_j(n) = \sum_{i=0}^{m} w_{ji}(n) y_i(n)$, and the factors $\partial E(n)/\partial e_j(n) = e_j(n)$, $\partial e_j(n)/\partial y_j(n) = -1$, $\partial y_j(n)/\partial v_j(n) = \varphi_j'(v_j(n))$, $\partial v_j(n)/\partial w_{ji}(n) = y_i(n)$.]

Back-propagation algorithm (cont.)
Case 1: neuron j is an output node
Use Eq. (4.1) to compute the error $e_j(n)$ associated with that neuron, and then use Eq. (4.14) to compute its local gradient $\delta_j(n)$.
Case 2: neuron j is a hidden node
There is no specified desired response for that neuron.
The error signal for a hidden neuron has to be determined recursively in terms of the error signals of all the neurons to which that hidden neuron is directly connected.

Signal-flow graph of output neuron k connected to hidden neuron j
[Fig. 4.4]

Back-propagation algorithm (cont.)
Following Eq. (4.14), the local gradient $\delta_j(n)$ of hidden neuron j may be defined as
$\delta_j(n) = -\dfrac{\partial E(n)}{\partial y_j(n)} \dfrac{\partial y_j(n)}{\partial v_j(n)} = -\dfrac{\partial E(n)}{\partial y_j(n)}\, \varphi_j'(v_j(n))$, neuron j is hidden    (4.15)
because
$E(n) = \frac{1}{2} \sum_{k \in C} e_k^2(n)$    (4.16)
Differentiating Eq. (4.16) with respect to $y_j(n)$:
$\dfrac{\partial E(n)}{\partial y_j(n)} = \sum_k e_k(n)\, \dfrac{\partial e_k(n)}{\partial y_j(n)}$    (4.17)

Back-propagation algorithm (cont.)
Using the chain rule, Eq. (4.17) can be rewritten as
$\dfrac{\partial E(n)}{\partial y_j(n)} = \sum_k e_k(n)\, \dfrac{\partial e_k(n)}{\partial v_k(n)} \dfrac{\partial v_k(n)}{\partial y_j(n)}$    (4.18)
From Fig. 4.4 we see that
$e_k(n) = d_k(n) - y_k(n) = d_k(n) - \varphi_k(v_k(n))$, neuron k is an output node    (4.19)
and therefore
$\dfrac{\partial e_k(n)}{\partial v_k(n)} = -\varphi_k'(v_k(n))$    (4.20)
Also from Fig. 4.4,
$v_k(n) = \sum_{j=0}^{m} w_{kj}(n)\, y_j(n)$    (4.21)
Back-propagation algorithm (cont.)
Differentiating Eq. (4.21) with respect to $y_j(n)$:
$\dfrac{\partial v_k(n)}{\partial y_j(n)} = w_{kj}(n)$    (4.22)
Substituting Eqs. (4.20) and (4.22) into Eq. (4.18) gives
$\dfrac{\partial E(n)}{\partial y_j(n)} = -\sum_k e_k(n)\, \varphi_k'(v_k(n))\, w_{kj}(n) = -\sum_k \delta_k(n)\, w_{kj}(n)$    (4.23)
Finally, substituting Eq. (4.23) into Eq. (4.15) yields the back-propagation formula for the local gradient $\delta_j(n)$:
$\delta_j(n) = \varphi_j'(v_j(n)) \sum_k \delta_k(n)\, w_{kj}(n)$, neuron j is hidden    (4.24)

[Figure: signal-flow graph of output neuron k connected to hidden neuron j, annotated with the relations $e_k(n) = d_k(n) - y_k(n) = d_k(n) - \varphi_k(v_k(n))$, $\partial e_k(n)/\partial v_k(n) = -\varphi_k'(v_k(n))$, $v_k(n) = \sum_{j=0}^{m} w_{kj}(n) y_j(n)$, $\partial v_k(n)/\partial y_j(n) = w_{kj}(n)$, $\partial E(n)/\partial y_j(n) = -\sum_k \delta_k(n) w_{kj}(n)$, and $\delta_j(n) = \varphi_j'(v_j(n)) \sum_k \delta_k(n) w_{kj}(n)$.]

Back-propagation algorithm (cont.)
Summary
The correction $\Delta w_{ji}(n)$ applied to the synaptic weight connecting neuron i to neuron j is defined by the delta rule:
(weight correction $\Delta w_{ji}(n)$) = (learning-rate parameter $\eta$) × (local gradient $\delta_j(n)$) × (input signal of neuron j, $y_i(n)$)    (4.25)
The local gradient $\delta_j(n)$ depends on whether neuron j is an output node or a hidden node.

Back-propagation algorithm (cont.)
The two passes of computation
Forward pass
The synaptic weights remain unaltered throughout the network, and the function signals of the network are computed on a neuron-by-neuron basis.
The output of neuron j is
$y_j(n) = \varphi(v_j(n))$    (4.26)
where the induced local field of neuron j is defined by
$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$    (4.27)
Backward pass
Starts at the output layer by passing the error signals leftward through the network, layer by layer, and recursively computing the local gradient $\delta$ for each neuron.

Back-propagation algorithm (cont.)
Activation function
In an MLP, $\varphi(\cdot)$ must be continuous and differentiable.
A commonly used activation function is the sigmoidal nonlinearity.
Logistic function:
$\varphi_j(v_j(n)) = \dfrac{1}{1 + \exp(-a v_j(n))}$, $a > 0$ and $-\infty < v_j(n) < \infty$    (4.30)
Differentiating Eq. (4.30) gives
$\varphi_j'(v_j(n)) = \dfrac{a \exp(-a v_j(n))}{[1 + \exp(-a v_j(n))]^2}$    (4.31)
Since $y_j(n) = \varphi_j(v_j(n))$, Eq. (4.31) can be rewritten as
$\varphi_j'(v_j(n)) = a\, y_j(n)\,[1 - y_j(n)]$    (4.32)

Back-propagation algorithm (cont.)
For a neuron j in the output layer, $y_j(n) = o_j(n)$, so the local gradient of neuron j can be expressed as
$\delta_j(n) = e_j(n)\, \varphi_j'(v_j(n)) = a\, [d_j(n) - o_j(n)]\, o_j(n)\,[1 - o_j(n)]$, neuron j is an output node    (4.33)
For an arbitrary neuron j in a hidden layer, the local gradient can be expressed as
$\delta_j(n) = \varphi_j'(v_j(n)) \sum_k \delta_k(n)\, w_{kj}(n) = a\, y_j(n)\,[1 - y_j(n)] \sum_k \delta_k(n)\, w_{kj}(n)$, neuron j is hidden    (4.34)

Back-propagation algorithm (cont.)
Hyperbolic tangent function:
$\varphi_j(v_j(n)) = a \tanh(b v_j(n))$, $a, b > 0$    (4.35)
Differentiating with respect to $v_j(n)$:
$\varphi_j'(v_j(n)) = a b\, \mathrm{sech}^2(b v_j(n)) = a b\,[1 - \tanh^2(b v_j(n))] = \dfrac{b}{a}\,[a - y_j(n)]\,[a + y_j(n)]$    (4.36)
For a neuron j in the output layer, the local gradient is
$\delta_j(n) = e_j(n)\, \varphi_j'(v_j(n)) = \dfrac{b}{a}\,[d_j(n) - o_j(n)]\,[a - o_j(n)]\,[a + o_j(n)]$    (4.37)
For a neuron j in a hidden layer, the local gradient is
$\delta_j(n) = \varphi_j'(v_j(n)) \sum_k \delta_k(n)\, w_{kj}(n) = \dfrac{b}{a}\,[a - y_j(n)]\,[a + y_j(n)] \sum_k \delta_k(n)\, w_{kj}(n)$, neuron j is hidden    (4.38)
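To make the forward and backward passes above concrete, the following is a minimal NumPy sketch of one pattern-by-pattern back-propagation update for a single-hidden-layer MLP with logistic activations, following Eqs. (4.26)-(4.27), (4.33)-(4.34), and the delta rule of Eqs. (4.13)/(4.25). The array names, the choice a = 1, and the omission of bias terms are illustrative assumptions, not part of the text.

```python
import numpy as np

def logistic(v, a=1.0):
    # Eq. (4.30): logistic activation function
    return 1.0 / (1.0 + np.exp(-a * v))

def backprop_step(x, d, W1, W2, eta=0.1, a=1.0):
    """One pattern-by-pattern BP update for a one-hidden-layer MLP.
    x: input vector, d: desired response vector (biases omitted for brevity)."""
    # ---- forward pass, Eqs. (4.26)-(4.27) ----
    v1 = W1 @ x            # induced local fields of the hidden layer
    y1 = logistic(v1, a)   # hidden-layer function signals
    v2 = W2 @ y1           # induced local fields of the output layer
    o  = logistic(v2, a)   # network outputs o_j(n)

    # ---- backward pass ----
    e = d - o                                         # Eq. (4.1)
    delta2 = a * e * o * (1.0 - o)                    # Eq. (4.33), output local gradients
    delta1 = a * y1 * (1.0 - y1) * (W2.T @ delta2)    # Eq. (4.34), hidden local gradients

    # ---- delta rule, Eqs. (4.13)/(4.25) ----
    W2 += eta * np.outer(delta2, y1)
    W1 += eta * np.outer(delta1, x)
    return W1, W2, 0.5 * np.sum(e**2)                 # Eq. (4.2), instantaneous error energy
```

Accumulating the returned instantaneous error energy over all patterns of an epoch and dividing by N gives the average squared error energy of Eq. (4.3).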
Rate of Learning
The BP algorithm provides an approximation to the trajectory in weight space computed by the method of steepest descent.
The smaller the learning-rate parameter η, the smaller the changes to the synaptic weights from one iteration to the next, and the smoother the trajectory in weight space; however, convergence takes longer.
The larger the learning-rate parameter η, the larger the changes to the synaptic weights, which speeds up convergence but makes the trajectory in weight space oscillatory; the network may become unstable.
The rate of learning can be increased while avoiding the danger of instability by including a momentum term.

Rate of Learning
BP approximates the trajectory of steepest descent; a smaller learning-rate parameter makes the path smoother.
Increasing the rate of learning while avoiding the danger of instability is achieved by including a momentum term (the generalized delta rule):
$\Delta w_{ji}(n) = \alpha\, \Delta w_{ji}(n-1) + \eta\, \delta_j(n)\, y_i(n)$
where α is the momentum constant. Unfolding the recursion gives
$\Delta w_{ji}(n) = \eta \sum_{t=0}^{n} \alpha^{n-t}\, \delta_j(t)\, y_i(t) = -\eta \sum_{t=0}^{n} \alpha^{n-t}\, \dfrac{\partial E(t)}{\partial w_{ji}(t)}$
For the network to converge, the momentum constant must be strictly restricted to $0 \le |\alpha| < 1$.
(a) When the partial derivative $\partial E(t)/\partial w_{ji}(t)$ has the same sign on consecutive iterations (moving steadily along a slope), the sum grows and $\Delta w_{ji}(n)$ becomes large; the momentum effectively increases η and accelerates descent downhill.
(b) When the partial derivative alternates in sign on consecutive iterations (oscillating across a valley bottom), the terms cancel and $\Delta w_{ji}(n)$ becomes small; the momentum has a stabilizing effect on the oscillation.

Rate of Learning (cont.)
Including the momentum term has the benefit of preventing the learning process from terminating in a shallow local minimum.
The learning rate η may be connection-dependent.
Some synaptic weights may be kept fixed while the others are adjusted.

Mode of Training
Epoch: one complete presentation of the training data; the order of presentation should be randomized for each epoch.
Sequential mode (also called on-line, pattern, or stochastic mode):
The synaptic weights are updated after each training sample.
Requires less storage.
Converges much faster, particularly when the training data are redundant.
The random order of presentation makes trapping in a local minimum less likely.
Batch mode:
Weight updating is performed after the presentation of all the training examples that constitute an epoch; at the end of each epoch, the synaptic weights are updated according to
$\Delta w_{ji} = -\eta\, \dfrac{\partial E_{av}}{\partial w_{ji}} = -\dfrac{\eta}{N} \sum_{n=1}^{N} e_j(n)\, \dfrac{\partial e_j(n)}{\partial w_{ji}(n)}$    (4.42-4.43)
May be more robust against outliers.

Sequential Mode vs. Batch Mode of Training
1. Pattern (sequential) mode: minimize $E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$.
For each pattern, compute $\Delta w_{ji}(n)$ and update the weight immediately; then present the next pattern, compute the new $\Delta w_{ji}(n)$, update, and so on.
2. Batch mode: after all N patterns have been presented, the overall correction is the average of the per-pattern corrections:
$\overline{\Delta w}_{ji} = \dfrac{1}{N} \sum_{n=1}^{N} \Delta w_{ji}(n) = -\dfrac{\eta}{N} \sum_{n=1}^{N} \dfrac{\partial E(n)}{\partial w_{ji}(n)} = -\dfrac{\eta}{N} \sum_{n=1}^{N} e_j(n)\, \dfrac{\partial e_j(n)}{\partial w_{ji}(n)}$
All patterns are processed first, the total average update is computed, and only then are the weights changed. This minimizes
$E_{av} = \dfrac{1}{2N} \sum_{n=1}^{N} \sum_{j \in C} e_j^2(n)$, with $\Delta w_{ji} = -\eta\, \dfrac{\partial E_{av}}{\partial w_{ji}} = -\dfrac{\eta}{N} \sum_{n=1}^{N} e_j(n)\, \dfrac{\partial e_j(n)}{\partial w_{ji}(n)}$

Comparison of Sequential Mode and Batch Mode
Sequential (on-line):
Easier to implement, since it requires less storage.
The search in weight space is stochastic in nature, which makes it less likely to be trapped in a local minimum.
Because it is stochastic, its convergence is harder to prove theoretically.
It remains popular because it is simple and effective for large and difficult problems.
Batch:
Provides an accurate estimate of the gradient vector.
Convergence to a local minimum is guaranteed.

Stopping Criteria
There are no well-defined stopping criteria.
Terminate when the gradient vector g(w) = 0 (the first derivative equals zero), which occurs at a local or global minimum.
Terminate when the error measure is stationary.
The BP algorithm is considered to have converged when the Euclidean norm of the gradient vector reaches a sufficiently small gradient threshold.
• Drawback: even for successful trials, learning times may be long.
The BP algorithm is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small.
• Drawback: too large a threshold on the change in squared error causes the network to stop learning prematurely.
Terminate when the network's generalization performance is adequate.

Summary of the BP algorithm
1. Initialization: (a) make a good choice of the initial values; (b) the weights must not be too large. The goal is to keep the induced local field $v_j$ of each neuron within the operating range of its activation function.
2. Presentation of training examples: present the network with an epoch of training examples.
3. Forward computation: compute the induced local fields and function signals of the network by proceeding forward through the network, layer by layer.
4. Backward computation: compute the local gradients δ of the network.
5. Iteration: present new training examples and repeat steps 3 and 4 until the stopping criterion is satisfied.
The order of presentation of training examples should be randomized from epoch to epoch.
The momentum and learning-rate parameters are typically decreased as the number of training iterations increases.

Signal-flow graphical summary of back-propagation learning
[Figure: sensitivity graph, used to compute the local gradients.]

XOR Problem
A single-layer perceptron has no hidden neurons and therefore cannot separate patterns that are not linearly separable.
The XOR problem can be solved with a network that has a single hidden layer.
Each neuron is modeled by the McCulloch-Pitts model (threshold model).
[Figure: the XOR network. Hidden neurons 1 and 2 each receive the two inputs with weights +1 and +1 and have biases -1.5 and -0.5, respectively; output neuron 3 receives the hidden outputs with weights -2 and +1 and has bias -0.5.]

XOR Problem (cont.)
[Figure: the decision boundary formed by the first hidden neuron, the decision boundary formed by the second hidden neuron, and the decision boundary formed by the complete network.]
The function of the output neuron is to construct a linear combination of the decision boundaries formed by the two hidden neurons.

Some Hints for Making the BP Algorithm Perform Better
Sequential vs. batch update
The sequential mode of BP learning is computationally faster than the batch mode.
Maximizing information content
Use training patterns that are radically different from all those previously used; this is also called an emphasizing scheme.
Caution: beware of outliers, and do not distort the input distribution.
Activation function
An antisymmetric activation function learns faster than a nonsymmetric one.
A typical antisymmetric activation function is the hyperbolic tangent.
Target values
The target values (desired responses) should be chosen within the range of the activation function.

Antisymmetric activation function
An activation function is antisymmetric if f(-v) = -f(v); a nonsymmetric activation function does not satisfy this condition.
Hyperbolic tangent activation function: f(v) = a tanh(bv), with a = 1.7159 and b = 2/3, so that f(1) = 1 and f(-1) = -1.
At the origin the slope of the activation function is close to unity: f'(0) = ab = 1.7159 × 2/3 = 1.1424.
The second derivative of f(v) attains its maximum value at v = 1.

Some Hints for Making the BP Algorithm Perform Better
Normalizing the inputs
Each input variable should be preprocessed so that its mean is close to zero.
The input variables should be uncorrelated; this can be achieved by principal components analysis (PCA).
The decorrelated input variables should be scaled so that their covariances are approximately equal.
These are the operations of mean removal, decorrelation, and covariance equalization (see the sketch below).
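The following is a minimal NumPy sketch of the three preprocessing operations just listed: mean removal, decorrelation via PCA, and covariance equalization. The function name and the small regularization constant eps are illustrative assumptions.

```python
import numpy as np

def normalize_inputs(X, eps=1e-8):
    """Mean removal, decorrelation, and covariance equalization.
    X: array of shape (num_samples, num_inputs)."""
    # 1. Mean removal: make each input variable zero-mean.
    Xc = X - X.mean(axis=0)

    # 2. Decorrelation: rotate onto the principal components of the
    #    input covariance matrix (PCA).
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    Xd = Xc @ eigvecs                      # decorrelated inputs

    # 3. Covariance equalization: scale each principal component so
    #    that all components have approximately unit variance.
    Xw = Xd / np.sqrt(eigvals + eps)
    return Xw
```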
Some Hints for Making the BP Algorithm Perform Better (cont.)
Initialization
The weights should be relatively small, with zero mean and a variance equal to the reciprocal of the number of synaptic connections of the neuron.
Learning from hints
Known prior information about the mapping may be incorporated into f(·) to speed up learning.
Learning rate
Later layers should use a lower learning rate than earlier layers.
Neurons with many inputs should be given a smaller learning rate.
LeCun (1993): the learning rate should be inversely proportional to the square root of the number of synaptic connections made to the neuron (see Eq. 4.49).

Initialization
Consider a multilayer perceptron using the hyperbolic tangent function, with the biases set to zero. The induced local field of neuron j is
$v_j = \sum_{i=1}^{m} w_{ji}\, y_i$
Assume: (1) the inputs have zero mean; (2) the inputs have unit variance; (3) the inputs are uncorrelated; (4) the weights are drawn from a zero-mean distribution whose variance is denoted by $\sigma_w^2$.

Then we have
$\mu_v = E[v_j] = E\!\left[\sum_{i=1}^{m} w_{ji}\, y_i\right] = \sum_{i=1}^{m} E[w_{ji}]\, E[y_i] = 0$
$\sigma_v^2 = E[(v_j - \mu_v)^2] = E[v_j^2] = E\!\left[\sum_{i=1}^{m} \sum_{k=1}^{m} w_{ji}\, w_{jk}\, y_i\, y_k\right] = \sum_{i=1}^{m} \sum_{k=1}^{m} E[w_{ji} w_{jk}]\, E[y_i y_k] = \sum_{i=1}^{m} E[w_{ji}^2] = m\, \sigma_w^2$
since $E[y_i y_k] = 1$ only when i = k.
A good strategy for initializing the synaptic weights is to make the standard deviation of the induced local field of a neuron fall in the linear part of the activation function. Since the second derivative of the hyperbolic tangent attains its maximum at v = 1, we set $\sigma_v = 1$, which gives
$\sigma_w = m^{-1/2}$    (4.49)
where m is the number of synaptic connections of a neuron.

Some Hints for Making the BP Algorithm Perform Better
1. Use an antisymmetric activation function.
2. The desired response $d_j$ should be offset by some amount ε away from the limiting value of the activation function.
3. Weights and biases should be initialized within a small range $\left(-\dfrac{2.4}{F_i}, \dfrac{2.4}{F_i}\right)$, where $F_i$ is the fan-in of neuron i.
4. All neurons should learn at the same rate:
(a) the larger the fan-in, the smaller η should be;
(b) later layers should use a smaller η.
5. Use pattern-by-pattern updating.
6. The order of the training patterns should be randomized.
7. Prior information should be used in assigning f(·).

Output Representation and Decision Rule
In theory, an M-class classification problem requires M outputs to represent all possible classification decisions.
Let $y_{kj}$ denote the k-th output of the network for input $x_j$:
$y_{kj} = F_k(x_j)$, k = 1, 2, ..., M    (4.50)
where $F_k(\cdot)$ defines the mapping from the input to the k-th output.
For convenience of presentation, let
$y_j = [y_{1j}, y_{2j}, \ldots, y_{Mj}]^T = [F_1(x_j), F_2(x_j), \ldots, F_M(x_j)]^T = F(x_j)$    (4.51)
[Figure: a multilayer perceptron with weight vector w mapping the input $x_j$ to the outputs $y_{1,j}, y_{2,j}, \ldots, y_{M,j}$.]

Output Representation and Decision Rule
The question is: after a multilayer perceptron is trained, what should the optimum decision rule be for classifying the M outputs of the network?
Any reasonable output decision rule should be based on the mapping
$F: \mathbb{R}^{m_0} \rightarrow \mathbb{R}^{M}$    (4.52)
that minimizes the empirical risk functional
$R = \dfrac{1}{2N} \sum_{j=1}^{N} \| d_j - F(x_j) \|^2$    (4.53)
Suppose the network is trained with binary target values:
$d_{kj} = 1$ when the prototype $x_j$ belongs to class $C_k$, and $d_{kj} = 0$ when it does not    (4.54)

Output Representation and Decision Rule
Class $C_k$ is therefore represented by the M-dimensional target vector $[0, \ldots, 0, 1, 0, \ldots, 0]^T$, with the 1 in the k-th element.
An appropriate output decision rule for an MLP classifier is an approximation to the Bayes rule:
Classify the random vector x as belonging to class $C_k$ if
$F_k(x) > F_j(x)$ for all $j \neq k$    (4.55)
where $F_k(x)$ and $F_j(x)$ are elements of the vector-valued mapping function $F(x) = [F_1(x), F_2(x), \ldots, F_M(x)]^T$.
Alternatively, the vector x is assigned membership in a particular class if the corresponding output value is greater than some fixed threshold.
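Below is a short sketch of how the binary target coding of Eq. (4.54) and the decision rule of Eq. (4.55) might be applied to the outputs of a trained network. The function names and the sample output vector are illustrative assumptions; the output vector stands for whatever a trained MLP produces.

```python
import numpy as np

def one_hot_targets(labels, num_classes):
    """Eq. (4.54): binary target vectors, one per training pattern."""
    d = np.zeros((len(labels), num_classes))
    d[np.arange(len(labels)), labels] = 1.0
    return d

def classify(outputs):
    """Eq. (4.55): assign x to the class C_k whose output F_k(x) is largest."""
    return int(np.argmax(outputs))

# Example (hypothetical 3-class output vector of a trained MLP):
y = np.array([0.12, 0.81, 0.07])
print(classify(y))   # -> 1, i.e. class C_2 in 1-based numbering
```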
Computer Experiment
A computer experiment is used to analyze the behavior of a multilayer perceptron used as a pattern classifier.
Class 1:
$f_X(x \mid C_1) = \dfrac{1}{2\pi\sigma_1^2} \exp\!\left(-\dfrac{1}{2\sigma_1^2} \|x - \mu_1\|^2\right)$    (4.56)
with mean vector $\mu_1 = [0, 0]^T$ and variance $\sigma_1^2 = 1$.
Class 2:
$f_X(x \mid C_2) = \dfrac{1}{2\pi\sigma_2^2} \exp\!\left(-\dfrac{1}{2\sigma_2^2} \|x - \mu_2\|^2\right)$    (4.57)
with $\mu_2 = [2, 0]^T$ and $\sigma_2^2 = 4$.
The two classes are equiprobable: $p(C_1) = p(C_2) = 0.5$.

Computer Experiment
[Figure: plots of the densities in Eqs. (4.56) and (4.57).]

Class C1, Class C2, and joint scatter diagrams
[Figure: scatter plot of 500 points from Class 1; scatter plot of 500 points from Class 2; joint scatter diagram of Class 1 and Class 2.]

Bayesian Decision Boundary
Define the likelihood ratio and threshold
$\Lambda(x) = \dfrac{f_X(x \mid C_1)}{f_X(x \mid C_2)}$ and $\xi = \dfrac{p(C_2)}{p(C_1)}$
Decide $x \in C_1$ if $\Lambda(x) > \xi$, and $x \in C_2$ if $\Lambda(x) < \xi$.
With $p(C_1) = p(C_2) = 0.5$, we have $\xi = 1$, and $\Lambda(x) = 1$ defines the boundary.
Here
$\Lambda(x) = \dfrac{\sigma_2^2}{\sigma_1^2} \exp\!\left(-\dfrac{1}{2\sigma_1^2}\|x - \mu_1\|^2 + \dfrac{1}{2\sigma_2^2}\|x - \mu_2\|^2\right)$
Taking logarithms, the decision boundary
$\dfrac{\|x - \mu_1\|^2}{2\sigma_1^2} - \dfrac{\|x - \mu_2\|^2}{2\sigma_2^2} = \log\dfrac{\sigma_2^2}{\sigma_1^2}$
reduces to a circle $\|x - x_c\|^2 = r^2$.
Error: the resulting probability of error is
$P_e = P(e \mid C_1)\, p_1 + P(e \mid C_2)\, p_2 = 0.5 \times 0.1056 + 0.5 \times 0.2642 = 0.1849$
so the probability of correct classification is $P_c = 1 - P_e = 0.8151$.

Parameter setting (1): determining the optimal number of hidden neurons
A single hidden layer is used, and the BP algorithm is run in sequential mode. The relevant parameters are:
Number of hidden neurons: $m_1 \in (2, \infty)$
Learning-rate parameter: $\eta \in (0, 1)$
Momentum constant: $\alpha \in (0, 1)$
The goal is to use the fewest hidden neurons that achieve a performance close to that of the Bayesian classifier; the search starts with 2 hidden neurons.
The mean-square error reported in Table 4.2 is computed from Eq. (4.53).
A small MSE does not necessarily imply good generalization.

Parameter setting (1): determining the optimal number of hidden neurons
After the network has been trained on N samples and has converged, the probability of correct classification can be expressed as
$P(c, N) = p_1\, P(c, N \mid C_1) + p_2\, P(c, N \mid C_2)$, where $p_1 = p_2 = 0.5$
$P(c, N \mid C_1) = \int_{\Omega_1(N)} f_X(x \mid C_1)\, dx$, $P(c, N \mid C_2) = \int_{\Omega_2(N)} f_X(x \mid C_2)\, dx$
where $\Omega_1(N)$ denotes the region of the input space that the MLP assigns to class $C_1$.
In practice, however, $\Omega_1(N)$ is the result of training and is difficult to express analytically.
Experimentally, therefore, another set of samples is drawn at random from $C_1$ and $C_2$ (with equal probability) and used to test the trained MLP.

Parameter setting (1): determining the optimal number of hidden neurons
Let A be the number of correctly classified patterns in an independent test set of N samples (samples not belonging to the original training set); then
$\hat{p}_N = \dfrac{A}{N}$
Applying the Chernoff bound,
$P(|\hat{p}_N - p| \ge \epsilon) \le 2 \exp(-2\epsilon^2 N) = \delta$
With ε = 0.01 and δ = 0.01, this gives N ≈ 26500; a test set of 32000 samples is therefore used.
The $P_c$ values in Table 4.2 are averages over 10 test runs.
Note: although more hidden neurons can yield a lower MSE than two hidden neurons, the classification error rate is higher.

Parameter setting (2): determining the optimal learning-rate and momentum constants
A lower MSE over a training set does not necessarily imply good generalization.
Heuristic and experimental procedures dominate the optimal selection of η and α.
In this experiment, the MLP has two hidden neurons, with η ∈ {0.01, 0.1, 0.5, 0.9}, momentum α ∈ {0.0, 0.1, 0.5, 0.9}, a training set of 500 examples, and 700 training epochs.

[Figure: learning curves for η = 0.01, 0.1, 0.5, and 0.9, each with the four values of α.]

Best learning curves selected from the four parts
[Figure: MSE learning curves of the best (η, α) combinations.]
Conclusions:
1. A small η gives slow convergence, but makes it easier to find deeper local minima.
2. η → 0 with α → 1 increases the speed of convergence; if η → 1, then α must → 0 for the system to remain stable.
3. The combinations {η = 0.5, α = 0.9} and {η = 0.9, α = 0.9} cause oscillations, and the MSE to which they finally converge is high.

Parameter setting (3): determining the optimal network design
$M_{opt} = 2$ hidden neurons, $\eta_{opt} = 0.1$, $\alpha_{opt} = 0.5$.
Training set = 1000 patterns; test set = 32000 patterns.
20 independent runs are used to find the best, worst, and average performance.
[Figure: MSE learning curves (roughly 0.20 to 0.34) of the fastest, average, and slowest of the 20 runs over the training epochs.]
[Figure: the 3 best decision boundaries obtained in the 20 runs.]
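As a companion to the experiment above, here is a small NumPy sketch that draws samples from the two Gaussian classes of Eqs. (4.56)-(4.57) and applies the likelihood-ratio (Bayes) rule; with enough samples, the estimated probability of correct classification should be close to the $P_c = 0.8151$ quoted above. The sample size, seed, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32000                              # test-set size used in the experiment

# Draw N samples with equal priors p(C1) = p(C2) = 0.5.
labels = rng.integers(0, 2, size=N)    # 0 -> C1, 1 -> C2
mu = np.array([[0.0, 0.0], [2.0, 0.0]])
sigma2 = np.array([1.0, 4.0])
x = mu[labels] + rng.standard_normal((N, 2)) * np.sqrt(sigma2[labels])[:, None]

def log_gauss(x, m, s2):
    """Log of the circularly symmetric Gaussian density, Eqs. (4.56)-(4.57)."""
    d2 = np.sum((x - m) ** 2, axis=1)
    return -np.log(2 * np.pi * s2) - d2 / (2 * s2)

# Bayes rule with equal priors: decide C1 if log Lambda(x) > 0.
log_ratio = log_gauss(x, mu[0], sigma2[0]) - log_gauss(x, mu[1], sigma2[1])
decisions = (log_ratio <= 0).astype(int)   # 0 -> C1, 1 -> C2

print("estimated P_c =", np.mean(decisions == labels))   # approx. 0.815
```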
Generalization Capability
A network is said to generalize well when the input-output mapping computed by the network is correct for test data never used in creating or training the network.
It is assumed that the test data are drawn from the same population used to generate the training data.
The learning process may be viewed as a "curve-fitting" problem.
A neural network that is designed to generalize well will produce a correct input-output mapping even when the input is slightly different from the examples used to train the network.
When a neural network learns too many input-output examples, it may end up memorizing the training data (overfitting, or overtraining).
When the network is overtrained, it loses the ability to generalize between similar input-output patterns.

Generalization Capability (cont.)
Generalization is influenced by three factors:
1. the size of the training set
2. the architecture of the network
3. the physical complexity of the problem at hand

Generalization Capability (cont.)
Sufficient training-set size for a valid generalization
With the network architecture fixed, how large a training set is needed to obtain good generalization?
With the training set fixed, what network architecture achieves good generalization?
A rule of thumb based on the VC dimension:
$N = O\!\left(\dfrac{W}{\epsilon}\right)$    (4.85)
where W is the total number of free parameters (synaptic weights and biases) and ε denotes the fraction of classification errors permitted on test data.
For a 10% error, the number of training samples should be about 10 times the number of free parameters.

Generalization Capability (cont.)
Let d ∈ [-1, 1] be the desired (classification) response, M the total number of hidden nodes, ε the fraction of errors permitted on the test data, and W the total number of weights.
According to Baum & Haussler (1989), the network will almost certainly provide good generalization if:
1. the fraction of errors on the training set is less than ε/2, and
2. the training-set size satisfies
$N \ge \dfrac{32W}{\epsilon} \ln\!\left(\dfrac{32M}{\epsilon}\right)$

Approximation of functions
How many hidden layers must an MLP have in order to provide an approximate realization of any continuous mapping? (Universal Approximation Theorem; Cybenko 1989, Funahashi 1989, Hornik 1989.)
Let φ(·) be a nonconstant, bounded, monotone-increasing, continuous function, and let $I_{m_0}$ denote the $m_0$-dimensional unit hypercube $[0,1]^{m_0}$, with $C(I_{m_0})$ the space of continuous functions on $I_{m_0}$.
Then, given any $f \in C(I_{m_0})$ and ε > 0, there exist an integer M and constants $\alpha_i$, $b_i$, $w_{ij}$ such that
$F(x_1, \ldots, x_{m_0}) = \sum_{i=1}^{M} \alpha_i\, \varphi\!\left(\sum_{j=1}^{m_0} w_{ij} x_j + b_i\right)$
satisfies
$|F(x_1, \ldots, x_{m_0}) - f(x_1, \ldots, x_{m_0})| < \epsilon$ for all $\{x_1, \ldots, x_{m_0}\} \in I_{m_0}$
The inner sums correspond to the hidden-layer outputs of an MLP, and F corresponds to the network output.
Given sufficiently many hidden neurons (M large enough), an MLP can approximate any continuous function on $I_{m_0}$.
A single hidden layer is sufficient for an MLP to compute a uniform ε-approximation to a given training set.

Cross-validation
First, randomly partition the available data set into a training set and a test set; then further partition the training set into
(a) an estimation subset, used for training the model, and
(b) a validation subset, used for evaluating its performance.
1: The test set can be used to check for overfitting.
2: Training and validation can proceed side by side; if the validation performance stops improving and starts to deteriorate, it is time to stop.
3: The validation set can also be used to tune η: if the generalization MSE does not keep decreasing, reduce η.
Good generalization can be obtained with a small network, or with a large network trained for fewer epochs.
A sketch of this data partitioning is given below.
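A minimal sketch of the partitioning just described, assuming NumPy arrays; the specific split fractions are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def split_data(X, y, test_frac=0.2, val_frac=0.1, seed=0):
    """Randomly split the data into a test set, an estimation subset,
    and a validation subset, as described in the cross-validation slides."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))

    n_test = int(test_frac * len(X))
    test, rest = idx[:n_test], idx[n_test:]

    n_val = int(val_frac * len(rest))      # validation subset taken from the training set
    val, est = rest[:n_val], rest[n_val:]

    return (X[est], y[est]), (X[val], y[val]), (X[test], y[test])
```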
Cross-validation (cont.)
Periodic estimation-followed-by-validation (early stopping)
After a period of estimation (training), the synaptic weights and bias levels of the MLP are all fixed, and the network is operated in its forward mode; the validation error is then measured for each example in the validation subset.
When the validation phase is completed, estimation is resumed for another period, and the process is repeated.
When the number of training epochs passes the minimum of the validation curve, this indicates that the training data contain noise, and training should be stopped at that point.
If the training data are noise-free, how should the early-stopping point be determined?
If the estimation and validation errors cannot both approach zero, the network does not have the capacity to model the function accurately; one can only try to minimize the integrated squared error.

Cross-validation (cont.)
A statistical theory of the overfitting phenomenon
Amari (1996) examined the conditions under which the early-stopping method should be used, assuming batch learning and a single hidden layer.
Two modes of behavior are identified, depending on the size of the training set.
Nonasymptotic mode (N < 30W):
• The early-stopping method gives better generalization than exhaustive training.
• Overfitting occurs when N < 30W.
• The optimal fraction of the training samples to assign to the validation subset is
$r_{opt} = \dfrac{\sqrt{2W - 1} - 1}{2(W - 1)}$
• For large W, $r_{opt} \approx \dfrac{1}{\sqrt{2W}}$
• For example, with W = 100, $r_{opt} \approx 0.07$; that is, 93% of the training samples go to the estimation subset and 7% to the validation subset.

Cross-validation (cont.)
Asymptotic mode (N > 30W):
• Exhaustive learning is satisfactory when the size of the training sample is large compared to the number of network parameters.
Multifold cross-validation
Suitable when labeled examples are scarce.
Divide the available set of N examples into K subsets; hold out one subset for testing and train on the remaining subsets; after K trials, average the resulting MSE.
Leave-one-out method
• N-1 examples are used to train the model, and the model is validated by testing it on the example left out.

Network Pruning Techniques
The goal is to minimize the size of the network while maintaining good performance.
Network growing: start with a small network structure and add a neuron or a hidden layer whenever the design specification cannot be met.
Network pruning: start with a large network that provides an adequate solution to the problem at hand, and then delete certain neurons or synaptic connections.
Only network pruning is considered here, in two forms:
Regularization of certain synaptic connections of the network.
Deletion of certain synaptic connections from the network.

Network Pruning Techniques
Complexity Regularization
We need an appropriate tradeoff between reliability of the training data and goodness of the model.
This is done by minimizing the total risk
$R(w) = E_s(w) + \lambda\, E_c(w)$    (4.94)
where $E_s(w)$ is the standard performance measure (the MSE in an MLP), $E_c(w)$ is the complexity penalty, and λ is a regularization parameter.
The complexity penalty $E_c(w)$ may be chosen to measure the smoothness of F(x, w), for example by making the k-th derivative of F(x, w) with respect to the input vector x small.    (4.95)

Network Pruning Techniques
Three forms of complexity regularization are considered.
Weight decay
Forces some of the synaptic weights in the network toward zero, while permitting the others to retain relatively large values.
The weights are thereby roughly divided into two groups: those with a large influence and those with little influence (excess weights).
The excess weights are forced toward zero in order to improve generalization:
$E_c(w) = \|w\|^2 = \sum_{i \in C_{total}} w_i^2$    (4.96)
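A minimal sketch of how the weight-decay penalty of Eq. (4.96) enters a gradient-based update of the total risk of Eq. (4.94); the parameter names and default values are illustrative assumptions.

```python
import numpy as np

def regularized_update(w, grad_Es, eta=0.1, lam=1e-3):
    """Gradient step on R(w) = E_s(w) + lam * ||w||^2  (Eqs. 4.94 and 4.96).

    w:       current weight vector (or any array of weights)
    grad_Es: gradient of the standard performance measure E_s with respect to w
    """
    grad_R = grad_Es + 2.0 * lam * w     # derivative of the weight-decay penalty is 2*lam*w
    return w - eta * grad_R              # steepest-descent step on the total risk
```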
Network Pruning Techniques
Weight elimination
The complexity penalty is defined as
$E_c(w) = \sum_{i \in C_{total}} \dfrac{(w_i / w_0)^2}{1 + (w_i / w_0)^2}$    (4.97)
where $w_0$ is a preassigned parameter.
When $|w_i| \ll w_0$, the complexity penalty for $w_i$ approaches zero; this means that the i-th synaptic weight is unreliable and should be deleted from the network.
When $|w_i| \gg w_0$, the complexity penalty for $w_i$ approaches one; this means that the i-th synaptic weight is important.

Network Pruning
[Figure: plot of the weight-elimination penalty term, showing that the complexity penalty approaches 0 when $|w_i| \ll w_0$.]

Network Pruning
Approximate smoother
Applicable to an MLP with a single hidden layer and a single output neuron:
$E_c(w) = \sum_{j=1}^{M} w_{oj}^2\, \|w_j\|^{p}$    (4.98)
where $w_{oj}$ is the output-layer weight connected to hidden neuron j, and $w_j$ is the weight vector of the j-th hidden neuron.
This approach is better than weight decay and weight elimination because
• it treats the hidden-layer and output-layer synaptic weights separately, and
• it captures the interaction between the two sets of weights,
• but it is more complicated to apply.

Hessian-based network pruning (newer formulation)
Second-order (Hessian) information about the error surface is used to prune the network; the aim is a tradeoff between network complexity and training-error performance.
A Taylor series gives a local approximation of the cost function $E_{av}$ around the operating point w:
$E_{av}(w + \Delta w) = E_{av}(w) + g^T(w)\, \Delta w + \frac{1}{2}\, \Delta w^T H\, \Delta w + O(\|\Delta w\|^3)$
where Δw is a perturbation, g(w) is the gradient evaluated at w, and H is the Hessian matrix.
The goal is to identify the parameters whose deletion from the MLP causes the least increase in the cost function. Two assumptions are made:
Extremal approximation: parameters are deleted only after the network has been trained to convergence, so the gradient g is taken to be zero.
Quadratic approximation: the error surface around a local or global minimum is nearly quadratic, so the higher-order terms can be neglected.
Hence
$\Delta E_{av} = E_{av}(w + \Delta w) - E_{av}(w) \approx g^T(w)\, \Delta w + \frac{1}{2}\, \Delta w^T H\, \Delta w \approx \frac{1}{2}\, \Delta w^T H\, \Delta w$
because when the network converges, the first derivative satisfies $g(w) \approx 0$.

Hessian Matrix of the Error Surface (older formulation)
After convergence, we wish to remove some weights while disturbing E as little as possible.
Expanding δE in a Taylor series in the weight perturbations $\delta w_i$:
$\delta E = \sum_i g_i\, \delta w_i + \frac{1}{2} \sum_i \sum_j h_{ij}\, \delta w_i\, \delta w_j + \text{higher-order terms}$
where $g_i = \dfrac{\partial E}{\partial w_i}$ and $h_{ij} = \dfrac{\partial^2 E}{\partial w_i\, \partial w_j}$.
1. At an equilibrium point (a local minimum), $\partial E / \partial w_i = 0$, so the first-order term vanishes.
2. If E is (approximately) quadratic, the higher-order terms are ≈ 0, so
$\delta E \approx \frac{1}{2} \sum_i \sum_j h_{ij}\, \delta w_i\, \delta w_j = \frac{1}{2}\, \delta w^T H\, \delta w$
Removing weight $w_i$ amounts to applying the perturbation $\delta w_i = -w_i$, i.e. imposing the constraint
$1_i^T\, \delta w + w_i = 0$
where $1_i = [0, \ldots, 0, 1, 0, \ldots, 0]^T$ is the unit vector whose i-th element equals 1.
To minimize δE subject to the constraint $1_i^T \delta w + w_i = 0$, form the Lagrangian
$S = \frac{1}{2}\, \delta w^T H\, \delta w + \lambda\, (1_i^T\, \delta w + w_i)$

Solving the Lagrangian S with respect to δw
1. Setting the derivatives of S to zero:
$\dfrac{\partial S}{\partial (\delta w_i)} = \sum_j h_{ij}\, \delta w_j + \lambda = 0$, and $\dfrac{\partial S}{\partial (\delta w_k)} = \sum_j h_{kj}\, \delta w_j = 0$ for $k \neq i$
In vector form, $H\, \delta w + \lambda\, 1_i = 0$, so
$\delta w = -\lambda\, H^{-1} 1_i$    (*)
Substituting (*) into the constraint $1_i^T \delta w + w_i = 0$ gives
$-\lambda\, 1_i^T H^{-1} 1_i + w_i = -\lambda\, [H^{-1}]_{ii} + w_i = 0$
where $[H^{-1}]_{ii}$ is the (i, i)-th element of $H^{-1}$; hence
$\lambda = \dfrac{w_i}{[H^{-1}]_{ii}}$
Substituting λ back into (*):
$\delta w = -\dfrac{w_i}{[H^{-1}]_{ii}}\, H^{-1} 1_i$
This is the change required in all the weights so that weight $w_i$ is removed while disturbing the rest of the network as little as possible.

Solving the Lagrangian S with respect to δw
Substituting $\delta w = -\dfrac{w_i}{[H^{-1}]_{ii}}\, H^{-1} 1_i$ into $S = \frac{1}{2}\, \delta w^T H\, \delta w + \lambda\, (1_i^T \delta w + w_i)$:
The first term becomes $\frac{1}{2}\, \dfrac{w_i^2}{[H^{-1}]_{ii}^2}\, 1_i^T H^{-1} H H^{-1} 1_i = \frac{1}{2}\, \dfrac{w_i^2}{[H^{-1}]_{ii}}$.
The second term is zero by the constraint, which can be verified from
$1_i^T\!\left(-\dfrac{w_i}{[H^{-1}]_{ii}}\, H^{-1} 1_i\right) + w_i = -w_i + w_i = 0$
so weight $w_i$ is indeed removed after the update δw is applied.
The saliency of weight $w_i$ is therefore
$S_i = \dfrac{w_i^2}{2\,[H^{-1}]_{ii}}$

Computing the inverse Hessian matrix $H^{-1}$
When the dimension is large, $H^{-1}$ is difficult to obtain, where $H = \dfrac{\partial^2 E_{av}}{\partial w^2}$.
Starting from
$E_{av}(w) = \dfrac{1}{2N} \sum_{n=1}^{N} (d(n) - o(n))^2 = \dfrac{1}{2N} \sum_{n=1}^{N} (d(n) - F(w, x))^2$
where N is the total number of examples, we have
$\dfrac{\partial E_{av}}{\partial w} = -\dfrac{1}{N} \sum_{n=1}^{N} \dfrac{\partial F(w, x)}{\partial w}\, (d(n) - F(w, x))$
$H(N) = \dfrac{\partial^2 E_{av}}{\partial w^2} = \dfrac{1}{N} \sum_{n=1}^{N} \left[ \dfrac{\partial F(w, x)}{\partial w}\, \dfrac{\partial F(w, x)}{\partial w}^T - \dfrac{\partial^2 F(w, x)}{\partial w^2}\, (d(n) - F(w, x)) \right]$
When the network is fully trained, $d(n) - o(n) \rightarrow 0$, so the second term can be neglected.
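Before continuing with the recursive estimation of $H^{-1}$ below, here is a NumPy sketch of the pruning step derived above: given an estimate of the inverse Hessian, compute the saliency of every weight, remove the weight with the smallest saliency, and adjust all remaining weights by δw. The function name is an illustrative assumption.

```python
import numpy as np

def prune_one_weight(w, H_inv):
    """Hessian-based pruning step using the saliency S_i = w_i^2 / (2 [H^-1]_ii).

    w:     current weight vector, shape (W,)
    H_inv: (approximate) inverse Hessian of the cost with respect to w, shape (W, W)
    """
    # Saliency of each weight.
    saliency = w**2 / (2.0 * np.diag(H_inv))
    i = int(np.argmin(saliency))           # the weight whose removal hurts least

    # delta_w = -(w_i / [H^-1]_ii) * H^-1 * 1_i  adjusts all the weights at once.
    delta_w = -(w[i] / H_inv[i, i]) * H_inv[:, i]
    w_new = w + delta_w                    # w_new[i] is (numerically) zero
    return w_new, i, saliency[i]
```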
Computing the inverse Hessian matrix $H^{-1}$ (cont.)
Let
$\psi(n) = \dfrac{1}{\sqrt{N}}\, \dfrac{\partial F(w, x(n))}{\partial w}$
Then
$H(n) = \sum_{k=1}^{n} \psi(k)\, \psi^T(k) = H(n-1) + \psi(n)\, \psi^T(n)$, n = 1, 2, ..., N    (4.110)
Let $A = B^{-1} + C D C^T$; by the matrix inversion lemma,
$A^{-1} = B - B C\, (D^{-1} + C^T B C)^{-1}\, C^T B$    (*)
Comparing Eq. (4.110) with (*), we identify A = H(n), $B^{-1} = H(n-1)$, C = ψ(n), and D = 1. Then we have
$H^{-1}(n) = H^{-1}(n-1) - \dfrac{H^{-1}(n-1)\, \psi(n)\, \psi^T(n)\, H^{-1}(n-1)}{1 + \psi^T(n)\, H^{-1}(n-1)\, \psi(n)}$
where the denominator is a scalar. Initializing with $H^{-1}(0) = \delta^{-1} I$ for a small positive constant δ, $H^{-1}(n)$ can be computed iteratively.

Virtues and limitations of BP training
Properties of BP:
It is a gradient-descent technique, not an optimization technique as such.
It is simple to compute locally.
It performs stochastic gradient descent (in sequential mode).
Connectionism: the algorithm relies only on local computations.
Feature detection: the hidden neurons play a critical role as feature detectors.
Function approximation: an MLP trained with BP is a nested sigmoidal scheme.
Computational efficiency: the computational cost of each update is polynomial in the number of adjustable parameters, so the BP algorithm is computationally efficient.
Sensitivity analysis: BP permits a sensitivity analysis of the input-output realization of the network.
Robustness.

Virtues and limitations of BP training
Convergence
BP uses an instantaneous estimate of the gradient of the error surface in weight space, and is therefore stochastic in nature.
When the error surface is fairly flat, convergence is very slow; conversely, when the error surface is steeply curved, the algorithm may overshoot the minimum of the error surface.
The direction of the negative gradient vector may point away from the minimum.
Local minima
BP learning is basically a hill-climbing technique, so it runs the risk of being trapped in a local minimum.
Scaling
Scaling concerns how well the network behaves as the computational task increases in size and complexity; e.g., to perform the parity function, it was found that the computation time increases exponentially with the number of inputs.
Using a fully connected MLP is therefore unwise; the network architecture and the constraints imposed on the synaptic weights should incorporate prior information about the problem to be solved.

Accelerated Convergence
Heuristics:
1. Every adjustable network parameter should have its own learning rate.
2. Every learning rate should be allowed to vary from one iteration to the next.
3. When the derivative $\partial E / \partial w_i$ keeps the same algebraic sign over consecutive iterations, the learning rate for that weight should be increased.
4. When the sign of $\partial E / \partial w_i$ alternates over consecutive iterations, the learning rate for that weight should be decreased.
In other words, use a different and time-varying η for each weight.

Supervised Learning Viewed as an Optimization Problem
The supervised training of an MLP may be viewed as a problem in numerical optimization.
A Taylor series gives a local approximation of the cost function $E_{av}$ around the current point w(n):
$E_{av}(w(n) + \Delta w(n)) = E_{av}(w(n)) + g^T(n)\, \Delta w(n) + \frac{1}{2}\, \Delta w^T(n)\, H(n)\, \Delta w(n) + (\text{higher-order terms})$
where Δw(n) is the perturbation, and
$g(n) = \left.\dfrac{\partial E_{av}}{\partial w}\right|_{w = w(n)}$ is the gradient and $H(n) = \left.\dfrac{\partial^2 E_{av}}{\partial w^2}\right|_{w = w(n)}$ is the Hessian matrix.

Supervised Learning Viewed as an Optimization Problem
The standard steepest-descent weight correction of the BP algorithm is
$\Delta w(n) = -\eta\, g(n)$
Its advantage is simplicity of implementation; its drawback is that convergence is too slow for large problems. Even when a momentum term is added, another parameter must be tuned.
For a significant improvement in convergence performance, higher-order information must be used in the training process.
Assuming a quadratic approximation of the error surface around the current point w(n), the optimal correction of the weight vector is the Newton step
$\Delta w^*(n) = -H^{-1}(n)\, g(n)$
The problem is that $H^{-1}$ is difficult to compute.
The conjugate-gradient method is therefore adopted: it accelerates convergence while avoiding the computational complexity of evaluating $H^{-1}$.

Conjugate-Gradient Method
A-conjugate vectors
Given the matrix A, a set of nonzero vectors s(0), s(1), ..., s(W-1) is said to be A-conjugate if
$s^T(n)\, A\, s(j) = 0$ for all $n \neq j$
A-conjugate vectors are linearly independent.
Proof: suppose, to the contrary, that $\sum_j c_j\, s(j) = 0$ with some coefficient $c_k \neq 0$. Premultiplying by $s^T(k) A$ and using A-conjugacy leaves $c_k\, s^T(k)\, A\, s(k) = 0$; since A is positive definite (as assumed above) and s(k) is nonzero, $s^T(k)\, A\, s(k)$ cannot be zero, so $c_k = 0$, a contradiction.
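As a small illustration of the A-conjugacy condition just defined, the sketch below builds a set of A-conjugate directions for a symmetric positive-definite matrix by Gram-Schmidt orthogonalization in the A-inner product; the example matrix and the function name are illustrative assumptions.

```python
import numpy as np

def a_conjugate_basis(A):
    """Construct a set of A-conjugate vectors by Gram-Schmidt with the
    inner product <u, v>_A = u^T A v (A symmetric positive definite)."""
    W = A.shape[0]
    S = []
    for e in np.eye(W):                      # start from the standard basis
        s = e.copy()
        for p in S:                          # remove components along previous directions
            s -= (p @ A @ e) / (p @ A @ p) * p
        S.append(s)
    return np.array(S)

A = np.array([[4.0, 1.0], [1.0, 3.0]])       # symmetric positive definite
S = a_conjugate_basis(A)
print(S[0] @ A @ S[1])                       # approx. 0: the directions are A-conjugate
```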
Conjugate-Gradient Method
Consider the minimization of the quadratic function
$f(x) = \frac{1}{2}\, x^T A x - b^T x + c$    (4.122)
where A is a W-by-W symmetric, positive-definite matrix. Setting the derivative $A x - b = 0$ shows that the minimizer is
$x^* = A^{-1} b$    (4.123)

Conjugate-Gradient Method
Example 4.1
[Figure: level curves of the quadratic function f(x) with $x = [x_0, x_1]^T$, and of the same function in the transformed coordinates $v = A^{1/2} x$, in which the A-conjugate direction vectors become orthogonal direction vectors.]

Conjugate-Gradient Method
For a given set of A-conjugate vectors s(0), s(1), ..., s(W-1), the conjugate-direction method for unconstrained minimization of f(x) is
$x(n+1) = x(n) + \eta(n)\, s(n)$, n = 0, 1, ..., W-1    (4.125)
where η(n) is a scalar defined by the line minimization
$f(x(n) + \eta(n)\, s(n)) = \min_{\eta} f(x(n) + \eta\, s(n))$    (4.126)

Conjugate-Gradient Method
(1) Substitute Eq. (4.125) into Eq. (4.122) to express f as a function of η.
(2) Differentiate f with respect to η and set the derivative to zero to solve for η(n).
(3) The conjugate-direction method therefore guarantees that, starting from an arbitrary x(0), the optimal solution $x^* = A^{-1} b$ is found in at most W iterations.
Accordingly, the conjugate-direction method minimizes f(x) progressively over the expanding linear subspace $D_n$ spanned by the direction vectors used so far; that is, x(n+1) minimizes f over x(0) plus the span of s(0), ..., s(n).

Conjugate-Gradient Method
Note: the method above requires the A-conjugate vectors s(0), s(1), ..., s(W-1) to be known in advance. The conjugate-gradient method instead generates them recursively:
$s(n) = r(n) + \beta(n)\, s(n-1)$, with $s(0) = r(0)$
where $r(n) = b - A x(n)$ is the residual; this term is in fact equal to the negative gradient $-\partial f / \partial x$ at x(n), i.e. it is the gradient-descent direction.

Conjugate-Gradient Method
Derivation of β(n): s(n) and s(n-1) must be A-conjugate, so
$s^T(n-1)\, A\, s(n) = s^T(n-1)\, A\, [r(n) + \beta(n)\, s(n-1)] = 0$
which gives
$\beta(n) = -\dfrac{s^T(n-1)\, A\, r(n)}{s^T(n-1)\, A\, s(n-1)}$
The directions s(n) obtained from this β(n) are mutually A-conjugate.

Conjugate-Gradient Method
For computational reasons, it would be desirable to evaluate β(n) without explicit knowledge of A. Two such formulas have been proposed:
Polak-Ribière formula: $\beta(n) = \dfrac{r^T(n)\, [r(n) - r(n-1)]}{r^T(n-1)\, r(n-1)}$ (superior for non-quadratic optimization)
Fletcher-Reeves formula: $\beta(n) = \dfrac{r^T(n)\, r(n)}{r^T(n-1)\, r(n-1)}$

Conjugate-Gradient Method
For non-quadratic problems, the Polak-Ribière form is superior to the Fletcher-Reeves form:
(a) During the search the direction vectors may gradually deviate, causing the algorithm to jam, i.e. s(n) becomes nearly perpendicular to r(n); in that case r(n) ≈ r(n-1), so the Polak-Ribière β(n) ≈ 0 and s(n) ≈ r(n), which breaks the jam. The Polak-Ribière form is therefore preferred.
(b) The Polak-Ribière method may, however, cycle indefinitely. The remedy is to use β = max{β_PR, 0}; that is, whenever β_PR < 0, set β = 0, which amounts to restarting the search along the gradient-descent direction.

Conjugate-Gradient Method
When A is not known, η(n) is computed by a line search: minimize $E_{av}(w + \eta\, s)$ with respect to η, where w + ηs traces a line in the W-dimensional vector space of w. The line search has two phases:
Bracketing phase: find a bracket, i.e. an interval known to contain the minimum.
Sectioning phase: generate the next sequence of trial values of η within the bracket.

Conjugate-Gradient Method
Example: suppose η1 < η2 < η3 with $E_{av}(\eta_1) > E_{av}(\eta_2)$ and $E_{av}(\eta_3) > E_{av}(\eta_2)$ (that is, between η1 and η3 there is a point η2 at which $E_{av}$ is smaller than at either endpoint). Then the minimum lies in the interval [η1, η3].

Conjugate-Gradient Method
Fit a parabola through the three points η1, η2, η3; find the vertex η4 of this parabola and evaluate $E_{av}(\eta_4)$.
Conjugate-Gradient Method
If $E_{av}(\eta_4) < E_{av}(\eta_2)$, so that η1 < η2 < η4 < η3 with $E_{av}(\eta_1) > E_{av}(\eta_3) > E_{av}(\eta_2) > E_{av}(\eta_4)$, then the minimum lies in the interval [η2, η3]; this smaller interval becomes the new bracket, and the fit-and-evaluate step is repeated until the bracket is sufficiently small.
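To tie the section together, here is a compact sketch that combines the pieces developed above: a Polak-Ribière conjugate-gradient driver (with the β = max{β_PR, 0} restart rule) and a simple line search built from the bracketing-and-parabolic-sectioning procedure just described. The function names, step sizes, and tolerances are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def line_search(e, step=1.0, tol=1e-4, max_iter=30):
    """Minimize the 1-D function e(eta) = E_av(w + eta*s) by bracketing
    followed by parabolic sectioning, as described in the text."""
    # Bracketing phase: find eta1 < eta2 < eta3 with e(eta2) below both ends.
    eta1, eta2 = 0.0, step
    while e(eta2) > e(eta1) and step > tol:       # shrink until the first step goes downhill
        step *= 0.5
        eta2 = step
    eta3 = eta2 + step
    while e(eta3) < e(eta2):                      # expand until the error rises again
        eta1, eta2, eta3 = eta2, eta3, eta3 + 2.0 * (eta3 - eta2)

    # Sectioning phase: fit a parabola through (eta1, eta2, eta3), move to its
    # vertex eta4, and keep the sub-bracket that still contains the minimum.
    for _ in range(max_iter):
        x = np.array([eta1, eta2, eta3])
        y = np.array([e(eta1), e(eta2), e(eta3)])
        a, b, _c = np.polyfit(x, y, 2)
        eta4 = -b / (2.0 * a) if a > 0 else eta2
        eta4 = min(max(eta4, eta1), eta3)         # keep the vertex inside the bracket
        if abs(eta4 - eta2) < tol:
            break
        if eta4 > eta2:
            eta1, eta2, eta3 = (eta2, eta4, eta3) if e(eta4) < e(eta2) else (eta1, eta2, eta4)
        else:
            eta1, eta2, eta3 = (eta1, eta4, eta2) if e(eta4) < e(eta2) else (eta4, eta2, eta3)
    return eta2

def conjugate_gradient(cost, grad, w0, n_iter=100, tol=1e-6):
    """Polak-Ribiere conjugate-gradient minimization of cost(w)."""
    w = w0.copy()
    r = -grad(w)                    # residual = negative gradient
    s = r.copy()                    # first search direction: steepest descent
    for _ in range(n_iter):
        eta = line_search(lambda t: cost(w + t * s))
        w = w + eta * s
        r_new = -grad(w)
        if np.linalg.norm(r_new) < tol:
            break
        beta = max(r_new @ (r_new - r) / (r @ r), 0.0)   # beta = max{beta_PR, 0}
        s = r_new + beta * s        # next conjugate direction
        r = r_new
    return w
```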