Function Approximation
Fariba Sharifian, Somaye Kafi
Spring 2006

Contents
- Introduction to Counterpropagation
- Full Counterpropagation: architecture, algorithm, application, example
- Forward-Only Counterpropagation: architecture, algorithm, application, example
- Function Approximation Using Neural Networks
  - Introduction
  - Development of Neural Network Weight Equations
  - Algebraic Training Algorithms
    - Exact Matching of Function Input-Output Data
    - Approximate Matching of Gradient Data in Algebraic Training
    - Approximate Matching of Function Input-Output Data
    - Exact Matching of Function Gradient Data

Introduction to Counterpropagation
- Counterpropagation networks are multilayer networks built from a combination of input, clustering, and output layers.
- They can be used to compress data, to approximate functions, or to associate patterns.
- They approximate their training input vector pairs by adaptively constructing a lookup table.
- Training has two stages: clustering, then output weight updating.
- There are two types: full and forward-only counterpropagation.

Full Counterpropagation
Produces an approximation x*:y* based on
- input of an x vector only,
- input of a y vector only, or
- input of an x:y pair, possibly with some distorted or missing elements in either or both vectors.

Full Counterpropagation: Phase 1
The units in the cluster layer compete, and only the winning unit (index J) is allowed to learn. The learning rule for the weights into the winning cluster unit is standard Kohonen learning:
$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \quad i = 1, 2, \dots, n$
$u_{kJ}^{new} = u_{kJ}^{old} + \beta\,(y_k - u_{kJ}^{old}), \quad k = 1, 2, \dots, m$

Full Counterpropagation: Phase 2
The weights from the winning cluster unit Z_J to the output units are adjusted so that the vector of activations of the Y output layer, y*, is an approximation to the input vector y, and x* is an approximation to the input vector x. The weight updates for the Y output and X output layers follow Grossberg learning:
$v_{Jk}^{new} = v_{Jk}^{old} + a\,(y_k - v_{Jk}^{old}), \quad k = 1, 2, \dots, m$
$t_{Ji}^{new} = t_{Ji}^{old} + b\,(x_i - t_{Ji}^{old}), \quad i = 1, 2, \dots, n$

Architecture of Full Counterpropagation
[Figure: X input units X1..Xn and Y input units Y1..Ym feed the cluster layer Z1..Zp through weights w and u; the cluster layer feeds the Y* output units through weights v and the X* output units through weights t.]

Full Counterpropagation: Nomenclature
- x: input training vector, x = (x1, ..., xi, ..., xn)
- y: target output corresponding to input x, y = (y1, ..., yk, ..., ym)
- z_j: activation of cluster layer unit Z_j
- x*: computed approximation to vector x
- y*: computed approximation to vector y
- w_{ij}: weight from X input unit X_i to cluster unit Z_j
- u_{kj}: weight from Y input unit Y_k to cluster unit Z_j
- v_{jk}: weight from cluster unit Z_j to Y output unit Y*_k
- t_{ji}: weight from cluster unit Z_j to X output unit X*_i
- α, β: learning rates for the weights into the cluster layer (Kohonen learning)
- a, b: learning rates for the weights out of the cluster layer (Grossberg learning)
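As a quick illustration of the Phase 1 and Phase 2 learning rules above, here is a minimal Python sketch of the winning-unit updates, assuming numpy weight arrays and a winner index J already selected; the function names and array layout are illustrative choices, not from the slides:

```python
import numpy as np

def full_cpn_phase1_update(w, u, x, y, J, alpha, beta):
    """Kohonen update of the winning cluster unit's incoming weights.

    w : (n, p) weights from the X input layer to the p cluster units
    u : (m, p) weights from the Y input layer to the p cluster units
    J : index of the winning cluster unit
    """
    w[:, J] += alpha * (x - w[:, J])
    u[:, J] += beta * (y - u[:, J])

def full_cpn_phase2_update(v, t, x, y, J, a, b):
    """Grossberg update of the winning cluster unit's outgoing weights.

    v : (p, m) weights from the cluster units to the Y* output layer
    t : (p, n) weights from the cluster units to the X* output layer
    """
    v[J, :] += a * (y - v[J, :])
    t[J, :] += b * (x - t[J, :])
```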
Full Counterpropagation Algorithm (Phase 1)
Step 1. Initialize weights, learning rates, etc.
Step 2. While the stopping condition for Phase 1 is false, do Steps 3-8.
Step 3. For each training input pair x:y, do Steps 4-6.
Step 4. Set X input layer activations to vector x; set Y input layer activations to vector y.
Step 5. Find the winning cluster unit; call its index J.
Step 6. Update the weights into unit Z_J:
$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \quad i = 1, \dots, n$
$u_{kJ}^{new} = u_{kJ}^{old} + \beta\,(y_k - u_{kJ}^{old}), \quad k = 1, \dots, m$
Step 7. Reduce the learning rates α and β.
Step 8. Test the stopping condition for Phase 1 training.

Full Counterpropagation Algorithm (Phase 2)
(Note: α and β are small, constant values during Phase 2.)
Step 9. While the stopping condition for Phase 2 is false, do Steps 10-16.
Step 10. For each training input pair x:y, do Steps 11-14.
Step 11. Set X input layer activations to vector x; set Y input layer activations to vector y.
Step 12. Find the winning cluster unit; call its index J.
Step 13. Update the weights into unit Z_J (α and β are small):
$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \quad i = 1, \dots, n$
$u_{kJ}^{new} = u_{kJ}^{old} + \beta\,(y_k - u_{kJ}^{old}), \quad k = 1, \dots, m$
Step 14. Update the weights from unit Z_J to the output layers:
$v_{Jk}^{new} = v_{Jk}^{old} + a\,(y_k - v_{Jk}^{old}), \quad k = 1, \dots, m$
$t_{Ji}^{new} = t_{Ji}^{old} + b\,(x_i - t_{Ji}^{old}), \quad i = 1, \dots, n$
Step 15. Reduce the learning rates a and b.
Step 16. Test the stopping condition for Phase 2 training.

Which cluster is the winner?
- Dot product: find the cluster unit with the largest net input,
  $net_j = \sum_i x_i w_{ij} + \sum_k y_k u_{kj}$
- Euclidean distance: find the cluster unit with the smallest squared distance from the input,
  $D_j = \sum_i (x_i - w_{ij})^2 + \sum_k (y_k - u_{kj})^2$

Full Counterpropagation: Application
Step 0. Initialize weights (by training as above).
Step 1. For each input pair x:y, do Steps 2-4.
Step 2. Set X input layer activations to vector x; set Y input layer activations to vector y.
Step 3. Find the cluster unit Z_J that is closest to the input pair.
Step 4. Compute the approximations to x and y:
$x_i^* = t_{Ji}, \qquad y_k^* = v_{Jk}$

Full Counterpropagation Example
Function approximation of y = 1/x. After the training phase the weights are approximately:

Cluster unit | x-side weight (w ≈ t) | y-side weight (u ≈ v)
z1  | 0.11 | 9.0
z2  | 0.14 | 7.0
z3  | 0.20 | 5.0
z4  | 0.30 | 3.3
z5  | 0.60 | 1.6
z6  | 1.60 | 0.6
z7  | 3.30 | 0.3
z8  | 5.00 | 0.2
z9  | 7.00 | 0.14
z10 | 9.00 | 0.11

[Figure: the trained network, with X1 connected to cluster units Z1..Z10 by the x-side weights above, and the cluster units connected to Y1* and X1* by the corresponding output weights.]

To approximate y for x = 0.12, when nothing is known about y, compute D_j using only the x part of the input:
$D_1 = (0.12 - 0.11)^2 = 0.0001$, $D_2 = 0.0004$, $D_3 = 0.0064$, $D_4 = 0.032$, $D_5 = 0.23$, $D_6 = 2.2$, $D_7 = 10.1$, $D_8 = 23.8$, $D_9 = 47.3$, $D_{10} = 78.9$.
The winner is z1, so the network returns $y^* = v_{1} = 9.0$ (the exact value 1/0.12 ≈ 8.3).
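The application phase of this example is simple enough to reproduce directly. The following sketch uses the trained lookup table above; the function name and structure are illustrative:

```python
import numpy as np

# Trained lookup table from the y = 1/x example: x-side weights w and
# y-side output weights v for cluster units z1..z10.
w = np.array([0.11, 0.14, 0.2, 0.3, 0.6, 1.6, 3.3, 5.0, 7.0, 9.0])
v = np.array([9.0, 7.0, 5.0, 3.3, 1.6, 0.6, 0.3, 0.2, 0.14, 0.11])

def approximate_y(x):
    """Full-CPN application when only x is known: pick the cluster unit
    whose x-weight is closest to x and return its output weight."""
    D = (x - w) ** 2          # squared distances, x part only
    J = int(np.argmin(D))     # winning cluster unit
    return v[J]

print(approximate_y(0.12))    # -> 9.0, while 1/0.12 is about 8.3
```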
Forward-Only Counterpropagation
- A simplified version of full counterpropagation.
- Intended to approximate a function y = f(x) that is not necessarily invertible.
- It may be used if the mapping from x to y is well defined, but the mapping from y to x is not.

Forward-Only Counterpropagation Architecture
[Figure: input layer X1..Xn connected to the cluster layer Z1..Zp by weights w; cluster layer connected to the output layer Y1..Ym by weights u.]

Forward-Only Counterpropagation Algorithm (Phase 1)
Step 1. Initialize weights, learning rates, etc.
Step 2. While the stopping condition for Phase 1 is false, do Steps 3-8.
Step 3. For each training input x, do Steps 4-6.
Step 4. Set X input layer activations to vector x.
Step 5. Find the winning cluster unit; call its index J.
Step 6. Update the weights into unit Z_J:
$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \quad i = 1, 2, \dots, n$
Step 7. Reduce the learning rate α.
Step 8. Test the stopping condition for Phase 1 training.

Forward-Only Counterpropagation Algorithm (Phase 2)
(Note: α is a small, constant value during Phase 2.)
Step 9. While the stopping condition for Phase 2 is false, do Steps 10-16.
Step 10. For each training input pair x:y, do Steps 11-14.
Step 11. Set X input layer activations to vector x; present the target vector y to the Y output layer.
Step 12. Find the winning cluster unit; call its index J.
Step 13. Update the weights into unit Z_J (α is small):
$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \quad i = 1, 2, \dots, n$
Step 14. Update the weights from unit Z_J to the output layer:
$u_{Jk}^{new} = u_{Jk}^{old} + a\,(y_k - u_{Jk}^{old}), \quad k = 1, 2, \dots, m$
Step 15. Reduce the learning rate a.
Step 16. Test the stopping condition for Phase 2 training.

Forward-Only Counterpropagation: Application
Step 0. Initialize weights (by training as above).
Step 1. Present input vector x.
Step 2. Find the unit J closest to vector x.
Step 3. Set the activations of the output units: $y_k = u_{Jk}$.

Forward-Only Counterpropagation Example
Function approximation of y = 1/x. After the training phase the weights are approximately:

Cluster unit | w | u
z1  | 0.5 | 5.5
z2  | 1.5 | 0.75
z3  | 2.5 | 0.4
... | ... | ...
z10 | 9.5 | 0.1
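For completeness, here is a small sketch of two-phase forward-only training on y = 1/x. The sampling of x, the number of epochs, and the learning-rate schedule are arbitrary illustrative choices, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10                                   # number of cluster units
w = rng.uniform(0.1, 10.0, size=p)       # x-to-cluster weights
u = np.zeros(p)                          # cluster-to-output weights

def winner(x):
    return int(np.argmin((x - w) ** 2))  # Euclidean winner on the x part

# Phase 1: cluster the inputs (Kohonen learning), decreasing alpha.
alpha = 0.5
for epoch in range(200):
    for x in rng.uniform(0.1, 10.0, 100):
        J = winner(x)
        w[J] += alpha * (x - w[J])
    alpha *= 0.98

# Phase 2: learn the output weights (alpha small and fixed, rate a decreasing).
alpha, a = 0.01, 0.3
for epoch in range(200):
    for x in rng.uniform(0.1, 10.0, 100):
        y = 1.0 / x
        J = winner(x)
        w[J] += alpha * (x - w[J])
        u[J] += a * (y - u[J])
    a *= 0.98

print(u[winner(2.5)])   # rough lookup-table approximation to 1/x near x = 2.5
```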
Function Approximation Using Neural Networks: Introduction
- Goal: an analytical description for a set of data, often referred to as data modeling or system identification.
- Standard tools: splines, wavelets, and neural networks.

Why Use a Neural Network?
- Splines and wavelets do not generalize well to higher-dimensional spaces (more than about three inputs).
- Neural networks are universal approximators with a parallel architecture, and they can be trained to map multidimensional nonlinear functions.
- They are central to the solution of differential equations and provide differentiable, closed-analytic-form solutions.
- They have very good generalization properties and are widely applicable.
- Training a network to match data translates into a set of nonlinear, transcendental weight equations.
- The cascade structure confines the nonlinearity to the hidden nodes, while the input and output layers perform linear operations.

Function Approximation Using Neural Networks
- The functions to be modeled are not known analytically, but a set of precise input-output samples is available.
- The functions are modeled using an algebraic approach, with two design objectives: exact matching and approximate matching.
- The networks are feedforward neural networks.
- Data: input and output values, and/or gradient information.

Objective
- Obtain exact solutions when the network has sufficient degrees of freedom, while retaining good generalization properties.
- Synthesize a large data set with a parsimonious network.

Input-to-Node Values
- The input-to-node values (the inputs of the sigmoidal functions) are the basis of algebraic training: if all sigmoid inputs are known, the weight equations become algebraic.
- The input-to-node values determine the saturation level of each sigmoid at a given data point.

Structure of the Weight Equations
- The structure of the weight equations makes it possible to analyze and train a nonlinear neural network by means of linear algebra.
- The key is controlling the distribution of the input-to-node values and, through it, the saturation level of the active nodes.

Development of Neural Network Weight Equations
Objective: approximate a smooth scalar function of q inputs using a feedforward sigmoidal network.

Derivative Information
- Derivative information can improve the network's generalization properties.
- Partial derivatives of the function with respect to its inputs can be incorporated in the training set.

Network Output
The scalar output z is computed as a nonlinear transformation of the input p:
$n_j = \sum_{i=1}^{q} w_{ji}\,p_i + d_j, \qquad j = 1, \dots, s$
$z = \sum_{j=1}^{s} v_j\,\sigma(n_j) + b$
- w: input weights, p: input, d: input bias, v: output weights, b: output bias
- σ: a sigmoidal activation function
- n: input-to-node variables (the sigmoid inputs)

Exact Matching of the Function's Outputs
Requiring the network output to equal the training value $u_k$ at every training point gives the output weight equations,
$\mathbf{u} = \mathbf{S}\,\mathbf{v} \qquad (9)$
where $S_{kj} = \sigma(n_{kj})$ is the matrix of sigmoid activations at the training points.

Gradient Equations
The derivative of the network output with respect to its inputs is
$\dfrac{\partial z}{\partial p_i} = \sum_{j=1}^{s} v_j\,\sigma'(n_j)\,w_{ji}$

Exact Matching of the Function's Derivatives
Requiring this derivative to equal the known gradient $c_{ki}$ at every training point gives the gradient weight equations,
$c_{ki} = \sum_{j=1}^{s} v_j\,\sigma'(n_{kj})\,w_{ji}$

Input-to-Node Weight Equations
Rewriting (12) at every training point gives the input-to-node weight equations,
$n_{kj} = \sum_{i=1}^{q} w_{ji}\,p_{ki} + d_j$
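The output and gradient expressions above are straightforward to evaluate. Here is a minimal sketch, assuming tanh as the sigmoid (the slides do not fix a particular sigmoid):

```python
import numpy as np

def network_output_and_gradient(W, d, v, b, p):
    """Evaluate a one-hidden-layer sigmoidal network and its input gradient.

    W : (s, q) input weights      d : (s,) input bias
    v : (s,)  output weights      b : scalar output bias
    p : (q,)  input vector
    """
    n = W @ p + d                               # input-to-node values
    z = v @ np.tanh(n) + b                      # scalar network output
    dz_dp = (v * (1.0 - np.tanh(n) ** 2)) @ W   # sum_j v_j sigma'(n_j) w_ji
    return z, dz_dp
```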
Four Algebraic Training Algorithms
- Exact matching of function input-output data
- Approximate matching of gradient data in algebraic training
- Approximate matching of function input-output data
- Exact matching of function gradient data

A. Exact Matching of Function Input-Output Data
- The training inputs are known, so S is a known p x s matrix.
- Strategy for producing a well-conditioned S: the input weights are generated as random numbers drawn from N(0, 1) and scaled by a factor L, a user-defined scalar chosen so that the input-to-node values do not saturate the sigmoids.
- Input bias: the input bias d is computed to center each sigmoid at one of the training pairs, i.e. so that $n_{jj} = 0$ (node j is centered at training point j).
- Finally, the linear system in (9), $\mathbf{u} = \mathbf{S}\,\mathbf{v}$, is solved for v by inverting S.
- If the chosen input weights produce an ill-conditioned S, the computation is repeated with new random weights.

Exact Input-Output-Based Algebraic Algorithm
[Fig. 2-a: exact input-output-based algebraic algorithm.]
[Fig. 2-b: exact input-output-based algebraic algorithm with added p steps for incorporating gradient information.]
With the added steps, the input-output and gradient information are matched exactly, solving simultaneously for all of the neural parameters.
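A compact sketch of this exact input-output matching procedure follows. The tanh sigmoid, the conditioning threshold, and the explicit centering rule $d_j = -\mathbf{w}_j^{T}\mathbf{p}_j$ are assumptions filled in where the slides omit the formulas:

```python
import numpy as np

def algebraic_exact_fit(P, u, L=1.0, max_tries=20, seed=0):
    """Exactly fit p input-output samples with an s = p node sigmoidal network.

    P : (p, q) training inputs, u : (p,) training outputs.
    Returns input weights W (s, q), input bias d (s,), output weights v (s,).
    """
    rng = np.random.default_rng(seed)
    p_samples, q = P.shape
    s = p_samples                              # one hidden node per training point
    for _ in range(max_tries):
        W = L * rng.standard_normal((s, q))    # random N(0,1) input weights, scaled by L
        d = -np.einsum('ji,ji->j', W, P)       # center sigmoid j at training point j
        N = P @ W.T + d                        # input-to-node values N[k, j]
        S = np.tanh(N)                         # p x s matrix of sigmoid activations
        if np.linalg.cond(S) < 1e8:            # repeat if S is ill-conditioned
            return W, d, np.linalg.solve(S, u) # solve u = S v for the output weights
    raise RuntimeError("could not produce a well-conditioned S")
```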
B. Approximate Matching of Gradient Data in Algebraic Training
- The output weights are estimated after the input-to-node values are fixed.
- A first solution is obtained using a randomized W; all parameters are then refined by a p-step, node-by-node update algorithm.
- The input bias d and the output weights can be computed solely from the input-to-node values.
- At each step, the ki-th gradient equations are solved for the input weights associated with the ith node.
- At the end of each step the algorithm either terminates or continues, based on a user-specified gradient tolerance; error enters through v and through the input weights of the other nodes, and is adjusted in later steps.
- Basic idea: the ith node's input weights mainly contribute to the kth partial derivatives, so the nodes can be adjusted one at a time.

C. Approximate Matching of Function Input-Output Data
- The algebraic approach can also approximate the data with a parsimonious network, using fewer nodes than training points (s < p).
- An exact solution with s < p still exists whenever rank(S|u) = rank(S) = s.
- When s < p, the linear system in (9) is not square and, because of the relationship between u and v, (9) in general is overdetermined.
- Superposition technique: networks that individually map the nonlinear function over portions of its input space are superimposed; the training set covers the entire input space and is divided into m subsets.
[Fig. 3: superposition of m smaller s-node neural networks into a single larger network.]
- The gth sub-network approximates its portion of the output vector by an estimate of it.
- The full network's matrix of input-to-node values (element $n_{ki}$ in the kth row and ith column) has block structure: the main-diagonal blocks are the input-to-node value matrices of the m sub-networks, and the off-diagonal blocks are, columnwise, linearly dependent on the elements of the diagonal blocks.
- Output weights: S is constructed to be of rank s, and the rank of (S|u) is s or s + 1, so the superposition introduces zero or only small error, and the error does not increase with m.
- The key to developing algebraic training techniques is to construct, through N, a matrix S that displays the desired characteristics: S must be of rank s, and s is kept small to produce a parsimonious network.
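The rank condition above can be checked, and an approximate v computed, in a few lines. This is a generic least-squares sketch of the overdetermined output weight equations rather than the superposition construction itself:

```python
import numpy as np

def approximate_output_weights(S, u, tol=1e-9):
    """Least-squares solution of the overdetermined output weight equations u = S v.

    S : (p, s) sigmoid activation matrix with s < p, u : (p,) training outputs.
    The fit is exact (zero error) when rank([S | u]) == rank(S) == s.
    """
    exact = (np.linalg.matrix_rank(np.column_stack([S, u]), tol)
             == np.linalg.matrix_rank(S, tol))
    v = np.linalg.lstsq(S, u, rcond=None)[0]
    return v, exact
```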
D. Exact Matching of Function Gradient Data
- Gradient-based training sets: at every training point k, the gradient is known for e of the neural network inputs (denoted by x); the remaining (q - e) inputs are denoted by a. Input-output information is available as well.
- The unknowns are governed by the input weight, output weight, gradient weight, and input-to-node weight equations.
- First linear system, (36): obtained by reorganizing all of the input-to-node weight equations with s = p. The input-to-node values form a known vector, the unknown a-input weights and input bias are rewritten as a column vector, and A is a known ps x (q - e + 1)s matrix computed from all of the a-input vectors.
- Second linear system, (34): once the input-to-node values are known, (34) becomes linear and can always be solved for v, provided s = p and S is nonsingular; v can then be treated as a constant.
- Third linear system, (35): (35) also becomes linear; the unknowns consist of the x-input weights, the knowns are the gradients in the training set, and X is a known ep x es matrix.
- Algorithm goals: determine an effective distribution for the input-to-node values so that the weight equations can be solved in one step. The first system is solved by a strategy that, with probability one, produces a well-conditioned S; it consists of generating the input-to-node values according to a prescribed rule and substituting them in (38).
- With this choice the sigmoids are very nearly centered. It is desirable that one sigmoid be centered for a given input and, to prevent ill-conditioning of S, the same sigmoid should be close to saturation for any other known input; this requires a scaling factor based on the absolute value of the largest element in the input data.

Example: Neural Network Modeling of the Sine Function
- A sigmoidal neural network is trained to approximate the sine function u = sin(y) over the domain 0 ≤ y ≤ π.
- The training set comprises the gradient and output information shown in Table 1, {y_k, u_k, c_k}, k = 1, 2, 3, with q = e = 1.
- It is shown that the data is matched exactly by a network with two nodes, with the input-to-node values chosen so that the corresponding weight equations are consistent.
- In this example, the free input-to-node value is chosen to make the weight equations consistent and to meet the assumptions in (57) and (60)-(61); it can easily be shown that this corresponds to computing the elements of the input weights and bias from the corresponding equation.

Conclusion
Algebraic training, compared with optimization-based techniques, offers:
- faster execution speeds,
- better generalization properties,
- reduced computational complexity,
- and a direct correlation between the number of network nodes needed to model a given data set and the desired accuracy of representation.

Function Approximation
Fariba Sharifian, Somaye Kafi