Multi-layer feed-forward NN ARCHITECTURE
[Figure: a multi-layer feed-forward network with an input layer, a hidden layer and an output layer.]

A solution for the XOR problem

  x1   x2   x1 XOR x2
  -1   -1      -1
  -1    1       1
   1   -1       1
   1    1      -1

[Figure: a two-layer network solving XOR, with weights $+1$ and $-1$ and a threshold of $0.1$ at the output unit.]
$\varphi(v) = +1$ if $v > 0$, $-1$ if $v \le 0$ is the sign function.

NEURON MODEL
• Sigmoid function: $\varphi(v_j) = \dfrac{1}{1 + e^{-a v_j}}$, where $v_j = \sum_{i=0,\dots,m} w_{ji}\, y_i$ is the induced local field of neuron j.
• Most common form of activation function.
• Approaches a threshold function as the slope parameter $a$ increases.
• Differentiable (important in theory).
[Figure: plots of $\varphi(v_j)$ for increasing values of $a$.]

LEARNING ALGORITHM
• Back-propagation algorithm:
  – Forward step: function signals.
  – Backward step: error signals.
• It adjusts the weights of the NN in order to minimize the average squared error.

Average Squared Error
• Error signal of output neuron j at presentation of the n-th training example:
  $e_j(n) = d_j(n) - y_j(n)$
• Total error energy at time n:
  $E(n) = \tfrac{1}{2} \sum_{j \in C} e_j^2(n)$, where C is the set of neurons in the output layer.
• Average squared error (a measure of learning performance):
  $E_{AV} = \tfrac{1}{N} \sum_{n=1}^{N} E(n)$, where N is the size of the training set.
• Goal: adjust the weights of the NN to minimize $E_{AV}$.

Notation
• $e_j$: error at the output of neuron j
• $y_j$: output of neuron j
• $v_j = \sum_{i=0,\dots,m} w_{ji}\, y_i$: induced local field of neuron j

Weight Update
• The update rule is based on the gradient descent method: take a step in the direction yielding the maximum decrease of E:
  $\Delta w_{ji} = -\eta \, \dfrac{\partial E}{\partial w_{ji}}$
  i.e. a step in the direction opposite to the gradient, where $w_{ji}$ is the weight associated to the link from neuron i to neuron j.

Define the Local Gradient of neuron j
• Local gradient: $\delta_j = -\dfrac{\partial E}{\partial v_j}$
• We obtain $\delta_j = e_j \, \varphi'(v_j)$ because, with $e_j(n) = d_j(n) - y_j(n)$ and $E(n) = \tfrac{1}{2}\sum_{j \in C} e_j^2(n)$,
  $\dfrac{\partial E}{\partial v_j} = \dfrac{\partial E}{\partial e_j}\,\dfrac{\partial e_j}{\partial y_j}\,\dfrac{\partial y_j}{\partial v_j} = e_j \cdot (-1) \cdot \varphi'(v_j)$

Update Rule
• We obtain $\Delta w_{ji} = \eta \, \delta_j \, y_i$ because
  $\dfrac{\partial E}{\partial w_{ji}} = \dfrac{\partial E}{\partial v_j}\,\dfrac{\partial v_j}{\partial w_{ji}}$, with $\dfrac{\partial E}{\partial v_j} = -\delta_j$, $\dfrac{\partial v_j}{\partial w_{ji}} = y_i$, and $\Delta w_{ji} = -\eta \, \dfrac{\partial E}{\partial w_{ji}}$.

Compute the local gradient of neuron j
• The key factor is the calculation of $e_j$.
• There are two cases:
  – Case 1: j is an output neuron.
  – Case 2: j is a hidden neuron.

Error $e_j$ of an output neuron
• Case 1: j is an output neuron. Then $e_j = d_j - y_j$ and
  $\delta_j = (d_j - y_j)\,\varphi'(v_j)$.

Local gradient of a hidden neuron
• Case 2: j is a hidden neuron.
• The local gradient of neuron j is recursively determined in terms of the local gradients of all neurons to which neuron j is directly connected.

Use the Chain Rule
$\delta_j = -\dfrac{\partial E}{\partial v_j} = -\dfrac{\partial E}{\partial y_j}\,\dfrac{\partial y_j}{\partial v_j} = -\dfrac{\partial E}{\partial y_j}\,\varphi'(v_j)$
From $E(n) = \tfrac{1}{2}\sum_{k \in C} e_k^2(n)$ and $e_k(n) = d_k(n) - y_k(n)$ we obtain
$\dfrac{\partial E}{\partial y_j} = \sum_{k \in C} e_k \dfrac{\partial e_k}{\partial y_j} = \sum_{k \in C} e_k \,\dfrac{\partial e_k}{\partial v_k}\,\dfrac{\partial v_k}{\partial y_j} = -\sum_{k \in C} e_k \, \varphi'(v_k)\, w_{kj}$,
since $\dfrac{\partial e_k}{\partial v_k} = -\varphi'(v_k)$ and, with $v_k = \sum_j w_{kj}\, y_j$, $\dfrac{\partial v_k}{\partial y_j} = w_{kj}$.

Local Gradient of hidden neuron j
• Hence $\delta_j = \varphi'(v_j) \sum_{k \in C} \delta_k \, w_{kj}$.
[Figure: signal-flow graph of the back-propagation of error signals to neuron j, through the weights $w_{1j}, \dots, w_{kj}, \dots, w_{mj}$ and the local gradients $\delta_1, \dots, \delta_k, \dots, \delta_m$.]

Delta Rule
• Delta rule: $\Delta w_{ji} = \eta \, \delta_j \, y_i$, with
  $\delta_j = \varphi'(v_j)\,(d_j - y_j)$  IF j is an output node
  $\delta_j = \varphi'(v_j) \sum_{k \in C} \delta_k\, w_{kj}$  IF j is a hidden node
  where C is the set of neurons in the layer following the one containing j.

Local Gradient of neurons
• For the sigmoid activation, $\varphi'(v_j) = a\, y_j (1 - y_j)$ with $a > 0$, hence
  $\delta_j = a\, y_j (1 - y_j) \sum_k \delta_k\, w_{kj}$  if j is a hidden node
  $\delta_j = a\, y_j (1 - y_j)\,(d_j - y_j)$  if j is an output node

Backpropagation algorithm
• Two phases of computation:
  – Forward pass: run the NN and compute the error for each neuron of the output layer.
  – Backward pass: start at the output layer and pass the errors backwards through the network, layer by layer, by recursively computing the local gradient of each neuron.
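As a concrete illustration of the two passes and of the delta rule above, here is a minimal NumPy sketch of a single training step for a network with one hidden layer of sigmoid units. It is a sketch under assumptions: the names (backprop_step, W1, W2, x, d, eta) are illustrative and do not appear in the slides, and biases are omitted.

```python
import numpy as np

def sigmoid(v, a=1.0):
    # phi(v) = 1 / (1 + exp(-a*v))
    return 1.0 / (1.0 + np.exp(-a * v))

def backprop_step(W1, W2, x, d, eta=0.5, a=1.0):
    """One forward/backward computation and delta-rule update (no biases, no momentum)."""
    # Forward pass: propagate function signals
    y1 = sigmoid(W1 @ x, a)        # hidden-layer outputs
    y2 = sigmoid(W2 @ y1, a)       # output-layer outputs

    # Backward pass: propagate error signals and compute local gradients
    e = d - y2                                       # e_j = d_j - y_j
    delta2 = a * y2 * (1 - y2) * e                   # output nodes: a*y*(1-y)*(d-y)
    delta1 = a * y1 * (1 - y1) * (W2.T @ delta2)     # hidden nodes: a*y*(1-y)*sum_k delta_k*w_kj

    # Delta rule: w_ji <- w_ji + eta * delta_j * y_i
    W2 += eta * np.outer(delta2, y1)
    W1 += eta * np.outer(delta1, x)
    return 0.5 * np.sum(e ** 2)                      # instantaneous error energy E(n)
```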
TRAINING
• Sequential mode (on-line, pattern, or stochastic mode):
  – (x(1), d(1)) is presented, a sequence of forward and backward computations is performed, and the weights are updated using the delta rule.
  – The same is done for (x(2), d(2)), …, (x(N), d(N)).

Training
• The learning process continues on an epoch-by-epoch basis until the stopping condition is satisfied.
• From one epoch to the next, choose a randomized ordering for selecting the examples of the training set.

Stopping criteria
• Sensible stopping criteria:
  – Average squared error change: back-prop is considered to have converged when the absolute rate of change of the average squared error per epoch is sufficiently small (typically in the range [0.01, 0.1]).
  – Generalization-based criterion: after each epoch the NN is tested for generalization. If the generalization performance is adequate, then stop.

Generalization
• Generalization: the NN generalizes well if the I/O mapping computed by the network is nearly correct for new data (test set).
• Factors that influence generalization:
  – the size of the training set,
  – the architecture of the NN,
  – the complexity of the problem at hand.
• Overfitting (overtraining): when the NN learns too many I/O examples it may end up memorizing the training data.

Expressive capabilities of NN
Boolean functions:
• Every boolean function can be represented by a network with a single hidden layer,
• but it might require a number of hidden units that is exponential in the number of inputs.
Continuous functions:
• Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer.
• Any function can be approximated to arbitrary accuracy by a network with two hidden layers.

Generalized delta rule
• If η is small: slow rate of learning. If η is large: large changes of the weights, and the NN can become unstable (oscillatory).
• Method to overcome this drawback: include a momentum term in the delta rule.
• Generalized delta rule: $\Delta w_{ji}(n) = \alpha \, \Delta w_{ji}(n-1) + \eta \, \delta_j(n) \, y_i(n)$, where $\alpha$ is the momentum constant.

Generalized delta rule
• The momentum accelerates the descent in steady downhill directions.
• The momentum has a stabilizing effect in directions that oscillate in time.

η adaptation
Heuristics for accelerating the convergence of the back-prop algorithm through learning-rate adaptation:
• Heuristic 1: every weight should have its own η.
• Heuristic 2: every η should be allowed to vary from one iteration to the next.

NN DESIGN
• Data representation
• Network topology
• Network parameters
• Training
• Validation

Setting the parameters
• How are the weights initialized?
• How is the learning rate chosen?
• How many hidden layers and how many neurons?
• How many examples in the training set?

Initialization of weights
• In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0 or -0.5 and 0.5.
• If some inputs are much larger than others, random initialization may bias the network to give much more importance to the larger inputs. In such a case, the weights can be initialized as follows:
  $w_{ji} = \dfrac{1}{2N} \sum_{i=1,\dots,N} \dfrac{1}{|x_i|}$  for the weights from the input to the first layer,
  $w_{kj} = \dfrac{1}{2N} \sum_{i=1,\dots,N} \dfrac{1}{\varphi\!\left(\sum_i w_{ji}\, x_i\right)}$  for the weights from the first to the second layer.

Choice of learning rate
• The right value of η depends on the application. Values between 0.1 and 0.9 have been used in many applications.
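The sketch below ties together several of the slides above: sequential (on-line) training with a randomized example ordering per epoch, the generalized delta rule with a momentum term, and the average-squared-error-change stopping criterion. It is a hedged illustration only; the names (train_sequential, X, D, alpha, tol) are assumptions, not part of the original material.

```python
import numpy as np

def sigmoid(v, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * v))

def train_sequential(W1, W2, X, D, eta=0.3, alpha=0.9, a=1.0, epochs=100, tol=0.01):
    """Sequential (on-line) training with momentum; stops when the average
    squared error changes by less than `tol` from one epoch to the next."""
    dW1 = np.zeros_like(W1)   # previous weight changes, used by the momentum term
    dW2 = np.zeros_like(W2)
    prev_eav = np.inf
    for epoch in range(epochs):
        total = 0.0
        for n in np.random.permutation(len(X)):   # randomized ordering per epoch
            x, d = X[n], D[n]
            # forward pass
            y1 = sigmoid(W1 @ x, a)
            y2 = sigmoid(W2 @ y1, a)
            # backward pass: local gradients
            e = d - y2
            delta2 = a * y2 * (1 - y2) * e
            delta1 = a * y1 * (1 - y1) * (W2.T @ delta2)
            # generalized delta rule: dw(n) = alpha*dw(n-1) + eta*delta*y
            dW2 = alpha * dW2 + eta * np.outer(delta2, y1)
            dW1 = alpha * dW1 + eta * np.outer(delta1, x)
            W2 += dW2
            W1 += dW1
            total += 0.5 * np.sum(e ** 2)
        eav = total / len(X)                      # average squared error E_AV
        if abs(prev_eav - eav) < tol:             # error-change stopping criterion
            break
        prev_eav = eav
    return W1, W2
```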
• Other heuristics adapt η during training, as described in the previous slides.

How many layers and neurons
• The number of layers and of neurons depends on the specific task. In practice this issue is solved by trial and error.
• Two types of adaptive algorithms can be used:
  – start from a large network and successively remove some neurons and links until network performance degrades;
  – begin with a small network and introduce new neurons until performance is satisfactory.

How many examples
• Rule of thumb: the number of training examples should be at least five to ten times the number of weights of the network.
• Other rule: $N \ge \dfrac{|W|}{1 - a}$, where $|W|$ is the number of weights and $a$ is the expected accuracy on the test set.
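As a quick numeric check of the second rule: a network with |W| = 80 weights and an expected test-set accuracy a = 0.9 would need at least 80 / (1 - 0.9) = 800 training examples. The tiny helper below is a hypothetical illustration, not part of the slides.

```python
import math

def min_training_examples(num_weights, expected_accuracy):
    """Rule of thumb: N >= |W| / (1 - a)."""
    return math.ceil(num_weights / (1.0 - expected_accuracy))

print(min_training_examples(80, 0.9))   # -> 800
```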