Supervised learning network-latest

Supervised learning network
G. Anuradha

Architecture
• The perceptron was one of the earlier attempts to build intelligent, self-learning systems using simple components.
• Used to solve simple classification problems.
• Used by Rosenblatt to explain the pattern-recognition abilities of biological visual systems.

[Figure: perceptron architecture. Sensory unit, associator unit (binary activation function), response unit (activation +1, 0, -1).]

Quiz
• Which of the features would probably not be useful for classifying handwritten digits from binary images?
– Raw pixels from images
– Set of strokes that can be combined to form various digits
– Day of the year on which the digits were drawn
– Number of pixels set to one

Perceptron networks: theory
Single-layer feed-forward networks
1. It has 3 units:
   1. input (sensory unit)
   2. hidden (associator unit)
   3. output (response unit)
2. Input-to-hidden connections have fixed weights of -1, 0, or 1, assigned at random; the associator unit uses a binary activation function.
3. The output unit has an activation of 1, 0, or -1, given by a binary step function with threshold θ.
4. The output of the perceptron is

$$y = f(y_{in}), \qquad f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$

Perceptron theory
5. Weights are updated between the hidden and output units.
6. The network checks for an error between the hidden and output layers.
7. Error = target - calculated output.
8. The weights are adjusted in case of error:

$$w_i(\text{new}) = w_i(\text{old}) + \alpha t x_i$$
$$b(\text{new}) = b(\text{old}) + \alpha t$$

α is the learning rate and t is the target, which is -1 or 1.
If there is no error, the weights do not change and training stops.

Single classification perceptron network
[Figure: inputs x1, ..., xi, ..., xn (units X1, ..., Xi, ..., Xn) connect through weights w1, ..., wi, ..., wn, and a bias input x0 = 1 connects through b, to the output unit Y with output y.]

Perceptron training algo for single output classes
• Step 0: Initialize the weights, bias, and learning rate α (between 0 and 1).
• Step 1: Perform steps 2-6 while the stopping condition is false.
• Step 2: Perform steps 3-5 for each training pair s:t.
• Step 3: Apply the identity activation function at the input layer: $x_i = s_i$.
• Step 4: Calculate the net input $y_{in}$ and the output $y = f(y_{in})$:

$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$

• Step 5: Weight and bias adjustment. Compare the actual output with the desired (target) output.
  If $y \ne t$:
  $$w_i(\text{new}) = w_i(\text{old}) + \alpha t x_i, \qquad b(\text{new}) = b(\text{old}) + \alpha t$$
  else:
  $$w_i(\text{new}) = w_i(\text{old}), \qquad b(\text{new}) = b(\text{old})$$
• Step 6: Train the network until there is no weight change. This is the stopping condition for the network. If it is not met, start again from Step 2.

Example
[Flowchart: perceptron training. Start; initialize weights and bias and set α (0 to 1); for each s:t, activate the input units ($x_i = s_i$), calculate the net input, and apply the activation function $y = f(y_{in})$; if $y \ne t$, update $w_i(\text{new}) = w_i(\text{old}) + \alpha t x_i$ and $b(\text{new}) = b(\text{old}) + \alpha t$, otherwise keep w(new) = w(old) and b(new) = b(old); if any weight changed, repeat, otherwise stop.]

Perceptron training algo for multiple output classes
• Step 0: Initialize the weights, biases, and learning rate suitably.
• Step 1: Check the stopping condition; if it is false, perform steps 2-6.
• Step 2: Perform steps 3-5 for each bipolar or binary training vector pair s:t.
• Step 3: Set the (identity) activation of each input unit, i = 1 to n: $x_i = s_i$.
• Step 4: Calculate the output response. The net input to output unit j is

$$y_{inj} = b_j + \sum_{i=1}^{n} x_i w_{ij}$$

  The activation is applied to the net input to obtain the output response:

$$y_j = f(y_{inj}) = \begin{cases} 1 & \text{if } y_{inj} > \theta \\ 0 & \text{if } -\theta \le y_{inj} \le \theta \\ -1 & \text{if } y_{inj} < -\theta \end{cases}$$

• Step 5: Adjust the weights and biases for j = 1 to m and i = 1 to n.
  If $t_j \ne y_j$:
  $$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha t_j x_i, \qquad b_j(\text{new}) = b_j(\text{old}) + \alpha t_j$$
  else:
  $$w_{ij}(\text{new}) = w_{ij}(\text{old}), \qquad b_j(\text{new}) = b_j(\text{old})$$
• Step 6: Check the stopping condition: if there is no change in the weights, stop the training process.

Example of AND

Linear separability
• The perceptron network relies on the concept of linear separability.
• The separating line is based on the threshold θ.
• The condition for separating the region of positive response from the region of zero response is $w_1 x_1 + w_2 x_2 + b > \theta$.
• The condition for separating the region of zero response from the region of negative response is $w_1 x_1 + w_2 x_2 + b < -\theta$.

What binary threshold neurons cannot do
• A binary threshold output unit cannot even tell if two single-bit features are the same!
  Positive cases (same): (1,1) → 1; (0,0) → 1
  Negative cases (different): (1,0) → 0; (0,1) → 0
• The four input-output pairs give four inequalities that are impossible to satisfy:

$$w_1 + w_2 \ge \theta, \qquad 0 \ge \theta, \qquad w_1 < \theta, \qquad w_2 < \theta$$

  Adding the last two gives $w_1 + w_2 < 2\theta$, and since $\theta \le 0$ we have $2\theta \le \theta$, which contradicts $w_1 + w_2 \ge \theta$.

[Figure: a threshold unit with bias weight -θ on a constant input 1 and weights w1, w2 on inputs x1, x2.]

A geometric view of what binary threshold neurons cannot do
• Imagine "data-space" in which the axes correspond to components of an input vector.
– Each input vector is a point in this space.
– A weight vector defines a plane in data-space.
– The weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold.

[Figure: the four points (0,0), (1,0), (0,1), (1,1) in data-space. The positive and negative cases cannot be separated by a plane.]

Discriminating simple patterns under translation with wrap-around
• Suppose we just use pixels as the features.
• Can a binary threshold unit discriminate between different patterns that have the same number of on pixels?
– Not if the patterns can translate with wrap-around!

[Figure: three translated copies each of pattern A and pattern B.]

Learning with hidden units
• For problems that are not linearly separable, we require an additional layer, called a hidden layer.
• Networks without hidden units are very limited in the input-output mappings they can learn to model.
• We need multiple layers of adaptive, non-linear hidden units.
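The perceptron training rule (update $w_i$ by $\alpha t x_i$ and $b$ by $\alpha t$ whenever $y \ne t$) and the limitation described above can be tried out with a short sketch. This is a minimal illustration, not code from the slides: the names `activation`, `train_perceptron`, and `max_epochs` are hypothetical, the weights start at zero, and bipolar (-1/+1) inputs and targets are assumed for both data sets, whereas the slide states the "same bits" case with 0/1 values.

```python
def activation(y_in, theta):
    """Three-valued step function with threshold theta."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """Train a single-output perceptron.

    samples: list of (input_tuple, target) pairs with target -1 or +1.
    max_epochs is an assumed safety cap so that training on data that
    is not linearly separable still terminates.
    Returns (weights, bias, epochs_run, converged).
    """
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            y = activation(y_in, theta)
            if y != t:
                # Error: adjust weights and bias toward the target.
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:
            # Stopping condition: a full pass with no weight change.
            return w, b, epoch, True
    return w, b, max_epochs, False


# Bipolar AND: linearly separable.
AND_SAMPLES = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
# "Are the two bits the same?": not linearly separable.
SAME_SAMPLES = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), 1)]

and_result = train_perceptron(AND_SAMPLES)
same_result = train_perceptron(SAME_SAMPLES)
print("AND :", and_result)
print("SAME:", same_result)
```

With these settings the AND run stops after a pass with no weight change, while the "same bits" run only ends because of the epoch cap, since no single threshold unit can classify all four pairs.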
Solution to EXOR problem

ADALINE
• A network with a single linear unit is called an ADALINE (ADAptive LINear Neuron).
• The input-output relationship is linear.
• It uses bipolar activation for its input signals and its target output.
• The weights between the input and output are adjustable, and the network has only one output unit.
• It is trained using the delta rule, also known as the least mean square (LMS) or Widrow-Hoff rule.

Delta rule
• Delta rule for a single output unit
– Minimize the error over all training patterns.
– Done by reducing the error for each pattern, one at a time.
• The delta rule for adjusting the ith weight (i = 1 to n) is

$$\Delta w_i = \alpha (t - y_{in}) x_i$$

• The delta rule in the case of several output units, for adjusting the weight from the ith input unit to the jth output unit, is

$$\Delta w_{ij} = \alpha (t_j - y_{inj}) x_i$$

Difference between Perceptron and Delta Rule

Perceptron                                       | Delta
Originates from the Hebbian assumption           | Derived from the gradient-descent method
Stops after a finite number of learning steps    | Continues forever, converging asymptotically to the solution
                                                 | Minimizes error over all training patterns

Architecture
[Figure: Adaline architecture. Inputs x1, x2, ..., xn (units X1, X2, ..., Xn) with weights w1, w2, ..., wn, plus a bias input x0 = 1 with weight b, feed a summing unit that computes the net input y_in, followed by f(y_in). An output error generator computes e = t - y_in from the target t and drives the adaptive algorithm that adjusts the weights.]

[Flowchart: Adaline training. Start; initialize weights, bias, and α; input the specified tolerance error Es; for each s:t, activate the input units ($x_i = s_i$), calculate the net input $y_{in} = b + \sum x_i w_i$, and update $w_i(\text{new}) = w_i(\text{old}) + \alpha (t - y_{in}) x_i$ and $b(\text{new}) = b(\text{old}) + \alpha (t - y_{in})$; calculate the error $E_i = \sum (t - y_{in})^2$; if $E_i = E_s$, stop, otherwise repeat.]

Madaline
• Two or more Adalines are combined to form the Madaline model.
• Used for nonlinearly separable logic functions such as the EX-OR function.
• Used for adaptive noise cancellation and adaptive inverse control.
• In noise cancellation, the objective is to filter out an interference component by identifying a linear model of a measurable noise source and the corresponding immeasurable interference.
• Examples: ECG, echo elimination on long-distance telephone transmission lines.
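The delta rule for a single Adaline, $\Delta w_i = \alpha (t - y_{in}) x_i$ with the matching bias update, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the slides' own code: `train_adaline`, `predict`, `tolerance`, and `max_epochs` are hypothetical names, the weights start at zero, the stopping test uses "error at or below the tolerance" instead of strict equality, and the bipolar OR function is an assumed training set.

```python
def train_adaline(samples, alpha=0.1, tolerance=1e-3, max_epochs=50):
    """Train one Adaline with the delta (LMS, Widrow-Hoff) rule.

    samples: list of (input_tuple, target) pairs with bipolar values.
    Training stops when the squared error summed over one pass is at
    or below `tolerance`, or after `max_epochs` passes (assumed cap).
    Returns (weights, bias, last_epoch_error, epochs_run).
    """
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    epoch_error = 0.0
    for epoch in range(1, max_epochs + 1):
        epoch_error = 0.0
        for x, t in samples:
            # Net input of the linear unit.
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            err = t - y_in
            # Delta rule: move each weight along err * input.
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            b += alpha * err
            epoch_error += err ** 2
        if epoch_error <= tolerance:
            return w, b, epoch_error, epoch
    return w, b, epoch_error, max_epochs


def predict(w, b, x):
    """Bipolar decision taken from the sign of the net input."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if y_in >= 0 else -1


# Bipolar OR (assumed example data).
OR_SAMPLES = [((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]

w, b, epoch_error, epochs = train_adaline(OR_SAMPLES)
print("weights:", w, "bias:", b, "epochs:", epochs)
print("predictions:", [predict(w, b, x) for x, _ in OR_SAMPLES])
```

Because the output unit is linear, the squared error on OR cannot reach zero, so this run is expected to use the full epoch cap; the weights settle near the least-squares solution rather than stopping at a finite step, which matches the contrast with the perceptron rule drawn in the table above.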
