VII. Neural Networks and Learning (11/24/03)

Supervised Learning
• Produce desired outputs for training inputs
• Generalize reasonably and appropriately to other inputs
• Good example: pattern recognition
• Feedforward multilayer networks

Feedforward Network
[Figure: a feedforward network with an input layer, one or more hidden layers, and an output layer.]

Typical Artificial Neuron
[Figure: a neuron receiving inputs through connection weights, combined with a threshold to produce an output.]

Typical Artificial Neuron
[Figure: the same neuron, showing the linear combination, the net input (local field), and the activation function.]

Equations
Net input: $h_i = \sum_{j=1}^{n} w_{ij} s_j - \theta_i$, or in vector form $\mathbf{h} = \mathbf{W}\mathbf{s} - \boldsymbol{\theta}$
Neuron output: $s'_i = \sigma(h_i)$, or in vector form $\mathbf{s}' = \sigma(\mathbf{h})$

Single-Layer Perceptron
[Figure: a single layer of perceptrons connecting the inputs directly to the outputs.]

Variables
[Figure: one perceptron with inputs $x_1, \dots, x_j, \dots, x_n$, weights $w_1, \dots, w_j, \dots, w_n$, summation $\Sigma$, net input $h$, threshold $\theta$, threshold function $\Theta$, and output $y$.]

Single-Layer Perceptron Equations
Binary threshold activation function:
$$\sigma(h) = \Theta(h) = \begin{cases} 1, & \text{if } h > 0 \\ 0, & \text{if } h \le 0 \end{cases}$$
Hence,
$$y = \begin{cases} 1, & \text{if } \sum_j w_j x_j > \theta \\ 0, & \text{otherwise} \end{cases} = \begin{cases} 1, & \text{if } \mathbf{w}\cdot\mathbf{x} > \theta \\ 0, & \text{if } \mathbf{w}\cdot\mathbf{x} \le \theta \end{cases}$$

2D Weight Vector
$\mathbf{w}\cdot\mathbf{x} = \|\mathbf{w}\|\,\|\mathbf{x}\|\cos\phi$. Let $v = \|\mathbf{x}\|\cos\phi$ be the projection of $\mathbf{x}$ onto $\mathbf{w}$; then $\mathbf{w}\cdot\mathbf{x} = \|\mathbf{w}\|\,v$, and so
$$\mathbf{w}\cdot\mathbf{x} > \theta \;\Leftrightarrow\; \|\mathbf{w}\|\,v > \theta \;\Leftrightarrow\; v > \theta / \|\mathbf{w}\|.$$
[Figure: in two dimensions the decision boundary is the line perpendicular to $\mathbf{w}$ at distance $\theta/\|\mathbf{w}\|$ from the origin; the "+" side (output 1) is the side $\mathbf{w}$ points toward, the "–" side (output 0) is the other.]

N-Dimensional Weight Vector
[Figure: in $n$ dimensions, $\mathbf{w}$ is the normal vector of a separating hyperplane, with the "+" region on one side and the "–" region on the other.]

Goal of Perceptron Learning
• Suppose we have training patterns $\mathbf{x}^1, \mathbf{x}^2, \dots, \mathbf{x}^P$ with corresponding desired outputs $y^1, y^2, \dots, y^P$,
• where $\mathbf{x}^p \in \{0,1\}^n$ and $y^p \in \{0,1\}$.
• We want to find $\mathbf{w}$, $\theta$ such that $y^p = \Theta(\mathbf{w}\cdot\mathbf{x}^p - \theta)$ for $p = 1, \dots, P$.

Treating Threshold as Weight
$$h = \sum_{j=1}^{n} w_j x_j - \theta = -\theta + \sum_{j=1}^{n} w_j x_j$$
Let $x_0 = -1$ and $w_0 = \theta$. Then
$$h = w_0 x_0 + \sum_{j=1}^{n} w_j x_j = \sum_{j=0}^{n} w_j x_j = \tilde{\mathbf{w}} \cdot \tilde{\mathbf{x}}$$
[Figure: the same perceptron with an extra input $x_0 = -1$ whose weight $w_0 = \theta$ replaces the explicit threshold.]

Augmented Vectors
$$\tilde{\mathbf{w}} = \begin{pmatrix} \theta \\ w_1 \\ \vdots \\ w_n \end{pmatrix}, \qquad \tilde{\mathbf{x}}^p = \begin{pmatrix} -1 \\ x_1^p \\ \vdots \\ x_n^p \end{pmatrix}$$
We want $y^p = \Theta(\tilde{\mathbf{w}}\cdot\tilde{\mathbf{x}}^p)$ for $p = 1, \dots, P$ (see the first code sketch at the end of this section).

Reformulation as Positive Examples
We have positive ($y^p = 1$) and negative ($y^p = 0$) examples.
Want $\tilde{\mathbf{w}}\cdot\tilde{\mathbf{x}}^p > 0$ for positive examples and $\tilde{\mathbf{w}}\cdot\tilde{\mathbf{x}}^p \le 0$ for negative examples.
Let $\mathbf{z}^p = \tilde{\mathbf{x}}^p$ for positive examples and $\mathbf{z}^p = -\tilde{\mathbf{x}}^p$ for negative examples.
Then we want $\tilde{\mathbf{w}}\cdot\mathbf{z}^p \ge 0$ for $p = 1, \dots, P$:
a hyperplane through the origin with all the $\mathbf{z}^p$ on one side.

Adjustment of Weight Vector
[Figure: the vectors $\mathbf{z}^1, \dots, \mathbf{z}^{11}$ scattered around the origin; the weight vector must be rotated so that all of them lie on its positive side.]

Outline of Perceptron Learning Algorithm (see the second code sketch at the end of this section)
1. Initialize the weight vector randomly.
2. Until all patterns are classified correctly, do:
   a) For p = 1, …, P do:
      1) If $\mathbf{z}^p$ is classified correctly, do nothing.
      2) Else adjust the weight vector to be closer to correct classification.

Weight Adjustment
[Figure: a misclassified $\mathbf{z}^p$ pulls the weight vector toward it in steps of $\eta\mathbf{z}^p$: $\tilde{\mathbf{w}}' = \tilde{\mathbf{w}} + \eta\mathbf{z}^p$, $\tilde{\mathbf{w}}'' = \tilde{\mathbf{w}}' + \eta\mathbf{z}^p$, and so on.]

Improvement in Performance
If $\tilde{\mathbf{w}}\cdot\mathbf{z}^p < 0$, then
$$\tilde{\mathbf{w}}'\cdot\mathbf{z}^p = (\tilde{\mathbf{w}} + \eta\mathbf{z}^p)\cdot\mathbf{z}^p = \tilde{\mathbf{w}}\cdot\mathbf{z}^p + \eta\,\mathbf{z}^p\cdot\mathbf{z}^p = \tilde{\mathbf{w}}\cdot\mathbf{z}^p + \eta\|\mathbf{z}^p\|^2 > \tilde{\mathbf{w}}\cdot\mathbf{z}^p$$

Perceptron Learning Theorem
• If there is a set of weights that will solve the problem,
• then the PLA will eventually find it
• (for a sufficiently small learning rate).
• Note: this only applies if the positive and negative examples are linearly separable.

Classification Power of Multilayer Perceptrons
• Perceptrons can function as logic gates.
• Therefore an MLP can form intersections, unions, and differences of linearly separable regions.
• Classes can be arbitrary hyperpolyhedra.
• Minsky & Papert's criticism of perceptrons:
• no one had succeeded in developing an MLP learning algorithm.

Credit Assignment Problem
How do we adjust the weights of the hidden layers?
[Figure: the same feedforward network (input layer, hidden layers, output layer).]
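
The slides above define the single-layer perceptron output as $y = \Theta(\mathbf{w}\cdot\mathbf{x} - \theta)$ and then fold the threshold into the weight vector via $x_0 = -1$, $w_0 = \theta$. Below is a minimal sketch of that equivalence in Python/NumPy; the function names `perceptron_output` and `augment` and the example numbers are illustrative additions, not part of the original slides.

```python
import numpy as np

def perceptron_output(w, theta, x):
    """Single-layer perceptron: y = Theta(w . x - theta),
    where Theta is the binary threshold (step) function."""
    h = np.dot(w, x) - theta          # net input (local field)
    return 1 if h > 0 else 0

def augment(x):
    """Prepend x0 = -1 so the threshold becomes an ordinary weight w0 = theta."""
    return np.concatenate(([-1.0], x))

# Illustrative numbers: with w_tilde = (theta, w1, ..., wn) and
# x_tilde = (-1, x1, ..., xn), Theta(w . x - theta) == Theta(w_tilde . x_tilde).
w, theta = np.array([0.5, 0.5]), 0.7
x = np.array([1.0, 1.0])
w_tilde = np.concatenate(([theta], w))
assert perceptron_output(w, theta, x) == int(np.dot(w_tilde, augment(x)) > 0)
```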
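
The learning outline and the weight-adjustment rule ($\tilde{\mathbf{w}} \leftarrow \tilde{\mathbf{w}} + \eta\mathbf{z}^p$ on a misclassified $\mathbf{z}^p$) translate into a short loop. This is a sketch under the slides' assumptions (binary {0,1} patterns, augmented vectors, the $\mathbf{z}^p$ sign flip for negative examples); the name `perceptron_learn`, the `max_epochs` cap, and the AND-gate example are illustrative additions, and the check treats $\tilde{\mathbf{w}}\cdot\mathbf{z}^p \le 0$ as misclassified so that a converged solution satisfies the goal strictly.

```python
import numpy as np

def perceptron_learn(X, y, eta=0.1, max_epochs=1000, rng=None):
    """Perceptron learning algorithm on augmented vectors.

    X: (P, n) array of {0,1} input patterns; y: (P,) array of {0,1} targets.
    Returns w_tilde = (theta, w1, ..., wn); stops once every z^p satisfies
    w_tilde . z^p > 0, or after max_epochs passes (non-separable data).
    """
    rng = np.random.default_rng() if rng is None else rng
    P, n = X.shape
    # Augment with x0 = -1, then flip negative examples:
    # z^p = x_tilde^p if y^p = 1, z^p = -x_tilde^p if y^p = 0,
    # so the goal becomes w_tilde . z^p >= 0 for all p.
    X_tilde = np.hstack([-np.ones((P, 1)), X])
    Z = np.where(y[:, None] == 1, X_tilde, -X_tilde)
    w_tilde = rng.normal(size=n + 1)            # 1. random initial weight vector
    for _ in range(max_epochs):                 # 2. repeat until all correct
        errors = 0
        for z in Z:                             # a) for p = 1, ..., P
            if np.dot(w_tilde, z) <= 0:         # misclassified (or on the boundary)
                w_tilde = w_tilde + eta * z     # move w toward z; raises w.z by eta*||z||^2
                errors += 1
        if errors == 0:
            break
    return w_tilde

# Example (illustrative): learn the AND function, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w_tilde = perceptron_learn(X, y, eta=0.5)
theta, w = w_tilde[0], w_tilde[1:]
print([int(np.dot(w, x) > theta) for x in X])   # expected: [0, 0, 0, 1]
```

Because AND is linearly separable, the Perceptron Learning Theorem guarantees the loop terminates with all patterns classified correctly; for non-separable data the loop simply stops at `max_epochs`.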