Artificial Neural Networks

1 Artificial Neural Networks: Properties, Applications, Classical Examples, Biological Background
2 Single Layer Networks: Limitations, Training Single Layer Networks
3 Multi Layer Networks: Possible Mappings, The Backprop Algorithm, Practical Problems
4 Generalization

Properties

Artificial Neural Networks (ANN)
- Inspired by the nervous system
- Parallel processing
- We will focus on one class of ANNs: feed-forward layered networks

Applications

An ANN operates like a general "learning box":
- Classification: yes/no answers
- Function approximation: outputs in [−1, 1]
- Multidimensional mappings

Classical Examples

ALVINN (autonomous driving)
- Maps a video image to a steering command
- Trained to mimic the behavior of human drivers

NetTalk (speech synthesis)
- Maps written text ("Hello") to a coded pronunciation (phonemes)
- Trained using a large database of spoken text

Biological Background

How do real neurons (nerve cells) work?
- Dendrites: passive reception of (chemical) signals
- Soma (cell body): summing and thresholding
- Axon: active pulses are transmitted to other cells
Nerve cells can vary in shape and other properties.

ANN caricature (a simplified view of neural information processing):
- Weighted input signals
- Summing (Σ)
- Thresholded output

Single Layer Networks

What do we mean by a single layer network?
- Each cell operates independently of the others!
- It is therefore sufficient to understand what one cell can compute.

Limitations

What can a single "cell" compute?

    o = sign( Σi wi xi )

Geometrical interpretation
- Input as a vector x, weights as a vector w, output o
- The weights define a separating hyperplane in input space
- A single cell can therefore only handle linearly separable problems
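The slides stop at the formula above; as a concrete illustration (not part of the original material), the sketch below shows in Python what one such cell computes. Treating the threshold as a weight on a constant input, and using logical OR as the example, are assumptions made only for this demo.

```python
import numpy as np

def single_unit(x, w):
    """Output of one threshold cell: o = sign(sum_i w_i * x_i)."""
    return np.sign(np.dot(w, x))

# A unit with weights (1, 1) and a threshold handled as a weight on a
# constant input implements logical OR on inputs coded as -1/+1,
# which is a linearly separable problem.
w = np.array([1.0, 1.0, 0.5])          # last weight acts on the constant "bias" input
for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    x = np.array([x1, x2, 1.0])
    print((x1, x2), "->", single_unit(x, w))
```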
Training Single Layer Networks

Learning in ANNs
- What does learning mean here?
- The network structure is normally fixed.
- Learning means finding the best weights wi.
- Two good algorithms for single layer networks: Perceptron Learning and the Delta Rule.

Perceptron Learning
- Incremental learning
- Weights only change when the output is wrong
- Update rule: wi ← wi + η(t − o) xi
- Always converges if the problem is solvable

Delta Rule (LMS rule)
- Incremental learning
- Weights always change
- Update rule: wi ← wi + η(t − wᵀx) xi
- Converges only in the mean
- Will find an optimal solution even if the problem cannot be fully solved
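Neither update rule is given as code in the slides; the following sketch shows both rules on an assumed toy problem (logical OR with ±1 coding, learning rate 0.1, a fixed number of passes), so the data and parameters are illustrative choices rather than anything prescribed by the lecture.

```python
import numpy as np

def perceptron_step(w, x, t, eta=0.1):
    """Perceptron rule: only updates when the thresholded output is wrong."""
    o = np.sign(np.dot(w, x))
    return w + eta * (t - o) * x

def delta_step(w, x, t, eta=0.1):
    """Delta (LMS) rule: uses the linear output w^T x, so the weights always move."""
    return w + eta * (t - np.dot(w, x)) * x

# Toy linearly separable data; the last component is a constant bias input.
X = np.array([[-1, -1, 1], [-1, 1, 1], [1, -1, 1], [1, 1, 1]], dtype=float)
T = np.array([-1, 1, 1, 1], dtype=float)    # logical OR with -1/+1 coding

w = np.zeros(3)
for epoch in range(20):                      # a few incremental passes
    for x, t in zip(X, T):
        w = perceptron_step(w, x, t)         # swap in delta_step to compare
print("weights:", w, "outputs:", np.sign(X @ w))
```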
Multi Layer Networks: Possible Mappings

What is the point of having multiple layers?
- Two layers can describe any classification.
- Two layers can approximate any "continuous" function.

Will it be even better with more layers?
- Three layers can sometimes do the same thing more efficiently.
- More than three layers are rarely used.

A two layer network can implement arbitrary decision surfaces, provided we have enough hidden units.

The Backprop Algorithm

How can we train a multi layer network?
- Neither perceptron learning nor the delta rule can be used.
- Fundamental problem: when the network gives the wrong answer, there is no information about the direction in which the weights need to change to improve the result.
- Basic idea: minimize the error E as a function of all the weights w.
- Fundamental trick: use threshold-like, but continuous, functions.

Gradient descent
- Compute the direction in weight space where the error increases the most: grad_w(E).
- Change the weights in the opposite direction: wi ← wi − η ∂E/∂wi.
- Normally one can use the error from each example separately:

      E = 1/2 Σk∈Out (tk − ok)²

A common "threshold-like" function is the logistic function

      ρ(y) = 1 / (1 + e^(−y))

[Figure: plot of 1/(1 + e^(−x)) for x from −10 to 10, rising smoothly from 0 to 1.]

The gradient can be expressed in terms of a local generalized error δ:

      ∂E/∂wji = −δi xj    so that    wji ← wji + η δi xj

Output layer:
      δk = ok (1 − ok)(tk − ok)
Hidden layers:
      δh = oh (1 − oh) Σk∈Out wkh δk

The error δ propagates backwards through the layers: error backpropagation (BackProp).

Practical Problems

Things to think about when using BackProp:
- Slow: it is normal to require thousands of iterations through the dataset.
- Gradient following: risk of getting stuck in local minima.
- Many parameters: step size η, number of layers, number of hidden units, input and output representation, initial weights.

Generalization

Risk of overfitting!
- The net normally interpolates smoothly between the data points, which results in good generalization.
- If the network has too many degrees of freedom (weights), the risk increases that learning will find a "strange" solution.
- Limiting the number of hidden units tends to improve generalization.

[Figures: network fits to the same data points on [0, 1], contrasting a smooth interpolation (good generalization) with an overfitted, "strange" solution.]
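To tie the formulas together, here is a minimal BackProp sketch assuming a two-layer network with logistic units trained on XOR; the dataset, number of hidden units, learning rate and number of epochs are illustrative assumptions, but the δ expressions and weight updates follow the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(y):
    """The logistic 'threshold-like' function rho(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

def forward(x, W1, W2):
    """One pass through a two-layer network; constant 1.0 inputs act as biases."""
    oh = sigmoid(W1 @ np.append(x, 1.0))    # hidden layer outputs
    ok = sigmoid(W2 @ np.append(oh, 1.0))   # single output unit
    return oh, ok

# XOR is not linearly separable, so a single layer cannot solve it,
# but a two-layer network can.  Targets lie in (0, 1) to match the logistic output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])

n_hidden, eta = 4, 0.5
W1 = rng.normal(scale=0.5, size=(n_hidden, 3))     # hidden weights (+ bias column)
W2 = rng.normal(scale=0.5, size=(n_hidden + 1,))   # output weights (+ bias)

for epoch in range(20000):                          # thousands of passes are typical
    for x, t in zip(X, T):
        oh, ok = forward(x, W1, W2)
        dk = ok * (1 - ok) * (t - ok)               # output layer: delta_k
        dh = oh * (1 - oh) * W2[:-1] * dk           # hidden layer: delta_h
        W2 += eta * dk * np.append(oh, 1.0)         # w_ji <- w_ji + eta * delta_i * x_j
        W1 += eta * np.outer(dh, np.append(x, 1.0))

outputs = np.array([forward(x, W1, W2)[1] for x in X])
print(np.round(outputs, 2))                         # typically close to [0, 1, 1, 0]
```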