Artificial Neural Networks

1. Artificial Neural Networks
   - Properties
   - Applications
   - Classical Examples
   - Biological Background
2. Single Layer Networks
   - Limitations
   - Training Single Layer Networks
3. Multi Layer Networks
   - Possible Mappings
   - The Backprop Algorithm
   - Practical Problems
4. Generalization
Properties
Artificial Neural Networks (ANN)
- Inspired by the nervous system
- Parallel processing

We will focus on one class of ANNs:
Feed-forward Layered Networks
Applications

Operates like a general "Learning Box"!
- Classification: Yes/No
- Function Approximation: [−1, 1]
- Multidimensional Mapping

Classical Examples

ALVINN: Autonomous driving
- Video image → Steering
- Trained to mimic the behavior of human drivers
NetTalk: Speech Synthesis
- Written text ("Hello") → Phonemes → Coded pronunciation
- Trained using a large database of spoken text

Biological Background

How do real neurons (nerve cells) work?
- Dendrites: passive reception of (chemical) signals
- Soma (cell body): summing, thresholding
- Axon: active pulses are transmitted to other cells
ANN-caricatures

(a simplified view of the neural information processing)
- Nerve cells can vary in shape and other properties
- The artificial neuron: weighted input signals → summing (Σ) → thresholded output
Single Layer Networks
What do we mean by a Single Layer Network?
Each cell operates independently of the others!
It is sufficient to understand what one cell can compute
Limitations
What can a single "cell" compute?

    o = sign( Σ_i xi wi )

Geometrical interpretation:
- x: the input in vector format
- w: the weights in vector format
- o: the output
- The weight vector w defines a separating hyperplane in input space
- Linear separability: the cell can only separate classes that a hyperplane can separate

[Figure: the (x1, x2) input plane split by the separating hyperplane orthogonal to w]
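As an illustration of what one thresholded unit computes, here is a minimal NumPy sketch (my own, not from the lecture); the weights implementing AND are hand-picked, and the convention of a constant first input acting as a bias is an assumption.

```python
import numpy as np

def unit_output(x, w):
    """Output of one threshold unit: o = sign(sum_i xi * wi)."""
    return np.sign(np.dot(w, x))

# Hand-picked weights realizing AND; x0 = 1 is a constant bias input.
w = np.array([-1.5, 1.0, 1.0])
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([1.0, x1, x2])
    print(x1, x2, unit_output(x, w))   # -1 for all inputs except (1, 1)
```

The hyperplane here is the line x1 + x2 = 1.5; everything on one side maps to +1, everything on the other to −1.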
Training Single Layer Networks
Learning in ANNs

What does learning mean here?
- The network structure is normally fixed
- Learning means finding the best weights wi

Two good algorithms for single layer networks:
- Perceptron Learning
- Delta Rule

Perceptron Learning
- Incremental learning
- Weights only change when the output is wrong
- Update rule: wi ← wi + η(t − o)xi
- Always converges if the problem is solvable
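The perceptron update rule can be sketched as follows (an illustrative toy implementation, not code from the lecture; the AND data set, step size, and bias-as-first-input convention are my own choices):

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, epochs=100):
    """Perceptron learning: wi <- wi + eta * (t - o) * xi.
    The update is zero whenever the output o is already correct."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1 if np.dot(w, x) >= 0 else -1
            w += eta * (target - o) * x     # no change when o == target
    return w

# Linearly separable toy problem (AND); first column is a constant bias input.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1])
w = perceptron_train(X, t)
print([1 if np.dot(w, x) >= 0 else -1 for x in X])   # → [-1, -1, -1, 1]
```

Since AND is linearly separable, the convergence theorem applies and the loop stops changing the weights after a few epochs.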
Delta Rule (LMS-rule)
- Incremental learning
- Weights always change
- Update rule: wi ← wi + η(t − w·x)xi
- Converges only in the mean
- Will find an optimal solution even if the problem cannot be fully solved
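A corresponding sketch of the delta rule (again an illustrative toy of mine, with data and step size chosen arbitrarily). Since the rule uses the raw sum w·x instead of the thresholded output, it works for real-valued targets; with noise-free linear targets, as here, it converges to the exact solution rather than only in the mean.

```python
import numpy as np

def delta_rule_train(X, t, eta=0.05, epochs=500):
    """Delta rule / LMS: wi <- wi + eta * (t - w.x) * xi.
    Unlike perceptron learning, the weights change on every example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            w += eta * (target - np.dot(w, x)) * x
    return w

# Targets generated from a known linear function (first input is a bias).
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
w_true = np.array([0.5, -1.0, 2.0])
t = X @ w_true
w = delta_rule_train(X, t)
print(np.round(w, 2))   # close to w_true
```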
Multi Layer Networks

Possible Mappings
What is the point of having multiple layers? Will it be even better with more layers?
- Two layers can describe any classification
- Two layers can approximate any "continuous" function
- Three layers can sometimes do the same thing more efficiently
- More than three layers are rarely used

A two layer network can implement arbitrary decision surfaces
...provided we have enough hidden units
The Backprop Algorithm
How can we train a multi layer network?
- Neither perceptron learning nor the delta rule can be used
- Fundamental problem: when the network gives the wrong answer, there is no information on in which direction the weights need to change to improve the result

Basic idea: minimize the error E as a function of all the weights w
Fundamental trick: use threshold-like, but continuous, functions
Gradient descent:
1. Compute the direction in weight space where the error increases the most: grad_w(E)
2. Change the weights in the opposite direction:

    wi ← wi − η ∂E/∂wi
Normally one can use the error from each example separately:

    E = (1/2) Σ_{k∈Out} (tk − ok)²

The gradient can be expressed as a function of a local generalized error δ:

    ∂E/∂wji = −δi xj,   so the update rule becomes   wji ← wji + η δi xj

A common "threshold-like function" is the sigmoid

    ρ(y) = 1 / (1 + e^(−y))

[Figure: the sigmoid 1/(1 + e^(−x)), rising smoothly from 0 to 1 over x ∈ [−10, 10]]

Output layer:   δk = ok (1 − ok) (tk − ok)
Hidden layers:  δh = oh (1 − oh) Σ_{k∈Out} wkh δk

The error δ propagates backwards through the layers:
Error backpropagation (BackProp)
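Putting the formulas together, here is a small illustrative BackProp implementation for one hidden layer (my own sketch, not the lecture's code). It trains on XOR, which a single layer cannot solve; the network size, step size, epoch count, and random seed are arbitrary choices.

```python
import numpy as np

def sigmoid(y):
    """The 'threshold-like but continuous' function rho(y) = 1 / (1 + e^-y)."""
    return 1.0 / (1.0 + np.exp(-y))

def train(X, T, hidden=4, eta=0.5, epochs=10000, seed=0):
    """Two-layer network trained with BackProp, one example at a time."""
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.5, size=(hidden, X.shape[1] + 1))  # hidden weights
    W = rng.normal(scale=0.5, size=(hidden + 1,))             # output weights
    for _ in range(epochs):
        for x, t in zip(X, T):
            xb = np.append(x, 1.0)              # input plus bias
            h = sigmoid(V @ xb)                 # hidden activations
            hb = np.append(h, 1.0)
            o = sigmoid(W @ hb)                 # network output
            d_o = o * (1 - o) * (t - o)         # delta_k, output layer
            d_h = h * (1 - h) * W[:-1] * d_o    # delta_h, propagated back
            W += eta * d_o * hb                 # w <- w + eta * delta * x
            V += eta * np.outer(d_h, xb)
    return V, W

def predict(V, W, x):
    return sigmoid(W @ np.append(sigmoid(V @ np.append(x, 1.0)), 1.0))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)   # XOR: not linearly separable
V, W = train(X, T)
print([round(float(predict(V, W, x)), 2) for x in X])
```

Note how `d_h` is computed from the output delta and the hidden-to-output weights: this is exactly the error propagating backwards through the layers.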
Practical Problems
Things to think about when using BackProp
- Sloooow: it is normal to require thousands of iterations through the dataset
- Gradient following: risk of getting stuck in local minima
- Many parameters: step size η, number of layers, number of hidden units, input and output representation, initial weights
Generalization
Risk of overfitting!

- The net normally interpolates smoothly between the data points, which results in good generalization
- If the network has too many degrees of freedom (weights), the risk increases that learning will find a "strange" solution
- Limiting the number of hidden units tends to improve generalization

[Figures: the same data points on [0, 1] fitted by a smooth curve (good generalization) and by a "strange" overfitted curve]