Classification III
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn
Overview
Artificial Neural Networks
Biological Motivation

- 10^11: The number of neurons in the human brain
- 10^4: The average number of connections of each neuron
- 10^-3 s: The fastest switching time of neurons
- 10^-10 s: The switching speed of computers
- 10^-1 s: The time required to visually recognize your mother
Biological Motivation

- The power of parallelism
  • The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons.
  • The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations.
- Sequential machines vs. Parallel machines
- Group A
  • Using ANN to study and model biological learning processes.
- Group B
  • Obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes.
Neural Network Representations
Robot vs. Human
Perceptrons

[Figure: a perceptron unit. Inputs x1, …, xn with weights w1, …, wn, plus a constant input x0 = 1 with bias weight w0, feed a summation node ∑ followed by a threshold that produces the output o.]

$$o = \begin{cases} 1, & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ 0, & \text{otherwise} \end{cases}$$

Equivalently,

$$o(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\ 0, & \text{otherwise} \end{cases}$$
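A minimal sketch (illustrative, not from the original slides) of this decision rule in Python, assuming the convention that x0 = 1 so that w0 acts as the bias:

```python
def perceptron_output(w, x):
    """w = [w0, w1, ..., wn], x = [x1, ..., xn]; returns 1 if the weighted sum > 0."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else 0
```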
Power of Perceptrons

Two perceptrons with inputs x1, x2 (plus x0 = 1) and a zero threshold:

AND: w0 = -0.8, w1 = 0.5, w2 = 0.5
Input (x1, x2)   Sum    Output
0  0             -0.8   0
0  1             -0.3   0
1  0             -0.3   0
1  1              0.2   1

OR: w0 = -0.3, w1 = 0.5, w2 = 0.5
Input (x1, x2)   Sum    Output
0  0             -0.3   0
0  1              0.2   1
1  0              0.2   1
1  1              0.7   1
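As a quick check (an illustrative snippet, not part of the original slides), the weights above reproduce both truth tables when plugged into a zero-threshold unit:

```python
# Verify that the given weights implement AND and OR with a zero threshold.
def unit(w0, w1, w2, x1, x2):
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

for name, (w0, w1, w2) in [("AND", (-0.8, 0.5, 0.5)), ("OR", (-0.3, 0.5, 0.5))]:
    table = [unit(w0, w1, w2, x1, x2) for x1 in (0, 1) for x2 in (0, 1)]
    print(name, table)   # AND -> [0, 0, 0, 1], OR -> [0, 1, 1, 1]
```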
Error Surface

[Figure: the error E plotted as a surface over the weight space (w1, w2).]
Gradient Descent

Training error as a function of the weights, summed over the training set D (batch learning):

$$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

Gradient of the error:

$$\nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right]$$

Weight update (η is the learning rate):

$$w_i \leftarrow w_i + \Delta w_i, \qquad \text{where } \Delta w_i = -\eta \, \frac{\partial E}{\partial w_i}$$
Delta Rule

For a linear unit, $o(\vec{x}) = \vec{w} \cdot \vec{x}$, so

$$\frac{\partial E}{\partial w_i}
= \frac{\partial}{\partial w_i} \, \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
= \frac{1}{2} \sum_{d \in D} \frac{\partial}{\partial w_i} (t_d - o_d)^2$$

$$= \frac{1}{2} \sum_{d \in D} 2\,(t_d - o_d) \, \frac{\partial}{\partial w_i} (t_d - o_d)
= \sum_{d \in D} (t_d - o_d) \, \frac{\partial}{\partial w_i} \left( t_d - \vec{w} \cdot \vec{x}_d \right)$$

$$= \sum_{d \in D} (t_d - o_d)(-x_{id})$$

Hence the delta rule:

$$\Delta w_i = -\eta \, \frac{\partial E}{\partial w_i} = \eta \sum_{d \in D} (t_d - o_d) \, x_{id}$$
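A short sketch of one batch delta-rule update for a linear unit (illustrative; the function and array names are assumptions, using NumPy):

```python
import numpy as np

def delta_rule_step(w, X, t, eta=0.05):
    """One batch update: Delta w_i = eta * sum_d (t_d - o_d) * x_id.
    X: one row per training example (include a leading column of 1s if a bias w0 is used)."""
    o = X @ w                          # linear outputs o_d = w . x_d for every example
    return w + eta * (X.T @ (t - o))   # add the summed, scaled error times inputs
```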
Batch Learning

GRADIENT_DESCENT(training_examples, η)
- Initialize each wi to some small random value.
- Until the termination condition is met, Do
  • Initialize each Δwi to zero.
  • For each <x, t> in training_examples, Do
    – Input the instance x to the unit and compute the output o.
    – For each linear unit weight wi, Do
      Δwi ← Δwi + η(t - o)xi
  • For each linear unit weight wi, Do
    wi ← wi + Δwi
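The pseudocode above maps directly onto a short Python loop; a minimal sketch (illustrative, with assumed variable names and a fixed epoch count as the termination condition):

```python
import random

def gradient_descent(training_examples, eta=0.05, epochs=100):
    """Batch gradient descent for a single linear unit.
    training_examples: list of (x, t) pairs, x a list of inputs (without x0), t a number."""
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]  # w[0] is the bias weight
    for _ in range(epochs):
        dw = [0.0] * (n + 1)
        for x, t in training_examples:
            xs = [1.0] + list(x)                             # prepend x0 = 1
            o = sum(wi * xi for wi, xi in zip(w, xs))        # linear output
            for i in range(n + 1):
                dw[i] += eta * (t - o) * xs[i]               # accumulate Delta w_i
        w = [wi + dwi for wi, dwi in zip(w, dw)]             # apply the batch update
    return w
```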
Stochastic Learning

Weights are updated after each individual training example:

$$w_i \leftarrow w_i + \Delta w_i, \qquad \text{where } \Delta w_i = \eta \, (t - o) \, x_i$$

For example, if xi = 0.8, η = 0.1, t = 1 and o = 0:

Δwi = η(t - o)xi = 0.1 × (1 - 0) × 0.8 = 0.08

[Figure: a set of + and - training examples in the plane with a linear decision boundary.]
Stochastic Learning: NAND

Threshold = 0.5, learning rate LR = 0.1, x0 = 1 for every sample.
Output o = 1 if the sum S > 0.5, else 0; error E = t - o; correction R = LR × E; each weight is updated by R × xi.

Input      Target   Initial Weights    xi · wi             Sum   Output  Error  Corr.   Final Weights
x0 x1 x2   t        w0   w1   w2       C0    C1    C2      S     o       E      R       w0   w1   w2
1  0  0    1        0    0    0        0     0     0       0     0       1      +0.1    0.1  0    0
1  0  1    1        0.1  0    0        0.1   0     0       0.1   0       1      +0.1    0.2  0    0.1
1  1  0    1        0.2  0    0.1      0.2   0     0       0.2   0       1      +0.1    0.3  0.1  0.1
1  1  1    0        0.3  0.1  0.1      0.3   0.1   0.1     0.5   0       0      0       0.3  0.1  0.1
1  0  0    1        0.3  0.1  0.1      0.3   0     0       0.3   0       1      +0.1    0.4  0.1  0.1
1  0  1    1        0.4  0.1  0.1      0.4   0     0.1     0.5   0       1      +0.1    0.5  0.1  0.2
1  1  0    1        0.5  0.1  0.2      0.5   0.1   0       0.6   1       0      0       0.5  0.1  0.2
1  1  1    0        0.5  0.1  0.2      0.5   0.1   0.2     0.8   1       -1     -0.1    0.4  0    0.1
1  0  0    1        0.4  0    0.1      0.4   0     0       0.4   0       1      +0.1    0.5  0    0.1
...

Training continues in this fashion until the weights converge to w0 = 0.8, w1 = -0.2, w2 = -0.1, which implement NAND with threshold 0.5.
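The run above can be replayed in a few lines. A minimal sketch (illustrative, not from the slides), using Fraction so that the 0.1-sized steps stay exact:

```python
from fractions import Fraction as F

# Replay the NAND training run: x0 = 1, threshold = 0.5, learning rate = 0.1.
samples = [((1, 0, 0), 1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]  # ((x0,x1,x2), t)
lr, threshold = F(1, 10), F(1, 2)
w = [F(0), F(0), F(0)]
for epoch in range(20):                            # enough passes for the weights to settle
    for x, t in samples:
        s = sum(wi * xi for wi, xi in zip(w, x))   # weighted sum S
        o = 1 if s > threshold else 0              # threshold output
        r = lr * (t - o)                           # correction R = LR * (t - o)
        w = [wi + r * xi for wi, xi in zip(w, x)]  # update each weight by R * x_i
print([float(wi) for wi in w])                     # -> [0.8, -0.2, -0.1]
```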
Multilayer Perceptron
XOR

$$p \oplus q = p\bar{q} + \bar{p}q = (p + q)\,\overline{(p \cdot q)}$$

Input (p, q)   Output
0  0           0
0  1           1
1  0           1
1  1           0

[Figure: the four XOR points in the (p, q) plane; the positive points (0,1) and (1,0) lie on the opposite diagonal from the negative points (0,0) and (1,1).]

Cannot be separated by a single line.
XOR

$$p \oplus q = (p + q)\,\overline{(p \cdot q)} = \text{OR}(p, q) \;\text{AND}\; \text{NAND}(p, q)$$

[Figure: a two-layer network for XOR. Inputs p and q feed an OR unit and a NAND unit; both hidden outputs feed an AND unit, whose output separates the + points from the - points.]
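An illustrative sketch of this construction (the OR and AND weights follow the earlier slide; the NAND weights are an assumption chosen to fit the same zero-threshold unit):

```python
def unit(w0, w1, w2, x1, x2):
    """Threshold unit: fires iff w0 + w1*x1 + w2*x2 > 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

def xor(p, q):
    h_or   = unit(-0.3,  0.5,  0.5, p, q)        # OR unit (weights from the earlier slide)
    h_nand = unit( 0.8, -0.5, -0.5, p, q)        # NAND unit (assumed weights)
    return unit(-0.8, 0.5, 0.5, h_or, h_nand)    # AND of the two hidden outputs

print([xor(p, q) for p, q in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```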
Hidden Layer Representations

[Figure: the two-layer XOR network, with hidden units computing OR and NAND of (p, q) and an output unit computing their AND.]

Input (p, q)   Hidden (OR, NAND)   Output (AND)
0  0           0  1                0
0  1           1  1                1
1  0           1  1                1
1  1           1  0                0
The Sigmoid Threshold Unit

[Figure: a sigmoid unit. Inputs x1, …, xn with weights w1, …, wn and a constant input x0 = 1 with bias weight w0 feed a summation node ∑ followed by a sigmoid.]

$$net = \sum_{i=0}^{n} w_i x_i, \qquad o = \sigma(net) = \frac{1}{1 + e^{-net}}$$

Sigmoid function:

$$\sigma(y) = \frac{1}{1 + e^{-y}}, \qquad \frac{d\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))$$
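A small illustrative sketch of the sigmoid and its derivative, as used in the backpropagation derivation below:

```python
import math

def sigmoid(y):
    """sigma(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_prime(y):
    """d sigma / dy = sigma(y) * (1 - sigma(y))."""
    s = sigmoid(y)
    return s * (1.0 - s)
```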
Backpropagation Rule

Error on training example d, summed over the output units:

$$E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$$

Weight update:

$$\Delta w_{ji} = -\eta \, \frac{\partial E_d}{\partial w_{ji}}, \qquad
\frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} \, x_{ji}$$

Notation:
• xji = the ith input to unit j
• wji = the weight associated with the ith input to unit j
• netj = ∑i wji xji (the weighted sum of inputs for unit j)
• oj = the output of unit j
• tj = the target output of unit j
• σ = the sigmoid function
• outputs = the set of units in the final layer
• Downstream(j) = the set of units directly taking the output of unit j as inputs
Training Rule for Output Units

$$\frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j}$$

$$\frac{\partial E_d}{\partial o_j}
= \frac{\partial}{\partial o_j} \, \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2
= \frac{\partial}{\partial o_j} \, \frac{1}{2} (t_j - o_j)^2
= \frac{1}{2} \cdot 2\,(t_j - o_j)\,\frac{\partial (t_j - o_j)}{\partial o_j}
= -(t_j - o_j)$$

$$\frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j)$$

$$\frac{\partial E_d}{\partial net_j} = -(t_j - o_j)\, o_j (1 - o_j)$$

$$\Delta w_{ji} = -\eta \, \frac{\partial E_d}{\partial w_{ji}} = \eta \, (t_j - o_j)\, o_j (1 - o_j)\, x_{ji}$$
Training Rule for Hidden Units

$$\frac{\partial E_d}{\partial net_j}
= \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k} \cdot \frac{\partial net_k}{\partial net_j}
= \sum_{k \in Downstream(j)} -\delta_k \cdot \frac{\partial net_k}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j}$$

$$= \sum_{k \in Downstream(j)} -\delta_k \, w_{kj} \, \frac{\partial o_j}{\partial net_j}
= \sum_{k \in Downstream(j)} -\delta_k \, w_{kj} \, o_j (1 - o_j)$$

where $\delta_k = -\dfrac{\partial E_d}{\partial net_k}$. Hence

$$\delta_j = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k \, w_{kj}, \qquad \Delta w_{ji} = \eta \, \delta_j \, x_{ji}$$
BP Framework

BACKPROPAGATION(training_examples, η, nin, nout, nhidden)
- Create a network with nin inputs, nhidden hidden units and nout output units.
- Initialize all network weights to small random numbers.
- Until the termination condition is met, Do
  • For each <x, t> in training_examples, Do
    – Input the instance x to the network and compute the output o of every unit.
    – For each output unit k, calculate its error term δk:
      δk ← ok (1 - ok)(tk - ok)
    – For each hidden unit h, calculate its error term δh:
      δh ← oh (1 - oh) ∑k∈outputs wkh δk
    – Update each network weight wji:
      wji ← wji + Δwji, where Δwji = η δj xji
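A compact sketch of this loop for a network with one hidden layer of sigmoid units (illustrative; the initialization range, network sizes and variable names are assumptions):

```python
import math, random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def backpropagation(training_examples, eta=0.5, n_in=2, n_hidden=2, n_out=1, epochs=5000):
    """Stochastic BP; training_examples is a list of (x, t) pairs of lists."""
    # weights[j][i]: weight from input i (index 0 is the bias x0 = 1) to unit j
    w_h = [[random.uniform(-0.05, 0.05) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[random.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, t in training_examples:
            xs = [1.0] + list(x)                                        # input with bias
            h = [1.0] + [sigmoid(sum(wi * xi for wi, xi in zip(wj, xs))) for wj in w_h]
            o = [sigmoid(sum(wi * hi for wi, hi in zip(wk, h))) for wk in w_o]
            # error terms: delta_k for the output units, delta_h for the hidden units
            d_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            d_h = [h[j + 1] * (1 - h[j + 1]) * sum(w_o[k][j + 1] * d_o[k] for k in range(n_out))
                   for j in range(n_hidden)]
            # weight updates: w_ji <- w_ji + eta * delta_j * x_ji
            for k in range(n_out):
                w_o[k] = [wi + eta * d_o[k] * hi for wi, hi in zip(w_o[k], h)]
            for j in range(n_hidden):
                w_h[j] = [wi + eta * d_h[j] * xi for wi, xi in zip(w_h[j], xs)]
    return w_h, w_o
```

For instance, calling backpropagation([([0,0],[0]), ([0,1],[1]), ([1,0],[1]), ([1,1],[0])]) trains a 2-2-1 network on XOR; as the next slide notes, whether it converges can depend on the random initial weights.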
More about BP Networks …

- Convergence and Local Minima
  • The search space is likely to be highly multimodal.
  • May easily get stuck at a local solution.
  • Need multiple trials with different initial weights.
- Evolving Neural Networks
  • Black-box optimization techniques (e.g., Genetic Algorithms)
  • Usually better accuracy
  • Can do some advanced training (e.g., structure + parameters).
  • Xin Yao (1999), "Evolving Artificial Neural Networks", Proceedings of the IEEE, pp. 1423-1447.
- Representational Power
- Deep Learning
More about BP Networks …

- Overfitting
  • Tends to occur during later iterations.
  • Use a validation dataset to terminate the training when necessary.
- Practical Considerations
  • Momentum: Δwji(n) = η δj xji + α Δwji(n-1)
  • Adaptive learning rate
    – Small: slow convergence, easy to get stuck
    – Large: fast convergence, unstable

[Figure: training error and validation error versus training time; the validation error starts to rise while the training error keeps decreasing.]
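A one-line sketch of the momentum update above (illustrative; the function and argument names are assumptions):

```python
def momentum_update(prev_delta, eta, delta_j, x_ji, alpha=0.9):
    """Delta w_ji(n) = eta * delta_j * x_ji + alpha * Delta w_ji(n-1)."""
    return eta * delta_j * x_ji + alpha * prev_delta
```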
Beyond BP Networks

Temporal XOR task:
In:  0 1 1 0 0 0 1 1 0 1 0 1 …
Out: ? 1 ?  ? 0 ?  ? 0 ?  ? 1 ?  …
(Every third output is the XOR of the two preceding input bits.)

[Figure: an Elman Network, a simple recurrent network.]
Beyond BP Networks
Hopfield Network
Energy Landscape of Hopfield Network
Beyond BP Networks
When does ANN work?

- Instances are represented by attribute-value pairs.
  • Input values can be any real values.
- The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
- The training samples may contain errors.
- Long training times are acceptable.
  • Can range from a few seconds to several hours.
- Fast evaluation of the learned target function may be required.
- The ability to understand the learned function is not important.
  • Weights are difficult for humans to interpret.
Reading Materials

- Text Book
  • Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons, Inc.
  • Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill.
  • http://page.mi.fu-berlin.de/rojas/neural/index.html.html
- Online Demo
  • http://neuron.eng.wayne.edu/software.html
  • http://www.cbu.edu/~pong/ai/hopfield/hopfield.html
- Online Tutorial
  • http://www.autonlab.org/tutorials/neural13.pdf
  • http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html
- Wikipedia & Google
Review

- What is the biological motivation of ANN?
- When does ANN work?
- What is a perceptron?
- How to train a perceptron?
- What is the limitation of perceptrons?
- How does ANN solve non-linearly separable problems?
- What is the key idea of the Backpropagation algorithm?
- What are the main issues of BP networks?
- What are some examples of other types of ANN?
Next Week's Class Talk

- Volunteers are required for next week's class talk.
- Topic 1: Applications of ANN
- Topic 2: Recurrent Neural Networks
- Hints:
  • Robot Driving
  • Character Recognition
  • Face Recognition
  • Hopfield Network
- Length: 20 minutes plus question time
Assignment

- Topic: Training Feedforward Neural Networks
- Technique: BP Algorithm
- Task 1: XOR Problem
  • 4 input samples, e.g.:
    – 0 0 → 0
    – 1 0 → 1
- Task 2: Identity Function
  • 8 input samples:
    – 10000000 → 10000000
    – 00010000 → 00010000
    – …
  • Use 3 hidden units.
- Deliverables:
  • Report
  • Code (any programming language, with detailed comments)
- Due: Sunday, 14 December
- Credit: 15%