Classification III
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn

Overview
- Artificial Neural Networks

Biological Motivation
- 10^11: the number of neurons in the human brain
- 10^4: the average number of connections per neuron
- 10^-3 s: the fastest switching time of neurons
- 10^-10 s: the switching speed of computers
- 10^-1 s: the time required to visually recognize your mother

Biological Motivation
- The power of parallelism
  - The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons.
  - The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations.
- Sequential machines vs. parallel machines
- Two research communities:
  - Group A: using ANN to study and model biological learning processes.
  - Group B: obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes.

Neural Network Representations
(figure)

Robot vs. Human
(figure)

Perceptrons
(figure: inputs x_1, ..., x_n with weights w_1, ..., w_n, a constant input x_0 = 1 with bias weight w_0, a summation unit and a threshold output)

    o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\ 0 & \text{otherwise} \end{cases}

Equivalently, with x_0 = 1:

    o = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}

Power of Perceptrons
- AND (w_0 = -0.8, w_1 = w_2 = 0.5)

    Input    Sum    Output
    0 0     -0.8    0
    0 1     -0.3    0
    1 0     -0.3    0
    1 1      0.2    1

- OR (w_0 = -0.3, w_1 = w_2 = 0.5)

    Input    Sum    Output
    0 0     -0.3    0
    0 1      0.2    1
    1 0      0.2    1
    1 1      0.7    1

Error Surface
(figure: error surface plotted over the weights w_1 and w_2)

Gradient Descent (Batch Learning)

    E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2

    \nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right]

    w_i \leftarrow w_i + \Delta w_i, \quad \Delta w_i = -\eta \frac{\partial E}{\partial w_i}

where η is the learning rate.

Delta Rule
For a linear unit, o(\vec{x}) = \vec{w} \cdot \vec{x}, so

    \frac{\partial E}{\partial w_i}
      = \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
      = \frac{1}{2} \sum_{d \in D} 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d)
      = \sum_{d \in D} (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - \vec{w} \cdot \vec{x}_d)
      = \sum_{d \in D} (t_d - o_d)(-x_{id})

Batch Learning
GRADIENT_DESCENT(training_examples, η)
- Initialize each w_i to some small random value.
- Until the termination condition is met, do:
  - Initialize each Δw_i to zero.
  - For each <x, t> in training_examples, do:
    - Input the instance x to the unit and compute the output o.
    - For each linear unit weight w_i, do: Δw_i ← Δw_i + η(t - o)x_i
  - For each linear unit weight w_i, do: w_i ← w_i + Δw_i

Stochastic Learning

    w_i \leftarrow w_i + \Delta w_i, \quad \Delta w_i = \eta (t - o) x_i

For example, if x_i = 0.8, η = 0.1, t = 1 and o = 0:

    Δw_i = η(t - o)x_i = 0.1 × (1 - 0) × 0.8 = 0.08

(figure: a set of + and - examples separated by a line)

Stochastic Learning: NAND
Threshold = 0.5, learning rate = 0.1, all weights initialized to zero. Each row takes one training sample (x_0 = 1, x_1, x_2) with target t, computes the weighted sum S = x_0 w_0 + x_1 w_1 + x_2 w_2 and the threshold output o, and applies the correction Δw_i = η(t - o)x_i. (A code sketch of this procedure is given below, after the hidden-layer discussion.)

    x0 x1 x2   t   w0   w1   w2    S     o   t-o   new w0  w1   w2
    1  0  0    1   0    0    0     0     0   +1    0.1     0    0
    1  0  1    1   0.1  0    0     0.1   0   +1    0.2     0    0.1
    1  1  0    1   0.2  0    0.1   0.2   0   +1    0.3     0.1  0.1
    1  1  1    0   0.3  0.1  0.1   0.5   0    0    0.3     0.1  0.1
    1  0  0    1   0.3  0.1  0.1   0.3   0   +1    0.4     0.1  0.1
    1  0  1    1   0.4  0.1  0.1   0.5   0   +1    0.5     0.1  0.2
    1  1  0    1   0.5  0.1  0.2   0.6   1    0    0.5     0.1  0.2
    1  1  1    0   0.5  0.1  0.2   0.8   1   -1    0.4     0    0.1
    1  0  0    1   0.4  0    0.1   0.4   0   +1    0.5     0    0.1
    ...
    (training continues until the weights settle at approximately 0.8, -0.2, -0.1)

Multilayer Perceptron
(figure: a feedforward network with one hidden layer)

XOR

    p ⊕ q = p·¬q ∨ ¬p·q = (p ∨ q) ∧ ¬(p ∧ q)

    p  q   p ⊕ q
    0  0   0
    0  1   1
    1  0   1
    1  1   0

The positive and negative examples of XOR cannot be separated by a single line.

XOR

    p ⊕ q = (p ∨ q) ∧ ¬(p ∧ q) = AND(OR(p, q), NAND(p, q))

A two-layer network solves XOR: one hidden unit computes OR(p, q), the other computes NAND(p, q), and the output unit computes the AND of the two hidden outputs.

Hidden Layer Representations

    Input (p q)   Hidden (OR  NAND)   Output
    0 0            0    1              0
    0 1            1    1              1
    1 0            1    1              1
    1 1            1    0              0
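To make the perceptron update concrete before moving on to sigmoid units, here is a minimal Python sketch of the stochastic learning rule applied to the NAND example above. It is illustrative only and not part of the original slides: the function names and the 100-epoch cap are assumptions, while the threshold (0.5), learning rate (0.1) and zero initial weights follow the worked table.

# Stochastic perceptron learning on NAND (threshold = 0.5, learning rate = 0.1).
def step(s, threshold=0.5):
    """Threshold output: fire only when the weighted sum exceeds the threshold."""
    return 1 if s > threshold else 0

def train_perceptron(samples, eta=0.1, max_epochs=100):
    w = [0.0] * len(samples[0][0])            # one weight per input; w[0] is the bias weight for x0 = 1
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in samples:
            o = step(sum(wi * xi for wi, xi in zip(w, x)))
            if o != t:                        # the update w_i += eta*(t - o)*x_i is a no-op when o == t
                mistakes += 1
                for i, xi in enumerate(x):
                    w[i] += eta * (t - o) * xi
        if mistakes == 0:                     # a full error-free pass: training has converged
            break
    return w

# NAND training set: each sample is ((x0 = 1, x1, x2), target).
nand = [((1, 0, 0), 1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]
w = train_perceptron(nand)
print(w)                                      # the worked table above ends near (0.8, -0.2, -0.1)
print([step(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in nand])  # NAND outputs: [1, 1, 1, 0]

The same loop learns AND or OR by swapping the training set; it can never learn XOR, which is exactly the limitation the multilayer discussion above addresses.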
The Sigmoid Threshold Unit
(figure: inputs x_1, ..., x_n with weights w_1, ..., w_n and x_0 = 1 with weight w_0; the unit computes net = \sum_{i=0}^{n} w_i x_i and outputs o = \sigma(net))

    \sigma(y) = \frac{1}{1 + e^{-y}}, \qquad \frac{d\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))

    o = \sigma(net) = \frac{1}{1 + e^{-net}}

Backpropagation Rule

    E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2

    \Delta w_{ji} = -\eta \frac{\partial E_d}{\partial w_{ji}}, \qquad
    \frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} \frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\, x_{ji}

The error term of unit j is defined as \delta_j = -\partial E_d / \partial net_j.

Notation:
- x_ji = the i-th input to unit j
- w_ji = the weight associated with the i-th input to unit j
- net_j = Σ_i w_ji x_ji (the weighted sum of inputs for unit j)
- o_j = the output of unit j
- t_j = the target output of unit j
- σ = the sigmoid function
- outputs = the set of units in the final layer
- Downstream(j) = the set of units directly taking the output of unit j as inputs

Training Rule for Output Units

    \frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j} \frac{\partial o_j}{\partial net_j}

    \frac{\partial E_d}{\partial o_j}
      = \frac{\partial}{\partial o_j} \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2
      = \frac{\partial}{\partial o_j} \frac{1}{2} (t_j - o_j)^2
      = \frac{1}{2}\, 2 (t_j - o_j) \frac{\partial (t_j - o_j)}{\partial o_j}
      = -(t_j - o_j)

    \frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j)

    \frac{\partial E_d}{\partial net_j} = -(t_j - o_j)\, o_j (1 - o_j)

    \Delta w_{ji} = -\eta \frac{\partial E_d}{\partial w_{ji}} = \eta\, (t_j - o_j)\, o_j (1 - o_j)\, x_{ji}

Training Rule for Hidden Units

    \frac{\partial E_d}{\partial net_j}
      = \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k} \frac{\partial net_k}{\partial net_j}
      = \sum_{k \in Downstream(j)} -\delta_k \frac{\partial net_k}{\partial o_j} \frac{\partial o_j}{\partial net_j}
      = \sum_{k \in Downstream(j)} -\delta_k\, w_{kj}\, o_j (1 - o_j)

    \delta_j = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k w_{kj}, \qquad
    \Delta w_{ji} = \eta\, \delta_j\, x_{ji}

BP Framework
BACKPROPAGATION(training_examples, η, n_in, n_out, n_hidden)
- Create a network with n_in inputs, n_hidden hidden units and n_out output units.
- Initialize all network weights to small random numbers.
- Until the termination condition is met, do:
  - For each <x, t> in training_examples, do:
    - Input the instance x to the network and compute the output o of every unit.
    - For each output unit k, calculate its error term: δ_k ← o_k(1 - o_k)(t_k - o_k)
    - For each hidden unit h, calculate its error term: δ_h ← o_h(1 - o_h) Σ_{k∈outputs} w_kh δ_k
    - Update each network weight: w_ji ← w_ji + Δw_ji, where Δw_ji = η δ_j x_ji

(A code sketch of this procedure is given below, after the "When does ANN work?" summary.)

More about BP Networks ...
- Convergence and local minima
  - The search space is likely to be highly multimodal.
  - Training may easily get stuck at a local solution.
  - Multiple trials with different initial weights are needed.
- Evolving neural networks
  - Black-box optimization techniques (e.g., Genetic Algorithms)
  - Usually better accuracy
  - Can do some advanced training (e.g., structure + parameters)
  - Xin Yao (1999), "Evolving Artificial Neural Networks", Proceedings of the IEEE, pp. 1423-1447.
- Representational power
- Deep learning

More about BP Networks ...
- Overfitting
  - Tends to occur during later iterations.
  - Use a validation dataset to terminate training when necessary.
    (figure: training error keeps falling with training time while validation error eventually rises)
- Practical considerations
  - Momentum: \Delta w_{ji}(n) = \eta\, \delta_j\, x_{ji} + \alpha\, \Delta w_{ji}(n-1), where α is the momentum coefficient
  - Adaptive learning rate
    - Small: slow convergence, easy to get stuck
    - Large: fast convergence, unstable

Beyond BP Networks
- Elman network (recurrent)
  - Example task: predict the next bit of a stream in which every third bit is the XOR of the two bits before it; only those bits are predictable.
    In:  0 1 1 0 0 0 1 1 0 1 0 1 ...
    Out: ? 1 ? ? 0 ? ? 0 ? ? 1 ? ...

Beyond BP Networks
- Hopfield network
  (figure: a Hopfield network and its energy landscape)

When does ANN work?
- Instances are represented by attribute-value pairs. Input values can be any real values.
- The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
- The training samples may contain errors.
- Long training times are acceptable, ranging from a few seconds to several hours.
- Fast evaluation of the learned target function may be required.
- The ability to understand the learned function is not important: weights are difficult for humans to interpret.
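To connect the BACKPROPAGATION procedure and the δ rules above to running code, here is a compact Python sketch of stochastic backpropagation for a 2-input, 2-hidden-unit, 1-output sigmoid network trained on the XOR task from earlier. The network size, random seed, learning rate and epoch count are illustrative assumptions rather than values from the slides, and, as noted under "Convergence and Local Minima", a run may stall in a local minimum and need different initial weights.

import math
import random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

random.seed(0)                       # assumption: any small random initialization is acceptable in principle
n_hidden = 2
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hidden)]  # per hidden unit: [bias, w1, w2]
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]                  # output unit: [bias, v1, v2]

def forward(x1, x2):
    x = (1, x1, x2)                                                    # x0 = 1 carries the bias weight
    h = [sigmoid(sum(w * xi for w, xi in zip(wh, x))) for wh in w_hid]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, (1, *h))))
    return x, h, o

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
eta = 0.5

for _ in range(10000):
    for (x1, x2), t in xor_data:
        x, h, o = forward(x1, x2)
        delta_o = o * (1 - o) * (t - o)                                # output error term
        delta_h = [h[j] * (1 - h[j]) * w_out[j + 1] * delta_o          # hidden error terms (Downstream = the output unit)
                   for j in range(n_hidden)]
        for j, hj in enumerate((1, *h)):                               # w_ji <- w_ji + eta * delta_j * x_ji
            w_out[j] += eta * delta_o * hj
        for j in range(n_hidden):
            for i, xi in enumerate(x):
                w_hid[j][i] += eta * delta_h[j] * xi

for (x1, x2), t in xor_data:
    print((x1, x2), round(forward(x1, x2)[2], 2), t)                   # on a successful run, outputs approach the targets

Inspecting the hidden unit activations after training gives a learned analogue of the hidden layer representation table shown for the hand-built XOR network.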
Reading Materials
- Text book
  - Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons Inc.
  - Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill.
  - http://page.mi.fu-berlin.de/rojas/neural/index.html.html
- Online demo
  - http://neuron.eng.wayne.edu/software.html
  - http://www.cbu.edu/~pong/ai/hopfield/hopfield.html
- Online tutorial
  - http://www.autonlab.org/tutorials/neural13.pdf
  - http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html
- Wikipedia & Google

Review
- What is the biological motivation of ANN?
- When does ANN work?
- What is a perceptron? How do we train a perceptron?
- What is the limitation of perceptrons?
- How does ANN solve non-linearly separable problems?
- What is the key idea of the Backpropagation algorithm?
- What are the main issues of BP networks?
- What are some examples of other types of ANN?

Next Week's Class Talk
- Volunteers are required for next week's class talk.
- Topic 1: Applications of ANN
- Topic 2: Recurrent Neural Networks
- Hints: Robot Driving, Character Recognition, Face Recognition, Hopfield Network
- Length: 20 minutes plus question time

Assignment
- Topic: Training Feedforward Neural Networks
- Technique: BP Algorithm
- Task 1: XOR Problem
  - 4 input samples, e.g. 0 0 → 0 and 1 0 → 1
- Task 2: Identity Function
  - 8 input samples, e.g. 10000000 → 10000000, 00010000 → 00010000, ...
  - Use 3 hidden units
- Deliverables
  - Report
  - Code (any programming language, with detailed comments)
- Due: Sunday, 14 December
- Credit: 15%