Lecture 25
Artificial Intelligence: Recognition Tasks (S&G, §13.4)

The Cognitive Inversion
• Computers can do some things very well that are difficult for people
  – e.g., arithmetic calculations
  – playing chess & other board games
  – doing proofs in formal logic & mathematics
  – handling large amounts of data precisely
• But computers are very bad at some things that are easy for people (and even some animals)
  – e.g., face recognition & general object recognition
  – autonomous locomotion
  – sensory-motor coordination
• Conclusion: brains work very differently from Von Neumann computers

The 100-Step Rule
• Typical recognition tasks take less than one second
• Neurons take several milliseconds to fire
• Therefore there can be at most about 100 sequential processing steps (about one second divided by several milliseconds per firing)

Two Different Approaches to Computing
• Von Neumann: Narrow but Deep
• Neural Computation: Shallow but Wide

How Wide?
• Retina: 100 million receptors
• Optic nerve: one million nerve fibers

A Very Brief Tour of Real Neurons (and Real Brains)

[figure] (fig. from internet)

Left Hemisphere [figure]

Typical Neuron [figure]

Grey Matter vs. White Matter (fig. from Carter 1998)

Neural Density in Cortex
• 148,000 neurons / sq. mm
• Hence, about 15 million / sq. cm

Relative Cortical Areas [figure]

Information Representation in the Brain

Macaque Visual System (fig. from Van Essen et al. 1992)

Hierarchy of Macaque Visual Areas (fig. from Van Essen et al. 1992)

Bat Auditory Cortex (figs. from Suga 1985)

Real Neurons

Typical Neuron [figure]

Dendritic Trees of Some Neurons (fig. from Truex & Carpenter 1964)
A. inferior olivary nucleus
B. granule cell of cerebellar cortex
C. small cell of reticular formation
D. small gelatinosa cell of spinal trigeminal nucleus
E. ovoid cell, nucleus of tractus solitarius
F. large cell of reticular formation
G. spindle-shaped cell, substantia gelatinosa of spinal cord
H. large cell of spinal trigeminal nucleus
I. putamen of lenticular nucleus
J. double pyramidal cell, Ammon’s horn of hippocampal cortex
K. thalamic nucleus
L. globus pallidus of lenticular nucleus

Layers and Minicolumns (fig. from Arbib 1995, p. 270)

Neural Networks in Visual System of Frog (fig. from Arbib 1995, p. 1039)

Frequency Coding (fig. from Anderson, Intr. Neur. Nets)

Variations in Spiking Behavior [figure]

Chemical Synapse (fig. from Anderson, Intr. Neur. Nets)
1. Action potential arrives at synapse
2. Ca ions enter cell
3. Vesicles move to membrane, release neurotransmitter
4. Transmitter crosses cleft, causes postsynaptic voltage change

Neurons are Not Logic Gates
• Speed
  – electronic logic gates are very fast (nanoseconds)
  – neurons are comparatively slow (milliseconds)
• Precision
  – logic gates are highly reliable digital devices
  – neurons are imprecise analog devices
• Connections
  – logic gates have few inputs (usually 1 to 3)
  – many neurons have >100,000 inputs

Artificial Neural Networks

Typical Artificial Neuron
[figure: inputs, connection weights, threshold, output]

Typical Artificial Neuron
[figure: net input, summation, activation function]

Operation of Artificial Neuron (1)
• Inputs (0, 1, 1), weights (2, –1, 4), threshold 2
• Net input: 0·2 + 1·(–1) + 1·4 = 3
• 3 > 2, so the output is 1

Operation of Artificial Neuron (2)
• Inputs (1, 1, 0), weights (2, –1, 4), threshold 2
• Net input: 1·2 + 1·(–1) + 0·4 = 1
• 1 < 2, so the output is 0
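To make the threshold computation concrete, here is a minimal sketch (not part of the original lecture) of such a threshold unit in Python; the function name threshold_neuron is illustrative, and the weights (2, –1, 4) and threshold 2 are taken from the two examples above.

```python
# Illustrative sketch of the threshold unit from the two slide examples.
# Weights (2, -1, 4) and threshold 2 come from the slides; the code is not.

def threshold_neuron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net > threshold else 0

weights = [2, -1, 4]

print(threshold_neuron([0, 1, 1], weights, 2))  # net = 0*2 + 1*(-1) + 1*4 = 3 > 2 -> 1
print(threshold_neuron([1, 1, 0], weights, 2))  # net = 1*2 + 1*(-1) + 0*4 = 1 < 2 -> 0
```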
Feedforward Network
[figure: input layer, hidden layers, output layer]

Feedforward Network In Operation (1)–(8)
[figures: an input pattern is presented to the input layer and activation flows forward through the hidden layers, one layer per step, until the output layer produces the network's output]

Comparison with Non-Neural Net Approaches
• Non-NN approaches typically decide the output from a small number of dominant factors
• NNs typically look at a large number of factors, each of which weakly influences the output
• NNs permit:
  – subtle discriminations
  – holistic judgments
  – context sensitivity

Connectionist Architectures
• The knowledge is implicit in the connection weights between the neurons
• Items of knowledge are not stored in dedicated memory locations, as in a Von Neumann computer
• “Holographic” knowledge representation:
  – each knowledge item is distributed over many connections
  – each connection encodes many knowledge items
• Memory & processing are robust in the face of damage, errors, inaccuracy, noise, …

Differences from Digital Calculation
• Information is represented in continuous images (rather than language-like structures)
• Information processing is continuous image processing (rather than explicit rules applied in individual steps)
• Indefiniteness is inevitable (rather than definiteness being assumed)

Supervised Learning
• Produce the desired outputs for the training inputs
• Generalize reasonably & appropriately to other inputs
• Good example: pattern recognition
• Neural nets are trained rather than programmed
  – another difference from Von Neumann computation

Learning for Output Neuron (1)
• Inputs (0, 1, 1), weights (2, –1, 4), threshold 2: net input 3, and 3 > 2, so the output is 1
• Desired output: 1, so no change!

Learning for Output Neuron (2)
• Same inputs (0, 1, 1), but desired output 0: the output is too large
• Lower the weights on the active inputs (–1 becomes –1.5, 4 becomes 3.25), so the net input drops from 3 to 1.75
• Now 1.75 < 2, so the output is 0

Learning for Output Neuron (3)
• Inputs (1, 1, 0), weights (2, –1, 4), threshold 2: net input 1, and 1 < 2, so the output is 0
• Desired output: 0, so no change!

Learning for Output Neuron (4)
• Same inputs (1, 1, 0), but desired output 1: the output is too small
• Raise the weights on the active inputs (2 becomes 2.75, –1 becomes –0.5), so the net input rises from 1 to 2.25
• Now 2.25 > 2, so the output is 1
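The four examples above follow a simple perceptron-style rule: leave the weights alone when the output is correct, lower the weights on the active inputs when the output is too large, and raise them when it is too small. Below is a minimal sketch of that rule; the helper name train_step and the single fixed learning rate are assumptions made here (the slides change each weight by a slightly different amount).

```python
# Sketch of the weight-adjustment rule illustrated in "Learning for Output
# Neuron (1)-(4)". The single learning rate of 0.5 is an assumption; the
# slides adjust each active weight by a slightly different amount.

def train_step(inputs, weights, threshold, desired, rate=0.5):
    net = sum(x * w for x, w in zip(inputs, weights))
    actual = 1 if net > threshold else 0
    error = desired - actual   # +1: output too small, -1: too large, 0: correct
    # only the weights on active (nonzero) inputs change
    return [w + rate * error * x for w, x in zip(weights, inputs)]

w = [2, -1, 4]
w = train_step([0, 1, 1], w, 2, desired=1)  # output 1 is already correct: no change
w = train_step([0, 1, 1], w, 2, desired=0)  # output too large: active weights lowered
print(w)  # the weights on the second and third inputs have gone down
```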
Credit Assignment Problem
• We are given the desired output only for the output layer
• So how do we adjust the weights of the hidden layers?
[figure: input layer, hidden layers, output layer, desired output]

Back-Propagation
[figures: one diagram per step, showing the input, the hidden layers, the output layer, and the desired output]
1. Forward pass
2. Actual vs. desired output
3. Correct output neuron weights
4. Correct last hidden layer
5. Correct all hidden layers, working backward layer by layer
6. Correct first hidden layer

Use of Back-Propagation (BP)
• Typically the weights are changed slowly
• Typically the net will not give correct outputs for all training inputs after one adjustment
• Each input/output pair is used repeatedly for training
• BP may be slow
• But there are many better ANN learning algorithms

ANN Training Procedures
• Supervised training: we show the net the output it should produce for each training input (e.g., BP)
• Reinforcement training: we tell the net whether its output is right or wrong, but not what the correct output is
• Unsupervised training: the net attempts to find patterns in its environment without external guidance

Applications of ANNs
• “Neural nets are the second-best way of doing everything”
• If you really understand a problem, you can design a special-purpose algorithm for it, which will beat a NN
• However, if you don’t understand your problem very well, you can generally train a NN to do it well enough
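To make the back-propagation procedure sketched above concrete, here is a compact sketch of a training loop for a tiny one-hidden-layer network. It is illustrative only, not the lecture's own example: the lecture's diagrams use threshold units, while this sketch substitutes smooth sigmoid units (so that gradients exist) and uses XOR as stand-in training data; the network size, learning rate, and number of passes are all assumptions.

```python
# Compact sketch of back-propagation for a one-hidden-layer network.
# Assumptions: sigmoid units, XOR as training data, 3 hidden units,
# learning rate 0.5, 10,000 passes over the data.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
N_HID = 3
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(N_HID)]  # per hidden unit: 2 input weights + bias
W2 = [random.uniform(-1, 1) for _ in range(N_HID + 1)]                  # hidden-to-output weights + bias
RATE = 0.5

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in W1]
    y = sigmoid(sum(W2[j] * h[j] for j in range(N_HID)) + W2[N_HID])
    return h, y

samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
for _ in range(10000):                  # each input/output pair is used repeatedly
    for x, target in samples:
        h, y = forward(x)               # forward pass
        d_out = (target - y) * y * (1 - y)                                 # output-layer error signal
        d_hid = [d_out * W2[j] * h[j] * (1 - h[j]) for j in range(N_HID)]  # error passed back to hidden units
        for j in range(N_HID):          # correct output-neuron weights first
            W2[j] += RATE * d_out * h[j]
        W2[N_HID] += RATE * d_out
        for j in range(N_HID):          # then correct the hidden-layer weights
            W1[j][0] += RATE * d_hid[j] * x[0]
            W1[j][1] += RATE * d_hid[j] * x[1]
            W1[j][2] += RATE * d_hid[j]

for x, target in samples:
    print(x, round(forward(x)[1], 2), "target:", target)
```

As the “Use of Back-Propagation” slide notes, the weights change slowly, so each input/output pair has to be presented many times before the outputs come close to the targets.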