Lecture 25
Artificial Intelligence: Recognition Tasks (S&G, §13.4)

The Cognitive Inversion
• Computers can do some things very well that are difficult for people
  – e.g., arithmetic calculations
  – playing chess & other board games
  – doing proofs in formal logic & mathematics
  – handling large amounts of data precisely
• But computers are very bad at some things that are easy for people (and even some animals)
  – e.g., face recognition & general object recognition
  – autonomous locomotion
  – sensory-motor coordination
• Conclusion: brains work very differently from Von Neumann computers

The 100-Step Rule
• Typical recognition tasks take less than one second
• Neurons take several milliseconds to fire
• Therefore there can be at most about 100 sequential processing steps

Two Different Approaches to Computing
• Von Neumann: narrow but deep
• Neural computation: shallow but wide

How Wide?
• Retina: 100 million receptors
• Optic nerve: one million nerve fibers

A Very Brief Tour of Real Neurons (and Real Brains)

Left Hemisphere
(fig. from internet)

Grey Matter vs. White Matter

Typical Neuron
(fig. from Carter 1998)

Relative Cortical Areas

Neural Density in Cortex
• 148 000 neurons / sq. mm
• Hence, about 15 million / sq. cm

Information Representation in the Brain

Macaque Visual System
(fig. from Van Essen & al. 1992)

Hierarchy of Macaque Visual Areas
(fig. from Van Essen & al. 1992)

Bat Auditory Cortex
(figs. from Suga, 1985)

Real Neurons

Typical Neuron

Dendritic Trees of Some Neurons
A. inferior olivary nucleus
B. granule cell of cerebellar cortex
C. small cell of reticular formation
D. small gelatinosa cell of spinal trigeminal nucleus
E. ovoid cell, nucleus of tractus solitarius
F. large cell of reticular formation
G. spindle-shaped cell, substantia gelatinosa of spinal cord
H. large cell of spinal trigeminal nucleus
I. putamen of lenticular nucleus
J. double pyramidal cell, Ammon's horn of hippocampal cortex
K. thalamic nucleus
L. globus pallidus of lenticular nucleus
(fig. from Truex & Carpenter, 1964)

Layers and Minicolumns
(fig. from Arbib 1995, p. 270)

Neural Networks in Visual System of Frog
(fig. from Arbib 1995, p. 1039)

Frequency Coding
(fig. from Anderson, Intr. Neur. Nets; see the code sketch at the end of this tour)

Variations in Spiking Behavior
(fig. from Anderson, Intr. Neur. Nets)

Chemical Synapse
1. Action potential arrives at synapse
2. Ca ions enter cell
3. Vesicles move to membrane, release neurotransmitter
4. Transmitter crosses cleft, causes postsynaptic voltage change
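To make the frequency-coding idea from the tour above concrete: a stronger stimulus drives a neuron to its firing threshold sooner, so stimulus intensity is carried by firing rate. The sketch below is only illustrative; the leaky integrate-and-fire model, its parameter values, and the function name firing_rate are assumptions and do not come from the slides.

def firing_rate(input_current, threshold=1.0, leak=0.1, dt=0.001, duration=2.0):
    """Spikes per second produced by a constant input current (arbitrary units)."""
    v = 0.0                                      # membrane potential
    spikes = 0
    for _ in range(int(duration / dt)):
        v += dt * (input_current - leak * v)     # integrate the input; leak back toward rest
        if v >= threshold:                       # threshold reached: emit a spike
            spikes += 1
            v = 0.0                              # reset after the spike
    return spikes / duration

for current in (1.0, 2.0, 4.0, 8.0):
    print(current, firing_rate(current))         # firing rate grows with stimulus intensity

Running this prints a firing rate that rises with the input current, which is the qualitative point of the Frequency Coding slide.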
Neurons are Not Logic Gates
• Speed
  – electronic logic gates are very fast (nanoseconds)
  – neurons are comparatively slow (milliseconds)
• Precision
  – logic gates are highly reliable digital devices
  – neurons are imprecise analog devices
• Connections
  – logic gates have few inputs (usually 1 to 3)
  – many neurons have >100 000 inputs

Artificial Neural Networks

Typical Artificial Neuron
(diagram labels: inputs, connection weights, summation, net input, threshold, activation function, output)

Operation of Artificial Neuron
• inputs (1, 1, 0); weights (2, –1, 4); threshold 2
• net input: 1×2 + 1×(–1) + 0×4 = 1
• 1 < 2, so the output is 0

Operation of Artificial Neuron
• inputs (0, 1, 1); weights (2, –1, 4); threshold 2
• net input: 0×2 + 1×(–1) + 1×4 = 3
• 3 > 2, so the output is 1
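The two worked examples above fit in a few lines of code. This is a minimal sketch: the function name threshold_neuron is not from the slides, but the weights (2, –1, 4), the threshold of 2, and the two input patterns are the values used in the examples.

def threshold_neuron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    net = sum(x * w for x, w in zip(inputs, weights))    # the net input (the Σ in the slides)
    return 1 if net > threshold else 0

weights, threshold = [2, -1, 4], 2
print(threshold_neuron([1, 1, 0], weights, threshold))   # net = 1, 1 < 2  ->  0
print(threshold_neuron([0, 1, 1], weights, threshold))   # net = 3, 3 > 2  ->  1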
Feedforward Network
(diagram: input layer, hidden layers, output layer)

Feedforward Network In Operation (1)–(8)
(diagram sequence: an input is applied to the input layer and activation flows forward, layer by layer, until the output layer produces the output)

Comparison with Non-Neural Net Approaches
• Non-NN approaches typically decide output from a small number of dominant factors
• NNs typically look at a large number of factors, each of which weakly influences output
• NNs permit:
  – subtle discriminations
  – holistic judgments
  – context sensitivity

Connectionist Architectures
• The knowledge is implicit in the connection weights between the neurons
• Items of knowledge are not stored in dedicated memory locations, as in a Von Neumann computer
• "Holographic" knowledge representation:
  – each knowledge item is distributed over many connections
  – each connection encodes many knowledge items
• Memory & processing is robust in the face of damage, errors, inaccuracy, noise, …

Differences from Digital Calculation
• Information represented in continuous images (rather than language-like structures)
• Information processing by continuous image processing (rather than explicit rules applied in individual steps)
• Indefiniteness is inevitable (rather than definiteness assumed)

Supervised Learning
• Produce desired outputs for training inputs
• Generalize reasonably & appropriately to other inputs
• Good example: pattern recognition
• Neural nets are trained rather than programmed
  – another difference from Von Neumann computation

Learning for Output Neuron (1)
• weights (2, –1, 4), threshold 2; inputs (0, 1, 1); net input = 3; 3 > 2, so output = 1
• desired output: 1 → no change!

Learning for Output Neuron (2)
• inputs (0, 1, 1); net input = 3; 3 > 2, so output = 1
• desired output: 0 → output is too large
• decrease the weights of the active inputs: –1 → –1.5, 4 → 3.25
• new net input = 1.75; 1.75 < 2, so output = 0

Learning for Output Neuron (3)
• weights (2, –1, 4), threshold 2; inputs (1, 1, 0); net input = 1; 1 < 2, so output = 0
• desired output: 0 → no change!

Learning for Output Neuron (4)
• inputs (1, 1, 0); net input = 1; 1 < 2, so output = 0
• desired output: 1 → output is too small
• increase the weights of the active inputs: 2 → 2.75, –1 → –0.5
• new net input = 2.25; 2.25 > 2, so output = 1
(this kind of weight adjustment is sketched in code at the end of these notes)

Credit Assignment Problem
• How do we adjust the weights of the hidden layers?
(diagram: input layer, hidden layers, output layer, desired output)

Back-Propagation
(diagram sequence; each slide shows the network with its input and desired output)
• Forward Pass
• Actual vs. Desired Output
• Correct Output Neuron Weights
• Correct Last Hidden Layer
• Correct All Hidden Layers (repeated, layer by layer, working backward)
• Correct First Hidden Layer

Use of Back-Propagation (BP)
• Typically the weights are changed slowly
• Typically the net will not give correct outputs for all training inputs after one adjustment
• Each input/output pair is used repeatedly for training
• BP may be slow
• But there are many better ANN learning algorithms

ANN Training Procedures
• Supervised training: we show the net the output it should produce for each training input (e.g., BP)
• Reinforcement training: we tell the net if its output is right or wrong, but not what the correct output is
• Unsupervised training: the net attempts to find patterns in its environment without external guidance

Applications of ANNs
• "Neural nets are the second-best way of doing everything"
• If you really understand a problem, you can design a special-purpose algorithm for it, which will beat a NN
• However, if you don't understand your problem very well, you can generally train a NN to do it well enough
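Finally, here is the kind of weight adjustment shown on the "Learning for Output Neuron" slides, in code. The slides do not spell out an explicit update rule, so the fixed step size of 0.5 and the function names output and train_step are assumptions; the weights (2, –1, 4), threshold 2, and input patterns match the worked examples above.

def output(inputs, weights, threshold=2):
    """Threshold unit: 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0

def train_step(inputs, weights, desired, step=0.5, threshold=2):
    """Adjust the weights of the active inputs toward the desired output."""
    actual = output(inputs, weights, threshold)
    if actual == desired:
        return weights                             # correct output: no change
    direction = 1 if desired > actual else -1      # too small: raise; too large: lower
    return [w + direction * step * x for w, x in zip(weights, inputs)]

weights = [2, -1, 4]
print(train_step([0, 1, 1], weights, desired=1))   # already correct: weights unchanged
print(train_step([0, 1, 1], weights, desired=0))   # output too large: active weights decrease
print(train_step([1, 1, 0], weights, desired=1))   # output too small: active weights increase

Each call leaves a correct response alone, lowers the weights of active inputs when the output is too large, and raises them when it is too small; back-propagation extends this idea to the hidden layers, which is the credit assignment problem described above.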