Modeling with neural networks
Garrison W. Cottrell
Gary's Unbelievable Research Unit (GURU)
Computer Science and Engineering Department
Temporal Dynamics of Learning Center
Institute for Neural Computation
UCSD

Ways to understand how the brain works
• Behavioral measures
  • Choices
  • Reaction times
  • Eye movements
• Brain imaging
  • PET
  • fMRI
  • MEG
  • EEG
  • NIRS
  • DTI
• Neural recording
  • Single-cell recording
  • Multi-cell recording
  • Optical imaging
  • Voltage-sensitive dyes
  • Optogenetics
  • ECoG
• Modeling
  • Neural networks
  • Bayesian models
  • Abstract mathematical models

Why model?
• Models rush in where theories fear to tread.
• Models can be manipulated in ways people cannot.
• Models can be analyzed in ways people cannot.

Models rush in where theories fear to tread
• Theories are high-level descriptions of the processes underlying behavior.
• They are often not explicit about the processes involved.
• They are difficult to reason about if no mechanisms are explicit -- they may be too high level to make explicit predictions.
• Theory formation itself is difficult.

Models rush in where theories fear to tread
• Using machine learning techniques, one can often build a working model of a task for which we have no theories or algorithms (e.g., expression recognition).
• A working model provides an "intuition pump" for how things might work, especially if it is "neurally plausible" (e.g., the development of face processing -- Dailey and Cottrell).
• A working model may make unexpected predictions (e.g., the Interactive Activation Model and SLNT).

Your first neural net: the Interactive Activation Model, a model of reading from print
• Word level
• Letter level
• Feature level

Operation of the model (figure slides)

Example of data accounted for: the pseudoword effect (figure slides)

Example of data predicted
• What about non-pronounceable non-words like SLNT?
• SLNT has a lot of friends at the word level.
• The model predicts that there should be a superiority effect for SLNT.
• They tested this in UCSD Psychology sophomores and got the predicted effect.
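To make the claim about "friends" concrete, here is a drastically reduced sketch of interactive-activation-style dynamics, restricted to the word level. The toy lexicon, parameters, and update rule are illustrative assumptions, not the published model; the point is only that a non-pronounceable non-word like SLNT partially activates word units that share letters in position with it, and it is their top-down feedback that the model says supports perception of S, L, N, and T.

    import numpy as np

    # Toy lexicon (illustrative; the published model used a much larger set of four-letter words)
    lexicon = ["SLAT", "SLOT", "SLIT", "SENT", "WENT", "READ", "TIME"]

    def word_activations(stimulus, steps=20, rate=0.05, inhibition=0.04, decay=0.07):
        """Very reduced interactive-activation-style dynamics at the word level only.
        Letters matching a word's spelling (position by position) excite it, mismatches
        inhibit it, and active word units compete with (inhibit) each other."""
        bottom_up = np.array([sum(1.0 if w[i] == stimulus[i] else -1.0 for i in range(4))
                              for w in lexicon])
        a = np.zeros(len(lexicon))                    # word-unit activations start at rest
        for _ in range(steps):
            net = rate * bottom_up - inhibition * (a.sum() - a)   # excitation minus within-level competition
            a = np.clip(a + net - decay * a, 0.0, 1.0)            # integrate, decay, and bound activation
        return dict(zip(lexicon, np.round(a, 2)))

    print(word_activations("SLNT"))
    # "Friends" such as SLAT, SLOT, SLIT, and SENT become partially active; in the full model
    # their feedback to the letter level is what produces the predicted superiority effect.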
Summary
• Why model?
• Models make assumptions explicit.
• Models (because they are run on a computer and can be highly non-linear) can make unexpected predictions.
• While no model is "correct", the more data a model predicts, the more we "believe" that model…

Models can be manipulated in ways people cannot
• We can see the effects of variations in cortical architecture (e.g., split (hemispheric) vs. non-split models in the Shillcock and Monaghan word perception model).
• We can see the effects of variations in processing resources (e.g., variations in the number of hidden units in the Plaut et al. models).

Models can be manipulated in ways people cannot
• We can see the effects of variations in environment (e.g., what if our parents were cans, cups, or books instead of humans? I.e., is there something special about face expertise versus visual expertise in general? (Sugimoto and Cottrell; Joyce and Cottrell)).
• We can see variations in behavior due to different kinds of brain damage within a single "brain" (e.g., Juola and Plunkett; Hinton and Shallice).

Models can be analyzed in ways people cannot
In the following, I specifically refer to neural network models.
• We can do single-unit recordings.
• We can selectively ablate and restore parts of the network, even down to the single-unit level, to assess their contribution to processing.
• We can measure the individual connections -- e.g., the receptive and projective fields of a unit.
• We can measure responses at different layers of processing (e.g., which level accounts for a particular judgment: perceptual, object, or categorization? (Dailey et al., J. Cog. Neuro., 2002)).

How (I like) to build Cognitive Models
• In a domain where there is a lot of data and controversy!
• I like to be able to relate them to the brain, so "neurally plausible" models are preferred -- neural nets.
• The model should be a working model of the actual task, rather than a cartoon version of it.
• Of course, the model should nevertheless be simplifying (i.e., it should be constrained to the essential features of the problem at hand):
  • Do we really need to model the (supposed) translation invariance and size invariance of biological perception?
  • As far as I can tell, NO!
• Then, take the model "as is" and fit the experimental data: no fitting parameters is to be preferred over 1, 2, or 3.

The other way (I like) to build Cognitive Models
• In domains where there is little data and much mystery.
• Use them as exploratory models -- in domains where there is little direct data (e.g., no single-cell recordings in infants or undergraduates) -- to suggest what we might find if we could get the data. These can then serve as "intuition pumps."
• Examples:
  • Why we might get specialized face processors
  • Why those face processors get recruited for other tasks

A few giants
Frank Rosenblatt invented the perceptron:
• One of the first neural networks to learn by supervised training
• Still in use today!

A few giants
Dave E. Rumelhart, with Geoff Hinton and Ron Williams, invented back-propagation.
• Many had invented back-propagation before; few could appreciate as deeply as Dave did what they had when they discovered it.

A few giants
• Hal White was a theoretician of neural networks.
• Hal White's paper with Max Stinchcombe, "Multilayer feedforward networks are universal approximators," is his second most-cited paper, at 8,114 cites.
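The following sketch illustrates the flavor of that result (it is not White and Stinchcombe's construction or proof): a single hidden layer of tanh units can fit a smooth function closely. To keep it short, the hidden weights here are simply random and only the output weights are fit, by least squares -- those simplifications are mine.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 200).reshape(-1, 1)
    y = np.sin(x) + 0.3 * x**2                 # an arbitrary smooth target function

    H = 50                                      # number of hidden units
    W = rng.normal(0.0, 2.0, (1, H))            # random input-to-hidden weights
    b = rng.normal(0.0, 2.0, H)                 # random hidden biases
    hidden = np.tanh(x @ W + b)                 # one hidden layer of tanh units

    # Fit only the hidden-to-output weights, by least squares
    v, *_ = np.linalg.lstsq(hidden, y, rcond=None)
    y_hat = hidden @ v

    print("max |error| over the interval:", float(np.abs(y_hat - y).max()))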
A few giants
• In yet another paper (in Neural Computation, 1989), he wrote: "The premise of this article is that learning procedures used to train artificial neural networks are inherently statistical techniques. It follows that statistical theory can provide considerable insight into the properties, advantages, and disadvantages of different network learning methods…"
• This was one of the first papers to make the connection between neural networks and statistical models -- and thereby put them on a sound statistical foundation.

What is backpropagation, and why is/was it important?
• We have billions and billions of neurons that somehow work together to create the mind.
• These neurons are connected by 10^14 - 10^15 synapses, which we think encode the "knowledge" in the network -- far too many for us to program explicitly in our models.
• Rather, we need some way to set them indirectly, via a procedure that achieves some goal by changing the synaptic strengths (which we call weights).
• This is called learning in these systems.

Learning: A bit of history
• Frank Rosenblatt studied a simple version of a neural net called a perceptron:
  • A single layer of processing
  • Binary output
  • Can compute simple things like (some) Boolean functions (OR, AND, etc.)
(Figure slides: inputs feed a net input to a single binary output unit.)

Learning: A bit of history
• Rosenblatt (1962) discovered a learning rule for perceptrons called the perceptron convergence procedure.
• Guaranteed to learn anything computable (by a two-layer perceptron).
• Unfortunately, not everything was computable (Minsky & Papert, 1969).

Perceptron Learning Demonstration
• Output activation rule:
  • First, compute the net input to the output unit: net = Σ_i w_i x_i
  • Then, compute the output: if net ≥ θ (the threshold), then output = 1, else output = 0.
• Learning rule:
  • If the output is 1 and should be 0, then lower the weights to the active inputs and raise the threshold (θ).
  • If the output is 0 and should be 1, then raise the weights to the active inputs and lower the threshold (θ).
  • ("Active input" means x_i = 1, not 0.)

Characteristics of perceptron learning
• Supervised learning: give it a set of input-output examples for it to model the function (a teaching signal).
• Error-correction learning: only correct it when it is wrong.
• Random presentation of patterns.
• Slow! Learning on some patterns ruins learning on others.

Perceptron Learning Made Simple
• Output activation rule:
  • Compute the net input to the output unit: net = Σ_i w_i x_i
  • If net ≥ θ, then output = 1, else output = 0.
• Learning rule (in words):
  • If the output is 1 and should be 0, lower the weights to the active inputs and raise the threshold (θ).
  • If the output is 0 and should be 1, raise the weights to the active inputs and lower the threshold (θ).
• Learning rule (as an equation): w_i(t+1) = w_i(t) + α(teacher − output)x_i, where α is the learning rate.
• This is known as the delta rule, because learning is based on the delta (difference) between what you did and what you should have done: δ = (teacher − output).
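A minimal sketch of the delta rule in code (my own illustration, not Rosenblatt's implementation). It learns OR, which is linearly separable, so the convergence guarantee applies.

    import numpy as np

    def perceptron_train(X, targets, lr=0.1, epochs=20):
        """Delta-rule training of a single threshold unit: output = 1 if sum_i w_i x_i >= theta."""
        w = np.zeros(X.shape[1])
        theta = 0.0
        for _ in range(epochs):
            for x, t in zip(X, targets):
                output = 1 if w @ x >= theta else 0
                delta = t - output              # (teacher - output)
                w += lr * delta * x             # raise/lower the weights to the active inputs
                theta -= lr * delta             # move the threshold in the opposite direction
        return w, theta

    # Learn logical OR (linearly separable, so the perceptron convergence procedure succeeds)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    t = np.array([0, 1, 1, 1])
    w, theta = perceptron_train(X, t)
    print([1 if w @ x >= theta else 0 for x in X])   # -> [0, 1, 1, 1]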
Problems with perceptrons
• The learning rule comes with a great guarantee: anything a perceptron can compute, it can learn to compute.
• Problem: lots of things were not computable, e.g., XOR (Minsky & Papert, 1969).
• Minsky & Papert said:
  • If you had hidden units, you could compute any Boolean function.
  • But no learning rule exists for such multilayer networks, and we don't think one will ever be discovered.

Aside about perceptrons
• They didn't have hidden units -- but Rosenblatt assumed nonlinear preprocessing!
• Hidden units compute features of the input.
• The nonlinear preprocessing is a way to choose features by hand.
• Support Vector Machines essentially do this in a principled way, followed by a (highly sophisticated) perceptron learning algorithm.

Enter Rumelhart, Hinton, & Williams (1985)
• Discovered a learning rule for networks with hidden units.
• Works a lot like the perceptron algorithm:
  • Randomly choose an input-output pattern.
  • Present the input and let activation propagate through the network.
  • Give the teaching signal.
  • Propagate the error back through the network (hence the name back-propagation).
  • Change the connection strengths according to the error.

Enter Rumelhart, Hinton, & Williams (1985)
(Diagram: INPUTS → Hidden Units → OUTPUTS; activation flows forward, error flows back.)
• The actual algorithm uses the chain rule of calculus to go downhill in an error measure with respect to the weights.
• The hidden units must learn features that solve the problem.

XOR Back Propagation Learning
(Diagrams: a random initial network and a trained XOR network, with hidden units labeled AND and OR.)
• Here, the hidden units learned AND and OR -- two features that, when combined appropriately, can solve the problem.
• But, depending on initial conditions, there are an infinite number of ways to do XOR -- backprop can surprise you with innovative solutions.
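A minimal back-propagation sketch for XOR: a forward pass, then the chain rule pushes the squared-error gradient back through one hidden layer. The network size, learning rate, number of epochs, and random seed are illustrative choices of mine, not the original Rumelhart, Hinton & Williams setup, and as the slide above says, the exact solution found depends on the initial weights.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    t = np.array([[0.], [1.], [1.], [0.]])       # XOR teaching signal

    # 2 inputs -> 4 hidden units -> 1 output, each layer with bias weights
    W1, b1 = rng.normal(0.0, 1.0, (2, 4)), np.zeros(4)
    W2, b2 = rng.normal(0.0, 1.0, (4, 1)), np.zeros(1)
    lr = 0.5

    for _ in range(20000):
        # Forward pass: activation propagates through the network
        h = sigmoid(X @ W1 + b1)
        y = sigmoid(h @ W2 + b2)
        # Backward pass: error propagates back (chain rule on squared error)
        d_out = (y - t) * y * (1.0 - y)
        d_hid = (d_out @ W2.T) * h * (1.0 - h)
        # Change the connection strengths according to the error
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)

    print(np.round(y.ravel(), 2))   # should approach [0, 1, 1, 0]; other seeds may need more epochs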
Why is/was this wonderful?
• Efficiency
• Learns internal representations
• Learns internal representations
• Learns internal representations
• Generalizes to recurrent networks

Hinton's Family Trees example
• Idea: learn to represent relationships between people that are encoded in a family tree.

Hinton's Family Trees example
• Idea 2: learn distributed representations of concepts.
(Network diagram: localist inputs for people and relations feed hidden layers that learn features of these entities useful for solving the task; the outputs are localist as well. "Localist" means one unit is "ON" to represent each item.)

People hidden units: Hinton diagram
• What is unit 1 encoding?
• What is unit 2 encoding?
• What is unit 6 encoding?
• When all three are on, these units pick out Christopher and Penelope; other combinations pick out other parts of the trees.

Relation units
• What does the lower middle one code?

Lessons
• The network learns features in the service of the task -- i.e., it learns features on its own.
• This is useful if we don't know what the features ought to be.
• Can explain some human phenomena.

Thanks to funders, GURONS, and you! Questions?