NEURONAL NETWORKS AND CONNECTIONIST (PDP) MODELS

• Thorndike's "Law of Effect" (1920s)
  – Reward strengthens the connections for an operant response
• Hebb's "reverberatory circuits" (1940s)
  – Self-sustaining circuits of activity
  – This "STM" (short-term memory) consolidates memories
  – "Hebbian" learning: co-occurrence of activation strengthens synaptic paths
  – Memory as stable changes in connection strengths (CONNECTIONISM)
• Selfridge's Pandemonium (1955)
  – A parallel, distributed model of pattern recognition
  [Figure: Selfridge's Pandemonium model of letter recognition]
• Rosenblatt's Perceptrons (1962)
  – Simple PDP model that could learn
  – Limited to certain kinds of associations
• Hinton & Anderson (1981)
  – PDP model of associative memory
  – Solves the "XOR" problem with hidden layers of units
  – Adds powerful new learning algorithms
• McClelland & Rumelhart (1986)
  – Explorations in the Microstructure of Cognition
  – Shows the power and scope of the PDP approach
  – Emergence of "subsymbolic" models and the brain metaphor
  – PDP modeling explodes

A symbolic-level network model (McClelland, 1981)

• The Jets and Sharks model
  – Parallel connections and weights between units
  – Excitatory and inhibitory connections
  – Retrieval, not encoding
  – "Units" are still symbolic

A PDP TUTORIAL

• A PDP model consists of
  – A set of processing units ("neurons")
  – Usually arranged in "layers": input, hidden, output
  – Connections from each unit in one layer to ALL units of the next
  – Each unit has an activity level
  – Knowledge as distributed patterns of activity at input and output
  – Connections vary in strength, and may be excitatory or inhibitory
    [Diagram: input and output units joined by excitatory (+.5) and inhibitory (-.5) connections]
  – Input to unit k = sum of its weighted connections:
    for each connection from a unit j in the previous layer,
    input from j = activity(j) × weight(j→k)
    [Diagram: units j1 and j2 feeding unit k1 through weighted connections]
  – The same net can associate a large number of distinct input/output patterns
  – The input/output function of a unit can be linear or a nonlinear threshold
  – Hidden layers add power and mystery, e.g., solving the XOR problem
    [Diagram: XOR network; two inputs feed a hidden threshold unit with weights +1 and +1, and the hidden unit connects to the output with weight -2]
    (A code sketch of the forward pass and this XOR network appears at the end of these notes.)

LEARNING IN PDP MODELS

• Learning algorithms
  – Rules for adjusting weights during "training"
• The Hebbian Rule
  – Co-activation strengthens a connection
  – change(j→k) = activation(j) × activation(k)
• The Delta Rule
  – Weights adjusted in proportion to the error relative to the desired output
  – change(j→k) = learning rate (r) × activation(j) × error(k)
• Backpropagation
  – Delta-rule adjustments cascade backward through the hidden-unit layers
  (Code sketches of these learning rules appear at the end of these notes.)

EVALUATION OF PDP MODELS

• Wide range of learning successes
  – XOR and other "nonlinear" problems
  – Text-to-speech associations
  – Concepts and stereotypes
  – Grammatical "rules" and exceptions
• Powerful and intuitive framework
  – At least some "biological face validity"
    • Parallel processing
    • Distributed networks for memory
    • Graceful degradation
    • Complex behavior emerges from a "simple" system
• Limitations and issues
  – Too many "degrees of freedom"?
  – Learning even "simple" associations requires thousands of training cycles
  – Some processes (e.g., backpropagation) have no known neural correlate
    • (yes, but there are massive feedback and "recurrent" loops in cortex)
  – Some failures to model human behavior
    • Pronouncing pseudohomophones
    • Abruptness of stages in verb learning
    • Catastrophic retroactive interference
  – Difficulty modeling "serially ordered" and "rule-based" behavior
  – Need for processes outside of the model
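
APPENDIX: CODE SKETCHES (illustrative additions, not part of the original slides)

The net-input rule from the tutorial (input to unit k = sum over j of activity(j) × weight(j→k)) and the XOR example can be made concrete in a few lines. This is a minimal sketch assuming binary inputs and step-threshold units; the +1/+1/-2 weights and the threshold values are the classic textbook solution to XOR, chosen here only for illustration.

```python
# Minimal PDP-style threshold unit and the classic hidden-unit solution to XOR (sketch).

def unit_output(activities, weights, threshold):
    """Net input = sum of activity(j) * weight(j->k); output 1 if it exceeds the threshold, else 0."""
    net_input = sum(a * w for a, w in zip(activities, weights))
    return 1 if net_input > threshold else 0

def xor_net(x1, x2):
    """Two inputs, one hidden threshold unit, one output unit."""
    hidden = unit_output([x1, x2], [1, 1], threshold=1.5)            # fires only when both inputs are on
    return unit_output([x1, x2, hidden], [1, 1, -2], threshold=0.5)  # hidden unit inhibits the output

for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pattern, "->", xor_net(*pattern))                          # prints 0, 1, 1, 0
```

Without the hidden unit, no single layer of weights can separate the XOR patterns, which is the limitation of simple perceptron-style networks noted in the history section.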
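
The Hebbian and delta rules can be written out directly from the formulas on the slide. In this sketch, `rate` stands for the learning rate r and `target_k` for the desired output used to compute error(k); both names are illustrative assumptions, not from the slides.

```python
# Weight-update rules for a single connection j -> k, following the slide formulas (sketch).

def hebbian_update(weight_jk, act_j, act_k, rate=0.1):
    """Hebbian rule: co-activation strengthens the connection.
    change(j->k) = rate * activation(j) * activation(k)"""
    return weight_jk + rate * act_j * act_k

def delta_update(weight_jk, act_j, act_k, target_k, rate=0.1):
    """Delta rule: adjust the weight in proportion to the error at the receiving unit.
    change(j->k) = rate * activation(j) * (target(k) - activation(k))"""
    return weight_jk + rate * act_j * (target_k - act_k)

# Repeated delta-rule training pulls the weight toward the value that reproduces the target.
w = 0.0
for _ in range(50):
    act_k = 1.0 * w                      # linear unit driven by one input with activity 1.0
    w = delta_update(w, act_j=1.0, act_k=act_k, target_k=0.8)
print(round(w, 3))                       # approaches 0.8 over many training cycles
```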
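
Backpropagation extends the delta rule by cascading the error signal backward through the hidden layer. The sketch below trains a tiny 2-2-1 sigmoid network on XOR; the layer sizes, learning rate, and epoch count are arbitrary choices for illustration, and the long training run also illustrates why even a "simple" association can take thousands of cycles.

```python
import math, random

# Tiny 2-2-1 sigmoid network trained on XOR with backpropagation (illustrative sketch).
random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Weights and biases, initialised to small random values.
w_ih = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # input -> hidden
b_h  = [random.uniform(-1, 1) for _ in range(2)]
w_ho = [random.uniform(-1, 1) for _ in range(2)]                      # hidden -> output
b_o  = random.uniform(-1, 1)

patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
rate = 0.5

for epoch in range(20000):                     # many training cycles are needed
    for (x1, x2), target in patterns:
        # Forward pass: net input = sum of activity * weight, squashed by the sigmoid.
        h = [sigmoid(w_ih[j][0] * x1 + w_ih[j][1] * x2 + b_h[j]) for j in range(2)]
        o = sigmoid(w_ho[0] * h[0] + w_ho[1] * h[1] + b_o)

        # Backward pass: the delta-rule error signal cascades back through the hidden layer.
        delta_o = (target - o) * o * (1 - o)
        delta_h = [delta_o * w_ho[j] * h[j] * (1 - h[j]) for j in range(2)]

        # Weight updates: change(j->k) = rate * activation(j) * error(k).
        for j in range(2):
            w_ho[j]    += rate * h[j] * delta_o
            w_ih[j][0] += rate * x1 * delta_h[j]
            w_ih[j][1] += rate * x2 * delta_h[j]
            b_h[j]     += rate * delta_h[j]
        b_o += rate * delta_o

for (x1, x2), target in patterns:
    h = [sigmoid(w_ih[j][0] * x1 + w_ih[j][1] * x2 + b_h[j]) for j in range(2)]
    o = sigmoid(w_ho[0] * h[0] + w_ho[1] * h[1] + b_o)
    # Outputs should approach the targets once training converges (this depends on the random initialisation).
    print((x1, x2), round(o, 2), "target", target)
```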