Connectionist Computation

A model of computation inspired by the neuronal architecture of the brain. Proposed by McCulloch and Pitts in 1943, and also considered a general model of computing by von Neumann. It became increasingly popular in the 1980s (parallel computing, parallel distributed processing, cognitive modeling); such models are also called connectionist or neuronal networks.

Connectionist computation and the Necker cube

Connectionist computing is particularly useful for the computation of constraint satisfaction, that is, for determining the values of a set of variables that have to satisfy certain conditions. Example: the description of the two perceptual states of the Necker cube.

[Figure: a Necker cube with its corners labeled A through H]

Model Neurons and Neuronal Networks

Model neurons
A neuron receives input from potentially many sources. The input is represented by a vector x of numerical values, x = (x1, x2, x3, ..., xn). The output is a function of this vector, f(x). The output can in turn be input to potentially many other neurons. General convention: input sources that increase the output (excitatory) are marked by arrows; input sources that decrease the output (inhibitory) are marked by dots.

[Figure: a model neuron with inputs x1, x2, x3, x4 (the dendrites), a cell body computing f(x1, x2, x3, x4), and an output line (the axon)]

The relevant input and output in real neurons is the frequency of spikes (a continuous value); we often consider simplified models with discrete or even binary values. The function f that we assume should be simple. The simplest functions are threshold functions: each input channel i is assigned a weight wi (so we have a vector of weights (w1, w2, ..., wn)); if the sum of the weighted inputs (w1x1 + w2x2 + ...) is greater than a threshold, a specified output is created.

Neurons for and, or, not
The basic Boolean operators can be analyzed as very simple neurons. Assuming input values 0 and 1: and fires (outputs 1) if the sum of its two inputs is greater than 1; or fires if the sum of its two inputs is greater than or equal to 1; not has an inhibitory input, so it fires if there is no input and stops firing if there is an input. (As weights we can assume 1.)

[Figure: threshold units for and, or, and not]

We can imagine more "noise-resistant" neurons that can also compute with inputs that deviate from 0 and 1; the thresholds for and and or could then be set to 1.5 and 0.75, respectively.

Networks
We can assume neurons that perform similar operations on more than two input channels, but we can also build such units out of simpler ones. The following network fires if the input from four sources contains one or more 1's:

[Figure: two or-units, each reading two of the four inputs, feed a third or-unit that produces the overall output]

Returning to the Necker cube: the two states identify certain conditions for the corners of the cube that can be visualized as neurons that fire if the conditions are met. There are excitatory connections within the conditions (a) and within the conditions (b), and inhibitory connections between those two sets. They affect the position of each point with respect to its neighboring points (here illustrated just for the case "A in foreground"). Each point can be in the background or in the foreground.

Conditions (a)        Conditions (b)
A in foreground       A in background
B in background       B in foreground
C in background       C in foreground
D in foreground       D in background
E in foreground       E in background
F in background       F in foreground
G in background       G in foreground
H in foreground       H in background

If all of the conditions (a) or all of the conditions (b) hold simultaneously, the states of the units strengthen each other; mixed states lead to conflicts between the units.
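To make the constraint-satisfaction idea concrete, the Necker cube network can be simulated in a few lines of Python. The handout does not spell out weights or an update rule, so the following is only a minimal sketch under assumed values: binary units, an excitatory weight of +1 between any two units within the same set of conditions, an inhibitory weight of -1 between every unit in set (a) and every unit in set (b), and asynchronous threshold updates in random order.

```python
import random

N = 8  # corners A..H; a[i] and b[i] are the two rival readings of corner i

def net_input(own, rival, i):
    # +1 from every other active unit in the same set of conditions (excitatory),
    # -1 from every active unit in the rival set (inhibitory).
    return sum(own[j] for j in range(N) if j != i) - sum(rival)

def settle(steps=500, seed=0):
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(N)]  # conditions (a): "A in foreground", ...
    b = [rng.randint(0, 1) for _ in range(N)]  # conditions (b): "A in background", ...
    for _ in range(steps):
        own, rival = (a, b) if rng.random() < 0.5 else (b, a)
        i = rng.randrange(N)
        total = net_input(own, rival, i)
        if total > 0:
            own[i] = 1
        elif total < 0:
            own[i] = 0  # on a tie the unit keeps its current state
    return a, b

a, b = settle()
print("conditions (a):", a)
print("conditions (b):", b)
```

Starting from a random mixed state, the two sets of units compete; the network typically settles with one set fully active and the rival set silent, corresponding to one of the two consistent readings of the cube, while mixed states are unstable for the reasons just described.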
Perceptrons

A relatively simple type of network architecture, perceptrons have been suggested in particular for pattern recognition (Rosenblatt 1958; Minsky & Papert 1969). Consider an array of neurons that can be in defined states (the input units, e.g. retina neurons), and a second layer (the output units, or perceptrons), each of which is connected to some or all of the input units.

Perceptrons for pattern recognition
The following is an example. The task is to recognize "edges" in the upper array (an important task for vision), for instance in the input row

0 0 1 1 1 1 0 0 0

Below the input row sits a row of output units. Each output unit gets input from three adjacent input units, its receptive field; for example, the unit whose receptive field covers positions 2 to 4 gets the input (0, 1, 1). Each output unit applies the same weights, (-1, 2, -1), to its receptive field, and it fires if the sum of the weighted inputs is greater than 0. For the input (0, 1, 1) the sum is -0 + 2 - 1 = 1, so that unit fires; a unit whose receptive field lies entirely inside a uniform region, such as (1, 1, 1) or (0, 0, 0), gets the sum 0 and stays silent. In the row above, only the two units centered on the boundary 1's fire, marking the edges. Perceptrons can recognize edges in two dimensions as well, and they can recognize more complex configurations.

The limits of perceptrons
Perceptrons are simple devices for computational tasks that can be carried out locally, by considering only the input in each perceptron's receptive field. There are tasks that they cannot do, for example finding out whether a figure is connected.

Hidden units and multilayered perceptrons
With perceptrons we cannot even implement exclusive or (xor), the logical operator that yields 1 if exactly one of its two inputs is 1. This is because perceptrons are monotone: they fire once a certain threshold is reached, and they do not stop firing when the threshold is exceeded further. The xor problem can be solved with hidden units between the input units and the output units (in the figure, the numbers at the arrows are connection weights and the numbers in the units are thresholds):

[Figure: an xor network with one hidden unit. The two inputs feed the hidden unit with weights +1 and +1; the hidden unit has threshold 1.5 and feeds the output unit with weight -2; the two inputs also feed the output unit directly with weights +1 and +1; the output unit has threshold 0.5.]

With input (1, 1) the hidden unit fires and contributes -2, so the output unit receives 1 + 1 - 2 = 0 and stays below its threshold; with input (1, 0) or (0, 1) the hidden unit stays silent and the output unit receives 1, which exceeds its threshold, so it fires.

Hidden units (internal representation units) increase the computational power of neural networks substantially. For pattern recognition and image processing, a multilayered architecture was proposed in which the output units of one level serve as the input units of the next level. This technique is used, for example, for character and word recognition: basic elements of characters are identified by perceptrons at the lowest layer, and perceptrons at higher layers fire when they detect certain combinations of such features.

Learning
How are neural networks built? In particular, what determines the connections between units, their weights, and their threshold values? How can a neural network learn? Learning consists in the development of new connections, the loss of existing connections, and the modification of the weights of existing connections. To make things simpler we can disregard the first two processes and assume that every unit is connected to every other unit; connections can then be suppressed by setting their weight to 0.

How perceptrons learn
For perceptrons there are effective learning algorithms, for example supervised learning with reinforcement:
- Start in a state in which each perceptron has random weights and thresholds (or perhaps 0 everywhere).
- Provide an input, observe the output, and note the difference from the expected output.
- Change the weights and thresholds of the output units randomly and observe the output; if it is closer to the expected output, keep the change, otherwise keep the old values.
- Repeat the last two steps over and over until the network stabilizes in a particular state.
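The procedure just described can be sketched in code. This is a minimal sketch of the trial-and-error idea, not an algorithm given in the handout: it assumes the setting of the worked example that follows (two inputs, weights restricted to -1, 0, 1, a fixed threshold of 1), it reports a firing unit's output as 1 (compare the "o: 2 (1)" step in the trace), and it measures "closer to the expected output" as a smaller absolute difference; the names OR_TABLE and learn_or are made up for illustration.

```python
import random

OR_TABLE = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # truth table for "or"
THRESHOLD = 1

def output(w, x):
    s = w[0] * x[0] + w[1] * x[1]
    return 1 if s >= THRESHOLD else s  # a firing unit reports 1, otherwise the raw sum

def learn_or(trials=200, seed=0):
    rng = random.Random(seed)
    w = [rng.choice([-1, 0, 1]), rng.choice([-1, 0, 1])]      # random initial weights
    for _ in range(trials):
        x, target = OR_TABLE[rng.randrange(4)]                # provide an input
        old = output(w, x)
        if old == target:
            continue                                          # output as expected: don't change
        candidate = list(w)
        candidate[rng.randrange(2)] = rng.choice([-1, 0, 1])  # random change to one weight
        if abs(output(candidate, x) - target) < abs(old - target):
            w = candidate                                     # closer to the expected output: keep
    return w

w = learn_or()
print("learned weights:", w)                            # typically [1, 1], as in the trace below
print("outputs:", [output(w, x) for x, _ in OR_TABLE])  # goal: 0, 1, 1, 1
```

A hand-worked run of the same procedure follows.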
Example: how a perceptron learns "or". Assumptions: the possible weights are -1, 0, and 1; the threshold (the expected output when the unit should fire) is 1. Abbreviations: i = input, w = weights, o = output.

Random initial state of the weights: w: [-1, 0]

i: [1, 0], w: [-1, 0], o: -1; expected o: 1
  random change w: [0, 0], o: 0, closer to 1, keep.
i: [0, 1], w: [0, 0], o: 0; expected o: 1
  random change w: [0, -1], o: -1, that's worse.
  random change w: [1, 0], o: 0, that's the same.
  random change w: [0, 1], o: 1, that's better, keep.
i: [1, 1], w: [0, 1], o: 1; expected o: 1, don't change.
i: [1, 0], w: [0, 1], o: 0; expected o: 1
  random change w: [-1, 1], o: -1, worse, don't change.
  random change w: [1, 0], o: 0, that's the same.
  random change w: [1, 1], o: 2 (1), that's better, keep.
i: [0, 0], w: [1, 1], o: 0; expected o: < 1, don't change.

Subsequent trials will not lead to any further change; w: [1, 1] is the correct result. Probabilistic variation helps escape possible "local minima" in the search for a solution (as in annealing).

Learning with hidden units
For simple double-layered networks the learning algorithm is guaranteed to give us the right solution eventually (there is only one minimum, which the learning algorithm approaches). Not so for multi-layered networks, which may have many local minima. This is because the network does not "know" how to judge random variations in the settings of the hidden units: they cannot be compared directly with the expected output. But there are solutions that work quite well. The central idea is the backpropagation algorithm (Rumelhart, Hinton & Williams), a form of supervised learning in multilayer neural networks. The output units send back to each hidden unit to which they are connected information about that unit's share of the error; that is, they specify the difference between what the hidden unit actually sent and the signal that would have led to the intended output. In this way the output units "teach" the hidden units, and the hidden units can use this information to adjust the weights of their own inputs, and so on. (A code sketch is given in the appendix at the end of this handout.)

Symbolic vs. Connectionist Computing (connectionism: philosophical issues)

Symbolic:
- Representations consist of discrete elements that are concatenated following syntactic rules. They are local: each element stands for a particular thing.
- Computations break tasks down into smaller tasks. They are typically sequential, and we distinguish between a CPU and a memory unit.
- Sensitive to disturbance (noise).

Connectionist:
- Representations are continuous and nonconcatenative. They are distributed.
- There is no clear distinction between representation and computation, and none between CPU and memory.
- Processes are parallel to a high degree.
- Relatively insensitive to noise.

Problems for connectionism: the modeling of propositional information, e.g. "a dog bit a mailman". Thematic roles (actor, patient) must be distinguished, and one individual can act in multiple roles (Pinker, "Connectoplasm").

CGS 360, Introduction to Cognitive Science, Steve Wechsler, Dept. of Linguistics, UT Austin, wechsler@mail.utexas.edu, 2/17/2016
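Appendix: a backpropagation sketch

The backpropagation idea described under "Learning with hidden units" can be made concrete in code. What follows is only a minimal sketch under assumptions that go beyond the handout: standard backpropagation requires differentiable units, so sigmoid units are used here in place of the threshold units above; the network is a two-input, two-hidden-unit, one-output network (not the xor network with direct input-to-output connections shown earlier); and the update rule is the usual gradient-descent formulation associated with Rumelhart, Hinton and Williams.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(epochs=20000, lr=0.5, seed=1):
    rng = random.Random(seed)
    # w[i][j]: weight from input i to hidden unit j; v[j]: hidden unit j to output;
    # b_h[j] and b_o play the role of (negative) thresholds.
    w = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    b_h = [rng.uniform(-1, 1) for _ in range(2)]
    v = [rng.uniform(-1, 1) for _ in range(2)]
    b_o = rng.uniform(-1, 1)
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # xor
    for _ in range(epochs):
        x, t = data[rng.randrange(4)]
        # Forward pass.
        h = [sigmoid(w[0][j] * x[0] + w[1][j] * x[1] + b_h[j]) for j in range(2)]
        o = sigmoid(v[0] * h[0] + v[1] * h[1] + b_o)
        # Backward pass: the output unit "teaches" the hidden units by sending
        # back its error, weighted by the connection each hidden unit used.
        delta_o = (o - t) * o * (1 - o)
        delta_h = [delta_o * v[j] * h[j] * (1 - h[j]) for j in range(2)]
        # Gradient-descent weight updates.
        for j in range(2):
            v[j] -= lr * delta_o * h[j]
            b_h[j] -= lr * delta_h[j]
            for i in range(2):
                w[i][j] -= lr * delta_h[j] * x[i]
        b_o -= lr * delta_o
    return w, b_h, v, b_o

w, b_h, v, b_o = train_xor()
for x, t in [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]:
    h = [sigmoid(w[0][j] * x[0] + w[1][j] * x[1] + b_h[j]) for j in range(2)]
    o = sigmoid(v[0] * h[0] + v[1] * h[1] + b_o)
    print(x, round(o, 2))  # usually close to 0, 1, 1, 0
```

With an unlucky random initialization the network can settle into a local minimum instead of solving xor, which is exactly the difficulty with hidden units noted in the section above.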