About connectionism

Connectionist Computation
A model of computation inspired by the neuronal architecture of the brain. Proposed by McCulloch and Pitts in 1943, and also considered a general model of computing by von Neumann. It became increasingly popular in the 1980s (parallel computing, parallel distributed processing, connectionist cognitive modeling, neural networks).
Connectionist computation and the Necker cube
Connectionist computing is particularly useful for constraint satisfaction, that is, for determining the values of a set of variables that have to satisfy certain conditions. Example: the description of the two perceptual states of the Necker cube.
[Figure: the Necker cube, with its corners labeled A through H]
Model Neurons and Neuronal Networks
Model neurons
A neuron receives input from potentially many sources. The input is represented by a vector x of numerical values, x = (x1, x2, x3, … xn). The output is a function of this vector, f(x). The output can in turn serve as input to potentially many other neurons. General convention: input sources that increase the output (excitatory) are marked by arrows; input sources that decrease the output (inhibitory) are marked by dots.
[Figure: a model neuron with four input channels x1–x4 (dendrites), a cell body computing the output f(x1, x2, x3, x4), and an output line (axon)]
In real neurons the relevant input and output is the frequency of spikes (a continuous value). Often we consider simplified models with discrete or even binary values.
The function f that we assume should be simple. The simplest functions are threshold functions: each input channel i is assigned a weight wi (so we have a vector of weights (w1, w2, … wn)); if the sum of the weighted inputs (w1x1 + w2x2 + …) is greater than a threshold, a specified output is produced.
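As an illustration (not from the handout), a threshold unit of this kind can be written in a few lines of Python; the particular weights, threshold, and inputs below are made-up examples.

def threshold_unit(inputs, weights, threshold, output=1):
    # Fire (return `output`) if the weighted sum of the inputs
    # exceeds the threshold; otherwise return 0.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return output if weighted_sum > threshold else 0

# Example with made-up values: three input channels.
print(threshold_unit([1, 0, 1], weights=[0.5, 0.5, 0.5], threshold=0.9))  # prints 1
print(threshold_unit([1, 0, 0], weights=[0.5, 0.5, 0.5], threshold=0.9))  # prints 0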
Neurons for and, or, not
The basic Boolean operators can be analyzed as very simple neurons. Assuming input values 0 and 1: and fires (outputs 1) if the sum of the two inputs is greater than 1; or fires if the sum of the two inputs is greater than or equal to 1; not has a single inhibitory input: it fires if there is no input and stops firing if there is an input. (As weights we can assume 1.)
1
>1
0
and
or
not
We can imagine more “noise-resistant” neurons that can also compute with inputs that deviate from 0 and 1. The threshold values for and and or could then be set to 1.5 and 0.75, respectively.
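A sketch of the three Boolean neurons as threshold units, using the noise-tolerant thresholds 1.5 and 0.75 just mentioned; modeling not with a single inhibitory weight of -1 and a threshold of 0 is one way of reading the description, not the handout's own formulation.

def fires(weighted_sum, threshold):
    return 1 if weighted_sum >= threshold else 0

def and_unit(x1, x2):
    # Fires only if both inputs are (close to) 1.
    return fires(1 * x1 + 1 * x2, threshold=1.5)

def or_unit(x1, x2):
    # Fires if at least one input is (close to) 1.
    return fires(1 * x1 + 1 * x2, threshold=0.75)

def not_unit(x):
    # Inhibitory input: fires when there is no input, stops when there is.
    return fires(-1 * x, threshold=0)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", and_unit(x1, x2), or_unit(x1, x2), not_unit(x1))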
Networks
We can assume neurons that perform similar operations on more than two input channels, but we can also build such units out of simpler ones. The following network will fire if the input from 4 sources contains one or more 1's:
1
G
D
B
C
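A sketch of such a network, under the assumption that it combines three 2-input or-units in two layers (this layout is a reconstruction, not taken directly from the figure):

from itertools import product

def or_unit(x1, x2):
    # 2-input or: fires if the sum of its inputs reaches 1.
    return 1 if x1 + x2 >= 1 else 0

def or4(x1, x2, x3, x4):
    # First layer: two or-units, one over each pair of inputs.
    a = or_unit(x1, x2)
    b = or_unit(x3, x4)
    # Second layer: a single or-unit over the first layer's outputs.
    return or_unit(a, b)

# Fires exactly when at least one of the four inputs is 1.
assert all(or4(*xs) == (1 if any(xs) else 0) for xs in product((0, 1), repeat=4))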
Returning to the Necker cube: the two perceptual states identify certain conditions for the corners of the cube, which can be visualized as neurons that fire if the conditions are met. Each point can be seen either in the foreground or in the background. There are excitatory connections within the condition sets (a) and (b), and inhibitory connections between those sets. They affect the position of each point with respect to its neighboring points, illustrated here just for the case “A in foreground”.
Conditions (a): A in foreground, B in background, C in background, D in foreground, E in foreground, F in background, G in background, H in foreground.
Conditions (b): A in background, B in foreground, C in foreground, D in background, E in background, F in foreground, G in foreground, H in background.
If all of the conditions (a) or all of the conditions (b) hold simultaneously, the states of the units strengthen each other; mixed states lead to conflicts between the units.
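A minimal sketch (not from the handout) of how such a constraint network can settle into one of the two interpretations: one unit per condition, excitatory links within a condition set, inhibitory links between the sets, and repeated threshold updates. The weights, the small bias, and the update rule are illustrative assumptions.

import random

corners = "ABCDEFGH"
# From the table above: corners in the foreground under conditions (a).
foreground_a = {"A", "D", "E", "H"}

# One unit per condition, e.g. ("A", "fg") stands for "A in foreground".
units = [(c, pos) for c in corners for pos in ("fg", "bg")]

def condition_set(unit):
    corner, pos = unit
    return "a" if (pos == "fg") == (corner in foreground_a) else "b"

def weight(u, v):
    # Excitatory within a condition set, inhibitory between the two sets.
    if u == v:
        return 0
    return 1 if condition_set(u) == condition_set(v) else -1

random.seed(0)
state = {u: random.choice([0, 1]) for u in units}   # mixed starting state

changed = True
while changed:                                      # asynchronous threshold updates
    changed = False
    for u in random.sample(units, len(units)):
        net = 0.5 + sum(weight(u, v) * state[v] for v in units)   # 0.5 = small bias
        new = 1 if net > 0 else 0
        if new != state[u]:
            state[u], changed = new, True

# Prints either all of conditions (a) or all of conditions (b).
print(sorted(f"{c} in {'foreground' if pos == 'fg' else 'background'}"
             for (c, pos), on in state.items() if on))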
Perceptrons
Perceptrons, a relatively simple type of network architecture, have been suggested in particular for pattern recognition (Rosenblatt 1958; Minsky & Papert 1969). Consider an array of neurons that can be in defined states (the input units, e.g. retinal neurons), and a second layer (the output units, or perceptrons), each of which is connected to some or all of the input units.
Perceptrons for pattern recognition
The following is an example. The task is to recognize “edges” in the upper array (an important task for vision).
Input units (upper line):  0 0 1 1 1 1 0 0 0 1 0 1 1 0
Output units (lower line): 0 1 0 0 1 0 0 0 1 0 1 1 0
As indicated with two examples, each output unit (lower line) gets input from three adjacent input units (its receptive field); e.g., the underlined unit gets the input (0, 1, 1). Each output unit applies the same weights to its input vector, (-1, 2, -1); the unit fires if the sum of the weighted inputs is > 0. In our example we get the following sums:
[Figure: the weighted sums computed by each output unit over its receptive field]
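A sketch of this edge detector applied to the input row above; skipping the two end positions, which lack a full receptive field, is an assumption about how the edges of the array are handled.

# Each output unit looks at three adjacent input units, applies the
# weights (-1, 2, -1), and fires if the weighted sum is > 0.
weights = (-1, 2, -1)
inputs = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]

sums, outputs = [], []
for i in range(1, len(inputs) - 1):        # skip the two end positions
    field = inputs[i - 1:i + 2]            # the unit's receptive field
    s = sum(w * x for w, x in zip(weights, field))
    sums.append(s)
    outputs.append(1 if s > 0 else 0)

print(sums)      # the weighted sum of each output unit
print(outputs)   # fires at the boundaries of blocks of 1's ("edges")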
Perceptrons can recognize edges in two dimensions as well,
and they can recognize more complex configurations. 
1
CGS 360, Introduction to Cognitive Science, Steve Wechsler, Dept. of Linguistics, UT Austin, wechsler@mail.utexas.edu, 2/17/2016
0
The limits of perceptrons
Perceptrons are simple devices for computational tasks that can be carried out locally, by considering only the input in each perceptron’s receptive field. There are tasks that they cannot do, for example, determining whether a figure is connected.
Hidden units and multilayered perceptrons
With perceptrons we can’t even implement exclusive or (xor), a logical operator that yields 1 if exactly one of the two inputs is 1. This is because perceptrons are monotone: they fire once a certain threshold is reached, and they don’t stop firing when the threshold is exceeded further.
The xor problem can be solved with hidden units between the input units and the output units (the numbers at the arrows represent the weights of the connections).
[Figure: xor network. Each input connects with weight +1 to a hidden unit (threshold 1.5) and with weight +1 to the output unit (threshold 0.5); the hidden unit connects to the output unit with weight -2.]
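Reading the diagram in this way, the network can be checked in a few lines; the exact wiring is a reconstruction of the figure, so treat it as an assumption.

def fires(net, threshold):
    return 1 if net > threshold else 0

def xor_net(x1, x2):
    # Hidden unit: weight +1 from each input, threshold 1.5 (an "and"-like unit).
    hidden = fires(1 * x1 + 1 * x2, 1.5)
    # Output unit: +1 from each input, -2 from the hidden unit, threshold 0.5.
    return fires(1 * x1 + 1 * x2 - 2 * hidden, 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0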
Hidden units (internal representation units) increase the
computational power of neural networks substantially. For
pattern recognition and image processing a multilayered
architecture was proposed in which the output units of one
level are the input units of the next level.
This technique is used, for example, for character and word recognition. Basic elements of characters are identified by perceptrons at the lowest layer; perceptrons at higher layers fire when they detect certain combinations of such features.
Learning
How are neural networks built? In particular, what determines the connections between units, their weights, and their threshold values? How can a neural network learn?
Learning consists in the development of new connections, in the loss of existing connections, and in the modification of the weights of existing connections. To make things simpler we can disregard the first two processes and assume that every unit is connected to every other unit; connections can then be suppressed by changing their weight to 0.
How perceptrons learn
For perceptrons there are effective learning algorithms, for example supervised learning with reinforcement:
• Start in a state in which each perceptron comes with random weights and thresholds (or perhaps 0 everywhere).
• Provide an input, observe the output, and note the difference from the expected output.
• Change the weights and thresholds of the output units randomly and observe the output; if it is closer to the expected output, keep the change, otherwise stay with the old values.
• Repeat the last two steps over and over until the network stabilizes in a particular state.
Example: how a perceptron learns “or”. Assumption: possible weights are -1, 0, 1; the expected output (threshold) is 1. Abbreviations: i = input, w = weights, o = output.
Random initial state of weights: w: [-1, 0]
i: [1, 0], w: [-1, 0], o: -1; expected o: 1
  random change w: [0, 0], o: 0, closer to 1, keep.
i: [0, 1], w: [0, 0], o: 0; expected o: 1
  random change w: [0, -1], o: -1, that’s worse.
  random change w: [1, 0], o: 0, that’s the same.
  random change w: [0, 1], o: 1, that’s better, keep.
i: [1, 1], w: [0, 1], o: 1; expected o: 1, don’t change.
i: [1, 0], w: [0, 1], o: 0; expected o: 1
  random change w: [-1, 1], o: -1, worse, don’t change.
  random change w: [1, 0], o: 0, that’s the same.
  random change w: [1, 1], o: 2 (1), that’s better, keep.
i: [0, 0], w: [1, 1], o: 0; expected o: < 1, don’t change.
Subsequent trials with inputs will not lead to any change; w: [1, 1] is the correct result.
Probabilistic variation helps escape possible “local minima” in the search for a solution (as in annealing).
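A minimal sketch of this random-perturbation learning procedure for “or”, following the worked example above. The restriction of weights to -1, 0, 1 and the threshold of 1 come from the example; the error measure, which scores a candidate against all four training patterns at once rather than one input at a time, and the random seed are simplifying assumptions.

import random

random.seed(1)
threshold = 1                         # the unit fires if the weighted sum reaches 1
training = [([1, 0], 1), ([0, 1], 1), ([1, 1], 1), ([0, 0], 0)]   # "or"

def error(w):
    # How far the weighted sums are from the behaviour we want.
    total = 0
    for (x1, x2), target in training:
        s = w[0] * x1 + w[1] * x2
        if target == 1 and s < threshold:
            total += threshold - s
        if target == 0 and s >= threshold:
            total += s - threshold + 1
    return total

w = [-1, 0]                           # initial weights, as in the example
while error(w) > 0:
    candidate = list(w)
    candidate[random.randrange(2)] = random.choice([-1, 0, 1])   # random change
    if error(candidate) < error(w):   # keep only changes that bring us closer
        w = candidate

print(w)   # [1, 1], which computes "or" with threshold 1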
Learning with hidden units
For simple two-layer networks the learning algorithm is guaranteed to give us the right solution eventually (there is a single minimum that the learning algorithm approaches). Not so for multi-layer networks, for which we may have many local minima. This is because the network does not “know” how to judge random variations in the settings of the hidden units (they cannot be directly compared with the expected output).
But there are solutions that work quite well. The central idea is the backpropagation algorithm (Rumelhart, Hinton & Williams; supervised learning in multilayer neural networks). The output units send back, to each hidden unit to which they are connected, information about that hidden unit’s share of the error (i.e., they specify the difference between what the hidden unit actually sent and the signal that would have led to the intended output). That is, the output units “teach” the hidden units. The hidden units can use this information to adjust the weights of their inputs, and so on.
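A compact sketch of backpropagation for a network with one hidden layer, trained here on xor. The sigmoid units, squared-error-style error signal, learning rate, and number of training steps are standard textbook choices, not taken from the handout.

import numpy as np

rng = np.random.default_rng(0)

# Training data: xor.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with four units; weights start small and random.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

lr = 1.0
for step in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)              # hidden activations
    out = sigmoid(h @ W2 + b2)            # output activations

    # Backward pass: the output units' error signal ...
    delta_out = (out - y) * out * (1 - out)
    # ... is sent back to the hidden units ("teaching" them) ...
    delta_h = (delta_out @ W2.T) * h * (1 - h)

    # ... and each layer adjusts the weights of its inputs accordingly.
    W2 -= lr * h.T @ delta_out
    b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_h
    b1 -= lr * delta_h.sum(axis=0)

out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(out, 2).ravel())   # typically close to [0, 1, 1, 0]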
Symbolic vs. Connectionist Computing
(Connectionism: philosophical issues)
• Symbolic: Representations consist of disjoint elements that are concatenated following syntactic rules. They are local; each element stands for a particular thing. Computations break tasks down into smaller tasks. They are typically sequential; we distinguish between a CPU and a memory unit. Sensitive to disturbance (noise).
• Connectionist: Representations are continuous and non-concatenative. They are distributed. There is no clear distinction between representation and computation, and none between CPU and memory. Processes are parallel to a high degree. Relatively insensitive to noise.
Problems for connectionism: the modeling of propositional information, e.g. a dog bit a mailman. Thematic roles (actor, patient) must be distinguished, and one individual can act in multiple roles (Pinker, “Connectoplasm”).
CGS 360, Introduction to Cognitive Science, Steve Wechsler, Dept. of Linguistics, UT Austin, wechsler@mail.utexas.edu, 2/17/2016