Binding Problem

Jerry,
Sorry: I have been having trouble getting my brain awake and in gear. So I keep writing blather.
But we are going out soon so I will send you what I have. I may be seeing a doctor on Monday
in your area so if you like I can stop by (briefly because I still have to get this ready for
Tuesday’s class).
Assume that an Index is the name of an object-token. Having a token of the name is like having
the name of some variable in a computer – it enables you to access its value or whatever is the
named thing. Of course the usual thing in a computer is that you access some symbol in the
machine, but that needn’t be the case. I see no reason why given the name of an object the
computer could not turn its video camera to it. In that case it might have to have the object’s
cylindrical coordinates, focal distance and brightness, but that’s all part of the backstage
(subpersonal, modular) machinery. But that isn’t how I want to think of it.
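Just to make the variable-name analogy concrete, here is a toy rendering in Python (the names are mine and purely illustrative; nothing in the model hangs on them). Having an index is having a name whose binding gives you access to whatever currently stands behind it, the backstage record staying entirely subpersonal:

    # Toy sketch: an index is a name bound to an object-token.
    # The backstage record (coordinates, brightness, etc.) is subpersonal;
    # the person-level system only ever handles the name.
    class ObjectToken:
        def __init__(self, backstage_record):
            self.backstage = backstage_record   # e.g. cylindrical coordinates, focal distance

    bindings = {}                               # index name -> object-token

    def assign_index(name, token):
        bindings[name] = token                  # binding created by the early visual system

    def access(name):
        return bindings[name]                   # having the name gives access,
                                                # as a variable name gives its value

    assign_index("o", ObjectToken({"pos": (3, 4), "colour": "red"}))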
Given the name of an object (e.g., o) I have been assuming that the visual system can do such
things as move its attention to it without moving its gaze. In some sense this is asking the mind to do something to some non-mental thing, and it’s true that this is usually not possible; it may even be true, as Joe Levine says, that it can’t be done directly but has to be mediated by a mental representation. So be it. But it still remains the case that you can accomplish the function of switching your attention to a named object in your field of view (either by continuous movement, as many people believe, or without moving through intermediate locations, as happens in Sperling’s recent model).
One of the main assumptions about FINSTs is that these indexes allow the evaluation of
predicates such as Red(x) or Beside(x,y) where x and y are indexes that are bound to and thereby
refer to two distinct objects in the field of view. (Why two objects and not two views of one object? Good question, and one I had worried about because it requires additional bells and whistles, but it does not require a miracle. I think it shows either that index binding occurs later in the visual module – after distinct views are computed – or that it occurs early and cannot select views at all, only different objects. Given that we are dealing with preconceptual processes, the latter makes more sense. More on that later, maybe.)
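Continuing the toy sketch above (again, all names are mine), the point about predicate evaluation is just that the arguments are index names, not descriptions or locations, and that a pair of indexes has to pick out two distinct objects:

    # Sketch only: predicates evaluated over bound indexes.
    def Red(x):
        return bindings[x].backstage.get("colour") == "red"

    def Beside(x, y, threshold=2.0):
        ox, oy = bindings[x], bindings[y]
        if ox is oy:                            # two indexes, two distinct objects -
            return False                        # not two views of one object
        (x1, y1), (x2, y2) = ox.backstage["pos"], oy.backstage["pos"]
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 < threshold

    assign_index("p", ObjectToken({"pos": (4, 4), "colour": "green"}))
    print(Red("o"), Beside("o", "p"))           # True True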
There was also the question of whether you can interrogate an object knowing only its name (or
the name of its FINST). That question, it seems to me, is one clouded in unnecessary mystery.
The way you do that is just the way you direct your attention (and not your gaze) to an object of
interest. You do it in all cases by doing something to the proximal stimulus which we assume
maps any relevant properties of the distal stimulus of interest. It’s true that you see distal
objects and not other links in the causal chain. But that does not mean you could not affect your
perception of the distal object by doing something to its proximal link in the chain. You could,
for example, break the chain by turning off an intermediate link, thereby preventing perception
of the relevant distal object or making it less salient. Similarly, you could check whether the
cells at the proximal stimulus are able to activate a P detector, etc.
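Here too a small sketch may help (illustrative only): “doing something to the proximal link” can mean gating or pulsing the proximal cells that the distal object of interest projects to, and then seeing what a P detector downstream does:

    # proximal_activations: {location: current (subthreshold) activation of the P transducer}
    # projected_region: the proximal locations the indexed distal object projects to

    def mask_region(proximal_activations, projected_region, gain=0.0):
        # break or weaken the chain: switch off (gain = 0) or attenuate (gain < 1)
        # the proximal cells, making the distal object imperceptible or less salient
        return {loc: act * gain if loc in projected_region else act
                for loc, act in proximal_activations.items()}

    def can_drive_P_detector(proximal_activations, projected_region, pulse=0.5, threshold=1.0):
        # pulse just those cells and see whether any P transducer goes over threshold
        return any(proximal_activations[loc] + pulse >= threshold
                   for loc in projected_region)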
In Seeing and Visualizing I suggest a connectionist-type model for combining the detection of a
property with a circuit that inhibits all but one region of the proximal stimulus and thereby
detects the presence of a property at a particular object, where the object is not identified by its
location but by a name that simply identifies it as one of the objects that caused an index to be
assigned. The network is not smart, but it reveals that there is no mystery in detecting P-at-Object-O. The name of the object is available only temporarily, or it can be made to track the
object so long as the object moves continuously (so the tracking through occlusion requires some
additional mechanism or assumptions) by a technique suggested by Koch & Ullman – make the
circuit enhance locations on the proximal stimulus that are near the currently selected one.
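The Koch & Ullman move, as I am using it, can be sketched in the same style (one-dimensional locations and arbitrary numbers, just for brevity): bias the competition toward proximal locations near the currently selected one, so that the winner follows an object that moves continuously.

    # Sketch of tracking by neighbourhood enhancement (after Koch & Ullman).
    def select_next(activations, current_loc, neighbourhood=2, boost=0.5):
        def biased(loc):
            near = abs(loc - current_loc) <= neighbourhood
            return activations[loc] + (boost if near else 0.0)
        return max(activations, key=biased)     # winner-take-all over the biased activity

    # The selection follows the object from 5 to 6, even though a distractor at 20
    # is slightly more active, because the enhancement keeps the old winner's
    # neighbourhood ahead in the competition.
    print(select_next({6: 0.8, 20: 0.9}, current_loc=5))   # 6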
In case you are interested I provide several levels of detail of the network (including a circuit at
the end and the Appendix of Chapter 5 of Seeing and Visualizing). A summary is that the model
shows that it is possible, in effect, to send a signal (a pulse) to a distal object by manipulating its
current proximal projection (you are right that you could not send a signal to the distal object –
the best you can do is ‘open/close your eyes’ to it). In this network the following happens. A set
of transducers is activated by incoming information. Each point on the proximal stimulus has
lots of punctate transducers. A transducer is partially (subthreshold) activated by the presence of
some feature, say, feature F. What will happen eventually is that a signal will be sent to selected
transducers on the proximal stimulus. If that transducer is partially activated it will go over
threshold and fire. Since all transducers of the particular type (which detect a particular feature)
are connected to a logical disjunction network, the output of the mechanism will be the
information that there is F somewhere on the proximal stimulus. What makes it possible to ask
this question of a particular location on the proximal stimulus is the Winner-Take-All (WTA)
network that shuts down all transducers except the one that has the highest activity. Putting
aside the details, it does this in a manner familiar to connectionist network builders: by using a WTA network that settles, at the end of a series of adjustments known as a “relaxation”, into a state in which just one node (or one path through the network) is active. Each place in the proximal stimulus
(call that a proximal place or PP) thus has several links going to it: one from each property
transducer at that location, one from the WTA network which ensures that only the most active
(the most salient) PP is able to respond – the other PP units having been shut down. The PP also
receives one input from a selected feature inquiry channel (depicted in the attached figure as a
button for each property). The end result is that you can open the channel to one PP by pulsing it
with a signal corresponding to the property you want to check on, so you can find out whether
the currently most active object (or some other activated/indexed object) has property P. You
can also find out other properties that it has because you have an open channel to that object.
This simple picture does not deal with multiple objects or moving objects, but they can be fit into
the same general picture with additional mechanisms of the same general type.
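In case it helps to see that simple picture in one place, here is a bare-bones rendering (the real thing is the circuit and the Appendix of Chapter 5; the threshold, pulse size and feature names below are all mine, chosen only to make the example run):

    THRESHOLD = 1.0
    SUBTHRESHOLD = 0.6      # partial activation from the mere presence of a feature
    PULSE = 0.5             # inquiry pulse sent down a feature channel

    def make_transducers(scene):
        # scene: {proximal place (PP) -> set of features present there}
        # banks[feature][PP] = activation of the punctate transducer for that feature
        banks = {}
        for loc, features in scene.items():
            for f in features:
                banks.setdefault(f, {})[loc] = SUBTHRESHOLD
        return banks

    def winner_take_all(salience):
        # relaxation details omitted: return the most active PP, the others
        # being treated as shut down
        return max(salience, key=salience.get)

    def query(feature, banks, salience):
        # Is this feature present at the currently most salient (indexed) object?
        winner = winner_take_all(salience)
        fired = []
        for loc, activation in banks.get(feature, {}).items():
            if loc == winner:                   # the pulse reaches only the selected PP
                activation += PULSE             # (the WTA has shut the other PPs down)
            fired.append(activation >= THRESHOLD)
        return any(fired)                       # logical disjunction over the whole bank

    # The indexed object at (3, 4) is red and curved and is currently the most salient.
    scene = {(3, 4): {"red", "curved"}, (7, 1): {"green"}}
    salience = {(3, 4): 0.9, (7, 1): 0.4}
    banks = make_transducers(scene)
    print(query("red", banks, salience))        # True: red at the indexed object
    print(query("green", banks, salience))      # False: green is somewhere else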