Assignment2_Question3-2

Psychology 209, Winter, 2013
Question 3.2.3 (Revised)
Revised Assignment (DUE FRIDAY): Hand in answers to questions 3.2.1 – 3.2.4, following the handbook
for 3.2.1, 3.2.2, and 3.2.4. Follow these instructions for Question 3.2.3.
The purpose of this question is to help you gain an intuitive sense of the difficulty of escaping from some
local maxima, and also of the specific difficulties encountered in the Cube example in the handbook. To
convey a sense of this, I thought it would be useful to have you envision a minimum sequence of state
changes, or single unit activation flips, that would lead from a [4 4] maximum all the way to a global
maximum. What is the minimum number of state changes that would have to occur? Four units would
have to go on, four would have to go off, so, eight state changes.
Write down a sequence of state changes that would take you from a specific [4 4] state to a specific
global maximum state. For simplicity we can number the units in the cube network 1-16 as I have done
in Fig 1, attached. A series of state changes that would work might begin with unit 5 switching from off
to on, for example. Note that there are many such series that could work, some harder than others, but
your task here is to choose just one such series.
Now, suppose that, in the course of running at a fixed temperature T = 1, the network selected units for
updating in the order you have listed them. It should be clear that the probability that these units
would in fact be selected for updating in the given order is very low, but the probability of any sequence
of eight updates is equally low, so we will focus on the probabilities of the flips involved.
At each update, calculate the probability that the unit’s state would be changed to the new value
indicated in your list. You should end up with a table listing the eight units, each unit’s state after the
flip, and the probability of each flip. Finally, you can calculate the overall probability that, if the
network did visit these states in this order, it would actually reach the global maximum at the end of the
series of flips. Since these events are independent, this is just the product of the first number,
times the second, times the third, and so on through the eighth. To see that this is true, note that if the other
outcome occurs at any one of the eight updates, the network will not be in the global maximum at the
end of the eight updates. So to reach it, the first flip must occur, then the second, etc.
How do you calculate the probability that a given unit’s state will flip at each update? This probability is
equal to the probability that the unit’s activation will be set to the new value (regardless of the old
value) at that update. For reference, see Equation 3.7 in the PDP Handbook. Let’s suppose that for
a given unit, the new value is ‘on’ or activation = 1. You calculate the probability that the unit will be on
after the update by (a) first calculating the unit’s net input and dividing that by the temperature (which
is 1), and (b) then using the logistic function to calculate the probability that the unit will be on. To
minimize your need to calculate this, I provide ‘p(on)’ for relevant values of net in the attached Table 1
(we neglect T because it is 1). Note that sometimes, for a given unit, the new value is ‘off’ or activation
= 0. In this case, we need to know p(off), which is 1-p(on). These values are also given in the table. As a
sanity check, recall that a unit with positive net input is more likely to be set to ‘on’ than ‘off’ – a unit
with negative net input is more likely to be set to ‘off’ than ‘on’.
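The two-step calculation described above (divide the net input by the temperature, then apply the logistic function) can be sketched in a few lines of Python. This is only an illustration: the function names `p_on` and `p_off` are made up for this example and are not taken from the handbook or its software. It can be used to check entries in Table 1.

```python
import math

def p_on(net, temperature=1.0):
    """Probability that a unit is set to 'on' at an update (logistic of net / T)."""
    return 1.0 / (1.0 + math.exp(-net / temperature))

def p_off(net, temperature=1.0):
    """Probability that a unit is set to 'off' at an update, i.e. 1 - p(on)."""
    return 1.0 - p_on(net, temperature)

# Print a few rows in the same layout as Table 1: net, p(on), p(off).
for net in (3.0, 1.0, 0.0, -1.0, -3.0):
    print(f"{net:8.4f}  {p_on(net):.4f}  {p_off(net):.4f}")
```

With the default temperature of 1 the division has no effect, which is why Table 1 can be indexed by net input alone.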
You can organize your results in a table where each line shows the unit chosen to update, the state it
should be in after the update, and the probability that it will actually end up in that state. At the bottom
you can write down the product of these eight probabilities.
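If you prefer to keep the table in a script rather than on paper, something like the following sketch would do the bookkeeping. The three rows are hypothetical placeholders: the unit numbers and net inputs are chosen only to show the layout, and they are not computed from the cube network's weights. Your own table will have eight rows, with net inputs you work out yourself.

```python
import math

# Hypothetical placeholder rows: (unit, state after the flip, net input at that update).
# These net inputs are NOT derived from the cube network; replace them with your own.
steps = [
    (5, "on", -1.0),
    (13, "off", 0.5),
    (6, "on", 0.0),
]

def flip_probability(new_state, net, temperature=1.0):
    """Probability that the update leaves the unit in new_state."""
    prob_on = 1.0 / (1.0 + math.exp(-net / temperature))
    return prob_on if new_state == "on" else 1.0 - prob_on

probs = [flip_probability(state, net) for _, state, net in steps]

# One line per update, then the product at the bottom.
for (unit, state, net), p in zip(steps, probs):
    print(f"unit {unit:2d} -> {state:3s}  net = {net:5.1f}  p = {p:.4f}")
print(f"product = {math.prod(probs):.6f}")
```

Note that `math.prod` requires Python 3.8 or later.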
The next step of the assignment is to calculate the probability that the reverse sequence of steps would
take you back from the global maximum to the starting place. You can write these numbers down in
another column of the same table, working your way from the bottom to the top. You will note that the
first step of the backward sequence reverses the last step of the forward sequence. You should see
something systematic about the probability of the reverse change compared with the probability of the
change that occurred in the forward direction. At the end, you can then calculate the overall probability
that this reverse sequence of updates would actually take you back to the starting [4 4] state. Write this
down at the bottom of the new column, as for the first case. The two sequences of visits are equally
likely, so the ratio of the probabilities of all of the updates coming out as needed gives you a sense of
the relative likelihood of escape vs. return to the local maximum. Hand in the table you have created,
and answer the following questions.
Questions: (a) What is the relationship between the probability of each ‘forward’ flip and the probability
of the corresponding ‘backward’ flip? (b) Which steps are the hardest to achieve, in each direction? (c)
What is the ratio of the total forward probability to the total backward probability? (d) Make a brief
summary statement, based on these numbers, about the relative likelihood of escape vs. return at
temperature 1. You may mention caveats or other comments, but stay within 3-4 sentences for part (d).
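For the ratio asked for in part (c), the arithmetic is just one product divided by another. A minimal sketch, assuming you have already filled in both columns of your table: the probabilities below are placeholder values picked from Table 1 for illustration, not the cube's actual values, and `sequence_probability` is a name invented for this example.

```python
import math

def sequence_probability(flip_probs):
    """Probability that every update in the sequence comes out as required."""
    return math.prod(flip_probs)

# Placeholder columns (illustrative only). The backward column is listed in the
# order the reverse sequence is carried out, i.e. bottom row of the table first.
forward = [0.2689, 0.3775, 0.5000]
backward = [0.5000, 0.6225, 0.7311]

p_forward = sequence_probability(forward)
p_backward = sequence_probability(backward)
ratio = p_forward / p_backward

print(f"forward  product = {p_forward:.6f}")
print(f"backward product = {p_backward:.6f}")
print(f"forward / backward = {ratio:.4f}")
```

A ratio below 1 means the reverse sequence is the more probable of the two. With eight small probabilities the products get tiny, so it can help to print them in scientific notation (for example with the format `.3e`).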
Now, conduct the (simpler) analysis of a minimal sequence of updates required to get from a [6 2]
maximum to the nearby [8 0] global maximum, as well as a minimum sequence of steps to get from the
same [6 2] maximum to a neighboring [4 4] maximum (one that’s only 4 steps away from the [6 2]
maximum). In each case, also calculate the probability that the reverse sequence of updates would take
you back from the endpoint to the starting point. Only four steps are required in each case – and it is
possible that you already constructed both sequences in the earlier part of the problem. But maybe you
did not. When you have completed this analysis, you will have two additional tables each with four lines
and a summary row at the bottom. You should hand in these tables (labeling them so we know which is
which), as well as the answers to these final questions:
Final Questions. (e) Which step is the hardest to achieve in the forward direction from the [6 2]
maximum to the [8 0] maximum? Compare how hard it is to achieve this step to how hard it is to
achieve the hardest step between the [4 4] maximum and the [6 2] maximum. Explain the reason for
this difference. (f) With all of the results above in hand, imagine an ensemble of networks at some time
during settling, and suppose that twenty of these networks are in [6 2] maxima, twenty are in [4 4]
maxima, and twenty are in [2 6] maxima. Imagine that each network settles for 50 more cycles of 16
updates. Use the information above to speculate on how the distribution will have changed. Will there
be fewer [6 2], [4 4], and [2 6] states? Will there still be about the same number of [6 2], [4 4], and [2 6]
cases as before? Try to explain your answer. Maximum of six sentences for part (f).
Table 1. Net inputs and probabilities, Temperature = 1

    net        p(on)     p(off)
     3.0000    0.9526    0.0474
     2.5000    0.9241    0.0759
     2.0000    0.8808    0.1192
     1.5000    0.8176    0.1824
     1.0000    0.7311    0.2689
     0.5000    0.6225    0.3775
     0.0000    0.5000    0.5000
    -0.5000    0.3775    0.6225
    -1.0000    0.2689    0.7311
    -1.5000    0.1824    0.8176
    -2.0000    0.1192    0.8808
    -2.5000    0.0759    0.9241
    -3.0000    0.0474    0.9526
Fig. 1. Labels for the sixteen vertices in the cube problem