Advanced Topics in Machine Learning
HW 1 – Due Thursday, April 14, 2005
1. Consider the network illustrated below, with two input units (x1 and x2), one hidden
unit (h1), and two output units (o1 and o2).
[Figure: x1 and x2 feed the hidden unit h1 through weights w1 and w2; h1 feeds the output units o1 and o2 through weights w3 and w4.]
(a) Let x, t be a training example, with input x = (1, 0) and target t = (0.9, 0.1). Suppose
the weights are initialized to w1 = 0.1, w2 = -0.1, w3 = 0.2, and w4 = -0.5, and the learning
rate η is 0.3. Give the values of o1 and o2 after x has been input to and forward-propagated
through the network, using the sigmoid activation function at each non-input node.
(b) Suppose that back-propagation is run and the weights are updated after the network is
given this single training example. Give the new values of each of the weights.
(c) Now suppose the same training example is input to and forward-propagated through
the network again, with the new weight values. Give the resulting values of o1 and o2.
Are they closer to the target values than they were in step (a)?
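These computations are intended to be carried out by hand, but a short numerical sketch can be used to check the arithmetic in parts (a)-(c). The Python sketch below assumes a squared-error cost, the standard back-propagation rules for sigmoid units, and no bias weights (none appear in the figure); all variable names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training example and initial weights from part (a)
x1, x2 = 1.0, 0.0
t1, t2 = 0.9, 0.1
w1, w2, w3, w4 = 0.1, -0.1, 0.2, -0.5
eta = 0.3  # learning rate

def forward(w1, w2, w3, w4):
    # h1 receives x1 and x2; o1 and o2 each receive h1 (no bias units assumed)
    h1 = sigmoid(w1 * x1 + w2 * x2)
    o1 = sigmoid(w3 * h1)
    o2 = sigmoid(w4 * h1)
    return h1, o1, o2

# Part (a): initial forward pass
h1, o1, o2 = forward(w1, w2, w3, w4)
print("(a) o1 = %.4f, o2 = %.4f" % (o1, o2))

# Part (b): one back-propagation step, assuming squared-error loss
# delta for an output unit: (t - o) * o * (1 - o)
d_o1 = (t1 - o1) * o1 * (1 - o1)
d_o2 = (t2 - o2) * o2 * (1 - o2)
# delta for the hidden unit: h * (1 - h) * sum over outputs of w * delta
d_h1 = h1 * (1 - h1) * (w3 * d_o1 + w4 * d_o2)

# Weight updates: w += eta * (downstream delta) * (upstream activation)
w3 += eta * d_o1 * h1
w4 += eta * d_o2 * h1
w1 += eta * d_h1 * x1
w2 += eta * d_h1 * x2
print("(b) w1 = %.4f, w2 = %.4f, w3 = %.4f, w4 = %.4f" % (w1, w2, w3, w4))

# Part (c): forward pass again with the updated weights
_, o1_new, o2_new = forward(w1, w2, w3, w4)
print("(c) o1 = %.4f, o2 = %.4f" % (o1_new, o2_new))
```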
2. Consider the perceptron illustrated below.
[Figure: perceptron with inputs x1 and x2 (weights w1 and w2), a bias input +1 (weight θ), and output o.]
(a) Give values for w1, w2, and bias θ that will implement the Boolean function
f(x1, x2) = x1 ∨ x2 (where ∨ stands for logical-or). Assume x1, x2 ∈ {0, 1}, and f(x1, x2) =
1 if at least one of x1, x2 is 1; f(x1, x2) = -1 otherwise. Use the sgn activation function
for the output node.
(b) Give the equation of the separation line defined by your weights and bias, and sketch
its graph. Show that it correctly separates the two classes.
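The exercise is again meant to be solved by hand; the sketch below only illustrates how a candidate answer could be checked. The weights w1 = w2 = 1 and bias θ = -0.5 are one hypothetical choice used for illustration, not necessarily the intended answer.

```python
# Hypothetical perceptron weights for x1 OR x2 (one possible choice, for illustration only)
w1, w2, theta = 1.0, 1.0, -0.5

def sgn(z):
    return 1 if z >= 0 else -1

# Check all four Boolean input combinations against the OR target
for x1 in (0, 1):
    for x2 in (0, 1):
        o = sgn(w1 * x1 + w2 * x2 + theta)
        target = 1 if (x1 == 1 or x2 == 1) else -1
        print(f"x = ({x1}, {x2}): o = {o:+d}, target = {target:+d}")

# Part (b): the separation line is w1*x1 + w2*x2 + theta = 0;
# for these weights it is x2 = -x1 + 0.5
print(f"separation line: {w1}*x1 + {w2}*x2 + {theta} = 0")
```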
3. Consider a single linear neuron with inputs x which are n-dimensional vectors with
components x_i, i = 1…n. For a set of patterns (t^d, x^d), d = 1…D, the corresponding error,
or cost function, is
E(w) = \frac{1}{2D} \sum_{d=1}^{D} \left( t^d - w \cdot x^d \right)^2
(note that the superscript d tells you which pattern; it is not an exponent), and the inner
product is defined as usual
w \cdot x^d = \sum_{i=1}^{n} w_i x_i^d .
a. Show that the error function can be written as
E(w) = \frac{1}{2} \left[ \frac{1}{D} \sum_{d=1}^{D} (t^d)^2 - 2\, w \cdot \hat{r} + w \cdot (\hat{R} w) \right]
where the sample autocorrelation matrix has elements
\hat{R}_{ij} = \frac{1}{D} \sum_{d=1}^{D} x_i^d x_j^d
and the sample cross-correlation vector has components
\hat{r}_i = \frac{1}{D} \sum_{d=1}^{D} x_i^d t^d .
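The derivation in part (a) is pencil-and-paper work, but a quick numerical check can confirm that the direct and matrix forms of E(w) agree. The sketch below uses NumPy with arbitrary sizes n = 3 and D = 50; all names, sizes, and random data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 3, 50                       # arbitrary sizes for the check
X = rng.normal(size=(D, n))        # rows are the patterns x^d
t = rng.normal(size=D)             # targets t^d
w = rng.normal(size=n)             # an arbitrary weight vector

# Direct form: E(w) = (1/2D) * sum_d (t^d - w . x^d)^2
E_direct = np.sum((t - X @ w) ** 2) / (2 * D)

# Matrix form: E(w) = (1/2) * [ (1/D) sum_d (t^d)^2 - 2 w.r_hat + w.(R_hat w) ]
R_hat = (X.T @ X) / D              # sample autocorrelation matrix
r_hat = (X.T @ t) / D              # sample cross-correlation vector
E_matrix = 0.5 * (np.mean(t ** 2) - 2 * w @ r_hat + w @ (R_hat @ w))

print(E_direct, E_matrix)          # the two values should agree
```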
b. Explicitly evaluate the condition that at the cost function's minimum, all the
components of the gradient of E(w) vanish,
\frac{\partial E(w)}{\partial w_j} = 0 ,
to show that the optimum weight (estimated on this data) ŵ satisfies
\hat{R} \hat{w} = \hat{r} .
Conclude that if the autocorrelation matrix is invertible, the unique global
minimum is at
\hat{w} = \hat{R}^{-1} \hat{r} .
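As a numerical sanity check of part (b), the sketch below (same illustrative random setup as above) solves R̂ŵ = r̂ and verifies that the gradient of E(w) vanishes at ŵ.

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 3, 50
X = rng.normal(size=(D, n))
t = rng.normal(size=D)

R_hat = (X.T @ X) / D
r_hat = (X.T @ t) / D

# Optimum weights from R_hat w_hat = r_hat
w_hat = np.linalg.solve(R_hat, r_hat)

# The gradient of E(w) is R_hat w - r_hat; it should be (numerically) zero at w_hat
grad = R_hat @ w_hat - r_hat
print(np.max(np.abs(grad)))        # close to machine precision
```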
c. Use its definition to show that the autocorrelation matrix is positive semi-definite; that is,
for any vector V, it is true that
V \cdot (\hat{R} V) \ge 0 .
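For part (c), a brief numerical illustration (not a substitute for the proof): with the patterns stacked as the rows of a matrix X, the definition above gives R̂ = (1/D) XᵀX, and the sketch below simply evaluates V·(R̂V) for a few random vectors V; the setup is the same illustrative one used earlier.

```python
import numpy as np

rng = np.random.default_rng(2)
n, D = 3, 50
X = rng.normal(size=(D, n))
R_hat = (X.T @ X) / D

# Evaluate the quadratic form V . (R_hat V) for a few random vectors;
# each value should be non-negative
for _ in range(5):
    V = rng.normal(size=n)
    q = V @ (R_hat @ V)
    print(q, q >= 0)
```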