Advanced Topics in Machine Learning
HW 1 – Due Thursday, April 14, 2005

1. Consider the network illustrated below, with two input units (x1 and x2), one hidden unit (h1), and two output units (o1 and o2).

[Figure: a two-layer network in which h1 receives x1 and x2 through weights w1 and w2, and o1 and o2 receive h1 through weights w3 and w4.]

(a) Let (x, t) be a training example, with input x = (1, 0) and target t = (0.9, 0.1). Suppose the weights are initialized to w1 = 0.1, w2 = -0.1, w3 = 0.2, and w4 = -0.5, and the learning rate is 0.3. Give the values of o1 and o2 after x has been input to and forward-propagated through the network, using the sigmoid activation function at each non-input node.

(b) Suppose that back-propagation is run and the weights are updated after the network is given this single training example. Give the new value of each weight.

(c) Now suppose the same training example is input to and forward-propagated through the network again, with the new weight values. Give the resulting values of o1 and o2. Are they closer to the target values than they were in part (a)?

2. Consider the perceptron illustrated below.

[Figure: a single output unit o receiving inputs x1 and x2 through weights w1 and w2, and a constant input +1 through a bias weight.]

(a) Give values for w1, w2, and the bias weight that implement the Boolean function f(x1, x2) = x1 ∨ x2 (where ∨ stands for logical-or). Assume x1, x2 ∈ {0, 1}, and f(x1, x2) = 1 if at least one of x1, x2 is 1; f(x1, x2) = -1 otherwise. Use the sgn activation function for the output node.

(b) Give the equation of the separating line defined by your weights and bias, and sketch its graph. Show that it correctly separates the two classes.

3. Consider a single linear neuron with inputs $x$ that are n-dimensional vectors with components $x_i$, $i = 1, \ldots, n$. For a set of patterns $(t^d, x^d)$, $d = 1, \ldots, D$, the corresponding error, or cost, function is

    $E(w) = \frac{1}{2D} \sum_{d=1}^{D} \left( t^d - w \cdot x^d \right)^2$

(note that the superscript d tells you which pattern it is; it is not an exponent), and the inner product is defined as usual:

    $w \cdot x^d = \sum_{i=1}^{n} w_i x_i^d$

a. Show that the error function can be written as

    $E(w) = \frac{1}{2D} \sum_{d=1}^{D} (t^d)^2 - w \cdot \hat{r} + \frac{1}{2}\, w \cdot (\hat{R} w)$

where the sample autocorrelation matrix has elements

    $\hat{R}_{ij} = \frac{1}{D} \sum_{d=1}^{D} x_i^d x_j^d$

and the sample cross-correlation vector has components

    $\hat{r}_i = \frac{1}{D} \sum_{d=1}^{D} x_i^d t^d$

b. Explicitly evaluate the condition that at the cost function's minimum all components of the gradient of E(w) vanish,

    $\frac{\partial E(w)}{\partial w_j} = 0$

to show that the optimum weight vector (estimated on this data) $\hat{w}$ satisfies $\hat{R} \hat{w} = \hat{r}$. Conclude that if the autocorrelation matrix is invertible, the unique global minimum is at $\hat{w} = \hat{R}^{-1} \hat{r}$.

c. Use its definition to show that the autocorrelation matrix is positive semi-definite; that is, for any vector $v$ it is true that $v \cdot (\hat{R} v) \ge 0$.
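For Problem 1, the following minimal NumPy sketch mirrors the computation the problem asks for, so you can check your hand calculations against it. It assumes the squared-error loss $E = \frac{1}{2}\sum_k (t_k - o_k)^2$, the standard logistic sigmoid, and no bias weights (none appear in the figure); these are assumptions, not part of the problem statement.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inputs, targets, initial weights, and learning rate from Problem 1(a).
x1, x2 = 1.0, 0.0
t1, t2 = 0.9, 0.1
w1, w2, w3, w4 = 0.1, -0.1, 0.2, -0.5
eta = 0.3

# (a) Forward pass: sigmoid at the hidden and output nodes, no bias terms.
h  = sigmoid(w1 * x1 + w2 * x2)
o1 = sigmoid(w3 * h)
o2 = sigmoid(w4 * h)
print("(a) o1 = %.4f, o2 = %.4f" % (o1, o2))

# (b) Back-propagation deltas for E = 1/2 * sum_k (t_k - o_k)^2:
# output delta = o(1-o)(t-o); hidden delta = h(1-h) * sum_k w_k * delta_k.
d1 = o1 * (1 - o1) * (t1 - o1)
d2 = o2 * (1 - o2) * (t2 - o2)
dh = h * (1 - h) * (w3 * d1 + w4 * d2)

# Gradient-descent updates: w <- w + eta * delta * (input feeding that weight).
w3 += eta * d1 * h
w4 += eta * d2 * h
w1 += eta * dh * x1
w2 += eta * dh * x2
print("(b) w1 = %.4f, w2 = %.4f, w3 = %.4f, w4 = %.4f" % (w1, w2, w3, w4))

# (c) Forward pass again with the updated weights.
h  = sigmoid(w1 * x1 + w2 * x2)
o1 = sigmoid(w3 * h)
o2 = sigmoid(w4 * h)
print("(c) o1 = %.4f, o2 = %.4f" % (o1, o2))
```

If your derivation uses a different error convention (for example, without the 1/2 factor), the deltas and weight changes scale accordingly; the direction of every update is the same.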
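For Problem 2, a quick way to test a candidate answer is to evaluate the perceptron on all four Boolean inputs. The specific weights below are one illustrative choice among infinitely many valid ones, not necessarily the ones to hand in; sgn is taken as +1 for positive net input and -1 for negative.

```python
import numpy as np

# Candidate weights for f(x1, x2) = x1 OR x2 -- illustrative values only.
w1, w2, bias = 1.0, 1.0, -0.5

for x1 in (0, 1):
    for x2 in (0, 1):
        o = np.sign(w1 * x1 + w2 * x2 + bias)   # sgn activation
        target = 1 if (x1 == 1 or x2 == 1) else -1
        print(f"x = ({x1}, {x2}): o = {int(o):+d}, target = {target:+d}")
```

The separating line for part (b) is then w1*x1 + w2*x2 + bias = 0. Whatever weights you choose, make sure the net input is never exactly zero on the four Boolean points, so that sgn(0) never arises.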
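For Problem 3, the identities you are asked to prove can be sanity-checked numerically before (or after) doing the algebra. The sketch below draws random data and verifies parts a–c on it; the sizes n = 3 and D = 50 and the names R, r, w_hat are arbitrary choices for illustration, and a numerical check is of course no substitute for the requested derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 3, 50
X = rng.normal(size=(D, n))   # row d is the pattern x^d
t = rng.normal(size=D)        # targets t^d

# Sample autocorrelation matrix and cross-correlation vector.
R = X.T @ X / D               # R_ij = (1/D) sum_d x_i^d x_j^d
r = X.T @ t / D               # r_i  = (1/D) sum_d x_i^d t^d

def E_direct(w):
    # E(w) = (1/2D) sum_d (t^d - w . x^d)^2
    return np.mean((t - X @ w) ** 2) / 2.0

def E_quadratic(w):
    # The part-a form: (1/2D) sum_d (t^d)^2 - w . r + (1/2) w . (R w)
    return np.mean(t ** 2) / 2.0 - w @ r + 0.5 * w @ (R @ w)

# Part a: the two forms of E agree at an arbitrary weight vector.
w = rng.normal(size=n)
assert np.isclose(E_direct(w), E_quadratic(w))

# Part b: the solution of R w_hat = r has (numerically) vanishing gradient
# and is not beaten by nearby weight vectors.
w_hat = np.linalg.solve(R, r)
assert np.allclose(R @ w_hat, r)
assert E_direct(w_hat) <= E_direct(w_hat + 1e-3 * rng.normal(size=n))

# Part c: v . (R v) >= 0 for any v, since v . (R v) = ||X v||^2 / D.
v = rng.normal(size=n)
assert v @ (R @ v) >= 0.0

print("all checks passed")
```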