CMPUT 328: Midterm Sample Questions

You can expect multiple-choice, true/false, short-answer, and calculation-type questions. The midterm focuses exclusively on backpropagation, gradient descent, the chain rule of derivatives, and related topics. Here are some sample questions.

Q1. Compute the forward pass of the following part of a convolutional net:

[Network diagram: I -> Conv (filter W) -> J -> ReLU -> K -> Max pool -> O]

If I is the following image patch

1 3 2 4
6 4 4 8
3 1 0 2
2 1 4 3
9 1 4 7
2 3 9 2

and W is the following 3-by-3 filter matrix

-1 0 1
-1 0 1
-1 0 1

compute J, K, and O. Do not use any zero padding for the convolution. Assume a stride of 1 for the convolution and 2-by-2 max pooling with stride 2. Write J, K, and O below.

Q2. Now assume that the ideal output is IO = [7, 10]. Also assume the loss is 0.5 times the square of the Euclidean distance between IO and O. Compute the backpropagation gradients of the loss with respect to O, K, J, W, and I. Look at the notebook posted in Week 5 resources.

Q3. Suppose a function computes predicted class probabilities as yp = softmax(X*w), where X is the feature matrix and w is the parameter vector. Let the loss function be the cross entropy between yp and the ground truth vector y. Compute the gradient of the cross-entropy loss with respect to the parameter vector w.

Q4. Apply the chain rule to a fully connected neural network such as yp = softmax(relu(X*w1)*w2), where w1 and w2 are parameter vectors/matrices.

Q5. Demonstrate the application of the chain rule to the neural net with output yp = σ(x*W1 + b1)*W2 + b2 with respect to an L2 loss using the ground truth vector y. Here σ is the sigmoid function and x is the input vector. Using the chain rule, compute the gradients of the L2 loss with respect to all four parameters: W1, b1, W2, b2.

Q6. Consider a neural net that outputs yp = softmax(sigmoid(X*w + b)*K + c). The loss is the cross entropy between yp and the ground truth y. Compute the gradient of the loss with respect to w, b, K, and c. Assume the addition signs imply broadcast additions over the batch dimension (the first dimension). Here * denotes matrix-matrix or matrix-vector multiplication.

Q7. Consider a neural net that outputs Y = relu(X*W)*W. The loss is the L2 loss between Y and X. Compute the gradient of the loss with respect to W.

Q8. Starting with x = -1.0 and y = 1.0, apply gradient descent to minimize f(x, y) = x^2 + y^4. First write the gradient of f, then write the gradient descent algorithm. Assume a suitable step length value. Compute the values of (x, y) for four successive iterations.
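For Q8, here is a minimal sketch of the gradient-descent loop. The step length of 0.1 is an assumption (the question only asks for a suitable value), so the printed iterates are just one possible answer under that assumption.

# Gradient descent on f(x, y) = x**2 + y**4, starting from (-1.0, 1.0).
def grad_f(x, y):
    # Gradient of f: df/dx = 2x, df/dy = 4y**3
    return 2.0 * x, 4.0 * y ** 3

x, y = -1.0, 1.0
lr = 0.1  # assumed step length
for it in range(4):
    gx, gy = grad_f(x, y)
    x, y = x - lr * gx, y - lr * gy
    print(f"iteration {it + 1}: x = {x:.4f}, y = {y:.4f}")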
Q9. Compute yp = softmax(x) for x = [-1, 3.3, 0.3]. Then compute the Jacobian of yp at x = [-1, 3.3, 0.3].

Q10. Let yp = softmax(x) and let the loss be L = crossentropy(yp, y), where y is the ideal output. Write an expression for the gradient of L with respect to x. (A PyTorch check for Q9 and Q10 is sketched at the end of this document.)

Q11. True/False: Residual blocks help with gradient flow during backpropagation.

Q12. Explain the vanishing gradient problem. How do you mitigate the vanishing gradient problem?

Q13. True/False: Backpropagation needs to store all feature maps in a neural network.

Q14. True/False: loss.backward() in PyTorch adjusts the parameters of the model.

Q15. True/False: PyTorch optimizers perform backward passes.

Q16. True/False: In PyTorch, a variable and the gradient of the loss with respect to that variable have the same shape.

Q17. True/False: The function argmax is differentiable.

Q18. True/False: Numerical derivatives are more accurate, but less efficient to compute, than PyTorch autograd-based derivative computation.

Q19. True/False: PyTorch autograd cannot work if your code has branching (e.g., if-else) statements.

Q20. True/False: PyTorch autograd cannot work if your code has loop (e.g., for) statements.

Q21. True/False: PyTorch can compute derivatives during the forward pass.
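For Q9 and Q10, the sketch below uses PyTorch autograd to check the softmax Jacobian at x = [-1, 3.3, 0.3] and the standard result that the gradient of cross-entropy-of-softmax with respect to x is yp - y. The one-hot ground truth y = [0, 1, 0] is an assumption, since Q10 leaves y unspecified.

import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 3.3, 0.3], requires_grad=True)

# Q9: Jacobian of yp = softmax(x); analytically J[i, j] = yp[i] * (delta_ij - yp[j]).
jac = torch.autograd.functional.jacobian(lambda t: F.softmax(t, dim=0), x)
print(jac)

# Q10: gradient of L = crossentropy(softmax(x), y) with respect to x is yp - y.
# The one-hot target below is an assumption; the question leaves y unspecified.
y = torch.tensor([0.0, 1.0, 0.0])
yp = F.softmax(x, dim=0)
loss = -(y * torch.log(yp)).sum()
loss.backward()
print(x.grad)            # matches yp - y
print(yp.detach() - y)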
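The true/false questions above (Q13-Q21) probe what PyTorch autograd does and does not do. The minimal sketch below, using an arbitrary small linear model chosen only for illustration, shows the relevant behaviors: loss.backward() fills the .grad fields without changing parameters, optimizer.step() applies the update using those already-computed gradients, each gradient has the same shape as its tensor, and autograd traces through Python if-else and for statements because it records the operations that actually run.

import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)          # arbitrary small model, for illustration only
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
y = torch.randn(8, 2)

# Python control flow (if-else, for) in the forward pass is fine: autograd
# records whatever operations actually execute.
h = model(x)
if h.mean() > 0:
    h = torch.relu(h)
else:
    h = torch.sigmoid(h)
for _ in range(2):
    h = 0.5 * h

loss = ((h - y) ** 2).mean()

w_before = model.weight.detach().clone()
loss.backward()                                        # fills .grad only
print(torch.equal(w_before, model.weight))             # True: parameters unchanged
print(model.weight.grad.shape == model.weight.shape)   # True: grad has the same shape

opt.step()                                             # the parameter update happens here
print(torch.equal(w_before, model.weight))             # False: parameters have changed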