Midterm Practice

CMPUT 328: Midterm Sample Questions
You can expect multiple-choice, true/false, short-answer, and calculation questions. The
theme of the midterm is exclusively backpropagation, gradient descent, the chain rule of
derivatives, etc.
Here are some sample questions.
Q1. Compute the forward pass of the following part of a convolutional net:
[Diagram: input I -> Conv (filter W) -> J -> ReLU -> K -> Max pool -> O]
If I is the following image patch
1 3 2 4
6 4 4 8
3 1 0 2
2 1 4 3
9 1 4 7
2 3 9 2
and W is the following 3-by-3 filter matrix
-1 0 1
-1 0 1
-1 0 1
Compute J, K and O. Do not use any zero padding for the convolution. Assume a stride of 1
for the convolution and 2-by-2 max pooling with a stride of 2. Write J, K and O below.
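
The following PyTorch sketch (not part of the original question) can be used to check a hand-computed answer; it assumes the usual deep-learning convention that the "convolution" is a cross-correlation (no kernel flip) and reads the patch as 6 rows of 4 values, as in the table above.

import torch
import torch.nn.functional as F

# Image patch I (6 rows x 4 columns) and filter W from Q1,
# reshaped to conv2d's (batch, channels, height, width) layout.
I = torch.tensor([[1., 3., 2., 4.],
                  [6., 4., 4., 8.],
                  [3., 1., 0., 2.],
                  [2., 1., 4., 3.],
                  [9., 1., 4., 7.],
                  [2., 3., 9., 2.]]).reshape(1, 1, 6, 4)
W = torch.tensor([[-1., 0., 1.],
                  [-1., 0., 1.],
                  [-1., 0., 1.]]).reshape(1, 1, 3, 3)

J = F.conv2d(I, W, stride=1, padding=0)        # 4x2 output, no zero padding
K = F.relu(J)                                  # elementwise ReLU
O = F.max_pool2d(K, kernel_size=2, stride=2)   # 2x1 output
print(J.squeeze(), K.squeeze(), O.squeeze(), sep="\n")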
Q2. Now assume that the ideal output is IO = [7, 10]. Also assume the loss is 0.5 times the
squared Euclidean distance between IO and O. Compute the back-propagated gradients with
respect to O, K, J, W, and I. Look at the notebook posted in Week 5 resources.
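
A minimal autograd sketch for checking the hand-derived gradients, under the same assumptions as the Q1 sketch; retain_grad() is used so that the gradients of the intermediate tensors J, K, and O can also be inspected.

import torch
import torch.nn.functional as F

# Same setup as the Q1 sketch, but with gradients enabled on I and W.
I = torch.tensor([[1., 3., 2., 4.],
                  [6., 4., 4., 8.],
                  [3., 1., 0., 2.],
                  [2., 1., 4., 3.],
                  [9., 1., 4., 7.],
                  [2., 3., 9., 2.]]).reshape(1, 1, 6, 4).requires_grad_()
W = torch.tensor([[-1., 0., 1.],
                  [-1., 0., 1.],
                  [-1., 0., 1.]]).reshape(1, 1, 3, 3).requires_grad_()

J = F.conv2d(I, W, stride=1, padding=0)
K = F.relu(J)
O = F.max_pool2d(K, kernel_size=2, stride=2)
for t in (J, K, O):
    t.retain_grad()                            # keep gradients of intermediate tensors

IO = torch.tensor([7., 10.]).view_as(O)        # ideal output from Q2
loss = 0.5 * torch.sum((IO - O) ** 2)          # 0.5 * squared Euclidean distance
loss.backward()
print(O.grad, K.grad, J.grad, W.grad, I.grad, sep="\n")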
Q3. Suppose a function computes predicted class probabilities as yp = softmax(X*w),
where X is the feature matrix and w is the parameter vector. Let the loss function be the
cross entropy between yp and the ground-truth vector y. Compute the gradient of the
cross-entropy loss with respect to the parameter vector w.
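
As a sanity check, the sketch below compares PyTorch autograd against the standard closed-form gradient X^T (yp - y); it assumes y is one-hot, the loss is summed over the batch, and w is a D-by-C matrix (one column of scores per class) rather than a single vector.

import torch

torch.manual_seed(0)
N, D, C = 5, 4, 3                              # hypothetical batch size, feature dim, classes
X = torch.randn(N, D)
w = torch.randn(D, C, requires_grad=True)      # one column of parameters per class
y = torch.nn.functional.one_hot(torch.randint(0, C, (N,)), C).float()

yp = torch.softmax(X @ w, dim=1)
loss = -(y * torch.log(yp)).sum()              # cross entropy, summed over the batch
loss.backward()

closed_form = X.T @ (yp.detach() - y)          # standard softmax + cross-entropy gradient
print(torch.allclose(w.grad, closed_form, atol=1e-5))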
Q4. Apply the chain rule to a fully connected neural network such as
yp = softmax(relu(X*w1)*w2), where w1 and w2 are parameter vectors/matrices.
Q5. Demonstrate the application of the chain rule of derivatives to the neural net with output
yp = σ(x*W1 + b1)*W2 + b2 with respect to an L2 loss using ground-truth vector y. Here σ is
the sigmoid function and x is the input vector. Using the chain rule, compute the gradients of
the L2 loss with respect to all four parameters: W1, b1, W2, b2.
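
A sketch of how the hand-derived chain-rule gradients for Q5 could be verified with autograd; the layer sizes and the 1/2 factor in the L2 loss are assumptions, not given in the question.

import torch

torch.manual_seed(0)
D, H, C = 4, 5, 3                              # hypothetical layer sizes
x = torch.randn(1, D)                          # input row vector
y = torch.randn(1, C)                          # ground-truth vector for the L2 loss
W1 = torch.randn(D, H, requires_grad=True)
b1 = torch.randn(H, requires_grad=True)
W2 = torch.randn(H, C, requires_grad=True)
b2 = torch.randn(C, requires_grad=True)

yp = torch.sigmoid(x @ W1 + b1) @ W2 + b2      # the network from Q5
loss = 0.5 * torch.sum((yp - y) ** 2)          # L2 loss with an assumed 1/2 factor
loss.backward()

# Hand-derived chain-rule gradients for comparison.
h = torch.sigmoid(x @ W1 + b1).detach()        # hidden activations
d2 = (yp - y).detach()                         # dL/dyp
d1 = (d2 @ W2.detach().T) * h * (1 - h)        # backprop through W2 and the sigmoid
print(torch.allclose(W2.grad, h.T @ d2, atol=1e-6),
      torch.allclose(b2.grad, d2.sum(0), atol=1e-6),
      torch.allclose(W1.grad, x.T @ d1, atol=1e-6),
      torch.allclose(b1.grad, d1.sum(0), atol=1e-6))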
Q6. Consider a neural net that outputs yp = softmax(sigmoid(X*w+b)*K+c). The loss is the cross
entropy between yp and the ground truth y. Compute the gradient of the loss with respect to
w, b, K, and c. Assume the addition signs imply broadcast additions over the batch dimension
(the first dimension). Here * denotes matrix-matrix or matrix-vector multiplication.
Q7. Consider a neural net that outputs Y = relu(X*W)*W. The loss is L2 between Y and X. Compute the
gradient of the loss with respect to W.
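
Because the same W appears in both layers, its gradient accumulates contributions from both uses. The sketch below (with assumed sizes, and the L2 loss taken as a plain sum of squares) checks a hand derivation against autograd.

import torch

torch.manual_seed(0)
N, D = 4, 3                                    # hypothetical sizes; W must be square (D x D)
X = torch.randn(N, D)
W = torch.randn(D, D, requires_grad=True)

Y = torch.relu(X @ W) @ W                      # the same W is used in both layers
loss = torch.sum((Y - X) ** 2)                 # L2 loss taken as the plain sum of squares
loss.backward()                                # W.grad sums the contributions of both uses

# Manual check: dL/dY = 2(Y - X); one term per use of W.
G = 2 * (Y - X).detach()
H = torch.relu(X @ W).detach()
mask = (X @ W > 0).float()                     # ReLU gate
manual = H.T @ G + X.T @ ((G @ W.detach().T) * mask)
print(torch.allclose(W.grad, manual, atol=1e-5))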
Q8. Starting from x = -1.0 and y = 1.0, apply gradient descent to minimize f(x, y) = x^2 + y^4.
First write the gradient of f, then write the gradient descent algorithm. Assume a suitable step
length value. Compute the values of (x, y) for four successive iterations.
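
A minimal plain-Python sketch of the requested iteration, assuming a step length of 0.1 (any suitable value works):

# Gradient descent on f(x, y) = x**2 + y**4, starting from (-1.0, 1.0).
def grad_f(x, y):
    return 2 * x, 4 * y ** 3                   # (df/dx, df/dy)

x, y = -1.0, 1.0
lr = 0.1                                       # assumed step length
for t in range(1, 5):
    gx, gy = grad_f(x, y)
    x, y = x - lr * gx, y - lr * gy
    print(f"iteration {t}: x = {x:.4f}, y = {y:.4f}")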
Q9. Let yp = softmax(x) with x = [-1, 3.3, 0.3]. Compute yp and the Jacobian of yp with respect to x at this point.
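
The Jacobian can be checked against the closed form diag(yp) - yp*yp^T, for example with torch.autograd.functional.jacobian:

import torch
from torch.autograd.functional import jacobian

x = torch.tensor([-1.0, 3.3, 0.3])
yp = torch.softmax(x, dim=0)

J_closed = torch.diag(yp) - torch.outer(yp, yp)              # diag(yp) - yp yp^T
J_auto = jacobian(lambda t: torch.softmax(t, dim=0), x)      # autograd Jacobian
print(yp)
print(torch.allclose(J_closed, J_auto, atol=1e-6))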
Q10. Let yp = softmax(x) and loss L = crossentropy(yp, y), where y is the ideal output. Write an expression for
the gradient of L with respect to x.
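
Whatever expression is derived can be checked numerically; assuming y is one-hot (or sums to 1), the gradient simplifies to yp - y:

import torch

x = torch.tensor([-1.0, 3.3, 0.3], requires_grad=True)
y = torch.tensor([0.0, 1.0, 0.0])              # hypothetical one-hot ideal output

yp = torch.softmax(x, dim=0)
L = -(y * torch.log(yp)).sum()                 # cross entropy
L.backward()
print(torch.allclose(x.grad, yp.detach() - y)) # the gradient is simply yp - y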
Q11. (True/False) Residual blocks help with gradient flow during backpropagation.
Q12. Explain the vanishing gradient problem. How do you mitigate the vanishing gradient problem?
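
A tiny illustration of the effect (a toy example, not a full answer): each sigmoid multiplies the incoming gradient by s*(1 - s) <= 0.25, so the gradient reaching the input of a deep sigmoid chain shrinks roughly exponentially with depth.

import torch

x = torch.tensor(2.0, requires_grad=True)
h = x
for _ in range(20):                            # a deep chain of sigmoids
    h = torch.sigmoid(h)
h.backward()
print(x.grad)                                  # product of 20 factors s*(1 - s) <= 0.25: tiny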
Q13. (True/False) Backpropagation needs to store all feature maps in a neural network.
Q14. (True/False) loss.backward() in PyTorch adjusts the parameters of the model.
Q15. (True/False) PyTorch optimizers perform backward passes.
Q16. (True/False) In PyTorch, a variable and the gradient of the loss with respect to that variable have the same shape.
Q17. (True/False) The function argmax is differentiable.
Q18. (True/False) Numerical derivatives are more accurate, but less efficient to compute, than PyTorch
autograd-based derivative computation.
Q19. (True/False) PyTorch autograd cannot work if your code has branching (e.g., if-else) statements.
Q20. (True/False) PyTorch autograd cannot work if your code has loop (e.g., for) statements.
Q21. (True/False) PyTorch can compute derivatives during the forward pass.