CS 439 - Final Exam
Prof. A.D. Gunawardena
Administered: Friday, Dec 18, 2020, 8:00-11:00 AM
• This test is based on topics from linear algebra.
• If you can print: please write the answers ONLY in the space provided. You may lose points for unnecessarily long answers.
• If you do not have a printer, write the answer to each question on a separate page.
• Please scan the answer pages and upload them to Canvas as a SINGLE PDF file.
• You have 120 minutes to complete this exam, plus time before and after (150 minutes total).
I understand that my work may not be graded unless I sign below. I certify that the answers on this test represent my own work and that I have read the RU academic integrity policies: https://www.cs.rutgers.edu/academicintegrity/introduction
PRINT your name :
SIGN your name :
netID :
Exam Score

Question    Points          Score    Grader
1           10
2           10
3           10
4           10
5           10
6           10
7           10
Last        1-4 (bonus)
Total       70 + bonus
Question 1 - Linear Regression - 10 points
1. Which of the following are suitable for a linear regression model? Select all that apply.
(a) Predicting housing prices based on square footage and number of rooms
(b) Finding the hidden structure in an image
(c) Predicting BUY or NO-BUY based on user clicks
(d) Classifying tumors as benign or malignant
(e) Predicting your final exam score based on prior lab and quiz scores
(f) None of these
2. Given below is a plot of a data set of (x, y) points.
We would like to find a linear model that fits the above data well. The following three models were considered
(note the scaling of the axes). In each case, estimate some values of θ0, θ1, θ2 that will likely give a good fit to
the data.
(a) y = θ0 + θ1 ∗ x
(b) y = θ0 + θ1 ∗ x + θ2 ∗ x²
(c) y = θ0 + θ1 ∗ x + θ2 ∗ sin(x)
3. Consider the following loss functions defined for the linear regression model hθ(x). Which of the following
functions are considered good, and why? (Hint: a "good" loss function has a clear minimum.)
(a) L(θ, x, y) = abs(hθ(x) − y)
(b) L(θ, x, y) = (hθ(x) − y)²
(c) L(θ, x, y) = log2(hθ(x) − y)
(d) L(θ, x, y) = (hθ(x) − y)³
4. Suppose X is a matrix of n observations where each observation has d feature values, and y is the corresponding
vector of actual values. The optimal value of θ (least-squares regression) is obtained by minimizing the squared
error ‖Xθ − y‖². Which of the following is the correct analytical solution for θ?
(a) θ = X⁻¹y
(b) θ = (XᵀX)⁻¹y
(c) θ = y/X
(d) θ = X⁻¹Xᵀy
(e) θ = (XᵀX)⁻¹(Xᵀy)
(f) none of these
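For reference, a minimal NumPy sketch of the closed-form least-squares fit θ = (XᵀX)⁻¹Xᵀy; the matrix X and vector y below are invented example data, not part of the exam.

    # Closed-form least-squares fit: theta = (X^T X)^{-1} X^T y.
    # X and y are invented example data (first column of X is a bias/intercept column).
    import numpy as np

    X = np.array([[1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 5.0]])
    y = np.array([4.0, 6.0, 10.0])

    theta = np.linalg.solve(X.T @ X, X.T @ y)   # numerically preferable to an explicit inverse
    print(theta)                                # coefficients minimizing ||X theta - y||^2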
Question 2 - Classification - 10 points
1. Assume that we train a binary classifier on some data set. Suppose y is the set of observed/true labels and ŷ is
the set of predicted labels. The following table shows the results for 10 observations. Consider the value 1 to be
a WIN and the value 0 to be a LOSS. Define the following categories:
- True Positive (TP) : observations that correctly predicted a win
- False Positive (FP) : observations that incorrectly predicted a win
- True Negative (TN) : observations that correctly predicted a loss
- False Negative (FN) : observations that incorrectly predicted a loss
(a) How many false positives are in the table?
(b) How many true negatives are in the table?
(c) How many WINS were incorrectly predicted as LOSSES?
(d) Recall of a classifier is defined as the number of true positives (TP) over the number of true positives
(TP) plus the number of false negatives (FN) (or TP / (TP+FN)). Find the recall of the classifier as
observed in the above table. Show all work to receive full credit.
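For reference, a short sketch of how the TP/FP/TN/FN counts and the recall in part (d) could be tallied; the exam's table of 10 observations is not reproduced here, so the label vectors below are hypothetical placeholders.

    # Tally confusion-matrix counts and recall for binary labels (1 = WIN, 0 = LOSS).
    # The label vectors are hypothetical placeholders, not the exam's table.
    import numpy as np

    y     = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])   # observed/true labels
    y_hat = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])   # predicted labels

    TP = int(np.sum((y == 1) & (y_hat == 1)))
    FP = int(np.sum((y == 0) & (y_hat == 1)))
    TN = int(np.sum((y == 0) & (y_hat == 0)))
    FN = int(np.sum((y == 1) & (y_hat == 0)))

    recall = TP / (TP + FN)   # definition from part (d)
    print(TP, FP, TN, FN, recall)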
2. Consider the following figures of different shapes plotted in a two-dimensional feature space. Suppose we are
interested in classifying the type of shape based on its location.
(a) Which figure best illustrates a substantial class imbalance? Briefly explain your answer.
(b) Which figure is linearly separable? Briefly explain your answer.
(c) Which figure corresponds to a multi-class classification problem? Briefly explain your answer.
(d) Suppose we apply the feature transformation φ(x) = [x1 < 0, x2 > 0, 1], where x1 < 0 and x2 > 0 are
boolean expressions that take the value 1 when true and 0 when false. Which of the above plots becomes
linearly separable under this transformation, and why?
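For reference, a small sketch of the boolean feature map φ(x) = [x1 < 0, x2 > 0, 1] from part (d); the sample points are invented.

    # Boolean feature map phi(x) = [x1 < 0, x2 > 0, 1] from part (d).
    # The sample points are invented; after the map, each point is characterized
    # only by the sign pattern of (x1, x2), so points with different sign patterns
    # can be separated by a single hyperplane in the transformed space.
    import numpy as np

    def phi(x):
        x1, x2 = x
        return np.array([float(x1 < 0), float(x2 > 0), 1.0])

    for p in [(-2.3, 1.1), (0.7, -0.4), (-0.1, 3.2), (1.5, 2.0)]:
        print(p, "->", phi(p))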
Question 3 - Linear Algebra Fundamentals - 10 points
1. Given two 3D vectors v = [3, 0, 4] and u = [0, 0, 1], find the projection of v onto u.
2. What is the Euclidean distance between the two vectors v and u? You do not need to simplify.
3. Confirm that the two vectors v and u are not orthogonal to each other. Show your work. (A numeric check of
parts 1-3 is sketched after this question.)
4. Consider the following table of observations. Answer the follow-up questions using the table.
(a) How do we determine if the table has two duplicate rows? Explain briefly using vector terminology.
(b) Suppose that the table contains two columns whose dot product turns out to be zero. What is a possible
explanation for this? Assume that all scores are non-negative.
5. Consider the set of all linear combinations of the two 3D vectors v = [1, 0, 1] and u = [0, 1, 0]. Describe the set
using a geometric interpretation in R3 .
6. Suppose we have an observation-feature matrix of size n by d where at least half the columns are linearly
dependent on the others. Explain how the rank of the matrix is affected. Note that the rank of a matrix is the
maximum number of linearly independent rows or columns of the matrix.
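For reference, a quick NumPy check of the vector computations in parts 1-3 (projection, Euclidean distance, and the dot-product test for orthogonality) with v = [3, 0, 4] and u = [0, 0, 1]:

    # Numeric check for parts 1-3 with v = [3, 0, 4] and u = [0, 0, 1].
    import numpy as np

    v = np.array([3.0, 0.0, 4.0])
    u = np.array([0.0, 0.0, 1.0])

    proj_v_on_u = (np.dot(v, u) / np.dot(u, u)) * u   # projection of v onto u -> [0, 0, 4]
    distance    = np.linalg.norm(v - u)               # Euclidean distance -> sqrt(3^2 + 0^2 + 3^2)
    dot         = np.dot(v, u)                        # 4 != 0, so v and u are not orthogonal

    print(proj_v_on_u, distance, dot)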
Question 4 - Regularization - 10 points
1. Briefly explain the concept of regularization.
2. Consider the following formula for regularized loss minimization. Answer the follow-up questions based on the
formula.
(a) Which part of the formula is the regularization term?
(b) If we choose a very large λ, how does it affect the coefficients of the matrix Θ?
(c) If we choose a very small λ, how does it affect the coefficients of the matrix Θ?
(d) How do we determine a λ that is just right?
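The formula itself is not reproduced above; the sketch below assumes a standard L2 (ridge) penalty, loss = ‖Xθ − y‖² + λ‖θ‖², which matches the role λ plays in the sub-questions, and shows how the size of λ pushes the coefficients.

    # Ridge-style regularized least squares (an assumed L2 penalty, since the exam's
    # formula is not reproduced here): theta = (X^T X + lambda * I)^{-1} X^T y.
    # Larger lambda shrinks the coefficients toward zero; lambda -> 0 recovers the
    # ordinary least-squares fit.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=20)

    for lam in [0.0, 1.0, 100.0]:
        theta = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
        print(lam, theta)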
Question 5 - Unsupervised Learning - 10 points
1. Which of the following data sets is suitable for an unsupervised learning model? Select all that apply.
(a) A subset of useful data from a large dataset
(b) Determine the correct number of categories from some generic data
(c) Determine the internal structure of a set of images
(d) Find out the most tweeted phrases by political figures
(e) Given data belongs to 3 different classes (red, green, blue), separate them into their own clusters
(f) None of these
2. The k-means algorithm is an unsupervised learning model used in clustering applications. Given below is the
algorithm.
Answer the following questions based on the k-means algorithm.
(a) How do we initially select the cluster centers in the k-means algorithm?
(b) What is the best way to determine the number of clusters (k) needed?
(c) If we increase the number of clusters, does the overall/sum error (where error is how far a point is from its
assigned center) increase or decrease? Why?
(d) How does an outlier affect the cluster centers? How do you propose to minimize the impact of outliers?
(e) If we are clustering m points into k clusters using 100 iterations, write down the big-O complexity of the
k-means algorithm using m and k. Do not simplify the terms. Justify your answer.
(f) What is the most expensive computational step of the algorithm?
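For reference, a compact sketch of a plain (Lloyd-style) k-means loop, since the algorithm listing is not reproduced above; the data and k are invented. Each iteration computes an m-by-k table of point-to-center distances in the assignment step, which is what drives the cost considered in parts (e) and (f).

    # Plain k-means sketch (Lloyd's algorithm); data and k are invented.
    # The assignment step computes m x k point-to-center distances each iteration.
    import numpy as np

    def kmeans(points, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(iters):
            # assignment step: nearest center for every point
            d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = np.argmin(d, axis=1)
            # update step: move each center to the mean of its assigned points
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = points[labels == j].mean(axis=0)
        return centers, labels

    pts = np.random.default_rng(1).normal(size=(200, 2))
    centers, labels = kmeans(pts, k=3)
    print(centers)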
3. The k-means++ algorithm is an improved version of the k-means clustering algorithm. Given below is the
algorithm.
Answer the following questions based on the k-means++ algorithm.
(a) What is the purpose of the k-means++ algorithm?
(b) Is it guaranteed that k-means++ always performs better than the k-means algorithm? Justify your answer.
(c) If we are clustering m points into k clusters using t iterations, write down the big-O complexity of the
k-means++ algorithm using m, k and t. Do not simplify any terms.
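For reference, a sketch of the k-means++ seeding idea (the exam's listing is not reproduced above): after the first uniformly random center, each new center is drawn with probability proportional to the squared distance from the nearest center already chosen, which spreads the initial centers out. The data are invented.

    # k-means++ seeding sketch: each new center is sampled with probability
    # proportional to the squared distance from the nearest center chosen so far.
    import numpy as np

    def kmeans_pp_init(points, k, seed=0):
        rng = np.random.default_rng(seed)
        centers = [points[rng.integers(len(points))]]      # first center: uniform at random
        for _ in range(k - 1):
            d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
            probs = d2 / d2.sum()                           # D^2 weighting
            centers.append(points[rng.choice(len(points), p=probs)])
        return np.array(centers)

    pts = np.random.default_rng(2).normal(size=(200, 2))
    print(kmeans_pp_init(pts, k=3))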
Question 6 - Deep Learning - 10 points
1. Consider the following deep learning architecture.
(a) Assuming no bias in the network, how many θ parameters need to be learned? Justify your answer.
(b) Discuss any strategies for determining the number of neurons in the hidden layer.
2. Consider the following network. The network output at each layer is a binary value (using a sigmoid function for
non-linearity). Recall that the sigmoid function g(z) is 1 if z >= 0 and 0 if z < 0.
(a) Determine some values of θ0, θ1, θ2 so that the function hθ(x) returns 1 if and only if both x1 and x2 are
1, and returns 0 otherwise. The boolean variables x1 and x2 can only take binary (0/1) values. Note that
we compute the boolean AND function in this network. You need to show work to receive credit.
(b) Determine some values of θ0, θ1, θ2 so that the function hθ(x) returns 0 if and only if both x1 and x2 are
0, and returns 1 otherwise. The boolean variables x1 and x2 can only take binary (0/1) values. Note that
we compute the boolean OR function in this network. You need to show work to receive credit.
(c) Design a network (show neurons and weights) that negates a boolean value. Note that you only need a bias
and one input. The network returns 1 if x1 = 0 and 0 if x1 = 1. Draw the network below.
3. Consider the following extended network.
(a) Use the θ values determined in the networks above to build a neural network for computing the XNOR
function. The XNOR function is defined as (A AND B) OR ((NOT A) AND (NOT B)).
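For reference, one consistent choice of threshold-unit weights for parts 2 and 3 (not the only valid answer), checked with the step-style sigmoid defined above; XNOR is then the composition (x1 AND x2) OR ((NOT x1) AND (NOT x2)).

    # Threshold units with the exam's step-style sigmoid: g(z) = 1 if z >= 0 else 0.
    # The weight choices below are one valid answer, not the only one.
    def g(z):
        return 1 if z >= 0 else 0

    def AND(x1, x2): return g(-1.5 + 1.0 * x1 + 1.0 * x2)   # theta = (-1.5, 1, 1)
    def OR(x1, x2):  return g(-0.5 + 1.0 * x1 + 1.0 * x2)   # theta = (-0.5, 1, 1)
    def NOT(x1):     return g(0.5 - 1.0 * x1)               # bias 0.5, weight -1

    def XNOR(x1, x2):                                       # (x1 AND x2) OR (NOT x1 AND NOT x2)
        return OR(AND(x1, x2), AND(NOT(x1), NOT(x2)))

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, AND(x1, x2), OR(x1, x2), XNOR(x1, x2))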
Question 7 - Recommender Systems - 10 points
1. State two applications of recommender systems.
2. State two issues that you need to deal with in designing recommender systems.
3. Consider the following user-item rating matrix.
Find a prediction (using user-user absolute distance) for at least 2 of the missing entries. The formula for
computing a missing entry is given by
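The rating matrix and formula are not reproduced above; the sketch below assumes one common reading of a user-user prediction, where a missing rating is a weighted average of the other users' ratings for that item, weighted by 1 / (1 + mean absolute distance over co-rated items). The matrix, the helper predict, and its arguments are all invented for illustration.

    # Hypothetical user-user prediction sketch; the exam's matrix and exact formula
    # are not reproduced here. A missing rating R[u, i] is estimated as a weighted
    # average of other users' ratings for item i, weighted by
    # 1 / (1 + mean absolute distance over co-rated items). 0 marks a missing rating.
    import numpy as np

    R = np.array([[5, 3, 0, 1],
                  [4, 0, 0, 1],
                  [1, 1, 5, 4],
                  [0, 1, 5, 4]], dtype=float)

    def predict(R, u, i):
        num, den = 0.0, 0.0
        for v in range(R.shape[0]):
            if v == u or R[v, i] == 0:
                continue
            common = (R[u] > 0) & (R[v] > 0)        # items rated by both users
            if not np.any(common):
                continue
            w = 1.0 / (1.0 + np.mean(np.abs(R[u, common] - R[v, common])))
            num += w * R[v, i]
            den += w
        return num / den if den > 0 else 0.0

    print(predict(R, u=0, i=2))   # fill two of the invented missing entries
    print(predict(R, u=3, i=0))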
Last Question - WE ARE DONE - 1-4 points - BONUS
Now that you are done with the exam, write a poem, draw a picture, or write a brief note to indicate how you
feel right now. Be as creative as you can to receive full credit.