Dishant Gupta HW1

A1.
To derive the modified normal equations for the given cost function, let's start by rewriting the cost function in
matrix/vector notation.
The cost function E(θ) can be rewritten as:
E(θ) = 1/2 * ∑[w(i) * (y(i) - hθ(x(i)))^2]
X: The design matrix, in which each row x(i) is a training example and each column a feature. Its
dimensions are (m x n), where m is the number of training examples and n is the number of features.
y: The vector of target values for the training examples. Its dimensions are (m x 1).
θ: The parameter vector containing the linear regression model's coefficients. Its dimensions are (n x 1).
W: The diagonal weight matrix, whose diagonal elements are w(i). Its dimensions are (m x m).
Using the above notation, we can rewrite the cost function in matrix/vector notation as:
E(θ) = 1/2 * (y - Xθ)^T * W * (y - Xθ)
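As a quick numerical sanity check, the sum form and the matrix form of the cost should agree. The sketch below uses hypothetical random data (the dimensions m = 4, n = 2 and the weights are illustrative, not from the problem):

```python
import numpy as np

# Hypothetical small example: m = 4 samples, n = 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))          # design matrix (m x n)
y = rng.normal(size=4)               # target vector (m,)
w = np.array([1.0, 0.5, 2.0, 1.0])   # per-example weights w(i)
theta = np.array([0.3, -0.7])        # parameter vector (n,)
W = np.diag(w)                       # diagonal weight matrix (m x m)

# Sum form: 1/2 * sum_i w(i) * (y(i) - x(i)·θ)^2
E_sum = 0.5 * np.sum(w * (y - X @ theta) ** 2)

# Matrix form: 1/2 * (y - Xθ)^T W (y - Xθ)
r = y - X @ theta
E_mat = 0.5 * r @ W @ r

assert np.isclose(E_sum, E_mat)
```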
To obtain the modified normal equations, we must find the value of θ that minimises the cost function
E(θ). This can be accomplished by taking the derivative of E(θ) with respect to θ, setting it to zero, and
solving for θ.
Expanding the cost function:
E(θ) = 1/2 * (y - Xθ)^T * W * (y - Xθ)
= 1/2 * (y^T - θ^T * X^T) * W * (y - Xθ)
= 1/2 * (y^T * W * y - y^T * W * Xθ - θ^T * X^T * W * y + θ^T * X^T * W * Xθ)
The derivative of E(θ) with respect to θ (note that since W is symmetric, the scalar terms
y^T * W * Xθ and θ^T * X^T * W * y are equal):
dE(θ)/dθ = 1/2 * (-2 * X^T * W * y + 2 * X^T * W * Xθ)
= X^T * W * Xθ - X^T * W * y
To find the minimum, we set the derivative equal to zero:
X^T * W * Xθ - X^T * W * y = 0
Rearranging gives the modified normal equations:
X^T * W * Xθ = X^T * W * y
Provided X^T * W * X is invertible, solving for θ yields:
θ = (X^T * W * X)^(-1) * X^T * W * y
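The closed-form solution above can be sketched numerically. This is an illustrative example with made-up random data; in practice one solves the linear system rather than forming the inverse explicitly:

```python
import numpy as np

# Hypothetical data: m = 6 samples, n = 2 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = rng.normal(size=6)
w = rng.uniform(0.5, 2.0, size=6)   # positive per-example weights w(i)
W = np.diag(w)

# Modified normal equations: (X^T W X) θ = X^T W y
theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Sanity check: the gradient X^T W (Xθ - y) vanishes at the minimum.
grad = X.T @ W @ (X @ theta - y)
assert np.allclose(grad, 0.0)
```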
A2.
Let's maximize the log-likelihood function by taking the derivative with respect to μ and setting it equal to
zero.
d/dμ [ln p(D|μ)] = ∑[x(i)/μ - (1 - x(i))/(1 - μ)] = 0
To simplify this expression, let's multiply through by μ(1 - μ) to eliminate the denominators:
∑[x(i)(1 - μ) - μ(1 - x(i))] = 0
Expanding and rearranging terms, we get:
∑[x(i) - μx(i) - μ + μx(i)] = 0
Simplifying further:
∑[x(i) - μ] = 0
Now, let's rearrange the sum:
∑x(i) - mμ = 0
Rearranging terms again, we have:
∑x(i) = mμ
Dividing both sides by m, we get:
(1/m) * ∑x(i) = μ
The left side of the equation is the fraction of "heads" in the dataset, denoted h (the total number of
"heads" divided by the number of coin flips, m). Therefore, the ML solution for μ is:
μ_ML = h
In conclusion, the ML solution for μ is given by μ_ML = h, where h is the fraction of "heads" observed in
the dataset.
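The result μ_ML = (1/m) ∑ x(i) is just the sample mean. A minimal sketch with a hypothetical sequence of coin flips (the particular flips are made up for illustration):

```python
import numpy as np

# Hypothetical dataset of m = 8 coin flips (1 = heads, 0 = tails).
x = np.array([1, 0, 1, 1, 0, 1, 0, 1])
m = len(x)

# ML estimate: μ_ML = (1/m) * Σ x(i), i.e. the fraction of heads.
mu_ml = x.sum() / m
print(mu_ml)  # 5 heads out of 8 flips -> 0.625

# Equivalently, the sample mean.
assert np.isclose(mu_ml, x.mean())
```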
A3.
To derive the logistic regression cost function using maximum likelihood, we start with the assumption that
the probability of y given x is described by:
P(y = 1|x; θ) = hθ(x)
P(y = 0|x; θ) = 1 - hθ(x)
where hθ(x) represents the hypothesis function.
Given a dataset D = {(x(1), y(1)), (x(2), y(2)), ..., (x(m), y(m))}, where x(i) represents the input features and
y(i) represents the corresponding label (0 or 1), we want to find the parameters θ that maximize the
likelihood of the observed data.
The likelihood function for logistic regression is given by the product of the probabilities for each example in
the dataset.
L(θ) = ∏[P(y(i) = 1|x(i); θ)^y(i) * P(y(i) = 0|x(i); θ)^(1 - y(i))]
Taking the logarithm of the likelihood function gives the log-likelihood:
ln L(θ) = ∑[y(i) * ln P(y(i) = 1|x(i); θ) + (1 - y(i)) * ln P(y(i) = 0|x(i); θ)]
Substituting the expressions for the probabilities P(y = 1|x; θ) and P(y = 0|x; θ), we get:
ln L(θ) = ∑[y(i) * ln hθ(x(i)) + (1 - y(i)) * ln (1 - hθ(x(i)))]
Instead of maximizing the log-likelihood, we can equivalently minimize the negative log-likelihood (NLL),
obtained by negating it:
NLL(θ) = - ln L(θ) = - ∑[y(i) * ln hθ(x(i)) + (1 - y(i)) * ln (1 - hθ(x(i)))]
This negative log-likelihood function serves as the cost function for logistic regression.
In summary, the logistic regression cost function derived using maximum likelihood is:
NLL(θ) = - ∑[y(i) * ln hθ(x(i)) + (1 - y(i)) * ln (1 - hθ(x(i)))]
where hθ(x) is the hypothesis function.
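The NLL cost can be sketched directly from the formula. Here hθ(x) is taken to be the usual sigmoid of a linear score, and the dataset is a hypothetical toy example; with θ = 0 every prediction is 0.5, so the cost reduces to m · ln 2:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, used here as the hypothesis hθ(x) = σ(θ^T x)."""
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y):
    """Negative log-likelihood: -Σ [y ln h + (1 - y) ln(1 - h)]."""
    h = sigmoid(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical tiny dataset (first column is a bias term).
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]])
y = np.array([1, 0, 1, 0])
theta = np.zeros(2)

# With θ = 0, hθ(x) = 0.5 for every example, so NLL = m * ln 2.
assert np.isclose(nll(theta, X, y), len(y) * np.log(2))
```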