Dishant Gupta HW1

A1.
To derive the modified normal equations for the given cost function, let's start by rewriting the cost function in
matrix/vector notation.
The cost function E(θ) can be rewritten as:
E(θ) = 1/2 * ∑[w(i) * (y(i) - hθ(x(i)))^2]
X: The design matrix, in which each row x(i) is a training example and each column a feature. Its
dimensions are (m x n), where m is the number of training examples and n is the number of features.
y: The vector of target values for the training examples. Its dimensions are (m x 1).
θ: The parameter vector containing the linear regression model's coefficients. Its dimensions are (n x 1).
W: The diagonal weight matrix, whose diagonal elements are w(i). Its dimensions are (m x m).
Using the above notation, we can rewrite the cost function in matrix/vector notation as:
E(θ) = 1/2 * (y - Xθ)^T * W * (y - Xθ)
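As a quick numerical sanity check, the sum form and the matrix form of the cost should agree. The sketch below uses hypothetical random data (the dimensions m = 4, n = 2 and the weights are illustrative, not from the problem):

```python
import numpy as np

# Hypothetical small example: m = 4 samples, n = 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))          # design matrix (m x n)
y = rng.normal(size=4)               # target vector (m,)
w = np.array([1.0, 0.5, 2.0, 1.0])   # per-example weights w(i)
theta = np.array([0.3, -0.7])        # parameter vector (n,)
W = np.diag(w)                       # diagonal weight matrix (m x m)

# Sum form: 1/2 * sum_i w(i) * (y(i) - x(i)·θ)^2
E_sum = 0.5 * np.sum(w * (y - X @ theta) ** 2)

# Matrix form: 1/2 * (y - Xθ)^T W (y - Xθ)
r = y - X @ theta
E_mat = 0.5 * r @ W @ r

assert np.isclose(E_sum, E_mat)
```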
To obtain the modified normal equations, we must find the value of θ that minimises the cost function
E(θ). This can be accomplished by taking the derivative of E(θ) with respect to θ, setting it to zero, and
solving for θ.
Expanding the cost function:
E(θ) = 1/2 * (y - Xθ)^T * W * (y - Xθ)
= 1/2 * (y^T - θ^T * X^T) * W * (y - Xθ)
= 1/2 * (y^T * W * y - y^T * W * Xθ - θ^T * X^T * W * y + θ^T * X^T * W * Xθ)
The derivative of E(θ) with respect to θ (note that since W is symmetric, the scalar terms
y^T * W * Xθ and θ^T * X^T * W * y are equal):
dE(θ)/dθ = 1/2 * (-2 * X^T * W * y + 2 * X^T * W * Xθ)
= X^T * W * Xθ - X^T * W * y
To find the minimum, we set the derivative equal to zero:
X^T * W * Xθ - X^T * W * y = 0
Rearranging gives the modified normal equations:
X^T * W * Xθ = X^T * W * y
Provided X^T * W * X is invertible, solving for θ yields:
θ = (X^T * W * X)^(-1) * X^T * W * y
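The closed-form solution above can be sketched numerically. This is an illustrative example with made-up random data; in practice one solves the linear system rather than forming the inverse explicitly:

```python
import numpy as np

# Hypothetical data: m = 6 samples, n = 2 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = rng.normal(size=6)
w = rng.uniform(0.5, 2.0, size=6)   # positive per-example weights w(i)
W = np.diag(w)

# Modified normal equations: (X^T W X) θ = X^T W y
theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Sanity check: the gradient X^T W (Xθ - y) vanishes at the minimum.
grad = X.T @ W @ (X @ theta - y)
assert np.allclose(grad, 0.0)
```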
A2.
Let's maximize the log-likelihood function by taking the derivative with respect to μ and setting it equal to
zero.
d/dμ [ln p(D|μ)] = ∑[x(i)/μ - (1 - x(i))/(1 - μ)] = 0
To simplify this expression, let's multiply through by μ(1 - μ) to eliminate the denominators:
∑[x(i)(1 - μ) - μ(1 - x(i))] = 0
Expanding and rearranging terms, we get:
∑[x(i) - μx(i) - μ + μx(i)] = 0
Simplifying further:
∑[x(i) - μ] = 0
Now, let's rearrange the sum:
∑x(i) - mμ = 0
Rearranging terms again, we have:
∑x(i) = mμ
Dividing both sides by m, we get:
(1/m) * ∑x(i) = μ
The left side of the equation is the fraction of "heads" in the dataset, denoted h (the total number of
"heads" divided by the number of coin flips, m). Therefore, the ML solution for μ is:
μ_ML = h
In conclusion, the ML solution for μ is given by μ_ML = h, where h is the fraction of "heads" observed in
the dataset.
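The result μ_ML = (1/m) ∑ x(i) is just the sample mean. A minimal sketch with a hypothetical sequence of coin flips (the particular flips are made up for illustration):

```python
import numpy as np

# Hypothetical dataset of m = 8 coin flips (1 = heads, 0 = tails).
x = np.array([1, 0, 1, 1, 0, 1, 0, 1])
m = len(x)

# ML estimate: μ_ML = (1/m) * Σ x(i), i.e. the fraction of heads.
mu_ml = x.sum() / m
print(mu_ml)  # 5 heads out of 8 flips -> 0.625

# Equivalently, the sample mean.
assert np.isclose(mu_ml, x.mean())
```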
A3.
To derive the logistic regression cost function using maximum likelihood, we start with the assumption that
the probability of y given x is described by:
P(y = 1|x; θ) = hθ(x)
P(y = 0|x; θ) = 1 - hθ(x)
where hθ(x) represents the hypothesis function.
Given a dataset D = {(x(1), y(1)), (x(2), y(2)), ..., (x(m), y(m))}, where x(i) represents the input features and
y(i) represents the corresponding label (0 or 1), we want to find the parameters θ that maximize the
likelihood of the observed data.
The likelihood function for logistic regression is given by the product of the probabilities for each example in
the dataset.
L(θ) = ∏[P(y(i) = 1|x(i); θ)^y(i) * P(y(i) = 0|x(i); θ)^(1 - y(i))]
Taking the logarithm of the likelihood function gives the log-likelihood:
ln L(θ) = ∑[y(i) * ln P(y(i) = 1|x(i); θ) + (1 - y(i)) * ln P(y(i) = 0|x(i); θ)]
Substituting the expressions for the probabilities P(y = 1|x; θ) and P(y = 0|x; θ), we get:
ln L(θ) = ∑[y(i) * ln hθ(x(i)) + (1 - y(i)) * ln (1 - hθ(x(i)))]
Instead of maximizing the log-likelihood, we can equivalently minimize the negative log-likelihood (NLL),
obtained by negating it:
NLL(θ) = - ln L(θ) = - ∑[y(i) * ln hθ(x(i)) + (1 - y(i)) * ln (1 - hθ(x(i)))]
This negative log-likelihood function serves as the cost function for logistic regression.
In summary, the logistic regression cost function derived using maximum likelihood is:
NLL(θ) = - ∑[y(i) * ln hθ(x(i)) + (1 - y(i)) * ln (1 - hθ(x(i)))]
where hθ(x) is the hypothesis function.
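The NLL cost can be sketched directly from the formula. Here hθ(x) is taken to be the usual sigmoid of a linear score, and the dataset is a hypothetical toy example; with θ = 0 every prediction is 0.5, so the cost reduces to m · ln 2:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, used here as the hypothesis hθ(x) = σ(θ^T x)."""
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y):
    """Negative log-likelihood: -Σ [y ln h + (1 - y) ln(1 - h)]."""
    h = sigmoid(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical tiny dataset (first column is a bias term).
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]])
y = np.array([1, 0, 1, 0])
theta = np.zeros(2)

# With θ = 0, hθ(x) = 0.5 for every example, so NLL = m * ln 2.
assert np.isclose(nll(theta, X, y), len(y) * np.log(2))
```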