Logistic Regression: Cost Function

To train the parameters w and b, we need to define a cost function.

Recap: \hat{y}^{(i)} = \sigma(w^{T} x^{(i)} + b), where \sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}} and x^{(i)} is the i-th training example.

Given \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}, we want \hat{y}^{(i)} \approx y^{(i)}.

Loss (error) function:
The loss function measures the discrepancy between the prediction \hat{y}^{(i)} and the desired output y^{(i)}. In other words, the loss function computes the error for a single training example.

The squared error, L(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{2} (\hat{y}^{(i)} - y^{(i)})^{2}, is a natural candidate, but for logistic regression it makes the optimization problem non-convex, so we use the cross-entropy loss instead:

L(\hat{y}^{(i)}, y^{(i)}) = -\left( y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right)

• If y^{(i)} = 1: L(\hat{y}^{(i)}, y^{(i)}) = -\log(\hat{y}^{(i)}). Minimizing the loss means making \log(\hat{y}^{(i)}) large, so \hat{y}^{(i)} should be close to 1.
• If y^{(i)} = 0: L(\hat{y}^{(i)}, y^{(i)}) = -\log(1 - \hat{y}^{(i)}). Minimizing the loss means making \log(1 - \hat{y}^{(i)}) large, so \hat{y}^{(i)} should be close to 0.

Cost function
The cost function is the average of the loss function over the entire training set. We are going to find the parameters w and b that minimize the overall cost function:

J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]
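The definitions above can be sketched directly in code. This is a minimal illustration (not part of the original notes), assuming NumPy, with X stored as an (n_x, m) matrix whose columns are training examples and Y as a (1, m) row of labels in {0, 1}; the function names `sigmoid` and `cost` are placeholders chosen here for clarity:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    # X: (n_x, m) matrix of m training examples as columns
    # Y: (1, m) labels in {0, 1}; w: (n_x, 1); b: scalar
    m = X.shape[1]
    Y_hat = sigmoid(w.T @ X + b)  # predictions yhat^(i) for all i at once
    # J(w, b) = -(1/m) * sum_i [ y^(i) log(yhat^(i)) + (1 - y^(i)) log(1 - yhat^(i)) ]
    losses = Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)
    return float(-np.sum(losses) / m)
```

For example, with w and b at zero every prediction is sigmoid(0) = 0.5, so the cost is -log(0.5) = log(2) ≈ 0.693 regardless of the labels; as the predictions \hat{y}^{(i)} approach the true labels y^{(i)}, the cost approaches 0.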