Loss function (aka cost function or error function)
• Quantifies the difference between the values predicted by the model and the actual values in the training data
• Key to understanding how machines learn
• The goal of training a machine learning model is to minimize this loss
• Examples: Mean squared error (MSE), Absolute error

Linear Regression
• Supervised machine learning technique
• Used to:
  • Estimate the value of the dependent variable for given independent variable(s)
  • Explain the change in the dependent variable for a unit change in an independent variable
  • Identify which variable most strongly influences the dependent variable

The linear model: $y = mx + c$
• x = Independent variable
• y = Dependent variable
• m = Slope term or Regression coefficient
• c = Intercept term

Loss function for linear regression
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \big(y_i - (m \cdot x_i + c)\big)^2$$
Where
• $x_i$ = observed value of the independent variable for the i-th observation
• $y_i$ = observed value of the dependent variable for the i-th observation
• $\hat{y}_i$ = predicted value of the dependent variable for the i-th observation

Gradient Descent Algorithm
[Figure: scatter plot of Height (Y) against Weight (X) with a candidate fit line and the residual $y_1 - \hat{y}_1$ marked; alongside, a plot of SSE against the y-intercept showing a bowl-shaped curve]
Assume: the slope of the best-fit line is known, so only the intercept $c$ must be found, starting from $c = 0$
• Slope of the tangent at the current intercept:
$$\frac{d(SSE)}{dc} = \sum_{i=1}^{n} -2\,(y_i - m \cdot x_i - c)$$
• Step size = Derivative value × Learning rate
• New value = Old value − Step size
• Repeat until the step size becomes negligible (a minimal code sketch of this loop appears after the results slide below)

Cases on Linear regression
• Marketwise solutions
• Student performance
• Food delivery times

What to look for in results for Linear regression?
• Regression Equation:
  • Represents the relationship between the dependent and independent variables
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n = \beta_0 + \sum_{j=1}^{n} \beta_j x_j$$
• Coefficients:
  • Intercept ($\beta_0$): The expected value of the dependent variable when all independent variables are 0.
  • Slope coefficients ($\beta_1, \beta_2, \dots, \beta_n$): Represent the change in the dependent variable for a one-unit increase in the corresponding independent variable, keeping the others constant.
• Significance:
  • P-Value: Tests whether a coefficient is significantly different from zero.
  • $p < \alpha$: Significant; the variable contributes meaningfully.
  • $p \ge \alpha$: Not significant; the variable may not be impactful.
  • Common values for $\alpha$: 1%, 5%
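To make the update rule above concrete, here is a minimal Python sketch of the intercept-only gradient descent described on the earlier slides, assuming the slope $m$ is known. The toy data, learning rate, and stopping threshold are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Toy data (assumed for illustration): Weight (X) and Height (Y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

m = 1.0              # slope of the best-fit line, assumed known (as on the slide)
c = 0.0              # start the intercept at 0
learning_rate = 0.01

for step in range(1000):
    # Slope of the tangent: d(SSE)/dc = sum of -2 * (y_i - m*x_i - c)
    derivative = np.sum(-2 * (y - m * x - c))
    step_size = derivative * learning_rate   # Step size = derivative value x learning rate
    c = c - step_size                        # New value = old value - step size
    if abs(step_size) < 1e-6:                # stop once the steps become negligible
        break

print(f"Intercept found by gradient descent: c = {c:.4f}")
```

For this loss the optimal intercept also has a closed form, the mean of $y_i - m x_i$, which gives a quick check that the loop converged to the right place.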
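One way to obtain the equation, coefficients, and p-values listed above is an OLS summary from the statsmodels library; this is a hedged example, not a tool the slides prescribe. The toy data are fabricated, and the second variable is generated with no true effect, so its p-value should come out above the usual $\alpha$ levels.

```python
import numpy as np
import statsmodels.api as sm

# Toy data (assumed): 100 observations, two independent variables
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 1.5 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)  # x2 has no true effect

X = sm.add_constant(X)        # adds the intercept column (beta_0)
results = sm.OLS(y, X).fit()

print(results.params)         # beta_0 and the slope coefficients
print(results.pvalues)        # compare each against alpha (e.g. 0.05)
print(results.summary())      # full regression table
```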
Evaluation & comparison of Regression techniques
• Mean Absolute Error (MAE): $MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
• Mean Squared Error (MSE): $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• Root Mean Squared Error (RMSE): $RMSE = \sqrt{MSE}$
• R-Squared ($R^2$): $R^2 = 1 - \frac{SS_{residual}}{SS_{total}}$
• Adjusted R-Squared: $R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$
• Mean Absolute Percentage Error (MAPE): $MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100$
• (These formulas are implemented in the NumPy sketch below)

| Metric | Purpose | Strengths | Weaknesses |
|---|---|---|---|
| MAE | Provides a straightforward measure of prediction accuracy. | Easy to interpret; less sensitive to outliers than MSE. | Ignores the direction of errors and doesn't penalize large errors heavily. |
| MSE | Evaluates the overall accuracy of the model by emphasizing large errors. | Penalizes large errors more heavily, highlighting significant deviations. | Sensitive to outliers; harder to interpret due to squaring. |
| RMSE | Provides an interpretable measure of error magnitude. | Accounts for large errors more effectively than MAE; easier to interpret than MSE. | Still sensitive to outliers; computationally intensive for large datasets. |
| R-Squared | Measures how well the independent variables explain the target variable's variance. | Provides a single measure of model fit; widely recognized and used. | Can increase with additional predictors, even if they don't improve the model. |
| Adjusted R-Squared | Evaluates model fit while penalizing for adding irrelevant predictors. | More realistic measure of model performance for multiple regression. | Can be computationally intensive; may still not identify overfitting completely. |
| MAPE | Useful for scale-independent performance measurement, especially in business contexts. | Easy to understand; works well for comparing performance across datasets with different scales. | Can produce misleading results when actual values are close to zero. |

Ways to get a balanced fit model
• Cross-validation (e.g. K-fold cross-validation)
• Regularization (e.g. L1 and L2 regularization)
• Dimensionality reduction (e.g. Principal component analysis)
• Ensemble techniques (e.g. Max voting, Averaging, Bagging)
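The metric formulas above are simple enough to implement directly; below is the minimal NumPy sketch referenced earlier (the function name, toy values, and the choice to return a dict are illustrative assumptions).

```python
import numpy as np

def regression_metrics(y_true, y_pred, k):
    """Evaluation metrics from the slides; k = number of independent variables
    (needed only for adjusted R-squared)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))                  # Mean Absolute Error
    mse = np.mean(residuals ** 2)                     # Mean Squared Error
    rmse = np.sqrt(mse)                               # Root Mean Squared Error
    ss_res = np.sum(residuals ** 2)                   # SS_residual
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # SS_total
    r2 = 1 - ss_res / ss_tot                          # R-squared
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # Adjusted R-squared
    mape = np.mean(np.abs(residuals / y_true)) * 100  # MAPE; fails if any y_true is 0

    return {"MAE": mae, "MSE": mse, "RMSE": rmse,
            "R2": r2, "Adj_R2": adj_r2, "MAPE": mape}

print(regression_metrics([3.0, 5.0, 7.5, 9.0], [2.8, 5.3, 7.0, 9.4], k=1))
```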
Regularization
• Technique used to prevent overfitting and improve the generalization performance of models
• Overfitting occurs when a model fits the training data too closely, capturing noise, so it performs poorly on unseen data
• With a large training set, we can be confident that the linear regression model will be accurate
• What if we have only a small set of training data?

[Figure: Height (Y) plotted against Weight (X); a regression line passing through every training point]
• Low bias but high variance: overfitting
• We would be better off introducing some bias to reduce variance

Regularization
• Regularization methods add a penalty to the model's objective function to encourage smaller and simpler coefficients
• That is, they make the predicted variable less sensitive to the input variables by flattening the slope
• Two common types of regularization used in linear models: L1 (Lasso) regularization and L2 (Ridge) regularization

Ridge (L2) regularization
• Adds the squared values of the coefficients to the loss function
• Prevents overfitting by shrinking the coefficients towards zero without forcing them to be exactly zero
• L2 loss function: $MSE + \lambda \times \sum (\text{coefficients})^2$
• Here $\lambda$ is the multiplier that determines the severity of the penalty
• It can take any value from 0 to $+\infty$

Lasso (L1) regularization
• Lasso: Least Absolute Shrinkage and Selection Operator
• Adds the absolute values of the coefficients to the loss function
• L1 loss function: $MSE + \lambda \times \sum |\text{coefficients}|$

Values of $\lambda$
• At $\lambda = 0$, the Lasso / Ridge regression line and the least-squares line are the same
• As we increase $\lambda$, the slope becomes flatter, closer to 0
• With a flatter slope, the predicted variable becomes less sensitive to the independent variables
• To find the optimal $\lambda$, try a range of values and use cross-validation (typically 10-fold CV) to determine which one results in the lowest variance (see the scikit-learn sketch at the end of this section)

Difference between Ridge and Lasso regularization
• L1 regularization (Lasso) encourages feature selection and sparsity, while L2 regularization (Ridge) prevents overfitting by shrinking coefficients
• When the regularization strength ($\lambda$) is sufficiently high, Lasso forces some coefficients to become exactly zero, effectively removing the associated features from the model
• Ridge's penalty, on the other hand, is smooth and continuous, leading to coefficients that are very close to zero but typically not exactly zero, thus preserving all features in the model
• The choice between them depends on the problem and the trade-off between complexity and simplicity in the model
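As a sketch of the $\lambda$ search by 10-fold cross-validation described above, the scikit-learn snippet below fits Ridge and Lasso over a grid of penalty strengths (scikit-learn calls $\lambda$ "alpha"). The synthetic data and the grid are assumptions for illustration; the point to observe is that Lasso drives the coefficients of the irrelevant features to exactly zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

# Synthetic data (assumed): 5 features, but only the first two matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Candidate penalty strengths (lambda on the slides, `alpha` in scikit-learn)
alphas = np.logspace(-3, 2, 50)

ridge = RidgeCV(alphas=alphas, cv=10).fit(X, y)   # 10-fold CV, as the slide suggests
lasso = LassoCV(alphas=alphas, cv=10).fit(X, y)

print("Ridge coefficients:", ridge.coef_.round(3))   # all nonzero, just shrunk
print("Lasso coefficients:", lasso.coef_.round(3))   # irrelevant ones exactly 0
print("Chosen lambdas:", ridge.alpha_, lasso.alpha_)
```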