Chapter 6: Multiple Linear Regression • The most popular model for making predictions is the multiple linear regression • Is used to fit a relationship between a numerical outcome variable Y (also called the response, target, or dependent variable) and a set of predictorsX1, X2, …, Xp (also referred to as independent variables, input variables, regressors, or covariates) • The assumption is that the following function approximates the relationship between the predictors and outcome variable: • where β0, …, βp are coefficients and ε is the noise or unexplained part. • Data are then used to estimate the coefficients and to quantify the noise. linear regression One of the most used methods in the predictions process is a set of columns x1, x2, x3 used in the forecasting process and I have columns of outputs called Y in which its value is predicted and determined. And her basic idea is to try to find the value of the equation. And the value of x1, x2, x3 is not present in the table but the attempt is made to calculate the value of the base or beta and through which it is possible to place the value of x and thus know the value of y. Explanatory vs. Predictive Modeling • They are two popular but different objectives behind fitting a regression model • Explaining or quantifying the average effect of inputs on an outcome (explanatory or descriptive task) • => How is X related to Y? • Predicting the outcome value for new records, given their input values (predictive task) • => If you know X can you predict Y? • Both explanatory and predictive modeling involve using a dataset to fit a model (i.e., to estimate coefficients) • However, the modeling steps and performance assessment differ in the two cases, usually leading to different final models. • Therefore, the choice of model is closely tied to whether the goal is explanatory or predictive. • Explanatory • Is used in Classical statistic • The entire dataset is used for building the model • Data are treated as a random sample from a larger population of interest. • The model is used in decision-making to generate statements such as “a unit increase in service speed (X1) is associated with an average increase of 5 points in customer satisfaction (Y), all other factors (X2, X3, …, Xp) being equal.” • In explanatory models the focus is on the coefficients (β) • Predictive • The focus is on predicting new individual records. • The dataset are typically split into a training set and a validation set • Predictive model is used for micro-decision-making at the record level • E.g. The regression model is used to predict customer satisfaction for each new customer • In predictive models the focus is on the output (Y)