Chapter 6: Multiple Linear Regression
• The most popular model for making
predictions is the multiple linear
regression
• Is used to fit a relationship between a
numerical outcome variable Y (also called
the response, target, or dependent
variable) and a set of predictorsX1, X2, …,
Xp (also referred to as independent
variables, input variables, regressors, or
covariates)
• The assumption is that the following
function approximates the relationship
between the predictors and outcome
variable:
• where β0, …, βp are coefficients and ε is the noise or
unexplained part.
• Data are then used to estimate the coefficients and
to quantify the noise.
linear regression
One of the most used methods in the predictions process is a set of
columns x1, x2, x3 used in the forecasting process and I have columns
of outputs called Y in which its value is predicted and determined.
And her basic idea is to try to find the value of the equation.
And the value of x1, x2, x3 is not present in the table but the attempt is
made to calculate the value of the base or beta and through which it is
possible to place the value of x and thus know the value of y.
Explanatory vs. Predictive Modeling
• They are two popular but different objectives behind
fitting a regression model
• Explaining or quantifying the average effect of inputs on an
outcome (explanatory or descriptive task)
• => How is X related to Y?
• Predicting the outcome value for new records, given their input
values (predictive task)
• => If you know X can you predict Y?
• Both explanatory and predictive modeling involve using a
dataset to fit a model (i.e., to estimate coefficients)
• However, the modeling steps and performance assessment
differ in the two cases, usually leading to different final
models.
• Therefore, the choice of model is closely tied to whether
the goal is explanatory or predictive.
• Explanatory
• Is used in Classical statistic
• The entire dataset is used for building the model
• Data are treated as a random sample from a larger
population of interest.
• The model is used in decision-making to generate
statements such as “a unit increase in service speed
(X1) is associated with an average increase of 5 points
in customer satisfaction (Y), all other factors (X2, X3, …,
Xp) being equal.”
• In explanatory models the focus is on the coefficients
(β)
• Predictive
• The focus is on predicting new individual records.
• The dataset are typically split into a training set and a
validation set
• Predictive model is used for micro-decision-making at
the record level
• E.g. The regression model is used to predict customer
satisfaction for each new customer
• In predictive models the focus is on the output (Y)