10/14/2025
Simple Linear
Regression
REIGNER JAY B. ESCARTIN, RMT
College of Medical and Biological Sciences
University of the Immaculate Conception
Linear Regression
Linear Regression is the next step up after correlation. It is used
when we want to predict the value of a variable based on the
value of another variable. The variable we want to predict is
called the dependent variable (or sometimes, the outcome
variable). The variable we are using to predict the other variable's
value is called the independent variable (or sometimes, the
predictor variable).
Linear Regression
EXAMPLE:
You could use linear regression to understand whether exam
performance can be predicted based on revision time; whether
cigarette consumption can be predicted based on smoking
duration; and so forth. If you have two or more independent
variables, rather than just one, you need to use multiple
regression.
Linear Regression & Correlation
❏ Linear regression is closely related to correlation
❏ Both involve relationships between two variables
❏ Both use paired scores taken from the same or matched subjects
❏ Correlation is concerned with the magnitude and direction of
the relationship
❏ Regression focuses on using the relationship for prediction
❏ If the relationship is perfect, the prediction is also perfect.
Slope-intercept form of a line
y = mx + b
where,
x = independent variable
m = slope of the line (rise/run)
b = y-intercept (where the line crosses the y-axis)
The Regression Statistical Model
Population Regression Model:
y = β0 + β1x + ϵ
“Data = fit + residual”
where,
β0 = y-intercept population parameter
β1 = slope population parameter
ϵ = error term, the “noise” that prevents x and y from forming a
perfectly straight line, or the unexplained variation in y (also
called the residual)
Best Fitting Line Equation
❏ The intercept β0, slope β1, and standard deviation σ of y are the
unknown parameters of the regression model and must be estimated
from the sample data.
❏ The least squares regression line obtained from the sample data is the
best estimate of the true population regression line, E(y) = β0 + β1x:
ŷ = b0 + b1x, where b0 and b1 are the sample estimates of β0 and β1
❏ The value of ŷ from the least squares regression line is really a
prediction of the mean value of y for a given value of x.
General Linear Regression Lines
❏ If E(y) = β0 + 0(x)
  ❏ Slope β1 is 0
❏ If E(y) = β0 + β1(x)
  ❏ Slope β1 is positive
❏ If E(y) = β0 - β1(x)
  ❏ Slope β1 is negative
Linear Regression
The line allows us to predict what score a person would have based
on their hemoglobin level.
Linear Regression
y = b + mx
y = -16 + 8x
b = -16 (intercept)
m = 8 (slope/unstandardized regression coefficient)
y = -16 + 8(11.5 g/dL)
y = 76
Linear Regression
For a hemoglobin level of 14.5 g/dL, the same line gives:
y = -16 + 8(14.5 g/dL)
y = 100
This is in the context of a perfect relationship, but in reality we never have perfect relationships.
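The two predictions above can be checked with a short script. This is only a minimal sketch: the function name `predict_score` is hypothetical, and the intercept and slope are the values quoted on the slide.

```python
def predict_score(hemoglobin_g_dl, intercept=-16.0, slope=8.0):
    """Evaluate y = b + mx with the slide's values b = -16 and m = 8."""
    return intercept + slope * hemoglobin_g_dl


# Hemoglobin levels (g/dL) used in the worked examples
for hb in (11.5, 14.5):
    print(f"hemoglobin = {hb} g/dL -> predicted score = {predict_score(hb)}")
```

Because the relationship here is treated as perfect, every observed point would fall exactly on this line.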
Prediction and Imperfect
Relationships
❏ In imperfect relationships, the task is to determine a single line
that best describes the data.
❏ A line that minimizes errors of prediction
❏ The least squares regression line is the prediction line that
minimizes the sum of the squared errors of prediction
Constructing the least squares
regression line: Regression of y on x
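As a rough illustration of the construction, the standard least squares formulas for the regression of y on x are slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope · x̄. The sketch below applies them in plain Python. The function name `least_squares_line` and the five data points are made up for illustration and do not come from the lecture.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and sum of squared x deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical paired scores (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = least_squares_line(x, y)
print(f"Y' = {intercept:.2f} + {slope:.2f}X")
```

For these made-up points the fitted line works out to roughly Y' = 2.20 + 0.60X.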
Prediction and Imperfect
Relationships
Error (residual) = Observed Y - Predicted Y’
= Y - Y’
[Figure: regression line showing the predicted Y’, the observed Y,
and the error/residual between them]
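The error for each point can be computed directly as Y - Y’. The following is a minimal sketch using made-up data and the least squares line that fits it (intercept 2.2, slope 0.6). All names and numbers here are illustrative assumptions, not values from the lecture.

```python
# Hypothetical paired scores (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_observed = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least squares line for the data above: Y' = 2.2 + 0.6X
intercept, slope = 2.2, 0.6

y_predicted = [intercept + slope * xi for xi in x]
errors = [yo - yp for yo, yp in zip(y_observed, y_predicted)]

for xi, yo, yp, e in zip(x, y_observed, y_predicted, errors):
    print(f"X = {xi}: observed Y = {yo}, predicted Y' = {yp:.1f}, error = {e:+.1f}")
```

Points above the line have positive errors and points below it have negative errors. For a least squares line the errors sum to zero, which is why they are squared before being added up.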
Sum of Squares: Basis of evaluating
the regression model
❏ Most regression analyses will produce the best model available,
but how good is it actually, and how much error is in the model?
❏ This can be determined by looking at the ‘goodness of fit’
using the sums of squares. This is a measure of how close the
actual data points are to the modelled regression line.
Sum of Squares: Basis of evaluating
the regression model
The vertical differences between the data points and the predicted
regression line are known as the residuals. These values are squared
to remove the negative signs and then summed to give the residual
sum of squares (SSR). This is effectively the error of the model, or
its ‘goodness of fit’: the smaller the value, the less error in the
model.
Sum of Squares: Basis of evaluating
the regression model
The vertical differences between the data points and the mean of the
outcome variable can be calculated. These values are squared to
remove the negative signs and then summed to give the total sum of
squares (SST). This shows how good the mean value is as a model of
the outcome scores.
Sum of Squares: Basis of evaluating
the regression model
The vertical differences between the mean of the outcome variable
and the predicted regression line are now determined. Again, these
values are squared to remove the negative signs and then summed to
give the model sum of squares (SSM). This indicates how much better
the model is compared to just using the mean of the outcome
variable. SST, the total sum of squares, is the sum of the two:
SST = SSM + SSR.
Sum of Squares: Basis of evaluating
the regression model
So, the larger the SSM the better the model is at predicting the
outcome compared to the mean value alone. If this is
accompanied by a small SSR the model also has a small error.
R² (calculated as SSM / SST) is similar to the coefficient of
determination in correlation in that it shows how much of the
variation in the outcome variable can be predicted by the predictor
variable(s).
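The three sums of squares and R² can be sketched in a few lines. The data and the fitted line (intercept 2.2, slope 0.6) are made up for illustration. SSR denotes the residual sum of squares, as in this lecture.

```python
# Hypothetical paired scores and their least squares line (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
intercept, slope = 2.2, 0.6

mean_y = sum(y) / len(y)
y_pred = [intercept + slope * xi for xi in x]

# SST: data points vs. the mean of the outcome variable
sst = sum((yi - mean_y) ** 2 for yi in y)
# SSM: predicted values vs. the mean of the outcome variable
ssm = sum((yp - mean_y) ** 2 for yp in y_pred)
# SSR: data points vs. the predicted values (residuals)
ssr = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))

r_squared = ssm / sst

print(f"SST = {sst:.2f}, SSM = {ssm:.2f}, SSR = {ssr:.2f}")
print(f"R^2 = {r_squared:.2f}")
```

With these numbers SST is 6.0, SSM is about 3.6 and SSR is about 2.4, so the model accounts for roughly 60% of the variation in the outcome.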
Sum of Squares: Basis of evaluating
the regression model
In regression, the model is assessed by the F statistic, which is
based on the improvement in prediction due to the model (SSM)
relative to the residual error (SSR). The larger the F value, the
better the model.
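A minimal sketch of that ratio, assuming the usual definition F = (SSM / df_model) / (SSR / df_residual), with df_model equal to the number of predictors k and df_residual = n - k - 1. The function name `f_statistic` and the input numbers are illustrative assumptions.

```python
def f_statistic(ssm, ssr, n, k=1):
    """F = mean square of the model / mean square of the residuals.

    ssm: model sum of squares, ssr: residual sum of squares,
    n: number of paired observations, k: number of predictors.
    """
    df_model = k
    df_residual = n - k - 1
    ms_model = ssm / df_model
    ms_residual = ssr / df_residual
    return ms_model / ms_residual


# Hypothetical values: SSM = 3.6, SSR = 2.4 from n = 5 paired scores
print(f"F(1, 3) = {f_statistic(3.6, 2.4, n=5):.2f}")
```

A large SSM combined with a small SSR gives a large F, which matches the interpretation above.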
Reporting the Results
Here it can be seen that the correlation (R) between the two variables
is high (0.784). The R² value of 0.614 tells us that right leg strength
accounts for 61.4% of the variance in kick distance. The Durbin-Watson
statistic checks for correlations between residuals, which can
invalidate the test. It should be above 1 and below 3, and ideally
around 2.
R² provides information on how much variance is explained by the
model using the predictors provided.
Reporting the Results
The ANOVA table shows all the sums of squares mentioned earlier,
with Regression being the model and Residual being the error. The
F-statistic is significant (p = .002). This tells us that the model
is a significantly better predictor of kicking distance than the
mean distance. Report as F(1, 11) = 17.53, p = .002.
The F-statistic provides information as to how good the model is.
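As a consistency check, F can also be written in terms of R²: F = (R² / df_model) / ((1 - R²) / df_residual). The sketch below plugs in the rounded values reported here (R² = 0.614, df = 1 and 11), so the result only approximates the reported 17.53. The function name `f_from_r_squared` is hypothetical.

```python
def f_from_r_squared(r_squared, df_model, df_residual):
    """Express the F statistic in terms of R^2 and the degrees of freedom."""
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)


# Values as reported: R^2 = 0.614, F(1, 11)
f_value = f_from_r_squared(0.614, df_model=1, df_residual=11)
print(f"F(1, 11) is approximately {f_value:.1f}")

# With one predictor, df_residual = n - 2, so 11 residual df imply n = 13
n = 11 + 2
print(f"number of paired observations: {n}")
```

The small gap between about 17.5 and 17.53 is what one would expect from using R² rounded to three decimals.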
Reporting the Results
This table gives the unstandardized coefficients that can be put in
the linear equation:
y = mx + b
or
y = b + mx
where
y = estimated score of the dependent (outcome) variable
m = regression coefficient (R_Strength) (slope)
b = constant (intercept)
x = a score on the independent (predictor) variable
The unstandardized (b) value provides a constant which reflects the
strength of the relationship between the predictor(s) and the
outcome variable.
For example, for a right leg strength of 60:
Distance = 57.105 + (6.452 * 60) = 444.2 m
Reporting the Results
A simple linear regression was used to predict rugby kicking
distance from right leg strength. Leg strength was shown to
explain a significant amount of the variance in the kicking
distance: F(1, 11) = 17.53, p = .002, R² = 0.614. The regression
coefficient (b = 6.452) allows the kicking distance to be
predicted using the following regression equation:
Distance = 57.105 + (6.452 * Right leg strength)
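The regression equation can be wrapped in a small helper for making predictions. This is a sketch only: the function name `predict_distance` is hypothetical, and the coefficients are the ones reported above.

```python
def predict_distance(right_leg_strength, intercept=57.105, slope=6.452):
    """Distance = 57.105 + (6.452 * right leg strength)."""
    return intercept + slope * right_leg_strength


# Predicted kicking distance for a right leg strength of 60
print(f"{predict_distance(60):.1f} m")
```

Evaluating the equation directly for a strength of 60 gives about 444.2 m. Predictions are most trustworthy for strength values within the range of the original sample.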