7 Classical Assumptions of Ordinary Least
Squares (OLS) Linear Regression
By Jim Frost — 159 Comments
Ordinary Least Squares (OLS) is the most common estimation method for linear models—and
that’s true for a good reason. As long as your model satisfies the OLS assumptions for linear
regression, you can rest easy knowing that you’re getting the best possible estimates.
Regression is a powerful analysis that can analyze multiple variables simultaneously to answer
complex research questions. However, if you don’t satisfy the OLS assumptions, you might not be
able to trust the results.
In this post, I cover the OLS linear regression assumptions, why they’re essential, and help you
determine whether your model satisfies the assumptions.
What Does OLS Estimate and What are Good Estimates?
First, a bit of context.
Regression analysis is like other inferential methodologies. Our goal is to draw a random sample
from a population and use it to estimate the properties of that population.
In regression analysis, the coefficients in the regression equation are estimates of the actual
population parameters. We want these coefficient estimates to be the best possible estimates!
Suppose you request an estimate—say for the cost of a service that you are considering. How
would you define a reasonable estimate?
1. The estimates should tend to be right on target. They should not be systematically too high
or too low. In other words, they should be unbiased or correct on average.
2. Recognizing that estimates are almost never exactly correct, you want to minimize the
discrepancy between the estimated value and actual value. Large differences are bad!
These two properties are exactly what we need for our coefficient estimates!
When your linear regression model satisfies the OLS assumptions, the procedure generates
unbiased coefficient estimates that tend to be relatively close to the true population values
(minimum variance). In fact, the Gauss-Markov theorem states that when the assumptions hold
true, OLS produces estimates that are better than those from all other linear unbiased
estimation methods.
For more information about the implications of this theorem on OLS estimates, read my post:
The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates.
The Seven Classical OLS Assumptions
Like many statistical analyses, ordinary least squares (OLS) regression has underlying
assumptions. When these classical assumptions for linear regression are true, ordinary least
squares produces the best estimates. However, if some of these assumptions are not true, you
might need to employ remedial measures or use other estimation methods to improve the
results.
Many of these assumptions describe properties of the error term. Unfortunately, the error term
is a population value that we’ll never know. Instead, we’ll use the next best thing that is available
—the residuals. Residuals are the sample estimate of the error for each observation.
Residual = Observed value – Fitted value
When it comes to checking OLS assumptions, assessing the residuals is crucial!
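As a quick numerical illustration, here is a minimal Python sketch (simulated data; the
statsmodels package is my assumption, not something this post prescribes) that fits an OLS
model and extracts the residuals exactly as defined above:

import numpy as np
import statsmodels.api as sm

# Simulated example: a "true" relationship plus random error
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 3 + 2 * x + rng.normal(size=100)

X = sm.add_constant(x)            # adds the constant term (see assumption 2)
model = sm.OLS(y, X).fit()

residuals = model.resid           # observed value minus fitted value
print(np.allclose(residuals, y - model.fittedvalues))  # prints True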
There are seven classical OLS assumptions for linear regression. The first six are mandatory to
produce the best estimates. While the quality of the estimates does not depend on the seventh
assumption, analysts often evaluate it for other important reasons that I’ll cover.
OLS Assumption 1: The regression model is linear in the
coefficients and the error term
This assumption addresses the functional form of the model. In statistics, a regression model is
linear when all terms in the model are either the constant or a parameter multiplied by an
independent variable. You build the model equation only by adding the terms together. These
rules constrain the model to one type:
Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₖXₖ + ε

In the equation, the betas (βs) are the parameters that OLS estimates. Epsilon (ε) is the random
error.
In fact, the defining characteristic of linear regression is this functional form of the parameters
rather than the ability to model curvature. Linear models can model curvature by including
nonlinear variables, such as polynomials, and by applying transformations such as exponential
functions.
To satisfy this assumption, the correctly specified model must fit the linear pattern.
Related posts: The Difference Between Linear and Nonlinear Regression and How to Specify a
Regression Model
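To see "linear in the coefficients" in action, the sketch below (simulated data; statsmodels
assumed) fits a curved relationship with a model that is still linear in its parameters,
because the squared term simply enters as another column:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 5 + 1.5 * x - 0.2 * x**2 + rng.normal(size=200)   # curved relationship

# y = b0 + b1*x + b2*x^2 + error is linear in the betas, so OLS applies
X = sm.add_constant(np.column_stack([x, x**2]))
fit = sm.OLS(y, X).fit()
print(fit.params)   # estimates of b0, b1, and b2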
OLS Assumption 2: The error term has a population mean of
zero
The error term accounts for the variation in the dependent variable that the independent
variables do not explain. Random chance should determine the values of the error term. For your
model to be unbiased, the average value of the error term must equal zero.
Suppose the average error is +7. This non-zero average error indicates that our model
systematically underpredicts the observed values. Statisticians refer to systematic error like this
as bias, and it signifies that our model is inadequate because it is not correct on average.
Stated another way, we want the expected value of the error to equal zero. If the expected value
is +7 rather than zero, part of the error term is predictable, and we should add that information
to the regression model itself. We want only random error left for the error term.
You don’t need to worry about this assumption when you include the constant in your regression
model because it forces the mean of the residuals to equal zero. For more information about this
assumption, read my post about the regression constant.
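A quick way to see that property numerically (again, a simulated sketch rather than anything
specific to this post) is to compare the residual means of models fit with and without the
constant:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1 + 2 * x + rng.normal(size=100)

with_const = sm.OLS(y, sm.add_constant(x)).fit()
without_const = sm.OLS(y, x).fit()       # constant deliberately omitted

print(with_const.resid.mean())           # essentially zero, by construction
print(without_const.resid.mean())        # generally not zero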
OLS Assumption 3: All independent variables are uncorrelated
with the error term
If an independent variable is correlated with the error term, we can use the independent variable
to predict the error term, which violates the notion that the error term represents unpredictable
random error. We need to find a way to incorporate that information into the regression model
itself.
This assumption is also referred to as exogeneity. When this type of correlation exists, there is
endogeneity. Violations of this assumption can occur because there is simultaneity between the
independent and dependent variables, omitted variable bias, or measurement error in the
independent variables.
Violating this assumption biases the coefficient estimate. To understand why this bias occurs,
keep in mind that the error term always explains some of the variability in the dependent
variable. However, when an independent variable correlates with the error term, OLS incorrectly
attributes some of the variance that the error term actually explains to the independent variable
instead. For more information about violating this assumption, read my post about confounding
variables and omitted variable bias.
Related post: What are Independent and Dependent Variables?
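The bias is easy to reproduce in a simulation. In the hypothetical sketch below, z belongs in
the model and is correlated with x; omitting z pushes it into the error term, which makes x
endogenous and inflates its coefficient:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
z = rng.normal(size=5000)                 # a confounder that belongs in the model
x = 0.8 * z + rng.normal(size=5000)       # x is correlated with z
y = 2 + 1.0 * x + 1.0 * z + rng.normal(size=5000)

full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
omitted = sm.OLS(y, sm.add_constant(x)).fit()   # z now hides in the error term

print(full.params[1])      # close to the true value of 1.0
print(omitted.params[1])   # around 1.5: x gets credit for variance z explains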
OLS Assumption 4: Observations of the error term are
uncorrelated with each other
One observation of the error term should not predict the next observation. For instance, if the
error for one observation is positive and that systematically increases the probability that the
following error is positive, that is a positive correlation. If the subsequent error is more likely to
have the opposite sign, that is a negative correlation. This problem is known both as serial
correlation and autocorrelation. Serial correlation is most likely to occur in time series models.
For example, if sales are unexpectedly high on one day, then they are likely to be higher than
average on the next day. This type of correlation isn’t an unreasonable expectation for some
subject areas, such as inflation rates, GDP, unemployment, and so on.
Assess this assumption by graphing the residuals in the order that the data were collected. You
want to see randomness in the plot. In the graph for a sales model, there is a cyclical pattern
with a positive correlation.
As I’ve explained, if you have information that allows you to predict the error term for an
observation, you must incorporate that information into the model itself. To resolve this issue,
you might need to add an independent variable to the model that captures this information.
Analysts commonly use distributed lag models, which use both current and past values of the
independent variables, and sometimes lagged values of the dependent variable.
For the sales model above, we need to add variables that explain the cyclical pattern.
Serial correlation reduces the precision of OLS estimates. Analysts can also use time series
analysis for time-dependent effects.
An alternative method for identifying autocorrelation in the residuals is to assess the
autocorrelation function, which is a standard tool in time series analysis.
Related post: Introduction to Time Series Analysis
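As a sketch of both checks (simulated data with AR(1) errors; statsmodels assumed), the
Durbin-Watson statistic and the autocorrelation function of the residuals both flag the
problem:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
t = np.arange(200)
e = np.zeros(200)
for i in range(1, 200):           # AR(1) errors: each error helps predict the next
    e[i] = 0.7 * e[i - 1] + rng.normal()
y = 10 + 0.5 * t + e

fit = sm.OLS(y, sm.add_constant(t)).fit()
print(durbin_watson(fit.resid))   # values well below 2 indicate positive autocorrelation

sm.graphics.tsa.plot_acf(fit.resid)   # spikes outside the bands confirm it
plt.show()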
OLS Assumption 5: The error term has a constant variance (no
heteroscedasticity)
The variance of the errors should be consistent for all observations. In other words, the variance
does not change for each observation or for a range of observations. This preferred condition is
known as homoscedasticity (same scatter). If the variance changes, we refer to that as
heteroscedasticity (different scatter).
The easiest way to check this assumption is to create a residuals versus fitted value plot. On this
type of graph, heteroscedasticity appears as a cone shape where the spread of the residuals
increases in one direction. In the graph below, the spread of the residuals increases as the fitted
value increases.
Heteroscedasticity reduces the precision of the estimates in OLS linear regression.
Related post: Heteroscedasticity in Regression Analysis
Note: When assumptions 4 (no autocorrelation) and 5 (homoscedasticity) are both true,
statisticians say that the error term is independent and identically distributed (IID) and refer to
the errors as spherical errors.
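To check this assumption more formally than eyeballing the cone shape, a Breusch-Pagan test
can supplement the residuals-versus-fits plot. A simulated sketch (statsmodels assumed):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(scale=x)   # error spread grows with x: heteroscedastic

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(lm_pvalue)   # a small p-value rejects the hypothesis of constant variance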
OLS Assumption 6: No independent variable is a perfect linear
function of other explanatory variables
Perfect correlation occurs when two variables have a Pearson’s correlation coefficient of +1 or -1.
When one of the variables changes, the other variable also changes by a completely fixed
proportion. The two variables move in unison.
Perfect correlation suggests that two variables are different forms of the same variable. For
example, games won and games lost have a perfect negative correlation (-1). The temperature in
Fahrenheit and Celsius have a perfect positive correlation (+1).
Ordinary least squares cannot distinguish one variable from the other when they are perfectly
correlated. If you specify a model that contains independent variables with perfect correlation,
your statistical software can’t fit the model, and it will display an error message. You must
remove one of the variables from the model to proceed.
Perfect correlation is a show stopper. However, your statistical software can fit OLS regression
models with imperfect but strong relationships between the independent variables. If these
correlations are high enough, they can cause problems. Statisticians refer to this condition as
multicollinearity, and it reduces the precision of the estimates in OLS linear regression.
Related post: Multicollinearity in Regression Analysis: Problems, Detection, and Solutions
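Variance inflation factors (VIFs) are the usual way to quantify this. A simulated sketch
(statsmodels assumed), where the two predictors are highly but not perfectly correlated:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=500)   # strongly, but not perfectly, correlated
X = sm.add_constant(np.column_stack([x1, x2]))

for i in (1, 2):                              # skip the constant column
    print(variance_inflation_factor(X, i))    # values near 10 signal trouble

Perfect correlation would make the design matrix rank-deficient, which is exactly why the
software refuses to fit the model in that case.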
OLS Assumption 7: The error term is normally distributed
(optional)
OLS does not require that the error term follows a normal distribution to produce unbiased
estimates with the minimum variance. However, satisfying this assumption allows you to
perform statistical hypothesis testing and generate reliable confidence intervals and prediction
intervals.
The easiest way to determine whether the residuals follow a normal distribution is to assess a
normal probability plot, a type of QQ plot. If the residuals follow the straight line on this type of
graph, they are normally distributed. They look good on the plot below!
If you need to obtain p-values for the coefficient estimates and the overall test of significance,
check this assumption!
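A minimal sketch of that check (simulated data; statsmodels assumed):

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=150)
y = 4 + 2 * x + rng.normal(size=150)

fit = sm.OLS(y, sm.add_constant(x)).fit()
sm.qqplot(fit.resid, line="s")   # points hugging the line suggest normal residuals
plt.show()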
Why You Should Care About the Classical OLS Assumptions
In a nutshell, your linear model should produce residuals that have a mean of zero, have a
constant variance, and are not correlated with themselves or other variables.
If these assumptions hold true, the OLS procedure creates the best possible estimates. In
statistics, estimators that produce unbiased estimates that have the smallest variance are
referred to as being “efficient.” Efficiency is a statistical concept that compares the quality of the
estimates calculated by different procedures while holding the sample size constant. OLS is the
most efficient linear regression estimator when the assumptions hold true.
Another benefit of satisfying these assumptions is that as the sample size increases to infinity,
the coefficient estimates converge on the actual population parameters.
If your error term also follows the normal distribution, you can safely use hypothesis testing to
determine whether the independent variables and the entire model are statistically significant.
You can also produce reliable confidence intervals and prediction intervals.
Knowing that you’re maximizing the value of your data by using the most efficient methodology
to obtain the best possible estimates should set your mind at ease. It’s worthwhile checking
these OLS assumptions! The best way to assess them is by using residual plots. To learn how to
do this, read my post about using residual plots!
If you’re learning regression and like the approach I use in my blog, check out my Intuitive Guide
to Regression Analysis book! You can find it on Amazon and other retailers.
Filed Under: Regression
Tagged With: assumptions
Comments
Quang Dat says
September 21, 2023 at 7:02 pm
This article is amazing!
SOLTANI Hachemi says
December 23, 2022 at 8:37 am
Firstly, many thanks for these valuable explanations.
Secondly, I have a question about the panel models.
What are the assumptions of the fixed effects and random effects models?
How can random effects model assumptions be tested through STATA?
Is unit root testing a necessary condition to perform the previous models?
Thanks in advance
g.bez says
November 17, 2022 at 8:06 am
I just discovered your website and I have to say it. This is excellent content. It’s really
hard to find detailed and well explained tutorials like this one, covering all the nuances
of statistics topics online. Your entire website is extremely helpful. Thank you very
much.
Jim Frost says
November 18, 2022 at 4:17 pm
Thanks so much! I’m thrilled to hear that my website has been helpful!
Marcelo Castro says
October 21, 2022 at 11:54 am
Jim,
I hope you are ok.
In order to fit models that mix nonstationary IVs with stationary IVs, is special
software needed? I mean, if you need to use first-differenced IVs with non-differenced
IVs in the same multiple regression, what are we going to do with the DV?
Thanks!
Jim Frost says
October 23, 2022 at 4:46 pm
Hi Marcelo,
If the DV is non-stationary, you probably only need to difference the non-stationary
DVs.
Waison Makumbirofa says
October 18, 2022 at 4:08 am
Good day Jim
Help me link the below formulas to the assumptions
a) E(uᵢ | Xᵢ) = 0
b) Var(uᵢ | Xᵢ) = σ²
c) Cov(uᵢ, uⱼ | Xᵢ, Xⱼ) = 0
d) no perfect multicollinearity
Jim Frost says
October 19, 2022 at 1:25 am
Hi Waison,
All those use notation with which you should be familiar. The u represents the error.
A relates to the 2nd assumption in my post. The E is for expected value or mean.
B relates to assumption 5 in my post. Var is variance, and it should be equal across all
values of X. In other words, no heteroscedasticity.
C relates to assumption 4. There should be no autocorrelation among the error terms.
Cov is covariance, an unstandardized form of correlation.
No perfect multicollinearity relates to assumption 6 about no perfect linear
relationship between independent variables.
I hope that helps!
Marcelo Castro says
October 15, 2022 at 10:16 am
Jim,
When you mention “changes”, do you mean differencing IVs and DV? I am trying to
avoid (very much) this…I am trying this alternative: if the errors are AR(1), they are
stationary and therefore there is less chance to have a spurious regression. And when
the selected IVs will be working together in the model, maybe I get a model with non
significant residuals autocorrelation, as time order related IVs will be included.
Thank you
Jim Frost says
October 17, 2022 at 3:15 pm
Hi Marcelo,
Yes, that’s right. But if your data are truly stationary, it’s not necessary. I just
wanted to point out a possible condition where R-squared would be misleading
even when you satisfy the assumptions because you were asking about the
goodness-of-fit measures.
Marcelo Castro says
October 14, 2022 at 7:58 pm
Jim,
I hope you are ok.
Working with time series regression is just like walking on thin ice, so I would like to
ask you if goodness-of-fit measures like predicted R-squared and Mallows' Cp can also be used for
evaluating models like we do with cross sectional studies. Please suppose that OLS
assumptions are being met.
Thank you very much
Jim Frost says
October 14, 2022 at 10:30 pm
Hi Marcelo,
For time series OLS models, the same assumptions apply with the difference being
that it is more likely that the autocorrelation assumption won’t be satisfied.
However, if all assumptions are satisfied, yes, you can use all the regular goodness-of-fit measures.
If you’re using Mallows’ Cp to find a subset model that is more precise than the full
model, just be sure to check all the assumptions for your candidate subset models.
It’s possible that the subset models won’t satisfy them all. Mallows’ Cp won’t
indicate that.
Also be aware that if your IVs and DVs both contain a trend over time, you’ll get an
overly inflated R-squared even when all the assumptions are satisfied. In this
scenario, you need to use the changes in the relevant IV to predict the changes in
the DV.
So, there are some additional concerns with time series OLS, but in general, yes
you can use the same goodness-of-fit measures.
Waison Makumbirofa says
October 14, 2022 at 2:02 am
Morning Jim. How do I tackle a question given in this way: "Discuss the following
assumptions of the Classical Linear Regression Model (CLRM), outlining the
consequences of violating these assumptions.
a) E(uᵢ | Xᵢ) = 0
b) Var(uᵢ | Xᵢ) = σ²
c) Cov(uᵢ, uⱼ | Xᵢ, Xⱼ) = 0
d) no perfect multicollinearity"
How can I know which assumption each formula represents?
Rita Fontes says
October 13, 2022 at 6:00 pm
Thanks for your detailed response, Jim!
Regarding the interaction question, I think that the problem indeed relies on the fact
that I have few observations. The thing is: I can’t add more observations because I am
studying a certain type of service, where the number of observations is limited by itself
and I am studying almost the whole population. This means that I have a very
unbalanced dataset, because the independent variables are characteristics of the
companies providing that service, and given the high heterogeneity among them, this
inevitably results in significant differences. On the other hand, ignoring some
companies, to obtain a more balanced dataset, doesn’t seem right, because, that way,
I’m biasing the results.
I am aware of the 10 observations/variable rule, but in this case it seems impossible to
comply with. I’ve seen some works in the same field where this rule is often violated,
because the number of things being analysed is very small. At this moment, I have a
ratio of 7 observations/variable, but, for example, one level of a categorical variable
only has one observation. It is not ideal but I have to work with the data that has been
given me.
Do you have any advice you would like to give me, in order to tackle this situation?
Rita Fontes says
October 12, 2022 at 1:54 pm
Hi Jim,
I have some questions about assumption #3. You say in your book the following:
“If an independent variable is correlated with the error term, we can use the
independent variable to predict the error term, which violates the notion that the error
term represents unpredictable random error. We need to find a way to incorporate that
information into the regression model itself. This assumption is also referred to as
exogeneity. Violations of this assumption can occur because there is simultaneity
between the independent and dependent variables, omitted variable bias, incorrectly
modeled curvature, or measurement error in the independent variables. (…) To check
this assumption, graph the residuals by each independent variable.”
This leads me to several questions:
– First of all, the error term is almost always unknown; we only know the residuals.
Besides, from what I read about this, endogeneity exists if the covariance between the
error and the regressor is different than zero. So, when doing a regression and
computing the residuals to assess this assumption, how can endogeneity be detected?
Because, thinking of the way the OLS works, it is intuitive to me that the covariance will
always be zero and so, endogeneity will never be detected. On the other hand, plots can
be really useful, but only for cases of incorrect model curvature (to detect nonlinear
behaviours). So the question that arises is: how can I detect endogeneity, for example,
due to simultaneity between the independent and dependent variables? It appears to
me that we either know beforehand (for example, the relationship between price, x, and
demand, y, is endogeneous), and take that into account, or we simply don’t know, and
there are no tests that can clearly indicate its presence. I am new to this world of
regression, so I would like to know if I’m interpreting this whole situation correctly or
not.
– Secondly, how can I evaluate the presence of endogeneity for categorical variables
with several levels? Is it relevant to analyse the residuals plot for this type of variable?
– Lastly, I have a question about a different subject but to make things easier I’ll just ask
here. I am now doing a regression analysis and I was trying to evaluate a theory I had,
so I added an interaction between two dummies (pretty much like the hot dog –
chocolate example you gave). The thing is that without the interaction one of the
variables is significant and the other is not; however, when I added the interaction between
them, everything turned insignificant (main effects and interaction). Does this mean
that the interaction is not modeling the outcome well and thus I should not use it, or is
there some other reason that I'm not seeing?
Thanks in advance!
Jim Frost says
October 13, 2022 at 4:47 pm
Hi Rita,
Yes, you’re correct that the residuals are an estimate of the true error in a similar
fashion that the regression coefficients estimate the population parameters.
Unfortunately, we almost never know the true population values for the
parameters or error. For more information, read my post about Residuals.
However, we can still learn a lot from residuals–just like we can learn a lot from the
coefficient estimates.
To detect correlations between the residuals and the independent variables, use
residual plots and calculate the correlations. Generally, when a correlation exists
between an IV and the residuals, you can see it by graphing the residuals by that IV.
You can also calculate the correlation between them.
For a categorical IV, again, just graph the residuals by the various levels of the
categorical variable. You want to see random scatter that follows the same
distribution for all levels.
Finally, regarding the interaction question. That’s interesting. You don’t usually see
everything becoming insignificant like that. Here are several things to consider.
How many observations do you have versus terms in the model? You just might
have too few observations, producing low statistical power, for the number of
terms. It might be the case that including the interaction term reduces power just
enough to make everything not significant. Before adding the interaction term, are
the p-values very significant or borderline? How many observations do you have? If
you have two IVs and the interaction, you should have a bare minimum of 30
observations, but more is better.
It sounds like you have categorical IVs. However, if they are continuous, I’d
recommend centering the continuous variables. That might restore the significance
of the main effects. It won’t help the interaction term but at least you’d have
consistent results.
Unfortunately, I don’t have enough information to say whether your interaction
term should be included or not. You’ll really need to use subject-area knowledge to
make that determination. If the interaction is theoretically sound, you wouldn’t
want to remove it just due to the statistical insignificance. If you don’t include an
interaction that is warranted, you can’t trust the results for the main effects.
On the other hand, if the interaction isn’t warranted, you’d rather not include it but
it’s not as bad if you include it unnecessarily.
In other words, if you have to choose between excluding a relevant variable (or
term) vs. including one that might not be relevant, it’s better to err on the side of
including an irrelevant term (which would be the interaction term in this case). If
you include an unnecessary interaction term, it won’t bias the main effects.
I don’t have clear answers for your interaction question but consider its theoretical
relevance. And if you’re on the low side in terms of observations, it might be the
case that you just don’t have enough statistical power and might need to obtain a
larger sample to get a clear answer.
Marcelo Castro says
September 7, 2022 at 8:37 pm
Thanks a lot, Jim. I will think twice before using transformations. Only as a last resource.
Marcelo Castro says
September 6, 2022 at 10:26 am
Hi Jim,
Could you help me with another doubt? Suppose you need to transform some IVs and,
for the rest of them, you need to use another transformation. How do I back transform the
results after building the multiple regression model, in order to get useful coefficients? I
am not modeling for predicting, only for estimating relationships.
Thank you
Marcelo
Jim Frost says
September 6, 2022 at 5:06 pm
Hi Marcelo,
The back transformation is the reverse of the transformation. It undoes the
transformation so that you go from the transformed data units to the
untransformed (or natural) data units. Consequently, the nature of the
transformation depends on the transformation.
Just be aware that transformations tend to disguise the true nature of the
relationship. Just as an example, you can get a coefficient in natural units that
makes it appear to be a linear relationship, but in reality, it might reflect a curvilinear
relationship. You’ll need to be extra careful about graphing the data to understand
the relationship.
For an example, read my post about log-log plots, which use log transformed
models. In the second example, I show a fitted plot that shows the actual
relationship being fit in untransformed units. It’s entirely different than what the
transformed data look like!
Marcelo Castro says
August 17, 2022 at 10:20 am
Thank you very much, Jim.
Best wishes.
Marcelo Castro says
August 16, 2022 at 8:17 am
Thank you for the answer, Jim. Do you know if there is a kind of value for the
autocorrelation that could be used as a limit? I mean, the model would be damaged if
the value I got from the residual analysis were greater than this limit. This is different
from “there is” and “there is not” that we get using hypothesis testing.
Jim Frost says
August 17, 2022 at 12:43 am
Hi Marcelo,
The Durbin-Watson statistic is the one with which I'm most familiar. It is a test
that detects whether autocorrelation is present. Specifically, the test determines whether
the autocorrelation is significantly different from zero (no autocorrelation). I know
that you’re looking more for an amount of autocorrelation that causes problems
rather than a yes/no type test result. However, if the test indicates your model has
this condition, you should take some remedial action because a problem exists.
You mentioned that you use Minitab (and I used to work there too!), so here is a
link to their help page about the Durbin-Watson Test for Autocorrelation. It
contains some additional information about using the test.
Marcelo Castro says
August 15, 2022 at 4:00 pm
Hi Jim,
First of all, thank you very much for the work you have been doing. I bought your book
on regression and it really helps. My doubt is related to time series data, which
generally violate OLS assumption on autocorrelated residuals. I must build a multiple
regression model in order to estimate the relationship between internal company
indicators and customer satisfaction, both collected over time. Do you think OLS
can be used if residuals show no autocorrelation? In case they are autocorrelated, is
there any kind of remedial measure that can be applied? I have read a little bit on
generalized least squares, but I work with Minitab 18 and could not find it there. I think
that it would be also needed to use lags for these data.
Thank you!
Jim Frost says
August 15, 2022 at 11:11 pm
Hi Marcelo, you’re very welcome! And I’m glad to hear that my regression book has
been helpful.
Yes, it is often possible to model time series data using OLS but you have additional
concerns that you don't have for non-time series data. As you point out,
autocorrelation can be a problem with time series data. This condition exists when one
residual can predict another residual: the residuals are correlated. To resolve this
problem, you need to add one or more variables that provide the explanatory
information that is contained within the correlated residuals. By including that
information as variables in the model, it should no longer be present in the
residuals.
These variables can be new variables or lagged variables for the independent
and/or dependent variables. Lagged values are past values of a variable. In this
manner, you can use previous values to explain the current values, which can
reduce the autocorrelation. You’ll need to do some research, try fitting some
models with lagged variables, and, if at all possible, see what others with similar
research have done.
You mentioned lags and that is a key approach I would try! I don’t discuss this
problem at length in my regression book, but do take a look at OLS Assumption #4,
“Observations of the error term are uncorrelated with each other,” in Chapter 9 for
an example and a few pointers on this specific issue.
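As a rough illustration of the lagging idea (hypothetical column names and simulated data;
pandas' shift() builds the lags):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)
df = pd.DataFrame({"indicator": rng.normal(size=120)})
df["satisfaction"] = 50 + 2 * df["indicator"] + rng.normal(size=120)

# Lagged terms let past values absorb the serial correlation
df["indicator_lag1"] = df["indicator"].shift(1)
df["satisfaction_lag1"] = df["satisfaction"].shift(1)
df = df.dropna()                       # the first row has no lagged value

X = sm.add_constant(df[["indicator", "indicator_lag1", "satisfaction_lag1"]])
fit = sm.OLS(df["satisfaction"], X).fit()
print(fit.summary())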
Adrian Olszewski says
June 28, 2022 at 6:09 am
Actually, the dependent variable may be totally skewed and perfectly conditionally normal.
We should remember that every regression is about the conditional statistic (mean,
quantile, robust central tendency). If there is at least one categorical covariate, the
distribution of the DV and the conditional distribution of the DV may differ enormously.
If the IVs are all numerical, both distributions may be similar, and then – yes – it makes
some sense to transform. Typically we have the GLM, which doesn't touch the DV, but
transforms the conditional expected value. Moreover, the DV may be totally
skewed, and the numeric IV as well, in the same direction, which will result in a pretty
nice bell-shaped distribution of residuals. Transforming the DV in this case may result in
totally spoilt residuals, and thus the entire model (misspecified). Not to mention that
transformations of the DV change so many things, so the final result may be hard to
predict and justify on theoretical grounds.
https://www.quora.com/Why-is-the-Box-Cox-transformation-criticized-and-advised-against-by-so-many-statisticians-What-is-so-wrong-with-it/answer/Adrian-Olszewski-1?ch=10&share=b727f842&srid=MByz
Getahun Tsegaye says
June 20, 2022 at 8:24 am
In what cases can the assumption of a zero conditional mean of the error term be
violated, and how does it affect the asymptotic properties of OLS estimators?
Jim Frost says
June 20, 2022 at 4:52 pm
Hi Getahun,
If you don’t fit a constant and the model isn’t the correct model, the residual mean
might not be zero. If you fit the constant, the mean will equal zero even when the
model is incorrect.
Violating that assumption causes biased estimates.
Tina LB says
January 15, 2022 at 9:15 am
Hi Jim, first of all a big thank you for all your help, it is a life saver.
I would like to ask you about this: after the D’Agostino K² test, I obtain a normal distribution,
but when rerunning the regression with the new log-transformed dependent variable, the
independent variable is no longer statistically significant. I am lost here. Should I still include
it in the regression model? (Sorry for such a basic question, I am very new to stats – two weeks
now). Thank you
Sanjiv says
December 12, 2021 at 5:31 pm
Hi Jim, I am running a backward regression model to predict an outcome variable, and as the
algorithm automatically selects the significant variables, I am now wondering at what
stage I need to check the linear regression assumptions and what they mean in
this scenario. Do they really play a part now that the model has been selected by an automated
algorithm?
Jim Frost says
December 12, 2021 at 11:37 pm
Hi Sanjiv,
Yes! You definitely need to check the assumptions! Typically, I’d check the
assumptions for the final model that the process settles on. That’ll help you
determine whether you should tweak that model. Keep in mind, that, according to
research, stepwise regression generally gets you close to the correct model but not
quite all the way there. For more information, read my post about stepwise vs. best
subsets regression.
You should check all assumptions for the model you end up using even when an
algorithm selects it. The use of an algorithm doesn’t eliminate the need to
verify/satisfy the assumptions.
Dianelena Eugenio says
December 2, 2021 at 11:28 am
Thanks Jim for your great way to explain these topics. Do you know if normality is
needed for GLS? I want to fit that model and also do hypothesis tests.
Francesco says
November 9, 2021 at 7:07 am
Hello, first of all thanks for this very informative post. I have a question about it.
So an important assumption is to have the error term uncorrelated with the
independent variables, otherwise our model will be biased. I read a good way to think
about it is to try to introduce the variables in the error term that are correlated with the
variables in the model so that you can better isolate the effects of the variables
themselves.
But then another problem is multicollinearity, so high correlation between the
regressors, which can create problems for statistical tests and estimation of
parameters. So by solving the first problem aren’t we generating correlation between
the regressors? I can’t wrap my head around it.
Jim Frost says
November 10, 2021 at 7:33 pm
Hi Francesco,
Yes, there’s definitely a tradeoff there! I write about this tradeoff towards the end of
my post about confounding variables. Generally, you want to add confounders to a
model but there are cases where including them can introduce too much
multicollinearity. You’ll need to track multicollinearity to see if that happens. If you
do have too much multicollinearity when you include them but you cannot leave
the potential confounders out of the model because you’ll have too much bias, you
might need to use another form of regression that can handle multicollinearity,
such as LASSO or Ridge Regression.
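A minimal sketch of that fallback (simulated data; scikit-learn assumed):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(9)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # a highly correlated confounder
X = np.column_stack([x1, x2])
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=200)

print(Ridge(alpha=1.0).fit(X, y).coef_)   # shrinks the correlated coefficients
print(Lasso(alpha=0.1).fit(X, y).coef_)   # may zero one of them out entirely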
Nima Lord says
October 26, 2021 at 12:07 pm
Dear Jim
I want to predict the next day’s electricity load consumption using MLR. I have
temperature, humidity, and calendar variables.
Since so many other variables affect electricity consumption, can I use lagged
electricity load consumption to reduce prediction error?
The effect of other variables can be seen in lagged dependent variables.
Jim Frost says
October 28, 2021 at 12:16 am
Hi Nima,
Yes, for time series models, it can be appropriate to use lagged values. You’ll need
to see if it works for your subject area, but it is a worthwhile consideration.
D Clark says
October 17, 2021 at 8:19 am
Thank you for the quick response. I did continue to read and I understand your response.
I referenced your website several times for a previous class I took. I really appreciated
how the information is presented in a conversational style. I just started a new class
specifically on regression analysis, so I felt really comfortable buying the book after
knowing your style of writing. I have found it difficult to find my footing in a lot of the
concepts and your content is very helpful because I like to start at “the beginning” of
things and understand the concepts. Your content is perfect. I hope you continue to
publish. I am taking a class at WGU and the course content is snippets of disconnected
content. I will be sure to review your book and post in the student section a reference to
your content for other students. Thank you!
D Clark says
October 16, 2021 at 8:38 pm
Hi Jim,
I just purchased your E-book “Regression Analysis An Intuitive Guide” today. I have only
read up to page 43; however, I do have a question. On page 43, there is a scatter plot of
preteen girls’ height and weight. In the subsequent paragraph (under the graph) the
following statement is made:
The height coefficient in the regression equation is 106.5. This coefficient represents the
mean increase of weight in kilograms for every additional one meter in height. This
study sampled preteen girls in the United States. Consequently, if a preteen girl’s height
increases by 1 meter, the average weight increases by 106.5 kilograms.
I don’t deal much with meters and kilograms, so I had to convert them which would
roughly be translated to:
If a preteen girl’s height increases by 3 feet, the average weight increases by 234
pounds. What am I missing here? These numbers seem a bit off for preteen girls. I saw
the reference to “106.5” initially on page 33. It didn’t make much sense but I kept going
and now I’m seeing it again on page 43.
I can see on the scatter plot the IV of height ranges from 1.3 to 1.7 meters which is
roughly 4’2 to 5.5 and the DV of weight ranges from about 66 pounds (30 kg) to 176
pounds (80 kg). How can the average weight increase by 234 pounds for every 3 foot of
growth?
I feel like this is a stupid question, but I haven’t been able to reconcile it. Can you help?
Thanks
Jim Frost says
October 16, 2021 at 11:43 pm
Hi Denise,
Thanks so much for buying my book. I hope you’ve been finding it helpful!
That’s an excellent question because it illustrates an important point about
interpreting regression analysis–don’t extrapolate results outside the observation
space. I cover that concept in more detail in the predictions chapter but I do allude
to it in the 2nd and 3rd paragraphs below the graph in question. “Keep in mind that
it is only safe to interpret regression results within the observation space of your
data . . . We don’t know the nature of the relationship between the variables
outside the range and population of our dataset.”
We can’t shift a full meter for these data. So, we shouldn’t try to interpret it for a
full meter. The data range from 1.3 to 1.7 meters. There’s no way we’d expect a
1 m/3 ft height difference. So, we’re well outside the sample space.
We’ll have to stick to fractional changes within that space. So, if you were to shift
right by 0.1 m within that range, you’d expect the average weight to increase by
10.65 kg. Equivalently an increase of 3.9 inches will increase weight by an average
of about 23.5 pounds. Almost exactly 6 pounds per inch!
It’s an important point because when you go outside the sample space, you’re
either considering unrealistic scenarios or are potentially in an area of real data
that you haven’t modeled and the relationship can change. I actually show an
example of relationships changing outside the sample space when I talk about the
constant beginning on page 64. The same ideas apply to the regression coefficients.
For these data, we just don’t know what the relationship between height and weight
is outside 1.3 to 1.7 meters. Or for other age groups or boys for that matter! We
can only estimate the relationship for girls of this age who are between 1.3 – 1.7
meters.
It’s not a stupid question at all and I hope I helped clarify it!
Ibrahimu d yilala says
October 13, 2021 at 2:48 pm
What are the effects of violating the assumption of normality of the error terms in
regression analysis?
Jim Frost says
October 13, 2021 at 5:04 pm
Hi Ibrahimu,
That depends on why the residuals are not normal. If they’re nonnormal because
the model is specified incorrectly, then you can have biased coefficients.
If you specify the correct model and the residuals are not normal, you might not
have biased coefficients but you won’t be able to trust the p-values and confidence
intervals.
Bradley James Quiring says
September 14, 2021 at 2:17 pm
Hi Jim
I’ve just worked through a regression model using backwards elimination in Excel. I’ve
finished up with two independent variables (square footage and number of garage
spaces) to predict the market value of a home. Some of the variables I thought would
need to stay in the model, such as no. rooms, no. of bathrooms, property size, were
eliminated due to either high VIFs or large p-values.
In my final model, everything looks good:
Adjusted R-squared = 0.93. All p-values well below 0.01. n = 59.
However, in doing my residual analysis I get what looks like a U-shape when plotting the
residuals against home size–my understanding of which is that the model is
underpredicting market value for smaller and larger homes and over-predicting market
value for the mid-size homes. I’m not sure how best to interpret this or what to do
about it. ( I’d love to paste it here but am unable to. It does look a lot like the scatterplot
on page 191 of your book.)
Should I consider adding an interaction term? Do I need to start all over again? Might it
be possible that my backwards regression methodology has produced an incomplete
model?
I’ll continue to search for the answers in your book, but any further guidance you could
provide would be much appreciated.
Thanks,
Brad
Jim Frost says
September 14, 2021 at 2:25 pm
Hi Bradley,
That’s a sure sign that you have curvature in your data. Try adding square footage
squared (in addition to square footage) to model that curvature. See if that helps!
Kjartan says
August 16, 2021 at 9:09 am
Thank you very much for these very clear insights in the world of statistics. 🙂
I also have a question:
Do these assumptions hold for large sample sizes? (100 000+)
Thanks in advance!
Jim Frost says
August 16, 2021 at 10:50 pm
Hi Kjartan,
Yes, almost all the assumptions apply to very large datasets. However, the very
large sample size might let you waive the normality assumption for the residuals
thanks to the central limit theorem.
Mohammed Seid says
July 6, 2021 at 4:32 am
which of these assumptions is not relevant for ensuring the unbiasedness of the OLS
estimates?which of the assumptions could be considered optional?
Taiwo Oluyinka says
June 28, 2021 at 10:12 am
Hi, this is very educative and it makes OLS simpler.
Please, I would like to know more about the 7th assumption: the error term is normally
distributed. Specifically, the explanation for the violation and the correction. Can you
please give more explanation on it?
Yonas says
June 26, 2021 at 2:21 am
Hi Jim, first of all I want to say thank you for your valuable information. My question is:
for a probit model, is it mandatory to test the classical linear regression model
assumptions, considering the nonlinear relationship between the dependent variable (which is a
dummy) and the independent variables? Thank you in advance.
Troy Palmer says
June 10, 2021 at 1:21 pm
Jim,
I am trying to conduct a regression analysis of entrance exam scores (predictor
variables) and course grades (dependent variable).
[scatterplot of standardized residuals (*ZRESID) versus standardized predicted values (*ZPRED)]
What does this look like to you?
Roshni Maji says
March 21, 2021 at 5:16 pm
How are these assumptions altered in multivariable cases?
Jim Frost says
March 22, 2021 at 12:50 am
Hi Roshni,
These assumptions apply without changes to multivariate cases.
sunny says
March 21, 2021 at 3:46 pm
Hi Jim, for Hubert’s question, what method should we apply instead of OLS? THX
Jim Frost says
March 22, 2021 at 12:54 am
Hi Sunny,
When you have endogenous independent variables, you can use instrumental
variables or two-stage least squares (2SLS) regression.
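For readers curious what 2SLS actually does, here is a bare-bones simulated sketch; a real
analysis should use a dedicated IV routine, because the second-stage standard errors below
are not corrected:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
u = rng.normal(size=5000)                        # unobserved error
z = rng.normal(size=5000)                        # instrument: drives x, unrelated to u
x = 1.0 * z + 0.8 * u + rng.normal(size=5000)    # x is endogenous (shares u with y)
y = 2 + 1.0 * x + u

# Stage 1: regress the endogenous variable on the instrument
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
# Stage 2: replace x with its exogenous fitted values
two_sls = sm.OLS(y, sm.add_constant(x_hat)).fit()

print(sm.OLS(y, sm.add_constant(x)).fit().params[1])  # biased, roughly 1.3
print(two_sls.params[1])                               # close to the true 1.0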
MARSHALL says
March 7, 2021 at 2:57 pm
Hi Jim
My question is on unit root tests. Some of my explanatory variables are
dichotomous/dummy variables taking a value of 0 or 1. These variables are
non-stationary even after first differencing. Is it necessary to conduct stationarity tests on
such variables, or can they be presumed to be stationary, since they take a value of 0 or
1? I want to carry out panel data regression. My N (number of cross-sections) = 8, and
T = 7 years.
Rainer says
February 16, 2021 at 11:01 am
Dear Jim,
I just discovered your website and your blog. Congratulations, most of the textbooks I
know are not able to explain the concepts in such an easy way (and most of them don’t
even mention the distinction between errors and residuals!) as you do. I will go through
your posts to learn something I do not know, yet and I will surely recommend your
website to my students. Thanks for your work!
Greetings from Germany,
Rainer
P.S.: Have you considered writing something about Bayesian analyses (or have you
already)? Since you describe NHST so precisely, I am pretty sure you are aware of the
shortcomings (especially that people typically interpret the results in a Bayesian
manner, which the results clearly do not warrant). I would love to read similarly concise blogs
about this topic from you. I think Bayesian inference will grow bigger in the near
future.
Jim Frost says
February 16, 2021 at 4:55 pm
Hi Rainer,
Thanks so much for your kind words. You made my day! And thanks so much for
recommending my site to your students! 🙂
I do plan to write about Bayesian Analyses down the road. I agree that they can be
highly useful. I’ve written about the challenges of interpreting p-values and NHST
results. In fact, in my hypothesis testing book, I include an entire chapter about
those issues. However, it’ll probably be a little while before you start seeing that
content because there’s just so much on my to-do list.
Taupo says
February 8, 2021 at 4:32 pm
Hi Jim,
Thank you for your work. I would like to know if least squares minimizes the – possibly –
weighted average distance between the left-hand-side observation and its conditional
expectation, and I would like to know why weights may interfere.
Samiullah says
January 22, 2021 at 1:12 pm
In Econometrics, ordinary least squares (OLS) which is the standard estimation
procedure for the classical linear regression model can accommodate complex
relationships. That is why there is a considerable amount of flexibility in developing the
theoretical model. But in quantitative analysis, an accurate functional form is essential to
validate the true nature of the relationship and obtain unbiased empirical results. The true
nature and intensity of the relationship is compromised without taking care of the functional
form, which may be harmful for any research work and its policy implications. In
economics, many variables produce U-shaped or inverted U-shaped curves, which are modeled
with quadratic terms. If two variables, i.e., dependent (Y) and independent (X), have a
quadratic relationship, then:
How can we capture the quadratic term of the independent variable in the regression equation?
What will be the nature (sign) of the quadratic term’s coefficient for U-shaped or inverted
U-shaped curves?
Jim Frost says
January 24, 2021 at 5:43 pm
Hi, one way to capture curved relationships is by including polynomial terms in your
model. Read my post about fitting curves for more information!
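To answer the specific questions above: add X² alongside X, and read the sign of the squared
term. A simulated sketch (statsmodels assumed):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.uniform(0, 10, 300)
y = 1 + 4 * x - 0.5 * x**2 + rng.normal(size=300)   # inverted U-shape

X = sm.add_constant(np.column_stack([x, x**2]))
fit = sm.OLS(y, X).fit()
print(fit.params)   # negative x^2 coefficient: inverted U; positive: U-shape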
Micheal says
January 22, 2021 at 5:14 am
How might coefficient estimates change if the random sampling assumption is violated?
Jim Frost says
January 24, 2021 at 5:48 pm
Hi Micheal,
When you take a random sample from a population, you expect the sample, and
the statistics you calculate using the sample, will reflect the population from which
you drew the sample. However, when you use a non-random process, the sample
might not accurately reflect the population. Consequently, the coefficients might
not apply to that population. Instead, the coefficients will represent your
particular sample, and you might not be able to generalize the coefficients (and
other statistics) beyond the sample. The nature of the changes depends on
precisely how your sample differs from the larger population.
I hope that helps!
Nima Kianfar says
January 20, 2021 at 7:31 am
Hello and good day
Thanks for your complete information about OLS,
I have a simple question,
One of my explanatory variables had a coefficient of more than 1 (1.43). I wanted to know:
is that valid, or is there a problem in my data and modeling process?
Thanks a million
Nima
Jim Frost says
January 22, 2021 at 12:40 am
Hi Nima,
Unlike correlation coefficients, which must be between -1 and 1, there is no such
limit on regression coefficients. So, your coefficient doesn’t indicate a problem by
itself.
Ildefonso Setubal says
December 2, 2020 at 12:38 pm
Hi Jim. Thanks for making this learning simple and fully understandable for people like
me who panic when seeing statistical formulas. I find your teaching quick to comprehend and
I am happy with your book – Regression Analysis. This will help me with my learning
in Data Science. Many thanks for having this kind of format.
Jim Frost says
December 3, 2020 at 1:40 am
Hi Ildefonso, thanks so much for your kind words. You made my day! I’m also so
glad to hear that you’re happy my regression analysis book! 🙂
Deniyi says
November 17, 2020 at 1:25 am
Hi Jim,
Please, in what ways can the violation of the OLS assumptions affect the validity of the
results?
Jim Frost says
November 17, 2020 at 4:06 pm
Hi Deniyi,
You’re in the right place to find your answers! Read through the assumptions. For
each one, I point out ways where violations affect your results. Some of these
points even have links to additional posts for more in-depth information about
particular violations and how to address them.
If you have more specific questions after reading through this post, please don’t
hesitate to ask!
Reply
Derebie says
October 10, 2020 at 11:32 pm
Thank you for the great elaboration, Sir.
Reply
Aadil says
October 7, 2020 at 2:09 am
Hi Jim,
Can you explain what the significance of the normality assumption is in a classical
normal regression model? Why is it important in research?
Reply
Jim Frost says
October 8, 2020 at 12:17 am
Hi Aadil,
This article covers the normality assumption in a regression model. Be sure to
focus on assumption #7. That should answer your questions.
Reply
Hubert Vahia says
October 4, 2020 at 7:48 am
Thanks and appreciate your response.
Reply
Hubert says
October 3, 2020 at 11:47 pm
Hi Jim,
This is a general question. Suppose one of your friends argues that the OLS estimator may be
problematic because one of the regressors, x, is probably endogenous. If this were true,
which assumption of linear regression would not be valid, and what could be wrong with
using OLS?
Reply
Jim Frost says
October 4, 2020 at 12:49 am
Hi Hubert,
Read the post and pay particular attention to OLS assumption #3. I discuss
endogeneity and its effects there.
Reply
Emmanuel says
September 29, 2020 at 12:18 pm
Hi Jim,
I’m about to conduct my master’s study on “the prospects and constraints of
commercialisation among smallholder oyster mushroom farmers.” I intend to sample
from a list of registered oyster mushroom producers in the study area. However, due to
the COVID-19 situation, I’m highly likely not to get my sample through simple
random sampling, systematic sampling, or any of the probability sampling methods. I
instead want to just obtain the list containing the contacts of the target population and
call, so whoever answers the call and agrees to participate in the study, I will include in
the study.
Given my intended sampling procedure, can I still run any of the regression analyses
if I get more than 30 participants?
Thank you.
Thank you.
Reply
Jim Frost says
September 30, 2020 at 4:45 pm
Hi Emmanuel,
You can certainly use regression analysis to look for relationships in your sample.
Your question will be, how well do these results generalize to the larger population?
It’s possible you’ll find relationships, but you might have a hard time inferring that
those relationships exist in the target population. That’s always a concern when you
use a non-random sampling method.
You can assess the characteristics of your sample and look for any ways it differs
from the target population. Did you tend to get younger, older, etc. people? That
can help you determine how well you can generalize your results. Possibly you’d
need to generalize to a more specific subpopulation. Suppose mainly younger
people responded to your call. Then, the results would apply more to them and
possibly not to older people. That’s the sort of thing to look for in the sample.
I hope that helps!
Reply
TC says
September 14, 2020 at 1:32 am
Hi Jim,
I attended an Udemy online course. The instructor says the assumptions of Linear
Regression include “the normality of independent variables.”(i.e. independent variables
need to be normally distributed). I can’t find any online articles or books supporting this
statement. Can you confirm this statement indeed is wrong?
TC
Reply
Jim Frost says
September 14, 2020 at 1:45 am
Hi TC,
Yes, that statement is incorrect. OLS regression makes no assumptions about
the distribution of the independent or dependent variables. It only makes distribution
assumptions about the residuals. However, it is easier to obtain normally
distributed residuals when the dependent variable follows a normal distribution.
But it is possible to obtain normally distributed residuals when the dependent
variable is nonnormal. In fact, the residuals don’t need to be normally distributed.
That assumption is technically optional. You only need to satisfy it if you want to perform
hypothesis testing on the coefficients. Of course, that’s usually an important part of
the results!
I hope that helps!
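For a purely illustrative sketch in Python (statsmodels and SciPy, synthetic data), note that the check targets the residuals, not the variables themselves:

```python
# Sketch: the normality assumption applies to the residuals, not to X or Y.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 150)          # the IV is uniform, not normal -- that's fine
y = 1.0 + 2.0 * x + rng.normal(0, 1, 150)

fit = sm.OLS(y, sm.add_constant(x)).fit()
stat, p = stats.shapiro(fit.resid)   # a common normality test, applied to the residuals
print(f"Shapiro-Wilk p-value: {p:.3f}")  # a large p-value gives no evidence of nonnormality
```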
Reply
Amrita says
September 9, 2020 at 3:01 pm
Hi Jim,
I had one question: if our error term is very widely spread, can we say it becomes
something of an outlier in a given linear model? Because eventually we want the mean
of our error term (or our residuals) to equal zero, and we don’t factor in outliers either, so
does it make sense to just look at the error term as an outlier?
Reply
Jim Frost says
September 11, 2020 at 5:19 pm
Hi Amrita,
The error term doesn’t apply to a single observation in the same way that the
coefficients don’t apply to a single observation. Instead, it applies to the entire
population (as a parameter) or the sample (as a parameter estimate). So, you can’t
say that the error term itself is an outlier. However, a particular observation can
have a residual (which is an estimate of the error for a particular observation)
that is an outlier.
You can’t just remove the error term because your model will never explain all the
variation. I talk about this in my post about residuals. Your model has a stochastic
portion (error) and a deterministic portion (the IVs). You need both portions.
Understanding the distribution of errors allows you to perform hypothesis testing
and draw conclusions about the variables in your model. I think that post will help
you understand!
Reply
mary kithome says
August 28, 2020 at 9:26 am
Hello, can I use OLS to measure impact, e.g., the impact of adoption on farm income? If not,
which are the appropriate methods to use? Thank you.
Reply
Jim Frost says
August 29, 2020 at 3:30 pm
Hi Mary,
Because farm income is a continuous, dependent variable, I think OLS is a good
place to start. Whether you can ultimately settle on that for your final model
depends on whether you can specify a model that adequately fits your data–and
that can depend on the exact nature of the relationship between the IVs and DV.
But, yes, it’s probably the best place to start.
Reply
Olatunji says
August 21, 2020 at 1:29 pm
Hi Jim, hope you are well. Would this explanation of yours answer my question about
explaining the concept of a simple (bivariate) regression using the OLS method?
Furthermore, is this sufficient for developing the concepts of population, sample, and
regression, and explaining how the model works with assumptions and violations of the
assumptions?
Thank you
Reply
Jim Frost says
August 24, 2020 at 12:40 am
Hi Olatunji,
Yes, use the same assumptions for simple regression. However, you don’t need to
worry about multicollinearity because with only one IV, you won’t have correlated
IVs!
Reply
Vinay says
August 12, 2020 at 4:27 pm
Thank you! I’m absorbing what you explained so well 🙂
Reply
Vinay says
August 10, 2020 at 3:47 am
Thank you for doing what you do! It is very helpful
One stupid question – You said “Unfortunately, the error term is a population value that
we’ll never know”. I couldn’t grasp what it means to have an error term for a
population… If we know the entire population, then we don’t need to have any error,
right?
Thanks!
Reply
Jim Frost says
August 10, 2020 at 1:47 pm
Hi Vinay,
It’s not a stupid question at all. It actually depends on a very precise understanding
of error in the statistical sense. You’re correct that if you could measure an entire
population for a characteristic, you would know the exact value for the population
parameter. Imagine we measure the heights of everyone in a particular state. We’d
know the true mean height for that population. Well, discounting measurement
error.
However, random error in the statistical sense is the variation that the model
doesn’t account for. So, if we’re using a simple model for heights using the mean,
there’s still variation around our actual population mean–not everyone’s height in
the population equals the mean. That variation is not explained by the mean (our
model), hence it is error. Or take a multiple regression model where height is the
DV and say Gender and Weight are the IVs. That model will explain some
proportion of the population variance, but not all of it. The unexplained portion is
the error. So, even when working with a population, you still have error. Your
model doesn’t explain 100% of the variability in the population.
For the error term for a particular observation, each predicted value for an
observation will usually not exactly equal that observation’s actual value. That
difference is the error term. Residuals are an estimate of the error term just like
the coefficients are estimates of the population parameters. However, if we did
measure the entire population, the coefficients are the true population values and
the residuals are the true error terms–assuming that we’re not fitting a
misspecified model.
I hope that helps! It can be a murky topic!
Reply
Ossy Degelleh says
June 6, 2020 at 11:42 am
Hi Jim, in brief what are the main strengths and limitations of OLS technique?
Reply
Jim Frost says
June 8, 2020 at 3:44 pm
Hi Ossy, the answers to your questions are in this article. Just read through it
carefully. I don’t want to retype in the comments section what I’ve already written
in the main article! Hints: look for the term efficient and then look through the
assumptions for various problems. You’ll also want to read my post about how OLS
is BLUE (Best Linear Unbiased Estimator)–which is more about the strengths. If you
have more specific questions after reading these two posts, just ask in the
comments section.
Reply
nil says
June 4, 2020 at 6:07 am
Suppose there is a relationship between the error term and the independent variable.
Can this model be estimated by OLS? First, write this model relating the error term
and the independent variables, and show how you should estimate this model according to
your
Reply
Jim Frost says
June 5, 2020 at 12:12 am
Hi “Nil”, I’ll give you a hint. Your answer is in OLS assumption #3 in this article.
Reply
jack says
May 12, 2020 at 9:43 am
Hi jim,
The t-test in hypothesis testing requires that the sampling distribution of the estimators
follows the normal distribution. Do you agree with this statement?
Reply
Jim Frost says
May 12, 2020 at 11:10 pm
Hi Jack, yes, and the distribution of the coefficient estimates is linked to the
assumption about the distribution of the residuals. If the residuals follow a normal
distribution, you can conclude that the distributions for the estimators are also
normal. I suspect that the central limit theorem applies here as well, in that if you
have a sufficiently large sample size, the sampling distributions will approximate the
normal distribution even when the residuals are nonnormal. However, I don’t have
good numbers for when that would kick in. Presumably it depends on the number
of observations per predictor.
Reply
jackson says
May 12, 2020 at 9:41 am
Sir, thank you for explaining well
Reply
jackson says
May 11, 2020 at 10:39 am
“The ordinary least squares (OLS) estimators are still unbiased even though the error
term is not normally distributed”. Comment on this statement.
Reply
Jim Frost says
May 11, 2020 at 4:15 pm
That statement can be correct but it isn’t necessarily correct. If the residuals are
nonnormal because you misspecified the model, the estimators will be biased. As I
state in this post, OLS does not require that the error term follows a normal
distribution to produce unbiased estimates with the minimum variance. However,
if you want to test hypotheses, it should follow a normal distribution.
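A quick simulation sketch illustrates the unbiasedness point; the numbers are arbitrary and the skewed error distribution is just one example:

```python
# Sketch: OLS remains unbiased with skewed (nonnormal) errors, shown by simulation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
true_slope, n, reps = 2.0, 50, 2000
slopes = []
for _ in range(reps):
    x = rng.uniform(0, 5, n)
    errors = rng.exponential(1.0, n) - 1.0  # strongly skewed, but mean zero
    y = 1.0 + true_slope * x + errors
    slopes.append(sm.OLS(y, sm.add_constant(x)).fit().params[1])

print(np.mean(slopes))  # averages very close to the true slope of 2.0
```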
Reply
NYAMUYONJO DAVID says
May 8, 2020 at 1:37 pm
Jim, thank you for explaining well and being kind.
Reply
Chris Akenroye says
April 23, 2020 at 6:06 pm
Please add me to your mailing list. I just went through your post on 7 Classical
Assumptions of Ordinary Least Squares (OLS) Linear Regression. Your explanation was
reader-friendly and simple. I really appreciate you and your style of knowledge
dissemination.
Thanks
Reply
Lovelyn says
March 26, 2020 at 5:47 am
Very useful and informative.
Reply
Katrina says
March 26, 2020 at 3:26 am
Hi Jim,
Thanks for this post and all you do. I really appreciate it! I’m trying to solve a business
problem, and I want to know if OLS is the right regression here. Essentially, I’m trying to
do a driver analysis of net promoter score/customer happiness. I have survey results
from 20,000 respondents.
My dependent variable: overall score (an 11-point scale containing a single rating from 0 to 10).
My predictor variables: three driver scores (11-point scales containing a single rating from 0
to 10) asked in the same survey.
So far, I ran correlations of the overall rating against the three driver ratings separately,
and the results show they are positively correlated.
To answer the question “if I increase any of the three drivers’ ratings by 1, how much
would that affect the overall score,” I tried to use Excel’s regression, but the p-values for all
three drivers are basically 0, and that doesn’t tell me which one is the most important
and best explains the overall score. My company wants to know which driver area they
should prioritize, so I’m running into a wall, and I’m wondering if OLS is even the right
model to use, as these aren’t measurement data but rather ordinal data. Could you
please advise? I also read something about multinomial logistic regression online, but
that’s beyond me. Any tips to proceed?
Thank you so much!!
Katrina
Reply
dbadrysys says
February 25, 2020 at 11:25 am
Hi Jim,
Your post is very helpful and detailed.
But I’m unclear on one point: if the relationship is nonlinear, can we use exponential
or logarithm functions to transform the data before using it (Fixes for
Linearity – OLS assumption 1)?
Thanks.
Reply
Jim Frost says
February 25, 2020 at 11:40 am
Hi,
Yes, you can use data transformations as you describe to help. However, I always
recommend those as a last resort. Try other possibilities, such as specifying a
better model, before using data transformations. However, when the dependent
variable is very skewed/nonnormal, it can be difficult to satisfy the assumptions
without a transformation.
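Here’s a minimal sketch of that last-resort transformation, assuming a dependent variable with multiplicative (skew-producing) errors; the data and coefficients are invented:

```python
# Sketch: log-transforming a skewed dependent variable (a last resort).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
# Multiplicative errors make y right-skewed; logging makes the model additive again.
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.4, 200))

fit = sm.OLS(np.log(y), sm.add_constant(x)).fit()
print(fit.params)  # roughly [0.5, 0.3], but now interpreted on the log(y) scale
```

Keep in mind that after transforming, the coefficients describe effects on log(y), not on y itself.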
Reply
Harry says
January 25, 2020 at 4:44 pm
Hello Mr Jim,
Thank you for your exceptional work that helps so many, including me.
I am working on the abalone dataset, which has previously been used to model the age
of abalone (rings) through a number of predictors, mainly involving gender (a categorical
variable), weight variables, and other mm variables (e.g., length of abalone). In my case, I
instead want to model the shucked weight (i.e., meat) for explanatory purposes, using the
other variables as independent variables, and I don’t have the age variable this
time. However, it appears that the independent variables are pairwise highly collinear,
which makes it really hard to find a proper model. In addition, the biplots against
the shucked weight are not all linear, and they all show a tendency of increasing variation
in the response. As a consequence, the residual vs. fitted plots of most possible models,
say the full model, show heteroscedasticity, which I tried to solve via
transformation of either the response or the independent variables. At the moment, I
can’t seem to find my way to a model that has at least constant variation.
Do you think you could give me some constructive advice? Thanks.
Reply
Jim Frost says
January 26, 2020 at 6:29 pm
Hi Harry,
That sounds like a very interesting analysis that you’re performing!
You mention you don’t have the age, but do you have the number of rings that
could be a proxy for age? If so, use that as a proxy variable. Proxy variables are
variables that aren’t the actual DV but they’re related to both the DV and IV and can
incorporate some of the same information into the model. It helps prevent omitted
variable bias. Read my post about omitted variable bias, which also discusses proxy
variables as a potential solution.
You also mention the problems of multicollinearity and heteroscedasticity. Please
read my post about multicollinearity and learn about VIFs. Some multicollinearity is
OK. VIFs will tell you if you have problematic levels. I also present some solutions.
And, read my post about heteroscedasticity. Again that post shows how to detect it
(which it seems like you have) and potential solutions.
You also mention the need to fit curvature, so I think my post about curve fitting
will be helpful.
I think those posts will provide many answers to your questions. If after reading
those posts, you have specific questions about addressing those problems, please
post in the relevant posts.
Also, because your analysis depends so heavily on regression analysis, I highly
recommend buying my regression ebook. In that book, I go into much more detail
and cover more topics. For example, I would not be surprised if you need to use
some sort of data transformation. That seems to be common in biology growth
models (but I’m not an expert in that field). I don’t cover data transformations for
regression in a blog post (although I do have an example post) but I cover
transformations in my ebook.
Best of luck with your analysis!
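For the multicollinearity check specifically, here’s a small sketch of computing VIFs with statsmodels; the column names are hypothetical stand-ins, not the actual abalone variables:

```python
# Sketch: variance inflation factors (VIFs) to detect problematic multicollinearity.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
length = rng.normal(100, 10, 200)
whole_weight = 0.9 * length + rng.normal(0, 3, 200)  # strongly correlated with length
X = sm.add_constant(pd.DataFrame({"length": length, "whole_weight": whole_weight}))

for i, name in enumerate(X.columns[1:], start=1):  # skip the constant
    print(name, variance_inflation_factor(X.values, i))
# Common rules of thumb flag VIFs above roughly 5-10 as problematic.
```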
Reply
khedidja djaballah says
December 8, 2019 at 10:19 am
I wish to know the relevance of intercept-only models. And to what kind of data
can they be applied?
Reply
Jim Frost says
December 9, 2019 at 2:23 pm
Hi Khedidja,
An intercept-only model refers to one that does not contain any independent
variables. These models produce fitted values (or predictions) that equal the mean of the
dependent variable. Use the F-test of overall significance to compare a model to an
intercept-only model. This test determines whether your model with IVs is better
than the intercept-only model (no IVs). When the F-test is statistically significant,
you can conclude that your model is better than the intercept-only model. If that
F-test is not significant, your model with IVs is not better than the intercept-only
model. In other words, your model doesn’t explain the changes in the DV any
better than just using the DV mean. For more information, read my post about the
overall F-test of significance.
Typically, you’d only use an intercept-only model when you have no significant IVs
and when the overall F-test is not significant: no IVs have a significant
relationship with the DV. In this scenario, the only meaningful way you can predict
the DV is by using its mean. This outcome is not a good one because you
want to find IVs that explain changes in the DV.
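A short sketch with synthetic data shows the comparison; the overall F-test and the intercept-only fit below are illustrative, not from any particular dataset:

```python
# Sketch: the overall F-test compares a model with IVs to the intercept-only model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 3.0 + 0.8 * x + rng.normal(size=100)

full = sm.OLS(y, sm.add_constant(x)).fit()
print(full.fvalue, full.f_pvalue)  # significant: the IV beats the mean-only model

intercept_only = sm.OLS(y, np.ones(len(y))).fit()
print(intercept_only.params[0], y.mean())  # the fitted value is just the DV mean
```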
Reply
EMMANUEL APPIAHH says
October 18, 2019 at 10:59 pm
What is the importance of relative efficiency?
Reply
Shannon says
October 15, 2019 at 5:06 pm
This was an actual lifesaver. I’m taking cross-section econometrics at my university
and was really struggling with OLS. We were doing all these linear algebraic proofs
and went over the assumptions, but the conceptual explanations were a bit difficult to
navigate. I particularly found your section on the importance of the “expected value of
the error term being zero” extremely helpful.
Reply
Sophia says
September 18, 2019 at 6:21 pm
Thanks for the amazingly detailed response, Jim! This is very helpful – I’m eagerly
looking forward to reading the articles in your reply. Thanks also for maintaining such
an informative blog!
Reply
Sophia says
September 18, 2019 at 12:03 pm
Thanks for replying. I was asking about this because even after adding what we think
are relevant regressors (like weather..), we are always either significantly
under-ordering or over-ordering, and I was wondering if it was because the assumptions of
linear regression were not being met for the store sales data, and if we should look
into a different model.
Any guidance on alternative models to accurately estimate appropriate inventory
ordering quantities would be very helpful! Thanks,
Reply
Jim Frost says
September 18, 2019 at 4:20 pm
Hi Sophia, and apologies because your previous question about using OLS for your
ordering system fell through the cracks. The answer is that, yes, it might well be a
suitable system. However, there are some potential challenges. First, be sure that
your model does satisfy the assumptions. This helps ensure that your model fits
the data. It’s possible there’s, say, curvature in the data that the model isn’t fitting.
That can cause biased predictions (systematically too high or too low).
If the model does satisfy the assumptions, say everything is perfect, it’s still
possible to have large differences between the predicted value and the eventual
actual value. It’s possible that your model fits the data well but it’s insufficiently
precise. In other words, the standard deviation of the residuals is simply too large
to produce predictions that are sufficiently precise for your requirements. There are
different possibilities at work, and I can’t be sure which one(s) would apply to your
case.
You might have:
Too few data points.
Too few independent variables.
Too much inherent variability in the data.
The first two items, you can address. Unfortunately, the last one you can’t.
I’ve written about this issue with prediction in other blog posts. First and foremost,
read this blog post to see if imprecision might be affecting your model: Understand
Precision in Predictions. There are two key measures you should become familiar
with: standard error of the regression and prediction intervals.
I also walk through using regression to make predictions and assess the precision.
Also, what is the R-squared for your model? While I’ve written that R-squared is
overrated, a low R-squared does indicate that predictions will be imprecise. You can
see this at work in this post about models with low R-squared values. In practice,
I’ve found that models with R-squared values less than 70% produce fairly
imprecise predictions. Even higher R-squared values can be too imprecise
depending on your requirements. Again, the standard error of the regression and
prediction intervals are better and more direct measures of prediction precision.
So, the first step would be to identify where the problem lies. Does the model fit
the data? If not, resolve that. If it does fit the data, assess the precision of the
predictions.
I hope this helps!
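As a rough sketch of those two measures (synthetic data; the new x value of 4 is arbitrary):

```python
# Sketch: two direct measures of prediction precision.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 200)
y = 5.0 + 2.0 * x + rng.normal(0, 3, 200)

fit = sm.OLS(y, sm.add_constant(x)).fit()
print("Standard error of the regression:", np.sqrt(fit.mse_resid))

new_X = np.array([[1.0, 4.0]])  # column order matches [constant, x]
frame = fit.get_prediction(new_X).summary_frame(alpha=0.05)
print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])  # 95% prediction interval
```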
Reply
Sinks says
September 18, 2019 at 3:25 am
You can only use these assumptions if the sample is drawn from the main population, to
ensure that the results generated are close to, or a reflection of, the population.
However, there is no harm in using regression on the entire population (as in your case)
to assess the trend in the sales.
Reply
Jim Frost says
September 18, 2019 at 3:55 pm
Hi, I’d disagree with your statement slightly. It’s true that if you perform regression
analysis on the entire population, you don’t need to perform hypothesis testing. In
that light, the residuals don’t need to be normally distributed. However, other
assumptions certainly apply. There are other assumptions that address how well
your model fits the data points. For example, if your data exhibit curvature, your
model needs to fit that curvature. If it doesn’t, you’ll see it show up in the residuals.
Consequently, you still want to check the residuals vs. fits plot to ensure that the
residuals are randomly scattered around zero.
Additionally, if you want to use your model to make predictions, the prediction
intervals are valid only when the residuals are normally distributed. Consequently,
even when you’re working with a population, that normality assumption might still
be in effect!
Reply
Dan says
August 31, 2019 at 10:52 pm
Would you mind discussing (briefly, or pointing me in the right direction) the relationships
among unbiasedness, variance, and consistency for an estimator? Thanks!
Reply
Jim Frost says
September 2, 2019 at 2:14 pm
Hi Dan,
Unbiased means that there is no systematic tendency for the estimator to be too
high or too low. Overall, the estimator tends to be correct on average. When you
assess an unbiased estimator, you know that it’s equally likely to be too high as it
is to be too low.
Variance relates to the margin of error around the estimator. In other words, how
precise of an estimate is it? You want minimum variance because that indicates that
your estimator will be relatively close to the correct value. As variance increases,
the probability increases that the estimator is further away from the correct value.
Consistency indicates that as you increase the sample size, the value of the
estimator converges on the correct value.
I hope this helps!
Reply
Florentino Menéndez says
August 12, 2019 at 5:32 pm
Thanks a lot for your answer! There are some topics on which I need some additional
study. But now I have a place to look! Thanks again 🙂
Reply
Florentino Menéndez says
August 12, 2019 at 3:05 pm
First of all, congratulations on your book. I have bought it and find it very, very clear. I
have learned a lot of details that help round out my comprehension of the topic.
Again: congratulations and thanks for it.
Second thing: a question. I teach basic statistics, and a student brought me a linear
regression with repeated observations. There were 40 medical patients, each measured
three times. The file was 120 rows. In other respects the regression was OK, but I
objected that the observations were not independent, so the p-values were not real.
My student asked me how he could do the regression in order to use the 120
measurements, but I don’t know what we could do. I use Stata and SPSS.
Any help will be very much appreciated.
Reply
Jim Frost says
August 12, 2019 at 3:57 pm
Hi Florentino,
Thank you for buying my book! I’m very happy to hear that you found it to be
helpful!
I once had a repeated measures model where I tracked the outcome over time. For
that model, I simply used the change in the outcome variable as the dependent
variable. Perhaps that would work for your student?
I believe that repeated measures designs are more frequently covered in ANOVA
classes. I do have a post about repeated measures ANOVA. That post talks about
crossover designs where subjects are in multiple treatment groups–which may or
may not be relevant to your student’s study. However, it also discusses how to
include subjects as a random factor in the model, which is relevant and will give
you a glimpse into how linear mixed models work, which I discuss more below.
Mixed models contain both fixed effects and random effects. That post also
explains about how these models account for the variability of each subject.
Linear mixed models, also known as mixed effects models, are a more complex but
a very flexible type of model that you can use for this type of situation. This type of
model adds random effects to the model for the subjects. In other words, the
model controls for the variability of each subject. There are different types of linear
mixed models. Random intercept models account for subjects that always have
high or low values. Individual growth curve models describe each subject’s change
over time. These types of models are very flexible but also very advanced. I don’t have
much experience with them so I don’t want to give bad advice about what type to
use. Just be aware that they are complicated and easy to misspecify. If your student
goes this route, it’ll take some research to find the correct type and model
specification that meets their study’s requirements.
Another possibility is multilevel modelling. This type of model is particularly good
for nested data.
Again, it’ll take a bit of research combined with knowledge about the student’s data
and objectives to determine the best approach.
Hopefully this will at least help point your student in the right direction!
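For the random-intercept idea specifically, here is a minimal sketch in Python with statsmodels; the 40-patients-by-3-measurements layout mirrors the question, but the column names and effect sizes are invented:

```python
# Sketch: a random-intercept mixed model for 40 patients measured 3 times each.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
subjects = np.repeat(np.arange(40), 3)            # patient IDs, repeated per visit
subject_effect = rng.normal(0, 2, 40)[subjects]   # each patient's own baseline shift
time = np.tile([0, 1, 2], 40)
outcome = 10 + 1.5 * time + subject_effect + rng.normal(0, 1, 120)
df = pd.DataFrame({"subject": subjects, "time": time, "outcome": outcome})

# groups= tells the model which rows share a patient, handling the non-independence.
result = smf.mixedlm("outcome ~ time", df, groups=df["subject"]).fit()
print(result.summary())
```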
Reply
Gelgelo says
July 8, 2019 at 10:19 am
Hello Mr. Jim, it was helpful, but I have some questions to ask.
I am conducting some research on the socioeconomic impacts of droughts, and I want
to show some effects of droughts with logit models. E.g., I have pastoralist and
agro-pastoralist groups, and also sex, age, farm land …, and the effects are livestock
mortality, rangeland degradation, loss of services, and others. So how can I use logit
regression models with coefficients estimated using an ordinary least squares
regression?
Reply
Jim Frost says
July 9, 2019 at 10:41 am
Hi Gelgelo,
I’m not sure that I understand what you’re asking. Typically, you use logit models
for binary dependent variables and OLS for continuous dependent variables. That’s
usually how you decide.
I hope this helps!
Reply
Sophia says
June 29, 2019 at 4:49 pm
Very insightful article, Jim! I have a fairly basic question – I get that these assumptions
need to be checked when you’re working on a sample of the data, but what if I perform
linear regression on the entire population – would the same assumptions still need to
be satisfied?
I’m working on an inventory ordering project at school, and have sales data for a few
years for some products sold in a store. I’d like to regress the sales data for each
product against some independent variables (to understand which variables affect the
demand for a product), and I’m trying to figure out if Linear regression would be a
suitable model.
I’d greatly appreciate any insight – thanks!
Reply
Pavel grabov says
May 25, 2019 at 5:12 am
Thank you very much for your blog.
I have only one question: what about the assumption that only ‘Y’ is subject to
errors of measurement? We search for the best-fit line on the basis of the sum of the
distances between the experimental results and the fitted line, and we measure these
distances parallel to the Y-axis. In principle, these distances could be measured
parallel to the X-axis or orthogonal to the fitted line. Obviously, if the X-values are
supposed to be error-free, the distances should be measured parallel to the Y-axis,
but if this assumption is invalid, the linear model produced by Ordinary Least Squares
Linear Regression will be incorrect.
Best Regards,
Pavel
Reply
Jim Frost says
May 27, 2019 at 10:28 pm
Hi Pavel,
So, the assumption about no errors in the X-values is the ideal scenario. I don’t
think any model is going to satisfy that 100%. You just have to try to minimize
measurement errors of the independent variables (X-values). Most studies don’t
need to worry about this problem.
Reply
Justin Tusoe says
April 18, 2019 at 9:42 pm
hi Jim,
I must say the book is very helpful. I am doing research on the impact of human
capital development on economic growth in Ghana. There are thousands of factors that
affect economic growth. How do I choose the right variables?
Reply
Jim Frost says
April 18, 2019 at 11:42 pm
Hi Justin,
First off, thank you so much for buying my ebook about regression analysis! I’m
glad you’re finding it very helpful. One of the themes throughout the book is that
you need to conduct a lot of research before beginning your analysis. What variables
do other similar studies include? Which variables tend to be significant, what are
their signs and magnitudes, etc.? This will help you gather the correct data and
include the correct variables in the model. You need to gain a lot of subject-area
knowledge to know which variables you should include.
In the book, chapter 7 discusses how to specify and settle on a final model.
Best of luck with your analysis!
Reply
Alberto Javier Vigil-Escalera says
April 17, 2019 at 2:31 am
Hi Jim
First of all, thank you for so kindly getting back to me.
A couple of final questions.
In order to detrend the series, do you mean, for example, differencing them or using
percentage changes, like year-on-year changes?
If that is what you mean, I am doing that: my dependent variable is the % annual change
in the SP500, and I am doing the same with the independent variables. (Sorry for my lack
of clarity in my explanation.)
Also, by reading your excellent book (I am already on page 106), I realized that I may have
another problem: the equation that I am trying to find may be nonlinear, or if
linear, I may need to transform some of the variables. Could this also be an explanation
for the heteroscedasticity and the residual correlations that I have in my model?
How do I know which variables I should transform? How do I know which nonlinear
regression is the one I should look at? Is that also in your book?
Many thanks.
Reply
Jim Frost says
April 17, 2019 at 2:37 pm
Hi again Alberto,
Yes, that’s exactly what I meant. If you’ve done that, I would’ve thought that
would’ve also removed the heteroscedasticity. You might have to try other options
for resolving that issue, which I cover later in the book, starting on page 213.
You can certainly try a transformation. I have a section dedicated to transformations,
starting on page 244. Consider that a last resort, though. But, yes,
transformations can fit nonlinear relationships, fix heteroscedasticity, and fix
residual issues. However, while that all sounds great, save that for last. Try
the other solutions first. I do provide guidelines, tips, etc. for choosing
transformations and for which variables. However, choosing the correct
transformation method and variables to transform is a bit trial and error. You can
also look to see how other similar studies have handled it.
I hope this helps!
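For the detrending step itself, a small pandas sketch (the monthly series below is synthetic):

```python
# Sketch: converting a trending monthly series into year-over-year percent changes.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
level = pd.Series(100 + np.cumsum(rng.normal(0.5, 2, 240)))  # trending monthly series

yoy = level / level.shift(12) - 1   # 12-month percent change removes much of the trend
diff12 = level.diff(12)             # 12-month differencing is the additive alternative
print(yoy.dropna().head())
```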
Reply
Davidjackline says
April 16, 2019 at 1:40 pm
This is an awesome and brilliant elaboration. Thank you Jim.
Reply
Alberto Javier Vigil-Escalera says
April 14, 2019 at 12:02 pm
Hi Jim
Thank you very much for your blog, and congratulations on your book. I found it very
interesting and easy to understand, especially since I do not have a mathematical background;
I am just a lawyer who got involved in finance.
My question is: Can I still use the results of a regression that violates the rules on
heteroscedasticity and residual correlations?
Let me give you a bit of background.
I am doing regression analysis on the S&P 500. My dependent variable is the monthly
S&P 500 year-on-year return. My independent variables are the usual ones: activity
indicators, monetary indicators, volatility indicators. I have more than 200 monthly
observations, and I don’t have more than 7 independent variables.
All variables, dependent and independent, are used as year-on-year changes.
I run the regression and I get an R-squared above 0.7. I looked at F and it looks very significant.
Then I looked at the t-scores on the independent variables and they are all significant
(with the exception of the constant).
Also, the regression makes sense from an economic point of view. For example, the
more economic activity, the higher the returns, and the sign (+/-) of the monetary policy also
makes economic sense…
I looked for collinearity and there is none. So far so good, I think, but my lucky streak ends
right there.
When I checked for heteroscedasticity (B-P) and residual correlation (DW), it shows that
both exist.
I know that those are serious problems, but I still wonder:
1. Residual serial correlation means that there are still independent variables out there
that I should find. However, my R-squared is high and my F is too. So even if I don’t know
all the factors that move the S&P 500, I know a good number of them.
2. My heteroscedasticity makes my estimates of the S&P 500 very weak. However, could
I still use this model to give me the S&P 500’s direction instead of using it to find a specific
target return? I mean, could I use this model to tell me whether the S&P 500 could go up or
down from here, instead of using it to tell me whether it will go up 10% or -15%?
Thanks for your help.
Reply
Jim Frost says
April 16, 2019 at 11:24 am
Hi Alberto,
First, thanks for buying the book and I’m glad to hear that it was helpful!
The quick answer is that you really should fix both heteroscedasticity and
autocorrelation. Both of these conditions produce less precise coefficient estimates.
And, heteroscedasticity tends to produce p-values for the coefficients that are
smaller than they should be. So, you’re thinking they’re significant when they might
not be.
I also see an additional potential problem with your model. I think you’re going to
have long term positive trends in both economic activity and the S&P 500. When
you have trends in your data and perform regression analysis, you’ll get significant
results and an inflated R-squared. After all, both things are following long term
trends. What you really need to do is detrend the data. Then show how deviations
from the trend in the economic activity side of things related to deviations from
trends in the S&P 500 side of things. That might also help with your
heteroscedasticity (I have a section in the book about handling heteroscedasticity
you should read). Consider adding lag variables to reduce the autocorrelation. It
might be that previous economic activity relates to the current S&P 500. That might
be the type of information you’re seeing in the residuals. There’s a bunch of
additional things to consider with using regression analysis with time series data.
Predicting the stock market is very difficult. It’s not surprising that you have a weak
model. After all, if it was easy, everyone would be able to make a fortune predicting
the market!
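For the two diagnostics the question mentions (B-P and DW), here’s a compact sketch with statsmodels on synthetic data:

```python
# Sketch: testing for heteroscedasticity (Breusch-Pagan) and autocorrelation (Durbin-Watson).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 200)
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

lm_stat, lm_p, f_stat, f_p = het_breuschpagan(fit.resid, X)
print("Breusch-Pagan p-value:", lm_p)              # small p suggests heteroscedasticity
print("Durbin-Watson:", durbin_watson(fit.resid))  # values near 2 suggest little autocorrelation
```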
Reply
Kenji Kitamura says
February 10, 2019 at 3:07 pm
Thank you very much for your explanation. It became clear!
Reply
Kenji Kitamura says
February 9, 2019 at 9:13 pm
Thank you very much for your great explanation. It is so helpful.
I have a small question.
You state that
“if the average error is +7, this non-zero error indicates that our model systematically
underpredicts the observed values.”
The average error for simple linear regression is
E[e|x] = (1/n) Σᵢ [yᵢ − (a + b₁xᵢ)]
Thus, the size of the average error E[e|x] depends on the scale of the dependent and
independent variables. Therefore, I wonder why you can say +7 is a cutoff point for the
bias. I understand that we don’t really care about this in practice given that the constant
addresses this bias, but just curious about this claim.
Reply
Jim Frost says
February 9, 2019 at 9:33 pm
Hi Kenji,
I see I’m going to have to clarify that text! I was just using +7 as an example, and
didn’t mean to make it sound like a cutoff value for bias. If the average residual is
anything other than zero, the model is biased to some extent. You want the model
to be correct on average, which suggests that the mean of the residuals should
equal zero. Note that right after I mention +7, I refer to it as a non-zero value,
which is the real problem.
And, you’re correct, as I mention in the post, when you include the constant, the
average will always equal zero which eliminates the worry of an overall bias.
Although, you can still have local bias, such as when you don’t correctly model
curvature.
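A tiny sketch demonstrates the point about the constant (synthetic data):

```python
# Sketch: with a constant in the model, the residuals average to zero by construction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.normal(size=100)
y = 4.0 + 1.2 * x + rng.normal(size=100)

with_const = sm.OLS(y, sm.add_constant(x)).fit()
print(with_const.resid.mean())  # essentially zero (floating-point noise)

no_const = sm.OLS(y, x).fit()   # no constant forces the line through the origin
print(no_const.resid.mean())    # generally not zero
```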
Reply
Farhan says
February 2, 2019 at 1:01 am
Thanks Jim…. 😐
Reply
MahboobUllah says
December 17, 2018 at 11:22 am
The Best
Reply
CATHERINE NAMIRIMU says
December 5, 2018 at 8:10 am
Thanks Jim, the explanations are very easy to understand. It’s very interesting to study
from here.
Reply
kwaters126 says
November 6, 2018 at 3:14 pm
I think you meant to say 4 and 5 here:
“Note: When assumption 5 (no autocorrelation) and 6 (homoscedasticity) are both true,
statisticians say that the error term is independent and identically distributed (IID) and
refer to them as spherical errors.”
Otherwise, wonderful post
Reply
Jim Frost says
November 6, 2018 at 5:38 pm
Yes! Thank you for catching that! I’m making the change now.
Reply
John Grenci says
October 31, 2018 at 6:18 pm
Hey Jim, I happened to find your site and am hoping you can help me. I am doing a study
on predicting home run rates of actual baseball players. So, I set up criteria, a certain
number of plate appearances, etc., and modeled home run rate for a year based on
the previous home run rates (going back 5 years). I also have an age flag. All coefficients are
highly significant. I performed a test of normality. The R-squared for several thousand
observations is a little more than 0.6, so I think the fit is good. But here is my question,
and this question could apply in many contexts, I think. It deals with homoscedasticity.
It seems intuitively that this should rarely hold up. Why? Because isn’t it true that if you
have two (or more) ranges of similar values, the variances will be in a similar
proportion? In other words, take two rooms of males: one has 30 newborns, and one
has 30 twenty-year-olds. You are analyzing their weights. Assume the mean weight of the
newborns is 9, and the mean weight of the twenty-year-olds is 200. It is certain that
the variance of the newborns will be smaller than the variance of the twenty-year-olds, and
my best guess would be that the variances have the same ratio as the ratio of the
means (9 to 200). So, when predicting ANYTHING, whether it be
advertising predicting revenue, or previous home run rates predicting home run rates
for the upcoming season, it seems almost certain that, at least in the case of home
run rates, the same type of phenomenon will happen. So, much like the weights: among
the twenty-year-olds, you have some people who weigh 350 pounds and some who weigh 130.
It is IMPOSSIBLE to have that variability among newborns. I gave an extreme example
to illustrate my point. Thanks, John
Reply
Jim Frost says
November 1, 2018 at 2:41 pm
Hi John,
I’m glad you found my site! Great questions!
One potential issue I see for your model is the fact that you’re using the model to
make predictions and you have an R-squared of 0.6. Now, one thing I never do is
have a blanket rule for what an R-squared should be. That might be the perfectly
correct R-squared for the subject area. However, R-squared values that aren’t
particularly high are often associated with prediction intervals that are too wide to
be useful. I’ve written several posts about using regression to make predictions,
prediction intervals and precision, etc. that talk about this. One you should check
out is my post about how high does your R-squared need to be, and then maybe
some of the others.
Now, on to homoscedasticity. First, you should check out my post about
heteroscedasticity. It talks about the issues you discuss along with
solutions. You’re absolutely correct that when you have a large range of dependent
variable values, you’re more likely to have heteroscedasticity. In contrast, I often
use a height-weight dataset as an example, but it’s limited to young teen girls. It’s
more restricted and there’s no heteroscedasticity present, which fits in nicely as the
converse of your example.
That all said, I’m often surprised at how rarely heteroscedasticity appears outside
of extreme cases like the one you describe. Anyway, read that blog post, and if you
have questions after that, don’t hesitate to ask!
Reply
Rainard Mutuku says
October 30, 2018 at 5:24 am
Hey Jim, thanks.
Your presentation is well illustrated and precise.
Reply
ghazanfar says
October 22, 2018 at 3:40 pm
Sir, thank you; you make statistics easy for me with your good explanations, but one thing is
confusing me: which test is best to check for heteroscedasticity?
Reply
Amit says
September 7, 2018 at 3:13 am
Since you replied, sir, I am elated to ask about some of my doubts:
a) Sir, we know that the expectation of the errors being zero is a basic assumption, but we
also get the summation of errors = 0 as the first constraint from LSM (the least squares
method). Now, what is the difference between the two? I think the linear regression line
always passes through the center of the points, but only LSM minimizes the errors;
E(e) = 0 even if we do not use LSM (OLS). Am I right?
b) Sometimes our software fits a line to curved data, and then also E(e) = 0. Then we
need to add squared terms or transformations to meet homoscedasticity, and still E(e) = 0,
meaning the software always tries to fit so that E(e) = 0.
Reply
Jim Frost says
September 7, 2018 at 9:18 am
Hi Amit,
I’m not 100% sure that I understand your questions. But, yes, the expectation that
errors are zero and the summation of errors equaling zero are related.
Furthermore, if you include the constant in your model, you’ll automatically satisfy
this assumption. Read my post about the regression constant for more information
about this aspect.
However, what I find is that while the overall expectation is that the error equals
zero, you can have patterns in the residuals where it won’t equal zero for
specific ranges. The classic example of that is where you try to fit a straight
line to data that have curvature. You might have ranges of fitted values that
systematically under-predict the observed values and other ranges that
over-predict them, even though the overall expectation is zero. In that case, you need to fit
the curvature so that those patterns in the residuals no longer exist. In other
words, having an overall expectation equal to zero is not sufficient. Check those
residual plots for patterns. I talk about this in my post about residual plots.
I don’t know about your software and what it does automatically, but in general the
analyst needs to be sure not only that the overall expectation equals zero,
which isn’t a problem when you include the constant, but also that there are no ranges
of fitted values that systematically over- and under-predict. Again, read my post
about checking the residual plots!
I hope this helps!
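Here’s a minimal sketch of the kind of residual pattern described above, using synthetic curved data and a deliberately misspecified straight-line fit:

```python
# Sketch: a straight-line fit to curved data leaves an arch in the residuals vs. fits plot.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(10)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x - 0.4 * x**2 + rng.normal(0, 1, 200)  # curved relationship

fit = sm.OLS(y, sm.add_constant(x)).fit()  # straight line only: misspecified on purpose
plt.scatter(fit.fittedvalues, fit.resid, s=10)
plt.axhline(0, color="gray")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()  # ranges that systematically over- and under-predict show up as the arch
```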
Reply
Uma Shankar says
September 6, 2018 at 10:41 am
Hi Jim,
I agree with the point that Y need not follow a normal distribution, as we don’t know the
distribution of the population of Y. However, the sample statistics, i.e., the regression
coefficients or parameter estimates, follow a normal distribution (thanks to the Central
Limit Theorem: the sampling distribution of the sample mean follows a normal distribution).
In that case, since Y-hat is a linear combination of parameter estimates, shouldn’t it turn
out that Y-hat follows a normal distribution?
A linear combination of normally distributed random variables results in a normal
distribution.
Thank you.
Reply
Jim Frost says
September 11, 2018 at 12:28 am
Hi Uma,
Sorry about the delay in replying!
As it turns out, y-hat doesn’t necessarily follow a normal distribution even though it
is a linear combination of parameter estimates.
If the residuals are normally distributed, it implies that the betas are also normally
distributed. That part is true. It would also seem to imply that the y-hats are also
normally distributed but that’s not necessarily true. However, if you include
polynomials to model curvature, they can allow the model to fit nonnormally
distributed Ys and yet still produce normally distributed residuals. Even though it is
modeling curvature, it is still a linear model. I actually have an example of this
using real data, which you can download–using regression to make predictions. I
don’t mention it in the post, but the dependent variable is not normally distributed.
Because the model provides a good fit, we know that the y-hats are also
nonnormal.
I hope this helps!
Reply
Uma Shankar says
September 6, 2018 at 9:22 am
The expected value of the error is still zero, as it is assumed that the mean value of the
errors clusters around zero. However, the errors need not be normally distributed, which
is not a strict assumption even in OLS regression.
In linear regression, Y-hat is a linear combination of parameter estimates with the expected
value of the error being zero, as the errors are assumed to be IID with mean clustered
around zero. The same applies here as well, because the errors are independent and all
independent variables are exogenous.
My question here is, how can Y-hat satisfy the normality assumption (it being a
sampling distribution), since here Y-hat is not a linear combination of parameter estimates,
unlike in linear regression? How do the inferential statistics work here?
Jim, please help with the analysis and correct me if I’m wrong here about the expected
error being zero in the question asked.
Thanks, all.
Reply
Jim Frost says
September 6, 2018 at 9:58 am
Hi Uma,
Neither Y-hat nor Y need to follow the normal distribution. The assumptions all
apply to the residuals for both linear and nonlinear regression. While the residuals
don’t need to be normally distributed, it is helpful if you want to perform
hypothesis testing and generate confidence intervals and prediction intervals. Does
that answer your question?
Reply
Amit says
September 6, 2018 at 4:15 am
If the function is Y = e^(Xb), is E(e) = 0 or not? In other words, if it is not linear
regression, will the expectation of the errors be zero? Why or why not?
Reply
Jim Frost says
September 6, 2018 at 9:35 am
Hi Amit,
The assumptions for the residuals from nonlinear regression are the same as those
from linear regression. Consequently, you want the expectation of the errors to
equal zero. If you fit a model that adequately describes the data, that expectation will
be zero. Of course, if the model doesn’t fit the data, it might not equal zero. But,
that is the goal!
I hope this helps!
Reply
Riana says
August 26, 2018 at 8:33 am
This is just wonderfully written! Thank you so much! I often heard this iid assumption,
but never quite knew what was meant by it! I will definitely read all your other posts.
I hope you will also easily explain the field of time series econometrics and/or
asymptotics anytime soon 🙂
Reply
Jim Frost says
August 26, 2018 at 11:07 pm
Hi Riana,
Thank you so much! Your kind words mean a lot to me!
I plan to write about those other topics at a future date, but there’s so much to
write about!
Reply
Uma Shankar Surreddy says
August 22, 2018 at 3:00 pm
Hi Jim, wonderful explanation. I have a doubt about assumption 2, “The error term has a
population mean of zero.” Isn’t this about the residuals and not the error/disturbance term?
Because the error/disturbance term (a population object) is ideally independent of, or
uncorrelated with, the other errors, and their sum is almost never zero. But in the case of a
sample statistic, like the sample mean, the residuals are not independent and hence produce
a mean value of zero. Please correct me if I’m wrong.
Reply
Jim Frost says
August 23, 2018 at 2:29 am
Hi Uma,
The error term is an unknown just like the true parameter values. The coefficients
estimate the parameters while the residuals estimate the error term. Ideally, the
error terms have a zero mean and are independent of each other. Because we can’t
know the real errors, the best we can do is to have a model that produces residuals
with these properties.
So, yes, the error term can and should have a mean of zero. But, we can only use
the residuals to estimate these properties. Consequently, the residuals should have
a mean of zero and be independent of each other.
I hope this helps!
Reply
Giulio Graziani says
June 6, 2018 at 5:26 pm
This is gold Jim thanks a lot!
Reply
Felix Ajayi says
June 3, 2018 at 11:47 pm
Sir, you are simply wonderful. Your post is reader-friendly. Kindly send this piece to my
email:
f********@*****.com
I want to follow you for guidance in learning and teaching econometrics and, more
importantly, in running the analyses for my academic research.
Regards.
Jim Frost says
June 6, 2018 at 2:49 pm
Thank you, Felix! That means a lot to me. I removed your email address from your
comment for privacy. I don't have anything to email now, but I'll save your email
address for when that occasion arises. You can always receive alerts about new
posts by filling in the subscribe box in the right navigation pane. I don't send any
junk mail!
Isaac kojo Annan Yalley says
June 3, 2018 at 9:37 am
Thanks for making statistics easy and understandable for us.
Jim Frost says
June 6, 2018 at 2:47 pm
You’re very welcome, Isaac. I’m glad my website has been helpful!
Tavares says
June 1, 2018 at 1:47 pm
Thank you. I appreciated the content.
Jim Frost says
June 1, 2018 at 1:53 pm
You’re very welcome. I’m glad it was helpful!