Uploaded by Joshua Price (YoshiB)

Econometrics Lecture Notes: Regression, Hypothesis Testing

Matrix algebra: A matrix of dimensions
(a x b) has a rows and b columns.
The order n of a vector is the number of
elements (rows or columns) in the vector.
For a one-sided alternative beta < 0,
reject if tcalc < -tcrit; for a two-sided
alternative beta not equal to 0, reject if
|tcalc| > tcrit, using half the
significance level in each tail. P values
in EViews assume a two-tailed test, so if
a one-sided test is used the p value must
be halved to account for this.
Result of AB: element (i, j) is the sum
over k of a_ik * b_kj, so the columns of
A must match the rows of B.
Transpose: rows become columns.
A matrix is symmetric if it equals its
transpose. Trace is the sum of the
diagonal elements. If AB = BA = Identity,
then B is the inverse of A.
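
As a rough illustration of the definitions above, the sketch below checks symmetry, the trace, and the inverse property on a small example. The matrices A and B and the helper names are made up for illustration and are not from the lecture.

```python
# Pure-Python helpers; a matrix is a list of rows.
def transpose(m):
    # Rows become columns.
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Element (i, j) is the sum over k of a[i][k] * b[k][j].
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def trace(m):
    # Sum of the diagonal elements of a square matrix.
    return sum(m[i][i] for i in range(len(m)))

A = [[2.0, 1.0], [1.0, 1.0]]    # hypothetical symmetric matrix
B = [[1.0, -1.0], [-1.0, 2.0]]  # candidate inverse of A
I2 = [[1.0, 0.0], [0.0, 1.0]]

is_symmetric = transpose(A) == A
is_inverse = matmul(A, B) == I2 and matmul(B, A) == I2
```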
Data structures: Cross-sectional data are
observations taken at the same point in
time. Time series data are taken at
different points in time. Pooled
cross-sectional data are cross-sections
taken at different points in time.
Panel data follow the same cross-sectional
units at different points in time
(earnings of the same workers over time).
Prob of A given B: P(A|B) = P(A and B) / P(B)
Discrete variables can only take a
countable set of values. Continuous
variables can take infinitely many values
within a range.
Expected value is the sum of outcomes
multiplied by their probability, or the
integral of such for a continuous variable.
Variance measures how tightly clustered
values are around the mean, $\sigma^2$.
Standard deviation always has the same
units as the variable. Covariance measures
how strongly two variables move together.
If independent, it will equal 0.
Correlation, Cov(X,Y) / (sd(X) sd(Y)),
can only be between or equal to -1 and 1.
Is equivalent to
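OLS Estimator: The best coefficients
are found by taking partial derivatives
of the sum of squared residuals and
setting them to zero, which gives the
values that make the squared error the
lowest. In simple regression, B1 is equal
to the sample covariance of x and y over
the sample variance of x.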
When using OLS, properties arise. 1. The
sample values of the OLS residuals sum
to zero. 2. The residuals are orthogonal
to x. 3. The mean of the sample y values
and the mean of the fitted values
are equal.
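
A minimal sketch of the simple-regression estimator and a numerical check of the three properties listed above. The data and the function name `ols_simple` are invented for illustration.

```python
def ols_simple(x, y):
    """Return (b0, b1) for y = b0 + b1*x, with b1 = Cov(x, y) / Var(x)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx          # the 1/(n-1) factors cancel in the ratio
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up sample
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sum_resid = sum(resid)                                   # property 1
resid_dot_x = sum(u * xi for u, xi in zip(resid, x))     # property 2
mean_gap = sum(y) / len(y) - sum(fitted) / len(fitted)   # property 3
```

All three check values should be zero up to floating-point rounding.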
SST = SSE + SSR and
R^2 = SSE/SST = 1 - SSR/SST,
where SST is the total sum of squares,
SSE is the amount explained by the
regressors, and SSR is the
residual sum of squares.
Basic T Test:
1. State null and alternative hypotheses
2. State significance level of test
3. State t stat and its distribution
under the null, t(n-k-1), where
k = number of regressors
4. Calculate t stat
5. Find t crit from tables
6. State decision rule and decision
“since X>x, we reject the null of (null) in
favour of alternative that (alternative)”
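
The steps above can be sketched as a small helper. The critical value is passed in as an argument because the notes read it from tables; the function name and the numbers are hypothetical.

```python
def t_test(beta_hat, beta_null, se, t_crit, alternative="two-sided"):
    """Return (t_calc, reject) using a critical value read from tables."""
    t_calc = (beta_hat - beta_null) / se
    if alternative == "two-sided":
        reject = abs(t_calc) > t_crit   # t_crit uses half the level per tail
    elif alternative == "greater":
        reject = t_calc > t_crit
    elif alternative == "less":
        reject = t_calc < -t_crit
    else:
        raise ValueError("alternative must be two-sided, greater or less")
    return t_calc, reject

# Hypothetical estimate 0.5 with standard error 0.2, tested against 0.
# t_crit = 2.0 is roughly the 5% two-sided value for about 60 df.
t_calc, reject = t_test(0.5, 0.0, 0.2, t_crit=2.0)
```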
By defining a new parameter S from the
hypothesis (e.g. S = B1 - B2) and
substituting B1 = S + B2 into
the equation, a t test on the significance
of S can be used to test the
hypothesis about B1 and B2.
Confidence interval: By using
$\hat\beta_j \pm c \cdot se(\hat\beta_j)$,
where c is the critical value from the
textbook tables and a is the significance
level, a 100(1-a)% confidence interval can
be found from the estimated equation. This
can also be used to test whether a
coefficient can be a certain value: reject
that value if it lies outside the interval.
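
A short sketch of the interval calculation, assuming the usual form estimate plus or minus c times the standard error. The numbers are hypothetical and c is taken from tables.

```python
def confidence_interval(beta_hat, se, c):
    """Interval beta_hat +/- c * se, with c the critical value from tables."""
    return beta_hat - c * se, beta_hat + c * se

# Hypothetical estimate and standard error, c = 2.0 for roughly 95% coverage.
low, high = confidence_interval(0.5, 0.2, 2.0)

# A hypothesised value is rejected when it falls outside the interval.
zero_rejected = not (low <= 0.0 <= high)
```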
F Test:
1. Find the restrictions and impose
them, making an unrestricted model and
a restricted model (e.g. b1=b2, substituted
into the unrestricted model)
1a. State the significance level
2. Estimate both equations and get the
SSR of both models
3. Test statistic
$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}$
where q is the number of restrictions (=
signs) and k is the number of regressors
in the unrestricted model.
4. Find F crit from tables and calculate F
5. Reject if Fcalc > Fcrit.
Used to test whether all restrictions hold
jointly or whether one or more of them
does not hold. E.g. b1=b2=b3=0 would be
rejected by the F test if b1 was not 0.
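
A minimal sketch of step 3, assuming the standard SSR form of the F statistic. The SSR values, sample size and critical value below are hypothetical.

```python
def f_statistic(ssr_r, ssr_ur, q, n, k):
    """F = ((SSR_r - SSR_ur) / q) / (SSR_ur / (n - k - 1)).

    q: number of restrictions, k: regressors in the unrestricted model.
    """
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# Hypothetical results from the restricted and unrestricted regressions.
f_calc = f_statistic(ssr_r=120.0, ssr_ur=100.0, q=2, n=104, k=3)

f_crit = 3.1  # placeholder for the 5% value read from tables for (2, 100) df
reject_null = f_calc > f_crit
```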
Dummy variables: Intercept dummy
variables stand on their own, whereas
slope dummy variables are multiplied by
another variable. E.g. an intercept dummy
would be a female dummy variable with a
coefficient attached, whereas a slope
dummy would be a female dummy variable
multiplied by an education variable with
a coefficient. Therefore, the dependent
variable does not just shift based on male
or female, but the effect of education on
the dependent variable also differs by
gender.
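
To make the distinction concrete, the sketch below evaluates a fitted equation with both an intercept dummy and a slope dummy. The wage example and all coefficient values are hypothetical.

```python
def predicted_wage(educ, female, b0, b_educ, d_female, d_female_educ):
    """Fitted value with an intercept dummy and a slope dummy.

    d_female shifts the intercept for women; d_female_educ changes
    the slope on education for women.
    """
    return (b0 + b_educ * educ
            + d_female * female
            + d_female_educ * female * educ)

coefs = dict(b0=5.0, b_educ=1.0, d_female=-2.0, d_female_educ=-0.25)

male_12 = predicted_wage(12, 0, **coefs)
female_12 = predicted_wage(12, 1, **coefs)
female_13 = predicted_wage(13, 1, **coefs)

# Return to one extra year of education for women: b_educ + d_female_educ
female_slope = female_13 - female_12
```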
Perfect Multicollinearity: Arises when
there is an exact linear relationship
between some or all the regressors.
Typically occurs due to mistakes in
including dummy variables. Cannot use
OLS if present. This can potentially be
solved by omitting certain dummy
variables to form a base category, or by
omitting the intercept.
Near Multicollinearity: This occurs
when there is not an exact linear
relationship, but some regressors are
still highly correlated. Results in the
standard errors for t tests being large,
which makes it more likely that the null
is not rejected. However, F tests
can still be used.
Regression with logs: Log-level
coefficients measure the % change in y
(approximately 100 x beta) given a one
unit change in an x variable.
A log-log or log-linear model is when both
an x and the y are logged. It
measures the percentage change in y
from a one percent change in x.
When using logs, the variable must
be strictly positive, and you
must consider whether it is more
helpful to have variables in % form
or level form. Variables measured in
% are not logged.
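
A small sketch of the log-level interpretation. It compares the usual approximation (100 x beta) with the exact percentage change, 100 x (exp(beta) - 1), which is a standard result rather than something stated in the notes. The coefficient 0.08 is hypothetical.

```python
import math

def pct_change_log_level(beta, dx=1.0):
    """Percentage change in y for a change dx in x when log(y) is regressed on x."""
    approx = 100.0 * beta * dx                    # rule of thumb
    exact = 100.0 * (math.exp(beta * dx) - 1.0)   # exact change implied by the model
    return approx, exact

approx, exact = pct_change_log_level(0.08)
```

The two figures are close for small coefficients and drift apart as beta grows.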
Quadratic regression: The marginal
effect of an x variable that enters with
a power is the derivative with respect to
that x variable. E.g. when the model
includes $\beta_1 x + \beta_2 x^2$, the
marginal effect is equal to
$\beta_1 + 2\beta_2 x$.
Therefore the marginal effect varies
depending on xi. Finding the turning
point involves differentiating the
estimated equation and solving for the x
at which the derivative = 0, i.e.
$x^* = -\hat\beta_1 / (2\hat\beta_2)$.
This shows that minutes of sleep per
week decrease up until age 42.5,
where it reaches a minimum, then
starts to increase.
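
A sketch of the turning-point calculation. The coefficients below are hypothetical, chosen only so that the turning point lands at 42.5; they are not the estimates from the sleep example.

```python
def marginal_effect(b1, b2, x):
    """Derivative of b1*x + b2*x**2 with respect to x."""
    return b1 + 2.0 * b2 * x

def turning_point(b1, b2):
    """Value of x at which the marginal effect equals zero."""
    return -b1 / (2.0 * b2)

b_age, b_age_sq = -8.5, 0.1   # hypothetical coefficients on age and age^2

tp = turning_point(b_age, b_age_sq)
effect_at_30 = marginal_effect(b_age, b_age_sq, 30)   # before the turning point
effect_at_50 = marginal_effect(b_age, b_age_sq, 50)   # after the turning point
```

With a negative coefficient on age and a positive one on age squared, the effect is negative before the turning point and positive after it, so the turning point is a minimum.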
R^2: R^2 will never decrease when
more explanatory variables are added.
Use when y
is logged.
Level-log measures the change in the
level of y (approximately beta/100 units)
when a one percent change in
the level of an x variable is made.
Adjusted R^2 cannot be used to
compare models with different dependent
variables.
Information criteria: Using the formula
Where C and P(k) are a constant and a
function defined by the type of
Information criteria.
AIC (Akaike Information Criterion),
SIC/BIC (Schwarz/Bayesian), HQ
(Hannan-Quinn). The preferred model is the
one with the lowest value of the criterion.
Penalties when n>16 are ordered
SIC>HQ>AIC.
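
The penalty ordering can be checked numerically. The notes' own formula is not reproduced here, so the sketch assumes one common textbook form of the penalty terms: 2k/n for AIC, 2k ln(ln n)/n for HQ, and k ln(n)/n for SIC.

```python
import math

def penalties(n, k):
    """Penalty terms under the assumed textbook forms (not taken from the notes)."""
    return {
        "AIC": 2.0 * k / n,
        "HQ": 2.0 * k * math.log(math.log(n)) / n,
        "SIC": k * math.log(n) / n,
    }

p15 = penalties(15, 3)   # just below the threshold: HQ is still below AIC
p16 = penalties(16, 3)   # from n = 16 onwards: SIC > HQ > AIC
```

Under these forms HQ overtakes AIC once ln(ln n) exceeds 1, which first happens at n = 16.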
Statistical properties: An estimator of a
coefficient is unbiased if the
mean of the estimator over repeated
samples is equal to the true value,
$E(\hat\beta) = \beta$.
For this to apply to an estimated model,
the following assumptions are required.
1. Linear in parameters: $y = X\beta + u$
2. Columns of X are linearly independent:
No column can be written as a linear
function of the others.
3. Zero conditional mean assumption:
Expected value of u given X is equal to 0,
which implies that all correlation between
errors and x variables = 0.
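Variance covariance matrices for a
standard linear regression have the
variances of the coefficient estimates on
the diagonal and their covariances off the
diagonal: $Var(\hat\beta|X) = \sigma^2 (X'X)^{-1}$.
These variances are used to find the
standard error and confidence interval of
each coefficient. They are found using
$\widehat{Var}(\hat\beta_j) = \hat\sigma^2 a_{j+1,j+1}$,
where $a_{j+1,j+1}$ is the element in row
j+1 and column j+1 of the matrix
$(X'X)^{-1}$, and $\hat\sigma^2 = SSR/(n-k-1)$,
which is also the estimator of the variance
of the error term ui. Taking the square
root of the estimated variance gives
the SE of betaJ.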
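
A sketch of the standard-error calculation for the simple regression case, where X'X is 2 x 2 and can be inverted by hand. The data and function name are made up, and the degrees of freedom n - k - 1 with k = 1 are assumed for the variance estimator.

```python
import math

def ols_se_simple(x, y):
    """OLS of y on a constant and x via (X'X)^{-1} X'y, with standard errors."""
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]]; invert the 2 x 2 matrix directly.
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 1 - 1)   # n - k - 1 with k = 1 regressor
    # Python index j (0-based) is row/column j+1 of (X'X)^{-1}.
    se = [math.sqrt(sigma2_hat * inv[j][j]) for j in range(2)]
    return (b0, b1), sigma2_hat, se

# Made-up sample
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
(b0, b1), sigma2_hat, se = ols_se_simple(x, y)
```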
Gauss-Markov Theorem: Two estimators
that are unbiased can be compared by
their variance, where the lower
variance makes it the better
estimator. The theorem states that
when the three assumptions for
unbiasedness hold, as well as an
additional assumption that
$Var(u|X) = \sigma^2 I_n = Var(y|X)$, i.e. that
errors are homoskedastic and serially
uncorrelated, then the OLS estimator is
BLUE (best linear unbiased estimator).
Heteroskedasticity: When the variance
differs between error terms.
Can be caused by many different
things, e.g. greater variance in the
amount of food consumed at higher
income levels. The OLS estimator is then
no longer BLUE, and the default standard
errors are incorrect, so t and F tests
will be incorrect also. To detect HTSK, a
scatterplot can be inspected to see
whether the spread of points changes, or
tests can be done.
Breusch-Pagan test
1. Null is that all slope coefficients in
the aux regression = 0, alternative is
that at least one does not = 0.
2. Estimate model to obtain residuals
3. Create an auxiliary regression of the
squared residuals u^2 on the same
regressors. Obtain the R^2 from that
regression.
4. Create test statistic:
$BP = nR^2 \sim \chi^2_q$ under the null,
where n is sample size and q is number
of regressors. (x is chi in tables)
5. Reject if calc is higher than crit from
tables. Alternatively, estimating the aux
regression and doing an F test also works.
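
The steps above can be sketched for a model with one regressor, assuming the n x R^2 form of the statistic. The data are invented, with errors that grow with x, and the helper names are hypothetical.

```python
def fit_simple(x, y):
    """OLS of y on a constant and x; returns (residuals, R^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum(u * u for u in resid)
    return resid, 1.0 - ssr / sst

def breusch_pagan(x, y):
    """BP = n * R^2 from regressing squared residuals on x."""
    resid, _ = fit_simple(x, y)          # step 2: residuals from the model
    u2 = [u * u for u in resid]
    _, r2_aux = fit_simple(x, u2)        # step 3: auxiliary regression
    return len(x) * r2_aux               # step 4: test statistic

# Made-up data whose error spread grows with x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
e = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
y = [2.0 * xi + ei for xi, ei in zip(x, e)]

bp = breusch_pagan(x, y)
chi2_crit = 3.841   # 5% critical value of chi-squared with q = 1 df
reject_homoskedasticity = bp > chi2_crit
```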
White test is the same process, however
the alternative hypothesis is different:
H1: the variance is a smooth function of
xi1…xik. In the auxiliary regression
include values, squared values, and cross
products of the regressors;
q is the number of regressors in the aux
regression.
Use White standard errors and the Wald
statistic in place of default standard
errors and F stat for tests; however, in
small samples this can be misleading as
they are only justified in large samples.
Serial correlation: When there is
correlation between error terms,
common in time series data. Same
consequences as heteroskedasticity.
Can be observed from a line graph of
residuals by looking for a pattern.
Correlogram shows Corr(ut, ut-j). If
column 1 has values outside the bands,
reject the null that Corr(ut, ut-j)=0, and
use column 3 to identify the order j of
the lags (ut-1, ut-2, etc.) that fall
outside the bands. The Breusch-Godfrey
test has the null that the coefficients on
the lags of the error term are all equal
to 0, with the alternative stating that
at least one is not = 0.
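Number of lags used depends on the data,
e.g. use 4 if using quarterly data.
Estimate the model equation to obtain OLS
residuals, then obtain R^2 from the
auxiliary regression, which regresses the
residuals on the original regressors and
the chosen lags of the residuals. Test
statistic: BG is computed from the sample
size and this R^2 (n R^2, or (n - q) R^2
in some textbooks) and is chi-squared with
q degrees of freedom under the null,
where q is number of lags. Then find
chi values from the table and if BG is
bigger the null is rejected. HAC standard
errors are used over HTSK errors, as they
use a different formula for the variance
matrix that also allows for serial
correlation.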
Modelling dynamics: Time series
often trend over time and the growth can
be exponential, which can be captured by
logging the dependent variable.
Seasonality is where data shows recurring
patterns in certain time frames; it is
often removed from data and can be
modelled by using seasonal dummy
variables and testing for significance.
Structural change occurs when an
event causes the trend to switch e.g.
2008. Static models do not include lags
of regressors whereas dynamic models
do. Autoregressive models are simply
an intercept, lags of the dependent
variable with coefficients, and an error
term that is white noise (mean = 0,
Var=sigma^2, no correlation with lags) or
iid (independent errors). AR(p) has p lags
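of itself. AR(1) is the simplest, with
$y_t = \varphi_0 + \varphi_1 y_{t-1} + u_t$,
where $\varphi_0$ is the intercept.
A stationary series has a constant mean,
constant variance, and a covariance
between points a set distance apart that
is constant. For a stationary AR(1)
($|\varphi_1| < 1$):
$E(y) = \frac{\varphi_0}{1 - \varphi_1}$
$Var(y) = \frac{\sigma^2}{1 - \varphi_1^2}$
$Cov(y_t, y_{t-j}) = \varphi_1^j Var(y_t)$
$Corr(y_t, y_{t-j}) = \varphi_1^j$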
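An ARDL(p,q) model is one with p lags
of the dependent variable and q lags of
the x. Long run effect of a one unit
change in x is the sum of the coefficients
on x and its lags divided by one minus the
sum of the coefficients on the lags of the
dependent variable.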
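
A rough sketch of two calculations from this section: the moments of a stationary AR(1) and the long-run effect in an ARDL model. The symbol phi0 for the AR(1) intercept, the function names, and all numbers are assumptions made for illustration.

```python
def ar1_moments(phi0, phi1, sigma2, j):
    """Mean, variance, lag-j covariance and correlation of a stationary AR(1)."""
    if abs(phi1) >= 1.0:
        raise ValueError("AR(1) is not stationary when |phi1| >= 1")
    mean = phi0 / (1.0 - phi1)
    var = sigma2 / (1.0 - phi1 ** 2)
    cov_j = phi1 ** j * var
    corr_j = phi1 ** j
    return mean, var, cov_j, corr_j

def long_run_effect(y_lag_coefs, x_coefs):
    """Sum of coefficients on x and its lags over (1 - sum of coefficients on lags of y)."""
    denom = 1.0 - sum(y_lag_coefs)
    if denom == 0.0:
        raise ZeroDivisionError("long-run effect undefined when lag coefficients sum to 1")
    return sum(x_coefs) / denom

# Hypothetical AR(1): y_t = 1 + 0.5 y_{t-1} + u_t with Var(u) = 3
mean, var, cov2, corr2 = ar1_moments(phi0=1.0, phi1=0.5, sigma2=3.0, j=2)

# Hypothetical ARDL(1,1): coefficient 0.5 on y_{t-1}, 0.25 on x_t and on x_{t-1}
lr = long_run_effect([0.5], [0.25, 0.25])
```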
Large sample properties of OLS:
Time series regressions often violate
E(u|X)=0, so OLS is not unbiased, but it
can instead be consistent. This means that
as sample size grows, the estimator
becomes closer to the true value.
This requires a new assumption 3
(E(ut|xt)=0), which does not always
hold, e.g. when there is correlation
between a lagged dependent variable and
the error term.
Additionally, new assumption 4
(Var(ut|xt)=sigma^2 and E(ut us|xt,
xs) = 0 for all t not equal to s) allows
the OLS estimator to be asymptotically
normal. This allows the usual OLS
inference if n is large; if assumption 4
does not hold, use HAC standard errors.
Nonstationary time series: Any time
series with a trend is non-stationary.
A deterministic trend depends on
time, so the mean is time
dependent. If the detrended series is
stationary, it does not include a unit
root. The detrended series is what is left
after removing the trend. A unit
root/stochastic trend follows an AR(1)
process with a lag coefficient of 1,
where mean = initial y value and
Var(y) = (sigma^2)*t. A random walk
model is an AR(1) process with no
intercept and a lag coefficient of 1, so
its trend is only stochastic, and a random
walk model with drift is the same but with
an intercept constant, which adds a
deterministic trend. MLR1-5: 1. Linear in
parameters 2. Random sampling 3.
No perfect collinearity, which implies the
columns of X are linearly
independent and X'X is invertible
4. Zero conditional mean, which implies
Cov(u,x)=0, so u and x are uncorrelated
and orthogonal to one another, which also
results in E(X'u)=0 5. Homoskedasticity