
Lecture Notes for Econometrics I

Econometrics I
Spring Term 2024
Bachelor of Science in Economics
Bachelor of Business Administration
Khatai Abbasov, PhD candidate
ADA University
© Khatai Abbasov
0. Introduction
Instructor:
 Khatai R. Abbasov
 MSc Graduate from FAU Erlangen-Nürnberg
 PhD Candidate @ Istanbul University
 Adjunct Lecturer @ ADA University
 Analyst @ Pasha Holding
 Office Hours: Please send an e-mail.
 Email: kabbasov@ada.edu.az
 Slides: © All rights reserved.
0. Introduction
Organizational issues:
1. Lecture
 Saturday, 08:30 – 09:45, A 110
 Saturday, 10:00 – 11:15, A 110
2. Application Classes
 Check the schedule in the Syllabus
0. Introduction
Assessment:
1. Attendance – 5%, on individual basis
2. Research project presentation 1 – 5%, on individual/group basis – February 24
3. Midterm examination – 30%, on individual basis – March 30
4. Research project presentation 2 – 20%, on individual/group basis – April 27 – May 4
5. Research project paper – 10%, on individual/group basis – May 13, by 23:59
6. Final examination – 30%, on individual basis – May 25
0. Introduction
Lecture notes:
 You will find the lecture notes and other materials on the online platform.
 Please register on the online platform for this course.
Software:
 We are going to use Excel & R this semester. R is a free software environment.
 Check the links in Section 8 of the Syllabus if you have not downloaded R yet.
 We are going to learn the basics of R during the upcoming sessions.
 Do you want to learn R on your own? Then check the links in the Syllabus.
0. Introduction
Literature:
 Lecture closely follows:
 Wooldridge, J. M. (2016). Introductory Econometrics: A Modern Approach (6th ed.). Cengage Learning.
 Textbooks below are also recommended:
 Greene, W. H. (2012). Econometric Analysis (7th ed.). Pearson Education Limited.
 Studenmund, A. H. (2017). Using Econometrics: A Practical Guide. Pearson.
 Verbeek, M. (2004). A Guide to Modern Econometrics (2nd ed.). John Wiley & Sons Ltd.
0. Introduction
Notations:
 In terms of notation, the lecture notes closely follow the book, e.g.,
 Greek characters (α, β, γ, …) denote parameters
 Estimates are usually indicated by 'hats' (α̂, β̂, γ̂, …)
 Matrices and vectors are written in bold characters
 Please consider: the lecture notes are not perfect (yet)! Please let me know if you see any mistakes or if you are confused.
0. Introduction
Content (corresponding chapters in the book):
0. Introduction (Review)
1. The Simple Regression Model (Chapter 2)
2. The Multiple Regression Model (Chapter 3)
3. Hypothesis Testing in the Regression Analysis (Chapter 4)
4. Nonlinearities in the Regression Models (Chapter 6)
5. Binary (or Dummy) Variables (Chapter 7)
6. Heteroscedasticity (Chapter 8)
7. More on Specification and Data Issues (Chapter 9)
0. Introduction
Outline of Section 0:
0.1 Cross-Sectional Data
0.2 Time Series Data
0.3 Pooled Cross-Sectional Data
0.4 Panel Data
0.5 Pooled Cross Section vs Panel Data
0.6 Parametric vs Non-Parametric Tests
0.7 Application: Basic Concepts of Statistical Theory
0. Introduction
0.1 Cross-Sectional Data
0. Introduction
0.2 Time Series Data
0. Introduction
0.3 Pooled Cross-Sectional Data
0. Introduction
0.4 Panel Data
0. Introduction
0.5 Pooled Cross Section vs Panel Data
 Pooled Cross Section
 each year, draw a new random sample on wages, education, experience, etc.:
 statistically, observations are independently sampled,
 i.e., there is no correlation in the error terms across different observations.
 Yet they are not identically distributed, since the samples come from different time periods:
 e.g., the distributions of wages and education have changed over time in most countries.
 Panel Data (also called longitudinal data)
 collect data on the same individuals, firms, states, etc. across time:
 e.g., the same individuals are reinterviewed at several subsequent points in time.
 Observations are not independently distributed across time,
 e.g., ability that affects someone's wage in 1990 will do so in 1991, too.
 Since panel data methods are somewhat more advanced, we postpone them to Econometrics II.
0. Introduction
0.6 Parametric vs Non-Parametric Tests
Choice of test by type of input variable (rows) and outcome variable (columns):

| Input \ Outcome | Nominal | Categorical (>2 categories) | Ordinal | Quantitative Discrete | Quantitative Non-Normal | Quantitative Normal |
|---|---|---|---|---|---|---|
| Nominal | Χ² or Fisher's | Χ² | Χ²-trend or Mann-Whitney | Mann-Whitney | Mann-Whitney or log-rank | Student's t test |
| Categorical (>2 categories) | Χ² | Χ² | Kruskal-Wallis | Kruskal-Wallis | Kruskal-Wallis | Analysis of Variance |
| Ordinal | Χ²-trend or Mann-Whitney | Poisson Regression | Spearman rank | Spearman rank | Spearman rank | Spearman rank or linear reg. |
| Quantitative Discrete | Logistic Regression | Poisson Regression | Poisson Regression | Spearman rank | Spearman rank | Spearman rank or linear reg. |
| Quantitative Non-Normal | Logistic Regression | Poisson Regression | Poisson Regression | Poisson Regression | Plot & Pearson or Spearman rank | Plot & Pearson or Spearman and linear reg. |
| Quantitative Normal | Logistic Regression | Poisson Regression | Poisson Regression | Poisson Regression | Linear regression | Pearson and linear regress. |
0. Introduction
0.7 Application: Basic Concepts of Statistical Theory
 Mean
 Variance
 Standard Deviation
 Covariance
 Correlation
 Application in Excel…
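As a warm-up for the application class, here is a minimal R sketch of these concepts (R is the course software; the data vectors below are made up purely for illustration):

  # Hypothetical sample data, for illustration only
  x <- c(2, 4, 6, 8, 10)
  y <- c(1, 3, 2, 5, 4)

  mean(x)      # sample mean
  var(x)       # sample variance (divides by n - 1)
  sd(x)        # standard deviation = sqrt(var(x))
  cov(x, y)    # sample covariance
  cor(x, y)    # correlation = cov(x, y) / (sd(x) * sd(y))

In Excel the corresponding functions are AVERAGE, VAR.S, STDEV.S, COVARIANCE.S, and CORREL.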
1. The Simple Regression Model
Outline of Section 1:
1.1 Statistical Models
1.2 Linear Regression Models
1.3 Ordinary Least Squares
1.4 R-squared
1.5 Multiple Regression Analysis
1.6 Omitted Variable Bias
1.7 Examples to widen your understanding
1.8 Qualitative Information: Binary (or Dummy) Variables
1.9 Binary Dependent Variable – LPM Model
1.10 Discrete Dependent Variables
1. The Simple Regression Model
1.1 Statistical Models
 Statistics/Econometrics is mainly concerned with model building.
 Model: a statement about how one variable is caused/determined by another
 Model building often begins with an idea of a relation
 Statistical model building: translating this idea into a (set of) equation(s)
Some features of this equation answer a relevant/interesting question about the variable of interest
Examples: Does insurance coverage affect health care utilization? What is the direction of this effect, if it exists? How "big" is this effect, if it exists?
1. The Simple Regression Model
1.1 Statistical Models
 Statistical point of view: health care utilization, insurance coverage, and further covariates have a joint probability distribution
 We are often interested in the conditional distribution of one of these variables given the others.
 The focus is often on the conditional mean of one variable y given the values of covariates x, i.e., E[y|x]
E[y|x] is the regression function
E.g., the expected number of doctor visits given income, health status, insurance status, etc.
The linear regression model is the most common specification.
1. The Simple Regression Model
1.2 Linear Regression Models
 Simple/Bivariate Regression Model: y = β0 + β1·x + u
 y: left-hand-side variable [lhs]
 x: right-hand-side variable [rhs]
 u: disturbance / error / unobserved part
 random component from the underlying theoretical model
 measurement error in y
 captures anything not explicitly taken into account by the model
1. The Simple Regression Model
1.2 Linear Regression Models
 Simple/Bivariate Regression Model
 Sample of data (yᵢ, xᵢ) with i = 1, …, n
 Key assumption: each observed value of yᵢ is generated by the underlying data generating process
yᵢ = β1·xᵢ + uᵢ
yᵢ is determined by the deterministic part β1·xᵢ and the random part uᵢ
 Objective of statistical analysis: estimate the unknown model parameters, here β1
Testing hypotheses (Do doctor visits increase with income?)
Identifying independent effects of xᵢ on y (How strongly do doctor visits increase if income increases by one unit?)
Making predictions about yᵢ (How often will individual i with characteristics xᵢ visit a doctor?)
1. The Simple Regression Model
1.2 Linear Regression Models
 Linearity does not mean that the statistical model needs to be linear:
y = α·x^β·e^u is nonlinear; its transformation
ln(y) = ln(α) + β·ln(x) + u is linear.
y = α·x^β + e^u is nonlinear too, but cannot be transformed into a linear model.
 Linearity refers to linearity in the parameters and in the disturbances, not to linearity in the (original, not transformed) variables, e.g.,
y = α + β1·ln(x) + β2·ln(z) + u
ln(y) = α + β1·x + β2·z² + u
y = α + β1·(1/x) + β2·z + u
are linear regression models, though they are nonlinear in x (and z).
 Especially log-linear (log-log, semi-log) models frequently used in applied work
1. The Simple Regression Model
1.2 Linear Regression Models
 Linearity does not mean that the statistical model needs to be linear.
 We mean here that the equation is linear in the parameters β.
Nothing prevents us from using simple regression to estimate a model such as cons = β0 + β1·√inc + u, where cons is annual consumption and inc is annual income.
 Linear models with non-linear variables are often more realistic:
1. The Simple Regression Model
1.3 Ordinary Least Squares (OLS)
 Estimation of the model parameters is the objective of the econometric model.
 There are different approaches to model estimation; least squares regression is the most popular.
 Starting with simple least squares is often a good idea in applied work, even if more sophisticated (and possibly better suited) methods are available.
 Idea of least squares estimation: choose a coefficient β such that the sum of squared residuals (the estimated unobserved parts) is minimized.
 Intuition: the fitted line β̂1·x is close to the observed data points.
 Algebraic perspective: least squares allows for an algebraic solution of the minimization problem.
 Least squares estimation puts much weight on avoiding large deviations of observed data points from the fitted line (by considering squares).
1. The Simple Regression Model
1.3 Ordinary Least Squares (OLS)
 Population and Sample Regression
1. The Simple Regression Model
1.3 Ordinary Least Squares (OLS)
 In the linear model y = β1·x + u:
 E(u) = 0
-> the unobserved factors average out to zero in the population (a normalization).
 E(u|x) = E(u)
-> u is mean independent of x; this is implied by, but weaker than, full independence between u and x.
 Example:
-> E(abil|educ = 8) = E(abil|educ = 16) = E(abil)
-> What if ability increases with years of education?
 E(y|x) = β1·x -> the average value of y changes with x; not every individual unit does!
 Systematic part and unsystematic part ???
1. The Simple Regression Model
1.3 Ordinary Least Squares (OLS)
 How to derive the OLS, i.e., how to find out the values for 𝛽0 & 𝛽1
 See Chapter 2 in Wooldridge (2016), if you want more explanation about derivation
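For reference, the standard derivation (Wooldridge, Ch. 2): OLS minimizes the sum of squared residuals Σᵢ (yᵢ − β̂0 − β̂1·xᵢ)². The first-order conditions are

Σᵢ (yᵢ − β̂0 − β̂1·xᵢ) = 0 and Σᵢ xᵢ·(yᵢ − β̂0 − β̂1·xᵢ) = 0,

which solve to

β̂0 = ȳ − β̂1·x̄ and β̂1 = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)².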
1. The Simple Regression Model
1.3 Ordinary Least Squares (OLS)
 How to derive the OLS, i.e., how to find out the values for 𝛽0 & 𝛽1
 using the hint, we get (see A.7 & A.8 in the Appendix of the book for this hint)
 See Chapter 2 in Wooldridge (2016), if you want more explanation about derivation
1. The Simple Regression Model
1.3 Ordinary Least Squares (OLS)
 How to derive the OLS, i.e., how to find out the values for 𝛽0 & 𝛽1
 The sample variance of x should be greater than zero, i.e., the xᵢ cannot all take the same value (otherwise β̂1 is undefined)
 See Chapter 2 in Wooldridge (2016), if you want more explanation about derivation
1. The Simple Regression Model
1.3 Ordinary Least Squares (OLS)
 Example:
 What is the predicted wage for a person with 8 years of education?
 By how much does the hourly wage increase with 1 & 4 more years of education? (See the sketch below.)
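A sketch of this example in R, assuming the slide uses Wooldridge's wage1 data (the wage–education example of his Ch. 2; the wooldridge package ships the textbook data sets, so the numbers here may differ from the original slide):

  # install.packages("wooldridge")   # data sets from the textbook
  library(wooldridge)
  data("wage1")

  m <- lm(wage ~ educ, data = wage1)           # wage in $/hour, educ in years
  coef(m)

  predict(m, newdata = data.frame(educ = 8))   # predicted wage at 8 years of education
  coef(m)["educ"]                              # wage increase per extra year of education
  4 * coef(m)["educ"]                          # increase for 4 more years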
1. The Simple Regression Model
1.3 Ordinary Least Squares (OLS)
 More Examples
1. The Simple Regression Model
1.3 Ordinary Least Squares (OLS)
 A note on terminology: statistical jargon
 Often, we indicate the estimation of a relationship using OLS by writing out equations.
 Alternatively, we say we run the regression of y on x,
 or simply we regress y on x: always the dependent variable on the independents.
 Example: regress salary on roe
When we use such terminology, we always mean that we estimate the intercept along with the slope coefficient,
which is appropriate for the vast majority of applications.
 Unless explicitly stated otherwise, we estimate an intercept along with a slope.
 Fitted values (explained part) versus actual data versus residuals.
1. The Simple Regression Model
1.4 R-squared
1. The Simple Regression Model
1.4 R-squared
 R-squared is the ratio of the explained variation to the total variation;
 it is interpreted as the fraction of the sample variation in y that is explained by x.
SST = SSE + SSR (total = explained + residual sum of squares)
Divide all sides by SST:
1 = SSE/SST + SSR/SST
R² = SSE/SST = 1 − SSR/SST
Example:
 The firm's ROE explains only 1.3% of the variation in salaries for this sample.
1. The Simple Regression Model
1.4 R-squared
 R-squared is a poor tool for judging a model
 Example
 narr86 = number of arrests in 1986
 pcnv = the proportion of arrests prior to 1986 that led to conviction
Interpret the coefficients.
Interpret the R-squared value.
In the arrest example, the small R² reflects what we already suspect in the social sciences: it is generally very difficult to predict individual behavior.
2. The Multiple Regression Model
Outline of Section 2:
2.1 Models with k independent variables
2.2 Omitted Variable Bias
2.3 Examples to widen your understanding
2. The Multiple Regression Model
2.1 Models with k independent variables
 Linear regression model with multiple independent variables: y = β0 + β1·x1 + β2·x2 + ⋯ + βk·xk + u, assuming E(u|x1, …, xk) = 0
For any values of x1 and x2 in the population, the average of the unobserved factors is equal to zero.
This implies that other factors affecting y are not related, on average, to x1 & x2.
Thus, the assumption fails when any problem causes u to be correlated with any of the independent variables.
2. The Multiple Regression Model
2.1 Models with k independent variables
 Linear regression model with multiple independent variables
Example
Exper – years of labor market experience
Tenure – years with the current employer
Ceteris Paribus Effect: if we take two people with the same levels of experience and job tenure, the coefficient on educ is the proportionate difference in predicted wage when their education levels differ by one year.
Questions
 What is the estimated effect on wage when an individual stays at the same firm for another year?
 How do we obtain fitted or predicted values for each observation? (See the sketch below.)
 What is the difference between the actual values and the fitted values?
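The last two questions can be answered mechanically in R; a sketch, again assuming the wage1 data from the wooldridge package:

  library(wooldridge)
  data("wage1")

  m <- lm(log(wage) ~ educ + exper + tenure, data = wage1)

  y_hat <- fitted(m)   # fitted (predicted) values, one per observation
  u_hat <- resid(m)    # residuals = actual value - fitted value
  all.equal(log(wage1$wage), as.numeric(y_hat + u_hat))   # actual = fitted + residual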
2. The Multiple Regression Model
2.2 Omitted Variable Bias
 OVB occurs when a statistical model leaves out one or more relevant variables.
 The bias results in the model attributing the effect of the missing variables to those
that were included.
y = β0 + β1·x1 + β2·x2 + u -> Full model
ỹ = β̃0 + β̃1·x1 + v -> Underspecified model
 What if Corr(x1,x2) = 0 ?
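For reference, the standard omitted-variable-bias result (Wooldridge, Ch. 3): if x2 is omitted, then

E(β̃1) = β1 + β2·δ̃1,

where δ̃1 is the slope from regressing x2 on x1. So if Corr(x1, x2) = 0, then δ̃1 = 0 and the underspecified model still gives an unbiased estimate of β1; the same holds if β2 = 0.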
2. The Multiple Regression Model
2.3 Examples to widen your understanding
1. Some students are randomly given grants to buy computers. If the amount of the grant is truly randomly determined, we can estimate the ceteris paribus effect of the grant amount on the subsequent college grade point average (GPA) by simple regression analysis.
a) Because of random assignment, all of the other factors that affect GPA are uncorrelated with the amount of the grant.
b) The R-squared would probably be very small.
c) In a large sample we could still get a precise estimate of the effect of the grant.
d) For a more precise estimate, SAT scores and family background variables would be good candidates, since they are uncorrelated with the grant amount. Remember, unbiasedness is not the point here; it is already obtained. The issue is getting an estimator with a small sampling variance.
 Keep in mind: the smaller the R-squared, the harder the prediction/forecast.
 Yet goodness of fit and similar factors should not be decisive in model selection. What matters is the purpose of your assessment.
2. The Multiple Regression Model
2.3 Examples to widen your understanding
2. We have ecolabeled apples as the dependent variable, ecolbs, and the independent variables are the price of ecolabeled apples (ecoprc) and the price of regular apples (regprc). Families are randomly chosen, and we want to estimate the price effects.
a) Since random assignment is in charge, family income and the desire for a clean environment are unrelated to the prices.
b) Hence, regressing ecolbs on ecoprc and regprc produces unbiased estimators of the price effects.
c) The R-squared is 0.0364, which means the price variables explain only 3.6% of the total variation in the dependent variable.
3. Suppose we want to estimate the effect of pesticide usage among farmers on family health expenditures. Should we include the number of doctor visits as an explanatory variable? The answer is NO!
a) Health expenditures include doctor visits, and we want to capture all health expenditures.
b) If we include doctor visits, then we are only measuring the effect of pesticide usage on health expenditures other than doctor visits.
3. Hypothesis Testing in the Regression Analysis
Outline of Section 3:
3.1 Variance of Estimates
3.2 Hypothesis Testing
3.3 Type I & Type II Errors
3.4 The t-Test
3.5 The F-Test
3. Hypothesis Testing in the Regression Analysis
3.1 Variance of Estimates
 In the linear model y = β1·x + u:
Looking at Var(β̂1) = σ²/SSTx, it is easy to summarize how this variance depends on the error variance, σ², and on the total variation in {x1, x2, …, xn}, SSTx = Σ(xᵢ − x̄)².
 The larger the error variance, the larger is Var(β̂1). This makes sense, since more variation in the unobservables affecting y makes it more difficult to estimate β1 precisely.
 More variability in the independent variable is preferred: the more spread out the sample of the independent variable, the easier it is to trace out the relationship between E(y|x) and x; in other words, the easier it is to estimate β1.
 We want the variance of the estimates to be low!
3. Hypothesis Testing in the Regression Analysis
3.1 Variance of Estimates
 In the linear model y = β1·x + u:
Estimating the error variance σ²: in the simple regression model, σ̂² = SSR/(n − 2) = (1/(n − 2))·Σûᵢ²
 Errors (or disturbances) are different from residuals:
 errors show up in the equation containing the population parameters, β0 and β1;
 residuals show up in the estimated equation with β̂0 and β̂1.
 Thus, the errors are never observed, while the residuals are computed from the data.
3. Hypothesis Testing in the Regression Analysis
3.1 Variance of Estimates
 Homoskedasticity & Heteroskedasticity
It is likely that people with more education have a wider variety of interests and job opportunities, which could lead to more wage variability at higher levels of education. People with very low levels of education have fewer opportunities and often must work at the minimum wage; this serves to reduce wage variability at low education levels.
3. Hypothesis Testing in the Regression Analysis
3.1 Variance of Estimates
 In the multiple regression model y = β1·x1 + β2·x2 + ⋯ + u:
Looking at Var(β̂j) = σ²/[SSTj·(1 − Rj²)], it is easy to summarize how this variance depends on the error variance σ², the total sample variation in xj, SSTj, and Rj² (the R-squared from regressing xj on all other independent variables).
 The larger the error variance, the larger is Var(β̂j). This makes sense, since more variation in the unobservables affecting y makes it more difficult to estimate βj precisely.
 More variability in the independent variables is preferred: the more spread out the sample, the easier it is to trace out the relationship between E(y|x) and x; in other words, the easier it is to estimate βj.
 Rj² plays no role in simple regression analysis, since only one independent variable is used.
3. Hypothesis Testing in the Regression Analysis
3.1 Variance of Estimates
 In the multiple regression model of 𝑦 = 𝛽1 𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝑢
 What do you understand when Rj² is close to 1 and close to 0?
 How does Var(β̂j) change as Rj² gets closer to 1? Is it really an issue?
 What can we infer about multicollinearity & perfect collinearity in this manner?
 BLUE assumptions: linear in parameters, random sampling, no perfect collinearity, zero conditional mean, homoskedasticity
3. Hypothesis Testing in the Regression Analysis
3.2 Hypothesis Testing
 Statistical inference has two subdivisions:
estimation
hypothesis testing
 Estimation addresses the question "What is this parameter's value?"
 Hypothesis testing addresses the question "Is the value of the parameter x (or some other specific value)?"
 Steps in Hypothesis Testing
1. Stating the hypotheses
2. Identifying the relevant test statistic and its probability distribution
3. Specifying the significance level
4. Calculating the test statistic
5. Making the statistical decision
6. Making economic, financial, investment, or any other decision.
3. Hypothesis Testing in the Regression Analysis
3.2 Hypothesis Testing
 We always state two hypotheses:
the null hypothesis, the null, H0
the alternative hypothesis, Ha, H1
 The null hypothesis is the hypothesis to be tested.
 The alternative is the one accepted when the null is rejected.
 Formulation of hypotheses
1. H0: β = x versus H1: β ≠ x
2. H0: β ≤ x versus H1: β > x
3. H0: β ≥ x versus H1: β < x
The 1st formulation is a two-sided (or two-tailed) hypothesis test;
the 2nd and 3rd are one-sided (or one-tailed) hypothesis tests.
3. Hypothesis Testing in the Regression Analysis
3.2 Hypothesis Testing
 We calculate the value of a test statistic based on a sample
 Test statistic frequently has the following form (but not always):
Test Statistic = (Sample Estimate − Population Parameter under H0) / (Standard Error of the Sample Estimate)
 The test statistic is used to decide whether we reject the null or not
 For that we compare the value of the test statistic with the critical value
 Critical values are obtained from the table on the basis of probability distributions
for t-test, use t-distribution, i.e., t table
for z-test, use standard normal or z-distribution
for a chi-square test, use chi-square distribution, i.e., χ2 table
for F test, use F-distribution, i.e., F table
3. Hypothesis Testing in the Regression Analysis
3.3 Type I & Type II Errors
 In hypothesis testing two actions are possible
reject the null hypothesis
we cannot reject the null hypothesis
Decision is based on the comparison of the test statistic with the critical value
Critical values are obtained from the statistical tables for the given significance
level, e.g., 10%, 5%, 1%, etc.
 There are 4 possible outcomes when testing a null hypothesis:
1. The null is wrong & we reject it  correct decision
2. The null is correct, but we reject it  type 1 error
3. The null is wrong, but we don't reject it  type 2 error
4. The null is correct and we don't reject it  correct decision
If we mistakenly reject the null, we make a type 1 error.
The probability of a type 1 error is denoted by alpha, α, also known as the level of significance.
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 The test statistic has the following form:
Test Statistic = (β̂sample − βparameter) / se(β̂sample), compared with t(n−k−1)
 n stands for the sample size
 k is the number of independent variables in the model
 Let's say that the null hypothesis is as follows:
 H0: βparameter = 0, then
 Test Statistic = β̂sample / se(β̂sample)
 β̂sample will never be exactly zero, whether or not H0 is true.
 The question is: how far is β̂sample from zero?
 β̂sample being very far from zero provides evidence against H0: βparameter = 0,
 i.e., the test statistic measures how many estimated standard deviations β̂sample is away from zero.
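In R, the estimates, standard errors, and t statistics are all in the coefficient table; a sketch using the wage1 data from the wooldridge package as a stand-in for the slide's example:

  library(wooldridge)
  data("wage1")

  m <- lm(log(wage) ~ educ + exper + tenure, data = wage1)
  coef(summary(m))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

  # the t statistic is simply Estimate / Std. Error:
  coef(summary(m))[, "Estimate"] / coef(summary(m))[, "Std. Error"]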
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 One-sided alternative: H0: β ≤ 0 versus H1: β > 0
1. We must decide on a significance level.
2. A 5% significance level means mistakenly rejecting H0, when it is true, 5% of the time.
3. The critical value for α = 5% is the 95th percentile of a t distribution with n − k − 1 degrees of freedom: call this c.
4. If the test statistic tβ̂ > c, then reject the null.
Example
 For a 5% level with n − k − 1 = 28 degrees of freedom, the critical value is c = 1.701
 If tβ̂ < 1.701, we fail to reject H0
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
[Table: critical values of the t distribution; rows indexed by degrees of freedom (n − k − 1)]
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 Example
i. What is the degrees of freedom here?
ii. What is the t statistic for each variable?
iii. What are the null and alternative hypotheses?
iv. What are the critical values for the 5% and 1% levels?
v. Do we reject the null or fail to reject the null?
vi. What is the economic and statistical significance of the variable experience here?
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 One-sided alternative: H0: β ≥ 0 versus H1: β < 0
1. The rejection rule for this alternative is just the mirror image of the previous case.
2. Now the critical value comes from the left tail of the t distribution.
3. To reject H0 against the negative alternative, we must get a negative t statistic.
4. If tβ̂ < −c, then reject the null.
Example
 For a 5% level with n − k − 1 = 90 degrees of freedom, the critical value is c = 1.662
 If tβ̂ < −1.662, we reject H0
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 Example
i. What is the degrees of freedom here?
ii. What is the t statistic for each variable?
iii. What are the null and alternative hypotheses?
iv. What are the critical values for the 5% and 10% levels?
v. Do we reject the null or fail to reject the null?
vi. If we reject the null, at what significance level would we fail to reject it?
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 Example
i. How can changing the functional form affect our conclusions?
ii. How did the significance of enroll change in the new model?
iii. Which model should we choose if we consider only the significance of enroll and the R-squared estimates?
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
[Table: critical values of the t distribution; rows indexed by degrees of freedom (n − k − 1)]
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 Two-sided alternative: H0: β = 0 versus H1: β ≠ 0
1. This is the relevant alternative when the sign of βj is not well defined by theory or common sense.
2. We cannot use the estimates to formulate the null or alternative hypotheses.
3. For the two-sided alternative, the absolute value of the t statistic is relevant.
4. If |tβ̂| > c, then reject the null.
Example
 For a 5% level with n − k − 1 = 25 degrees of freedom, the critical value is c = 2.060
 If |tβ̂| > 2.060, we reject H0
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 Example
i. What is the degrees of freedom here?
ii. What is the t statistic for each variable?
iii. What are the null and alternative hypotheses?
iv. What are the critical values for the 5% and 1% levels?
v. Do we reject the null or fail to reject the null?
vi. Is ACT practically significant?
vii. How do we interpret the coefficient on skipped? Hint: skipped = number of lectures missed per week.
viii. What happens if we consider a one-sided alternative?
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 Testing other Hypotheses
 Let's say that the null hypothesis is as follows:
 H0: βparameter = aj, then
 Test Statistic = (β̂sample − aj) / se(β̂sample)
 Common cases are βparameter = 1 & βparameter = −1
Example: one-sided alternative
log(crime) = β0 + β1·log(enroll) + u
 β1 is the elasticity of crime with respect to enrollment.
 It is not much use to test H0: β1 = 0, as we expect the total number of crimes to increase as the size of the campus increases.
 On the other hand, a noteworthy alternative is H1: β1 > 1, which implies that a 1% increase in enrollment raises campus crime by more than 1%.
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 Testing other hypotheses: one-sided alternative
log(crime) = β0 + β1·log(enroll) + u
The alternative hypothesis is H1: β1 > 1, which implies that a 1% increase in enrollment raises campus crime by more than 1%.
 What is the t statistic?
 Can we reject the null? If yes, then at what level?
 Is 1.27 necessarily a good estimate of the ceteris paribus effect?
 What if H1: β1 < −1?
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 Testing other hypotheses: two-sided alternative
log(price) = β0 + β1·log(nox) + ⋯ + u
The null: β1 = −1
The alternative: β1 ≠ −1
 What is the t statistic for log(nox)?
 Considering the t statistic, do we need to check the critical value from the table?
 Can we reject the null? If yes, then at what level?
 Are the estimates statistically different from zero?
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 Confidence intervals: β̂ ± c·se(β̂)
For a 95% confidence interval, c is the critical value at the 97.5th percentile;
in other words, α = 5% is divided between the two tails.
For df = n − k − 1 = 25, a 95% CI is [β̂ − 2.06·se(β̂), β̂ + 2.06·se(β̂)]
 Example (see the sketch below)
i. What is the confidence interval for βlog(sales)?
ii. How do we interpret zero being outside the confidence interval?
iii. What is the confidence interval for βprofmarg?
iv. How do we interpret zero being included in the 95% CI?
v. What are the t-statistic and the significance level against the two-sided and one-sided alternatives?
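A sketch of these confidence intervals in R, assuming the slide's example is the R&D intensity regression from Wooldridge (rdchem data in the wooldridge package):

  library(wooldridge)
  data("rdchem")

  m <- lm(log(rd) ~ log(sales) + profmarg, data = rdchem)
  confint(m, level = 0.95)   # beta_hat +/- c * se(beta_hat) for each coefficient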
3. Hypothesis Testing in the Regression Analysis
3.4 The t-Test
 A little note about p-value for t Tests
You can think of the p-value as the smallest significance level at which the null would still be rejected: an exactly calculated significance threshold.
The p-value is a probability, so it is always between zero and one.
Regression packages usually compute p-values for two-sided alternatives. But it is simple to obtain the one-sided p-value: just divide the two-sided p-value by 2.
For large samples it is easy to remember the critical values in the t table, so p-values are not strictly essential there.
However, for the F test you will see that the critical values are not easy to memorize, since many tables are in use. Hence, p-values are very helpful.
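The p-value arithmetic in one R sketch, with a made-up t statistic:

  t_stat <- 2.41   # hypothetical value of the test statistic
  df     <- 90     # n - k - 1

  p_two_sided <- 2 * pt(-abs(t_stat), df)   # probability in both tails
  p_one_sided <- p_two_sided / 2            # divide by 2, as noted above
  c(p_two_sided, p_one_sided)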
3. Hypothesis Testing in the Regression Analysis
3.5 The F-Test
 To test whether one variable has no effect on the dependent variable -> t-statistic.
 To test whether a group of variables has no effect on the dep. varbl. -> F statistic.
log(salary) = β0 + β1·years + β2·gamesyr + β3·bavg + β4·hrunsyr + β5·rbisyr + u
H0: β3 = 0, β4 = 0, β5 = 0
 This is called a multiple hypotheses test or a joint hypotheses test.
H1: H0 is not true
The alternative holds if at least one of β3, β4, β5 is different from zero.
 The full model in an F test is also called the unrestricted model.
 If you set β3, β4, β5 to zero, the remaining model is called the restricted model.
 See the examples in the next slide.
3. Hypothesis Testing in the Regression Analysis
3.5 The F-Test
 Unrestricted model estimation
 Restricted model estimation
i. Do bavg, hrunsyr, and rbisyr have statistically significant t statistics in the unrestricted model?
ii. How does the SSR change when these variables are dropped?
iii. How does the R-squared change when these variables are dropped?
3. Hypothesis Testing in the Regression Analysis
3.5 The F-Test
 F statistic or F ratio:
F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]
1. q is the number of exclusion restrictions
2. The F statistic is always nonnegative. Why? (SSRr ≥ SSRur, since dropping variables can never lower the SSR.)
3. H0: β3 = 0, β4 = 0, β5 = 0
4. Under H0: F ~ F(q, n − k − 1)
5. If the F statistic > c, then reject the null
6. What is the critical value c?
Example
 For a 5% level with q = 3 and n − k − 1 = 60, the critical value is c = 2.76
 If F > 2.76, then reject the null at the 5% level; in other words, the variables are jointly significant. (See the sketch below.)
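In R, the F test of exclusion restrictions can be computed by comparing the restricted and unrestricted fits; a sketch assuming the baseball salary data (mlb1 in the wooldridge package):

  library(wooldridge)
  data("mlb1")

  unrestricted <- lm(log(salary) ~ years + gamesyr + bavg + hrunsyr + rbisyr, data = mlb1)
  restricted   <- lm(log(salary) ~ years + gamesyr, data = mlb1)

  # F test of H0: beta_bavg = beta_hrunsyr = beta_rbisyr = 0 (q = 3 restrictions)
  anova(restricted, unrestricted)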
3. Hypothesis Testing in the Regression Analysis
3.5 The F-Test
 5% critical values of the F distribution
[Table omitted; rows indexed by denominator degrees of freedom]
3. Hypothesis Testing in the Regression Analysis
3.5 The F-Test
 1% critical values of the F distribution
[Table omitted; rows indexed by denominator degrees of freedom]
3. Hypothesis Testing in the Regression Analysis
3.5 The F-Test
 Unrestricted model estimation
 Restricted model estimation
i. Calculate the F statistic.
ii. What are the null and the alternative hypotheses?
iii. Are the restricted variables jointly different from zero?
iv. Considering the insignificant t statistics, explain the result of the F test in the light of multicollinearity.
v. Can we test the significance of one independent variable by an F test?
3. Hypothesis Testing in the Regression Analysis
3.5 The F-Test
 F Statistic for overall significance of a regression
H0: β1 = β2 = β3 = ⋯ = βk = 0
The null states that none of the explanatory variables has an effect on y.
How to calculate the F statistic then? Here the restricted model contains only an intercept, so the F statistic can be computed from the R-squared: F = [R²/k] / [(1 − R²)/(n − k − 1)].
 Most regression packages report this F statistic automatically.
 If we fail to reject, then we should look for other variables to explain y.
4. Nonlinearities in the Regression Models
Outline of Section 4:
4.1 Logarithmic Functional Forms
4.2 Models with Quadratics
4.3 Interaction Term
4.4 Adjusted R-squared
4. Nonlinearities in the Regression Models
4.1 Logarithmic Functional Forms
 Linear models with (natural) logarithmic variables are often more realistic:
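For reference, the standard summary of these functional forms (Wooldridge, Table 2.3):

| Model | Dependent variable | Independent variable | Interpretation of β1 |
|---|---|---|---|
| level-level | y | x | Δy = β1·Δx |
| level-log | y | log(x) | Δy = (β1/100)·%Δx |
| log-level | log(y) | x | %Δy = (100·β1)·Δx |
| log-log | log(y) | log(x) | %Δy = β1·%Δx |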
4. Nonlinearities in the Regression Models
4.1 Logarithmic Functional Forms
 What variables to use in a log form:
 Very common to use wages, salaries, firm sales, firm market values in natural logarithm
 Population, total number of employees, school enrolment often in natural log
 Measurement in years, e.g., education, experience, tenure, age usually in original form
 Proportions or percent: unemployment rate, arrest rate, participation rate in a pension plan,
etc. are preferred in level form, but sometimes may be seen in log as well.
price = housing price; nox = amount of nitrogen oxide in the air
 Interpret the coefficients.
 The natural logarithm in a log-level model gives only an approximate percentage change; the approximation error grows when the percentage change is large.
4. Nonlinearities in the Regression Models
4.1 Logarithmic Functional Forms
 Exact calculation of the percentage change in a log-level model requires an adjustment: %Δŷ = 100·[exp(β̂2·Δx2) − 1]; here we are interested in a one-unit change, Δx2 = 1
 100·[exp(0.306) − 1] = 35.8%
 What happens if β2 = −0.052? Is it crucial to adjust for small percentage changes?
 What then if we increase x2 by 5 units instead of 1 unit? (β2 = −0.052)
 What if we want to adjust for decreasing the number of rooms by one? (See the R lines below.)
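The adjustment is plain arithmetic; in R, for the cases asked above:

  100 * (exp(0.306) - 1)        #  35.8%: the adjusted effect from the slide
  100 * (exp(-0.052) - 1)       #  -5.1%: for small coefficients the adjustment barely matters
  100 * (exp(-0.052 * 5) - 1)   # -22.9%: for a 5-unit change it matters much more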
4. Nonlinearities in the Regression Models
4.2 Models with Quadratics
 Quadratic functions are used to capture decreasing or increasing marginal effects.
 β1 no longer measures the change in y with respect to x, since it makes no sense to keep x² fixed while changing x.
 Exper has a diminishing effect on wage.
4. Nonlinearities in the Regression Models
4.2 Models with Quadratics
 Quadratic functions are used to capture decreasing or increasing marginal effects.
 Exper has a diminishing effect on wage.
 The 1st year of experience is worth roughly 30 cents/hour. (0.298 − 0 = 0.298)
 The 2nd year of experience is worth less, roughly 28.6 cents/hour. (0.298 − 2 × 0.0061 = 0.286)
 From 10 to 11 years of experience, wage is predicted to increase by about 0.298 − 2 × 0.0061 × 10 = 0.176, or 17.6 cents/hour
Remember: Δŷ ≈ (β̂1 + 2·β̂2·x)·Δx, and x is the current year before the change
4. Nonlinearities in the Regression Models
4.2 Models with Quadratics
 Calculate the turning point on the graph
 Setting the 1st derivative equal to zero gives the maximum of the function, i.e., the turning point: x* = β̂1/(2·|β̂2|)
 0.298/(2 × 0.0061) = 24.4 years is the point where the return to experience becomes zero.
4. Nonlinearities in the Regression Models
4.2 Models with Quadratics
 Possible explanations for the turning point
 Maybe few people have more than 24.4 years of experience, i.e., the right side of the curve can be ignored.
 It is possible that the return to experience becomes negative at some point, but not necessarily at 24.4 years, since bias is to be expected due to omitted variables.
 Or the functional form might simply be wrong.
4. Nonlinearities in the Regression Models
4.2 Models with Quadratics
 Example 6.2 from the main book
 What is the t-statistic for rooms²?
 Look at the picture and explain in words the meaning of the relationship when the number of rooms is less than 4.4.
 How to calculate the turning point?
 Starting at 3 rooms and increasing to 4 rooms reduces a house's expected value?! What could be the possible explanation? (Figure annotation: 1% of the sample.)
 By how much do prices increase if we increase the number of rooms from 5 to 6?
 By how much do prices increase if we increase the number of rooms from 6 to 7?
 Interpret the coefficient of the quadratic variable practically/economically on the basis of the previous two questions.
4. Nonlinearities in the Regression Models
4.3 Interaction Term
 Sometimes the partial effect depends on the magnitude of another independent variable, e.g., price = β0 + β1·sqrft + β2·bdrms + β3·sqrft·bdrms + u
 β3 > 0 means an additional bedroom yields a higher increase in price for larger houses
 β2 is the effect of bdrms on price for a home with zero square feet;
 often this is not of interest. Instead, reparameterizing around the means, y = α0 + δ1·x1 + δ2·x2 + β3·(x1 − μ1)·(x2 − μ2) + u, would be more interesting,
 where δ2 is the partial effect of x2 on y at the mean value of x1.
(For mathematical proof, multiply out the interaction term.)
4. Nonlinearities in the Regression Models
4.3 Interaction Term
 Example 6.3 from the main book
 The dependent variable is the standardized final exam score (stndfnl).
 x1 is the attendance rate (atndrte), measured in %.
 priGPA is the prior cumulative GPA.
i. Show the effect of the attendance rate mathematically.
ii. What does β1 measure? Is it an interesting finding, considering that the smallest priGPA in the sample is 0.86?
iii. What are the t-statistics of β1 and β6; are they statistically significant? Why?
iv. What would an F test with a p-value of 0.014 for β1 and β6 mean?
v. What is the effect of atndrte on stndfnl at the mean value of priGPA, and how is it interpreted? (mean = 2.59, answer: 0.0078)
vi. Is this 0.0078 statistically significant? Remember the transformation in the previous slide!
vii. What about the effect of priGPA on stndfnl? Try to explain it mathematically.
4. Nonlinearities in the Regression Models
4.4 Adjusted R-squared
 R-squared never falls when a new independent variable is added, since the SSR never rises.
 Solution: use the adjusted R-squared, R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − k − 1)
 Its primary attraction is that it penalizes additional independent variables in the model.
 k in the formula imposes the penalty, so the adjusted R-squared can either increase or decrease with a new independent variable.
 A negative adjusted R-squared indicates a very poor model fit relative to the degrees of freedom.
4. Nonlinearities in the Regression Models
4.4 Adjusted R-squared
 The adjusted R-squared is very often denoted by R̄² (R-bar squared).
Interpret the R-squared for each model and explain the difference in values.
Can we use the adjusted R-squared for choosing between these models? Why not?
Hint: the dependent variables are different!
If the dependent variables were the same, how could we use the adjusted R-squared for comparison?
5. Binary (or Dummy) Variables
Outline of Section 5:
5.1 Qualitative Information
5.2 Dummy in Multiple Regression Analysis
5.3 Dummy Variable for Multiple Categories
5.4 Ordinal Information
5.5 Binary Dependent Variable – LPM Model
5.6 Discrete Dependent Variables
5. Binary (or Dummy) Variables
5.1 Qualitative Information
5. Binary (or Dummy) Variables
5.1 Qualitative Information
 Binary variable = dummy variable = zero-one variable
 What does the intercept tell us?
 What does the coefficient on female tell us?
 Is the difference between males and females statistically and economically significant?
 The intercept in models with binary variables is interpreted as the base group
 The base group or benchmark group here is males, since female = 0 leaves us with males, and that is captured by the intercept. (See the sketch below.)
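A sketch of this regression in R, assuming the slide uses the wage1 data, in which female is already a 0/1 dummy:

  library(wooldridge)
  data("wage1")

  m <- lm(wage ~ female, data = wage1)
  coef(m)
  # (Intercept): average wage of the base group (males, female = 0)
  # female:      average wage difference between females and males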
5. Binary (or Dummy) Variables
5.2 Dummy in Multiple Regression Analysis
 The intercept in models with binary variables is interpreted as the base group
 What does the negative intercept, −1.57, stand for?
 Is it economically meaningful? Why?
 What does the coefficient on woman measure?
 Why is it important in this example to control for educ, exper, and tenure?
 The null hypothesis: no difference between the wages of men and women.
 Formulate the alternative that there is discrimination against females.
 How does the interpretation of the intercept change under these hypotheses?
5. Binary (or Dummy) Variables
5.2 Dummy in Multiple Regression Analysis
 Does giving grants increase the hours of training per employee?
hrsemp = hours of training per employee
sales = annual sales
employ = number of employees
 Is the variable grant statistically significant?
 What does the coefficient on grant tell us?
 Is sales statistically and practically significant?
 What about employment then?
 Why do you think sales and employment are used in the model?
 Does the finding for grant tell us anything about causality?
 Not necessarily: there could be firms that train their workers more even in the absence of grants, i.e., we must know how the grant-receiving firms were determined!
5. Binary (or Dummy) Variables
5.3 Dummy Variable for Multiple Categories
 Several dummy independent variables in the same equation
 What is the base group in this model?
 Are the coefficients statistically significant?
 Interpret the coefficients for dummies?
 What would happen if we added singmale to the model?
5. Binary (or Dummy) Variables
5.3 Dummy Variable for Multiple Categories
 Several dummy independent variables in the same equation
 Can we see the wage difference between single and married women in the model above?
 How do we interpret this difference?
 How can we calculate the t-statistic for this difference?
5. Binary (or Dummy) Variables
5.4 Ordinal Information
 Ordinal information: credit ratings, physical attractiveness, etc.
 Credit ratings for example: {0,1,2,3,4}, where zero is the worst.
 We can either use all the rating information as one variable in the model, or use it as 4 separate dummies, with the 5th (excluded) rating as the base group.
 Using dummies has an interpretation advantage, since otherwise it is difficult to say what a one-unit increase in the rating exactly means.
 The base group is average-looking people
 Other factors: educ, exper, marital status, race
 What does the model tell us about men with below-average looks?
 Is it statistically different from zero?
 What about men with above-average looks?
5. Binary (or Dummy) Variables
5.5 Binary Dependent Variable – LPM Model
 In the Linear Probability Model (LPM) the dependent variable is zero/one,
 in other words, y in the model takes only two values, 0 and 1.
 Thus, β cannot be interpreted as the change in y given a one-unit increase in x.
 Given the zero conditional mean assumption, E(u|x) = 0, it is always true that E(y|x) = P(y = 1|x): the probability that y = 1.
 Because probabilities must sum to one, P(y = 0|x) = 1 − P(y = 1|x) is also a linear function of the xj.
The fitted value ŷ must now be read as the predicted probability of success.
 inlf = labor force participation
 inlf = 1, if the person works
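A sketch of this LPM in R, assuming the labor force participation example uses Wooldridge's mroz data (in the wooldridge package):

  library(wooldridge)
  data("mroz")

  lpm <- lm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age + kidslt6 + kidsge6,
            data = mroz)
  coef(lpm)

  # fitted values are predicted probabilities P(inlf = 1 | x);
  # an LPM can push some of them outside [0, 1]:
  range(fitted(lpm))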
5. Binary (or Dummy) Variables
5.5 Binary Dependent Variable – LPM Model
 Linear Probability Model (LPM) is about dependent variable being zero/one,
 inlf = labor force participation
 nwifeinc = non-wife income (largely the husband's earnings)
 kidslt6 = kids less than 6 years
 kidsge6 = kids from 6 to 18 years
10 more years of education increase the probability of being in the labor force by 0.38, which is a pretty large increase in a probability.
Explain the nwifeinc coefficient in the same manner. Is it a large change or not?
Which variable do you expect to be statistically insignificant? Is it?
5. Binary (or Dummy) Variables
5.5 Binary Dependent Variable – LPM Model
 Shortcomings of LPM
 Certain combinations of values of the independent variables can yield fitted values either less than zero or greater than 1. However, predicted probabilities must lie between 0 and 1.
 In fact, 16 of the fitted values are less than zero, and 17 of the fitted values are greater than 1 in the previous estimation with inlf as the dependent variable.
 Another problem is that a linear relation between the probability and the independent variables is awkward: going from zero children to one child changes the probability by the same amount as going from one child to two children, namely a reduction in the probability of working of 0.262.
 Taken to the extreme, the estimation on the previous slide implies that going from zero to four young children reduces the probability of working by 0.262 × 4 = 1.048, which is impossible.
Despite these issues, LPM estimates are still usable and can help to understand the model, at least at average values of the regressors and with cautious interpretation of the significance levels.
5. Binary (or Dummy) Variables
5.6 Discrete Dependent Variables
 Dependent variable has a set of small integer values, and zero is a common value
 For instance, number of arrests, number of living children, etc.
 How do we interpret the estimate on educ?
 Each additional year of education reduces the estimated number of children by 0.079?
 Remember that an estimate is the effect of the independent variable on the expected value (average value) of y.
 Thus, the education estimate means that average fertility falls by 0.079 children given one more year of education.
 An even better summary: if each woman in a group of 100 obtains another year of education, we estimate there will be roughly 8 fewer children among them.
 What about the interpretation of the dummy variable electricity?
6. Heteroscedasticity
Outline of Chapter 6:
6.1 Importance
6.2 Testing for Heteroskedasticity
6.3 Solutions against Heteroskedasticity
6. Heteroscedasticity
6.1 Importance:
 In the linear models y = β1·x + u and y = β1·x1 + β2·x2 + ⋯ + u,
 the variance of u is assumed to be constant: homoscedasticity.
 However, it is more likely that, e.g.,
 people with more education have a wider variety of interests and job opportunities, which could lead to more wage variability at higher levels of education;
 people with very low levels of education have fewer opportunities and often must work at the minimum wage, which reduces wage variability at low education levels;
 or the variance of the unobserved part of the savings function is a function of income, i.e., people with more income have more opportunity to vary their saving, while people with less income do not.
6. Heteroscedasticity
6.1 Importance:
 Homoskedasticity & Heteroskedasticity
6. Heteroscedasticity
6.1 Importance:
 Best Linear Unbiased Estimator (BLUE) assumptions are
 Linearity in parameters
 Random Sampling
 No Perfect Collinearity
 Zero Conditional Mean
 Homoskedasticity
 Heteroskedasticity is a problem for the t test, the F test, the LM statistic, and confidence intervals, since Var(β̂j) is biased without the homoskedasticity assumption.
 Heteroskedasticity does not cause bias or inconsistency in the OLS estimators!!!
 R-squared and adjusted R-squared are unaffected by the presence of heteroskedasticity.
6. Heteroscedasticity
6.2 Testing for Heteroscedasticity:
 Breusch-Pagan Test (steps, following Wooldridge, Ch. 8)
1. Estimate y = β0 + β1·x1 + ⋯ + βk·xk + u by OLS and obtain the squared residuals û².
2. Regress û² on x1, …, xk and keep the R-squared of this regression, R²û².
3. Form either the F statistic for the joint significance of x1, …, xk, or the LM statistic LM = n·R²û².
4. Reject homoskedasticity if the F test (or the LM test, compared with the χ²k distribution) is significant.
6. Heteroscedasticity
6.2 Testing for Heteroscedasticity:
 White Test (special form, following Wooldridge, Ch. 8)
1. Estimate the model by OLS and obtain the squared residuals û² and the fitted values ŷ.
2. Regress û² on ŷ and ŷ², and keep the R-squared of this regression, R²û².
3. Form either the F statistic for the joint significance of ŷ and ŷ², or the LM statistic LM = n·R²û².
4. Reject homoskedasticity if the test statistic exceeds the critical value (χ²2 for the LM version).
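Both tests are one call in R's lmtest package; a sketch, assuming the housing price data (hprice1 in the wooldridge package) as the example:

  # install.packages(c("lmtest", "wooldridge"))
  library(lmtest)
  library(wooldridge)
  data("hprice1")

  m <- lm(price ~ lotsize + sqrft + bdrms, data = hprice1)

  bptest(m)                                  # Breusch-Pagan: regress u_hat^2 on the regressors
  bptest(m, ~ fitted(m) + I(fitted(m)^2))    # special form of the White test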
6. Heteroscedasticity
6.3 Solutions against Heteroskedasticity:
 Robust Standard Errors
 The standard errors of the coefficients are re-estimated; in the multiple regression case, Var(β̂j) = Σᵢ r̂ᵢⱼ²·ûᵢ² / SSRⱼ²
 Software packages calculate robust standard errors on request
 r̂ᵢⱼ denotes the iᵗʰ residual from regressing xⱼ on all other independent variables
 SSRⱼ is the sum of squared residuals from this regression
 Heteroskedasticity-robust F statistic, or LM statistic in joint hypothesis test
 software packages do the calculations
6. Heteroscedasticity
6.3 Solutions against Heteroskedasticity:
 Robust Standard Errors (see the sketch below)
 Do you expect a very big difference in the t-statistics? Why?
 Are the robust standard errors larger or smaller than the usual standard errors?
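A sketch of robust standard errors in R, using the sandwich and lmtest packages (the wage1 data stands in for the slide's example):

  library(sandwich)
  library(lmtest)
  library(wooldridge)
  data("wage1")

  m <- lm(log(wage) ~ educ + exper + tenure, data = wage1)

  coeftest(m)                                   # usual OLS standard errors
  coeftest(m, vcov = vcovHC(m, type = "HC1"))   # heteroskedasticity-robust standard errors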
6. Heteroscedasticity
6.3 Solutions against Heteroskedasticity:
 Weighted Least Squares (WLS)
 The idea of this method lies in the notion of heteroskedasticity.
 Heteroskedasticity means that the variance is a function of the independent variables, mathematically: Var(u|x) = σ²·h(x), so Var(uᵢ|xᵢ) = E(uᵢ²|xᵢ) = σ²·h(xᵢ) = σ²·hᵢ
 To eliminate heteroskedasticity, divide the whole equation by √hᵢ; the transformed error uᵢ/√hᵢ then satisfies Var(uᵢ/√hᵢ | xᵢ) = σ²
 The variables are transformed as illustrated above,
 but we must interpret the new coefficients according to the original form of the model
 Example: savᵢ = β0 + β1·incᵢ + uᵢ, with Var(uᵢ|incᵢ) = σ²·incᵢ
 savᵢ/√incᵢ = β0/√incᵢ + β1·√incᵢ + uᵢ/√incᵢ, and now Var(uᵢ/√incᵢ | incᵢ) = σ²
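In R, WLS is one argument to lm(); a sketch for the saving example, assuming a data set with sav and inc (the wooldridge package ships one called saving):

  library(wooldridge)
  data("saving")

  # Var(u | inc) = sigma^2 * inc  =>  weights proportional to 1/inc
  wls <- lm(sav ~ inc, data = saving, weights = 1 / inc)
  coef(wls)   # interpret in the original model sav = b0 + b1*inc + u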
6. Heteroscedasticity
6.3 Solutions against Heteroskedasticity:
 Feasible GLS or Estimated GLS
 GLS here means generalized least squares
 it is the estimated version of the weighted least squares (WLS).
 abbreviated as FGLS or EGLS
 The idea comes from modeling the variance as Var(u|x) = σ²·exp(δ0 + δ1·x1 + ⋯ + δk·xk),
 where the weight is h(x) = exp(δ0 + δ1·x1 + ⋯ + δk·xk)
 thus, we can regress log(û²) on x1, x2, …, xk, get the fitted values, and call them ĝᵢ
 then, the estimated weight is ĥᵢ = exp(ĝᵢ)
6. Heteroscedasticity
6.3 Solutions against Heteroskedasticity:
 Feasible GLS or Estimated GLS
 Steps (see the sketch below):
1. Regress y on x1, x2, …, xk by OLS and obtain the residuals û.
2. Compute log(û²).
3. Regress log(û²) on x1, x2, …, xk and obtain the fitted values, ĝ.
4. Compute the estimated weights ĥᵢ = exp(ĝᵢ).
5. Estimate the original equation by WLS, using weights 1/ĥᵢ.
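A sketch of these FGLS steps in R, assuming the smoking demand example from Wooldridge Ch. 8 (smoke data):

  library(wooldridge)
  data("smoke")

  f <- cigs ~ log(income) + log(cigpric) + educ + age + I(age^2) + restaurn
  m <- lm(f, data = smoke)                            # step 1: OLS

  aux <- lm(log(resid(m)^2) ~ log(income) + log(cigpric) + educ + age +
            I(age^2) + restaurn, data = smoke)        # steps 2-3: regress log(u_hat^2) on the x's
  h_hat <- exp(fitted(aux))                           # step 4: estimated weights
  fgls  <- lm(f, data = smoke, weights = 1 / h_hat)   # step 5: WLS with weights 1/h_hat
  coef(fgls)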
7. More on Specification and Data Issues
Outline of Chapter 7:
7.1 Functional Form Misspecification
7.2 RESET Test
7.3 Testing Non-Nested Hypothesis
7.4 Proxy Variables
7.5 Measurement Error
7.6 Data Issues
7. More on Specification and Data Issues
7.1 Functional Form Misspecification:
 Assume that the true model is one of these specifications:
y = β0 + β1·X + β2·Z + β3·XZ + u
y = β0 + β1·X + β2·Z + β3·X² + u
y = β0 + β1·X + β2·Z + β3·Z² + u
y = β0 + β1·X + β2·Z + β3·X² + β4·Z² + u
y = β0 + β1·X + β2·Z + β3·X² + β4·Z² + β5·XZ + u
or maybe this:
y = β0 + β1·X + β2·Z + β3·X² + β4·Z² + β5·XZ + β6·X³ + β7·Z³ + … + u
 If you then estimate the model y = β0 + β1·X + β2·Z + u, you are mis-specifying the functional form.
7. More on Specification and Data Issues
7.1 Functional Form Misspecification:
 How can we be sure that our model is not misspecified?
 A powerful tool for detecting functional form misspecification is the F test:
if the estimated model y = β0 + β1 X + β2 Z + u
is thought to be missing non-linear terms, e.g., quadratics or interaction terms,
then we can add these variables and test whether they are jointly meaningful:
y = β0 + β1 X + β2 Z + β3 X² + β4 Z² + β5 XZ + u
F test of H0: β3 = β4 = β5 = 0 (see the sketch below)
 It is also possible to add the third or fourth powers of the variables to the model:
(… + β6 X³ + β7 Z³ + … + u)
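 A minimal sketch of this joint F test in R (hypothetical variable and data names; anova() on nested fits produces the F statistic):

base <- lm(y ~ x + z, data = dat)                          # restricted model
full <- lm(y ~ x + z + I(x^2) + I(z^2) + x:z, data = dat)  # adds quadratics & interaction
anova(base, full)   # F test of H0: beta3 = beta4 = beta5 = 0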
 Drawback of adding explanatory variables to test for misspecification:
if the estimated model already has many explanatory variables,
y = β0 + β1 X + β2 Z + β3 L + β4 M + … + u,
then the test may use up many degrees of freedom, since the quadratics and
interactions expand the size of the model:
DF = n − k − 1,
where k is the number of explanatory variables.
 It gets worse when we add the third or fourth powers of the variables to the model.
 Solution: RESET Test
7.2 RESET Test:
 Ramsey’s (1969) regression specification error test
if the estimated model is y = β0 + β1 X + β2 Z + u,
then the OLS predicted value of y is ŷ = β̂0 + β̂1 X + β̂2 Z
in the RESET test, we add ŷ² and ŷ³ to the model and test whether their coefficients are
statistically meaningful:
y = β0 + β1 X + β2 Z + β3 ŷ² + β4 ŷ³ + u
F test of H0: β3 = β4 = 0
 ŷ² and ŷ³ are nonlinear functions of the independent variables. Hint: (x + y)² = x² + 2xy + y²
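 A minimal sketch of RESET in R; the lmtest package provides resettest(), which adds the powers of ŷ and performs the F test (hypothetical model and data names):

library(lmtest)
fit <- lm(y ~ x + z, data = dat)
resettest(fit, power = 2:3, type = "fitted")  # adds yhat^2 and yhat^3, then F tests them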
Example:
y = β0 + β1 X + β2 Z + β3 W + u   (level-level model)
n = 96, RESET statistic: 3.11, p-value: 0.05
log(y) = β0 + β1 log(X) + β2 log(Z) + β3 log(W) + u   (log-log model)
n = 96, RESET statistic: 2.37, p-value: 0.10
 On the basis of RESET, the log-log model is more appropriate. This is purely because the
evidence of misspecification is stronger for the level-level model than for the log-log model.
 If you want to use a table of critical values for the F test, keep in mind that we
add ŷ² and ŷ³ to the model in the RESET test:
 with n = 96 and k = 5 (the three original regressors plus ŷ² and ŷ³), DF = 96 − 5 − 1 = 90,
so the critical values should be sought in the F(2, 90) tables at the 1%, 5%, and 10% levels.
The choice of powers of the predicted value ŷ is unclear: squared and cubed values are
common in most applications.
Nonconstructive: it gives no indication of what to do if the model is misspecified.
Nevertheless, it is a useful tool if one suspects that the null model is too restrictive,
yet there is no obvious, more general alternative.
RESET is a test of the functional form only: not a test for omitted variables or
heteroskedasticity.
7.3 Testing Non-Nested Hypothesis:
 Nested models:
y = β0 + β1 x1 + β2 x2 + u
y = β0 + β1 x1 + β2 x2 + β3 x3 + u   (the first model is a special case of the second)
 Nonnested models:
y = β0 + β1 x1 + β2 x2 + u
y = β0 + β1 x1 + β3 x3 + u   (neither model is a special case of the other)
 The F statistic is appropriate for nested models, but not for nonnested models.
 Nonnested models: one solution for comparison could be the adjusted R-squared,
 but not when the dependent variable has a different form, e.g., log(y).
 Nonnested models: another solution is a more composite or comprehensive model.
Example:
y = β0 + β1 x1 + β2 x2 + u   (1st model)
y = β0 + β1 log(x1) + β2 log(x2) + u   (2nd model)
y = δ0 + δ1 x1 + δ2 x2 + δ3 log(x1) + δ4 log(x2) + u   (composite model)
H0: δ1 = δ2 = 0 and H0: δ3 = δ4 = 0   (null hypotheses; each F test checks one model against the composite)
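 A minimal sketch of the composite-model comparison in R (hypothetical names):

m1   <- lm(y ~ x1 + x2, data = dat)                      # 1st model
m2   <- lm(y ~ log(x1) + log(x2), data = dat)            # 2nd model
comp <- lm(y ~ x1 + x2 + log(x1) + log(x2), data = dat)  # composite model
anova(m1, comp)   # F test of H0: delta3 = delta4 = 0 (checks the 1st model)
anova(m2, comp)   # F test of H0: delta1 = delta2 = 0 (checks the 2nd model)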
 Nonnested models: the next solution is the Davidson-MacKinnon test.
Example:
y = β0 + β1 x1 + β2 x2 + u   (1st model)
y = β0 + β1 log(x1) + β2 log(x2) + u   (2nd model)
ŷ = β̂0 + β̂1 log(x1) + β̂2 log(x2)   (fitted values of the 2nd model)
y = β0 + β1 x1 + β2 x2 + θ1 ŷ + u   (Davidson-MacKinnon regression)
H0: θ1 = 0   (null hypothesis)
 We can also take the fitted values of the 1st model and add them to the 2nd model.
 If the null is rejected, i.e., θ1 is significant, we reject the 1st model in favor of the 2nd.
 Rejection of the 1st model does not mean that the other model is correct;
 it could also be due to various functional form misspecifications.
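 A minimal sketch in R; the lmtest package ships jtest() for the Davidson-MacKinnon J test and encomptest() for the composite-model (encompassing) test (hypothetical names):

library(lmtest)
m1 <- lm(y ~ x1 + x2, data = dat)             # 1st model
m2 <- lm(y ~ log(x1) + log(x2), data = dat)   # 2nd model
jtest(m1, m2)        # adds each model's fitted values to the other and t tests theta1
encomptest(m1, m2)   # F tests of both models against the composite model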
7.4 Proxy Variables:
 Proxy in the place of an Unobserved Independent Variable
Some variables cannot be observed in real life, e.g., ability, uncertainty, etc.
Problem: such missing variables cause omitted variable bias in the model.
Solution: use proxy variables.
Examples: IQ test score instead of ability, the EPU index instead of uncertainty, etc.
Crucial point: when we use proxy variables, we may still have some bias,
but we hope that this bias is smaller than the omitted variable bias.
 Mathematically:
y = β0 + β1 x1 + β2 x2 + β3 x3* + u   (x3* (ability) is unobservable, but x3 (IQ) is observed)
x3* = δ0 + δ3 x3 + v3   (we expect δ3 > 0 since x3 is a proxy)
Let’s see a special case where β̂1 & β̂2 are unbiased:
corr(u, x1) = corr(u, x2) = corr(u, x3*) = 0, corr(u, x3) = 0   (1st condition)
corr(v3, x1) = 0, corr(v3, x2) = 0   (2nd condition)
The 2nd condition guarantees that x3 is a good proxy for x3*.
Plugging the proxy equation into the model gives:
y = (β0 + β3 δ0) + β1 x1 + β2 x2 + β3 δ3 x3 + (β3 v3 + u)   (plug-in model)
The slope on the proxy x3 is β3 δ3, so the coefficient must be interpreted for the proxy
(IQ) rather than the unobserved variable (ability).
What if x3* (ability) is correlated with x1 (education) and x2 (experience)?
Then, to maintain the 2nd condition, we have the following equations:
y = β0 + β1 x1 + β2 x2 + β3 x3* + u   (x3* (ability) is unobservable, but x3 (IQ) is observed)
x3* = δ0 + δ1 x1 + δ2 x2 + δ3 x3 + v3   (x3* is related to the observed variables)
Conditions:
corr(u, x1) = corr(u, x2) = corr(u, x3*) = 0, corr(u, x3) = 0   (1st condition)
corr(v3, x1) = 0, corr(v3, x2) = 0   (2nd condition)
 Plug-in model: y = (β0 + β3 δ0) + (β1 + β3 δ1) x1 + (β2 + β3 δ2) x2 + β3 δ3 x3 + β3 v3 + u
If β3 > 0 & δ1 > 0, there is a positive bias in the education coefficient, but hopefully
smaller than the omitted variable bias that arises when ability is not used in the model at all.
 Example from Wooldridge (2016, pp. 281–282); the regression table is in the book and is not reproduced here:
What is the estimated return to education in column 1?
What is the estimated return to education in column 2?
How do you explain the difference in β̂educ?
 Hint: corr(educ, ability) > 0
Interpret the coefficient of IQ in column 2.
 Go to Example 9.3 in the book and read all the explanations, particularly for
column 3, which is not shown in this slide.
7.5 Measurement Error:
 Statistically similar to the proxy variable case:
In the proxy variable case, we are looking for a variable that is somehow related
to the unobserved variable.
In the measurement error case, we do have the variable, but its recorded measure
may contain error.
For instance:
an IQ score is a proxy for ability;
reported annual income is a measure of actual annual income.
 Measurement Error in the Dependent Variable
 Let the true regression model be:
y* = β0 + β1 x1 + ⋯ + βk xk + u   (y* is the actual value, y is the observed value)
e0 = y − y*, so y* = y − e0   (e0 is the measurement error)
 Plugging y* = y − e0 into the true model gives:
y = β0 + β1 x1 + ⋯ + βk xk + u + e0   (plug-in model)
 If corr(e0, xi) = 0, where i = 1, 2, …, k   the OLS estimators are unbiased.
 If corr(e0, u) = 0, then Var(u + e0) = σu² + σ0² > σu²   larger variances for the β̂i, i = 1, …, k.
 What if corr(e0, xi) ≠ 0? What does it mean and what are its consequences? (see the next example)
 Example: Do job-training grants decrease the scrap rate in manufacturing firms?
log(scrap*) = β0 + β1 grant + u   (grant = 1 if the firm receives a grant, 0 otherwise)
log(scrap) = log(scrap*) + e0   (the scrap rate is measured with error)
log(scrap) = β0 + β1 grant + u + e0   (plug-in model)
 Is it really true that corr(e0, grant) = 0?
 What if a firm underreports its scrap rate in order to make the grant look effective?
 This would mean corr(e0, grant) < 0 and, consequently, a downward bias in β̂1.
 Thus, if the measurement error is just a random reporting error, there is no bias;
if it is systematically related to one of the explanatory variables, OLS is biased.
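 A minimal simulation sketch in R of the random-error case (illustrative numbers only, not from the slides):

set.seed(1)
n      <- 1000
x      <- rnorm(n)
y_star <- 1 + 2 * x + rnorm(n)        # true model with beta0 = 1, beta1 = 2
y      <- y_star + rnorm(n, sd = 2)   # observed y; e0 is pure random reporting error
coef(lm(y_star ~ x))                  # close to (1, 2)
coef(lm(y ~ x))                       # still close to (1, 2): no bias, just noisier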
7.6 Data Issues:
 Missing Data
It is very common that information is missing on some key variables for several units:
no information on father’s or mother’s education, IQ level, etc.;
in time series data, no information for some years.
Modern software packages (STATA, R, etc.) ignore missing observations when
computing a regression and keep track of the missing data.
Data missing completely at random (MCAR) cause no statistical problems.
MCAR implies that the reason the data are missing is independent, in a statistical
sense, of both the observed and unobserved factors affecting y.
 Solution under MCAR (missing completely at random):
it is common to write zero in place of the missing values and to create a
“missing data indicator” dummy, equal to 1 when the value is missing and 0 when it is observed;
then include these two variables together in the regression (a sketch follows below).
This works only under the MCAR assumption;
excluding the dummy variable from the model leads to substantial bias.
MCAR is often very unrealistic:
the lower the IQ score, the higher the probability that it is missing from the data;
similarly for education at lower-than-average levels.
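 A minimal sketch of the missing-data-indicator device in R (hypothetical data frame and variable names):

dat$iq_miss <- as.numeric(is.na(dat$iq))           # 1 when IQ is missing, 0 otherwise
dat$iq0     <- ifelse(is.na(dat$iq), 0, dat$iq)    # zero written in place of NAs
fit <- lm(wage ~ educ + iq0 + iq_miss, data = dat) # include both variables together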
 Stratified Sampling: the population is layered or classified into subpopulations
one of the most common forms of nonrandom sampling, created intentionally.
Imagine you want to learn whether there is a gender wage gap in the army:
the number of women personnel being low, simple random sampling creates a bias.
Or you want to predict election results in a country with 3 cities: A with 1 million
factory workers, B with 2 million office workers, and C with 3 million retirees:
a random sample of 600 people, being poorly balanced, will create a bias.
Solution: sample randomly 100 people from A, 200 from B, and 300 from C;
this can, however, still produce a small error in estimation.
 Outliers: unusual observations
OLS minimizes the sum of the squared residuals;
therefore, outliers may change the estimates by a large amount.
Outliers can occur because a mistake was made when entering the data:
 e.g., extra zeros or a misplaced decimal point;
 check the summary statistics: minimums and maximums;
 incorrect entries are hard to catch.
They also occur when one or several members of the sample are very different from the rest:
whether to keep or drop such outliers in a regression analysis is a difficult decision;
in such cases, OLS results are very often reported both with and without the outliers.
 Figure 9.1 from Wooldridge (2016, p. 297): R&D intensity plotted against firm sales (figure not reproduced here)
 using the data on all 32 companies estimates, say, the red regression line;
 the possible outlier was clearly decisive in the estimation, even though there is
only one firm with $40 billion in annual sales;
 leaving out this outlier and using the model with 31 companies estimates, say, the blue line;
 the slope has increased, and so has the t statistic of the estimate.
 Go to Examples 9.8, 9.9, and 9.10 in the book and read all the explanations.
 What about the observation with R&D as a % of sales = 9.42?
 Why not leave out both outliers and estimate the regression model with 30 observations?
 Yet, the two R&D intensities above 6 could also be treated as outliers.
 Finding outliers is a very difficult endeavor:
 non-linear models (e.g., quadratic, logarithmic, etc.) could even do better;
 why OLS? Why not LAD (least absolute deviations) or quantile regressions? (see the sketch below)
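 A minimal sketch comparing OLS with LAD in R, using the quantreg package (and, as an assumption, the rdchem data from the wooldridge package, which underlies Figure 9.1):

library(quantreg)
library(wooldridge)                                    # assumed source of rdchem
ols <- lm(rdintens ~ sales, data = rdchem)             # sensitive to the outliers
lad <- rq(rdintens ~ sales, tau = 0.5, data = rdchem)  # median (LAD) regression
summary(ols)
summary(lad)   # LAD is far less influenced by the extreme observations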