Intermediate Econometrics
Nguyễn Ngọc Anh
Nguyễn Hà Trang
DEPOCEN
Hanoi, 25 May 2012
Planned Course Content
• 5 lectures and 5 practical sessions
– Simple and multiple regression model review
– IV Models
– Discrete choice model – 1
• Random utility model
• Logit/probit
• Multinomial logit
– Discrete choice model – 2
• Ordinal choice model
• Poisson model
– Panel data
Statistical Review
• Populations, Parameters and Random Sampling
– Use statistical inference to learn something about a population
– Population: Complete group of agents
– Typically only observe a sample of data
– Random sampling: Drawing random samples from a population
– Know everything about the distribution of the population except for one parameter
– Use statistical tools to say something about the unknown parameter
• Estimation and hypothesis testing
Statistical Review
Estimators and Estimates:
– Given a random sample drawn from a population distribution that depends on an unknown parameter θ, an estimator of θ is a rule that assigns each possible outcome of the sample a value of θ
– Examples:
• Estimator for the population mean
• Estimator for the variance of the population distribution
– An estimator is given by some function of the sample r.v.s and is therefore itself a r.v.
– Applying the estimator to the observed sample yields a (point) estimate
– Distribution of estimator is the sampling distribution
– Criteria for selecting estimators
Statistical Review
Finite sample properties of estimators:
– Unbiasedness
An estimator ˆ of  is unbiased if E ˆ   for all values of 
i.e., on average the estimator is correct

If not unbiased then the extent of the bias is measured as
 
Bias ˆ  E ˆ  
Extent of bias depends on underlying distribution of population
and estimator that is used
Choose the estimator to minimise the bias
Statistical Review
Finite sample properties of estimators:
– Efficiency
What about the dispersion of the distribution of the estimator?
i.e., how likely is it that the estimate is close to the true parameter?
Useful summary measure for the dispersion in the distribution is
the sampling variance.
An efficient estimator is one which has the least amount of
dispersion about the mean i.e. the one that has the smallest
sampling variance
If ˆ1 and ˆ2 are two unbiased estimators of , ˆ1 is efficient
   
ˆ
relative to  2 when V ˆ1  V ˆ2
inequality for at least one value of .
for all , with strict
Statistical Review
Finite sample properties of estimators:
– Efficiency
What if estimators are not unbiased?
Estimator with lowest Mean Square Error (MSE) is more efficient:
MSE(θ̂) = E[(θ̂ − θ)²]
MSE(θ̂) = V(θ̂) + [Bias(θ̂)]²
Example:
Compare the small sample properties of the following estimators of the population mean:
μ̂1 = (1/n) Σi=1..n Yi
μ̂2 = (1/(4n)) Σi=1..n Yi
Statistical Review
Asymptotic Properties of Estimators
– How do estimators behave if we have very large samples – as n
increases to infinity?
– Consistency
How far is the estimator likely to be from the parameter it is estimating as the sample size increases indefinitely?
θ̂ is a consistent estimator of θ if for every ε > 0:
P(|θ̂ − θ| > ε) → 0 as n → ∞
This is known as convergence in probability
The above can also be written as: plim(θ̂) = θ (the limit is taken as n → ∞)
Statistical Review
Asymptotic Properties of Estimators
– Consistency (continued)
Sufficient condition for consistency:
Bias and the variance both tend to zero as the sample size increases indefinitely. That is:
MSE(θ̂) → 0 as n → ∞
Law of Large Numbers: an important result in statistics
plim(Ȳ) = μ
When estimating a population average, the larger n is, the closer the estimate will be to the true population average
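A minimal numerical illustration of the LLN (an addition to the slides; it assumes NumPy and a made-up N(5, 4) population):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 5.0

# Sample averages from ever larger samples settle down around mu.
for n in [10, 100, 1_000, 10_000, 100_000]:
    y_bar = rng.normal(mu, 2.0, size=n).mean()
    print(f"n={n:>6}: Y_bar={y_bar:.4f}, |Y_bar - mu|={abs(y_bar - mu):.4f}")
```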
Statistical Review
Asymptotic Properties of Estimators
– Asymptotic Efficiency
Compares the variance of the asymptotic distribution of two
estimators
A consistent estimator ˆ of  is asymptotically efficient if its
asymptotic variance is smaller than the asymptotic variance of all
other consistent estimators of 
Statistical Review
Asymptotic Properties of Estimators
– Asymptotic Normality
An estimator is said to be asymptotically normally distributed if its
sampling distribution tends to approach the normal distribution
as the sample size increases indefinitely.
The Central Limit Theorem: the average from a random sample from any population with finite variance, when standardized, has an asymptotic normal distribution:
Z = (Ȳ − μ) / (σ/√n) ~ Asy. N(0, 1)
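The CLT is easy to check by simulation. A sketch (an addition to the slides; it assumes NumPy and deliberately uses a skewed, non-normal population, the exponential with mean 1 and variance 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 10_000

# Exponential population: mean 1, variance 1 - clearly not normal.
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))  # standardised averages

print("mean(Z):", round(z.mean(), 3))        # ~ 0
print("var(Z): ", round(z.var(), 3))         # ~ 1
print("P(Z <= 1.96):", (z <= 1.96).mean())   # ~ 0.975 if Z ~ N(0,1)
```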
Statistical Review
Approaches to parameter estimation
– Method of Moments (MM)
Moment: Summary statistic of a population distribution (e.g.
mean, variance)
MM: replaces population moments with sample counterparts
Examples:
Estimate the population mean μ with μ̂ = (1/n) Σi=1..n Yi = Ȳ (unbiased and consistent)
Estimate the population variance σ² with σ̂² = (1/n) Σi=1..n (Yi − Ȳ)² (consistent but biased)
Statistical Review
Approaches to parameter estimation
– Maximum Likelihood Estimation (MLE)
Let {Y1,Y2,……,Yn} be a random sample from a population
distribution defined by the density function f(Y|θ)
The likelihood function is the joint density of the n independently and identically distributed observations:
f(Y1, Y2, ..., Yn | θ) = ∏i=1..n f(Yi | θ) = L(θ; Y)
The log likelihood is given by:
ln L(θ | Y) = Σi=1..n ln f(Yi | θ)
The likelihood principle:
Choose the estimator of θ that maximises the likelihood of
observing the actual sample
MLE is asymptotically the most efficient estimator, but correct specification of the density is required for consistency
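As an illustration, here is a minimal MLE sketch (an addition to the slides; it assumes NumPy/SciPy and a normal population, so the answer can be checked against the closed-form estimates):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=500)   # the observed random sample

def neg_log_likelihood(params):
    mu, log_sigma = params                     # log-sigma keeps sigma > 0
    return -norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_mle, sigma_mle = res.x[0], np.exp(res.x[1])

# For the normal, MLE gives the sample mean and the /n standard deviation.
print(mu_mle, y.mean())
print(sigma_mle, y.std(ddof=0))
```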
Statistical Review
Approaches to parameter estimation
– Least Squares Estimation
Minimise the sum of the squared deviations between the actual
and the sample values
Example: Find the least squares estimator of the population mean:
μ̂ = arg minμ Σi=1..n (Yi − μ)² = arg minμ S(μ)
The least squares, ML and MM estimators of the population mean all coincide with the sample average
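For completeness, the one-line derivation behind this result (spelled out here as an addition to the slide):

```latex
S(\mu) = \sum_{i=1}^{n} (Y_i - \mu)^2, \qquad
\frac{dS}{d\mu} = -2 \sum_{i=1}^{n} (Y_i - \mu) = 0
\;\Longrightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}.
```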
Statistical Review
Interval Estimation and Confidence Intervals
– How do we know how accurate an estimate is?
– A confidence interval estimates a population parameter within a
range of possible values at a specified probability, called the level
of confidence, using information from a known distribution – the
standard normal distribution
– Let {Y1,Y2,……,Yn} be a random sample from a population with a
normal distribution with mean μ and variance σ2: Yi~N(μ,σ2)
The distribution of the sample average will be: Ȳ ~ N(μ, σ²/n)
Standardising: (Ȳ − μ) / (σ/√n) ~ N(0, 1)
– Using what we know about the standard normal distribution we can construct a 95% confidence interval:
Pr(−1.96 ≤ (Ȳ − μ)/(σ/√n) ≤ 1.96) = 0.95
Statistical Review
Interval Estimation and Confidence Intervals
– Re-arranging:
Pr(Ȳ − 1.96·σ/√n ≤ μ ≤ Ȳ + 1.96·σ/√n) = 0.95
What if σ is unknown? Replace it with s, where s² is an unbiased estimator of σ²:
s² = (1/(n − 1)) Σi=1..n (Yi − Ȳ)²
Then: (Ȳ − μ) / (s/√n) ~ t(n−1)
95% confidence interval given by:
[Ȳ − t(n−1, α/2)·s/√n , Ȳ + t(n−1, α/2)·s/√n], with α = 0.05
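A sketch of the t-based interval in code (an addition to the slides; it assumes NumPy/SciPy and made-up N(10, 9) data with n = 25):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(10.0, 3.0, size=25)       # sigma unknown to the analyst

n = len(y)
y_bar, s = y.mean(), y.std(ddof=1)       # ddof=1 gives the unbiased s^2
t_crit = stats.t.ppf(0.975, df=n - 1)    # t_{n-1, alpha/2} with alpha = 0.05

half = t_crit * s / np.sqrt(n)
print(f"95% CI for mu: [{y_bar - half:.3f}, {y_bar + half:.3f}]")
```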
Statistical Review
Hypothesis Testing
– Hypothesis: a statement about a population developed for the purpose of testing
– Hypothesis testing: procedure based on sample evidence and
probability theory to determine whether the hypothesis is a reasonable
statement.
– Steps:
1. State the null (H0) and alternative (HA) hypotheses
Note distinction between one and two-tailed tests
2. State the level of significance
Probability of rejecting H0 when it is true (Type I Error)
Note: Type II Error – failing to reject H0 when it is false
Power of the test: 1-Pr(Type II error)
3. Select a test statistic
Based on sample information; it follows a known distribution
4. Formulate decision rule
Conditions under which null hypothesis is rejected. Based on critical value
from known probability distribution.
5. Compute the value of the test statistic, make a decision, interpret the
results.
Statistical Review
Hypothesis Testing
– P-value:
Alternative means of evaluating decision rule
Probability of observing a sample value as extreme as, or more extreme than, the value observed, given that the null hypothesis is true
• If the p-value is greater than the significance level, H0 is not rejected
• If the p-value is less than the significance level, H0 is rejected
If the p-value is less than:
0.10, we have some evidence that H0 is not true
0.05 we have strong evidence that H0 is not true
0.01 we have very strong evidence that H0 is not true
The Simple Regression Model
Definition of the Simple Regression Model
The population model
Assume linear functional form:
E(Y|Xi) = β0+β1Xi
β0: intercept term or constant
β1: slope coefficient - quantifies the linear relationship between X and Y
Fixed parameters known as regression coefficients
For each Xi, individual observations will vary around E(Y|Xi)
Consider deviation of any individual observation from conditional mean:
ui = Yi - E(Y|Xi)
ui : stochastic disturbance/error term – unobservable random deviation
of an observation from its conditional mean
The Simple Regression Model
Definition of the Simple Regression Model
The linear regression model
Re-arrange previous equation to get:
Yi = E(Y|Xi)+ ui
Each individual observation on Y can be explained in terms of:
E(Y|Xi): mean Y of all individuals with same level of X – systematic or
deterministic component of the model – the part of Y explained by X
ui: random or non-systematic component – includes all omitted
variables that can affect Y
Assuming a linear functional form:
Yi = β0+β1Xi + ui
The Simple Regression Model
Definition of the Simple Regression Model
A note on linearity: Linear in parameters vs. linear in variables
The following is linear in parameters but not in variables:
Yi = β0 + β1Xi² + ui
In some cases transformations are required to make a model linear in
parameters
The Simple Regression Model
Definition of the Simple Regression Model
The linear regression model
Yi = β0+β1Xi + ui
Represents relationship between Y and X in population of data
Using appropriate estimation techniques we use sample data to
estimate values for β0 and β1
β1: measures the ceteris paribus effect of X on Y, i.e., the effect of X holding all other factors fixed
Assume ui fixed so that Δui = 0, then
Δ Yi = β1 Δ Xi
Δ Yi /Δ Xi = β1
ui is unobserved – assumptions about ui are required to estimate the ceteris paribus relationship
The Simple Regression Model
Definition of the Simple Regression Model
The linear regression model: Assumptions about the error term
Assume E(ui) =0: On average the unobservable factors that deviate an
individual observation from the mean are zero
Assume E(ui|Xi) =0: mean of ui conditional on Xi is zero – regardless of
what values Xi takes, the unobservables are on average zero
Zero Conditional Mean Assumption:
E(ui|Xi) = E(ui) = 0
The Simple Regression Model
Definition of the Simple Regression Model
The linear regression model: Notes on the error term
Reasons why an error term will always be required:
– Vagueness of theory
– Unavailability of data
– Measurement error
– Incorrect functional form
– Principle of Parsimony
The Simple Regression Model
Definition of the Simple Regression Model
Statistical Relationship vs. Deterministic Relationship
Regression analysis is concerned with statistical relationships: it deals with random (stochastic) variables and their probability distributions, whose variation can never be completely explained using other variables – there will always be some form of error
The Simple Regression Model
Definition of the Simple Regression Model
Regression vs. Correlation
Correlation analysis: measures the strength or degree of linear
association between two random variables
Regression analysis: estimating the average values of one variable on
the basis of the fixed values of the other variables for the purpose of
prediction.
Explanatory variables are fixed, dependent variables are random or
stochastic.
The Simple Regression Model
Ordinary Least Squares (OLS) Estimation
Estimate the population relationship given by
Yi = β0 + β1Xi + ui
using a random sample of data i=1,….n
Least Squares Principle: Minimise the sum of the squared deviations
between the actual and the sample values.
Define the fitted values as Ŷi = β̂0 + β̂1Xi
OLS minimises: Σi=1..n ûi² = Σi=1..n (Yi − Ŷi)²
The Simple Regression Model
Ordinary Least Squares Estimation
minβ0,β1 Q(β0, β1) = Σi=1..n (Yi − β0 − β1Xi)²
First Order Conditions (Normal Equations):
∂Q(β0, β1)/∂β0 = −2 Σi=1..n (Yi − β̂0 − β̂1Xi) = 0
∂Q(β0, β1)/∂β1 = −2 Σi=1..n Xi(Yi − β̂0 − β̂1Xi) = 0
Solve to find:
β̂0 = Ȳ − β̂1X̄
β̂1 = Σi=1..n (Xi − X̄)(Yi − Ȳ) / Σi=1..n (Xi − X̄)²
Assumptions?
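The two formulas translate directly into code. A sketch (an addition to the slides; it assumes NumPy and a simulated population with β0 = 2 and β1 = 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)   # population: beta0=2, beta1=0.5

# OLS estimates from the normal equations
beta1_hat = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
beta0_hat = y.mean() - beta1_hat * x.mean()

# Fitted values and R^2 = SSE/SST (see the goodness-of-fit slide below)
y_fit = beta0_hat + beta1_hat * x
r2 = ((y_fit - y.mean()) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(beta0_hat, beta1_hat, r2)   # estimates near 2 and 0.5
```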
The Simple Regression Model
Ordinary Least Squares Estimation
Method of Moments Estimator:
Replace population moment conditions with sample counterparts:
E(ui) = E(Yi − β0 − β1Xi) = 0 → (1/n) Σi=1..n (Yi − β̂0 − β̂1Xi) = 0
E(Xi·ui) = E[Xi(Yi − β0 − β1Xi)] = 0 → (1/n) Σi=1..n Xi(Yi − β̂0 − β̂1Xi) = 0
Assumptions?
The Simple Regression Model
Properties of OLS Estimator
Gauss-Markov Theorem
Under the assumptions of the Classical Linear Regression Model the OLS
estimator will be the Best Linear Unbiased Estimator
Linear: estimator is a linear function of a random variable
Unbiased: E ˆ  
 
E ˆ   
0
0
1
1
Best: estimator is most efficient estimator, i.e., estimator has the
minimum variance of all linear unbiased estimators
The Simple Regression Model
Goodness of Fit
How well does regression line ‘fit’ the observations?
R² (coefficient of determination) measures the proportion of the sample variance of Yi explained by the model, where variation is measured as squared deviation from the sample mean:
R² = Σi=1..n (Ŷi − Ȳ)² / Σi=1..n (Yi − Ȳ)² = SSE/SST
Recall: SST = SSE + SSR ⇒ SSE ≤ SST and SSE ≥ 0
⇒ 0 ≤ SSE/SST ≤ 1
If the model perfectly fits the data, SSE = SST and R² = 1
If the model explains none of the variation in Yi, then SSE = 0 since Ŷi = β̂0 = Ȳ, and R² = 0
The Multiple Regression Model
The model with two independent variables
Say we have information on more variables that theory tells us may
influence Y:
Yi = β0 + β1X1i + β2X2i + ui
β0 : measures the average value of Y when X1 and X2 are zero
β1 and β2 are the partial regression coefficients/slope coefficients
which measure the ceteris paribus effect of X1 and X2 on Y, respectively
Key assumption:
E ui | X 1i , X 2i   0
For k independent variables:
Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui
E(ui | X1i, X2i, ..., Xki) = 0
⇒ Cov(ui, X1i) = Cov(ui, X2i) = ... = Cov(ui, Xki) = 0
The Multiple Regression Model
Goodness-of-Fit in the Multiple Regression Model
How well does regression line ‘fit’ the observations?
As in the simple regression model, define:
SST = Total Sum of Squares = Σi=1..n (Yi − Ȳ)²
SSE = Explained Sum of Squares = Σi=1..n (Ŷi − Ȳ)²
SSR = Residual Sum of Squares = Σi=1..n ûi²
R² = Σi=1..n (Ŷi − Ȳ)² / Σi=1..n (Yi − Ȳ)² = SSE/SST = 1 − SSR/SST
Recall: SST = SSE + SSR ⇒ SSE ≤ SST and SSE ≥ 0
⇒ 0 ≤ SSE/SST ≤ 1
R² never decreases as more independent variables are added – use adjusted R²:
R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]
This includes a punishment for adding more variables to the model
The Multiple Regression Model
Properties of OLS Estimator of Multiple Regression Model
Gauss-Markov Theorem
Under certain assumptions known as the Gauss-Markov assumptions
the OLS estimator will be the Best Linear Unbiased Estimator
Linear: estimator is a linear function of the data
Unbiased: E ˆ0   0
 
E ˆ   
k
k
Best: estimator is most efficient estimator, i.e., estimator has the
minimum variance of all linear unbiased estimators
The Multiple Regression Model
Properties of OLS Estimator of Multiple Regression Model
Assumptions required to prove unbiasedness:
A1: Regression model is linear in parameters
A2: X are non-stochastic or fixed in repeated sampling
A3: Zero conditional mean
A4: Sample is random
A5: There is variability in the Xs and no perfect collinearity among the Xs
Assumptions required to prove efficiency:
A6: Homoscedasticity and no autocorrelation
V ui | X 1i , X 2i ,...., X ki    2
Covui , u j   0
Topic 3: The Multiple Regression Model
Estimating the variance of the OLS estimators
Need to know the dispersion (variance) of the sampling distribution of the OLS estimator in order to show that it is efficient (also required for inference)
In the multiple regression model: V(β̂k) = σ² / [SSTk(1 − Rk²)]
Depends on:
a) σ²: the error variance (reduces accuracy of estimates)
b) SSTk: the variation in Xk (increases accuracy of estimates)
c) Rk²: the coefficient of determination from a regression of Xk on all other independent variables (the degree of multicollinearity reduces accuracy of estimates)
What about the variance of the error term, σ²?
σ̂² = (1/(n − k − 1)) Σi=1..n ûi²
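The variance formula can be verified numerically. A sketch (an addition to the slides; it assumes NumPy and a made-up model with two correlated regressors) that computes V(β̂1) both from the formula above and from the matrix form σ̂²(X′X)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                 # correlated regressors
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1] - 1
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k - 1)           # estimate of sigma^2

# Auxiliary regression of x1 on the other regressors gives R^2_1
Z = np.column_stack([np.ones(n), x2])
gamma = np.linalg.solve(Z.T @ Z, Z.T @ x1)
sst1 = ((x1 - x1.mean()) ** 2).sum()
r2_1 = 1 - ((x1 - Z @ gamma) ** 2).sum() / sst1

print(sigma2_hat / (sst1 * (1 - r2_1)))             # textbook formula
print((sigma2_hat * np.linalg.inv(X.T @ X))[1, 1])  # matrix formula, same value
```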
The Multiple Regression Model
Model specification
Inclusion of irrelevant variables:
OLS estimator unbiased but with higher variance if X’s correlated
Exclusion of relevant variables:
Omitted variable bias if variables correlated with variables included in
the estimated model
True model: Yi = β0 + β1X1i + β2X2i + ui, with OLS fit Ŷi = β̂0 + β̂1X1i + β̂2X2i
Estimated model: Ỹi = β̃0 + β̃1X1i
OLS estimator: β̃1 = β̂1 + β̂2·δ̃1, where δ̃1 is the slope from a regression of X2 on X1
Biased: E(β̃1) = β1 + β2·δ̃1
Omitted Variable Bias: Bias(β̃1) = β2·δ̃1
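The bias formula can be checked by simulation. A sketch (an addition to the slides; it assumes NumPy, with β1 = 0.5, β2 = 2 and δ1 = 0.8, so the short regression should centre on 0.5 + 2·0.8 = 2.1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2_000
beta1, beta2, delta1 = 0.5, 2.0, 0.8

b1_tilde = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)      # x2 correlated with x1
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # "short" regression of y on x1 only, omitting x2
    b1_tilde[r] = (((x1 - x1.mean()) * (y - y.mean())).sum()
                   / ((x1 - x1.mean()) ** 2).sum())

print(b1_tilde.mean())          # ~ 2.1 = beta1 + beta2 * delta1
print(b1_tilde.mean() - beta1)  # estimated omitted variable bias ~ 1.6
```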
Inference in the Multiple Regression Model
The Classical Linear Model
Since the β̂'s can be written as a linear function of u, making assumptions about the sampling distribution of u allows us to say something about the sampling distribution of the β̂'s
Assume u is normally distributed:
ui ~ N(0, σ²)
Inference in the Multiple Regression Model
Hypothesis testing about a single population parameter
Assume the following population model follows all CLM assumptions
Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui
OLS produces unbiased estimates but how accurate are they?
Test by constructing hypotheses about population parameters and
using sample estimates and statistical theory to test whether
hypotheses are true
In particular, we are interested in testing whether population parameters are significantly different from zero:
H0: βk = 0
Inference in the Multiple Regression Model
Hypothesis testing about a single population parameter
Two-sided alternative hypothesis
H A : k  0
Large positive and negative values of computed test statistic
inconsistent with null
Reject null if
t ̂  c
k
Example:
H0 : k  0
H A : k  0
df  25
  0.05
threshold is anywhere above or below
97.5th percentile in either tai l of distributi on
c  2.06
Note: if the null is rejected, the variable is said to be 'statistically significant' at the chosen significance level
Inference in the Multiple Regression Model
Hypothesis testing about a single population parameter
P-value approach:
Given the computed t-statistic, what is the smallest significance level
at which the null hypothesis would be rejected?
P-values below 0.05 provide strong evidence against the null
For the two-sided alternative the p-value is given by:
P(|T| > |tβ̂k|) = 2·P(T > |tβ̂k|)
Example:
H0: βk = 0, HA: βk ≠ 0, df = 40, tβ̂k = 1.85
P(|T| > 1.85) = 2·P(T > 1.85) = 2 × 0.0359 = 0.0718
Note distinction between economic vs. statistical significance
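The numbers in the example can be reproduced in a couple of lines (an addition to the slides; it assumes SciPy):

```python
from scipy import stats

t_stat, df = 1.85, 40
p_value = 2 * stats.t.sf(abs(t_stat), df)   # 2 * P(T > |t|)
print(round(p_value, 4))                    # ~ 0.0718, as in the example above
```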
Inference in the Multiple Regression Model
Testing a hypothesis about a single linear combination of parameters
Consider the following model:
Yi  0  1 X1i   2 X 2i  3 X 3i  ui
We wish to test whether X1 and X2 have the same effect on Y
or
H 0 : 1   2  0
H 0 : 1   2
H A : 1   2
Construct statistic as before but standardize difference between
parameters
t = (β̂1 − β̂2) / se(β̂1 − β̂2) ~ t(n−k−1)
Estimate:
Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2·Cov(β̂1, β̂2)
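A sketch of the whole procedure (an addition to the slides; it assumes NumPy/SciPy and simulated data in which β1 = β2 holds by construction, so the test should not reject):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 + 0.5 * x2 + 1.0 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])
k = X.shape[1] - 1
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
V = (resid @ resid / (n - k - 1)) * np.linalg.inv(X.T @ X)  # est. Var(beta_hat)

diff = beta_hat[1] - beta_hat[2]                  # beta1_hat - beta2_hat
se_diff = np.sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])
t = diff / se_diff
p = 2 * stats.t.sf(abs(t), df=n - k - 1)
print(t, p)                                       # large p: do not reject H0
```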
Topic 4: Inference in the Multiple Regression Model
Testing hypotheses about multiple linear restrictions
General model:
Yi  0  1 X1i   2 X 2i  .....   k X ki  ui
We wish to test J exclusion restrictions:
H0: βk−J+1 = 0, βk−J+2 = 0, ..., βk = 0
Restricted model:
Yi = β0 + β1X1i + β2X2i + ... + βk−J Xk−J,i + ui
Estimate both models and compute either:
F = [(SSRr − SSRur) / J] / [SSRur / (n − k − 1)] ~ F(J, n−k−1)
Or:
F = [(R²ur − R²r) / J] / [(1 − R²ur) / (n − k − 1)] ~ F(J, n−k−1)
Large values are inconsistent with the null
Note: degrees of freedom for the numerator and denominator!
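Both forms of the F statistic are mechanical to compute once the two models are estimated. A sketch (an addition to the slides; it assumes NumPy/SciPy, k = 3 regressors and J = 2 exclusion restrictions that are true by construction):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 150
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 + rng.normal(size=n)        # beta2 = beta3 = 0 in truth

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

X_ur = np.column_stack([np.ones(n), x1, x2, x3])   # unrestricted model
X_r = np.column_stack([np.ones(n), x1])            # restricted: beta2 = beta3 = 0

J, k = 2, 3
F = ((ssr(X_r, y) - ssr(X_ur, y)) / J) / (ssr(X_ur, y) / (n - k - 1))
p = stats.f.sf(F, J, n - k - 1)
print(F, p)   # restrictions true here, so F should be small and p large
```

Setting X_r to the constant column alone gives the overall significance test of the next slide.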
Topic 4: Inference in the Multiple Regression Model
Overall test for significance of the Regression
General model:
Yi  0  1 X1i   2 X 2i  .....   k X ki  ui
Test of the null hypothesis that all variables except the intercept are insignificant:
H0: β1 = 0, β2 = 0, ..., βk = 0
Test statistic:
F = [R² / k] / [(1 − R²) / (n − k − 1)] ~ F(k, n−k−1)
Here R² = R²ur and R²r = 0
Large values are inconsistent with the null