Intermediate Econometrics
Nguyễn Ngọc Anh, Nguyễn Hà Trang
DEPOCEN
Hanoi, 25 May 2012

Planned Course Content
• 5 lectures and 5 practical sessions
  – Simple and multiple regression model review
  – IV models
  – Discrete choice models – 1
    • Random utility model
    • Logit/probit
    • Multinomial logit
  – Discrete choice models – 2
    • Ordinal choice model
    • Poisson model
  – Panel data

Statistical Review
Populations, Parameters and Random Sampling
– Use statistical inference to learn something about a population
– Population: the complete group of agents
– Typically we only observe a sample of data
– Random sampling: drawing random samples from a population
– Assume we know everything about the distribution of the population except for one parameter
– Use statistical tools to say something about the unknown parameter
• Estimation and hypothesis testing

Statistical Review
Estimators and Estimates
– Given a random sample drawn from a population distribution that depends on an unknown parameter $\theta$, an estimator of $\theta$ is a rule that assigns each possible outcome of the sample a value of $\theta$
– Examples:
  • Estimator for the population mean
  • Estimator for the variance of the population distribution
– An estimator is given by some function of the random variables in the sample
– It yields a (point) estimate, which is itself a random variable
– The distribution of the estimator is its sampling distribution
– We need criteria for selecting among estimators

Statistical Review
Finite sample properties of estimators
– Unbiasedness
  An estimator $\hat{\theta}$ of $\theta$ is unbiased if $E(\hat{\theta}) = \theta$ for all values of $\theta$, i.e., on average the estimator is correct.
  If not unbiased, the extent of the bias is measured as $\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$.
  The extent of the bias depends on the underlying distribution of the population and the estimator that is used.
  Choose the estimator that minimises the bias.

Statistical Review
Finite sample properties of estimators
– Efficiency
  What about the dispersion of the distribution of the estimator, i.e., how likely is it that the estimate is close to the true parameter?
  A useful summary measure of the dispersion in the distribution is the sampling variance.
  An efficient estimator is one which has the least dispersion about the mean, i.e., the one that has the smallest sampling variance.
  If $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of $\theta$, $\hat{\theta}_1$ is efficient relative to $\hat{\theta}_2$ when $V(\hat{\theta}_1) \le V(\hat{\theta}_2)$ for all $\theta$, with strict inequality for at least one value of $\theta$.

Statistical Review
Finite sample properties of estimators
– Efficiency
  What if estimators are not unbiased? The estimator with the lowest mean squared error (MSE) is more efficient:
  $\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$
  Example: compare the small-sample properties of the following estimators of the population mean:
  $\hat{\mu}_1 = \frac{1}{n}\sum_{i=1}^{n} Y_i$  and  $\hat{\mu}_2 = \frac{1}{4n}\sum_{i=1}^{n} Y_i$

Statistical Review
Asymptotic properties of estimators
– How do estimators behave if we have very large samples, as n increases to infinity?
– Consistency
  How far is the estimator likely to be from the parameter it is estimating as the sample size increases indefinitely?
  $\hat{\theta}$ is a consistent estimator of $\theta$ if for every $\varepsilon > 0$:
  $P(|\hat{\theta} - \theta| > \varepsilon) \to 0$ as $n \to \infty$
  This is known as convergence in probability. It can also be written as $\text{plim}\,\hat{\theta} = \theta$.

Statistical Review
Asymptotic properties of estimators
– Consistency (continued)
  A sufficient condition for consistency is that the bias and the variance both tend to zero as the sample size increases indefinitely, that is, $\text{MSE}(\hat{\theta}) \to 0$ as $n \to \infty$.
  Law of Large Numbers: an important result in statistics: $\text{plim}\,\bar{Y} = \mu$.
  When estimating a population average, the larger n is, the closer the estimate will be to the true population average; a short simulation sketch follows below.
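A minimal Python sketch of consistency and the MSE example above (the N(5, 4) population and the sample sizes are illustrative assumptions, not from the slides): the sample mean $\hat{\mu}_1 = \bar{Y}$ settles down on $\mu$ as n grows, while $\hat{\mu}_2 = \frac{1}{4n}\sum Y_i = \bar{Y}/4$ is biased and converges to $\mu/4$, so it is not consistent for $\mu$.

```python
import numpy as np

# Illustrative assumptions: Y ~ N(mu=5, sigma^2=4); sample sizes are arbitrary.
rng = np.random.default_rng(42)
mu, sigma = 5.0, 2.0

for n in (10, 100, 10_000, 1_000_000):
    y = rng.normal(mu, sigma, size=n)
    mu_hat1 = y.sum() / n        # sample mean: unbiased and consistent
    mu_hat2 = y.sum() / (4 * n)  # biased estimator: converges to mu/4, not mu
    print(f"n={n:>9,}  mu_hat1={mu_hat1:.4f}  mu_hat2={mu_hat2:.4f}")
```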
Statistical Review
Asymptotic properties of estimators
– Asymptotic efficiency
  Compares the variances of the asymptotic distributions of two estimators.
  A consistent estimator $\hat{\theta}$ of $\theta$ is asymptotically efficient if its asymptotic variance is smaller than the asymptotic variance of all other consistent estimators of $\theta$.

Statistical Review
Asymptotic properties of estimators
– Asymptotic normality
  An estimator is said to be asymptotically normally distributed if its sampling distribution tends to approach the normal distribution as the sample size increases indefinitely.
  The Central Limit Theorem: the average from a random sample of any population with finite variance, when standardized, has an asymptotic standard normal distribution:
  $Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim \text{Asy. } N(0, 1)$

Statistical Review
Approaches to parameter estimation
– Method of Moments (MM)
  Moment: a summary statistic of a population distribution (e.g. mean, variance).
  MM replaces population moments with their sample counterparts.
  Examples:
  Estimate the population mean $\mu$ with $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}$ (unbiased and consistent)
  Estimate the population variance $\sigma^2$ with $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ (consistent but biased)

Statistical Review
Approaches to parameter estimation
– Maximum Likelihood Estimation (MLE)
  Let $\{Y_1, Y_2, \ldots, Y_n\}$ be a random sample from a population distribution defined by the density function $f(Y \mid \theta)$.
  The likelihood function is the joint density of the n independently and identically distributed observations:
  $L(\theta; Y) = f(Y_1, Y_2, \ldots, Y_n \mid \theta) = \prod_{i=1}^{n} f(Y_i \mid \theta)$
  The log likelihood is given by:
  $\ln L(\theta \mid Y) = \sum_{i=1}^{n} \ln f(Y_i \mid \theta)$
  The likelihood principle: choose the estimator of $\theta$ that maximises the likelihood of observing the actual sample.
  MLE is the most efficient estimator, but correct specification is required for consistency.

Statistical Review
Approaches to parameter estimation
– Least Squares Estimation
  Minimise the sum of the squared deviations between the actual and the fitted values.
  Example: the least squares estimator of the population mean:
  $\hat{\mu} = \arg\min_{\mu} S = \arg\min_{\mu} \sum_{i=1}^{n} (Y_i - \mu)^2$
  The least squares, ML and MM estimators of the population mean coincide: all three equal the sample average.

Statistical Review
Interval Estimation and Confidence Intervals
– How do we know how accurate an estimate is?
– A confidence interval estimates a population parameter within a range of possible values at a specified probability, called the level of confidence, using information from a known distribution such as the standard normal distribution.
– Let $\{Y_1, Y_2, \ldots, Y_n\}$ be a random sample from a population with a normal distribution with mean $\mu$ and variance $\sigma^2$: $Y_i \sim N(\mu, \sigma^2)$.
  The distribution of the sample average will be $\bar{Y} \sim N(\mu, \sigma^2/n)$.
  Standardising: $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
– Using what we know about the standard normal distribution we can construct a 95% confidence interval:
  $\Pr\left(-1.96 \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$

Statistical Review
Interval Estimation and Confidence Intervals
– Re-arranging:
  $\Pr\left(\bar{Y} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{Y} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$
– What if $\sigma$ is unknown? Use an unbiased estimator of $\sigma^2$:
  $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$, so that $\frac{\bar{Y} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
– The 95% confidence interval is then given by:
  $\left[\bar{Y} - t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}},\; \bar{Y} + t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}\right]$
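A short sketch of the t-based interval (the sample here is simulated under assumed values n = 25, $\mu$ = 10, $\sigma$ = 3; `scipy.stats.t.ppf` supplies the $t_{n-1,\alpha/2}$ critical value):

```python
import numpy as np
from scipy import stats

# Illustrative assumption: a sample of n = 25 draws from N(mu=10, sigma=3).
rng = np.random.default_rng(1)
y = rng.normal(10.0, 3.0, size=25)

n = len(y)
y_bar = y.mean()
s = y.std(ddof=1)                      # unbiased variance estimator uses n - 1
t_crit = stats.t.ppf(0.975, df=n - 1)  # t_{n-1, alpha/2} for a 95% interval

lo = y_bar - t_crit * s / np.sqrt(n)
hi = y_bar + t_crit * s / np.sqrt(n)
print(f"95% CI for mu: [{lo:.3f}, {hi:.3f}]")
```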
Statistical Review
Hypothesis Testing
– Hypothesis: a statement about a population developed for the purpose of testing
– Hypothesis testing: a procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement
– Steps:
  1. State the null (H0) and alternative (HA) hypotheses
     Note the distinction between one- and two-tailed tests
  2. State the level of significance
     Probability of rejecting H0 when it is true (Type I error)
     Note: Type II error – failing to reject H0 when it is false
     Power of the test: 1 − Pr(Type II error)
  3. Select a test statistic
     Based on sample information; follows a known distribution
  4. Formulate a decision rule
     The conditions under which the null hypothesis is rejected, based on a critical value from a known probability distribution
  5. Compute the value of the test statistic, make a decision, interpret the results

Statistical Review
Hypothesis Testing
– P-value: an alternative means of evaluating the decision rule
  The probability of observing a sample value as extreme as, or more extreme than, the value observed when the null hypothesis is true
  • If the p-value is greater than the significance level, H0 is not rejected
  • If the p-value is less than the significance level, H0 is rejected
  If the p-value is less than:
  0.10, we have some evidence that H0 is not true
  0.05, we have strong evidence that H0 is not true
  0.01, we have very strong evidence that H0 is not true

The Simple Regression Model
1. Definition of the Simple Regression Model
The population model
Assume a linear functional form: $E(Y \mid X_i) = \beta_0 + \beta_1 X_i$
$\beta_0$: intercept term or constant
$\beta_1$: slope coefficient – quantifies the linear relationship between X and Y
These are fixed parameters known as regression coefficients.
For each $X_i$, individual observations will vary around $E(Y \mid X_i)$.
Consider the deviation of any individual observation from the conditional mean:
$u_i = Y_i - E(Y \mid X_i)$
$u_i$: stochastic disturbance/error term – the unobservable random deviation of an observation from its conditional mean

The Simple Regression Model
Definition of the Simple Regression Model
The linear regression model
Re-arrange the previous equation to get: $Y_i = E(Y \mid X_i) + u_i$
Each individual observation on Y can be explained in terms of:
$E(Y \mid X_i)$: the mean Y of all individuals with the same level of X – the systematic or deterministic component of the model – the part of Y explained by X
$u_i$: the random or non-systematic component – includes all omitted variables that can affect Y
Assuming a linear functional form: $Y_i = \beta_0 + \beta_1 X_i + u_i$

The Simple Regression Model
Definition of the Simple Regression Model
A note on linearity: linear in parameters vs. linear in variables.
The following is linear in parameters but not in variables: $Y_i = \beta_0 + \beta_1 X_i^2 + u_i$
In some cases transformations are required to make a model linear in parameters.

The Simple Regression Model
Definition of the Simple Regression Model
The linear regression model: $Y_i = \beta_0 + \beta_1 X_i + u_i$
It represents the relationship between Y and X in the population of data.
Using appropriate estimation techniques we use sample data to estimate values for $\beta_0$ and $\beta_1$.
$\beta_1$ measures the ceteris paribus effect of X on Y only if all other factors are fixed and do not change.
Assume $u_i$ is fixed so that $\Delta u_i = 0$; then $\Delta Y_i = \beta_1 \Delta X_i$, i.e. $\Delta Y_i / \Delta X_i = \beta_1$.
Since $u_i$ is unknown, we require assumptions about $u_i$ to estimate the ceteris paribus relationship. A small simulation of this population model follows below.
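To make the population model concrete, here is a minimal simulated data-generating process (the values $\beta_0 = 2$, $\beta_1 = 0.5$ and the distributions are illustrative assumptions, not from the slides):

```python
import numpy as np

# Illustrative assumptions: beta0 = 2, beta1 = 0.5, u ~ N(0, 1) independent of X.
rng = np.random.default_rng(7)
beta0, beta1 = 2.0, 0.5

n = 1_000
x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)   # E(u) = 0 and E(u | x) = 0 by construction
y = beta0 + beta1 * x + u      # Y_i = beta0 + beta1 * X_i + u_i

# With u held fixed, a one-unit change in X moves Y by beta1 = 0.5.
print(y[:5])
```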
The Simple Regression Model
Definition of the Simple Regression Model
The linear regression model: assumptions about the error term
Assume $E(u_i) = 0$: on average, the unobservable factors that deviate an individual observation from the mean are zero.
Assume $E(u_i \mid X_i) = 0$: the mean of $u_i$ conditional on $X_i$ is zero – regardless of what values $X_i$ takes, the unobservables are on average zero.
Zero Conditional Mean Assumption: $E(u_i \mid X_i) = E(u_i) = 0$

The Simple Regression Model
Definition of the Simple Regression Model
The linear regression model: notes on the error term
Reasons why an error term will always be required:
– Vagueness of theory
– Unavailability of data
– Measurement error
– Incorrect functional form
– Principle of parsimony

The Simple Regression Model
Definition of the Simple Regression Model
Statistical relationship vs. deterministic relationship
Regression analysis is concerned with statistical relationships: it deals with random or stochastic variables and their probability distributions, the variation in which can never be completely explained using other variables – there will always be some form of error.

The Simple Regression Model
Definition of the Simple Regression Model
Regression vs. correlation
Correlation analysis measures the strength or degree of linear association between two random variables.
Regression analysis estimates the average value of one variable on the basis of the fixed values of the other variables, for the purpose of prediction. Explanatory variables are fixed; the dependent variable is random or stochastic.

The Simple Regression Model
Ordinary Least Squares (OLS) Estimation
Estimate the population relationship $Y_i = \beta_0 + \beta_1 X_i + u_i$ using a random sample of data $i = 1, \ldots, n$.
Least Squares Principle: minimise the sum of the squared deviations between the actual and the fitted values.
Define the fitted values as $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$.
OLS minimises $\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

The Simple Regression Model
Ordinary Least Squares Estimation
$\min_{\beta_0, \beta_1} Q(\beta_0, \beta_1) = \min_{\beta_0, \beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$
First Order Conditions (Normal Equations):
$\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$
$\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$
Solve to find:
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
Assumptions?

The Simple Regression Model
Ordinary Least Squares Estimation
Method of Moments estimator: replace the population moment conditions with their sample counterparts:
$E(u_i) = E(Y_i - \beta_0 - \beta_1 X_i) = 0 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$
$E(X_i u_i) = E[X_i (Y_i - \beta_0 - \beta_1 X_i)] = 0 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$
Assumptions?

The Simple Regression Model
Properties of the OLS Estimator
Gauss–Markov Theorem
Under the assumptions of the Classical Linear Regression Model, the OLS estimator is the Best Linear Unbiased Estimator (BLUE).
Linear: the estimator is a linear function of a random variable
Unbiased: $E(\hat{\beta}_0) = \beta_0$, $E(\hat{\beta}_1) = \beta_1$
Best: the estimator is the most efficient estimator, i.e., it has the minimum variance of all linear unbiased estimators. A worked OLS computation follows below.
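A minimal sketch of the closed-form OLS formulas above, reusing the simulated data idea from earlier (the data-generating values are again illustrative assumptions):

```python
import numpy as np

# Illustrative DGP: Y = 2 + 0.5*X + u, u ~ N(0, 1).
rng = np.random.default_rng(7)
n = 1_000
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

# OLS slope and intercept from the normal equations:
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
print(f"beta0_hat = {b0_hat:.3f}, beta1_hat = {b1_hat:.3f}")  # close to 2 and 0.5
```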
The Simple Regression Model
Goodness of Fit
How well does the regression line 'fit' the observations?
R² (the coefficient of determination) measures the proportion of the sample variance of $Y_i$ explained by the model, where variation is measured as squared deviations from the sample mean:
$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = \frac{SSE}{SST}$
Recall: SST = SSE + SSR
Since $SSE \le SST$ and $SSE \ge 0$, we have $0 \le SSE/SST \le 1$.
If the model perfectly fits the data, SSE = SST and $R^2 = 1$.
If the model explains none of the variation in $Y_i$, then SSE = 0 since $\hat{Y}_i = \bar{Y}$, and $R^2 = 0$.

The Multiple Regression Model
The model with two independent variables
Say we have information on more variables that theory tells us may influence Y:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$
$\beta_0$: measures the average value of Y when $X_1$ and $X_2$ are zero
$\beta_1$ and $\beta_2$ are the partial regression coefficients/slope coefficients, which measure the ceteris paribus effects of $X_1$ and $X_2$ on Y, respectively
Key assumption: $E(u_i \mid X_{1i}, X_{2i}) = 0$
For k independent variables:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$
$E(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = 0$
$\text{Cov}(u_i, X_{1i}) = \text{Cov}(u_i, X_{2i}) = \cdots = \text{Cov}(u_i, X_{ki}) = 0$

The Multiple Regression Model
4. Goodness-of-Fit in the Multiple Regression Model
How well does the regression line 'fit' the observations? As in the simple regression model, define:
SST = Total Sum of Squares: $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
SSE = Explained Sum of Squares: $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
SSR = Residual Sum of Squares: $\sum_{i=1}^{n} \hat{u}_i^2$
$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$
Recall: SST = SSE + SSR; since $SSE \le SST$ and $SSE \ge 0$, $0 \le SSE/SST \le 1$.
R² never decreases as more independent variables are added – use the adjusted R², which includes a punishment for adding more variables to the model:
$\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)}$

The Multiple Regression Model
5. Properties of the OLS Estimator of the Multiple Regression Model
Gauss–Markov Theorem
Under certain assumptions known as the Gauss–Markov assumptions, the OLS estimator is the Best Linear Unbiased Estimator.
Linear: the estimator is a linear function of the data
Unbiased: $E(\hat{\beta}_0) = \beta_0, \ldots, E(\hat{\beta}_k) = \beta_k$
Best: the estimator is the most efficient estimator, i.e., it has the minimum variance of all linear unbiased estimators.

The Multiple Regression Model
Properties of the OLS Estimator of the Multiple Regression Model
Assumptions required to prove unbiasedness:
A1: The regression model is linear in parameters
A2: The Xs are non-stochastic or fixed in repeated sampling
A3: Zero conditional mean
A4: The sample is random
A5: There is variability in the Xs and no perfect collinearity among the Xs
Assumptions required to prove efficiency:
A6: Homoscedasticity and no autocorrelation: $V(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = \sigma^2$ and $\text{Cov}(u_i, u_j) = 0$ for $i \ne j$

Topic 3: The Multiple Regression Model
Estimating the variance of the OLS estimators
We need to know the dispersion (variance) of the sampling distribution of the OLS estimator in order to show that it is efficient (it is also required for inference).
In the multiple regression model:
$V(\hat{\beta}_k) = \frac{\sigma^2}{SST_k (1 - R_k^2)}$
This depends on:
a) $\sigma^2$: the error variance (reduces the accuracy of estimates)
b) $SST_k$: the variation in $X_k$ (increases the accuracy of estimates)
c) $R_k^2$: the coefficient of determination from a regression of $X_k$ on all other independent variables (the degree of multicollinearity reduces the accuracy of estimates)
What about the variance of the error terms, $\sigma^2$? Estimate it as:
$\hat{\sigma}^2 = \frac{1}{n-k-1} \sum_{i=1}^{n} \hat{u}_i^2$
A sketch of these variance formulas follows below.
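A hedged sketch of the variance formula for one slope in a two-regressor model (the DGP, including the correlation built in between the regressors, is an illustrative assumption):

```python
import numpy as np

# Illustrative DGP with two correlated regressors.
rng = np.random.default_rng(3)
n, k = 500, 2
x1 = rng.normal(0, 1, size=n)
x2 = 0.6 * x1 + rng.normal(0, 1, size=n)   # correlation induces multicollinearity
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0, 1, size=n)

# OLS via the normal equations (X includes a constant).
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)   # sigma^2 hat = SSR / (n - k - 1)

# V(b1) = sigma^2 / (SST_1 * (1 - R_1^2)), R_1^2 from regressing x1 on x2.
sst1 = np.sum((x1 - x1.mean()) ** 2)
W = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(W, x1, rcond=None)   # auxiliary regression of x1 on x2
x1_hat = W @ g
r2_1 = np.sum((x1_hat - x1.mean()) ** 2) / sst1
var_b1 = sigma2_hat / (sst1 * (1 - r2_1))
print(f"b1 = {b[1]:.3f}, se(b1) = {np.sqrt(var_b1):.4f}")
```

The same value appears on the diagonal of $\hat{\sigma}^2 (X'X)^{-1}$; the formula simply isolates how error variance, regressor variation, and multicollinearity each enter.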
The Multiple Regression Model
Model specification
Inclusion of irrelevant variables: the OLS estimator remains unbiased but has a higher variance if the Xs are correlated.
Exclusion of relevant variables: omitted variable bias arises if the omitted variables are correlated with variables included in the estimated model.
True model: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$
Estimated model: $\tilde{Y}_i = \tilde{\beta}_0 + \tilde{\beta}_1 X_{1i}$
OLS estimator: $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from a regression of $X_2$ on $X_1$
Biased: $E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1$
Omitted variable bias: $\text{Bias}(\tilde{\beta}_1) = \beta_2 \tilde{\delta}_1$

Inference in the Multiple Regression Model
The Classical Linear Model
Since the $\hat{\beta}$'s can be written as linear functions of u, making assumptions about the sampling distribution of u allows us to say something about the sampling distribution of the $\hat{\beta}$'s.
Assume u is normally distributed: $u_i \sim N(0, \sigma^2)$

Inference in the Multiple Regression Model
Hypothesis testing about a single population parameter
Assume the following population model satisfies all CLM assumptions:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$
OLS produces unbiased estimates, but how accurate are they?
Test by constructing hypotheses about population parameters and using sample estimates and statistical theory to test whether the hypotheses are true.
In particular, we are interested in testing whether population parameters are significantly different from zero: $H_0: \beta_k = 0$

Inference in the Multiple Regression Model
Hypothesis testing about a single population parameter
Two-sided alternative hypothesis: $H_A: \beta_k \ne 0$
Large positive and negative values of the computed test statistic are inconsistent with the null.
Reject the null if $|t_{\hat{\beta}_k}| > c$
Example: $H_0: \beta_k = 0$, $H_A: \beta_k \ne 0$, df = 25, $\alpha = 0.05$: the threshold is anywhere above or below the 97.5th percentile in either tail of the distribution, so $c = 2.06$.
Note: if the null is rejected, the variable is said to be 'statistically significant' at the chosen significance level.

Inference in the Multiple Regression Model
Hypothesis testing about a single population parameter
P-value approach: given the computed t-statistic, what is the smallest significance level at which the null hypothesis would be rejected?
P-values below 0.05 provide strong evidence against the null.
For a two-sided alternative the p-value is given by:
$P(|T| > |t_{\hat{\beta}_k}|) = 2 \cdot P(T > |t_{\hat{\beta}_k}|)$
Example: $H_0: \beta_k = 0$, $H_A: \beta_k \ne 0$, df = 40, $t_{\hat{\beta}_k} = 1.85$:
$P(|T| > 1.85) = 2 \cdot P(T > 1.85) = 2 \times 0.0359 = 0.0718$
Note the distinction between economic and statistical significance.

Inference in the Multiple Regression Model
Testing hypotheses about a single linear combination of parameters
Consider the following model:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$
We wish to test whether $X_1$ and $X_2$ have the same effect on Y:
$H_0: \beta_1 - \beta_2 = 0$ (i.e. $H_0: \beta_1 = \beta_2$), $H_A: \beta_1 \ne \beta_2$
Construct the statistic as before, but standardize the difference between the parameters:
$t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{\text{se}(\hat{\beta}_1 - \hat{\beta}_2)} \sim t_{n-k-1}$
Estimate: $\text{Var}(\hat{\beta}_1 - \hat{\beta}_2) = \text{Var}(\hat{\beta}_1) + \text{Var}(\hat{\beta}_2) - 2\,\text{Cov}(\hat{\beta}_1, \hat{\beta}_2)$

Topic 4: Inference in the Multiple Regression Model
5. Testing hypotheses about multiple linear restrictions
General model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$
We wish to test J exclusion restrictions:
$H_0: \beta_{k-J+1} = 0, \beta_{k-J+2} = 0, \ldots, \beta_k = 0$
Restricted model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{k-J} X_{(k-J)i} + u_i$
Estimate both models and compute either:
$F = \frac{(SSR_r - SSR_{ur})/J}{SSR_{ur}/(n-k-1)} \sim F_{J,\,n-k-1}$
or:
$F = \frac{(R_{ur}^2 - R_r^2)/J}{(1 - R_{ur}^2)/(n-k-1)} \sim F_{J,\,n-k-1}$
Large values are inconsistent with the null.
Note the degrees of freedom for the numerator and denominator. A worked computation follows below.
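A minimal numerical sketch of the SSR form of the F statistic (the data are simulated under assumed values; the example tests J = 2 exclusion restrictions in a model with k = 3 regressors):

```python
import numpy as np
from scipy import stats

def ssr(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    return u @ u

# Illustrative DGP: x2 and x3 truly matter, so H0: beta2 = beta3 = 0 is false.
rng = np.random.default_rng(11)
n, k, J = 200, 3, 2
x = rng.normal(size=(n, k))
y = 1.0 + 0.5 * x[:, 0] + 0.3 * x[:, 1] - 0.4 * x[:, 2] + rng.normal(size=n)

X_ur = np.column_stack([np.ones(n), x])       # unrestricted: all k regressors
X_r = np.column_stack([np.ones(n), x[:, 0]])  # restricted: drops x2 and x3

F = ((ssr(y, X_r) - ssr(y, X_ur)) / J) / (ssr(y, X_ur) / (n - k - 1))
p = stats.f.sf(F, J, n - k - 1)               # upper-tail p-value
print(f"F = {F:.2f}, p-value = {p:.4f}")
```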
Topic 4: Inference in the Multiple Regression Model
6. Overall test for significance of the Regression
General model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$
Test of the null hypothesis that all variables except the intercept are insignificant:
$H_0: \beta_1 = 0, \beta_2 = 0, \ldots, \beta_k = 0$
Test statistic:
$F = \frac{R^2 / k}{(1 - R^2)/(n-k-1)} \sim F_{k,\,n-k-1}$
Here $R^2 = R_{ur}^2$ and $R_r^2 = 0$, since the restricted model contains only the intercept.
Large values are inconsistent with the null.
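This is the R² form of the previous F statistic with the restricted model containing only a constant (so $R_r^2 = 0$). A short sketch, reusing the simulated data pattern from the exclusion-test example above:

```python
import numpy as np
from scipy import stats

# Illustrative DGP reused from the exclusion-test sketch above.
rng = np.random.default_rng(11)
n, k = 200, 3
x = rng.normal(size=(n, k))
y = 1.0 + 0.5 * x[:, 0] + 0.3 * x[:, 1] - 0.4 * x[:, 2] + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ b
r2 = 1 - (u @ u) / np.sum((y - y.mean()) ** 2)  # R^2 = 1 - SSR/SST

F = (r2 / k) / ((1 - r2) / (n - k - 1))         # overall significance F
p = stats.f.sf(F, k, n - k - 1)
print(f"R^2 = {r2:.3f}, F = {F:.2f}, p-value = {p:.2e}")
```

A large F with a tiny p-value rejects the joint null that all slope coefficients are zero.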