Topics in Microeconometrics
William Greene
Department of Economics
Stern School of Business

Descriptive Statistics and Linear Regression

Model Building in Econometrics
• Parameterizing the model
   • Nonparametric analysis
   • Semiparametric analysis
   • Parametric analysis
• Sharpness of inferences follows from the strength of the assumptions

A Model Relating (Log)Wage to Gender and Experience

Application: Is there a relationship between investment and capital stock?

Nonparametric Regression
Kernel regression of y on x

Semiparametric Regression
Least absolute deviations regression of y on x

Parametric Regression
Least squares – maximum likelihood – regression of y on x

Cornwell and Rupert Panel Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP   = work experience
WKS   = weeks worked
OCC   = occupation, 1 if blue collar
IND   = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA  = 1 if resides in a city (SMSA)
MS    = 1 if married
FEM   = 1 if female
UNION = 1 if wage set by union contract
ED    = years of education
BLK   = 1 if individual is black
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.

A First Look at the Data: Descriptive Statistics
• Basic Measures of Location and Dispersion
• Graphical Devices
   • Histogram
   • Kernel Density Estimator

Histogram for LWAGE

The kernel density estimator is a histogram (of sorts).
$$\hat{f}(x_m^*) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{B}\,K\!\left[\frac{x_i - x_m^*}{B}\right], \quad \text{for a set of points } x_m^*$$
B = "bandwidth" chosen by the analyst
K = the kernel function, such as the normal or logistic pdf (or one of several others)
x* = the point at which the density is approximated.
This is essentially a histogram with small bins.

Kernel Estimator for LWAGE

Kernel Density Estimator: The curse of dimensionality
$$\hat{f}(x_m^*) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{B}\,K\!\left[\frac{x_i - x_m^*}{B}\right], \quad \text{for a set of points } x_m^*$$
B = "bandwidth"
K = the kernel function
x* = the point at which the density is approximated.
$\hat{f}(x^*)$ is an estimator of $f(x^*)$:
$$\hat{f}(x^*) = \frac{1}{n}\sum_{i=1}^{n} Q(x_i \mid x^*) = \bar{Q}(x^*).$$
But $\mathrm{Var}[\bar{Q}(x^*)] \neq \frac{1}{N} \times \text{something}$. Rather, $\mathrm{Var}[\bar{Q}(x^*)] = \frac{1}{N^{3/5}} \times \text{something}$.
I.e., $\hat{f}(x^*)$ does not converge to $f(x^*)$ at the same rate as a mean converges to a population mean.

Objective: Impact of Education on (log) wage
• Specification: What is the right model to use to analyze this association?
• Estimation
• Inference
• Analysis

Simple Linear Regression
LWAGE = 5.8388 + 0.0652*ED

Multiple Regression

Specification: Quadratic Effect of Experience

Partial Effects
Education:  .05544
Experience: .04062 – 2*.00068*Exp
FEM:        –.37522

Model Implication: Effect of Experience and Male vs. Female

Hypothesis Test About Coefficients
• Hypothesis
   • Null: Restriction on β: Rβ – q = 0
   • Alternative: Not the null
• Approaches
   • Fitting Criterion: $R^2$ decrease under the null?
   • Wald: Rb – q close to 0 under the alternative?

Hypotheses
All Coefficients = 0?
   R = [ 0 | I ], q = [0]
ED Coefficient = 0?
   R = [0,1,0,0,0,0,0,0,0,0,0,0], q = 0
No Experience effect?
   R = [0,0,1,0,0,0,0,0,0,0,0,0
        0,0,0,1,0,0,0,0,0,0,0,0], q = [0, 0]'

Hypothesis Test Statistics
Subscript 0 = the model under the null hypothesis
Subscript 1 = the model under the alternative hypothesis
1.
Based on the Fitting Criterion $R^2$:
$$F = \frac{(R_1^2 - R_0^2)/J}{(1 - R_1^2)/(N - K_1)} = F[J, N - K_1]$$
2. Based on the Wald Distance. Note, for linear models, W = JF.
$$\text{Chi Squared} = (Rb - q)'\left[R\,s^2 (X_1'X_1)^{-1} R'\right]^{-1}(Rb - q)$$

Hypothesis: All Coefficients Equal Zero
All Coefficients = 0?
   R = [0 | I], q = [0]
   $R_1^2$ = .42645, $R_0^2$ = .00000
   F = 280.7 with [11, 4153] degrees of freedom
   Wald = $b_{2\text{-}12}'[V_{2\text{-}12}]^{-1}b_{2\text{-}12}$ = 3087.83355
   Note that Wald = JF = 11(280.7)

Hypothesis: Education Effect = 0
ED Coefficient = 0?
   R = [0,1,0,0,0,0,0,0,0,0,0,0], q = 0
   $R_1^2$ = .42645, $R_0^2$ = .36355 (not shown)
   F = 455.396
   Wald = $(.05544 - 0)^2/(.0026)^2$ = 455.396
   Note that F = $t^2$ and Wald = F for a single hypothesis about 1 coefficient.

Hypothesis: Experience Effect = 0
No Experience effect?
   R = [0,0,1,0,0,0,0,0,0,0,0,0
        0,0,0,1,0,0,0,0,0,0,0,0], q = [0, 0]'
   $R_0^2$ = .34101, $R_1^2$ = .42645
   F = 309.33
   Wald = 618.601 (W* = 5.99, the 5% critical value for chi-squared with 2 degrees of freedom)

A Robust Covariance Matrix
The White Estimator
$$\text{Est.Var}[b] = (X'X)^{-1}\left[\sum_i e_i^2\, x_i x_i'\right](X'X)^{-1}$$
• What does robustness mean?
• Robust to:
   • Heteroscedasticity
• Not robust to:
   • Autocorrelation
   • Individual heterogeneity
   • The wrong model specification
• 'Robust inference'

Robust Covariance Matrix

Heteroscedasticity Robust Covariance Matrix
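
The test statistics and the White estimator above can be sketched in a few lines of NumPy. This is a minimal illustration, not the computation behind the slides. It assumes simulated data rather than the Cornwell and Rupert file, the function names (ols, conventional_cov, white_cov, wald) are hypothetical, and the White estimator is computed in its basic form with no small-sample correction.

```python
import numpy as np


def ols(X, y):
    """Least squares coefficients b and residuals e = y - Xb."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b


def conventional_cov(X, e):
    """Conventional estimator s^2 (X'X)^{-1}, with s^2 = e'e / (N - K)."""
    n, k = X.shape
    s2 = e @ e / (n - k)
    return s2 * np.linalg.inv(X.T @ X)


def white_cov(X, e):
    """White estimator (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (e ** 2)[:, None]).T @ X  # sum_i e_i^2 x_i x_i'
    return XtX_inv @ meat @ XtX_inv


def wald(b, V, R, q):
    """Wald distance (Rb - q)' [R V R']^{-1} (Rb - q)."""
    d = R @ b - q
    return float(d @ np.linalg.solve(R @ V @ R.T, d))


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
# Heteroscedastic disturbances: the standard deviation depends on x1.
eps = rng.normal(size=n) * (0.5 + np.abs(x1))
y = 1.0 + 0.5 * x1 + 0.0 * x2 + eps

b, e = ols(X, y)
V_conv = conventional_cov(X, e)
V_white = white_cov(X, e)

# Single restriction: coefficient on x1 equals zero.
R = np.array([[0.0, 1.0, 0.0]])
q = np.array([0.0])
W_conv = wald(b, V_conv, R, q)    # equals t^2 for one coefficient
W_white = wald(b, V_white, R, q)  # same distance, robust covariance

# Joint restriction: all slopes zero, R = [0 | I], J = 2.
R_all = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
q_all = np.zeros(2)
J = R_all.shape[0]
R2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
F_all = (R2 / J) / ((1.0 - R2) / (n - X.shape[1]))
W_all = wald(b, V_conv, R_all, q_all)  # W = J * F in the linear model

print("conventional s.e.:", np.sqrt(np.diag(V_conv)))
print("White s.e.:       ", np.sqrt(np.diag(V_white)))
print("W (conventional), W (White):", W_conv, W_white)
print("W_all, J*F:", W_all, J * F_all)
```

Passing V_white instead of V_conv to wald gives a heteroscedasticity-robust version of the same test. The identity W = JF holds only with the conventional covariance matrix.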