Econometrics I
Professor William Greene
Stern School of Business, Department of Economics

Part 11: Asymptotic Distribution Theory

Received October 6, 2012

Dear Prof. Greene,

I am AAAAAA, an assistant professor of Finance at the xxxxx university of xxxxx, xxxxx. I would be grateful if you could answer my question regarding the parameter estimates and the marginal effects in Multinomial Logit (MNL). After running my estimations, the parameter estimate of my variable of interest is statistically significant, but its marginal effect, evaluated at the mean of the explanatory variables, is not. Can I just rely on the parameter estimates' results to say that the variable of interest is statistically significant? How can I reconcile the parameter estimates and the marginal effects' results?

Thank you very much in advance!
Best, AAAAAA

Asymptotics: Setting

Most modeling situations involve stochastic regressors, nonlinear models, or nonlinear estimation techniques. Very few exact statistical results, such as an expected value or a true distribution, can be obtained in these cases. We rely instead on approximate results based on what we know about the behavior of certain statistics in large samples. Example from basic statistics: we know a lot about $\bar{x}$. What do we know about its reciprocal, $1/\bar{x}$?

Convergence

Definitions — kinds of convergence as $n$ grows large:
1. To a constant. Example: the sample mean $\bar{x}$ converges to the population mean $\mu$.
2. To a random variable. Example: a t statistic with $n-1$ degrees of freedom converges to a standard normal random variable.

Convergence to a Constant

Sequences and limits. A sequence of constants indexed by $n$:

$$h_n = \frac{n(n+1)/2 + 3n + 5}{(1/2)(n^2 + 2n + 1)}$$

The ordinary limit is 1: use the "leading term," $(1/2)n^2$, in both the numerator and the denominator.

Convergence of a random variable: what does it mean for a random variable to converge to a constant? Convergence of the variance to zero — the random variable converges to something that is not random.

Convergence Results

Convergence of a sequence of random variables to a constant — convergence in mean square: the mean converges to a constant and the variance converges to zero. (Far from the most general definition, but definitely sufficient for our purposes.)

$$\bar{x}_n = \frac{1}{n}\sum_{i=1}^n x_i, \qquad E[\bar{x}_n] = \mu, \qquad \mathrm{Var}[\bar{x}_n] = \sigma^2/n \to 0$$

A convergence theorem for sample moments: sample moments converge in probability to their population counterparts. This is the general form of the law of large numbers. (There are many forms; see Appendix D in your text. This is the "weak" law of large numbers.) Note the great generality of the preceding result: $(1/n)\sum_i g(z_i)$ converges to $E[g(z_i)]$.

Extending the Law of Large Numbers

Suppose $x$ has mean $\mu$ and finite variance $\sigma^2$, and $x_1, x_2, \ldots, x_n$ are a random sample. Then the LLN applies to $\bar{x}$. Let $z_i = x_i^P$. Then $z_1, z_2, \ldots, z_n$ are a random sample from a population with mean $E[z] = E[x^P]$ and $\mathrm{Var}[z] = E[x^{2P}] - \{E[x^P]\}^2$. The LLN applies to $\bar{z}$ as long as these moments are finite. There is no mention of normality in any of this.

Example: if $x \sim N[0, \sigma^2]$, then

$$E[x^P] = \begin{cases} 0 & \text{if } P \text{ is odd} \\ \sigma^P (P-1)!! & \text{if } P \text{ is even} \end{cases}$$

where $(P-1)!!$ = the product of the odd numbers up to $P-1$. No power of $x$ is normally distributed. Normality is irrelevant to the LLN.
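A small simulation sketch (Python; an illustration added here, not part of the original slides): the sample moment of $x^P$ converges to $\sigma^P(P-1)!!$ for even $P$, even though $x^P$ itself is not normally distributed.

    # Sketch: LLN for higher moments of a normal sample.
    # For x ~ N[0, sigma^2] and even P, E[x^P] = sigma^P * (P-1)!!.
    import numpy as np

    rng = np.random.default_rng(seed=1)
    sigma, P = 2.0, 4                    # E[x^4] = 3 * sigma^4 = 48
    for n in (100, 10_000, 1_000_000):
        x = rng.normal(0.0, sigma, size=n)
        print(n, np.mean(x**P))          # sample moment approaches 48 as n grows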
Probability Limit

Let $c$ be a constant, let $\varepsilon$ be any positive value, and let $n$ index the sequence. If

$$\lim_{n\to\infty} \mathrm{Prob}[\,|x_n - c| > \varepsilon\,] = 0,$$

then $\mathrm{plim}\, x_n = c$: $x_n$ converges in probability to $c$. In words: the probability that the difference between $x_n$ and $c$ is larger than $\varepsilon$, for any $\varepsilon$, goes to zero, so $x_n$ becomes arbitrarily close to $c$. Mean square convergence is sufficient (not necessary) for convergence in probability. (We will not require other, broader definitions of convergence, such as "almost sure convergence.")

Mean Square Convergence

[Figure.]

Probability Limits and Expectations

What is the difference between $E[x_n]$ and $\mathrm{plim}\, x_n$? A notation: $\mathrm{plim}\, x_n = \theta$ is also written $x_n \xrightarrow{P} \theta$.

Consistency of an Estimator

If the random variable in question, $x_n$, is an estimator (such as the mean), and if $\mathrm{plim}\, x_n = \theta$, then $x_n$ is a consistent estimator of $\theta$. Estimators can be inconsistent for two reasons: (1) they are consistent for something other than the quantity that interests us; (2) they do not converge to constants, and so are not consistent estimators of anything. We will study examples of both.

The Slutsky Theorem

Assumptions: $x_n$ is a random variable with $\mathrm{plim}\, x_n = \theta$ (for now, $\theta$ is a constant); $g(\cdot)$ is a continuous function with continuous derivatives; $g(\cdot)$ is not a function of $n$. Conclusion: $\mathrm{plim}\, g(x_n) = g[\mathrm{plim}\, x_n]$, assuming $g[\mathrm{plim}(x_n)]$ exists. (VVIR!) This works for probability limits; it does not work for expectations:

$$E[\bar{x}_n] = \mu \text{ and } \mathrm{plim}(\bar{x}_n) = \mu, \text{ but } E[1/\bar{x}_n] = {?} \text{ while } \mathrm{plim}(1/\bar{x}_n) = 1/\mu.$$

Slutsky Corollaries

If $x_n$ and $y_n$ are two sequences of random variables with probability limits $\theta$ and $\delta$:

$\mathrm{plim}(x_n \pm y_n) = \theta \pm \delta$ (sum)
$\mathrm{plim}(x_n y_n) = \theta\delta$ (product)
$\mathrm{plim}(x_n / y_n) = \theta/\delta$ (ratio, if $\delta \neq 0$)
$\mathrm{plim}\, g(x_n, y_n) = g(\theta, \delta)$, assuming it exists and $g(\cdot)$ is continuous with continuous partials, etc.

Slutsky Results for Matrices

Functions of matrices are continuous functions of the elements of the matrices. Therefore, if $\mathrm{plim}\,\mathbf{A}_n = \mathbf{A}$ and $\mathrm{plim}\,\mathbf{B}_n = \mathbf{B}$ (element by element), then

$$\mathrm{plim}(\mathbf{A}_n^{-1}) = [\mathrm{plim}\,\mathbf{A}_n]^{-1} = \mathbf{A}^{-1} \quad\text{and}\quad \mathrm{plim}(\mathbf{A}_n\mathbf{B}_n) = \mathrm{plim}\,\mathbf{A}_n \cdot \mathrm{plim}\,\mathbf{B}_n = \mathbf{AB}.$$

Limiting Distributions

Convergence to a kind of random variable instead of to a constant. $x_n$ is a random sequence with cdf $F_n(x_n)$. If $\mathrm{plim}\, x_n = \theta$ (a constant), then $F_n(x_n)$ collapses to a point. But $x_n$ may instead converge to a specific random variable. The distribution of that random variable is the limiting distribution of $x_n$, denoted

$$x_n \xrightarrow{d} x \quad\Longleftrightarrow\quad F_n(x_n) \to F(x).$$

Limiting Distribution

Let $x_1, x_2, \ldots, x_n$ be a random sample from $N[\mu, \sigma^2]$. For the purpose of testing $H_0: \mu = 0$, the usual test statistic is

$$t_{n-1} = \frac{\bar{x}_n}{s_n/\sqrt{n}}, \qquad s_n^2 = \frac{\sum_{i=1}^n (x_i - \bar{x}_n)^2}{n-1}.$$

The exact density of the random variable $t_{n-1}$ is t with $n-1$ degrees of freedom. The density varies with $n$:

$$f(t_{n-1}) = \frac{\Gamma(n/2)}{\Gamma[(n-1)/2]\,\Gamma(1/2)\,\sqrt{n-1}} \left[1 + \frac{t_{n-1}^2}{n-1}\right]^{-n/2},$$

with cdf $F_{n-1}(t) = \int_{-\infty}^{t} f_{n-1}(x)\,dx$. The distribution has mean zero and variance $(n-1)/(n-3)$. As $n \to \infty$, the distribution and the random variable converge to standard normal, which is written $t_{n-1} \xrightarrow{d} N[0,1]$.
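A quick numerical illustration (Python with scipy; added here, not part of the original slides): the t cdf approaches the standard normal cdf as the degrees of freedom grow.

    # Sketch: convergence of the t distribution to the standard normal.
    from scipy import stats

    for n in (5, 20, 100, 1000):
        print(n, stats.t.cdf(1.96, df=n-1), stats.norm.cdf(1.96))
        # the t probability at 1.96 approaches the normal's .975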
A Slutsky Theorem for Random Variables (Continuous Mapping)

If $x_n \xrightarrow{d} x$, and if $g(x_n)$ is a continuous function with continuous derivatives that does not involve $n$, then

$$g(x_n) \xrightarrow{d} g(x).$$

Example: let $t_n$ be a random variable with a t distribution with $n$ degrees of freedom. Then $t_n^2$ is exactly an F random variable with $[1, n]$ degrees of freedom, and

$$t_n \xrightarrow{d} N(0,1), \qquad t_n^2 \xrightarrow{d} [N(0,1)]^2 = \text{chi-squared}[1].$$

An Extension of the Slutsky Theorem

If $x_n \xrightarrow{d} x$ ($x_n$ has a limiting distribution), $\theta$ is some relevant constant, $g(x_n, \theta) \xrightarrow{d} g$ ($g$ has a limiting distribution that is some function of $\theta$), and $\mathrm{plim}\, y_n = \theta$, then

$$g(x_n, y_n) \xrightarrow{d} g(x_n, \theta):$$

replacing $\theta$ with a consistent estimator leads to the same limiting distribution.

Application of the Slutsky Theorem

Large sample behavior of the F statistic for testing restrictions:

$$F = \frac{(\mathbf{e}_*'\mathbf{e}_* - \mathbf{e}'\mathbf{e})/J}{\mathbf{e}'\mathbf{e}/(n-K)}, \qquad (\mathbf{e}_*'\mathbf{e}_* - \mathbf{e}'\mathbf{e}) \xrightarrow{d} \sigma^2\chi^2[J], \qquad \mathbf{e}'\mathbf{e}/(n-K) \xrightarrow{p} \sigma^2.$$

Therefore, $JF \xrightarrow{d} \chi^2[J]$. Establishing the result for the numerator requires a central limit theorem.

Central Limit Theorems

Central limit theorems describe the large-sample behavior of random variables that involve sums of variables: a "tendency toward normality." Generality: when you find sums of random variables, the CLT shows up eventually. Note that the CLT does not state that means of samples have normal distributions.

A Central Limit Theorem

Lindeberg–Levy CLT (the simplest version of the CLT): if $x_1, \ldots, x_n$ are a random sample from a population with finite mean $\mu$ and finite variance $\sigma^2$, then

$$\frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \xrightarrow{d} N(0,1).$$

Note this is not the limiting distribution of the mean, since the mean itself converges to a constant. A useful corollary: if $\mathrm{plim}\, s_n = \sigma$ and the other conditions are met, then

$$\frac{\sqrt{n}(\bar{x} - \mu)}{s_n} \xrightarrow{d} N(0,1).$$

Lindeberg–Levy vs. Lindeberg–Feller

Lindeberg–Levy assumes random sampling: the observations have the same mean and the same variance. Lindeberg–Feller allows the variances to differ across observations, with some necessary assumptions about how they vary. Most econometric estimators require Lindeberg–Feller (and extensions such as Lyapunov).
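A minimal Monte Carlo sketch (Python; added here, not part of the original slides) of Lindeberg–Levy at work: standardized means of a strongly skewed chi-squared[1] population ($\mu = 1$, $\sigma^2 = 2$) tend toward N(0,1).

    # Sketch: the CLT for a skewed population.
    import numpy as np

    rng = np.random.default_rng(seed=3)
    mu, sigma, R = 1.0, np.sqrt(2.0), 50_000
    for n in (5, 50, 500):
        xbar = rng.chisquare(1, size=(R, n)).mean(axis=1)
        z = np.sqrt(n) * (xbar - mu) / sigma
        # mean -> 0, std -> 1, tail probability -> about .05
        print(n, z.mean(), z.std(), (np.abs(z) > 1.96).mean())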
Order of a Sequence

"Little oh," $o(\cdot)$: a sequence $h_n$ is $o(n^\delta)$ (order less than $n^\delta$) iff $n^{-\delta} h_n \to 0$. Example: $h_n = n^{1.4}$ is $o(n^{1.5})$ since $n^{-1.5} h_n = 1/n^{0.1} \to 0$.

"Big oh," $O(\cdot)$: a sequence $h_n$ is $O(n^\delta)$ iff $n^{-\delta} h_n \to$ a finite nonzero constant. Example 1: $h_n = (n^2 + 2n + 1)$ is $O(n^2)$. Example 2: $\sum_i x_i^2$ is usually $O(n^1)$, since $(1/n)\sum_i x_i^2$ is the mean of $x_i^2$, and the mean of $x_i^2$ generally converges to $E[x_i^2]$, a finite constant.

What if the sequence is a random variable? The order is in terms of the variance. Example: what is the order of the sequence $\bar{x}$ in random sampling? $\mathrm{Var}[\bar{x}] = \sigma^2/n$, which is $O(1/n)$.

Cornwell and Rupert Panel Data

Cornwell and Rupert returns-to-schooling data: 595 individuals, 7 years. The variables in the file are

EXP = work experience
WKS = weeks worked
OCC = 1 if blue-collar occupation
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
LWAGE = log of wage = dependent variable in regressions

These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.

[Data listing and descriptive figures omitted.]

Histogram for LWAGE

[Figure.]

Kernel Density Estimator

The kernel density estimator is a histogram (of sorts):

$$\hat{f}(x_m^*) = \frac{1}{n}\sum_{i=1}^n \frac{1}{B} K\!\left[\frac{x_i - x_m^*}{B}\right], \text{ for a set of points } x_m^*,$$

where $B$ = "bandwidth," chosen by the analyst; $K$ = the kernel function, such as the normal or logistic pdf (or one of several others); and $x^*$ = the point at which the density is approximated. This is essentially a histogram with small bins.

Kernel Density Estimator: The Curse of Dimensionality

$\hat{f}(x^*)$ is an estimator of $f(x^*)$:

$$\hat{f}(x^*) = \frac{1}{n}\sum_{i=1}^n Q(x_i \mid x^*) \to \bar{Q}(x^*).$$

But $\mathrm{Var}[\bar{Q}(x^*)]$ is not $(1/N) \times$ something; rather, it is $(1/N^{3/5}) \times$ something. That is, $\hat{f}(x^*)$ does not converge to $f(x^*)$ at the same rate as a mean converges to a population mean.

Kernel Estimator for LWAGE

[Figure: $\hat{f}(x^*)$ plotted against $x^*$.]

Asymptotic Distribution

An asymptotic distribution is a finite sample approximation to the true distribution of a random variable that is good for large samples, but not necessarily for small samples. Stabilizing transformation to obtain a limiting distribution: multiply the random variable $x_n$ by some power, $a$, of $n$ such that the limiting distribution of $n^a x_n$ has a finite, nonzero variance. Example: $\bar{x}_n$ has a limiting variance of zero, since the variance is $\sigma^2/n$. The variance of $\sqrt{n}\,\bar{x}_n$ is $\sigma^2$; however, this alone does not stabilize the distribution, because $E[\sqrt{n}\,\bar{x}_n] = \sqrt{n}\,\mu$. The stabilizing transformation is $\sqrt{n}(\bar{x}_n - \mu)$.

Asymptotic Distribution

Obtaining an asymptotic distribution from a limiting distribution: obtain the limiting distribution via a stabilizing transformation; assume the limiting distribution applies reasonably well in finite samples; invert the stabilizing transformation to obtain the asymptotic distribution.

$$\sqrt{n}(\bar{x} - \mu)/\sigma \xrightarrow{d} N[0,1].$$

Assume this holds in finite samples. Then

$$\sqrt{n}(\bar{x} - \mu) \overset{a}{\sim} N[0, \sigma^2], \qquad (\bar{x} - \mu) \overset{a}{\sim} N[0, \sigma^2/n], \qquad \bar{x} \overset{a}{\sim} N[\mu, \sigma^2/n],$$

the asymptotic distribution. $\sigma^2/n$ is the asymptotic variance; this is asymptotic normality of the distribution.

Asymptotic Efficiency

Comparison of asymptotic variances. How do we compare consistent estimators? If both converge to constants, both variances go to zero. Example: in random sampling from the normal distribution, the sample mean is asymptotically $N[\mu, \sigma^2/n]$ and the median is asymptotically $N[\mu, (\pi/2)\sigma^2/n]$, so the mean is asymptotically more efficient.

Limiting Distribution: Testing for Normality

Normality test for a random variable $e_i$ ($\bar{e} = 0$ for regression residuals):

$$s = \sqrt{\frac{\sum_{i=1}^n (e_i - \bar{e})^2}{n}}, \qquad m_j = \frac{\sum_{i=1}^n (e_i - \bar{e})^j}{n},$$

$$\text{Chi-squared}[2] = n\left[\frac{(m_3/s^3)^2}{6} + \frac{[(m_4/s^4) - 3]^2}{24}\right].$$

If the assumptions of the Lindeberg CLT apply to $\sqrt{n}(\bar{x} - \mu)/\sigma$, then this statistic converges in distribution to chi-squared[2].
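A minimal sketch (Python; added here, not part of the original slides) of the statistic as written above, applied to simulated data; the helper name normality_stat is illustrative.

    # Sketch: the chi-squared[2] normality statistic from the sample moments.
    import numpy as np

    def normality_stat(e):
        e = e - e.mean()                    # ebar = 0 for regression residuals
        n = e.size
        s = np.sqrt((e**2).mean())          # note the 1/n divisor, as on the slide
        m3, m4 = (e**3).mean(), (e**4).mean()
        return n * ((m3/s**3)**2/6.0 + ((m4/s**4) - 3.0)**2/24.0)

    rng = np.random.default_rng(seed=4)
    print(normality_stat(rng.normal(size=5000)))       # small: normality not rejected
    print(normality_stat(rng.exponential(size=5000)))  # large: normality rejected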
Elements of a Proof for Bowman–Shenton

Recall the extension of the LLN above: sample moments of $x_i^P$ obey the LLN and the CLT whether or not $x$ is normally distributed.

(1) The CLT applies to $m_j = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^j \approx \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^j = \frac{1}{n}\sum_{i=1}^n z_i$.
(2) Slutsky theorem: use $\bar{x}$ instead of $\mu$ in $m_j$.
(3) Slutsky theorem: use $s^2$ instead of $\sigma^2$ in computing $\mathrm{Var}[m_j]$.
(4) Continuous mapping: the square of a $N[0,1]$ is chi-squared[1].
(5) Slutsky theorem: operate on $h(m_3) + h(m_4)$, where $h(\cdot)$ is the square.

What if the variable is not normally distributed? (a) The CLT still applies. (b) The moments are still finite, but $(m_3/s^3, m_4/s^4)$ no longer converge to $(0, 3)$. (c) The sum of squares now has a noncentral chi-squared distribution.

Bera and Jarque

Directly apply Bowman and Shenton to regression residuals? No: regression residuals are not a random sample.

$$\mathbf{e} = \mathbf{My} = \mathbf{MX\beta} + \mathbf{M\varepsilon} = \mathbf{M\varepsilon} \ (\text{since } \mathbf{MX} = \mathbf{0}), \qquad E[\mathbf{e}] = \mathbf{0}, \quad E[\mathbf{ee}'] = \sigma^2\mathbf{M}, \quad \mathbf{M} = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}',$$

$$M_{ij} = \begin{cases} 1 - \mathbf{x}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_i & \text{on the diagonal} \\ 0 - \mathbf{x}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_j & \text{off the diagonal} \end{cases}$$

As $n \to \infty$, the elements of $\mathbf{M}$ converge to those of $\mathbf{I}$. BUT!! $\mathbf{M}$ is $n \times n$, so it does not converge to anything! As $n \to \infty$, the vector of residuals becomes a random sample.

Expectation vs. Probability Limit

Consider the random variable

$$x_N = \begin{cases} 1 & \text{with probability } 1 - 1/N \\ N & \text{with probability } 1/N \end{cases}$$

$$E[x_N] = 1 \cdot (1 - 1/N) + N \cdot (1/N) = 2 - 1/N \to 2,$$

but $\mathrm{plim}\, x_N = 1$ by the definition: $\lim_{N\to\infty} \mathrm{Prob}[\,|x_N - 1| > \varepsilon\,] = 0$, since $\lim_{N\to\infty} \mathrm{Prob}(x_N = 1) = 1$. Meanwhile,

$$\mathrm{Var}[x_N] = N - 3 + 3/N - 1/N^2 \to \infty.$$

The Delta Method

The delta method (combines most of these concepts). Nonlinear transformation of a random variable: $f(x_n)$ such that $\mathrm{plim}\, x_n = \theta$ but $\sqrt{n}(x_n - \theta)$ is asymptotically normally distributed. What is the asymptotic behavior of $f(x_n)$?

Taylor series approximation: $f(x_n) \approx f(\theta) + f'(\theta)(x_n - \theta)$. By the Slutsky theorem, $\mathrm{plim}\, f(x_n) = f(\theta)$, and

$$\sqrt{n}\,[f(x_n) - f(\theta)] \approx f'(\theta)\,[\sqrt{n}(x_n - \theta)].$$

The large-sample behaviors of the LHS and RHS are the same (generally — this requires $f(\cdot)$ to be nicely behaved). The RHS is a constant times something familiar, so the large-sample variance is $[f'(\theta)]^2$ times the large-sample $\mathrm{Var}[\sqrt{n}(x_n - \theta)]$. Return to the asymptotic variance of $x_n$. This gives us the VIR for the asymptotic distribution of a function.

Delta Method

If $x_n \overset{a}{\sim} N[\theta, \sigma^2/n]$ and $f(x_n)$ is a continuous and continuously differentiable function that does not involve $n$, then

$$f(x_n) \overset{a}{\sim} N\{f(\theta), [f'(\theta)]^2\sigma^2/n\}.$$

Delta Method – Applications

$x_n \overset{a}{\sim} N[\mu, \sigma^2/n]$. What is the asymptotic distribution of $f(x_n) = \exp(x_n)$ or $f(x_n) = 1/x_n$?

(1) Normal, since $x_n$ is asymptotically normally distributed.
(2) The asymptotic mean is $f(\mu) = \exp(\mu)$ or $1/\mu$.
(3) For the variance, we need $f'(\mu) = \exp(\mu)$ or $-1/\mu^2$:

$$\mathrm{Asy.Var}[f(x_n)] = [\exp(\mu)]^2\sigma^2/n \quad\text{or}\quad (1/\mu^4)\,\sigma^2/n.$$
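A quick simulation check (Python; added here, not part of the original slides) of the delta-method variance for $f(\bar{x}) = \exp(\bar{x})$:

    # Sketch: simulated variance of exp(xbar) vs. the delta-method approximation.
    import numpy as np

    rng = np.random.default_rng(seed=5)
    mu, sigma, n, R = 1.0, 0.5, 400, 100_000
    xbar = rng.normal(mu, sigma, size=(R, n)).mean(axis=1)
    print(np.exp(xbar).var())              # Monte Carlo variance of f(xbar)
    print(np.exp(mu)**2 * sigma**2 / n)    # delta method: [f'(mu)]^2 * sigma^2/n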
Delta Method Application

Target of estimation: $\rho = \dfrac{\sigma^2}{1 + \sigma^2}$.

Estimation strategy:
(1) Estimate $\theta = \log\sigma^2$.
(2) Estimate $\sigma = \exp(\theta/2)$.
(3) Estimate $\rho = \sigma^2/(1 + \sigma^2)$.

Delta Method

$$\hat\theta = -1.123706, \qquad \hat{V} = .0715548^2 = .00512009,$$
$$\hat\sigma = \exp(\hat\theta/2) = \exp(-1.123706/2) = \exp(-.561853) = .5701517,$$
$$\hat{g} = d\hat\sigma/d\hat\theta = \tfrac{1}{2}\exp(\hat\theta/2) = \tfrac{1}{2}\hat\sigma = .2850758, \qquad \hat{g}^2 = .08126821,$$
$$\widehat{\mathrm{Var}}[\hat\sigma] = \hat{g}^2\hat{V} = .08126821 \times .00512009 = .0004161,$$
$$\text{estimated standard error} = \sqrt{.0004161} = .02039854.$$

Midterm Question

Continuing the previous example, there are two approaches implied for estimating $\rho$:

(1) $\rho = f(\sigma) = \dfrac{\sigma^2}{1 + \sigma^2}$
(2) $\rho = h(\theta) = \dfrac{\exp(\theta)}{1 + \exp(\theta)}$

Use the delta method to estimate a standard error for each of the two estimators of $\rho$. Do you obtain the same answer? Bring your prepared solution to the exam and submit it with the in-class part of the exam.

Confidence Intervals?

$\hat\sigma \pm 1.96 \times \text{s.e.} = .5701517 \pm 1.96(.0203985) = .5301707$ to $.6101328$?? The center of the confidence interval given in the table is .571554! What is going on here? The confidence limits given are $\exp(-1.23695/2)$ to $\exp(-.984361/2)$: the reported interval was obtained by transforming a confidence interval for $\theta$, so it is not symmetric around $\hat\sigma$.

Krinsky and Robb vs. the Delta Method

Krinsky and Robb, ReStat, 1986. The delta method doesn't work very well when $f(\cdot)$ is highly nonlinear. An alternative approach, based on the law of large numbers:

(1) Compute $x_n$ = the estimator of $\theta$, and estimate its asymptotic variance $v_n$ (standard deviation $s_n$).
(2) Compute $f(x_n)$ to estimate $f(\theta)$.
(3) Estimate the asymptotic variance of $f(x_n)$ by using a random number generator to draw a normal random sample from the asymptotic population $N[x_n, s_n^2]$: compute a sample of function values $f_1, \ldots, f_R = f(x_{n,1}), \ldots, f(x_{n,R})$ and use the sample variance of these draws to estimate the asymptotic variance of $f(x_n)$.

(Krinsky and Robb, ReStat, 1991. Programming error. Retraction.)

Krinsky and Robb

Generate R (e.g., 10,000) draws lnsig2u_r ~ N[-1.123706, .0715548^2]. (To center the draws at the point estimate, replace lnsig2u_r with -1.123706 + (lnsig2u_r - mean of the draws).) Calculate sigma_u_r = exp(.5*lnsig2u_r). Compute the standard error and confidence interval for sigma_u as descriptive statistics of the draws.

Krinsky and Robb

[Figure: simulation results.]
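The steps above, sketched in Python (an added illustration, not part of the original slides), for $\sigma_u = \exp(\theta/2)$ with $\hat\theta = -1.123706$:

    # Sketch: Krinsky and Robb for sigma_u = exp(theta/2).
    import numpy as np

    rng = np.random.default_rng(seed=6)
    theta_hat, se_theta, R = -1.123706, .0715548, 10_000
    draws = rng.normal(theta_hat, se_theta, size=R)
    draws += theta_hat - draws.mean()            # center at the point estimate
    sigma_u = np.exp(0.5 * draws)
    print(sigma_u.std())                         # compare with .0204 (delta method)
    print(np.percentile(sigma_u, [2.5, 97.5]))   # simulation confidence interval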
Delta Method – More than One Parameter

If $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_K$ are K consistent estimators of K parameters $\theta_1, \theta_2, \ldots, \theta_K$ with asymptotic covariance matrix

$$\mathbf{V} = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1K} \\ v_{21} & v_{22} & \cdots & v_{2K} \\ \vdots & \vdots & & \vdots \\ v_{K1} & v_{K2} & \cdots & v_{KK} \end{bmatrix},$$

and if $f(\hat\theta_1, \ldots, \hat\theta_K)$ is a continuous function with continuous derivatives, then the asymptotic variance of $f(\hat\theta_1, \ldots, \hat\theta_K)$ is

$$\mathbf{g}'\mathbf{V}\mathbf{g} = \sum_{k=1}^{K}\sum_{l=1}^{K} \frac{\partial f(\cdot)}{\partial\theta_k}\frac{\partial f(\cdot)}{\partial\theta_l}\, v_{kl}, \qquad \mathbf{g} = \left[\frac{\partial f(\cdot)}{\partial\theta_1}, \ldots, \frac{\partial f(\cdot)}{\partial\theta_K}\right]'.$$

Log Income Equation

Ordinary least squares regression. LHS = LOGY: mean = -1.15746, standard deviation = .49149, number of observations = 27322. Model size: parameters = 7, degrees of freedom = 27315. Residuals: sum of squares = 5462.03686, standard error of e = .44717. Fit: R-squared = .17237.

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
AGE        .06225***     .00213            29.189    .0000      43.5272
AGESQ      -.00074***    .242482D-04      -30.576    .0000      2022.99
Constant   -3.19130***   .04567           -69.884    .0000
MARRIED    .32153***     .00703            45.767    .0000      .75869
HHKIDS     -.11134***    .00655           -17.002    .0000      .40272
FEMALE     -.00491       .00552             -.889    .3739      .47881
EDUC       .05542***     .00120            46.050    .0000      11.3202

(The estimated Cov[b1,b2] for AGE and AGESQ is also used below.)

Age-Income Profile: Married=1, Kids=1, Educ=12, Female=1

[Figure.]

Application: Maximum of a Function

AGE .06225*** (.00213); AGESQ -.00074*** (.242482D-04).

$$\log Y = \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{Age}^2 + \ldots$$

At what age does log income reach its maximum?

$$\frac{\partial \log Y}{\partial \mathrm{Age}} = \beta_1 + 2\beta_2\,\mathrm{Age} = 0 \;\Rightarrow\; \mathrm{Age}^* = \frac{-\beta_1}{2\beta_2} = \frac{-.06225}{2(-.00074)} = 42.1,$$

$$g_1 = \frac{\partial\,\mathrm{Age}^*}{\partial\beta_1} = \frac{-1}{2\beta_2} = \frac{-1}{2(-.00074)} = 675.68, \qquad g_2 = \frac{\partial\,\mathrm{Age}^*}{\partial\beta_2} = \frac{\beta_1}{2\beta_2^2} = \frac{.06225}{2(-.00074)^2} = 56838.9.$$

Delta Method Using Visible Digits

$$\mathrm{Var}[\mathrm{Age}^*] = 675.68^2(4.54799 \times 10^{-6}) + 56838.9^2(5.8797 \times 10^{-10}) + 2(675.68)(56838.9)(-5.1285 \times 10^{-8}) = .0366952,$$

standard error = square root = .1915599.

Delta Method Results

WALD procedure.

Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]
G1         674.399***    22.05686          30.575    .0000
G2         56623.8***    1797.294          31.505    .0000
AGESTAR    41.9809***    .19193           218.727    .0000

(Computed using all internal digits of the regression results.)

Krinsky and Robb Descriptive Statistics

Variable   Mean      Std.Dev.   Minimum   Maximum
ASTAR      42.0607   .191563    41.2226   42.7720

More than One Function and More than One Coefficient

If $\hat\theta_1, \ldots, \hat\theta_K$ are K consistent estimators of K parameters $\theta_1, \ldots, \theta_K$ with asymptotic covariance matrix $\mathbf{V}$, and if $f_1(\hat\theta_1, \ldots, \hat\theta_K), f_2(\cdot), \ldots, f_J(\cdot)$ are J continuous functions with continuous derivatives, then the asymptotic covariance matrix of $(\hat{f}_1, \ldots, \hat{f}_J)$ is $\mathbf{GVG}'$, where $\mathbf{G}$ is the $J \times K$ Jacobian with elements $G_{jk} = \partial f_j(\cdot)/\partial\theta_k$.
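To make the $\mathbf{g}'\mathbf{V}\mathbf{g}$ computation concrete, here is a sketch (Python; added here, not part of the original slides) of the Age* calculation above using the visible digits; the negative sign on Cov[b1,b2] is inferred from the arithmetic on the "visible digits" slide.

    # Sketch: delta-method variance of Age* = -b1/(2*b2).
    import numpy as np

    b1, b2 = .06225, -.00074
    V = np.array([[4.54799e-06, -5.1285e-08],    # Var(b1), Cov(b1,b2)
                  [-5.1285e-08, 5.8797e-10]])    # Cov(b1,b2), Var(b2)
    age_star = -b1 / (2*b2)
    g = np.array([-1/(2*b2), b1/(2*b2**2)])      # gradient of Age* wrt (b1, b2)
    print(age_star, np.sqrt(g @ V @ g))          # about 42.1 and .19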
Application: Partial Effects

Recall the e-mail that opened this section: in a multinomial logit model, the coefficient on the variable of interest is statistically significant, but its marginal effect, evaluated at the means of the explanatory variables, is not. A partial effect is a nonlinear function of all of the coefficients and the data, so its standard error must be obtained with the delta method (or Krinsky and Robb), and its significance can differ from that of the underlying coefficient.

Application: Doctor Visits

German individual health care data: N = 27,236. Model for the number of visits to the doctor:

True: E[V | Income] = exp(1.412 - .0745*income)
Linear regression: g*(Income) = 3.917 - .208*income

A Nonlinear Model

$$E[\mathrm{docvis} \mid \mathbf{x}] = \exp(\beta_1 + \beta_2\,\mathrm{AGE} + \beta_3\,\mathrm{EDUC} + \ldots)$$

Interesting Partial Effects: Estimate Effects at the Means of the Data

$$\hat{E}[\mathrm{docvis} \mid \mathbf{x}] = \exp(b_1 + b_2\,\mathrm{AGE} + b_3\,\mathrm{EDUC} + \ldots),$$

$$\frac{\partial\hat{E}[\mathrm{docvis} \mid \mathbf{x}]}{\partial\,\mathrm{AGE}} = \exp(b_1 + b_2\,\mathrm{AGE} + b_3\,\mathrm{EDUC} + \ldots)\,b_2 = f_{\mathrm{AGE}}(b_1, b_2, \ldots, b_6 \mid \mathrm{AGE}, \mathrm{EDUC}, \ldots),$$

$$\frac{\partial\hat{E}[\mathrm{docvis} \mid \mathbf{x}]}{\partial\,\mathrm{EDUC}} = \exp(\cdot)\,b_3 = f_{\mathrm{EDUC}}(\cdot), \qquad \frac{\partial\hat{E}[\mathrm{docvis} \mid \mathbf{x}]}{\partial\,\mathrm{HHNINC}} = \exp(\cdot)\,b_6 = f_{\mathrm{HHNINC}}(\cdot).$$

Necessary Derivatives (Jacobian)

For $f_{\mathrm{AGE}} = b_2\exp(b_1 + b_2\,\mathrm{AGE} + b_3\,\mathrm{EDUC} + \ldots)$:

$$\frac{\partial f_{\mathrm{AGE}}}{\partial b_1} = b_2\exp(\cdot)\cdot 1, \qquad \frac{\partial f_{\mathrm{AGE}}}{\partial b_2} = \exp(\cdot) + b_2\exp(\cdot)\,\mathrm{AGE} = \exp(\cdot)(1 + b_2\,\mathrm{AGE}),$$

$$\frac{\partial f_{\mathrm{AGE}}}{\partial b_3} = b_2\exp(\cdot)\,\mathrm{EDUC}, \quad \frac{\partial f_{\mathrm{AGE}}}{\partial b_4} = b_2\exp(\cdot)\,\mathrm{MARRIED}, \quad \frac{\partial f_{\mathrm{AGE}}}{\partial b_5} = b_2\exp(\cdot)\,\mathrm{FEMALE}, \quad \frac{\partial f_{\mathrm{AGE}}}{\partial b_6} = b_2\exp(\cdot)\,\mathrm{HHNINC}.$$

[Estimation results figures omitted.]

Partial Effects at Means vs. Mean of Partial Effects

Partial effects at the means:

$$\delta(\boldsymbol\beta, \bar{\mathbf{x}}) = \frac{\partial f(\boldsymbol\beta \mid \bar{\mathbf{x}})}{\partial\bar{\mathbf{x}}}, \qquad \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}_i.$$

Mean of the partial effects:

$$\bar\delta(\boldsymbol\beta, \mathbf{X}) = \frac{1}{n}\sum_{i=1}^n \frac{\partial f(\boldsymbol\beta \mid \mathbf{x}_i)}{\partial\mathbf{x}_i}.$$

The latter makes more sense for dummy variables, $d$:

$$\Delta_i(\boldsymbol\beta, \mathbf{x}_i, d) = f(\boldsymbol\beta \mid \mathbf{x}_i, d{=}1) - f(\boldsymbol\beta \mid \mathbf{x}_i, d{=}0),$$

and $\bar\Delta(\boldsymbol\beta, \mathbf{X}, d)$ makes more sense than $\Delta(\boldsymbol\beta, \bar{\mathbf{x}}, d)$.

Partial Effect for a Dummy Variable?

[Figure.]

Application: CES Function

[Figure: the CES production function and its loglinear (Kmenta) approximation.] The least squares coefficients map to the CES parameters as $\gamma = \exp(\beta_1)$, $\delta = \beta_2/(\beta_2 + \beta_3)$, $\nu = \beta_2 + \beta_3$, $\rho = \beta_4(\beta_2 + \beta_3)/(\beta_2\beta_3)$.

Application: CES Function

The Jacobian of $(\gamma, \delta, \nu, \rho)$ with respect to $(\beta_1, \beta_2, \beta_3, \beta_4)$ is

$$\mathbf{G} = \begin{bmatrix} \exp(\beta_1) & 0 & 0 & 0 \\ 0 & \dfrac{\beta_3}{(\beta_2+\beta_3)^2} & \dfrac{-\beta_2}{(\beta_2+\beta_3)^2} & 0 \\ 0 & 1 & 1 & 0 \\ 0 & \dfrac{-\beta_3\beta_4}{\beta_2^2\beta_3} & \dfrac{-\beta_2\beta_4}{\beta_2\beta_3^2} & \dfrac{\beta_2+\beta_3}{\beta_2\beta_3} \end{bmatrix}$$

Application: CES Function — Using Spanish Dairy Farm Data

Create  ; x1=1 ; x2=logcows ; x3=logfeed ; x4=-.5*(logcows-logfeed)^2$
Regress ; lhs=logmilk ; rh1=x1,x2,x3,x4$
Calc    ; b1=b(1) ; b2=b(2) ; b3=b(3) ; b4=b(4)$
Calc    ; gamma=exp(b1) ; delta=b2/(b2+b3) ; nu=b2+b3 ; rho=b4*(b2+b3)/(b2*b3)$
Calc    ; g11=exp(b1) ; g12=0 ; g13=0 ; g14=0
        ; g21=0 ; g22=b3/(b2+b3)^2 ; g23=-b2/(b2+b3)^2 ; g24=0
        ; g31=0 ; g32=1 ; g33=1 ; g34=0
        ; g41=0 ; g42=-b3*b4/(b2^2*b3) ; g43=-b2*b4/(b2*b3^2) ; g44=(b2+b3)/(b2*b3)$
Matrix  ; g=[g11,g12,g13,g14 / g21,g22,g23,g24 / g31,g32,g33,g34 / g41,g42,g43,g44]$
Matrix  ; VDelta=G*VARB*G'$
Matrix  ; theta=[gamma/delta/nu/rho] ; Stat(theta,vdelta)$
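The same $\mathbf{GVG}'$ computation sketched in Python (added here, not part of the original slides). The coefficient vector shown is backed out from the CES estimates reported below, and the covariance matrix is a hypothetical diagonal placeholder; in practice both come from the least squares results.

    # Sketch: delta method for the CES parameters (gamma, delta, nu, rho).
    import numpy as np

    def ces_theta_and_G(b):
        b1, b2, b3, b4 = b
        theta = np.array([np.exp(b1), b2/(b2+b3), b2+b3, b4*(b2+b3)/(b2*b3)])
        G = np.array([
            [np.exp(b1),           0.0,             0.0,             0.0],
            [0.0,       b3/(b2+b3)**2,  -b2/(b2+b3)**2,              0.0],
            [0.0,                  1.0,             1.0,             0.0],
            [0.0,            -b4/b2**2,       -b4/b3**2, (b2+b3)/(b2*b3)],
        ])  # row 4 simplifies -b3*b4/(b2^2*b3) and -b2*b4/(b2*b3^2)
        return theta, G

    b = np.array([11.5710, .60672, .46109, -.08372])  # backed out from the table below
    VARB = np.diag([.0045, .0130, .0090, .0330])**2   # hypothetical placeholder
    theta, G = ces_theta_and_G(b)
    VDelta = G @ VARB @ G.T
    print(theta)                       # approx (105981, .56819, 1.06781, -.31956)
    print(np.sqrt(np.diag(VDelta)))    # delta-method standard errors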
Estimated CES Function

Matrix    Coefficient   Standard Error   z        Prob.|z|>Z*   95% Confidence Interval
THETA_1   105981***     475.2458         223.00   .0000         105049 to 106912
THETA_2   .56819***     .01286           44.19    .0000         .54299 to .59340
THETA_3   1.06781***    .00864           123.54   .0000         1.05087 to 1.08475
THETA_4   -.31956**     .12857           -2.49    .0129         -.57155 to -.06758

(THETA_1 = $\gamma$, THETA_2 = $\delta$, THETA_3 = $\nu$, THETA_4 = $\rho$.)

Asymptotics for Least Squares: Looking Ahead…

Assumptions: convergence of $\mathbf{X}'\mathbf{X}/n$ (X may be stochastic or nonstochastic) and convergence of $\mathbf{X}'\boldsymbol\varepsilon/n$ to $\mathbf{0}$. These are sufficient for consistency. Assumption: convergence of $(1/\sqrt{n})\,\mathbf{X}'\boldsymbol\varepsilon$ to a normal vector gives asymptotic normality. What about asymptotic efficiency?
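Looking ahead, a small simulation sketch (Python; added here, not part of the original slides) of these two conditions at work for least squares with a stochastic regressor:

    # Sketch: sqrt(n)(b - beta) is approximately normal with variance
    # sigma^2 * (X'X/n)^{-1}, which equals the identity in this design.
    import numpy as np

    rng = np.random.default_rng(seed=9)
    beta, R, n = np.array([1.0, 0.5]), 2000, 400
    b_draws = np.empty((R, 2))
    for r in range(R):
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        y = X @ beta + rng.standard_normal(n)
        b_draws[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    z = np.sqrt(n) * (b_draws - beta)
    print(z.mean(axis=0), z.std(axis=0))   # approx (0, 0) and (1, 1)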