Lecture: Applying Gauss–Markov Modeling – Regression with One Explanator (Chapters 3.1–3.5, 3.7; Chapters 4.1–4.4)

Agenda
• Finding a good estimator for a straight line through the origin: Chapters 3.1–3.5, 3.7
• Finding a good estimator for a straight line with an intercept: Chapters 4.1–4.4

Where Are We? (An Example)
• We wish to uncover quantitative features of an underlying process, such as the relationship between family income and financial aid.
• More precisely: how much less aid will I receive, on average, for each dollar of additional family income?
• Our data are a sample from the process, for example observations on 10,000 students' aid awards and family incomes.

The Disturbance Term
• Other factors ($\varepsilon$), such as the number of siblings, influence any individual student's aid, so we cannot directly observe the relationship between income and aid.
• We need a rule for making a good guess about the relationship between income and financial aid, based on the data.

A Good Guess
• A good guess is a guess that is right on average.
• We also want a guess that has a low variance around the true value.

Estimators
• Our rule is called an "estimator."
• We started by brainstorming a number of estimators and then comparing their performances in a series of computer simulations.
• We found that the Ordinary Least Squares estimator dominated the other estimators.
• Why is Ordinary Least Squares so good?

Tools
• To make more general statements, we need to move beyond the computer and into the world of mathematics.
• Last time, we reviewed a number of mathematical tools: summations, descriptive statistics, expectations, variances, and covariances.

The Data Generating Process (DGP)
• As a starting place, we write down all our assumptions about the way the underlying process works, and about how that process led to our data.
• These assumptions are called the "Data Generating Process."
• Then we can derive estimators that have good properties for the Data Generating Process we have assumed.

Models
• The DGP is a model used to approximate reality. We trade off realism to gain parsimony and tractability.
• Models are to be used, not believed.

DGP Assumptions
• Much of this course focuses on the different types of DGP assumptions you can make, giving you many options as you trade realism for tractability.

Two Ways to Screw Up in Econometrics
– Your Data Generating Process assumptions missed a fundamental aspect of reality (your DGP is not a useful approximation); or
– Your estimator did a bad job for your DGP.
• Today we focus on picking a good estimator for your DGP.

The Gauss–Markov Theorem
• Today we derive the properties of an estimator under a simple DGP: the Gauss–Markov Assumptions.
• First we will find the expectation and variance of any linear estimator under this DGP.
• Then we will derive the Best Linear Unbiased Estimator (BLUE).

Our Baseline DGP: Gauss–Markov (Chapter 3)
• $Y_i = \beta X_i + \varepsilon_i$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
• The $X$'s are fixed across samples (so we can treat them like constants).
• We want to estimate $\beta$.

A Strategy for Inference
• The DGP tells us the assumed relationships between the data we observe and the underlying process of interest.
• Using the assumptions of the DGP and the algebra of expectations, variances, and covariances, we can derive key properties of our estimators and search for estimators with desirable properties.

An Example: $\beta_{g1}$
Under the Gauss–Markov DGP above, consider the "mean of ratios" estimator
$\beta_{g1} = \frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}$
• In our simulations, $\beta_{g1}$ appeared to give estimates close to $\beta$. Was this an accident, or does $\beta_{g1}$ on average give us $\beta$? (See the simulation sketch below.)
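The unbiasedness question can be checked informally by simulation before we prove it. Below is a minimal Python sketch; the values of $\beta$, $\sigma$, and the fixed $X$'s are made up for illustration and do not come from the lecture. It draws repeated samples from the Gauss–Markov DGP and averages the $\beta_{g1}$ estimates across samples.

```python
import numpy as np

rng = np.random.default_rng(0)

beta, sigma = 2.0, 1.0                      # hypothetical true slope and disturbance s.d.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])  # X's fixed across samples
n_samples = 20_000

estimates = np.empty(n_samples)
for s in range(n_samples):
    eps = rng.normal(0.0, sigma, size=X.size)   # E(eps)=0, Var(eps)=sigma^2, independent draws
    Y = beta * X + eps                          # the Gauss-Markov DGP (no intercept)
    estimates[s] = np.mean(Y / X)               # beta_g1: the mean of ratios

print("average of beta_g1 over samples :", estimates.mean())   # should be close to beta
print("variance of beta_g1 over samples:", estimates.var())
```

Across many samples the average estimate settles near the true $\beta$, which previews the algebraic result that follows: $\beta_{g1}$ is unbiased under this DGP.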
An Example: $\beta_{g1}$ Is Right on Average
With $Y_i = \beta X_i + \varepsilon_i$,
$E(\beta_{g1}) = E\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{\beta X_i + \varepsilon_i}{X_i}\right)$
$= \frac{1}{n}\sum_{i=1}^{n}E(\beta) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(\varepsilon_i) = \frac{1}{n}\,n\beta + 0 = \beta$
• On average, $\beta_{g1}$ gives us $\beta$: $E(\beta_{g1}) = \beta$.
• Using the DGP and the algebra of expectations, we conclude that $\beta_{g1}$ is unbiased.

Checking Understanding
• Question: which DGP assumptions did we need to use in the derivation above?

Which Assumptions Did We Use?
• Replacing $Y_i$ with $\beta X_i + \varepsilon_i$ uses the assumption $Y_i = \beta X_i + \varepsilon_i$.
• Splitting the expectation and pulling $1/X_i$ outside it uses the assumption that the $X$'s are fixed across samples.
• Setting the second term to zero uses $E(\varepsilon_i) = 0$.

Checking Understanding (cont.)
• Note: we did NOT use the assumptions about the variance and covariances of $\varepsilon_i$. We will use those assumptions when we calculate the variance of the estimator.

Linear Estimators
• $\beta_{g1}$ is unbiased. Can we generalize?
• We will focus on linear estimators.
• A linear estimator is a weighted sum of the $Y$'s: $\hat\beta = \sum_i w_i Y_i$.

Linear Estimators (a Weighted Sum)
• Example: $\beta_{g1}$ is a linear estimator:
$\beta_{g1} = \frac{1}{n}\sum_i\frac{Y_i}{X_i} = \sum_i w_i Y_i$, with weights $w_i = \frac{1}{nX_i}$.

A Class of Linear Estimators
1) Mean of Ratios: $\beta_{g1} = \frac{1}{n}\sum_i\frac{Y_i}{X_i}$, with weights $w_i = \frac{1}{nX_i}$.
2) Ratio of Means: $\beta_{g2} = \frac{\sum_i Y_i}{\sum_j X_j}$, with weights $w_i = \frac{1}{\sum_j X_j}$.
3) Mean of Ratios of Changes: $\beta_{g3} = \frac{1}{n-1}\sum_{i=2}^{n}\frac{Y_i - Y_{i-1}}{X_i - X_{i-1}}$, with weights (for interior observations) $w_i = \frac{1}{n-1}\left(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\right)$.
4) Ordinary Least Squares: $\beta_{g4} = \frac{\sum_i X_i Y_i}{\sum_j X_j^2}$, with weights $w_i = \frac{X_i}{\sum_j X_j^2}$.
• All of our "best guesses" are linear estimators!

Expectation of Linear Estimators
Under the Gauss–Markov DGP, for $\hat\beta = \sum_{i=1}^{n}w_i Y_i$:
$E(\hat\beta) = E\left(\sum_{i=1}^{n}w_i Y_i\right) = \sum_{i=1}^{n}w_i E(Y_i) = \sum_{i=1}^{n}w_i E(\beta X_i + \varepsilon_i) = \sum_{i=1}^{n}w_i\left[\beta X_i + E(\varepsilon_i)\right] = \beta\sum_{i=1}^{n}w_i X_i$

Condition for Unbiasedness
• Since $E(\hat\beta) = \beta\sum_{i=1}^{n}w_i X_i$, a linear estimator is unbiased if $\sum_{i=1}^{n}w_i X_i = 1$.

Checking the Other Estimators
• A linear estimator is unbiased if $\sum_i w_i X_i = 1$. Are $\beta_{g2}$ and $\beta_{g4}$ unbiased?
• Ratio of Means: $w_i = \frac{1}{\sum_j X_j}$, so $\sum_i w_i X_i = \frac{\sum_i X_i}{\sum_j X_j} = 1$: unbiased.
• Ordinary Least Squares: $w_i = \frac{X_i}{\sum_j X_j^2}$, so $\sum_i w_i X_i = \frac{\sum_i X_i^2}{\sum_j X_j^2} = 1$: unbiased.

Better Unbiased Estimators
• A similar calculation holds for $\beta_{g3}$.
• All four of our "best guesses" are unbiased.
• But $\beta_{g4}$ did much better than $\beta_{g3}$: not all unbiased estimators are created equal.
• We want an unbiased estimator with a low mean squared error.

First, a Puzzle
• Suppose $n = 1$.
– Would you like a big $X$ or a small $X$ for that observation?
– Why?

What Observations Receive More Weight?
• Recall the weights: $w_i = \frac{1}{nX_i}$ for $\beta_{g1}$; $w_i = \frac{1}{\sum_j X_j}$ for $\beta_{g2}$; $w_i = \frac{1}{n-1}\left(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\right)$ for $\beta_{g3}$; $w_i = \frac{X_i}{\sum_j X_j^2}$ for $\beta_{g4}$.
• $\beta_{g1}$ puts more weight on observations with low values of $X$.
• $\beta_{g3}$ puts more weight on observations with low values of $X$ relative to neighboring observations.
• These estimators did very poorly in the simulations.

What Observations Receive More Weight? (cont.)
• $\beta_{g2}$ weights all observations equally.
• $\beta_{g4}$ puts more weight on observations with high values of $X$.
• These estimators did very well in the simulations. (The sketch below builds all four weight vectors explicitly.)
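To make the weighted-sum view concrete, here is a small Python sketch; the $\beta$, $\sigma$, and $X$ values are made up for illustration. It builds the weight vector for each of the four guesses on one simulated sample and confirms numerically that each satisfies the unbiasedness condition $\sum_i w_i X_i = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

beta, sigma = 2.0, 1.0                         # hypothetical true slope and disturbance s.d.
X = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 10.0])  # fixed X's (illustrative, distinct and increasing)
n = X.size
Y = beta * X + rng.normal(0.0, sigma, size=n)

# Weights for each of the four linear estimators
w_g1 = 1.0 / (n * X)                           # mean of ratios
w_g2 = np.full(n, 1.0 / X.sum())               # ratio of means
w_g3 = np.zeros(n)                             # mean of ratios of changes
dX = np.diff(X)
w_g3[:-1] -= 1.0 / ((n - 1) * dX)              # the -1/(X_{i+1} - X_i) part of each weight
w_g3[1:]  += 1.0 / ((n - 1) * dX)              # the +1/(X_i - X_{i-1}) part of each weight
w_g4 = X / np.sum(X**2)                        # ordinary least squares

for name, w in [("g1", w_g1), ("g2", w_g2), ("g3", w_g3), ("g4", w_g4)]:
    print(name, "estimate:", w @ Y, " sum(w_i * X_i):", w @ X)   # sum(w_i * X_i) = 1 => unbiased
```

Each estimator is literally `w @ Y`, a weighted sum of the $Y$'s, and each weight vector satisfies the unbiasedness condition; they differ only in how the weight is spread across low-$X$ and high-$X$ observations.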
Why Weight More Heavily the Observations with High $X$'s?
• Under our Gauss–Markov DGP the disturbances are drawn the same way for all values of $X$.
• To compare a high-$X$ choice and a low-$X$ choice, ask what effect a given disturbance will have on each.

[Figure 3.1: Effects of a Disturbance for Small and Large X]

Linear Estimators and Efficiency
• For our DGP, good estimators place more weight on observations with high values of $X$.
• Inferences from these observations are less sensitive to the effects of the same $\varepsilon$.
• Only one of our "best guesses" had this property: $\beta_{g4}$ (a.k.a. OLS) dominated the other estimators.
• Can we do even better?

Minimizing Mean Squared Error
• Mean Squared Error = Variance + Bias².
• To have a low mean squared error, we want two things: a low bias and a low variance.

We Need the Variance
• An unbiased estimator with a low variance will tend to give answers close to the true value of $\beta$.
• Using the algebra of variances and our DGP, we can calculate the variance of our estimators.

Algebra of Variances
(1) $Var(k) = 0$
(2) $Var(kY) = k^2\,Var(Y)$
(3) $Var(k + Y) = Var(Y)$
(4) $Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y)$
(5) $Var\left(\sum_{i=1}^{n}Y_i\right) = \sum_{i=1}^{n}Var(Y_i) + \sum_{i=1}^{n}\sum_{j\neq i}Cov(Y_i, Y_j)$
• One virtue of independent observations is that $Cov(Y_i, Y_j) = 0$, killing all the cross terms in the variance of the sum.

Back Again to Our Baseline DGP: Gauss–Markov
• $Y_i = \beta X_i + \varepsilon_i$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
• $X$'s fixed across samples.
• We will refer to this DGP (very) frequently.

Variance of OLS
$Var(\hat\beta_{OLS}) = Var\left(\frac{\sum_i X_i Y_i}{\sum_k X_k^2}\right) = \sum_i Var\left(\frac{X_i Y_i}{\sum_k X_k^2}\right) + \sum_i\sum_{j\neq i}Cov\left(\frac{X_i Y_i}{\sum_k X_k^2},\frac{X_j Y_j}{\sum_k X_k^2}\right)$
$= \sum_i\left(\frac{X_i}{\sum_k X_k^2}\right)^2 Var(Y_i) + 0 = \sum_i\left(\frac{X_i}{\sum_k X_k^2}\right)^2 Var(\beta X_i + \varepsilon_i)$

Variance of OLS (cont.)
$Var(\hat\beta_{OLS}) = \sum_i\left(\frac{X_i}{\sum_k X_k^2}\right)^2\left(0 + Var(\varepsilon_i) + 0\right) = \sigma^2\,\frac{\sum_i X_i^2}{\left(\sum_k X_k^2\right)^2} = \frac{\sigma^2}{\sum_k X_k^2}$
• Note: the higher $\sum_k X_k^2$ is, the lower the variance.

Variance of a Linear Estimator
More generally, for $\hat\beta = \sum_i w_i Y_i$:
$Var\left(\sum_i w_i Y_i\right) = \sum_i Var(w_i Y_i) + \text{covariance terms} = \sum_i w_i^2\,Var(Y_i) + 0 = \sum_i w_i^2\,Var(\beta X_i + \varepsilon_i) = \sigma^2\sum_i w_i^2$

Variance of a Linear Estimator (cont.)
• The algebra of expectations and variances gives us exact results where the Monte Carlo simulations gave only approximations.
• The exact results apply to ANY model meeting our Gauss–Markov assumptions.

Variance of a Linear Estimator (cont.)
• We now know mathematically that $\beta_{g1}$–$\beta_{g4}$ are all unbiased estimators of $\beta$ under our Gauss–Markov assumptions.
• We also suspect from our Monte Carlo experiments that $\beta_{g4}$ is the best of these four estimators, in that it is more efficient than the others.
• All four are unbiased (we know this from the algebra), but $\beta_{g4}$ appears to have a smaller variance than the other three.

Variance of a Linear Estimator (cont.)
• Is there an unbiased linear estimator better (i.e., more efficient) than $\beta_{g4}$?
– What is the Best Linear Unbiased Estimator?
– How do we find the BLUE estimator?

BLUE Estimators
• Mean Squared Error = Variance + Bias².
• An unbiased estimator is right "on average."
• In practice, we don't get to average: we see only one draw from the DGP.

BLUE Estimators (a Trade-off?)
• Some analysts would prefer an estimator with a small bias if it gave them a large reduction in variance.
• What good is being right on average if you're likely to be very wrong in your one draw?
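Before turning to the BLUE derivation, the exact variance formula derived above, $Var(\sum_i w_i Y_i) = \sigma^2\sum_i w_i^2$, can be checked against simulation. The sketch below (again with made-up $\beta$, $\sigma$, and $X$'s) compares the analytic variances of OLS and of the mean-of-ratios estimator with their Monte Carlo sampling variances.

```python
import numpy as np

rng = np.random.default_rng(2)

beta, sigma = 2.0, 1.0                         # hypothetical true slope and disturbance s.d.
X = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 10.0])  # fixed X's (illustrative)
n, n_samples = X.size, 200_000

w_ols = X / np.sum(X**2)                       # OLS weights
w_g1 = 1.0 / (n * X)                           # mean-of-ratios weights

print("analytic Var(OLS) = sigma^2 / sum(X_k^2):", sigma**2 / np.sum(X**2))
print("analytic Var(g1)  = sigma^2 * sum(w_i^2):", sigma**2 * np.sum(w_g1**2))

# Monte Carlo: sampling variances across repeated draws from the DGP
eps = rng.normal(0.0, sigma, size=(n_samples, n))
Y = beta * X + eps                             # each row is one sample
print("Monte Carlo Var(OLS):", (Y @ w_ols).var())
print("Monte Carlo Var(g1) :", (Y @ w_g1).var())
```

The simulated variances line up with the algebra, and the mean-of-ratios estimator's variance is visibly larger than the OLS variance, matching what the earlier simulations suggested.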
BLUE Estimators (cont.)
• Mean Squared Error = Variance + Bias².
• In a particular application, there may be a favorable trade-off between accepting a little bias in return for a lot less variance.
• We will NOT look for these trade-offs.
• Only after we have made sure our estimator is unbiased will we try to make the variance small.

BLUE Estimators (cont.)
A strategy for finding the Best Linear Unbiased Estimator:
1. Start with linear estimators: $\hat\beta = \sum_i w_i Y_i$.
2. Impose the unbiasedness condition: $\sum_i w_i X_i = 1$.
3. Calculate the variance of a linear estimator, $Var(\sum_i w_i Y_i) = \sigma^2\sum_i w_i^2$, and use calculus to find the $w_i$ that give the smallest variance subject to the unbiasedness condition.
Result: the BLUE estimator for our DGP.

BLUE Estimators (cont.)
• Using calculus, we find $w_i = \frac{X_i}{\sum_j X_j^2}$.
• This formula is OLS! OLS is the Best Linear Unbiased Estimator for the Gauss–Markov DGP.
• This result is called the Gauss–Markov Theorem.

BLUE Estimators (cont.)
• OLS is a very good strategy for the Gauss–Markov DGP.
• OLS is unbiased: our guesses are right on average.
• OLS is efficient: it has a small variance (or at least the smallest possible variance among unbiased linear estimators).
• Our guesses will tend to be close to right (or at least as close to right as we can get; the minimum variance could still be pretty large!).

BLUE Estimators (cont.)
• According to the Gauss–Markov Theorem, OLS is the BLUE estimator for the Gauss–Markov DGP.
• We will study other DGP's. For any DGP, we can follow this same procedure:
– Look at linear estimators.
– Impose the unbiasedness conditions.
– Minimize the variance of the estimator.

Example: Cobb–Douglas Production Functions (Chapter 3.7)
• A classic production function in economics is the Cobb–Douglas function: $Y = aL^{\beta}K^{1-\beta}$.
• If firms pay workers and capital their marginal products, then worker compensation equals a fraction $\beta$ of total output (or national income).

Example: Cobb–Douglas (cont.)
• To illustrate, we randomly pick 8 years between 1900 and 1995. For each year, we observe total worker compensation and national income.
• We use $\beta_{g1}$, $\beta_{g2}$, $\beta_{g3}$, and $\beta_{g4}$ to estimate $\text{Compensation}_i = \beta\cdot\text{National Income}_i + \varepsilon_i$.

[Table 3.6: Estimates of the Cobb–Douglas Parameter β, with Standard Errors]

[Table 3.7: Outputs from a Regression of Compensation on National Income]

Example: Cobb–Douglas (cont.)
• All four of our estimators give very similar estimates.
• However, $\beta_{g2}$ and $\beta_{g4}$ have much smaller standard errors. (We will see the value of small standard errors when we cover hypothesis tests.)
• Using our estimate from $\beta_{g4}$, 0.738, a 1 billion dollar increase in national income is predicted to increase total worker compensation by 0.738 billion dollars.

A New DGP
• Most lines do not go through the origin.
• Let's add an intercept term and find the BLUE estimator (Chapter 4).

Gauss–Markov with an Intercept
• $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \quad (i = 1,\ldots,n)$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
• $X$'s fixed across samples.
• All we have done is add a $\beta_0$.

Gauss–Markov with an Intercept (cont.)
• Example: let's estimate the effect of income on college financial aid.
• Students whose families have zero income do not receive zero aid; they receive a lot of aid.
• $E[\text{financial aid} \mid \text{family income}] = \beta_0 + \beta_1\cdot(\text{family income})$

Gauss–Markov with an Intercept (cont.)
• How do we construct a BLUE estimator?
• Step 1: focus on linear estimators.
• Step 2: calculate the expectation of a linear estimator for this DGP, and find the conditions for the estimator to be unbiased.
• Step 3: calculate the variance of a linear estimator, and find the weights that minimize this variance subject to the unbiasedness constraints. (A numerical sketch of this constrained minimization appears below.)
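Before working through Steps 2 and 3 algebraically, the constrained minimization can be previewed numerically. In the sketch below (the $X$ values are made up), the unbiasedness constraints for the slope are $\sum_i w_i = 0$ and $\sum_i w_i X_i = 1$; among all weight vectors satisfying them, the one minimizing $\sum_i w_i^2$ is the minimum-norm solution of the constraint system, and it reproduces the OLS slope weights $(X_i - \bar X)/\sum_j (X_j - \bar X)^2$ that the lecture derives later.

```python
import numpy as np

X = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 10.0])   # fixed X's (illustrative)
n = X.size

# Unbiasedness constraints for an estimator of beta_1, written as A w = c:
#   sum(w_i) = 0   and   sum(w_i * X_i) = 1
A = np.vstack([np.ones(n), X])
c = np.array([0.0, 1.0])

# Minimum of sum(w_i^2) subject to A w = c: the minimum-norm (pseudoinverse) solution
w_blue = A.T @ np.linalg.solve(A @ A.T, c)

# Closed-form OLS slope weights for the intercept model
w_ols = (X - X.mean()) / np.sum((X - X.mean())**2)

print("numerically minimized weights:", np.round(w_blue, 6))
print("OLS slope weights            :", np.round(w_ols, 6))
print("constraints satisfied        :", np.allclose(A @ w_blue, c))
```

The two weight vectors coincide, which is the Gauss–Markov result for the intercept model in numerical form.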
Expectation of a Linear Estimator
Under the DGP with an intercept, for $\hat\beta = \sum_i w_i Y_i$:
$E(\hat\beta) = E\left(\sum_i w_i Y_i\right) = \sum_i w_i E(Y_i) = \sum_i w_i E(\beta_0 + \beta_1 X_i + \varepsilon_i) = \beta_0\sum_i w_i + \beta_1\sum_i w_i X_i + 0 = \beta_0\sum_i w_i + \beta_1\sum_i w_i X_i$

Checking Understanding
• $E(\hat\beta) = \beta_0\sum_i w_i + \beta_1\sum_i w_i X_i$
• Question: what are the conditions for an estimator of $\beta_1$ to be unbiased? What are the conditions for an estimator of $\beta_0$ to be unbiased?

Checking Understanding (cont.)
• When is the expectation equal to $\beta_1$? When $\sum_i w_i = 0$ and $\sum_i w_i X_i = 1$.
• What if we were estimating $\beta_0$? The expectation equals $\beta_0$ when $\sum_i w_i = 1$ and $\sum_i w_i X_i = 0$.
• To estimate one parameter, we needed one unbiasedness condition. To estimate two parameters, we need two unbiasedness conditions.

Variance of a Linear Estimator
$Var(\hat\beta) = Var\left(\sum_i w_i Y_i\right) = \sum_i w_i^2\,Var(\beta_0 + \beta_1 X_i + \varepsilon_i) = \sum_i w_i^2\,Var(\varepsilon_i) = \sigma^2\sum_i w_i^2$
• Adding a constant to the DGP does NOT change the variance of the estimator.

BLUE Estimator
To compute the BLUE estimator of $\beta_1$, we minimize $\sigma^2\sum_i w_i^2$ subject to the constraints $\sum_i w_i = 0$ and $\sum_i w_i X_i = 1$.
Solution:
$\hat\beta_1 = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_{j=1}^{n}(X_j - \bar X)^2}$

BLUE Estimator of $\beta_1$
• This estimator is OLS for the DGP with an intercept.
• It is the Best (minimum-variance) Linear Unbiased Estimator for the Gauss–Markov DGP with an intercept.

BLUE Estimator of $\beta_1$ (cont.)
• This formula is very similar to the formula for OLS without an intercept.
• However, now we subtract the mean values from both $X$ and $Y$.
• OLS places more weight on observations with large values of $X_i - \bar X$: observations are more valuable if $X$ is far away from its mean.

BLUE Estimator of $\beta_1$ (cont.)
With weights $w_i = \frac{X_i - \bar X}{\sum_j (X_j - \bar X)^2}$,
$Var(\hat\beta_1) = \sigma^2\sum_i w_i^2 = \sigma^2\,\frac{\sum_i (X_i - \bar X)^2}{\left[\sum_j (X_j - \bar X)^2\right]^2} = \frac{\sigma^2}{\sum_j (X_j - \bar X)^2}$

BLUE Estimator of $\beta_0$
• The easiest way to estimate the intercept: $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$.
• Notice that the fitted regression line always goes through the point $(\bar X, \bar Y)$.
• Our fitted regression line passes through "the middle of the data."

Example: The Phillips Curve
• Phillips argued that nations face a trade-off between inflation and unemployment.
• He used annual British data on wage inflation and unemployment from 1861–1913 and 1914–1957 to regress inflation on unemployment.

Example: The Phillips Curve (cont.)
• The fitted regression line for 1861–1913 did a good job predicting the data from 1914 to 1957.
• "Out-of-sample predictions" are a strong test of an econometric model.

Example: The Phillips Curve (cont.)
• The US data from 1958–1969 also suggest a trade-off between inflation and unemployment:
$\widehat{\text{Unemployment}}_t = 0.06 - 0.55\cdot\text{Inflation}_t$, so $\hat\beta_0 = 0.06$ and $\hat\beta_1 = -0.55$.
• How do we interpret these numbers?
• If inflation were 0, our best guess of unemployment would be 0.06, that is, 6 percent.
• A one-percentage-point increase in inflation decreases our predicted unemployment level by 0.55 percentage points.

[Figure 4.2: U.S. Unemployment and Inflation, 1958–1969]

[Table 4.1: The Phillips Curve]

Example: The Phillips Curve (cont.)
• We no longer need to assume our regression line goes through the origin; we have learned how to estimate an intercept.
• A straight line doesn't seem to do a great job here. Can we do better?
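Before the review, here is a minimal Python sketch of the intercept-model formulas. The numbers are synthetic, generated to roughly mimic the fitted Phillips-curve relationship quoted above; they are not the actual US 1958–1969 series. The slope and intercept from the lecture's formulas are cross-checked against numpy's built-in least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data in the spirit of the example (NOT the actual US series),
# with unemployment and inflation expressed as decimal fractions.
inflation = np.array([0.010, 0.015, 0.020, 0.025, 0.030, 0.035, 0.040, 0.045])
unemployment = 0.06 - 0.55 * inflation + rng.normal(0.0, 0.003, size=inflation.size)

X, Y = inflation, unemployment

# OLS slope and intercept from the lecture's formulas
b1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
b0_hat = Y.mean() - b1_hat * X.mean()
print("intercept:", round(b0_hat, 4), " slope:", round(b1_hat, 4))

# Cross-check against numpy's least-squares fit of a degree-1 polynomial
slope, intercept = np.polyfit(X, Y, deg=1)
print("np.polyfit slope:", round(slope, 4), " intercept:", round(intercept, 4))

# The fitted line passes through the point (X-bar, Y-bar)
print("line through the means:", np.isclose(b0_hat + b1_hat * X.mean(), Y.mean()))
```

Both routes give the same fitted line, and the final check illustrates the slide's point that the fitted regression line always passes through $(\bar X, \bar Y)$.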
Review
• As a starting place, we write down all our assumptions about the way the underlying process works, and about how that process led to our data.
• These assumptions are called the "Data Generating Process."
• Then we can derive estimators that have good properties for the Data Generating Process we have assumed.

Review: The Gauss–Markov DGP
• $Y_i = \beta X_i + \varepsilon_i$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
• $X$'s fixed across samples (so we can treat them like constants).
• We want to estimate $\beta$.

Review: Linear Estimators
• We focus on linear estimators: a weighted sum of the $Y$'s, $\hat\beta = \sum_i w_i Y_i$.
• Under the Gauss–Markov DGP, $E(\hat\beta) = \beta\sum_{i=1}^{n}w_i X_i$, so a linear estimator is unbiased if $\sum_{i=1}^{n}w_i X_i = 1$.
• Many linear estimators will be unbiased. How do we pick the "best" linear unbiased estimator (BLUE)?

Review: BLUE Estimators
A strategy for finding the Best Linear Unbiased Estimator:
1. Start with linear estimators: $\sum_i w_i Y_i$.
2. Impose the unbiasedness condition: $\sum_i w_i X_i = 1$.
3. Use calculus to find the $w_i$ that give the smallest variance subject to the unbiasedness condition.
Result: the BLUE estimator for our DGP.

Review: BLUE Estimators (cont.)
• Ordinary Least Squares (OLS) is BLUE for our Gauss–Markov DGP.
• This result is called the "Gauss–Markov Theorem."
• OLS is a very good strategy for the Gauss–Markov DGP.
• OLS is unbiased: our guesses are right on average.
• OLS is efficient: it has the smallest possible variance among unbiased linear estimators.
• Our guesses will tend to be close to right (or at least as close to right as we can get).
• Warning: the minimum variance could still be pretty large!

Review: Gauss–Markov with an Intercept
• $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \quad (i = 1,\ldots,n)$
• $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$, $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$, $X$'s fixed across samples.
• All we have done is add a $\beta_0$.

Review: BLUE Estimator of $\beta_1$
$\hat\beta_1 = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_{j=1}^{n}(X_j - \bar X)^2}$
• This estimator is OLS for the DGP with an intercept.
• It is the Best (minimum-variance) Linear Unbiased Estimator for the Gauss–Markov DGP with an intercept.

Review: BLUE Estimator of $\beta_0$
• The easiest way to estimate the intercept: $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$.
• Notice that the fitted regression line always goes through the point $(\bar X, \bar Y)$.
• Our fitted regression line passes through "the middle of the data."
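To close the review, here is a minimal Monte Carlo sketch (with made-up $\beta_0$, $\beta_1$, $\sigma$, and $X$'s) that checks the two headline results for the intercept model: $\hat\beta_1$ is unbiased, and its sampling variance matches $\sigma^2/\sum_j (X_j - \bar X)^2$.

```python
import numpy as np

rng = np.random.default_rng(4)

beta0, beta1, sigma = 1.0, 0.5, 2.0                  # hypothetical intercept, slope, disturbance s.d.
X = np.array([2.0, 3.0, 5.0, 6.0, 8.0, 9.0, 12.0])   # fixed X's (illustrative)
n_samples = 200_000

Sxx = np.sum((X - X.mean())**2)
print("analytic Var(beta1_hat) = sigma^2 / Sxx:", sigma**2 / Sxx)

eps = rng.normal(0.0, sigma, size=(n_samples, X.size))
Y = beta0 + beta1 * X + eps                          # each row is one sample from the DGP

# OLS slope weights for the intercept model, applied to every sample at once
w = (X - X.mean()) / Sxx
b1_hats = Y @ w

print("mean of beta1_hat over samples:", b1_hats.mean())   # close to beta1 (unbiased)
print("Monte Carlo Var(beta1_hat)    :", b1_hats.var())    # close to sigma^2 / Sxx
```

The simulated mean sits near the true slope and the simulated variance matches the analytic formula, which is the intercept-model counterpart of the checks we did for the through-the-origin DGP.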