Lecture 2: Estimating a Slope (Chapter 2.1–2.7)

2-1 Recall: The Components of Econometrics
1) Economic theory
2) Mathematical model
3) Statistical theory
4) Econometric model
[Flowchart: economic theory, together with observed phenomena, yields the mathematical model; the mathematical model, together with statistical theory, yields the econometric model]

2-2 How Does Econometrics Differ from Economic Theory?
• Economic theory gives qualitative results: demand curves slope downward.
• Econometrics gives quantitative results: the price elasticity of demand for milk is −0.75.

2-3 How Does Econometrics Differ from Statistics?
• Statistics: "summarize the data faithfully"; "let the data speak for themselves."
• Econometrics: "what do we learn from economic theory AND the data at hand?"

2-4 What Can Econometrics Do?
• Estimation: What is Taiwan's marginal propensity to consume? (structural analysis)
• Hypothesis testing: Is the productivity of Korean college-educated workers higher than that of their Taiwanese counterparts? (testing hypotheses)
• Prediction and forecasting: What will personal savings be in 2004 if GDP is $14,864? And will savings grow in the near future (2008)?

2-5 Economists Ask: "What Changes What and How?"
• Higher income, higher saving.
• Higher price, lower quantity demanded.
• Higher interest rate, lower investment.

2-6 Savings Versus Income
• Theory would assume an exact relationship, e.g., Y = bX.
[Figure: savings (0 to 6,000) plotted against income (24,000 to 96,000); the points lie exactly on a line through the origin]

2-7 Slope of the Line Is Key!
• The slope is the change in savings with respect to a change in income.
• The slope is the derivative of savings with respect to income.
• If we know the slope, we have quantified the relationship!

2-8 Never So Neat: Savings Versus Income
[Figure: scatter of saving (0 to 6) against income (0 to 120); the points cluster around, but not exactly on, an upward-sloping line]

2-9 Key Features of the Long-Run Consumption Function
• It slopes upward.
• It passes through the origin.
• Can you guess the slope?

2-10 Underlying Mean + Random Part
Y_i = b·X_i + ε_i
• Four intuitively appealing ways to estimate b.

2-11 Estimation Strategies
1. Min Σ(Y_i − Ŷ_i), where Ŷ_i is the fitted value
2. Min Σ|Y_i − Ŷ_i|
3. Min Σ(Y_i − Ŷ_i)²
• What are the pros and cons of each?

2-12 "Best Guess 1": Mean of Ratios
b_g1 = (1/n)·Σ(Y_i/X_i), averaging over all n ratios.

2-13 Figure 2.4: Estimating the Slope of a Line with Two Data Points

2-14 "Best Guess 2": Ratio of Means
b_g2 = ΣY_i / ΣX_i

2-15 Figure 2.5: Estimating the Slope of a Line: b_g2

2-16 "Best Guess 3": Mean of Changes in Y over Changes in X
b_g3 = (1/(n−1))·Σ_{i=2..n} (Y_i − Y_{i−1})/(X_i − X_{i−1})

2-17 "Best Guess 4": Ordinary Least Squares
b_g4 = Σ(X_i·Y_i) / Σ(X_i²), which minimizes the sum of squared residuals in the sample.

2-18 Four Ways to Estimate b in Y_i = b·X_i + ε_i
1) Mean of ratios: b_g1 = (1/n)·Σ(Y_i/X_i)
2) Ratio of means: b_g2 = ΣY_i / ΣX_i
3) Mean of ratio of changes: b_g3 = (1/(n−1))·Σ(Y_i − Y_{i−1})/(X_i − X_{i−1})
4) Ordinary least squares: b_g4 = Σ(X_i·Y_i) / Σ(X_i²)
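To make the four best guesses concrete, here is a minimal sketch in Python with NumPy. It is an illustration only: the function name four_slope_guesses and the toy income/savings numbers are our own, not from the text.

```python
import numpy as np

def four_slope_guesses(x, y):
    """Compute the four intuitive estimators of b in Y_i = b*X_i + eps_i."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    bg1 = np.mean(y / x)                      # mean of ratios
    bg2 = y.sum() / x.sum()                   # ratio of means
    bg3 = np.mean(np.diff(y) / np.diff(x))    # mean of changes in Y over changes in X
    bg4 = (x * y).sum() / (x ** 2).sum()      # OLS through the origin
    return bg1, bg2, bg3, bg4

# Toy sample (hypothetical numbers): income X and savings Y.
x = [20, 40, 60, 80, 100]
y = [1.1, 1.8, 3.2, 3.9, 5.1]
print(four_slope_guesses(x, y))
```

If the data lay exactly on a line through the origin, all four guesses would coincide; the random part ε_i is what pulls them apart.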
2-19 Underlying Mean + Random Part
Y_i = b·X_i + ε_i
• Are lines through the origin likely phenomena?

2-20 Regression's Greatest Hits!!!
• An Econometric Top 40.

2-21 Two Classical Favorites!!
• Friedman's permanent income hypothesis: Consumption_i = b·(Permanent Income_i) + ε_i
• Capital Asset Pricing Model (CAPM): (asset j's return above a riskless rate) = b·(market's return above a riskless rate) + ε_i

2-22 A Golden Oldie!!
• Engel on the demand for rye: E(% change in quantity) = elasticity·(% change in price)

2-23 Four Guesses
• How do we choose among them?

2-24 What Criteria Did We Discuss?
• Pick the one that's right.
• Make mean error close to zero.
• Minimize mean absolute error.
• Minimize mean squared error.

2-25 What Criteria Did We Discuss? (cont.)
• Pick the one that's right…
– In every sample, a different estimator may be "right."
– We can only decide which is right if we ALREADY KNOW the right answer, which is a trivial case.

2-26 What Criteria Did We Discuss? (cont.)
• Make mean error close to zero… seek unbiased guesses.
• Mean Error = E(b_g − b) = the BIAS of b_g in estimating b.
– If E(b_g − b) = 0, b_g is right on average.
– If BIAS = 0, b_g is an unbiased estimator of b.

2-27 Checking Understanding
• Question: Which estimator does better under the "make mean error close to zero" criterion?
1. b_g − b is always a positive number less than 2 (our guesses are always a little high), or
2. b_g − b is always +10 or −10 (a 50/50 chance of each).

2-28 Checking Understanding (cont.)
• Mean Error = E(b_g − b) = the BIAS of b_g in estimating b.
• If our guess is wrong by +10 half the time and by −10 the other half, then E(b_g − b) = 0: the second estimator is unbiased!
– Mistakes in opposite directions cancel out.
• The first estimator is always closer to being right, yet it does worse on this criterion.

2-29 What Criteria Did We Discuss? (cont.)
• Minimize mean absolute error…
• Mean Absolute Error = E(|b_g − b|).
– Mistakes don't cancel out.
– Implicitly treats the cost of a mistake as proportional to the mistake's size.
– Absolute values don't go well with differentiation.

2-30 What Criteria Did We Discuss? (cont.)
• Minimize mean squared error…
• Mean Squared Error = E[(b_g − b)²].
– Implicitly treats the cost of a mistake as disproportionately large for larger mistakes.
– Squared expressions are mathematically tractable.

2-31 What Criteria Did We Discuss? (cont.)
• Pick the one that's right… only works trivially.
• Make mean error close to zero… seek unbiased guesses.
• Minimize mean absolute error… mathematically tough.
• Minimize mean squared error… more tractable mathematically.

2-32 Criteria Focus Across Samples
• Make mean error close to zero.
• Minimize mean absolute error.
• Minimize mean squared error.
• What do the distributions of the estimators look like?

2-33 Try the Four in Many Samples
• Pros will use estimators repeatedly: what track record will they have?
• Idea: have the computer create many, many data sets.
• We apply all four estimators to each data set.

2-34 Try the Four in Many Samples (cont.)
• We use our estimators on many datasets that we created ourselves.
• We know the true value of b because we picked it!
• We can compare estimators: we run "horseraces."

2-35 Try the Four in Many Samples (cont.)
• Which horse runs best on many tracks?
• Don't design tracks that guarantee failure.
• What properties must our computer-generated datasets have so that no estimator fails automatically?

2-36 Building a Fair Racetrack
• Under what conditions will each estimator fail?
1) Mean of ratios: b_g1 = (1/n)·Σ(Y_i/X_i)
2) Ratio of means: b_g2 = ΣY_i / ΣX_i
3) Mean of ratio of changes: b_g3 = (1/(n−1))·Σ(Y_i − Y_{i−1})/(X_i − X_{i−1})
4) Ordinary least squares: b_g4 = Σ(X_i·Y_i) / Σ(X_i²)

2-37 To Preclude Automatic Failure…
1) b_g1 requires no X_i = 0.
2) b_g2 requires ΣX_i ≠ 0.
3) b_g3 requires no two successive X's to be equal.
4) b_g4 requires some X_i ≠ 0.

2-38 Why Does Viewing Many Samples Work Well?
• We are interested in means: mean error, mean absolute error, mean squared error.
• Drawing many (m) independent samples lets us estimate each mean with variance σ_e²/m, where σ_e² is the variance of that mean's error.
• If m is large, our estimates will be quite precise.
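Before racing, we can screen a candidate X sequence against slide 2-37's conditions. Below is a small sketch (Python; the helper name racetrack_is_fair and the example sequences are hypothetical) that returns True only when no estimator is doomed to automatic failure.

```python
import numpy as np

def racetrack_is_fair(x):
    """Check slide 2-37's conditions so no estimator fails automatically."""
    x = np.asarray(x, dtype=float)
    ok_bg1 = np.all(x != 0)            # mean of ratios: no X_i = 0
    ok_bg2 = x.sum() != 0              # ratio of means: sum of X's nonzero
    ok_bg3 = np.all(np.diff(x) != 0)   # mean of changes: no equal successive X's
    ok_bg4 = np.any(x != 0)            # OLS: at least one nonzero X
    return bool(ok_bg1 and ok_bg2 and ok_bg3 and ok_bg4)

print(racetrack_is_fair([20, 40, 60, 80, 100]))  # True: a fair track
print(racetrack_is_fair([10, 10, -20]))          # False: fails bg2 and bg3
```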
2-39 How to Build a Race Track…
Y_i = b·X_i + ε_i, i = 1, 2, …, n
• n = ? How big is each sample?
• b = ? What slope are we estimating?
• Set X_1, X_2, …, X_n. Do it once, or anew for each sample?
• Draw ε_1, ε_2, …, ε_n. These must be drawn randomly for each sample.

2-40 What to Assume About the ε_i?
• What do the ε_i represent?
• What should the ε_i equal on average?
• What variance do we want for the ε_i?

2-41 Checking Understanding
Y_i = b·X_i + ε_i, i = 1, 2, …, n
• n = ? How big is each sample?
• b = ? What slope are we estimating?
• Set X_1, X_2, …, X_n. Do it once, or anew for each sample?
• Draw ε_1, ε_2, …, ε_n randomly for each sample.
• Form Y_1, Y_2, …, Y_n as Y_i = b·X_i + ε_i.
• We create 10,000 datasets with X and Y.
• For each dataset, what do we want to do?

2-42 Checking Understanding (cont.)
• We create 10,000 datasets with X and Y.
• For each dataset, we apply all four estimators to get b_g1, b_g2, b_g3, and b_g4.
• We save the mean error, mean absolute error, and mean squared error for each estimator.

2-43 What Have We Assumed?
• We are creating our own data.
• We get to specify the underlying "Data Generating Process" relating Y to X.
• What is our Data Generating Process (DGP)?

2-44 What Is Our Data Generating Process?
Y_i = b·X_i + ε_i, i = 1, 2, …, n
• E(ε_i) = 0
• Var(ε_i) = σ²
• Cov(ε_i, ε_k) = 0 for i ≠ k
• X_1, X_2, …, X_n are fixed across samples
These are the GAUSS–MARKOV ASSUMPTIONS.

2-45 What Will We Get?
• We will get precise estimates of:
1. the mean error of each estimator,
2. the mean absolute error of each estimator,
3. the mean squared error of each estimator,
4. the distribution of each estimator.
• By running different racetracks (DGPs), we check the robustness of our results.

2-46 Review
• We want an estimator to form a "best guess" of the slope of a line through the origin: Y_i = b·X_i + ε_i.
• We want an estimator that works well across many different samples: low average error, low average absolute error, low squared errors…

2-47 Review (cont.)
• We have brainstormed four "best guesses":
1) Mean of ratios: b_g1 = (1/n)·Σ(Y_i/X_i)
2) Ratio of means: b_g2 = ΣY_i / ΣX_i
3) Mean of ratio of changes: b_g3 = (1/(n−1))·Σ(Y_i − Y_{i−1})/(X_i − X_{i−1})
4) Ordinary least squares: b_g4 = Σ(X_i·Y_i) / Σ(X_i²)

2-48 Review (cont.)
• We will compare these estimators in "horseraces" across thousands of computer-generated datasets.
• We get to specify the underlying relationship between Y and X.
• We know the "right answer" the estimators are trying to guess.
• We can see how each estimator does.

2-49 Review (cont.)
• We choose all the rules for how our data are created.
• The underlying rules are the "Data Generating Process" (DGP).
• We choose to use the Gauss–Markov rules.

2-50 What Is Our Data Generating Process?
Y_i = b·X_i + ε_i, i = 1, 2, …, n
• E(ε_i) = 0
• Var(ε_i) = σ²
• Cov(ε_i, ε_k) = 0 for i ≠ k
• X_1, X_2, …, X_n are fixed across samples
These are the GAUSS–MARKOV ASSUMPTIONS.
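Putting the lecture together, here is a minimal sketch of the horserace itself under the Gauss–Markov DGP. The specific settings (n = 5, b = 0.2, σ = 1, the fixed X values, the normal error distribution, and the seed) are our own illustrative assumptions; the slides leave them open apart from the 10,000 Monte Carlo samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- DGP settings (hypothetical values; the slides leave them open) ---
b_true = 0.2                                   # true slope we are estimating
sigma = 1.0                                    # Var(eps_i) = sigma^2
x = np.array([20.0, 40.0, 60.0, 80.0, 100.0])  # X's fixed across samples
m = 10_000                                     # number of computer-generated datasets

estimates = np.empty((m, 4))
for s in range(m):
    eps = rng.normal(0.0, sigma, size=x.size)  # E(eps)=0, independent draws
    y = b_true * x + eps                       # Y_i = b*X_i + eps_i
    estimates[s] = (
        np.mean(y / x),                        # bg1: mean of ratios
        y.sum() / x.sum(),                     # bg2: ratio of means
        np.mean(np.diff(y) / np.diff(x)),      # bg3: mean of changes
        (x * y).sum() / (x ** 2).sum(),        # bg4: OLS through the origin
    )

errors = estimates - b_true
for name, e in zip(["bg1", "bg2", "bg3", "bg4"], errors.T):
    print(f"{name}: mean error {e.mean():+.5f}, "
          f"MAE {np.abs(e).mean():.5f}, MSE {(e ** 2).mean():.6f}")
```

Because all four estimators are unbiased under this DGP, their mean errors should all be near zero; the race is decided by mean squared error, where the Gauss–Markov theorem says OLS (b_g4) wins among linear unbiased estimators.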