Introduction to Econometrics
Instructor: Le Hang My Hanh, lehangmyhanh.cs2@ftu.edu.vn

Course materials
1. Introduction to Econometrics, 3rd Edition, by James H. Stock and Mark W. Watson (Addison-Wesley Series in Economics)
2. Basic Econometrics, 4th Edition, by Damodar N. Gujarati (Ch. 1-13)
3. Introductory Econometrics: A Modern Approach, by Jeffrey M. Wooldridge (Ch. 1-9)

Assessment
• Performance: 10%
• Midterm test + project: 30%
• Final test: 60%

Instructions for your project
• Each group should write and present a short report (max. 15 pages, everything included) based on the data and guidance given during the course.
• The report should be organized as follows:
1. Introduction: Give a brief statement of the purpose of the study.
2. Literature review: Summarize the main published work on your research question. This should be a synthesis and analysis of the relevant published work, linked at all times to your research question.
3. Methodology and data: Introduce your model (dependent and independent variables) and describe the data. Discuss the data sources and the definitions of the variables, and report summary statistics for each variable in a table (minimum and maximum values, means, standard deviations).
4. Results: Present the estimation results in a table and discuss them in this section.
5. Conclusion: Summarize the results.

Topic
• Export and economic growth:
– Tea export
– Fisheries export
– Agricultural export
– …
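Section 3 of the report asks for a table of summary statistics. The sketch below builds one with Python's standard library only, assuming each variable has already been loaded as a list of numbers. The variable names (`gdp_growth`, `tea_export`) and their values are hypothetical placeholders, not course data. It reports the sample standard deviation (n − 1 denominator).

```python
import statistics

def summary_table(data):
    """Return one row (name, obs, mean, sd, min, max) per variable.

    `data` maps a variable name to a list of numeric observations.
    The sample standard deviation (n - 1 denominator) is used, so each
    variable needs at least two observations.
    """
    rows = []
    for name, values in data.items():
        rows.append((
            name,
            len(values),
            statistics.mean(values),
            statistics.stdev(values),
            min(values),
            max(values),
        ))
    return rows

def print_table(rows):
    """Print the rows as a fixed-width table."""
    print(f"{'Variable':<14}{'Obs':>5}{'Mean':>10}{'Std.Dev.':>10}{'Min':>10}{'Max':>10}")
    for name, n, mean, sd, lo, hi in rows:
        print(f"{name:<14}{n:>5}{mean:>10.2f}{sd:>10.2f}{lo:>10.2f}{hi:>10.2f}")

if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    data = {
        "gdp_growth": [5.2, 6.0, 6.7, 7.1, 6.8],
        "tea_export": [1.1, 1.3, 1.2, 1.5, 1.4],
    }
    print_table(summary_table(data))
```

The same rows can be pasted into the table of summary statistics required in the methodology section.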
Outline
• Chapter 1: Introduction to Econometrics
• Chapter 2: Simple Regression
• Chapter 3: Multiple Regression
• Chapter 4: Statistical Inference
• Chapter 5: Diagnosing Model Problems
• Reading papers + replicating empirical research + presentation
Le Hang My Hanh, FTU CS2

Some keywords
• Dependent variables, independent variables
• Regression, estimation, estimator, estimate
• Empirical research
• Significant, significance, significance level, confidence interval
• Hypothesis, hypothesis testing
• Assumption
• Correlation, autocorrelation, multicollinearity, heteroscedasticity, homoscedasticity
• Biased, unbiased

Examples of empirical research
❑ This thesis examines the relationship between the probability of financial distress and a set of financial ratios in order to identify internal factors causing distress for firms. (Phu Kim Yen, K49 CLC)
• Findings: The coefficient on size is negative and statistically significant at the 1% level in all estimations. This finding is consistent with the earlier study by Ohlson (1980). The author concludes that size affects the probability of financial distress of Vietnamese listed firms, especially those listed on HOSE. In practice, large-cap companies often have a stronger bargaining position with counterparties and better access to financing, so it is easier for them to weather unexpected downturns.

Introduction to Econometrics
The Nature and Purpose of Econometrics
1. Why do you need to learn econometrics?
2. What is econometrics? What will you learn from the course?
3. How do you learn? Methodology of econometrics
4. Terminology and notation
5. Types of data

Why do you need to learn econometrics?
Economics suggests important relationships, often with policy implications, but virtually never suggests quantitative magnitudes of causal effects. For example:
• What is the quantitative effect of reducing class size on student achievement?
• How does another year of education change earnings?
• What is the price elasticity of cigarettes?
• What is the effect on output growth of a 1 percentage point increase in interest rates by the Fed?
• What is the effect on housing prices of environmental improvements?

What is Econometrics?
• Econometrics = "economic measurement".
• "Econometrics may be defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena" (Goldberger 1964).

In this course you will:
• Learn methods for estimating causal effects using observational data;
• Focus on applications: theory is used only as needed to understand the "why" of the methods;
• Learn to evaluate the regression analysis of others, so that you will be able to read and understand empirical economics papers in other economics courses;
• Get some hands-on experience with regression analysis in your problem sets.

1.2. Methodology of Econometrics
1. Statement of theory or hypothesis
2. Specification of the mathematical model of the theory
3. Specification of the statistical, or econometric, model
4. Collecting the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes

Example 1. Statement of theory or hypothesis
• Keynes states that, on average, consumers increase their consumption as their income increases, but not by as much as the increase in their income (0 < MPC < 1).
• MPC = marginal propensity to consume

Example 2. Specification of the mathematical model of consumption (single-equation model)
Y = β1 + β2X,   0 < β2 < 1   (1)
where
Y = consumption expenditure (dependent variable)
X = income (independent, or explanatory, variable)
β1 = the intercept
β2 = the slope coefficient
• The slope coefficient β2 measures the MPC.
• Geometrically, (1) is a straight line with intercept β1 and slope β2.

Example 3. Specification of the econometric model of consumption
• Other variables can also affect consumption expenditure, such as family size, the ages of family members, and family religion, so the relationships between economic variables are inexact.
• To allow for such inexact relationships, (1) is modified as follows:
Y = β1 + β2X + u   (2)
where u is the disturbance, or error, term: a random (stochastic) variable that has well-defined probabilistic properties.
• u may well represent all those factors that affect consumption but are not taken into account explicitly.
• (2) is an example of a linear regression model: it hypothesizes that Y is linearly related to X, but that the relationship between the two is not exact; it is subject to individual variation. The econometric model (2) can be depicted as shown in Figure 2.

Example 4. Obtaining data
• Y = personal consumption expenditure (PCE)
• X = gross domestic product (GDP)

Example 5. Estimation of the econometric model
• Regression analysis is the main tool used to obtain the estimates. We obtain β̂1 = −184.08 and β̂2 = 0.7064, so that
Ŷi = −184.08 + 0.7064Xi   (3)
→ An increase in real income of 1 dollar led, on average, to an increase of about 70 cents in real consumption.
• The data are plotted in Figure I.3.

Example 6. Hypothesis testing
• Keynes expected the MPC to be positive but less than 1.
• In our example the estimated MPC is about 0.70, so we must ask whether this estimate is sufficiently below unity. In other words, is 0.70 statistically less than 1? If it is, the evidence supports Keynes' theory.
• Such confirmation or refutation of economic theories on the basis of sample evidence relies on a branch of statistical theory known as statistical inference (hypothesis testing).

Example 7. Forecasting or prediction
• To illustrate, suppose we want to predict the mean consumption expenditure for 2015.
The GDP value for 2015 was 7269.8 billion dollars, so the predicted mean consumption would be:
Ŷ2015 = −184.0779 + 0.7064 × 7269.8 ≈ 4951.3 (billion dollars)

Example 8. Use of the model for control or policy purposes
• Suppose the government proposes a reduction in the income tax. What will be the effect of such a policy on income, and thereby on consumption expenditure and ultimately on employment?

Terminology and notation
Unless stated otherwise:
• The letter Y will denote the dependent variable.
• The X's will denote the independent variables, Xk being the kth explanatory variable.
• The subscript i or t will denote the ith or the tth observation or value.
• N will denote the total number of observations or values in the population.
• n will denote the total number of observations in a sample.
• u or e will denote the random error, or stochastic, term.

• In the literature, the terms dependent variable and explanatory variable are described in various ways. A representative list is:

1.3. Types of data
• There are three types of data used in empirical analysis: time series, cross-section, and panel data.
• Time series data: a set of observations on the values that a variable takes at different times. They are collected at regular time intervals, such as daily, weekly, monthly, quarterly or annually. Examples: weekly stock returns, monthly interest rates, GDP growth, CPI, and so on.
• Cross-section data: data on one or more variables collected at the same point in time. Examples: the population census conducted by the Vietnam General Statistics Office every 10 years; profits of listed firms in 2014.
• Pooled data combine time series and cross-section data. Panel data are a special type of pooled data in which the same cross-sectional units (e.g., households or firms) are observed over time.

Example of panel data

The accuracy of data
• The results of research are only as good as the quality of the data.
• If in a given situation researchers find that their results are "unsatisfactory", the cause may be not that they used the wrong model but that the quality of the data was poor.
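As a small illustration of how the three data types relate, the sketch below stores a hypothetical panel as (unit, year, value) records in plain Python. The province names, years and values are invented placeholders, not real statistics. Fixing one year extracts a cross-section; fixing one unit extracts a time series.

```python
# Hypothetical panel: the same three provinces observed in three years.
# Each record is (unit, year, value); all values are invented placeholders.
PANEL = [
    ("Province A", 2012, 10.0), ("Province A", 2013, 11.5), ("Province A", 2014, 12.1),
    ("Province B", 2012, 7.3), ("Province B", 2013, 7.9), ("Province B", 2014, 8.4),
    ("Province C", 2012, 4.2), ("Province C", 2013, 4.8), ("Province C", 2014, 5.0),
]

def cross_section(panel, year):
    """Fix the time period: every unit's value in `year`, as {unit: value}."""
    return {unit: value for unit, y, value in panel if y == year}

def time_series(panel, unit):
    """Fix the unit: that unit's values over time, as [(year, value), ...]."""
    return sorted((year, value) for u, year, value in panel if u == unit)

if __name__ == "__main__":
    print(cross_section(PANEL, 2014))        # one cross-section
    print(time_series(PANEL, "Province B"))  # one time series
```

A list of records is used here only to keep the sketch dependency-free; a data-frame library would serve the same purpose for real project data.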
Measurement scales of variables
• Four broad categories: ratio scale, interval scale, ordinal scale and nominal scale.
• Ratio scale: both the ratio and the distance between two values are meaningful, e.g., GDP growth rate, interest rate, ROE. Most economic variables belong to this category.
• Interval scale: distances are meaningful but ratios are not, e.g., the distance between two time periods, say (2000 − 1995), is meaningful, but the ratio 2000/1995 is not.
• Ordinal scale: values have a natural order, but the distances between them are not meaningful, e.g., income class (upper, middle, lower), grading systems (A, B, C grades).
• Nominal scale: categories with no natural order, e.g., gender (male, female), marital status (married, unmarried, divorced, separated).

1.4. Review of statistics
• Empirical problem: class size and educational output
– Policy question: What is the effect on test scores (or some other outcome measure) of reducing class size by one student per class?
– We must use data to find out. (Is there any way to answer this without data?)

Example: The California Test Score Data Set
All K-6 and K-8 California school districts (n = 420)
Variables:
• 5th grade test scores (Stanford-9 achievement test, combined math and reading), district average
• Student-teacher ratio (STR) = number of students in the district divided by the number of full-time equivalent teachers

Do districts with smaller classes have higher test scores?
Scatterplot of test score vs. student-teacher ratio

We need some numerical evidence on whether districts with low STRs have higher test scores, but how?
1. Compare average test scores in districts with low STRs to those with high STRs ("estimation").
2. Test the "null" hypothesis that the mean test scores in the two types of districts are the same, against the "alternative" hypothesis that they differ ("hypothesis testing").
3. Estimate an interval for the difference in the mean test scores, high vs. low STR districts ("confidence interval").

Initial data analysis: compare districts with small (STR < 20) and large (STR ≥ 20) class sizes

a. Estimation
Is this a large difference in a real-world sense?
• Standard deviation across districts = 19.1
• Difference between the 60th and 75th percentiles of the test score distribution: 667.6 − 659.4 = 8.2
• Is this difference big enough to be important for school reform discussions, for parents, or for a school committee?

b. Hypothesis testing
• Difference-in-means test: compute the t-statistic
t = (Ȳs − Ȳl) / SE(Ȳs − Ȳl), where SE(Ȳs − Ȳl) = √(ss²/ns + sl²/nl),
Ȳs, ss and ns are the sample mean, sample standard deviation and number of districts with small classes, and Ȳl, sl and nl are the corresponding quantities for districts with large classes.

c. Confidence interval
• A 95% confidence interval for the difference in means is (Ȳs − Ȳl) ± 1.96 × SE(Ȳs − Ȳl).

1.5. Review of probability
a. Population, random variable, and distribution
b. Moments of a distribution (mean, variance, standard deviation, covariance, correlation)
c. Conditional distributions and conditional means
d. Distribution of a sample of data drawn randomly from a population: Y1, …, Yn

Population distribution of Y
• The probabilities of the different values of Y that occur in the population, e.g., Pr(Y = 650) (when Y is discrete)
• Or: the probabilities of sets of these values, e.g., Pr(640 ≤ Y ≤ 660) (when Y is continuous)

Distribution of Y1, …, Yn under simple random sampling
Because individuals #1 and #2 are selected at random, the value of Y1 has no information content for Y2. Thus:
• Y1 and Y2 are independently distributed.
• Y1 and Y2 come from the same distribution, that is, Y1 and Y2 are identically distributed.
• That is, under simple random sampling, Y1 and Y2 are independently and identically distributed (i.i.d.).
• More generally, under simple random sampling, {Yi}, i = 1, …, n, are i.i.d.
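The difference-in-means test, the confidence interval and i.i.d. sampling can be tied together in a short Python sketch. Because the California data are not reproduced here, it draws two hypothetical i.i.d. samples from normal populations with Python's `random` module; the population means, standard deviation, sample sizes and seed are arbitrary assumptions, not the course data. The 1.96 critical value relies on the large-sample normal approximation.

```python
import math
import random
import statistics

def iid_sample(rng, mean, sd, n):
    """Simple random sample: n independent draws from the same N(mean, sd^2)."""
    return [rng.gauss(mean, sd) for _ in range(n)]

def diff_in_means(sample_small, sample_large, z=1.96):
    """Return (difference, SE, t-statistic, confidence interval).

    SE = sqrt(s_s^2/n_s + s_l^2/n_l) using sample variances. The default
    z = 1.96 gives a 95% interval under the large-sample normal approximation.
    """
    diff = statistics.mean(sample_small) - statistics.mean(sample_large)
    se = math.sqrt(statistics.variance(sample_small) / len(sample_small)
                   + statistics.variance(sample_large) / len(sample_large))
    t = diff / se  # t-statistic for H0: equal population means
    return diff, se, t, (diff - z * se, diff + z * se)

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible run
    # Hypothetical populations; the means, SD and sizes are arbitrary choices.
    small = iid_sample(rng, mean=660.0, sd=20.0, n=200)
    large = iid_sample(rng, mean=650.0, sd=20.0, n=200)
    diff, se, t, (lo, hi) = diff_in_means(small, large)
    print(f"difference = {diff:.2f}, SE = {se:.2f}, t = {t:.2f}")
    print(f"95% CI: ({lo:.2f}, {hi:.2f})")
    print("reject H0 at the 5% level:", abs(t) > 1.96)
```

Changing the seed changes the simulated samples, and hence the estimates, but the construction of the SE, the t-statistic and the interval stays the same.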
This framework allows rigorous statistical inference about moments of population distributions using a sample of data from that population …

Some databases
• IMF data: http://www.imf.org/en/Data
• ADB data: http://www.adb.org/data/statistics
• World Bank data: http://data.worldbank.org/vietnamese
• GSO: https://www.gso.gov.vn/
• Quandl: https://www.quandl.com/collections/vietnam