Econometrics 341

What is Econometrics?

Econometrics is a branch of economics that uses economic theory and develops statistical methods to:
- Build and/or estimate economic relationships (models), e.g. $Q_d = f(P, P_s, P_c)$.
- Use data to provide economic measurement of impact.
- Provide, through estimation, magnitudes for the parameters of the variables that determine the phenomenon under study. E.g., the effect of advertising on sales.

The purpose of econometrics is to use econometric analysis to:
- test economic theories (e.g., theory predicts $0 < MPC < 1$)
- evaluate policy (for government or business)
- forecast/predict key (macro) variables (or the phenomenon of interest).

Model Variables
- An important feature of the modern approach to econometrics is that the regressors (along with the regressand) are treated as random variables. For the social sciences, this approach is more realistic than the traditional approach of non-random (fixed) regressors.

STEPS IN EMPIRICAL ECONOMIC ANALYSIS - DETAILS

Empirical analysis relates to the use of actual data to estimate an economic relationship.

1. Formulate the question of interest
This entails clearly stating what phenomenon is to be investigated.

2. Formulate economic model
- In general, an economic model represents relationships between variables mathematically.

Example 1: Individual Consumer Demand
In the case of testing consumer theory in microeconomics:
- the framework is that individuals maximise utility by choosing quantities of goods to consume subject to a budget constraint => two equations.
- the outcome is a demand equation where quantity demanded depends on the price of the good, the prices of related goods, income, and individual characteristics that affect taste.

Assume the goal is to study the effect of a change in the price of product X on the demand for X. Then the economic model is:

$D_x = f(P_x, P_y, I, z)$    (1)

where
$D_x$ = quantity demanded of good X.
$P_x$ = price of good X.
$P_y$ = price of related good Y.
$I$ = income.
$z$ = factors influencing the consumer's tastes and/or preferences.

Observations on the Economic Model
i.
Economic theory is used to determine the variables that explain the quantity of X and the direction of their influence.
ii. The Occam's razor principle is used: other factors affect the demand for good X, but they are left out to keep the model simple.
iii. Model (1) is not specific about the functional form $f(\cdot)$ of the model.

3. Formulate Econometric Model
After specifying an economic model, we derive an econometric model from it. To specify an econometric model, four things must be resolved:
(i) Choose a specific form for the function $f(\cdot)$.
(ii) Decide how to deal with model variables that cannot be observed, e.g., tastes of consumers, ability of a worker, quality of education, etc.
(iii) Account for the many other variables that affect the dependent variable but are not included in the model.
(iv) Ensure that the model captures a ceteris paribus relationship between the variable we are explaining and its primary determinant.

Econometric model of the quantity demanded of a commodity
An econometric model relating to the economic model in (1) might be

$D_x = \beta_0 + \beta_1 P_x + \beta_2 P_y + \beta_3 I + u$    (2)

- The econometric model assumes a linear relationship.
- $u$ is a stochastic variable called the error term. It contains factors that influence $D_x$ but are not included in the model.
- The ceteris paribus (causal) effect of $P_x$ is obtained by including the other variables, $P_y$ and $I$, which are thereby held fixed.
- Generally we use the plus sign in specifying the model. This does not imply the direction of the effect. The expected sign of each coefficient is determined by theory/intuition.

4. Stating Hypothesis/Hypotheses of Interest
Following the specification of the econometric model, we state the hypothesis of interest.
- Since we intend to examine the effect of the price of X on $D_x$, $P_x$ is the variable of interest.
- Hence, we state the hypothesis of interest in terms of the coefficient of $P_x$. For example, the hypothesis that the price of X has no effect on $D_x$ is stated as $\beta_1 = 0$.

5.
Collecting Data and Estimating the Model
The last step involves collecting relevant data, using an econometric method to estimate the model, and formally testing the hypothesis of interest.
- Often we use a sample instead of the population values.

Example 2: The Case of the Relationship between Job Training and Worker Productivity
The question of interest is to examine the effect of on-the-job training on worker productivity.
Economic reasoning suggests that worker productivity is affected by training, education and experience. Also, microeconomic theory says workers are paid a wage commensurate with their productivity, so wage can be used as a proxy for productivity. From this the economic model is:

$wage = f(educ, exper, tra)$    (3)

where wage = hourly wage, educ = years of formal education, exper = years of workforce experience, and tra = weeks spent in job training.

Observations
i. Other factors generally affect a worker's wage rate, but they are left out.
ii. The functional form of model (3) is not specified.
iii. Theory is used to determine the regressors.

Formulating econometric model
- The econometric model formulated from equation (3) becomes:

$wage = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tra + u$    (4)

Deterministic Vs Stochastic Regression Model

Deterministic models assume an exact relationship between $y$ and $x$. Explicit forms of economic models are deterministic.
Example: $y = \beta_0 + \beta_1 x$
Specific example of Consumption Function: let $y$ = consumption expenditures, denoted by cons; $x$ = income, denoted by inc. Then the model is
$cons = \beta_0 + \beta_1 inc$

Stochastic models assume an inexact relationship between $y$ and $x$. Econometric models are stochastic.
Example: $y = \beta_0 + \beta_1 x + u$
Specific example of Consumption Function: the model becomes
$cons = \beta_0 + \beta_1 inc + u$
=> Not all changes in cons are explained by changes in inc.

Graphical illustrations of the two types of model using the Consumption Function.

Figure: Exact Relationship (y against x)
Note: all the y-x points fall exactly along the line.

Figure: Inexact Relationship (y against x)
Note: the y-x points lie close to the fitted line through the scatter (indicating high correlation), but not all the points fall on the line.

STRUCTURE OF ECONOMIC DATA

There are different types of data.

1. Cross-Sectional (CS) Data
Cross-sectional data are observations on a set of economic units/entities, e.g. households (HHs), taken at a given time period. E.g., income of various HHs in December 2016.
- Sometimes the time period is not exactly the same for all the units in the sample. We then ignore these small timing differences and view the data as cross-sectional data collected in a given month.

Key Features of CS Data
(i) Cross-sectional data can often be assumed to have been obtained by random sampling from an underlying population.
(ii) The ordering of the data is not important for econometric analysis.

For example, if we get information on 300 workers' wages, experience, etc. in a particular month by randomly drawing them from the total working population, then we have a random sample from the population of all workers.

Table showing (Hypothetical) Cross-Sectional Data

Obsno   Wage rate/day (pula)   Exper
1       6.10                   2
2       6.24                   22
3       6.00                   3
:       :                      :
225     22.56                  6
:       :                      :
300     6.50                   5

Note: all the 300 wage observations (and also the experience observations) were collected from various workers at a given time period, e.g., a week/month.

2. Time Series (TS) Data
Time series data are observations collected on one or more variables over time.
Examples include the CPI/inflation series, GDP series, credit series, etc.
The ordering of the data is important for econometric analysis of time series data.
Reasons
- Past events can influence future events.
- Lags in behaviour are common in the social sciences.

Key Features of TS Data
(a) Dependency in data
- economic observations can rarely be taken as independent across time.
- an observation in one period tends to be similar to the observation in the immediate next period.
Consequences
- this dependence makes the econometric analysis of TS data more difficult.
- standard econometric methods of analysis are often not directly applicable to TS data.

(b) Data frequency
This refers to the time interval at which the observations are collected. For Botswana, some data are recorded
- daily (e.g., exchange rates) => frequency is daily
- monthly (e.g., inflation) => frequency is monthly
- quarterly (e.g., GDP) => frequency is quarterly

Minimum Wage and Unemployment

obsno   year   avgmin   unemp
1       1950   0.20     15.4
2       1951   0.21     16.0
3       1952   0.23     14.8
:       :      :        :
37      1986   3.35     18.9

3. Pooled Cross Sections
This refers to observations where cross-sectional data for different time periods have been put together.
- The dataset has both cross-sectional and time series features simultaneously.
This putting together of cross-sectional data is called pooling.

Reason for pooling
Pooling CS data often provides a useful way to analyse the effect of a new government policy.
- This is accomplished by putting together data from before and after the policy change.

Example of pooled cross sections: data on 120 houses for 2001 and on 165 houses for 2004.

Table Showing Pooled CS Data

Obsno   Year   hprice    Bdrms
1       2001   500,500   3
2       2001   850,000   4
3       2001   470,500   2
:       :      :         :
120     2001   960,000   4
121     2004   485,000   2
:       :      :         :
285     2004   350,000   2

Note that the houses whose data were collected in 2001 are not necessarily the same houses whose data were collected in 2004, e.g., the 2-bedroom houses are different.

4. Panel or Longitudinal Data
Panel data pools together time series data on a set of cross-sectional units. For example, investment data on the same set of firms over a period of 3 years.
The crucial difference between a panel dataset and a pooled CS dataset is that in a panel dataset the data are recorded for the same observation units over time, whereas the observation units differ across periods in a pooled CS dataset.
Example of panel data on firms' investment (Inv), income (Inc), and interest (ir) for 3 years (2000 - 2002).

Hypothetical data

Firm   year   Inv   Inc   ir
1      2000   6.0   5.8   1.3
1      2001   4.6   7.9   7.8
1      2002   9.4   5.4   1.1
2      2000   9.1   6.7   4.1
2      2001   8.3   6.6   5.0
2      2002   0.6   0.4   7.2
3      2000   9.1   2.6   6.4
3      2001   4.8   3.2   6.4
3      2002   9.1   6.9   2.1
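
The difference between a panel and a pooled cross section can also be checked mechanically. The following is a minimal Python sketch, not part of the original notes: it stores the hypothetical firm data above as (unit, period, ...) rows and tests whether every unit is observed in every period. The function names `periods_by_unit` and `is_balanced_panel` and the house identifiers in `POOLED_ROWS` are assumptions introduced purely for illustration; the notes' pooled table only gives observation numbers, not house identities.

```python
from collections import defaultdict

# (firm, year, Inv, Inc, ir) rows copied from the hypothetical panel table.
PANEL_ROWS = [
    (1, 2000, 6.0, 5.8, 1.3),
    (1, 2001, 4.6, 7.9, 7.8),
    (1, 2002, 9.4, 5.4, 1.1),
    (2, 2000, 9.1, 6.7, 4.1),
    (2, 2001, 8.3, 6.6, 5.0),
    (2, 2002, 0.6, 0.4, 7.2),
    (3, 2000, 9.1, 2.6, 6.4),
    (3, 2001, 4.8, 3.2, 6.4),
    (3, 2002, 9.1, 6.9, 2.1),
]

# (house id, year, hprice, bdrms). The house ids are hypothetical labels:
# in a pooled cross section a different set of units is sampled each year.
POOLED_ROWS = [
    ("house_A", 2001, 500500, 3),
    ("house_B", 2001, 850000, 4),
    ("house_C", 2004, 485000, 2),
    ("house_D", 2004, 350000, 2),
]


def periods_by_unit(rows):
    """Map each unit id to the sorted list of periods in which it is observed."""
    seen = defaultdict(set)
    for unit, period, *_ in rows:
        seen[unit].add(period)
    return {unit: sorted(periods) for unit, periods in seen.items()}


def is_balanced_panel(rows):
    """Return True if every unit is observed in every period in the data."""
    all_periods = sorted({period for _, period, *_ in rows})
    return all(p == all_periods for p in periods_by_unit(rows).values())


print(is_balanced_panel(PANEL_ROWS))   # True: same firms in 2000, 2001, 2002
print(is_balanced_panel(POOLED_ROWS))  # False: each house appears in one year only
```

In the firm data each unit appears in all three years, so the check passes. In the pooled housing sketch no unit is followed over time, which is exactly the distinction drawn above.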