Econometrics I
Spring Term 2024
Bachelor of Science in Economics
Bachelor of Business Administration
Khatai Abbasov, PhD candidate
ADA University © Khatai Abbasov

0. Introduction

Instructor: Khatai R. Abbasov
- MSc graduate from FAU Erlangen-Nürnberg
- PhD Candidate @ Istanbul University
- Adjunct Lecturer @ ADA University
- Analyst @ Pasha Holding
- Office hours: please send an e-mail.
- Email: kabbasov@ada.edu.az
- Slides: © All rights reserved.

Organizational issues:
1. Lecture: Saturday 08:30 – 09:45 (A 110) and Saturday 10:00 – 11:15 (A 110)
2. Application classes: check the schedule in the Syllabus

Assessment:
1. Attendance – 5%, individual
2. Research project presentation 1 – 5%, individual/group – February 24
3. Midterm examination – 30%, individual – March 30
4. Research project presentation 2 – 20%, individual/group – April 27 – May 4
5. Research project paper – 10%, individual/group – May 13, by 23:59
6. Final examination – 30%, individual – May 25

Lecture notes:
- You will find the lecture notes and other materials on the online platform.
- Please register on the online platform for this course.

Software:
- We are going to use Excel and R this semester.
- R is a free software environment. If you have not downloaded R yet, check the links in Section 8 of the Syllabus.
- We are going to learn the basics of R during the upcoming sessions.
- Do you want to learn R on your own? Then check the links in the Syllabus.

Literature:
The lecture closely follows:
- Wooldridge, J. M. (2016). Introductory Econometrics: A Modern Approach (6th ed.). Cengage Learning.
The textbooks below are also recommended:
- Greene, W. H. (2012). Econometric Analysis (7th ed.). Pearson Education Limited.
- Studenmund, A. H. (2017). Using Econometrics: A Practical Guide.
- Verbeek, M. (2004). A Guide to Modern Econometrics (2nd ed.). John Wiley & Sons Ltd.

Notation:
In terms of notation, the lecture notes closely follow the book, e.g.:
- Greek characters ($\alpha$, $\beta$, $\gamma$, …) denote parameters.
- Estimates are usually indicated by "hats" ($\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$, …).
- Matrices and vectors are written in bold characters.
Please consider: the lecture notes are not perfect (yet)! Please let me know if you see any mistakes or if you are confused.

Content (corresponding chapter in the book in parentheses):
0. Introduction (Review)
1. The Simple Regression Model (Chapter 2)
2. The Multiple Regression Model (Chapter 3)
3. Hypothesis Testing in the Regression Analysis (Chapter 4)
4. Nonlinearities in the Regression Models (Chapter 6)
5. Binary (or Dummy) Variables (Chapter 7)
6. Heteroscedasticity (Chapter 8)
7. More on Specification and Data Issues (Chapter 9)

Outline of Section 0:
0.1 Cross-Sectional Data
0.2 Time Series Data
0.3 Pooled Cross-Sectional Data
0.4 Panel Data
0.5 Pooled Cross Section vs Panel Data
0.6 Parametric vs Non-Parametric Tests
0.7 Application: Basic Concepts of Statistical Theory

0.1 Cross-Sectional Data
0.2 Time Series Data
0.3 Pooled Cross-Sectional Data
0.4 Panel Data

0.5 Pooled Cross Section vs Panel Data
Pooled Cross Section
- Each year, draw a random sample on wages, education, experience, etc.
- Statistically, observations are independently sampled, i.e., there is no correlation in the error terms across different observations.
- Yet they are not identically distributed, since the samples come from different time periods: e.g., the distributions of wages and education have changed over time in most countries.
Panel Data (also called longitudinal data)
- Collect data from the same individuals, firms, states, etc. across time: e.g., the same individuals are reinterviewed at several subsequent points in time.
- Observations are not independently distributed across time: e.g., ability that affects someone's wage in 1990 will do so in 1991, too.
- Since panel data methods are somewhat more advanced, we skip that in Econometrics II.

0.6 Parametric vs Non-Parametric Tests
(rows: input variable; columns: outcome variable)

| Input \ Outcome | Nominal | Categorical (>2 categories) | Ordinal | Quantitative Discrete | Quantitative Non-Normal | Quantitative Normal |
|---|---|---|---|---|---|---|
| Nominal | χ² or Fisher's | χ² | χ²-trend or Mann-Whitney | Mann-Whitney | Mann-Whitney or log-rank | Student's t test |
| Categorical (>2 categories) | χ² | χ² | Kruskal-Wallis | Kruskal-Wallis | Kruskal-Wallis | Analysis of variance |
| Ordinal | χ²-trend or Mann-Whitney | Poisson regression | Spearman rank | Spearman rank | Spearman rank | Spearman rank or linear regression |
| Quantitative Discrete | Logistic regression | Poisson regression | Poisson regression | Spearman rank | Spearman rank | Spearman rank or linear regression |
| Quantitative Non-Normal | Logistic regression | Poisson regression | Poisson regression | Poisson regression | Plot & Pearson or Spearman rank | Plot & Pearson or Spearman and linear regression |
| Quantitative Normal | Logistic regression | Poisson regression | Poisson regression | Poisson regression | Linear regression | Pearson and linear regression |

0.7 Application: Basic Concepts of Statistical Theory
- Mean
- Variance
- Standard Deviation
- Covariance
- Correlation
Application in Excel…
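The concepts listed in 0.7 can also be checked outside Excel. The sketch below is only an illustration: it uses plain Python (an assumption, since the course itself works in Excel and R), made-up numbers for x and y, and the sample versions of the formulas with the n - 1 denominator.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_cov(a, b):
    """Sample covariance with the n - 1 denominator."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def sample_var(values):
    # The variance is the covariance of a variable with itself.
    return sample_cov(values, values)

def sample_sd(values):
    return math.sqrt(sample_var(values))

def correlation(a, b):
    # Covariance rescaled by both standard deviations; always between -1 and 1.
    return sample_cov(a, b) / (sample_sd(a) * sample_sd(b))

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

print("mean of x:", mean(x))
print("variance of x:", sample_var(x))
print("standard deviation of x:", sample_sd(x))
print("covariance of x and y:", sample_cov(x, y))
print("correlation of x and y:", correlation(x, y))
```

In Excel, the corresponding built-in functions are AVERAGE, VAR.S, STDEV.S, COVARIANCE.S and CORREL.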
1. The Simple Regression Model

Outline of Section 1:
1.1 Statistical Models
1.2 Linear Regression Models
1.3 Ordinary Least Squares
1.4 R-squared
1.5 Multiple Regression Analysis
1.6 Omitted Variable Bias
1.7 Examples to widen your understanding
1.8 Qualitative Information: Binary (or Dummy) Variables
1.9 Binary Dependent Variable – LPM Model
1.10 Discrete Dependent Variables

1.1 Statistical Models
- Statistics/econometrics is mainly concerned with model building.
- Model: one variable is caused by another.
- Model building often begins with an idea of a relation.
- Statistical model building: translating this idea into an equation (or a set of equations).
- Some features of this equation answer a relevant/interesting question about the variable of interest.
- Examples:
  - Does insurance coverage affect health care utilization?
  - What is the direction of this effect, if it exists?
  - How "big" is this effect, if it exists?

Statistical point of view:
- Health care utilization, insurance coverage, and further covariates have a joint probability distribution.
- We are often interested in the conditional distribution of one of these variables given the others.
- The focus is often on the conditional mean of one variable y given the value of the covariates x, i.e., E[y|x].
- E[y|x] is the regression function, e.g., the expected number of doctor visits given income, health status, insurance status, etc.
- The linear regression model is the most common.

1.2 Linear Regression Models
Simple/Bivariate Regression Model
- y: left-hand-side variable [lhs]
- x: right-hand-side variable [rhs]
- u: disturbance / error / unobserved part
  - random component from the underlying theoretical model
  - measurement error in y
  - captures anything not explicitly taken into account by the model

- Sample of data for $y_i$ and $x_i$ with $i = 1, \dots, n$.
- Key assumption: each observed value of $y_i$ is generated by the underlying data generating process $y_i = \beta_1 x_i + u_i$.
- $y_i$ is determined by the deterministic part $\beta_1 x_i$ and the random part $u_i$.
- Objectives of statistical analysis:
  - Estimating the unknown model parameters, here $\beta_1$.
  - Testing hypotheses (Does the number of doctor visits increase with income?)
  - Identifying independent effects of $x_i$ on $y_i$ (How strongly will doctor visits increase if income increases by one unit?)
  - Making predictions about $y_i$ (How often will individual $i$ with characteristics $x_i$ visit a doctor?)

Linearity does not mean that the original statistical model needs to be linear:
- $y = \alpha x^{\beta} e^{u}$ is nonlinear; its transformation $\ln y = \ln \alpha + \beta \ln x + u$ is linear.
- $y = \alpha x^{\beta} + e^{u}$ is nonlinear too, but cannot be transformed into a linear model.
- Linearity refers to linearity in the parameters and in the disturbances, not to linearity in the (original, not transformed) variables. For example,
  $y = \alpha + \beta_1 \ln x + \beta_2 \ln z + u$
  $\ln y = \alpha + \beta_1 x + \beta_2 z^2 + u$
  $y = \alpha + \beta_1 (1/x) + \beta_2 z + u$
  are linear regression models, though they are nonlinear in x.
- Log-linear (log-log, semi-log) models in particular are frequently used in applied work.
- We mean here that the equation is linear in the parameters $\beta$.
- Nothing prevents us from using simple regression to estimate a model such as $cons = \beta_0 + \beta_1 \sqrt{inc} + u$, where cons is annual consumption and inc is annual income.
- Linear models with non-linear variables are often more realistic.

1.3 Ordinary Least Squares (OLS)
- Estimating the model parameters is the objective of an econometric model.
- There are different approaches to model estimation; least squares regression is the most popular.
- Starting with simple least squares is often a good idea in applied work, even if more sophisticated (possibly better suited) methods are available.
- Idea of least squares estimation: choose a coefficient $\beta$ such that the sum of squared residuals (the estimated unobserved parts) is minimized.
- Intuition: the fitted line $\hat{\beta}_1 x$ is close to the observed data points.
- Algebraic perspective: least squares allows for an algebraic solution of the minimization problem.
- Because deviations are squared, least squares estimation puts much weight on avoiding large deviations of observed data points from the fitted line.

Population and Sample Regression

In the linear model $y = \beta_1 x + u$:
- $E(u) = 0$: a statement about the distribution of the unobserved factors in the population.
- $E(u|x) = E(u)$: u is mean independent of x. Full independence between u and x implies mean independence, but mean independence is the weaker condition.
- Example: $E(abil \mid educ = 8) = E(abil \mid educ = 16)$. What if ability increases with years of education?
- $E(y|x) = \beta_1 x$: the average value of y changes with x, not the value of y for every unit!
- Systematic part and unsystematic part?
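The least squares idea described in 1.3 (choose the coefficients that minimize the sum of squared residuals) can be sketched in a few lines. This is a minimal illustration, not the course's own implementation: it uses plain Python, an intercept plus a slope, the standard closed-form OLS formulas from Chapter 2 of Wooldridge (2016), and made-up education and wage numbers.

```python
def ols_simple(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals of y - b0 - b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    if sst_x == 0:
        # Without sample variation in x the slope is not defined.
        raise ValueError("x must vary in the sample")
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

def ssr(x, y, b0, b1):
    """Sum of squared residuals for a candidate line b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, for illustration only.
x = [8, 10, 12, 14, 16]          # years of education
y = [4.0, 5.5, 6.0, 8.5, 9.0]    # hourly wage

b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]                  # explained part
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # unexplained part

print("intercept:", b0, "slope:", b1)
print("SSR at the OLS estimates:", ssr(x, y, b0, b1))
print("SSR at a slightly different slope:", ssr(x, y, b0, b1 + 0.05))
```

Any other line, such as the one with the slightly different slope, gives a larger sum of squared residuals on the same data.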
How to derive OLS, i.e., how to find the values of $\hat{\beta}_0$ and $\hat{\beta}_1$:
- The derivation uses a hint; see A7 & A8 in the Appendix of the book for this hint.
- The sample variance of x should be greater than zero.
- See Chapter 2 in Wooldridge (2016) if you want more explanation of the derivation.

Example:
- What is the wage for a person with 8 years of education?
- How much does the hourly wage increase with 1 and with 4 more years of education?

More Examples

A note on terminology (statistical jargon):
- Often, we indicate the estimation of a relationship using OLS by writing equations.
- Alternatively, we say that we run the regression of y on x, or simply that we regress y on x: always the dependent variable on the independent variables.
- Example: regress salary on roe.
- When we use such terminology, we always mean that we plan to estimate the intercept along with the slope coefficient. This case is appropriate for the vast majority of applications.
- Unless explicitly stated otherwise, we estimate an intercept along with a slope.
- Fitted values (explained part) versus actual data versus residuals.

1.4 R-squared
- R-squared is the ratio of the explained variation to the total variation; it is interpreted as the fraction of the sample variation in y that is explained by x.
- With SST the total, SSE the explained, and SSR the residual sum of squares: SST = SSE + SSR. Dividing both sides by SST gives 1 = SSE/SST + SSR/SST, so $R^2 = SSE/SST = 1 - SSR/SST$.
- Example: the firm's ROE explains only 1.3% of the variation in salaries for this sample.
- R-squared is a poor tool for model analysis.
- Example:
  - narr86 = number of arrests in 1986
  - pcnv = the proportion of arrests prior to 1986 that led to conviction
  - Interpret the coefficients.
  - Interpret the R-squared value.
  - In the arrest example, the small $R^2$ reflects what we already suspect in the social sciences: it is generally very difficult to predict individual behavior.

2. The Multiple Regression Model

Outline of Section 2:
2.1 Models with k independent variables
2.2 Omitted Variable Bias
2.3 Examples to widen your understanding

2.1 Models with k independent variables
Linear regression model with multiple independent variables
- For any values of $x_1$ and $x_2$ in the population, the average of the unobserved factors is equal to zero.
- This implies that other factors affecting y are, on average, not related to $x_1$ and $x_2$.
- Thus, the equation fails when any problem causes u to be correlated with any of the independent variables.

Example
- exper: years of labor market experience
- tenure: years with the current employer
- Ceteris paribus effect: if we take two people with the same levels of experience and job tenure, the coefficient on educ is the proportionate difference in predicted wage when their education levels differ by one year.
Questions
- What is the estimated effect on wage when an individual stays at the same firm for another year?
- How do we obtain fitted or predicted values for each observation?
- What is the difference between the actual values and the fitted values?

2.2 Omitted Variable Bias
- Omitted variable bias (OVB) occurs when a statistical model leaves out one or more relevant variables.
- The bias results in the model attributing the effect of the missing variables to those that were included.
- Full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
- Underspecified model: $y = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + v$
- What if $Corr(x_1, x_2) = 0$?

2.3 Examples to widen your understanding
1. Some students are randomly given grants to buy computers. If the amount of the grant is truly randomly determined, we can estimate the ceteris paribus effect of the grant amount on subsequent college grade point average (GPA) by simple regression analysis.
a) Because of random assignment, all of the other factors that affect GPA would be uncorrelated with the amount of the grant.
b) R-squared would probably be very small.
c) In a large sample, we could still get a precise estimate of the effect of the grant.
d) For a more precise estimate, SAT score and family background variables would be good candidates for additional regressors, since correlation with the amount of the grant is not an issue.
Remember: unbiasedness is not the point here; it is already obtained. The issue is getting an estimator with a small sampling variance.
Keep in mind that the smaller the R-squared, the harder the prediction/forecast. Yet goodness of fit or other factors should not be decisive in model selection. What matters is the purpose of your analysis.

2. The dependent variable is ecolabeled apples (ecolbs); the independent variables are the price of ecolabeled apples (ecoprc) and the price of regular apples (regprc). Families are randomly chosen, and we want to estimate the price effects.
a) Since assignment is random, family income and the desire for a clean environment are unrelated to prices.
b) Hence, the regression of ecolbs on ecoprc and regprc (reg ecolbs ecoprc regprc) produces unbiased estimators of the price effects.
c) R-squared is 0.0364, which means the price variables explain only 3.6% of the total variation in the dependent variable.

3. Suppose we want to estimate the effect of pesticide usage among farmers on family health expenditures. Should we include the number of doctor visits as an explanatory variable? The answer is NO!
a) Health expenditures include doctor visits, and we want to capture all health expenditures.
b) If we include doctor visits, then we are only measuring the effect of pesticide usage on health expenditures other than doctor visits.
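The omitted variable bias from 2.2 can be checked numerically. The sketch below is only an illustration on made-up data in plain Python. It relies on a standard algebraic identity from Chapter 3 of Wooldridge (2016) that the slides do not state explicitly: the slope of the underspecified model equals $\hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from regressing $x_2$ on $x_1$. The helper names are hypothetical.

```python
def cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

def simple_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cross(x, y) / cross(x, x)

def two_regressor_slopes(x1, x2, y):
    """OLS slopes (b1, b2) from regressing y on x1 and x2 (with an intercept)."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data, for illustration only; x1 and x2 are positively correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 8.2, 8.0, 12.9, 12.1]

b1_hat, b2_hat = two_regressor_slopes(x1, x2, y)   # full model
b1_tilde = simple_slope(x1, y)                     # underspecified model
delta1 = simple_slope(x1, x2)                      # regression of x2 on x1

print("full model slope on x1:", b1_hat)
print("underspecified slope on x1:", b1_tilde)
print("b1_hat + b2_hat * delta1:", b1_hat + b2_hat * delta1)

# If x2 is uncorrelated with x1 in the sample, omitting it leaves the slope unchanged.
x2_uncorrelated = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
b1_full, _ = two_regressor_slopes(x1, x2_uncorrelated, y)
print("slope on x1 with uncorrelated x2:", b1_full, "vs", simple_slope(x1, y))
```

The second part speaks to the question "What if Corr(x1, x2) = 0?": with zero sample correlation, $\tilde{\delta}_1 = 0$ and the two slopes on $x_1$ coincide.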
3. Hypothesis Testing in the Regression Analysis

Outline of Section 3:
3.1 Variance of Estimates
3.2 Hypothesis Testing
3.3 Type I & Type II Errors
3.4 The t-Test
3.5 The F-Test

3.1 Variance of Estimates
In the linear model $y = \beta_1 x + u$:
- Looking at $Var(\hat{\beta}_1)$, it is easy to summarize how this variance depends on the error variance, $\sigma^2$, and on the total variation in $\{x_1, x_2, \dots, x_n\}$, $SST_x$.
- The larger the error variance, the larger is $Var(\hat{\beta}_1)$. This makes sense, since more variation in the unobservables affecting y makes it more difficult to estimate $\beta_1$ precisely.
- More variability in the independent variable is preferred: the more spread out the sample of the independent variable, the easier it is to trace out the relationship between E(y|x) and x, in other words, the easier it is to estimate $\beta_1$.
- We want the estimates to have a low variance!

Estimating the error variance $\sigma^2$
- Errors (or disturbances) are different from residuals.
- Errors show up in the equation containing the population parameters, $\beta_0$ and $\beta_1$.
- Residuals show up in the estimated equation with $\hat{\beta}_0$ and $\hat{\beta}_1$.
- Thus, the errors are never observed, while the residuals are computed from the data.

Homoskedasticity & Heteroskedasticity
- It is likely that people with more education have a wider variety of interests and job opportunities, which could lead to more wage variability at higher levels of education.
- People with very low levels of education have fewer opportunities and often must work at the minimum wage; this serves to reduce wage variability at low education levels.
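The statements in 3.1 about the variance of the slope estimate can be made concrete with a short sketch. It assumes the standard simple-regression results under homoskedasticity from Chapter 2 of Wooldridge (2016), for a model with an intercept: $Var(\hat{\beta}_1) = \sigma^2 / SST_x$, $\hat{\sigma}^2 = SSR/(n-2)$ and $se(\hat{\beta}_1) = \hat{\sigma} / \sqrt{SST_x}$. The data are made up so that both samples share the same residuals and differ only in how spread out x is.

```python
import math

def slope_and_se(x, y):
    """OLS slope and its standard error in a simple regression with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)      # total variation in x
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)                      # estimate of the error variance
    return b1, math.sqrt(sigma2_hat / sst_x)

# Made-up values standing in for the unobserved part; they sum to zero and are
# uncorrelated with x in both samples, so they are also the OLS residuals.
e = [0.0, 0.2, -0.6, 0.6, -0.2]

x_wide = [8, 10, 12, 14, 16]      # more spread out
x_narrow = [10, 11, 12, 13, 14]   # half the spread
y_wide = [-1.2 + 0.65 * xi + ei for xi, ei in zip(x_wide, e)]
y_narrow = [-1.2 + 0.65 * xi + ei for xi, ei in zip(x_narrow, e)]

b1_wide, se_wide = slope_and_se(x_wide, y_wide)
b1_narrow, se_narrow = slope_and_se(x_narrow, y_narrow)

print("wide x:   slope", b1_wide, "se", se_wide)
print("narrow x: slope", b1_narrow, "se", se_narrow)
```

Both samples give the same slope, but halving the spread of x doubles the standard error, which is the sense in which more variability in the independent variable is preferred.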
In the multiple regression model $y = \beta_1 x_1 + \beta_2 x_2 + \dots + u$:
- Looking at $Var(\hat{\beta}_j)$, it is easy to summarize how this variance depends on the error variance, $\sigma^2$, on the total sample variation in $x_j$, $SST_j$, and on $R_j^2$.
- The larger the error variance, the larger is $Var(\hat{\beta}_j)$. This makes sense, since more variation in the unobservables affecting y makes it more difficult to estimate $\beta_j$ precisely.
- More variability in the independent variables is preferred: the more spread out the sample of an independent variable, the easier it is to trace out the relationship between E(y|x) and x, in other words, the easier it is to estimate $\beta_j$.
- $R_j^2$ (the R-squared from regressing $x_j$ on all the other independent variables) does not appear in simple regression analysis, since only one independent variable is used.
- What do you understand when $R_j^2$ is closer to 1 and to 0?
- How does $Var(\hat{\beta}_j)$ change when $R_j^2$ gets closer to 1? Is it really an issue?
- Multicollinearity and perfect collinearity: what do they imply for inference here?
- BLUE (under the Gauss-Markov assumptions): linear in parameters, random sampling, no perfect collinearity, zero conditional mean, homoskedasticity.

3.2 Hypothesis Testing
- Statistical inference has two subdivisions: estimation and hypothesis testing.
- Estimation addresses the question "What is this parameter's value?"
- Hypothesis testing addresses the question "Is the value of the parameter x (or some other specific value)?"

Steps in hypothesis testing:
1. Stating the hypotheses
2. Identifying the relevant test statistic and its probability distribution
3. Specifying the significance level
4. Calculating the test statistic
5. Making the statistical decision
6. Making the economic, financial, investment, or any other decision
- We always state two hypotheses: the null hypothesis (the null, $H_0$) and the alternative hypothesis ($H_a$ or $H_1$).
- The null hypothesis is the hypothesis to be tested. The alternative is the one accepted when the null is rejected.

Formulation of hypotheses:
1. $H_0: \beta = x$ versus $H_1: \beta \neq x$
2. $H_0: \beta \leq x$ versus $H_1: \beta > x$
3. $H_0: \beta \geq x$ versus $H_1: \beta < x$
- The 1st formulation is a two-sided (or two-tailed) hypothesis test.
- The 2nd and 3rd are one-sided (or one-tailed) hypothesis tests.

- We calculate the value of a test statistic based on a sample.
- The test statistic frequently (but not always) has the following form:
  $$\text{Test statistic} = \frac{\text{Sample estimate} - \text{Population parameter under } H_0}{\text{Standard error of the sample estimate}}$$
- The test statistic is used to decide whether we reject the null or not.
- For that, we compare the value of the test statistic with the critical value.
- Critical values are obtained from tables of the relevant probability distribution:
  - for a t test, use the t distribution, i.e., the t table
  - for a z test, use the standard normal or z distribution
  - for a chi-square test, use the chi-square distribution, i.e., the $\chi^2$ table
  - for an F test, use the F distribution, i.e., the F table

3.3 Type I & Type II Errors
- In hypothesis testing, two actions are possible: we reject the null hypothesis, or we cannot reject the null hypothesis.
- The decision is based on comparing the test statistic with the critical value.
- Critical values are obtained from the statistical tables for the given significance level, e.g., 10%, 5%, 1%, etc.
- There are 4 possible outcomes when we test a null hypothesis:
1. The null is wrong and we reject it: correct decision
2. The null is correct, but we reject it: Type I error
3. The null is wrong, but we do not reject it: Type II error
4. The null is correct and we do not reject it: correct decision
- If we reject the null, the only mistake we can make is a Type I error.
- The probability of a Type I error is denoted by alpha, $\alpha$, also known as the level of significance.

3.4 The t-Test
- The test statistic has the following form, and is compared with $t_{n-k-1}$:
  $$\text{Test statistic} = \frac{\hat{\beta}_{sample} - \beta_{parameter}}{se(\hat{\beta}_{sample})}$$
- n stands for the sample size; k is the number of independent variables in the model.
- Let's say that the null hypothesis is $H_0: \beta_{parameter} = 0$; then the test statistic is $\hat{\beta}_{sample} / se(\hat{\beta}_{sample})$.
- $\hat{\beta}_{sample}$ will never be exactly zero, whether or not $H_0$ is true. The question is: how far is $\hat{\beta}_{sample}$ from zero?
- $\hat{\beta}_{sample}$ being very far from zero provides evidence against $H_0: \beta_{parameter} = 0$, i.e., the test statistic measures how many estimated standard deviations $\hat{\beta}_{sample}$ is away from zero.

One-sided alternative: $H_0: \beta \leq 0$ versus $H_1: \beta > 0$
1. We must decide on a significance level.
2. A 5% significance level means that we mistakenly reject $H_0$ when it is true 5% of the time.
3. The critical value for $\alpha$ = 5% is the 95th percentile of a t distribution with n-k-1 degrees of freedom; call it c.
4. If $t_{\hat{\beta}} > c$, then reject the null.
Example: for a 5% level with n-k-1 = 28 degrees of freedom, the critical value is c = 1.701; if $t_{\hat{\beta}} < 1.701$, we fail to reject $H_0$.

Degrees of freedom ($n - k - 1$)

Example
i. What are the degrees of freedom here?
ii. What is the t statistic for each variable?
iii. What are the null and alternative hypotheses?
iv. What are the critical values for the 5% and 1% levels?
v. Do we reject the null or fail to reject the null?
vi. What is the economic and statistical significance of the variable experience here?

One-sided alternative: $H_0: \beta \geq 0$ versus $H_1: \beta < 0$
1. The rejection rule for this alternative is just the mirror image of the previous case.
2. Now the critical value comes from the left tail of the t distribution.
3. To reject $H_0$ against the negative alternative, we must get a negative t statistic.
4. If $t_{\hat{\beta}} < -c$, then reject the null.
Example: for a 5% level with n-k-1 = 90 degrees of freedom, the critical value is c = 1.662; if $t_{\hat{\beta}} < -1.662$, we reject $H_0$.

Example
i. What are the degrees of freedom here?
ii. What is the t statistic for each variable?
iii. What are the null and alternative hypotheses?
iv. What are the critical values for the 5% and 10% levels?
v. Do we reject the null or fail to reject the null?
vi. If we reject the null, then at what significance level would we fail to reject the null?

Example
i. How can changing the functional form affect our conclusions?
ii. How did the significance of enroll change in the new model?
iii. Which model should we choose if we consider only the significance of enroll and the R-squared estimates?

Two-sided alternative: $H_0: \beta = 0$ versus $H_1: \beta \neq 0$
1. This is the relevant alternative when the sign of $\beta_j$ is not well determined by theory or common sense.
2. We must not use the estimates to formulate the null or alternative hypotheses.
3. For the two-sided alternative, the absolute value of the t statistic is relevant.
4. If $|t_{\hat{\beta}}| > c$, then reject the null.
Example: for a 5% level with n-k-1 = 25 degrees of freedom, the critical value is c = 2.060; if $|t_{\hat{\beta}}| > 2.060$, we reject $H_0$.

Example
i. What are the degrees of freedom here?
ii. What is the t statistic for each variable?
iii. What are the null and alternative hypotheses?
iv. What are the critical values for the 5% and 1% levels?
v. Do we reject the null or fail to reject the null?
vi. Is ACT practically significant?
vii. How do we interpret the coefficient of skipped? Hint: skipped = number of lectures missed per week.
viii. What happens if we consider a one-sided alternative?

Testing other hypotheses
- Let's say that the null hypothesis is $H_0: \beta_{parameter} = a_j$; then the test statistic is $(\hat{\beta}_{sample} - a_j) / se(\hat{\beta}_{sample})$.
- Common cases are $\beta_{parameter} = 1$ and $\beta_{parameter} = -1$.
- Example (one-sided alternative): $\log(crime) = \beta_0 + \beta_1 \log(enroll) + u$
  - $\beta_1$ is the elasticity of crime with respect to enrollment.
  - It is not of much use to test $H_0: \beta_1 = 0$, as we expect the total number of crimes to increase as the size of the campus increases.
  - On the other hand, a noteworthy alternative is $H_1: \beta_1 > 1$, which implies that a 1% increase in enrollment raises campus crime by more than 1%.
  - What is the t statistic?
  - Can we reject the null? If yes, then at what level?
  - Is 1.27 necessarily a good estimate of the ceteris paribus effect?
  - What if $H_1: \beta_1 < -1$?
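The rejection rules for the three kinds of alternatives in 3.4 can be sketched as a small helper. This is an illustration in plain Python, not part of the course material. The estimate 1.27 is taken from the slides; the standard error 0.11 and the use of roughly 90 degrees of freedom (so that the critical value 1.662 quoted earlier applies) are assumptions made only for this example. The function names are hypothetical.

```python
def t_statistic(beta_hat, se, beta_null=0.0):
    """(estimate - value under H0) / standard error of the estimate."""
    return (beta_hat - beta_null) / se

def reject_null(t, c, alternative):
    """Decision rule; c is the positive critical value from the t table
    for the chosen significance level and degrees of freedom."""
    if alternative == "greater":      # H1: beta > value under H0
        return t > c
    if alternative == "less":         # H1: beta < value under H0
        return t < -c
    if alternative == "two-sided":    # H1: beta != value under H0
        return abs(t) > c
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# H0: beta_1 = 1 versus H1: beta_1 > 1 in the log(crime) on log(enroll) example.
beta_hat = 1.27   # estimate mentioned in the slides
se = 0.11         # assumed standard error, for illustration only
c_5pct = 1.662    # 5% one-sided critical value for about 90 degrees of freedom

t = t_statistic(beta_hat, se, beta_null=1.0)
print("t statistic:", round(t, 3))
print("reject H0 against beta_1 > 1 at the 5% level:", reject_null(t, c_5pct, "greater"))
```

Note that for the two-sided rule the caller has to pass the two-sided critical value (e.g., 2.060 for 25 degrees of freedom at the 5% level), not the one-sided one.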
Testing other hypotheses: two-sided alternative
- $\log(crime) = \beta_0 + \beta_1 \log(enroll) + u$
- The null: $\beta_{parameter} = a_j$ (e.g., $a_j = 1$ or $a_j = -1$); the alternative: $\beta_{parameter} \neq a_j$.
- What is the t statistic for log(nox)?
- Considering the t statistic, do we need to check the critical value from the table?
- Can we reject the null? If yes, then at what level?
- Are the estimates statistically different from zero?

Confidence intervals: $\hat{\beta} \pm c \cdot se(\hat{\beta})$
- For a 95% confidence interval, c is the 97.5th percentile of the t distribution. In other words, $\alpha$ = 5% is divided between the two tails.
- For $df = n - k - 1 = 25$, a 95% CI is $[\hat{\beta} - 2.06 \cdot se(\hat{\beta}),\ \hat{\beta} + 2.06 \cdot se(\hat{\beta})]$.
Example
i. What is the confidence interval for $\beta_{\log(sales)}$?
ii. How do we interpret zero being outside the confidence interval?
iii. What is the confidence interval for $\beta_{profmarg}$?
iv. How do we interpret zero being included in the 95% CI?
v. What are the t statistic and the significance level against the two-sided and one-sided alternatives?

A little note about p-values for t tests
- You can think of the p-value as the exact significance level: the smallest $\alpha$ at which the null would be rejected.
- The p-value is a probability, so it is always between zero and one.
- Regression packages usually compute p-values for two-sided alternatives. It is simple to obtain the one-sided p-value: just divide the two-sided p-value by 2.
- For large samples, the critical values of the t table are easy to remember, so it is not essential to see p-values.
- However, the critical values for F tests are not easy to memorize, since many tables are in use. Hence, p-values are very helpful there.
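The confidence interval formula and the note on p-values can be illustrated as follows. This is a sketch under stated assumptions: the estimate and standard error are made up, the critical value 2.06 is the one quoted above for 25 degrees of freedom, and the p-value uses the standard normal distribution as a large-sample approximation to the t distribution (a shortcut chosen here because Python's standard library has no t distribution; regression packages use the exact t distribution).

```python
from statistics import NormalDist

def confidence_interval(beta_hat, se, c):
    """beta_hat +/- c * se, with c the critical value for the chosen confidence level."""
    return beta_hat - c * se, beta_hat + c * se

def two_sided_p_value_large_sample(t):
    """Two-sided p-value using the normal approximation; reasonable only for large df."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Hypothetical estimate and standard error, for illustration only.
beta_hat, se = 0.5, 0.2
lower, upper = confidence_interval(beta_hat, se, c=2.06)   # 95% CI for df = 25
print("95% CI:", (round(lower, 3), round(upper, 3)))
print("zero inside the CI:", lower <= 0 <= upper)

p_two_sided = two_sided_p_value_large_sample(1.96)
print("two-sided p-value for t = 1.96:", round(p_two_sided, 4))
print("one-sided p-value:", round(p_two_sided / 2, 4))
```

If zero lies outside the 95% interval, the null $H_0: \beta = 0$ is rejected against the two-sided alternative at the 5% level; if zero lies inside, it is not.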
3.5 The F-Test
To test whether one variable has no effect on the dependent variable: t statistic.
To test whether a group of variables has no effect on the dependent variable: F statistic.
$\log(salary) = \beta_0 + \beta_1 years + \beta_2 gamesyr + \beta_3 bavg + \beta_4 hrunsyr + \beta_5 rbisyr + u$
$H_0: \beta_3 = 0, \beta_4 = 0, \beta_5 = 0$. This is called a multiple hypotheses test or a joint hypotheses test.
$H_1: H_0$ is not true. The alternative holds if at least one of $\beta_3, \beta_4, \beta_5$ is different from zero.
The full model in the F test is also called the unrestricted model.
If you set $\beta_3, \beta_4, \beta_5$ to zero, the remaining model is called the restricted model.
See the examples on the next slide.

3.5 The F-Test
[Regression output: unrestricted model estimation and restricted model estimation]
i. Do bavg, hrunsyr, and rbisyr have statistically significant t statistics in the unrestricted model?
ii. How does SSR change when the variables are dropped?
iii. How does R-squared change when the variables are dropped?

3.5 The F-Test: F statistic or F ratio
$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}$
1. q is the number of exclusion restrictions.
2. The F statistic is always nonnegative. Why?
3. $H_0: \beta_3 = 0, \beta_4 = 0, \beta_5 = 0$
4. Under $H_0$: $F \sim F_{q,\, n-k-1}$
5. If $F > c$, then reject the null.
6. What is the critical value c?
Example: for a 5% level with q = 3 and n − k − 1 = 60, the critical value is c = 2.76. If F > 2.76, then reject the null at the 5% level. In other words, the variables are jointly significant.

3.5 The F-Test
[Table: 5% critical values of the F distribution, by denominator degrees of freedom]

3.5 The F-Test
[Table: 1% critical values of the F distribution, by denominator degrees of freedom]
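The F ratio for exclusion restrictions can be computed directly from the two sums of squared residuals. The sketch below is a minimal illustration; the SSR values, n and k are assumed inputs chosen to resemble a salary regression with five regressors (the slide's regression output is not reproduced here), and the function name is hypothetical.

```python
def f_statistic(ssr_r, ssr_ur, q, n, k):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)].

    ssr_r  : SSR of the restricted model
    ssr_ur : SSR of the unrestricted model
    q      : number of exclusion restrictions
    n, k   : sample size and number of regressors in the unrestricted model
    """
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))


# ASSUMED illustrative inputs (not taken from the slide).
F = f_statistic(ssr_r=198.311, ssr_ur=183.186, q=3, n=353, k=5)

# c = 2.76 is the 5% critical value quoted above for 60 denominator df.
# Critical values shrink as the denominator df grow, so exceeding 2.76
# is sufficient for rejection with 347 denominator df as well.
print(f"F = {F:.2f}, jointly significant at 5%: {F > 2.76}")
```

Because dropping variables can never lower SSR, the numerator is never negative, which is why the F statistic is always nonnegative.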
3.5 The F-Test
[Regression output: unrestricted model estimation and restricted model estimation]
i. Calculate the F statistic.
ii. What are the null and the alternative hypotheses?
iii. Are the restricted variables jointly different from zero?
iv. Consider the insignificant t statistics and explain the result of the F test in light of multicollinearity.
v. Can we test the significance of one independent variable with an F test?

3.5 The F-Test: F statistic for overall significance of a regression
$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
The null states that none of the explanatory variables has an effect on y.
How to calculate the F statistic then?
Most regression packages report the F statistic automatically.
If we fail to reject, then we should look for other variables to explain y.

4. Nonlinearities in the Regression Models
Outline of Section 4:
4.1 Logarithmic Functional Forms
4.2 Models with Quadratics
4.3 Interaction Term
4.4 Adjusted R-squared

4.1 Logarithmic Functional Forms
Linear models with (natural) logarithmic variables are often more realistic.
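As a rough sketch of how a coefficient is read once logarithms enter the model, the helper functions below cover the level-log, log-log and log-level cases, with both the usual approximation and the exact percentage change for the log-level case. The function names and the numeric coefficients in the usage lines are hypothetical and serve only as an illustration.

```python
import math


def level_log_effect(beta, pct_dx):
    """y = b0 + beta*log(x): change in y (in units of y) for a pct_dx % change in x."""
    return beta / 100.0 * pct_dx


def log_log_effect(beta, pct_dx):
    """log(y) = b0 + beta*log(x): % change in y for a pct_dx % change in x (elasticity)."""
    return beta * pct_dx


def log_level_approx(beta, dx=1.0):
    """log(y) = b0 + beta*x: approximate % change in y for a change dx in x."""
    return 100.0 * beta * dx


def log_level_exact(beta, dx=1.0):
    """log(y) = b0 + beta*x: exact % change in predicted y for a change dx in x."""
    return 100.0 * (math.exp(beta * dx) - 1.0)


# Hypothetical coefficients: the approximation is fine for small effects
# and drifts away from the exact value as the effect grows.
for beta in (0.05, 0.30):
    print(beta, round(log_level_approx(beta), 2), round(log_level_exact(beta), 2))
```

For a coefficient of 0.05 the two log-level numbers are close (about 5% versus 5.13%), while for 0.30 they differ by roughly five percentage points.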
4.1 Logarithmic Functional Forms: What variables to use in log form
Wages, salaries, firm sales, and firm market values are very commonly used in natural logarithms.
Population, total number of employees, and school enrolment often appear in natural logs.
Variables measured in years, e.g., education, experience, tenure, age, usually appear in their original form.
Proportions or percentages, such as the unemployment rate, arrest rate, or participation rate in a pension plan, are preferred in level form, but may sometimes be seen in logs as well.
price = housing price; nox = amount of nitrogen oxide in the air.
Interpret the coefficients.
The natural-logarithm approximation in the log-level model has some error when the percentage change is large.

4.1 Logarithmic Functional Forms
Exact calculation of the percentage change in the log-level model requires an adjustment:
$\%\Delta\hat{y} = 100\,[\exp(\hat{\beta}_2 \Delta x_2) - 1]$
Since we are interested in a one-unit change, $\Delta x_2 = 1$: 100[exp(0.306) − 1] = 35.8%.
What happens if $\beta_2 = -0.052$? Is it crucial to adjust for small percentage changes?
What if we then increase $x_2$ by 5 units instead of 1 unit ($\beta_2 = -0.052$)?
What if we want to adjust for decreasing the number of rooms by one?

4.2 Models with Quadratics
Quadratic functions are used to capture decreasing or increasing marginal effects.
$\beta_1$ no longer measures the change in y with respect to x, since it makes no sense to hold $x^2$ fixed while changing x.
Exper has a diminishing effect on wage.

4.2 Models with Quadratics
Quadratic functions are used to capture decreasing or increasing marginal effects.
Exper has a diminishing effect on wage.
The 1st year of experience is worth roughly 30 cents/hour (0.298 − 0 = 0.298).
The 2nd year of experience is worth less, roughly 28.6 cents/hour (0.298 − 2 × 0.0061 = 0.286).
From 10 to 11 years of experience, wage is predicted to increase by about 0.298 − 2 × 0.0061 × 10 = 0.176, or 17.6 cents/hour.
Remember: $\Delta\hat{y} \approx (\hat{\beta}_1 + 2\hat{\beta}_2 x)\,\Delta x$, where x is the current year before the change.

4.2 Models with Quadratics: Calculate the turning point on the graph
Setting the first derivative equal to zero gives the maximum of the function, i.e., the turning point:
0.298 / (2 × 0.0061) = 24.4 years is the point where the return to experience becomes zero.

4.2 Models with Quadratics: Possible explanations for the turning point
Maybe few people have more than 24.4 years of experience, i.e., the right side of the curve can be ignored.
It is possible that the return to experience becomes negative at some point, but not at 24.4 years, since bias is to be expected due to omitted variables.
Or the functional form might be wrong altogether.

4.2 Models with Quadratics: Example 6.2 from the main book
What is the t statistic for rooms²?
Look at the picture and explain in words the meaning of the relation when the number of rooms is less than 4.4.
How to calculate the turning point?
Starting at 3 rooms and increasing to 4 rooms reduces a house's expected value?! What could be the possible explanation?
By how much do prices increase if we increase the number of rooms from 5 to 6?
By how much do prices increase if we increase the number of rooms from 6 to 7?
Interpret the coefficient of the quadratic variable practically/economically on the basis of the previous two questions.
[Figure annotation: 1% of the sample]
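The marginal-effect and turning-point calculations for the wage and experience example can be reproduced with a short sketch. The coefficients 0.298 and −0.0061 are the ones quoted on the slides; the function names are hypothetical.

```python
def marginal_effect(b1, b2, x):
    """Approximate change in y for one more unit of x, starting from x: b1 + 2*b2*x."""
    return b1 + 2.0 * b2 * x


def turning_point(b1, b2):
    """Value of x at which the slope b1 + 2*b2*x equals zero."""
    return -b1 / (2.0 * b2)


# Coefficients on exper and exper^2 as quoted on the slides.
b1, b2 = 0.298, -0.0061

first_year = marginal_effect(b1, b2, 0)      # going from 0 to 1 year
second_year = marginal_effect(b1, b2, 1)     # going from 1 to 2 years
eleventh_year = marginal_effect(b1, b2, 10)  # going from 10 to 11 years
peak = turning_point(b1, b2)

print(f"{first_year:.3f} {second_year:.3f} {eleventh_year:.3f} {peak:.1f}")
```

The same two functions apply to the housing example with rooms and rooms², once the estimated coefficients are plugged in.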
4.3 Interaction Term
Sometimes the partial effect depends on the magnitude of another independent variable.
$\beta_3 > 0$ means an additional bedroom yields a higher increase in price for larger houses.
$\beta_2$ is the effect of bdrms on price for a home with zero square feet; often this is not of interest.
Instead, a model with the interaction term centered at the means, $(x_1 - \mu_1)(x_2 - \mu_2)$, would be more interesting: there $\delta_2$ is the partial effect of $x_2$ on y at the mean value of $x_1$. (For a mathematical proof, multiply out the interaction term.)

4.3 Interaction Term: Example 6.3 from the main book
The dependent variable is the standardized final exam score. $x_1$ is the attendance rate, measured in %. priGPA is the prior GPA.
Show the effect of the attendance rate mathematically.
What does $\beta_1$ measure? Is it an interesting finding, considering that the smallest priGPA in the sample is 0.86?
What are the t statistics of $\beta_1$ and $\beta_6$? Are they statistically significant? Why?
What would an F test with a p-value of 0.014 for $\beta_1$ and $\beta_6$ mean?
What is the effect of atndrte on stndfnl at the mean value of priGPA, and how is it interpreted? (mean = 2.59, answer: 0.0078)
Is this 0.0078 statistically significant? Remember the transformation on the previous slide!
What about the effect of priGPA on stndfnl? Try to explain it mathematically.

4.4 Adjusted R-squared
R-squared never falls when a new independent variable is added, since SSR never rises.
Solution: use the adjusted R-squared, $\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)}$.
Its primary attraction is that it penalizes additional independent variables in the model.
Because k in the equation acts as a penalty, adjusted R-squared can either increase or decrease with a new independent variable.
A negative adjusted R-squared indicates a very poor model fit relative to the degrees of freedom.

4.4 Adjusted R-squared
Adjusted R-squared is very often written as R-bar squared, $\bar{R}^2$.
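A minimal sketch of the adjusted R-squared, written in terms of R-squared, n and k (an equivalent form of the SSR/SST definition). All input numbers are hypothetical and only meant to show that the adjusted measure can fall when a weak regressor is added, and can even turn negative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical case: adding a 4th regressor raises R^2 only slightly.
before = adjusted_r2(r2=0.300, n=50, k=3)
after = adjusted_r2(r2=0.305, n=50, k=4)

# Hypothetical case: a very poor fit relative to the degrees of freedom.
poor = adjusted_r2(r2=0.02, n=20, k=3)

print(f"before = {before:.4f}, after = {after:.4f}, poor fit = {poor:.4f}")
```

Comparisons of this kind are only meaningful between models that share the same dependent variable.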
Interpret R-squared for each model and explain the difference in values.
Can we use adjusted R-squared to choose between the models? Why? Hint: the dependent variables are different!
If the dependent variables were the same, how could we use adjusted R-squared for the comparison?

5. Binary (or Dummy) Variables
Outline of Section 5:
5.1 Qualitative Information
5.2 Dummy in Multiple Regression Analysis
5.3 Dummy Variable for Multiple Categories
5.4 Ordinal Information
5.5 Binary Dependent Variable – LPM Model
5.6 Discrete Dependent Variables

5.1 Qualitative Information

5.1 Qualitative Information
Binary variable = dummy variable = zero-one variable
What does the intercept tell us?
What does the coefficient of female tell us?
Is the difference between males and females statistically and economically significant?
In models with binary variables, the intercept is interpreted as the intercept for the base group.
The base group or benchmark group here is males, since female = 0 leaves us with males, and that group is captured by the intercept.

5.2 Dummy in Multiple Regression Analysis
In models with binary variables, the intercept is interpreted as the intercept for the base group.
What does the negative intercept, −1.57, stand for? Is it economically meaningful? Why?
What does the coefficient on woman measure?
Why is it important in this example to control for educ, exper, and tenure?
Formulate the null hypothesis of no difference between the wages of men and women, and the alternative that there is discrimination against women.
How does the intercept change under these hypotheses?

5.2 Dummy in Multiple Regression Analysis
Does giving grants increase the hours of training per employee?
hrsemp = hours of training per employee; sales = annual sales; employ = number of employees
Is the variable grant statistically significant?
What does the coefficient on grant tell us?
Is sales statistically and practically significant? What about employment?
What do you think is the reason for using sales and employment in the model?
Does the finding for grant tell us anything about causality?
Not necessarily: there could be firms that train their workers more even in the absence of grants, i.e., we must know how the grant-receiving firms were determined!

5.3 Dummy Variable for Multiple Categories
Several dummy independent variables in the same equation
What is the base group in this model?
Are the coefficients statistically significant?
Interpret the coefficients on the dummies.
What would happen if we added singmale to the model?

5.3 Dummy Variable for Multiple Categories
Several dummy independent variables in the same equation
Can we see the wage difference between single and married women in the model above?
How to interpret this difference?
What can you do to calculate the t statistic for this difference?

5.4 Ordinal Information
Ordinal information: credit ratings, physical attractiveness, etc.
Credit ratings, for example: {0, 1, 2, 3, 4}, where zero is the worst.
We can either use the rating as a single variable in the model, or use it as 4 different dummies, where the base group is the 5th, excluded rating.
Using dummies has an interpretation advantage, since otherwise it is difficult to understand what a 1-unit increase exactly means.
The base group is average-looking people.
Other factors: educ, exper, marital status, race
What does the model tell us about men with below-average looks? Is the effect statistically different from zero?
What about men with above-average looks?

5.5 Binary Dependent Variable – LPM Model
The Linear Probability Model (LPM) is a model whose dependent variable is zero/one. In other words, y takes only two values, 0 and 1.
Thus, β cannot be interpreted as the change in y given a one-unit increase in x.
Since y is binary, E(y|x) = P(y = 1|x), the probability that y = 1. Given the zero conditional mean assumption, E(u|x) = 0, this probability is a linear function of the $x_j$: $P(y = 1|x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$.
Because probabilities must sum to one, P(y = 0|x) = 1 − P(y = 1|x) is also a linear function of the $x_j$.
We must now read the estimated y as the predicted probability of success.
inlf = labor force participation; inlf = 1 if the person works

5.5 Binary Dependent Variable – LPM Model
inlf = labor force participation; nwifeinc = husband's earnings; kidslt6 = number of kids younger than 6 years; kidsge6 = number of kids from 6 to 18 years
10 more years of education increase the probability of being in the labor force by 0.38, which is a pretty large increase in a probability.
Explain the nwifeinc coefficient in the same manner. Is it a large change or not?
Which variable do you expect to be statistically insignificant? Is it?

5.5 Binary Dependent Variable – LPM Model: Shortcomings of LPM
Certain combinations of values for the independent variables can yield predictions either less than zero or greater than 1. However, predicted probabilities must be between 0 and 1.
In fact, 16 of the fitted values are less than zero, and 17 of the fitted values are greater than 1, in the previous estimation with inlf as the dependent variable.
Another problem is that a linear relation between the probability and the independent variables is awkward.
Going from zero children to one child changes the probability by the same amount as going from one child to two children: a reduction in the probability of working of 0.262.
In fact, when taken to the extreme, the estimation on the previous slide implies that going from zero to four young children reduces the probability of working by 0.262 × 4 = 1.048, which is impossible.
Although the issues mentioned above exist, LPM models are still usable and can help to understand the relationship better, at least around average values and with cautious interpretation of significance.

5.6 Discrete Dependent Variables
The dependent variable takes a small set of integer values, and zero is a common value. For instance: number of arrests, number of living children, etc.
How to interpret the estimate on educ? Does each additional year of education reduce the estimated number of children by 0.079?
Remember that an estimate is the effect of the independent variable on the expected value (average value) of y.
Thus, the estimate on education means that average fertility falls by 0.079 children given one more year of education.
An even better summary would be: if each woman in a group of 100 women obtains another year of education, we estimate there will be roughly 8 fewer children among them.
What about the interpretation of the dummy variable for electricity?
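The LPM shortcoming discussed in Section 5.5 can be made concrete with a small sketch. The coefficient −0.262 on the number of young children is the one quoted on the slides; the baseline probability of 0.70 is a hypothetical starting value chosen only for illustration, and the function names are not from any library.

```python
KIDSLT6_COEF = -0.262  # effect of one more young child, as quoted on the slide


def lpm_change(coef, delta_x):
    """Change in the predicted probability in an LPM: coefficient times change in x."""
    return coef * delta_x


def is_valid_probability(p):
    """A probability must lie between 0 and 1."""
    return 0.0 <= p <= 1.0


# Implied change in P(inlf = 1) when going from zero to 0..4 young children.
changes = {kids: lpm_change(KIDSLT6_COEF, kids) for kids in range(5)}

# HYPOTHETICAL baseline probability for a woman with no young children.
baseline = 0.70
fitted = {kids: baseline + change for kids, change in changes.items()}
invalid = [kids for kids, p in fitted.items() if not is_valid_probability(p)]

print(changes[4])  # about -1.048: a drop of more than 100 percentage points
print(invalid)     # numbers of children for which the fitted value is not a probability
```

The linear form works tolerably near the middle of the data and breaks down at extreme values of the regressors, which is the cautious reading suggested on the slide.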
6. Heteroscedasticity
Outline of Section 6:
6.1 Importance
6.2 Testing for Heteroskedasticity
6.3 Solutions against Heteroskedasticity

6.1 Importance
In the linear models $y = \beta_1 x + u$ and $y = \beta_1 x_1 + \beta_2 x_2 + \dots + u$, the variance of the error term is assumed to be constant (homoskedastic).
However, it is more likely that:
people with more education have a wider variety of interests and job opportunities, which could lead to more wage variability at higher levels of education;
people with very low levels of education have fewer opportunities and often must work at the minimum wage, which reduces wage variability at low education levels;
or the variance of the unobserved part in the savings function is a function of income, i.e., people with more income have more opportunity to save, unlike people with less income.

6.1 Importance
[Figure: homoskedasticity and heteroskedasticity]

6.1 Importance
The Best Linear Unbiased Estimator (BLUE) assumptions are:
Linearity in parameters
Random sampling
No perfect collinearity
Zero conditional mean
Homoskedasticity
Heteroskedasticity is a problem for the calculation of t tests, F tests, LM statistics, and confidence intervals, since the usual estimator of $Var(\hat{\beta}_j)$ is biased without the homoskedasticity assumption.
Heteroskedasticity does not cause bias or inconsistency in the OLS estimators!
R-squared and adjusted R-squared are unaffected by the presence of heteroskedasticity.

6.2 Testing for Heteroscedasticity: Breusch-Pagan Test
[Slide content: test steps 1 to 4, given as formulas; the last step uses an F test or an alternative statistic]

6.2 Testing for Heteroscedasticity: White Test
[Slide content: test steps 1 to 4, given as formulas]
[Slide content, continued: the last step uses an F test or an alternative statistic]

6.3 Solutions against Heteroskedasticity: Robust Standard Errors
The standard errors of the coefficients are re-estimated: $\widehat{Var}(\hat{\beta}_j) = \frac{\sum_{i=1}^{n} \hat{r}_{ij}^2 \hat{u}_i^2}{SSR_j^2}$
Software packages calculate robust standard errors on request.
$\hat{r}_{ij}$ denotes the i-th residual from regressing $x_j$ on all other independent variables.
$SSR_j$ is the sum of squared residuals from this regression.
For joint hypothesis tests there is a heteroskedasticity-robust F statistic, or LM statistic; software packages do the calculations.

6.3 Solutions against Heteroskedasticity: Robust Standard Errors
Do you expect a very big difference in the t statistics? Why?
Are the robust standard errors larger or smaller than the usual standard errors?

6.3 Solutions against Heteroskedasticity: Weighted Least Squares (WLS)
The idea of this method lies in the notion of heteroskedasticity.
Heteroskedasticity means that the variance is a function of the independent variables; mathematically, $Var(u|x) = \sigma^2 h(x)$.
Since $Var(u_i|x_i) = E(u_i^2|x_i) = \sigma^2 h(x_i) = \sigma^2 h_i$, dividing the equation by $\sqrt{h_i}$ eliminates heteroskedasticity: $E[(u_i/\sqrt{h_i})^2 \,|\, x_i] = \sigma^2$.
The variables are transformed as illustrated above, but we must interpret the new coefficients according to the original form of the model.
Example:
$sav_i = \beta_0 + \beta_1 inc_i + u_i$, with $Var(u_i|inc_i) = \sigma^2 inc_i$
$sav_i/\sqrt{inc_i} = \beta_0/\sqrt{inc_i} + \beta_1 \sqrt{inc_i} + u_i/\sqrt{inc_i}$, with $Var(u_i/\sqrt{inc_i}\,|\,inc_i) = \sigma^2$

6.3 Solutions against Heteroskedasticity: Feasible GLS or Estimated GLS
GLS here means generalized least squares.
It is the estimated version of weighted least squares (WLS), abbreviated as FGLS or EGLS.
The idea comes from the equation $Var(u|x) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)$, where the weight is $h(x) = \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)$.
Thus, we can regress $\log(\hat{u}^2)$ on $x_1, x_2, \dots, x_k$, get the fitted values, and call them $\hat{g}_i$.
Then the estimated weight is $\hat{h}_i = \exp(\hat{g}_i)$.
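The WLS transformation from the savings example can be sketched with the standard library alone. The sketch assumes the variance function is known, here h = inc, divides every term by the square root of h, and then solves the two normal equations of the transformed regression (which has no common intercept). The data are made up for illustration and the function name is hypothetical; in practice a regression package would handle this through its weights option.

```python
import math


def wls_known_variance(x, y, h):
    """WLS for y = b0 + b1*x + u with Var(u|x) = sigma^2 * h.

    Transformed model (no common intercept):
        y_i/sqrt(h_i) = b0 * (1/sqrt(h_i)) + b1 * (x_i/sqrt(h_i)) + u_i/sqrt(h_i)
    """
    z0 = [1.0 / math.sqrt(hi) for hi in h]
    z1 = [xi / math.sqrt(hi) for xi, hi in zip(x, h)]
    ys = [yi / math.sqrt(hi) for yi, hi in zip(y, h)]

    # Entries of the 2x2 normal equations for OLS on the transformed data.
    a00 = sum(v * v for v in z0)
    a01 = sum(u * v for u, v in zip(z0, z1))
    a11 = sum(v * v for v in z1)
    c0 = sum(u * v for u, v in zip(z0, ys))
    c1 = sum(u * v for u, v in zip(z1, ys))

    # Solve by Cramer's rule.
    det = a00 * a11 - a01 * a01
    b0 = (c0 * a11 - a01 * c1) / det
    b1 = (a00 * c1 - a01 * c0) / det
    return b0, b1


# Made-up income and savings data (same units), for illustration only.
inc = [10.0, 20.0, 30.0, 40.0, 50.0]
sav = [0.2, 0.9, 2.1, 2.8, 4.3]

b0, b1 = wls_known_variance(inc, sav, h=inc)
print(f"intercept = {b0:.3f}, slope on inc = {b1:.3f}")
```

The returned coefficients are interpreted in the original model: b1 is the effect of income on savings, even though the estimation used the transformed variables. For FGLS, the known h would be replaced by the estimated weights exp(ĝ).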
6.3 Solutions against Heteroskedasticity: Feasible GLS or Estimated GLS
[Slide content: the FGLS steps]

7. More on Specification and Data Issues
Outline of Section 7:
7.1 Functional Form Misspecification
7.2 RESET Test
7.3 Testing Non-Nested Hypothesis
7.4 Proxy Variables
7.5 Measurement Error
7.6 Data Issues

7.1 Functional Form Misspecification
Assume that our true model is one of these specifications:
$y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ + u$
$y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 X^2 + u$
$y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 Z^2 + u$
$y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 X^2 + \beta_4 Z^2 + u$
$y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 X^2 + \beta_4 Z^2 + \beta_5 XZ + u$
or maybe this:
$y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 X^2 + \beta_4 Z^2 + \beta_5 XZ + \beta_6 X^3 + \beta_7 Z^3 + \dots + u$
If we estimate the model $y = \beta_0 + \beta_1 X + \beta_2 Z + u$, then we are misspecifying the functional form.

7.1 Functional Form Misspecification
How can we be sure, then, that our model is not misspecified?
One very powerful tool for detecting a misspecified functional form is the F test.
If the estimated model $y = \beta_0 + \beta_1 X + \beta_2 Z + u$ is thought to be missing nonlinear terms, e.g., quadratics or interaction terms, then we can add these variables and test whether they are meaningful:
$y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 X^2 + \beta_4 Z^2 + \beta_5 XZ + u$
F test: $H_0: \beta_3 = \beta_4 = \beta_5 = 0$
It is also possible to add the third or fourth powers of the variables to the model: $(\dots + \beta_6 X^3 + \beta_7 Z^3 + \dots + u)$

7.1 Functional Form Misspecification
Drawback of adding explanatory variables to test for misspecification:
if the estimated model has many explanatory variables, $y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 L + \beta_4 M + \dots + u$, then the test might use up many degrees of freedom, since the quadratics and interactions expand the size of the model: $DF = n - k - 1$, where k is the number of variables.
It gets worse when we add the third or fourth powers of the variables to the model.
Solution: RESET test

7.2 RESET Test: Ramsey's (1969) regression specification error test
If the estimated model is $y = \beta_0 + \beta_1 X + \beta_2 Z + u$, then the OLS predicted values of y are $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 Z$.
In the RESET test, we add $\hat{y}^2, \hat{y}^3$ to the model and test whether their coefficients are statistically meaningful:
$y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 \hat{y}^2 + \beta_4 \hat{y}^3 + u$
F test: $H_0: \beta_3 = \beta_4 = 0$
$\hat{y}^2, \hat{y}^3$ are nonlinear functions of the independent variables. Hint: $(x + y)^2 = x^2 + 2xy + y^2$

7.2 RESET Test: Ramsey's (1969) regression specification error test
Example:
Level-level model: $y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 W + u$; n = 96, RESET statistic: 3.11, p-value: 0.05
Log-log model: $\log(y) = \beta_0 + \beta_1 \log(X) + \beta_2 \log(Z) + \beta_3 \log(W) + u$; n = 96, RESET statistic: 2.37, p-value: 0.10
On the basis of RESET, the log-log model is appropriate.
This is purely because the evidence of misspecification is stronger for the level-level model than for the log-log model!
If you want to use the table of critical values for the F test, keep in mind that we add $\hat{y}^2, \hat{y}^3$ to the model in the RESET test: since n = 96 and k = 5, the critical value of $F_{2,90}$ should be looked up in the 1%, 5%, and 10% tables.

7.2 RESET Test: Ramsey's (1969) regression specification error test
Which powers of the predicted value $\hat{y}$ to include is unclear: squared and cubed values are common in most applications.
Nonconstructive: no indication of what to do if the model is misspecified.
Nevertheless, it is a useful tool if one suspects that the null model is too restrictive, yet there is no obvious more general alternative.
RESET is a test of the functional form: it is not a test for omitted variables or heteroscedasticity.

7.3 Testing Non-Nested Hypothesis
Nested models:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$
Nonnested models:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
$y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + u$
The F statistic is appropriate for nested models, but not for nonnested models.
Nonnested models: one solution for comparison could be adjusted R-squared, but not when the dependent variable has a different form, e.g., log(y).
Nonnested models: another solution is a composite, or comprehensive, model.
Example
1st model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
2nd model: $y = \beta_0 + \beta_1 \log(x_1) + \beta_2 \log(x_2) + u$
Composite model: $y = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 \log(x_1) + \delta_4 \log(x_2) + u$
Null hypotheses: $H_0: \delta_1 = \delta_2 = 0$ and $H_0: \delta_3 = \delta_4 = 0$

7.3 Testing Non-Nested Hypothesis
Nonnested models: the next solution is the Davidson-MacKinnon test.
Example
1st model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
2nd model: $y = \beta_0 + \beta_1 \log(x_1) + \beta_2 \log(x_2) + u$
Fitted values (from the 2nd model): $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \log(x_1) + \hat{\beta}_2 \log(x_2)$
Davidson-MacKinnon model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \theta_1 \hat{y} + u$
Null hypothesis: $H_0: \theta_1 = 0$
We can also take the fitted values of the 1st model and add them to the 2nd model.
If the null is rejected, i.e., $\theta_1$ is significant, we choose the 2nd model.
Rejection of the 1st model does not mean that the other model is correct: the rejection could also be due to various functional form misspecifications.

7.4 Proxy Variables: Proxy in the place of an Unobserved Independent Variable
Some variables cannot be observed in real life, e.g., ability, uncertainty, etc.
Problem: missing variables cause omitted variable bias in the model.
Solution: use proxy variables.
Examples: IQ test score instead of ability, EPU index instead of uncertainty, etc.
Crucial point: when we use proxy variables, we know that we may still have a bias, but we hope that this bias is smaller than the omitted variable bias.

7.4 Proxy Variables: Proxy in the place of an Unobserved Independent Variable
Mathematically:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3^* + u$, where $x_3^*$ (ability) is unobservable, but $x_3$ (IQ) is observed
$x_3^* = \delta_0 + \delta_3 x_3 + v_3$, where we expect $\delta_3 > 0$ since $x_3$ is a proxy
Let's see a special case where $\beta_1$ and $\beta_2$ are estimated without bias:
1st condition: $corr(u, x_1) = corr(u, x_2) = corr(u, x_3^*) = 0$ and $corr(u, x_3) = 0$
2nd condition: $corr(v_3, x_1) = 0$ and $corr(v_3, x_2) = 0$
The 2nd condition guarantees that $x_3$ is a good proxy for $x_3^*$.
Plug-in model: $y = (\beta_0 + \beta_3 \delta_0) + \beta_1 x_1 + \beta_2 x_2 + \beta_3 \delta_3 x_3 + (\beta_3 v_3 + u)$
We get the slope $\beta_3 \delta_3$ for the proxy $x_3$, which is more interpretable (ability versus IQ).
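The plug-in algebra can be checked with a small sketch that maps the structural parameters to the coefficients one would estimate when the proxy replaces the unobserved variable. The parameter values are hypothetical; the optional arguments d1 and d2 allow the unobserved variable to also depend on x1 and x2, and default to zero for the special case shown above.

```python
def plug_in_coefficients(b0, b1, b2, b3, d0, d3, d1=0.0, d2=0.0):
    """Coefficients of y on (1, x1, x2, x3) after substituting
    x3* = d0 + d1*x1 + d2*x2 + d3*x3 + v3 into
    y = b0 + b1*x1 + b2*x2 + b3*x3* + u.
    """
    return {
        "intercept": b0 + b3 * d0,
        "x1": b1 + b3 * d1,
        "x2": b2 + b3 * d2,
        "x3": b3 * d3,
    }


# HYPOTHETICAL structural parameters, special case d1 = d2 = 0.
special = plug_in_coefficients(b0=1.0, b1=0.08, b2=0.02, b3=0.5, d0=2.0, d3=0.01)

# Same parameters, but the unobserved variable is related to x1 (d1 > 0).
related = plug_in_coefficients(b0=1.0, b1=0.08, b2=0.02, b3=0.5, d0=2.0, d3=0.01, d1=0.04)

print(special)
print(related["x1"] - 0.08)  # gap between the estimated and the structural x1 coefficient
```

In the special case the slopes on x1 and x2 are unchanged; once d1 differs from zero, the coefficient on x1 picks up the extra term b3·d1.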
7.4 Proxy Variables: Proxy in the place of an Unobserved Independent Variable
What if $x_3^*$ (ability) is correlated with $x_1$ (education) and $x_2$ (experience)?
Then, to maintain the 2nd condition, we have the following equations:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3^* + u$, where $x_3^*$ (ability) is unobservable, but $x_3$ (IQ) is observed
$x_3^* = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + v_3$, i.e., $x_3^*$ is related to the observed variables
Conditions:
1st condition: $corr(u, x_1) = corr(u, x_2) = corr(u, x_3^*) = 0$ and $corr(u, x_3) = 0$
2nd condition: $corr(v_3, x_1) = 0$ and $corr(v_3, x_2) = 0$
Plug-in model: $y = (\beta_0 + \beta_3 \delta_0) + (\beta_1 + \beta_3 \delta_1) x_1 + (\beta_2 + \beta_3 \delta_2) x_2 + \beta_3 \delta_3 x_3 + (\beta_3 v_3 + u)$
If $\beta_3 > 0$ and $\delta_1 > 0$, there is a positive bias, but hopefully a smaller one than the omitted variable bias that arises when ability is not accounted for in the model at all.

7.4 Proxy Variables: Example from Wooldridge (2016, p. 281-282)
What is the estimated return to education in column 1?
What is the estimated return to education in column 2?
Explain this difference in $\beta_{educ}$. Hint: $corr(educ, ability) > 0$
Interpret the coefficient of IQ in column 2.
Go to Example 9.3 in the book and read all the explanations, particularly for column 3, which is not shown on this slide.

7.5 Measurement Error
Measurement error has statistical characteristics similar to the proxy variable case:
In the proxy variable case, we are looking for a variable that is somehow related to the unobserved variable.
In the measurement error case, we do have the variable, but the recorded measure of it may contain error.
For instance:
IQ score is a proxy for ability.
Reported annual income is a measure of actual annual income.
7.5 Measurement Error: Measurement Error in the Dependent Variable
Let the true regression model be:
$y^* = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$, where $y^*$ is the actual value
$e_0 = y - y^*$, where $e_0$ is the measurement error
$y^* = y - e_0$, where y is the observed value
Plug-in model: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u + e_0$
If $corr(e_0, x_i) = 0$ for $i = 1, 2, \dots, k$, the OLS estimators are unbiased.
If $corr(e_0, u) = 0$, then $Var(u + e_0) = \sigma_u^2 + \sigma_0^2 > \sigma_u^2$, i.e., a larger variance for $\hat{\beta}_i$, $i = 1, \dots, k$.
What if $corr(e_0, x_i) \neq 0$? What is its meaning and consequence? (See the next example.)

7.5 Measurement Error: Measurement Error in the Dependent Variable
Example: Do job-training grants decrease the scrap rate in manufacturing firms?
$\log(scrap^*) = \beta_0 + \beta_1 grant + u$, where grant = 1 if the firm receives a grant, 0 if not
$\log(scrap) = \log(scrap^*) + e_0$, i.e., the scrap rate is measured with error
Plug-in model: $\log(scrap) = \beta_0 + \beta_1 grant + u + e_0$
Is it really true that $corr(e_0, grant) = 0$?
What if a firm underreports its scrap rate in order to make the grant look effective?
This means $corr(e_0, grant) < 0$, and consequently a downward bias in $\beta_1$.
Thus, if the measurement error is just a random reporting error, there is no bias.
If it is systematically related to one of the explanatory variables, then OLS is biased.

7.6 Data Issues: Missing Data
It is very common that information is missing on some key variables for several units: no information on father's or mother's education, IQ level, etc.
In time series data, there may be no information for some years.
Modern software packages (STATA, R, etc.) ignore missing observations when computing a regression, and keep track of the missing data.
Data missing completely at random (MCAR) cause no statistical problems.
MCAR implies that the reason the data are missing is independent, in a statistical sense, of both the observed and unobserved factors affecting y.
7. More on Specification and Data Issues
7.6 Data Issues: Solution for MCAR (missing completely at random)

- It is common to write zero in place of the missing values and to create a "missing data indicator" dummy, which is equal to 1 when the value is missing and 0 when it is observed.
- Then include both variables (the zero-filled variable and the indicator) together in the regression.
- This works only under the MCAR assumption.
- Excluding the dummy variable from the model leads to substantial bias.
- MCAR is often very unrealistic:
  - the lower the IQ score, the higher the probability of it missing from the data
  - similarly for education at lower-than-average levels

7. More on Specification and Data Issues
7.6 Data Issues: Stratified Sampling

- Stratified: the population is layered or classified into subpopulations.
- It is one of the most common forms of nonrandom sampling that is created intentionally.
- Imagine you want to learn whether there is a gender wage gap in the army:
  - because the number of women personnel is low, a simple random sample may contain too few women and give a biased picture.
- Or you want to predict election results in a country with 3 cities: A with 1 million factory workers, B with 2 million office workers, and C with 3 million retirees:
  - a simple random sample of 600 people that turns out poorly balanced across the cities will create a bias.
  - solution: randomly sample 100 people from A, 200 from B, and 300 from C (proportional to the city populations); this can, however, still produce a small error in estimation.

7.
More on Specification and Data Issues
7.6 Data Issues: Outliers: unusual observations

- OLS minimizes the sum of the squared residuals.
- Therefore, outliers may change the estimates by a large amount.
- Outliers can occur because a mistake was made when entering the data:
  - e.g., extra zeros, a misplaced decimal point
  - check summary statistics: minimums and maximums
  - incorrect entries are hard to catch
- Outliers also occur when one or several members of the population are very different from the rest:
  - whether to keep or drop such outliers in a regression analysis is a difficult decision
  - in such cases, OLS results are very often reported both with and without the outliers

7. More on Specification and Data Issues
7.6 Data Issues: Outliers: unusual observations

Figure 9.1 from Wooldridge (2016, p. 297)

- Using the data on all 32 companies gives, say, the red regression line.
- It is clear that the possible outlier was decisive in the estimation, even though there is only one firm with $40 billion in annual sales.
- Leaving out this outlier and estimating the model with 31 companies gives, say, the blue line.
- The slope has increased, and so, definitely, has the t statistic of the estimate.

7. More on Specification and Data Issues
7.6 Data Issues: Outliers: unusual observations

Figure 9.1 from Wooldridge (2016, p. 297)

- What about the firm with R&D as a % of sales = 9.42?
- Why not leave out both outliers and estimate the regression model with 30 observations?
- Yet the two R&D intensities above 6 (green arrow) could also be treated as outliers.
- Identifying outliers is a very difficult endeavor.
- Non-linear models (e.g., quadratic, logarithmic, etc.) could even fit better.
- OLS? Why not LAD (least absolute deviations) or quantile regressions?

Go to Examples 9.8, 9.9, and 9.10 in the book and read all the explanations.
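To make the comparison of OLS with and without an outlier, and the LAD alternative, more concrete, here is a minimal sketch in plain Python. The data are synthetic and are not the R&D data behind Figure 9.1: the 31 "ordinary" firms, the single high-sales outlier at (40, 2), the true slope of 0.5, and the noise level are all assumptions chosen for illustration. The `lad` helper is a hypothetical brute-force routine, not a library function; it relies on the fact that in simple regression at least one LAD solution passes through two data points, so trying every pair is exact for small samples.

```python
import random
from itertools import combinations

def ols(points):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def lad(points):
    """Return (intercept, slope) minimizing the sum of absolute residuals.

    Brute force over all lines through two data points (fine for small n).
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line is not a regression line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(abs(y - intercept - slope * x) for x, y in points)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

rng = random.Random(0)   # fixed seed so the run is reproducible
TRUE_SLOPE = 0.5         # assumed slope for the ordinary firms (illustrative)

# 31 ordinary synthetic firms with x between 0.5 and 9.5
regular = []
for i in range(31):
    xi = 0.5 + 0.3 * i
    regular.append((xi, 2 + TRUE_SLOPE * xi + rng.gauss(0, 0.1)))

# One firm with a very large x and a y far below the line of the others
outlier = (40.0, 2.0)
all_points = regular + [outlier]

_, slope_all = ols(all_points)      # OLS with the outlier
_, slope_trimmed = ols(regular)     # OLS without the outlier
_, slope_lad = lad(all_points)      # LAD with the outlier

print(f"OLS, all 32 points:      {slope_all:.3f}")
print(f"OLS, outlier dropped:    {slope_trimmed:.3f}")
print(f"LAD, all 32 points:      {slope_lad:.3f}")
```

In this particular setup a single observation is enough to flatten the OLS slope, while LAD stays near the slope of the remaining firms. That is a property of this example rather than a general guarantee: LAD limits the influence of large residuals but can still be affected by high-leverage points.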