EC821 Econometric Methods 2013/14
SCHOOL OF ECONOMICS

Staff
Module convenor: Dr Yu Zhu
Office: Keynes B1.05
Email: y.zhu-5@kent.ac.uk
Teaching Assistant: Ivan Mendieta-Munoz (iim3@kent.ac.uk)

Teaching information
Teaching period: Autumn Term
Teaching pattern: One two hour lecture/seminar per week and a one hour computer practical per week

Hours of study
Contact hours: 33
Private study hours: 117
Total study hours: 150

Assessment
Task                                Weighting   Test/exam date   Coursework submission date
Class Test                          20%                          N/A
Computer Based Coursework Project   20%         N/A
Exam                                60%         May/June 2014    N/A

Coursework submission policy
All coursework must be submitted by the deadline stipulated by the module convenor, as listed above, to the School of Economics General Office, Mg.14 Keynes College. All coursework should be accompanied by a completed cover sheet. No extensions to submission deadlines are granted. If you miss the deadline and submit the coursework late, you must also submit a concessions form for late submission, available from the Social Sciences Faculty Office, www.kent.ac.uk/socsci/studying/undergrad/concessions.htm

UNIVERSITY OF KENT
SCHOOL OF ECONOMICS
EC821 ECONOMETRIC METHODS
MODULE DOCUMENTS
September 2013
Module Convener: Yu Zhu

Notes: This document contains the basic module materials. Additional handouts may be distributed at the appropriate lectures, seminar classes and computer practicals respectively. The aim is to make the module materials easily accessible to participants. If you wish to print parts of this document you can do so from the pdf file. Note that you can print several pages on one A4 sheet by selecting a suitable option on the print menu.
Module Documents

Contents
Module outline, syllabus and reading
Coursework Assessment
Class Planner
Class Exercises
Computer Practicals
2012/13 Class Test
2011/12 Class Test
2010/11 Class Test
2012/13 Exam Paper
2011/12 Exam Paper
2010/11 Exam Paper
Statistical Tables
Notes

1 MODULE OUTLINE

Introduction
This module aims to study basic single equation econometric techniques in an intuitive and practical way to develop your understanding and ability to apply econometric methods. You will develop an understanding of the conventional linear regression model and the problems associated with the application of regression methods to economic modelling. The module is concerned with the application of econometric methods, with little emphasis on the mathematical aspects of the subject (which may be studied in other modules). The microcomputer software package STATA will be used for practical work throughout this module, both as a means of providing realistic applications of the theory developed in lectures and to give you experience in the use of such software as a preparation for your own empirical research. No previous knowledge of computing or econometrics is required.
Aims
The module aims:
- to develop students' understanding of, and ability to apply, quantitative economic methods
- to follow an intuitive approach by use of practical examples and practical classes, using STATA
- to give participants the ability to critically evaluate empirical literature
- to contribute to the students' ability to carry out empirical research

Learning Outcomes
By the end of the module participants should be able to:
- understand the nature of econometric and economic models
- apply least squares estimation methods using STATA
- perform and interpret the results of specification tests
- evaluate model adequacy using diagnostic tests and other criteria
- understand simultaneous equation methods
- undertake unsupervised practical work using STATA
- interpret the empirical economic research of others and be able to evaluate critically empirical literature
- analyse and report in writing on own and others' empirical economic results.

Skills
This module contributes substantially to subject specific skills acquired across all MSc programmes. Empirical evaluation of economic models is crucial to the study and application of economics. By the end of this course you should have acquired the skills and understanding to read and evaluate the empirical literature in economics and to carry out your own empirical research. As regards general and transferable skills, the module will develop or reinforce students' skills in a number of different areas.
In addition to technical and research skills, they will:
- develop their ability to utilize modern computing resources to access and acquire data from the Internet (and other available sources) and utilize standard Office based PC software (currently Microsoft) to generate written reports and undertake oral presentations
- acquire the ability to undertake modelling of economic behaviour and use statistical software
- develop and reinforce skills in numeracy and problem solving from the interpretation and manipulation of empirical economic models
- improve their skills in communication and team work in making group presentations in class
- present economic arguments orally as well as in written form

This module also contributes to most of the intellectual and transferable skills of the MSc programmes. If you need help in study skills you may ask for advice from the lecturer or get assistance from the Student Learning Advisory Service. The Economics Graduate Handbook gives information on support available through the Student Learning Advisory Service, which is part of the Unit for the Enhancement of Learning and Teaching, and through the English Language Unit. You should read this handbook carefully and make full use of these services. All students should visit the Student Advisory Service to see what it offers in terms of advice and literature on essay writing, examination preparation, time management etc.

Module Administration
Module Convener: Yu ZHU, Keynes B1.05, x 7438, email yz5@kent.ac.uk
Timetable:
Lecture/seminar: Tuesday 11 am - 1 pm, KS23
Computer practical: Monday 11.05am – 11.55am, KSA1
Consultation hours: Tuesday 3-4 pm and Thursday 4-5 pm

Teaching Methods
Each week there will be a two-hour lecture/seminar session (22 hours in total) and a one-hour computer practical (11 hours in total). The lectures introduce the module material and provide an overview of the principles of basic econometric methods.
Applications of these techniques are conducted in computing workshops using simulated or real world data. Seminars will be used to facilitate discussion of computer and class exercises and for student presentations. The seminar programme improves the analytical abilities of students, their understanding of the module material and their communication skills. The seminars also give students the opportunity to show their understanding of the module material and ask questions about topics they are not sure about. Advice and feedback on seminar communication skills are also given.

The lectures and computer workshops are designed to improve the analytical and problem solving skills of students, and develop their ability to apply their knowledge and understanding of econometric issues to simulated and real world data. Throughout the module, emphasis is put on the need for students to improve their own learning skills and academic performance. This is achieved through feedback on student work and academic guidance on private study.

The lecture is on Tuesdays, 11am-1pm. Normally about 1.5 hours will be devoted to the lecture material, the remaining time used to discuss computer and class exercises and for student presentations. The computer sessions are on Monday, 11.05 – 11.55 in the Terminal Room KSA1. You are also expected to see the lecturer out of class hours if you have any difficulties with the material or exercises and if you have any other problems relating to the module.

Study Methods
An effective way to study this subject is to regularly attempt questions supplied as part of the module materials, from textbooks or from past examination papers and to read what are often only relatively short sections of the textbooks. You will be given weekly exercises, some of which will be based on the output from the computer sessions.
It is extremely important to attempt these exercises prior to the class at which solutions are discussed, to test your understanding of the material as the module progresses. The module is cumulative, in the sense that understanding of later parts of the syllabus depends on a thorough grasp of earlier material. The exercises enable you to test your grasp of the concepts and give a guide to areas in which to consult the texts or the lecturer.

You should devote 10 hours per week in the Autumn term to this module. This means that, in addition to teaching hours, you should spend around 7 hours a week on private study during term time. Including the examination term, you should devote around 150 hours to this module in total. A substantial proportion of your study time can be spent on the class and computer exercises, problems from the textbooks and associated reading.

The solutions to exercises will be discussed in the Tuesday lecture/seminar session, and if you have any difficulty in completing exercises please see the lecturer for help and clarification as soon as possible. All staff have consultation hours during which they are available to see students; the times are posted on their office doors and on the economics web pages. You may also e-mail with simple questions.

Computer Practicals
In the computing practical classes you will estimate models which illustrate and develop the lecture material, and you will gain experience in the use of microcomputers and econometric software. The results from these practicals will be discussed in the lecture/class sessions, so you should bring your printed results to the lectures. A few of the exercises will be from the recommended textbook so that you have access to the textbook discussion of the topic. Although the module uses STATA, there are other programmes available on the networked computers which are useful for both professional and vocational purposes.
In particular you should become familiar with Word, a word-processing programme, since this will be used for writing essays and your dissertation. Introductory documents are available from the reception desk in the Computer Laboratory and the Computing staff run introductory courses during the year which you may attend. Familiarity with a spreadsheet programme (for example Excel) is also often expected by employers in the private and public sectors, and for data entry and manipulation such programmes can be extremely useful both in this module and in your dissertation work. Again, introductory courses and documentation are available and the lecturer can offer assistance.

During the module you will also be introduced to the World Wide Web and the sources of information on it of particular interest to economists. This includes economic databases (such as the Penn World Tables) which might be useful for your dissertation.

Assessment
The final mark for this module is made up of the coursework mark (40%) and the exam mark (60%). The coursework is in two equally weighted parts. The first is a class test in week 8, which tests students' use and knowledge of the basic single-equation econometrics part of the module. The second is a computer-based coursework project, to be submitted by the end of the Autumn Term, which assesses the writing, modelling, literature, computing, interpretation and empirical research development learning outcomes. The two-hour examination consists of two questions from a choice of six. The exam is designed to test and develop the non-computing and non-oral skills and learning outcomes identified earlier.

The word limit for the computer-based coursework project is 1,500 words, plus an appendix of up to 5 pages containing summary statistics and estimation results. The work should be submitted to the Economics General Office no later than 12.00 on Friday 24th January 2014.
In fairness to those who meet the due date and time, no work will be accepted after this time and a zero grade will be recorded unless there are acceptable, documented medical or other reasons for late submission. You are advised to begin your work for the assignment well before the end of term.

Reading
The core text for this module is:

Wooldridge, J.M., 2013, Introductory Econometrics – A Modern Approach, South-Western, 5th edition (International Edition).

All students should either buy a copy or ensure they have easy access to it since the module will follow the text quite closely. In addition, you will find the following book very useful, especially for the computing classes:

Baum, C.F., 2006, Introduction to Modern Econometrics Using STATA, STATA Press, ISBN-10: 159718-013-0.

The syllabus for the module is also covered adequately by many textbooks, of which the following are suitable. You may like to refer to one or more of these for some topics. Guidance will be given in lectures. References might also be made to journal articles which both illustrate the material and link to other modules. Multiple copies of all texts are in the library, some in the short loan collection.

Kennedy, P., 2008, A Guide to Econometrics, 6th edition, John Wiley and Sons Ltd.
Mukherjee, C., White, H. and M. Wuyts, 1997, Econometrics and Data Analysis for Developing Countries, Taylor & Francis Book Ltd, paperback.
Gujarati, D., 2003, Basic Econometrics, 4th edition, McGraw-Hill.
Dougherty, C., 2002, Introduction to Econometrics, 2nd edition, Oxford University Press.

Although there are multiple copies of all the above books (some are of earlier editions) in the Library, if you have any difficulty obtaining the reading, either from the library or the bookshop, please let the lecturer know immediately.

Problems
We hope you find your study of this module interesting and productive.
If you have any problems or suggestions to make about the subject matter, the organisation of the module or any other issues, the lecturer would like to hear from you. Alternatively, you can talk in confidence to your Director of Studies or your Staff/Student Liaison Committee representatives.

SYLLABUS

The references given for each topic are alternatives and it is not essential to read more than one reference, although you may find it helpful to do so. More detailed section and page references to the core texts will be given during lectures.

1. The Linear Regression Model
1.1 The "Classical" Assumptions
1.2 Estimators and their properties
1.3 Simple linear regression
    1.3.1 OLS Estimators
    1.3.2 Predicted Values and Residuals
    1.3.3 Interpretation of OLS Estimators
    1.3.4 Goodness of Fit
    1.3.5 Elasticities
    1.3.6 Some Non-linear Functions and Elasticities
1.4 Multiple regression
    1.4.1 Introduction: 3-variable regression model
    1.4.2 Interpretation of coefficients of multiple regression
1.5 Recovering estimation results and presenting regression estimates
1.6 Properties of Ordinary Least Squares
1.7 Inference
    1.7.1 Standard errors and t-ratios
    1.7.2 Hypothesis testing: some practical aspects
    1.7.3 Tests of linear restrictions - F-tests

Reading
Wooldridge (2009), Chapters 2, 3.1-3.2, & 4
Baum, Chapters 2, 3 & 4
Kennedy, Chapters 2, 3 & 4
Mukherjee, Chapters 4 & 5
Gujarati, Chapters 2, 3, 4, 5, 7 and 8
Dougherty, Chapters 2, 3, 5 & 6 (section 5)

2. Extensions of the Linear Regression Model
2.1 Dummy Variables
    2.1.1 Qualitative and seasonal dummy variables
    2.1.2 Slope dummies
2.2 Omitted variable bias (underfitting)
2.3 Non-linear models
2.4 Multicollinearity

Reading
Wooldridge (2009), Chapters 6, 7, 3.3-3.5
Baum, Chapters 5, 7.1-7.3
Kennedy, Chapters 15, 5, 6 & 12
Mukherjee, Chapter 6
Gujarati, Chapters 9, 6 and 13 (sections 1-5)
Dougherty, Chapters 4, 5 (section 5), 6 (sections 1-4), 9

3. Failure of Classical Assumptions
3.1 Autocorrelation
3.2 Heteroscedasticity
3.3 Non-normality
3.4 Misspecification and diagnostic tests

Reading
Wooldridge (2009), Chapters 10, 12, 8, 9
Baum, Chapters 6, 7.4
Kennedy, Chapters 7-10
Mukherjee, Chapters 7 & 11
Gujarati, Chapters 11, 12 and 13
Dougherty, Chapter 7

Class Test (Tuesday, Week 8)

4. Instrumental Variable Estimation
4.1 The IV estimator (with a Single Regressor and a Single Instrument)
4.2 The General IV Regression Model
4.3 Errors in Variables
4.4 Testing for Errors in Variables or Exogeneity (the Hausman Test)
4.5 Checking Instrument Validity
4.6 Where Do Valid Instruments Come From?

Reading
Wooldridge (2009), Chapter 15
Baum, Chapter 8
Kennedy, Chapter 9
Mukherjee, Chapters 13 & 14
Gujarati, Chapters 18-20
Dougherty, Chapter 10

5. Simultaneous Equation Models
5.1 The Seemingly Unrelated Regressions (SUR) Models
5.2 Simultaneous-equation Models
5.3 The Simultaneous-equation Bias
5.4 The Identification Problem
5.5 The Estimation of Structural Equations

Reading
Wooldridge (2009), Chapters 16 & 15
Baum, Chapter 8
Kennedy, Chapter 11
Mukherjee, Chapters 13 & 14
Gujarati, Chapters 18-20
Dougherty, Chapter 10

2 COURSEWORK ASSESSMENT

The assessment exercise is in two parts, each contributing 20% to the 40% coursework contribution.

1. The first part is a class test in Week 8 (see the Class Planner on the following page) in which you will answer questions of a similar type to an examination question. The aim is to give some practice in answering quantitative questions under exam conditions, as well as testing subject specific knowledge and skills as stated in the module outline. The work will be marked and returned by Week 10.

2. The second part is a small empirical project. You will be given a dataset. You will be expected to select and estimate a model, interpret the results and evaluate the adequacy of your model.
The work will be assessed on the quality of your interpretation and evaluation of your chosen model, not on how good the results are (e.g. the size of R²). However, we do expect more than a simple bivariate static regression. Chapter 19 of Wooldridge (2013) offers a nice guide on how to carry out an empirical project.

The word limit for the computer-based project is 1,500 words, plus an appendix of up to 5 pages containing summary statistics and estimation results. The work should be submitted to the Economics General Office no later than 12.00 on Friday 24th January 2014. In fairness to those who meet the due date and time, no work will be accepted after this time and a zero grade will be recorded unless there are acceptable, documented medical or other reasons for late submission.

3 CLASS PLANNER

Each week you will complete exercises and other questions for the lecture/seminar classes. Over the course of the term you will make at least one (group) presentation. You may make a note of what you have to do each week on the following timetable grid. To allow flexibility, the lecturer will set the tasks as term progresses, usually announcing in the lecture what is to be done for the following week (and emailing a reminder).

Week   Seminar work and presentations
1      No EC821 teaching (intensive math course).
2
3
4
5
6
7
8      CLASS TEST
9
10
11
12     COMPUTER-BASED COURSEWORK PROJECT

4 CLASS EXERCISES

These exercises and those from the Computer Exercises are for class discussion. These may be supplemented by questions from the textbooks and from past class test or examination questions. You will be asked to attempt specific questions each week for discussion in the next class. Make a note of what you are required to do on the Class Planner. Do not be discouraged if you cannot complete some exercises, since students normally have difficulty in doing so at the first attempt.
If you are unable to complete some exercises, do see the lecturer for help after you have read the relevant sections of the textbooks, seek clarification in class or discuss solutions with fellow students. The aim is to help you test your understanding, to guide you in your reading and to provide practice in the type of questions you may expect in the examination.

Q1.
a) Suppose you are asked to conduct a study to determine whether deficiency in English leads to lower wages for immigrants. Suppose you are given observational data on a large sample of the working-age immigrants in the UK with information on their native language (state or private) and country of birth. Would you expect a positive or negative correlation between English deficiency and wages?
b) Would a negative correlation necessarily mean that not being a native English speaker causes lower wages? Explain.
c) If you could conduct any experiment you want, what would you do? Be specific.

Q2.
a) What is meant by the statement that an estimator is unbiased?
b) What is the difference between an endogenous variable and an exogenous variable?
c) Under what assumptions is ordinary least squares an unbiased estimator?

Q3. Estimates of a model for the demand for Maltese exports (data from Mukherjee et al.) are given below.

LE = log of Maltese exports (US, current prices)
LWD = log of a measure of world demand
LP = log of the price of Maltese exports

Ordinary Least Squares Estimation
*******************************************************************************
Dependent variable is LE
27 observations used for estimation from 1963 to 1989
*******************************************************************************
Regressor         Coefficient     Standard Error     T-Ratio[Prob]
INT                 -3.6341           2.7185         -1.3368[.194]
LWD                  .41156           .59068          .69676[.493]
LP                   1.4300           .28199          5.0710[.000]
*******************************************************************************
R-Squared                     .88289    R-Bar-Squared                  .87313
S.E. of Regression            .29674    F-stat.   F( 2, 24)     90.4691[.000]
Mean of Dependent Variable    5.2197    S.D. of Dependent Variable     .83311
Residual Sum of Squares       2.1133    Equation Log-likelihood       -3.9190
Akaike Info. Criterion       -6.9190    Schwarz Bayesian Criterion    -8.8627
DW-statistic                  .50400

a) Interpret the coefficients of LWD and LP. Comment on their signs.
b) Test the hypothesis that the true parameter of LWD is equal to one.
c) Test the significance of LWD.
d) Test the hypothesis that the true parameter of LP is zero.
e) Interpret the value of R-Squared.
f) Examine the graph below. Does it look as though the classical assumptions are met?

[Figure: Plot of Residuals and Two Standard Error Bands. Horizontal axis: Years, 1963 to 1989.]

Q4. A model of the demand for wine uses the following variables:

LW = the natural logarithm of sales of wine
LY = the natural logarithm of real per capita income
LP = the natural logarithm of the price of wine
S2, S3 and S4 are seasonal dummy variables for quarters 2, 3 and 4 respectively.

The data set has 44 quarterly observations for the period 1980 quarter 1 to 1990 quarter 4.

a) Interpret the coefficient estimates for Model I. Which coefficients are significantly different from zero at a 5% significance level? Interpret the value of R² for this model.
b) Model I was estimated for the period 1980Q1 to 1985Q4 and the Chow test gave a value of 6.9. What does this show?
c) Three seasonal dummy variables were added to give Model II.
   (i) Interpret the coefficients of these variables. In which quarter are wine sales estimated to be highest?
   (ii) Test the joint significance of the dummy variables.
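As a way of checking hand calculations of the kind asked for in Q3 b) and Q4 c)(ii), the short sketch below computes a t-ratio and an F-ratio directly from printed regression output. It is written in Python rather than STATA purely for illustration, and the function names are hypothetical. The figures are copied from the Q3 output and from Tables 1 and 2 for Q4; the resulting statistics still need to be compared with critical values from the Statistical Tables.

```python
def t_statistic(estimate, hypothesised, std_error):
    """t-ratio for the null hypothesis that a coefficient equals `hypothesised`."""
    return (estimate - hypothesised) / std_error


def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, df_unrestricted):
    """F-ratio comparing a restricted model with an unrestricted one.

    df_unrestricted is n minus the number of estimated coefficients
    (including the constant) in the unrestricted model.
    """
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_unrestricted
    return numerator / denominator


# Q3 b): null hypothesis that the coefficient of LWD equals one.
# Estimate and standard error are taken from the Q3 output (27 - 3 = 24 d.f.).
t_lwd = t_statistic(estimate=0.41156, hypothesised=1.0, std_error=0.59068)

# Q4 c)(ii): Model I is the restricted model, Model II adds S2, S3 and S4.
# Residual sums of squares are taken from Tables 1 and 2 (44 - 6 = 38 d.f.).
f_seasonal = f_statistic(
    rss_restricted=0.18892,
    rss_unrestricted=0.034410,
    n_restrictions=3,
    df_unrestricted=44 - 6,
)

print(f"t-ratio for H0: LWD coefficient = 1: {t_lwd:.3f} (24 degrees of freedom)")
print(f"F-ratio for S2, S3, S4 jointly: {f_seasonal:.2f} (3 and 38 degrees of freedom)")
```

The same F-ratio formula applies whenever one model is a restricted version of another, for example when comparing Model A and Model B in Q6 b).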
TABLE 1: RESULTS FOR MODEL I

Ordinary Least Squares Estimation
******************************************************************************
Dependent variable is LW
44 observations used for estimation from 80Q1 to 90Q4
******************************************************************************
Regressor         Coefficient     Standard Error     T-Ratio[Prob]
C                    2.1170           .12078         17.5277[.000]
LY                   .56078           .10049          5.5805[.000]
LP                 -.031500           .27157         -.11599[.908]
******************************************************************************
R-Squared                     .46870    F-statistic F( 2, 41)   18.0845[.000]
R-Bar-Squared                 .44278    S.E. of Regression            .067881
Residual Sum of Squares       .18892    Mean of Dependent Variable     2.7720
S.D. of Dependent Variable   .090935    Maximum of Log-likelihood     57.4804
DW-statistic                  1.8717
******************************************************************************

TABLE 2: RESULTS FOR MODEL II

Ordinary Least Squares Estimation
******************************************************************************
Dependent variable is LW
44 observations used for estimation from 80Q1 to 90Q4
******************************************************************************
Regressor         Coefficient     Standard Error     T-Ratio[Prob]
C                    2.3563          .074266         31.7275[.000]
LY                   .42385          .064480          6.5733[.000]
LP                  -.13756           .12735         -1.0801[.287]
S2                  -.12468          .012853         -9.7006[.000]
S3                  -.14616          .012890        -11.3387[.000]
S4                 -.034041          .017282         -1.9697[.056]
******************************************************************************
R-Squared                     .90323    F-statistic F( 5, 38)   70.9349[.000]
R-Bar-Squared                 .89049    S.E. of Regression            .030092
Residual Sum of Squares      .034410    Mean of Dependent Variable     2.7720
S.D. of Dependent Variable   .090935    Maximum of Log-likelihood     94.9458
DW-statistic                  1.3238
******************************************************************************

Q5. The following Stata output is based on a random sample of male graduates (i.e. with at least a first degree) in the UK aged 25-55 and in full-time employment.
Model A

      Source |       SS       df       MS              Number of obs =     533
-------------+------------------------------           F(  5,   527) =   14.97
       Model |   12.530607     5  2.50612139           Prob > F      =  0.0000
    Residual |  88.2395447   527  .167437466           R-squared     =  0.1243
-------------+------------------------------           Adj R-squared =  0.1160
       Total |  100.770152   532  .189417578           Root MSE      =  .40919

------------------------------------------------------------------------------
    lrhrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1039071    .020438     5.08   0.000     .0637571     .144057
       agesq |  -.0011493    .000259    -4.44   0.000     -.001658   -.0006405
    highrdeg |   .0842088   .0415832     2.03   0.043     .0025195     .165898
      london |   .1327832    .060486     2.20   0.029     .0139599    .2516066
          se |   .0988441   .0507766     1.95   0.052    -.0009052    .1985935
       _cons |   .5586906   .3897368     1.43   0.152    -.2069378    1.324319
------------------------------------------------------------------------------

where lrhrwage is the natural logarithm of real hourly wage, age is age and agesq is age squared, highrdeg equals one if the respondent holds a higher degree and zero otherwise, london and se are indicators for living in London and the Southeast region (excluding London) respectively.

a) What is the interpretation of the coefficient for the term _cons in the Stata output? Provide an interpretation of the coefficient on highrdeg. Which region in the UK has the lowest expected wage for male graduates?
b) Briefly comment on the statistical significance of each regressor. Are the regressors statistically significant jointly?
c) What is the expected log real hourly wage of a 40-year old male graduate who has a higher degree and lives in London? At what age is his wage expected to peak?

Q6. The diagnostic tests for the model of question 5 are given below (NB: Q5 & Q6 are adapted from Class Test of 2011/12).

a)
. estat ovtest

Ramsey RESET test using powers of the fitted values of lrhrwage
       Ho:  model has no omitted variables
                 F(3, 524) =      2.35
                  Prob > F =      0.0720

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lrhrwage

         chi2(1)      =     6.82
         Prob > chi2  =   0.0090

b) In Model B below, the region dummies are left out. Compare the goodness-of-fit of the two models. Use a formal statistical test to determine whether the region dummies are jointly significant. [15]

Model B

      Source |       SS       df       MS              Number of obs =     533
-------------+------------------------------           F(  3,   529) =   22.18
       Model |  11.2606921     3  3.75356404           Prob > F      =  0.0000
    Residual |  89.5094595   529  .169205028           R-squared     =  0.1117
-------------+------------------------------           Adj R-squared =  0.1067
       Total |  100.770152   532  .189417578           Root MSE      =  .41135

------------------------------------------------------------------------------
    lrhrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1012471   .0204911     4.94   0.000     .0609933     .141501
       agesq |  -.0011168   .0002597    -4.30   0.000     -.001627   -.0006066
    highrdeg |   .0773654   .0416706     1.86   0.064    -.0044948    .1592255
       _cons |   .6397865   .3901326     1.64   0.102    -.1266128    1.406186
------------------------------------------------------------------------------

Q7. Discuss the implications of a structural break for least squares estimation when pooling survey data from two different years into a larger sample. Show that a test for the joint significance of the intercept and slope dummies in the pooled specification is equivalent to the Chow test.

Q8. A simple consumption function model has been estimated using annual UK data for the period 1959 to 1987 inclusive. The dependent variable is real consumer expenditures (rcons). The explanatory variable is real personal disposable income (rpdi).

a) Interpret the coefficient of real personal disposable income. What is the corresponding income elasticity (evaluated at the sample mean)?
b) Test the individual significance of the variable(s) and the joint significance of the model. What is the relationship between these two measures in this simple model?
c) This model was then estimated for the period 1959-1970 and 1971-1987 separately (see STATA results). Is there evidence of a structural break?

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        year |        29        1973    8.514693       1959       1987
       rcons |        29    170.2025    32.26553    118.547     238.46
        rpdi |        29     188.588    36.92147    124.964    252.185
       pcons |        29    .4405449    .3301454    .137464    1.08375

For the sample as a whole

. reg rcons rpdi

      Source |       SS       df       MS              Number of obs =      29
-------------+------------------------------           F(  1,    27) = 1726.23
       Model |   28700.884     1   28700.884           Prob > F      =  0.0000
    Residual |  448.912005    27  16.6263706           R-squared     =  0.9846
-------------+------------------------------           Adj R-squared =  0.9840
       Total |   29149.796    28  1041.06414           Root MSE      =  4.0775

------------------------------------------------------------------------------
       rcons |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rpdi |   .8671409   .0208709    41.55   0.000     .8243174    .9099644
       _cons |   6.670154   4.008166     1.66   0.108    -1.553924    14.89423
------------------------------------------------------------------------------

For the period 1959-1970

. reg rcons rpdi if year<=1970

      Source |       SS       df       MS              Number of obs =      12
-------------+------------------------------           F(  1,    10) =  946.54
       Model |  1661.60442     1  1661.60442           Prob > F      =  0.0000
    Residual |  17.5544905    10  1.75544905           R-squared     =  0.9895
-------------+------------------------------           Adj R-squared =  0.9885
       Total |  1679.15891    11   152.65081           Root MSE      =  1.3249

------------------------------------------------------------------------------
       rcons |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rpdi |   .8408455   .0273304    30.77   0.000     .7799495    .9017415
       _cons |   11.39298   4.148099     2.75   0.021      2.15044    20.63552
------------------------------------------------------------------------------

For the period 1971-1987

. reg rcons rpdi if year>=1971

      Source |       SS       df       MS              Number of obs =      17
-------------+------------------------------           F(  1,    15) =  269.71
       Model |  6495.60804     1  6495.60804           Prob > F      =  0.0000
    Residual |  361.249078    15  24.0832719           R-squared     =  0.9473
-------------+------------------------------           Adj R-squared =  0.9438
       Total |  6856.85712    16   428.55357           Root MSE      =  4.9075

------------------------------------------------------------------------------
       rcons |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rpdi |   .9567709    .058258    16.42   0.000     .8325968    1.080945
       _cons |  -13.13152   12.58361    -1.04   0.313    -39.95285    13.68981
------------------------------------------------------------------------------

Q9.
a) What is the implication of pure autocorrelation?
b) What distinguishes Durbin's h test from the Durbin-Watson d test?
c) What are the advantages of the Lagrange Multiplier (LM) test over the traditional Durbin-Watson test?

Q10. The following Stata output is based on an OLS regression over the quarterly period 1963Q1 to 1977Q4 (n=60).

xi: reg lrcons lry i.season if _n<=60
i.season          _Iseason_0-3        (naturally coded; _Iseason_0 omitted)

      Source |       SS       df       MS              Number of obs =      60
-------------+------------------------------           F(  4,    55) =  551.40
       Model |  .694516705     4  .173629176           Prob > F      =  0.0000
    Residual |   .01731884    55  .000314888           R-squared     =  0.9757
-------------+------------------------------           Adj R-squared =  0.9739
       Total |  .711835545    59  .012065009           Root MSE      =  .01775

------------------------------------------------------------------------------
      lrcons |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lry |   .9082043   .0207827    43.70   0.000     .8665549    .9498538
          S1 |  -.0626869   .0065651    -9.55   0.000    -.0758436   -.0495303
          S2 |   -.039448   .0064995    -6.07   0.000    -.0524733   -.0264227
          S3 |  -.0249351   .0064869    -3.84   0.000    -.0379352   -.0119351
       _cons |   .9362644   .2274578     4.12   0.000     .4804287      1.3921
------------------------------------------------------------------------------

where lrcons and lry are the log of the real total consumer spending and the log of real personal disposable income respectively. Sj denotes the dummy variable for quarter j (j=1, 2, 3). The Residual Sum of Squares (RSS) is equal to 0.01732.

a) Provide an interpretation of the coefficient for lry. Is it statistically significant? How would you calculate the marginal propensity to consume?
b) Provide an interpretation of the coefficients for S1-S3. In which quarter is total consumer spending highest? Test for the overall significance of the sample regression.
c) This model was then estimated for the (post-sample) period 1978Q1-1987Q4 (n=40) before the two samples were pooled together (i.e. 1963Q1-1987Q4, n=100). The resulting residual sums of squares are 0.01679 and 0.04788 respectively. Is there evidence of a structural break?
d) Comment on the Durbin-Watson test reported below (NB: the number of regressors k in the Durbin-Watson table excludes the constant term). How would you test for autocorrelation if the lagged value of lrcons was included as a regressor?

. estat dwatson

Durbin-Watson d-statistic (5, 60) = 1.737572

Q11.
a) Further diagnostic tests for the model in Question 10 are undertaken. Comment on the appropriateness of the following diagnostic tests and discuss the implications.

. estat bgodfrey, lags(4)

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       4     |         11.001               4                   0.0266
---------------------------------------------------------------------------
                        H0: no serial correlation

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lrcons

         chi2(1)      =     0.19
         Prob > chi2  =   0.6663

. estat ovtest

Ramsey RESET test using powers of the fitted values of lrcons
       Ho:  model has no omitted variables
                  F(3, 52) =      1.02
                  Prob > F =      0.3926

b) An alternative to the log-log specification used in Question 10 is a linear regression relating real consumption to real income. In your opinion, how should you choose between these two alternative specifications?

Q12. Briefly explain the implications of the following problems and describe how you would deal with them:
a) Near-perfect multicollinearity
b) Omission of relevant variables (underfitting)

Q13. The following Stata output is based on an OLS estimation of returns to a degree relative to 2 or more A Levels using a random sample of male employees in England from the 1996 UK Quarterly Labour Force Survey.

Model A: 1996 Sample

. reg logwage degree exp expsq if lfsyear==1996

      Source |       SS       df       MS              Number of obs =     950
-------------+------------------------------           F(  3,   946) =   55.32
       Model |  33.7828767     3  11.2609589           Prob > F      =  0.0000
    Residual |  192.560253   946  .203552065           R-squared     =  0.1493
-------------+------------------------------           Adj R-squared =  0.1466
       Total |   226.34313   949  .238506986           Root MSE      =  .45117

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf.
Interval] -------------+---------------------------------------------------------------degree | .2885419 .0318528 9.06 0.000 .2260316 .3510521 exp | .0867526 .0127547 6.80 0.000 .0617218 .1117833 expsq | -.0010543 .0001783 -5.91 0.000 -.0014043 -.0007044 _cons | .7414799 .2195963 3.38 0.001 .3105276 1.172432 ------------------------------------------------------------------------------ where logwage is the natural logarithm of real hourly wage, degree is equal to one if the respondent has a degree and zero if he/she only has two or more A Levels, exp is years of potential working experience and expsq is exp squared. The Residual Sum of Squares (RSS) is equal to 192.560. a) What is the interpretation of the constant term (_cons in the Stata output)? Provide an interpretation of the coefficients for degree, exp and expsq. Are these three slope coefficients statistically significant individually? Are they statistically significant jointly? How good does the model fit the data? [30%] b) A second model is estimated using the 2006 UK Quarterly Labour Force Survey (in Model B below). The resulting RSS is equal to 163.906. Comment on the estimates and compare them with the corresponding figures for 1996. [25%] 17 Model B: 2006 Sample . reg logwage degree exp expsq if lfsyear==2006 Source | SS df MS -------------+-----------------------------Model | 40.0258288 3 13.3419429 Residual | 163.905563 794 .20643018 -------------+-----------------------------Total | 203.931392 797 .255873767 Number of obs F( 3, 794) Prob > F R-squared Adj R-squared Root MSE = = = = = = 798 64.63 0.0000 0.1963 0.1932 .45435 -----------------------------------------------------------------------------logwage | Coef. Std. Err. t P>|t| [95% Conf. 
Interval] -------------+---------------------------------------------------------------degree | .3804167 .0340784 11.16 0.000 .3135222 .4473112 exp | .0924649 .0135285 6.83 0.000 .0659091 .1190208 expsq | -.0011228 .0001794 -6.26 0.000 -.001475 -.0007707 _cons | .6150071 .2430529 2.53 0.012 .1379049 1.092109 ------------------------------------------------------------------------------ c) A third model is estimated by pooling data from 1996 and 2006. The resulting RSS is equal to 358.223. Discuss whether this new model is justified. [25%] Model C: Pooled Sample . reg logwage degree exp expsq Source | SS df MS -------------+-----------------------------Model | 73.1637007 3 24.3879002 Residual | 358.222973 1744 .205403081 -------------+-----------------------------Total | 431.386674 1747 .246929979 Number of obs F( 3, 1744) Prob > F R-squared Adj R-squared Root MSE = = = = = = 1748 118.73 0.0000 0.1696 0.1682 .45321 -----------------------------------------------------------------------------logwage | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------degree | .33036 .0232567 14.20 0.000 .284746 .3759741 exp | .0893459 .009078 9.84 0.000 .0715411 .1071508 expsq | -.0010824 .0001236 -8.76 0.000 -.0013247 -.00084 _cons | .6794776 .1597029 4.25 0.000 .3662484 .9927069 ------------------------------------------------------------------------------ d) Comment on the following diagnostic tests and discuss their implications. [20%] Breusch-Pagan / Cook-Weisberg test for heteroskedasticity Ho: Constant variance Variables: fitted values of logwage chi2(1) = 18.01 Prob > chi2 = 0.0000 Ramsey RESET test using powers of the fitted values of logwage Ho: model has no omitted variables F(3, 1741) = 0.85 Prob > F = 0.4663 Q14. a) b) c) Briefly discuss the following: Structural break and the Chow-test. The general-to-specific modelling approach. Omitted variable bias and the RESET test. 
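The Chow test mentioned in Q14(a) compares the residual sum of squares from a pooled regression with the combined residual sum of squares from the two sub-sample regressions. The following is a minimal sketch in Stata, not part of the course do-files. It plugs in the RSS figures quoted in Q13 and assumes k = 4 estimated parameters per equation (three slopes plus the constant); all scalar names are illustrative.

```stata
* Chow test from residual sums of squares (sketch; scalar names are hypothetical)
* F = [(RSS_pooled - (RSS_1 + RSS_2)) / k] / [(RSS_1 + RSS_2) / (n - 2k)]
scalar rss_p = 358.223          // pooled sample, Model C
scalar rss_1 = 192.560          // 1996 sample, Model A
scalar rss_2 = 163.906          // 2006 sample, Model B
scalar k_par = 4                // parameters per equation, including the constant
scalar n_obs = 950 + 798        // observations in the pooled sample
scalar df2   = n_obs - 2*k_par  // denominator degrees of freedom

scalar chowF = ((rss_p - (rss_1 + rss_2))/k_par) / ((rss_1 + rss_2)/df2)

display "Chow F statistic  = " %8.3f chowF
display "5% critical value = " %8.3f invFtail(k_par, df2, 0.05)
display "p-value           = " %8.4f Ftail(k_par, df2, chowF)
```

The null hypothesis of identical parameters in the two sub-samples is rejected when the statistic exceeds the critical value. The same arithmetic should carry over to Q8 and Q10(c) once their own RSS values, sample sizes and parameter counts are substituted.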
5 COMPUTER PRACTICALS

The purpose of these exercises is to help you understand the material delivered in the lectures by actually doing econometrics. Specialized software packages have been written to help one do econometrics. We will use the package STATA 12. The first computer class introduces you to Stata. The remainder involves undertaking class-based exercises that provide “hands-on” implementation of econometric techniques reviewed in the formal lecture programme.

Answering the exercises involves writing a program in Stata, running it and commenting on the results. Skeleton Stata programs (known as do files) for the respective exercises can be found on Moodle under the directory “Computer Practicals”. You may not have time to complete the exercises in class, in which case you should do so in your own time. The results of each exercise will be presented by groups of students and discussed by the class in a seminar.

The data files are located in the directory “Computer Practicals” on Moodle. For a review of Stata features and commands, consult the webbook titled “Introduction to Stata 8” available on Moodle. A complete set of STATA 9 documentation (12 volumes), together with books on programming with STATA, is available in the library. Additional resources on STATA, such as FAQs, e-tutorials, webbooks and even movies, can be accessed through StataCorp’s website at http://www.stata.com/. You will also find a full set of STATA 7 reference manuals in the Economics General Office.

COMPUTER EXERCISE 1(A): Getting Started with Stata

The data file is auto.dta.

1) Start STATA
   Log on to Moodle, then go to the directory Computer Practicals. Double click on the data file auto.dta, which should start Stata.
2) Save windowing preferences
   Adjust window sizes and location
   Click Prefs (on the menu bar) → Save Windowing Preferences
3) Familiarize yourself with the various windows (Review, Variables, Results, Command and Data Editor) in STATA.
4) Open log file
   Click File → Log → Begin, and then Select output folder “Z:\ec821\” (you need to click the New Folder button to create the new folder ec821 if it does not already exist); then Type filename comp1a and save as type log. Alternatively
   Type log using "Z:\EC821\comp1a.log", replace (if the folder ec821 already exists)
5) Describe Data
   Click Data → Describe Data; alternatively
   Type describe in the Command Window
6) Summary Statistics
   Click Statistics → Summaries, tables & tests → summary statistics → summary statistics; alternatively
   Type summarize (or simply su) in the Command Window
7) Graphs
   Click Graphics → Simple Graphs → Scatter Plot (X variable: weight, Y variable: price); alternatively
   Type graph twoway scatter price weight in the Command Window
8) Save data
   Click File → Save As …; alternatively
   Type save "Z:\ec821\autonew" in the Command Window
9) Exit
   Click File → Exit; alternatively
   Type exit in the Command Window
10) Repeat this exercise with the help of the handout
11) Load the do file exer1a.do
    Click Window → Do-file Editor
    Click File → Open
12) Modify the do file as you wish (especially the file path in the log command)
13) Do/Run the do file by clicking the appropriate icons
14) Exit STATA and check the data file and log file you have just saved in your home folder.

COMPUTER EXERCISE 1(B):

To reinforce what you have learnt in this exercise, you should attempt this supplementary exercise on your own after class. Work in pairs if you can. One of you should work with census5.dta while the other works with hsng.dta, both from the same data directory. Produce summary statistics and scatter diagrams. Save the data for future use. Discuss your problems and findings with your classmates.

COMPUTER EXERCISE 2: AN EXPERIMENT

In this exercise you will each use the same values for the independent variable (X) and create some observations for the dependent variable (Y) by adding a random disturbance (u) to the deterministic part of the model. The parameter values are known, but you will then use the constructed data to estimate the parameters by Ordinary Least Squares (OLS) and compare the estimated values with the true values.

1. LOAD STATA
   Click Start → Programs → Central Software → STATA; alternatively
   Log on to Moodle, then go to the directory Computer Practicals. Double click on the data file auto.dta, which should start Stata. Then type clear in the Command Window.

2. INVOKE STATA’S SPREADSHEET-LIKE DATA EDITOR
   Click Windows → Data Editor, or click the Data Editor button directly, or type the command edit
   - Type the values 10, 20, 30, 40, 50 in the column that STATA automatically calls var1
   - Assign a more informative variable name by double-clicking on the column heading of var1
   - Type a new name x in the resulting dialogue box
   - Create a variable label that contains a brief description, such as indep var
   - Click OK to close the dialogue box
   - Click X (top right corner of the Data Editor window) to close the Editor
   - Check you have created the right dataset (with 5 observations and 1 variable) by using describe and list

3. CREATING RANDOM SAMPLE
   Type the following commands:

   set seed xxxx                     // xxxx is any positive integer
   generate randnum = uniform()      // uniformly distributed RNs over the interval [0,1)
   generate v = invnorm(randnum)     // standard normal draws, v ~ N(0,1)
   generate u = 5*v                  // u ~ N(0, 25), i.e. standard deviation 5
   generate y = 100 + 0.7*x + u      // dependent variable Y with known intercept (100) and slope (0.7)

   The seed specifies the initial value of the random number (RN) generator used by the uniform() function. Explicitly setting the seed number makes it possible to reproduce the same “random” numbers later.

   Click the Data Editor button to check the new variables (or type list in the Command Window).

   Graphs
   Click Graphics → Simple Graphs → Scatter Plot (X variable: x, Y variable: y); alternatively
   Type graph twoway scatter y x in the Command Window.

4. ESTIMATE A MODEL
   Click Statistics → Linear regression and related → Linear Regression (Dependent variable: y, Independent variable: x); alternatively
   Type regress y x in the Command Window
   to estimate the model $Y_t = \alpha + \beta X_t + u_t$.

5. SAVE YOUR RESULTS TO AN OUTPUT FILE
   Highlight the summary statistics and regression results
   Right Click → Copy text
   Paste the text into a word processor or text editor, such as Word or Notepad, in which it may be edited and printed. You may want to choose a reduced font size (for example 8pt) to fit the results to a page width.
   Save the file as Z:\login initials\EC821\compex2.doc.
   Alternatively type in the Command Window:

   log using "Z:\Login initials\EC821\compex2.log", replace
   list
   sum
   reg        // reg typed without arguments redisplays results
   log close

   to write the results to a log file.
   There is no need to save the data since it will not be used again.
   Print your output file and bring it to the next lecture.

DISCUSSION

1. This is a simple Monte Carlo exercise. The data generating process is known and satisfies all the classical assumptions. The model used is $Y_t = \alpha + \beta X_t + u_t$, t = 1,2,...,5, where the X values are constant (fixed in repeated sampling), the parameters are known (100 and 0.7) and values for the random disturbance are generated by the random normal command, giving an independent random sample of 5 values from a normal distribution with mean zero and standard deviation 5.
2. The objective is to calculate many estimates of a parameter and to construct an empirical sampling distribution for that parameter. We may then be able to judge the accuracy of the OLS estimator since the true value of the parameter is known.
3. For the OLS estimator the properties of the estimator can be derived without the necessity for a Monte Carlo experiment: the nature of the sampling distribution is known. We will compare our empirical sampling distribution with this "theoretical" distribution in the next lecture.

QUESTIONS C2

1. Compare your estimate of $\beta$, the slope parameter, with that of another student. Explain why the two sets of results are different.
2. Compare your estimate of $\beta$, the slope parameter, with the true value. Explain why your estimate differs from the true value.
3. Suppose 100 students performed this exercise and each found a 95% confidence interval for $\beta$. On average, how many times would the confidence interval not include the true value?

COMPUTER EXERCISE 3: A simple consumption function

Introduction

The aim of this exercise is to estimate a simple consumption function for the U.K. using annual data. The data is in a file named comp3.dta. The variables you will use are as follows:

rcons = Real consumption expenditure
rpdi = Real personal disposable income

A. ESTIMATING A MODEL BY ORDINARY LEAST SQUARES

1. Load STATA
2. Load the data file comp3.dta
3. Declare the data to be a time series
   Type: tsset year
4. Generate new variables:
   gen apc=rcons/rpdi {Creates a new variable - what is its interpretation?}
   tsline apc, xlab(1959 1962 to 1987) {Shows a plot of APC over time - how has it changed?}
5. Estimate a model by Ordinary Least Squares. The dependent variable is rcons. The independent variable is rpdi. Save the results in the log file comp3.log
   log using "Z:\login initial\EC821\comp3.log", replace
6. Generate the fitted values and residuals.
   predict rconshat {create the fitted values}
   gen rconsres = rcons - rconshat {create the residuals}
   tsline rcons rconshat rconsres {plot the actual and fitted values, as well as the residuals}

B. ESTIMATE THE MODEL IN LOGS

gen lrcons = log(rcons) {The new variable is the natural log of RCONS}
gen lrpdi = log(rpdi) {The new variable is the natural log of RPDI}

Estimate a model by Ordinary Least Squares with lrcons as the dependent variable and lrpdi as the independent variable. Also try plotting the actual and fitted values.
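The estimation and plotting step for the log model can be sketched as follows. This is only an illustrative sequence, assuming comp3.dta is loaded, tsset year has been issued, and lrcons and lrpdi have been generated as described; the names lrconshat and lrconsres are hypothetical choices that mirror the linear case.

```stata
* Log-log consumption function (sketch; new variable names are illustrative)
regress lrcons lrpdi              // slope is the income elasticity of consumption
predict lrconshat                 // fitted values (xb is the default for predict)
predict lrconsres, residuals      // residuals
tsline lrcons lrconshat           // actual and fitted values over time
tsline lrconsres                  // residuals over time
test lrpdi = 1                    // F test of a unit income elasticity
```

The final test command is one way to approach the unit-elasticity hypothesis; a t statistic computed by hand from the reported coefficient and standard error should lead to the same conclusion.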
Save the results in the log file comp3.log:

log using "Z:\login initial\EC821\comp3.log", append

NB: the “append” option specifies that results are to be appended onto the end of an already existing file.

C. SAVE THE DATA TO YOUR HOME FOLDER

Use File → Save to save the data in a file called comp3out.dta
This saves the original data and any new variables, such as lrcons
Print your results (log file) before exiting STATA.
N.B. The RESULTS are in the file comp3.log and the DATA in a special STATA data file comp3out.dta.

QUESTIONS C3

1. For the linear model
   (a) Test the hypothesis that the marginal propensity to consume (mpc) is equal to 0.7.
   (b) Examine the residuals. Do they suggest a failure of any of the basic assumptions?
   (c) Suggest at least one possible explanation for the failure of the basic assumptions.
   (d) Interpret the value of R2 for the linear model.
2. For the log-log model (also known, confusingly, as the log-linear model in some textbooks)
   (a) Interpret the coefficients of the log model. What is the mpc for this model?
   (b) Test the hypothesis that the elasticity of consumption with respect to income is unity.
   (c) Explain what R2 shows for the log model.

COMPUTER EXERCISE 4: Production Function

The file prodfun from Mukherjee, C., White, H. and M. Wuyts, 1997 (MWW), which is in the module folder, is to be used to illustrate tests of linear restrictions and the Chow test. The data is cross-section data for a developing country for two manufacturing sectors. The variables are:

LQ = log(Output); LK = log(capital stock); LN = log(employment)
D = 0 if from manufacturing sector A and = 1 if from manufacturing sector B (a dummy variable)

The first part of the exercise is to estimate a Cobb-Douglas production function (equation A) and to compare this with a similar function with constant returns to scale assumed (equation B). The second part of the exercise is to allow the parameters to differ between the two manufacturing sectors, so the general Cobb-Douglas function is estimated separately for each sector (equations C and D). The final part is to use slope and intercept dummy variables as an alternative way of obtaining distinct estimates for each sector as in C and D (equation E).

Obtain OLS estimates of the following equations (NB for equations B and E you need to create some new variables before estimation, e.g. LQCR=LQ-LN; LKCR=LK-LN; LKD=LK*D; LND=LN*D)

A. $LQ_i = \beta_1 + \beta_2 LN_i + \beta_3 LK_i + \varepsilon_i$
B. $(LQ_i - LN_i) = \beta_1 + \beta_3 (LK_i - LN_i) + \varepsilon_i$

The first 42 observations are for sector A and observations 43 to 83 are for sector B, so estimate separately for the two sectors by simply setting the sample appropriately:

C. Estimate Equation A for observations 1 to 42.
D. Estimate Equation A for observations 43 to 83.

To use dummy variables to obtain C and D in a single equation (create the necessary new variables first) estimate:

E. $LQ_i = \beta_1 + \beta_2 LN_i + \beta_3 LK_i + \beta_4 D_i + \beta_5 (LN_i \times D_i) + \beta_6 (LK_i \times D_i) + \varepsilon_i$

QUESTIONS C4 (Reading MWW pages 229-233 might be useful):

1. Interpret the parameters of equation A and estimate the returns to scale parameter.
2. Show that equation B can be derived from A by assuming constant returns to scale.
3. Test the validity of the constant returns to scale restriction.
4. Using residual sums of squares from A, C and D perform a Chow test for identical parameters for each of the two sectors.
5. Show that equation E gives the same coefficients for each sector as C and D.
6. Use the residual sums of squares from A and E to perform the test of common parameters for the two sectors. Compare this with the test in question 4.

COMPUTER EXERCISE 5: An Aid Model

Use the file aidsav.dta in the module folder, which has 1987 cross-section data for 66 developing countries on savings (S), aid (A) and income per capita (Y), all in US dollars, from Mukherjee, C., White, H. and M. Wuyts, 1997 (MWW) to replicate the results in the MWW textbook, pages 209-211.

Check the data for missing values and outliers. Is there a good reason to exclude some countries with very high income per capita from the sample?

Estimate the following models (N.B. You need to create the new variables, S/Y etc., in the processing screen):

(A) $(S/Y)_i = \beta_1 + \beta_2 (A/Y)_i + \varepsilon_i$
(B) $(S/Y)_i = \beta_1 + \beta_2 (A/Y)_i + \beta_3 (1/Y)_i + \varepsilon_i$

QUESTIONS C5

Answer the following questions (Reading MWW pages 208/211 might be useful)

1. Describe your sample selection criterion.
2. Interpret the coefficients from both models.
3. Compare the estimates of $\beta_2$ from the two models. Why might the estimate from (A) be biased?
4. Determine whether 1/Y should be included as in model (B), both on theoretical grounds and statistically (using a significance test).
5. How can you improve the model? (Hints: functional form and dummy variables)

COMPUTER EXERCISE 6: Diagnostic Tests

You are asked to estimate and carry out diagnostic tests for a cross-sectional data set. Please save relevant results, print them and bring them to lectures. We will discuss the results and the answers to the questions in lectures.

1. Load the data file comp6.dta into STATA.
2. The data is from Stewart and is for 24 grouped observations from the UK Family Expenditure Survey on total household expenditure (EXTOTAL), the number of children in the household (NCHILD), household expenditure on food (EXFOOD) and the number of households in each group (NFAM).
3. Estimate the models (A) to (C) below. N.B. YOU HAVE TO CREATE THE LOG VARIABLES.

(A) $LEXFOOD_i = \beta_0 + \beta_1 LEXTOTAL_i + u_i$
(B) $LEXFOOD_i = \beta_0 + \beta_1 LEXTOTAL_i + \beta_2 NCHILD_i + u_i$
(C) $EXFOOD_i = \beta_0 + \beta_1 EXTOTAL_i + \beta_2 NCHILD_i + u_i$

where the L prefix indicates the logarithm of the corresponding variable: for example LEXFOOD = log(EXFOOD).

QUESTIONS C6

1. Interpret the coefficients of models (A) and (B) and test their individual significance. Compare the LEXTOTAL coefficients in the two models.
2. Explain why “omitted variable bias” may affect the results from (A).
3. Use specification plots (rvfplot or rvpplot) to find any patterns in the residuals in each of the three models.
4. Using the diagnostic tests, test for heteroscedasticity and functional form misspecification for each of the models. Is there evidence that a linear function (i.e. Model (C)) is inappropriate?

COMPUTER EXERCISE 7: Dynamic Models

No new features in STATA are used, apart from the specification of lagged variables. The data is in the file comp7.dta. The variables are:

RCONS = real consumers’ expenditure (billion, 1985 prices).
RPDI = real personal disposable income (billion, 1985 prices).
RLIQ = real liquid assets of the personal sector (billion, 1985 prices).

Declare the data to be a quarterly time series:

. gen time = quarterly(date, "yq")
. format time %tq
. tsset time, quarterly

Estimate the following models and save the results, including the diagnostic tests.

(A) $LRCONS_t = \beta_0 + \beta_1 LRPDI_t + \beta_2 LRLIQ_t + u_t$, where $LRCONS_t = \log(RCONS_t)$ etc.
(B) $LRCONS_t = \beta_0 + \beta_1 LRPDI_t + \beta_2 LRLIQ_t + \beta_3 LRCONS_{t-1} + u_t$
(C) $LRCONS_t = \beta_0 + \beta_1 LRPDI_t + \beta_2 LRLIQ_t + \beta_3 LRCONS_{t-4} + \beta_4 LRPDI_{t-4} + u_t$

N.B. Lagged variables can be entered directly using the STATA lag operator (e.g. L4.LRCONS is $LRCONS_{t-4}$), without having to create them in the Command Window.

QUESTIONS C7

1. Which of the coefficients in model (B) are significantly different from zero?
2. Carry out tests for autocorrelation, heteroscedasticity and functional form for each of the models.
3. What are the consequences for the OLS estimates of the results of the diagnostic tests for model (B)?
4. Calculate estimated short-run and long-run income elasticities for models (B) and (C).

COMPUTER EXERCISE 7B: General-to-Specific Approach

This brief exercise illustrates the principle of the general-to-specific approach. The dataset to be used is on the network server in the file comp7b.dta.
(a) Estimate the following autoregressive model for income:

$\log Y_t = \beta_0 + \beta_1 \log Y_{t-1} + \beta_2 \log Y_{t-2} + \beta_3 \log Y_{t-3} + \beta_4 \log Y_{t-4} + u_t$

You will need to transform the original variables into natural log form before generating lags. The Stata lag operator could be helpful here. For instance, gen lrpdi_2 = L2.lrpdi generates the second lag of lrpdi, i.e. $lrpdi_{t-2}$.

(b) Eliminate the least significant lag and re-estimate. Repeat this process until all remaining lags are significant (at 5%). Test the final specification against the original model to see if the restrictions imposed are jointly valid (NB: the STATA command sw performs both forward and backward stepwise estimation).

COMPUTER EXERCISE 8: Instrumental Variable Estimation

I will illustrate how to use ivregress (2sls) with a classical study of male wages (Griliches JPE 1976). Griliches models log real wage as a function of:

s: years of schooling;
expr: years of experience;
rns: South dummy;
smsa: urban/rural dummy;
tenure: years of tenure;
and a set of year dummies, since the data are a set of pooled cross sections.

The suspected endogenous variable is iq (the worker’s IQ score), which is believed to contain measurement error.

Load the dataset comp8.dta into Stata.

1) Estimate by Two-Stage Least Squares (2SLS), instrumenting iq with med, kww, age and mrt (mother’s level of education, the score on another standardized test, own age and own marital status), and test for over-identifying restrictions:

ivregress 2sls lw s expr tenure rns smsa _Iyear* (iq=med kww age mrt), first

2) Rerun 2SLS, but using only med and kww as instruments while treating mrt as exogenous, and test again for over-identifying restrictions.

ivregress 2sls lw s expr tenure rns smsa _Iyear* mrt (iq=med kww), first

3) Carry out the Hausman test for endogeneity in IV estimation.

quietly ivregress 2sls lw s expr tenure rns smsa _Iyear* mrt (iq=med kww)
estimates store iv
quietly reg lw s expr tenure rns smsa _Iyear* mrt iq
estimates store ols
hausman iv ols, constant sigmamore

QUESTIONS C8

1. Compare your IV (2SLS) and OLS estimates. Comment on the differences in coefficients.
2. Are you convinced that your instruments are both relevant and exogenous?

COMPUTER EXERCISE 9: The Computer-based Coursework Project

The main purpose of this exercise is to familiarize yourself with the dataset to be used in the project, which is a 20% random sample of working-age men in England from the UK Quarterly Labour Force Survey (QLFS). You should make an attempt to obtain a consistent estimator when one or more of your regressors are correlated with the error term, using the instrumental variable (IV) approach.

The length of the computer-based coursework project is 1,500 words, plus an appendix of up to 5 pages containing summary statistics and main estimation results. The work should be submitted to the Economics General Office no later than 12.00 on Friday 24th January 2014. In fairness to those who meet the due date and time, no work will be accepted after this time and a zero grade will be recorded unless there are acceptable, documented medical or other reasons for late submission. You are advised to begin your work for the assignment well before the end of term.

The computer-based coursework project assesses the writing, modelling, literature, computing, interpretation and empirical research development learning outcomes. You are expected to select and estimate your own model, interpret the results and evaluate the adequacy of your model. You are not expected to undertake a comprehensive search for an adequate model. The work will be marked on the quality of your interpretation and evaluation of your chosen model, not necessarily on the success of finding a valid instrument.
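Evaluating the adequacy of an IV model usually includes checking how strongly the instrument predicts the endogenous regressor and whether OLS and IV differ systematically. The sketch below is one possible way to do this with Stata's post-estimation commands for ivregress. It assumes the project dataset samp821 with the variables logwage, nvqlv2, rosla, age_ and the other controls used in this exercise, and it assumes age_sq has already been created as the square of age_; it is not the prescribed specification.

```stata
* IV diagnostics after 2SLS (sketch; the choice of controls is illustrative)
ivregress 2sls logwage (nvqlv2 = rosla) married cohab age_ age_sq ///
    nonwhite lim_dis i.lfsyear london se, first

estat firststage     // first-stage statistics: is the instrument relevant?
estat endogenous     // Durbin and Wu-Hausman tests: can nvqlv2 be treated as exogenous?
```

With a single instrument for a single endogenous regressor the equation is exactly identified, so a test of over-identifying restrictions is not available; instrument exogeneity then has to be argued on substantive grounds.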
Here are a few general tips:

- Motivate your paper with a brief literature review
- Include a data section with summary statistics for the key variables
- Check for outliers and inconsistencies before running regressions
- Run diagnostic tests after regression to assess the validity of the empirical model
- Interpret the empirical findings and discuss the policy implications if necessary
- Carry out sensitivity (robustness) checks if possible
- Summarize your findings
- Don't forget your references

Chapter 17 of Wooldridge (2009) offers a nice guide on how to carry out an empirical project.

* 1) Create and save a 50% personalized random sample using a unique seed number (such as your date of birth)
use samp821, clear
set seed xxxxxx                   // e.g. 880301 if you were born on the 1st March 1988
sample 50, by(lfsyear nvqequiv)   // create a 50% random sample within each by group
save sample50, replace

* 2) Check for outliers and inconsistencies
tab lfsyear nvqequiv
count if logwage==.               // find the number of observations with missing real hourly wages
codebook logwage
egen lgwgpc1 = pctile(logwage), p(1) by(nvqequiv)
egen lgwgpc99 = pctile(logwage), p(99) by(nvqequiv)
table lfsyear nvqequiv, c(mean logwage median logwage mean lgwgpc1 mean lgwgpc99) format(%4.2f)
keep if logwage>=lgwgpc1 & logwage<=lgwgpc99   // drop the top and bottom 1% wages
table lfsyear nvqequiv, c(mean logwage) format(%4.2f) row col
table nvqequiv highqvoc, c(mean logwage) format(%4.2f) row col
gen age_sq = age_^2               // create the quadratic term for age_
tab nvqlv2 anyqual

* 3) Summary statistics for the key variables
su logwage nvqlv2 anyqual married cohab age_ age_sq nonwhite lim_dis lfsyear london se

* 4) OLS estimation and diagnostic tests
* 4a) treating qualifications as continuous
xi: reg logwage nvqequiv married cohab age_ age_sq nonwhite lim_dis i.lfsyear london se if nvqequiv>=0 & nvqequiv<=2
* 4b) treating qualifications as categorical or binary
xi: reg logwage i.nvqequiv married cohab age_ age_sq nonwhite lim_dis i.lfsyear london se
xi: reg logwage nvqlv2 married cohab age_ age_sq nonwhite lim_dis i.lfsyear london se
xi: reg logwage anyqual married cohab age_ age_sq nonwhite lim_dis i.lfsyear london se

* 5) Simple IV using either nvqlv2 or anyqual as the education measure
xi: ivregress 2sls logwage (nvqlv2=rosla) married cohab age_ age_sq nonwhite lim_dis i.lfsyear london se, first
xi: ivregress 2sls logwage (anyqual=rosla) married cohab age_ age_sq nonwhite lim_dis i.lfsyear london se, first
exit

COMPUTER EXERCISE 10: Simultaneous Equation Systems

The data used for this exercise are in simeq.dta.

The Klein Model Number 1 is a very simple, highly aggregated linear model for the US economy in the inter-war period. While it is not necessarily an accurate model, it is useful for pedagogical purposes. The model consists of three behavioural equations and five identities:

$C_t = \alpha_0 + \alpha_1 W_t + \alpha_2 \Pi_t + \alpha_3 \Pi_{t-1} + \varepsilon_{1t}$   (1)
$I_t = \beta_0 + \beta_1 \Pi_t + \beta_2 \Pi_{t-1} + \beta_3 K_{t-1} + \varepsilon_{2t}$   (2)
$W1_t = \gamma_0 + \gamma_1 E_t + \gamma_2 E_{t-1} + \gamma_3 t + \varepsilon_{3t}$   (3)
$Y_t + T_t = C_t + I_t + G_t$   Total Product identity (4)
$Y_t = \Pi_t + W_t$   Income (5)
$K_t = I_t + K_{t-1}$   Capital stock dynamics (6)
$W_t = W1_t + W2_t$   Wage bill (7)
$E_t = Y_t + T_t - W2_t$   Private sector product (8)

where
C = consumers' expenditure
I = net investment
Π = profits
K = capital stock
E = private sector product
G = Government expenditure
W = Wages
W1 = Private sector wages
W2 = Government sector wages
Y = income
T = taxes
$\varepsilon_{1t}$, $\varepsilon_{2t}$, $\varepsilon_{3t}$ are serially uncorrelated error terms.

The three behavioural equations are:
- a consumption function (1), which allows for different propensities to consume from wage and profit income and allows for simple dynamics by including a lagged profits term;
- an investment function (2), which is a cash-flow type equation typical of much early US econometric work on investment, in which investment is related to current and past profits and the beginning of year (i.e. end of previous year) capital stock;
- an equation determining the private sector wage bill (3) as a function of current and lagged private sector product and a trend effect to capture productivity growth.

The five identities close the model. Klein specifies that the variables $C_t$, $I_t$, $W1_t$, $Y_t$, $\Pi_t$, $K_t$, $W_t$ and $E_t$ are endogenous, and the remaining variables are exogenous.

1. Establish the degree of identification of each of the three behavioural equations (1), (2) and (3).

2. The dataset to be used in this exercise is contained in the file simeq.dta on the network server in the usual directory. This contains the required data for estimation of the above model. Note that KLAG is the capital stock already lagged by one period. Estimate each behavioural equation by OLS and interpret and appraise your results.

3. Write down the reduced form equations for $W_t$, $\Pi_t$ and $E_t$ (in general terms: do not try to specify the reduced form parameters in terms of the structural coefficients!). Estimate the reduced form equations by OLS and re-estimate the behavioural structural equations using the fitted values of $W_t$, $\Pi_t$ and $E_t$ as appropriate. Compare and contrast these estimates with the OLS estimates obtained above.

4. Finally, replicate these indirect 2SLS estimates by estimating each of equations (1), (2) and (3) directly by two stage least squares.

QUESTIONS C10

1 Comment on the econometric specification of the model.

School of Economics, University of Kent
Class Test for EC821 Econometric Methods, 13th November, 2012

There are TWO sections. Candidates should answer the question in Section A and one of the two questions in Section B. The percentage of marks is given in square brackets.

Section A

Q1 [60 marks]: The Stata output for Model A in the Appendix is based on a random sample of non-UK born female immigrants in Great Britain who have obtained post-compulsory qualifications in the UK.
a) What is the expected log real hourly wage of a 50 year-old native English speaker, who has a degree and lives in Scotland? At what age is her wage expected to peak? [10]

b) Interpret the coefficients for degree and eal respectively. [10]

c) Are the regressors statistically significant individually? Are they statistically significant jointly? Comment on the goodness-of-fit of the model. [10]

d) Comment on the following diagnostic tests for Model A. [10]

i) . estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of logwage

         chi2(1)      =     2.03
         Prob > chi2  =   0.1545

ii) . estat ovtest

Ramsey RESET test using powers of the fitted values of logwage
       Ho:  model has no omitted variables
                 F(3, 395) =      0.40
                  Prob > F =      0.7550

e) In Model B in the Appendix, four new interaction terms of lse with the regressors are added (e.g. lse_age is defined as lse*age). Explain the rationale for including these extra regressors. Compare the goodness-of-fit of the two models. Use a formal statistical test to determine whether the inclusion of the additional regressors in Model B is justified. [20]

Section B

Q2 [40 marks]: Briefly explain the following terms:
a) Omitted variable bias (OVB) [20]
b) Cochrane-Orcutt Estimator [20]

Q3 [40 marks]: Discuss the usefulness of the difference-in-difference (DID) approach in policy (programme) evaluation. Use an example to illustrate if necessary.

End

Appendix: Estimation Results

Model A

      Source |       SS       df       MS              Number of obs =     404
-------------+------------------------------           F(  5,   398) =   24.79
       Model |  24.1105589     5  4.82211177           Prob > F      =  0.0000
    Residual |  77.4233283   398  .194530976           R-squared     =  0.2375
-------------+------------------------------           Adj R-squared =  0.2279
       Total |  101.533887   403  .251945129           Root MSE      =  .44106

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0764121   .0177832     4.30   0.000     .0414513    .1113728
       agesq |   -.000843   .0002216    -3.80   0.000    -.0012788   -.0004073
      degree |   .3894523   .0454262     8.57   0.000     .3001471    .4787575
         eal |  -.0915466    .044975    -2.04   0.042    -.1799649   -.0031283
         lse |    .143367   .0461615     3.11   0.002     .0526161    .2341178
       _cons |   .5539594   .3430593     1.61   0.107    -.1204754    1.228394
------------------------------------------------------------------------------

Note: logwage is the natural logarithm of real hourly wage, age is age and agesq is age squared, degree equals one if the respondent holds any degree and zero otherwise, eal equals one if the respondent is a non-native English speaker and zero otherwise, and lse is an indicator for living in the Southeast region (including London) of England.

Model B

      Source |       SS       df       MS              Number of obs =     404
-------------+------------------------------           F(  9,   394) =   14.34
       Model |  25.0530159     9  2.78366843           Prob > F      =  0.0000
    Residual |  76.4808712   394  .194113886           R-squared     =  0.2467
-------------+------------------------------           Adj R-squared =  0.2295
       Total |  101.533887   403  .251945129           Root MSE      =  .44058

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .078254   .0295438     2.65   0.008     .0201708    .1363371
       agesq |  -.0008332   .0003643    -2.29   0.023    -.0015493    -.000117
      degree |   .5073835   .0787403     6.44   0.000     .3525798    .6621872
         eal |  -.0165463   .0767522    -0.22   0.829    -.1674415    .1343489
         lse |   .4374799   .7164851     0.61   0.542    -.9711321    1.846092
     lse_age |  -.0046711   .0371022    -0.13   0.900    -.0776142    .0682719
   lse_agesq |   .0000115   .0004602     0.03   0.980    -.0008932    .0009163
  lse_degree |  -.1794237   .0964716    -1.86   0.064    -.3690872    .0102398
     lse_eal |  -.1149829   .0947365    -1.21   0.226    -.3012351    .0712693
       _cons |   .3823179   .5750189     0.66   0.507     -.748171    1.512807
------------------------------------------------------------------------------

Note: logwage, age, agesq, degree, eal and lse are defined as above. lse_age=lse*age, lse_agesq=lse*agesq, lse_degree=lse*degree and lse_eal=lse*eal.

School of Economics, University of Kent
Class Test for EC821 Econometric Methods, 15th November, 2011

There are TWO sections. Candidates should answer the question in Section A and one of the two questions in Section B. The percentage of marks is given in square brackets.

Section A

Q1 [60 marks]: The following Stata output is based on a random sample of male graduates (i.e. with at least a first degree) in the UK aged 25-55 and in full-time employment.

Model A

      Source |       SS       df       MS              Number of obs =     533
-------------+------------------------------           F(  5,   527) =   14.97
       Model |   12.530607     5  2.50612139           Prob > F      =  0.0000
    Residual |  88.2395447   527  .167437466           R-squared     =  0.1243
-------------+------------------------------           Adj R-squared =  0.1160
       Total |  100.770152   532  .189417578           Root MSE      =  .40919

------------------------------------------------------------------------------
    lrhrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1039071    .020438     5.08   0.000     .0637571     .144057
       agesq |  -.0011493    .000259    -4.44   0.000     -.001658   -.0006405
    highrdeg |   .0842088   .0415832     2.03   0.043     .0025195     .165898
      london |   .1327832    .060486     2.20   0.029     .0139599    .2516066
          se |   .0988441   .0507766     1.95   0.052    -.0009052    .1985935
       _cons |   .5586906   .3897368     1.43   0.152    -.2069378    1.324319
------------------------------------------------------------------------------

where lrhrwage is the natural logarithm of real hourly wage, age is age and agesq is age squared, highrdeg equals one if the respondent holds a higher degree and zero otherwise, london and se are indicators for living in London and the Southeast region (excluding London) respectively.

a) What is the interpretation of the coefficient for the term _cons in the Stata output? Provide an interpretation of the coefficient on highrdeg. Which region in the UK has the lowest expected wage for male graduates? [15]

b) Briefly comment on the statistical significance of each regressor. Are the regressors statistically significant jointly? [10]

c) What is the expected log real hourly wage of a 40-year old male graduate who has a higher degree and lives in London? At what age is his wage expected to peak? [10]

d) Comment on the following diagnostic tests for Model A. [10]

i) . estat ovtest

Ramsey RESET test using powers of the fitted values of lrhrwage
       Ho:  model has no omitted variables
                 F(3, 524) =      2.35
                  Prob > F =      0.0720

ii) . estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lrhrwage

         chi2(1)      =     6.82
         Prob > chi2  =   0.0090

e) In Model B below, the region dummies are left out. Compare the goodness-of-fit of the two models. Use a formal statistical test to determine whether the region dummies are jointly significant.
[15]

Model B

      Source |       SS       df       MS              Number of obs =     533
-------------+------------------------------           F(  3,   529) =   22.18
       Model |  11.2606921     3  3.75356404           Prob > F      =  0.0000
    Residual |  89.5094595   529  .169205028           R-squared     =  0.1117
-------------+------------------------------           Adj R-squared =  0.1067
       Total |  100.770152   532  .189417578           Root MSE      =  .41135

------------------------------------------------------------------------------
    lrhrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1012471   .0204911     4.94   0.000     .0609933     .141501
       agesq |  -.0011168   .0002597    -4.30   0.000     -.001627   -.0006066
    highrdeg |   .0773654   .0416706     1.86   0.064    -.0044948    .1592255
       _cons |   .6397865   .3901326     1.64   0.102    -.1266128    1.406186
------------------------------------------------------------------------------

Section B

Q2 [40 marks]: Briefly explain the following terms:
a) Near-perfect multi-collinearity [20]
b) Linear Probability Model (LPM) [20]

Q3 [40 marks]: Discuss the consequences of underfitting a regression model (omitting relevant variables). Explain how you can test for this potential problem using a formal test.

End

School of Economics, University of Kent
Class Test for EC821 Econometric Methods, 16th November, 2010

There are TWO sections. Candidates should answer the question in Section A and one of the two questions in Section B. The percentage of marks is given in square brackets.

Section A

Q1 [60 marks]: The following Stata output is based on a 10% random sample of male employees aged 25-59 in the 2000 UK Quarterly Labour Force Survey (n=495).
Model A: All Employees

      Source |       SS       df       MS              Number of obs =     495
-------------+------------------------------           F(  3,   491) =   64.10
       Model |  42.7914008     3  14.2638003           Prob > F      =  0.0000
    Residual |  109.259875   491  .222525203           R-squared     =  0.2814
-------------+------------------------------           Adj R-squared =  0.2770
       Total |  152.051275   494  .307796104           Root MSE      =  .47173

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |   .1005878   .0077948    12.90   0.000     .0852725    .1159032
         age |   .1139737   .0225003     5.07   0.000      .069765    .1581824
       agesq |   -.001263   .0002707    -4.67   0.000    -.0017948   -.0007311
       _cons |  -1.793552   .4758444    -3.77   0.000    -2.728494    -.858609
------------------------------------------------------------------------------

logwage is the natural logarithm of real hourly wage, edu is age left full-time continuous education, age is age and agesq is age squared. The Residual Sum of Squares (RSS) is equal to 109.260 (keeping 3 decimal places).

a) What is the interpretation of the constant term (_cons in the Stata output)? Provide an interpretation of the coefficient for edu and comment on its statistical significance. What is the effect of age on log wages? What is the expected log real hourly wage of a 30-year old who left full-time continuous education at age 18? Are the regressors statistically significant jointly? [20]

b) Comment on the following diagnostic tests for Model A. [15]

i) . predict res1, res
   . sktest res1

                    Skewness/Kurtosis tests for Normality
                                                         ------- joint ------
    Variable |    Obs   Pr(Skewness)   Pr(Kurtosis)  adj chi2(2)    Prob>chi2
-------------+---------------------------------------------------------------
        res1 |    495      0.1706         0.0030         9.88         0.0072

ii) . estat ovtest

Ramsey RESET test using powers of the fitted values of logwage
       Ho:  model has no omitted variables
                 F(3, 488) =      0.37
                  Prob > F =      0.7742

iii) . estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of logwage

         chi2(1)      =     0.31
         Prob > chi2  =   0.5797

c) A researcher argues that separate regressions should be run for people who are members of a trade union (n1=146) and those who are not (n2=349), on the grounds that trade unions have a direct impact on wages through collective wage bargaining. The resulting residual sums of squares (RSS) are 16.955 and 89.099 for members and non-members respectively. On the basis of the available evidence, do you think that the pooling of the two sub-samples as in Model A is still justified? [15]

d) Comment on the view that it would actually be more interesting to test whether, after allowing for an intercept difference, the slopes for union members and non-members are still the same. [10]

Section B

Q2 [40 marks]: Briefly explain the following terms:
a) autocorrelation and the Cochrane-Orcutt estimator [20]
b) the Difference-in-differences (DID) estimator [20]

Q3 [40 marks]: Explain the consequences of failure of the homoskedasticity assumption. Discuss the advantages and disadvantages of the following two approaches to deal with the problem:
a) Computing heteroskedasticity-robust statistics;
b) Using the Weighted Least Squares (WLS) method.

End

UNIVERSITY OF KENT                                EC821/13
FACULTY OF SOCIAL SCIENCES
LEVEL M EXAMINATION
SCHOOL OF ECONOMICS
ECONOMETRIC METHODS
Day, date : time (exam is 2 hours long)

There are SIX questions, three in Section A and three in Section B. All questions carry equal weight. Candidates should answer TWO questions, ONE from SECTION A and ONE from SECTION B.

Statistical tables are attached to the paper. Approved calculators may be used.

A percentage breakdown of marks within each question is given as a guide to candidates in their allocation of time.
SECTION A
Answer ONE question from this section

1 A researcher investigates the pay penalty of being a non-native English speaker for UK male immigrants with some UK qualifications.

Model A:

      Source |       SS       df       MS              Number of obs =     601
-------------+------------------------------           F(  5,   595) =   33.04
       Model |  35.9785618     5  7.19571236           Prob > F      =  0.0000
    Residual |  129.589686   595  .217797791           R-squared     =  0.2173
-------------+------------------------------           Adj R-squared =  0.2107
       Total |  165.568248   600  .275947079           Root MSE      =  .46669

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      degree |    0.42988    0.03854    11.16   0.000      0.35419     0.50556
         age |    0.02173    0.02207     0.98   0.325     -0.02163     0.06508
       agesq |   -0.00014    0.00028    -0.50   0.616     -0.00069     0.00041
    londonse |    0.12710    0.03860     3.29   0.001      0.05128     0.20291
   nonnative |   -0.13854    0.04140    -3.35   0.001     -0.21986    -0.05723
       _cons |    1.63717    0.42583     3.84   0.000      0.80086     2.47347
------------------------------------------------------------------------------

where logwage denotes log real hourly wage, degree is a dummy for holding a degree level qualification, age is the age of the immigrant and agesq is the quadratic term, londonse is a dummy for living in Southeast England including London, and nonnative is equal to one if the immigrant is not a native English speaker.

a) What is the expected log real hourly wage of a 30-year old graduate who is a non-native English speaker and lives in London? At what age is his wage expected to peak? What is the wage penalty of not being a native English speaker? (15%)

b) Are the slope coefficients statistically significant individually and jointly? Would you drop the age variables? Comment on the goodness-of-fit of the regression model. (15%)

c) Comment on the following diagnostic tests. (20%)

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of logwage

         chi2(1)      =     3.55
         Prob > chi2  =   0.0597

Ramsey RESET test using powers of the fitted values of logwage
       Ho:  model has no omitted variables
                 F(3, 592) =      0.33
                  Prob > F =      0.8008

d) In Model B below, controls for being non-white (nonwhite) and being born in a developing country (dvlpng) are added to the model. Comment on the estimates and compare them with their counterparts in Model A. Is the inclusion of these extra regressors justified? (25%)

Model B:

      Source |       SS       df       MS              Number of obs =     601
-------------+------------------------------           F(  7,   593) =   24.70
       Model |  37.3716564     7  5.33880806           Prob > F      =  0.0000
    Residual |  128.196591   593  .216183122           R-squared     =  0.2257
-------------+------------------------------           Adj R-squared =  0.2166
       Total |  165.568248   600  .275947079           Root MSE      =  .46495

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      degree |    0.43454    0.03850    11.29   0.000      0.35893     0.51015
         age |    0.02071    0.02200     0.94   0.347     -0.02249     0.06391
       agesq |   -0.00013    0.00028    -0.45   0.652     -0.00067     0.00042
    londonse |    0.13386    0.03861     3.47   0.001      0.05802     0.20969
   nonnative |   -0.13474    0.04128    -3.26   0.001     -0.21581    -0.05367
    nonwhite |   -0.08542    0.06365    -1.34   0.180     -0.21044     0.03959
      dvlpng |   -0.06895    0.04359    -1.58   0.114     -0.15455     0.01666
       _cons |    1.76811    0.42789     4.13   0.000      0.92774     2.60847
------------------------------------------------------------------------------

e) Discuss whether you could regress a binary indicator for working (1=working, 0=not working) on the same set of right-hand-side regressors as in Model B. What specific econometric issues would arise from such a model? (25%)

2 Write short essays on TWO of the following:
a) The Weighted Least Squares (WLS). (50%)
b) The Cochrane-Orcutt procedure. (50%)
c) The Chow test. (50%)

3 Discuss the implications of underfitting a model and the strategies one can adopt to deal with the problem in empirical work.

SECTION B
Answer ONE question from this section

4. A researcher investigates the relationship between weekly earnings and (weekly) overtime hours for working mothers with dependent children.

$\ln Y = \beta_{10} + \beta_{11}\,\text{overtime} + \beta_{12}\,\text{edu} + \beta_{13}\,\text{age} + \beta_{14}\,\text{public} + \beta_{15}\,\text{union} + u_1$   (1)
$\text{overtime} = \beta_{20} + \beta_{21}\ln Y + \beta_{22}\,\text{edu} + \beta_{23}\,\text{age} + \beta_{24}\,\text{ageyngkid} + u_2$   (2)

where lnY is log real weekly earnings, overtime is weekly overtime hours, edu is years of education, age is the age of the respondent, public is a dummy for working in the public sector, union is a dummy for being a member of a trade union and ageyngkid is the age of the youngest child.

a) Why are OLS estimates for both equations biased? (20%)
b) Under what conditions are these two equations identified? (20%)
c) Write down the reduced-form for log weekly earnings, i.e. eq. (1). (20%)
d) Briefly describe how you would solve this simultaneous equation system. (20%)
e) Suppose it turns out that both public sector jobs and trade union membership have a direct effect on overtime hours. What problem does this pose for the identification of the system? (20%)

5. Write short essays on TWO of the following:
a) Weak Instruments. (50%)
b) The Order Condition. (50%)
c) Recursive Systems. (50%)

6. Explain why Two Stage Least Squares (2SLS) can be used to estimate causal relationships when Ordinary Least Squares (OLS) fails. Use examples to illustrate if necessary.

END

UNIVERSITY OF KENT                                EC821/12
FACULTY OF SOCIAL SCIENCES
LEVEL M EXAMINATION
SCHOOL OF ECONOMICS
ECONOMETRIC METHODS
Day, date : time (exam is 2 hours long)

There are SIX questions, three in Section A and three in Section B. All questions carry equal weight. Candidates should answer TWO questions, ONE from SECTION A and ONE from SECTION B.

Statistical tables are attached to the paper. Approved calculators may be used.
A percentage breakdown of marks within each question is given as a guide to candidates in their allocation of time.

SECTION A
Answer ONE question from this section

1 A researcher investigates the part-time pay penalty (PTPP) for women in the UK. The following Stata output is based on an OLS regression using a sample of prime-aged female graduates working as employees in Southeast England from the 2010 UK Quarterly Labour Force Survey (QLFS).

Model A: Graduate sample

      Source |       SS       df       MS              Number of obs =     559
-------------+------------------------------           F(  4,   554) =   14.05
       Model |  9.50228844     4  2.37557211           Prob > F      =  0.0000
    Residual |  93.7019943   554  .169137174           R-squared     =  0.0921
-------------+------------------------------           Adj R-squared =  0.0855
       Total |  103.204283   558  .184953912           Root MSE      =  .41126

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1239444   .0206581     6.00   0.000     .0833667    .1645221
       agesq |  -.0014951   .0002662    -5.62   0.000    -.0020181   -.0009722
      london |   .1289722   .0363192     3.55   0.000      .057632    .2003124
    parttime |  -.0970247   .0414972    -2.34   0.020    -.1785357   -.0155137
       _cons |   .3604125   .3837088     0.94   0.348    -.3932894    1.114114
------------------------------------------------------------------------------

where logwage denotes log real hourly wage, age is the age of the graduate and agesq is the quadratic term, london is a dummy for living in London and parttime is equal to one if the respondent works fewer than 30 hours a week.

a) What is the interpretation of the intercept term? Interpret the slope coefficients. Are the slope coefficients statistically significant individually and jointly? Comment on the goodness-of-fit of the regression model. (30%)

b) Comment on the following diagnostic tests. (20%)

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of logwage

         chi2(1)      =     0.86
         Prob > chi2  =   0.3548

. estat ovtest

Ramsey RESET test using powers of the fitted values of logwage
       Ho:  model has no omitted variables
                 F(3, 551) =      1.11
                  Prob > F =      0.3460

c) In Model B below, analysis was conducted using a sample of women whose highest qualifications are 2 or more A Levels but who otherwise have the same characteristics as the sample in Model A. Comment on the estimates and compare them with their counterparts in Model A. (10%)

Model B: Non-graduate sample

      Source |       SS       df       MS              Number of obs =     213
-------------+------------------------------           F(  4,   208) =    5.76
       Model |  4.49693739     4  1.12423435           Prob > F      =  0.0002
    Residual |  40.6018868   208  .195201379           R-squared     =  0.0997
-------------+------------------------------           Adj R-squared =  0.0824
       Total |  45.0988242   212  .212730303           Root MSE      =  .44182

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0982575   .0363186     2.71   0.007     .0266578    .1698573
       agesq |  -.0012374   .0004578    -2.70   0.007    -.0021399   -.0003349
      london |    .074544   .0654488     1.14   0.256    -.0544841    .2035721
    parttime |  -.2692822   .0675418    -3.99   0.000    -.4024364    -.136128
       _cons |   .7528719   .6898957     1.09   0.276    -.6072124    2.112956
------------------------------------------------------------------------------

d) Model C pooled graduates and non-graduates to increase the sample size. Is the pooling justified? (20%)

Model C: Pooled sample

      Source |       SS       df       MS              Number of obs =     772
-------------+------------------------------           F(  4,   767) =   15.77
       Model |  12.0149052     4   3.0037263           Prob > F      =  0.0000
    Residual |  146.116448   767  .190503844           R-squared     =  0.0760
-------------+------------------------------           Adj R-squared =  0.0712
       Total |  158.131353   771  .205099032           Root MSE      =  .43647

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1103622   .0185349     5.95   0.000     .0739771    .1467474
       agesq |  -.0013458   .0002375    -5.67   0.000    -.0018121   -.0008796
      london |   .1226312   .0330094     3.72   0.000     .0578318    .1874306
    parttime |  -.1591381   .0367086    -4.34   0.000    -.2311994   -.0870768
       _cons |   .5935165    .346235     1.71   0.087    -.0861641    1.273197
------------------------------------------------------------------------------

e) Discuss how you would modify Model C above to allow for a differential part-time pay penalty for women with different education levels. How would you select your final model specification, to avoid either over-fitting or under-fitting? (20%)

2 Write short essays on TWO of the following:
a) The Linear Probability Model (LPM). (50%)
b) The Omitted Variable Bias (OVB). (50%)
c) The Durbin-Watson (DW) test for autocorrelation. (50%)

3 Discuss how natural experiments can be exploited to uncover the treatment effect of government policies. Use examples to illustrate if necessary.

SECTION B
Answer ONE question from this section

4. A researcher investigates the relationship between wage and house ownership for male employees:

$\ln W = \beta_{10} + \beta_{11}\,\text{ownhouse} + \beta_{12}\,\text{edu} + \beta_{13}\,\text{age} + \beta_{14}\,\text{age}^2 + \beta_{15}\,\text{union} + u_1$   (1)
$\text{ownhouse} = \beta_{20} + \beta_{21}\ln W + \beta_{22}\,\text{edu} + \beta_{23}\,\text{age} + \beta_{24}\,\text{age}^2 + \beta_{25}\,\text{anykid} + u_2$   (2)

where lnW is log real hourly wage, ownhouse is a dummy for owning a house (either outright or with a mortgage), age is the age of the respondent, edu is years of education, union is a dummy for being a member of a trade union and anykid is a dummy for having any children.

a) Why are OLS estimates for the wage equation, i.e. equation (1), biased? (20%)
b) Under what conditions are these two equations identified? (20%)
c) Write down the reduced-form for house ownership, i.e. eq. (2). (20%)
d) Briefly describe how you would solve this simultaneous equation system. (20%)
e) Suppose it turns out that having any children has a direct effect on wages. What problem does this pose for the identification of the system? (20%)

5. Write short essays on TWO of the following:
a) The Two Stage Least Squares (2SLS) method. (50%)
b) Structural Equations and Reduced Form (RF). (50%)
c) Indirect Least Squares (ILS) Estimator. (50%)

6. Discuss the identification problem in estimating simultaneous equation models.

END

UNIVERSITY OF KENT                                EC821/11
FACULTY OF SOCIAL SCIENCES
LEVEL M EXAMINATION
SCHOOL OF ECONOMICS
ECONOMETRIC METHODS
Day, date : time (exam is 2 hours long)

There are SIX questions, three in Section A and three in Section B. All questions carry equal weight. Candidates should answer TWO questions, ONE from SECTION A and ONE from SECTION B.

Statistical tables are attached to the paper. Approved calculators may be used.

A percentage breakdown of marks within each question is given as a guide to candidates in their allocation of time.

SECTION A
Answer ONE question from this section

1 A researcher is interested in the economic returns to a Master’s degree. The following Stata output is based on an OLS regression using a sample of prime-aged male graduates working as employees in England and Wales from the 2008 UK Quarterly Labour Force Survey (QLFS).

Model A

      Source |       SS       df       MS              Number of obs =    1520
-------------+------------------------------           F(  3,  1516) =   58.26
       Model |  37.2972485     3  12.4324162           Prob > F      =  0.0000
    Residual |  323.485383  1516   .21338086           R-squared     =  0.1034
-------------+------------------------------           Adj R-squared =  0.1016
       Total |  360.782632  1519  .237513253           Root MSE      =  .46193

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        age_ |   .1067626   .0139003     7.68   0.000     .0794967    .1340284
      age_sq |  -.0011569   .0001744    -6.63   0.000     -.001499   -.0008148
      master |   .1012331   .0287821     3.52   0.000     .0447761    .1576901
       _cons |   .4399224    .267999     1.64   0.101    -.0857656    .9656105
------------------------------------------------------------------------------

where logwage denotes log real hourly wage, age_ is the age of the graduate and age_sq is the quadratic term, master is a dummy for having a Master’s degree.

a) What is the interpretation of the intercept term? Interpret the slope coefficients. Are the slope coefficients statistically significant individually and jointly? Comment on the goodness-of-fit of the regression model. (30%)

b) Comment on the following diagnostic tests. (20%)

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of logwage

         chi2(1)      =    15.29
         Prob > chi2  =   0.0001

. estat ovtest

Ramsey RESET test using powers of the fitted values of logwage
       Ho:  model has no omitted variables
                 F(3, 1513) =      2.54
                   Prob > F =      0.0552

c) In an extended model, dummies for broad undergraduate subjects studied are included: LEM (Law, Economics and Management), COMB (Combined subjects), and OSSAH (Other Social Sciences, Arts and Humanities), with STEM (Science, Technology, Engineering and Maths) omitted. Comment on the coefficients of these dummy variables. Which subjects give the highest returns and which give the lowest? Is the inclusion of the degree subject dummies justified?
(30%)

Model B

      Source |       SS       df       MS              Number of obs =    1520
-------------+------------------------------           F(  6,  1513) =   36.07
       Model |  45.1531913     6  7.52553189           Prob > F      =  0.0000
    Residual |   315.62944  1513  .208611659           R-squared     =  0.1252
-------------+------------------------------           Adj R-squared =  0.1217
       Total |  360.782632  1519  .237513253           Root MSE      =  .45674

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        age_ |   .1052694   .0137467     7.66   0.000     .0783047    .1322341
      age_sq |  -.0011343   .0001725    -6.58   0.000    -.0014726   -.0007959
      master |   .0862536     .02863     3.01   0.003      .030095    .1424122
         LEM |    .054434   .0322357     1.69   0.091    -.0087975    .1176654
        COMB |  -.0222508   .0337596    -0.66   0.510    -.0884714    .0439698
       OSSAH |  -.1654217   .0320012    -5.17   0.000    -.2281932   -.1026502
       _cons |   .4902699   .2654278     1.85   0.065    -.0303756    1.010915
------------------------------------------------------------------------------

d) How would you modify Model B above to allow for differential returns to a Master’s degree for graduates in different undergraduate subjects? How would you select your final model specification, between Model B above and the more general specification? (20%)

2 Write short essays on TWO of the following:
a) The Difference-in-differences (DID) estimator. (50%)
b) The Cochrane-Orcutt Regression. (50%)
c) The Chow test. (50%)

3 Discuss the implications of underfitting and overfitting a regression model.

SECTION B
Answer ONE question from this section

4. A researcher investigates the relationship between wages and job tenure for married men:

$\ln W = \beta_{10} + \beta_{11}\,\text{tenure} + \beta_{12}\,\text{edu} + \beta_{13}\,\text{age} + \beta_{14}\,\text{age}^2 + \beta_{15}\,\text{union} + u_1$   (1)
$\text{tenure} = \beta_{20} + \beta_{21}\ln W + \beta_{22}\,\text{edu} + \beta_{23}\,\text{age} + \beta_{24}\,\text{age}^2 + \beta_{25}\,\text{dist} + u_2$   (2)

where lnW is log hourly wage, tenure is job tenure (years with the current employer), edu is years of education, age is the age of the employee, union is a dummy for being a union member and dist is travel-to-work distance.

a) Why are OLS estimates of both equations biased in general? (20%)
b) Write down the reduced-form for the job tenure equation (i.e. equation (2)). (20%)
c) Discuss the Order Conditions of both equations. (20%)
d) Explain in detail how you would solve this simultaneous equation system. (20%)
e) Another researcher argues that equation (2) is misspecified, as union status should have a direct effect on job tenure. What problem does this pose for the identification of the system? (20%)

5. Write short essays on TWO of the following:
a) The Hausman test for endogeneity. (50%)
b) The Order Conditions. (50%)
c) Recursive System. (50%)

6. Discuss the popularity of the Two Stage Least Squares (2SLS) method in applied microeconometrics.

END

EC821 QUANTITATIVE ECONOMICS - STATISTICAL TABLES

TABLE 1: Areas for the standard normal distribution N(0,1)

The table shows the area under the standard normal distribution, N(0,1), between 0 and z.
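Because Table 1 reports only the area between 0 and z, other probabilities have to be derived from the symmetry of N(0,1) around zero. The following is a brief worked sketch rather than part of the original tables; the symbol $\Phi$ for the standard normal cumulative distribution function is introduced here for illustration, and the areas 0.4750 (z = 1.96) and 0.3413 (z = 1.00) are read from the table.

```latex
\begin{align*}
P(0 < Z < z)        &= \Phi(z) - 0.5 \\
P(Z > 1.96)         &= 0.5 - 0.4750 = 0.0250 \\
P(-1.96 < Z < 1.96) &= 2 \times 0.4750 = 0.95 \\
P(Z < 1.00)         &= 0.5 + 0.3413 = 0.8413
\end{align*}
```

The second and third lines are the calculations behind the familiar 5% two-tailed critical value of 1.96.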
For example, P(0 < z < 1.40) = 0.4192

   z    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
 0.0  0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
 0.1  0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
 0.2  0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
 0.3  0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
 0.4  0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
 0.5  0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
 0.6  0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
 0.7  0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
 0.8  0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
 0.9  0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
 1.0  0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
 1.1  0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
 1.2  0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
 1.3  0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
 1.4  0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
 1.5  0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
 1.6  0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
 1.7  0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
 1.8  0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
 1.9  0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
 2.0  0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
 2.1  0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
 2.2  0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
 2.3  0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
 2.4  0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
 2.5  0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
 2.6  0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
 2.7  0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
 2.8  0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
 2.9  0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
 3.0  0.4987
 3.5  0.4997
 4.0  0.4999

TABLE 2: Areas for Student's t distribution

The table shows the critical value of the t distribution with v degrees of freedom and total area in the two tails of the distribution.
For example, if T has a t-distribution with 3 degrees of freedom, then P(-3.182 < T < 3.182) = 0.95

              Two-tail Probability
  df     0.20     0.10     0.05     0.02     0.01
   1    3.078    6.314   12.706   31.821   63.656
   2    1.886    2.920    4.303    6.965    9.925
   3    1.638    2.353    3.182    4.541    5.841
   4    1.533    2.132    2.776    3.747    4.604
   5    1.476    2.015    2.571    3.365    4.032
   6    1.440    1.943    2.447    3.143    3.707
   7    1.415    1.895    2.365    2.998    3.499
   8    1.397    1.860    2.306    2.896    3.355
   9    1.383    1.833    2.262    2.821    3.250
  10    1.372    1.812    2.228    2.764    3.169
  11    1.363    1.796    2.201    2.718    3.106
  12    1.356    1.782    2.179    2.681    3.055
  13    1.350    1.771    2.160    2.650    3.012
  14    1.345    1.761    2.145    2.624    2.977
  15    1.341    1.753    2.131    2.602    2.947
  16    1.337    1.746    2.120    2.583    2.921
  17    1.333    1.740    2.110    2.567    2.898
  18    1.330    1.734    2.101    2.552    2.878
  19    1.328    1.729    2.093    2.539    2.861
  20    1.325    1.725    2.086    2.528    2.845
  21    1.323    1.721    2.080    2.518    2.831
  22    1.321    1.717    2.074    2.508    2.819
  23    1.319    1.714    2.069    2.500    2.807
  24    1.318    1.711    2.064    2.492    2.797
  25    1.316    1.708    2.060    2.485    2.787
  26    1.315    1.706    2.056    2.479    2.779
  27    1.314    1.703    2.052    2.473    2.771
  28    1.313    1.701    2.048    2.467    2.763
  29    1.311    1.699    2.045    2.462    2.756
  30    1.310    1.697    2.042    2.457    2.750
  40    1.303    1.684    2.021    2.423    2.704
  60    1.296    1.671    2.000    2.390    2.660
  80    1.292    1.664    1.990    2.374    2.639
 100    1.290    1.660    1.984    2.364    2.626
   ∞    1.282    1.645    1.96     2.33     2.575

TABLE 3: Chi-squared (χ²) Distribution

                    Right-tail Probability
  df     0.50     0.20     0.10     0.05     0.02     0.01
   1    0.455    1.642    2.706    3.841    5.412    6.635
   2    1.386    3.219    4.605    5.991    7.824    9.210
   3    2.366    4.642    6.251    7.815    9.837   11.345
   4    3.357    5.989    7.779    9.488   11.668   13.277
   5    4.351    7.289    9.236   11.070   13.388   15.086
   6    5.348    8.558   10.645   12.592   15.033   16.812
   7    6.346    9.803   12.017   14.067   16.622   18.475
   8    7.344   11.030   13.362   15.507   18.168   20.090
   9    8.343   12.242   14.684   16.919   19.679   21.666
  10    9.342   13.442   15.987   18.307   21.161   23.209
  11   10.341   14.631   17.275   19.675   22.618   24.725
  12   11.340   15.812   18.549   21.026   24.054   26.217
  13   12.340   16.985   19.812   22.362   25.472   27.688
  14   13.339   18.151   21.064   23.685   26.873   29.141
  15   14.339   19.311   22.307   24.996   28.259   30.578
  16   15.338   20.465   23.542   26.296   29.633   32.000
  17   16.338   21.615   24.769   27.587   30.995   33.409
  18   17.338   22.760   25.989   28.869   32.346   34.805
  19   18.338   23.900   27.204   30.144   33.687   36.191
  20   19.337   25.038   28.412   31.410   35.020   37.566
  25   24.337   30.675   34.382   37.652   41.566   44.314
  30   29.336   36.250   40.256   43.773   47.962   50.892
  40   39.335   47.269   51.805   55.758   60.436   63.691
  50   49.335   58.164   63.167   67.505   72.613   76.154
  60   59.335   68.972   74.397   79.082   84.580   88.379
  70   69.334   79.715   85.527   90.531   96.388   100.42
  80   79.334   90.405   96.578   101.88   108.07   112.33
  90   89.334   101.05   107.56   113.14   119.65   124.12
 100   99.334   111.67   118.50   124.34   131.14   135.81

TABLE 4 - Durbin-Watson Table

Durbin-Watson d Statistic: dL and dU, 5% Significance Level

           k=1          k=2          k=3          k=4          k=5
   N    dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
  15  1.08  1.36   0.95  1.54   0.82  1.75   0.69  1.97   0.56  2.21
  16  1.10  1.37   0.98  1.54   0.86  1.73   0.74  1.93   0.62  2.15
  17  1.13  1.38   1.02  1.54   0.90  1.71   0.78  1.90   0.67  2.10
  18  1.16  1.39   1.05  1.53   0.93  1.69   0.82  1.87   0.71  2.06
  19  1.18  1.40   1.08  1.53   0.97  1.68   0.86  1.85   0.75  2.02
  20  1.20  1.41   1.10  1.54   1.00  1.68   0.90  1.83   0.79  1.99
  21  1.22  1.42   1.13  1.54   1.03  1.67   0.93  1.81   0.83  1.96
  22  1.24  1.43   1.15  1.54   1.05  1.66   0.96  1.80   0.86  1.94
  23  1.26  1.44   1.17  1.54   1.08  1.66   0.99  1.79   0.90  1.92
  24  1.27  1.45   1.19  1.55   1.10  1.66   1.01  1.78   0.93  1.90
  25  1.29  1.45   1.21  1.55   1.12  1.66   1.04  1.77   0.95  1.89
  26  1.30  1.46   1.22  1.55   1.14  1.65   1.06  1.76   0.98  1.88
  27  1.32  1.47   1.24  1.56   1.16  1.65   1.08  1.76   1.01  1.86
  28  1.33  1.48   1.26  1.56   1.18  1.65   1.10  1.75   1.03  1.85
  29  1.34  1.48   1.27  1.56   1.20  1.65   1.12  1.74   1.05  1.84
  30  1.35  1.49   1.28  1.57   1.21  1.65   1.14  1.74   1.07  1.83
  31  1.36  1.50   1.30  1.57   1.23  1.65   1.16  1.74   1.09  1.83
  32  1.37  1.50   1.31  1.57   1.24  1.65   1.18  1.73   1.11  1.82
  33  1.38  1.51   1.32  1.58   1.26  1.65   1.19  1.73   1.13  1.81
  34  1.39  1.51   1.33  1.58   1.27  1.65   1.21  1.73   1.15  1.81
  35  1.40  1.52   1.34  1.58   1.28  1.65   1.22  1.73   1.16  1.80
  36  1.41  1.52   1.35  1.59   1.29  1.65   1.24  1.72   1.18  1.80
  37  1.42  1.53   1.36  1.59   1.31  1.66   1.25  1.72   1.19  1.80
  38  1.43  1.54   1.37  1.59   1.32  1.66   1.26  1.72   1.21  1.79
  39  1.43  1.54   1.38  1.60   1.33  1.66   1.27  1.72   1.22  1.79
  40  1.44  1.54   1.39  1.60   1.34  1.66   1.29  1.72   1.23  1.79
  45  1.48  1.57   1.43  1.62   1.38  1.67   1.34  1.72   1.29  1.78
  50  1.50  1.59   1.46  1.63   1.42  1.67   1.38  1.72   1.34  1.77
  55  1.53  1.60   1.49  1.64   1.45  1.68   1.41  1.72   1.38  1.77
  60  1.55  1.62   1.51  1.65   1.48  1.69   1.44  1.73   1.41  1.77
  65  1.57  1.63   1.54  1.66   1.50  1.70   1.47  1.73   1.44  1.77
  70  1.58  1.64   1.55  1.67   1.52  1.70   1.49  1.74   1.46  1.77
  75  1.60  1.65   1.57  1.68   1.54  1.71   1.51  1.74   1.49  1.77
  80  1.61  1.66   1.59  1.69   1.56  1.72   1.53  1.74   1.51  1.77
  85  1.62  1.67   1.60  1.70   1.57  1.72   1.55  1.75   1.52  1.77
  90  1.63  1.68   1.61  1.70   1.59  1.73   1.57  1.75   1.54  1.78
  95  1.64  1.69   1.62  1.71   1.60  1.73   1.58  1.75   1.56  1.78
 100  1.65  1.69   1.63  1.72   1.61  1.74   1.59  1.76   1.57  1.78

TABLE 5A F-distribution: 5% critical values

Rows: Degrees of Freedom for Denominator. Columns: Degrees of Freedom for Numerator.

          1      2      3      4      5      6      7      8      9     10     15     20     25     30     40     60     80    100
   1  161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  245.9  248.0  249.3  250.1  251.1  252.2  252.7  253.0
   2  18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.43  19.45  19.46  19.46  19.47  19.48  19.48  19.49
   3  10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.70   8.66   8.63   8.62   8.59   8.57   8.56   8.55
   4   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.86   5.80   5.77   5.75   5.72   5.69   5.67   5.66
   5   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.62   4.56   4.52   4.50   4.46   4.43   4.41   4.41
   6   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   3.94   3.87   3.83   3.81   3.77   3.74   3.72   3.71
   7   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.51   3.44   3.40   3.38   3.34   3.30   3.29   3.27
   8   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.22   3.15   3.11   3.08   3.04   3.01   2.99   2.97
   9   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.01   2.94   2.89   2.86   2.83   2.79   2.77   2.76
  10   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.85   2.77   2.73   2.70   2.66   2.62   2.60   2.59
  11   4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.85   2.72   2.65   2.60   2.57   2.53   2.49   2.47   2.46
  12   4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80   2.75   2.62   2.54   2.50   2.47   2.43   2.38   2.36   2.35
  13   4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71   2.67   2.53   2.46   2.41   2.38   2.34   2.30   2.27   2.26
  14   4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65   2.60   2.46   2.39   2.34   2.31   2.27   2.22   2.20   2.19
  15   4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54   2.40   2.33   2.28   2.25   2.20   2.16   2.14   2.12
  16   4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54   2.49   2.35   2.28   2.23   2.19   2.15   2.11   2.08   2.07
  17   4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49   2.45   2.31   2.23   2.18   2.15   2.10   2.06   2.03   2.02
  18   4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46   2.41   2.27   2.19   2.14   2.11   2.06   2.02   1.99   1.98
  19   4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42   2.38   2.23   2.16   2.11   2.07   2.03   1.98   1.96   1.94
  20   4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35   2.20   2.12   2.07   2.04   1.99   1.95   1.92   1.91
  21   4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37   2.32   2.18   2.10   2.05   2.01   1.96   1.92   1.89   1.88
  22   4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34   2.30   2.15   2.07   2.02   1.98   1.94   1.89   1.86   1.85
  23   4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32   2.27   2.13   2.05   2.00   1.96   1.91   1.86   1.84   1.82
  24   4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30   2.25   2.11   2.03   1.97   1.94   1.89   1.84   1.82   1.80
  25   4.24   3.39   2.99   2.76   2.60   2.49   2.40   2.34   2.28   2.24   2.09   2.01   1.96   1.92   1.87   1.82   1.80   1.78
  26   4.23   3.37   2.98   2.74   2.59   2.47   2.39   2.32   2.27   2.22   2.07   1.99   1.94   1.90   1.85   1.80   1.78   1.76
  27   4.21   3.35   2.96   2.73   2.57   2.46   2.37   2.31   2.25   2.20   2.06   1.97   1.92   1.88   1.84   1.79   1.76   1.74
  28   4.20   3.34   2.95   2.71   2.56   2.45   2.36   2.29   2.24   2.19   2.04   1.96   1.91   1.87   1.82   1.77   1.74   1.73
  29   4.18   3.33   2.93   2.70   2.55   2.43   2.35   2.28   2.22   2.18   2.03   1.94   1.89   1.85   1.81   1.75   1.73   1.71
  30   4.17   3.32   2.92   2.69   2.53   2.42   2.33   2.27   2.21   2.16   2.01   1.93   1.88   1.84   1.79   1.74   1.71   1.70
  40   4.08   3.23   2.84   2.61   2.45   2.34   2.25   2.18   2.12   2.08   1.92   1.84   1.78   1.74   1.69   1.64   1.61   1.59
  60   4.00   3.15   2.76   2.53   2.37   2.25   2.17   2.10   2.04   1.99   1.84   1.75   1.69   1.65   1.59   1.53   1.50   1.48
 100   3.94   3.09   2.70   2.46   2.31   2.19   2.10   2.03   1.97   1.93   1.77   1.68   1.62   1.57   1.52   1.45   1.41   1.39
 120   3.92   3.07   2.68   2.45   2.29   2.18   2.09   2.02   1.96   1.91   1.75   1.66   1.60   1.55   1.50   1.43   1.39   1.37

TABLE 5B F-distribution: 1% critical values

Rows: Degrees of Freedom for Denominator. Columns: Degrees of Freedom for Numerator.

          1      2      3      4      5      6      7      8      9     10     15     20     25     30     40     60     80    100
   1   4052   5000   5402   5625   5763   5859   5927   5980   6023   6054   6156   6209   6240   6261   6287   6312   6326   6334
   2  98.51  99.00  99.17  99.24  99.30  99.33  99.37  99.37  99.40  99.40  99.43  99.46  99.46  99.46  99.47  99.49  99.49  99.49
   3  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.34  27.23  26.87  26.69  26.58  26.50  26.41  26.32  26.27  26.24
   4  21.20  18.00  16.70  15.98  15.52  15.21  14.98  14.80  14.66  14.55  14.20  14.02  13.91  13.84  13.75  13.65  13.61  13.58
   5  16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05   9.72   9.55   9.45   9.38   9.29   9.20   9.16   9.13
   6  13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.56   7.40   7.30   7.23   7.14   7.06   7.01   6.99
   7  12.25   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.31   6.16   6.06   5.99   5.91   5.82   5.78   5.75
   8  11.26   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.52   5.36   5.26   5.20   5.12   5.03   4.99   4.96
   9  10.56   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   4.96   4.81   4.71   4.65   4.57   4.48   4.44   4.42
  10  10.04   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.56   4.41   4.31   4.25   4.17   4.08   4.04   4.01
  11   9.65   7.21   6.22   5.67   5.32   5.07   4.89   4.74   4.63   4.54   4.25   4.10   4.01   3.94   3.86   3.78   3.73   3.71
  12   9.33   6.93   5.95   5.41   5.06   4.82   4.64   4.50   4.39   4.30   4.01   3.86   3.76   3.70   3.62   3.54   3.49   3.47
  13   9.07   6.70   5.74   5.21   4.86   4.62   4.44   4.30   4.19   4.10   3.82   3.66   3.57   3.51   3.43   3.34   3.30   3.27
  14   8.86   6.52   5.56   5.04   4.69   4.46   4.28   4.14   4.03   3.94   3.66   3.51   3.41   3.35   3.27   3.18   3.14   3.11
  15   8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.52   3.37   3.28   3.21   3.13   3.05   3.00   2.98
  16   8.53   6.23   5.29   4.77   4.44   4.20   4.03   3.89   3.78   3.69   3.41   3.26   3.17   3.10   3.02   2.93   2.89   2.86
  17   8.40   6.11   5.19   4.67   4.34   4.10   3.93   3.79   3.68   3.59   3.31   3.16   3.07   3.00   2.92   2.83   2.79   2.76
  18   8.29   6.01   5.09   4.58   4.25   4.01   3.84   3.71   3.60   3.51   3.23   3.08   2.98   2.92   2.84   2.75   2.70   2.68
  19   8.18   5.93   5.01   4.50   4.17   3.94   3.77   3.63   3.52   3.43   3.15   3.00   2.91   2.84   2.76   2.67   2.63   2.60
  20   8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.09   2.94   2.84   2.78   2.69   2.61   2.56   2.54
  21   8.02   5.78   4.87   4.37   4.04   3.81   3.64   3.51   3.40   3.31   3.03   2.88   2.79   2.72   2.64   2.55   2.50   2.48
  22   7.95   5.72   4.82   4.31   3.99   3.76   3.59   3.45   3.35   3.26   2.98   2.83   2.73   2.67   2.58   2.50   2.45   2.42
  23   7.88   5.66   4.76   4.26   3.94   3.71   3.54   3.41   3.30   3.21   2.93   2.78   2.69   2.62   2.54   2.45   2.40   2.37
  24   7.82   5.61   4.72   4.22   3.90   3.67   3.50   3.36   3.26   3.17   2.89   2.74   2.64   2.58   2.49   2.40   2.36   2.33
  25   7.77   5.57   4.68   4.18   3.85   3.63   3.46   3.32   3.22   3.13   2.85   2.70   2.60   2.54   2.45   2.36   2.32   2.29
  26   7.72   5.53   4.64   4.14   3.82   3.59   3.42   3.29   3.18   3.09   2.82   2.66   2.57   2.50   2.42   2.33   2.28   2.25
  27   7.68   5.49   4.60   4.11   3.78   3.56   3.39   3.26   3.15   3.06   2.78   2.63   2.54   2.47   2.38   2.29   2.25   2.22
  28   7.64   5.45   4.57   4.07   3.75   3.53   3.36   3.23   3.12   3.03   2.75   2.60   2.51   2.44   2.35   2.26   2.22   2.19
  29   7.60   5.42   4.54   4.04   3.73   3.50   3.33   3.20   3.09   3.00   2.73   2.57   2.48   2.41   2.33   2.23   2.19   2.16
  30   7.56   5.39   4.51   4.02   3.70   3.47   3.30   3.17   3.07   2.98   2.70   2.55   2.45   2.39   2.30   2.21   2.16   2.13
  40   7.31   5.18   4.31   3.83   3.51   3.29   3.12   2.99   2.89   2.80   2.52   2.37   2.27   2.20   2.11   2.02   1.97   1.94
  60   7.08   4.98   4.13   3.65   3.34   3.12   2.95   2.82   2.72   2.63   2.35   2.20   2.10   2.03   1.94   1.84   1.78   1.75
 100   6.90   4.82   3.98   3.51   3.21   2.99   2.82   2.69   2.59   2.50   2.22   2.07   1.97   1.89   1.80   1.69   1.63   1.60
 120   6.85   4.79   3.95   3.48   3.17   2.96   2.79   2.66   2.56   2.47   2.19   2.03   1.93   1.86   1.76   1.66   1.60   1.56
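
The areas in Table 1, and the final row of Table 2 (where the t distribution coincides with the standard normal), can be cross-checked numerically. The following is a minimal Python sketch using only the standard library; the function names `area_0_to_z` and `two_tail_critical` are illustrative assumptions and are not part of the module materials, which use STATA for practical work.

```python
from statistics import NormalDist

# Standard normal distribution N(0,1), as tabulated in Table 1.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_0_to_z(z: float) -> float:
    """Area under N(0,1) between 0 and z (the quantity shown in Table 1)."""
    return STD_NORMAL.cdf(z) - 0.5


def two_tail_critical(alpha: float) -> float:
    """Critical value c such that P(|Z| > c) = alpha for Z ~ N(0,1).

    This corresponds to the large-sample (last) row of Table 2.
    """
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


if __name__ == "__main__":
    # Reproduces the worked example given with Table 1.
    print(f"P(0 < z < 1.40) = {area_0_to_z(1.40):.4f}")  # 0.4192
    for alpha in (0.20, 0.10, 0.05, 0.02, 0.01):
        print(f"two-tail probability {alpha:.2f}: c = {two_tail_critical(alpha):.3f}")
```

Computed values may differ from the printed tables in the last digit because of rounding. Critical values for the t, chi-squared and F distributions need a statistics package and are not covered by this sketch.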