Preliminary draft dated 11 February 2009

Specifying and Diagnostically Testing Econometric Models
Third Edition

Houston H. Stokes
University of Illinois at Chicago

For any questions concerning this manuscript or the B34S program see the author or call 312-643-4383 or 312-996-0971.

Copyright Houston H. Stokes 1982, 1988, 1989, 1990, 1991, 1996, 2006

CONTENTS

Tables
Figures
PREFACE
1. Applied Econometric Modeling
2. Regression Analysis With Appropriate Specification Tests
3. Logit, Tobit, Probit
4. Simultaneous Equations Systems
5. Error-Components Analysis
6. Markov Probability Analysis
7. Time Series Analysis Part I: Identification of ARIMA and Transfer Function Models
8. Time Series Analysis Part II: VAR, VARMA and VMA Models
9. Testing the Specification of OLS Equations With Recursive Residuals
10. Special Topics in OLS Estimation
11. Nonlinear Estimation Options in B34S
12. Special Topics in Time Series Analysis
13. Optimal Control Analysis
14. MARS and Pi_Spline Model Building
15. Spectral Analysis of Time Series
16. Programming using the Matrix Command
17. Model Building Using Nonlinear Nonparametric Methods
BIBLIOGRAPHY
INDEX

PREFACE

In the late 1960s, I became aware of the enormous gap between the appropriate statistical procedures suggested by econometric theory and the options then available in statistical packages. My research interests included estimating equations using generalized least squares (GLS) with more than first-order serial correlation in the error (Sinai and Stokes 1972). To my amazement, I discovered that few statistical packages were able to perform GLS estimation, and those that could were restricted to first-order GLS. An exception was the b34t program, which was developed by Hodson Thornber (1966) at the University of Chicago in the mid-1960s to estimate up to ninth-order GLS.
The program b34t consisted of 3,000 Fortran II statements for an IBM 7094 and represented an enhancement of the UCLA BIMED34 regression program. In the 1960s, applied econometricians were hampered by researchers developing single-purpose statistical packages, each of which required data in a different form. Many of the statistical routines in these packages were unable to identify matrices that were almost rank deficient and thus unexpectedly gave poor results.1

The research reported in this book originated from the perceived need to implement on the computer a number of statistical methods and specification tests for econometric models.2 This book documents a variety of econometric diagnostic and specification tools and provides illustrations of their use with actual econometric examples in a number of fields using the b34s® Data Analysis program.3 The reader does not need to have access to the b34s program to use this book effectively. All results are completely documented in the text and illustrated with computer output. Readers desiring to apply the indicated techniques could use b34s, or program the techniques in a higher level programming language such as the speakeasy® system, matlab® or the sas/iml® system. The techniques illustrated have been used in economic analysis, in financial modeling, in health economics, in energy modeling, in environmental economics, in sociology, in political science and in industrial research. Many of the problems in these areas have been used as illustrations in this book.

1 In a now classic study of statistical program accuracy, Longley (1967) found that the percentage error in the price coefficient of a multicollinear data set ranged from .03% to 375%. Four of the nine programs tested did not agree even in the first digit, two were accurate to 1 digit, two to 3 digits and one to 4 digits. Further detail on this paper is contained in Chapter 10, which discusses the QR approach to OLS estimation. Modern computer programs such as sas® and speakeasy® have built-in accuracy checks. A series of papers by McCullough (1998a, 1998b, 1999a, 1999b) suggests that accuracy is still a problem. McCullough makes the point that it is not an error for a software system to fail to solve a highly multicollinear model if a warning is given. What is not acceptable is to have software produce wrong answers with no warning. The Rogers, Filliben, Gill, Guthrie, Lagergren and Vangel (1998) Statistical Reference Datasets produced by the NIST now provide additional benchmarks developers can use to test their software. B34S is distributed with test programs that run all applicable datasets. Altman, Gill and McDonald (2004) is a good reference regarding desired modern statistical computing practices. In a series of papers, Stokes (2004a, 2004b, 2005) provides a discussion of software design issues impacting accuracy, including a discussion of the role of variable precision math.

2 The basic b34s code now consists of over 360,000 Fortran 95 statements and runs on three UNIX platforms (IBM RS6000 AIX, SUN and Linux) and Windows 95/98/NT/2000/XP systems. Versions for IBM-MVS, IBM-CMS and Microsoft DOS have been frozen for lack of demand. The full b34s is available for lease from the author (773-643-4383) or from Scientific Computing Associates, 1410 N. Harlem Ave., Suite F, River Forest, IL 60305 (708-771-4567). For further information see Houston H. Stokes's web page under College of Business, Department of Economics, UIC (http://www.uic.edu/~hhstokes). Program updates can be downloaded from the b34s web page if the user has a valid license.

Each chapter in this monograph will indicate briefly the statistical problem, what specific calculations are available, the routines to be used to make these calculations and, wherever possible, provide an example.
When more common procedures are being discussed (such as two-stage least squares), the technical discussion will be reduced and the reader will be referred to appropriate textbooks. When procedures use code developed by others, the reader will be directed to the original source for additional detail. In each case, major attention will be given to the mechanics of the calculation. All problems illustrated are distributed test problems from the B34S web page so that interested readers can try to replicate and/or modify all calculations.

Due to the response from readers of the first and second editions, this edition has been enlarged with material on the detection and modeling of nonlinear models using MARS (Multivariate Adaptive Regression Splines) and PISPLINE (pi spline) models, as well as many examples that are now possible with the new matrix programming capability in b34s. Most other chapters have been substantially expanded to incorporate these new facilities.

A project the size of this book incurs numerous debts. My father, W. E. D. Stokes, Jr., first introduced me to signal filtering as applied to economic problems and stimulated my interest in graduate work in economics in the 50s. I am deeply indebted to Henri Theil and Arnold Zellner, who introduced me to econometrics in the late 60s at the University of Chicago and provided encouragement for this project at many stages. Their classes led me to question whether the assumptions of the usual OLS model were met by the data for the problem at hand. They stressed the importance of model specification and diagnostic checking of the results. Next, I would like to thank the numerous reviewers of my scientific papers and the first edition of this book who have corrected my analysis and suggested many improvements.
While any remaining errors or shortcomings of the b34s system are the sole responsibility of the author, certain individuals deserve special mention for their help with the software development aspects of this project. In the 70s, Ron Golland, at the University of Illinois at Chicago, was especially helpful in pointing me to the finest available utility routines (LINPACK, EISPACK) and in developing other useful utility routines that I have incorporated into B34S. The University of Illinois at Chicago Computer Center has generously provided computer time for the project, and George Yanos, the former Associate Director, was most helpful when serious design problems had to be solved in the 70s. In the 80s Paul Setze and Jim O'Leary made contributions.

3 Programs referenced in this monograph are usually shown in courier type. Procedures inside programs are shown in bold, unless part of command file listings. This allows readers to distinguish between, for example, the mars command and the MARS statistical procedure.

Stan Cohen and his team of developers of the SPEAKEASY® System have been most helpful over the years and have worked with me on the B34S to SPEAKEASY and SPEAKEASY to B34S interface. On the PC, SPEAKEASY is seen as a command of B34S which is accessible with a hot-key from the Display Manager. The SPEAKEASY software adds a powerful interactive matrix language capability to the tools already in B34S. In the late 90s the matrix command was built to implement a SPEAKEASY-like programming language in B34S that was tailored to econometrics and time series analysis. The B34S matrix command and SPEAKEASY share the same data save format. Professor Lon-Mu Liu, the developer of SCA, and Bill Lattyak, the developer of WORKBENCH, have made many suggestions and have worked with me on the design of the B34S/SCA data interface. The program WORKBENCH has made the use of B34S substantially easier for many new users.
Professor Barry Chiswick has made suggestions involving changes to make B34S easier to use, such as the development of the new syntax and the implementation of the SAS/B34S interface.4

My research colleagues, Georgios Karras, Richard Kosobud, Evelyn Lehrer, John McDonald, Hugh Neuburger, Jin-Man Lee and Allen Sinai, all played major roles in pointing out econometric problems whose solutions required the development of procedures that later found their way into B34S. Since 1964 Hugh Neuburger has made many suggestions for improvements in the time series capabilities of the program, which he has used extensively in financial model building. His help, friendship and the extensive testing he has done have substantially improved the final product.

The pioneering research of Melvin Hinich and Doug Patterson into detecting nonlinearity in time series changed my research focus. Their generosity in providing me with code, advice and friendship is much appreciated. As a consequence of their influence I developed the bispec and polyspec sentences and the mvnltest paragraph, which test for nonlinearity. The mars and pispline commands represent efforts to deal with the estimation of nonlinear models.

Ali Akarca assisted me with the testing of the program in its early stages, particularly in the area of time series analysis. I have received many helpful suggestions from former students, such as Terry Elder, Linda Manning, Dimitri Andrianacos, John Sfondouris and Ron Usauskas in the 70s and 80s. For the 1997 edition, Marcos and Maria Lemos made a number of helpful suggestions. Jin-Man Lee has helped me in many ways to make B34S run well on modern computer systems. His contributions to the present edition have been substantial and go beyond making suggestions and finding errors. In addition to providing material for a number of chapters, his help in testing and extending the matrix command has been especially valuable.

My son William A.
Stokes made many contributions and suggestions on the Linux port, helped design and program the web page, and made design suggestions for the B34S Display Manager. His perspective as a modern computer science major opened my eyes to new approaches to old problems. Stokes (2003b) provides an in-depth look at the issues involved in the design of statistical software.

4 In the 90s the B34S/MATLAB and MATLAB/B34S interface was developed. MATLAB is callable from the Display Manager.

For the preparation of edition three I have received expert help and support from my excellent graduate students. Marek Kolodziej found many slips, and Yuliya Yurova and Narsid Golic encouraged me to develop applications in many interesting directions. In the period 2005-2009, Xin Fang, Kathleen Odell and Shaoying Chang proved helpful, especially in the development of the nonlinear and non-parametric estimation capability.

I am grateful to Hodson Thornber, who has given me permission to adapt material from his manual for B34T and whose program was the basic building block for B34S, begun many long years ago in 1972, and to the Review of Economics and Statistics, published by North Holland, from which I adapted material from my papers. Individual code contributions are acknowledged throughout the book.

Most important, I owe a large debt of gratitude to my wife, Diana, who not only gave me encouragement and support while I was building the code and fixing elusive "bugs," but also provided valuable editorial help in the form of detailed editing of the manuscript for all three editions. Individual acknowledgement of others who have contributed to B34S is given in the specific chapters where procedures are discussed and in the two on-line help manuals.5 Any remaining errors or design limitations are, of course, my responsibility.

Houston H. Stokes
Department of Economics
University of Illinois at Chicago
hhstokes@uic.edu
http://www.uic.edu/~hhstokes
13 February 2016

5 B34S contains two on-line help manuals (Stokes 2003a, 2003b) which are continually being updated. The B34S command

    b34sexec help=manual newpage bottompn makeindex$ b34seend$

places the complete command reference manual in the log file. The complete manual is available on-line and sections can be downloaded. In addition, extensive test output can be downloaded from the B34S web page. If the keyword oldmanual is substituted for manual, the B34S "native" command manual is printed. This is rarely, if ever, needed. Since complete help is available in these manuals and on line, no attempt is made in this book to provide complete command references. The purpose of this book is to document the calculations and illustrate their use with actual econometric research. This book is not a computer manual, nor is it meant to be a text. The B34S program is supplied with a comprehensive help manual and detailed examples on all procedures and matrix commands, as well as over 600,000 lines of sample datasets. Users are encouraged to use these datasets to learn econometrics. It has been my experience that only through analysis of actual data is it possible to fully understand econometrics.