Revised Chapter 1 in Specifying and Diagnostically Testing Econometric Models (Edition 3) © by Houston H. Stokes, 11 February 2009. All rights reserved. Preliminary Draft

Chapter 1 Applied Econometric Modeling
1.0 Introduction
1.1 Outline of the Book
1.2 B34S Overview
Table 1.1 B34S Commands
Table 1.2 Original Origin of Source Code for Various B34S Commands
Table 1.3 B34S Run Stand Alone
Table 1.4 B34S Run Under SAS Using the cb34sm Macro
Table 1.5 B34S Platforms
1.3 B34S Display Manager
Table 1.6 B34S MAKEMENU Commands to Generate the rr Command
1.4 Conclusion

Applied Econometric Modeling

1.0 Introduction

This book illustrates the use of model specification and diagnostic tests applied to a variety of econometric modeling techniques. The techniques discussed include simple, one-equation OLS models with continuous variables on the left-hand side. These models can be tested for the appropriate specification, for changes in the parameters over time and for changes across different levels of the right-hand-side variables using recursive residual (RR) and best linear unbiased scalar (BLUS) residual tests. Extensions of the simple, one-equation model include models in which the left-hand-side variable is a 0-1 variable (probit and logit models) or is bounded (tobit models). If we relax the assumption that the right-hand-side variables are exogenous, the appropriate estimation technique is either two-stage least squares or limited information maximum likelihood.
Sets of equations should be estimated with three-stage least squares if there is covariance among the error terms. If the data set consists of pooled data (time-series and cross-section), error-components models are appropriate. In more limited cases where market share analysis is desired, Markov probability models are a viable alternative. Forecasting extensions of the simple OLS model include modeling only the error (ARIMA analysis) or specifying the dynamics of the mapping of the exogenous variables on the endogenous variables and modeling the error (transfer function modeling). The vector autoregressive (VAR) and vector autoregressive moving average (VARMA) models are shown to be time series generalizations of the three-stage least squares and full information maximum likelihood models. Transfer function and ARIMA models are special cases of these more general VARMA forms. VAR models can be viewed in the frequency domain for added insight. More specialized techniques include orderly searches for the appropriate equation specification using the MINIMAX and L1 models, optimal control analysis, nonlinear analysis and the QR approach to computation.

The purpose of this monograph is to illustrate the above techniques using actual research data. To facilitate the calculations, the B34S Data Analysis program was developed, and its application will be illustrated. The B34S matrix command was developed to provide a 4th generation programming language that is especially suitable for econometric and time series analysis. Many problems are illustrated using this programming language. Sample output for all procedures discussed in the text has been provided so that access to the B34S program is not required to benefit from this book.1

1.1 Outline of the Book

Chapter 2 discusses options involving regression analysis and specification tests.
These options are accessed from the regression, reg and robust commands and include ordinary least squares, weighted least squares, generalized least squares, L1 and MINIMAX estimation, and heteroskedasticity, normality and serial correlation tests. Additional features include BLUS residual analysis, BAYES analysis options and other residual analysis options (ra option).

Chapter 3 is devoted to a discussion of logit, probit and tobit models, all of which involve restrictions on the range of the dependent variable. The basic code for the logit routines, which are accessed with the loglin command, was obtained from Nerlove and Press (1973, 1976). The tobit and probit code was obtained from Mathematica Policy Research Corp.2 and is accessed by the probit and tobit commands. The multinomial logistic code, which is accessed by the mloglin command, was initially obtained from Kawasaki (1978, 1979). A revision of this program, based on the prior B34S version, was obtained from Klein and Klein (1988) and Klein (1988). The multinomial probit procedure, which operates on ordered probit data and is accessed with the mprobit command, was developed from code originally written by McKelvey and Zavoina (1971, 1975).

Chapter 4 discusses the use of routines built by Les Jennings (1980) that calculate ordinary least squares, limited-information maximum likelihood, two-stage least squares, three-stage least squares, iterative three-stage least squares and full-information maximum likelihood estimation for systems of equations. This code is accessed by the simeq command. Advantages of the Jennings code are the speed and accuracy of its algorithms (the QR approach) and the option of obtaining the constrained reduced form of a system of simultaneous equations.

Chapter 5 is devoted to problems that arise in the distribution of the error term when pooled time-series and cross-section data are used in a regression. The error-components procedure avoids both the assumption that the constant is the same across the cross section and over time and the loss of degrees of freedom that occurs if this assumption is relaxed by entering dummy variables for each time period or for each cross-section observation. The code used in this section is an extension of the Freiden (1973) program by Houston H. Stokes, following suggestions by Henry and McDonald. The basic reference is Henry, McDonald and Stokes (1976). Balanced error-components models are accessed with the ecomp command, while dynamic unbalanced panel datasets can be analyzed with the reg command.

1 Programs are listed in the text using upper case Courier Font. Commands inside a program are listed in the text as bold lower case Times Roman. Techniques such as ordinary least squares are listed in the text in upper case. For example, a MARS model is estimated by the mars command in B34S.

2 The tobit command Fortran code was very old and possibly is the original Tobin program. The probit code most likely was originally developed by John Cragg, but has been changed by a number of others. All three programs (loglin, probit and tobit) were converted to double precision and extensively improved by the addition of the LINPACK matrix subroutines (Dongarra, Bunch, Moler and Stewart 1979). The original developers are absolved from any responsibility for any errors that might have been inadvertently added.

Chapter 6 discusses an extension of the basic Lee, Judge and Zellner (1970) Markov probability model, following suggestions contained in Theil (1972, Chap. 5). The basic Markov code was first extended to allow more states, and many of the linear algebra routines were replaced with LINPACK matrix routines.
The code was next extended to include decomposition of the transition probability matrix into the fundamental matrix, the exchange matrix, the mean first-passage matrix, etc., and placed in the transprob command. These extensions were used in a number of articles by Kosobud and Stokes (1978, 1979, 1980) modeling OPEC behavior and by Neuburger and Stokes (1979a) modeling economic history.

Chapters 7 and 8 are devoted to time series analysis. Chapter 7 discusses the use of the autocorrelation and cross correlation functions in building OLS models, autoregressive integrated moving-average (ARIMA) models and transfer-function (TF) models. The commands bjiden and bjest are used to identify and estimate these models, respectively. The basic code used in these commands was originally built by David Pack (1977), following suggestions made by Box and Jenkins (1976) and Box and Tiao (1975). Suggestions of Neuburger and Stokes (1979b), Stokes and Neuburger (1979) and Stokes (1990) for additional diagnostic specification tests have been incorporated. A simplified treatment of the ARIMA modeling process is contained in Nelson (1973). Recent developments are outlined in Enders (1995, 2004) and Tsay (2002, 2005). The Pack code has been extensively modified to include many features not found in the original, such as spectral analysis and further diagnostic tests, and to improve accuracy. In addition, an automatic model estimation command, autobj, has been developed to run as part of the matrix command. Chapter 8 discusses vector autoregressive moving-average (VARMA) model building. The btiden and btest commands identify and estimate a VARMA model. These commands were based on a heavily modified version of the Wisconsin WMTS-1 program, which was developed by Tiao and Box (Tiao, Box, Grupe, Hudak, Bell and Chang 1979).
Enhancements to the code include tests on the residuals suggested by Hinich (1982), Hinich and Patterson (1985, 1986), Hinich and Wolinsky (1988) and Stokes and Hinich (1989), and a decomposition of the covariance matrix to study instantaneous causality suggested by Granger and Newbold (1977, 223). Melvin Hinich most generously supplied the code that formed the basis of the bispec sentence, which is callable from a number of procedures, and of the mvnltest command. The important book by Patterson and Ashley (2000) provides further information on detecting nonlinearity.

Chapter 9 discusses how to use the recursive-residual (RR) analysis technique to test an equation for parameter stability. The rr command code was written by Houston H. Stokes, following the suggestions in the seminal article by Brown, Durbin and Evans (1975) and in Dufour (1979, 1982). Since the recursive residual technique involves repeated calculation of regressions as new observations are added, a great deal of effort has been devoted to making the code execute quickly and accurately. The recursive residual technique has been modified so that it can be used with cross-section samples to test both for variation of the coefficients across different levels of the explanatory variables and for interaction effects. The code has been improved by the inclusion of LINPACK routines, particularly for updating and downdating the Cholesky decomposition, and by the ability to display the results using high-resolution graphics on the PC versions of B34S.

Chapter 10 discusses the QR approach to OLS estimation, a technique particularly suitable in cases of multicollinearity.
While the ridge estimation procedure attempts to deal with the multicollinearity problem by investigating the effect on the OLS coefficients of a perturbation of the X'X matrix, the QR procedure factors the N by K matrix X directly such that X = QR, where the K columns of the N by K matrix Q are orthonormal and the K by K matrix R is upper triangular. Both ridge and lasso approaches can be thought of as data reduction techniques. Although a Cholesky decomposition of X'X into R'R is an alternative way to obtain R, the disadvantage of that procedure is that the condition number of X'X (the ratio of its largest to smallest eigenvalue) is the square of the condition number of R. When X'X is close to singular, problems will arise that would have been avoided if R had been obtained directly from X (via the QR method) without forming the more ill-conditioned matrix X'X.3 The QR factorization code in B34S is taken directly from LINPACK and, by use of a pivoting option, allows the user to detect dependencies among the columns of X. The qr command performs the above procedures and can optionally calculate principal-component regressions.

Chapter 11 concerns nonlinear estimation, which formerly could be accessed only through the nonlin command, now largely superseded. At present a number of nonlinear commands are available under the matrix command. Unlike the older approach, which required the user to code the model in a Fortran subroutine, the matrix command approach allows the user to code the model in a 4th generation language while compiled code actually solves the system. This hybrid approach differs from MATLAB® and other systems, where both the solver and the model are coded in a 4th generation language.
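The hybrid division of labor described above, in which the user writes only the model and a separate general-purpose solver does the numerical work, can be sketched in Python. This is a minimal illustration under stated assumptions, not the B34S matrix command or the Meeter gaushaus code: the solver is a bare-bones Levenberg-Marquardt loop built on the Marquardt (1963) damping idea, the names solve, levenberg_marquardt, resid and jac are hypothetical, and the exponential model with noise-free data is made up for illustration.

```python
import math


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def levenberg_marquardt(resid, jac, beta, max_iter=200, tol=1e-12):
    """Hypothetical generic solver: minimize the sum of squared residuals.

    resid(beta) returns the residual list; jac(beta) returns one row of
    d(residual)/d(beta) per observation. The solver knows nothing about the model.
    """
    beta = list(beta)
    k = len(beta)
    lam = 1e-3
    r = resid(beta)
    sse = sum(v * v for v in r)
    for _ in range(max_iter):
        J = jac(beta)
        A = [[sum(row[i] * row[j] for row in J) for j in range(k)] for i in range(k)]  # J'J
        g = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(k)]  # J'r
        # Marquardt damping: scale the diagonal of J'J by (1 + lam).
        damped = [[A[i][j] * (1 + lam) if i == j else A[i][j] for j in range(k)] for i in range(k)]
        step = solve(damped, [-gi for gi in g])
        trial = [b + s for b, s in zip(beta, step)]
        r_trial = resid(trial)
        sse_trial = sum(v * v for v in r_trial)
        if sse_trial < sse:  # accept the step and move toward Gauss-Newton
            beta, r, sse = trial, r_trial, sse_trial
            lam /= 10
            if max(abs(s) for s in step) < tol:
                break
        else:  # reject the step and move toward a shorter, scaled gradient step
            lam *= 10
            if lam > 1e12:
                break
    return beta, sse


# User-side model (hypothetical): y = b0 * exp(b1 * x), noise-free data at (2, -1.5).
x = [i / 19 for i in range(20)]
y = [2.0 * math.exp(-1.5 * xi) for xi in x]


def resid(b):
    return [yi - b[0] * math.exp(b[1] * xi) for xi, yi in zip(x, y)]


def jac(b):
    return [[-math.exp(b[1] * xi), -b[0] * xi * math.exp(b[1] * xi)] for xi in x]


beta, sse = levenberg_marquardt(resid, jac, [1.0, 0.0])
print(beta, sse)
```

Only resid and jac change from one model to the next; the solver itself is untouched, which is the point of separating the model from the solution method.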
The basic nonlinear least squares code was originally written by Meeter (1964a, 1964b) to implement the Marquardt (1963) algorithm.4 Initially the code was improved via the addition of LINPACK routines and the use of a dynamic calling option, which allowed users to code their own models and create a library of compiled Fortran subroutines to which B34S could branch during execution. This approach, while fast, required knowledge of Fortran and access to a Fortran compiler. In the matrix command implementation of the same program, some speed is lost relative to the older Fortran implementation, but ease of use has been vastly improved. On the PC, screen writes allow visual monitoring of the progress of the solution. Chapter 11 discusses both nonlinear least squares and constrained and unconstrained optimization. It should be read in conjunction with Chapter 16, which discusses other features of the matrix command.

3 Strang (1976) contains an excellent discussion of this approach.

4 The gaushaus routine, developed by Meeter (1964a, 1964b), has passed the test of time, and three variants are used: in the Box-Jenkins ARIMA and transfer function model-building section, in the Box-Tiao VARMA model-building section and in the matrix command.

Chapter 12 contains a discussion of the varfreq and kfilter commands, which allow decomposition of a VAR model into the frequency domain, following methods suggested by Geweke (1982b, 1982c), and state space estimation, following suggestions by Aoki (1987). The MTSM program developed by Geweke (1982a) was modified by Stokes (1985, 1986b) and forms the basis for the varfreq command. The kfilter command uses the code developed by Aoki (1987) and is only discussed briefly. The importance of testing for unit roots is discussed, and a number of tests are illustrated. The B34S polysolv command is used to test models for unit roots.
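Checking an autoregressive model for a unit root comes down to finding the roots of its characteristic polynomial, the kind of polynomial-solving task listed for polysolv. The Python sketch below is only an illustration under stated assumptions: it is not the polysolv algorithm, it finds roots with the Durand-Kerner iteration (chosen here because it needs only the standard library), and the function names are hypothetical. For y(t) = phi1 y(t-1) + ... + phip y(t-p) + e(t), the polynomial is 1 - phi1 z - ... - phip z^p; a root on the unit circle signals a unit root, and stationarity requires every root to lie outside it. The sketch assumes p >= 1 and phip != 0.

```python
def poly_roots(coeffs, max_iter=1000, tol=1e-13):
    """Roots of coeffs[0] + coeffs[1]*z + ... + coeffs[n]*z**n, with coeffs[n] != 0.

    Durand-Kerner (Weierstrass) iteration: all n roots are refined at once.
    Convergence is fast for simple roots, slower and less accurate for repeated ones.
    """
    n = len(coeffs) - 1
    monic = [c / coeffs[-1] for c in coeffs]  # leading coefficient scaled to 1

    def p(z):
        acc = 0j
        for c in reversed(monic):  # Horner's rule, highest power first
            acc = acc * z + c
        return acc

    roots = [complex(0.4, 0.9) ** k for k in range(n)]  # customary distinct starting values
    for _ in range(max_iter):
        new_roots = []
        for i, zi in enumerate(roots):
            denom = 1 + 0j
            for j, zj in enumerate(roots):
                if j != i:
                    denom *= zi - zj
            new_roots.append(zi - p(zi) / denom)
        shift = max(abs(a - b) for a, b in zip(new_roots, roots))
        roots = new_roots
        if shift < tol:
            break
    return roots


def ar_root_moduli(phi):
    """Moduli of the roots of 1 - phi[0]*z - ... - phi[p-1]*z**p, smallest first."""
    coeffs = [1.0] + [-f for f in phi]
    return sorted(abs(z) for z in poly_roots(coeffs))


def has_unit_root(phi, eps=1e-6):
    """True if some root of the AR characteristic polynomial has modulus within eps of 1."""
    return any(abs(m - 1.0) < eps for m in ar_root_moduli(phi))


print(ar_root_moduli([1.5, -0.5]))  # approximately [1.0, 2.0]: unit root
print(ar_root_moduli([0.9, -0.2]))  # approximately [2.0, 2.5]: stationary
```

The first model factors as (1 - z)(1 - 0.5z), so one root lies exactly on the unit circle; the second has both roots outside it.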
The dangers of unit roots are illustrated with sample data on which various tests are performed using the pgmcall procedure, which provides an interface to the RATS software, and the bispec sentence, which contains unit root and ARCH test options.

Chapter 13 contains a brief treatment of the optcontrol command, which implements the Chow (1975, 1981) optimal control code. Since this approach is extensively discussed and documented in the seminal references by Chow (1975, 1981), only a brief discussion of the program is given here. The initial implementation of the Chow program used the DYNCAL procedure to allow researchers to build a library of compiled subroutines containing their models, to which B34S dynamically branched at execution time. Supporting the dynamic link proved too difficult. In recent years the approach was modified to allow B34S to branch to a stand-alone, user-compiled Fortran program and to communicate with it via files. While there is a speed loss relative to a dynamic link implementation, functionality is maintained and substantial models can be studied. An example with the Klein-Goldberger model is provided.

Chapter 14 deals with a number of approaches to modeling nonlinear data when the explicit model is not known, so that the techniques in Chapter 11, which require explicit specification of the nonlinear model, cannot be used. The focus is on the marspline, mars_var, gamfit, acefit and pispline commands, which estimate models using the MARS (marspline and mars_var), GAM (gamfit), ACE (acefit) and spline (pispline) methods. These methods are discussed and illustrated on data that are potentially nonlinear.
The B34S mars command, which used heavily modified code obtained from Friedman (1991b) and is no longer distributed, has been replaced by the marspline and mars_var commands, which use GPL code originally redeveloped for R by Hastie and Tibshirani (1990). Hastie and Tibshirani also developed the routines that form the basis for the gamfit and acefit commands. The pispline command is based on code from Breiman (1991a).5 After a brief discussion of these techniques, a number of data sets are studied, including the Gas Furnace data that was found to be nonlinear in Chapter 8. Both MARS and spline models were found to reduce the nonlinearity in the data. The McManus dataset, discussed in Chapter 3, and the Sinai-Stokes (1972) production function dataset, discussed in Chapters 2, 9 and 11, are also analyzed using a number of methods. Finally, various sample datasets were developed, models were applied to them and the residuals were tested. In addition to MARS and spline models, the generalized additive model (GAM) developed by Hastie and Tibshirani (1990) and the Alternating Conditional Expectation (ACE) approach are illustrated in a number of cases. It is argued that no one technique can be used in all situations. In many cases the nonlinear diagnostic tests discussed earlier and the GAM procedure can be used to point to the problem. It has been found most helpful in model building to first perform a number of diagnostic tests on the residuals of the preliminary model. If evidence of nonlinearity is found, the next step is to use the GAM approach to attempt to identify which right-hand-side variables enter nonlinearly before proceeding further with other specifications.

5 Friedman, who originally developed the MARS program, later registered it as a trademark. In this book the word MARS refers to the MARS method or approach unless explicitly referring to the Friedman 3.5 version software program. The Friedman program is no longer distributed in the commercial B34S. The marspline command was developed using the GPL R code developed by Hastie and Tibshirani to replace the old Friedman MARS™ code. The ACE and GAM capability is based on extensions of the R GPL code developed by Hastie and Tibshirani (1990) and others. The developer of B34S appreciates the availability of these routines.

Chapter 15 illustrates the capabilities of the B34S spectral command, which is similar to the SAS procedure proc spectra. Sample data sets are developed and their spectral representations studied. The gas furnace data is used to illustrate the capability of the procedure.

Chapter 16 illustrates the capabilities of the B34S matrix command, which allows users to program the econometric calculations. A major emphasis of the chapter is to discuss the software design issues that went into the implementation of this facility. Many examples are provided, and the whole issue of efficient programming is discussed and illustrated. The matrix command is actually a major program in its own right. It can read and write data files for use in the older "procedure" part of B34S, or it can be run stand-alone.

Chapter 17 is concerned with model building using nonlinear nonparametric methods. Specific methods discussed include Recursive Covering, Regularized Discriminant Analysis (a compromise between Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis), Projection Pursuit Estimation, Exploratory Projection Pursuit and Random Forest Modeling. The Random Forest approach generalizes the CART (Classification and Regression Trees) approach by implementing "bagging," which is based on the bootstrap method. In bagging, a sample of the same size N as the data is drawn at random with replacement; because each observation is left out with probability (1 - 1/N)^N ≈ e^(-1) ≈ 0.37, roughly two thirds of the distinct observations end up in this "bagged" subsample. Once the model is estimated on the bagged subsample, it is tested on the remaining "out of bag" observations. This is repeated many times, and the final classification is based on "voting" across the resulting models. These techniques were implemented in the matrix command as the rcover, rda, ppreg, ppexp and ranforest commands. A main objective of this chapter is to outline these methods and to compare and contrast their performance on a wide range of problems.

It is the firm conviction of the author that one learns econometrics by applying techniques. Only by systematically testing the specification of a model can one truly begin to appreciate how sensitive results may be to the initial specification. While many programs require the user to "run blind" because they do not provide adequate diagnostic tests, B34S allows users to subject their models to a battery of test procedures that demonstrate how sensitive the results are to alternative specifications of the functional form. The remaining chapters discuss some of these procedures and their use in greater detail. Most chapters contain B34S control setups to run sample programs using the procedures discussed. Readers are encouraged to run these sample programs in their entirety.6 Edited versions of the output of these control programs are contained in the text and are briefly discussed. Before discussion of the regression specification tests, a brief description of the B34S software is provided to give the reader an overview of the model specification and diagnostic testing options available.

1.2 B34S Overview

The B34S Data Analysis Program is a collection of econometric procedures that are useful in the analysis of both cross-section and time-series models. The program consists of two parts: a number of command-driven procedures and a programming language. The command-driven procedures allow data loading and analysis of econometric models using regression and other methods.
The programming language, available under the matrix command, allows the user to actually develop procedures to perform the analysis using an object-oriented programming language. Within the programming language, user programs, subroutines and functions can be built and used to "customize" the calculations. B34S has two-way links with other systems such as SAS®, SCA®, SPEAKEASY®, MATLAB® and RATS®. It is assumed that the reader has a good background in econometrics, obtainable from study of such books as those by Johnston (1963, 1972, 1984), Theil (1971), Pindyck and Rubinfeld (1981), Chow (1983), Greene (2000), Enders (1995) and Kmenta (1971, 1986). Many of the more advanced statistical procedures discussed involve techniques that are just beginning to appear in econometric textbooks.7 To aid the reader, the treatment of all techniques discussed in this book, while brief, is meant to be self-contained.

B34S can be run under SAS, SPEAKEASY or MATLAB, or as a stand-alone program. When running as a stand-alone program, B34S will read data in a number of ways, the most common being from a sequential file that can be in E format, F format, Z format, A8 format or double-precision unformatted.8 B34S can create and read an SCA FSAVE data file library (Liu and Hudak 1986a, 1986b) or an SCA MAD file and can create and read its own data step. These options facilitate data interchange between B34S and SCA, which are complementary programs.

6 The B34S code can be typed in or obtained from the libraries that are distributed with the software. Since most sample setups are B34S macros, they can be selectively run with the options command on the PC under the Display Manager.

7 Epstein (1987) contains a good discussion of how current econometric practice has evolved over time. Stokes (2003b) discusses software development issues.

8 For a discussion of these formats, see any basic Fortran textbook or IBM Inc. (1972, 1988a, 1988b).
If PC users have RATS, the citibase command provides access to CITIBASE data files, which can be changed in frequency and loaded for further processing. B34S can also read and write a RATS portable data file. While B34S procedures, with the exception of the matrix command, have a hard limit of no more than 98 series in one file, the dmf command (Data Management Facility) can be used to save and manage substantially larger data files. The current limit of the matrix command is 10,000 objects, each of which can be a matrix. This facility, which can be thought of as a program within a program, has essentially no limit except that imposed by hardware.

B34S is usually run in batch mode by submitting command files that produce output and log files. In addition to batch mode, the B34S Display Manager provides a graphical interface to edit and submit command files, to inspect output and log files, and to view and create high-resolution graphics on the four main platforms: Windows 98/NT/2000/XP, Linux, Sun and RS/6000. The Display Manager provides easy access to The Display Manager is designed to run in graphics mode but will automatically run in text mode on a Unix dial-up line. To speed up the learning curve, users of the B34S Display Manager can select their own editor, although KEDIT or the Microsoft NOTEPAD/WORDPAD editors are recommended. The makemenu command provides a facility by which users can construct menus using the B34S control language to provide an automatic program-writing feature. Under the Display Manager a number of default menu files are provided under the menu and graphics commands. Since these are just B34S command files using the makemenu command, they can be modified or customized by the user. The makemenu command can be run in batch mode to provide custom, menu-driven B34S applications that can be developed and maintained by users. The makemenu command can also be used to generate command files for programs other than B34S.
Further information concerning these features is contained in Stokes (1996b), which documents B34S commands and is available on-line, and in Stokes (1996a), which documents the "native" B34S command language. As an alternative to loading data into B34S directly, the SAS MACRO cb34sm allows B34S to be seen as a SAS procedure. In this mode of operation the user needs to learn only the desired B34S paragraph, since the cb34sm macro handles all data loading. Use of this macro is illustrated below.

The command structure of B34S involves the specification of multiple paragraphs or commands, each containing sentences. The first sentence in each paragraph begins with the keyword b34sexec, which starts the parser. The last sentence in each paragraph, which turns off the parser, must be either b34seend$ (B34S EXEC END) or b34srun$. Each sentence must end with the delimiter $ or the delimiter ;. The b34seend$ sentence is used in batch operation. The b34srun$ sentence is used in an interactive environment to force B34S to execute the command. Its use is similar to that of the SAS run; command. If the b34seend$ sentence is used, B34S will parse all commands before attempting to execute them. In this book, the terms "B34S command" and "B34S paragraph" will be used interchangeably except when referring to commands inside the matrix command.

The B34S matrix command starts a programming language, which allows customized programming. This section of B34S, although fully integrated into the procedure-driven section, is logically distinct. The matrix command programming language can load data from the rest of B34S, can read SPEAKEASY, RATS and SCA save files, or can directly read and write data. More detail on the B34S command structure is contained in Stokes (2000). A list of currently supported B34S commands or paragraphs is given in Table 1.1. Table 1.2 lists the origin of the basic code in those commands that were developed from source provided by others.

The B34S control language is completely free format and is in many ways similar to that of SAS. The SAS macro cb34sm, distributed with B34S, allows B34S to be called by a SAS user and is intended to run on all SAS platforms. This is in contrast with the SAS procedure cb34s, which only ran on SAS version 5.xx on MVS (Stokes 1986a). An example of this mode of operation is illustrated in Table 1.4. Note that the B34S sentence termination character $ is used in place of ; so as not to confuse SAS. Running B34S under SAS is recommended for applications that need the powerful SAS data step to load and process the data before calling the more specialized B34S procedure. A B34S stand-alone job is shown in Table 1.3 for comparison. In stand-alone mode the B34S data paragraph is required, while in the SAS/B34S job the B34S data paragraph does not have to be explicitly supplied.

The complete B34S command reference manuals (Stokes 2006a, 2006b) are available online. While these manuals may be used as the sole references for B34S, they contain little documentation of the statistics calculated and no output from examples of their use, although a number of sample programs are shown. The file c:\b34slm\example.mac, which contains working examples of all procedures, and this book can be thought of as a major extension of the original B34T manual written by Hodson Thornber (1966, 1967, 1968). Sections of Thornber's original manual have been included in Chapter 2 of this book, with the author's permission. Chapter 16 contains an overview of the commands available in the matrix command.
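The paragraph and sentence rules stated above (b34sexec opens a paragraph, b34seend$ or b34srun$ closes it, and every sentence ends in $ or ;) can be illustrated with a small tokenizer. This Python sketch is a hypothetical illustration of those stated rules only, not the B34S parser: the function name split_paragraphs is invented, text outside paragraphs is simply skipped rather than passed on to the next level, and quoted strings, comments, macro expansion and datacards blocks are not handled.

```python
import re


def split_paragraphs(text):
    """Group B34S-style sentences into paragraphs (hypothetical sketch).

    A paragraph opens with a sentence whose first word is b34sexec and closes
    with a b34seend or b34srun sentence; every sentence ends in '$' or ';'.
    """
    # Split on either terminator; keep non-empty, whitespace-trimmed sentences.
    sentences = [s.strip() for s in re.split(r"[$;]", text) if s.strip()]
    paragraphs, current = [], None
    for sentence in sentences:
        keyword = sentence.split()[0].lower()
        if keyword == "b34sexec":
            if current is not None:
                raise ValueError("b34sexec before the previous paragraph was closed")
            current = [sentence]
        elif keyword in ("b34seend", "b34srun"):
            if current is None:
                raise ValueError(keyword + " outside a paragraph")
            paragraphs.append(current)
            current = None
        elif current is not None:
            current.append(sentence)
    if current is not None:
        raise ValueError("last paragraph was never closed")
    return paragraphs


job = """
b34sexec list$ var x$ b34seend$
b34sexec regression; model y = x; b34srun;
"""
for paragraph in split_paragraphs(job):
    print(paragraph)
```

Running the example prints ['b34sexec list', 'var x'] and then ['b34sexec regression', 'model y = x'], showing that the two terminators are treated alike and that either closing sentence ends a paragraph.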
Table 1.1 B34S Commands

Command      Description
HELP         Provide help, generate on-line manual.
OPTIONS      Set B34S run-time options.
REGRESSION   OLS and GLS estimation. BLUS and RA analysis.
LIST         Display data in B34S data file.
PLOT         Plot and graph series in B34S data file.
PROBIT       Probit analysis on (0-1) dependent variables.
TOBIT        Tobit analysis on truncated dependent variables.
LOGLIN       Logit analysis on up to four equations at once.
ECOMP        Error-components analysis.
AUTOC        Autocorrelation and cross correlation analysis.
RR           Recursive-residual analysis.
QR           QR factorization & principal component analysis.
DATA         Load data into B34S without SAS/B34S interface.
MPROBIT      Multinomial probit analysis.
MLOGLIN      Multinomial logit analysis.
SIMEQ        2SLS, LIML, 3SLS, I3SLS, FIML, SUR estimation.
TRANSPROB    Estimate Markov probability model.
BJIDEN       Box-Jenkins identification. Spectral analysis.
BJEST        Box-Jenkins ARIMA, transfer-function estimation.
BTIDEN       Identification of VAR and VARMA models.
BTEST        Estimation of VAR, VARMA and VMA models.
VARFREQ      Spectral decomposition of VAR models.
PGMCALL      Branch to SAS, SPEAKEASY, SPSS, SCA, TSP and LIMDEP.
POLYSOLV     Solution of polynomials.
DTASSM       Data-manipulation utilities.
OPTCONTROL   Optimal control analysis.
GAMFIT       Estimate a GAM model.
KFILTER      Estimate state-space model.
SOURCE       B34S FORTRAN source manager.
SCAINPUT     B34S/SCA/RATS/MATLAB input/output option.
FORECAST     Automatic VAR forecasting model development.
MARS         Multivariate Adaptive Regression Splines (no longer available).
PISPLINE     PI method of fitting an underlying smooth function.
GENMOD       Generate data sets with given covariance structure.
HRGRAPHICS   High-resolution graphics.
SORT         Sort data.
SPECTRAL     Spectral analysis.
MAKEMENU     User menu facility.
DMF          Data Management Facility.
MVNLTEST     Multivariate tests for nonlinearity.
CITIBASE     Load Citibase data into B34S using RATS.
READVBYV     Read data variable by variable.
DESCRIBE     Calculation of various summary measures.
REG          Panel time series analysis.
ROBUST       L1, MINIMAX and OLS models.
TRANSPOSE    Calculate transpose of data matrix.
FREQ         Frequency plots and cross tabulation.
LPMAX        Linear programming.
EXPAND       Expand weighted dataset.
MATRIX       General programming language with many commands.

Table 1.2 Original Origin of Source Code for Various B34S Commands

Command      Source
REGRESSION   Thornber (1966)
LOGLIN       Nerlove-Press (1973)
ECOMP        Freiden (1973), Henry-McDonald-Stokes (1976)
MPROBIT      McKelvey-Zavoina (1971, 1975)
MLOGLIN      Kawasaki (1978, 1979)
SIMEQ        Jennings (1980)
TRANSPROB    Lee-Judge-Zellner (1970)
BJIDEN       Pack (1977)
BJEST        Pack (1977)
BTIDEN       Tiao-Box (1981), Tiao-Grupe-Hudak-Bell-Chang (1979)
BTEST        Tiao-Box (1981), Tiao-Grupe-Hudak-Bell-Chang (1979)
VARFREQ      Geweke (1982a, 1982b)
OPTCONTROL   Chow (1975, 1981)
GAMFIT       Hastie and Tibshirani (1990)
KFILTER      Aoki (1987)
MARS         Friedman (1991b) (no longer commercially available)
PISPLINE     Breiman (1991)
HRGRAPHICS   Interacter (1995a, 1995b)

LINPACK (Dongarra-Bunch-Moler-Stewart, 1979), EISPACK, LAPACK (Anderson-Bai-Bischof-Demmel-Dongarra-Du Croz-Greenbaum-Hammarling-McKenney-Ostrouchov-Sorensen, 1992) and FFTPACK are used throughout the program. Nonlinearity tests based on code supplied by Hinich (1982) are callable from a number of places in the system. All other code, with the exception of the IMSL and Interacter routines, was developed by Houston H. Stokes.

Table 1.3 B34S Run Stand Alone

b34sexec data $
input x y$
datacards$
11 22
33 44
55 66
99 77
77 88
b34sreturn$
b34seend$
b34sexec list$ var x$ b34seend$
b34sexec regression$ model y = x$ b34seend$
b34sexec robust$ model y=x$ b34seend$

Table 1.4 B34S Run Under SAS Using the cb34sm Macro

* This job uses the SAS MACRO CB34SM;
%include 'c:\b34slm\cb34sm.sas';
data junk;
input x y;
cards;
11 22
33 44
55 66
99 77
77 88
;
proc means;
* Clean files **********************************
options noxwait; run;
data _null_;
command ='erase myjob.b34';
call system(command);
* End of clean step ****************************
* ;
* Place B34S commands next after %readpgm ;
%readpgm
cards;
b34sexec list$ var x$ b34seend$
b34sexec regression$ model y = x$ b34seend$
b34sexec rr$ model y=x$ b34seend$
b34sexec describe$ b34seend$
b34sexec reg$ model y=x$ b34seend$
b34sexec options dispmoff$ b34srun$
;
run;
%cb34sm(data=junk, var=x y, u8='myjob.b34', u3='myjob.b34', options=nohead)
options noxwait; run;
* This step calls b34s and copies files ;
data _null_;
command ='b34s myjob';
call system(command);
run;
endsas;
;
;

With the goal of facilitating the import of code, B34S was designed with multi-level parsing (Stokes 1987). At the calculation, or lowest, stage the procedures run their own control language, which is usually column-dependent. The B34S command language is not parsed by any procedure with the exception of the matrix and options commands. This design facilitates getting code up fast and allows the saving of partially "compiled" code, since a column-dependent command language is very fast to execute. One level up from the bottom, the B34S language (see Table 1.3) looks very much like SAS but with important differences. The B34S parser looks at paragraphs. Each paragraph begins with the keyword b34sexec and ends with the keyword b34seend$ (B34S exec end) or b34srun$.
Outside a paragraph, the B34S parser passes the command stream to the next level, removing only comments. This allows two levels of command language to be mixed in the same file. The B34S parser first scans for any B34S macro commands, which, if found, are expanded first.9 In the next parse pass, once the B34S parser detects the keyword b34sexec, it reads the complete paragraph and writes the command language of the next level down. Hence the B34S parser stands outside the program in the sense that it is a program generator: its function is to provide a user command interface and write lower-level commands. The B34S program produces two files: a *.log file, containing a listing of the commands parsed together with any errors found, and a *.out file, containing the output of the program.

9 There is potential for confusion between the terms MACRO file and macro command. A MACRO file implements what used to be called an IBM PDS (partitioned data set) file, whose members are delimited as follows:

==NAME1
b34sexec commands here;
==
==NAME2
b34sexec commands here;
==

Such members are called by

b34sexec options include('file.mac') member(name1); b34srun;

Macro commands, on the other hand, form a programming language that stands outside the normal B34S commands and allows code generation. Examples are given below.

B34S currently runs on the platforms listed in Table 1.5. In the early 1970s B34S ran only under MVS. Later, a port was made to CMS. Over this period the IBM compilers used progressed from the G compiler, through the H compiler, to the H extended compiler and, finally, to a succession of IBM VS Fortran compilers. At every stage a conscious effort was made to run under full optimization, using the most modern routines. Stokes (2003b) lists some of this history. A major design goal of B34S is to provide one-way, and in many cases two-way, links with other software systems. This is especially true of the various PC versions of B34S.
There are currently two-way links with SAS, SPEAKEASY, SCA, MATLAB and RATS. The term "one-way link" means that B34S can make a data-loading step for the other program and pass commands to it. If the other program can be loaded under B34S with the Lahey call system(' ') subroutine, its output will be seen in the B34S output window as if it were part of B34S. The term "two-way link" means that B34S can also read data files from the other program, in addition to passing data and commands to it. The term "be called by" means that the other program can call B34S, pass data and obtain results as if B34S were a subroutine of the other program.

In 1991, B34S was ported to run under the Lahey Fortran compiler F77L-EM/32. Due to the excellent design of this compiler, only two basic changes were needed. The first was to make sure that CHARACTER data were not passed by address to a routine that expected REAL*8, a practice the IBM compilers had allowed. The second was to replace a BAL (assembler) memory allocation routine with the Lahey Fortran 90 ALLOCATE statement. The only capability in B34S that did not port was the dynamic link to a user-compiled Fortran subroutine. The developer of B34S considered implementing this facility with a DLL, although the nonlinear capability of the matrix command, which allows links to user programs written in the matrix language, makes this increasingly unlikely. An argument against implementing a DLL link is that the user would require a Fortran or C compiler and models would not be portable across platforms. In the development of B34S, every effort has been made to keep the program independent of any Microsoft® conventions. With the availability of the then state-of-the-art 486/DX33 machines and the Lahey compiler, the PC became a viable research platform.
Today, with substantially faster Intel chips, PC performance, especially under Linux, is comparable to or better than workstation performance at a substantially reduced cost.

In the early 1990s it became apparent that mainframe capability on the PC was not enough. An interface was needed, although batch capability had to be maintained. B34S was enhanced with a GUI based on the Spindrift and Graphoria libraries, and the developer of B34S started working with Don Gable, the developer of Spindrift, to test enhancements to the Spindrift library. A major addition to the new library was the doscreen subroutine, which facilitated user menus. This subroutine became the basis of the B34S makemenu command. The Spindrift library was ported to the Lahey LF90 compiler, but bugs remained. The Graphoria library never worked properly under LF90, despite many releases. At present these two libraries are used only with the 6.xx version of B34S under F77L-EM/32, which works well on smaller machines but has been frozen. In late 1995 the more powerful Interacter library was integrated with B34S. All features of the Spindrift and Graphoria libraries were implemented, enhancements to the interface were made, and substantial graphics capability was added. In 1999 the same source was made to run on Windows 98/NT/2000 with LF95, Linux with LF95, RS/6000 and Sun. A major advantage of building B34S around IMSL and Interacter is the portability that these systems provide. The Windows version of B34S makes no direct API calls itself. These are handled by Interacter, which uses the Windows API on Windows and the UNIX X Window System on all other platforms. Table 1.5 lists all the past and current versions of B34S.
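The paragraph-level pass of the multi-level parser described earlier in this section can be sketched in a few lines of Python. The sketch is illustrative only and is not taken from the B34S source: the names split_stream and command_name are hypothetical, comment removal and macro expansion are omitted, and only the delimiters named in the text (b34sexec, b34seend$ and b34srun$) are recognized. Each paragraph is returned whole, as the next level down would receive it, and everything outside a paragraph is passed through unchanged.

```python
import re
from typing import List, Tuple

# Hypothetical sketch, not B34S code. A paragraph runs from the keyword
# b34sexec to the first b34seend$ or b34srun$ that follows it; matching is
# case-insensitive and may span several lines.
PARAGRAPH = re.compile(
    r"b34sexec\b.*?(?:b34seend|b34srun)\s*\$",
    re.IGNORECASE | re.DOTALL,
)


def split_stream(text: str) -> List[Tuple[str, str]]:
    """Split a job into ('paragraph', ...) and ('passthrough', ...) pieces.

    Unlike the real B34S parser, comments are not removed and macro
    commands are not expanded here.
    """
    pieces: List[Tuple[str, str]] = []
    pos = 0
    for match in PARAGRAPH.finditer(text):
        outside = text[pos:match.start()].strip()
        if outside:  # text between paragraphs goes to the next level as-is
            pieces.append(("passthrough", outside))
        pieces.append(("paragraph", match.group(0)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        pieces.append(("passthrough", tail))
    return pieces


def command_name(paragraph: str) -> str:
    """Return the procedure named after b34sexec, e.g. 'regression'."""
    match = re.match(r"b34sexec\s+(\w+)", paragraph, re.IGNORECASE)
    if match is None:
        raise ValueError("not a b34sexec paragraph")
    return match.group(1).lower()


if __name__ == "__main__":
    job = """b34sexec list$ var x$ b34seend$
b34sexec regression$ model y = x$ b34seend$"""
    for kind, body in split_stream(job):
        print(kind, command_name(body) if kind == "paragraph" else body)
```

Applied to the stand-alone job in Table 1.3, split_stream would return the data, list, regression and robust paragraphs in order. The non-greedy match matters for the data paragraph, because its b34sreturn$ sentence must not be mistaken for a paragraph terminator.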
Table 1.5 B34S Platforms

Hardware      Operating System   Compiler          Name       Version
Frozen versions
3090          MVS                VS version 2.4    B34S       21Nov86
3090          CMS                VS version 2.6    B34S       6.23
386/486/586   DOS                F77L-EM/32        B34S       6.23a
386/486/586   DOS                F77L-EM/32        B34SI      7.11c
386/486/586   Windows 95/98/NT   LF90              B34SW      7.11c
Supported versions
RS/6000       AIX                AIX F77&F90       B34SX      8.10z
SUN           Solaris                              B34SX      8.10z
Intel         Windows NT/2K/XP   LF95              B34SLF95   8.10z
Intel         Linux              LF95              B34S       8.10z

Notes: All currently supported versions were built using the Interacter Subroutine Library and link in routines from the IMSL Fortran library. B34S version 6.23a on the PC used the Spindrift and Graphoria libraries. The LF95 versions of B34S use code targeted for the Pentium® II and III chips.

1.3 B34s Display Manager

The B34S Display Manager provides a front end to the program as well as a means of writing lower-level code. The design of this facility is sufficiently unusual to warrant discussion in its own right. This facility would not be possible without the Interacter Library. The Display Manager:

- Manages the B34S *.log file.
- Manages the B34S *.out file.
- Allows the user to edit and submit jobs.
- Provides access to a user-modifiable help facility.
- Provides access to quick graphics.
- Provides access to a user-modifiable menu generator.
- Provides access to all help files, example jobs and shell files.
- Allows calls to be made to other supported systems.

Once a job with B34S commands is submitted from the Display Manager, it is parsed in the usual manner and run by the base B34S system. Upon completion B34S returns to the Display Manager. Apart from the multi-level nature of B34S, such an organization is increasingly common. Another approach would be to have the front end directly control the operation of the program, rather than run through a lower-level control language.
What is relatively unusual about the B34S Display Manager is that the GUI menus that set up specific commands are themselves generated by programs written in the B34S language. This design has several advantages, including:

- The ability of users to customize the menus.
- The removal of menu code from the B34S load module.
- The ability to "fix" menus without the need to recompile.
- The ability to manage menu systems from the user level.
- The built-in ability of the menu language itself to write program statements in the B34S language and in the languages of other programs.

In the author's experience, the first product to provide such an extensive capability was the SAS/AF® facility. After B34S implemented this facility in the early 1990s, a few other systems, such as MATLAB and later RATS, designed similar user-extendable menu facilities. The advantage of the elegant MATLAB implementation is that it is written in Java. The advantage of the B34S approach is that it runs equally well in text or graphics mode; the text-mode implementation is useful on UNIX systems. On Microsoft systems, Visual Basic is an attempt to provide part of this capability across a limited number of platforms from outside the system. The end result of this approach is inferior to having the GUI menu generator built right into the application.

The way in which the B34S Display Manager menu generator is implemented will now be discussed. Table 1.6 lists B34S makemenu control language that generates a B34S menu, using the Interacter Software Library, for entering regression commands. The example uses the B34S macro language. The first field type=info sentence places the text "OLS Model Building" at row=2 and column=2 in the menu. The next field sentence sets a number of B34S macro variables. The third field type=input sentence displays "Beginning obs.:" at column=2 in row=4 and asks for input.
Automatic help in the form of "Blank defaults to 1" is displayed at the bottom of the screen when this line executes. After stepping through the menu and entering data, the user executes the menu by pressing the Enter key. An important advantage of the makemenu facility is that users have full control of menus and can modify them as well. At execution, the special comments of the form /$# after the pgmcards$ sentence become commands. B34S macro variable structures such as

/$# %b34sif(&in1.ne.0)%then $
/$#    ibegin=%b34seval(&in1)
/$# %b34sendif $

resolve to B34S parameters (here ibegin=) only when B34S macro variables such as &in1 are found to be not equal (NE) to 0. More information on the B34S makemenu command and the B34S command language is contained in the on-line B34S help documents.

Table 1.6 B34S MAKEMENU Commands to Generate the rr Command

B34SEXEC MAKEMENU COMMANDN('OLS Model Building')
   COMMANDH('Controls setting up a simple OLS Model. Optionally model'
            'diagnostic tests and nonlinearity tests can be requested')$
FIELD TYPE=INFO PAGE=1 ROW=2 COL1=2 TEXTCOLOR=YELLOWCHR
   TEXT(' OLS Model Building') TEXTID(' ')$
FIELD TYPE=HIDDEN PAGE=1 ROW=1 COL1=1
   PRELINE('%A34SLET IN1 = 0^ '
           '%A34SLET IN2 = 0^ '
           '%A34SLET IIN2 = 0^ '
           '%A34SLET var3 = "_NULL_"^'
           '%A34SLET white = 0 ^ '
           '%A34SLET DFTEST = 0 ^ '
           '%A34SLET PPTEST = 0 ^ '
           '%A34SLET LMTEST = 0 ^ '
           '%A34SLET ACFVARSQ = 0 ^ '
           '%A34SLET PACFVARSQ = 0 ^ '
           '%A34SLET HINICH = 0 ^ ' )$
FIELD TYPE=INPUT PAGE=1 ROW=4 COL1=2 LETNAME(IN1) FIELDTYPE=INTEGER
   DEFAULT=' ' INTRANGE(0,999999999) COL2=24
   TEXTID='Blank defaults to 1' TEXT('Beginning obs.:') $
FIELD TYPE=INPUT PAGE=1 ROW=5 COL1=2 LETNAME(IN2) FIELDTYPE=INTEGER
   DEFAULT=' ' COL2=24 TEXTID='Blank defaults to last observation'
   INTRANGE(0,999999999) TEXT('Ending Obs.:') OPTIONAL $
field type=input fieldtype=chartest page=1 row=6 col1=2
   preline('%A34SLET white=1^') col2=24
   fieldhelp('Use White or Robust SE in place of usual formula')
   textid('Enter YES to use White SE') text('White (1980) SE:')
   default=('no ') POSTSTRING('YES') $
field type=input fieldtype=chartest page=1 row=7 col1=2
   preline('%A34SLET HINICH=1^') col2=24
   fieldhelp(' Turns on Hinich residual testing options')
   textid('Enter YES to perform Hinich non linearity tests')
   text('Hinich t.:') default=('no ') POSTSTRING('YES') $
FIELD TYPE=INPUT PAGE=1 ROW=8 COL1=2 LETNAME(DFTEST) FIELDTYPE=INTEGER
   DEFAULT=' ' COL2=24 TEXTID='Set Order of Dickey-Fuller Test'
   INTRANGE(0,999999999) TEXT('D-F Test:') OPTIONAL $
FIELD TYPE=INPUT PAGE=1 ROW=9 COL1=2 LETNAME(PPTEST) FIELDTYPE=INTEGER
   DEFAULT=' ' COL2=24 TEXTID='Set order of Phillips-Perron test'
   INTRANGE(0,999999999) TEXT('P-P Test:') OPTIONAL $
FIELD TYPE=INPUT PAGE=1 ROW=10 COL1=2 LETNAME(LMTEST) FIELDTYPE=INTEGER
   DEFAULT=' ' COL2=24 TEXTID='Set order of Engle Lagrange Multiplier test'
   INTRANGE(0,999999999) TEXT('L-M Test:') OPTIONAL $
FIELD TYPE=INPUT PAGE=1 ROW=11 COL1=2 LETNAME(ACFVARSQ) FIELDTYPE=INTEGER
   DEFAULT=' ' COL2=24
   TEXTID='Set order of ACF for ARCH Test on Squared Residuals'
   INTRANGE(0,999) TEXT('ACF ARCH Test:') OPTIONAL $
FIELD TYPE=INPUT PAGE=1 ROW=11 COL1=42 LETNAME(PACFVARSQ) FIELDTYPE=INTEGER
   DEFAULT=' ' COL2=64
   TEXTID='Set order of PACF for ARCH test on Squared Residuals'
   INTRANGE(0,999) TEXT('PACF ARCH Test:') OPTIONAL $
FIELD TYPE=INPUT PAGE=1 ROW=18 COL1=2 LETNAME(var1) FIELDTYPE=QVARLIST
   DEFAULT=' ' COL2=24 TEXTID='Specify left hand variable names here'
   TEXT('Left Hand Var:') REQUIRED $
FIELD TYPE=INPUT PAGE=1 ROW=19 COL1=2 LETNAME(var2) FIELDTYPE=QVARLIST
   DEFAULT=' ' COL2=24 TEXTID='Specify right hand variables'
   TEXT('Right Hand Var:') required $
FIELD TYPE=INPUT PAGE=1 ROW=20 COL1=2 LETNAME(var3) FIELDTYPE=QVARLIST
   DEFAULT=' ' preline('%A34SLET IIN2=1^') COL2=24
   TEXTID='Specify right hand variables' TEXT('Right Hand Var:') optional $
PGMCARDS$
/$# B34SEXEC RR
/$# %B34SIF(&IN1.NE.0)%THEN $
/$#    IBEGIN=%B34SEVAL(&IN1)
/$# %B34SENDIF $
/$# %B34SIF(&IN2.NE.0)%THEN $
/$#    IEND =%B34SEVAL(&IN2)
/$# %B34SENDIF $
/$# %B34SIF(&WHITE.NE.0)%THEN $
/$#    WHITE
/$# %B34SENDIF $
/$# %b34sif(&hinich.ne.0)%then $
/$#    bispec
/$#    iturno iauto vhtest
/$# %b34sendif $
/$# %b34sif(&DFTEST.ne.0)%then $
/$#    DF ADF(%b34seval(&DFTEST)) ADFT(%b34seval(&DFTEST))
/$# %b34sendif $
/$# %b34sif(&PPTEST.ne.0)%then $
/$#    PP APP(%b34seval(&PPTEST)) APPT(%b34seval(&PPTEST))
/$# %b34sendif $
/$# %b34sif(&LMTEST.ne.0)%then $
/$#    LM(%b34seval(&LMTEST))
/$# %b34sendif $
/$# %b34sif(&ACFVARSQ.ne.0)%then $
/$#    ACFVARSQ(%b34seval(&ACFVARSQ))
/$# %b34sendif $
/$# %b34sif(&PACFVARSQ.ne.0)%then $
/$#    PACFVARSQ(%b34seval(&PACFVARSQ))
/$# %b34sendif $
/$# $
/$# MODEL %b34seval(&var1) = %b34seval(&var2)
/$# %b34sif(&iin2.ne.0)%then $
/$#    %b34seval(&var3)
/$# %b34sendif $
/$# $
/$# B34SEEND $
B34SRETURN$
B34SEEND$

The above example provides only a taste of what is possible.10

10 Bill Lattyak's WORKBENCH program, which is actually a powerful scripting program, provides a very user-friendly and powerful way to run B34S that appears to lower learning costs substantially. Modern users, having grown up in the GUI world, are not naturally drawn to scripting languages and thus are particularly helped by such aids. Scripts built by WORKBENCH can be saved and further edited by the user.

1.4 Conclusion

Depending on the specific econometric problem, the chapters in this book do not necessarily have to be read in sequential order. Chapter 2 should probably be read to get some idea of the assumptions of the basic OLS model. If only the matrix command is needed, the reader can skip to Chapter 16, which provides an introduction to this facility. Because the other chapters show matrix command applications, the reader may need to consult Chapter 16 as these are encountered to obtain a better idea of the structure of the programming language.