Proceedings of 23rd International Business Research Conference, 18-20 November 2013, Marriott Hotel, Melbourne, Australia, ISBN: 978-1-922069-36-8

Analytical Formula for Model Selection Probability

M. Shafiqur Rahman* and Syfun Nahar**

At present many different information criteria are used for choosing a better model from competing alternative models in applied Economics, Econometrics and Statistics. Correct selection probabilities are usually used for comparing the performances of different information criteria and for choosing a good criterion. For almost all existing procedures, correct model selection probabilities have been studied empirically. In this paper an analytical formula for the probability of correct selection in the context of the linear regression model is developed. It is applied to find correct selection probabilities and to compare the performances of some commonly used information criteria. It is observed that when the true model has fewer parameters BIC performs best and the remaining criteria rank in the order HQC, JIC, AIC, Sp, GCV, Cp and $\bar{R}^2$ respectively. On the other hand, when the true model has more parameters the performances of BIC, HQC, JIC, AIC, Sp, GCV, Cp and $\bar{R}^2$ are exactly reversed.

Field of Research: Econometrics, Applied Economics
JEL Codes: B23, C10 and C52

1. Introduction

Model selection criteria play an important role in applied Economics, Econometrics and Statistics. A large number of model selection criteria are now available in the literature for choosing an appropriate model for a particular data set from a number of competing alternatives, including Akaike's (1973) information criterion (AIC), Schwarz's (1978) Bayesian information criterion (BIC), Theil's (1961) adjusted $R^2$ ($\bar{R}^2$) criterion, Craven and Wahba's (1979) generalized cross validation (GCV) criterion, Hannan and Quinn's (1979) criterion (HQC), Hocking's (1976) Sp criterion, Rahman and King's (1999) joint information criterion (JIC), and Mallows' (1964) Cp criterion. The performance of any criterion varies from situation to situation, and none of them is best in all situations. We therefore need to compare the available criteria with each other to investigate which one performs better in which situation. Mills and Prasad (1992), Fox (1995) and many other authors have studied the performances of model selection criteria. Mills and Prasad compared the performances of some criteria in a number of situations: robustness to collinearity among regressors, to distributional assumptions, and to nonstationarity in time series. Fox expressed some model selection criteria as penalized log likelihood functions, ranked them in terms of the penalties paid for the addition of an extra parameter, and interpreted them as $\bar{R}^2$-type statistics. We compare some commonly used criteria on the basis of their probabilities of correct selection.

_________________________
* M. Shafiqur Rahman, Department of Operations Management and Business Statistics, College of Economics and Political Science, Sultan Qaboos University, Muscat, Sultanate of Oman.
** Syfun Nahar, Department of Mathematics and Statistics, College of Science, Sultan Qaboos University, Muscat, Sultanate of Oman.

In almost all previous research on the small sample properties of model selection procedures, selection probabilities are computed empirically. For example, King et al. (1996), Forbes et al.
(1995) and Grose and King (1994) estimated probabilities of correct selection using Monte Carlo techniques in the general model selection problem. The level of estimation error then depends on the number of replications and on the probability being estimated. In this paper an analytical formula for the probability of correct selection is obtained. Using this formula, the probabilities of correct selection are computed for different information criterion (IC) procedures. It is observed that when the model with the smaller number of regressors is true, the performance of BIC is better than that of HQC, HQC is better than JIC, JIC is better than AIC, AIC is better than Sp, Sp is better than GCV, GCV is better than Cp, and Cp is better than the $\bar{R}^2$ criterion. On the other hand, when the model with the larger number of regressors is true, the performances of BIC, HQC, JIC, AIC, Sp, GCV, Cp and $\bar{R}^2$ are exactly reversed.

The plan of this paper is as follows. Section 2 expresses eight model selection criteria in one common form based on the residual sum of squares (SS). Section 3 introduces an analytical formula for the probability of correct selection and compares the eight criteria on that basis. The final section contains some concluding remarks.

2. Expressing All Criteria in Residual SS Form

Suppose we are interested in selecting a model from m alternative regression models $M_1, M_2, \ldots, M_m$ for a given data set. Let the model $M_j$ ($j = 1, 2, \ldots, m$) be represented by
\[ Y = X_j \beta_j + U_j, \]   (2.1)
where $Y$ is an $n \times 1$ vector of observations on the dependent variable, $X_j$ is an $n \times (k_j - 1)$ matrix of observations on the regressors, $\beta_j$ is a $(k_j - 1) \times 1$ vector of regression coefficients and $U_j$ is a vector of random disturbances following $N(0, \sigma_j^2 I)$. The log-likelihood function for the model $M_j$ is
\[ L_j(\beta_j, \sigma_j^2) = -\frac{n}{2}\left[\ln \sigma_j^2 + \ln(2\pi) + \frac{1}{n\sigma_j^2}(Y - X_j\beta_j)'(Y - X_j\beta_j)\right]. \]   (2.2)
The log likelihood can be regarded as an estimator of the expected log likelihood. The mean expected log likelihood is the mean, with respect to the data, of the expected log likelihood of the maximum likelihood model, and it is a measure of the goodness of fit of a model. The model with the largest mean expected log likelihood can be considered the best model. The mean expected log likelihood can be estimated by the maximum log likelihood, which for the jth model is given by
\[ L_j(\hat\beta_j, \hat\sigma_j^2) = -\frac{n}{2}\left[\ln \hat\sigma_j^2 + \ln(2\pi) + 1\right], \]   (2.3)
where $\hat\sigma_j^2 = E_j^2/n$ is the maximum likelihood estimator (MLE) of $\sigma_j^2$, $E_j^2 = (Y - X_j\hat\beta_j)'(Y - X_j\hat\beta_j)$ is the residual sum of squares and $\hat\beta_j = (X_j'X_j)^{-1}X_j'Y$. Unfortunately, the maximum log likelihood has a general tendency to overestimate the true value of the mean expected log likelihood. This tendency is more prominent for models with a large number of parameters, and it implies that if we choose the model with the largest maximum log likelihood, a model with an unnecessarily large number of parameters is likely to be chosen. It is evident from (2.3) that choosing the model with the largest maximum log likelihood is equivalent to choosing the model with the smallest residual sum of squares $E_j^2$. Therefore the model with the smallest $E_j^2$ could be considered the best model; a small numerical sketch of these quantities is given below.
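As a purely illustrative aside (not part of the original derivation), the quantities above are straightforward to compute. The following minimal sketch, in Python with numpy and with simulated data and function names of our own choosing, evaluates the OLS estimate $\hat\beta_j$, the residual sum of squares $E_j^2$ and the maximized log likelihood (2.3) for two nested candidate models.

```python
import numpy as np

def max_log_likelihood(y, X):
    """Return (beta_hat, rss, loglik) for the model y = X b + u.

    rss is E_j^2 in the paper's notation and loglik is equation (2.3):
    -(n/2) * [ln(rss/n) + ln(2*pi) + 1].
    """
    n = len(y)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS / ML estimate of beta_j
    resid = y - X @ beta_hat
    rss = float(resid @ resid)                        # residual sum of squares E_j^2
    sigma2_hat = rss / n                              # ML estimate of sigma_j^2
    loglik = -0.5 * n * (np.log(sigma2_hat) + np.log(2 * np.pi) + 1.0)
    return beta_hat, rss, loglik

# Hypothetical example: two nested candidate models fitted to simulated data.
rng = np.random.default_rng(0)
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)               # true model uses x1 only
X_small = np.column_stack([np.ones(n), x1])
X_large = np.column_stack([np.ones(n), x1, x2])
for name, X in [("M1: intercept, x1", X_small), ("M2: intercept, x1, x2", X_large)]:
    _, rss, ll = max_log_likelihood(y, X)
    print(f"{name}  E^2 = {rss:.3f}  max log-likelihood = {ll:.3f}")
```

The larger model never has the larger residual sum of squares, which is exactly the overfitting tendency discussed above.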
If we simply choose the model with the smallest $E_j^2$, however, a model with an unnecessarily large number of parameters is likely to be chosen. To overcome this problem we need to adjust $E_j^2$ before using it for model selection. This is done by means of a penalty function that depends, among other things, on the number of parameters. Let $P_j$ be the penalty function for the model $M_j$. Then we usually select the model with the smallest $I_j$ given by
\[ I_j = E_j^2 P_j. \]   (2.4)
This suggests that if
\[ E_j^2 P_j < E_i^2 P_i, \qquad i = 1, 2, \ldots, (j-1), (j+1), \ldots, m, \]   (2.5)
then the model $M_j$ will be our choice of the best model. All existing model selection criteria can be expressed in the above form.

(a) Theil's adjusted $R^2$ criterion
Theil (1961) suggested the adjusted $R^2$ criterion for model comparison, given by
\[ \bar{R}^2 = 1 - \frac{n E_j^2}{(n - k_j)\, S_y^2}, \]   (2.6)
where $S_y^2$ denotes the total sum of squares of the dependent variable, which is the same for every candidate model. This criterion selects the model with the largest $\bar{R}^2$. The probability of correct selection under this criterion when the model $M_j$ is true can therefore be written as
\[ P(CS \mid M_j, \bar{R}^2) = P\left[\frac{n E_j^2}{n - k_j} < \frac{n E_i^2}{n - k_i};\ i = 1, 2, \ldots, (j-1), (j+1), \ldots, m\right]. \]   (2.7)
Equation (2.7) implies that for model selection the $\bar{R}^2$ criterion can be written in the equivalent form
\[ \bar{R}^2 \equiv \frac{n E_j^2}{n - k_j}. \]   (2.8)
Therefore (2.8) is a particular case of (2.4) with $P_j = \frac{n}{n - k_j}$; that is, the $\bar{R}^2$ criterion is a special case of (2.4). Using the $\bar{R}^2$ criterion we select the model with the smallest estimated error variance. Theil (1961) showed that this criterion will select the true model at least as often as any other model. Later, Schmidt (1973, 1975) showed that the $\bar{R}^2$ criterion will not help us select the true model from a set of competing alternative regression models if a candidate model contains all the variables of the true model together with some extra, irrelevant regressors.

(b) Mallows' Cp criterion
Mallows (1964, 1973) proposed a criterion for model selection which can be written for the jth model as
\[ C_p = \frac{(n + k_j) E_j^2}{n - k_j}. \]   (2.9)
Rothman (1968), Akaike (1969) and Amemiya (1980) have also suggested this criterion: Rothman called it Jp, Akaike called it the Final Prediction Error (FPE) and Amemiya called it the Prediction Criterion (PC). The Cp criterion is therefore a special case of (2.4) with $P_j = \frac{n + k_j}{n - k_j}$.

(c) Hocking's Sp criterion
Hocking (1976) suggested the Sp criterion for model selection, given by
\[ S_p = \frac{E_j^2}{(n - k_j)(n - k_j - 1)}. \]   (2.10)
Therefore the Sp criterion is a special case of (2.4) with $P_j = \frac{1}{(n - k_j)(n - k_j - 1)}$.

(d) Generalised Cross Validation criterion
Craven and Wahba (1979) proposed the Generalised Cross Validation (GCV) criterion, which can be written as
\[ GCV = \frac{E_j^2}{\left(1 - \frac{k_j}{n}\right)^2}. \]   (2.11)
Therefore the GCV criterion is a special case of (2.4) with $P_j = \frac{1}{\left(1 - \frac{k_j}{n}\right)^2}$.

(e) Hannan and Quinn criterion (HQC)
Hannan and Quinn (1979) suggested a model selection criterion which can be expressed as
\[ HQC = E_j^2 (\ln n)^{\frac{2 k_j}{n}}. \]   (2.12)
Therefore the HQC is a special case of (2.4) with $P_j = (\ln n)^{\frac{2 k_j}{n}}$.
(f) Akaike information criterion (AIC)
Akaike (1973) proposed a model selection criterion, usually denoted AIC, which can be expressed as
\[ AIC = L_j(\hat\beta_j, \hat\sigma_j^2) - k_j. \]   (2.13)
The probability of correct selection under AIC when the model $M_j$ is true can be written as
\[ P(CS \mid M_j, AIC) = P\left[L_j(\hat\beta_j, \hat\sigma_j^2) - k_j > L_i(\hat\beta_i, \hat\sigma_i^2) - k_i;\ i = 1, 2, \ldots, (j-1), (j+1), \ldots, m\right] \]
\[ = P\left[-\tfrac{n}{2}\{\ln\hat\sigma_j^2 + \ln(2\pi) + 1\} - k_j > -\tfrac{n}{2}\{\ln\hat\sigma_i^2 + \ln(2\pi) + 1\} - k_i;\ i \neq j\right] \]
\[ = P\left[-\tfrac{n}{2}\ln\hat\sigma_j^2 - k_j > -\tfrac{n}{2}\ln\hat\sigma_i^2 - k_i;\ i \neq j\right] \]
\[ = P\left[E_j^2\, e^{\frac{2 k_j}{n}} < E_i^2\, e^{\frac{2 k_i}{n}};\ i = 1, 2, \ldots, (j-1), (j+1), \ldots, m\right]. \]   (2.14)
Therefore the AIC criterion can be written in the equivalent form
\[ AIC \equiv E_j^2\, e^{\frac{2 k_j}{n}}, \]   (2.15)
so AIC is a special case of (2.4) with $P_j = e^{\frac{2 k_j}{n}}$.

(g) Schwarz Bayesian Information Criterion (BIC)
Schwarz (1978) proposed the Bayesian information criterion, usually denoted BIC, which can be expressed as
\[ BIC = L_j(\hat\beta_j, \hat\sigma_j^2) - \frac{k_j}{2}\ln n. \]   (2.16)
The probability of correct selection under BIC when the model $M_j$ is true can be written as
\[ P(CS \mid M_j, BIC) = P\left[L_j(\hat\beta_j, \hat\sigma_j^2) - \tfrac{k_j}{2}\ln n > L_i(\hat\beta_i, \hat\sigma_i^2) - \tfrac{k_i}{2}\ln n;\ i = 1, 2, \ldots, (j-1), (j+1), \ldots, m\right] \]
\[ = P\left[-\tfrac{n}{2}\{\ln\hat\sigma_j^2 + \ln(2\pi) + 1\} - \tfrac{k_j}{2}\ln n > -\tfrac{n}{2}\{\ln\hat\sigma_i^2 + \ln(2\pi) + 1\} - \tfrac{k_i}{2}\ln n;\ i \neq j\right] \]
\[ = P\left[-\tfrac{n}{2}\ln\hat\sigma_j^2 - \tfrac{k_j}{2}\ln n > -\tfrac{n}{2}\ln\hat\sigma_i^2 - \tfrac{k_i}{2}\ln n;\ i \neq j\right] \]
\[ = P\left[E_j^2\, n^{\frac{k_j}{n}} < E_i^2\, n^{\frac{k_i}{n}};\ i = 1, 2, \ldots, (j-1), (j+1), \ldots, m\right]. \]   (2.17)
Hence the BIC criterion can be written in the equivalent form
\[ BIC \equiv E_j^2\, n^{\frac{k_j}{n}}, \]   (2.18)
so the BIC criterion is a special case of (2.4) with $P_j = n^{\frac{k_j}{n}}$.

(h) Joint Information Criterion (JIC)
Rahman and King (1999) proposed the joint information criterion, usually denoted JIC, which can be expressed as
\[ JIC = L_j(\hat\beta_j, \hat\sigma_j^2) - \frac{1}{4} k_j \ln n + \frac{n}{2}\ln\left(1 - \frac{k_j}{n}\right). \]   (2.19)
The probability of correct selection under JIC when the model $M_j$ is true can be written as
\[ P(CS \mid M_j, JIC) = P\left[L_j(\hat\beta_j, \hat\sigma_j^2) - \tfrac{1}{4} k_j \ln n + \tfrac{n}{2}\ln\left(1 - \tfrac{k_j}{n}\right) > L_i(\hat\beta_i, \hat\sigma_i^2) - \tfrac{1}{4} k_i \ln n + \tfrac{n}{2}\ln\left(1 - \tfrac{k_i}{n}\right);\ i \neq j\right] \]
\[ = P\left[E_j^2\, n^{\frac{k_j}{2n}}\,\frac{n}{n - k_j} < E_i^2\, n^{\frac{k_i}{2n}}\,\frac{n}{n - k_i};\ i = 1, 2, \ldots, (j-1), (j+1), \ldots, m\right]. \]   (2.20)
Hence the JIC criterion can be written in the equivalent form
\[ JIC \equiv E_j^2\, n^{\frac{k_j}{2n}}\,\frac{n}{n - k_j}, \]   (2.21)
which is of the form (2.4) with $P_j = n^{\frac{k_j}{2n}}\,\frac{n}{n - k_j}$. Therefore the AIC, BIC, HQC, $\bar{R}^2$, Sp, Cp, JIC and GCV criteria are all special cases of (2.4) and can easily be obtained by suitable choices of $P_j$.

Table 1: Values of $P_j$ for Some Model Selection Criteria

Criterion     $P_j$
AIC           $e^{2k_j/n}$
BIC           $n^{k_j/n}$
HQC           $(\ln n)^{2k_j/n}$
JIC           $n^{k_j/(2n)}\,\dfrac{n}{n - k_j}$
$\bar{R}^2$   $\dfrac{n}{n - k_j}$
Sp            $\dfrac{1}{(n - k_j)(n - k_j - 1)}$
GCV           $\dfrac{1}{(1 - k_j/n)^2}$
Cp            $\dfrac{n + k_j}{n - k_j}$

A small computational sketch showing how these penalties can be applied in practice is given below.
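To make the common form (2.4) concrete, the following sketch (in Python; the dictionary layout, the function select_model and the illustrative numbers are our own, only the $P_j$ expressions come from Table 1) evaluates $I_j = E_j^2 P_j$ for each criterion and selects the model with the smallest value.

```python
import numpy as np

# Penalty factors P_j from Table 1, each a function of the sample size n and k_j.
PENALTIES = {
    "AIC":   lambda n, k: np.exp(2 * k / n),
    "BIC":   lambda n, k: n ** (k / n),
    "HQC":   lambda n, k: np.log(n) ** (2 * k / n),
    "JIC":   lambda n, k: n ** (k / (2 * n)) * n / (n - k),
    "RBAR2": lambda n, k: n / (n - k),
    "Sp":    lambda n, k: 1.0 / ((n - k) * (n - k - 1)),
    "GCV":   lambda n, k: 1.0 / (1 - k / n) ** 2,
    "Cp":    lambda n, k: (n + k) / (n - k),
}

def select_model(rss, ks, n, criterion):
    """Index of the model minimising I_j = E_j^2 * P_j, i.e. equation (2.4)."""
    pj = PENALTIES[criterion]
    scores = [e2 * pj(n, k) for e2, k in zip(rss, ks)]
    return int(np.argmin(scores))

# Hypothetical residual sums of squares and parameter counts for three candidates.
rss = [52.1, 48.9, 48.7]   # E_j^2 for models M1, M2, M3
ks = [3, 4, 6]             # k_j for each model
n = 30
for crit in PENALTIES:
    print(f"{crit:6s} selects M{select_model(rss, ks, n, crit) + 1}")
```

Because every criterion is the product of the same $E_j^2$ with a different penalty $P_j$, the criteria can only disagree through their penalties, which is what Section 3 exploits.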
3. Probability of Correct Selection

Suppose CS denotes the event of correct selection and $P(CS \mid M_j)$ denotes the probability of correct selection when the model $M_j$ is true. Then
\[ P(CS \mid M_j) = P\left[I_j - I_i < 0 \mid M_j;\ i = 1, 2, \ldots, (j-1), (j+1), \ldots, m\right] = P\left[E_j^2 P_j < E_i^2 P_i;\ i = 1, 2, \ldots, (j-1), (j+1), \ldots, m\right]. \]   (3.1)
As $E_j^2$ is the residual sum of squares for the model $M_j$, $E_j^2/\sigma_j^2$ follows a chi-square distribution with $(n - k_j)$ degrees of freedom. Hence
\[ P(CS \mid M_j) = P\left[\chi^2_{j,(n-k_j)}\,\sigma_j^2 P_j < \chi^2_{i,(n-k_i)}\,\sigma_i^2 P_i;\ i \neq j\right] = P\left[\frac{\chi^2_{j,(n-k_j)}/(n-k_j)}{\chi^2_{i,(n-k_i)}/(n-k_i)} < \frac{(n-k_i)\,\sigma_i^2 P_i}{(n-k_j)\,\sigma_j^2 P_j};\ i \neq j\right]. \]
As the ratio of two independent mean chi-squares follows Snedecor's F distribution,
\[ P(CS \mid M_j) = P\left[F_{(n-k_j),(n-k_i)} < P_{ij}\,\frac{\sigma_i^2}{\sigma_j^2};\ i = 1, 2, \ldots, (j-1), (j+1), \ldots, m\right], \quad \text{where } P_{ij} = \frac{(n-k_i)P_i}{(n-k_j)P_j}. \]
For Theil's adjusted $\bar{R}^2$, $P_{ij} = \dfrac{(n-k_i)\,n/(n-k_i)}{(n-k_j)\,n/(n-k_j)} = 1$. For Mallows' Cp, $P_{ij} = \dfrac{n-k_i}{n-k_j}\cdot\dfrac{(n+k_i)/(n-k_i)}{(n+k_j)/(n-k_j)} = \dfrac{n+k_i}{n+k_j}$. The values of $P_{ij}$ for the other criteria can be obtained similarly and are presented in Table 2.

Table 2: Values of $P_{ij}$ for Some Model Selection Criteria

Criterion     $P_{ij}$
AIC           $\dfrac{n-k_i}{n-k_j}\,e^{2(k_i-k_j)/n}$
BIC           $\dfrac{n-k_i}{n-k_j}\,n^{(k_i-k_j)/n}$
HQC           $\dfrac{n-k_i}{n-k_j}\,(\ln n)^{2(k_i-k_j)/n}$
JIC           $n^{(k_i-k_j)/(2n)}$
$\bar{R}^2$   $1$
Sp            $\dfrac{n-k_j-1}{n-k_i-1}$
GCV           $\dfrac{n-k_j}{n-k_i}$
Cp            $\dfrac{n+k_i}{n+k_j}$

3.1 Comparative Study Based on Probability of Correct Selection

If $k_i > k_j$, then for Cp, $P_{ij} = \dfrac{n+k_i}{n+k_j} > 1$, but if $k_i < k_j$, then $P_{ij} = \dfrac{n+k_i}{n+k_j} < 1$. Let Pcs(Cp) be the probability of correct selection under Mallows' Cp criterion. As $P_{ij}(\bar{R}^2) = 1$, it follows that Pcs(Cp) > Pcs($\bar{R}^2$) if $k_i > k_j$ and Pcs(Cp) < Pcs($\bar{R}^2$) if $k_i < k_j$.

Consider
\[ P_{ij}(GCV) - P_{ij}(C_p) = \frac{n-k_j}{n-k_i} - \frac{n+k_i}{n+k_j} = \frac{(k_i-k_j)(k_i+k_j)}{(n-k_i)(n+k_j)}, \]   (3.2)
which is positive if $k_i > k_j$ and negative if $k_i < k_j$. Therefore
Pcs(GCV) > Pcs(Cp) if $k_i > k_j$ and Pcs(GCV) < Pcs(Cp) if $k_i < k_j$.   (3.3)

Similarly,
\[ P_{ij}(S_p) - P_{ij}(GCV) = \frac{n-k_j-1}{n-k_i-1} - \frac{n-k_j}{n-k_i} = \frac{k_i-k_j}{(n-k_i)(n-k_i-1)}, \]
which is positive if $k_i > k_j$ and negative if $k_i < k_j$. Therefore
Pcs(Sp) > Pcs(GCV) if $k_i > k_j$ and Pcs(Sp) < Pcs(GCV) if $k_i < k_j$.   (3.4)

Similarly,
\[ P_{ij}(AIC) - P_{ij}(S_p) = \frac{n-k_i}{n-k_j}\,e^{2(k_i-k_j)/n} - \frac{n-k_j-1}{n-k_i-1}, \]
which is positive if $k_i > k_j$ and negative if $k_i < k_j$. Therefore
Pcs(AIC) > Pcs(Sp) if $k_i > k_j$ and Pcs(AIC) < Pcs(Sp) if $k_i < k_j$.   (3.5)

Similarly,
\[ P_{ij}(BIC) - P_{ij}(AIC) = \frac{n-k_i}{n-k_j}\,n^{(k_i-k_j)/n} - \frac{n-k_i}{n-k_j}\,e^{2(k_i-k_j)/n}, \]
which (since $\ln n > 2$ for the sample sizes considered here) is positive if $k_i > k_j$ and negative if $k_i < k_j$. Therefore
Pcs(BIC) > Pcs(AIC) if $k_i > k_j$ and Pcs(BIC) < Pcs(AIC) if $k_i < k_j$.   (3.6)

Combining (3.2), (3.3), (3.4), (3.5) and (3.6) we have
Pcs(BIC) > Pcs(AIC) > Pcs(Sp) > Pcs(GCV) > Pcs(Cp) > Pcs($\bar{R}^2$) if $k_i > k_j$, and
Pcs(BIC) < Pcs(AIC) < Pcs(Sp) < Pcs(GCV) < Pcs(Cp) < Pcs($\bar{R}^2$) if $k_i < k_j$.

Therefore, when the model with the smaller number of regressors is true, the performance of BIC is better than that of AIC, AIC is better than Sp, Sp is better than GCV, GCV is better than Cp, and Cp is better than the $\bar{R}^2$ criterion. On the other hand, when the model with the larger number of regressors is true, the performances of BIC, AIC, Sp, GCV, Cp and $\bar{R}^2$ are exactly reversed. The probabilities of correct selection for some hypothetical values of $n$, $k_i$, $k_j$, $\sigma_i^2$ and $\sigma_j^2$ are calculated from the formula above and presented in Table 3; a small computational sketch of this calculation is given first.
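The following sketch (Python with scipy) evaluates this probability for a single competing model $M_i$, using the $P_{ij}$ expressions of Table 2 for BIC, AIC and $\bar{R}^2$. The choice $\sigma_i^2/\sigma_j^2 = 1$ and the illustrative $(n, k_j, k_i)$ values are our own assumptions; the exact settings behind Table 3 are not restated here, so the printed numbers are illustrative only.

```python
import math
from scipy.stats import f

# P_ij expressions from Table 2 (only BIC, AIC and adjusted R^2 shown; others are analogous).
def p_ij(criterion, n, kj, ki):
    if criterion == "BIC":
        return (n - ki) / (n - kj) * n ** ((ki - kj) / n)
    if criterion == "AIC":
        return (n - ki) / (n - kj) * math.exp(2 * (ki - kj) / n)
    if criterion == "RBAR2":
        return 1.0
    raise ValueError("criterion not implemented in this sketch")

def prob_correct_selection(criterion, n, kj, ki, var_ratio=1.0):
    """P(CS | M_j) = P[ F_{(n-kj),(n-ki)} < P_ij * sigma_i^2 / sigma_j^2 ]
    for a single competing model M_i (two-model comparison)."""
    threshold = p_ij(criterion, n, kj, ki) * var_ratio
    return f.cdf(threshold, dfn=n - kj, dfd=n - ki)

# Hypothetical example: true model M_j with kj = 3 regressors against a rival
# with ki = 7, sample size n = 20 and sigma_i^2 / sigma_j^2 assumed equal to 1.
for crit in ("BIC", "AIC", "RBAR2"):
    print(crit, round(prob_correct_selection(crit, n=20, kj=3, ki=7), 3))
```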
Table 3: Probabilities of Correct Selection for Some Criteria

  n   kj  ki    BIC    HQC    JIC    AIC    Sp     GCV    Cp     $\bar{R}^2$
 20    3   7   0.988  0.976  0.976  0.972  0.895  0.889  0.847  0.768
 20    7   3   0.184  0.276  0.279  0.301  0.579  0.593  0.669  0.768
 20    2   4   0.931  0.910  0.907  0.904  0.832  0.828  0.814  0.768
 20    4   2   0.482  0.542  0.550  0.556  0.692  0.697  0.716  0.768
 20    3   4   0.868  0.852  0.850  0.847  0.803  0.801  0.791  0.768
 20    4   3   0.634  0.662  0.665  0.668  0.730  0.733  0.744  0.768
 30    3   7   0.966  0.946  0.942  0.933  0.850  0.847  0.827  0.768
 30    7   3   0.338  0.430  0.444  0.474  0.664  0.669  0.699  0.768
 30    2   4   0.899  0.878  0.873  0.866  0.809  0.807  0.801  0.768
 30    4   2   0.569  0.615  0.624  0.636  0.723  0.724  0.733  0.768
 30    3   4   0.843  0.829  0.826  0.822  0.789  0.789  0.784  0.768
 30    4   3   0.675  0.696  0.700  0.705  0.746  0.746  0.751  0.768
 40    3   7   0.945  0.921  0.915  0.903  0.828  0.827  0.815  0.768
 40    7   3   0.432  0.513  0.528  0.559  0.697  0.699  0.715  0.768
 40    2   4   0.878  0.857  0.853  0.845  0.798  0.797  0.793  0.768
 40    4   2   0.614  0.652  0.660  0.673  0.736  0.737  0.741  0.768
 40    3   4   0.829  0.816  0.814  0.809  0.784  0.783  0.781  0.768
 40    4   3   0.696  0.713  0.716  0.722  0.752  0.752  0.755  0.768
 60    3   7   0.913  0.886  0.881  0.866  0.807  0.807  0.801  0.768
 60    7   3   0.536  0.598  0.609  0.637  0.724  0.725  0.732  0.768
 60    2   4   0.852  0.834  0.830  0.821  0.788  0.787  0.785  0.768
 60    4   2   0.661  0.689  0.695  0.707  0.748  0.748  0.750  0.768
 60    3   4   0.813  0.803  0.801  0.796  0.778  0.778  0.777  0.768
 60    4   3   0.717  0.730  0.733  0.738  0.758  0.758  0.759  0.768
100    3   7   0.874  0.849  0.845  0.830  0.791  0.791  0.789  0.768
100    7   3   0.623  0.666  0.672  0.694  0.744  0.744  0.746  0.768
100    2   4   0.826  0.811  0.809  0.801  0.780  0.779  0.779  0.768
100    4   2   0.700  0.720  0.723  0.733  0.756  0.756  0.757  0.768
100    3   4   0.798  0.790  0.789  0.785  0.774  0.774  0.773  0.768
100    4   3   0.735  0.745  0.746  0.751  0.762  0.762  0.763  0.768

Based on the information in Table 3 we may conclude that in general
Pcs(BIC) > Pcs(HQC) > Pcs(JIC) > Pcs(AIC) > Pcs(Sp) > Pcs(GCV) > Pcs(Cp) > Pcs($\bar{R}^2$) if $k_i > k_j$, and
Pcs(BIC) < Pcs(HQC) < Pcs(JIC) < Pcs(AIC) < Pcs(Sp) < Pcs(GCV) < Pcs(Cp) < Pcs($\bar{R}^2$) if $k_i < k_j$.

4. Concluding Remarks

We have developed an analytical formula for the probability of correct selection in regression model selection. We have applied it to find the probabilities of correct selection under the AIC, BIC, JIC, GCV, HQC, Sp, Cp and $\bar{R}^2$ criteria and to compare these criteria among themselves. The analytical formula is easy to apply and less time consuming than Monte Carlo simulation. It is observed that when the true model has fewer parameters BIC performs best and the remaining criteria rank in the order HQC, JIC, AIC, Sp, GCV, Cp and $\bar{R}^2$ respectively; when the true model has more parameters the performances of BIC, HQC, JIC, AIC, Sp, GCV, Cp and $\bar{R}^2$ are exactly reversed. It is well known that choosing an appropriate penalty function is the main problem in developing a new criterion. By calculating the probability of correct selection exactly we expect to gain better control over the choice of penalty function; a brief sketch of this idea is given below.
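As a rough illustration of that idea (our own, not a procedure taken from the paper): in a two-model comparison the analytical formula can be inverted, so the value of $P_{ij}$, and hence the implied penalty, needed to reach a target probability of correct selection follows directly from the F quantile function.

```python
from scipy.stats import f

def required_p_ij(target, n, kj, ki, var_ratio=1.0):
    """Smallest P_ij giving P(CS | M_j) >= target in a two-model comparison,
    obtained by inverting P[ F_{(n-kj),(n-ki)} < P_ij * var_ratio ] = target."""
    return f.ppf(target, dfn=n - kj, dfd=n - ki) / var_ratio

# Hypothetical example: what P_ij would a criterion need in order to pick the
# smaller true model (kj = 3) over a rival (ki = 7) with probability 0.95 at n = 20?
print(round(required_p_ij(0.95, n=20, kj=3, ki=7), 3))
```

A penalty designer could then check whether a proposed $P_j$ delivers at least this $P_{ij}$ for the sample sizes of interest.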
At present many information criteria are used to choose between competing alternative models. If we rank the available criteria on the basis of their probabilities of correct selection, we see that the rankings vary dramatically from one model to another. We therefore need to develop a new criterion that gives a consistent probability of correct selection across all competing models. We are currently working on this problem and expect to report on it in a future paper.

References

Akaike, H 1969, 'Fitting autoregressive models for prediction', Annals of the Institute of Statistical Mathematics, 21, 243-247.
Akaike, H 1973, 'Information theory and an extension of the maximum likelihood principle', in BN Petrov and F Csaki (eds), Proceedings of the Second International Symposium on Information Theory, Akademiai Kiado, Budapest, 267-281.
Amemiya, T 1980, 'Selection of regressors', International Economic Review, 21, 331-354.
Craven, P and Wahba, G 1979, 'Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross validation', Numerische Mathematik, 31, 377-403.
Forbes, CS, King, ML and Morgan, A 1995, 'Small sample variable selection procedures', Proceedings of the 1995 Econometrics Conference at Monash, 343-360.
Fox, KJ 1995, 'Model selection criteria: a reference source', University of British Columbia and University of NSW, School of Economics, Sydney, Australia.
Grose, SD and King, ML 1994, 'The use of information criteria for model selection between models with equal number of parameters', paper presented at the 1994 Australian Meeting of the Econometric Society.
Hannan, EJ and Quinn, BG 1979, 'The determination of the order of an autoregression', Journal of the Royal Statistical Society, Series B, 41, 190-195.
Hocking, RR 1976, 'The analysis and selection of variables in linear regression', Biometrics, 32, 1-49.
King, ML 1981, 'The Durbin-Watson bounds test and regressions without an intercept', Australian Economic Papers, 20, 161-170.
King, ML, Forbes, CS and Morgan, A 1996, 'Improved small sample model selection procedures', paper presented at the World Congress of the Econometric Society, Tokyo.
Mallows, CL 1964, 'Choosing variables in a linear regression: a graphical aid', presented at the Central Regional Meeting of the Institute of Mathematical Statistics, Manhattan, Kansas (May).
Mallows, CL 1973, 'Some comments on Cp', Technometrics, 15, 661-676.
Mills, JA and Prasad, K 1992, 'A comparison of model selection criteria', Econometric Reviews, 11, 201-233.
Rahman, MS and King, ML 1999, 'Improved model selection criterion', Communications in Statistics - Simulation and Computation, 28(1), 51-71.
Rahman, MS and Nahar, S 2004, 'Generalized model selection criterion', Far East Journal of Theoretical Statistics, 12(2), 117-147.
Rothman, D 1968, 'Letter to the editor', Technometrics, 10, 432.
Schmidt, P 1973, 'Calculating the power of the minimum standard error choice criterion', International Economic Review, 14, 253-255.
Schmidt, P 1975, 'Choosing among alternative linear regression models: a correction and some further results', Atlantic Economic Journal, 3, 61-63.
Schwarz, G 1978, 'Estimating the dimension of a model', The Annals of Statistics, 6, 461-464.
Theil, H 1961, Economic Forecasts and Policy, 2nd edn, North-Holland, Amsterdam.