AsPoly Journal of Sciences, Engineering and Environmental Studies Volume 1: No.1, March 2020, pp. 145 – 165

MODELING HUMAN DEVELOPMENT INDEX USING DISCRIMINANT ANALYSIS

Iwuagwu, Chukwuma E. Nwosu, Moses Obinna
Department of Statistics, Abia State Polytechnic, Aba

Abstract
The basic objective of human development is to create an enabling environment for people to live long, healthy and creative lives. The Human Development Index (HDI) is a composite statistic of life expectancy, education and income per capita indicators, which are used to rank countries into tiers of human development. This paper determined the critical variables that discriminate between High and Low human development. The data were collected from a secondary source, the United Nations Development Programme (UNDP) 2017. Thirteen variables (indicators) were selected for thirty-four countries classified as High or Low human development. The data were analyzed with discriminant analysis using the stepwise discriminant function and the location model. The significance of the developed model was tested, and the developed model for High and Low human development countries is given by 0.783x1 0.490 x11. The study revealed that, out of the thirteen indicators considered to be critical, only two were important discriminatory variables.

Key words: Human Development Index, Location model, Stepwise Discriminant Function, Indicators.

1.1 Introduction
Human development has achieved almost universal recognition as the most effective process of enabling individuals to live their lives in the manner they value, through enrichment of their opportunities and an increase in their capacity to use their basic human rights. The basic objective of human development is to create an enabling environment for people to live long, healthy and creative lives.
The Human Development Index (HDI) is a composite statistic of life expectancy, education and income per capita indicators, which are used to rank countries into tiers of human development. Whether a country attains High Human Development depends on the successful development and implementation of the necessary variables as enumerated by the United Nations Development Programme (UNDP). The human development index is complex, usually requiring simultaneous attention to a wide variety of human, budgetary and technical variables. However, the contributions of these variables (indicators) to the growth of a nation's HDI remain statistically unclear.

1.2 Statement of Problems
Given the present state of national economies caused by Covid-19, and the flood of variables used in calculating the human development index (HDI), there is a need to check the real contribution of these variables using a reliable statistical tool.

1.3 Objectives
The work is intended to achieve the following:
1. Determine the variables that are critical to the achievement of High or Low Human Development.
2. Determine the most important variables that discriminate between High and Low Human Development.
3. Determine the relative discriminatory power of these variables.
4. Determine the average error rate.

1.4 Hypothesis
Ho: The model is not significant in discriminating between High and Low Human Development.
H1: The model is significant in discriminating between High and Low Human Development.

1.5 Significance of the Study
The study is significant because it will help to identify redundancies among the variables, each of which is always assigned some share of the value and is assumed to have contributed to the growth of a nation's human development index (HDI).
2.1 Review of Literature
Hamid, Mei and Yahaya (2017), in their work "New Discrimination Procedure of Location Model for Handling Large Categorical Variables", noted that the location model proposed in the past is a predictive discriminant rule that can classify new observations into one of two predefined groups based on mixtures of continuous and categorical variables. The ability of the location model to discriminate new observations correctly is highly dependent on the number of multinomial cells created by the number of categorical variables. Their study conducted a preliminary investigation showing that the location model that uses maximum likelihood estimation has a high misclassification rate, up to 45% on average, when dealing with more than six categorical variables, for all 36 data sets tested. The model gave highly incorrect predictions, performing badly for large numbers of categorical variables even with a large sample size. To alleviate the high rate of misclassification, a new strategy was embedded in the discriminant rule by introducing nonlinear principal component analysis (NPCA) into the classical location model (cLM), mainly to handle the large number of categorical variables. This new strategy was investigated on simulated and real datasets through estimation of the misclassification rate using the leave-one-out method. The results of the numerical investigations showed the feasibility of the proposed model, as the misclassification rate decreased dramatically compared to the cLM for all 18 different data settings. A practical application using a real dataset demonstrated a significant improvement and obtained results comparable to the best of the methods compared. The overall findings revealed that the proposed model extended the range of applicability of the location model, which was previously limited to only six categorical variables to achieve acceptable performance.
The study showed that the proposed model with the new discrimination procedure can be used as an alternative for problems of mixed-variable classification, primarily when facing a large number of categorical variables.

Ikechukwu (2016), in his work titled "Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables", noted that classification problems often suffer from small samples in conjunction with a large number of features, which makes error estimation problematic. When a sample is small, there are insufficient data to split the sample, and the same data are used for both classifier design and error estimation. Error estimation can suffer from high variance, bias or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. His paper was concerned with the evaluation of error rate estimators in two-group discriminant analysis with multivariate binary variables. The behaviour of the eight most commonly used estimators was compared and contrasted by means of Monte Carlo simulation. The criterion used for comparing the error rate estimators was the sum of squared error rates (SSE). Four experimental factors were considered for the simulation, namely: the number of variables, the sample size relative to the number of variables, the prior probability, and the correlation between the variables in the populations. Two major results were obtained from the study. Firstly, using the simulation experiments, the estimators were ranked as follows: DS, O, OS, U, R, JK, P and D. The best method was the DS estimator. Secondly, the study concluded that it is better to increase the number of variables, because accuracy increases with an increasing number of variables.
Also, the general trend for the estimators was an increase in error rate as sample size decreases, while decreasing the distance between populations generally increases the error rate. The DS estimator was the most consistent, and thus most reliable, over all combinations of probability pattern and sample size.

El-Hanjouri and Hamad (2015), in their work titled "Using Cluster Analysis and Discriminant Analysis Methods in Classification with Application on Standard of Living Family in Palestinian Areas", applied methods of multivariate statistical analysis, especially cluster analysis (CA), in order to recognize the disparity in family living standards among the Palestinian areas. The research concluded that there was a convergence in family living standards between the two areas that formed the first cluster, of high living standards: the urban middle West Bank and the camps of the middle West Bank. There was also a convergence of family living standards among the seven areas that formed the second cluster, of middle living standards: the urban North West Bank, the camps of the North West Bank, the rural North West Bank, the urban South West Bank, the camps of the South West Bank, the rural South West Bank and the rural middle West Bank. In addition, there was a convergence of family living standards among the three areas that formed the third cluster, of low living standards: the urban Gaza Strip, the rural Gaza Strip and the camps of the Gaza Strip. After a comparison among several methods of cluster analysis through cluster validation (Hierarchical Cluster Analysis, K-means Clustering and K-medoids Clustering), the preference was for the Hierarchical Cluster Analysis method.
Furthermore, after an examination to choose the best linkage method through the agglomerative coefficient in Hierarchical Cluster Analysis (single linkage, complete linkage, average linkage and Ward linkage), the preference was for the Ward linkage method, which was selected for use in the classification. Moreover, the Discriminant Analysis (DA) method was applied to identify the variables that contribute significantly to this disparity among families inside the Palestinian areas, and the results showed that the variables monthly income, assistance, agricultural land, animal holdings, total expenditure, imputed rent, remittances and non-consumption expenditure contributed significantly to the disparity.

El-Habil and El-Jazzar (2013), in their paper titled "A Comparative Study between Linear Discriminant Analysis and Multinomial Logistic Regression", aimed to compare two different methods of classification, linear discriminant analysis (LDA) and multinomial logistic regression (MLR), using the overall classification accuracy, investigating the quality of their predictions in terms of sensitivity and specificity, and examining the area under the ROC curve (AUC), in order to make the choice between the two methods easier and to understand how the two models behave under different data and group characteristics. Model performance was assessed using two special cases of the k-fold partitioning technique, the 'leave-one-out' and 'hold-out' procedures. The performance evaluation for the two methods was carried out using real data and also by simulation. Results showed that logistic regression slightly exceeds linear discriminant analysis in the correct classification rate, but when sensitivity, specificity and AUC are taken into account, the differences in the AUC were negligible.
By simulation, they examined the impact of changes in the sample size, the distance between group means, categorization, and the correlation matrices between the predictors on the performance of each method. Results indicated that variation in sample size, values of the Euclidean distance and different numbers of categories have a similar impact on the results for the two methods, and that both LDA and MLR show a significant improvement in classification accuracy in the absence of multicollinearity among the explanatory variables.

Fernandez (2009), in his work "Discriminant Analysis, a Powerful Classification Technique in Predictive Modeling", observed that discriminant analysis is one of the classical classification techniques used to discriminate a single categorical variable using multiple attributes. Discriminant analysis also assigns observations to one of the pre-defined groups based on knowledge of the multiple attributes. When the distribution within each group is multivariate normal, a parametric method can be used to develop a discriminant function using a generalized squared distance measure. The classification criterion is derived from either the individual within-group covariance matrices or the pooled covariance matrix, and also takes into account the prior probabilities of the classes. Non-parametric discriminant methods are based on non-parametric group-specific probability densities. Either a kernel or the k-nearest-neighbor method can be used to generate a non-parametric density estimate in each group and to produce a classification criterion. The performance of a discriminant criterion can be evaluated by estimating the probabilities of misclassification of new observations in the validation data.
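The parametric rule described above can be illustrated with a minimal sketch. It assumes the pooled-covariance form of the generalized squared distance, in which the Mahalanobis squared distance to each group mean is penalised by minus twice the log of that group's prior probability. The function names, group means, covariance matrix and priors are hypothetical illustration values and are not taken from Fernandez (2009) or from the data of this study.

```python
import numpy as np

def generalized_sq_distance(x, mean, cov_inv, prior):
    """Mahalanobis squared distance to a group mean, penalised by -2*log(prior)."""
    diff = x - mean
    return float(diff @ cov_inv @ diff) - 2.0 * np.log(prior)

def classify(x, means, pooled_cov, priors):
    """Assign x to the group with the smallest generalized squared distance."""
    cov_inv = np.linalg.inv(pooled_cov)
    d2 = [generalized_sq_distance(x, m, cov_inv, p) for m, p in zip(means, priors)]
    return int(np.argmin(d2)), d2

# Hypothetical two-group setting (illustration only)
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
pooled_cov = np.eye(2)
point = np.array([0.9, 0.9])

label_equal, _ = classify(point, means, pooled_cov, priors=[0.5, 0.5])
label_skewed, _ = classify(point, means, pooled_cov, priors=[0.01, 0.99])
print(label_equal, label_skewed)
```

With equal priors the point is assigned to the nearer mean (group 0); with a strongly unequal prior the same point moves to group 1, which is the sense in which the criterion takes the prior probabilities of the classes into account.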
Hendrik, Howard & Maximilian (2009) examined the consequences of data error in the data series used to construct aggregate indicators, using the most popular indicator of country-level economic development, the Human Development Index (HDI). They identified three separate sources of data error, proposed a simple statistical framework to investigate how data error may bias rank assignments, and identified two striking consequences for the HDI. First, using the cut-off values used by the United Nations to classify a country as of 'low', 'medium' or 'high' development, they found that currently up to 45% of developing countries are misclassified. Moreover, by replicating prior development and macroeconomic studies, they found that key estimated parameters such as Gini coefficients and speed-of-convergence measures vary by up to 100% due to data error.

Hendrik et al. (2009) noted that social and economic indicators on a country are frequently collapsed into a single, unit-free and often double-bounded index, which forms the basis for cross-country comparisons. Such indexes are used to assess country investment risk, political stability and development status, to name but a few. The objective of their paper was to show some of the consequences if indicators are subject to data error. In their empirical analysis, they examined the United Nations' Human Development Index (HDI), which has become the most widely used measure to communicate a country's development status. The HDI is further applied to differentiate between countries of 'low', 'medium' and 'high' development status. Institutions as well as the academic literature explicitly and implicitly accept the HDI values of 0.5 and 0.8 to separate countries into these three bins. They identified three sources of HDI data error and made the following three empirical contributions.
First, they calculated country-specific noise measures due to measurement error, formula choice and inconsistencies in the cut-off values. Second, they calculated misclassification measures with respect to these three sources of data error by simulating the probabilities of being misclassified and by sensitivity analysis of the cut-off values. Third, they reproduced prior academic studies and again applied sensitivity analysis with respect to the three sources of data error. Hendrik et al. (2009) found that the HDI statistics contain a substantial amount of noise, on the order of 0.01 to 0.11 standard deviations. Secondly, they showed that up to 45% of the developing countries are misclassified due to failure to update the cut-off values. The continuous HDI score, taken jointly with the discrete classification system, is vulnerable when many countries are close to the thresholds, as is the case in the most recent years. Third, they discussed various empirical examples from the prior macroeconomic and development literature where the HDI has been employed (Gini coefficients, convergence regressions and foreign aid) and found that its use is very problematic, as key parameters of the past academic literature vary by up to 100% in their values. Their results raise serious concerns about the three-bin classification system, and they suggested that the United Nations should discontinue the practice of classifying countries into these bins of human development. In their view the cut-off values are arbitrary, can provide incentives for strategic behavior in reporting official statistics, and have the potential to misguide politicians, investors, charity donors and the public at large.

Suman and Antonio (2017) presented a critical evaluation of the indices for measuring human development and poverty in various Human Development Reports.
They showed how these indices have evolved over time to capture various aspects of wellbeing and deprivation. The introduction of simplified indices in the early reports was required to catch the attention of the mass media and policy makers and to put the concept of human development on the agenda. However, a simplified index is not sufficient for capturing the complexity of human lives and their development and deprivations. These considerations make the construction of the indices more complex. More complex indices, however, are difficult to interpret. Hence, further research is required to amend the indices in a direction that maintains their intuitive interpretation and, at the same time, captures the complex realities of human development and deprivation. Another important issue with these indices is the requirement of data. Suman and Antonio reiterated that consideration of the joint distribution is imperative in order to graduate an index of wellbeing or poverty from composite index status to truly multidimensional index status. The UNDP has moved in this direction by introducing a multidimensional measure of poverty. However, a move in the same direction has not been possible for the measurement of human development, primarily due to the lack of appropriate data. Their proposals for theoretical improvements cannot be realized without first solving the data constraints. They discussed the measurement of human development and poverty, especially in the United Nations Development Programme's global Human Development Reports, first outlining the methodological evolution of the different indices over the last two decades, with a focus on the well-known Human Development Index (HDI) and the poverty indices.

Moore (2012) evaluated five discrimination procedures for binary variables.
These procedures include the first and second approximations to multinomial probabilities, the full multinomial model, and the linear and quadratic discriminants. The study evaluated these five procedures through the introduction of correlation and higher-order terms, which can be used to characterize any population distribution, and the effect of these terms on misclassification probabilities. Classification was done by Bayes' rule. The sampling experiment was performed by Monte Carlo simulation, and the results indicated that care should be taken in selecting a procedure for discrimination with binary variables. It showed that, in populations where the log likelihood ratio undergoes a reversal, both the Linear Discriminant Function (LDF) and the first-order (independent variables) procedure lead to significantly greater actual error than the full multinomial procedure. In populations without reversals, the LDF and the first-order procedure perform better than any of the others.

Wilson (2007) investigated the use of stratification to improve discrimination when prior probabilities vary across strata of a population of interest. The researcher considered a screening rule employed to classify a population of potential cancer patients into one of two subpopulations, e.g. a high-risk group or a low-risk group, on the basis of a set of diagnostic tests; the patient's treatment is then determined by the outcome of the classification. The work suggested how to adapt the discriminant function to account for stratified prior probabilities, and compared the resulting misclassification probabilities with those obtained when stratification is ignored in favour of pooled prior probability estimates.
The study adopted a Monte Carlo analysis to simulate data and compare the asymptotic and finite-sample performance of these three discriminant approaches when different prior probabilities exist across strata, using the overall misclassification probability as the criterion of comparison. The asymptotic results indicated that the potential gains from a stratified discriminant approach can be substantial when there is variability in the prior probabilities. The gain in discrimination is an increasing function of the level of variability in the prior probabilities. The largest gains occurred when the two subpopulations are separated by an intermediate distance in terms of the discriminating variables. The finite-sample results indicated that gains in discrimination ability can be realized in small samples, both if the two events under study are equally common and if there is at least moderate variability in the priors.

Leung (2007) considered the problem of classifying an individual into one of two given groups, $\pi_1$ and $\pi_2$, based on a random vector measurement consisting of both binary and continuous variables. The researcher adopted the location model proposed by Krzanowski (1975) and derived the asymptotic distribution of the studentized location linear discriminant function directly, without inversion of the corresponding characteristic function. The resulting plug-in estimate of the overall error of misclassification consists of the estimate based on the limiting distribution of the discriminant plus a correction term to the second order. The work finally re-examined and analyzed the example used in the medical study reported in Chang and Afifi (1974) and calculated the value of e(0).

Shia, Jianping, Kuangnan and Shuangge (2011) proposed a way to describe the uncertainty of allocation. In crisp set theory the characteristic values are either 1 or 0, which means that an individual either belongs or does not belong to a specific set. In fuzzy set theory, by contrast, these values can lie anywhere in the interval [0, 1] and are referred to as degrees of membership. The article used the multivariate analysis approach in fuzzy set theory to classify undefined observations. The key idea of the work is as follows. First, a membership degree in every known group is given to each observation as prior information, and observations of unknown group are represented by their corresponding membership degrees. Secondly, Fisher's linear discriminant function is used to maximize the ratio of the between-group sum of squares to the within-group sum of squares, applying the initial degrees of membership for the new observations and estimating the coefficients of the discriminant function using the unknown sample. After that, the fitted degrees of membership for the new observations and the linear combination can be found. Third, the groups to which the new observations belong are determined and the classification error is calculated. Last, the fuzzy discriminant method is compared with canonical discrimination. The study was illustrated with the iris data published by Fisher (1936). The results showed that fuzzy canonical discriminant analysis can reduce the risk of misclassification, has satisfactory performance, is an effective tool in prediction and is better than canonical discriminant analysis.

Gardner and Roux (2011) studied how the process of classification can be performed using the biplot methodology.
The researchers noted that biplots are regarded as the multivariate analogues of scatter plots, allowing visual appraisal of the structure of the data in a few dimensions, and that biplot axes are used to relate the plotted points to the original variables, as is the case in ordinary scatter plots. The application of biplot methodology in discriminant analysis follows from using the canonical variate analysis (CVA) biplot as a graphical representation of linear discriminant analysis (LDA). The Mahalanobis distances between the means in the original space are transformed to Pythagorean distances in the canonical space. The Pythagorean distance is used to classify a new sample to the nearest class mean. The classification regions can be indicated by appropriately colouring each point according to the nearest class mean, and this has the advantage of a reduction in dimension. The reduced space can be more stable, and therefore the dimension reduction could yield better classification performance. The paper focused on discriminant analysis with categorical predictor variables, and in particular on the ease of dealing with these categorical predictors by formulating discriminant analysis in terms of biplot methodology. Furthermore, it is known that categorical predictors can cause problems in certain discrimination situations, in particular where so-called reversals are present. A reversal is said to occur if the true log ratio of the class-conditional densities does not increase monotonically with the number of positive predictor variables. The work investigated the performance of discrimination formulated in terms of biplot methodology with a simulation study. The results showed that linear discriminant analysis (LDA) performs worse than the biplot-based approach.
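The statement that Mahalanobis distances in the original space become Pythagorean (Euclidean) distances in the canonical space can be checked numerically. The sketch below is an illustration under stated assumptions rather than Gardner and Roux's procedure: it whitens the data with the Cholesky factor of a within-group covariance matrix, which differs from the full CVA transformation only by a rotation and therefore leaves the distances unchanged. The covariance matrix, class means and new sample are hypothetical values.

```python
import numpy as np

def whiten(points, within_cov):
    """Map points to a space where the within-group covariance is the identity."""
    chol = np.linalg.cholesky(within_cov)      # within_cov = chol @ chol.T
    return np.linalg.solve(chol, points.T).T   # applies chol^{-1} to each row

def mahalanobis_sq(a, b, within_cov):
    """Mahalanobis squared distance between a and b in the original space."""
    d = a - b
    return float(d @ np.linalg.solve(within_cov, d))

# Hypothetical within-group covariance, class means and new sample
within_cov = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
class_means = np.array([[0.0, 0.0],
                        [3.0, 1.0]])
new_sample = np.array([1.0, 1.2])

z_means = whiten(class_means, within_cov)
z_new = whiten(new_sample[None, :], within_cov)[0]

# Squared Euclidean distances in the transformed space
euclid_sq = ((z_means - z_new) ** 2).sum(axis=1)
# Squared Mahalanobis distances in the original space
mahal_sq = np.array([mahalanobis_sq(new_sample, m, within_cov) for m in class_means])

nearest_class = int(np.argmin(euclid_sq))
print(euclid_sq, mahal_sq, nearest_class)
```

The two sets of distances agree, so allocating the new sample to the nearest class mean in the transformed space gives the same result as using the Mahalanobis distance in the original space.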
3.1 Methodology
The data for the work were collected from a secondary source, the United Nations Development Programme (UNDP) reports from 2014 to 2018. Thirteen variables (indicators) were selected for thirty-four countries classified by the United Nations Development Programme as High or Low Human Development. The data were analyzed with Discriminant Analysis using the Stepwise Discriminant Function and the Location Model.

3.2 Discriminant Analysis
The problem addressed by discriminant function analysis is how well two or more groups of individuals can be separated, given measurements for these individuals on several variables.
The variables are:
X1 represents Agriculture, value added (% of GDP)
X2 represents Exports of goods and services (% of GDP)
X3 represents Fertility rate, total (births per woman)
X4 represents GDP growth (annual %)
X5 represents Gross capital formation (% of GDP)
X6 represents Imports of goods and services (% of GDP)
X7 represents Inflation, GDP deflator (annual %)
X8 represents Life expectancy at birth, total (years)
X9 represents Military expenditure (% of GDP)
X10 represents Mortality rate, under-5 (per 1,000)
X11 represents Population growth (annual %)
X12 represents Population, total
X13 represents Urban population growth (annual %)

3.3 Stepwise Discriminant Function
In this method, variables are added to the discriminant function one by one until adding a further variable does not give significantly better discrimination. Wilks' Lambda was used as the criterion for entering the equation. Wilks' Lambda ($\lambda$) is defined as
\[ \lambda = \frac{|S_c|}{|S_t|} \]
where $S_c$ is the error sums of squares and cross products (SSCP) matrix for the samples and $S_t$ is the total SSCP matrix, that is, the matrix of sums of squares and cross products of the entire combined sample, regardless of which population gave rise to the sample items under observation. As in ANOVA, we have the relation
\[ S_t = S_A + S_c \]
where $S_A$ is the among SSCP matrix. Therefore, for each of the samples, we can define the SSCP matrix as:
\[ S = \begin{pmatrix} \sum W_1^2 & \sum W_1 W_2 & \cdots & \sum W_1 W_k \\ \sum W_2 W_1 & \sum W_2^2 & \cdots & \sum W_2 W_k \\ \vdots & \vdots & \ddots & \vdots \\ \sum W_k W_1 & \sum W_k W_2 & \cdots & \sum W_k^2 \end{pmatrix} \]

3.4 Estimation of Error Rate
The success of an allocation rule can be assessed by the probabilities of misclassification, or error rates, that it gives rise to.
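Returning for a moment to the entry criterion of Section 3.3, the following minimal sketch shows how Wilks' Lambda could be computed from the error and total SSCP matrices, and checks the relation between the total, among and error matrices numerically. The two groups are simulated with hypothetical means and sizes (17 observations each, on two variables); they are not the UNDP data used in this study, and the function names are illustrative.

```python
import numpy as np

def sscp(data):
    """Sums of squares and cross products of mean-centred data (rows = observations)."""
    centered = data - data.mean(axis=0)
    return centered.T @ centered

def among_sscp(groups):
    """Among-groups SSCP: group-size-weighted outer products of mean deviations."""
    grand_mean = np.vstack(groups).mean(axis=0)
    p = groups[0].shape[1]
    total = np.zeros((p, p))
    for g in groups:
        d = (g.mean(axis=0) - grand_mean)[:, None]
        total += len(g) * (d @ d.T)
    return total

def wilks_lambda(groups):
    """Ratio of determinants |S_c| / |S_t| for a list of group data matrices."""
    s_c = sum(sscp(g) for g in groups)     # error (within-groups) SSCP
    s_t = sscp(np.vstack(groups))          # total SSCP of the combined sample
    return np.linalg.det(s_c) / np.linalg.det(s_t), s_c, s_t

# Simulated groups (hypothetical means and sizes, illustration only)
rng = np.random.default_rng(0)
high = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(17, 2))
low = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(17, 2))

lam, s_c, s_t = wilks_lambda([high, low])
s_a = among_sscp([high, low])
print(round(lam, 4), np.allclose(s_t, s_a + s_c))
```

Values of Wilks' Lambda close to 0 indicate well-separated groups and values close to 1 indicate little separation. In a stepwise procedure, the candidate variable whose inclusion gives the smallest Lambda would typically be the next one considered for entry.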
If the parameters are known in the location model, the error rates are given by:
\[ P(2 \mid 1) = \sum_{m=1}^{k} p_{1m}\, \Phi\!\left( \frac{\log(p_{2m}/p_{1m}) - \tfrac{1}{2} D_m^2}{D_m} \right) \]
and
\[ P(1 \mid 2) = \sum_{m=1}^{k} p_{2m}\, \Phi\!\left( \frac{\log(p_{1m}/p_{2m}) - \tfrac{1}{2} D_m^2}{D_m} \right) \]
where $p_{im}$ is the probability of cell $m$ in population $\pi_i$, $\Phi$ is the cumulative standard normal distribution function and
\[ D_m^2 = (\mu_{1m} - \mu_{2m})' \Sigma^{-1} (\mu_{1m} - \mu_{2m}) \]
is the Mahalanobis squared distance between $\pi_1$ and $\pi_2$ in cell $m$ of the multinomial table.

4.1 Analysis
The group statistics show that the means of the High Human Development indicators lie between 0.6436 and 80.2112 and those of the Low Human Development indicators lie between 1.3322 and 72.3882, while the standard deviations lie between 0.40527 and 37.22165 for High Human Development and between 0.62177 and 13.84240 for Low Human Development. The standard deviations of the various factors were used in measuring the relative discriminatory power of the variables. The matrix reflected the amount of variation in the sample and also the extent to which the selected indicators are correlated. The canonical correlation of 0.905 indicated a very strong association between the discriminant scores and the two groups. The stepwise method chose X1 and X11, that is, Agriculture, value added (% of GDP) and Population growth (annual %), as the two most discriminating of the thirteen variables between High and Low Human Development. The Fisher Linear Discriminant Function (FLDF) is 0.783x1 0.490 x11. A positive coefficient increases the discriminant score and hence increases the score for High Human Development. Agriculture is the most discriminating variable, followed by population growth.

Testing the significance of the derived model
Hypothesis
Ho: The model is not significant in discriminating between High Human Development and Low Human Development.
H1: The model is significant in discriminating between High Human Development and Low Human Development.
Test Statistic
Chi-square = 53.045
P-value = 0.000
Conclusion: Since the P-value = 0.000 is less than the level of significance 0.05, we reject the null hypothesis and conclude that the model is significant in discriminating between High and Low Human Development. The Chi-square of 53.045 and the eigenvalue of 4.535 also confirmed that the model was significant in discriminating between High Human Development and Low Human Development. The validation count showed that the model was extremely accurate, classifying 97.1% of the total sample correctly.

5.1 Findings
1. The work showed that the model developed, 0.783x1 0.490 x11, is a significant model for discriminating between High and Low Human Development.
2. The model revealed that two variables or indicators (Agriculture and population growth) were the most important in discriminating between High and Low Human Development.
3. The study showed that Agriculture explained the largest share of the average discriminant score separation between the High and Low groups. It is the most important discriminatory variable.
4. The high canonical correlation shows that there was a strong relationship between the discriminant scores and group membership.
5. The statistical values obtained from the analysis confirmed that the model is significant for discriminating between High and Low Human Development.
6. From the validation count, the model was extremely accurate, classifying 97.1% of the total sample correctly.
7. The average error rate obtained from the location model is 0.2604, the proportion of errors made by the rule, which was taken to show that the model is optimal in minimizing the unconditional probability of misclassification.
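The reported statistics can be cross-checked against one another with a short calculation. The sketch below assumes the standard two-group relations: with a single discriminant function, Wilks' Lambda equals 1/(1 + eigenvalue), the canonical correlation equals the square root of eigenvalue/(1 + eigenvalue), and the significance test uses Bartlett's chi-square approximation. The eigenvalue, sample size and number of retained variables come from the text; the count of 33 correctly classified countries is an inference from the reported 97.1% of 34 and is not stated in the source.

```python
import math

eigenvalue = 4.535     # reported in Section 4.1
n_countries = 34       # thirty-four countries in the sample
n_variables = 2        # X1 and X11 retained by the stepwise method
n_groups = 2           # High and Low Human Development

# Canonical correlation implied by the eigenvalue
canonical_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Wilks' Lambda for a single discriminant function
wilks = 1.0 / (1.0 + eigenvalue)

# Bartlett's chi-square approximation and its degrees of freedom
chi_square = -(n_countries - 1 - (n_variables + n_groups) / 2.0) * math.log(wilks)
df = n_variables * (n_groups - 1)

# Assumed count: 33 of 34 correctly classified (inferred from 97.1%)
hit_rate = 33 / n_countries

print(round(canonical_corr, 3), round(wilks, 4), round(chi_square, 3), df)
print(round(100 * hit_rate, 1))
```

Under these assumptions the eigenvalue of 4.535 reproduces the reported canonical correlation of 0.905 and a chi-square of about 53.04 on 2 degrees of freedom, in line with the reported 53.045; the small difference is attributable to rounding of the eigenvalue.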
5.2 Conclusion

From the results, a linear combination of two variables, namely Agriculture, value added (% of GDP) and Population growth (annual %), was formed to separate High and Low Human Development countries, hence the model developed as 0.783x1 0.490 x11

These two variables were found to be the most important factors that discriminate between High and Low Human Development. The canonical correlation indicated a very strong association between the discriminant scores and group membership. These findings suggest that agriculture and a reduction in population growth can add value to a nation's economy if properly managed, especially now that there is a global economic meltdown caused by Covid-19. The government should intensify efforts to improve agriculture and make policies that will check population growth in order to meet the challenges caused by Covid-19, as agriculture should be the mainstay of a nation's economic development.