AsPoly Journal of Sciences, Engineering and Environmental Studies
Volume 1: No.1, March 2020, pp. 145 – 165
MODELING HUMAN DEVELOPMENT INDEX USING
DISCRIMINANT ANALYSIS
Iwuagwu, Chukwuma E.
Nwosu, Moses Obinna
Department of Statistics
Abia State Polytechnic, Aba
Abstract
The basic objective of human development is to create an
enabling environment for people to live long, healthy and
creative lives. The Human Development Index (HDI) is a
composite statistic of life expectancy, education and
income per capita indicators, which are used to rank
countries into tiers of human development. This paper
determined the critical variables that discriminate
between High and Low human development. The data were
collected from a secondary source, the United Nations
Development Programme (UNDP) 2017. Thirteen variables
(indicators) were selected for thirty-four countries
classified as high or low human development. The data
were analyzed with discriminant analysis using the
stepwise discriminant function and the location model.
The significance of the developed model was tested, and
the developed model for high and low human development
countries is given by
  0.783x1  0.490 x11
The study revealed that, of the thirteen indicators
considered to be critical, only two were the most
important discriminating variables.
Key words: Human Development Index, Location model,
Stepwise Discriminant Function, and Indicators.
1.1 Introduction
Human development has achieved almost universal
recognition as the most effective process of enabling
individuals to live their lives in the manner they value,
through enrichment of opportunities and increasing
their capacities to use basic human rights. The basic
objective of human development is to create an enabling
environment for people to live long, healthy and creative
lives.
The Human Development Index (HDI) is a composite
statistic of life expectancy, education and income per
capita indicators, which are used to rank countries into
tiers of human development. Whether a country attains High
Human Development depends on the successful development
and implementation of the necessary variables as
enumerated by the United Nations Development Programme
(UNDP).
The Human Development Index is complex, usually
requiring simultaneous attention to a wide variety of
human, budgetary and technical variables. However, the
contributions of these variables (indicators) to the
growth of a nation’s human development index (HDI) still
remain statistically insignificant.
1.2 Statement of Problems
Given the present state of national economies caused by
COVID-19 and the large number of variables used in
calculating the human development index (HDI), there is a
need to check the real contribution of these variables
using a reliable statistical tool.
1.3 Objectives
The work is intended to achieve the following:
1. To determine the variables that are critical to the
achievement of High or Low Human Development.
2. To determine the most important variables that
discriminate between High and Low Human
Development.
3. To determine the relative discriminatory power of
these variables.
4. To determine the average error rate.
1.4 Hypothesis
Ho: The model is not significant in discriminating between
High and Low Human Development.
H1: The model is significant in discriminating between
High and Low Human Development.
1.5 Significance of the Study
The study is significant because it will help to identify
redundancies among the variables that are routinely
assigned some weight on the assumption that they
contribute to the growth of a nation’s human development
index (HDI).
2.1 Review of Literature
Hamid, Mei and Yahaya (2017), in their work titled
"New Discrimination Procedure of Location Model for
Handling Large Categorical Variables", noted that the
location model proposed in the past is a predictive
discriminant rule that can classify new observations into
one of two predefined groups based on mixtures of
continuous and categorical variables. The ability of the
location model to classify new observations correctly is
highly dependent on the number of multinomial cells
created by the number of categorical variables. Their
study conducted a preliminary investigation showing that
the location model using maximum likelihood estimation
has a high misclassification rate, up to 45% on average,
when dealing with more than six categorical variables, for
all 36 data sets tested.
Such a model gave highly incorrect predictions, as it
performed badly for a large number of categorical variables
even with large sample size. To alleviate the high rate of
misclassification, a new strategy is embedded in the
discriminant rule by introducing nonlinear principal
component analysis (NPCA) into the classical location
model (cLM), mainly to handle the large number of
categorical variables. This new strategy is investigated on
some simulation and real datasets through the estimation
of misclassification rate using leave-one-out method.
The results of the numerical investigations demonstrate
the feasibility of the proposed model, as the
misclassification rate is dramatically decreased compared
to the cLM for all 18 different data settings. A practical
application using a real dataset demonstrates a significant
improvement and obtains results comparable to the best
methods compared. The overall findings reveal that the
proposed model extended the applicability range of the
location model, which was previously limited to only six
categorical variables to achieve acceptable performance.
The study showed that the proposed model with the new
discrimination procedure can be used as an alternative for
mixed-variable classification problems, primarily when
faced with a large number of categorical variables.
Ikechukwu (2016), in his work titled “Evaluation of
Error Rate Estimators in Discriminant Analysis with
Multivariate Binary Variables”, observed that classification
problems often suffer from small samples in conjunction
with a large number of features, which makes error
estimation problematic. When a sample is small, there are
insufficient data to split the sample, and the same data are
used for both classifier design and error estimation. Error
estimation can suffer from high variance, bias or both. The
problem of choosing a suitable error estimator is
exacerbated by the fact that estimation performance
depends on the rule used to design the classifier, the
feature-label distribution to which the classifier is to be
applied and the sample size.
His paper was concerned with the evaluation of error rate
estimators in two-group discriminant analysis with
multivariate binary variables. The behaviour of the eight
most commonly used estimators was compared and contrasted
by means of Monte Carlo simulation. The criterion used for
comparing the error rate estimators was the sum of squared
error rate (SSE). Four experimental factors were considered
for the simulation, namely: the number of variables, the
sample size relative to the number of variables, the prior
probability and the correlation between the variables in
the populations. The study obtained two major results.
Firstly, using the simulation experiments, the estimators
were ranked as follows: DS, O, OS, U, R, JK, P and D.
The best method was the DS estimator. Secondly, the study
concluded that it is better to increase the number of
variables because accuracy increases with an increasing
number of variables. Also, the general trend for the
estimators was an increase in error rate as sample size
decreases, while decreasing the distance between
populations generally increases the error rate. The DS
estimator was the most consistent, and thus reliable, over
all combinations of probability pattern and sample sizes.
El-Hanjouri and Hamad (2015), in their work titled
“Using Cluster Analysis and Discriminant Analysis
Methods in Classification with Application on Standard of
Living Family in Palestinian Areas”, applied methods of
multivariate statistical analysis, especially cluster
analysis (CA), in order to recognize the disparity in
family living standards among the Palestinian areas. The
research concluded that there was a convergence in family
living standards between the two areas that formed the
first cluster, of high living standards: the urban and
camp areas of the middle West Bank. There was also a
convergence among the seven areas that formed the second
cluster, of middle living standards: the urban, camp and
rural areas of the North West Bank; the urban, camp and
rural areas of the South West Bank; and the rural area of
the middle West Bank.
In addition, there was a convergence of family living
standards among the three areas that formed the third
cluster, of low living standards: the urban, rural and
camp areas of the Gaza Strip. After a comparison among
several methods of cluster analysis through cluster
validation (Hierarchical Cluster Analysis, K-means
Clustering and K-medoids Clustering), the preference was
for the Hierarchical Cluster Analysis method.
After an examination to choose the best linkage method
through the agglomerative coefficient in Hierarchical
Cluster Analysis (single linkage, complete linkage,
average linkage and Ward linkage), the preference was for
the Ward linkage method, which was selected for use in the
classification. Moreover, Discriminant Analysis (DA) was
applied to distinguish the variables that contribute
significantly to this disparity among families inside the
Palestinian areas. The results show that monthly income,
assistance, agricultural land, animal holdings, total
expenditure, imputed rent, remittances and
non-consumption expenditure contributed significantly to
the disparity.
El-Habil and El-Jazzar (2013), in their paper titled “A
Comparative Study between Linear Discriminant Analysis
and Multinomial Logistic Regression”, aimed to compare
two methods of classification, linear discriminant
analysis (LDA) and multinomial logistic regression (MLR),
using the overall classification accuracy, investigating
the quality of their prediction in terms of sensitivity
and specificity, and examining the area under the ROC
curve (AUC), in order to make the choice between the two
methods easier and to understand how the two models behave
under different data and group characteristics. Model
performance was assessed using two special cases of the
k-fold partitioning technique, the ‘leave-one-out’ and
‘hold out’ procedures. The performance evaluation for the
two methods was carried out using real data and also by
simulation.
Results show that logistic regression slightly exceeds
linear discriminant analysis in the correct classification
rate, but when taking into account sensitivity, specificity
and AUC, the differences in the AUC were negligible. By
simulation, they examined the impact of changes in
the sample size, distance between group means,
categorization, and correlation matrices between the
predictors on the performance of each method. Results
indicate that variation in sample size, values of
Euclidean distance and number of categories have a
similar impact on the results of the two methods, and
that both LDA and MLR show a significant
improvement in classification accuracy in the absence of
multicollinearity among the explanatory variables.
Fernandez (2009), in his work "Discriminant
Analysis, a Powerful Classification Technique in Predictive
Modeling", observed that discriminant analysis is one of
the classical classification techniques used to discriminate
a single categorical variable using multiple attributes.
Discriminant analysis also assigns observations to one of
the pre-defined groups based on the knowledge of the
multi-attributes. When the distribution within each group
is multivariate normal, a parametric method can be used
to develop a discriminant function using a generalized
squared distance measure.
The classification criterion is derived based on either
the individual within-group covariance matrices or the
pooled covariance matrix that also takes into account the
prior probabilities of the classes. Non-parametric
discriminant methods are based on non-parametric
group-specific probability densities. Either a kernel or
the k-nearest-neighbor method can be used to generate a
non-parametric density estimate in each group and to
produce a classification criterion. The performance of a
discriminant criterion could be evaluated by estimating
probabilities of misclassification of new observations in
the validation data.
Hendrik, Howard & Maximilian (2009) examined the
consequences of data error in data series used to
construct aggregate indicators, using the most popular
indicator of country-level economic development, the
Human Development Index (HDI). They identified three
separate sources of data error and proposed a simple
statistical framework to investigate how data error may
bias rank assignments, and identified two striking
consequences for the HDI. First, using the cutoff values
used by the United Nations to classify a country as ‘low’,
‘medium’, or ‘high’ developed, they found that
up to 45% of developing countries are misclassified.
Moreover, by replicating prior
development/macroeconomic studies, they found that key
estimated parameters such as Gini coefficients and speed
of convergence measures vary by up to 100% due to data
error.
Hendrik et al. (2009) noted that social and economic
indicators on a country are frequently collapsed into a
single, unit-free and often double-bounded index which
forms the basis for cross-country comparisons. Such
indexes are used to assess country investment risk,
political stability and development status, to name but a
few. The objective of their paper was to show some of the
consequences if indicators are subject to data error. In
their empirical analysis, they examined the United Nations’
Human Development Index (HDI), which has become the
most widely used measure to communicate a
country’s development status. The HDI is further
applied to differentiate between countries of ‘low’,
‘medium’ and ‘high’ development status.
Institutions as well as the academic literature
explicitly and implicitly accept the HDI values of 0.5 and
0.8 to separate countries into these triple bins. They
identify three sources of HDI data error and make the
following three empirical contributions. First, they
calculate country specific noise measures due to
measurement error and formula choice/inconsistencies in
the cut-off values. Second, they calculate the
misclassification measures with respect to these three
sources of data error by simulating the probabilities of
being misclassified and sensitivity analysis of the cut-off
values. Third, they reproduce prior academic studies and
again apply sensitivity analysis with respect to the three
sources of data error.
Hendrik et al. (2009) find that the HDI statistics
contain a substantial amount of noise on the order of 0.01
to 0.11 standard deviations. Secondly, they show that up
to 45% of the developing countries are misclassified due to
failure to update the cutoff values. The continuous HDI
score jointly with this framework of the discrete
classification system is vulnerable when many countries
are close to the thresholds, as is the case in the most
recent years. Third, they discuss various empirical
examples from the prior macroeconomic/development
literature where the HDI has been employed (Gini
coefficients, convergence regressions and foreign aid) and
find that its use is very problematic as key parameters of
the past academic literature vary by up to 100% in their
values.
Their results raise serious concerns about the triple-bin
classification system, and they suggest that the United
Nations should discontinue the practice of classifying
countries into these bins of human development. In their
view the cut-off values are arbitrary, can provide incentives
for strategic behavior in reporting official statistics, and
have the potential to misguide politicians, investors,
charity donors and the public at large.
Suman and Antonio (2017) present a critical
evaluation of the indices for measuring human
development and poverty in various Human Development
Reports. They showed how these indices have evolved over
time to capture various aspects of wellbeing and
deprivations. The introduction of simplified indices in
their early reports was required to catch the attention of
the mass media and policy makers to put the concept of
human development on the agenda. However, a simplified
index is not sufficient for capturing the complexity of
human lives and their development and deprivations.
These make the construction of these indices more
complex. More complex indices, however, make their
interpretation difficult. Hence, further research is
required to amend the indices in a direction that
maintains the intuitive interpretations of the indices and,
at the same time, captures the complex realities of human
development and deprivations. Another important issue
with these indices is the requirement of data.
Suman and Antonio reiterated that the consideration
of joint distribution is imperative in order to graduate an
index of wellbeing or poverty from its composite index
status to a truly multidimensional index status. The UNDP
has moved in this direction by introducing a
multidimensional measure of poverty. However, a move in
the same direction has not been possible for the
measurement of human development, primarily due to the
lack of appropriate data. Their proposals for theoretical
improvements cannot be materialized without solving the
data constraints first.
They discussed the measurement of human
development and poverty, especially in United Nations
Development Program’s global Human Development
Reports. They first outlined the methodological evolution
of different indices over the last two decades, focusing on
the well-known Human Development Index (HDI) and the
poverty indices.
Moore (2012) evaluated five discrimination procedures
for binary variables. These procedures include first and
second approximations to multinomial probabilities, the
full multinomial model, and the linear and quadratic
discriminants. The study evaluated these five procedures
through the introduction of correlation and higher-order
terms, which can be used to characterize any population
distribution, and the effect of these terms on
misclassification probabilities.
The classification of these estimates was done by
Bayes' rule. The sampling experiment was performed by
Monte Carlo simulation, and the results indicated that care
should be used in the selection of a procedure for
discrimination with binary variables. It showed that, in
populations where the log likelihood ratio undergoes a
reversal, both the Linear Discriminant Function (LDF) and
the first-order (independent variables) procedures lead to
significantly greater actual error than the full
multinomial procedure. In populations without reversals,
the LDF and first-order procedures perform better than any
of the others.
Wilson (2007) investigated the use of stratification to
improve discrimination when prior probabilities vary
across strata of a population of interest. The researcher
considered a screening rule employed to classify a
population of potential cancer patients into one of two
subpopulations, e.g. into a high-risk group or a low-risk
group, on the basis of a set of diagnostic tests; patients'
treatment is then determined by the outcome of
the classification. The work suggested how to adapt the
discriminant function to account for stratified prior
probabilities and compared the resulting misclassification
probabilities with those obtained when stratification is
ignored in favour of pooled prior probability estimates.
The study adopted a Monte Carlo analysis to simulate
data and compare the asymptotic and finite-sample
performance of these three discriminant approaches when
different prior probabilities exist across strata, using
overall misclassification probability as the criterion of
comparison. The asymptotic results indicated that
potential gains from a stratified discriminant approach
can be substantial when there is variability in the prior
probabilities. The gain in discrimination is an
increasing function of the level of variability in the prior
probabilities. The largest gains occurred when the two
subpopulations are separated at an intermediate distance
in terms of the discriminating variables. The finite-sample
results indicated that gains in discrimination ability can
be realized in small samples, both if the two events under
study are equally common and if there is at least moderate
variability in priors.
Leung (2007) considered the problem of classifying an
individual into one of two given groups, 1 and 2,
based on a random vector measurement U consisting of
both binary and continuous variables. The researcher
adopted the location model proposed by Krzanowski (1975)
and derived the asymptotic distribution of the studentized
location linear discriminant function directly, without the
inversion of the corresponding characteristic function.
The resulting plug-in estimates of the overall error of
misclassification consist of the estimate based on the
limiting contribution of the discriminant plus a correction
term to the second order. The work finally re-examined and
analyzed the example used in the medical study reported
in Chang and Afifi (1974) and calculated the value of e0.
Shia, Jianping, Kuangnan, and Shuangge (2011)
proposed a way to describe the uncertainty of allocation.
In crisp set theory, the eigenvalues of the data
matrices are either 1 or 0, which suggests that an
individual either belongs or does not belong to a specific
set. In fuzzy set theory, by contrast, the eigenvalues can
lie in the interval (0,1), and these are referred to as the
degrees of membership. The article used the multivariate
analysis approach in fuzzy theory to classify undefined
observations. The key idea in the work is that they first
gave a membership degree to each observation in every known
group as prior information, and to unknown groups by their
corresponding membership degree. Secondly, the work
used Fisher’s linear discriminant function to maximize
the ratio of the between-group sum of squares to the
within-group sum of squares by applying the initial degree
of membership for new observations and the coefficients of
the discriminant function using the unknown sample.
After that, the fitted degrees of membership for new
observations and the linear combination can be found.
Third, they determined which groups the new observations
belong to and calculated the classification error. Last, a
comparison between the fuzzy discriminant method and
canonical discrimination was made. The study was
carried out with the iris data published by Fisher (1936).
The results showed that fuzzy canonical discriminant
analysis can reduce the risk of misclassification, has
satisfactory performance, is an effective tool in
prediction and is better than canonical discriminant
analysis.
Gardner and Roux (2011) studied how the process of
classification can be performed using the biplot
methodology. The researchers noted that biplots are
regarded as the multivariate analogues of scatter plots,
allowing for visual appraisal of the structure of the data
in a few dimensions, and that biplot axes are used to relate
the plotted points to the original variables, as is the
case in ordinary scatter plots. The application of biplot
methodology in discriminant analysis follows from using
the canonical variate analysis (CVA) biplot as a graphical
representation of linear discriminant analysis (LDA).
The Mahalanobis distances between the means in the
original space are transformed to Pythagorean distances in
the canonical space. Pythagorean distance is used to
classify a new sample to the nearest class mean. The
classification regions can be indicated by appropriately
colouring each point according to the nearest class mean,
and this has the advantage of a reduction in dimension.
The reduced space can be more stable, and therefore the
dimension reduction could yield better classification
performance.
The paper focused on discriminant analysis with
categorical predictor variables and in particular the ease
of dealing with these categorical predictors by formulating
discriminant analysis in terms of biplot methodology.
Furthermore, it is known that categorical predictors can
cause problems in certain discrimination situations, in
particular where so-called reversals are present. A variable
is said to undergo a reversal if the true log ratio of the
class-conditional densities does not increase
monotonically with the number of positive predictor
variables. The work investigated the performance of
discrimination formulated in terms of biplot methodology
with a simulation study. The results showed that linear
discriminant analysis (LDA) performs worse than the
biplot-based approach.
3.1 Methodology
The data for the work were collected from a secondary
source, the United Nations Development Programme
(UNDP) reports for 2014-2018. Thirteen variables
(indicators) were selected for thirty-four countries
classified by the United Nations Development Programme as
High or Low Human Development. The data were analyzed
with Discriminant Analysis using the Stepwise
Discriminant Function and the Location Model.
3.2 Discriminant Analysis
The problem addressed by discriminant function analysis
is how well it is possible to separate two or more
groups of individuals, given measurements for these
individuals on several variables.
The variables are:
X1 represents Agriculture, value added (% of GDP)
X2 represents Exports of goods and services (% of GDP)
X3 represents Fertility rate, total (births per woman)
X4 represents GDP growth (annual %)
X5 represents Gross capital formation (% of GDP)
X6 represents Imports of goods and services (% of GDP)
X7 represents Inflation, GDP deflator (annual %)
X8 represents Life expectancy at birth, total (years)
X9 represents Military expenditure (% of GDP)
X10 represents Mortality rate, under-5 (per 1,000)
X11 represents Population growth (annual %)
X12 represents Population, total
X13 represents Urban population growth (annual %)
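As an illustration of the two-group separation problem described above, the following Python sketch computes Fisher's linear discriminant coefficients from the pooled within-group covariance matrix. The data are simulated and hypothetical (they are not the UNDP figures used in this paper); the function names `fisher_ldf` and `classify`, the group sizes of 17 each, and the two illustrative indicator columns are assumptions made only for the example.

```python
import numpy as np


def fisher_ldf(X1, X2):
    """Return (coefficients, cutoff) of Fisher's two-group linear discriminant."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-group covariance matrix.
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    # Discriminant coefficients a = Sp^{-1} (m1 - m2).
    a = np.linalg.solve(Sp, m1 - m2)
    # Midpoint between the two projected group means.
    cutoff = 0.5 * a @ (m1 + m2)
    return a, cutoff


def classify(x, a, cutoff):
    """Assign x to group 1 if its score exceeds the cutoff, else to group 2."""
    return 1 if np.asarray(x, dtype=float) @ a > cutoff else 2


# Hypothetical data: two indicators for 17 "high" and 17 "low" countries.
rng = np.random.default_rng(0)
high = rng.normal(loc=[5.0, 1.0], scale=1.0, size=(17, 2))
low = rng.normal(loc=[25.0, 2.5], scale=1.0, size=(17, 2))

a, cutoff = fisher_ldf(high, low)
print("coefficients:", a)
print("cutoff:", cutoff)
print("group for the 'high' mean:", classify(high.mean(axis=0), a, cutoff))
```

The cutoff used here is the midpoint of the projected group means, which corresponds to equal prior probabilities and equal misclassification costs; other priors would shift it.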
3.3 Stepwise Discriminant Function
In this method, variables are added to the discriminant
function one by one until it is found that adding an extra
variable does not give significantly better discrimination.
The Wilks' Lambda criterion was used as the criterion
for entering the equation. Wilks' Lambda ($\lambda$) is
defined as
$$\lambda = \frac{|S_c|}{|S_t|}$$
where
$S_c$ is the error sums of squares and cross products
(SSCP) matrix for the samples, and
$S_t$ is the total SSCP matrix, that is, the matrix of
sums of squares and cross products of the entire combined
sample regardless of which population gave rise to the
sample items under observation.
As in ANOVA, we have the relation
$$S_t = S_A + S_c$$
where $S_A$ is the among-groups SSCP matrix.
Therefore, for each of the samples, we can define the
SSCP matrix as:
$$S = \begin{pmatrix}
\sum W_1^2 & \sum W_1 W_2 & \cdots & \sum W_1 W_k \\
\sum W_2 W_1 & \sum W_2^2 & \cdots & \sum W_2 W_k \\
\vdots & \vdots & \ddots & \vdots \\
\sum W_k W_1 & \sum W_k W_2 & \cdots & \sum W_k^2
\end{pmatrix}$$
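The Wilks' Lambda criterion described in this section can be sketched in Python as the ratio of the determinant of the within-group (error) SSCP matrix to that of the total SSCP matrix, followed by a single forward step that picks the candidate variable giving the smallest Lambda. This is only a minimal sketch under stated assumptions: the data are simulated, the names `sscp`, `wilks_lambda` and `forward_step` are hypothetical, and the significance check (F-to-enter) that a full stepwise procedure applies before admitting a variable is omitted.

```python
import numpy as np


def sscp(X):
    """Sums of squares and cross products about the column means."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc


def wilks_lambda(groups, cols):
    """Wilks' Lambda |S_c| / |S_t| for the variables listed in cols."""
    cols = list(cols)
    parts = [g[:, cols] for g in groups]
    S_c = sum(sscp(p) for p in parts)   # within-group (error) SSCP
    S_t = sscp(np.vstack(parts))        # total SSCP of the combined sample
    return np.linalg.det(S_c) / np.linalg.det(S_t)


def forward_step(groups, selected, candidates):
    """Return the candidate whose addition gives the smallest Lambda."""
    scores = {j: wilks_lambda(groups, selected + [j]) for j in candidates}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Simulated data: variable 0 separates the groups, variables 1 and 2 are noise.
rng = np.random.default_rng(1)
g1 = rng.normal(size=(17, 3))
g1[:, 0] += 6.0
g2 = rng.normal(size=(17, 3))

best, lam = forward_step([g1, g2], [], [0, 1, 2])
print("first variable entered:", best)
print("Wilks' Lambda after entry:", lam)
```

Smaller values of Lambda indicate better separation, so repeating `forward_step` with the chosen variable moved into `selected` mimics the one-by-one entry of variables described above.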
3.4 Estimation of Error Rate
The success of an allocation rule can be assessed by the
probabilities of misclassification, or error rates, that it
gives rise to. If the parameters are known in the location
model, the error rates are given by:
$$p(2 \mid 1) = \sum_{m=1}^{k} p_{1m}\, \Phi\!\left\{ \frac{\log(p_{2m}/p_{1m}) - \tfrac{1}{2} D_m^2}{D_m} \right\}$$
and
$$p(1 \mid 2) = \sum_{m=1}^{k} p_{2m}\, \Phi\!\left\{ \frac{\log(p_{1m}/p_{2m}) - \tfrac{1}{2} D_m^2}{D_m} \right\}$$
where $\Phi$ is the cumulative standard normal distribution
function and
$$D_m^2 = (\mu_{1m} - \mu_{2m})' \, \Sigma^{-1} (\mu_{1m} - \mu_{2m})$$
is the Mahalanobis squared distance between $\mu_1$ and
$\mu_2$ in cell $m$ of the multinomial table.
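The location-model error rates defined in this section can be evaluated numerically once the cell probabilities and the squared Mahalanobis distances are known. The following Python sketch does this with the standard library only. The function name `location_error_rates` and the input values are hypothetical, and the equal-weight average of the two error rates is an assumption about how an "average error rate" might be formed, not a statement of the procedure used in this paper.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # cumulative standard normal distribution function


def location_error_rates(p1, p2, d2):
    """Error rates p(2|1) and p(1|2) of the location model.

    p1, p2 : cell probabilities p_1m and p_2m (all strictly positive)
    d2     : squared Mahalanobis distances D_m^2 (all strictly positive)
    """
    e21 = sum(a * PHI((log(b / a) - 0.5 * d) / sqrt(d))
              for a, b, d in zip(p1, p2, d2))
    e12 = sum(b * PHI((log(a / b) - 0.5 * d) / sqrt(d))
              for a, b, d in zip(p1, p2, d2))
    return e21, e12


# Hypothetical two-cell example (values chosen only for illustration).
p1 = [0.6, 0.4]
p2 = [0.3, 0.7]
d2 = [4.0, 1.0]

rate_21, rate_12 = location_error_rates(p1, p2, d2)
average_rate = 0.5 * (rate_21 + rate_12)  # assumes equal prior probabilities
print("p(2|1):", rate_21)
print("p(1|2):", rate_12)
print("average:", average_rate)
```

With a single cell and equal cell probabilities the expression reduces to $\Phi(-D/2)$, the familiar error rate of the linear discriminant for two normal populations with a common covariance matrix.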
4.1 Analysis
The group statistics show that the means of the
indicators lie between 0.6436 and 80.2112 for High Human
Development and between 1.3322 and 72.3882 for Low Human
Development, while the standard deviations lie
between 0.40527 and 37.22165 for High Human
Development and between 0.62177 and 13.84240 for Low
Human Development. The standard deviations of the various
factors were used in measuring the relative discriminatory
power of the variables.
The matrix reflected the amount of variation in the
sample and also the extent to which the selected
indicators are correlated. The canonical correlation of
0.905 indicated a very strong association between the
discriminant scores and group membership.
The stepwise method chose X1 and X11, that is,
Agriculture, value added (% of GDP) and Population
growth (annual %), as the two most discriminating
variables, out of thirteen, between High and Low
Human Development.
The Fisher Linear Discriminant Function (FLDF) is
  0.783x1  0.490 x11
A positive coefficient increases the discriminant
score and hence the score for High Human
Development. Agriculture is the most discriminating
variable, followed by population growth.
Testing the significance of the derived model
Hypothesis
Ho: The model is not significant in discriminating between
the High Human Development and Low Human
Development.
H1: The model is significant in discriminating between the
High Human Development and Low Human Development.
Test Statistic
Chi-square = 53.045
P-value = 0.000
Conclusion: Since the P-value = 0.000 is less than the
level of significance 0.05, we reject the null hypothesis and
conclude that the model is significant in discriminating
between High and Low Human Development. The
Chi-square of 53.045 and eigenvalue of 4.535 also confirmed
that the model was significant in discriminating between
High Human Development and Low Human
Development. The validation count showed that the model
classified 97.1% of the total sample correctly.
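The reported statistics can be cross-checked against one another with a short Python sketch. For a two-group analysis there is a single discriminant function, so Wilks' Lambda equals 1/(1 + eigenvalue) and the canonical correlation is the square root of eigenvalue/(1 + eigenvalue); the Chi-square is taken here to be Bartlett's approximation. The values n = 34 countries, p = 2 retained variables and g = 2 groups come from the text, while the count of 33 correctly classified countries is an inference from 97.1% of thirty-four and is not stated in the source.

```python
from math import log, sqrt

eigenvalue = 4.535   # reported in the text
n, p, g = 34, 2, 2   # countries, retained variables, groups

# Single discriminant function: Lambda and canonical correlation follow
# directly from the eigenvalue.
wilks = 1.0 / (1.0 + eigenvalue)                      # about 0.181
canonical_r = sqrt(eigenvalue / (1.0 + eigenvalue))   # about 0.905

# Bartlett's chi-square approximation with p(g - 1) degrees of freedom.
chi_square = -(n - 1 - (p + g) / 2.0) * log(wilks)    # about 53.04
df = p * (g - 1)

# Assumed count: 33 of the 34 countries classified correctly.
hit_rate = 33 / 34                                    # about 0.971

print(f"Wilks' Lambda         = {wilks:.3f}")
print(f"Canonical correlation = {canonical_r:.3f}")
print(f"Chi-square            = {chi_square:.2f} on {df} df")
print(f"Hit rate              = {100 * hit_rate:.1f}%")
```

Under these assumptions the computed values agree with the reported canonical correlation of 0.905 and Chi-square of 53.045 to within rounding.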
5.1 Findings
1. The work showed that the developed model,
$Z = 0.783x_1 + 0.490x_{11}$,
is a significant model for discriminating between
the High and Low Human Development groups.
2. The model revealed that two variables or
indicators (Agriculture and population growth)
are the most important in discriminating between
the High and Low Human Development groups.
3. The study showed that Agriculture accounted for the
largest share of the separation in average
discriminant scores between the High and Low groups.
It is the most important discriminatory variable.
4. The high canonical correlation shows that there is
a strong relationship between the discriminant
scores and the two groups.
5. The test statistics obtained from the analysis
confirmed that the model is significant for
discriminating between High and Low Human
Development.
6. From the validation count, the model classified
97.1% of the total sample correctly.
7. The average error rate obtained from the location
model is 0.2604. This is the proportion of errors
made by the rule, and it was taken to indicate that
the model is optimal in minimizing the
unconditional probability of misclassification.
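The 97.1% figure in finding 6 can be related to a count of countries. The sketch below assumes a sample of 34 countries, as stated in the abstract, and searches for the number of correct classifications that rounds to 97.1%. The paper does not report that count, so it is inferred here. The location-model error rate of 0.2604 is a separate quantity and is not reproduced.

```python
n_countries = 34            # sample size stated in the abstract
reported_hit_rate_pct = 97.1

# Find every count of correct classifications consistent with the reported rate.
matching_counts = [
    k for k in range(n_countries + 1)
    if round(100.0 * k / n_countries, 1) == reported_hit_rate_pct
]

n_correct = matching_counts[0]
hit_rate = n_correct / n_countries
apparent_error_rate = 1.0 - hit_rate

print(f"correct classifications consistent with 97.1%: {matching_counts}")
print(f"hit rate = {hit_rate:.1%}, apparent error rate = {apparent_error_rate:.1%}")
```

Only one count satisfies the rounding, which suggests that a single country was misclassified by the discriminant function.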
5.2 Conclusion
From the result, a linear combination of two variables,
namely Agriculture, value added (% of GDP) and
Population growth (annual %), was formed to separate the
High and Low Human Development groups, giving the model
$Z = 0.783x_1 + 0.490x_{11}$
These two variables were found to be the most important
factors that discriminate between the High and Low Human
Development groups.
The canonical correlation indicated a very high degree
of association between the discriminant scores and the
groups. These findings suggest that agriculture and a
reduction in population growth can, if properly managed,
add value to a nation's economy, especially now that
Covid-19 has created a global economic meltdown. The
government should intensify efforts to improve agriculture
and make policies that check population growth in order to
meet the challenges caused by Covid-19, as agriculture
should be the mainstay of a nation's economic development.
References
El-Habil, A. M., & El-Jazzar, M. (2013). A Comparative
Study between Linear Discriminant Analysis and
Multinomial Logistic Regression. An-Najah University
Journal for Research – Humanities. 28, 1525-1548.
El-Hanjouri, M. M. R. and Hamad, B. S. (2015). Using
Cluster Analysis and Discriminant Analysis Methods in
Classification with Application on Standard of Living
Family in Palestinian Areas. International Journal of
Statistics and Applications. 5(5):213-222
Fernandez, G. (2009). Discriminant Analysis, a Powerful
Classification Technique in Predictive Modeling.
George Fernandez University of Nevada. Reno
Gardner, S. and Roux, W. (2011). Discriminant Analysis
with categorical variables: A biplot based approach.
Hamid, H., Mei, L. M. & Yahaya, S. S. S. (2017). New
Discrimination Procedure of Location Model for
Handling
Large
Categorical
Variables.
Sains
Malaysiana. 46(6): 1001–1010
Hendrik W., Howard C. & Maximilian A. (2009). Human
Development Index:
Are Developing Countries
Misclassified?
Ikechukwu, E. (2016). Evaluation of Error Rate Estimators
in Discriminant Analysis with Multivariate Binary
Variables. American Journal of Theoretical and Applied
Statistics. 5, 173.
Leung, C. Y. (2007). The student zed location linear
discriminant function .Communication in statistics
vol. 18; 11, 3977-3990.
164
AsPoly Journal of Sciences, Engineering and Environmental Studies
Moore, D. H. (2012). Evaluation of five discriminant
procedures for Binary variables. Journal of the
American statistical Association. Vol.68; 342, 399-404.
Shia, B. C., Jianping, Z., Kuangnan, F., and Shuangge, M.
(2011). Fuzzy canonical discriminant analysis: Theory
and Practice. Communications in Statistics. vol.40:10.
1526-1539.
Suman S. and Antonio V. (2017). Measuring Human
Development and Human Deprivations. Oxford Poverty
& Human Development Initiative (OPHI)
Wilson, B. (2007). Discriminant Analysis with stratified
prior probabilities. Communication in Statistics.
Vol.23; 5, 1283-1295.
Human Development Report (2015). Work For Human
Development
Human Development Report (2014). Sustaining Human
Progress: Reducing Vulnerabilities and Building
Resilience
Human Development Report Technical Notes 2014
Human Development Report Statistical Tables 2014
Human Development Report 2018
http://hdr.undp.org/en/data