Research Journal of Mathematics and Statistics 3(2): 77-81, 2011 ISSN: 2040-7505

advertisement
Research Journal of Mathematics and Statistics 3(2): 77-81, 2011
ISSN: 2040-7505
© Maxwell Scientific Organization, 2011
Received: March 21, 2011
Accepted: April 20, 2011
Published: May 25, 2011
Discriminant and Classification Analysis of Health Status of
Bell Pepper (Capsium annum)
Oludare S. Ariyo, O. Arogundade and Ayoola M. Abdul-Rafiu
National Horticultural Research Institute, PMB, 5432, Jericho Idi- Ishin, Ibadan, Nigeria
Abstract: This study is considered as an attempt for employing the discriminant analysis and classification
method for the purpose of achieving the assessment of discriminnant function through which we can discover
the health status of bell pepper based on measured growth parameters. Linear Discriminant Function (LDF) was
used as a tool for statistics analysis. It was estimated on the bases of data on growth attributes for data taken
from 6 plants from each plot; 3 healthy and 3 diseased. The growth attributes taken were Plant height, number
of leaves and number of fruit six and ten weeks after transplanting. The study results show high significant
difference between both groups based on Hotelling T2 and wilks’ lambda statistics and different variance
covariance structure based on Berlet’s approximation to Chi-square statistics. Finally, the study results showed
that the adapted classification rule lead to the classification of about 83% of the bell pepper to the group which
they belong and 17% to the wrong group. These results shows that discriminant analysis may be an efficient
and useful tool for predicting the health status of bell pepper on the bases of a few growths attributes that are
routinely assessed by researchers and farmers.
Key words: Hotelling T2, Linear Discriminant Function (LDF), wilks’ lambda
functions (or variable groups) into classification of
microarray data. Discriminant analysis methods were
integrated to food authentication (Murphy et al., 2010)
using a model-based dicriminant analysis methods that
includes variable selection. The DA was applied to study
the effect of Irrigation Frequency on the growth and yield
of Okro in Ibadan (Dauda et al., 2007). In recent years, a
number of developments have occurred in DA procedures
for the analysis of data from repeated measures designs.
Specifically, DA procedures have been developed for
repeated measures data characterized by missing
observations and/or unbalanced measurement occasions,
as well as high-dimensional data in which measurements
are collected repeatedly on two or more variables (Lix and
Sojobi, 2010). Yaakob et al. (2010) used DA to exploit
the classification of Lard and other commercial vegetable
oil and animal facts, the result shows that all vegetable
fats/oils and animal facts, including lard are clustered in
a district group. Discriminant analysis has been used
successfully in horticulture in classifying plant material
(Cruz-Castillo et al., 1994; Ebdon et al., 1998;
Pydipati et al., 2006). Few literatures has applied DA to
determine the health Status of an horticultural crop due to
it complex nature, this paper therefore considered as an
attempt for employing the discriminant analysis and
INTRODUCTION
Capscium pepper refrer primarily to Capsicum
annum L and Capsicum frutescens L., plants used in the
manufacture of selected commercial products known for
their pungency and colour (Simon et al., 1984). Capsium
pepper is of the night shade family solanaceae along with
egg plant and tomato, it is a vegetable grown mainly for
its fruit (Hedge, 1997). Tropical south America, particular
Brazil is thought to be the original home of pepper, it is
now widely cultuvavated in America , South America,
Europe, some counties in Asia pacific and African
(Shoemaker and Tesky, 1995).
Discriminant Analysis (DA) encompasses procedures
for classifying observation into groups (i.e., Predictive
discriminant analysis) and describing the relative
importance of variable for distinguishing amongst groups
(Lix and Sojobi, 2010). It based on the conception of a
function which could provide the identification of the
subject of analysis (Kara et al., 2005). Discriminant
analysis could be put into application in case data
matrix has a common variance-covariance matrix
(Atewi et al., 2006; Mikkonen et al., 2006). Tai and
Pan (2007) proposed modified linear discriminant
methods to integrate biological knowledge of gene
Corresponding Author: Oludare S. Ariyo, National Horticultural Research Institute, PMB, 5432, Jericho Idi- Ishin, Ibadan,
Nigeria. Ph: +234-08035206932
77
Res. J. Math. Stat., 3(2): 77-81, 2011
⎛n n ⎞
trW −1 B = ⎜ 1 2 ⎟ d ′W −1d
⎝ n ⎠
classification method for the purpose of achieving the
assessment of discriminnant function through which we
can discover the health status of bell pepper based on
measured growth parameters.
(6)
And corresponding eigenvector i:
THEORETICAL SECTION
a = W-1 d
Another approach to the discriminnat problem based
on a data matrix can be made by not assuming any
particular parametric from for the distribution of the
population B1, B2, ..., Bn but by merely looking for a rule
to discrimante between them (Mardia et al., 1971).
Fisher (1938) looks at the linear function a’x which
maximized the ratio of the between-group sum of squares
to the within-groups sum of squares that are a linear
combination of the columns of X. Then y has the total
sum of squares:
Y!Hy = a!XHAa = a!TA
The dicriminant rule becomes: allocate x to B1 if >0 to B2
otherwise. Testing the significant of the separation
between the two populations.
Suppose the population B1 and B2 are multivariate
with a common covariance matrix 3, we know that to test
the following statistical hypothesis:
H0 : :1 = :2 vs H1 : :1 … :2
which can be partitioned as a sum of within-groups sum
of squares:
T2 =
2
{
}
− ∑ ni a ′( xi − x )
2
T2 =
= a ′Ba (3)
F=
)
(
)
( n1 + n2 − 2)
( n1 + n2 − p − 1)
Fp ,n1 + n2 − p −1 (α )
( n1 + n2 − p − 1) T 2 ~ F
p ,n + n − p −1
( n1 + n2 − 2)
1
2
(9)
(10)
(11)
where Fp ,n + n − p −1 ( α ) is the value of F with degree of
1
1
freedom of numerator p and degree of freedom of
denominator n1 + n2 – p – 1. So, we reject the null
hypothesis H0 : :1 = :2 at:
the level of significant a when:
(4)
If is the vector which maximized (4) we shall call the
linear function a’x Fisher’s discriminant function or the
first canonical variant. Fisher’s discriminant function is
most important in the special case of g = 2 groups. Then
B has rank one and can be written as:
⎛n n ⎞
B = ⎜ 1 2 ⎟ dd ′
⎝ n ⎠
(
(Anderson, 1984)
Where the mean of the ith is sub-vector of y and is the
(centring matrix.
The ratio is given by:
a ′Ba
a ′Wa
′ −1
n1n2
X 1 − X 2 S pooled
X1 − X 2
n1 + n2
(2)
Plus the between-groups sum of squares:
∑ ni ( yi − y )
(8)
we can use the hotelling’s T2-statistics as:
(1)
∑ yi′Hi yi = ∑ y ′ X i′Hi X i a = a ′Wa
(7)
F > Fp,n1 + n1 − p −1 (α )
(12)
Apparent error rate (APER) can be calculated from
the confussion matrix, which shows actual versus
predicted group membership. According to (Richard and
Dean, 1998), the confussion matrix has the form:
(5)
where d = x1 − x2 . This WG1 B has only one non-zero
B1
n1c
n2M = n1 – n1c
eigenvalue which can be found explicitly. This eigenvalue
equals:
78
B2
n1M = n1 – n1c
n2c
Res. J. Math. Stat., 3(2): 77-81, 2011
Table 1: Test for normality (Shapiro-Wilk) for each of the growth
attributes of bell pepper (Capsium annum)
Variables
Statistics
p-value
Significant
X1
0.8965
Pr>W
0.0001
X2
0.9252
Pr>W
0.0013
X3
0.8019
Pr>W
0.0001
X4
0.79115
Pr>W
0.0001
X5
0.8170
Pr>W
0.0001
X6
0.8683
Pr>W
0.0001
X7
0.9351
Pr>W
0.0033
X8
0.9366
Pr>W
0.0001
X1 = Plant height at 6 week after transplant, X2 = Plant height at 10
week after transplant, X3 = number of leaves at 6 weeks after transplant,
X4 = Number of leaves at 10 weeks after transplant, X5 = Number of
branches at 6 weeks after transplant,X6 = Number of branches at 10
weeks after transplant, X7 = Stem girth at 6 weeks after transplant, X8
= Stem girth at 10 weeks after transplant
where
n1c = Number of B1 items correctly as B1 items
n2c = Number of B2 items correctly as B2 items
n1M = Number of B1 items misclassified as B2 items
n1M = Number of B2 items misclassified as B1 items
The apparent error rate is then:
APER =
n1 M + n2 M
n1c + n2 c
(13)
APPLICATION SECTION
The experiment was carried out at the research farm
of National Horticultural Research Institutes (NIHORT),
Ibadan. The experiment was laid out in the Randomized
Complete Block Design (RCBD) with three replications.
The experiment field was first divided into three blocks.
Each block was further divided into 10 unit plots. The size
of each unit plot was 2 m × 2 m . The blocks and the plots
were spaced at 0.5 m between and within. Thirty five-dayold healthy seedlings were transplanted in the
experimental plots at a spacing of 50 cm × 50 cm. The
data taken were 6 plants from each plot; 3 healthy and 3
diseased. The diseased plants were shows varying
symptoms of disease while the healthy plant was
apparently healthy. The following growth attributes were
taken Plant height, number of leaves and number of fruit.
The Recorded data were analysed using Statistical
Analysis System (SAS) version 9.1.
Table 2: Wilks’ Lambda and chi-square test values pertained to
discriminnat function
Statistics
Hypothesis
value
p-values
0.3707
0.001
Wilks’ Lambda
H0 : :1 = :2
H1 : :1 … :2
Chi-square
H0 : 3 1 = 3 2
186.370
0.9927
H0 : 31 … 32
Table 3: Pooled within canonical structure
Variable
CAN1
CAN2
X1
- 0.604617
0.438617
X2
0.971020
0.458473
X3
0.242800
0.362540
X4
- 0.357270
0.246691
X5
- 0.418430
0.197429
X6
0.575110
0.165260
X7
0.316336
0.576054
X8
0.791720
0.702272
X1 = Plant height at 6 week after transplant, X2 = Plant height at 10
week after transplant,X3 = number of leaves at 6 weeks after transplant,
X4 = Number of leaves at 10 weeks after transplant,X5 = Number of
branches at 6 weeks after transplant,X6= Number of branches at 10
weeks after transplant, X7 = Stem girth at 6 weeks after transplant, X8
= Stem girth at 10 weeks after transplant
RESULTS AND DISCUSSION
The Shapiro-Wilks (W) test was used to test the
normality assumption of the growth attributes of
Capsicum annum across the health status. The normality
assumption was significant at 5% level of significant; this
implies that all the growth attributes are approximality
normal (Table 1).
To test whether the population mean and differ
significantly, Wilks’ lambda statistics (Table 2) was used.
We conclude that the separation between the two groups
(disease and Healthy Capsicum annum) is significant.
This gives a logic justification for the importance of the
discriminant function for discriminating or classificatying
Capsicum annum into diseased or healthy group. The
value of Wilks’ lambda statistic is 0.3707 (Table 2), the
complement of the Wilks’ lambda 1- , can be used as a
generalized R2 indicator (Huberty and Olejnik, 2006)
which indicates the amount of variance in the p-variables
system that is attributed to the effect of the grouping
variable). Therefore, about 60% of the growth attributes
of Capsicum annum attributed to the diseased or Healthy
Capsicum annum.
To test whether the variance covariance matrix of the
two groups differs, Berlet’s approximation to chi-squares
test was used, this gave an insignificant result, we
conclude that the two group had the same variance
covariance matrix, this justified the used of linear
discriminan function.
The structure of Correlation between eight growth
attributes (depended variables) and the Linear
Discriminant Functions (LDFs) are conveniently
summarized in Table 3. This shows that the first LDF is
labelled CAN1 and the second LDF CAN2. As seen from
Table 3, X2 ( Plant height at 10 week after transplant ) has
largest loading on the first LDF, followed by X8 (stem
girth at 10 week after transplant), X1 (Plant height at 6
79
Res. J. Math. Stat., 3(2): 77-81, 2011
Table 4: Coefficient of Linear Discriminnat Function (LDF) status in
health status of Capsicum annum
Vairables
Diseased
Health
Constant
- 22.92579
- 39.52036
X1
0.07742
- 0.14573
X2
0.39047
0.60337
X3
0.31987
0.37266
X4
- 0.15112
- 0.21414
X5
- 2.01418
- 2.65213
X6
1.49387
2.11115
X7
41.2201
48.48411
X8
35.6086
50.0531
few growths attributes which can easily measured by
farmers.
In this study, Linear Discriminant Function (LDF)
was used as a statistical tool for statistical analysis. The
discriminnat analysis adopted was found to effective (with
about 83%) to classified bell pepper to its true health
statutes. These results shows that discriminant analysis
may be an efficient and useful tool for predicting the
health status of bell pepper on the bases of a few growths
attributes that are routinely assessed by breeders,
researchers and farmers.
Table 5: Summary of classification with cross-validation using growth
attributes as prediction healthy status in bell pepper
(Capsicum annum)
From
Diseased
Healthy
Total
Diseased
27(90%)
2(10%)
30(100%)
Healthy
3(6.67%)
28(93.33%)
30(100%)
Total
30(50%)
30(50%)
60(100%)
REFERENCES
Anderson, T.W., 1984. An Introduction to Multivariate
Statistical Method. 2nd Edn., John Willey, New
York.
Atewi, A.M., S.A. Al-Obady and I.V. Komashynka,
2006. Discriminnant and classification levels of
students as per their major specification. J. Appl.
Sci., 6(8): 1845-1853.
Cruz-Castillo, J.G., S. Ganeshanandam, B.R. MacKay,
G.S. Lawes, C.R.O. Lawoko and D.J. Woolley,
1994. Applications of canonical discriminant analysis
in horticultural research. Hort. Sci., 29: 1115-1995.
Dauda, T.O., A.O. Adetayo and M.O. Akande, 2007.
Discriminnant Analysis of effects of irrigation
frequency on growth and yield of Okra. Nigeria J.
Hortic. Sci., 12: 46-52.
Ebdon, J.S., A.M. Petrovic and S.J. Schwager, 1998.
Evaluation of discriminant analysis in identification
of low- and high-water use kentucky bluegrass
cultivars. Crop Sci., 38: 152-157.
Fisher, R.A., 1938. The statistical utilization of multiple
measurements. Ann. Eugenic., 8: 376-386.
Hedge, J.W., 1997. Nematode management in tomatoes,
Pepper and eggplant. Department of Entomology and
Haematology Florida cooperation extension services
institutes of food and Agriculture Sciences,
University of Florida, Gainesivilla, Florida.
Huberty, C.J. and S. Olejnik, 2006. Applied MANOVA
and Discriminant Analysis. 2nd Edn., John Wiley &
Sons, Inc. Publication.
Kara, K., G. Ser and F. Arslan, 2005. Discriminant
analysis results of some yield characteristics of
sample dairy cattle corporations by considering
education status of producer. Int. J. Dairy Sci., 1(1):
14-17.
Lix, L.M. and T.T. Sojobi, 2010. Discriminant analysis
for repeated measures data: A review. Psychology.
1: 146. Doi: 10. 3389/fpsy.
week after transplant), X6 (number of branches at 10
weeks after transplant), X5 (number of branches 6 week
after transplant), X4 (number of leave at 10 weeks after
transplant) and X7 (stem girth at 6 week after transplant).
Most of the loading on the first LDF are negative in sign,
indicating a reverse relationship existing between these
variables and LDF. The second LDF strongest correlation
with X8 (stem girth at 10 week after transplant), X7 (stem
girth at 6 week after transplant), (0.70 and 0.57,
respectively) and moderate correlation with X1 (Plant
height at 6 week after transplant), X2 (Plant height at 10
week after transplant), X3 (number of leaves 6 weeks
after transplant) (0.43, 0.45 and 0.36, respectively) signs
are positive. The canonical variables were generated to
find the optimal combination of variables needed to
discriminant between two groups.
The linear discriminnant analysis coefficient in
(Table 4) was used to predict, using Cross-validation error
rate, the number of healthy Capsicum annum that are
correctly classified. The summary is presented in Table 5,
90% of diseased Capsicum annum was correctly classified
while about 7% was misclassified. The Apparent error
rate (APER), from Eq. (13), for the discriminnat analysis
(S/60 × 100) is 83.3%. The adopted classification rule
leads to about 83.3% of Capsicum annum correctly
classified according to their healthy status which leads to
about 17% of the pepper classified to the group which
they do not belong to.
CONCLUSION
This study is considered as an attempt of using the
discriminnat and classification methods for the purpose of
achieving a discriminant function, though which we can
identify the healthy from the diseased bell pepper with
80
Res. J. Math. Stat., 3(2): 77-81, 2011
Mardia, L.V., J.T. Keni and J.M. Bibby, 1971.
Multivariate Analysis. Academic Press, New Yolk.
Mikkonen, S., K.E.J. Lehtinen, A. Hamed, J.
Joutsenscari, M.C. Facchini and A. Haaksonen, 2006.
Using discrimmant analysis as a nucleation event
classification method. Atom. Chem. Phys., 6:
5549-5557.
Murphy, T.B., N. Dean and A.E. Raftery, 2010. Variable
Selection and update in model-based discriminnant
analysis for high dimensional data with food
authenticity applications. Ann. Appl. Stat. Statistics,
4(1): 396-421.
Pydipati, R., T.F. Burks and W.S. Lee, 2006.
Identification of citrus disease using color texture
features and discriminant analysis. Comput. Electr.
Agric., 52: 49-59.
Richard, A.J. and W.W. Dean, 1998. Applied
Multivariate Statistical Analysis. 4th Edn., Prentice
Hall, Inc., New Jersey.
Shoemaker, J.S. and B.J E. Tesky, 1995. Practiacal
Horticultural, John Willy and Sons. Inc., New York,
pp: 371.
Simon, J.E., A.F. Chadwick and L.E. Crater, 1984. Herbs:
An Indexed Bibliography. 1991-1980. The Scientific
Literature on Selected Herbs, Aromatic Medical
Plants of the Temperate zone. Arcon Books, pp: 770.
Tai, F. and W. Pan, 2007. Incorporating prior Knowledge
of gene functional groups into regularized
discriminant analysis of microarray data.
Bioinformation, 23(23): 3170-3177.
Yaakob, B.C., Z. Syahariza and A. Abulrohman, 2010.
Discriminnat analysis of selected edible facts and oils
and those in biscuits formulation using FRIR
spectroscopy. Food Anal. Methods. Doi: 10.1007/s
12161-010-9184.
81
Download