Research Journal of Mathematics and Statistics 3(2): 77-81, 2011
ISSN: 2040-7505
© Maxwell Scientific Organization, 2011
Received: March 21, 2011  Accepted: April 20, 2011  Published: May 25, 2011

Discriminant and Classification Analysis of Health Status of Bell Pepper (Capsicum annum)

Oludare S. Ariyo, O. Arogundade and Ayoola M. Abdul-Rafiu
National Horticultural Research Institute, PMB 5432, Jericho Idi-Ishin, Ibadan, Nigeria

Corresponding Author: Oludare S. Ariyo, National Horticultural Research Institute, PMB 5432, Jericho Idi-Ishin, Ibadan, Nigeria. Ph: +234-08035206932

Abstract: This study applies discriminant analysis and classification to obtain a discriminant function that identifies the health status of bell pepper from measured growth parameters. The Linear Discriminant Function (LDF) was used as the statistical tool. It was estimated from growth attributes recorded on 6 plants from each plot: 3 healthy and 3 diseased. The growth attributes taken were plant height, number of leaves and number of fruit, six and ten weeks after transplanting. The results show a highly significant difference between the two groups based on Hotelling's T² and Wilks' lambda statistics, and no significant difference in variance-covariance structure based on Bartlett's approximation to the chi-square statistic. Finally, the adopted classification rule assigned about 83% of the bell peppers to the group to which they belong and 17% to the wrong group. These results show that discriminant analysis may be an efficient and useful tool for predicting the health status of bell pepper on the basis of a few growth attributes that are routinely assessed by researchers and farmers.

Key words: Hotelling T², Linear Discriminant Function (LDF), Wilks' lambda

INTRODUCTION

Capsicum pepper refers primarily to Capsicum annum L. and Capsicum frutescens L., plants used in the manufacture of selected commercial products known for their pungency and colour (Simon et al., 1984). Capsicum pepper belongs to the nightshade family, Solanaceae, along with eggplant and tomato, and is a vegetable grown mainly for its fruit (Hedge, 1997). Tropical South America, particularly Brazil, is thought to be the original home of pepper; it is now widely cultivated in America, South America, Europe, some countries in the Asia-Pacific region and Africa (Shoemaker and Tesky, 1995).

Discriminant Analysis (DA) encompasses procedures for classifying observations into groups (i.e., predictive discriminant analysis) and for describing the relative importance of variables for distinguishing amongst groups (Lix and Sojobi, 2010). It is based on the construction of a function that identifies the group to which a subject of analysis belongs (Kara et al., 2005). Discriminant analysis can be applied when the groups share a common variance-covariance matrix (Atewi et al., 2006; Mikkonen et al., 2006). Tai and Pan (2007) proposed modified linear discriminant methods to integrate biological knowledge of gene functions (or variable groups) into the classification of microarray data. Murphy et al. (2010) applied a model-based discriminant analysis method that includes variable selection to food authentication. DA was applied to study the effect of irrigation frequency on the growth and yield of okra in Ibadan (Dauda et al., 2007). In recent years, a number of developments have occurred in DA procedures for the analysis of data from repeated measures designs. Specifically, DA procedures have been developed for repeated measures data characterized by missing observations and/or unbalanced measurement occasions, as well as for high-dimensional data in which measurements are collected repeatedly on two or more variables (Lix and Sojobi, 2010). Yaakob et al. (2010) used DA to classify lard and other commercial vegetable oils and animal fats; the results show that all vegetable fats/oils and animal fats, including lard, are clustered in distinct groups. Discriminant analysis has been used successfully in horticulture to classify plant material (Cruz-Castillo et al., 1994; Ebdon et al., 1998; Pydipati et al., 2006).

Few studies have applied DA to determine the health status of a horticultural crop, owing to its complex nature. This paper therefore employs discriminant analysis and classification to obtain a discriminant function through which the health status of bell pepper can be identified from measured growth parameters.

THEORETICAL SECTION

Another approach to the discrimination problem based on a data matrix $X$ can be made by not assuming any particular parametric form for the distribution of the populations $\Pi_1, \Pi_2, \ldots, \Pi_g$, but by merely looking for a rule to discriminate between them (Mardia et al., 1971). Fisher (1938) looked for the linear function $a'x$ which maximizes the ratio of the between-groups sum of squares to the within-groups sum of squares. Let $y = Xa$ be a linear combination of the columns of $X$. Then $y$ has the total sum of squares:

$$y'Hy = a'X'HXa = a'Ta \tag{1}$$

which can be partitioned as a sum of the within-groups sum of squares:

$$\sum_i y_i' H_i y_i = \sum_i a' X_i' H_i X_i a = a'Wa \tag{2}$$

plus the between-groups sum of squares:

$$\sum_i n_i (\bar{y}_i - \bar{y})^2 = \sum_i n_i \{a'(\bar{x}_i - \bar{x})\}^2 = a'Ba \tag{3}$$

where $\bar{y}_i$ is the mean of the $i$th sub-vector $y_i$ of $y$ and $H_i$ is the $(n_i \times n_i)$ centring matrix. The ratio is given by:

$$\frac{a'Ba}{a'Wa} \tag{4}$$

If $a$ is the vector which maximizes (4), we call the linear function $a'x$ Fisher's discriminant function or the first canonical variate. Fisher's discriminant function is most important in the special case of $g = 2$ groups. Then $B$ has rank one and can be written as:

$$B = \left(\frac{n_1 n_2}{n}\right) dd' \tag{5}$$

where $d = \bar{x}_1 - \bar{x}_2$ and $n = n_1 + n_2$. Thus $W^{-1}B$ has only one non-zero eigenvalue, which can be found explicitly. This eigenvalue equals:

$$\operatorname{tr} W^{-1}B = \left(\frac{n_1 n_2}{n}\right) d'W^{-1}d \tag{6}$$

and the corresponding eigenvector is:

$$a = W^{-1}d \tag{7}$$

The discriminant rule becomes: allocate $x$ to $\Pi_1$ if

$$d'W^{-1}\left\{x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)\right\} > 0 \tag{8}$$

and to $\Pi_2$ otherwise.

Testing the significance of the separation between the two populations:

Suppose the populations $\Pi_1$ and $\Pi_2$ are multivariate normal with a common covariance matrix $\Sigma$. To test the statistical hypothesis

$$H_0: \mu_1 = \mu_2 \quad \text{vs} \quad H_1: \mu_1 \neq \mu_2$$

we can use Hotelling's $T^2$ statistic (Anderson, 1984):

$$T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)' S_{pooled}^{-1} (\bar{x}_1 - \bar{x}_2) \tag{9}$$

$$T^2 \sim \frac{(n_1 + n_2 - 2)\,p}{n_1 + n_2 - p - 1} F_{p,\,n_1 + n_2 - p - 1} \tag{10}$$

$$F = \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p} T^2 \sim F_{p,\,n_1 + n_2 - p - 1} \tag{11}$$

where $F_{p,\,n_1 + n_2 - p - 1}(\alpha)$ is the value of $F$ with $p$ numerator degrees of freedom and $n_1 + n_2 - p - 1$ denominator degrees of freedom. So, we reject the null hypothesis $H_0: \mu_1 = \mu_2$ at the level of significance $\alpha$ when:

$$F > F_{p,\,n_1 + n_2 - p - 1}(\alpha) \tag{12}$$

The apparent error rate (APER) can be calculated from the confusion matrix, which shows actual versus predicted group membership. According to Richard and Dean (1998), the confusion matrix has the form:

| Actual membership | Predicted $\Pi_1$ | Predicted $\Pi_2$ |
|---|---|---|
| $\Pi_1$ | $n_{1C}$ | $n_{1M} = n_1 - n_{1C}$ |
| $\Pi_2$ | $n_{2M} = n_2 - n_{2C}$ | $n_{2C}$ |
where
$n_{1C}$ = number of $\Pi_1$ items correctly classified as $\Pi_1$ items
$n_{2C}$ = number of $\Pi_2$ items correctly classified as $\Pi_2$ items
$n_{1M}$ = number of $\Pi_1$ items misclassified as $\Pi_2$ items
$n_{2M}$ = number of $\Pi_2$ items misclassified as $\Pi_1$ items

The apparent error rate is then:

$$APER = \frac{n_{1M} + n_{2M}}{n_1 + n_2} \tag{13}$$

APPLICATION SECTION

The experiment was carried out at the research farm of the National Horticultural Research Institute (NIHORT), Ibadan. It was laid out in a Randomized Complete Block Design (RCBD) with three replications. The experimental field was first divided into three blocks, and each block was further divided into 10 unit plots of 2 m × 2 m. Blocks and plots were separated by 0.5 m. Thirty-five-day-old healthy seedlings were transplanted into the experimental plots at a spacing of 50 cm × 50 cm. Data were taken on 6 plants from each plot: 3 healthy and 3 diseased. The diseased plants showed varying symptoms of disease, while the healthy plants were apparently healthy. The following growth attributes were recorded: plant height, number of leaves and number of fruit. The recorded data were analysed using the Statistical Analysis System (SAS) version 9.1.

Table 1: Test for normality (Shapiro-Wilk) for each of the growth attributes of bell pepper (Capsicum annum)

| Variable | Statistic (W) | p-value (Pr>W) |
|---|---|---|
| X1 | 0.8965 | 0.0001 |
| X2 | 0.9252 | 0.0013 |
| X3 | 0.8019 | 0.0001 |
| X4 | 0.79115 | 0.0001 |
| X5 | 0.8170 | 0.0001 |
| X6 | 0.8683 | 0.0001 |
| X7 | 0.9351 | 0.0033 |
| X8 | 0.9366 | 0.0001 |

X1 = Plant height at 6 weeks after transplant, X2 = Plant height at 10 weeks after transplant, X3 = Number of leaves at 6 weeks after transplant, X4 = Number of leaves at 10 weeks after transplant, X5 = Number of branches at 6 weeks after transplant, X6 = Number of branches at 10 weeks after transplant, X7 = Stem girth at 6 weeks after transplant, X8 = Stem girth at 10 weeks after transplant
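The two-group quantities from the theoretical section (Fisher's direction $a = W^{-1}d$, the allocation rule, Hotelling's $T^2$ with its $F$ transform, and the apparent error rate with $n_1 + n_2$ in the denominator) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the study itself used SAS 9.1, the function names are hypothetical, and the toy measurements below are invented for the example rather than taken from the bell pepper data.

```python
# Illustrative sketch only: names and toy data are hypothetical, not from the study.

def mean_vector(rows):
    """Column means of a list of observation tuples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter(rows, mean):
    """Within-group sum of squares and products matrix X_i' H_i X_i."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        dev = [x - m for x, m in zip(r, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    return s

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][k] * z[k] for k in range(i + 1, n))) / m[i][i]
    return z

def fisher_two_group(g1, g2):
    """Return (a, midpoint, T2, F) for two groups of p-variate observations."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean_vector(g1), mean_vector(g2)
    p = len(m1)
    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    w = [[s1[i][j] + s2[i][j] for j in range(p)] for i in range(p)]  # W
    d = [u - v for u, v in zip(m1, m2)]                              # d = xbar1 - xbar2
    a = solve(w, d)                                                  # a = W^{-1} d
    mid = [(u + v) / 2 for u, v in zip(m1, m2)]
    # S_pooled = W / (n1 + n2 - 2), so d' S_pooled^{-1} d = (n1 + n2 - 2) d' W^{-1} d
    t2 = n1 * n2 / (n1 + n2) * (n1 + n2 - 2) * sum(x * y for x, y in zip(d, a))
    f = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return a, mid, t2, f

def allocate(x, a, mid):
    """Allocate x to group 1 if a'(x - midpoint) > 0, otherwise to group 2."""
    score = sum(ai * (xi - mi) for ai, xi, mi in zip(a, x, mid))
    return 1 if score > 0 else 2

def aper(n1c, n1m, n2c, n2m):
    """Apparent error rate: misclassified items over all items."""
    return (n1m + n2m) / (n1c + n1m + n2c + n2m)

# Hypothetical toy data: two measurements per plant, four plants per group.
group1 = [(4, 2), (5, 3), (6, 3), (5, 4)]
group2 = [(1, 1), (2, 1), (2, 2), (3, 2)]
a, mid, t2, f = fisher_two_group(group1, group2)
predicted = [allocate(x, a, mid) for x in group1 + group2]
```

The statistic `f` would then be compared with the tabulated critical value $F_{p,\,n_1+n_2-p-1}(\alpha)$, which this sketch deliberately leaves to a statistical table or package.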
RESULTS AND DISCUSSION

The Shapiro-Wilk (W) test was used to test the normality assumption for the growth attributes of Capsicum annum across health status. The test statistic was significant at the 5% level for each attribute, and the growth attributes were treated as approximately normal (Table 1). To test whether the population mean vectors $\mu_1$ and $\mu_2$ differ significantly, Wilks' lambda statistic (Table 2) was used. We conclude that the separation between the two groups (diseased and healthy Capsicum annum) is significant. This gives a logical justification for using the discriminant function to classify Capsicum annum into the diseased or healthy group. The value of Wilks' lambda is 0.3707 (Table 2). Its complement, $1 - \Lambda$, can be used as a generalized $R^2$ indicator (Huberty and Olejnik, 2006), which indicates the amount of variance in the $p$-variable system that is attributable to the effect of the grouping variable. Therefore, about 63% of the variation in the growth attributes of Capsicum annum is attributable to whether the plant is diseased or healthy.

Table 2: Wilks' lambda and chi-square test values pertaining to the discriminant function

| Statistic | Hypothesis | Value | p-value |
|---|---|---|---|
| Wilks' lambda | H0: μ1 = μ2 vs H1: μ1 ≠ μ2 | 0.3707 | 0.001 |
| Chi-square | H0: Σ1 = Σ2 vs H1: Σ1 ≠ Σ2 | 186.370 | 0.9927 |

Table 3: Pooled within canonical structure

| Variable | CAN1 | CAN2 |
|---|---|---|
| X1 | -0.604617 | 0.438617 |
| X2 | 0.971020 | 0.458473 |
| X3 | 0.242800 | 0.362540 |
| X4 | -0.357270 | 0.246691 |
| X5 | -0.418430 | 0.197429 |
| X6 | 0.575110 | 0.165260 |
| X7 | 0.316336 | 0.576054 |
| X8 | 0.791720 | 0.702272 |

X1 = Plant height at 6 weeks after transplant, X2 = Plant height at 10 weeks after transplant, X3 = Number of leaves at 6 weeks after transplant, X4 = Number of leaves at 10 weeks after transplant, X5 = Number of branches at 6 weeks after transplant, X6 = Number of branches at 10 weeks after transplant, X7 = Stem girth at 6 weeks after transplant, X8 = Stem girth at 10 weeks after transplant
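The generalized $R^2$ read off Table 2 is a one-line calculation from the reported Wilks' lambda. The snippet below is a minimal check of that arithmetic; the variable names are illustrative, and only the value 0.3707 comes from the paper.

```python
# Complement of Wilks' lambda as a generalized R^2 (value of lambda from Table 2).
wilks_lambda = 0.3707
generalized_r2 = 1.0 - wilks_lambda

# Share of variation in the growth attributes attributed to the grouping, in percent.
percent_explained = round(100 * generalized_r2, 1)
print(f"1 - lambda = {generalized_r2:.4f} ({percent_explained}%)")
```

The result, roughly 63%, is the share of variation in the eight growth attributes that this indicator attributes to health status.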
To test whether the variance-covariance matrices of the two groups differ, Bartlett's approximation to the chi-square test was used. The result was not significant, so we conclude that the two groups have the same variance-covariance matrix, which justifies the use of the linear discriminant function. The correlations between the eight growth attributes (dependent variables) and the Linear Discriminant Functions (LDFs) are summarized in Table 3, where the first LDF is labelled CAN1 and the second CAN2. As seen from Table 3, X2 (plant height at 10 weeks after transplant) has the largest loading on the first LDF, followed by X8 (stem girth at 10 weeks after transplant), X1 (plant height at 6 weeks after transplant), X6 (number of branches at 10 weeks after transplant), X5 (number of branches at 6 weeks after transplant), X4 (number of leaves at 10 weeks after transplant) and X7 (stem girth at 6 weeks after transplant). Several of the loadings on the first LDF are negative (X1, X4 and X5), indicating an inverse relationship between these variables and the LDF.

The second LDF has its strongest correlations with X8 (stem girth at 10 weeks after transplant) and X7 (stem girth at 6 weeks after transplant) (0.70 and 0.57, respectively) and moderate correlations with X1 (plant height at 6 weeks after transplant), X2 (plant height at 10 weeks after transplant) and X3 (number of leaves at 6 weeks after transplant) (0.43, 0.45 and 0.36, respectively); all signs are positive. The canonical variables were generated to find the optimal combination of variables needed to discriminate between the two groups.

The linear discriminant function coefficients in Table 4 were used to predict, with a cross-validation error rate, the number of Capsicum annum plants that are correctly classified. The summary is presented in Table 5: 90% of diseased Capsicum annum plants were correctly classified, while about 7% of healthy plants were misclassified. From Eq. (13), the proportion correctly classified by the discriminant analysis (S/60 × 100) is 83.3%. The adopted classification rule therefore classifies about 83.3% of Capsicum annum plants correctly according to their health status, and about 17% of the peppers are assigned to the group to which they do not belong.

Table 4: Coefficients of the Linear Discriminant Function (LDF) for health status of Capsicum annum

| Variable | Diseased | Healthy |
|---|---|---|
| Constant | -22.92579 | -39.52036 |
| X1 | 0.07742 | -0.14573 |
| X2 | 0.39047 | 0.60337 |
| X3 | 0.31987 | 0.37266 |
| X4 | -0.15112 | -0.21414 |
| X5 | -2.01418 | -2.65213 |
| X6 | 1.49387 | 2.11115 |
| X7 | 41.2201 | 48.48411 |
| X8 | 35.6086 | 50.0531 |

Table 5: Summary of classification with cross-validation using growth attributes as predictors of health status in bell pepper (Capsicum annum)

| From | Diseased | Healthy | Total |
|---|---|---|---|
| Diseased | 27 (90%) | 2 (10%) | 30 (100%) |
| Healthy | 3 (6.67%) | 28 (93.33%) | 30 (100%) |
| Total | 30 (50%) | 30 (50%) | 60 (100%) |

CONCLUSION

This study used discriminant and classification methods to obtain a discriminant function through which healthy bell pepper can be distinguished from diseased bell pepper using a few growth attributes that can easily be measured by farmers. The Linear Discriminant Function (LDF) was used as the statistical tool. The discriminant analysis adopted was found to be effective, classifying about 83% of bell peppers to their true health status. These results show that discriminant analysis may be an efficient and useful tool for predicting the health status of bell pepper on the basis of a few growth attributes that are routinely assessed by breeders, researchers and farmers.

REFERENCES

Anderson, T.W., 1984. An Introduction to Multivariate Statistical Analysis. 2nd Edn., John Wiley, New York.
Atewi, A.M., S.A. Al-Obady and I.V. Komashynka, 2006. Discriminant and classification levels of students as per their major specification. J. Appl. Sci., 6(8): 1845-1853.
Cruz-Castillo, J.G., S. Ganeshanandam, B.R. MacKay, G.S. Lawes, C.R.O. Lawoko and D.J. Woolley, 1994. Applications of canonical discriminant analysis in horticultural research. Hort. Sci., 29: 1115-1995.
Dauda, T.O., A.O. Adetayo and M.O. Akande, 2007. Discriminant analysis of effects of irrigation frequency on growth and yield of okra. Nigeria J. Hortic. Sci., 12: 46-52.
Ebdon, J.S., A.M. Petrovic and S.J. Schwager, 1998. Evaluation of discriminant analysis in identification of low- and high-water use Kentucky bluegrass cultivars. Crop Sci., 38: 152-157.
Fisher, R.A., 1938. The statistical utilization of multiple measurements. Ann. Eugenic., 8: 376-386.
Hedge, J.W., 1997. Nematode management in tomatoes, pepper and eggplant. Department of Entomology and Nematology, Florida Cooperative Extension Service, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, Florida.
Huberty, C.J. and S. Olejnik, 2006. Applied MANOVA and Discriminant Analysis. 2nd Edn., John Wiley & Sons, Inc.
Kara, K., G. Ser and F. Arslan, 2005. Discriminant analysis results of some yield characteristics of sample dairy cattle corporations by considering education status of producer. Int. J. Dairy Sci., 1(1): 14-17.
Lix, L.M. and T.T. Sojobi, 2010. Discriminant analysis for repeated measures data: A review. Psychology, 1: 146. DOI: 10.3389/fpsy.
Mardia, K.V., J.T. Kent and J.M. Bibby, 1971. Multivariate Analysis. Academic Press, New York.
Mikkonen, S., K.E.J. Lehtinen, A. Hamed, J. Joutsensaari, M.C. Facchini and A. Laaksonen, 2006. Using discriminant analysis as a nucleation event classification method. Atmos. Chem. Phys., 6: 5549-5557.
Murphy, T.B., N. Dean and A.E. Raftery, 2010. Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications. Ann. Appl. Stat., 4(1): 396-421.
Pydipati, R., T.F. Burks and W.S. Lee, 2006. Identification of citrus disease using color texture features and discriminant analysis. Comput. Electr. Agric., 52: 49-59.
Richard, A.J. and W.W. Dean, 1998. Applied Multivariate Statistical Analysis. 4th Edn., Prentice Hall, Inc., New Jersey.
Shoemaker, J.S. and B.J.E. Tesky, 1995. Practical Horticulture. John Wiley and Sons, Inc., New York, pp: 371.
Simon, J.E., A.F. Chadwick and L.E. Crater, 1984. Herbs: An Indexed Bibliography, 1971-1980. The Scientific Literature on Selected Herbs, and Aromatic and Medicinal Plants of the Temperate Zone. Archon Books, pp: 770.
Tai, F. and W. Pan, 2007. Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data. Bioinformatics, 23(23): 3170-3177.
Yaakob, B.C., Z. Syahariza and A. Abulrohman, 2010. Discriminant analysis of selected edible fats and oils and those in biscuit formulation using FTIR spectroscopy. Food Anal. Methods. DOI: 10.1007/s12161-010-9184.