Multivariate Discriminant Analysis
Multivariate Discriminant Analysis is used to determine which variables best discriminate the
occurrence of an event between two or more groups. The basic concept of discriminant function
analysis is to establish whether groups differ with respect to the mean of a variable, and then to use
that variable as a member of the group of predictors.
Discriminant analysis is very similar to the analysis of variance (ANOVA): it answers, with a yes or a
no, whether two or more groups differ significantly with respect to the mean of a particular variable. If
the mean of a variable is significantly different across groups, that variable is said to discriminate
between the groups. In the case of a single variable, the Fisher test is what permits verification of
whether or not the variable discriminates between groups.
As described in the elementary concepts and in the ANOVA analysis of variance and/or the MANOVA
multivariate analysis of variance, the Fisher F statistic is evaluated as the ratio of the between-groups
variance to the pooled (average) within-groups variance. If the variance between groups is significantly
higher, then there are significant differences between the means. Generally several variables are
included in the study to find out which one or ones contribute to the discrimination between groups.
In that case, the total matrix of variances and covariances is computed, together with the pooled
within-groups matrix of variances and covariances. These two matrices can be compared through a
multivariate F-test to determine whether there is a significant difference between the groups with
respect to all the variables. This procedure is identical to the MANOVA multivariate analysis of
variance. Just as in MANOVA, the multivariate test is done first; if it is statistically significant, one can
then examine which of the variables has means that differ significantly across the groups. The
procedure with multiple variables is more complex, but the main reasoning is still to search for
variables that discriminate between groups by looking for differences in the means. The most common
application of Multivariate Discriminant Analysis is to include many variables in order to determine
those that best discriminate between groups. In this way a model is constructed that yields the best
predictor for each group.
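As a rough illustration of this idea, here is a minimal sketch in Python using scikit-learn's LinearDiscriminantAnalysis on synthetic two-group data; the group means, sample sizes, and variable count are illustrative assumptions, not values from this text.

```python
# Minimal discriminant-analysis sketch on synthetic two-group data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two groups of 50 cases, each measured on three candidate predictors;
# only the first predictor's mean differs between the groups.
group_a = rng.normal(loc=[0.0, 0.0, 0.0], size=(50, 3))
group_b = rng.normal(loc=[1.5, 0.0, 0.0], size=(50, 3))
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
# A large absolute coefficient marks a variable that discriminates strongly;
# here the first variable should dominate.
print("discriminant coefficients:", lda.coef_)
print("training accuracy:", lda.score(X, y))
```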
Stepwise Discriminant Analysis
One of the methods to build a statistical prediction model is the "Forward Stepwise Discriminant
Analysis", in which the discriminant model is built step by step. Specifically, at each step the statistical
program reviews all the variables and evaluates which one contributes most to the discrimination
between groups. That variable is included in the model, and the analysis goes on with the following step.
Another method is the "Backward Stepwise Discriminant Analysis", in which the statistical program
first includes all the variables in the model and then, at each step, eliminates the variable contributing
least to the prediction. A successful analysis results in a model containing only the most important
variables, that is, those that contribute most to the discrimination between groups. The stepwise
process is governed by the respective F-to-enter and F-to-remove values.
The F-to-enter/F-to-remove value for a variable indicates its statistical significance in the
discrimination between groups; it is therefore a measure of how much the variable, as a member of the
model, uniquely contributes to the prediction. The F-to-enter and F-to-remove values can be
interpreted in the same sense as in the stepwise procedure of multiple regression. In general, the
statistical program will keep choosing variables to include in the statistical model as long as their
respective F values are higher than the specified F-to-enter, and it will exclude from the model the
variables whose significance is lower than the specified F-to-remove value.
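A minimal sketch of the forward procedure follows, assuming the det(W)/det(T) form of Wilks' Lambda and the F formula given later in this text; the F-to-enter threshold of 3.84 is an illustrative default, not a value from the source.

```python
# Forward stepwise selection driven by F-to-enter (sketch).
import numpy as np

def sscp(A):
    """Sums-of-squares-and-cross-products matrix about the column means."""
    d = A - A.mean(axis=0)
    return d.T @ d

def wilks_lambda(X, y, cols):
    """Lambda = det(W) / det(T) for the variable subset `cols`."""
    W = sum(sscp(X[y == g][:, cols]) for g in np.unique(y))
    return np.linalg.det(W) / np.linalg.det(sscp(X[:, cols]))

def forward_stepwise(X, y, f_enter=3.84):
    n, q = len(y), len(np.unique(y))
    selected, remaining = [], list(range(X.shape[1]))
    lam_before = 1.0
    while remaining:
        best_j, best_f, best_lam = None, -np.inf, None
        for j in remaining:
            lam_after = wilks_lambda(X, y, selected + [j])
            partial = lam_after / lam_before          # partial Lambda
            p = len(selected) + 1                     # model size if j enters
            f = ((n - q - p) / (q - 1)) * ((1 - partial) / partial)
            if f > best_f:
                best_j, best_f, best_lam = j, f, lam_after
        if best_f < f_enter:
            break          # no remaining variable contributes enough
        selected.append(best_j)
        remaining.remove(best_j)
        lam_before = best_lam
    return selected        # indices of the retained predictor columns
```

A backward variant would start from all the variables and, at each step, drop the one with the smallest F-to-remove until every remaining F exceeds the threshold.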
A common error in interpreting the SDA results is to take the levels of statistical significance at their
nominal value. When the statistical program decides which variable to include or exclude at the next
step of the analysis, it computes the significance of the contribution of each variable under
consideration. The stepwise procedure capitalizes on chance, since it picks the variables that yield
maximum discrimination. Therefore, when the stepwise approach is used, the significance levels do not
reflect the true alpha error rate, that is, the probability of erroneously rejecting the null hypothesis H0
that there is no discrimination between groups. The multivariate analysis of variance was originally
developed by Wilks (1932) through a generalized likelihood-ratio principle.
Statistical Program
A statistical program for PC was used to carry out the SDA in order to obtain a statistical model for the
prediction of the Zonda wind and its severity. Once the variables have been chosen and the procedure
has been specified, the program carries out the SDA and provides the following results:
- Number of steps of the analysis
- Number of variables entered in the model
- Last entered variable
- Variables in the model:
Name of the variable, Wilks' Lambda, Partial Lambda, F to remove, p-level for F to remove,
Tolerance, 1-Tolerance
- Variables outside the model:
Name of the variable, Wilks' Lambda, Partial Lambda, F to enter, p-level for F to enter,
Tolerance, 1-Tolerance
- Distance among groups
- Summary of the Stepwise Analysis for the chosen variables:
Number of the step, F to enter or to remove, degrees of freedom for the respective F, number of
variables in the model after that step, Wilks' Lambda after the respective step, the F value associated
with that Lambda, the degrees of freedom for that F, and the p-level for that F value.
- Canonical analysis and graphics
- Classification functions
- Classification Matrix
- Classification of the cases
- Squared Mahalanobis distances
- Posterior probabilities
The Wilks' Lambda value represents the discrimination among groups, taking values in the range
between 0 and 1, where 0 corresponds to total discrimination and 1 to no discrimination. It is evaluated
over all the variables in the model as the ratio of the determinant of the within-groups
variance-covariance matrix to the determinant of the total variance-covariance matrix:
Wilks' Lambda = det(W) / det(T)
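A direct numeric check of this formula on synthetic data (the group shift and sample sizes are made up for illustration):

```python
# Compute Lambda = det(W) / det(T) for two synthetic groups.
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(40, 2))   # group 1
X2 = rng.normal(loc=2.0, size=(40, 2))   # group 2, with shifted means

def sscp(A):
    d = A - A.mean(axis=0)
    return d.T @ d

W = sscp(X1) + sscp(X2)                  # pooled within-groups SSCP matrix
T = sscp(np.vstack([X1, X2]))            # total SSCP matrix
print(np.linalg.det(W) / np.linalg.det(T))  # well below 1: groups separate
```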
The partial Lambda is defined as the multiplicative increase in Lambda resulting from adding the
respective variable, that is, the Wilks' Lambda associated with the unique contribution of the respective
variable to the discriminatory power of the model:
Partial Lambda = Lambda after / Lambda before
In other words, it is the Wilks' Lambda after incorporating the variable divided by the Wilks' Lambda
before its incorporation.
The F-to-remove value is the Fisher F value associated with the respective partial Lambda, and it is
calculated as:
F = [(n - q - p) / (q - 1)] * [(1 - partial Lambda) / partial Lambda]
where:
n is the number of cases,
q is the number of groups,
p is the number of variables in the model,
partial Lambda is the partial Lambda defined above.
The tolerance for each variable is calculated as 1 - R2 of the respective variable with all the other
variables in the model; it is a measure of the variable's redundancy.
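A sketch of that computation, obtaining R2 from an ordinary least-squares regression of the variable on all the others (the intercept column is an implementation detail, not part of the definition):

```python
# Tolerance of column j: 1 - R^2 of column j regressed on the other columns.
import numpy as np

def tolerance(X, j):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([others, np.ones(len(X))])   # add intercept term
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1.0 - resid.var() / X[:, j].var()
    return 1.0 - r2        # values near 0 flag a highly redundant variable
```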
The classification functions permit the determination of scores for each case and each group. The
scores are calculated through the following formula:
Si = ci + wi1 * x1 + wi2 * x2 + wi3 * x3 + ...... + wim * xm
where the subindex i indicates the respective group; the subindexes 1, 2, 3, ......, m indicate the m
variables (predictors) chosen by the discriminant analysis; ci is a constant for the i-th group; wij is the
weight of the j-th variable when computing the score for the i-th group; xj is the value observed for the
respective case on the j-th variable; and Si is the resulting score.
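For example, with hypothetical constants and weights (placeholders, not values produced by the actual analysis), one case would be scored and classified as follows:

```python
# Score one case against the classification function of each group.
import numpy as np

c = np.array([-4.0, -7.5])              # constants c_i (hypothetical)
w = np.array([[1.2, 0.4, -0.3],         # weights w_0j for group 0 (hypothetical)
              [2.1, 0.1,  0.5]])        # weights w_1j for group 1 (hypothetical)

x = np.array([1.8, 0.2, 0.0])           # observed values of the m=3 predictors
scores = c + w @ x                      # S_i = c_i + sum_j w_ij * x_j
# The case is assigned to the group with the highest score.
print(scores, "-> classified into group", np.argmax(scores))
```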
The Classification Matrix contains the number and percentage of cases correctly classified in each
group, given the a priori classification probabilities.
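A small sketch of how such a matrix is tallied from observed and predicted group memberships (the labels below are made up for illustration):

```python
# Classification matrix: rows = observed groups, columns = predicted groups.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # observed group of each case
y_pred = np.array([0, 0, 1, 0, 1, 1, 1, 0])   # group assigned by the model
cm = confusion_matrix(y_true, y_pred)
print(cm)
print("percent correct:", 100.0 * np.trace(cm) / cm.sum())   # 75.0 here
```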