STATS 780 Assignment 1
Student number: 400415239
Kyuson Lim
13 February, 2022

Contents
Exploratory data analysis
Multinomial logistic regression & assessment
Cross validation: Predicting & Validating the model
ROC performance

Exploratory data analysis

[Figure: "UCI Machine learning repository Raisin Dataset Data Set". Values of the variables Area, Perimeter, MajorAxisLength, MinorAxisLength, Eccentricity, ConvexArea and Extent by class (Besni, Kecimen).]

First, the bivariate density-scatter plot of sepal width against sepal length for the three species shows that setosa does not overlap with either of the other two species and is clearly separated from them. Setosa has a high mean sepal width (3.43) and a low mean sepal length (5.01). Versicolor (mean sepal width 2.77, mean sepal length 5.94) and virginica (mean sepal width 2.97, mean sepal length 6.59) overlap with each other in the region of high sepal length and low sepal width. However, the box plots make the separation clearer: setosa has the lowest mean sepal length, versicolor the second highest (5.94) and virginica the highest (6.59), so each species appears distinct in sepal length.

Multinomial logistic regression & assessment

Of the 150 observations, the fitted multinomial model classifies 52 as setosa, 47 as versicolor and 51 as virginica. Since each species accounts for 50 of the 150 observations in the original data, these counts are close to the true class sizes.
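The report does not show how the multinomial model was fitted. As a minimal sketch, assuming a softmax (multinomial logistic) regression fitted by plain batch gradient descent, the Python below shows how class predictions, the predicted-class counts and the per-class means under true and predicted labels can be obtained. The data, the labels A/B/C, the learning rate, the number of epochs and the function names (fit_multinomial, predict, class_means) are hypothetical and chosen only for illustration; they are not the iris measurements or the author's code.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT the iris measurements): two features, three classes.
X = [(0.0, 2.0), (0.5, 2.5), (-0.5, 2.5),
     (-2.0, -1.0), (-2.5, -1.5), (-2.0, -1.5),
     (2.0, -1.0), (2.5, -1.5), (2.0, -1.5)]
y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
classes = ["A", "B", "C"]


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_multinomial(X, y, classes, lr=0.1, epochs=500):
    """Fit softmax regression by batch gradient descent on the mean cross-entropy.

    Every class gets its own weight vector (intercept first), i.e. no baseline
    class is fixed; this over-parameterisation just keeps the sketch short.
    """
    n, p, K = len(X), len(X[0]), len(classes)
    index = {c: k for k, c in enumerate(classes)}
    W = [[0.0] * (p + 1) for _ in range(K)]
    for _ in range(epochs):
        grad = [[0.0] * (p + 1) for _ in range(K)]
        for xi, yi in zip(X, y):
            row = [1.0, *xi]
            probs = softmax([sum(w * v for w, v in zip(W[k], row)) for k in range(K)])
            for k in range(K):
                # Gradient of cross-entropy w.r.t. the class-k score: p_k - 1{y = k}
                err = probs[k] - (1.0 if index[yi] == k else 0.0)
                for j in range(p + 1):
                    grad[k][j] += err * row[j]
        for k in range(K):
            for j in range(p + 1):
                W[k][j] -= lr * grad[k][j] / n
    return W


def predict(W, X, classes):
    """Assign each observation to the class with the largest linear score."""
    preds = []
    for xi in X:
        row = [1.0, *xi]
        scores = [sum(w * v for w, v in zip(Wk, row)) for Wk in W]
        preds.append(classes[scores.index(max(scores))])
    return preds


def class_means(X, labels):
    """Per-class feature means, under whichever labels are passed in."""
    counts = Counter(labels)
    sums = {}
    for xi, li in zip(X, labels):
        acc = sums.setdefault(li, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


W = fit_multinomial(X, y, classes)
preds = predict(W, X, classes)
pred_counts = Counter(preds)          # how many observations land in each class
true_means = class_means(X, y)        # means by original class
model_means = class_means(X, preds)   # means by predicted class
print(pred_counts, true_means, model_means)
```

Comparing pred_counts with the true class sizes, and true_means with model_means, mirrors the kind of check the report makes; on real data the two sets of means differ wherever observations are misclassified.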
Moreover, by visually inspecting all three classes, we check whether there is any dissimilarity in the density plots.

[Figure: density-scatter plot of Sepal.Width against Sepal.Length by model_fit (setosa, versicolor, virginica), and plot of the values of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width by Species and model_fit.]

The classified density-scatter plot shows that the model's classification does not exactly reproduce the species labels of the iris dataset. Some points with high sepal width and low sepal length are classified as versicolor even though their original species is setosa.

The table below summarizes the mean sepal width and mean sepal length of the three species, under the original species labels and under the model classification (model).

Species     Sepal Width  Sepal Length  (model) Sepal Width  (model) Sepal Length
setosa      3.43         5.01          3.43                 5.01
versicolor  2.77         5.94          2.77                 5.88
virginica   2.97         6.59          2.97                 6.63

Cross validation: Predicting & Validating the model

ROC performance

[Figure: ROC curves (Sensitivity against Specificity) for the pairwise comparisons setosa/versicolor, setosa/virginica and versicolor/virginica.]
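The ROC panels are labelled by pairs of species, which suggests one ROC curve per pair. As a minimal sketch, assuming a pairwise (one-vs-one) AUC in the style of Hand and Till (2001) rather than the exact procedure used in the report, each pairwise AUC can be computed with the Mann-Whitney statistic on the observations of the two classes only, scoring once by P(first class) and once by P(second class) and averaging the two. The predicted probabilities below are hypothetical, and the function names auc_mann_whitney and pairwise_auc are illustrative.

```python
from itertools import combinations


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    wins = 0.0
    for s in pos_scores:
        for t in neg_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def pairwise_auc(probs, labels, classes):
    """Pairwise AUC (Hand-and-Till style) for every pair of classes.

    probs[i][k] is the predicted probability that observation i belongs to
    classes[k]. For a pair (a, b) only observations of class a or b are used,
    and the AUC is averaged over scoring by P(a) and by P(b).
    """
    result = {}
    for a, b in combinations(range(len(classes)), 2):
        in_a = [p for p, lab in zip(probs, labels) if lab == classes[a]]
        in_b = [p for p, lab in zip(probs, labels) if lab == classes[b]]
        auc_a = auc_mann_whitney([p[a] for p in in_a], [p[a] for p in in_b])
        auc_b = auc_mann_whitney([p[b] for p in in_b], [p[b] for p in in_a])
        result[(classes[a], classes[b])] = (auc_a + auc_b) / 2
    return result


# Hypothetical predicted probabilities (columns A, B, C); not results from the report.
labels = ["A", "A", "B", "B", "C", "C"]
probs = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.1, 0.45, 0.45],
]
aucs = pairwise_auc(probs, labels, ["A", "B", "C"])
overall = sum(aucs.values()) / len(aucs)   # Hand and Till's M: mean pairwise AUC
print(aucs, overall)
```

Here the A/B and A/C pairs are perfectly separated (AUC 1.0), while one B observation scores below one C observation on P(B), giving a B/C AUC of 0.875. A pair whose ROC curve hugs the top-left corner, as for a well-separated species, corresponds to an AUC near 1.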