Uploaded by Kevin Kyuson Lim

assignment2

STATS 780
Assignment 1
Student number: 400415239
Kyuson Lim
13 February, 2022
Contents
Exploratory data analysis
Multinomial logistic regression & assessment
Cross validation: Predicting & Validating the model
ROC performance
Exploratory data analysis
[Figure: UCI Machine learning repository, Raisin Dataset Data Set. Axes: Variables (Area, MajorAxisLength, MinorAxisLength, Eccentricity, ConvexArea, Extent, Perimeter) and value; legend: class (Besni, Kecimen).]
[Figure: UCI Machine learning repository, Raisin Dataset Data Set. Axes: MinorAxisLength and Extent; legend: class (Besni, Kecimen).]
First, in the bivariate density-scatter plot of sepal width against sepal length for the 3 species,
the setosa species is clearly separated from the other two species and does not overlap with either
of them.
The setosa species has a high sepal width (mean 3.43) and a low sepal length (mean 5.01), while
the versicolor species (mean sepal width 2.77, mean sepal length 5.94) and the virginica species
(mean sepal width 2.97, mean sepal length 6.59) overlap with each other in the region of high
sepal length and low sepal width.
However, the separation becomes obvious in the box-plot of sepal length: setosa has the lowest mean
sepal length of the three species, virginica has the highest (mean 6.59), and versicolor has the
second highest (mean 5.94).
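The per-species means quoted above are simple group averages. As a minimal sketch of that computation, the Python snippet below groups a handful of iris rows by species and averages sepal length and sepal width. The function name `species_means` is hypothetical, and only three rows per species are listed, so the printed means will not match the full-data means reported in the text.

```python
from collections import defaultdict
from statistics import mean

# A few rows of the iris data: (sepal length, sepal width, species).
# ASSUMPTION: only three rows per species are used for illustration, so the
# resulting means differ from the full-data means quoted in the report.
rows = [
    (5.1, 3.5, "setosa"), (4.9, 3.0, "setosa"), (4.7, 3.2, "setosa"),
    (7.0, 3.2, "versicolor"), (6.4, 3.2, "versicolor"), (6.9, 3.1, "versicolor"),
    (6.3, 3.3, "virginica"), (5.8, 2.7, "virginica"), (7.1, 3.0, "virginica"),
]


def species_means(rows):
    """Return {species: (mean sepal length, mean sepal width)}."""
    groups = defaultdict(list)
    for length, width, species in rows:
        groups[species].append((length, width))
    return {
        species: (
            mean(length for length, _ in values),
            mean(width for _, width in values),
        )
        for species, values in groups.items()
    }


means = species_means(rows)
for species, (mean_length, mean_width) in means.items():
    print(f"{species}: sepal length {mean_length:.2f}, sepal width {mean_width:.2f}")
```

Run on the full 150 observations, the same grouping would give the means used in the comparison above.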
Multinomial logistic regression & assessment
From the fitted multinomial model, among the 150 observations, 52 are classified as setosa,
47 are classified as versicolor and 51 are classified as virginica.
Since the iris data contain 50 observations of each species, these counts are close to the original
class sizes. Moreover, a visual inspection of the density plots for all 3 classes should reveal any
dissimilarity between the fitted and the original classes.
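To illustrate how classified counts like those above arise, here is a minimal Python sketch of the prediction step of a multinomial logistic model: each class gets a linear score, the scores are turned into probabilities with a softmax, and every observation is assigned to its most probable class. The coefficients in `COEFS` are hypothetical values chosen only for illustration, not the coefficients of the fitted model, and the use of sepal length and sepal width as the only predictors is also an assumption.

```python
import math
from collections import Counter

CLASSES = ["setosa", "versicolor", "virginica"]

# HYPOTHETICAL coefficients (intercept, sepal length, sepal width) per class.
# They are NOT the fitted values from the report; setosa is the reference
# class, so all of its coefficients are zero.
COEFS = {
    "setosa": (0.0, 0.0, 0.0),
    "versicolor": (-10.0, 4.0, -4.0),
    "virginica": (-14.0, 4.5, -3.5),
}


def class_probabilities(sepal_length, sepal_width):
    """Softmax of the per-class linear scores."""
    scores = []
    for name in CLASSES:
        intercept, b_length, b_width = COEFS[name]
        scores.append(intercept + b_length * sepal_length + b_width * sepal_width)
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return {name: e / total for name, e in zip(CLASSES, exps)}


def predict(sepal_length, sepal_width):
    """Assign the class with the highest probability."""
    probs = class_probabilities(sepal_length, sepal_width)
    return max(probs, key=probs.get)


# Three iris rows (sepal length, sepal width), one from each species.
observations = [(5.1, 3.5), (5.5, 2.3), (7.1, 3.0)]
predicted = [predict(length, width) for length, width in observations]
counts = Counter(predicted)
print(counts)
```

Tallying the predicted classes over all 150 observations, as `Counter` does here for three rows, is what produces a summary of the form "52 setosa, 47 versicolor, 51 virginica".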
[Figure: density-scatter plot of Sepal.Width against Sepal.Length; legend: model_fit (setosa, versicolor, virginica).]
[Figure: plot of value for Sepal.Length, Sepal.Width, Petal.Length and Petal.Width; legends: Species and model_fit (setosa, versicolor, virginica).]
As the classified density-scatter plot shows, the fitted classification does not exactly reproduce
the original species of the iris dataset.
Some points with high sepal width and low sepal length are classified as versicolor even though
their original species is setosa.
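One way to quantify this kind of disagreement is to cross-tabulate the original species against the fitted class. The Python sketch below counts (original, fitted) pairs; the label lists are hypothetical and far shorter than the real 150 observations, and the function name `confusion_counts` is an illustrative choice.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count (original, fitted) label pairs; off-diagonal pairs are misclassifications."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


# HYPOTHETICAL labels for illustration only (not the report's results).
true_labels = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted_labels = ["setosa", "versicolor", "versicolor", "virginica", "virginica", "virginica"]

table = confusion_counts(true_labels, predicted_labels)
misclassified = {pair: n for pair, n in table.items() if pair[0] != pair[1]}

for (original, fitted), n in sorted(misclassified.items()):
    print(f"{n} observation(s) of {original} classified as {fitted}")
```

A setosa point classified as versicolor, as described above, would appear in the ("setosa", "versicolor") cell of such a table.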
             Sepal Width   Sepal Length   (model) Sepal Width   (model) Sepal Length
setosa              3.43           5.01                  3.43                   5.01
versicolor          2.77           5.94                  2.77                   5.88
virginica           2.97           6.59                  2.97                   6.63
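To show the difference, the table above summarizes the mean sepal width and mean sepal length of the
3 species, for the original classes and for the classes fitted by the model.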
Cross validation: Predicting & Validating the model
ROC performance
[Figure: pairwise ROC curves (Sensitivity against Specificity) for setosa/versicolor, setosa/virginica and versicolor/virginica.]
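The report does not show how the pairwise curves were computed. As a minimal sketch under that caveat, the Python snippet below computes (specificity, sensitivity) points for one pair of classes over all score thresholds, together with the area under the curve as the proportion of positive/negative pairs in which the positive observation scores higher. The scores and labels are hypothetical, and the function names are illustrative.

```python
def roc_points(scores, labels):
    """Return (specificity, sensitivity) pairs, one per distinct threshold.

    An observation is predicted positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
        true_neg = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
        points.append((true_neg / negatives, true_pos / positives))
    return points


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# HYPOTHETICAL predicted probabilities for one class pair; True marks the
# class treated as positive. These are not the report's values.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

points = roc_points(scores, labels)
area = auc(scores, labels)
print(points)
print(round(area, 3))
```

Repeating this for each of the three class pairs gives one curve per pair; two classes that the model separates perfectly have an area of 1.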