Uploaded by AJ

Cluster wise regression

Bonus Assignment 1
Result of Cluster wise regression R code execution.
For this assignment, I used the flexmix package to perform cluster-wise
regression on the car.test.frame data from the rpart package. I first fit a cluster-wise regression
with a single cluster, and then with 2, 3, 4, and finally 5 clusters. I stopped at 5 because, among
the models fitted, the AIC and BIC values are lowest with 5 clusters. The model fits are shown
below.
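As a check on that comparison, the AIC and BIC values can be recomputed from the log-likelihood and degrees of freedom that each summary reports, using AIC = -2 logLik + 2 df and BIC = -2 logLik + log(n) df with n = 60 observations. The following is a minimal base-R sketch; the names k, loglik, df, aic, bic, and comparison are illustrative and were not part of the original session, and the numbers are copied from the output in this document.

```r
# Log-likelihoods and degrees of freedom copied from the summary() output
# in this document, for the requested k = 1, ..., 5.
k      <- 1:5
loglik <- c(-555.404, -538.7509, -528.0994, -527.0793, -490.3728)
df     <- c(6, 13, 20, 20, 34)
n      <- 60  # number of rows in car.test.frame

# Information criteria from their definitions
aic <- -2 * loglik + 2 * df
bic <- -2 * loglik + log(n) * df

comparison <- data.frame(k = k, logLik = loglik, df = df,
                         AIC = round(aic, 3), BIC = round(bic, 3))
print(comparison)

# Requested k with the smallest value of each criterion
best_aic <- k[which.min(aic)]
best_bic <- k[which.min(bic)]
```

Both criteria pick the 5-cluster fit among those tried. Note that the fits requested with k = 3 and k = 4 report the same df (20), because the k = 4 summary lists only three components.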
The coefficient estimates seem reasonable: an increase in mileage (miles per gallon) translates to
a decrease in price, and an increase in horsepower results in an increase in price.
I used all of the continuous predictors for the cluster-wise regression: Mileage, Weight,
Disp., and HP.
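To make the interpretation of the coefficients concrete, here is a small sketch that plugs the first car in the data set (Mileage 33, Weight 2560, Disp. 97, HP 113, as listed by str() in the output) into the single-cluster coefficients reported by parameters(model1, component=1). The names coefs, car1, predicted, and shift are illustrative only.

```r
# Single-cluster coefficients copied from parameters(model1, component = 1)
coefs <- c("(Intercept)" = 822.181804,
           Mileage       = -165.400112,
           Weight        = 4.602914,
           "Disp."       = -40.567694,
           HP            = 70.908074)

# First row of car.test.frame as shown by str();
# the leading 1 multiplies the intercept.
car1 <- c("(Intercept)" = 1, Mileage = 33, Weight = 2560,
          "Disp." = 97, HP = 113)

# Fitted price is the inner product of coefficients and predictor values
predicted <- sum(coefs * car1[names(coefs)])

# Raising Mileage by one unit, all else fixed, shifts the fitted price
# by exactly the Mileage coefficient.
car1_plus <- car1
car1_plus["Mileage"] <- car1["Mileage"] + 1
shift <- sum(coefs * car1_plus[names(coefs)]) - predicted
```

In a mixture with several clusters the same calculation applies within each component, using that component's coefficients.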
> library(ISLR)
> library(flexmix)
> library(rpart)
>
> cartestframe <- car.test.frame
> str(cartestframe)
'data.frame': 60 obs. of 8 variables:
 $ Price      : int 8895 7402 6319 6635 6599 8672 7399 7254 9599 5866 ...
 $ Country    : Factor w/ 8 levels "France","Germany",..: 8 8 5 4 3 6 4 5 3 3 ...
 $ Reliability: int 4 2 4 5 5 4 5 1 5 NA ...
 $ Mileage    : int 33 33 37 32 32 26 33 28 25 34 ...
 $ Type       : Factor w/ 6 levels "Compact","Large",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Weight     : int 2560 2345 1845 2260 2440 2285 2275 2350 2295 1900 ...
 $ Disp.      : int 97 114 81 91 113 97 97 98 109 73 ...
 $ HP         : int 113 90 63 92 103 82 90 74 90 73 ...
>
> #Model with 1 cluster
> set.seed(1)
> model1 <- flexmix(Price ~ Mileage + Weight + Disp. + HP, data=cartestframe, k=1)
> summary(model1)
Call:
flexmix(formula = Price ~ Mileage + Weight + Disp. + HP, data = cartestframe,
k = 1)
prior size post>0 ratio
Comp.1 1 60 60 1
'log Lik.' -555.404 (df=6)
AIC: 1122.808 BIC: 1135.374
> parameters(model1, component=1)
                      Comp.1
coef.(Intercept)  822.181804
coef.Mileage     -165.400112
coef.Weight         4.602914
coef.Disp.        -40.567694
coef.HP            70.908074
sigma            2642.448365
>
> #Model with 2 clusters
> set.seed(1)
> model2 <- flexmix(Price ~ Mileage + Weight + Disp. + HP, data=cartestframe, k=2)
> summary(model2)
Call:
flexmix(formula = Price ~ Mileage + Weight + Disp. + HP, data = cartestframe,
k = 2)
prior size post>0 ratio
Comp.1 0.7 46 55 0.836
Comp.2 0.3 14 55 0.255
'log Lik.' -538.7509 (df=13)
AIC: 1103.502 BIC: 1130.728
>
> #Model with 3 clusters
> set.seed(1)
> model3 <- flexmix(Price ~ Mileage + Weight + Disp. + HP, data=cartestframe, k=3)
> summary(model3)
Call:
flexmix(formula = Price ~ Mileage + Weight + Disp. + HP, data = cartestframe,
k = 3)
prior size post>0 ratio
Comp.1 0.150 7 37 0.189
Comp.2 0.472 32 52 0.615
Comp.3 0.378 21 47 0.447
'log Lik.' -528.0994 (df=20)
AIC: 1096.199 BIC: 1138.086
>
> #Model with 4 clusters
> set.seed(1)
> model4 <- flexmix(Price ~ Mileage + Weight + Disp. + HP, data=cartestframe, k=4)
> summary(model4)
Call:
flexmix(formula = Price ~ Mileage + Weight + Disp. + HP, data = cartestframe,
k = 4)
prior size post>0 ratio
Comp.1 0.387 29 42 0.690
Comp.2 0.400 21 52 0.404
Comp.3 0.213 10 41 0.244
'log Lik.' -527.0793 (df=20)
AIC: 1094.159 BIC: 1136.046
>
> #Model with 5 clusters. Lowest AIC and BIC values compared to the cluster-wise regressions above
> set.seed(1)
> model5 <- flexmix(Price ~ Mileage + Weight + Disp. + HP, data=cartestframe, k=5)
> summary(model5)
Call:
flexmix(formula = Price ~ Mileage + Weight + Disp. + HP, data = cartestframe,
k = 5)
prior size post>0 ratio
Comp.1 0.109 7 7 1.000
Comp.2 0.307 17 37 0.459
Comp.3 0.137 9 12 0.750
Comp.4 0.174 9 42 0.214
Comp.5 0.272 18 28 0.643
'log Lik.' -490.3728 (df=34)
AIC: 1048.746 BIC: 1119.953
>
> #The parameters are consistent with the expected relationships.
> #For example, the coefficient of Mileage is negative in all 5 clusters, meaning that
> #higher gas mileage is associated with a lower vehicle price.
> parameters(model5, component=1)
                        Comp.1
coef.(Intercept) 24504.8392315
coef.Mileage      -788.9528115
coef.Weight          0.5570891
coef.Disp.         -75.3209703
coef.HP            140.8121374
sigma                8.9184939
> parameters(model5, component=2)
                      Comp.2
coef.(Intercept) 6916.112200
coef.Mileage     -223.699851
coef.Weight         3.995574
coef.Disp.        -13.787570
coef.HP             2.370265
sigma             351.799042
> parameters(model5, component=3)
                        Comp.3
coef.(Intercept)  1.021417e+04
coef.Mileage     -1.604458e+02
coef.Weight      -7.851282e-02
coef.Disp.       -1.899640e+01
coef.HP           5.976686e+01
sigma             8.845261e+01
> parameters(model5, component=4)
                       Comp.4
coef.(Intercept) 6301.9717124
coef.Mileage     -668.1869203
coef.Weight        18.4157791
coef.Disp.       -182.2142509
coef.HP            -0.1179906
sigma            1675.9284938
> parameters(model5, component=5)
                       Comp.5
coef.(Intercept) 12014.705768
coef.Mileage      -431.141374
coef.Weight          1.340096
coef.Disp.          -1.676479
coef.HP             60.965397
sigma              275.968118
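
The fits above each come from a single EM run started after set.seed(1), and EM results can depend on the starting values. As a hedged sketch (not what was run for this assignment), the same comparison could be repeated with several random restarts per k using stepFlexmix() and getModel() from flexmix. The object names fits and best and the choice nrep = 5 are assumptions made for illustration, and the results will generally differ from the single-run output shown here.

```r
library(flexmix)
library(rpart)   # provides car.test.frame

set.seed(1)
# For each k in 1..5, run EM from nrep random starts and keep the
# fit with the highest log-likelihood.
fits <- stepFlexmix(Price ~ Mileage + Weight + Disp. + HP,
                    data = car.test.frame, k = 1:5, nrep = 5,
                    verbose = FALSE)
print(fits)   # per-k table that includes logLik, AIC and BIC

# Select the fit with the smallest BIC and inspect it
best <- getModel(fits, which = "BIC")
summary(best)

# Number of cars assigned to each component (hard assignment)
table(clusters(best))
```

Restarting from several initial values reduces the chance that the comparison across k reflects a poor local optimum rather than the number of clusters.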