Austell - assignment2


Score 7 out of 10

Assignment question 1.

If age <= 43 & sex = female

Then life insurance promotion = yes

6 / 7 = 86%

If age <= 43 & sex = male & credit card insurance = no

Then life insurance promotion = no

4 / 7 = 57%

If age > 43

Then life insurance promotion = no

If age <= 43 & sex = male & credit card insurance = yes

Then life insurance promotion = yes

Simplified rule:

If age <= 43 & credit card insurance = no

Then life insurance promotion = no

4 / 7 = 57%

If the individual is male, we can ignore the attribute AGE.

Score 9 out of 10

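Assignment question 2.

[Decision tree figure not recoverable. Surviving labels: root node age with branches > 43 and <= 43; node and leaf labels yes, no, 3/0, female, 6/0, male, 7.]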

Janet L. Austell

Total Score 80 out of 90


One possibility is to split the Credit Card Insurance No branch on age > 29 and age <= 29.

The two instances following age > 29 will have life insurance promotion = no. The two instances following age <= 29 once again split on attribute age. This time, the split is age <= 27 and age > 27.

Score 10 out of 10

Assignment question 3.

Very good

If magazine promotion = yes

Then watch promotion = no

Confidence 4 / 7 = 57%

Support 4 / 11 = 36%

If credit card insurance = yes

Then life insurance promotion = no

Confidence 5 / 8 = 63%

Support 5 / 11 = 45%

If sex = male

Then credit card insurance = no

Confidence 4 / 6 = 67%

Support 4 / 11 = 36%
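The confidence and support figures above follow directly from instance counts. The sketch below recomputes them, assuming the dataset holds 11 instances, as the support denominators suggest. The function names are illustrative and do not come from the assignment.

```python
def confidence(both: int, antecedent: int) -> float:
    """Fraction of instances matching the antecedent that also match the consequent."""
    return both / antecedent


def support(both: int, total: int) -> float:
    """Fraction of all instances that match both antecedent and consequent."""
    return both / total


# Counts taken from the three rules above: (both, antecedent, total).
rules = {
    "magazine promotion = yes -> watch promotion = no": (4, 7, 11),
    "credit card insurance = yes -> life insurance promotion = no": (5, 8, 11),
    "sex = male -> credit card insurance = no": (4, 6, 11),
}

for name, (both, antecedent, total) in rules.items():
    print(f"{name}: confidence {confidence(both, antecedent):.1%}, "
          f"support {support(both, total):.1%}")
```

Confidence is measured against the instances that satisfy the antecedent, while support is measured against the whole dataset, so the same numerator appears in both ratios.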

Score 9 out of 10

Assignment question 4.

For third iteration

Center of cluster C1 = (1.33, 2.5), C2 = (3.33, 4.00)

The new cluster center for cluster 1 is (1.5,5). The new cluster center for cluster 2 is (4.0,4.25).

1. Distance (c1 – 1) = 1.05 Distance (c2 – 1) = 3.42

2. Distance (c1 – 2) = 2.03 Distance (c2 – 2) = 2.38

3. Distance (c1 – 3) = 1.20 Distance (c2 – 3) = 2.83

4. Distance (c1 – 4) = 1.20 Distance (c2 – 4) = 1.42

5. Distance (c1 – 5) = 1.67 Distance (c2 – 5) = 1.54

6. Distance (c1 – 6) = 5.07 Distance (c2 – 6) = 2.61

c1 contains instances 1, 2, 3, 4, 5

c2 contains instance 6
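Each distance listed for this question is a Euclidean distance between an instance and a cluster center. The sketch below recomputes them, assuming the six instances are the points in `instances`. Those coordinates are not stated in this answer; they are an assumption, chosen because they reproduce the listed distances.

```python
import math

# Assumed instance coordinates (not given in the answer above).
instances = {
    1: (1.0, 1.5),
    2: (1.0, 4.5),
    3: (2.0, 1.5),
    4: (2.0, 3.5),
    5: (3.0, 2.5),
    6: (5.0, 6.0),
}

# Cluster centers stated for the third iteration.
c1 = (1.33, 2.5)
c2 = (3.33, 4.00)


def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


for number, point in instances.items():
    d1 = euclidean(c1, point)
    d2 = euclidean(c2, point)
    print(f"{number}. Distance (c1 - {number}) = {d1:.2f}  "
          f"Distance (c2 - {number}) = {d2:.2f}")
```

In K-Means, each instance is then assigned to the center with the smaller of its two distances.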

Score 10 out of 10

Assignment question 5.

Good

10

~Line is blank

18

~Bad numerical data for attribute Age


Score 10 out of 10

Assignment question 6. a.

Two classes:

53% are male

Very good

These males are less likely to participate in a promotional purchase

47% are female

These females are more likely to participate in a promotional purchase

b.

Males predictability:

50% of these males do not participate in Magazine Promo

50% of these males do not participate in Watch Promo

63% of these males do not participate in Life Ins Promo

75% of these males do not participate in Credit Card Ins.

Males predictiveness:

With a certainty of 57% we can predict that those that do not participate in Magazine Promo are male

With a certainty of 57% we can predict that those that do not participate in Watch Promo are male

With a certainty of 83% we can predict those that do not participate in Life Ins Promo are male

With a certainty of 50% we can predict that those that do not participate in Credit Card Ins are male
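Predictability and predictiveness share a numerator but divide by different totals: predictability divides by the class size, and predictiveness divides by the number of instances that have the attribute value. The sketch below works this through for Life Ins Promo = No in the male class. The counts 5, 8, and 6 are assumptions inferred from the 63% and 83% figures; they are not stated in this part of the answer.

```python
def predictability(in_class_with_value: int, class_size: int) -> float:
    """P(attribute value | class): share of the class that has the value."""
    return in_class_with_value / class_size


def predictiveness(in_class_with_value: int, total_with_value: int) -> float:
    """P(class | attribute value): share of instances with the value that fall in the class."""
    return in_class_with_value / total_with_value


# Assumed counts: 5 males with Life Ins Promo = No, 8 males in the class,
# and 6 instances overall with Life Ins Promo = No.
males_no_life_ins = 5
males_total = 8
no_life_ins_total = 6

print(f"predictability: {predictability(males_no_life_ins, males_total):.1%}")
print(f"predictiveness: {predictiveness(males_no_life_ins, no_life_ins_total):.1%}")
```

A value can be highly predictive of a class without being common inside it, which is why the two scores are reported separately.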

Females predictability:

57% of these females participate in Magazine Promo

50% of these females participate in Watch Promo

86% of these females participate in Life Ins Promo

86% of these females do not participate in Credit Card Ins.

Females predictiveness:

With a certainty of 50% we can predict that those that participate in Magazine Promo are female

With a certainty of 50% we can predict that those that participate in Watch Promo are female

With a certainty of 67% we can predict that those that participate in Life Ins Promo are female

With a certainty of 50% we can predict that those that do not participate in Credit Card Ins are female

c.

*******************************

Rules for Class Male

8 instances

*******************************

Life Ins Promo = No

:rule accuracy 83.33%

:rule coverage 62.50%


**Total Percent Coverage = 62.50%

*******************************

Rules for Class Female

7 instances

*******************************

38.00 <= Age <= 41.00

:rule accuracy 100.00%

:rule coverage 57.14%

38.00 <= Age <= 41.00

and Life Ins Promo = Yes

:rule accuracy 100.00%

:rule coverage 57.14%

**Total Percent Coverage = 57.14%

Score 10 out of 10 Very good

Assignment question 7. a.

Res. Score:

Class Sick

0.553

Class Healthy

0.581

Domain

0.52

b.

Male 140 of 203 = 69%

Female 63 of 203 = 31%

c. Flat

d. 51.945

e. Normal

f. blood pressure

125

130

g. 16 of 93 = 17%

Female 16 0.17

h. With a certainty of 75% I can predict that patients with the condition angina are sick

i. #colored vessels

j. 82%

k. We have 95% confidence that the test instance is classed correctly

l. 13

m. thal = Rev and chest pain type = Asymptomatic

:rule accuracy 91.53%

# Covered = 54


# remaining= 39

**Total Percent Coverage = 58.06%

Score 8 out of 10

Assignment question 8. j.

138 sick instances exist. 45% of all the test instances are classed as sick. This has an error rate of 49.3% to 60.7%. Accuracy rate 81%

k.

165 healthy instances exist. 54% of all test instances are classed as healthy. This has an error rate of 51.7% to 54%. Accuracy rate 34%

Score 7 out of 10

Assignment question 9. a.

Yes Age and Sex

The predictiveness score for sex = female is 0.81 for the survivors. The predictiveness score for sex = male is 0.77 for the non-survivors. For the non-survivors, class = third has a predictiveness score of 0.72 and class = crew has a predictiveness score of 0.76.

b.

77% Good

c.

A 95% test set accuracy would result in a coverage of 0.0%. The error rate is 19.8% to 26.2%; therefore an accuracy of 95% is unachievable.

The lower-bound accuracy is 73.8%. The upper-bound accuracy is 80.2%.
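The accuracy bounds above are the complement of the error-rate bounds. The sketch below shows that conversion, along with one common way to obtain such an interval: the error rate plus or minus two standard errors. The function names are illustrative, the two-standard-error formula is an assumption about how the interval was produced, and the test set size of 100 in the last line is hypothetical.

```python
import math


def accuracy_bounds(error_low: float, error_high: float) -> tuple[float, float]:
    """Convert an error-rate interval into the matching accuracy interval."""
    return 1.0 - error_high, 1.0 - error_low


def error_interval(error_rate: float, n: int) -> tuple[float, float]:
    """Approximate 95% interval: error rate +/- 2 standard errors (assumed formula)."""
    standard_error = math.sqrt(error_rate * (1.0 - error_rate) / n)
    return error_rate - 2 * standard_error, error_rate + 2 * standard_error


# Error-rate interval stated in the answer above.
low, high = accuracy_bounds(0.198, 0.262)
print(f"accuracy between {low:.1%} and {high:.1%}")

# Hypothetical example: 20% error rate measured on a 100-instance test set.
print(error_interval(0.20, 100))
```

Because the upper accuracy bound stays well below 95%, no instance can be classified with 95% confidence, which is consistent with the reported coverage of zero.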

d.

The example in section 4.8 uses randomly selected test data.

The test set for the example in section 4.8 contains 190 non-survivors and 10 survivors. That is, 95% of the test data holds non-survivor instances. The test data does not reflect the ratio of survivors to non-survivors seen in the entire dataset. The test set for this problem contains 77% non-survivors and 23% survivors. This non-survivor to survivor ratio more closely matches the ratio seen in the entire domain of data instances.
