McDonald - Homework Week 4

Total Score 58 out of 60
Nancy McDonald
CSIS 5420
Homework Week 4
Score 10 out of 10
1. Compute the numeric attribute significance score, P, for the two numeric attributes
having the highest attribute significance scores as determined by ESX:
The two attributes with the highest ESX attribute significance scores are
attribute 8 (significance score = .65) and attribute 11 (significance score = .82).
Note: variance = standard deviation squared.
Total instances for Class + is 307; for Class - is 383.
For attribute 8:
P = abs(3.43-1.26)/sqrt( (4.12^2/307) + (2.12^2/383))
P = 2.17/sqrt(.0553+.0117) = 2.17/.2588
P = 8.385
For attribute 11:
P = abs(4.61-.63)/sqrt( (6.32^2/307) + (1.9^2/383))
P = 3.98/sqrt(.13+.0094) = 3.98/.3734
P = 10.66
An attribute is significant if its value for P >= 2. Therefore, both attributes are
significant. Good work
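The two calculations above follow the same pattern: the absolute difference of the class means divided by the square root of the sum of each class variance over its instance count. A minimal Python sketch of that arithmetic, using only the means, standard deviations, and class sizes quoted in the answer (the function name `significance_score` is made up for illustration and is not part of ESX):

```python
import math

def significance_score(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """P = |mean_a - mean_b| / sqrt(sd_a^2/n_a + sd_b^2/n_b)."""
    # Variance is the standard deviation squared, as noted in the answer.
    pooled = (sd_a ** 2) / n_a + (sd_b ** 2) / n_b
    return abs(mean_a - mean_b) / math.sqrt(pooled)

# Values copied from the worked answer: Class + has 307 instances, Class - has 383.
p_attr8 = significance_score(3.43, 4.12, 307, 1.26, 2.12, 383)
p_attr11 = significance_score(4.61, 6.32, 307, 0.63, 1.9, 383)

for name, p in (("attribute 8", p_attr8), ("attribute 11", p_attr11)):
    verdict = "significant" if p >= 2 else "not significant"
    print(f"{name}: P = {p:.2f} ({verdict})")
```

Small differences in the last digit relative to the hand calculation come from rounding the intermediate values.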
Score 10 out of 10
2. The linear correlation between the attributes #colored vessels and thal is .254651. This
is closer to 0 than to 1 or –1, so my initial reaction is that there is no correlation between
the two. Continuing on to create a scatterplot diagram to look for a curvilinear
relationship, I found the chart to be disjointed (scattered). Therefore, I conclude that the
two attributes are not related. Good work
Score 10 out of 10
3.a. Quick mine option:
        1      2
1     753    235
2     946   1125
Percent correct: 61%
Error upper bound: 40.3%
Error lower bound: 37.7%
b. Full mine (not quick) option:
        1      2
1     718    198
2     981   1162
Percent correct: 61%
Error upper bound: 40.8%
Error lower bound: 37.2%
c. Since the two above mining tries were so close, I performed another quick mine with
the following results:
Quick mine option (2nd time):
        1      2
1     706    233
2     993   1127
Percent correct: 59%
Error upper bound: 42.3%
Error lower bound: 39.7%
The second quick mine scored slightly lower than the first (59% versus 61%
correct), but overall I concluded that the quick mine sessions were about as accurate in
classifying the test data as the full data mining session. Good
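The percent-correct figures for the three runs can be cross-checked from the confusion matrix counts. The sketch below assumes that the diagonal cells (class 1 classified as 1, class 2 classified as 2) are the correct classifications. It does not try to reproduce the error bounds, because the answer does not state how they were computed. The helper name `percent_correct` is hypothetical.

```python
def percent_correct(matrix):
    """Accuracy (0-100) of a square confusion matrix, assuming the
    diagonal holds the correctly classified instances."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total

# Counts copied from the three mining sessions reported above.
runs = {
    "quick mine (1st)": [[753, 235], [946, 1125]],
    "full mine": [[718, 198], [981, 1162]],
    "quick mine (2nd)": [[706, 233], [993, 1127]],
}

for name, matrix in runs.items():
    print(f"{name}: {percent_correct(matrix):.1f}% correct")
```

All three matrices total 3059 test instances. The second quick mine works out to just under 60%, which the tool output above reports as 59%.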
Score 10 out of 10
4. Working with CardiologyCategorical.xls:
c. Test set accuracy when using thal (categorical) as an output and #colored vessels
(numerical) as input: 19%. This poor test set accuracy leads me to believe that
using only #colored vessels as a numerical input to classify thal is of little predictive
value. Good
d. Test set accuracy when using chest pain type (categorical) as an output and
maximum heart rate (numerical) as input: 52%. This test set accuracy is
better than the previous try, but 52% is still not a good predictor of chest pain type.
That is, its predictive value is better than using #colored vessels to predict thal, but I don’t
consider it to have great predictive value. Good
e. When applying supervised learning to locate redundant attributes, keep like types
together. In other words, don’t use numerical inputs to determine categorical outputs,
and vice versa. Advantage: more accurate results in determining whether an attribute is
of high or little predictive value for another (output) attribute.
Disadvantage: Some numerical attributes may be highly predictive of
categorical attributes, and by not using them we lose the ability to locate redundant data.
Good work
Score 10 out of 10
5.a. Convert $40 to a value between 0 and 1 using the 1-year stock price range of $20-$50
(low = 20, high = 50):
price to be converted = 40
new value = (40-20)/(50-20) = .67 Good
b. The neural network predicts a future price value of .3. Convert it to an understandable price:
price = (50-20)(.3) + 20 = $29. Good
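Parts a and b are the two directions of the same min-max scaling. A short Python sketch of both, using the $20-$50 range from the question (the function names `to_unit_interval` and `from_unit_interval` are chosen here for illustration):

```python
def to_unit_interval(value, low, high):
    """Map a value in [low, high] onto [0, 1]."""
    return (value - low) / (high - low)

def from_unit_interval(scaled, low, high):
    """Map a value in [0, 1] back onto [low, high]."""
    return (high - low) * scaled + low

LOW, HIGH = 20, 50  # 1-year stock price range from the question

scaled_price = to_unit_interval(40, LOW, HIGH)        # part a
predicted_price = from_unit_interval(0.3, LOW, HIGH)  # part b

print(f"scaled value: {scaled_price:.2f}")
print(f"predicted price: ${predicted_price:.2f}")
```

Applying one function to the output of the other returns the starting value, which is a quick way to confirm that the two formulas are inverses.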
Score 8 out of 10
6. Apply input instance [.5, .2, 1] to the feed-forward neural network on page 247:
a. Compute input to nodes i and j:
Input to i = .5*.2 + .2*-.1 + 1*.2 = .1-.02+.2 = .28
Input to j = .5*.1 + .2*.3 + -.1*1 = .05 +.06 - .1 = .01
b. Use the sigmoid function to compute the outputs of nodes i and j:
Output of i = 1/(1+e^-.28) = 1/(1+.7558) = .5695
Output of j = 1/(1+e^-.01) = 1/(1+.95) = .5128
c. Compute input and output values for node k:
Input to k = .5*.5695 + .1*.5128 = .05128 + .28475 = .336
Output of k = 1/(1+e^-.336) = 1/(1+.7146) = .56323
a. Input to node j = 0.06, Input to node i = 0.23
b. Output from node j = 0.514995, Output from node i = 0.442752
c. Input to node k = 0.2728755, Output from node k = 0.432201
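The forward pass in question 6 (weighted sums into the hidden nodes, a sigmoid on each, then a weighted sum and sigmoid at the output node) can be sketched in Python as follows. The figure on page 247 is not reproduced here, so the weights below are an assumption: they are read off the student's written working in parts a and c and may not match how the figure assigns them to nodes i and j. The function and variable names are illustrative only.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_to_i, w_to_j, w_ik, w_jk):
    """One forward pass through a 3-input, 2-hidden-node, 1-output network."""
    net_i = sum(x * w for x, w in zip(inputs, w_to_i))
    net_j = sum(x * w for x, w in zip(inputs, w_to_j))
    out_i, out_j = sigmoid(net_i), sigmoid(net_j)
    net_k = w_ik * out_i + w_jk * out_j
    return {"net_i": net_i, "net_j": net_j, "out_i": out_i,
            "out_j": out_j, "net_k": net_k, "out_k": sigmoid(net_k)}

# ASSUMED weights, taken from the student's working rather than the figure.
result = forward(
    inputs=[0.5, 0.2, 1.0],
    w_to_i=[0.2, -0.1, 0.2],
    w_to_j=[0.1, 0.3, -0.1],
    w_ik=0.5,
    w_jk=0.1,
)

for name, value in result.items():
    print(f"{name} = {value:.6f}")
```

Trying a different assignment of weights to nodes only requires changing the two weight lists, and running the pass gives an independent check of each exponential in the hand calculation.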