Total Score: 58 out of 60
Nancy McDonald
CSIS 5420 Homework Week 4

Score 10 out of 10
1. Compute the numeric attribute significance score, P, for the two numeric attributes having the highest attribute significance scores as determined by ESX.

The two attributes with the highest ESX attribute significance scores are attribute 8 (significance score = .65) and attribute 11 (significance score = .82).
Note: variance = standard deviation squared. Total instances for Class + is 307; for Class - is 383.

For attribute 8:
P = abs(3.43 - 1.26) / sqrt(4.12^2/307 + 2.12^2/383)
P = 2.17 / sqrt(.0553 + .0117) = 2.17 / .2588
P = 8.385

For attribute 11:
P = abs(4.61 - .63) / sqrt(6.32^2/307 + 1.9^2/383)
P = 3.98 / sqrt(.13 + .0094) = 3.98 / .3734
P = 10.66

An attribute is considered significant if P >= 2. Therefore, both attributes are significant. (See the Python check after question 5.)

Good work

Score 10 out of 10
2. The linear correlation between the attributes #colored vessels and thal is .254651. This is closer to 0 than to 1 or -1, so my initial reaction is that there is no correlation between the two. Continuing on to create a scatterplot to look for a curvilinear relationship, I found the chart to be disjointed (scattered). Therefore, I conclude that the two attributes are not related. (See the correlation sketch after question 5.)

Good work

Score 10 out of 10
3.a. Quick mine option:

         1      2
    1   753    235
    2   946   1125

Percent correct: 61%
Error upper bound: 40.3%
Error lower bound: 37.7%

b. Full mine (not quick) option:

         1      2
    1   718    198
    2   981   1162

Percent correct: 61%
Error upper bound: 40.8%
Error lower bound: 37.2%

c. Since the results of the two mining runs above were so close, I performed another quick mine with the following results:

Quick mine option (2nd time):

         1      2
    1   706    233
    2   993   1127

Percent correct: 59%
Error upper bound: 42.3%
Error lower bound: 39.7%

The results of the second quick mine are a little further apart than those of the first quick mine, but all in all I concluded that the quick mine sessions were just about as accurate in classifying the test data as the full-blown data mining session. (The percent-correct figures can be recomputed from the confusion matrices; see the sketch after question 5.)

Good

Score 10 out of 10
4. Working with CardiologyCategorical.xls:

c. Test set accuracy when using thal (categorical) as an output and #colored vessels (numerical) as input: 19%. This poor accuracy on the test set leads me to believe that using only #colored vessels as a numerical input to classify thal is of little predictive value.

Good

d. Test set accuracy when using chest pain type (categorical) as an output and maximum heart rate (numerical) as input: 52%. While this test set accuracy is better than the previous try, 52% is still not a good predictor of chest pain type. That is, its predictive value is better than using #colored vessels to predict thal, but I don't consider it to have great predictive value.

Good

e. When applying supervised learning to locate redundant attributes, keep like types together. In other words, don't use numerical inputs to determine categorical outputs, and vice versa. Advantage: more accurate results in determining whether an attribute has high or low predictive value for another (output) attribute. Disadvantage: some numerical attributes may be highly predictive for categorical attributes, and we lose the ability to locate redundant data by not using them.

Good work

Score 10 out of 10
5.a. Convert $40 to a value between 0 and 1 using the 1-year stock price range of $20-$50 (low = 20, high = 50):

price to be converted = 40
new value = (40 - 20) / (50 - 20) = 20/30 = .67

Good

b. The neural network predicts a future price value of .3. Convert to an understandable price:

price = (50 - 20)(.3) + 20 = 9 + 20 = $29

(See the normalization sketch below.)

Good
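
Check for question 1: the significance score arithmetic can be verified with a few lines of Python. This is a minimal sketch, assuming only the class means, standard deviations, and instance counts quoted above, with P = abs(mean1 - mean2) / sqrt(sd1^2/n1 + sd2^2/n2).

    import math

    def significance_score(mean1, sd1, n1, mean2, sd2, n2):
        # P = |mean1 - mean2| / sqrt(var1/n1 + var2/n2), where variance = sd squared
        return abs(mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

    # Attribute 8: Class + mean 3.43, sd 4.12 (n = 307); Class - mean 1.26, sd 2.12 (n = 383)
    print(significance_score(3.43, 4.12, 307, 1.26, 2.12, 383))  # about 8.38 (hand rounding gives 8.385)

    # Attribute 11: Class + mean 4.61, sd 6.32 (n = 307); Class - mean .63, sd 1.9 (n = 383)
    print(significance_score(4.61, 6.32, 307, 0.63, 1.90, 383))  # about 10.65 (hand rounding gives 10.66)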
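
Check for question 2: the linear correlation and scatterplot can be reproduced along these lines. The arrays below are placeholders, not the actual cardiology data; in practice the two columns would be read from the spreadsheet.

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholder values standing in for the "#colored vessels" and "thal" columns
    colored_vessels = np.array([0, 3, 2, 0, 0, 1, 2, 0, 3, 1], dtype=float)
    thal = np.array([3, 7, 7, 3, 3, 6, 7, 3, 7, 3], dtype=float)

    r = np.corrcoef(colored_vessels, thal)[0, 1]  # Pearson linear correlation
    print(f"linear correlation = {r:.6f}")

    # Scatterplot to inspect for a curvilinear relationship
    plt.scatter(colored_vessels, thal)
    plt.xlabel("#colored vessels")
    plt.ylabel("thal")
    plt.show()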
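
Check for question 3: the percent-correct figures can be verified directly from each confusion matrix (sum of the diagonal divided by total instances). The error bounds are reported by the mining tool and are not recomputed here.

    def percent_correct(matrix):
        # matrix[i][j] = instances of class i assigned to computed class j;
        # correct classifications lie on the diagonal regardless of orientation
        correct = sum(matrix[i][i] for i in range(len(matrix)))
        total = sum(sum(row) for row in matrix)
        return 100.0 * correct / total

    print(percent_correct([[753, 235], [946, 1125]]))  # first quick mine, about 61%
    print(percent_correct([[718, 198], [981, 1162]]))  # full mine, about 61%
    print(percent_correct([[706, 233], [993, 1127]]))  # second quick mine (tool reported 59%)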
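
Check for question 5: this is min-max normalization and its inverse; a minimal sketch, assuming the $20-$50 one-year range given above.

    LOW, HIGH = 20.0, 50.0  # one-year stock price range

    def normalize(price):
        # map a price in [LOW, HIGH] to a value in [0, 1]
        return (price - LOW) / (HIGH - LOW)

    def denormalize(value):
        # map a network output in [0, 1] back to a dollar price
        return (HIGH - LOW) * value + LOW

    print(normalize(40))     # 0.666..., about .67
    print(denormalize(0.3))  # 29.0, i.e. $29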
Score 8 out of 10
6. Apply input instance [.5, .2, 1] to the feed-forward neural network on page 247:

a. Compute the input to nodes i and j:
Input to i = .5*.2 + .2*(-.1) + 1*.2 = .1 - .02 + .2 = .28
Input to j = .5*.1 + .2*.3 + 1*(-.1) = .05 + .06 - .1 = .01

b. Use the sigmoid function to compute the outputs of nodes i and j:
Output of i = 1/(1 + e^-.28) = 1/(1 + .7558) = .5695
Output of j = 1/(1 + e^-.01) = 1/(1 + .95) = .5128

c. Compute the input and output values for node k:
Input to k = .5*.5695 + .1*.5128 = .28475 + .05128 = .336
Output of k = 1/(1 + e^-.336) = 1/(1 + .7146) = .56323

Grader's corrected values:
a. Input to node j = 0.06, input to node i = 0.23
b. Output from node j = 0.514995, output from node i = 0.442752
c. Input to node k = 0.2728755, output from node k = 0.432201
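
The question 6 calculation follows the usual feed-forward pattern: each hidden node computes a weighted sum of the inputs and passes it through the sigmoid, and node k does the same with the hidden-node outputs. The sketch below uses the weight values that appear in the hand calculation above; the actual weights belong to the textbook figure on page 247 and may differ (the grader's corrected values suggest they do), so this illustrates the computation pattern rather than the official answer.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    inputs = [0.5, 0.2, 1.0]  # input instance [.5, .2, 1]

    # Weights as used in the hand calculation above; the weights in the
    # page 247 network diagram may differ.
    w_i = [0.2, -0.1, 0.2]    # input-layer weights into node i
    w_j = [0.1, 0.3, -0.1]    # input-layer weights into node j
    w_k = [0.5, 0.1]          # weights from nodes i and j into node k

    input_i = sum(x * w for x, w in zip(inputs, w_i))
    input_j = sum(x * w for x, w in zip(inputs, w_j))
    out_i, out_j = sigmoid(input_i), sigmoid(input_j)

    input_k = w_k[0] * out_i + w_k[1] * out_j
    out_k = sigmoid(input_k)

    print(input_i, input_j)  # 0.28, 0.01
    print(out_i, out_j)      # about 0.5695, 0.5025
    print(input_k, out_k)    # about 0.335, 0.583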