Review 2

COSC 6335
Review 2 on Thursday, October 31
Dr. Eick
1. Decision Trees/Classification
a) Compute the GINI-gain (GINI before the split minus GINI after the split) for the
following decision tree split [4] (just giving the formula is fine!):
A node with class distribution (5,3,2) is split into two nodes with
distributions (4,0,2) and (1,3,0).
GINI-gain = GINIbefore - GINIafter = G(5/10,3/10,2/10) - (0.6*G(2/3,0,1/3) + 0.4*G(1/4,3/4,0))
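As a worked check, here is a minimal Python sketch that evaluates this formula
numerically (the gini helper is written out for illustration, not taken from a library):

def gini(*p):
    # GINI index of a node with class proportions p: 1 minus the sum of squared proportions
    return 1.0 - sum(q * q for q in p)

# parent (5,3,2); children (4,0,2) with weight 6/10 and (1,3,0) with weight 4/10
gini_before = gini(5/10, 3/10, 2/10)
gini_after = 0.6 * gini(2/3, 0, 1/3) + 0.4 * gini(1/4, 3/4, 0)
print(gini_before - gini_after)  # GINI gain, approximately 0.203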
b) Assume you learn a decision tree for a dataset that solely contains numerical
attributes. What can be said about the shape of the decision boundaries that the learnt
decision tree model employs? [2]
Axis-parallel lines or line segments (axis-parallel hyperplanes if there are more
than two attributes)
c) Are decision trees capable of modeling the 'either-or' operator; for example,
IF EITHER A OR B THEN class1 ELSE class2?
Give a reason for your answer! [3]
Solution given on the whiteboard during lecture!
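The whiteboard solution is not reproduced here; as an illustrative sketch, the answer
is yes: a tree of depth 2 suffices, testing A at the root and testing B in each of the
two branches. A quick scikit-learn check on the exclusive-or truth table (the 0/1
encoding and the class labels below are assumptions made for the demo):

from sklearn.tree import DecisionTreeClassifier

# exclusive-or truth table: class1 iff exactly one of A, B holds
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = ["class2", "class1", "class1", "class2"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict(X))   # reproduces y exactly
print(clf.get_depth())  # 2: test A at the root, then B in each branch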
d) What are the characteristics of over-fitting when learning decision trees? What can be
done to deal with over-fitting? [3]
Overfitting: training error is low [0.5], testing error is not optimal [0.5],
the model is too complex: the decision tree has too many nodes [1]
What can be done to deal with it?
1. Increase the degree of pruning in the decision tree learning
algorithm to obtain smaller decision trees [2]
2. Increase the number of training examples [1]
Other answers might exist which might deserve some credit!
e) Are decision trees suitable for classification problems involving continuous attributes
when classes have multi-modal (http://en.wikipedia.org/wiki/Multimodal)
distributions? Give reasons for your answer!
Yes, decision trees can learn disjunctive concepts and can deal with multi-modal
classes, as follows: each path in the decision tree to a leaf node identifies a patch
in the attribute space where the decision tree model predicts the class of that leaf
node; multi-modal distributions for a class C can be captured by using multiple
patches with leaf nodes that predict C.
f) Most machine learning approaches use training sets, test sets and validation sets to
derive models. Describe the role each of the three sets plays! [4]
Training set: used to learn the model [1.5]
Test set: used to evaluate the model, particularly its accuracy [1.5]
Validation set: used to determine the “best” input parameter(s) for the algorithm
which learns the model; e.g. parameters which control the degree of pruning of a
decision tree learning algorithm. [2]
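A minimal sketch of how the three sets work together (the iris data, the 60/20/20
split, and the ccp_alpha pruning grid are illustrative assumptions, not part of the
review):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 60% training, 20% validation, 20% test (proportions chosen for illustration)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# the training set learns candidate models; the validation set picks the
# pruning strength (the model-selection parameter)
best_model = max(
    (DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_train, y_train)
     for a in (0.0, 0.01, 0.05)),
    key=lambda m: m.score(X_val, y_val),
)

# the test set is used exactly once, to estimate the accuracy of the final model
print(best_model.score(X_test, y_test))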
2. APRIORI
a) What is the APRIORI property?
XYi(X)≥i(Y)
b) Assume the APRIORI algorithm identified the following 7 frequent 4-itemsets that
satisfy a user-given support threshold: abcd, acde, acdf, acdg, adfg, bcde, and
bcdf. What initial candidate 5-itemsets are created by the APRIORI algorithm, and
which of those survive subset pruning?
acdef, acdeg, acdfg, bcdef (each formed by joining two frequent 4-itemsets that
agree on their first three items)
All four candidate 5-itemsets are pruned; each has at least one 4-subset that is
not frequent (e.g. acdef is pruned because adef is not frequent).
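A self-contained Python sketch of this join-and-prune step, using the seven
4-itemsets from the question (itemsets are encoded as sorted strings for brevity):

from itertools import combinations

frequent4 = {"abcd", "acde", "acdf", "acdg", "adfg", "bcde", "bcdf"}

# join step: merge two 4-itemsets that share their first three items
candidates = {a + b[-1]
              for a in frequent4 for b in frequent4
              if a[:-1] == b[:-1] and a[-1] < b[-1]}
print(sorted(candidates))  # ['acdef', 'acdeg', 'acdfg', 'bcdef']

# prune step: drop any candidate with an infrequent 4-subset
survivors = [c for c in candidates
             if all("".join(s) in frequent4 for s in combinations(c, 4))]
print(survivors)  # [] -- all four candidates are pruned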
c) What is the final result that the APRIORI algorithm computes?
Let D be the set of items, σ be the support threshold for which APRIORI is run,
and T be the transaction database; APRIORI computes
{Y ⊆ D | support(Y,T) ≥ σ}
d) Assume an association rule "if smoke then cancer" has a confidence of 86% and a
high lift of 5.4. What does this tell you about the relationship between smoking and
cancer?
P(Cancer|Smoke) = P(Cancer and Smoke)/P(Smoke) = 0.86 ("your classmate Arko
Barman was correct that we have to divide by P(Smoke), as we make a
statement about people who smoke")
P(Cancer|Smoke)/P(Cancer) = lift = 5.4; smokers are 5.4 times more likely to get
cancer than the population as a whole (which implies P(Cancer) = 0.86/5.4 ≈ 0.16).
e) What are the main differences between the APRIORI algorithm and GSP (the
"apriori"-like algorithm which generalizes APRIORI to mine sequences)?
1. The order of items matters in sequences, but not in sets → more patterns
2. APRIORI is based on sets/subsets whereas GSP is based on sequences/subsequences
3. …
3. A little more on clustering
a) Which of the following cluster shapes is K-means capable of discovering? i) triangles
ii) clusters inside clusters iii) the letter 'T' iv) any polygon of 5 points v) the letter 'I'
i) yes; ii) no; iii) no; iv) not if the polygon is concave (K-means clusters are
convex); v) yes
b) What are the weaknesses of the DBSCAN clustering algorithm?
a. Does not work well for high dimensional datasets
b. Parameter selection is difficult
c. Not very fast; O(n*log(n)) at best; O(n**2) for most implementations
d. Problems dealing with clusters with varying densities.
e. …
4. Exploratory Data Analysis
a) Assume attribute A has a correlation of -0.95 with attribute B; what does this say about
the relationship of the two attributes?
Strong negative linear relationship: if A goes up, B goes down, and vice versa.
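A quick numerical illustration with numpy (the data values are made up so that the
correlation comes out strongly negative):

import numpy as np

A = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
B = np.array([9.8, 8.3, 6.0, 5.1, 2.9, 1.5])  # made-up values that fall as A rises
print(np.corrcoef(A, B)[0, 1])  # close to -1: strong negative linear relation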
b) Assume you have a dataset with 3 attributes and the entries of the covariance matrix
have positive numbers in the diagonal, but all other entries are 0. What does this say
about the relationship of the 3 attributes?
The three attributes are pairwise uncorrelated: there is no linear relationship
between any pair of attributes.
Interpret the following histogram for the body weight in a group of cancer patients!
[Histogram not reproduced.]
Two peaks around body weights of 63kg and 78kg [2]
Median around 70kg [0.5]
No gaps, or two small gaps at 112 and 118; not significantly skewed [2]
At most 4 points; other observations might deserve credit!
c) Assume a boxplot you create for attribute A does not show any outliers; what does
this mean? Assume that the value of the 25th percentile is 2 and the value of the 75th
percentile is 7; that is, the box goes from 2 to 7.
Only points that are more than 1.5*IQR above/below the upper/lower box boundary
are visualized as outliers. Here IQR = 7-2 = 5, so 1.5*IQR = 7.5; the absence of
outliers means there are no values lower than 2-7.5 = -5.5 and no values higher
than 7+7.5 = 14.5.
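The fence arithmetic as a tiny Python check (q1 and q3 are the box boundaries from
the question):

q1, q3 = 2.0, 7.0               # box boundaries from the question
iqr = q3 - q1                   # inter-quartile range: 5.0
lower_fence = q1 - 1.5 * iqr    # -5.5
upper_fence = q3 + 1.5 * iqr    # 14.5
# only values outside [lower_fence, upper_fence] would be drawn as outliers
print(lower_fence, upper_fence)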