Errata for Data Mining: Introductory and Advanced Topics by Margaret H. Dunham
Updated 3/3/05.

Chapter 1:

Chapter 2:

Chapter 3:
Example 3.3, p54: Formula for P(h1 | x4) has an extra “)”. It should read: (P(x4 | h1) P(h1)) / P(x4).

Chapter 4:
Last paragraph on p79: correct fallout/recall as follows (Thanks to Nick Street): “fallout (percentage of irrelevant that are retrieved) versus recall (percentage of relevant that are retrieved).”
Last sentence on the bottom of page 79/top of page 80 should be changed to read: “The curve is constructed by examining tuples classified as relevant in a particular order, such as descending order of similarity.”
Equation 4.26 on p101 should read: H(2/15, 2/15, 3/15, 4/15, 2/15, 2/15).
Corrections to calculations in Example 4.9 on p102 can be found at http://www.engr.smu.edu/~mhd/dmbook/ex49.pdf
Exercise 2 on p121 should read: “that the Output1 column is the correct classification and Output2 is what is seen.”
Exercise 3 on p121 should read: “assuming Output2 is the correct assignment”.
Exercise 7 on p122 should replace <Jim,M,2.0> with <John,M,2.5>.
Exercise 21 on p122 should make guideline plural.

Chapter 5:
P144, total cost complexity for PAM should be k(n-k)**2 (Thanks to Lars Helge Hass).
Example 5.9 on p158 uses a threshold of 0.2 (not 0.6) (Thanks to Aryya Gangopadhyay).

Chapter 6:

Chapter 7:
Page 198, last sentence prior to section 7.2.1 should read: “An alternative markup language such as extensible markup language (XML), provides structured documents and facilitates easier mining.”
Page 212, Table 7.1, 4th row should be labeled as: “Maximal forward references”
Page 212, last line should read: “in Example 7.4 is shown in Example 7.5.”
Page 213, first line after Algorithm 7.2 should read: “… for Example 7.4 is shown …”
Page 215, Example 7.7, second line should read: “data in Example 7.4 …”
Page 215, paragraph label at bottom of page should read: “Maximal Frequent Forward References”
Page 216, second line should read: “… Looking at Example 7.7 and the …”
Page 216, the sequence on lines two and three should be: <A, B, C, A, C, B, C, A, C, D, C, E>
Page 216, the maximal forward references on line four should be: <A,B,C>, <A,C,B>, <A,C,D>, <A,C,E>
Page 216, the first line in second paragraph should end with: “... mine maximal frequent forward references”
Page 216, title of Algorithm 7.3 should be: “Maximal frequent forward references algorithm”
Page 218, Exercise 5, should read: “… indicate the sequential patterns, maximal forward references, and maximal frequent sequences …”
Page 219, the last two lines of Exercise 6 should read: “Identify the maximal frequent forward references when the users can be distinguished as well as when they can not be. Assume that a sequence must occur twice to be large.”

Chapter 8:

Chapter 9:

Appendix A:

Appendix B:

Please let me know of any additional corrections which should be included.

Thanks,
Maggie Dunham
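
Note on the Page 216 corrections: the corrected sequence and the corrected list of maximal forward references can be checked against each other with a short script. The sketch below is not taken from the book; it assumes the common single-pass definition, in which a backward reference (a revisit to a page already on the current path) ends a forward run, the run is reported, and the path is cut back to the revisited page. The function name maximal_forward_references is hypothetical.

```python
def maximal_forward_references(pages):
    """Return the maximal forward references of one traversal sequence.

    Assumption: a page already on the current path counts as a backward
    reference; the path is then truncated back to that page.
    """
    path = []               # current forward path from the starting page
    moving_forward = False  # True if the previous step extended the path
    result = []
    for page in pages:
        if page in path:
            # Backward reference: report the path only if we were just
            # moving forward, so consecutive back-steps report nothing.
            if moving_forward:
                result.append(tuple(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    # The traversal may end while still moving forward.
    if moving_forward and path:
        result.append(tuple(path))
    return result


# Corrected sequence from Page 216.
session = ["A", "B", "C", "A", "C", "B", "C", "A", "C", "D", "C", "E"]
refs = maximal_forward_references(session)
print(refs)
```

Under these assumptions the script reports <A,B,C>, <A,C,B>, <A,C,D>, <A,C,E>, which agrees with the corrected line four.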