Spring 2007
Data Mining

Dr. Xiaoyan Li
Department of Computer Science
Mount Holyoke College
email: xli@MtHolyoke.edu
phone: (413) 538-2554

Course Description

Data mining has become one of the most exciting and fastest-growing fields in computer science. It refers to the techniques used to uncover hidden information in a database. The data to be mined may be complex, multimedia data, including text, graphics, video, audio and bioinformatics data. Data mining has evolved from several areas, including databases, artificial intelligence, machine learning, pattern recognition and multimedia information retrieval, and it can be applied to explore hidden information in web, video and bioinformatics data. This course is designed to give senior undergraduate students an introduction to data mining concepts and tools. Related topics such as information retrieval, web mining and bioinformatics will also be covered.

Prerequisites: CS 211 and CS 221, or permission of instructor
Notes: 2 meetings (75 minutes)

Course Syllabus

Part I. Introduction and Related Topics
1. Introduction: tasks, issues, metrics and social implications
2. Related topics in databases: OLTP, OLAP and data warehousing
3. Related topics in information retrieval: web search, question answering and novelty detection
4. Related topics in artificial intelligence: machine learning and pattern matching

Part II. Core Techniques
1. Classification: Bayesian, KNN, ID3, ANN, rule-based
2. Clustering: hierarchical, partitional, clustering in large databases
3. Association Rules: basic and advanced algorithms

Part III. Advanced Topics
1. Web Mining: content, structure and usage
2. Image/Video Mining: CBIR, MPEG-7, video event detection
3. Bioinformatics: biology preliminaries, information aspects, microarray data clustering

Textbook:
Data Mining: Introductory and Advanced Topics, by Margaret H.
Dunham, Prentice Hall, 2003

References:
Data Mining: Multimedia, Soft Computing, and Bioinformatics, by Sushmita Mitra and Tinku Acharya, Wiley, September 2003. Hardcover, 424 pages, ISBN 0-471-46054-0.
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471460540.html

Principles of Data Mining, by Hand, Mannila and Smyth, MIT Press, 2001.
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=3520