DataMining-MHC-2007

Spring 2007
Data Mining
Dr. Xiaoyan Li
Department of Computer Science
Mount Holyoke College
email: xli@MtHolyoke.edu
phone: (413) 538-2554
Course Description
Data Mining has become one of the most exciting and fastest-growing fields in computer science.
Data Mining refers to a range of techniques for uncovering hidden information in
a database. The data to be mined may be complex, multimedia data, including text, graphics,
video, audio, and bioinformatics data. Data Mining has evolved from several areas, including
databases, artificial intelligence, machine learning, pattern recognition, and multimedia information
retrieval, and it can be applied to the exploration of hidden information in web, video, and
bioinformatics data. This course is designed to provide senior undergraduate students with
an introduction to data mining concepts and tools. In addition, related topics such as information
retrieval, web mining, and bioinformatics will be covered.
Prerequisites:
CS 211 and CS 221, or permission of instructor
Visible Notes:
2 meetings (75 minutes)
Course Syllabus
Part I. Introduction and Related Topics
1. Introduction: tasks, issues, metrics and social implications
2. Related topics in databases: OLTP, OLAP, and data warehousing
3. Related topics in information retrieval: web search, question answering, and novelty detection
4. Related topics in artificial intelligence: machine learning and pattern matching
Part II. Core Techniques
1. Classification: Bayesian, KNN, ID3, ANN, rule-based
2. Clustering: hierarchical, partitional, clustering in large databases
3. Association Rules: basic and advanced algorithms
Part III. Advanced Topics
1. Web Mining: content, structure, and usage
2. Image/Video Mining: CBIR, MPEG-7, video event detection
3. Bioinformatics: biology preliminaries, information aspects, microarray data clustering
Textbook:
Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall, 2003.
Book Web Page
References:
Data Mining: Multimedia, Soft Computing, and Bioinformatics, by Sushmita Mitra and Tinku Acharya,
Wiley, ISBN: 0-471-46054-0, hardcover, 424 pages, September 2003.
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471460540.html
Principles of Data Mining, by Hand, Mannila and Smyth, MIT Press, 2001.
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=3520