CSI 5V93 (001): Information Retrieval, Text Mining and Web Search

CSI 5V93 (001) Special Topics in Computer Science:
Information Retrieval, Text Mining and Web Search
Spring 2015
Dr. King-Ip (David) Lin
Tentative Syllabus
(A copy of this syllabus can be found at http://cs.baylor.edu/~lind/)
Contact Information:
Office: ROGERS 220.07
Phone #: 254-710-4742
E-mail: David_Lin@baylor.edu
Office hours: Mon, Wed 1-4 pm / Thur 10 am – 12 noon (or by appointment)
Course Description:
The course focuses on information retrieval (IR), a field that studies techniques to store, index, query and
retrieve relevant information from a huge source of (unstructured) data. We put specific emphasis on text data
retrieval from the web, which is a critical part of many important applications.
Topics include:
Basics of information retrieval systems
Information retrieval models: tf-idf, vector models, probabilistic models, latent semantic models
Retrieval evaluation: Precision/Recall, F-measures and other measures
Relevance feedback
Document (pre-)processing for IR: lexical analysis, tagging, stemming, parsing, corpus-based techniques, taxonomies
Text classification: clustering, classification, dimensionality reduction, semi-supervised techniques
Web crawling: architecture and implementation
Web search-engine/retrieval: exploiting web structure, link analysis, search engine architecture, ranking
Pre-requisites
Students are expected to have taken undergraduate data structures and database courses. You should also
expect quite a bit of programming (although you will be free to choose the programming language that you feel
most comfortable with).
Also, if you have taken (or are taking) data mining and/or machine learning, you will be at a slight advantage.
However, I will NOT assume you have taken those classes, and any information from those two classes that is
needed will be covered here (albeit at a less detailed level).
Textbook
Ricardo Baeza-Yates, Berthier Ribeiro-Neto, Modern Information Retrieval: the concepts and technology behind
search, 2nd edition, Pearson, 2011. ISBN 978-0-321-41691-. (The 1st edition will NOT work.)