CSI 5V93 (001) Special Topics in Computer Science: Information retrieval, Text mining and Web search
Spring 2015
Dr. King-Ip (David) Lin

Tentative Syllabus
(A copy of this can be found on http://cs.baylor.edu/~lind/)

Contact Information:
Office: ROGERS 220.07
Phone #: 254-710-4742
E-mail: David_Lin@baylor.edu
Office hours: Mon, Wed 1-4 pm / Thur 10 am – 12 noon (or by appointment)

Course Description:
The course focuses on information retrieval (IR), a field that studies techniques to store, index, query and retrieve relevant information from a huge source of (largely unstructured) data. We put specific emphasis on retrieving text data from the web, which is a critical part of many important applications.

Topics include:
Basics of information retrieval systems
Information retrieval models: tf-idf, vector models, probabilistic models, latent semantic models
Retrieval evaluation: precision/recall, F-measure and other measures
Relevance feedback
Document (pre-)processing for IR: lexical analysis, tagging, stemming, parsing, corpus-based techniques, taxonomies
Text classification: clustering, classification, dimensionality reduction, semi-supervised techniques
Web crawling: architecture and implementation
Web search engines/retrieval: exploiting web structure, link analysis, search engine architecture, ranking

Pre-requisites
Students are expected to have taken undergraduate data structures and database courses. Students should also expect quite a bit of programming, although you are free to choose whichever programming language you feel most comfortable with. If you have taken (or are taking) data mining and/or machine learning, you will be at a slight advantage. However, I will NOT assume you have taken those classes, and any material from them that is needed will be covered here (albeit at a less detailed level).

Textbook
Ricardo Baeza-Yates, Berthier Ribeiro-Neto, Modern Information Retrieval: the concepts and technology behind search, 2nd edition, Pearson, 2011. ISBN 978-0-321-41691-. (The 1st edition will NOT work.)