With effect from the academic year 2015-16

IT 6123
WEB MINING

Instruction: 3 Periods per Week
Duration of University Examination: 3 Hours
University Examination: 80 Marks
Sessional: 20 Marks

Objectives:
1. Introduce students to the basic concepts and techniques of Information Retrieval, Web Search and Machine Learning for extracting knowledge from the web.
2. Develop skills in using recent data mining software for solving practical problems of Web Mining.
3. Gain experience of doing independent study and research.

Outcomes:
After completing the course, the student should be able to:
1. Describe key concepts such as web log, hypertext, social network, information synthesis and corpora, and evaluation measures such as precision and recall.
2. Discuss the use of methods and techniques such as word frequency and co-occurrence statistics, normalization of data, machine learning, clustering and vector space models.
3. Analyze and explain which web mining problems are satisfactorily solved, what is being worked on at the research frontier and what still lies beyond the current state of the art.

UNIT-I
Introduction: Crawling and Indexing, Topic Directories, Clustering and Classification, Hyperlink Analysis, Resource Discovery and Vertical Portals, Structured vs Unstructured Data Mining.
Crawling the web: HTML and HTTP Basics, Crawling Basics, Engineering Large Scale Crawlers, Putting Together a Crawler.

UNIT-II
Web Search and Information Retrieval: Boolean Queries and Inverted Index, Relevance Ranking, Similarity Search.
Similarity and Clustering: Foundations and Approaches, Bottom-Up and Top-Down Partitioning Paradigms.

UNIT-III
Supervised learning: Introduction, Overview of Classification Strategies, Nearest Neighbor Learners, Feature Selection, Bayesian Learners, Discriminative Classification, Hypertext Classification.

UNIT-IV
Semi-supervised learning: Expectation Maximization, Labelling Hypertext Graphs, Co-Training.
Social network analysis: Social Sciences and Bibliometry, PageRank and HITS, Coarse Grained Graph Model, Enhanced Model and Techniques, Evaluation of Topic Distillation.

UNIT-V
Resource discovery: Collecting Important Pages, Similarity Search using Link Topology, Topical Locality and Focused Crawling, Discovering Communities.
Future of Web Mining: Information Extraction, Natural Language Processing, Question Answering, Profile, Personalization and Collaboration.

Suggested Reading:
1) Soumen Chakrabarti, "Mining the Web: Discovering Knowledge from Hypertext Data", Morgan Kaufmann Publishers, 2003.
2) Manu Konchady, "Text Mining Application Programming", Cengage Learning, 2006.