With effect from the academic year 2015-16
IT 6123
WEB MINING
Instruction: 3 Periods per Week
Duration of University Examination: 3 Hours
University Examination: 80 Marks
Sessional: 20 Marks
Objectives:
1. Introduce students to the basic concepts and techniques of Information Retrieval, Web
Search, and Machine Learning for extracting knowledge from the web.
2. Develop skills in using recent data mining software to solve practical problems of
Web Mining.
3. Gain experience in independent study and research.
Outcomes:
After completing the course, students should be able to:
1. Describe key concepts such as web log, hypertext, social network, information
synthesis, corpora and evaluation measures such as precision and recall.
2. Discuss the use of methods and techniques such as word frequency and co-occurrence
statistics, normalization of data, machine learning, clustering and vector space
models.
3. Analyze and explain which web mining problems are satisfactorily solved, what is
being worked on at the research frontier, and what still lies beyond the current state of the art.
UNIT-I
Introduction: Crawling and Indexing, Topic Directories, Clustering and Classification,
Hyperlink analysis, Resource Discovery and Vertical Portals. Structured vs Unstructured
Data Mining.
Crawling the web: HTML and HTTP basics, Crawling Basics, Engineering
Large Scale Crawlers, Putting Together a Crawler.
UNIT-II
Web Search and Information Retrieval: Boolean Queries and Inverted index, Relevance
Ranking, Similarity Search.
Similarity and Clustering: Foundations and Approaches, Bottom-up and Top-Down
partitioning paradigms.
UNIT-III
Supervised learning: Introduction, Overview of classification strategies, Nearest Neighbor
Learners, Feature Selection, Bayesian Learners, Discriminative Classification, Hypertext
Classification.
UNIT-IV
Semi-supervised learning: Expectation Maximization, Labelling Hypertext Graphs, Co-Training.
Social network analysis: Social Sciences and Bibliometry, PageRank and HITS,
Coarse-Grained Graph Model, Enhanced Model and Techniques, Evaluation of Topic Distillation.
UNIT-V
Resource discovery: Collecting Important Pages, Similarity Search using Link Topology,
Topical Locality and Focused Crawling, Discovering Communities.
Future of Web Mining: Information Extraction, Natural Language Processing, Question
Answering, Profile, Personalization and Collaboration.
Suggested Reading:
1) Soumen Chakrabarti, "Mining the Web: Discovering Knowledge from Hypertext
Data", Morgan Kaufmann Publishers, 2003.
2) Manu Konchady, "Text Mining Application Programming", Cengage Learning, 2006.