Introduction to Web Mining
Spring 2013

What is data mining?
- Data mining is the extraction of useful patterns from data sources, e.g., databases, texts, the web, images, etc.
- Patterns must be valid, novel, potentially useful, and understandable.

Classic data mining tasks
- Classification: mining patterns that can classify future (new) data into known classes.
- Association rule mining: mining any rule of the form X → Y, where X and Y are sets of data items.
- Clustering: identifying a set of similarity groups in the data.

Classic data mining tasks (contd)
- Sequential pattern mining: a sequential rule A → B says that event A will be immediately followed by event B with a certain confidence.
- Deviation detection: discovering the most significant changes in data.
- Data visualization

CS583, Bing Liu, UIC

Why is data mining important?
- Huge amounts of data are available.
- How can we make the best use of the data?
- Knowledge discovered from data can be used for competitive advantage.
- Many interesting things that one wants to find cannot be found using database queries, e.g., “find people likely to buy my products”.

WWW
- The Web is an Internet-based computer network that allows users of one computer to access information stored on another through the Internet.
- Client-server model, hypertext documents
- Invented in 1989 by Tim Berners-Lee at CERN, with HTTP/HTML
- Mosaic (1993), Netscape (1994), Internet Explorer (1995)
- Related to the Internet (ARPANET, TCP/IP)

Web mining
- Traditional data mining: data is structured and relational, with well-defined tables, columns, rows, keys, and constraints.
- Web data: readily available, rich in features and patterns; comes as content, link, and usage data.

Topic description
- Introduction to basic data mining: association and sequential mining, classification, clustering
- Crawling, Web search, and information retrieval
- Social network analysis
- Structured data extraction, information integration
- Opinion mining and sentiment analysis
- Web usage mining

Related fields
Web mining is a multidisciplinary field:
- Machine learning
- Statistics
- Databases
- Information retrieval
- Visualization
- Natural language processing
- etc.
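
Worked example: association rules
The rule form X → Y from the classic data mining tasks can be made concrete with a small sketch. The Python below estimates how often an itemset occurs (its support) and how often Y appears in transactions that contain X (the confidence of X → Y). The transactions, the item names, and the helper names `support` and `confidence` are hypothetical, chosen only for illustration. Support is not defined in these slides; it is assumed here as the usual companion measure to confidence. This is a minimal sketch, not a full rule-mining algorithm.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy data: each transaction is the set of items bought together.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the rule X -> Y: support(X and Y together) / support(X)."""
    support_x = support(x, transactions)
    if support_x == 0:
        return 0.0  # X never occurs, so the rule has no evidence either way
    return support(frozenset(x) | frozenset(y), transactions) / support_x


if __name__ == "__main__":
    print("support({bread}) =", support({"bread"}, TRANSACTIONS))
    print("confidence({bread} -> {milk}) =",
          confidence({"bread"}, {"milk"}, TRANSACTIONS))
```

In this toy data, bread appears in three of the four transactions and bread with milk in two, so the rule {bread} → {milk} holds in two of the three transactions that contain bread. A real miner would enumerate candidate rules and keep only those above chosen support and confidence thresholds.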