Overview of Database Systems

Introduction to Web Mining
Spring 2013
What is data mining?

Data mining is the extraction of useful patterns from data
sources, e.g., databases, texts, web, images, etc.
Patterns must be:

valid, novel, potentially useful, understandable
Classic data mining tasks

Classification:
mining patterns that can classify future (new) data
into known classes.

Association rule mining
mining any rule of the form X → Y, where X and Y
are sets of data items.

Clustering
identifying a set of similarity groups in the data
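
To make the association rule form X → Y more concrete, here is a minimal sketch of two measures commonly used to score such rules, support and confidence. The slides do not define these measures at this point, so the definitions, the function names, and the toy transactions below are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Estimated P(Y in transaction | X in transaction) for the rule X -> Y."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0  # X never occurs, so the rule has no evidence
    return support(transactions, set(x) | set(y)) / support_x


# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Under these assumed definitions, a rule is usually considered interesting only if both values exceed user-chosen thresholds.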
Classic data mining tasks (contd)

Sequential pattern mining:
A sequential rule A → B says that event A will be
immediately followed by event B with a certain
confidence.

Deviation detection:
discovering the most significant changes in data

Data visualization
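
The confidence of a sequential rule (event A immediately followed by event B) can be estimated from an event log. The sketch below is one simple way to do this, assuming a single ordered list of events; the function name and the sample click sequence are hypothetical, and an occurrence of A at the very end of the log is not counted because it has no successor.

```python
def sequential_confidence(events, a, b):
    """Share of occurrences of `a` that are immediately followed by `b`."""
    a_count = 0
    followed_by_b = 0
    # Walk over adjacent pairs (current event, next event).
    for current, nxt in zip(events, events[1:]):
        if current == a:
            a_count += 1
            if nxt == b:
                followed_by_b += 1
    return followed_by_b / a_count if a_count else 0.0


# Hypothetical page-visit sequence from one user session.
clicks = ["home", "cart", "home", "search", "home", "cart"]

home_to_cart = sequential_confidence(clicks, "home", "cart")
print(home_to_cart)
```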
CS583, Bing Liu, UIC
Why is data mining important?

Huge amount of data



How to make best use of data?
Knowledge discovered from data can be used for
competitive advantage.
Many interesting things that one wants to find
cannot be found using database queries, e.g.,
“find people likely to buy my products”
WWW





The Web is an Internet-based information system
that allows users of one computer to access
information stored on another through the
Internet.
Client-server model, hypertext documents
Invented in 1989 by Tim Berners-Lee at
CERN with HTTP/HTML
Mosaic (1993), Netscape (1994), Internet
Explorer (1995)
Related to the Internet (ARPANET, TCP/IP)
Web mining

Traditional data mining



data is structured and relational
well-defined tables, columns, rows, keys,
and constraints.
Web data


readily available data rich in features and
patterns
Content/link/usage data
Topic Description






Introduction to basic data mining: association
and sequential mining, classification,
clustering
Crawling, Web search and information
retrieval
Social network analysis
Structured data extraction, information
integration
Opinion mining and sentiment analysis
Web usage mining
Related fields

Web mining is a multidisciplinary field:
Machine learning
Statistics
Databases
Information retrieval
Visualization
Natural language processing
etc.