A Brief Overview of Data Mining


- IR Group Meeting

04/11/2006

Qiaozhu Mei

Outline

• Introduction

• Functionalities

• Hot topics

• Research Groups

• Useful Resources

Part 1: Introduction

• Introduction

– What is data mining?

– General Process

– Related Fields

– Different Views

• Functionalities

• Hot topics

• Research Groups

• Useful Resources


What is Data Mining?

• ( From Prof. Jiawei Han’s Slides ): Data mining (knowledge discovery from data)

– Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

• ( From Prof. Sunita Sarawagi’s slides ): Process of semi-automatically analyzing large databases to find patterns that are

– valid: hold on new data with some certainty

– novel: non-obvious to the system

– useful: should be possible to act on the item

– understandable: humans should be able to interpret the pattern

• ( From Prof. Vipin Kumar’s slides ): Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

What is Data Mining? (cont.)

• Under these definitions:

– What is not Data Mining?

• Look up phone number in phone directory

• Query a Web search engine for information about “Amazon”

– What is Data Mining?

• Certain names are more prevalent in certain US locations

(O’Brien, O’Rourke, O’Reilly… in Boston area)

• Group together similar documents returned by search engine according to their context

- Tan, Steinbach, Kumar, Introduction to Data Mining

General Process of KDD

– Data mining: the core of the knowledge discovery process

[Figure: KDD pipeline. Databases → Data Cleaning / Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

- Han & Kamber, Data Mining: Concepts and Techniques

Related Fields

• Confluence of Multiple Disciplines

[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Algorithm, Visualization, Other Disciplines]

- Han & Kamber, Data Mining: Concepts and Techniques

• Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems

• But different…

[Figure: Data Mining at the overlap of Statistics/AI, Machine Learning/Pattern Recognition, and Database systems]

- Tan, Steinbach, Kumar, Introduction to Data Mining

Differences to Related Fields

• Traditional techniques may be unsuitable due to

– Enormity of data

– High dimensionality of data

– Heterogeneous, distributed nature of data

- From Prof. Vipin Kumar’s slides

• Overlaps with machine learning, statistics, artificial intelligence, databases, and visualization, but with more stress on

– scalability in the number of features and instances

– algorithms and architectures, whereas the foundations of methods and formulations are provided by statistics and machine learning

– automation for handling large, heterogeneous data

- From Prof. Sunita Sarawagi’s slides

Different Views of Data Mining

• Categorize a data mining task from different views

• By general functionality and operations:

– Descriptive data mining

• Find human-interpretable patterns that describe the data.

• Clustering / similarity matching

• Association rules and variants

• Deviation detection

– Predictive data mining

• Use some variables to predict unknown or future values of other variables.

• Regression

• Classification

• Collaborative Filtering

Different Views of Data Mining (II)

• By data to be mined

– Relational, data warehouse, transactional, stream, object-oriented, sequence, graph, social network, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW

• By knowledge to be discovered

– Characterization, discrimination, frequent patterns, association, classification, clustering, trend/deviation, outlier analysis, etc

• By techniques utilized

– Database-oriented, data warehouse (OLAP), combinational algorithms, machine learning, statistics, visualization, etc.

• By application adapted

– Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

- Han & Kamber, Data Mining: Concepts and Techniques

Part 2: Functionalities

• Introduction

• Functionalities

– Data Warehousing and OLAP

– Frequent patterns, association, correlation and causality

– Classification and prediction

– Clustering

– Outlier analysis, Trend and evolution analysis

• Hot topics

• Research Groups

• Useful Resources


Data Warehousing and OLAP

• Data Warehousing:

– “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)

• OLAP: on-line analytical processing

– Major task of data warehouse system

– Data analysis and decision making

– Drill-down, roll-up, exception/discovery driven

• Methodology

– Data Cubing

– Iceberg cube

– Multi-way, BUC, Star, MM, shell, close-cube, etc.

[Figure: cuboid lattice over the dimensions product, date, country. Levels: all; product, date, country; (product, date), (product, country), (date, country); (product, date, country)]

- Han & Kamber, Data Mining: Concepts and Techniques
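As a rough illustration of data cubing, the sketch below materializes every cuboid of a tiny fact table over the dimensions product, date, and country. The rows, the `sales` measure, and the function name `cuboid` are made up for this example; real cubing methods such as BUC or Multi-way avoid computing each group-by independently.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical fact table: (product, date, country, sales).
rows = [
    ("tv", "2006-01", "US", 100),
    ("tv", "2006-02", "US", 150),
    ("pc", "2006-01", "CA", 200),
    ("pc", "2006-01", "US", 50),
]
dims = ("product", "date", "country")

def cuboid(rows, keep):
    """Sum sales grouped by the dimensions whose indexes are in `keep`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in keep)
        totals[key] += row[3]
    return dict(totals)

# Full cube: one cuboid per subset of the dimensions (2^3 = 8 group-bys).
cube = {}
for size in range(len(dims) + 1):
    for keep in combinations(range(len(dims)), size):
        cube[tuple(dims[i] for i in keep)] = cuboid(rows, keep)
```

Rolling up from the (product, date) cuboid to the (product) cuboid corresponds to moving one level toward "all" in the lattice; drilling down moves the other way.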

Frequent Patterns and Associations

• Frequent pattern : a pattern (itemsets, subsequences, substructures, etc.) that occurs frequently in a data set

– Comparing to n-grams, phrases, etc.

• Motivation : Finding inherent regularities in data

• Applications : Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

• Association rule mining:

– Given a set of records, each of which contains some number of items from a given collection;

– Produce dependency rules which will predict occurrence of an item based on occurrences of other items.

– Frequent pattern → association rules → correlations
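A dependency rule is usually scored by its support and confidence. The sketch below computes both on a small, invented basket data set; the item names and the helper names `support` and `confidence` are assumptions for illustration only.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
```

Here the rule {diapers} → {beer} holds in three of the five baskets, and in three of the four baskets that contain diapers.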

Mining Frequent Patterns

• Types of data:

– Itemsets, sequences, graphs.

• Scalable mining methods: Three major approaches

– Apriori (Agrawal & Srikant@VLDB’94)

– FPgrowth (Han, Pei & Yin @SIGMOD’00)

• Prefixspan, clospan, gSpan, closegraph, etc.

– Vertical data format approach (Charm, Zaki & Hsiao @SDM’02)

• Apriori:

– Candidate pattern generation and pruning

– Breadth-first search over pattern space

• FPgrowth:

– Pattern growth through FP-tree, no candidate generation

– Depth-first search, doing pruning smartly
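The Apriori idea (level-wise candidate generation plus pruning) can be sketched as follows. This is a minimal, unoptimized illustration on invented basket data, not the implementation from the cited paper; the function name and the `min_count` parameter are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
patterns = apriori(baskets, min_count=3)
```

Each pass of the `while` loop is one level of the breadth-first search over the pattern space; FP-growth avoids the explicit candidate sets by growing patterns from a compressed prefix tree.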

Classification and Prediction

• Supervised Learning, already discussed in Machine Learning.

– Classification: constructs a model from a training set whose records carry categorical class labels in a classifying attribute, then uses the model to classify new data

– Prediction: models continuous-valued functions, i.e., predicts unknown or missing values

• Algorithms:

– Decision Tree based: C4.5, ID3, Rainforest, etc.

– Bayesian Method: Naïve Bayesian, Bayesian network, and many others covered in Machine Learning.

– Discriminative: Perceptron/Winnow, NN, SVM, CB-SVM , etc.

– Rule-based, Associative, k-NN, etc.

– Prediction: Regression

• Bagging, Boosting, Model Selection, Cross-Validation
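To make the classification and cross-validation bullets concrete, here is a small sketch that evaluates a k-NN classifier with k-fold cross-validation. The two-class toy data and the function names are invented for this example, and the fold assignment (every fifth record) is just one simple choice.

```python
def knn_predict(train, query, k=3):
    """train: list of ((x, y), label). Majority vote among the k nearest points."""
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - query[0]) ** 2 + (p[0][1] - query[1]) ** 2,
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def cross_validate(data, folds=5, k=3):
    """Plain k-fold cross-validation: each fold is held out exactly once."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [d for i, d in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)

# Hypothetical, well-separated two-class data.
data = [((i * 0.1, i * 0.1), "a") for i in range(10)] + \
       [((5 + i * 0.1, 5 + i * 0.1), "b") for i in range(10)]
accuracy = cross_validate(data)
```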

Clustering

• Unsupervised Learning, as discussed in Machine Learning

– Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that

– Data points in one cluster are more similar to one another.

– Data points in separate clusters are less similar to one another.

– Similarities/distances: many!

• Algorithms:

– Partition based: K-means, K-Medoids, CLARA, etc

– Hierarchical: Bottom-up (single/complete/average link), top-down, BIRCH

– Density-based/Grid-based: DBSCAN, DENCLUE, CLIQUE, etc.

– Model-based: EM, COBWEB, SOM, etc.

– High-Dimensional, Constraint based
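As an example of a partition-based method, the sketch below runs K-means (Lloyd's algorithm) on six invented 2-D points. The initial centers are fixed by hand to keep the run deterministic; in practice they are usually chosen at random or by a seeding heuristic.

```python
def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm on 2-D points, starting from the given centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment step: nearest center by squared Euclidean distance.
            j = min(
                range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
```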

Outlier, Trend and Evolution

• outliers: The set of objects that are considerably dissimilar from the remainder of the data

– Statistical: hypothesis testing, bug mining

– Density based

– Clustering based, etc

• Deviation/Anomaly Detection

• Fraud Detection

• Trend and Evolution:

– Usually coupled with outlier analysis

– Basic functionalities in temporal data mining

– Trend, cycle, seasonal, irregular patterns
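One very simple statistical take on outlier detection is to flag values that lie far from the mean in units of standard deviation. The sketch below does this with a z-score threshold; the data and the threshold of 2.0 are arbitrary choices for illustration, and the approach assumes roughly unimodal data.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose |z-score| exceeds `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = zscore_outliers(readings)
```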

Part 3: Hot Topics

• Introduction

• Functionalities

• Hot topics

– Mining data stream, Mining time series, Spatiotemporal data mining, mining Social Networks, Sequential data mining, Graph Mining, Biology data mining, Privacy Preserving Data Mining

– Text and Web mining

• Research Groups

• Useful Resources


Mining Data Streams

• Data: Data streams (continuous, ordered, fast-changing, huge in volume)

• Characteristics and Challenges :

– Huge volumes

– Fast changing, requires fast and real-time response

– Random access is expensive — need single scan algorithms

– Difficult to keep the whole stream in storage; need approximations

• Basic problems:

– Multi-dimensional on-line analysis of streams

– Mining outliers and unusual patterns in stream data

– Clustering data streams

– Classification of stream data

Mining Data Streams (II)

• Methods:

– Basic: Sliding windows, Tilted time frames

– Counting (FP mining, etc):

• Random sampling

• Approximated counting

– OLAP:

• Keep Critical layers in stream cube computation

• Partial materialization

• outlier: exception-based exploration

– Clustering:

• Offline microclustering and online macroclustering

• Text Related Applications:

– Web logs and Web page click streams
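Approximated counting over a stream can be illustrated with the Misra-Gries summary, which is one well-known single-scan technique (the slides do not name a specific algorithm, so this choice is an assumption). With at most k - 1 counters it underestimates any item's true count by at most n/k, where n is the stream length.

```python
def misra_gries(stream, k):
    """One-pass approximate counting that keeps at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Hypothetical click stream in which "a" makes up half of the events.
stream = list("abacabadaeaf")
summary = misra_gries(stream, k=3)
```

Any item occurring more than n/k times is guaranteed to survive in the summary, which is why the heavy hitter "a" remains while the rare items are dropped.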

Mining Time series

• Data: Time-series database

– Consists of sequences of values or events changing with time

– Data is recorded at regular intervals

• Characteristics and Challenges :

– Characteristic time-series components: Trend, cycle, seasonal, irregular patterns

• Basic Problems:

– Trends discovery, Similarity Search, outlier detection, prediction and clustering

Mining Time series (II)

• Methods:

– Statistical modeling (Regression, Spline, Mixture Model, etc)

– Data transformation (DFT, DWT)

– Sliding windows, Atomic matching, window stitching, Subsequence Ordering

– Clustering

-Han & Kamber, Data Mining:



• Text Related Applications:

– Transliteration mining, Temporal text mining, word bursting, etc.
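Similarity search with a sliding window can be sketched as a brute-force scan: slide the query over the series and keep the window with the smallest Euclidean distance. The series and query are invented, and real systems would typically reduce dimensionality first (for example with DFT or DWT coefficients) rather than compare raw values.

```python
import math

def best_match(series, query):
    """Return (offset, distance) of the window of `series` closest to `query`."""
    m = len(query)
    best = (None, math.inf)
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - q) ** 2 for w, q in zip(window, query)))
        if dist < best[1]:
            best = (start, dist)
    return best

series = [0, 1, 2, 3, 2, 1, 0, 5, 9, 5, 0, 1, 2]
offset, dist = best_match(series, [5, 9, 5])
```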

Spatiotemporal data mining

• Data: object data sets, spatial/spatiotemporal databases and data warehouses

• Characteristics and Challenges:

– Generalize detailed geographic points into clustered regions, such as business, residential, industrial, or agricultural areas, according to land usage

– handling objects in space that have identity and well-defined extents, locations, and relationships.

– Require the merge of a set of geographic areas by spatial operations

• Basic Problems:

– Querying objects; distribution/cluster/correlation/evolution/trend analysis

Spatiotemporal data mining (II)

• Methods

– GIS (Geographic Information System): Analysis and visualization of geographic data

• Search, Location analysis, Terrain analysis, Distribution, Spatial analysis/statistics, Measurement

– Indexing Spatial data (R-tree, etc. )

– Modeling single objects with points, lines and regions

– Modeling spatially related collection of objects: plane partitions and networks.

– Spatiotemporal patterns, correlations, trend analysis, clustering …


• Text Related Applications:

– Spatiotemporal text mining; community evolution in weblogs

– Information diffusion; web evolution

Special topics in Frequent Pattern Mining

• Association rule mining and frequent itemset mining are pretty old topics

• However, some special topics of frequent pattern mining are still hot

– Sequential pattern mining

– Graph mining

– Pattern post-processing

Sequential pattern mining

• Data: sequence databases

• Basic problems:

– Discovery of frequent subsequences (gaps allowed, in contrast to n-grams); closed subsequences

– Sequence Similarity Search, Sequence Alignment

• Methods:

– Apriori: GSP

– FP-Growth: PrefixSpan, CloSpan

– BLAST, Hidden Markov models, CRF, etc.


• Text Related Applications:

– Most text patterns are sequential patterns

– Phrase extraction, entity/relation extraction, opinion mining, etc

– Biology sequence modeling

-Han & Kamber, Data Mining:
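The difference between a sequential pattern (gaps allowed) and an n-gram (contiguous) can be shown by counting support both ways. The three-sentence database and the helper names below are invented for this sketch.

```python
def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order, with gaps allowed."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def is_ngram(pattern, sequence):
    """True if `pattern` occurs in `sequence` as a contiguous run."""
    m = len(pattern)
    return any(sequence[i:i + m] == pattern for i in range(len(sequence) - m + 1))

def support(pattern, database, match=is_subsequence):
    """Number of sequences in `database` that contain `pattern`."""
    return sum(match(pattern, s) for s in database)

database = [
    "the data mining group".split(),
    "data stream mining".split(),
    "mining data".split(),
]
gapped = support(["data", "mining"], database)
contiguous = support(["data", "mining"], database, match=is_ngram)
```

The pattern "data ... mining" is supported by two sequences when gaps are allowed, but only one sequence contains it as a bigram.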


Graph Mining

• Data: graph databases (like social network, but multiple graphs, more general), examples include

– Chemical component, protein structure, program flow, XML/Web,

– Directed, undirected, labeled/unlabeled, weighted, 2-D/3-D, etc.


• Characteristics and Challenges:

– Theoretically, most problems are of high complexity, but in practice the graphs are tractable.

– Too many substructures to index

– …

• Basic problems

– Frequent subgraph mining

– Closed subgraph mining

– Graph indexing by substructures

– Similarity search

-Han & Kamber, Data Mining: Concepts and Techniques

Graph Mining (II)

• Methods:

– Subgraph mining: Apriori (e.g. FSG), Pattern Growth (e.g. gSpan)

– gSpan: pattern growth, depth-first search, active elimination of duplicated subgraphs; flatten a graph into a sequence using depth-first search; enumerate graphs using right-most extension.

– CloseGraph: mining closed subgraph patterns

– gIndex: identify frequent structures, prune redundancy to maintain discriminative structures, create an index on such structures.

– Similarity search: indexing; feature-based similarities; estimate missing features


• Text Related Applications:

– Multi-resolution topic map, entity-relation network, pathway extraction, etc.

Graph Mining (III): Graph Indexing & Querying

• More on Graph Indexing and Similarity Search

• Comparing to Text Retrieval:

                 Text Retrieval      Graph Indexing & Search
Objects          Documents           Graphs
Basic Units      Words               Frequent structures
Pruning          Stopwords           Need to mine frequent subgraphs
Redundancy?      Stemming            Need discriminative structures
Representation   Term vectors        Feature vectors
Dimensions       Terms               Substructures
Relevance        Vector similarity   Vector similarity
Approximation    No                  Yes, need to estimate missing features (relax substructures)

Graph Mining (IV): Graph Indexing & Querying

• What if we want to index on phrases instead of words?

– Need to extract phrases first

– N-grams/sequential patterns, have to remove redundancy

• E.g. “natural language processing” vs. “language processing”

– Substructures are like phrases…

• Can IR help?

– Representation and Similarity measures? (Vector Space Models, Probabilistic models…)

– How to weight features? (TF-IDF, …)

– Generative models?

– Query expansion? Feedback?

Pattern Post-processing

• Data: frequent patterns extracted by mining algorithms

• Challenge:

– Mining algorithms output explosively large number of patterns

– How to interpret the frequent patterns extracted

• Basic Problems:

– Pattern summarization

– Mining compressed patterns

– Top-K patterns

– Pattern annotation

– User-oriented ranking

• Methods:

– Modeling Pattern profiles, coverage and contexts

– Using Clustering to summarize and compress patterns

– Bridging IR/NLP and frequent pattern mining: profile, context, ranking, feedback, filtering, summarization, MMR, etc.

Mining Social Networks

• Data: Graphs/networks with nodes and links

– Example: communication networks, webpages, citations, biological pathways, etc.


• Characteristics and Challenges:

– Connected Components: few

– Network diameter: small

– Clustering: high degree

– Degree distribution: heavy-tailed

– Modeling Logical/statistical dependencies

• Basic Problems:

– Model the generation of graphs/networks

– Link based object ranking, classification,

Identification, Clustering, entity resolution

– Link Prediction, querying, community discovery

H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai,

Nature 411, 41-42 (2001)

Mining Social Networks (II)

• Methods:

– Graph Generation Models: derive generative models that explain the characteristics and evolution of social networks/graphs.

– Vertex Ranking: PageRank, HITS, etc.

– Community Detection: Hierarchical Clustering, Spectral clustering, Stochastic modeling, etc.

– Link based classification: semi-supervised learning, propagation

– Entity resolution: duplicate prediction, collective resolution, probabilistic models

– Link Prediction: binary classification problem, local conditional probabilistic models

– Substructure mining: graph pattern mining, indexing
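Vertex ranking with PageRank can be sketched as a power iteration over a link dictionary. The four-node graph, the damping factor 0.85, and the fixed 50 iterations are illustrative assumptions; dangling nodes (none in this example) are handled by spreading their rank uniformly, which is one common convention.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {node: [nodes it points to]}. Every node must appear as a key."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical web graph: "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(links)
```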

Mining Social Networks (III)

• Generative Models of social network/graph generation and evolution

• Random graphs (Erdős-Rényi models)

– Fix vertices, generate each edge independently with probability p

– N(N-1)/2 trials of a biased coin flip, p ~ 1/N

– Degree distribution is approximately Poisson (binomial for finite N), E[d] = p(N-1); E[# of e] = pN(N-1)/2

– Parameter: p

• Graph process model:

– starting with no edges, just keep adding one edge at a time

– always choose next edge randomly from among all missing edges
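The random graph model is easy to simulate: flip a biased coin for each of the N(N-1)/2 vertex pairs. The sketch below does so and compares the observed mean degree with the expected value p(N-1); the values of n and p and the fixed seed are arbitrary choices for illustration.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Return the edge list of a random graph: each pair is linked with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

n, p = 200, 0.05
edges = erdos_renyi(n, p)
mean_degree = 2 * len(edges) / n   # each edge contributes to two degrees
expected_degree = p * (n - 1)      # E[d] = p(N-1)
```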

Mining Social Networks (IV)

• α-model (Watts-Strogatz models, Small-world)

– For vertices u, v, define m(u,v) to be the number of common neighbors (so far)

– Define the propensity R(u,v) of u to connect to v

• if m(u,v) >= k, R(u,v) = 1 (share too many friends, must connect)

• if m(u,v) = 0, R(u,v) = p (no mutual friends → no bias to connect)

• else, R(u,v) = p + (m(u,v)/k)^α (1-p) → biased to connect

– Generate network incrementally, with R(u,v) as the edge probability

– α → ∞: similar to Erdős-Rényi models

– Need to tune parameters α, p, k
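The propensity rule can be written as a small function of the number of mutual neighbors m. The sketch below only evaluates R; it does not generate a network. It also shows numerically why a very large α makes the model behave like a random graph: for 0 < m < k the bias term vanishes and R falls back to p.

```python
def propensity(m, k, p, alpha):
    """Propensity R of two vertices to connect, given m mutual neighbors."""
    if m >= k:
        return 1.0                            # enough mutual friends: must connect
    if m == 0:
        return p                              # no mutual friends: baseline probability
    return p + (m / k) ** alpha * (1.0 - p)   # in between: biased to connect

small_alpha = propensity(m=2, k=4, p=0.01, alpha=1.0)
large_alpha = propensity(m=2, k=4, p=0.01, alpha=1000.0)
```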

Mining Social Networks (V)

• Scale-free models: N (# of vertices) is not fixed

– Start with (say) two vertices connected by an edge

– let Z = Σ d(j) where d(j) = degree of vertex j so far

– add new vertex i with k edges back to {1, …, i-1}: i is connected back to j with probability d(j)/Z

– Rich get richer …

• Evaluation of generative models

– Can they explain all the characteristics of social networks?

– Parameter tuning?

• Other models for Social network analysis

– Copying model: leads to communities

– Forest Fire Model

– Electricity network (not generative model, but interesting)
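The scale-free growth process can be sketched with preferential attachment. For simplicity this version adds a single edge per new vertex (k = 1), which is an assumption made here, and vertices are numbered from 0. Keeping a list in which each vertex appears once per incident edge makes a uniform draw equivalent to choosing j with probability d(j)/Z.

```python
import random

def preferential_attachment(n, seed=0):
    """Grow a graph to n vertices, adding one edge (k = 1) per new vertex."""
    rng = random.Random(seed)
    edges = [(0, 1)]          # start with two vertices joined by an edge
    endpoints = [0, 1]        # vertex j appears d(j) times in this list
    for i in range(2, n):
        j = rng.choice(endpoints)   # P(j) = d(j) / Z
        edges.append((i, j))
        endpoints.extend((i, j))
    return edges

edges = preferential_attachment(1000)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
```

Sorting `degree` by value on a run like this typically shows a few early vertices with very high degree and many vertices of degree one, which is the "rich get richer" effect.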

Mining Social Networks (VI)

• Text Related Applications: quite a lot!

– Ranking webpages

– Multi-resolution Concept/Topic Map

– Citation Impact of scientific literature

– Entity-relation extraction

– Bioinformatics: Pathway extraction

– Reference Reconciliation

– Web structure evolution

– Community discovery in weblogs

Text and Web mining

• Data: text, unstructured/semi-structured; webpages with linkages, user logs;

– E.g. webpage, news, email, weblogs, scientific literature, citations, customer reviews, forums, search logs, chatting logs, legal documents, etc.

• Challenges:

– Modeling unstructured/semi-structured data

– Coupling with Natural Language Processing

– Handling high dimensionality

– Handling data sparseness and ambiguity

– The Web is too complicated!

Text and Web mining (II)

• Selected Problems:

– Text categorization/clustering ( Already covered in NLP and ML )

– Word sense disambiguation ( Covered in NLP )

– Information Extraction ( Covered in NLP )

– Dimension Reduction ( Overlapping with ML and IR )

– Collaborative Filtering, User-interest modeling

– Topic Detection and Tracking

– Comparative Text Mining, Theme based text mining

– Transliteration mining

– Email clustering / spam detection

– Opinion mining ( Overlapping with NLP )

– Social Networks Related (Already covered)

– Temporal Text Mining

– Vision based page segmentation / Block based search

Text and Web mining (III)

• Methods: Confluence of Multiple Disciplines

– Database: data integration, schema matching, XML

– Data mining: sequential pattern mining, association rule mining, …

– IR: Search, language models, feedback, …

– Machine Learning: SVD, Supervised/unsupervised learning, semi-supervised learning, Topic models, …

– NLP: POS tagging, parsing, context modeling, sentiment extraction, entity extraction, …

– Statistical Learning: Bayesian methods, word bursting, time-series analysis, hypothesis testing, other statistical models, …

Text and Web mining (IV)

• Resolution:

– Word level: Word sense disambiguation, word bursting, transliteration mining

– Entity level: information extraction, entity-relation network

– Pattern level: opinion mining, relation extraction

– Document level: document classification/clustering

– Theme level: PLSI, LDA, comparative text mining, temporal text mining/spatiotemporal text mining

– Topic level: topic detection and tracking, email threading

– Web level: social network, weblog mining, block based search

• Selected topics will be discussed in the next meeting.

Part 4: Research Groups

• Introduction

• Functionalities

• Hot topics

• Research Groups

– Stanford, CMU, UIUC, Wisc, Helsinki, UMN

– IBM, Microsoft, MSRA, Yahoo!

– Others

• Useful Resources


Research Groups

• Rakesh Agrawal

– One of the Leaders in Data Mining

– Frequent patterns, Privacy-Preserving Data Mining

• Stanford: Jerome H. Friedman

– http://www-stat.stanford.edu/~jhf/

– Strong Statistical flavor, machine learning, boosting

• CMU: Christos Faloutsos

– http://www.cs.cmu.edu/~christos/

– Graph mining, Social Networks, Stream data mining, Image/Multimedia mining, time-series mining

• UIUC: Jiawei Han

– http://www-sal.cs.uiuc.edu/~hanj/

– Many! Frequent pattern mining, graph mining, OLAP/Cubing, Stream data mining, Classification, Clustering, …

Research Groups (II)

• University of Helsinki: Heikki Mannila

– http://www.cs.helsinki.fi/research/fdk/

– http://www.cs.helsinki.fi/u/mannila/

– Frequent itemset mining, computational biology

• Wisconsin: Raghu Ramakrishnan

– http://www.cs.wisc.edu/dmi/

– http://www.cs.wisc.edu/~raghu/

– Data warehousing, cubing, classification/clustering

• Minnesota: Vipin Kumar

– http://www-users.cs.umn.edu/~kumar/

– Spatiotemporal data mining

• IBM T.J. Watson: Philip S. Yu

– http://domino.research.ibm.com/comm/research.nsf/pages/r.kdd.html

– http://www.research.ibm.com/people/p/psyu/index.html

– Frequent pattern mining, graph mining, data streams

Research Groups (III)

• Microsoft Research Redmond: Surajit Chaudhuri

– http://research.microsoft.com/dmx/

– Database related, data cleaning, etc.

• Microsoft Research Redmond: Eric Brill

– http://research.microsoft.com/tmsn/

– http://research.microsoft.com/~brill/

– Text Mining, Search and Navigation Research, NLP

• Microsoft Research Asia:

– http://research.microsoft.com/wsm/

– Web search, web/text mining

• Yahoo! Research: Prabhakar Raghavan

– http://research.yahoo.com/researcher.shtml

– http://theory.stanford.edu/~pragh/

– Web/Text Mining, Social Networks

Research Groups (IV)

• IBM Webfountain

– http://www.almaden.ibm.com/webfountain/

• UIC: Bing Liu

– http://www.cs.uic.edu/~liub/

– Association rule mining, web/text mining

• UNC: Wei Wang

– http://www.cs.unc.edu/~weiwang/

– Biology data mining, frequent pattern mining

• Simon Fraser: Jian Pei

– http://www.cs.sfu.ca/~jpei/

– Sequential pattern mining, OLAP

• National University of Singapore: Anthony K.H. Tung

– http://www.comp.nus.edu.sg/~atung/

– Spatial data mining, Biology data mining

• …

Part 5: Useful Resources

• Introduction

• Functionalities

• Hot topics

• Research Groups


• Useful Resources

– Text Books

– Toolkits

– Conferences

– Others

Text Books

• S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002

• R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000

• T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003

• U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996

• U. Fayyad, G. Grinstein, and A. Wierse, Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2001

• J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2nd ed., 2006

• D. J. Hand, H. Mannila, and P. Smyth, Principles of Data Mining, MIT Press, 2001

• T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001

• T. M. Mitchell, Machine Learning, McGraw Hill, 1997

• G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991

• P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2005

• S. M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan Kaufmann, 1998

• I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2nd ed., 2005

From Prof. Jiawei Han’s slides

Toolkits

• Weka: Data mining software in Java

– http://www.cs.waikato.ac.nz/%7Eml/weka/

• IlliMine (Illinois Data Mining System)

– http://illimine.cs.uiuc.edu/

– Data Cubing

– Frequent Pattern Mining

– Sequential pattern mining

– Graph pattern Mining

– Classification

• Collected by Vipin Kumar:

– http://www-users.cs.umn.edu/~kumar/dmbook/resources.htm

Conferences

• KDD Conferences

– ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)

– SIAM Data Mining Conf. (SDM)

– (IEEE) Int. Conf. on Data Mining (ICDM)

– Conf. on Principles and Practices of Knowledge Discovery and Data Mining (PKDD)

– Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)

• Other related conferences

– ACM SIGMOD

– VLDB

– (IEEE) ICDE

– WWW, SIGIR

– ICML, CVPR, NIPS

• Journals

– Data Mining and Knowledge Discovery (DAMI or DMKD)

– IEEE Trans. on Knowledge and Data Eng. (TKDE)

– KDD Explorations

– ACM Trans. on KDD

From Prof. Jiawei Han’s slides

Others

• KDnuggets

– http://www.kdnuggets.com/

• Tutorial: Machine Learning Techniques for Data Mining (WEKA) Slides - Eibe Frank, University of Waikato

– http://books.elsevier.com/companions/1558605525?country=United+States

• Ideas for course projects in data mining

– Collected by Vipin Kumar

– http://www-users.cs.umn.edu/~kumar/dmbook/projects.htm

End of the presentation

Thanks!
