Text Mining Application Programming

Chapter 1 Introduction
Manu Konchady, 2006
Definition: Text Mining
All types of text processing that deal with finding,
organizing, and analyzing information.
(formal) the creation of new information that is not
obvious in a collection of documents.
New information is defined as a pattern, trend, or
relationship that can’t be easily gleaned by reading
individual documents.
The term document refers to any unit of text, such as a
Web page, an e-mail, a formatted article, a set of slides, or
a plain text file.
Data Mining vs. Text Mining
Data mining deals with structured numeric
data; text mining deals with unstructured text.
Data used for data mining is extracted,
transformed, and loaded in a data warehouse.
Text mining attempts to build a model from
data that is assumed to be imprecise.
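To illustrate the contrast, the sketch below turns one unstructured sentence into structured term counts, a common first step before any model is built. The function name term_counts and the simple tokenizing pattern are assumptions made for this example; they are not taken from the book or from Text Mine.

```python
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    # Assumption: a token is any run of letters, digits, or apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


doc = "Text mining deals with unstructured text."
counts = term_counts(doc)

# Show the most frequent terms with their counts.
print(counts.most_common(3))
```

The resulting table of counts is structured and numeric, so it can be handled like conventional data, but it is imprecise: word order and meaning are lost.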
Origins of Text Mining
Information Retrieval
Natural Language Processing
Understanding Text
“Alice saw the rabbit with glasses.” (Who has the glasses: Alice or the rabbit?)
Polysemy
“In what state would you find Lincoln”
“free software”
Synonymy
More than one word can express the same meaning.
Exuberant: lush, luxuriant, profuse, and riotous.
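One way a search application might cope with synonymy is to expand a query with known synonyms. The sketch below assumes a small hand-built table holding only the example above; the names SYNONYMS and expand_query are hypothetical, and a real system would draw on a full thesaurus.

```python
# Hypothetical, hand-built synonym table containing only the slide's example.
SYNONYMS = {
    "exuberant": {"lush", "luxuriant", "profuse", "riotous"},
}


def expand_query(terms):
    """Return the query terms plus any listed synonyms, without duplicates."""
    expanded = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        # Terms missing from the table contribute only themselves.
        expanded |= SYNONYMS.get(term, set())
    return expanded


print(sorted(expand_query(["Exuberant", "growth"])))
```

Expansion of this kind widens what a search can match, but it can also pull in unrelated documents when a term is polysemous.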
An Architecture for Text Mining
Applications
Text Mining Functions
Searching
Information Extraction
Clustering
Categorization
Summarization
Information Monitor
Question and Answer
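As an illustration of the first function, searching, the sketch below builds a tiny inverted index and answers a query by intersecting posting sets. It is a minimal example under stated assumptions, not how Text Mine implements search; the document texts and the names build_index and search are invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents.
docs = {
    "d1": "Free software can be shared and modified.",
    "d2": "The software was free of charge.",
    "d3": "Lincoln is the capital of Nebraska.",
}
index = build_index(docs)
print(sorted(search(index, "free software")))
```

Both d1 and d2 match "free software" although they use "free" in different senses, which is the polysemy problem noted earlier.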
A Layered Model
Text Mining Installation
Text Mine (http://textmine.sf.net) is a
collection of Perl modules and code on
SourceForge to index, cluster, classify, and
summarize text.
Usage
Command line
Web-based interface
Web Interface