Classifying Text Documents using Unconventional Representation
Abstract:
Text data has gradually become a new line of investigation in text mining. Text
clustering can greatly simplify browsing large collections of documents by
reorganizing them into a smaller number of manageable clusters. Here, text
clustering is used in a document clustering system that clusters a set of
documents based on key terms typed by the user. First, the system preprocesses
the set of documents and the user-given terms. We use feature evaluation to
reduce the dimensionality of the high-dimensional text vectors. The system then
computes the term frequencies and weights them using the inverse document
frequency method. These document weights are then used for clustering. Feature
clustering is a powerful method to reduce the dimensionality of
feature vectors for text classification. We present an innovative and effective
pattern discovery technique that includes the processes of pattern deploying and
pattern evolving. We propose a new method of representing documents based on
clustering of term frequency vectors. The term frequency vectors of each cluster
are used to form a symbolic representation using their mean and standard
deviation. Further, the term frequency vectors are represented in the form of
interval-valued features. Words that
are similar to each other are grouped into the same cluster. Each cluster is
characterized by a membership function with statistical mean and deviation. When
all the words have been fed in, a desired number of clusters are formed
automatically. We then have one extracted feature for each cluster. The extracted
feature, corresponding to a cluster, is a weighted combination of the words
contained in the cluster. Experimental results show that, when applied to text
clustering, our method makes the clustering results more efficient, accurate and
stable than those of the existing algorithm.
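
The term weighting step described in the abstract can be sketched as follows.
This is a minimal illustration, not the project's actual code: the class and
method names are hypothetical, and the weighting variant (raw term count
multiplied by the natural log of N/df) is an assumption, since the abstract
does not say which form of inverse document frequency is used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names and the exact weighting formula are assumptions.
class TfIdfSketch {

    // Lowercase the text and split on non-letters (a stand-in for real preprocessing).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Raw count of each term in one document.
    static Map<String, Integer> termFrequency(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<String, Integer>();
        for (String t : tokens) {
            Integer c = tf.get(t);
            tf.put(t, c == null ? 1 : c + 1);
        }
        return tf;
    }

    // Number of documents that contain each term at least once.
    static Map<String, Integer> documentFrequency(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<String, Integer>();
        for (List<String> doc : docs) {
            for (String t : new HashSet<String>(doc)) {
                Integer c = df.get(t);
                df.put(t, c == null ? 1 : c + 1);
            }
        }
        return df;
    }

    // Weight of each term in one document: tf * ln(n / df).
    static Map<String, Double> tfIdf(List<String> doc, Map<String, Integer> df, int n) {
        Map<String, Double> weights = new HashMap<String, Double>();
        for (Map.Entry<String, Integer> e : termFrequency(doc).entrySet()) {
            double idf = Math.log((double) n / df.get(e.getKey()));
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<List<String>>();
        docs.add(tokenize("Text clustering groups documents"));
        docs.add(tokenize("Text mining finds patterns"));
        docs.add(tokenize("Clustering reduces dimensionality"));
        Map<String, Integer> df = documentFrequency(docs);
        System.out.println(tfIdf(docs.get(0), df, docs.size()));
    }
}
```

Under this variant, a term that occurs in every document receives a weight of
zero, so it contributes nothing to the clustering step.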
Existing System:
1 Existing text clustering uses the frequent word sets to cluster the
documents.
2 Clustering has been used in the literature of text classification as an
alternative representation scheme for text documents.
3 Many well-known clustering algorithms treat documents as bags of
words and ignore important relationships between words, such as
synonymy.
4 Existing algorithms have a higher probability of grouping unrelated
documents into the same cluster.
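
The synonym limitation noted in item 3 can be illustrated with a small sketch.
It counts tokens first as plain words and then after mapping words to concept
labels. The tiny hand-written synonym map is purely hypothetical and stands in
for whatever lexical resource a real system would use; the source does not name
one.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: the class name and the synonym map are assumptions.
class ConceptCountSketch {

    // Count tokens; a token found in the concept map is counted under its concept label.
    static Map<String, Integer> count(String[] tokens, Map<String, String> concepts) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String t : tokens) {
            String key = concepts.containsKey(t) ? concepts.get(t) : t;
            Integer c = counts.get(key);
            counts.put(key, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tokens = {"car", "automobile", "road"};

        // Bag of words: "car" and "automobile" are treated as unrelated terms.
        Map<String, String> none = new HashMap<String, String>();
        System.out.println(count(tokens, none));

        // Concept view: both words are counted under one hypothetical concept label.
        Map<String, String> synonyms = new HashMap<String, String>();
        synonyms.put("car", "vehicle");
        synonyms.put("automobile", "vehicle");
        System.out.println(count(tokens, synonyms));
    }
}
```

With the empty map, two documents that use different synonyms share no terms;
with the concept map they share the same concept, which is the relationship a
plain bag-of-words model misses.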
Proposed System:
1) The proposed text clustering uses frequent concepts to cluster the text
documents.
2) A simple and efficient symbolic text classification method is presented.
Each text document is represented by symbolic features.
3) Term frequency vectors of each cluster are used to form a symbolic
representation using interval-valued features.
4) To check the effectiveness and robustness of the proposed method,
extensive experimentation is conducted on various datasets.
5) The proposed technique uses two processes, pattern deploying and
pattern evolving, to refine the discovered patterns in text documents.
6) The proposed algorithm utilizes the semantic relationships between
words to create concepts.
7) Relationships between words, such as synonymy and hypernymy, are also
identified; hypernymy is the most effective for text clustering.
8) Associating a meaningful label with each final cluster is essential,
and the high dimensionality of text documents should be reduced.
9) The clustering algorithm works with frequent concepts rather than the
frequent items used in traditional text mining techniques.
10) FCDC is found to be more accurate, scalable and effective than
existing text clustering algorithms.
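
The interval-valued representation in item 3 can be sketched as follows. This
is a minimal illustration under stated assumptions, not the project's code: the
class and method names are hypothetical, each term's interval is taken as
[mean - std, mean + std] using the population standard deviation, and the
similarity measure (the fraction of a document's term frequencies that fall
inside the cluster's intervals) is one plausible choice that the source does
not spell out.

```java
// Illustrative sketch only: names, interval width and similarity measure are assumptions.
class IntervalFeatureSketch {

    // For each term j, return {mean - std, mean + std} over the cluster's vectors.
    static double[][] intervals(double[][] vectors) {
        int n = vectors.length;
        int d = vectors[0].length;
        double[][] out = new double[d][2];
        for (int j = 0; j < d; j++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += vectors[i][j];
            }
            double mean = sum / n;
            double squares = 0.0;
            for (int i = 0; i < n; i++) {
                double diff = vectors[i][j] - mean;
                squares += diff * diff;
            }
            double std = Math.sqrt(squares / n); // population standard deviation
            out[j][0] = mean - std;
            out[j][1] = mean + std;
        }
        return out;
    }

    // Fraction of the document's features that lie inside the cluster's intervals.
    static double similarity(double[] doc, double[][] intervals) {
        int inside = 0;
        for (int j = 0; j < doc.length; j++) {
            if (doc[j] >= intervals[j][0] && doc[j] <= intervals[j][1]) {
                inside++;
            }
        }
        return (double) inside / doc.length;
    }

    public static void main(String[] args) {
        // Term frequency vectors of the documents in one cluster (made-up values).
        double[][] cluster = {{2, 0, 1}, {4, 0, 3}, {3, 0, 2}};
        double[][] iv = intervals(cluster);
        double[] query = {3, 0, 5};
        System.out.println("similarity = " + similarity(query, iv));
    }
}
```

A new document would then be assigned to the cluster whose intervals it matches
best, so each cluster is stored as one interval per term rather than as all of
its member vectors.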
Software Requirements:
Platform          : JDK 1.6
Program Language  : Java
Tool IDE          : NetBeans
Database          : MySQL
Operating System  : Microsoft Windows XP
Hardware Requirements:
Processor   : Pentium IV Processor
RAM         : 512 MB
Hard Drive  : 10 GB
Monitor     : 14” VGA Color Monitor
Keyboard    : 104 Keys
Mouse       : Logitech Serial Mouse
Disk Space  : 1 GB