Artificial Neural Network - New Dimensions for Data Mining
Amit Bhagat
Assistant Professor
Maulana Azad National Institute of Technology
Abstract- Owing to dramatic advances in computer science and rapidly falling hardware costs, data are now collected from many sources and stored at very low cost, and data mining techniques have become an effective tool for extracting the desired knowledge from this massive data. Neural networks have been applied successfully in many other areas, but they are still a fairly new method in data mining. Although neural networks have shortcomings such as a complex structure, long training times and results that are hard to interpret, their high accuracy, often superior to that of other methods, makes them attractive for data mining. In this paper, neural network techniques for data mining are discussed in detail, including their process, categories, applications, problems and trends.
Keywords- neural network; data mining; unsupervised; supervised
I. INTRODUCTION
With the continuous development of computers and the Internet, it has become easy to obtain relevant information. Nowadays, however, the volume of data stored in databases is growing very rapidly, and much important information is hidden in these large amounts of data. Such massive and wide-ranging data are hard to analyze with traditional statistical methods. Data mining therefore emerged as an intelligent technology to meet this need: it combines statistical analysis, database technology and intelligent languages to analyze massive data. Through data mining, more and more information can be extracted from databases, creating considerable potential profit for companies. Data mining is carried out with data mining tools, and as the field has developed, many such tools have been built. These tools can forecast future trends and activities to support people's decisions.
For example, by analyzing the entire database of a garment company, data mining tools can answer questions such as “Which customers are most likely to respond to the company's colors or styles, and why?”. Some data mining tools can also solve traditional problems that used to consume much time, because they can rapidly scan the entire database and find useful information that experts have overlooked. Neural networks have attracted the interest of scientists in many fields. A neural network is a complex network system built to simulate intuitive human thinking: based on research into biological neural networks, it is constructed by simplifying, summarizing and refining the features of biological neurons and neural networks. It uses non-linear mapping, parallel processing and the structure of the network itself to express the knowledge that associates inputs with outputs [1-6].
In this paper, data mining based on neural networks is discussed in detail, including its process, its categories, its applications, and its problems and trends.
II. Basic Data Flow
The process of data mining based on neural networks is shown in Fig 1. It consists of three main stages: data preparation (data selection), rule extraction (rule generation) and rule evaluation (rule calculation).
Fig 1. Process of data mining based on neural network: Raw Data → Data Selection → Rule Generation → Rule Calculation → Useful rules
A. Data Selection
Data selection means defining, processing and representing the data to be mined so that they suit a specific data mining method. It comprises the following four stages.
1) Cleaning of Data
Cleaning of data means filling in missing values, eliminating noisy data and correcting inconsistent data. It is necessary because the data in a data warehouse come from other databases whose data are often incomplete, inconsistent and inaccurate. Data cleaning can be done either before or after the data are loaded into the data warehouse.
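The three cleaning operations can be sketched in Python as follows. This is a minimal illustration rather than a method from this paper: the record fields (`age`, `income`, `member`), the plausibility cap used to flag noise, and mean imputation for missing values are all assumptions chosen for the example.

```python
from statistics import mean

# Hypothetical raw records: None marks a missing value, and "Yes"/"y"/"NO"
# are inconsistent spellings of the same two categories.
raw = [
    {"age": 34, "income": 52000, "member": "Yes"},
    {"age": None, "income": 48000, "member": "y"},
    {"age": 29, "income": 9_900_000, "member": "no"},  # implausible income (noise)
    {"age": 41, "income": None, "member": "NO"},
]

def clean(records, income_cap=1_000_000):
    """Drop noisy rows, fill missing values and normalize inconsistent labels."""
    # 1. Eliminate noise: drop records whose income exceeds an assumed cap.
    #    Records are copied so the raw data stay untouched.
    kept = [dict(r) for r in records if r["income"] is None or r["income"] <= income_cap]
    # 2. Fill vacancies: replace each missing numeric value with the column mean.
    for field in ("age", "income"):
        fill = mean(r[field] for r in kept if r[field] is not None)
        for r in kept:
            if r[field] is None:
                r[field] = fill
    # 3. Correct inconsistencies: map spelling variants to one canonical label.
    canonical = {"yes": "yes", "y": "yes", "no": "no", "n": "no"}
    for r in kept:
        r["member"] = canonical[r["member"].strip().lower()]
    return kept

cleaned = clean(raw)
print([r["age"] for r in cleaned])  # [34, 37.5, 41]
```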
2) Selection of Data
Data selection means selecting the columns (attributes) and rows (records) to be used in a particular data mining task. In most cases people cannot know precisely which parameters are most important for decision-making; a neural network can help build a model that associates the parameters with the outcome and then help determine which parameters are most important.
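One way a neural network can point to the most important parameters is to train a single sigmoid neuron on the candidate inputs and rank the inputs by the magnitude of their learned weights. The sketch below is a hypothetical illustration of that idea, not the paper's procedure; it assumes the inputs are already scaled to comparable ranges (otherwise weight magnitudes cannot be compared), and the field names `spend` and `region` are invented.

```python
import math

def train_logistic_neuron(X, y, lr=0.5, epochs=500):
    """Fit one sigmoid neuron with batch gradient descent on the log-loss."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # derivative of the log-loss with respect to z
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def rank_inputs(names, w):
    """Order input names by |weight|, largest first (inputs must share a scale)."""
    return [name for name, _ in sorted(zip(names, w), key=lambda p: abs(p[1]), reverse=True)]

# Hypothetical records: "spend" (scaled to [-1, 1]) decides the label,
# while "region" (+1 / -1) is irrelevant by construction.
X, y = [], []
for i in range(10):
    spend = (i - 4.5) / 4.5
    for region in (1.0, -1.0):
        X.append([spend, region])
        y.append(1 if spend > 0 else 0)

w, b = train_logistic_neuron(X, y)
print(rank_inputs(["spend", "region"], w))  # ['spend', 'region']
```

Weight magnitude is only a rough importance signal; sensitivity analysis on a trained multilayer network is a common alternative.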
3) Pre-processing of Data
Pre-processing of data means enhancing the cleaned data after data selection. This enhancement sometimes means generating a new data item from one or more fields, and sometimes means replacing several fields with a single field that carries more information.
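Both kinds of enhancement can be illustrated on a single record. The schema (`birth_year`, `total_spend`, `visits`) and the fixed `reference_year` are hypothetical; a real pipeline would take the reference date from the data.

```python
def enhance(record, reference_year=2024):
    """Derive more informative fields from existing ones (hypothetical schema)."""
    out = dict(record)
    # Replace a raw field with a more informative one: birth year -> age.
    out["age"] = reference_year - out.pop("birth_year")
    # Generate a new item from two fields: average spend per visit.
    visits = out["visits"]
    out["spend_per_visit"] = out["total_spend"] / visits if visits else 0.0
    return out

print(enhance({"birth_year": 1990, "total_spend": 300.0, "visits": 4}))
# {'total_spend': 300.0, 'visits': 4, 'age': 34, 'spend_per_visit': 75.0}
```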
4) Representation of Data
Representation of data means converting the preprocessed data into a form that a neural-network-based data mining algorithm can accept. Data mining based on neural networks can only handle numerical data, so symbolic data must be converted to numerical data.
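A common way to convert symbolic data to numbers is one-hot encoding, often combined with min-max scaling of numeric fields so that no input dominates merely because of its units. The sketch below assumes these two techniques; the paper does not specify a particular encoding.

```python
def one_hot(values):
    """Map each symbolic value to a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    return categories, [[1.0 if v == c else 0.0 for c in categories] for v in values]

def min_max(values):
    """Scale numbers linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

cats, encoded = one_hot(["red", "blue", "red", "green"])
print(cats)                   # ['blue', 'green', 'red']
print(encoded[0])             # [0.0, 0.0, 1.0]
print(min_max([10, 20, 30]))  # [0.0, 0.5, 1.0]
```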
B. Rule Generation
1) Rule generation based on structure analysis
Rule generation based on structure analysis treats rule generation as a search process. The basic idea is to map the trained network to the corresponding rules. Since the computational complexity of this search grows exponentially with the number of network inputs, a combinatorial explosion occurs as the inputs increase. Pruning and clustering are therefore usually used to reduce the number of connections in the network and thus the computational complexity.
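A simplified sketch of structure-based extraction for a single threshold unit with boolean inputs: connections with small weights are pruned first, then subsets of the remaining positive-weight inputs are searched for combinations that guarantee the unit fires even when every negative-weight input is active. The weights, bias and pruning threshold are hypothetical. The loop over subset sizes makes the exponential cost visible: n inputs give up to 2^n candidate subsets, which is why pruning matters.

```python
from itertools import combinations

def prune(weights, threshold):
    """Drop connections whose absolute weight is below the threshold."""
    return {name: w for name, w in weights.items() if abs(w) >= threshold}

def extract_rules(weights, bias):
    """Find minimal rules 'IF all inputs in S are 1 THEN the unit fires'.

    The unit fires when sum(w_i * x_i) + bias > 0 with x_i in {0, 1}.
    A subset S is accepted only if the unit fires in the worst case,
    i.e. with every negative-weight input also set to 1."""
    positive = sorted(n for n, w in weights.items() if w > 0)
    worst_negative = sum(w for w in weights.values() if w < 0)
    rules = []
    for size in range(1, len(positive) + 1):
        for subset in combinations(positive, size):
            # Keep rules minimal: skip supersets of a rule already found.
            if any(set(r) <= set(subset) for r in rules):
                continue
            if sum(weights[n] for n in subset) + worst_negative + bias > 0:
                rules.append(subset)
    return rules

# Hypothetical trained unit; input "e" has a negligible weight.
weights = {"a": 3.0, "b": 2.0, "c": 2.0, "d": -1.0, "e": 0.05}
print(extract_rules(prune(weights, 0.1), bias=-3.5))  # [('a', 'b'), ('a', 'c')]
```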
2) Rule generation based on performance analysis
Unlike rule generation based on structure analysis, rule generation based on performance analysis does not analyze or search the structure of the neural network but treats the network as a whole. This method focuses on how well the extracted rules reproduce the behavior of the network, that is, on producing a set of rules that can replace the original network.
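A minimal sketch of the performance-based (black-box) view: the network is only queried, never inspected, and the extracted rule is chosen purely for how faithfully it reproduces the network's outputs. Here the "network" is a hypothetical stand-in function, candidate rules are limited to single-threshold rules, and fidelity is measured as the fraction of sampled inputs on which rule and network agree; practical methods search far richer rule sets.

```python
import math

def network(x):
    """Hypothetical trained network, used only as a black box (one sigmoid unit)."""
    z = 4.0 * x[0] + 1.0 * x[1] - 2.5
    return 1 if 1.0 / (1.0 + math.exp(-z)) > 0.5 else 0

def extract_threshold_rule(oracle, samples, n_features):
    """Find the rule 'IF x[k] > t THEN 1 ELSE 0' that best mimics the oracle.

    Only input/output behaviour is used; fidelity is the fraction of
    samples on which the rule and the network give the same output."""
    labels = [oracle(x) for x in samples]
    best = (0.0, None, None)
    for k in range(n_features):
        for t in sorted({x[k] for x in samples}):
            agree = sum((x[k] > t) == bool(lab) for x, lab in zip(samples, labels))
            fidelity = agree / len(samples)
            if fidelity > best[0]:
                best = (fidelity, k, t)
    return best

grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
fidelity, k, t = extract_threshold_rule(network, grid, 2)
print(f"IF x[{k}] > {t} THEN 1 ELSE 0  (fidelity {fidelity:.2f})")
```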
C. Rule Calculation
Although the objectives of rule calculation depend on each specific application, the rules can generally be evaluated against the following goals.
1. Find the optimal order of the rules, so that they obtain the best results on a given data set.
2. Test the correctness of the rules.
3. Detect how much of the knowledge in the neural network has not been extracted.
4. Detect inconsistencies between the extracted rules and the trained neural network.
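Goals 2 to 4 can be measured directly once both the rule set and the network can be applied to the same labelled data. The function below is a hypothetical sketch; in particular, counting the samples the network gets right but the rules miss is only a rough proxy for unextracted knowledge, assumed here for illustration.

```python
def evaluate_rules(rule_predict, net_predict, samples, labels):
    """Return (correctness, inconsistency, unextracted) as fractions of samples.

    correctness:   rules agree with the true label (goal 2);
    unextracted:   network is right but the rules are wrong (goal 3, rough proxy);
    inconsistency: rules and network disagree (goal 4)."""
    n = len(samples)
    rules_out = [rule_predict(x) for x in samples]
    net_out = [net_predict(x) for x in samples]
    correct = sum(r == y for r, y in zip(rules_out, labels)) / n
    inconsistent = sum(r != m for r, m in zip(rules_out, net_out)) / n
    missed = sum(m == y != r for r, m, y in zip(rules_out, net_out, labels)) / n
    return correct, inconsistent, missed

def net(x):
    """Hypothetical trained network."""
    return 1 if x >= 4 else 0

def rules(x):
    """Hypothetical extracted rule set."""
    return 1 if x >= 5 else 0

samples = list(range(8))
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(evaluate_rules(rules, net, samples, labels))  # (0.875, 0.125, 0.125)
```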
Fixing the order of the rules in advance has a great effect on how they are applied; however, the process of extracting rules from a neural network gives no information about their order. The order can instead be determined from the following three measures: (1) Robustness, which counts the number of times each rule is activated; obviously, it does not depend on the order of the rules. (2) Comprehensiveness, which counts how many patterns are recognized by an individual rule. (3) Error vigilance, which counts the number of times a rule is activated wrongly.
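The three measures can be computed per rule on a labelled data set and then used to order the rules. In this sketch, comprehensiveness is read as the number of patterns a rule classifies correctly when no other rule fires, and the ordering policy (fewest wrong activations first, then most comprehensive, then most robust) is an assumption; the paper does not prescribe how the measures are combined.

```python
def rule_measures(rules, samples, labels):
    """Per-rule (robustness, comprehensiveness, error vigilance) counts.

    rules is a list of (name, condition, predicted_class), condition(x) -> bool.
    robustness:        how many samples activate the rule;
    comprehensiveness: samples the rule gets right while no other rule fires;
    error vigilance:   how many samples activate the rule wrongly."""
    stats = {}
    for name, cond, cls in rules:
        fired = [i for i, x in enumerate(samples) if cond(x)]
        alone = [i for i in fired
                 if labels[i] == cls
                 and not any(other(samples[i]) for n, other, _ in rules if n != name)]
        wrong = [i for i in fired if labels[i] != cls]
        stats[name] = (len(fired), len(alone), len(wrong))
    return stats

def order_rules(stats):
    """Assumed policy: fewest wrong activations, then most comprehensive, then most robust."""
    return sorted(stats, key=lambda n: (stats[n][2], -stats[n][1], -stats[n][0]))

# Hypothetical rules over a single numeric attribute.
rules = [
    ("R1", lambda x: x >= 6, 1),
    ("R2", lambda x: x >= 3, 1),
    ("R3", lambda x: x <= 2, 0),
]
samples = list(range(10))
labels = [0] * 5 + [1] * 5
stats = rule_measures(rules, samples, labels)
print(stats)               # {'R1': (4, 0, 0), 'R2': (7, 1, 2), 'R3': (3, 3, 0)}
print(order_rules(stats))  # ['R3', 'R1', 'R2']
```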
III. Neural Network Methods for Data Mining
Many kinds of neural networks can be used for data mining; the most commonly used are self-organizing neural networks and fuzzy neural networks.
A. Self-organizing Neural Network
Self-organization is an unsupervised learning process, that is, learning without a teacher. Through such learning, important features or internal knowledge, such as the distribution of the data, can be extracted from a data set. Kohonen argued that adjacent units in a neural network, like neurons in the brain, play different roles, and that through their interaction the network can adaptively develop into detectors of different specific features. Kohonen also proposed a learning scheme that maps input signals to a low-dimensional space such that inputs with similar characteristics correspond to adjacent regions of that space. This is the so-called self-organizing feature map.
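A minimal one-dimensional self-organizing map illustrates the idea: each unit holds a weight vector, the unit closest to an input wins, and the winner and its neighbors on the chain move toward the input, with the learning rate and neighborhood radius shrinking over time. The schedule values, unit count and data below are illustrative assumptions, not parameters from Kohonen's papers.

```python
import math
import random

def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (the winner)."""
    return min(range(len(weights)),
               key=lambda i: sum((wd - xd) ** 2 for wd, xd in zip(weights[i], x)))

def train_som(data, n_units=5, epochs=200, seed=0):
    """Train a 1-D self-organizing map (a chain of units) on the vectors in data."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1 - frac) + 0.01                # learning rate decays
        sigma = max(n_units / 2 * (1 - frac), 0.5)  # neighborhood radius shrinks
        for x in data:
            win = best_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood: units near the winner on the chain move more.
                h = math.exp(-((i - win) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Hypothetical 2-D records forming two groups, presented alternately.
data = [(0.1, 0.1), (0.9, 0.9), (0.15, 0.05), (0.85, 0.95), (0.05, 0.15), (0.95, 0.85)]
som = train_som(data)
print([best_unit(som, x) for x in data])
```

Records from the same group should map to the same or adjacent units, while the two groups map to different parts of the chain.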
B. Fuzzy Neural Network
Although neural networks have strong abilities in learning, classification, association and memory, the biggest difficulty in applying them to data mining is that their output cannot be given an intuitive explanation. Applying fuzzy processing to a neural network not only increases the expressive power of its output but also makes the system more stable. In a traditional classification network the output value is crisp, either 0 or 1, whereas the output of a fuzzy network is a membership degree, a real number between 0 and 1. After being trained on samples and their memberships, the network captures the relationship between inputs and outputs in the training set and can then give the membership degree of a pattern to be identified.
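The sketch below illustrates the two ingredients described above: inputs are fuzzified with triangular membership functions, and a sigmoid neuron is trained on target memberships in [0, 1] rather than crisp 0/1 labels. The fuzzy sets (low/medium/high temperature), the training pairs and the learning settings are hypothetical, and a real fuzzy neural network would typically have more layers and explicit fuzzy rules.

```python
import math

def triangle(x, left, peak, right):
    """Triangular membership: 1 at the peak, falling linearly to 0 at left/right."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzify(temp):
    """Memberships of a crisp temperature in hypothetical sets low/medium/high."""
    return [triangle(temp, -20, 0, 20),
            triangle(temp, 0, 20, 40),
            triangle(temp, 20, 40, 60)]

def membership(w, b, temp):
    """Network output: a real-valued membership degree in (0, 1)."""
    z = sum(wi * fi for wi, fi in zip(w, fuzzify(temp))) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_fuzzy_neuron(samples, lr=1.0, epochs=2000):
    """Train on (temperature, target membership) pairs with cross-entropy loss."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for temp, target in samples:
            f = fuzzify(temp)
            err = membership(w, b, temp) - target  # dLoss/dz for sigmoid + cross-entropy
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

# Hypothetical training pairs: temperature and its degree of "comfortable".
samples = [(5, 0.1), (15, 0.6), (20, 0.9), (25, 0.7), (35, 0.2)]
w, b = train_fuzzy_neuron(samples)
print({t: round(membership(w, b, t), 2) for t, _ in samples})
```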
IV. APPLICATIONS OF DATA MINING BASED ON NEURAL NETWORK
A. Application in Visualization
Visualization is present throughout the knowledge discovery process, but it is concentrated in the early and late stages. A key aim of visualization is to display data from a multi-dimensional space in two- or three-dimensional space. Through visualization, the data can be classified initially, and the feature attributes can then be discretized. This is very useful for data mining processes that use classification algorithms.
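As one simple way to discretize a continuous attribute, the sketch below uses equal-width binning; in practice the cut points might instead be chosen after inspecting a visualization. The function and the sample ages are illustrative assumptions.

```python
def equal_width_bins(values, n_bins):
    """Discretize a continuous attribute into n_bins equal-width intervals (labels 0..n_bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

print(equal_width_bins([18, 25, 33, 47, 62, 70], 3))  # [0, 0, 0, 1, 2, 2]
```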
B. Application in Classification
Classification is one of the main problems in data mining. Its purpose is to build a classification model that maps each data item in the database to one of a set of given categories. Neural networks are a common method for constructing such classifiers.
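The simplest neural classifier is a single-layer perceptron. This sketch assumes two linearly separable classes labelled +1 and -1, with hypothetical two-feature records; multilayer networks trained by back-propagation are needed when the classes are not linearly separable.

```python
def train_perceptron(X, y, epochs=20, lr=1.0):
    """Rosenblatt perceptron for two classes labelled +1 / -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified or on the boundary
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                mistakes += 1
        if mistakes == 0:  # every training record is classified correctly
            break
    return w, b

def classify(w, b, x):
    """Assign +1 or -1 according to the side of the learned hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

# Hypothetical records [income score, debt score]; +1 = low risk, -1 = high risk.
X = [[2, 1], [3, 1], [3, 2], [1, 3], [0, 2], [1, 2]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(X, y)
print(classify(w, b, [4, 1]), classify(w, b, [0, 3]))  # 1 -1
```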
C. Application in Intrusion Detection
Wenke Lee was the first to apply data mining techniques to intrusion detection. He used classification analysis, correlation analysis and sequence analysis to extract users' behavioral characteristics from audit data, and applied them to anomaly detection and misuse detection. Building on this work, research institutions around the world have developed a variety of data-mining-based intrusion detection techniques that effectively reduce the false detection rate of intrusion detection [7]-[12].
V. PROBLEMS AND TRENDS
Research on data mining based on neural networks still faces many problems. Firstly, data mining based on neural networks mainly aims to exploit the ability of neural networks to handle nonlinear relationships and continuous attributes, especially in regression problems. Yet research on rule extraction from neural networks is almost entirely aimed at classification networks, while research on regression models is nearly nonexistent. A breakthrough in the latter would greatly advance data mining based on neural networks.
Secondly, rule extraction from neural networks focuses on improving the fidelity of the extracted rules, that is, on whether the rules truly reproduce the function of the network. In data mining applications, however, the understandability of the rules is often more important. In some practical domains, fidelity must be reduced to achieve better understandability. How to balance understandability and fidelity therefore remains an open research task.
Thirdly, most neural networks lack incremental learning ability. If the training data change, the network must be retrained on the whole training set. This cannot accommodate the additions, deletions and modifications that routinely occur in databases, and it also forces the system to keep a huge training set. Giving neural networks incremental learning ability would largely solve this problem.
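Adding a record is the easiest of these operations to handle incrementally: a neuron trained by stochastic gradient descent can be updated with each new record alone, without revisiting the stored training set. The class below is a hypothetical sketch of that idea; it does not handle deletions or modifications, and plain online updates can gradually overwrite what was learned from older data.

```python
import math

class OnlineNeuron:
    """Sigmoid neuron updated one record at a time (stochastic gradient descent)."""

    def __init__(self, n_inputs, lr=0.5):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        """Probability-like output in (0, 1) for input vector x."""
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """Adjust the weights using only the new record (x, y); nothing else is stored."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

# Hypothetical stream of records arriving one at a time.
model = OnlineNeuron(n_inputs=1)
for _ in range(200):
    for x, y in [([0.0], 0), ([1.0], 1)]:
        model.update(x, y)
print(round(model.predict([1.0]), 2), round(model.predict([0.0]), 2))
```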
Finally, data mining based on neural networks is at present mainly oriented to mining classification rules. Although some results have been achieved, the computational power of neural networks has not been fully exploited. Expanding the types of knowledge that can be mined and further broadening the scope of applications are important directions for future research on data mining based on neural networks.
VI. CONCLUSION
Data mining based on neural networks is now widely accepted. This paper has described its use in detail. The process of data mining based on neural networks was discussed first; then its categories and applications were studied; finally, its problems and trends were identified. The application of neural networks to data mining is a new and emerging field with vast potential for further research.
REFERENCES
[1] Liu Zhao, Jiang Liangxiao, “The research of data mining based on neural networks”, Computer Engineering and Applications, 2004, no. 3, pp. 172-174.
[2] Li Yeli, Chang Guiran, Xu Xi, “Applications of neural networks in data mining”, Computer Engineering and Applications, 2000, no. 8, pp. 103-105.
[3] Lu Hongying, Xiao Sihe, “Data mining approaches research based on improved genetic neural network”, Computer Applications, 2006, no. 4, pp. 878-880.
[4] Li Yang, Gao Zhengguang, Li Qiyan, “A data mining architecture based on ANN and genetic algorithm”, Computer Engineering, 2004, vol. 30, no. 6, pp. 155-156.
[5] Pan Xiao, Wan Min, “Data mining study based on fuzzy neural network”, Microelectronics and Computer, 2005, vol. 22, no. 12, pp. 48-51.
[6] Du Jinlian, Guo Wenjun, Chi Zhongxian, “Locating multi-dimensional data mining space with neural networks”, Journal of Chinese Computer Systems, 2002, vol. 23, no. 9, pp. 1100-1103.
[7] U. Fayyad, G. Piatetsky-Shapiro, et al. (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA, 1996.
[8] M. Craven and J. Shavlik, “Using Neural Networks for Data Mining”, Future Generation Computer Systems, 1997.
[9] H. Lu, R. Setiono and H. Liu, “Effective Data Mining Using Neural Networks”, VLDB’95 Proceedings, Springer, Singapore, 1995.
[10] S.S.R. Abidi and A. Goh, “Applying Knowledge Discovery to Predict Infectious Disease Epidemics”, in Lecture Notes in Artificial Intelligence 1531, PRICAI’98: Topics in Artificial Intelligence, H. Lee and H. Motoda (Eds.), Berlin: Springer Verlag, 1998.
[11] T. Kohonen, “Self-organized Formation of Topologically Correct Feature Maps”, Biological Cybernetics, Vol. 43, pp. 59-69, 1982.
[12] T. Kohonen, “The Self-organizing Map”, Proceedings of the IEEE, Vol. 78, No. 9, pp. 1464-1480, 1990.