Artificial Neural Network - New Dimensions for Data Mining
Amit Bhagat
Assistant Professor, Maulana Azad National Institute of Technology

Abstract- Owing to dramatic advances in computer science and rapidly falling hardware costs, data are now collected from many sources and stored at very low cost. In this setting, data mining techniques serve as an effective tool for extracting the desired knowledge from massive data. Neural networks have been applied successfully in many other areas, but they are still a fairly new method in data mining. Neural networks have shortcomings such as a complex structure, long training time and results whose representation is not easy to understand. However, their accuracy is high, often higher than that of other methods, and this makes them attractive for data mining. This paper discusses neural network techniques for data mining in detail, including their process, categories, applications, problems and trends.

Keywords- neural network; data mining; unsupervised learning; supervised learning

I. INTRODUCTION
With the continuous development of computers and the Internet, related information has become easy to obtain. However, the volume of data stored in databases is now growing very rapidly, and much important information is hidden in these large amounts of data. Such massive, wide-ranging data are hard to analyze with traditional statistical methods. Data mining therefore emerged as an intelligent technology suited to the times; it combines statistical analysis, database technology and intelligent languages to analyze massive data. Through data mining, more and more information can be extracted from databases, creating considerable potential profit for companies. Data mining has to be carried out with data mining tools, and as the field has developed, many such tools have appeared. These tools can forecast future trends and activities to support people's decisions.
For example, by analyzing the entire database of a garment company, data mining tools can answer questions such as "Which customers are most likely to respond to the company's colors or styles, and why?". Some data mining tools can also solve traditional problems that used to consume much time, because they can rapidly scan the entire database and find useful information that experts have overlooked.

Neural networks have attracted the interest of scientists in many fields. A neural network is a complex network system that simulates intuitive, image-based human thinking. It is built on research into biological neural networks, by simplifying, summarizing and refining the features of biological neurons and their networks. It uses non-linear mapping, parallel processing and the structure of the network itself to express the knowledge that associates inputs with outputs [1]-[6]. This paper discusses data mining based on neural networks in detail, including its process, its categories, its applications, and its problems and trends.

II. Basic Data Flow
The process of data mining based on neural networks is shown in Fig 1. It is mainly composed of three stages: data preparation (data selection), rule extraction (rule generation) and rule evaluation (rule calculation).

Fig 1. Process of data mining based on neural network: Raw Data → Data Selection → Rule Generation → Rule Calculation → Useful rules

A. Data Selection
Data selection means defining, processing and representing the data to be mined so that they suit a specific data mining method. It includes the following four stages.

1) Cleaning of Data
Cleaning of data means filling in missing values, eliminating noisy data and correcting inconsistent data.
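The three cleaning operations named above can be sketched in Python. The record layout (dictionaries with hypothetical `age` and `gender` fields), the plausible-age range used to flag noise, the spelling table and the choice of mean imputation are all illustrative assumptions, not part of the method described in this paper.

```python
from statistics import mean

# Hypothetical table used to correct inconsistent spellings of one field.
CANONICAL_GENDER = {"m": "male", "male": "male", "f": "female", "female": "female"}

def clean(records, low=0, high=120):
    """Return cleaned copies of `records` (dicts with 'age' and 'gender')."""
    # 1) Correct inconsistent data: normalise the categorical spelling.
    fixed = [dict(r, gender=CANONICAL_GENDER.get(str(r["gender"]).strip().lower()))
             for r in records]
    # 2) Eliminate noise: drop records whose age is present but implausible.
    kept = [r for r in fixed if r["age"] is None or low <= r["age"] <= high]
    # 3) Fill missing values: use the mean of the ages that are known.
    known = [r["age"] for r in kept if r["age"] is not None]
    fill = mean(known) if known else None
    return [dict(r, age=fill if r["age"] is None else r["age"]) for r in kept]
```

Noise is removed before the mean is computed, so that an implausible value such as an age of 999 does not distort the value used to fill the gaps. The input records are copied rather than modified in place.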
Cleaning is needed because the data in a data warehouse come from other databases whose data are not fully reliable; they are often, inevitably, incomplete, inconsistent and inaccurate. Data cleaning can be done either before or after the data are loaded into the data warehouse.

2) Selection of Data
Data selection means selecting the columns (attributes) and rows (records) to be used for this data mining task. In most cases people cannot know precisely which parameters are most important for decision making, but neural networks can help build a model that associates the parameters and then help determine which parameters matter most.

3) Pre-processing of Data
Pre-processing of data means enhancing the cleaned data after data selection. This enhancement sometimes means generating a new data item from one or more fields, and sometimes means replacing several fields with a single field that carries more information.

4) Representation of Data
Representation of data means converting the pre-processed data into a form that a neural-network-based data mining algorithm can accept. Data mining based on neural networks can handle only numerical data, so symbolic data must be converted to numerical data.

B. Rule Generation
1) Generation of rules based on structure analysis
Rule generation based on structure analysis treats rule generation as a search process. The basic idea is to map the trained network onto corresponding rules. Since the computational complexity of the search grows exponentially with the number of network inputs, a combinatorial explosion occurs as the inputs increase. Pruning and clustering are therefore usually used to reduce the connections in the network and hence the computational complexity.
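The search-and-prune idea can be illustrated on a single trained unit with binary inputs. The sketch below assumes a step unit that fires when its weighted input plus bias is positive; it prunes connections whose weight magnitude falls below a chosen threshold and then enumerates subsets of the remaining positive-weight inputs, in the style of subset-based extraction algorithms. The weights, bias and threshold are hypothetical. A subset is accepted as a rule only if the unit fires even when every negative-weight input is also active, so each rule is guaranteed to agree with the unit.

```python
from itertools import combinations

def prune(weights, min_abs):
    """Drop connections whose weight magnitude is below min_abs."""
    return {name: w for name, w in weights.items() if abs(w) >= min_abs}

def extract_rules(weights, bias):
    """Return (rules, examined) for a step unit that fires when net > 0.

    Each rule is a tuple of input names meaning
    IF all of these inputs are 1 THEN the unit outputs 1.
    """
    positive = sorted(name for name, w in weights.items() if w > 0)
    # Worst case for firing: every negative-weight input is also 1.
    worst_negative = sum(w for w in weights.values() if w < 0)
    rules, examined = [], 0
    # Search subsets by increasing size; there are 2**len(positive) - 1 of them.
    for size in range(1, len(positive) + 1):
        for subset in combinations(positive, size):
            examined += 1
            # Keep rules minimal: skip supersets of rules already found.
            if any(set(rule) <= set(subset) for rule in rules):
                continue
            if bias + worst_negative + sum(weights[n] for n in subset) > 0:
                rules.append(subset)
    return rules, examined

# Hypothetical trained weights: "c" is a near-zero connection.
weights = {"a": 3.0, "b": 2.5, "c": 0.05, "d": -1.0}
bias = -3.5
```

With these numbers both `extract_rules(weights, bias)` and `extract_rules(prune(weights, 0.1), bias)` yield the single rule IF a AND b THEN 1, but pruning the weak connection `c` cuts the candidate subsets from 7 to 3. With n candidate inputs the count is 2^n - 1, which is the combinatorial explosion mentioned above. Pruning can in general alter a unit's behaviour, so a pruned network is usually checked or retrained before rules are read from it.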
2) Performance analysis based rule generation
Unlike rule generation based on structure analysis, rule generation based on performance analysis does not analyze or search the structure of the neural network; instead, it treats the network as a whole (a black box). This method focuses on how well the extracted rules reproduce the behavior of the network, that is, it aims to produce a set of rules that can replace the original network.

C. Rule Calculation
Although the objective of rule calculation depends on each specific application, generally speaking the rules can be evaluated against the following goals.
1. Find the optimal order of the rules, so that they obtain the best results on a given data set.
2. Test the correctness of the rules.
3. Detect how much of the knowledge in the neural network has not been extracted.
4. Detect inconsistencies between the extracted rules and the trained neural network.

The order of the rules has a great effect on how they are applied, yet the process of extracting rules from a neural network gives no information about that order. An order can nevertheless be determined with the following three measures:
(1) Robustness: how many times each rule is activated. Obviously, this measure does not depend on the order of the rules.
(2) Comprehensiveness: how many patterns are recognized by an individual rule.
(3) Error vigilance: how many times a rule is activated wrongly.

III. Neural Network Methods for Data Mining
There are many types of data mining based on neural networks, but the most commonly used are those based on self-organizing neural networks and those based on fuzzy neural networks.

A. Self-organizing Neural Network
Self-organization is a learning process without a teacher. Through learning, important features or internal knowledge, such as the distribution characteristics of the data, can be extracted from a set of data.
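As a minimal illustration of learning without a teacher, the sketch below uses plain winner-take-all competitive learning, a simpler relative of Kohonen's map that has no neighbourhood interaction between units. The data, initial prototypes, learning rate and epoch count are illustrative assumptions. No labels are used, yet each prototype drifts toward the centre of one group of points, which is one way a network can extract a distribution characteristic of the data.

```python
def competitive_learning(data, prototypes, lr=0.1, epochs=20):
    """Unsupervised winner-take-all learning; returns the trained prototypes."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x in data:
            # The winner is the prototype nearest to x (squared Euclidean distance).
            win = min(range(len(protos)),
                      key=lambda k: sum((p - a) ** 2 for p, a in zip(protos[k], x)))
            # Only the winner moves a fraction lr of the way toward x; no labels are used.
            protos[win] = [p + lr * (a - p) for p, a in zip(protos[win], x)]
    return protos
```

For example, with two groups of points around (0, 0) and (5, 5) and prototypes started at (1, 1) and (4, 4), each prototype ends up near the centre of one group.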
Kohonen held that adjacent units in a neural network, like neurons in the brain, do not play identical roles; through their interaction, the network can adaptively develop into different specialized detectors. Kohonen also proposed a learning scheme that maps the input signal to a low-dimensional space while keeping inputs with similar characteristics in adjacent regions of that space. This is the so-called self-organizing feature map.

B. Fuzzy Neural Network
Although neural networks have strong abilities in learning, classification, association and memory, the biggest difficulty in applying them to data mining is that their output cannot be given an intuitive explanation. Introducing fuzzy processing into a neural network not only increases the expressive power of the output but also makes the system more stable. The output of a traditional network is either 0 or 1, whereas the output of a fuzzy network is a membership degree, a real number between 0 and 1. After being trained on the samples and their membership degrees, the network can reflect the relationship between the inputs and outputs in the training set, and can then give the membership degree of a pattern to be identified.

IV. APPLICATIONS OF DATA MINING BASED ON NEURAL NETWORK
A. Application in Visualization
Visualization is present throughout the knowledge discovery process, but is concentrated in its early and late stages. A key aim of visualization is to display data from a multi-dimensional space in two- or three-dimensional space. Through visualization, data can be classified in a preliminary way and feature attributes can then be discretized. This is very useful for data mining processes that use classification algorithms.

B. Application in Classification
Classification is one of the basic problems of data mining.
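A neural classifier can be sketched with a single logistic neuron trained by batch gradient descent. Its sigmoid output, a real number between 0 and 1, is read here as a degree of class membership, and thresholding it at 0.5 gives the traditional 0/1 output mentioned in Section III.B. This is only an analogy under stated assumptions: a genuine fuzzy neural network uses fuzzy membership functions and fuzzy rules, which this sketch does not implement, and the learning rate, epoch count and toy data are hypothetical.

```python
import math

def membership(w, b, x):
    """Graded output in (0, 1), read here as a degree of class membership."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def crisp(w, b, x):
    """Traditional 0/1 output obtained by thresholding the membership."""
    return 1 if membership(w, b, x) >= 0.5 else 0

def train_neuron(samples, labels, lr=0.5, epochs=500):
    """Fit a single logistic neuron with batch gradient descent."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            # Gradient of the cross-entropy loss w.r.t. the pre-activation.
            err = membership(w, b, x) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(samples) for wi, g in zip(w, gw)]
        b -= lr * gb / len(samples)
    return w, b
```

With the toy data `[[0.0], [1.0], [2.0], [3.0]]` and labels `[0, 0, 1, 1]`, the crisp outputs reproduce the labels, while the membership values rise smoothly across the boundary between 1.0 and 2.0 instead of jumping from 0 to 1.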
The purpose of classification is to generate a classification model that maps each data item in the database to one of a set of given categories. Neural networks are a common method for constructing such classifiers.

C. Application in Intrusion Detection
Wenke Lee was the first to apply data mining techniques to intrusion detection. He used classification analysis, association analysis and sequence analysis to extract users' behavioral characteristics from audit data, and applied them to anomaly detection and misuse detection. Building on this work, research institutions around the world have developed a variety of data-mining-based intrusion detection techniques that effectively reduce the false detection rate of intrusion detection [7]-[12].

V. PROBLEMS AND TRENDS
Research on data mining based on neural networks still faces many problems.

First, data mining based on neural networks mainly aims to exploit the networks' ability to handle nonlinearity and continuous attributes, especially in regression problems. Yet almost all research on rule extraction from neural networks targets classification networks, while research on regression models is almost nonexistent. A breakthrough in the latter would greatly advance data mining based on neural networks.

Second, rule extraction from neural networks focuses on improving the fidelity of the extracted rules, that is, on whether the rules truly reproduce the function of the network. In data mining applications, however, the understandability of the rules is more important, and in some practical domains fidelity must be reduced to achieve better understandability. How to balance understandability and fidelity is therefore an open research task.

Third, most neural networks have no incremental learning ability: if the training data change, the network must be retrained on the whole training set.
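To make the notion of incremental learning concrete, the sketch below uses the Widrow-Hoff (LMS) rule for a single linear neuron, which updates its weights from one record at a time and never revisits earlier records. The record stream, learning rate and target function are hypothetical. It illustrates absorbing new or modified records only; it does not by itself handle deletions, and it is not a general solution for multilayer networks.

```python
def lms_update(w, x, y, lr=0.1):
    """One Widrow-Hoff (LMS) step on a single record (x, y).

    The weights move against the gradient of the squared error on this
    record only, so no earlier training records are needed.
    """
    err = y - sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * err * xi for wi, xi in zip(w, x)]

def learn_stream(w, records, lr=0.1):
    """Fold each arriving record into the model exactly once."""
    for x, y in records:
        w = lms_update(w, x, y, lr)
    return w
```

Each arriving record costs one update, independent of how many records were seen before; appending a constant input of 1.0 to each record lets the second weight act as a bias.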
Retraining from scratch cannot keep pace with the insertions, deletions and modifications that routinely occur in a database, and it forces the system to keep a huge training set. If neural networks had incremental learning ability, this problem would be much easier to solve.

Finally, data mining based on neural networks is at present oriented mainly toward mining classification rules. Although some results have been achieved, the capability of neural computation has not been fully exploited. Expanding the types of knowledge that can be mined and further broadening the scope of applications are important directions for future research on data mining based on neural networks.

VI. CONCLUSION
Data mining based on neural networks is now widely accepted. This paper has described it in detail. First, the process of data mining based on neural networks was discussed; then its categories and applications were studied; finally, its problems and trends were identified. The application of neural networks to data mining is a new and emerging field with vast potential for further research.

REFERENCES
[1] Liu Zhao, Jiang Liangxiao, "The research of data mining based on neural networks", Computer Engineering and Applications, 2004, 3, pp. 172-174.
[2] Li Yeli, Chang Guiran, Xu Xi, "Applications of neural networks in data mining", Computer Engineering and Applications, 2000, 8, pp. 103-105.
[3] Lu Hongying, Xiao Sihe, "Data mining approaches research based on improved genetic neural network", Computer Applications, 2006, 4, pp. 878-880.
[4] Li Yang, Gao Zhengguang, Li Qiyan, "A data mining architecture based on ANN and genetic algorithm", Computer Engineering, 2004, vol. 30, 6, pp. 155-156.
[5] Pan Xiao, Wan Min, "Data mining study based on fuzzy neural network", Microelectronics and Computer, 2005, vol. 22, 12, pp. 48-51.
[6] Du Jinlian, Guo Wenjun, Chi Zhongxian, "Locating multi-dimensional data mining space with neural networks", Journal of Chinese Computer Systems, 2002, vol. 23, 9, pp. 1100-1103.
[7] U. Fayyad, G. Piatetsky-Shapiro, et al. (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA, 1996.
[8] M. Craven and J. Shavlik, "Using Neural Networks for Data Mining", Future Generation Computer Systems, 1997.
[9] H. Lu, R. Setiono and H. Liu, "Effective Data Mining Using Neural Networks", VLDB'95 Proceedings, Springer, Singapore, 1995.
[10] S.S.R. Abidi and A. Goh, "Applying Knowledge Discovery to Predict Infectious Disease Epidemics", in Lecture Notes in Artificial Intelligence 1531 - PRICAI'98: Topics in Artificial Intelligence, H. Lee and H. Motoda (Eds.), Berlin: Springer Verlag, 1998.
[11] T. Kohonen, "Self-organized Formation of Topologically Correct Feature Maps", Biological Cybernetics, Vol. 43, pp. 59-69, 1982.
[12] T. Kohonen, "The Self-organizing Map", Proceedings of the IEEE, Vol. 78, No. 9, pp. 1464-1480, 1990.