International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org Volume 3, Issue 3, March 2014 ISSN 2319 - 4847 Data Mining - An Evolutionary View of Agriculture Mr. Abhishek B. Mankar1, Mr. Mayur S. Burange2 1 ME (CSE) Scholar, department of CSE, P.R. Pote (Patil) College of Engg. & Management, Amravati Amravati-444605 2 Assitant Professor, department of CSE, P.R. Pote (Patil) College of Engg. & Management, Amravati Amravati-444605 Abstract It is very hard to acquire the information what really want with the accumulation of large number of data. Data mining technology has been received a great progress with the rapid development of computer science, artificial intelligence. In this paper our focus is on the applications of Data mining techniques in agricultural field. Generally for doing agriculture land, labour, capital and organization are required without that cannot produce with a new agriculture technology. Fuzzy sets are suitable for handling the issues related to understandability of pattern of incomplete data, and human interaction, mixed media information and can provide approximate solutions faster for given pattern of data. Yield prediction is a very important agricultural problem to farmer that remains to be solved based on the previous available data. The yield prediction problem can be solved by employing Data Mining techniques. This work aims to find suitable data models that achieve a high accuracy and a high generality in terms of yield prediction capabilities. For this purpose, different Data Mining techniques and their types were evaluated on different data sets. Keywords: Data mining, Fuzzy sets, Knowledge discovery, Agriculture. 1. Introduction In recent years, different emerging application domains have introduced new constraints and methods for information technology. Information technology (IT) has become more and more part of our everyday lives nowadays, this is especially true for agriculture. Data Mining is the technique of extracting useful and important information from large data set. Data mining in agriculture field is a relatively novel research field. Five years ago, one of the authors of this survey co-authored a book named “Data Mining in Agriculture” [1]. The book gives a wide range overview of recent data mining techniques. Data mining and knowledge discovery in database (KDD) are concerned with extracting patterns and models of interest from huge databases. KDD says that “knowledge” is the necessary end product of given data-driven discovery. KDD refers to the overall process of discovering useful knowledge from data while data mining refers to a particular step in this process [2]. In this paper describe applications to agricultural related areas. Such as Yield prediction is a very important agricultural problem. Any farmer is interested in knowing how much yield he is about to expect. In the past, yield prediction was performed by considering farmer's experience on particular field, crop and climate condition. In this paper we discuss additional information about data like probability in probability theory, grade of membership in fuzzy set theory [3]. 2. DATA MINING 2.1 KNOWLEDGE DISCOVERY AND AGRICULTURE The term KDD refers to the overall process of knowledge discovery in databases. In this process, Data mining is a particular step involving the application of specific algorithms for extracting patterns (models) from data set. The additional steps in the knowledge discovery process, such as data preparation, data selection, data cleaning, incorporation of appropriate knowledge of data set, and proper interpretation of the results of mining, ensures that useful knowledge is derived from the data set. The subject of KDD has evolved, and continues to evolve, from the intersection of research from such fields as databases, machine learning, statistics, artificial intelligence, reasoning with uncertainties, data visualization, pattern recognition, machine discovery, high-performance computing and knowledge acquisition for expert systems. KDD systems incorporate theories, algorithms, and methods from all these fields. Many successful applications have been reported from varied sectors such as marketing, finance, banking, manufacturing, telecommunications and Agriculture. A good overview of KDD can be found in Ref. [4], [5]. KDD is used in agriculture to show the statistical information about soil condition, climate conditions, past crop yield, Government strategies, all the information about pesticides, fertilizer. So the farmer should understand what types of information are useful? and how they used in their farm? KDD refers to the overall process of discovering useful knowledge from data while data mining refers to particular step in the process. Volume 3, Issue 3, March 2014 Page 102 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org Volume 3, Issue 3, March 2014 ISSN 2319 - 4847 2.2 Fuzzy Set The modeling of imprecise and qualitative knowledge, as well as the transmission and handling uncertainty of data at various stages are possible through the use of fuzzy sets. Knowledge discovery in databases is mainly concerned with identifying interesting patterns and describing them in a concise and meaningful manner [4]. Fuzzy models can be said to represent a prudent and user-oriented sifting of data, qualitative observations and calibration of commonsense rules in an attempt to establish meaningful and useful relationships between system variables [6]. Despite a growing versatility of knowledge discovery systems, there is an important component of human interaction that is inherent to any process of knowledge representation, manipulation, and processing. Fuzzy sets depend on membership function which is mainly from human ideas so that decisions in agriculture can easily find by using fuzzy set. 2.2.1 Clustering Data mining aims at sifting through large volumes of data in order to reveal useful information in the form of new relationships, patterns or clusters, for decision making by a user farmer [7]. So that, user can finds appropriate data for the crop development and their management. 2.2.2 Association Rules: An important area of data mining research deals with the discovery of association rules [8]. An association rule describes an interesting association relationship among different attributes. In Agriculture insecticide association involves disease related to that insect. 2.2.3 Functional Dependencies: Fuzzy inference generalizes both imprecise (set-valued) and precise inference. Similarly, fuzzy relational databases generalize their classical and imprecise counterparts by supporting fuzzy information storage and retrieval [9]. In agriculture field functional dependency describe between yield and their price. Also it provides information about soil and their productivity. 2.2.4 Data Summarization Summary discovery is one of the major components of knowledge discovery in databases. Data summarization provides the user with comprehensive information for grasping the essence from a large amount of information in a database. Fuzzy set theory is also used for data summarization [10]. Agriculture is very big field so comprehensive information much required to grasping the data about agriculture. 3. Application There are several applications of Data Mining techniques in the field of agriculture. Some of the data mining techniques are related to weather conditions and Short term forecasting of air pollution in the atmosphere [12]. Data Mining techniques are applied to study sound recognition problems. Fagerlund S [13] uses SVMs to classify the sound of birds and other different sounds using Support Vector Machines. Data Mining techniques are often used to study soil characteristics. classifying soils in combination with GPS-based technologies [14]. Meyer GE et al. [15] uses a K-Means approach to classify soils and plants and Camps Valls et al. [16] uses SVMs to classify crops. Different possible changes of the weather scenarios are analyzed using SVMs [17]. The K Nearest Neighbor(KNN) is applied for simulating daily precipitations and other weather variables [18]. Patel VC et al. [19] uses Computer Vision to recognize cracks in eggs. Das KC et al. [20] uses ANNs to classify eggs as fertility Karimi Y et al. [21] uses SVMs for detecting weed and nitrogen stress in corn. 4. Analysis Data Excel software are use to conduct qualitative analyses and to create a benchmark for the analysis of the research dataset. The benchmark allowed current statistical methods for the dataset to be established and any limitations for dataset to be identified. The dataset was then analyzed using a clustering process within the data mining software. That result were compared against the benchmark for a number of factors that included ease of application, speed, time and accuracy of results to determine if data mining was superior to current methods. The results of statistical and data mining experiments may still require some expertise to be understood and utilized of given dataset. Figurer1: Data mining and statistics analysis Volume 3, Issue 3, March 2014 Page 103 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org Volume 3, Issue 3, March 2014 ISSN 2319 - 4847 Above figure1 shows the relation between Data mining and statistics analysis. The analysis and interpretation of patterns is a timeless consuming process that gives a deep understanding of statistics. The process requires a small amount of time to complete analysis and to examine any patterns and relationships within the data. 5. Statistical Result Analysis and Overview: The data available for the statistical analysis is obtained for the years from 2004 to 2010 in Amravati district of Maharashtra in India. The data is taken in four input variables. They are Year, Rainfall, Area of Sowing and Production. Year attribute specifies the year in which the data available in Hectares. Rainfall attribute specifies the Rainfall in Amravati district in the specified year in Centimeters. Area of Sowing attribute specifies the total area sowed in Amravati district in the specified year. Production attribute specifies the production of crop in Amravati district in the specified year in Tons. The preliminary data collection is carried out for all the districts of Maharashtra in India. Each area in this collection is identified by the respective longitude and latitude of the region. In this paper the evaluation is considered for only one district i.e. Amravati. The information gathering process is done with three government department like Indian Meteorological Department, Statistical Institution and Agricultural department. Instead of restricting with few regions and few samples of data, it is aimed at applying the Expectation Maximization approach on all the regions of Maharashtra in India. Table1: showing average statistic of crop. Year Average Average Average Rain in Area of Production Year Sowing 2006 1007.0 681103 792634 2007 1097.7 682831 826193 2008 600.9 683192 653915 2009 698 684628 685930 2010 1101.4 693254 726594 In this table the estimation of the crop yield is analyzed with respect to four parameters namely Year, Rainfall, Area of Sowing and Production. This gives the analysis and detail of patterns about the crop to understanding of farmer and it easily acquired by farmer. The process requires a small amount of time to complete analysis and to examine any patterns and relationships within the data so it should handle by farmer easily. Conclusion With the improvement of data mining technologies, especially those without any premises or humans subjective, data mining can be applied in many areas. In this paper some Data Mining techniques were adopted in order to estimate crop yield analysis with existing data and their use in data mining. Some data mining technique have not yet applied to agriculture problem, as an example GPS techniques may be employed for discovering important information from agricultural-related like soil identification. References: [1] A. Mucherino, P. Papajorgji, P.M. Pardalos, Data Mining in Agriculture, Springer, 2009. [2] U. Fayyad, P. Stolorz, Data mining and KDD: Promise and challenges, Future Generation Computer Systems, Vol.13, pp. 99-115, 1997. [3] I. Jagielska, C. Mattehews, T. Whitfort, An investigation into the application of neural networks, fuzzy logic, genetic alrorithms, and rough sets to automated knowledge acquisition for classification problems, Neurocomputing, Vol. 24, pp. 37-54, 1999 [4] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds., Advances in Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI/MIT Press, 1996. [5] “Special issue on knowledge discovery in data- and knowledge bases,” Int. J. Intell. Syst., vol. 7, 1992. [6] S. K. Pal and S. Mitra, Neuro-Fuzzy Pattern Recognition: Methods in Soft Computing. New York: Wiley, 1999. [7] J. Furnkranz, J. Petrak, and R. Trappl, “Knowledge discovery in international conflict databases,” Applied Artificial Intelligence, vol. 11, pp. 91–118, 1997. [8] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993. [9] J. Hale and S. Shenoi, “Analyzing FD inference in relational databases,” Data Knowledge Eng., vol. 18, pp. 167– 183, 1996. Volume 3, Issue 3, March 2014 Page 104 International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org Volume 3, Issue 3, March 2014 ISSN 2319 - 4847 [10] D. H. Lee and M. H. Kim, “Database summarization using fuzzy ISA hierarchies,” IEEE Trans. Syst., Man, Cybern. B, vol. 27, pp. 68–78, 1997. [11] Jorquera H, Perez R, Cipriano A, Acuna G Short term forecasting of air pollution episodes. In: Zannetti P (eds) Environmental modeling 4. WIT Press, UK, 2001. [12] Fagerlund S Bird species recognition using Support Vector Machines. EURASIP J Adv Signal Processing, Article ID 38637, p 8, 2007. [13] Verheyen K, Adriaens D, Hermy M, Deckers S High-resolution continuous soil classification using morphological soil profile descriptions. Geoderma 101:31–48, 2001. [14] Meyer GE, Neto JC, Jones DD, Hindman TW Intensified fuzzy clusters for classifying plant, soil, and residue regions of interest from color images. Comput Electronics Agric 42:161–180, 2004. [15] Camps-Valls G, Gomez-Chova L, Calpe-Maravilla J, Soria- Olivas E, Martin-Guerrero JD, Moreno J Support Vector Machines for crop classification using hyperspectral data. Lect Notes Comp Sci 2652:134–141 , 2003. [16] Tripathi S, Srinivas VV, Nanjundiah RS Downscaling of precipitation for climate change scenarios: a Support Vector Machine approach. J Hydrol 330:621–640, 2006. [17] Rajagopalan B, Lall U, A K-Nearest Neighbor simulator for daily precipitation and other weather variables. Wat Res Res 35(10) : 3089–3101, 1999. [18] Patel VC, McClendon RW, Goodrum JW Crack detection in eggs using computer vision and neural networks. Artif Intell Appl 8(2):21–31, 1994. [19] Das KC, Evans MD Detecting fertility of hatching eggs using machine vision II: Neural Network classifiers. Trans ASAE 35(6):2035–2041, 1992. [20] Karimi Y, Prasher SO, Patel RM, Kim SH Application of support vector machine technology for Weed and nitrogen stress detection in corn. Comput Electronics Agricult 51:99–109, 2006. AUTHOR Mr. Abhishek B. Mankar Received Bachelor’s Degree In Computer Science Engineering from SGB Amravati University & Pursuing Master Degree In CSE from P. R. Pote (Patil) College of Engineering. Amravati-444602 Maharashtra, India Mr. Mayur S. Burange, Received the Master Degree in Computer Science from SGB Amravati University. Working as a Assistant Professor In Department of Computer Science and Engineering at P. R. Pote (Patil). College of Engineering. Amravati-444602. Maharashtra, India. Volume 3, Issue 3, March 2014 Page 105