International Journal of Application or Innovation in Engineering & Management... Web Site: www.ijaiem.org Email: Volume 3, Issue 3, March 2014

advertisement
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 3, Issue 3, March 2014
ISSN 2319 - 4847
Data Mining - An Evolutionary View of
Agriculture
Mr. Abhishek B. Mankar1, Mr. Mayur S. Burange2
1
ME (CSE) Scholar, department of CSE,
P.R. Pote (Patil) College of Engg. & Management, Amravati Amravati-444605
2
Assitant Professor, department of CSE,
P.R. Pote (Patil) College of Engg. & Management, Amravati Amravati-444605
Abstract
It is very hard to acquire the information what really want with the accumulation of large number of data. Data mining
technology has been received a great progress with the rapid development of computer science, artificial intelligence. In this paper
our focus is on the applications of Data mining techniques in agricultural field. Generally for doing agriculture land, labour,
capital and organization are required without that cannot produce with a new agriculture technology. Fuzzy sets are suitable for
handling the issues related to understandability of pattern of incomplete data, and human interaction, mixed media information
and can provide approximate solutions faster for given pattern of data. Yield prediction is a very important agricultural problem
to farmer that remains to be solved based on the previous available data. The yield prediction problem can be solved by employing
Data Mining techniques. This work aims to find suitable data models that achieve a high accuracy and a high generality in terms
of yield prediction capabilities. For this purpose, different Data Mining techniques and their types were evaluated on different
data sets.
Keywords: Data mining, Fuzzy sets, Knowledge discovery, Agriculture.
1. Introduction
In recent years, different emerging application domains have introduced new constraints and methods for information
technology. Information technology (IT) has become more and more part of our everyday lives nowadays, this is
especially true for agriculture. Data Mining is the technique of extracting useful and important information from large
data set. Data mining in agriculture field is a relatively novel research field. Five years ago, one of the authors of this
survey co-authored a book named “Data Mining in Agriculture” [1]. The book gives a wide range overview of recent data
mining techniques. Data mining and knowledge discovery in database (KDD) are concerned with extracting patterns and
models of interest from huge databases. KDD says that “knowledge” is the necessary end product of given data-driven
discovery. KDD refers to the overall process of discovering useful knowledge from data while data mining refers to a
particular step in this process [2]. In this paper describe applications to agricultural related areas. Such as Yield
prediction is a very important agricultural problem. Any farmer is interested in knowing how much yield he is about to
expect. In the past, yield prediction was performed by considering farmer's experience on particular field, crop and
climate condition. In this paper we discuss additional information about data like probability in probability theory, grade
of membership in fuzzy set theory [3].
2. DATA MINING
2.1 KNOWLEDGE DISCOVERY AND AGRICULTURE
The term KDD refers to the overall process of knowledge discovery in databases. In this process, Data mining is a
particular step involving the application of specific algorithms for extracting patterns (models) from data set. The
additional steps in the knowledge discovery process, such as data preparation, data selection, data cleaning, incorporation
of appropriate knowledge of data set, and proper interpretation of the results of mining, ensures that useful knowledge is
derived from the data set. The subject of KDD has evolved, and continues to evolve, from the intersection of research
from such fields as databases, machine learning, statistics, artificial intelligence, reasoning with uncertainties, data
visualization, pattern recognition, machine discovery, high-performance computing and knowledge acquisition for expert
systems. KDD systems incorporate theories, algorithms, and methods from all these fields. Many successful applications
have been reported from varied sectors such as marketing, finance, banking, manufacturing, telecommunications and
Agriculture. A good overview of KDD can be found in Ref. [4], [5]. KDD is used in agriculture to show the statistical
information about soil condition, climate conditions, past crop yield, Government strategies, all the information about
pesticides, fertilizer. So the farmer should understand what types of information are useful? and how they used in their
farm? KDD refers to the overall process of discovering useful knowledge from data while data mining refers to particular
step in the process.
Volume 3, Issue 3, March 2014
Page 102
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 3, Issue 3, March 2014
ISSN 2319 - 4847
2.2 Fuzzy Set
The modeling of imprecise and qualitative knowledge, as well as the transmission and handling uncertainty of data at
various stages are possible through the use of fuzzy sets. Knowledge discovery in databases is mainly concerned with
identifying interesting patterns and describing them in a concise and meaningful manner [4]. Fuzzy models can be said to
represent a prudent and user-oriented sifting of data, qualitative observations and calibration of commonsense rules in an
attempt to establish meaningful and useful relationships between system variables [6]. Despite a growing versatility of
knowledge discovery systems, there is an important component of human interaction that is inherent to any process of
knowledge representation, manipulation, and processing. Fuzzy sets depend on membership function which is mainly
from human ideas so that decisions in agriculture can easily find by using fuzzy set.
2.2.1 Clustering
Data mining aims at sifting through large volumes of data in order to reveal useful information in the form of new
relationships, patterns or clusters, for decision making by a user farmer [7]. So that, user can finds appropriate data for
the crop development and their management.
2.2.2 Association Rules:
An important area of data mining research deals with the discovery of association rules [8]. An association rule describes
an interesting association relationship among different attributes. In Agriculture insecticide association involves disease
related to that insect.
2.2.3 Functional Dependencies:
Fuzzy inference generalizes both imprecise (set-valued) and precise inference. Similarly, fuzzy relational databases
generalize their classical and imprecise counterparts by supporting fuzzy information storage and retrieval
[9]. In agriculture field functional dependency describe between yield and their price. Also it provides information about
soil and their productivity.
2.2.4 Data Summarization
Summary discovery is one of the major components of knowledge discovery in databases. Data summarization provides
the user with comprehensive information for grasping the essence from a large amount of information in a database.
Fuzzy set theory is also used for data summarization [10]. Agriculture is very big field so comprehensive information
much required to grasping the data about agriculture.
3. Application
There are several applications of Data Mining techniques in the field of agriculture. Some of the data mining techniques
are related to weather conditions and Short term forecasting of air pollution in the atmosphere [12]. Data Mining
techniques are applied to study sound recognition problems. Fagerlund S [13] uses SVMs to classify the sound of birds
and other different sounds using Support Vector Machines. Data Mining techniques are often used to study soil
characteristics. classifying soils in combination with GPS-based technologies [14]. Meyer GE et al. [15] uses a K-Means
approach to classify soils and plants and Camps Valls et al. [16] uses SVMs to classify crops. Different possible changes
of the weather scenarios are analyzed using SVMs [17]. The K Nearest Neighbor(KNN) is applied for simulating daily
precipitations and other weather variables [18]. Patel VC et al. [19] uses Computer Vision to recognize cracks in eggs.
Das KC et al. [20] uses ANNs to classify eggs as fertility Karimi Y et al. [21] uses SVMs for detecting weed and nitrogen
stress in corn.
4. Analysis Data
Excel software are use to conduct qualitative analyses and to create a benchmark for the analysis of the research dataset.
The benchmark allowed current statistical methods for the dataset to be established and any limitations for dataset to be
identified. The dataset was then analyzed using a clustering process within the data mining software. That result were
compared against the benchmark for a number of factors that included ease of application, speed, time and accuracy of
results to determine if data mining was superior to current methods. The results of statistical and data mining
experiments may still require some expertise to be understood and utilized of given dataset.
Figurer1: Data mining and statistics analysis
Volume 3, Issue 3, March 2014
Page 103
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 3, Issue 3, March 2014
ISSN 2319 - 4847
Above figure1 shows the relation between Data mining and statistics analysis. The analysis and interpretation of patterns
is a timeless consuming process that gives a deep understanding of statistics. The process requires a small amount of time
to complete analysis and to examine any patterns and relationships within the data.
5. Statistical Result Analysis and Overview:
The data available for the statistical analysis is obtained for the years from 2004 to 2010 in Amravati district of
Maharashtra in India. The data is taken in four input variables. They are Year, Rainfall, Area of Sowing and Production.
Year attribute specifies the year in which the data available in Hectares. Rainfall attribute specifies the Rainfall in
Amravati district in the specified year in Centimeters. Area of Sowing attribute specifies the total area sowed in Amravati
district in the specified year. Production attribute specifies the production of crop in Amravati district in the specified year
in Tons. The preliminary data collection is carried out for all the districts of Maharashtra in India. Each area in this
collection is identified by the respective longitude and latitude of the region. In this paper the evaluation is considered for
only one district i.e. Amravati. The information gathering process is done with three government department like Indian
Meteorological Department, Statistical Institution and Agricultural department. Instead of restricting with few regions
and few samples of data, it is aimed at applying the Expectation Maximization approach on all the regions of
Maharashtra in India.
Table1: showing average statistic of crop.
Year
Average
Average
Average
Rain in
Area of
Production
Year
Sowing
2006
1007.0
681103
792634
2007
1097.7
682831
826193
2008
600.9
683192
653915
2009
698
684628
685930
2010
1101.4
693254
726594
In this table the estimation of the crop yield is analyzed with respect to four parameters namely Year, Rainfall, Area of
Sowing and Production. This gives the analysis and detail of patterns about the crop to understanding of farmer and it
easily acquired by farmer. The process requires a small amount of time to complete analysis and to examine any patterns
and relationships within the data so it should handle by farmer easily.
Conclusion
With the improvement of data mining technologies, especially those without any premises or humans subjective, data
mining can be applied in many areas. In this paper some Data Mining techniques were adopted in order to estimate crop
yield analysis with existing data and their use in data mining. Some data mining technique have not yet applied to
agriculture problem, as an example GPS techniques may be employed for discovering important information from
agricultural-related like soil identification.
References:
[1] A. Mucherino, P. Papajorgji, P.M. Pardalos, Data Mining in Agriculture, Springer, 2009.
[2] U. Fayyad, P. Stolorz, Data mining and KDD: Promise and challenges, Future Generation Computer Systems,
Vol.13, pp. 99-115, 1997.
[3] I. Jagielska, C. Mattehews, T. Whitfort, An investigation into the application of neural networks, fuzzy logic, genetic
alrorithms, and rough sets to automated knowledge acquisition for classification problems, Neurocomputing, Vol. 24,
pp. 37-54, 1999
[4] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds., Advances in Knowledge Discovery and
Data Mining. Menlo Park, CA: AAAI/MIT Press, 1996.
[5] “Special issue on knowledge discovery in data- and knowledge bases,” Int. J. Intell. Syst., vol. 7, 1992.
[6] S. K. Pal and S. Mitra, Neuro-Fuzzy Pattern Recognition: Methods in Soft Computing. New York: Wiley, 1999.
[7] J. Furnkranz, J. Petrak, and R. Trappl, “Knowledge discovery in international conflict databases,” Applied Artificial
Intelligence, vol. 11, pp. 91–118, 1997.
[8] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
[9] J. Hale and S. Shenoi, “Analyzing FD inference in relational databases,” Data Knowledge Eng., vol. 18, pp. 167–
183, 1996.
Volume 3, Issue 3, March 2014
Page 104
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 3, Issue 3, March 2014
ISSN 2319 - 4847
[10] D. H. Lee and M. H. Kim, “Database summarization using fuzzy ISA hierarchies,” IEEE Trans. Syst., Man, Cybern.
B, vol. 27, pp. 68–78, 1997.
[11] Jorquera H, Perez R, Cipriano A, Acuna G Short term forecasting of air pollution episodes. In: Zannetti P (eds)
Environmental modeling 4. WIT Press, UK, 2001.
[12] Fagerlund S Bird species recognition using Support Vector Machines. EURASIP J Adv Signal Processing, Article ID
38637, p 8, 2007.
[13] Verheyen K, Adriaens D, Hermy M, Deckers S High-resolution continuous soil classification using morphological
soil profile descriptions. Geoderma 101:31–48, 2001.
[14] Meyer GE, Neto JC, Jones DD, Hindman TW Intensified fuzzy clusters for classifying plant, soil, and residue regions
of interest from color images. Comput Electronics Agric 42:161–180, 2004.
[15] Camps-Valls G, Gomez-Chova L, Calpe-Maravilla J, Soria- Olivas E, Martin-Guerrero JD, Moreno J Support Vector
Machines for crop classification using hyperspectral data. Lect Notes Comp Sci 2652:134–141 , 2003.
[16] Tripathi S, Srinivas VV, Nanjundiah RS Downscaling of precipitation for climate change scenarios: a Support
Vector Machine approach. J Hydrol 330:621–640, 2006.
[17] Rajagopalan B, Lall U, A K-Nearest Neighbor simulator for daily precipitation and other weather variables. Wat Res
Res 35(10) : 3089–3101, 1999.
[18] Patel VC, McClendon RW, Goodrum JW Crack detection in eggs using computer vision and neural networks. Artif
Intell Appl 8(2):21–31, 1994.
[19] Das KC, Evans MD Detecting fertility of hatching eggs using machine vision II: Neural Network classifiers. Trans
ASAE 35(6):2035–2041, 1992.
[20] Karimi Y, Prasher SO, Patel RM, Kim SH Application of support vector machine technology for Weed and nitrogen
stress detection in corn. Comput Electronics Agricult 51:99–109, 2006.
AUTHOR
Mr. Abhishek B. Mankar Received Bachelor’s Degree In Computer Science Engineering from SGB
Amravati University & Pursuing Master Degree In CSE from P. R. Pote (Patil) College of Engineering.
Amravati-444602 Maharashtra, India
Mr. Mayur S. Burange, Received the Master Degree in Computer Science from SGB Amravati University.
Working as a Assistant Professor In Department of Computer Science and Engineering at P. R. Pote (Patil).
College of Engineering. Amravati-444602. Maharashtra, India.
Volume 3, Issue 3, March 2014
Page 105
Download