Soil Profile Based Agricultural System
Submitted in partial fulfillment of the requirements
of the degree of
B. E. Computer Engineering
By
Ankita Fernandes
Roll no.11 PID 152030
Nishita Desai
Roll no.53 PID 162259
Samiksha Rahatwal
Roll no.67 PID 162252
Guide:
Ms. Priya Karunakaran
Assistant Professor
Department of Computer Engineering
St. Francis Institute of Technology
(Engineering College)
University of Mumbai
2018-2019
CERTIFICATE
This is to certify that the project entitled “Soil Profile Based Agricultural System” is a
bonafide work of “Ankita Fernandes” (Roll No. 11), “Nishita Desai” (Roll No. 53),
“Samiksha Rahatwal” (Roll No. 67) submitted to the University of Mumbai in partial
fulfillment of the requirement for the award of the degree of B.E. in Computer Engineering.
(Name and sign)
Guide
(Name and sign)
Head of Department
(Name and sign)
Principal
Project Report Approval for B.E.
This project report entitled Soil Profile Based Agricultural System by Ankita
Fernandes, Nishita Desai and Samiksha Rahatwal is approved for the degree of
B.E. in Computer Engineering.
Examiners
1. --------------------------------------------
2. --------------------------------------------
Date:
Place:
Declaration
I declare that this written submission represents my ideas in my own words
and where others' ideas or words have been included, I have adequately cited and
referenced the original sources. I also declare that I have adhered to all principles
of academic honesty and integrity and have not misrepresented or fabricated or
falsified any idea/data/fact/source in my submission. I understand that any
violation of the above will be cause for disciplinary action by the Institute and
can also evoke penal action from the sources which have thus not been properly
cited or from whom proper permission has not been taken when needed.
----------------------------------------(Signature)
----------------------------------------(Name of student and Roll No.)
Date:
Abstract
In many regions of India, farmers face problems of crop production due to soil and weather
conditions. Owing to illiteracy, farmers might not be able to take advantage of scientific
advances made in the field of agriculture and still adhere to traditional practices. This makes
obtaining desirable yields difficult. For instance, crop failure can occur due to improper use of
fertilizers or undesirable amounts of rainfall. In such situations, an adequate solution could be
to choose crops for cultivation that are well suited to the current soil quality and the rainfall
expected during cultivation. Therefore, we introduce the ‘Soil Profile Based Agricultural
System’, based on data mining. The system will provide a list of crops a farmer can cultivate
based on the entered soil attributes (NPK and pH) and the predicted rainfall of the farmer’s
region. In addition, it will suggest fertilizers that can be used to improve soil quality and thus
bring more crops under successful cultivation. The system will be developed as an Android
application to address the growing problem of crop failure.
Contents
Chapter  Description
1        INTRODUCTION
1.1      Description
1.2      Problem Formulation
1.3      Motivation
1.4      Proposed Solution
1.5      Scope of the Project
2        REVIEW OF LITERATURE
3        SYSTEM ANALYSIS
3.1      Functional Requirements
3.2      Non-Functional Requirements
3.3      Specific Requirements
3.4      Use Case Diagram and Description
4        ANALYSIS MODELING
4.1      Data Modeling
4.2      Class Diagram
4.3      Functional Modeling
4.4      Timeline Chart
5        DESIGN
5.1      Architectural Design
6        IMPLEMENTATION
6.1      Algorithms/Methods Used
7        CONCLUSION
List of Figures
Fig. No.  Figure Caption
1.1       SVM Explained
3.4.1     Use Case Diagram
4.1.1     Dataset 1: NPK, pH Dataset
4.1.2     Dataset 2: Rainfall Dataset
4.1.3     Dataset 3: Rainfall Range for Crops Dataset
4.2.1     Class Diagram
4.3.1     DFD Level 0
4.3.2     DFD Level 1
5.1.1     Flowchart for User Activity
5.1.2     Flowchart for System Activity
List of Abbreviations
Sr. No.  Abbreviation  Expanded form
i        NPK           Nitrogen, Phosphorus and Potassium
ii       SVM           Support Vector Machine
iii      KNN           K-Nearest Neighbour
iv       ANN           Artificial Neural Network
v        SVD           Singular Value Decomposition
Chapter 1
Introduction
Agriculture is a primary occupation in India, providing a livelihood to 118.6 million
cultivators as per the 2011 Census. Understanding the nature of the soil, knowing when and
where to apply which fertilizers, predicting future weather conditions, maintaining the quality
of crops and understanding how different factors act on different parts of the same land are a
few of the many problems farmers face before as well as during cultivation. Many factors and
statistics need to be taken into consideration while making important agricultural decisions,
and weighing them manually can be troublesome and sometimes inefficient.
1.1 Description
We aim to solve some of these problems through a crop suggestion application based on data
mining. Soil reports are mandatory before commencing any cultivation, but farmers often find
it difficult to comprehend these reports themselves. Based on weather conditions as well as
values from the soil reports, the application will suggest crops suitable for cultivation in that
particular area. Farmers can also test the suitability of different crops against regional rainfall,
or receive a list of fertilizers that bring current soil conditions up to the mark for cultivation of
a particular crop.
1.2 Problem Formulation
Bringing data mining technologies into agriculture presents a significant challenge; at the same
time, this technology contributes effectively to the economic and social development of many
countries. Agriculture is a crucial source of data that needs to be wisely managed and analyzed
with appropriate methods and tools in order to extract meaningful information.
The main purpose of our proposed system is to provide an effective data mining architecture,
based on a profiling system, that assists users in making better decisions. It does so through
real-time data processing and a dynamic data mining service composition method that enhance
and monitor agricultural productivity. This improves the traditional decision-making process
and allows better management of natural resources.
Knowing when to apply fertilizers, assessing soil fertility and applying efficient and sustainable
techniques to crop production are some of the features of this system.
By using a dynamic and accurate selection of data mining services, agricultural actors can
exploit data in real time and with appropriate tools. One of the main features of this system is
that it will help the farmer understand what fertilizer the soil requires and whether the soil is
appropriate for the crop grown in it.
We will use the SVM algorithm to classify the entered soil attributes into 3 classes of fertility.
The same algorithm will be used to classify predicted rainfall into the ranges low, medium,
high and very high, and thus provide a list of suitable crops.
1.3 Motivation
This system aims to provide reliable solutions to farmers by the easiest means possible. It
will help farmers better understand the effect of soil nutrients on the growth of different crops.
Since rainfall plays a very important role in crop yield, considering this factor along with soil
quality will help farmers make smarter decisions.
1.4 Proposed Solution
Our system will be an Android application that is freely available and easily accessible to
farmers. A simple and friendly user interface will help farmers understand the system. We work
on soil profiling, taking the NPK and pH values of the soil as input from the farmer. NPK is
short for Nitrogen (N), Phosphorus (P) and Potassium (K), which are the three major elements
in deciding soil quality and fertility. Nitrogen is primarily responsible for vegetative growth.
Nitrogen assimilation into amino acids is the building block for protein in the plant. It is a
component of chlorophyll and is required for several enzyme reactions. Phosphorus is a major
component in plant DNA and RNA. Phosphorus is also critical in root development, crop
maturity and seed production. The role of potassium in the plant is indirect, meaning that it
does not make up any plant part. Potassium is required for the activation of over 80 enzymes
throughout the plant. Soil pH affects many physical, chemical and biological reactions
necessary for crop survival, growth and yield. Nutrient availability is strongly influenced by
pH, thus making it an important factor to monitor.
We will use a data mining algorithm such as SVM to produce an initial crop suggestion based
on soil attributes. This suggestion will then be refined using the rainfall for that particular year
in the entered region to provide a final list of crops suitable for the particular soil.
1.5 Scope of the Project
Being deployed as a simple Android application, this system will be useful to all farmers.
Since most people living in rural areas possess mobile phones nowadays, a few clicks on their
phones can provide them with smarter agricultural solutions. One constraint of this application
is that it requires internet connectivity.
Chapter 2
Review of Literature
In order to design the system we need to plan all of its aspects. We therefore need to study
the various techniques and technologies that can be used to develop it. This literature survey
aims to find out how data mining works, how soil parameters play a vital role in cultivation, in
what context data mining can be used in agriculture, which components of soil are used and
studied, and what the results of these studies are.
Significance of NPK and pH: We will work on soil profiling, taking the NPK and pH values of
the soil as input from the farmer. NPK is short for Nitrogen (N), Phosphorus (P) and Potassium
(K), which are the three major elements in deciding soil quality and fertility. Nitrogen is
primarily responsible for vegetative growth. Nitrogen assimilation into amino acids is the
building block for protein in the plant. It is a component of chlorophyll and is required for
several enzyme reactions. Phosphorus is a major component in plant DNA and RNA.
Phosphorus is also critical in root development, crop maturity and seed production. The role of
potassium in the plant is indirect, meaning that it does not make up any plant part. Potassium is
required for the activation of over 80 enzymes throughout the plant [2]. Soil pH affects many
physical, chemical and biological reactions necessary for crop survival, growth and yield.
Nutrient availability is strongly influenced by pH, thus making it an important factor to
monitor [3].
Data mining algorithms are divided into two broad categories, i.e. supervised and unsupervised
algorithms. For instance, the K-nearest neighbour method is used for classification and hence
belongs to the first group: a given input is assigned to one of the classes present in the
training dataset. K-means and SVD, on the other hand, are unsupervised techniques. K-means
groups training data into clusters based on similarities between its parameters, while SVD is
a matrix decomposition commonly used to reduce the dimensionality of the data [13].
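The supervised case described above can be illustrated with a minimal K-nearest neighbour sketch in plain Python. The sample values, the labels and the function name `knn_predict` are hypothetical and are not taken from the project's datasets; the intent is only to show how an input is assigned to one of the classes present in the training data.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two equal-length feature tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (N, P, K, pH) samples with made-up fertility labels.
train = [
    ((20, 10, 15, 5.5), "low"),
    ((25, 12, 18, 5.8), "low"),
    ((60, 40, 45, 6.5), "high"),
    ((65, 42, 50, 6.8), "high"),
    ((62, 38, 48, 6.6), "high"),
]

print(knn_predict(train, (58, 39, 44, 6.4)))
```

In practice the features would usually be scaled first, since pH spans a much smaller numeric range than the nutrient values and would otherwise contribute little to the distance.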
The concept of a crop and agricultural field monitoring system has existed for a long time.
One problem is that these monitoring systems create huge volumes of data that are difficult to
manage and demand a large technology infrastructure. This increases the hardware cost of
implementing a monitoring system [8]. Agriculture as a business is unique. Crop production
depends on many climatic, geographical, biological, political and economic factors that are
mostly independent of one another. This multiplicity of factors introduces risk. The efficient
management of these risks is imperative for successful agriculture and a consistent output of
food [9].
Feedforward backpropagation is one of the common implementations of artificial neural
networks. In [4], this type of ANN is used to build a crop suggestion system. Trial and error is
used to determine the number of hidden layers. Inputs from the users include NPK and pH as
well as depth, temperature and rainfall of the region. The latter three inputs might not be
known to the farmer or might differ between the planning phase and the actual cultivation
phase. A random forest algorithm is used in [5] to suggest the required level of NPK for soil
and thus prevent undernutrition as well as overnutrition of soil. With these results farmers can
then choose suitable fertilisers. Inputs to the system are the existing NPK values of the soil,
the soil variety and the crop variety. Paper [6] studies the different data mining algorithms that
can be used to predict the yield of different crops, taking into consideration climatic
parameters like temperature, humidity and rainfall, agronomic parameters like soil, nutrient
contents like N, P and K, pesticides, etc. It suggests the ID3 decision tree for soyabean, neural
networks for corn, K-NN for wheat and potato, J-48 for rice, etc. Paper [7] uses an ensembling
technique for crop prediction. An ensemble is a data mining model, also known as a Committee
Method or Model Combiner, that combines the power of multiple models to achieve greater
prediction efficiency than any of its models could achieve alone. The output predicted by the
majority of the learners is taken as the final classification output. The classification models
used are K-NN, Naive Bayes, random tree and CHAID.
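The majority-voting step of such an ensemble can be sketched as follows. The learner outputs are hypothetical placeholders rather than results from [7], and `majority_vote` is an illustrative name. When two labels tie, `Counter.most_common` keeps the one encountered first, which is an arbitrary choice a real system would need to settle deliberately.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of learners."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of four base learners for one soil sample.
learner_outputs = {
    "knn": "rice",
    "naive_bayes": "rice",
    "random_tree": "wheat",
    "chaid": "rice",
}

final_crop = majority_vote(list(learner_outputs.values()))
print(final_crop)
```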
Support Vector Machine (SVM) is a supervised classification algorithm. Its representation for
a two-class dataset is shown in Fig. 1.1. In this algorithm, we plot each data item as a point
in n-dimensional space (where n is the number of features we have), with the value of each
feature being the value of a particular coordinate. Then, we perform classification by finding
the hyperplane that best separates the two classes [10]. We will use this algorithm to train a
model on our soil attributes dataset. With 4 attributes (N, P, K and pH), we obtain a
4-dimensional space. Through this algorithm we will obtain the soil fertility value and the
suggested suitable crops.
Fig. 1.1 SVM explained [11]
Chapter 3
System Analysis
3.1 Functional Requirements
Following are the functional requirements of the system:
1) Information Gathering: Data extraction from crop recommendation report and online
government weather datasets.
2) Data Preprocessing: Converting linguistic values to numeric ones for data processing
phase.
3) Data Processing/Data Mining: Acquiring trained models using data mining
algorithms.
4) Deployment of System on Android Studio: Creation of Android Application as User
interface for the system.
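The preprocessing requirement above (converting linguistic values to numeric ones) could look roughly like the sketch below. The column names and the low/medium/high vocabulary are assumptions made for illustration; the actual crop recommendation reports may use different terms. The 0/1/2 coding mirrors the fertility coding used for Dataset 1.

```python
# Assumed vocabulary: the real reports may use other linguistic terms.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def encode_row(row):
    """Replace linguistic levels with integers; parse everything else as a float."""
    encoded = {}
    for key, value in row.items():
        text = str(value).strip().lower()
        encoded[key] = LEVELS[text] if text in LEVELS else float(value)
    return encoded


# Hypothetical row as it might be read from a report.
row = {"nitrogen": "High", "phosphorus": "low", "potassium": "Medium", "ph": "6.5"}
print(encode_row(row))
```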
3.2 Non-Functional Requirements
Following are the non-functional requirements of the system:
1) Accuracy: The system must be able to suggest crops with a high level of accuracy.
2) User Friendly: The application must be user friendly so that it can be used effectively by
farmers.
3) Scalability: The system must be scalable so that its scope can be extended to other states
in India or even to other countries.
3.3 Specific Requirements
1) Hardware Requirements
The Hardware Requirements for the proposed system are as follows.
a) Android smartphone
b) Minimum of 4 GB RAM
c) 100 GB Hard Disk Space
d) Intel Core i3 processor or above
e) Windows 7 or above
2) Software Requirements
The Software Requirements for the proposed system are as follows.
a) Android Studio 3.0.1
b) Android SDK
c) Python 3.6
d) Java JDK
e) XAMPP
f) Eclipse
g) Apache Tomcat 9
3.4 Use Case Diagram
A use case is a methodology used in system analysis to identify, clarify, and organize system
requirements. Fig. 3.4.1 contains the use case diagram for our application. There are two
actors: the farmer (user) and the system.
Farmer:
The farmer has 3 roles:
● Registration
● Login
● Entering the values for NPK and pH as well as the region
System:
The system is initially trained using the values in the datasets. Once the farmer enters the
values, the system determines the output based on the trained model.
Fig. 3.4.1 Use Case Diagram
Chapter 4
Analysis Modeling
4.1 Data Modelling
Fig. 4.1.1 Dataset 1:NPK, pH Dataset
Fig. 4.1.2 Dataset 2: Rainfall Dataset
Fig. 4.1.3 Dataset 3: Rainfall range for Crops Dataset
We will be working with three datasets, as shown in Fig. 4.1.1, Fig. 4.1.2 and Fig. 4.1.3.
Dataset 1 will give the initial list of suggested crops based on soil attributes. It will also provide
the fertility of the soil as 0, 1 or 2, where 0 signifies low fertility and 2 signifies high fertility.
The model trained on Dataset 2 will then refine this list based on the rainfall predicted for the
given year in the farmer's region and the rainfall required for the crops in the obtained list, as
given by Dataset 3. Dataset 2 covers 36 regions in India along with the rainfall recorded from
1951 to 2014. Dataset 3 provides the required rainfall for various crops, which is used to
obtain the final list of suitable crops after rainfall prediction for a year in a particular region.
These datasets have been collected from an Indian Government website [12].
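The refinement step described above, where the initial crop list is narrowed using predicted rainfall and the per-crop rainfall ranges of Dataset 3, can be sketched as a simple range filter. The crop names, the millimetre ranges and the function name `refine_by_rainfall` are hypothetical and do not come from the actual datasets.

```python
def refine_by_rainfall(candidate_crops, predicted_rainfall_mm, rainfall_ranges):
    """Keep crops whose required rainfall range contains the predicted rainfall."""
    suitable = []
    for crop in candidate_crops:
        low, high = rainfall_ranges[crop]
        if low <= predicted_rainfall_mm <= high:
            suitable.append(crop)
    return suitable


# Hypothetical stand-in for Dataset 3: crop -> (min mm, max mm) per season.
rainfall_ranges = {"rice": (1000, 2500), "wheat": (300, 1000), "millet": (250, 700)}

# Hypothetical initial list produced from the soil attributes.
initial_list = ["rice", "wheat", "millet"]

print(refine_by_rainfall(initial_list, 650, rainfall_ranges))
```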
4.2 Class Diagram
The class diagram is a static diagram. It represents the static view of an application.
A class diagram is used not only for visualizing, describing and documenting different aspects
of a system but also for constructing executable code of the software application. The class
diagram shows a collection of classes, interfaces, associations, collaborations and constraints.
It is also known as a structural diagram.
The class diagram describes the attributes and operations of a class and also the constraints
imposed on the system. Class diagrams are widely used in the modeling of object-oriented
systems because they are the only UML diagrams which can be mapped directly to
object-oriented languages.
The purpose of the class diagram can be summarized as:
· Analysis and design of the static view of an application.
· Describing the responsibilities of a system.
· Serving as a base for component and deployment diagrams.
· Forward and reverse engineering.
Fig. 4.2.1 Class Diagram
Our application consists of two classes, i.e. farmer and crop prediction. Each farmer has a
user ID, user name, password and location as attributes. The farmer can register, log in, enter
values (NPK and pH as well as location) and view results.
The crop prediction class deals with all the prediction attributes and functions, i.e. training on
the dataset and predicting crops and the fertilizers required.
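The two classes described above might be outlined in Python as below. This is only a sketch of the structure: the method names, the `SoilInput` alias and the stand-in model are assumptions, registration and login are omitted, and a real application would store a password hash rather than the password itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

SoilInput = Tuple[float, float, float, float, str]  # N, P, K, pH, location


@dataclass
class Farmer:
    user_id: int
    user_name: str
    password: str  # illustration only; store a hash in a real system
    location: str
    history: List[List[str]] = field(default_factory=list)

    def enter_values(self, n: float, p: float, k: float, ph: float) -> SoilInput:
        """Bundle the soil attributes with the farmer's location."""
        return (n, p, k, ph, self.location)

    def view_results(self) -> List[List[str]]:
        """Return a copy of all previously stored crop lists."""
        return list(self.history)


class CropPrediction:
    def __init__(self, model: Callable[[SoilInput], List[str]]):
        self.model = model  # stands in for the trained classifier

    def predict_crops(self, farmer: Farmer, sample: SoilInput) -> List[str]:
        """Predict crops for the sample and record them in the farmer's history."""
        crops = self.model(sample)
        farmer.history.append(crops)
        return crops


# Hypothetical farmer and a trivial stand-in model based on pH alone.
farmer = Farmer(1, "asha", "secret", "Konkan")
predictor = CropPrediction(lambda sample: ["rice"] if sample[3] < 7 else ["wheat"])
crops = predictor.predict_crops(farmer, farmer.enter_values(50, 30, 40, 6.2))
print(crops, farmer.view_results())
```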
4.3 Functional Modelling
A data flow diagram (DFD) is a graphical representation of the "flow" of data through
an information system, modelling its process aspects. A DFD is often used as a preliminary
step to create an overview of the system, which can later be elaborated.
It is a two-dimensional diagram that explains how data is processed and transferred in a
system. The graphical depiction identifies each source of data and how it interacts with other
data sources to reach a common output.
Individuals seeking to draft a data flow diagram must (1) identify external inputs and
outputs, (2) determine how the inputs and outputs relate to each other, and (3) explain with
graphics how these connections relate and what they result in. This type of diagram helps
business development and design teams visualize how data is processed.
DFD Level 0:
In the basic process of our system, the farmer inputs the values, which pass through the
system and are stored in the database. Once the results are stored, the farmer views them on
the application.
Fig. 4.3.1 DFD Level 0
DFD Level 1:
Level 1 of the DFD shows the details of the system. There are 5 major processes:
1. Training the model on the datasets using SVM.
2. Processing the values entered by the farmer.
3. Presenting a list of predicted crops based on these values.
4. Displaying a list of fertilizers along with the predicted crops.
5. Storing the results in the database for future reference.
Fig. 4.3.2 DFD Level 1
4.4 Timeline Chart
Chapter 5
Design
5.1 Architectural Design
Flowcharts are used in designing and documenting complex processes or programs. Like
other types of diagrams, they help visualize what is going on and thereby help people
understand a process, and perhaps also find flaws, bottlenecks, and other less obvious features
within it. There are many different types of flowcharts, and each type has its own repertoire of
boxes and notational conventions. The two most common types of boxes in a flowchart are:
● a processing step, usually called an activity, and denoted as a rectangular box
● a decision, usually denoted as a diamond.
Fig. 5.1.2 Flowchart for System Activity
Fig. 5.1.1 Flowchart for User Activity
The execution flow of the system is depicted in Fig. 5.1.1 and Fig. 5.1.2. The farmer first
registers in the system and then enters the NPK and pH values as well as the region of
production. Once the details are entered, the list of possible crops is generated, along with
the list of fertilizers that can be applied. After that, the farmer can view previous results.
Chapter 6
Implementation
6.1 Algorithms/ Methods used
Requirements Gathering:
The application uses training data and testing data. The training data contains soil nutrient
parameters (nitrogen, phosphorus, potassium and pH value) and rainfall. The testing data
consists of the NPK and pH values and the region.
Input:
The training data is processed within the system, while the testing data is taken from the
user. In our application the user provides the testing data as region, nitrogen, phosphorus,
potassium, pH value, etc.
Processing and output:
After receiving the testing data, the system processes it against the model built from the
training data. The system then outputs the soil fertility level, predicts crops according to that
level, and indicates which added fertilizers can widen the range of crops that can be
cultivated.
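The report does not specify how fertilizer suggestions are derived. One simple possibility, shown purely as an assumption, is to compare the entered N, P and K values with the minimum levels a target crop needs and report the shortfall for each nutrient. The nutrient levels, their units and the function name `nutrient_deficits` below are hypothetical.

```python
def nutrient_deficits(soil, requirement):
    """Return how much of each nutrient the soil lacks for a given crop."""
    return {
        nutrient: requirement[nutrient] - soil[nutrient]
        for nutrient in ("N", "P", "K")
        if soil[nutrient] < requirement[nutrient]
    }


# Hypothetical soil report values and crop requirement, in the same units.
soil = {"N": 40, "P": 35, "K": 20}
crop_requirement = {"N": 60, "P": 30, "K": 45}

print(nutrient_deficits(soil, crop_requirement))
```

A nutrient that already meets the requirement (phosphorus in this example) is left out of the result, so an empty dictionary means no fertilizer is needed for that crop.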
What is Support Vector Machine?
“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be
used for both classification and regression problems. However, it is mostly used in
classification problems. In this algorithm, we plot each data item as a point in n-dimensional
space (where n is the number of features), with the value of each feature being the value of a
particular coordinate. Then, we perform classification by finding the hyperplane that best
separates the two classes.
Support vectors are the observations that lie closest to the separating hyperplane. The Support
Vector Machine is the frontier (hyperplane or line) which best segregates the two classes.
SVM is a supervised learning technique: it can be used when the dataset has both features and
class labels. If the dataset has no class labels or outputs for the feature set, the task becomes
an unsupervised learning problem. In that case, we can use Support Vector Clustering.
How the SVM Classifier Works
For a dataset consisting of a feature set and a label set, an SVM classifier builds a model to
predict classes for new examples. It assigns each new example/data point to one of the
classes. If there are only 2 classes, it is called a Binary SVM Classifier.
There are 2 kinds of SVM classifiers:
1. Linear SVM Classifier
2. Non-Linear SVM Classifier
SVM Linear Classifier:
In the linear classifier model, we assume that the training examples are plotted in space and
that the data points of the two classes are separated by an apparent gap. The model finds a
straight hyperplane dividing the 2 classes. The primary focus while drawing the hyperplane is
on maximizing the distance from the hyperplane to the nearest data point of either class. The
resulting hyperplane is called the maximum-margin hyperplane.
SVM Non-Linear Classifier:
In the real world, a dataset is generally dispersed to some extent. In that case, separating the
data into different classes with a straight linear hyperplane is not a good choice. For this,
Vapnik suggested creating non-linear classifiers by applying the kernel trick to
maximum-margin hyperplanes. In non-linear SVM classification, the data points are mapped
into a higher-dimensional space.
Linear Support Vector Machine Classifier
In a linear classifier, a data point is considered as a p-dimensional vector (a list of p numbers)
and we separate the points using a (p-1)-dimensional hyperplane. There can be many
hyperplanes that separate the data linearly, but the best hyperplane is considered to be the one
which maximizes the margin, i.e. the distance between the hyperplane and the closest data
point of either class.
The maximum-margin hyperplane is determined by the data points that lie nearest to it, since
we have to maximize the distance between the hyperplane and these points. The data points
which influence the hyperplane are known as support vectors.
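The notion of margin can be made concrete with a small sketch that compares two candidate hyperplanes of the form w · x + b = 0 on a toy two-class dataset. The points, the candidate hyperplanes and the function name are made up for illustration. The code only picks the better of the two candidates; an actual SVM solver would search over all hyperplanes for the one with the largest margin.

```python
from math import sqrt


def geometric_margin(w, b, points, labels):
    """Return the smallest signed distance y * (w . x + b) / ||w|| and all distances."""
    norm = sqrt(sum(wi * wi for wi in w))
    distances = [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    ]
    return min(distances), distances


# Toy 2-D data: two points per class, labels +1 and -1.
points = [(2, 2), (3, 3), (0, 0), (1, 0)]
labels = [1, 1, -1, -1]

# Two hypothetical separating hyperplanes given as (w, b).
candidates = {"A": ((1, 1), -2.5), "B": ((1, 0), -1.5)}

margins = {
    name: geometric_margin(w, b, points, labels)[0]
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)

# The points lying exactly at the margin of the chosen hyperplane.
best_w, best_b = candidates[best]
margin, distances = geometric_margin(best_w, best_b, points, labels)
closest_points = [x for x, d in zip(points, distances) if abs(d - margin) < 1e-9]

print(best, round(margin, 4), closest_points)
```

Both candidates classify every point correctly (all signed distances are positive), but candidate A keeps the nearest points farther away, which is the property the maximum-margin criterion rewards.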
Non-Linear Support Vector Machine Classifier
Vapnik proposed non-linear classifiers in 1992. It often happens that our data points are not
linearly separable in a p-dimensional (finite) space. To solve this, it was proposed to map the
p-dimensional space into a much higher dimensional space. We can draw customized/non-linear
hyperplanes using the kernel trick.
Every kernel is defined by a non-linear kernel function. This function helps to build a
high-dimensional feature space. Many kernels have been developed. Some standard kernels
are:
1. Polynomial (homogeneous) Kernel:
k(x_i, x_j) = (x_i \cdot x_j)^d
The polynomial kernel function can be represented by the above expression, where
k(x_i, x_j) is the kernel function, x_i and x_j are vectors of feature space and d is the
degree of the polynomial function.
2. Polynomial (non-homogeneous) Kernel:
k(x, y) = (x \cdot y + c)^d
In the non-homogeneous kernel, a constant term is also added. The constant term “c” is
also known as a free parameter. It influences the combination of features. x and y are
vectors of feature space.
3. Radial Basis Function Kernel:
It is also known as the RBF kernel. It is one of the most popular kernels. The squared
Euclidean distance is used as the distance metric here. It is used to draw completely
non-linear hyperplanes.
k(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right)
where x and x' are vectors of feature space and \sigma is a free parameter. Selection of
parameters is a critical choice. A poorly chosen value of the parameter can lead to
overfitting our data.
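The three kernels listed above can be written out directly in plain Python. The RBF kernel is sketched here in its common form exp(-||x - x'||^2 / (2 * sigma^2)), with sigma as the free parameter; this parameterisation is an assumption, and some libraries express the same kernel with gamma = 1 / (2 * sigma^2) instead. The function names are illustrative.

```python
from math import exp


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, d=2, c=0.0):
    """(x . y + c)^d; c = 0 gives the homogeneous form."""
    return (dot(x, y) + c) ** d


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel based on the squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-squared_distance / (2 * sigma ** 2))


print(polynomial_kernel((1, 2), (3, 4)))          # homogeneous, d = 2
print(polynomial_kernel((1, 2), (3, 4), c=1.0))   # non-homogeneous, d = 2, c = 1
print(rbf_kernel((0, 0), (1, 1)))                 # sigma = 1
```

A smaller sigma makes the RBF kernel fall off faster with distance, so each training point influences a narrower region. This is one way the parameter choice can push a model towards overfitting.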
Chapter 7
Conclusion
Agriculture is the backbone of India. We aim to provide a simple solution in the form of a
Soil Profile Based Agricultural System. Implementing this system can improve crop
productivity. The issue of land loss due to over-fertilization or under-fertilization can be
avoided. It may increase the profit margin for farmers by removing unnecessary usage of
fertilizers or by avoiding the cultivation of unsuitable crops. The main aim of our application
is to help farmers make better decisions regarding crop productivity and decide what crops to
cultivate based on the rainfall conditions in the region. It will also help farmers decide what
fertilizers to apply so as to avoid over-fertilization and under-fertilization, which would
ultimately produce better yield.
References
[1] Soumaya Lamrhari, Hamid Elghazi, Tayeb Sadiki and Abdellatif El Faker, “Profile based
big data architecture for agriculture context”, 2nd International Conference on Electrical and
Information Technologies (ICEIT), 2016.
[2] https://www.noble.org/news/publications/ag-news-and-views/2007/january/back-to-basics-the-roles-of-n-p-k-and-their-sources/
[3] https://www.pioneer.com/home/site/us/agronomy/library/managing-soil-pH
[4] Giritharan Ravichandran, Koteeshwari R S, “Agricultural Crop Predictor and Advisor using
ANN for Smartphones”, 2016 International Conference on Emerging Trends in Engineering,
Technology and Science (ICETETS).
[5] Ambarish G. Mohapatra, Bright Keswani, “Soil N-P-K Prediction using Location and
Crop Specific Random Forest Classification Technique in Precision Agriculture”, International
Journal of Advanced Research in Computer Science, Volume 8, No. 7, July-August 2017.
[6] Yogesh Gandge, Sandhya, “A Study on Various Data Mining Techniques for Crop Yield
Prediction”, 2017 International Conference on Electrical, Electronics, Communication,
Computer and Optimization Techniques (ICEECCOT).
[7] S. Pudumalar, E. Ramanujam, R. Harine Rajashree, C. Kavya, T. Kiruthika, J. Nisha, “Crop
Recommendation System for Precision Agriculture”, 2016 IEEE Eighth International
Conference on Advanced Computing (ICoAC).
[8] Aqeel-ur-Rehman, A. Z. Abbasi, N. Islam, Z. A. Shaikh, “A review of wireless sensors and
networks' applications in agriculture”, Comput. Stand. Interfaces, vol. 36, no. 2, pp. 263-270,
Feb. 2014.
[9] Raorane, R. V. Kulkarni, “Data Mining: An effective tool for yield estimation in the
agricultural sector”, International Journal of Emerging Trends, 2012.
[10] https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
[11] https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
[12] https://data.gov.in/
Acknowledgements