Soil Profile Based Agricultural System

Submitted in partial fulfillment of the requirements of the degree of
B. E. Computer Engineering

By
Ankita Fernandes, Roll No. 11, PID 152030
Nishita Desai, Roll No. 53, PID 162259
Samiksha Rahatwal, Roll No. 67, PID 162252

Guide:
Ms. Priya Karunakaran
Assistant Professor

Department of Computer Engineering
St. Francis Institute of Technology (Engineering College)
University of Mumbai
2018-2019

CERTIFICATE

This is to certify that the project entitled “Soil Profile Based Agricultural System” is a bona fide work of “Ankita Fernandes” (Roll No. 11), “Nishita Desai” (Roll No. 53) and “Samiksha Rahatwal” (Roll No. 67), submitted to the University of Mumbai in partial fulfillment of the requirement for the award of the degree of B.E. in Computer Engineering.

(Name and sign)
Guide

(Name and sign)
Head of Department

(Name and sign)
Principal

Project Report Approval for B.E.

This project report entitled Soil Profile Based Agricultural System by Ankita Fernandes, Nishita Desai and Samiksha Rahatwal is approved for the degree of B.E. in Computer Engineering.

Examiners
1. --------------------------------------------
2. --------------------------------------------

Date:
Place:

Declaration

I declare that this written submission represents my ideas in my own words and where others' ideas or words have been included, I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my submission. I understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

----------------------------------------
(Signature)

----------------------------------------
(Name of student and Roll No.)
Date:

Abstract

In many regions of India, farmers face problems in crop production due to soil and weather conditions. Owing to illiteracy, farmers may be unable to take advantage of scientific advances in agriculture and continue to rely on traditional practices, which makes it difficult to obtain desirable yields. For instance, crop failure can occur due to improper use of fertilizers or an unsuitable amount of rainfall. In such situations, an adequate solution is to choose crops that are well suited to the current soil quality and to the rainfall expected during cultivation. We therefore introduce the ‘Soil Profile Based Agricultural System’, which is based on data mining. The system will provide a list of crops a farmer can cultivate based on the input soil attributes (NPK and pH) and the predicted rainfall of the farmer’s region. In addition, it will suggest fertilizers that can be used to improve soil quality and thus bring more crops under successful cultivation. The system will be developed as an Android application to help address the growing problem of crop failure.

Contents

Chapter  Description
1  INTRODUCTION
   1.1 Description
   1.2 Problem Formulation
   1.3 Motivation
   1.4 Proposed Solution
   1.5 Scope of the Project
2  REVIEW OF LITERATURE
3  SYSTEM ANALYSIS
   3.1 Functional Requirements
   3.2 Non-Functional Requirements
   3.3 Specific Requirements
   3.4 Use-Case Diagrams and Description
4  ANALYSIS MODELING
   4.1 Data Modeling
   4.2 Class Diagram
   4.3 Functional Modeling
   4.4 Timeline Chart
5  DESIGN
   5.1 Architectural Design
6  IMPLEMENTATION
   6.1 Algorithms/Methods Used
7  CONCLUSION

List of Figures

Fig. No.  Figure Caption
1.1    SVM Explained
3.4.1  Use Case Diagram
4.1.1  Dataset 1: NPK, pH Dataset
4.1.2  Dataset 2: Rainfall Dataset
4.1.3  Dataset 3: Rainfall Range for Crops Dataset
4.2.1  Class Diagram
4.3.1  DFD Level 0
4.3.2  DFD Level 1
5.1.1  Flowchart for User Activity
5.1.2  Flowchart for System Activity

List of Abbreviations

Sr. No.  Abbreviation  Expanded form
i        NPK           Nitrogen, Phosphorus and Potassium
ii       SVM           Support Vector Machine
iii      KNN           K-Nearest Neighbour
iv       ANN           Artificial Neural Network
v        SVD           Singular Value Decomposition

Chapter 1 Introduction

Agriculture is a primary occupation in India, providing a livelihood to 118.6 million cultivators as per the 2011 Census. Understanding the nature of the soil, knowing when and where to apply which fertilizers, predicting future weather conditions, maintaining crop quality and understanding how different factors affect different parts of the same land are a few of the many problems farmers face before and during cultivation. Many factors and statistics must be taken into consideration when making important agricultural decisions, and doing this manually can be tedious and inefficient.

1.1 Description

We aim to solve some of these problems through a crop suggestion application based on data mining. Soil reports are mandatory before commencing any cultivation, but farmers often find it difficult to interpret these reports themselves. Based on weather conditions as well as the values in the soil report, the application will suggest crops suitable for cultivation in that particular area. Farmers can also test the suitability of different crops for the regional rainfall, or receive a list of fertilizers that can bring the current soil conditions up to the mark for cultivating a particular crop.

1.2 Problem Formulation

Bringing data mining technologies into agriculture presents a significant challenge; at the same time, agriculture contributes substantially to the economic and social development of many countries. Agriculture is also a crucial source of data that needs to be carefully managed and analyzed with appropriate methods and tools in order to extract meaningful information.
The main purpose of our proposed system is to provide an effective data mining architecture based on a profiling system. Such a system can assist users in making better decisions by providing real-time data processing and a dynamic method for composing data mining services, in order to enhance and monitor agricultural productivity. This improves on the traditional decision-making process and allows better management of natural resources. Knowing when to apply fertilizers, assessing soil fertility and applying efficient and sustainable techniques to crop production are some of the features of this system. By using a dynamic and accurate selection of data mining services, agricultural actors can exploit data in real time with appropriate tools.

One of the main features of this system is that it will help the farmer understand which fertilizers the soil requires and whether the soil is appropriate for the crop grown in it. We will use the SVM algorithm to classify the input soil attributes into 3 classes of fertility. The same approach will be used to classify the predicted rainfall into the ranges low, medium, high and very high, and thus provide a list of suitable crops.

1.3 Motivation

This system aims to provide reliable solutions to farmers by the easiest means possible. It will help farmers better understand the effect of soil nutrients on the growth of different crops. Since rainfall plays a very important role in crop yield, considering this factor along with soil quality will help farmers make smarter decisions.

1.4 Proposed Solution

Our system will be an Android application that can be freely available and easily accessible to farmers. A simple and friendly user interface will help farmers understand and use the system. We work on soil profiling, taking the NPK and pH values of the soil as input from the farmer.
NPK stands for Nitrogen (N), Phosphorus (P) and Potassium (K), the three major elements in deciding soil quality and fertility, while soil pH strongly influences nutrient availability; the role of each is described in Chapter 2. We will use a data mining algorithm such as SVM to produce an initial crop suggestion based on the soil attributes, and then use the rainfall predicted for that particular year in the farmer's region to provide a final list of crops suitable for the soil.

1.5 Scope of the Project

Being deployed as a simple Android application, this system can be useful to farmers in many regions. Since most people living in rural areas possess mobile phones nowadays, a few taps on their phones can provide them with smarter agricultural solutions. A limitation of this application is that it requires internet connectivity.

Chapter 2 Review of Literature

In order to design the system, we need to plan all of its aspects. We therefore need to study the various techniques and technologies that can be used to develop it. This literature survey aims to find out how data mining works, how soil parameters play a vital role in cultivation, in what contexts data mining can be used in agriculture, which soil components are used and studied, and what the results of these studies are.
Significance of NPK and pH: We work on soil profiling, taking the NPK and pH values of the soil as input from the farmer. NPK is short for Nitrogen (N), Phosphorus (P) and Potassium (K), which are the three major elements in deciding soil quality and fertility. Nitrogen is primarily responsible for vegetative growth. Nitrogen assimilated into amino acids forms the building blocks of protein in the plant. It is a component of chlorophyll and is required for several enzyme reactions. Phosphorus is a major component of plant DNA and RNA. Phosphorus is also critical in root development, crop maturity and seed production. The role of potassium in the plant is indirect, meaning that it does not make up any plant part. Potassium is required for the activation of over 80 enzymes throughout the plant [2]. Soil pH affects many physical, chemical and biological reactions necessary for crop survival, growth and yield. Nutrient availability is strongly influenced by pH, which makes it an important factor to monitor [3].

Data mining algorithms are divided into two broad categories: supervised and unsupervised algorithms. For instance, the K-nearest neighbour method classifies a given input as one of the classes provided in the training dataset, and hence it belongs to the first group. K-means, on the other hand, is a clustering algorithm that groups training data into clusters based on similarities between their parameters, and SVD, a matrix factorization technique, is likewise used on data without class labels [13].

The concept of crop and agricultural field monitoring systems has existed for a long time. One problem is that these monitoring systems create huge volumes of data that are difficult to manage, resulting in large and complex technology systems. This increases the hardware cost required to implement a monitoring system [8].

Agriculture as a business is unique.
Crop production is dependent on many climatic, geographical, biological, political and economic factors that are mostly independent of one another. These multiple factors introduce risk, and the efficient management of these risks is imperative for successful agriculture and a consistent output of food [9].

Feedforward backpropagation is one of the common implementations of artificial neural networks. In [4], this type of ANN is used to build a crop suggestion system, and trial and error is used to determine the number of hidden layers. Inputs from the user include NPK and pH as well as the depth, temperature and rainfall of the region. The latter three inputs might not be known to the farmer, or might differ between the planning phase and the actual cultivation phase.

A random forest algorithm is used in [5] to suggest the required level of NPK for the soil and thus prevent both undernutrition and overnutrition of the soil. With these results, farmers can then choose suitable fertilisers. Inputs to the system are the existing NPK values of the soil and the soil and crop variety.

Paper [6] studies the different data mining algorithms that can be used to predict the yield of different crops, taking into consideration climatic parameters such as temperature, humidity and rainfall, agronomic parameters such as soil, nutrient contents such as N, P and K, and pesticides. It suggests an ID3 decision tree for soybean, neural networks for corn, K-NN for wheat and potato, J48 for rice, etc.

Paper [7] uses an ensembling technique for crop prediction. An ensemble, also known as a committee method or model combiner, combines the power of multiple models to achieve better prediction efficiency than any of its models could achieve alone. The output predicted by the majority of the learners is taken as the final classification output. The classification models used are K-NN, Naive Bayes, random tree and CHAID.

Support Vector Machine (SVM) is a supervised classification algorithm.
Its representation for a two-class dataset is shown in Fig. 1.1. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features) with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyperplane that best separates the two classes [10]. We will use this algorithm to train a model on our soil attributes dataset. Since there are 4 attributes (N, P, K and pH), we obtain a 4-dimensional feature space. Through this algorithm we will obtain the soil fertility value and the list of suitable crops.

Fig. 1.1 SVM explained [11]

Chapter 3 System Analysis

3.1 Functional Requirements

The functional requirements of the system are as follows:
1) Information Gathering: Extracting data from the crop recommendation report and online government weather datasets.
2) Data Preprocessing: Converting linguistic values to numeric ones for the data processing phase.
3) Data Processing/Data Mining: Obtaining trained models using data mining algorithms.
4) Deployment of the System: Creating an Android application in Android Studio as the user interface for the system.

3.2 Non-Functional Requirements

The non-functional requirements of the system are as follows:
1) Accuracy: The system must be able to suggest crops with a high level of accuracy.
2) User Friendliness: The application must be user friendly so that it can be used effectively by farmers.
3) Scalability: The system must be scalable so that its scope can be extended to other states of India or even other countries.

3.3 Specific Requirements

1) Hardware Requirements
The hardware requirements for the proposed system are as follows:
a) Android smartphone
b) Minimum of 4 GB RAM
c) 100 GB hard disk space
d) Intel Core i3 processor or above
e) Windows 7 or above
2) Software Requirements
The software requirements for the proposed system are as follows:
a) Android Studio 3.0.1
b) Android SDK
c) Python 3.6
d) Java JDK
e) XAMPP
f) Eclipse
g) Apache Tomcat 9

3.4 Use Case Diagram

A use case is a methodology used in system analysis to identify, clarify and organize system requirements. Fig. 3.4.1 shows the use cases for our application. There are two actors: the farmer and the system.

Farmer: The farmer has 3 roles:
● Registration
● Login
● Entering the NPK and pH values as well as the region

System: The system is initially trained using the values in the datasets. Once the farmer enters the values, the system determines the output based on the trained model.

Fig. 3.4.1 Use Case Diagram

Chapter 4 Analysis Modeling

4.1 Data Modelling

Fig. 4.1.1 Dataset 1: NPK, pH Dataset
Fig. 4.1.2 Dataset 2: Rainfall Dataset
Fig. 4.1.3 Dataset 3: Rainfall Range for Crops Dataset

We will be working with three datasets, as shown in Fig. 4.1.1, Fig. 4.1.2 and Fig. 4.1.3. Dataset 1 gives the initial list of suggested crops based on soil attributes. It also provides the fertility of the soil as 0, 1 or 2, where 0 signifies low fertility and 2 signifies high fertility. The model trained on Dataset 2 predicts the rainfall for the given year in the farmer's region, and the initial list is then refined by comparing this predicted rainfall with the rainfall required by each crop in the list, as given by Dataset 3. Dataset 2 covers 36 regions of India along with their rainfall from 1951 to 2014. Dataset 3 provides the required rainfall for various crops, which is used to obtain the final list of suitable crops after rainfall prediction for a year in a particular region. These datasets have been collected from the Indian Government website [12].

4.2 Class Diagram

The class diagram is a static diagram. It represents the static view of an application. A class diagram is used not only for visualizing, describing and documenting different aspects of a system but also for constructing executable code of the software application.
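As an illustration of how a class diagram maps onto executable code, the sketch below shows one possible Python shape for the Farmer and Crop Prediction classes of Fig. 4.2.1, limited to the rainfall refinement step of Section 4.1. It is a sketch under stated assumptions, not the project's code: the names `SoilInput`, `refine_by_rainfall` and `CROP_RAINFALL_MM` are hypothetical, the rainfall ranges are placeholder numbers rather than values from Dataset 3, the soil-based SVM stage and the rainfall prediction model are not shown, and the password attribute with registration and login is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# HYPOTHETICAL rainfall requirements in mm: placeholders, NOT Dataset 3 values.
CROP_RAINFALL_MM: Dict[str, Tuple[float, float]] = {
    "rice": (1000.0, 2000.0),
    "wheat": (450.0, 650.0),
    "maize": (500.0, 1000.0),
}


@dataclass
class SoilInput:
    """The values a farmer enters: N, P, K, pH and the region."""
    nitrogen: float
    phosphorus: float
    potassium: float
    ph: float
    region: str

    def __post_init__(self) -> None:
        # pH lies on a 0-14 scale, so reject impossible readings early.
        if not 0.0 <= self.ph <= 14.0:
            raise ValueError(f"pH must be between 0 and 14, got {self.ph}")


@dataclass
class Farmer:
    """Farmer class; the password attribute and register/login are omitted."""
    user_id: int
    user_name: str
    location: str
    results: List[List[str]] = field(default_factory=list)

    def enter_values(self, n: float, p: float, k: float, ph: float) -> SoilInput:
        # The farmer's location is used as the region of production.
        return SoilInput(n, p, k, ph, region=self.location)

    def view_results(self) -> List[List[str]]:
        return list(self.results)


class CropPrediction:
    """Only the rainfall refinement step; the SVM soil stage is not shown."""

    def __init__(self, rainfall_ranges: Dict[str, Tuple[float, float]]) -> None:
        self.rainfall_ranges = rainfall_ranges

    def refine_by_rainfall(self, soil_crops: List[str], predicted_mm: float) -> List[str]:
        # Keep a soil-suitable crop only if the predicted rainfall falls inside
        # its required range; crops with no known range are dropped.
        refined = []
        for crop in soil_crops:
            bounds = self.rainfall_ranges.get(crop)
            if bounds is not None and bounds[0] <= predicted_mm <= bounds[1]:
                refined.append(crop)
        return refined


if __name__ == "__main__":
    farmer = Farmer(user_id=1, user_name="demo", location="example-region")
    soil = farmer.enter_values(n=280.0, p=25.0, k=150.0, ph=6.5)
    # soil_crops would come from the soil-based SVM stage; fixed here for illustration.
    final = CropPrediction(CROP_RAINFALL_MM).refine_by_rainfall(
        ["rice", "wheat", "maize"], predicted_mm=900.0)
    farmer.results.append(final)
    print(soil.region, farmer.view_results())  # example-region [['maize']]
```

Keeping the rainfall ranges as data passed into `CropPrediction`, rather than hard-coding them, would let the real Dataset 3 table be loaded without changing the class.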
The class diagram shows a collection of classes, interfaces, associations, collaborations and constraints. It is also known as a structural diagram. The class diagram describes the attributes and operations of a class and also the constraints imposed on the system. Class diagrams are widely used in the modeling of object-oriented systems because they are the only UML diagrams that can be mapped directly onto object-oriented languages. The purpose of the class diagram can therefore be summarized as:
· Analysis and design of the static view of an application.
· Describing the responsibilities of a system.
· Serving as a base for component and deployment diagrams.
· Forward and reverse engineering.

Fig. 4.2.1 Class Diagram

Our application consists of two classes: Farmer and Crop Prediction. Each farmer has a user ID, user name, password and location as attributes. The farmer can register, log in, enter values (NPK and pH as well as location) and view results. The Crop Prediction class deals with all the prediction attributes and functions, i.e. training on the dataset and predicting the crops and the fertilizers required.

4.3 Functional Modelling

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system, modelling its process aspects. A DFD is often used as a preliminary step to create an overview of the system, which can later be elaborated. It is a two-dimensional diagram that explains how data is processed and transferred in a system. The graphical depiction identifies each source of data and how it interacts with other data sources to reach a common output. To draw a data flow diagram, one must (1) identify external inputs and outputs, (2) determine how the inputs and outputs relate to each other, and (3) explain graphically how these connections relate and what they result in. This type of diagram helps business development and design teams visualize how data is processed.

DFD Level 0: The basic process of our system is as follows:
the farmer inputs the values, which pass through the system and are stored in the database. Once the results are stored, the farmer views them in the application.

Fig. 4.3.1 DFD Level 0

DFD Level 1: Level 1 of the DFD shows the details of the system. There are 5 major processes:
1. Training the model on the data using SVM.
2. Processing the values entered by the farmer.
3. Presenting a list of predicted crops based on these values.
4. Displaying a list of fertilizers along with the predicted crops.
5. Storing the results in the database for future reference.

Fig. 4.3.2 DFD Level 1

4.4 Timeline Chart

Chapter 5 Design

5.1 Architectural Design

Flowcharts are used in designing and documenting complex processes or programs. Like other types of diagrams, they help visualize what is going on and thereby help people understand a process, and perhaps also find flaws, bottlenecks and other less obvious features within it. There are many different types of flowcharts, and each type has its own repertoire of boxes and notational conventions. The two most common types of boxes in a flowchart are:
● a processing step, usually called an activity and denoted as a rectangular box, and
● a decision, usually denoted as a diamond.

Fig. 5.1.1 Flowchart for User Activity
Fig. 5.1.2 Flowchart for System Activity

The execution flow of the system is depicted in the figures above. The farmer first registers in the system and then enters the NPK and pH values as well as the region of production. Once the details are entered, the list of possible crops is generated, along with the list of fertilizers that can be applied. The farmer can also view previous results.

Chapter 6 Implementation

6.1 Algorithms/Methods Used

Requirements Gathering: In this application we will use training data and testing data.
The training data contains information about soil nutrient parameters such as nitrogen, phosphorus, potassium and pH value, as well as rainfall; the testing data consists of the NPK and pH values and the region.

Input: The training data is processed within the system, while the testing data is taken from the user. In our application, the user provides the testing data as region, nitrogen, phosphorus, potassium, pH value, etc.

Processing and output: After receiving the testing data, the system processes it using the model built from the training data. It outputs the soil fertility and, according to the fertility level, predicts the crops that can be grown and the fertilizers whose addition can increase the range of crops that can be cultivated.

What is Support Vector Machine?

“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used for classification problems. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features) with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyperplane that best separates the two classes. Support vectors are the training points (observations) that lie closest to this hyperplane, and the SVM finds the frontier (a hyperplane, or a line in two dimensions) that best segregates the two classes.

SVM is a supervised learning technique: it can be used when the dataset has both features and class labels. If the dataset does not have class labels or outputs for its feature set, the problem becomes one of unsupervised learning, and Support Vector Clustering can be used instead.

How does an SVM classifier work?

For a dataset consisting of a feature set and a label set, an SVM classifier builds a model to predict classes for new examples.
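The build-then-predict workflow just described can be sketched in plain Python. This is not the project's implementation: it is a minimal primal linear SVM, assuming labels of +1 and -1, trained with the Pegasos sub-gradient update on the regularised hinge loss. The function names, the regularisation strength `lam`, the epoch count and the toy points are illustrative assumptions, and the bias is folded into the weight vector by appending a constant feature of 1 (which also regularises it, a common simplification).

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def augment(x: Vector) -> List[float]:
    # Append a constant 1 so the bias term is learned as the last weight.
    return list(x) + [1.0]


def train_linear_svm(X: List[Vector], y: List[int],
                     lam: float = 0.1, epochs: int = 100) -> List[float]:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)) with the
    Pegasos sub-gradient update. Labels must be +1 or -1."""
    data = [augment(x) for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(data, y):  # fixed order keeps the result reproducible
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            violates_margin = yi * dot(w, xi) < 1
            # Sub-gradient step on the regulariser (shrinks w) ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... plus the hinge-loss term for points inside the margin.
            if violates_margin:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: List[float], x: Vector) -> int:
    # The side of the hyperplane w.x = 0 gives the class.
    return 1 if dot(w, augment(x)) >= 0 else -1


if __name__ == "__main__":
    # Toy, made-up points: two well-separated groups in 2-D.
    X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

For the soil data, each x would be the vector (N, P, K, pH), ideally scaled to comparable ranges first since the four attributes are measured on different scales. The three fertility classes (0, 1, 2) of Dataset 1 would need a multi-class extension, for example training one such binary classifier per class (one-vs-rest) and choosing the class with the highest score; in practice a library implementation such as scikit-learn's `SVC` would normally be used instead.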
An SVM classifier assigns each new example (data point) to one of the classes. If there are only 2 classes, it is called a binary SVM classifier. There are 2 kinds of SVM classifiers:
1. Linear SVM Classifier
2. Non-Linear SVM Classifier

SVM Linear Classifier: In the linear classifier model, we assume that the training examples are plotted in space and are expected to be separated by an apparent gap. The model predicts a straight hyperplane dividing the 2 classes. The primary focus while drawing the hyperplane is on maximizing the distance from the hyperplane to the nearest data point of either class. The resulting hyperplane is called a maximum-margin hyperplane.

SVM Non-Linear Classifier: In the real world, datasets are generally dispersed to some extent, so separating the data into classes with a straight (linear) hyperplane is often not a good choice. For this, Boser, Guyon and Vapnik suggested creating non-linear classifiers by applying the kernel trick to maximum-margin hyperplanes. In non-linear SVM classification, the data points are mapped into a higher-dimensional space.

Linear Support Vector Machine Classifier

In a linear classifier, a data point is considered as a p-dimensional vector (a list of p numbers), and we separate the points using a (p-1)-dimensional hyperplane. Many hyperplanes may separate the data linearly, but the best hyperplane is considered to be the one that maximizes the margin, i.e., the distance between the hyperplane and the closest data point of either class. The maximum-margin hyperplane is determined by the data points that lie nearest to it, since it is the distance to these points that is being maximized. These data points, which determine the hyperplane, are known as support vectors.

Non-Linear Support Vector Machine Classifier

Boser, Guyon and Vapnik proposed non-linear classifiers in 1992. It often happens that the data points are not linearly separable in a p-dimensional (finite) space.
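The XOR pattern is the classic small example of data that is not linearly separable in p = 2 dimensions. The sketch below makes this concrete and previews the kernel functions discussed next. To stay short it swaps in a kernel perceptron, a simpler relative of the SVM that uses the same kernel functions but does not maximise the margin, so it is not an SVM implementation. The data, function names and parameter defaults (`c`, `d`, `gamma`, `epochs`) are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a: Vector, b: Vector) -> float:
    return dot(a, b)


def poly_kernel(a: Vector, b: Vector, c: float = 1.0, d: int = 2) -> float:
    # Non-homogeneous polynomial kernel (a.b + c)^d; c = 0 gives the homogeneous one.
    return (dot(a, b) + c) ** d


def rbf_kernel(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    # exp(-gamma * squared Euclidean distance between a and b)
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def train_kernel_perceptron(X: List[Vector], y: List[int], kernel: Kernel,
                            epochs: int = 50) -> List[int]:
    """Return mistake counts alpha_i; the decision function is
    f(x) = sum_i alpha_i * y_i * kernel(x_i, x)."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            score = sum(a * yj * kernel(xj, xi) for a, xj, yj in zip(alpha, X, y))
            if yi * score <= 0:  # wrong side of (or on) the boundary
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alpha


def predict(alpha: List[int], X: List[Vector], y: List[int],
            kernel: Kernel, x: Vector) -> int:
    score = sum(a * yi * kernel(xi, x) for a, xi, yi in zip(alpha, X, y))
    return 1 if score > 0 else -1


# XOR: the label is the sign of x1 * x2, which no straight line can separate.
XOR_X = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
XOR_Y = [1, 1, -1, -1]

if __name__ == "__main__":
    for name, k in [("linear", linear_kernel), ("poly", poly_kernel), ("rbf", rbf_kernel)]:
        alpha = train_kernel_perceptron(XOR_X, XOR_Y, k)
        preds = [predict(alpha, XOR_X, XOR_Y, k, x) for x in XOR_X]
        print(name, preds == XOR_Y)  # linear False, poly True, rbf True
```

The degree-2 polynomial kernel works here because its implicit feature space contains the product x1 * x2, which is exactly what separates XOR. In scikit-learn, the same choices correspond to the `kernel`, `degree`, `coef0` and `gamma` parameters of `SVC`.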
To handle such data, it was proposed to map the p-dimensional space into a much higher-dimensional space. Customized, non-linear decision boundaries can then be drawn using the kernel trick. Every kernel is defined by a non-linear kernel function, which implicitly builds a high-dimensional feature space. Many kernels have been developed; some standard ones are:

1. Polynomial (homogeneous) kernel:
$$k(x_i, x_j) = (x_i \cdot x_j)^d$$
where k(x_i, x_j) is the kernel function, x_i and x_j are vectors of the feature space and d is the degree of the polynomial.

2. Polynomial (non-homogeneous) kernel: In the non-homogeneous kernel, a constant term is also added:
$$k(x, y) = (x \cdot y + c)^d$$
The constant term c, also known as a free parameter, influences the combination of features; x and y are vectors of the feature space.

3. Radial Basis Function (RBF) kernel: This is one of the most popular kernels. It uses the squared Euclidean distance as its distance metric and can draw completely non-linear decision boundaries:
$$k(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)$$
where x and x' are vectors of the feature space and γ is a free parameter.

Selection of these parameters is a critical choice: a poorly chosen value (for example, a very large γ for the RBF kernel) can lead to overfitting the data.

Chapter 7 Conclusion

Agriculture is the backbone of India. We are dedicated to creating a simple solution in the form of the Soil Profile Based Agricultural System. Implementing this system can help achieve better crop productivity, and land degradation due to over-fertilization or under-fertilization can be reduced.
It may increase the profit margin for farmers by avoiding unnecessary use of fertilizers and the cultivation of unsuitable crops. The main aim of our application is to help farmers make better decisions regarding crop productivity and to help them decide which crops to cultivate based on the rainfall conditions in their region. The application will also help farmers decide which fertilizers to apply so as to avoid over-fertilization or under-fertilization, which should ultimately lead to better yields.

References

[1] Soumaya Lamrhari, Hamid Elghazi, Tayeb Sadiki and Abdellatif El Faker, “Profile based big data architecture for agriculture context”, 2nd International Conference on Electrical and Information Technologies (ICEIT), 2016.
[2] https://www.noble.org/news/publications/ag-news-and-views/2007/january/back-to-basics-the-roles-of-n-p-k-and-their-sources/
[3] https://www.pioneer.com/home/site/us/agronomy/library/managing-soil-pH
[4] Giritharan Ravichandran and Koteeshwari R. S., “Agricultural Crop Predictor and Advisor using ANN for Smartphones”, 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS).
[5] Ambarish G. Mohapatra and Bright Keswani, “Soil N-P-K Prediction using Location and Crop Specific Random Forest Classification Technique in Precision Agriculture”, International Journal of Advanced Research in Computer Science, Vol. 8, No. 7, July-August 2017.
[6] Yogesh Gandge and Sandhya, “A Study on Various Data Mining Techniques for Crop Yield Prediction”, 2017 International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT).
[7] S. Pudumalar, E. Ramanujam, R. Harine Rajashree, C. Kavya, T. Kiruthika and J. Nisha, “Crop Recommendation System for Precision Agriculture”, 2016 IEEE Eighth International Conference on Advanced Computing (ICoAC).
[8] Aqeel-ur-Rehman, A. Z. Abbasi, N. Islam and Z. A. Shaikh, “A review of wireless sensors and networks' applications in agriculture”, Computer Standards & Interfaces, Vol. 36, No. 2, pp. 263-270, Feb. 2014.
[9] Raorane and R. V. Kulkarni, “Data Mining: An effective tool for yield estimation in the agricultural sector”, International Journal of Emerging Trends, 2012.
[10] https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
[11] https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
[12] https://data.gov.in/

Acknowledgements